How to Rigorously Evaluate GenAI Applications


Evaluating GenAI applications is hard. The non-deterministic nature of LLMs and other components makes them difficult to measure and improve. In this session, we will walk through state-of-the-art techniques for LLM evaluation and best practices from industry.


Pasquale Antonante is co-founder and CTO of Relari (backed by Y Combinator), a company that helps AI teams rigorously test, validate, and improve GenAI applications throughout the entire development lifecycle. He holds a Ph.D. in aerospace engineering from MIT, where he researched the reliability and diagnosability of complex AI systems such as autonomous cars. He is also the maintainer of the popular open-source GenAI evaluation framework continuous-eval.

Open Data Science
One Broadway
Cambridge, MA 02142
