Building Modern Search Pipelines with Haystack, Large Language Models and Hybrid Retrieval

Abstract: 

In this talk, we navigate the latest buzz around semantic search and separate the noise from the meaningful advancements. Is dense retrieval better than BM25 keyword search? Do large language models outperform smaller transformers? How well do the models generalize to industry corpora? How can we leverage question answering?

We investigate these questions in three main sections:

1) A short introduction to modern search systems: the main trends in 2022, an overview of the typical pipelines behind them, and common challenges in the development cycle.
2) Investigating dense retrieval: With all the hype around vector databases and transformers like BERT for search, it is often difficult to judge how mature dense approaches are for real industry use cases. We benchmark different retrieval methods (dense, sparse, hybrid) to understand where they fail, where they shine, and which one might be relevant for your own use case. In addition, we show how you can evaluate all popular methods on your own dataset using the open-source framework Haystack.
3) Investigating large language models for retrieval: They are all over social media, but are they actually useful for search? Again, we share benchmarks and observations from our own use cases, and demonstrate with open-source code how you can evaluate and compare their performance on your own dataset using Haystack.
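
To make the comparison of sparse, dense and hybrid retrieval more concrete, here is a minimal sketch in plain Python. It is not taken from the talk and does not use Haystack's API. It assumes you already have ranked document IDs per query from a sparse and a dense retriever, combines them with reciprocal rank fusion (one common way to build a hybrid ranking, chosen here purely for illustration), and compares recall@k. All function names, the toy rankings and the relevance labels are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one hybrid ranking.

    Each document receives 1 / (k + rank) from every list it appears in.
    k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)


def mean_recall(runs, relevant_docs, k):
    """Average recall@k over all labeled queries."""
    per_query = [recall_at_k(runs[q], relevant_docs[q], k) for q in relevant_docs]
    return sum(per_query) / len(per_query)


# Toy data, made up for illustration: ranked results per query from a
# sparse (e.g. BM25) retriever and a dense (embedding-based) retriever.
sparse_runs = {"q1": ["d1", "d4", "d2"], "q2": ["d7", "d5", "d6"]}
dense_runs = {"q1": ["d2", "d3", "d1"], "q2": ["d5", "d8", "d7"]}
relevant_docs = {"q1": {"d1", "d2"}, "q2": {"d5"}}

# Hybrid ranking: fuse the sparse and dense result lists for each query.
hybrid_runs = {
    q: reciprocal_rank_fusion([sparse_runs[q], dense_runs[q]])
    for q in sparse_runs
}

for name, runs in [("sparse", sparse_runs), ("dense", dense_runs), ("hybrid", hybrid_runs)]:
    print(f"{name}: recall@2 = {mean_recall(runs, relevant_docs, 2):.2f}")
```

On real data, the same loop would run over the rankings produced by your own retrievers and your own relevance labels. Which method comes out ahead depends on the corpus, which is the question the benchmarks in the talk address.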

The session primarily addresses practitioners who want an overview of state-of-the-art retrieval methods and open-source code to evaluate them.

Bio: 

Malte Pietsch is CTO & Co-Founder at deepset. His current focus is on building deepset Cloud, a SaaS platform for developers to build, deploy and operate modern NLP pipelines. He holds an M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. Before founding deepset, he worked as a data scientist for multiple startups. He is an active open-source contributor and author of the NLP framework Haystack.

Open Data Science
