Real-time Document Q&A using Wikipedia, Open-Source Models, and Cassandra


Document Q&A has emerged as one of the most common generative AI use cases, and Retrieval Augmented Generation (RAG) is the typical design pattern for implementing it. RAG requires a tech stack that generates vector embeddings, stores the vectors for fast retrieval, and feeds the retrieved passages into a text generation step at the end. When the underlying data changes frequently, the stack must also ingest updates quickly enough that queries return current results, which takes careful design.
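The three RAG stages described above (embed, store and retrieve, generate) can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: `embed` is a toy bag-of-words function standing in for a real embedding model, `InMemoryVectorStore` is a hypothetical stand-in for a vector store such as Cassandra, and the generation step stops at assembling the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts (assumption: a real model goes here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a real vector store."""

    def __init__(self):
        self.rows = []  # list of (vector, text) pairs

    def add(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, passages):
    """Assemble the retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


store = InMemoryVectorStore()
store.add("Cassandra is a distributed wide-column database.")
store.add("Mistral-7B is an open-source language model.")
store.add("Wikipedia is a free online encyclopedia.")

question = "What kind of database is Cassandra?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system each stand-in would be swapped for the corresponding component while the overall flow stays the same.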

We’ll step through building an example document Q&A system that uses open-source models for embeddings and text generation (or OpenAI if you do not wish to run everything locally), LangChain to develop LLM agents, and Cassandra as the vector store.

Then we’ll add user feedback to our UI, collect data on the system's retrievals and generations, and walk through some simple analysis of how to improve the system.

In this workshop, we’ll focus on creating simple components for an actual application using Docker images. Please have Docker and Docker Compose installed. We will also use locally running notebooks to test some components of the system. All examples will be in Python.

We’ll use open-source embedding and LLM models, but if you’d prefer to simplify the workshop and skip these steps, you can create an OpenAI account and have an OpenAI API key available.

If you plan to use an open-source LLM, we recommend downloading Mistral-7B in advance: wget

Session Outline:

Section 1 - Architecture patterns for RAG applications
* Review common design patterns for document Q&A systems built with RAG and how to select solutions for each component

Section 2 - Document ingestion via streaming
* Feed documents into a data stream
* Add an open-source embedding model to a locally running text embedding service
* Integrate the locally running embedding service with LangChain and load data into the vector store
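The ingestion steps above can be sketched as a loop that consumes document events, chunks and embeds each one, and upserts the result. Everything named here is hypothetical: the event shape (`doc_id`, `text`), the `fake_embed` placeholder (a hash, not a real embedding model), and the plain dict standing in for a vector store table. The point of the sketch is the upsert behavior, where a new version of a document replaces its stale chunks so that queries see current data.

```python
import hashlib


def chunk(text, size=8):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def fake_embed(text, dim=4):
    """Deterministic placeholder vector from a hash (assumption: not a real model)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def ingest(stream, table):
    """Consume document events and upsert their chunks into `table`.

    `table` is a dict keyed by (doc_id, chunk_index), a hypothetical
    stand-in for a vector store table.
    """
    for event in stream:
        doc_id = event["doc_id"]
        # Remove chunks from any earlier version of this document.
        for key in [k for k in table if k[0] == doc_id]:
            del table[key]
        # Chunk, embed, and store the current version.
        for i, piece in enumerate(chunk(event["text"])):
            table[(doc_id, i)] = {"text": piece, "vector": fake_embed(piece)}


def document_stream():
    """Hypothetical stream: page-1 arrives, then page-2, then an edit to page-1."""
    yield {"doc_id": "page-1",
           "text": "one two three four five six seven eight nine ten"}
    yield {"doc_id": "page-2", "text": "a short page"}
    yield {"doc_id": "page-1", "text": "page one was rewritten"}


table = {}
ingest(document_stream(), table)
for key in sorted(table):
    print(key, table[key]["text"])
```

Deleting by scanning every key is fine for a toy dict; a real store would typically address a document's chunks through its partition or primary key instead.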

Section 3 - Create a Q&A agent
* Create a Q&A agent using LangChain
* Set up a basic chat UI

Section 4 - Collect and analyze feedback
* Add user feedback and store results back into the DB
* Analyze feedback logs to understand how to improve performance
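One simple form the feedback analysis could take is to compare how often users marked an answer helpful when retrieval was confident versus when it was not. The log schema below (`top_score`, `helpful`), the sample rows, and the 0.5 threshold are all invented for illustration and are not taken from the workshop materials.

```python
from collections import defaultdict

# Hypothetical feedback log: one row per answered question.
feedback_log = [
    {"question": "q1", "top_score": 0.91, "helpful": True},
    {"question": "q2", "top_score": 0.42, "helpful": False},
    {"question": "q3", "top_score": 0.88, "helpful": True},
    {"question": "q4", "top_score": 0.35, "helpful": False},
    {"question": "q5", "top_score": 0.47, "helpful": True},
]


def helpful_rate_by_bucket(log, threshold=0.5):
    """Share of helpful answers for high- vs low-similarity retrievals."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [helpful, total]
    for row in log:
        bucket = "high" if row["top_score"] >= threshold else "low"
        counts[bucket][0] += int(row["helpful"])
        counts[bucket][1] += 1
    return {bucket: h / t for bucket, (h, t) in counts.items()}


rates = helpful_rate_by_bucket(feedback_log)
print(rates)
```

A large gap between the two buckets would suggest that poor answers trace back to weak retrieval rather than to the generation step, which points at chunking or the embedding model as the first thing to revisit.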

Learning objectives: Open-Source LLMs, LangChain, Cassandra, RAG

Background Knowledge:



Alejandro Cantarero is Field CTO for AI at DataStax, where he works with customers to build and launch successful generative AI products. Alejandro has led AI and ML teams for more than 10 years, with an emphasis on NLP, at companies including Shopify, Tribune Publishing, and the Los Angeles Times, as well as at startups in mobile, social media, and blockchain. He has a Ph.D. in mathematics from UCLA.

Open Data Science




One Broadway
Cambridge, MA 02142
