
Abstract: Document Q&A has emerged as one of the most common generative AI use cases, and Retrieval Augmented Generation (RAG) is the typical design pattern for implementing it. RAG requires a tech stack that generates vector embeddings, stores the vectors for fast retrieval, and feeds the retrieved passages into a final text generation step. When the underlying data changes frequently, the stack must be designed carefully so that queries return current results.
We’ll step through building an example document Q&A system that uses open-source models for embeddings and text generation (or OpenAI models if you do not wish to run everything locally), LangChain to develop LLM agents, and Cassandra for the vector store.
Then we’ll add user feedback to our UI, collect data on the system's retrievals and generations, and walk through some simple analysis of how to improve the system.
In this workshop, we’ll focus on creating simple components for an actual application using Docker images. Please have Docker and Docker Compose installed. Locally running notebooks will also be used for testing some components of the system. Examples will all be in Python.
We’ll use open-source embedding and LLM models, but if you’d prefer to simplify the workshop and skip these steps, you can create an OpenAI account and have an OpenAI API key available.
If you plan to use an open-source LLM, we recommend downloading Mistral-7B in advance: wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf
Session Outline:
Section 1 - Architecture patterns for RAG applications
* Review common design patterns for document Q&A systems built with RAG and how to select solutions for each component
Section 2 - Document ingestion via streaming
* Feed documents into a data stream
* Add an open-source embedding model to a locally running text embedding service
* Integrate the locally running embedding service with LangChain and load data into the vector store
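
To make the ingestion steps above concrete, here is a minimal sketch of a streaming load with upserts, so that a newer version of a document replaces the stale vector. It uses only the Python standard library: `toy_embed` is a hypothetical stand-in for the embedding service, and `InMemoryVectorStore` is a hypothetical stand-in for the Cassandra vector store. Neither reflects the actual workshop code or any real library API.

```python
import hashlib
import math
from typing import Dict, Iterable, List, Tuple

DIM = 64  # toy embedding size (assumption); real models use far more dimensions


def toy_embed(text: str) -> List[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector store: one row per document ID."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, List[float]]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Writing under an existing ID replaces the stale text and vector.
        self.rows[doc_id] = (text, toy_embed(text))


def ingest(stream: Iterable[Tuple[str, str]], store: InMemoryVectorStore) -> int:
    """Consume (doc_id, text) events in arrival order; return the number processed."""
    count = 0
    for doc_id, text in stream:
        store.upsert(doc_id, text)
        count += 1
    return count


# A tiny simulated stream in which doc-1 is updated after its first version.
events = [
    ("doc-1", "Cassandra stores vectors for fast retrieval"),
    ("doc-2", "LangChain helps build LLM agents"),
    ("doc-1", "Cassandra stores vectors and supports frequent updates"),
]
store = InMemoryVectorStore()
processed = ingest(events, store)
```

Keying rows by document ID is one simple way to keep retrievals current when data changes often. A real deployment would also need to handle deletes and documents split into multiple chunks.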
Section 3 - Create a Q&A agent
* Create a Q&A agent using LangChain
* Set up a basic chat UI
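
The retrieval half of a Q&A agent can be sketched as follows: rank stored documents by similarity to the question, then place the top matches into the prompt sent to the LLM. This is a sketch under stated assumptions, not the workshop's LangChain implementation. The word-count "embedding", the sample documents, and the prompt wording are all illustrative, and the generation call itself is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    """Illustrative embedding: lowercase word counts with basic punctuation stripped."""
    tokens = (t.strip(".,?!") for t in text.lower().split())
    return Counter(t for t in tokens if t)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return the IDs of the k documents most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, toy_embed(docs[d])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: Dict[str, str], doc_ids: List[str]) -> str:
    """Assemble the retrieved passages and the question into a single prompt."""
    context = "\n".join(f"[{d}] {docs[d]}" for d in doc_ids)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical sample documents, for illustration only.
docs = {
    "doc-1": "Cassandra is the vector store in this workshop",
    "doc-2": "LangChain is used to create the agent",
    "doc-3": "Feedback is stored back into the database",
}
question = "Which vector store is used in this workshop?"
top_ids = retrieve(question, docs, k=2)
prompt = build_prompt(question, docs, top_ids)
```

Returning document IDs alongside the prompt is useful later: logging which passages were shown for each answer makes the feedback analysis possible.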
Section 4 - Collect and analyze feedback
* Add user feedback and store results back into the DB
* Analyze feedback logs to understand how to improve performance
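
One simple form the feedback analysis could take is a per-document downvote rate: for each retrieved document, the fraction of answers it contributed to that users rated negatively. The sketch below assumes a hypothetical log schema (`FeedbackRecord` with a question, the retrieved document IDs, and a thumbs-up flag) and made-up sample data. The schema actually used in the workshop may differ.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class FeedbackRecord(NamedTuple):
    """Hypothetical log row: one user rating for one generated answer."""
    question: str
    retrieved_ids: List[str]
    thumbs_up: bool


def downvote_rate_by_doc(log: List[FeedbackRecord]) -> Dict[str, float]:
    """For each document, the share of answers it appeared in that were downvoted."""
    shown: Dict[str, int] = defaultdict(int)
    down: Dict[str, int] = defaultdict(int)
    for rec in log:
        for doc_id in rec.retrieved_ids:
            shown[doc_id] += 1
            if not rec.thumbs_up:
                down[doc_id] += 1
    return {doc_id: down[doc_id] / shown[doc_id] for doc_id in shown}


# Made-up sample log, for illustration only.
log = [
    FeedbackRecord("q1", ["doc-1", "doc-2"], True),
    FeedbackRecord("q2", ["doc-2", "doc-3"], False),
    FeedbackRecord("q3", ["doc-3"], False),
    FeedbackRecord("q4", ["doc-1"], True),
]
rates = downvote_rate_by_doc(log)
worst_doc = max(rates, key=rates.get)
```

Documents with a high downvote rate are candidates for re-chunking, re-embedding, or removal. With real logs the counts per document would need to be large enough for the rates to be meaningful.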
Learning objectives: Open-Source LLMs, LangChain, Cassandra, RAG
Background Knowledge:
Python
Bio: Alejandro Cantarero is Field CTO for AI at DataStax, where he works with customers to build and launch successful generative AI products. Alejandro has led AI and ML teams for more than 10 years, with an emphasis on NLP, at companies including Shopify, Tribune Publishing, and the Los Angeles Times, as well as startups in mobile, social media, and blockchain. He has a Ph.D. in mathematics from UCLA.

Alejandro Cantarero, Ph.D.
Title
Field CTO, AI | DataStax
