Abstract: Document Q&A has emerged as one of the most common generative AI use cases. Retrieval-Augmented Generation (RAG) is the typical design pattern for implementing document Q&A. It requires a tech stack that generates vector embeddings, stores the vectors for fast retrieval, and feeds the retrieved passages into a final text generation step. When the underlying data changes frequently, the stack needs careful design so that queries return current results.
We’ll step through building an example document Q&A system that uses open-source models for embeddings and text generation (or OpenAI if you do not wish to run everything locally), LangChain to develop LLM agents, and Cassandra for the vector store.
Then we’ll add user feedback to our UI, collect data on our system’s retrievals and generations, and walk through some simple analysis of how to improve the system.
In this workshop, we’ll focus on creating simple components for an actual application using Docker images. Please have Docker and Docker Compose installed. Locally running notebooks will also be used for testing some components of the systems. Examples will all be in Python.
We’ll use open-source embedding and LLM models, but if you’d prefer to simplify the workshop and skip these steps, you can create an OpenAI account and have an OpenAI API key available.
If you plan to use an open-source LLM, we recommend downloading Mistral-7B in advance: wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf
Section 1 - Architecture patterns for RAG applications
* Review common design patterns for document Q&A systems built with RAG and how to select solutions for each component
Section 2 - Document ingestion via streaming
* Feed documents into a data stream
* Add an open-source embedding model to a locally running text embedding service
* Integrate the locally running embedding service with LangChain and load data into the vector store
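As a rough illustration of the ingestion steps above, the following is a minimal standard-library sketch, not the workshop's actual code. The names `toy_embed`, `InMemoryVectorStore`, and `ingest` are hypothetical: a hashed bag-of-words function stands in for the open-source embedding model, and a dictionary stands in for Cassandra. The point it shows is that keying rows by document ID lets a newer version of a document overwrite the older one as it arrives on the stream.

```python
import hashlib
import math
from typing import Dict, Iterable, List, Tuple

DIM = 64  # toy embedding size (assumption; real models use far more dimensions)


def toy_embed(text: str) -> List[float]:
    """Hash each token into a bucket, then L2-normalize.

    Hypothetical stand-in for a real embedding model.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector store, keyed by document ID."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[str, List[float]]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Writing to an existing ID replaces both the text and its vector.
        self.rows[doc_id] = (text, toy_embed(text))


def ingest(stream: Iterable[Tuple[str, str]], store: InMemoryVectorStore) -> int:
    """Consume (doc_id, text) events in order and return how many were processed."""
    count = 0
    for doc_id, text in stream:
        store.upsert(doc_id, text)
        count += 1
    return count
```

In a real deployment the stream would be a message queue or change feed and the upsert would be a database write; the overwrite-by-key behavior is the part that carries over.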
Section 3 - Create a Q&A agent
* Create a Q&A agent using LangChain
* Set up a basic chat UI
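The retrieve-then-generate loop behind a Q&A agent can be sketched in plain Python. This is an assumption-laden illustration rather than LangChain's API: `cosine`, `retrieve`, `build_prompt`, and `answer` are hypothetical helpers, the vectors are supplied by the caller, and `llm` is any callable that maps a prompt string to a response string.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (passage text, embedding vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float], rows: List[Row], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query vector."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: List[str]) -> str:
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    query_vec: Sequence[float],
    rows: List[Row],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve, build the prompt, and hand it to the model callable."""
    return llm(build_prompt(question, retrieve(query_vec, rows, k)))
```

Passing the model in as a callable keeps the sketch independent of whether generation runs on a local open-source model or a hosted API.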
Section 4 - Collect and analyze feedback
* Add user feedback and store results back into the DB
* Analyze feedback logs to understand how to improve performance
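One simple form the feedback analysis could take is sketched below. It assumes a hypothetical log format in which each entry records the IDs of the retrieved documents and a thumbs-up flag from the user; the field names `retrieved_ids` and `thumbs_up` and both function names are illustrative only. The sketch ranks documents by how often they appear in answers that users rated negatively, which can point to stale or misleading source material.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def negative_rate_by_doc(logs: List[dict]) -> Dict[str, float]:
    """Fraction of each document's appearances that drew negative feedback."""
    totals: Dict[str, int] = defaultdict(int)
    negatives: Dict[str, int] = defaultdict(int)
    for entry in logs:
        for doc_id in entry["retrieved_ids"]:
            totals[doc_id] += 1
            if not entry["thumbs_up"]:
                negatives[doc_id] += 1
    return {doc_id: negatives[doc_id] / totals[doc_id] for doc_id in totals}


def worst_documents(logs: List[dict]) -> List[Tuple[str, float]]:
    """Documents sorted from highest to lowest negative-feedback rate."""
    rates = negative_rate_by_doc(logs)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A document with very few appearances can rank high by chance, so in practice a minimum appearance count would likely be applied before acting on the ranking.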
Learning objectives: Open-Source LLMs, LangChain, Cassandra, RAG
Bio: Patrick McFadin works at DataStax in Developer Relations and is a committer on the Apache Cassandra project. He is also a co-author of the O’Reilly book “Managing Cloud-Native Data on Kubernetes.” Before DataStax, he held positions as Chief Architect, Engineering Lead, and DBA/Developer.