Abstract: Retrieval Augmented Generation (RAG) is by far the most popular LLM application being built today. It allows you to ask questions against your own private data and to return relevant context. That context is then used to augment your prompts before passing them to the LLM (in this case, we’ll use Llama 2!). When it comes to optimizing your LLM application-building workflow, RAG systems should be constructed after you get as far as you can through prompt engineering, but before fine-tuning of the input-output schema. During this workshop, we will walk through each component of a simple RAG system - from vector stores (we’ll use Pinecone) to embedding models (we’ll pick one from Hugging Face’s embedding leaderboard) to the LLM Ops infrastructure glue that holds it all together (LangChain of course!) - including how to think about each component conceptually in addition to how to set them up in Python code. We will also discuss RAG evaluation and apply RAGAS - an emerging best-practice framework for assessing the quality of RAG outputs - to our system. We will discuss best-practice ways to improve simple RAG systems, from model optimization (e.g., chunk sizing, similarity metric, retrieval method) with RAGAS to using prompt or embedding caches to increase efficiency.
All demo code will be provided via GitHub and/or Colab!
Module 1: Introduction to RAG Systems: An Overview
There are a few primary subsystems of RAG, and we'll review each of the Lego blocks that we need to put one of these systems together. We'll focus on a high-level, big-picture overview here and the tools that we'll use.
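The big picture boils down to a retrieve-then-generate loop. Here's a minimal sketch in plain Python: the `retrieve` and `llm` callables are hypothetical stand-ins for the real components (the Pinecone retriever and Llama 2) that we wire up in the later modules.

```python
def rag_answer(question, retrieve, llm):
    """Retrieve relevant context, splice it into the prompt, then generate.

    `retrieve` and `llm` are placeholder callables standing in for the
    vector-store retriever and the language model built in Modules 2-3.
    """
    contexts = retrieve(question)            # e.g., top-k vector search hits
    context_block = "\n\n".join(contexts)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(prompt)
```

Every component we cover (vector store, embedding model, LLM, orchestration) maps onto one line of this loop.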
Module 2: Creating Your First Vector Database with LangChain and Pinecone
In this module we'll learn how we can build LLM application prototypes with LangChain, starting by building our first index (i.e. vector store, or vector database) with some fun open-source data (bring your own!). The vector DB that we'll be using is Pinecone, and we'll select an open-source embedding model from Hugging Face's Embedding Leaderboard.
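As a preview, here's a sketch of that setup in LangChain-style code. The embedding model, index name, and chunking parameters are illustrative placeholders, not recommendations, and the LangChain/Pinecone imports are deferred into the function since they require installed packages and a Pinecone API key in the environment.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Naive fixed-size character chunking with overlap -- a plain-Python
    stand-in for LangChain's text splitters, to show the idea."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


def build_index(texts, index_name="rag-workshop"):
    """Embed chunks and upsert them into a Pinecone index via LangChain.

    Deferred imports: requires `langchain`, `pinecone-client`, and
    `sentence-transformers`, plus Pinecone credentials configured.
    """
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import Pinecone

    # Any model from the embedding leaderboard works here; this one is a
    # common lightweight default used purely for illustration.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    return Pinecone.from_texts(texts, embeddings, index_name=index_name)
```

Chunk size and overlap are exactly the knobs we'll revisit in Module 4 when we optimize the system with RAGAS.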
Module 3: Building your first LLM Question Answering Chain with Llama 2
Here we'll connect our retrieval question-answering chain to our vector database and use it to add additional context to our prompts! We'll see how this works in real time in code, and we'll even show you how (and provide the code!) to get this up and running within your own Chainlit app and deployed on a Hugging Face Space!
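Here's a sketch of the wiring, assuming the vector store built in Module 2 and any LangChain-compatible Llama 2 wrapper. The chain setup is deferred behind a function (it needs `langchain` installed and live components); the prompt helper is pure Python and reflects Llama 2's documented `[INST]` chat format.

```python
def format_llama2_prompt(user_message, system_message="You are a helpful assistant."):
    """Wrap a message in Llama 2's chat template ([INST] ... [/INST])."""
    return f"<s>[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{user_message} [/INST]"


def build_qa_chain(vectorstore, llm):
    """Connect the retriever to the LLM in a question-answering chain.

    Deferred import: requires `langchain`. `vectorstore` is the Pinecone
    index from Module 2; `llm` is a LangChain-compatible Llama 2 wrapper.
    """
    from langchain.chains import RetrievalQA

    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",  # "stuff" all retrieved docs into one prompt
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        return_source_documents=True,  # keep sources for inspection/eval
    )
```

The `k=4` retrieval depth is another placeholder we'll tune during evaluation.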
Module 4: RAG Evaluation with RAGAS
In the final module we'll discuss RAG evaluation and RAGAS, the emerging evaluation framework that is getting lots of attention. We'll discuss ways that we can use it to connect our RAG system's inputs and outputs to perform model optimization over our entire system. We'll also discuss other best-practice ways for improving the performance (computational and quality) of our RAG system.
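To make the evaluation concrete, here's the shape of the data RAGAS expects and a sketch of the `evaluate` call. The record values are toy placeholders; the actual call is deferred into a function since it requires the `ragas` and `datasets` packages plus an LLM configured to judge the metrics.

```python
# One evaluation record per question: RAGAS scores metrics such as
# faithfulness and answer relevancy from these fields.
eval_records = {
    "question": ["What is RAG?"],
    "answer": ["RAG augments prompts with retrieved context before generation."],
    "contexts": [["RAG stands for Retrieval Augmented Generation."]],
    "ground_truth": ["Retrieval Augmented Generation."],
}


def run_ragas(records):
    """Score a batch of RAG outputs with RAGAS.

    Deferred imports: requires `ragas` and `datasets`, plus an LLM
    configured for the metric judgments.
    """
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import answer_relevancy, context_precision, faithfulness

    return evaluate(
        Dataset.from_dict(records),
        metrics=[faithfulness, answer_relevancy, context_precision],
    )
```

These per-metric scores are what let us compare chunk sizes, similarity metrics, and retrieval methods head-to-head across the whole system.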
**Who should attend the workshop?**
- Practitioners who want to build open-source Retrieval Augmented Generation (RAG) systems.
- Practitioners who want to understand each component of RAG systems and how they interact.
- Learners who want to leverage the open-source Llama 2 LLM in their applications.
- Learners who want to get up to speed on best-practice RAG evaluation tools and techniques in LLM Ops.
**Prerequisites:** Foundational machine learning knowledge and strong Python skills.
Bio: Chris Alexiuk is the Head of LLMs at AI Makerspace, where he serves as a programming instructor, curriculum developer, and thought leader for their flagship LLM Ops: LLMs in Production course. During the day, he’s a Founding Machine Learning Engineer at Ox. He is also a solo YouTube creator, Dungeons & Dragons enthusiast, and is based in Toronto, Canada.