Abstract: Deploying large language model (LLM) architectures with billions of parameters poses significant challenges. Creating generative AI interfaces is difficult enough on its own; add the complexity of managing a sprawling architecture while juggling computational requirements and ensuring efficient resource utilization, and you have a potential recipe for disaster when transitioning your trained models to a real-world scenario. LangChain, an open source framework for developing applications powered by LLMs, aims to simplify building these interfaces by streamlining several natural language processing (NLP) components into easily deployable chains. At the same time, Kubernetes can help manage the underlying infrastructure.
This talk walks you through how to smoothly and efficiently transition your trained models into working applications by deploying an end-to-end containerized LLM application built with LangChain in a cloud-native environment, using open-source tools such as Kubernetes, LangServe, and FastAPI. You'll learn how to quickly and easily deploy a trained model designed for scalability, flexibility, and seamless orchestration.
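As a rough illustration of the cloud-native side of this workflow, a containerized LangServe/FastAPI service might be deployed with a Kubernetes manifest along these lines. This is a minimal sketch: the resource names, image reference, port, and resource figures are hypothetical placeholders, not details from the talk.

```yaml
# Hypothetical Deployment + Service for a LangServe/FastAPI container.
# Image name, labels, port, and resource sizes are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-app
spec:
  replicas: 2                    # scale horizontally for more throughput
  selector:
    matchLabels:
      app: langchain-app
  template:
    metadata:
      labels:
        app: langchain-app
    spec:
      containers:
        - name: api
          image: example.registry/langchain-app:latest  # placeholder image
          ports:
            - containerPort: 8000  # port the FastAPI server listens on
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-app
spec:
  selector:
    app: langchain-app
  ports:
    - port: 80
      targetPort: 8000           # routes traffic to the container port
```

Kubernetes then handles replication, restarts, and load balancing across the pods, which is what gives the deployed model its scalability and orchestration story.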
Bio: Ezequiel Lanza is an AI open source evangelist at Intel. He holds an M.S. in Data Science and is passionate about helping people discover the exciting world of artificial intelligence. Ezequiel is a frequent AI conference presenter and the creator of use cases, tutorials, and guides that help developers adopt open-source AI tools.