Abstract: Once we've designed an AI/ML application and selected or trained the models, that's really just the beginning. We need our AI-powered services to be resilient and efficient, scalable to demand, and adaptable to heterogeneous environments (for example, making the most effective use of GPUs or TPUs). Moreover, when we build applications around online inference, we often need to integrate different services: multiple models, data sources, business logic, and more.
Ray Serve was built to overcome exactly these challenges.
In this class we'll learn to use Ray Serve to compose online inference applications that meet all of these requirements and more. We'll build services that integrate with each other while autoscaling individually, each with its own hardware and software requirements -- all in regular Python, often with just one new line of code.
Outline:
* Introduction to Ray and Ray Serve
* Running large language models (LLMs) with Ray Serve
* Building complex applications -- adding business logic and multiple models
* Ray Serve features and patterns for production deployment
Learning objectives:
* Learn about Ray and Ray Serve
* Develop an understanding of the architectural components of Ray Serve
* Use Ray Serve to serve machine learning models for online inference in production environments
* Combine multiple models into more sophisticated machine learning pipelines
Prerequisites: Basic familiarity with Python and web services
Bio: Kamil is a technical training lead at Anyscale Inc., where he builds technical training and educational resources for the broader Ray community. Prior to joining Anyscale he co-founded Neptune.ai and worked at an AI consultancy. Kamil holds an M.Sc. in Cognitive Science and a B.Sc. in Computer Science. He is passionate about sports.