Scaling AI/ML Workloads with Ray

Abstract: 

Existing production machine learning systems are often hard to use because they are assembled from many separate distributed systems. As a result, data scientists and ML practitioners often spend most of their time stitching together and managing bespoke distributed systems to build end-to-end ML applications and push models to production.

To address this, the Ray community has built Ray AI Runtime (Ray AIR), an open-source toolkit for building large-scale end-to-end ML applications.

Ray is a distributed compute framework that powers large-scale machine learning models such as OpenAI's ChatGPT. By leveraging Ray’s distributed compute substrate and library ecosystem, the Ray AI Runtime brings scalability and programmability to ML platforms.

The Ray AI Runtime focuses on providing the compute layer for Python-based AI/ML workloads and is designed to interoperate with popular ML frameworks and with other systems for storage and metadata needs.

In this session, we’ll explore and discuss the following:
- What Ray is and why you might use it
- How AIR, built atop Ray, allows you to program and scale your machine learning workloads easily
- AIR’s interoperability and easy integration points with other systems for storage and metadata needs
- AIR’s cutting-edge features for accelerating the machine learning lifecycle such as data preprocessing, last-mile data ingestion, tuning and training, and serving at scale

Key takeaways for attendees are:

- Understand Ray as a general-purpose framework for distributed computing
- Understand how Ray AI Runtime can be used to implement scalable, programmable machine learning workflows
- Learn how to pass and share data across distributed trainers and Ray native libraries: Tune, Serve, Train, RLlib, etc.
- Learn how to scale Python-based workloads across supported public clouds

Background Knowledge:

General familiarity with machine learning tools and frameworks is assumed. We won't go deep into any particular framework, but attendees should know, for example, that PyTorch and TensorFlow exist and what they are used for.

Bio: 

Kai Fricke is a senior software engineer at Anyscale. As a core maintainer of the Ray AI Runtime, he builds software for distributed machine learning training and tuning. During his postdoc at Cambridge, he used reinforcement learning to optimize large graph structures and co-authored two open-source reinforcement learning libraries.

Open Data Science
