Scaling AI Applications with Ray

Abstract: The next generation of AI applications will continuously interact with the environment and learn from these interactions. To develop these applications, data scientists and engineers will need to seamlessly scale their work from running interactively to production clusters. In this talk we introduce Ray, a high-performance distributed execution engine, and its libraries for AI workloads. We cover each Ray library in turn, and also show how the Ray API allows these traditionally separate workflows to be composed and run together as one distributed application.

Ray is an open source project being developed at the RISE Lab in UC Berkeley for scalable hyperparameter optimization, distributed deep learning, and reinforcement learning. We focus on the following libraries in this tutorial:

TUNE: Tune is a scalable hyperparameter optimization framework for reinforcement learning and deep learning. Go from running one experiment on a single machine to running on a large cluster with efficient search algorithms without changing your code. Unlike existing hyperparameter search frameworks, Tune targets long-running, compute-intensive training jobs that may take many hours or days to complete, and includes many resource-efficient algorithms designed for this setting.

RLLIB: RLlib is an open-source library for reinforcement learning that offers both a collection of reference algorithms and scalable primitives for composing new ones. In this tutorial we discuss using RLlib to tackle both classic benchmark and applied problems, RLlib's primitives for scalable RL, and how RL workflows can be integrated with data processing and hyperparameter optimization.

Bio: Richard Liaw is a PhD student in BAIR/RISELab at UC Berkeley working with Joseph Gonzalez, Ion Stoica, and Ken Goldberg. He has worked on a variety of different areas, ranging from robotics to reinforcement learning to distributed systems. He is currently actively working on Ray, a distributed execution engine for AI applications; RLlib, a scalable reinforcement learning library; and Tune, a distributed framework for model training.