Abstract: Over the past decade, the computational demands of machine learning (ML) workloads have grown much faster than the capabilities of a single processor, including hardware accelerators such as GPUs and TPUs. As a result, researchers and practitioners have been left with no choice but to distribute these workloads. Unfortunately, developing distributed applications is very challenging. In this talk, I will present two projects we developed at UC Berkeley, Ray (https://github.com/ray-project/ray) and Alpa (https://github.com/alpa-projects/alpa), that dramatically simplify scaling ML workloads.
Bio: Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He does research on cloud computing and networked computer systems. Past work includes Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards, including the SIGOPS Hall of Fame Award (2015), the SIGCOMM Test of Time Award (2011), and the ACM Doctoral Dissertation Award (2001). In 2013, he co-founded Databricks, a startup to commercialize technologies for Big Data processing, and in 2006 he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.