GPU-accelerated Data Science with RAPIDS


The PyData ecosystem has grown to millions of data science users, who appreciate its ease of use, consistent syntax, and breadth of features. Traditionally, PyData frameworks were only executable on CPUs, making it difficult for users to take advantage of the increasingly powerful GPUs that have already revolutionized deep learning and related fields. In this talk, we'll introduce RAPIDS, an open-source framework that brings transparent GPU backends to popular Python APIs, such as those from pandas, scikit-learn, and NetworkX. We'll show how you can port a huge range of existing workloads to GPU in a matter of minutes and achieve speedups on the order of 40x or more for common workloads.
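To make the "transparent backend" idea concrete, here is a minimal sketch of a plain pandas workload. The code below is ordinary CPU pandas; the comment describes how RAPIDS' cudf.pandas accelerator mode can run the same unmodified code on a GPU (the entry points mentioned assume a RAPIDS installation).

```python
# A typical pandas ETL snippet. With RAPIDS installed, the same script can
# be run on the GPU with zero code changes, e.g. via
#   python -m cudf.pandas script.py
# or `%load_ext cudf.pandas` in a Jupyter notebook.
import pandas as pd

df = pd.DataFrame({
    "ticker": ["A", "B", "A", "B", "A"],
    "price": [10.0, 20.0, 11.0, 19.0, 12.0],
})

# Group-by aggregation: a common operation that benefits heavily from GPU.
summary = df.groupby("ticker")["price"].agg(["mean", "max"])
print(summary)
```

The point of the accelerator mode is that the DataFrame code itself does not change between the CPU and GPU runs; only the launch command does.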

The talk will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We'll then look in more detail at real-world applications that take advantage of these speedups, including hyperparameter optimization for machine learning models, single-cell genomics analysis, and applications in finance. For large-data users, we'll discuss options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem.


John Zedlewski is the director of GPU-accelerated machine learning on the NVIDIA RAPIDS team. Previously, he worked on deep learning for self-driving cars at NVIDIA, deep learning for radiology at Enlitic, and machine learning for structured healthcare data at Castlight. He has an MA/ABD in economics from Harvard with a focus on computational econometrics and an AB in computer science from Princeton.