Abstract: Deep learning has produced remarkable results in fields such as computer vision and natural language processing, but barriers to entry keep many data scientists from using it as an everyday tool. Existing deep learning frameworks require significant programming, and scaling up via distributed computing requires even more work. In this talk, we discuss our new open source library, which is meant to address some of these challenges.
Deep Learning Pipelines is an open source library that integrates popular deep learning libraries with Apache Spark. The library aims to help engineers and data scientists familiar with Spark train deep learning models and deploy them in their existing workflows. This package simplifies deep learning in three major ways:
(1) It offers simple APIs built on top of Spark MLlib’s Pipeline APIs, so it integrates well with enterprise machine learning workflows.
(2) It uses Spark under the hood to automatically scale out common deep learning tasks such as inference, featurization and tuning.
(3) It enables data scientists to publish deep learning models as Spark SQL User Defined Functions, which can be used by collaborators with no knowledge of the underlying models.
To illustrate these benefits, we will run a live demo showing how Deep Learning Pipelines can be applied to an example use case: image classification.
Bio: Joseph Bradley is an Apache Spark PMC member and Machine Learning Software Engineer at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon in 2013.