Deploying Data Science Applications

Abstract: Bridging the gap between research prototypes and production software remains a major challenge for most maturing data science teams. In this workshop we will discuss and demonstrate how various data science workflows and tasks are deployed into production.

We will center the discussion on three use cases and workflows: machine learning models and pipelines, ETL/batch processing jobs, and real-time analytics. We will start by demonstrating how to quickly deploy a simple machine learning model using Flask and Docker, and then extend this into a more complete end-to-end machine learning pipeline with Amazon SageMaker. Next, we will use Apache Airflow and Kubernetes to deploy and monitor an automated batch processing pipeline. Finally, we will demonstrate how to capture, transform, and analyze streaming data in real time using Amazon Kinesis and AWS Lambda.

For each of these workflows, we will introduce a specific use case and then explore the technologies used to deploy it at production scale. We will discuss the reasons for these choices, leading to some higher-level takeaways on matching system requirements to product requirements.

Bio: Garrett Hoffman is the Director of Data Science at StockTwits. In this role, Garrett captures, creates, and shapes data from countless endpoints to build systems both visible and invisible on the StockTwits platform. At times he’s implementing new machine learning algorithms to improve the member experience. At other times, he uses StockTwits data to shape future products. Outside of StockTwits, Garrett plays and watches basketball, studies machine learning in graduate school, and keeps up with the latest advancements in artificial intelligence.