Unifying Development and Production Environments for Machine Learning Projects


The success of a machine learning project depends on many components: data, algorithms, and the hardware backend. Deploying a machine learning system to production is further complicated by the glaring gap between the development and production environments.

To reduce the overhead cost of communication and collaboration, many companies expect their data scientists and ML engineers to own their projects end-to-end, from data management to modeling to deployment, forcing them to learn tools out of their comfort zones.

In the first part of the workshop, we will cover the challenges at each phase of productionizing machine learning models, as well as the gap between the development and production environments, and discuss various solutions to address that gap.

In the second half, we will walk through a hands-on tutorial on using Metaflow to push development code from a local machine to production on AWS Batch with a single line of code.

Session Outline
Lesson 1: Common myths of machine learning in production
We'll address common misunderstandings about machine learning in production held by companies in the early phases of ML adoption.

Lesson 2: The gap between development and production environments
We'll discuss how the development and production phases differ in their requirements and in the tools used for each.

Lesson 3: Solutions to the gap
We'll discuss existing solutions to close the gap, both technical (leveraging container and compiler technology) and organizational (team structure).

Lesson 4: Metaflow tutorials
We'll show how infrastructure abstraction tools like Metaflow can help data scientists own their ML products end-to-end without having to worry about lower-level infrastructure details such as setting up Docker containers and Kubernetes.

We will start with a simple machine learning model in a notebook on a local machine, then we will show how to package and run it on AWS Batch by adding one line of code.

Background Knowledge
The first part of the workshop has no prerequisites. For the second part, you should be comfortable with Python.


Chip Huyen is an engineer and founder working to develop tools that leverage real-time machine learning. Through her work with Snorkel AI, NVIDIA, and Netflix, she has helped some of the world’s largest organizations deploy machine learning systems. She teaches Machine Learning Systems Design at Stanford. She’s also published four bestselling Vietnamese books.

Open Data Science
One Broadway
Cambridge, MA 02142
