ML Engineering for Production ML Deployments


Delivering the results of advanced machine learning (ML) technology to customers requires a rigorous approach and production-ready systems. This is especially true for maintaining and improving model performance over the lifetime of a production application. Unfortunately, the issues involved and the approaches available are often poorly understood.

An ML application in production must address all of the issues of modern software development methodology, as well as issues unique to ML and data science. ML applications are often developed using tools and systems that suffer from inherent limitations in testability, scalability across clusters, training/serving skew, and the modularity and reusability of components. In addition, ML application measurement often emphasizes top-level metrics, which can hide problems with model fairness and with predictive performance on individual user segments.

We discuss the use of ML pipeline architectures for implementing production ML applications. In particular, we review Google's experience with TensorFlow Extended (TFX), as well as the advantages of containerizing pipeline architectures using platforms such as Kubeflow. Google uses TFX for large-scale ML applications and offers an open-source version to the community. TFX scales to very large training sets and very high request volumes, and supports sound software engineering practices, including testability, hot versioning, and deep performance analysis.


Irene is a Technical Program Manager on the Research & Machine Intelligence team at Google, where she works to bring machine learning into production pipelines through TensorFlow Extended (TFX). Previously, Irene helped Google's Counter-Abuse Technology team train machine learning models to fight spam and abuse. Before joining Google, she spent over a decade at Microsoft in several engineering and program management roles across the Enterprise and Operating Systems divisions.