
Abstract: An ML journey typically starts with trying to understand the world and looking for data that describes it. This leads to an experimentation phase, where we try to use that data to model the parts of the world that we're interested in, often because they directly affect our users or our business. Once we have one or more models that deliver good results, it's time to move those models into production.
Deploying advanced Machine Learning technology to serve customers and/or business needs requires a rigorous approach and production-ready systems. This is especially true for maintaining and improving model performance over the lifetime of a production application. Unfortunately, the issues involved and approaches available are often poorly understood.
An ML application in production must address all of the issues of modern software development methodology, as well as issues unique to ML and data science. Often ML applications are developed using tools and systems that suffer from inherent limitations in testability, scalability across clusters, training/serving skew, and the modularity and reusability of components. In addition, ML application measurement often emphasizes top-level metrics, which can hide problems with model fairness and with predictive performance across user segments.
We discuss the use of ML pipeline architectures for implementing production ML applications, and in particular we review Google's experience with TensorFlow Extended (TFX), as well as the advantages of containerizing pipeline architectures using platforms such as Kubeflow. Google uses TFX for large-scale ML applications, and offers an open-source version to the community. TFX scales to very large training sets and very high request volumes, and enables strong software methodology including testability, hot versioning, and deep performance analysis.
Background Knowledge
Basic familiarity with ML modeling and feature engineering
Bio: A data scientist and TensorFlow addict, Robert has a passion for helping developers quickly learn what they need to be productive. He's used TensorFlow since the very early days and is excited about how it's evolving quickly to become even better than it already is. Before moving to data science, Robert led software engineering teams for both large and small companies, always focusing on clean, elegant solutions to well-defined needs.

Robert Crowe
Title
TensorFlow Developer Engineer | Google
