
Abstract: Operational machine learning requires the orchestration of particularly complex data pipelines, because they incorporate every other type of pipeline: data engineering, model training, performance monitoring, and more. As a result, a general framework for MLOps cannot be built from the perspective of any single point in the graph of data technologies, nor of any single persona. Everything has to come together.
Apache Airflow is the de facto standard in open source orchestration, and it is the glue that can bring it all together. Originally developed at Airbnb for ETL/ELT workloads, Airflow has since been adopted by data teams everywhere as the centerpiece of their MLOps architectures. By letting users write pipelines as code, and by offering a rich set of scheduling APIs and a vast ecosystem of integrations, Airflow ties together all of the technologies needed to productionize machine learning pipelines. This presentation provides several examples of how data science teams use Airflow to run models in production, for use cases ranging from fraud detection to classification. We'll end with an example of our own ML pipeline, which incorporates data quality checks as well as a framework for model monitoring and serving.
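As a taste of what pipelines-as-code looks like, here is a minimal sketch (not taken from the talk itself) of an Airflow DAG wiring together the stages described above, assuming the Airflow 2.x TaskFlow API; the task bodies and storage paths are hypothetical placeholders.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def ml_pipeline():
    # Each @task is a unit of work Airflow schedules and retries independently.

    @task
    def check_data_quality() -> str:
        # Placeholder: validate the day's training data (row counts, nulls, drift).
        return "s3://example-bucket/data/validated"  # hypothetical path

    @task
    def train_model(data_path: str) -> str:
        # Placeholder: fit a model on the validated data and persist it.
        return "s3://example-bucket/models/latest"  # hypothetical path

    @task
    def deploy_and_monitor(model_path: str) -> None:
        # Placeholder: push the model to a serving endpoint and register it
        # with a monitoring job that tracks prediction quality over time.
        print(f"deployed {model_path}")

    # Return values flow between tasks via XCom, defining the dependency graph:
    # quality check -> training -> deployment/monitoring.
    deploy_and_monitor(train_model(check_data_quality()))


ml_pipeline()
```

The point of the sketch is the shape, not the bodies: because the whole pipeline is ordinary Python, the same DAG can call out to any data quality, training, or serving tool in Airflow's integration ecosystem.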
Bio: Coming soon!