
Abstract: Airflow is a leading open-source workflow orchestrator that supports a wide range of use cases. It can be integrated with Kubernetes through the KubernetesPodOperator, which runs each task in its own pod, to create highly customizable pipelines. This is the approach we use to preprocess data and train ML models for PowerOP, Dataswati's SaaS for optimizing food industry production lines.
Session Outline
Part 1: Installation
In this part, you will install MicroK8s on your computer to simulate a Kubernetes cluster. We will then use a Helm chart to deploy Airflow on MicroK8s and retrieve the code repository.
Part 2: Machine Learning
You will walk through some simple data processing and machine learning code and see how it can be deployed in an Airflow DAG using the KubernetesPodOperator.
Part 3: Play with Airflow
You will learn how to trigger a DAG and pass it parameters, how to schedule it to run regularly, and how to navigate the Airflow UI.
Background Knowledge
Python, basic machine learning, and software installation from the command line.
Bio: Luis spent his education and early career in the field of transport (Civil Engineering Diploma, MSc Transport from Imperial College London, French Civil Aviation Authority), then moved into data science and machine learning in 2014, with Kaggle and Coursera as his teachers. He spent about 3 years as a Data Scientist at Quantmetry, a Data Science and AI consultancy based in Paris, where he helped several large companies across multiple sectors develop solutions involving natural language processing, geospatial data, and time series. He recently joined Dataswati to lead the Data Science team. Dataswati develops a SaaS called PowerOP that offers agro-industrial companies easy integration and visualization of their production data, as well as explainable and actionable AI services such as quality analysis, smart alerting, and recipe or settings recommendation.