Abstract: Apache Airflow is a tool created by the community to programmatically author, schedule and monitor workflows. Its biggest advantage is that it does not limit the scope of pipelines. Airflow can be used for building Machine Learning models, transferring data or managing infrastructure.
Because Airflow workflows are written in pure Python, there is no restriction on what a single task can do. It can execute a Python callable, run a Bash script or create a new Kubernetes pod. As a result, a single Airflow workflow can include tasks for both the setup and teardown of external infrastructure, as well as the operations that are performed using the new resources.
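As a rough illustration of the setup, work and teardown pattern described above, here is a minimal plain-Python sketch. It deliberately does not use Airflow's API: the task names (`create_cluster`, `run_job`, `delete_cluster`) and the `run_workflow` helper are hypothetical, and the standard library's `graphlib` module only stands in for the dependency ordering that a real scheduler would handle.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks: each one is just a Python callable that records its name.
def create_cluster(log):
    log.append("create_cluster")  # setup of external infrastructure


def run_job(log):
    log.append("run_job")  # work performed on the new resources


def delete_cluster(log):
    log.append("delete_cluster")  # teardown of the infrastructure


TASKS = {
    "create_cluster": create_cluster,
    "run_job": run_job,
    "delete_cluster": delete_cluster,
}

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "run_job": {"create_cluster"},
    "delete_cluster": {"run_job"},
}


def run_workflow(tasks, dependencies):
    """Run every task once, in an order that respects the dependencies."""
    log = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name](log)
    return log


if __name__ == "__main__":
    print(run_workflow(TASKS, DEPENDENCIES))
```

In Airflow itself the same shape would be expressed with operators inside a DAG definition, and the scheduler and executor would decide when and where each task runs.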
During this talk, Tomek Urbaszek and Jarek Potiuk, both official Apache Airflow committers, will provide an extensive introduction to Airflow. The first part of the lecture will cover the most important components of Airflow. Attendees will become familiar with the definition of Directed Acyclic Graphs, task operators, integration hooks, task executors and the scheduler. This will help the audience better understand the underlying concepts of Apache Airflow.
The second part of the talk will cover practical aspects of using Airflow. It will discuss best practices for developing workflows and implementing custom operators. This knowledge should help attendees avoid potential pitfalls. Finally, the speakers will provide a guide to when Airflow is the right choice and when other solutions should be considered.
The talk is for everyone who wants to learn more about Apache Airflow. To fully appreciate the talk, attendees should be familiar with the basic concepts of work scheduling and distributed systems.
Bio: As the CTO, Jarek grew a software house 10-fold: from 6 to 60 people. After a few years of being the CTO, he decided to go back to a full-time engineering role, and he now works as a Principal Software Engineer at his company (and is super happy about it). Jarek has worked as an engineer in many industries: Telecoms, Mobile app development, Google, Robotics and Artificial Intelligence, Cloud and Open Source data processing. Jarek is currently a PMC member of the Apache Airflow project.