Introduction to Apache Airflow

Abstract: 

Apache Airflow is a community-created tool to programmatically author, schedule, and monitor workflows. Its biggest advantage is that it does not limit the scope of pipelines: Airflow can be used for building machine learning models, transferring data, or managing infrastructure.

Because Airflow workflows are written in pure Python, there is no restriction on what a single task can do. It can execute a Python callable, run a Bash script, or create a new Kubernetes pod. Thanks to that, a single Airflow workflow can include tasks for both setup and teardown of external infrastructure, as well as operations performed using the new resources.
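To make this concrete, here is a minimal sketch of what such a pure-Python workflow definition looks like. It assumes Apache Airflow 2.x is installed; the DAG and task names are illustrative, not taken from the talk.

```python
# Minimal Airflow DAG sketch (assumes Apache Airflow 2.x is available).
# The dag_id, task_ids, and commands below are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def greet():
    # An arbitrary Python callable executed as a task.
    print("Hello from Airflow")


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # One task runs a Bash command...
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    # ...another runs a Python callable.
    transform = PythonOperator(task_id="transform", python_callable=greet)

    # The >> operator declares the dependency: extract runs before transform.
    extract >> transform
```

Mixing operator types in one DAG like this is what lets a single workflow provision infrastructure, use it, and tear it down again.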

During this talk Tomek Urbaszek and Jarek Potiuk, both official Apache Airflow committers, will provide an extensive introduction to Airflow. The first part of the lecture will cover the most important components of Airflow. Attendees will become familiar with the definition of Directed Acyclic Graphs (DAGs), task operators, integration hooks, task executors, and the scheduler. This will help the audience better understand the underlying concepts of Apache Airflow.

The second part of the talk will cover practical aspects of using Airflow. It will discuss best practices for developing workflows and implementing custom operators. This knowledge should help attendees avoid potential pitfalls. Finally, the speakers will provide a guide to when Airflow is the right choice and when other solutions should be considered.

The talk is for everyone who wants to learn more about Apache Airflow. To fully appreciate it, attendees should be familiar with the basic concepts of work scheduling and distributed systems.

Bio: 

Tomek is a software engineer at Polidea, an Apache Airflow committer, and a book lover. He fancies functional programming because he is a mathematician by training. Every day he tries to make the world a better place.