Sell Cron, Buy Airflow: Modern Data Pipelines In Finance

Abstract: 

Quantopian's data pipelines ingest financial data from vendors and repackage it into high-performance formats, producing a unified view of market history. In 2018 we entered into a partnership with FactSet and began dramatically expanding the data available on our platform. We selected Apache Airflow as our workflow engine to cope with the additional complexity and to maintain high availability for our production data systems.

Our community of algorithm authors becomes more productive with every dataset we add to the platform, so we knew that we’d need to deploy new algorithms faster than ever before. In the latter half of 2018, we started building a new production system, the Quantopian Alpha Model (QAM), to allow our investment team to incorporate ideas from many more authors in our community.

We decided to build QAM by applying proven DevOps methodologies, such as containerization and continuous deployment, to data science fundamentals. QAM’s nightly pipeline uses scientific Python for data processing, Kubernetes for execution, and Apache Airflow for orchestration. QAM is also entirely code-defined and shipped via pull request, so developers can deploy code mid-day with no involvement from operations.
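To make the orchestration idea concrete, here is a minimal sketch of what a workflow engine adds over cron: tasks declare their dependencies, run in dependency order, and a failure stops everything downstream. It uses only Python's standard-library graphlib, not Airflow's API, and every task name is a hypothetical stand-in rather than QAM's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly tasks, each mapped to the set of tasks it depends on.
# These names are illustrative assumptions, not QAM's real task list.
NIGHTLY_TASKS = {
    "ingest_vendor_data": set(),
    "build_market_history": {"ingest_vendor_data"},
    "run_author_algorithms": {"build_market_history"},
    "combine_signals": {"run_author_algorithms"},
    "publish_results": {"combine_signals"},
}


def execution_order(tasks):
    """Return an order in which every task runs after its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


def run_pipeline(tasks, runner):
    """Run tasks in dependency order and stop at the first failure.

    `runner` is any callable that takes a task name and returns True on
    success. The list of tasks that completed is returned.
    """
    completed = []
    for name in execution_order(tasks):
        if not runner(name):
            break  # downstream tasks never see partial upstream output
        completed.append(name)
    return completed
```

With cron, each of these steps would be scheduled at a fixed time and would start whether or not the previous step had finished. Declaring the dependencies in code is also what lets a pipeline like this be reviewed and shipped via pull request.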

This development methodology has been a resounding success: our team of four went from creating a repo to placing live trades in three and a half months. Now that we’ve shipped QAM, we’d like to help you ship your data science projects faster, too.

Bio: 

James Meickle is a site reliability engineer at Quantopian, a Boston startup making algorithmic trading accessible to everyone. His current areas of interest include data pipelines, containerization platforms, and continuous delivery. In past roles, he’s been responsible for processing MRI scans at the Center for Brain Science at Harvard University, sales engineering and developer evangelism at AppNeta, and release engineering on a presidential campaign.

Open Data Science