
Abstract: In this tutorial, you'll learn how to scale your data science work to larger datasets and larger models while staying in the comfort of the PyData ecosystem (numpy, pandas, scikit-learn, Jupyter notebooks).
Session Outline:
* How to reason about when you need to scale your data and machine learning work and when not to;
* How to leverage distributed computation on your local workstation (such as your laptop) to analyze larger datasets and build larger, more complex models;
* How to harness the power of clusters to support larger-than-memory computation, all from the comfort of your own laptop;
* How to do all of this while writing code similar to the numpy, pandas, and/or scikit-learn code you already write.
Session Prerequisites:
https://github.com/coiled/data-science-at-scale
Bio: Hugo Bowne-Anderson is Head of Data Science Evangelism and VP of Marketing at Coiled, a company that makes it simple for organizations to scale their data science and machine learning in Python. He has extensive experience as a data scientist, educator, evangelist, content marketer, and data strategy consultant at DataCamp, the online education platform for all things data. He has also taught basic to advanced data science topics at institutions such as Yale University and Cold Spring Harbor Laboratory, at conferences such as SciPy, PyCon, and ODSC, and with organizations such as Data Carpentry. He has developed over 30 courses on the DataCamp platform, impacting over 500,000 learners worldwide through his own courses. He also created the weekly data industry podcast DataFramed, which he hosted and produced for 2 years. He is committed to spreading data skills, access to data science tooling, and open source software, both for individuals and the enterprise.

Hugo Bowne-Anderson, PhD
Title: Head of Data Science Evangelism | Coiled
