Advanced Machine Learning with scikit-learn, Part I

Abstract: Scikit-learn is a Python machine learning library that has become a valuable tool for many data science practitioners. This training will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, advanced model evaluation, feature engineering, and working with imbalanced datasets. We will also work with text data, using the bag-of-words method for classification.

This workshop assumes familiarity with Jupyter notebooks and the basics of pandas, matplotlib, and NumPy. It also assumes some familiarity with the scikit-learn API and with how to perform cross-validation and grid search in scikit-learn.

Bio: Thomas Fan is a Software Developer at Columbia University's Data Science Institute. He collaborates with the scikit-learn community to develop features, review code, and resolve issues. In his free time, Thomas contributes to skorch, a scikit-learn-compatible neural network library that wraps PyTorch.