
Abstract: Scikit-learn is a machine learning library in Python that is used by many data science practitioners. In this training, we will learn about model evaluation, model calibration, and model inspection. We will start by learning how to evaluate a machine learning model after it is trained. We will compare various metrics such as ROC AUC and mean average precision and see how they behave on datasets with different characteristics. We will use scikit-learn's plotting API to visualize the performance of a model and to compare multiple models. Next, we will learn how to calibrate a machine learning model with scikit-learn. A well-calibrated model predicts probabilities that reflect the true likelihood of an event. We will compare models before and after they are calibrated and learn how to visualize an estimator's calibration. Then, we will learn about techniques for inspecting machine learning models after they are trained. Specifically, we will see how to inspect open-box machine learning models, such as linear models or simple decision trees. For linear models, the coefficients can be used to see conditional dependencies between a feature and the target. For simple tree-based models, the decision tree can be visualized and we can follow the binary decisions made by the tree. Finally, we will learn about inspection techniques used for more opaque models such as random forests or gradient boosted trees. These inspection techniques include permutation feature importance and partial dependence curves. These techniques are flexible because they can be applied to any machine learning model, and they give a glimpse into how the model generates its predictions.
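As a rough illustration of the three topics in the abstract, the sketch below scores a classifier with ROC AUC and average precision, calibrates it, and computes permutation feature importance. It is not taken from the training material: the synthetic dataset, the random forest, the isotonic calibration method, and all variable names are assumptions chosen only to keep the example short.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import average_precision_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary dataset (an assumption for illustration only).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# 1. Evaluation: compare two ranking metrics on held-out data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
proba = forest.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, proba)
avg_precision = average_precision_score(y_test, proba)

# 2. Calibration: refit the same model inside a cross-validated calibrator
#    and compare Brier scores (lower means better-calibrated probabilities).
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    method="isotonic", cv=3,
)
calibrated.fit(X_train, y_train)
calibrated_proba = calibrated.predict_proba(X_test)[:, 1]
brier_before = brier_score_loss(y_test, proba)
brier_after = brier_score_loss(y_test, calibrated_proba)

# 3. Inspection: permutation importance measures how much the score drops
#    when one feature column is shuffled; it works for any fitted estimator.
result = permutation_importance(
    forest, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]  # most important first

print(f"ROC AUC: {roc_auc:.3f}, average precision: {avg_precision:.3f}")
print(f"Brier score before: {brier_before:.4f}, after: {brier_after:.4f}")
print("Features ranked by permutation importance:", ranking.tolist())
```

On imbalanced data such as this, ROC AUC and average precision can differ noticeably, which is one reason to look at both. Whether calibration improves the Brier score depends on the model and the data, so the sketch prints both values rather than assuming an improvement.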
Bio: Thomas J. Fan is a Staff Associate at the Data Science Institute at Columbia University. He is one of the core developers of scikit-learn, an open-source machine learning library written in Python. Thomas holds a Master's in Mathematics from NYU and a Master's in Physics from Stony Brook University. He also maintains skorch, a scikit-learn compatible neural network library that wraps PyTorch. He believes that developing open-source software is one of the best ways to maximize one's impact.

Thomas Fan
Title
Staff Associate - Machine Learning | Columbia University in the City of New York
