
Abstract: This session is a hands-on introduction to machine learning in Python with scikit-learn. You will learn to build and evaluate predictive models on tabular data using the main tools of the Python data-science stack (Jupyter, NumPy, pandas, matplotlib and scikit-learn).
Session Outline
- Load a tabular dataset and quickly explore its contents with Jupyter, pandas and matplotlib.
- Train supervised machine learning models (logistic regression and gradient-boosted trees) and evaluate their predictive performance with cross-validation.
- Learn how to preprocess the data to correctly deal with numerical and categorical variables depending on the predictive model you use.
- Learn to automatically tune the hyper-parameters of the predictive pipeline using a grid search or a random search.
Background Knowledge
Basic Python programming and some familiarity with basic statistical concepts (random variables, expected values, distributions).
Bio: Olivier Grisel is a machine learning engineer at Inria. He is a member of the team of maintainers of scikit-learn, an open source machine learning library written in Python. His work is supported by the Fondation Inria and its partners: https://scikit-learn.fondation-inria.fr/home/