
Abstract: It is often desirable that a model output *well-calibrated* probabilities. This tutorial will discuss why calibration is important, how to assess the calibration of a model, and how to calibrate a model by post-processing its scores. We will review calibration techniques such as Isotonic Regression, Platt Scaling, Beta Calibration, and Spline Calibration, and apply them to real-world examples.
This workshop will primarily feature Jupyter notebooks, with slides used to illustrate some concepts in more detail. The notebooks will walk through real examples in which we build a model, assess how well-calibrated it is, apply various calibration methods, and evaluate the results.
Section 1: Why calibration?
- What it means for model outputs to be *well-calibrated* (illustrated in the short sketch after this list).
- Why and when it is important (or not).
- Specific scenarios where calibration may be valuable.
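
To make the definition concrete, here is a minimal sketch with simulated data (not taken from the workshop materials): if a model is well-calibrated, then among the cases it scores near 0.7, roughly 70% should turn out positive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a well-calibrated model: draw predicted probabilities p,
# then draw each outcome as Bernoulli(p), so the probabilities are honest.
p = rng.uniform(0, 1, size=100_000)
y = rng.binomial(1, p)

# Among cases scored near 0.7, the observed positive rate should be ~0.7.
near_07 = (p > 0.65) & (p < 0.75)
print(f"observed positive rate for scores near 0.7: {y[near_07].mean():.3f}")
```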
Section 2: Assessing the model
- How to determine if the model is well-calibrated.
- Reliability diagrams and how to use them (see the code sketch below).
- Issues with calibration for probabilities close to 0 and 1.
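
As a rough preview of what the notebooks cover, the sketch below draws a reliability diagram with scikit-learn's `calibration_curve`; the dataset and model here are toy stand-ins, not the real examples used in the workshop.

```python
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy setup: fit a model and get scores on a held-out set.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
scores = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Bin the scores and compare mean predicted probability to observed positive rate.
frac_pos, mean_pred = calibration_curve(y_te, scores, n_bins=10, strategy="quantile")

plt.plot(mean_pred, frac_pos, "o-", label="model")
plt.plot([0, 1], [0, 1], "--", label="perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Observed fraction of positives")
plt.legend()
plt.show()
```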
Section 3: Calibrating the model
- Illustration of the various techniques:
  - Isotonic Regression
  - Beta Calibration
  - Platt Scaling
  - Spline Calibration
- Demonstrating their use and results on real data (a brief sketch follows this list).
- Tradeoffs between the approaches.
- Calibrating multi-class models.
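
For a flavor of the post-processing step, here is a minimal sketch of two of the methods on simulated scores: Isotonic Regression, and a simplified Platt Scaling fit directly to the scores (classically, Platt Scaling fits a sigmoid to the raw classifier output). The other methods follow a similar fit-then-apply pattern; all variable names here are illustrative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

# Stand-ins for real data: y_cal / scores_cal come from a calibration set,
# scores_new are the scores we want to map to calibrated probabilities.
rng = np.random.default_rng(0)
y_cal = rng.binomial(1, 0.3, size=5_000)
scores_cal = 0.4 * y_cal + rng.uniform(0, 0.6, size=5_000)  # deliberately miscalibrated
scores_new = rng.uniform(0, 1, size=1_000)

# Isotonic Regression: a monotone, piecewise-constant map from score to probability.
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(scores_cal, y_cal)
probs_iso = iso.predict(scores_new)

# Simplified Platt Scaling: a one-feature logistic regression on the scores.
platt = LogisticRegression()
platt.fit(scores_cal.reshape(-1, 1), y_cal)
probs_platt = platt.predict_proba(scores_new.reshape(-1, 1))[:, 1]

print(probs_iso[:5], probs_platt[:5])
```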
Section 4: Assessing the calibration
- Did the calibration improve model performance (see the metric comparison sketched below)?
- Are there flaws in the calibration?
- How to adjust the calibration and improve it further.
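
One way to answer the first question, sketched with made-up numbers (in practice this would use the held-out test set): compare proper scoring rules such as log loss and the Brier score before and after calibration, and redraw the reliability diagram.

```python
from sklearn.metrics import brier_score_loss, log_loss

# Illustrative stand-ins: held-out labels and the model's probabilities
# before and after calibration.
y_test = [0, 0, 1, 0, 1, 1]
probs_raw = [0.40, 0.45, 0.55, 0.50, 0.60, 0.65]
probs_calibrated = [0.10, 0.20, 0.70, 0.35, 0.80, 0.90]

# Proper scoring rules (lower is better) reward honest probabilities,
# so a genuine improvement from calibration shows up directly in these numbers.
for name, probs in [("raw", probs_raw), ("calibrated", probs_calibrated)]:
    print(f"{name:>10}: log loss = {log_loss(y_test, probs):.3f}, "
          f"Brier score = {brier_score_loss(y_test, probs):.3f}")
```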
Bio: Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of StructureBoost, ML-Insights, and the SplineCalib calibration tool. In previous roles he has served as Senior VP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.