Help! My Classes are Imbalanced!


Class imbalance is extremely common in real-world datasets, yet many data scientists apply machine learning algorithms without accounting for it. Class imbalance can violate key assumptions of many algorithms, which leads to flawed models. This talk will:
- Explain why such imbalances occur, drawing on research into the theory of class imbalance, class overlap, and noise.
- Describe evaluation metrics that can expose the effects of class imbalance (including ROC curves, precision, and recall).
- Discuss some solutions including pre-processing techniques (over- and under-sampling, weighting), modifications to existing algorithms, and post-processing techniques (threshold selection, cost-based classification).
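
As a small illustration of the metrics point above, the sketch below shows how accuracy can look healthy on an imbalanced dataset while precision and recall reveal the problem. The toy labels (95 negatives, 5 positives), the two hypothetical classifiers, and all function names are assumptions made up for this example; they are not taken from the talk.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # By convention here, report 0.0 when the denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Classifier A always predicts the majority class.
always_negative = [0] * 100

# Classifier B flags 10 negatives by mistake but finds 4 of the 5 positives.
some_positives = [0] * 85 + [1] * 10 + [1, 1, 1, 1, 0]

for name, y_pred in [("always negative", always_negative),
                     ("finds positives", some_positives)]:
    p, r = precision_recall(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"precision={p:.2f} recall={r:.2f}")
```

In this toy setup the majority-class predictor scores higher on accuracy than the classifier that actually finds most of the positives, which is the kind of distortion that precision and recall are meant to surface.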

You’ll leave this presentation with the ability to recognize class imbalance and the confidence to overcome it.


Samuel Taylor is a Data Scientist who has applied machine learning to problems such as ad targeting, music recommendation, and audience size prediction. With a background in data and software engineering, he has also provided technical leadership on data warehousing, ETL, and business automation efforts. Outside of work, he helps high school students learn to code and tries to teach computers sign language.
