Training Gradient Boosting models to solve Classification Problems with CatBoost


Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.

CatBoost is an open-source gradient boosting library that outperforms existing publicly available implementations of gradient boosting in terms of quality. It also has a set of additional advantages:

1. CatBoost can incorporate categorical features in your data (such as music genre, device ID, or URL) into predictive models with no additional preprocessing. For more details on our approach, please refer to our NIPS 2017 ML Systems Workshop paper.
2. CatBoost requires almost no hyperparameter tuning to produce a good-quality model.
3. CatBoost has the fastest GPU and multi-GPU training implementations of all the openly available gradient boosting libraries.

This tutorial will explain the main features of the library using a classification problem as the running example.
We will walk you through all the steps of building a good predictive model. We will cover the following topics:

- Choosing suitable loss functions and metrics to optimize
- Training a classification model
- Visualizing the process of training and cross-validation
- CatBoost's built-in overfitting detector and ways of reducing overfitting in gradient boosting models
- Selection of an optimal decision boundary
- Feature selection and explaining model predictions
- Testing a trained CatBoost model on unseen data
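
As a rough illustration of the "optimal decision boundary" topic, the sketch below scans candidate probability thresholds on a validation set and keeps the one with the highest F1 score. It is plain Python and does not use the CatBoost API. The function names, the choice of F1 as the criterion, and the validation numbers are all assumptions made for this example. In practice the probabilities would come from a trained classifier, for instance via `predict_proba`.

```python
from typing import Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 score when class 1 is predicted for probabilities >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Try every distinct predicted probability as a threshold.

    Returns (threshold, f1) for the highest F1; ties go to the lowest threshold.
    """
    best = (0.5, -1.0)  # placeholder, replaced by the first candidate
    for t in sorted(set(probs)):
        score = f1_at_threshold(probs, labels, t)
        if score > best[1]:
            best = (t, score)
    return best


# Hypothetical validation-set probabilities and true labels (made-up numbers).
val_probs = [0.05, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
val_labels = [0, 0, 1, 0, 1, 1, 1]

threshold, f1 = best_threshold(val_probs, val_labels)
print(f"best threshold: {threshold}, F1: {f1:.3f}")
```

The threshold should be chosen on held-out validation data rather than on the final test set, so that the test score remains an honest estimate. A different criterion (for example, a cost-weighted error) can be swapped in for F1 without changing the scan.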


Anna Veronika Dorogush graduated from the Faculty of Computational Mathematics and Cybernetics of Lomonosov Moscow State University and from the Yandex School of Data Analysis. She previously worked at ABBYY, Microsoft, Bing and Google, and has been working at Yandex since 2015, where she is currently head of the Machine Learning Systems group and leads the development of the CatBoost library.
