The Wisdom of the Cloud


What can we learn about data science by watching data science competitions?

During a data science competition like the ones hosted by DrivenData and Kaggle, the leaderboard lists the teams that have submitted models and the scores the top models have achieved.

As the competition proceeds, the scores often improve quickly as teams explore a variety of models and then more slowly as they approach the limits of what’s possible.

Using 170,000 scores from more than 50 competitions hosted by DrivenData, we explore the aggregated behavior of the competing teams.

What patterns can we see?

Based on early returns, can we predict the limits?
What factors influence the time and the number of submissions it takes to reach the performance plateau?
Do models tend to overfit the data as the contest progresses?
And what guidance can we provide for deciding when to stop searching?

In this talk, we will answer these questions and share further observations from the other side of the leaderboard.


Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof. Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley and his M.S. and B.S. degrees from MIT.

Open Data Science
One Broadway
Cambridge, MA 02142
