Machine learning and statistics: Don’t mind the gap

Abstract: What are the differences between machine learning and statistics? Answers to this question vary as widely as the array of tools employed by the two disciplines. Although the two share more similarities than differences, their cultures, roots, and language are quite different. More recently, however, we can see a healthy cross-pollination in which each field is starting to adopt ideas from the other. In this talk, we will look at the ideas the two disciplines have developed and identify those that have already crossed the chasm and those that are still grounded in one of the two fields. Some examples of such concepts are informative priors, neural networks, uncertainty, regularization, and hierarchical models. We will see how these ideas can be combined into a rich toolbox for solving wide-ranging data science problems. For this recombination to be possible, we need a highly versatile and powerful framework. As I will show, probabilistic programming using PyMC3 allows us to perform both machine learning and statistics, and to blend freely between them, taking the best ideas for the problem at hand. Specific examples include hierarchical Bayesian neural networks with informed priors to achieve higher accuracy, and uncertainty around predictions to make better decisions.

Bio: Thomas Wiecki is the VP of Data Science at Quantopian, where he uses probabilistic programming and machine learning to help build the world’s first crowdsourced hedge fund. Among other open source projects, he is involved in the development of PyMC3, a probabilistic programming framework written in Python. A recognized international speaker, Thomas has given talks at various conferences and meetups across the US, Europe, and Asia. He holds a PhD from Brown University.

Open Data Science Conference