Abstract: In many real-world applications, data quality, data curation, and domain knowledge play a much larger role in building successful models than devising complex processing techniques or tweaking hyperparameters.
Therefore, a machine learning toolbox should enable users to understand both data and model, and not burden the practitioner with picking preprocessing steps and hyperparameters.
The dabl library is a first step in this direction. It provides automatic visualization routines and model inspection capabilities while automating away model selection.
This talk will introduce the dabl library and show how to use it to quickly create supervised models and identify modelling and data quality issues.
Bio: Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working for a year as a machine learning scientist at the Amazon Development Center Germany in Berlin, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.
Andreas Mueller, PhD
Author, Research Scientist, Core Developer of scikit-learn | Columbia Data Science Institute