#NOBLACKBOXES: HOW TO SOLVE REAL DATA SCIENCE PROBLEMS WITH AUTOMATION, WITHOUT LOSING TRANSPARENCY

Abstract: Data scientists, and the software they use, are moving further away from raw coding in Python and R every day. The allure and the time savings of having a software package perform everything from laborious ETL to sophisticated ensemble modeling are too tempting for enterprise customers to ignore. Yet we know that it is both risky and unethical to trust our data to a "black box" system that does not allow a data scientist to examine her model critically before operationalization. RapidMiner's new "Auto Model" lets us take advantage of automated feature selection and modeling while remaining able to "peer under the hood" at any time. This workshop will work through several real-world data science problems using Auto Model as the primary tool for machine learning model development and deployment, and show how data scientists can examine each model created by the software piece by piece for transparency and rigor.

Bio: Scott Genzer is a data scientist at RapidMiner, Inc. and currently manages its online community of more than 300,000 users. He has a B.S. in electrical engineering from Columbia University's School of Engineering and Applied Science, and an M.A. in mathematics education from Columbia University Teachers College. Scott was a K-12 mathematics teacher and school administrator for 20 years before transitioning to the world of data science as a freelance consultant in 2013. His unorthodox skill set in this field also includes knowing how to code in only one obscure language (Scheme) and making pure maple syrup from his backyard trees in Norwich, Vermont.

Open Data Science Conference