Abstract: While stakeholders’ needs from a data science team vary considerably, those needs are often met using the same statistical model. I propose a framework in which a data scientist can evaluate a statistical model using one of three approaches: (1) models for feature/predictor evaluation, which allow stakeholders to take specific actions that maximize impact on the model’s outcome variable; (2) models for actionable output, where the outcome variable’s value on a prediction set is of primary interest; and (3) machine learning models for real-time prediction, where the algorithm and its value are of interest for their active and continued impact on a user.
In this 90-minute tutorial we will evaluate a set of best practices and methods for each of the three approaches to statistical model evaluation. Leveraging common Python libraries such as scikit-learn and statsmodels, we will examine the available methods and model output that generate the desired value for stakeholders, depending on their specific question. We will also discuss common stakeholder questions that fall into each of the three categories above, how to identify which category a question belongs to, and common follow-up questions to anticipate when presenting your work.
By the end of this tutorial you will be able to:
identify the most effective libraries in Python (with additional recommendations in R) for each category of statistical model
effectively interpret and leverage available model output
classify stakeholder needs into one of the three categories above
prepare stakeholders’ requested output with the necessary documentation for audiences with a variety of technical proficiencies
Bio: Mona is a Senior Data Scientist at Greenhouse Software in New York City, where they contribute to data-informed decision making across the company and to machine learning solutions that improve the hiring process for Greenhouse customers. They’ve previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and continue to be involved in civic projects through a number of volunteer and non-profit organizations. They’ve also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology, and are passionate about contributing to the ethical use of data science methodology in the public and private sectors.