Building Better Machine Learning Models with R

Abstract: In machine learning, it is often difficult to make the leap from the classroom to the real world. Practical applications present challenges that require more advanced approaches to preparing, exploring, modeling, and evaluating data. Furthermore, in a business setting, the stakes are much higher: an unsuccessful model may allow a competitor to drink your milkshake.

The goal of this workshop is to introduce beginning data scientists to practical machine learning techniques that are rarely found in textbooks and are instead acquired through hands-on experience. To approximate a cutthroat business environment, we will apply these techniques to a real-world case study and simulate a machine learning competition like those found on Kaggle (https://www.kaggle.com/). Example code will be provided in R, but the methods discussed are applicable to any data science toolkit.

In exploring the case study, attendees will learn how to properly evaluate a model, especially in an environment in which models are repeatedly tested to identify the best performer. Participants will consider the impact of noise and outliers, learn tips and tricks for feature engineering, see how to handle missing data, consider approaches for feature selection and dimensionality reduction, and learn more about the problem of imbalanced data. We will also learn from some of Kaggle’s top competitors and discover the ensemble methods and automated pipelines that allow them to produce the highest-performing models.

Bio: Coming soon