Machine Learning in R Part I: Penalized Regression and Boosted Trees

Abstract: 

In this workshop we go over two fundamental machine learning methods: penalized regression and boosted trees. Learn the theory behind the models and get practical hands-on experience using {glmnet} and {xgboost} in R. You'll fit the models, assess model fit, tune hyperparameters and make predictions.

Session Outline
- See the curse of dimensionality
- Understand the importance of variable selection
- Learn about penalized regression
  - L1 (Lasso)
  - L2 (Ridge)
  - L1 + L2 (Elastic Net)
- Perform feature engineering with {recipes}
- Fit a lasso regression using {glmnet}
- Choose the best amount of penalty
  - Learn the different ways to define "best"
- Visualize the coefficients/weights with {coefplot}
- Fit a ridge regression using {glmnet}
- Fit an Elastic Net regression using {glmnet}
- Make predictions using these models
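
The penalized-regression steps above can be sketched with {glmnet}. This is a minimal illustration on the built-in mtcars data, not the workshop's own materials; the dataset, seed, and fold count are placeholder choices:

```r
# Illustrative sketch: lasso via {glmnet} on mtcars (placeholder data)
library(glmnet)

# glmnet() expects a numeric predictor matrix and a response vector
x <- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y <- mtcars$mpg

# cv.glmnet() cross-validates over a path of lambda (penalty) values;
# alpha = 1 is the lasso (L1), alpha = 0 is ridge (L2), values in
# between give the elastic net
set.seed(42)
cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# Two common definitions of "best" penalty:
#   lambda.min - lambda with the lowest cross-validated error
#   lambda.1se - largest lambda within one standard error of that minimum
cv_fit$lambda.min
cv_fit$lambda.1se

# Coefficients at lambda.1se; many are exactly zero, which is the
# lasso's built-in variable selection
coef(cv_fit, s = "lambda.1se")

# Predictions at the chosen penalty
predict(cv_fit, newx = x[1:3, ], s = "lambda.1se")
```

Refitting with `alpha = 0` or an intermediate `alpha` gives the ridge and elastic net variants from the outline.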

- Learn about decision trees
- Perform feature engineering with {parsnip}
- Fit a decision tree using {xgboost}
- Visualize the tree
- Learn about boosting
- Fit a boosted tree using {xgboost}
- Learn about variable importance
- Learn about the different {xgboost} tuning parameters
- Fit a boosted pseudo random forest using {xgboost}
- Make predictions using these models
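
The boosted-tree steps can likewise be sketched with {xgboost}. Again this is an illustrative sketch on mtcars with placeholder parameter values, not the workshop's settings:

```r
# Illustrative sketch: boosted regression trees via {xgboost}
library(xgboost)

x <- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y <- mtcars$mpg

# xgboost works on its own matrix format
dtrain <- xgb.DMatrix(data = x, label = y)

# Key tuning parameters: nrounds (boosting iterations), max_depth
# (tree depth), eta (learning rate), plus subsample and
# colsample_bytree, which randomize rows and columns per tree --
# cranking those up with eta = 1 is how you get a pseudo random forest
boost <- xgb.train(
  params = list(objective = "reg:squarederror",
                max_depth = 3,
                eta = 0.3),
  data = dtrain,
  nrounds = 50
)

# Variable importance: which features the trees split on most
xgb.importance(model = boost)

# Predictions
predict(boost, newdata = x[1:3, ])
```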

Background Knowledge
Basic knowledge of R

Bio: 

Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a masters in statistics from Columbia University and a bachelors in mathematics from Muhlenberg College, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models, and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.