Machine Learning in R Part I

Abstract: Modern statistics has become almost synonymous with machine learning, a collection of techniques that take advantage of today's computing power. This two-part course focuses on the available methods for implementing machine learning algorithms in R and examines some of the theory behind them. We start with the foundation of it all, the linear model, and its generalization, the GLM. We look at how to assess model quality with traditional measures and cross-validation, and how to visualize models with coefficient plots. Next we turn to penalized regression with the Elastic Net, and then to Boosted Decision Trees with `xgboost`. Attendees should have a good understanding of linear models and classification and should have R and RStudio installed, along with the `glmnet`, `xgboost`, `boot`, `ggplot2`, `UsingR` and `coefplot` packages.

Linear Models
Learn about the best fit line
Understand the formula interface in R
Understand the design matrix
Fit Models with `lm`
Visualize the coefficients with `coefplot`
Make predictions on new data
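
As a rough preview of the topics above, here is a minimal sketch of the formula interface, the design matrix, and prediction. It assumes the built-in `mtcars` data set and a hypothetical `new_cars` data frame; neither is taken from the course materials.

```r
# Fit a linear model with the formula interface (built-in mtcars data, an assumption)
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# The design matrix lm() builds from the formula: an intercept column,
# the numeric predictors, and dummy columns for the levels of cyl
X <- model.matrix(mpg ~ wt + hp + factor(cyl), data = mtcars)
head(X)

# Least-squares coefficient estimates
coef(fit)

# Predictions on new data; column names must match the variables in the formula
# (new_cars is a made-up example)
new_cars <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150), cyl = c(4, 6))
preds <- predict(fit, newdata = new_cars)
preds

# If the coefplot package is installed, the coefficients can be plotted with:
# coefplot::coefplot(fit)
```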

Generalized Linear Models
Learn about Logistic Regression for classification
Learn about Poisson Regression for count data
Fit models with `glm`
Visualize the coefficients with `coefplot`
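
A minimal sketch of both model types with `glm`, assuming the built-in `mtcars` and `warpbreaks` data sets as stand-ins for whatever data the course uses:

```r
# Logistic regression: binary outcome am (0 = automatic, 1 = manual)
logit_fit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))

# Poisson regression: breaks is a count of warp breaks per loom
pois_fit <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

# type = "response" returns predictions on the outcome scale:
# probabilities for the logistic model, expected counts for the Poisson model
probs <- predict(logit_fit, type = "response")
counts <- predict(pois_fit, type = "response")

summary(logit_fit)
summary(pois_fit)
```

The only change from `lm` is the `family` argument, which selects the error distribution and link function.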

Model Assessment
Compare models
`AIC`
`BIC`
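
A minimal sketch of comparing two nested models with these measures, assuming the built-in `mtcars` data. The model names `small` and `large` are illustrative only.

```r
# Two candidate models for the same response
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Lower AIC or BIC indicates a better trade-off between fit and complexity;
# BIC penalizes extra parameters more heavily as the sample size grows
aic_tbl <- AIC(small, large)
bic_tbl <- BIC(small, large)
aic_tbl
bic_tbl

# AIC by hand: -2 * log-likelihood + 2 * (number of estimated parameters)
ll <- logLik(small)
aic_manual <- -2 * as.numeric(ll) + 2 * attr(ll, "df")

# For nested linear models, an F-test is another traditional comparison
anova(small, large)
```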

Cross-validation
Learn the reasoning and process behind cross-validation
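
The process can be sketched by hand and then with `cv.glm` from the `boot` package. This is an illustration under assumed choices (built-in `mtcars` data, 5 folds, mean squared error as the cost), not the course's own code.

```r
library(boot)

set.seed(42)
k <- 5

# Manual k-fold: randomly assign each row to one of k folds
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# For each fold, fit on the other folds and score on the held-out rows
fold_mse <- sapply(seq_len(k), function(i) {
  train <- mtcars[folds != i, ]
  test <- mtcars[folds == i, ]
  fit <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)
})
cv_mse <- mean(fold_mse)
cv_mse

# The same idea with boot::cv.glm, which expects a glm object
# (a gaussian glm fits the same model as lm)
gfit <- glm(mpg ~ wt + hp, data = mtcars)
cv <- cv.glm(data = mtcars, glmfit = gfit, K = k)
cv$delta  # raw and bias-adjusted estimates of prediction error
```

The two estimates will differ slightly because the fold assignments are drawn separately.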

Elastic Net
Learn about penalized regression with the Lasso and Ridge
Fit models with `glmnet`
Understand the coefficient path
View coefficients with `coefplot`
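
A minimal sketch of an Elastic Net fit, assuming the `glmnet` package is installed and using the built-in `mtcars` data. The mixing value `alpha = 0.5` is an arbitrary choice for illustration.

```r
library(glmnet)

# glmnet takes a numeric matrix rather than a formula;
# drop the intercept column because glmnet adds its own
x <- model.matrix(mpg ~ ., data = mtcars)[, -1]
y <- mtcars$mpg

# alpha = 1 is the Lasso, alpha = 0 is Ridge, values in between mix the two
enet <- glmnet(x, y, alpha = 0.5)

# Coefficient path: how each coefficient shrinks as the penalty lambda grows
plot(enet, xvar = "lambda")

# Choose lambda by cross-validation
set.seed(42)
cv_enet <- cv.glmnet(x, y, alpha = 0.5, nfolds = 5)

# Coefficients at the largest lambda within one standard error of the minimum
enet_coefs <- coef(cv_enet, s = "lambda.1se")
enet_coefs
```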

Boosted Decision Trees
Learn how to perform classification (and regression) using recursive partitioning
Fit models with `xgboost`
Make compelling visualizations with `DiagrammeR`
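
A minimal sketch of a boosted tree classifier, assuming the `xgboost` package is installed and using the built-in `mtcars` data. The predictors and parameter values are arbitrary illustrations, and the package's interface differs somewhat between versions, so check the documentation for the installed release.

```r
library(xgboost)

# xgboost works on numeric matrices; predict transmission type (am) as a 0/1 label
x <- as.matrix(mtcars[, c("wt", "hp", "qsec", "disp")])
y <- mtcars$am
dtrain <- xgb.DMatrix(data = x, label = y)

set.seed(42)
bst <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 2),
  data = dtrain,
  nrounds = 20,  # number of boosting rounds (trees added sequentially)
  verbose = 0
)

# Predicted probabilities for the training rows
tree_probs <- predict(bst, dtrain)
head(tree_probs)

# If DiagrammeR is installed, the fitted trees can be drawn with:
# xgb.plot.tree(model = bst)
```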

Bio: Jared Lander is the Chief Data Scientist of Lander Analytics, a data science consultancy based in New York City, the Organizer of the New York Open Statistical Programming Meetup and the New York R Conference, and an Adjunct Professor of Statistics at Columbia University. With a master's from Columbia University in statistics and a bachelor's from Muhlenberg College in mathematics, he has experience in both academic research and industry. His work for both large and small organizations ranges from music and fundraising to finance and humanitarian relief efforts.
He specializes in data management, multilevel models, machine learning, generalized linear models and statistical computing. He is the author of R for Everyone: Advanced Analytics and Graphics, a book about R programming geared toward data scientists and non-statisticians alike, and is creating a course on glmnet with DataCamp.