
Abstract: The Introduction to Machine Learning Workshop will build upon the attendee’s foundation of math and coding knowledge to develop a basic understanding of the most popular machine learning algorithms used in industry today. We will answer such questions as: What are the different types of ML algorithms? What is overfitting and how can we avoid it? Why does XGBoost so often outperform other algorithms?
The first half of the session will cover both the theory and the implementation of supervised models, including Linear Regression, Logistic Regression, Decision Trees, Random Forest and XGBoost. We will then shift our focus to unsupervised models, including Clustering and Topic Modeling algorithms. The session will be interactive: attendees will develop their own models within the scope of the workshop.
Session Outline:
Linear Regression:
Within this module, we will implement Linear Regression models using two very different approaches. First, we’ll look at Linear Regression through the statistical lens, discussing its assumptions and how to implement and interpret a model via Statsmodels in Python. We will then develop a Linear Regression model with a typical Machine Learning approach, using Sklearn in Python. Finally, we’ll compare these two approaches and the philosophies behind them.
Bias-variance Trade-off:
Variance is the tendency of a model to overfit (due to excess complexity), while bias is the tendency of a model to underfit by oversimplifying the target-predictor relationship. We will learn how to balance these two sources of error by finding the optimal ‘degree’ of model complexity.
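One way to make the trade-off concrete is to fit polynomials of increasing degree and compare training error with cross-validated error. The following is a minimal sketch under assumed conditions: the sine-shaped data, the noise level and the range of degrees are all chosen for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=40))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=40)
X = x.reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
degrees = range(1, 13)
train_mse, cv_mse = {}, {}

for d in degrees:
    model = make_pipeline(
        PolynomialFeatures(degree=d, include_bias=False),
        LinearRegression(),
    )
    # Training error: measured on the same points the model was fit to
    model.fit(X, y)
    train_mse[d] = float(np.mean((model.predict(X) - y) ** 2))
    # Cross-validated error: measured on held-out folds
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    cv_mse[d] = float(-scores.mean())

best_degree = min(cv_mse, key=cv_mse.get)
for d in degrees:
    print(f"degree={d:2d}  train MSE={train_mse[d]:.4f}  CV MSE={cv_mse[d]:.4f}")
print("degree with lowest CV MSE:", best_degree)
```

A straight line (degree 1) is too simple for this curve, which is the high-bias case. Training error keeps shrinking as the degree grows, while the cross-validated error is the quantity to watch when choosing the degree.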
Logistic Regression:
In this module, we will learn about one of the most popular classification algorithms. We will cover the math and theory behind Logistic Regression, as well as why this algorithm is so popular. Lastly, we will learn how to implement this model using Sklearn in Python.
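As a small sketch of how the math connects to the implementation, the example below fits a binary Logistic Regression with Sklearn and then reproduces its predicted probabilities by hand, applying the sigmoid function to the linear score. The two-cluster synthetic data and the helper name `sigmoid` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): two Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-1.5, scale=1.0, size=(100, 2)),  # class 0
    rng.normal(loc=1.5, scale=1.0, size=(100, 2)),   # class 1
])
y = np.array([0] * 100 + [1] * 100)

# Default settings; note that scikit-learn applies L2 regularization by default
clf = LogisticRegression().fit(X, y)


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# P(y = 1 | x) = sigmoid(w . x + b), computed from the fitted parameters
manual_proba = sigmoid(X @ clf.coef_.ravel() + clf.intercept_[0])
sklearn_proba = clf.predict_proba(X)[:, 1]
accuracy = clf.score(X, y)

print("max difference:", np.max(np.abs(manual_proba - sklearn_proba)))
print("training accuracy:", accuracy)
```

The model is linear in its inputs; the sigmoid only converts the linear score into a probability, which is part of why the coefficients remain easy to interpret.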
And more ...
Background Knowledge:
Basic working knowledge of Python
Bio: Julia Lintern currently works as an instructor for the Metis Data Science Flex Program. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualizations. During certain seasons of her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.