Abstract: In this session, we will cover the fundamental principles and practical applications of feature engineering as they pertain to time series forecasting. During the workshop, we will explore topics such as formulating a feature hypothesis, implementing SQL and Python code to construct features, and integrating those features into a pre-existing machine-learning pipeline. We will provide hands-on experience with the tools and technologies needed to design, organize, and automate feature learning. Overall, the session aims to equip attendees with the skills and knowledge to apply feature engineering techniques effectively in their data science projects.
During this hands-on session, we will:
-Conduct exploratory data analysis on a time-series sales forecasting data set, examining the relationships between the target and source columns to gain insights into the data.
-Take the initial insights gained in step one and train a baseline model using the available raw data.
-Introduce the concept of data denormalization, which enables feature engineering, and work in Python to design features using multiple data tables.
-Execute the machine-learning pipeline with the constructed feature set to make predictions on new data and to evaluate whether the manually constructed features improve on the baseline model.
-Experiment with automation to identify time savings and to reduce the bias that can occur when features are developed purely from personal experience and intuition.
By the end of this session, attendees will have learned how to effectively apply feature engineering techniques and how to use automation to accelerate their feature engineering process and development work.
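As a rough illustration of the denormalization and feature-design steps listed above, the following is a minimal sketch in Python using pandas. The table names, column names, and values (sales, stores, units, region) are hypothetical and are not taken from the workshop materials; the lag and rolling-mean features are only one common choice for sales forecasting, not necessarily the ones used in the session.

```python
import pandas as pd

# Hypothetical toy tables; names and values are illustrative assumptions only.
sales = pd.DataFrame({
    "store_id": [1, 1, 1, 2, 2, 2],
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"] * 2),
    "units": [10, 12, 9, 20, 18, 25],
})
stores = pd.DataFrame({"store_id": [1, 2], "region": ["west", "east"]})

# Denormalize: attach each store's attributes to its sales rows.
# validate= raises an error if store_id is not unique in the stores table.
flat = sales.merge(stores, on="store_id", how="left", validate="many_to_one")

# Sort so that shift/rolling operate in time order within each store.
flat = flat.sort_values(["store_id", "date"])
grouped = flat.groupby("store_id")["units"]

# Lag feature: the previous day's units for the same store.
flat["units_lag_1"] = grouped.shift(1)

# Rolling mean over the two previous days; shifting first keeps the
# current day's target out of its own feature (avoids leakage).
flat["units_roll_mean_2"] = grouped.transform(
    lambda s: s.shift(1).rolling(2).mean()
)

print(flat)
```

The first rows of each store have missing feature values because no earlier history exists for them; how to handle those rows (drop or impute) depends on the model and is left open here.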
Bio: Dr. Joshua Gordon is a data scientist specializing in statistical modeling, machine learning, and deep learning technologies. While earning his Ph.D. in Statistics from the University of California, Los Angeles, he published six articles focused on geospatial time series algorithms for earthquake prediction as well as residual analysis for spatial-temporal models.
Joshua has over 10 years of experience working as a Data Scientist in the insurance and manufacturing industries prior to joining dotData as a Senior Data Scientist. At dotData, he leads customer-facing data science projects and is responsible for customers’ success with dotData technologies.