General Training Session: Apache Spark for Data Science Part I

Abstract: Using Apache Spark with Python for Data Science and Machine Learning at Scale, Part 1. Assuming no prior experience with Apache Spark, this hands-on session covers when to use Spark, the parallel programming model behind Spark DataFrames, and how to use SparkML (or "mllib") to perform feature engineering, model fitting, prediction, evaluation, and hyperparameter tuning on big data sets.

Bio: Adam Breindel consults and teaches widely on Apache Spark, big data engineering, and machine learning. He supports instructional initiatives and teaches as a senior instructor at Databricks, teaches classes on Apache Spark and on deep learning for O'Reilly, and runs a business helping large firms and startups implement data and ML architectures. Adam's 20 years of engineering experience include streaming analytics, machine learning systems, and cluster management schedulers for some of the world's largest banks, along with web, mobile, and embedded device apps for startups. His first full-time job in tech was on a neural-net-based fraud detection system for debit transactions, back in the bad old days when some neural nets were patented (!), and he's much happier living in the age of amazing open-source data and ML tools today.