General Training Session: Apache Spark for Data Science Part II

Abstract: Using Apache Spark with Python for Data Science and Machine Learning at Scale, Part 2. Assuming attendees have taken Part 1 (or have equivalent experience), this hands-on session covers best practices for integrating Apache Spark with all of your favorite Python data science tools, including deep-learning frameworks. We will also learn about using Spark with XGBoost, natural language processing integrations (e.g., spaCy), parallel model tuning with scikit-learn and Spark, and production patterns for inference (i.e., making predictions with trained models).

Bio: Adam Breindel consults and teaches widely on Apache Spark, big data engineering, and machine learning. He supports instructional initiatives and teaches as a senior instructor at Databricks, teaches classes on Apache Spark and on deep learning for O'Reilly, and runs a business helping large firms and startups implement data and ML architectures. Adam's 20 years of engineering experience include streaming analytics, machine learning systems, and cluster management schedulers for some of the world's largest banks, along with web, mobile, and embedded device apps for startups. His first full-time job in tech was on a neural-net-based fraud detection system for debit transactions, back in the bad old days when some neural nets were patented (!), and he's much happier living in today's age of amazing open-source data and ML tools.