General Training Session: Solving Real-World Data Problems with Spark

Abstract: In recent years, data teams across a wide range of industries have amassed large amounts of data generated by our interactions in an increasingly digital world. While these exponentially growing datasets contain valuable insights that can improve our businesses and quality of life, they are also increasingly difficult to process efficiently. Fortunately, the open-source project Apache Spark has emerged as the leading solution: a general-purpose distributed framework for solving the latest data problems faced by industry. In this training session, participants will learn hands-on how to use Spark to solve real-world data problems.

This session will introduce the concepts needed to program efficiently with Spark. We will focus on the higher-level DataFrame and Spark SQL APIs, while covering just enough of the low-level internals to monitor, debug, and optimize Spark jobs effectively. Participants will gain intermediate proficiency with Spark by working through examples and exercises inspired by the problems that industry-leading data teams face. We will also compare Spark with similar data technologies to help participants judge when (and when not) to use it.

Beyond Spark's general-purpose features, we will cover several real-world examples that use Spark's machine learning and real-time processing libraries. We will introduce the Spark Streaming and MLlib libraries with a "learning by doing" approach: participants will complete hands-on examples that mirror the challenges routinely tackled by data teams at top companies in the industry. By the end of this training, participants will be comfortable using Spark to solve their own data science and big data problems.

Bio: Judit Lantos leads Insight Data Science's Data Engineering team in Silicon Valley. Her main focus is the Data Engineering Fellows Program, which helps software engineers and academic programmers transition into data engineering careers. She earned her Master's in Engineering Physics from Budapest University and worked in medical imaging R&D at Stanford University before joining Insight.