Creating an Extensible Big Data Platform to Serve Data Scientists and Analysts – 100s of PetaBytes with Realtime Access

Abstract: This talk reflects on the design and architecture of an effective modern Big Data platform that can ingest, store, and serve 100+ PB of data with minute-level latency. We’ll walk you through the typical workflow of a data scientist or data analyst at Uber: exploring data, discovering desired datasets, accessing the data, running interactive queries, visualizing the output, and preparing derived datasets for advanced analytics and machine learning use cases. The audience will leave the talk with greater insight into how things work in an extensible modern Big Data platform and will be inspired to re-envision their own data platform to make it more generic and flexible for their data scientists and analysts.

The motivation for this talk is Uber’s business need for real-time Big Data. Uber’s mission is to ignite opportunities by setting the world in motion. To fulfill this mission, Uber relies heavily on data-driven decisions in every product area, and we need to store and process an ever-increasing amount of data. To this end, we redesigned traditional Big Data platform solutions to provide faster, more reliable, and more performant access by adding a few critical technologies that overcome their limitations. In this talk, we will provide a behind-the-scenes look at the current Big Data technology landscape, including various existing open-source technologies as well as the technologies we had to build at Uber and then open-source to fill the gaps and push the boundaries.

Bio: Reza Shiftehfar currently leads Uber’s Hadoop Platform teams. His teams help build and grow Uber’s reliable and scalable Big Data platform, which serves petabytes of data using technologies such as Apache Hadoop, Apache Hive, Apache Kafka, Apache Spark, and Presto. Reza is one of the founding engineers of Uber’s Data team and helped scale Uber’s data platform from a few terabytes to over 100 petabytes while reducing Big Data latency from 24+ hours to minutes. Reza holds a Ph.D. in Computer Science from the University of Illinois, Urbana-Champaign.