Data Lake Analytics: Analyzing Millions of NYC Taxi Trips

Abstract: All data today tends to be big data. Companies of all sizes are searching for platforms and tools to manage their rapidly growing data sets. However, despite the number of platforms available, bringing heterogeneous data into a homogeneous data lake environment is one of the most daunting aspects of any big data implementation.

Join Arjuna Chala, Sr. Director of HPCC Systems, as he walks through a real-world case study illustrating how to manage disparate data with ease and efficiency. Arjuna will show how to profile, transform, aggregate, and analyze New York City taxi data to draw conclusions such as traffic volume by day, hour, and week for a given area; the number of trips to and from an airport; cash versus credit transactions; and much more. The analysis uses the completely free, open source HPCC Systems platform, built by one of the largest data aggregating companies in the world. LexisNexis Risk Solutions built HPCC Systems on the premise that big data is a solution, not a problem. Harvesting large volumes of data from thousands of sources supports a learning-based approach to handling data.
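To make the kinds of aggregation named above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the speaker's method: the talk itself uses the HPCC Systems platform and its ECL language. The column names (`pickup_datetime`, `payment_type`, `dropoff_zone`), the sample rows, and the `summarize_trips` helper are all hypothetical and do not reflect the real NYC taxi schema.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; the column names are assumptions, not the real schema.
SAMPLE_CSV = """pickup_datetime,payment_type,dropoff_zone
2015-01-05 08:15:00,credit,Midtown
2015-01-05 08:40:00,cash,JFK Airport
2015-01-05 17:05:00,credit,JFK Airport
2015-01-06 08:20:00,cash,Midtown
"""


def summarize_trips(csv_text):
    """Count trips by weekday, pickup hour, and payment type, plus airport drop-offs."""
    by_weekday = Counter()   # key: 0 (Monday) .. 6 (Sunday)
    by_hour = Counter()      # key: hour of day, 0 .. 23
    by_payment = Counter()   # key: normalized payment type
    airport_dropoffs = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        pickup = datetime.strptime(row["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        by_weekday[pickup.weekday()] += 1
        by_hour[pickup.hour] += 1
        # Normalize case and whitespace so "Cash " and "cash" count together.
        by_payment[row["payment_type"].strip().lower()] += 1
        if "airport" in row["dropoff_zone"].lower():
            airport_dropoffs += 1
    return by_weekday, by_hour, by_payment, airport_dropoffs


by_weekday, by_hour, by_payment, airport_dropoffs = summarize_trips(SAMPLE_CSV)
print("Trips by weekday:", dict(by_weekday))
print("Trips by hour:", dict(by_hour))
print("Trips by payment type:", dict(by_payment))
print("Airport drop-offs:", airport_dropoffs)
```

A single pass over the rows keeps the sketch simple. At the scale of millions of trips, the same grouping would normally be distributed across a cluster, which is the kind of workload the platform described here targets.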


In this talk, you will learn how to:

• Improve the speed and accuracy of the transformation, cleaning, normalization, and aggregation processes
• Enable efficient use of developer resources and development budgets
• Facilitate the use of standard hardware, operating systems, and protocols

Bio: Arjuna Chala is Sr. Director of Special Projects for the HPCC Systems® platform at LexisNexis Risk Solutions®. With almost 20 years of experience in software design, Arjuna leads the development of next-generation big data capabilities, including tools for exploratory data analysis, data streaming, and business intelligence. Arjuna strives to understand new technologies and bring innovative applications and design to the HPCC Systems platform.

Dedicated to development excellence, Arjuna served as a key member of the team that brought the HPCC Systems platform to the open source community. In his work with HPCC Systems community leaders and system integrator partners, Arjuna’s efforts have contributed to the spread of HPCC Systems technology into the enterprise, both domestically and in the international markets of China, Brazil, Europe, and India.

Arjuna has a BS in Computer Science from RVCE, Bangalore University.

Open Data Science Conference