
Abstract: The lakehouse architecture is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.
In this session, participants will:
1. Learn the core components of the Lakehouse architecture
2. Understand how Delta Lake and Spark support the Lakehouse architecture
3. Perform end-to-end batch and streaming data ingestion into Delta Lake
Session Outline
Lesson 1: The core components of the Lakehouse architecture
The instructor will give participants an overview of current data ingestion and processing architectures, explain where the Lakehouse fits in, and cover the technologies used to support it.
Lesson 2: Delta Lake deep-dive
We take a deep dive into Delta Lake and learn what happens 'under the hood'.
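As a taste of what "under the hood" means, here is a minimal sketch (assuming a local Delta-enabled PySpark session with the delta-spark package installed; the table path is illustrative) showing that a Delta table is simply Parquet data files plus a _delta_log directory of JSON commit files, which is the transaction log that gives Delta its ACID guarantees, history, and time travel:

import os
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

# Illustrative local setup for a Delta-enabled Spark session.
builder = (
    SparkSession.builder.appName("delta-under-the-hood")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Writing a Delta table produces Parquet data files plus a _delta_log/
# directory of JSON commit files (the transaction log).
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/demo_table")

print(os.listdir("/tmp/demo_table"))             # part-*.parquet files + _delta_log
print(os.listdir("/tmp/demo_table/_delta_log"))  # 00000000000000000000.json, ...

# The transaction log also powers table history and time travel.
spark.sql("DESCRIBE HISTORY delta.`/tmp/demo_table`").show(truncate=False)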
Lesson 3: End-to-end batch and streaming use case
We get our hands dirty and build an end-to-end data pipeline for batch and streaming use cases.
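For a flavour of the hands-on work, the following is a minimal sketch of batch and streaming ingestion into a Delta table with PySpark; the paths, schema, and checkpoint location are illustrative assumptions, not the workshop's actual pipeline:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # assumes a Delta-enabled session as above

# Batch ingestion: read a raw drop of JSON files and append it to a Delta table.
batch_df = spark.read.json("/data/raw/events/2023-01-01/")
(batch_df
    .withColumn("ingested_at", F.current_timestamp())
    .write.format("delta")
    .mode("append")
    .save("/data/bronze/events"))

# Streaming ingestion: Structured Streaming picks up new files as they arrive
# and appends them to the same Delta table, tracking progress in a checkpoint.
stream_df = (spark.readStream
    .schema(batch_df.schema)   # streaming file sources need an explicit schema
    .json("/data/raw/events/"))

(stream_df
    .withColumn("ingested_at", F.current_timestamp())
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/data/checkpoints/events")
    .outputMode("append")
    .start("/data/bronze/events"))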
Background Knowledge
This workshop assumes participants have basic to intermediate Python skills.
Bio: Jonathan is on a mission to help businesses generate value through data. As a Senior Data Engineer at Cuusoo (Mantel Group), he empowers organisations with the technology, skills and processes to unleash insights from big data.
Outside of his nine-to-five, Jonathan trains and upskills the next generation of data professionals as the Instructor of the Data and Analytics Bootcamp at The University of Western Australia.

Jonathan Neo
Title: Senior Data Engineer | Data Bootcamp Instructor | Cuusoo (Mantel Group) | University of Western Australia
