Building a Lakehouse with Delta Lake and PySpark

Abstract: 

The lakehouse architecture is a new, open data management architecture that combines the flexibility, cost-efficiency, and scale of data lakes with the data management and ACID transactions of data warehouses, enabling business intelligence (BI) and machine learning (ML) on all data.

In this session, participants will learn:

1. The core components of the Lakehouse architecture
2. How Delta Lake and Spark support the Lakehouse architecture
3. How to perform end-to-end batch and streaming data ingestion into Delta Lake

Session Outline
Lesson 1: The core components of the Lakehouse architecture
The instructor will give an overview of current data ingestion and processing architectures, explain where the Lakehouse fits in, and introduce the technologies that support it.

Lesson 2: Delta Lake deep-dive
We take a deep dive into Delta Lake and learn what happens 'under the hood'.
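Much of what happens "under the hood" centres on Delta Lake's transaction log. Each commit is a numbered JSON file (00000000000000000000.json, 00000000000000000001.json, and so on) in the table's _delta_log directory. A reader reconstructs the current table by replaying the log's add and remove actions in order. The sketch below is a toy Python model of that idea, not Delta Lake's implementation. It makes several simplifying assumptions: action fields are trimmed to a file path, checkpoints and schema metadata are ignored, and exclusive file creation (open mode "x") stands in for the mutual exclusion Delta requires from the underlying storage when writing a commit file. The helper names commit and snapshot are hypothetical.

```python
import json
import os
import tempfile

# Toy model of a Delta-style transaction log (NOT Delta Lake's real code).
# Each commit is one JSON-lines file named by its zero-padded version number.


def commit(log_dir, version, actions):
    """Write commit `version`; raise FileExistsError if it already exists."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" creates the file exclusively: if another writer already
    # committed this version, the call fails instead of overwriting it.
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def snapshot(log_dir, as_of=None):
    """Replay commits in version order and return the set of live data files."""
    live = set()
    # Zero-padded names sort lexicographically in version order.
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        if as_of is not None and int(name[:-5]) > as_of:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)
commit(log_dir, 0, [{"add": {"path": "part-000.parquet"}}])
commit(log_dir, 1, [{"add": {"path": "part-001.parquet"}}])
# An overwrite swaps old files for a new one in a single atomic commit.
commit(log_dir, 2, [
    {"remove": {"path": "part-000.parquet"}},
    {"remove": {"path": "part-001.parquet"}},
    {"add": {"path": "part-002.parquet"}},
])
print(sorted(snapshot(log_dir)))           # ['part-002.parquet']
print(sorted(snapshot(log_dir, as_of=1)))  # ['part-000.parquet', 'part-001.parquet']

# A concurrent writer that also tries to create version 2 loses the race
# and must re-read the log before retrying as version 3.
try:
    commit(log_dir, 2, [{"add": {"path": "part-003.parquet"}}])
except FileExistsError:
    print("conflict: version 2 already exists")
```

Replaying the log only up to an earlier version is the same idea behind Delta Lake's time travel, for example reading a table with the versionAsOf option.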

Lesson 3: End-to-end batch and streaming use case
We get our hands dirty and build an end-to-end data pipeline for batch and streaming use cases.
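A key concern in streaming ingestion is exactly-once delivery when a query restarts and replays a micro-batch. Delta Lake's streaming sink addresses this by recording a txn action, which holds an application id plus a version (the micro-batch id), in the same commit as the new data files. A replayed batch can then be detected and skipped. The toy model below illustrates that idea in plain Python. It is not Spark or Delta code, and it assumes commit files hold only path-level add actions and txn actions, with no checkpoints or concurrent writers. The functions write_batch, last_batch, read_actions, and the query id "orders-stream" are hypothetical. Real Spark Structured Streaming also tracks source offsets in its own checkpoint directory.

```python
import json
import os
import tempfile

# Toy model of idempotent micro-batch appends (NOT Delta Lake's real code).


def read_actions(log_dir):
    """Yield every action from every commit file, in version order."""
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                yield json.loads(line)


def last_batch(log_dir, app_id):
    """Highest micro-batch id already committed by this query (-1 if none)."""
    latest = -1
    for action in read_actions(log_dir):
        txn = action.get("txn")
        if txn and txn["appId"] == app_id:
            latest = max(latest, txn["version"])
    return latest


def write_batch(log_dir, app_id, batch_id, files):
    """Append one micro-batch; return False if it was already committed."""
    if batch_id <= last_batch(log_dir, app_id):
        return False  # replayed batch: its data is already in the table
    actions = [{"add": {"path": p}} for p in files]
    # The batch id is recorded in the SAME commit as the data files, so
    # either both become visible or neither does.
    actions.append({"txn": {"appId": app_id, "version": batch_id}})
    version = len(os.listdir(log_dir))  # assumes only commit files exist
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "x") as f:
        f.write("\n".join(json.dumps(a) for a in actions) + "\n")
    return True


log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)
write_batch(log_dir, "orders-stream", 0, ["part-000.parquet"])
write_batch(log_dir, "orders-stream", 1, ["part-001.parquet"])
# After a crash the engine re-sends batch 1; the log shows it is done.
print(write_batch(log_dir, "orders-stream", 1, ["part-001-retry.parquet"]))  # False
print(sum("add" in a for a in read_actions(log_dir)))  # 2
```

Because the batch id and the data files land in the same commit, a restart can never observe data without its batch id, or a batch id without its data.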

Background Knowledge
This workshop assumes participants have basic to intermediate Python skills.

Bio: 

Jonathan is on a mission to help businesses generate value through data. As a Senior Data Engineer at Cuusoo (Mantel Group), he empowers organisations with the technology, skills and processes to unleash insights from big data.

Outside of his nine-to-five, Jonathan trains and upskills the next generation of data professionals as the Instructor of the Data and Analytics Bootcamp at The University of Western Australia.

Open Data Science
