Record Level Data Lineage with Trino

Abstract: 

In a world with Gen-AI, it is becoming increasingly necessary to track the lineage of data. A crucial aspect of data lineage is tracking the movement and transformation of individual data records. Data lineage underpins the Data Operations and Governance functions that support incident response, legal investigations, and privacy and compliance standards. However, things can go wrong when proprietary, hand-coded business logic alters the data and obscures its provenance. When that happens, current data lineage systems, which operate at the dataset or table level, are not very helpful: they require additional analysis that can be expensive and time-consuming. In today's data landscape, we need record-level lineage that can pinpoint the exact source and cause of data issues with minimal manual intervention. This problem has long been neglected due to its complexity, but we have a solution to propose. In this presentation, we will introduce a novel concept and its reference implementation using Starburst Enterprise Platform, which is backed by Trino technology.

Session Outline:

Attendees will learn about scenarios where record-level lineage is important, as well as a methodology for tracking record-level lineage using Trino.

Bio: 

Dr. Ethan D. Peck is a Director of Data Engineering in charge of Data Operations for the Product Data organization at Zoominfo. Ethan has over 15 years of experience in data, ranging from Data Science to Data Governance and everything in between. His background spans academia, where he received his Ph.D. in Atmospheric and Oceanic Sciences doing global climate modeling; the startup world, where he focused on ML-based entity resolution; and DataOps at the publicly listed company Zoominfo. Besides his day job, Ethan is a husband, father of two, and a Shotokan Karate instructor.

Open Data Science
