Record Level Data Lineage with Trino


In a world shaped by generative AI, it is increasingly necessary to track the lineage of data. A crucial aspect of data lineage is tracking the movement and transformation of individual data records. Data lineage underpins the Data Operations and Governance functions that support incident response, legal investigations, and privacy and compliance standards. However, things can go wrong when proprietary, hand-coded business logic alters the data and obscures its provenance. When that happens, current data lineage systems, which operate at the dataset or table level, are not very helpful: they require additional analysis that can be expensive and time-consuming. In today's data landscape, we need record-level lineage that can pinpoint the exact source and cause of data issues with minimal manual intervention. This problem has long been neglected because of its complexity, but we have a solution to propose. In this presentation, we will introduce a novel concept and its reference implementation using Starburst Enterprise Platform, which is backed by Trino technology.

Session Outline:

Attendees will learn about scenarios where record-level lineage matters, as well as a methodology for tracking record-level lineage using Trino.


Pankaj Yawale, a Senior Principal Architect at ZoomInfo in Boston, MA, brings a wealth of experience and passion to the tech world. With a bachelor’s degree in Electrical Engineering, Pankaj’s love for technology extends beyond his official role. He thrives on exploring innovations, acquiring new skills, and tackling intricate challenges using cutting-edge tools. For Pankaj, technology isn’t just a job—it’s a hobby and a wellspring of inspiration.
Over his 24-year tenure in the tech industry, Pankaj has worked across diverse business domains. Starting as a programmer proficient in C, C++, and Java, he later transitioned to Enterprise Architecture. However, it was his pivot to Big Data a decade ago that ignited a special fervor within him. Solving data-intensive problems at scale became his forte. Pankaj’s toolkit includes a blend of technologies and patterns, making him a staunch advocate for concepts like Data Mesh and treating data as a valuable product.

Open Data Science
One Broadway
Cambridge, MA 02142
