Ethan Peck, PhD

Director, Data Engineering at Zoominfo

    Dr. Ethan D. Peck is a Director of Data Engineering in charge of Data Operations for the Product Data organization at Zoominfo. Ethan has over 15 years of experience in data ranging from Data Science to Data Governance and everything in between. Ethan's background ranges from academia where he received his Ph.D. in Atmospheric and Oceanic Sciences doing global climate modeling, to the startup world where he focused on ML-based entity resolution, to DataOps at the publicly listed company Zoominfo. Besides his day job, Ethan is a husband, father of two, and a Shotokan Karate instructor.

    All Sessions by Ethan Peck, PhD

    Day 2 04/24/2024
    4:40 pm - 5:10 pm

    Record Level Data Lineage with Trino

    Data Engineering

    In a world with Gen-AI, it is becoming increasingly necessary to track the lineage of data. A crucial aspect of data lineage is tracking the movement and transformation of individual data records. Data lineage underpins the Data Operations and Governance work that supports incident response, legal investigations, and privacy and compliance standards. However, proprietary, hand-coded business logic can alter the data and obscure its provenance. When that happens, current data lineage systems, which operate at the dataset or table level, are not very helpful: they require additional analysis that can be expensive and time-consuming. Today's data landscape calls for record-level lineage that can pinpoint the exact source and cause of data issues with minimal manual intervention. This problem has long been neglected because of its complexity, but we have a solution to propose. In this presentation, we will introduce a novel concept and its reference implementation using Starburst Enterprise Platform, which is backed by the Trino technology. Attendees will learn about scenarios where record-level lineage can be important, as well as a methodology for tracking record-level lineage using Trino.

    Open Data Science
    One Broadway
    Cambridge, MA 02142
