Pankaj Yawale

Senior Principal Architect at ZoomInfo

    Pankaj Yawale, a Senior Principal Architect at ZoomInfo in Boston, MA, brings a wealth of experience and passion to the tech world. He holds a bachelor's degree in Electrical Engineering, and his love for technology extends beyond his official role: he thrives on exploring innovations, acquiring new skills, and tackling intricate challenges with cutting-edge tools. For Pankaj, technology isn't just a job; it's a hobby and a wellspring of inspiration. Over his 24-year career in the tech industry, Pankaj has worked across diverse business domains. Starting as a programmer proficient in C, C++, and Java, he later transitioned to Enterprise Architecture. However, it was his pivot to Big Data a decade ago that ignited a special fervor within him, and solving data-intensive problems at scale became his forte. Pankaj's toolkit blends a range of technologies and patterns, and he is a staunch advocate for concepts such as Data Mesh and treating data as a valuable product.

    All Sessions by Pankaj Yawale

    Day 2 04/24/2024
    4:40 pm - 5:10 pm

    Record Level Data Lineage with Trino

    Data Engineering

    In a world of generative AI, it is becoming increasingly necessary to track the lineage of data. A crucial aspect of data lineage is tracking how individual data records move and are transformed. Data lineage is necessary for Data Operations and Governance, which support incident response, legal investigations, and privacy and compliance standards. However, things can go wrong when proprietary, hand-coded business logic alters the data and obscures its provenance. When that happens, current data lineage systems, which operate at the dataset or table level, are not very helpful: they require additional analysis that can be expensive and time-consuming. In today's data landscape, we need record-level lineage that can pinpoint the exact source and cause of a data issue with minimal manual intervention. This problem has long been neglected because of its complexity, but we have a solution to propose. In this presentation, we will introduce a novel concept and its reference implementation using the Starburst Enterprise Platform, which is built on Trino. Attendees will learn about scenarios where record-level lineage is important, as well as a methodology for tracking record-level lineage using Trino.
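    To make the difference between table-level and record-level lineage concrete, here is a minimal, hypothetical Python sketch. It is not the speaker's methodology and does not use Trino or Starburst; it simply assumes that each record carries a list of (source record ID, transformation name) entries, and that every transformation step appends one entry. The names `Record`, `transform`, and `trace`, and the sample IDs such as `crm:42`, are illustrative inventions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Record:
    """A data record that carries its own lineage (hypothetical structure)."""
    record_id: str
    values: dict
    # Each entry is (source_record_id, transformation_name), oldest first.
    lineage: list = field(default_factory=list)


def transform(record: Record, name: str, fn: Callable[[dict], dict], new_id: str) -> Record:
    """Apply fn to a copy of the record's values and append a lineage entry
    pointing back to the input record and the transformation that was applied."""
    return Record(
        record_id=new_id,
        values=fn(dict(record.values)),  # copy so the input record is never mutated
        lineage=record.lineage + [(record.record_id, name)],
    )


def trace(record: Record) -> list:
    """Return the chain of (source_id, transformation) steps behind this record."""
    return list(record.lineage)


# Usage: a raw CRM row is normalized in staging, then masked in a data mart.
raw = Record("crm:42", {"email": " Alice@Example.COM "})
clean = transform(raw, "normalize_email",
                  lambda v: {**v, "email": v["email"].strip().lower()}, "stg:42")
final = transform(clean, "mask_email", lambda v: {**v, "email": "***"}, "mart:42")

print(trace(final))
# [('crm:42', 'normalize_email'), ('stg:42', 'mask_email')]
```

    Because the lineage travels with each row, a suspicious value in `final` can be traced back to the exact source row and the step that changed it, without re-analyzing whole tables. A production system would also need to handle joins and aggregations, where one output row has many inputs; this sketch deliberately covers only one-to-one transformations.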

    Open Data Science
    One Broadway
    Cambridge, MA 02142