Ryan Blue

Co-creator of Apache Iceberg | Committer at Apache Spark | Co-founder and CEO at Tabular

    Ryan is the co-creator of Apache Iceberg and has spent the last decade working on big data infrastructure at Netflix, Cloudera, and now Tabular. He is an ASF member and a committer in the Apache Parquet, Avro, and Spark communities.

    All Sessions by Ryan Blue

    Day 2 04/24/2024
    3:50 pm - 4:20 pm

    Best Practices for CDC Pipelines to Apache Iceberg Tables

    Data Engineering

    DE Summit: Change data capture (CDC) is a tool for efficiently streaming changes from transactional database tables as they happen. Because distributed analytic tools like Trino or Spark can easily overwhelm transactional databases, CDC streams play a critical role: supplying data to mirror tables in analytic formats like Apache Iceberg. This talk covers best practices for building CDC pipelines that maintain analytic tables while preserving the ability to work with the change records as a stream. It will include how to avoid pitfalls like weak or eventual consistency, as well as how to ensure the Apache Iceberg table is structured to deliver fast query results. You'll learn about:

    * CDC patterns and how to choose one
    * Common mistakes to avoid
    * Iceberg tables and best practices

    Open Data Science

