Abstract: The speaker offers an introduction to Apache Kafka, covering basic concepts, the data lifecycle, features, and APIs. The speaker then walks through, at a high level, how Kafka is used for data pipelines, stream processing, database replication, and change data capture (CDC); explains how to evaluate whether a Kafka cluster is healthy and how to analyze its performance; and shares LinkedIn’s operational tools for Kafka.
LinkedIn has been using Kafka and Samza to build a stream processing platform that processes 1 trillion messages per day. Building on the introduction to Kafka, we will first share in-depth knowledge of how we use Kafka to support real-time data ingestion, database change capture, and data re-processing. We will then share how we use Samza to process the data from Kafka and produce real-time analytical results. Finally, we will talk about how the results are sent to our online serving stores.
Bio: Xinyu Liu is an Apache Samza committer and staff software engineer at LinkedIn. His primary interests are internet-scale distributed systems, real-time data processing, and web services. Xinyu graduated with a master's degree in Computer Science from the University of Virginia. He joined LinkedIn in October 2012 and first led the development of the messaging platform (LinkedIn Inbox). He then joined the Data Infra team to work on the real-time stream processor (Samza). Before LinkedIn, he worked on trading platform development at Goldman Sachs.
Through his work on Samza, Xinyu built the asynchronous/parallel processing engine, which powers many applications that need efficient remote I/O access. He contributed the planning and execution layer to the new Samza architecture for unified stream and batch processing. Recently he has been working on SQL, the Beam runner, and auto-scaling. As a data infra engineer, Xinyu is passionate about building infrastructure and products that enable advanced data transformation.
Staff Software Engineer at LinkedIn