Abstract: Kafka has been widely adopted as the backbone of the data streaming platform at leading companies. However, this rising adoption has introduced new challenges. In particular, growing cluster sizes, the increasing volume and diversity of user traffic, and aging network and server components have led to management overhead. Getting near-optimal performance from such an infrastructure service, maintaining its availability in the face of cascading failures, and achieving these objectives with minimal overhead are critical but non-trivial tasks. Hence, human intervention alone tends to be insufficient for providing both reactive and proactive mitigation measures.
In this talk, we will share our work and experiences towards alleviating the management overhead of large-scale Kafka clusters using Cruise Control at LinkedIn. The talk will consist of three parts. The first part will provide an overview of Cruise Control, including the operational challenges that it solves, its high-level architecture, and some evaluation results from real-world scenarios. The second part will be a hands-on tutorial demonstrating how to manage a real Kafka cluster using Cruise Control. Finally, we will close with a Q&A session.
Bio: Adem Efe Gencer received his PhD in Computer Science from Cornell University in 2017. His PhD research focused on improving the scalability of blockchain technologies.
The protocols introduced in his research (e.g. Bitcoin-NG, Aspen) were adopted by Waves Platform, Aeternity, Cypherium, Enecuum, Ergo Platform, and Legalthings, and are
actively being developed into other systems.
Efe develops Apache Kafka and the ecosystem around it, and supports their operation at LinkedIn. In particular, he works on the design, development, and maintenance of
Cruise Control -- a system for alleviating the management overhead of large-scale Kafka clusters at LinkedIn.