Abstract: By 2025, analysts estimate that 30% of generated data will be real-time data. That is 52 ZB of real-time data per year, roughly the total amount of data produced in 2020! Soon, almost every data scientist and data engineer will be working with real-time data. The future of data science is real-time. Existing technologies for working with static data fail when applied to real-time data because they have no way to incrementally update calculations and visualizations. To make such enormous volumes of changing data useful, we need a toolkit for managing data, performing data science, and visualizing results in real time. In this talk, we will explore production-quality, real-time data science using today's leading open real-time technologies: Kafka, Redpanda, ksqlDB, Materialize, and Deephaven Community Core.
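To make the incremental-update point concrete, here is a minimal Python sketch. It is not taken from the talk or from any of the tools named above. A hypothetical `IncrementalMean` helper folds each new value into stored sums in constant time, while a hypothetical `static_mean` function stands in for the static approach, which rescans all data seen so far on every refresh. Streaming engines apply the same principle to far richer operations such as joins and aggregations.

```python
from dataclasses import dataclass


@dataclass
class IncrementalMean:
    """Running mean updated one value at a time (hypothetical helper, for illustration only)."""
    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        # O(1) work per new value: adjust the stored sums instead of rescanning history.
        self.count += 1
        self.total += value
        return self.total / self.count


def static_mean(values: list[float]) -> float:
    # Static approach: every refresh rescans all data seen so far (O(n) per refresh).
    return sum(values) / len(values)


if __name__ == "__main__":
    stream = [10.0, 12.0, 11.0, 15.0]  # made-up sample values arriving one at a time
    inc = IncrementalMean()
    seen: list[float] = []
    for v in stream:
        seen.append(v)
        running = inc.update(v)
        # Both approaches agree; only the amount of work per update differs.
        assert abs(running - static_mean(seen)) < 1e-9
        print(f"value={v} running_mean={running:.2f}")
```

The incremental version does the same small amount of work no matter how much history has accumulated, which is what lets results keep pace with a live stream.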
Concepts that will be discussed:
Key components needed in a real-time data analytics system and how to choose the best open options for your use case.
Why are current static technologies, such as SQL and Pandas, unsuited to real-time data science? (Transactional databases vs. streaming tables.)
Design patterns for building flexible real-time workflows.
Efficiently publishing real-time data to many users in a programming-language-agnostic way.
Using both real-time and static data in the same query.
Data exploration, visualization, and dashboards in real-time.
Query language selection for data science: SQL (ksqlDB and Materialize) vs. data-frame APIs.
Working with time series.
Real-time AI/ML on streams of data.
Do current tools, such as Pandas, Matplotlib, and TensorFlow, have a place in a real-time world?
Bio: Chip Kent is the chief data scientist at Deephaven Data Labs. He holds a Ph.D. from Caltech and has decades of experience in quantitative methods, mathematics, and computer science. His background is in quantitative private investment: at Walleye Capital, he used data to make investment decisions.