Real-time Data Science Made Easy


By 2025, analysts estimate that 30% of all generated data will be real-time data. That is 52 ZB of real-time data per year, roughly the total amount of data produced in 2020! Soon, almost every data scientist and data engineer will be working with real-time data. The future of data science is real-time. Existing technologies built for static data fail when applied to real-time data because they have no way to incrementally update calculations and visualizations as new data arrives. To make such enormous volumes of changing data useful, we need a toolkit for managing data, performing data science, and visualizing results in real time. In this talk, we will explore production-quality, real-time data science using today's leading open real-time technologies: Kafka, Redpanda, ksqlDB, Materialize, and Deephaven Community Core.
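To make the incremental-update point concrete, here is a minimal sketch in plain Python that assumes no particular streaming engine. The hypothetical `IncrementalMean` class keeps a running count and sum, so each new value costs O(1) work. The `batch_mean` function stands in for the static approach of rescanning all data whenever it changes. Neither is the API of Kafka, ksqlDB, Materialize, or Deephaven.

```python
from __future__ import annotations


class IncrementalMean:
    """Hypothetical running mean: O(1) work per new value, no rescan of history."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def update(self, value: float) -> float:
        # Fold only the new value into the stored state.
        self.count += 1
        self.total += value
        return self.total / self.count


def batch_mean(values: list[float]) -> float:
    # Static approach: recompute over every value each time the data changes.
    return sum(values) / len(values)


stream = [3.0, 5.0, 10.0, 2.0]  # illustrative sample values
inc = IncrementalMean()
seen: list[float] = []
for x in stream:
    seen.append(x)
    running = inc.update(x)
    # Both approaches agree; only the cost per update differs.
    assert abs(running - batch_mean(seen)) < 1e-12

print(running)  # → 5.0
```

Incremental engines apply this same principle, updating a result from the change rather than from the full history, to much richer operations than a mean.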

Session Outline:
Concepts that will be discussed:
- Key components needed in a real-time data analytics system, and how to choose the best open options for your use case.
- Why are current static technologies, such as SQL and Pandas, unsuited to real-time data science? (Transactional databases vs. streaming tables)
- Design patterns for building flexible real-time workflows.
- Efficiently publishing real-time data to many users in a programming-language-agnostic way.
- Using both real-time and static data in the same query.
- Data exploration, visualization, and dashboards in real time.
- Query language selection for data science: SQL (ksqlDB and Materialize) vs. dataframes.
- Working with time series.
- Real-time AI/ML on streams of data.
- Do current tools, such as Pandas, Matplotlib, and TensorFlow, have a place in a real-time world?
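The outline item on using real-time and static data in the same query can be sketched in the same spirit. This is a minimal illustration in plain Python, not ksqlDB, Materialize, or Deephaven syntax. The `SECTORS` lookup table, the `Trade` record, the `SectorNotional` class, and the sample trades are all hypothetical. Each streaming trade is joined against the static table and folded into a per-sector running total, so only the affected group is touched on each update.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical static reference data: loaded once, rarely changes.
SECTORS = {"AAPL": "Tech", "MSFT": "Tech", "XOM": "Energy"}


@dataclass
class Trade:
    """Hypothetical streaming row."""
    symbol: str
    size: int
    price: float


class SectorNotional:
    """Incrementally maintained stream-static join followed by a group-by sum."""

    def __init__(self, sectors: dict[str, str]) -> None:
        self.sectors = sectors
        self.totals: dict[str, float] = defaultdict(float)

    def on_trade(self, trade: Trade) -> None:
        # Join: look up the static attribute for this streaming row.
        sector = self.sectors.get(trade.symbol, "Unknown")
        # Aggregate: update only this row's group, not the whole history.
        self.totals[sector] += trade.size * trade.price


view = SectorNotional(SECTORS)
sample_trades = [  # invented sample data for illustration
    Trade("AAPL", 10, 100.0),
    Trade("XOM", 5, 50.0),
    Trade("MSFT", 2, 250.0),
    Trade("TSLA", 1, 200.0),
]
for t in sample_trades:
    view.on_trade(t)

print(dict(view.totals))  # → {'Tech': 1500.0, 'Energy': 250.0, 'Unknown': 200.0}
```

Here unmatched symbols fall into an "Unknown" group, which behaves like a left outer join. A real pipeline would choose its join semantics explicitly.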


Chip Kent is the chief data scientist at Deephaven Data Labs. He holds a Ph.D. from Caltech and has decades of experience in quantitative, mathematical, and computer science work. His background is in quantitative private investment: at Walleye Capital, he used data to make investment decisions.

