Realtime Datasets using Replays


Generating datasets for ML models is a tried-and-true (and mundane) workflow: fetch the data from somewhere, sanitize it, normalize it, and write it out as CSV or another format.

Some teams have pieces of this workflow automated, but the automation is usually brittle, which makes automating the whole thing an arduous task.

But there is a way to do this without the gymnastics - the pattern is called “replays”.

In this talk, I will detail the steps our team took to generate granular, real-time datasets directly from Kafka streams.
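To make the "replay" idea concrete, here is a minimal, broker-free sketch of the pattern: re-consume retained events from a chosen offset, sanitize and normalize them, and emit dataset rows. All names (`replay_to_rows`, the `user_id`/`amount` fields) are illustrative assumptions, and a real implementation would seek a Kafka consumer to an offset instead of iterating an in-memory log.

```python
import csv
import io
import json

def replay_to_rows(log, start_offset=0):
    """Re-consume retained events from start_offset and normalize them
    into flat dataset rows (the 'replay' pattern, minus the broker)."""
    for offset, raw in enumerate(log):
        if offset < start_offset:
            continue
        event = json.loads(raw)
        # Sanitize: drop malformed events instead of failing the run.
        if "user_id" not in event or "amount" not in event:
            continue
        # Normalize: fixed schema, consistent types.
        yield {
            "offset": offset,
            "user_id": str(event["user_id"]),
            "amount": float(event["amount"]),
        }

def rows_to_csv(rows):
    """Serialize normalized rows to CSV, header included."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["offset", "user_id", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Simulated retained topic: one JSON payload per message.
log = [
    json.dumps({"user_id": 1, "amount": "9.99"}),
    json.dumps({"user_id": 2}),  # malformed: dropped during sanitization
    json.dumps({"user_id": 3, "amount": 4}),
]
print(rows_to_csv(replay_to_rows(log)))
```

Because the topic retains the raw events, the same replay can be re-run from any offset to regenerate or extend a dataset, which is what makes the pattern self-serve.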

Attendees will leave with real, actionable steps on how to implement self-serve, real-time dataset generation using OSS.


Daniel is the co-founder and CTO of a streaming data performance monitoring company. Prior to starting his own company, he was a principal engineer at companies like New Relic, InVision, Community and DigitalOcean, and before that spent over ten years in the data center space focusing on integrations and R&D. Dan specializes in distributed system design and is a huge advocate for all things asynchronous.

He has been writing Go since 2014, works primarily in backend, listens to a lot of black metal and prefers Stellas over IPAs. He resides in Portland, Oregon but is originally from Riga, Latvia.

Open Data Science
One Broadway
Cambridge, MA 02142
