Tutorial on Anomaly Detection at Scale: Data Engineering Challenges meet Data Science Difficulties


This tutorial showcases a joint data engineering and data science effort to build a real-time anomaly detection system at scale. On the data engineering side, it addresses challenges such as the setup and configuration of Kafka, Spark Streaming and a Spark cluster, as well as downstream data storage, visualization and alerting. On the data science side, it covers the difficulties of unsupervised data modeling in both batch and online fashion and the implementation challenges of scaling, and it touches on ensemble techniques for improving accuracy and selecting detectors.

Anomaly detection is predominantly done in an unsupervised fashion, since labeled data is rarely available or the classes are highly imbalanced. To make the problem harder, the important anomalies often turn out to be contextual or collective rather than just point anomalies in a univariate time series. Attendees will hear about the use of robust PCA, LSTMs, autoencoders and other methods as anomaly detectors. The methods are implemented in the Python library Keras, but in order to scale and process data arriving from Kafka in real time, they are adapted to run on a Spark cluster using Spark Streaming. At the end of the processing pipeline is a Bayesian ensemble learning model. Attendees will see it in action and learn how it helps the system select the best detectors dynamically.
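To make the idea of dynamic detector selection concrete, here is a minimal sketch in plain Python. It is not the model presented in the tutorial: it assumes a simple Beta-Bernoulli belief about how often each detector's verdict is correct, updated from occasional confirmations (for example, from an analyst). The class name, detector names, feedback stream and anomaly scores are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectorBelief:
    """Beta(alpha, beta) belief about how often a detector's verdict is correct."""
    name: str
    alpha: float = 1.0  # prior pseudo-count of confirmed verdicts (uniform prior)
    beta: float = 1.0   # prior pseudo-count of rejected verdicts

    def update(self, was_correct: bool) -> None:
        # Conjugate update: one more success or one more failure.
        if was_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the detector's accuracy.
        return self.alpha / (self.alpha + self.beta)


def best_detector(beliefs):
    """Pick the detector with the highest posterior mean accuracy."""
    return max(beliefs, key=lambda b: b.mean)


def ensemble_score(beliefs, scores):
    """Average the detectors' anomaly scores, weighted by posterior mean accuracy."""
    total = sum(b.mean for b in beliefs)
    return sum(b.mean * scores[b.name] for b in beliefs) / total


# Hypothetical detectors, named after the methods mentioned in the abstract.
beliefs = [
    DetectorBelief("robust_pca"),
    DetectorBelief("lstm"),
    DetectorBelief("autoencoder"),
]
by_name = {b.name: b for b in beliefs}

# Hypothetical feedback stream: (detector, was its verdict confirmed?).
feedback = [
    ("lstm", True), ("lstm", True), ("robust_pca", False),
    ("autoencoder", True), ("lstm", True), ("robust_pca", True),
]
for name, ok in feedback:
    by_name[name].update(ok)

# Hypothetical anomaly scores in [0, 1] for one incoming data point.
scores = {"robust_pca": 0.2, "lstm": 0.9, "autoencoder": 0.6}
selected = best_detector(beliefs)
combined = ensemble_score(beliefs, scores)
print(selected.name, round(combined, 3))
```

Because the beliefs are updated one observation at a time, the weights can shift as feedback arrives, which is the sense in which selection is dynamic in this sketch. A production system would also need a way to obtain feedback in a mostly unlabeled setting, which this example deliberately leaves out.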


Dusan Randjelovic is a Senior Data Scientist at SmartCat.io, currently focused on Bayesian ensemble techniques and the analysis of time series data for anomaly detection applications. He has worked with various kinds of scientific (big) data, from astrophysics to bioinformatics, and is also interested in stream processing and distributed systems. Dusan has taught at the Faculty of Sciences, Novi Sad, and the Faculty of Mathematics, Belgrade, has spoken at international conferences, and is actively involved in local and regional data science communities.

Open Data Science



