Building a Full Stack Low Latency Analytics Pipeline

Abstract: 

To do low latency analytics well, we need to take a full-stack approach to the problem. To deliver the best user experience, the front end, data processing, and storage all need to work together, while still leaving the overall system flexible.

In this workshop, we will take a look at each of the parts of a real-time analytics pipeline to understand the options available and trade-offs associated with different technologies and techniques in a modern data pipeline.

Session Outline
Module 1: An introduction to low latency processing pipelines

In this module we will cover some popular open source options for real-time analytics pipelines, such as Kafka and Redis, along with a brief overview of some competing commercial options.

Module 2: Data storage for low latency analytics

When the going gets tough, the tough cheat. Processing high volumes of data very quickly can be a tough problem to solve, especially if you need to operate on a budget. In this second module we'll look at options for low latency data storage, with a particular emphasis on approximate data structures that deliver good results with high performance.
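
As a rough illustration of what "approximate data structures" can mean here, the following is a minimal sketch of a count-min sketch, one common structure for estimating event frequencies in fixed memory. The workshop does not say which structures it covers, so the choice of technique, the class name `CountMinSketch`, and the `width` and `depth` defaults are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator, Tuple


class CountMinSketch:
    """Approximate frequency counter using a fixed-size table (illustrative only)."""

    def __init__(self, width: int = 1024, depth: int = 4) -> None:
        # width and depth are hypothetical defaults; larger values reduce error.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item: str) -> Iterator[Tuple[int, int]]:
        # Derive one column per row by salting the hash with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available estimate and never undercounts.
        return min(self.table[row][col] for row, col in self._indexes(item))


sketch = CountMinSketch()
for _ in range(5):
    sketch.add("page_a")
sketch.add("page_b", 2)
print(sketch.estimate("page_a"))
```

The trade-off is the "cheat" the module description alludes to: memory stays constant regardless of how many distinct items arrive, at the cost of estimates that may be somewhat too high.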

Module 3: Computing and presentation

There's not much point in low latency processing if it takes five minutes to load a dashboard. In this last module we will focus on delivering visualization and analysis of our low latency data sources to the user. We'll use modern streaming data techniques to provide high performance visualization, and "online" approaches to model fitting to help the user derive insight from the data in real time.
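
To give a sense of what an "online" approach looks like, here is a minimal sketch of Welford's algorithm, which updates a running mean and variance one observation at a time without storing the stream. It is offered as a simple stand-in for online estimation in general, not as the method used in the workshop; the class name `RunningStats` and the sample values are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance via Welford's algorithm (illustrative only)."""

    def __init__(self) -> None:
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each update is O(1) in time and memory, so it can run per event.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until there are two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(stats.mean, stats.variance)
```

Because the state is just three numbers, a dashboard can refresh the estimate on every incoming event instead of recomputing over the full history.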

Background Knowledge
A basic knowledge of multi-tier web application design will be helpful but is not required. It would also be good to have a working knowledge of JavaScript/TypeScript as well as a backend language such as Java or Go.

Bio: 

Byron has developed large scale data pipelines and processing systems across a variety of industries, including Life Sciences, Advertising, and Enterprise Software. In particular, he focuses on distributed systems with low latency requirements for both read and write workloads. Trained as a statistician with a focus on statistical computing, he is also the author of Real-time Analytics, published by John Wiley and Sons, which describes both the operational and computational aspects of delivering these systems at scale.

Open Data Science
One Broadway
Cambridge, MA 02142
info@odsc.com
