Distributed Analytics with JuliaDB and OnlineStats

Abstract: As datasets grow, so do the demands on data science tools. JuliaDB is an analytical database designed for distributed and out-of-core processing of many large data files. It integrates with OnlineStats, a package for calculating statistics and fitting models both on-line and in parallel, which makes it straightforward to scale analyses to production. OnlineStats is one of a kind in that everything from summary statistics to advanced statistical learning techniques can be updated one observation at a time. The underlying data structures use constant memory, so estimates can be calculated on unbounded data streams.
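
To make the "one observation at a time" and "in parallel" ideas concrete, here is a minimal sketch in Python of a running mean and variance (Welford's update) together with a merge step for combining results computed on separate chunks. It is an illustration of the general technique only, not the OnlineStats implementation or API; the class name RunningStats and its methods fit and merge are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical constant-memory accumulator for mean and variance."""
    n: int = 0          # number of observations seen so far
    mean: float = 0.0   # running mean
    m2: float = 0.0     # running sum of squared deviations from the mean

    def fit(self, x: float) -> None:
        # Welford's update: fold in a single observation, storing no history.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Combine two accumulators built on disjoint chunks of data,
        # as one would after processing chunks in parallel.
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Two chunks of one stream, processed independently and then merged.
left, right = RunningStats(), RunningStats()
for x in [2.0, 4.0, 4.0, 4.0]:
    left.fit(x)
for x in [5.0, 5.0, 7.0, 9.0]:
    right.fit(x)
merged = left.merge(right)
```

Each accumulator holds three numbers regardless of how many observations it has seen, which is what allows the same estimate to be maintained on a stream of any length. The merge step is what lets the chunks be processed on separate workers.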

Bio: Josh is a recent PhD graduate of NC State, where his research focused on on-line algorithms for statistics. He enjoys working on difficult optimization problems and translating paper solutions into efficient computer programs, particularly in the Julia language. Besides math and computers, he enjoys music, playing guitar, hiking, biking, and ultimate frisbee.