The Software GPU: Making Inference Scale in the Real World

Abstract: 

What if you could process complex deep learning models at scale without specialized AI hardware? Award-winning MIT professor Nir Shavit will demonstrate how to make inference scale in the real world using only software on commodity CPUs. The Neural Magic Inference Engine bucks conventional wisdom about high-throughput computing and may change your mind about machine learning forever.

For nearly a decade, data scientists have believed that deep learning models need the parallel computing power of a GPU to get “good enough” results. GPUs, however, have limited cache memory, which forces data scientists to trade accuracy for performance to achieve inference at scale.

CPUs, by contrast, have much more sophisticated cache memory hierarchies that are well suited to addressing these constraints. This session will show how to run deep neural networks at scale with GPU-class performance on commodity CPUs, with all the deployment flexibility of a software solution. Data science teams can run models at exceptional speeds without the expense and complexity of dedicated hardware.

Bio: 

Nir Shavit is the CEO of Neural Magic, a machine learning startup currently in stealth mode, and a professor in the Department of Electrical Engineering and Computer Science at MIT. Shavit is a co-author of the book The Art of Multiprocessor Programming, a recipient of the 2004 Gödel Prize in theoretical computer science and of the 2012 Dijkstra Prize in Distributed Computing, and an ACM Fellow. His recent interests include systems issues in machine learning and techniques for understanding how neural tissue computes by extracting maps of its connectivity, a field called connectomics.