Quantization To The Rescue: An Edge AI Story

Abstract: 

Over the last decade, deep neural networks have brought about a resurgence in artificial intelligence, with machines outperforming humans in some of the most popular image recognition problems. But all that jazz comes with its costs – high compute complexity and large memory requirements. These requirements translate to higher power consumption, resulting in steep electricity bills and a sizeable carbon footprint. Optimizing model size and complexity thus becomes a necessity for a sustainable future for AI.
Memory and compute optimizations also bring the promise of unimaginable possibilities with edge AI: self-driving cars, predictive maintenance, smart speakers, and body monitoring are only the beginning. The smartphone market, with its reach of nearly 4 billion people, is only a fraction of the potential edge devices waiting to be truly ‘smart’. Think smart hospitals, or industrial automation in mining and oil and gas, and so much more.

In this session, we will talk about the challenges of deploying deep neural networks (DNNs) on embedded systems with resource constraints. Quantization, which has long been used in mathematics and digital signal processing to map values from a large, often continuous set to values in a smaller countable set, is now being reimagined as a possible solution for compressing DNNs and accelerating inference. It is gaining popularity not only in machine learning frameworks like MATLAB, TensorFlow, and PyTorch but also in hardware toolchains like NVIDIA® TensorRT and Xilinx® DNNDK. The core idea behind quantization is the resilience of neural networks to noise. Deep neural networks, in particular, are trained to pick up key patterns and ignore the noise. This means that the networks can cope with the small changes introduced by quantization error, as backed by research indicating that quantization has minimal impact on the overall accuracy of the network. This, coupled with a significant reduction in memory footprint and power consumption and gains in computational speed, makes quantization an efficient approach for deploying neural networks to embedded hardware.
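
To make the mapping from a continuous set to a small countable set concrete, here is a minimal sketch of one common scheme, affine (asymmetric) quantization of floating-point values to 8-bit unsigned integers. It is an illustration only and not necessarily the scheme used by any of the frameworks or toolchains named above; the function names quantize and dequantize and the sample weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers in [0, 2**num_bits - 1] (affine scheme)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Size of one quantization step; fall back to 1.0 if all values are zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer that represents the real value 0.0.
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [scale * (q - zero_point) for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-1.2, -0.3, 0.0, 0.45, 2.1]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)

# The quantization error is the "noise" the network has to tolerate.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

In this sketch each value is off by at most half a quantization step (scale / 2), which is the small perturbation the network is expected to absorb. Storing 8-bit integers instead of 32-bit floats cuts the memory needed for these values by a factor of four.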

Bio: 

Ashwathi Nambiar is an experienced Software Engineer. Her interests span AI, machine learning, and deep learning. Currently, she is working on efficient embedded deployment of neural networks.