Leaner and Greener AI with Quantization in PyTorch


Quantization refers to the practice of taking a neural network's painstakingly tuned FP32 parameters and rounding them to integers. Why would anyone possibly want to do that?! For one, it makes the model lighter, reduces how much power it consumes, and speeds up inference - all without obliterating model accuracy! Sounds unreal, except it isn’t.
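
To make the "rounding to an integer" idea concrete, here is a minimal sketch in plain Python of one common scheme, affine (scale plus zero-point) quantization to 8-bit integers. It is an illustration only: the function names `quantize` and `dequantize` and the sample weights are hypothetical, and this is not the PyTorch quantization API or necessarily the approach covered in the session.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using a scale and a zero-point (illustrative)."""
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    # Widen the observed range so that it always contains 0.0.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Size of one integer step in float units; fall back to 1.0 if all values are 0.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer that represents the float value 0.0.
    zero_point = round(qmin - lo / scale)
    # Round each value to the nearest step and clamp into the integer range.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integers."""
    return [scale * (x - zero_point) for x in q]


# Hypothetical FP32 weights, chosen only for illustration.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight now fits in one byte instead of four, and `restored` differs from `weights` only by an error on the order of one quantization step (`scale`), which is the intuition behind the small accuracy cost.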

In this session, we'll learn quantization theory from first principles and see how you can implement it yourself in PyTorch. We'll break down the available approaches to quantizing your model, their benefits and pitfalls, and, most importantly, how you can make an informed decision for your use case. Finally, we'll put what we've learned to the test on a large non-academic model to see how this works in the real world.


Suraj is an ML engineer and developer advocate at Meta AI. In a previous life, he was a data scientist in personal finance. After being bitten by the deep learning bug, he worked in healthcare research (predicting patient risk factors) and behavioral finance (preventing overly risky trading). Outside of work, you can find him hiking barefoot in the Catskills or being tossed on the Aikido mat.

Open Data Science




