Interpreting Features in Deep Networks


Despite significant advances in interpretable machine learning in recent years, many ML models, especially deep networks, remain difficult to understand and control. One promising new direction in interpretable deep learning aims to understand models by understanding their learned features and internal representations. This tutorial will survey state-of-the-art techniques for feature-level interpretability, with a focus on vision and language processing applications. We'll learn how to automatically discover and describe the function of individual neurons within deep networks, and how to use these descriptions to identify model failures and improve model robustness. This tutorial is targeted at learners who have experience with neural network models and are interested in gaining a deeper understanding of how they work.

Session Outline:

Module 1: Visualizing and describing learned features. Learn about current techniques for automatically annotating neurons in deep networks with descriptions of their behavior.

Module 2: Extensions. Learn how to extend the techniques from Module 1 to describe other aspects of model behavior (e.g., directions in representation space rather than individual neurons) and to generate more detailed feature descriptions.

Module 3: Applications. Learn how to use neuron labels to identify sources of bias and unreliability in trained models, and explore techniques for using labels to improve models after they have been trained.

Background Knowledge:

Familiarity with deep learning basics.


Jacob Andreas is the X Consortium Assistant Professor at MIT. His research aims to build intelligent systems that can communicate effectively using language and learn from human guidance. Jacob earned his Ph.D. from UC Berkeley, his M.Phil. from Cambridge (where he studied as a Churchill scholar) and his B.S. from Columbia. As a researcher at Microsoft Semantic Machines, he founded the language generation team and helped develop core pieces of the technology that powers conversational interaction in Microsoft Outlook. He has been the recipient of Samsung's AI Researcher of the Year award, MIT's Kolokotrones teaching award, and paper awards at NAACL and ICML.

Open Data Science
One Broadway
Cambridge, MA 02142
