Reckoning with the Disagreement Problem: Post-Hoc Explanation Agreement as a Training Objective

Abstract: 

As neural networks increasingly make critical decisions in high-stakes settings, monitoring and explaining their behavior in an understandable and trustworthy manner has become a necessity. One commonly used type of explainer is post hoc feature attribution: a family of methods that assign each feature in a model's input a score reflecting that feature's influence on the model's output. A major practical limitation of this family is that different explainers can disagree about which features matter most. Our contribution in this paper is a method for training models with this disagreement problem in mind. We add to the loss function, alongside the standard term for model performance, a term that measures the difference between the feature attributions produced by a pair of explainers. In experiments on three datasets, we observe that models trained with this loss term can show improved explanation consensus on unseen data and on explainers that were not explicitly trained to agree. However, this improved consensus comes at a cost to model performance. Finally, we study how our method influences model outputs and explanations.
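As a hedged illustration of the kind of objective described above, the sketch below adds an explainer-disagreement penalty to a standard task loss: total loss = task loss + `lam` × disagreement. Every concrete choice here is an assumption for illustration, not the paper's implementation: a toy logistic-regression model, vanilla gradient and Gradient × Input as the pair of explainers, one minus the Pearson correlation between their attribution vectors as the disagreement measure, and central finite differences standing in for automatic differentiation. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(params, x):
    # Hypothetical toy model: logistic regression with params = [w_1, ..., w_d, b].
    *w, b = params
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def explainer_gradient(params, x):
    # Explainer 1 (assumed): vanilla gradient of the logit w.r.t. the input, which is w here.
    return list(params[:-1])


def explainer_grad_x_input(params, x):
    # Explainer 2 (assumed): Gradient x Input, the elementwise product w * x.
    return [wi * xi for wi, xi in zip(params[:-1], x)]


def pearson(a, b, eps=1e-12):
    # Pearson correlation; eps keeps the denominator positive for constant vectors.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb + eps)


def mean_disagreement(params, X):
    # Assumed disagreement measure: average of (1 - Pearson r), which lies in [0, 2].
    return sum(
        1.0 - pearson(explainer_gradient(params, x), explainer_grad_x_input(params, x))
        for x in X
    ) / len(X)


def task_loss(params, X, y):
    # Standard performance term: mean binary cross-entropy.
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(sigmoid(logit(params, x)), 1e-12), 1.0 - 1e-12)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(X)


def consensus_loss(params, X, y, lam):
    # Combined objective: task loss plus lam times explainer disagreement.
    return task_loss(params, X, y) + lam * mean_disagreement(params, X)


def train(X, y, lam, steps=200, lr=0.1, h=1e-5):
    d = len(X[0])
    # Non-constant initial weights: Pearson r is degenerate for a constant attribution vector.
    params = [0.1 * (i + 1) * (-1) ** i for i in range(d)] + [0.0]
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, down = params[:], params[:]
            up[i] += h
            down[i] -= h
            # Central finite difference stands in for autodiff in this sketch.
            diff = consensus_loss(up, X, y, lam) - consensus_loss(down, X, y, lam)
            grad.append(diff / (2 * h))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    # Tiny made-up dataset, for illustration only.
    X = [[1.0, 0.5, -1.0], [0.2, -1.5, 0.8], [-0.7, 1.2, 0.3], [1.5, -0.3, -0.9]]
    y = [1, 0, 1, 0]
    for lam in (0.0, 1.0):
        params = train(X, y, lam)
        print(lam, task_loss(params, X, y), mean_disagreement(params, X))
```

Setting `lam=0.0` recovers ordinary training. Comparing `task_loss` and `mean_disagreement` for a positive `lam` against `lam=0.0` is one way to probe, on this toy problem, the consensus versus performance trade-off the abstract describes; a real implementation would typically differentiate through the explainers with an autodiff framework rather than finite differences.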

Bio: 

Max Cembalest is a researcher at Arthur focused on simplifying and explaining machine learning models. Previously, he received an M.S. in Data Science from Harvard University, where he concentrated on interpretability and graph-based models. He is particularly excited about recent advances in applying abstract algebra, topology, and category theory to neural network design.

Open Data Science
