
Abstract: Leveraging data to produce algorithmic rules is nothing new. However, we used to second-guess these rules, and now we don't. A human used to be accountable, and now it's the algorithm... but can an algorithm really be held accountable? How do we close this accountability gap and resolve the ethical quandary it creates?
Session Outline
Our workshop dataset involves racial bias in the criminal justice system.
In the first part of this workshop, we will provide some context on the historical roots of prejudice in criminology, the role algorithmic decision-making plays in it, and the value interpretation can provide. We will broaden this discussion of the accountability gap with all its ramifications for safety, reliability, consistency, and inclusiveness, to name a few. We will then underscore the importance of machine learning interpretation in building more complete AI solutions. Completeness means we don't just focus on optimizing predictive performance; we also mine models for scientific knowledge, make them safe and reliable to use even in rare and unexpected situations, and keep them free from discriminatory practices. In a nutshell: models learn from our data, and we can learn a lot from our models... but only if we interpret them!
In part two, we kick off the more hands-on portion of the workshop! We will prepare the real-world dataset, train a CatBoost model on it, and learn, one by one, how to use the most popular model-agnostic interpretation methods to explain the decisions of this and any other "black-box" model. We will employ global interpretation methods such as Shapley values with SHAP and Partial Dependence Plots (PDP), as well as local interpretation methods such as Local Interpretable Model-Agnostic Explanations (LIME), Anchors, and Counterfactual Explanations with Google's What-If Tool (WIT).
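To give a flavor of this workflow, below is a minimal sketch of training a CatBoost classifier and applying SHAP (global) and LIME (local) to it. It uses a synthetic stand-in dataset generated with scikit-learn's make_classification rather than the workshop's actual data, and all parameter values are illustrative; PDP, Anchors, and the What-If Tool follow similar patterns in the workshop notebooks.

# Minimal sketch: CatBoost + SHAP + LIME on a synthetic stand-in dataset
import shap
from catboost import CatBoostClassifier
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic placeholder for the workshop's real-world dataset
X, y = make_classification(n_samples=2000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train the "black-box" model
model = CatBoostClassifier(iterations=300, depth=6, verbose=0)
model.fit(X_train, y_train)

# Global interpretation: SHAP values for tree ensembles
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

# Local interpretation: LIME explanation for a single prediction
lime_explainer = LimeTabularExplainer(X_train, mode="classification")
explanation = lime_explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
print(explanation.as_list())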
In the third and final part, we will go over a few ways to tune our models for increased interpretability. To this end, we will propose several solutions: better feature engineering and selection, mitigating bias in datasets and models, adversarial robustness, enforcing monotonic constraints, and regularization. For this workshop's example, we will improve our model's interpretability with only two of these methods. Still, it's essential to realize that there are many ways to tweak the data and the model so that models become easier to interpret while also being less prone to attacks, bias, and spurious results.
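As an illustration of two of these levers, here is a hedged sketch of imposing monotonic constraints and stronger L2 regularization when training a CatBoost model. It assumes a recent CatBoost version that supports the monotone_constraints parameter; the constraint values and feature count below are placeholders, not the workshop's actual configuration.

# Sketch: interpretability-oriented tuning of CatBoost via monotonic constraints and regularization
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

# Synthetic placeholder data with 8 numerical features
X, y = make_classification(n_samples=2000, n_features=8, random_state=42)

# monotone_constraints takes one value per feature:
#   1 = prediction must not decrease as the feature increases,
#  -1 = must not increase, 0 = unconstrained
constrained_model = CatBoostClassifier(
    iterations=300,
    depth=6,
    monotone_constraints=[1, 0, 0, -1, 0, 0, 0, 0],
    l2_leaf_reg=10,  # stronger L2 regularization encourages simpler, smoother trees
    verbose=0,
)
constrained_model.fit(X, y)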
Background Knowledge
The intended audience is knowledgeable in Python data structures and control flow and has at least a basic understanding of machine learning and of using Google Colab.
Bio: Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he's a Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Serg is passionate about providing the often-missing link between data and decision-making. His book titled "Interpretable Machine Learning with Python" is scheduled to be released in early 2021 by UK-based publisher Packt.