The Equity Kernel – Approaches for Addressing Underlying Bias in Data Sets

Abstract: 

Humans are biased creatures, so data about human behavior will reflect those biases. Because AI solutions are built by design to pick up on complex patterns, biased data left unchecked will propagate through an AI solution, potentially leading to inequitable outcomes. As more decisions with increasing impact are automated through AI, this problem becomes more pressing. However, there are steps that can be taken to mitigate bias both in the underlying data set and in the outcome. This talk will use a case study in higher education to explore our efforts in developing the Equity Kernel, a composite approach to addressing bias in data and AI solutions. Pandata is working with a major university to build an AI solution that increases student engagement, with a focus on diversity and inclusion. The talk will address challenges such as quantifying bias in an underlying data set with known biases and vetting outcomes for equity, equality, and alignment with project goals. An additional challenge is dealing with “allowable” bias: a Women in Engineering group is likely to be, and is OK to be, mostly female. The Equity Kernel uses a combination of open-source and custom metrics and algorithms alongside a human-in-the-loop component to promote equity and inclusion from a data set with known shortcomings.

Bio: 

Hannah Arnson serves as Director of Data Science with Pandata, a Cleveland-based AI consulting firm. There, she leverages her 10+ years of experience to lead AI solution design and development, with a focus on ethical and approachable AI. Hannah began her career as a neuroscientist, receiving a Ph.D. in neuroscience from Washington University in St. Louis and then going on to do postdoctoral research. During this time, she developed statistical and mathematical models to better understand topics ranging from the sense of smell to navigation in pigeons. As a data scientist, Hannah is passionate about approaches to understanding and mitigating bias in both underlying data sets and AI-based solutions.