Despite high predictive performance, machine learning models can be unfair towards specific demographic subgroups characterized by sensitive attributes such as gender or race. This paper presents a novel approach using Computational Profile Likelihood (CPL) to assess potential bias in neural network decisions with respect to sensitive attributes. CPL estimates the conditional probability of a network's internal neuron excitation levels during predictions. To assess the impact of sensitive attributes on predictions, the CPL distribution of individuals sharing a particular value of a sensitive attribute and a specific outcome (e.g., “women” and “high income”) is compared to a subgroup sharing another value of the sensitive attribute but with the same outcome (e.g., “men” and “high income”). The resulting disparities between distributions can be used to quantify the bias with respect to the sensitive attribute and the outcome class. We also assess the efficacy of bias reduction techniques through their influence on the resulting disparities. Experimental results on three widely used datasets indicate that the CPL of the trained models can be used to characterize significant differences between multiple protected groups, highlighting that these models display quantifiable biases. Furthermore, after applying bias mitigation methods, the gaps in CPL distributions are reduced, indicating a more similar internal representation for profiles of different protected groups.
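The subgroup comparison described above can be illustrated with a minimal sketch. It assumes that a per-individual CPL score has already been computed by some other means; how CPL itself is estimated is not shown here. The function names `ks_statistic` and `subgroup_disparity` are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is used only as one possible stand-in for a distribution gap, since the abstract does not state which disparity measure the authors use.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a and b (0 = identical, 1 = fully separated)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # evaluate both CDFs at every observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def subgroup_disparity(scores, sensitive, outcome, group_a, group_b, outcome_class):
    """Compare the score distributions of two subgroups that share the same
    outcome class but differ in the value of the sensitive attribute.

    `scores` is assumed to hold one precomputed CPL value per individual
    (hypothetical input; the estimation of CPL is outside this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    sensitive = np.asarray(sensitive)
    outcome = np.asarray(outcome)
    mask_a = (sensitive == group_a) & (outcome == outcome_class)
    mask_b = (sensitive == group_b) & (outcome == outcome_class)
    if not mask_a.any() or not mask_b.any():
        raise ValueError("each subgroup needs at least one sample")
    return ks_statistic(scores[mask_a], scores[mask_b])
```

Under these assumptions, calling `subgroup_disparity` before and after a bias mitigation step on the same individuals would give two numbers whose difference indicates whether the gap between the subgroups narrowed.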
               