While Deep Neural Networks (DNNs) are driving major innovations through their powerful automation, we are also witnessing the perils of that automation in the form of bias, such as automated racism, gender bias, and adversarial bias. As the societal impact of DNNs grows, finding an effective way to steer DNNs so that their behavior aligns with the human mental model has become indispensable for realizing fair and accountable models. While the need to adjust DNNs to "think like humans" is pressing, few approaches aim to capture how "humans would think" when a DNN applies biased reasoning to a new instance. We propose Interactive Attention Alignment (IAA), a framework that uses methods for visualizing model attention, such as saliency maps, as an interactive medium through which humans can uncover cases of biased DNN reasoning and directly adjust the attention. To realize human-steerable DNNs that are more effective than the state of the art, IAA introduces two novel devices. First, IAA uses the Reasonability Matrix to systematically identify and adjust cases of biased attention. Second, IAA applies GRADIA, a computational pipeline that incorporates the adjusted attention to jointly maximize attention quality and prediction accuracy. We evaluated the Reasonability Matrix in Study 1 and GRADIA in Study 2 on a gender classification problem. In Study 1, we found that applying the Reasonability Matrix in bias detection significantly improves the human-perceived quality of model attention compared to not applying it. In Study 2, we found that GRADIA significantly improves (1) the human-assessed quality of model attention and (2) model performance in scenarios where training samples are limited. Based on observations from the two studies, we present design implications for social computing and interactive data annotation toward achieving human-centered steerable AI.
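The abstract does not include implementation details, so the sketch below is only a rough illustration of what jointly optimizing prediction accuracy and attention quality could look like. It is not the authors' GRADIA pipeline: the names `SmallCNN`, `saliency_map`, `joint_loss`, and the `human_masks` tensor are hypothetical, and the input-gradient saliency plus "penalize attention outside the human-marked region" term is a generic stand-in for whatever alignment objective GRADIA actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Toy classifier standing in for a gender-classification DNN (hypothetical)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.head = nn.Linear(16 * 8 * 8, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def saliency_map(model, images, labels):
    """Input-gradient saliency, kept differentiable so it can itself be trained on."""
    images = images.requires_grad_(True)
    logits = model(images)
    score = logits.gather(1, labels.unsqueeze(1)).sum()
    grads, = torch.autograd.grad(score, images, create_graph=True)
    # Collapse channels into a single non-negative attention map per image.
    return grads.abs().sum(dim=1)

def joint_loss(model, images, labels, human_masks, lam=1.0):
    """Cross-entropy plus a penalty on attention mass outside the human-adjusted mask."""
    logits = model(images)
    ce = F.cross_entropy(logits, labels)
    attn = saliency_map(model, images, labels)
    attn = attn / (attn.sum(dim=(1, 2), keepdim=True) + 1e-8)
    # human_masks: 1 where humans marked the region as relevant, 0 elsewhere.
    misplaced = (attn * (1.0 - human_masks)).sum(dim=(1, 2)).mean()
    return ce + lam * misplaced

if __name__ == "__main__":
    model = SmallCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    images = torch.rand(4, 3, 32, 32)        # stand-in batch of face crops
    labels = torch.randint(0, 2, (4,))
    human_masks = torch.zeros(4, 32, 32)      # toy "relevant region" annotations
    human_masks[:, 8:24, 8:24] = 1.0
    opt.zero_grad()
    loss = joint_loss(model, images, labels, human_masks)
    loss.backward()
    opt.step()
    print(f"joint loss: {loss.item():.4f}")
```

In this kind of formulation, the weight `lam` trades off classification accuracy against attention quality, which mirrors the abstract's framing of jointly maximizing the two rather than optimizing accuracy alone.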