Advances in reinforcement learning (RL) algorithms have made RL agents increasingly capable across many tasks in recent years. However, the vast majority of RL algorithms are not readily interpretable off the shelf. Moreover, prior work has not sufficiently addressed the task of generating explanations for these algorithms in the form of human language. Human language explanations are easy to understand and can increase end users' satisfaction with a product. In this paper, we propose a method for generating explanations in the form of free-text human language to help end users better understand the behaviors of RL agents. Our work covers generating explanations for both single actions and sequences of actions. We also release an open dataset as a baseline for future research. Our proposed method is evaluated in two simulated environments: Pong and the Minimalistic Gridworld Environment (MiniGrid). The results demonstrate that our models consistently generate accurate rationales that are highly correlated with expert rationales. Hence, this work offers a step toward bridging the trust gap encountered when employing RL agents in virtual or real-world applications.
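The abstract does not describe the model architecture, so the following is only a hypothetical sketch of the task's input/output shape: mapping a gridworld observation and a single agent action to a free-text rationale. The `Observation` fields, action set, and templates are illustrative assumptions, not the authors' learned generator.

```python
# Hypothetical illustration only: the paper describes a learned language
# generator; this toy uses hand-written templates to show what per-action
# rationale generation looks like for a MiniGrid-style agent.

from dataclasses import dataclass

@dataclass
class Observation:
    agent_pos: tuple      # (x, y) grid coordinates of the agent (assumed fields)
    goal_pos: tuple       # (x, y) grid coordinates of the goal cell
    blocked_ahead: bool   # True if the cell in front of the agent is a wall

ACTIONS = {0: "turn left", 1: "turn right", 2: "move forward"}

def rationale(obs: Observation, action: int) -> str:
    """Return a short free-text rationale for a single action."""
    name = ACTIONS[action]
    if action == 2 and obs.blocked_ahead:
        return f"The agent chose to {name} despite a wall directly ahead."
    # Manhattan distance from agent to goal, used to ground the explanation.
    gap = abs(obs.goal_pos[0] - obs.agent_pos[0]) + abs(obs.goal_pos[1] - obs.agent_pos[1])
    if action == 2:
        return f"The agent moves forward to close the {gap}-cell gap to the goal."
    return f"The agent turns ({name}) to face the goal before advancing."

obs = Observation(agent_pos=(1, 1), goal_pos=(4, 1), blocked_ahead=False)
print(rationale(obs, 2))
# -> "The agent moves forward to close the 3-cell gap to the goal."
```

Generating rationales for sequences of actions, as the paper also does, would extend this interface to take a trajectory of (observation, action) pairs rather than a single pair.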