LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Behavior Decision of Mobile Robot With a Neurophysiologically Motivated Reinforcement Learning Model

Photo by hajjidirir from unsplash

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with the real-world applications, such as the behavioral decision making in robotics. How to balance the exploration and… Click to show full abstract

Online model-free reinforcement learning (RL) approaches play a crucial role in coping with the real-world applications, such as the behavioral decision making in robotics. How to balance the exploration and exploitation processes is a central problem in RL. A balanced ratio of exploration/exploitation has a great influence on the total learning time and the quality of the learned strategy. Therefore, various action selection policies have been presented to obtain a balance between the exploration and exploitation procedures. However, these approaches are rarely, automatically, and dynamically regulated to the environment variations. One of the most amazing self-adaptation mechanisms in animals is their capacity to dynamically switch between exploration and exploitation strategies. This article proposes a novel neurophysiologically motivated model which simulates the role of medial prefrontal cortex (MPFC) and lateral prefrontal cortex (LPFC) in behavior decision. The sensory input is transmitted to the MPFC, then the ventral tegmental area (VTA) receives a reward and calculates a dopaminergic reinforcement signal, and the feedback categorization neurons in anterior cingulate cortex (ACC) calculate the vigilance according to the dopaminergic reinforcement signal. Then the vigilance is transformed to LPFC to regulate the exploration rate, finally the exploration rate is transmitted to thalamus to calculate the corresponding action probability. This action selection mechanism is introduced to the actor–critic model of the basal ganglia, combining with the cerebellum model based on the developmental network to construct a new hybrid neuromodulatory model to select the action of the agent. Both the simulation comparison with other four traditional action selection policies and the physical experiment results demonstrate the potential of the proposed neuromodulatory model in action selection.

Keywords: action; exploration; reinforcement learning; decision; model; exploration exploitation

Journal Title: IEEE Transactions on Cognitive and Developmental Systems
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.