Reinforcement learning (RL) can be used to design smart driving policies for complex situations in which traditional methods fail. However, RL policies are frequently black-box in nature, and the resulting policy may perform poorly, including in scenarios where few training cases are available. In this paper, we propose a method to use RL under two conditions: (i) the RL policy works together with a baseline rule-based driving policy; and (ii) the RL policy intervenes only when the rule-based method appears to have difficulty handling the situation and the confidence of the RL policy is high. Our motivation is to use an RL policy that is not fully trained to reliably improve autonomous vehicle (AV) performance. The confidence of the policy is evaluated with the Lindeberg-Lévy theorem, using the data distribution recorded during training. The overall framework is named “confidence-aware reinforcement learning” (CARL). The condition for switching between the RL policy and the baseline policy is analyzed and presented. Driving in a two-lane roundabout scenario is used as the application case study. Simulation results show that the proposed method outperforms both the pure RL policy and the baseline rule-based policy.
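To make the switching idea concrete, the following is a minimal sketch of a confidence-gated choice between two policies. It is not the paper's actual criterion, which the abstract does not spell out. It assumes that confidence is expressed as a one-sided lower bound on the mean recorded return of the RL policy, obtained from the normal approximation that the Lindeberg-Lévy theorem justifies, and that the RL policy is activated only when this bound exceeds the baseline's mean return. The names `lower_confidence_bound` and `choose_policy`, the 0.95 confidence level, and the use of returns as the recorded quantity are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def lower_confidence_bound(returns, confidence=0.95):
    """One-sided lower bound on the mean return (normal approximation).

    Assumption: the recorded returns are i.i.d. with finite variance, so the
    sample mean is approximately normal (Lindeberg-Levy theorem).
    """
    n = len(returns)
    if n < 2:
        # Too few samples to estimate a variance: treat as zero confidence.
        return -math.inf
    z = NormalDist().inv_cdf(confidence)
    return mean(returns) - z * stdev(returns) / math.sqrt(n)


def choose_policy(rl_returns, baseline_returns, confidence=0.95):
    """Pick 'rl' only if its lower bound beats the baseline's mean return.

    Hypothetical switching rule for illustration; otherwise fall back to
    the rule-based baseline.
    """
    if not baseline_returns:
        return "baseline"
    if lower_confidence_bound(rl_returns, confidence) > mean(baseline_returns):
        return "rl"
    return "baseline"
```

With many consistent RL samples the bound stays close to the sample mean and the RL policy can take over. With few or highly variable samples the bound drops and the rule falls back to the baseline, which mirrors the intent of intervening only when confidence is high.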
               