Learning from Oracle Demonstrations – A new approach to develop Autonomous Intersection Management control algorithms based on Multi-Agent Deep Reinforcement Learning

Worldwide, many companies are working towards safe and innovative control systems for Autonomous Vehicles (AVs). A key component is Autonomous Intersection Management (AIM), which operates at the level of traffic intersections and manages the right-of-way of AVs, improving flow and safety. AIM traditionally uses control policies based on simple rules. However, Deep Reinforcement Learning (DRL) can provide advanced control policies, with the advantage of anticipating hazardous situations and reacting proactively. The main drawback of DRL is training time, which is short for simple tasks but not negligible for real-world problems with multiple agents. Learning from Demonstrations (LfD) addresses this problem by speeding up training significantly and easing the exploration problem. The challenge is that LfD requires an expert from which new demonstrations can be extracted. Therefore, in this paper, we propose to use an agent, previously trained by imitation learning, to act as the expert for LfD. We name this agent the Oracle, and we call our approach Learning from Oracle Demonstrations (LfOD). We have implemented this method on top of the DRL algorithm TD3, with significant changes to TD3 that allow the use of Oracle demonstrations. The complete version is called TD3fOD. The results obtained in the AIM training scenario show that TD3fOD notably improves the learning process compared with TD3 and DDPGfD, speeding up learning by 5–6 times, while the policy found offers both significantly lower variance and better learning ability. The testing scenario also shows a significant improvement in multiple key performance metrics compared with other vehicle control techniques for AIM, such as a reduction in waiting time of more than 90% and a significant decrease in fuel or electricity consumption and emissions, highlighting the benefits of LfOD.
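The abstract does not describe how TD3fOD feeds Oracle demonstrations into training, so the following is only a rough illustration of the general idea behind demonstration-based off-policy learning, not the authors' implementation. It assumes a replay buffer that stores Oracle transitions permanently and mixes them with the learner's own transitions at a fixed ratio. The class `MixedReplayBuffer`, the function `collect_oracle_demos`, the `demo_fraction` parameter, and the `oracle_policy` and `step_fn` callables are all hypothetical names introduced for this sketch. Fixed-ratio sampling is itself a simplifying assumption; published methods may weight demonstrations differently, for example through prioritized replay.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical buffer: oracle demonstrations are kept forever,
    agent transitions live in a bounded FIFO queue."""

    def __init__(self, agent_capacity, demo_fraction=0.25, seed=0):
        self.demos = []                            # never evicted
        self.agent = deque(maxlen=agent_capacity)  # oldest entries dropped first
        self.demo_fraction = demo_fraction         # assumed share of each batch
        self.rng = random.Random(seed)

    def add_demo(self, transition):
        self.demos.append(transition)

    def add_agent(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        """Draw a batch (with replacement) mixing both sources."""
        if not self.demos and not self.agent:
            raise ValueError("buffer is empty")
        if not self.agent:
            n_demo = batch_size          # only demonstrations available
        elif not self.demos:
            n_demo = 0                   # only agent experience available
        else:
            n_demo = int(batch_size * self.demo_fraction)
        batch = [self.rng.choice(self.demos) for _ in range(n_demo)]
        batch += [self.rng.choice(self.agent) for _ in range(batch_size - n_demo)]
        return batch


def collect_oracle_demos(buffer, oracle_policy, step_fn, initial_state, n_steps):
    """Roll out the oracle and store its transitions as demonstrations.

    oracle_policy(state) -> action
    step_fn(state, action) -> (next_state, reward, done)
    Both callables stand in for a trained oracle and a simulator.
    """
    state = initial_state
    for _ in range(n_steps):
        action = oracle_policy(state)
        next_state, reward, done = step_fn(state, action)
        buffer.add_demo((state, action, reward, next_state, done))
        state = initial_state if done else next_state
```

In a sketch like this, an off-policy learner such as TD3 would call `sample` at each update step, so that early batches are dominated by Oracle behaviour before the agent has gathered useful experience of its own.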

Keywords: reinforcement learning; deep reinforcement; control; intersection management; oracle demonstrations; autonomous intersection

Journal Title: IEEE Access
Year Published: 2022
