
Visuomotor Reinforcement Learning for Multirobot Cooperative Navigation



This article investigates the multirobot cooperative navigation problem based on raw visual observations. A fully end-to-end learning framework is presented, which leverages graph neural networks to learn local motion coordination and utilizes deep reinforcement learning to generate a visuomotor policy that enables each robot to move to its goal without an environment map or global positioning information. Experimental results show that, with a few tens of robots, our approach achieves performance comparable to state-of-the-art imitation-learning-based approaches that use bird's-eye-view state inputs. We also illustrate the generalizability of our approach to crowded and large environments and its scalability to ten times the number of training robots. In addition, we demonstrate that our model trained for the multirobot case can also improve the success rate of single-robot navigation in unseen environments.

Note to Practitioners—With the development of intelligent industrial and logistic systems, robotic transportation systems are widely deployed. However, existing multirobot path coordination and navigation approaches generally rest on assumptions that are hard to satisfy in practical scenarios. This article aims to promote the real-world application of learning-based multirobot cooperative navigation, as follows. First, we introduce an end-to-end reinforcement learning framework instead of the commonly used imitation learning strategy, as the latter needs exhaustive training data to cover all scenarios and lacks the required generalizability. Second, we directly use raw sensor data instead of the commonly used bird's-eye-view semantic observations, as the latter are generally not representative of practical application scenarios from the robot's perspective and cannot handle occlusion. Third, we interpret our learned model to illustrate which parts of the input and shared observations contribute most to the robots' final actions. This interpretability ensures the predictability (and thus safety) of our visuomotor policy in practical applications. Our learned visuomotor policy can coordinate dozens of robots using only raw visual observations in unknown environments, without a map or global localization information; to our knowledge, this is the first such result in the literature. Our future work includes addressing the sim-to-real issue and conducting physical experiments.

Manuscript received June 25, 2021; accepted August 24, 2021. This article was recommended for publication by Associate Editor T. Xu and Editor D. O. Popa upon evaluation of the reviewers' comments. This work was supported in part by the Natural Science Foundation of China under Grant 62073222 and Grant U1913204; in part by the Shanghai Municipal Education Commission and Shanghai Education Development Foundation through the "Shu Guang" Project under Grant 19SG08; in part by the Shenzhen Science and Technology Program under Grant JSGG20201103094400002; and in part by the Science and Technology Commission of Shanghai Municipality under Grant 21511101900. (Zhe Liu and Qiming Liu contributed equally to this work.) (Corresponding author: Hesheng Wang.)

Zhe Liu is with the Department of Computer Science and Technology, University of Cambridge, Cambridge CB2 1TN, U.K. (e-mail: [email protected]). Qiming Liu, Ling Tang, and Hongye Wang are with the Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]). Kefan Jin is with the MOE Key Laboratory of Marine Intelligent Equipment and System and the State Key Laboratory of Ocean Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]). Ming Liu is with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: [email protected]). Hesheng Wang is with the Department of Automation, Key Laboratory of System Control and Information Processing of Ministry of Education, Key Laboratory of Marine Intelligent Equipment and System of Ministry of Education, Shanghai Engineering Research Center of Intelligent Control and Management, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TASE.2021.3114327. Digital Object Identifier 10.1109/TASE.2021.3114327
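The abstract describes a pipeline in which each robot embeds its raw visual observation, exchanges information with neighbors through a graph neural network, and maps the result to a motor command. The following is a minimal sketch of that general pattern (one round of mean-aggregated message passing followed by a per-robot policy head), not the authors' actual network: all names, dimensions, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
FEAT = 16  # per-robot visual feature embedding size
ACT = 2    # action: e.g., (linear velocity, angular velocity)

# Random weights stand in for trained parameters.
W_msg = rng.normal(scale=0.1, size=(FEAT, FEAT))      # message transform
W_upd = rng.normal(scale=0.1, size=(2 * FEAT, FEAT))  # update transform
W_pi = rng.normal(scale=0.1, size=(FEAT, ACT))        # policy head

def gnn_policy(features, adjacency):
    """One round of message passing, then a per-robot policy head.

    features:  (N, FEAT) array, one visual embedding per robot
    adjacency: (N, N) 0/1 matrix; adjacency[i, j] = 1 if robot j is a
               communication neighbor of robot i
    returns:   (N, ACT) array of actions, one row per robot
    """
    # Each neighbor broadcasts a transformed copy of its embedding.
    messages = np.tanh(features @ W_msg)                    # (N, FEAT)
    # Mean-aggregate incoming messages; guard isolated robots (degree 0).
    deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1)
    aggregated = (adjacency @ messages) / deg               # (N, FEAT)
    # Combine own embedding with the aggregate, then emit an action.
    combined = np.concatenate([features, aggregated], axis=1)
    hidden = np.tanh(combined @ W_upd)
    return np.tanh(hidden @ W_pi)

# Three robots; robots 0 and 1 are neighbors, robot 2 is isolated.
feats = rng.normal(size=(3, FEAT))
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]], dtype=float)
actions = gnn_policy(feats, adj)
print(actions.shape)  # one 2-D action per robot
```

In a trained system the weights would be learned end to end by deep reinforcement learning, and the graph would be rebuilt at each timestep from which robots are within communication or sensing range.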

Keywords: multirobot cooperative navigation; reinforcement learning; visuomotor policy

Journal Title: IEEE Transactions on Automation Science and Engineering
Year Published: 2021



