TVENet: Transformer-based Visual Exploration Network for Mobile Robot in Unseen Environment

This paper presents a Transformer-based Visual Exploration Network (TVENet) as a solution to active perception problems, in particular the visual exploration problem: how can a robot equipped with a camera explore an unknown 3D environment? TVENet consists of a Mapper, a Global Policy and a Local Policy. The Mapper is trained by supervised learning to take visual observations as input and generate an occupancy grid map of the explored environment. The Global Policy and the Local Policy are trained by reinforcement learning to make navigation decisions. Most state-of-the-art methods in the visual exploration domain use ResNet as the feature extractor, and few of them pay attention to the extractor's extraction capability. This paper therefore focuses on enhancing that capability and proposes a Transformer-based Feature Pyramid Module (TFPM). In addition, two training tricks are introduced to improve performance. Experiments in a photo-realistic simulated environment (Habitat) demonstrate the higher mapping accuracy of TVENet. The experimental results show that the TFPM and the training tricks have a positive impact on the mapping accuracy of visual exploration, increasing it by 5.31% compared with the state of the art. Most importantly, TVENet is deployed on a real robot (NVIDIA Jetbot) to demonstrate the feasibility of Embodied AI approaches. To the authors’ knowledge, this paper is the first to demonstrate the viability of the Embodied AI style approach for visual exploration tasks and to deploy the pre-trained model on the NVIDIA Jetson robot.
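
The division of labour between the three components named in the abstract can be illustrated with a toy exploration loop. This is only a sketch under strong assumptions, not the authors' implementation: the learned Mapper is replaced by a function that reveals the 3x3 patch of a known grid around the robot, the Global Policy by a nearest-unexplored-cell heuristic, and the Local Policy by a greedy one-cell step. All names (`mapper`, `global_policy`, `local_policy`, `explore`, `coverage`) and the grid encoding are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple

UNKNOWN, FREE, OCCUPIED = -1, 0, 1   # hypothetical cell encoding
Grid = List[List[int]]
Cell = Tuple[int, int]               # (row, col)


def mapper(world: Grid, grid: Grid, pose: Cell) -> None:
    """Stand-in Mapper: copy the 3x3 patch around the robot into the map.

    Assumption: the real Mapper is a trained network that predicts
    occupancy from camera images; here the ground truth is read directly.
    """
    r0, c0 = pose
    for r in range(r0 - 1, r0 + 2):
        for c in range(c0 - 1, c0 + 2):
            if 0 <= r < len(world) and 0 <= c < len(world[0]):
                grid[r][c] = world[r][c]


def global_policy(grid: Grid, pose: Cell) -> Optional[Cell]:
    """Stand-in Global Policy: nearest unexplored cell (Manhattan distance)."""
    unknown = [(r, c) for r, row in enumerate(grid)
               for c, v in enumerate(row) if v == UNKNOWN]
    if not unknown:
        return None  # nothing left to explore
    return min(unknown,
               key=lambda cell: abs(cell[0] - pose[0]) + abs(cell[1] - pose[1]))


def local_policy(grid: Grid, pose: Cell, goal: Cell) -> Cell:
    """Stand-in Local Policy: one greedy step toward the goal.

    Moves vertically first, then horizontally, and only into cells the
    map marks as free. Returns the current pose if both moves are blocked.
    """
    r, c = pose
    dr = (goal[0] > r) - (goal[0] < r)   # -1, 0 or +1
    dc = (goal[1] > c) - (goal[1] < c)
    for nr, nc in ((r + dr, c), (r, c + dc)):
        if (nr, nc) != pose and grid[nr][nc] == FREE:
            return (nr, nc)
    return pose


def explore(world: Grid, start: Cell, max_steps: int = 50) -> Grid:
    """Run the map -> long-term goal -> short-term action loop."""
    grid = [[UNKNOWN] * len(world[0]) for _ in world]
    pose = start
    for _ in range(max_steps):
        mapper(world, grid, pose)
        goal = global_policy(grid, pose)
        if goal is None:
            break
        pose = local_policy(grid, pose, goal)
    return grid


def coverage(grid: Grid) -> float:
    """Fraction of cells that are no longer UNKNOWN."""
    known = sum(v != UNKNOWN for row in grid for v in row)
    return known / (len(grid) * len(grid[0]))


if __name__ == "__main__":
    empty_room = [[FREE] * 5 for _ in range(5)]
    print(coverage(explore(empty_room, start=(0, 0))))
```

The greedy local step can stall behind obstacles, which is one reason a learned or planned local policy is used in practice; the sketch is meant only to show how the map produced by one component feeds the decisions of the other two.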

Keywords: environment; tvenet; transformer based; exploration; robot; visual exploration

Journal Title: IEEE Access
Year Published: 2022
