The unmanned underwater vehicle (UUV) is widely used in marine operations, where path planning and trajectory tracking are the key technologies for autonomous motion planning. Unlike previous methods, this article proposes two algorithms based on asynchronous multithreading proximal policy optimization (AMPPO): AMPPO-based path planning (AMPPO-PP) and AMPPO-based trajectory tracking (AMPPO-TT), and applies them to different UUV task scenarios. By leveraging AMPPO, the expensive online computation is shifted to an offline training process. The proposed algorithms enable the UUV to learn autonomous planning, tracking, and emergency obstacle avoidance. The architectures of AMPPO-PP and AMPPO-TT are also described in detail. Reward sparsity is avoided by assigning a reward at every time step and applying reward shaping, and a goal-distance heuristic reward function encourages the UUV to explore in the direction of the goal. Simulation environments of increasing complexity are developed, and multiple comparative experiments verify the effectiveness of the proposed algorithms.
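The abstract does not give the exact reward function, so the following is only a minimal sketch of what a dense, goal-distance heuristic reward with reward shaping might look like. It assumes 2D positions, a potential-based shaping term $\gamma\Phi(s') - \Phi(s)$ with $\Phi(s) = -\lVert p - g \rVert$, and a small per-step cost. The function names (`goal_distance`, `shaped_step_reward`) and all constants (goal radius, goal bonus, collision penalty, step cost) are hypothetical illustrations, not values from the article.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def goal_distance(pos: Point, goal: Point) -> float:
    """Euclidean distance from the vehicle position to the goal."""
    return math.hypot(goal[0] - pos[0], goal[1] - pos[1])


def shaped_step_reward(
    pos: Point,
    next_pos: Point,
    goal: Point,
    collided: bool,
    gamma: float = 0.99,             # assumed discount factor
    goal_radius: float = 1.0,        # hypothetical "goal reached" threshold
    goal_bonus: float = 100.0,       # hypothetical terminal reward
    collision_penalty: float = -100.0,  # hypothetical terminal penalty
    step_cost: float = -0.1,         # hypothetical per-step cost
) -> Tuple[float, bool]:
    """Hypothetical dense per-step reward for a goal-reaching UUV task.

    Returns (reward, done). Every non-terminal step gets a nonzero signal,
    so the agent is not limited to a sparse reward at the goal.
    """
    # Terminal cases: collision with an obstacle or arrival at the goal.
    if collided:
        return collision_penalty, True
    if goal_distance(next_pos, goal) <= goal_radius:
        return goal_bonus, True

    # Potential-based shaping with phi(s) = -distance(s, goal):
    # the term is positive when the vehicle moves toward the goal.
    phi = -goal_distance(pos, goal)
    phi_next = -goal_distance(next_pos, goal)
    shaping = gamma * phi_next - phi
    return step_cost + shaping, False


if __name__ == "__main__":
    goal = (10.0, 0.0)
    print(shaped_step_reward((0.0, 0.0), (1.0, 0.0), goal, False))   # toward goal
    print(shaped_step_reward((0.0, 0.0), (-1.0, 0.0), goal, False))  # away from goal
```

With the goal at `(10, 0)`, a step from `(0, 0)` to `(1, 0)` earns roughly `+0.99`, while a step to `(-1, 0)` earns roughly `-0.99`, so the shaped reward steers exploration toward the goal at every time step. A potential-based form is used here because it adds a distance-driven signal without introducing a separate hand-tuned progress reward; the article's actual formulation may differ.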
               
Click one of the above tabs to view related content.