Reinforcement learning can achieve excellent performance in robotic grasping when the grasping target is stable. In real-world applications, however, a robot must cope with a complex working environment and many different types of target objects, which makes it harder to maintain the quality of action planning, even within the same scene. To enable an agent to plan actions more adaptively, this article applies the deep attentive deterministic policy gradient (DADPG) algorithm. An attention region proposal network selects information about the pre-exploration area, and this information is then processed by an adaptive exploration method that adjusts the policy as the target changes. Furthermore, a stratified reward function is defined according to the distance between the end effector and the center of the pre-exploration area; it reduces the negative influence of the miscellaneous information introduced by the sparse reward matrix. The results show that DADPG produces a robust policy under noise interference and trains more efficiently owing to the stratified reward function.
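To illustrate how a distance-based stratified reward differs from a sparse one, the following is a minimal sketch, not the article's reward function. A sparse reward gives a signal only when the grasp succeeds, whereas a tiered reward gives the agent intermediate feedback as the end effector approaches the center of the pre-exploration area. The tier thresholds, reward values, the far-away penalty, and the name `stratified_reward` are all hypothetical assumptions; the abstract does not specify them.

```python
import math
from typing import Sequence

# Hypothetical tiers: (maximum distance in metres, reward for that tier).
# These values are illustrative assumptions, not taken from the article.
TIERS = ((0.02, 1.0), (0.05, 0.5), (0.10, 0.1))
# Hypothetical reward when the end effector is outside every tier.
FAR_PENALTY = -0.1


def stratified_reward(end_effector: Sequence[float],
                      region_center: Sequence[float]) -> float:
    """Return a tiered reward based on the Euclidean distance between the
    end effector and the center of the pre-exploration area (hypothetical)."""
    d = math.dist(end_effector, region_center)
    # Tiers are checked from the closest band outward, so the first match
    # is the tightest band that contains the end effector.
    for threshold, reward in TIERS:
        if d <= threshold:
            return reward
    return FAR_PENALTY


print(stratified_reward((0.01, 0.0, 0.0), (0.0, 0.0, 0.0)))  # 1.0
print(stratified_reward((0.2, 0.0, 0.0), (0.0, 0.0, 0.0)))   # -0.1
```

Checking tiers from the innermost band outward keeps the reward monotonic in distance, so moving closer to the region center never lowers the reward.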
               
Click one of the above tabs to view related content.