In this brief, a parallel Deep Deterministic Policy Gradient (DDPG) algorithm is presented for biped robot gait control. Biped robot gait control is a high-dimensional continuous problem, and obtaining a fast and stable gait is challenging. Traditional methods cannot fully exploit the autonomous exploration capability of a biped robot. A multiple Actor-Critic (AC) network is established to expand the scope of exploration and improve training efficiency. To optimize the experience replay mechanism, an experience filtering unit is introduced, and a cosine similarity method is used to classify experiences. Then, a Markov Decision Process (MDP) model based on knowledge and experience is designed to address the problem of sparse rewards. Finally, experimental results show that the parallel DDPG algorithm enables the biped robot to walk faster and more stably, reaching a speed of 0.62 m/s.
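The abstract does not specify how the experience filtering unit uses cosine similarity. As a rough illustration only, the sketch below assumes one plausible reading: each transition is flattened into a vector (state, action and reward concatenated), and a new transition is classified as redundant and discarded if its cosine similarity to any stored transition reaches a threshold. The class name `ExperienceFilter`, the vector layout and the 0.95 threshold are all hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ExperienceFilter:
    """Hypothetical filtering unit: stores a transition only if it is not
    too similar to any transition already stored."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cut-off, not from the paper
        self.buffer: List[List[float]] = []

    def add(self, state: Sequence[float], action: Sequence[float], reward: float) -> bool:
        # Assumption: an experience is represented by concatenating state, action and reward.
        vec = list(state) + list(action) + [reward]
        if any(cosine_similarity(vec, old) >= self.threshold for old in self.buffer):
            return False  # classified as redundant (near-duplicate) and discarded
        self.buffer.append(vec)  # classified as novel and kept for replay
        return True


if __name__ == "__main__":
    f = ExperienceFilter(threshold=0.95)
    print(f.add([1.0, 0.0], [0.5], 1.0))   # first transition is always kept
    print(f.add([1.0, 0.0], [0.5], 1.0))   # identical transition is filtered out
    print(f.add([0.0, 1.0], [-0.5], 0.0))  # dissimilar transition is kept
```

Filtering near-duplicates in this way keeps the replay buffer from being dominated by nearly identical transitions, which is one common motivation for experience filtering; the paper's actual classification criterion may differ.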