
Anti-Martingale Proximal Policy Optimization.



In on-policy deep reinforcement learning (DRL), the sample data collected in one exploration process can be used to update the network parameters only once, so high sample efficiency is essential to accelerate training. The proposed method first derives a submartingale criterion from the equivalence between the optimal policy and a martingale, and then introduces an advanced value iteration (AVI) method that performs value iteration with high accuracy. On this foundation, an anti-martingale (AM) reinforcement learning framework is established to efficiently select the sample data that is most conducive to policy optimization. An AM proximal policy optimization (AMPPO) method, which combines the AM framework with proximal policy optimization (PPO), is then proposed to accelerate the value updates of states that satisfy the submartingale criterion. Experimental results on the MuJoCo platform show that AMPPO achieves better performance than several state-of-the-art DRL methods.
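As a rough illustration of the idea described above (not the authors' implementation, which is specified in the full paper), the sketch below combines the standard PPO clipped surrogate with a submartingale-style sample filter. The concrete filtering rule used here, keeping transitions whose state value is non-decreasing, and all function and variable names are assumptions introduced for illustration only.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)

def submartingale_mask(values, next_values):
    """Keep transitions whose state value does not decrease;
    a simplified stand-in for the paper's submartingale criterion."""
    return next_values >= values

# Toy batch: per-transition importance ratios, advantages, and values.
ratio     = np.array([1.1, 0.8, 1.3, 0.95])
advantage = np.array([0.5, -0.2, 1.0, 0.1])
values    = np.array([1.0, 2.0, 0.5, 1.5])
next_vals = np.array([1.2, 1.8, 0.9, 1.5])

# Select only the samples satisfying the (illustrative) criterion,
# then average the clipped surrogate over the retained transitions.
mask = submartingale_mask(values, next_vals)   # second sample is dropped
objective = ppo_clip_loss(ratio[mask], advantage[mask]).mean()
```

In this toy batch, the transition where the state value falls (2.0 to 1.8) is excluded before the PPO update, mirroring the idea of concentrating updates on samples consistent with the submartingale criterion.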

Keywords: policy optimization; martingale proximal; proximal policy; policy; anti martingale

Journal Title: IEEE Transactions on Cybernetics
Year Published: 2022

