Published in 2022 in "IEEE Transactions on Cybernetics"
DOI: 10.1109/tcyb.2022.3170355
Abstract: Since the sample data collected in one exploration process can be used to update the network parameters only once in on-policy deep reinforcement learning (DRL), high sample efficiency is necessary to accelerate the training process of…
Keywords:
policy optimization;
martingale proximal;
proximal policy;
policy ...
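The abstract's premise is that on-policy DRL can use each batch of exploration data for only a single parameter update. For context only, here is a minimal sketch of the clipped surrogate objective from standard proximal policy optimization (PPO), a well-known baseline for reusing a batch across several updates. It is not the method proposed in this paper, whose details are not given here. The function name `ppo_clip_objective` and the default `eps = 0.2` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_objective(logp_new: Sequence[float],
                       logp_old: Sequence[float],
                       advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration: logp_new / logp_old are the
    log-probabilities of the taken actions under the current policy and
    under the policy that collected the data; advantages are estimates A_t.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Identical policies: ratio = 1, so the objective is the mean advantage.
    print(ppo_clip_objective([0.0, 0.0], [0.0, 0.0], [1.0, -1.0]))
    # Ratio 2 with a positive advantage is clipped to 1 + eps.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

In standard PPO this objective is usually maximized with several epochs of minibatch gradient ascent on the same collected batch. The clip removes any incentive to push the ratio beyond 1 ± eps, which keeps the updated policy close to the one that generated the samples.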