Published in 2022 in "IEEE Transactions on Neural Networks and Learning Systems"
DOI: 10.1109/tnnls.2022.3215596
Abstract: Deep off-policy actor-critic algorithms have been successfully applied to challenging tasks in continuous control. However, these methods typically suffer from the poor sample efficiency problem, limiting their widespread adoption in real-world domains. To mitigate this…
Keywords:
pessimistic value;
value estimation;
policy;
value ...