LAUSR: distributed policy

Distributed Off-Policy Temporal Difference Learning Using Primal-Dual Method

Published in 2022 at "IEEE Access"

DOI: 10.1109/access.2022.3211395

Abstract: The goal of this paper is to provide theoretical analysis and additional insights on a distributed temporal-difference (TD)-learning algorithm for the multi-agent Markov decision processes (MDPs) via saddle-point viewpoints. The (single-agent) TD-learning is a reinforcement…

Keywords: temporal difference; policy temporal; policy; distributed policy ...
