LAUSR: mpq learning

Photo from archive.org

A temporal difference method for multi-objective reinforcement learning

Sign Up to like & get
recommendations!
0 Published in 2017 at "Neurocomputing"

DOI: 10.1016/j.neucom.2016.10.100

Abstract: Abstract This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning… read more here.

Keywords: method multi; temporal difference; multi objective; mpq learning ... See more keywords

LAUSR

You are not signed in:

Sign Up!

A temporal difference method for multi-objective reinforcement learning