Sign Up to like & get
recommendations!
0
Published in 2017 at "Neurocomputing"
DOI: 10.1016/j.neucom.2016.10.100
Abstract: Abstract This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning…
read more here.
Keywords:
method multi;
temporal difference;
multi objective;
mpq learning ... See more keywords