Published in 2017 at "Neurocomputing"
DOI: 10.1016/j.neucom.2016.08.152
Abstract: For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering),…
Keywords:
optimal multiobjective;
Pareto optimal;
approaches Pareto;
reinforcement learning ...