A temporal difference method for multi-objective reinforcement learning

Abstract: This work describes MPQ-learning, an algorithm that approximates the set of all deterministic non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. MPQ-learning directly generalizes the ideas of Q-learning to the multi-objective case. It can be applied to non-convex Pareto frontiers and finds both supported and unsupported solutions. We present the results of applying MPQ-learning to several benchmark problems. The algorithm solves these problems successfully, demonstrating the feasibility of the approach. We also compare MPQ-learning to a standard linearization procedure that computes only supported solutions, and show that in some cases MPQ-learning can be as effective as the scalarization method.
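No code accompanies the abstract, but the two ingredients it names, vector-valued value estimates and Pareto non-domination, can be sketched briefly. The Python below is a rough, hypothetical illustration only: the function names (dominates, pareto_filter, mpq_style_update), the data layout for Q, and the update rule are assumptions for illustration, not the authors' actual MPQ-learning algorithm.

```python
import numpy as np

def dominates(u, v):
    """True if vector u Pareto-dominates v: >= in every objective, > in at least one."""
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_filter(vectors):
    """Keep only the non-dominated vectors from a list of value vectors."""
    return [u for u in vectors
            if not any(dominates(v, u) for v in vectors if v is not u)]

def mpq_style_update(Q, s, a, reward, s2, alpha=0.1, gamma=0.9):
    """One illustrative vector-valued TD update (hypothetical, not the paper's exact rule).

    Q maps state -> action -> list of non-dominated value-vector estimates
    (each a NumPy array with one component per objective). The TD target set
    is built from the non-dominated vectors of the successor state, and the
    updated estimates are filtered again for non-domination.
    """
    successors = pareto_filter([v for vs in Q[s2].values() for v in vs])
    targets = [reward + gamma * v for v in successors] or [reward]
    updated = [q + alpha * (t - q) for q in Q[s][a] for t in targets]
    Q[s][a] = pareto_filter(updated or targets)
```

For contrast, the linearization baseline mentioned in the abstract scalarizes the vector reward as a weighted sum w·r for a fixed weight vector w and then runs ordinary scalar Q-learning; that recovers only supported solutions (those on the convex hull of the Pareto front), whereas a non-domination filter like the one sketched above can in principle also retain unsupported solutions on a non-convex frontier.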

Keywords: temporal difference method; multi-objective reinforcement learning; MPQ-learning

Journal Title: Neurocomputing
Year Published: 2017
