"A Task-Agnostic Regularizer for Diverse Subpolicy Discovery in Hierarchical Reinforcement Learning"

Automatic subpolicy discovery in hierarchical reinforcement learning (HRL) has recently achieved promising performance on sparse-reward tasks. It can accelerate transfer learning and unsupervised learning for intelligent agents while removing the need for domain-specific knowledge. Most previous approaches, however, have been shown to collapse into a situation where one subpolicy dominates the whole task, because they cannot ensure diversity among subpolicies. In contrast, this article proposes a task-agnostic regularizer (TAR) for learning diverse subpolicies in HRL. Specifically, we first formulate the discovery of diverse subpolicies as a trajectory inference problem and then propose a corresponding information-theoretic objective to encourage diversity. To keep the objective computable, we then instantiate it in two simplified forms, one for discrete and one for continuous action spaces. We evaluate the proposed diversity-driven regularizer on three HRL task domains: 1) meta reinforcement learning; 2) hierarchical policy learning in the option framework; and 3) unsupervised subpolicy discovery. The results show that TAR improves upon state-of-the-art performance in all three HRL domains without modifying any existing hyperparameters, indicating the wide applicability and robustness of our approach.
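The abstract does not state the objective itself, so the following is only an illustrative sketch of what an information-theoretic diversity term can look like for a discrete action space; it is not the TAR objective from the article. It assumes K subpolicies chosen with a uniform prior and computes, at a single state, the mutual information between the subpolicy index and the action, which equals the generalized Jensen-Shannon divergence among the subpolicies' action distributions. The names `entropy` and `diversity_bonus` are hypothetical.

```python
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis` (probabilities clipped for log stability)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=axis)


def diversity_bonus(action_probs):
    """Illustrative diversity term at one state (an assumption, not the paper's TAR).

    action_probs: array-like of shape (K, A); row k is subpolicy k's
    distribution over A discrete actions.
    Returns H(mean of rows) - mean of H(rows), i.e. I(Z; A) under a
    uniform prior over the subpolicy index Z.
    """
    action_probs = np.asarray(action_probs, dtype=float)
    mixture = action_probs.mean(axis=0)          # marginal action distribution
    return entropy(mixture) - entropy(action_probs).mean()


if __name__ == "__main__":
    collapsed = [[0.5, 0.5], [0.5, 0.5]]         # both subpolicies behave identically
    diverse = [[1.0, 0.0], [0.0, 1.0]]           # subpolicies pick different actions
    print(diversity_bonus(collapsed), diversity_bonus(diverse))
```

The term is zero when all subpolicies act identically (the collapse the abstract describes) and grows as their action distributions separate, up to log K. In a training loop, a quantity of this kind would typically be added to the task objective with a weighting coefficient; how the article weights or simplifies its own objective is not given here.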

Keywords: reinforcement learning; hierarchical reinforcement; subpolicy; subpolicy discovery; discovery

Journal Title: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Year Published: 2023

