"Gradient dynamics in reinforcement learning."

Despite the success of statistical mechanics in analyzing supervised learning algorithms, reinforcement learning has remained largely untouched by physicists. Here we move towards closing the gap by analyzing the dynamics of the policy gradient algorithm. For a convex problem, namely the k-armed bandit, we show that the learning dynamics obey a drift-diffusion motion described by a Langevin equation, the coefficients of which can be tuned by the learning rate. We explore the striking similarity between our Langevin equation and the Kimura equation, which describes the evolution of genotypes. Furthermore, we propose a mapping between a nonconvex reinforcement learning setting describing multiple joints of a robotic arm and a disordered system, namely a p-spin glass. This mapping enables us to show how the learning rate acts as an effective temperature and is thus capable of smoothing rough landscapes, corroborating the drift-diffusion description and paving the way for physics-inspired algorithmic optimization based on annealing procedures in disordered systems.
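To make the k-armed bandit setting concrete, the following is a minimal sketch of a policy gradient (REINFORCE-style) update with a softmax policy. It is an illustration only and is not taken from the paper: the softmax parameterization, the Gaussian reward noise, and all names (`softmax`, `policy_gradient_bandit`, `mean_rewards`, `noise_std`) are assumptions. Each update is a noisy step whose size is set by `learning_rate`, which is the sense in which the learning rate can tune both the drift and the diffusion of the parameters.

```python
import math
import random


def softmax(theta):
    """Return softmax probabilities for a list of preferences."""
    m = max(theta)  # subtract the maximum for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def policy_gradient_bandit(mean_rewards, learning_rate=0.05,
                           noise_std=0.1, steps=5000, seed=0):
    """Run stochastic policy gradient on a k-armed bandit.

    Assumed setup (hypothetical, for illustration): arm a pays a
    Gaussian reward with mean mean_rewards[a] and standard deviation
    noise_std; the policy is a softmax over preferences theta.
    """
    rng = random.Random(seed)
    k = len(mean_rewards)
    theta = [0.0] * k  # start from the uniform policy
    for _ in range(steps):
        probs = softmax(theta)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = rng.gauss(mean_rewards[arm], noise_std)
        # For a softmax policy, d log pi(arm) / d theta_a
        # equals 1[a == arm] - probs[a].
        for a in range(k):
            grad_log = (1.0 if a == arm else 0.0) - probs[a]
            theta[a] += learning_rate * reward * grad_log
    return softmax(theta)


if __name__ == "__main__":
    final_policy = policy_gradient_bandit([-1.0, 0.0, 1.0])
    print(final_policy)
```

Rerunning the sketch with different `seed` values and a larger `learning_rate` gives a rough feel for how the spread of the parameter trajectories grows with the step size.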

Keywords: reinforcement learning; equation; dynamics reinforcement; gradient; gradient dynamics

Journal Title: Physical Review E
Year Published: 2022
