An important challenge when using reinforcement learning to learn motions in robotics is the choice of parameterization for the policy. Following the learning-from-demonstration paradigm, we use Gaussian Mixture Regression to extract a parameterization with relevant non-linear features from a set of demonstrations of a motion. The resulting parameterization takes the form of a non-linear, time-invariant dynamical system (DS). We use this time-invariant DS as a parameterized policy for a variant of the PI2 policy search algorithm. This paper contributes an adaptation of PI2 to this time-invariant motion representation and introduces two novel parameter exploration schemes that can be used to (1) sample model parameters so as to achieve uniform exploration in state space and (2) explore while ensuring the stability of the resulting motion model. Additionally, a state-dependent stiffness profile is learned simultaneously with the reference trajectory, and both are used together in a variable impedance control architecture. The learning architecture is validated in a hardware experiment consisting of a digging task on a KUKA LWR platform.
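To make the described pipeline concrete, the following is a minimal Python sketch of the two core ingredients as we understand them from the abstract: a GMR-based time-invariant DS policy (a mixture of K locally linear dynamics, as in SEDS-style models) and a PI2-style probability-weighted parameter update. The function names (`gmr_velocity`, `pi2_update`) and interfaces are hypothetical; the paper's two exploration schemes, stability guarantees, and the variable impedance controller are not reproduced here.

```python
import numpy as np

def gmr_velocity(x, priors, mu, sigma):
    """Time-invariant DS policy x_dot = f(x) via Gaussian Mixture Regression.

    x      : (d,) current state
    priors : (K,) mixing weights
    mu[k]  : (2d,) stacked mean (mu_x, mu_xdot) of component k
    sigma[k]: (2d, 2d) joint covariance of component k
    """
    d = x.shape[0]
    K = len(priors)
    h = np.zeros(K)           # responsibilities h_k(x)
    xd_k = np.zeros((K, d))   # per-component conditional means
    for k in range(K):
        mu_x, mu_xd = mu[k][:d], mu[k][d:]
        s_xx = sigma[k][:d, :d]
        s_dx = sigma[k][d:, :d]
        diff = x - mu_x
        # Gaussian responsibility of component k for the input x
        h[k] = priors[k] * np.exp(-0.5 * diff @ np.linalg.solve(s_xx, diff)) \
               / np.sqrt(np.linalg.det(2.0 * np.pi * s_xx))
        # conditional mean: each component acts as a local linear regressor
        xd_k[k] = mu_xd + s_dx @ np.linalg.solve(s_xx, diff)
    h /= h.sum()
    return h @ xd_k           # mixture of K locally linear dynamics

def pi2_update(theta, eps, costs, lam=0.1):
    """PI2-style update: average the sampled perturbations eps (N, P),
    weighted by the exponentiated negative rollout costs (N,)."""
    S = costs - costs.min()               # shift for numerical stability
    w = np.exp(-S / lam)
    w /= w.sum()
    return theta + (w[:, None] * eps).sum(axis=0)
```

In a full implementation, `theta` would collect the GMM parameters, each rollout would integrate `gmr_velocity` from a start state to produce a trajectory and its cost, and the exploration noise `eps` would be drawn according to one of the two schemes described above rather than isotropically.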