Despite significant advances in few-shot classification, object detection, and speech recognition in recent years, training an effective robot to adapt to previously unseen environments in a small-data regime remains a long-standing problem in learning from demonstrations (LfD). A promising solution is meta-learning. However, we observe that simply building a more complicated and deeper network on top of previous meta-learning methods does not perform as well as expected. One possible reason is that shallow features are gradually lost as the network deepens, even though these shallow features play an essential role in the adaptation process of meta-learning. We therefore present a novel yet effective Multi-Level and Multi-Attention Domain-Adaptive Meta-Learning (MLMA-DAML) framework, which meta-learns multiple visual features via different attention heads to update the model policy. Once the model is updated, MLMA-DAML predicts robot actions (e.g., end-effector positions) via fully connected layers (FCL). Because directly converting visual signals into robot actions via FCL, as in prior methods, is not robust for robot manipulation tasks, we further extend MLMA-DAML to MLMA-DAML++. MLMA-DAML++ learns an effective representation of manipulation tasks via an extra goal prediction network with convolutional layers (CL), predicting more reliable robot actions represented as feature pixels/grids. Extensive experiments on a UR5 robot arm demonstrate that our proposed methods significantly outperform related state-of-the-art methods in different real-world placing settings.
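The abstract only describes the architecture at a high level, so the following is a minimal, hypothetical PyTorch sketch of the general idea: multi-level convolutional features fused by attention heads, followed by a convolutional goal-prediction head that outputs a spatial grid rather than regressing actions through fully connected layers. All layer sizes, the head count, the grid resolution, and class/variable names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiLevelMultiAttentionPolicy(nn.Module):
    """Hypothetical sketch of the MLMA-DAML++ idea from the abstract."""

    def __init__(self, num_heads: int = 4, grid_size: int = 16):
        super().__init__()
        # Shallow and deep convolutional stages; both are kept so that
        # shallow features are not lost as the network deepens.
        self.shallow = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.deep = nn.Sequential(
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.project_shallow = nn.Conv2d(32, 64, 1)
        # Multiple attention heads fuse the two feature levels.
        self.attn = nn.MultiheadAttention(
            embed_dim=64, num_heads=num_heads, batch_first=True)
        # Convolutional goal-prediction head: outputs a heatmap over a
        # spatial grid; the peak cell is read out as the predicted goal.
        self.goal_head = nn.Sequential(
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
            nn.Upsample(size=(grid_size, grid_size), mode="bilinear",
                        align_corners=False),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        s = self.shallow(rgb)                      # (B, 32, H/2, W/2)
        d = self.deep(s)                           # (B, 64, H/8, W/8)
        s = self.project_shallow(
            F.adaptive_avg_pool2d(s, d.shape[-2:]))
        # Flatten spatial grids to token sequences; deep features attend
        # to the (resized) shallow features.
        q = d.flatten(2).transpose(1, 2)           # (B, N, 64)
        kv = s.flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, kv, kv)
        fused = fused.transpose(1, 2).reshape_as(d)
        return self.goal_head(fused)               # (B, 1, grid, grid) heatmap


if __name__ == "__main__":
    model = MultiLevelMultiAttentionPolicy()
    heatmap = model(torch.randn(2, 3, 128, 128))
    print(heatmap.shape)  # torch.Size([2, 1, 16, 16])
```

In this sketch the grid output stands in for the "feature pixels/grids" action representation mentioned in the abstract; the actual goal prediction network and the meta-learning (policy-update) loop are not specified in the source and are omitted here.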