Abstract Natural spoken-language Human-Robot Interaction (HRI) requires robots to understand spoken language and to extract intention-related information from the working scenario. Object affordance recognition is a feasible way to ground the intention-related object in the working environment. To this end, we propose a dataset and a deep Convolutional Neural Network (CNN) based architecture for learning human-centered object affordances. We further present an affordance-based multimodal fusion framework that realizes intended object grasping according to the spoken instructions of human users. The framework contains an intention semantics extraction module that extracts the intention from spoken language, a deep CNN based object affordance recognition module that recognizes human-centered object affordances, and a multimodal fusion module that bridges the extracted intentions and the recognized object affordances. Multiple intended object grasping experiments on a PR2 platform validate the feasibility and practicality of the presented HRI framework.
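To illustrate how the three modules could fit together, the sketch below shows a minimal, hypothetical fusion step: an intention keyword extracted from an utterance is matched against per-object affordance scores (as a CNN-based recognizer might output), and the best-matching object is selected for grasping. The affordance vocabulary, function names, and thresholds here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical affordance vocabulary; the paper's actual label set is not specified here.
AFFORDANCES = ["drink", "cut", "contain", "pour", "write"]

@dataclass
class DetectedObject:
    name: str                            # e.g. "mug", from an object detector
    affordance_scores: Dict[str, float]  # assumed softmax output of an affordance CNN

def extract_intention(utterance: str) -> Optional[str]:
    """Toy keyword-based intention extraction; stands in for the
    spoken-language intention semantics extraction module."""
    for affordance in AFFORDANCES:
        if affordance in utterance.lower():
            return affordance
    return None

def fuse(intention: str, objects: List[DetectedObject],
         threshold: float = 0.5) -> Optional[DetectedObject]:
    """Pick the object whose recognized affordance best matches the spoken
    intention; return None if no object exceeds the confidence threshold."""
    best, best_score = None, threshold
    for obj in objects:
        score = obj.affordance_scores.get(intention, 0.0)
        if score > best_score:
            best, best_score = obj, score
    return best

if __name__ == "__main__":
    scene = [
        DetectedObject("mug",   {"drink": 0.91, "contain": 0.85, "cut": 0.02}),
        DetectedObject("knife", {"cut": 0.95, "drink": 0.01, "contain": 0.03}),
    ]
    intention = extract_intention("I would like something to drink")
    target = fuse(intention, scene) if intention else None
    print(intention, "->", target.name if target else "no suitable object")
```

In the actual framework, the keyword matcher would be replaced by the intention semantics extraction module operating on speech-recognition output, and the affordance scores would come from the trained CNN; the selected object would then be passed to the PR2 grasping pipeline.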
               