Extreme instance imbalance among categories and combinatorial explosion make the recognition of Human-Object Interaction (HOI) a challenging task. Few studies have addressed both challenges directly. Motivated by the success of few-shot learning, which learns a robust model from a few instances, we formulate HOI as a few-shot task in a meta-learning framework to alleviate these challenges. Because HOI is intrinsically diverse and interactive, we propose a Semantic-guided Attentive Prototypes Network (SAPNet) framework that learns a semantic-guided metric space in which HOI recognition is performed by computing distances to the attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities of images from the same HOI class. In addition, we design two alternative prototype calculation methods, the Prototypes Shift (PS) approach and the Hallucinatory Graph Prototypes (HGP) approach, which explore how to learn suitable category prototype representations for HOI. Finally, to realize the few-shot HOI task, we reorganize two HOI benchmark datasets with two split strategies, yielding HICO-NN, TUHOI-NN, HICO-NF, and TUHOI-NF. Extensive experimental results on these datasets demonstrate the effectiveness of the proposed SAPNet approach.
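The idea of classifying by distance to attentive prototypes can be illustrated with a minimal sketch. This is not the SAPNet implementation: it assumes that image embeddings and class-name embeddings are already available as plain vectors, that attention weights come from a softmax over dot-product similarity to the class-name embedding, and that squared Euclidean distance is the metric. The function names, the toy labels, and the 2-D vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_prototype(support, semantic):
    """Weighted mean of the support embeddings of one class.

    Assumption: each support embedding is weighted by its similarity
    to the class-name (semantic) embedding, so embeddings that agree
    with the class name contribute more to the prototype.
    """
    weights = softmax([dot(s, semantic) for s in support])
    dim = len(support[0])
    return [sum(w * s[i] for w, s in zip(weights, support)) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-way, 2-shot episode with hypothetical embeddings.
support = {
    "ride bicycle": [[1.0, 0.0], [0.8, 0.2]],
    "hold cup": [[0.0, 1.0], [0.2, 0.8]],
}
semantic = {
    "ride bicycle": [1.0, 0.0],
    "hold cup": [0.0, 1.0],
}
prototypes = {
    label: attentive_prototype(support[label], semantic[label]) for label in support
}
prediction = classify([0.9, 0.1], prototypes)
print(prediction)
```

With a zero semantic vector the weights become uniform and the prototype reduces to the plain mean of the support embeddings, which is the standard prototypical-network baseline that the semantic guidance is meant to refine.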
               