Emotion recognition from body gestures is challenging because similar emotions can be expressed through widely varying spatial configurations of joints, so recognition must rely on spatial-temporal patterns modeled at a more global level. However, most recent graph convolutional networks (GCNs) separate spatial and temporal modeling into isolated processes: graph convolution models spatial interactions with partially fixed adjacency matrices, while 1D convolution captures temporal dynamics, which is insufficient for emotion recognition. In this work, we propose the 3D-Shift GCN, which enables interactions among joints within a spatial-temporal volume for global feature extraction. We further develop a multiscale architecture, the MS-Shift GCN, which fuses features captured over different temporal ranges to model richer dynamics. Evaluations on two standard action recognition benchmarks and two gesture-based emotion recognition datasets show that the proposed method outperforms several state-of-the-art methods.
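To make the core idea concrete, below is a minimal sketch of a spatial-temporal shift operation on a skeleton feature map, in the spirit of shift-based GCNs. The channel partition, shift offsets, and tensor layout `(batch, channels, frames, joints)` are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of a 3D (spatial-temporal) shift: portions of the
# channel dimension are displaced along the time axis and the joint axis,
# so that a subsequent pointwise (1x1) convolution mixes information from
# a spatial-temporal neighborhood rather than from a single joint/frame.
import torch

def st_shift(x: torch.Tensor) -> torch.Tensor:
    """x: features of shape (N, C, T, V) = (batch, channels, frames, joints).
    Shifts roughly one third of the channels along time, one third along
    the joint axis, and leaves the remainder in place. Vacated positions
    are zero-padded."""
    n, c, t, v = x.shape
    out = torch.zeros_like(x)
    third = c // 3
    half = third // 2
    # temporal shifts: move features one frame forward / backward
    out[:, :half, 1:, :] = x[:, :half, :-1, :]            # +1 frame
    out[:, half:third, :-1, :] = x[:, half:third, 1:, :]  # -1 frame
    # spatial shifts: move features to the next / previous joint index
    out[:, third:third + half, :, 1:] = x[:, third:third + half, :, :-1]
    out[:, third + half:2 * third, :, :-1] = x[:, third + half:2 * third, :, 1:]
    # remaining channels are kept unshifted
    out[:, 2 * third:, :, :] = x[:, 2 * third:, :, :]
    return out

# Usage: e.g. a 25-joint skeleton over 32 frames with 64 feature channels.
x = torch.randn(4, 64, 32, 25)
y = st_shift(x)
assert y.shape == x.shape
```

A multiscale variant in the spirit of MS-Shift GCN could apply such shifted branches with different temporal strides or offsets and fuse their outputs (e.g. by summation or concatenation), though the fusion scheme here is likewise an assumption.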