Human action recognition methods based on skeleton data have been widely studied owing to their strong robustness to illumination changes and complex backgrounds. Although existing methods achieve good recognition results, they face certain challenges, such as the fixed topological structure of the graph, the omission of nonphysical joint correlations, and the inability to extract local spatial–temporal features. Herein, we propose a spatial–temporal mixing of global and local self-attention graph convolutional network (STGL-GCN) for skeleton data. The global self-attention matrix captures the potential dependencies of nonphysical correlations between joints, and the local self-attention matrix determines the connection strength of the physical edges between joints. Both matrices are updated together with the convolution parameters in each network layer as the model is trained, yielding an optimal graph structure for accurate action representation. Experiments on the NTU-RGBD dataset demonstrate that our model accurately recognizes actions.
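The idea of combining a global self-attention matrix (nonphysical joint correlations) with a local self-attention matrix (weighted physical skeleton edges) in a graph convolution can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function and parameter names (`stgl_layer`, `Wq`, `Wk`, `W`) are hypothetical, and in the actual model these matrices are learned jointly with the convolution parameters during training.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stgl_layer(X, A_phys, Wq, Wk, W):
    """One graph-convolution step mixing global and local self-attention.

    X: (N, C) per-joint features; A_phys: (N, N) binary physical-edge adjacency.
    All weight matrices are hypothetical stand-ins for learned parameters.
    """
    # Pairwise attention scores between all joints (scaled dot product).
    scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(Wq.shape[1])
    # Global branch: dependencies between any pair of joints, physical or not.
    A_global = softmax(scores, axis=-1)
    # Local branch: same scores, but masked to the physical skeleton edges,
    # so softmax re-weights only physically connected joints.
    A_local = softmax(np.where(A_phys > 0, scores, -1e9), axis=-1)
    # Combined adjacency drives the graph convolution.
    return (A_global + A_local) @ X @ W

# Toy example: a 5-joint chain skeleton with 8-dimensional features.
rng = np.random.default_rng(0)
N, C = 5, 8
X = rng.standard_normal((N, C))
A_phys = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
Wq, Wk, W = (rng.standard_normal((C, C)) for _ in range(3))
out = stgl_layer(X, A_phys, Wq, Wk, W)
print(out.shape)  # (5, 8)
```

The masked softmax in the local branch is one plausible way to realize "connection strength of the physical edges": attention is computed over all joint pairs but renormalized over skeleton neighbors only, while the global branch is free to link nonadjacent joints (e.g., the two hands in a clapping action).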