Two-Stream Spatial Graphormer Networks for Skeleton-Based Action Recognition

In skeleton-based human action recognition, the Transformer, which models the correlations between joint pairs in the global topology, has achieved remarkable results. However, in contrast to the extensive research on learning adaptive graph topologies in graph convolutional networks (GCNs), Transformer self-attention ignores the topology of the skeleton graph when capturing the dependencies between joints. To address this problem, we propose a novel two-stream spatial Graphormer network (2s-SGR), which uses self-attention with structural encodings to model joint and bone information. It consists of two networks: the joint stream spatial Graphormer network (Js-SGR) and the bone stream spatial Graphormer network (Bs-SGR). First, in the Js-SGR, the Transformer models joint correlations in the global spatial topology, while the topology of the joints and the edge information of the bones are introduced into the self-attention through custom structural encodings. At the same time, joint motion information is modeled in spatial-temporal blocks. The added structure and motion information effectively captures the dependencies of nodes between frames and enhances the feature representation. Second, for the second-order information of the skeleton, the Bs-SGR adapts to the structure of the bones by adjusting the custom structural encodings. Finally, the global spatial-temporal features of the joints and bones are fused and fed into the classification network to obtain the action recognition result. Extensive experiments on three large-scale datasets, NTU-RGB+D 60, NTU-RGB+D 120, and Kinetics, demonstrate that the proposed 2s-SGR achieves state-of-the-art performance, and ablation experiments confirm its effectiveness.
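
The abstract does not specify the form of the structural encodings, so the following is only a minimal sketch of the general idea, in the style of Graphormer's spatial encoding: a bias indexed by the hop distance between two joints is added to the attention logits, so self-attention no longer ignores the skeleton topology. The second-order (bone) input is sketched as the difference between connected joint positions. The 5-joint skeleton, the `hop_bias` values, and all function names are assumptions made for illustration; they are not taken from the paper, and the edge encoding, motion modeling, and stream fusion are omitted.

```python
import math
from collections import deque

# Hypothetical 5-joint toy skeleton (not the NTU-RGB+D layout).
# Each bone is a (child, parent) pair of joint indices.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]
NUM_JOINTS = 5


def shortest_path_distances(num_joints, bones):
    """Hop distance between every pair of joints (BFS; assumes a connected skeleton)."""
    neighbors = [[] for _ in range(num_joints)]
    for child, parent in bones:
        neighbors[child].append(parent)
        neighbors[parent].append(child)
    dist = [[-1] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[src][v] < 0:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def structural_attention(queries, keys, values, dist, hop_bias):
    """Scaled dot-product attention with a hop-distance bias added to each logit."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for i, q in enumerate(queries):
        logits = [
            sum(a * b for a, b in zip(q, k)) / scale + hop_bias[dist[i][j]]
            for j, k in enumerate(keys)
        ]
        weights = softmax(logits)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def bone_vectors(joints, bones):
    """Second-order features: child joint position minus parent joint position."""
    return [
        [c - p for c, p in zip(joints[child], joints[parent])]
        for child, parent in bones
    ]


# Toy 3D joint positions for a single frame (made-up values).
joints = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0],
    [-1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
dist = shortest_path_distances(NUM_JOINTS, BONES)
# One bias per hop distance (0, 1, 2); these would be learned in a trained model.
hop_bias = [0.0, -0.5, -1.0]
attended = structural_attention(joints, joints, joints, dist, hop_bias)
bones = bone_vectors(joints, BONES)
```

In this sketch the raw joint coordinates stand in for the learned query, key, and value projections. A bone stream in the same spirit would run attention over `bones` with encodings derived from how bones connect to each other, rather than from joint hop distances.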

Keywords: topology; spatial graphormer; stream spatial; action recognition

Journal Title: IEEE Access
Year Published: 2022

