Despite extensive work on extracting emotion descriptors from subtle hidden cues, learning an effective spatiotemporal feature remains a challenging issue for micro-expression recognition, because micro-expressions involve only small dynamic changes and occur in localized facial regions. These properties suggest that the micro-expression representation is sparse in the spatiotemporal domain. In this letter, a high-performance spatiotemporal feature learning method based on a sparse transformer is presented to address this issue. We extract strongly associated spatiotemporal features by distinguishing critical relations in the spatial attention map and attentively fusing the temporal features. Thus, feature maps derived from critical relations are fully utilized, while superfluous relations are masked. Our proposed method achieves remarkable results compared with state-of-the-art methods, demonstrating that sparse representation can be successfully integrated into the self-attention mechanism for micro-expression recognition.
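
As a rough illustration of the mechanism described above, the sketch below keeps only the strongest spatial relations per query (masking the rest before the softmax) and then attentively fuses per-frame features over time. This is a minimal sketch in PyTorch under assumed tensor shapes; the module name, the top-k sparsity rule, and the temporal fusion head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of sparse spatial self-attention followed by attentive temporal
# fusion (illustrative assumptions, not the paper's actual architecture).
import torch
import torch.nn as nn


class SparseSpatioTemporalAttention(nn.Module):
    def __init__(self, dim: int, top_k: int = 8):
        super().__init__()
        self.top_k = top_k  # number of spatial relations kept per query (assumed)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temporal_score = nn.Linear(dim, 1)  # scores frames for attentive fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) spatiotemporal tokens
        b, t, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Spatial self-attention within each frame.
        attn = (q @ k.transpose(-2, -1)) / d ** 0.5          # (b, t, n, n)

        # Sparsify: retain the top-k strongest (critical) relations per query,
        # mask the superfluous ones before the softmax.
        kth = attn.topk(min(self.top_k, n), dim=-1).values[..., -1:]
        attn = attn.masked_fill(attn < kth, float('-inf'))
        attn = attn.softmax(dim=-1)

        spatial = self.proj(attn @ v)                         # (b, t, n, d)

        # Attentive temporal fusion: weight each frame and sum over time.
        frame_feat = spatial.mean(dim=2)                      # (b, t, d)
        w = self.temporal_score(frame_feat).softmax(dim=1)    # (b, t, 1)
        return (frame_feat * w).sum(dim=1)                    # (b, d) clip-level feature


# Usage: a clip of 16 frames, 49 spatial patches, 128-dim tokens (assumed sizes).
if __name__ == "__main__":
    model = SparseSpatioTemporalAttention(dim=128, top_k=8)
    clip = torch.randn(2, 16, 49, 128)
    print(model(clip).shape)  # torch.Size([2, 128])
```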