Long term temporal representation methods demand high computational cost, restricting their practical use in real world applications. We propose a two-step deep residual method for efficiently learning long-term discriminative temporal… Click to show full abstract
Long term temporal representation methods demand high computational cost, restricting their practical use in real world applications. We propose a two-step deep residual method for efficiently learning long-term discriminative temporal representation, whilst significantly reducing computational cost. In the first step, a novel self-supervision deep temporal embedding method is presented to embed repetitive short-term motions at a cluster-friendly feature space. In the second step, an efficient temporal representation is made by leveraging the differences between the original data and its associated repetitive motion clusters as a novel deep residual method. Experimental results demonstrate that, the proposed method achieves competitive results on some challenging human action recognition datasets like UCF101, HMDB51, THUMOS14, and Kinetics-400.
               
Click one of the above tabs to view related content.