
Transferable Knowledge-Based Multi-Granularity Fusion Network for Weakly Supervised Temporal Action Detection



Despite remarkable progress, temporal action detection remains limited in real-world applications by the large amount of manual annotation it requires. This issue motivates interest in addressing the task under weak supervision, namely, locating action instances using only video-level class labels. Many current works on this task are based on the Class Activation Sequence (CAS), which is generated by a video classification network to describe the probability of each snippet belonging to a specific action class of the video. However, the CAS generated by a simple classification network focuses only on local discriminative parts rather than locating the entire interval of target actions. In this paper, we present a novel framework to handle this issue. Specifically, we propose to utilize convolutional kernels with varied dilation rates to enlarge the receptive fields, which transfers discriminative information to the surrounding non-discriminative regions. We then design a cascaded module with the proposed Online Adversarial Erasing (OAE) mechanism to mine further relevant regions of target actions by feeding the erased feature maps of discovered regions back into the system. In addition, inspired by transfer learning, we adopt an additional module that transfers knowledge from trimmed videos to untrimmed videos to improve classification performance on untrimmed videos. Finally, we employ a boundary regression module embedded with an Outer-Inner-Contrastive (OIC) loss to automatically predict action boundaries from the enhanced CAS. Extensive experiments on two challenging datasets, THUMOS14 and ActivityNet-1.3, clearly demonstrate the superiority of our unified framework.
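To illustrate the boundary-scoring idea behind the Outer-Inner-Contrastive (OIC) loss mentioned above, the following is a minimal pure-Python sketch, not the paper's implementation: given a 1-D CAS and a candidate interval, it scores the interval as the mean activation just outside the (inflated) boundary minus the mean activation inside it, so a well-fitted action interval yields a low (negative) loss. The function name, the `inflation` ratio, and the exact margin handling are illustrative assumptions.

```python
def oic_loss(cas, start, end, inflation=0.25):
    """Outer-Inner-Contrastive score sketch for a candidate interval.

    cas        -- list of per-snippet activation scores for one action class
    start, end -- candidate interval [start, end) in snippet indices
    inflation  -- fraction of the interval length used as the outer margin
                  (an illustrative choice, not the paper's exact setting)
    Returns mean(outer region) - mean(inner region); lower is better.
    """
    length = end - start
    margin = max(1, int(inflation * length))
    outer_s = max(0, start - margin)
    outer_e = min(len(cas), end + margin)
    inner = cas[start:end]
    outer = cas[outer_s:start] + cas[end:outer_e]
    inner_mean = sum(inner) / len(inner)
    outer_mean = sum(outer) / len(outer) if outer else 0.0
    return outer_mean - inner_mean
```

On a toy CAS such as `[0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]`, the tight proposal `(2, 5)` scores about `-0.8`, while a proposal covering only background scores near zero, which is the contrast the boundary regression module exploits.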

Keywords: action detection; temporal action; action; knowledge; network

Journal Title: IEEE Transactions on Multimedia
Year Published: 2021


