Temporal Textual Localization in Video via Adversarial Bi-Directional Interaction Networks

Given a natural language description, temporal textual localization aims to localize the most relevant segment in an untrimmed video, and is a natural and important extension of temporal action localization. Most existing temporal textual localization works neglect long-range semantic modeling of video content and lack accurate textual understanding. Moreover, they are limited to single-task learning and fail to exploit multi-view supervised information. Based on these observations, we introduce a novel adversarial bi-directional interaction network, a global framework that retrieves the target segment directly. Specifically, we propose a bi-directional attention mechanism to build bi-directional information interaction, which captures long-range semantic dependencies from the video context and enhances textual representation learning. After localization, we further devise an auxiliary discriminator network to verify the localization result and boost performance through an adversarial training process. We adopt a multi-task learning approach to train our model, comprising: (1) a coordinate probability distribution prediction task, which selects the start and end frames to localize the target segment; (2) a frame-level correlation distribution prediction task, which calculates the correlation between each frame and the description; (3) an auxiliary adversarial learning task, which calculates a matching score between the localized segment and the description to boost performance. Extensive experiments on ActivityNet Captions and TACoS show the effectiveness and efficiency of our method.
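
The abstract does not give the form of the bi-directional attention or of the start/end selection, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes scaled dot-product attention applied in both directions (frames attending over words, and words attending over frames) and a simple rule that picks the start/end pair with the highest joint probability subject to start <= end. All function names (`attend`, `bidirectional_attention`, `select_segment`) are hypothetical.

```python
import math

def softmax(values):
    # Numerically stable softmax over a list of floats.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    # For each query vector, return a softmax-weighted average of the key
    # vectors. Scaled dot-product scoring is an assumption of this sketch.
    dim = len(keys[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, keys))
                    for j in range(dim)])
    return out

def bidirectional_attention(frames, words):
    # Video-to-text: every frame gathers the words most relevant to it.
    text_aware_frames = attend(frames, words)
    # Text-to-video: every word gathers the frames most relevant to it.
    video_aware_words = attend(words, frames)
    return text_aware_frames, video_aware_words

def select_segment(p_start, p_end):
    # Pick (start, end) with start <= end maximizing p_start[s] * p_end[e].
    best, best_score = (0, 0), -1.0
    for s, ps in enumerate(p_start):
        for e in range(s, len(p_end)):
            score = ps * p_end[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy frame features
    words = [[1.0, 0.0], [0.0, 1.0]]               # toy word features
    f, w = bidirectional_attention(frames, words)
    print(len(f), len(w))
    print(select_segment([0.1, 0.7, 0.2], [0.6, 0.1, 0.3]))
```

In this toy setup the start and end distributions are given by hand; in a trained model they would be the outputs of the coordinate prediction task, and the frame-level correlation and adversarial tasks described above would contribute additional loss terms that this sketch does not cover.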

Keywords: temporal textual; task; localization; adversarial directional; interaction; textual localization

Journal Title: IEEE Transactions on Multimedia
Year Published: 2021
