"Semantic-Aware Network for Natural Language Tracking"

Natural language tracking aims to locate the position of a target specified by a natural language description. Existing methods are trained on vision-language datasets with a small number of language descriptions, which may lead to limited semantic generalization. Moreover, they extract visual and language features separately, which limits visual-semantic capabilities. To overcome these limitations, we propose a novel semantic-aware tracking framework, SATrack, which integrates a semantic-aware attention module and a cross-modal aggregation module. The proposed SATrack enjoys several merits. First, the semantic-aware attention module utilizes language semantics as a bridge to build associations between visual features, enabling stronger visual-semantic capabilities. Second, the cross-modal aggregation module transfers the semantic knowledge of CLIP into the tracking framework for semantic generalization. Extensive experimental results demonstrate that SATrack outperforms previous state-of-the-art trackers on four natural language tracking benchmarks.

Keywords: aware network; module; language tracking; natural language; language; semantic aware

Journal Title: IEEE Transactions on Circuits and Systems for Video Technology
Year Published: 2025
