Multigranularity Localization Transformer With Collaborative Understanding for Referring Multiobject Tracking

As an essential component of vision-based measurement (VBM), referring multiobject tracking (RMOT) involves localizing and tracking specific objects in video frames using linguistic prompts as references. To make linguistic prompts more effective during training, we introduce a novel multigranularity localization transformer with collaborative understanding, termed MGLT. Unlike previous methods, which focus on visual-language fusion and postprocessing, MGLT reevaluates RMOT by preventing linguistic clues from attenuating during propagation. MGLT comprises two key components: multigranularity implicit query bootstrapping (MGIQB) and multigranularity track-prompt alignment (MGTPA). MGIQB ensures that tracking and linguistic features are preserved in the later layers of the network by bootstrapping the model to generate text-relevant, temporally enhanced track queries. Meanwhile, MGTPA uses multigranularity linguistic prompts to enhance the model’s localization ability by understanding the relative positions of different referred objects within a frame. Extensive experiments on well-recognized benchmarks demonstrate that MGLT achieves state-of-the-art performance. Notably, on the Refer-KITTI dataset it improves HOTA, AssA, and IDF1 by 2.73%, 7.95%, and 3.18%, respectively. The code will be available at https://github.com/JiajunChern/MGLT.

Keywords: multigranularity localization; localization transformer; collaborative understanding; localization; transformer collaborative; multigranularity

Journal Title: IEEE Transactions on Instrumentation and Measurement
Year Published: 2025
