In the semi-supervised video object segmentation (VOS) task, temporally coherent object-level cues play a key role yet are hard to model accurately. To this end, this paper presents an object-aware global-local correspondence architecture that extracts temporally coherent object-level features across frames for accurate VOS. Specifically, we first generate a set of object masks from the ground-truth segmentation and squeeze the current-frame representation inside each mask into a global object embedding. Second, we compute the similarity between each embedding and the feature map, producing an object-aware weight for each pixel. The object-aware feature at each pixel is then constructed by summing the object embeddings weighted by their object-aware weights, which captures rich object-category information. Third, to establish accurate correspondences between temporally coherent inter-frame cues, we design a novel global-local correspondence module that refines the temporal feature representations. Finally, we augment the object-aware features with the globally and locally aligned information to produce a strong spatio-temporal representation, which is essential for reliable pixel-wise segmentation prediction. Extensive evaluations on three popular VOS benchmarks, YouTube-VOS, DAVIS 2017, and DAVIS 2016, demonstrate that the proposed method achieves favourable performance compared to state-of-the-art approaches.
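
To make the pipeline concrete, the following is a minimal PyTorch sketch of the object-aware feature construction (the first two steps above). It assumes masked average pooling for the embedding squeeze, dot-product similarity, and a softmax over objects for the per-pixel weights; the function name object_aware_features and all tensor shapes are illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def object_aware_features(feat, masks):
        # feat:  (C, H, W) current-frame feature map
        # masks: (K, H, W) one binary mask per object, derived from the
        #        first-frame ground-truth segmentation
        C, H, W = feat.shape
        K = masks.shape[0]
        flat = feat.view(C, H * W)                    # (C, HW)
        m = masks.view(K, H * W).float()              # (K, HW)

        # Squeeze the features inside each mask into a global object
        # embedding (masked average pooling -- an assumed choice).
        emb = (m @ flat.t()) / m.sum(1, keepdim=True).clamp(min=1.0)  # (K, C)

        # Similarity between every embedding and every pixel, normalized
        # over objects to obtain per-pixel object-aware weights.
        weights = F.softmax(emb @ flat, dim=0)        # (K, HW)

        # Object-aware feature at each pixel: the sum of the object
        # embeddings weighted by their object-aware weights.
        return (emb.t() @ weights).view(C, H, W)

The abstract names the global-local correspondence module without detailing its mechanism, so the sketch below assumes a common design in matching-based VOS: full cross-frame attention for the global branch, and the same affinity restricted to a small spatial window for the local branch. The function global_local_correspondence and the radius parameter are hypothetical.

    def global_local_correspondence(cur, ref, ref_val, radius=6):
        # cur, ref: (C, H, W) current / reference frame features
        # ref_val:  (C, H, W) reference-frame values to propagate
        C, H, W = cur.shape
        q = cur.view(C, -1).t()                       # (HW, C) queries
        k = ref.view(C, -1)                           # (C, HW) keys
        v = ref_val.view(C, -1).t()                   # (HW, C) values

        affinity = (q @ k) / C ** 0.5                 # (HW, HW)

        # Global branch: attend over all reference positions.
        global_out = F.softmax(affinity, dim=1) @ v   # (HW, C)

        # Local branch: the same affinity, restricted to a window of
        # Chebyshev radius `radius` around each query position.
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        pos = torch.stack([ys.flatten(), xs.flatten()], 1).float()   # (HW, 2)
        dist = (pos[:, None] - pos[None, :]).abs().amax(-1)          # (HW, HW)
        local_aff = affinity.masked_fill(dist > radius, float("-inf"))
        local_out = F.softmax(local_aff, dim=1) @ v   # (HW, C)

        return global_out.t().view(C, H, W), local_out.t().view(C, H, W)

In the full architecture, the two aligned outputs would then be fused with the object-aware features to form the spatio-temporal representation used for the final pixel-wise prediction, per the fourth step of the abstract.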