"Learning and Integrating Multi-Level Matching Features for Image-Text Retrieval"

In recent years, several retrieval methods for measuring the similarity between images and texts have been proposed. Although most of these methods are efficient, scalar-based cosine similarities may not be expressive enough to fully capture the intricate matching patterns between visual and textual features. In addition, hybrid methods integrate the global and local matching similarities empirically, which reduces interpretability. This letter proposes a novel Multi-Level Matching Network (MLMN) that learns and integrates vector-based multi-level matching features. Two vector-based matching branches are first designed to learn more powerful matching features. An interpretable matching integration strategy is also proposed, which adaptively integrates the learned matching features according to the global matching information. Moreover, image-text retrieval is treated as a binary classification problem, and the MLMN is trained with the binary cross-entropy loss with hardest negatives. Several experiments are performed on the MSCOCO and Flickr30K datasets. The results demonstrate that MLMN achieves higher performance than state-of-the-art methods.
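
The abstract does not give the exact form of the training objective, so the following is only a minimal sketch of what a binary cross-entropy loss with hardest negatives could look like. It assumes a square matrix of matching logits in which entry (i, j) scores image i against text j and the diagonal holds the matched pairs. The function names `softplus` and `bce_hardest_negatives`, the equal weighting of the three terms, and the use of in-batch negatives are illustrative assumptions, not details taken from the letter.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bce_hardest_negatives(scores):
    """Average BCE over matched pairs and their hardest in-batch negatives.

    scores[i][j] is an assumed matching logit between image i and text j;
    diagonal entries are the positive (matched) pairs. Needs at least 2 pairs.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Hardest negative text for image i: highest-scoring non-matching text.
        hardest_text = max(scores[i][j] for j in range(n) if j != i)
        # Hardest negative image for text i: highest-scoring non-matching image.
        hardest_image = max(scores[k][i] for k in range(n) if k != i)
        # -log(sigmoid(pos)) equals softplus(-pos): positive labelled 1.
        total += softplus(-pos)
        # -log(1 - sigmoid(neg)) equals softplus(neg): negatives labelled 0.
        total += softplus(hardest_text) + softplus(hardest_image)
    return total / n


well_separated = [[5.0, -5.0], [-5.0, 5.0]]
confused = [[-5.0, 5.0], [5.0, -5.0]]
print(bce_hardest_negatives(well_separated) < bce_hardest_negatives(confused))
```

Under these assumptions, only the single highest-scoring mismatched text and image contribute to each pair's loss, so the gradient concentrates on the negatives the model currently finds most confusable.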

Keywords: level matching; multi level; matching features; image text

Journal Title: IEEE Signal Processing Letters
Year Published: 2022
