
Regularizing Visual Semantic Embedding With Contrastive Learning for Image-Text Matching

Learning visual semantic embedding for image-text matching has achieved great success by using a triplet loss to pull together positive image-text pairs, which share similar semantic meaning, and to push apart negative image-text pairs, which have different semantic meanings. Without modeling constraints from image-image or text-text pairs, the resulting visual semantic embedding inevitably suffers from semantic misalignment among similar images or among similar texts. To solve this problem, we present a contrastive visual semantic embedding framework, named ConVSE, which achieves intra-modal semantic alignment through contrastive learning on augmented image-image (or text-text) pairs and achieves inter-modal semantic alignment by applying a hardest-negative-enhanced triplet loss to image-text pairs. To the best of our knowledge, we are the first to show that contrastive learning benefits visual semantic embedding. Extensive experiments on the large-scale MSCOCO and Flickr30K datasets verify the effectiveness of the proposed ConVSE, which outperforms existing visual semantic embedding-based methods and achieves a new state of the art.
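
The two loss components named in the abstract are common building blocks, so a minimal illustrative sketch is possible even though the paper's exact formulation is not given here. The PyTorch code below shows (a) a triplet loss with in-batch hardest negatives over image-text pairs and (b) an InfoNCE-style contrastive loss over augmented same-modality pairs; the function names, the margin and temperature values, and the assumption of L2-normalized embeddings are ours, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def hardest_negative_triplet_loss(img_emb, txt_emb, margin=0.2):
    """Triplet loss with in-batch hardest negatives (VSE++-style sketch).

    img_emb, txt_emb: (B, D) L2-normalized embeddings; row i of each
    tensor is assumed to be a matching image-text pair.
    """
    scores = img_emb @ txt_emb.t()        # (B, B) cosine similarities
    pos = scores.diag()                   # scores of the positive pairs

    # Exclude the positives when searching for the hardest negatives.
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    masked = scores.masked_fill(mask, float("-inf"))
    hardest_txt = masked.max(dim=1).values   # hardest caption for each image
    hardest_img = masked.max(dim=0).values   # hardest image for each caption

    loss_i2t = F.relu(margin + hardest_txt - pos)
    loss_t2i = F.relu(margin + hardest_img - pos)
    return (loss_i2t + loss_t2i).mean()

def intra_modal_contrastive_loss(view_a, view_b, temperature=0.07):
    """InfoNCE-style loss on two augmented views of the same modality
    (image-image or text-text pairs), pulling row-aligned views together."""
    logits = (view_a @ view_b.t()) / temperature   # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)
```

In this reading, the overall training objective would combine the inter-modal triplet term with the two intra-modal contrastive terms, for example as a weighted sum; the abstract does not specify how ConVSE weights them.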

Keywords: image-text; semantic embedding; visual semantic; text pairs; image

Journal Title: IEEE Signal Processing Letters
Year Published: 2022
