LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Semantic-Guided Selective Representation for Image Captioning

Photo by mykjohnson from unsplash

Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, its application to image captioning encounters two main issues,… Click to show full abstract

Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, its application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with a Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the grid-wise interactions, RAS can enhance the salient visual regions and channels, and suppress the less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise the selection process. Experimental results on the MS COCO show the proposed RAS-FSG scheme achieves state-of-the-art performance on both the off-line and on-line testing, i.e., 134.3 CIDEr for the off-line testing and 135.4 for the on-line testing of MSCOCO. Extensive ablation studies and visualizations also validate the effectiveness of our scheme.

Keywords: line testing; semantic guided; image; image captioning; selection

Journal Title: IEEE Access
Year Published: 2023

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.