LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Using Neural Encoder-Decoder Models With Continuous Outputs for Remote Sensing Image Captioning

Photo from wikipedia

Remote sensing image captioning involves generating a concise textual description for an input aerial image. The task has received significant attention, and several recent proposals are based on neural encoder-decoder… Click to show full abstract

Remote sensing image captioning involves generating a concise textual description for an input aerial image. The task has received significant attention, and several recent proposals are based on neural encoder-decoder models. Most previous methods are trained to generate discrete outputs corresponding to word tokens that match the reference sentences word-by-word, thereby optimizing the generation locally at token-level instead of globally at sentence-level. This paper explores an alternative generation method based on continuous outputs, which generates sequences of embedding vectors instead of directly predicting word tokens. We argue that continuous output models have the potential to better capture the global semantic similarity between captions and images, e.g. by facilitating the use of loss functions matching different views of the data. This includes comparing representations for individual tokens and for the entire captions, and also comparing captions against intermediate image representations. We experimentally compare discrete versus continuous output captioning methods over the UCM and RSICD datasets, which are extensively used in the area despite some issues which we also discuss. Results show that the alternative encoder-decoder framework with continuous outputs can indeed lead to better results on the two datasets, compared to the standard approach based on discrete outputs. The proposed approach is also competitive against the state-of-the-art model in the area.

Keywords: image; continuous outputs; remote sensing; encoder decoder

Journal Title: IEEE Access
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.