
COMIC: Toward A Compact Image Captioning Model With Attention



Recent works in image captioning have shown very promising raw performance. However, we observe that most of these encoder–decoder style networks with attention do not scale naturally to large vocabulary sizes, making them difficult to deploy on embedded systems with limited hardware resources. This is because the word and output embedding matrices grow proportionally with the vocabulary size, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a new idea in the domain of image captioning: we tackle the problem of compactness of image captioning models, which has hitherto been unexplored. We show that our proposed model, named COMIC for COMpact Image Captioning, achieves results comparable to state-of-the-art approaches on five common evaluation metrics for both the MS-COCO and InstaPIC-1.1M datasets, despite having an embedded vocabulary size that is 39× to 99× smaller.
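The scaling argument above can be illustrated with a quick back-of-the-envelope calculation. The sketch below (the specific vocabulary and embedding sizes are assumptions for illustration, not figures from the paper) shows that the parameter count of the input embedding and output projection matrices grows linearly with the vocabulary size, so shrinking the vocabulary by roughly 39× shrinks these matrices by the same factor:

```python
# Illustrative sketch: embedding-matrix parameters scale linearly with
# vocabulary size V for a fixed embedding dimension d.
# All sizes below are hypothetical, not taken from the COMIC paper.

def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """Parameters in the input embedding (V x d) plus the
    output projection (d x V) of an encoder-decoder captioner."""
    return 2 * vocab_size * embed_dim

d = 512                                    # assumed embedding dimension
baseline = embedding_params(10_000, d)     # assumed full word vocabulary
compact = embedding_params(256, d)         # ~39x smaller vocabulary
print(baseline // compact)                 # → 39 (ratio tracks vocab ratio)
```

Because the rest of the network (CNN encoder, decoder hidden states, attention) does not depend on the vocabulary size, these two matrices are the natural target when compactness is the goal.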

Keywords: image captioning; attention; compact image; model; image; size

Journal Title: IEEE Transactions on Multimedia
Year Published: 2019



