Deep Memory Network for Cross-Modal Retrieval

With the explosive growth of multimedia data on the Internet, cross-modal retrieval has attracted a great deal of attention in both the computer vision and multimedia communities. However, the task is challenging because of the heterogeneity gap between different modalities. Current approaches typically involve a common representation learning process that maps data from different modalities into a common space by linear or nonlinear embedding. Yet most of them handle only the two-modality case and generalize poorly to more complex settings involving multiple modalities. In addition, they often require expensive fine-grained alignment of training data across modalities. In this paper, we address these limitations with a novel cross-modal memory network (CMMN), in which the memory contents for all modalities are learned jointly and end to end, without requiring exact alignment. We further account for the diversity across multiple modalities through adversarial learning. Extensive experiments on several large-scale datasets demonstrate that the proposed CMMN approach achieves state-of-the-art performance on cross-modal retrieval.
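To make the "common space" idea concrete, here is a minimal sketch of the generic common-representation baseline the abstract describes, not of CMMN itself: features from each modality are mapped into a shared space by a linear embedding, and items of one modality are ranked against a query from another by cosine similarity. The projection matrices, feature dimensions, and function names (`project`, `cosine`, `retrieve`) are hypothetical and hand-set for illustration; real systems learn these embeddings (often nonlinear ones), and the sketch omits the paper's memory network and adversarial learning entirely.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, features: Sequence[float]) -> Vector:
    """Linear embedding into the common space: computes weights @ features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query: Sequence[float], query_proj: Matrix,
             gallery: Sequence[Sequence[float]], gallery_proj: Matrix) -> List[int]:
    """Rank gallery items (another modality) by similarity to the query in the common space."""
    q = project(query_proj, query)
    scores = [cosine(q, project(gallery_proj, g)) for g in gallery]
    # Indices of gallery items, most similar first.
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical hand-set projections: 3-d image features and 2-d text
    # features are both mapped into a 2-d common space.
    W_img = [[1, 0, 0], [0, 1, 0]]
    W_txt = [[1, 0], [0, 1]]
    image_query = [0.9, 0.1, 0.5]
    texts = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
    print(retrieve(image_query, W_img, texts, W_txt))  # prints [1, 2, 0]
```

Note that a separate projection is needed per modality, so this pairwise setup must be extended (and trained) for each additional modality, which is part of the scalability issue the abstract raises for multi-modal cases.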

Keywords: cross-modal retrieval; memory network

Journal Title: IEEE Transactions on Multimedia
Year Published: 2019
