We are pleased to present this special issue of IJCV on combined image and language understanding. It contains some of the latest work in a long line of research into problems at the intersection of computer vision and natural language processing. Research on language and vision has been stimulated by recent advances in object recognition. While multi-layer (or "deep") models have been applied for more than twenty years (Lawrence et al. 1997; LeCun et al. 1989; Nowlan and Platt 1995), recently they have been shown to be extremely effective at large-vocabulary object recognition (Krizhevsky et al. 2012) and at text generation (Mikolov et al. 2010). The next logical step was to combine these two tasks to enable image captioning: generating a short language description based on an image (Kulkarni et al. 2013; Mitchell et al. 2012). In 2015, deep models produced state-of-the-art results in image captioning (Donahue et al. 2015; Fang et al. 2015; Karpathy and Fei-Fei 2015; Vinyals et al. 2015). These results were facilitated by the MSCOCO data set, which provided multiple crowd-sourced labels for thousands of images (Lin et al. 2014). The success of deep image captioning initially seemed promising. Had we finally solved combined image and language understanding? A closer inspection, however, revealed that such understanding was far from solved: Approaches that