Change captioning is an emerging task to describe the changes between a pair of images. The difficulty in this task is to discover the differences between the two images. Recently,… Click to show full abstract
Change captioning is an emerging task to describe the changes between a pair of images. The difficulty in this task is to discover the differences between the two images. Recently, some methods have been proposed to address this problem. However, they all employ unidirectional difference localization to identify the changes. This can lead to ambiguity about the nature of the changes. Instead, we propose a framework with bidirectional difference localization and semantic consistency reasoning to describe the image changes. First, we locate the changes in the two images by capturing bidirectional differences. Then we design a decoder with spatial‐channel attention to generate the change caption. Finally, we introduce semantic consistency reasoning to constrain our bidirectional difference localization module and spatial‐channel attention module. Extensive experiments on three public data sets show that the performance of our proposed model outperforms the state‐of‐the‐art change captioning models by a large margin.
               
Click one of the above tabs to view related content.