Movies provide us with a wealth of visual content as well as engaging stories. Existing methods have shown that understanding movie stories from visual content alone remains a hard problem. In this paper, we introduce a new dataset called PlotGraphs as external knowledge for answering questions about movies. The dataset contains large-scale graph-based information about movies. In addition, we put forward a model that can utilize movie clips, subtitles, and graph-based external knowledge. The model consists of two main parts: a layered memory network (LMN) and a plot graph representation network (PGRN). In particular, the LMN represents frame-level and clip-level movie content through a fixed word memory module and an adaptive subtitle memory module, respectively: words and sentences are first extracted from the training movie subtitles, and hierarchical movie representations are then learned by the LMN. Meanwhile, the PGRN represents the semantic information and the relationships in the entire plot graph. We conduct extensive experiments on the MovieQA dataset and the PlotGraphs dataset. With only visual content as input, the LMN with frame-level representations obtains a large performance improvement. When subtitles are incorporated into the LMN to form clip-level representations, we achieve state-of-the-art performance on the online evaluation task of “Video+Subtitles.” After integrating external knowledge, the performance of the combined LMN and PGRN model improves further. These results demonstrate that the external knowledge and the proposed model are effective for movie understanding.
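To make the layered attention idea concrete, the following is a minimal sketch in PyTorch of a two-stage memory network: frame features attend over a fixed word memory, and the question then attends over clip-level (subtitle) memories. The class name `LayeredMemorySketch`, the dimensions, and the simple mean-fusion step are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayeredMemorySketch(nn.Module):
    """Sketch of a layered memory network (assumed structure, not the authors' code)."""

    def __init__(self, vocab_size: int, dim: int = 300):
        super().__init__()
        self.word_memory = nn.Embedding(vocab_size, dim)  # fixed word memory
        self.frame_proj = nn.Linear(dim, dim)             # projects frame features
        self.subtitle_proj = nn.Linear(dim, dim)          # projects subtitle sentences

    def forward(self, frames, subtitles, question):
        # frames:    (n_frames, dim) visual features, one per frame
        # subtitles: (n_sents, dim)  sentence embeddings of the clip's subtitles
        # question:  (dim,)          embedding of the question
        # 1) Frame level: each frame attends over the fixed word memory,
        #    yielding a word-grounded frame representation.
        attn = F.softmax(self.frame_proj(frames) @ self.word_memory.weight.t(), dim=-1)
        frame_repr = attn @ self.word_memory.weight        # (n_frames, dim)
        # 2) Clip level: fuse frame representations with subtitle memories
        #    (mean fusion here; the paper's adaptive subtitle module differs).
        clip_memory = self.subtitle_proj(subtitles) + frame_repr.mean(0)
        # 3) The question attends over clip-level memories to produce the
        #    movie representation used for answer scoring.
        weights = F.softmax(clip_memory @ question, dim=0)  # (n_sents,)
        return weights @ clip_memory                        # (dim,)
```

In such a design, the word memory stays fixed so that visual features are grounded in a shared vocabulary space, while the subtitle memory varies per clip; a downstream answer module would score candidate answers against the returned movie representation.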