Abstract Video summarization is of unprecedented importance for navigating the ever-growing volume of video collections. In this paper, we propose a novel dynamic video summarization model based on a deep learning architecture. To the best of our knowledge, we are the first to address the imbalanced class distribution problem in video summarization. An over-sampling algorithm is used to balance the class distribution of the training data, and a novel two-stream deep architecture with cost-sensitive learning is proposed to handle class imbalance during feature learning. In the spatial stream, RGB images represent the appearance of video frames; in the temporal stream, multi-frame motion vectors are introduced, for the first time within a deep learning framework, to represent and extract the temporal information of the input video. The proposed method is evaluated on two standard video summarization datasets and a standard emotional dataset. Empirical results for video summarization demonstrate that our model improves upon existing state-of-the-art methods. Moreover, the proposed method is able to highlight video content with a high level of arousal in the affective computing task. In addition, the proposed frame-based model has a further advantage: it automatically preserves the connection between consecutive frames. Although the summary is constructed at the frame level, the final summary consists of informative, continuous segments rather than isolated individual frames.
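The abstract names two class-imbalance remedies (over-sampling of the rare key-frame class and a cost-sensitive loss) applied to a two-stream spatial/temporal network. The paper's own implementation is not shown here; the following is a minimal PyTorch sketch of those ideas, in which all names (oversample_weights, TwoStreamNet, the rgb/motion inputs, and the toy 95/5 label split) are hypothetical illustrations rather than the authors' code.

```python
# Minimal sketch (not the authors' code): random over-sampling of the
# minority "key-frame" class plus a cost-sensitive (class-weighted) loss
# on a two-stream network with RGB and motion-vector inputs.
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

def oversample_weights(labels: torch.Tensor) -> torch.Tensor:
    """Per-sample weights that make each class equally likely when drawn
    with replacement, i.e. random over-sampling of the rare class."""
    counts = torch.bincount(labels)
    return (1.0 / counts.float())[labels]

labels = torch.tensor([0] * 95 + [1] * 5)   # toy split: 5% key frames
sampler = WeightedRandomSampler(
    weights=oversample_weights(labels),
    num_samples=len(labels),
    replacement=True,                       # duplicates minority-class frames
)

class TwoStreamNet(nn.Module):
    """Spatial stream on RGB frames, temporal stream on stacked
    multi-frame motion vectors; features fused by concatenation."""
    def __init__(self, motion_channels: int = 10):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.temporal = nn.Sequential(
            nn.Conv2d(motion_channels, 16, 3, padding=1),
            nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 2)        # key frame vs. non-key frame

    def forward(self, rgb, motion):
        fused = torch.cat([self.spatial(rgb).flatten(1),
                           self.temporal(motion).flatten(1)], dim=1)
        return self.head(fused)

# Cost-sensitive learning: weight the loss inversely to class frequency,
# so misclassifying a rare key frame costs more than a common frame.
class_weights = len(labels) / (2 * torch.bincount(labels).float())
criterion = nn.CrossEntropyLoss(weight=class_weights)
```

In this sketch the sampler rebalances what the network sees per batch, while the weighted loss rebalances the gradient per example; the abstract describes using both, on the training data and in feature learning respectively.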