Abstract. Convolutional neural networks (CNNs) have given rise to a new generation of video super-resolution (SR) techniques. However, most existing CNN-based video SR algorithms treat consecutive frames as a series of feature maps, just as single-image SR algorithms do. We propose an end-to-end three-dimensional (3-D) CNN video SR framework. The framework treats the input frames as a cube and applies 3-D convolution to it to extract features along the spatial and temporal dimensions. Image prior knowledge, such as optical flow, is introduced during reconstruction. A combination of mean square error loss and multiscale structural similarity index (MS-SSIM) loss is used to optimize the model. Experimental results show that the proposed method reconstructs high-resolution frames with more accurate and visually pleasing structures than state-of-the-art video SR algorithms. We also achieve comparable PSNR/SSIM results with less computation time.
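To make the two central ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It shows a naive 3-D convolution sliding over a frame cube of shape (T, H, W), and a loss that mixes mean square error with a structural similarity term. The function names, the mixing weight `alpha`, and the averaging kernel are assumptions made for illustration; the similarity term is a single-scale SSIM computed from global statistics, used here as a simplified stand-in for MS-SSIM.

```python
import numpy as np


def conv3d_valid(cube, kernel):
    """Naive 'valid' 3-D cross-correlation of a (T, H, W) cube with a (kt, kh, kw) kernel."""
    T, H, W = cube.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value mixes neighbouring pixels AND neighbouring frames.
                out[t, y, x] = np.sum(cube[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out


def global_ssim(a, b, data_range=1.0):
    """Single-scale SSIM from global statistics (simplified stand-in for MS-SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den


def combined_loss(pred, target, alpha=0.5):
    """alpha * MSE + (1 - alpha) * (1 - SSIM); alpha is a hypothetical mixing weight."""
    mse = np.mean((pred - target) ** 2)
    return alpha * mse + (1 - alpha) * (1 - global_ssim(pred, target))


rng = np.random.default_rng(0)
frames = rng.random((5, 8, 8))          # 5 consecutive 8x8 frames stacked as a cube
kernel = np.full((3, 3, 3), 1.0 / 27)   # hypothetical spatio-temporal averaging kernel
features = conv3d_valid(frames, kernel)
print(features.shape)
```

With a 3x3x3 kernel and no padding, the five-frame cube yields three temporal slices of features, each computed from three adjacent frames. A trained network would learn many such kernels and would typically use a framework's built-in 3-D convolution layer instead of explicit loops.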
               