Video frame interpolation aims to synthesize new video frames between existing ones to produce higher-frame-rate video. Current methods usually use two adjacent frames to generate an intermediate frame, but they sometimes fail under challenges such as large motion, occlusion, and motion blur. This paper proposes a multi-frame pyramid refinement network that effectively exploits the spatio-temporal information contained in multiple frames (more than two). The proposed network makes three technical contributions. First, a coarse-to-fine framework refines the optical flows between multiple frames with residual flows at each pyramid level, so large motion and occlusion can be estimated effectively. Second, a 3D U-Net feature extractor mines spatio-temporal context and restores texture, which tends to disappear at coarse pyramid levels. Third, a multi-step perceptual loss is adopted to preserve more detail in the intermediate frame. It is worth mentioning that our approach can be easily extended to multi-frame interpolation. Our network is trained end-to-end on more than 80K collected frame groups (25 frames per group). Experimental results on several independent datasets show that our approach handles challenging cases effectively and performs consistently better than other state-of-the-art methods.
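To make the first contribution concrete, the following is a minimal PyTorch-style sketch of coarse-to-fine flow estimation with per-level residual refinement: a zero flow is initialized at the coarsest level, then at each finer level the upsampled flow is corrected by a predicted residual. The class and function names (ResidualFlowLevel, coarse_to_fine_flow), layer layout, and channel counts are illustrative assumptions, not the paper's actual architecture, which also involves multi-frame inputs and a 3D U-Net feature extractor.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualFlowLevel(nn.Module):
    """One pyramid level: predicts a residual flow that corrects the
    upsampled flow from the coarser level. (Hypothetical layer layout.)"""
    def __init__(self, in_channels, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2, 3, padding=1),  # 2-channel residual flow
        )

    def forward(self, feats, coarse_flow):
        # Refined flow = coarse estimate + predicted residual.
        residual = self.net(torch.cat([feats, coarse_flow], dim=1))
        return coarse_flow + residual

def coarse_to_fine_flow(feature_pyramid, levels):
    """Estimate flow from the coarsest to the finest pyramid level.
    `feature_pyramid` is ordered coarse -> fine; each entry holds the
    per-level features extracted from the input frames."""
    b, _, h, w = feature_pyramid[0].shape
    flow = feature_pyramid[0].new_zeros(b, 2, h, w)  # start from zero flow
    for feats, level in zip(feature_pyramid, levels):
        if flow.shape[-2:] != feats.shape[-2:]:
            # Upsample the coarser flow and rescale its magnitude to match
            # the finer resolution before refining it.
            scale = feats.shape[-1] / flow.shape[-1]
            flow = F.interpolate(flow, size=feats.shape[-2:],
                                 mode='bilinear', align_corners=False) * scale
        flow = level(feats, flow)
    return flow

# Usage sketch: a 3-level pyramid with 16-channel features per level.
feats = [torch.randn(1, 16, 32 * 2**i, 32 * 2**i) for i in range(3)]
levels = [ResidualFlowLevel(16 + 2) for _ in range(3)]
flow = coarse_to_fine_flow(feats, levels)  # shape (1, 2, 128, 128)

Predicting residuals rather than full flows at each level is what lets the coarse levels absorb large displacements while the fine levels only add small corrections, which is the intuition behind the paper's handling of large motion.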