The newest video coding standard, Versatile Video Coding (VVC), was published recently. While it greatly improves compression performance over the preceding High Efficiency Video Coding (HEVC) standard, blocking artifacts still occur under its more flexible block partitioning structures. To reduce blocking artifacts and improve the quality of the reconstructed video frame, an explicit self-attention-based multimodality convolutional neural network (CNN) is proposed in this paper. Considering that the loss scales of different coding units (CUs) can differ considerably, it adaptively adjusts the restoration of each CU according to the CU partition structure and the texture of the reconstructed frame. The proposed method exploits the CU partition map by treating it as a separate modality and combining it with the attention mechanism. Moreover, the unfiltered reconstructed image is also fed into the attention branch, forming an explicit self-attention model. A densely integrated multi-stage fusion is then developed, in which the attention branch is densely fused into the main filtering CNN to adaptively adjust the overall image recovery scale. A thorough analysis of the proposed method is provided, including an ablation study of each module. Experimental results show that the proposed method achieves state-of-the-art performance under the all-intra (AI) configuration, with 7.24% BD-rate savings on average compared with the VVC reference software (VTM).
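To make the described architecture concrete, the following is a minimal PyTorch sketch of the pipeline as the abstract outlines it. The layer widths, depths, number of fusion stages, and the names AttentionBranch and FilteringCNN are illustrative assumptions, not the authors' configuration; the sketch only shows the core idea of an attention branch driven by the CU partition map and the unfiltered reconstruction, with its output fused into the main filtering CNN at multiple stages.

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Explicit self-attention branch (sketch): consumes the unfiltered
    reconstruction plus the CU partition map as a second modality and
    emits one sigmoid attention map per fusion stage."""
    def __init__(self, ch=64, stages=3):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
            for _ in range(stages))

    def forward(self, recon, cu_map):
        # Stack the two modalities channel-wise before shared convolutions.
        f = self.stem(torch.cat([recon, cu_map], dim=1))
        return [head(f) for head in self.heads]

class FilteringCNN(nn.Module):
    """Main filtering CNN (sketch): the attention maps rescale the
    features at every stage, approximating the densely integrated
    multi-stage fusion described in the abstract."""
    def __init__(self, ch=64, stages=3):
        super().__init__()
        self.attn = AttentionBranch(ch, stages)
        self.inp = nn.Conv2d(1, ch, 3, padding=1)
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                          nn.ReLU(inplace=True))
            for _ in range(stages))
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, recon, cu_map):
        attn_maps = self.attn(recon, cu_map)
        f = self.inp(recon)
        for stage, a in zip(self.stages, attn_maps):
            f = stage(f) * a          # fuse one attention map per stage
        return recon + self.out(f)    # residual restoration of the frame
```

Under these assumptions, a forward pass takes the unfiltered reconstruction and the CU partition map as single-channel (N, 1, H, W) tensors, e.g. restored = FilteringCNN()(recon, cu_map), and returns the restored frame as a residual correction on top of the input.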