"An explicit self-attention-based multimodality CNN in-loop filter for versatile video coding"

Versatile Video Coding (VVC), the newest video coding standard, was published recently. Although it greatly improves coding performance over its predecessor, High Efficiency Video Coding (HEVC), blocking artifacts still occur under its more flexible block partitioning structures. To reduce these artifacts and improve the quality of the reconstructed video frame, this paper proposes an explicit self-attention-based multimodality convolutional neural network (CNN). Because the loss scales of different coding units (CUs) can differ considerably, the network adaptively adjusts the restoration of each CU according to the CU partition structure and the texture of the reconstructed frame. The proposed method exploits the CU partition map by treating it as a separate modality and combining it with the attention mechanism. The unfiltered reconstructed image is also fed to the attention branch to enhance it, which forms an explicit self-attention model. A densely integrated multi-stage fusion is then developed, in which the attention branch is densely fused into the main filtering CNN to adaptively adjust the overall image recovery scale. A thorough analysis of the proposed method is provided, with an ablation study on each module. Experimental results show that the proposed method achieves state-of-the-art performance under the all intra (AI) configuration, with average BD-rate savings of 7.24% compared with the VVC reference software (VTM).
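The idea of scaling restoration per CU can be illustrated with a minimal sketch. This is not the paper's network: the helper names `cu_partition_map` and `attention_scaled_restore` are hypothetical, the partition map is assumed to mark CU borders with 1.0, and the attention map is supplied directly rather than being predicted by a CNN branch from the partition map and the unfiltered reconstruction.

```python
def cu_partition_map(height, width, cus):
    """Build a height x width map with 1.0 on CU borders and 0.0 elsewhere.

    `cus` is a list of (top, left, cu_height, cu_width) rectangles.
    Marking borders is an assumed encoding, chosen for illustration only.
    """
    pmap = [[0.0] * width for _ in range(height)]
    for top, left, h, w in cus:
        for y in range(top, top + h):
            for x in range(left, left + w):
                on_border = y in (top, top + h - 1) or x in (left, left + w - 1)
                if on_border:
                    pmap[y][x] = 1.0
    return pmap


def attention_scaled_restore(recon, residual, attention):
    """Return recon + attention * residual, element-wise.

    `residual` stands in for the output of a filtering CNN and `attention`
    for the output of an attention branch; both are plain inputs here.
    """
    return [
        [r + a * d for r, d, a in zip(r_row, d_row, a_row)]
        for r_row, d_row, a_row in zip(recon, residual, attention)
    ]


# One 4x4 CU covering a 4x4 frame: border pixels are marked, the interior is not.
pmap = cu_partition_map(4, 4, [(0, 0, 4, 4)])

# Using the partition map itself as a stand-in attention map, the correction
# is applied only where the map is non-zero.
restored = attention_scaled_restore([[10.0, 10.0]], [[2.0, 2.0]], [[1.0, 0.0]])
```

In this toy setup the correction is applied in full at positions with attention 1.0 and suppressed at positions with attention 0.0, which is the sense in which an attention map can adjust the recovery scale across a frame.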

Keywords: video; attention; video coding; self attention; CNN; explicit self

Journal Title: Multimedia Tools and Applications
Year Published: 2021
