As one of the key technologies of Versatile Video Coding (VVC), the flexible quad-tree with nested multi-type tree (QTMT) partition structure significantly improves rate-distortion (RD) performance. However, this structure brings additional complexity due to the recursive search for the best partition type. Traditional fast partition methods designed for previous encoders cannot adapt to this new, more complex structure, because predicting each block size layer by layer is too complicated. Some indirect bottom-up methods are simple enough but cannot predict specific split structures, which limits their acceleration capacity. Therefore, in this paper, we propose a learning-based approach that effectively predicts the QTMT structure without heuristically exploring the partitions of each layer. First, we propose a hierarchy grid fully convolutional network (HG-FCN) framework, which requires only one inference to obtain the entire partition information of the current CU and its sub-CUs, and whose inference is highly parallel. Second, we design a representation of the complicated QTMT CU partition in the form of a hierarchy grid map (HGM), which directly and effectively predicts the specific hierarchical split structure. Lastly, a dual-threshold decision scheme is adopted to automatically control the trade-off between coding performance and complexity. Extensive experiments demonstrate the effectiveness of HG-FCN, which reduces the complexity of VVC intra coding by 51.15%–65.53% with a negligible 1.17%–2.19% BD-BR increase, outperforming other state-of-the-art methods.
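The dual-threshold decision scheme mentioned above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, threshold values, and action labels are assumptions. The idea is that a confidently high predicted split probability is accepted directly, a confidently low one prunes the branch, and the ambiguous middle band falls back to the encoder's normal recursive RD search.

```python
def decide_partition(split_prob: float,
                     t_low: float = 0.2,
                     t_high: float = 0.8) -> str:
    """Map a predicted split probability to an encoder action.

    Threshold values are illustrative; in the paper they control the
    trade-off between coding performance and complexity reduction.
    """
    if split_prob >= t_high:
        return "split"      # confident: take the predicted partition directly
    if split_prob <= t_low:
        return "no_split"   # confident: prune this partition branch
    return "rd_search"      # uncertain: run the full recursive RD search
```

Widening the band between `t_low` and `t_high` sends more blocks to the full RD search, preserving coding performance at the cost of less speed-up; narrowing it does the opposite.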