Blind image quality assessment (BIQA) for authentic distortions remains challenging, even in today’s deep learning era. It is widely acknowledged that local and global features play complementary roles and are both indispensable for IQA. While combining local and global features is straightforward in traditional handcrafted-feature-based IQA metrics, it is far from trivial in deep learning frameworks, mainly because deep neural networks typically require fixed-size inputs. Existing metrics either resize the image or feed local patches as input, and therefore cannot integrate local and global information, together with their interactions, for comprehensive quality evaluation. Motivated by these observations, this paper presents a new BIQA metric for authentic distortions that aggregates local and global deep features in a Vision Transformer framework. In the proposed metric, selected local regions and the global content are input simultaneously for complementary feature extraction, and the Vision Transformer is employed to model the relationship between different local patches and image quality. A self-attention mechanism is further adopted to capture the interaction between local and global deep features, producing the final image quality score. Extensive experiments on five authentically distorted IQA databases demonstrate that the proposed metric outperforms state-of-the-art methods in both prediction performance and generalization ability.
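To make the described architecture concrete, the sketch below shows one plausible way such a metric could be organized: a resized global view and a few selected local crops are embedded into patch tokens, a learnable quality token attends to all of them through a Transformer encoder (so self-attention can mix local and global information), and a linear head regresses the score from that token. This is not the authors' released code; the module names, dimensions, number of crops, and crop-selection policy are all illustrative assumptions.

```python
# Hypothetical sketch of a local-global Vision-Transformer BIQA model.
# All hyperparameters and names are assumptions for illustration only.
import torch
import torch.nn as nn


class LocalGlobalViTQuality(nn.Module):
    def __init__(self, patch=16, dim=256, depth=4, heads=4,
                 num_local=3, view_size=224):
        super().__init__()
        # Patch embedding shared by the global view and the local crops.
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        tokens_per_view = (view_size // patch) ** 2
        total_tokens = 1 + (1 + num_local) * tokens_per_view  # quality token + all views
        self.quality_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, total_tokens, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, 1)  # regress a scalar quality score

    def _tokens(self, img):
        # (B, 3, H, W) -> (B, N, dim) sequence of patch tokens.
        return self.embed(img).flatten(2).transpose(1, 2)

    def forward(self, global_view, local_crops):
        # global_view: (B, 3, S, S); local_crops: list of (B, 3, S, S) tensors.
        views = [self._tokens(global_view)] + [self._tokens(c) for c in local_crops]
        x = torch.cat(views, dim=1)
        q = self.quality_token.expand(x.size(0), -1, -1)
        x = torch.cat([q, x], dim=1) + self.pos
        x = self.encoder(x)            # self-attention mixes local and global tokens
        return self.head(x[:, 0]).squeeze(-1)  # score read from the quality token


if __name__ == "__main__":
    model = LocalGlobalViTQuality()
    g = torch.randn(2, 3, 224, 224)                          # resized global content
    crops = [torch.randn(2, 3, 224, 224) for _ in range(3)]  # selected local regions
    print(model(g, crops).shape)  # torch.Size([2])
```

In this reading, fusing local and global evidence is delegated entirely to self-attention over the concatenated token sequence; the actual paper may use separate feature extractors or a more elaborate fusion stage.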