
Audio Matters in Video Super-Resolution by Implicit Semantic Guidance

Video super-resolution (VSR) aims to recover high-resolution frames from multiple consecutive low-resolution frames. However, existing VSR methods treat videos only as image sequences, ignoring another essential source of temporal information: audio. In fact, there is a semantic link between audio and vision, and extensive studies have shown that audio can provide supervisory signals for visual networks. Meanwhile, adding semantic priors has proven effective in super-resolution (SR) tasks, but a pretrained segmentation network is required to obtain semantic segmentation maps; by contrast, audio is already contained in the video itself and can be used directly. Therefore, in this study, we propose a novel, pluggable multiscale audiovisual fusion (MS-AVF) module that enhances VSR by exploiting the relevant audio information, which can be regarded as implicit semantic guidance in contrast to explicit segmentation priors. Specifically, we first fuse audio and visual features on the semantic feature maps of the target frames at different granularities, and then, through a top-down multiscale fusion approach, feed the high-level semantics back to the underlying global visual features layer by layer, thereby providing effective implicit audio-based semantic guidance for VSR. Experimental results show that audio can further improve VSR performance. Moreover, by visualizing the learned attention mask, we show that the proposed end-to-end model automatically learns latent audiovisual semantic links, notably improving SR accuracy for sound sources and their surrounding regions.
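
To make the fusion idea concrete, below is a minimal PyTorch sketch of a top-down multiscale audiovisual fusion of the kind the abstract describes: a global audio embedding modulates the visual feature map at each scale through a learned attention mask, and the coarser (more semantic) level is then fed back into the finer level layer by layer. The class names, channel sizes, and the exact attention form are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioVisualFusionBlock(nn.Module):
    """Fuse a global audio embedding with one visual feature map via a learned attention mask."""

    def __init__(self, vis_channels: int, audio_dim: int):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, vis_channels)
        self.mask_conv = nn.Conv2d(vis_channels * 2, vis_channels, kernel_size=1)

    def forward(self, vis: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # vis: (B, C, H, W) visual features; audio: (B, D) audio embedding.
        a = self.audio_proj(audio)              # (B, C)
        a = a[:, :, None, None].expand_as(vis)  # broadcast audio over spatial positions
        mask = torch.sigmoid(self.mask_conv(torch.cat([vis, a], dim=1)))
        return vis * mask + vis                 # residual, attention-modulated features


class MultiScaleAVFusion(nn.Module):
    """Inject audio at every scale, then propagate top-down: coarse (semantic) -> fine (global visual)."""

    def __init__(self, channels_per_scale, audio_dim: int):
        super().__init__()
        self.fuse = nn.ModuleList(
            AudioVisualFusionBlock(c, audio_dim) for c in channels_per_scale
        )
        # 1x1 convs to match channel counts when passing coarse features down to the finer level.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c_coarse, c_fine, kernel_size=1)
            for c_fine, c_coarse in zip(channels_per_scale[:-1], channels_per_scale[1:])
        )

    def forward(self, vis_pyramid, audio):
        # vis_pyramid: list of feature maps, finest first, e.g. [(B,64,64,64), (B,128,32,32), ...]
        fused = [f(v, audio) for f, v in zip(self.fuse, vis_pyramid)]
        out = fused[-1]
        for i in range(len(fused) - 2, -1, -1):  # coarse -> fine, layer by layer
            up = F.interpolate(out, size=fused[i].shape[-2:], mode="bilinear", align_corners=False)
            out = fused[i] + self.lateral[i](up)
        return out                               # audio-guided features at the finest scale


if __name__ == "__main__":
    pyramid = [torch.randn(2, 64, 64, 64), torch.randn(2, 128, 32, 32), torch.randn(2, 256, 16, 16)]
    audio = torch.randn(2, 512)
    model = MultiScaleAVFusion([64, 128, 256], audio_dim=512)
    print(model(pyramid, audio).shape)  # torch.Size([2, 64, 64, 64])

The fused output would then feed a VSR reconstruction head; the sigmoid mask plays the role of the learned attention map that the abstract visualizes to locate sound sources.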

Keywords: vsr; super resolution; semantic guidance; resolution; implicit semantic

Journal Title: IEEE Transactions on Multimedia
Year Published: 2022


