Window self-attention-based Transformers have achieved advanced results in image denoising. However, current methods still have limitations in capturing global dependencies and local responses. To tackle these problems, this paper proposes a novel Transformer-based image denoising method, called CSformer, which is equipped with two key blocks: the cross-scale feature fusion (CS2F) block and the mixed global-local Swin (M-Swin) Transformer block. CSformer adopts a multi-scale framework in which the multi-scale features extracted by the M-Swin Transformer are fused by the CS2F block. Such cross-scale fusion not only enriches the features but also yields multi-scale self-attention. In addition, the M-Swin Transformer block combines the Swin Transformer block with a separable-convolution-based convolutional local-extraction (CLE) block, which strengthens the Transformer's ability to represent local structure. We demonstrate the superiority of CSformer on several well-known datasets at different noise levels, with comparisons to several state-of-the-art methods.
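The abstract notes that the CLE block is built on separable convolution, i.e., a depthwise per-channel spatial filter followed by a pointwise (1x1) channel-mixing step. The sketch below illustrates that factorization in pure Python on 1-D channel lists; all function names, shapes, and weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a depthwise-separable convolution, the operation
# the CLE block is described as using. Shapes and names are assumptions.

def depthwise_conv1d(x, kernels):
    """Per-channel 1-D convolution (no padding, stride 1).
    x: list of channels, each a list of values.
    kernels: one spatial kernel (list) per channel."""
    out = []
    for channel, k in zip(x, kernels):
        n, m = len(channel), len(k)
        out.append([sum(channel[i + j] * k[j] for j in range(m))
                    for i in range(n - m + 1)])
    return out

def pointwise_conv(x, weights):
    """1x1 convolution: mixes channels at every spatial position.
    weights[o][c] maps input channel c to output channel o."""
    positions = len(x[0])
    return [[sum(weights[o][c] * x[c][p] for c in range(len(x)))
             for p in range(positions)]
            for o in range(len(weights))]

def separable_conv1d(x, dw_kernels, pw_weights):
    """Depthwise step (local spatial response per channel), then
    pointwise step (cross-channel fusion). This factorization uses far
    fewer parameters than a full convolution of the same receptive field."""
    return pointwise_conv(depthwise_conv1d(x, dw_kernels), pw_weights)

# Example: two input channels of length 4, kernel size 2, one output channel.
y = separable_conv1d([[1, 2, 3, 4], [0, 1, 0, 1]],
                     [[1, 1], [1, -1]],
                     [[1, 1]])
# y == [[2, 6, 6]]
```

The same depthwise-then-pointwise structure extends to 2-D feature maps; in a framework such as PyTorch it corresponds to a grouped convolution followed by a 1x1 convolution.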