The quality of underwater imagery is inherently degraded by light absorption and scattering, which severely limits its use in critical domains such as marine robotics and archeology. Existing enhancement methods, including recent hybrid models, often struggle to restore fine-grained details without introducing visual artifacts. To overcome this limitation, this work introduces a hybrid U-Net-Transformer (UTR) architecture that combines local feature extraction with global context modeling. The core innovation is a Recurrent Multi-Scale Feature Modulation (R-MSFM) mechanism which, unlike prior recurrent refinement techniques, applies gated modulation across multiple feature scales within the decoder to iteratively refine textural and structural details, preserving spatial information during upsampling. On the EUVP dataset, UTR achieves a PSNR of 28.347 dB, a gain of 3.947 dB over the state-of-the-art UWFormer. It also attains a top-ranking UIQM score of 3.059 on the UIEB dataset, underscoring its robustness. The results indicate that UTR provides a computationally efficient and highly effective solution for underwater image enhancement.
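For readers unfamiliar with the headline metric, PSNR is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the reference and enhanced images and MAX is the largest possible pixel value. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `psnr`, the flat-list image representation, and the 8-bit range (`max_value=255`) are assumptions made for illustration. The last lines only restate the arithmetic in the abstract: a 28.347 dB result with a 3.947 dB gain implies a baseline near 24.400 dB.

```python
import math


def psnr(reference, enhanced, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized
    flat sequences of pixel values. Hypothetical helper for illustration."""
    if len(reference) != len(enhanced) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy data (not from the paper): 48 pixel values, one of them off by 10.
reference = [100] * 48
enhanced = [110] + [100] * 47
toy_psnr = psnr(reference, enhanced)

# Arithmetic implied by the abstract: reported PSNR minus reported gain.
reported_psnr_db = 28.347
reported_gain_db = 3.947
implied_baseline_db = reported_psnr_db - reported_gain_db
```

Because PSNR is logarithmic, a gain of roughly 4 dB corresponds to the mean squared error dropping by a factor of about 10^(0.3947), that is, to around 40% of the baseline error.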
               