In recent years, transformer-based dual-branch frameworks that estimate the magnitude and complex spectra have achieved state-of-the-art performance in monaural speech enhancement. However, insufficient use of the interactive information in the middle layers leaves each branch without the ability to compensate and rectify. To address this problem, this letter proposes a novel dual-branch progressive fusion rectification network (PFRNet) for monaural speech enhancement. PFRNet is an encoder-decoder-based dual-branch structure with interactive improved real & complex transformers. In PFRNet, a fusion rectification block converts the implicit relationship between the two branches into a fusion feature through a frequency-domain mutual attention mechanism. The fusion feature provides a platform for interaction in the middle layers. The interactive time-frequency improved real & complex transformer makes better use of long-term dependencies in the time-frequency domain. Experimental results show that the proposed PFRNet outperforms most advanced dual-branch speech enhancement approaches and previous advanced systems in terms of speech quality and intelligibility.
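The abstract does not specify how the fusion rectification block is built, so the following is only a minimal NumPy sketch of one way a frequency-domain mutual attention between two branches could produce a fused feature. Everything in it is an assumption made for illustration: the function names (`cross_attention`, `mutual_attention_fusion`), the `(time, frequency, channels)` feature layout, the absence of learned projections, and the additive fusion are not taken from PFRNet.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feat, context_feat):
    """Attend over the frequency axis of context_feat for each frame.

    Both inputs have the assumed shape (T, F, C): time frames,
    frequency bins, channels. No learned projections are used here.
    """
    d = query_feat.shape[-1]
    # (T, F, C) @ (T, C, F) -> (T, F, F): similarity between frequency bins
    scores = query_feat @ context_feat.transpose(0, 2, 1) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # (T, F, F) @ (T, F, C) -> (T, F, C)
    return weights @ context_feat


def mutual_attention_fusion(mag_feat, cplx_feat):
    """Hypothetical fusion: each branch queries the other, then sum."""
    mag_from_cplx = cross_attention(mag_feat, cplx_feat)
    cplx_from_mag = cross_attention(cplx_feat, mag_feat)
    return mag_feat + mag_from_cplx + cplx_feat + cplx_from_mag


# Toy features standing in for the two branches (random, not real data).
rng = np.random.default_rng(0)
mag = rng.standard_normal((4, 8, 16))   # magnitude-branch features
cplx = rng.standard_normal((4, 8, 16))  # complex-branch features
fused = mutual_attention_fusion(mag, cplx)
print(fused.shape)
```

In this sketch the fused feature keeps the shape of the branch features, so it could be passed back to either branch in the middle layers. A trained model would normally add learned query, key, and value projections and normalization, which are omitted here.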
               