In this letter, a novel weighted mean square error (WMSE) is proposed to improve the DNN-based mask approximation method for speech enhancement. The weighting is the noisy spectrum amplitude (NSA) raised to a power exponent. The power exponents 0 and 2 correspond, respectively, to ideal amplitude masking (IAM) without any clipping and to indirect mapping (IM) on the short-time spectral amplitude (STSA). Tests show that the choice of exponent strongly affects the enhanced spectrum and the performance of the enhanced signal. The experimental results also show that, for phase-unaware masking, the best-performing weighting is the NSA with power exponent 1, which yields better restoration of harmonic structure. Compared with MSE-based mask approximation methods, the objective function with the WMSE on the NSA (WMSE-NSA) improves the perceptual evaluation of speech quality (PESQ) score by 0.1 and the short-time objective intelligibility (STOI) score by 1.7% on average.
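
The abstract does not state the loss formula, so the following is only a minimal sketch of one form that is consistent with the two limiting cases it describes: the squared mask error weighted by the NSA raised to an exponent `p`. The function name `wmse_nsa`, its parameters, and the `eps` stabilizer are assumptions made for illustration and are not taken from the letter.

```python
import numpy as np

def wmse_nsa(mask_est, clean_mag, noisy_mag, p=1.0, eps=1e-8):
    """Assumed form of an NSA-weighted MSE for mask approximation.

    mask_est  : estimated amplitude mask (e.g. a DNN output)
    clean_mag : clean-speech spectral amplitude
    noisy_mag : noisy spectral amplitude (NSA)
    p         : power exponent applied to the NSA weight
    eps       : small constant to avoid division by zero (assumption)
    """
    # Ideal amplitude mask without clipping: |S| / |Y|
    iam = clean_mag / (noisy_mag + eps)
    # Weight each time-frequency bin by |Y|^p
    weight = noisy_mag ** p
    return float(np.mean(weight * (mask_est - iam) ** 2))
```

Under this assumed form, `p=0` reduces to the plain MSE against the unclipped IAM, and `p=2` equals the squared error between the masked noisy amplitude and the clean amplitude, that is, indirect mapping on the STSA. The exponent `p=1` that the abstract reports as best sits between these two cases.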
               