A voice conversion system modifies the speaker-specific characteristics of the source speaker to match those of the target speaker, so that the converted speech is perceived as the target speaker's. The speaker-specific characteristics of the speech signal are reflected at different levels, such as the shape of the vocal tract, the shape of the glottal excitation, and long-term prosody. The shape of the vocal tract is represented by Line Spectral Frequencies (LSF) and the shape of the glottal excitation by the Linear Predictive (LP) residual. In this paper, a fourth-level wavelet packet transform (WPT) is applied to the LP residual to generate sixteen sub-bands. This approach not only reduces computational complexity but also provides a genuine transformation model, compared with state-of-the-art statistical prediction methods. Alignment is an essential step in voice conversion: it aligns the features of the source and target speakers. In this paper, a warping path based on Mel Frequency Cepstral Coefficients (MFCC) is proposed to align the LSF and the LP-residual sub-bands, using two proposed schemes, constant source alignment and constant target alignment. The conventional alignment technique is compared with both proposed schemes. The analysis shows that constant source alignment using the MFCC warping path performs slightly better than constant target alignment and the state-of-the-art alignment approach. Generalized mapping models are developed for each sub-band using a Radial Basis Function (RBF) neural network and are compared with a Gaussian Mixture Model (GMM) mapping and the residual selection approach. Various subjective and objective evaluation measures indicate a significant performance improvement of the RBF-based residual mapping approach over the state-of-the-art approaches.

Highlights

LSF represents formant peaks well but fails to represent formant valleys. Hence, the warping path calculated from LSF does not yield a satisfactory alignment.
This limitation of LSF-based warping is overcome by a new alignment using an MFCC-based warping path, which improves the conversion performance of the proposed system.
Existing techniques for mapping the LP residual suffer from artifacts generated in consecutive frames. The residual signal is also quite complex to map.
To reduce the high dimensionality of the residual signal and the complexity of the model, WPT and RBF pairs are employed.
The experimental results show that the proposed MFCC-based warping path and the WPT-RBF-based transformation of the residual signal outperform the state-of-the-art residual selection and GMM methods.
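
The idea of computing a warping path on MFCC frames and reusing it to pair other per-frame features can be illustrated with a minimal dynamic time warping sketch. This is not the authors' implementation: the function names `dtw_path` and `align_with_path` are hypothetical, the Euclidean frame distance and the basic three-step pattern are assumptions, and the constant source and constant target variants are not reproduced because the abstract does not define them.

```python
import math

def dtw_path(src, tgt):
    """Return the DTW warping path (list of (i, j) frame index pairs)
    between two sequences of feature vectors, e.g. MFCC frames.
    Assumes Euclidean frame distance and the basic three-step pattern."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the last frame pair to the first one.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path

def align_with_path(path, src_feats, tgt_feats):
    """Pair any per-frame features (e.g. LSF vectors or residual
    sub-band frames) using a path computed on a different feature."""
    return [(src_feats[i], tgt_feats[j]) for i, j in path]

# Toy data (made up for illustration): 1-D "MFCC" frames and frame labels
# standing in for LSF vectors extracted from the same frames.
src_mfcc = [[0.0], [1.0], [2.0]]
tgt_mfcc = [[0.0], [1.0], [1.0], [2.0]]
src_lsf = ["s0", "s1", "s2"]
tgt_lsf = ["t0", "t1", "t2", "t3"]

path = dtw_path(src_mfcc, tgt_mfcc)
pairs = align_with_path(path, src_lsf, tgt_lsf)
print(path)
print(pairs)
```

In this toy case the second source frame is matched to two target frames, so the paired LSF sequence is longer than the source utterance. How such repeated frames are handled is presumably where the constant source and constant target schemes differ, but that is an inference from their names rather than something the abstract states.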
               