The objective of this paper is to demonstrate the significance of combining different features present in the glottal activity region for statistical parametric speech synthesis (SPSS). The features present in the glottal activity regions are broadly categorized as F0, system, and source features, which together represent the quality of speech. The F0 feature is computed with a zero frequency filter, and the system feature is computed with a 2-D Riesz transform. The source features comprise an aperiodicity component and a phase component. The aperiodicity component, which represents the amount of aperiodic energy present in a frame, is computed from the Riesz transform, whereas the phase component is computed by modeling the integrated linear prediction residual. The combined features give better quality than STRAIGHT-based SPSS in both objective and subjective evaluations. Further, the proposed method is extended to two Indian languages, Assamese and Manipuri, where it shows a similar improvement in quality.
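
To make the F0 step concrete, the following is a minimal sketch of zero frequency filtering as it is commonly described in the literature, not the authors' implementation. The function names (`zff_signal`, `f0_from_epochs`), the 10 ms trend-removal window, and the three trend-removal passes are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def zff_signal(speech, fs, window_ms=10.0, passes=3):
    """Zero-frequency-filtered signal (illustrative sketch, assumed parameters)."""
    s = np.asarray(speech, dtype=float)
    # Difference the signal to remove any DC offset.
    y = np.diff(s, prepend=s[0])
    # Two cascaded resonators at 0 Hz, each a double integrator,
    # amount to four cumulative sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by repeatedly subtracting a local mean.
    # The window (assumed) is on the order of one average pitch period.
    n = int(round(window_ms * 1e-3 * fs)) | 1  # force an odd length
    kernel = np.ones(n) / n
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    return y

def f0_from_epochs(zff, fs):
    """Epochs as positive-going zero crossings; F0 as inverse epoch spacing."""
    epochs = np.flatnonzero((zff[:-1] < 0) & (zff[1:] >= 0)) + 1
    f0 = fs / np.diff(epochs)
    return epochs, f0

# Usage on a synthetic 100 Hz impulse train sampled at 8 kHz.
fs = 8000
excitation = np.zeros(fs)
excitation[40::80] = 1.0
epochs, f0 = f0_from_epochs(zff_signal(excitation, fs), fs)
print(np.median(f0))
```

The zero-padded convolution distorts the first and last few windows of the filtered signal, so a robust statistic such as the median, or discarding the edges, is advisable when summarizing the F0 contour.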
               