"A Multilingual to Polyglot Speech Synthesizer for Indian Languages Using a Voice-Converted Polyglot Speech Corpus"

A multilingual synthesizer synthesizes speech, for any given monolingual or mixed-language text, that is intelligible to human listeners. The necessity for such synthesizer arises in a country like India, where multiple languages coexist. For the current work, multilingual synthesizers are developed using HMM-based speech synthesis technique. However, for a mixed-language text, the synthesized speech shows speaker switching at language switching points which is quite annoying to the listener. This is due to the fact that, speech data used for training is collected for each language from a different (native) speaker. To overcome the speaker switching at language switching points, a polyglot speech synthesizer is developed using polyglot speech corpus (all the speech data in a single speaker’s voice). The polyglot speech corpus is obtained using cross-lingual voice conversion (CLVC) technique. In the current work, polyglot synthesizer is developed for five languages namely Tamil, Telugu, Hindi, Malayalam and Indian English. The regional Indian languages considered are acoustically similar, to certain extent, and hence, common phoneset and question set is used to build the synthesizer. Experiments are carried out by developing various bilingual polyglot synthesizers to choose the language (thereby the speaker) that can be considered as target for polyglot synthesizer. The performance of the synthesizers is evaluated subjectively for speaker/language switching using perceptual test and quality using mean opinion score. Speaker identity is evaluated objectively using a GMM-based speaker identification system. Further, the polyglot synthesizer developed using polyglot speech corpus is compared with the adaptation-based polyglot synthesizer, in terms of quality of the synthesized speech and amount of data required for adaptation and voice conversion. It is observed that the performance of the polyglot synthesizer developed using polyglot speech corpus obtained from CLVC technique is better or almost similar to that of the adaptation-based polyglot synthesizer.

Keywords: polyglot speech; synthesizer; polyglot; speech; speech corpus; language

Journal Title: Circuits, Systems, and Signal Processing
Year Published: 2018

Link to full text (if available)

Share on Social Media: Sign Up to like & get
recommendations!
0

LAUSR

You are not signed in:

Sign Up!

Related content

More Information News Social Media Video Recommended