Lhasa-Tibetan Speech Synthesis Using End-to-End Model

With the development of deep learning, speech synthesis based on deep neural networks has gradually become the mainstream approach in the field. In this paper, we explore the Tacotron2 model for speech synthesis in the Lhasa-Tibetan dialect. We construct a feature prediction network with a seq2seq structure that maps character vectors to a Mel spectrum, and combine it with a WaveNet model, trained in a semi-supervised way, that converts the Mel spectrum into a time-domain waveform. The model avoids the front-end text analysis that requires extensive prior knowledge of the Lhasa-Tibetan dialect, and it reduces the need for a large amount of transcribed speech data. Experimental results show that the proposed method is effective and achieves higher clarity and naturalness than other related synthesis models for the Lhasa-Tibetan dialect.
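
To illustrate what character-level input without front-end text analysis can look like, here is a minimal sketch that turns raw Tibetan text into the integer IDs a seq2seq feature prediction network would embed. The abstract does not state the paper's actual symbol set or preprocessing, so the names `build_vocab` and `encode`, the `<pad>` and `<eos>` markers, and the one-ID-per-Unicode-character scheme are all assumptions made for illustration. The Tacotron2 and WaveNet stages themselves are not shown.

```python
# Illustrative only: the paper's real symbol inventory is not given in the abstract.
PAD, EOS = "<pad>", "<eos>"  # hypothetical special symbols


def build_vocab(texts):
    """Collect every distinct Unicode character seen in the corpus."""
    chars = sorted({ch for text in texts for ch in text})
    # Special symbols take IDs 0 and 1; characters follow in code point order.
    return {sym: idx for idx, sym in enumerate([PAD, EOS] + chars)}


def encode(text, vocab):
    """Map a string to integer IDs, one per character, plus an end marker."""
    return [vocab[ch] for ch in text] + [vocab[EOS]]


corpus = ["བཀྲ་ཤིས་བདེ་ལེགས།"]  # "Tashi delek", a common Tibetan greeting
vocab = build_vocab(corpus)
ids = encode(corpus[0], vocab)
print(ids)
```

No syllable segmentation, phoneme lexicon, or prosody labels are involved here: the network is left to learn pronunciation from the character sequence, which is the property the abstract highlights.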

Keywords: speech synthesis; end-to-end model; Lhasa-Tibetan

Journal Title: IEEE Access
Year Published: 2019
