A Controllable Multi-Lingual Multi-Speaker Multi-Style Text-to-Speech Synthesis With Multivariate Information Minimization

In this letter, we propose a multivariate information minimization method that disentangles three or more latent representations. We show that control factors can be disentangled by minimizing their interactive dependency, which can be expressed as a sum of mutual information upper-bound terms. Because the upper-bound estimate converges early in training, the auxiliary loss causes little performance degradation. We apply the proposed technique to train a text-to-speech synthesizer on multi-lingual, multi-speaker, and multi-style corpora. Subjective listening tests validate that the proposed method can improve both the quality and the controllability of the synthesizer.
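The abstract does not spell out how a dependency among three or more variables becomes "a sum of mutual information terms". One standard measure with exactly this property is total correlation, TC(X1, ..., Xn) = sum_i H(Xi) - H(X1, ..., Xn), which the chain rule splits into sum_{i=2..n} I(X1..X(i-1); Xi). The sketch below assumes that reading; it is not necessarily the letter's definition of interactive dependency. It computes the exact terms on a toy discrete distribution with NumPy, whereas the letter minimizes estimated upper bounds of such terms on learned latent representations during training, which this sketch does not attempt. The helper names (`total_correlation`, `chain_rule_sum`, and so on) are hypothetical.

```python
import numpy as np

# Assumption: "interactive dependency" is illustrated here with total correlation,
# a standard multivariate dependency measure; the letter's exact definition may differ.

def entropy(p):
    """Shannon entropy in nats of a pmf array; zero-probability cells are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(pxy):
    """I(X;Y) for a 2-D joint pmf indexed as pxy[x, y]."""
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

def total_correlation(p):
    """TC = sum of marginal entropies minus the joint entropy of an n-D pmf."""
    n = p.ndim
    marginals = sum(
        entropy(p.sum(axis=tuple(j for j in range(n) if j != i))) for i in range(n)
    )
    return marginals - entropy(p)

def chain_rule_sum(p):
    """Sum of I(X_1..X_{i-1}; X_i) over i = 2..n: TC written as mutual information terms."""
    n = p.ndim
    total = 0.0
    for i in range(1, n):
        # Marginalize out X_{i+1}..X_n so only X_1..X_i remain.
        sub = p.sum(axis=tuple(range(i + 1, n))) if i + 1 < n else p
        # Merge X_1..X_{i-1} into a single axis and keep X_i as the second axis.
        total += mutual_information(sub.reshape(-1, sub.shape[-1]))
    return total

rng = np.random.default_rng(0)
p = rng.random((3, 4, 2))
p /= p.sum()  # a random joint pmf over three discrete variables
print(total_correlation(p), chain_rule_sum(p))  # the two values agree
```

Driving every mutual information term toward zero drives the total toward zero, which for total correlation means the variables become mutually independent. This is why a sum of per-term upper bounds can serve as a single training penalty.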

Keywords: multi-lingual; multivariate information; information minimization; text-to-speech

Journal Title: IEEE Signal Processing Letters
Year Published: 2022
