Emotional Internet of Things (EmIoT), which provides Internet of Things (IoT) devices with cognitive and socialization capabilities, has been regarded as a future direction for improving user experience. With the development of intelligent techniques, EmIoT is required not only to sense users’ emotional states but also to provide emotional feedback. Human–computer interaction research has explored speech-based interaction with IoT devices, and recent advances in neural text-to-speech (TTS) have made “human parity” synthesized speech possible for IoT-enabled human–computer interaction. Furthermore, emotion can be controlled by conditioning a unified model on emotion codes, an approach referred to as emotional TTS (ETTS). Such ETTS models have achieved promising emotional expressiveness using large-scale emotion-annotated English data sets; however, they are impractical in IoT environments that use other mainstream languages, especially Chinese. In fact, the scarcity of large-scale emotion-annotated data sets hampers the development of Chinese ETTS. To address this, we propose a multistage deep transfer learning scheme for building a high-quality Chinese ETTS system from a small-scale training corpus, enabling EmIoT in Mandarin environments. In this scheme, knowledge pretrained in the earlier stages, on a large-scale neutral English corpus and then a medium-scale emotional English corpus, is transferred to a Mandarin ETTS model. As a result, the trained model can synthesize high-quality emotional speech from a limited emotional corpus and can serve various EmIoT-oriented applications. Experiments demonstrate the effectiveness and superiority of the proposed model over its counterparts in terms of naturalness and emotional expressiveness. Readers are invited to visit our demo webpage¹ to listen to the synthesized speech samples.
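The core idea of a multistage transfer scheme is that each stage is warm-started from the parameters learned in the previous stage rather than from scratch. The following is a minimal toy sketch of that idea under stated assumptions, not the authors' implementation: a two-parameter linear model stands in for the ETTS network, and three synthetic regression tasks stand in for the large neutral English corpus, the medium emotional English corpus, and the small Mandarin emotional corpus. All function names (`make_data`, `train`, `run_schedule`) and all data values are hypothetical.

```python
from typing import List, Sequence, Tuple

# Toy stand-in for an ETTS network: y = w * x + b, stored as (w, b).
Params = Tuple[float, float]


def make_data(n: int, slope: float, offset: float) -> Tuple[List[float], List[float]]:
    """Hypothetical synthetic 'corpus': n evenly spaced points on [0, 1]."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [slope * x + offset for x in xs]
    return xs, ys


def mse(params: Params, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the toy model on one data set."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(params: Params, xs: Sequence[float], ys: Sequence[float],
          steps: int, lr: float = 0.5) -> Params:
    """Full-batch gradient descent on MSE, starting from the given params."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def run_schedule(stages, init: Params = (0.0, 0.0)) -> Params:
    """Train stage by stage; each stage starts from the previous stage's params."""
    params = init
    for xs, ys, steps in stages:
        params = train(params, xs, ys, steps)
    return params


# Assumed analogy: stage 1 ~ large neutral English, stage 2 ~ medium emotional
# English, target ~ small Mandarin emotional corpus (related but shifted tasks).
stage1 = (*make_data(100, 3.0, 1.0), 500)
stage2 = (*make_data(20, 3.0, 1.5), 200)
xs3, ys3 = make_data(3, 3.0, 2.0)

pretrained = run_schedule([stage1, stage2])
transferred = train(pretrained, xs3, ys3, steps=5)  # warm start, few updates
scratch = train((0.0, 0.0), xs3, ys3, steps=5)      # same budget, no transfer

print(f"transferred MSE: {mse(transferred, xs3, ys3):.4f}")
print(f"from-scratch MSE: {mse(scratch, xs3, ys3):.4f}")
```

With the same small update budget on only three target points, the warm-started model ends much closer to the target than the one trained from scratch, which is the effect such a scheme relies on when the emotional Mandarin corpus is small. A real system would additionally have to decide which modules carry over across the language switch (for example, English and Mandarin use different input symbol sets), which this toy does not model.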
               
Click one of the above tabs to view related content.