Advances in emotional speech recognition and synthesis essentially rely on the availability of annotated emotional speech corpora. As a low resource language, the Thai language critically lacks corpora of emotional… Click to show full abstract
Advances in emotional speech recognition and synthesis essentially rely on the availability of annotated emotional speech corpora. As a low resource language, the Thai language critically lacks corpora of emotional speech, although a few corpora have been constructed for speech recognition and synthesis. This paper presents the design of a Thai emotional speech corpus (namely EMOLA), its construction and annotation process, and its analysis. In the corpus design, four basic types with twelve subtypes of emotions are defined with consideration of the Pleasure-Arousal-Dominance emotional state model. To construct the corpus, a series of Thai dramas (1397 min) were selected and its video clips of approximately 868 min were annotated. As a result, 8987 transcriptions (of conversation turns) were derived in total, with each transcription tagged as one basic type and a few subtypes. Finally, an analysis was conducted to describe the characteristics of this corpus in three sets of statistics: collection-level, annotator-oriented and actor-oriented statistics.
               
Click one of the above tabs to view related content.