In human-machine interaction systems, speech emotion recognition plays a key role. Recognition of categorical emotions has improved greatly over the last few decades, but emotion recognition from spontaneous speech remains very challenging. This paper investigates emotion recognition from spontaneous speech in a three-dimensional model, where each dimension represents one primitive, generic attribute of an emotion. Middle levels for each dimension are introduced. An LSTM network is employed to estimate the dimensions, owing to its effectiveness in speech emotion recognition. In experiments on the IEMOCAP database, the accuracy is 30–35%. The confusion matrices show that our method yields a more concentrated dimension location. Furthermore, the estimated dimensions are applied to categorical emotion recognition. These results indicate that increasing the number of dimension levels makes dimension estimation feasible, and suggest that dimensions can help promote speech emotion recognition.
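The setup described above (an LSTM estimating discrete levels of three emotion dimensions from frame-level acoustic features) could be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the feature size, hidden size, number of levels, and the choice of one classification head per dimension are all assumptions.

```python
import torch
import torch.nn as nn

class DimensionLSTM(nn.Module):
    """Hypothetical sketch: map a sequence of frame-level acoustic
    features to discrete levels of three emotion dimensions
    (e.g., valence, activation, dominance). Sizes are assumptions."""
    def __init__(self, n_features=40, hidden=64, n_levels=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # one classification head per dimension, each over n_levels classes
        self.heads = nn.ModuleList(nn.Linear(hidden, n_levels) for _ in range(3))

    def forward(self, x):
        # x: (batch, time, n_features); summarize with the final hidden state
        _, (h, _) = self.lstm(x)
        h = h[-1]                                 # (batch, hidden)
        return [head(h) for head in self.heads]   # 3 tensors of (batch, n_levels)

model = DimensionLSTM()
logits = model(torch.randn(2, 100, 40))           # 2 utterances, 100 frames each
levels = [l.argmax(dim=1) for l in logits]        # predicted level per dimension
```

Each head would be trained with a standard cross-entropy loss on its dimension's discretized labels; the predicted dimension levels could then serve as features for a downstream categorical-emotion classifier.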