The most commonly adopted approaches in speech emotion recognition (SER) utilize magnitude-spectrum and nonlinear Teager energy operator (TEO) based features, while information about the phase spectrum is often omitted because of the signal processing difficulties involved. We present a study of two phase-based feature sets that represent the phase structure of speech, relative phase shift (RPS) features and modified group delay features (MODGDF), in the task of emotional arousal recognition. The evaluation is performed on the CRISIS acted speech database, which allows us to recognize five levels of emotional arousal from speech. To exploit these features, we employ a deep neural network. The efficiency of the approaches based on these features is compared to a baseline platform using Mel frequency cepstral coefficients (MFCCs) and all-pole group delay features (APGD). The combination of anot...
               