Multichannel CNN-BLSTM Architecture for Speech Emotion Recognition System by Fusion of Magnitude and Phase Spectral Features Using DCCA for Consumer Applications

Conventional Speech Emotion Recognition (SER) approaches emphasize magnitude spectrum-based features, such as Mel Frequency Cepstral Coefficients (MFCCs) and the Mel spectrogram. Phase information, however, is ignored because of signal processing difficulties such as the phase wrapping issue. This work develops a multichannel Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM) architecture with an attention mechanism for speaker-independent SER that considers both phase and magnitude spectrum-based features. The phase-based features are extracted using the Modified Group Delay Function (MODGD). The obtained phase features are combined with MFCC features. The CNN-BLSTM network extracts learned representations from the magnitude and phase features. The learned representations from the MFCC and MODGD features are combined and given as input to a Support Vector Machine (SVM) for classification. Deep Canonical Correlation Analysis (DCCA) is used to maximize the correlation between magnitude and phase information to improve the conventional SER system’s performance. The IEMOCAP database is used for performance analysis. The experimental results show improvement over MFCC features and existing approaches for unimodal SER. The authors also developed a real-time web server application for the proposed architecture.
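
The phase-feature step mentioned in the abstract can be illustrated with a minimal sketch of a modified group delay computation for a single frame. It follows the commonly cited formulation, in which the group delay numerator is divided by a cepstrally smoothed spectrum and then compressed. The function name `modgd` and the values chosen for `n_fft`, `alpha`, `gamma`, and `lifter` are illustrative assumptions, not settings taken from the paper, and the paper's exact feature pipeline may differ.

```python
import numpy as np

def modgd(frame, n_fft=512, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay of one frame (parameter values are assumptions)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of n * x[n]

    # Cepstrally smoothed magnitude spectrum S: keep only the
    # low-quefrency part of the real cepstrum (symmetric lifter).
    log_mag = np.log(np.abs(X) + 1e-10)
    cep = np.fft.irfft(log_mag, n_fft)
    cep[lifter:n_fft - lifter + 1] = 0.0
    S = np.exp(np.fft.rfft(cep, n_fft).real)

    # Group delay numerator divided by the smoothed spectrum,
    # followed by sign-preserving compression with exponent alpha.
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2.0 * gamma))
    return np.sign(tau) * np.abs(tau) ** alpha

# Sanity check: a unit impulse delayed by 3 samples has group delay 3.
impulse = np.zeros(64)
impulse[3] = 1.0
delay = modgd(impulse, n_fft=128, alpha=1.0, gamma=1.0)
```

With `alpha` and `gamma` both set to 1, the delayed impulse yields a value of 3 at every frequency bin, which matches its true group delay. In practice a frame would typically be windowed before being passed in, and the per-frame outputs stacked into a time-frequency representation.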

Keywords: magnitude; cnn blstm; phase; magnitude phase; emotion recognition; speech emotion

Journal Title: IEEE Transactions on Consumer Electronics
Year Published: 2023
