This paper describes a new posed multimodal emotional dataset and compares human emotion classification based on four different modalities: audio, video, electromyography (EMG), and electroencephalography (EEG). Results are reported for several baseline approaches using various feature extraction techniques and machine learning algorithms. First, we collected a dataset from 11 human subjects expressing six basic emotions and a neutral emotion. We then extracted features from each modality using principal component analysis, autoencoders, convolutional networks, and Mel-frequency cepstral coefficients (MFCCs), some unique to individual modalities. A number of baseline models were applied to compare classification performance in emotion recognition, including k-nearest neighbors (KNN), support vector machines (SVM), random forest, multilayer perceptron (MLP), long short-term memory (LSTM) networks, and convolutional neural networks (CNN). Our results show that bootstrapping the biosensor signals (i.e., EMG and EEG) can greatly improve emotion classification performance by reducing noise. For these signals, the best classification results are obtained with a traditional KNN, whereas audio and image sequences of human emotion are better classified with an LSTM.
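As an illustration of one of the baselines listed above (MFCC features for the audio modality combined with a KNN classifier), the sketch below shows a minimal pipeline assuming librosa and scikit-learn; the function names, file handling, and hyperparameters are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical baseline sketch: MFCC summary features + KNN emotion classifier.
# Paths, labels, and hyperparameters are placeholders for illustration only.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score


def mfcc_features(wav_path, n_mfcc=13):
    """Load an audio clip and summarize it as a fixed-length MFCC vector."""
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Mean and standard deviation over time yield one vector per clip.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def run_knn_baseline(wav_paths, labels, n_neighbors=5):
    """Train and evaluate a KNN classifier on MFCC summary features."""
    X = np.stack([mfcc_features(p) for p in wav_paths])
    y = np.asarray(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))
```

A similar structure applies to the other modalities: the feature extractor (PCA, autoencoder, or convolutional features for EMG, EEG, and video frames) is swapped in, and the classifier is replaced by SVM, random forest, MLP, LSTM, or CNN as appropriate.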