Abstract. Automatic emotion recognition for video clips has become a popular area of research in recent years. Previous studies have explored emotion recognition through monomodal approaches, such as voice, text, facial expression, and physiological information. We focus on the complementarity of information across modalities and construct an automatic emotion recognition model based on deep learning and a multimodal fusion strategy. In this model, visual, audio, and text features are extracted from the video clips. A decision-level fusion strategy, based on the theory of evidence, is proposed to fuse the multiple classification results. To address the problem of evidence conflict in evidence theory, we propose a compatibility algorithm that corrects conflicting evidence using a similarity matrix computed over the bodies of evidence. This approach is shown to improve the accuracy of emotion recognition.
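The abstract does not spell out the fusion step in detail, so the following is only an illustrative sketch. It shows one common way to realize similarity-corrected evidence fusion: each modality's classifier output is treated as a basic probability assignment (BPA) over singleton emotion classes, a pairwise similarity matrix yields a credibility weight per modality, and the weighted-average BPA is then combined via Dempster's rule (a Murphy/Deng-style scheme). The label set, function names, and the choice of cosine similarity are assumptions for illustration, not the paper's exact compatibility algorithm.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]  # hypothetical label set

def dempster_combine(m1, m2):
    """Dempster's rule for BPAs restricted to singleton hypotheses."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    joint = np.outer(m1, m2)
    agreement = np.diag(joint).sum()   # mass on agreeing hypotheses
    conflict = 1.0 - agreement         # mass lost to conflicting pairs
    if agreement <= 0.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return np.diag(joint) / agreement  # renormalize by 1 - conflict

def similarity_weighted_fusion(bpas):
    """Correct conflicting evidence with a similarity matrix, then fuse.

    Each row of `bpas` is one modality's BPA. Pairwise cosine similarity
    gives each modality a credibility weight; the weighted-average BPA is
    combined with itself n-1 times via Dempster's rule.
    """
    bpas = np.asarray(bpas, float)
    n = len(bpas)
    unit = bpas / np.linalg.norm(bpas, axis=1, keepdims=True)
    sim = unit @ unit.T                # similarity matrix of the evidence
    support = sim.sum(axis=1) - 1.0    # support received from other bodies
    weights = support / support.sum()  # credibility of each modality
    corrected = weights @ bpas         # similarity-weighted average BPA
    fused = corrected
    for _ in range(n - 1):
        fused = dempster_combine(fused, corrected)
    return fused

# Hypothetical per-modality classifier outputs (visual, audio, text);
# the audio evidence conflicts with the other two modalities.
visual = [0.6, 0.2, 0.1, 0.1]
audio  = [0.1, 0.6, 0.2, 0.1]
text   = [0.5, 0.2, 0.2, 0.1]
fused = similarity_weighted_fusion([visual, audio, text])
print(dict(zip(EMOTIONS, fused.round(3))))
```

Because the conflicting audio evidence receives a lower credibility weight, the fused BPA leans toward the class supported by the two mutually consistent modalities, which is the behavior the compatibility correction is meant to produce.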
               