
Lexical Dependent Emotion Detection Using Synthetic Speech Reference


This paper aims to create neutral reference models from synthetic speech against which the emotional content of a speech signal can be contrasted. Modeling emotional behaviors is challenging because of the variability in how emotions are perceived and described. Previous studies have indicated that relative assessments are more reliable than absolute assessments. These studies suggest that comparing a target sentence against a reference signal with known emotional content (e.g., neutral emotion) may produce more reliable metrics for identifying emotional segments. Ideally, we would have an emotionally neutral sentence with the same lexical content as the target sentence, with the two time-aligned. In this idealized scenario, we could identify localized emotional cues by contrasting the acoustic features of the target and reference sentences frame by frame. This paper explores building such reference sentences by leveraging advances in speech synthesis. We build a synthetic speech signal that conveys the same lexical information as the target sentence in the database and is time-aligned with it. Since a single synthetic signal is not expected to capture the full range of variability observed in neutral speech, we build multiple synthetic sentences using various voices and text-to-speech approaches. Using feature analysis and perceptual evaluation, we analyze whether the synthesized signals provide valid template references for describing neutral speech. Finally, we demonstrate how this framework can be used in emotion recognition, achieving improvements over classifiers trained with state-of-the-art features in detecting low versus high levels of arousal and valence.
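
The frame-by-frame contrast described in the abstract can be illustrated with a minimal sketch. The abstract does not state which acoustic features or which contrast measure the authors use, so everything here is an assumption made for illustration: the function names `contrast_frames` and `frame_scores` are hypothetical, the references are assumed to be already time-aligned with the target, and the deviation is taken to be a z-score against the mean and spread of the synthetic references.

```python
from math import sqrt
from statistics import fmean, pstdev


def contrast_frames(target, references, eps=1e-8):
    """Deviation of each target feature from a set of neutral references.

    target: list of T frames, each a list of D feature values.
    references: list of K utterances shaped like `target`, assumed to be
        time-aligned with it (frame t of each reference matches frame t
        of the target).
    Returns a T x D nested list of deviations, expressed in units of the
    spread observed across the references (an illustrative choice).
    """
    if not references:
        raise ValueError("need at least one reference utterance")
    for ref in references:
        if len(ref) != len(target):
            raise ValueError("references must have as many frames as the target")

    deviations = []
    for t, frame in enumerate(target):
        row = []
        for d, value in enumerate(frame):
            # Values of this feature at this frame across all references.
            ref_values = [ref[t][d] for ref in references]
            mean = fmean(ref_values)
            spread = pstdev(ref_values)
            # eps avoids division by zero when all references agree.
            row.append((value - mean) / (spread + eps))
        deviations.append(row)
    return deviations


def frame_scores(deviations):
    """Root-mean-square deviation across features, one score per frame."""
    return [sqrt(fmean([v * v for v in row])) for row in deviations]
```

Frames with a high score are those where the target departs most from the range covered by the neutral references, which is one simple way to flag candidate localized emotional cues. Using several references rather than one lets the spread across voices set the scale of what counts as a large deviation.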

Keywords: synthetic speech; paper; sentence; reference; speech; emotion

Journal Title: IEEE Access
Year Published: 2019
