
Nemesis: Neural Mean Teacher Learning-Based Emotion-Centric Speaker



Image captioning is the multi-modal task of automatically describing a digital image based on its contents and their semantic relationships. This research area has gained popularity over the past few years; however, most previous studies have focused on purely objective, content-based descriptions of image scenes. In this study, we aim to generate more engaging captions by leveraging human-like emotional responses. To this end, a mean teacher learning-based method is applied to the recently introduced ArtEmis dataset, the first large-scale dataset for emotion-centric image captioning, which contains 455K emotional descriptions of 80K artworks from WikiArt. The method establishes a self-distillation relationship between memory-augmented language models with meshed connectivity. These language models are first trained in a cross-entropy phase and then fine-tuned in a self-critical sequence training phase. On popular natural language processing metrics, including BLEU, METEOR, ROUGE-L, and CIDEr, our proposed model achieves a new state of the art on ArtEmis.
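The mean teacher self-distillation and the self-critical fine-tuning stage mentioned in the abstract can be sketched roughly as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: parameters and rewards are plain Python floats standing in for tensors, the KL-divergence distillation term and the EMA decay `alpha` are assumed choices (the abstract does not give the exact loss or hyperparameters), and the rewards passed to `scst_loss` would come from a caption metric such as CIDEr, which is the usual choice in self-critical sequence training but is also an assumption here. All function names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical helpers: plain floats stand in for model tensors.


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.999) -> Dict[str, float]:
    """Mean-teacher step: each teacher weight becomes an exponential
    moving average of its own value and the student's current value."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary distribution."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(student_logits: List[float],
                    teacher_logits: List[float]) -> float:
    """KL(teacher || student) for one decoding step (an assumed loss);
    minimizing it pulls the student's word distribution toward the teacher's."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student))


def scst_loss(sample_log_prob: float, sample_reward: float,
              greedy_reward: float) -> float:
    """Self-critical loss for one sampled caption: the reward of the
    greedy-decoded caption serves as the baseline, so only samples that
    beat the greedy output have their log-probability pushed up."""
    advantage = sample_reward - greedy_reward
    return -advantage * sample_log_prob


if __name__ == "__main__":
    teacher = {"w": 0.0}
    student = {"w": 1.0}
    print(ema_update(teacher, student, alpha=0.5))  # {'w': 0.5}
    print(scst_loss(sample_log_prob=-2.0, sample_reward=1.5,
                    greedy_reward=0.5))  # 2.0
```

In a mean teacher setup the teacher receives no gradient updates of its own; it only tracks the student through an update like `ema_update`, which tends to give smoother targets for distillation. In the self-critical term, a negative advantage flips the sign of the loss, so sampled captions scoring below the greedy baseline are made less likely.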

Keywords: emotion centric; mean teacher; learning based; teacher learning; image

Journal Title: Algorithms
Year Published: 2023
