Incorporating Relative Position Information in Transformer-Based Sign Language Recognition and Translation

Recent advancements in machine translation, driven by attention mechanisms and Transformer networks, have accelerated research in Sign Language Translation (SLT), a spatio-temporal vision translation task. Fundamentally, Transformers are unaware of the sequential ordering of their input, so position information must be fed into them explicitly; their sequence learning capability depends heavily on this ordering information. Whereas existing Transformer models for SLT use the baseline architecture with sinusoidal position embeddings, this work focuses on incorporating a new positioning scheme into Transformer networks in the context of SLT. It is the first work in SLT to explore the positioning scheme of Transformers for optimizing translation scores. The study proposes the Gated Recurrent Unit (GRU)-Relative Sign Transformer (RST) for jointly learning Continuous Sign Language Recognition (CSLR) and translation, a model that significantly improves video translation quality. In this approach, the GRU acts as the relative position encoder, and the RST is a Transformer model with relative position incorporated into the Multi-Head Attention (MHA). Evaluation was carried out on the RWTH-PHOENIX-2014T benchmark dataset. The study reports a state-of-the-art Bilingual Evaluation Understudy (BLEU-4) score of 22.4 and a Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 48.55 for SLT with GRU-RST. The best Word Error Rate (WER) obtained with this approach is 23.5. A detailed study of the position encoding schemes of Transformers is presented, and translation performance is analyzed under various combinations of the positioning schemes.
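The abstract states that a GRU serves as the relative position encoder and that relative position is injected into the Multi-Head Attention, but it does not spell out the exact formulation. The sketch below is a minimal, assumption-based illustration in PyTorch of how such a scheme can plug into MHA, in the spirit of additive relative-position scores (Shaw et al., 2018): a GRU is run over embeddings of clipped relative offsets, and the resulting encodings contribute an extra term to the attention scores. Class names such as `RelativePositionGRU` and `RelativeMultiHeadAttention`, and all hyperparameters, are illustrative placeholders, not the authors' implementation.

```python
# Hedged sketch of relative-position-aware multi-head attention (not the paper's exact GRU-RST).
import math
import torch
import torch.nn as nn


class RelativePositionGRU(nn.Module):
    """Hypothetical relative position encoder: a GRU run over learned
    embeddings of clipped relative offsets, one vector per (query, key) pair."""

    def __init__(self, d_head: int, max_distance: int = 16):
        super().__init__()
        self.max_distance = max_distance
        # One embedding per clipped relative offset in [-max_distance, max_distance].
        self.offset_embed = nn.Embedding(2 * max_distance + 1, d_head)
        self.gru = nn.GRU(d_head, d_head, batch_first=True)

    def forward(self, length: int) -> torch.Tensor:
        device = self.offset_embed.weight.device
        positions = torch.arange(length, device=device)
        rel = positions[None, :] - positions[:, None]                      # (L, L), offsets j - i
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        emb = self.offset_embed(rel)                                       # (L, L, d_head)
        out, _ = self.gru(emb)                                             # GRU over the key axis
        return out                                                         # (L, L, d_head)


class RelativeMultiHeadAttention(nn.Module):
    """Multi-head self-attention with an additive relative-position term
    in the score matrix; shows where a relative scheme plugs into MHA."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, max_distance: int = 16):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.rel_pos = RelativePositionGRU(self.d_head, max_distance)

    def forward(self, x: torch.Tensor, mask=None) -> torch.Tensor:
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (B, n_heads, L, d_head).
        q, k, v = (t.view(B, L, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))

        content_scores = q @ k.transpose(-2, -1)                           # (B, H, L, L)
        rel_enc = self.rel_pos(L)                                          # (L, L, d_head)
        rel_scores = torch.einsum('bhid,ijd->bhij', q, rel_enc)            # (B, H, L, L)

        scores = (content_scores + rel_scores) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = scores.softmax(dim=-1)
        ctx = (attn @ v).transpose(1, 2).reshape(B, L, -1)
        return self.out(ctx)


# Usage sketch: a batch of 2 sign-video feature sequences of length 10.
mha = RelativeMultiHeadAttention(d_model=512, n_heads=8)
features = torch.randn(2, 10, 512)
out = mha(features)   # (2, 10, 512)
```

The key design point the abstract hinges on is visible in the score computation: instead of adding sinusoidal embeddings to the input once, the relative term depends on the offset between each query and key position, which is what a relative positioning scheme contributes inside the attention itself.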

Keywords: position; relative position; transformer; translation; sign language

Journal Title: IEEE Access
Year Published: 2021
