Sign language recognition converts sign language into text or speech, bridging communication between the hearing-impaired and the rest of society. Recently, sequence-to-sequence video-to-text (S2VT) models have been employed as an effective method for sign language recognition. However, S2VT models contain more than 20 million trainable parameters, which results in high memory and computational cost and makes them hard to deploy on mobile devices. To overcome this issue, we proposed employing tensor-train decomposition in S2VT models to reduce the number of parameters. First, the impact of the tensor-train factorization parameters on model performance was investigated systematically. We then applied tensor-train decomposition to different layers of an S2VT model, establishing six tensor-train S2VT models for Chinese sign language recognition. The experimental results demonstrated that when the fully-connected layer and the first LSTM layer of the S2VT model were represented in tensor-train format, the model obtained the best performance, retaining high accuracy while significantly reducing parameter count and memory footprint. The proposed tensor-train S2VT models can also be applied to other sequence-to-sequence problems to improve performance.
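To illustrate the compression idea behind the abstract, the sketch below shows a generic TT-SVD decomposition of a tensor into tensor-train cores, and how replacing a reshaped fully-connected weight matrix with its cores reduces the stored parameter count. This is a minimal NumPy sketch of the general technique, not the paper's implementation; the function names, the example tensor shape, and the rank bound `max_rank` are illustrative assumptions.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Decompose a d-way tensor into TT cores G_k of shape
    (r_{k-1}, n_k, r_k) via sequential truncated SVDs (TT-SVD).
    max_rank caps every internal TT rank r_k (an assumed knob,
    analogous to the factorization parameters studied in the paper)."""
    shape = tensor.shape
    d = len(shape)
    cores = []
    r_prev = 1
    mat = tensor
    for k in range(d - 1):
        # Unfold: rows combine the previous rank with the current mode.
        mat = mat.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        # Carry the truncated remainder to the next step.
        mat = S[:r, None] * Vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor (for checking)."""
    res = cores[0]                      # (1, n_0, r_1)
    for core in cores[1:]:
        res = np.tensordot(res, core, axes=([-1], [0]))
    return res.reshape([c.shape[1] for c in cores])

# Example: a 1024x1024 fully-connected weight matrix reshaped into a
# 4-way tensor (hypothetical mode sizes), then compressed with rank 8.
W = np.random.randn(1024, 1024)
tensor = W.reshape(32, 32, 32, 32)
cores = tt_svd(tensor, max_rank=8)

full_params = tensor.size
tt_params = sum(c.size for c in cores)
print(f"full: {full_params}, TT: {tt_params}")  # TT stores far fewer values
```

With a small rank bound the cores store orders of magnitude fewer values than the dense matrix, at the cost of an approximation error controlled by the truncated singular values; this trade-off is exactly what varying the factorization parameters explores.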