Silent Speech Interfaces (SSIs) based on neuromuscular signals have become popular in Human–Computer Interaction. Articulatory neuromuscular activity can be captured with surface electromyography (sEMG). However, in the absence of acoustic sound, converting the articulatory neuromuscular signals of silent speakers into corresponding audio remains challenging. This article proposes a Mandarin-based SSI: a novel converter from the articulatory neuromuscular signals of silent speakers to audio, assisted by an auxiliary set of paired sEMG and audio recordings from vocal speakers. The proposed method is based on a hybrid framework that combines convolutional neural networks (CNNs) and a Transformer to exploit both local and global information. We experimentally validate the feasibility of the proposed method, obtaining an average objective character error rate (CER) of 10.69% with four silent speakers, measured by an automatic speech recognition (ASR) evaluation tool. The results show that our Mandarin-based SSI with vocal-speaker assistance facilitates the conversion from sEMG to audio in the cross-subject scenario.
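The CER reported above compares the ASR transcript of the converted audio against the reference text. A minimal sketch of the standard Levenshtein-based CER computation (an illustration of the metric in general, not the authors' exact evaluation tool):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences via dynamic programming."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # dp[j] = distance between ref[:i] and hyp[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution (0 if match)
            )
            prev = cur
    return dp[n]


def cer(reference, hypothesis):
    """Character error rate: edit operations divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)
```

For Mandarin, each character is scored as a unit, so no word segmentation is needed; e.g. a one-character substitution in a four-character reference yields a CER of 0.25.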