The vibration of the vocal folds, which directly determines the properties of the voice during phonation, is an important topic in speech processing. Radar sensors, which offer good acoustic noise rejection and directional discrimination, have been widely used to detect time-varying vocal fold vibration. However, specific methods for extracting the vocal fold vibrations of multiple targets have not been studied. In this article, we present a noncontact method based on multiple-input multiple-output (MIMO) frequency-modulated continuous-wave (FMCW) radar to capture and analyze the vocal fold vibrations of multiple targets. Multiple signal classification (MUSIC)-based direction-of-arrival (DOA) estimation and linearly constrained minimum variance (LCMV) adaptive digital beamforming (ADBF) are applied to obtain separated signals for the individual subjects. An empirical wavelet transform (EWT)-based algorithm is proposed to decompose the preprocessed signal, isolate the fundamental frequency component, and extract the time-varying vocal fold vibration frequency of each subject. Extensive experiments are carried out to confirm the capability of the proposed method. Comparison with reference sensors shows that the proposed method can effectively obtain the vocal fold vibration frequency when multiple subjects speak simultaneously.
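The LCMV beamforming step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the uniform linear array, half-wavelength spacing, element count, the two directions, and the synthetic snapshots are all assumptions made for illustration, and the directions are taken as already known (in the article they come from MUSIC-based DOA estimation). The sketch uses the textbook closed form w = R^-1 C (C^H R^-1 C)^-1 f, which keeps unit gain toward one subject while placing a null on the other.

```python
import numpy as np

def steering_vector(theta_deg, n_elements, spacing=0.5):
    """Steering vector of an assumed uniform linear array (spacing in wavelengths)."""
    k = np.arange(n_elements)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def lcmv_weights(R, C, f):
    """Minimize w^H R w subject to C^H w = f (standard LCMV closed form)."""
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical scenario: 8 virtual elements, two subjects at -20 and 30 degrees.
n = 8
a1 = steering_vector(-20.0, n)
a2 = steering_vector(30.0, n)
C = np.column_stack([a1, a2])

# Synthetic snapshots (two sources plus noise) to form a sample covariance matrix.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 500)) + 1j * rng.standard_normal((2, 500))
noise = 0.1 * (rng.standard_normal((n, 500)) + 1j * rng.standard_normal((n, 500)))
X = C @ s + noise
R = X @ X.conj().T / X.shape[1]

# Unit gain toward subject 1, null toward subject 2.
w1 = lcmv_weights(R, C, np.array([1.0, 0.0]))
y1 = w1.conj() @ X  # beamformed signal dominated by subject 1
```

Swapping the constraint vector to `[0.0, 1.0]` would give the weights for the second subject under the same assumptions.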