Sound source localization and separation are essential functions for robot audition to comprehend acoustic environments. The widely used multiple signal classification (MUSIC) method can precisely estimate the directions of arrival (DoAs) of multiple sound sources if its hyperparameters are selected appropriately for the surrounding environment. A popular separation method based on a complex Gaussian mixture model (CGMM), on the other hand, can extract multiple sources even in noisy environments if its latent variables are properly initialized to avoid bad local optima. To overcome the drawbacks of both MUSIC and the CGMM, we propose a robot audition framework that complementarily combines the two in a probabilistic manner. Our method is based on a variant of the CGMM conditioned on the localization results of MUSIC. The hyperparameters of MUSIC are estimated by type-II maximum likelihood estimation of the CGMM, and the CGMM itself is efficiently initialized and regularized using the localization results of MUSIC. Experimental results show that our method outperforms conventional localization and separation methods even when the number of sound sources is unknown. We also demonstrate that our method works in real time even with moving sound sources.
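
For concreteness, the following is a minimal sketch of the kind of narrowband MUSIC pseudo-spectrum computation the abstract refers to, for a single frequency bin of a multichannel STFT. The function name `music_spectrum`, the random placeholder data, and the placeholder steering vectors are illustrative assumptions rather than the paper's implementation; note that `n_src`, the assumed number of sources, is exactly the kind of hyperparameter the proposed framework sets automatically via the CGMM.

```python
import numpy as np

def music_spectrum(X, steering, n_src):
    """Narrowband MUSIC pseudo-spectrum for one frequency bin (illustrative sketch).

    X        : (n_mics, n_frames) complex STFT observations at this bin
    steering : (n_dirs, n_mics) candidate steering vectors a(theta)
    n_src    : assumed number of sources (a MUSIC hyperparameter)
    """
    # Spatial covariance matrix averaged over time frames
    R = X @ X.conj().T / X.shape[1]
    # Eigendecomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    # Noise subspace: eigenvectors of the (n_mics - n_src) smallest eigenvalues
    E_noise = eigvecs[:, order[n_src:]]
    # Pseudo-spectrum 1 / (a^H E_n E_n^H a); it peaks at the true DoAs
    proj = steering.conj() @ E_noise             # rows are a(theta)^H E_n
    denom = np.sum(np.abs(proj) ** 2, axis=1)
    return 1.0 / np.maximum(denom, 1e-12)

# Hypothetical usage: 8-mic array, 72 candidate directions, random placeholder data
rng = np.random.default_rng(0)
n_mics, n_frames, n_dirs = 8, 100, 72
X = rng.standard_normal((n_mics, n_frames)) + 1j * rng.standard_normal((n_mics, n_frames))
steering = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (n_dirs, n_mics)))
spec = music_spectrum(X, steering, n_src=2)
doa_index = int(np.argmax(spec))  # index of the most prominent candidate direction
```

In the proposed framework, such localization results would then condition the CGMM-based separation, while the CGMM likelihood in turn informs the choice of MUSIC hyperparameters such as `n_src`.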