Closed-set speaker identification using VQ and GMM based models

An array of features and methods has been developed over the past six decades for Speaker Identification (SI) and Speaker Verification (SV), jointly known as Speaker Recognition (SR). Mel Frequency Cepstral Coefficients (MFCC) are generally used as feature vectors in most cases because they give higher accuracy than other features. This paper presents a comparative study of state-of-the-art SR techniques along with their design challenges, robustness issues and performance evaluation methods. Rigorous experiments have been performed using the Gaussian Mixture Model (GMM) with variations such as the Universal Background Model (UBM), Vector Quantization (VQ) and VQ-based UBM-GMM (VQ-UBM-GMM), with detailed discussion. Other popular methods are included for comparative study only, namely Linear Discriminant Analysis (LDA), Probabilistic LDA (PLDA), Gaussian PLDA (GPLDA), Multi-condition GPLDA (MGPLDA) and Identity Vector (i-vector). Three popular audio data-sets have been used in the experiments, namely IITG-MV SR, Hyke-2011 and ELSDSR. Hyke-2011 and ELSDSR contain clean speech, while IITG-MV SR contains noisy audio data with variations in channel (device), environment and spoken style. We propose a new data mixing approach for SR to make the system independent of recording device, spoken style and environment. The accuracy we obtained for VQ and GMM based methods on the Hyke-2011 and ELSDSR databases varies from 99.6% to 100%, whereas accuracy on IITG-MV SR is up to 98%. In some cases, however, the accuracies degrade drastically due to mismatch between training and testing data as well as the singularity problem of GMM. The experimental results serve as a benchmark for VQ/GMM/UBM based methods for the IITG-MV SR database.
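To make the closed-set VQ idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes each speaker is represented by a k-means codebook and that a test utterance is assigned to the enrolled speaker whose codebook gives the lowest average distortion. The synthetic 3-dimensional frames stand in for MFCC vectors, and every name here (`train_codebook`, `avg_distortion`, `identify`, `synthetic_frames`, the speaker labels, and the codebook size) is hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids form one speaker's VQ codebook."""
    rng = random.Random(seed)
    centroids = rng.sample(frames, k)  # initialise from k distinct training frames
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda i: sq_dist(f, centroids[i]))
            clusters[nearest].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[i] = tuple(
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                )
    return centroids

def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks):
    """Closed-set decision: always return one of the enrolled speakers."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))

def synthetic_frames(center, n, rng, spread=0.5):
    """Assumption: Gaussian blobs stand in for real MFCC frames."""
    return [tuple(rng.gauss(c, spread) for c in center) for _ in range(n)]

# Enrol three hypothetical speakers, then classify a fresh utterance from one.
rng = random.Random(1)
centers = {"spk_a": (0.0, 0.0, 0.0), "spk_b": (5.0, 5.0, 5.0), "spk_c": (-5.0, 5.0, 0.0)}
codebooks = {
    spk: train_codebook(synthetic_frames(c, 200, rng), k=4)
    for spk, c in centers.items()
}
test_utterance = synthetic_frames(centers["spk_b"], 50, rng)
print(identify(test_utterance, codebooks))
```

Because the decision is a minimum over enrolled speakers only, the sketch has no rejection option, which is what distinguishes closed-set identification from verification. A GMM variant would replace the distortion score with a per-speaker log-likelihood.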

Keywords: gmm based; speaker; hyke 2011; speaker identification

Journal Title: International Journal of Speech Technology
Year Published: 2022
