Hybrid machine learning classification scheme for speaker identification

Motivated by the need to prepare for the next generation of “Automatic Spokesperson Recognition” (ASR) systems, this paper applies fused spectral features with a hybrid machine learning (ML) strategy to the speech communication field. The strategy combines several spectral features: mel-frequency cepstral coefficients (MFCCs), spectral kurtosis, spectral skewness, normalized pitch frequency (NPF), and formants. The proposed classification method could serve in advanced speaker identification scenarios. Special attention is given to a hybrid ML scheme, known as “Random Forest-Support Vector Machine” (RF-SVM), that is capable of identifying unknown speakers. The extracted speaker-specific spectral attributes are fed to the hybrid RF-SVM classifier to identify or verify a particular speaker. The work aims to construct an ensemble decision tree on a bounded area with minimal misclassification error using the hybrid ensemble RF-SVM strategy. The method is tested on a series of standard and real-time speaker databases under noise conditions, and its performance is compared with other state-of-the-art mechanisms. On the recorded experimental dataset, the proposed fusion method achieves a high identification rate (97% on average) and a lower equal error rate (EER) (<2%) than the individual schemes. The robustness of the classifier is validated on the standard ELSDSR, TIMIT, and NIST audio datasets, where the hybrid classifier achieves 98%, 99%, and 94% accuracy with EERs of 2%, 1%, and 2%, respectively. Compared with other well-known speaker recognition schemes, the proposed scheme is found to be superior.
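The abstract reports equal error rates but does not say how they were computed. As a rough illustration only, the sketch below shows one common way to estimate an EER from verification scores: sweep a decision threshold and pick the point where the false acceptance rate (FAR) and false rejection rate (FRR) are closest. The function name `equal_error_rate` and the score lists are hypothetical and are not taken from the paper.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the threshold where FAR and FRR are closest.
    This is a generic sketch, not the procedure used in the paper.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # FAR: fraction of impostor trials wrongly accepted
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of genuine trials wrongly rejected
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Made-up similarity scores, for illustration only
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]    # same-speaker trials
impostor = [0.7, 0.5, 0.3, 0.2, 0.1]    # different-speaker trials

eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With real data the scores would come from the classifier's output for each trial, and a finer threshold sweep or interpolation between adjacent thresholds may give a smoother estimate.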

Keywords: hybrid machine; speaker identification; identification; machine learning; speaker

Journal Title: Journal of Forensic Sciences
Year Published: 2022
