As an essential biological feature of human beings, the voiceprint is increasingly used in medical research and diagnosis, especially in identifying Parkinson’s Disease (PD). This paper proposes a Spectrogram Deep Convolutional Generative Adversarial Network (S-DCGAN) for sample augmentation to overcome the limited number of existing patient voiceprint datasets and samples. S-DCGAN generates high-resolution spectrograms by increasing the number of network layers, adding Spectral Normalization (SN), and incorporating a feature matching strategy. High-similarity, low-distortion spectrograms are selected according to their Structural Similarity Index (SSIM) and Peak Signal-to-Noise Ratio (PSNR) values to augment the samples. Fréchet Inception Distance (FID) and GAN-train results show the generalization ability of the generated data. We construct a ResNet50 model with a Global Average Pooling (GAP) layer to extract the voiceprint features and classify them effectively, improving recognition accuracy. The GAP layer suppresses over-fitting and speeds up optimization. Finally, comparative experiments with different models and classification methods were conducted on the Sakar dataset. The results show that the S-DCGAN-ResNet50 hybrid model achieves the highest voiceprint recognition accuracy, 91.25%, with a specificity of 92.5%, and distinguishes PD patients from healthy people more precisely than DCGAN-ResNet50. This broadens the application of voiceprint recognition in the medical field and improves its generality across different datasets.
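The SSIM/PSNR selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds `min_ssim` and `min_psnr`, and the assumption that spectrograms are arrays scaled to [0, 1] are all hypothetical, and `global_ssim` computes SSIM once over the whole image rather than over the sliding windows used by the standard windowed SSIM.

```python
import numpy as np


def psnr(reference, generated, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(generated, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def global_ssim(reference, generated, max_value=1.0):
    """Simplified SSIM from whole-image statistics (assumption: no windowing)."""
    x = np.asarray(reference, dtype=np.float64)
    y = np.asarray(generated, dtype=np.float64)
    c1 = (0.01 * max_value) ** 2  # stabilising constants from the SSIM definition
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_spectrograms(reference, candidates, min_ssim=0.8, min_psnr=20.0):
    """Return indices of candidates passing both (hypothetical) thresholds."""
    return [
        i
        for i, candidate in enumerate(candidates)
        if global_ssim(reference, candidate) >= min_ssim
        and psnr(reference, candidate) >= min_psnr
    ]


if __name__ == "__main__":
    real = np.linspace(0.0, 1.0, 64).reshape(8, 8)  # stand-in for a real spectrogram
    generated = [real.copy(), 1.0 - real]           # a faithful copy and an inverted one
    print(select_spectrograms(real, generated))
```

In this toy run only the faithful copy passes both thresholds; the inverted image is rejected because its structure is anti-correlated with the reference. In practice, the thresholds would need to be tuned on the actual spectrogram data.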
               