This paper studies the performance degradation of a Gaussian probabilistic linear discriminant analysis (GPLDA) speaker verification system when only short-utterance data is used for system development. A number of techniques, including utterance partitioning and source-normalised weighted linear discriminant analysis (SN-WLDA) projections, are then introduced to improve speaker verification performance in such conditions. Experimental studies show that when short-utterance data is available for speaker verification development, the GPLDA system overall achieves its best performance with a lower number of universal background model (UBM) components. This is a useful observation, as a lower number of UBM components significantly reduces the computational complexity of the speaker verification system. For limited session data conditions, we propose a simple utterance-partitioning technique which, when applied to the LDA-projected GPLDA system, shows over 8% relative improvement in EER over the baseline system on the NIST 2008 truncated 10–10 s conditions. We conjecture that this improvement arises from the apparent increase in the number of sessions produced by the partitioning technique, which helps to better model the GPLDA parameters. Further, partitioning applied to the SN-WLDA-projected GPLDA system shows over 16% and 6% relative improvement in EER over LDA-projected GPLDA systems on the NIST 2008 truncated 10–10 s interview-interview condition and on the NIST 2010 truncated 10–10 s interview-interview and telephone-telephone conditions, respectively.
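As a rough illustration of the partitioning idea and of how a relative EER improvement is computed, the sketch below splits one utterance's frame sequence into contiguous pieces that could each be treated as a separate pseudo-session. The abstract does not specify how the partitioning is performed, so the equal-length contiguous split, the function names `partition_utterance` and `relative_improvement`, and the example numbers are all assumptions made for illustration, not the authors' method or results.

```python
from typing import List, Sequence


def partition_utterance(frames: Sequence, num_parts: int) -> List[Sequence]:
    """Split a frame sequence into num_parts contiguous, near-equal pieces.

    Assumption: each piece would later be treated as its own session
    (e.g. one i-vector per piece) when estimating GPLDA parameters.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    if len(frames) < num_parts:
        raise ValueError("utterance has fewer frames than requested parts")

    base, extra = divmod(len(frames), num_parts)
    parts = []
    start = 0
    for i in range(num_parts):
        # The first `extra` pieces take one additional frame each.
        size = base + (1 if i < extra else 0)
        parts.append(frames[start:start + size])
        start += size
    return parts


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER improvement in percent: 100 * (baseline - new) / baseline."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical utterance of 10 frames, split into 3 pseudo-sessions.
    pieces = partition_utterance(list(range(10)), 3)
    print([len(p) for p in pieces])

    # Hypothetical EER values, chosen only to show the arithmetic.
    print(relative_improvement(20.0, 18.0))
```

Under this assumption, one recorded session yields several shorter pseudo-sessions, which is the "apparent increase in the number of sessions" the abstract refers to.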
               