New advances in Language Identification (LID) using Recurrent Neural Networks (RNNs) and Neural Embeddings have been proposed recently. While these techniques have been applied successfully at the word level, results at the phoneme level may not be as good because of the greater variability found in phoneme sequences, which reduces LID accuracy. Thus, we propose phonetic units called "phone-grams" that implicitly include longer-context information, and we use them to train neural embeddings and RNN language models (RNNLMs). The neural embeddings are used in a data pre-processing phase to mitigate the scattering problem produced by the high number of resulting phone-gram units; in a second phase, the RNNLMs provide the score of each language in the identification task, following a PPRLM structure. Results in terms of Cavg on the KALAKA-3 database show that the use of phone-grams provides up to 14.4% relative improvement over a baseline using only phonemes as features. In addition, our proposed strategy of reducing the number of phone-gram units with neural embeddings contributes up to 22.5% relative improvement. Finally, fusing the best system with MFCC-based acoustic i-vectors and a traditional PPRLM architecture provides up to 37.76% improvement.
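To make the "phone-gram" idea concrete, the following is a minimal sketch assuming a phone-gram is an n-gram of consecutive decoded phoneme labels treated as a single token; the exact unit definition, n-gram order, and phoneme inventory are not given in the abstract, so the symbols and helper function below are illustrative only.

```python
# Hypothetical sketch: turning a decoded phoneme sequence into "phone-gram"
# tokens (n-grams of phonemes treated as single units), which could then be
# used to train embeddings or RNN language models. Labels are made up.

def phone_grams(phonemes, n=2):
    """Return overlapping n-grams of phoneme labels joined into single tokens."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

# Example decoded phoneme sequence for one utterance (illustrative labels only).
utterance = ["s", "o", "l", "e", "d", "a", "d"]

unigrams = utterance                      # baseline: plain phonemes
bigrams  = phone_grams(utterance, n=2)    # e.g. "s_o", "o_l", ...
trigrams = phone_grams(utterance, n=3)    # longer context, larger vocabulary

print(bigrams)
print(trigrams)

# Larger n captures more context per unit but inflates the number of distinct
# units, which is the "scattering" problem the abstract addresses by applying
# neural embeddings before training the RNNLMs.
```

A design note on the trade-off this illustrates: moving from phonemes to phone-grams enriches each token with context that an RNNLM can exploit, at the cost of a much larger and sparser unit inventory, which is why the abstract pairs phone-grams with an embedding-based reduction of the unit set.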
               