
Dilated residual networks with multi-level attention for speaker verification

Abstract: With the development of deep learning techniques, speaker verification (SV) systems based on deep neural networks (DNNs) achieve performance competitive with traditional i-vector-based approaches. Previous DNN-based SV methods usually employ time-delay neural networks, which limits how the network can be extended to learn effective representations. Moreover, the attention mechanisms used in existing DNN-based SV systems are applied at only a single level of the network architecture, so important features are not extracted sufficiently. To address these issues, we propose an effective deep speaker embedding architecture for SV that combines residually connected one-dimensional dilated convolutional layers, called dilated residual networks (DRNs), with a multi-level attention model. The DRNs not only capture long-range time-frequency context but also exploit information from multiple layers of the DNN. In addition, the multi-level attention model, which consists of two-dimensional convolutional block attention modules (CBAM) applied at the frame level and vector-based attention applied at the pooling layer, emphasizes important features at multiple levels of the DNN. Experiments on the NIST SRE 2016 dataset show that the proposed architecture achieves an equal error rate (EER) of 7.094% and a detection cost function (DCF16) of 0.552, outperforming state-of-the-art methods. Furthermore, ablation experiments demonstrate the effectiveness of dilated convolutions and multi-level attention on SV tasks.
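To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of its two core ingredients: a residual block of one-dimensional dilated convolutions (the DRN building block) and a vector-based attentive pooling layer. All module names, channel sizes, and dilation rates here are illustrative assumptions, not the authors' implementation, and the frame-level CBAM attention is omitted for brevity.

    # Sketch of a dilated residual block and vector-based attentive pooling.
    # Hypothetical sizes and names; not the paper's released code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DilatedResBlock(nn.Module):
        """Residual block of 1-D dilated convolutions over (batch, channels, frames)."""
        def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
            super().__init__()
            pad = (kernel_size - 1) // 2 * dilation  # keep the frame length unchanged
            self.conv1 = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation)
            self.conv2 = nn.Conv1d(channels, channels, kernel_size,
                                   padding=pad, dilation=dilation)
            self.bn1 = nn.BatchNorm1d(channels)
            self.bn2 = nn.BatchNorm1d(channels)

        def forward(self, x):
            y = F.relu(self.bn1(self.conv1(x)))
            y = self.bn2(self.conv2(y))
            return F.relu(x + y)  # residual (skip) connection

    class AttentivePooling(nn.Module):
        """Vector-based attention at the pooling layer: weighted mean over frames."""
        def __init__(self, channels: int):
            super().__init__()
            self.attention = nn.Conv1d(channels, 1, kernel_size=1)

        def forward(self, x):
            # x: (batch, channels, frames) -> attention weights over the frame axis
            w = torch.softmax(self.attention(x), dim=-1)
            return (x * w).sum(dim=-1)  # (batch, channels) utterance-level embedding

    # Example: 40-dim filterbank features, 200 frames, batch of 8
    feats = torch.randn(8, 40, 200)
    frontend = nn.Conv1d(40, 64, kernel_size=5, padding=2)
    blocks = nn.Sequential(DilatedResBlock(64, dilation=1),
                           DilatedResBlock(64, dilation=2),
                           DilatedResBlock(64, dilation=4))
    pool = AttentivePooling(64)
    embedding = pool(blocks(frontend(feats)))  # -> shape (8, 64)
    print(embedding.shape)

Stacking blocks with increasing dilation (1, 2, 4, ...) grows the temporal receptive field exponentially while keeping the parameter count fixed, which is what allows a DRN-style stack to capture long-range context as the abstract describes.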

Keywords: dilated residual networks; speaker verification; multi-level attention

Journal Title: Neurocomputing
Year Published: 2020
