Recurrent neural networks (RNNs) can learn long-term dependencies, which makes them well suited to acoustic modeling in speech recognition. In this paper, we revise an RNN model used in acoustic modeling, namely mGRUIP with a Context module (mGRUIP-Ctx), and propose an improved model named the Projected minimal Gated Recurrent Unit (PmGRU). The paper makes two main contributions. First, because adding context information to the context module in mGRUIP-Ctx introduces a large number of parameters, we insert a smaller output projection layer after the mGRUIP-Ctx cell's output to form the PmGRU, inspired by the idea of low-rank matrix decomposition. The output projection layer is shown to preserve most of the useful information while reducing the number of model parameters. Second, because introducing too much context information from the previous layer through the context module degrades model performance, we adjust the ratio of context information from the previous layer to that of the current layer by moving the position of the batch normalization layer, yielding the final RNN model, the Normalization Projected minimal Gated Recurrent Unit (Norm-PmGRU). Across five automatic speech recognition (ASR) tasks, experiments show that Norm-PmGRU is more effective than mGRUIP-Ctx, TDNN-OPGRU, TDNN-LSTMP, and other RNN baseline acoustic models.
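To illustrate the two ideas in general terms, the sketch below is a minimal, hypothetical PyTorch example, not the authors' mGRUIP-Ctx/PmGRU implementation: a standard GRUCell stands in for the recurrent cell, a small linear layer projects its output to a lower dimension in the spirit of low-rank decomposition, and the position of the batch normalization layer (before or after the projection) is exposed as a configurable choice. The class name, dimensions, and the `norm_after_proj` flag are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ProjectedGRULayer(nn.Module):
    """Hypothetical sketch: recurrent cell + output projection + movable batch norm."""

    def __init__(self, input_dim, hidden_dim, proj_dim, norm_after_proj=True):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)             # stand-in for the mGRUIP-Ctx cell
        self.proj = nn.Linear(hidden_dim, proj_dim, bias=False)   # smaller output projection layer
        # Batch normalization is applied either to the cell output or to the
        # projected output; moving its position changes how the outputs fed to
        # the next layer are rescaled (assumed analogue of the paper's adjustment).
        self.norm = nn.BatchNorm1d(proj_dim if norm_after_proj else hidden_dim)
        self.norm_after_proj = norm_after_proj

    def forward(self, x):  # x: (batch, time, input_dim)
        batch, time, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(time):
            h = self.cell(x[:, t, :], h)
            if self.norm_after_proj:
                y = self.norm(self.proj(h))   # project first, then normalize
            else:
                y = self.proj(self.norm(h))   # normalize the cell output, then project
            outputs.append(y)
        return torch.stack(outputs, dim=1)    # (batch, time, proj_dim)

# Example: project a 1024-unit hidden state down to 256 dimensions, so the
# weight matrices of the following layer see a 4x smaller input.
layer = ProjectedGRULayer(input_dim=40, hidden_dim=1024, proj_dim=256)
feats = torch.randn(8, 50, 40)   # (batch, frames, feature_dim)
out = layer(feats)               # (8, 50, 256)
```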