LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

Understanding Subtitles by Character-Level Sequence-to-Sequence Learning

Photo from wikipedia

This paper presents a character-level seque-nce-to-sequence learning method, RNNembed. This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a pure single neural… Click to show full abstract

This paper presents a character-level seque-nce-to-sequence learning method, RNNembed. This method allows the system to read raw characters, instead of words generated by preprocessing steps, into a pure single neural network model under an end-to-end framework. Specifically, we embed a recurrent neural network into an encoder–decoder framework and generate character-level sequence representation as input. The dimension of input feature space can be significantly reduced as well as avoiding the need to handle unknown or rare words in sequences. In the language model, we improve the basic structure of a gated recurrent unit by adding an output gate, which is used for filtering out unimportant information involved in the attention scheme of the alignment model. Our proposed method was examined in a large-scale dataset on an English-to-Chinese translation task. Experimental results demonstrate that the proposed approach achieves a translation performance comparable, or close, to conventional word-based and phrase-based systems.

Keywords: understanding subtitles; level sequence; sequence learning; character level; subtitles character; sequence

Journal Title: IEEE Transactions on Industrial Informatics
Year Published: 2017

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.