Tutorial

Welcome to the homepage for our ACL 2016 tutorial, with more information and updated material to supplement the official ACL page! The presenters were Thang Luong @lmthang, Kyunghyun Cho @kchonyc, and Christopher Manning @chrmanning.

Neural Machine Translation (NMT) is a simple new architecture for getting machines to learn to translate. Despite being relatively new (Kalchbrenner and Blunsom, 2013; Cho et al., 2014; Sutskever et al., 2014), NMT has already shown promising results, achieving state-of-the-art performance for various language pairs (Luong et al., 2015a; Jean et al., 2015; Luong et al., 2015b; Sennrich et al., 2016; Luong and Manning, 2016). While many of these NMT papers were presented to the ACL community, research on and practice of NMT are still at an early stage. This tutorial is a great opportunity for the whole machine translation and natural language processing community to learn more about a very promising new approach to MT. The tutorial has four parts.

In the first part, we start with an introduction to (neural) machine translation before discussing phrase-based statistical machine translation, which has been the dominant approach over the past twenty years. We then go through the background on neural language models, including recurrent neural language models, which are the basis for NMT.
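To give a flavor of the recurrent neural language models covered in this part, here is a minimal NumPy sketch (not tutorial code): the dimensions, weight initialization, and the helper `rnn_lm_log_prob` are all hypothetical choices for illustration. The model keeps a hidden state that summarizes the history and predicts each next word from it.

```python
# Minimal sketch of a recurrent neural language model.
# Hypothetical dimensions and randomly initialized weights; illustration only.
import numpy as np

vocab_size, embed_dim, hidden_dim = 1000, 32, 64
rng = np.random.default_rng(0)
E  = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # word embedding table
Wx = rng.normal(scale=0.1, size=(hidden_dim, embed_dim))   # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
Wo = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))  # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_lm_log_prob(word_ids):
    """Log-probability of a word-id sequence under the RNN language model."""
    h, logp = np.zeros(hidden_dim), 0.0
    for prev, nxt in zip(word_ids[:-1], word_ids[1:]):
        h = np.tanh(Wx @ E[prev] + Wh @ h)  # hidden state summarizes the history
        p = softmax(Wo @ h)                 # distribution over the next word
        logp += np.log(p[nxt])
    return logp

print(rnn_lm_log_prob([1, 42, 7, 3]))
```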

The second part describes the basics of NMT. We go into detail on how to train NMT with the maximum likelihood estimation approach and the back-propagation-through-time algorithm. We explain why the vanishing gradient problem arises, which motivates the use of gated recurrent units and long short-term memory units. After that, various decoding strategies are highlighted.
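As a rough illustration of this training setup, the sketch below shows an encoder-decoder as a conditional recurrent language model whose maximum likelihood objective is the negative log-probability of the target sentence given the source. All names and dimensions here (`Es`, `Et`, `Wo`, the start-symbol convention, etc.) are hypothetical; this is a minimal sketch, not the tutorial's implementation, and it omits the gradient computation that BPTT would perform.

```python
# Minimal encoder-decoder sketch with a maximum likelihood (cross-entropy) loss.
# Hypothetical dimensions, randomly initialized weights; illustration only.
import numpy as np

src_vocab, tgt_vocab, dim = 1000, 1200, 64
rng = np.random.default_rng(0)
Es  = rng.normal(scale=0.1, size=(src_vocab, dim))   # source embeddings
Et  = rng.normal(scale=0.1, size=(tgt_vocab, dim))   # target embeddings
Wes = rng.normal(scale=0.1, size=(dim, dim))         # encoder recurrence
Wdt = rng.normal(scale=0.1, size=(dim, dim))         # decoder recurrence
Wo  = rng.normal(scale=0.1, size=(tgt_vocab, dim))   # decoder output layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def neg_log_likelihood(src_ids, tgt_ids):
    """MLE objective for one sentence pair: -log p(target | source)."""
    # Encoder: read the source and summarize it in the final hidden state.
    h = np.zeros(dim)
    for w in src_ids:
        h = np.tanh(Es[w] + Wes @ h)
    # Decoder: a conditional recurrent language model over the target.
    nll, prev = 0.0, 0  # assume index 0 is a <s> start symbol
    for w in tgt_ids:
        h = np.tanh(Et[prev] + Wdt @ h)
        nll -= np.log(softmax(Wo @ h)[w])
        prev = w
    return nll  # gradients of this loss would be computed with BPTT

print(neg_log_likelihood([5, 17, 9], [3, 8, 2]))
```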

The third part of our tutorial describes techniques for building state-of-the-art NMT. We start with approaches to extend the vocabulary coverage of NMT (Luong et al., 2015a; Jean et al., 2015; Chitnis and DeNero, 2015). We then introduce the idea of jointly learning both translations and alignments through an attention mechanism (Bahdanau et al., 2015); other variants of attention (Luong et al., 2015b; Tu et al., 2016) are discussed as well. We also describe a recent trend in NMT, translating at the sub-word level (Chung et al., 2016; Luong and Manning, 2016; Sennrich et al., 2016), so that language variation can be handled effectively.
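To make the attention idea concrete, here is a minimal sketch of a simplified dot-product variant (shapes and the helper `attention_context` are hypothetical, not the tutorial's exact formulation): each source hidden state is scored against the current decoder state, the scores are normalized with a softmax into soft alignment weights, and the context vector is the weighted average of the source states.

```python
# Minimal dot-product attention sketch (simplified variant; hypothetical shapes).
import numpy as np

def attention_context(decoder_state, encoder_states):
    """decoder_state: (dim,), encoder_states: (src_len, dim)."""
    scores = encoder_states @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax -> soft alignment weights
    context = weights @ encoder_states        # weighted average of source states
    return context, weights

rng = np.random.default_rng(0)
context, alignment = attention_context(rng.normal(size=8), rng.normal(size=(5, 8)))
print(context.shape, alignment.round(3))
```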

Lastly, we conclude by describing promising approaches and areas for the future of NMT. Topics include (a) combining multiple tasks to help translation (Dong et al., 2015; Luong et al., 2016; Firat et al., 2016; Zoph and Knight, 2016), (b) building larger-context NMT, (c) running NMT on mobile devices (See et al., 2016; Kim and Rush, 2016), (d) making unsupervised learning work, and (e) approaches beyond maximum likelihood estimation.

Slides

You can download the slides (version 4). This was the version used at the ACL 2016 tutorial.

Tutorial Content

    1. Introduction - 40mins (Chris Manning)

      1. Intro to (Neural) Machine Translation

      2. Phrase-based Statistical Machine Translation

      3. Neural Language Models

    2. Basic NMT - 50mins (Kyunghyun Cho)

      1. Training: maximum likelihood estimation with backpropagation through time

      2. Vanishing gradient and gated recurrent units/long short-term memory units

      3. Conditional recurrent language modeling: Encoder-Decoder

      4. Decoding strategies

    3. Advanced NMT - 60mins (Thang Luong)

      1. Extending the vocabulary coverage

      2. Learning alignment: attention mechanism

      3. Handling language variations: subword-level translation

      4. Utilizing monolingual data

    4. Future of NMT - 30mins (Kyunghyun Cho and Thang Luong)

      1. Multilingual/multi-task learning

      2. Larger context

      3. Mobile devices

      4. Beyond maximum likelihood estimation

Selected References (see the sidebar for a link to all references for the tutorial)

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. ICLR.

Rohan Chitnis and John DeNero. 2015. Variable-Length Word Encodings for Neural Translation Models. EMNLP.

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP.

Junyoung Chung, Kyunghyun Cho, and Yoshua Bengio. 2016. A Character-level Decoder without Explicit Segmentation for Neural Machine Translation. ACL.

Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. ACL.

Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism. NAACL.

Mikel L. Forcada and Ramón Ñeco. 1997. Recursive hetero-associative memories for translation. In Biological and Artificial Computation: From Neuroscience to Technology, pages 453–462. Springer.

Sebastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. ACL.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. EMNLP.

Yoon Kim and Alexander M. Rush. 2016. Sequence-Level Knowledge Distillation. EMNLP.

Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. 2015a. Addressing the rare word problem in neural machine translation. ACL.

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015b. Effective approaches to attention-based neural machine translation. EMNLP.

Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task Sequence to Sequence Learning. ICLR.

Minh-Thang Luong and Christopher D. Manning. 2016. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models. ACL.

Abigail See, Minh-Thang Luong, and Christopher D. Manning. 2016. Compression of Neural Machine Translation Models via Pruning. CoNLL.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. ACL.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. ACL.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. NIPS.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling Coverage for Neural Machine Translation. ACL.

Barret Zoph and Kevin Knight. 2016. Multi-source neural translation. NAACL.