Effective Approaches to Attention-based NMT (Luong Attention)

“Effective Approaches to Attention-based Neural Machine Translation” by Minh-Thang Luong, Hieu Pham, and Christopher D. Manning (submitted August 17, 2015) followed shortly after the first attention mechanism for translation, introduced by Bahdanau, Cho, and Bengio in 2014, and refined it into forms that were simpler and easier to use. The paper studies “two simple and effective classes of attentional mechanism”: a global approach that attends to all source words at every decoding step, and a local approach that attends to only a small window of source words at a time.
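The difference between the two classes comes down to which source positions are candidates for attention at a given decoding step. The following is a minimal sketch, not the authors' code: the function names and the parameters center and half_width are hypothetical, and it assumes a fixed-width window clipped to the sentence boundaries.

```python
def global_positions(source_len):
    """Global attention: every source position is a candidate."""
    return list(range(source_len))


def local_window(center, half_width, source_len):
    """Local attention: only positions within half_width of center.

    The window [center - half_width, center + half_width] is clipped
    so that it never runs past either end of the source sentence.
    """
    start = max(0, center - half_width)
    end = min(source_len - 1, center + half_width)
    return list(range(start, end + 1))


if __name__ == "__main__":
    # A 10-word source sentence, window centered on position 5.
    print(global_positions(10))
    print(local_window(center=5, half_width=2, source_len=10))
```

With a window, the cost of computing attention at each step depends on the window size and not on the sentence length, which is the practical appeal of the local approach for long inputs.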

The practical results were strong. The authors report that “with local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems,” and their ensemble set “a new state-of-the-art result in the WMT’15 English to German translation task with 25.9 BLEU points.” Beyond the BLEU numbers, the paper’s contribution was a cleaner set of scoring functions (dot, general, and concat) for computing how much each source word should matter at each decoding step. Many later systems adopted these under the name Luong attention.
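A scoring function takes the current decoder state and one source state and returns a single number. A softmax over those numbers gives the attention weights, and the weighted average of the source states is the context vector. The sketch below illustrates the dot, general, and concat forms in plain Python. It is a simplified illustration under stated assumptions, not the paper's implementation: vectors are plain lists, and the names h_t, h_s, W_a, and v_a are chosen here for readability, with W_a and v_a standing in for learned parameters.

```python
import math


def dot_score(h_t, h_s):
    """Dot score: inner product of decoder state and source state."""
    return sum(a * b for a, b in zip(h_t, h_s))


def general_score(h_t, h_s, W_a):
    """General score: h_t^T W_a h_s, with W_a a matrix (list of rows)."""
    projected = [sum(w * x for w, x in zip(row, h_s)) for row in W_a]
    return dot_score(h_t, projected)


def concat_score(h_t, h_s, W_a, v_a):
    """Concat score: v_a^T tanh(W_a [h_t; h_s])."""
    joined = list(h_t) + list(h_s)
    hidden = [math.tanh(sum(w * x for w, x in zip(row, joined))) for row in W_a]
    return dot_score(v_a, hidden)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_attention(h_t, source_states, score=dot_score):
    """Score every source state, normalize, and average into a context vector."""
    weights = softmax([score(h_t, h_s) for h_s in source_states])
    dim = len(source_states[0])
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, source_states))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    h_t = [1.0, 0.0]
    source_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    weights, context = global_attention(h_t, source_states)
    # The source state most aligned with h_t receives the largest weight.
    print(weights, context)
```

The dot form has no parameters of its own, which is part of why it is the cheapest of the three to compute. The general and concat forms add learned parameters that let the model compare states that do not share a common basis.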

This work matters because attention went on to become the central idea of the transformer and, through it, of most large language models since. The Luong paper is part of the short, intense burst of 2014 and 2015 research that turned attention from a clever add-on into a standard component of modern language technology.
