Moses, the open-source statistical machine translation toolkit

Moses is the open-source toolkit that made statistical machine translation widely available. It was introduced in the demo paper “Moses: Open Source Toolkit for Statistical Machine Translation,” presented at the 45th Annual Meeting of the Association for Computational Linguistics in Prague in 2007 (pages 177-180). Its long author list was led by Philipp Koehn and Hieu Hoang and included Alexandra Birch, Chris Callison-Burch, Marcello Federico, Chris Dyer, Ondrej Bojar, and others.

Before Moses, building a competitive translation system meant assembling many separate components and a great deal of in-house engineering. Moses bundled everything an end-to-end phrase-based system needed: tools to train translation models from parallel text and language models from target-language text, to tune the system’s weights, and to decode new sentences. It also added capabilities beyond plain phrase-based translation, such as support for linguistically motivated factors and confusion-network decoding, which handles ambiguous input from, for example, a speech recognizer.
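
As a rough illustration of what tuning the system’s weights means, the sketch below scores candidate translations as a weighted sum of log-domain feature values, the general log-linear form used in phrase-based systems. It is a toy under stated assumptions: the feature names, probabilities, weights, and function names are invented for illustration and are not Moses’s actual features, file formats, or API.

```python
import math

# Hypothetical feature weights. In a real system these are set by tuning
# on held-out data; the values here are made up for illustration.
WEIGHTS = {"translation_model": 1.0, "language_model": 0.5, "word_penalty": -0.3}

# Two invented candidate translations of the same source sentence.
# Model features are log-probabilities; word_penalty is the word count.
CANDIDATES = [
    {
        "text": "the house is small",
        "features": {
            "translation_model": math.log(0.4),
            "language_model": math.log(0.05),
            "word_penalty": 4,
        },
    },
    {
        "text": "the home is little",
        "features": {
            "translation_model": math.log(0.5),
            "language_model": math.log(0.01),
            "word_penalty": 4,
        },
    },
]


def score(features, weights):
    """Return the weighted sum of feature values for one candidate."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Pick the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: score(c["features"], weights))


if __name__ == "__main__":
    # With the language model weighted in, the more fluent candidate wins.
    print(best_candidate(CANDIDATES, WEIGHTS)["text"])

    # Setting the language-model weight to zero flips the choice, which is
    # why the weights have to be tuned rather than guessed.
    no_lm = dict(WEIGHTS, language_model=0.0)
    print(best_candidate(CANDIDATES, no_lm)["text"])
```

Changing a single weight changes which candidate the decoder prefers, so the tuning step matters as much as the models themselves.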

Because it was free and well documented, Moses became the de facto standard for machine translation research and a common foundation for commercial systems through the late 2000s and early 2010s. It let universities, startups, and government agencies that could not build a translation engine from scratch train one on their own data, and it standardized how the field reported results, since everyone could compare against the same baseline.

Moses represents a recurring pattern in AI: a strong open-source release compresses years of specialized engineering into a tool anyone can run, accelerating an entire field. It played that role for statistical translation much as later open frameworks would for deep learning, and it remained in active use until neural machine translation displaced the statistical approach it embodied.
