Google Translate goes live with statistical machine translation

On April 28, 2006, Google research scientist Franz Och announced on the company’s research blog that Google was putting its statistical machine translation system online. The first language pair was Arabic and English in both directions, which Och noted was especially demanding because Arabic “requires long-distance reordering of words and has a very rich morphology.”

The key idea was that the system learned to translate from examples rather than being programmed with hand-written linguistic rules. As Och wrote, “we feed the computer with billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages.” From those examples, statistical learning built a translation model. This was the same reframing of translation as a statistics problem that IBM researchers had pioneered around 1990, now scaled up on Google’s data and infrastructure and offered free to anyone with a browser.
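To make "learning from aligned text" concrete, here is a minimal sketch of IBM Model 1, the simplest of the IBM word-translation models, trained on a three-sentence toy corpus. The corpus, the function names, and the German-English pair are all assumptions chosen for illustration. Nothing here is taken from Google's system, which was far larger and more sophisticated.

```python
from collections import defaultdict

# Hypothetical toy corpus: (source sentence, human translation).
pairs = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]
corpus = [(f.split(), e.split()) for f, e in pairs]


def train_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(e | f) with EM.

    Simplified IBM Model 1 (no NULL word). Keys of the returned
    mapping are (target_word, source_word) tuples.
    """
    target_vocab = {e for _, es in corpus for e in es}
    # Uniform starting guess: every target word equally likely.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (e, f) co-translation counts
        total = defaultdict(float)  # expected counts per source word f
        # E-step: split each target word's credit among the source
        # words in the same sentence pair, in proportion to t.
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return t


def best_translation(f, t):
    """Return the most probable target word for source word f."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)


t = train_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(word, t))
```

No dictionary is supplied anywhere. The pairing of "haus" with "house" emerges only because "das" is better explained by "the" across the other sentences, which is the statistical effect that more data strengthens.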

Och was candid about the limits. The system “works better for some types of text (e.g. news) than for others (e.g. novels),” and he warned against expecting good poetry translations. Early output was often clumsy and sometimes comically wrong, but it was instantly available, covered ever more languages, and improved as more data flowed in.

For business readers, Google Translate is a landmark in turning a hard AI research problem into a free consumer utility used by hundreds of millions of people. It also set the stage for one of AI’s clearest before-and-after moments a decade later, when Google replaced this statistical system with a neural one and translation quality jumped sharply.
