WMT, the Conference on Machine Translation, is an annual venue that grew out of a workshop series first held in 2006. It is the closest thing machine translation has to a standing championship and rulebook. Its 2024 edition (WMT24) was held November 15-16, 2024, in Miami, Florida, co-located with EMNLP 2024, and was billed as the ninth conference in the series.
The heart of WMT is its shared tasks: organized competitions in which teams translate the same held-out test sets so results can be compared fairly. Over the years these have spanned general news translation, low-resource language pairs, Indic languages, biomedical and patent text, literary translation, chat, quality estimation, and a Metrics task that evaluates the evaluators by checking which automatic scores best match human judgment. The common test sets WMT releases, such as the newstest collections, became reference benchmarks used far beyond the conference itself.
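To make the idea of "evaluating the evaluators" concrete, here is a minimal sketch of how one might check which automatic metric agrees best with human judgment. It uses Kendall's tau, a rank-agreement statistic, on a handful of made-up scores. The metric names (`metric_a`, `metric_b`), the numbers, and the helper functions are all hypothetical, and this is only one of several agreement statistics, not necessarily the exact procedure the Metrics task uses in any given year.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # Positive product: both score lists order items i and j the same way.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


def rank_metrics(human_scores, metric_scores):
    """Return (metric name, tau) pairs, best agreement with humans first."""
    results = {
        name: kendall_tau(scores, human_scores)
        for name, scores in metric_scores.items()
    }
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: human quality scores for five translation systems,
# plus the scores two imaginary automatic metrics gave the same systems.
human = [78.0, 65.0, 90.0, 40.0, 55.0]
metrics = {
    "metric_a": [0.71, 0.60, 0.88, 0.35, 0.52],
    "metric_b": [0.50, 0.62, 0.58, 0.45, 0.61],
}

if __name__ == "__main__":
    for name, tau in rank_metrics(human, metrics):
        print(f"{name}: tau = {tau:.2f}")
```

In this toy data, `metric_a` orders the five systems exactly as the human scores do, while `metric_b` disagrees on several pairs, so `metric_a` comes out ahead. Real meta-evaluation works with far more data and typically reports agreement at both the system and the segment level.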
WMT matters because shared, standardized evaluation is what lets a field measure real progress rather than each group grading its own homework. Much of how the field evaluates translation today, from BLEU’s long run as the default metric to the rise of neural metrics like COMET, was shaped by WMT results.
For decision-makers, WMT is a useful signal: claims about translation quality carry more weight when validated on its public test sets and metrics tasks rather than on a vendor’s private benchmark.