BART: Denoising Sequence-to-Sequence Pre-training

BART was introduced in “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension” by Mike Lewis and colleagues at Facebook AI (submitted October 29, 2019). It uses a standard Transformer encoder-decoder, the same architecture used for neural machine translation, but pre-trains it with a self-supervised denoising objective. In the authors' words, “BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.”

What set BART apart was the variety of corruptions it learned to undo. The authors compared several noising approaches and found the best performance came from “randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.” Because it must regenerate the original text, BART is naturally suited to producing text, not just classifying it. The authors note that the architecture “can be seen as generalizing BERT” (a bidirectional encoder) and GPT (a left-to-right decoder) within one model, and they report gains of up to 6 ROUGE points on abstractive summarization along with strong results on dialogue and question answering.
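
The two corruptions named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `start_prob` and `max_span` parameters, the uniform choice of span length, and the naive split on periods are all assumptions made for illustration. What it does show is the key property of in-filling: a whole span collapses into one `<mask>` token, so the model must also infer how many words are missing.

```python
import random

MASK = "<mask>"  # placeholder mask token (illustrative, not tied to any tokenizer)


def shuffle_sentences(sentences, rng):
    """Sentence permutation: return a shuffled copy of the sentence list."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def infill_spans(tokens, rng, start_prob=0.3, max_span=3):
    """Text in-filling: replace random spans of tokens with a single MASK each.

    start_prob and max_span are hypothetical knobs chosen for this sketch.
    """
    out = []
    i = 0
    while i < len(tokens):
        if rng.random() < start_prob:
            # Skip a span of 1..max_span tokens and emit one mask for all of it.
            span = rng.randint(1, max_span)
            out.append(MASK)
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


def corrupt(text, seed=0):
    """Apply both corruptions to a text whose sentences end with periods."""
    rng = random.Random(seed)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    noisy = [
        " ".join(infill_spans(sentence.split(), rng))
        for sentence in shuffle_sentences(sentences, rng)
    ]
    return ". ".join(noisy) + "."


if __name__ == "__main__":
    original = "The cat sat on the mat. It was a sunny day. Birds sang outside."
    print(corrupt(original))
```

In pre-training, the corrupted string would be the encoder input and the original string the decoder's reconstruction target.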

BART matters to general readers because it is one of the workhorse models behind automatic summarization and text rewriting tools. Its denoising recipe showed that teaching a model to repair damaged text is a powerful way to prepare it for many downstream writing tasks.
