“Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” was submitted to arXiv in October 2019 by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu of Google. It introduced T5, the Text-to-Text Transfer Transformer.
T5’s central idea was to cast every text-based problem in one format: text in, text out. Translation, summarization, classification, and question answering were each expressed as mapping an input string to an output string, so a single model and training objective could handle them all. The paper was also a large, systematic study: it compared architectures, pre-training objectives, datasets, and fine-tuning strategies to find what actually mattered for transfer learning, and then scaled the largest model to 11 billion parameters.
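The text-in, text-out framing can be made concrete with a small sketch. The Python below is an illustration only, not the authors' code: the function name, task keys, and field names are hypothetical, and the task prefixes and label words are assumptions written in the style of the examples the paper gives. It shows how three different tasks reduce to the same (input string, output string) pair.

```python
from typing import Any, Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, Any]) -> Tuple[str, str]:
    """Cast one labeled example into an (input text, target text) pair.

    Task keys, field names, and prefixes are illustrative assumptions.
    """
    if task == "translate_en_de":
        # Translation: the prefix names the task, the target is the translation.
        return ("translate English to German: " + example["source"],
                example["target"])
    if task == "summarize":
        # Summarization: the target is the reference summary.
        return "summarize: " + example["document"], example["summary"]
    if task == "cola":
        # Classification: the class label is emitted as literal text.
        label = "acceptable" if example["label"] == 1 else "not acceptable"
        return "cola sentence: " + example["sentence"], label
    raise ValueError(f"unknown task: {task}")


examples = [
    ("translate_en_de", {"source": "That is good.", "target": "Das ist gut."}),
    ("cola", {"sentence": "The course is jumping well.", "label": 0}),
]
for task, ex in examples:
    source_text, target_text = to_text_to_text(task, ex)
    print(repr(source_text), "->", repr(target_text))
```

Because every task yields the same kind of pair, one sequence-to-sequence model can be trained on all of them with the same loss; only the prefix tells the model which task it is being asked to perform.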
To pre-train at scale, the team built and released the Colossal Clean Crawled Corpus (C4), a cleaned dataset derived from Common Crawl that became a widely used resource in its own right. T5 reached state-of-the-art results on many benchmarks covering summarization, question answering, and text classification, and it helped cement both the unified text-to-text framing and the practice of carefully cleaning web-scale training data.