Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

“Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” submitted to arXiv on May 22, 2020 by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela, is the paper that gave RAG its name and its first concrete recipe. It was published at NeurIPS 2020.

The architecture combines two kinds of memory. Parametric memory is the knowledge baked into the weights of a pre-trained seq2seq model (BART in the paper). Non-parametric memory is an external store: here, a dense vector index of Wikipedia accessed through a neural retriever, Dense Passage Retrieval (DPR). When the model answers a question, the retriever first pulls relevant passages, and the generator then conditions on them alongside the question. The paper explores two variants: RAG-Sequence, which conditions on the same retrieved passage for the whole answer, and RAG-Token, which can draw on a different passage for each generated token. In both, the final probability is a weighted combination over the top retrieved passages.
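To make the difference between the two variants concrete (the paper calls them RAG-Sequence and RAG-Token), here is a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the two-dimensional vectors, the per-token probabilities, and the function names retrieve, rag_sequence, and rag_token are all made up for illustration. A real system would get the vectors from a trained retriever and the token probabilities from a seq2seq generator. The only part taken from the paper is the shape of the two formulas: retrieval scores are inner products turned into probabilities, and the answer probability is combined over the top retrieved passages either once per sequence or once per token.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def retrieve(query_vec, doc_vecs, k):
    """Score every passage by inner product with the query and keep the top k.

    Returns (indices, probabilities); probabilities are renormalized over
    the top k only, which is an approximation to the full distribution.
    """
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return top, softmax([scores[i] for i in top])

def rag_sequence(doc_probs, token_probs):
    """One passage for the whole answer: sum over passages of
    p(passage) * product over tokens of p(token | passage)."""
    return sum(pz * math.prod(tokens) for pz, tokens in zip(doc_probs, token_probs))

def rag_token(doc_probs, token_probs):
    """A possibly different passage per token: product over tokens of
    sum over passages of p(passage) * p(token | passage)."""
    n_tokens = len(token_probs[0])
    return math.prod(
        sum(pz * tokens[i] for pz, tokens in zip(doc_probs, token_probs))
        for i in range(n_tokens)
    )

# Hypothetical toy data: three passage embeddings and one query embedding.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.2]
top, doc_probs = retrieve(query_vec, doc_vecs, k=2)

# Made-up generator outputs: for each retrieved passage, the probability
# the generator assigns to each of the two tokens of a fixed target answer.
token_probs = [[0.9, 0.8], [0.5, 0.6]]

p_seq = rag_sequence(doc_probs, token_probs)
p_tok = rag_token(doc_probs, token_probs)
print("retrieved passages:", top)
print("RAG-Sequence probability:", p_seq)
print("RAG-Token probability:", p_tok)
```

With a single retrieved passage the two functions return the same value. They differ only when several passages each contribute part of the evidence for an answer.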

The models set a new state of the art on three open-domain question-answering tasks. Beyond QA, the authors found that retrieval made generation “more specific, diverse and factual” than a comparable model relying on its weights alone. That finding is the central argument for why grounding a model in retrieved documents reduces hallucination.

Why business readers should care: nearly every production system that lets a language model answer questions over a company’s own documents traces back to this paper. RAG is how organizations connect a general model to private, current, and verifiable knowledge without retraining it.

Last verified June 7, 2026