Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines a language model with a search step: before answering, the system retrieves relevant documents from an external source and feeds them to the model as context. This grounds the model’s output in specific, up-to-date material rather than relying solely on what it memorized during training.
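As a rough illustration of the retrieve-then-answer flow described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample documents, the function names (tokenize, retrieve, build_prompt), and the simple word-overlap ranking, which stands in for the search index or embedding-based retrieval a production system would more likely use. The final step, sending the prompt to a language model, is left out.

```python
import re

# Hypothetical sample documents; a real system would query a search index
# or vector database instead of an in-memory list.
DOCUMENTS = [
    "Refunds are issued within 14 days of a return request.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Place the retrieved text ahead of the question as context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "How long do refunds take?"
context_docs = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, context_docs)
print(prompt)  # this prompt would then be sent to the language model
```

The point of the sketch is the ordering: the retrieval step picks the refund-policy document first, and only then is the prompt assembled, so the model sees the relevant text alongside the question.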

The technique was introduced by Lewis et al. in the 2020 paper “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,” work from Facebook AI Research and collaborators. The abstract notes that large pre-trained models “store factual knowledge in their parameters” but argues that pairing them with an explicit retrieval step improves accuracy on knowledge-heavy tasks and makes sources traceable.

RAG has become a standard way to connect a general-purpose model to a company’s private or current data without retraining it.

Why business readers should care: RAG is the dominant pattern for putting an LLM on top of your own documents, policies, or product data. It reduces made-up answers, keeps responses current, and lets you cite sources, and it is usually faster and cheaper to set up than fine-tuning.

Last verified June 6, 2026