Self-RAG, published by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi in October 2023, makes retrieval-augmented generation more selective and self-aware. Plain RAG retrieves a fixed number of passages for every query and stuffs them into the prompt, which can waste effort when the model already knows the answer and can drag in irrelevant text that hurts quality.
Self-RAG trains the model to do three things. It decides on demand whether retrieval is needed for a given segment of the output rather than always fetching. It emits special reflection tokens that record its own judgment of whether a retrieved passage is relevant, whether the text it generated is supported by that passage, and whether the response is useful overall. And those same tokens make the model controllable at inference time, so a user can tune how aggressively it retrieves or how strictly it demands evidence.
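The three behaviors above can be sketched as a single decision step. This is a minimal illustration, not the authors' implementation: the `Candidate` fields stand in for reflection-token probabilities, and `retrieve_prob`, `retriever`, `generate`, and the toy stubs are hypothetical placeholders for a trained model and a real retriever. The threshold and the weights in `score` are the inference-time knobs, and their default values here are assumptions chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One generated segment plus stand-ins for its reflection-token scores."""
    text: str
    passage: Optional[str]   # None when no retrieval was used
    is_relevant: float       # is the passage relevant to the query?
    is_supported: float      # is the generated text backed by the passage?
    is_useful: float         # is the response useful overall?


def score(c: Candidate, w_rel: float = 1.0, w_sup: float = 1.0,
          w_use: float = 0.5) -> float:
    """Weighted sum of critique scores; the weights are user-tunable."""
    return w_rel * c.is_relevant + w_sup * c.is_supported + w_use * c.is_useful


def self_rag_step(
    query: str,
    retrieve_prob: Callable[[str], float],
    retriever: Callable[[str], List[str]],
    generate: Callable[[str, Optional[str]], Candidate],
    retrieve_threshold: float = 0.5,
    top_k: int = 3,
    scorer: Callable[[Candidate], float] = score,
) -> Candidate:
    """Retrieve only when the model asks for it, then keep the best candidate."""
    # 1. On-demand retrieval: skip the retriever if the model is confident.
    if retrieve_prob(query) < retrieve_threshold:
        return generate(query, None)

    passages = retriever(query)[:top_k]
    if not passages:
        return generate(query, None)

    # 2. Generate one candidate per passage; each carries its own critique scores.
    candidates = [generate(query, p) for p in passages]

    # 3. Controllability: the scorer decides how much evidence support counts.
    return max(candidates, key=scorer)


# --- Toy stubs (hypothetical, for illustration only) ---
def toy_retriever(query: str) -> List[str]:
    return ["Bananas are yellow.", "Self-RAG was introduced in 2023."]


def toy_generate(query: str, passage: Optional[str]) -> Candidate:
    if passage is None:
        return Candidate("answer from parametric memory", None, 0.0, 0.0, 0.6)
    on_topic = "Self-RAG" in passage
    return Candidate(
        text=f"answer grounded in: {passage}",
        passage=passage,
        is_relevant=0.9 if on_topic else 0.1,
        is_supported=0.8 if on_topic else 0.2,
        is_useful=0.7,
    )


if __name__ == "__main__":
    best = self_rag_step("When was Self-RAG introduced?",
                         lambda q: 0.9, toy_retriever, toy_generate)
    print(best.text)
```

Raising `retrieve_threshold` makes the sketch retrieve less often, and passing a `scorer` with a larger `w_sup` makes it favor candidates whose text is better supported by the passage, which mirrors the two controls described above.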
The authors found that 7-billion- and 13-billion-parameter Self-RAG models outperformed ChatGPT and retrieval-augmented Llama2-chat on open-domain question answering, reasoning, and fact verification, while improving factuality and citation accuracy in long-form answers.
For a business, Self-RAG points toward systems that retrieve only when useful and can flag when their own answers are not well supported, two properties that matter a great deal when accuracy and traceability are on the line.