Rainbow was introduced in “Rainbow: Combining Improvements in Deep Reinforcement Learning,” a paper by Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver at DeepMind. It was posted to arXiv on October 6, 2017 and presented at AAAI 2018. By 2017 several independent extensions to the original Deep Q-Network (DQN) had each shown gains in isolation, but it was unclear whether they were complementary and could be combined effectively.
Rainbow integrates six of them into a single agent: Double DQN, dueling architectures, prioritized experience replay, multi-step learning, distributional reinforcement learning, and noisy networks for exploration. The combined agent set a new state of the art on the Atari 2600 benchmark in both data efficiency and final performance. An ablation study that removed one component at a time showed how much each contributed; removing prioritized replay or multi-step learning hurt performance the most.
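As a rough illustration of how two of these components fit together, the sketch below computes a scalar learning target that combines a multi-step return with Double DQN-style bootstrapping. It is a simplified sketch, not the paper's implementation: Rainbow itself learns a distribution over returns rather than a single scalar, and the function name `n_step_double_q_target`, its parameters, and the example numbers are all hypothetical choices made for illustration.

```python
from typing import Sequence


def n_step_double_q_target(
    rewards: Sequence[float],
    online_q_next: Sequence[float],
    target_q_next: Sequence[float],
    gamma: float = 0.99,
    terminal: bool = False,
) -> float:
    """Scalar n-step target with Double Q-learning bootstrapping (illustrative only).

    rewards:        the n rewards observed after the current state.
    online_q_next:  online-network action values at the state reached after n steps.
    target_q_next:  target-network action values at that same state.
    terminal:       True if the episode ended within the n steps (no bootstrap).
    """
    # Multi-step learning: discounted sum of the n observed rewards.
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if terminal:
        return n_step_return

    # Double Q-learning: the online network selects the action,
    # the target network evaluates it.
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    bootstrap = target_q_next[best_action]

    return n_step_return + (gamma ** len(rewards)) * bootstrap


# Example with made-up numbers: three rewards and two actions.
example_target = n_step_double_q_target(
    rewards=[1.0, 0.0, 2.0],
    online_q_next=[0.1, 0.9],
    target_q_next=[4.0, 2.0],
    gamma=0.5,
)
print(example_target)
```

In the example, the online values pick the second action even though the target network rates the first action higher. Separating selection from evaluation in this way is what Double DQN uses to reduce overestimation of action values.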
Rainbow became a key reference point for value-based deep RL. For a general reader, it is a clear demonstration that well-understood improvements can stack, and that the careful engineering needed to combine techniques is itself a research contribution.