The dueling network architecture was introduced in “Dueling Network Architectures for Deep Reinforcement Learning,” posted to arXiv on November 20, 2015 by Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas at DeepMind. It changes the internal structure of a deep Q-network rather than the learning algorithm itself.
Instead of estimating the value of each action directly, the network splits after its shared layers into two streams: one estimates the value of being in the current state, and the other estimates the relative advantage of each available action in that state. The two streams are then recombined into one value per action, so the output has the same form as that of a standard Q-network and existing training algorithms can be used unchanged. The benefit is that in many states the choice of action barely matters, and a separate state-value stream lets the network learn that value once rather than re-estimating it for every action.
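The recombination step can be sketched in a few lines. The example below is a minimal illustration, not the paper's implementation: it assumes the shared layers have already produced a feature vector, uses a single linear layer per stream for brevity, and the names `dueling_q_values`, `w_value`, and `w_adv` are hypothetical. It uses the mean-subtracted combination described in the paper, in which the average advantage is removed before the state value is added back.

```python
import numpy as np


def dueling_q_values(features, w_value, b_value, w_adv, b_adv):
    """Combine a state-value stream and an advantage stream into Q-values.

    features: (batch, d) output of the shared layers (assumed given).
    w_value:  (d, 1) and b_value: (1,) produce V(s), one scalar per state.
    w_adv:    (d, n_actions) and b_adv: (n_actions,) produce A(s, a).
    """
    value = features @ w_value + b_value      # shape (batch, 1)
    advantage = features @ w_adv + b_adv      # shape (batch, n_actions)
    # Subtract the per-state mean advantage so that V and A cannot
    # trade a constant offset back and forth between the two streams.
    centered = advantage - advantage.mean(axis=1, keepdims=True)
    return value + centered                   # shape (batch, n_actions)


# Toy inputs with random weights (illustrative only, no training here).
rng = np.random.default_rng(0)
batch, d, n_actions = 4, 8, 3
features = rng.normal(size=(batch, d))
w_value, b_value = rng.normal(size=(d, 1)), np.zeros(1)
w_adv, b_adv = rng.normal(size=(d, n_actions)), np.zeros(n_actions)

q = dueling_q_values(features, w_value, b_value, w_adv, b_adv)
print(q.shape)
```

With this combination, the mean of the action values in each state equals the output of the value stream, and the greedy action is determined by the advantage stream alone.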
In the paper's experiments, the architecture outperformed the previous state of the art on the Atari 2600 benchmark, and it became another component folded into the Rainbow agent. For a general reader, it is a good example of how thoughtful network design, not just more data, can sharpen what a learning system pays attention to.