Double DQN was introduced in “Deep Reinforcement Learning with Double Q-learning,” posted to arXiv on September 22, 2015 by Hado van Hasselt, Arthur Guez, and David Silver at DeepMind. The paper demonstrates that the original Deep Q-Network commonly overestimates action values during training, because the same network both selects and evaluates the best next action, and any noise in the estimates is amplified by always taking the maximum.
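The amplification effect can be seen in a small simulation. The sketch below is illustrative and not taken from the paper: it assumes every action has a true value of zero and that the estimates carry independent Gaussian noise. The function name `mean_max_estimate` and its parameters are hypothetical.

```python
import numpy as np

def mean_max_estimate(n_actions, noise_std=1.0, n_trials=10_000, seed=0):
    """Average of max_a Q_est(a) when every true action value is 0.

    Assumption for illustration: estimates are the true value plus
    independent zero-mean Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(n_actions)
    # One row of noisy estimates per trial.
    noisy_q = true_q + rng.normal(scale=noise_std, size=(n_trials, n_actions))
    # Taking the max picks whichever estimate happens to be noisiest upward.
    return noisy_q.max(axis=1).mean()

bias_one_action = mean_max_estimate(n_actions=1)
bias_five_actions = mean_max_estimate(n_actions=5)
```

With a single action the average stays near the true value of zero, while with several actions the average of the maximum is clearly positive even though no action is actually better than another.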
The fix is to decouple those two roles. Double DQN uses the online network to choose the greedy next action and the target network, which DQN already maintains, to evaluate it, adapting the older tabular Double Q-learning idea to deep function approximation. Because no additional network is introduced, this requires almost no extra computation, yet it produces more accurate value estimates and better, more stable performance across the Atari benchmark.
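The difference between the two bootstrap targets can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names, argument names, and the sample numbers are assumptions, and the Q-value arrays stand in for the outputs of the online and target networks on a batch of next states.

```python
import numpy as np

def dqn_target(reward, done, q_target_next, gamma=0.99):
    # Standard DQN: the target network's values are used both to pick
    # the best next action and to score it (a plain max).
    return reward + gamma * (1.0 - done) * q_target_next.max(axis=1)

def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    # Double DQN: the online network selects the action...
    best_action = q_online_next.argmax(axis=1)
    # ...and the target network evaluates that selected action.
    batch_index = np.arange(len(best_action))
    evaluated = q_target_next[batch_index, best_action]
    return reward + gamma * (1.0 - done) * evaluated

# Hypothetical batch of two transitions with two actions each.
reward = np.array([1.0, 0.0])
done = np.array([0.0, 1.0])  # 1.0 marks a terminal next state
q_online_next = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target_next = np.array([[3.0, 1.5], [0.2, 0.4]])

y_dqn = dqn_target(reward, done, q_target_next, gamma=0.9)
y_double = double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.9)
```

Because the evaluated value is one entry of the same row that the maximum is taken over, the Double DQN target can never exceed the DQN target for the same inputs, which is where the reduction in overestimation comes from.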
Double DQN became one of the standard ingredients later combined in the Rainbow agent. For a general reader, it illustrates a recurring theme in reinforcement learning: small structural changes that correct a hidden bias can matter more than scaling up the model.