“Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network,” submitted to arXiv on September 15, 2016 by Christian Ledig and colleagues, introduced SRGAN, which the authors describe as the first framework capable of inferring photo-realistic natural images at a 4x upscaling factor. Many earlier super-resolution methods, including SRCNN, optimized for pixel-level accuracy by minimizing the mean squared error (MSE) between the predicted and true high-resolution images. Because many different high-resolution images are consistent with the same low-resolution input, that objective rewards a safe, blurry average of all of them. The results therefore scored well on accuracy metrics such as PSNR but looked overly smooth and lacked believable fine detail.
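To make the "blurry average" argument concrete, here is a minimal Python sketch on hypothetical 1-D toy data (not taken from the paper). Suppose the true high-resolution patch is equally likely to have its edge at either of two positions. Committing to either sharp edge gives an expected MSE of 0.1. The pixel-wise average of the two candidates halves that to 0.05, even though the average matches neither plausible image.

```python
def mse(pred: list[float], target: list[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical toy data: two equally plausible high-resolution versions of
# the same low-resolution patch, a sharp edge that starts at index 2 or 3.
edge_a = [0.0, 0.0, 1.0, 1.0, 1.0]
edge_b = [0.0, 0.0, 0.0, 1.0, 1.0]
plausible = [edge_a, edge_b]


def expected_mse(pred: list[float]) -> float:
    """Average MSE when the true image is equally likely to be any plausible one."""
    return sum(mse(pred, hr) for hr in plausible) / len(plausible)


# The pixel-wise average of the candidates: a soft, half-grey edge.
blurry_mean = [(a + b) / 2 for a, b in zip(edge_a, edge_b)]

for name, pred in [("sharp A", edge_a), ("sharp B", edge_b),
                   ("blurry mean", blurry_mean)]:
    print(f"{name:12s} expected MSE = {expected_mse(pred):.3f}")
```

The same holds in general. Under squared error, the best single guess is the mean of the plausible answers, which is exactly the smooth, detail-free output the paragraph above describes.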
SRGAN attacked the perceptual-quality problem directly with two ideas. First, it added an adversarial loss. A discriminator network learns to tell generated high-resolution images from real ones, which pushes the generator toward outputs that lie on the manifold of natural images rather than at a blurry mean. Second, it replaced the pixel-wise MSE content loss with one computed on the feature maps of a pretrained VGG network, so the model is rewarded for matching high-level texture and structure rather than exact pixel values. The authors call the weighted sum of this content loss and the adversarial loss a perceptual loss. The generator itself is a deep residual network.
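The loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation. In particular, `toy_features` is a hypothetical stand-in for a pretrained VGG feature extractor: it uses neighbouring-pixel differences so the example stays self-contained. The batch terms are averaged rather than summed. The 10^-3 weight on the adversarial term follows the value reported in the paper. In this toy case, a blurred edge costs four times more under the feature-space loss than under pixel MSE (0.125 vs 0.03125).

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]  # a flattened grayscale image, for illustration only


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_features(img: Vector) -> list[float]:
    """HYPOTHETICAL stand-in for a pretrained VGG feature map.
    Neighbouring-pixel differences respond to edges and texture,
    the kind of structure a feature-space loss is meant to reward."""
    return [img[i + 1] - img[i] for i in range(len(img) - 1)]


def content_loss(sr: Vector, hr: Vector,
                 phi: Callable[[Vector], list[float]] = toy_features) -> float:
    """Content loss measured in feature space instead of pixel space."""
    return mse(phi(sr), phi(hr))


def adversarial_loss(d_probs: Vector, eps: float = 1e-12) -> float:
    """Generator's adversarial term, -log D(G(x)), averaged over the batch.
    d_probs holds the discriminator's probability that each SR image is real."""
    return sum(-math.log(max(p, eps)) for p in d_probs) / len(d_probs)


def perceptual_loss(sr_batch: Sequence[Vector], hr_batch: Sequence[Vector],
                    d_probs: Vector, adv_weight: float = 1e-3,
                    phi: Callable[[Vector], list[float]] = toy_features) -> float:
    """Content loss plus a down-weighted adversarial loss."""
    content = sum(content_loss(s, h, phi)
                  for s, h in zip(sr_batch, hr_batch)) / len(sr_batch)
    return content + adv_weight * adversarial_loss(d_probs)


hr = [0.0, 0.0, 1.0, 1.0]
blurry_sr = [0.0, 0.25, 0.75, 1.0]
print(mse(blurry_sr, hr))           # pixel-space error
print(content_loss(blurry_sr, hr))  # feature-space error
print(perceptual_loss([blurry_sr], [hr], d_probs=[0.5]))
```

In a real system, `phi` would be a frozen pretrained network and `d_probs` would come from a discriminator trained alongside the generator. The small adversarial weight keeps the content term dominant while still nudging outputs toward realism.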
In mean-opinion-score (MOS) testing, human raters judged SRGAN's outputs far closer to the original high-resolution images than those of competing methods. Yet SRGAN scored lower on pixel-accuracy metrics such as PSNR and SSIM. This is a vivid demonstration that the metric you optimize is not always the quality you want. The paper was presented at CVPR 2017.
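PSNR, one of the standard accuracy scores in super-resolution evaluation, is a direct monotone transform of MSE. That is why a model trained to minimize MSE tends to win on it. In the generic notation below (a sketch, not the paper's own), N is the number of pixels and MAX_I is the largest possible pixel value, for example 255 for 8-bit images.

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(I^{HR}_i - I^{SR}_i\right)^{2},
\qquad
\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{\mathrm{MAX}_I^{2}}{\mathrm{MSE}}\right)\ \text{dB}
```

Because PSNR rises as MSE falls, halving the MSE always adds 10 log10 2 ≈ 3.01 dB. This holds regardless of whether the improvement came from sharper detail or from more blur.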
Why business readers should care: SRGAN crystallized a lesson that recurs across generative AI, namely that optimizing the obvious numerical target can give you the wrong product. Sometimes you have to optimize for how something looks or feels to a person, which is what adversarial and perceptual losses try to approximate.