“Auto-Encoding Variational Bayes,” posted to arXiv on December 20, 2013 by Diederik P. Kingma and Max Welling, introduced what became known as the variational autoencoder, or VAE. The paper addressed a long-standing difficulty in machine learning: how to do efficient inference and learning in directed probabilistic models that have continuous latent variables whose true posterior distribution is intractable to compute.
The key idea was the reparameterization trick. Sampling a latent variable directly from the encoder’s distribution blocks gradients from flowing back through the sampling step. The authors instead expressed the sample as a deterministic, differentiable function of the distribution’s parameters and an independent noise variable drawn from a fixed distribution; in the Gaussian case, z = μ + σ·ε, with ε drawn from a standard normal. This let them optimize a variational lower bound on the log-likelihood of the data (the evidence lower bound, or ELBO) using ordinary stochastic gradient descent, the same workhorse method used to train neural networks. An encoder network maps each input to a distribution over latent codes, and a decoder network reconstructs the input from a sampled code.
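The trick can be illustrated with a small NumPy sketch. This is not the paper's code: it assumes a diagonal Gaussian over a two-dimensional latent code, and the names `reparameterize`, `gaussian_kl`, `mu`, and `log_var` are hypothetical, chosen only for illustration. It writes the sample as z = mu + sigma * eps and then uses that form to estimate a gradient with respect to mu by simple averaging.

```python
import numpy as np

def reparameterize(mu, log_var, eps):
    """Return z = mu + sigma * eps.

    Given the noise eps, z is a deterministic, differentiable
    function of the distribution parameters (mu, log_var).
    """
    sigma = np.exp(0.5 * log_var)
    return mu + sigma * eps

def gaussian_kl(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Illustrative parameters (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, -2.0])

# Noise comes from a fixed distribution that does not depend on mu or log_var.
eps = rng.standard_normal((100_000, 2))
z = reparameterize(mu, log_var, eps)

# Toy objective: E[z^2]. Because dz/dmu = 1, the gradient with respect to mu
# can be estimated by averaging d(z^2)/dz = 2z over the noise samples.
# The exact gradient is 2 * mu.
grad_mu_estimate = np.mean(2.0 * z, axis=0)
exact_grad_mu = 2.0 * mu

# Regularization term of the variational lower bound for this Gaussian.
kl = gaussian_kl(mu, log_var)
```

In a real VAE, `mu` and `log_var` would be outputs of the encoder network and an automatic differentiation library would compute the gradient. The averaging here only shows why moving the randomness into `eps` makes the sample differentiable with respect to the parameters.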
The VAE became one of the dominant families of deep generative models in the years that followed, alongside generative adversarial networks. Its probabilistic framing gave it a smooth, structured latent space useful for interpolation and representation learning, and descendants such as VQ-VAE fed directly into later image and audio generation systems. For a general reader, the VAE matters because it showed that a neural network could learn to compress data into a meaningful internal representation and then generate new, plausible examples from it, a capability that underpins much of modern generative AI.