Landmark Papers

What the papers actually said, linked to the originals.

644 entries, all primary-sourced
paper June 27, 2016

Gaussian Error Linear Units (GELUs)

The 2016 Hendrycks-Gimpel paper introducing the GELU activation, which weights each input by the standard Gaussian CDF, x·Φ(x). This smooth nonlinearity was later adopted by BERT, GPT, and many other Transformers.
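The paper defines GELU as x·Φ(x), with Φ the standard normal CDF, and also gives a tanh-based approximation. A minimal sketch of both using only the Python standard library (the function names are illustrative, not from the paper):

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU given in the paper."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

# Compare the exact form and the approximation on a few inputs.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{x:+.1f}  exact={gelu(x):+.4f}  tanh={gelu_tanh(x):+.4f}")
```

Unlike ReLU, negative inputs are not zeroed outright: gelu(-1.0) is about -0.159, because the input is scaled by Φ(-1) ≈ 0.159.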

paper July 21, 2016

Layer Normalization

The 2016 Ba-Kiros-Hinton paper introducing layer normalization, which normalizes each example across its own features rather than across the batch and became a standard component of Transformers.
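Layer normalization computes the mean and variance over the features of a single example, then applies a learned per-feature gain and bias. A minimal sketch on plain Python lists; the eps term is a common numerical-stability assumption rather than part of the paper's formula, and the names are illustrative:

```python
import math
from typing import List, Optional

def layer_norm(x: List[float],
               gain: Optional[List[float]] = None,
               bias: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    """Normalize one example's feature vector, then apply per-feature gain and bias."""
    n = len(x)
    mu = sum(x) / n                            # mean over this example's features
    var = sum((v - mu) ** 2 for v in x) / n    # variance over the same features
    denom = math.sqrt(var + eps)               # eps: assumed stability term, not in the paper
    if gain is None:
        gain = [1.0] * n
    if bias is None:
        bias = [0.0] * n
    return [g * (v - mu) / denom + b for v, g, b in zip(x, gain, bias)]

# Each row of a "batch" is normalized independently of the other rows.
batch = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 10.0, 14.0]]
print([layer_norm(row) for row in batch])
```

Because the statistics come from the example itself, the computation is the same at batch size 1 as at any other batch size, and the same at training and test time.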

paper October 26, 2016

Universal Adversarial Perturbations

A 2016 paper showing a single fixed perturbation can fool a vision classifier on most natural images at once, not just one chosen image.

paper November 24, 2016

The Off-Switch Game

A 2016 paper showing that an AI kept uncertain about its true objective has an incentive to let humans switch it off rather than resist.
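The core argument can be checked with a small expected-value calculation. In the simplified game, the robot can act directly (payoff U_a, which it is uncertain about), switch itself off (payoff 0), or wait and let the human decide. Assuming a rational human who allows the action exactly when U_a >= 0, waiting is worth E[max(U_a, 0)], which is never less than max(E[U_a], 0). A sketch with a discrete belief over U_a; the belief values are made up for illustration and the function name is hypothetical:

```python
from typing import Dict

def off_switch_values(belief: Dict[float, float]) -> Dict[str, float]:
    """Expected payoff of each robot option, given a belief {U_a: probability}.

    Assumes a rational human who permits the action iff U_a >= 0
    and otherwise switches the robot off (payoff 0).
    """
    act = sum(p * u for u, p in belief.items())             # act now: E[U_a]
    off = 0.0                                              # switch self off
    wait = sum(p * max(u, 0.0) for u, p in belief.items())  # defer to human: E[max(U_a, 0)]
    return {"act": act, "off": off, "wait": wait}

# Hypothetical belief: the action is probably mildly good, possibly quite bad.
print(off_switch_values({-2.0: 0.4, 1.0: 0.6}))
```

Here acting has an expected payoff of about -0.2 and switching off gives 0, but waiting yields 0.6, so deferring to the human is strictly best. If the belief puts all its mass on U_a >= 0, waiting and acting tie: the incentive to defer comes from the robot's uncertainty.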

paper January 26, 2017

Wasserstein GAN (WGAN)

WGAN reformulated GAN training around the Wasserstein (Earth Mover's) distance, giving more stable training and a loss that correlates with sample quality.
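In WGAN the critic f_w is trained to maximize E[f_w(real)] - E[f_w(fake)], with its weights clipped to a small box [-c, c] (the paper uses c = 0.01) so that f_w stays Lipschitz; the generator is then trained to maximize E[f_w(fake)]. A toy sketch with a one-parameter linear critic f_w(x) = w·x on 1-D samples. Plain gradient ascent stands in for the RMSProp optimizer the paper used, and the data and function names are made up for illustration:

```python
from statistics import fmean
from typing import List

def critic_objective(w: float, real: List[float], fake: List[float]) -> float:
    """E[f_w(real)] - E[f_w(fake)] for a linear critic f_w(x) = w * x."""
    return w * fmean(real) - w * fmean(fake)

def train_linear_critic(real: List[float], fake: List[float],
                        c: float = 0.01, lr: float = 0.05, steps: int = 100) -> float:
    """Gradient ascent on the critic objective with WGAN-style weight clipping."""
    w = 0.0
    for _ in range(steps):
        grad = fmean(real) - fmean(fake)  # d/dw of the objective for this linear critic
        w += lr * grad                    # plain ascent step (the paper used RMSProp)
        w = max(-c, min(c, w))            # clip the weight to [-c, c]
    return w

real = [1.5, 2.0, 2.5]   # toy "data" samples
fake = [-0.5, 0.0, 0.5]  # toy "generator" samples, shifted down by 2
w = train_linear_critic(real, fake)
estimate = critic_objective(w, real, fake) / 0.01
print(w, estimate)
```

Dividing the critic's value by c recovers 2.0, the Wasserstein-1 distance between these two shifted sample sets. Clipping makes the critic c-Lipschitz, so it estimates the distance only up to that scale factor.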

paper March 20, 2017

Mask R-CNN

Mask R-CNN added a mask branch to Faster R-CNN, predicting a pixel-level segmentation mask for every detected object in parallel with its class and bounding box.
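The mask branch predicts K binary masks per region of interest (one per class) with a per-pixel sigmoid, and the mask loss is the average binary cross-entropy on the mask for the region's ground-truth class only, so classes do not compete. A minimal sketch of that loss on nested lists; the tiny 2×2 masks and function names are illustrative, and the numerically stable logits form of binary cross-entropy is an implementation choice:

```python
import math
from typing import List

Mask = List[List[float]]

def bce_with_logits(z: float, t: float) -> float:
    """Binary cross-entropy of sigmoid(z) against target t, in a numerically stable form."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

def mask_loss(mask_logits: List[Mask], gt_class: int, gt_mask: Mask) -> float:
    """Average per-pixel BCE on the ground-truth class's mask; other classes' masks are ignored."""
    logits = mask_logits[gt_class]
    losses = [bce_with_logits(z, t)
              for z_row, t_row in zip(logits, gt_mask)
              for z, t in zip(z_row, t_row)]
    return sum(losses) / len(losses)

# Hypothetical example: K = 3 classes, 2x2 masks, region labeled as class 1.
logits = [[[5.0, 5.0], [5.0, 5.0]],      # class 0: ignored by the loss
          [[4.0, -4.0], [-4.0, 4.0]],    # class 1: confident and matches the target
          [[0.0, 0.0], [0.0, 0.0]]]      # class 2: ignored by the loss
target = [[1.0, 0.0], [0.0, 1.0]]
print(mask_loss(logits, gt_class=1, gt_mask=target))
```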

paper June 12, 2017

Attention Is All You Need

The 2017 Google paper that introduced the Transformer architecture, the foundation of virtually all modern large language models.
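The Transformer's core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)·V. A dependency-free sketch on nested lists, covering a single head with no masking and no learned projections; the names are illustrative:

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Softmax with max-subtraction for numerical stability."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head; q: n x d_k, k: m x d_k, v: m x d_v."""
    d_k = len(k[0])
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]  # each query's weights over the keys sum to 1
    return [[sum(w * vj[c] for w, vj in zip(w_row, v)) for c in range(len(v[0]))]
            for w_row in weights]

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(q, k, v))  # the query attends more to the first key
```

The division by √d_k is the paper's remedy for dot products that grow with dimension and would otherwise push the softmax into regions with very small gradients.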