Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper June 27, 2016

Gaussian Error Linear Units (GELUs)

The 2016 Hendrycks-Gimpel paper introducing the GELU activation function, the smooth nonlinearity used in BERT, GPT, and most modern Transformers.

paper July 1, 2016

Deep Learning with Differential Privacy (DP-SGD)

Introduced DP-SGD, a way to train neural networks with a provable privacy guarantee by clipping each example's gradient and adding noise before the update.

paper July 3, 2016

node2vec: Scalable Feature Learning for Networks

node2vec learns node embeddings using biased random walks that flexibly balance local and global graph structure.

paper July 15, 2016

FastText: Enriching Word Vectors with Subword Information

Word embeddings built from character n-grams, giving vectors to rare and unseen words and capturing word morphology.

paper July 21, 2016

Layer Normalization

The 2016 Ba-Kiros-Hinton paper introducing layer normalization, the per-example normalization that became standard inside nearly every Transformer.

paper July 21, 2016

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

A study showing that word embeddings learned from news text encoded gender stereotypes, plus a method to reduce them.

paper August 7, 2016

Back-Translation: Improving NMT with Monolingual Data (Sennrich 2016)

The 2016 back-translation paper showed that translating target-language monolingual text back into the source language yields cheap synthetic training pairs that improve neural translation.

paper August 16, 2016

Towards Evaluating the Robustness of Neural Networks (C&W Attack)

The 2016 Carlini and Wagner paper introducing the C&W attacks, which broke defensive distillation and set the bar for evaluating defenses.

paper August 25, 2016

Densely Connected Convolutional Networks (DenseNet)

The 2016 paper that connected every layer to every later layer within a dense block, improving gradient flow and feature reuse while cutting parameter counts.

paper September 9, 2016

Semi-Supervised Classification with Graph Convolutional Networks

Kipf and Welling's GCN, a simple and scalable way to run convolution-like layers directly on graph data.

paper September 9, 2016

Stealing Machine Learning Models via Prediction APIs

A 2016 paper showing an attacker can copy a paid machine learning model with near-perfect fidelity just by querying its prediction API.

paper September 12, 2016

WaveNet: A Generative Model for Raw Audio

DeepMind's 2016 WaveNet generated raw audio sample by sample, jumping past the old vocoders and reshaping text-to-speech.

paper September 14, 2016

Toward an Integration of Deep Learning and Neuroscience

Marblestone, Wayne, and Kording argued the brain optimizes cost functions, and that deep learning offers a framework for understanding it.

paper September 15, 2016

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (SRGAN)

SRGAN, the 2016 paper that used a GAN and a perceptual loss to hallucinate photo-realistic texture at 4x upscaling.

paper September 19, 2016

Inherent Trade-Offs in the Fair Determination of Risk Scores

The paper proving that, in general, a risk score cannot satisfy several natural fairness conditions at once.

paper September 22, 2016

Using Deep Learning for Image-Based Plant Disease Detection

A 2016 paper trained a CNN on 54,306 leaf images to identify 26 crop diseases, reaching 99.35% accuracy in the lab.

paper October 7, 2016

Equality of Opportunity in Supervised Learning

The paper that introduced equalized odds and equal opportunity, two of the most widely used group-fairness criteria for classifiers.

paper October 7, 2016

Grad-CAM (Gradient-weighted Class Activation Mapping)

The 2016 Selvaraju et al. paper introducing Grad-CAM, which uses gradients to highlight the image regions a convolutional network relied on for a prediction.

paper October 18, 2016

Membership Inference Attacks Against Machine Learning Models

A 2016 paper showing an attacker can tell whether a specific record was in a model's training set using only black-box query access.

paper October 18, 2016

PATE: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Introduced PATE, a privacy method where many teacher models trained on private data vote, with noise, to teach a public student model.

paper October 26, 2016

Universal Adversarial Perturbations

A 2016 paper showing a single fixed perturbation can fool a vision classifier on most natural images at once, not just one chosen image.

paper November 5, 2016

Neural Architecture Search with Reinforcement Learning

Zoph and Le's 2016 paper that used a reinforcement-learning controller to automatically design neural network architectures.

paper November 8, 2016

Random Synaptic Feedback Weights Support Error Backpropagation

Lillicrap and colleagues showed that fixed random feedback weights can carry learning signals nearly as well as backpropagation, easing a biological objection.

paper November 16, 2016

Aggregated Residual Transformations (ResNeXt)

The 2016 paper that added cardinality - the number of parallel transformation paths - as a new dimension for scaling convolutional networks.

paper November 21, 2016

Image-to-Image Translation with Conditional Adversarial Networks (pix2pix)

pix2pix used a conditional GAN to turn sketches, maps, and labels into photo-like images with a single general-purpose method.

paper November 24, 2016

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (OpenPose)

OpenPose detected the 2D body poses of everyone in an image in real time, using Part Affinity Fields to link joints to people.

paper November 24, 2016

The Off-Switch Game

A 2016 paper showing that an AI kept uncertain about its true objective has an incentive to let humans switch it off rather than resist.

paper December 2, 2016

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

PointNet was among the first networks to learn directly on raw 3D point clouds, respecting that points have no inherent order.

paper December 9, 2016

Feature Pyramid Networks for Object Detection

The 2016 paper that built a multi-scale feature pyramid inside a network, letting detectors find small and large objects at little extra cost.

paper December 13, 2016

Deep learning for detecting diabetic retinopathy in retinal photographs (JAMA 2016)

Google's 2016 JAMA study trained a CNN on 128,175 retinal photos to detect diabetic retinopathy with over 90 percent sensitivity and specificity.

paper December 22, 2016

SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

SampleRNN generated raw audio one sample at a time with stacked recurrent networks, an early rival to WaveNet's approach.

paper January 23, 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

The 2017 Shazeer-led paper that made mixture-of-experts practical, routing each input to a few specialized sub-networks to reach 137 billion parameters.

paper January 26, 2017

Wasserstein GAN (WGAN)

WGAN reformulated GAN training around the Wasserstein distance, giving more stable training and a meaningful loss curve.

paper February 2, 2017

Procedural Content Generation via Machine Learning (PCGML)

The 2017 PCGML survey defines procedural content generation via machine learning as generating game content with models trained on existing content, such as game levels.

paper February 28, 2017

Billion-scale similarity search with GPUs (FAISS)

The 2017 paper behind FAISS, the open-source library that made nearest-neighbor search over a billion vectors practical by running it on GPUs.

paper March 2017

Robots and Jobs (Acemoglu and Restrepo)

Acemoglu and Restrepo found each additional industrial robot per thousand workers measurably lowered US local employment and wages.

paper March 20, 2017

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

OpenAI's 2017 paper trained a robot vision model entirely on randomized, non-realistic simulated images and transferred it to the real world, localizing objects to within about 1.5 cm.

paper March 20, 2017

Mask R-CNN

Mask R-CNN added a mask branch to Faster R-CNN, predicting a pixel-level outline for every detected object in one pass.

paper March 30, 2017

Unpaired Image-to-Image Translation with CycleGAN

CycleGAN translated images between domains without paired examples, using a cycle-consistency loss to preserve content.

paper April 4, 2017

Neural Message Passing for Quantum Chemistry (MPNN)

Gilmer and colleagues unified many graph networks into the Message Passing Neural Network framework for molecular prediction.

paper April 13, 2017

DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks

Amazon's DeepAR, a recurrent network that learns from many related time series at once and outputs probability distributions, not just point forecasts.

paper April 16, 2017

In-Datacenter Performance Analysis of a Tensor Processing Unit

Google's ISCA 2017 paper revealing the first TPU, a custom inference chip running in its datacenters since 2015.

paper April 17, 2017

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

The 2017 Google paper using depthwise separable convolutions to build small, fast vision models that run on phones and embedded devices.

paper April 29, 2017

Semi-supervised Sequence Tagging with Bidirectional LMs (TagLM)

Peters et al. add pretrained bidirectional language-model embeddings to taggers, a direct precursor to ELMo.

paper May 22, 2017

SHAP (SHapley Additive exPlanations)

The 2017 Lundberg-Lee paper that grounds feature-importance explanations in game theory using Shapley values.

paper May 23, 2017

The Marginal Value of Adaptive Gradient Methods in Machine Learning

The 2017 paper arguing that adaptive optimizers like Adam can generalize worse than plain SGD despite training faster.

paper June 7, 2017

Inductive Representation Learning on Large Graphs (GraphSAGE)

GraphSAGE learns to sample and aggregate neighbor features so embeddings generalize to nodes unseen during training.

paper June 12, 2017

Attention Is All You Need

The 2017 Google paper that introduced the Transformer architecture, the foundation of virtually all modern large language models.