Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper December 20, 2014

Explaining and Harnessing Adversarial Examples

The 2014 paper behind the famous panda example: it explained adversarial examples as a result of neural networks' linearity and introduced FGSM.

paper December 22, 2014

Adam: A Method for Stochastic Optimization

The 2014 paper introducing Adam, the adaptive optimizer that became the default for training deep neural networks.

paper December 31, 2014

Image Super-Resolution Using Deep Convolutional Networks (SRCNN)

SRCNN, the 2014 paper that first applied a deep convolutional network end-to-end to single-image super-resolution.

paper February 10, 2015

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Show, Attend and Tell added visual attention to image captioning, letting the model focus on the relevant image region as it generated each word.

paper February 11, 2015

Batch Normalization: Accelerating Deep Network Training

The 2015 paper introducing batch normalization, which let much deeper networks train far faster and more reliably.

paper February 19, 2015

Trust Region Policy Optimization (TRPO)

The 2015 TRPO paper made policy-gradient training stable with a trust region, paving the way for PPO and modern deep RL.

paper March 9, 2015

Distilling the Knowledge in a Neural Network

The 2015 Hinton-Vinyals-Dean paper that formalized knowledge distillation, training a small student model to mimic a large model's soft outputs.

paper April 2015

Rotation-invariant CNNs for galaxy morphology prediction

A 2015 paper showing that a rotation-invariant CNN could reproduce Galaxy Zoo's crowdsourced galaxy-shape classifications with over 99% accuracy on images where volunteers strongly agreed.

paper April 30, 2015

Fast R-CNN

The 2015 paper that sped up region-based object detection by running the convolutional network once per image and sharing features across regions.

paper May 3, 2015

Highway Networks

The 2015 Schmidhuber-lab paper that used learned gates to let gradients flow through very deep networks, a direct precursor to ResNet's skip connections.

paper May 3, 2015

VQA: Visual Question Answering

The 2015 VQA paper defined free-form visual question answering and released a dataset of about 0.25M images and 0.76M questions.

paper May 18, 2015

U-Net: Convolutional Networks for Biomedical Image Segmentation

The 2015 U-Net paper introduced an encoder-decoder with skip connections for image segmentation, now a backbone of diffusion image generators.

paper June 4, 2015

Faster R-CNN: Towards Real-Time Object Detection

The 2015 paper that built region proposals into the network itself, making accurate object detection nearly real-time and fully end-to-end.

paper June 8, 2015

Generalized Advantage Estimation (GAE)

The 2015 GAE paper introduced a variance-reduction technique for policy gradients that underpins methods like PPO.

paper June 8, 2015

You Only Look Once (YOLO)

The 2015 YOLO paper reframed object detection as a single regression pass, hitting 45 frames per second and making real-time detection mainstream.

paper June 17, 2015

FaceNet: A Unified Embedding for Face Recognition and Clustering

Google's 2015 FaceNet learned a face embedding with a triplet loss and reached 99.63% on the LFW benchmark.

paper June 22, 2015

BookCorpus and Aligning Books and Movies

The 2015 paper that introduced BookCorpus, a collection of free e-books that later quietly became training data for BERT, GPT, and many early language models.

paper August 2015

Why Are There Still So Many Jobs? (Autor)

David Autor's argument that automation both substitutes for and complements labor, which is why employment persists despite automation.

paper August 5, 2015

Listen, Attend and Spell

A 2015 Google paper that transcribed speech directly to characters using an attention-based encoder-decoder, with no separate phoneme or HMM stage.

paper August 17, 2015

Effective Approaches to Attention-based NMT (Luong Attention)

Introduces global and local attention for translation, with simpler scoring that became widely adopted.

paper August 31, 2015

Byte Pair Encoding for Neural Machine Translation

Adapts a compression algorithm to split rare words into subword units, making open-vocabulary translation practical.

paper September 2015

chrF: Character n-gram F-score for MT Evaluation

chrF is a 2015 translation metric that compares text at the character level, working better than BLEU for morphologically rich languages.

paper September 9, 2015

Continuous control with deep reinforcement learning (DDPG)

DeepMind's 2015 DDPG paper extended deep Q-learning to continuous action spaces, enabling RL control of simulated robots from pixels.

paper September 22, 2015

Double DQN

The 2015 Double DQN paper showed standard deep Q-learning overestimates values and fixed it with a simple decoupling trick.

paper October 12, 2015

Model Inversion Attacks that Exploit Confidence Information

A 2015 paper showing that an ML model's confidence scores can be used to reconstruct sensitive training inputs, even faces.

paper November 18, 2015

Prioritized Experience Replay

The 2015 prioritized replay paper showed RL agents learn faster by replaying important past experiences more often.

paper November 19, 2015

Unsupervised Representation Learning with Deep Convolutional GANs (DCGAN)

DCGAN gave GANs a stable convolutional architecture, making adversarial image generation practical and reproducible.

paper November 20, 2015

Dueling Network Architectures for Deep RL

The 2015 dueling network paper split a Q-network into separate value and advantage streams, improving Atari performance.

paper November 21, 2015

Session-based Recommendations with Recurrent Neural Networks (GRU4Rec)

Used recurrent neural networks to recommend the next item from a short browsing session alone.

paper December 7, 2015

Efficient and Robust Automated Machine Learning (auto-sklearn)

Feurer et al.'s 2015 NeurIPS paper introducing auto-sklearn, which automatically selects, configures, and ensembles ML pipelines.

paper December 7, 2015

Hidden Technical Debt in Machine Learning Systems

Google's influential 2015 NeurIPS paper arguing that the model is a small part of a real ML system and most cost is hidden maintenance debt.

paper December 8, 2015

SSD: Single Shot MultiBox Detector

The 2015 paper that detected objects in one network pass using default boxes across multiple feature scales, trading a little accuracy for a large gain in speed.

paper December 10, 2015

Deep Residual Learning for Image Recognition (ResNet)

The 2015 Microsoft paper that introduced residual connections, letting networks grow to hundreds of layers and winning that year's ImageNet contest.

paper February 4, 2016

Asynchronous Methods for Deep Reinforcement Learning (A3C)

DeepMind's 2016 A3C paper trained deep RL agents in parallel on CPUs, beating the state of the art on Atari in half the time.

paper February 8, 2016

Practical Black-Box Attacks against Machine Learning

A 2016 paper showing adversarial examples transfer: a local substitute model can fool a remote classifier you cannot see inside.

paper February 16, 2016

LIME (Local Interpretable Model-agnostic Explanations)

The 2016 paper by Ribeiro et al. that explains any classifier's individual predictions by fitting a simple model in a local neighborhood.

paper February 17, 2016

Communication-Efficient Learning of Deep Networks from Decentralized Data (FedAvg)

The original federated learning paper, introducing the FedAvg algorithm that trains a shared model without moving raw data off devices.

paper February 24, 2016

Group Equivariant Convolutional Networks (G-CNN)

Cohen and Welling extended convolution to respect rotations and reflections, not just translations, improving sample efficiency.

paper March 9, 2016

XGBoost: A Scalable Tree Boosting System

Chen and Guestrin's 2016 paper introducing XGBoost, the scalable gradient-boosting system that dominated tabular machine learning and Kaggle.

paper March 21, 2016

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Li et al.'s 2016 method that speeds up tuning by giving more compute to promising configurations and killing weak ones early.

paper March 30, 2016

HNSW: Hierarchical Navigable Small World Graphs

The graph-based nearest-neighbor algorithm that powers fast vector search in most vector databases.

paper May 9, 2016

Theano: A Python framework for fast computation of mathematical expressions

The 2016 paper documenting Theano, the pioneering Python math compiler from MILA that seeded the modern deep learning framework era.

paper May 27, 2016

TensorFlow: A System for Large-Scale Machine Learning

Google's 2016 systems paper describing TensorFlow's dataflow-graph architecture for running machine learning across CPUs, GPUs, and TPUs at cluster scale.

paper June 2016

Computational Imaging for VLBI Image Reconstruction (CHIRP)

The 2016 CHIRP paper introduced a computational-imaging method for reconstructing black hole images from sparse VLBI data.

paper June 2016

The Age of Em (Robin Hanson, 2016)

Robin Hanson's 2016 book forecasts an economy run by emulated human brains, copied and sped up at will.

paper June 19, 2016

CryptoNets: Applying Neural Networks to Encrypted Data

Showed a neural network can run on homomorphically encrypted inputs, making predictions a cloud server cannot read.

paper June 21, 2016

Concrete Problems in AI Safety

A 2016 paper that reframed AI safety around five concrete engineering problems in present-day machine learning systems rather than far-off speculation.

paper June 24, 2016

Wide & Deep Learning for Recommender Systems

Google's recommender that combines a memorizing linear model with a generalizing neural network.