Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper October 4, 2019

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

The 2019 Microsoft paper introducing ZeRO, the memory-partitioning technique behind DeepSpeed that made training models with 100B-plus parameters feasible.

paper October 8, 2019

Human Compatible (Stuart Russell, 2019)

Stuart Russell's 2019 book argues that AI should be rebuilt to be uncertain about human preferences rather than to optimize a fixed objective.

paper October 23, 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)

The 2019 Google paper introducing both T5, which casts every NLP task as text-in, text-out, and C4, the web corpus used to train it.

paper October 29, 2019

BART: Denoising Sequence-to-Sequence Pre-training

An encoder-decoder model pretrained by corrupting text and learning to reconstruct it, strong at summarization and generation.

paper November 5, 2019

On the Measure of Intelligence

François Chollet's 2019 paper defines intelligence as skill-acquisition efficiency and introduces the ARC benchmark.

paper November 5, 2019

XLM-R: Unsupervised Cross-lingual Representation Learning at Scale

Facebook's 2019 paper showed one Transformer trained on 100 languages can beat multilingual BERT without hurting per-language quality.

paper November 27, 2019

Music Source Separation in the Waveform Domain (Demucs)

Facebook AI's Demucs split a song into vocals, drums, bass, and other by working on the raw waveform instead of spectrograms.

paper December 3, 2019

Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2)

StyleGAN2 removed the blob-like artifacts of the original StyleGAN and set a new bar for photorealistic face synthesis.

paper December 3, 2019

Dreamer: Learning Behaviors by Latent Imagination

The 2019 Dreamer paper learned a world model from images and trained behaviors by imagining rollouts in its latent space.

paper December 3, 2019

PyTorch: An Imperative Style, High-Performance Deep Learning Library

The 2019 NeurIPS paper describing PyTorch's design, arguing that an imperative, Pythonic framework can be both easy to use and fast on GPUs.

paper December 4, 2019

Deep Double Descent: Where Bigger Models and More Data Hurt

The 2019 OpenAI paper showing that as models grow, test error can fall, rise again near the point where the model just fits the training data, and then fall once more, contradicting the classic bias-variance picture.

paper December 9, 2019

Machine Unlearning (SISA Training)

Introduced SISA, a training design that lets a model efficiently forget specific data without retraining from scratch.

paper December 19, 2019

Temporal Fusion Transformer

Lim and colleagues' attention-based forecasting model that handles mixed inputs and stays interpretable for multi-horizon time-series prediction.

paper January 14, 2020

DDSP: Differentiable Digital Signal Processing

Google Magenta's DDSP put classic synthesizer building blocks inside a neural network, enabling pitch and timbre control.

paper January 15, 2020

AlphaFold 1: Improved protein structure prediction using potentials from deep learning

The 2020 Nature paper describing the first AlphaFold, which won CASP13 by predicting protein shapes with deep-learned distance potentials.

paper January 23, 2020

Scaling Laws for Neural Language Models (Kaplan et al.)

The 2020 OpenAI paper that found language-model loss falls as a smooth power law in model size, data, and compute, with some trends spanning more than seven orders of magnitude.

paper February 10, 2020

REALM: Retrieval-Augmented Language Model Pre-Training

A Google paper that pre-trains a language model together with a document retriever, so the model learns to look things up instead of memorizing them.

paper February 20, 2020

Halicin: a deep learning approach to antibiotic discovery (Cell 2020)

MIT's 2020 Cell paper used a neural network to screen molecules and discover halicin, a structurally novel antibiotic effective against resistant bacteria.

paper March 6, 2020

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Google's 2020 paper that used evolutionary search to rediscover ML algorithms like neural nets and backpropagation from raw math operations.

paper March 19, 2020

NeRF: Representing Scenes as Neural Radiance Fields

The 2020 NeRF paper fit a small neural network to a set of posed photos of a scene and rendered photorealistic novel views from it.

paper March 23, 2020

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Replaces masked-word prediction with detecting fake tokens, learning far more efficiently than BERT.

paper April 10, 2020

Dense Passage Retrieval for Open-Domain Question Answering

Showed that learned dense embeddings can beat keyword search like BM25 at finding relevant passages.

paper April 10, 2020

Longformer: The Long-Document Transformer

A Transformer that replaces full attention with a sliding window plus global attention, so cost grows linearly with sequence length and long documents become tractable.

paper April 27, 2020

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

A retrieval model that keeps BERT's accuracy but precomputes document representations for fast search.

paper May 16, 2020

Conformer: Convolution-augmented Transformer for Speech Recognition

Google's 2020 Conformer combined convolution and self-attention to set new accuracy records on the LibriSpeech speech benchmark.

paper May 22, 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

The 2020 Lewis et al. paper that coined RAG, combining a seq2seq model with a learned retriever over a Wikipedia vector index.

paper May 26, 2020

End-to-End Object Detection with Transformers (DETR)

DETR treated object detection as direct set prediction with a transformer, dropping the hand-tuned anchors and NMS earlier detectors needed.

paper May 28, 2020

Language Models are Few-Shot Learners (GPT-3)

The 2020 OpenAI paper introducing GPT-3, a 175-billion-parameter model that performed new tasks from a few examples in its prompt, with no retraining.

paper June 5, 2020

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Separates word content and position into distinct vectors and was first to top the human baseline on SuperGLUE.

paper June 8, 2020

Conservative Q-Learning (CQL)

The 2020 CQL paper made offline RL reliable by learning a Q-function that lower-bounds true value, curbing overestimation.

paper June 8, 2020

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Microsoft's 2020 FastSpeech 2 dropped teacher-student distillation and conditioned on pitch, energy, and duration to improve the quality of fast, non-autoregressive TTS.

paper June 15, 2020

DreamCoder: Wake-Sleep Bayesian Program Learning

A program synthesis system that learns its own library of concepts while solving problems, guided by neural search.

paper June 18, 2020

SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks

An attention network for 3D point clouds and graphs whose predictions stay consistent under rotation and translation.

paper June 19, 2020

Denoising Diffusion Probabilistic Models (DDPM)

The 2020 DDPM paper made diffusion models work for high-quality image generation, setting a new FID record on CIFAR-10 and seeding the diffusion era.

paper June 20, 2020

wav2vec 2.0: Self-Supervised Learning of Speech Representations

Facebook AI's wav2vec 2.0 learned speech representations from raw unlabeled audio, then reached competitive recognition accuracy with as little as ten minutes of labeled data.

paper July 16, 2020

Hopfield Networks is All You Need

A modern continuous Hopfield network that stores exponentially many patterns and whose update rule equals Transformer attention.

paper July 28, 2020

Big Bird: Transformers for Longer Sequences

Google's sparse-attention transformer combining window, random, and global attention to handle sequences up to 8x longer than full attention allows on similar hardware.

paper September 2, 2020

Learning to Summarize from Human Feedback

The 2020 OpenAI paper applying RLHF to text summarization, showing models tuned on human preferences beat much larger supervised models.

paper September 7, 2020

Generative Language Modeling for Automated Theorem Proving (GPT-f)

Used a transformer to prove formal theorems in Metamath, producing the first deep-learning proofs adopted by a formal mathematics community.

paper September 18, 2020

COMET: A Neural Framework for MT Evaluation

COMET is a 2020 neural metric trained to predict human quality judgments of machine translations, correlating with them better than older word-overlap metrics such as BLEU.

paper September 30, 2020

Rethinking Attention with Performers

Performers approximate softmax attention in linear time and memory using random features, with provable accuracy.

paper October 5, 2020

DreamerV2: Mastering Atari with Discrete World Models

The 2020 DreamerV2 paper introduced the first agent to reach human-level Atari performance by learning behaviors entirely inside a learned world model.

paper October 12, 2020

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

HiFi-GAN turned mel-spectrograms into 22.05 kHz audio about 168 times faster than real time on a single GPU, at near-human quality.

paper October 18, 2020

Fourier Neural Operator for Parametric Partial Differential Equations

The 2020 paper that learned to solve whole families of PDEs by parameterizing an integral kernel in Fourier space, running far faster than classical solvers.

paper October 21, 2020

Learning quadrupedal locomotion over challenging terrain

A neural controller drove the ANYmal robot over mud, snow, rubble, and vegetation it never saw in training, using only proprioceptive sensing rather than cameras or lidar.

paper October 21, 2020

M2M-100: Beyond English-Centric Multilingual Machine Translation

Facebook's M2M-100 was the first single model to translate directly between any pair of 100 languages without routing through English.

paper October 22, 2020

An Image is Worth 16x16 Words (Vision Transformer)

The 2020 Google paper that applied the Transformer directly to image patches, showing ViT can match or beat CNNs given enough pretraining data.

paper October 22, 2020

mT5: A Massively Multilingual Text-to-Text Transformer

Google's mT5 extended the T5 text-to-text recipe to 101 languages, becoming a widely used open multilingual model.