Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper November 26, 2020

Score-Based Generative Modeling through Stochastic Differential Equations

Song and colleagues unified diffusion and score-based generative models under a single continuous-time stochastic differential equation framework.

paper December 14, 2020

Extracting Training Data from Large Language Models

A 2020 paper that recovered verbatim training text, including personal data, from GPT-2 just by querying it, showing that large language models memorize parts of their training data.

paper December 14, 2020

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Zhou and colleagues' Informer is a transformer variant with sparse attention that makes forecasting over very long sequences computationally practical.

paper January 1, 2021

Prefix-Tuning: Optimizing Continuous Prompts for Generation

A 2021 method that adapts a frozen language model by training only a small continuous prefix, tuning about 0.1 percent of parameters.

paper January 11, 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

The 2021 paper that simplified mixture-of-experts by routing each token to a single expert, scaling Transformers to a trillion parameters efficiently.

paper February 11, 2021

ALIGN: Scaling Up Visual and Vision-Language Representation Learning with Noisy Text Supervision

Google's ALIGN trained a contrastive dual-encoder on over a billion noisy image alt-text pairs, skipping the expensive curation step.

paper February 19, 2021

E(n) Equivariant Graph Neural Networks (EGNN)

EGNN is a graph network equivariant to rotations, translations, and reflections without costly higher-order representations.

paper March 2021

NSCAI Final Report (2021)

The 2021 report of the US National Security Commission on AI warned the country was not ready to defend or compete in the AI era.

paper March 2021

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

The paper that warned about the environmental and social costs of ever-larger language models, including the biases they encode, and helped trigger the high-profile firing of co-author Timnit Gebru from Google.

paper March 16, 2021

Moore's Law for Everything (Sam Altman, 2021)

Sam Altman's 2021 essay predicting AI will drive the cost of goods toward zero and proposing a wealth fund paid to every citizen.

paper March 18, 2021

DeepONet: learning nonlinear operators (Lu, Jin, Karniadakis)

The 2021 Nature Machine Intelligence paper introducing DeepONet, a network that learns operators between function spaces from data.

paper March 25, 2021

Swin Transformer: Hierarchical Vision Transformer

The 2021 Microsoft paper that gave vision transformers a hierarchy and shifted local windows, making them efficient general-purpose vision backbones.

paper April 7, 2021

Dynabench: Rethinking Benchmarking in NLP

A platform where human annotators write examples that fool current models, making benchmarks dynamic and harder to saturate.

paper April 18, 2021

The Power of Scale for Parameter-Efficient Prompt Tuning

A 2021 Google paper showing that learned soft prompts rival full fine-tuning as models grow into the billions of parameters.

paper April 20, 2021

RoFormer: Enhanced Transformer with Rotary Position Embedding (RoPE)

The 2021 Su et al. paper introducing rotary position embeddings, the position-encoding scheme now used in Llama, GPT-NeoX, and most modern LLMs.

paper April 27, 2021

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

Bronstein and colleagues frame CNNs, GNNs, and Transformers as instances of one geometric principle: building in symmetry.

paper June 2, 2021

Decision Transformer

The 2021 Decision Transformer paper recast reinforcement learning as sequence modeling, using a Transformer to predict actions conditioned on a target return.

paper June 3, 2021

Trajectory Transformer

The 2021 Trajectory Transformer paper modeled whole RL trajectories with a Transformer and used beam search as a planner.

paper June 14, 2021

HuBERT: Speech Representation Learning by Masked Prediction

HuBERT learned speech representations by clustering audio into pseudo-labels and predicting masked ones, a BERT-style recipe for sound.

paper July 7, 2021

Evaluating Large Language Models Trained on Code (Codex)

The 2021 OpenAI paper that introduced Codex and the HumanEval benchmark and described the model behind GitHub Copilot.

paper July 7, 2021

SoundStream: An End-to-End Neural Audio Codec

Google's 2021 SoundStream compressed speech and music at very low bitrates with a learned codec and residual vector quantization.

paper July 12, 2021

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Neural retriever that outputs sparse, term-based vectors, blending keyword-style exact matching with learned semantic expansion.

paper July 14, 2021

Deduplicating Training Data Makes Language Models Better

A 2021 study showing that web training corpora are full of duplicates, and that removing them cuts memorization tenfold and speeds up training.

paper July 15, 2021

RoseTTAFold: protein structure prediction with a three-track neural network

Baker lab's 2021 Science paper introduced RoseTTAFold, a three-track network that predicted protein structures and complexes nearly as well as AlphaFold 2.

paper July 28, 2021

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods

The 2021 Liu et al. survey that named and organized prompting as a new paradigm alongside pre-train-and-fine-tune.

paper August 24, 2021

Isaac Gym: GPU-based physics simulation for robot learning

NVIDIA's Isaac Gym ran robot physics and policy training together on the GPU, cutting RL training time by two to three orders of magnitude.

paper August 27, 2021

ALiBi: Train Short, Test Long: Attention with Linear Biases

Positional method that biases attention by token distance, letting a model handle longer inputs than it was trained on.

paper September 3, 2021

Finetuned Language Models Are Zero-Shot Learners (FLAN)

The 2021 Google paper that introduced instruction tuning, fine-tuning a model on many tasks phrased as instructions to boost zero-shot performance.

paper September 24, 2021

Learning to walk in minutes with massively parallel deep RL

By simulating thousands of robots at once on a single GPU, this work trained a quadruped walking policy in minutes instead of days.

paper October 2021

Tasks, Automation, and the Rise in US Wage Inequality

Acemoglu and Restrepo linked 50 to 70 percent of the change in the US wage structure since 1980 to automation displacing routine-task workers.

paper October 4, 2021

Protein complex prediction with AlphaFold-Multimer

A 2021 DeepMind preprint extending AlphaFold to predict how multiple protein chains assemble into complexes.

paper October 30, 2021

EfficientZero

The 2021 EfficientZero paper exceeded human-level performance on Atari with only two hours of gameplay, a huge gain in sample efficiency.

paper October 31, 2021

Efficiently Modeling Long Sequences with Structured State Spaces (S4)

S4 made state space models practical for very long sequences, outperforming Transformers on extreme long-range tasks.

paper November 11, 2021

Masked Autoencoders Are Scalable Vision Learners (MAE)

The 2021 Meta paper that pretrained vision transformers by masking most of an image and reconstructing it, a simple, scalable self-supervised recipe.

paper November 26, 2021

AI and the Everything in the Whole Wide World Benchmark

Argues that popular AI benchmarks lack the construct validity needed to stand in for general progress toward flexible AI.

paper December 1, 2021

Advancing mathematics by guiding human intuition with AI

A 2021 Nature paper showed machine learning could surface patterns that led mathematicians to new theorems in knot theory and representation theory.

paper December 2, 2021

ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction

Retrieval model that keeps a vector per token for fine-grained matching, then compresses them to make the approach storage-practical.

paper December 8, 2021

RETRO: Improving Language Models by Retrieving from Trillions of Tokens

DeepMind model that matches GPT-3 quality using a fraction of the parameters by retrieving from a 2-trillion-token database.

paper December 17, 2021

WebGPT: Browser-assisted Question-Answering with Human Feedback

OpenAI's 2021 paper teaching GPT-3 to search and browse the web to answer questions, an early ancestor of browser agents.

paper December 20, 2021

GLIDE: Text-Guided Diffusion for Image Generation and Editing

OpenAI's GLIDE showed text-guided diffusion could beat the original DALL-E and edit images from natural-language instructions.

paper December 20, 2021

High-Resolution Image Synthesis with Latent Diffusion Models

The 2021 paper that moved diffusion into a compressed latent space and became the architecture behind Stable Diffusion.

paper December 22, 2021

A Mathematical Framework for Transformer Circuits

The 2021 Anthropic paper that reverse-engineered small attention-only transformers and named induction heads, launching the circuits agenda.

paper January 6, 2022

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

The 2022 OpenAI paper documenting grokking, where a network that has long since memorized its training set suddenly generalizes after far more training.

paper January 10, 2022

A ConvNet for the 2020s (ConvNeXt)

The 2022 Meta paper that modernized a plain ResNet step by step until it matched vision transformers, showing convolutions were not obsolete.

paper January 16, 2022

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding (instant-ngp)

NVIDIA's 2022 instant-ngp paper cut neural radiance field training from hours to seconds with a multiresolution hash encoding.

paper January 28, 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

The 2022 Wei et al. paper showing that prompting an LLM to show its reasoning steps sharply improves its accuracy on arithmetic and reasoning tasks.

paper February 7, 2022

data2vec: A General Self-Supervised Framework for Speech, Vision and Language

Meta's 2022 data2vec used one self-supervised recipe across speech, images, and text: predicting, from a masked view of the input, the latent representations that an averaged copy of the model produces for the full input.

paper February 7, 2022

Red Teaming Language Models with Language Models

The 2022 DeepMind paper that used one language model to automatically generate adversarial prompts and surface harmful behavior in another.