Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper March 8, 2022

In-context Learning and Induction Heads

The 2022 Anthropic paper identifying induction heads, an attention circuit that appears to drive in-context learning in transformers.

paper May 26, 2022

Matryoshka Representation Learning

Training method that packs coarse-to-fine detail into one embedding, so you can truncate it to shorter vectors without retraining.
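
A minimal sketch of the truncation step, assuming the embedding came from a model trained with the Matryoshka objective (ordinary embeddings generally do not survive this). The function names and the 8-dimensional vectors are hypothetical, made up for illustration; re-normalizing after truncation is a common convention, not something stated above.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix unchanged rather than dividing by zero.
    return head if norm == 0.0 else [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones would come from a Matryoshka-trained model.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.1, 0.0, 0.02]
doc = [0.8, 0.2, 0.25, 0.1, 0.1, 0.0, 0.05, 0.01]

# Compare the same pair at several prefix lengths, with no retraining in between.
for dim in (2, 4, 8):
    score = cosine(truncate_embedding(query, dim), truncate_embedding(doc, dim))
    print(dim, round(score, 3))
```

Shorter prefixes trade some accuracy for smaller indexes and faster search; how much accuracy depends on the model.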

paper July 26, 2022

Classifier-Free Diffusion Guidance

Classifier-free guidance lets diffusion models follow a conditioning signal, such as a text prompt, more closely without needing a separate classifier.
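
A rough sketch of the combination step at sampling time, assuming the model has already produced a conditional and an unconditional noise prediction for the same input. The function name and the toy vectors are hypothetical; the formula is the (1 + w) weighting from the paper, with w as the guidance strength.

```python
def guided_noise(eps_cond, eps_uncond, w):
    """Blend conditional and unconditional noise predictions.

    w = 0 returns the plain conditional prediction; larger w pushes the
    result further away from the unconditional one.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


# Toy two-element "noise predictions", stand-ins for real model outputs.
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 1.0]

print(guided_noise(eps_cond, eps_uncond, 0.0))
print(guided_noise(eps_cond, eps_uncond, 1.0))
```

Both predictions come from one network, trained with the conditioning randomly dropped, which is why no separate classifier is involved.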

paper September 12, 2022

FP8 Formats for Deep Learning

NVIDIA, Arm, and Intel jointly propose two 8-bit floating-point formats for deep learning, E4M3 and E5M2, the FP8 precision supported by NVIDIA's Hopper and Blackwell GPUs.

paper September 14, 2022

Toy Models of Superposition

The 2022 Anthropic paper showing neural networks pack more features than they have neurons by storing them in superposition.

paper October 26, 2022

Broken Neural Scaling Laws

The 2022 paper proposing a smoothly broken power law that fits and extrapolates scaling behavior, including double descent and sharp jumps.

paper November 18, 2022

PAL: Program-Aided Language Models

The 2022 Gao et al. paper that has a language model write code as its reasoning steps and offloads the calculation to a Python interpreter.
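
A toy sketch of the offloading step, assuming the model has already returned a program as text. `GENERATED_PROGRAM` is a hand-written stand-in for model output, and `run_generated_program` and the `answer` variable convention are hypothetical names chosen for this illustration, not the paper's code.

```python
# Stand-in for what a language model might write for the question:
# "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many now?"
GENERATED_PROGRAM = """
tennis_balls = 5
bought_balls = 2 * 3
answer = tennis_balls + bought_balls
"""


def run_generated_program(source):
    """Execute model-written code and return the value it stores in `answer`.

    Emptying __builtins__ only limits accidents; it is NOT a security
    sandbox. Untrusted code needs real isolation (separate process, container).
    """
    namespace = {}
    exec(source, {"__builtins__": {}}, namespace)
    return namespace["answer"]


print(run_generated_program(GENERATED_PROGRAM))
```

The model is responsible only for setting up the steps; the arithmetic is done by the interpreter, which does not make calculation slips.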

paper January 24, 2023

A Watermark for Large Language Models

A 2023 paper that embeds a hidden, statistically detectable signal in LLM text so machine-generated output can be identified.
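
A simplified sketch of the detection side, assuming the green-list scheme the paper describes: the previous token seeds a pseudorandom split of the vocabulary, generation favors the "green" half, and a detector counts green tokens and computes a z-score. Seeding Python's `random.Random` directly with the token id, the function names, and the toy vocabulary size are assumptions made here for illustration.

```python
import math
import random


def green_list(prev_token, vocab_size, gamma=0.5):
    """Pseudorandom 'green' subset of the vocabulary, seeded by the previous token."""
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def count_green(tokens, vocab_size, gamma=0.5):
    """Count tokens that fall in the green list chosen by their predecessor."""
    return sum(
        tok in green_list(prev, vocab_size, gamma)
        for prev, tok in zip(tokens, tokens[1:])
    )


def z_score(green_count, scored_tokens, gamma=0.5):
    """Standard deviations above the green count expected from unwatermarked text."""
    expected = gamma * scored_tokens
    return (green_count - expected) / math.sqrt(scored_tokens * gamma * (1.0 - gamma))


# Toy "watermarked" sequence: always pick a green token (a hard watermark).
vocab_size = 50
tokens = [0]
for _ in range(20):
    tokens.append(min(green_list(tokens[-1], vocab_size)))

hits = count_green(tokens, vocab_size)
print(hits, round(z_score(hits, len(tokens) - 1), 2))
```

Unwatermarked text should land near a z-score of zero, while watermarked text scores far above it. The paper's soft variant nudges logits toward green tokens instead of forcing them.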