Landmark Papers

What the papers actually said, linked to the originals.

644 entries, all primary-sourced

paper January 26, 2023

MusicLM: Generating Music From Text

Google's 2023 MusicLM generated minutes of coherent music from a text caption and released MusicCaps, 5,500 music-text pairs captioned by expert musicians.

paper January 30, 2023

A deep-learning search for technosignatures from 820 nearby stars

A 2023 Breakthrough Listen study used a deep-learning model built around a beta-variational autoencoder to scan 820 stars for technosignatures, flagging eight signals of interest for follow-up observation.

paper January 30, 2023

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Encoders and LLMs

BLIP-2 bridged a frozen image encoder and a frozen LLM with a lightweight Querying Transformer (Q-Former), beating Flamingo-80B on zero-shot VQAv2 with 54x fewer trainable parameters.

paper February 8, 2023

Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models

A University of Chicago tool that adds near-invisible 'cloaks' to art so AI models cannot easily learn and copy an artist's style.

paper February 9, 2023

Performance of ChatGPT on USMLE (Kung et al., 2023)

A 2023 PLOS Digital Health study found ChatGPT scored at or near the passing threshold on all three steps of the US Medical Licensing Exam (USMLE) without any specialized training or reinforcement.

paper February 9, 2023

Toolformer: Language Models Can Teach Themselves to Use Tools

The 2023 Meta paper that taught a language model to decide on its own when to call APIs like a calculator or search engine.

paper February 10, 2023

Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)

ControlNet, the 2023 method that adds pose, edge, and depth control to a frozen diffusion model using zero-initialized convolutions.
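
The zero-convolution trick can be sketched in plain Python, assuming a 1x1 convolution is modeled as a per-channel linear map on a feature vector; `ZeroConv`, `controlled_block`, and the toy `frozen` block are hypothetical stand-ins, not ControlNet's code. Because both zero convolutions start with all-zero weights and biases, the combined block initially reproduces the frozen model's output exactly, so training starts from the pretrained behavior.

```python
from typing import Callable, List

Vector = List[float]


class ZeroConv:
    """A 1x1 convolution on a C-channel feature, modeled as a C x C linear map.
    Weights and bias start at zero, like ControlNet's zero convolutions."""

    def __init__(self, channels: int) -> None:
        self.weight = [[0.0] * channels for _ in range(channels)]
        self.bias = [0.0] * channels

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


def controlled_block(x: Vector, cond: Vector,
                     frozen_block: Callable[[Vector], Vector],
                     trainable_copy: Callable[[Vector], Vector],
                     zin: ZeroConv, zout: ZeroConv) -> Vector:
    y = frozen_block(x)  # pretrained path, weights never updated
    # The trainable copy sees the input plus the zero-convolved condition.
    h = trainable_copy([xi + ci for xi, ci in zip(x, zin(cond))])
    # A second zero convolution gates what the copy adds to the frozen output.
    return [yi + ri for yi, ri in zip(y, zout(h))]


def frozen(v: Vector) -> Vector:  # hypothetical pretrained block
    return [2.0 * a + 1.0 for a in v]


trainable = frozen  # the copy starts from the frozen weights
x, cond = [0.5, -1.0, 2.0], [9.0, 9.0, 9.0]
out = controlled_block(x, cond, frozen, trainable, ZeroConv(3), ZeroConv(3))
print(out == frozen(x))  # True: at initialization the condition has no effect yet
```

Once gradients make the zero-convolution weights nonzero, the condition starts steering the output without the pretrained path ever being overwritten.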

paper February 20, 2023

Poisoning Web-Scale Training Datasets Is Practical

A 2023 paper showing an attacker could poison popular web-scale datasets like LAION-400M for around $60 by exploiting expired domains.

paper February 21, 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models

Hyena replaces attention with long implicit convolutions and gating, reaching Transformer quality at subquadratic cost.

paper February 23, 2023

Not What You've Signed Up For: Indirect Prompt Injection

The 2023 Greshake paper that named indirect prompt injection: hiding malicious instructions in data an LLM later reads.

paper March 2, 2023

Consistency Models

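Consistency models learn to map any point on a diffusion trajectory straight back to its clean starting point, so they can generate images in a single step instead of diffusion's many sampling steps, with a few extra steps available to trade compute for quality.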
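
The one-step sampler can be sketched with a skip-connection parameterization f(x, t) = c_skip(t)·x + c_out(t)·F(x, t), where c_skip(ε) = 1 and c_out(ε) = 0, so f is exactly the identity at the smallest time ε. The constants (σ_data = 0.5, ε = 0.002, T = 80) follow the EDM-style setup as recalled here and should be treated as assumptions; `toy_network` is a hypothetical placeholder for a trained network.

```python
import math
import random
from typing import Callable, List

SIGMA_DATA = 0.5  # assumed data standard deviation
EPS = 0.002       # assumed smallest time step epsilon
T_MAX = 80.0      # assumed largest noise level T


def c_skip(t: float) -> float:
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)


def c_out(t: float) -> float:
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA ** 2 + t ** 2)


def consistency_fn(x: List[float], t: float,
                   network: Callable[[List[float], float], List[float]]) -> List[float]:
    # The skip term dominates near t = EPS, so f(x, EPS) == x by construction.
    fx = network(x, t)
    return [c_skip(t) * xi + c_out(t) * fi for xi, fi in zip(x, fx)]


def sample_one_step(dim: int, network, rng: random.Random) -> List[float]:
    # Draw pure noise at the largest time T and map it to data in a single call.
    x_T = [rng.gauss(0.0, T_MAX) for _ in range(dim)]
    return consistency_fn(x_T, T_MAX, network)


def toy_network(x: List[float], t: float) -> List[float]:  # hypothetical stand-in
    return [-xi / (1.0 + t) for xi in x]


x = [0.3, -1.2]
print(consistency_fn(x, EPS, toy_network) == x)  # True: identity at t = EPS
print(len(sample_one_step(2, toy_network, random.Random(0))))  # 2
```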

paper March 6, 2023

PaLM-E: An Embodied Multimodal Language Model

PaLM-E injected images and robot sensor data into a 562B-parameter language model, planning robot tasks while setting a state of the art on the OK-VQA visual question answering benchmark.

paper March 7, 2023

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Diffusion Policy generated robot actions with a denoising diffusion process, beating prior methods by 46.9 percent on average.

paper March 15, 2023

GPT-4 Passes the Bar Exam (Katz et al., 2023)

A study reported that GPT-4 scored well enough on the Uniform Bar Exam to pass in every US jurisdiction that uses it.

paper March 16, 2023

Evolutionary-scale prediction of atomic-level protein structure with a language model (ESMFold)

Meta's 2023 Science paper scaled protein language models up to 15 billion parameters and built ESMFold on them, predicting structures fast enough to map over 600 million metagenomic proteins.

paper March 17, 2023

GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models

An OpenAI-led paper estimated about 80% of US workers could have at least 10% of their tasks affected by LLMs.

paper March 20, 2023

Reflexion: Language Agents with Verbal Reinforcement Learning

A 2023 paper that let agents learn from written self-reflection stored in memory rather than weight updates, hitting 91% on HumanEval.

paper March 22, 2023

Catalyzing Next-Generation Artificial Intelligence through NeuroAI

A 27-author manifesto arguing that progress in AI should draw on neuroscience, proposing an embodied Turing test as a benchmark.

paper March 22, 2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4

The 2023 Microsoft Research paper arguing an early GPT-4 showed sparks of general intelligence, prompting a fierce debate.

paper March 30, 2023

BloombergGPT: A Large Language Model for Finance

Bloomberg's 50-billion-parameter language model, trained on a 363-billion-token financial corpus plus 345 billion tokens of general-purpose data.

paper March 30, 2023

Self-Refine: Iterative Refinement with Self-Feedback

The 2023 Madaan et al. paper where a model critiques and improves its own output across iterations, with no extra training.
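
The loop can be sketched as follows, assuming the three roles (generate, give feedback, refine) are separate prompts to the same model. Here they are hypothetical Python callables, and the stop signal is an assumed convention in which `feedback` returns `None` when it finds nothing left to fix.

```python
from typing import Callable, Optional


def self_refine(task: str,
                generate: Callable[[str], str],
                feedback: Callable[[str, str], Optional[str]],
                refine: Callable[[str, str, str], str],
                max_iters: int = 4) -> str:
    output = generate(task)  # initial draft
    for _ in range(max_iters):
        critique = feedback(task, output)  # the same model critiques its own draft
        if critique is None:  # assumed stop signal: no remaining issues
            break
        output = refine(task, output, critique)  # revise using the critique
    return output


# Hypothetical toy roles standing in for LLM prompts.
def gen(task: str) -> str:
    return "hi"


def fb(task: str, out: str) -> Optional[str]:
    return "add more detail" if len(out) < 6 else None


def ref(task: str, out: str, crit: str) -> str:
    return out + " there"


print(self_refine("greet", gen, fb, ref))  # hi there
```

No weights change anywhere in the loop; all improvement comes from feeding the model's own critique back into the next prompt.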

paper March 31, 2023

CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society

A 2023 KAUST paper that paired role-playing LLM agents and let them cooperate autonomously through inception prompting.

paper April 4, 2023

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning

Google's paper on TPU v4, which uses optical circuit switches to rewire 4,096 chips into a reconfigurable AI supercomputer.

paper April 5, 2023

Segment Anything (SAM)

Meta's 2023 Segment Anything introduced a promptable segmentation model and SA-1B, a dataset of over 1 billion masks on 11 million images.

paper April 7, 2023

Generative Agents: Interactive Simulacra of Human Behavior

The 2023 Stanford and Google paper that put 25 LLM-driven characters in a sandbox town and watched them organize a Valentine's Day party after one agent was seeded with the idea.

paper April 12, 2023

AI-Descartes: combining data and theory for derivable scientific discovery

A 2023 paper pairing symbolic regression with logical reasoning to find equations that fit data and follow from background theory.

paper April 14, 2023

DINOv2: Learning Robust Visual Features without Supervision

Meta's DINOv2 trained a billion-parameter vision transformer with no labels, producing general features that work across tasks without fine-tuning.

paper April 17, 2023

Visual Instruction Tuning (LLaVA)

LLaVA connected a CLIP vision encoder to an LLM and trained it on GPT-4-generated image instructions to build an open visual assistant.

paper April 21, 2023

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Meta's 2023 paper on Fully Sharded Data Parallel, which shards parameters, gradients, and optimizer states across workers to train far larger models.
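
The core sharding step can be sketched in plain Python: flatten a layer's parameters, pad the flat list so it divides evenly by the number of workers, and give each rank one contiguous chunk, with an all-gather rebuilding the full parameters before compute. The function names are hypothetical, the lists stand in for tensors, and real FSDP performs these steps with `torch.distributed` collectives rather than local list slicing.

```python
from typing import List


def shard_flat_params(params: List[List[float]], world_size: int) -> List[List[float]]:
    """Flatten, pad to a multiple of world_size, and split into equal contiguous shards."""
    flat = [p for tensor in params for p in tensor]
    pad = (-len(flat)) % world_size  # zeros needed for an even split
    flat += [0.0] * pad
    size = len(flat) // world_size
    return [flat[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    """Simulated all-gather: concatenate every rank's shard and drop the padding."""
    return [p for shard in shards for p in shard][:numel]


params = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # two "tensors", 5 elements total
shards = shard_flat_params(params, world_size=4)
print(shards)                 # [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [0.0, 0.0]]
print(all_gather(shards, 5))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Each rank persistently stores only its own shard; the same split applies to gradients and optimizer state, which is where most of the memory savings come from.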

paper April 24, 2023

Generative AI at Work

An early large-scale field study of generative AI at work, covering 5,179 customer-support agents, found a 14% average gain in issues resolved per hour, with the largest gains (34%) among novice and low-skilled workers.

paper April 28, 2023

Are Emergent Abilities of Large Language Models a Mirage?

The 2023 Schaeffer et al. paper, a NeurIPS 2023 Outstanding Paper, arguing that apparent emergent abilities of LLMs may be artifacts of the researcher's chosen metric rather than of the model.
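
The core arithmetic of the argument fits in a few lines: if per-token accuracy p improves smoothly with scale, an exact-match metric over an L-token answer behaves roughly like p^L, which stays near zero and then shoots up. Treating token errors as independent is a simplifying assumption, and the accuracy values below are illustrative, not data from the paper.

```python
def exact_match(per_token_acc: float, answer_len: int) -> float:
    # Every one of the answer_len tokens must be right; assumes independent errors.
    return per_token_acc ** answer_len


# Per-token accuracy rising in equal steps across hypothetical model sizes.
for p in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    print(f"per-token {p:.1f} -> exact match {exact_match(p, 10):.4f}")
```

Equal steps in p turn into a flat-then-sudden curve under exact match, while the same models scored per token improve steadily.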

paper May 9, 2023

ImageBind: One Embedding Space To Bind Them All

Meta's ImageBind learned a single joint embedding across six modalities (images, text, audio, depth, thermal, and IMU data), using only image-paired data to bind them together.

paper May 17, 2023

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

The 2023 Yao et al. framework letting models search over branching reasoning steps with lookahead and backtracking.
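
A breadth-first variant of the search can be sketched as a beam over partial solutions: expand each kept state into candidate next thoughts, score them, keep the best few, and repeat, so weak branches are dropped in favor of stronger ones. In the paper both proposals and evaluations come from prompting the LLM; here `propose` and `evaluate` are hypothetical callables, and the toy task (spelling a target word one letter at a time) is invented for illustration.

```python
from typing import Callable, List


def tot_bfs(root: str,
            propose: Callable[[str], List[str]],
            evaluate: Callable[[str], float],
            steps: int, beam: int) -> str:
    frontier = [root]
    for _ in range(steps):
        # Expand every kept state into candidate next thoughts.
        candidates = [c for state in frontier for c in propose(state)]
        # Score candidates and keep only the `beam` most promising ones.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:beam]
    return max(frontier, key=evaluate)


TARGET = "tree"


def propose(state: str) -> List[str]:  # hypothetical thought generator
    return [state + ch for ch in "rtex"]


def evaluate(state: str) -> float:  # hypothetical value heuristic
    return float(sum(a == b for a, b in zip(state, TARGET)))


print(tot_bfs("", propose, evaluate, steps=4, beam=2))  # tree
```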

paper May 18, 2023

Re-evaluating GPT-4's Bar Exam Performance (Martinez, 2023)

A follow-up study argued GPT-4's famous near-90th-percentile bar exam result was overstated and closer to the median.

paper May 20, 2023

Glot500: Scaling Multilingual Models to 500 Languages

Glot500 is a 2023 model and corpus covering 511 mostly low-resource languages, pushing multilingual NLP well past the roughly 100 languages of models like XLM-R.

paper May 22, 2023

GQA: Training Generalized Multi-Query Transformer Models (Grouped-Query Attention)

The 2023 Google paper introducing grouped-query attention, in which each group of query heads shares one key-value head, shrinking the KV cache and speeding up decoding while keeping quality close to full multi-head attention.
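
Two pieces of GQA bookkeeping can be sketched directly: which key-value head each query head reads from, and how the KV cache shrinks. With H query heads and G key-value heads, each group of H/G consecutive query heads shares one key-value head; G = H recovers multi-head attention and G = 1 recovers multi-query attention. The model shape below (32 layers, 32 heads, head size 128, 4,096 tokens, 2-byte values) is hypothetical.

```python
def kv_head_for(query_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    # Consecutive query heads form a group that shares one key-value head.
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size


def kv_cache_bytes(layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    # Keys and values (factor 2) are cached per layer, KV head, and position.
    return 2 * layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem


H, LAYERS, HEAD_DIM, SEQ = 32, 32, 128, 4096  # hypothetical model shape
print([kv_head_for(h, H, 8) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
for g in (H, 8, 1):  # multi-head, 8-group GQA, multi-query
    print(g, kv_cache_bytes(LAYERS, g, HEAD_DIM, SEQ) / 2**30, "GiB")  # 2.0, 0.5, 0.0625
```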

paper May 22, 2023

RWKV: Reinventing RNNs for the Transformer Era

RWKV trains in parallel like a Transformer but runs inference like an RNN, with constant memory and compute per token.
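
The dual behavior can be checked on a single channel of RWKV's WKV operator. The same outputs come from a parallel form, where each position is computed from the raw inputs (so positions can be processed independently during training), and from a recurrent form that carries only two running sums (used at inference). This sketch follows the RWKV-4 formulation with per-channel decay w and current-token bonus u as recalled here; the numerical-stability trick used in real implementations is omitted, and the inputs are made up.

```python
import math
from typing import List


def wkv_parallel(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    # Training style: each output is a weighted average over its whole prefix.
    out = []
    for t in range(len(k)):
        num = sum(math.exp(-(t - 1 - i) * w + k[i]) * v[i] for i in range(t))
        den = sum(math.exp(-(t - 1 - i) * w + k[i]) for i in range(t))
        bonus = math.exp(u + k[t])  # extra weight for the current token
        out.append((num + bonus * v[t]) / (den + bonus))
    return out


def wkv_recurrent(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    # Inference style: two running sums (a, b) are the entire state.
    a, b, out = 0.0, 0.0, []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        a = math.exp(-w) * a + math.exp(kt) * vt  # decay old state, add current token
        b = math.exp(-w) * b + math.exp(kt)
    return out


k, v = [0.1, -0.4, 0.7, 0.2], [1.0, 2.0, -1.0, 0.5]
par = wkv_parallel(k, v, w=0.5, u=0.3)
rec = wkv_recurrent(k, v, w=0.5, u=0.3)
print(all(abs(p - r) < 1e-9 for p, r in zip(par, rec)))  # True
```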

paper May 23, 2023

QLoRA: Efficient Finetuning of Quantized LLMs

The 2023 paper showing a 65B-parameter model could be finetuned on a single 48GB GPU by training small LoRA adapters on top of a frozen 4-bit model, putting large-model tuning within much wider reach.
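
The two ingredients can be sketched together: a frozen weight matrix stored as 4-bit integer codes with one scale per block, plus a trainable low-rank update B·A whose B starts at zero, so finetuning begins exactly from the quantized model's behavior. This is a simplification: QLoRA uses the NormalFloat4 (NF4) data type with double quantization, whereas this sketch substitutes plain symmetric absmax int4, and the block size, rank, and alpha are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int4(row: List[float], block: int = 4) -> Tuple[List[int], List[float]]:
    # Symmetric absmax quantization per block to integers in [-7, 7] (not NF4).
    codes, scales = [], []
    for s in range(0, len(row), block):
        chunk = row[s:s + block]
        scale = max(abs(x) for x in chunk) / 7 or 1.0
        scales.append(scale)
        codes += [round(x / scale) for x in chunk]
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], block: int = 4) -> List[float]:
    return [c * scales[i // block] for i, c in enumerate(codes)]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


class QLoRALinear:
    def __init__(self, weight: Matrix, rank: int = 2, alpha: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.q = [quantize_int4(row) for row in weight]  # frozen 4-bit base weights
        d_out, d_in = len(weight), len(weight[0])
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
        self.B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero-initialized
        self.scaling = alpha / rank

    def base(self, x: List[float]) -> List[float]:
        # Dequantize on the fly, then apply the frozen weights.
        return matvec([dequantize_int4(c, s) for c, s in self.q], x)

    def __call__(self, x: List[float]) -> List[float]:
        lora = matvec(self.B, matvec(self.A, x))  # low-rank update B(Ax)
        return [b + self.scaling * l for b, l in zip(self.base(x), lora)]


W = [[0.5, -0.25, 0.1, 0.9], [-0.3, 0.8, 0.05, -0.6]]
layer = QLoRALinear(W)
x = [1.0, 2.0, -1.0, 0.5]
print(layer(x) == layer.base(x))  # True: B starts at zero
```

Only A and B would receive gradients; the 4-bit codes stay frozen, which is what keeps memory low enough for a single GPU.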

paper May 23, 2023

ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models

A 2023 paper that plans all of an agent's tool calls up front instead of interleaving them, cutting token use roughly fivefold on HotpotQA.
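
The decoupling can be sketched as a planner that writes every step up front, using placeholders such as `#E1` for evidence not yet fetched, and a worker that runs the tools in order, substituting earlier results into later calls. The plan, tool name, and lookup table below are hypothetical, and the final solver step (an LLM call over the plan plus evidence) is left out; in ReWOO the plan itself comes from one LLM call rather than a fresh prompt after every observation.

```python
import re
from typing import Callable, Dict, List, Tuple

Plan = List[Tuple[str, str, str]]  # (evidence variable, tool name, tool input)


def run_worker(plan: Plan, tools: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    evidence: Dict[str, str] = {}
    for var, tool, arg in plan:
        # Replace references like #E1 with evidence gathered by earlier steps.
        filled = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg)
        evidence[var] = tools[tool](filled)
    return evidence


# A plan written entirely before any tool runs (hypothetical example).
plan: Plan = [
    ("#E1", "Search", "director of Jaws"),
    ("#E2", "Search", "birth year of #E1"),
]
FACTS = {"director of Jaws": "Steven Spielberg",
         "birth year of Steven Spielberg": "1946"}
tools = {"Search": lambda q: FACTS.get(q, "unknown")}  # stand-in search tool

evidence = run_worker(plan, tools)
print(evidence["#E2"])  # 1946
```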

paper May 24, 2023

Gorilla: Large Language Model Connected with Massive APIs

A 2023 Berkeley paper that fine-tuned an open model to write accurate API calls and beat GPT-4 at the task.

paper May 25, 2023

Voyager: An Open-Ended Embodied Agent with Large Language Models

The 2023 paper in which a GPT-4-driven agent explored Minecraft on its own, writing and saving reusable code as skills.

paper May 29, 2023

Direct Preference Optimization (DPO)

The 2023 Stanford paper showing preference alignment can be done with a simple classification loss, skipping the reward model and RL of classic RLHF.
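
The loss can be written out for a single preference pair. It takes the summed log-probabilities of the chosen response y_w and the rejected response y_l under the policy being trained and under a frozen reference model, then applies a logistic loss to the difference of their log-ratios scaled by β: L = -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). The function name and the example numbers are illustrative.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair; inputs are summed log-probabilities of each response."""
    chosen_logratio = policy_chosen - ref_chosen        # implicit reward of y_w (up to beta)
    rejected_logratio = policy_rejected - ref_rejected  # implicit reward of y_l
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is equal to log(1 + exp(-margin)).
    return math.log1p(math.exp(-margin))


print(dpo_loss(-10.0, -12.0, -10.0, -12.0))  # log(2), about 0.6931: policy equals reference
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))   # about 0.513: y_w more likely, y_l less
```

Minimizing this is ordinary binary classification on the margin, so no separate reward model or RL rollout is needed.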

paper May 29, 2023

Faith and Fate: Limits of Transformers on Compositionality

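The 2023 Dziri et al. paper arguing that transformers solve multi-step tasks such as multi-digit multiplication and logic puzzles by matching computation subpatterns seen in training, rather than by systematic compositional reasoning.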

paper May 31, 2023

Let's Verify Step by Step (process supervision)

The 2023 OpenAI paper showing step-by-step process supervision beats outcome supervision, and releasing PRM800K, a dataset of 800,000 step-level human feedback labels.
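
Process supervision is typically put to work by reranking: sample many solutions, score each with the process reward model (PRM), and keep the best. The paper scores a solution as the probability that every step is correct, computed as the product of the PRM's per-step correctness probabilities; the step probabilities and candidate names below are made up.

```python
import math
from typing import Dict, List


def prm_score(step_probs: List[float]) -> float:
    # Product of per-step correctness probabilities, used as the solution score.
    return math.prod(step_probs)


def best_of_n(candidates: Dict[str, List[float]]) -> str:
    # Pick the candidate solution whose steps the PRM trusts most.
    return max(candidates, key=lambda name: prm_score(candidates[name]))


candidates = {
    "solution A": [0.99, 0.95, 0.40, 0.97],  # one doubtful step drags the score down
    "solution B": [0.90, 0.92, 0.91, 0.93],
}
print(best_of_n(candidates))  # solution B
```

An outcome-only reward model would see just the final answer; scoring each step lets a single flawed step sink an otherwise confident-looking solution.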

paper June 1, 2023

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

A 2023 quantization method that uses activation statistics to find the roughly 1% of salient weights and protects them with per-channel scaling rather than mixed precision, compressing LLMs to 4 bits accurately.

paper June 2023

The RefinedWeb Dataset for Falcon LLM

The 2023 TII paper arguing that filtered, deduplicated web data alone can train models that beat ones trained on curated corpora like The Pile.

paper June 8, 2023

Simple and Controllable Music Generation (MusicGen)

Meta's 2023 MusicGen generated music from text or a melody with a single-stage transformer and was released as open source in the AudioCraft library.

paper June 9, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

The 2023 paper that introduced MT-Bench and showed GPT-4 can grade other chatbots with over 80% agreement with human preferences, about the same rate at which human raters agree with each other.