Survey of Hallucination in Natural Language Generation
The 2022 Ji et al. survey that organized the study of hallucination across NLP tasks and became a standard reference.
What the papers actually said - linked to the originals.
A 2022 Nature paper by DeepMind and EPFL used reinforcement learning to control the magnetic coils of a real tokamak and sculpt fusion plasma shapes.
The 2022 OpenAI paper behind InstructGPT, showing a 1.3B model tuned with human feedback was preferred over the 175B GPT-3 - the recipe behind ChatGPT.
The 2022 Anthropic paper identifying induction heads, an attention circuit that appears to drive in-context learning in transformers.
The 2022 Wang et al. method that samples many reasoning paths and takes a majority vote on the final answer.
SayCan paired a language model's task knowledge with robot skill value functions so a robot only attempts steps it can actually do.
The 2022 OpenAI paper behind DALL-E 2, generating images by inverting CLIP embeddings through a prior and a diffusion decoder.
DeepMind's Flamingo bridged frozen vision and language models so one model handled new image tasks from a few examples.
A 2022 Microsoft TTS system (NaturalSpeech) that, on the LJSpeech benchmark, produced speech statistically indistinguishable from human recordings.
The 2022 Zhou et al. paper that solves hard problems by decomposing them into simpler subproblems solved in sequence.
Google's 2022 Imagen paper showed a frozen text-only language model is a surprisingly strong text encoder for image generation.
The 2022 Kojima et al. paper showing the phrase 'Let's think step by step' triggers multi-step reasoning with zero examples.
Showed LLMs can translate natural-language math into formal statements, perfectly converting about a quarter of competition problems.
Training method that packs coarse-to-fine detail into one embedding, so you can truncate it to shorter vectors without retraining.
The 2022 paper that sped up Transformer attention by minimizing GPU memory traffic rather than approximating, enabling far longer context windows.
The 2022 Wei et al. paper arguing that some LLM capabilities appear suddenly at scale, absent in small models and unpredictable from their trends.
Hollmann and colleagues' TabPFN, a pre-trained transformer that classifies small tabular datasets in seconds with no training or tuning.
OSDI 2022 system that introduced iteration-level scheduling (continuous batching) to serve generative transformers far more efficiently.
Grinsztajn, Oyallon, and Varoquaux's careful benchmark showing tree ensembles still beat deep learning on medium-sized tabular data, and explaining why.
Classifier-free guidance let diffusion models follow a text prompt more closely without needing a separate classifier.
A 2022 Nature Communications paper showing a GPT-style language model can generate realistic, novel protein sequences from scratch.
The 2022 paper behind bitsandbytes, halving LLM memory with 8-bit math while keeping full accuracy by isolating outlier features.
DreamBooth, the 2022 Google method that teaches a diffusion model a specific subject from just a few photos using a unique token.
Google's 2022 AudioLM treated audio as a string of tokens and generated speech and piano continuations by predicting the next one.
Rectified flow learns straight transport paths between noise and data, enabling fast, even single-step, generation.
NVIDIA, Arm, and Intel jointly proposed two 8-bit floating-point formats for AI (E4M3 and E5M2), the FP8 precision supported by NVIDIA's Hopper and Blackwell GPUs.
The 2022 Anthropic paper showing neural networks pack more features than they have neurons by storing them in superposition.
DreamFusion generated 3D objects from text by optimizing a NeRF against a frozen 2D image diffusion model, with no 3D data.
Meta's Make-A-Video generated video from text by learning appearance from image-text pairs and motion from unlabeled video.
Flow matching gave a simple, simulation-free way to train continuous normalizing flows, rivaling diffusion in quality.
The 2022 paper that interleaved reasoning traces with tool actions, a recipe that became the backbone of modern LLM agents.
Meta's 2022 EnCodec compressed audio into discrete tokens with a neural codec, becoming the token layer for later audio language models.
The 2022 paper proposing a smoothly broken power law that fits and extrapolates scaling behavior, including double descent and sharp jumps.
Method to compress large language model weights to 3-4 bits after training, letting big models run on a single GPU.
The 2022 Anthropic paper proposing the sandwiching method to study how humans can supervise AI on tasks the AI handles better than they do.
The 2022 Gao et al. paper that has a language model write code as its reasoning steps and offloads the calculation to a Python interpreter.
The 2022 Google paper introducing speculative decoding, which uses a small draft model to make a large model generate text 2-3x faster without changing its output distribution.
Microsoft embedding family trained with contrastive learning on web pairs; first to beat BM25 zero-shot on the BEIR benchmark.
Google's RT-1 trained a transformer on 130,000 robot demonstrations covering 700 tasks to control real robots at scale.
The 2022 Anthropic paper introducing Constitutional AI, which trains a harmless model using AI feedback guided by a written set of principles.
The 2022 Anthropic paper that used language models to write 154 test datasets, revealing sycophancy and goal-seeking that grow with scale and RLHF.
Peebles and Xie replaced the U-Net in diffusion models with a transformer, showing that more compute reliably lowers FID (better image quality).
The 2022 Gao et al. paper that retrieves documents by first having a model write a fake answer, then embedding and matching it.
DeepMind's 2022 paper on GraphCast, a graph neural network that forecasts global weather faster and more accurately than the leading conventional system.
Microsoft's 2023 VALL-E cloned a voice from a 3-second sample by treating text-to-speech as language modeling over codec tokens.
The 2023 DreamerV3 paper used one fixed configuration to beat specialized agents across 150+ tasks and mine Minecraft diamonds.
A 2023 report by OpenAI, Stanford, and Georgetown on how language models could change online propaganda and how to blunt it.
A 2023 paper that embeds a hidden, statistically detectable signal in LLM text so machine-generated output can be identified.
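As a rough illustration of the majority-vote idea in the Wang et al. self-consistency entry above, a minimal Python sketch might look like the following. The names `majority_vote`, `self_consistency`, and `sample_answer` are hypothetical, and the sampler is a stand-in for a real model call that returns the final answer of one sampled reasoning path; this is not the paper's code.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def majority_vote(answers: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most common non-None answer, or None if there is none."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return counts.most_common(1)[0][0]


def self_consistency(
    sample_answer: Callable[[], Optional[str]],  # hypothetical: one sampled path -> final answer
    n_samples: int = 5,
) -> Optional[str]:
    """Draw n_samples final answers from the sampler and take a majority vote."""
    return majority_vote(sample_answer() for _ in range(n_samples))


if __name__ == "__main__":
    # Assumption: canned answers stand in for five sampled reasoning paths.
    # None marks a path whose final answer could not be parsed.
    canned = iter(["18", "26", "18", None, "18"])
    print(self_consistency(lambda: next(canned), n_samples=5))
```

Unparseable paths are dropped before voting in this sketch; a real pipeline might instead resample them or weight votes by model likelihood.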