Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper March 23, 2018

Datasheets for Datasets

Gebru and colleagues' 2018 proposal that every dataset ship with a standard document recording its motivation, collection, and recommended uses.

paper April 8, 2018

DeepMimic: Example-Guided Deep RL of Physics-Based Character Skills

DeepMimic trains physically simulated characters to imitate motion-capture clips using deep reinforcement learning.

paper May 2, 2018

AI Safety via Debate

The 2018 OpenAI paper proposing that two AI agents debate to help a human judge decide questions too hard to evaluate directly.

paper May 9, 2018

Vector-based navigation using grid-like representations in artificial agents

DeepMind's 2018 Nature paper in which an AI navigation network spontaneously developed grid-cell-like codes resembling those in the mammalian brain.

paper June 18, 2018

The iNaturalist Species Classification and Detection Dataset

A 2018 CVPR paper released 859,000 citizen-science photos of 5,000+ species, a deliberately imbalanced real-world vision benchmark.

paper June 19, 2018

Neural Ordinary Differential Equations

Neural ODEs replace discrete network layers with continuous dynamics integrated by an ODE solver; the paper won a NeurIPS best paper award.

paper June 20, 2018

Neural Tangent Kernel: Convergence and Generalization

The 2018 paper showing infinitely wide neural networks behave like a fixed kernel method, opening a theory of training dynamics.

paper June 24, 2018

DARTS: Differentiable Architecture Search

The 2018 paper that made neural architecture search efficient by relaxing the discrete search into a continuous, gradient-trainable problem.

paper July 4, 2018

BOHB: Robust and Efficient Hyperparameter Optimization at Scale

Falkner, Klein, and Hutter's 2018 method that combines Bayesian optimization with Hyperband for fast, robust hyperparameter tuning.

paper July 9, 2018

Glow: Generative Flow with Invertible 1x1 Convolutions

Glow was a flow-based generative model with exact likelihoods that produced realistic, editable face images.

paper July 13, 2018

Tune: A Research Platform for Distributed Model Selection and Training

The 2018 paper introducing Ray Tune, a distributed framework that unifies many hyperparameter search algorithms behind one interface.

paper July 26, 2018

All-Optical Machine Learning Using Diffractive Deep Neural Networks

A 2018 Science paper from UCLA that built a neural network out of 3D-printed diffractive layers, classifying digits with light instead of electronics.

paper August 19, 2018

SentencePiece: Language-Independent Subword Tokenizer

A tokenizer that learns subword units directly from raw text, with no language-specific pre-tokenization needed.

paper August 29, 2018

Deep learning of aftershock patterns following large earthquakes

A 2018 Nature paper using a neural network to forecast where earthquake aftershocks occur, sparking a debate about deep learning in seismology.

paper September 7, 2018

Unity: A General Platform for Intelligent Agents (ML-Agents)

Unity releases ML-Agents, turning its game engine into a platform for training agents with reinforcement learning.

paper September 24, 2018

DeepVariant: a universal SNP and indel caller using deep neural networks

Google's 2018 Nature Biotechnology paper recast DNA variant calling as image classification, beating prior tools and generalizing across species.

paper September 28, 2018

Large Scale GAN Training for High Fidelity Image Synthesis (BigGAN)

BigGAN scaled GANs to large batches and models to generate high-fidelity, class-conditional ImageNet images.

paper October 1, 2018

How Powerful are Graph Neural Networks? (GIN)

This paper analyzed the expressive limits of graph networks and proposed GIN, a model as discriminating as the Weisfeiler-Lehman test.

paper October 11, 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

The 2018 Google paper introducing BERT, a Transformer that reads text in both directions and that set the pre-train-then-fine-tune template for NLP.

paper November 8, 2018

Federated Learning for Mobile Keyboard Prediction

Google's report on training Gboard's next-word prediction with federated learning, an early production deployment of the technique.

paper November 16, 2018

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism

Google's 2018 paper introducing pipeline parallelism, splitting a deep model across accelerators by layer and pipelining micro-batches for near-linear speedup.

paper December 12, 2018

A Style-Based Generator Architecture for GANs (StyleGAN)

NVIDIA's 2018 StyleGAN generated photorealistic faces of people who do not exist and powered the viral site thispersondoesnotexist.com.

paper 2019

OpenWebText: An Open Replication of GPT-2's WebText

The open-source recreation of OpenAI's unreleased WebText corpus, built from outbound Reddit links so outsiders could study a close stand-in for GPT-2's training data.

paper January 9, 2019

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Adds segment-level recurrence and relative positional encodings so Transformers can model dependencies beyond a fixed-length window.

paper February 2019

Physics-informed neural networks (Raissi, Perdikaris, Karniadakis)

The 2019 paper that trained neural networks to obey physical laws written as differential equations, founding the PINN field.

paper February 2, 2019

Parameter-Efficient Transfer Learning for NLP (Adapters)

The 2019 paper that introduced adapter modules, small trainable layers inserted into a frozen network to specialize it cheaply.

paper February 8, 2019

Certified Adversarial Robustness via Randomized Smoothing

A 2019 paper giving the first scalable way to certify, with a proof, that no small perturbation can change a classifier's prediction.

paper February 25, 2019

NAS-Bench-101: Towards Reproducible Neural Architecture Search

The first public benchmark dataset for neural architecture search, with precomputed results for over five million trained models.

paper March 25, 2019

A Survey of Code-switched Speech and Language Processing

A 2019 survey of how NLP handles code-switching, the everyday mixing of two languages in one sentence by bilingual speakers.

paper April 22, 2019

The Curious Case of Neural Text Degeneration (nucleus sampling)

The 2019 Holtzman et al. paper that introduced nucleus (top-p) sampling and explained why greedy and beam search produce dull, repetitive text.

paper May 22, 2019

FastSpeech: Fast, Robust and Controllable Text to Speech

A 2019 Microsoft and Zhejiang University model that generated speech spectrograms in parallel, far faster than autoregressive TTS.

paper May 24, 2019

N-BEATS: Neural Basis Expansion Analysis for Time Series Forecasting

Oreshkin and colleagues' deep neural forecaster that beat the M4 competition winner using only stacked fully-connected layers, no time-series machinery.

paper May 27, 2019

AI Feynman: a physics-inspired method for symbolic regression

A 2019 paper by Udrescu and Tegmark recovered symbolic physics formulas from data, rediscovering all 100 equations from Feynman's lectures.

paper May 28, 2019

EfficientNet: Rethinking Model Scaling

The 2019 Google paper that scaled network depth, width, and resolution together with one coefficient, reaching top ImageNet accuracy with far fewer parameters.

paper May 31, 2019

Deep Learning Recommendation Model (DLRM)

The open-sourced recommendation model from Facebook (now Meta) and the engineering needed to train its huge embedding tables.

paper June 5, 2019

Energy and Policy Considerations for Deep Learning in NLP

The 2019 paper by Strubell and colleagues that quantified the carbon cost of training NLP models, estimating a full architecture search at 626,155 lbs of CO2-equivalent.

paper June 5, 2019

Risks from Learned Optimization (Mesa-Optimization)

The 2019 paper that introduced mesa-optimization and inner alignment: the worry that a trained model becomes an optimizer with its own goal.

paper June 6, 2019

When Does Label Smoothing Help?

The 2019 study explaining why softening training targets improves accuracy and calibration but can hurt knowledge distillation.

paper June 19, 2019

XLNet: Generalized Autoregressive Pretraining

Learns bidirectional context by predicting words in all possible orders, avoiding BERT's masking mismatch.

paper July 2019

Emotional Expressions Reconsidered (Barrett et al.)

A 2019 review by Barrett and colleagues found facial movements are not a reliable readout of emotion, undercutting emotion-recognition AI.

paper July 25, 2019

Optuna: A Next-generation Hyperparameter Optimization Framework

The 2019 paper introducing Optuna, a define-by-run hyperparameter optimization framework with built-in pruning of poor trials.

paper July 26, 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Shows BERT was undertrained; with more data and longer training, the same architecture matches or beats the models published after it.

paper August 20, 2019

TabNet: Attentive Interpretable Tabular Learning

Arik and Pfister's TabNet, a deep learning architecture for tabular data that uses sequential attention to pick features and stay interpretable.

paper August 27, 2019

FinBERT: Financial Sentiment Analysis with Pre-trained Language Models

FinBERT adapted Google's BERT to financial text, beating prior methods at judging whether financial news reads as positive, negative, or neutral.

paper August 27, 2019

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Method that turns BERT into a fast producer of sentence vectors, making semantic search over large collections practical.

paper September 17, 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

NVIDIA's 2019 paper introducing tensor (intra-layer) model parallelism, a technique now standard for training the largest language models across many GPUs.

paper September 26, 2019

ALBERT: A Lite BERT

Cuts BERT's parameter count with factorized embeddings and weight sharing while keeping accuracy high.

paper October 2, 2019

DistilBERT: A Distilled Version of BERT

Knowledge distillation shrinks BERT by 40 percent and makes it 60 percent faster while keeping 97 percent of its language-understanding performance.