Datasheets for Datasets
Gebru and colleagues' 2018 proposal that every dataset ship with a standard document recording its motivation, collection, and recommended uses.
DeepMimic trains physics-based game characters to imitate motion-capture clips with deep reinforcement learning.
The 2018 OpenAI paper proposing that two AI agents debate to help a human judge decide questions too hard to evaluate directly.
DeepMind's 2018 Nature paper in which an AI navigation network spontaneously developed grid-cell-like codes resembling those in the mammalian brain.
A 2018 CVPR paper released 859,000 citizen-science photos of 5,000+ species, a deliberately imbalanced real-world vision benchmark.
Neural ODEs replace discrete network layers with continuous dynamics integrated by an ODE solver; the paper won a NeurIPS 2018 best paper award.
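Here is a minimal Python sketch of the layers-as-dynamics idea. It assumes a scalar state and a hand-written dynamics function `decay` (hypothetical; in the paper the dynamics are a trained neural network). It also uses a fixed-step fourth-order Runge-Kutta solver, whereas the paper uses adaptive solvers and trains through them with the adjoint method. The `residual_step` helper shows why a residual block can be read as one explicit Euler step of the same equation.

```python
import math
from typing import Callable

Dynamics = Callable[[float, float], float]  # f(t, h) -> dh/dt


def rk4_solve(f: Dynamics, h0: float, t0: float, t1: float, steps: int = 100) -> float:
    """Integrate dh/dt = f(t, h) from t0 to t1 with fixed-step fourth-order Runge-Kutta."""
    dt = (t1 - t0) / steps
    t, h = t0, h0
    for _ in range(steps):
        k1 = f(t, h)
        k2 = f(t + dt / 2, h + dt * k1 / 2)
        k3 = f(t + dt / 2, h + dt * k2 / 2)
        k4 = f(t + dt, h + dt * k3)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h


def residual_step(f: Dynamics, t: float, h: float, dt: float = 1.0) -> float:
    """A ResNet-style update h + f(h) is one explicit Euler step of the same ODE."""
    return h + dt * f(t, h)


def decay(t: float, h: float) -> float:
    """Hypothetical hand-written dynamics dh/dt = -h (exact solution h0 * exp(-t))."""
    return -h


# Continuous-depth "forward pass" from t=0 to t=1, next to the exact answer.
print(rk4_solve(decay, 1.0, 0.0, 1.0), math.exp(-1.0))
```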
The 2018 paper showing infinitely wide neural networks behave like a fixed kernel method, opening a theory of training dynamics.
The 2018 paper that made neural architecture search efficient by relaxing the discrete search into a continuous, gradient-trainable problem.
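Here is a minimal sketch of the continuous relaxation on a single edge. It assumes scalar inputs and three hypothetical stand-in operations. Each candidate's output is weighted by a softmax over architecture parameters `alphas`, which makes the edge differentiable in `alphas`. After search, the edge keeps only its highest-weighted operation. The bilevel optimization that DARTS uses (architecture parameters fitted on validation data, network weights on training data) is not shown.

```python
import math
from typing import Callable, Sequence

Op = Callable[[float], float]

# Hypothetical scalar stand-ins for the candidate operations on one edge
# (in DARTS these would be convolutions, pooling, skip connections, and "zero").
CANDIDATES: dict[str, Op] = {
    "identity": lambda x: x,
    "zero": lambda x: 0.0,
    "double": lambda x: 2.0 * x,
}


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: float, alphas: Sequence[float]) -> float:
    """Continuous relaxation: a softmax(alpha)-weighted sum of every candidate op,
    so the output is differentiable with respect to the architecture parameters."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATES.values()))


def discretize(alphas: Sequence[float]) -> str:
    """After search, keep only the candidate with the largest alpha on this edge."""
    names = list(CANDIDATES)
    return names[max(range(len(names)), key=lambda i: alphas[i])]
```

With equal `alphas` the edge outputs the average of its candidates. Raising one alpha smoothly shifts the output toward that operation, and that smooth shift is what gradient descent exploits.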
Falkner, Klein, and Hutter's 2018 method that combines Bayesian optimization with Hyperband for fast, robust hyperparameter tuning.
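The Hyperband half of BOHB is built from successive halving. The sketch below makes two simplifying assumptions: the configurations are learning rates on a fixed log-spaced grid, and `toy_loss` is a hypothetical stand-in for a real training run. Hyperband runs several such brackets, each trading off the number of configurations against the starting budget differently. BOHB changes how configurations are chosen: instead of sampling them at random, it draws them from kernel density estimates fitted to the best results seen so far.

```python
import math
from typing import Callable, Sequence


def successive_halving(configs: Sequence[float], evaluate: Callable[[float, int], float],
                       min_budget: int = 1, eta: int = 3) -> float:
    """Evaluate every config on a small budget, keep the best 1/eta,
    multiply the budget by eta, and repeat until one config remains."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))  # lower loss is better
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate: float, budget: int) -> float:
    """Hypothetical validation loss: best near lr = 1e-2; more budget lowers every loss equally."""
    return (math.log10(learning_rate) + 2.0) ** 2 + 1.0 / budget


# 27 learning rates evenly spaced in log10 between 1e-4 and 1e0
# (Hyperband would sample these at random; BOHB would sample from its density model).
candidates = [10 ** (-4 + 4 * i / 26) for i in range(27)]
best = successive_halving(candidates, toy_loss)
```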
Glow was a flow-based generative model with exact likelihoods that produced realistic, editable face images.
The 2018 paper introducing Ray Tune, a distributed framework that unifies many hyperparameter search algorithms behind one interface.
A 2018 Science paper from UCLA that built a neural network out of 3D-printed diffractive layers, classifying digits with light instead of electronics.
A tokenizer that learns subword units directly from raw text, with no language-specific pre-tokenization needed.
A 2018 Nature paper using a neural network to forecast where earthquake aftershocks occur, sparking a debate about deep learning in seismology.
Unity releases ML-Agents, turning its game engine into a platform for training agents with reinforcement learning.
Google's 2018 Nature Biotechnology paper recast DNA variant calling as image classification, beating prior tools and generalizing across species.
BigGAN scaled GAN training to larger batches and larger models to generate high-fidelity, class-conditional ImageNet images.
The 2018 paper by Xu and colleagues that analyzed the expressive limits of graph neural networks and proposed GIN, a model as discriminative as the Weisfeiler-Lehman graph isomorphism test.
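The paper's central argument is that summing neighbor features can distinguish neighborhoods that a mean (or max) aggregator cannot. The sketch below uses scalar features, with a single ReLU standing in for GIN's MLP (a hypothetical simplification). Two nodes whose neighborhoods differ only in how many identical neighbors they have look the same under mean aggregation but not under the GIN update.

```python
from statistics import fmean


def relu(x: float) -> float:
    return max(0.0, x)


def gin_update(h_v: float, neighbor_feats: list[float], eps: float = 0.0) -> float:
    """GIN layer on scalar features: MLP((1 + eps) * h_v + sum of neighbor features).
    A single ReLU stands in for the MLP here (hypothetical simplification)."""
    return relu((1.0 + eps) * h_v + sum(neighbor_feats))


def mean_update(h_v: float, neighbor_feats: list[float]) -> float:
    """A mean-aggregating alternative, for comparison."""
    return relu(h_v + fmean(neighbor_feats))


# Two nodes with identical features whose neighborhoods differ only in multiplicity.
node_a_neighbors = [1.0, 1.0]
node_b_neighbors = [1.0]
```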
The 2018 Google paper introducing BERT, a Transformer that reads text in both directions and set the pre-train-then-fine-tune template for NLP.
Google's report on training Gboard's next-word prediction with federated learning, an early production deployment of the technique.
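In Federated Averaging, the algorithm this kind of deployment builds on, phones train locally and a server averages their models, weighting each by how many examples it trained on. The sketch below covers only that server-side averaging step. It assumes flat lists of parameters and hypothetical client data, and it omits local training and client selection.

```python
def federated_average(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """Server step of Federated Averaging: average client models weighted by local example counts."""
    total = sum(client_sizes)
    return [
        sum(weights[i] * n for weights, n in zip(client_weights, client_sizes)) / total
        for i in range(len(client_weights[0]))
    ]


# Hypothetical round: two phones return locally trained copies of a 2-parameter model.
new_global = federated_average([[1.0, 0.0], [3.0, 2.0]], client_sizes=[1, 3])
```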
Google's 2018 paper introducing pipeline parallelism, splitting a deep model across accelerators by layer and pipelining micro-batches for near-linear speedup.
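The sketch below models the forward-pass schedule, assuming K pipeline stages and M micro-batches. Stage s processes micro-batch m at clock tick s + m, so the pass takes M + K - 1 ticks and each stage sits idle for K - 1 of them. The idle share, (K - 1) / (M + K - 1), shrinks as M grows, which is why splitting each mini-batch into more micro-batches approaches linear speedup. The backward pass, gradient accumulation, and re-materialization are not modeled.

```python
Schedule = list[list[tuple[int, int]]]


def pipeline_forward_schedule(num_stages: int, num_microbatches: int) -> Schedule:
    """For each clock tick, list the (stage, micro-batch) pairs that run concurrently.
    Stage s can start micro-batch m only after stage s-1 finishes it, so it runs at tick s + m."""
    schedule = []
    for tick in range(num_stages + num_microbatches - 1):
        schedule.append([
            (stage, tick - stage)
            for stage in range(num_stages)
            if 0 <= tick - stage < num_microbatches
        ])
    return schedule


def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Share of stage-ticks left idle in the forward pass: (K - 1) / (M + K - 1)."""
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


schedule = pipeline_forward_schedule(num_stages=4, num_microbatches=8)
busy = sum(len(tick) for tick in schedule)
idle_share = 1 - busy / (4 * len(schedule))
```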
NVIDIA's 2018 StyleGAN generated photorealistic faces of people who do not exist and powered the viral site thispersondoesnotexist.com.
The open-source recreation of OpenAI's unreleased WebText corpus, built from outbound Reddit links so outsiders could study GPT-2's training data.
Adds segment-level recurrence and relative positional encodings so Transformers can model dependencies beyond a fixed-length context window.
The 2019 paper that trained neural networks to obey physical laws written as differential equations, founding the PINN field.
The 2019 paper that introduced adapter modules, small trainable layers inserted into a frozen network to specialize it cheaply.
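Here is a minimal bottleneck adapter in plain Python. It assumes ReLU as the nonlinearity and toy sizes d = 4, m = 2 (both hypothetical choices). The module projects a d-dimensional hidden state down to m dimensions and back up, then adds the input back through a skip connection. Zero-initializing the up-projection makes the adapter start as the identity, in the spirit of the near-identity initialization the approach uses. Each adapter adds 2md + d + m parameters.

```python
Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def adapter(h: Vector, W_down: Matrix, b_down: Vector, W_up: Matrix, b_up: Vector) -> Vector:
    """Bottleneck adapter: project d -> m, apply a nonlinearity, project m -> d, add a skip connection."""
    z = [max(0.0, a + b) for a, b in zip(matvec(W_down, h), b_down)]  # ReLU (assumed nonlinearity)
    up = [a + b for a, b in zip(matvec(W_up, z), b_up)]
    return [h_i + u_i for h_i, u_i in zip(h, up)]


def adapter_param_count(d: int, m: int) -> int:
    """Weights (2*m*d) plus biases (m + d) added per adapter."""
    return 2 * m * d + d + m


d, m = 4, 2  # hypothetical toy sizes
h = [0.5, -1.0, 2.0, 0.0]
W_down = [[0.1] * d for _ in range(m)]
W_up = [[0.0] * m for _ in range(d)]  # zero-initialized up-projection: adapter starts as identity
out = adapter(h, W_down, [0.0] * m, W_up, [0.0] * d)
```

Only the adapter weights are trained; the surrounding pretrained layers stay frozen.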
A 2019 paper giving the first scalable way to certify, with a proof, that no small perturbation can change a classifier's prediction.
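The summary does not name the method. Assuming it is randomized smoothing (Cohen and colleagues, 2019), the certificate comes from a closed-form radius. The smoothed classifier predicts by majority vote under Gaussian noise. If a lower bound p_A on the top class's probability exceeds 1/2, that prediction cannot change within an L2 radius of sigma * Phi^{-1}(p_A). The sketch uses a hypothetical one-dimensional base classifier and returns only a point estimate of the vote share. A real certificate needs a proper lower confidence bound on p_A.

```python
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, Optional

Classifier = Callable[[list[float]], int]


def smoothed_prediction(f: Classifier, x: list[float], sigma: float, n: int = 1000,
                        seed: int = 0) -> tuple[int, float]:
    """Monte Carlo estimate of g(x) = argmax_c P(f(x + noise) = c), noise ~ N(0, sigma^2 I).
    Returns the top class and its empirical frequency (a point estimate, not a confidence bound)."""
    rng = random.Random(seed)
    counts = Counter(f([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n))
    label, hits = counts.most_common(1)[0]
    return label, hits / n


def certified_radius(p_a_lower: float, sigma: float) -> Optional[float]:
    """L2 radius within which the smoothed prediction cannot change, given a lower bound
    on the top-class probability; abstains (None) when the bound does not exceed 1/2."""
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return None
    return sigma * NormalDist().inv_cdf(p_a_lower)


def toy_classifier(v: list[float]) -> int:
    """Hypothetical base classifier: class 1 if the first coordinate is positive."""
    return int(v[0] > 0)


label, freq = smoothed_prediction(toy_classifier, [2.0], sigma=0.5)
```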
The first public benchmark dataset for neural architecture search, with precomputed results for over five million trained models.
A 2019 survey of how NLP handles code-switching, the everyday mixing of two languages in one sentence by bilingual speakers.
The 2019 Holtzman et al. paper that introduced nucleus (top-p) sampling and explained why greedy and beam search produce dull, repetitive text.
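Here is a minimal top-p filter over a single next-token distribution. It assumes the probabilities are already computed; the four-token vocabulary below is hypothetical. The filter keeps the smallest set of most-probable tokens whose mass reaches p, renormalizes them, and samples only from that set. Unlike a fixed top-k cutoff, the nucleus grows when the model is uncertain and shrinks when it is confident.

```python
import random


def nucleus_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the smallest set of most-probable tokens whose total probability reaches top_p,
    then renormalize so the kept probabilities sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for token in order:
        kept.append(token)
        mass += probs[token]
        if mass >= top_p:
            break
    return {token: probs[token] / mass for token in kept}


def nucleus_sample(probs: list[float], top_p: float, rng: random.Random) -> int:
    """Sample the next token id from the truncated, renormalized distribution."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


# Hypothetical next-token distribution over a 4-token vocabulary.
next_token_probs = [0.05, 0.5, 0.15, 0.3]
nucleus = nucleus_filter(next_token_probs, top_p=0.75)
```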
A 2019 Microsoft and Zhejiang University model that generated speech spectrograms in parallel, far faster than autoregressive TTS.
Oreshkin and colleagues' deep neural forecaster that beat the M4 competition winner using only stacked fully-connected layers, with no time-series-specific components.
A 2019 paper by Udrescu and Tegmark recovered symbolic physics formulas from data, rediscovering all 100 equations from Feynman's lectures.
The 2019 Google paper that scaled network depth, width, and resolution together with one coefficient, hitting top accuracy at much smaller size.
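The compound-scaling rule fits in a few lines. Depth, width, and resolution are multiplied by alpha^phi, beta^phi, and gamma^phi. FLOPs grow roughly with depth · width² · resolution², so the constraint alpha · beta² · gamma² ≈ 2 makes each unit of phi roughly double the compute. The coefficients below are the ones the paper reports for its EfficientNet-B0 baseline. The function name is hypothetical.

```python
# Base coefficients for depth, width, and resolution reported from the grid search on
# EfficientNet-B0, chosen under the constraint alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi: float) -> dict[str, float]:
    """Multipliers for depth (layers), width (channels), and input resolution at scale phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
        # FLOPs grow roughly with depth * width**2 * resolution**2, so about 2**phi overall.
        "flops": (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi,
    }


for phi in range(4):
    print(phi, {name: round(value, 2) for name, value in compound_scale(phi).items()})
```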
Meta's open-sourced recommendation model and the engineering needed to train its huge embedding tables.
The 2019 Strubell paper that quantified the carbon cost of training NLP models, estimating a full architecture search at 626,155 lbs of CO2.
The 2019 paper that introduced mesa-optimization and inner alignment: the worry that a trained model becomes an optimizer with its own goal.
The 2019 study explaining why softening training targets improves accuracy and calibration but can hurt knowledge distillation.
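Here is a minimal label-smoothing sketch with weight alpha over K classes. The target becomes 1 - alpha on the true class, plus alpha / K spread uniformly over all classes. With a one-hot target, cross-entropy keeps rewarding ever-larger logits. With the smoothed target, an over-confident logit is penalized instead. The logit values are hypothetical.

```python
import math


def smooth_labels(true_class: int, num_classes: int, alpha: float) -> list[float]:
    """Label smoothing: mix the one-hot target with a uniform distribution, weight alpha."""
    uniform = alpha / num_classes
    return [1.0 - alpha + uniform if k == true_class else uniform for k in range(num_classes)]


def cross_entropy(target: list[float], logits: list[float]) -> float:
    """Cross-entropy of softmax(logits) against a (possibly soft) target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(t * (z - log_z) for t, z in zip(target, logits))


one_hot = smooth_labels(2, num_classes=4, alpha=0.0)
smoothed = smooth_labels(2, num_classes=4, alpha=0.1)
confident, moderate = [0.0, 0.0, 20.0, 0.0], [0.0, 0.0, 4.0, 0.0]
```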
Learns bidirectional context by training over permutations of the factorization order, avoiding the mismatch that BERT's artificial [MASK] tokens create between pretraining and fine-tuning.
A 2019 review by Barrett and colleagues found facial movements are not a reliable readout of emotion, undercutting emotion-recognition AI.
The 2019 paper introducing Optuna, a define-by-run hyperparameter optimization framework with built-in pruning of poor trials.
Shows BERT was significantly undertrained; with more data, longer training, and tuned pretraining choices, the same architecture matches or beats models published after it.
Arik and Pfister's TabNet, a deep learning architecture for tabular data that uses sequential attention to pick features and stay interpretable.
FinBERT adapted Google's BERT to financial text, beating prior methods at judging whether financial news reads as positive or negative.
Fine-tunes BERT in a siamese network so that it produces sentence vectors comparable by cosine similarity, making semantic search over large collections practical.
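Here is a minimal sketch of the retrieval side. It assumes the encoder has already produced one vector per token; the 2-D vectors below are hypothetical. Token vectors are pooled into a single sentence vector (mean pooling is the paper's default), and a pre-encoded collection is ranked by cosine similarity. Because each sentence is encoded once and cached, a query costs one encoding plus cheap vector comparisons instead of a full BERT pass for every sentence pair.

```python
import math

Vector = list[float]


def mean_pool(token_vectors: list[Vector]) -> Vector:
    """Average the per-token vectors of one sentence into a single sentence vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_by_similarity(query: Vector, corpus: list[Vector]) -> list[int]:
    """Indices of corpus sentences, most similar first; corpus vectors can be computed once and cached."""
    return sorted(range(len(corpus)), key=lambda i: cosine(query, corpus[i]), reverse=True)


# Hypothetical 2-D sentence vectors standing in for pooled BERT outputs.
corpus = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
```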
NVIDIA's 2019 paper introducing tensor (intra-layer) model parallelism, a technique now standard for training the largest language models across many GPUs.
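The key trick for the Transformer MLP block can be checked on small matrices. The sketch uses plain Python lists, no biases, and hypothetical sizes and values. The first weight matrix A is split by columns and the second, B, by rows across two simulated devices. Because GeLU is applied elementwise, each device computes its partial output without communicating. A single sum (the all-reduce) then recovers the unsplit result.

```python
import math

Matrix = list[list[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gelu(x: float) -> float:
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mlp(X: Matrix, A: Matrix, B: Matrix) -> Matrix:
    """Transformer MLP block without biases: GeLU(X A) B."""
    return matmul([[gelu(v) for v in row] for row in matmul(X, A)], B)


def tensor_parallel_mlp(X: Matrix, A: Matrix, B: Matrix, world_size: int = 2) -> Matrix:
    """Simulate splitting A by columns and B by rows across `world_size` devices.
    Assumes the hidden size (columns of A) is divisible by world_size."""
    shard = len(A[0]) // world_size
    partials = []
    for rank in range(world_size):
        cols = slice(rank * shard, (rank + 1) * shard)
        A_r = [row[cols] for row in A]      # this device's column block of A
        B_r = B[cols]                       # the matching row block of B
        partials.append(mlp(X, A_r, B_r))   # GeLU is elementwise, so it runs locally
    # One all-reduce (simulated here as an elementwise sum) combines the partial outputs.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]


# Hypothetical toy sizes: batch 2, model width 3, MLP hidden size 4.
X = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
A = [[0.1 * (i + j) - 0.2 for j in range(4)] for i in range(3)]
B = [[0.3 * i - 0.1 * j for j in range(3)] for i in range(4)]
full, split = mlp(X, A, B), tensor_parallel_mlp(X, A, B)
```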
Cuts BERT's parameter count with factorized embeddings and cross-layer parameter sharing while keeping accuracy high.
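Here is rough parameter arithmetic for the two savings. It assumes a 30,000-token vocabulary, hidden size H = 768, and embedding size E = 128; the function names are hypothetical. Factorizing the embedding table replaces V·H parameters with V·E + E·H. Sharing weights across layers makes the encoder's parameter count independent of depth.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden: int, embed: Optional[int] = None) -> int:
    """Embedding-table parameters: V*H when tied to the hidden size (BERT-style),
    or V*E + E*H with ALBERT-style factorization through a smaller size E."""
    if embed is None:
        return vocab_size * hidden
    return vocab_size * embed + embed * hidden


def encoder_params(per_layer: int, num_layers: int, share_across_layers: bool) -> int:
    """With cross-layer sharing, every layer reuses one set of weights."""
    return per_layer if share_across_layers else per_layer * num_layers


# Assumed sizes: V = 30,000 tokens, H = 768, E = 128.
bert_style = embedding_params(30_000, 768)               # 23,040,000
albert_style = embedding_params(30_000, 768, embed=128)  # 3,938,304
```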
Uses knowledge distillation to make BERT 40 percent smaller and 60 percent faster while retaining 97 percent of its language-understanding performance.
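Here is a minimal sketch of the soft-target distillation term, using hypothetical logits for one position. Both the teacher's and the student's outputs are softened with a temperature. The student is then trained to minimize cross-entropy against the teacher's distribution, which is smallest when the student matches the teacher. DistilBERT's full training objective also adds a masked-language-modeling loss and a cosine loss that aligns student and teacher hidden states; those terms are omitted here.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: list[float], teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's and the student's temperature-softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [3.0, 1.0, 0.2]  # hypothetical teacher logits for one token position
```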