Landmark Papers

What the papers actually said - linked to the originals.

644 entries, all primary-sourced

paper 1985

A Learning Algorithm for Boltzmann Machines

The 1985 paper by Ackley, Hinton, and Sejnowski that gave a learning rule to the Boltzmann machine, a stochastic neural network borrowed from statistical physics.

paper 1986

Parallel Distributed Processing (Rumelhart and McClelland)

The two-volume PDP books of 1986 set out the modern connectionist program and popularized the learning of internal representations in neural networks.

paper October 9, 1986

Learning Representations by Back-Propagating Errors

The 1986 Nature paper by Rumelhart, Hinton, and Williams that popularized backpropagation, the algorithm that lets multi-layer neural networks learn.

paper November 12, 1986

The Structure of the Nervous System of the Nematode Caenorhabditis elegans

The 1986 paper by White, Southgate, Thomson, and Brenner that mapped every neuron and connection of the worm C. elegans, the first complete connectome of any animal's nervous system.

paper 1987

Parallel Networks that Learn to Pronounce English Text (NETtalk)

The 1987 NETtalk paper by Sejnowski and Rosenberg, a neural network that learned to turn written English into speech sounds.

paper September 1987

Soar: An Architecture for General Intelligence

The 1987 paper by Laird, Newell, and Rosenbloom set out Soar, a production-rule architecture meant as a unified theory of cognition.

paper August 1988

Learning to Predict by the Methods of Temporal Differences

Richard Sutton's 1988 paper introduced temporal-difference learning, the prediction method that became a pillar of reinforcement learning.

paper 1989

Approximation by Superpositions of a Sigmoidal Function

Cybenko's 1989 proof that a neural network with a single hidden layer of sigmoidal units can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units.

paper October 1990

Neuromorphic Electronic Systems (Mead)

Carver Mead's 1990 paper gave neuromorphic engineering its name, arguing that analog VLSI modeled on neurons could compute far more efficiently than digital logic.

paper 1991

Eigenfaces for Recognition (Turk and Pentland)

The 1991 MIT paper that recognized faces by projecting them onto eigenfaces, the principal components of a set of face images.

paper 1991

Intelligence Without Representation

Rodney Brooks's manifesto for behavior-based robotics, arguing intelligent systems need no central world model: the world is its own best model.

paper 1991

Other Bodies, Other Minds: The Total Turing Test

Harnad's 1991 paper proposing the Total Turing Test, which requires robotic sensorimotor ability, not just conversation, to show a mind.

paper May 1992

Q-learning (Watkins and Dayan, 1992)

Watkins and Dayan's 1992 paper proved that Q-learning converges to the optimal action values with probability 1, provided every state-action pair is sampled repeatedly, giving model-free RL a firm guarantee.

paper May 1992

REINFORCE and policy-gradient methods (Williams, 1992)

Ronald Williams's 1992 REINFORCE paper gave reinforcement learning a way to improve a policy directly by following the gradient of expected reward.

paper June 1993

The Mathematics of Statistical Machine Translation (IBM Models)

The 1993 IBM paper recast translation as a problem of probability and word alignment, introducing the five IBM Models and founding statistical machine translation.

paper August 1994

Okapi BM25: Approximations to the 2-Poisson Model

The Robertson-Walker ranking function that became the default keyword-search baseline for thirty years.

paper 1995

Cognitive Tutors (Anderson et al., 1995)

Carnegie Mellon's Cognitive Tutors used the ACT-R theory of cognition to build math tutoring software that reached real classrooms.

paper 1995

Facing Up to the Problem of Consciousness

David Chalmers's 1995 paper that split the 'easy problems' of mind from the 'hard problem' of why experience exists at all.

paper 1996

Regression Shrinkage and Selection via the Lasso

Tibshirani's 1996 paper introducing the lasso, an L1 penalty that shrinks regression coefficients and sets some to exactly zero.

paper August 1996

Bagging Predictors

Leo Breiman's 1996 paper introducing bagging, which builds many models on bootstrap samples and averages them to cut variance.

paper March 14, 1997

A Neural Substrate of Prediction and Reward

Schultz, Dayan, and Montague showed dopamine neurons signal reward prediction error, the same quantity that drives temporal-difference learning.

paper April 1997

No Free Lunch Theorems for Optimization

Wolpert and Macready's 1997 paper proving that all optimization algorithms perform equally when averaged over every possible problem, so none beats all the others.

paper August 1997

A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting

Freund and Schapire's 1997 paper introducing AdaBoost, the algorithm that combines many weak rules into one strong classifier.

paper 1998

A Bayesian Approach to Filtering Junk E-Mail

The 1998 paper that applied a naive Bayes classifier to spam filtering, a foundational use of machine learning in cybersecurity.

paper 1998

How Long Before Superintelligence? (Bostrom, 1998)

Nick Bostrom's 1998 paper defined superintelligence and argued the hardware to build it would arrive in the early 21st century.

paper January 29, 1998

The PageRank Citation Ranking: Bringing Order to the Web

PageRank ranked web pages by treating links as votes, modeling a random surfer to measure each page's importance.

paper March 1998

Reinforcement Learning: An Introduction (Sutton and Barto)

Sutton and Barto's textbook became the standard reference for reinforcement learning, and the authors make it freely available online.

paper November 1998

Gradient-Based Learning Applied to Document Recognition (LeNet-5)

The 1998 paper by LeCun and colleagues that introduced LeNet-5, the convolutional neural network that read handwritten digits and shaped modern computer vision.

paper January 1999

Predictive Coding in the Visual Cortex (Rao and Ballard)

Rao and Ballard modeled vision as a hierarchy where higher areas predict lower-level activity and only the prediction errors flow upward.

paper April 3, 2000

A Theory of Universal Artificial Intelligence based on Algorithmic Complexity (AIXI)

Hutter's 2000 paper that fuses Solomonoff induction with decision theory to define AIXI, a mathematically optimal but uncomputable agent.

paper April 24, 2000

The Information Bottleneck Method

The 2000 paper framing learning as compressing an input while keeping what is relevant to a target, later applied to deep networks.

paper 2001

Creativity, the Turing Test, and the (Better) Lovelace Test

Bringsjord, Bello, and Ferrucci's 2001 paper proposing a creativity-based alternative to the Turing Test for machine minds.

paper October 2001

Greedy Function Approximation: A Gradient Boosting Machine

Jerome Friedman's 2001 paper that formalized gradient boosting as stagewise fitting by gradient descent in function space, the basis for XGBoost and its successors.

paper October 2001

Random Forests

Leo Breiman's 2001 paper introducing random forests, an ensemble of randomized decision trees that became a default workhorse classifier.

paper 2002

k-Anonymity: A Model for Protecting Privacy

Latanya Sweeney's model requiring each released record to be indistinguishable from at least k-1 others, an early formal privacy standard.

paper July 2002

BLEU: a Method for Automatic Evaluation of Machine Translation

The 2002 BLEU paper gave machine translation a cheap automatic score, and it became the field's default metric for two decades.

paper 2003

A Neural Probabilistic Language Model (Bengio 2003)

The foundational paper that learned distributed word representations with a neural net to fight the curse of dimensionality.

paper 2003

Latent Dirichlet Allocation

The 2003 paper by Blei, Ng, and Jordan introduced LDA, a generative topic model that discovers the hidden themes running through a collection of documents.

paper May 27, 2003

Statistical Phrase-Based Translation (Koehn, Och, Marcu 2003)

The 2003 phrase-based translation paper showed that translating phrases, not single words, sharply improved statistical machine translation.

paper 2004

Distinctive Image Features from Scale-Invariant Keypoints (SIFT)

David Lowe's 2004 paper defining SIFT, features invariant to scale and rotation that dominated image matching before deep learning.

paper July 2004

NLTK: The Natural Language Toolkit

The 2004 ACL paper introducing NLTK, the open-source Python toolkit that taught a generation how to do natural language processing.

paper November 2, 2004

An Information Integration Theory of Consciousness

Tononi's 2004 paper proposing that consciousness is the capacity of a system to integrate information, measured by a quantity called phi.

paper December 2004

The Bayesian Brain (Knill and Pouget)

Knill and Pouget's review argued the brain represents uncertainty and combines evidence in a near-optimal Bayesian way during perception and action.

paper June 20, 2005

Histograms of Oriented Gradients for Human Detection (HOG)

The 2005 paper introducing the HOG descriptor, a pre-deep-learning feature that became the standard for pedestrian and object detection.

paper June 29, 2005

METEOR: An Automatic Metric for MT Evaluation (Banerjee, Lavie 2005)

METEOR, introduced in 2005, scored machine translation by matching word stems and synonyms, correlating with human judgment better than BLEU.

paper August 7, 2005

Learning to Rank using Gradient Descent (RankNet)

The Microsoft paper that introduced RankNet and helped make machine-learned ranking the basis of modern search.

paper 2006

Gaussian Processes for Machine Learning

The 2006 Rasmussen and Williams book that became the standard reference for Gaussian process models in machine learning.

paper March 4, 2006

Calibrating Noise to Sensitivity in Private Data Analysis

The foundational differential privacy paper, showing how to add noise scaled to a query's sensitivity to protect any single individual.