Rajesh P. N. Rao and Dana H. Ballard published “Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects” in Nature Neuroscience in January 1999 (Vol. 2, pages 79-87). It became one of the most influential computational statements of predictive coding as a theory of cortical function.
The model arranges visual processing as a hierarchy. Feedback connections from a higher cortical area carry predictions of the activity in the area below, while feedforward connections carry only the residual error between the prediction and the actual activity. The network adjusts both its higher-level estimates and its connection weights to reduce this prediction error. As a result, most of the signal moving up the hierarchy is the part of the input the higher levels failed to anticipate, not the raw input itself.
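The predict-compare-update loop can be illustrated with a small program. This is a simplified, single-stage linear sketch loosely in the spirit of the model, not a reproduction of it. The squared-error objective with a small penalty on the state, the step sizes, the unit-length rescaling of weight columns, and the toy 8-pixel "images" are all assumptions made for illustration. A higher-level state `r` sends down the prediction `U r`. Only the residual travels up, `r` settles to shrink that residual, and the weights `U` are then nudged in the direction that shrinks it further.

```python
import math
import random


def matvec(U, r):
    """Top-down prediction U r: each input unit sums the weighted higher-level causes."""
    return [sum(u_ij * r_j for u_ij, r_j in zip(row, r)) for row in U]


def mat_t_vec(U, e):
    """Feedforward drive U^T e: the residual error projected back onto each cause."""
    k = len(U[0])
    return [sum(U[i][j] * e[i] for i in range(len(U))) for j in range(k)]


def infer(U, image, steps=50, eta_r=0.1, alpha=0.05):
    """Settle the higher-level state r so that U r explains the image.

    Gradient descent on ||image - U r||^2 + alpha * ||r||^2 (assumed penalty).
    """
    r = [0.0] * len(U[0])
    for _ in range(steps):
        pred = matvec(U, r)                          # feedback: the prediction
        err = [x - p for x, p in zip(image, pred)]   # feedforward: residual only
        drive = mat_t_vec(U, err)
        r = [rj + eta_r * (d - alpha * rj) for rj, d in zip(r, drive)]
    err = [x - p for x, p in zip(image, matvec(U, r))]
    return r, err


def normalize_columns(U):
    """Rescale each column of U to unit length (an assumption, for stability)."""
    for j in range(len(U[0])):
        norm = math.sqrt(sum(U[i][j] ** 2 for i in range(len(U))))
        for i in range(len(U)):
            U[i][j] /= norm


def learn(images, k, epochs=30, eta_u=0.02, seed=0):
    """Learn prediction weights U; return U and the mean squared residual per epoch."""
    rng = random.Random(seed)
    n = len(images[0])
    U = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    normalize_columns(U)
    history = []
    for _ in range(epochs):
        total = 0.0
        for image in images:
            r, err = infer(U, image)
            total += sum(e * e for e in err)
            # Weight change = residual times state: moves U to shrink future errors.
            for i in range(n):
                for j in range(k):
                    U[i][j] += eta_u * err[i] * r[j]
            normalize_columns(U)
        history.append(total / len(images))
    return U, history


def make_images(count, seed=1):
    """Toy 8-pixel 'images' mixing three hypothetical features with random weights."""
    rng = random.Random(seed)
    features = [
        [1, 1, 1, 0, 0, 0, 0, 0],  # left bar
        [0, 0, 0, 0, 0, 1, 1, 1],  # right bar
        [0, 0, 0, 1, 1, 0, 0, 0],  # center blob
    ]
    images = []
    for _ in range(count):
        c = [rng.gauss(0.0, 1.0) for _ in features]
        images.append([sum(cj * f[i] for cj, f in zip(c, features)) for i in range(8)])
    return images


if __name__ == "__main__":
    U, history = learn(make_images(20), k=3)
    print(f"mean squared residual: epoch 1 = {history[0]:.3f}, "
          f"epoch {len(history)} = {history[-1]:.3f}")
```

Running the script prints the average squared residual in the first and last training epochs. As the weights come to span the three features, the residual (the part of each input the higher level fails to predict) should shrink. The real model stacks several such stages and uses natural image patches rather than toy vectors.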
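When Rao and Ballard trained their hierarchical network on natural images, its units developed oriented receptive fields resembling those of simple cells in primary visual cortex, and the network reproduced extra-classical receptive-field effects such as endstopping. Because lines in natural images tend to continue, a bar that extends into the surround is well predicted by the higher level from that context, so the response of the error-carrying neurons is suppressed. A short bar confined to the classical receptive field lacks that supporting context, so more of it goes unpredicted and the error response stays high. These previously puzzling context effects thus fell out naturally from a network that uses feedback to predict, and thereby explain away, its input as an efficient code for natural scenes.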
For a general reader, the paper matters because it offers a single principle, prediction-error minimization, that ties together perception, learning, and the wiring of feedback in the cortex. It also anticipates ideas now common in self-supervised machine learning, where models learn by predicting parts of their own input.