Neural Tangent Kernel: Convergence and Generalization

“Neural Tangent Kernel: Convergence and Generalization in Neural Networks” was submitted to arXiv on June 20, 2018 by Arthur Jacot, Franck Gabriel, and Clément Hongler at EPFL. It gave theorists a powerful new handle on the otherwise hard question of what neural networks actually do while they train.

The central result concerns the limit of infinite width. As a network’s hidden layers are made arbitrarily wide, the function it computes during gradient-descent training turns out to be governed by a kernel the authors call the Neural Tangent Kernel, or NTK. In this limit the NTK converges to a deterministic kernel at initialization and stays constant as training proceeds, which means the network’s evolution can be analyzed in function space using the mature mathematics of kernel methods rather than the tangled, high-dimensional space of its weights.
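
As a rough sketch of the object in question, assuming a scalar-output network f_θ with P parameters (the symbols Θ_t, Θ_∞, and P are notation chosen here for illustration), the kernel at training time t is the inner product of parameter gradients taken at two inputs:

```latex
\Theta_t(x, x') \;=\; \sum_{p=1}^{P}
  \frac{\partial f_{\theta(t)}(x)}{\partial \theta_p}\,
  \frac{\partial f_{\theta(t)}(x')}{\partial \theta_p},
\qquad
\Theta_t \;\longrightarrow\; \Theta_\infty
\quad \text{as the hidden-layer widths tend to infinity.}
```

For a finite network Θ_t is random at initialization and drifts as the weights move; the infinite-width statement is that both effects vanish, leaving a single kernel Θ_∞ that does not depend on t.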

From this vantage point several things become provable. Training convergence can be tied to the limiting kernel being positive-definite; for least-squares regression the network function follows a simple linear differential equation as it learns; and convergence is fastest along the kernel’s largest principal components, which gives a principled account of why early stopping helps. In effect the NTK builds a bridge between deep networks and the classical theory of kernel machines.
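
The least-squares case can be illustrated numerically. The sketch below is not the paper's code: it assumes a fixed positive-definite kernel matrix K on the training points (an RBF kernel stands in for the limiting NTK), absorbs any normalization constant into the time variable, and uses the hypothetical helper name `kernel_gradient_flow`. Under those assumptions the training-set predictions obey df/dt = -K (f - y), which has a closed-form solution in the eigenbasis of K.

```python
import numpy as np

def kernel_gradient_flow(K, y, f0, t):
    """Closed-form solution of df/dt = -K (f - y) on the training points."""
    eigvals, eigvecs = np.linalg.eigh(K)   # K is symmetric, so eigh applies
    residual0 = eigvecs.T @ (f0 - y)       # initial error in the eigenbasis
    decay = np.exp(-eigvals * t)           # each mode decays at its own rate
    return y + eigvecs @ (decay * residual0)

# Assumption: an RBF kernel used as a stand-in for the limiting NTK.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)

y = np.sin(X[:, 0])      # toy regression targets
f0 = np.zeros(20)        # predictions at initialization

for t in (0.0, 1.0, 10.0, 100.0):
    f_t = kernel_gradient_flow(K, y, f0, t)
    print(t, np.linalg.norm(f_t - y))
```

Each error component shrinks by the factor exp(-λt), where λ is the matching eigenvalue of K, so directions with large eigenvalues are fitted quickly and directions with small eigenvalues only after long training. Stopping early therefore keeps the dominant components and leaves the slow ones largely unfitted.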

For a general reader, the neural tangent kernel matters less for any product than for understanding: it is one of the few results that turns the famously opaque process of training a neural network into something mathematicians can reason about exactly, even if only in an idealized limit.
