The rectified linear unit, or ReLU, is the activation function that sits between the layers of most modern neural networks and gives them the ability to model nonlinear relationships. Its definition could hardly be simpler: ReLU(x) = max(0, x), meaning it passes positive values through unchanged and replaces any negative value with zero. Despite that simplicity, it has two big advantages over the smooth, S-shaped functions, such as the sigmoid, that preceded it. It is cheap to compute and, crucially, it does not saturate for positive inputs: its slope there is exactly 1, so gradients flow through it cleanly during training rather than shrinking toward zero as they do at the flat ends of a sigmoid. That property, which mitigates the vanishing-gradient problem, is much of what made very deep networks trainable.
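To make the comparison concrete, here is a minimal Python sketch of ReLU and the sigmoid alongside their slopes (gradients). The function names are illustrative rather than taken from any particular library, and setting ReLU's slope at exactly zero to 0 is a common convention assumed here, not a mathematical necessity.

```python
import math


def relu(x: float) -> float:
    """Pass positive values through unchanged; replace negatives with zero."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Slope of ReLU: 1 for positive inputs, 0 otherwise.

    The slope at exactly x == 0 is undefined; returning 0 is an assumed convention.
    """
    return 1.0 if x > 0 else 0.0


def sigmoid(x: float) -> float:
    """The classic S-shaped activation, squashing any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Slope of the sigmoid: at most 0.25, and close to 0 at the flat ends."""
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    # Compare how much gradient each function lets through at a few inputs.
    for x in (-2.0, 0.5, 10.0):
        print(f"x={x:5.1f}  relu={relu(x):5.1f}  "
              f"relu_grad={relu_grad(x):.1f}  sigmoid_grad={sigmoid_grad(x):.6f}")
```

For a large positive input such as 10, the sigmoid's slope is tiny while ReLU's is still 1. Multiplied across many layers, that difference is what separates a gradient that fades away from one that survives.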
The use of rectified linear units in modern deep learning is associated with the 2010 paper “Rectified Linear Units Improve Restricted Boltzmann Machines” by Vinod Nair and Geoffrey Hinton at the University of Toronto, which showed that they learned better features than binary units for object recognition. ReLU then featured prominently in the 2012 AlexNet breakthrough and rapidly became the default choice. Its one notable weakness is often called the “dying ReLU” problem: because the gradient is zero for negative inputs, a neuron stuck outputting zero can stop learning entirely. That weakness prompted variants such as Leaky ReLU and the parametric ReLU (PReLU), which give negative inputs a small nonzero slope.
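The weakness and its fix can be sketched in a few lines of Python. The names below are illustrative, and the default slope of 0.01 is a commonly used value assumed for this example, not one taken from the text above. In the parametric ReLU the same slope would be learned during training rather than fixed in advance.

```python
def relu_grad(x: float) -> float:
    """Slope of plain ReLU: zero for every non-positive input."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, negative_slope: float = 0.01) -> float:
    """Like ReLU, but negative inputs are scaled down instead of zeroed."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x: float, negative_slope: float = 0.01) -> float:
    """Slope of Leaky ReLU: small but nonzero for non-positive inputs."""
    return 1.0 if x > 0 else negative_slope


if __name__ == "__main__":
    x = -3.0  # a neuron whose input has drifted negative
    print("ReLU gradient:      ", relu_grad(x))        # no learning signal
    print("Leaky ReLU gradient:", leaky_relu_grad(x))  # small signal remains
```

With plain ReLU, a neuron whose input stays negative receives no learning signal at all. The small slope in the leaky variant leaves it a path back to being useful.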
Why a business reader should care: ReLU is a tiny mathematical choice with outsized impact, one of the handful of ingredients that turned neural networks from a finicky research idea into the practical, deep architectures behind today’s AI.