A support vector machine (SVM) is a method for teaching a computer to sort things into categories, for example spam versus not-spam. The idea was introduced in its modern form by Corinna Cortes and Vladimir Vapnik in their 1995 paper “Support-Vector Networks.” The paper describes “a new learning machine for two-group classification problems.”
The core intuition is the margin. Imagine plotting your examples as points and drawing a line that separates the two classes (in more than two dimensions the line becomes a plane or hyperplane). Many lines might separate them, but an SVM looks for the one that leaves the widest gap, the largest margin, between the line and the nearest points on either side. Those nearest points are the “support vectors,” and they alone determine the boundary; every other point could be removed without changing it. A boundary with a wide margin tends to generalize better to new, unseen examples.
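As a minimal sketch of the margin idea, the Python below scores three hand-picked candidate lines on six made-up points and keeps the one with the widest margin. The data, the candidate lines, and the helper names (signed_distance, margin, support_vectors) are illustrative assumptions; a real SVM finds the best line by solving an optimization problem over all possible lines rather than comparing a short list.

```python
import math

# Made-up 2D points with labels -1 / +1 (illustrative data, not from the paper).
data = [((1.0, 1.0), -1), ((2.0, 0.0), -1), ((0.0, 0.0), -1),
        ((3.0, 3.0), +1), ((4.0, 2.0), +1), ((4.0, 4.0), +1)]

# Hypothetical candidate lines w.x + b = 0, written as (w, b).
candidates = {
    "A": ((1.0, 0.0), -2.5),  # vertical line x1 = 2.5
    "B": ((1.0, 1.0), -4.0),  # diagonal line x1 + x2 = 4
    "C": ((0.0, 1.0), -1.5),  # horizontal line x2 = 1.5
}

def signed_distance(w, b, x, y):
    """Distance from x to the line, positive when x is on the side its label y expects."""
    return y * (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

def margin(w, b, points):
    """Margin of a line: the distance to its nearest point."""
    return min(signed_distance(w, b, x, y) for x, y in points)

def support_vectors(w, b, points):
    """Points that sit on the margin (up to floating-point rounding)."""
    m = margin(w, b, points)
    return [x for x, y in points if math.isclose(signed_distance(w, b, x, y), m)]

# Score every candidate and keep the one with the widest margin.
margins = {name: margin(w, b, data) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
w, b = candidates[best]
nearest = support_vectors(w, b, data)

# Keeping only the support vectors leaves this line's margin unchanged.
trimmed = [(x, y) for x, y in data if x in nearest]
same_margin = math.isclose(margin(w, b, trimmed), margins[best])

print(best, round(margins[best], 3), nearest, same_margin)
```

With these points, line B has the widest margin of the three, and the four points nearest to it are the only ones that set that margin. The check at the end only confirms that this particular line's margin does not depend on the other two points; it does not re-run a search for the best line.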
Real data is rarely separable by a straight line. SVMs handle this with the kernel trick. Conceptually, as the paper puts it, input vectors are “non-linearly mapped to a very high-dimension feature space” where a linear separating surface can be found. A kernel is a function that takes two inputs and returns the inner product they would have in that feature space. Because the SVM only ever needs those inner products, the math never has to compute coordinates in that huge space directly, which keeps the method efficient even when the implied space is enormous.
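To make the kernel trick concrete, here is a small Python sketch using one standard textbook kernel, the degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2D inputs. The choice of kernel, the explicit feature map phi, and the sample vectors are assumptions for illustration and are not taken from the paper. The sketch checks that the kernel, computed entirely in the original 2D space, gives the same number as an inner product in the mapped 3D space.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map matching the kernel below: 2D input -> 3D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without ever building phi(x) or phi(z)."""
    return dot(x, z) ** 2

# Two arbitrary sample inputs (illustrative values).
x, z = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(x), phi(z))  # map first, then take the inner product
implicit = poly_kernel(x, z)    # kernel shortcut in the original space

print(explicit, implicit, math.isclose(explicit, implicit))
```

Here the feature space has only three dimensions, so the shortcut saves little. The same identity is what makes higher-degree polynomial kernels, or kernels whose implied space is infinite-dimensional, practical to use.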
The historical role is what makes SVMs worth knowing. With strong theoretical backing from Vapnik’s statistical learning theory and excellent practical results, SVMs and related kernel methods were among the dominant and most respected approaches to classification from the late 1990s through the 2000s. For roughly fifteen years they matched or outperformed neural networks on many tasks and were widely seen as the more principled choice. Neural networks returned to prominence only in the 2010s, when larger datasets and GPU training let deep models pull ahead.