"The support-vector network combines 3 ideas: the solution technique from optimal hyperplanes (that allows for an expansion of the solution vector on support vectors), the idea of convolution of the dot-product (that extends the solution surfaces from linear to non-linear), and the notion of soft margins (to allow for errors on the training set)."

Cortes and Vapnik (1995)

"The methods presented in the last two sections, namely the idea of regularization, and the kernel technique, are elegantly combined in a learning algorithm known as *support vector learning* (SV learning)."

Herbrich p49

(problems addressed)
local minima

size of output hypothesis

overfitting/generalization

small number of parameters

"The key features of SVMs are the use of kernels, the absence of local minima, the sparseness of the solution and the capacity control obtained by optimising the margin."

Shawe-Taylor and Cristianini (2004)

"It is, however, the combination of this optimisation problem [the problem of separating two sets of data with a maximal margin hyperplane] with kernels that produced support vector machines,..."

Shawe-Taylor and Cristianini (2004)