Kernel Methods (.pdf)
Firstly, linearity is rather special, and outside quantum mechanics no model of a real system is truly linear. Secondly, detecting linear relations has been the focus of much research in statistics and machine learning for decades and the resulting algorithms are well understood, well developed and efficient. Naturally, we want the best of both worlds. So, if a problem is nonlinear, instead of trying to fit a nonlinear model, we can map the problem from the input space to a new (higher-dimensional) space (called the feature space) by doing a nonlinear transformation using suitably chosen basis functions and then use a linear model in the feature space. The linear model in the feature space corresponds to a nonlinear model in the input space. This approach can be used in both classification and regression problems. The choice of kernel function is crucial for the success of all kernel algorithms because the kernel constitutes prior knowledge that is available about a task. Accordingly, there is no free lunch (see No Free Lunch Theorems) in kernel choice.