SVM Parameters

C

“However, it is critical here, as in any regularization scheme, that a proper value is chosen for C, the penalty factor. If it is too large, we have a high penalty for nonseparable points and we may store many support vectors and overfit. If it is too small, we may have underfitting.”
Alpaydin (2004), page 224

“…the coefficient C affects the trade-off between complexity and proportion of nonseparable samples and must be selected by the user.”
Cherkassky and Mulier (1998), page 366

“Selecting parameter C equal to the range of output values [6]. This is a reasonable proposal, but it does not take into account possible effect of outliers in the training data.”
“Our empirical results suggest that with optimal choice of ε, the value of regularization parameter C has negligible effect on the generalization performance (as long as C is larger than a certain threshold analytically determined from the training data).”
Cherkassky and Ma (2002)

“In the support-vector networks algorithm one can control the trade-off between complexity of decision rule and frequency of error by changing the parameter C,…”
Cortes and Vapnik (1995)

“There are a number of learning parameters that can be utilized in constructing SV machines for regression. The two most relevant are the insensitivity zone ε and the penalty parameter C, which determines the trade-off between the training error and VC dimension of the model. Both parameters are chosen by the user.”
Kecman (2001), page 182

The parameter C controls the trade off between errors of the SVM on training data and margin maximization (C = ∞ leads to hard margin SVM).
Rychetsky (2001), page 82

“The parameter C controls the trade-off between the margin and the size of the slack variables.”
Shawe-Taylor and Cristianini (2004) p220
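
For reference, the trade-off described in these quotes is usually written as the 1-norm soft-margin primal problem. The notation below (w, b, slack variables ξ_i, labels y_i, n training points) is the common textbook form and is an assumption here, not copied from any of the quoted sources:

```latex
\min_{w,\,b,\,\xi}\quad \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,\bigl(\langle w, x_i\rangle + b\bigr) \ge 1-\xi_i,\qquad \xi_i \ge 0 .
```

A large C makes any nonzero slack expensive, so the solution approaches the hard-margin SVM. A small C tolerates more margin violations in exchange for a wider margin.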

“[Tuning the parameter C] In practice the parameter C is varied through a wide range of values and the optimal performance assessed using a separate validation set or a technique known as cross-validation for verifying performance using only a training set.”
Shawe-Taylor and Cristianini (2004) p220
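
The search described above can be sketched as a loop over a logarithmic grid of C values scored by k-fold cross-validation. This is a minimal sketch, not any author's procedure: the grid bounds, the number of folds, and the `validation_error` callback (which would train an SVM with penalty C on the training indices and return its error on the held-out indices) are all assumptions, and the names are hypothetical.

```python
import math
import random


def log_grid(low_exp, high_exp, base=10.0):
    """Candidate C values base**low_exp ... base**high_exp (inclusive)."""
    return [base ** e for e in range(low_exp, high_exp + 1)]


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def select_c(n_samples, candidates, validation_error, k=5, seed=0):
    """Return (best_c, mean_error) using k-fold cross-validation.

    validation_error(c, train_idx, val_idx) is supplied by the caller
    (hypothetical): it trains with penalty c on train_idx and returns
    the error measured on val_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    best_c, best_err = None, math.inf
    for c in candidates:
        errors = []
        for i, val_idx in enumerate(folds):
            # Every fold except the i-th is used for training.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            errors.append(validation_error(c, train_idx, val_idx))
        mean_err = sum(errors) / len(errors)
        if mean_err < best_err:
            best_c, best_err = c, mean_err
    return best_c, best_err
```

Any SVM library can be plugged in through `validation_error`; the loop itself only fixes the grid and the fold bookkeeping.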

“…the parameter C has no intuitive meaning.”
Shawe-Taylor and Cristianini (2004) p225

“The factor C in (3.15) is a parameter that allows one to trade off training error vs. model complexity. A small value for C will increase the number of training errors, while a large C will lead to a behavior similar to that of a hard-margin SVM.”
Joachims (2002), page 40

“Let us suppose that the output values are in the range [0, B]. […] a value of C about equal to B can be considered to be a robust choice.”
Mattera and Haykin (1999), pages 226-227 in Advances in Kernel Methods
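
A minimal sketch of the range-based choice of C follows. The first function uses the range of the training targets, as in the quote above. The second is an outlier-tolerant variant, max(|mean + 3·std|, |mean − 3·std|), of the kind usually attributed to Cherkassky and Ma (2004); that formula is not quoted on this page and should be checked against the paper. Function names and the use of the population standard deviation are assumptions.

```python
import statistics


def c_from_range(targets):
    """C set to the range of the training outputs (max minus min)."""
    return max(targets) - min(targets)


def c_from_three_sigma(targets):
    """Outlier-tolerant variant: max(|mean + 3*std|, |mean - 3*std|).

    Assumption: population standard deviation; the sample version
    (statistics.stdev) would be an equally plausible reading.
    """
    mean = statistics.mean(targets)
    std = statistics.pstdev(targets)
    return max(abs(mean + 3 * std), abs(mean - 3 * std))
```

With targets such as [0.0, 1.0, 2.0, 3.0, 100.0], the range-based value is dictated entirely by the single extreme point, which is the weakness Cherkassky and Ma (2002) point out.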

Epsilon (ε)

“Similarly, Mattera and Haykin [6] propose to choose ε – value so that the percentage of SVs in the SVM regression model is around 50% of the number of samples. However, one can easily show examples when optimal generalization performance is achieved with the number of SVs larger or smaller than 50%.”
“Smola et al [8] and Kwok [9] proposed asymptotically optimal ε – values proportional to noise variance, in agreement with general sources on SVM [2,7]. The main practical drawback of such proposals is that they do not reflect sample size. Intuitively, the value of ε should be smaller for larger sample size than for small sample size (with same noise level).”
“Optimal setting of ε requires the knowledge of noise level. The noise variance can be estimated directly from training data, i.e. by fitting very flexible (high-variance) estimator to the data. Alternatively, one can first apply least-modulus regression to the data, in order to estimate noise level.”
Cherkassky and Ma (2002)
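
The requirement that ε shrink as the sample grows can be illustrated with a noise-and-sample-size rule of the form ε = τ·σ·sqrt(ln n / n). This form, with τ = 3, is the one usually attributed to Cherkassky and Ma (2004); it is not quoted on this page, so the constant and the exact expression are assumptions to be verified against the paper. The noise standard deviation σ must be estimated separately, for example by one of the approaches the quote mentions.

```python
import math


def epsilon_from_noise(noise_std, n_samples, tau=3.0):
    """epsilon = tau * noise_std * sqrt(ln(n) / n).

    noise_std is an estimate of the noise standard deviation,
    n_samples the training-set size, and tau an empirical constant
    (assumed to be 3 here).
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    if noise_std < 0:
        raise ValueError("noise_std must be non-negative")
    return tau * noise_std * math.sqrt(math.log(n_samples) / n_samples)
```

For a fixed noise level the value falls as n grows: with noise_std = 1.0 it is roughly 0.64 for n = 100 and roughly 0.09 for n = 10000.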

“For an SVM the value of ε in the ε-insensitive loss function should also be selected. ε has an effect on the smoothness of the SVM’s response and it affects the number of support vectors, so both the complexity and the generalization capability of the network depend on its value. There is also some connection between observation noise in the training data and the value of ε. Fixing the parameter ε can be useful if the desired accuracy of the approximation can be specified in advance.”
Horváth (2003), page 392 in Suykens et al.
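
The ε-insensitive loss mentioned above has a simple closed form, max(0, |y − f(x)| − ε): residuals inside the tube of half-width ε cost nothing. The sketch below also counts the points that fall strictly outside the tube for a fixed set of predictions. That count is only a rough proxy for the number of support vectors, since in a trained SVR the fitted function itself changes with ε. Function names are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the excess residual outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def count_outside_tube(targets, predictions, epsilon):
    """Number of points whose residual exceeds epsilon."""
    return sum(1 for y, f in zip(targets, predictions) if abs(y - f) > epsilon)
```

Widening the tube can only reduce this count, which matches the observation in the quotes that a larger ε yields fewer support vectors.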

“There are a number of learning parameters that can be utilized in constructing SV machines for regression. The two most relevant are the insensitivity zone ε and […] Both parameters are chosen by the user. […] An increase in ε means a reduction in requirements for the accuracy of approximation. It also decreases the number of SVs, leading to data compression.”
Kecman (2001), pages 182-183

“Under the assumption of asymptotically unbiased estimators we show that there exists a nontrivial choice of the insensitivity parameter in Vapnik’s ε-insensitive loss function which scales linearly with the input noise of the training data. This finding is backed by experimental results.”
Smola et al. (1998)

“The value of epsilon determines the level of accuracy of the approximated function. It relies entirely on the target values in the training set. If epsilon is larger than the range of the target values we cannot expect a good result. If epsilon is zero, we can expect overfitting. Epsilon must therefore be chosen to reflect the data in some way. Choosing epsilon to be a certain accuracy does of course only guarantee that accuracy on the training set; often to achieve a certain accuracy overall, we need to choose a slightly smaller epsilon.”

“Parameter ε controls the width of the ε-insensitive zone, used to fit the training data. The value of ε can affect the number of support vectors used to construct the regression function. The bigger ε, the fewer support vectors are selected. On the other hand, bigger ε-values results in more “flat” estimates. Hence, both C and ε-values affect model complexity (but in a different way).”
Support Vector Machine Regression

“A robust compromise can be to impose the condition that the percentage of Support Vectors be equal to 50%. A larger value of ε can be utilized (especially for very large and/or noisy training sets)…”
Mattera and Haykin (1999)

“the optimal value of ε scales linearly with σ [standard deviation of Gaussian noise].”
Learning with Kernels, page 79

Kernel Parameters

“For classification problems, the optimal σ can be computed on the basis of Fisher discrimination. And for regression problems, based on scale space theory, we demonstrate the existence of a certain range of σ, within which the generalization performance is stable. An appropriate σ within the range can be achieved via dynamic evaluation. In addition, the lower bound of iterating step size of σ is given.”
Wang, et al., 2003.

Ali and Smith (2003) proposed an automatic parameter selection approach for the polynomial kernel.

General

Ancona et al. (2002) showed that Receiver Operating Characteristic (ROC) curves, measured on a suitable validation set, are effective for selecting, among the classifiers the machine implements, the one whose performance is similar to that of the reference classifier.

Bibliography

- ALI, S. and K.A. SMITH, 2003. Automatic parameter selection for polynomial kernel, Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI 2003), pages 243-249. [Cited by 2] (0.62/year)
- ALPAYDIN, Ethem, 2004. Introduction to Machine Learning. books.google.com. [Cited by 28] (12.60/year)
- ANCONA, N., et al., 2002. Object detection in images: Run-time complexity and parameter selection of Support Vector Machines, Proceedings of the 16th International Conference on Pattern Recognition (ICPR’02) – Volume 2, pages 426-429. [not cited] (0/year)
- BOARDMAN, Matthew and Thomas TRAPPENBERG, 2006. A Heuristic for Free Parameter Optimization with Support Vector Machines, Proceedings of the 2006 IEEE International Joint Conference on Neural Networks (IJCNN 2006), pp. 1337-1344. [not cited] (0/year)
- CASSABAUM, Mary L., et al., 2004. Unsupervised optimization of support vector machine parameters, Automatic Target Recognition XIV. Proceedings of the SPIE, Volume 5426 edited by Firooz A. Sadjadi, pages 316-325. [Cited by 2] (0.91/year)
- CHALIMOURDA, Athanassia, Bernhard SCHÖLKOPF and Alex J. SMOLA, 2000. Choosing $\nu$ in support vector regression with different noise models—theory and experiments, IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, Volume 5, edited by Shun-Ichi Amari et al., Pages 199-204. [Cited by 1] (0.16/year)
- CHALIMOURDA, Athanassia, Bernhard SCHÖLKOPF and Alex J. SMOLA, 2004. Experimentally optimal $\nu$ in support vector regression for different noise models and parameter settings, Neural Networks, Volume 17, Issue 1, January 2004, Pages 127-141. [Cited by 9] (4.07/year)
- CHALIMOURDA, Athanassia, Bernhard SCHÖLKOPF and Alex J. SMOLA, 2005. Letter to the Editor: Experimentally optimal $\nu$ in support vector regression for different noise models and parameter settings, Neural Networks, Volume 18, Issue 2 (March 2005), Page 205.
- CHAPELLE, Olivier, et al., 2002. Choosing multiple parameters for support vector machines, Machine Learning, Volume 46, Numbers 1-3 (January 2002), Pages 131-159. [Cited by 257] (61.07/year)

Chapelle et al. (2002) tune the parameters by gradient descent on estimates of the generalization error.

- CHERKASSKY, Vladimir and Yunqian MA, 2002. Selection of meta-parameters for support vector regression, Artificial Neural Networks — ICANN 2002: International Conference, Madrid, Spain, August 2002, Proceedings, edited by José R. Dorronsoro, pages 687-693. [Cited by 8] (1.90/year)
- CHERKASSKY, Vladimir and Yunqian MA, 2003. Comparison of Model Selection for Regression, Neural Computation, Volume 15, Issue 7 (July 2003), Pages 1691-1714. [Cited by 13] (4.05/year)
- CHERKASSKY, Vladimir and Yunqian MA, 2004. Practical selection of SVM parameters and noise estimation for SVM regression, Neural Networks, Volume 17, Issue 1, January 2004, Pages 113-126. [Cited by 54] (24.45/year)
- CHERKASSKY, Vladimir and Filip MULIER, 1998. Learning from Data: Concepts, Theory, and Methods, John Wiley & Sons, Inc. New York, N.Y., USA. [Cited by 367] (44.71/year)
- CORTES, C. and V. VAPNIK, 1995. Support-vector networks. Machine Learning. [Cited by 1802] (160.55/year)
- DEBNATH, R. and H. TAKAHASHI, 2004. An efficient method for tuning kernel parameter of the support vector machine, Proceeding of the IEEE International Symposium on Communications and Information Technology, 2004 (ISCIT 2004), Volume 2, pp. 1023-1028. [not cited] (0/year)
- DUAN, Kaibo, S. Sathiya KEERTHI and Aun Neow POO, 2003. Evaluation of simple performance measures for tuning SVM hyperparameters, Neurocomputing, Volume 51, April 2003, Pages 41-59. [Cited by 77] (24.00/year)

“For SVMs with L1 soft-margin formulation, none of the simple measures yields a performance uniformly as good as k-fold cross validation; Joachims’ Xi-Alpha bound and the GACV of Wahba et al. come next and perform reasonably well. For SVMs with L2 soft-margin formulation, the radius margin bound gives a very good prediction of optimal hyperparameter values.”
Duan, Keerthi and Poo, 2002

- FROHLICH, H. and A. ZELL, 2005. Efficient parameter selection for support vector machines in classification and regression via model-based global optimization, Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN ’05), Volume 3, pages 1431-1436. [not cited] (0/year)
- GOLD, Carl and Peter SOLLICH, 2005. Fast Bayesian Support Vector Machine Parameter Tuning with the Nystrom Method, Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN ’05), Volume 5, pages 2820-2825. [Cited by 1] (0.82/year)
- HORVATH, G., 2003. Neural Networks in Measurement Systems (an engineering view). NATO SCIENCE SERIES SUB SERIES III COMPUTER AND SYSTEMS …. [not cited] (0/year)
- IMBAULT, F. and K. LEBART, 2004. A stochastic optimization approach for parameter tuning of support vector machines, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), Volume 4, Pages 597-600. [Cited by 1] (0.45/year)
- JENG, Jin-Tsong, 2006. Hybrid approach of selecting hyperparameters of support vector machine for regression, IEEE Transactions on Systems, Man, and Cybernetics—Part B: Cybernetics, Volume 36, Issue 3, June 2006, Pages 699-709. [not cited] (0/year)
- JENG, Jing-Tsong and Chen-Chia CHUANG, 2002. A novel approach for the hyperparameters of support vector regression, Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN ’02), Volume 1, Pages 642-647. [Cited by 1] (0.24/year)
- JOACHIMS, T., 2002. Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. books.google.com. [Cited by 263] (62.26/year)
- JORDAAN, E.M. and G.F. SMITS, 2002. Estimation of the regularization parameter for support vector regression, Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN ’02), Volume 3, Pages 2192-2197. [Cited by 5] (1.19/year)
- KECMAN, V., 2001. Learning and soft computing. MIT Press Cambridge, Mass. [Cited by 53] (10.15/year)
- KLINKENBERG, Ralf, 2002. Informed Parameter Setting for Support Vector Machines: Using Additional User Knowledge in Classification Tasks, Report Number CI-126/02, Collaborative Research Center on Computational Intelligence (SFB CI) (SFB 531), University of Dortmund, Dortmund, Germany, 2002; ISSN 1433-3325.

“Many applications of machine learning involve the learning of classifiers. Given a set of labeled training examples, the task is to learn a classifier for predicting the labels of previously unseen examples. By providing the labels of the training examples, the user already specifies a lot of her or his knowledge about the classification problem at hand. In some cases, however, the user may not be satisfied with the result provided by the learning method. Hence the user may want to specify additional knowledge about the problem or constraints on the desired solution and she or he may want the learner to provide a classifier that fits better to her or his needs. The goal of this research is to allow the user to specify additional knowledge about the classification problem and to incorporate this knowledge into the learning process. In this work, support vector machines (SVMs) were chosen as learning methods for classifiers. Two methods for integrating user knowledge into the learning process of an SVM for classification tasks are discussed. Example weighting allows the user to set individual weights for individual training examples, which are then used in SVM training. Kernel modification allows the incorporation of a user’s knowledge about similarities and dissimilarities of examples into the SVM training by modified kernel functions.”
Klinkenberg (2002)

Klinkenberg (2002) discussed two methods for integrating user knowledge into the learning process of an SVM for classification. Example weighting allows the user to set individual weights for individual training examples, whilst kernel modification allows the incorporation of a user’s knowledge about similarities and dissimilarities of examples.

- KUBA, Petr, et al., 2002. Exploiting sampling and meta-learning for parameter setting support vector machines, Proceedings of the Workshop de Minería de Datos Y Aprendizaje of (IBERAMIA 2002), edited by F.J. Garijo and J.C. Riquelme and M. Toro, pages 217-225. [Cited by 1] (0.24/year)
- KULKARNI, Abhijit, V.K. JAYARAMAN and B.D. KULKARNI, 2004. Support vector classification with parameter tuning assisted by agent-based technique, Computers and Chemical Engineering [Cited by 7] (3.15/year)
- KWOK, James T. and Ivor W. TSANG, 2003. Linear Dependency between ε and the Input Noise in ε-Support Vector Regression, IEEE Transactions on Neural Networks [Cited by 10] (1.92/year)
- LIM, Hojung, 2004. Support vector parameter selection using experimental design based generating set search (SVEG) with application to predictive software data modeling, Doctoral Thesis, Syracuse University. [not cited] (0/year)
- LIN, Pao-Tsun, Shun-Feng SU and Tsu-Tian LEE, 2005. Support vector regression performance analysis and systematic parameter selection, Proceedings of the International Joint Conference on Neural Networks (IJCNN ’05), Volume 2, pages 877-882. [not cited] (0/year)
- MATTERA, Davide and Simon HAYKIN, 1999. Support vector machines for dynamic reconstruction of a chaotic system. In: Advances in Kernel Methods: Support Vector Learning, edited by Bernhard Schölkopf and Christopher J. C. Burges and Alexander J. Smola, Pages 209-241. [Cited by 26] (3.61/year)
- QUAN, Yong and Jie YANG, 2003. An improved parameter tuning method for support vector machines, Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, 9th International Conference, RSFDGrC 2003, Chongqing, China, May 26-29, 2003, Proceedings, Pages 607-610. [not cited] (0/year)
- RYCHETSKY, Matthias, 2001. Algorithms and Architectures for Machine Learning based on Regularized Neural Networks and Support Vector Approaches, Shaker Verlag. [not cited] (0/year)
- SCHITTKOWSKI, K., 2005. Optimal parameter selection in support vector machines, Journal of Industrial and Management Optimization, Volume 1, Number 4, November 2005, pp. 465-476. [not cited] (0/year)
- SCHÖLKOPF, Bernhard and Alexander J. SMOLA, 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. [Cited by 328] (62.78/year)
- SHAWE-TAYLOR, John and Nello CRISTIANINI, 2004. Kernel Methods for Pattern Analysis. books.google.com. [Cited by 215] (96.66/year)
- SHITONG, Wang, et al., 2005. Theoretically Optimal Parameter Choices for Support Vector Regression Machines with Noisy Input, Soft Computing – A Fusion of Foundations, Methodologies and Applications [not cited] (0/year)
- SMOLA, A., et al., 1998. Asymptotically optimal choice of ε–loss for support vector machines, Proceedings of the International Conference on Artificial Neural Networks, pp. 105-110, Springer, Berlin. [Cited by 34] (4.14/year)
- SOARES, Carlos, Pavel B. BRAZDIL and Petr KUBA, 2004. A meta-learning method to select the kernel width in support vector regression, Machine Learning, Volume 54, Number 3 (March 2004), Pages 195-209. [Cited by 9] (4.04/year)
- STAELIN, Carl, 2003. Parameter selection for support vector machines, HP Laboratories Israel, Tech. Rep. HPL-2002-354 (R. 1), Nov. [Cited by 10] (3.12/year)
- STEINWART, Ingo, 2003. On the optimal parameter choice for ν-support vector machines, IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2003 (Vol. 25, No. 10), pp. 1274-1284. [Cited by 8] (2.49/year)
- WANG, Wenjian, et al., 2003. Determination of the spread parameter in the Gaussian kernel for classification and regression, Neurocomputing, Volume 55, Number 3, October 2003, pp. 643-663. [Cited by 21] (6.52/year)
- WANG, Xin, et al., 2005. Parameter selection of support vector regression based on hybrid optimization algorithm and its application, Journal of Control Theory and Applications, pages 371-376. [not cited] (0/year)
- WANG, S., et al., 2006. Experimental study on parameter choices in norm-r support vector regression machines with noisy input, Soft Computing – A Fusion of Foundations, Methodologies and Applications, Volume 10, Number 3 / February, 2006, pages 219-223. [not cited] (0/year)
- YU, Xinying, Shie-Yui LIONG and Vladan BABOVIC, 2004. EC-SVM approach for real-time hydrologic forecasting, Journal of Hydroinformatics, Volume 6, Number 3, July 2004, Pages 209-223. [Cited by 1] (0.45/year)

“The selection of appropriate values for the three parameters (C, ε, σ) in the above expressions has been proposed by various researchers. Cherkassky & Mulier (1998) suggested the use of cross-validation for the SVM parameter choice. Mattera & Haykin (1999) proposed the parameter C to be equal to the range of output values. They also proposed the selection of the ε value to be such that the percentage of support vectors in the SVM regression model is around 50% of the number of samples. Smola et al. (1998) assigned optimal ε values as proportional to the noise variance, in agreement with general sources on SVM. Cherkassky & Ma (2004) proposed the selection of ε parameters based on the estimated noise. Different approaches yield different values for the three parameters. As shown later, this study finds the optimal parameter set simultaneously by minimising the prediction error as the objective function.”
Yu, Liong and Babovic (2004)

Yu, Liong and Babovic (2004) found the optimal parameter set simultaneously by minimising the prediction error as the objective function.
