SVMs in a Nutshell

What is an SVM?
• Support Vector Machine
  – More accurately called a support vector classifier
  – Separates training data into two classes so that they are maximally apart

Simpler version
• Suppose the data is linearly separable
• Then we could draw a line between the two classes

Simpler version
• But what is the best line? In SVM, we use the maximum margin hyperplane

Maximum Margin Hyperplane
• The separating hyperplane that maximizes the distance to the nearest training points of either class

What if it's non-linear?

Higher dimensions
• SVM uses a kernel function to map the data into a different space where it can be separated

What if it's not separable?
• Use linear separation, but allow training errors
• This is called using a "soft margin"
• A higher cost for errors yields a model that fits the training data more closely, but it may not generalize
• The choice of parameters (kernel and cost) determines the accuracy of the SVM model
• To avoid over- or under-fitting, use cross-validation to choose the parameters (see the third sketch at the end of this section)

Some math
• Data: {(x1, c1), (x2, c2), …, (xn, cn)}
• xi is a vector of attributes/features, scaled
• ci is the class of the vector (−1 or +1)
• Dividing hyperplane: w·x − b = 0
• Linearly separable means there exists a hyperplane such that w·xi − b > 0 for positive examples and w·xi − b < 0 for negative examples
• w points perpendicular to the hyperplane

More math
• Dividing hyperplane: w·x − b = 0
• Support vectors lie on the parallel hyperplanes w·x − b = 1 and w·x − b = −1
• The distance between these two hyperplanes is 2/‖w‖, so maximizing the margin means minimizing ‖w‖

More math
• For all i, either w·xi − b ≥ 1 or w·xi − b ≤ −1
• Can be rewritten: ci(w·xi − b) ≥ 1
• Minimize (1/2)‖w‖² subject to ci(w·xi − b) ≥ 1
• This is a quadratic programming problem and can be solved in polynomial time (the first sketch below works a small linear example)

A few more details
• So far, we assumed the data was linearly separable
  – To reach higher dimensions, use a kernel function in place of the dot product; it may be a nonlinear transform
  – The Radial Basis Function is a commonly used kernel: k(x, x′) = exp(−γ‖x − x′‖²) [need to choose γ; see the second sketch below]
• So far, no errors; with a soft margin:
  – Minimize (1/2)‖w‖² + C Σi ξi
  – Subject to ci(w·xi − b) ≥ 1 − ξi, with ξi ≥ 0
  – C is the error penalty
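To make the hyperplane math concrete, here is a minimal sketch of the linearly separable case, assuming scikit-learn and NumPy are available; the toy dataset and the large C value are illustrative choices, not part of the slides. It fits a linear SVM, reads off w and b, and checks the 2/‖w‖ margin width:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs stand in for linearly separable training data.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.6, random_state=0)

# A very large C approximates a hard margin (training errors heavily penalized).
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the dividing hyperplane
b = -clf.intercept_[0]    # scikit-learn fits w.x + b = 0; the slides write w.x - b = 0
print("w =", w, "b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/|w| =", 2 / np.linalg.norm(w))
```

With C this large, almost no slack is allowed, so the result approximates the hard-margin optimum of the quadratic program above.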
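The RBF kernel formula is easy to sanity-check numerically. In this tiny sketch the vectors and the γ value are arbitrary, chosen only for illustration; it simply evaluates k(x, x′) = exp(−γ‖x − x′‖²) by hand:

```python
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    """k(x, x') = exp(-gamma * ||x - x'||^2), as in the slides."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([1.0, 2.0])
x_prime = np.array([2.0, 0.0])
# ||x - x'||^2 = 1 + 4 = 5, so with gamma = 0.5 this prints exp(-2.5) ~ 0.082
print(rbf_kernel(x, x_prime, gamma=0.5))
```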
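Finally, a sketch of the full recipe the slides recommend for the non-separable case: an RBF-kernel SVM whose soft-margin penalty C and kernel width γ are chosen by cross-validation. This again assumes scikit-learn; the dataset (concentric circles) and the grid values are illustrative, not prescriptive:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Concentric circles: a classic dataset that no straight line can separate.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],       # soft-margin error penalty
    "gamma": [0.01, 0.1, 1, 10],  # RBF kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("cross-validated accuracy: %.3f" % search.best_score_)
```

A larger C penalizes slack more heavily (closer to a hard margin) and a larger γ makes the kernel more local; both push toward overfitting, which is exactly why the slides recommend choosing them by cross-validation rather than by training accuracy.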