Lecture 14: Support Vector Machine

Outline
1. Overview of SVM
2. Problem setting of linear separators
3. Soft Margin Method
4. Lagrange Multiplier Method to find solutions

1. Support Vector Machines (SVM)
- Invented by Vladimir Vapnik and co-workers
- Introduced at the Computational Learning Theory (COLT) 1992 conference
- Derived from statistical learning theory
- Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, ...)
- Quite popular, though now largely superseded by deep learning neural networks
- Uses hyperplanes to separate two classes (a linear classifier)
- Based on the idea of maximum margin: the boundary is "supported" by the closest training points (the support vectors)

If the two classes can be separated perfectly by a line in the x space, how do we choose the "best" line?

[Figure: two candidate separating lines B1 and B2, each drawn with its margin boundaries b11, b12 and b21, b22.]

One solution is to choose the line (hyperplane) with the largest margin. The margin is the distance between the two parallel lines on either side.

2. Optimization Problem Setting
The decision boundary is the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$; its margin boundaries are the parallel hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = 1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$. The classifier is
$$
f(\mathbf{x}) =
\begin{cases}
 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\
-1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1
\end{cases}
$$
This can be formulated as a constrained optimization problem. We want to maximize the margin $2 / \lVert \mathbf{w} \rVert$, which is equivalent to minimizing
$$
L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2}
$$
We have the following constraints:
$$
f(\mathbf{x}_i) =
\begin{cases}
 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\
-1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1
\end{cases}
$$
So we have a quadratic objective function with linear constraints, which means it is a convex optimization problem, and we can use Lagrange multipliers.

2. Linear SVM
The maximum-margin problem becomes the constrained optimization problem
$$
\min_{\mathbf{w}} \; \frac{\lVert \mathbf{w} \rVert^2}{2}
\quad \text{subject to: } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \; i = 1, 2, \ldots, N
$$
- A quadratic programming optimization problem
- Can apply Lagrange multipliers
- Read Example 5.5 on page 264

3. Soft Margin for Linear SVM
What to do when complete linear separation is impossible?

Corinna Cortes and Vladimir Vapnik proposed (1995) a modification allowing for mislabeled examples using "slack variables" $\xi_i$:
$$
\min_{\mathbf{w}, \boldsymbol{\xi}, b} \; \frac{\lVert \mathbf{w} \rVert^2}{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to: } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, 2, \ldots, N
$$
What if the problem is not linearly separable? Then we can introduce slack variables: minimize
$$
L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} + C \sum_{i=1}^{N} \xi_i^{k}
$$
subject to
$$
f(\mathbf{x}_i) =
\begin{cases}
 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 - \xi_i \\
-1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i
\end{cases}
$$
- The slack total $\sum_i \xi_i$ penalizes mistakes (a point with $\xi_i > 1$ is misclassified, so $\sum_i \xi_i$ upper-bounds the number of training errors)
- If the data are not separable, introduce a penalty; how should mistakes be penalized?
- Choose C based on cross-validation
(A code sketch of this quadratic program appears after the geometry exercise below.)

4. Use a Quadratic Solver
Online lessons for Lagrange multipliers, the simplex method, and optimization:
https://modelsim.wordpress.com/modules/optimization/
(Mathematical Modeling and Simulation, Module 2, lessons 2 - 6)

Exercise in Geometry
Prove that the distance between the two parallel planes $Ax + By + Cz = D_1$ and $Ax + By + Cz = D_2$ is $|D_1 - D_2| / \lVert \mathbf{n} \rVert$, where $\mathbf{n} = [A, B, C]$.
Hint: pick two points $P_1$ and $P_2$, one on each plane, and project the vector $\overrightarrow{P_1 P_2}$ onto the normal $\mathbf{n}$; the distance is the length of this projection. (A proof sketch follows below.)
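As a concrete illustration of Sections 3 and 4, the soft-margin problem can be handed directly to a convex/quadratic solver. The sketch below is a minimal illustration, not part of the original lecture: it assumes the Python packages NumPy and cvxpy are available, and the function name fit_soft_margin_svm and the toy points are labels and data chosen here for demonstration only.

```python
import numpy as np
import cvxpy as cp

def fit_soft_margin_svm(X, y, C=1.0):
    """Solve the primal soft-margin SVM quadratic program:
       minimize   0.5 * ||w||^2 + C * sum(xi)
       subject to y_i (w . x_i + b) >= 1 - xi_i,  xi_i >= 0."""
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    xi = cp.Variable(n, nonneg=True)            # slack variables
    margins = cp.multiply(y, X @ w + b)         # y_i (w . x_i + b), elementwise
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    constraints = [margins >= 1 - xi]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value

# Tiny usage example on made-up 2-D points (labels must be +1 / -1):
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_soft_margin_svm(X, y, C=10.0)
print("w =", w, "b =", b)
print("predictions:", np.sign(X @ w + b))
```

Setting C very large effectively recovers the hard-margin problem of Section 2 (the separable case), while smaller C tolerates more margin violations; in practice C is chosen by cross-validation, refitting over a grid of candidate values on held-out folds.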
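For the geometry exercise, here is a sketch of the argument along the lines of the hint (a reconstruction, not a supplied solution). Pick any point $P_1$ on the plane $Ax + By + Cz = D_1$ and any point $P_2$ on the plane $Ax + By + Cz = D_2$, so that $\mathbf{n} \cdot P_1 = D_1$ and $\mathbf{n} \cdot P_2 = D_2$ with $\mathbf{n} = [A, B, C]$. Since the planes are parallel with common normal $\mathbf{n}$, the distance between them is the length of the projection of $\overrightarrow{P_1 P_2}$ onto $\mathbf{n}$:
$$
\text{dist}
= \left\lvert \operatorname{proj}_{\mathbf{n}} \overrightarrow{P_1 P_2} \right\rvert
= \frac{\lvert \mathbf{n} \cdot (P_2 - P_1) \rvert}{\lVert \mathbf{n} \rVert}
= \frac{\lvert D_2 - D_1 \rvert}{\lVert \mathbf{n} \rVert}.
$$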
The Midge Classification Challenge (Problem A of the 1989 MCM; adapted from Dr. Ben Fusaro)
Biologists W. L. Grogan of Salisbury Univ. and W. W. Wirth of the Smithsonian Institution do research on biting midges.

Grogan and Wirth were doing field work and captured 18 biting midges. They agreed that nine of the midges belonged to an antenna-dominated species, Ma, and six belonged to a wing-dominated species, Mw. They were sure that each of the three left-overs (the red dots in their plot) belonged to one of the two species, but which one? The challenge: take a look at their antenna-wing data and see if you can help them out.

Midge Classification – Problem A (Continuous) from the 1989 MCM.
[Scatter plot of antenna length vs. wing length; legend: Ma, Mw, with the three unknown midges marked.]
The three unknowns: (1.24, 1.80), (1.28, 1.84), (1.40, 2.04).
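One way to answer the challenge with this lecture's tools is to fit a linear (soft-margin) SVM to the 15 labeled midges and classify the three unknowns. The sketch below is only an outline: the labeled (antenna, wing) measurements appear in the MCM problem statement and are not reproduced in these notes, so X_train and y_train are placeholders to be filled in; it assumes scikit-learn and NumPy are available, and the function name classify_unknown_midges is a label chosen here.

```python
import numpy as np
from sklearn.svm import SVC

# The three unknown midges from the problem statement: (antenna length, wing length).
UNKNOWN_MIDGES = np.array([[1.24, 1.80], [1.28, 1.84], [1.40, 2.04]])

def classify_unknown_midges(X_train, y_train, X_unknown=UNKNOWN_MIDGES, C=1.0):
    """Fit a linear soft-margin SVM on the labeled midges and classify the unknowns.

    X_train : (15, 2) array of (antenna, wing) measurements for the 9 Ma and 6 Mw
              midges, taken from the MCM problem statement (not reproduced here).
    y_train : length-15 array of labels, e.g. +1 for Ma and -1 for Mw.
    """
    clf = SVC(kernel="linear", C=C)
    clf.fit(X_train, y_train)
    # For a linear kernel, the separating line w . x + b = 0 is recoverable:
    w, b = clf.coef_[0], clf.intercept_[0]
    return clf.predict(X_unknown), (w, b)
```

The fitted coefficients give the separating line $\mathbf{w} \cdot \mathbf{x} + b = 0$ in the antenna-wing plane, which can be drawn on the scatter plot to see on which side each unknown midge falls.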