Linear SVM

Lecture 14
Outline: Support Vector Machine
1. Overview of SVM
2. Problem setting of linear separators
3. Soft Margin Method
4. Lagrange Multiplier Method to find solutions
1. Support Vector Machines (SVM)
• Invented by Vladimir Vapnik and co-workers
• Introduced at the Computational Learning Theory (COLT) 1992 conference
• Derived from statistical learning theory
Support Vector Machines (SVM)
• Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, ...)
• Quite popular
• Now largely superseded by deep learning neural networks
Support Vector Machines (SVM)
• Linear classification: uses hyperplanes to separate two classes
• Based on the idea of a maximum-margin separator, determined by the "support" vectors
1. Support Vector Machines
If the two classes can be separated perfectly by a line in the x space, how do we choose the "best" line?
Support Vector Machines
[Figure: two candidate separating lines, B1 and B2, each drawn with its pair of parallel margin boundaries (b11, b12 for B1; b21, b22 for B2); the gap between each pair is the margin.]
One solution is to choose the line (hyperplane) with the largest margin. The margin is the distance between the two parallel lines on either side.
2. Optimization Problem Setting
[Figure: hyperplane B1 with its two margin boundaries b11 and b12.]
The decision boundary is the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$; the margin boundaries on either side are $\mathbf{w} \cdot \mathbf{x} + b = 1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$.

$$f(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x} + b \le -1 \end{cases}$$
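As a quick illustration, here is a minimal Python sketch of this decision rule; the hyperplane values w = (1, 1), b = -1 are assumed for the example, not taken from the lecture.

```python
import numpy as np

# Decision rule f(x): training points satisfy w.x + b >= 1 (class +1) or
# w.x + b <= -1 (class -1); at prediction time this reduces to sign(w.x + b).
def f(x, w, b):
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([1.0, 1.0]), -1.0      # assumed hyperplane for illustration
print(f(np.array([2.0, 2.0]), w, b))   # 1  (above the line)
print(f(np.array([0.0, 0.0]), w, b))   # -1 (below the line)
```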
• This can be formulated as a constrained optimization problem.
• We want to maximize the margin, $2 / \|\mathbf{w}\|$.
• This is equivalent to minimizing $L(\mathbf{w}) = \|\mathbf{w}\|^2 / 2$.
• We have the following constraints:

$$f(\mathbf{x}_i) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \end{cases}$$

• So we have a quadratic objective function with linear constraints, which means it is a convex optimization problem and we can use Lagrange multipliers.
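Because the objective is quadratic and the constraints are linear, a general-purpose constrained solver can find the optimum directly. Below is a minimal sketch using SciPy's SLSQP method; the toy data and the variable layout v = [w1, w2, b] are assumptions for illustration, not from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (assumed for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

def objective(v):
    w = v[:2]                    # v packs [w1, w2, b]
    return 0.5 * np.dot(w, w)    # minimize L(w) = ||w||^2 / 2

# One linear constraint y_i (w . x_i + b) - 1 >= 0 per training point.
constraints = [{'type': 'ineq',
                'fun': lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```

The constraints that are active at the solution (points where $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$ exactly) correspond to the support vectors.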
2. Linear SVM
• Maximum margin becomes the constrained optimization problem

$$\min_{\mathbf{w}} \frac{\|\mathbf{w}\|^2}{2} \quad \text{subject to: } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$

• This is a quadratic programming optimization problem
• Can apply Lagrange multipliers
Read Example 5.5 on page 264.
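In practice the quadratic program is rarely coded by hand. A minimal sketch with scikit-learn (an assumed dependency; the toy data repeat the earlier example): setting a very large C approximates the hard margin, and the fitted model exposes the support vectors and their dual (Lagrange multiplier) coefficients.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.0],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # very large C ~ hard margin

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)
print("dual coefficients (y_i * alpha_i):", clf.dual_coef_)
```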
3. Soft Margin for Linear SVM
• What to do when complete linear separation is impossible?
3. Linear SVMs
• Soft Margin method
• Corinna Cortes and Vladimir Vapnik proposed (1995) a modification allowing for mislabeled examples using "slack variables" $\xi_i$:

$$\min_{\mathbf{w}, \xi, b} \left( \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i \right) \quad \text{subject to: } y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N$$
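A minimal sketch of the role of C, on assumed and deliberately overlapping toy data: small C tolerates slack and keeps the margin wide, large C penalizes every violation and narrows it.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed non-separable data: one point of each class sits in the other's region.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5],
              [0.0, 0.0], [1.0, 0.0], [2.5, 2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C = {C}: margin = {2.0 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_)}")
```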
What if the problem is not linearly separable? Then we can introduce slack variables:

Minimize
$$L(\mathbf{w}) = \frac{\|\mathbf{w}\|^2}{2} + C \left( \sum_{i=1}^{N} \xi_i \right)^k$$

Subject to
$$f(\mathbf{x}_i) = \begin{cases} 1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \ge 1 - \xi_i \\ -1 & \text{if } \mathbf{w} \cdot \mathbf{x}_i + b \le -1 + \xi_i \end{cases}$$

• How to penalize mistakes? If the data are not separable, introduce a penalty: the slack term tracks the number of mistakes (a point with $\xi_i > 1$ is misclassified).
• Choose C based on cross validation.
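As a sketch of the cross-validation step (the grid of candidate C values and the data are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumed toy data with slightly overlapping classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5], [2.5, 3.0],
              [0.0, 0.0], [1.0, 0.0], [2.5, 2.5], [0.0, 1.0]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

grid = GridSearchCV(SVC(kernel='linear'),
                    param_grid={'C': [0.01, 0.1, 1, 10, 100]},
                    cv=4)   # 4-fold cross validation
grid.fit(X, y)
print("best C:", grid.best_params_['C'],
      "cv accuracy:", round(grid.best_score_, 3))
```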
4. Use a quadratic solver
Online Lessons for Lagrange Simplex Method and Optimization
• https://modelsim.wordpress.com/modules/optimization/
• Mathematical Modeling and Simulation
• Module 2, lessons 2-6.
Exercise in Geometry
• Prove that the distance between the two parallel planes
$$Ax + By + Cz = D_1 \quad \text{and} \quad Ax + By + Cz = D_2$$
is $|D_1 - D_2| / \|\mathbf{n}\|$, where $\mathbf{n} = [A, B, C]$.
Hint: randomly select two points $P_1$ and $P_2$, one on each plane, and project the vector $\overrightarrow{P_1 P_2}$ onto the normal $\mathbf{n}$. The distance is the length of this projection.
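A quick numeric check of the formula and the hint, with assumed example planes x + 2y + 2z = 3 and x + 2y + 2z = 9:

```python
import numpy as np

n = np.array([1.0, 2.0, 2.0])      # normal [A, B, C], so ||n|| = 3
D1, D2 = 3.0, 9.0

P1 = np.array([3.0, 0.0, 0.0])     # satisfies x + 2y + 2z = 3
P2 = np.array([9.0, 0.0, 0.0])     # satisfies x + 2y + 2z = 9

# Length of the projection of P1P2 onto the unit normal ...
proj = abs(np.dot(P2 - P1, n)) / np.linalg.norm(n)
# ... equals |D1 - D2| / ||n||.
print(proj, abs(D1 - D2) / np.linalg.norm(n))   # 2.0 2.0
```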
The Midge Classification Challenge
MCM Problem A, 1989. Adapted from Dr. Ben Fusaro.
Biologists W. L. Grogan of Salisbury Univ. and W. W. Wirth of the Smithsonian Institution do research on biting midges.
The Midge Classification Challenge
Grogan and Wirth were doing field work and captured 18 biting midges. They agreed that nine of the midges belonged to an antenna-dominated species, Ma, and six belonged to a wing-dominated species, Mw. They were sure that each of the three left-overs (red dots) belonged to one of the two species, but which one...?
The challenge: take a look at their antenna-wing data and see if you can help them out.
------------------
Midge Classification: Problem A (Continuous) from the 1989 MCM.
[Scatter plot of the antenna-wing data; the legend distinguishes Ma and Mw markers, with the three unknowns shown as red dots.]
The three unknowns: (1.24, 1.80), (1.28, 1.84), (1.40, 2.04).
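A sketch of how a linear SVM could settle the challenge. The 15 labeled (antenna, wing) measurements are in the 1989 MCM problem statement and are not reproduced in these slides, so the training arrays below are clearly synthetic stand-ins; only the three unknown points are taken from the text above.

```python
import numpy as np
from sklearn.svm import SVC

# SYNTHETIC stand-in training data, NOT Grogan & Wirth's measurements:
# replace with the 15 labeled (antenna, wing) pairs from the problem statement.
X_train = np.array([[1.38, 1.82], [1.42, 1.86], [1.48, 1.90],   # stand-in Ma
                    [1.14, 1.90], [1.18, 1.96], [1.20, 2.00]])  # stand-in Mw
y_train = np.array([1, 1, 1, -1, -1, -1])   # +1 = Ma, -1 = Mw

unknowns = np.array([[1.24, 1.80], [1.28, 1.84], [1.40, 2.04]])  # the red dots

clf = SVC(kernel='linear').fit(X_train, y_train)
print(clf.predict(unknowns))   # predicted species for the three unknowns
```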