# Document

```
Review of Lecture 13

• Data contamination
  [Figure: expected error versus validation set size $K$, for $E_{out}(g^-_{m^*})$ and $E_{val}(g^-_{m^*})$]

• Validation
  $D$ ($N$ points) is split into $D_{train}$ ($N - K$ points) and $D_{val}$ ($K$ points).
  Once it is used for model selection, $D_{val}$ is slightly contaminated.

• Cross validation
  $D$ is split into $D_1, D_2, \dots, D_{10}$; each fold in turn validates while the
  other nine train (10-fold cross validation).
  $E_{val}(g^-)$ estimates $E_{out}(g)$.

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology
Lecture 14: Support Vector Machines
Sponsored by Caltech's Provost Office, E&AS Division, and IST
Thursday, May 17, 2012

Outline
• Maximizing the margin
• The solution
• Nonlinear transforms

Creator: Yaser Abu-Mostafa - LFD Lecture 14

Better linear separation

Linearly separable data: different separating lines. Which is best?

Two questions:
1. Why is a bigger margin better?
2. Which $w$ maximizes the margin?

Remember the growth function?

All dichotomies with any line:
[Figure: the dichotomies that lines can produce]

Dichotomies with fat margin

Fat margins imply fewer dichotomies.
[Figure: dichotomies with margins $\infty$, 0.866, 0.5, and 0.397]

Finding $w$ with large margin

Let $x_n$ be the nearest data point to the plane $w^T x = 0$. How far is it?

Two preliminary technicalities:
1. Normalize $w$:  $|w^T x_n| = 1$
2. Pull out $w_0$: $w = (w_1, \dots, w_d)$ apart from $b$

The plane is now $w^T x + b = 0$ (no $x_0$).

Computing the distance

The distance between $x_n$ and the plane $w^T x + b = 0$, where $|w^T x_n + b| = 1$.

The vector $w$ is $\perp$ to the plane in the $X$ space.
Take $x'$ and $x''$ on the plane:
  $$w^T x' + b = 0 \quad\text{and}\quad w^T x'' + b = 0 \implies w^T (x' - x'') = 0$$
[Figure: the plane, its normal vector $w$, points $x'$ and $x''$ on the plane, and $x_n$ off the plane]

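and the distance is ...

Distance between $x_n$ and the plane: take any point $x$ on the plane.
Projection of $x_n - x$ on $w$:
  $$\hat{w} = \frac{w}{\|w\|} \implies \text{distance} = |\hat{w}^T (x_n - x)|$$
Since $w^T x + b = 0$ and $|w^T x_n + b| = 1$:
  $$\text{distance} = \frac{1}{\|w\|}\,|w^T x_n - w^T x| = \frac{1}{\|w\|}\,|w^T x_n + b - w^T x - b| = \frac{1}{\|w\|}$$
[Figure: $x_n$, a point $x$ on the plane, and the normal vector $w$]
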
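The optimization problem

Maximize    $\frac{1}{\|w\|}$
subject to  $\min_{n=1,2,\dots,N} |w^T x_n + b| = 1$

Notice: $|w^T x_n + b| = y_n (w^T x_n + b)$, since a separating plane classifies every $x_n$
correctly, so $y_n$ and $w^T x_n + b$ have the same sign.

Maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2} w^T w$. At the optimum at least
one constraint is tight, so the minimum is still exactly 1:

Minimize    $\frac{1}{2} w^T w$
subject to  $y_n (w^T x_n + b) \ge 1$  for $n = 1, 2, \dots, N$
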
Outline
• Maximizing the margin
• The solution
• Nonlinear transforms

Constrained optimization

Minimize    $\frac{1}{2} w^T w$
subject to  $y_n (w^T x_n + b) \ge 1$  for $n = 1, 2, \dots, N$
            $w \in \mathbb{R}^d$, $b \in \mathbb{R}$

Lagrange? Inequality constraints $\implies$ KKT

We saw this before

Remember regularization?

Minimize    $E_{in}(w) = \frac{1}{N} (Zw - y)^T (Zw - y)$
subject to  $w^T w \le C$

$\nabla E_{in}$ is normal to the constraint.
[Figure: contours $E_{in}$ = const. around $w_{lin}$, the constraint $w^T w = C$, and $\nabla E_{in}$ normal to it]

                   optimize     constrain
Regularization:    $E_{in}$     $w^T w$
SVM:               $w^T w$      $E_{in}$

Lagrange formulation

Minimize
  $$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{n=1}^{N} \alpha_n \left( y_n (w^T x_n + b) - 1 \right)$$
w.r.t. $w$ and $b$, and maximize w.r.t. each $\alpha_n \ge 0$.

  $$\nabla_w L = w - \sum_{n=1}^{N} \alpha_n y_n x_n = 0$$
  $$\frac{\partial L}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0$$

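Substituting ...

Substituting
  $$w = \sum_{n=1}^{N} \alpha_n y_n x_n \quad\text{and}\quad \sum_{n=1}^{N} \alpha_n y_n = 0$$
in the Lagrangian
  $$L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{n=1}^{N} \alpha_n \left( y_n (w^T x_n + b) - 1 \right)$$
we get (the $b$ term vanishes, and $\sum_n \alpha_n y_n w^T x_n = w^T w$):
  $$L(\alpha) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2} w^T w = \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m \alpha_n \alpha_m x_n^T x_m$$

Maximize w.r.t. $\alpha$ subject to $\alpha_n \ge 0$ for $n = 1, \dots, N$ and $\sum_{n=1}^{N} \alpha_n y_n = 0$.
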
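The solution - quadratic programming

Maximizing $L(\alpha)$ is the same as minimizing $-L(\alpha)$:

  $$\min_{\alpha}\ \frac{1}{2}\,\alpha^T
    \underbrace{\begin{bmatrix}
      y_1 y_1\, x_1^T x_1 & y_1 y_2\, x_1^T x_2 & \cdots & y_1 y_N\, x_1^T x_N \\
      y_2 y_1\, x_2^T x_1 & y_2 y_2\, x_2^T x_2 & \cdots & y_2 y_N\, x_2^T x_N \\
      \vdots & \vdots & \ddots & \vdots \\
      y_N y_1\, x_N^T x_1 & y_N y_2\, x_N^T x_2 & \cdots & y_N y_N\, x_N^T x_N
    \end{bmatrix}}_{\text{quadratic coefficients}}
    \alpha + \underbrace{(-\mathbf{1}^T)}_{\text{linear}}\,\alpha$$

subject to
  $$\underbrace{y^T \alpha = 0}_{\text{linear constraint}}, \qquad
    \underbrace{0}_{\text{lower bounds}} \le \alpha \le \underbrace{\infty}_{\text{upper bounds}}$$
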
QP hands us $\alpha$

Solution: $\alpha = \alpha_1, \dots, \alpha_N \implies w = \sum_{n=1}^{N} \alpha_n y_n x_n$

KKT condition: for $n = 1, \dots, N$
  $$\alpha_n \left( y_n (w^T x_n + b) - 1 \right) = 0$$
We saw this before!
[Figure: the regularization picture again, with $E_{in}$ = const., $w_{lin}$, $\nabla E_{in}$ normal to $w^T w = C$]

$\alpha_n > 0 \implies x_n$ is a support vector

Support vectors

Closest $x_n$'s to the plane: achieve the margin
  $$\implies y_n (w^T x_n + b) = 1$$

  $$w = \sum_{x_n \text{ is SV}} \alpha_n y_n x_n$$

Solve for $b$ using any SV: $y_n (w^T x_n + b) = 1$, and since $y_n = \pm 1$,
  $$b = y_n - w^T x_n$$

Outline
• Maximizing the margin
• The solution
• Nonlinear transforms

$z$ instead of $x$

  $$L(\alpha) = \sum_{n=1}^{N} \alpha_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m \alpha_n \alpha_m z_n^T z_m$$
[Figure: data mapped from $X$ space to $Z$ space, $X \to Z$]

Support vectors in $X$ space

Support vectors live in $Z$ space.
In $X$ space: "pre-images" of support vectors.
The margin is maintained in $Z$ space.

Generalization result
  $$E[E_{out}] \le \frac{E[\#\text{ of SVs}]}{N - 1}$$
```
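
The "Computing the distance" and "and the distance is ..." slides show that once $(w, b)$ is scaled so the nearest point has $|w^T x_n + b| = 1$, the margin equals $1/\|w\|$. Below is a minimal Python sketch of that check. It assumes a small hypothetical 2-D dataset and an arbitrary separating plane. The helper names `dot`, `distance_to_plane`, and `normalize` are illustrative and do not come from the lecture.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def distance_to_plane(w, b, x):
    """Euclidean distance from point x to the plane w^T x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


def normalize(w, b, xs):
    """Rescale (w, b) so that min_n |w^T x_n + b| = 1; the plane itself is unchanged."""
    scale = min(abs(dot(w, x) + b) for x in xs)
    return [wi / scale for wi in w], b / scale


# Hypothetical toy data and an arbitrary (unnormalized) separating plane
xs = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
w, b = [2.0, -2.0], -2.0

w_hat, b_hat = normalize(w, b, xs)                       # [1.0, -1.0], -1.0
margin = min(distance_to_plane(w, b, x) for x in xs)     # distance of the nearest point
print(margin, 1 / math.sqrt(dot(w_hat, w_hat)))          # both ≈ 0.7071
```

Rescaling changes $w$ and $b$ but not the plane. That is why the normalization step in "Finding $w$ with large margin" costs nothing.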
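
The "QP hands us $\alpha$" and "Support vectors" slides recover $w = \sum_n \alpha_n y_n x_n$ and then compute $b$ from any support vector. The lecture hands the dual to a quadratic-programming package. To stay dependency-free, this sketch swaps in a simple pairwise coordinate ascent in the spirit of SMO. It moves two multipliers at a time so that $\sum_n \alpha_n y_n = 0$ is preserved. It assumes linearly separable, toy-sized data; the four points and the function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_dual(xs, ys, max_sweeps=1000, tol=1e-12):
    """Maximize L(alpha) = sum(alpha) - 1/2 alpha^T Q alpha
    subject to alpha >= 0 and sum(alpha_n y_n) = 0, by pairwise coordinate ascent."""
    n = len(xs)
    # Quadratic coefficients Q[i][j] = y_i y_j x_i^T x_j
    Q = [[ys[i] * ys[j] * dot(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n  # feasible starting point
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                # Direction d: d_i = y_i, d_j = -y_j keeps sum(alpha_n y_n) fixed.
                grad = [1.0 - dot(Q[k], alpha) for k in range(n)]  # gradient of L
                slope = grad[i] * ys[i] - grad[j] * ys[j]          # grad^T d
                diff = [a - c for a, c in zip(xs[i], xs[j])]
                curvature = dot(diff, diff)                        # d^T Q d = ||x_i - x_j||^2
                if curvature <= tol:
                    continue
                t = slope / curvature  # exact maximizer of L along d
                # Clip t so that alpha_i + y_i t >= 0 and alpha_j - y_j t >= 0.
                lo, hi = float("-inf"), float("inf")
                if ys[i] > 0:
                    lo = max(lo, -alpha[i])
                else:
                    hi = min(hi, alpha[i])
                if ys[j] > 0:
                    hi = min(hi, alpha[j])
                else:
                    lo = max(lo, -alpha[j])
                t = max(lo, min(hi, t))
                if abs(t) > tol:
                    alpha[i] += ys[i] * t
                    alpha[j] -= ys[j] * t
                    changed = True
        if not changed:  # no pair moved during a full sweep: stop
            break
    return alpha


def recover_w_b(xs, ys, alpha):
    """w = sum alpha_n y_n x_n; b = y_s - w^T x_s for a support vector s."""
    w = [sum(a * y * x[k] for a, y, x in zip(alpha, ys, xs)) for k in range(len(xs[0]))]
    s = max(range(len(alpha)), key=lambda m: alpha[m])  # largest alpha_n is an SV
    return w, ys[s] - dot(w, xs[s])


# Hypothetical, linearly separable toy data
xs = [(0.0, 0.0), (2.0, 2.0), (2.0, 0.0), (3.0, 0.0)]
ys = [-1, -1, +1, +1]

alpha = solve_dual(xs, ys)         # ≈ [0.5, 0.5, 1.0, 0.0]
w, b = recover_w_b(xs, ys, alpha)  # ≈ [1.0, -1.0], -1.0
for x, y, a in zip(xs, ys, alpha):
    m = y * (dot(w, x) + b)        # should be >= 1 for every point
    print(x, a, m, a * (m - 1))    # KKT: alpha_n (y_n (w^T x_n + b) - 1) = 0
```

On this data the first three points are support vectors: they have positive $\alpha_n$ and $y_n(w^T x_n + b) = 1$. The fourth point has $\alpha_n = 0$ and lies beyond the margin, which matches the KKT condition. For anything beyond toy sizes, a dedicated QP or SVM solver is the better choice.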
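
The "$z$ instead of $x$" slide notes that the dual touches the data only through $z_n^T z_m$. This minimal sketch assumes a hypothetical transform $\Phi(x) = (x, x^2)$ on three 1-D points labeled $+, -, +$, which are not linearly separable in $X$. The multipliers $\alpha = (1/16, 1/8, 1/16)$ were worked out by hand for this toy set rather than produced by a solver. The code checks them by comparing the dual objective with $\frac{1}{2} w^T w$, then classifies new points in $X$ through their images in $Z$.

```python
def phi(x):
    """Hypothetical nonlinear transform X -> Z: x -> (x, x^2)."""
    return (x, x * x)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, ys, zs):
    """L(alpha) = sum_n alpha_n - 1/2 sum_n sum_m y_n y_m alpha_n alpha_m z_n^T z_m."""
    n = len(zs)
    quad = sum(ys[i] * ys[j] * alpha[i] * alpha[j] * dot(zs[i], zs[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


xs = [-2.0, 0.0, 2.0]       # labels +, -, +: not linearly separable in X
ys = [+1, -1, +1]
zs = [phi(x) for x in xs]   # the data only enter the dual through z_n^T z_m

# Multipliers worked out by hand for this toy set (an assumption, not solver output)
alpha = [1 / 16, 1 / 8, 1 / 16]

w = [sum(a * y * z[k] for a, y, z in zip(alpha, ys, zs)) for k in range(2)]
b = ys[1] - dot(w, zs[1])   # any support vector works; here z = phi(0)


def g(x):
    """Classify a point in X space through its image in Z space."""
    return 1 if dot(w, phi(x)) + b > 0 else -1


print(w, b)                                              # [0.0, 0.5] -1.0
print(dual_objective(alpha, ys, zs), 0.5 * dot(w, w))    # 0.125 0.125
print(g(-3.0), g(0.5), g(3.0))                           # 1 -1 1
```

All three points are support vectors here. Their pre-images in $X$ are $x = -2, 0, 2$. The boundary $w^T \Phi(x) + b = 0$ becomes $x^2 = 2$ in $X$ space, while the margin is measured in $Z$.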
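
The generalization result bounds $E[E_{out}]$ by the expected number of support vectors divided by $N - 1$. This small arithmetic sketch plugs in the count from a single run, which only estimates the expectation in the numerator. The example $\alpha$ vectors and the threshold `sv_tol` used to decide which $\alpha_n$ count as positive are assumptions.

```python
def sv_bound(alpha, n_points, sv_tol=1e-8):
    """Count support vectors (alpha_n > sv_tol) and return (count, count / (N - 1)).

    The lecture's result is E[E_out] <= E[# of SVs] / (N - 1); using the count
    from a single run only estimates the expectation in the numerator.
    """
    num_sv = sum(1 for a in alpha if a > sv_tol)
    return num_sv, num_sv / (n_points - 1)


print(sv_bound([0.5, 0.5, 1.0, 0.0], 4))            # (3, 1.0): vacuous
print(sv_bound([0.3] * 12 + [0.0] * 988, 1000))     # (12, 0.012...)
```

With four points and three support vectors the bound is vacuous (1.0). With 12 support vectors out of 1000 points it is about 0.012. The bound is informative when few points end up as support vectors.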