Support Vector Machine (SVM)

2015/4/8

SVM optimization algorithms

Chap8 SL Zhongzhi Shi






1900-1920  Data description
1920-1940  The dawn of statistical models
1940-1960  The era of mathematical statistics
1990-1999  Modeling complex data structures


- Tikhonov, Ivanov and Philips: methods for solving ill-posed problems;
- Parzen, Rosenblatt and Chentsov: nonparametric statistics;
- Vapnik and Chervonenkis: the law of large numbers in functional space;
- Kolmogorov, Solomonoff and Chaitin: algorithmic complexity.











(1) Collect data: sampling, experimental design
(2) Analyze data: modeling, knowledge discovery, visualization
(3) Draw inferences: prediction, classification





SVM is a machine learning method based on statistical learning theory.
It was first proposed by Vapnik at COLT-92.

Vapnik V N. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York
Vapnik V N. 1998. Statistical Learning Theory. Wiley-Interscience Publication, John Wiley & Sons, Inc




An unknown joint distribution P(X,Y) reflects some underlying knowledge. The learning problem is posed in terms of a training set of independently drawn and identically distributed (i.i.d.) observations:
(x1,y1), (x2,y2), ..., (xn,yn)



- Generator (G) generates vectors x (typically in Rn), independently drawn from some fixed distribution F(x)
- Supervisor (S) labels each input x with an output value y according to some fixed distribution F(y|x)
- Learning Machine (LM) "learns" from an i.i.d. l-sample of (x,y)-pairs output from G and S, by choosing a function that best approximates S from a parameterised function class f(x,α), where α is in Λ, the parameter set

Note: we identify the class of functions f(x,α) with the equivalent representation of each f by its index α.

[Diagram: G → x → S → y; the LM observes (x,y) and produces the estimate ŷ]

R( w)   L( y, f ( x, w))dF ( x, y )

2015/4/8
Chap8 SVM Zhongzhi Shi
11

Empirical risk:

Remp(w) = (1/l) Σ_{i=1..l} L(yi, f(xi,w))
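The two quantities above can be made concrete in a few lines. This is a minimal sketch assuming a 0-1 loss and a linear decision function f(x,w) = sign(w·x); the names `zero_one_loss` and `empirical_risk` are illustrative, not from the text.

```python
import numpy as np

def zero_one_loss(y, y_pred):
    """L(y, f(x,w)): 1 if the prediction disagrees with the label, else 0."""
    return 0.0 if y == y_pred else 1.0

def empirical_risk(X, y, w):
    """Remp(w) = (1/l) * sum_i L(yi, f(xi, w))."""
    preds = np.sign(X @ w)
    return np.mean([zero_one_loss(yi, pi) for yi, pi in zip(y, preds)])

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])        # separates this toy set perfectly
print(empirical_risk(X, y, w))  # 0.0
```

The true risk R(w) averages the same loss over the unknown F(x,y), which is exactly what we cannot compute; the next slides bound the gap between the two.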

R(w)  Remp (w)  (h / l )
2015/4/8
Chap8 SVM Zhongzhi Shi
13
VC Dimension

The VC dimension (Vapnik-Chervonenkis dimension) measures the capacity of the set of functions used by a pattern recognition method. The confidence term in the risk bound is

Φ(h/l) = sqrt( ( h(ln(2l/h) + 1) − ln(η/4) ) / l )

where h is the VC dimension of the function class H = {f(x,w)} and l is the number of samples.
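The behaviour of the confidence term is easy to see numerically. A small sketch, assuming a confidence parameter η = 0.05 (the value is illustrative):

```python
import math

# Phi(h/l) = sqrt( (h*(ln(2l/h) + 1) - ln(eta/4)) / l )
def vc_confidence(h, l, eta=0.05):
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

# The term shrinks as l/h grows: more samples per unit of capacity
# tighten the bound R(w) <= Remp(w) + Phi(h/l).
print(vc_confidence(h=10, l=100))    # larger
print(vc_confidence(h=10, l=10000))  # smaller
```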


Problem: how rich a class of classifications q(x;θ) to use.

underfitting / good fit / overfitting

Problem of generalization: a small empirical risk Remp does not imply a small true expected risk R.

1. Theory of consistency of learning processes
What are the (necessary and sufficient) conditions for consistency (convergence of Remp to R) of a learning process based on the ERM principle?
2. Non-asymptotic theory of the rate of convergence of learning processes
How fast is the rate of convergence of a learning process?
3. Theory of controlling the generalization ability of learning processes
How can one control the rate of convergence (the generalization ability) of a learning process?
4. Theory of constructing learning algorithms
How can one construct algorithms that can control the generalization ability?


- ERM is intended for relatively large samples (large l/h)
- A large l/h induces a small Φ, which decreases the upper bound on the risk
- Small samples? A small empirical risk doesn't guarantee anything!
  ...we need to minimise both terms of the RHS of the risk bound:
  1. the empirical risk of the chosen α
  2. an expression depending on the VC dimension of the function class


The Structural Risk Minimisation (SRM) Principle

Let S = {Q(z,α), α ∈ Λ}. An admissible structure S1 ⊂ S2 ⊂ ... ⊂ Sn ⊂ ... ⊂ S:

- For each k, the VC dimension hk of Sk is finite and h1 ≤ h2 ≤ ... ≤ hn ≤ ... ≤ hS
- Every Sk either contains only non-negative bounded functions, or satisfies, for some pair (p, τk) with p > 2,

  sup_{α ∈ Λk} ( ∫ Q^p(z,α) dF(z) )^{1/p} / R(α) ≤ τk

The SRM Principle continued

[Figure: nested subsets S1 ⊂ S2 ⊂ ... ⊂ Sn with VC dimensions h1 ≤ ... ≤ h* ≤ ... ≤ hn]

For given z1,...,zl and an admissible structure S1 ⊂ S2 ⊂ ... ⊂ Sn ⊂ ... ⊂ S, SRM chooses the function Q(z,αlk) minimising Remp in the Sk for which the guaranteed risk (the risk upper bound) is minimal.
Thus SRM manages the unavoidable trade-off between quality of approximation and complexity of approximation.

[Figure: bound on the risk = empirical risk + confidence interval. Across the structure S1 ⊂ ... ⊂ Sn, the empirical risk decreases with h while the confidence interval grows; the bound is minimised at the element S* with VC dimension h*.]


SVMs are learning systems that:
- use a hypothesis space of linear functions
- in a high-dimensional feature space (kernel functions)
- trained with a learning algorithm from optimization theory (Lagrange multipliers)
- implementing a learning bias derived from statistical learning theory (generalisation)

SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.

[Input x → classifier f → estimated label yest]

f(x,w,b) = sign(w·x − b)

[Figure: 2-D scatter of training points labelled +1 and −1]

How would you classify this data?


Linear SVM

f(x,w,b) = sign(w·x − b)

[Figure: the same scatter with the maximum-margin separating line]

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).

© 2001, 2003, Andrew W. Moore



Training set: (xi, yi), i = 1,2,...,N; yi ∈ {+1,−1}
Hyperplane: w·x + b = 0

This is fully determined by (w,b).

According to a theorem from learning theory, among all possible linear decision functions, the one that maximises the margin of the training set will minimise the generalisation error.

Note 1: the decision functions (w,b) and (cw, cb) are the same.
Note 2: but margins as measured by the outputs of the function x → w·x + b are not the same if we take (cw, cb).

Definition: the geometric margin is the margin given by the canonical decision function, i.e. with c = 1/||w||.

Strategy:
1) maximise the geometric margin (cf. the result from learning theory);
2) subject to the constraint that training examples are classified correctly.

[Figure: hyperplane w·x + b = 0 separating the regions w·x + b > 0 and w·x + b < 0]

According to Note 1, we can demand that the function output be +1 and −1 for the nearest points on the two sides of the decision function. This removes the scaling freedom.

Denoting a nearest positive example x+ and a nearest negative example x−, this is
w·x+ + b = +1  and  w·x− + b = −1

Computing the geometric margin (which has to be maximised):

(1/2)( (w/||w||)·x+ − (w/||w||)·x− ) = (1/(2||w||))( (w·x+ + b) − (w·x− + b) ) = (1/(2||w||))(1 − (−1)) = 1/||w||

And here are the constraints:
w·xi + b ≥ +1 for yi = +1
w·xi + b ≤ −1 for yi = −1
i.e.  yi(w·xi + b) − 1 ≥ 0 for all i
Maximum margin – summing up

Given a linearly separable training set (xi, yi), i = 1,2,...,N; yi ∈ {+1,−1}:

Minimise ||w||²
subject to  yi(w·xi + b) − 1 ≥ 0,  i = 1,...,N

This is a quadratic programming problem with linear inequality constraints. There are well-known procedures for solving it.

[Figure: the margin is bounded by the hyperplanes w·x + b = 1 and w·x + b = −1, with w·x + b = 0 in between]
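The hard-margin conditions can be checked numerically. A minimal sketch on a toy separable set; the hyperplane (w, b) below is chosen by hand for illustration, not produced by an optimiser:

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

w = np.array([0.25, 0.25])  # canonical scaling: nearest points give w.x + b = +/-1
b = 0.0

margins = y * (X @ w + b)
print(margins)                  # all >= 1 -> constraints satisfied
print(2.0 / np.linalg.norm(w))  # width of the margin, 2/||w||
```

Note that minimising ||w||² is exactly maximising this width 2/||w||.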

The training points that are nearest to the separating function are called support vectors.

What is the output of our decision function for these points?

T  {( x1, y1 ), ,( xl , yl )} ( x  y)l

yi  y  {1, 1} 是输出指标，或输出.

2维空间上的分类问题)
2015/4/8
 n维空间上的分类问题.
Chap8 SVM Zhongzhi Shi
35

Given the training set T = {(x1,y1), ..., (xl,yl)} ∈ (X × Y)^l, find a real-valued function g(x) on X and use the decision function

f(x) = sgn(g(x))

SVM classification problems fall roughly into three kinds: linearly separable, approximately linearly separable, and nonlinearly separable.

[Figure: separating line with two parallel support lines l2 and l3; the distance between l2 and l3 is 2/||w||.]

The support lines l2, l3 can be written as

(w·x) + b = k1,  (w·x) + b = k2

and, after symmetrising, as

(w·x) + b = k,  (w·x) + b = −k

Rescaling w ← w/k, b ← b/k, the two equations can be written equivalently as

(w·x) + b = 1,  (w·x) + b = −1

In the plane with coordinates [x]1, [x]2, the two lines (w·x) + b = 1 and (w·x) + b = −1 are parallel, with distances |1 − b|/sqrt(w1² + w2²) and |−1 − b|/sqrt(w1² + w2²) from the origin; the distance D between them is

D = |1 − (−1)| / sqrt(w1² + w2²) = 2/||w||

The "margin" is 2/||w||. Maximising the margin gives the optimisation problem in w and b:

min_{w,b}  (1/2)||w||²                         (1.2.1)
s.t.       yi((w·xi) + b) ≥ 1, i = 1,...,l     (1.2.2)

where w ∈ Rⁿ, b ∈ R.
Margin = 2/||W||

H1 plane: W·X1 + b = +1    .....(1)
H2 plane: W·X2 + b = −1    .....(2)

yi[(W·Xi) + b] − 1 ≥ 0

The dual problem:

min_α  (1/2) Σ_{i=1..l} Σ_{j=1..l} yi yj αi αj (xi·xj) − Σ_{j=1..l} αj
s.t.   Σ_{i=1..l} yi αi = 0,
       αi ≥ 0, i = 1,...,l

where αi is the Lagrange multiplier associated with each constraint of the primal problem. This is a convex quadratic program.
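The dual objective is cheap to evaluate once the Gram matrix of inner products is formed. A small sketch (data and α values are illustrative; for this symmetric 2-point problem the chosen α happens to be optimal):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """(1/2) sum_ij yi yj ai aj (xi . xj) - sum_j aj, the minimisation form above."""
    K = X @ X.T                # Gram matrix of inner products xi . xj
    yy = np.outer(y, y)
    return 0.5 * alpha @ (yy * K) @ alpha - alpha.sum()

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])  # feasible: sum_i yi*ai = 0, ai >= 0
print(dual_objective(alpha, X, y))  # -0.25
```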

From the dual solution α* = (α1*, ..., αl*)ᵀ compute

w* = Σ_{i=1..l} yi αi* xi
b* = yj − Σ_{i=1..l} yi αi* (xi·xj)   for some j with αj* > 0

giving the separating hyperplane (w*·x) + b* = 0 and the decision function f(x) = sgn((w*·x) + b*).

For the approximately linearly separable case we "soften" the constraints: introduce slack variables ξ = (ξ1, ..., ξl)ᵀ, ξi ≥ 0, take ξi as a measure of how far the i-th constraint is violated, and penalise both ||w||² and Σ_{i=1..l} ξi.

min_{w,b,ξ}  (1/2)||w||² + C Σ_{i=1..l} ξi
s.t.         yi((w·xi) + b) ≥ 1 − ξi, i = 1,...,l
             ξi ≥ 0, i = 1,...,l

Σ_{i=1..l} ξi reflects the empirical risk, while ||w|| reflects the expressive power, so C trades the two off. This is the primal problem of the (soft-margin) SVC.
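A minimal sketch of the soft-margin objective, using the fact that at the optimum the slacks equal the hinge violations ξi = max(0, 1 − yi((w·xi) + b)); the data, C, and the candidate (w, b) are illustrative:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum_i xi_i, with xi_i the hinge violations."""
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * slacks.sum()

X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 0.0]); b = 0.0
print(soft_margin_objective(w, b, X, y, C=1.0))  # 0.5 + 1.0*0.5 = 1.0
```

Here the middle point falls inside the margin, so it contributes a nonzero slack; increasing C makes that violation more expensive relative to ||w||².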
The (generalized) linear support vector classification algorithm

1. Given the training set T = {(x1,y1), ..., (xl,yl)} ∈ (X × Y)^l, where xi ∈ X = Rⁿ, yi ∈ Y = {+1,−1}, i = 1,...,l.

2. Choose an appropriate penalty parameter C > 0, then construct and solve the optimisation problem

   min_α  (1/2) Σ_{i=1..l} Σ_{j=1..l} yi yj αi αj (xi·xj) − Σ_{j=1..l} αj
   s.t.   Σ_{i=1..l} yi αi = 0
          0 ≤ αi ≤ C, i = 1,...,l

   obtaining the solution α* = (α1*, ..., αl*)ᵀ.

3. Compute w* = Σ_{i=1..l} yi αi* xi; choose a component of α* with 0 < αj* < C and compute b* = yj − Σ_{i=1..l} yi αi* (xi·xj).

4. Construct the separating hyperplane (w*·x) + b* = 0 and the decision function f(x) = sgn((w*·x) + b*).
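Steps 3-4 can be sketched directly. The α values below are the analytic dual solution for this 2-point toy problem, used purely for illustration:

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
C = 1.0

w = (y * alpha) @ X                            # w* = sum_i yi * ai * xi
j = int(np.argmax((alpha > 0) & (alpha < C)))  # index of a free support vector
b = y[j] - sum(y[i] * alpha[i] * (X[i] @ X[j]) for i in range(len(y)))

def f(x):
    return np.sign(w @ x + b)  # f(x) = sgn((w* . x) + b*)

print(w, b)                    # w = [0.5, 0.5], b = 0.0
print(f(np.array([2.0, 0.0]))) # 1.0
```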

Non-linear Classification

What can we do if the boundary is nonlinear?
Idea: transform the data vectors to a space where the separator is linear.
Non-linear Classification

The transformation is often made to an infinite-dimensional space, usually a function space.
Example: x → cos(uᵀx)
Non-linear SVMs

- Transform x → φ(x)
- The linear algorithm depends only on x·xi, hence the transformed algorithm depends only on φ(x)·φ(xi)
- Use a kernel function K(xi,xj) such that K(xi,xj) = φ(xi)·φ(xj)

A quadratic separating curve in the input plane,

[w]1 + √2[w]2[x]1 + √2[w]3[x]2 + √2[w]4[x]1[x]2 + [w]5[x]1² + [w]6[x]2² + b = 0    (1)

corresponds, under the map

φ(x) = (1, √2[x]1, √2[x]2, √2[x]1[x]2, [x]1², [x]2²)ᵀ

to a separating hyperplane in the feature space with coordinates [X]1,...,[X]6:

[w]1[X]1 + [w]2[X]2 + [w]3[X]3 + [w]4[X]4 + [w]5[X]5 + [w]6[X]6 + b = 0

(w*  x)  b*  0，其中w*  ([w* ]1, [w* ]6 )T

[w* ]1  2[w* ]2[ x]1  2[w* ]3[ x]2  2[w* ]4[ x]1[ x]2  [w* ]5[ x]12  [w* ]6[ x]22  b  0

2015/4/8
Chap8 SVM Zhongzhi Shi
53

The dual problem in feature space:

min_α  (1/2) Σ_{i=1..l} Σ_{j=1..l} yi yj αi αj (φ(xi)·φ(xj)) − Σ_{j=1..l} αj
s.t.   Σ_{i=1..l} yi αi = 0
       0 ≤ αi ≤ C, i = 1,...,l

With

φ(xi) = (1, √2[xi]1, √2[xi]2, √2[xi]1[xi]2, [xi]1², [xi]2²)ᵀ
φ(xj) = (1, √2[xj]1, √2[xj]2, √2[xj]1[xj]2, [xj]1², [xj]2²)ᵀ

the inner product is

(φ(xi)·φ(xj)) = 1 + 2[xi]1[xj]1 + 2[xi]2[xj]2 + 2[xi]1[xi]2[xj]1[xj]2 + [xi]1²[xj]1² + [xi]2²[xj]2²    (2)

Having solved for α* = (α1*, ..., αl*)ᵀ, we obtain the separating hyperplane (w*·x) + b* = 0 in feature space with

w* = Σ_{i=1..l} yi αi* φ(xi)
b* = yj − Σ_{i=1..l} yi αi* (φ(xi)·φ(xj)),  j ∈ {j | 0 < αj* < C}

and the decision function

f(x) = sgn((w*·φ(x)) + b*)
     = sgn( Σ_{i=1..l} yi αi* (φ(xi)·φ(x)) + b* )

(( xi  x j ) 1)2

 [ xi ]12 [ x j ]12  [ xi ]22 [ x j ]22  1  2[ xi ]1[ x j ]1[ xi ]2 [ x j ]2
 2[ xi ]1[ x j ]1  2[ xi ]2 [ x j ]2
(3)

( ( xi )  ( x j ))  K ( xi , x j )  (( xi  x j ) 1)2

2015/4/8
Chap8 SVM Zhongzhi Shi
56
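The identity (φ(xi)·φ(xj)) = ((xi·xj) + 1)² can be verified numerically, which is the point of the kernel trick: the 6-dimensional map never has to be formed explicitly. A minimal sketch:

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map (1, sqrt(2)x1, sqrt(2)x2, sqrt(2)x1x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2)*x1, np.sqrt(2)*x2,
                     np.sqrt(2)*x1*x2, x1**2, x2**2])

def K(xi, xj):
    """Kernel computed directly in input space."""
    return (np.dot(xi, xj) + 1.0) ** 2

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])
print(np.dot(phi(xi), phi(xj)))  # 4.0
print(K(xi, xj))                 # 4.0
```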

(( xi  x j ) 1)2

l
1 l l
min
yi y j i j K  xi , x j     j


2 i 1 j 1
j 1
l
s.t.
 y
i 1
i
i
0
0   i  C , i  1,
*
*


(


1,
2015/4/8
l
l* )T，而决策函数
Chap8 SVM Zhongzhi Shi
57

l

f ( x)  sgn( yii K ( xi , x))  b* )
i 1
l

b*  y j   yii K ( xi , x j ) j { j | 0   *j  C}
i 1
K ( xi , x) － 核函数
2015/4/8
Chap8 SVM Zhongzhi Shi
58

A kernel implicitly defines a map Φ: x → φ(x) into a feature space, via

K(x, x′) = (φ(x)·φ(x′))



- Polynomial kernel: K(x, xi) = [(x·xi) + c]^q gives a q-th order polynomial classifier
- Gaussian (RBF) kernel: K(x, xi) = exp{ −|x − xi|² / σ² }
- Sigmoid kernel: K(x, xi) = tanh(κ(x·xi) + c)
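The three kernels above are one-liners. A sketch; the parameter defaults (c, q, σ, κ) are illustrative choices, not prescribed by the text:

```python
import numpy as np

def poly_kernel(x, xi, c=1.0, q=2):
    """[(x . xi) + c]^q"""
    return (np.dot(x, xi) + c) ** q

def rbf_kernel(x, xi, sigma=1.0):
    """exp(-|x - xi|^2 / sigma^2)"""
    d = x - xi
    return np.exp(-np.dot(d, d) / sigma**2)

def sigmoid_kernel(x, xi, kappa=1.0, c=-1.0):
    """tanh(kappa*(x . xi) + c)"""
    return np.tanh(kappa * np.dot(x, xi) + c)

x = np.array([1.0, 0.0]); xi = np.array([1.0, 1.0])
print(poly_kernel(x, xi))     # (1 + 1)^2 = 4.0
print(rbf_kernel(x, xi))      # exp(-1)
print(sigmoid_kernel(x, xi))  # tanh(0) = 0.0
```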

K ( x, y )  x  y

The kind of kernel represents the inner
product of two vector(point) in a feature
space of  n  d  1 dimension.


d
d

For example
2015/4/8
Chap8 SVM Zhongzhi Shi
61
SVM optimization algorithms

- The kernel matrix must be computed and stored; when the number of sample points is large, this requires a great deal of memory.
- The quadratic optimisation in SVM training involves a large amount of matrix computation, which usually dominates the training time.
- Edgar Osuna (Cambridge, MA) et al., in a paper at IEEE NNSP'97 on training support vector machines, proposed the decomposition algorithm for SVM: the original problem is decomposed into a sequence of smaller subproblems.
1. The chunking algorithm.
SMO uses the chunking and decomposition techniques, and pushes the decomposition idea to its extreme.
At each iteration, SMO selects two points in the feasible region and optimises the objective over just that pair of multipliers, which can be done analytically.
Experimental results:

[Table: per-class test results; the columns appear to be number of samples, number of errors, and accuracy (%):]

146   4  97.26
 83   0  100
137   3  97.81
 32   2  93.75
106   2  98.11
 90   1  98.89
 34   1  97.06
 87   2  97.70
111   2  98.20
 40   1  97.50
 91   1  98.90
  3   0  100
 24   2  91.67
Total: 984  21  97.87
SMO with kernel caching

SMO optimises the objective over only two sample vectors per iteration, so the corresponding kernel values can be cached and reused across iterations.
[Table: effect of kernel cache size on training time; the columns appear to be cache size, number of training samples, cached kernel block, number of iterations, and training time (min:sec):]

  1  5624  5624*23    40726  7:06
 10  5624  5624*233   40726  3:50
 20  5624  5624*466   40726  2:41
 30  5624  5624*699   40726  1:56
 40  5624  5624*932   40726  1:29
 50  5624  5624*1165  40726  1:23
 60  5624  5624*1398  40726  1:08
 70  5624  5624*1631  40726  1:05
 80  5624  5624*1864  40726  1:04
 90  5624  5624*2097  40726  1:07
100  5624  5624*2330  40726  1:37
250  5624  5624*5624  40726  1:12
Samples at the bounds (alpha = 0 or alpha = C) tend not to be selected again by the heuristics, nor to be judged KKT violators, in later iterations.
• Thus, before SMO converges, points on the bounds can be removed from the training active set early.
• Before training starts, the active set is initialised to all training samples; each pass then optimises over the current active set.
• After all samples in the current active set have been checked, a new active set is produced and the process repeats.

2. Fixed working-set methods (Osuna et al.).
SVM applications

- Pattern recognition (e.g. text): features are word counts
- DNA array expression data analysis: features are expression levels in different conditions
- Protein classification: features are amino-acid (AA) composition
Handwritten Digits Recognition
Applying SVMs to Face Detection

The SVM face-detection system:
1. Rescale the input image several times.
2. Cut 19x19 window patterns out of the scaled image.
3. Preprocess the window: light correction and histogram equalization.
4. Classify the pattern using the SVM.
5. If the class corresponds to a face, draw a rectangle around the face in the output image.
Applying SVMs to Face Detection

Experimental results on static images:
- Set A: 313 high-quality images, same number of faces
- Set B: 23 mixed-quality images, a total of 155 faces
Applying SVMs to Face Detection

Extension to a real-time system: face detection on a PC-based color real-time system, with a skin-detection module implemented using SVMs.
References

- Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
- Andrew W. Moore. cmsc726: SVMs. http://www.cs.cmu.edu/~awm/tutorials
- C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):955-974, 1998. http://citeseer.nj.nec.com/burges98tutorial.html
- Vladimir Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
- Thorsten Joachims. A Statistical Learning Model of Text Classification for Support Vector Machines.
- Ben Rubinstein. Statistical Learning Theory. Dept. Computer Science & Software Engineering, University of Melbourne; and Division of Genetics & Bioinformatics, Walter & Eliza Hall Institute.
www.intsci.ac.cn/shizz/
Questions?!