
Support Vector Machine
李旭斌(LI Xubin)
xmubingo@gmail.com
Data Mining Lab.
6/19/2012
Structural risk minimization
VC dimension
Hyperplane
Maximum Margin Classifier
Kernel functions, bla bla…
The theory is so complicated…
No theory here; just use it.
Paper: What is a support vector machine?
What can it do?
Main usage:
Classification: C-SVC, nu-SVC
Regression: epsilon-SVR, nu-SVR
Distribution estimation: one-class SVM
Other:
clustering
But we have many software packages with friendly interfaces.
Who implements SVM?
libSVM
Java, C, R, MATLAB, Python, Perl, C#...CUDA!
Hadoop(Mahout)!
WEKA
Weka-Parallel
MATLAB SVM Toolbox
Spider
SVM in R
GPU-accelerated LIBSVM
Examples for Machine Learning Algorithms
Classification
SVM
Regression
SVR
Clustering
K-means
Screenshots from MLDemos.
Let’s get back to libSVM
Format of input
The format of training and testing data file is:
<label> <index1>:<value1> <index2>:<value2> ...
...
Each line contains an instance and is ended by a '\n' character. For classification, <label> is an integer indicating the class label (multi-class is supported). For regression, <label> is the target value, which can be any real number. For one-class SVM, it is not used, so it can be any number. The pair <index>:<value> gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is a real number.
Example:
1 1:1 2:4 3:6 4:1
1 1:2 2:6 3:8 4:0
0 1:3 2:1 3:0 4:1
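A small sketch of producing this format from plain Python lists and reading it back, assuming the libsvm package from PyPI (file names are placeholders):

# Write a dense label/feature matrix in libSVM's sparse format.
from libsvm.svmutil import svm_read_problem

labels   = [1, 1, 0]
features = [[1, 4, 6, 1], [2, 6, 8, 0], [3, 1, 0, 1]]

with open('train.txt', 'w') as f:
    for label, row in zip(labels, features):
        # Indices start at 1; zero-valued features may simply be omitted.
        pairs = ' '.join(f'{i}:{v}' for i, v in enumerate(row, start=1) if v != 0)
        f.write(f'{label} {pairs}\n')

y, x = svm_read_problem('train.txt')  # y: label list, x: list of {index: value} dicts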
Parameters
Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
    0 -- C-SVC
    1 -- nu-SVC
    2 -- one-class SVM
    3 -- epsilon-SVR
    4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
    0 -- linear: u'*v
    1 -- polynomial: (gamma*u'*v + coef0)^degree
    2 -- radial basis function: exp(-gamma*|u-v|^2)
    3 -- sigmoid: tanh(gamma*u'*v + coef0)
    4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross validation mode
-q : quiet mode (no outputs)
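The same options are passed as a string through libSVM's Python interface; a minimal sketch, assuming the libsvm package from PyPI and pre-scaled files:

# Train a C-SVC with an RBF kernel; the option string mirrors the
# svm-train flags above. File names and parameter values are assumptions.
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict, svm_save_model

y, x = svm_read_problem('train.scale')
model = svm_train(y, x, '-s 0 -t 2 -c 1 -g 0.5')
svm_save_model('svm.model', model)

y_test, x_test = svm_read_problem('test.scale')
labels, accuracy, values = svm_predict(y_test, x_test, model)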
nu-SVC & C-SVC
“Basically they are the same thing but with different parameters. The range of C is from zero to infinity but nu is always between [0,1]. A nice property of nu is that it is related to the ratio of support vectors and the ratio of the training error.”
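A hedged sketch of the difference in practice (same data, only the option string changes; the libsvm PyPI package, file name, and parameter values are assumptions):

# C-SVC takes C in (0, inf); nu-SVC takes nu in [0, 1].
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.scale')
model_c  = svm_train(y, x, '-s 0 -c 1 -t 2 -g 0.5')    # C-SVC
model_nu = svm_train(y, x, '-s 1 -n 0.5 -t 2 -g 0.5')  # nu-SVC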
one-class SVM
Fault diagnosis:
The training set is made up of normal instances only.
Label: 1 (no -1)
The test set contains instances of unknown status.
Output label: 1 or -1
1 : normal
-1 : anomalous
In other words, anomaly detection.
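A minimal sketch of this fault-diagnosis setup, assuming the libsvm package from PyPI (file names and the nu/gamma values are assumptions):

# Train a one-class SVM (-s 2) on normal instances only, then label
# unknown instances as +1 (normal) or -1 (anomalous).
from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

y, x = svm_read_problem('normal.txt')                # all labels are 1
model = svm_train(y, x, '-s 2 -t 2 -n 0.1 -g 0.5')

y_unk, x_unk = svm_read_problem('unknown.txt')
labels, _, _ = svm_predict(y_unk, x_unk, model)      # each label is +1 or -1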
epsilon-SVR & nu-SVR
Paper:
LIBSVM: A Library for Support Vector Machines
Comparison
[Figure: plots comparing the epsilon-SVR and nu-SVR formulations (epsilon vs. nu).]
Related experience
Usage and grid search
Code Analysis
Chinese version of libSVM FAQ
libSVM Guide
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
Flowchart of Task
train -> svm-scale -> train.scale -> svm-train -> .model
test -> svm-scale -> test.scale -> svm-predict (with the .model) -> result
The train set and test set should both be scaled. But before that, do you really need to scale them?
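A minimal end-to-end sketch of this pipeline, assuming the libSVM command-line tools are on the PATH and 'train'/'test' exist in libSVM format (the C and gamma values are placeholders):

# scale -> train -> predict, exactly as in the flowchart.
import subprocess

def run(cmd, stdout=None):
    subprocess.run(cmd, stdout=stdout, check=True)

# Scale training data to [-1, 1]; save the scaling parameters to 'range'.
with open('train.scale', 'w') as f:
    run(['svm-scale', '-l', '-1', '-u', '1', '-s', 'range', 'train'], stdout=f)

# Scale the test data with the SAME parameters (-r range), never its own.
with open('test.scale', 'w') as f:
    run(['svm-scale', '-r', 'range', 'test'], stdout=f)

run(['svm-train', '-c', '2', '-g', '0.5', 'train.scale', 'train.model'])
run(['svm-predict', 'test.scale', 'train.model', 'result'])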
Parameters are important!
Features are important!
The model is also important!
An obvious line: good parameters build a good model.
How do we get the ‘good’ parameters?
Example
Train set. ROC?
C=2, g=100: positive 83%, negative 85%
C=50, g=100: positive 86%, negative 91%
Parameter Selection
Grid Search
Particle Swarm Optimization
Other algorithms
Manual trial… at random? My God!!
Now, our task:
Type: classification.
Goal: find the best (C, G), i.e. the cost C and the kernel gamma.
Grid Search
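A minimal grid-search sketch, assuming the libsvm package from PyPI; with '-v 5', svm_train returns the 5-fold cross-validation accuracy instead of a model (the exponent ranges below follow common grid.py defaults):

# Exhaustive search over (C, gamma) on a log2 grid, scored by 5-fold CV.
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.scale')   # file name is an assumption

best = (None, None, -1.0)
for log2c in range(-5, 16, 2):           # C = 2^-5 ... 2^15
    for log2g in range(-15, 4, 2):       # gamma = 2^-15 ... 2^3
        acc = svm_train(y, x, f'-c {2**log2c} -g {2**log2g} -v 5 -q')
        if acc > best[2]:
            best = (2.0**log2c, 2.0**log2g, acc)

print('best C=%g, gamma=%g, CV accuracy=%.2f%%' % best)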
Parallel Grid Search
SSH Command
grid.py
Hadoop-based:
Training the SVM model with MapReduce
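On a single multi-core machine, the same search parallelizes with a process pool; a hedged sketch (grid.py, SSH, and the Hadoop approach distribute it across machines instead):

# Each worker scores one (C, gamma) pair by 5-fold cross-validation.
from itertools import product
from multiprocessing import Pool

from libsvm.svmutil import svm_read_problem, svm_train

def cv_accuracy(params):
    c, g = params
    y, x = svm_read_problem('train.scale')   # re-read per worker process
    return c, g, svm_train(y, x, f'-c {c} -g {g} -v 5 -q')

if __name__ == '__main__':
    grid = list(product([2.0**k for k in range(-5, 16, 2)],
                        [2.0**k for k in range(-15, 4, 2)]))
    with Pool() as pool:
        results = pool.map(cv_accuracy, grid)
    print(max(results, key=lambda r: r[2]))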
Particle Swarm Optimization (PSO)
demo
Similar Algorithms
Hill-climbing algorithm
Genetic algorithm
Ant colony optimization
Simulated annealing algorithm
Let’s get back to PSO.
Paper: Development of Particle Swarm Optimization Algorithm
Particle Swarm Optimization
[Figure: particles scattered over the (C, G) plane; each particle's distance to the optimum (Cbest, Gbest) is what the swarm drives toward 0. Analogy: a flock of birds hunting for food.]
PSO and Parameter Selection
PSO: find a point (C, G) that makes the distance between (C, G) and (Cbest, Gbest) shortest.
Parameter selection: find a pair (C, G) that makes the error rate lowest.
The estimate (fitness) function is what maps one problem onto the other.
Position of particle i: $X_i = (x_{i1}, x_{i2}, \ldots, x_{iN})^T$
Speed: $V_i = (v_{i1}, v_{i2}, \ldots, v_{iN})^T$
Particle i's best: $P_i = (p_{i1}, p_{i2}, \ldots, p_{iN})^T$
Global best: $P_g = (p_{g1}, p_{g2}, \ldots, p_{gN})^T$
Update rule:
Speed update: $v_{id}^{k+1} = w\, v_{id}^{k} + c_1 \cdot \mathrm{rand}() \cdot (p_{id} - x_{id}^{k}) + c_2 \cdot \mathrm{rand}() \cdot (p_{gd} - x_{id}^{k})$
Position update: $x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$
Inertia weight: $w = w_{\max} - \frac{w_{\max} - w_{\min}}{iter_{\max}} \cdot iter$
[Figure: vector diagram of one update step, combining $V^k$, $X_{pBest}$, and $X_{gBest}$ to move $X^k$ to $X^{k+1}$.]
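As a hedged illustration, the update rule transcribes almost line for line into code (a sketch, not a tuned implementation; the w_max/w_min defaults are common choices, not from the slides):

# One PSO update of a single particle; w decays linearly from w_max to
# w_min over the iterations, as in the inertia-weight formula above.
import random

def pso_step(x, v, p_best, g_best, k, iter_max,
             w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    """x, v, p_best, g_best are equal-length lists (one entry per dimension)."""
    w = w_max - (w_max - w_min) / iter_max * k
    for d in range(len(x)):
        v[d] = (w * v[d]
                + c1 * random.random() * (p_best[d] - x[d])
                + c2 * random.random() * (g_best[d] - x[d]))
        x[d] += v[d]
    return x, v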
Algorithm constants
•Dimension: M = 2
•Number of particles: N = 20-50
•Space scope: 0 < X[i] < 1024 for each dimension i
•Max speed: $v_{d,\max} = k \cdot x_{d,\max}$, with $0.1 \le k \le 0.2$
•Speedup (acceleration) factors: c1 = c2 = 2
Stop criteria
•Max iterations (20)
•Threshold (0.03)
•Max dead-stop count (10)
Begin
1. Initialize the swarm; let k = 0, i = 0.
2. Update w.
3. Calculate the score of particle i.
4. If Score(i) > Pi, set Pi = Score(i).
5. Update Xi and Vi; set i = i + 1.
6. If i < N, go back to step 3.
7. Update the global best Pg.
8. If the stop criteria are not satisfied, set k = k + 1, reset i = 0, and go back to step 2.
9. Output Pg.
Over
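Putting the flowchart, the update rule, and the constants together: a sketch of PSO-based selection of (C, gamma), scoring each particle by 5-fold cross-validation accuracy (the libsvm PyPI package, the file name, and the lower bound of the search space are assumptions):

# PSO over (C, gamma); fitness = 5-fold CV accuracy from svm_train '-v 5'.
import random
from libsvm.svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.scale')

def score(pos):
    c, g = pos
    return svm_train(y, x, f'-c {c} -g {g} -v 5 -q')

N, M, ITER_MAX = 20, 2, 20
LO, HI = 1e-3, 1024.0            # C and gamma must stay positive
V_MAX = 0.15 * HI                # max speed with k = 0.15 in [0.1, 0.2]
W_MAX, W_MIN, C1, C2 = 0.9, 0.4, 2.0, 2.0

X = [[random.uniform(LO, HI) for _ in range(M)] for _ in range(N)]
V = [[0.0] * M for _ in range(N)]
P = [row[:] for row in X]                         # per-particle bests
P_score = [score(p) for p in P]
g_idx = max(range(N), key=lambda i: P_score[i])   # global best index

for k in range(ITER_MAX):
    w = W_MAX - (W_MAX - W_MIN) / ITER_MAX * k
    for i in range(N):
        for d in range(M):
            V[i][d] = (w * V[i][d]
                       + C1 * random.random() * (P[i][d] - X[i][d])
                       + C2 * random.random() * (P[g_idx][d] - X[i][d]))
            V[i][d] = max(-V_MAX, min(V_MAX, V[i][d]))   # clamp speed
            X[i][d] = max(LO, min(HI, X[i][d] + V[i][d]))  # stay in the space scope
        s = score(X[i])
        if s > P_score[i]:
            P[i], P_score[i] = X[i][:], s
    g_idx = max(range(N), key=lambda i: P_score[i])

print('best (C, gamma) =', P[g_idx], 'CV accuracy =', P_score[g_idx])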
Example
There is a problem.
Discussion
Thank you for your attention!