Support Vector Machine
李旭斌 (LI Xubin), xmubingo@gmail.com
Data Mining Lab, 6/19/2012

Structural risk minimization, VC dimension, hyperplanes, the maximum-margin classifier, kernel functions, bla bla... The theory is complicated, so here we skip it and just use the tool.
Paper: What is a support vector machine?

What can it do?
Main uses:
- Classification: C-SVC, nu-SVC
- Regression: epsilon-SVR, nu-SVR
- Distribution estimation: one-class SVM
- Other: clustering

But we have plenty of software with friendly interfaces.

Which tools implement SVM?
- libSVM: Java, C, R, MATLAB, Python, Perl, C#... CUDA! Hadoop (Mahout)!
- WEKA / Weka-Parallel
- MATLAB SVM Toolbox
- Spider
- SVM in R
- GPU-accelerated LIBSVM

Examples for machine learning algorithms:
- Classification: SVM
- Regression: SVR
- Clustering: K-means
(Screenshots from MLDemos.)

Let's get back to libSVM.

Format of input
The format of the training and testing data files is:
<label> <index1>:<value1> <index2>:<value2> ...
Each line contains one instance and ends with a '\n' character. For classification, <label> is an integer class label (multi-class is supported). For regression, <label> is the target value and can be any real number. For one-class SVM, the label is unused and can be any number. Each <index>:<value> pair gives a feature (attribute) value: <index> is an integer starting from 1 and <value> is a real number.
Example (a loading/training sketch appears at the end of this part):
1 1:1 2:4 3:6 4:1
1 1:2 2:6 3:8 4:0
0 1:3 2:1 3:0 4:1

Parameters
Usage: svm-train [options] training_set_file [model_file]
options:
-s svm_type : set type of SVM (default 0)
  0 -- C-SVC
  1 -- nu-SVC
  2 -- one-class SVM
  3 -- epsilon-SVR
  4 -- nu-SVR
-t kernel_type : set type of kernel function (default 2)
  0 -- linear: u'*v
  1 -- polynomial: (gamma*u'*v + coef0)^degree
  2 -- radial basis function: exp(-gamma*|u-v|^2)
  3 -- sigmoid: tanh(gamma*u'*v + coef0)
  4 -- precomputed kernel (kernel values in training_set_file)
-d degree : set degree in kernel function (default 3)
-g gamma : set gamma in kernel function (default 1/num_features)
-r coef0 : set coef0 in kernel function (default 0)
-c cost : set the parameter C of C-SVC, epsilon-SVR, and nu-SVR (default 1)
-n nu : set the parameter nu of nu-SVC, one-class SVM, and nu-SVR (default 0.5)
-p epsilon : set the epsilon in the loss function of epsilon-SVR (default 0.1)
-m cachesize : set cache memory size in MB (default 100)
-e epsilon : set tolerance of termination criterion (default 0.001)
-h shrinking : whether to use the shrinking heuristics, 0 or 1 (default 1)
-b probability_estimates : whether to train an SVC or SVR model for probability estimates, 0 or 1 (default 0)
-wi weight : set the parameter C of class i to weight*C, for C-SVC (default 1)
-v n : n-fold cross-validation mode
-q : quiet mode (no outputs)

nu-SVC & C-SVC
"Basically they are the same thing but with different parameters. The range of C is from zero to infinity but nu is always between [0,1]. A nice property of nu is that it is related to the ratio of support vectors and the ratio of the training error."
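To make the data format and the svm-train options above concrete, here is a minimal sketch using libSVM's bundled Python interface (svmutil; in recent pip releases the same functions live under libsvm.svmutil). File names and parameter values are placeholders, not recommendations.

```python
# Minimal sketch: load libSVM-format files, train a C-SVC with an RBF
# kernel, and evaluate on a held-out set. Assumes libSVM's bundled
# Python interface; with recent pip packages: from libsvm.svmutil import ...
from svmutil import svm_read_problem, svm_train, svm_predict, svm_save_model

# Placeholder files in the "<label> <index>:<value> ..." format shown above.
y_train, x_train = svm_read_problem('train.txt')  # y: labels; x: list of {index: value} dicts
y_test, x_test = svm_read_problem('test.txt')

# -s 0: C-SVC, -t 2: RBF kernel; -c and -g as described in the option list.
model = svm_train(y_train, x_train, '-s 0 -t 2 -c 1 -g 0.25')

# Returns predicted labels, (accuracy, MSE, squared correlation), decision values.
p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
svm_save_model('train.model', model)
```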
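The quote in code form: the same data can be fit with either parameterization, and only the option string changes. A toy sketch with illustrative values:

```python
from svmutil import svm_train

# Toy data in the same dict format svm_read_problem produces.
y = [1, 1, -1, -1]
x = [{1: 1.0, 2: 4.0}, {1: 2.0, 2: 6.0}, {1: 3.0, 2: 1.0}, {1: 4.0, 2: 0.5}]

model_c = svm_train(y, x, '-s 0 -c 1 -q')    # C-SVC: C ranges over (0, infinity)
model_nu = svm_train(y, x, '-s 1 -n 0.5 -q') # nu-SVC: nu stays in (0, 1]
```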
one-class SVM
Fault diagnosis: the training set is made up of normal instances only, all labeled 1 (no -1). The test set contains instances of unknown status. The output label is 1 (normal) or -1 (anomalous). This is anomaly detection. (A sketch follows at the end of this part.)

epsilon-SVR & nu-SVR
Paper: LIBSVM: A Library for Support Vector Machines.
[Figure: side-by-side comparison of epsilon-SVR and nu-SVR fits.]

Related experience:
- Usage and grid search
- Code analysis
- Chinese version of the libSVM FAQ

libSVM Guide
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

Flowchart of a task
train --(svm-scale)--> train.scale --(svm-train)--> .model
test --(svm-scale)--> test.scale --(svm-predict, with .model)--> result
The training and test sets should both be scaled, with the same scaling parameters. Before that, ask: do you really need to scale them? (A Python sketch of the pipeline appears below.)

Parameters are important! Features are important! The model is also important!
[Figure: a poorly placed decision boundary, labeled "stupid line".]
Good parameters build a good model. How do we get "good" parameters?

Example
Train-set ROC:
C=2,  g=100: positive 83%, negative 85%
C=50, g=100: positive 86%, negative 91%

Parameter selection
- Grid search
- Particle swarm optimization
- Other algorithms
- Manual trial... random? My God!!
Our task here: classification. Goal: find the best (C, g).

Grid search
(See the grid-search sketch below.)

Parallel grid search
- SSH commands
- grid.py
- Hadoop-based: training SVM models with MapReduce

Particle Swarm Optimization (PSO) demo

Similar algorithms:
- Hill-climbing algorithm
- Genetic algorithm
- Ant colony optimization
- Simulated annealing

Let's get back to PSO.
Paper: Development of Particle Swarm Optimization Algorithm.

Particle Swarm Optimization
[Figure: particles in the (C, G) plane converging on (Cbest, Gbest), with the distance shrinking to 0; the metaphor is a flock of birds hunting food.]

PSO and parameter selection
PSO: find a point (C, G) that makes the distance between (C, G) and (Cbest, Gbest) shortest.
Parameter selection: find a pair (C, G) that makes the error rate lowest; the error estimate plays the role of the fitness function.

Update rules
Position of particle i: $X_i = (x_{i1}, x_{i2}, \ldots, x_{iN})^T$
Velocity of particle i: $V_i = (v_{i1}, v_{i2}, \ldots, v_{iN})^T$
Best position found by particle i: $P_i = (p_{i1}, p_{i2}, \ldots, p_{iN})^T$
Global best position: $P_g = (p_{g1}, p_{g2}, \ldots, p_{gN})^T$
Velocity update:
$v_{id}^{k+1} = w\, v_{id}^{k} + c_1\, \mathrm{rand}()\, (p_{id} - x_{id}^{k}) + c_2\, \mathrm{rand}()\, (p_{gd} - x_{id}^{k})$
Position update:
$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$
Linearly decreasing inertia weight:
$w = w_{\max} - (w_{\max} - w_{\min})\, \frac{iter}{iter_{\max}}$

Algorithm constants:
- Dimension: M = 2
- Number of particles: N = 20-50
- Space scope: 0 < X[d] < 1024, for 0 <= d < M
- Max speed: $v_{d,\max} = k\, x_{d,\max}$, with k = 0.1-0.2
- Acceleration factors: c1 = c2 = 2
Stop criteria:
- Max iterations (20)
- Threshold (0.03)
- Max dead-stop (stagnation) count (10)
(A PSO sketch follows below.)

Flowchart:
Begin -> initialize swarm, k = 0, i = 0 -> update w -> compute Score(i) for particle i -> if Score(i) > P_i, set P_i = Score(i) -> update X_i, V_i -> i = i + 1; repeat while i < N -> update global best P_g -> if the stop criteria are not satisfied, k = k + 1 and loop; otherwise output P_g -> end.

Example
There is a problem.
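A sketch of the one-class setup described above: train on normal instances only (one-class SVM ignores the labels), and predictions come back as +1 (normal) or -1 (anomalous). The -n and -g values are illustrative.

```python
from svmutil import svm_train, svm_predict

# Training set: normal instances only. One-class SVM ignores labels,
# so +1 is used as a placeholder.
x_normal = [{1: 0.9, 2: 1.1}, {1: 1.0, 2: 1.0}, {1: 1.1, 2: 0.9}, {1: 1.0, 2: 1.2}]
y_normal = [1] * len(x_normal)

# -s 2: one-class SVM; -n bounds the fraction of training points
# allowed to fall outside the learned region.
model = svm_train(y_normal, x_normal, '-s 2 -t 2 -n 0.1 -g 0.5 -q')

# Test instances of unknown status; dummy labels are passed only because
# svm_predict reports a (here meaningless) accuracy.
x_test = [{1: 1.0, 2: 1.0}, {1: 5.0, 2: -3.0}]
labels, _, _ = svm_predict([0] * len(x_test), x_test, model)
print(labels)  # e.g. [1.0, -1.0]: first point normal, second anomalous
```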
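The scale/train/predict flowchart, in code. On the command line this is svm-scale (with -s to save and -r to reuse the scaling parameters), then svm-train, then svm-predict; the sketch below hand-rolls the min-max scaling to stress the one point that matters: the test set must be scaled with parameters fitted on the training set.

```python
from svmutil import svm_read_problem, svm_train, svm_predict

def fit_scaler(x, lower=-1.0, upper=1.0):
    """Collect per-feature min/max from the training data (what svm-scale fits)."""
    lo, hi = {}, {}
    for row in x:
        for idx, val in row.items():
            lo[idx] = min(val, lo.get(idx, val))
            hi[idx] = max(val, hi.get(idx, val))
    return lo, hi, lower, upper

def scale(x, params):
    """Map each feature into [lower, upper] using the fitted min/max."""
    lo, hi, lower, upper = params
    out = []
    for row in x:
        scaled = {}
        for idx, val in row.items():
            if idx in lo and hi[idx] > lo[idx]:
                scaled[idx] = lower + (upper - lower) * (val - lo[idx]) / (hi[idx] - lo[idx])
            else:
                scaled[idx] = val
        out.append(scaled)
    return out

y_tr, x_tr = svm_read_problem('train.txt')  # placeholder file names
y_te, x_te = svm_read_problem('test.txt')

params = fit_scaler(x_tr)                              # fit scaling on the training set...
x_tr, x_te = scale(x_tr, params), scale(x_te, params)  # ...and apply it to BOTH sets

model = svm_train(y_tr, x_tr, '-c 2 -g 0.5 -q')
_, acc, _ = svm_predict(y_te, x_te, model)
```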
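A serial version of what grid.py automates: sweep exponentially spaced (C, gamma) pairs and keep the best 5-fold cross-validation accuracy (with -v, svm_train returns that accuracy instead of a model). The grid ranges below follow grid.py-style defaults but are only illustrative.

```python
from svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.scale')  # placeholder: a scaled training file

best = (-1.0, None, None)
for log2c in range(-5, 16, 2):          # C = 2^-5 ... 2^15
    for log2g in range(-15, 4, 2):      # gamma = 2^-15 ... 2^3
        c, g = 2.0 ** log2c, 2.0 ** log2g
        # With -v, svm_train returns cross-validation accuracy, not a model.
        acc = svm_train(y, x, '-c %g -g %g -v 5 -q' % (c, g))
        if acc > best[0]:
            best = (acc, c, g)

print('best CV accuracy %.2f%% at C=%g, gamma=%g' % best)
```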
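Finally, a sketch of the PSO update rules and constants above, specialized to the two-dimensional (C, gamma) search with cross-validation accuracy as the fitness score. The slides fix M, N, the space scope, v_max, and c1 = c2 = 2; the inertia bounds w_max = 0.9 and w_min = 0.4 are typical values assumed here, since the slides give only the linear-decrease formula. The threshold and dead-stop criteria are omitted for brevity.

```python
import random
from svmutil import svm_read_problem, svm_train

y, x = svm_read_problem('train.scale')  # placeholder: a scaled training file

def fitness(pos):
    """Score a (C, gamma) pair by 5-fold cross-validation accuracy."""
    return svm_train(y, x, '-c %g -g %g -v 5 -q' % (pos[0], pos[1]))

M, N = 2, 20                      # dimension (C, gamma) and swarm size
X_MAX = 1024.0                    # space scope: 0 < X[d] < 1024
V_MAX = 0.1 * X_MAX               # v_max = k * x_max, k in [0.1, 0.2]
C1 = C2 = 2.0                     # acceleration factors
W_MAX, W_MIN = 0.9, 0.4           # assumed inertia bounds (not on the slide)
ITER_MAX = 20

pos = [[random.uniform(1e-6, X_MAX) for _ in range(M)] for _ in range(N)]
vel = [[0.0] * M for _ in range(N)]
pbest = [p[:] for p in pos]
pbest_score = [fitness(p) for p in pos]
best_i = max(range(N), key=lambda i: pbest_score[i])
gbest, gbest_score = pbest[best_i][:], pbest_score[best_i]

for it in range(ITER_MAX):
    w = W_MAX - (W_MAX - W_MIN) * it / ITER_MAX   # linearly decreasing inertia
    for i in range(N):
        for d in range(M):
            vel[i][d] = (w * vel[i][d]
                         + C1 * random.random() * (pbest[i][d] - pos[i][d])
                         + C2 * random.random() * (gbest[d] - pos[i][d]))
            vel[i][d] = max(-V_MAX, min(V_MAX, vel[i][d]))            # clamp speed
            pos[i][d] = max(1e-6, min(X_MAX, pos[i][d] + vel[i][d]))  # stay in scope
        score = fitness(pos[i])
        if score > pbest_score[i]:
            pbest[i], pbest_score[i] = pos[i][:], score
            if score > gbest_score:
                gbest, gbest_score = pos[i][:], score

print('best (C, gamma) = (%g, %g), CV accuracy %.2f%%'
      % (gbest[0], gbest[1], gbest_score))
```

In practice it usually works better to search over log2(C) and log2(gamma), as grid.py does; the raw (0, 1024] scope above simply follows the slide's constants.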
Discussion

Thank you for your attention!