Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn

Overview
- Support Vector Machines

Linear Classifier
A linear classifier assigns a label to a sample $x$ according to
    $f(x, w, b) = \mathrm{sign}(g(x)) = \mathrm{sign}(w \cdot x + b)$, where $w \cdot x = \sum_{i=1}^{n} w_i x_i$.
The hyperplane $w \cdot x + b = 0$ separates the region where $w \cdot x + b > 0$ from the region where $w \cdot x + b < 0$.
Just in case: for any two points $x_1, x_2$ on the hyperplane, $w \cdot x_1 + b = 0$ and $w \cdot x_2 + b = 0$, so $w \cdot (x_1 - x_2) = 0$; the weight vector $w$ is orthogonal to the hyperplane.

Distance to Hyperplane
Let $g(x) = w \cdot x + b$ and write $x = x' + M \frac{w}{\|w\|}$, where $x'$ is the projection of $x$ onto the hyperplane and $M = \|x - x'\|$ is the distance from $x$ to it. Since $w \cdot x' + b = 0$,
    $g(x) = w \cdot \left(x' + M \frac{w}{\|w\|}\right) + b = (w \cdot x' + b) + M \frac{w \cdot w}{\|w\|} = M \|w\|$,
so the distance from $x$ to the hyperplane is $M = \frac{|g(x)|}{\|w\|}$. In particular, the distance from the origin to the hyperplane is $\frac{|b|}{\|w\|}$.

Selection of Classifiers
Which classifier is the best? All of the candidate boundaries have the same training error; the real question is how well they generalize.

Unknown Samples
[Figure: classifiers A and B applied to unseen samples.] Classifier B divides the space more consistently (it is not biased towards either class).

Margins
[Figure: a separating hyperplane with its margin; the samples lying on the margin boundaries are the support vectors.]
The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. Intuitively, it is safer to choose a classifier with a larger margin:
- A wider margin gives a wider buffer zone for mistakes.
- The hyperplane is decided by only a few data points, the support vectors; the others can be discarded.
Selecting the classifier with the maximum margin gives the Linear Support Vector Machine (LSVM), which works very well in practice. How can the margin be specified formally?

Margin Width
Scale $w$ and $b$ so that the closest positive sample $x^+$ and the closest negative sample $x^-$ satisfy $w \cdot x^+ + b = +1$ and $w \cdot x^- + b = -1$. The margin width is then
    $M = \frac{2}{\|w\|}$.

Objective Function
Correctly classify all data points:
    $w \cdot x_i + b \ge +1$ if $y_i = +1$
    $w \cdot x_i + b \le -1$ if $y_i = -1$
or, equivalently, $y_i (w \cdot x_i + b) - 1 \ge 0$.
Maximizing the margin $M = \frac{2}{\|w\|}$ is the same as minimizing $\frac{1}{2} w^T w$, which gives a quadratic optimization problem:
    Minimize  $t(w) = \frac{1}{2} w \cdot w$
    Subject to  $y_i (w \cdot x_i + b) \ge 1$ for all $i$.

Lagrange Multipliers
Introduce multipliers $\alpha_i \ge 0$ and form the primal Lagrangian
    $L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{l} \alpha_i y_i (w \cdot x_i + b) + \sum_{i=1}^{l} \alpha_i$.
Setting the derivatives to zero gives
    $\frac{\partial L_P}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{l} \alpha_i y_i x_i$,
    $\frac{\partial L_P}{\partial b} = 0 \Rightarrow \sum_{i=1}^{l} \alpha_i y_i = 0$.

Dual Problem
Substituting these conditions back into $L_P$ gives the dual
    $L_D = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j = \sum_{i} \alpha_i - \frac{1}{2} \alpha^T H \alpha$,  where  $H_{ij} = y_i y_j \, x_i \cdot x_j$,
    subject to  $\sum_{i} \alpha_i y_i = 0$  and  $\alpha_i \ge 0$ for all $i$.
This is a quadratic problem again!

Solutions of w & b
Support vectors are the samples with positive $\alpha_i$; the decision function is
    $g(x) = \sum_{i=1}^{l} \alpha_i y_i \, x_i \cdot x + b$.
For any support vector $x_s$ we have $y_s (x_s \cdot w + b) = 1$, i.e. $y_s \left( \sum_{m \in S} \alpha_m y_m \, x_m \cdot x_s + b \right) = 1$. Multiplying both sides by $y_s$ and using $y_s^2 = 1$ gives
    $b = y_s - \sum_{m \in S} \alpha_m y_m \, x_m \cdot x_s$.
In practice $b$ is averaged over all $N_s$ support vectors:
    $b = \frac{1}{N_s} \sum_{s \in S} \left( y_s - \sum_{m \in S} \alpha_m y_m \, x_m \cdot x_s \right)$.

An Example
Training set: $x_1 = (1, 1)$ with $y_1 = +1$ and $x_2 = (0, 0)$ with $y_2 = -1$. The matrix $H_{ij} = y_i y_j \, x_i \cdot x_j$ is
    $H = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$,
so the dual is $L_D = \alpha_1 + \alpha_2 - \alpha_1^2$ subject to $\alpha_1 - \alpha_2 = 0$. Hence $\alpha_1 = \alpha_2$ and $L_D = 2\alpha_1 - \alpha_1^2$, which is maximized at $\alpha_1 = \alpha_2 = 1$. Then
    $w = \sum_{i} \alpha_i y_i x_i = 1 \cdot 1 \cdot [1, 1] + 1 \cdot (-1) \cdot [0, 0] = [1, 1]$,
    $b = y_1 - w \cdot x_1 = 1 - 2 = -1$,
    $g(x) = w \cdot x + b = x_1 + x_2 - 1$  (writing $x = (x_1, x_2)$),
and the margin is $M = \frac{2}{\|w\|} = \frac{2}{\sqrt{2}} = \sqrt{2}$.

Soft Margin
[Figure: samples $e_2$, $e_7$ and $e_{11}$ fall inside the margin or on the wrong side of the boundary.]
When the data are noisy, slack variables $\xi_i \ge 0$ relax the constraints:
    $y_i (w \cdot x_i + b) \ge 1 - \xi_i$,  $\xi_i \ge 0$,
and the objective becomes
    $t(w) = \frac{1}{2} w \cdot w + C \sum_{i} \xi_i$,
where $C$ controls the trade-off between a wide margin and few margin violations. The primal Lagrangian is
    $L_P = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i - \sum_{i=1}^{l} \alpha_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{l} \mu_i \xi_i$.
Setting the derivatives to zero gives
    $\frac{\partial L_P}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{l} \alpha_i y_i x_i$,
    $\frac{\partial L_P}{\partial b} = 0 \Rightarrow \sum_{i=1}^{l} \alpha_i y_i = 0$,
    $\frac{\partial L_P}{\partial \xi_i} = 0 \Rightarrow C = \alpha_i + \mu_i$.
The dual keeps the same form as before,
    $L_D = \sum_{i} \alpha_i - \frac{1}{2} \alpha^T H \alpha$,
now subject to $0 \le \alpha_i \le C$ and $\sum_{i} \alpha_i y_i = 0$.
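The numbers in the worked example above can be reproduced with a few lines of code. The sketch below is only an illustration, assuming Python with NumPy and scikit-learn (neither of which is prescribed by the lecture); a very large $C$ is used so that the soft-margin solver behaves essentially like the hard-margin formulation.

```python
# Minimal sketch (not part of the lecture): reproduce the two-point
# example numerically.  A very large C approximates the hard-margin SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0],    # positive sample, y = +1
              [0.0, 0.0]])   # negative sample, y = -1
y = np.array([1, -1])

clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                      # expected: [1, 1]
b = clf.intercept_[0]                 # expected: -1
margin = 2.0 / np.linalg.norm(w)      # expected: 2 / sqrt(2) = sqrt(2)

print("w =", w, " b =", b, " margin =", margin)
print("support vectors:", clf.support_vectors_)  # both points are support vectors
print("y_i * alpha_i:", clf.dual_coef_)          # magnitudes should be 1, as derived by hand
# Decision function g(x) = w.x + b = x1 + x2 - 1, matching the example above.
```

Both training points come out as support vectors with $\alpha_1 = \alpha_2 = 1$, in agreement with the hand calculation.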
Non-linear SVMs
[Figure: one-dimensional samples around $x = 0$ that no single threshold on $x$ can separate become linearly separable after mapping $x \to (x, x^2)$.]

Feature Space
A non-linear mapping $\Phi: x \to \varphi(x)$ can turn a non-linear decision boundary in the input space into a linear one in the feature space. For example, the circle $x_1^2 + x_2^2 = r^2$ in the $(x_1, x_2)$ plane becomes a straight line in the $(x_1^2, x_2^2)$ plane.
[Figure: a data set that is not linearly separable in the input space becomes linearly separable after the mapping $\Phi: x \to \varphi(x)$.]

Quadratic Basis Functions
One such mapping for $x = (x_1, \ldots, x_m)$ is
    $\varphi(x) = \left( 1,\; \sqrt{2}\,x_1, \ldots, \sqrt{2}\,x_m,\; x_1^2, \ldots, x_m^2,\; \sqrt{2}\,x_1 x_2, \sqrt{2}\,x_1 x_3, \ldots, \sqrt{2}\,x_{m-1} x_m \right)$,
consisting of a constant term, linear terms, pure quadratic terms and quadratic cross-terms. The number of terms is
    $\binom{m+2}{2} = \frac{(m+2)(m+1)}{2} \approx \frac{m^2}{2}$.

Calculation of Φ(xi)·Φ(xj)
For two samples $a$ and $b$, multiplying the feature vectors term by term gives
    $\varphi(a) \cdot \varphi(b) = 1 + 2 \sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + 2 \sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i a_j b_i b_j$.

It turns out …
    $(a \cdot b + 1)^2 = (a \cdot b)^2 + 2\, a \cdot b + 1 = \left( \sum_{i=1}^{m} a_i b_i \right)^2 + 2 \sum_{i=1}^{m} a_i b_i + 1$
    $= \sum_{i=1}^{m} \sum_{j=1}^{m} a_i b_i a_j b_j + 2 \sum_{i=1}^{m} a_i b_i + 1 = \sum_{i=1}^{m} (a_i b_i)^2 + 2 \sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i b_i a_j b_j + 2 \sum_{i=1}^{m} a_i b_i + 1$,
which is exactly $\varphi(a) \cdot \varphi(b)$. Therefore
    $K(a, b) = (a \cdot b + 1)^2 = \varphi(a) \cdot \varphi(b)$,
but evaluating the explicit inner product costs $O(m^2)$ while evaluating the kernel costs only $O(m)$.

Kernel Trick
The linear classifier relies only on dot products between vectors, $x_i \cdot x_j$. If every data point is mapped into a high-dimensional space via some transformation $\Phi: x \to \varphi(x)$, the dot product becomes $\varphi(x_i) \cdot \varphi(x_j)$. A kernel function is a function that corresponds to an inner product in some expanded feature space:
    $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$.
Example: for $x = [x_1, x_2]$ and $K(x_i, x_j) = (1 + x_i \cdot x_j)^2$,
    $K(x_i, x_j) = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$
    $= [1,\; x_{i1}^2,\; \sqrt{2}\,x_{i1} x_{i2},\; x_{i2}^2,\; \sqrt{2}\,x_{i1},\; \sqrt{2}\,x_{i2}] \cdot [1,\; x_{j1}^2,\; \sqrt{2}\,x_{j1} x_{j2},\; x_{j2}^2,\; \sqrt{2}\,x_{j1},\; \sqrt{2}\,x_{j2}]$
    $= \varphi(x_i) \cdot \varphi(x_j)$,  where  $\varphi(x) = [1,\; x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2]$.

Kernels
    Polynomial:  $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$
    Gaussian:  $K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right)$
    Hyperbolic tangent:  $K(x_i, x_j) = \tanh(x_i \cdot x_j + c)$

String Kernel
Kernels can also measure the similarity between non-vector data such as text strings (e.g. "car" vs. "custard"). For example, a string kernel gives $K(\mathrm{car}, \mathrm{cat}) = 4$, which can be normalized using the self-similarities $K(\mathrm{car}, \mathrm{car})$ and $K(\mathrm{cat}, \mathrm{cat})$.

Solutions of w & b
With a kernel, the weight vector lives in the feature space:
    $w = \sum_{i=1}^{l} \alpha_i y_i \varphi(x_i)$,
    $w \cdot \varphi(x_j) = \sum_{i=1}^{l} \alpha_i y_i \varphi(x_i) \cdot \varphi(x_j) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x_j)$,
    $b = \frac{1}{N_s} \sum_{s \in S} \left( y_s - \sum_{m \in S} \alpha_m y_m \varphi(x_m) \cdot \varphi(x_s) \right) = \frac{1}{N_s} \sum_{s \in S} \left( y_s - \sum_{m \in S} \alpha_m y_m K(x_m, x_s) \right)$.
The decision function becomes
    $g(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$,
in place of the linear form $g(x) = w \cdot x + b = \sum_{i=1}^{l} \alpha_i y_i \, x_i \cdot x + b$. Note that $w$ itself never has to be computed explicitly.

Decision Boundaries
[Figure: non-linear decision boundaries produced by kernel SVMs.]

More Maths …

SVM Roadmap
- Linear classifier → maximum margin → linear SVM
- Noise → soft margin
- Non-linear problem → replace $a \cdot b$ with $\Phi(a) \cdot \Phi(b)$
- High computational cost → kernel trick: $K(a, b) = \Phi(a) \cdot \Phi(b)$

Reading Materials
Text book:
- Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
Online resources:
- http://www.kernel-machines.org/
- http://www.support-vector-machines.org/
- http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
- http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- A list of papers uploaded to the web learning portal
- Wikipedia & Google

Review
- What is the definition of the margin of a linear classifier?
- Why do we want to maximize the margin?
- What is the mathematical expression for the margin?
- How is the SVM objective function solved?
- What are support vectors?
- What is the soft margin?
- How does the SVM solve non-linear problems?
- What is the so-called "kernel trick"?
- What are the commonly used kernels?

Next Week's Class Talk
Volunteers are required for next week's class talk.
Topic: SVM in Practice
Hints:
- Applications
- Demos
- Multi-class problems
- Software (a very popular toolbox: LIBSVM)
- Any other interesting topics beyond this lecture
Length: 20 minutes plus question time.
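As a quick numerical companion to the "It turns out …" derivation and the kernel-trick review question above, the sketch below (assuming Python with NumPy, which the lecture does not prescribe) builds the explicit quadratic feature map from the Quadratic Basis Functions slide and checks that $\varphi(a) \cdot \varphi(b) = (a \cdot b + 1)^2$. It could also serve as a small warm-up for the class-talk demo.

```python
# Minimal sketch (not part of the lecture): numerically verify the kernel
# trick for the quadratic basis functions, i.e. phi(a).phi(b) == (a.b + 1)^2.
import numpy as np

def phi(x):
    """Explicit quadratic feature map: constant term, sqrt(2)*linear terms,
    pure quadratic terms, and sqrt(2)*cross terms (as on the slides)."""
    m = len(x)
    features = [1.0]
    features += [np.sqrt(2) * x[i] for i in range(m)]
    features += [x[i] ** 2 for i in range(m)]
    features += [np.sqrt(2) * x[i] * x[j]
                 for i in range(m) for j in range(i + 1, m)]
    return np.array(features)

rng = np.random.default_rng(0)
a, b = rng.normal(size=5), rng.normal(size=5)

explicit = phi(a) @ phi(b)        # builds O(m^2) features, then a dot product
via_kernel = (a @ b + 1.0) ** 2   # O(m) work, no feature map needed

print(explicit, via_kernel)               # the two values agree
print(np.isclose(explicit, via_kernel))   # True (up to rounding)
```

The explicit map produces on the order of $m^2/2$ features, while the kernel evaluation touches each of the $m$ input components only once; this saving is exactly what the kernel trick provides. For the practical demo itself, LIBSVM (or a wrapper around it such as scikit-learn's SVC) already implements the kernels listed above.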