Review of Lecture 13

• Data contamination
[Figure: expected error (0.5 to 0.8) vs. validation set size K (5 to 25), with curves for $E_{\text{out}}$ and $E_{\text{val}}$]
$\mathcal D_{\text{val}}$ is slightly contaminated once it is used to select $g_{m^*}$.

• Validation
$\mathcal D$ ($N$ points) is split into $\mathcal D_{\text{train}}$ ($N - K$ points) and $\mathcal D_{\text{val}}$ ($K$ points). Training on $\mathcal D_{\text{train}}$ gives $g^-$, training on all of $\mathcal D$ gives $g$, and $E_{\text{val}}(g^-)$ estimates $E_{\text{out}}(g)$.

• Cross validation
$\mathcal D$ is split into $\mathcal D_1, \dots, \mathcal D_{10}$. Each part in turn is used for validation while the other nine are used for training (10-fold cross validation).

Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology
Lecture 14: Support Vector Machines
Sponsored by Caltech's Provost Office, E&AS Division, and IST • Thursday, May 17, 2012

Outline
• Maximizing the margin
• The solution
• Nonlinear transforms

Creator: Yaser Abu-Mostafa - LFD Lecture 14

Better linear separation
Linearly separable data. Different separating lines. Which is best?
Two questions:
1. Why is bigger margin better?
2. Which $w$ maximizes the margin?

Remember the growth function?
All dichotomies with any line:
[Figure: dichotomies of a point set realized by lines]

Dichotomies with fat margin
Fat margins imply fewer dichotomies.
[Figure: dichotomies with margins of ∞, 0.866, 0.5 and 0.397]

Finding $w$ with large margin
Let $x_n$ be the nearest data point to the plane $w^{\mathsf T}x = 0$. How far is it?
Two preliminary technicalities:
1. Normalize $w$: $|w^{\mathsf T}x_n| = 1$ (rescaling $w$ does not change the plane).
2. Pull out $w_0$: $w = (w_1, \dots, w_d)$ apart from $b$. The plane is now $w^{\mathsf T}x + b = 0$ (no $x_0$).

Computing the distance
The distance between $x_n$ and the plane $w^{\mathsf T}x + b = 0$, where $|w^{\mathsf T}x_n + b| = 1$.
The vector $w$ is $\perp$ to the plane in the $\mathcal X$ space. Take $x'$ and $x''$ on the plane:
$$w^{\mathsf T}x' + b = 0 \quad\text{and}\quad w^{\mathsf T}x'' + b = 0 \implies w^{\mathsf T}(x' - x'') = 0$$

and the distance is ...
Distance between $x_n$ and the plane: take any point $x$ on the plane. The distance is the length of the projection of $x_n - x$ on $w$. With the unit vector $\hat w = w / \|w\|$:
$$\text{distance} = |\hat w^{\mathsf T}(x_n - x)|$$
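A small numerical check of the two technicalities and the projection step may help. This is a plain-Python sketch, not from the lecture; the helpers `normalize`, `point_on_plane` and `projection_distance` and the three-point data set are hypothetical. It rescales $(w, b)$ so that $\min_n |w^{\mathsf T}x_n + b| = 1$, then measures $|\hat w^{\mathsf T}(x_n - x)|$ for the nearest point from two different points $x$ on the plane, illustrating that the choice of $x$ does not matter.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))

def normalize(w, b, X):
    """Rescale (w, b) so that min_n |w^T x_n + b| = 1; the plane itself is unchanged."""
    scale = min(abs(dot(w, x) + b) for x in X)
    return [wi / scale for wi in w], b / scale

def point_on_plane(w, b):
    """One point x with w^T x + b = 0, namely x = -b w / ||w||^2."""
    norm_sq = dot(w, w)
    return [-b * wi / norm_sq for wi in w]

def projection_distance(w, x_n, x):
    """|w_hat^T (x_n - x)| with w_hat = w / ||w||: length of the projection of x_n - x on w."""
    norm = math.sqrt(dot(w, w))
    w_hat = [wi / norm for wi in w]
    return abs(dot(w_hat, [a - c for a, c in zip(x_n, x)]))

# Hypothetical toy data and an unnormalized separating plane 2*x1 + 2*x2 - 2 = 0.
X = [[2.0, 1.0], [3.0, 3.0], [-1.0, 0.0]]
w, b = normalize([2.0, 2.0], -2.0, X)                 # w = [0.5, 0.5], b = -0.5
x_nearest = min(X, key=lambda x: abs(dot(w, x) + b))  # here |w^T x_n + b| = 1

# Two different points on the plane give the same distance.
x_a = point_on_plane(w, b)   # [0.5, 0.5]
x_b = [1.5, -0.5]            # also satisfies 0.5*1.5 + 0.5*(-0.5) - 0.5 = 0
d_a = projection_distance(w, x_nearest, x_a)
d_b = projection_distance(w, x_nearest, x_b)
print(d_a, d_b)
```

Both distances come out as $\sqrt 2 \approx 1.414$ for this data, up to floating-point rounding.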
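Since $w^{\mathsf T}x + b = 0$ for $x$ on the plane and $|w^{\mathsf T}x_n + b| = 1$:
$$\text{distance} = |\hat w^{\mathsf T}(x_n - x)| = \frac{1}{\|w\|}\,|w^{\mathsf T}x_n - w^{\mathsf T}x| = \frac{1}{\|w\|}\,|w^{\mathsf T}x_n + b - w^{\mathsf T}x - b| = \frac{1}{\|w\|}$$

The optimization problem
Maximize $\dfrac{1}{\|w\|}$ subject to $\displaystyle\min_{n=1,2,\dots,N} |w^{\mathsf T}x_n + b| = 1$.
Notice: $|w^{\mathsf T}x_n + b| = y_n(w^{\mathsf T}x_n + b)$, because the plane classifies every point correctly. Maximizing $1/\|w\|$ is the same as minimizing $\frac{1}{2}w^{\mathsf T}w$, and at the optimum at least one constraint holds with equality, so the problem becomes:
Minimize $\frac{1}{2}w^{\mathsf T}w$ subject to $y_n(w^{\mathsf T}x_n + b) \ge 1$ for $n = 1, 2, \dots, N$.

Constrained optimization
Minimize $\frac{1}{2}w^{\mathsf T}w$ subject to $y_n(w^{\mathsf T}x_n + b) \ge 1$ for $n = 1, 2, \dots, N$, with $w \in \mathbb R^d$, $b \in \mathbb R$.
Lagrange? Inequality constraints $\implies$ KKT.

We saw this before
Remember regularization?
Minimize $E_{\text{in}}(w) = \frac{1}{N}(Zw - y)^{\mathsf T}(Zw - y)$ subject to $w^{\mathsf T}w \le C$.
At the solution, $\nabla E_{\text{in}}$ is normal to the constraint.
[Figure: the constraint $w^{\mathsf T}w = C$, a contour $E_{\text{in}} = \text{const.}$, $w_{\text{lin}}$, and $\nabla E_{\text{in}}$ normal to the constraint at $w$]
Regularization: optimize $E_{\text{in}}$, constrain $w^{\mathsf T}w$.
SVM: optimize $w^{\mathsf T}w$, constrain $E_{\text{in}}$.

Lagrange formulation
Minimize
$$\mathcal L(w, b, \alpha) = \frac{1}{2}w^{\mathsf T}w - \sum_{n=1}^{N}\alpha_n\left(y_n(w^{\mathsf T}x_n + b) - 1\right)$$
w.r.t. $w$ and $b$, and maximize w.r.t. each $\alpha_n \ge 0$.
$$\nabla_w \mathcal L = w - \sum_{n=1}^{N}\alpha_n y_n x_n = 0$$
$$\frac{\partial \mathcal L}{\partial b} = -\sum_{n=1}^{N}\alpha_n y_n = 0$$

Substituting ...
Substituting $w = \sum_{n=1}^{N}\alpha_n y_n x_n$ and $\sum_{n=1}^{N}\alpha_n y_n = 0$ in the Lagrangian
$$\mathcal L(w, b, \alpha) = \frac{1}{2}w^{\mathsf T}w - \sum_{n=1}^{N}\alpha_n\left(y_n(w^{\mathsf T}x_n + b) - 1\right)$$
we get
$$\mathcal L(\alpha) = \sum_{n=1}^{N}\alpha_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m \alpha_n \alpha_m\, x_n^{\mathsf T}x_m$$
(The $b$ term drops out because $\sum_n \alpha_n y_n = 0$, and $\sum_n \alpha_n y_n w^{\mathsf T}x_n = w^{\mathsf T}w$, so $\frac{1}{2}w^{\mathsf T}w - w^{\mathsf T}w = -\frac{1}{2}w^{\mathsf T}w$.)
Maximize w.r.t. $\alpha$ subject to $\alpha_n \ge 0$ for $n = 1, \dots, N$ and $\sum_{n=1}^{N}\alpha_n y_n = 0$.

The solution - quadratic programming
Maximizing $\mathcal L(\alpha)$ is the same as minimizing $-\mathcal L(\alpha)$, which gives a standard QP:
$$\min_{\alpha}\ \frac{1}{2}\,\alpha^{\mathsf T}\underbrace{\begin{bmatrix} y_1y_1\,x_1^{\mathsf T}x_1 & y_1y_2\,x_1^{\mathsf T}x_2 & \cdots & y_1y_N\,x_1^{\mathsf T}x_N\\ y_2y_1\,x_2^{\mathsf T}x_1 & y_2y_2\,x_2^{\mathsf T}x_2 & \cdots & y_2y_N\,x_2^{\mathsf T}x_N\\ \vdots & \vdots & \ddots & \vdots\\ y_Ny_1\,x_N^{\mathsf T}x_1 & y_Ny_2\,x_N^{\mathsf T}x_2 & \cdots & y_Ny_N\,x_N^{\mathsf T}x_N \end{bmatrix}}_{\text{quadratic coefficients}}\alpha \;+\; \underbrace{(-\mathbf 1^{\mathsf T})}_{\text{linear}}\,\alpha$$
subject to
$$\underbrace{y^{\mathsf T}\alpha = 0}_{\text{linear constraint}}, \qquad \underbrace{0}_{\text{lower bounds}} \le \alpha \le \underbrace{\infty}_{\text{upper bounds}}$$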
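To make the QP concrete, here is a hedged plain-Python sketch; no QP solver is called, and the function names and the two-point data set are hypothetical. It builds the quadratic coefficients $y_n y_m x_n^{\mathsf T}x_m$, the linear term $-\mathbf 1$, the equality constraint $y^{\mathsf T}\alpha = 0$ and the bounds, and evaluates both $\mathcal L(\alpha)$ and the QP objective, which is $-\mathcal L(\alpha)$.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))

def dual_qp_terms(X, y):
    """QP form of the SVM dual:
    minimize 1/2 a^T Q a + linear^T a  subject to  y^T a = 0,  lower <= a <= upper."""
    N = len(X)
    Q = [[y[n] * y[m] * dot(X[n], X[m]) for m in range(N)] for n in range(N)]
    linear = [-1.0] * N            # the (-1^T) alpha term
    equality = list(y)             # coefficients of y^T alpha = 0
    lower = [0.0] * N              # alpha_n >= 0
    upper = [float("inf")] * N     # no upper bound in this (hard-margin) problem
    return Q, linear, equality, lower, upper

def qp_objective(alpha, Q, linear):
    """1/2 alpha^T Q alpha + linear^T alpha."""
    N = len(alpha)
    quad = sum(alpha[n] * Q[n][m] * alpha[m] for n in range(N) for m in range(N))
    return 0.5 * quad + dot(linear, alpha)

def dual_lagrangian(alpha, X, y):
    """L(alpha) = sum_n alpha_n - 1/2 sum_n sum_m y_n y_m alpha_n alpha_m x_n^T x_m."""
    Q, linear, *_ = dual_qp_terms(X, y)
    return -qp_objective(alpha, Q, linear)

# Hypothetical two-point problem: x = (1, 0) with y = +1, x = (-1, 0) with y = -1.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1.0, -1.0]
Q, linear, equality, lower, upper = dual_qp_terms(X, y)
print(Q, dual_lagrangian([0.5, 0.5], X, y))
```

For this data, $y^{\mathsf T}\alpha = 0$ forces $\alpha_1 = \alpha_2 = a$, so $\mathcal L = 2a - 2a^2$, which peaks at $a = 1/2$ with $\mathcal L = 1/2$; `dual_lagrangian([0.5, 0.5], X, y)` returns 0.5. In practice the five arrays would be handed to a QP package rather than optimized by hand.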
QP hands us $\alpha$
Solution: $\alpha = \alpha_1, \cdots, \alpha_N \implies w = \sum_{n=1}^{N}\alpha_n y_n x_n$.
KKT condition: for $n = 1, \cdots, N$
$$\alpha_n\left(y_n(w^{\mathsf T}x_n + b) - 1\right) = 0$$
We saw this before!
[Figure: the regularization picture, with $\nabla E_{\text{in}}$ normal to the constraint $w^{\mathsf T}w = C$ at $w$, and $w_{\text{lin}}$ on a contour $E_{\text{in}} = \text{const.}$]
So either $\alpha_n = 0$ or $y_n(w^{\mathsf T}x_n + b) = 1$:
$\alpha_n > 0 \implies x_n$ is a support vector.

Support vectors
Closest $x_n$'s to the plane: they achieve the margin $\implies y_n(w^{\mathsf T}x_n + b) = 1$.
$$w = \sum_{x_n \text{ is SV}} \alpha_n y_n x_n$$
Solve for $b$ using any SV: $y_n(w^{\mathsf T}x_n + b) = 1$.

$z$ instead of $x$
$$\mathcal L(\alpha) = \sum_{n=1}^{N}\alpha_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} y_n y_m \alpha_n \alpha_m\, z_n^{\mathsf T}z_m$$
The only change is $x_n^{\mathsf T}x_m \to z_n^{\mathsf T}z_m$; there are still $N$ variables $\alpha_n$, whatever the dimension of $\mathcal Z$.
[Figure: data in $\mathcal X$ space mapped to $\mathcal Z$ space, $\mathcal X \longrightarrow \mathcal Z$]

Support vectors in $\mathcal X$ space
Support vectors live in $\mathcal Z$ space. In $\mathcal X$ space, we see the "pre-images" of the support vectors.
The margin is maintained in $\mathcal Z$ space.
Generalization result:
$$\mathbb E[E_{\text{out}}] \le \frac{\mathbb E[\#\text{ of SVs}]}{N - 1}$$
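The steps from $\alpha$ to the final hypothesis can be sketched as follows, in plain Python under stated assumptions: `recover_hyperplane`, `kkt_residuals` and the transform `phi` are hypothetical names, the toy data sets are made up, and the $\alpha$ values were worked out by hand for these sets rather than returned by a QP solver. The sketch keeps the points with $\alpha_n > 0$ as support vectors, forms $w$ from them, solves for $b$ with one SV, and checks the KKT products $\alpha_n(y_n(w^{\mathsf T}x_n + b) - 1)$. The second example runs the same code on $z_n$ = `phi(x_n)` to mirror "$z$ instead of $x$".

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))

def recover_hyperplane(alpha, X, y, tol=1e-9):
    """From a QP solution alpha, return (w, b, indices of the support vectors)."""
    sv = [n for n, a in enumerate(alpha) if a > tol]   # alpha_n > 0  =>  x_n is a support vector
    d = len(X[0])
    # w = sum over SVs of alpha_n y_n x_n (non-SVs have alpha_n = 0 and contribute nothing)
    w = [sum(alpha[n] * y[n] * X[n][i] for n in sv) for i in range(d)]
    s = sv[0]                                          # any SV will do
    # y_s (w^T x_s + b) = 1 with y_s = +-1 gives b = y_s - w^T x_s
    b = y[s] - dot(w, X[s])
    return w, b, sv

def kkt_residuals(alpha, X, y, w, b):
    """alpha_n (y_n (w^T x_n + b) - 1) for each n; all are zero at the optimum."""
    return [a * (yn * (dot(w, xn) + b) - 1) for a, xn, yn in zip(alpha, X, y)]

# Linear case (hypothetical data); alpha derived by hand, not by a QP solver.
X_lin = [[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
y_lin = [1, -1, 1]
alpha_lin = [0.5, 0.5, 0.0]          # the third point is not a support vector
w_lin, b_lin, sv_lin = recover_hyperplane(alpha_lin, X_lin, y_lin)

# Nonlinear case: 1-D points that no single threshold separates, mapped by a
# hypothetical transform phi(x) = (x, x^2); the same code then runs on z_n = phi(x_n).
def phi(x):
    return [x, x * x]

X_raw = [-2.0, 0.0, 2.0]
y_raw = [1, -1, 1]
Z = [phi(x) for x in X_raw]
alpha_z = [0.0625, 0.125, 0.0625]    # hand-derived for this toy set
w_z, b_z, sv_z = recover_hyperplane(alpha_z, Z, y_raw)

print(w_lin, b_lin, sv_lin)
print(w_z, b_z, sv_z)
```

With these values the linear case gives $w = (1, 0)$, $b = 0$ with support vectors at indices 0 and 1, while the transformed case gives $w = (0, 1/2)$, $b = -1$, i.e. the boundary $x^2 = 2$ back in $\mathcal X$ space, with all three points as support vectors. The threshold `tol` is an assumption: a numerical QP solver returns $\alpha_n$ only approximately, so "$\alpha_n > 0$" is tested as $\alpha_n >$ `tol`.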