Machine Learning – CMPUT 551, Winter 2009
Homework Assignment #2 – Support Vector Machines for Classification and Regression
Reihaneh Rabbany

P1) (a)

Equations 12.8 and 12.25 of the textbook are equivalent. Starting from equation 12.8 we have:

\[
\min_{\beta,\beta_0}\ \frac{1}{2}\|\beta\|^2 + \gamma\sum_{i=1}^{N}\xi_i
\quad \text{subject to} \quad \forall i,\ \xi_i \ge 0,\quad y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i
\tag{1}
\]

Since $\gamma$ is a positive constant independent of $\beta$ and $\beta_0$, we have $\min_{\beta,\beta_0} \gamma A = \gamma \min_{\beta,\beta_0} A$, so dividing the objective of (1) by $\gamma$ does not change the minimizer, and (1) can be converted to:

\[
\min_{\beta,\beta_0}\ \frac{1}{2\gamma}\|\beta\|^2 + \sum_{i=1}^{N}\xi_i
\quad \text{subject to} \quad \forall i,\ \xi_i \ge 0,\quad y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i
\tag{2}
\]

From 12.1 we know that $f(x_i) = x_i^T\beta + \beta_0$, so (2) is equal to:

\[
\min_{\beta,\beta_0}\ \frac{1}{2\gamma}\|\beta\|^2 + \sum_{i=1}^{N}\xi_i
\quad \text{subject to} \quad \forall i,\ \xi_i \ge 0,\quad y_i f(x_i) \ge 1 - \xi_i
\tag{3}
\]

Leaving (3) aside for now: we know that equation 12.8 is equivalent to equations 12.9–12.16 via quadratic programming. Consequently, from 12.15 ($\mu_i \xi_i = 0$) and 12.13 ($\alpha_i = \gamma - \mu_i$), we can conclude that at the solution either $\xi_i$ is zero, or $\mu_i = 0$ and hence $\alpha_i = \gamma$, which is a nonzero value.

In the former case ($\xi_i = 0$), from 12.16 we have $y_i f(x_i) - (1 - \xi_i) \ge 0$, so

\[
y_i f(x_i) - 1 \ge 0 \;\Rightarrow\; y_i f(x_i) \ge 1.
\tag{4}
\]

In the latter case ($\alpha_i \ne 0$), from 12.14 we have $\alpha_i\,[y_i f(x_i) - (1 - \xi_i)] = 0$, so

\[
y_i f(x_i) - (1 - \xi_i) = 0 \;\Rightarrow\; \xi_i = 1 - y_i f(x_i).
\tag{5}
\]

From (4) and (5) we conclude that in 12.8 we have

\[
\xi_i =
\begin{cases}
1 - y_i f(x_i) & \text{if } y_i f(x_i) \le 1 \\
0 & \text{if } y_i f(x_i) \ge 1
\end{cases}
\tag{6}
\]

which is further equal to

\[
\xi_i =
\begin{cases}
1 - y_i f(x_i) & \text{if } 1 - y_i f(x_i) \ge 0 \\
0 & \text{otherwise}
\end{cases}
\tag{7}
\]

and this is also equal to

\[
\xi_i = [\,1 - y_i f(x_i)\,]_+ .
\tag{8}
\]

Putting (8) into (3) we obtain

\[
\min_{\beta,\beta_0}\ \frac{1}{2\gamma}\|\beta\|^2 + \sum_{i=1}^{N}[\,1 - y_i f(x_i)\,]_+
\quad \text{subject to} \quad \forall i,\ \xi_i \ge 0,\quad y_i f(x_i) \ge 1 - \xi_i
\tag{9}
\]

In (9) the two "subject to" conditions have already been taken into account and are satisfied by construction; moreover, as we can see, $f(x_i)$ and $y_i$ are now independent of $\xi_i$, which means $\xi_i$ no longer appears in the minimization at all. We can therefore eliminate both constraints and reach

\[
\min_{\beta,\beta_0}\ \frac{1}{2\gamma}\|\beta\|^2 + \sum_{i=1}^{N}[\,1 - y_i f(x_i)\,]_+ ,
\]

which, with $\lambda = 1/\gamma$, is exactly equation 12.25.

P1) (b)

Support vector machines are sensitive to outliers because outliers play a dominant role in determining the decision hyperplane: they tend to have the largest margin losses. Consequently, for a loss function to be robust to outliers, it should avoid assigning large loss values to them; in other words, the less loss the function assigns to outliers, the more robust it is. Therefore, using Table 12.1 and Figure 12.4, the three given loss functions, sorted from the most robust to the least robust to outliers, are:

1) Binomial log-likelihood
2) Support vector (hinge) loss
3) Squared-error loss

As we can see in Figure 12.4, for outliers (points far on the wrong side of the boundary, with $y f(x) \ll 0$) the squared-error loss gives by far the most loss and has the fastest growth, since it increases quadratically. The support vector loss and the binomial log-likelihood loss are nearly the same and both grow only linearly, but the binomial log-likelihood's asymptote lies below the SV loss, so it penalizes outliers slightly less.
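To make this ordering concrete, below is a minimal MATLAB sketch (not part of the submitted assignment code) that plots the three losses as functions of the margin $y f(x)$. The formulas are the standard ones from Table 12.1; the book's figure may scale the deviance differently.

% Minimal sketch (not part of the submitted code): compare the three
% margin losses of Table 12.1 as functions of the margin yf = y*f(x).
yf = linspace(-3, 3, 601);
hinge    = max(0, 1 - yf);        % support vector (hinge) loss [1 - yf]_+
deviance = log(1 + exp(-yf));     % binomial log-likelihood (deviance) loss
squared  = (1 - yf).^2;           % squared-error loss
plot(yf, hinge, yf, deviance, yf, squared);
legend('SV hinge', 'binomial deviance', 'squared error');
xlabel('y f(x)'); ylabel('loss');
% As yf -> -inf the squared loss grows quadratically while the other two
% grow only linearly, with the deviance lying slightly below the hinge --
% matching the robustness ordering argued above.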
P2) a)

Running P2\a_linear_SVM\p2a.m would result in:

[Figures: linear SVM decision boundaries for γ = 0.01 and γ = 10000]

b)

Running P2\b_cross_validation\cross_validation.m would draw figures like the ones above during the run, and finally draw a new figure for the averaged result. We can conclude that γ ≈ 0.012–0.015 gives slightly better results, but choosing γ ∈ [100, 1000] is the more confident choice, as the error remains constant across different runs over this range.

c)

Running P2\c_non_linear_SVM\p2c.m would result in:

[Figures: non-linear SVM decision boundaries for γ = 0.01 and γ = 10000]

P3)

Running P3\p3.m would result in plotting these two figures:

[Figure: training set error]
[Figure: test set error]

For implementing this support vector regression I used a linear kernel (i.e., no kernel).
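The submitted scripts are not reproduced here, but as an illustration of the kind of computation behind p3.m, the following is a minimal MATLAB sketch of ε-insensitive support vector regression with a linear kernel, solved through its standard dual quadratic program. The function name linear_svr and the inputs X (n-by-p data matrix), y (n-by-1 targets), C (penalty), and epsilon (tube width) are placeholders of my own, and quadprog requires the Optimization Toolbox; the actual assignment code may be organized differently.

function [beta, beta0] = linear_svr(X, y, C, epsilon)
% Sketch of epsilon-insensitive SVR with a linear kernel via the dual QP
% (illustrative only; the submitted p3.m may differ).
n = size(X, 1);
K = X * X';                          % linear kernel (Gram) matrix
H = [K, -K; -K, K];                  % quadratic term over z = [alpha; alpha_star]
f = [epsilon - y; epsilon + y];      % linear term of the dual, minimization form
Aeq = [ones(1, n), -ones(1, n)];     % sum_i (alpha_i - alpha_star_i) = 0
beq = 0;
lb = zeros(2*n, 1);                  % box constraints 0 <= alpha, alpha_star <= C
ub = C * ones(2*n, 1);
z = quadprog(H, f, [], [], Aeq, beq, lb, ub);
d = z(1:n) - z(n+1:end);             % alpha - alpha_star
beta = X' * d;                       % primal weights (linear kernel only)
% Recover beta0 from points on the upper tube boundary (0 < alpha_i < C),
% where y_i - f(x_i) = epsilon holds exactly; assumes at least one such point.
on = z(1:n) > 1e-6 & z(1:n) < C - 1e-6;
beta0 = mean(y(on) - X(on, :) * beta - epsilon);
end

Predictions are then f(x) = x'*beta + beta0, and sweeping the penalty parameter while recording training and test errors produces curves of the kind plotted above.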