Machine Learning – CMPUT 551, Winter 2009
Homework Assignment #2 – Support Vector Machines for Classification and Regression
Reihaneh Rabbany

P1) Boosting, Bagging and Trees

(a) Bagged regression tree

For the Matlab bagged regression tree program to predict log volume, I used the Matlab tree function "treefit" to fit a tree on the training data and "treeval" to compute the predicted values of the resulting tree on the test data. To choose a proper number of bootstrap samples, I repeated the algorithm for different numbers of samples. The result is shown below:

Figure 1 – Mean absolute error versus the number of bootstrap samples; the x axis is the number of bootstrap samples divided by 10.

Based on the above results, I chose to draw 50 bootstrap samples and train the trees on them. Plots of the true value minus the predicted value on the training and test sets are presented here; they can be re-obtained by running "P1\p1a.m".

Figure 2 – Results for Problem 1, part (a); the left diagram is the training error (in blue) and the right one is the test error (in red).

I also checked different levels of pruning, but they did not lead to a significant change in the results.

(b) Boosted tree

For the Matlab boosted tree program to predict log volume, I implemented the forward stagewise additive modeling algorithm from the textbook. The basis functions are computed with the Matlab tree function "treefit", and I used squared-error loss in boosting; with this loss, the solution of (10.28) simplifies to fitting a tree to the residuals y − f_{m−1}(x). To choose a proper value of M, the number of basis functions, I computed the mean error on the training and test data for different values of M; the diagrams below show the results.

Figure 3 – Mean error versus the number of basis functions M; the left diagram is on the training data and the right one is on the test data.

These show that although the training error keeps decreasing for large M, the test error increases due to overfitting. Based on these results, I chose M = 4 and computed the true value minus the predicted value for each data point. The diagrams below show the results, which can be regenerated using "P1\p1b.m".

Figure 4 – Results for Problem 1, part (b), for M = 4; the left diagram is the training error (in blue) and the right one is the test error (in red).
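The two procedures above can be summarized with short sketches. The first is a minimal sketch of the bagging loop of part (a); it assumes the data are already loaded into Xtrain/ytrain/Xtest/ytest (placeholder names, not the ones used in "P1\p1a.m") and that the older Statistics Toolbox functions "treefit" and "treeval" mentioned above are available.

    % Minimal sketch of the bagging loop in part (a); Xtrain/ytrain/Xtest/ytest
    % are assumed to be loaded already (placeholder names, not those of p1a.m).
    B = 50;                                   % number of bootstrap samples
    n = size(Xtrain, 1);
    preds = zeros(size(Xtest, 1), B);
    for b = 1:B
        idx = ceil(n * rand(n, 1));           % bootstrap indices (sampling with replacement)
        t = treefit(Xtrain(idx, :), ytrain(idx));   % regression tree on the bootstrap sample
        preds(:, b) = treeval(t, Xtest);      % predictions of the b-th tree on the test set
    end
    yhat = mean(preds, 2);                    % bagged prediction: average over the B trees
    testMAE = mean(abs(ytest - yhat));        % mean absolute error on the test set

Similarly, the following is a minimal sketch of the forward stagewise loop of part (b): with squared-error loss, each stage only needs a tree fitted to the current residuals y − f_{m−1}(x). Variable names are again placeholders, and the trees are grown with the default "treefit" settings, which need not match those used in "P1\p1b.m".

    % Minimal sketch of forward stagewise boosting with squared-error loss, part (b).
    M = 4;                                    % number of boosting stages (basis functions)
    ftrain = zeros(size(ytrain));             % f_0 = 0 on the training set
    ftest  = zeros(size(Xtest, 1), 1);        % running prediction on the test set
    for m = 1:M
        r = ytrain - ftrain;                  % residuals y - f_{m-1}(x) under squared-error loss
        t = treefit(Xtrain, r);               % basis function: a tree fitted to the residuals
        ftrain = ftrain + treeval(t, Xtrain); % forward stagewise update of the additive model
        ftest  = ftest  + treeval(t, Xtest);
    end
    testMAE = mean(abs(ytest - ftest));       % mean absolute error of the boosted model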
P2) Neural Networks and Back-propagation Algorithm

(a) Implementation of back-propagation

To implement the back-propagation algorithm for the classification problem of Figure 11.4 in the textbook, I used a network with two hidden layers, as shown below:

[Network diagram: inputs X, first hidden layer W (weights \alpha, bias \alpha_0), second hidden layer Z (weights \beta, bias \beta_0), output units T (weights \gamma, bias \gamma_0)]

Since the outputs should pass through a softmax function, we have:

w_l = \sigma(\alpha_l^T x)    (1)
z_m = \sigma(\beta_m^T w)    (2)
T_k = \gamma_k^T z    (3)
g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}    (4)
g_k'(T) = \frac{e^{T_k} \left(\sum_{l=1}^{K} e^{T_l}\right) - e^{T_k} e^{T_k}}{\left(\sum_{l=1}^{K} e^{T_l}\right)^2} = g_k(T)(1 - g_k(T))    (5)
\sigma(x) = \frac{1}{1 + e^{-x}}    (5-1)
\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)(1 - \sigma(x))    (5-2)

where l = 1, ..., 5, m = 1, ..., 5, k = 1, 2, and the weight matrices have dimensions \alpha: L \times (p+1), \beta: M \times (L+1), \gamma: K \times (M+1); the extra column in each matrix holds the bias terms \alpha_0, \beta_0, \gamma_0.

We also use the cross-entropy error function, and the back-propagation equations are derived below:

R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log g_k(x_i) = \sum_{i=1}^{N} R_i    (6)

\frac{\partial R_i}{\partial \gamma_{km}} = -\frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, z_{mi} = -y_{ik} (1 - g_k(x_i)) \, z_{mi}    (7)

\frac{\partial R_i}{\partial \beta_{ml}} = -\sum_{k=1}^{K} \frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, \gamma_{km} \, \sigma'(\beta_m^T w_i) \, w_{li}
  = -\sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, \sigma(\beta_m^T w_i)(1 - \sigma(\beta_m^T w_i)) \, w_{li}
  = -\sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, z_{mi}(1 - z_{mi}) \, w_{li}    (8)

\frac{\partial R_i}{\partial \alpha_{lj}} = -\sum_{m=1}^{M} \sum_{k=1}^{K} \frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, \gamma_{km} \, \sigma'(\beta_m^T w_i) \, \beta_{ml} \, \sigma'(\alpha_l^T x_i) \, x_{ij}
  = -\sum_{m=1}^{M} \sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, z_{mi}(1 - z_{mi}) \, \beta_{ml} \, w_{li}(1 - w_{li}) \, x_{ij}    (9)

For the gradient-descent update we have:

\gamma_{km}^{(r+1)} = \gamma_{km}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \gamma_{km}^{(r)}}    (10)
\beta_{ml}^{(r+1)} = \beta_{ml}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \beta_{ml}^{(r)}}    (11)
\alpha_{lj}^{(r+1)} = \alpha_{lj}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \alpha_{lj}^{(r)}}    (12)

where \eta_r is the learning rate. From (7)-(9) we can obtain the back-propagation equations:

\frac{\partial R_i}{\partial \gamma_{km}} = \delta_{ki} \, z_{mi}, \qquad \delta_{ki} = -y_{ik} (1 - g_k(x_i))    (13)
\frac{\partial R_i}{\partial \beta_{ml}} = s_{mi} \, w_{li}, \qquad s_{mi} = z_{mi}(1 - z_{mi}) \sum_{k=1}^{K} \gamma_{km} \, \delta_{ki}    (14)
\frac{\partial R_i}{\partial \alpha_{lj}} = e_{li} \, x_{ij}, \qquad e_{li} = w_{li}(1 - w_{li}) \sum_{m=1}^{M} \beta_{ml} \, s_{mi}    (15)

In the two-pass back-propagation algorithm, equations (1)-(4) are used in the forward pass to compute the fitted values \hat{f}_k(x_i), and equations (13)-(15) are used in the backward pass to compute the errors; a short sketch of one such gradient-descent epoch is given after part (b) below.

These are implemented and can be run via "P2\p2a". The results are presented in the diagrams below.

Figure 5 – Results for Problem 2, part (a): cross-entropy error function with softmax outputs, epochs = 500; the left plot is without weight decay and the right one is with weight decay.

(b) Matlab toolbox

The results can be regenerated by running "P2\p2b". One sample result is shown here.

Figure 6 – Results for Problem 2, part (b); the left plot is without weight decay and the right one is with weight decay.
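To make the two-pass procedure of part (a) concrete, the following is a minimal sketch of one batch gradient-descent epoch, using equations (1)-(4) for the forward pass and (13)-(15) together with the updates (10)-(12) for the backward pass. The data matrix X (N x p), the one-hot targets Y (N x K) and the learning rate eta are assumed to exist, the layer sizes are the L = M = 5, K = 2 used above, and the weights are initialized with small random values; this is an illustrative outline, not the exact code in "P2\p2a". Note that \delta in (13) keeps only the g_k(1 - g_k) term of the softmax derivative; carrying the cross terms of the softmax through (7) would give the more common form \delta_{ki} = g_k(x_i) - y_{ik}.

    % Minimal sketch of one batch gradient-descent epoch for the two-hidden-layer
    % network, following equations (1)-(4), (13)-(15) and (10)-(12) above.
    % X (N x p), Y (N x K, one-hot) and the learning rate eta are assumed to exist;
    % variable names are placeholders, not the ones used in P2\p2a.
    [N, p] = size(X);
    L = 5;  M = 5;  K = size(Y, 2);           % layer sizes used in the text
    sig = @(a) 1 ./ (1 + exp(-a));            % sigmoid, equation (5-1)
    alpha = 0.1 * randn(L, p + 1);            % L x (p+1), bias in the first column
    beta  = 0.1 * randn(M, L + 1);            % M x (L+1)
    gamma = 0.1 * randn(K, M + 1);            % K x (M+1)
    dAlpha = zeros(size(alpha));  dBeta = zeros(size(beta));  dGamma = zeros(size(gamma));
    for i = 1:N
        xb = [1; X(i, :)'];  y = Y(i, :)';
        w  = sig(alpha * xb);  wb = [1; w];   % (1) first hidden layer
        z  = sig(beta  * wb);  zb = [1; z];   % (2) second hidden layer
        T  = gamma * zb;                      % (3) output units
        g  = exp(T) ./ sum(exp(T));           % (4) softmax outputs g_k(x_i)
        delta = -y .* (1 - g);                % (13) output errors as derived above
        s = z .* (1 - z) .* (gamma(:, 2:end)' * delta);   % (14) second-layer errors
        e = w .* (1 - w) .* (beta(:, 2:end)'  * s);       % (15) first-layer errors
        dGamma = dGamma + delta * zb';        % accumulate the per-example gradients
        dBeta  = dBeta  + s * wb';            % that are summed over i in (10)-(12)
        dAlpha = dAlpha + e * xb';
    end
    gamma = gamma - eta * dGamma;             % (10) gradient-descent updates
    beta  = beta  - eta * dBeta;              % (11)
    alpha = alpha - eta * dAlpha;             % (12)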