Machine Learning – CMPUT 551, Winter 2009
Homework Assignment #2 – Support Vector Machines for Classification and Regression
Reihaneh Rabbany

P1) Boosting, Bagging and Trees

(a) Bagged regression tree

For the Matlab bagged regression tree program to predict log volume, I used the Matlab tree function "treefit" to fit a tree on the training data and "treeval" to compute the predicted values of the resulting tree on the test data. To choose a proper number of bootstrap samples, I repeated the algorithm for different numbers of samples. The result is shown below:

Figure 1 – Mean absolute error versus the number of bootstrap samples; the x axis is the number of bootstrap samples divided by 10.

Based on the above results, I chose to draw 50 bootstrap samples and train the trees on them. Plots of the true value minus the predicted value on the training and test sets are presented here; they can be re-obtained by running "P1\p1a.m".

Figure 2 – Results for Problem 1, part (a); the left diagram is the training error (in blue) and the right one is the test error (in red).

I also checked different levels of pruning, but they did not lead to a significant change in the results.

(b) Boosted tree

For the Matlab boosted tree program to predict log volume, I implemented the forward stagewise additive modeling algorithm from the textbook. The basis functions are computed with the Matlab tree function "treefit", and I used squared-error loss in boosting; with this loss, the solution of (10.28) simplifies to fitting a tree to the residuals y − f_{m−1}(x). To choose a proper value of M, the number of basis functions, I computed the mean error on the training and test data for different values of M; the diagrams below show the results.

Figure 3 – Mean error versus the number of basis functions M; the left diagram is on the training data and the right one is on the test data.

These show that although the training error keeps decreasing for large M, the test error increases due to overfitting. Based on these results, I chose M = 4 and computed the true value minus the predicted value for each data point. The diagrams below show the results, which can be regenerated using "P1\p1b.m".

Figure 4 – Results for Problem 1, part (b), for M = 4; the left diagram is the training error (in blue) and the right one is the test error (in red).
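The two procedures above can be summarized with short sketches. The first is a minimal sketch of the bagging loop of part (a); it assumes the data are already loaded into Xtrain/ytrain/Xtest/ytest (placeholder names, not the ones used in "P1\p1a.m") and that the older Statistics Toolbox functions "treefit" and "treeval" mentioned above are available.

    % Minimal sketch of the bagging loop in part (a); Xtrain/ytrain/Xtest/ytest
    % are assumed to be loaded already (placeholder names, not those of p1a.m).
    B = 50;                                   % number of bootstrap samples
    n = size(Xtrain, 1);
    preds = zeros(size(Xtest, 1), B);
    for b = 1:B
        idx = ceil(n * rand(n, 1));           % bootstrap indices (sampling with replacement)
        t = treefit(Xtrain(idx, :), ytrain(idx));   % regression tree on the bootstrap sample
        preds(:, b) = treeval(t, Xtest);      % predictions of the b-th tree on the test set
    end
    yhat = mean(preds, 2);                    % bagged prediction: average over the B trees
    testMAE = mean(abs(ytest - yhat));        % mean absolute error on the test set

Similarly, the following is a minimal sketch of the forward stagewise loop of part (b): with squared-error loss, each stage only needs a tree fitted to the current residuals y − f_{m−1}(x). Variable names are again placeholders, and the trees are grown with the default "treefit" settings, which need not match those used in "P1\p1b.m".

    % Minimal sketch of forward stagewise boosting with squared-error loss, part (b).
    M = 4;                                    % number of boosting stages (basis functions)
    ftrain = zeros(size(ytrain));             % f_0 = 0 on the training set
    ftest  = zeros(size(Xtest, 1), 1);        % running prediction on the test set
    for m = 1:M
        r = ytrain - ftrain;                  % residuals y - f_{m-1}(x) under squared-error loss
        t = treefit(Xtrain, r);               % basis function: a tree fitted to the residuals
        ftrain = ftrain + treeval(t, Xtrain); % forward stagewise update of the additive model
        ftest  = ftest  + treeval(t, Xtest);
    end
    testMAE = mean(abs(ytest - ftest));       % mean absolute error of the boosted model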
P2) Neural Networks and Back-propagation Algorithm

(a) Implementation of back-propagation

To implement the back-propagation algorithm for the classification problem of Figure 11.4 in the textbook, I used a network with two hidden layers, as shown below:

[Network diagram: inputs X, first hidden layer W (weights \alpha, bias \alpha_0), second hidden layer Z (weights \beta, bias \beta_0), output units T (weights \gamma, bias \gamma_0)]

Since the outputs should pass through a softmax function, we have:

w_l = \sigma(\alpha_l^T x)    (1)
z_m = \sigma(\beta_m^T w)    (2)
T_k = \gamma_k^T z    (3)
g_k(T) = \frac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}    (4)
g_k'(T) = \frac{e^{T_k} \left(\sum_{l=1}^{K} e^{T_l}\right) - e^{T_k} e^{T_k}}{\left(\sum_{l=1}^{K} e^{T_l}\right)^2} = g_k(T)(1 - g_k(T))    (5)
\sigma(x) = \frac{1}{1 + e^{-x}}    (5-1)
\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)(1 - \sigma(x))    (5-2)

where l = 1, ..., 5, m = 1, ..., 5, k = 1, 2, and the weight matrices have dimensions \alpha: L \times (p+1), \beta: M \times (L+1), \gamma: K \times (M+1); the extra column in each matrix holds the bias terms \alpha_0, \beta_0, \gamma_0.

We also use the cross-entropy error function, and the back-propagation equations are derived below:

R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log g_k(x_i) = \sum_{i=1}^{N} R_i    (6)

\frac{\partial R_i}{\partial \gamma_{km}} = -\frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, z_{mi} = -y_{ik} (1 - g_k(x_i)) \, z_{mi}    (7)

\frac{\partial R_i}{\partial \beta_{ml}} = -\sum_{k=1}^{K} \frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, \gamma_{km} \, \sigma'(\beta_m^T w_i) \, w_{li}
  = -\sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, \sigma(\beta_m^T w_i)(1 - \sigma(\beta_m^T w_i)) \, w_{li}
  = -\sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, z_{mi}(1 - z_{mi}) \, w_{li}    (8)

\frac{\partial R_i}{\partial \alpha_{lj}} = -\sum_{m=1}^{M} \sum_{k=1}^{K} \frac{y_{ik}}{g_k(x_i)} \, g_k'(\gamma_k^T z_i) \, \gamma_{km} \, \sigma'(\beta_m^T w_i) \, \beta_{ml} \, \sigma'(\alpha_l^T x_i) \, x_{ij}
  = -\sum_{m=1}^{M} \sum_{k=1}^{K} y_{ik} (1 - g_k(x_i)) \, \gamma_{km} \, z_{mi}(1 - z_{mi}) \, \beta_{ml} \, w_{li}(1 - w_{li}) \, x_{ij}    (9)

For the gradient-descent update we have:

\gamma_{km}^{(r+1)} = \gamma_{km}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \gamma_{km}^{(r)}}    (10)
\beta_{ml}^{(r+1)} = \beta_{ml}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \beta_{ml}^{(r)}}    (11)
\alpha_{lj}^{(r+1)} = \alpha_{lj}^{(r)} - \eta_r \sum_{i=1}^{N} \frac{\partial R_i}{\partial \alpha_{lj}^{(r)}}    (12)

where \eta_r is the learning rate. From (7)-(9) we can obtain the back-propagation equations:

\frac{\partial R_i}{\partial \gamma_{km}} = \delta_{ki} \, z_{mi}, \qquad \delta_{ki} = -y_{ik} (1 - g_k(x_i))    (13)
\frac{\partial R_i}{\partial \beta_{ml}} = s_{mi} \, w_{li}, \qquad s_{mi} = z_{mi}(1 - z_{mi}) \sum_{k=1}^{K} \gamma_{km} \, \delta_{ki}    (14)
\frac{\partial R_i}{\partial \alpha_{lj}} = e_{li} \, x_{ij}, \qquad e_{li} = w_{li}(1 - w_{li}) \sum_{m=1}^{M} \beta_{ml} \, s_{mi}    (15)

In the two-pass back-propagation algorithm, equations (1)-(4) are used in the forward pass to compute the fitted values \hat{f}_k(x_i), and equations (13)-(15) are used in the backward pass to compute the errors; a short sketch of one such gradient-descent epoch is given after part (b) below.

These are implemented and can be run via "P2\p2a". The results are presented in the diagrams below.

Figure 5 – Results for Problem 2, part (a): cross-entropy error function with softmax outputs, epochs = 500; the left plot is without weight decay and the right one is with weight decay.

(b) Matlab toolbox

The results can be regenerated by running "P2\p2b". One sample result is shown here.

Figure 6 – Results for Problem 2, part (b); the left plot is without weight decay and the right one is with weight decay.
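To make the two-pass procedure of part (a) concrete, the following is a minimal sketch of one batch gradient-descent epoch, using equations (1)-(4) for the forward pass and (13)-(15) together with the updates (10)-(12) for the backward pass. The data matrix X (N x p), the one-hot targets Y (N x K) and the learning rate eta are assumed to exist, the layer sizes are the L = M = 5, K = 2 used above, and the weights are initialized with small random values; this is an illustrative outline, not the exact code in "P2\p2a". Note that \delta in (13) keeps only the g_k(1 - g_k) term of the softmax derivative; carrying the cross terms of the softmax through (7) would give the more common form \delta_{ki} = g_k(x_i) - y_{ik}.

    % Minimal sketch of one batch gradient-descent epoch for the two-hidden-layer
    % network, following equations (1)-(4), (13)-(15) and (10)-(12) above.
    % X (N x p), Y (N x K, one-hot) and the learning rate eta are assumed to exist;
    % variable names are placeholders, not the ones used in P2\p2a.
    [N, p] = size(X);
    L = 5;  M = 5;  K = size(Y, 2);           % layer sizes used in the text
    sig = @(a) 1 ./ (1 + exp(-a));            % sigmoid, equation (5-1)
    alpha = 0.1 * randn(L, p + 1);            % L x (p+1), bias in the first column
    beta  = 0.1 * randn(M, L + 1);            % M x (L+1)
    gamma = 0.1 * randn(K, M + 1);            % K x (M+1)
    dAlpha = zeros(size(alpha));  dBeta = zeros(size(beta));  dGamma = zeros(size(gamma));
    for i = 1:N
        xb = [1; X(i, :)'];  y = Y(i, :)';
        w  = sig(alpha * xb);  wb = [1; w];   % (1) first hidden layer
        z  = sig(beta  * wb);  zb = [1; z];   % (2) second hidden layer
        T  = gamma * zb;                      % (3) output units
        g  = exp(T) ./ sum(exp(T));           % (4) softmax outputs g_k(x_i)
        delta = -y .* (1 - g);                % (13) output errors as derived above
        s = z .* (1 - z) .* (gamma(:, 2:end)' * delta);   % (14) second-layer errors
        e = w .* (1 - w) .* (beta(:, 2:end)'  * s);       % (15) first-layer errors
        dGamma = dGamma + delta * zb';        % accumulate the per-example gradients
        dBeta  = dBeta  + s * wb';            % that are summed over i in (10)-(12)
        dAlpha = dAlpha + e * xb';
    end
    gamma = gamma - eta * dGamma;             % (10) gradient-descent updates
    beta  = beta  - eta * dBeta;              % (11)
    alpha = alpha - eta * dAlpha;             % (12)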