WriteUp

Machine Learning – CMPUT 551, Winter 2009
Homework Assignment #2 – Support Vector Machines for Classification and Regression
Reihaneh Rabbany
P1) Boosting, Bagging and Trees
(a) Bagged regression tree
For the Matlab bagged regression tree program to predict log volume, I used the Matlab tree function “treefit” to fit a tree on the training data and “treeval” to compute the predicted values of the resulting tree on the test data. To find a proper number of bootstrap samples, I repeated the algorithm for different numbers of samples. The result is shown below:
Figure 1 – Mean Absolute Error per Number of Bootstrap Samples; the x-axis is the number of bootstrap samples divided by 10.
Based on the above results, I chose to draw 50 bootstrap samples and trained the trees on them. The results for true value vs. predicted value on the test and training sets are presented here. They can be reproduced by running “P1\p1a.m”.
Figure 2 – Result of Problem 1 part a, the left diagram is Train Error (in Blue) and the right one is Test Error (in Red).
I also checked different levels of pruning, but they did not lead to a significant change in the results.
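For reference, here is a minimal sketch of the bagging procedure described above, assuming the older Statistics Toolbox interface (“treefit”/“treeval”) and hypothetical variable names Xtrain, ytrain, Xtest (the actual code is in “P1\p1a.m”):

    % Bagged regression trees with B bootstrap samples (sketch).
    B = 50;                                     % number of bootstrap samples
    n = size(Xtrain, 1);
    pred = zeros(size(Xtest, 1), B);
    for b = 1:B
        idx = ceil(n * rand(n, 1));             % bootstrap sample with replacement
        t = treefit(Xtrain(idx, :), ytrain(idx));   % fit a regression tree
        pred(:, b) = treeval(t, Xtest);         % predict on the test set
    end
    yhat = mean(pred, 2);                       % average the B tree predictions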
(b) Boosted tree
For the Matlab boosted tree program to predict log volume, I implemented the forward stagewise additive modeling algorithm from the textbook. To compute the basis functions I used the Matlab tree function “treefit”, and I used squared-error loss in the boosting; therefore, the solution to (10.28) simplifies to fitting a tree to the residuals $y - f_{m-1}(x)$. To find a proper value of M, the number of basis functions, I computed the mean error on the training and test data for different values of M; the diagrams below show the result.
Figure 3 – Mean Error per Number of Functions; the left diagram is on the training data and the right one is on the test data.
These show that although the training error keeps decreasing for large M, the test error increases due to overfitting. Based on these results, I chose M equal to 4 and computed the true value vs. predicted value for each data point. The diagrams below show the results, which can be regenerated using “P1\p1b.m”.
Figure 4 – Result of Problem 1 part b for M=4, the left diagram is Train Error (in Blue) and the right one is Test Error (in Red).
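As with part (a), here is a minimal sketch of the forward stagewise procedure with squared-error loss described above, under the same assumptions on variable names (the actual code is in “P1\p1b.m”):

    % Forward stagewise additive modeling with squared-error loss (sketch).
    M = 4;                                      % number of basis functions
    f = zeros(size(ytrain));                    % current fit f_{m-1}(x), initialized to 0
    trees = cell(M, 1);
    for m = 1:M
        r = ytrain - f;                         % residuals y - f_{m-1}(x)
        trees{m} = treefit(Xtrain, r);          % fit a regression tree to the residuals
        f = f + treeval(trees{m}, Xtrain);      % update the additive model
    end
    % prediction on the test set: sum the M trees' outputs
    yhat = zeros(size(Xtest, 1), 1);
    for m = 1:M
        yhat = yhat + treeval(trees{m}, Xtest);
    end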
P2) Neural Networks and Back-propagation Algorithm
(a) Implementation of Back-propagation
To implement the back-propagation algorithm for solving the classification problem in Figure 11.4 from the textbook, we use two hidden layers, so we have a network as below:
[Network diagram: input X, first hidden layer W, second hidden layer Z, output T, with bias units.]
Since we should have a softmax output function, we have:

$W_l = \sigma(\alpha_l^T X)$  (1)

$Z_m = \sigma(\beta_m^T W)$  (2)

$T_k = \gamma_k^T Z$  (3)

$f_k(X) = g_k(T)$  (4)

$g_k(T) = \dfrac{e^{T_k}}{\sum_{l=1}^{K} e^{T_l}}$  (5)

$g_k'(T) = \dfrac{e^{T_k} \left(\sum_{l=1}^{K} e^{T_l}\right) - e^{T_k} e^{T_k}}{\left(\sum_{l=1}^{K} e^{T_l}\right)^2} = g_k(T)\,(1 - g_k(T))$  (5-1)

$\sigma(x) = \dfrac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \dfrac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)\,(1 - \sigma(x))$  (5-2)

with $l = 1..5$, $m = 1..5$, $k = 1..2$, and weight matrices $\alpha: L \times (p+1)$, $\beta: M \times (L+1)$, $\gamma: K \times (M+1)$.
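The forward pass (1)-(5) then amounts to a few lines of Matlab. Here is a minimal sketch for a single input vector x, assuming weight matrices alpha ($L \times (p+1)$), beta ($M \times (L+1)$) and gamma ($K \times (M+1)$) whose first columns hold the bias weights (these variable names are assumptions, not the actual script):

    % Forward pass for one input vector x (sketch).
    sigm = @(a) 1 ./ (1 + exp(-a));    % sigmoid, equation (5-2)
    w = sigm(alpha * [1; x]);          % first hidden layer, equation (1)
    z = sigm(beta  * [1; w]);          % second hidden layer, equation (2)
    t = gamma * [1; z];                % output activations, equation (3)
    f = exp(t) ./ sum(exp(t));         % softmax output, equations (4)-(5)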
We should also use the cross-entropy error function; the back-propagation equations are derived below:
$R(\theta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log f_k(x_i) = \sum_{i=1}^{N} R_i$  (6)

$\dfrac{\partial R_i}{\partial \gamma_{km}} = -\dfrac{y_{ik}}{f_k(x_i)} \times g_k'(\gamma_k^T z_i) \times z_{mi} = -y_{ik} \times (1 - f_k(x_i)) \times z_{mi}$  (7)
$\dfrac{\partial R_i}{\partial \beta_{ml}} = -\sum_{k=1}^{K} \dfrac{y_{ik}}{f_k(x_i)} \times g_k'(\gamma_k^T z_i) \times \gamma_{km} \times \sigma'(\beta_m^T w_i) \times w_{li}$
$\qquad = -\sum_{k=1}^{K} y_{ik} \times (1 - f_k(x_i)) \times \gamma_{km} \times \sigma(\beta_m^T w_i) \times (1 - \sigma(\beta_m^T w_i)) \times w_{li}$
$\qquad = -\sum_{k=1}^{K} y_{ik} \times (1 - f_k(x_i)) \times \gamma_{km} \times z_{mi} \times (1 - z_{mi}) \times w_{li}$  (8)

$\dfrac{\partial R_i}{\partial \alpha_{lp}} = -\sum_{m=1}^{M} \sum_{k=1}^{K} \dfrac{y_{ik}}{f_k(x_i)} \times g_k'(\gamma_k^T z_i) \times \gamma_{km} \times \sigma'(\beta_m^T w_i) \times \beta_{ml} \times \sigma'(\alpha_l^T x_i) \times x_{ip}$
$\qquad = -\sum_{m=1}^{M} \sum_{k=1}^{K} y_{ik} \times (1 - f_k(x_i)) \times \gamma_{km} \times z_{mi} \times (1 - z_{mi}) \times \beta_{ml} \times w_{li} \times (1 - w_{li}) \times x_{ip}$  (9)
For the gradient descent update we have:

$\gamma_{km}^{\,r+1} = \gamma_{km}^{\,r} - \lambda_r \sum_{i=1}^{N} \dfrac{\partial R_i}{\partial \gamma_{km}^{\,r}}$  (10)

$\beta_{ml}^{\,r+1} = \beta_{ml}^{\,r} - \lambda_r \sum_{i=1}^{N} \dfrac{\partial R_i}{\partial \beta_{ml}^{\,r}}$  (11)

$\alpha_{lp}^{\,r+1} = \alpha_{lp}^{\,r} - \lambda_r \sum_{i=1}^{N} \dfrac{\partial R_i}{\partial \alpha_{lp}^{\,r}}$  (12)

where $\lambda_r$ is the learning rate. From (7)-(9) we can obtain the back-propagation equations:
πœ•π‘…π‘–
= π›Ώπ‘˜π‘– × π‘§π‘šπ‘– ,
πœ•π›Ύπ‘˜π‘š
π›Ώπ‘˜π‘– = −π‘¦π‘–π‘˜ × (1 − π‘“π‘˜ (π‘₯𝑖 ))
𝐾
πœ•π‘…π‘–
= π‘ π‘šπ‘– × π‘€π‘–π‘™ ,
πœ•π›½π‘šπ‘™
π‘ π‘šπ‘– = π‘§π‘šπ‘– × (1 − π‘§π‘šπ‘– ) × ∑ π›Ύπ‘˜π‘š × π›Ώπ‘˜π‘–
πœ•π‘…π‘–
= 𝑒𝑙𝑖 × π‘₯𝑖𝑝 ,
πœ•π›Όπ‘™π‘
𝑒𝑙𝑖 = 𝑀𝑙𝑖 × (1 − 𝑀𝑙𝑖 ) × ∑ π›½π‘šπ‘™ × π‘ π‘šπ‘–
π‘˜=1
𝑀
(13)
(14)
(15)
π‘š=1
In the two-pass back-propagation algorithm, equations (1)-(4) are used in the forward pass to compute $\hat{f}_k(x_i)$, and equations (13)-(15) are used to compute the errors in the backward pass.
These equations are implemented and can be run via “P2\p2a”. The results are presented in the diagrams below.
Figure 5 – Results for part a of Problem 2, cross-entropy error function with softmax, 500 epochs; the left one is without weight decay and the right one is with weight decay.
(b) Matlab toolbox
The result can be regenerated by running “P2\p2b”. One sample result is shown here.
Figure 6 – The result for part b of Problem 2; the left one is without weight decay and the right one is with weight decay.
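For comparison with part (a), here is a minimal sketch of a corresponding toolbox setup, assuming the Neural Network Toolbox interface of that era (newff/train/sim) and its regularized performance function 'msereg' as the weight-decay mechanism; the layer sizes, variable names, and parameter values below are assumptions, not the actual “P2\p2b” script:

    % Feed-forward network via the (older) Neural Network Toolbox (sketch).
    net = newff(X, Y, [5 5]);          % two hidden layers of 5 units each (assumption)
    net.trainParam.epochs = 500;       % same epoch count as in part (a)
    net.performFcn = 'msereg';         % regularized performance, i.e. weight decay (assumption)
    net.performParam.ratio = 0.9;      % error/regularization trade-off (assumption)
    net = train(net, X, Y);            % X: inputs (columns = examples), Y: targets
    Yhat = sim(net, X);                % network outputs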