CSCI 567_hw3_coding

CSCI 567 – Machine Learning
Fei Sha
HW #3 Coding Part
Due: 10/13/11
Mahdi Azmandian
Samantha Danesis 5088212498
Gautam Kowshik 8797940252
Karan Singh
Step 1 – binary logistic regression classifier (odd vs even)
We did not expect the binary logistic regression classifier to do well; we estimated its accuracy on
the training data would be between 40% and 70%. The reason for this low estimate is that parity
seemed to have nothing to do with how the digits look in the 8x8 images: odd vs. even is simply
whether there is a remainder when the digit is divided by two, so 1, 3, 5, 7, and 9 are all odd yet look
very different from one another. We thought the logistic regression would effectively blend the 8x8
pictures of all the odd digits into a single very warped "digit" template, and we had doubts as to
whether an image could meaningfully be "odd" or "even".
The results of the LR classifier for odd vs. even surprised all of us. The accuracy on both the training
and validation sets is just under 90%; the accuracy on the validation data we are submitting is 87.3%.
Prediction values are saved as lr_prediction in hw3_prediction.mat.
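
For reference, a minimal sketch of how such a classifier could be trained with batch gradient descent
in MATLAB is shown below. The variable names (X as an n-by-64 matrix of flattened 8x8 images,
digits as the n-by-1 digit labels), the learning rate eta, and the iteration count are all illustrative
assumptions, not our actual code.

    y = mod(digits, 2);                 % odd (1) vs. even (0) labels
    [n, d] = size(X);
    Xb = [ones(n, 1), X];               % prepend a bias column
    w = zeros(d + 1, 1);                % weight vector, bias term first
    eta = 0.01;                         % learning rate (arbitrary choice)

    for iter = 1:1000
        p = 1 ./ (1 + exp(-Xb * w));    % sigmoid of the linear scores
        grad = Xb' * (p - y) / n;       % gradient of average cross-entropy
        w = w - eta * grad;             % gradient descent step
    end

    pred = double((1 ./ (1 + exp(-Xb * w))) > 0.5);  % 0/1 predictions
    accuracy = mean(pred == y);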
Figure 1: binary logistic regression odd vs even
Step 2 – neural network classifier (odd vs even)
From the very start we assumed that the neural network classifier would outmatch the binary logistic
regression classifier, but after seeing the accuracy of the LR we think it will be close. Our final answer
is that the neural network will perform with accuracy between 85% and 99%. We still believe it will be
more accurate because of the number of nodes we can put in the hidden layer: the more hidden
nodes, the better the performance, since there are more weights for backpropagation to adjust in
order to reduce the error.
We were correct in our thinking about the accuracy of the neural network. We used the MATLAB
Neural Network Toolbox to create the network and then adapted the code to our test case. It is a
two-layer feedforward network using sigmoid transfer functions in both the hidden and output layers,
trained with scaled conjugate gradient backpropagation. The parameters we used to run the toolbox
are as follows (a configuration sketch appears after the list):
Epochs = 20: the number of iterations each network goes through
goal = 0: set to zero so that each network goes through the same number of iterations
min_grad = 0: set to zero so that each network goes through the same number of iterations
max_fail = 10: if the validation error fails to improve 10 times, the network stops training to prevent overfitting
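
A hedged sketch of how these parameters can be set with the toolbox, using the patternnet interface
(our actual code differed slightly, and the 32-node size here is just one of the configurations we ran):

    net = patternnet(32, 'trainscg');      % 32 hidden nodes, scaled conjugate gradient
    net.layers{1}.transferFcn = 'logsig';  % sigmoid hidden layer
    net.layers{2}.transferFcn = 'logsig';  % sigmoid output layer
    net.trainParam.epochs   = 20;          % iterations per network
    net.trainParam.goal     = 0;           % never stop early on an error goal
    net.trainParam.min_grad = 0;           % never stop early on gradient magnitude
    net.trainParam.max_fail = 10;          % stop after 10 validation failures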
We ran the code for a range of hidden layer sizes, from 1 hidden node up to 32 (going higher than
that takes an immense amount of time). Four sizes are of particular interest to the homework: 4, 8,
16, and 32 nodes. The figures below show the performance and accuracy/confusion charts, indicating
how much of the training and validation data was classified correctly. We have decided to turn in the
prediction values from the neural network with 32 hidden nodes, whose validation accuracy is
approximately 96.7%. Predictions are saved as nn_prediction in hw3_prediction.mat.
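
The sweep itself can be sketched as a simple loop over hidden layer sizes. Here inputs is assumed to
be a 64-by-n matrix (one image per column, as the toolbox expects) and targets a 1-by-n row of 0/1
odd-vs-even labels; both names are illustrative, not from our actual code.

    for h = [4 8 16 32]
        net = patternnet(h, 'trainscg');
        net.trainParam.epochs = 20;        % remaining parameters as listed above
        net.trainParam.goal = 0;
        net.trainParam.min_grad = 0;
        net.trainParam.max_fail = 10;
        [net, tr] = train(net, inputs, targets);
        pred = double(net(inputs) > 0.5);  % threshold the sigmoid output
        fprintf('%2d hidden nodes: accuracy %.3f\n', h, mean(pred == targets));
    end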
Figure 2: 4 nodes
Figure 3: 8 nodes
Figure 4: 16 nodes
Figure 5: 32 nodes