CSCI 567 – Machine Learning (Fei Sha)
HW #3 Coding Part
Due: 10/13/11

Mahdi Azmandian
Samantha Danesis 5088212498
Gautam Kowshik 8797 940252
Karan Singh

Step 1 – binary logistic regression classifier (odd vs even)

We did not expect the binary logistic regression classifier to do well; we estimated that its accuracy on the training data would be between 40% and 70%. The reason for this low estimate is that parity seems to have nothing to do with how the digits look in the 8x8 images: odd vs even is simply whether or not there is a remainder when dividing by two, so 1, 3, 5, 7, and 9 are all odd yet look very different from one another. We thought logistic regression would try to combine the 8x8 pictures of all the odd digits into one very warped "digit" that the program would then treat as an odd number, and we doubted that anything could be an "odd" image vs an "even" image.

The results of the LR classifier for odd vs even surprised all of us. The accuracy on both the training and validation sets is just under 90%; the accuracy on the validation data we are submitting is 87.3%. The prediction values are saved as lr_prediction in hw3_prediction.mat. (A sketch of the training procedure appears at the end of this write-up.)

Figure 1: binary logistic regression, odd vs even

Step 2 – neural network classifier (odd vs even)

From the very start we assumed that the neural network classifier would outmatch the binary logistic regression classifier, but after seeing the accuracy of LR we think it will be close. Our final answer is that the neural network will perform with accuracy between 85% and 99%. We still believe it will be more accurate because of the number of nodes we can put in the hidden layer: the more nodes in the hidden layer, the better the performance, since there are more modes through which backpropagation can reduce the error.

We were correct in our thinking about the accuracy of the neural network. We used the Matlab Neural Network Toolbox to create the network and then changed some code to tailor it to our test case. It is a two-layer feed-forward network with sigmoid transfer functions in the hidden and output layers, trained with scaled conjugate gradient backpropagation. The parameters we used to run the toolbox are as follows (a sketch of this setup also appears at the end of this write-up):

Epochs = 20 – the number of iterations each network goes through
goal = 0 – set to zero so that each network goes through the same number of iterations
min_grad = 0 – set to zero so that each network goes through the same number of iterations
max_fail = 10 – if the validation check fails 10 times, the network stops training, to prevent overfitting

The code was run for a range of hidden-layer sizes, from 1 hidden node to 32 hidden nodes (going higher than that takes an immense amount of time). Four sizes below are of particular interest for the homework: 4, 8, 16, and 32 nodes. The figures below show the performance and accuracy/confusion charts, i.e., how much of the training and validation data was classified correctly. We have decided to turn in our prediction values from the neural network with 32 hidden nodes, whose validation accuracy is approximately 96.7%. The predictions are saved as nn_prediction in hw3_prediction.mat.

Figure 2: 4 nodes
Figure 3: 8 nodes
Figure 4: 16 nodes
Figure 5: 32 nodes
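For reference, the following is a minimal sketch of a binary logistic regression training loop of the kind used in Step 1, written as a Matlab script. The variable names (Xtrain, ytrain, Xval), the learning rate, and the number of gradient steps are illustrative assumptions, not our actual submission code.

    % Binary logistic regression by batch gradient descent (sketch).
    % Xtrain: N x 64 matrix of 8x8 images flattened into rows (assumed layout)
    % ytrain: N x 1 labels, 1 = odd digit, 0 = even digit
    [N, D] = size(Xtrain);
    X   = [ones(N, 1), Xtrain];     % prepend a bias feature
    w   = zeros(D + 1, 1);          % weight vector, initialized to zero
    eta = 0.01;                     % learning rate (assumed)
    for iter = 1:500                % fixed number of gradient steps (assumed)
        p = 1 ./ (1 + exp(-X * w));             % sigmoid probabilities
        w = w - eta * (X' * (p - ytrain) / N);  % gradient of the average log loss
    end
    % Threshold the sigmoid at 0.5 to get the odd/even predictions
    Xv = [ones(size(Xval, 1), 1), Xval];
    lr_prediction = double(1 ./ (1 + exp(-Xv * w)) >= 0.5);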
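A corresponding sketch of the Step 2 setup, assuming the Neural Network Toolbox patternnet interface (our script was adapted from toolbox-generated code, so the exact calls may differ; Xtrain, ytrain, and Xval are again assumed names, with one column per image):

    % Two-layer feed-forward network trained with scaled conjugate gradient.
    % Xtrain: 64 x N inputs, ytrain: 1 x N labels in {0,1}
    net = patternnet(32, 'trainscg');       % 32 hidden nodes, trainscg backprop
    net.layers{1}.transferFcn = 'logsig';   % sigmoid hidden layer
    net.layers{2}.transferFcn = 'logsig';   % sigmoid output layer
    % Training parameters from the list above
    net.trainParam.epochs   = 20;
    net.trainParam.goal     = 0;
    net.trainParam.min_grad = 0;
    net.trainParam.max_fail = 10;           % stop after 10 validation failures
    [net, tr] = train(net, Xtrain, ytrain);
    nn_prediction = double(net(Xval) >= 0.5);

By default the toolbox holds out part of the training data as a validation set, which is what max_fail monitors to stop training before overfitting.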
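The sweep from 1 to 32 hidden nodes is then a loop around the same setup; again a sketch, with valAcc and the held-out labels yval as assumed names:

    % Train one network per hidden-layer size and record validation accuracy.
    sizes  = 1:32;
    valAcc = zeros(size(sizes));
    for i = 1:numel(sizes)
        net = patternnet(sizes(i), 'trainscg');
        net.trainParam.epochs   = 20;
        net.trainParam.goal     = 0;
        net.trainParam.min_grad = 0;
        net.trainParam.max_fail = 10;
        net = train(net, Xtrain, ytrain);
        valAcc(i) = mean(double(net(Xval) >= 0.5) == yval);  % fraction correct
    end
    [bestAcc, best] = max(valAcc);
    fprintf('best size: %d nodes (%.1f%% validation accuracy)\n', ...
            sizes(best), 100 * bestAcc);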