Alexander Saites
CS425
05-Oct-12

Using Neural Networks to Solve a Classification Problem

Introduction

Artificial neural networks are biologically inspired mathematical models that use "neurons" connected via "synapses" to move data through a network. Feed-forward artificial neural nets use the back-propagation algorithm to systematically adjust the synaptic weights as (input, output) vectors are presented to the network. During this process, the network discovers regularities in the data that map the input to the output (provided such a function exists and the network is given sufficient time, enough neurons, and a small enough step size). In this project, I programmed a feed-forward artificial neural network and used it to classify letters from an adapted version of the "Artificial Character Database" from the UCI Machine Learning Repository. This paper provides a brief explanation of the problem, shows the basic design of the neural network, and presents results of testing the problem set on a few different network architectures.

The Problem

This is a basic classification problem. A grid representation of the "Artificial Character Database" was provided by Dr. Parker. In it, each character is represented as a 12x8 array of binary pixels, with 0 representing "off" and 1 representing "on". 600 example images were used, divided into three groups: 100 images served as training data, 250 images served as validation data presented after each epoch of training, and the remaining 250 images were used to test the resulting network. The images represent 10 capital letters: A, C, D, E, F, G, H, L, P, and R.

The Network

The network is a feed-forward ANN that uses back-propagation to update its synaptic weights. There are 96 input neurons, each representing one of the 12x8 pixels in the image, and 10 output neurons, each representing a different letter; the neuron for the correct letter should output 1 while all others output 0. The hidden neurons were activated using tanh, which bounds their output between -1 and 1; its derivative, sech^2, was used during back-propagation. Softmax was used to scale the output neurons to values between 0 and 1, making their error calculation easier and training more efficient.

The network was trained "online": at the start of each epoch, a permutation of the training set was produced, and the examples were run through the network in that order, with the weights updated after each example. Upon completion of the epoch, the sum of squared errors over all training examples was recorded, along with the sum of squared errors for each output neuron. The validation data was then presented to the network, with no weight updates, and its sum of squared errors was recorded as well. Finally, after all epochs were performed, the network was tested using the testing data, and its total accuracy, per-letter accuracy, and confusion matrix are presented to the user.
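The original program's source code is not reproduced in this paper. The following sketch illustrates the architecture and the online update step described above: 96 inputs, a tanh-activated hidden layer, 10 softmax outputs, and back-propagation through sech^2 after each example. The variable and function names (W1, W2, train_example, and so on) are mine, not the original program's, and the output delta assumes a squared-error loss applied directly to the softmax outputs; the original derivation may differ in detail.

    import numpy as np

    # Sizes from the report: 96 binary pixel inputs, 10 output letters.
    # The hidden-layer size and eta match the "good" network in the Results section.
    N_IN, N_HIDDEN, N_OUT = 96, 40, 10
    eta = 0.1

    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (N_HIDDEN, N_IN))   # input -> hidden weights
    b1 = np.zeros(N_HIDDEN)
    W2 = rng.normal(0.0, 0.1, (N_OUT, N_HIDDEN))  # hidden -> output weights
    b2 = np.zeros(N_OUT)

    def softmax(z):
        e = np.exp(z - z.max())          # shift for numerical stability
        return e / e.sum()

    def forward(x):
        h = np.tanh(W1 @ x + b1)         # hidden activations in (-1, 1)
        y = softmax(W2 @ h + b2)         # outputs scaled to (0, 1), summing to 1
        return h, y

    def train_example(x, t):
        """One online back-propagation step for a single (input, target) pair.

        t is a one-hot vector: 1 for the correct letter, 0 elsewhere.
        The recorded error measure is the sum of squared errors, as in the report.
        """
        global W1, b1, W2, b2
        h, y = forward(x)

        # dE/dz for E = 0.5*||y - t||^2 with y = softmax(z);
        # the softmax Jacobian is diag(y) - y y^T.
        delta_out = (np.diag(y) - np.outer(y, y)) @ (y - t)

        # Back-propagate through tanh: d tanh(a)/da = sech^2(a) = 1 - tanh^2(a).
        delta_hidden = (W2.T @ delta_out) * (1.0 - h ** 2)

        # Online weight updates after this single example.
        W2 -= eta * np.outer(delta_out, h)
        b2 -= eta * delta_out
        W1 -= eta * np.outer(delta_hidden, x)
        b1 -= eta * delta_hidden

        return np.sum((y - t) ** 2)      # squared error for this example

One epoch would shuffle the 100 training images, call train_example on each in the permuted order, sum the returned squared errors, and then run the validation set through forward alone.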
Results

Given the sum of squared errors for each epoch of both the training and validation data, we can observe the training and validation errors for a particular network architecture and quickly determine a) whether the network is capable of classifying the input data to our satisfaction, and b) at what point we should stop training, since the validation error is no longer decreasing. Using 40 hidden neurons and an eta value of 0.1, I trained for several hundred epochs. Figure 1 plots the error for the first 50 epochs, showing that strong results are obtained after as few as 25.

[Figure 1: A good network. Scaled training and validation error vs. epochs for the first 50 epochs.]

This network has an overall accuracy of 0.958 and achieves perfect classification for some letters. Its confusion matrix shows which letters look quite similar to the network:

         a     c     d     e     f     g     h     l     p     r
    a  224     0     0     0     3     0    11     0     2     1
    c    0   250     0     0     0     6     0     0     0     0
    d    0     0   246     0     0     0     0     0     4     0
    e    1     0     1   244     0     0     0     0     0     0
    f    4     0     0     3   226     0     5     0    13     0
    g    0     0     0     0     0   244     0     0     0     0
    h   19     0     1     0     0     0   233     0     0     0
    l    0     0     0     3     0     0     0   250     0     0
    p    1     0     2     0    20     0     0     0   231     1
    r    1     0     0     0     1     0     1     0     0   248

As we can see, "A" is often misclassified as "H" and vice versa. Similarly, the network gets confused attempting to distinguish "P" from "F". Here is the success rate for each letter:

       a      c      d      e      f      g      h      l      p      r
    0.896    1    0.984  0.976  0.904  0.976  0.932    1    0.924  0.992

This network is provided with these files and can be loaded into the program and tested if desired. For fun, a terrible network design is also provided: with only 4 hidden neurons and an eta of 0.4, the network is only able to achieve a bit above 50% overall accuracy on the testing data.

[Figure 2: A bad network. Scaled training and validation error vs. epochs.]
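For reference, the testing-phase figures above (confusion matrix, per-letter success rate, and overall accuracy) can be computed as sketched below. The function names (evaluate, predict) and the row/column orientation of the matrix are my own choices for illustration, not taken from the original program; predict is assumed to return the index of the winning output neuron, e.g. the argmax over the softmax outputs.

    import numpy as np

    LETTERS = ['a', 'c', 'd', 'e', 'f', 'g', 'h', 'l', 'p', 'r']

    def evaluate(predict, test_inputs, test_labels):
        """Builds a confusion matrix and the accuracy figures for the test set.

        predict(x) returns the index of the letter the network chooses;
        test_labels holds the index of the correct letter for each test image.
        """
        n = len(LETTERS)
        confusion = np.zeros((n, n), dtype=int)
        for x, actual in zip(test_inputs, test_labels):
            confusion[predict(x), actual] += 1   # rows: predicted, columns: actual

        per_letter = confusion.diagonal() / confusion.sum(axis=0)
        overall = confusion.trace() / confusion.sum()
        return confusion, per_letter, overall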