Using Neural Networks to Solve a Classification Problem

Alexander Saites
CS425
05-Oct-12
Introduction
Artificial neural networks are biologically inspired mathematical models that use “neurons”
connected via “synapses” to move data through a network. Feed-forward artificial neural nets use the
back-propagation algorithm to systematically adjust the synaptic weights while (input, output) vectors
are presented to the network. During this process, the network is able to discover regularities in the
data which map the input to the output (provided such a function exists, and the network is given
sufficient time, enough neurons, and a small enough step size).
In this project, I programmed a feed-forward artificial neural network and used it to classify letters
from an adapted version of the “Artificial Character Database” from the UCI Machine Learning Repository.
This paper provides a brief explanation of the problem, shows the basic design of the neural network,
and provides results of testing the problem set on a few different neural network architectures.
The Problem
This problem is a basic classification problem. A grid representation of the “Artificial Character
Database” has been provided by Dr. Parker. In it, each character is represented as a 12x8 array of binary
pixels, with 0 representing “off” and 1 representing “on”. 600 example images were used, divided into
three groups: 100 images were used as training data, 250 images were used as validation data presented
after each training epoch, and the final 250 images were used to test the resulting network architecture.
The images represented 10 capital letters: A, C, D, E, F, G, H, L, P, and R.
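As a concrete sketch of this split: the loading code and the exact file format are not part of this report, so the array names below are hypothetical, and whether the original split was shuffled or fixed is not stated.

    import numpy as np

    def split_dataset(images, labels, seed=0):
        """Split the 600 examples into 100 training, 250 validation, and 250 test images.

        `images` is assumed to be a (600, 96) array of flattened 12x8 binary pixel
        grids and `labels` a matching (600,) array of letter indices.
        """
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(images))          # shuffle before splitting
        images, labels = images[order], labels[order]
        train = (images[:100], labels[:100])
        valid = (images[100:350], labels[100:350])
        test = (images[350:], labels[350:])
        return train, valid, test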
The Network
The artificial neural network used is a feed-forward ANN using back-propagation to update synaptic
weights. There are 96 input neurons, each representing one of the 12x8 pixels in the image. There are
10 output neurons, each representing a different letter; the neuron for the correct letter should
output 1 while all others output 0.
The hidden neurons were activated using tanh, as it allows the output to be between -1 and 1. Its
derivative, sech^2, was used for back-propagation.
Soft-max was used to scale the output neurons to values between 0 and 1, making their error
calculation easier and training more efficient.
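A minimal sketch of this forward pass is shown below in Python/NumPy. The report does not include the original implementation, so the weight-matrix names and the bias terms here are assumptions.

    import numpy as np

    def forward(x, W_h, b_h, W_o, b_o):
        """Forward pass: 96 binary pixels -> tanh hidden layer -> soft-max over the 10 letters."""
        h = np.tanh(W_h @ x + b_h)          # hidden activations, each in (-1, 1)
        z = W_o @ h + b_o                   # output pre-activations
        z = z - z.max()                     # shift for numerical stability
        y = np.exp(z) / np.exp(z).sum()     # soft-max: outputs in (0, 1), summing to 1
        return h, y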
The network was trained “online”; for each epoch, first a permutation of the input set was produced.
This input set was then run, in its permuted order, through the network. After each input example,
weights were updated. Upon completion of the epoch, the sum of squared errors over all input
examples was recorded, as was the sum of squared errors for each output neuron. The validation data
was then presented to the network, with no weight updates occurring during validation, and its sum of
squared errors was recorded as well.
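A sketch of one such training epoch is given below, reusing the forward pass from the earlier snippet. The hidden-layer derivative uses sech^2 = 1 - tanh^2 as described above; the output-layer delta written here, y - t, is the standard soft-max simplification and is an assumption about the original code, as are the parameter names.

    import numpy as np

    def train_epoch(W_h, b_h, W_o, b_o, train_x, train_t, eta=0.1):
        """One "online" epoch: present the examples in a fresh random order and update
        the weights after every example. Returns the epoch's summed squared error and
        the per-output-neuron squared error."""
        sse = 0.0
        sse_per_output = np.zeros(10)
        for i in np.random.permutation(len(train_x)):
            x, t = train_x[i], train_t[i]                 # t is a one-hot target over the 10 letters
            h, y = forward(x, W_h, b_h, W_o, b_o)
            err = t - y
            sse += np.sum(err ** 2)
            sse_per_output += err ** 2
            delta_o = y - t                               # output delta (soft-max simplification)
            delta_h = (W_o.T @ delta_o) * (1.0 - h ** 2)  # sech^2 = 1 - tanh^2
            W_o -= eta * np.outer(delta_o, h)
            b_o -= eta * delta_o
            W_h -= eta * np.outer(delta_h, x)
            b_h -= eta * delta_h
        return sse, sse_per_output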
Finally, after all epochs were performed, the network was tested using the testing data, and its total
accuracy, accuracy for each letter, and confusion matrix were presented to the user.
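A sketch of that final evaluation step might look like the following, again reusing the forward pass from above; the letter ordering and the convention that confusion-matrix rows are the actual letters are assumptions.

    import numpy as np

    LETTERS = ["a", "c", "d", "e", "f", "g", "h", "l", "p", "r"]

    def evaluate(test_x, test_t, W_h, b_h, W_o, b_o):
        """Classify every test image and tally a 10x10 confusion matrix
        (rows: actual letter, columns: the network's classification)."""
        confusion = np.zeros((10, 10), dtype=int)
        for x, t in zip(test_x, test_t):
            _, y = forward(x, W_h, b_h, W_o, b_o)
            confusion[np.argmax(t), np.argmax(y)] += 1
        per_letter = confusion.diagonal() / confusion.sum(axis=1)   # accuracy for each letter
        overall = confusion.trace() / confusion.sum()               # total accuracy
        return confusion, per_letter, overall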
Results
Given the sum of squared errors for each epoch of both the training and validation data, we can observe
the training and validation errors for a particular network architecture and quickly determine a) whether
the network is capable of classifying the input data to our satisfaction, and b) at what point we should
stop training the network, namely once the validation error is no longer decreasing.
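That stopping point is determined here by inspecting the error curves, but the same rule could be expressed as a simple programmatic check, sketched below; the patience parameter is a hypothetical addition and not part of the original program.

    def should_stop(validation_sse, patience=5):
        """Stop training once the validation error has not improved for `patience` epochs."""
        if len(validation_sse) <= patience:
            return False
        return min(validation_sse[-patience:]) >= min(validation_sse[:-patience])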
Using 40 hidden neurons and an eta value of 0.1, I trained for several hundred epochs. Here are the error
results for the first 50 epochs, showing that strong results are obtained after as few as 25.
[Chart: Error vs. Epochs for the first 50 epochs, plotting scaled training error and validation error.]
Figure 1: A good network
This network has an overall accuracy rating of 0.958, and it achieves perfect classification for some letters.
Its confusion matrix shows which letters look quite similar to the network:
(Rows give the actual letter; columns give the network's classification.)

        a     c     d     e     f     g     h     l     p     r
  a   224     0     0     1     4     0    19     0     1     1
  c     0   250     0     0     0     0     0     0     0     0
  d     0     0   246     1     0     0     1     0     2     0
  e     0     0     0   244     3     0     0     3     0     0
  f     3     0     0     0   226     0     0     0    20     1
  g     0     6     0     0     0   244     0     0     0     0
  h    11     0     0     0     5     0   233     0     0     1
  l     0     0     0     0     0     0     0   250     0     0
  p     2     0     4     0    13     0     0     0   231     0
  r     1     0     0     0     0     0     0     0     1   248
As we can see, “A” is often misclassified as “H” and vice-versa. Similarly, the network gets confused
attempting to distinguish “P” from “F”.
And here is the success rate for each letter:
  Letter:    a      c      d      e      f      g      h      l      p      r
  Accuracy:  0.896  1.000  0.984  0.976  0.904  0.976  0.932  1.000  0.924  0.992
This network is provided with these files, and it can be loaded and tested if desired.
For fun, a terrible network design is also provided. With only 4 hidden neurons and an eta of 0.4, the
network is only able to achieve a bit above 50% overall accuracy on the testing data:
[Chart: Error vs. Epochs over roughly 150 epochs, plotting scaled training error and validation error.]
Figure 2: A bad network