An ANALYSIS of MULTI-LAYER PERCEPTRON with BACKPROPAGATION
Abstract
Artificial neural networks are tools mostly used in the machine learning and pattern recognition branches of computer science. They use a computing technique inspired by the basic elements of the brain: neurons. This technique is parametric, and its parameters exist prior to learning; hence learning in artificial neural networks consists of adjusting these parameters by certain methods (or algorithms). “Artificial neural networks can be most adequately characterized as `computational models` with particular properties such as the ability to adapt or learn, to generalize, or to cluster or organize data, and which operation is based on parallel processing” [1].
I simulated an ANN on two different major tasks and obtained results for different initial parameters. Deriving relations between the parameters and the results was difficult, since in most of the test cases the training ended with immediate success, i.e. a rapid convergence.
The rest of this analysis report continues as follows: the first part describes the criteria (the parameters) that can be adjusted from the user interface and whose effects can therefore be tested. The second part describes the inner structure of the network. The third and fourth chapters include the selected test results and the comments on them. The appendix consists of the program's GUI and an extra list of test results.
1. PROGRAM CAPABILITIES (TESTED CRITERIA)
Neuro-Trainer (NEUTRA) v1.0 is an ANN interface in which some initial parameters can be set and the specified network can be trained. The NEUTRA user interface is explained in full detail in Appendix A. NEUTRA has four major tasks: it can create a network, train the network, test the network and display the test results. The parameters set from the user interface are listed below; they are introduced in more detail later.
The parameters set during the creation process are:
Output Layer: The number code of the output layer.
Input Layer: The number code of the input layer. Together with the “Output Layer” value it gives the number of layers.
Layer Sizes: Neuron counts in the layers, entered as comma-separated numbers.
Bias: The bias used in all of the neurons of the system.
Alpha: The coefficient used in the redefined error calculations.
Disable: The neurons to be disabled in the upper hidden layer.
Error: The type of the error, either standard or redefined.
Direct Input Conn: If checked, the input neurons are directly connected to the output layer neurons.
The training parameters are:
Data Size: The number of training loops.
Learning Rate: The coefficient used in backpropagation.
Momentum: The coefficient applied to the previous weight change in the update rule.
Training/Data: The percentage of the data used in training; the remainder is used in testing. If the value is 100, all the data is also used in testing. It is not active in the first task, which has only 4 different input patterns.
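To make these settings concrete, the sketch below gathers the creation and training parameters into two records. It is only an illustration in Python; the names and default values are assumptions of mine, not NEUTRA's actual C# code.

    # Hypothetical configuration records mirroring NEUTRA's parameters (illustrative only).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CreationConfig:
        output_layer: int = 2                 # number code of the output layer
        input_layer: int = 0                  # number code of the input layer
        layer_sizes: List[int] = field(default_factory=lambda: [2, 2, 1])  # neurons per layer
        bias: float = 1.0                     # bias used in all neurons
        alpha: float = 0.01                   # coefficient of the redefined error
        disabled: List[int] = field(default_factory=list)  # disabled neurons in the upper hidden layer
        error_type: str = "standard"          # "standard" or "redefined"
        direct_input_conn: bool = False       # extra input-to-output connections

    @dataclass
    class TrainingConfig:
        data_size: int = 50                   # number of training loops
        learning_rate: float = 0.1            # beta coefficient used in backpropagation
        momentum: float = 0.0                 # mu coefficient on the previous weight change
        training_data_percent: int = 100      # share of data used for training; rest for testing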
NEUTRA’s evaluation functions and data generators are implemented for two major tasks. The first is a [2*2*1] network which stands for the logic operator XOR; it is also capable of simulating the AND, OR, NAND and NOR operators. The second is a [5*5*1] network which stands for ((O1 AND O2) OR (O3 AND O5)). The test and display functions, and all methods requiring a heuristic function, are specifically designed for an output layer of size 1. Although NEUTRA is capable of doing more (specifying the layer count, momentum, etc.), I mostly focused on the differences in error types, biases and disabling hidden layer neurons.
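As a point of reference, the two target functions are simple enough to write down directly. The snippet below is a minimal Python sketch of the bipolar (+1/-1) tasks; it is not NEUTRA's data generator, and the function names are chosen only for illustration.

    def xor_target(x1, x2):
        # First task: XOR of two bipolar inputs; true exactly when the inputs differ.
        return 1 if x1 != x2 else -1

    def second_task_target(o1, o2, o3, o4, o5):
        # Second task: ((O1 AND O2) OR (O3 AND O5)); note that O4 does not influence the output.
        return 1 if (o1 == 1 and o2 == 1) or (o3 == 1 and o5 == 1) else -1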
2. NETWORK STRUCTURE
The network is a generic fully-connected multi-layer perceptron with backpropagation. Its neurons are basically McCulloch-Pitts neurons with evaluation functions, and the network itself is the same network introduced by Frank Rosenblatt [2]. The characteristics of such a network can be listed as its evaluation function, its weight updating rule, its error function and so on. In NEUTRA these characteristics are defined as follows:
For the ith neuron of the jth layer, the firing function is:
$V_{ji} = f\left( \sum_{k=1}^{N(j-1)} V_{(j-1)k} W_{jik} \right)$
where $f(x) = \tanh(x)$.
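In code, firing a single neuron amounts to taking the tanh of the weighted sum over the previous layer. The sketch below is a minimal Python rendering of the formula above and assumes the previous layer's values and the incoming weights are given as plain lists.

    import math

    def fire(prev_values, incoming_weights):
        # V_ji = tanh( sum over k of V_(j-1)k * W_jik )
        weighted_sum = sum(v * w for v, w in zip(prev_values, incoming_weights))
        return math.tanh(weighted_sum)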
The standard error of the ith neuron of the jth layer in a multi-layer perceptron is:
$\delta_{ji} = f'(V_{ji})(exp_i - V_{ji})$ if j is the output layer,
$\delta_{ji} = f'(V_{ji}) \sum_{k=1}^{N(j+1)} \delta_{(j+1)k} W_{(j+1)ki}$ otherwise;
where $f'(x) = 1 - \tanh^2(x)$, V is the value fired by the neuron and W are the connection weights.
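The delta computations translate almost literally into code; the following Python sketch is an illustration under the same list-based assumptions, with `expected` standing in for the expected output exp_i.

    def f_prime(v):
        # Since v = tanh(x), the derivative is f'(x) = 1 - tanh^2(x) = 1 - v^2.
        return 1.0 - v * v

    def output_delta(v, expected):
        # delta_ji = f'(V_ji) * (exp_i - V_ji) for an output-layer neuron.
        return f_prime(v) * (expected - v)

    def hidden_delta(v, next_deltas, weights_to_next):
        # delta_ji = f'(V_ji) * sum over k of delta_(j+1)k * W_(j+1)ki for a hidden neuron.
        return f_prime(v) * sum(d * w for d, w in zip(next_deltas, weights_to_next))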
The weight updating rule is:
$W_{ji}(n) = W_{ji}(n-1) + \Delta W_{ji}(n)$
where
$\Delta W_{ji}(n) = \mu \Delta W_{ji}(n-1) + \beta \delta_{ji}(n) V_i(n)$
Here $\mu$ represents the momentum coefficient whereas $\beta$ represents the learning rate.
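Per weight, the update rule becomes a single line once the previous change is stored alongside the weight; the sketch below illustrates this, with argument names assumed only for clarity.

    def update_weight(weight, prev_change, delta, value, learning_rate, momentum):
        # Delta_W(n) = mu * Delta_W(n-1) + beta * delta(n) * V(n); then W(n) = W(n-1) + Delta_W(n)
        change = momentum * prev_change + learning_rate * delta * value
        return weight + change, change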
The redefined error is:
$E_{ref} = E_{std} + \alpha \sum_{i,j} (W_{ij})^2$
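In other words, the redefined error adds an alpha-weighted penalty on the squared connection weights to the standard error. A minimal sketch, assuming the weights are available as a flat list:

    def redefined_error(standard_error, weights, alpha):
        # E_ref = E_std + alpha * sum of squared connection weights.
        return standard_error + alpha * sum(w * w for w in weights)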
There is no stopping criterion defined; the training ends when the maximum number of loops is reached. I did not implement a stopping criterion for practical reasons: in almost every case the training made successful adjustments in a very small number of loops, which made a stopping criterion unnecessary.
When a neuron is disabled, its connections to the upper layers are disabled. When ‘Direct Input Connection’ is selected, new connections from input neurons to output neurons are added.
For both tasks the inputs are either 1 or -1, and so are the output values. This is basically a design decision made in order to establish a correspondence between the neuron values and the evaluation function tanh, which has a range of [-1, 1].
Initial weights are randomized in [0, 1]. The data separation specified with the ‘Training/Data’ parameter for the training and test phases is randomized too; in each training run the data given to the training and test phases will be different.
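Both randomizations are straightforward; the sketch below shows one way to draw an initial weight from [0, 1] and to split the generated patterns according to ‘Training/Data’. It is an illustration only, and the 100% case is handled as described above (the whole data set is reused for testing).

    import random

    def random_weight():
        # Initial weights are drawn uniformly from [0, 1].
        return random.random()

    def split_data(patterns, training_percent):
        # Randomly assign training_percent of the patterns to training, the rest to testing.
        shuffled = list(patterns)
        random.shuffle(shuffled)
        cut = int(len(shuffled) * training_percent / 100)
        training = shuffled[:cut]
        testing = shuffled if training_percent >= 100 else shuffled[cut:]
        return training, testing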
3. RESULTS & COMMENTS
Tests were performed in order to see the effects of changes in some specific parameters, as mentioned before: the error type in the first task and the Training/Data percentage in the second task. However, during the test procedures I also observed remarkable effects of some other parameters. In addition, I tested the first task's network structure on other similar tasks such as the AND and OR problems; those tests were meant to verify the structure only. The results of these tests are fully presented in Appendix B.
3.1 First Task
The first class of tests was performed with 10 different amounts of training data (Figure 3.1). The data counts are 5, 10, 15, 25, 50, 100, 250, 1000, 5000 and 10000 respectively. I used such small values since the network converges to a successful limit immediately. The Epochs / Tests and Epochs / Errors graphics are below (Figure 3.2, Figure 3.3). Since there is no significant difference among the other tests of the class with respect to these results, these snapshots can also be considered as mean values.
Figure 3.1: Configurations of the first class tests.
Figure 3.2: Epochs / Tests graphics for data size of 50.
Figure 3.3: Epochs / Errors graphics for data size of 50.
The ‘jumps’ in Figure 3.3 point out the oscillation due to the value of β. As seen in Figure 3.2, the network gives the exact solution after only the third loop.
The second class of tests was performed similarly. The data sizes are the same as in the first class; the only difference is that the ‘Error Type’ parameter is set to
“Redefined”. The figures of the test results, again corresponding to the fifth data size, are below (Figure 3.4, Figure 3.5).
Figure 3.4: Epochs / Tests graphics for data size of 50.
Figure 3.5: Epochs / Errors graphics for data size of 50.
Figure 3.6: Output / Input graphics for the output layer of the network trained with data size of 50.
As seen in the figures, the redefinition of the error has no significant effect on the network, or it is impossible to observe the difference since the training converges immediately, in the third loop. The decision surfaces do not differ either (Figure 3.6). This situation may have several reasons. Firstly, the simplicity of the task and the structure of the network do not allow us to differentiate the changes in this parameter: there is one neuron in the output layer and there is no specific heuristic function. Secondly, the input patterns are bipolarized, i.e. they can take only two values, which leads to a network responding only to such patterns. However, I have to mention that increasing the α coefficient leads the network to different convergence states (Figure 3.7); high values of α (> 0.1) make the network unsuccessful altogether.
3.2 Second Task
In the second task the main target is the ‘Training/Data’ parameter. The class tests were performed with three different values for the training data percentage: 70, 90, and 100 (Figure 3.8). In the first phase, their differences are plotted separately (Figure 3.9, Figure 3.10, and Figure 3.11). Since there is no difference among the Epochs / Tests graphics, they are not shown. Similar to the first task, the network converges in the second or the third loop (Figure B.1).
Figure 3.7: The convergence state of a large α value (0.1)
Figure 3.8: Configurations of the second class tests.
Figure 3.9: Epochs / Errors graphic for 70% training percentage with data size of 10000.
Figure 3.10: Epochs / Errors graphic for 90% training percentage with data size of 10000.
Figure 3.11: Epochs / Errors graphic for 100% training percentage with data size of 10000.
The main conclusion that can be drawn from these figures is that the network converges rapidly when it is tested with the training data (the third case). Another observation is that it converges more rapidly in the first case than in the second. This is because the training data is smaller and the network is able to adapt itself to the data more easily.
4. FURTHER COMMENTS
Throughout the implementation and testing processes, some of the characteristics of the tasks made it quite impossible to reach useful observations. Most importantly, the effect of the redefinition of the error criterion cannot be observed due to the simplicity of the problem; the network, as mentioned before, converges rapidly. As the training data size increases, the red area in the Input/Output graphic shrinks to some degree, i.e. the network becomes more and more stable. However, it is not sensible to hope for better results, since the input patterns are bipolarized.
I used relatively smaller data sizes in the first task than in the second one. The reason was that the convergence of the first task was established in the early steps of training, and choosing small values gave me the chance to observe small differences between the cases (if any existed). Independently, I found out that this rapid convergence of the network highly depends on the choice of the bias. Setting the bias to 0 makes the network divergent. Choosing the bias too large delays the convergence, whereas choosing a small value leads to different convergence states (Figure B.2).
The polarized character of the input patterns shows itself in the testing phase. All 9 types of tests performed give the same result in this phase (Figure 4.1). This is, ironically, a deceptive situation which impedes the convergence of the network and the observation of the effects of the different parameter configurations. I believe that more heuristic problems would be more suitable for seeing the effects of changes in these parameters.
Figure 4.1: Test results.
Kemal Taşkın
M.S. in Cognitive Sciences
METU - ANKARA
mail@kemaltaskin.net
http://www.kemaltaskin.net
18.03.2004
APPENDIX
A. NEUTRA 1.0 PROGRAM USER INTERFACE
Neuro-Trainer v1.0 is a multi-layer perceptron simulator (see Section 1) written in C#. Its user interface contains fields for adjusting network parameters and starting simple network actions (create, train, test, display) (Figure A.1).
Figure A.1: Standard interface of NEUTRA v1.0
The configurations panel was described before (see Section 1).
Tasks Panel: There are two radio buttons in this panel, corresponding to the XOR problem and the second generic problem. The combo box, which is active only when the first task is selected, includes the additional tasks mentioned before. When the second task is selected the ‘Training/Data’ parameter becomes active; it is not enabled in the first task.
Actions: This panel involves three major tasks concerning the network. The <Create> button creates the network, the <Train> button begins the training, and the <Delete> button deletes the network so that new networks can be created for further tests.
Display: This panel involves four buttons. The <Weights> button displays the connection weights of the network at that moment. The <Tests> button displays the graphics window for the test results (Correct/All), the <Errors> button displays the graphics window for the drifts of the results, and the <Decision Surfaces> button displays a window corresponding to the decision surfaces of each upper hidden layer.
Test: This panel has two buttons on it. The <Test> button tests the system with the specified number of test data items and displays the results. It is also possible to test a single input pattern with the <Ind. Test> button by writing the pattern into the text box on the left.
The source code and the executable are available at http://thesis.kemaltaskin.net/.
B. FURTHER TEST RESULTS & FIGURES
Figure B.1: Epochs / Tests graphics (common) for the second task.
Figure B.2: Epochs / Tests graphics for the first task with bias 0.1
Figure B.3: Epochs / Tests graphics for the first task with AND problem.
C. REFERENCES
1: Smagt, P. van der & Kröse, B. (1996), An Introduction to Neural Networks [p. 13]
2: Rosenblatt, F. (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain [pp. 386-408]
3: Kawaguchi, K. (2000), A Multithreaded Software Model for Backpropagation Neural Network Applications [Chp. 2.4.4]
4: Bodén, M. (2001), A Guide to Recurrent Neural Networks and Backpropagation