3.4. How to organize learning the networks?

(translation by Agata Krawcewicz, hogcia@gmail.com)
The necessary changes to the weight coefficients of each neuron are computed according to special rules (sometimes called the paradigms of neural networks). The number of different rules in use today, together with their variants, is frankly enormous, because almost every researcher has tried to make his own contribution to the field of neural networks in the form of a new learning rule. Some idea of the scale of this problem is given by the collection of learning algorithms described in my book "Neural networks", to which I keep referring the more inquisitive Readers for more detailed information (http://winntbg.bg.agh.edu.pl/skrypty/0001/).
Here I will briefly discuss (without using mathematics, because that is the deal we made at the very beginning of this book!) two basic learning rules: the rule of steepest descent, which underlies most algorithms of learning with a teacher, and the Hebb rule, which defines the simplest example of learning without a teacher.
The rule of steepest descent works as follows. Every neuron, having received definite signals at its inputs (from the network's inputs or from other neurons forming earlier stages of information processing), produces its own output signal using its accumulated knowledge, stored in the form of previously established values of all the amplification factors (weights) of all its inputs and (possibly) the threshold. The way neurons determine the value of the output signal from the input signals was discussed in more detail in the previous chapter. The output signal computed by the neuron at a given step of the learning process is compared with the model answer given by the teacher as part of the learning set. If there is a divergence (and at the beginning of the learning process such a divergence will almost always appear, because how on earth is the neuron supposed to know what we want from it?), the neuron takes the difference between its own output signal and the value which, according to the teacher, would be correct, and then determines (by means of the method of steepest descent, which I will explain shortly) how to change the values of its weights so that this error diminishes as quickly as possible.
In further considerations it will be useful to know the notion of the error surface, which we will now introduce and discuss in detail. You already know that the behavior of the network depends on the values of the weight coefficients of the neurons which are its elements. If you know the set of all weight coefficients appearing in all the neurons of the whole network, then you know how that network will behave. In particular, you can show such a network (in turn) all the examples of tasks available to you, together with their solutions, which make up the learning set. Each time the network produces its own answer to the question asked, you can compare it with the pattern of the correct answer found in the learning set, and so determine the error the network has committed. The measure of this error is usually the difference between the value of the result delivered by the network and the value of the result read from the learning set. To rate the overall performance of a network with a given set of weight coefficients in its neurons, we usually use the sum of the squares of the errors committed by the network for every case in the learning set. Before summing, the errors are squared, to avoid the effect of positive and negative errors mutually compensating each other; moreover, squaring means that the network receives an especially "heavy penalty" for large errors (a twice greater error contributes a quadruple component to the resulting sum).
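Although this book deliberately avoids formulas, the error measure described above can be sketched in a few lines of Python. Everything here is my own illustrative assumption: a simple linear neuron with two weights (like the tiny network of fig. 3.4) and an invented learning set.

```python
# For a fixed set of weights, run every example of the learning set
# through a simple linear neuron, square each difference between the
# neuron's output and the teacher's answer, and sum the squares.

def neuron_output(weights, inputs):
    """Weighted sum of the inputs -- the simple neuron model."""
    return sum(w * x for w, x in zip(weights, inputs))

def total_squared_error(weights, learning_set):
    """Sum of squared errors over the whole learning set."""
    total = 0.0
    for inputs, target in learning_set:
        error = neuron_output(weights, inputs) - target
        total += error ** 2  # squaring: no mutual compensation, big errors penalized
    return total

# Hypothetical two-weight example: each pair is (input signals, teacher's answer).
learning_set = [([1.0, 2.0], 3.0), ([2.0, 1.0], 3.0), ([1.0, 1.0], 2.0)]
print(total_squared_error([1.0, 1.0], learning_set))  # perfect weights -> 0.0
```

Evaluating this total error for every combination of the two weights would give exactly the kind of surface shown in the following figures.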
Fig. 3.4. Method of forming the error surface. Discussion in the text. Please observe the yellow rectangle and yellow cloud first, and the navy blue rectangle and cloud next.
Look at figure 3.4. It shows a situation that could take place in a network so small that it has only two weight coefficients. Such small neural networks do not exist, but imagine that you have one, simply because only for such a very small network can we draw its behavior without going into difficult, multidimensional spaces. Every state of better or worse learning of this network corresponds to some point on the horizontal (light-blue) surface visible in the figure, whose coordinates are the two weight coefficients under consideration. Now imagine that you set in the network the values of the weights corresponding to the location of the red point on this surface. Testing such a network by means of all the elements of the learning set, you will find the total value of the error of this network, and at the place of the red point you will put a (red!) arrow pointing up, whose height is the calculated value of the error (according to the description of the vertical axis in the figure).
Next, choose other values of the weights, marking another position of the point on the surface (navy blue), and perform the same actions, obtaining the navy blue pointer.
Now imagine that you perform these actions for all combinations of weight coefficients, that is, for all points of the light-blue surface. In some places the errors will be greater, in others smaller, which you would be able to see (if you had the patience to examine your network so many times) in the form of an error surface spreading over the surface of the varied weights. An example of such a surface is shown in fig. 3.5.
Fig. 3.5. Example of an error surface formed during neural network learning
As you can see, there are many "knolls" on this surface; these are the places where the network commits especially many errors, so such places should be avoided. There are also many deep valleys, which we find very interesting, because there, at the bottom of such valleys, the neural network commits few errors, that is to say, it solves its tasks especially well.
How can we find such a valley?
You should consider learning a neural network as a multistage process. During this process you try, step by step, to improve the values of the weights in the network, exchanging old (worse) sets of weights, which cause the network to commit a large error, for new ones, of which you hope (but cannot be certain) that they are better. Look at figure 3.6, in which I have tried to illustrate this.
Fig. 3.6. Illustration of the neural network learning process as "sliding down" the error surface
You begin from the situation shown in the bottom left corner of the figure: you have some old set (a vector) of weights, marked on the surface of the network's parameters with a yellow circle. For this vector of weights you find the error which the network makes, and you "land" on the error surface at the place marked with a yellow arrow in the upper left corner of the figure. Nothing good can be said about this situation: the error is very high, so the network temporarily has very bad parameters. This must be improved. How?
Well, exactly. Methods of learning neural networks can find the direction in which the weight coefficients should be changed to obtain the effect of diminishing the error. This direction of the steepest descent of the error is marked in fig. 3.6 as a large black pointer. Unfortunately, the details of how the learning methods do this cannot be explained without complicated mathematics and such notions as the gradient or the partial derivative; however, the conclusions from these quite complicated mathematical considerations are simple enough. Every neuron in the network modifies its own weight coefficients (and possibly the threshold) using the following two simple rules:
weights are changed the more strongly, the greater the detected error;
weights connected with those inputs on which large values of input signals appeared are changed more than the weights of those inputs on which the input signal was small.
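These two rules can be sketched together in one line of Python: the correction of each weight is proportional both to the detected error and to the signal on that particular input. The function name and the learning rate value are my own illustrative choices, not something from this book.

```python
# One weight-update step following the two rules above.

def update_weights(weights, inputs, error, learning_rate=0.1):
    """Return new weights: a greater error -> stronger change;
    a larger input signal on an input -> stronger change of its weight;
    a zero input signal -> its weight stays untouched."""
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]

weights = [0.5, 0.5]
inputs = [1.0, 0.0]   # the second input received no signal
error = 2.0           # neuron's output minus the teacher's answer
new_weights = update_weights(weights, inputs, error)
print(new_weights)    # the second weight is unchanged
```

Notice that with `error = 0` the weights come back unchanged, and a zero input signal leaves its weight alone, exactly as the text below explains.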
In practice the basic rules just described still need several additional corrections (in a moment I will say more about them), but the outlined method of learning should already be clear. Knowing the error committed by the neuron and knowing its input signals, you can easily predict how its weights will change. Notice also how logical and sensible these mathematically derived rules are: for example, a faultless reaction of the neuron to a given input signal should of course leave its weights unchanged, because they led us to success. And this is just what happens!
Notice that a network using the described methods in practice stops the learning process by itself when it is already well trained, because small errors cause only minimal, "cosmetic" corrections of the weights. Equally logical is the rule making the size of a correction depend on the size of the input signal delivered through the considered weight: those inputs on which greater signals appeared had a greater influence on the result of the neuron's activity, which proved to be incorrect, so it is necessary to "discipline" them more strongly. In particular, the described algorithm causes that for the inputs on which at this very moment no signals were given (during the calculations they had zero values), the corresponding weights are not changed, because we do not know whether they are good or not: they did not participate in the creation of the current (surely incorrect, if it needs improving) output signal.
Returning to the presentation of one step of the learning process, shown in fig. 3.6, notice what happens next: having found the direction of the steepest descent of the error, the learning algorithm makes a move in the space of weights, consisting of exchanging the old (worse) vector of weights for a new (better) one. This move causes us to "slide down" the error surface to a new point, most often situated lower, that is to say, bringing the network nearer to the longed-for valley in which the errors are smallest and the solutions of the given tasks most perfect. Such an optimistic scenario of gradual and efficient movement toward the place where the errors are smallest is shown in figure 3.7.
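The whole multistage "sliding down" can be sketched by repeating the single update step many times, once per example of the learning set, over many passes. As before, the neuron model, the learning set, the learning rate and the number of steps are all my own illustrative assumptions, not values from this book.

```python
# Iterative learning of a two-weight linear neuron: each pass over the
# learning set makes a small correction of the weights against the error,
# so the point representing the network slides down the error surface.

def train(learning_set, weights, learning_rate=0.05, passes=200):
    for _ in range(passes):
        for inputs, target in learning_set:
            output = sum(w * x for w, x in zip(weights, inputs))
            error = output - target
            # move each weight a little in the direction of steepest descent
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights

learning_set = [([1.0, 2.0], 3.0), ([2.0, 1.0], 3.0), ([1.0, 1.0], 2.0)]
final = train(learning_set, [0.0, 0.0])
print(final)  # approaches [1.0, 1.0], the bottom of the valley for this data
```

For this invented learning set the valley of the error surface lies at the weights [1.0, 1.0], and the sketch indeed creeps toward it, step after step, just like the yellow point in fig. 3.7.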
Fig. 3.7. Searching for (and finding!) the network parameters (weight coefficients of all neurons) guaranteeing the minimal value of the error during supervised learning.