7.1. What is backpropagation?

(translation by Piotr Czech, Piotr.Czech@polsl.pl)
In the previous chapter we discussed selected aspects of the functioning and teaching of a
single-layer neural network built from non-linear elements. Here we continue the analysis,
showing how multi-layer non-linear networks can work. Such networks have, as you
already know, far more significant and interesting possibilities – as you may have found
while examining the behaviour of the program Example 06. Today we will talk about how such
networks can be used and taught.
You already know that we are going to build multi-layer networks from
non-linear neurons. You also know how a non-linear neuron can be taught (revise the program
Example 06). What you do not know (and even if you know it, you have not yet felt it!) is that the basic
problem in teaching such multi-layer neural networks built out of non-linear neurons is the
problem of the so-called hidden layers (fig. 7.1).
Fig. 7.1. Hidden layers in a multi-layer neural network
What does this problem mean?
The teaching rules that you acquired in the previous sub-chapters were based on a
simple but very effective principle: each neuron of the network individually introduced corrections
to its working knowledge (by changing the values of the weight coefficients on all of its inputs) on
the basis of the known value of the error it had made. In the case of a single-layer network the
situation was simple and obvious: the output signal of each neuron was compared with the correct
value given by the teacher, and this comparison gave a sufficient basis for correcting the weights. In the case of a multi-layer network the process is not so easy. The neurons of the final (output) layer may have their errors
estimated in quite a simple and certain way – as before, by comparing the signal produced
by each neuron with the model signal given by the teacher.
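To make this concrete, here is a minimal sketch (in Python, not taken from the book's example programs) of how the error of an output-layer neuron could be measured and used to correct its weights, assuming the simple correction rule from the earlier chapters; the names `inputs`, `weights`, `target`, `eta` and the sigmoid characteristic are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    # Illustrative non-linear characteristic; its shape is discussed later in this chapter.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single output neuron with three inputs.
inputs = np.array([0.5, -0.2, 0.8])
weights = np.array([0.1, 0.4, -0.3])
eta = 0.1                          # teaching (learning) rate

output = sigmoid(np.dot(weights, inputs))
target = 1.0                       # model signal given by the teacher

# The output-layer error can be measured directly:
# compare the produced signal with the teacher's model signal.
error = target - output

# Each weight is corrected on the basis of this known error.
weights += eta * error * inputs
```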
But what about the neurons in the earlier layers? Here the errors must be estimated
mathematically, because they cannot be measured directly – we lack the information about what
the correct signal values SHOULD be, since the teacher does not define the
intermediate values; he or she concentrates only on the final effect.
The method commonly used to “guess” the errors of the neurons in the hidden layers is
called backpropagation (backward propagation of errors). This method is so popular
that in most ready-made programs used to create network models and to teach networks
it is applied as the default method, although there are currently many other teaching
methods – for example an accelerated variant of this algorithm called quickpropagation, as well
as methods based on more sophisticated mathematics, such as the conjugate gradient
method and the Levenberg-Marquardt method. These methods (along with at least a dozen
more that are even more sophisticated) have the advantage of being very fast. But this
advantage appears only when the problem to be solved by the neural network
(by discovering the method of its solution in the course of the teaching process) meets all their sophisticated
mathematical requirements. However, in most cases, even when we know the problem that should be
solved by the network, we do not know whether it meets such complicated assumptions or not.
What does this mean in practice?
In a nutshell, the situation is as follows:
We have a difficult task to solve, so we take a neural network and start to teach it
with one of these sophisticated and modern methods – for example the
Levenberg-Marquardt algorithm. If our problem happens to be the kind that this method can teach
easily, then the network will quickly become well trained. But if not –
if the modern algorithm endlessly leads us astray and the network learns
nothing – it means that its theoretical assumptions were not met from the very beginning. In contrast
to such methods, backpropagation, which I shall present in this chapter, has the nice
advantage that it works regardless of any theoretical assumptions.
It means that, unlike the other clever algorithms which only sometimes work, the
backpropagation method always works. Of course, it may sometimes work irritatingly slowly –
but it will never let you down. It is worth getting to know this method, because people who
use neural networks professionally often and willingly come back to it as to a well-tried partner.
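Before the program demonstration, a minimal numerical sketch may help fix the idea. It is not the book's program; the tiny two-layer network, the weight values and the sigmoid characteristic are assumptions chosen only to show how a hidden neuron's error can be estimated from the output-layer error sent backward through the connecting weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
x = np.array([0.3, 0.9])
W_hidden = np.array([[0.2, -0.4],     # weights of hidden neuron 0
                     [0.7,  0.1]])    # weights of hidden neuron 1
w_output = np.array([0.5, -0.6])      # weights of the output neuron

# Forward pass.
hidden_out = sigmoid(W_hidden @ x)
output = sigmoid(np.dot(w_output, hidden_out))

# Output-layer error: measured directly against the teacher's model signal.
target = 1.0
delta_output = (target - output) * output * (1.0 - output)

# Hidden-layer errors: they cannot be measured, so they are estimated by
# propagating the output error backward through the connecting weights.
delta_hidden = w_output * delta_output * hidden_out * (1.0 - hidden_out)

# With these estimated errors, each hidden neuron can correct its own weights
# just as in the single-layer case.
eta = 0.1
W_hidden += eta * np.outer(delta_hidden, x)
w_output += eta * delta_output * hidden_out
```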
The backpropagation method will be presented in action through the analysis of the
behaviour of another program, which I will present here. Before that happens, however, we must
come back to one detailed issue which will be very important here and which has been somewhat
neglected so far. I will tackle the problem of the shape of the non-linear characteristics used in
the neurons.
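As a small preview of that issue, one non-linear characteristic commonly used in such networks is the logistic (sigmoid) function; the sketch below is only an illustration of its shape and slope, and assumes this particular choice rather than the exact characteristic used in the program described later.

```python
import numpy as np

def sigmoid(z):
    # Smooth, S-shaped characteristic: the output runs between 0 and 1.
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # The slope is largest near z = 0 and vanishes for large |z|,
    # which matters for how error signals are propagated backward.
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"z = {z:+.1f}  sigmoid = {sigmoid(z):.3f}  slope = {sigmoid_derivative(z):.3f}")
```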