7.1. What is backpropagation? (translation by Piotr Czech, Piotr.Czech@polsl.pl)

In the previous chapter we discussed selected aspects of the operation and training of a single-layer neural network built from non-linear elements. Here we continue the analysis and show how multi-layer non-linear networks can work. Such networks have, as you already know, much richer and more interesting capabilities – which you may have discovered while examining the behaviour of the program Example 06. Today we will talk about how such networks can be used and trained.

You already know that we are going to build multi-layer networks from non-linear neurons. You also know how a non-linear neuron can be trained (revise the program Example 06). What you do not know (and even if you know, you have not yet felt it!) is that the fundamental difficulty in training multi-layer networks built of non-linear neurons is the problem of the so-called hidden layers (fig. 7.1).

Fig. 7.1. Hidden layers in a multi-layer neural network

What does this problem consist of? The training rules you learned in the previous sub-chapters were based on a simple but very effective method: each neuron of the network individually corrected its knowledge (by changing the values of the weight coefficients on all of its inputs) on the basis of the known value of the error it had made. In a single-layer network the situation was simple and obvious: the output signal of each neuron was compared with the correct value given by the teacher, which provided a sufficient basis for correcting the weights.

In a multi-layer network the process is not so easy. Neurons of the final (output) layer can have their errors estimated in a fairly simple and reliable way – as before, by comparing the signal produced by each neuron with the model signal given by the teacher. But what about the neurons in the earlier layers? Here the errors must be estimated mathematically, because they cannot be measured directly – we lack the information about what the correct values of their signals SHOULD be, because the teacher does not specify the intermediate values; he or she concentrates only on the final result.

A method commonly used to "guess" the errors of neurons in hidden layers is called backpropagation (backward propagation of errors). This method is so popular that most ready-made programs for building and training network models apply it as the default method, although there are nowadays many other training methods, for example an accelerated variant of this algorithm called quickpropagation, as well as methods based on more sophisticated mathematics, such as the conjugate gradient method and the Levenberg-Marquardt method. These methods (together with at least a dozen even more sophisticated ones) have the advantage of being very fast. But this advantage appears only when the problem that the neural network is supposed to solve (by discovering the method of its solution during training) meets all of their demanding mathematical assumptions. Unfortunately, in most cases when we know the problem that the network is supposed to solve, we do not know whether it satisfies these complicated assumptions or not. What does this mean in practice?
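To make the distinction concrete, here is a minimal numerical sketch (not taken from the program Example 06; the array names, the network size and the random weights are invented purely for illustration). It shows that the error of the output layer can be measured directly against the teacher's signal, while the errors of the hidden layer have to be reconstructed from it by sending the output errors backwards through the same connection weights:

```python
import numpy as np

def sigmoid(s):
    # One possible non-linear characteristic of a neuron.
    return 1.0 / (1.0 + np.exp(-s))

# A tiny illustrative network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(2, 3))   # weights of the hidden-layer neurons
W_output = rng.normal(size=(1, 2))   # weights of the output neuron

x = np.array([0.5, -1.0, 0.2])       # an example input vector
z = np.array([1.0])                  # the model signal given by the teacher

# Forward pass: signals produced by the hidden and output neurons.
y_hidden = sigmoid(W_hidden @ x)
y_output = sigmoid(W_output @ y_hidden)

# Output layer: the error is known directly, as in a single-layer network.
delta_output = z - y_output

# Hidden layer: the error cannot be measured, so it is estimated ("guessed")
# by propagating the output errors backwards through the output weights.
# (The full algorithm additionally uses the derivative of the neuron
# characteristic when correcting the weights, as discussed later.)
delta_hidden = W_output.T @ delta_output

print("output error:", delta_output)
print("estimated hidden errors:", delta_hidden)
```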
In a nutshell, the situation is as follows. We have a difficult task to solve, so we take a neural network and start to train it with one of these sophisticated, modern methods – for example the Levenberg-Marquardt algorithm. If our problem happens to be of the kind that this method can handle, the network will quickly become well trained. But if not, such a modern algorithm may lead us endlessly astray and the network will learn nothing, because its theoretical assumptions were not satisfied from the beginning.

In contrast, the backpropagation method, which I shall present in this chapter, has the nice property that it works regardless of any such theoretical assumptions. In other words, unlike the other clever algorithms, which only sometimes work, backpropagation always works. Of course, it may sometimes work irritatingly slowly – but it will never let you down. It is worth getting to know this method, because people who use neural networks professionally often and willingly come back to it as to a well-tried partner.

I will present the backpropagation method in action by analysing the behaviour of another program which I will present here. Before that happens, however, we must return to one detailed issue which will be very important here and which has been somewhat neglected so far: the shape of the non-linear characteristic used in the neurons of the network.
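As a small foretaste of why this shape matters (a hedged aside; the detailed discussion follows in the text), one commonly used characteristic is the logistic sigmoid, whose slope can be expressed conveniently through the neuron's own output – a property that backpropagation exploits:

```python
import numpy as np

def sigmoid(s):
    # Logistic (sigmoid) characteristic of a non-linear neuron.
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_slope(s):
    # Its derivative, expressible through the output itself: y * (1 - y).
    y = sigmoid(s)
    return y * (1.0 - y)

for s in (-5.0, 0.0, 5.0):
    print(f"s={s:+.1f}  y={sigmoid(s):.3f}  dy/ds={sigmoid_slope(s):.3f}")
```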