2.2 How to make an artificial neuron? (translation by Agata Barabasz agata.barabasz@op.pl)

The basic "building material" we use to create a neural network is the artificial neuron, and we will now examine it more closely. In the previous chapter you saw some pictures illustrating the shape of a biological neuron, but it will do no harm to recall one more, so look at picture 2.4 to see what an exemplary neuron looks like (in simplified form).

Picture 2.4. The approximate structure of a biological nerve cell (a neuron)

So that you do not think that all real neurons look exactly like that, in picture 2.5 I show you one more illustration of a real biological neuron, dissected out of a rat's cerebral cortex.

Picture 2.5. The view of a microscopic preparation of a real neuron

It is hard to guess in this picture which of the many visible fibres is the axon – which is always single and is the only one to carry signals from the given neuron to all the others – and which ones play the role of dendrites. Nevertheless, this too is a real biological neuron, and it is such a cell that our artificial neuron has to model well; we will now look at that model more closely.

The artificial neurons used to build networks are, of course, very simplified models of the nerve cells that occur in nature. The structure of an artificial neuron is best illustrated by the scheme presented in picture 2.6. Comparing this illustration with pictures 2.4 or 2.5, you will realise how far researchers of neural networks simplify biological reality.

Picture 2.6. A general scheme of an artificial neuron shows the degree of its simplification

However, in spite of these simplifications, artificial neurons keep all the features that matter for the tasks we want to entrust to them within the networks we build, these being computer science tools rather than models for biology.

Firstly, they have many inputs and one output. The input signals xi (i = 1, 2, …, n) and the output signal y may take only numerical values, usually in the range from 0 to 1 (sometimes also from –1 to +1). The fact that within the tasks solved by networks these values represent some information (e.g., as the output, a decision about who has been recognised by a neural network analysing someone's photo) is the result of a specific convention. In general, particular meanings are ascribed to the network's input and output signals in such a way that what matters most is on which input or output a given signal appears (each input and output is associated with a specific meaning of a signal). In addition, signal scaling is used, chosen so that the signal values circulating in a network do not fall outside the agreed range, e.g. from 0 to 1 (a small sketch of such scaling follows picture 2.7).

Secondly, artificial neurons perform specific operations on the signals they receive on their inputs and, as a consequence, produce signals (exactly one per neuron) that appear on their outputs and are sent onward (to other neurons, or to the network's output as the solution of the problem posed). The network's task, reduced to the functioning of its basic element, the neuron, is that it transforms the input data xi into a result y, applying rules that follow from how it has been built and from what it has been taught. The neuron's properties considered up to this point are illustrated in picture 2.7.

Picture 2.7. Basic signals occurring in a neuron
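If you would like to see how such scaling might be done in practice, here is a minimal sketch in the Python language. The text fixes no particular scaling formula, so the simple min-max mapping below, like the function name and the example values, is merely one possible illustration:

    def scale_to_unit_range(values, lo, hi):
        """Map raw signal values from the range [lo, hi] into [0, 1].

        Min-max scaling is only one of many possible conventions for
        keeping signals inside the agreed range; it is assumed here
        purely for illustration.
        """
        return [(v - lo) / (hi - lo) for v in values]

    # Example: raw readings between 0 and 255 become signals in [0, 1].
    raw = [0, 64, 128, 255]
    print(scale_to_unit_range(raw, 0, 255))  # [0.0, 0.25..., 0.50..., 1.0]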
Thirdly, neurons may learn. This purpose is served by the coefficients wi, called synaptic weights. As you certainly remember from the previous chapter, these reflect the rather complicated biochemical and bioelectric processes that take place in the synapses of a real biological neuron. From the point of view of our further considerations, the most significant thing is that the synaptic weights can be modified (i.e., their values can be changed), which constitutes the basis for teaching networks. A scheme of a neuron capable of learning is shown in picture 2.8.

Picture 2.8. Adding adjustable weight coefficients to a neuron's structure makes it a learnable unit

Summing up this discussion, we can state that artificial neurons can be treated as elementary processors with the following features:

- each neuron receives many input signals xi and on their basis determines its own "answer" y, that is, it produces one output signal;

- with each separate input of the neuron a parameter called the weight wi is associated; this name reflects the fact that it expresses the degree of significance of the information arriving at the neuron through this particular input;

- a signal coming in through a particular input is first modified using the weight of that input; most often the modification consists simply of multiplying the signal by the weight of the given input, so that in further calculations it participates in a modified form: strengthened (if the weight is greater than 1) or attenuated (if the weight is less than 1); a signal from a particular input may even appear with the opposite sign relative to signals from other inputs, if its weight has a negative value; among users of neural networks, inputs with negative weights are called inhibitory inputs, whereas those with positive weights are called excitatory inputs;

- the input signals (modified by the appropriate weights) are aggregated in the neuron (see picture 2.9); considering networks in general, many ways of aggregating input signals can be given, but most often the signals are simply summed up, giving as a result a helpful internal signal called the cumulative neuron stimulation or postsynaptic stimulation; this signal is also referred to as the net value;

Picture 2.9. The aggregation of input data, s = g(wi, xi), i = 1, …, n, as the first of the neuron's internal functions

- to the sum of signals created in this way the neuron sometimes adds (not in all types of networks, but quite often) an extra component independent of the input signals, called a bias; the bias, if it is taken into account, also undergoes the learning process, which is why one can imagine it as an extra synaptic weight associated with an input fed with an internal signal of constant value equal to 1; the role of the bias is that, thanks to its presence, the neuron's properties can be shaped much more freely during learning (without it, the characteristic of the aggregation function must always pass through the origin of the coordinate system, which is sometimes a burdensome "anchor"); a scheme of a neuron in which a bias has been taken into account is shown in picture 2.10.

Picture 2.10. The application of the additional parameter, the bias

The sum of the input signals multiplied by the weights, plus (possibly) the bias, may sometimes be sent directly to the neuron's axon and treated as the neuron's output signal; in many types of networks that is enough.
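The weighted aggregation with a bias described above can be expressed in a few lines of Python. Treat this as a minimal sketch rather than a definitive implementation; the function name and the example numbers are my own illustrative choices:

    def aggregate(inputs, weights, bias=0.0):
        """Cumulative (postsynaptic) stimulation of a neuron.

        Each input signal is multiplied by the weight of its input and
        the products are summed; an optional bias, independent of the
        input signals, is added on top.  A negative weight acts as an
        inhibitory input, a positive one as an excitatory input.
        """
        return sum(w * x for w, x in zip(weights, inputs)) + bias

    # Two excitatory inputs (weights 1.2 and 0.7) and one inhibitory
    # input (weight -0.4), plus a small bias:
    print(aggregate([0.5, 0.9, 0.3], [1.2, 0.7, -0.4], bias=0.1))  # about 1.21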
Networks working in this way are called linear networks (an example being the network named ADALINE, from ADAptive LINear Element). However, in networks with richer abilities (for example, in the very popular networks called MLP, from the words Multi-Layer Perceptron), the neuron's output signal is calculated by means of some nonlinear function. Throughout this book we will denote this function by the symbol f( ) or φ( ). A scheme of a neuron including both the aggregation of the input signals and the generation of the output signal is presented in picture 2.11.

Picture 2.11. The full set of the neuron's internal functions

The function φ( ) is called the characteristic of the neuron (its transfer function). Many different neuron characteristics are known, as illustrated in picture 2.12. Some of them are chosen so that the artificial neuron's behaviour is as similar as possible to that of a real biological neuron (the sigmoid function), but they can also be selected so as to ensure the maximum efficiency of the computations carried out by the neural network (the Gauss function). In all cases the function φ( ) constitutes an important link between the joint stimulation of a neuron and its output signal.

Picture 2.12. Some of the more often used neuron characteristics

Knowing the input signals, the weight coefficients, the method of aggregating the inputs, and the neuron's characteristic allows its output signal to be determined unequivocally at any moment, with the usual assumption that (in contrast to what takes place in real neurons) this process occurs instantaneously. Thanks to this, in artificial neural networks changes of the input signals appear on the output practically immediately. Of course, this is a purely theoretical assumption, because even in an electronic realization some time would be needed after a change of the input signals for the appropriate integrated circuit to establish the right value of the output signal. Much more time would be necessary to achieve the same effect in a network working as a simulation model, because a computer imitating the network's activity must then calculate the values of all signals on the outputs of all neurons of the network, which even on very fast computers can take a lot of time. In speaking of a prompt neuron action I mean that, when considering the network's functioning, we will pay no attention to the neuron's reaction time, because it will be insignificant for us. The complete structure of a single neuron is presented in picture 2.13.

Picture 2.13. Structure of a neuron (a linear neuron) as a processor, which is the basis for building neural networks

The neuron presented in this picture is the most typical "material" used for creating a network. More precisely, such typical "material" is the neuron of a network known as the MLP (Multi-Layer Perceptron), the most crucial elements of which I have collected and presented in picture 2.14. It is visible in this picture that the MLP neuron is characterised by an aggregation function consisting of simply summing up the input signals multiplied by the weights, s = w1x1 + w2x2 + … + wnxn, and uses a nonlinear transfer function with a distinctive sigmoid shape, in this case y = 1 / (1 + exp(–0.5 s)).

Picture 2.14. The most popular component of neural networks – the MLP type neuron
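Putting together everything said so far, the MLP neuron of picture 2.14 can be written down directly in Python: weighted-sum aggregation followed by the sigmoid transfer function y = 1 / (1 + exp(–0.5 s)) visible in that picture. This is a minimal sketch; the steepness coefficient 0.5 comes from the picture, while the function name and the example inputs are illustrative assumptions:

    import math

    def mlp_neuron(inputs, weights, bias=0.0, beta=0.5):
        """A single MLP-type neuron: weighted-sum aggregation followed
        by a sigmoid transfer function.

        With beta = 0.5 the transfer function is exactly the
        y = 1 / (1 + exp(-0.5 * s)) shown in picture 2.14.
        """
        s = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1.0 / (1.0 + math.exp(-beta * s))

    # The output always stays strictly between 0 and 1:
    print(mlp_neuron([0.2, 0.8], [1.5, -0.6]))  # about 0.48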
However, in neural networks intended for special purposes, so-called radial neurons are sometimes used. They have an atypical method of aggregating the input data, they use an untypical characteristic (a Gaussian one), and they are taught in an unusual way. At this moment I do not intend to elaborate on these specific neurons, which are used mainly to create the special networks called RBF (Radial Basis Function) networks, but in picture 2.15 I present a scheme of such a radial neuron, to let you compare it with the typical neuron discussed earlier and shown in picture 2.14. In this type of neuron the aggregation of the input signals consists of evaluating the distance r = ||X – T|| between the present input vector X and the centroid T of a certain subset, determined during the teaching process. The nonlinear transfer function of these neurons also has a different form: a "bell-shaped" gaussoid, i.e. a non-monotonic function.

Picture 2.15. The structure and peculiar properties of a radial neuron, also denoted as RBF
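To make the comparison with the MLP neuron easier, here is a minimal sketch of such a radial neuron. I assume a Euclidean distance and a simple Gaussian transfer function with a width parameter r; that parameter's name and default value are my own illustrative choices, not something fixed by the text:

    import math

    def rbf_neuron(x, t, r=1.0):
        """A radial (RBF-type) neuron.

        Aggregation: the Euclidean distance between the input vector x
        and the centroid t found during the teaching process.
        Transfer function: a "bell-shaped" Gaussian of that distance.
        """
        dist = math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, t)))
        return math.exp(-((dist / r) ** 2))

    # The output is largest (1.0) when the input coincides with the
    # centroid and falls off in a bell shape as the input moves away:
    print(rbf_neuron([0.5, 0.5], [0.5, 0.5]))  # 1.0
    print(rbf_neuron([0.9, 0.1], [0.5, 0.5]))  # about 0.73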