Chapter 9. Artificial Neural Networks

Biological Neurons and Neural Networks

The human brain consists of a densely interconnected network of around 10 billion neurons, about the same as the number of stars in a typical galaxy (and there are more than 100 billion galaxies in the universe). The brain's neural network provides it with enormous processing power, enabling it to perform computationally demanding perceptual acts such as face recognition and speech recognition. The brain also provides us with advanced control functions, such as walking, but perhaps most important of all, it can learn how to do all of this. Whereas a modern CPU chip's performance derives from its raw speed (3 GHz = 3,000,000,000 Hz), a neural network is slow, computing at a frequency of 10-100 Hz. The brain's performance derives instead from the massive parallelism of its neural network, where each of the 10 billion neurons may be connected to around 1000 other neurons.

Figure 1.

To get an idea of the structure of a biological neural network, glance at Figure 1, a sketch of a section through an animal retina and one of the first visualisations of a neural network ever produced, made by Golgi and Cajal, who received a Nobel Prize in 1906. You can see roundish neurons with their output axons. Some axons leave the area (those at the bottom, which form the 'optic nerve'), while others feed into other neurons via their input connections, called dendrites. Neuron e receives its input from four other neurons and sends its output down to the optic nerve. So we see that a neuron has several inputs, a body, and one output which connects to other neurons. It was later realised that neurons function electrically, sending electrical signals down their axons based on flows of potassium and sodium ions. This signal is in the form of a pulse (rather like the sound of a hand clap). A single neuron emits a pulse ("fires") only when its total input is above a certain threshold.
This characteristic led to the McCulloch and Pitts (1943) model of the artificial neural network (ANN). Glance briefly at Figure 2, which illustrates how learning occurs in a biological neural network. It is assumed here that cells A and B both fire often and simultaneously over time. When this condition occurs, the strength of the synaptic connection between A and B increases. This concept was contributed by Hebb (1949) and is known as Hebbian Learning: if two neurons are simultaneously active, then their interconnection should be strengthened.

The Mathematics of a Single Artificial Neuron

The McCulloch–Pitts model of a single neuron is sketched in Figure 3. We can think of the calculation as proceeding in two parts: input processing, then the calculation of the output. The inputs to the neuron body are shown as x1, x2, ..., xN. When the inputs (from the previous neurons) reach this neuron, their values are multiplied by the corresponding synaptic strengths. As mentioned above, these strengths depend on how much learning has occurred. The strengths are called weights and are represented by w1, etc. So a single weighted input is calculated as w1 x1, and the total weighted input to the neuron is

w1 x1 + w2 x2 + w3 x3 + ... + wN xN

This total input is then passed on to the output-calculation part of the neuron. Here, if the total input is greater than some threshold θ, the neuron will 'fire' and produce an output. This is shown in Figure 4, where the mathematical function used to model the output calculation, the "activation function", is a simple step function. Note that the output of the neuron is effectively binary: fire or no-fire, 1 or 0. While it is not directly relevant to our work here, it is interesting to note that correct choices of weights and thresholds make the neuron behave like a logic gate, in particular AND and NOT.
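The McCulloch–Pitts neuron just described, a weighted sum followed by a step activation, can be sketched in a few lines of Python. This is a minimal illustrative sketch; the function name and the particular weight and threshold values are our own choices, not from the chapter.

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: fire (return 1) only when the
    total weighted input w1*x1 + ... + wN*xN exceeds the threshold theta."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > theta else 0

# With suitable weights and thresholds the neuron acts as a logic gate:
def AND(a, b):
    return mp_neuron([a, b], [1.0, 1.0], 1.5)   # fires only for (1, 1)

def NOT(a):
    return mp_neuron([a], [-1.0], -0.5)         # fires only for input 0
```

For example, `AND(1, 1)` gives 1 while `AND(1, 0)` gives 0, showing how the choice of weights and threshold determines the gate being modelled.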
Given that any CPU can be described as a combination of AND and NOT gates, we conclude that neural networks can in principle provide all the computations performed on PCs.

It is possible to relax the binary nature of the output to obtain real-numbered outputs (such as decimals). This is done using the 'sigmoid' function, illustrated in Figure 5. The parameter k adjusts the slope of the transition between the no-fire and fire states, as shown in Figure 5. The mathematical description of the sigmoid function is

f(a) = 1 / (1 + e^(-ka))

It is clear that choosing a large value of k approximates the binary McCulloch–Pitts neuron more closely.

Figure 5. Sigmoid Functions. Left has k = 10, right has k = 5. Large k means steep slope.

Neural Networks and Braitenberg Vehicles

In order to understand how individual neurons may be connected together to form neural networks, let's consider the simple example of a "Braitenberg Vehicle". Braitenberg is a … He proposed his vehicles to demonstrate that apparently purposive behaviour does not require a representation of the external environment in a creature's brain. Rather, behaviour can be obtained by simply reacting to the environment in a structured manner. Let's look at the vehicles shown in Figure 6 (from his book). Each vehicle has two eye sensors and two motor actuators. In vehicle (a) the right eye is connected to the right motor and vice versa. As positioned in the diagram, the right eye receives more light than the left eye, since it is closer to the light, so the drive to the right motor is greater than the drive to the left, and the vehicle moves away from the light. In vehicle (b), on the other hand, the eyes and motors are cross-coupled, so now, when the left eye receives more light, it gives the right motor more drive, and the vehicle moves towards the light.

Figure 6

How does this relate to neural networks? Each eye can be considered a sensor neuron, and each motor is driven by an actuator neuron.
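The sigmoid function above is simple to compute directly. The following sketch (with our own function name; the k values mirror those quoted for Figure 5) shows how large k drives the output towards the binary fire/no-fire behaviour of the McCulloch–Pitts neuron.

```python
import math

def sigmoid(a, k=1.0):
    """Sigmoid activation f(a) = 1 / (1 + e^(-k*a)).
    The parameter k sets the steepness of the fire/no-fire transition."""
    return 1.0 / (1.0 + math.exp(-k * a))

# At a = 0 the output is exactly halfway between no-fire and fire:
midpoint = sigmoid(0.0)            # 0.5 for any k

# With a large k the sigmoid approximates the binary step function:
nearly_fire = sigmoid(0.5, k=10)   # close to 1
nearly_off = sigmoid(-0.5, k=10)   # close to 0
```

Note that, unlike the step function, the sigmoid output is a real number strictly between 0 and 1, which is what allows the graded motor drives used by the vehicles below.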
So we have the basic 2-input, 2-output neural network shown in Figure 7(a). This is indeed a simple network, yet it turns out that it can be trained to function as an AND gate, a NAND gate, an OR gate or a NOR gate, as well as a NOT gate. However, it can never function as an XOR gate. Fortunately, more complex neural networks can be designed, as shown in Figure 7(b). Here there is an input layer of neurons and an output layer, but now also an in-between layer: the hidden layer. Each layer may contain an arbitrary number of neurons (the number depends on the application), but in general the hidden layer contains more neurons than either the input or the output layer. The mathematics of the network is simple: the output of each neuron is connected to the inputs of the neurons in the next layer, and the weighted sums are computed as above. One important point to note about the network sketched in Figure 7(b): all neurons in the input layer are connected to all neurons in the hidden layer, and all neurons in the hidden layer are connected to all neurons in the output layer. Of course, the strength of each connection may differ, and this is where learning comes into play.

Learning by Back-Propagation of Errors

We saw above that we could construct a 2-in/2-out network to elicit two different behaviours of the vehicle. While this is interesting, it is not the point of ANNs, since, as we mentioned above, the biological neural networks in our brains are able to learn. So can we construct an ANN which can learn the two behaviours discussed above? Of course the answer is that we can! We shall explore how to train an ANN, causing it to learn, in this section. Look at Figure 8. On the left is an untrained network for the vehicle problem. Each input neuron is connected to each output neuron with a random weight.
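The layer-by-layer calculation described for the Figure 7(b) network, where every neuron feeds every neuron in the next layer and each computes a weighted sum passed through the sigmoid, can be sketched as follows. The function names, layer sizes and example weights here are our own illustrative choices.

```python
import math

def sigmoid(a, k=1.0):
    """Sigmoid activation f(a) = 1 / (1 + e^(-k*a))."""
    return 1.0 / (1.0 + math.exp(-k * a))

def layer_forward(inputs, weights):
    """One fully connected layer: weights[i][j] is the weight from
    input j to neuron i; each neuron outputs the sigmoid of its
    total weighted input."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def net_forward(inputs, hidden_weights, output_weights):
    """Feed the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)

# Example: 2 inputs, 3 hidden neurons, 2 outputs (arbitrary weights)
hidden_w = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
output_w = [[0.2, 0.4, -0.1], [0.7, -0.3, 0.5]]
outputs = net_forward([0.1, 0.9], hidden_w, output_w)
```

The full connectivity appears simply as the fact that every weight row spans all the neurons of the previous layer.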
To train the vehicle to move towards a light, we need to strengthen the "cross" connections and weaken the "direct" connections between the left and right eyes and actuators. We do this by applying a training set of inputs and outputs to the net, and adjusting the weights to make the net's outputs agree with the desired values. Specifically, we apply an input pair of values and the net produces an output pair. We compare the net's output pair with the pair we require, and use the error between the two to change the network weights (how we do this is explained below). This process (a "learning cycle") is repeated many times, until the error is sufficiently small. For the above example, we need the following training set:

Training Set                     A     B
Input to Left Eye Neuron        0.1   0.9
Input to Right Eye Neuron       0.9   0.1
Output to Left Motor Neuron     0.9   0.1
Output to Right Motor Neuron    0.1   0.9

This set is applied to the ANN, while it is in the learning state, hundreds or thousands of times. We can monitor the "mean squared error" (MSE) between the actual and desired outputs, and stop the learning when it reaches an acceptable level. The MSE is the average squared error over all the output neurons; the "squared" prevents negative errors from cancelling out positive errors.

The mathematics of the back-propagation of errors is straightforward. Say that for output neuron i the desired output is d_i and the actual output is o_i, giving an error (d_i − o_i). Then we change the weight of the connection from each neuron j feeding into this output neuron (with value x_j) according to the expression

Δw_j = ρ (d_i − o_i) x_j

where ρ is the rate of learning. While we shall not derive this expression here, it is straightforward to understand. The change in weight Δw_j of each connection feeding into neuron i is determined both by the error on neuron i and by the value of the input neuron j. This is Hebbian Learning, as discussed above: the synaptic strength is determined by the simultaneous activity of pairs of neurons.
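The whole training procedure, repeated learning cycles over the training set above, applying Δw = ρ(d − o)x and monitoring the MSE, can be sketched for the 2-in/2-out vehicle network. As an assumption for simplicity, the sketch uses linear outputs (plain weighted sums, no activation function); the function names are ours.

```python
import random

# Training set from the text: inputs (left eye, right eye),
# desired outputs (left motor, right motor)
TRAINING_SET = [
    ((0.1, 0.9), (0.9, 0.1)),  # set A
    ((0.9, 0.1), (0.1, 0.9)),  # set B
]

def outputs(w, x):
    """Linear outputs: o_i is the weighted sum of the inputs.
    w[i][j] is the weight from input neuron j to output neuron i."""
    return [sum(w[i][j] * x[j] for j in range(2)) for i in range(2)]

def mse(w):
    """Mean squared error over all output neurons and training pairs."""
    total, n = 0.0, 0
    for x, d in TRAINING_SET:
        o = outputs(w, x)
        total += sum((d[i] - o[i]) ** 2 for i in range(2))
        n += 2
    return total / n

def train(cycles=1000, rho=0.1, seed=0):
    """Start from random weights, then repeat learning cycles,
    updating each weight by delta_w = rho * (d_i - o_i) * x_j."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
    for _ in range(cycles):
        for x, d in TRAINING_SET:
            o = outputs(w, x)
            for i in range(2):
                for j in range(2):
                    w[i][j] += rho * (d[i] - o[i]) * x[j]
    return w

w = train()
```

After training, the "cross" weights (right eye to left motor, left eye to right motor) end up larger than the "direct" weights, exactly the structure needed to steer towards the light.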
The effect of changing the weight by the amount Δw_j is to reduce the output error (d_i − o_i).

Coding of some Simple ANNs

[next instance]

Applications of Neural Networks

[next instance]