Chapter 9. Artificial Neural Networks
Biological Neurons and Neural Networks
The human brain consists of a densely interconnected network of around 10 billion neurons, about the same number as the stars in a typical galaxy (and there are more than 100 billion galaxies in the universe). The brain’s neural network provides it with enormous processing power, enabling it to perform computationally demanding perceptual acts such as face recognition and speech recognition. The brain also provides us with advanced control functions, such as walking, but perhaps most important of all is that it can learn how to do all of this. Whereas a modern CPU chip’s performance derives from its raw speed (3 GHz = 3,000,000,000 Hz), a neural network is slow, computing at a frequency of 10-100 Hz. The brain’s performance derives instead from the massive parallelism of the neural network, where each of the 10 billion neurons may be connected to around 1000 other neurons.
Figure 1.
To get an idea of the structure of a biological neural network, glance at Figure 1, which is a sketch of a section through an animal retina, one of the first ever visualisations of a neural network, produced by Golgi and Cajal, who received a Nobel Prize in 1906. You can see roundish neurons with their output axons. Some axons leave the area (those at the bottom, which form the ‘optic nerve’) and other axons input into other neurons via their input connections, called dendrites. Neuron e receives its input from four other neurons and sends its output down to the optic nerve. So we see that a neuron has several inputs, a body and one output which connects to other neurons.
It was realised that neurons function electrically, sending electrical signals down their axons based on flows of potassium and sodium ions. This signal is in the form of a pulse (rather like the sound of a hand clap). A single neuron emits a pulse (“fires”) only when its total input is above a certain threshold. This characteristic led to the McCulloch and Pitts model (1943) of the artificial neural network (ANN).
Glance briefly at Figure 2, which illustrates how learning occurs in a biological neural network. It is assumed here that both cells A and B fire often and simultaneously over time. When this condition occurs, the strength of the synaptic connection between A and B increases. This concept was contributed by Hebb (1949) and is known as Hebbian Learning. The idea is that if two neurons are simultaneously active, then their interconnection should be strengthened.
The Mathematics of a Single Artificial Neuron.
The McCulloch–Pitts model of a single neuron is sketched in Figure 3. We can think of the calculation as proceeding in two parts: input processing, then the calculation of the output. The inputs to the neuron body are shown as 𝑥1 , 𝑥2 , … 𝑥𝑁 . When the inputs (from the previous neurons) reach this neuron, their values are multiplied by the synaptic strengths. As mentioned above, these strengths depend on how much learning has occurred. These strengths are called weights and are represented by 𝑤1 etc. So a single weighted input is calculated as
𝑤1 𝑥1
So the total weighted input to the neuron is
𝑤1 𝑥1 + 𝑤2 𝑥2 + 𝑤3 𝑥3 + ⋯ + 𝑤𝑁 𝑥𝑁
This total input is then passed on to the output-calculation part of the neuron. Here, if the total input is greater than some threshold 𝜃, then the neuron will ‘fire’ and produce an output. This is shown in Figure 4, where the mathematical function used to model the output calculation (the “activation function”) is a simple step function. Note that the output of the neuron is effectively binary: fire or no-fire, 1 or 0. While it is not directly relevant to our work here, it is interesting to note that correct choices of weights and thresholds make the neuron behave like a logic gate, in particular AND and NOT. Given that any CPU can be described as a combination of AND and NOT gates, we conclude that neural nets can provide all the computations performed on PCs.
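As a minimal sketch of this idea (the weight and threshold values below are illustrative choices, not taken from the text), a McCulloch–Pitts neuron with a step activation function can be made to behave as an AND gate:

```python
def mp_neuron(inputs, weights, theta):
    """McCulloch-Pitts neuron: fire (output 1) if the weighted
    input sum exceeds the threshold theta, otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > theta else 0

# With weights (1, 1) and threshold 1.5, only the input (1, 1)
# pushes the weighted sum above the threshold: AND behaviour.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mp_neuron((a, b), (1, 1), 1.5))
```

With weights (1, 1) and threshold 1.5, the weighted sum exceeds the threshold only when both inputs are 1, which is exactly the AND behaviour mentioned above.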
It is possible to relax the binary nature of the output to obtain real-number outputs (such as decimals). This is done using the ‘sigmoid’ function, illustrated in Figure 5; the parameter k adjusts the slope of the transition between the fire and no-fire states. The mathematical description of the sigmoid function is

𝑓(𝑎) = 1 / (1 + 𝑒^(−𝑘𝑎))

It is clear that choosing a large value of k makes the sigmoid neuron approximate the binary McCulloch–Pitts neuron more closely.
Figure 5. Sigmoid Functions. Left has k = 10, right has k = 5. Large k means steep slope.
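A direct sketch of the sigmoid function above (the parameter values here are just examples):

```python
import math

def sigmoid(a, k=1.0):
    """Sigmoid activation f(a) = 1 / (1 + e^(-k*a));
    k sets the steepness of the fire/no-fire transition."""
    return 1.0 / (1.0 + math.exp(-k * a))

print(sigmoid(0.0))          # midpoint of the transition: 0.5
print(sigmoid(1.0, k=10.0))  # large k: output close to 1, approximating the binary neuron
```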
Neural Networks and Braitenberg Vehicles
In order to understand how individual neurons may be connected together to form neural networks, let’s consider the simple example of a “Braitenberg Vehicle”. Braitenberg is a … He proposed his vehicles to demonstrate that apparently purposive behaviour does not need a representation of the external environment in a creature’s brain. Rather, behaviour can be obtained by simply reacting to the environment in a structured manner.
Let’s have a look at the vehicles shown in Figure 6 (from his book). Each vehicle has two eye sensors and two motor actuators. In his vehicle (a) the right eye is connected to the right motor and vice versa. As positioned in the diagram, the right eye receives more light than the left eye, since it is closer to the light. So the drive to the right motor is greater than the drive to the left, and the vehicle moves away from the light. In his vehicle (b), on the other hand, the eyes and motors are cross-coupled: now, when the left eye receives more light, it gives the right motor more drive, so the vehicle moves towards the light.
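The two wiring schemes can be sketched numerically (the light-intensity values below are made up for illustration):

```python
def vehicle_step(left_eye, right_eye, cross_coupled):
    """Return (left_motor, right_motor) drives for a Braitenberg vehicle.
    Direct wiring (a): each eye drives the motor on its own side.
    Cross-coupled wiring (b): each eye drives the opposite motor."""
    if cross_coupled:
        return right_eye, left_eye
    return left_eye, right_eye

# Light on the right: right eye reads 0.9, left eye reads 0.1 (illustrative values).
print(vehicle_step(0.1, 0.9, cross_coupled=False))  # (0.1, 0.9): right wheel faster, turns away
print(vehicle_step(0.1, 0.9, cross_coupled=True))   # (0.9, 0.1): left wheel faster, turns towards
```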
Figure 6
How does this relate to neural networks? Well each eye can be considered as a sensor neuron and each
motor is driven by an actuator neuron. So we have a basic 2-input 2-output neural network shown in Figure
7(a). This is indeed a simple network, yet it turns out that it can be trained to function as an AND gate, a NAND gate, an OR gate, and a NOR gate, as well as a NOT gate. However, it can never function as an XOR gate. Fortunately, more complex neural networks can be designed, as shown in Figure 7(b). Here there is an input layer of neurons and an output layer, but now also an in-between layer: the hidden layer. Each layer may contain an arbitrary number of neurons (related to the application), but in general the hidden layer contains more neurons than either the input or the output layer.
The mathematics of the network is simple: it is just a question of taking the output of each neuron, connecting it to the inputs of the neurons in the next layer, and computing the weighted sums as above. One important point to note about the network sketched in Figure 7(b): all neurons in the input layer are connected to all neurons in the hidden layer, and all neurons in the hidden layer are connected to all neurons in the output layer. Of course the strength of each connection may not be the same, and this is where learning comes into play.
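The layer-by-layer calculation described here can be sketched as follows (the network shape and weight values are hypothetical, chosen only for illustration):

```python
import math

def sigmoid(a, k=1.0):
    """Sigmoid activation used for each neuron's output."""
    return 1.0 / (1.0 + math.exp(-k * a))

def layer_forward(inputs, weights, k=1.0):
    """One fully connected layer: each output neuron forms the weighted
    sum of all inputs from the previous layer, then applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)), k) for row in weights]

# Hypothetical 2-input, 3-hidden, 2-output network with made-up weights.
w_hidden = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # one row per hidden neuron
w_output = [[0.6, -0.1, 0.2], [0.3, 0.7, -0.5]]    # one row per output neuron

hidden = layer_forward([0.9, 0.1], w_hidden)  # input layer -> hidden layer
output = layer_forward(hidden, w_output)      # hidden layer -> output layer
print(output)
```

Each row of a weight matrix holds all the weights feeding one neuron, so a fully connected layer is just one weighted sum per row.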
Learning by Back-Propagation of Errors.
We saw above that we could construct a 2-in/2-out network to elicit two different behaviours of the vehicle. While this is interesting, it is not the point of ANNs, since as we mentioned above, the biological neural networks in our brain are able to learn. So can we construct an ANN which can learn the two behaviours discussed above? Of course the answer is we can! In this section we shall explore how to train an ANN, causing it to learn. Look at Figure 8. On the left is an untrained network for the vehicle problem. Each input neuron is connected to each output neuron with a random weight. To train the vehicle to move towards a light, we need to strengthen the “cross” connections and weaken the “direct” connections between the left and right eyes and actuators.
We do this by applying a training set of inputs and outputs to the net, and adjusting the weights to make the
applied values agree. Specifically we apply an input pair of values, and the net produces an output pair. We
compare the net’s output pair with the pair we require and use the error between the two to change the
network weights. (How we do this is explained below). This process (a “learning cycle”) is repeated many
times, until the error is sufficiently small.
For the above example, we need the following training set:

Training   Input to    Input to    Output to   Output to
Set        Left Eye    Right Eye   Left Motor  Right Motor
           Neuron      Neuron      Neuron      Neuron
A          0.1         0.9         0.9         0.1
B          0.9         0.1         0.1         0.9
This set is applied to the ANN, while it is in the learning state, hundreds or thousands of times. We can monitor the “mean squared error” (MSE) between actual and desired outputs, and stop the learning when this falls to an acceptable level. The MSE is the average of the squared errors over all the output neurons; the squaring prevents negative errors from cancelling out positive errors.
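The MSE described here might be computed as in this small sketch (the output values are made up for illustration):

```python
def mse(desired, actual):
    """Mean squared error: the average of the squared errors over all
    output neurons; squaring stops negative errors cancelling positive ones."""
    return sum((d - o) ** 2 for d, o in zip(desired, actual)) / len(desired)

# Desired vs actual outputs for a 2-output network (illustrative values):
# ((0.9 - 0.7)^2 + (0.1 - 0.2)^2) / 2 = 0.025
print(mse([0.9, 0.1], [0.7, 0.2]))
```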
The mathematics of the back-propagation of errors is straightforward. Let’s say that for output neuron i, the desired output is 𝑑𝑖 and the actual output is 𝑜𝑖 , which gives us an error (𝑑𝑖 − 𝑜𝑖 ). We then change the weight of the connection from each neuron j feeding into this output neuron (with value 𝑥𝑗 ) according to the expression

∆𝑤𝑗 = 𝜌(𝑑𝑖 − 𝑜𝑖 )𝑥𝑗

where 𝜌 is the learning rate. While there is no space here to explain how this expression is derived, it is straightforward to understand. The change in weight ∆𝑤𝑗 of each connection feeding into neuron i is determined both by the error on neuron i and by the value of the input neuron j. This is Hebbian Learning as discussed above: the synaptic strength is determined by the simultaneous activity of pairs of neurons. The effect of changing the weight by the amount ∆𝑤𝑗 is to reduce the output error (𝑑𝑖 − 𝑜𝑖 ).
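Putting this together, here is a minimal sketch of training the 2-in/2-out vehicle network with the update rule ∆𝑤𝑗 = 𝜌(𝑑𝑖 − 𝑜𝑖 )𝑥𝑗 , using the training set tabulated above. For simplicity the output neurons are assumed here to be linear (weighted sum only, no activation function), and the learning rate and cycle count are arbitrary choices:

```python
import random

# Training set from the table above: (eye inputs, desired motor outputs).
training_set = [
    ([0.1, 0.9], [0.9, 0.1]),  # set A
    ([0.9, 0.1], [0.1, 0.9]),  # set B
]

rho = 0.5  # learning rate (an arbitrary choice)
# w[i][j] is the weight from input neuron j to output neuron i, randomly initialised.
w = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]

for cycle in range(1000):  # repeated learning cycles
    for inputs, desired in training_set:
        # Linear outputs: weighted sum only (a simplifying assumption).
        outputs = [sum(w[i][j] * inputs[j] for j in range(2)) for i in range(2)]
        for i in range(2):
            for j in range(2):
                # Delta rule: change each weight by rho * error * input value.
                w[i][j] += rho * (desired[i] - outputs[i]) * inputs[j]

print(w)
```

After training, the “cross” weights w[0][1] and w[1][0] approach 1 while the “direct” weights approach 0, which is exactly the towards-the-light wiring discussed earlier.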
Coding of some Simple ANNs
Applications of Neural Networks