coppin chapter 11e.ppt

Chapter 11
Neural Networks
Chapter 11 Contents (1)





Biological Neurons
Artificial Neurons
Perceptrons
Multilayer Neural Networks
Backpropagation
Chapter 11 Contents (2)






Recurrent Networks
Hopfield Networks
Bidirectional Associative Memories
Kohonen Maps
Hebbian Learning
Evolving Neural Networks
Biological Neurons

The human brain
is made up of
billions of simple
processing units
– neurons.

Inputs are received on dendrites, and if the
input levels are over a threshold, the neuron
fires, passing a signal through the axon to
the synapse which then connects to another
neuron.
Artificial Neurons (1)




Artificial neurons are based on biological neurons.
Each neuron in the network receives one or more
inputs.
An activation function is applied to the inputs,
which determines the output of the neuron – the
activation level.
The charts on
the right show
three typical
activation
functions.
Artificial Neurons (2)

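A typical activation function works as
follows:

$$X = \sum_{i=1}^{n} w_i x_i$$

$$Y = \begin{cases} +1 & \text{for } X > t \\ 0 & \text{for } X \le t \end{cases}$$

Each input i has a weight, $w_i$, associated with
it. The value of input i is $x_i$, and n is the number of inputs.
t is the threshold.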
So if the weighted sum of the inputs to the
neuron is above the threshold, then the
neuron fires.


Example
Two inputs to a neuron, x1 = 0.7 and x2 = 0.9
Weights are 0.8 and 0.4
X = (0.8 × 0.7) + (0.4 × 0.9) = 0.92
$$Y = \begin{cases} +1 & \text{for } X > t \\ 0 & \text{for } X \le t \end{cases}$$
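The arithmetic in this example can be checked with a short Python sketch. The function names and the threshold value t = 0.5 are assumptions made for illustration; the slide does not give a value for t.

```python
def weighted_sum(weights, inputs):
    """Return X, the sum of each input multiplied by its weight."""
    return sum(w * x for w, x in zip(weights, inputs))


def step(x, t):
    """Step activation: output +1 if X exceeds the threshold t, otherwise 0."""
    return 1 if x > t else 0


weights = [0.8, 0.4]
inputs = [0.7, 0.9]

X = weighted_sum(weights, inputs)  # (0.8 * 0.7) + (0.4 * 0.9), about 0.92
Y = step(X, t=0.5)                 # t = 0.5 is an assumed threshold

print(round(X, 2), Y)
```

With a threshold above 0.92 the same neuron would not fire, which is the only thing the step function decides.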
Perceptrons (1)



A perceptron is a single neuron that
classifies a set of inputs into one of two
categories (usually 1 or -1).
If the inputs are in the form of a grid, a
perceptron can be used to recognize visual
images of shapes.
The perceptron usually uses a step
function, which returns 1 if the weighted
sum of inputs exceeds a threshold, and –1
otherwise.
Perceptrons (1.5)
Perceptrons are also commonly set up to classify into the two
categories 1 and 0, as in the OR example that follows.
Perceptrons (2)

The perceptron is trained as follows:
First, inputs are given random weights
(usually between –0.5 and 0.5).
An item of training data is presented. If the
perceptron misclassifies it, the weights are
modified according to the following:

$$w_i \leftarrow w_i + (a \times x_i \times e)$$

e is the size of the error (the expected output minus the
actual output), and a is the learning rate, between 0 and 1.
Classify OR
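Start with w1 = 0.5, w2 = 0.5
x1 = 0, x2 = 0: the result should be 0
Compute 0.5*0 + 0.5*0 = 0, which does not exceed any threshold t ≥ 0, so the output is 0: good.
Now consider x1 = 1, x2 = 0: the result should be 1
0.5*1 + 0.5*0 = 0.5, which exceeds the threshold if t < 0.5, so the output is 1: good.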
…

Classify OR
If we had started with weights -0.2 and 0.4, we
would have had to adjust the weights
to yield the correct results, according
to:
$$w_i \leftarrow w_i + (a \times x_i \times e)$$
where a is the learning rate and e is the error.


See table 11.1
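The training procedure for OR, starting from the weights -0.2 and 0.4, can be sketched in Python as follows. The threshold t = 0, the learning rate a = 0.2, the order of the training items, and all function names are assumptions chosen for illustration; they are not taken from the table.

```python
def step(x, t):
    """Step activation: 1 if X exceeds the threshold t, otherwise 0."""
    return 1 if x > t else 0


def train_perceptron(data, weights, t=0.0, a=0.2, max_epochs=100):
    """Apply w_i = w_i + (a * x_i * e) after every misclassified item.

    Returns the final weights and the first epoch that had no mistakes
    (or None if max_epochs was reached first).
    """
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, expected in data:
            actual = step(sum(w * x for w, x in zip(weights, inputs)), t)
            e = expected - actual  # error: expected output minus actual output
            if e != 0:
                mistakes += 1
                weights = [w + a * x * e for w, x in zip(weights, inputs)]
        if mistakes == 0:
            return weights, epoch
    return weights, None


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, epochs = train_perceptron(OR_DATA, [-0.2, 0.4])
predictions = [step(sum(w * x for w, x in zip(weights, inputs)), 0.0)
               for inputs, _ in OR_DATA]
print(weights, epochs, predictions)
```

Under these assumptions only w1 ever changes, because the only misclassified item is x1 = 1, x2 = 0, where x2 contributes nothing to the update.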
Perceptrons (3)



Perceptrons can only classify linearly
separable functions.
The first of the following graphs shows a
linearly separable function (OR).
The second is not linearly separable
(Exclusive-OR).
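One way to see the limitation is to run the perceptron training rule on Exclusive-OR and count the mistakes in each pass over the data. The sketch below is illustrative only: the starting weights, threshold, learning rate, and function names are assumptions. Because no single set of weights classifies all four Exclusive-OR cases correctly, every pass contains at least one mistake.

```python
def step(x, t):
    """Step activation: 1 if X exceeds the threshold t, otherwise 0."""
    return 1 if x > t else 0


def mistakes_per_epoch(data, weights, t=0.0, a=0.2, epochs=50):
    """Train with w_i = w_i + (a * x_i * e) and record mistakes per epoch."""
    weights = list(weights)
    history = []
    for _ in range(epochs):
        mistakes = 0
        for inputs, expected in data:
            actual = step(sum(w * x for w, x in zip(weights, inputs)), t)
            e = expected - actual
            if e != 0:
                mistakes += 1
                weights = [w + a * x * e for w, x in zip(weights, inputs)]
        history.append(mistakes)
    return history


XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

history = mistakes_per_epoch(XOR_DATA, [-0.2, 0.4])
print(min(history))  # never reaches 0: training does not converge
```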
Multilayer Neural Networks



Multilayer neural networks can classify a range
of functions, including ones that are not linearly
separable.
Each input layer neuron
connects to all neurons
in the hidden layer.
The neurons in the
hidden layer connect to
all neurons in the output
layer.
[Figure: A feed-forward network]
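As a small illustration of why a hidden layer helps, the sketch below wires up a feed-forward network by hand so that it computes Exclusive-OR, which a single perceptron cannot do. The weights and thresholds are hand-picked assumptions for this example, not learned values, and the step activation is used for simplicity.

```python
def step(x, t):
    """Step activation: 1 if X exceeds the threshold t, otherwise 0."""
    return 1 if x > t else 0


def xor_network(x1, x2):
    """Two inputs, two hidden neurons, one output neuron (hand-picked weights)."""
    # Hidden layer: each hidden neuron receives both inputs.
    h_or = step(1.0 * x1 + 1.0 * x2, 0.5)   # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2, 1.5)  # fires only if both inputs are 1
    # Output layer: receives both hidden outputs.
    return step(1.0 * h_or - 1.0 * h_and, 0.5)


outputs = [xor_network(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

The output neuron fires when the "OR" hidden neuron fires and the "AND" hidden neuron does not, which is exactly Exclusive-OR.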
Backpropagation (1)



Multilayer neural networks learn in the same way
as perceptrons.
However, there are many more weights, and it is
important to assign credit (or blame) correctly
when changing weights.
Backpropagation networks use the sigmoid
activation function, as it is easy to differentiate:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$$
Backpropagation (2)



For node j, $X_j$ is the input:
$$X_j = \sum_{i=1}^{n} x_i w_{ij} - \theta_j$$
$Y_j$ is the output:
$$Y_j = \frac{1}{1 + e^{-X_j}}$$
n is the number of inputs to node j
$\theta_j$ is the threshold for j
After values are fed forward through the network,
errors are fed back to modify the weights in order
to train the network.
For each node, we calculate an error gradient.
Backpropagation (3)





For a node k in the output layer, the error ek is the
difference between the desired output and the
actual output.
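The error gradient for k is:
$$\delta_k = Y_k (1 - Y_k)\, e_k$$
Similarly, for a node j in the
hidden layer:
$$\delta_j = Y_j (1 - Y_j) \sum_{k} w_{jk}\, \delta_k$$
Now the weights are
updated as follows:
$$w_{ij} \leftarrow w_{ij} + \alpha\, x_i\, \delta_j$$
$$w_{jk} \leftarrow w_{jk} + \alpha\, Y_j\, \delta_k$$
$\alpha$ is the learning rate (a positive number
below 1)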
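A single backpropagation step can be sketched in Python for a network with one hidden layer and one output node. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the usual sigmoid-based error gradients (output gradient = Y(1 - Y)e, hidden gradient = Y(1 - Y) times the weighted sum of downstream gradients), treats each threshold as a weight on a fixed input of -1, and uses hypothetical names such as `TinyNetwork`.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNetwork:
    """Feed-forward network: n_in inputs, n_hidden hidden nodes, 1 output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                     for _ in range(n_in)]                 # input -> hidden
        self.theta_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.theta_o = rng.uniform(-0.5, 0.5)

    def forward(self, x):
        """Feed values forward; return hidden outputs and the final output."""
        hidden = [
            sigmoid(sum(x[i] * self.w_ih[i][j] for i in range(len(x)))
                    - self.theta_h[j])
            for j in range(len(self.theta_h))
        ]
        out = sigmoid(sum(h * w for h, w in zip(hidden, self.w_ho))
                      - self.theta_o)
        return hidden, out

    def train_step(self, x, desired, alpha=0.5):
        """Feed the error back and update every weight once."""
        hidden, out = self.forward(x)
        e_k = desired - out                      # error at the output node
        delta_k = out * (1 - out) * e_k          # output error gradient
        # Hidden gradients use the output weights before they are changed.
        delta_j = [h * (1 - h) * w * delta_k
                   for h, w in zip(hidden, self.w_ho)]
        for j, h in enumerate(hidden):
            self.w_ho[j] += alpha * h * delta_k
        self.theta_o += alpha * -1 * delta_k     # assumed threshold update
        for i, x_i in enumerate(x):
            for j, d in enumerate(delta_j):
                self.w_ih[i][j] += alpha * x_i * d
        for j, d in enumerate(delta_j):
            self.theta_h[j] += alpha * -1 * d    # assumed threshold update
        return e_k


net = TinyNetwork(n_in=2, n_hidden=2)
first_error = abs(net.train_step([1, 0], 1))
for _ in range(500):
    last_error = abs(net.train_step([1, 0], 1))
print(first_error, last_error)
```

Repeating the step on one training item only shows the error shrinking for that item; training a full data set would cycle through all items, and convergence is not guaranteed.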
Recurrent Networks



Feed forward networks do not have
memory.
Recurrent networks can have connections
between nodes in any layer, which enables
them to store data – a memory.
Recurrent networks can be used to solve
problems where the solution depends on
previous inputs as well as current inputs
(e.g. predicting stock market movements).
Hopfield Networks







A Hopfield Network is a recurrent network.
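It uses a sign activation function:
$$\operatorname{Sign}(X) = \begin{cases} +1 & \text{for } X > 0 \\ -1 & \text{for } X < 0 \end{cases}$$
If a neuron receives a 0 as an input it does not
change state.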
Inputs are usually represented as matrices.
The network is trained to represent a set of
attractors, or stable states.
Any input will be mapped to an output state
which is the attractor closest to the input.
A Hopfield network is autoassociative – it can
only associate an item with itself or a similar
one.
Hopfield Network (2)

Weights are represented as a matrix:
$$W = \sum_{i=1}^{N} X_i X_i^{t} - N I$$
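The weight matrix and the recall process can be sketched in Python. Here the weight matrix is the sum of the outer product of each of the N training vectors with itself, minus N times the identity matrix. The two stored patterns, the synchronous update of all neurons, and the function names are assumptions made for illustration.

```python
def train_hopfield(patterns):
    """Build W as the sum of outer products X_i X_i^t, minus N on the diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for r in range(n):
            for c in range(n):
                w[r][c] += p[r] * p[c]
    for d in range(n):
        w[d][d] -= len(patterns)  # subtract N * I
    return w


def recall(w, state, max_steps=10):
    """Apply the sign activation repeatedly until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = []
        for r, row in enumerate(w):
            total = sum(w_rc * s for w_rc, s in zip(row, state))
            if total > 0:
                new_state.append(1)
            elif total < 0:
                new_state.append(-1)
            else:
                new_state.append(state[r])  # an input of 0 leaves the state unchanged
        if new_state == state:
            break
        state = new_state
    return state


A = [1, -1, 1, -1, 1, -1]
B = [1, 1, 1, -1, -1, -1]
W = train_hopfield([A, B])

noisy_A = [-1, -1, 1, -1, 1, -1]  # A with its first element flipped
print(recall(W, noisy_A))
```

Both stored patterns are stable states of this small network, and the corrupted copy of A settles back onto A.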
Bidirectional Associative Memories
A BAM is a heteroassociative memory:
Like the brain, it can learn to
associate one item with another
completely unrelated item.
 The network consists of two fully
connected layers of nodes – every node
in one layer is connected to every node
in the other layer.
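A minimal Python sketch of such a memory is shown below. The slide does not give the training or recall formulas, so this assumes the commonly used formulation: the weight matrix is the sum of the outer products of each associated pair, and recall applies a sign function to the weighted sums in either direction. The example pairs, the tie-breaking rule, and the function names are also assumptions.

```python
def sign_vector(totals):
    """Sign activation; a total of exactly 0 is mapped to +1 (an assumption)."""
    return [1 if t >= 0 else -1 for t in totals]


def train_bam(pairs):
    """W[r][c] accumulates x[r] * y[c] over every associated pair (x, y)."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    w = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for r in range(n):
            for c in range(m):
                w[r][c] += x[r] * y[c]
    return w


def recall_y(w, x):
    """Pass a pattern from the first layer to the second layer."""
    return sign_vector([sum(w[r][c] * x[r] for r in range(len(x)))
                        for c in range(len(w[0]))])


def recall_x(w, y):
    """Pass a pattern from the second layer back to the first layer."""
    return sign_vector([sum(w[r][c] * y[c] for c in range(len(y)))
                        for r in range(len(w))])


X1, Y1 = [1, -1, 1, -1], [1, 1]
X2, Y2 = [1, 1, -1, -1], [1, -1]
W = train_bam([(X1, Y1), (X2, Y2)])

print(recall_y(W, X1), recall_x(W, Y2))
```

Because every node in one layer connects to every node in the other, the same weight matrix serves both directions of recall.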

Kohonen Maps



An unsupervised learning system.
Two layers of nodes: an input layer and a
cluster (output) layer.
Uses competitive learning:
 Every input is compared with the weight vector of each
node in the cluster layer.
 The node which most closely matches the input fires.
This is the classification of the input.
 Euclidean distance is used to measure how closely they match.
 The winning node has its weight vector modified to be
closer to the input vector.
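The competitive learning step described above can be sketched in Python. The learning rate of 0.5, the example vectors, and the function names are assumptions for illustration; the sketch updates only the winning node, as the slide describes, and does not model any neighbourhood around the winner.

```python
import math


def winner(weight_vectors, x):
    """Index of the cluster node whose weight vector is closest to the input."""
    distances = [math.dist(w, x) for w in weight_vectors]  # Euclidean distance
    return distances.index(min(distances))


def update_winner(weight_vectors, x, rate=0.5):
    """Move the winning node's weight vector towards the input vector."""
    k = winner(weight_vectors, x)
    weight_vectors[k] = [w + rate * (x_i - w)
                         for w, x_i in zip(weight_vectors[k], x)]
    return k


cluster_weights = [[0.0, 0.0], [1.0, 1.0]]
k = update_winner(cluster_weights, [0.9, 0.8])
print(k, cluster_weights)
```

The index returned is the classification of the input; the other node's weights are left untouched.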
Kohonen Maps (example)
The nodes in the cluster layer are
arranged in a grid, as shown:
The diagram on the left
shows the training
data.
Initially, the weights are arranged
as shown here:
Kohonen Maps (example)
After training, the weight vectors
have been rearranged to match
the training data:
Hebbian Learning (1)
Hebb’s law:
“When an axon of cell A is near enough to
excite a cell B and repeatedly or persistently
takes part in firing it, some growth process
or metabolic change takes place in one or
both cells such that A’s efficiency, as one of
the cells firing B, is increased”.
 Hence, if two neurons that are connected
together fire at the same time, the weight
of the connection between them is
strengthened.

Hebbian Learning (2)

The activity product rule is used to modify the
weight of a connection between two nodes that
fire at the same time:

$$\Delta w_i = \alpha\, x_i\, y_i$$

α is the learning rate; xi is the input to node i and yi
is the output of node i.
Hebbian networks usually also use a forgetting
factor, which decreases the weight of the
connection between two nodes if they fire at
different times.
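A small Python sketch of one weight update follows. The activity product term (learning rate times input times output) comes from the rule above; the exact form of the forgetting term is not given on the slide, so the version here, which shrinks the weight in proportion to the output and the current weight, is an assumption, as are the parameter names and values.

```python
def hebbian_update(w, x, y, alpha=0.1, phi=0.0):
    """Return the new weight after one Hebbian update.

    alpha * x * y  strengthens the connection when input and output are
                   active together (activity product rule).
    phi * y * w    is an assumed forgetting term that weakens the connection.
    """
    return w + alpha * x * y - phi * y * w


# Both nodes active together: the weight grows.
strengthened = hebbian_update(0.5, x=1.0, y=1.0, alpha=0.1)

# Output active while the input is silent: only the forgetting term applies.
weakened = hebbian_update(0.5, x=0.0, y=1.0, alpha=0.1, phi=0.05)

print(strengthened, weakened)
```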

Evolving Neural Networks
Neural networks can be susceptible
to local maxima.
 Evolutionary methods (genetic
algorithms) can be used to determine
the starting weights for a neural
network, which can help to avoid these kinds of
problems.
