Multilayer feed-forward NN
ARCHITECTURE

[Figure: multilayer feed-forward network with an input layer, a hidden layer, and an output layer]
A solution for the XOR problem

Truth table (inputs and target encoded as -1/+1):

  x1   x2   x1 xor x2
  -1   -1      -1
  -1   +1      +1
  +1   -1      +1
  +1   +1      -1

[Figure: two-layer network solving XOR: inputs x1 and x2 feed two hidden sign neurons, whose outputs feed one output sign neuron; connection weights of +1 or -1 plus bias terms]

$\varphi(v) = \begin{cases} +1 & \text{if } v > 0 \\ -1 & \text{if } v \le 0 \end{cases}$

$\varphi$ is the sign function. A sketch of such a network appears below.
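A minimal sketch of such a network in code, using the sign activation defined above. The exact weights of the original figure are not fully legible here, so the values below are one illustrative solution in which the hidden units act as OR and AND:

```python
def sign(v):
    # Sign activation: +1 if v > 0, -1 otherwise (as defined above)
    return 1.0 if v > 0 else -1.0

def xor_net(x1, x2):
    # Hidden neuron 1: fires (+1) unless both inputs are -1 (acts like OR)
    h1 = sign(1.0 * x1 + 1.0 * x2 + 1.0)
    # Hidden neuron 2: fires (+1) only when both inputs are +1 (acts like AND)
    h2 = sign(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output neuron: +1 exactly when one input is +1 and the other is -1
    return sign(1.0 * h1 - 1.0 * h2 - 0.5)

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, xor_net(x1, x2))   # reproduces the truth table above
```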
NEURON MODEL

• Sigmoid Function:

  $\varphi(v_j) = \frac{1}{1 + e^{-a v_j}}$

• $v_j = \sum_{i=0,\dots,m} w_{ji}\, y_i$    ($v_j$: induced local field of neuron j)
• Most common form of activation function
• $a \to \infty$: $\varphi$ approaches the threshold (sign) function
• Differentiable (important in theory)

[Figure: sigmoid curves over v_j for increasing values of a]
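A short sketch of this neuron model in code; the weight values, inputs, and slope parameter a are illustrative:

```python
import numpy as np

def neuron_output(w, y_in, a=1.0):
    """Sigmoid neuron: v_j = sum_i w_ji * y_i, output = 1 / (1 + exp(-a * v_j))."""
    v = np.dot(w, y_in)               # induced local field v_j
    return 1.0 / (1.0 + np.exp(-a * v))

# Example: 3 inputs (y_0 = +1 acts as the bias input) and illustrative weights
y = np.array([1.0, 0.5, -0.2])
w = np.array([0.1, 0.4, -0.3])
print(neuron_output(w, y, a=2.0))     # larger a pushes the output toward 0 or 1
```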
LEARNING ALGORITHM

• Back-propagation algorithm
  – Forward step: function signals propagate forward through the network.
  – Backward step: error signals propagate backward through the network.
• It adjusts the weights of the NN in order to minimize the average squared error.
Average Squared Error

• Error signal of output neuron j at presentation of the n-th training example:

  $e_j(n) = d_j(n) - y_j(n)$

• Total energy at time n:

  $E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$      C: set of neurons in the output layer

• Average squared error (measure of learning performance):

  $E_{AV} = \frac{1}{N} \sum_{n=1}^{N} E(n)$      N: size of the training set

• Goal: adjust the weights of the NN to minimize $E_{AV}$ (see the sketch below)
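A short sketch of these two quantities in code; D and Y are assumed lists of desired and actual output vectors over the training set:

```python
import numpy as np

def instantaneous_error(d_n, y_n):
    # E(n) = 1/2 * sum over output neurons of e_j(n)^2
    e = d_n - y_n
    return 0.5 * np.sum(e ** 2)

def average_squared_error(D, Y):
    # E_AV = 1/N * sum over the N training examples of E(n)
    N = len(D)
    return sum(instantaneous_error(d, y) for d, y in zip(D, Y)) / N
```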
Notation

$e_j$ : error at the output of neuron j
$y_j$ : output of neuron j
$v_j = \sum_{i=0,\dots,m} w_{ji}\, y_i$ : induced local field of neuron j
Weight Update

• The update rule is based on the gradient descent method: take a step in the direction yielding the maximum decrease of E,

  $\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}$

  i.e. a step in the direction opposite to the gradient, where $w_{ji}$ is the weight associated to the link from neuron i to neuron j.
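As a one-line sketch (eta and the gradient value dE_dw_ji are placeholders):

```python
def update_weight(w_ji, dE_dw_ji, eta=0.1):
    # Step of size eta in the direction opposite to the gradient
    return w_ji - eta * dE_dw_ji
```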
Define the Local Gradient of neuron j

Local gradient:

  $\delta_j = -\frac{\partial E}{\partial v_j}$

We obtain

  $\delta_j = e_j\, \varphi'(v_j)$

because, with $E(n) = \frac{1}{2}\sum_{j \in C} e_j^2(n)$ and $e_j(n) = d_j(n) - y_j(n)$,

  $\frac{\partial E}{\partial v_j} = \frac{\partial E}{\partial e_j}\frac{\partial e_j}{\partial y_j}\frac{\partial y_j}{\partial v_j} = e_j \cdot (-1) \cdot \varphi'(v_j)$
Update Rule

• We obtain

  $\Delta w_{ji} = \eta\, \delta_j\, y_i$

because

  $\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial v_j}\frac{\partial v_j}{\partial w_{ji}}$

with

  $\frac{\partial E}{\partial v_j} = -\delta_j$,   $\frac{\partial v_j}{\partial w_{ji}} = y_i$,   $\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}$
Compute local gradient of neuron j

• The key factor is the calculation of $e_j$
• There are two cases:
  – Case 1: j is an output neuron
  – Case 2: j is a hidden neuron
Error e_j of output neuron

• Case 1: j output neuron

  $e_j = d_j - y_j$

Then

  $\delta_j = (d_j - y_j)\, \varphi'(v_j)$
Local gradient of hidden neuron

• Case 2: j hidden neuron
• The local gradient for neuron j is recursively determined in terms of the local gradients of all neurons to which neuron j is directly connected.
Use the Chain Rule

From $\delta_j = -\frac{\partial E}{\partial v_j}$ and $\frac{\partial y_j}{\partial v_j} = \varphi'(v_j)$ we write

  $\delta_j = -\frac{\partial E}{\partial y_j}\frac{\partial y_j}{\partial v_j} = -\frac{\partial E}{\partial y_j}\,\varphi'(v_j)$

From $E(n) = \frac{1}{2}\sum_{k \in C} e_k^2(n)$, with $e_k(n) = d_k(n) - y_k(n)$, we obtain

  $\frac{\partial E}{\partial y_j} = \sum_{k \in C} e_k \frac{\partial e_k}{\partial y_j} = \sum_{k \in C} e_k \frac{\partial e_k}{\partial v_k}\frac{\partial v_k}{\partial y_j}$

and, since

  $\frac{\partial e_k}{\partial v_k} = -\varphi'(v_k)$,   $\frac{\partial v_k}{\partial y_j} = w_{kj}$,

  $\frac{\partial E}{\partial y_j} = -\sum_{k \in C} e_k\, \varphi'(v_k)\, w_{kj}$
Local Gradient of hidden neuron j

Hence

  $\delta_j = \varphi'(v_j) \sum_{k \in C} \delta_k\, w_{kj}$

[Figure: signal-flow graph of the back-propagation of error signals to neuron j through the weights w_1j, ..., w_kj, ..., w_mj]
Delta Rule

• Delta rule: $\Delta w_{ji} = \eta\, \delta_j\, y_i$

  $\delta_j = \varphi'(v_j)\,(d_j - y_j)$                    if j is an output node

  $\delta_j = \varphi'(v_j) \sum_{k \in C} \delta_k\, w_{kj}$     if j is a hidden node

C: set of neurons in the layer following the one containing j
Local Gradient of neurons

For the sigmoid activation function,

  $\varphi'(v_j) = a\, y_j (1 - y_j)$,   a > 0

so

  $\delta_j = a\, y_j (1 - y_j) \sum_{k \in C} \delta_k\, w_{kj}$      if j is a hidden node

  $\delta_j = a\, y_j (1 - y_j)\,(d_j - y_j)$                 if j is an output node
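A minimal sketch of these two formulas for a network with one hidden layer and one output layer of sigmoid units; the array names and shapes are assumptions:

```python
import numpy as np

def local_gradients(y_hidden, y_out, d, W_out, a=1.0):
    """Compute delta_j for output and hidden sigmoid neurons.

    y_hidden: outputs of the hidden layer, shape (H,)
    y_out:    outputs of the output layer, shape (K,)
    d:        desired outputs, shape (K,)
    W_out:    weights from hidden to output layer, shape (K, H)
    """
    # Output nodes: delta_k = a * y_k * (1 - y_k) * (d_k - y_k)
    delta_out = a * y_out * (1.0 - y_out) * (d - y_out)
    # Hidden nodes: delta_j = a * y_j * (1 - y_j) * sum_k delta_k * w_kj
    delta_hidden = a * y_hidden * (1.0 - y_hidden) * (W_out.T @ delta_out)
    return delta_out, delta_hidden
```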
Backpropagation algorithm
• Two phases of computation:
– Forward pass: run the NN and compute the error for
each neuron of the output layer.
– Backward pass: start at the output layer, and pass
the errors backwards through the network, layer by
layer, by recursively computing the local gradient of
each neuron.
TRAINING
• Sequential mode (on-line, pattern or
stochastic mode):
– (x(1), d(1)) is presented, a sequence of
forward and backward computations is
performed, and the weights are updated
using the delta rule.
– Same for (x(2), d(2)), … , (x(N), d(N)).
Training
• The learning process continues on an epoch-by-epoch basis until the stopping condition is satisfied.
• From one epoch to the next choose a
randomized ordering for selecting examples in
the training set.
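A sketch of sequential (on-line) training with per-epoch shuffling; forward_backward_update is a hypothetical helper standing for one forward pass, one backward pass, and a delta-rule weight update on a single example:

```python
import random

def train_online(examples, weights, epochs, forward_backward_update):
    # examples: list of (x, d) pairs; weights: the network parameters
    for epoch in range(epochs):
        random.shuffle(examples)          # new presentation order each epoch
        for x, d in examples:
            weights = forward_backward_update(weights, x, d)
    return weights
```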
Stopping criteria
• Sensible stopping criteria:
– Average squared error change:
Back-prop is considered to have
converged when the absolute rate of
change in the average squared error per
epoch is sufficiently small (in the range
[0.1, 0.01]).
– Generalization-based criterion:
After each epoch the NN is tested for
generalization. If the generalization
performance is adequate then stop.
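As a sketch, the first criterion could be checked like this; E_av_history is an assumed list holding the average squared error after each epoch, and the threshold value is illustrative:

```python
def has_converged(E_av_history, threshold=0.01):
    # Stop when the absolute change of E_AV per epoch is sufficiently small
    if len(E_av_history) < 2:
        return False
    return abs(E_av_history[-1] - E_av_history[-2]) < threshold
```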
Generalization
• Generalization: NN generalizes well if the I/O
mapping computed by the network is nearly
correct for new data (test set).
• Factors that influence generalization:
– the size of the training set.
– the architecture of the NN.
– the complexity of the problem at hand.
• Overfitting (overtraining): when the NN
learns too many I/O examples it may end up
memorizing the training data.
Expressive capabilities of NN

Boolean functions:
• Every boolean function can be represented by a network with a single hidden layer,
• but it might require a number of hidden units that is exponential in the number of inputs.

Continuous functions:
• Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer.
• Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
Generalized delta rule

• If η is small: slow rate of learning.
  If η is large: large changes of the weights; the NN can become unstable (oscillatory).
• Method to overcome this drawback: include a momentum term in the delta rule.

Generalized delta rule:

  $\Delta w_{ji}(n) = \alpha\, \Delta w_{ji}(n-1) + \eta\, \delta_j(n)\, y_i(n)$

  ($\alpha$: momentum constant)
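A sketch of this update for a single weight; prev_delta_w stores the previous weight change, and the default values of eta and alpha are illustrative:

```python
def momentum_update(w_ji, prev_delta_w, delta_j, y_i, eta=0.1, alpha=0.9):
    # Generalized delta rule: dw(n) = alpha * dw(n-1) + eta * delta_j(n) * y_i(n)
    delta_w = alpha * prev_delta_w + eta * delta_j * y_i
    return w_ji + delta_w, delta_w
```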
Generalized delta rule
• The momentum term accelerates the descent in steady downhill directions.
• The momentum term has a stabilizing effect in directions that oscillate in time.
η adaptation

Heuristics for accelerating the convergence of the back-prop algorithm through η adaptation:
• Heuristic 1: Every weight should have its own η.
• Heuristic 2: Every η should be allowed to vary from one iteration to the next.
NN DESIGN

• Data representation
• Network Topology
• Network Parameters
• Training
• Validation
Setting the parameters
• How are the weights initialized?
• How is the learning rate chosen?
• How many hidden layers and how many
neurons?
• How many examples in the training set?
Initialization of weights

• In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
• If some inputs are much larger than others, random initialization may bias the network to give much more importance to larger inputs. In such a case, weights can be initialized as follows:

  $w_{ji} = \pm \frac{1}{2N} \sum_{i=1,\dots,N} \frac{1}{|x_i|}$                  for weights from the input to the first layer

  $w_{kj} = \pm \frac{1}{2N} \sum_{i=1,\dots,N} \frac{1}{\varphi\!\left(\sum_i w_{ji}\, x_i\right)}$       for weights from the first to the second layer
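A sketch of the plain random initialization described in the first bullet; the range bounds are the typical values mentioned above:

```python
import numpy as np

def init_weights_uniform(n_out, n_in, low=-0.5, high=0.5):
    # Random initialization with small values, e.g. in [-0.5, 0.5] or [-1.0, 1.0]
    return np.random.uniform(low, high, size=(n_out, n_in))
```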
Choice of learning rate
• The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications.
• Other heuristics adapt η during the training as described in previous slides.
How many layers and neurons
• The number of layers and the number of neurons depend on the specific task. In practice, this issue is solved by trial and error.
• Two types of adaptive algorithms can be
used:
– start from a large network and successively
remove some neurons and links until network
performance degrades.
– begin with a small network and introduce new
neurons until performance is satisfactory.
How many examples
• Rule of thumb:
– the number of training examples should be at
least five to ten times the number of weights of
the network.
• Other rule:

  $N \ge \frac{|W|}{1 - a}$

  |W| = number of weights
  a = expected accuracy on the test set
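As a worked instance of this rule (with illustrative numbers): a network with |W| = 400 weights and a desired test-set accuracy of a = 0.9 (i.e. 10% error) would need at least N = 400 / (1 - 0.9) = 4000 training examples. A one-line sketch:

```python
def min_training_examples(num_weights, accuracy):
    # N >= |W| / (1 - a), with a the expected accuracy on the test set
    return num_weights / (1.0 - accuracy)

print(min_training_examples(400, 0.9))   # 4000.0
```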