Part 7: Neural Networks & Learning

Supervised Learning
• Produce desired outputs for training inputs
• Generalize reasonably & appropriately to other inputs
• Good example: pattern recognition
• Feedforward multilayer networks

Feedforward Network
[Figure: an input layer, several hidden layers, and an output layer, with signals passing forward through the connection weights between layers.]

Typical Artificial Neuron
[Figure: a single neuron with its inputs, connection weights, threshold, and output.]

Typical Artificial Neuron
[Figure: the neuron forms a linear combination of its inputs (the net input, or local field) and passes it through an activation function.]

Equations
Net input: $h_i = \sum_{j=1}^{n} w_{ij} s_j$, or in matrix form $h = W s$
Neuron output: $s_i = \sigma(h_i)$, or in vector form $s = \sigma(h)$

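A minimal NumPy sketch of these two equations for a single layer; the logistic sigmoid used for $\sigma$ is an assumption here, since the slides leave the activation function unspecified.

```python
import numpy as np

def sigma(h):
    # Assumed activation: logistic sigmoid (the slides do not fix sigma).
    return 1.0 / (1.0 + np.exp(-h))

def layer(W, s):
    """Net input h = W s (h_i = sum_j w_ij s_j), then output sigma(h)."""
    h = W @ s
    return sigma(h)

# Example: 2 neurons receiving 3 inputs.
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
s = np.array([1.0, 0.0, 1.0])
print(layer(W, s))
```
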
Single-Layer Perceptron
[Figure: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$ feeding a single unit with net input $h$ and output $y$.]

Variables
[Figure: the same unit, labeled with its variables: inputs $x_j$, weights $w_j$, net input $h$, output $y$.]

Single Layer Perceptron Equations
Binary threshold activation function:
$\sigma(h) = \Theta(h) = \begin{cases} 1, & \text{if } h > 0 \\ 0, & \text{if } h \le 0 \end{cases}$
Hence,
$y = \begin{cases} 1, & \text{if } \sum_j w_j x_j > \theta \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 1, & \text{if } w \cdot x > \theta \\ 0, & \text{if } w \cdot x \le \theta \end{cases}$

2D Weight Vector
[Figure: the weight vector $w$ in the plane, with $v$ the projection of an input $x$ onto $w$; inputs on the + side of the decision line give output 1, inputs on the – side give output 0.]
$w \cdot x = \lVert w \rVert \, \lVert x \rVert \cos\phi$, with $\cos\phi = v / \lVert x \rVert$, so $w \cdot x = \lVert w \rVert \, v$
Therefore $w \cdot x > \theta \iff \lVert w \rVert \, v > \theta \iff v > \theta / \lVert w \rVert$

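A direct transcription of the decision rule $y = \Theta(w \cdot x - \theta)$; the AND weights below are an illustrative choice, not from the slides.

```python
import numpy as np

def perceptron(w, x, theta):
    """Binary threshold unit: y = 1 if w.x > theta, else 0."""
    return 1 if np.dot(w, x) > theta else 0

# Illustration: with w = (1, 1) and theta = 1.5 the unit computes logical AND.
w, theta = np.array([1.0, 1.0]), 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(w, np.array(x, dtype=float), theta))
```
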
N-Dimensional Weight Vector
[Figure: in $n$ dimensions the weight vector $w$ is the normal vector of a separating hyperplane; patterns on the + side of the hyperplane give output 1, patterns on the – side give output 0.]

Goal of Perceptron Learning
• Suppose we have training patterns $x^1, x^2, \dots, x^P$ with corresponding desired outputs $y^1, y^2, \dots, y^P$
• where $x^p \in \{0,1\}^n$, $y^p \in \{0,1\}$
• We want to find $w$, $\theta$ such that $y^p = \Theta(w \cdot x^p - \theta)$ for $p = 1, \dots, P$

Treating Threshold as Weight
[Figure: the perceptron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, net input $h$, and output $y$.]
$h = \sum_{j=1}^{n} w_j x_j - \theta = -\theta + \sum_{j=1}^{n} w_j x_j$

Treating Threshold as Weight
[Figure: the same perceptron with an extra input $x_0 = -1$ carrying the weight $w_0 = \theta$.]
Let $x_0 = -1$ and $w_0 = \theta$. Then
$h = w_0 x_0 + \sum_{j=1}^{n} w_j x_j = \sum_{j=0}^{n} w_j x_j = \tilde{w} \cdot \tilde{x}$

Augmented Vectors
$\tilde{w} = (\theta, w_1, \dots, w_n)^{\mathrm{T}}$ and $\tilde{x}^p = (-1, x_1^p, \dots, x_n^p)^{\mathrm{T}}$
We want $y^p = \Theta(\tilde{w} \cdot \tilde{x}^p)$, $p = 1, \dots, P$

Reformulation as Positive Examples
We have positive ($y^p = 1$) and negative ($y^p = 0$) examples
Want $\tilde{w} \cdot \tilde{x}^p > 0$ for positive, $\tilde{w} \cdot \tilde{x}^p \le 0$ for negative
Let $z^p = \tilde{x}^p$ for positive, $z^p = -\tilde{x}^p$ for negative
Want $\tilde{w} \cdot z^p \ge 0$, for $p = 1, \dots, P$
Hyperplane through origin with all $z^p$ on one side

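A small sketch of these two transformations in NumPy (the helper names are mine, not from the slides):

```python
import numpy as np

def augment(x):
    """x~ = (-1, x_1, ..., x_n): the threshold becomes the weight w_0 = theta."""
    return np.concatenate(([-1.0], np.asarray(x, dtype=float)))

def as_positive(x, y):
    """z = x~ for a positive example (y = 1), z = -x~ for a negative one (y = 0)."""
    z = augment(x)
    return z if y == 1 else -z

# Every example now imposes the same kind of condition: w~ . z >= 0.
print(as_positive([1, 0, 1], 1))
print(as_positive([1, 0, 1], 0))
```
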
Outline of Perceptron Learning Algorithm
1. initialize weight vector randomly
2. until all patterns classified correctly, do:
   a) for p = 1, …, P do:
      1) if $z^p$ classified correctly, do nothing
      2) else adjust weight vector to be closer to correct classification

Adjustment of Weight Vector
[Figure: training patterns $z^1, \dots, z^{11}$ around the origin; the weight vector is adjusted step by step until all $z^p$ lie on its nonnegative side.]

Weight Adjustment
[Figure: a misclassified pattern $z^p$ pulls the weight vector from $\tilde{w}$ to $\tilde{w}' = \tilde{w} + \eta z^p$.]
If $\tilde{w} \cdot z^p < 0$, let $\tilde{w}' = \tilde{w} + \eta z^p$

Improvement in Performance
$\tilde{w}' \cdot z^p = (\tilde{w} + \eta z^p) \cdot z^p = \tilde{w} \cdot z^p + \eta\, z^p \cdot z^p = \tilde{w} \cdot z^p + \eta \lVert z^p \rVert^2 > \tilde{w} \cdot z^p$

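Putting the outline and the update $\tilde{w} \leftarrow \tilde{w} + \eta z^p$ together gives a short loop; the learning rate and epoch cap below are illustrative choices.

```python
import numpy as np

def perceptron_learning(X, y, eta=0.1, max_epochs=1000):
    """Find w~ with w~ . z^p >= 0 for all p (X: P x n inputs, y: 0/1 targets)."""
    P, n = X.shape
    X_aug = np.hstack([-np.ones((P, 1)), X])        # x~^p = (-1, x^p)
    Z = np.where(y[:, None] == 1, X_aug, -X_aug)    # z^p = +/- x~^p

    w = np.random.randn(n + 1)                      # 1. initialize randomly
    for _ in range(max_epochs):                     # 2. until all correct
        mistakes = 0
        for z in Z:                                 #    a) for p = 1, ..., P
            if np.dot(w, z) < 0:                    #       misclassified
                w += eta * z                        #       w~ <- w~ + eta z^p
                mistakes += 1
        if mistakes == 0:
            break
    return w                                        # (theta, w_1, ..., w_n)

# Example: learn logical AND (linearly separable, so the loop terminates).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
print(perceptron_learning(X, y))
```
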
Perceptron Learning Theorem
• If there is a set of weights that will solve the problem, then the PLA will eventually find it (for a sufficiently small learning rate)
• Note: only applies if the positive & negative examples are linearly separable

Classification Power of Multilayer Perceptrons
• Perceptrons can function as logic gates
• Therefore an MLP can form intersections, unions, and differences of linearly separable regions
• Classes can be arbitrary hyperpolyhedra
• Minsky & Papert criticism of perceptrons
• At the time, no one had succeeded in developing an MLP learning algorithm

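Because each unit acts as a logic gate on half-spaces, a two-layer network can carve out a region that no single perceptron can; a sketch computing XOR, with weights and thresholds chosen by hand rather than learned:

```python
import numpy as np

def unit(w, x, theta):
    """Threshold unit: 1 if w.x > theta, else 0."""
    return 1 if np.dot(w, x) > theta else 0

def xor_mlp(x1, x2):
    """XOR as a difference of linearly separable regions: OR(x) minus AND(x)."""
    x = np.array([x1, x2], dtype=float)
    h_or  = unit(np.array([1.0, 1.0]), x, 0.5)   # x1 OR x2
    h_and = unit(np.array([1.0, 1.0]), x, 1.5)   # x1 AND x2
    # Output unit: h_or AND (NOT h_and), again a single threshold unit.
    return unit(np.array([1.0, -1.0]), np.array([h_or, h_and], dtype=float), 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))
```
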
Credit Assignment Problem
[Figure: a feedforward network with an input layer, hidden layers, and an output layer, whose output is compared with the desired output.]
How do we adjust the weights of the hidden layers?

Adaptive System
[Figure: a system S with control parameters $P_1, \dots, P_k, \dots, P_m$, an evaluation function F (fitness, figure of merit) applied to its output, and a control algorithm C that adjusts the control parameters.]

Gradient
$\dfrac{\partial F}{\partial P_k}$ measures how F is altered by variation of $P_k$
$\nabla F = \begin{pmatrix} \partial F / \partial P_1 \\ \vdots \\ \partial F / \partial P_m \end{pmatrix}$
$\nabla F$ points in the direction of maximum increase in F

Gradient Ascent on Fitness Surface
[Figure: a fitness surface F over the parameter space; gradient ascent climbs from the – region toward the + region.]

Gradient Ascent by Discrete Steps
[Figure: a sequence of discrete steps following the gradient uphill on the fitness surface.]

Gradient Ascent is Local But Not Shortest
[Figure: the gradient-ascent path always climbs locally uphill, but it is generally not the shortest path to the maximum.]

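The components $\partial F / \partial P_k$ can also be estimated numerically; a sketch using central differences, where the fitness function and step size are illustrative choices, not from the slides:

```python
import numpy as np

def grad_fd(F, P, eps=1e-6):
    """Approximate the gradient (dF/dP_1, ..., dF/dP_m) by central differences."""
    P = np.asarray(P, dtype=float)
    g = np.zeros_like(P)
    for k in range(P.size):
        dP = np.zeros_like(P)
        dP[k] = eps
        g[k] = (F(P + dP) - F(P - dP)) / (2.0 * eps)
    return g

# Illustrative fitness: F(P) = -(P_1 - 1)^2 - (P_2 + 2)^2, maximal at (1, -2).
F = lambda P: -(P[0] - 1.0) ** 2 - (P[1] + 2.0) ** 2
print(grad_fd(F, [0.0, 0.0]))   # approximately [2., -4.]
```
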
Gradient Ascent Process
$\dot{P} = \eta \nabla F(P)$
Change in fitness:
$\dot{F} = \dfrac{d F}{d t} = \sum_{k=1}^{m} \dfrac{\partial F}{\partial P_k} \dfrac{d P_k}{d t} = \sum_{k=1}^{m} (\nabla F)_k \dot{P}_k = \nabla F \cdot \dot{P}$
$\dot{F} = \nabla F \cdot \eta \nabla F = \eta \lVert \nabla F \rVert^2 \ge 0$
Therefore gradient ascent increases fitness (until it reaches a 0 gradient)

General Ascent in Fitness
Note that any parameter adjustment process $P(t)$ will increase fitness provided:
$0 < \dot{F} = \nabla F \cdot \dot{P} = \lVert \nabla F \rVert \, \lVert \dot{P} \rVert \cos\phi$
where $\phi$ is the angle between $\nabla F$ and $\dot{P}$
Hence we need $\cos\phi > 0$, i.e. $\lvert \phi \rvert < 90°$

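Replacing the continuous flow $\dot{P} = \eta \nabla F(P)$ by discrete steps gives the update $P \leftarrow P + \eta \nabla F(P)$; a sketch using the same illustrative fitness as above, with an analytically known gradient:

```python
import numpy as np

def gradient_ascent(grad_F, P0, eta=0.1, steps=200):
    """Discrete gradient ascent: repeatedly step uphill along grad F."""
    P = np.asarray(P0, dtype=float)
    for _ in range(steps):
        P = P + eta * grad_F(P)
    return P

# grad F for F(P) = -(P_1 - 1)^2 - (P_2 + 2)^2; maximum at (1, -2).
grad_F = lambda P: np.array([-2.0 * (P[0] - 1.0), -2.0 * (P[1] + 2.0)])
print(gradient_ascent(grad_F, [0.0, 0.0]))   # converges toward [1., -2.]
```
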
General Ascent on Fitness Surface
[Figure: on the fitness surface F, any trajectory whose velocity makes an angle of less than 90° with the gradient moves from the – region toward the + region.]

Fitness as Minimum Error
Suppose for Q different inputs we have target outputs $t^1, \dots, t^Q$
Suppose for parameters P the corresponding actual outputs are $y^1, \dots, y^Q$
Suppose $D(t, y) \in [0, \infty)$ measures the difference between target & actual outputs
Let $E^q = D(t^q, y^q)$ be the error on the qth sample
Let $F(P) = -\sum_{q=1}^{Q} E^q(P) = -\sum_{q=1}^{Q} D[t^q, y^q(P)]$

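This definition translates directly into code; in the sketch below, the model and the distance D are illustrative stand-ins (a linear map and the squared Euclidean distance), not fixed by the slides.

```python
import numpy as np

def fitness(P, inputs, targets, model, D):
    """F(P) = -sum_q E^q(P), where E^q = D(t^q, y^q(P))."""
    return -sum(D(t, model(P, x)) for x, t in zip(inputs, targets))

# Illustrative stand-ins:
model = lambda P, x: P @ x                       # actual output y^q(P)
D = lambda t, y: float(np.sum((t - y) ** 2))     # squared Euclidean distance
inputs  = [np.array([1.0, 2.0]), np.array([0.5, -1.0])]
targets = [np.array([3.0]), np.array([0.5])]
P = np.array([[1.0, 1.0]])
print(fitness(P, inputs, targets, model, D))     # 0 only if every output matches its target
```
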
Gradient of Fitness
$\nabla F = -\nabla \sum_q E^q = -\sum_q \nabla E^q$
$\dfrac{\partial E^q}{\partial P_k} = \dfrac{\partial D(t^q, y^q)}{\partial P_k} = \sum_j \dfrac{\partial D(t^q, y^q)}{\partial y_j^q} \dfrac{\partial y_j^q}{\partial P_k} = \dfrac{d D(t^q, y^q)}{d y^q} \cdot \dfrac{\partial y^q}{\partial P_k} = \nabla_{y^q} D(t^q, y^q) \cdot \dfrac{\partial y^q}{\partial P_k}$

Jacobian Matrix
Define the Jacobian matrix $J^q = \begin{pmatrix} \dfrac{\partial y_1^q}{\partial P_1} & \cdots & \dfrac{\partial y_1^q}{\partial P_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_n^q}{\partial P_1} & \cdots & \dfrac{\partial y_n^q}{\partial P_m} \end{pmatrix}$
Note $J^q \in \mathbb{R}^{n \times m}$ and $\nabla D(t^q, y^q) \in \mathbb{R}^{n \times 1}$
Since $(\nabla E^q)_k = \dfrac{\partial E^q}{\partial P_k} = \sum_j \dfrac{\partial D(t^q, y^q)}{\partial y_j^q} \dfrac{\partial y_j^q}{\partial P_k}$,
$\nabla E^q = (J^q)^{\mathrm{T}} \nabla D(t^q, y^q)$

Derivative of Squared Euclidean Distance
Suppose $D(t, y) = \lVert t - y \rVert^2 = \sum_i (t_i - y_i)^2$
$\dfrac{\partial D(t, y)}{\partial y_j} = \dfrac{\partial}{\partial y_j} \sum_i (t_i - y_i)^2 = \sum_i \dfrac{\partial (t_i - y_i)^2}{\partial y_j} = \dfrac{d (t_j - y_j)^2}{d y_j} = -2 (t_j - y_j) = 2 (y_j - t_j)$
(only the $i = j$ term of the sum depends on $y_j$)
$\dfrac{d D(t, y)}{d y} = 2 (y - t)$

Gradient of Error on qth Input
$\dfrac{\partial E^q}{\partial P_k} = \dfrac{d D(t^q, y^q)}{d y^q} \cdot \dfrac{\partial y^q}{\partial P_k} = \sum_j \dfrac{\partial D(t^q, y^q)}{\partial y_j^q} \dfrac{\partial y_j^q}{\partial P_k} = \sum_j 2 (y_j^q - t_j^q) \dfrac{\partial y_j^q}{\partial P_k} = 2 (y^q - t^q) \cdot \dfrac{\partial y^q}{\partial P_k}$

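For a model that is linear in its parameters, the Jacobian is available in closed form, so the whole chain $\nabla E^q = (J^q)^{\mathrm{T}}\, 2(y^q - t^q)$ can be evaluated in a few lines; the model below is an illustrative assumption, not the feedforward network of the slides.

```python
import numpy as np

# Illustrative model, linear in the parameters: y^q(P) = A^q P,
# so its Jacobian with respect to P is simply J^q = A^q.
A_q = np.array([[1.0, 0.5, 0.0],
                [0.0, 2.0, 1.0]])          # n = 2 outputs, m = 3 parameters
P   = np.array([0.2, -0.1, 0.4])
t_q = np.array([1.0, 0.0])                 # target output for the qth input

y_q    = A_q @ P                           # actual output y^q(P)
J_q    = A_q                               # Jacobian dy^q/dP
grad_D = 2.0 * (y_q - t_q)                 # dD/dy for D = ||t - y||^2
grad_E = J_q.T @ grad_D                    # grad E^q = (J^q)^T grad_y D
print(grad_E)                              # one component per parameter P_k
```
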
Recap
$\dot{P} = \eta \sum_q (J^q)^{\mathrm{T}} (t^q - y^q)$
To know how to decrease the differences between actual & desired outputs, we need to know the elements of the Jacobian, $\dfrac{\partial y_j^q}{\partial P_k}$, which say how the jth output varies with the kth parameter (given the qth input).
The Jacobian depends on the specific form of the system, in this case a feedforward neural network.

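Discretizing $\dot{P} = \eta \sum_q (J^q)^{\mathrm{T}} (t^q - y^q)$ gives a batch update rule; the sketch below reuses the illustrative linear-in-parameters model (so each $J^q$ is just $A^q$), purely to show the bookkeeping of the update.

```python
import numpy as np

def train(A_list, t_list, P, eta=0.1, steps=1000):
    """Batch updates P <- P + eta * sum_q (J^q)^T (t^q - y^q), with y^q = A^q P."""
    for _ in range(steps):
        step = np.zeros_like(P)
        for A_q, t_q in zip(A_list, t_list):
            y_q = A_q @ P                     # actual output for the qth input
            step += A_q.T @ (t_q - y_q)       # (J^q)^T (t^q - y^q); here J^q = A^q
        P = P + eta * step
    return P

# Two samples for the illustrative linear model; targets generated from P* = (1, -1, 2).
A_list = [np.array([[1.0, 0.5, 0.0], [0.0, 2.0, 1.0]]),
          np.array([[0.5, 0.0, 1.0], [1.0, 1.0, 0.0]])]
t_list = [np.array([0.5, 0.0]), np.array([2.5, 0.0])]
P = train(A_list, t_list, np.zeros(3))
print(P)   # approaches (1, -1, 2), where every actual output equals its target
```
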