Artificial Neural Networks

Chapter 2: Adaline and Madaline
Adaline : Adaptive Linear Neuron
Madaline : Multiple Adalines
2.1 Adaline
Developed by Bernard Widrow at Stanford University.
$x_0$ : the bias term (also called the feedback term, error term, gain term, or adjust term)
$y = \sum_{j=0}^{n} w_j x_j = w^t x$
2.1.1. LMS (Least Mean Square) Learning Rule
(or Delta Learning Rule)
◎ Input vectors : $\{x_1, x_2, \ldots, x_L\}$
Desired output values : $\{d_1, d_2, \ldots, d_L\}$
Objective:
Find a weight vector that associates each
input vector with its desired output value
◎ Idea : minimize the mean squared error
Actual outputs : $\{y_1, y_2, \ldots, y_L\}$
The mean squared error method
$\langle \varepsilon_k^2 \rangle = \langle (d_k - y_k)^2 \rangle = \langle (d_k - w^t x_k)^2 \rangle$   ---(2.4)
$\quad = \langle d_k^2 \rangle + w^t \langle x_k x_k^t \rangle w - 2 \langle d_k x_k^t \rangle w$

Let $\xi = \langle \varepsilon_k^2 \rangle$, $p = \langle d_k x_k \rangle$, and
$R = \langle x_k x_k^t \rangle$ : the correlation matrix.

(2.4) $\Rightarrow \xi = \langle d_k^2 \rangle + w^t R w - 2 p^t w$
To minimize $\xi$, set
$\frac{d\xi}{dw} = 2Rw - 2p = 0 \;\Rightarrow\; w^* = R^{-1} p$
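A hedged NumPy sketch of this closed-form solution, with the expectations $\langle \cdot \rangle$ replaced by sample averages over a training set (all names are illustrative):

```python
import numpy as np

def wiener_solution(X, d):
    """w* = R^{-1} p, with R and p estimated by sample averages."""
    R = (X.T @ X) / len(X)        # R ~ <x_k x_k^t>, the correlation matrix
    p = (X.T @ d) / len(X)        # p ~ <d_k x_k>
    return np.linalg.solve(R, p)  # solve R w = p rather than inverting R

# Example: recover a known weight vector from noiseless data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([0.5, -1.0, 2.0])
print(wiener_solution(X, X @ w_true))   # ~ [0.5, -1.0, 2.0]
```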
Note that
i. $\frac{d\xi}{dw} = \left( \frac{\partial \xi}{\partial w_1}, \frac{\partial \xi}{\partial w_2}, \ldots, \frac{\partial \xi}{\partial w_n} \right)^t$
ii. the graph of $\xi(w) = \langle d_k^2 \rangle + w^t R w - 2 p^t w$ is a hyperparaboloid.
Difficulties with the analytical formula :
1. Large dimensions : $R^{-1}$ is difficult to calculate.
2. Knowledge of the underlying probabilities is required to evaluate the expectations ($\langle \cdot \rangle$ : expected value).
◎ Steepest Descent
1. Initialize the weight values.
2. Determine the steepest-descent direction: $\Delta w(t) = -\nabla_w \xi(w(t))$
3. Modify the weight values: $w(t+1) = w(t) + \mu \Delta w(t)$, where $\mu$ is the step size.
4. Repeat steps 2 and 3.
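A minimal sketch of this loop using the exact gradient $2Rw - 2p$ (the step size, tolerance, and iteration cap are illustrative assumptions):

```python
import numpy as np

def steepest_descent(R, p, mu=0.01, tol=1e-8, max_iter=10000):
    w = np.zeros(len(p))                 # 1. initialize weight values
    for _ in range(max_iter):
        delta_w = -(2 * R @ w - 2 * p)   # 2. steepest-descent direction
        w = w + mu * delta_w             # 3. modify weight values
        if np.linalg.norm(delta_w) < tol:
            break                        # stop once the gradient vanishes
    return w
```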
◎ Approximation for the gradient
Objective: simplify $\nabla_w \xi(w(t)) = 2Rw - 2p$ by sequentially inputting training samples.
1. Apply an input vector $x_k$.
2. $\varepsilon_k(t) = d_k - y_k = d_k - w^t(t) x_k$
3. $\nabla_w \xi(t) = \nabla_w \langle \varepsilon_k^2(t) \rangle \approx \nabla_w \varepsilon_k^2(t) = -2(d_k - w^t(t) x_k) x_k = -2 \varepsilon_k(t) x_k$
4. $w(t+1) = w(t) + 2\mu \varepsilon_k(t) x_k$
5. Repeat steps 1~4 with the next input vector.
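A sketch of the resulting LMS rule: neither $R$ nor $p$ is formed; each sample's instantaneous error drives the update (learning rate and epoch count are illustrative):

```python
import numpy as np

def lms_train(X, d, mu=0.05, epochs=50):
    """Per-sample LMS updates w(t+1) = w(t) + 2*mu*eps_k(t)*x_k."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_k, d_k in zip(X, d):    # 1. apply an input vector x_k
            eps = d_k - w @ x_k       # 2. eps_k(t) = d_k - w^t(t) x_k
            w += 2 * mu * eps * x_k   # 3.-4. update along the approximate gradient
    return w
```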
○ Practical considerations:
(a) number of training vectors, (b) stopping criteria,
(c) initial weights, (d) step size
2.3. Adaptive Signal Processing
2.3.1. Echo Cancellation in Telephone Circuits
Hybrid circuit: deals with the leakage issue; it attempts to isolate incoming signals from outgoing signals.
Adaptive filter: deals with the choppy-speech issue; it mimics the leakage of the incoming voice so that this leakage can be suppressed in the outgoing signal.
$s$ : outgoing voice; $n$ : incoming voice; $\hat{n}$ : noise (leakage of the incoming voice); $y$ : the output of the filter, which mimics $\hat{n}$

$\langle \varepsilon^2 \rangle = \langle (s + \hat{n} - y)^2 \rangle = \langle s^2 \rangle + \langle (\hat{n} - y)^2 \rangle + 2 \langle s(\hat{n} - y) \rangle$
$\langle s(\hat{n} - y) \rangle = 0$ ($s$ is not correlated with $y$ or $\hat{n}$)
$\Rightarrow \langle \varepsilon^2 \rangle - \langle s^2 \rangle = \langle (\hat{n} - y)^2 \rangle$
$\Rightarrow$ minimizing $\langle \varepsilon^2 \rangle$ is equivalent to minimizing $\langle (\hat{n} - y)^2 \rangle$
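A hedged simulation sketch of this idea: an LMS filter watches the incoming voice $n$ and learns to reproduce its leakage $\hat{n}$, so the residual $s + \hat{n} - y$ approaches the outgoing voice $s$ alone. The leakage coefficients, signal shapes, and learning rate are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, taps, mu = 5000, 4, 0.01
n = rng.normal(size=T)                   # incoming voice (the filter's reference)
s = np.sin(0.05 * np.arange(T))          # outgoing voice
leak = np.array([0.6, 0.3, 0.1, 0.05])   # hypothetical hybrid-circuit leakage
w = np.zeros(taps)
for t in range(taps, T):
    x = n[t-taps:t][::-1]                # recent incoming-voice samples
    n_hat = leak @ x                     # leakage actually added to the line
    err = (s[t] + n_hat) - w @ x         # eps = s + n_hat - y
    w += 2 * mu * err * x                # LMS drives y toward n_hat
# As w approaches leak, err approaches s[t]: the echo is cancelled.
```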
2.3.2 Predict Signal Value
2.3.3 Reproduce Signal
2.3.4. Adaptive beam-forming antenna arrays
Antenna : a spatial array of sensors that are directional in their reception characteristics.
The adaptive filter learns to steer the antennae so that they can respond to incoming signals no matter what their directions are, while reducing the response to unwanted noise signals coming in from other directions.
2.4 Madaline : Many Adalines
○ Compute the XOR function

x1  x2 | x1'  x2' | XOR' | XOR
 0   0 |  -1   -1 |  -1  |  0
 0   1 |  -1    1 |   1  |  1
 1   0 |   1   -1 |   1  |  1
 1   1 |   1    1 |  -1  |  0

where $x_i' = 2x_i - 1$ maps the binary inputs $\{0, 1\}$ to the bipolar values $\{-1, +1\}$, and XOR' is the bipolar encoding of XOR.
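As a concrete check, a minimal sketch of a two-Adaline Madaline realizing XOR on bipolar inputs; these particular weights are one hand-picked solution, not taken from the original:

```python
def sgn(v):
    return 1 if v >= 0 else -1

def madaline_xor(x1, x2):
    """Two hidden Adalines plus an OR Adaline; all values are bipolar (+/-1)."""
    a1 = sgn(-x1 + x2 - 1)   # fires only for (x1, x2) = (-1, +1)
    a2 = sgn(+x1 - x2 - 1)   # fires only for (x1, x2) = (+1, -1)
    return sgn(a1 + a2 + 1)  # OR of the two detectors

for x1 in (-1, 1):
    for x2 in (-1, 1):
        print(x1, x2, madaline_xor(x1, x2))   # +1 exactly when x1 != x2
```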
2.4.1. Architecture
2.4.2. Madaline Rule II (MRII)
○ Training algorithm – a trial-and-error procedure with a minimum-disturbance principle (those nodes that can affect the output error while incurring the least change in their weights take precedence in the learning process)
○ Procedure –
1. Input a training pattern.
2. Count the number of incorrect values in the output layer.
3. For all units on the output layer:
3.1. Select the first previously unselected error node whose analog output is closest to zero (this node can reverse its bipolar output with the least change in its weights).
3.2. Change the weights on the selected unit so that its bipolar output changes.
3.3. Input the same training pattern.
3.4. If the number of errors is reduced, accept the weight change; otherwise, restore the original weights.
4. Repeat Step 3 for all layers except the input layer.
5. For all pairs of units on the output layer:
5.1. Select the previously unselected pair of units whose analog outputs are closest to zero.
5.2. Apply a weight correction to both units so as to change their bipolar outputs.
5.3. Input the same training pattern.
5.4. If the number of errors is reduced, accept the correction; otherwise, restore the original weights.
6. Repeat Step 5 for all layers except the input layer.
※ Steps 5 and 6 can be repeated with triplets, quadruplets, or longer combinations of units until satisfactory results are obtained.
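A compressed sketch of the single-unit stage (steps 3.1~3.4) for a two-layer Madaline; the flip rule and network shape are simplifying assumptions, not Widrow's full MRII:

```python
import numpy as np

def forward(W_hid, W_out, x):
    """Bipolar forward pass: one hidden Madaline layer, one output layer."""
    h = np.where(W_hid @ x >= 0, 1, -1)
    y = np.where(W_out @ h >= 0, 1, -1)
    return h, y

def mrii_trial_pass(W_hid, W_out, x, d, delta=0.1):
    """One minimum-disturbance pass over the hidden units: try flipping the
    unit whose analog output is closest to zero; keep the change only if
    it reduces the number of output errors."""
    _, y = forward(W_hid, W_out, x)
    errors = int(np.sum(y != d))               # step 2: count output errors
    for j in np.argsort(np.abs(W_hid @ x)):    # step 3.1: closest to zero first
        if errors == 0:
            break
        saved = W_hid[j].copy()
        net = W_hid[j] @ x
        # step 3.2: minimal weight change pushing the analog output just past
        # zero on the other side, which flips the unit's bipolar output
        W_hid[j] -= (net + np.sign(net) * delta) * x / (x @ x)
        _, y_new = forward(W_hid, W_out, x)    # step 3.3: same training pattern
        new_errors = int(np.sum(y_new != d))
        if new_errors < errors:                # step 3.4: accept or restore
            errors = new_errors
        else:
            W_hid[j] = saved
    return errors
```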
2.4.3. A Madaline for Translation-Invariant Pattern Recognition
Difficulties for pattern recognition -- noise, incompleteness, distortion, transformation, occlusion
○ Translation-invariant pattern recognition
。 Relationships among the weight matrices of
Adalines
The Adalines possess an identical set of weight values, which has been trained to detect a particular pattern.
。 Majority vote-taker
The majority node computes a binary output based on the majority of the outputs of the Adalines connected to it.
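A minimal sketch of the vote-taker over bipolar Adaline outputs (an odd number of Adalines is assumed so that ties cannot occur):

```python
import numpy as np

def majority(adaline_outputs):
    """Bipolar majority vote: +1 if most of the +/-1 outputs are +1."""
    return 1 if np.sum(adaline_outputs) > 0 else -1

print(majority(np.array([1, -1, 1, 1, -1])))   # -> 1
```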
○ Extension -- multiple slabs with different key weight matrices for discriminating more than two classes of patterns