Chapter 2: Adaline and Madaline

Adaline: ADAptive LINear NEuron
Madaline: Multiple Adalines

2.1 Adaline

Developed by Bernard Widrow at Stanford University.
$x_0$: the bias term (also called the feedback, error, gain, or adjust term).
Output of the Adaline:

$y = \sum_{j=0}^{n} w_j x_j = \mathbf{w}^T \mathbf{x}$

2.1.1 LMS (Least Mean Square) Learning Rule (or Delta Learning Rule)

◎ Input vectors: $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_L\}$
  Desired output values: $\{d_1, d_2, \ldots, d_L\}$
  Objective: find a weight vector that associates each input vector with its desired output value.

◎ Idea: minimize the mean squared error.
  Actual output values: $\{y_1, y_2, \ldots, y_L\}$
  The mean squared error:

$\langle \varepsilon_k^2 \rangle = \langle (d_k - y_k)^2 \rangle = \langle (d_k - \mathbf{w}^T \mathbf{x}_k)^2 \rangle$   ---(2.4)
$\qquad\quad = \langle d_k^2 \rangle + \mathbf{w}^T \langle \mathbf{x}_k \mathbf{x}_k^T \rangle \mathbf{w} - 2 \langle d_k \mathbf{x}_k^T \rangle \mathbf{w}$

Let $\xi = \langle \varepsilon_k^2 \rangle$, $\mathbf{p} = \langle d_k \mathbf{x}_k \rangle$, and $R = \langle \mathbf{x}_k \mathbf{x}_k^T \rangle$ (the input correlation matrix). Then (2.4) becomes

$\xi(\mathbf{w}) = \langle d_k^2 \rangle + \mathbf{w}^T R \mathbf{w} - 2 \mathbf{p}^T \mathbf{w}$

To minimize $\xi$, set $\dfrac{d\xi}{d\mathbf{w}} = 2R\mathbf{w} - 2\mathbf{p} = 0$, which gives $\mathbf{w}^* = R^{-1}\mathbf{p}$.

Note that
i.  $\dfrac{d\xi}{d\mathbf{w}} = \left( \dfrac{\partial \xi}{\partial w_1}, \dfrac{\partial \xi}{\partial w_2}, \ldots, \dfrac{\partial \xi}{\partial w_n} \right)^T$
ii. the graph of $\xi(\mathbf{w})$ is a hyperparaboloid.

Difficulties of the analytical formula:
1. For large dimensions, $R^{-1}$ is difficult to calculate.
2. It requires knowledge of the underlying probabilities ($\langle \cdot \rangle$ denotes expected value).

◎ Steepest Descent
1. Initialize the weight values.
2. Determine the steepest descent direction: $\Delta\mathbf{w}(t) = -\nabla_{\mathbf{w}}\,\xi(\mathbf{w}(t))$
3. Modify the weight values: $\mathbf{w}(t+1) = \mathbf{w}(t) + \mu\,\Delta\mathbf{w}(t)$, where $\mu$ is the step size.
4. Repeat steps 2-3.

◎ Approximation for the gradient
Objective: avoid evaluating $\nabla_{\mathbf{w}}\,\xi(\mathbf{w}(t)) = 2R\mathbf{w} - 2\mathbf{p}$ by sequentially inputting the training samples:
1. Apply an input vector $\mathbf{x}_k$.
2. $\varepsilon_k^2(t) = (d_k - y_k)^2 = (d_k - \mathbf{w}^T(t)\,\mathbf{x}_k)^2$
3. $\nabla_{\mathbf{w}}\,\xi(t) \approx \nabla_{\mathbf{w}}\,\varepsilon_k^2(t) = -2(d_k - \mathbf{w}^T(t)\,\mathbf{x}_k)\,\mathbf{x}_k = -2\varepsilon_k(t)\,\mathbf{x}_k$
4. $\mathbf{w}(t+1) = \mathbf{w}(t) + 2\mu\,\varepsilon_k(t)\,\mathbf{x}_k$
5. Repeat steps 1-4 with the next input vector.

○ Practical considerations: (a) the number of training vectors, (b) the stopping criterion, (c) the initial weight values, (d) the step size.
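The sequential gradient-approximation procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the text's implementation; the training data (the AND function in bipolar coding), the step size, and the epoch count are assumptions chosen for the example:

```python
# LMS (delta rule) training of a single Adaline:
#   w <- w + 2*mu*eps_k*x_k,  with a bias input x0 = 1 prepended to each sample.

def lms_train(samples, targets, mu=0.05, epochs=50):
    w = [0.0] * (len(samples[0]) + 1)        # weights, including bias w0
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            xb = [1.0] + list(x)             # x0 = 1 (bias term)
            y = sum(wi * xi for wi, xi in zip(w, xb))   # linear output w^T x
            eps = d - y                      # error eps_k = d_k - y_k
            w = [wi + 2.0 * mu * eps * xi for wi, xi in zip(w, xb)]
    return w

# Example: learn the (linearly separable) AND function in bipolar coding.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
D = [-1, -1, -1, 1]
w = lms_train(X, D)
outputs = [1 if sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))) >= 0 else -1
           for x in X]
print(outputs)   # -> [-1, -1, -1, 1]
```

For this data the analytical solution is $\mathbf{w}^* = R^{-1}\mathbf{p} = (-0.5, 0.5, 0.5)^T$; LMS approaches it without ever forming or inverting $R$, which is the point of the sequential approximation.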
2.3 Adaptive Signal Processing

2.3.1 Echo Cancellation in Telephone Circuits

Hybrid circuit: deals with the leakage issue; it attempts to isolate incoming signals from outgoing signals.
Adaptive filter: deals with the resulting choppy speech; it mimics the leakage of the incoming voice so that the leakage can be suppressed in the outgoing signal.

$s$: the outgoing voice; $n$: noise (the leakage of the incoming voice); $y$: the output of the filter, which mimics $n$.

$\langle \varepsilon^2 \rangle = \langle (s + n - y)^2 \rangle = \langle s^2 \rangle + \langle (n - y)^2 \rangle + 2\langle s(n - y) \rangle$

Since $s$ is not correlated with $n$ or $y$, $\langle s(n - y) \rangle = 0$. Hence

$\langle \varepsilon^2 \rangle - \langle s^2 \rangle = \langle (n - y)^2 \rangle$

and minimizing $\langle \varepsilon^2 \rangle$ is equivalent to minimizing $\langle (n - y)^2 \rangle$.

2.3.2 Predicting Signal Values

2.3.3 Reproducing Signals

2.3.4 Adaptive Beam-Forming Antenna Arrays

Antenna: a spatial array of sensors that are directional in their reception characteristics.
The adaptive filter learns to steer the antennae so that they respond to incoming signals no matter what their directions are, while reducing the response to unwanted noise signals coming in from other directions.

2.4 Madaline: Many Adalines

○ Computing the XOR function

  x1 | x2 | XOR | XOR (bipolar)
 ----+----+-----+---------------
   0 |  0 |  0  |      -1
   0 |  1 |  1  |      +1
   1 |  0 |  1  |      +1
   1 |  1 |  0  |      -1

Since XOR $= x_1\bar{x}_2 + \bar{x}_1 x_2$, a Madaline with two hidden Adalines (one detecting $x_1\bar{x}_2$, the other detecting $\bar{x}_1 x_2$) feeding an OR unit computes the function.

2.4.1 Architecture

2.4.2 Madaline Rule II (MRII)

○ Training algorithm: a trial-and-error procedure based on the minimum disturbance principle (those nodes that can affect the output error while incurring the least change in their weights take precedence in the learning process).

○ Procedure:
1. Input a training pattern.
2. Count the number of incorrect values in the output layer.
3. For all units on the output layer:
   3.1. Select the first previously unselected error node whose analog output is closest to zero (this node can reverse its bipolar output with the least change in its weights).
   3.2. Change the weights on the selected unit so that its bipolar output changes.
   3.3. Input the same training pattern.
   3.4. If the number of errors is reduced, accept the weight change; otherwise, restore the original weights.
4. Repeat step 3 for all layers except the input layer.
5. For all units on the output layer:
   5.1. Select the previously unselected pair of units whose analog outputs are closest to zero.
   5.2. Apply a weight correction to both units so as to change their bipolar outputs.
   5.3. Input the same training pattern.
   5.4. If the number of errors is reduced, accept the correction; otherwise, restore the original weights.
6. Repeat step 5 for all layers except the input layer.

※ Steps 5 and 6 can be repeated with triplets, quadruplets, or longer combinations of units until satisfactory results are obtained.

2.4.3 A Madaline for Translation-Invariant Pattern Recognition

Difficulties for pattern recognition: noise, incompleteness, distortion, transformation, occlusion.

○ Translation-invariant pattern recognition

。Relationships among the weight matrices of the Adalines: the Adalines possess identical sets of weight values, which have been trained to detect a particular pattern.

。Majority vote-taker: the majority node computes a binary output based on the majority of the outputs of the Adalines connected to it.

○ Extension: multiple slabs with different key weight matrices for discriminating more than two classes of patterns.
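As a companion to the XOR discussion in Section 2.4, here is a hand-wired two-Adaline Madaline in Python. The particular weight values are illustrative choices, not taken from the text: each hidden Adaline detects one of the two XOR minterms, and the output Adaline ORs them.

```python
# A two-Adaline Madaline computing XOR on bipolar (+1/-1) inputs.

def signum(net):                      # bipolar hard limiter
    return 1 if net >= 0 else -1

def adaline(w, x):                    # bipolar output of one Adaline
    return signum(sum(wi * xi for wi, xi in zip(w, x)))

# Illustrative weights (first component multiplies the bias input x0 = 1):
W1 = [-1.0, 1.0, -1.0]   # hidden unit 1: fires only for (x1, x2) = (+1, -1)
W2 = [-1.0, -1.0, 1.0]   # hidden unit 2: fires only for (x1, x2) = (-1, +1)
V  = [0.5, 1.0, 1.0]     # output unit: OR of the two hidden outputs

def madaline_xor(x1, x2):
    x = [1.0, x1, x2]                 # x0 = 1 is the bias input
    h1, h2 = adaline(W1, x), adaline(W2, x)
    return adaline(V, [1.0, h1, h2])

truth = {(a, b): madaline_xor(a, b) for a in (-1, 1) for b in (-1, 1)}
print(truth)   # +1 exactly when the two inputs differ
```

Note that no single Adaline can compute XOR (it is not linearly separable); the two hidden units carve the plane with two decision lines, and the fixed output unit combines them, which is exactly the layered structure MRII is designed to train.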