Chapter 6: Counterpropagation Network (CPN)

I.   Combines a competitive network with Grossberg's outstar structure.
II.  Simulates a functional mapping $y = \Phi(x)$, where $\Phi$ is a continuous function, e.g., $\Phi = \cos$.
III. Backward mapping, i.e., $x = \Phi^{-1}(y)$.

○ Forward-mapping CPN (architecture figure)

6.1. CPN Building Blocks

Three major components:
1. Instar: a hidden node together with its input weights
2. Competitive layer: the hidden layer, composed of instars
3. Outstar: a structure composed of a single hidden unit and all output units

6.1.2. The Instar

○ Input: $\text{net} = \mathbf{I}^{T}\mathbf{w}$.  Assume $\|\mathbf{I}\| = \|\mathbf{w}\| = 1$.

○ Output:
$$\dot{y} = -a\,y + b\,(\text{net}), \quad a, b > 0 \quad\text{------ (A)}$$
Solving (A) with $y(0) = 0$:
$$y(t) = \frac{b}{a}(\text{net})\,(1 - e^{-at}) = y^{eq}(1 - e^{-at}), \qquad y^{eq} = \frac{b}{a}(\text{net})$$
If the input is removed at $t_{eq}$, i.e., $\text{net} = 0$ for $t \ge t_{eq}$, then $\dot{y} = -a\,y + b\,(\text{net})$ reduces to $\dot{y} = -a\,y$, whose solution is
$$y(t) = y^{eq}\,e^{-a(t - t_{eq})}$$

○ Learning of the instar -- learn to respond maximally to a particular input vector:
$$\dot{\mathbf{w}} = -c\,\mathbf{w} + d\,\mathbf{I}, \quad c, d > 0 \quad\text{------- (6.8)}$$
In the absence of $\mathbf{I}$, $\dot{\mathbf{w}} = -c\,\mathbf{w}$, so the weights decay to zero; this is called forgetfulness.
To avoid forgetfulness, an alternative learning rule is
$$\dot{\mathbf{w}} = (-c\,\mathbf{w} + d\,\mathbf{I})\,U(\text{net}) \quad\text{------ (6.10)}$$
where $U(\text{net}) = 1$ if $\text{net} > 0$ and $U(\text{net}) = 0$ if $\text{net} \le 0$.

。 When $\text{net} > 0$, $\dot{\mathbf{w}} = -c\,\mathbf{w} + d\,\mathbf{I}$, so $\Delta\mathbf{w} = -c\,\Delta t\,\mathbf{w} + d\,\Delta t\,\mathbf{I}$.
Let $\alpha = c\,\Delta t = d\,\Delta t$ (i.e., take $c = d$). Then
$$\Delta\mathbf{w} = \alpha(\mathbf{I} - \mathbf{w}), \qquad \mathbf{w}(t+1) = \mathbf{w}(t) + \Delta\mathbf{w} = \mathbf{w}(t) + \alpha(\mathbf{I} - \mathbf{w}(t))$$

。 Learning a cluster of patterns
Set the initial weight vector to some member of the cluster (or to the cluster average).

。 Learning steps (a code sketch is given at the end of Section 6.1.3):
1. Select an input vector $\mathbf{I}_i$ at random.
2. Calculate $\Delta\mathbf{w}(t) = \alpha(\mathbf{I}_i - \mathbf{w}(t))$.
3. Update $\mathbf{w}(t+1) = \mathbf{w}(t) + \Delta\mathbf{w}(t)$.
4. Repeat steps 1-3 for all input vectors in the cluster.
5. Repeat steps 1-4 several times.
Reduce the value of $\alpha$ as training proceeds.

6.1.3. Competitive Layer

The hidden layer is a competitive layer formed by instars. An on-center off-surround system can be used to implement the competition among a group of instars.

。 The unit activation:
$$\dot{x}_i = -A\,x_i + (B - x_i)\,[f(x_i) + \text{net}_i] - x_i \sum_{k \ne i} [f(x_k) + \text{net}_k] \quad\text{----- (6.13)}$$
where $A, B$ are positive constants and $f(\cdot)$ is the transfer function. Rearranging,
$$\dot{x}_i = -A\,x_i + B\,[f(x_i) + \text{net}_i] - x_i \sum_{k} [f(x_k) + \text{net}_k] \quad\text{----- (6.13)'}$$
Summing over $i$, with $x = \sum_i x_i$,
$$\dot{x} = -A\,x + (B - x) \sum_k [f(x_k) + \text{net}_k]$$
Let $x_k = x\,X_k$, where $X_k$ is the reflectance variable. Then
$$\dot{x} = -A\,x + (B - x) \sum_k [f(x X_k) + \text{net}_k] \quad\text{----- (6.14)}$$
From $x_i = x\,X_i$, $\dot{x}_i = \dot{x}\,X_i + x\,\dot{X}_i$, so
$$x\,\dot{X}_i = \dot{x}_i - \dot{x}\,X_i \quad\text{----- (A)}$$
Substituting (6.13)' and (6.14) into (A), the $-A\,x_i$ and $A\,x X_i$ terms cancel (since $x_i = x X_i$), as do the terms $-x_i\sum_k[\cdot]$ and $x X_i\sum_k[\cdot]$, leaving
$$x\,\dot{X}_i = B\,f(xX_i) + B\,\text{net}_i - B\,X_i \sum_k f(xX_k) - B\,X_i \sum_k \text{net}_k$$
Writing $f(w) = w\,g(w)$, i.e., $g(w) = w^{-1} f(w)$, and using $\sum_k X_k = 1$, this becomes
$$x\,\dot{X}_i = B\,x\,X_i \sum_k X_k\,[\,g(xX_i) - g(xX_k)\,] + B\,(1 - X_i)\,\text{net}_i - B\,X_i \sum_{k \ne i} \text{net}_k \quad\text{----- (6.15)}$$

。 Linear transfer function: $f(w) = w$, so $g(w) = 1$ and the first term of (6.15) vanishes:
$$x\,\dot{X}_i = B\,(1 - X_i)\,\text{net}_i - B\,X_i \sum_{k \ne i} \text{net}_k = B\,\text{net}_i - B\,X_i \sum_k \text{net}_k$$
$X_i$ stabilizes (i.e., $\dot{X}_i = 0$) at
$$X_i = \frac{\text{net}_i}{\sum_k \text{net}_k}$$
so at equilibrium $x_i = x\,X_i = x^{eq}\,\dfrac{\text{net}_i}{\sum_k \text{net}_k}$.

。 If the input pattern is removed (i.e., $\text{net}_i = 0$), (6.14) gives
$$\dot{x} = -A\,x + (B - x)\sum_k f(xX_k)$$
For linear $f$, $\sum_k f(xX_k) = \sum_k x X_k = x$, so
$$\dot{x} = -A\,x + (B - x)\,x = (B - A)\,x - x^2$$
If $B < A$, then $\dot{x} < 0$ and $x$ decays to zero.
If $B > A$, $x$ moves toward $x = B - A$, which is then stored permanently on the units; this is called short-term memory.

。 Example: $f(w) = w$ (linear output function) (figure)

。 Example: $f(w) = w^2$ (faster-than-linear output function)
Here $g(w) = w^{-1} f(w) = w$, so
$$g(xX_i) - g(xX_k) = xX_i - xX_k = x\,[X_i - X_k]$$
i.  If $X_i > X_k$, the term $X_k\,[\,g(xX_i) - g(xX_k)\,]$ is excitatory for $\dot{X}_i$; otherwise it is inhibitory. The network therefore tends to enhance the activity of the unit with the largest $X_i$.
ii. After the input pattern is removed, (6.13)' gives
$$\dot{x}_i = -A\,x_i + B\,f(x_i) - x_i \sum_k f(x_k) = -A\,x_i + B\,x_i^2 - x_i \sum_k x_k^2 = B\,x_i^2 - \Big(A + \sum_k x_k^2\Big)\,x_i$$
With appropriate $A$ and $B$, every activity decays to zero except that of the unit with the largest $x_i$ (see the numerical sketch at the end of this subsection).
iii. $f(w) = w^n$ with $n > 1$ can be used to implement a winner-take-all network.

。 Example: sigmoid transfer function (contrast enhancement)
Quenching threshold (QT): units whose net inputs are above the QT have their activities enhanced.
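Returning to the instar learning steps of Section 6.1.2, the following is a minimal sketch of that procedure in Python/NumPy (not part of the original notes). The function name `train_instar`, the initial rate `alpha0`, and the $1/(1+\text{epoch})$ decay schedule are illustrative assumptions; the notes only require that $\alpha$ be reduced as training proceeds.

```python
import numpy as np

def train_instar(cluster, alpha0=0.5, epochs=20, seed=0):
    """Train one instar on a cluster of normalized input vectors.

    Implements w(t+1) = w(t) + alpha * (I - w(t)), with alpha reduced
    as training proceeds and the initial weight vector set to one
    member of the cluster, as the notes suggest.
    """
    rng = np.random.default_rng(seed)
    cluster = np.asarray(cluster, dtype=float)
    w = cluster[0].copy()                        # start from a member of the cluster
    for epoch in range(epochs):
        alpha = alpha0 / (1 + epoch)             # assumed decay schedule for alpha
        for i in rng.permutation(len(cluster)):  # present the inputs in random order
            w += alpha * (cluster[i] - w)        # Delta w = alpha (I - w)
    return w

# Usage: three noisy unit vectors around the same direction.
rng = np.random.default_rng(1)
cluster = [np.array([0.6, 0.8]) + 0.05 * rng.normal(size=2) for _ in range(3)]
cluster = [v / np.linalg.norm(v) for v in cluster]   # instars assume ||I|| = 1
print("learned weight vector:", train_instar(cluster))
```

Because each update moves $\mathbf{w}$ a fraction $\alpha$ toward the current input, the weight vector settles near the average direction of the cluster, which is the behaviour the notes describe.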
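The winner-take-all behaviour claimed in point ii of the $f(w) = w^2$ example can be checked numerically. Below is a rough Euler-integration sketch of (6.13)' with $f(w) = w^2$ after the input has been removed ($\text{net}_i = 0$); the constants $A$ and $B$, the step size, and the initial activities are illustrative choices of mine (with $B^2 > 4A$ so that a nonzero stored activity exists), not values from the notes.

```python
import numpy as np

# Euler integration of (6.13)' with f(w) = w**2 and net_i = 0:
#   dx_i/dt = -A*x_i + B*x_i**2 - x_i * sum_k x_k**2
A, B, dt, steps = 0.1, 2.0, 0.01, 5000          # illustrative constants (B**2 > 4A)
x = np.array([0.30, 0.28, 0.25, 0.20])          # unit 0 starts with the largest activity

for _ in range(steps):
    total = np.sum(x**2)
    x = np.clip(x + dt * (-A * x + B * x**2 - x * total), 0.0, None)

print(np.round(x, 4))
# Expected: every activity decays toward 0 except unit 0, which survives at a
# nonzero level -- the winner-take-all behaviour described in point ii.
```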
6.1.4. The Outstar

-- An outstar is composed of a single hidden unit and all output units.

○ Classical conditioning
。 Hebbian rule
。 Initially, the conditioned stimulus (CS) is assumed to be unable to elicit a response from any of the units to which it is connected. An unconditioned stimulus (UCS) can cause an unconditioned response (UCR).
If the CS is present while the UCS is causing the UCR, then the strength of the connection from the CS unit to the UCR unit will also be increased. Later, the CS will be able to cause a conditioned response (CR), even if the UCS is absent.
。 During the training period, the winner of the competition on the hidden layer turns on, providing a single CS to the output units. The UCS is supplied by the Y portion of the input layer. After training is complete, the appearance of the CS will cause the CR values to appear at the output units even though the UCS values are zero.
。 The CPN recognizes an input pattern through a winner-take-all competition. Once a winner has been declared, that unit becomes the CS for an outstar. The outstar associates some value or identity with the input pattern.

◎ The output values of the outstar
i. During the training phase:
$$\dot{y}_i = -a\,y_i + b\,y_i^{I} + c\,\text{net}_i$$
where $a, b, c > 0$, $y_i^{I}$ is the training input applied to output unit $i$, and $y_i$ is the output of outstar node $i$.
Because of the competition, only a single hidden unit $c$ (the winner) has a nonzero output at any given time, so the net input to any output unit $i$ reduces to a single term $w_{ic}\,z_c$, where $z_c = 1$ is the output of the winner; i.e., $\text{net}_i = z_c\,w_{ic} = w_{ic}$. Hence
$$\dot{y}_i = -a\,y_i + b\,y_i^{I} + c\,w_{ic}$$
ii. After training, the training input $y_i^{I}$ is absent:
$$\dot{y}_i = -a\,y_i + c\,w_{ic} \quad\text{----- (6.17)}$$

◎ During training, the weights evolve according to (similar to the instar)
$$\dot{w}_{ic} = (-d\,w_{ic} + e\,y_i^{I}\,z_c)\,U(z_c)$$
Only the winner ($z_c = 1$, $U(z_c) = 1$) learns:
$$\dot{w}_{ic} = -d\,w_{ic} + e\,y_i^{I} \quad\text{----- (6.19)}$$
At equilibrium, $\dot{w}_{ic} = 0$:
$$w_{ic}^{eq} = \frac{e}{d}\,y_i^{I} \quad\text{(identical training outputs for the members of a cluster)}$$
$$w_{ic}^{eq} = \frac{e}{d}\,\langle y_i^{I} \rangle \quad\text{(average training output over the members of a cluster)}$$
After training, substituting $w_{ic}^{eq}$ into (6.17) gives
$$\dot{y}_i = -a\,y_i + c\,\frac{e}{d}\,y_i^{I}$$
so at equilibrium
$$y_i^{eq} = \frac{c\,e}{a\,d}\,y_i^{I}$$
We want $y_i^{eq} = y_i^{I}$, so choose $a = c$ and $d = e$; then $y_i^{eq} = y_i^{I} = w_{ic}^{eq}$.
Discretizing (6.19) gives
$$w_{ic}(t+1) = w_{ic}(t) + \beta\,(e\,y_i^{I} - d\,w_{ic}(t)) = w_{ic}(t) + \beta\,(y_i^{I} - w_{ic}(t))$$
where the common factor $d = e$ has been absorbed into the learning rate $\beta$.

◎ Summary of training the CPN
Two learning algorithms:
(a) Competitive layer (input layer to hidden layer)
(b) Output layer (hidden layer to output layer)

(a) Competitive-layer training (a code sketch follows this list):
1. Select an input vector $\mathbf{I}$.
2. Normalize it: $\mathbf{x} = \mathbf{I}/\|\mathbf{I}\|$, so $\|\mathbf{x}\| = 1$.
3. Apply $\mathbf{x}$ to the competitive layer.
4. Determine the winner $c$.
5. Calculate $\alpha(\mathbf{x} - \mathbf{w}_c)$ for the winner.
6. Update the winner's weight vector: $\mathbf{w}_c(t+1) = \mathbf{w}_c(t) + \alpha(\mathbf{x} - \mathbf{w}_c)$.
7. Repeat steps 1-6 until all input vectors have been processed.
8. Repeat step 7 until all input vectors have been classified properly.
Hidden nodes that have never been used (never win) may be removed.
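A minimal sketch of competitive-layer training steps 1-8 above, written in Python/NumPy. The function names, the fixed learning rate, the epoch count, and the choice to initialize each hidden unit's weight vector to a (normalized) training vector are illustrative assumptions rather than part of the notes.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length (step 2)."""
    return v / np.linalg.norm(v)

def train_competitive_layer(inputs, n_hidden, alpha=0.3, epochs=50, seed=0):
    """Kohonen-style training of the CPN hidden (competitive) layer.

    inputs : array-like of shape (num_vectors, dim)
    Returns the hidden-layer weight matrix W with shape (n_hidden, dim).
    """
    rng = np.random.default_rng(seed)
    X = np.array([normalize(v) for v in inputs])
    # Initialize each hidden unit's weights to a distinct training vector.
    W = X[rng.choice(len(X), size=n_hidden, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            c = np.argmax(W @ x)        # step 4: winner has the largest net input
            W[c] += alpha * (x - W[c])  # steps 5-6: move the winner toward x
    return W

# Usage sketch: 40 vectors drawn around two directions, quantized by 2 hidden units.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal([1.0, 0.0], 0.1, size=(20, 2)),
                  rng.normal([0.0, 1.0], 0.1, size=(20, 2))])
print(np.round(train_competitive_layer(data, n_hidden=2), 3))
```

Initializing the weight vectors to training examples helps avoid hidden nodes that never win; any that remain unused can be removed, as noted above.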
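The equilibrium result $w_{ic}^{eq} = \frac{e}{d}\langle y_i^{I}\rangle$ can likewise be checked with the discretized rule $w_{ic}(t+1) = w_{ic}(t) + \beta(y_i^{I} - w_{ic}(t))$. The cluster of training outputs, the rate $\beta$, and the number of presentations below are illustrative values of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training outputs y^I presented whenever this outstar's hidden unit wins;
# the outstar weights should settle near their average.
Y = np.array([[0.9, 0.1, 0.4],
              [1.1, 0.0, 0.6],
              [1.0, 0.2, 0.5]])

beta = 0.1
w = np.zeros(3)                      # outstar weights toward the three output units
for _ in range(500):
    y = Y[rng.integers(len(Y))]      # random member of the cluster
    w += beta * (y - w)              # w(t+1) = w(t) + beta (y^I - w(t))

print("learned   :", np.round(w, 3))
print("average yI:", np.round(Y.mean(axis=0), 3))
```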
(b) Output-layer training:
1. Apply a normalized input vector $\mathbf{x}_k$ and its corresponding output vector $\mathbf{y}_k$ to the X and Y portions of the CPN input layer, respectively.
2. Determine the winner $c$.
3. Update the winner's associated weights:
$$w_{ic}(t+1) = w_{ic}(t) + \beta\,(y_i - w_{ic}(t)), \qquad i = 1, \dots, m$$
where $y_i$ is the $i$-th component of $\mathbf{y}_k$.
4. Repeat steps 1-3 until all vectors of all classes map to satisfactory outputs.

◎ Forward processing (production phase)
1. Normalize the input vector: $x_i = \dfrac{I_i}{\sqrt{\sum_n I_n^2}}$, so $\|\mathbf{x}\| = 1$.
2. Apply the input vector to the X portion of the input layer; apply the zero vector to the Y portion.
3. The X-portion values are distributed to the competitive layer.
4. The winning unit has an output of 1; all other units have outputs of 0.
5. The winner excites the corresponding outstar.
* The outputs of the outstar units are the values of the weights on the connections from the winning unit.
(A small end-to-end sketch of training plus forward processing is given at the end of this section.)

6.2.4. The Full CPN
The forward-mapping CPN implements $x \to y$; the full CPN also implements the reverse mapping $y \to x$.
During training, both $\mathbf{x}$ and $\mathbf{y}$ are applied to the input units.
After training, the input $(\mathbf{x}, \mathbf{0})$ results in an output $\mathbf{y}' = \Phi(\mathbf{x})$, and the input $(\mathbf{0}, \mathbf{y})$ results in an output $\mathbf{x}' = \Phi^{-1}(\mathbf{y})$.

。 Let $\mathbf{r}_i$ be the weight vector from the x portion of the input layer to hidden unit $i$, and $\mathbf{s}_i$ the weight vector from the y portion of the input layer to the same hidden unit. Then
$$\text{net}_i = \mathbf{r}_i \cdot \mathbf{x} + \mathbf{s}_i \cdot \mathbf{y}$$
The output of hidden unit $i$ is
$$z_i = \begin{cases} 1 & \text{if } \text{net}_i = \max_k \{\text{net}_k\} \\ 0 & \text{otherwise} \end{cases}$$
During training,
$$\Delta\mathbf{r}_i = \alpha_x(\mathbf{x} - \mathbf{r}_i), \qquad \Delta\mathbf{s}_i = \alpha_y(\mathbf{y} - \mathbf{s}_i)$$
Note that only the winner is allowed to learn for a given input vector.

。 For the output layer, the y units have weight vectors $\mathbf{w}_i$ and the x units have weight vectors $\mathbf{v}_i$. The learning laws are
$$\Delta w_{ij} = \beta_y(y_i - w_{ij}), \qquad \Delta v_{ij} = \beta_x(x_i - v_{ij})$$
Note that only the winner is allowed to learn.
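For the full CPN, a single training step and the two recall directions might look like the sketch below. It follows the $\mathbf{r}, \mathbf{s}, \mathbf{w}, \mathbf{v}$ notation above, but the array names, learning rates, and dimensions are illustrative assumptions, and row $c$ of each weight matrix holds the weights attached to hidden unit $c$.

```python
import numpy as np

def full_cpn_train_step(x, y, R, S, Wout, Vout,
                        alpha_x=0.1, alpha_y=0.1, beta_y=0.1, beta_x=0.1):
    """One training step of the full CPN; only the winner's weights change.

    R, S       : hidden-layer weights for the x and y input portions
    Wout, Vout : output-layer weights producing y' and x'
    """
    net = R @ x + S @ y                  # net_i = r_i . x + s_i . y
    c = np.argmax(net)                   # winner-take-all competition
    R[c] += alpha_x * (x - R[c])         # Delta r_c = alpha_x (x - r_c)
    S[c] += alpha_y * (y - S[c])         # Delta s_c = alpha_y (y - s_c)
    Wout[c] += beta_y * (y - Wout[c])    # Delta w    = beta_y (y - w)  (winner only)
    Vout[c] += beta_x * (x - Vout[c])    # Delta v    = beta_x (x - v)  (winner only)
    return c

def cpn_forward(x, R, Wout):
    """Recall y' = Phi(x): present (x, 0); the winner's w vector is the output."""
    return Wout[np.argmax(R @ x)]

def cpn_reverse(y, S, Vout):
    """Recall x' = Phi^{-1}(y): present (0, y); the winner's v vector is the output."""
    return Vout[np.argmax(S @ y)]

# Usage sketch: 5 hidden units, x in R^3, y in R^2, one training step then recall.
rng = np.random.default_rng(0)
R, S = rng.normal(size=(5, 3)), rng.normal(size=(5, 2))
Wout, Vout = np.zeros((5, 2)), np.zeros((5, 3))
x, y = rng.normal(size=3), rng.normal(size=2)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
full_cpn_train_step(x, y, R, S, Wout, Vout)
print(cpn_forward(x, R, Wout))
```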
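Putting the pieces together, here is a small end-to-end sketch of a forward-only CPN approximating $y = \cos(x)$ on $[0, \pi]$, in the spirit of the functional mapping $y = \Phi(x)$ mentioned at the start of the chapter. The scalar-to-unit-vector encoding, the number of hidden units, the learning rates, and the epoch counts are all illustrative assumptions; the result is the piecewise-constant approximation one expects from a CPN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data for the mapping y = cos(x) on [0, pi].
xs = rng.uniform(0.0, np.pi, 200)
ys = np.cos(xs)

def encode(x):
    """Encode a scalar as a 2-D unit vector so normalization keeps information."""
    v = np.array([x, 1.0])
    return v / np.linalg.norm(v)

X = np.array([encode(x) for x in xs])
n_hidden = 20

# (a) Competitive-layer training: the winner's weights move toward the input.
W = X[rng.choice(len(X), n_hidden, replace=False)].copy()
for _ in range(100):
    for i in rng.permutation(len(X)):
        c = np.argmax(W @ X[i])
        W[c] += 0.1 * (X[i] - W[c])

# (b) Output-layer (outstar) training: the winner's outstar weight moves toward y.
U = np.zeros(n_hidden)
for _ in range(100):
    for i in rng.permutation(len(X)):
        c = np.argmax(W @ X[i])
        U[c] += 0.1 * (ys[i] - U[c])

# Production phase: the output is the winning unit's outstar weight.
def cpn_forward(x):
    return U[np.argmax(W @ encode(x))]

for x in np.linspace(0.1, 3.0, 6):
    print(f"x = {x:.2f}   cos(x) = {np.cos(x):+.3f}   CPN = {cpn_forward(x):+.3f}")
```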