Sample questions and solutions

Hebbian Learning

Case 1: Simulating the AND function x1 ^ x2 with a hard-limiter transfer function

Step 1: Develop the input pattern set (x1, x2) and the corresponding desired/target set T:

x1   x2   T
 0    0   0
 0    1   0
 1    0   0
 1    1   1

Step 2: Train the network by presenting it with the input and output pairs in the same order as in the truth table. The Hebbian learning rule is given as:

W = X'T

where
X = the 4x2 input pattern set (the two left-most columns above),
T = the desired output column vector (the rightmost column above),
W = the 2x1 matrix of weights for inputs x1 and x2.

Thus we have:

X  = [0 0; 0 1; 1 0; 1 1]
X' = [0 0 1 1; 0 1 0 1]
T  = [0; 0; 0; 1]

W = [0 0 1 1; 0 1 0 1] * [0; 0; 0; 1] = [1; 1]

Step 3: Calculate the actual outputs of the network based on the discovered W, using the formula V = Σ wi xi = W'X':

V = [1 1] * [0 0 1 1; 0 1 0 1] = [0 1 1 2]

Our table now looks like:

x1   x2   V   T
 0    0   0   0
 0    1   1   0
 1    0   1   0
 1    1   2   1

Step 4: Apply the threshold θ as Y = sign(Σ wi xi - θ), using the hard-limiter transfer function:

x1   x2   V   Y (θ = 1.5)   T
 0    0   0   0             0
 0    1   1   0             0
 1    0   1   0             0
 1    1   2   1             1

Case 2: The necessity of bipolar values

Imagine we are to simulate the binary function x1 ^ ¬x2. Using the same technique as in Case 1 with θ = 0.5, we run through all four steps as above to get W = [1; 0] and:

x1   x2   V   Y (θ = 0.5)   T
 0    0   0   0             0
 0    1   0   0             0
 1    0   1   1             1
 1    1   1   1 (ERROR!)    0

There is a reason for this error: using binary values {0, 1} for inputs and targets, we only allow INCREASES in the values of the weights. A more general approach would also allow the values of the weights to decrease: we need to inhibit (decrease the value of) a weight if its associated input is active while the corresponding output is not, or vice versa. This can be achieved by using BIPOLAR values for inputs and targets instead. Simply replace every zero in the input and target patterns by -1. We therefore start off with the following truth table:

x1   x2   T
-1   -1   -1
-1   +1   -1
+1   -1   +1
+1   +1   -1

Again going through all four steps of Case 1, we get

W = X'T = [-1 -1 +1 +1; -1 +1 -1 +1] * [-1; -1; +1; -1] = [2; -2]

x1   x2    V   Y (0 < θ < 4)   Y2 (bipolar)   T
-1   -1    0   0               -1             -1
-1   +1   -4   0               -1             -1
+1   -1   +4   1               +1             +1
+1   +1    0   0               -1             -1

We see that the network is now correctly trained.
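As a quick check, the whole calculation in Case 2 can be reproduced in a few lines of numpy. This is only a sketch of the working above; the variable names and the choice of θ = 2 (any value strictly between 0 and 4 would do) are illustrative, not part of the original question.

import numpy as np

# Bipolar truth table for x1 AND (NOT x2), as in Case 2
X = np.array([[-1, -1],
              [-1, +1],
              [+1, -1],
              [+1, +1]])                # 4x2 input pattern set
T = np.array([[-1], [-1], [+1], [-1]])  # 4x1 target column

W = X.T @ T          # Hebbian rule W = X'T      -> [[2], [-2]]
V = X @ W            # net input for each pattern -> [[0], [-4], [+4], [0]]

theta = 2.0
Y = np.where(V - theta >= 0, +1, -1)    # hard-limiter, bipolar output

print(W.ravel())     # [ 2 -2]
print(V.ravel())     # [ 0 -4  4  0]
print(Y.ravel())     # [-1 -1  1 -1], which matches T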
Back-propagation

Case 1: Three-layer perceptron for the exclusive-OR problem

Our truth table is:

X0(0)   X0(1)   X0(2)   T
  1       0       0     0
  1       0       1     1
  1       1       0     1
  1       1       1     0

The network has an input layer with units X1(0), X1(1), X1(2), a hidden layer with units X2(1) and X2(2) plus a bias unit X2(0), and a single output unit X3(1). The input and hidden layers are connected by weights W2(i,j), the hidden and output layers by weights W3(i,1). The input units simply pass the input pattern through, so (X1(0), X1(1), X1(2)) = (X0(0), X0(1), X0(2)), with X0(0) = 1 acting as the bias input.

Suppose we start with the initial weight set:

w2(0,1) =  0.862518    w2(0,2) =  0.834986    w3(0,1) =  0.036498
w2(1,1) = -0.155797    w2(1,2) = -0.505997    w3(1,1) = -0.430437
w2(2,1) =  0.282885    w2(2,2) =  0.864449    w3(2,1) =  0.481210

and calculate the value of X3(1) by presenting the first input pattern, viz. (X0(0), X0(1), X0(2)) = (X1(0), X1(1), X1(2)) = (1, 0, 0):

Hidden layer, neuron 1:  x1(0) = 1, x1(1) = 0, x1(2) = 0;  Net = 0.862518;  x2(1) = 0.7031864
Hidden layer, neuron 2:  x1(0) = 1, x1(1) = 0, x1(2) = 0;  Net = 0.834986;  x2(2) = 0.6974081
Output layer, neuron 1:  x2(0) = 1, x2(1) = 0.7031864, x2(2) = 0.6974081;  Net = 0.0694203;  x3(1) = 0.5173481

We see that we get an output value of 0.5173481, using the sigmoid activation function.

We now move back, calculating the value of δ for each layer, starting with the output layer (the desired output for this pattern is d = 0):

δ3(1) = x3(1) * (1 - x3(1)) * (d - x3(1)) = -0.1291812
δ2(1) = x2(1) * (1 - x2(1)) * w3(1,1) * δ3(1) = 0.0116054
δ2(2) = x2(2) * (1 - x2(2)) * w3(2,1) * δ3(1) = -0.0131183

So the changes to the weights when η = 0.5 are:

Δw2(0,1) = η * x1(0) * δ2(1) = 0.5 * 1 * 0.0116054 = 0.0058027
Δw2(1,1) = η * x1(1) * δ2(1) = 0.5 * 0 * 0.0116054 = 0
Δw2(2,1) = η * x1(2) * δ2(1) = 0.5 * 0 * 0.0116054 = 0
Δw2(0,2) = η * x1(0) * δ2(2) = 0.5 * 1 * -0.0131183 = -0.0065592
Δw2(1,2) = η * x1(1) * δ2(2) = 0.5 * 0 * -0.0131183 = 0
Δw2(2,2) = η * x1(2) * δ2(2) = 0.5 * 0 * -0.0131183 = 0
Δw3(0,1) = η * x2(0) * δ3(1) = 0.5 * 1 * -0.1291812 = -0.0645906
Δw3(1,1) = η * x2(1) * δ3(1) = 0.5 * 0.7031864 * -0.1291812 = -0.0454192
Δw3(2,1) = η * x2(2) * δ3(1) = 0.5 * 0.6974081 * -0.1291812 = -0.0450459

The new values for the weights are now:

w2(0,1) =  0.868321    w2(0,2) =  0.828427    w3(0,1) = -0.028093
w2(1,1) = -0.155797    w2(1,2) = -0.505997    w3(1,1) = -0.475856
w2(2,1) =  0.282885    w2(2,2) =  0.864449    w3(2,1) =  0.436164

The network is then presented with the next input pattern and the whole process of calculating the weight adjustments is repeated. This continues until the error between the actual and desired outputs is smaller than some specified value, at which point training stops. After several thousand iterations, the weights are:

w2(0,1) = -6.062263    w2(0,2) = -4.893081    w3(0,1) = -9.792470
w2(1,1) = -6.072185    w2(1,2) = -4.894898    w3(1,1) =  9.484580
w2(2,1) =  2.454509    w2(2,2) =  7.293063    w3(2,1) = -4.473972

With these values the output looks like:

X0(0)   X0(1)   X0(2)   X3(1)
  1       0       0     0.017622
  1       0       1     0.981504
  1       1       0     0.981491
  1       1       1     0.027782

This shows that back-propagation can find a set of weights for the exclusive-OR function, provided that the architecture of the network is suitable.

Case 2: A multi-layered perceptron has two input units, two hidden units and one output unit. What are the formulae for updating the weights in each of the units?

Ans. For the unit in the output layer:

Δw3(0,1) = η δ3(1) x2(0)
Δw3(1,1) = η δ3(1) x2(1)
Δw3(2,1) = η δ3(1) x2(2)

where δ3(1) = x3(1) [1 - x3(1)] [d1 - x3(1)].

For the first unit in the hidden layer:

Δw2(0,1) = η δ2(1) x1(0)
Δw2(1,1) = η δ2(1) x1(1)
Δw2(2,1) = η δ2(1) x1(2)

where δ2(1) = x2(1) [1 - x2(1)] * w3(1,1) * δ3(1).

For the second unit in the hidden layer:

Δw2(0,2) = η δ2(2) x1(0)
Δw2(1,2) = η δ2(2) x1(1)
Δw2(2,2) = η δ2(2) x1(2)

where δ2(2) = x2(2) [1 - x2(2)] * w3(2,1) * δ3(1).

Case 3:

a. What is the value of the output of the network described in Case 2 if the weights are given by the following table and both inputs are +1.0?

w2(0,1) =  1.7    w2(0,2) = -0.1    w3(0,1) =  0.5
w2(1,1) =  2.6    w2(1,2) =  0.7    w3(1,1) =  1.2
w2(2,1) =  0.2    w2(2,2) =  1.5    w3(2,1) = -0.3

Ans. To calculate the output of the network, start at the hidden layer and work forward, with x1(0) = x1(1) = x1(2) = 1 (the bias plus the two inputs). For the first neuron in the hidden layer (using sigmoid activation functions):

Net = 1.7 + 2.6 + 0.2 = 4.5,  so  x2(1) = 0.989

For the second neuron in the hidden layer:

Net = -0.1 + 0.7 + 1.5 = 2.1,  so  x2(2) = 0.891

For the single neuron in the output layer:

Net = 0.5 + 1.2 * 0.989 + (-0.3) * 0.891 ≈ 1.420,  so  x3(1) = 0.805
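The forward pass in part (a) can be checked with a short numpy sketch. The weight matrices below simply hold the table above with the bias weight in row 0; the variable names and the sigmoid helper are illustrative, not part of the original question.

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Weights from the table above; row 0 holds the bias weights.
# Column j of W2 feeds hidden unit j+1; W3 feeds the single output unit.
W2 = np.array([[ 1.7, -0.1],    # w2(0,1), w2(0,2)
               [ 2.6,  0.7],    # w2(1,1), w2(1,2)
               [ 0.2,  1.5]])   # w2(2,1), w2(2,2)
W3 = np.array([ 0.5,  1.2, -0.3])   # w3(0,1), w3(1,1), w3(2,1)

x1 = np.array([1.0, 1.0, 1.0])       # bias x1(0) = 1 plus both inputs = +1.0

hidden_net = x1 @ W2                 # [4.5, 2.1]
x2 = sigmoid(hidden_net)             # [0.989, 0.891]

output_net = np.concatenate(([1.0], x2)) @ W3   # ~1.42 (bias x2(0) = 1)
x3 = sigmoid(output_net)             # ~0.805

print(hidden_net, x2, output_net, x3)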
b. What are the values of δ for the same network with these inputs if the desired output is +1.0?

Ans. To calculate the values of δ, start at the output layer and work backwards:

δ3(1) = x3(1) * (1 - x3(1)) * (d - x3(1)) = 0.805 * (1 - 0.805) * (1 - 0.805) = 0.031

At the hidden layer, the first value is:

δ2(1) = x2(1) * (1 - x2(1)) * w3(1,1) * δ3(1) = 0.989 * (1 - 0.989) * 1.2 * 0.031 = 0.0004

and the second is:

δ2(2) = x2(2) * (1 - x2(2)) * w3(2,1) * δ3(1) = 0.891 * (1 - 0.891) * (-0.3) * 0.031 = -0.0009
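These δ values, and the weight changes they would produce via the Case 2 formulae, can also be reproduced with numpy. This is a sketch only: the learning rate η = 0.5 is borrowed from Case 1 (part (b) itself does not ask for the weight changes), and the variable names are illustrative.

import numpy as np

# Activations from part (a) and the desired output
x1 = np.array([1.0, 1.0, 1.0])       # bias plus the two inputs
x2 = np.array([1.0, 0.989, 0.891])   # bias x2(0) = 1 plus the two hidden outputs
x3 = 0.805                           # network output
d  = 1.0                             # desired output
w3 = np.array([0.5, 1.2, -0.3])      # w3(0,1), w3(1,1), w3(2,1)
eta = 0.5                            # learning rate, taken from Case 1

# Output-layer delta: d3(1) = x3(1) (1 - x3(1)) (d - x3(1))
delta3 = x3 * (1.0 - x3) * (d - x3)                   # ~0.031

# Hidden-layer deltas: d2(j) = x2(j) (1 - x2(j)) w3(j,1) d3(1)
delta2 = x2[1:] * (1.0 - x2[1:]) * w3[1:] * delta3    # ~[0.0004, -0.0009]

# Weight changes from the Case 2 formulae
dW3 = eta * delta3 * x2              # changes for w3(0,1), w3(1,1), w3(2,1)
dW2 = eta * np.outer(x1, delta2)     # dW2[i, j] = eta * x1(i) * delta2 of hidden unit j+1

print(delta3, delta2)
print(dW3)
print(dW2)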