Sample questions and solutions

Hebbian Learning
Case 1: Simulating the AND-function x1 ^ x2 with a hard-limiter transfer function
Step 1: Develop the input pattern set (X1, X2) & the corresponding desired/target set T
x1   x2   T
 0    0   0
 0    1   0
 1    0   0
 1    1   1
Step 2: Train the network by presenting it with the input and output pairs in the same order as in the truth table.
The Hebbian learning rule is given as:
W = X'T
where
X = the 4x2 input pattern set (i.e. the two left-most columns above)
T = the desired/target column vector (the rightmost column above)
W = the 2x1 matrix of weights for inputs x1 and x2
Thus we have:
X = [ 0 0; 0 1; 1 0; 1 1] X’ = [0 0 1 1; 0 1 0 1] and T = [ 0; 0; 0; 1]
W = [0 0 1 1; 0 1 0 1] * [0; 0; 0; 1] = [1; 1]
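This weight computation can be checked with a short NumPy sketch (the array names below are illustrative, not part of the original solution):

```python
import numpy as np

# Input patterns for the AND function (one row per pattern) and the target column T
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
T = np.array([[0], [0], [0], [1]])

# Hebbian learning rule: W = X'T
W = X.T @ T
print(W.ravel())   # -> [1 1]
```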
Step 3: Calculate the actual outputs of the network based on the discovered W.
Using the formula V = Σ w_i x_i = W'X':
V = [1 1] * [0 0 1 1; 0 1 0 1] = [ 0 1 1 2]
Our table now looks like:
x1   x2   V   T
 0    0   0   0
 0    1   1   0
 1    0   1   0
 1    1   2   1
Step 4: Apply the threshold θ using the hard-limiter transfer function, Y = 1 if Σ w_i x_i - θ > 0 and Y = 0 otherwise:
x1   x2   V   Y (θ = 1.5)   T
 0    0   0   0             0
 0    1   1   0             0
 1    0   1   0             0
 1    1   2   1             1
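Steps 3 and 4 can be verified in the same way (a minimal sketch; the hard-limiter is written here as a simple comparison against θ):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W = np.array([[1], [1]])              # weights found in Step 2

V = (X @ W).ravel()                   # net values -> [0 1 1 2]
theta = 1.5
Y = (V - theta > 0).astype(int)       # hard-limiter -> [0 0 0 1], matching T
print(V, Y)
```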
Case 2: The necessity of bipolar values
Imagine we are to simulate the binary function x1 ^ ¬x2 (x1 AND NOT x2). Using the same technique as in
Case 1 with θ = 0.5, we run through all four steps as above to get W = [1; 0] and:
x1   x2   V   Y (θ = 0.5)    T
 0    0   0   0              0
 0    1   0   0              0
 1    0   1   1              1
 1    1   1   1 (ERROR!)     0
There is a reason for this error: with binary values {0, 1} for inputs and targets, the Hebbian rule can only INCREASE the values of the weights. A more general approach would also allow the weights to decrease: a weight should be inhibited (decreased) when its associated input is active but the corresponding output is not, or vice versa.
This can be achieved by using BIPOLAR values for the inputs and targets instead: simply replace every zero in the input and target patterns by -1.
We therefore start off with the following truth table:
x1   x2   T
-1   -1   -1
-1   +1   -1
+1   -1   +1
+1   +1   -1
Again, going through all four steps as in Case 1, we get
W = X'T = [-1 -1 +1 +1; -1 +1 -1 +1] * [-1; -1; +1; -1] = [2; -2]
x1   x2   V    Y (0 < θ < 4)   Y2 (bipolar)   T
-1   -1    0   0               -1             -1
-1   +1   -4   0               -1             -1
+1   -1    4   1               +1             +1
+1   +1    0   0               -1             -1
We see that the network is now correctly trained.
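The bipolar case can be checked with the same kind of sketch (θ = 2.0 below is just one admissible choice in the range 0 < θ < 4):

```python
import numpy as np

# Bipolar patterns and targets for x1 ^ ¬x2
X = np.array([[-1, -1],
              [-1,  1],
              [ 1, -1],
              [ 1,  1]])
T = np.array([[-1], [-1], [1], [-1]])

W = X.T @ T                                  # Hebbian rule -> [[2], [-2]]
V = (X @ W).ravel()                          # -> [0 -4 4 0]
theta = 2.0                                  # any 0 < theta < 4 works
Y = np.where(V - theta > 0, 1, -1)           # bipolar output -> [-1 -1 1 -1], matches T
print(W.ravel(), V, Y)
```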
Back-propagation
Case 1: Three-layer perceptron for the exclusive-OR problem
Our truth table is as follows (X0(0) is a constant bias input, fixed at 1):
X0(0)   X0(1)   X0(2)   T
  1       0       0     0
  1       0       1     1
  1       1       0     1
  1       1       1     0
We construct our network as follows: the input layer passes X1(0) = X0(0), X1(1) = X0(1) and X1(2) = X0(2) to two hidden units X2(1) and X2(2) through the weights w2(i,j); the hidden outputs, together with the bias X2(0), feed the single output unit X3(1) through the weights w3(i,1).
Suppose we start with an initial weight set:
w2(0,1) = 0.862518     w2(0,2) = 0.834986     w3(0,1) = 0.036498
w2(1,1) = -0.155797    w2(1,2) = -0.505997    w3(1,1) = -0.430437
w2(2,1) = 0.282885     w2(2,2) = 0.864449     w3(2,1) = 0.481210
And calculate the value for X3(1) by presenting the first input pattern, viz:
(X0(0),X0(1), X0(2)) = (X1(0), X1(1), X1(2)) = (1 0 0 ):
Hidden layer, neuron 1:
  inputs X1(0) = 1, X1(1) = 0, X1(2) = 0
  Net = 0.862518, so X2(1) = 0.7031864

Hidden layer, neuron 2:
  inputs X1(0) = 1, X1(1) = 0, X1(2) = 0
  Net = 0.834986, so X2(2) = 0.6974081

Output layer, neuron 1:
  inputs X2(0) = 1, X2(1) = 0.7031864, X2(2) = 0.6974081
  Net = 0.0694203, so X3(1) = 0.5173481
We see that we get an output value of 0.5173481, using the sigmoid activation function f(net) = 1 / (1 + e^-net).
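This forward pass can be reproduced with a few lines of NumPy (the matrix layout below is an assumption made for illustration: rows of W2 correspond to the bias and the two inputs, columns to the two hidden units):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

W2 = np.array([[ 0.862518,  0.834986],      # bias row
               [-0.155797, -0.505997],      # x0(1) row
               [ 0.282885,  0.864449]])     # x0(2) row
W3 = np.array([0.036498, -0.430437, 0.481210])   # (bias, hidden 1, hidden 2)

x1 = np.array([1.0, 0.0, 0.0])                   # first pattern, bias included
x2 = sigmoid(x1 @ W2)                            # -> [0.7031864, 0.6974081]
x3 = sigmoid(np.concatenate(([1.0], x2)) @ W3)   # -> 0.5173481
print(x2, x3)
```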
We now move back, calculating the value of δ for each of the layers, starting with the output layer (the desired output for this first pattern is d = 0):
δ3(1) = x3(1) * (1 – x3(1)) * (d – x3(1)) = -0.1291812
δ2(1) = x2(1) * (1 – x2(1)) * w3(1,1) * δ3(1) = 0.0116054
δ2(2) = x2(2) * (1 – x2(2)) * w3(2,1) * δ3(1) = -0.0131183
So the changes to the weights when η = 0.5 are:
Δw2(0,1) = η* x1(0) * δ2(1) = 0.5 * 1 * 0.0116054 = 0.0058027
Δw2(1,1) = η* x1(1) * δ2(1) = 0.5 * 0 * 0.0116054 = 0
Δw2(2,1) = η* x1(2) * δ2(1) = 0.5 * 0 * 0.0116054 = 0
Δw2(0,2) = η* x1(0) * δ2(2) = 0.5 * 1 * -0.0131183 = -0.0065592
Δw2(1,2) = η* x1(1) * δ2(2) = 0.5 * 0 * -0.0131183 = 0
Δw2(2,2) = η* x1(2) * δ2(2) = 0.5 * 0 * -0.0131183 = 0
Δw3(0,1) = η* x2(0) * δ3(1) = 0.5 * 1 * -0.1291812 = -0.0645906
Δw3(1,1) = η* x2(1) * δ3(1) = 0.5 * 0.7031864 * -0.1291812 = -0.0454192
Δw3(2,1) = η* x2(2) * δ3(1) = 0.5 * 0.6974081 * -0.1291812 = -0.045046
The new values for the weights are now:
w2(0,1) = 0.868321
w2(0,2) = 0.828427
w3(0,1) = -0.028093
w2(1,1) = -0.155797
w2(1,2) = -0.505997
w3(1,1) = -0.475856
w2(2,1) = 0.282885
w2(2,2) = 0.864449
w3(2,1) = 0.436164
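The deltas and weight changes for this first pattern can be checked with a continuation of the same sketch (the setup is repeated so the snippet runs on its own; d = 0 and η = 0.5 as above):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

W2 = np.array([[ 0.862518,  0.834986],
               [-0.155797, -0.505997],
               [ 0.282885,  0.864449]])
W3 = np.array([0.036498, -0.430437, 0.481210])

x1 = np.array([1.0, 0.0, 0.0])
d, eta = 0.0, 0.5

# Forward pass
x2 = sigmoid(x1 @ W2)
x2b = np.concatenate(([1.0], x2))            # hidden outputs with bias term
x3 = sigmoid(x2b @ W3)

# Backward pass
delta3 = x3 * (1 - x3) * (d - x3)            # -> -0.1291812
delta2 = x2 * (1 - x2) * W3[1:] * delta3     # -> [0.0116054, -0.0131183]

# Weight updates
W3 += eta * delta3 * x2b
W2 += eta * np.outer(x1, delta2)
print(W3)   # ~ [-0.028093, -0.475856, 0.436164]
print(W2)   # w2(0,1) ~ 0.868321, w2(0,2) ~ 0.828427, other rows unchanged
```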
The network is then presented with the next input pattern and the whole process of calculating the
weight adjustment is repeated.
This continues until the error between the actual and desired output is smaller than some specified
value, at which point the training stops.
After several thousand iterations, the weights are:
w2(0,1) = -6.062263
w2(0,2) = -4.893081
w3(0,1) = -9.792470
w2(1,1) = -6.072185
w2(1,2) = -4.894898
w3(1,1) = 9.484580
w2(2,1) = 2.454509
w2(2,2) = 7.293063
w3(2,1) = -4.473972
With these values the output looks like:
X0(0)   X0(1)   X0(2)   X3(1)
  1       0       0     0.017622
  1       0       1     0.981504
  1       1       0     0.981491
  1       1       1     0.027782
This shows that back-propagation can find a set of weights for the exclusive-OR function,
provided that the architecture of the network is suitable.
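The whole training procedure can be sketched as a small script. The random initial weights, learning rate and stopping rule below are illustrative choices, not the exact ones used above, and with only two hidden units backpropagation on XOR can occasionally get stuck, so the number of epochs needed varies with the initialisation:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# XOR truth table with a constant bias input in the first column
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(0)
W2 = rng.uniform(-1, 1, size=(3, 2))    # input (incl. bias) -> 2 hidden units
W3 = rng.uniform(-1, 1, size=3)         # hidden (incl. bias) -> output unit
eta = 0.5

for epoch in range(20000):
    max_err = 0.0
    for x1, d in zip(X, T):
        x2 = sigmoid(x1 @ W2)                    # forward pass
        x2b = np.concatenate(([1.0], x2))
        x3 = sigmoid(x2b @ W3)
        delta3 = x3 * (1 - x3) * (d - x3)        # backward pass
        delta2 = x2 * (1 - x2) * W3[1:] * delta3
        W3 += eta * delta3 * x2b                 # weight updates
        W2 += eta * np.outer(x1, delta2)
        max_err = max(max_err, abs(d - x3))
    if max_err < 0.05:                           # stop once every pattern is close to its target
        break

for x, t in zip(X, T):
    print(t, sigmoid(np.concatenate(([1.0], sigmoid(x @ W2))) @ W3))
```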
Case 2:
A multi-layered perceptron has two input units, two hidden units and one output unit.
What are the formulae for updating the weights in each of the units?
Ans. For the unit in the output layer:
Δw3(0,1) = η δ3(1) x2(0)
Δw3(1,1) = η δ3(1) x2(1)
Δw3(2,1) = η δ3(1) x2(2)
where δ3(1) = x3(1)[1 – x3(1)][ d1 – x3(1)].
For the first unit in the hidden layer:
Δw2(0,1) = η δ2(1) x1(0)
Δw2(1,1) = η δ2(1) x1(1)
Δw2(2,1) = η δ2(1) x1(2)
where δ2(1) = x2(1)[1 – x2(1)] * w3(1,1) * δ3(1)
For the second unit in the hidden layer:
Δw2(0,2) = η δ2(2) x1(0)
Δw2(1,2) = η δ2(2) x1(1)
Δw2(2,2) = η δ2(2) x1(2)
where δ2(2) = x2(2) *[1 – x2(2)] * w3(2,1) * δ3(1).
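All three groups of formulae share the same structure, Δw(i,j) = η δ(j) x(i); a sketch of them as small helper functions (illustrative names):

```python
def delta_output(x3_1, d1):
    # delta for the output unit: x3(1)[1 - x3(1)][d1 - x3(1)]
    return x3_1 * (1 - x3_1) * (d1 - x3_1)

def delta_hidden(x2_j, w3_j1, delta3_1):
    # delta for hidden unit j: x2(j)[1 - x2(j)] w3(j,1) delta3(1)
    return x2_j * (1 - x2_j) * w3_j1 * delta3_1

def weight_change(eta, delta_j, x_i):
    # generic update: delta_w(i,j) = eta * delta(j) * x(i)
    return eta * delta_j * x_i
```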
Case 3:
a. What is the value of the output of the network described in case 2 if the weights are given
by the following table and both inputs are +1.0:
W2(0,1)   W2(1,1)   W2(2,1)   W2(0,2)   W2(1,2)   W2(2,2)   W3(0,1)   W3(1,1)   W3(2,1)
  1.7       2.6       0.2      -0.1       0.7       1.5       0.5       1.2      -0.3
Ans.
To calculate the output of the network, start at the hidden layer and work forward. For the first
neuron in the hidden layer (using the sigmoid activation function):
Net(actual) = 4.5
x2(1) = 0.989
For the second neuron in the hidden layer:
Net(actual) = 2.1
x2(2) = 0.891
For the single neuron in the output layer:
Net(actual) = 1.420
x3(1) = 0.805
(b). What are the values of δ for the same network with these inputs if the desired output is
+1.0?
Ans.
To calculate the values of δ, start at the output layer and work backwards.
δ3(1) = x3(1) * (1 – x3(1) ) * ( d – x3(1) ) = 0.805 * (1 – 0.805) * ( 1 – 0.805 ) = 0.031
At the hidden layer, the first value is:
δ2(1) = x2(1) * (1 – x2(1) ) * w3(1,1) * δ3(1) = 0.989 * (1 – 0.989) * ( 1.2 ) * (0.031) = 0.0004
And the second is:
δ2(2) = x2(2) * (1 – x2(2) ) * w3(2,1) * δ3(1) = 0.891 * (1 – 0.891) * (-0.3) * (0.031) = - 0.0009
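These numbers can be verified with a short script (again, the array layout is an assumption for illustration):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Weights from the table; rows of W2 are (bias, input 1, input 2), columns are hidden units
W2 = np.array([[ 1.7, -0.1],
               [ 2.6,  0.7],
               [ 0.2,  1.5]])
W3 = np.array([0.5, 1.2, -0.3])          # (bias, hidden 1, hidden 2)

x1 = np.array([1.0, 1.0, 1.0])           # bias plus both inputs at +1.0
x2 = sigmoid(x1 @ W2)                    # nets 4.5 and 2.1 -> [0.989, 0.891]
x3 = sigmoid(np.concatenate(([1.0], x2)) @ W3)   # net ~1.42 -> ~0.805

d = 1.0
delta3 = x3 * (1 - x3) * (d - x3)                # ~ 0.031
delta2 = x2 * (1 - x2) * W3[1:] * delta3         # ~ [0.0004, -0.0009]
print(x3, delta3, delta2)
```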