FYTN06
HT 2015
Exercise II: Self-Organizing and Feedback Networks
Supervisor: Mattias Ohlsson
(mattias@thep.lu.se, 046-222 77 82)
Deadline: Jan 5, 2015
Abstract
In this exercise we will study self-organizing maps and feedback networks, covering the
following topics:
• Competitive networks for clustering
• Kohonen’s self-organizing feature map
• Learning Vector Quantization (LVQ)
• Elman networks for time series prediction
• Hopfield networks as an error-correcting associative memory.
The problems we will look at are both artificial and real world problems. The
environment you will work in is Matlab, specifically the Neural Network Toolbox
(as in the previous exercise).
Sections 1 and 2 contain a summary of the different network types we are going to
look at. Section 3 gives a short description of the data sets we will use, and
Section 4 contains the actual exercises. The last section (5) contains a demo for the
graph bisection problem and the traveling salesman problem.
1 Self-organizing networks

1.1 Competitive learning
A competitive network can be used to cluster data, i.e. to replace a set of input vectors by a
smaller set of reference vectors. Figure 1 shows a competitive network with N inputs and K outputs.
Figure 1: A competitive network.
A competitive network defines a winning node j∗ for each input x. Matlab’s implementation is

    j∗ = arg maxj ( −||x − wj|| )                                            (1)

where the maximum is taken over the output nodes j.
The “activation” function for the output nodes is

    yj = 1 if j is the winning node, and yj = 0 otherwise.
The winning weight vector is updated according to the following equation:
    wj∗(t + 1) = wj∗(t) + η (x(n) − wj∗(t))
In order to avoid “dead” neurons, i.e. neurons that never win and hence never get updated,
Matlab implements something called the bias learning rule. A positive bias is added to the
negative distance (see Eqn. 1), making a distant neuron more likely to win. These biases are
updated such that the biases of frequently active neurons become smaller, and the biases of
infrequently active neurons become larger. Eventually a dead neuron will win and move towards
the input vectors. There is a specific learning rate associated with updating these biases, and
it should be much smaller than the ordinary learning rate.
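For illustration, here is a minimal sketch of training such a competitive network with the toolbox
function competlayer on the synthetic cluster data of Section 3.1 (the parameter values are
illustrative and not necessarily those used in the provided scripts):

[P,T] = loadclust1(200);           % synthetic 2-dimensional cluster data (Section 3.1)
net = competlayer(6, 0.1, 0.001);  % competlayer(numClasses, kohonenLR, conscienceLR)
net.trainParam.epochs = 200;       % number of training epochs
net = train(net, P);
y = net(P);                        % one-hot output: 1 for the winning neuron
clusterIdx = vec2ind(y);           % index of the winning neuron for each data point

The third argument to competlayer is the bias (“conscience”) learning rate discussed above, set
much smaller than the Kohonen learning rate.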
1.2 Kohonen’s self-organizing feature map (SOFM)
Figure 2 shows a SOFM where a 2-dimensional input is mapped onto a 2-dimensional discrete
output.
The weights are updated according to the following equation:
    wj(t + 1) = wj(t) + η Λjj∗ (x(n) − wj(t))

where j∗ is the winning node for input x(n). The neighborhood function Λjj∗ that is implemented
in Matlab’s toolbox is defined as
    Λjj∗ = 1 if djj∗ ≤ d0, and Λjj∗ = 0 if djj∗ > d0                         (2)
Figure 2: A self-organizing feature map with 2 inputs and a 6x10 square grid output.
where djj∗ is the distance between neurons j and j∗ on the grid; hence all neurons within a specific
distance d0 are updated. Matlab has a few different distance measures (dist, linkdist, boxdist and
mandist) and we will be using the default linkdist distance function, which measures the number of
links between two nodes.
The default learning mode in Matlab for the SOFM is batch mode (learnsomb). It is divided into two
phases, an ordering phase and a tuning phase. During ordering, which runs for a given number of
epochs and starts with an initial neighborhood size, the neurons are expected to order themselves
in input space with the same topology in which they are ordered physically. In the tuning phase
smaller weight corrections are expected and the neighborhood size is decreased.
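As a minimal sketch (not the provided syn sofm script), a 6x10 SOFM like the one in Figure 2 can be
created and trained as follows; the square-grid topology and the training lengths are illustrative
assumptions:

net = selforgmap([6 10], 100, 3, 'gridtop', 'linkdist');
% selforgmap(dimensions, coverSteps, initNeighbor, topologyFcn, distanceFcn):
% 100 ordering-phase steps, initial neighborhood size 3, square grid and the
% linkdist distance function described above
net.trainParam.epochs = 200;
net = train(net, P);               % P holds the inputs, stored column-wise
winners = vec2ind(net(P));         % grid node index of the winner for each input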
1.3 Learning vector quantization
The previous methods are unsupervised, meaning that the user does not have any preconceived
idea of which data points belong to which cluster. If the user supplies a target class for each data
point and the network uses this information during training, we have supervised learning. Kohonen
(1989) suggested a supervised version of the clustering algorithm for competitive networks called
learning vector quantization (LVQ)1. There are usually more output neurons than classes, which
means that several output neurons can belong to the same class. The update rule for the LVQ
weights is therefore
    wj∗ → wj∗ + η (x(n) − wj∗)   if j∗ and x(n) belong to the same class
    wj∗ → wj∗ − η (x(n) − wj∗)   if j∗ and x(n) belong to different classes            (3)

where j∗ is the winning node for input x(n).
Note! In Matlab’s toolbox an LVQ network is implemented using 2 layers of neurons: first a
competitive layer that does the clustering and then a linear layer for the supervised learning task
(e.g. classification). From a practical point of view this does not change the functionality of the
LVQ network, and we can assume it consists of only one layer of neurons.
1 Clustering is also called vector quantization.
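As an illustration (not the provided syn lvq script), an LVQ network can be set up with the toolbox
function lvqnet. Here P is an input matrix as in Section 3 and Tc is assumed to be a 1xN vector of
class labels, converted to the one-hot form the toolbox expects:

T = ind2vec(Tc);               % one-hot targets from class labels 1, 2, ...
net = lvqnet(10, 0.01);        % lvqnet(hiddenSize, lvqLR): 10 competitive neurons
net.trainParam.epochs = 100;
net = train(net, P, T);
predicted = vec2ind(net(P));   % predicted class index for each input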
2 Feedback networks
Matlab implements a few different architectures suitable for time series analysis. One can, for
example, choose from time delay networks2, FIR networks3 and Elman-type networks4. We
will also use Hopfield models as associative memories.
2.1 Elman networks (Matlab version)
Elman networks have feedback from the hidden layer to a new set of inputs called context units.
The feedback connection is fixed, but it can have several time-delays associated with it, which means
that the outputs of the context units are time-delayed copies of the hidden node values. Figure 3
shows an Elman network with one hidden layer and two time-lags on the feedback.
Figure 3: An Elman network with one hidden layer and two time-lags.
Note 1 Training an Elman network for the sunspot time series only requires one input and one
output node, since the network itself can find out (using the context units) suitable dependencies
(time lags).
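As a sketch of how such a network is set up with the toolbox (this is not the provided sunspot lrn
script; X and T are assumed to be 1xN cell arrays holding the sunspot value at time t and the target
value at time t+1, respectively):

net = layrecnet(1:2, 5);                  % feedback delays 1 and 2, 5 hidden neurons
[Xs, Xi, Ai, Ts] = preparets(net, X, T);  % shift the data to fill the initial delay states
net = train(net, Xs, Ts, Xi, Ai);
Y = net(Xs, Xi, Ai);                      % one-step-ahead predictions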
2.2 The Hopfield network
The Hopfield network consists of a fully recurrent network of N neurons s = (s1, . . . , sN) with
si ∈ {−1, 1} (see Figure 4). Every neuron is connected to all other neurons with symmetric
connections wij = wji, and wii = 0. The neurons are updated (asynchronously), using the sign of
the incoming signal vi,
    vi = Σj wij sj        (sum over j = 1, . . . , N)
according to,
    si = sgn(vi) if vi ≠ 0, and si is left unchanged if vi = 0.              (4)
The training (i.e. determination of the weights) is very fast for the Hopfield network.
2 Called focused time delay networks in Matlab.
3 Called distributed time delay networks in Matlab.
4 Called layer-recurrent networks in Matlab.
Figure 4: A Hopfield network with 4 neurons.
The weights are given by

    wij = (1/N) Σμ=1..P ξi^μ ξj^μ   for i ≠ j,        wii = 0,

for the P patterns stored in the vectors ξ^μ.
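A minimal sketch of this formula, assuming the P patterns are stored as ±1 columns of an N x P
matrix xi:

[N, P] = size(xi);
W = (xi * xi') / N;        % Hebbian outer-product rule, wij = (1/N) Σμ ξi^μ ξj^μ
W = W - diag(diag(W));     % enforce wii = 0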
Note! Matlab has its own Hopfield model5, implemented by the function newhop. This model is
better at avoiding spurious states than the original model.
3 Data sets used in this exercise
The data sets used in this exercise are (as for Exercise I) both artificial and real world data.
Some of the data from Exercise I will be reused, namely sunspot, wood and some of the synthetic
classification data. As in Exercise I, we will use Matlab’s own naming for the matrix P that
stores the inputs to the neural network. If you have 3 input variables and 200 data points,
P will be a 3x200 matrix (the variables are stored row-wise). The targets (if there are any) are
stored (row-wise) in T (1x200 if one output and 200 data points). The files are available from
http://www.thep.lu.se/~mattias/teaching/fytn06/
3.1 Synthetic cluster data
To demonstrate the competitive network’s ability to cluster data, a 2-dimensional data set consisting
of 6 Gaussian distributions with small widths will be used. You can use the function loadclust1
to get the data and Matlab’s plot function to visualize it. For example,
>> [P,T] = loadclust1(200);
>> plot(P(1,:),P(2,:),'r*');
5 This model is an implementation of the algorithm described in Li, J., A. N. Michel, and W. Porod, “Analysis
and synthesis of a class of neural networks: linear systems operating on a closed hypercube”, IEEE Transactions
on Circuits and Systems, vol. 36, no. 11, pp. 1405-1422, 1989.
3.2 The face ORL data
With the function loadfacesORL you will get 80 images of 40 people (2 images of each person). The
images are taken from the AT&T Laboratories6 and they are stored in the FaceORL directory.
Each face is a gray-scale image with the original size 112 x 92. There is however an option to
loadfacesORL that will make the images smaller, and perhaps more manageable for the networks
that will use these data. For example,
[P,ims] = loadfacesORL(0.5);
will load the images rescaled to a size of 56 x 46. You can then view one of the images with
viewfaceORL, e.g.
viewfaceORL(P(:,45),ims);
3.3 The letter data
This data set will be used in connection with the Hopfield model. It consists of the first 26 uppercase
letters, coded as 7 x 5 (black and white) images. Use loadletters or loadlett to load
all of them or a single one, respectively. You can look at them using viewlett. Example,
P = loadletters(26); viewlett(P(:,11),'Letter K');
4 Exercises
4.1 Competitive networks for clustering
In this exercise you will use a competitive network to do clustering. We will start with a synthetic
data set and later on use the faces from ORL.
2-dimensional data You can use the function syn comp in the next 3 exercises.
• Exercise 1: Use 100-200 data from the synthetic data set, 6 output neurons and default values
for the learning parameters. Does it work?
• Exercise 2: Use 100-200 data from the synthetic data set and 6 output neurons. Try different
settings of the learning parameters and see what happens. In particular, set the bias
learning rate to zero and see if “dead neurons” appear.
• Exercise 3: Use 100-200 data from the synthetic data set. Use more than 6 output neurons
(e.g. 10) and run the network. What happens to the superfluous neurons (weight
vectors)?
ORL face data In the next exercises we will try to cluster the faces from ORL. In the previous
exercise the network had 2 inputs corresponding to the 2 coordinates of the data. How are we
going to present the faces to the competitive network? We are going to do the simplest thing and
let each pixel value in the image represent one input. This means that a 112 x 92 image will be
represented by a 10304-element input vector.
For the next 2 exercises you should modify syn comp in order to deal with the ORL face data. You
should in principle only have to replace the loadclust1 function with the loadfacesORL function,
comment out the plotting part and possibly increase the def ntrain parameter. It can also be
a good idea to normalize the input data. To view the result you can use the showfacecluster
function (for details, see >> help showfacecluster).
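For the normalization, one simple option (an illustration, not part of the provided scripts) is the
toolbox function mapminmax, which rescales each input row to the range [-1, 1]:

[P, ps] = mapminmax(P);    % ps stores the settings, should new data need the same scaling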
6 http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
5
• Exercise 4: Train a competitive network with 5-10 outputs on the ORL faces. You may have
to increase the number of epochs and/or lower the learning rate in order to reach convergence.
(Hint: look at the mean change of the weight locations.) View the different clusters using
showfacecluster (e.g. showfacecluster(net, 'comp', P, ims, 1) to view all faces that have
output node 1 as the winner). Does the network cluster the faces into “natural”
clusters?
• Exercise 5: Repeatability! Does the result change a lot if you repeat exercise 4 a
couple of times?
• Not required, only for the fun of it: The weight vectors in the competitive network represent
the cluster centers and can therefore be interpreted as the mean face of each cluster. Modify
showfacecluster so that it can show these “cluster faces”.
4.2 Self-organizing feature map
Now we are going to use Kohonen’s self-organizing feature map (SOFM) to cluster both the synthetic
data and the ORL face data. The function syn sofm will be used in exercise 6. For exercises
7-8 you should modify it so that it can work with the ORL face data.
• Exercise 6: Run syn sofm to approximate the synthetic (6 Gaussians) cluster data. List
at least 2 properties of the SOFM algorithm and relate these properties to the
figure showing the result for the synthetic data.
• Exercise 7: Modify the syn sofm function to deal with the ORL faces and train a [3 x 3]
SOFM on these face data. Use showfacecluster to view the result. Note! The SOFM
can take some time to train, so it can be a good idea to work with smaller faces (e.g.
scale factor 0.5). By looking at the clusters that the SOFM creates, are they better
or worse compared to the clusters created by the (simple) competitive networks
above?
• Exercise 8: Can you confirm the property of the SOFM that states that output
nodes far apart from each other correspond to different features of the input
space?
4.3 LVQ
In this section we will use the LVQ classifier on the same problems as in Exercise I. You can use
syn lvq for the synthetic problems and modify it to handle the liver data. You may have to tune
the learning rate in order to get a good result. You can use the boundaryLVQ function to plot the
decision boundary implemented by the LVQ network.
• Exercise 9: Use 100 data points from synthetic data set 3 and 5-10 outputs. Optimize your
network with respect to the validation set. What number of outputs gave the best result
and why do you need this many neurons to do a good job? How does the result
compare to the MLP networks of Exercise I?
4.4 Time-delay networks
Now you are going to test the Elman network, called a layer-recurrent network in Matlab, for time
series prediction. To compare with previous models we are going to use the sunspot data as our
prediction task. You can use the function sunspot lrn for these exercises.
• Exercise 10: Use an Elman network with 1-5 hidden neurons and 1 delay feedback. Train
a couple of times with different numbers of hidden neurons. How is the performance
compared to the MLPs you trained in Exercise I?
• Exercise 11: Experiment with the “delay feedback” parameter. Can you find an optimal
value?
• Not required: Use time-delay networks (timedelaynet) or FIR networks (distdelaynet)
for the sunspot data. Comments?
4.5 Hopfield networks
In this exercise we are going to use the Hopfield model as an error-correcting associative memory.
The images we are going to store in the memory are the first 26 uppercase letters, coded as 7 x 5
black and white images. You can use the lett hopf function for these exercises, which lets the user
store the first k letters of the alphabet; e.g. for k = 5 the letters 'A' through 'E' are stored. This
function calls either Matlab’s own Hopfield model or the functions newhop2 and simhop, which
implement and simulate the original Hopfield model (as you know it). For the first 2 exercises
you should use the original Hopfield model.
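For exercise 14, a minimal sketch of calling Matlab’s Hopfield model directly (the assumption here
is that loadletters returns the letters as ±1 column vectors):

T = loadletters(5);                  % store the letters 'A' through 'E'
net = newhop(T);                     % the patterns in T become stable points
noisy = T(:,1);
noisy(1:5) = -noisy(1:5);            % flip a few pixels of 'A'
Y = net({1 20}, {}, {noisy});        % run the network for 20 time steps
viewlett(Y{end}, 'Recalled letter'); % hopefully the stored 'A'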
• Exercise 12: Store the first 5 letters in a Hopfield network. What is the capacity of this
network (if we assume random patterns)? Initiate the network with a letter without
any distortion. Are the stored letters stable? Why / why not? Now initiate the model
with a noisy letter. How good is the ability to retrieve distorted letters?
• Exercise 13: Store 10 letters in the Hopfield model. How many stable letters are there
this time? Why the difference in performance, compared with last exercise?
• Exercise 14: Store 10 letters in Matlab’s Hopfield model. Are the 10 letters stable?
How good is the retrieval ability?
5 Combinatorial optimization (DEMO)
Note! This section is optional; there are no questions to answer. It is provided in order to
demonstrate the Hopfield model as a “solver” for the graph bisection problem and to look at the
traveling salesman problem (TSP).
To test the graph bisection problem change to the GB directory and run GBRun.
To test the TSP problem change to the TSP directory and run TspRun.