Exploration of effects of different network topologies on the ESN signal
crosscorrelation matrix spectrum
by
Benjamin Liebald
b.liebald@iu-bremen.de
A thesis submitted in partial satisfaction of the
requirements for the degree of
Bachelor of Science (BSc.)
in the
School of Engineering & Science
at
INTERNATIONAL UNIVERSITY BREMEN
Supervisor:
Herbert Jaeger
Spring Semester 2004
Contents

1 Executive Summary
2 Summary Description
3 Motivation and Proposed Research Questions
4 Conducted Experiments
  4.1 The General Test Framework
    4.1.1 The Echo State Network
    4.1.2 The Testing Systems
    4.1.3 ESN Training
    4.1.4 ESN Exploitation
    4.1.5 Computing the Eigenvalue Spread
    4.1.6 Parameter variation & Validation Schemes
  4.2 Network Topologies
    4.2.1 Scale-Free Nets
    4.2.2 Small-World Nets
    4.2.3 Nets arising from spatial growth
    4.2.4 Why Scale-Free?
    4.2.5 Recursively built nets
5 Experimental Results
  5.1 Random Networks
  5.2 Scale-Free Nets
  5.3 Small-World Nets
  5.4 Spatial-Growth Nets
  5.5 Recursive Nets
  5.6 Overall Evaluation
6 Future Investigations
Bibliography
A MATLAB Code
Chapter 1
Executive Summary
Echo State Networks (ESN) have been used successfully in a broad range of applications, from dynamic
pattern recognition to channel equalization in digital communication systems [4, 6, 8]. Their simplicity
and ease of use, paired with their underlying mathematical power make them an ideal choice in many
black-box modelling tasks. For many applications, however, it is mandatory to learn and adjust the
ESN parameters online, i.e. during the actual task that the ESN is supposed to perform. A good
example is high-frequency digital communications with cellular phones, where an ESN could be used
to cancel out the noise that is injected into the radiofrequency signal. As the characteristics of the
noise change over time (since sender and receiver are moving), the ESN needs to change its internal
parameters adaptively. So far, certain properties of ESNs make online learning difficult and slow. The
aim of this thesis is to investigate how changes in the topological structure of Echo State Networks
affect their online learning ability. My results suggest that the tested methods do not lead to a significant
improvement over standard ESN network topologies. However, some ideas for future research in this
area are outlined, and some potentially better suited approaches are described.
5
Chapter 2
Summary Description
Echo State Networks (ESN) [4, 6] are a special form of recurrent neural networks (RNNs), which
allow for the blackbox modelling of nonlinear dynamical systems. In contrast to normal RNNs, where
training is computationally very costly [6], ESNs only adjust the set of output weights leading from
the internal nodes to the output nodes. The computation of optimal weights can then be achieved by
a simple linear regression in the offline case. ESNs have been applied to a variety of tasks from inverse
system identification to dynamic pattern recognition tasks with great success, often outperforming
state of the art methods by orders of magnitude. In many tasks, however, an online adaptive learning
of the output weights is required, for example in high frequency communications, where ESNs could
be used for channel equalization. The most well-known online learning algorithm, the Least-Mean-Square
(LMS) algorithm, is however difficult to use with ESNs, as the cross-correlation matrix of
internal states shows a large eigenvalue spread. This leads to very slow convergence of
the weight vector w, rendering the LMS algorithm useless with current ESN implementations. The
aim of this thesis is to investigate how changes to the internal topology of ESNs affect the eigenvalue
spread of the cross-correlation matrix R. To this end, several different network topologies are used
to represent the internal weight matrix of the ESN, among them nets that possess so-called scale-free
and small-world properties [9, 1, 12]. These ESNs are then used for a standard system identification
task and the eigenvalue spread of R is observed, as well as the modelling quality of the ESN. The
obtained results suggest that none of the investigated network topologies is well suited to decrease the
eigenvalue spread of R. A second experiment investigates a large number of very small networks (3-4
internal neurons). Those that exhibit low eigenvalue spread are then used to build larger networks
in a recursive fashion. My results suggest that this method is not suited to significantly lower the
eigenvalue spread either, but more systematic experiments should be carried out to gather further
evidence.
Chapter 3
Motivation and Proposed Research Questions
Echo State Networks (ESN) are recurrent neural networks (RNN) that allow for the blackbox modelling
of (nonlinear) dynamical systems. In contrast to other RNN approaches, ESNs do not train the input
and internal weights of the network. The learning procedure purely consists of finding optimal output
weights W^out that lead from internal neurons to output nodes. As the activation function of output
nodes is assumed to be linear, optimal weights can be found by performing a conceptually simple and
computationally fast linear regression. This makes ESNs both easy to implement and efficient to use
(see Figure 3.1).
Figure 3.1: Traditional training of RNNs and the echo state network approach (from [2])
In many applications, it is necessary to learn the output weights online, i.e. during the actual task
that the ESN is supposed to carry out. A good example is channel equalization in high-frequency
digital communication, where the channel properties change over time. The most popular algorithm
for online learning of adaptive filters is the Least Mean Square (LMS) algorithm (also known as
Widrow-Hoff delta rule or stochastic gradient descent) [3]. The performance of this algorithm, however,
depends critically on the eigenvalue spread of the cross-correlation matrix R = E[x(n)x(n)^T] of the input
signals x(n). When the ratio s = |λ_max(R)/λ_min(R)| is large, the LMS algorithm converges very
slowly and becomes inefficient. In the ESN case, the internal states of the network (referred to as the
"dynamical reservoir") can be seen as the input signals to the LMS algorithm. With all current ESN
implementations, their eigenvalue spread is very large, ranging from approximately 10^12 to 10^18 for
network sizes of 400 to 1000 internal neurons (source: personal communication with Herbert Jaeger
and own experiments), effectively prohibiting utilization of the LMS algorithm.
So far, ESN tasks that required online learning of weights have employed the Recursive Least Squares (RLS)
filter algorithm. However, this algorithm has a number of disadvantages compared to LMS: it has
higher computational and space complexity (quadratic in the filter length, as compared to linear complexity
for LMS), it is more difficult to implement, and it can fall prey to instability [3].
Therefore, it would be highly desirable if the LMS algorithm instead of RLS could be used for ESN
online learning tasks.
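For concreteness, a minimal sketch of the LMS update as it would be applied to the ESN output weights is given below (single output; the step size mu is a hypothetical tuning parameter, and this is the textbook rule from [3] rather than code used in this thesis):

    % one LMS step for a 1 x N output weight vector w
    % x  : N x 1 vector of internal states at time n
    % d  : desired (teacher) output at time n
    % mu : step size (hypothetical value, must be tuned)
    y = w * x;              % current network output
    e = d - y;              % instantaneous error
    w = w + mu * e * x';    % stochastic gradient step; converges slowly when the
                            % eigenvalue spread of R = E[x x'] is large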
In this thesis, I am addressing the following questions:
• How can the spectrum of the cross-correlation matrix R be changed to permit the usage of
LMS?
• In particular, what effect do different network topologies have on the eigenvalue spread of R?
The following network topologies will be investigated:
– Scale-Free nets (Section 4.2.1)
– Small-World nets (Section 4.2.2)
– Nets that arise by spatial growth mechanisms (Section 4.2.3)
– "Recursive nets", which are built by interconnecting many small identical nets (Section 4.2.5)
• If a certain topology is able to reduce the spread, does it also deliver a similar performance (in
terms of mean-square-error) as current ESN implementations?
The conducted experiments are described in more detail in Chapter 4.
Chapter 4
Conducted Experiments
The general aim of all conducted experiments was to investigate the influence of different network
topologies on the eigenvalue spread of the crosscorrelation matrix R = E[x(n)x(n)^T] of internal
states x(n). To this end, a general experimental framework was developed, which is described in
detail in Section 4.1. The different experiments then generated nets of a certain topology and ran
them on the provided framework. The different network types are described in more detail in Sections
4.2.1 to 4.2.5. All experiments were implemented in MATLAB 6 (Release 13). The essential parts of
the code can be found in Appendix A.
4.1 The General Test Framework
The general testing framework consists of several parts:
• An Echo State Network.
• A discrete-time input/output (I/O) sequence (divided into training and testing part).
• An algorithm that computes the optimal weights from the training part of the input/output
sequence.
• A testing part which tests the computed weights on the testing data of the I/O sequence.
• A part that computes the eigenvalue spread of the crosscorrelation matrix R.
Each of these parts is described in the following sections.
4.1.1 The Echo State Network
An Echo State Network is a type of recurrent neural network. It is fully characterized by its weight
matrices and activation functions. In particular, an ESN consists of
• K input units u(n) = (u_1(n), ..., u_K(n))^T,
• N internal units x(n) = (x_1(n), ..., x_N(n))^T,
• L output units y(n) = (y_1(n), ..., y_L(n))^T,
• An N × K input weight matrix W^in,
• An N × N internal weight matrix W,
• An L × (K + N + L) output weight matrix W^out,
• Possibly an N × L backprojection weight matrix W^back,
• An activation function f (usually tanh or another sigmoidal function),
• An output function f^out (in our case usually linear).
Input signals are fed into the input units and propagate into the internal units. Formally,

x(n + 1) = f(W^in u(n + 1) + W x(n) + W^back y(n))    (4.1)

The output at time n + 1 is then computed as

y(n + 1) = f^out(W^out (u(n + 1), x(n + 1), y(n)))    (4.2)
In traditional implementations of RNNs, all weights were trained to adjust the output. Standard
training procedures to achieve this have a high computational complexity and could only find local
minima (for a good overview of the most common training techniques for RNNs, see [6]). Therefore,
the size of these RNNs was usually limited to 3 to 30 internal units. In the echo state network
approach, the internal weight matrix W, the input weight matrix W^in and the backprojection matrix
W^back are defined initially and are not affected by the training procedure. Only the output weight
matrix W^out is trained. Since the activation function of the output units is linear, this can be achieved
by computing a simple linear regression [4, 6].
In contrast to traditional RNNs, Echo State Networks are therefore usually quite large, with hundreds
or even thousands of internal units. This is necessary to have a rich set of dynamics from which the
output signal can be "composed". The exact number of internal neurons N is task specific. It should
not be too small, because this would prohibit learning of complex dynamics with long memory. On
the other hand, it should not be too large either, because this would lead to an overfitting of the
training data. This is the classical bias-variance dilemma which arises in all supervised learning tasks.
In the test cases described in this thesis, N was always set to 500.
The procedure to generate the weight matrices W^in, W^back, W^out and W is as follows (adapted from
[6]):
• W^in: These weights can, in all cases, simply be drawn from a uniform distribution over [−a, a].
The bigger a, the more input-driven the network becomes, i.e. if a is large, the internal states
will be strongly dominated by the input, making the sigmoid activation functions operate in the
non-central, non-linear part. In the extreme case, large a leads to binary switching dynamics of
internal nodes (i.e. their activation is either +1 or −1). For smaller a, the network is operating
close to its zero state, i.e. the state attained if no input is given. The sigmoid activation functions
are operated in the central, linear area. In my implementation a was usually set to 1, if not
mentioned otherwise.
• W^back: Similar remarks as for W^in apply. Large a will lead to a strong dependence on the
output, small a to a much more subtle excitation. In my implementation a was usually set to
1, if not mentioned otherwise.
• W^out: This is the only matrix that is trained by the ESN training algorithm (see Section 4.1.3).
For the training phase, these weights are irrelevant and can be set to arbitrary values (usually
simply the zero vector 0).
• W: In standard ESN implementations, W is a sparse random matrix W0 which is scaled by
the modulus of its largest eigenvalue to obtain the Echo State property: W = α · W0/|λ_max|, where
λ_max is the largest eigenvalue of W0 and α < 1 is a scaling constant (the spectral radius), which
depends on the specific task at hand (in all my experiments, α = 0.8); a minimal sketch of this
scaling step is given after this list. In my experiments, however, W0 was not created by a random
procedure, but generated algorithmically (as outlined in Sections 4.2.1 to 4.2.5). These nets were
then scaled by the same procedure as above.
4.1.2 The Testing Systems
The Echo State Network created as described in Section 4.1.1 was then driven by two input/output
sequences:
• A NARMA system
• The Mackey-Glass chaotic attractor time series
The NARMA System
A NARMA (Nonlinear Autoregressive Moving Average) system is a discrete-time system of the following form:
y[n] = f (y[n − 1], y[n − 2], . . . , y[0], u[n], u[n − 1], . . . , u[0])
where y[n] is the system output at time n, u[n] is the system input at time n, and f is an arbitrary
vector-valued, possibly non-linear function. The characteristic property of these systems is that the
current output depends both on the input and output history. Modelling these systems is, in general,
quite difficult, due to the arbitrary non-linearity and possibly long memory (i.e. y[n] might depend
on many former inputs and outputs).
The particular system that was chosen for our test had an input/output dimension of 2 and was
described by the following equations:

y_1[n] = u_2[n − 5] · u_2[n − 10] + u_2[n − 2] · y_2[n − 2]
y_2[n] = u_2[n − 1] · u_2[n − 3] + u_2[n − 2] · y_1[n − 2]

As inputs we chose u_1[n] = 1 for all n (a so-called bias input, only relevant for the network) and u_2[n] as
random samples drawn from a uniform distribution over the interval [0, 1]. The output samples were
then computed using the update equations above. All sequences had a length of 5000 samples.
A sample plot of both outputs is shown in Figure 4.1.
Figure 4.1: The 2 outputs of the NARMA system (for a sequence of 500 samples)
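A minimal MATLAB sketch of how such an I/O sequence can be generated (the variable names are illustrative, not taken from the thesis code):

    L  = 5000;                      % sequence length
    u1 = ones(L, 1);                % constant bias input u1[n] = 1
    u2 = rand(L, 1);                % uniform random input over [0, 1]
    y1 = zeros(L, 1);
    y2 = zeros(L, 1);
    for n = 11:L                    % start after the longest delay (10)
        y1(n) = u2(n-5) * u2(n-10) + u2(n-2) * y2(n-2);
        y2(n) = u2(n-1) * u2(n-3)  + u2(n-2) * y1(n-2);
    end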
The Mackey-Glass Chaotic Attractor
In addition to the NARMA system described above, some ESNs were also tested on the Mackey-Glass
time series (MGS), a standard benchmark test in the time series prediction community. This system
has no inputs and only one output, which is computed as follows:
y[n + 1] = y[n] + δ · ( 0.2 y[n − τ/δ] / (1 + y[n − τ/δ]^10) − 0.1 y[n] )
where δ is a stepsize parameter that arises from the discretization of the original continuous-time
MG equation, and τ is a delay parameter that influences the degree of chaos the MGS features. A
standard choice is τ = 17, which produces a "mildly chaotic" behaviour, and δ = 0.1 with subsequent
subsampling by 10. It is important to note that the evolution of the MGS depends on the initial
values of y (y[0], . . . , y[τ/δ]). These were simply random samples drawn from a uniform distribution
over [1.1, 1.3]. Using this method, we generated 4100 samples, of which the first 100 were discarded
to wash out initial transients. Although the MGS output does not depend on any input sequence, we
still created a bias input of the same length, which is simply used by the network in its state update
procedure.
A sample plot of 1000 samples from the MGS output is shown in Figure 4.2.
Figure 4.2: A sample plot of the Mackey-Glass time series (1000 timesteps)
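A sketch of the discretized update with the parameter choices given above (δ = 0.1, τ = 17, subsampling by 10; the variable names are mine):

    delta = 0.1;                     % discretization step size
    tau   = 17;                      % delay parameter ("mildly chaotic")
    hist  = round(tau / delta);      % number of history samples needed (170)
    nraw  = 41000;                   % raw samples before subsampling by 10
    y     = zeros(nraw, 1);
    y(1:hist+1) = 1.1 + 0.2 * rand(hist+1, 1);   % random initial values in [1.1, 1.3]
    for n = hist+1 : nraw-1
        y(n+1) = y(n) + delta * ( 0.2 * y(n-hist) / (1 + y(n-hist)^10) - 0.1 * y(n) );
    end
    mg = y(1:10:end);                % subsample by 10 -> about 4100 samples
    mg = mg(101:end);                % discard the first 100 samples (initial transients)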
4.1.3 ESN Training
Given the ESN and the I/O sequences from Sections 4.1.1 and 4.1.2, the network can now be trained
to learn the system characteristics. To this end, the available I/O sequences are divided into three
parts:
1. An initial part, which is not used for training but serves the purpose of getting rid of initial
transients in the network's internal states (length ℓ_init).
2. A training part, which is used in the actual learning procedure of adjusting the output weights
(length ℓ_train).
3. A testing part, which is used to test the newly trained network on additional data (length ℓ_test).
The training of the network is done as follows:
First, the network's internal state vector x is initialized to random values. Then, the system is run on
the initial part and the training part of the I/O sequence, i.e. input samples are written into the input
nodes, output samples are written into the output nodes, and the internal states of the next timestep
are computed according to equation 4.1. It is important to note that the output update equation 4.2
is not used during training, since the output weights are not yet set to their final values. Instead, the
output nodes are simply overwritten by the output part of the I/O sequence ("teacher forcing").
The internal state vector is recorded for each sample that is part of the training sequence, after the
network has been run on the initial sequence to remove transient effects. From the set of these state
vectors, one can compute the N × 1 cross-correlation vector of internal states with the desired output:
p = E[x(n) d(n)]    (4.3)

where d(n) is the desired output signal at time n, and the N × N cross-correlation matrix

R = E[x(n) x(n)^T]    (4.4)

which provides a measure of the correlation between different internal states.
The optimal output weights can then be calculated according to the Wiener-Hopf equation (see [3]
for a derivation of this formula):

W^out = R^(−1) p    (4.5)

This computation can be shown to be equivalent to the least-squares solution obtained by computing
the pseudo-inverse of the matrix of collected state vectors.
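A compact sketch of this offline training step, given a matrix X whose rows are the recorded state vectors and a column vector D containing the teacher outputs (both names are illustrative), could read:

    % X : T x N matrix of harvested internal states (one row per training step)
    % D : T x 1 vector of desired outputs (teacher signal)
    T    = size(X, 1);
    R    = (X' * X) / T;        % estimate of the cross-correlation matrix E[x(n) x(n)^T]
    p    = (X' * D) / T;        % estimate of the cross-correlation vector E[x(n) d(n)]
    Wout = (R \ p)';            % Wiener-Hopf solution, equation 4.5, as a 1 x N row vector
    % equivalently: Wout = (pinv(X) * D)';   % least-squares solution via the pseudo-inverse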
4.1.4 ESN Exploitation
After the computation of optimal weights according to equation 4.5, the network is run on the
remaining I/O sequence to test whether the new solution generalizes well to unknown data. In this
so-called exploitation phase, the output of the system is no longer known; only the input signal
is given to the ESN. As before, internal states are updated according to equation 4.1. In contrast to
the training phase, the output units are not forced to the teacher signals, but are computed according
to equation 4.2. As a standard error measure, a normalized root mean square error is computed as follows:
NRMSE = sqrt( Σ_{i=1}^{ℓ_test} (y_test[i] − d_test[i])^2 / (ℓ_test · σ^2) )

where y_test[n] is the network output during the testing phase, d_test[n] is the desired output during the
testing phase, and σ^2 is the variance of the desired output signal.
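In MATLAB, this error measure can be computed directly from the recorded outputs (a sketch, assuming column vectors ytest and dtest of equal length):

    ltest = length(dtest);
    nrmse = sqrt( sum((ytest - dtest).^2) / (ltest * var(dtest)) );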
4.1.5 Computing the Eigenvalue Spread
The main aim of this thesis is to investigate the influence of different topologies on the eigenvalue
spread s of the cross-correlation matrix R. Therefore, this spread needs to be computed. Since R
has been computed already in order to solve equation 4.5 (Wiener-Hopf), the computation of s is
straightforward:
s ≡ |λ_max(R) / λ_min(R)|    (4.6)
where λmax (R) and λmin (R) are the largest and smallest eigenvalue of R, respectively.
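Since R is available anyway, the spread can be computed in two lines (sketch):

    lam = eig(R);                            % eigenvalues of the cross-correlation matrix
    s   = max(abs(lam)) / min(abs(lam));     % eigenvalue spread as defined in equation 4.6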
4.1.6 Parameter variation & Validation Schemes
In order to achieve reliable results and eliminate outliers, multiple runs of the same experiment were
conducted.
1. Each network topology had one or more adjustable parameters. These were systematically varied within
sensible intervals in order to avoid using the wrong net configuration. The particular choices of
parameters are given in the sections describing the particular network topologies (Section 4.2).
2. For given network parameters, the test runs were repeated ten times in order to average out outliers.
Both mean and standard deviation of the test error and the spread were computed.
4.2 Network Topologies
The aim of this thesis is to investigate the influence of different network topologies on the eigenvalue
spread of R, as described earlier. In the following sections, I will describe the kinds of networks that
I used in my experiments, why they might prove useful, and how they can be generated.
4.2.1 Scale-Free Nets
Networks are said to be scale-free if the probability P(k) of a node having k connections to other nodes
follows a power-law distribution P(k) ∝ k^(−γ) (with γ typically between 2 and 4), i.e. there are very
few highly connected nodes, whereas most nodes have only very few connections. Many real-world
networks have been shown to be scale-free, for example scientific or Hollywood actor collaboration
graphs, the World Wide Web [1], or the German highway system [9]. For a visualization of a scale-free
graph, see Figure 4.3.
Figure 4.3: A scale-free graph of size 100 (generated with MATLAB)
It has been suggested that the scale-free organization of nodes arises from phenomena that are typical
for many networks that occur in nature: growth over time and preferential attachment. Growth
over time means that most naturally occurring networks don't come into existence with a fixed size,
but expand over time (new locations being added to the highway system, for instance). The idea of
preferential attachment suggests that newly added nodes are likely to connect to those nodes in the
net that already have a high connectivity. It was shown by Barabási [1] that networks constructed
according to these principles feature a power-law distribution of P(k). An algorithm to generate
scale-free networks is given in pseudo-code below; a MATLAB implementation can be found in
Appendix A.
It should be noted that this algorithm generates undirected graphs, i.e. if node a is influenced by
node b, then node b is also influenced by node a. This symmetry is a potential drawback which could
be remedied by separately checking for each direction whether a link should be established.
The parameter k can be used to tune the density of the net. If k is large, the probability of establishing
a link grows, producing dense networks. If k is very low, the network will consequently feature very
low connectivity.
Algorithm 1: Generate a scale-free net of size N
1: Let the net be represented by G.
2: Start with a small net of m (3 to 10) initial nodes, randomly connected to each other.
3: while m < N do
4:   generate a new node a, initially unconnected
5:   for all nodes b that are already part of the net do
6:     deg(b) ← number of nodes that b is connected to
7:     edges(G) ← total number of edges in G
8:     if rand() < k · deg(b)/edges(G) then {connect the new node to the old node with a probability proportional to the connectivity of the old node}
9:       create a new undirected link between a and b
10:    end if
11:  end for
12:  m ← m + 1
13: end while
After creating a network with a scale-free topology in this manner, the weights were changed to random
values, since simple 0-1 connections seemed unreasonable. This was achieved by simple element-wise
multiplication with a random matrix of the same size, where the random values were drawn from a
uniform distribution over [−0.5, 0.5]. In the next step, the spectral radius of the weight matrix was
adjusted as described in Section 4.1.1.
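Put together, the construction of an internal weight matrix from a scale-free topology might look like this (a sketch that calls the generator from Listing A.1; the parameter values are examples):

    N  = 500;
    k  = 2;                                  % density factor (fc in Listing A.1)
    A  = scalefreegraph(N, k);               % 0-1 adjacency matrix
    W0 = A .* (rand(N) - 0.5);               % random weights in [-0.5, 0.5] on existing links
    W  = 0.8 * W0 / max(abs(eig(W0)));       % rescale to spectral radius 0.8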
In my experiments I tested Scale-Free nets with the following choice of parameters:
k ∈ (0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
4.2.2
Small-World Nets
In addition to the scale-free property described above, many real-world networks have been shown
to possess the so-called Small-World property: their average shortest path (ASP) is comparable to that of
random networks of the same size, while they maintain a much higher clustering coefficient than
random networks. The ASP is the average length of all shortest paths between any two nodes
in the network. The clustering coefficient is a measure that quantifies the amount of interconnection
within groups of nodes: if node a is connected to a set of nodes S, then the clustering coefficient of
a is directly proportional to the number of connections within the set S ∪ {a}.
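For illustration, the clustering coefficient of a single node a in an undirected 0-1 adjacency matrix A can be computed as follows (a sketch; the normalization follows Watts and Strogatz [12]):

    % A : symmetric 0-1 adjacency matrix, a : node index
    nb = find(A(a, :));                        % neighbours of a
    ka = numel(nb);                            % degree of a
    if ka > 1
        links = sum(sum(A(nb, nb))) / 2;       % number of edges among the neighbours
        C_a   = 2 * links / (ka * (ka - 1));   % fraction of possible neighbour pairs that are linked
    else
        C_a = 0;                               % convention for nodes of degree 0 or 1
    end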
Some examples of real-world Small-World nets include scientific and actor collaboration graphs, parts
of the US power grid, the neural network of the nematode worm C. elegans (the only completely
mapped natural neural network), and parts of mammal brains [12, 9]. Small-World nets have
become famous due to the "six degrees of separation" thesis popularized by Watts et al. [11], which states
that two humans are connected by a link chain with an average length of 6.
Watts and Strogatz [12] used the algorithm below to construct and quantify Small-World nets (a
MATLAB implementation is given in Appendix A).
Algorithm 2: Generate a Small-World net of size N with E edges
1: Let p be a probability between 0 and 1
2: Let G represent the graph
3: Generate a regular lattice of undirected links between nodes. For example, arrange the nodes in a circular fashion and let each node be connected to its two closest neighbours on either side.
4: for all edges e in the net do
5:   if rand() < p then {rewire with probability p}
6:     delete edge e from the net
7:     add a new edge between one of the nodes the original edge belonged to and a randomly chosen node
8:   end if
9: end for
{Add more edges to reach a total of E edges}
10: while edges(G) < E do
11:   add another edge between two randomly chosen nodes
12: end while
The parameter p can be used to control the "randomness" of the net. If p = 1, the net is totally
random, as all edges have been rewired. If p = 0, the net is totally regular, as the original regular
arrangement has been maintained (assuming that no additional edges needed to be added). Watts
and Strogatz observed that a small increase in p away from 0 drastically lowers the average path length,
since random connections are introduced into the network, which in most cases form shortcuts between
formerly distant nodes, whereas the average clustering coefficient remains almost unaffected.
A Small-World net with 50 nodes and 155 edges, generated with the above algorithm (with p = 1/4),
is shown in Figure 4.4.
Figure 4.4: A Small-World graph of size 50 (generated with MATLAB)
As with the Scale-Free net algorithm, the generated nets are undirected, leading to a symmetric
matrix. This could again be remedied by starting off with a regular lattice of directed links and
rewiring these.
After the nets were generated as described above, they were again multiplied elementwise with a
random matrix (uniform distribution over [−0.5, 0.5]) and scaled to achieve a spectral radius smaller
than 1, as described in Section 4.1.1.
I used the following set of parameters to create Small-World nets of varying density (network size in all
cases was 500):

E ∈ (1e+5, 5e+4, 1e+4, 5e+3, 1e+3)
4.2.3 Nets arising from spatial growth
A third algorithm to generate nets with Scale-Free and Small-World properties was proposed by Kaiser
and Hilgetag [9]: in their approach, nodes in the network have spatial information associated with
them, i.e. they are located in some multidimensional metric space. When new nodes are added to
the network, they are likely to form connections to nodes that are close to them (where "closeness"
is simply quantified by the Euclidean distance between the two nodes). This approach makes sense
when one considers some real-world networks: for a city in New England it makes more sense to
get connected to the Boston highway hub rather than to that of Los Angeles, even though Los
Angeles is bigger (i.e. has higher connectivity). An algorithm that generates such spatial-growth nets
is given below (a MATLAB implementation can be found in Appendix A).
A spatial-growth net generated with this algorithm can be seen in Figure 4.5.
As can be seen from the algorithm, the parameter α can be used to adjust the distance sensitivity of
the probability P(a, b) of forming a link between a and b, whereas β can be used to adjust the overall
density of the net. Kaiser and Hilgetag classify nets into different categories (including Small-World
and Scale-Free nets) depending on these parameter choices [9].
It should be noted that the algorithm returns an adjacency matrix like the two other algorithms, but
in addition also a 2D coordinate for each node. However, for my purposes, this position information is
simply discarded, and the remaining weight matrix is multiplied element-wise with a random matrix
to get rid of simple 1-0 connections. Another possibility would be to adjust the weights between two
nodes depending on their distance, i.e. nodes with a great distance would have weaker connections
Algorithm 3: Generate a Spatial-Growth net (in 2D space)
1: Given the parameters α and β and the desired total number of nodes N
2: Let the net be represented by G
3: Start with a single node at (0.5, 0.5)
4: k ← 1
5: while k < N do
6:   create a new node a at a random position in 2D space with coordinates in the interval [0, 1]
7:   c ← 0
8:   for all nodes b ∈ G do
9:     calculate the distance between a and b: d(a, b) = sqrt((a_x − b_x)^2 + (a_y − b_y)^2)
10:    calculate the probability of forming a connection: P(a, b) = β · e^(−α·d(a,b))
11:    if rand() < P(a, b) then
12:      add an undirected link between a and b to G
13:      c ← c + 1
14:    end if
15:   end for
16:   if c = 0 then {no connections could be established}
17:     delete node a from G
18:   else {at least one connection was established, keep the node in the net}
19:     k ← k + 1
20:   end if
21: end while
Figure 4.5: A Spatial-Growth graph of size N = 100, α = 5.0, β = 0.1 (generated with MATLAB)
(small weights), whereas nodes that are close to each other are characterized by strong connections.
I varied the parameters α and β over the following sets (all possible combinations were tested):
α = (0, 0.5, 1, 1.5, 2, 2.5, 3, 5, 7, 9)
β = (0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)

4.2.4 Why Scale-Free?
The rationale for using Scale-Free, Small-World, and similar nets as a model for the internal ESN
network topology is threefold:
1. As has been mentioned before, Scale-Free and Small-World nets occur in many different domains
in nature. Especially interesting is the fact that parts of mammal brains have been identified
as essentially Small-World nets [9]. In addition, it has also been shown that such structures can
arise even when nets are artificially constructed. For example, parts of the Java Class Library
have been identified as both Scale-Free and Small-World [10]. It is therefore hoped that such
apparently natural structures might also lead to an improvement in the ESN case.
2. The second reason is connected to the strong clustering of Small-World (and, in part, Scale-Free)
nets. With random matrices, no particular clustering occurs, which intuitively leads to
internal dynamics that are "averaged out" between different nodes. This would in turn lead
to a relatively strong correlation between states. Strong correlations are connected to a very
uneven power density spectrum of x(n), which can be shown to negatively affect the eigenvalue
spread (for a more rigorous argument, see [3]). In contrast to these random nets, Small-World
and Scale-Free nets feature a strong clustering of nodes. Intuitively, this should lead to network
dynamics that are concentrated in these clusters, i.e. nodes within one cluster show similar
dynamics, whereas states of different clusters should essentially be uncorrelated, leading to an
overall lower correlation between pairs of nodes.
3. Even if it turns out that the overall correlation is not significantly decreased by the clustering,
it might be possible to select a few representative nodes from each cluster, and use only these as
the dynamical reservoir from which the output is composed. Using fewer nodes always leads to a
decrease in spread, but is usually inappropriate, as the learning capability of the network suffers
from this reduction. However, if nodes are used that appropriately represent the dynamics of
an entire cluster, the resulting dynamics might still be rich enough to approximate the original
system well enough.
4.2.5 Recursively built nets
Another approach, essentially unconnected to the aforementioned ones, is presented in this
section. The idea is to find very small nets (3-5 nodes) that exhibit a comparably small spread and
to investigate
• which topological features are shared by these nets,
• how many of these small nets can be combined to produce large networks (100 or more nodes).
The idea of starting off with small nets and iteratively building larger networks from them was
inspired by a talk given by Viktoras Jucikas in the ESN Guided Research Seminar at IUB. However,
he only presented the rough idea; the concrete implementation that I am describing was designed by
me.
My experiments in this area are far from exhaustive. The very simple and naive approach I
have taken in this thesis is to view the adjacency matrix of a net as a bitstring, i.e. each entry in
the adjacency matrix is either 1 or 0.
Here is a simple example. Consider the following net, consisting of 4 nodes:
Figure 4.6: A small net, consisting of only 4 nodes (generated with MATLAB)
The corresponding adjacency/ESN weight matrix is:
W = [ 0 0 0 1
      1 0 1 0
      1 0 0 0
      0 0 0 0 ]
The corresponding bitstring is then 0001001000000110 (the least significant bit corresponding to the
entry (1,1) in the matrix). For a 3 × 3 matrix, there are 2^9 = 512 possible matrices, whereas for a
4 × 4 matrix there are 2^16 = 65536 possible matrices. Of course, many of them will be essentially
identical (since the ordering of nodes is arbitrary), but this was simply neglected in my approach.
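A short sketch of this encoding, assuming MATLAB's column-major linear indexing (so that bit 1 of the integer corresponds to entry (1,1), bit 2 to entry (2,1), and so on):

    b = bin2dec('0001001000000110');       % the bitstring from the example above
    A = reshape(bitget(b, 1:16), 4, 4);    % decode into a 4 x 4 0-1 adjacency matrix
    % iterating b over 0 : 2^16 - 1 enumerates all candidate matrices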
In order to find nets with good spread, I simply iterated over all possible 1-0 matrices of size 4 × 4
and used them as an ESN weight matrix (after multiplying elementwise with a random matrix and
scaling to achieve a spectral radius smaller than 1). After finding those nets that exhibited low spread
(averaged over several runs), I tried to combine them into bigger nets as follows:
1. Given an N × N weight matrix W_N, construct a 2N × 2N block-diagonal matrix W_2N:

W_2N = [ W_N   0
         0     W_N ]

2. In order to allow for some interaction between the two clusters, add a few random connections
between them (i.e. change some of the off-diagonal-block elements of W_2N into 1's).
3. Repeat steps 1 and 2 until the desired size is reached.
4. In order to get suitable ESN weight matrices, multiply element-wise with a random matrix and
scale to obtain a spectral radius smaller than 1, as described before.
Using this algorithm, I constructed nets of size 512 and tested them on the same datasets as the other
nets.
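A sketch of this expansion procedure (A4 stands for one of the well-performing 4 × 4 seed matrices; the number of cross connections added per doubling step is a free choice and purely illustrative):

    W = A4;                                   % 4 x 4 seed adjacency matrix (0-1)
    while size(W, 1) < 512
        N = size(W, 1);
        W = [W, zeros(N); zeros(N), W];       % block-diagonal doubling
        for c = 1:N                           % add a few links between the two copies
            W(randi(N), N + randi(N)) = 1;    % upper-right block
            W(N + randi(N), randi(N)) = 1;    % lower-left block
        end
    end
    W = W .* (rand(512) - 0.5);               % random weights on the links
    W = 0.8 * W / max(abs(eig(W)));           % rescale to spectral radius 0.8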
Chapter 5
Experimental Results
In this chapter, I will summarize the results that I obtained. Since I conducted a large number of experiments
with different parameters, I will only present a few representative results in more detail. A complete
table of all results can be found at http://pandora.iu-bremen.de/~bliebald/ESN (MATLAB .mat files).
In all cases, I will only present the results obtained with the NARMA system, since I did not have
the computational resources to repeat all experiments with the MGS as well.
For comparison, I will first present results for ESNs represented by a random sparse internal weight
matrix. These random matrices are the standard choice for most applications as of now. I will then
compare these results to the ones obtained with different network topologies.
5.1 Random Networks
Like all other networks, the random networks that I investigated had a size of 500 internal nodes and
a spectral radius of 0.8. I used nets of different density d:
d ∈ (1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1)
As mentioned before, each test was conducted ten times and the results were averaged. The best
results, regarding both spread and testing error, were achieved with nets of comparably high density
(1% and 10%, respectively). This might simply be because all other nets had densities that are
not used in practice. For example, for a net of 500 nodes, a density of 1e-5 means that there are
only around 3 edges in total! Densities of 1-10% are simply more realistic for real-world tasks, so the
parameters should probably have been taken from a more representative set. Nevertheless, the almost
identical values for nets of 1% and 10% density, and additional, less systematic tests, suggest that
there is hardly any difference for nets in this regime, at least in terms of spread. The most important
results are summarised in Table 5.1.

network density | spread mean | spread std. dev. | test error mean | test error std. dev.
1% (d = 1e-2)   | 3.91e+13    | 1.3533e+13       | 0.5171          | 0.6162
10% (d = 1e-1)  | 2.45e+13    | 1.0625e+13       | 0.3102          | 0.0678

Table 5.1: Results for Random Networks of density 1% and 10%
For simplicity, the testing error is only given for the first output of the network. The results
for the second output are similar and can be found at http://pandora.iu-bremen.de/~bliebald/ESN.
As is clearly visible from these results, the eigenvalue spread is quite large for random networks, but
this was, of course, expected.
5.2 Scale-Free Nets
We expected better results for Scale-Free nets. Unfortunately, this hope turned out to be unfounded.
The best results that could be achieved within our parameter variation are listed in Table 5.2.
network density | spread mean | spread std. dev. | test error mean | test error std. dev.
1.8% (k = 4.5)  | 6.3238e+13  | 5.2006e+13       | 0.3246          | 0.0564
2% (k = 5)      | 5.3822e+13  | 3.7233e+13       | 0.8215          | 1.3709

Table 5.2: Results for Scale-Free Networks of density 1.8% and 2%
Even though the average spread is slightly better for k = 5 than for k = 4.5, the testing
error is far worse, and the standard deviation of the error is very large. This is striking, since the
overall density is almost the same for both parameter choices. It is also interesting to note that the spread's
standard deviation is larger than in the random case. I am not sure why this is the case.
Overall, there is no significant improvement over random nets; Scale-Free nets actually appear to
perform equally badly or worse.
5.3 Small-World Nets
Unfortunately, Small-World nets also did not deliver a better performance than random networks.
The most important results are summarized in Table 5.3.
network density | spread mean | spread std. dev. | test error mean | test error std. dev.
2% (E = 5e+3)   | 2.5567e+13  | 1.3537e+13       | 0.3224          | 0.0659
40% (E = 1e+5)  | 2.4813e+13  | 2.3269e+13       | 0.2960          | 0.0557

Table 5.3: Results for Small-World Networks of density 2% and 40%
These results are, in their entirety, in very similar ranges to those of the random networks, regarding both
spread and standard deviation. Small-World nets don't seem to perform worse, but there is also no
significant improvement over random networks.
5.4 Spatial-Growth Nets
As was already expected from the previous results, Spatial-Growth nets also did not perform significantly better than random networks. The most important results are summarized in Table 5.4.
network density       | spread mean | spread std. dev. | test error mean | test error std. dev.
47% (α = 5, β = 4.5)  | 1.3862e+13  | 6.4638e+12       | 0.2471          | 0.0376
99% (α = 0, β = 2.5)  | 1.4262e+13  | 4.5347e+12       | 0.2835          | 0.0517
99% (α = 1, β = 4)    | 1.5148e+13  | 6.0675e+12       | 0.2599          | 0.0502

Table 5.4: Results for Spatial-Growth Networks of density 47% and 99%
All in all, these results are very surprising. Spatial-Growth nets seem to be the only nets that perform
better than random networks, though not substantially (1.38e+13 as compared to 2.45e+13 in the
best cases). However, the nets with the best performance show a surprisingly high density, which
does not allow for any particular clustering (with 99% density, almost all nodes are connected to each
other, making clustering virtually impossible).
Looking at Scale-Free, Small-World, and Spatial-Growth nets as a whole, it seems that their topology
has very little influence on the eigenvalue spread or the testing error of the ESN. I cannot give a definite
explanation of why this is the case. An obvious reason might be that topology is simply irrelevant for
the spread. However, there are nets with special topologies (like feedforward nets) that have a very
low spread, so topology should play some role. Another possible reason might be that the parameters
were chosen incorrectly, and that different parameters would have led to better results. However, this
is also rather unlikely, since sensible parameter ranges were chosen for all three network types and the
spread varied only very little over all investigated parameter choices.
5.5 Recursive Nets
The results for recursive nets can be split into two parts: first, I ran an experiment over all possible
1-0 combinations of 4 × 4 weight matrices (using the NARMA system). As usual, the matrices were
multiplied element-wise with random values between −0.5 and 0.5. The best performing nets are
shown in Table 5.5.
network matrix                       | spread mean | spread std. dev. | test error mean | test error std. dev.
[0 1 1 1; 0 1 0 0; 0 0 1 1; 1 1 0 0] | 26390       | 8811             | 0.9844          | 0.0316
[0 1 1 1; 1 1 1 0; 0 1 1 1; 1 0 0 1] | 27710       | 8147             | 0.9475          | 0.0294
[1 1 0 1; 1 1 1 0; 1 0 1 0; 1 1 0 1] | 30612       | 13404            | 0.9135          | 0.0487

Table 5.5: Results for small nets of size 4 (averaged over 5 runs each)

By investigating the ten best performing networks, it is striking that all of them have a relatively
high density of 40% or higher (corresponding to at least 6 non-zero elements in the matrix). Other
than that, it is hard to find any similarities. Some of the matrices have single dependent nodes (i.e.
nodes that have only one incoming connection), some don't; some of the matrices have empty rows
(corresponding to nodes with no incoming connections), some don't.
In the second part of this experiment, I tried to expand these well-performing nets into bigger ones,
in order to make them useful for real ESNs. As mentioned before, I used the simple block-diagonal
expansion to produce matrices of size 512, which corresponds to 7 "doubling steps". As the resulting
nets are slightly larger than the nets used before, an exact comparison is impossible. However, the
difference between 512 and 500 is sufficiently small to compare results in a qualitative fashion. The
performance of the three nets from above, expanded to 512 nodes, is summarized in Table 5.6.
network matrix                       | spread mean | spread std. dev. | test error mean | test error std. dev.
[0 1 1 1; 0 1 0 0; 0 0 1 1; 1 1 0 0] | 1.1101e+16  | 1.4274e+16       | 1.1595          | 1.9017
[0 1 1 1; 1 1 1 0; 0 1 1 1; 1 0 0 1] | 3.4353e+14  | 1.9275e+14       | 0.3592          | 0.0509
[1 1 0 1; 1 1 1 0; 1 0 1 0; 1 1 0 1] | 6.7843e+14  | 1.8717e+14       | 0.5779          | 0.2722

Table 5.6: Results for Recursive Nets of size 512 (averaged over 5 runs each)
It should be noted that this experiment was run on the ten best-performing nets of the first part.
The results that are not shown do not differ significantly from the ones shown in Table 5.6. It is
striking (and disappointing) to see that these nets actually perform worse than random nets by one
or more orders of magnitude. On the other hand, this at least suggests that there has to be some
connection between network topology and eigenvalue spread.
I did not have sufficient time to investigate these Recursive Nets any further. A few ideas for future
research in this area are laid out in Chapter 6.
5.6 Overall Evaluation
As a general result, I can say that none of the investigated network topologies was able to perform
significantly better than simple random networks, in terms of both eigenvalue spread and testing
error. In fact, most networks performed worse than the randomly created nets. It is also striking to
see that high-density networks often perform comparably well, such as the presented Spatial-Growth
networks, which featured a connectivity of 40% or higher.
The reasons for these rather disappointing results are unknown to me. It might be that topology does
not play a big role for the eigenvalue spread, but the results obtained with Recursive Nets do not
support this claim. It would be a good direction for future research to find some theoretical insight into
this connection.
Chapter 6
Future Investigations
Unfortunately, my experiments did not lead to any improvement of the eigenvalue spread of R.
However, there are a few points in connection with different network topologies that would deserve
some more attention:
• Recursive nets should be investigated further. For example, one could investigate all possible
1-0 combinations of 5 × 5 matrices. Even though there are 2^25 ≈ 33 · 10^6 candidate matrices, one could
significantly reduce this number by discarding symmetrically identical matrices and thus make
this experiment computationally feasible. More interesting, however, should be the investigation
of different expansion methods, i.e. how to build larger networks from small ones. The approach
that I have chosen might not necessarily be the best one.
• It could be investigated to what extent it is possible to use only very few nodes as the actual reservoir
from which the output signal is composed. The idea is to take a strongly clustered net, such as a
Small-World net, and use only a few representatives per cluster for learning. Since only few nodes
participate in the learning procedure, the eigenvalue spread will be decreased significantly.
• One could try to use one-to-one models of real-world networks (such as the neural network of
the C. elegans worm) as ESN weight matrices.
Bibliography
[1] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science,
286:509–512, 1999.
[2] Herbert Jaeger. The echo-state approach to recurrent neural networks. Slide presentation, available from
http://www.faculty.iu-bremen.de/hjaeger/courses/SeminarSpring04/ESNStandardSlides.pdf.
[3] Herbert Jaeger. Lecture notes for "Machine Learning", Fall Semester 2003, International University Bremen.
[4] Herbert Jaeger. The "Echo State" Approach to Analysing and Training Recurrent Neural Networks. Technical Report 148, GMD - Forschungszentrum Informationstechnik GmbH - Institut für autonome intelligente Systeme, 2001.
[5] Herbert Jaeger. Short Term Memory in Echo State networks. Technical Report 152, GMD
- Forschungszentrum Informationstechnik GmbH - Institut für autonome intelligente Systeme,
2001.
[6] Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the "echo state network" approach. Technical Report 159, GMD - Forschungszentrum Informationstechnik GmbH - Institut für autonome intelligente Systeme, 2001.
[7] Herbert Jaeger. Adaptive nonlinear system identification with echo state networks. In NIPS,
2002.
[8] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: predicting chaotic systems and boosting wireless communication. Science, April 2nd, 2004.
[9] Marcus Kaiser and Claus C. Hilgetag. Spatial growth of real-world networks. Phys. Rev. E,
69:036103, 2004.
[10] Sergi Valverde, Ramon Ferrer Cancho, and Ricard V. Solé. Scale-free networks from optimal
design. Europhysics Letters, 2002.
[11] Duncan Watts. Website of the Small World Project: http://smallworld.columbia.edu/.
[12] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of small-world networks. Nature,
393:440–442, 1998.
Appendix A
MATLAB Code
Listing A.1: A function that generates a Scale-Free graph (© Marcus Kaiser)

function matrix = scalefreegraph(n, fc);
% matrix = scalefreegraph(n, fc)
% yields matrix of a scale-free graph with n nodes
% fc is a factor determining final density
% Author: Marcus Kaiser    Date: 8.12.02

NODES = n;
INITIALNODES = 3;

% generate initial matrix (three nodes, undirected links)
matrix = zeros(NODES, NODES);
matrix(1,2) = 1;
matrix(2,1) = 1;
matrix(1,3) = 1;
matrix(3,1) = 1;
matrix(3,2) = 1;
matrix(2,3) = 1;
nodes_incl = [1; 2; 3];

% aggregation of nodes to the initial matrix
m = INITIALNODES;
k = zeros(NODES, 1);
for i = 1:m
    k(i) = (sum(matrix(i,:)) + sum(matrix(:,i)));   % degree count of node i
end;

while m < NODES
    m = m + 1;
    for i = 1:m-1
        P = k(i) / sum(k);                 % preferential attachment probability
        if (rand(1,1) <= P * fc)
            k(i) = k(i) + 2;
            k(m) = k(m) + 2;
            matrix(i,m) = 1;
            matrix(m,i) = 1;
        end; % if
    end; % for
end; % while m
return;
Listing A.2: A function that generates a Small-World graph (© Marcus Kaiser)

function sw = smallworldgraph(n, e);
% sw = smallworldgraph(n, e)
% yields matrix of a small-world graph with n nodes and e edges
% algorithm described in: Watts & Strogatz, 1998

% constants
K = ceil(e / n);        % neighbors per node
OneSideK = K / 2;
prob = 0.25;            % rewiring probability

% generate initial regular lattice (p = 0)
sw = zeros(n, n);
for i = 1:n
    for j = 1:OneSideK
        % neighboring nodes after node i
        neighbor = i + j;
        if neighbor > n
            neighbor = neighbor - n;
        end;
        sw(i, neighbor) = 1;
        sw(neighbor, i) = 1;
    end;
    for j = 1:OneSideK
        % neighboring nodes before node i
        neighbor = i - j;
        if neighbor < 1
            neighbor = neighbor + n;
        end;
        sw(i, neighbor) = 1;
        sw(neighbor, i) = 1;
    end;
end;

% rewiring (p = prob)
for j = 1:OneSideK
    for i = 1:n
        neighbor = i - j;                  % neighbor j before node i
        if neighbor < 1
            neighbor = neighbor + n;
        end;
        if (sw(i, neighbor) == 1) && (rand < prob)
            dummy = randperm(n);
            while (sw(i, dummy(1)) ~= 0) || (dummy(1) == i)
                dummy = randperm(n);
            end;
            sw(i, neighbor) = 0;
            sw(i, dummy(1)) = 1;
        end;
        neighbor = i + j;                  % neighbor j after node i
        if neighbor > n
            neighbor = neighbor - n;
        end;
        if (sw(i, neighbor) == 1) && (rand < prob)
            dummy = randperm(n);
            while (sw(i, dummy(1)) ~= 0) || (dummy(1) == i)
                dummy = randperm(n);
            end;
            sw(i, neighbor) = 0;
            sw(i, dummy(1)) = 1;
        end;
    end; % for i
end; % for j

% delete all but e edges
edges = sum(sum(sw));
victims = edges - e;
while victims > 0
    dummy = randperm(n);
    i = dummy(1);
    dummy = randperm(n);
    j = dummy(1);
    while sw(i, j) == 0
        dummy = randperm(n);
        i = dummy(1);
        dummy = randperm(n);
        j = dummy(1);
    end;
    sw(i, j) = 0;
    victims = victims - 1;
end;
%edges = sum(sum(sw))

% analysis
%cc = clustercoeff(sw)
%d = density(sw)
%l = asp(sw)
Listing A.3: A function that generates a Spatial-Growth graph (© Marcus Kaiser)

function [matrix, position] = spatialgraph(n, astart, b);
% [matrix, position] = spatialgraph(n, astart, b);
% n      : number of nodes
% astart : starting value of the distance-dependence alpha that remains
%          unchanged as long as astep is set to zero
%          astart -> 0   => network is independent of distances
%          astart >> 10  => only nearby nodes remain
% b      : scaling parameter beta affecting the density of the network
% Author: Marcus Kaiser    Date: 4.09.2002

% constants
NODES = n;
INISIZE = 1;

% parameters
astep = 0;   %.25;
a = astart;

% variables
matrix   = zeros(NODES, NODES);   % connectivity matrix (no distances!)
position = zeros(NODES, 2);       % (x,y) positions of the nodes
distance = zeros(NODES, 1);       % distances of new node to existing nodes

% initial matrix (one initial node at position (0.5, 0.5))
position(1,:) = [0.5 0.5];

n = INISIZE + 1;
while n <= NODES
    position(n,:) = rand(1, 2);   % random position for candidate node
    for i = 1:n-1
        % distance to node n
        distance(i) = sqrt( (position(n,1) - position(i,1))^2 + (position(n,2) - position(i,2))^2 );
        prob = b * exp(-a * distance(i));   % spatial constraint
        if rand(1) <= prob
            matrix(i, n) = 1;
            matrix(n, i) = 1;
        end; % if
    end; % for
    if deg(matrix, n) > 0         % deg: helper function returning the degree of node n
        n = n + 1;
        a = a + astep;
    end; % if
end; % while n
return;
Further MATLAB scripts to repeat the described experiments, etc. are available from the author
upon request.