Exploration of effects of different network topologies on the ESN signal crosscorrelation matrix spectrum

Benjamin Liebald
b.liebald@iu-bremen.de

A thesis submitted in partial satisfaction of the requirements for the degree of Bachelor of Science (BSc.) in the School of Engineering & Science at INTERNATIONAL UNIVERSITY BREMEN

Supervisor: Herbert Jaeger
Spring Semester 2004

Contents

1 Executive Summary
2 Summary Description
3 Motivation and Proposed Research Questions
4 Conducted Experiments
  4.1 The General Test Framework
    4.1.1 The Echo State Network
    4.1.2 The Testing Systems
    4.1.3 ESN Training
    4.1.4 ESN Exploitation
    4.1.5 Computing the Eigenvalue Spread
    4.1.6 Parameter Variation & Validation Schemes
  4.2 Network Topologies
    4.2.1 Scale-Free Nets
    4.2.2 Small-World Nets
    4.2.3 Nets Arising from Spatial Growth
    4.2.4 Why Scale-Free?
    4.2.5 Recursively Built Nets
5 Experimental Results
  5.1 Random Networks
  5.2 Scale-Free Nets
  5.3 Small-World Nets
  5.4 Spatial-Growth Nets
  5.5 Recursive Nets
  5.6 Overall Evaluation
6 Future Investigations
Bibliography
A MATLAB Code

Chapter 1
Executive Summary

Echo State Networks (ESN) have been used successfully in a broad range of applications, from dynamic pattern recognition to channel equalization in digital communication systems [4, 6, 8]. Their simplicity and ease of use, paired with their underlying mathematical power, make them an ideal choice for many black-box modelling tasks. For many applications, however, it is mandatory to learn and adjust the ESN parameters online, i.e. during the actual task that the ESN is supposed to perform. A good example is high-frequency digital communication with cellular phones, where an ESN could be used to cancel out the noise that is injected into the radio-frequency signal. As the characteristics of the noise change over time (since sender and receiver are moving), the ESN needs to change its internal parameters adaptively. So far, certain properties of ESNs make online learning difficult and slow. The aim of this thesis is to investigate how changes in the topological structure of Echo State Networks affect their online learning ability.
My results suggest that the tested methods do not lead to a significant improvement over standard ESN network topologies. However, some ideas for future research in this area are outlined, and some potentially better suited approaches are described.

Chapter 2
Summary Description

Echo State Networks (ESN) [4, 6] are a special form of recurrent neural networks (RNNs) which allow for the black-box modelling of nonlinear dynamical systems. In contrast to normal RNNs, where training is computationally very costly [6], ESNs only adjust the set of output weights leading from the internal nodes to the output nodes. The computation of optimal weights can then be achieved by a simple linear regression in the offline case. ESNs have been applied with great success to a variety of tasks, from inverse system identification to dynamic pattern recognition, often outperforming state-of-the-art methods by orders of magnitude. In many tasks, however, online adaptive learning of the output weights is required, for example in high-frequency communications, where ESNs could be used for channel equalization. The most well-known online learning algorithm, the Least-Mean-Square (LMS) algorithm, is however difficult to use with ESNs, as the cross-correlation matrix of internal states shows a large eigenvalue spread. This leads to very slow convergence of the weight vector w, rendering the LMS algorithm useless with current ESN implementations.

The aim of this thesis is to investigate how changes to the internal topology of ESNs affect the eigenvalue spread of the cross-correlation matrix R. To this end, several different network topologies are used to represent the internal weight matrix of the ESN, among them nets that possess so-called scale-free and small-world properties [9, 1, 12]. These ESNs are then used for a standard system identification task, and the eigenvalue spread of R is observed, as well as the modelling quality of the ESN. The obtained results suggest that none of the investigated network topologies is well suited to decrease the eigenvalue spread of R. A second experiment investigates a large number of very small networks (3-4 internal neurons). Those that exhibit low eigenvalue spread are then used to build larger networks in a recursive fashion. My results suggest that this method is not suited to significantly lower the eigenvalue spread either, but more systematic experiments should be carried out to gather further evidence.

Chapter 3
Motivation and Proposed Research Questions

Echo State Networks (ESN) are recurrent neural networks (RNN) that allow for the black-box modelling of (nonlinear) dynamical systems. In contrast to other RNN approaches, ESNs do not train the input and internal weights of the network. The learning procedure purely consists of finding optimal output weights Wout that lead from internal neurons to output nodes. As the activation function of the output nodes is assumed to be linear, optimal weights can be found by performing a conceptually simple and computationally fast linear regression. This makes ESNs both easy to implement and efficient to use (see Figure 3.1).

Figure 3.1: Traditional training of RNNs and the echo state network approach (from [2])

In many applications, it is necessary to learn the output weights online, i.e. during the actual task that the ESN is supposed to carry out. A good example is channel equalization in high-frequency digital communication, where the channel properties change over time.

The most popular algorithm for online learning of adaptive filters is the Least-Mean-Square (LMS) algorithm (also known as the Widrow-Hoff delta rule or stochastic gradient descent) [3]. The performance of this algorithm, however, depends critically on the eigenvalue spread of the cross-correlation matrix R = E[x(n)x(n)^T] of the input signals x(n). When the ratio s = |λmax(R)/λmin(R)| is large, the LMS algorithm converges very slowly and becomes inefficient. In the ESN case, the internal states of the network (referred to as the "dynamical reservoir") can be seen as the input signals to the LMS algorithm. With all current ESN implementations, their eigenvalue spread is very large, ranging from approximately 10^12 to 10^18 (for network sizes of 400 to 1000 internal neurons; source: personal communication with Herbert Jaeger and own experiments), effectively prohibiting utilization of the LMS algorithm.
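To make this dependence concrete, the following small MATLAB sketch (an illustration written for this text, not part of the thesis experiments) runs the LMS rule on a two-dimensional input that is either uncorrelated or strongly correlated; the target weights, the step size mu and the sequence length are arbitrary choices.

% LMS convergence for inputs with small vs. large eigenvalue spread (illustrative sketch)
T = 5000; mu = 0.01;
w_true = [1; -2];                       % arbitrary target filter
for corr = [0 0.99]                     % correlation between the two input channels
    C = [1 corr; corr 1];               % desired input covariance
    X = chol(C)' * randn(2, T);         % correlated Gaussian input signals
    d = w_true' * X;                    % desired (teacher) signal
    w = zeros(2, 1);                    % LMS weight vector
    for n = 1:T
        e = d(n) - w' * X(:, n);        % instantaneous error
        w = w + mu * e * X(:, n);       % LMS update
    end
    s = max(eig(C)) / min(eig(C));      % eigenvalue spread of the input correlation matrix
    fprintf('spread %.1f -> weight error %.2e\n', s, norm(w - w_true));
end

For the uncorrelated input the weight error becomes negligible, while for the correlated input (eigenvalue spread of about 200) the slowest eigendirection is still far from converged after the same number of updates. This is exactly the effect that makes plain LMS unattractive for current ESNs.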
So far, ESN tasks that required online learning of weights employed the Recursive Least Squares (RLS) filter algorithm. However, this algorithm has a number of disadvantages compared to LMS: it has higher computational cost (quadratic in the filter length, compared to linear computational complexity for LMS) and space complexity, it is more difficult to implement, and it can fall prey to instability [3]. Therefore, it would be highly desirable if the LMS algorithm could be used instead of RLS for ESN online learning tasks. In this thesis, I address the following questions:

• How can the spectrum of the cross-correlation matrix R be changed to permit the usage of LMS?
• In particular, what effect do different network topologies have on the eigenvalue spread of R? The following network topologies will be investigated:
  – Scale-Free nets (Section 4.2.1)
  – Small-World nets (Section 4.2.2)
  – Nets that arise by spatial growth mechanisms (Section 4.2.3)
  – "Recursive nets" that are built by interconnecting lots of small identical nets (Section 4.2.5)
• If a certain topology is able to reduce the spread, does it also deliver a similar performance (in terms of mean square error) as current ESN implementations?

The conducted experiments are described in more detail in Chapter 4.

Chapter 4
Conducted Experiments

The general aim of all conducted experiments was to investigate the influence of different network topologies on the eigenvalue spread of the cross-correlation matrix R = E[x(n)x(n)^T] of internal states x(n). To this end, a general experimental framework was developed, which is described in detail in Section 4.1. The different experiments then generated nets of a certain topology and ran them on the provided framework. The different network types are described in more detail in Sections 4.2.1 to 4.2.5. All experiments were implemented in MATLAB 6 (Release 13). The essential parts of the code can be found in Appendix A.

4.1 The General Test Framework

The general testing framework consists of several parts:

• An Echo State Network.
• A discrete-time input/output (I/O) sequence (divided into a training and a testing part).
• An algorithm that computes the optimal weights from the training part of the I/O sequence.
• A testing part which tests the computed weights on the testing data of the I/O sequence.
• A part that computes the eigenvalue spread of the cross-correlation matrix R.

Each of these parts is described in the following sections.
4.1.1 The Echo State Network

An Echo State Network is a type of recurrent neural network. It is fully characterized by its weight matrices and activation functions. In particular, an ESN consists of

• K input units u(n) = (u1(n), . . . , uK(n))^T,
• N internal units x(n) = (x1(n), . . . , xN(n))^T,
• L output units y(n) = (y1(n), . . . , yL(n))^T,
• an N × K input weight matrix Win,
• an N × N internal weight matrix W,
• an L × (K + N + L) output weight matrix Wout,
• possibly an N × L backprojection weight matrix Wback,
• an activation function f (usually tanh or another sigmoidal function),
• an output function f_out (in our case usually linear).

Input signals are fed into the input units and propagate into the internal units. Formally,

x(n + 1) = f(Win u(n + 1) + W x(n) + Wback y(n))     (4.1)

The output at time n + 1 is then computed as

y(n + 1) = f_out(Wout (u(n + 1), x(n + 1), y(n)))     (4.2)

In traditional implementations of RNNs, all weights were trained to adjust the output. Standard training procedures to achieve this have a high computational complexity and can only find local minima (for a good overview of the most common training techniques for RNNs, see [6]). Therefore, the size of these RNNs was usually limited to 3 to 30 internal units. In the echo state network approach, the internal weight matrix W, the input weight matrix Win and the backprojection matrix Wback are defined initially and are not affected by the training procedure. Only the output weight matrix Wout is trained. Since the activation function of the output units is linear, this can be achieved by computing a simple linear regression [4, 6]. In contrast to traditional RNNs, Echo State Networks are therefore usually quite large, with hundreds or even thousands of internal units. This is necessary to have a rich set of dynamics from which the output signal can be "composed". The exact number of internal neurons N is task specific. It should not be too small, because this would prohibit learning of complex dynamics with long memory. On the other hand, it should not be too large either, because this would lead to overfitting of the training data. This is the classical bias-variance dilemma which arises in all supervised learning tasks. In the test cases described in this thesis, N was always set to 500.

The procedure to generate the weight matrices Win, Wback, Wout and W is as follows (adapted from [6]); a short illustrative sketch of the resulting update loop follows the list:

• Win: These weights can, in all cases, simply be drawn from a uniform distribution over [−a, a]. The bigger a, the more input-driven the network becomes, i.e. if a is large, the internal states will be strongly dominated by the input, making the sigmoid activation functions operate in the non-central, non-linear part. In the extreme case, large a leads to binary switching dynamics of internal nodes (i.e. their activation is either +1 or −1). For smaller a, the network operates close to its zero state, i.e. the state attained if no input is given, and the sigmoid activation functions are operated in the central, linear area. In my implementation a was usually set to 1, if not mentioned otherwise.
• Wback: Similar remarks as for Win apply. Large a will lead to a strong dependence on the output, small a to a much more subtle excitation. In my implementation a was usually set to 1, if not mentioned otherwise.
• Wout: This is the only matrix that is trained by the ESN training algorithm (see Section 4.1.3). For the training phase, these weights are irrelevant and can be set to arbitrary values (usually simply the zero vector 0).
• W: In standard ESN implementations, W is a sparse random matrix W0 which is scaled by its largest eigenvalue to obtain the echo state property: W = α · W0/|λmax|, where λmax is the largest eigenvalue of W0 and α < 1 is a scaling constant (the spectral radius), which depends on the specific task at hand (in all my experiments, α = 0.8). In my experiments, however, W0 was not created by a random procedure but generated algorithmically (as outlined in Sections 4.2.1 to 4.2.5). These nets were then scaled by the same procedure as above.
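With the weight matrices generated as above, equations 4.1 and 4.2 amount to only a few lines of MATLAB. The following sketch is an illustration written for this text (small random reservoir, arbitrary random input, untrained output weights), not the code used in the experiments:

% One pass through the update equations (4.1) and (4.2) on random data (illustrative sketch)
K = 2; N = 50; L = 1; T = 100;
Win   = 2 * rand(N, K) - 1;                        % input weights, uniform over [-1, 1] (a = 1)
Wback = 2 * rand(N, L) - 1;                        % backprojection weights (a = 1)
W0    = (rand(N) < 0.05) .* (rand(N) - 0.5);       % sparse random internal weights
W     = 0.8 * W0 / max(abs(eig(W0)));              % scale to spectral radius alpha = 0.8
Wout  = zeros(L, K + N + L);                       % output weights, still untrained (all zero)
u = rand(K, T);                                    % some arbitrary input sequence
x = zeros(N, 1); y = zeros(L, 1);
for n = 1:T-1
    x = tanh(Win * u(:, n+1) + W * x + Wback * y); % equation (4.1)
    y = Wout * [u(:, n+1); x; y];                  % equation (4.2) with a linear output function
end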
4.1.2 The Testing Systems

The Echo State Network created as described in Section 4.1.1 was then driven by two input/output sequences:

• a NARMA system,
• the Mackey-Glass chaotic attractor time series.

The NARMA System

A NARMA (Nonlinear AutoRegressive Moving Average) system is a discrete-time system of the following form:

y[n] = f(y[n − 1], y[n − 2], . . . , y[0], u[n], u[n − 1], . . . , u[0])

where y[n] is the system output at time n, u[n] is the system input at time n, and f is an arbitrary vector-valued, possibly non-linear function. The characteristic property of these systems is that the current output depends on both the input and the output history. Modelling these systems is, in general, quite difficult, due to the arbitrary non-linearity and possibly long memory (i.e. y[n] might depend on many former inputs and outputs). The particular system that was chosen for our test had an input/output dimension of 2 and was described by the following equations:

y1[n] = u2[n − 5] · u2[n − 10] + u2[n − 2] · y2[n − 2]
y2[n] = u2[n − 1] · u2[n − 3] + u2[n − 2] · y1[n − 2]

As inputs we chose u1[n] = 1 for all n (a so-called bias input, only relevant for the network) and u2[n] as random samples drawn from a uniform distribution over the interval [0, 1]. The output samples were then computed using the update equations above. All sequences had a length of 5000 samples each. A sample plot of both outputs is shown in Figure 4.1.

Figure 4.1: The two outputs of the NARMA system (for a sequence of 500 samples)

The Mackey-Glass Chaotic Attractor

In addition to the NARMA system described above, some ESNs were also tested on the Mackey-Glass time series (MGS), a standard benchmark in the time series prediction community. This system has no inputs and only one output, which is computed as follows:

y[n + 1] = y[n] + δ · ( 0.2 · y[n − τ/δ] / (1 + y[n − τ/δ]^10) − 0.1 · y[n] )

where δ is a stepsize parameter that arises from the discretization of the original continuous-time MG equation, and τ is a delay parameter that influences the degree of chaos the MGS features. A standard choice is τ = 17, which produces a "mildly chaotic" behaviour, and δ = 0.1 with subsequent subsampling by 10. It is important to note that the evolution of the MGS depends on the initial values of y (y[0], . . . , y[τ/δ]). These were simply random samples drawn from a uniform distribution over [1.1, 1.3]. Using this method, we generated 4100 samples, of which the first 100 were discarded to wash out initial transients. Although the MGS output does not depend on any input sequence, we still created a bias input of the same length, which is simply used by the network in its state update procedure. A sample plot of 1000 samples from the MGS output is shown in Figure 4.2.

Figure 4.2: A sample plot of the Mackey-Glass time series (1000 timesteps)
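Both test sequences are straightforward to reproduce. The sketch below is an illustration written for this text rather than the author's original script; it generates a NARMA I/O sequence and a Mackey-Glass series with the parameter values stated above (initialising the first NARMA outputs to zero is an assumption made here).

% Generate the two test sequences of Section 4.1.2 (illustrative sketch)
T = 5000;
u = [ones(1, T); rand(1, T)];             % bias input u1 = 1 and uniform input u2
y = zeros(2, T);
for n = 11:T                              % start after the longest delay (10 steps)
    y(1, n) = u(2, n-5) * u(2, n-10) + u(2, n-2) * y(2, n-2);
    y(2, n) = u(2, n-1) * u(2, n-3)  + u(2, n-2) * y(1, n-2);
end

tau = 17; delta = 0.1; hist = round(tau / delta);   % Mackey-Glass parameters
M = 4100 * 10 + hist;                     % fine-grained samples before subsampling by 10
z = zeros(1, M);
z(1:hist+1) = 1.1 + 0.2 * rand(1, hist+1);          % random initial history over [1.1, 1.3]
for n = hist+1:M-1
    z(n+1) = z(n) + delta * (0.2 * z(n-hist) / (1 + z(n-hist)^10) - 0.1 * z(n));
end
mgs = z(hist+1:10:end);                   % subsample by 10
mgs = mgs(101:end);                       % discard the first 100 samples (transients)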
4.1.3 ESN Training

Given the ESN and the I/O sequences from Sections 4.1.1 and 4.1.2, the network can now be trained to learn the system characteristics. To this end, the available I/O sequences are divided into three parts:

1. An initial part, which is not used for training but serves the purpose of getting rid of initial transients in the network's internal states (length ℓinit).
2. A training part, which is used in the actual learning procedure of adjusting the output weights (length ℓtrain).
3. A testing part, which is used to test the newly trained network on additional data (length ℓtest).

The training of the network is done as follows. First, the network's internal state vector x is initialized to random values. Then, the system is run on the initial part and the training part of the I/O sequence, i.e. input samples are written into the input nodes, output samples are written into the output nodes, and the internal states of the next timestep are computed according to equation 4.1. It is important to note that the output update equation 4.2 is not used during training, since the output weights are not yet set to their final values. Instead, the output nodes are simply overwritten by the output part of the I/O sequence ("teacher forcing"). The internal state vector is recorded for each sample that is part of the training sequence, after the network has been run on the initial sequence to remove transient effects. From the set of these state vectors, one can compute the N × 1 cross-correlation vector of internal states with the desired output,

p = E[x(n)d(n)]     (4.3)

where d(n) is the desired output signal at time n, and the N × N cross-correlation matrix

R = E[x(n)x(n)^T]     (4.4)

which provides a measure of the correlation between different internal states. The optimal output weights can then be calculated according to the Wiener-Hopf equation (see [3] for a derivation of this formula):

Wout = R^(−1) p     (4.5)

This computation can be shown to be equivalent to a least-mean-square solution obtained by computing the pseudo-inverse of the state vector x(n).

4.1.4 ESN Exploitation

After the computation of optimal weights according to equation 4.5, the network is run on the remaining I/O sequence to test whether the new solution generalizes well to unknown data. In this so-called exploitation phase, the output of the system is not known anymore; only the input signal is given to the ESN. As before, internal states are updated according to equation 4.1. In contrast to the training phase, the output units are not forced to the teacher signals but are computed according to equation 4.2. As a standard error measure, a normalized root mean square error is computed as follows:

NRMSE = sqrt( ( Σ_{i=1..ℓtest} (ytest[i] − dtest[i])^2 ) / (ℓtest · σ^2) )

where ytest[n] is the network output during the testing phase, dtest[n] is the desired output during the testing phase, and σ^2 is the variance of the desired output signal.

4.1.5 Computing the Eigenvalue Spread

The main aim of this thesis is to investigate the influence of different topologies on the eigenvalue spread s of the cross-correlation matrix R. Therefore, this spread needs to be computed. Since R has already been computed in order to solve equation 4.5 (Wiener-Hopf), the computation of s is straightforward:

s ≡ λmax(R) / λmin(R)     (4.6)

where λmax(R) and λmin(R) are the largest and smallest eigenvalues of R, respectively.
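Taken together, Sections 4.1.1 to 4.1.5 boil down to a short script. The following sketch is a deliberately simplified illustration written for this text (a scalar teacher signal, no input units, only output feedback, a small reservoir and a fixed washout length), not the thesis code, but it exercises the same steps: teacher-forced state collection, the Wiener-Hopf solution 4.5, the NRMSE and the eigenvalue spread 4.6.

% Minimal end-to-end run: train with teacher forcing, test, report NRMSE and spread (sketch)
N = 100; Ttrain = 2000; Ttest = 500; Tinit = 100;
T = Ttrain + Ttest;
d = sin(0.2 * (1:T)) .* cos(0.031 * (1:T));          % arbitrary teacher signal
W0 = (rand(N) < 0.05) .* (rand(N) - 0.5);
W  = 0.8 * W0 / max(abs(eig(W0)));                   % spectral radius 0.8
Wback = 2 * rand(N, 1) - 1;
x = zeros(N, 1); X = zeros(N, Ttrain - Tinit); D = zeros(1, Ttrain - Tinit);
for n = 2:Ttrain                                     % teacher forcing (equation 4.1)
    x = tanh(W * x + Wback * d(n-1));
    if n > Tinit                                     % discard initial transients
        X(:, n - Tinit) = x; D(n - Tinit) = d(n);    % state and desired output
    end
end
R = (X * X') / size(X, 2);                           % equation (4.4)
p = (X * D') / size(X, 2);                           % equation (4.3)
Wout = R \ p;                                        % Wiener-Hopf solution (4.5)
y = zeros(1, Ttest); yprev = d(Ttrain);              % last teacher value starts the free run
for n = 1:Ttest
    x = tanh(W * x + Wback * yprev);                 % exploitation: own output is fed back
    y(n) = Wout' * x; yprev = y(n);
end
e = y - d(Ttrain+1:T);
NRMSE = sqrt(mean(e.^2) / var(d(Ttrain+1:T)));
spread = max(eig(R)) / min(eig(R));
fprintf('NRMSE = %.3f, eigenvalue spread = %.2e\n', NRMSE, spread);

The warning MATLAB typically prints about R being close to singular when solving for Wout is, in fact, a symptom of the large eigenvalue spread this thesis is about.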
4.1.6 Parameter Variation & Validation Schemes

In order to achieve reliable results and eliminate outliers, multiple runs of the same experiment were conducted.

1. Each network topology had one or more adjustable parameters. These were systematically varied within sensible intervals in order to avoid using the wrong net configuration. The particular choices of parameters are given in the sections describing the particular network topologies (Section 4.2).
2. For given network parameters, the test runs were repeated ten times in order to average out outliers. Both the mean and the standard deviation of the test error and the spread were computed.

4.2 Network Topologies

The aim of this thesis is to investigate the influence of different network topologies on the eigenvalue spread of R, as described earlier. In the following sections, I will describe the kinds of networks that I used in my experiments, why they might prove useful, and how they can be generated.

4.2.1 Scale-Free Nets

Networks are said to be scale-free if the probability P(k) of a node having k connections to other nodes follows a power-law distribution P(k) ∝ k^(−γ) (with γ typically between 2 and 4), i.e. there are very few highly connected nodes, whereas most nodes have only very few connections. Many real-world networks have been shown to be scale-free, for example scientific or Hollywood actor collaboration graphs, the World Wide Web [1], or the German highway system [9]. For a visualization of a scale-free graph, see Figure 4.3.

Figure 4.3: A scale-free graph of size 100 (generated with MATLAB)

It has been suggested that the scale-free organization of nodes arises from phenomena that are typical for many networks that occur in nature: growth over time and preferential attachment. Growth over time means that most naturally occurring networks don't come into existence with a fixed size but expand over time (new locations being added to the highway system, for instance). The idea of preferential attachment suggests that newly added nodes are likely to connect to those nodes in the net that already have a high connectivity. It was shown by Barabási [1] that networks constructed according to these principles feature a power-law distribution of P(k). An algorithm to generate scale-free networks is given in pseudo-code below; a MATLAB implementation can be found in Appendix A. It should be noted that this algorithm generates undirected graphs, i.e. if node a is influenced by node b, then node b is also influenced by node a. This symmetry is a potential drawback which could be remedied by separately checking for each direction whether a link should be established. The parameter k can be used to tune the density of the net. If k is large, the probability of establishing a link grows, producing dense networks. If k is very low, the network will consequently feature very low connectivity.

Algorithm 1: Generate a scale-free net of size N
1: Let the net be represented by G.
2: Start with a small net of m (3 to 10) initial nodes, randomly connected to each other.
3: while m < N do
4:   generate a new node a, initially unconnected
5:   for all nodes b that are already part of the net do
6:     deg(b) ← number of nodes that b is connected to
7:     edges(G) ← total number of edges in G
8:     if rand() < k · deg(b)/edges(G) then {connect the new node to the old node with a probability proportional to the connectivity of the old node}
9:       Create a new undirected link between a and b
10:    end if
11:  end for
12:  m ← m + 1
13: end while

After creating a network with a scale-free topology in this manner, the weights were changed to random values, since simple 0-1 connections seemed unreasonable. This was achieved by simple element-wise multiplication with a random matrix of the same size, where the random values were drawn from a uniform distribution over [−0.5, 0.5]. In the next step, the spectral radius of the weight matrix was adjusted as described in Section 4.1.1. In my experiments I tested Scale-Free nets with the following choice of parameters:

k ∈ (0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
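The conversion from a 0-1 adjacency matrix to an ESN weight matrix is the same for all topologies in this chapter. Under the choices stated above (uniform weights over [−0.5, 0.5], spectral radius 0.8) it can be sketched as follows; a plain random adjacency matrix stands in for the output of scalefreegraph or smallworldgraph so that the snippet is self-contained.

% Turn a 0-1 adjacency matrix A into an ESN weight matrix W (illustrative sketch)
N = 500;
A = rand(N) < 0.02;                      % stand-in for the output of a topology generator
W0 = A .* (rand(N) - 0.5);               % replace 0-1 links by uniform weights over [-0.5, 0.5]
W  = 0.8 * W0 / max(abs(eig(W0)));       % rescale so that the spectral radius is alpha = 0.8
density = nnz(A) / N^2;                  % fraction of non-zero connections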
4.2.2 Small-World Nets

In addition to the scale-free property described above, many real-world networks have been shown to possess the so-called Small-World property: their average shortest path (ASP) is comparable to that of random networks of the same size, while they maintain a much higher clustering coefficient than random networks. The ASP is the average path length of all shortest paths between any two nodes in the network. The clustering coefficient is a measure that quantifies the amount of interconnections between groups of nodes: if node a is connected to a set of nodes S, then the clustering coefficient of a is directly proportional to the number of connections within the set S ∪ {a} (a short sketch for computing both quantities is given at the end of this section). Some examples of real-world Small-World nets include scientific and actor collaboration graphs, subparts of the US power grid, the neural network of the nematode worm C. elegans (the only completely mapped natural neural network), and subparts of mammal brains [12, 9]. Small-World nets have become famous due to the "six degrees of separation" thesis by Watts et al. [11], which states that two humans are connected by a link chain with an average length of 6. Watts and Strogatz [12] used the algorithm below to construct and quantify Small-World nets (a MATLAB implementation is given in Appendix A).

Algorithm 2: Generate a Small-World net of size N with E edges
1: Let p be a probability measure between 0 and 1
2: Let G represent the graph
3: Generate a regular lattice of undirected links between nodes. For example, arrange the nodes in a circular fashion and let each node be connected to its two closest neighbours on either side.
4: for all edges e in the net do
5:   if rand() < p then {rewire with probability p}
6:     delete edge e from the net
7:     add a new edge between one of the nodes the original edge belonged to and a randomly chosen node
8:   end if
9: end for
10: {Add more edges to reach a total of E edges}
11: while edges(G) < E do
12:   add another edge between two randomly chosen nodes
13: end while

The parameter p can be used to control the "randomness" of the net. If p = 1, the net is totally random, as all edges have been rewired. If p = 0, the net is totally regular, as the original regular arrangement has been maintained (assuming that no additional edges needed to be added). Watts and Strogatz observed that a small increase in p away from 0 drastically lowers the average path length, since random connections are introduced into the network which in most cases form shortcuts between formerly distant nodes, whereas the average clustering coefficient remains almost unaffected. A Small-World net with 50 nodes and 155 edges, generated with the above algorithm (with p = 1/4), is shown in Figure 4.4.

Figure 4.4: A Small-World graph of size 50 (generated with MATLAB)

As with the Scale-Free net algorithm, the generated nets are undirected, leading to a symmetric matrix. This could again be remedied by starting off with a regular lattice of directed links and rewiring these. After the nets were generated as described above, they were again multiplied element-wise with a random matrix (uniform distribution over [−0.5, 0.5]) and scaled to achieve a spectral radius smaller than 1, as described in Section 4.1.1. I used the following set of parameters to create Small-World nets of varying density (the network size in all cases was 500):

E ∈ (1e+5, 5e+4, 1e+4, 5e+3, 1e+3)
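The ASP and the clustering coefficient introduced at the beginning of this section can be computed directly from an adjacency matrix. The function below is a sketch written for this text (it assumes an undirected 0-1 matrix and averages shortest paths over connected node pairs only), not code from the thesis.

% Average clustering coefficient and average shortest path of an undirected 0-1 graph (sketch)
function [cc, asp] = smallworldmeasures(A)
n = size(A, 1);
A = (A | A') & ~eye(n);                       % make symmetric, remove self-loops
c = zeros(n, 1);
for i = 1:n
    nb = find(A(i, :));                       % neighbours of node i
    k = numel(nb);
    if k > 1
        c(i) = sum(sum(A(nb, nb))) / (k * (k - 1));   % fraction of realised neighbour links
    end
end
cc = mean(c);
total = 0; pairs = 0;
for i = 1:n                                   % breadth-first search from every node
    dist = -ones(n, 1); dist(i) = 0; queue = i;
    while ~isempty(queue)
        v = queue(1); queue(1) = [];
        nxt = find(A(v, :) & (dist' < 0));    % unvisited neighbours of v
        dist(nxt) = dist(v) + 1;
        queue = [queue, nxt];
    end
    reached = (dist > 0);
    total = total + sum(dist(reached));
    pairs = pairs + sum(reached);
end
asp = total / pairs;                          % average over connected ordered pairs
end

Sweeping the rewiring probability p of Algorithm 2 and plotting both quantities reproduces the small-world regime described above: the ASP drops quickly while the clustering coefficient stays high.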
4.2.3 Nets Arising from Spatial Growth

A third algorithm to generate nets with Scale-Free and Small-World properties was proposed by Kaiser and Hilgetag [9]. In their approach, nodes in the network have spatial information associated with them, i.e. they are located in some multidimensional metric space. When new nodes are added to the network, they are likely to form connections to nodes that are close to them (where "closeness" is simply quantified by the Euclidean distance between the two nodes). This approach makes sense when one considers some real-world networks: for a city in New England it makes more sense to get connected to the Boston highway hub than to the one of Los Angeles, even though Los Angeles is bigger (i.e. has higher connectivity). An algorithm that generates such spatial-growth nets is given below (a MATLAB implementation can be found in the appendix). A spatial-growth net generated with this algorithm can be seen in Figure 4.5.

As can be seen from the algorithm, the parameter α can be used to adjust the distance sensitivity of the probability P(a, b) of forming a link between a and b, whereas β can be used to adjust the overall density of the net. Kaiser and Hilgetag classify nets into different categories (including Small-World and Scale-Free nets) depending on these parameter choices [9]. It should be noted that the algorithm returns an adjacency matrix like the two other algorithms, but in addition also a 2D coordinate for each node. For my purposes, however, this position information is simply discarded, and the remaining weight matrix is multiplied element-wise with a random matrix to get rid of simple 1-0 connections. Another possibility would be to adjust the weights between two nodes depending on their distance, i.e. nodes with a great distance would have weak connections (small weights), whereas nodes that are close to each other would be characterized by strong connections.
Algorithm 3: Generate a Spatial-Growth net (in 2D space)
1: Given the parameters α and β and the desired total number of nodes N
2: Let the net be represented by G
3: Start with a single node at (0.5, 0.5)
4: k ← 1
5: while k < N do
6:   create a new node a at a random position in 2D space with coordinates in the interval [0, 1]
7:   c ← 0
8:   for all nodes b ∈ G do
9:     Calculate the distance between a and b: d(a, b) = sqrt((ax − bx)^2 + (ay − by)^2)
10:    Calculate the probability of forming a connection: P(a, b) = β · e^(−α · d(a,b))
11:    if rand() < P(a, b) then
12:      Add an undirected link between a and b to G
13:      c ← c + 1
14:    end if
15:  end for
16:  if c = 0 then {no connections could be established}
17:    delete node a from G
18:  else {at least one connection was established, keep the node in the net}
19:    k ← k + 1
20:  end if
21: end while

Figure 4.5: A Spatial-Growth graph of size N = 100, α = 5.0, β = 0.1 (generated with MATLAB)

I varied the parameters α and β over the following sets (all possible combinations were tested):

α ∈ (0, 0.5, 1, 1.5, 2, 2.5, 3, 5, 7, 9)
β ∈ (0.1, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)

4.2.4 Why Scale-Free?

The rationale for using Scale-Free, Small-World, and similar nets as a model for the internal ESN network topology is threefold:

1. As has been mentioned before, Scale-Free and Small-World nets occur in many different domains in nature. Especially interesting is the fact that parts of mammal brains have been identified as essentially Small-World nets [9]. In addition, it has also been shown that such structures can arise even when nets are artificially constructed; for example, parts of the Java class library have been identified as both Scale-Free and Small-World [10]. It is therefore hoped that such apparently natural structures might also lead to an improvement in the ESN case.
2. The second reason is connected to the strong clustering of Small-World (and, in part, Scale-Free) nets. With random matrices, no particular clustering occurs, which intuitively leads to internal dynamics that are "averaged out" between different nodes. This in turn leads to a relatively strong correlation between states. Strong correlations are connected to a very uneven power density spectrum of x(n), which can be shown to negatively affect the eigenvalue spread (for a more rigorous argument, see [3]; a small numerical illustration follows this list). In contrast to these random nets, Small-World and Scale-Free nets feature a strong clustering of nodes. Intuitively, this should lead to network dynamics that are concentrated in these clusters, i.e. nodes within one cluster show similar dynamics, whereas states of different clusters should essentially be uncorrelated, leading to an overall lower correlation between pairs of nodes.
3. Even if it turns out that the overall correlation is not significantly decreased by the clustering, it might be possible to select a few representative nodes from each cluster and use only these as the dynamical reservoir from which the output is composed. Using fewer nodes always leads to a decrease in spread, but is usually inappropriate, as the learning capability of the network suffers from this reduction. However, if nodes are used that appropriately represent the dynamics of an entire cluster, the resulting dynamics might still be rich enough to approximate the original system well enough.
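The connection claimed in point 2, namely that stronger pairwise correlation between the state signals inflates the eigenvalue spread of R, can be checked with a tiny numerical experiment. The sketch below is an illustration written for this text; the shared-component construction is simply an arbitrary way to dial in a prescribed correlation.

% Eigenvalue spread of R for weakly vs. strongly correlated signals (illustrative sketch)
T = 10000; N = 50;
base = randn(N, T);                      % independent signals, one per "internal state"
for rho = [0 0.5 0.99]
    shared = repmat(randn(1, T), N, 1);  % component common to all signals
    X = sqrt(1 - rho) * base + sqrt(rho) * shared;   % pairwise correlation approximately rho
    R = (X * X') / T;
    fprintf('correlation %.2f -> eigenvalue spread %.1e\n', rho, max(eig(R)) / min(eig(R)));
end

With 50 signals, raising the pairwise correlation from 0 to 0.99 increases the spread from close to 1 to several thousand, i.e. by more than three orders of magnitude.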
4.2.5 Recursively Built Nets

Another approach, essentially unconnected to the aforementioned ones, is presented in this section. The idea is to find very small nets (3-5 nodes) that exhibit a comparably small spread and to investigate

• which topological features are shared by these nets,
• how many of these small nets can be combined to produce large networks (100 or more nodes).

The idea of starting off with small nets and iteratively building larger networks from them was inspired by a talk given by Viktoras Jucikas in the ESN Guided Research Seminar at IUB. However, he only presented the rough idea; the concrete implementation that I am describing was designed by me. My experiments in this area are far from exhaustive. The very simple and naive approach I have taken in this thesis is to view the adjacency matrix of a net as a bitstring, i.e. each entry in the adjacency matrix is either 1 or 0. Here is a simple example. Consider the net shown in Figure 4.6, consisting of 4 nodes.

Figure 4.6: A small net, consisting of only 4 nodes (generated with MATLAB)

The corresponding adjacency/ESN weight matrix is

W = ( 0 1 1 0
      0 0 0 0
      0 1 0 0
      1 0 0 0 )

The corresponding bitstring is then 0001001000000110 (the least significant bit corresponding to the entry (1,1) of the matrix, with the remaining entries following row by row). For a 3 × 3 matrix there are 2^9 = 512 possible matrices, whereas for a 4 × 4 matrix there are 2^16 = 65536 possible matrices. Of course, many of them will be essentially identical (since the ordering of nodes is arbitrary), but this was simply neglected in my approach. In order to find nets with good spread, I simply iterated over all possible 1-0 matrices of size 4 × 4 and used them as ESN weight matrices (after multiplying element-wise with a random matrix and scaling to achieve a spectral radius smaller than 1). After finding those nets that exhibited low spread (averaged over several runs), I tried to combine them into bigger nets as follows:

1. Given an N × N weight matrix WN, construct a 2N × 2N matrix W2N:

   W2N = ( WN  0
            0  WN )

2. In order to allow for some interaction between the two clusters, add a few random connections between them (i.e. change some of the elements in the off-diagonal blocks of W2N into 1's).
3. Repeat steps 1 and 2 until the desired size is reached.
4. In order to obtain suitable ESN weight matrices, multiply element-wise with a random matrix and scale to obtain a spectral radius smaller than 1, as described before.

Using this algorithm, I constructed nets of size 512 and tested them on the same datasets as the other nets.
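A compact way to express this construction in MATLAB is sketched below (an illustration for this text, not the thesis code). The decoding convention follows the bitstring example above; the number of random inter-cluster links added per doubling step, here four in each direction, is an arbitrary choice.

% Recursive block-diagonal expansion of a small 0-1 net into a large ESN weight matrix (sketch)
code = bin2dec('0001001000000110');            % the example bitstring from above
seed = reshape(bitget(code, 1:16), 4, 4)';     % decode it row by row, LSB = entry (1,1)
A = seed;
while size(A, 1) < 512
    n = size(A, 1);
    A = [A, zeros(n); zeros(n), A];            % step 1: block-diagonal doubling
    for c = 1:4                                % step 2: a few random links between the clusters
        A(ceil(n * rand), n + ceil(n * rand)) = 1;
        A(n + ceil(n * rand), ceil(n * rand)) = 1;
    end
end
W0 = A .* (rand(size(A)) - 0.5);               % step 4: random weights over [-0.5, 0.5] ...
W  = 0.8 * W0 / max(abs(eig(W0)));             % ... and rescaling to a spectral radius of 0.8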
Chapter 5
Experimental Results

In this chapter, I will summarize the results that I obtained. Since I conducted a lot of experiments with different parameters, I will only present a few representative results in more detail. A complete table of all results can be found at http://pandora.iu-bremen.de/ bliebald/ESN (MATLAB .mat files). In all cases, I will only present the results obtained with the NARMA system, since I did not have the computational resources to do all experiments with the MGS as well. For comparison, I will first present results for ESNs represented by a random sparse internal weight matrix. These random matrices are the standard choice for most applications as of now. I will then compare these results to the ones obtained with different network topologies.

5.1 Random Networks

Like all other networks, the random networks that I investigated had a size of 500 internal nodes and a spectral radius of 0.8. I used nets of different density d:

d ∈ (1e−6, 1e−5, 1e−4, 1e−3, 1e−2, 1e−1)

As mentioned before, each test was conducted ten times and the results were averaged. The best results, regarding both spread and testing error, were achieved with nets of comparably high density (1% and 10%, respectively). This might simply be because all other nets had a density that is not used in practice. For example, for a net of 500 nodes, a density of 1e−5 means that there are only around 3 edges in total! Densities of 1-10% are simply more realistic for real-world tasks, so the parameters should probably have been taken from a more representative set. Nevertheless, the almost identical values for nets of 1% and 10% density, together with additional, less systematic tests, suggest that there is hardly any difference for nets in this regime, at least in terms of spread. The most important results are summarised in Table 5.1.

network density | spread mean | spread std. dev. | test error mean | test error std. dev.
1% (1e−2)       | 3.91e+13    | 1.3533e+13       | 0.5171          | 0.6162
10% (1e−1)      | 2.45e+13    | 1.0625e+13       | 0.3102          | 0.0678

Table 5.1: Results for Random Networks of density 1% and 10%

For simplicity, the testing error is only given for the first output of the network. The results for the second output are similar and can be found at http://pandora.iu-bremen.de/ bliebald/ESN. As is clearly visible from these results, the eigenvalue spread is quite large for random networks, but this was, of course, expected.

5.2 Scale-Free Nets

We expected better results for Scale-Free nets. Unfortunately, this hope turned out to be wrong. The best results that could be achieved within our parameter variation are listed in Table 5.2.

network density | spread mean | spread std. dev. | test error mean | test error std. dev.
1.8% (k = 4.5)  | 6.3238e+13  | 5.2006e+13       | 0.3246          | 0.0564
2% (k = 5)      | 5.3822e+13  | 3.7233e+13       | 0.8215          | 1.3709

Table 5.2: Results for Scale-Free Networks of density 1.8% and 2%

Even though the average spread is slightly better for k = 5 than for k = 4.5, the testing error is far worse, and the standard deviation of the error is quite large. This is striking, since the overall density is almost the same for both parameters. It is also interesting to note that the spread's standard deviation is larger than in the random case. I am not sure why this is the case. Overall, there is no significant improvement over random nets; Scale-Free nets actually appear to perform worse or equally badly.

5.3 Small-World Nets

Unfortunately, Small-World nets did not deliver a better performance than random networks either. The most important results are summarized in Table 5.3.

network density | spread mean | spread std. dev. | test error mean | test error std. dev.
2% (E = 5e+3)   | 2.5567e+13  | 1.3537e+13       | 0.3224          | 0.0659
40% (E = 1e+5)  | 2.4813e+13  | 2.3269e+13       | 0.2960          | 0.0557

Table 5.3: Results for Small-World Networks of density 2% and 40%

These results are overall in very similar ranges to those of the random networks, regarding both the spread and the standard deviation. Small-World nets don't seem to perform worse, but there is also no significant improvement over random networks.
5.4 Spatial-Growth Nets

As was already expected from the previous results, Spatial-Growth nets also did not perform significantly better than random networks. The most important results are summarized in Table 5.4.

network density        | spread mean | spread std. dev. | test error mean | test error std. dev.
47% (α = 5, β = 4.5)   | 1.3862e+13  | 6.4638e+12       | 0.2471          | 0.0376
99% (α = 0, β = 2.5)   | 1.4262e+13  | 4.5347e+12       | 0.2835          | 0.0517
99% (α = 1, β = 4)     | 1.5148e+13  | 6.0675e+12       | 0.2599          | 0.0502

Table 5.4: Results for Spatial-Growth Networks of density 47% and 99%

These results are all in all very surprising. Spatial-Growth nets seem to be the only nets that perform better than random networks, though not substantially (1.38e+13 compared to 2.45e+13 in the best cases). However, the nets with the best performance show a surprisingly high density, which does not allow for any particular clustering (with 99% density, almost all nodes are connected to each other, making clustering virtually impossible). Looking at Scale-Free, Small-World, and Spatial-Growth nets as a whole, it seems that their topology has very little influence on the eigenvalue spread or the testing error of the ESN. I cannot give a definite explanation why this is the case. An obvious reason might be that topology is simply irrelevant for the spread. However, there are nets with special topologies (like feedforward nets) that have a very low spread, so topology should play some role. Another possible reason might be that the parameters were chosen incorrectly, and that different parameters would have led to better results. However, this is also rather unlikely, since sensible parameter ranges were chosen for all three network types and the spread varied only very little over all investigated parameter choices.

5.5 Recursive Nets

The results for recursive nets can be split into two parts. First, I ran an experiment over all possible 1-0 combinations of 4 × 4 weight matrices (using the NARMA system). As usual, the matrices were multiplied element-wise with random values between −0.5 and 0.5. The best performing nets are shown in Table 5.5.

network (4 × 4 seed) | spread mean | spread std. dev. | test error mean | test error std. dev.
net 1                | 26390       | 8811             | 0.9844          | 0.0316
net 2                | 27710       | 8147             | 0.9475          | 0.0294
net 3                | 30612       | 13404            | 0.9135          | 0.0487

Table 5.5: Results for small nets of size 4 (averaged over 5 runs each)

By investigating the ten best performing networks, it is striking that all of them have a relatively high density of 40% or higher (corresponding to at least 6 non-zero elements in the matrix). Other than that, it is hard to find any similarities. Some of the matrices have single dependent nodes (i.e. nodes that have only one incoming connection), some don't; some of the matrices have empty rows (corresponding to nodes with no incoming connections), some don't.

In the second part of this experiment, I tried to expand these well-performing nets into bigger ones, in order to make them useful for real ESNs. As mentioned before, I used the simple block-diagonal expansion to produce matrices of size 512, which corresponds to 7 "doubling steps". As the resulting nets are slightly larger than the nets used before, an exact comparison is impossible. However, the difference between 512 and 500 is sufficiently small to compare results in a qualitative fashion. The performance of the three nets from above, expanded to 512 nodes, is summarized in Table 5.6.
network (seed expanded to 512 nodes) | spread mean | spread std. dev. | test error mean | test error std. dev.
net 1                                | 1.1101e+16  | 1.4274e+16       | 1.1595          | 1.9017
net 2                                | 3.4353e+14  | 1.9275e+14       | 0.3592          | 0.0509
net 3                                | 6.7843e+14  | 1.8717e+14       | 0.5779          | 0.2722

Table 5.6: Results for Recursive Nets of size 512 (averaged over 5 runs each)

It should be noted that this experiment was run on the ten best-performing nets of the first part. The results that are not shown do not differ significantly from the ones in Table 5.6. It is striking (and disappointing) to see that these nets actually perform worse, by one or more orders of magnitude, than random nets. On the other hand, this at least suggests that there has to be some connection between network topology and eigenvalue spread. I did not have sufficient time to investigate these Recursive Nets any further. A few ideas for future research in this area are laid out in Chapter 6.

5.6 Overall Evaluation

As a general result, I can say that none of the investigated network topologies was able to perform significantly better than simple random networks, in terms of both eigenvalue spread and testing error. In fact, most networks performed worse than the randomly created nets. It is also striking that high-density networks often perform comparably well, such as the presented Spatial-Growth networks, which featured a connectivity of 40% or higher. The reasons for these rather disappointing results are unknown to me. It might be that topology does not play a big role for the eigenvalue spread, but the results obtained with Recursive Nets do not support this claim. It might be a good idea for future research to find some theoretical insights into this connection.

Chapter 6
Future Investigations

Unfortunately, my experiments did not lead to any improvement of the eigenvalue spread of R. However, there are a few points in connection with different network topologies that deserve some more attention:

• Recursive nets should be investigated further. For example, one could investigate all possible 1-0 combinations of 5 × 5 matrices. Even though there are 2^25 ≈ 33 · 10^6 possibilities, one could significantly reduce this number by discarding symmetrically identical matrices (a small sketch of this reduction follows this list) and thus make the experiment computationally feasible. More interesting, however, should be the investigation of different expansion methods, i.e. how to build larger networks from small ones. The approach that I have chosen might not necessarily be the best one.
• It could be investigated how far it is possible to use only very few nodes as the actual reservoir from which the output signal is composed. The idea is to take a strongly clustered net, such as a Small-World net, and use only a few representatives per cluster for learning. Since only few nodes participate in the learning procedure, the eigenvalue spread will be decreased significantly.
• One could try to use one-to-one models of real-world networks (such as the neural network of the C. elegans worm) as ESN weight matrices.
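As a rough indication of how much the symmetry reduction proposed in the first point buys, the following sketch (written for this text and run here for 3 × 3 matrices so that it finishes in a moment) counts how many 0-1 matrices remain when matrices that differ only by a relabelling of the nodes are counted once.

% Counting 3x3 0-1 matrices that are distinct up to a relabelling of the nodes (illustrative)
n = 3;
P = perms(1:n);                              % all n! node permutations
weights = 2.^(0:n*n-1)';                     % to re-encode a matrix as an integer
reps = [];                                   % canonical representatives found so far
for code = 0:2^(n*n)-1
    M = reshape(bitget(code, 1:n*n), n, n);  % decode the bitstring into a 0-1 matrix
    c = inf;
    for k = 1:size(P, 1)
        Mp = M(P(k,:), P(k,:));              % the same net with nodes relabelled
        c = min(c, Mp(:)' * weights);        % smallest integer code over all relabellings
    end
    reps = union(reps, c);                   % keep one representative per equivalence class
end
fprintf('%d of %d matrices are distinct up to relabelling\n', numel(reps), 2^(n*n));

For 5 × 5 matrices the same idea with 5! = 120 node permutations would reduce the 2^25 candidates to on the order of 2^25/120 ≈ 3 · 10^5 equivalence classes, which is what would make the proposed experiment computationally feasible.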
Bibliography

[1] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286:509-512, 1999.
[2] Herbert Jaeger. The echo-state approach to recurrent neural networks. Slide presentation, available from http://www.faculty.iu-bremen.de/hjaeger/courses/SeminarSpring04/ESNStandardSlides.pdf.
[3] Herbert Jaeger. Lecture notes for "Machine Learning", Fall Semester 2003, International University Bremen.
[4] Herbert Jaeger. The "echo state" approach to analysing and training recurrent neural networks. Technical Report 148, GMD - Forschungszentrum Informationstechnik GmbH, Institut für autonome intelligente Systeme, 2001.
[5] Herbert Jaeger. Short term memory in echo state networks. Technical Report 152, GMD - Forschungszentrum Informationstechnik GmbH, Institut für autonome intelligente Systeme, 2001.
[6] Herbert Jaeger. Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach. Technical Report 159, GMD - Forschungszentrum Informationstechnik GmbH, Institut für autonome intelligente Systeme, 2001.
[7] Herbert Jaeger. Adaptive nonlinear system identification with echo state networks. In NIPS, 2002.
[8] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, April 2nd, 2004.
[9] Marcus Kaiser and Claus C. Hilgetag. Spatial growth of real-world networks. Physical Review E, 69:036103, 2004.
[10] Sergi Valverde, Ramon Ferrer Cancho, and Ricard V. Solé. Scale-free networks from optimal design. Europhysics Letters, 2002.
[11] Duncan Watts. Website of the Small World Project: http://smallworld.columbia.edu/.
[12] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440-442, 1998.

Appendix A
MATLAB Code

Listing A.1: A function that generates a Scale-Free graph (© Marcus Kaiser)

function matrix = scalefreegraph(n, fc);
% matrix = scalefreegraph(n, fc)
% yields the matrix of a scale-free graph with n nodes
% fc is a factor determining the final density
% Author: Marcus Kaiser    Date: 8.12.02

NODES = n;
INITIALNODES = 3;

% generate initial matrix (three nodes; undirected links)
matrix = zeros(NODES, NODES);
matrix(1,2) = 1; matrix(2,1) = 1;
matrix(1,3) = 1; matrix(3,1) = 1;
matrix(3,2) = 1; matrix(2,3) = 1;
nodes_incl = [1; 2; 3];

% aggregation of nodes to the initial matrix
m = INITIALNODES;
k = zeros(NODES, 1);
for i = 1:m
    k(i) = sum(matrix(i,:)) + sum(matrix(:,i));
end;

while m < NODES
    m = m + 1;
    for i = 1:m-1
        P = k(i) / sum(k);
        if rand(1,1) <= P * fc
            k(i) = k(i) + 2;
            k(m) = k(m) + 2;
            matrix(i,m) = 1;
            matrix(m,i) = 1;
        end; % if
    end; % for
end; % while m
return;

Listing A.2: A function that generates a Small-World graph (© Marcus Kaiser)

function sw = smallworldgraph(n, e);
% sw = smallworldgraph(n, e)
% yields the matrix of a small-world graph with n nodes and e edges
% algorithm described in: Watts & Strogatz, 1998

% constants
K = ceil(e / n);        % neighbors
OneSideK = K / 2;
prob = 0.25;
% generate initial matrix (p = 0)
sw = zeros(n, n);
for i = 1:n
    for j = 1:OneSideK              % neighboring nodes after node i
        neighbor = i + j;
        if neighbor > n
            neighbor = neighbor - n;
        end;
        sw(i, neighbor) = 1;
        sw(neighbor, i) = 1;
    end;
    for j = 1:OneSideK              % neighboring nodes before node i
        neighbor = i - j;
        if neighbor < 1
            neighbor = neighbor + n;
        end;
        sw(i, neighbor) = 1;
        sw(neighbor, i) = 1;
    end;
end;

% rewiring (p = prob)
for j = 1:OneSideK
    for i = 1:n
        neighbor = i - j;           % neighbor j before node i
        if neighbor < 1
            neighbor = neighbor + n;
        end;
        if (sw(i, neighbor) == 1) && (rand < prob)
            dummy = randperm(n);
            while (sw(i, dummy(1)) ~= 0) || (dummy(1) == i)
                dummy = randperm(n);
            end;
            sw(i, neighbor) = 0;
            sw(i, dummy(1)) = 1;
        end;
        neighbor = i + j;           % neighbor j after node i
        if neighbor > n
            neighbor = neighbor - n;
        end;
        if (sw(i, neighbor) == 1) && (rand < prob)
            dummy = randperm(n);
            while (sw(i, dummy(1)) ~= 0) || (dummy(1) == i)
                dummy = randperm(n);
            end;
            sw(i, neighbor) = 0;
            sw(i, dummy(1)) = 1;
        end;
    end; % for i
end; % for j

% delete all but e edges
edges = sum(sum(sw));
victims = edges - e;
while victims > 0
    dummy = randperm(n); i = dummy(1);
    dummy = randperm(n); j = dummy(1);
    while sw(i, j) == 0
        dummy = randperm(n); i = dummy(1);
        dummy = randperm(n); j = dummy(1);
    end;
    sw(i, j) = 0;
    victims = victims - 1;
end;
% edges = sum(sum(sw))

% analysis
% cc = clustercoeff(sw)
% d = density(sw)
% l = asp(sw)

Listing A.3: A function that generates a Spatial-Growth graph (© Marcus Kaiser)

function [matrix, position] = spatialgraph(n, astart, b);
% [matrix, position] = spatialgraph(n, astart, b);
% n:      number of nodes
% astart: starting value of the distance-dependence alpha that remains
%         unchanged as long as astep is set to zero
%         astart -> 0   => network is independent of distances
%         astart >> 10  => only nearby nodes remain
% b:      scaling parameter beta affecting the density of the network
% Author: Marcus Kaiser
% Date: 4.09.2002

% constants
NODES = n;
INISIZE = 1;
% parameters
astep = 0; % .25;
a = astart;
% variables
matrix = zeros(NODES, NODES);   % connectivity matrix (no distances!)
position = zeros(NODES, 2);     % (x,y) positions of the nodes
distance = zeros(NODES, 1);     % distances of the new node to existing nodes

% initial matrix (one initial node at position (0.5, 0.5))
position(1,:) = [0.5 0.5];
n = INISIZE + 1;
while n <= NODES
    position(n,:) = rand(1, 2);  % random position for the candidate node
    for i = 1:n-1
        % distances to node n
        distance(i) = sqrt( (position(n,1) - position(i,1))^2 + (position(n,2) - position(i,2))^2 );
        prob = b * exp(-a * distance(i));   % spatial constraint
        if rand(1) <= prob
            matrix(i, n) = 1;
            matrix(n, i) = 1;
        end; % if
    end; % for
    if deg(matrix, n) > 0       % deg: auxiliary function returning the degree of node n
        n = n + 1;
        a = a + astep;
    end; % if
end; % while n
return;

Further MATLAB scripts to repeat the described experiments, etc., are available from the author upon request.