APPENDIX A: Self-Organizing Map
The Self-Organizing Map (SOM) is a pattern recognition and clustering technique based on an unsupervised competitive learning process. The output neurons of the network compete among themselves to be activated or fired, with the result that only one output neuron wins the competition; it is known as the 'winner' neuron or node. SOM is an effective method for feature extraction and classification and, within limits, can be used to identify as many distinct patterns as desired.
SOM is characterized by the formation of a topographic map of the input patterns in which the spatial locations, or coordinates, of the neurons in the lattice indicate the intrinsic statistical features contained in the input patterns, thereby facilitating data compression and easy visualization. Mathematically, this is a topology-conserving projection from an original higher-dimensional data space onto a lower-dimensional lattice [Haykin, 1999]. The main advantage of SOM is that it requires no prior knowledge of the data domain and no human intervention. The method is similar to standard iterative clustering algorithms such as k-means clustering [Gutierrez et al., 2004].
The SOM algorithm has been used extensively in different disciplines of science, including meteorology [Cavazos, 1999; Hewitson and Crane, 2002; Gutierrez et al., 2005; Ambroise et al., 2000; Leloup et al., 2007; Tozuka et al., 2008; CSG08; Morioka et al., 2010; Joseph et al., 2011]. The technique differs from other statistical analyses in that each input data vector is assigned to the node whose reference vector lies at the minimum Euclidean distance from it. The main emphasis of the SOM algorithm is on placing similar vectors close to one another in a low-dimensional map: the greater the distance between any two reference vectors, the more different are the two nodes and the patterns associated with them. Heskes and Kappen [1995] showed that, in a way, the SOM technique is analogous to (but more complex than) nonparametric regression.
We have used the Kohonen model of SOM [Kohonen, 1990] in this study; its implementation resembles the vector-quantization method. Given an N-dimensional (ND) data space consisting of a cloud of data points (input variables), the SOM algorithm distributes an arbitrary number of nodes (or cluster centres) in the form of a 1D or 2D regular lattice, such that each node is uniquely defined by a reference vector (or code vector) consisting of weights, each weight being associated with a particular input variable. SOM adjusts the reference vectors to the ND data cloud through a user-defined iterative cycle that minimizes the Euclidean distance between the reference vector $W_j$ of the $j$th node and the input data vector $X$. Mathematically,
$$
\left\| X - W_c \right\| = \min_{j} \left\| X - W_j \right\|,
$$

where $W_c$ is the closest reference vector to $X$.
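As an illustration of this winner (best-matching) selection, a minimal Python/NumPy sketch is given below; the function name, array shapes, and variable names are our own illustrative assumptions and are not taken from any particular SOM implementation.

import numpy as np

def winner_node(x, W):
    """Return the index of the winner node c for input vector x.

    x : (d,) input data vector X
    W : (m, d) array of reference vectors, one row W_j per node
    """
    # Euclidean distance from x to every reference vector W_j
    distances = np.linalg.norm(W - x, axis=1)
    # the node with the minimum distance wins the competition
    return int(np.argmin(distances))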
As mentioned earlier, for a particular data record only one node wins, and the winner node locates the centre of the topological neighbourhood of cooperating nodes. An optimal mapping is one in which the winner node also changes the neighbouring nodes, as defined by the user. This inclusion of the neighbourhood makes the SOM classification nonlinear, since each node has to be adjusted relative to its neighbours. The training cycle may be repeated $n$ times and may be described mathematically as
$$
W_j(n+1) =
\begin{cases}
W_j(n) + c(n)\,\bigl[\,x(n) - W_j(n)\,\bigr], & j \in R_j(n)\\
W_j(n), & \text{otherwise}
\end{cases}
\qquad \text{(A1)}
$$
Here $W_j(n)$ is the reference vector of the $j$th node at the $n$th training cycle, $x(n)$ is the input vector, $R_j(n)$ is the predefined neighbourhood around the node $j$, and $c(n)$ is the neighbourhood kernel, which defines the neighbourhood.
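A minimal sketch of one application of the update rule (A1) is shown below, assuming for simplicity that the neighbourhood $R_j(n)$ is supplied as a set of node indices and that $c(n)$ is a single scalar; these simplifications and all names are illustrative only.

import numpy as np

def update_reference_vectors(W, x, neighbourhood, c_n):
    """One training cycle of Eq. (A1).

    W             : (m, d) reference vectors W_j(n)
    x             : (d,) input data vector x(n)
    neighbourhood : iterable of node indices forming R_j(n)
    c_n           : neighbourhood kernel value c(n), taken here as a scalar
    """
    W_next = W.copy()          # nodes outside R_j(n) keep W_j(n)
    for j in neighbourhood:
        # W_j(n+1) = W_j(n) + c(n) [ x(n) - W_j(n) ]  for j in R_j(n)
        W_next[j] = W[j] + c_n * (x - W[j])
    return W_next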
The neighbourhood attains its maximum at the winning node, and its amplitude decreases monotonically with increasing lateral distance, decaying to 0 as the lateral distance tends to infinity; this is a necessary condition for convergence. A Gaussian neighbourhood satisfies these requirements and produces a smoother neighbourhood, and hence in this study we have used the Gaussian neighbourhood:
$$
c(n) = \alpha(n)\,\exp\!\left( -\,\frac{\left\| r_j - r_i \right\|^2}{2\,\sigma^2(n)} \right)
\qquad \text{(A2)}
$$
where $\alpha(n)$ and $\sigma(n)$ are constants that decrease monotonically with $n$: $\alpha(n)$ is the learning rate, which determines the 'velocity' of the learning process, and $\sigma(n)$ is the amplitude, which determines the width of the neighbourhood kernel. Here $r_j$ and $r_i$ are the coordinates of the nodes $j$ and $i$ for which the neighbourhood kernel is defined. Free SOM software is available at http://www.cis.hut.fi/research/somresearch/.
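A sketch of how the Gaussian kernel (A2) could be evaluated on the lattice is given below; the linear decay schedules for $\alpha(n)$ and $\sigma(n)$ and the initial values alpha0 and sigma0 are illustrative assumptions only, not values prescribed above.

import numpy as np

def gaussian_kernel(r_winner, r_nodes, n, n_total, alpha0=0.5, sigma0=3.0):
    """Gaussian neighbourhood kernel of Eq. (A2) at training cycle n.

    r_winner       : (2,) lattice coordinates r_i of the winner node
    r_nodes        : (m, 2) lattice coordinates r_j of all nodes
    n, n_total     : current and total number of training cycles
    alpha0, sigma0 : assumed initial learning rate and kernel width
    """
    # alpha(n) and sigma(n) decrease monotonically with n (linear decay here)
    alpha_n = alpha0 * (1.0 - n / n_total)
    sigma_n = sigma0 * (1.0 - n / n_total) + 1e-3   # keep sigma(n) > 0
    # c(n) = alpha(n) exp( -||r_j - r_i||^2 / (2 sigma(n)^2) )
    sq_dist = np.sum((r_nodes - r_winner) ** 2, axis=1)
    return alpha_n * np.exp(-sq_dist / (2.0 * sigma_n ** 2))

The returned array gives, for each node $j$, the factor $c(n)$ used in (A1); nodes far from the winner on the lattice receive values close to zero and are therefore hardly adjusted.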
The SOM reference vectors span the data space, and each node represents a position that approximates the mean of the nearby samples in the data space. Another important advantage of SOM is that a smaller (larger) number of SOM nodes can be allocated to a sparse (dense) dataset [Hewitson and Crane, 2002]. Hewitson and Crane [2002] demonstrated the advantages of SOM using a simple artificial 2D data set, CSG08 explained the same for a 2D meteorological data set, and Joseph et al. [2011] used SOM for a 3D atmospheric dataset. The experimental details and the schematic figure of SOM (Fig. 1) are explained in section 2.1.