Decreasing the Feature Space Dimension by Kohonen
Self-Organizing Maps
Igor Mokriš - Radoslav Forgáč
Institute of Informatics Slovak Academy of Sciences
Dúbravská cesta 9
846 07 Bratislava 45
Slovak Republic
mokris@savbb.sk, forgac@savbb.sk
Abstract
The paper deals with Kohonen Self-Organizing Maps for the generation of features and for decreasing the dimension of the classification space in the image recognition process. The conclusions present the results of a comparison of the effectiveness of feature generation by Kohonen Self-Organizing Maps and by chosen methods for the reduction of the classification feature space dimension.
Key words: Kohonen self-organizing maps, reduction of feature space dimension
1 Introduction
Kohonen Self-Organizing Maps [3, 10, 13, 14, 15] are suitable for realizing a topology-preserving mapping and can thereby express characteristic features for the classification of input images. Because of this property, the neurons of a Kohonen Self-Organizing Map are ordered in a two- or one-dimensional space. This topology-preserving mapping has one important property [14]: similar patterns in the input image space activate neurons of the Kohonen Self-Organizing Map that lie physically close to each other in the output layer. All neurons take part in the representation of features, and a feature value is defined by the position of the winner neuron, i.e. the extracted features are expressed topologically. The extracted features may have no physical interpretation.
The structure of the Kohonen Self-Organizing Map originates from the Willshaw and von der Malsburg model [19], in which the neurons of the self-organizing output layer influence each other through lateral feedback connections that are constant in time. Because these lateral interactions in the output layer are replaced by a neighborhood function, the Kohonen learning algorithms are simpler than other learning algorithms for self-organizing neural networks, since lateral feedback connections are computationally expensive [14].
Kohonen Self-Organizing Maps are suitable not only for reducing the feature space of images and patterns in the feature generation process, but also for classifying input images and patterns into the corresponding classes. For classification purposes the Learning Vector Quantization (LVQ) algorithms can be used, which are based on statistical approaches to classification.
2 The structure of the Kohonen Self-Organizing Map
A Kohonen Self-Organizing Map belongs to the group of feed-forward, fully connected self-organizing neural networks. It is a two-layer neural network (Fig. 1) consisting of an input layer and an output layer, i.e. the Kohonen layer. The input layer copies the vector of input features in a D-dimensional space. For the purpose of visualizing the generated features, the Kohonen layer can be two- or one-dimensional.
As mentioned above, an important contribution of Kohonen Self-Organizing Maps is the replacement of the lateral interactions between the neurons of the output layer by a neighborhood function h(i*, i), where i* is the index of the winner neuron and i is the index of a neuron in the neighborhood of the winner neuron.
Fig. 1 Structure of the Kohonen Self-Organizing Map
Each neuron has its own neighborhood consisting of the surrounding neurons. The neighborhood function defines the region of cooperation between neighboring neurons, i.e. in the learning process only the weight vectors of the winner neuron and of the neurons in its neighborhood are adapted. The adaptation of the weight vectors is influenced by the type of neighborhood function h(i*, i) used. The simplest neighborhood function h(i*, i) is the "bubble" function [5, 8], which is defined by the relation
h(i^*, i) = \begin{cases} \alpha(t), & \text{if } r_M(i^*, i) \le \lambda(t) \\ 0, & \text{otherwise} \end{cases}    (1)
where α(t) ∈ (0, 1) is the learning coefficient, λ(t) is the radius of the defined neighborhood, and r_M(i*, i) is the distance of "Manhattan" type between the neurons i* and i. The function α(t) decreases to 0 in time, i.e. the neural network gradually loses the ability to learn.
In practice it is a problem to set the optimal initial value of the learning coefficient α, because a value of α that makes the neural network learn quickly also makes it quickly forget under new stimuli in the input layer. Kohonen suggested that the best results are achieved with a monotonically decreasing function α(t) [5].
The second frequently used neighborhood function is a function of Gaussian type, which is defined by

h(i^*, i) = \alpha(t) \cdot \exp\left( -\frac{r_E^2(i^*, i)}{2 \sigma^2(t)} \right)    (2)
where r_E is the Euclidean distance between the neurons i* and i in the Kohonen layer, for which it holds that

r_E(i^*, i) = \| \mathbf{r}_{i^*} - \mathbf{r}_i \| = \| \mathbf{w}_{i^*} - \mathbf{w}_i \|    (3)

where r_i is the coordinate vector of neuron i and w_i is the weight vector directed to neuron i.
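As an illustration, the two neighborhood functions (1) and (2) can be sketched in Python/NumPy as follows; the grid coordinates and the values of α, λ and σ in the example are assumptions only, not values used in the paper:

import numpy as np

def bubble_neighborhood(r_winner, r_i, alpha, radius):
    # "Bubble" function (1): alpha(t) inside the Manhattan radius lambda(t), 0 elsewhere.
    manhattan = np.abs(r_winner - r_i).sum()
    return alpha if manhattan <= radius else 0.0

def gaussian_neighborhood(r_winner, r_i, alpha, sigma):
    # Gaussian function (2), based on the Euclidean distance of the grid coordinates.
    d2 = np.sum((r_winner - r_i) ** 2)
    return alpha * np.exp(-d2 / (2.0 * sigma ** 2))

# Example: winner neuron at grid position (2, 3), neighboring neuron at (3, 3).
h_bubble = bubble_neighborhood(np.array([2, 3]), np.array([3, 3]), alpha=0.5, radius=1)
h_gauss = gaussian_neighborhood(np.array([2, 3]), np.array([3, 3]), alpha=0.5, sigma=1.0)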
The main disadvantage of Kohonen Self-Organizing Maps is the necessity of defining the structure of the neural network and the number of neurons in the Kohonen layer a priori. The possible existence of so-called dead neurons can be eliminated by replacing competitive learning of the "winner takes all" type with the "winner takes most" type.
3 Learning of the Kohonen Self-Organizing Map
This chapter presents two algorithms for learning of Kohonen Self-Organizing Maps, i.e. the Kohonen competitive learning algorithm and the LVQ algorithms.
The Kohonen competitive learning algorithm is based on the assumption that the space of the input feature vectors x is identical to the space of the weight vectors w_i leading to the corresponding winner neurons. If normalized input feature vectors x and weight vectors w_i are used, the relation holds

| \mathbf{x} | = | \mathbf{w}_i | = 1    (4)
i.e. the input feature vector x activates the neurons in the Kohonen layer through the full connection between the neurons of the individual layers. The activity of an output neuron is defined by the relation

y_i(t) = \sum_{j=1}^{D} w_{ij}(t) \, x_j(t)    (5)

where D is the dimension of the input feature vector, i.e. the number of neurons in the input layer.
Self-organizing neural networks use the competitive principle, and therefore it is necessary to determine the winner neuron, i.e. the neuron most sensitive to the input stimulus x. Two methods exist for determining the winner neuron:

- The method of finding the output neuron with the highest activity (the so-called Dot Product Method) [9], where the winner neuron i* with maximal activity is found by the rule

i^* = \arg\max_i | \mathbf{w}_i \cdot \mathbf{x} |    (6)

- The method of finding the minimal distance between the vectors x and w_i [10]. The winner neuron is the neuron whose weight vector is nearest, in the Euclidean metric, to the actual input vector x, according to the rule

i^* = \arg\min_i | \mathbf{x} - \mathbf{w}_i |    (7)

Both methods are mutually equivalent, because it holds that

| \mathbf{x} - \mathbf{w}_i |^2 = | \mathbf{x} |^2 - 2 \mathbf{w}_i^T \mathbf{x} + | \mathbf{w}_i |^2    (8)
From equation (8), owing to the normalization (4), it follows that finding the minimal distance (7) is equivalent to finding the maximal scalar product of the vectors according to rule (6) [14]. The Kohonen algorithm based on the Euclidean metric does not require normalization, and therefore its use is more effective than the algorithm of finding the highest activity of an output neuron [16]. The adaptation process of the weights is based on the minimization of an error function, which is defined by

J(t) = \frac{1}{2} \sum_{j=1}^{D} \left( w_{ij}(t) - x_j(t) \right)^2    (9)
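The equivalence of the scalar-product form of rule (6) and the minimal-distance rule (7) under the normalization (4) can be checked numerically. The following minimal Python/NumPy sketch uses random toy vectors, an assumption made purely for illustration:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: one normalized input vector x and 25 normalized weight vectors w_i,
# as required by relation (4).
x = rng.normal(size=10)
x /= np.linalg.norm(x)
W = rng.normal(size=(25, 10))
W /= np.linalg.norm(W, axis=1, keepdims=True)

winner_dot = int(np.argmax(W @ x))                            # scalar-product rule, cf. (6)
winner_dist = int(np.argmin(np.linalg.norm(x - W, axis=1)))   # minimal-distance rule (7)
assert winner_dot == winner_dist   # equivalent for normalized vectors, cf. relation (8)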
The goal of competitive learning is the projection of the input vector x onto the weight vectors w_i of the winner neuron and its neighborhood. The adaptation of the weights in the learning process is defined by the relation of the standard gradient method

\Delta w_{ij}(t) = - \alpha(t) \, \frac{\partial J(t)}{\partial w_{ij}(t)}    (10)
where α(t) is the learning coefficient. The gradient of the error function can then be determined by the relation

\frac{\partial J(t)}{\partial w_{ij}(t)} = \begin{cases} w_{ij}(t) - x_j(t), & \text{if } i \text{ is the winner neuron} \\ 0, & \text{otherwise} \end{cases}    (11)
By substituting relation (11) into (10), the relation for computing the weight vector of the winner neuron at time t+1 is obtained:

w_{ij}(t+1) = w_{ij}(t) + \alpha(t) \left( x_j(t) - w_{ij}(t) \right)    (12)
The Kohonen competitive learning algorithm of the "winner takes most" type performs better than the algorithm of the "winner takes all" type. In the "winner takes most" algorithm, the weight vectors of the neurons in the neighborhood of the winner neuron, defined by the neighborhood function h(i*, i) according to relations (1) and (2), are adapted as well. Relation (12) is therefore modified to

w_{ij}(t+1) = w_{ij}(t) + h(i^*, i) \left( x_j(t) - w_{ij}(t) \right)    (13)
A more detailed description of the Kohonen algorithm of the "winner takes most" type follows:
procedure Kohonen
    Define the Kohonen Self-Organizing Map topology.
    Initialize all weight vectors randomly in the interval <0, 1>.
    t = 0
    Set the parameters α(t) and λ(t) for t = 0.
    repeat
        for k = 1 to N
            Get input vector xk from the training set of size N.
            Find the winner i* by the rule i* = arg min |x - wi|.
            Adapt the weights of the winner neuron and its topological neighbors by (13).
            Increment t.
        end;
        Decrease the parameters α(t) and λ(t).
    until (algorithm converges) OR (maximum number of iterations exceeded)
end;
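For orientation, a minimal runnable sketch of this procedure in Python/NumPy is given below. The map size, the parameter schedules and the placeholder data are assumptions for the example, not the configuration used in the experiments:

import numpy as np

def train_som(X, rows=10, cols=10, n_iter=20000, alpha0=0.5, sigma0=3.0):
    # Minimal "winner takes most" training loop: winner by rule (7),
    # Gaussian neighborhood (2), weight update (13).
    rng = np.random.default_rng(0)
    W = rng.uniform(0.0, 1.0, size=(rows * cols, X.shape[1]))   # random weights in <0, 1>
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    for t in range(n_iter):
        x = X[rng.integers(len(X))]                      # input vector from the training set
        frac = 1.0 - t / n_iter                          # decrease alpha(t) and sigma(t)
        alpha, sigma = alpha0 * frac, max(sigma0 * frac, 0.5)
        i_star = int(np.argmin(np.linalg.norm(x - W, axis=1)))   # winner, rule (7)
        d2 = np.sum((grid - grid[i_star]) ** 2, axis=1)  # squared grid distances r_E^2
        h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))     # neighborhood function (2)
        W += h[:, None] * (x - W)                        # update (13)
    return W, grid

# Example with random placeholder data, D = 39 features as in the first image group.
X = np.random.default_rng(1).random((245, 39))
W, grid = train_som(X)
# The topological feature of each pattern is the winner neuron's grid position.
winners = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
features_2d = grid[winners]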
The number of iterations typically lies in the interval of approximately 10^4 to 10^5. Kohonen recommended an empirically verified number of iterations of at least 500 times the number of neurons in the network.
The advantage of the Kohonen algorithm of the "winner takes most" type over the "winner takes all" type is the elimination of the undesirable effect of so-called dead neurons. These are neurons that are excluded from the learning process, so that their weight vectors are never adapted.
The LVQ algorithm [4, 6, 7, 11, 12] makes it possible to set the weight vectors of a Kohonen network so as to minimize the number of misclassifications, which arise from the overlap of the probability density functions of the individual classes. The classification error is minimized when the boundary between the classified classes is identical to the Bayesian boundary [14]. The task of the LVQ algorithm is to approximate the Bayesian boundary between the classified classes without knowledge of the statistical description of the input vectors x.
There are three versions of the LVQ algorithm, i.e. LVQ1, LVQ2 and LVQ3. Because the principle is similar in all cases, only the LVQ1 algorithm [12] is described in more detail below.
Unlike Kohonen competitive learning, the adaptation of the weight vectors is realized by supervised learning. Each class of input images is assigned a label in the Kohonen layer, which represents the membership of an input vector in the specified class. In contrast to competitive learning, it is not important which neuron carrying the given label wins, but whether the winning neuron represents the correct class.
Before running the LVQ1 algorithm it is necessary to initialize the weight vectors and to assign the labels. The weight vectors are initialized by Kohonen competitive learning. The labels can be determined by the majority rule, i.e. by counting the number of wins of each neuron and its corresponding weight vector. The adaptation of the weight vectors is performed according to the relations

\mathbf{w}_{i^*}(t+1) = \mathbf{w}_{i^*}(t) + \alpha(t) \left( \mathbf{x}(t) - \mathbf{w}_{i^*}(t) \right), \quad \text{if } C(\mathbf{x}) = C(\mathbf{w}_{i^*})    (14)

\mathbf{w}_{i^*}(t+1) = \mathbf{w}_{i^*}(t) - \alpha(t) \left( \mathbf{x}(t) - \mathbf{w}_{i^*}(t) \right), \quad \text{if } C(\mathbf{x}) \neq C(\mathbf{w}_{i^*})    (15)
where C(x) denotes the class to which the image represented by the input vector x belongs. The LVQ1 algorithm is described in more detail below:
procedure LVQ1
    Initialize all weight vectors by Kohonen competitive learning.
    Initialize the parameter α(t) and set t = 0.
    repeat
        for k = 1 to N
            Get input vector xk from the training set of size N.
            Find the winner neuron i*.
            if class_label = desired_class_label then
                Adapt the weights of the winner neuron by (14).
            else
                Adapt the weights of the winner neuron by (15).
            Increment t.
        end;
        Decrease α(t).
    until (algorithm converges) OR (maximum number of iterations exceeded)
end;
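A minimal Python/NumPy sketch of the LVQ1 update, assuming codebook (weight) vectors W pre-trained by competitive learning and their class labels W_labels assigned by the majority rule, might look as follows; the schedule and interface are example choices:

import numpy as np

def lvq1(X, labels, W, W_labels, n_iter=10000, alpha0=0.1):
    # LVQ1 following relations (14) and (15): move the winner towards x if its
    # class is correct, away from x otherwise.
    rng = np.random.default_rng(0)
    W = W.copy()
    for t in range(n_iter):
        alpha = alpha0 * (1.0 - t / n_iter)              # decreasing alpha(t)
        k = rng.integers(len(X))
        x, c = X[k], labels[k]
        i_star = int(np.argmin(np.linalg.norm(x - W, axis=1)))   # winner neuron
        if W_labels[i_star] == c:                        # correct class: relation (14)
            W[i_star] += alpha * (x - W[i_star])
        else:                                            # wrong class: relation (15)
            W[i_star] -= alpha * (x - W[i_star])
    return W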
By optimizing LVQ1, the algorithms LVQ2 [7] and LVQ3 [12] were developed. Unlike the original LVQ1 algorithm, LVQ2 adapts not only the weight vector of the winner neuron but also that of the second nearest neuron to the input vector x. LVQ3 removes the disadvantages of LVQ2, mainly the deterioration of the results during longer learning, i.e. approximately 10^5 iterations. The choice of a suitable LVQ algorithm depends on the application. From the point of view of learning stability, the use of LVQ1 or LVQ3 is recommended [6].
4 Comparison of Kohonen Self-Organizing Maps with chosen methods for reduction of feature space dimension
This chapter presents the results of a comparison of the Kohonen Self-Organizing Map (denoted SOM in Fig. 2 and Fig. 3) with chosen methods for reduction of the feature space dimension [1]:
1. the linear method of principal component analysis (PCA),
2. a nonlinear auto-associative neural network based on the principal component analysis method (AFN),
3. the Sammon neural network based on the multidimensional scaling method (SAM),
4. the stochastic method of multidimensional scaling (MDS).
All the introduced methods except the linear method of principal component analysis are based on the principle of minimizing an error function.
The comparison of the chosen methods was performed on two groups of textured images. Both groups of images were pre-processed by the wavelet transform in order to remove the variance under rotation of the images [17, 18]. The first group consists of 245 gray-scale images divided into 5 classes, with an input feature space of dimension D = 39. The second group consists of 1024 color images divided into 16 classes, with an input feature space of dimension D = 48.
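The wavelet correlation signatures used in [17, 18] are not reproduced here; purely as an illustration of wavelet-based texture features, a sketch using the PyWavelets library (an assumed dependency) might compute sub-band energies as follows:

import numpy as np
import pywt

def wavelet_energy_features(image, wavelet="db2", level=2):
    # Illustrative only: energies of the detail sub-bands of a 2-D wavelet
    # decomposition; the descriptors of [17, 18] are more elaborate.
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    features = []
    for detail_level in coeffs[1:]:        # (horizontal, vertical, diagonal) per level
        for band in detail_level:
            features.append(np.mean(band ** 2))
    return np.array(features)

# Example: a random 64x64 "texture" gives a 6-element feature vector here.
f = wavelet_energy_features(np.random.default_rng(0).random((64, 64)))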
The initial configuration of the weight coefficients of the Kohonen and auto-associative neural networks was set randomly. The structure of the AFN was defined as (a sketch follows below):
- D-30-d-30-D for the first training set of images,
- D-40-d-40-D for the second training set of images,
where D is the number of neurons in the input and output layers and represents the dimension of the feature space of the input images, and d is the number of neurons in the hidden layer and represents the number of generated features, d < D. The initial configuration of both methods based on multidimensional scaling was set from the results of the linear method of principal component analysis. The number of iterations was set to 100.
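For illustration, the D-30-d-30-D (or D-40-d-40-D) auto-associative structure could be written, for example, in PyTorch as below. The sigmoid activations and the framework are assumptions; the paper specifies only the layer sizes:

import torch.nn as nn

def make_afn(D, d, hidden=30):
    # Auto-associative network D-hidden-d-hidden-D; the bottleneck of d neurons
    # yields the d generated features (hidden = 30 or 40 in the experiments above).
    return nn.Sequential(
        nn.Linear(D, hidden), nn.Sigmoid(),
        nn.Linear(hidden, d), nn.Sigmoid(),   # bottleneck: generated features
        nn.Linear(d, hidden), nn.Sigmoid(),
        nn.Linear(hidden, D),
    )

# Example: D = 39 input features reduced to d = 2 features (first image group).
afn = make_afn(D=39, d=2)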
The quality of the features generated by all the methods of feature dimension reduction was evaluated by a kNN classifier (k = 5, k nearest neighbors). The effectiveness of the compared methods was evaluated by the classification correctness in % as a function of the number of features in the feature vectors entered into the classifier. The results were analyzed with respect to the influence of the dimension of the input feature space on the classification correctness and the number of classified classes. The results of the experiments for the first group of images are shown in the graph in Fig. 2 and for the second group of images in the graph in Fig. 3.
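The evaluation protocol is not described in full detail in the paper; a sketch of such a kNN evaluation (k = 5), assuming scikit-learn and random placeholder data in place of the real reduced feature vectors, could look like this:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder reduced features (245 images, d = 2) and class labels (5 classes);
# the real experiments used the wavelet features of the two texture image groups.
reduced = np.random.default_rng(2).random((245, 2))
labels = np.random.default_rng(3).integers(0, 5, size=245)

knn = KNeighborsClassifier(n_neighbors=5)              # kNN classifier with k = 5
scores = cross_val_score(knn, reduced, labels, cv=5)   # classification correctness estimate
print(f"classification correctness: {100 * scores.mean():.1f} %")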
The learning algorithm for the AFN converges very slowly; the number of iterations is approximately 10^3·N (where N is the number of images in the training set). The SOM algorithm needs approximately 10 times less for convergence than the AFN, i.e. approximately 10^2·N iterations. In this case it is more advantageous to use the statistical multidimensional scaling method or the Sammon neural network based on the multidimensional scaling method. The linear method of principal component analysis gave worse results than the nonlinear projection methods when reducing to one or two features. As the number of generated features increases, the classification correctness of the linear method of principal component analysis becomes comparable to that of the nonlinear methods. On the other hand, an advantage of the linear method of principal component analysis is that it can be implemented by neural networks with a simple structure and a shorter learning time.
The experiments showed that, owing to their nonlinear projection, Kohonen neural networks are able to describe the classified images more precisely in the case of a small number of generated features, d ≤ 2. On the other hand, in view of the computational time of the learning process (Tab. 1), Kohonen neural networks are not suitable for the reduction of input feature spaces of large dimension.
Regarding the number of classified classes, it can be stated that Kohonen and auto-associative neural networks are suitable for the reduction of input feature spaces with lower dimension and a higher number of classified classes.
         1st group of images               2nd group of images
  d    SOM    AFN    SAM    MDS         SOM    AFN    SAM    MDS
  1     13     76    0.1    0.5          67    611    1.3     7.8
  2     10    100    0.5    0.7          33    790    7.5    11.0
  3      9    120    0.7    0.9          31    840   15.0    13.0
  4      6    130    1.0    1.1          18    893   27.0    21.0

Tab. 1 Time consumption of the learning process for nonlinear PCA, according to [1]
Fig. 2 Classification correctness for the first group of images
Fig. 3 Classification correctness for the second group of images
5 Conclusion
A significant property of Kohonen Self-Organizing Maps is their ability to project data with topology preservation. After successful learning, any two images that are near each other in the input space also activate topologically near neurons in the output Kohonen layer. This projection of the input images allows a dramatic reduction of the dimension of the output space. Therefore, depending on the learning algorithm used, Kohonen Self-Organizing Maps are suitable both for the dimension reduction of features and for classification tasks.
Kohonen Self-Organizing Maps with unsupervised learning were applied to the dimension reduction of features. They are suitable for the reduction of input spaces with lower dimension and a higher number of classification classes. In the case of reducing feature spaces of higher dimension, it is suitable to combine Kohonen Self-Organizing Maps with neural networks that are well suited to feature generation [2].
Kohonen neural networks based on Learning Vector Quantization belong to neural
networks with supervised learning. These neural networks are suitable for classification tasks.
References
[1] DE BACKER, S.: Unsupervised Pattern Recognition - Dimensionality Reduction and Classification. [PhD Thesis]. University of Antwerp, 2002.
[2] FORGÁČ, R. - MOKRIŠ, I.: Artificial Neural Networks for Reduction of Dimension of Feature Space and Classification. ISBN 80-8055-743-8, Matej Bel University Banská Bystrica, 2002, (in Slovak).
[3] HRISTEV, R. M.: The ANN Book. GNU Public License, 1998.
[4] KANGAS, J. A. - KOHONEN, T. K. - LAAKSONEN, J. T.: Variants of Self-Organizing Maps. IEEE Transactions on Neural Networks, Vol. 1, 1990, pp. 93-99.
[5] KOHONEN, T. - HYNNINEN, J. - KANGAS, J. - LAAKSONEN, J.: SOM_PAK: The Self-Organizing Map Program Package. Helsinki University of Technology, 1996.
[6] KOHONEN, T. et al.: LVQ_PAK: The Learning Vector Quantization Program Package. [Report A30]. Helsinki University of Technology, ISBN 951-22-2948-X, 1996.
[7] KOHONEN, T.: Improved Versions of Learning Vector Quantization. Proc. of the International Joint Conference on Neural Networks, Vol. 1, San Diego, 1990, pp. 545-550.
[8] KOHONEN, T.: Self-Organization and Associative Memory. Springer-Verlag, Berlin, Heidelberg, New York, Tokyo, 3rd Edition, 1989.
[9] KOHONEN, T.: Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, Vol. 43, No. 1, 1982, pp. 56-69.
[10] KOHONEN, T.: Self-Organizing Maps. Springer-Verlag, ISBN 3-540-58600-8, 1995.
[11] KOHONEN, T.: Statistical Pattern Recognition Revisited. Advanced Neural Computers, 1990, pp. 137-144.
[12] KOHONEN, T.: The Self-Organizing Map. Proc. of the IEEE, Vol. 78, No. 9, 1990, pp. 1464-1480.
[13] KROSE, B. - VAN DER SMAGT, P.: An Introduction to Neural Networks. University of Amsterdam, 1996.
[14] KVASNIČKA, V. et al.: An Introduction to the Neural Networks Theory. IRIS, ISBN 80-88778-30-1, 1997, (in Slovak).
[15] LIPPMAN, R. P.: An Introduction to Computing with Neural Nets. IEEE ASSP Magazine, 1987, pp. 4-22.
[16] SINČÁK, P. - ANDREJKOVÁ, G.: Neural Networks - Part 1. ELFA, ISBN 80-88786-38-X, Košice, 1996, (in Slovak).
[17] VAN DE WOUWER, G. et al.: Wavelet Correlation Signatures for Color Texture Classification. Pattern Recognition, Vol. 32, No. 3, 1999, pp. 443-451.
[18] VAUTROT, P. et al.: Rotation-Invariant Texture Segmentation using Continuous Wavelets. Proc. of 2nd IEEE Symposium on Applications of Time-Frequency and Time-Scale Methods, Coventry, 1997.
[19] WILLSHAW, D. J. - VON DER MALSBURG, C.: How Patterned Neural Connections can be Set up by Self-Organization. Proc. of the Royal Society of London B, Vol. 194, 1976, pp. 431-445.