Using Support Vector Machines for
Classification of Remotely Sensed Images
M. Bundzel, P. Sinčák, N. Kopčo
Computational Intelligence Group, Laboratory of AI, Department of Cybernetics
and AI, Faculty of Electrical Engineering and Informatics, TU Košice, Slovakia
Email: cig@neuron-ai.tuke.sk, http://neuron-ai.tuke.sk/cig
Abstract: The paper presents a comparison study of the support vector machine (SVM) classification approach and ARTMAP neural classifiers. SVM provides mathematically well-founded methods based on a virtual transformation of the input space into a higher-dimensional space, in which a highly nonlinear discrimination hyperplane is approximated by transforming the task into dichotomous classification, with the aim of achieving the best classification results. SVM was used with an RBF kernel, and experiments were done on benchmark data as well as on real-world satellite images of Slovakia. Comparisons with fuzzy ARTMAP and Gaussian ARTMAP on these data were carried out. An adaptive kernel function based on a neural network is proposed for future research in this area. Classification is evaluated using contingency tables for multiclass classification problems. The aim was to develop a classification tool with the highest accuracy on the tested images.
Keywords: Support Vector Machines, VC dimension, RBF kernel function, fuzzy ARTMAP, Gaussian ARTMAP, accuracy assessment, contingency tables
1 Introduction
Support Vector Machines (SVMs) represent a powerful tool in the field of pattern recognition and regression. Introduced by Vapnik in 1979, SVMs have received increasing attention only in the last few years. For further research it is important to evaluate the power of SVMs with different kernels and to compare them to existing methods. In this work, Gaussian ARTMAP was chosen as a competitor of the SVM with RBF kernel. The motivation of the project is to accomplish a comparison study between SVM-type classifiers and ARTMAP-family classifiers on test and on real-world data.
2 Support Vector Machine as a classifier
2.1 Basic description of SVM principles
The following description can be found in an extended version in [1].
Let the training data be $\{x_i, y_i\}$, $i = 1, \ldots, l$, where $y_i \in \{-1, +1\}$ and $x_i \in R^d$. Let us assume that there is a hyperplane that separates the positive examples from the negative examples (a separating hyperplane). The points lying on the hyperplane satisfy $w \cdot x + b = 0$, where $w$ is the normal to the hyperplane and $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin ($\|w\|$ is the Euclidean norm of $w$). Let $d_+$ ($d_-$) be the shortest distance from the separating hyperplane to the closest positive (negative) example, and define the "margin" of the separating hyperplane as $d_+ + d_-$. For the linearly separable case, the algorithm looks for the separating hyperplane with the largest margin. This situation is illustrated in the following figure.
Figure 1: Linear separation with SVM (axes $x_1$, $x_2$; hyperplanes $H_1$, $H_2$ bound the margin); support vectors are circled.
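To make the geometry concrete, the following minimal NumPy sketch (the points and hyperplane parameters are hypothetical, not from the paper) computes the perpendicular distances used in the margin definition above:

```python
import numpy as np

# Hypothetical hyperplane parameters and 2-D training points.
w = np.array([1.0, 2.0])           # normal vector of the hyperplane
b = -1.0                           # offset; distance to origin is |b| / ||w||
X = np.array([[0.5, 1.0], [2.0, 2.0], [-1.0, 0.0]])
y = np.array([1, 1, -1])

norm_w = np.linalg.norm(w)         # Euclidean norm ||w||
dist = np.abs(X @ w + b) / norm_w  # perpendicular distances to the hyperplane

# d+ and d-: shortest distances to the closest positive / negative example.
d_plus = dist[y == 1].min()
d_minus = dist[y == -1].min()
print("margin =", d_plus + d_minus)
```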
The constraints for this optimization problem are the following:

$$x_i \cdot w + b \geq +1 \quad \text{for } y_i = +1 \qquad (1)$$

and also

$$x_i \cdot w + b \leq -1 \quad \text{for } y_i = -1 \qquad (2)$$

or, combining (1) and (2) into one set of inequalities:

$$y_i (x_i \cdot w + b) - 1 \geq 0 \quad \forall i \qquad (3)$$
If the examples are linearly separable, it is always possible to find $w$ and $b$ such that inequalities (1) and (2) hold. The hyperplanes

$$H_1 : x \cdot w + b = +1$$

and

$$H_2 : x \cdot w + b = -1$$

bound the margin, which has width $2 / \|w\|$. $H_1$ and $H_2$ are parallel, and no training examples fall between them. The optimal hyperplane is determined by the $w$ and $b$ for which $\|w\|^2$ is minimal (equivalently, the margin $2 / \|w\|$ is maximal) subject to constraints (1) and (2). Figure 1 illustrates the typical two-dimensional case of an optimal hyperplane. Points lying on one of the hyperplanes $H_1$, $H_2$ are called support vectors; all the other points could be removed from the training set without changing the solution. The optimization problem is switched to the Lagrangian formulation, using the Karush-Kuhn-Tucker conditions for convex optimization. The most important reason for this reformulation is that the training data will appear only in the form of dot products. This is a crucial feature that allows generalizing the procedure to the nonlinear case.
The reformulated (dual) problem is the following.

Maximize:

$$L_D = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \qquad (4)$$

subject to:

$$0 \leq \alpha_i \leq C \qquad (5)$$

$$\sum_{i=1}^{l} \alpha_i y_i = 0 \qquad (6)$$

where the $\alpha_i$ are the so-called Lagrange multipliers and $C$ is a user-set constant determining the tolerance to training errors.
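For illustration, the dual objective of Eqs. (4)-(6) can be written down directly. The sketch below uses hypothetical toy data; a real trainer would maximize $L_D$ with a quadratic-programming solver rather than merely evaluate it:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Evaluate the dual objective L_D of Eq. (4).

    alpha : (l,) Lagrange multipliers
    X     : (l, d) training inputs
    y     : (l,) labels in {-1, +1}
    """
    K = X @ X.T  # Gram matrix of dot products x_i . x_j
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

def feasible(alpha, y, C):
    """Check constraints (5) and (6)."""
    return bool(np.all(alpha >= 0) and np.all(alpha <= C)
                and np.isclose(alpha @ y, 0.0))

# Hypothetical toy problem.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0]])
y = np.array([1, 1, -1, -1])
alpha = np.array([0.5, 0.5, 0.5, 0.5])
print(dual_objective(alpha, X, y), feasible(alpha, y, C=1.0))
```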
2.2 SVM kernel functions
The above method can be generalized to the case where the decision function is not a linear function of the data. In some cases it might not be possible to separate the training data with a hyperplane, and relaxing the constraints would lead to poor classification. A possible solution is to map the data into some other (higher-dimensional, even infinite-dimensional) Euclidean space $H$ in which it is possible to separate the data with a hyperplane. Let us assume the mapping $\Phi$:

$$\Phi : R^d \to H \qquad (7)$$

where $R^d$ is the space of the training data and $H$ is the transformed (Hilbert) space. Note that the data appear in the training problem only in the form of dot products, Eqs. (4)-(6). Let us therefore introduce a "kernel function" $K$ such that $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$. Now it is possible to replace the usual dot products by $K$ everywhere in the algorithm without knowing what $\Phi$ is. One example is:

$$K_{ex}(x_i, x_j) = e^{-\|x_i - x_j\|^2 / 2\sigma^2} \qquad (8)$$

In this particular example $H$ is infinite-dimensional, so it would not be possible to work with $\Phi$ explicitly. If $x_i \cdot x_j$ is replaced with $K_{ex}(x_i, x_j)$ everywhere in the algorithm, the SVM lives in an infinite-dimensional space, yet the training time is roughly the same as with un-mapped data. All the considerations of the previous section hold, since linear separation is still performed, only in a different space. Using $\Phi$ explicitly is also avoided in the test phase.
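As an illustration of Eq. (8) in practice, the following is a minimal sketch using scikit-learn's SVC (an assumption for illustration; the experiments below use Joachims' SVM-light instead). Note that SVC's gamma parameter corresponds to $1/(2\sigma^2)$ in Eq. (8):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical "circle in the square" toy data: class +1 inside the circle.
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = np.where(np.hypot(X[:, 0], X[:, 1]) < 0.6, 1, -1)

sigma = 0.5  # RBF width sigma from Eq. (8)
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=10.0)
clf.fit(X, y)

print("support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))
```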
3 ARTMAP family neural networks
ARTMAP neural networks belong to the class of neural networks based on Adaptive Resonance Theory (ART), a theory of cognitive information processing in the human brain. Based on this theory, a whole family of neural network algorithms was developed. These neural networks were shown to give very good performance in applications involving clustering, classification, and pattern recognition. When compared to statistical and other neural-network-based clustering/classification algorithms, these networks usually obtain very good classification accuracy, while securing proven stability and a high level of compression in the system.

From the point of view of this study, the currently available ARTMAP classification systems can be divided into two groups. The first group contains systems based on (or modifications of) the fuzzy ARTMAP algorithm (e.g., ARTMAP-IC, ART-EMAP). All these systems share the property that they prefer data clusters distributed as hyper-rectangles in feature space. In these systems the basic properties of the original ARTMAP design (stability, proven convergence, fast on-line learning) are preserved, but they also have well-known disadvantages, e.g., noise sensitivity and a tendency to category proliferation. The other group is based on the Gaussian ARTMAP neural network. In this group of networks, which preferably identify Gaussian-shaped clusters, the stability and fast on-line learning properties of the fuzzy ARTMAP networks are traded for an emphasis on the ability of the system to generalize and for decreased sensitivity to noise in the input data.

Structurally, every ARTMAP network (fuzzy ARTMAP or Gaussian ARTMAP) can be divided into two parts. The first part, represented by an ART module, dynamically generates units, each identifying a single data cluster in feature space; this part can be used autonomously for cluster analysis of a given data set. The second part serves to identify each of the clusters found in the data with one of the classes defined on the data set. A detailed description of fuzzy ARTMAP (FA), the first of the algorithms analyzed in this study, can be found in many previously published studies. From the point of view of this study, the most important property of this system is that the subsystem identifying clusters in feature space preferably identifies clusters in which patterns are distributed as hyper-rectangles, as illustrated in the following figure.

Figure 2: Distribution of discrimination rectangles defined by fuzzy ARTMAP in the feature space.
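For readers unfamiliar with the mechanics behind these hyper-rectangles, the following is a minimal sketch (not the authors' implementation) of the fuzzy ART category choice, match test, and learning steps, assuming complement-coded inputs in $[0,1]^d$ and hypothetical parameter values ρ, α, β:

```python
import numpy as np

def complement_code(x):
    """Complement coding: input a in [0,1]^d becomes (a, 1 - a)."""
    return np.concatenate([x, 1.0 - x])

def fuzzy_art_step(I, weights, rho=0.75, alpha=0.001, beta=1.0):
    """One fuzzy ART presentation: try categories in order of the choice
    function T_j = |I ^ w_j| / (alpha + |w_j|); on passing the vigilance
    (match) test, update the winner's weights; otherwise create a new
    category. Returns the index of the chosen category."""
    order = np.argsort([-np.minimum(I, w).sum() / (alpha + w.sum())
                        for w in weights])
    for j in order:
        w = weights[j]
        if np.minimum(I, w).sum() / I.sum() >= rho:     # match criterion
            weights[j] = beta * np.minimum(I, w) + (1 - beta) * w
            return j
    weights.append(I.copy())                            # no resonance: new unit
    return len(weights) - 1

# Hypothetical usage on 2-D data.
rng = np.random.default_rng(1)
weights = []
for x in rng.uniform(0, 1, size=(50, 2)):
    fuzzy_art_step(complement_code(x), weights)
print("categories created:", len(weights))
```

With fast learning (β = 1), each category weight vector encodes exactly the corners of a hyper-rectangle in feature space, which is the behavior Figure 2 illustrates.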
4 Experimental results
Experiments were done on benchmark and real-world data. Thorsten Joachims' implementation of SVM was used [2]. A simple extension of the algorithm was made in order to achieve multiclass classification (one common scheme is sketched after the benchmark results below). In all cases the Radial Basis Function (RBF) kernel was used. Classification accuracy was assessed by a contingency-table approach. Two benchmark datasets were prepared for classification purposes: "circle in the square" and "double spiral", both used for dichotomous classification. The results of the SVM approach and of fuzzy ARTMAP are presented in Table 1 and Table 2.
"Circle in the square"

Predicted Class    Actual A    Actual B
A'                   99.54%       0.68%
B'                    0.46%      99.32%

"Double spiral"

Predicted Class    Actual A    Actual B
A'                   93.25%      57.24%
B'                    6.75%      42.76%

Table 1: SVM results on the benchmark datasets (rows: predicted classes; columns: actual classes).
"Circle in the square"

Predicted Class    Actual A    Actual B
A'                   98.34%       2.80%
B'                    1.66%      97.20%

"Double spiral"

Predicted Class    Actual A    Actual B
A'                   87.59%       9.26%
B'                   12.41%      90.74%

Table 2: Fuzzy ARTMAP results on the benchmark datasets (rows: predicted classes; columns: actual classes).
As shown in Table 1, the otherwise properly working SVM failed to classify class B of the "double spiral" dataset. The reason for this behavior remains unclear: increasing the error-intolerance constant led to unaffordably long training times, and manipulating the RBF coefficient did not help either.
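The paper does not specify which multiclass extension was used with the binary SVM; a common choice is a one-vs-rest scheme. The following is a minimal sketch under that assumption (using scikit-learn's SVC in place of SVM-light):

```python
import numpy as np
from sklearn.svm import SVC

class OneVsRestSVM:
    """Multiclass wrapper over binary SVMs: one classifier per class,
    prediction by the largest decision-function value."""

    def __init__(self, **svc_params):
        self.svc_params = svc_params

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Train one binary SVM per class: that class vs. all the others.
        self.models_ = [SVC(**self.svc_params).fit(X, np.where(y == c, 1, -1))
                        for c in self.classes_]
        return self

    def predict(self, X):
        scores = np.column_stack([m.decision_function(X)
                                  for m in self.models_])
        return self.classes_[scores.argmax(axis=1)]

# Hypothetical usage with an RBF kernel on 7-class data.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 7))
y = rng.integers(0, 7, size=300)
clf = OneVsRestSVM(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)
print("training accuracy:", (clf.predict(X) == y).mean())
```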
4.1 Experiments on real-world data
Experiments were also done on real-world data. The behavior of the methods was observed on multispectral image data, with the aim of obtaining the best classification accuracy on the test subset. The Košice data consist of a training set of 3164 points and a test set of 3167 points in the feature space. Each point has 7 real-valued coordinates normalized into the interval (0,1) and 7 binary output values. The class of a sample is determined by the output which has a value of one; the other six output values are zero. The data represent 7 attributes of the color spectrum sensed by the Landsat satellite. The representation set was determined by a geographer and supported by a ground-verification procedure. The main goal was land-use identification using the most precise classification procedure available. The image was taken over the eastern Slovakia region, particularly the City of Košice area. Seven classes of interest were selected for the classification procedure, as can be seen in Figure 3. The results of the classification procedures are presented as contingency tables (Tables 3 and 4); SVM was used with the RBF kernel function.
Figure 3: Original image. The highlighted areas were classified by an expert (A – urban area, B – barren fields, C – bushes, D – agricultural fields, E – meadows, F – forests, G – water).
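The label encoding described above (seven one-hot binary outputs following seven normalized features) maps to class indices with a simple argmax; a minimal sketch with hypothetical rows:

```python
import numpy as np

# Hypothetical rows of the Košice dataset: 7 normalized spectral features
# in (0, 1) followed by 7 one-hot output values, one row per pixel.
rows = np.array([
    [0.12, 0.40, 0.33, 0.55, 0.21, 0.64, 0.08,  0, 0, 0, 1, 0, 0, 0],
    [0.72, 0.15, 0.48, 0.11, 0.90, 0.34, 0.27,  1, 0, 0, 0, 0, 0, 0],
])

X = rows[:, :7]                      # feature vectors
labels = rows[:, 7:].argmax(axis=1)  # one-hot -> class index (0 = class A)
print(labels)                        # [3 0] -> classes D and A
```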
Predicted                        Actual Class (%)
Class        A        B        C        D        E        F        G
A'        93.51     0.84     0.00     0.00     3.32     0.00     1.34
B'         0.61    88.30     0.00     3.56    12.76     0.00     1.34
C'         0.00     0.00   100.00     0.00     2.27     0.00     0.00
D'         0.00     8.45     0.00    96.33     0.52     0.00     0.00
E'         3.25     2.47     0.00     0.11    79.55     0.16     5.80
F'         0.00     0.00     0.00     0.00     0.00    98.97     2.68
G'         2.64     0.00     0.00     0.00     1.57     0.87    88.84

Table 3: Confusion matrix for the fuzzy ARTMAP neural network with voting over 5 networks (rows: predicted classes; columns: actual classes). The overall weighted PCC is 93.95%.

Predicted                        Actual Class (%)
Class        A        B        C        D        E        F        G
A'        96.15     0.00     0.00     0.00     2.76     0.00     1.41
B'         0.00    87.68     0.00     3.01     6.08     0.00     0.00
C'         0.00     1.64   100.00     0.11     1.10     0.00     0.00
D'         0.00     7.60     0.00    96.88     0.00     0.09     1.41
E'         0.64     2.05     0.00     0.00    83.98     0.34     8.45
F'         3.21     1.03     0.00     0.00     6.08    99.49     1.41
G'         0.00     0.00     0.00     0.00     0.00     0.09    87.32

Table 4: Confusion matrix for SVM (rows: predicted classes; columns: actual classes). The overall weighted PCC is 95.64%.
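The overall weighted PCC reported in Tables 3 and 4 can be computed from a confusion matrix of raw counts; a minimal sketch, assuming the weighting is by actual class frequencies (so the result equals the overall fraction of correctly classified test samples):

```python
import numpy as np

def weighted_pcc(counts):
    """Overall weighted PCC from a confusion matrix of raw counts
    (rows = predicted classes, columns = actual classes): per-class
    accuracies weighted by the actual class frequencies."""
    counts = np.asarray(counts, dtype=float)
    per_class = counts.diagonal() / counts.sum(axis=0)  # column-wise accuracy
    weights = counts.sum(axis=0) / counts.sum()         # class frequencies
    return 100.0 * (per_class * weights).sum()

# Hypothetical 3-class example.
C = np.array([[50,  2,  1],
              [ 3, 40,  5],
              [ 1,  2, 60]])
print(f"weighted PCC = {weighted_pcc(C):.2f} %")
```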
Figure 4: SVM-classified image of the City of Košice area (classes A–G as in Figure 3). Shadows are sometimes classified as urban area, which decreases classification accuracy significantly.
5 Conclusion
The paper deals with a comparison study of the Support Vector Machine classifier with RBF kernel and the fuzzy ARTMAP approach. Both of the tested classifiers showed very good classification and generalization abilities. The experimental results indicate that certain datasets can cause problems for SVMs with the RBF kernel function.
6 References
[1] C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", 1999, http://svm.research.bell-labs.com/SVMdoc.html
[2] T. Joachims, "Making Large-Scale SVM Learning Practical", in Advances in Kernel Methods – Support Vector Learning, B. Schölkopf, C. Burges, and A. Smola (eds.), MIT Press, 1999.
[3] M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge University Press, 1999.
[4] G. A. Carpenter, M. N. Gjaja, S. Gopal, and C. E. Woodcock, "ART Neural Networks for Remote Sensing: Vegetation Classification from Landsat TM and Terrain Data", IEEE Transactions on Geoscience and Remote Sensing, 35(2):308-325, 1997.
[5] G. A. Carpenter and S. Grossberg, "A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine", Computer Vision, Graphics, and Image Processing, 37:54-115, 1987.
[6] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and D. B. Rosen, "Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps", IEEE Transactions on Neural Networks, 3(5):698-713, 1992.
[7] N. Kopčo, P. Sinčák, and H. Veregin, "Extended Methods for Classification of Remotely Sensed Images Based on ARTMAP Neural Networks", in Computational Intelligence – Theory and Applications (Lecture Notes in Computer Science 1625), Proceedings of the International Conference "The 6th Fuzzy Days", Dortmund, Germany, May 1999, pp. 206-219.
[8] P. Sinčák, H. Veregin, and N. Kopčo, "Conflation Techniques in Multispectral Image Processing", Geocarto International, March 2000, pp. 11-19.
[9] G. A. Carpenter, B. L. Milenova, and B. W. Noeske, "Distributed ARTMAP: A Neural Network for Fast Distributed Supervised Learning", Neural Networks, 11(5):793-813, 1998.
[10] J. R. Williamson, "Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps", Neural Networks, 9:881-897, 1996.