Multilayer Perceptron for analyzing satellite data
Kamal Al-Rawi and Raed Shadfan
Department of Computer Science, Faculty of Information Technology,
Petra University, P.O. Box 961343, Amman 11196, Jordan.
e-mail: kamalr@uop.edu.jo, k_alrawi@yahoo.com
Abstract:
Different ANN architectures of the MLP class have been trained by BP and used to analyze Landsat TM images. Two different approaches have been applied for training: an ordinary approach (for one hidden layer, M-H1-L, and two hidden layers, M-H1-H2-L) and a one-against-all strategy (for one hidden layer, (M-H1-1)xL, and two hidden layers, (M-H1-H2-1)xL). Classification accuracy of up to 90% has been achieved using the one-against-all strategy with the two-hidden-layer architecture. The performance of the one-against-all approach is slightly better than that of the ordinary approach.
Keywords: MLP, Landsat, Classification.
1. Introduction

The principle of Back Propagation (BP) for Artificial Neural Networks (ANN) was developed by Werbos [1]. In the 1980's the BP algorithm was further developed independently by many authors: Le Cun [2], Parker [3], and Rumelhart et al. [4].

Computer analysis of satellite images has been a very active area of research, Benediktsson et al. [5]. Conventional classification is usually deployed for classifying patterns in these images. However, work in the last decade involved the use of neural networks in place of conventional classifiers. The main advantage of neural network classifiers over conventional classifiers, such as Maximum Likelihood Classifiers (MLC), is that they are non-parametric. The Multi-Layer Perceptron (MLP), with BP learning, is the most commonly used neural network in the literature for classifying remotely sensed data. While some authors have reported that conventional classifiers perform better than MLP, Mulder & Spreeuwers [6], Solaiman & Mouchot [7], many authors have reported that MLP performed better than MLC in classifying remotely sensed data: Hepner et al. [8], Heerman & Khazenie [9], Paola & Schowengerdt [10], Yoshida & Omatu [11].

2. Objectives

The objective of this work is to construct an automated system for analyzing Landsat satellite data using MLP. Different ANN architectures of the MLP class are tested. Two different approaches are applied for learning: ordinary training, using fully interconnected multi-output nets tagged as M-H1-L and M-H1-H2-L, and one-against-all training, using a collection of single-output nets tagged as M-H1-1 and M-H1-H2-1.

3. Data

The data for this study were obtained by the Landsat Thematic Mapper, taken for the area around the Spanish city of "Talavera de la Reina". The data contain 6 features for around 65,000 exemplars; each feature corresponds to a different electromagnetic wavelength. The spots correspond to a total of thirteen different classes: mountains, three types of fallow land, two types of irrigated land, meadow, natural vegetation, wheat, wetland, alfalfa, forest, and river. Typically, these classes contain non-linear separations.
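For concreteness, the exemplars can be pictured as a roughly 65,000 x 6 feature matrix with an integer class label (1..13) per row. The following is a minimal loading sketch, assuming a hypothetical whitespace-separated file landsat_tm.txt; the paper does not describe its actual storage format.

import numpy as np

# Sketch only (not the authors' code): each row of the hypothetical file
# holds the six TM band values followed by the class label, 1..13.
def load_exemplars(path="landsat_tm.txt"):
    raw = np.loadtxt(path)              # shape: (n_exemplars, 7)
    X = raw[:, :6].astype(float)        # 6 spectral features per exemplar
    y = raw[:, 6].astype(int)           # class labels 1..13
    return X, y

X, y = load_exemplars()
print(X.shape, np.unique(y))            # expect (~65000, 6) and 13 classes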
4. Multilayer Perceptron

4.1 Architecture:

The network topology used in this study is based on fully connected feed-forward ANNs. The number of nodes in the input layer is equal to the number of features presented by the data, while the number of nodes in the output layer (L) is equal to the number of classes that the data map to. At least one hidden layer must be added to the architecture in order to treat the non-linear separation among classes. Several networks with one and two hidden layers, with different numbers of nodes in each hidden layer, have been used. The architecture for a multilayer perceptron with two hidden layers (M-H1-H2-L) is shown in figure-1. M represents the number of nodes in the input layer, H1 the number of nodes in the first hidden layer, H2 the number of nodes in the second hidden layer, and L the number of nodes in the output layer.

4.2 Learning and testing:

Two training schemes have been adopted in this work. The first, termed ordinary training, is based on training fully connected feed-forward networks, each with 13 outputs corresponding to the 13 classes of the image. The network is trained to score 1 at the output that corresponds to the correct class, and to score 0 at all the other outputs. The second, termed one-against-all, is based on training 13 different single-output networks, as shown in figure-2, where each network corresponds to one class. The problem presented to each network is a simple 2-class problem: the class associated with that network against all other classes. Eventually, these single-output networks are used to produce 13 outputs. In both training schemes, the final class is determined by choosing the output with the highest score.
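The sketch below illustrates one way such an M-H1-H2-L network with BP could be coded. It is not the authors' implementation; the sigmoid units, squared-error loss, and per-exemplar updates are our assumptions, since the paper does not list its update equations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MLP:
    """Fully connected feed-forward M-H1-H2-L network with plain BP."""
    def __init__(self, M, H1, H2, L, seed=0):
        rng = np.random.default_rng(seed)
        # Small random initial weights; one bias vector per layer.
        self.W = [rng.normal(0.0, 0.1, size=s)
                  for s in [(M, H1), (H1, H2), (H2, L)]]
        self.b = [np.zeros(n) for n in (H1, H2, L)]

    def forward(self, x):
        # Keep every layer's activation; backprop needs them.
        acts = [np.asarray(x, dtype=float)]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(acts[-1] @ W + b))
        return acts

    def train_step(self, x, target, lr=0.001):
        # One BP update on a single exemplar, squared-error loss.
        acts = self.forward(x)
        delta = (acts[-1] - target) * acts[-1] * (1.0 - acts[-1])
        for i in reversed(range(len(self.W))):
            grad_W = np.outer(acts[i], delta)
            if i > 0:
                # Propagate the error through W[i] before updating it.
                delta_prev = (delta @ self.W[i].T) * acts[i] * (1.0 - acts[i])
            self.W[i] -= lr * grad_W
            self.b[i] -= lr * delta
            if i > 0:
                delta = delta_prev

For the ordinary scheme one would create MLP(6, H1, H2, 13) and train with target vectors such as np.eye(13)[label - 1], i.e. 1 at the correct class and 0 at all other outputs, exactly as described above.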
Figure-1: The architecture of an ordinary MLP with two hidden layers. The numbers of nodes in the input layer, the first hidden layer, the second hidden layer, and the output layer are M, H1, H2, and L, respectively.

Figure-2: The architecture of the one-against-all network. Thirteen of these ANNs have been trained, one for each class. During classification, the input is presented to each one, and the class is adopted from the network with the highest score.
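As a complement to figure-2, this sketch shows how the one-against-all scheme can be assembled and queried. scikit-learn's MLPClassifier stands in for the paper's own BP code, and the reduced max_iter (the paper trains for 50,000 epochs) is purely for illustration.

import numpy as np
from sklearn.neural_network import MLPClassifier

def train_one_against_all(X, y, n_classes=13, H1=42, H2=10):
    # One single-output M-H1-H2-1 net per class, trained on a binary
    # problem: its own class against all others.
    nets = []
    for c in range(1, n_classes + 1):
        net = MLPClassifier(hidden_layer_sizes=(H1, H2),
                            activation="logistic",   # sigmoid units
                            solver="sgd", learning_rate_init=0.001,
                            max_iter=500, random_state=0)
        net.fit(X, (y == c).astype(int))
        nets.append(net)
    return nets

def classify(nets, X):
    # Each net scores its own class; the final class is the net with
    # the highest score, as in the paper's decision rule.
    scores = np.column_stack([net.predict_proba(X)[:, 1] for net in nets])
    return scores.argmax(axis=1) + 1      # back to labels 1..13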
The data was randomly mixed and split into training and testing sets with 10% and 90% of the whole data, respectively. The training set includes around 6,500 exemplars evenly distributed amongst the 13 classes: 500 exemplars from each class. During experiments, learning rates greater than 0.1 resulted in networks being caught in local minima. As a safety measure, the learning rate used in this study was set to 0.001. The number of epochs was chosen to be 50,000 iterations to compensate for the relatively low learning rate assigned.

Figure-3: Comparison of the one-against-all scheme versus the ordinary architecture, plotting learning time in hours, learning performance (%), and testing performance (%) against the number of nodes in the hidden layer (x 13). The upper chart shows that the training time for one-against-all is slightly better than that of the ordinary one; the training and testing performance for one-against-all are a little better than the ordinary one. Dark lines represent the one-against-all scheme, while the light lines represent the ordinary scheme.
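The stratified 10%/90% split described above (500 randomly chosen training exemplars per class) can be reproduced along these lines; the function name and numpy-based shuffling are our own choices.

import numpy as np

def stratified_split(X, y, per_class=500, seed=0):
    # Randomly mix the data, then take 500 exemplars of each class for
    # training; everything else goes to the testing set.
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        train_idx.extend(idx[:per_class])
    train_idx = np.array(train_idx)
    test_mask = np.ones(len(y), dtype=bool)
    test_mask[train_idx] = False
    return X[train_idx], y[train_idx], X[test_mask], y[test_mask]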
5. Results

Many runs have been conducted for the two schemes using single-hidden-layer networks. The performance of the one-against-all scheme was a little better than that of the ordinary one. The results of these runs are shown in figure-3. The training and testing performance using the (6-208-13) and (6-16-1)x13 network topologies were (87%, 82%) and (87%, 83%), respectively. While it was expected that the learning time for the one-against-all scheme would be lower than for the ordinary scheme, due to the fewer connections, the learning time was about 16 hours for both approaches. A possible explanation for this result is that the communication time spent calling the 13 networks from computer memory in the one-against-all scheme compensates for the computation time consumed by the processor to resolve the extra connections in the ordinary networks. However, further investigation must be conducted to reach a more solid conclusion. These runs have been conducted on a computer with a 1.8 GHz processor, 128 MB RAM, and a 40 GB hard disk.

More runs have been conducted, with two-hidden-layer networks, using the one-against-all scheme only. The results of some of these runs are listed in table-1. The performance for training and testing using the (6-92-43-1)x13 topology was 93% and 90%, respectively. However, the learning time was more than 130 hours. The learning time has been reduced to about 47 hours using the (6-42-10-1)x13 architecture, while the learning and testing performance have been reduced by less than 2%. The classification performance at the class level was: class-1 91%, class-2 87%, class-3 96%, class-4 93%, class-5 93%, class-6 92%, class-7 96%, class-8 85%, class-9 85%, class-10 94%, class-11 88%, class-12 84%, and class-13 100%.

Table-1: Some runs for different architectures using two hidden layers. The one-against-all approach is used. The best classification performance is 90%; the training time for this run is 131 hours. A more practical run, with classification performance 88%, required 47 hours.

INPUT   H1      H2      OUTPUT   LEARNING        TESTING         LEARNING
NODES   NODES   NODES   NODES    PERFORMANCE %   PERFORMANCE %   TIME / hr
6       8       50      1        87              84              52
6       12      62      1        90              86              70
6       16      82      1        91              87              90
6       21      19      1        90              87              39
6       25      26      1        91              88              49
6       29      43      1        91              88              68
6       33      64      1        92              89              94
6       37      85      1        92              89              119
6       42      10      1        92              88              47
6       46      25      1        92              88              68
6       50      45      1        92              89              90
6       58      81      1        93              89              134
6       63      4       1        92              89              61
6       67      21      1        93              89              84
6       71      42      1        93              89              109
6       79      81      1        93              90              158
6       84      5       1        93              89              80
6       88      24      1        93              89              105
6       92      43      1        93              90              131
6       96      62      1        93              90              157
6       100     82      1        93              89              186

The same data has been analyzed by Al-Rawi et al. [12] using the Supervised ART-II network constructed by Al-Rawi et al. [13]. A similar performance was obtained; however, the training time was in minutes rather than days, as in the case of MLP. The training time using 9,000 exemplars (one epoch is used) was a few minutes. The training and testing performance was 94% and 86%, respectively.
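The class-level figures quoted above correspond to per-class testing accuracy, i.e. the fraction of each class's test exemplars that were classified correctly. A minimal way to compute them from predicted labels (assuming integer labels 1..13) is:

import numpy as np

def per_class_accuracy(y_true, y_pred, n_classes=13):
    # Fraction of class c's exemplars assigned to class c, for each c.
    return {c: float(np.mean(y_pred[y_true == c] == c))
            for c in range(1, n_classes + 1)}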
6. Conclusions

1- The MLP gives a good performance when analyzing satellite images. However, it cannot be implemented in real-time analysis due to its very long training time.

2- The one-against-all scheme is preferable to the ordinary approach. When classifying new data whose classes are a subset of the trained classes, one can use only the subset of the trained ANNs that corresponds to the needed classes.

3- The classification performance of MLP is a little better than that of Supervised ART-II; however, Supervised ART-II can be implemented for real problem tasks, given its much shorter training time.

7. Acknowledgement

We gratefully thank Professor Santiago Ormeno Villajos, Dept. of Topographic & Cartographic Engineering, University Polytechnic of Madrid, for the Landsat TM images. Our thanks to Professor Consuelo Gonzalo Martin and her group, Faculty of Information Technology, University Polytechnic of Madrid, for their efforts in preparing the Landsat images.

8. References

[1] Werbos, P. J., Ph.D. thesis, Harvard University, Cambridge, MA, USA (1974).

[2] Le Cun, Y., in Disordered Systems and Biological Organization (E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds.), France, Springer-Verlag, (1986) 233-340.

[3] Parker, D., MIT, Cambridge, MA, USA, technical report TR-87 (1986).

[4] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, Eds.), MIT Press, Cambridge, Massachusetts, (1986) 318-362.

[5] Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., IEEE Transactions on Geoscience and Remote Sensing, 28 (1990) 540-552.

[6] Mulder, N. J., and Spreeuwers, L., International Geoscience and Remote Sensing Symposium (IGARSS'91), Espoo, Finland, (1991) 2211-2213.

[7] Solaiman, B., and Mouchot, M. C., International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, (1994) 1413-1415.

[8] Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., Photogrammetric Engineering & Remote Sensing, 56 (1990) 469-473.

[9] Heermann, P. D., and Khazenie, N., IEEE Transactions on Geoscience and Remote Sensing, 30 (1992) 81-88.

[10] Paola, J. D., and Schowengerdt, R. A., Photogrammetric Engineering & Remote Sensing, 63 (1997) 535-544.

[11] Yoshida, T., and Omatu, S., IEEE Transactions on Geoscience and Remote Sensing, 32 (1994) 1103-1109.

[12] Al-Rawi, K. R., Gonzalo, C., and Martinez, E., Remote Sensing in the 21st Century: Economic and Environmental Applications (Casanova, Ed.), Balkema, Rotterdam (2000) 229-235.

[13] Al-Rawi, K. R., Gonzalo, C., and Arquero, A., European Symposium on Artificial Neural Networks ESANN'99, Bruges, Belgium (1999) 289-294.