Terrain classification by cluster analysis

advertisement
Terrain classification by cluster analysis
M. Crespi (.), G. Forlani (.), L. Mussio(.), F. Radicioni ( .. )
(.) Istituto di Topografia, Fotogrammetria e Geofisica Politecnico di Milano - Piazza L. da Vinci, 32 - 20133
Milano.
( .. ) Dipartimento di Scienze dei Materiali e della Terra Universita di Ancona - Via Brecce Bianche - 60100
Ancona.
Italy
Commission No. III
1. Introduction
The digital terrain modelling can be obtained by different methods
belonging to two principal categories: deterministic methods (e.g.
polinomial and spline functions interpolation, Fourier spectra) and
stochastic methods (e.g. least squares collocation and fractals, i.e.
the concept of selfsimilarity in probability).
To reach good resul ts, both the fi rst and the second methods need same
initial suitable information which can be gained by a preprocessing of data
named terrain classification.
In fact, the deterministic methods require to know how is the roughness of
the terrain, related to the density of the data (elevations, deformations,
etc.) used for the i nterpo 1at ion, and the stochast i c methods ask for the
knowledge of the autocorrelation function of the data.
Moreover, may be useful or very necessary to sp 1it up the area under
consideration in subareas homogeneous according to some parameters, because
of different kinds of reasons (too much large initial set of data, so that
they can't be processed togheter; very important discontinuities or
singularities; etc.).
Last but not least, may be remarkable to test the type of distribution
(normal or non-normal) of the subsets obtained by the preceding selection,
because the statistical properties of the normal distribution are very
important (e.g., least squares linear estimations are the same of maximum
likelihood and minimum variance ones).
A general way to perform the terra inc 1ass i fi cat i on can cons i sts of the
following steps:
1 - to split up the original set of data into subsets (may be useful dimensions of these subsets are almost equal);
2 - to compute statistical parameters of each subset;
3 - to cluster these parameters to obtain homogeneous groups (clusters)
represented by one value only (cluster point), so that the objects
belonging to a cluster show a high degree of similarity while the ones
belonging to different clusters are as dissimilar as possible;
4 - to test the significance of each cluster point and to modify the subsets;
5 - to subdivide the test area according to the results of the cluster
analysis.
The elevations of the Noiretable area and the deformations of the Ancona
182 landslide are analyzed.
Before showing the results of the research, the algorithm used to perform
the cluster analysis is described.
2. The ISODATA algorithm
This algorithm was originally published by G.H. Ball and DeJ. Hall in the
1967; some changes were introduced in the steps 3) and 5) as regards to
split and to merge the groups (clusters).
The essential steps of the tecnique are:
1 - to choose the initial set of cluster points and the values for the min-
128
2 3 -
4 5 -
6 The
imum norm (e.g. the Euclidean distance) between two cluster points
(lidminll) and for the maximum dispersion parameter (e.g. the standard
deviation) of each group (lismaxll);
to group togheter the data closest to the same cluster point according
to the norm chosen;
to split into two groups each group whose dispersion parameter is
larger than "smaxll; the consequence of this operation is the birth of a
new cluster point (the modality of this birth is arbitrary; the choice
consists in the comparison of dispersions after having split up a group
into two ones whose size are balanced according to the ratio 1:3, 2:2,
3:1;
to regroup togheter the data taking into account the new cluster
points;
to merge into a group the groups whose cluster points are closer than
IIdminll; the consequence of this operation is the death of some cluster
points (the modality of this death is arbitrary; the choice consists in
the joining of two or three groups, the second possibility occours if
the number of groups is odd);
to iterate the procedure from the step 2) (the implemented version of
ISODATA a 1gori thm stops if in 3 consecuti ve i terati ons the number of
cluster points isn't changed and however after 25 iterations).
program CLUSTER was implemented to perform the algorithm.
3. Subdivision of the original sets of data
The Noiretable test area is a hilly zone of central France, whose shape is
a square wi th the sides about 3.2 km long. A very regul ar gri d of about
6600 measured elevations (average spacing 40 m) was used for computation.
The Ancona 182 landslide area is a zone in the district of Ancona (Italy)
interested by a 1a rge sca 1e 1ands 1ide, whose shape is almost rectangu 1a r
with sides about 2.1 and 3.0 km long. About 150 cross-sections of different
lengths and densities about 8900 measured altimetric deformations (average
spacing 20 m) were used for computations.
Taking into account the shapes of the two areas and the positions of
the measurements, they were subdivided into 64 and 59 subsets respectively,
as shown in the figs. 1 and 2.
In such a way, subsets of data very homogeneous for the fi rst area (the
size is about 100 points) and homogeneous enough for the second one (the
size ranges about 80-220 points) as regards their dimensions were obtained.
4. Statistical analysis of the subsets
The following statistical elaborations were carried out:
- computation of average, standard deviation, asymmetry, kurtosis and first
autocorrelation coefficient for each subset;
- construction of the histograms of these statistical parameters;
- construction of the histograms of the data for each subset (some examples
in figs. 3 and 4);
- computation of the empirical autocorrelation functions of the data for
each subset;
- best fitting of these functions with teorical ones (some examples in
fi gs. 3 and 4);
- construction of the histrograms of the noise standard deviation and of
the theorical autocorrelation function first zero.
Note that the histograms of the signal standard deviation are the same of
the histograms of the first autocorrelation coefficient.
Two conclusions were reached by this statistical analysis:
- in general, the subsets haven't a normal distribution;
- all the theorical autocorrelation functions which best fit the empirical
129
ones belong to the family of the J
Bessel functions which have the
0
following expression:
f ( x ) = a J (cx),
where a and c are two param~ters related to the variance of the signal
and the first zero.
Note that the subsets of Ancona 82 landslide area were divided into two
groups accord i ng to the resu 1t of compa rison between the va 1ues of the
noise and the signal standard deviation as an altimetric deformation didn1t
happen in all the grid-points, so that some autocorrelation functions donUt
represent a stochastic process but a white noise only.
These deducti ons are important because the fi rst estab 1 i shes the 1east
squares method can1t reach the best results in this case, and the
second gi ves a very exact i nformati on about the most sui tab 1e theori ca 1
autocorrelation function to use in the stocastic approach to the digital
terrain modelling.
l
5. Cluster analysis of statistical parameters
The cluster analysis was applied to all the statistical parameter computed
before. The results regarding kurtosis, noise standard deviation, first
autocorre 1ati on coeffi ci ent and theori ca 1 autocorre 1ati on functi on fi rst
zero are shown only for the sake of brevity. On the other hand, these last
are the most meaningful results, because the kurtosis is generally
discriminant about the normality of the distribution and the other
parameters characterize completely the autocorrelation function, then the
stochastic process.
The choices regarding the initial set of cluster points and the values for
IIsmax" and "dminll were made in the following way for each statistical
parameter:
- the initial cluster points were the averages of the modal classes found
by the histogram;
- the values for "smax " and IIdminli were assumed equal to the double of the
class interval of the histogram.
At the present a significance test for of the cluster points isn1t
available. Indeed to test them, distribution-free tests concerning
non-random samples are necessary.
So the figures show the results of the ISODATA algorithm without
subsequent refinings; however some observations can be developed.
The grey tonal ity of the fig. 5 is uniform: the kurtoses of the subsets
be long i ng to the No i retab 1e a rea are grouped togheter a round one cluster
point only, whose value is remarkably lower than 3 (the significance is
evalueted by the D1Agostino-Pearson test) according to the powerful
Kolmogorov-Smirnov test. It means not only the most part of subsets haven't
a normal distribution (same of them could have it, but they are so few they
can1t cause the birth of an other cluster point), but also the global set
of data hasn't a normal distribution.
Also the most part of the subsets of data of the Ancona 182 landslide has a
kurtosis remarkably greater than 3. Indeed principal clusters show a
behaviour apart from the normality, but the geomorphological interpretation
of each cluster isn1t immediate (see fig. 6).
The grey tonality of the fig. 7 is uniform: the noise standard deviation of
the subsets bel ongi ng to the Noi retab 1e area are grouped togheter around
one cluster point only whose square value is remarkably lower than 0.5
times the standard deviation of the process. It means there is a stochastic
process as regards the elevations in each part of the area.
The clearest grey tonality of the fig. 8 corresponds to the cluster point
of the lowest value for the noise standard deviation computed on Ancona '82
130
1ands 1 i de area data; its square va 1ue is remarkab 1y lower than 0.5 times
the standard deviation of the process. It means there is a stochastic
process as regards the a1timetri c deformati ons in some parts of the area
only, that is true. Moreover, the comparison between figs. 2 and 8 shows
the classification by cluster analysis is able to distinguish the parts
interested or no by landsl ide deformation exactly enough, i.e. the parts
with and without the stochastic signal as regards the altimetric
deformations. Some subsets belonging to the middle grey tonality are
located along the border of the landslide; a more refined partition could
solve the ambiguity.
Note that the cluster anal ys is app 1i ed to the averages can be usefu 1 to
recognize very important discontinuities or singularities. Besides the
cluster analysis applied to all the theorical autocorrelation function
parameters (see figs. 9, 10, 11 and 12) individuates connected homogenous
subareas useful for a successive data filtering.
Appendix
There are two fundamental types of algorithms for cluster analysis.
The lIiterative ll algorithm, like ISODATA, in which one must choose the
initial set of cluster points and the values of two parameters related to
the relative positions of the cluster points and to the dispersion of each
cluster before starting the procedure.
The algorithm with an objective function, like MEDOIDS and FUZZY
ISODATA, in which one must choose the number of cluster points only before
starting the procedure. In general, the objective function has the
following expression:
Fp ,q(C 1 ,···,C n )
n
:: l: .
1 J
e
p, q
( C. ) ,
J
where:
q
e (C.):: min oLC Ilx.- y./l
(1 ~ p ~ 00, q 2:1),
p,q J
Yj lE j
1
J P
Cj is the j-th cluster (1 ~ j ~ m),
x.1 is the i-th object (1 ~ i ~ n, n 2: m ),
Yj is the j-th cluster point (representative value for the j-th cluster Cj ),
i.e. the sum (on all the clusters) of the sum (on all the objects
belonging to each cluster) of the L distances to the q-th power of
the x. (i E C.) from the unknown clu~ter point y .. At the present, it
is po~sible td compute the cluster points y., tf,en to determinate C.
for p=q=l, p=q=2 and p=oo, q=l only.
J
J
Acknowledgements
The authors wish to thank very much prof. B. Benciolini and Mr. L.
Pa 11 ott i no , for the use of the i r programs to plot images: they were very
helpful to draw all the figures.
References
Ammannati F., Benciolini B., Mussio L., Sanso F. (1983): IIAn Experiment of
Collocation Applied to Digital Height Model Analysisll. Proceedings of the
Int. Colloquium on Mathematical Aspects of Digital Elevation Models,
Stockho 1m 19-20 Apri 1 1983, Dept. of Photogrammetry, Roya 1 I st i tute of
Technology, pp. 1.1 -1.9, Stockholm, 1983.
131
Ammannati F., Betti B., Mussio L. (1984): ilStatistical Analysis of Relevant
Terrain Features Through Digital Elevation Models". Int. Archives of
Photogrammetry and Remote Sensi ng, vo 1. XXV, part A3A/B, Ri 0 de Janei ro
17-29 June 1984, pp. 774-780, Rio de Janeiro, 1984.
Ball G.H., Hall D.J. (1967): IIA Clustering Tecnique for Summarizing
Multivarite Data ll • Behavioral Science, 12 (1967), pp. 153-155, 1967.
Colombo L., Fangi G., Mussio L., Radicioni F. (1986): "Further Development
on Digital Models in a Control Problem for the Ancona 182 Landslide".
Proceedings of the Int. Symp from Analytical to Digital of the ISPRS Comma
III, Rovaniemi 19-22 August 1986, Int. Archives of Photogrammetry and
Remote Sensing, vol. XXVI, parte 3.1, pp. 134-147, Rovaniemi, 1986.
Crippa B., Mussio L. (1987): liThe new ITM System of Programs MODEL for
Digital Modelling". Proceedings of the Int. Colloqui'um on Progress in
Terra in Mode 11 i ng, Copenhagen 20-22 May 1987, I nst i tute of Surveyi ng and
Photogrammetry, Technical University of Denmark, pp. 77-88, Lyngby, 1987.
Cunietti M., Fangi G., Mussio L., Radicioni F. (1984): IIBlock Adjustment
and Di gi ta 1 Mode 1 of Photogrammetri c Data ina Contro 1 Prob 1em for the
Ancona 82 Landsl ide ll • Int. Archives of Photogrannetry and Remote Sensing,
vol. XXV, part A3A/B, Rio de Janeiro 17-29 June 1984, pp. 774-780, Rio de
Janeiro, 1984.
Frederiksen P., Jacobi 0., Kubik K. (1984): IIModelling and Classifying
Terra in II. I nt. Arch i ves of Photogrannetry and Remote Sens i ng, vol. XXV,
part A3A/B, Ri 0 de Janei ro 17-29 June 1984, pp. 256-267, Ri 0 de Janei ro,
1984.
Kaufman L., Rousseeuw P. J. (1987): IICl usteri ng by means of Medoi ds ll • In
Dodge Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related
Methods, pp. 405-416, North-Holland, 1987.
Spath H. (1987): IIUsing the L1 norm within cluster analysis ll • In Dodge Y.
(Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods,
pp. 427-433, North-Holland, 1987.
Trauvert E. (1987): IILI in Fuzzy Clustering li • In Dodge Y. (Ed.),
Statistical Data Analysis Based on the L1-norm and Related Methods, pp.
417-426, North-Holland, 1987.
Figs. 1-2 - Contour lines of data and borders of the 64 subsets of the
Noiretable area and of the 59 subsets of the Ancona 182 landslide area.
132
1.00
0.80
0.60
0.40
0.20
-0.00 - t - - - - \ - - - - - - : f - - -
-0.20
Ji1
I I D II I tdll-l
.If)
j
0
If)
to
00
If)
I
0
If)
I"-
I
0
If)
00
i
00
i
0
If)
a
If)
(7)
(7)
a
I/)
If)
(0
0
i
0
0
If)
If)
i
I
0
0
If)
a
to
0
to
;0 ;0
(0
l"-
(7)
00
~
0
N
(0
I"-
(0
v
(0
If)
I"-
-0.80
0
If)
0
If)
to
a
I/)
I")
I")
(0
(0
(0
N
-0.60
If)
00
I
0
II')
-0.40
0
0
If)
II')
(0
(0
av
0
II')
If)
(0
I/)
(0
a
to
v
0
II')
to
0
~
0
(0
(0
0
II')
to
-1.00
(0
(0
I
o
Fig. 3 - Remarkable histogram and autocorrelation
table area.
0
o
0
0
0
0
('II
0
0
~
f")
000
000
CD
,...
If)
function for the Noire-
1 .00
0.80
0.60
0.40
0.20
-0.00
-0.20
(0
I")
N
v
-
(7)
If)
If)
I")
(0001")
_
-
-0.40
(7)
v
I")
I")
00
-0.60
o
i
0
0
0
I
I
I
a
o
0
o
o
o
('II
0
0
f")
000
0
0
0
~
If)
CD
a) situation in the zone with altimetric deformations
1.00
0.80
0.60
0.40
0.20
00
1"')
;::
I"-
(7)
N
l"-
(0
v
aa
I
I
If)
v
v
(J)
- -v
v
v
(J)
N
0
0
0 0 0 0
I
N
II')
I"-
v
0
-
-0.20
N
(7)
N
v
If)
I
I
0
0
~
I
I
0
0
('II
0
0
f")
0
0
~
0
0
If)
b) situation in the zone without altimetric deformations
Fig. 4 - Remarkable histograms and autocorrelation functions for the Ancona 182 landslide area.
133
-
0
...
I/)
I"')
~
I"')
0
-
0
i
I
I
-
0
0
i
..,.
..,.
..,.
..,.
to ~
r-1 r-1 ..,. ..,.
"'- "'- "'- "'"'- "'- "'- "'- "'- "'- "'Ol
Ol
Ol
'It" Ol
Ol
"'-
'It"
<1?
-
N
\0
I/)
N
Fig. 5 - Histogram and representation of cluster analysis for the kurtoses
of the subsets of the Noiretable area.
.......,
\0
Ol
"'-
I
I
Ol
I
I
\0
I
\0
I
I/)
I
M
-
i
....
j
-
0
I
0
I
_
....
0
N
M
I"')
~ ~
I/)
W\0
I/)
~ ro 00 Ol
"'-
0
I
N
a
m0
N
I/)
N
C?
....
--0
a
0
i
~ ~ ~ ~ ~ ~ ~ ~
NO WI
~ NO VI
~ NO WI
~ ON WI
~
WI
N
....
I
N
I/)
_
0
I
a
0
i
0N
I/)
N
N
j
N 0N
--
0
I
0
I
N 0N
I/)
r-1 r-1
'It"
..- .-
0
I
0
i
a
0
I
I
I/)
N
0N
I/)
N
aN
..,.
~
I/)
\0
a
0
I
N
I/)
-
.......,
0
I
I
aN
N'N N
W"'-
I
-
0
I
I/)
0
I/)
"'-
ro
00
I I
N
a I/)
N
m Ol
1:
1/
,',,'.
."
.... :
'
····l~~i~ii;!
.. •'. .,<
.. <
>.
.,
} ....
•> •. .
,,:? ::
....
....
/.
,:..
'
•. :...........
....
::.'
Fig. 6 - Histogram and representation of cluster analysis for the kurtoses
of the subsets of the Ancona ' 82 landslide area.
134
I
I
<0....<0
_ N N
I
I
I"')
<0
I"')
-..,.
<0
..,.
o
0
0
0
0
0
Iii
0
Fig. 7 - Histogram and representation of cluster analysis for the noise
standard deviations of the subsets of the Noiretable area.
I
I
N
~
..,.
..,.
I
i
N
~
~
~
j
N
<0
I
~
<0
I
I
N
~
~
~
iii - I
N
~
N
00000000000
Fig. 8 - Histogram and representation of cluster analysis for the noise
standard deviations of the subsets of the Ancona 182 landslide
area.
135
~
ro ro m m
0
C1
I
00
~
I
I
I
I
~
0
00
00
N
00
m
o0
iii
M
00
V
00
I
i
~
~
~
00
00
I
00
00
00
f
m
00
I
I
I
I
I
0
_
N
~
V
m m m m m
0 0 0 0 0 0 0 0 0 0 000 0 0
Fig. 9 - Histogram and representation of cluster analysis for the first
autocorrelation coefficient of the subsets of the Noiretable
area.
d II
~
~
~
b[T}TiJJ~
N
~
~
~
M
~
-
~
v
~
_
~
-
~
o
0
000 0 0 0 0 0 0 0 0 0 0 0
I
-
I
-
I
I
N
N
-
~
j
I
~
~
-
~
I
v
iii
~
v
-
~
~
~
I
I
~
~
-
~
-
_
o
I
,
~
i
~
~
I
-
00
Fig. 10 - Histogram and representation of cluster analysis for the first
autocorrelation coefficient of the subsets of the Ancona 182
landslide area.
136
;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;;
0 0 0 ~ N0 N 0 ~ 0'It'
0 ~
0 ~ 0 00 0
0
<0 <0
I/)
I/)
-
I/)
- - - - - ....
"- "-
(X)
(1)
I/)
I/)
(1)
N
N
I/)
(\j (\j
N
N
I")
I")
N
N
N
Fig. 11 - Histogram and representation of cluster analysis for the theorical autocorrelation functions first zero of the subsets of the
Noiretable area.
~
~
"-
~
0
<0
I/)
0
0
0
0
0
0
0
0
00-
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
(1)
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
....
00
00
00
00 00
.... 00 00
00
00
00
00
I/)
00
00
I")
00
N
N
I")
I/)
"-
(1)
I")
I/)
"-
(1)
....
(\j
"N
(1)
N
00 00
;:;;
I")
I")
Fig. 12 - Histogram and representation of cluster analisys for the theorical autocorrelation functions first zero of the subsets of the
Ancona 182 landslide area.
137
00
I/)
I")
00
"-
I")
Download