Terrain classification by cluster analysis M. Crespi (.), G. Forlani (.), L. Mussio(.), F. Radicioni ( .. ) (.) Istituto di Topografia, Fotogrammetria e Geofisica Politecnico di Milano - Piazza L. da Vinci, 32 - 20133 Milano. ( .. ) Dipartimento di Scienze dei Materiali e della Terra Universita di Ancona - Via Brecce Bianche - 60100 Ancona. Italy Commission No. III 1. Introduction The digital terrain modelling can be obtained by different methods belonging to two principal categories: deterministic methods (e.g. polinomial and spline functions interpolation, Fourier spectra) and stochastic methods (e.g. least squares collocation and fractals, i.e. the concept of selfsimilarity in probability). To reach good resul ts, both the fi rst and the second methods need same initial suitable information which can be gained by a preprocessing of data named terrain classification. In fact, the deterministic methods require to know how is the roughness of the terrain, related to the density of the data (elevations, deformations, etc.) used for the i nterpo 1at ion, and the stochast i c methods ask for the knowledge of the autocorrelation function of the data. Moreover, may be useful or very necessary to sp 1it up the area under consideration in subareas homogeneous according to some parameters, because of different kinds of reasons (too much large initial set of data, so that they can't be processed togheter; very important discontinuities or singularities; etc.). Last but not least, may be remarkable to test the type of distribution (normal or non-normal) of the subsets obtained by the preceding selection, because the statistical properties of the normal distribution are very important (e.g., least squares linear estimations are the same of maximum likelihood and minimum variance ones). A general way to perform the terra inc 1ass i fi cat i on can cons i sts of the following steps: 1 - to split up the original set of data into subsets (may be useful dimensions of these subsets are almost equal); 2 - to compute statistical parameters of each subset; 3 - to cluster these parameters to obtain homogeneous groups (clusters) represented by one value only (cluster point), so that the objects belonging to a cluster show a high degree of similarity while the ones belonging to different clusters are as dissimilar as possible; 4 - to test the significance of each cluster point and to modify the subsets; 5 - to subdivide the test area according to the results of the cluster analysis. The elevations of the Noiretable area and the deformations of the Ancona 182 landslide are analyzed. Before showing the results of the research, the algorithm used to perform the cluster analysis is described. 2. The ISODATA algorithm This algorithm was originally published by G.H. Ball and DeJ. Hall in the 1967; some changes were introduced in the steps 3) and 5) as regards to split and to merge the groups (clusters). The essential steps of the tecnique are: 1 - to choose the initial set of cluster points and the values for the min- 128 2 3 - 4 5 - 6 The imum norm (e.g. the Euclidean distance) between two cluster points (lidminll) and for the maximum dispersion parameter (e.g. the standard deviation) of each group (lismaxll); to group togheter the data closest to the same cluster point according to the norm chosen; to split into two groups each group whose dispersion parameter is larger than "smaxll; the consequence of this operation is the birth of a new cluster point (the modality of this birth is arbitrary; the choice consists in the comparison of dispersions after having split up a group into two ones whose size are balanced according to the ratio 1:3, 2:2, 3:1; to regroup togheter the data taking into account the new cluster points; to merge into a group the groups whose cluster points are closer than IIdminll; the consequence of this operation is the death of some cluster points (the modality of this death is arbitrary; the choice consists in the joining of two or three groups, the second possibility occours if the number of groups is odd); to iterate the procedure from the step 2) (the implemented version of ISODATA a 1gori thm stops if in 3 consecuti ve i terati ons the number of cluster points isn't changed and however after 25 iterations). program CLUSTER was implemented to perform the algorithm. 3. Subdivision of the original sets of data The Noiretable test area is a hilly zone of central France, whose shape is a square wi th the sides about 3.2 km long. A very regul ar gri d of about 6600 measured elevations (average spacing 40 m) was used for computation. The Ancona 182 landslide area is a zone in the district of Ancona (Italy) interested by a 1a rge sca 1e 1ands 1ide, whose shape is almost rectangu 1a r with sides about 2.1 and 3.0 km long. About 150 cross-sections of different lengths and densities about 8900 measured altimetric deformations (average spacing 20 m) were used for computations. Taking into account the shapes of the two areas and the positions of the measurements, they were subdivided into 64 and 59 subsets respectively, as shown in the figs. 1 and 2. In such a way, subsets of data very homogeneous for the fi rst area (the size is about 100 points) and homogeneous enough for the second one (the size ranges about 80-220 points) as regards their dimensions were obtained. 4. Statistical analysis of the subsets The following statistical elaborations were carried out: - computation of average, standard deviation, asymmetry, kurtosis and first autocorrelation coefficient for each subset; - construction of the histograms of these statistical parameters; - construction of the histograms of the data for each subset (some examples in figs. 3 and 4); - computation of the empirical autocorrelation functions of the data for each subset; - best fitting of these functions with teorical ones (some examples in fi gs. 3 and 4); - construction of the histrograms of the noise standard deviation and of the theorical autocorrelation function first zero. Note that the histograms of the signal standard deviation are the same of the histograms of the first autocorrelation coefficient. Two conclusions were reached by this statistical analysis: - in general, the subsets haven't a normal distribution; - all the theorical autocorrelation functions which best fit the empirical 129 ones belong to the family of the J Bessel functions which have the 0 following expression: f ( x ) = a J (cx), where a and c are two param~ters related to the variance of the signal and the first zero. Note that the subsets of Ancona 82 landslide area were divided into two groups accord i ng to the resu 1t of compa rison between the va 1ues of the noise and the signal standard deviation as an altimetric deformation didn1t happen in all the grid-points, so that some autocorrelation functions donUt represent a stochastic process but a white noise only. These deducti ons are important because the fi rst estab 1 i shes the 1east squares method can1t reach the best results in this case, and the second gi ves a very exact i nformati on about the most sui tab 1e theori ca 1 autocorrelation function to use in the stocastic approach to the digital terrain modelling. l 5. Cluster analysis of statistical parameters The cluster analysis was applied to all the statistical parameter computed before. The results regarding kurtosis, noise standard deviation, first autocorre 1ati on coeffi ci ent and theori ca 1 autocorre 1ati on functi on fi rst zero are shown only for the sake of brevity. On the other hand, these last are the most meaningful results, because the kurtosis is generally discriminant about the normality of the distribution and the other parameters characterize completely the autocorrelation function, then the stochastic process. The choices regarding the initial set of cluster points and the values for IIsmax" and "dminll were made in the following way for each statistical parameter: - the initial cluster points were the averages of the modal classes found by the histogram; - the values for "smax " and IIdminli were assumed equal to the double of the class interval of the histogram. At the present a significance test for of the cluster points isn1t available. Indeed to test them, distribution-free tests concerning non-random samples are necessary. So the figures show the results of the ISODATA algorithm without subsequent refinings; however some observations can be developed. The grey tonal ity of the fig. 5 is uniform: the kurtoses of the subsets be long i ng to the No i retab 1e a rea are grouped togheter a round one cluster point only, whose value is remarkably lower than 3 (the significance is evalueted by the D1Agostino-Pearson test) according to the powerful Kolmogorov-Smirnov test. It means not only the most part of subsets haven't a normal distribution (same of them could have it, but they are so few they can1t cause the birth of an other cluster point), but also the global set of data hasn't a normal distribution. Also the most part of the subsets of data of the Ancona 182 landslide has a kurtosis remarkably greater than 3. Indeed principal clusters show a behaviour apart from the normality, but the geomorphological interpretation of each cluster isn1t immediate (see fig. 6). The grey tonality of the fig. 7 is uniform: the noise standard deviation of the subsets bel ongi ng to the Noi retab 1e area are grouped togheter around one cluster point only whose square value is remarkably lower than 0.5 times the standard deviation of the process. It means there is a stochastic process as regards the elevations in each part of the area. The clearest grey tonality of the fig. 8 corresponds to the cluster point of the lowest value for the noise standard deviation computed on Ancona '82 130 1ands 1 i de area data; its square va 1ue is remarkab 1y lower than 0.5 times the standard deviation of the process. It means there is a stochastic process as regards the a1timetri c deformati ons in some parts of the area only, that is true. Moreover, the comparison between figs. 2 and 8 shows the classification by cluster analysis is able to distinguish the parts interested or no by landsl ide deformation exactly enough, i.e. the parts with and without the stochastic signal as regards the altimetric deformations. Some subsets belonging to the middle grey tonality are located along the border of the landslide; a more refined partition could solve the ambiguity. Note that the cluster anal ys is app 1i ed to the averages can be usefu 1 to recognize very important discontinuities or singularities. Besides the cluster analysis applied to all the theorical autocorrelation function parameters (see figs. 9, 10, 11 and 12) individuates connected homogenous subareas useful for a successive data filtering. Appendix There are two fundamental types of algorithms for cluster analysis. The lIiterative ll algorithm, like ISODATA, in which one must choose the initial set of cluster points and the values of two parameters related to the relative positions of the cluster points and to the dispersion of each cluster before starting the procedure. The algorithm with an objective function, like MEDOIDS and FUZZY ISODATA, in which one must choose the number of cluster points only before starting the procedure. In general, the objective function has the following expression: Fp ,q(C 1 ,···,C n ) n :: l: . 1 J e p, q ( C. ) , J where: q e (C.):: min oLC Ilx.- y./l (1 ~ p ~ 00, q 2:1), p,q J Yj lE j 1 J P Cj is the j-th cluster (1 ~ j ~ m), x.1 is the i-th object (1 ~ i ~ n, n 2: m ), Yj is the j-th cluster point (representative value for the j-th cluster Cj ), i.e. the sum (on all the clusters) of the sum (on all the objects belonging to each cluster) of the L distances to the q-th power of the x. (i E C.) from the unknown clu~ter point y .. At the present, it is po~sible td compute the cluster points y., tf,en to determinate C. for p=q=l, p=q=2 and p=oo, q=l only. J J Acknowledgements The authors wish to thank very much prof. B. Benciolini and Mr. L. Pa 11 ott i no , for the use of the i r programs to plot images: they were very helpful to draw all the figures. References Ammannati F., Benciolini B., Mussio L., Sanso F. (1983): IIAn Experiment of Collocation Applied to Digital Height Model Analysisll. Proceedings of the Int. Colloquium on Mathematical Aspects of Digital Elevation Models, Stockho 1m 19-20 Apri 1 1983, Dept. of Photogrammetry, Roya 1 I st i tute of Technology, pp. 1.1 -1.9, Stockholm, 1983. 131 Ammannati F., Betti B., Mussio L. (1984): ilStatistical Analysis of Relevant Terrain Features Through Digital Elevation Models". Int. Archives of Photogrammetry and Remote Sensi ng, vo 1. XXV, part A3A/B, Ri 0 de Janei ro 17-29 June 1984, pp. 774-780, Rio de Janeiro, 1984. Ball G.H., Hall D.J. (1967): IIA Clustering Tecnique for Summarizing Multivarite Data ll • Behavioral Science, 12 (1967), pp. 153-155, 1967. Colombo L., Fangi G., Mussio L., Radicioni F. (1986): "Further Development on Digital Models in a Control Problem for the Ancona 182 Landslide". Proceedings of the Int. Symp from Analytical to Digital of the ISPRS Comma III, Rovaniemi 19-22 August 1986, Int. Archives of Photogrammetry and Remote Sensing, vol. XXVI, parte 3.1, pp. 134-147, Rovaniemi, 1986. Crippa B., Mussio L. (1987): liThe new ITM System of Programs MODEL for Digital Modelling". Proceedings of the Int. Colloqui'um on Progress in Terra in Mode 11 i ng, Copenhagen 20-22 May 1987, I nst i tute of Surveyi ng and Photogrammetry, Technical University of Denmark, pp. 77-88, Lyngby, 1987. Cunietti M., Fangi G., Mussio L., Radicioni F. (1984): IIBlock Adjustment and Di gi ta 1 Mode 1 of Photogrammetri c Data ina Contro 1 Prob 1em for the Ancona 82 Landsl ide ll • Int. Archives of Photogrannetry and Remote Sensing, vol. XXV, part A3A/B, Rio de Janeiro 17-29 June 1984, pp. 774-780, Rio de Janeiro, 1984. Frederiksen P., Jacobi 0., Kubik K. (1984): IIModelling and Classifying Terra in II. I nt. Arch i ves of Photogrannetry and Remote Sens i ng, vol. XXV, part A3A/B, Ri 0 de Janei ro 17-29 June 1984, pp. 256-267, Ri 0 de Janei ro, 1984. Kaufman L., Rousseeuw P. J. (1987): IICl usteri ng by means of Medoi ds ll • In Dodge Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods, pp. 405-416, North-Holland, 1987. Spath H. (1987): IIUsing the L1 norm within cluster analysis ll • In Dodge Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods, pp. 427-433, North-Holland, 1987. Trauvert E. (1987): IILI in Fuzzy Clustering li • In Dodge Y. (Ed.), Statistical Data Analysis Based on the L1-norm and Related Methods, pp. 417-426, North-Holland, 1987. Figs. 1-2 - Contour lines of data and borders of the 64 subsets of the Noiretable area and of the 59 subsets of the Ancona 182 landslide area. 132 1.00 0.80 0.60 0.40 0.20 -0.00 - t - - - - \ - - - - - - : f - - - -0.20 Ji1 I I D II I tdll-l .If) j 0 If) to 00 If) I 0 If) I"- I 0 If) 00 i 00 i 0 If) a If) (7) (7) a I/) If) (0 0 i 0 0 If) If) i I 0 0 If) a to 0 to ;0 ;0 (0 l"- (7) 00 ~ 0 N (0 I"- (0 v (0 If) I"- -0.80 0 If) 0 If) to a I/) I") I") (0 (0 (0 N -0.60 If) 00 I 0 II') -0.40 0 0 If) II') (0 (0 av 0 II') If) (0 I/) (0 a to v 0 II') to 0 ~ 0 (0 (0 0 II') to -1.00 (0 (0 I o Fig. 3 - Remarkable histogram and autocorrelation table area. 0 o 0 0 0 0 ('II 0 0 ~ f") 000 000 CD ,... If) function for the Noire- 1 .00 0.80 0.60 0.40 0.20 -0.00 -0.20 (0 I") N v - (7) If) If) I") (0001") _ - -0.40 (7) v I") I") 00 -0.60 o i 0 0 0 I I I a o 0 o o o ('II 0 0 f") 000 0 0 0 ~ If) CD a) situation in the zone with altimetric deformations 1.00 0.80 0.60 0.40 0.20 00 1"') ;:: I"- (7) N l"- (0 v aa I I If) v v (J) - -v v v (J) N 0 0 0 0 0 0 I N II') I"- v 0 - -0.20 N (7) N v If) I I 0 0 ~ I I 0 0 ('II 0 0 f") 0 0 ~ 0 0 If) b) situation in the zone without altimetric deformations Fig. 4 - Remarkable histograms and autocorrelation functions for the Ancona 182 landslide area. 133 - 0 ... I/) I"') ~ I"') 0 - 0 i I I - 0 0 i ..,. ..,. ..,. ..,. to ~ r-1 r-1 ..,. ..,. "'- "'- "'- "'"'- "'- "'- "'- "'- "'- "'Ol Ol Ol 'It" Ol Ol "'- 'It" <1? - N \0 I/) N Fig. 5 - Histogram and representation of cluster analysis for the kurtoses of the subsets of the Noiretable area. ......., \0 Ol "'- I I Ol I I \0 I \0 I I/) I M - i .... j - 0 I 0 I _ .... 0 N M I"') ~ ~ I/) W\0 I/) ~ ro 00 Ol "'- 0 I N a m0 N I/) N C? .... --0 a 0 i ~ ~ ~ ~ ~ ~ ~ ~ NO WI ~ NO VI ~ NO WI ~ ON WI ~ WI N .... I N I/) _ 0 I a 0 i 0N I/) N N j N 0N -- 0 I 0 I N 0N I/) r-1 r-1 'It" ..- .- 0 I 0 i a 0 I I I/) N 0N I/) N aN ..,. ~ I/) \0 a 0 I N I/) - ......., 0 I I aN N'N N W"'- I - 0 I I/) 0 I/) "'- ro 00 I I N a I/) N m Ol 1: 1/ ,',,'. ." .... : ' ····l~~i~ii;! .. •'. .,< .. < >. ., } .... •> •. . ,,:? :: .... .... /. ,:.. ' •. :........... .... ::.' Fig. 6 - Histogram and representation of cluster analysis for the kurtoses of the subsets of the Ancona ' 82 landslide area. 134 I I <0....<0 _ N N I I I"') <0 I"') -..,. <0 ..,. o 0 0 0 0 0 Iii 0 Fig. 7 - Histogram and representation of cluster analysis for the noise standard deviations of the subsets of the Noiretable area. I I N ~ ..,. ..,. I i N ~ ~ ~ j N <0 I ~ <0 I I N ~ ~ ~ iii - I N ~ N 00000000000 Fig. 8 - Histogram and representation of cluster analysis for the noise standard deviations of the subsets of the Ancona 182 landslide area. 135 ~ ro ro m m 0 C1 I 00 ~ I I I I ~ 0 00 00 N 00 m o0 iii M 00 V 00 I i ~ ~ ~ 00 00 I 00 00 00 f m 00 I I I I I 0 _ N ~ V m m m m m 0 0 0 0 0 0 0 0 0 0 000 0 0 Fig. 9 - Histogram and representation of cluster analysis for the first autocorrelation coefficient of the subsets of the Noiretable area. d II ~ ~ ~ b[T}TiJJ~ N ~ ~ ~ M ~ - ~ v ~ _ ~ - ~ o 0 000 0 0 0 0 0 0 0 0 0 0 0 I - I - I I N N - ~ j I ~ ~ - ~ I v iii ~ v - ~ ~ ~ I I ~ ~ - ~ - _ o I , ~ i ~ ~ I - 00 Fig. 10 - Histogram and representation of cluster analysis for the first autocorrelation coefficient of the subsets of the Ancona 182 landslide area. 136 ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; ;:;; 0 0 0 ~ N0 N 0 ~ 0'It' 0 ~ 0 ~ 0 00 0 0 <0 <0 I/) I/) - I/) - - - - - .... "- "- (X) (1) I/) I/) (1) N N I/) (\j (\j N N I") I") N N N Fig. 11 - Histogram and representation of cluster analysis for the theorical autocorrelation functions first zero of the subsets of the Noiretable area. ~ ~ "- ~ 0 <0 I/) 0 0 0 0 0 0 0 0 00- (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 .... 00 00 00 00 00 .... 00 00 00 00 00 00 I/) 00 00 I") 00 N N I") I/) "- (1) I") I/) "- (1) .... (\j "N (1) N 00 00 ;:;; I") I") Fig. 12 - Histogram and representation of cluster analisys for the theorical autocorrelation functions first zero of the subsets of the Ancona 182 landslide area. 137 00 I/) I") 00 "- I")