Clustering of Geotechnical Properties of Marine Sediments Through Self-Organizing Maps: An Example from the Zakynthos Canyon–Valley System, Greece

M.D. Ferentinou, T. Hasiotis, and M.G. Sakellariou

Abstract A methodology is proposed to investigate the clustering tendency of data on the geotechnical properties that describe the recent sedimentary cover at the head of the Zakynthos canyon/valley system in western Greece. The technology of unsupervised artificial neural networks (ANNs) is applied to data sets coming from a submarine environment. Self-organizing maps (SOMs) are used because of their visualization and clustering capabilities for analyzing high-dimensional data. SOMs implement an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. The detected clusters correspond to different sediment types (thus, they have a clear “physical meaning”) recognized from sedimentological analysis in each of the examined data sets. The algorithm is also designed for classification in terms of supervised learning and was applied to predict the appropriate sediment type in new data, incorporating geologists’ knowledge. A coupled model of SOMs using interaction matrix theory was finally applied to rate the examined geotechnical properties in an objective and quantified way. The results were reasonable and illustrate that the most dominant parameters in the studied area are undrained shear strength, water content and silt percentage.

Keywords Artificial neural networks • self organising maps • generic interaction matrix • geotechnical properties • submarine slides • Zakynthos Canyon • Greece

M.D. Ferentinou and M.G. Sakellariou
School of Rural and Surveying Engineering, Laboratory of Structural Mechanics, National Technical University of Athens, 15780 Zografou, Greece
e-mail: mferen@mail.ntua.gr

T. Hasiotis
Department of Marine Sciences, University of the Aegean, University Hill, 81100 Mytilene, Lesvos, Greece
e-mail: hasiotis@marine.aegean.gr

D.C. Mosher et al. (eds.), Submarine Mass Movements and Their Consequences, Advances in Natural and Technological Hazards Research, Vol 28, © Springer Science + Business Media B.V. 2010

1 Introduction and Scope

Submarine instabilities of slope deposits are an important mechanism of sediment transport and redeposition as well as a hazard to offshore development. Although marine geophysical surveys may report the occurrence of recent and old failures, it is the geotechnical character of collected sediment cores that is used (together with other parameters) for the computation of slope stability under various environmental forces (e.g. Lee and Baraza 1999; Lykousis and Chronis 1989; Lykousis et al. 2002, 2008, and references therein). Relationships between geotechnical properties give evidence of the physical conditions of sediments and a crude evaluation of their potential for instability (e.g. very low shear strengths, water content higher than the liquid limit).

In this paper the application of new mathematical tools is proposed in order to provide an integrated interpretation of geotechnical properties and to discover tendencies of property variations, which may explain the geotechnical behaviour of the submarine slope sediments. The technology of unsupervised ANNs is applied to evaluate the visualization and clustering capabilities of SOMs for analyzing high-dimensional data coming from marine environments. This methodology is suggested in order to investigate whether data collected from sediment cores tend to form clusters that have a clear physical meaning (i.e. sediment types), as evidenced by the sedimentological analysis. An important characteristic of SOMs is that they implement an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid.
Moreover, the analysis is focused on rating the importance of the related parameters, their dominance and their interaction intensity, using generic interaction matrix theory (Hudson 1992). Coupling SOM with generic interaction matrix theory was successfully applied to rate the variables controlling slope stability in subaerial slopes (Ferentinou and Sakellariou 2007). The input training data in this study are geotechnical properties of marine sediments (sand, silt and clay percentages, CaCO3, water content, Atterberg limits, wet bulk density and undrained shear strength), which were collected from the head of the Zakynthos valley/canyon system (Fig. 1).

Fig. 1 Location of the study area and sediment core stations. Isobaths in meters

2 Kohonen – Self Organising Maps

Kohonen (1994) has established techniques for unsupervised learning based on associative properties. These techniques involve networks in which different parts learn to respond to different input signals; they are called ordered maps. This method was first applied to speech recognition, and has subsequently been used for data analysis in system recognition, image analysis, environmental analysis, and geotechnical engineering. Basically, SOM is a visualization, clustering and projection tool, which illustrates structures in the data in a different manner than, for example, multivariate data analysis. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an “a priori” grid and provide a powerful tool for data visualization. Because of these characteristics, this specific training algorithm was applied to marine geotechnical data in order to investigate the non-linear relations and the tendency of cluster creation among the sediment geotechnical properties. The second version of the SOM Toolbox for Matlab (Vesanto et al. 1999) was used to perform the training of the ANNs.

In SOM, each neuron is represented by a weight or prototype vector, which has as many components as the dimension of the input space (i.e. as the number of input variables). In this study batch training was used, in which the sample vectors of the input data are presented to the SOM as a whole. The iterative process involves calculating and comparing the Euclidean distances between each sample vector and all the weight vectors of the SOM. For each particular input vector, the neuron whose weight vector is the best match (minimum distance) is chosen and called the Best Matching Unit (BMU). During each training step, the weight vectors are updated in such a way that the new vectors are weighted averages of the input data vectors. The neurons are connected to adjacent neurons by a neighbourhood function, which dictates the structure and the topology of the map. Each neuron of the SOM thus has two positions: one in the input space, given by its prototype vector, whose dimension equals that of the input space, and another in the output (lower-dimensional) space, on the map grid. This ordered grid (Fig. 2) can be used as a convenient visualization surface for showing different features of the SOM.

Fig. 2 Clustering visualizations using similarity coloring for south Killini slope. U-matrix on top left, component planes (as many as variables). Each map corresponding to one variable should be compared to the label map representing the distribution in Fig. 3
Panels: U-matrix; component planes for depth (cm), sand %, silt %, clay %, CaCO3 %, w %, γ (g/cm3), LL %, PL %, PI %, Su

3 Source Data

The Zakynthos Valley/Canyon system is located within the narrow inner shelf – slope of the western Hellenic Trench and is characterized as a structural basin trending parallel to the local tectonic zones (Brooks and Ferentinos 1984) (Fig. 1). The recent sedimentation processes at the head of Zakynthos canyon were studied by Hasiotis et al. (2005) using a suite of high resolution seismic profiles and 47 sediment cores. Zakynthos canyon is not directly connected to any fluvial drainage system and it does not have a high energy wave regime, nor are data available on internal wave resuspension and transportation of sediments to deeper waters.

The head of the canyon is bounded by the Killini and Zakynthos slopes, which are fault-controlled and are covered by a recent sediment drape. Extensive and complex mass movements affect both the recent sedimentary sequences and the fault-escarpment face. Along the Killini slopes the main type of failure is repeated retrogressive sliding, which was caused by the absence of downslope support of the leading blocks. The south Killini slope is also sculptured by large, extensive buried slide scarps. Along the Zakynthos slope, layered sediments overlie buried slumped and mass flow deposits. Extensive and complex failures affect the central part of the slope. Oversteepening of the slope due to salt diapirism has produced slide scarps up to 30 m in height. The Zakynthos valley is filled by extensive intercalated turbiditic/hemipelagic sediments and mass flow deposits. The extensive seafloor instabilities are attributed mainly to local tectonic activity, intense seismic activity and salt diapirism in relation to deep-seated gas ascension (Hasiotis et al. 2005).

The sedimentological analysis of the collected cores revealed the existence of five main sediment types (Table 1). In the current work attention is focused on the geotechnical properties of the sediments retrieved from the north and south Killini slopes and the Zakynthos slope, as measured in the laboratory (Fig. 1 and Tables 2, 3).

Table 1 Summarized description and interpretation of the observed sediment types

ST1: Pale yellowish-brown mud (1.5–26 cm)
    General description: Surficial deposit; homogeneous; high carbonate content
    Interpretation: Calcareous mud. Accumulation from suspension
ST2: Grey mud (mm up to 100 cm)
    General description: Almost homogeneous; locally displaying color banding
    Interpretation: Hemipelagic mud. Accumulation from suspension
ST3: Light brown to grey mud (3–9.5 cm)
    General description: Relative sharp contacts with the surrounding sediments; low shear strength and high water content
    Interpretation: Mud of high depositional rate. Accumulation from suspension
ST4: Sand to sandy mud (0.5–26 cm)
    General description: Sandy horizons
    Interpretation: Turbiditic sand to sandy mud. Gravity flow deposit
ST5: Plant debris (0.5–6 cm)
    General description: Well preserved debris of Posidonia oceanica; at the top of ST4 or intercalated with it
    Interpretation: Turbiditic origin

Table 2 Sediment core data studied (S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, Su: undrained vane shear strength, LL: liquid limit, PL: plastic limit, PI: plasticity index)

North Killini slope
    Core numbers: Z30, Z31, Z42, Z44
    Sediment types: ST1, ST2, ST4
    Studied parameters: Depth (cm), CaCO3 (%), w (%), g (g/cm3), Su (kPa)
South Killini slope
    Core numbers: Z8, Z9, Z11, Z40
    Sediment types: ST1, ST2, ST3, ST4
    Studied parameters: Depth (cm), S (%), Z (%), C (%), CaCO3 (%), w (%), g (g/cm3), Su (kPa), LL, PL, PI
Zakynthos slope
    Core numbers: Z2, Z3, Z27, Z28, Z34
    Sediment types: ST1, ST2, ST3, ST4
    Studied parameters: Depth (cm), S (%), Z (%), C (%), CaCO3 (%), w (%), g (g/cm3), Su (kPa), LL, PL, PI

Table 3 Range of sediment core geotechnical values (N. Kil: north Killini slope, S. Kil: south Killini slope, Zak: Zakynthos slope, d: core depth, S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, LL: liquid limit, PL: plastic limit, PI: plasticity index, Su: undrained vane shear strength)

             d (cm)  S (%)   Z (%)   C (%)   CaCO3 (%)  w (%)   g (g/cm3)  LL (%)  PL   PI   Su (kPa)
N. Kil  min  0       –       –       –       4.90       20.66   1.50       –       –    –    0.10
N. Kil  max  107     –       –       –       29.90      87.77   1.91       –       –    –    33.60
S. Kil  min  0       0.12    40.80   30.82   5.00       40.23   1.52       34      22   10   0.10
S. Kil  max  220     5.08    64.10   58.70   31.50      80.00   1.76       47      28   20   8.00
Zak     min  0       0.30    37.00   24.98   2.00       25.49   1.45       47      27   16   0.10
Zak     max  190     15.20   70.32   62.00   33.50      94.39   1.91       52      32   22   11.20

4 Results of Clustering

The data were organized in a matrix [dlen × dim], where dlen is the number of samples and dim is the number of input parameters. Proper data preparation is the most important step of the analysis procedure. It aims to (i) select the variables and data sets to be used, (ii) clean erroneous or uninteresting values from the data, (iii) transform the data into a format which the modelling tool can best utilize and (iv) normalize the values so that all parameters share a common scale and parameters with high values do not prevail.

In order to perform the following analysis using the SOM Toolbox, scripts originally written by J. Vesanto (1999) in Matlab were rewritten to satisfy the needs of the specific data set. A batch training algorithm was used. Three grids were created, one for each examined geographical subunit. The weights were initialized randomly. Training took place in two phases. The initial phase is a rough one, whereas the second is fine-tuning with a smaller neighbourhood radius and smaller learning rate. The neighbourhood function used was Gaussian. The methodology aims at cluster detection (projection in a lower-dimensional space) and at discovering non-linear relations between database items.
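
The training procedure described above can be sketched in a few lines. The following is a minimal pure-Python illustration, not the SOM Toolbox scripts used in the study: the function names, the rectangular 2 × 2 grid, the zero-mean/unit-variance scaling, the radius schedule and the four sample vectors are all assumptions made for the example. It shows the BMU search by Euclidean distance and the batch update, in which every prototype becomes a Gaussian-neighbourhood-weighted average of the input vectors, first with a large radius (rough phase) and then with a smaller one (fine-tuning).

```python
import math
import random


def normalize(data):
    """Scale every column to zero mean and unit variance (assumed scheme)."""
    dim = len(data[0])
    means = [sum(row[j] for row in data) / len(data) for j in range(dim)]
    stds = []
    for j in range(dim):
        var = sum((row[j] - means[j]) ** 2 for row in data) / len(data)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(row[j] - means[j]) / stds[j] for j in range(dim)] for row in data]


def best_matching_unit(sample, weights):
    """Index of the prototype vector closest to `sample` (Euclidean distance)."""
    dists = [sum((s - w) ** 2 for s, w in zip(sample, proto)) for proto in weights]
    return dists.index(min(dists))


def batch_epoch(data, weights, grid, radius):
    """One batch step: each prototype becomes a neighbourhood-weighted mean."""
    dim = len(data[0])
    bmus = [best_matching_unit(x, weights) for x in data]
    new_weights = []
    for pos in grid:
        num = [0.0] * dim
        den = 0.0
        for x, b in zip(data, bmus):
            d2 = (pos[0] - grid[b][0]) ** 2 + (pos[1] - grid[b][1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
            den += h
            for j in range(dim):
                num[j] += h * x[j]
        new_weights.append([v / den for v in num])
    return new_weights


def train_som(data, rows, cols, seed=0):
    """Random initialization, a rough phase, then fine-tuning (smaller radius)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    dim = len(data[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in grid]
    for radius in (2.0, 1.5, 1.0):    # rough phase (assumed schedule)
        weights = batch_epoch(data, weights, grid, radius)
    for radius in (0.75, 0.5, 0.5):   # fine-tuning phase (assumed schedule)
        weights = batch_epoch(data, weights, grid, radius)
    return weights, grid


# Hypothetical samples: [water content %, wet bulk density g/cm3, Su kPa]
samples = [[80.0, 1.50, 0.5], [78.0, 1.52, 0.8], [45.0, 1.75, 6.0], [42.0, 1.78, 7.5]]
normalized = normalize(samples)
weights, grid = train_som(normalized, rows=2, cols=2)
bmus = [best_matching_unit(x, weights) for x in normalized]
```

Samples that share a BMU, or whose BMUs are neighbours on the grid, are the ones a map of this kind treats as similar; the hexagonal grids and visualizations discussed in this section are built on top of the same prototype vectors.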
The small multiples technique was used, among others (scatter diagrams, hit histograms, trajectory analysis), for data visualization. Objects in small multiples can be linked together using similar position or place. In Fig. 2 a map display is constructed using the SOM algorithm for the south Killini data set. A multiple visualization consisting of 12 hexagonal grids is shown. The first map on the upper left is the U-matrix, with values indicated using similarity coloring. This map visualizes the training results and gives information about the general structure of the data and the clustering tendency (see color code map in Fig. 3). The multiple visualization is completed by the 11 maps called component planes. Each component plane refers to an input parameter. In these SOMs high values (hot colors) indicate the borders of the clusters, whereas low values (cold colors) characterize the clusters themselves. These visualizations can only be used to obtain qualitative information.

Fig. 3 Projection of South Killini data set (color code, PC projection and label map). The BMUs (35) are illustrated in color code map with score numbers. The black line defines the borders of the three clusters. Starting from upper left the clusters correspond to ST2a, ST2b, ST1

The default number of colors in the colormaps and colorbars is 64. However, it is often advantageous to use fewer colors in the colormap, because the component planes visualization then becomes easier to interpret. Here the eleven component planes are visualized using 64 colors, but we also applied the ‘hot’ colormap using only three colors. This is how we classified the parameters in Table 4 as low, medium and high. It should be mentioned that the parameter ranges shown in the columns adjacent to each component plane are narrower than the ranges presented in Table 3.
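
The three-level reading of a component plane can be illustrated as follows. This is only a sketch: the text does not state how the three color bands were delimited, so equal-width bins over the range of the plane are assumed here, and the function name and the sample values are hypothetical.

```python
def three_level(values):
    """Label each value low/medium/high using three equal-width bins (assumed rule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3.0
    labels = []
    for v in values:
        if width == 0 or v < lo + width:
            labels.append("low")       # a constant plane is labelled low throughout
        elif v < lo + 2 * width:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical prototype values of one component plane (e.g. water content, %)
plane = [41.0, 44.0, 55.0, 58.0, 70.0, 79.0]
levels = three_level(plane)
```

With these values the bin edges fall at roughly 53.7 and 66.3, so the first two prototypes are labelled low, the next two medium and the last two high.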
The first step in the analysis of the map is visual inspection (see enlarged label map in Fig. 3), which shows the existence of three main clusters. Two clusters clearly correspond to ST2. They are almost homogeneous and exhibit low carbonate content, low water content, and high wet bulk density. These subgroups differ in clay percentage, Atterberg limits, undrained shear strength and core depth. The third cluster corresponds to ST1, which is an almost homogeneous surficial unit and exhibits low values of silt, wet bulk density and shear strength and high values of carbonate content, water content and Atterberg limits. In general, undrained shear strength seems to increase with core depth, whereas wet bulk density is inversely correlated with water content. General trends arising from the current analysis reveal an association between grain size, water content and Atterberg limits. The general trends are also in accordance with Lykousis et al. (2008).

Table 4 Clustering results according to SOM (S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, Su: undrained vane shear strength, LL: liquid limit, PL: plastic limit, PI: plasticity index)

Slope                Cluster  depth   S     Z     C       CaCO3  w       g       LL      PL    PI      Su      ST
North Killini slope  1st      Low     –     –     –       High   High    Low     –       –     –       Low     ST1
North Killini slope  2nd      High    –     –     –       Low    High    Medium  –       –     –       Low     ST2a
North Killini slope  3rd      Medium  –     –     –       Low    Low     High    –       –     –       High    ST2b
South Killini slope  1st      High    Low   High  High    Low    Low     High    Medium  Low   Low     High    ST1
South Killini slope  2nd      Medium  Low   High  Medium  Low    Low     High    High    High  High    Low     ST2a
South Killini slope  3rd      Low     Low   Low   High    High   High    Low     Low     High  Medium  High    ST2b
Zakynthos slope      1st      Low     Low   Low   High    High   High    Low     –       –     –       Low     ST2a
Zakynthos slope      2nd      High    Low   Low   High    Low    Medium  Low     –       –     –       Medium  ST2b
Zakynthos slope      3rd      Low     High  High  Low     High   Low     High    –       –     –       High    ST1

A principal component projection is made for the data and is applied to the map (Fig. 3).
Three visualizations are illustrated: the color code, with clustering information and the number of hits in each unit, the projection, and the labels. The projection confirms the existence of three different clusters, and interpolative units seem to divide the ST2 group into two subclasses, the difference being mainly in clay percentage, shear strength, Atterberg limits and core depth. The most informative visualizations of all those offered by SOM are simple scatter plots and histograms of all variables (Fig. 4). The sediment type information is coded as an 11th parameter. Original data points (N = 96) are in the upper triangle, map prototype values in the lower triangle and histograms on the diagonal. The color coding of the data samples has been copied from the color code map (from the BMU of each sample). This visualization reveals quite a lot of information: the distributions of single variables and of pairs of variables, both in the data and in the map.

Fig. 4 11 × 11 scatter diagram for the South Killini data set (N = 96 samples)

From this visualization many of the earlier conclusions are confirmed. For example, there appear to be three clusters, ST2a (dark green, blue), ST2b (green) and ST1 (yellow). Shear strength has a high linear correlation with core depth, and carbonate content is highly correlated with LL and PI. The training of the north Killini and Zakynthos slope data sets (Table 4) also revealed three clusters, of which two correspond to subgroups of ST2 and one to ST1. The trends in geotechnical properties are also similar to those of the south Killini slope. Shear strength is linearly correlated with core depth and water content is inversely correlated with wet bulk density.
The algorithm was also applied for classification, in order to predict the appropriate sediment type for new data by incorporating geologists’ knowledge; the calculated accuracy was 79.2%, 65% and 89.1% for the south Killini, the north Killini and the Zakynthos data sets, respectively. Chang et al. (2002) used SOM on well log data and predicted lithofacies identity with 78.8% accuracy.

5 Parameters Rating: Interaction Matrix Theory and Cause/Effect Plot

Hudson (1992) suggested an analytical approach for representing rock engineering systems, as opposed to a synthetic approach, with the development of the interaction matrix device in order to represent the relevant parameters, their interactions, and the rock mass/construction behaviour. The principal factors considered relevant to the problem are listed along the leading diagonal of a square matrix (top left to bottom right) and the interactions between pairs of principal factors form the off-diagonal terms. Ferentinou and Sakellariou (2007) applied the method in landslide hazard estimation and extended the interaction matrix to soil mechanics. The method proposes coding the interaction matrix and studying the interaction intensity and dominance of each parameter. For each principal factor, its “Cause – Effect” (C, E) coordinates can be developed. These are the sums of the values in the row and in the column through each principal factor. The coordinates are plotted in a ‘Cause – Effect’ space (Fig. 5). Cause (C) is the way in which the parameter affects the system, whereas effect (E) is the effect of the system on the parameter. Parameter interaction intensity increases from zero to a maximum value which is actually equal to the dimension of the matrix. The associated maximum possible parameter dominance values rise from zero to a maximum at 50% parameter interaction intensity and then reduce back to zero.

This idea of the interaction matrix was applied to the three studied data sets. The scatter diagrams (Fig. 4) produced for each data set were coded using a binary system. The elements not belonging to the main diagonal were assigned a value of 1 if they showed a strong correlation; otherwise they were assigned a value of 0. The elements of the leading diagonal were assigned a value of 0. The cause/effect plots produced for each data set are presented in Fig. 5.

Fig. 5 Cause effect plots for the three data sets
Panels: North Killini, South Killini, Zakynthos; axes: Cause and Effect, with the line C = E marked

According to the three data sets, the most dominant parameters are undrained shear strength and water content. The most interactive parameter is carbonate content for the north Killini slope, and silt percentage for the south Killini and Zakynthos slopes. Sand percentage is the least dominant and least interactive.

6 Discussion – Conclusions

Although marine geotechnical properties have been thoroughly studied, mainly for the evaluation of slope stability, this is the first time that marine sediment properties are treated with the above mentioned methodology. This study reveals that SOM can be an effective tool for classifying different sediment core samples according to their litho-geomorphological type. More specifically, the three data sets from the slopes bounding the upper part of the Zakynthos canyon/valley system revealed three clusters, taking into account the integration of multiple variables. These clusters correspond to different sediment types (thus, they have a clear “physical meaning”) recognised from sedimentological analysis in each of the three data sets. One cluster corresponds to the surficial deposit of pale yellowish brown mud (ST1) and the other two correspond to subgroups of grey mud (ST2a, ST2b).
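
The cause/effect coordinates of Section 5 can be illustrated with a small worked sketch. The 4 × 4 binary matrix below is hypothetical and is not the coding obtained in this study; the function name is likewise an assumption. Row sums give C and column sums give E, and, following the usual reading of Hudson's plot, C + E is taken here as a measure of interaction intensity and C − E as a measure of dominance.

```python
def cause_effect(matrix):
    """Return (C, E) per parameter: C is the row sum, E is the column sum."""
    n = len(matrix)
    causes = [sum(matrix[i]) for i in range(n)]
    effects = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return causes, effects


names = ["depth", "w", "g", "Su"]

# Hypothetical binary coding: entry [i][j] = 1 if parameter i is coded as
# strongly related to parameter j, 0 otherwise; the leading diagonal is 0.
coded = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]

causes, effects = cause_effect(coded)
intensity = [c + e for c, e in zip(causes, effects)]   # position along C = E
dominance = [c - e for c, e in zip(causes, effects)]   # offset from C = E

for name, c, e in zip(names, causes, effects):
    print(f"{name}: C={c}, E={e}, C+E={c + e}, C-E={c - e}")
```

In this toy matrix "w" and "Su" have the largest C + E and would plot furthest along the C = E diagonal, while parameters with positive C − E plot on the cause-dominant side of that line.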
Another advantage of SOM is that it offers real insight into the data set, especially through the simple scatter plots. For example, although wet bulk density does not show any clear trend with core depth in the three data sets, map training of the Zakynthos and north Killini data sets leads to the conclusion that there is a linear correlation, but within each subcluster. SOM component planes reveal useful information that makes the results easier to interpret and that sometimes remains hidden with traditional approaches.

On the other hand, one of the drawbacks of SOM is that it eliminates outlier data. This is probably why ST3 was not considered an important sediment type by the ANN and was not recognised as a clear cluster. The core data representing this thin layer were generally few (one thin layer in each core) and sometimes incomplete; consequently there were not enough input data to represent this sediment type within the data set. The ST4 unit has a high score in the map corresponding to the Zakynthos data set, though it was also recognised in south Killini, where it appears to have a low score. Again, this particular lithological unit is not adequately represented in the data set. Generally, in order to converge and predict successfully, ANNs have to be trained with data representative of the system they are meant to simulate.

A coupled model of SOM networks using interaction matrix theory was finally applied in order to rate the examined geotechnical properties in an objective and quantified way. The results are again reasonable and illustrate that the most dominant parameters in the studied area are undrained shear strength, water content and silt percentage.

Acknowledgments The authors would like to thank the reviewers V. Lykousis, T. Glade and H. Lee for their constructive suggestions. M.F. was supported by a post-doctoral fellowship of the Greek State Scholarships Foundation.

References

Brooks, M., Ferentinos, G., 1984. Tectonics and sedimentation in the Gulf of Corinth and Kephalonia – Zante Straits, Ionian Sea, Greece. Tectonophysics, 101, 25–54.
Chang, H.-C., Kopaska-Merkel, D.C., Chen, H.-C., 2002. Identification of lithofacies using Kohonen self-organising maps. Computers and Geosciences, 28, 223–229.
Ferentinou, M., Sakellariou, M., 2007. Computational intelligence tools for the prediction of slope performance. Computers and Geotechnics, 34, 362–384.
Hasiotis, T., Papatheodorou, G., Ferentinos, G., 2005. A high resolution approach in the recent sedimentation processes at the head of Zakynthos Canyon, Western Greece. Marine Geology, 214, 49–73.
Hudson, J.A., 1992. Rock Engineering Systems: Theory and Practice. Horwood, Chichester.
Kohonen, T., 1994. Self-Organising Maps. Springer, New York.
Lee, H., Baraza, J., 1999. Geotechnical characteristics and slope stability in the Gulf of Cadiz. Marine Geology, 155, 173–190.
Lykousis, V., Chronis, G., 1989. Mass movements, geotechnical properties and slope stability in the outer shelf-upper slope, NW Aegean Sea. Marine Geotechnology, 8, 231–247.
Lykousis, V., Roussakis, G., Alexandri, M., Pavlakis, P., Papoulia, I., 2002. Sliding and regional slope stability in active margins: North Aegean Trough (Mediterranean). Marine Geology, 186, 281–298.
Lykousis, V., Roussakis, G., Sakellariou, D., 2008. Slope failures and stability analysis of shallow water prodeltas in the active margins of Western Greece, northeastern Mediterranean Sea. International Journal of Earth Sciences, 98, 807–822.
Vesanto, J., 1999. SOM-based data visualisation methods. Intelligent Data Analysis, 3(2), 111–126.