Clustering of Geotechnical Properties of Marine Sediments Through Self-Organizing Maps:
An Example from the Zakynthos Canyon–Valley System, Greece
M.D. Ferentinou, T. Hasiotis, and M.G. Sakellariou
Abstract A methodology is proposed to investigate the clustering tendency of data on the geotechnical properties of the recent sedimentary cover at the head of the Zakynthos canyon/valley system in western Greece. Unsupervised artificial neural networks (ANNs) are applied to these data sets from a submarine environment. Self-organizing maps (SOMs) are used because of their visualization and clustering capabilities for analysing high-dimensional data: SOMs implement an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. The detected clusters correspond to different sediment types (thus, they have a clear "physical meaning") recognized from sedimentological analysis in each of the examined data sets. The algorithm was also used for supervised classification, to predict the appropriate sediment type of new data by incorporating geologists' knowledge. Finally, a coupled model of SOMs and interaction matrix theory was applied to rate the examined geotechnical properties in an objective, quantified way. The results were reasonable and show that the most dominant parameters in the studied area are undrained shear strength, water content and silt percentage.
Keywords Artificial neural networks • self-organising maps • generic interaction matrix • geotechnical properties • submarine slides • Zakynthos Canyon • Greece
M.D. Ferentinou and M.G. Sakellariou
School of Rural and Surveying Engineering, Laboratory of Structural Mechanics, National Technical University of Athens, 15780 Zografou, Greece
e-mail: mferen@mail.ntua.gr
T. Hasiotis
Department of Marine Sciences, University of the Aegean, University Hill, 81100 Mytilene, Lesvos, Greece
e-mail: hasiotis@marine.aegean.gr
D.C. Mosher et al. (eds.), Submarine Mass Movements and Their Consequences,
Advances in Natural and Technological Hazards Research, Vol 28,
© Springer Science + Business Media B.V. 2010
1 Introduction and Scope
Submarine instabilities of slope deposits are an important mechanism of sediment transport and redeposition, as well as a hazard to offshore development. Although marine geophysical surveys may reveal the occurrence of recent and old failures, it is the geotechnical character of collected sediment cores that is used (together with other parameters) to compute slope stability under various environmental forces (e.g. Lee and Baraza 1999; Lykousis and Chronis 1989; Lykousis et al. 2002, 2008, and references therein). Relationships between geotechnical properties give evidence of the physical condition of sediments and allow a crude evaluation of their potential for instability (e.g. very low shear strengths, water content higher than the liquid limit).
In this paper, new mathematical tools are proposed to provide an integrated interpretation of geotechnical properties and to discover trends in property variations that may explain the geotechnical behaviour of submarine slope sediments. Unsupervised ANNs are applied to evaluate the visualization and clustering capabilities of SOMs for analysing high-dimensional data from marine environments. The methodology investigates whether data collected from sediment cores tend to form clusters with a clear physical meaning, namely the sediment types identified by sedimentological analysis. An important characteristic of SOMs is that they implement an orderly mapping of a high-dimensional distribution onto a regular low-dimensional grid. The analysis also rates the importance of the related parameters, their dominance and their interaction intensity using generic interaction matrix theory (Hudson 1992). Coupling SOMs with generic interaction matrix theory has previously been applied successfully to rate the variables controlling the stability of subaerial slopes (Ferentinou and Sakellariou 2007).
The input training data in this study are geotechnical properties of marine sediments (sand, silt and clay percentages, CaCO3 content, water content, Atterberg limits, wet bulk density and undrained shear strength) collected from the head of the Zakynthos valley/canyon system (Fig. 1).
2 Kohonen Self-Organising Maps
Kohonen (1994) established techniques for unsupervised learning based on associative properties. These techniques involve networks whose different parts learn to respond to different input signals; they are called ordered maps. The method was first applied to speech recognition and has subsequently been used for data analysis in system recognition, image analysis, environmental analysis and geotechnical engineering. Essentially, the SOM is a visualization, clustering and projection tool that reveals structures in the data in a different manner from, for example, multivariate data analysis. SOM networks combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an "a priori" grid, and they provide a powerful tool for data visualization. Because of these characteristics, this training algorithm was applied to marine geotechnical data to investigate non-linear relations and clustering tendencies among the sediment geotechnical properties.

Fig. 1 Location of the study area and sediment core stations. Isobaths in meters
The second version of the SOM Toolbox for Matlab (Vesanto et al. 1999) was used to train the ANNs. In a SOM, each neuron is represented by a weight (prototype) vector with as many components as the dimension of the input space (i.e. the number of input variables). In this study, batch training was used, in which the input sample vectors are presented to the SOM as a whole. The iterative process involves calculating and comparing the Euclidean distances between each sample vector and all the weight vectors of the SOM. For each input vector, the neuron whose weight vector is the best match (minimum distance) is chosen and called the Best Matching Unit (BMU). During each training step, the weight vectors are updated so that the new vectors are weighted averages of the input data vectors. Neurons are connected to adjacent neurons by a neighbourhood function that dictates the structure and topology of the map. Each neuron of the SOM therefore has two associated vectors: a prototype vector in the input space, with the same dimension as the input data, and a position vector in the lower-dimensional output space, on the map grid. This ordered grid (Fig. 2) can be used as a convenient visualization surface for showing different features of the SOM.
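The batch training step described above can be illustrated with a short Python/NumPy sketch. It is not the SOM Toolbox implementation used in the study; it assumes a Gaussian neighbourhood (as used in this work), map-grid coordinates supplied by the caller, and the hypothetical helper names `find_bmus` and `batch_update`.

```python
import numpy as np

def find_bmus(data: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Index of the Best Matching Unit (minimum Euclidean distance) per sample."""
    # Squared Euclidean distances, shape (n_samples, n_units)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def batch_update(data: np.ndarray, weights: np.ndarray,
                 grid: np.ndarray, radius: float) -> np.ndarray:
    """One batch-SOM step: every prototype becomes a weighted average of all
    input vectors, weighted by a Gaussian of the map-grid distance between
    the unit and each sample's BMU."""
    bmus = find_bmus(data, weights)
    # Squared grid distances between every unit and every sample's BMU,
    # shape (n_units, n_samples)
    g2 = ((grid[:, None, :] - grid[bmus][None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-g2 / (2.0 * radius ** 2))   # Gaussian neighbourhood weights
    den = h.sum(axis=1, keepdims=True)
    # A unit whose weights all underflow to zero keeps its old prototype
    return np.where(den > 1e-12, (h @ data) / np.maximum(den, 1e-12), weights)

# Usage with hypothetical sizes: 96 samples x 11 variables, 5 x 7 rectangular grid
rng = np.random.default_rng(0)
data = rng.normal(size=(96, 11))
grid = np.array([(r, c) for r in range(5) for c in range(7)], dtype=float)
weights = rng.normal(size=(35, 11))
weights = batch_update(data, weights, grid, radius=2.0)
```

Because every prototype is recomputed from the whole data set at once, the result of a batch step does not depend on the order in which the samples are presented.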
(Figure: U-matrix and component planes for depth (cm), sand %, silt %, clay %, CaCO3 %, γ (g/cm3), w %, LL %, PL %, PI % and Su)
Fig. 2 Clustering visualizations using similarity coloring for the south Killini slope. The U-matrix is on the top left, followed by the component planes (one per variable). Each map corresponding to one variable should be compared to the label map representing the distribution in Fig. 3
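The U-matrix in the top left of Fig. 2 encodes, for each map unit, how far its prototype lies from those of its neighbours. The sketch below computes a simplified version under stated assumptions: a rectangular grid with four neighbours per unit (the maps in this study are hexagonal) and one value per unit, whereas the SOM Toolbox U-matrix also shows values between units. The function name `u_matrix` is hypothetical.

```python
import numpy as np

def u_matrix(weights: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Simplified U-matrix on a rectangular grid (rows * cols > 1): for each
    unit, the mean Euclidean distance between its prototype and the
    prototypes of its 4-connected grid neighbours. Units are assumed to be
    stored row by row; large values mark cluster borders."""
    w = weights.reshape(rows, cols, -1)
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            dists = [np.linalg.norm(w[r, c] - w[rr, cc])
                     for rr, cc in neighbours if 0 <= rr < rows and 0 <= cc < cols]
            u[r, c] = np.mean(dists)
    return u

# Two tight groups of prototypes in a 1 x 4 map: the border shows up in the middle
print(u_matrix(np.array([[0.0], [0.1], [5.0], [5.1]]), 1, 4))
```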
3 Source Data
The Zakynthos Valley/Canyon system is located within the narrow inner shelf–slope of the western Hellenic Trench and is characterized as a structural basin trending parallel to the local tectonic zones (Brooks and Ferentinos 1984) (Fig. 1). The recent sedimentation processes at the head of the Zakynthos canyon were studied by Hasiotis et al. (2005) using a suite of high-resolution seismic profiles and 47 sediment cores. The Zakynthos canyon is not directly connected to any fluvial drainage system and does not have a high-energy wave regime, nor are data available on the resuspension of sediments by internal waves and their transport to deeper waters. The head of the canyon is bounded by the Killini and Zakynthos slopes, which are fault-controlled and covered by a recent sediment drape.
Extensive and complex mass movements affect both the recent sedimentary sequences and the fault-escarpment face. Along the Killini slopes the main type of failure is repeated retrogressive sliding, caused by the absence of downslope support for the leading blocks. The south Killini slope is also sculpted by large, extensive buried slide scarps. Along the Zakynthos slope, layered sediments overlie buried slumped and mass-flow deposits. Extensive and complex failures affect the central part of the slope. Oversteepening of the slope due to salt diapirism has produced slide scarps up to 30 m high. The Zakynthos valley is filled by extensive intercalated turbiditic/hemipelagic sediments and mass-flow deposits. The extensive seafloor instabilities are attributed mainly to local tectonic activity, intense seismic activity and salt diapirism in relation to the ascent of deep-seated gas (Hasiotis et al. 2005).
The sedimentological analysis of the collected cores revealed five main sediment types (Table 1). The present work focuses on the laboratory-measured geotechnical properties of the sediments retrieved from the north and south Killini slopes and the Zakynthos slope (Fig. 1 and Tables 2, 3).

Table 1 Summarized description and interpretation of the observed sediment types

| Sediment type, color, thickness | General description | Interpretation |
|---|---|---|
| ST1: Pale yellowish-brown mud (1.5–26 cm) | Surficial deposit; homogeneous; high carbonate content | Calcareous mud. Accumulation from suspension |
| ST2: Grey mud (mm up to 100 cm) | Almost homogeneous; locally displaying color banding | Hemipelagic mud. Accumulation from suspension |
| ST3: Light brown to grey mud (3–9.5 cm) | Relatively sharp contacts with the surrounding sediments; low shear strength and high water content | Mud of high depositional rate. Accumulation from suspension |
| ST4: Sand to sandy mud (0.5–26 cm) | Sandy horizons | Turbiditic sand to sandy mud. Gravity flow deposit |
| ST5: Plant debris (0.5–6 cm) | Well preserved debris of Posidonia oceanica; at the top of ST4 or intercalated with it | Turbiditic origin |

Table 2 Sediment core data studied (S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, Su: undrained vane shear strength, LL: liquid limit, PL: plastic limit, PI: plasticity index)

| Morphological unit | Core number | Sediment types | Studied parameters |
|---|---|---|---|
| North Killini slope | Z30, Z31, Z42, Z44 | ST1, ST2, ST4 | Depth (cm), CaCO3 (%), w (%), g (g/cm3), Su (kPa) |
| South Killini slope | Z8, Z9, Z11, Z40 | ST1, ST2, ST3, ST4 | Depth (cm), S (%), Z (%), C (%), CaCO3 (%), w (%), g (g/cm3), Su (kPa), LL, PL, PI |
| Zakynthos slope | Z2, Z3, Z27, Z28, Z34 | ST1, ST2, ST3, ST4 | Depth (cm), S (%), Z (%), C (%), CaCO3 (%), w (%), g (g/cm3), Su (kPa), LL, PL, PI |
Table 3 Range of sediment core geotechnical values (N. Kil: north Killini slope, S. Kil: south Killini slope, Zak: Zakynthos slope, d: core depth, S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, LL: liquid limit, PL: plastic limit, PI: plasticity index, Su: undrained vane shear strength)

| Slope | | d (cm) | S (%) | Z (%) | C (%) | CaCO3 (%) | w (%) | g (g/cm3) | LL | PL | PI | Su (kPa) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N. Kil | min | 0 | – | – | – | 4.90 | 20.66 | 1.50 | – | – | – | 0.10 |
| N. Kil | max | 107 | – | – | – | 29.90 | 87.77 | 1.91 | – | – | – | 33.60 |
| S. Kil | min | 0 | 0.12 | 40.80 | 30.82 | 5.00 | 40.23 | 1.52 | 34 | 22 | 10 | 0.10 |
| S. Kil | max | 220 | 5.08 | 64.10 | 58.70 | 31.50 | 80.00 | 1.76 | 47 | 28 | 20 | 8.00 |
| Zak | min | 0 | 0.30 | 37.00 | 24.98 | 2.00 | 25.49 | 1.45 | 47 | 27 | 16 | 0.10 |
| Zak | max | 190 | 15.20 | 70.32 | 62.00 | 33.50 | 94.39 | 1.91 | 52 | 32 | 22 | 11.20 |

4 Results of Clustering
The data were organized in a matrix [dlen × dim], where dlen is the number of samples and dim is the number of input parameters. Proper data preparation is the most important step of the analysis. It aims to (i) select the variables and data sets to be used, (ii) remove erroneous or uninteresting values from the data, (iii) transform the data into a format that the modelling tool can best utilize and (iv) normalize the values to a common scale, so that parameters with large numerical values do not dominate.
To perform the following analysis with the SOM Toolbox, scripts originally written in Matlab by J. Vesanto (1999) were rewritten to suit the needs of the specific data set. A batch training algorithm was used, and three maps were created, one for each examined geographical subunit. The initial weights were set randomly. Training took place in two phases: an initial rough phase, followed by a fine-tuning phase with a smaller neighbourhood radius and a smaller learning rate. A Gaussian neighbourhood function was used. The methodology aims at cluster detection (projection into a lower-dimensional space) and at discovering non-linear relations between database items. For data visualization, the small multiples technique was used, among others (scatter diagrams, hit histograms, trajectory analysis). Objects in small multiples can be linked together by similar position or place.
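The two-phase training can be sketched as follows. Assumptions, none of which are given in the text: a hexagonal lattice with unit spacing, prototypes initialized from randomly chosen samples, radii that shrink linearly, and the phase settings in the default arguments. This batch sketch has no learning-rate parameter, because each prototype is replaced directly by its neighbourhood-weighted average; only the radius changes between phases. Function names are hypothetical.

```python
import numpy as np

def hex_grid(rows: int, cols: int) -> np.ndarray:
    """Unit coordinates on a hexagonal lattice: odd rows shifted by half a
    unit and rows sqrt(3)/2 apart, so adjacent units are 1 apart."""
    return np.array([(c + 0.5 * (r % 2), r * np.sqrt(3) / 2)
                     for r in range(rows) for c in range(cols)])

def train_two_phase(data, rows, cols, rough=(4.0, 1.0, 20),
                    fine=(1.0, 0.5, 50), seed=0):
    """Batch SOM training in a rough phase and a fine-tuning phase.
    Each phase is (start radius, end radius, epochs); the Gaussian
    neighbourhood radius shrinks linearly within a phase."""
    rng = np.random.default_rng(seed)
    grid = hex_grid(rows, cols)
    n_units = rows * cols
    # Random initialization: prototypes copied from randomly chosen samples
    pick = rng.choice(len(data), size=n_units, replace=len(data) < n_units)
    weights = data[pick].astype(float)
    for r_start, r_end, epochs in (rough, fine):
        for radius in np.linspace(r_start, r_end, epochs):
            # BMU of every sample, then Gaussian weights on the map grid
            d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
            bmus = d2.argmin(axis=1)
            g2 = ((grid[:, None, :] - grid[bmus][None, :, :]) ** 2).sum(axis=2)
            h = np.exp(-g2 / (2.0 * radius ** 2))
            den = h.sum(axis=1, keepdims=True)
            weights = np.where(den > 1e-12, (h @ data) / np.maximum(den, 1e-12), weights)
    return weights, grid
```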
Fig. 2 shows a map display constructed with the SOM algorithm for the south Killini data set: a multiple visualization consisting of 12 hexagonal grids. The first map, on the upper left, is the U-matrix, with values indicated using similarity coloring. It visualizes the training results and gives information about the general structure of the data and the clustering tendency (see the color code map in Fig. 3). The multiple visualization is completed by the 11 maps called component planes, each of which refers to one input parameter. In the U-matrix, high values (hot colors) indicate the borders of the clusters, whereas low values (cold colors) characterize the clusters themselves. These visualizations can only be used to obtain qualitative information. The default number of colors in the colormaps and colorbars is 64; however, it is often advantageous to use fewer colors, because the component planes then become easier to interpret. Here the eleven component planes are visualized using 64 colors, but a 'hot' colormap with only three colors was also applied, and this is how the parameters in Table 4 were classified as low, medium and high. Note that the value ranges on the colorbars adjacent to each component plane are narrower than those in Table 3, as expected, because the prototype vectors are weighted averages of the data.

Fig. 3 Projection of the south Killini data set (color code, PC projection and label map). The BMUs (35) are illustrated in the color code map with score numbers. The black line defines the borders of the three clusters. Starting from the upper left, the clusters correspond to ST2a, ST2b and ST1
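The three-color reading of the component planes can be approximated numerically. This sketch assumes that the three colors correspond to three equal-width bins of each component plane's value range and that a cluster's level is read from the mean value of its units; the text specifies neither detail, and `cluster_levels` is a hypothetical name.

```python
import numpy as np

LEVELS = ("Low", "Medium", "High")

def cluster_levels(component: np.ndarray, unit_cluster: np.ndarray) -> dict:
    """Classify each cluster as Low/Medium/High for one variable.

    component    -- the variable's value at every map unit (one component plane)
    unit_cluster -- the cluster index assigned to every map unit
    The plane's range is split into three equal-width bins (assumption) and
    each cluster is placed in the bin containing the mean of its units."""
    lo, hi = float(component.min()), float(component.max())
    inner_edges = np.linspace(lo, hi, 4)[1:-1]   # two edges -> three bins
    levels = {}
    for k in np.unique(unit_cluster):
        mean_value = component[unit_cluster == k].mean()
        if hi == lo:
            levels[int(k)] = "Medium"             # flat plane: no contrast
        else:
            levels[int(k)] = LEVELS[int(np.digitize(mean_value, inner_edges))]
    return levels
```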
The first step in analysing the map is visual inspection (see the enlarged label map in Fig. 3), which shows the existence of three main clusters. Two clusters clearly correspond to ST2. They are almost homogeneous and exhibit low carbonate content, low water content and high wet bulk density. These subgroups differ in clay percentage, Atterberg limits, undrained shear strength and core depth. The third cluster corresponds to ST1, an almost homogeneous surficial unit that exhibits low values of silt, wet bulk density and shear strength and high values of carbonate content, water content and Atterberg limits. In general, undrained shear strength seems to increase with core depth, whereas wet bulk density is inversely correlated with water content. The general trends arising from the current analysis reveal an association between grain size, water content and Atterberg limits, and they are in accordance with Lykousis et al. (2008).

Table 4 Clustering results according to SOM (S: sand, Z: silt, C: clay, w: water content, g: wet bulk density, Su: undrained vane shear strength, LL: liquid limit, PL: plastic limit, PI: plasticity index)

| Slope | Cluster | depth | S | Z | C | CaCO3 | w | g | LL | PL | PI | Su | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| North Killini slope | 1st | Low | – | – | – | High | High | Low | – | – | – | Low | ST1 |
| North Killini slope | 2nd | High | – | – | – | Low | High | Medium | – | – | – | Low | ST2a |
| North Killini slope | 3rd | Medium | – | – | – | Low | Low | High | – | – | – | High | ST2b |
| South Killini slope | 1st | High | Low | High | High | Low | Low | High | Medium | Low | Low | High | ST1 |
| South Killini slope | 2nd | Medium | Low | High | Medium | Low | Low | High | High | High | High | Low | ST2a |
| South Killini slope | 3rd | Low | Low | Low | High | High | High | Low | Low | High | Medium | High | ST2b |
| Zakynthos slope | 1st | Low | Low | Low | High | High | High | Low | – | – | – | Low | ST2a |
| Zakynthos slope | 2nd | High | Low | Low | High | Low | Medium | Low | – | – | – | Medium | ST2b |
| Zakynthos slope | 3rd | Low | High | High | Low | High | Low | High | – | – | – | High | ST1 |
A principal component projection was computed for the data and applied to the map (Fig. 3). Three visualizations are shown: the color code, with clustering information and the number of hits in each unit; the projection; and the labels. The projection confirms the existence of three different clusters, and interpolative units seem to divide the ST2 group into two subclasses, which differ mainly in clay percentage, shear strength, Atterberg limits and core depth.
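The principal component projection and the hit counts shown in the color code map can be sketched as follows. Here the projection is computed by singular value decomposition of the centred prototype (or data) matrix, and the hit histogram simply counts how many samples select each unit as their BMU; both helper names are hypothetical and this is not the SOM Toolbox code.

```python
import numpy as np

def pca_project(x: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Coordinates of the rows of x (e.g. SOM prototype vectors) on the
    leading principal components, from the SVD of the centred matrix."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T

def hit_counts(bmus: np.ndarray, n_units: int) -> np.ndarray:
    """Hit histogram: how many samples have each unit as their BMU."""
    return np.bincount(bmus, minlength=n_units)
```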
Of all the visualizations offered by SOM, the most informative are the simple scatter plots and histograms of all variables (Fig. 4). The sediment type information is coded as an 11th parameter. The original data points (N = 96) are in the upper triangle, the map prototype values in the lower triangle and the histograms on the diagonal. The color coding of the data samples has been copied from the color code map (from the BMU of each sample). This visualization reveals a great deal of information: the distributions of single variables and of pairs of variables, both in the data and in the map.

Fig. 4 11 × 11 scatter diagram for the south Killini data set (N = 96 samples)

This visualization confirms many of the earlier conclusions. For example, there appear to be three clusters: ST2a (dark green, blue), ST2b (green) and ST1 (yellow). Shear strength has a high linear correlation with core depth, and carbonate content is highly correlated with LL and PI.
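Pairwise relations such as the one between shear strength and core depth can also be screened numerically with Pearson correlation coefficients. The 0.7 cut-off used to flag a "strong" pair is an assumption for illustration, and the sketch assumes complete samples (no NaN); in the study the correlations were judged from the scatter diagrams, not from a stated numeric threshold. `strong_pairs` is a hypothetical name.

```python
import numpy as np

def strong_pairs(data, names, threshold=0.7):
    """Pearson correlation matrix of the columns of data, plus the variable
    pairs whose absolute correlation reaches the (assumed) threshold."""
    r = np.corrcoef(data, rowvar=False)
    pairs = [(names[i], names[j], float(r[i, j]))
             for i in range(len(names)) for j in range(i + 1, len(names))
             if abs(r[i, j]) >= threshold]
    return r, pairs
```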
Training on the north Killini and Zakynthos slope data sets (Table 4) also revealed three clusters, of which two correspond to subgroups of ST2 and one to ST1. The trends in the geotechnical properties are also similar to those of the south Killini slope: shear strength is linearly correlated with core depth, and water content is inversely correlated with wet bulk density. The algorithm was also applied to classification, to predict the appropriate sediment type of new data by incorporating geologists' knowledge; the calculated accuracy was 79.2%, 65% and 89.1% for the south Killini, north Killini and Zakynthos data sets, respectively. For comparison, Chang et al. (2002) applied SOM to well-log data and predicted lithofacies with 78.8% accuracy.
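One common way to use a trained SOM as a classifier is to label each unit by majority vote of the training samples it wins and to predict new samples from the label of their nearest labelled unit. The text does not detail the exact procedure behind the reported accuracies, so this is only a sketch of that generic scheme; all function names are hypothetical.

```python
import numpy as np
from collections import Counter

def label_units(weights, train_x, train_y):
    """Majority-vote label for every unit that wins at least one training sample."""
    bmus = ((train_x[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return {int(u): Counter(train_y[bmus == u].tolist()).most_common(1)[0][0]
            for u in np.unique(bmus)}

def predict(weights, unit_labels, x):
    """Label of the nearest labelled unit for each row of x."""
    units = np.array(sorted(unit_labels))
    d2 = ((x[:, None, :] - weights[units][None, :, :]) ** 2).sum(axis=2)
    return [unit_labels[int(u)] for u in units[d2.argmin(axis=1)]]

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted sediment types."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Units that win no training sample stay unlabelled and are skipped at prediction time, which is one simple way to handle interpolating units.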
5 Parameters Rating: Interaction Matrix Theory and Cause/Effect Plot
Hudson (1992) proposed an analytical, as opposed to a synthetic, approach to representing rock engineering systems, developing the interaction matrix device to represent the relevant parameters, their interactions and the behaviour of the rock mass/construction. The principal factors considered relevant to the problem are listed along the leading diagonal of a square matrix (top left to bottom right), and the interactions between pairs of principal factors form the off-diagonal terms.
Ferentinou and Sakellariou (2007) applied the method to landslide hazard estimation and extended the interaction matrix to soil mechanics. The method involves coding the interaction matrix and studying the interaction intensity and dominance of each parameter. For each principal factor, its "Cause–Effect" (C, E) coordinates can be derived: these are the sums of the values in the row and in the column passing through that principal factor. The coordinates are plotted in a "Cause–Effect" space (Fig. 5). Cause (C) expresses the way the parameter affects the system, whereas effect (E) expresses the effect of the system on the parameter. Parameter interaction intensity increases from zero to a maximum value, which is equal to the dimension of the matrix. The associated maximum possible parameter dominance rises from zero to a maximum at 50% of the parameter interaction intensity and then decreases back to zero. This interaction matrix approach was applied to the three studied data sets. The scatter diagrams (Fig. 4) produced for each data set were coded using a binary system: elements off the main diagonal were assigned a value of 1 if the corresponding parameters showed a strong correlation and 0 otherwise, and the elements of the leading diagonal were set to 0. The resulting cause–effect plots for each data set are presented in Fig. 5.
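The Cause–Effect coordinates can be computed directly from a coded matrix. Following the usual convention of Hudson's approach, interaction intensity is taken as C + E and dominance as C - E. The 3 × 3 binary matrix and the parameter names P1 to P3 in the usage example are hypothetical and are not taken from the studied data sets.

```python
import numpy as np

def cause_effect(matrix, names):
    """Cause (row sum), Effect (column sum), interaction intensity (C + E)
    and dominance (C - E) of each principal factor in a coded matrix."""
    m = np.array(matrix, dtype=float)
    np.fill_diagonal(m, 0.0)          # leading diagonal carries the factors, coded 0
    cause, effect = m.sum(axis=1), m.sum(axis=0)
    return {name: {"C": cause[i], "E": effect[i],
                   "C+E": cause[i] + effect[i], "C-E": cause[i] - effect[i]}
            for i, name in enumerate(names)}

# Hypothetical 3 x 3 binary coding (not the study's data): 1 = strong interaction
names = ["P1", "P2", "P3"]
coded = [[0, 1, 1],
         [0, 0, 1],
         [0, 0, 0]]
ce = cause_effect(coded, names)
most_dominant_first = sorted(names, key=lambda n: ce[n]["C-E"], reverse=True)
most_interactive_first = sorted(names, key=lambda n: ce[n]["C+E"], reverse=True)
```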
(Figure: Cause–Effect plots, axes Cause and Effect with the C = E line, for the north Killini, south Killini and Zakynthos data sets; labelled parameters include depth, sand %, silt %, clay %, CaCO3, w %, LL, PL, PI and Su)
Fig. 5 Cause effect plots for the three data sets
In all three data sets, the most dominant parameters are undrained shear strength and water content. The most interactive parameter is carbonate content for the north Killini slope and silt percentage for the south Killini and Zakynthos slopes. Sand percentage is the least dominant and least interactive parameter.
6 Discussion – Conclusions
Although marine geotechnical properties have been studied thoroughly, mainly for the evaluation of slope stability, this is the first time that marine sediment properties have been analysed with the methodology described above. This study shows that SOM can be an effective tool for classifying sediment core samples according to their litho-geomorphological type. More specifically, the three data sets from the slopes bounding the upper part of the Zakynthos canyon/valley system each revealed three clusters when multiple variables were integrated. These clusters correspond to different sediment types (thus, they have a clear "physical meaning") recognised from sedimentological analysis in each of the three data sets. One cluster corresponds to the surficial deposit of pale yellowish-brown mud (ST1) and the other two to subgroups of grey mud (ST2a, ST2b).
Another advantage of SOM is that it offers real insight into the data set, especially through the simple scatter plots. For example, although wet bulk density does not show any clear trend with core depth in the three data sets, map training on the Zakynthos and north Killini data sets led to the conclusion that there is a linear correlation, but only within each subcluster. SOM component planes reveal information that sometimes remains hidden with traditional approaches and makes the results easier to interpret.
On the other hand, one of the drawbacks of SOM is that it eliminates outlier data. This is probably why the ANN did not treat ST3 as an important sediment type and did not recognise it as a clear cluster. The core data representing this thin layer were generally few (one thin layer in each core) and sometimes incomplete; consequently, there were not enough input data to represent this sediment type within the data set. The ST4 unit has a high score in the map corresponding to the Zakynthos data set, although it was also recognised in south Killini, where it appears to have a low score. Again, this lithological unit is not adequately represented in the data set. In general, to converge and predict successfully, ANNs have to be trained with data representative of the system they are meant to simulate.
A coupled model of SOM networks and interaction matrix theory was finally applied to rate the examined geotechnical properties in an objective, quantified way. The results are again reasonable and show that the most dominant parameters in the studied area are undrained shear strength, water content and silt percentage.
Acknowledgments The authors would like to thank the reviewers V. Lykousis, T. Glade and H. Lee for their constructive suggestions. M.F. was supported by a post-doctoral fellowship of the Greek State Scholarships Foundation.
References
Brooks, M., Ferentinos, G., 1984. Tectonics and sedimentation in the Gulf of Corinth and Kephalonia–Zante Straits, Ionian Sea, Greece. Tectonophysics, 101, 25–54.
Chang, H.-C., Kopaska-Merkel, D.C., Chen, H.-C., 2002. Identification of lithofacies using Kohonen self-organising maps. Computers and Geosciences, 28, 223–229.
Ferentinou, M., Sakellariou, M., 2007. Computational intelligence tools for the prediction of slope performance. Computers and Geotechnics, 34, 362–384.
Hasiotis, T., Papatheodorou, G., Ferentinos, G., 2005. A high resolution approach in the recent sedimentation processes at the head of Zakynthos Canyon, Western Greece. Marine Geology, 214, 49–73.
Hudson, J.A., 1992. Rock Engineering Systems: Theory and Practice. Horwood, Chichester.
Kohonen, T., 1994. Self-Organising Maps. Springer, New York.
Lee, H., Baraza, J., 1999. Geotechnical characteristics and slope stability in the Gulf of Cadiz. Marine Geology, 155, 173–190.
Lykousis, V., Chronis, G., 1989. Mass movements, geotechnical properties and slope stability in the outer shelf-upper slope, NW Aegean Sea. Marine Geotechnology, 8, 231–247.
Lykousis, V., Roussakis, G., Alexandri, M., Pavlakis, P., Papoulia, I., 2002. Sliding and regional slope stability in active margins: North Aegean Trough (Mediterranean). Marine Geology, 186, 281–298.
Lykousis, V., Roussakis, G., Sakellariou, D., 2008. Slope failures and stability analysis of shallow water prodeltas in the active margins of Western Greece, northeastern Mediterranean Sea. International Journal of Earth Sciences, 98, 807–822.
Vesanto, J., 1999. SOM-based data visualisation methods. Intelligent Data Analysis, 3(2), 111–126.