CHARACTERIZATION OF GROUNDWATER QUALITY BY
MULTIVARIATE STATISTICAL ANALYSIS: AN EXAMPLE FROM
KAOHSIUNG COUNTY, TAIWAN
Yi-Chu Huang*, Ting Nien Wu**, Po-Jen Cheng***
* Department of Environmental Science and Engineering, National Pingtung University of
Science & Technology, 1 Hsueh Fu Road, Nei-Pu Hsiang, Pingtung 912, Taiwan, R.O.C.
** Department of Environmental Engineering, Kun Shan University of Technology, 949 Da
Wan Road, Yung-Kang City, Tainan County 710, Taiwan, R.O.C. (E-mail:
wutn@mail.ksut.edu.tw)
*** Health Bureau of Kaohsiung County Government, 834 Cheng Ching Road, Niao-Song
Hsiang, Kaohsiung County 833, Taiwan, R.O.C.
Abstract
Either naturally occurring processes or human activities may have a significant
impact on the quality of subsurface waters and further limit their use as water supply.
With the aid of multivariate statistical techniques, this study attempted to identify
these processes and assess their influence on groundwater quality.
The Kaohsiung County area, which holds parts of two main groundwater regions of
Taiwan, was selected for this study. Geochemical data including pH, EC, hardness,
chloride, sulfate, ammonia, nitrate, Na, K, Ca, Mg, Fe, Mn, Zn, and TOC from
twenty-four monitoring wells were subjected to factor and cluster analysis.
Principal component analysis (PCA) was used to identify the chemical variables
with the greatest correlation, whereas cluster analysis (CA) was used to evaluate
the similarities of water quality among groundwater samples. CA results illustrated
that the overall quality of groundwater in the hinterland was better than that in the
coastal area, which was partially salinized as a result of seawater intrusion. PCA
identified four major principal components (PCs) that together represent
almost 80% of the cumulative variance and capture most of the information
contained in the data. PC 1 reflects the dominance of salinization, which was
characterized by elevated EC, hardness, chloride, sulfate,
sodium, potassium, and magnesium in groundwater. PC 2, with elevated
concentrations of iron and manganese, is thought to represent mineral
dissolution within the aquifer. PC 3 shows a strong monotonic relationship with
zinc concentration in the groundwater, suggesting a link to the
oxidizing/reducing conditions within the aquifer. PC 4 describes the infiltration of
organic matter, which raised the TOC of the groundwater.
Keywords
principal component analysis; cluster analysis; principal component; salinization
INTRODUCTION
Several factors can affect groundwater quality, such as climate,
topography, aquifer lithology, surface water recharge, saline water intrusion, and
human activities. Any one or two of these factors can contaminate an aquifer to such
an extent that the use of its groundwater becomes restricted. Although groundwater is
not the major source of water supply in Taiwan, it is used for
agriculture, aquaculture, industry, and domestic supply. Groundwater use
in Kaohsiung County reaches 9% of the total groundwater use in Taiwan (Lee,
2002). Thus, there is a need to evaluate groundwater quality regularly to improve
water management in this region. This study attempts to identify the factors
affecting groundwater quality and to delineate their areas of influence by using
multivariate statistical techniques. It also demonstrates how statistical analysis
can improve the understanding of groundwater systems.
** Author to whom all correspondence should be addressed.
MATERIALS AND METHODS
Study area Kaohsiung County, a strip-shaped region, was selected as the study area. It
comprises parts of the Chianan Plain and Pingtung Plain groundwater subregions of
Taiwan. In the northern part of the study area, the geological structure is mainly
mixed layers of clay, silt, and silty sand, which may hamper percolation and limit
groundwater recharge. In contrast, the gravel-formed structure in the southern part of
the study area allows abundant groundwater recharge from the Kaoping River
(KCGEPB, 1998).
Groundwater data Thirty-two monitoring wells have been established in the shallow
aquifer of the study area. Under the project “The integrated programming on
groundwater quality of the monitoring network in Taiwan area”, only twenty-four
of these wells were periodically sampled and analyzed in 2000. The measured
groundwater quality parameters, namely pH, electrical conductivity (EC), hardness,
chloride, sulfate, ammonia, nitrate, Na, K, Ca, Mg, Fe, Mn, Zn, and total organic
carbon (TOC), provided a more complete geochemical data set. The analyzed
monitoring wells are concentrated in the southern part of the study area, as shown in Fig 1.
[Map of Taiwan and Kaohsiung County, with Tainan, Taitung, and Pingtung Counties labeled, marking the twenty-four sampled wells by well number and station code, from 1-C03 to 24-P19]
Fig 1 Study area with the location of sampling wells
Statistical analysis Multivariate analytical techniques can simplify, organize, and
generalize the groundwater data so that meaningful patterns emerge.
Principal component analysis (PCA) is a powerful data reduction technique
based on eigenanalysis of the correlation or covariance matrix of
a large data set (Farnham et al., 2003). Because the measurement scales and
numerical ranges of the original variables evaluated in this study vary widely, all
variables were first standardized as Z-scores: the column mean is subtracted from
each value in the original data matrix, and the result is divided by the column
standard deviation. PCA then transforms the original p-dimensional standardized data
matrix into an m-dimensional principal component (PC) matrix with fewer
degrees of freedom. The linear relationship between the original variables and
the transformed PCs is expressed as follows:
$$X_i = a_{i,j} \times Y_j + a_i \epsilon_i \qquad (1)$$
where $X_i$ (for i = 1,…,p) identifies original variables, $Y_j$ (for j = 1,…,m) identifies
principal components, $a_{i,j}$ identifies loading factors, and $a_i \epsilon_i$ is loss of orthogonality.
By exploiting the correlations present in the original data, PCs reduce the overall
complexity of the data while preserving its inherent inter-dependencies. Typically, the
first few PCs account for the majority of the variance within the original data set:
the first one explains the most variance and each subsequent PC explains
progressively less. The factor loadings express the correlations between
PCs and the original variables, and the variables with the largest positive or negative
loadings contribute most to a PC. As a result, the loadings help
trace the sources responsible for the similarities in groundwater quality among the
collected samples.
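The standardization and eigenanalysis steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the SPSS procedure used in the study: the well measurements are hypothetical, only the first PC is extracted (by power iteration), and no varimax rotation is applied. The function names are assumptions made for this example.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column: subtract the column mean, divide by the column SD."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]  # sample standard deviation (n - 1)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data: Z^T Z / (n - 1)."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def leading_component(corr, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    p = len(corr)
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T C v gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p))
    return eigenvalue, vec


# Hypothetical measurements for five wells: EC, Cl, Fe (illustrative values only).
samples = [
    [520.0, 30.0, 0.5],
    [800.0, 60.0, 1.2],
    [1100.0, 95.0, 0.3],
    [8000.0, 2400.0, 9.0],
    [16000.0, 3600.0, 2.0],
]
z = zscore_columns(samples)
corr = correlation_matrix(z)
eigenvalue, vector = leading_component(corr)
loadings = [v * math.sqrt(eigenvalue) for v in vector]  # correlation of each variable with PC 1
explained = eigenvalue / len(corr)  # share of total variance; trace of corr equals p
print(loadings, explained)
```

Because the analysis works on the correlation matrix, each loading is the correlation between a variable and the PC, which is why loadings close to 1 or -1 mark the variables that dominate a component.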
Cluster analysis (CA) was used to classify the data into groups according to their
similarities to each other. Euclidean distance was employed as the measure of
similarity; a short Euclidean distance implies high similarity between the
measured objects. In clustering, the distinct groups can reveal either the interaction
among the variables (R-mode) or the interrelation among the samples (Q-mode). Two
types of CA, hierarchical and nonhierarchical, were applied in a two-step
procedure in this study. First, hierarchical cluster analysis with Ward’s clustering
procedure was used to identify the number of clusters. Next, the K-means method,
commonly used in nonhierarchical cluster analysis, was used to assign the
observations to these clusters.
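The two-step procedure can be sketched as follows. This is an illustrative pure-Python version, not the SPSS implementation used in the study: the points are hypothetical standardized scores, the number of clusters is passed in by hand rather than read from a dendrogram, and the function names are assumptions made for this example. Ward's criterion merges the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares, and the resulting centroids seed the K-means step.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def ward_clusters(points, k):
    """Agglomerate with Ward's criterion until k clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([points[i] for i in clusters[a]])
                cb = centroid([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sq_dist(ca, cb)  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def k_means(points, seeds, max_iter=100):
    """Lloyd's algorithm starting from the given seed centers; returns (labels, centers)."""
    centers = [list(c) for c in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers


# Hypothetical standardized scores for seven samples (illustrative values only).
points = [[-1.0, -0.9], [-0.8, -1.1], [-1.2, -1.0],
          [0.9, 1.0], [1.1, 0.8], [1.0, 1.2],
          [4.0, -0.5]]
groups = ward_clusters(points, k=3)
seeds = [centroid([points[i] for i in g]) for g in groups]
labels, centers = k_means(points, seeds)
print(groups, labels)
```

In this toy data set the isolated seventh point ends up in a cluster of its own, which is the same kind of outcome as a single-well cluster.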
A detailed description of the multivariate analysis methods used in this study can be
found in textbooks on statistics (Jackson, 1991). PCA and CA were carried out with the
Statistical Package for Social Sciences (SPSS 10.0).
RESULTS AND DISCUSSION
Principal component analysis PCA was applied to the variables and samples
obtained from the twenty-four wells sampled in 2000. As mentioned, PCA is
based on the diagonalization of the correlation matrix, which indicates the overall
coherence of the data set. Moderate to strong positive correlations were observed
between chloride and sodium (r = 0.987), chloride and magnesium (r = 0.813),
chloride and potassium (r = 0.739), chloride and sulfate (r = 0.837), ammonia and
chloride (r = 0.581), ammonia and sodium (r = 0.558), ammonia and iron (r = 0.727),
and iron and manganese (r = 0.662). The Kaiser-Meyer-Olkin test carried out on the
correlation matrix gives a calculated value of KMO = 0.576, greater than the
acceptable value of 0.5, indicating that PCA can usefully reduce the dimensionality
of the original data set. Bartlett’s sphericity test leads to the same conclusion,
with a calculated χ² = 439.63 (P < 0.01, 105 degrees of freedom).
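The reported degrees of freedom can be checked against the number of variables. As a sketch, assuming the standard form of Bartlett's test of sphericity (the formula is not printed in the text), with p = 15 variables, n observations, and correlation matrix R:

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert R \rvert,
\qquad
\mathrm{df} = \frac{p(p - 1)}{2} = \frac{15 \times 14}{2} = 105
```

The value df = 105 matches the fifteen measured variables, from pH through TOC.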
PCA results, including the rotated loadings, eigenvalues, and variance percentage of
each PC, are summarized in Table 1. A scree plot, commonly used to identify the
number of factors to retain, shows a change of slope after the fourth eigenvalue.
The four retained PCs have eigenvalues greater than unity and explain 79.5% of the
variance, or information, contained in the original data set. Loadings with an
absolute value greater than 0.7 are highlighted in Table 1 because they indicate
strong participation of the variables in the PCs. PC 1
accounts for 47% of the total variance and is characterized by very high loadings of
EC, sulfate, hardness, sodium, potassium, magnesium, and chloride. PC 2 explains
14.8% of the total variance and is mainly associated with very high loadings of iron
and manganese. PC 3 and PC 4 represent 10.2% and 7.5% of the total variance,
respectively, and each is dominated by a single variable: PC 3 by zinc and
PC 4 by TOC.
Table 1. Varimax rotated R-mode factor loading matrix in PCA
Variables                  Factor 1   Factor 2   Factor 3   Factor 4
EC                          0.932*     0.101      0.225      0.207
SO4                         0.932*     0.173      0.120     -0.177
hardness                    0.924*     0.222      0.233      0.044
Na                          0.916*     0.060      0.202      0.273
K                           0.888*     0.005      0.011      0.018
Mg                          0.882*     0.145      0.073      0.263
Cl                          0.876*     0.090      0.214      0.300
Fe                         -0.143      0.920*     0.173      0.198
Mn                          0.184      0.870*    -0.132     -0.147
NH3                         0.349      0.679      0.402      0.352
Zn                          0.012      0.067      0.861*    -0.069
Ca                          0.372      0.222      0.650      0.071
pH                          0.210     -0.028      0.499      0.013
TOC                         0.199      0.006      0.167      0.741*
NO3                        -0.080     -0.161      0.380     -0.657
eigenvalue                  7.050      2.217      1.531      1.128
variance (%)                47.0       14.8       10.2       7.5
cumulative variance (%)     47.0       61.8       72.0       79.5
* Absolute loading greater than 0.7 (highlighted)
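The variance percentages in Table 1 follow directly from the eigenvalues: with standardized variables, the total variance equals the number of variables (fifteen here), so each PC explains its eigenvalue divided by 15. A short check, using only the values tabulated above:

```python
eigenvalues = [7.050, 2.217, 1.531, 1.128]  # from Table 1
n_variables = 15  # number of standardized variables analyzed

variance_pct = [100.0 * ev / n_variables for ev in eigenvalues]

cumulative_pct = []
running = 0.0
for pct in variance_pct:
    running += pct
    cumulative_pct.append(running)

# Eigenvalue-greater-than-unity rule: keep PCs that explain more than one variable's worth.
retained = [ev for ev in eigenvalues if ev > 1.0]

print([round(v, 1) for v in variance_pct])    # [47.0, 14.8, 10.2, 7.5]
print([round(v, 1) for v in cumulative_pct])  # [47.0, 61.8, 72.0, 79.5]
print(len(retained))                          # 4
```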
PC 1 indicates that salinization affects this aquifer, and it is logical
to observe a strong positive correlation of EC with sulfate, chloride, sodium,
potassium, and magnesium. Calcium and magnesium play a substantial role in
determining hardness, so the presence of these cations in groundwater
increases hardness and simultaneously raises EC.
Because of the low loading of calcium, magnesium makes the foremost
contribution to hardness in PC 1. PC 1 is accordingly defined as the salinization
factor, and EC serves as an indicator of seawater intrusion in the studied aquifer.
The variables loading on PC 2 are iron and manganese, which are among the abundant
metallic elements in the Earth’s crust. As a general rule, the mineral contents found
in groundwater samples are closely related to dissolution of the geological
formations in the studied area. Consequently, PC 2 is characterized as the
mineralization factor. The single dominant variable in PC 3, zinc, does not show a
strong correlation with the rest of the chemical variables examined. Sphalerite,
calamine, or willemite could explain the zinc present in groundwater,
but these minerals have not been identified in the nearby geological
formations. Besides natural dissolution processes, the presence of zinc in
groundwater is therefore partially ascribed to leakage from certain industrial
wastes or livestock manure piles. PC 4 most likely implies that the high
TOC found in groundwater is attributable to leakage of municipal
wastewater. Another indicator of municipal wastewater is ammonia, although
neither a strong correlation with TOC nor a significant loading in PC 4 is
observed for it in the analysis. Ammonia can be transformed to nitrite and nitrate
over a long period of time, so nitrate is treated as a long-term indicator of municipal
wastewater. The negative, moderate loading of nitrate in PC 4 may be due to its
serving as a terminal electron acceptor in biodegradation processes in the groundwater
environment. As a result, both PC 3 and PC 4 correspond to non-natural
processes, such as leakage from the disposal of industrial waste or
from municipal wastewater.
[Score plot: x-axis PC 1 (47% of variance explained), y-axis PC 2 (14.8% of variance explained); labeled points 2-C04, 24-P19, and 14-C64]
Fig 2. PC score of each sample for principal components 1 and 2
On the score plot for the first two principal components (Fig 2), the
samples collected from wells 24 and 2 are clearly separated from the majority of the
other well water samples. This finding is consistent with the extremely
high EC observed in well 24 and the high concentrations of iron and manganese
observed in well 2. Well 14 plots closest to the main cluster in Fig 2 but is still
distinct from it, because its water has elevated EC as well as an elevated iron
concentration. The remaining well waters cannot be distinguished from one another
using PC 1 and PC 2 alone.
As the relative compositions of the constituents in a sample are often as important as
their absolute concentrations, cluster analysis is used to classify the samples by
similarity. The hierarchical method is the customary form of cluster analysis and
has been successfully demonstrated in hydrogeochemical studies (Reghunath et al.,
2002). With Ward’s clustering procedure, the data set is first categorized into four
groups in a dendrogram. Following hierarchical cluster analysis, the K-means method
is used to reclassify the data set on the basis of the similarity between clusters. The
output of the nonhierarchical cluster analysis, with the chemical data for each
cluster, is given in Table 2. In addition, the scores of each cluster with respect to
the four identified PCs are shown in Fig 3.
Table 2. The chemical properties of each cluster classified by the K-means method; entries are mean (range)
Variable   Unit    Cluster 1             Cluster 2             Cluster 3                Cluster 4
pH         -       6.9 (6.6~7.2)         6.8 (5.9~7.8)         7.3 (7~7.6)              7.2 (7.1~7.4)
EC         µS/cm   1101.9 (817~1740)     523.6 (270~779)       16450.0 (11300~23400)    8050.0 (7750~8960)
hardness   mg/l    476.2 (294~683)       217.2 (124~430)       2218.0 (1510~3320)       847.8 (802~942)
Cl         mg/l    61.1 (25~271)         104.4 (11.25~549)     3663.0 (530~5240)        2432.5 (2320~2530)
SO4        mg/l    125.0 (11~310)        54.3 (12.7~85.4)      897.0 (543~1630)         1068.0 (91.1~125)
NH3        mg/l    0.521 (0.04~3.78)     0.082 (0.02~0.2)      2.523 (1.56~3.75)        2.915 (2.44~3.47)
Fe         mg/l    4.835 (0.05~39.55)    1.287 (0.02~3.65)     2.178 (1.36~2.96)        9.128 (3.28~21.2)
Mn         mg/l    0.554 (0.04~2.15)     0.435 (0.04~1.21)     1.188 (0.53~2.29)        0.278 (0.2~0.54)
Ca         mg/l    129.5 (69~142)        43.0 (29.5~90.1)      183.0 (162~195)          137.8 (7.2~175)
Mg         mg/l    30.2 (14.2~71.6)      21.2 (14.2~37.1)      136.0 (17~274)           57.2 (36.7~88.1)
Na         mg/l    65.8 (17.6~190)       53.0 (20.15~94)       1965.0 (1000~3720)       1071.0 (687~1780)
K          mg/l    9.0 (2.55~21)         11.4 (1.86~18.5)      56.4 (29.8~88.2)         16.8 (5.9~35.5)
NO3        mg/l    2.122 (0.25~9.61)     1.415 (0.15~6)        0.230 (0.12~0.45)        0.170 (0.04~0.23)
TOC        mg/l    2.586 (0.35~7.1)      2.389 (0.35~5.3)      4.500 (2.9~5.8)          5.500 (3.4~8.6)
Zn         mg/l    0.027 (0.01~0.068)    0.020 (0.01~0.025)    0.035 (0.005~0.1)        0.055 (0.005~0.18)
Number of samples  15                    7                     1                        1
Sampling wells, Cluster 1: 1, 2, 3, 5, 6, 7, 8, 11, 15, 16, 17, 19, 20, 22, 23
Sampling wells, Cluster 2: 4, 9, 10, 12, 13, 18, 21
Sampling wells, Cluster 3: 24
Sampling wells, Cluster 4: 14
Fifteen samples fall within cluster 1, which has the highest
scores with respect to the mineralization factor (PC 2) and the zinc factor (PC 3). The
abundance of iron, manganese, and calcium in the well water increases
hardness and possibly raises EC as well. The ratio of
chloride to sulfate in cluster 1 samples is much lower than in the other
clusters, which implies that the principal cause of the saline content in the well
water cannot be attributed to seawater intrusion. In cluster 1, the dominant process
occurring in the aquifer is therefore recognized as mineralization rather than
salinization, although the cluster members are mostly scattered along the coastal
area. The hardness and the abundance of iron and manganese limit the use of
groundwater as a source of drinking water in the regions belonging to cluster 1.
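The chloride-to-sulfate comparison can be made explicit with the cluster means from Table 2. The sketch below is a simple mass-concentration ratio (mg/l over mg/l) computed for illustration; it is not a calculation reported in the study. Cluster 4 is left out because its tabulated sulfate mean lies outside its tabulated range, so the ratio there would be unreliable.

```python
# Cluster means from Table 2, in mg/l.
cluster_means = {
    "cluster 1": {"Cl": 61.1, "SO4": 125.0},
    "cluster 2": {"Cl": 104.4, "SO4": 54.3},
    "cluster 3": {"Cl": 3663.0, "SO4": 897.0},
}

# Mass ratio of chloride to sulfate for each cluster.
ratios = {name: m["Cl"] / m["SO4"] for name, m in cluster_means.items()}

for name, ratio in ratios.items():
    print(f"{name}: Cl/SO4 = {ratio:.2f}")
```

Cluster 1 is the only one of the three in which sulfate exceeds chloride, while the ratio is highest for the seawater-affected well of cluster 3.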
[Chart: score (y-axis, -6 to 6) of clusters 1 to 4 for PC 1: salinization, PC 2: mineralization, PC 3: zinc, and PC 4: TOC]
Fig. 3 Scores for each cluster with respect to the four identified principal components
The seven sampling wells in cluster 2 are located in the hinterland of the
study area. The lowest values of EC and hardness are found in cluster 2 samples,
which reveals that the impact of salinization and mineralization on
groundwater quality is weak in this region. Chloride is commonly
used as a tracer of seawater intrusion; however, it is interesting to observe an even
higher chloride content in the well water of cluster 2 than in that of cluster 1. This
elevated chloride content is not attributed to seawater intrusion; rather, it is
mainly contributed by the relatively high chloride concentration of well 13. The
source of chloride around well 13 is suspected to be associated with leakage of
domestic wastewater, because a relatively high concentration of TOC is also found
in the same sample. The overall quality of groundwater from cluster 2 wells
surpasses that of the other clusters, but it still cannot meet the drinking water
standard for iron and manganese.
As stated previously, water sampled from well 24 has many of the characteristics of
saline water: high EC, concentrated chloride, and abundant
sulfate, sodium, potassium, magnesium, and calcium ions. The single sample in cluster
3 appears to be deeply affected by seawater intrusion, and the saline evolution is
consistent with its sampling location near the seashore. In cluster 4, evidence of
saline evolution and mineral dissolution is seen in the
high EC and hardness of the single sample from well 14. In addition, the high
concentration of TOC in well 14 points to possible contamination by
leakage of domestic wastewater into the nearby aquifer. Therefore, the single sample
in cluster 4 may suffer from the combined impact of salinization, mineralization, and
wastewater leakage on its groundwater quality.
CONCLUSIONS
This study has demonstrated the utility of multivariate statistical analysis
for characterizing groundwater quality. In our case, PCA explains 79.5% of the total
variance and identifies four PCs: the salinization, mineralization, zinc, and TOC
factors. The K-means method classifies the data set into four clusters containing
fifteen sampling wells in cluster 1, seven in cluster 2, and one each in clusters 3
and 4. Mineral dissolution is the dominant process identified in cluster 1, and
leakage of domestic wastewater is recognized as the major source in cluster 2. The
governing mechanism in cluster 3 is the saline evolution of the well water, while a
combination of salinization, mineralization, and wastewater leakage affects the
groundwater quality in cluster 4. With the aid of statistical techniques, it is
possible to identify the underlying processes and the distribution of sources that
might affect groundwater quality. Furthermore, the results offer the information that
authorities need to pursue sustainable approaches to groundwater management and
contamination prevention.
REFERENCES
Cheng P. J. (2003). Application of Multivariate Statistical Method on Characteristic
Analysis of Groundwater Quality in the Kaohsiung County Area. M.S. thesis,
Department of Environmental Science and Engineering, National Pingtung
University of Science & Technology, Taiwan. [in Chinese].
Farnham I. M., Johannesson K. H., Singh A.K., Hodge V. F. and Stetzenbach K.J.
(2003). Factor analytical approaches for evaluating groundwater trace element
chemistry data. Analytica Chimica Acta, 490, 123-138.
Helena B., Pardo R., Vega M., Barrado E., Fernandez J. M. and Fernandez L. (1999).
Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga
River, Spain) by principal component analysis. Wat. Res., 34(3), 807-816.
Jackson J.E. (1991). A User’s Guide to Principal Components. Wiley, New York.
Kaohsiung County Government Environmental Protection Bureau (1998). The
Establishment of Groundwater Monitoring Wells in Taiwan: The First Stage, Report,
Kaohsiung County Government Environmental Protection Bureau, Kaohsiung
County, Taiwan. [in Chinese].
Lee Y. P. (2002). The Issue on the Application of Groundwater Monitoring Data in
Taiwan. Environment Protection Monthly, 2(4), 130-139. [in Chinese].
Morell I., Gimenez E. and Esteller M.V. (1996). Application of principal components
analysis to the study of salinization on the Castellon Plain (Spain). Sci. Tot. Environ.,
177, 161-171.
Reghunath R., Sreedhara Murthy T. R. and Raghavan B. R. (2002). The utility of
multivariate statistical techniques in hydrogeochemical studies: an example from
Karnataka, India. Wat. Res., 36, 2437-2442.