CHARACTERIZATION OF GROUNDWATER QUALITY BY MULTIVARIATE STATISTICAL ANALYSIS: AN EXAMPLE FROM KAOHSIUNG COUNTY, TAIWAN

Yi-Chu Huang*, Ting Nien Wu**, Po-Jen Cheng***

* Department of Environmental Science and Engineering, National Pingtung University of Science & Technology, 1 Hsueh Fu Road, Nei-Pu Hsiang, Pingtung 912, Taiwan, R.O.C.
** Department of Environmental Engineering, Kun Shan University of Technology, 949 Da Wan Road, Yung-Kang City, Tainan County 710, Taiwan, R.O.C. (E-mail: wutn@mail.ksut.edu.tw)
*** Health Bureau of Kaohsiung County Government, 834 Cheng Ching Road, Niao-Song Hsiang, Kaohsiung County 833, Taiwan, R.O.C.

Abstract
Either naturally occurring processes or human activities may significantly affect the quality of subsurface waters and limit their use as a water supply. With the aid of multivariate statistical techniques, this study attempted to identify these processes and attribute their influence on groundwater quality. The Kaohsiung County area, which holds parts of two main groundwater regions of Taiwan, was selected for this study. Geochemical data including pH, EC, hardness, chloride, sulfate, ammonia, nitrate, Na, K, Ca, Mg, Fe, Mn, Zn, and TOC from twenty-four monitoring wells were subjected to factor and cluster analysis. Principal component analysis (PCA) was used to identify the chemical variables with the greatest correlation, whereas cluster analysis (CA) was used to evaluate the similarities of water quality among groundwater samples. CA results illustrated that the overall quality of groundwater in the hinterland was better than that in the coastal area, which was partially salinized as a result of seawater intrusion. Using PCA, four major principal components (PCs) representing almost 80% of the cumulative variance were identified and were able to account for most of the information contained in the data.
PC 1 reflects the dominance of salinization, which was characterized by elevated EC, hardness, chloride, sulfate, sodium, potassium, and magnesium in groundwater. PC 2, with elevated concentrations of iron and manganese, is thought to be representative of mineral dissolution within the aquifer. PC 3 is dominated by zinc concentration in the groundwater, suggesting a link to the oxidizing/reducing conditions within the aquifer. PC 4 describes the infiltration of organic matter, which raised the TOC of the groundwater.

Keywords
principal component analysis; cluster analysis; principal component; salinization

** Author to whom all correspondence should be addressed.

INTRODUCTION
Several factors can affect groundwater quality, such as climate, topography, aquifer lithology, surface water recharge, saline water intrusion, and human activities. Any one or two of these factors could contaminate an aquifer to such an extent that the use of its groundwater becomes restricted. Although groundwater is not the major source of water supply in Taiwan, it is sometimes used for agriculture, aquaculture, industry, and domestic supply. Groundwater use in Kaohsiung County amounts to 9% of the total groundwater use in Taiwan (Lee, 2002). Thus, there is a need to evaluate groundwater quality regularly in order to improve water management in this region. This study attempts to identify the factors affecting groundwater quality and to delineate their areas of influence by using multivariate statistical techniques. It also demonstrates how statistical analysis can improve the understanding of groundwater systems.

MATERIALS AND METHODS
Study area
Kaohsiung County, a strip-shaped region comprising parts of the Chianan Plain and Pingtung Plain groundwater subregions in Taiwan, was selected as the study area.
The geological structure is mainly mixed layers of clay, silt, and silty sand, which may hamper percolation and limit groundwater recharge in the northern part of the study area. In contrast, the gravel-formed geological structure in the southern part of the study area allows abundant groundwater recharge from the Kaoping River (KCGEPB, 1998).

Groundwater data
There are thirty-two monitoring wells established in the shallow aquifer in the study area. Under the project “The integrated programming on groundwater quality of the monitoring network in Taiwan area”, only twenty-four monitoring wells were periodically sampled and analyzed in 2000. The groundwater quality analyses from these wells, covering pH, electrical conductivity (EC), hardness, chloride, sulfate, ammonia, nitrate, Na, K, Ca, Mg, Fe, Mn, Zn, and total organic carbon (TOC), provided the more complete geochemical data used here. The analyzed monitoring wells are concentrated in the southern part of the study area, as shown in Fig 1.

[Figure: map of Taiwan and the study area showing Kaohsiung County, the neighboring Tainan, Taitung, and Pingtung Counties, and the twenty-four sampling wells, labeled by well number and well code from 1-C03 to 24-P19]
Fig 1 Study area with the location of sampling wells

Statistical analysis
With multivariate analytical techniques, the groundwater data can be simplified, organized, and generalized to yield useful meaning. Principal component analysis (PCA) is known as a powerful data-reduction technique based upon eigenanalysis of the correlation or covariance matrix of large data sets (Farnham et al., 2003). Because the measurement scales and numerical ranges of the original variables evaluated in this study vary widely, all variables were first standardized by Z-score. For each variable in the original data matrix, the column mean is subtracted and the result is divided by the column standard deviation.
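The Z-score standardization described above can be sketched as follows. This is a minimal illustration, not the SPSS procedure used in the study: the three rows of well values are hypothetical, the function name `zscore_columns` is invented for this example, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was applied.

```python
from statistics import mean, stdev


def zscore_columns(rows):
    """Standardize each column: subtract the column mean, divide by the column SD."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sds = [stdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


# Hypothetical values for three wells: pH, EC (uS/cm), chloride (mg/l).
samples = [
    [6.9, 1100.0, 60.0],
    [6.8, 520.0, 100.0],
    [7.3, 16450.0, 3660.0],
]

standardized = zscore_columns(samples)
for row in standardized:
    print([round(v, 3) for v in row])
```

After standardization every column has zero mean and unit standard deviation, so a variable measured in thousands (EC) no longer outweighs one measured in single units (pH) when correlations or distances are computed.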
By utilizing PCA, the original p-dimensional standardized data matrix is transformed into an m-dimensional principal component (PC) matrix with fewer degrees of freedom. The linear relationship between the original variables and the transformed PCs is expressed as follows:

$$X_i = \sum_{j=1}^{m} a_{i,j} Y_j + a_i \epsilon_i \qquad (1)$$

where $X_i$ (for i = 1,…,p) identifies the original variables, $Y_j$ (for j = 1,…,m) identifies the principal components, $a_{i,j}$ identifies the factor loadings, and $a_i \epsilon_i$ is the loss of orthogonality. By exploiting the correlations present in the original data, PCs can reduce the overall complexity of the data and still preserve the inherent inter-dependencies. Typically, the first few PCs account for the majority of the variance within the original dataset; the first one explains the most variance and each subsequent PC explains progressively less. The factor loadings express the correlations between the PCs and the original variables, and the variables with the greatest positive and negative loadings make the largest contribution. As a result, the loadings help to track the sources that are responsible for the similarities in groundwater quality among the collected samples.

Cluster analysis (CA) was used to classify the data into natural groups according to their similarities to each other. Euclidean distance was employed as the measure of similarity, and a short Euclidean distance implies high similarity between the measured objects. In clustering, the distinct groups can reveal either the interaction among the variables (R-mode) or the interrelation among the samples (Q-mode). The two types of CA methods, i.e., hierarchical and nonhierarchical cluster analysis, were applied in a two-step procedure in this study. Hierarchical cluster analysis with Ward’s clustering procedure was first used to identify the number of clusters. Next, the K-means method commonly used in nonhierarchical cluster analysis was applied to obtain the final classification of the observations.
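The second, nonhierarchical step can be illustrated with a short sketch. It shows only a plain K-means (Lloyd's algorithm) using Euclidean distance; the hierarchical Ward step is not reproduced here, so the number of clusters and the starting centroids are simply supplied by the caller. The data points, the starting centroids, and the function names are hypothetical, and the routine is not the SPSS implementation used in the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical standardized scores for six samples on two variables.
points = [
    [-1.0, -0.9], [-1.1, -1.0], [-0.9, -1.1],
    [1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
]
# Two clusters assumed; in a two-step workflow this count would come
# from the dendrogram of the hierarchical step.
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print([[round(v, 3) for v in c] for c in centroids])
```

Supplying the starting centroids explicitly keeps the result deterministic, which is convenient when the cluster count has already been fixed by a preceding hierarchical analysis.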
A detailed description of the multivariate analysis methods used in this study can be found in textbooks on statistics (Jackson, 1991). The multivariate analysis, including PCA and CA, was carried out with the computer package Statistical Package for Social Sciences (SPSS-10.0).

RESULTS AND DISCUSSION
Principal component analysis
PCA was applied to the variables and samples from the twenty-four wells sampled in 2000. As mentioned, PCA is based on the diagonalization of the correlation matrix, which gives the overall coherence of the data set. Strong positive correlations are observed between chloride and sodium (r = 0.987), chloride and magnesium (r = 0.813), chloride and potassium (r = 0.739), chloride and sulfate (r = 0.837), ammonia and chloride (r = 0.581), ammonia and sodium (r = 0.558), ammonia and iron (r = 0.727), and iron and manganese (r = 0.662). The Kaiser-Meyer-Olkin (KMO) test carried out on the correlation matrix gives KMO = 0.576, greater than the acceptable value of 0.5, indicating that PCA can usefully reduce the dimensionality of the original data set. Bartlett’s sphericity test gives a similar result, with a calculated χ² = 439.63 (105 degrees of freedom, P < 0.01). PCA results including the rotated loadings, eigenvalues, and variance percentage of each PC are summarized in Table 1. A scree plot, commonly used to identify the number of factors to be retained for acquiring adequate information, shows a change of slope after the fourth eigenvalue. The four PCs obtained have eigenvalues greater than unity and explain 79.5% of the variance or information contained in the original data set. Loadings with an absolute value greater than 0.7 are highlighted in Table 1 because they indicate the participation of the variables in the PCs.
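The reported percentages can be checked with a few lines of arithmetic. The sketch below assumes, as the text states, that PCA was run on the correlation matrix of the fifteen standardized variables, so that the eigenvalues sum to 15 and each PC's share of the variance is its eigenvalue divided by 15. The eigenvalues are those reported in Table 1; the variable names in the code are illustrative only.

```python
p = 15  # number of standardized variables (pH, EC, hardness, ..., TOC)
eigenvalues = [7.050, 2.217, 1.531, 1.128]  # the four retained PCs, from Table 1

# For a correlation matrix the eigenvalues sum to p,
# so the variance share of each PC is eigenvalue / p.
shares = [100.0 * ev / p for ev in eigenvalues]

cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

# Retention rule used in the text: keep PCs with eigenvalue greater than unity.
retained = [ev for ev in eigenvalues if ev > 1.0]

# Degrees of freedom of Bartlett's sphericity test for p variables.
bartlett_df = p * (p - 1) // 2

print([round(s, 1) for s in shares])      # variance (%) per PC
print([round(c, 1) for c in cumulative])  # cumulative variance (%)
print(len(retained), bartlett_df)
```

The computed shares (47.0, 14.8, 10.2, and 7.5%) and the cumulative total of 79.5% agree with Table 1, and p(p - 1)/2 = 105 matches the degrees of freedom quoted for Bartlett's test.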
PC 1 accounts for 47% of the total variance and is characterized by very high loadings of EC, sulfate, hardness, sodium, potassium, magnesium, and chloride. PC 2 explains 14.8% of the total variance and is mainly associated with very high loadings of iron and manganese. PC 3 and PC 4 represent 10.2% and 7.5%, respectively, of the total variance, and each is dominated by a single variable: PC 3 by zinc and PC 4 by TOC.

Table 1. Varimax-rotated R-mode factor loading matrix from PCA

Variables                 Factor 1   Factor 2   Factor 3   Factor 4
EC                           0.932      0.101      0.225      0.207
SO4                          0.932      0.173      0.120     -0.177
hardness                     0.924      0.222      0.233      0.044
Na                           0.916      0.060      0.202      0.273
K                            0.888      0.005      0.011      0.018
Mg                           0.882      0.145      0.073      0.263
Cl                           0.876      0.090      0.214      0.300
Fe                          -0.143      0.920      0.173      0.198
Mn                           0.184      0.870     -0.132     -0.147
NH3                          0.349      0.679      0.402      0.352
Zn                           0.012      0.067      0.861     -0.069
Ca                           0.372      0.222      0.650      0.071
pH                           0.210     -0.028      0.499      0.013
TOC                          0.199      0.006      0.167      0.741
NO3                         -0.080     -0.161      0.380     -0.657
eigenvalue                   7.050      2.217      1.531      1.128
variance (%)                  47.0       14.8       10.2        7.5
cumulative variance (%)       47.0       61.8       72.0       79.5

PC 1 indicates that salinization affects this aquifer, and it is logical to observe a strong positive correlation of EC with sulfate, chloride, sodium, potassium, and magnesium. Calcium and magnesium play a substantial role in determining hardness, so the presence of these cations in groundwater increases hardness and simultaneously contributes to higher EC. Because of the low loading of calcium, magnesium makes the foremost contribution to hardness in PC 1. PC 1 is accordingly defined as the salinization factor, and EC serves as an indicator of seawater intrusion in the studied aquifer. The variables loading on PC 2 are iron and manganese, which are abundant mineral-forming elements in the Earth's crust.
As a general rule, the mineral contents found in groundwater samples are closely related to the dissolution of geological formations in the studied area. As a consequence, PC 2 is characterized as the mineralization factor. The single dominant variable in PC 3, zinc, does not show a strong correlation with the rest of the chemical variables examined. Sphalerite, calamine, or willemite could explain the zinc present in groundwater, but these minerals have not been identified in the nearby geological formations. Besides natural dissolution processes, the presence of zinc in groundwater is partially ascribed to leakage from certain industrial wastes or livestock manure piles. PC 4 most likely implies that the high concentration of TOC found in groundwater is attributable to the leakage of municipal wastewater. Another indicator of municipal wastewater is ammonia, although neither a strong correlation with TOC nor a significant loading on PC 4 is observed for it in the analysis. Ammonia can be transformed to nitrite and nitrate over a long period of time, so nitrate is treated as a long-term indicator of municipal wastewater. The moderate negative loading of nitrate in PC 4 may be due to its serving as a terminal electron acceptor in biodegradation processes in the groundwater environment. As a result, both PC 3 and PC 4 correspond to non-natural processes, such as leakage from the disposal of industrial waste or from municipal wastewater.

[Figure: score plot with PC 1 (47% of variance explained) on the horizontal axis and PC 2 (14.8% of variance explained) on the vertical axis; labeled points 2-C04, 24-P19, and 14-C64]
Fig 2. PC score of each sample for principal components 1 and 2

On the score plot for the first two principal components (Fig 2), the samples collected from wells 24 and 2 are clearly separated from the majority of the other well water samples.
This finding is consistent with the extremely high EC observed in well 24 and the high concentrations of iron and manganese observed in well 2. Well 14 plots closest to, but is still distinct from, the main cluster in Fig 2, because its water has both elevated EC and an elevated iron concentration. The remaining well waters cannot be distinguished from one another when only PC 1 and PC 2 are used.

As the relative compositions of the constituents in a sample are often as important as their absolute concentrations, cluster analysis is used to classify the similarity among samples. The hierarchical method is the customary form of cluster analysis and has been successfully demonstrated in hydrogeochemical studies (Reghunath et al., 2002). With Ward’s clustering procedure, the data set is first categorized into four groups in a dendrogram. Following hierarchical cluster analysis, the K-means method is used to reclassify the data set on the basis of the similarity between clusters. The output of nonhierarchical cluster analysis, with the chemical data for each cluster, is given in Table 2. In addition, the scores of each cluster with respect to the four identified PCs are shown in Fig 3.

Table 2.
The chemical properties of each cluster classified by the K-means method. Values are given as mean (range).

Variable   Unit    Cluster 1             Cluster 2             Cluster 3                Cluster 4
pH                 6.9 (6.6~7.2)         6.8 (5.9~7.8)         7.3 (7~7.6)              7.2 (7.1~7.4)
EC         µS/cm   1101.9 (817~1740)     523.6 (270~779)       16450.0 (11300~23400)    8050.0 (7750~8960)
hardness   mg/l    476.2 (294~683)       217.2 (124~430)       2218.0 (1510~3320)       847.8 (802~942)
Cl         mg/l    61.1 (25~271)         104.4 (11.25~549)     3663.0 (530~5240)        2432.5 (2320~2530)
SO4        mg/l    125.0 (11~310)        54.3 (12.7~85.4)      897.0 (543~1630)         1068.0 (91.1~125)
NH3        mg/l    0.521 (0.04~3.78)     0.082 (0.02~0.2)      2.523 (1.56~3.75)        2.915 (2.44~3.47)
Fe         mg/l    4.835 (0.05~39.55)    1.287 (0.02~3.65)     2.178 (1.36~2.96)        9.128 (3.28~21.2)
Mn         mg/l    0.554 (0.04~2.15)     0.435 (0.04~1.21)     1.188 (0.53~2.29)        0.278 (0.2~0.54)
Ca         mg/l    129.5 (69~142)        43.0 (29.5~90.1)      183.0 (162~195)          137.8 (7.2~175)
Mg         mg/l    30.2 (14.2~71.6)      21.2 (14.2~37.1)      136.0 (17~274)           57.2 (36.7~88.1)
Na         mg/l    65.8 (17.6~190)       53.0 (20.15~94)       1965.0 (1000~3720)       1071.0 (687~1780)
K          mg/l    9.0 (2.55~21)         11.4 (1.86~18.5)      56.4 (29.8~88.2)         16.8 (5.9~35.5)
NO3        mg/l    2.122 (0.25~9.61)     1.415 (0.15~6)        0.230 (0.12~0.45)        0.170 (0.04~0.23)
TOC        mg/l    2.586 (0.35~7.1)      2.389 (0.35~5.3)      4.500 (2.9~5.8)          5.500 (3.4~8.6)
Zn         mg/l    0.027 (0.01~0.068)    0.020 (0.01~0.025)    0.035 (0.005~0.1)        0.055 (0.005~0.18)
Number of samples  15                    7                     1                        1

Sampling wells, Cluster 1: 1, 2, 3, 5, 6, 7, 8, 11, 15, 16, 17, 19, 20, 22, 23
Sampling wells, Cluster 2: 4, 9, 10, 12, 13, 18, 21
Sampling wells, Cluster 3: 24
Sampling wells, Cluster 4: 14

Fifteen samples are identified within cluster 1, which has the highest scores with respect to the mineralization factor (PC 2) and the zinc factor (PC 3). The abundance of iron, manganese, and calcium in the well water increases hardness and possibly raises EC as well.
The ratio of chloride to sulfate in cluster 1 samples is much lower than in the other clusters, which implies that the principal cause of the saline content in the well water cannot be attributed to seawater intrusion. In cluster 1, the dominant process occurring in the aquifer is recognized as mineralization rather than salinization, although the cluster members are mostly scattered along the coastal area. The hardness and the abundance of iron and manganese limit the use of groundwater as a source of drinking water in the regions belonging to cluster 1.

[Figure: bar chart of scores (vertical axis, -6 to 6) for clusters 1 to 4, with one series for each of PC 1: salinization, PC 2: mineralization, PC 3: zinc, and PC 4: TOC]
Fig. 3 Scores for each cluster with respect to the four identified principal components

The seven sampling wells in cluster 2 are located in the hinterland of the study area. The lowest values of EC and hardness are found in cluster 2 samples, which reveals that the impact of salinization and mineralization on groundwater quality is weak in this region. Chloride is commonly used as a tracer of seawater intrusion; however, it is interesting to observe an even higher chloride content in the well water of cluster 2 than in that of cluster 1. This higher chloride content is not attributed to seawater intrusion; rather, it is mainly contributed by the relatively high chloride concentration of well 13. The source of chloride around well 13 is suspected to be leakage of domestic wastewater, because a relatively high concentration of TOC is also found in the same sample. The overall quality of groundwater from cluster 2 wells surpasses that of the other clusters, but it still cannot meet the drinking water standard for iron and manganese.
As stated previously, water sampled from well 24 has many of the characteristics of saline water, namely high EC, concentrated chloride, and abundant sulfate, sodium, potassium, magnesium, and calcium ions. The single sample in cluster 3 appears to be strongly affected by seawater intrusion, and its saline evolution is consistent with its sampling location near the seashore. In cluster 4, saline evolution and mineral dissolution are inferred from the high EC and hardness of the single sample from well 14. In addition, the high concentration of TOC found in well 14 suggests possible contamination by leakage of domestic wastewater into the nearby aquifer. Therefore, the groundwater quality of the single sample in cluster 4 may reflect the combined impact of salinization, mineralization, and wastewater leakage.

CONCLUSIONS
This study has successfully demonstrated the utility of multivariate statistical analysis for characterizing groundwater quality. In our case, PCA explains 79.5% of the total variance and recognizes four PCs: the salinization, mineralization, zinc, and TOC factors. The K-means method classifies the data set into four clusters, containing fifteen sampling wells in cluster 1, seven in cluster 2, and just one each in clusters 3 and 4. Mineral dissolution is identified as the dominant process in cluster 1, and the leakage of domestic wastewater is recognized as the major source in cluster 2. The governing mechanism in cluster 3 is the saline evolution of well water, while a combination of salinization, mineralization, and wastewater leakage affects the groundwater quality in cluster 4. With the aid of statistical techniques, it is possible to identify the underlying processes and the distribution of sources that might affect groundwater quality.
Furthermore, these techniques can offer the information that the authorities need to pursue sustainable approaches to groundwater management and contamination prevention.

REFERENCES
Cheng P. J. (2003). Application of Multivariate Statistical Method on Characteristic Analysis of Groundwater Quality in the Kaohsiung County Area. M.S. thesis, Department of Environmental Science and Engineering, National Pingtung University of Science & Technology, Taiwan. [in Chinese].
Farnham I. M., Johannesson K. H., Singh A. K., Hodge V. F. and Stetzenbach K. J. (2003). Factor analytical approaches for evaluating groundwater trace element chemistry data. Analytica Chimica Acta, 490, 123-138.
Helena B., Pardo R., Vega M., Barrado E., Fernandez J. M. and Fernandez L. (1999). Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Wat. Res., 34(3), 807-816.
Jackson J. E. (1991). A User’s Guide to Principal Components. Wiley, New York.
Kaohsiung County Government Environmental Protection Bureau (1998). The Establishment of Groundwater Monitoring Wells in Taiwan: The First Stage, Report, Kaohsiung County Government Environmental Protection Bureau, Kaohsiung County, Taiwan. [in Chinese].
Lee Y. P. (2002). The Issue on the Application of Groundwater Monitoring Data in Taiwan. Environment Protection Monthly, 2(4), 130-139. [in Chinese].
Morell I., Gimenez E. and Esteller M. V. (1996). Application of principal components analysis to the study of salinization on the Castellon Plain (Spain). Sci. Tot. Environ., 177, 161-171.
Reghunath R., Sreedhara Murthy T. R. and Raghavan B. R. (2002). The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Wat. Res., 36, 2437-2442.