Abstract

This report investigates techniques for predicting chemical concentrations through a storm event. Storm events can be sampled electronically, using data loggers that continuously record a limited set of variables such as pH and temperature, or with automatic samplers that take water samples at set intervals but must be loaded and unloaded frequently. If a continuously recorded parameter can be shown to be a proxy for a water constituent that must otherwise be sampled and analyzed, then that constituent can be predicted from the continuous record. This paper examines attempts to predict nitrate (NO3) from specific conductivity and stage, both of which can be measured continuously. Multiple regression and step multiple regression equations are generated and then compared to the measured nitrate concentrations over a storm event. The techniques are not sufficient for simulating the nitrate concentrations at Millstone Spring, KY. Cluster analysis is examined as a technique to quantify chemical variability, and it appears to be an effective tool for describing chemical changes through a storm event.

Background

Karst aquifers are notably different from other types of aquifers. Karst systems are characterized by highly soluble bedrock, generally either limestone or dolomite. As a result, the aquifer has open-conduit flow through fractures and through dissolution features such as sinkholes and caves. These aquifers also have diffuse flow. Because of this open, fractured nature, surface contamination can readily impact karst aquifers. Understanding and quantifying this surface contamination is important for a variety of water-quality issues. The variety of flow paths available to groundwater and the extreme heterogeneity also make these aquifers very difficult to model. They cannot be quantified in the same way as, for example, a sandstone aquifer.
Traditional modeling packages that simulate groundwater flow are effectively useless in karst systems. Figure 1 is a schematic cross-section of a karst aquifer.

Figure 1: Karst aquifer cross section. From http://www.forester.net/images/sw0111_49.gif

Springs offer a source of information in karst systems. These discharge points represent an average view of the aquifer characteristics, and storm water sampling of springs provides a “slice” of the aquifer behavior. Storm events can be sampled electronically, using data loggers that continuously record a limited set of variables such as pH and temperature, or with automatic samplers that take water samples at set intervals but must be loaded and unloaded frequently.

The first portion of the storm water to be sampled is the base flow, the water that is not affected by recharge. This is the water that was already in the system before the storm event. In a karst system, baseflow water chemistry is dominated by dissolved calcium carbonate and, to a lesser extent, magnesium. Calcium buffers pH and increases specific conductivity (Peterson, Davis and Brahana, 2000, p. 47).

The second part of the sampling captures the storm water itself. These samples are characterized by an increase in water level, a dilution of the chemical constituents derived from the bedrock or epikarst, and an increase in the constituents that originate at the surface and are flushed into the system with the storm water.

The third part of the sampling is the recession, as the influence of the storm water decreases and the spring begins to revert to base flow conditions. Surface constituents will be decreasing in concentration, but there may be a lag before bedrock constituents return to base flow levels.

Methods of Prediction

There are many possible sources of contamination in a karst aquifer.
Figure 2 is an illustration of some of the various contaminants and paths in a karst system.

Figure 2: Contamination in a karst aquifer. From http://www.dyetracing.com/karst/ka01013.html

Contamination from agricultural practices, specifically nitrate (NO3), is the focus of this report. Peterson, Davis and Brahana propose a multiple regression method for predicting nitrate concentration based on stage and specific conductivity (2000, p. 43). The rationale for using stage as a proxy for nitrate concentration is that stage is a measure of the volume of discharge, and as discharge increases, so do nitrate concentrations (Peterson, Davis and Brahana, 2000, p.47). Specific conductance (SC) is a measure of ionic strength, and ionic strength is controlled primarily by calcium in a carbonate system (Peterson, Davis and Brahana, 2000, p.47). High SC can thus be thought of as an indicator of water with a longer residence time in the system, as opposed to rainwater, which has a low SC. As SC decreases through a storm event, nitrates should increase.

The first step in predicting nitrate concentration in a karst system is a visual inspection of a storm water hydrograph. This seems like an elementary exercise, but it is important to determine whether SC and stage actually fluctuate according to the model. After visual examination, the data should be sorted by stage and nitrate should be plotted against stage. This plot will exhibit points where the slope changes radically, and at these points the line should be divided into steps (Peterson, Davis and Brahana, 2000, p.49). Figure 3 is the sorted data from the spring measured by Peterson, Davis and Brahana (2000, p.55). After the stage data is separated into steps, a multiple regression is performed on the subset of data belonging to each step.
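The sort, split and regress procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used by Peterson, Davis and Brahana or by MINITAB: the function names (`fit_mlr`, `fit_steps`), the sample layout `(stage, sc, no3)`, and the rule that a step covers `low <= stage < high` are all hypothetical choices, and the breakpoints must still be picked by eye from the sorted plot.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (stage, sc, no3); layout is an assumption


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(samples: Sequence[Sample]) -> Tuple[float, float, float]:
    """Least-squares fit of no3 = b0 + b1*stage + b2*sc via the normal equations."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to fit three coefficients")
    rows = [(1.0, stage, sc) for stage, sc, _ in samples]
    y = [no3 for _, _, no3 in samples]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    b0, b1, b2 = solve3(xtx, xty)
    return b0, b1, b2


def fit_steps(
    samples: Sequence[Sample], breakpoints: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Sort by stage, split at the chosen stage breakpoints, and fit each step."""
    ordered = sorted(samples, key=lambda s: s[0])
    edges = [float("-inf")] + sorted(breakpoints) + [float("inf")]
    fits = []
    for low, high in zip(edges[:-1], edges[1:]):
        step = [s for s in ordered if low <= s[0] < high]
        fits.append(fit_mlr(step))
    return fits
```

Calling `fit_steps(samples, [16.70, 18.54])` would return one `(intercept, stage coefficient, SC coefficient)` triple per step. The sketch reports coefficients only; the p-values and R-squared values discussed in this report would still have to come from a statistics package.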
For Stafford Spring, the resulting step regression equations are:

Step 1: NO3 = 2.88 - 0.106 Stage + 0.00696 SC
Step 2: NO3 = 6.54 + 0.208 Stage - 0.0167 SC
Step 3: NO3 = 6.88 - 0.0434 Stage - 0.00510 SC

In addition, a multiple regression equation was generated for the entire data set. This equation is:

NO3-N = 5.49 - 0.0311 Stage - 0.00204 SC

[Figure 3 plot: Stafford Spring, AR. NO3-N versus Stage, with step 1 at stage 9.00-16.70, step 2 at stage 16.70-18.54, and step 3 at stage 18.54-73.00.]

Figure 3: Sorted stage versus nitrate concentration.

The computer program MINITAB explains the regression equation with the following relationship:

Response = constant + coefficient (predictor) + … + coefficient (predictor)

Here the response (Y) is the value of the response variable. The constant is the value of the response variable when the predictor variable(s) are zero. MINITAB identifies the constant as the intercept because it determines where the regression line intercepts (meets) the Y-axis. The predictors (X) are the values of the predictor variable(s). Each coefficient is the estimated change in the average response per unit change in its predictor.

A portion of the summary table from this multiple regression is included below. P is the probability (p-value) and should be lower than the α-level, chosen in this case to be 0.05. SC may be a borderline predictor, because its p-value is above the α-level, but only slightly.

Predictor   Coef        SE Coef     P
Constant    7.830       1.177       0.001
Stage       -0.044250   0.006665    0.001
SC          -0.008774   0.003773    0.059

R-Sq = 89.7%   R-Sq(adj) = 86.3%

The R-squared and adjusted R-squared values can be thought of as the proportion of the variation that the model explains. The adjusted R-squared is adjusted for the number of terms in the model.

Figure 4 shows the simulated nitrate concentration plotted with the measured concentration. With the exception of step 1, the simulated and actual concentrations follow each other very well. Even step 1 follows the shape of the actual nitrate curve, but its simulated values are too high in magnitude.
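A simulated curve of the kind plotted in Figure 4 can be produced by evaluating, for each reading, the step equation whose stage range contains that reading. The sketch below uses the Stafford Spring step coefficients and stage ranges reported above. The function name `simulate_no3`, the choice to assign a reading on a shared boundary to the lower step, and the refusal to extrapolate outside 9.00-73.00 are assumptions made for illustration, not part of the published method.

```python
from typing import List, Tuple

# (stage_low, stage_high, intercept, stage_coef, sc_coef)
# Coefficients and stage ranges are the Stafford Spring values quoted in this report.
STEP_MODELS: List[Tuple[float, float, float, float, float]] = [
    (9.00, 16.70, 2.88, -0.106, 0.00696),    # step 1
    (16.70, 18.54, 6.54, 0.208, -0.0167),    # step 2
    (18.54, 73.00, 6.88, -0.0434, -0.00510), # step 3
]


def simulate_no3(stage: float, sc: float) -> float:
    """Evaluate the step regression equation that covers the given stage.

    Assumption: a stage that falls exactly on a shared boundary is assigned
    to the lower step, because the first matching range wins.
    """
    for low, high, intercept, stage_coef, sc_coef in STEP_MODELS:
        if low <= stage <= high:
            return intercept + stage_coef * stage + sc_coef * sc
    raise ValueError(f"stage {stage} is outside the fitted range 9.00-73.00")


def simulate_series(readings: List[Tuple[float, float]]) -> List[float]:
    """Apply simulate_no3 to a list of (stage, sc) readings in time order."""
    return [simulate_no3(stage, sc) for stage, sc in readings]
```

For a hypothetical reading of stage 10 and SC 300, `simulate_no3(10.0, 300.0)` evaluates the step 1 equation. Raising an error outside the fitted stage range is a deliberate choice here, since the regression equations say nothing about stages that were never observed.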
[Figure 4 plot: Stafford Spring Simulated vs. Actual Nitrate Concentrations. Nitrate concentration versus time (min); series: step 1, step 2, step 3, NO3.]

Figure 4: Simulated and actual nitrate concentrations over a storm event.

Peterson, Davis and Brahana also suggest creating a multiple regression equation for the entire data set (2000, p.48). A partial summary of the results is included below.

The regression equation is
NO3-N = 5.68 - 0.0324 Stage - 0.00252 SC

Predictor   Coef        SE Coef     T       P
Constant    5.6818      0.9444      6.02    0.000
Stage       -0.032367   0.008542    -3.79   0.001
SC          -0.002517   0.002209    -1.14   0.264

R-Sq = 61.3%   R-Sq(adj) = 58.7%

The p-value for SC (0.264) is well above 0.05, indicating that SC may not be a good predictor. The R-squared values are also rather low, indicating that the variation in nitrate is not fully explained by SC and stage when the data set is analyzed as a whole. Figure 5 plots the multiple regression equation together with the actual nitrate concentration. Even though the summary statistics indicate that a single multiple regression may not be an effective way to simulate nitrate concentration, the plot is still visually rather convincing.

[Figure 5 plot: Stafford Spring NO3 Concentrations. NO3 (mg/L) versus time; series: Simulated NO3 using MR, Measured NO3.]

Figure 5: Multiple regression plots of nitrates and actual nitrate concentrations

Application of Methods

After investigating the methods used by Peterson, Davis and Brahana, the techniques were applied to Millstone Spring, a spring in a karst area of Kentucky. The data was collected over a storm event in 1999. The first step is a visual inspection of the data. Figures 6 and 7 are provided below.

Figure 6: Nitrate and Stage

Figure 7: Nitrate and SC

It is hard to draw any obvious conclusions from these plots. SC appears to decrease as nitrates increase, and stage may increase with nitrates, but neither figure shows a clear-cut linear relationship.
Figure 7 shows the plot of sorted stage versus nitrate. From this plot, steps were delineated at abrupt slope changes. These rather arbitrary choices are labeled below. Although the first point appears to be an outlier, it is included in the step calculation.

[Figure 7 plot: Millstone Spring, KY. NO3 versus Stage, with Step 1 at stage 95.3-101.3, Step 2 at stage 101.3-105.8, and Step 3 at stage 105.8-111.9; the first point is marked "outlier?".]

Figure 7: Sorted stage and nitrates.

The partial summary of the multiple regression (without breaking into steps) is listed below:

The regression equation is
NO3_1 = 25563 + 3.17 SC - 178 stage

Predictor   T       P
Constant    3.59    0.003
SC          0.60    0.557
stage       -2.86   0.012

S = 1040.26   R-Sq = 41.1%   R-Sq(adj) = 32.7%

The SC p-value is much too high. The R-squared values are low and do not explain very much of the variance. This does not appear to be an effective model for nitrates. The step regressions would be expected to be better models of nitrate behavior.

The partial summary for step 1 is listed below:

The regression equation is
NO3_1_1 = 19013 + 14.6 SPC_1 - 158 stage_1_1

Predictor   T       P
Constant    0.22    0.843
SPC_1       0.26    0.811
stage_1_1   -0.24   0.826

S = 1026.57   R-Sq = 43.0%   R-Sq(adj) = 4.9%

The p-values for step 1 are too high for an α-level of 0.05. The R-squared value does not fully explain the variation and, once it is adjusted for the number of variables, explains even less.

The partial summary for step 2 is listed below. The p-values are still too high, but not as high as in step 1. The R-squared values account for more of the variation than in the previous step.

The regression equation is
NO3_1_2 = 25270 - 3.41 SPC_2 - 154 stage_1_2

Predictor   T       P
Constant    4.38    0.022
SPC_2       -1.86   0.160
stage_1_2   -2.75   0.071

S = 202.739   R-Sq = 79.1%   R-Sq(adj) = 65.1%

The partial summary for step 3 is listed below. The p-values are much too high, but the R-squared values do indicate that a majority of the variance is explained by this equation.
The regression equation is
NO3_1_2_1 = - 481 + 8.41 SPC_2_1 - 41.0 stage_1_2_1

Predictor     T       P
Constant      -0.31   0.771
SPC_2_1       1.22    0.291
stage_1_2_1   -1.62   0.181

stage_1_2_1: Coef = -40.95, SE Coef = 25.33

S = 3.95823   R-Sq = 80.5%   R-Sq(adj) = 70.8%

Figure 8 is the plot of the single multiple regression with the actual measured nitrates. The two series follow a similar trend, but the simulated nitrate line does not reflect the variability of the measured values, a point already made by the calculated R-squared values.

[Figure 8 plot: Simulated and Measured NO3, Millstone Spring. NO3 versus time (julian date); series: measured NO3, simulated NO3.]

Figure 8: Single Multiple Regression

Figure 9 is the plot of the multi-step regressions. This is essentially a useless plot. The negative SC values are meaningless, and even when plotted as absolute values they do not correlate well with the measured data. Step 1 follows the measured data more closely than any other step. Based solely on the p-values and R-squared values, step 2 would have been expected to predict the nitrate concentration most closely.

[Figure 9 plot: Simulated Nitrates Using Step Regression. Simulated values versus time (julian date); series: step 1, step 2, step 3, SC, NO3.]

Figure 9: Simulated Nitrate Concentration, Millstone Spring.

Analysis of Regression Techniques

Based on the p-values, R-squared values and plots of the results, this does not appear to be an effective model for Millstone Spring. There are several possible reasons. A larger data set might produce more reliable regression equations. Peterson, Davis and Brahana do not address data set size, but it could be a factor: in a small data set, each individual point carries more weight than is appropriate. The selection of slopes, and the judgment of what counts as an obvious slope change, is a weak point in this model.
This is not a quantifiable evaluation, and is instead subject to individual interpretation. Figure 7 had at least one outlier, and determining obvious changes in slope was difficult. The data points did not break into slope changes as identifiable as those in Figure 3. Even in Figure 3, slopes other than those that Peterson, Davis and Brahana selected could have been isolated.

There are also geological factors that can affect the effectiveness of this model. Peterson, Davis and Brahana conclude that the step regression methods work best for springs fed primarily by diffuse flow (2000, p. 61). It is quite possible that Millstone Spring is supplied by conduit flow instead, and in fact the lack of effective modeling may indicate that.

The method of step multiple regression seems like an effective way to quantify nitrate concentration in areas of heavy agricultural use. The springs in Arkansas analyzed by Peterson, Davis and Brahana are heavily impacted by poultry farming. Millstone Spring, on the other hand, is in an area of less intensive use. It may be that the aquifer must be profoundly impacted to be predictable using this model, so the method may be effective only for a very select kind of karst aquifer. Even so, the ability to predict nitrate contamination in a heavily impacted area is extremely valuable. Identifying techniques that are less dependent on the type of flow would be a useful next step, as would investigating other easily measured parameters, such as pH and temperature.

Other Prediction Techniques

Another approach to describing how chemical concentrations vary during a storm event is cluster analysis. If suites of chemicals can be shown to vary with, or inversely to, each other, then they can help to explain chemical fluctuations. These techniques cannot quantitatively predict chemical concentrations through a storm event, but they can identify basic relationships.
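One way to make "vary with, or inversely to, each other" concrete is to score each pair of constituents by the correlation of their concentrations across samples and then merge the most similar constituents first. The sketch below does this with Pearson correlation and single-linkage agglomeration. It is a stand-in chosen for brevity, not necessarily the settings used to produce the dendrograms in this report: the similarity scale `100 * (1 + r) / 2`, the linkage rule, and the function names are all assumptions, and the constituent values in the usage note are invented.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Map correlation r in [-1, 1] onto a 0-100 scale (assumed convention)."""
    return 50.0 * (1.0 + pearson(x, y))


def cluster_variables(data: Dict[str, Sequence[float]]) -> List[Merge]:
    """Single-linkage clustering of variables.

    Returns the merges in the order they happen, each as
    (cluster_a, cluster_b, similarity_at_merge).
    """
    clusters: List[Tuple[str, ...]] = [(name,) for name in data]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best_s, best_i, best_j = -1.0, 0, 1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: use the most similar pair of members.
                s = max(
                    similarity(data[a], data[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if s > best_s:
                    best_s, best_i, best_j = s, i, j
        merges.append((clusters[best_i], clusters[best_j], best_s))
        merged = clusters[best_i] + clusters[best_j]
        clusters = [
            c for k, c in enumerate(clusters) if k not in (best_i, best_j)
        ] + [merged]
    return merges
```

With invented series such as `{"Ca": [1, 2, 3, 4], "Mg": [2, 4, 6, 8.5], "Fe": [4, 3, 2, 1]}`, calcium and magnesium merge first because they rise together, while iron, which falls as they rise, joins last at a similarity near zero. Under this signed convention, constituents that vary inversely are treated as dissimilar; using the absolute value of the correlation instead would group them together.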
Figure 10 is a cluster analysis of chemical concentrations through the 1999 storm event at Millstone Spring, KY. The constituents being compared are arsenic, barium, cadmium, magnesium, strontium, calcium, chromium, iron, silicon, manganese, lead, copper, potassium and sodium. The data is divided into baseflow (B), storm (S) and recession (R). The recession and baseflow samples cluster close to each other, while the storm samples are more dispersed. It is intuitive that the storm waters would have the most chemical variability, because the storm water has the greatest stage fluctuations. Storm water can be thought of as the flux component, while baseflow is more of an average flow. Recession is intermediate between the two.

[Figure 10 plot: Scatterplot of Chemical Concentrations, Millstone Spring. C20 versus C19; points labeled S, R and B, with the Recession and Baseflow groups annotated.]

Figure 10: Scatterplot of chemical concentrations. B-baseflow, R-recession, S-storm.

Figure 11 is the dendrogram of recession and baseflow chemical concentrations. These two portions cluster similarly, so it is reasonable to analyze them using the same dendrogram. Chromium, iron, manganese, potassium, silica, lead and copper have the highest similarity to each other. It is this suite of elements that would be found in spring water at higher concentrations during baseflow and recession periods. The suite is probably associated with bedrock, because this water is less affected by surface input.

[Dendrogram: Cluster Analysis of Baseflow and Recession Water, Millstone Spring. Similarity axis from 65.05 to 100.00; variables in order: As, Cr, Fe_1, Mn, K_1, Si_1, Pb, Cu, Ba (D), Ca_1, Cd, Mg_1, Na_1, Sr (D).]

Figure 10: Dendrogram of recession and baseflow, Millstone Spring.

Figure 11 is the dendrogram of storm water during the Millstone Spring 1999 storm event. Chromium, iron, silica, manganese, lead, copper and potassium form the suite of elements with the greatest similarity.
This is the same suite of elements with the highest level of similarity in Figure 11. The main difference between Figures 11 and 10 is that Figure 11 has less variability. The chemical constituents are more clustered, which may indicate that they are always present in the system but increase during storm events. Notice that calcium and magnesium, primary constituents of karst waters, are less clustered at baseflow/recession. That is probably due to the high concentrations of these ions relative to the other constituents.

[Dendrogram: Cluster Analysis of Storm Water Constituents, Millstone Spring. Similarity axis from 64.60 to 100.00; variables in order: As, Ba (D), Cd, Mg_1, Sr (D), Ca_1, Cr, Fe_1, Si_1, Mn, Pb, Cu, K_1, Na_1.]

Figure 11: Dendrogram of storm water, Millstone Spring.

Analysis of Cluster Analysis

Cluster analysis is an effective technique for distinguishing different types of spring water. Storm and baseflow samples can be effectively separated based on their chemical constituents, while the recession waters cluster somewhere between these two extremes. Dendrograms can identify chemical similarities and suites of elements that occur in different waters. The effect of fresh water, in the form of storm runoff, is apparent in the difference between the dendrograms.

These techniques are effective for quantifying different components of storm sampling. They cannot predict concentrations, but they can provide a good idea of how samples are likely to group. Understanding how the baseflow, storm and recession portions of a storm event will plot can be very important. If samples deviate from the expected cluster, it could indicate a sampling or analysis problem. It could also indicate some change in the aquifer, such as a change in land use. Deviations could also indicate subsurface changes, such as water being supplied by diffuse rather than conduit flow. Cluster analysis is an important way to quantify chemical variability in a karst aquifer.
Works Cited

In addition to the sources listed below, the statistical programs MINITAB, PSI-Plot and Excel were used in the analysis of the data. Dr. Dorothy Vespers provided the chemical data for Millstone Spring.

Croft, A., 2003, Introduction to Karst Environmental Problems, http://www.dyetracing.com/karst/ka01013.html

Forester Communications, 2003, Karst Cross-Section, http://www.forester.net/images/sw0111_49.gif

Peterson, E.W., Davis, R.K. and Brahana, J.V., 2000, The use of Regression Analysis to Predict Nitrate-Nitrogen Concentrations in Springs of Northwest Arkansas, in Sasowsky, I.D. and Wicks, C.M. (eds), Groundwater Flow and Contaminant Transport in Carbonate Aquifers, A.A. Balkema, Rotterdam, p.43-63.

An Investigation of Techniques to Predict and Quantify Stormwater Chemical Concentrations in a Karst Aquifer System
Rachel Grand
11 December 2003