Water Chemistry of Lake Carcoar

Introduction

Data on the chemical composition of the water of Lake Carcoar, near Cowra, were entered as part of a project on water quality management conducted at the University of Canberra. Lake Carcoar is a relatively small storage in an agricultural district, so its water quality is of particular concern to the New South Wales Sydney Catchment Authority. The data are held in the disk file CARCOAR.DAT. The measurements were selected because of their known relationship to algal production, particularly production by diatoms. These algae are single-celled and secrete elaborate silica skeletons. Blooms of these microscopic organisms can cause severe deterioration of water quality.

The Data

Variable            Columns   Units
STATION NUMBER      1-7
DATE                8-13      ddmmyy
NITRATE             16-22     mg/l
SILICA              25-30     mg/l
SOLUBLE PHOS        33-37     mg/l
TOTAL PHOSPHORUS    40-44     mg/l
AMMONIA             47-51     mg/l
CHLOROPHYLL-A       54-56     UNESCO units
CONDUCTIVITY        59-61     microsiemens/cm
TURBIDITY           64-69     NTU

The Problem

The Sydney Catchment Authority would like to design a monitoring programme for this lake, based on knowledge of the typical concentrations of each of these key measurements. They would also like forewarning of algal blooms, and would welcome information that can be used to define upper acceptable limits for each of these variables. Silica and Total Phosphorus are of greatest interest. Perform the appropriate analyses for SILICA, and provide a brief report for the Catchment Authority, using the proforma supplied.

(c) Arthur Georges, 2002

Analysis

Undertake the appropriate analyses to determine whether silica concentration is normally distributed. Present the outcomes of the analysis below. Be sure to include a histogram.
[Normal probability plot: silica concentration (vertical axis, 1 to 41 mg/l) against normal quantiles (horizontal axis, -2 to +2)]

Figure 1. Normal probability plot of 324 measurements of Silica concentrations (in mg/l) collected from 7 different sites in Lake Carcoar during the period between 14 July 1981 and 8 January 1985.

Table 1. Tests for normality for Silica concentrations (mg/l) for Lake Carcoar.

Test                  Statistic            p Value
Shapiro-Wilk          W       0.666368     Pr < W       <0.0001
Kolmogorov-Smirnov    D       0.277731     Pr > D       <0.0100
Cramer-von Mises      W-Sq    6.588178     Pr > W-Sq    <0.0050
Anderson-Darling      A-Sq    36.07851     Pr > A-Sq    <0.0050

Figure 2. Histogram for 324 measurements of Silica concentrations (in mg/l) collected from 7 different sites in Lake Carcoar during the period between 14 July 1981 and 8 January 1985.

Compute a comprehensive set of summary statistics for the variable SILICA. Present the full set of statistics below in tabular form.

Table 2. Descriptive statistics for Silica concentrations (mg/l) for Lake Carcoar.

N                  324            Sum Weights         324
Mean               6.98352778     Sum Observations    2262.663
Std Deviation      7.01269648     Variance            49.177912
Skewness           2.56413518     Kurtosis            6.48595258
Uncorrected SS     31685.8355     Corrected SS        15884.4656
Coeff Variation    100.417679     Std Error Mean      0.38959425

Table 3. Basic summary statistics for Silica concentrations (mg/l) for Lake Carcoar.

Location               Variability
Mean      6.983528     Std Deviation          7.01270
Median    5.000000     Variance               49.17791
Mode      2.400000     Range                  40.50000
                       Interquartile Range    4.16550

Table 4. Hypothesis tests for location for Silica concentrations (mg/l) for Lake Carcoar.

Tests for Location: Mu0=0

Test           Statistic        p Value
Student's t    t    17.92513    Pr > |t|     <.0001
Sign           M    161.5       Pr >= |M|    <.0001
Signed Rank    S    26163       Pr >= |S|    <.0001

Table 5.
Quantiles for Silica concentrations (mg/l) for Lake Carcoar.

Quantile      Estimate
100% Max      40.5000
99%           33.6000
95%           25.2000
90%           14.3000
75% Q3        7.1655
50% Median    5.0000
25% Q1        3.0000
10%           2.0000
5%            1.6000
1%            0.4000
0% Min        0.0000

Table 6. Extreme values and missing values for Silica concentrations (mg/l) for Lake Carcoar.

----Lowest----     ----Highest---
Value    Obs       Value    Obs
0.0      188       33.4     200
0.3      102       33.6     199
0.4      103       33.7     284
0.4      101       34.3     286
0.5      107       40.5     282

Missing Values

Missing Value    Count    Percent of All Obs    Percent of Missing Obs
.                18       5.26                  100.00

Results

What do you conclude regarding the normality of the variable SILICA? Be sure to include supporting statistics or cross-references to diagrams and tables produced during the analysis.

Silica concentrations (mg/l) in water samples collected from seven different sites in Lake Carcoar during the period between 14 July 1981 and 8 January 1985 were not normally distributed (Shapiro-Wilk test, W = 0.666, p < 0.0001) (Figures 1 & 2). Indeed, their distribution is strongly skewed to the right. This is confirmed by the mean of 7.0 mg/l being larger than the median of 5.0 mg/l, which in turn is larger than the mode of 2.4 mg/l (Table 3). This indicates a skewed distribution and the likely presence of some extreme values.

The key issue here is that you recognise that there are multiple indications of non-normality when it exists: from a test statistic (Shapiro-Wilk W), from the graphical representation of the data as a probability plot and histogram, and from the non-coincidence of the mean, median and mode.

Provide a concise summary of the results, such as might appear in the results section of a manuscript or report. Include in your summary a description of the distribution of SILICA values, only those descriptive statistics appropriate to the data, and a working definition of an extreme SILICA value.
Silica concentrations for Lake Carcoar ranged from 0.0 to 40.5 mg/l, with a mean of 6.98 ± 0.39 mg/l (mean ± SE, n = 324) during the period of the study. This variable was not normally distributed, but rather had a unimodal distribution with a pronounced skew to the right (Shapiro-Wilk W = 0.67, p < 0.0001; Figures 1 & 2). The median and mode were 5.0 and 2.4 mg/l respectively, and the interquartile range was 4.17 mg/l. An extreme event was defined by the 99th percentile as any value greater than 33.6 mg/l.

The key issue here is that you recognised the need to adjust your description because the data are not normal, and to include more detail than would be necessary if the data had been normal. You need, for example, to report the mean, median and mode, as the three will not coincide; to describe the distribution in more detail (unimodal or bimodal? skewed to the right or left? strongly leptokurtic or platykurtic?); and to define extreme events in terms of percentiles, not the mean ± 3 standard deviations.

Discussion

With regard to normality, are your results consistent with expectation for a variable such as SILICA? Why?

One might usually expect concentrations of chemicals, like many other variables, to be normally distributed. The strongly non-normal behaviour exhibited by the concentrations of silica in the lake most likely results from periodic algal blooms or from episodic influx of silica leached from the catchment during storm events. Diatoms secrete silica skeletons, so high concentrations of silica in the water indicate high levels of algal production and hence possible deterioration of water quality. Since these events of high algal production are episodic, and since they result in high concentrations of silica in the water body, we should not be surprised to find that the distribution of silica in the lake is highly right skewed.
Any plausible explanation of your expectation, whether you expected normality or not, will do.

What advice would you give to anyone planning further statistical analyses on SILICA?

As the data are clearly not normal, the usual parametric analyses, such as construction of a 95% confidence interval, cannot be directly applied. One approach is to try a normalising transformation such as log(x) or log(x+1) (the latter because there are concentrations of zero in the data). Below are a histogram and a normal probability plot of the log-transformed silica concentrations. In this case the distribution of log silica is clearly much closer to normal.

[Histogram of log-transformed silica: frequency (0 to 110) against LGSILICA midpoint (0.0 to 2.0)]

[Normal probability plot of log-transformed silica: vertical axis -0.55 to 1.65, normal quantiles -2 to +2]

What recommendations would you like to make to the Sydney Catchment Authority?

Since high concentrations of silica in the water body may be an indicator of high algal production, continued measurement of this parameter may provide an index for monitoring the progress of algal blooms. As only 1% of the observed concentrations of silica in the water body exceeded 33.6 mg/l during the study period (Table 5), one might use this value as a trigger for management intervention.

Any reasonable recommendation will suffice for the purposes of this exercise, not necessarily the one above.

Program Listing

Append a full SAS program listing, cleaned up and free from error or redundant code.
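/* Read the Lake Carcoar water chemistry data */
DATA CHEM;
  INFILE "K:\1\CARCOAR.DAT";
  INPUT STATID DATE $ NITRATE SILICA SOLPO4 PO4 NH3 CHLOROA CONDUCT TURBID;
RUN;

/* Summary statistics, normality tests and plots for raw silica */
PROC UNIVARIATE DATA=CHEM PLOT NORMAL;
  VAR SILICA;
RUN;

/* Histogram of raw silica */
GOPTIONS RESET=ALL;
PROC GCHART DATA=CHEM;
  VBAR SILICA / SPACE=0 MIDPOINTS=0.0 TO 40 BY 5;
RUN;

/* Log-transform silica; adding 1 accommodates the zero concentrations */
DATA CHEM;
  SET CHEM;
  SILICA=LOG10(SILICA+1);
RUN;

/* Repeat the analysis on the transformed values */
PROC UNIVARIATE DATA=CHEM PLOT NORMAL;
  VAR SILICA;
RUN;

GOPTIONS RESET=ALL;
PROC GCHART DATA=CHEM;
  VBAR SILICA / SPACE=0;
RUN;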
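
The working definition of an extreme value used in the Results (the 99th percentile) can also be written to a data set by PROC UNIVARIATE instead of being read off the printed quantile table. The following is a minimal sketch and is not part of the original listing. The data set names CHEM_RAW and PCTL_OUT, the output variable names and the prefix SIL_P are hypothetical. The data are re-read here because the listing above overwrites SILICA with its log-transformed values.

```sas
/* Illustrative sketch only: CHEM_RAW, PCTL_OUT and SIL_P are assumed names */

/* Re-read the raw data so that SILICA is on its original scale (mg/l) */
DATA CHEM_RAW;
  INFILE "K:\1\CARCOAR.DAT";
  INPUT STATID DATE $ NITRATE SILICA SOLPO4 PO4 NH3 CHLOROA CONDUCT TURBID;
RUN;

/* Write the sample size, median and upper percentiles to a data set */
PROC UNIVARIATE DATA=CHEM_RAW NOPRINT;
  VAR SILICA;
  OUTPUT OUT=PCTL_OUT N=N_SILICA MEDIAN=MED_SILICA
         PCTLPTS=95 99 PCTLPRE=SIL_P;
RUN;

/* SIL_P95 and SIL_P99 are candidate trigger values for monitoring */
PROC PRINT DATA=PCTL_OUT;
RUN;
```

The output data set should contain a single row. SIL_P95 and SIL_P99 should correspond to the 95% and 99% quantiles reported in Table 5, provided the same data file and the default percentile definition are used.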