EX2-02_solution - University of Canberra

Water Chemistry of Lake Carcoar
Introduction
Data on chemical composition of the water of Lake Carcoar, near Cowra, were entered as part of a project on
water quality management conducted at the University of Canberra.
Lake Carcoar is a relatively small storage in an agricultural district, so its water quality is of particular concern to
the New South Wales Sydney Catchment Authority.
The data are held in disk file CARCOAR.DAT and the measurements have been selected because of their known
relationship to algal production, particularly production by diatoms. These algae are single-celled and secrete
elaborate silica skeletons. Blooms of these microscopic organisms can cause severe deterioration of water
quality.
The Data
Variable            Columns   Units
STATION NUMBER       1- 7
DATE                 8-13     ddmmyy
NITRATE             16-22     mg/l
SILICA              25-30     mg/l
SOLUBLE PHOS        33-37     mg/l
TOTAL PHOSPHORUS    40-44     mg/l
AMMONIA             47-51     mg/l
CHLOROPHYLL-A       54-56     UNESCO units
CONDUCTIVITY        59-61     microsiemens/cm
TURBIDITY           64-69     NTU
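The column positions listed above could be used directly with SAS column input. The following is only a sketch: it assumes CARCOAR.DAT is laid out exactly as described, borrows the file path and variable names from the program listing at the end of this document, and uses a hypothetical dataset name (CHEMCOL). Whether missing measurements appear in the file as blanks or as periods is not stated, so the TRUNCOVER option is a precaution rather than a known requirement.

```sas
* Sketch only - assumes the fixed-column layout described in The Data;
* CHEMCOL is a hypothetical dataset name used for illustration;
DATA CHEMCOL;
  INFILE "K:\1\CARCOAR.DAT" TRUNCOVER;
  INPUT STATID     1-7
        DATE    $  8-13
        NITRATE   16-22
        SILICA    25-30
        SOLPO4    33-37
        PO4       40-44
        NH3       47-51
        CHLOROA   54-56
        CONDUCT   59-61
        TURBID    64-69;
RUN;
```

With column input, a field that is entirely blank is read as a missing value, which is safer than list input if any measurement was left empty in the file.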
The Problem
The Sydney Catchment Authority would like to design a monitoring programme for this lake, based on
knowledge of the typical concentrations of each of these key measurements. They would also like forewarning
of algal blooms, and information that can be used to define upper acceptable limits for each of these variables
would be most welcome. Silica and Total Phosphorus are of greatest interest.
Perform the appropriate analyses for SILICA, and provide a brief report for the Catchment Authority, using the
proforma supplied.
(c) Arthur Georges, 2002
Analysis
Undertake the appropriate analyses to determine whether silica concentration is normally distributed. Present
the outcomes of the analysis below. Be sure to include a histogram.
[Normal probability plot of SILICA. Horizontal axis: normal quantiles, -2 to +2; vertical axis: silica concentration, 1 to 41 mg/l.]
Figure 1. Normal probability plot of 324 measurements of
Silica concentrations (in mg/l) collected from 7 different sites in
Lake Carcoar during the period between 14 July 1981 and 8
January 1985.
Table 1. Test for normality for Silica concentrations (mg/l) for Lake
Carcoar.
Tests for Normality
Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W       0.666368    Pr < W      <0.0001
Kolmogorov-Smirnov    D       0.277731    Pr > D      <0.0100
Cramer-von Mises      W-Sq    6.588178    Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq    36.07851    Pr > A-Sq   <0.0050
Figure 2. Histogram for 324 measurements of Silica
concentrations (in mg/l) collected from 7 different sites in
Lake Carcoar during the period between 14 July 1981 and 8
January 1985.
Compute a comprehensive set of summary statistics for the variable SILICA. Present the full set of statistics
below in tabular form.
Table 2. Descriptive statistics for Silica concentrations (mg/l) for Lake
Carcoar.
N                  324           Sum Weights        324
Mean               6.98352778    Sum Observations   2262.663
Std Deviation      7.01269648    Variance           49.177912
Skewness           2.56413518    Kurtosis           6.48595258
Uncorrected SS     31685.8355    Corrected SS       15884.4656
Coeff Variation    100.417679    Std Error Mean     0.38959425
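As a consistency check on Table 2, several of the derived statistics can be recovered from n, the mean and the standard deviation. The symbols here (x-bar for the mean, s for the standard deviation) are notation introduced for this check, and the inputs are rounded to four decimal places:

```latex
\begin{align*}
\mathrm{SE}(\bar{x}) &= \frac{s}{\sqrt{n}} = \frac{7.0127}{\sqrt{324}} = \frac{7.0127}{18} \approx 0.3896 \\
\mathrm{CV} &= 100 \times \frac{s}{\bar{x}} = 100 \times \frac{7.0127}{6.9835} \approx 100.42\% \\
\text{Corrected SS} &= (n-1)\,s^{2} = 323 \times 49.1779 \approx 15884.5
\end{align*}
```

A coefficient of variation of about 100% means the standard deviation is as large as the mean, which is itself a hint that the data are strongly skewed.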
Table 3. Basic summary statistics for Silica concentrations (mg/l) for
Lake Carcoar.
Location                Variability
Mean     6.983528       Std Deviation          7.01270
Median   5.000000       Variance              49.17791
Mode     2.400000       Range                 40.50000
                        Interquartile Range    4.16550
Table 4. Hypothesis test for location for Silica
concentrations (mg/l) for Lake Carcoar.
Tests for Location: Mu0=0
Test           -Statistic-      -----p Value------
Student's t    t   17.92513     Pr > |t|     <.0001
Sign           M   161.5        Pr >= |M|    <.0001
Signed Rank    S   26163        Pr >= |S|    <.0001
Table 5. Quantiles for Silica concentrations (mg/l) for
Lake Carcoar.
Quantile        Estimate
100% Max         40.5000
99%              33.6000
95%              25.2000
90%              14.3000
75% Q3            7.1655
50% Median        5.0000
25% Q1            3.0000
10%               2.0000
5%                1.6000
1%                0.4000
0% Min            0.0000
Table 6. Extreme values and missing values
for Silica concentrations (mg/l) for Lake
Carcoar.
----Lowest----        ----Highest---
Value      Obs        Value      Obs
  0.0      188         33.4      200
  0.3      102         33.6      199
  0.4      103         33.7      284
  0.4      101         34.3      286
  0.5      107         40.5      282

Missing Values
                        -----Percent Of-----
Missing                              Missing
  Value     Count       All Obs          Obs
      .        18          5.26       100.00
Results
What do you conclude regarding the normality of the variable SILICA? Be sure to include supporting statistics or
cross-references to diagrams and tables produced during the analysis.
Silica concentrations (mg/l) in water samples collected from seven different sites in Lake Carcoar during the
period between 14 July 1981 and 8 January 1985 were not normally distributed (Shapiro-Wilk test, W=0.666,
p<0.0001) (Table 1; Figures 1 & 2). Indeed, their distribution is strongly skewed to the right (skewness 2.56,
Table 2). This is confirmed by the mean of 7.0 mg/l being larger than the median of 5.0 mg/l, which in turn is
larger than the mode of 2.4 mg/l (Table 3). This ordering indicates a right-skewed distribution and the likely
presence of some extreme high values.
The key issue here is that you recognise that there are multiple indications of non-normality when it exists:
from a test statistic (Shapiro-Wilk W), from the graphical representation of the data as a probability plot and
histogram, and from the non-coincidence of the mean, median and mode.
Provide a concise summary of the results, such as might appear in the results section of a manuscript or report.
Include in your summary, a description of the distribution of SILICA values, only those descriptive statistics
appropriate to the data, and a working definition of an extreme SILICA value.
Silica concentrations for Lake Carcoar ranged from 0.0 to 40.5 mg/l, with a mean (± SE) of 6.98 ± 0.39 mg/l
(n = 324), during the period of the study. This variable was not normally distributed, but rather had a unimodal
distribution with a pronounced skew to the right (W = 0.67, p < 0.0001; Figures 1 & 2). The median and mode
were 5.0 and 2.4 mg/l respectively, and the interquartile range was 4.17 mg/l. An extreme event was defined by
the 99th percentile as any value greater than 33.6 mg/l.
The key issue here is that you recognised the need to adjust your description because the data are not normal,
and to include more detail than would be necessary if the data had been normal. For example, you need to
include the mean, median and mode, as the three will not coincide; to describe the distribution in more detail
(unimodal or bimodal? skewed to the right or left? strongly leptokurtic or platykurtic? and so on); and to define
extreme events in terms of percentiles, not the mean ± 3 standard deviations.
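To see why limits of three standard deviations either side of the mean are unsuitable for these data, the values in Tables 2 and 3 can be substituted directly. This is a worked illustration using the rounded mean (6.98 mg/l) and standard deviation (7.01 mg/l), with x-bar and s as notation introduced here:

```latex
\begin{align*}
\bar{x} + 3s &= 6.98 + 3(7.01) = 28.01 \ \text{mg/l} \\
\bar{x} - 3s &= 6.98 - 3(7.01) = -14.05 \ \text{mg/l}
\end{align*}
```

The lower limit is negative, which is impossible for a concentration, and the upper limit of about 28.0 mg/l lies between the 95th percentile (25.2 mg/l) and the 99th percentile (33.6 mg/l) in Table 5, so between 1% and 5% of the observations would exceed it. The percentile-based definition avoids both problems.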
Discussion
With regard to normality, are your results consistent with expectation for a variable such as SILICA? Why?
One might usually expect concentrations of chemicals, like many other variables, to be normally distributed. The
strongly non-normal behaviour exhibited by the concentrations of silica in the lake probably results from periodic
algal blooms or from episodic influxes of silica leached from the catchment during storm events. Diatoms
secrete silica skeletons. Therefore high concentrations of silica in the water indicate high levels of algal
production and hence possible deterioration of water quality. Since these events of high algal production are
episodic, and since they result in high concentrations of silica in the water body, we should not be surprised to
find that the distribution of silica in the lake is highly right skewed.
Any plausible explanation of your expectation, whether you expected normality or not, will do.
What advice would you give to anyone planning further statistical analyses on SILICA?
As the data are clearly not normal, the usual parametric analyses, such as construction of a 95% confidence
interval, cannot be applied directly. One approach is to try a normalising transformation such as log(x) or
log(x+1) (the latter is needed here because there are concentrations of zero in the data).
Below is a histogram and normal probability plot of the log transformed silica concentrations. Clearly in this case
the distribution of log silica is nearly normal.
[Histogram of log-transformed silica concentrations. Horizontal axis: LGSILICA MIDPOINT, 0.0 to 2.0 in steps of 0.2; vertical axis: FREQUENCY, 0 to 110.]
[Normal probability plot of log-transformed silica concentrations. Horizontal axis: normal quantiles, -2 to +2; vertical axis: log silica, -0.55 to 1.65.]
What recommendations would you like to make to Sydney Catchment Authority?
Since high concentrations of silica in the water body may be an indicator of high algal production,
continued measurement of this parameter may provide an index for monitoring the progress of algal blooms. As
only 1% of the observed concentrations of silica in the water body exceeded 33.6 mg/l during the study period
(Table 5), one might use this value as a trigger for management intervention.
Any reasonable recommendation will suffice for the purposes of this exercise, not necessarily the one above.
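If a percentile-based trigger were adopted, the samples exceeding it could be listed as follows. This is a sketch under stated assumptions rather than part of the original solution: it assumes CHEM holds untransformed SILICA values, takes the 99th percentile as the trigger, and uses hypothetical names (LIMITS, FLAGGED, SILICA_P99, EXCEEDS).

```sas
* Sketch only - assumes CHEM holds untransformed SILICA values;
* Store the 99th percentile of silica in a one-row dataset;
PROC UNIVARIATE DATA=CHEM NOPRINT;
  VAR SILICA;
  OUTPUT OUT=LIMITS P99=SILICA_P99;
RUN;

* Attach the trigger value to every sample and flag exceedances;
DATA FLAGGED;
  IF _N_ = 1 THEN SET LIMITS;
  SET CHEM;
  EXCEEDS = (SILICA > SILICA_P99);
RUN;

* List the stations and dates on which the trigger was exceeded;
PROC PRINT DATA=FLAGGED;
  WHERE EXCEEDS = 1;
  VAR STATID DATE SILICA SILICA_P99;
RUN;
```

Samples with a missing SILICA value are not flagged, because SAS treats a missing numeric value as smaller than any number in a comparison.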
Program Listing
Append a full SAS program listing, cleaned up and free from error or redundant code.
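* Read the water chemistry data for Lake Carcoar;
DATA CHEM;
  INFILE "K:\1\CARCOAR.DAT";
  INPUT STATID DATE $ NITRATE SILICA SOLPO4 PO4 NH3
        CHLOROA CONDUCT TURBID;
RUN;
* Summary statistics, normality tests and plots for raw silica;
PROC UNIVARIATE DATA=CHEM PLOT NORMAL;
  VAR SILICA;
RUN;
* Histogram of raw silica;
GOPTIONS RESET=ALL;
PROC GCHART DATA=CHEM;
  VBAR SILICA /SPACE=0 MIDPOINTS=0.0 TO 40 BY 5;
RUN;
QUIT;
* Log-transform silica, adding 1 because some values are zero;
* The result goes in LGSILICA so that raw SILICA is kept;
DATA CHEM;
  SET CHEM;
  LGSILICA=LOG10(SILICA+1);
RUN;
* Repeat the normality checks on the transformed variable;
PROC UNIVARIATE DATA=CHEM PLOT NORMAL;
  VAR LGSILICA;
RUN;
* Histogram of log-transformed silica;
GOPTIONS RESET=ALL;
PROC GCHART DATA=CHEM;
  VBAR LGSILICA /SPACE=0;
RUN;
QUIT;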