EX2-03_solution - University of Canberra

Inflows to Burrinjuck Dam
Surname:
Student No.
Introduction
Data on water flow for Burrinjuck Dam were collected as part of a project on water quality management
conducted by the University of Canberra for the Land and Water Resources Research and Development
Corporation.
The data were provided by the New South Wales Department of Water Resources and comprise the following
variables stored in the file BJUCK.DAT. DATE represents the date at which the water collections were taken.
Depth, volume in megalitres (ML) and area were measured or estimated for that date, and inflow and outflow
were measured using appropriate gauging stations.
The format of the data in BJUCK.DAT is shown in the table below. For example, the measurement for depth
occupies position 9 to 13 on each line in the data file.
The Data
Variable      Columns   Units
DATE          1-6       ddmmyy
DEPTH         9-13      m
VOLUME        16-21     ML
AREA          24-27     ha
INFLOW        30-34     ML/d
OUTFLOW       36-40     ML/d
RAINFALL      43-46     mm/d
EVAPORATION   49-52     mm/d
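The column layout above maps onto SAS column input. The following is a minimal sketch only, assuming the file sits at C:\BJUCK.DAT (the path used in the program listing); the dataset name BURRIN_COLS is illustrative, and EVAP is used as a short name for EVAPORATION.

```sas
/* Sketch: read BJUCK.DAT by column position, following the table above. */
/* The file path and the dataset name BURRIN_COLS are assumptions.       */
DATA BURRIN_COLS;
  INFILE "C:\BJUCK.DAT";
  INPUT @1 DATE   DDMMYY6.     /* columns 1-6, ddmmyy */
        DEPTH     9-13         /* m                   */
        VOLUME    16-21        /* ML                  */
        AREA      24-27        /* ha                  */
        INFLOW    30-34        /* ML/d                */
        OUTFLOW   36-40        /* ML/d                */
        RAINFALL  43-46        /* mm/d                */
        EVAP      49-52;       /* mm/d                */
  FORMAT DATE DDMMYY8.;        /* display dates as dd/mm/yy */
RUN;

/* Inspect the first ten records before any analysis. */
PROC PRINT DATA=BURRIN_COLS (OBS=10);
RUN;
```

Reading by column position guards against fields that run together or are left blank, which list input would misread.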
The Problem
The New South Wales Department of Water Resources requires a detailed summary of the flows, rainfall and
evaporation for Burrinjuck Dam.
Delete the above text, leaving only the logo and exercise title, your name and student number.
Exercise 1
Perform the appropriate analyses for INFLOW only, and provide a brief report for the NSW Department of Water
Resources, using the proforma supplied.
Analysis
It is sound practice, when analysing data that are not your own, to examine them before analysis. You should read the
raw data into the Editor for perusal before beginning the analysis. Once you are satisfied, undertake the
appropriate analyses, graphical and otherwise, to determine whether the inflows are normally distributed.
 Copyright Arthur Georges 2002
Table 1. Tests of Normality for inflows to Burrinjuck Dam, NSW. The Shapiro-Wilk test is the test of choice.
Figure 1. A probability plot of inflows to Burrinjuck Dam, NSW.
Curvilinearity in the plot indicates clear departure from normality.
Figure 2. Stem-Leaf Plot of inflows to Burrinjuck Dam, NSW. The
distribution is clearly skewed to the right, with some extreme (0) and
very extreme values (*). Non-normality is also indicated by the non-coincidence of the mean, median and mode.
Figure 3. A high quality histogram showing the
distribution of inflows to Burrinjuck Dam, NSW. The
distribution is clearly non-normal.
Results
What do you conclude regarding the normality of the variable INFLOW? Be sure to include supporting statistics
or cross-references to diagrams and tables produced during the analysis.
The data for inflows to Burrinjuck Dam are non-normal, with multiple lines of evidence supporting this
conclusion. There was significant deviation from Normality (Shapiro-Wilk W=0.86, p<0.0001), the mean,
median and mode did not coincide, and the probability plot (Figure 1) showed clear curvilinearity. The
distribution of inflows was unimodal with a strong skew to the right (Figure 3).
Compute a comprehensive set of summary statistics for the variable INFLOW. Provide a concise summary of the
results, such as might appear in the results section of a manuscript or report. Include in your summary, a
description of the distribution of INFLOW values, only those descriptive statistics appropriate to the data, and a
working definition of an extreme inflow.
Table 2. Summary statistics for inflows to Burrinjuck Dam, NSW. The modal
class, read from Figure 3, was 9,000 ML.
Inflow for Burrinjuck Dam ranged from 770 ML to 40,610 ML during the period of study, with a mean of 11,067
ML (± 1,014 ML, n=62). The distribution of flows was unimodal but strongly skewed to the right, with a median
flow of 9,155 ML and a modal flow of 9,000 ML. Perusal of the data in the stem-leaf plot suggests that the 95th
percentile of 27,000 ML would be an appropriate definition of an extreme flow event for this system.
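A percentile-based threshold of this kind can also be computed directly rather than read from the plot. The sketch below assumes the dataset BURRIN from the program listing; the dataset names EXTREME and FLAGGED and the variable IS_EXTREME are illustrative, not part of the original analysis.

```sas
/* Sketch: 95th percentile of INFLOW as a working threshold for extreme flows. */
/* EXTREME, FLAGGED and IS_EXTREME are hypothetical names.                     */
PROC UNIVARIATE DATA=BURRIN NOPRINT;
  VAR INFLOW;
  OUTPUT OUT=EXTREME PCTLPTS=95 PCTLPRE=P;   /* creates the variable P95 */
RUN;

/* Flag each record whose inflow exceeds the threshold. */
DATA FLAGGED;
  IF _N_ = 1 THEN SET EXTREME;     /* P95 is carried onto every record */
  SET BURRIN;
  IS_EXTREME = (INFLOW > P95);     /* 1 = extreme, 0 = otherwise       */
RUN;
```

A percentile is a reasonable choice here because, unlike the mean and standard deviation, it does not assume a symmetric distribution.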
Discussion
With regard to normality, are your results consistent with expectation for a variable such as INFLOW? Why?
I would have expected the distribution of flows to be non-normal and skewed to the right because for most of
the time, in the absence of rainfall, the system would be dominated by low flows. Only during rain would flows
be expected to increase, and such rainfall events are likely to be episodic.
What advice would you give to anyone planning further statistical analyses on INFLOW?
I would strongly advise anyone contemplating further statistical analysis of this variable to consider
transforming the data prior to analysis, with a view to achieving a Normal distribution of flows.
Program Listing
Append a full SAS program listing,
cleaned up and free from error or redundant code.
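DATA BURRIN;
  INFILE "C:\BJUCK.DAT";
  /* DATE is read as ddmmyy (6 columns); the remaining fields use list input */
  INPUT DATE DDMMYY6. DEPTH VOLUME AREA INFLOW OUTFLOW RAINFALL EVAP;
RUN;
/* Summary statistics, tests of normality, stem-leaf and probability plots */
PROC UNIVARIATE DATA=BURRIN PLOT NORMAL;
  VAR INFLOW;
RUN;
/* High-resolution histogram of inflows */
GOPTIONS RESET=ALL;
PROC GCHART DATA=BURRIN;
  VBAR INFLOW;
RUN;
QUIT;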
Exercise 2
If the analysis of the Burrinjuck inflows shows that the variable INFLOW is not normally distributed, repeat the
analysis on this variable following a standard log transformation and a square root transformation.
Y' = LOG10(Y + 1)
Y' = SQRT(Y + ½)
Analysis
Undertake the appropriate analyses to determine whether the logged inflows are normally distributed. Repeat
for the square root flows. Select the transformation that is the most successful in normalising the inflows.
Present the outcomes of the analysis using the best transformation below. Be sure to include a histogram.
Table 3. A comparison of the performance of the Log transformation and the Square Root transformation in
rendering Burrinjuck flows Normal. The Log transformation is superior.
Comparison of the histograms (stem-leaf) and the probability plots suggests that the Log transformation was
superior to the Square Root transformation in rendering the flows normal. The results of the Shapiro-Wilk test
did not clearly distinguish between the two transformations, with the square root being only marginally
superior to the Log. I have chosen the Log transformation.
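To put the raw and transformed variables side by side, the PROC UNIVARIATE output can be restricted to the relevant tables. This is a sketch, assuming LOGFLOW and SQRTFLOW have already been added to BURRIN as in the program listing for this exercise, and that the installed SAS release supports ODS SELECT with these table names.

```sas
/* Sketch: print only the moments (including skewness) and the tests of    */
/* normality for the raw and transformed inflows, for direct comparison.   */
/* Assumes LOGFLOW and SQRTFLOW already exist in BURRIN.                   */
ODS SELECT Moments TestsForNormality;
PROC UNIVARIATE DATA=BURRIN NORMAL;
  VAR INFLOW LOGFLOW SQRTFLOW;
RUN;
```

The Shapiro-Wilk W and its p-value appear in the tests-of-normality table for each variable; the plots still need to be examined separately.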
Compute a comprehensive set of summary statistics for the transformed inflows. Present the full set of statistics
below in tabular form.
Table 4. Summary statistics for inflows to Burrinjuck dam, following a Log 10 transformation.
Results
What do you conclude regarding the normality of the transformed inflows? Be sure to include supporting
statistics or cross-references to diagrams and tables produced during the analysis.
Following a log transformation, flows into Burrinjuck Dam were not significantly different in their distribution
from Normal (Shapiro-Wilk W=0.97, p=0.1184). This was also clearly evident in the probability plot, and the
histogram, though nine points were extreme and seemed to depart from Normal expectation. These should be
checked.
Provide a concise summary of the results, such as might appear in the results section of a manuscript or report.
Include in your summary, a description of the distribution of the transformed inflows, only those descriptive
statistics appropriate to the data, and a working definition of an extreme inflow.
Following a Log transformation, every indication is that the flows are normally distributed; in other words,
flows into Lake Burrinjuck are log-normally distributed. Log flow ranged from 2.89 to 4.60 with a mean of 3.93
± 0.04 (n=62). A definition of an extreme flow event is given by the mean plus two standard deviations, that
is 4.59 on the log scale, or 38,900 ML.
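The threshold is computed on the log scale and then back-transformed to megalitres. A minimal sketch of that calculation follows, assuming LOGFLOW = LOG10(INFLOW + 1) already exists in BURRIN; the dataset names LOGSTATS and CUTOFF and the variable names are illustrative.

```sas
/* Sketch: mean + 2 SD of the logged inflows, back-transformed to ML. */
/* LOGSTATS, CUTOFF and the variable names are hypothetical.          */
PROC MEANS DATA=BURRIN NOPRINT;
  VAR LOGFLOW;
  OUTPUT OUT=LOGSTATS MEAN=LOGMEAN STD=LOGSD;
RUN;

DATA CUTOFF;
  SET LOGSTATS;
  LOGCUT = LOGMEAN + 2*LOGSD;     /* threshold on the log10 scale   */
  MLCUT  = 10**LOGCUT - 1;        /* reverses Y' = LOG10(Y + 1)     */
RUN;

PROC PRINT DATA=CUTOFF;
  VAR LOGMEAN LOGSD LOGCUT MLCUT;
RUN;
```

Note that the standard deviation, not the standard error, is used for the threshold.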
Discussion
What advice would you give to anyone planning further statistical analyses on inflows?
I would advise anyone contemplating further analysis of inflows to Burrinjuck Dam to log transform the data in
preparation for the further analyses. The log transformation rendered the distribution of flows Normal. A
screening for aberrant values would be advised, as even after transformation, a few points seemed to depart
from the general relationship in the probability plot.
What recommendations would you like to make to Department of Water Resources?
The flows to Burrinjuck appear to be dominated by low flow conditions for most of the year, punctuated by a
number of high flows presumably coincident with rainfall events. The non-normal distribution of flows is a likely
driver of non-normality in a number of water quality variables that depend in part on flow conditions (i.e. those
involving leaching from the surrounding catchment). The department might wish to consider partitioning their
dataset into low flow conditions, and flow conditions influenced by surface runoff.
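One way such a partition might be explored is sketched below. It assumes, purely for illustration, that any day with recorded rainfall counts as runoff-influenced; this is a crude proxy and not a criterion given in the data description. The dataset name BURRIN_SPLIT and the variable REGIME are hypothetical.

```sas
/* Sketch: split records into two flow regimes and examine each separately. */
/* ASSUMPTION: RAINFALL > 0 marks runoff-influenced days (crude proxy).     */
DATA BURRIN_SPLIT;
  SET BURRIN;
  LENGTH REGIME $ 8;
  IF RAINFALL > 0 THEN REGIME = "RUNOFF";
  ELSE REGIME = "LOWFLOW";        /* includes records with missing rainfall */
RUN;

PROC SORT DATA=BURRIN_SPLIT;
  BY REGIME;
RUN;

PROC UNIVARIATE DATA=BURRIN_SPLIT NORMAL;
  BY REGIME;
  VAR INFLOW;
RUN;
```

A lagged or catchment-wide rainfall measure would likely be a better basis for the split, if the department holds such data.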
Program Listing
Append a full SAS program listing,
cleaned up and free from error or redundant code.
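DATA BURRIN;
  SET BURRIN;
  LOGFLOW=LOG10(INFLOW+1);      /* log transformation, Y' = LOG10(Y + 1)       */
  SQRTFLOW=SQRT(INFLOW+0.5);    /* square root transformation, Y' = SQRT(Y + ½) */
RUN;
/* Summary statistics, tests of normality and plots for both transformations */
PROC UNIVARIATE DATA=BURRIN PLOT NORMAL;
  VAR LOGFLOW SQRTFLOW;
RUN;
/* High-resolution histograms of the transformed inflows */
GOPTIONS RESET=ALL;
PROC GCHART DATA=BURRIN;
  VBAR LOGFLOW SQRTFLOW;
RUN;
QUIT;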