This file was created by scanning the printed publication.
Errors identified by the software have been corrected;
however, some errors may remain.
Multiple Regression Analysis for Evaluating
Non-Point Source Contributions to Water Quality
in the Green River, Wyoming¹
Timothy E. Fannin, Michael Parker, and Timothy J. Maret²
Abstract.--The Green River drains 12,000 mi² of western
Wyoming and northern Utah. The basin incorporates a diverse
spectrum of geology, topography, soils, and climate. Land
use is predominately range and forest, though an increasing
number of industries are locating in the southern half of
the drainage. We report on the application of a multiple
regression model used to associate various riparian and nonriparian basin attributes (geologic substrate, land use,
channel slope, etc.) with previous measurements of
phosphorus, nitrate, and dissolved solids in the Green River
system. We propose possible reasons for such significant
water quality/basin attribute associations, and explain some
of the advantages and disadvantages of using such a technique
to explore those associations in a large western watershed.
INTRODUCTION
The Green River basin of western Wyoming and
northern Utah is a climatologically, topographically, and geologically diverse watershed. Mean
temperatures range from -6°F (-21°C) to 86°F
(30°C); mean precipitation varies from 11" (28 cm)
to 41" (104 cm), with the latter figure typical for
the surrounding mountains. The major vegetative
cover in the drainage is range and forest (table 1).
Not surprisingly, the area is used by man predominantly for grazing and forestry. The basin is sparsely inhabited (the population of the study area is only about
52,300 people; U.S. Bureau of the Census 1981). Other
land uses are mining of trona (sodium carbonate) and
farming of two major areas of irrigated cropland.
The basin topographically is a mixture of extensive flats and rolling hills surrounded on three
sides by mountains (fig. 1) which have a maximum
elevation of 13,804 feet (4207 m). Mean elevation is
7416 feet (2260 m). Sixty percent of the drainage
is underlain by Tertiary formations and extensive
areas of Green River shale.

Table 1.--Land cover by percentage of total basin
area in the Green River and Blacks Fork sections (see figure 1) of the Green River Basin.

Land cover         Green River    Blacks Fork
type               section        section
Alpine                  2              0
Irrigated crops         6              7
Rock or dunes           1              3
Wetlands                1              1
Urban                  <1             <1
Range                  73             67
Forest                 16             20
Total                 100            100
Area (mi²)           9500           2920

Figure 1.--Outline map of the study area showing
water quality/discharge stations, major streams,
and the Green River and Blacks Fork sections
of the drainage. [Map not reproduced. Recoverable labels: Green River section, Wyoming, Utah, Uinta Mtns, station symbol.]

¹Paper presented at the first North American Riparian
Conference. [The University of Arizona,
Tucson, AZ, April 16-18, 1985].
²Timothy E. Fannin and Timothy J. Maret are
graduate students, and Michael Parker is Associate
Professor, in the Department of Zoology and Physiology, University of Wyoming, Laramie, WY.
This paper is based upon research conducted
under a grant from the Wyoming Water Research
Center, Laramie, WY.
Though poor water quality has not been a
problem in the upper reaches of the basin, the
lower reach of the Green River shows a large
increase in salinity load as dissolved solids
(DeLong 1977). Flaming Gorge Reservoir, immediately
downstream of our study area, shows sporadic, though
increasingly severe, summer eutrophication which
has adversely affected both fishing and body-contact recreation (U.S. Environmental Protection
Agency 1977, Southwestern Wyoming Water Quality
Planning Association 1978, Fannin 1983, Parker, et
al. 1984). The low human population density, few
industries or facilities requiring surface water
discharge permits (Wagner 1984), and relatively
high proportion of agricultural land use support
the observation that non-point sources are responsible for 88% of the phosphorus input to Flaming
Gorge Reservoir (Southwestern Wyoming Water Quality
Planning Association 1978).
No systematic basin-wide investigation of the
origin of dissolved and suspended substances in the
Green River has yet been done. Such a study would
be quite useful as a baseline study, both in the
accumulation and organization of existing data
about water quality and its sources, and in relating
present associations of water quality to basin
characteristics. Practical applications of such
knowledge would be apportioning loadings to a
specific source area of the drainage, predicting
changes in water quality from changes in basin
characteristics such as land use, or investigating
whether associations of water quality with basin
characteristics change with time.

Perhaps one of the reasons such a basin-wide
investigation has not been done is the sheer size
of the area. However, Lystrom, et al. (1978) proposed
and used a multiple regression modeling
approach to associate various basin parameters with
water quality in the 27,510 mi² Susquehanna River
watershed. We report here the results of an
investigation of the association of watershed
characteristics with water quality in the Green River basin
of Wyoming and Utah, using a similar multiple
regression technique.

The objectives of this project are to:
1) associate attributes of the Green River
watershed with water quality in the Green
River system using multiple regression. A
prerequisite to this objective is the collection
and organization of water quality data
and information about the basin which could
conceivably affect water quality.
2) estimate water quality changes in Flaming
Gorge Reservoir which may be associated with
upstream basin characteristics.
3) achieve these objectives by analyzing data
existing in published records, reports, papers,
and maps. No field work is required.

In this paper, we will:
1) demonstrate and document our application of
multiple regression to associate water quality
with attributes of the Green River basin.
2) propose possible causes of such significant
water quality/basin attribute associations.
3) discuss the general advantages and disadvantages
of using multiple regression techniques to model
water quality in the basin.

In conducting this research we assumed that
water quality is indeed a function of physical,
chemical and biological characteristics of the
drainage, that multiple regression is suited for
associating such characteristics with water quality,
and that non-point sources are paramount in
determining water quality in the watershed.

MATERIALS AND METHODS

Regression Models

Multiple linear regression describes variation
of a single dependent variable as a function of
variations in several independent variables. In
this case, a single water quality parameter is the
dependent variable, and its variation is accounted
for by the variation in two or more independent
variables of physical, chemical, or biological
basin characteristics. The general equation (from
Edwards 1979) is:

    Y' = a + b_1 X_1 + b_2 X_2 + ... + b_k X_k

where Y' is the dependent variable, the X_i are the
independent variables, the b_i are their regression
coefficients, k is the number of independent
variables in the equation, and a is the regression
constant. By choosing appropriate independent
variables (basin parameters), we seek to maximize the
correlation between the predicted value of our
water quality variable and the actual value of the
variable. The basis of our choice of independent
variables derives from an interpretation of results
from an SPSS (Nie, et al. 1975) multiple regression
program, as detailed in Regression, below.

Independent Variables

In this paper, we define an independent
variable as the unique numerical measure of some
feature of the drainage basin. The five major
types of independent variables (also referred to as
"basin attributes"), detailed in table 2, roughly
correspond to those of Lystrom, et al., but the
individual attributes within each of our categories
were dictated by the data available for the Green
River basin.

Much of the data from which we derived basin
attributes had to be transformed from maps, charts,
or lists. We used a COMPAQ microcomputer with a
Houston Instruments 11"x11" digitizer to measure
areas from maps, and the LOTUS 123 software (Lotus
Development Corporation 1983) to store and
manipulate collected information. Sources of information
and a description of their transformation into
independent variables follow.
Table 2.--Major categories of basin attributes
for the Green River drainage, the number of
variables originally within each category, and
some examples of independent variables from
each category.

Basin attribute        Number of
category               variables    Examples
GEOLOGY                    51       Glacial area
                                    Tipton shale area
                                    Area of Precambrian rock
SOILS                      19       Soil pH
                                    K factor
CLIMATE                     3       Mean minimum temp
LAND COVER/LAND USE        75       Area juniper
                                    % area of juniper
                                    Total range area
HYDROLOGY                  16       Bifurcation ratio
                                    Total stream length
                                    10 year flood cfs
Number of variables       164

Geology

We calculated areas of all geological formations
shown on three hydrologic investigations maps
(Welder and McGreevy 1966, Whitcomb and Lowry 1968,
and Welder 1968). The area of each formation in
each of 18 subbasins (see Dependent Variables) was
recorded, and areas of geologically similar
formations were summed as independent variables.

Soils

From Young and Singleton (1977) we found which
soil series were represented in soil associations
in the watershed and the area of each association
in each subbasin. From corresponding soil series
data sheets supplied by Munn (1984), we calculated
and weighted the characteristics of all soil series
within each association by area to obtain the
subbasin values.

Climate

Maps from Lowers (1960) were enlarged, and
minimum and maximum temperatures, weighted by area,
were calculated for each subbasin.

Land Cover/Land Use

Anderson et al. (1984) compiled a land cover
map of Wyoming from which we obtained values of cover,
weighted by area, for each subbasin.

Hydrology

Hydrological variables were estimated using
data taken from U.S. Geological Survey 1:250,000
scale topographic maps of the basin. Areas were
obtained with the digitizer, and linear measures
with a map measuring wheel. Transformations and
calculations were performed within Lotus spreadsheet files.

Reduction of Number of Independent Variables

We reduced the number of independent variables
from the original 164 by first eliminating variables
which were percentages or sums of other variables
(except for Geological variables, where we
kept the sums and eliminated their components). We
made a further reduction in the number of variables
by dropping variables which were not significantly
related to a water quality variable (p=0.05) in a
simple bivariate regression. Thus, for every
dependent water quality variable, we had a unique set
of independent basin attributes for the multiple
regression analysis.

Dependent Variables

The Wyoming Water Research Center maintains a
copy of the U.S. Geological Survey's surface water
quality and discharge data for Wyoming. From this
we extracted all water quality data for all sampling
stations in the watershed. We selected stations
with the greatest number of acceptable water quality
parameters. A water quality parameter was acceptable
if it had at least seven years of data between
water years 1965 (when Flaming Gorge Reservoir's dam
closed) and 1979, with at least one year of data
comprised of ten or more samples. Using these
criteria, we found only eight water quality variables
at each of eighteen stations. The areas above
these eighteen stations also defined the subbasins
for which we compiled basin attribute values.

The concentration of many water quality parameters
depends upon discharge (Lystrom, et al.).
For these parameters, mean loads should be calculated
as the sum of instantaneous loads derived
from the concentration/discharge relationship.
For the parameters considered in this report
[phosphorus (P), nitrate nitrogen (NO3), and total
dissolved solids (TDS)], only TDS concentration showed
such a significant relation. We therefore used TDS
loads, and phosphorus and nitrate concentrations as
our dependent variables in the multiple regression
analyses.

Regression
Each of the three water quality parameters had
a unique set of associated independent variables.
A Pearson correlation analysis (Nie, et al. 1975)
was used to investigate intercorrelations among the
independent variables prior to the regression analysis. For the regression method we chose Hull and
Nie's (1981) stepwise NEW REGRESSION, with probabilities of F-to-enter and F-to-remove at default
values of 0.05 and 0.10 respectively. All SPSS
analyses were conducted on a Control Data Corporation Cyber 760 computer.
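The intercorrelation check that precedes the regression can be sketched in a few lines. What follows is a minimal illustration using only the Python standard library, not the SPSS procedure the authors ran. It computes pairwise Pearson correlations among attributes, flags pairs whose r² exceeds the 0.60 threshold used in this paper, and reports the share of flagged pairs, which corresponds to the intercorrelation ratio defined in the Results if "interactions" is taken to mean distinct attribute pairs (an assumption). The function names, attribute names, and values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelation_ratio(attributes, threshold=0.60):
    """Share of attribute pairs whose squared correlation exceeds threshold.

    attributes: dict mapping attribute name -> list of subbasin values.
    Returns (ratio, flagged), where flagged lists the intercorrelated pairs.
    Assumption: every distinct pair counts once in the denominator.
    """
    pairs = list(combinations(attributes, 2))
    flagged = [
        (a, b)
        for a, b in pairs
        if pearson_r(attributes[a], attributes[b]) ** 2 > threshold
    ]
    return len(flagged) / len(pairs), flagged


# Hypothetical values for six subbasins, invented for illustration only.
attributes = {
    "attribute_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "attribute_b": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "attribute_c": [9.0, 1.0, 7.0, 3.0, 8.0, 2.0],
}
ratio, flagged = intercorrelation_ratio(attributes)
print(f"intercorrelation ratio = {ratio:.3f}")
for a, b in flagged:
    print("intercorrelated pair:", a, "/", b)
```

In this invented data set only the first two attributes are strongly related, so one of the three pairs is flagged.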
Our interpretation of regression results to
find the "best" association of water quality with
basin attributes hinged on two objective criteria
and one somewhat philosophical principle. Our first
criterion was that a good regression equation
explains more of the variance about the dependent variable (i.e., has a higher adjusted R²), and
has a lower measure of error (in this case, a lower
residual mean square) than would an equation with a
poorer fit. Our second criterion was that the equation minimize combinations of strongly interacting
independent variables, as defined by a correlation
of r² > 0.60.
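The two fit statistics named in the first criterion can be made concrete with a small sketch. The code below is an illustration only, written with the Python standard library so that it runs without extra packages. It is not the SPSS NEW REGRESSION procedure and does no stepwise selection. It fits y = a + b1*x1 + ... + bk*xk by ordinary least squares through the normal equations, then computes the residual mean square as SS_res/(n - k - 1) and the adjusted R² as 1 - [SS_res/(n - k - 1)]/[SS_tot/(n - 1)]. The function names and the sample numbers are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(X, y):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk.

    X: list of rows, one per subbasin, each holding k attribute values.
    y: list of water quality values, one per subbasin.
    Returns (coefficients [a, b1, ..., bk], adjusted R^2, residual mean square).
    """
    n, k = len(X), len(X[0])
    D = [[1.0] + list(row) for row in X]  # design matrix with intercept column
    XtX = [[sum(D[r][i] * D[r][j] for r in range(n)) for j in range(k + 1)]
           for i in range(k + 1)]
    Xty = [sum(D[r][i] * y[r] for r in range(n)) for i in range(k + 1)]
    coef = solve(XtX, Xty)
    pred = [sum(c * v for c, v in zip(coef, row)) for row in D]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ybar = sum(y) / n
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    rms = ss_res / (n - k - 1)               # residual mean square
    adj_r2 = 1.0 - rms / (ss_tot / (n - 1))  # adjusted R^2
    return coef, adj_r2, rms


# Hypothetical data: six subbasins, two attributes (values invented).
X_demo = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
y_demo = [9.2, 7.9, 19.1, 17.8, 29.3, 27.9]
coef, adj_r2, rms = fit(X_demo, y_demo)
print("coefficients:", [round(c, 3) for c in coef])
print("adjusted R^2:", round(adj_r2, 4), " residual mean square:", round(rms, 4))
```

Comparing candidate attribute sets then amounts to calling `fit` on each set and preferring the one with the higher adjusted R² and lower residual mean square, subject to the intercorrelation and conceptual checks described in the text.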
Given these criteria, we tempered their strict
application by the philosophy that "a relationship
may be statistically significant without being substantively important" (Milliken and Johnson 1984).
Lystrom, et al. also chose their best models based
on other-than-statistical criteria; that is, "conceptual knowledge of the water-quality process."
In other words, if a regression was best statistically, but we could find no conceptual reason for
the association of its basin attributes with water
quality, we chose a statistically weaker but
conceptually more sensible model.
RESULTS
From 18 subbasin values for each of a selected
set of basin attributes, and eighteen values of
three water quality parameters taken one at a time,
we obtained three regression models with significant
and conceptually acceptable relations between the
attributes and the parameter (table 3).

Table 3.--Multiple-regression models of basin
attributes associated with water quality in
the Green River basin.

REGRESSION EQUATION
    ADJUSTED R² / RESIDUAL MEAN SQUARE / # ATTRIBUTES CONSIDERED

PHOSPHORUS CONCENTRATION (mg/l) =
    -0.144 + 0.563(K FACTOR) + 0.0393(FLOOD RATIO)
    0.978 / 0.000 / 24

NITRATE CONCENTRATION (mg/l) =
    -2.30 + 2.71(FLOOD RATIO) + 0.0043(CRETACEOUS ROCK [mi²])
    0.893 / 0.442 / 5

TDS LOAD (tons/year) =
    9730 + 36.5(TOTAL LENGTH OF CHANNELS [mi])
    + 493(IRRIGATED CROPLAND [mi²])
    + 135(MIXED RANGELAND [mi²])
    0.993 / 2.21 x 10⁸ / 27

In table 3, FLOOD RATIO is the quotient of the
average 10-year flood divided by the maximum
discharge recorded for the study period (water years
1965 to 1979). The 5 attributes considered for the
nitrate concentration model are a nonintercorrelated
subset of an original set of 16 attributes containing
some highly intercorrelated members (r² > 0.60).

The results of the regression analyses illustrate
the application of our criteria for acceptance
of a regression model. PHOSPHORUS CONCENTRATION
had a relatively low intercorrelation ratio
(0.323), and the variables first selected by the
regression analysis made sense conceptually. (The
intercorrelation ratio is the number of significant
attribute correlations [r² > 0.60] divided by the
number of interactions in the correlation matrix.)
TDS LOAD, on the other hand, had a high
intercorrelation ratio (0.567), but since the variables
first selected by the analysis, which implies that
they were statistically best, also were conceptually
related to TDS, we accepted this model as
best. TOTAL LENGTH OF CHANNELS is, however,
correlated with IRRIGATED CROPLAND (R² = 0.74), so some
caution should be used when applying this model.

NITRATE CONCENTRATION illustrates conceptual
acceptability over statistical significance. The
intercorrelation ratio was comparable to that of
PHOSPHORUS CONCENTRATION, but the initial, or best
statistical, regression analysis yielded MEAN JULY
MAXIMUM TEMPERATURE as the only significant
associated attribute. Because we could think of no
process associating temperature with NITRATE
CONCENTRATION in the basin, we sequentially removed
intercorrelated variables and continued regression
analyses after each deletion. The best regression
we found, then, was the one in a set of conceptually
acceptable models which had the highest adjusted R²
and lowest residual mean square.

DISCUSSION

Green River Regression Models

The PHOSPHORUS CONCENTRATION model is conceptually
acceptable because phosphorus (as total phosphorus,
measured by the U.S. Geological Survey) is
associated with particulate matter in streams.
Since K FACTOR is a measure of soil erodibility,
and FLOOD RATIO an estimate of flooding intensity,
we may expect that an increase of either of them
could be associated with an increase in particulates
and therefore total phosphorus in streams.

For NITRATE CONCENTRATION, positive association
of a geologic variable (CRETACEOUS ROCK) and
an estimate of flood intensity (FLOOD RATIO) with
dissolved nitrate in a river would be expected if
the Cretaceous rock bears minerals high in nitrate
or perhaps other nitrogenous compounds. The
predictors and their relationship to nitrate
concentration are therefore conceptually acceptable, although
we have not yet investigated whether the mineral
components of the Cretaceous rock formations in the
subbasins are in fact nitrogenous.

The TDS LOAD model was the only model
incorporating land use parameters as predictors of water
quality. A positive association of TDS with
irrigated cropland is not unexpected, since TDS
increases from 223 mg/l (3359 tons/year) above an
irrigated area on the Big Sandy River to 2630 mg/l
(147,000 tons/year) below it. A disturbed MIXED
RANGELAND also could increase TDS LOAD if
infiltration increased as a result of reduced plant cover.
Increases in TOTAL LENGTH OF CHANNELS may imply an
increased chance of infiltrating precipitation
being captured by a stream and measured in a sample
rather than being "lost" to deeper groundwater.
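To show how the three equations in table 3 are read, the sketch below evaluates each of them for a single set of inputs. The coefficients are copied from table 3; the function names and every input value are hypothetical, chosen only to make the arithmetic visible. Because the authors state that the models have not been tested or verified, this is an illustration of the equations' form and should not be taken as a way to predict water quality.

```python
def flood_ratio(ten_year_flood_cfs, max_recorded_discharge_cfs):
    """FLOOD RATIO as defined for table 3: 10-year flood / maximum recorded discharge."""
    return ten_year_flood_cfs / max_recorded_discharge_cfs


def phosphorus_mg_per_l(k_factor, flood_ratio_value):
    """PHOSPHORUS CONCENTRATION model from table 3 (mg/l)."""
    return -0.144 + 0.563 * k_factor + 0.0393 * flood_ratio_value


def nitrate_mg_per_l(flood_ratio_value, cretaceous_rock_mi2):
    """NITRATE CONCENTRATION model from table 3 (mg/l)."""
    return -2.30 + 2.71 * flood_ratio_value + 0.0043 * cretaceous_rock_mi2


def tds_load_tons_per_year(channel_length_mi, irrigated_cropland_mi2, mixed_rangeland_mi2):
    """TDS LOAD model from table 3 (tons/year)."""
    return (9730
            + 36.5 * channel_length_mi
            + 493 * irrigated_cropland_mi2
            + 135 * mixed_rangeland_mi2)


# Hypothetical subbasin, all values invented for illustration.
fr = flood_ratio(1200.0, 1000.0)
print("FLOOD RATIO:", fr)
print("phosphorus (mg/l):", round(phosphorus_mg_per_l(0.30, fr), 5))
print("nitrate (mg/l):", round(nitrate_mg_per_l(fr, 100.0), 3))
print("TDS load (tons/year):", round(tds_load_tons_per_year(500.0, 20.0, 300.0), 1))
```

Each coefficient gives the change in the water quality variable associated with a one-unit change in its attribute while the other attributes are held fixed; for example, in the TDS LOAD equation each additional square mile of irrigated cropland is associated with 493 more tons per year.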
Application of the Techniques
There are several advantages in applying multiple
regression techniques to water quality data
from the Green River basin. First, since existing
data were on hand or readily available from
published sources, no field work was required,
reducing costs. This is not a trivial advantage
considering the large basin area.

Secondly, from the results of the multiple
regression analyses, we have a smaller set of basin
parameters to investigate if we wish to determine
cause-effect relations between attributes of the
drainage and water quality. Multiple regression is
an associative technique; simply because an
attribute is associated with a water quality parameter
does not mean that a change in the attribute will
necessarily cause a change in the parameter. This
nomination of important parameters is also a
nontrivial advantage given that we found, from
existing data, 164 basin attributes.

Thirdly, if we assume a cause-effect relationship
between basin attributes and water quality,
and if the models have been tested and verified, we
may use them to predict changes in water quality
from changes in the basin attributes. This is not
useful for relatively non-static basin attributes.
The area of the drainage underlain by Cretaceous
rock is not as likely to change as the area under
different land cover. We must note that the three
models we propose have not yet been tested or
verified; they should not be used as predictive
equations.

Finally, the water quality database for the
portion of the Green River basin we studied was,
compared with that of Lystrom, et al., very sparse. In order
to get 18 stations, two fewer than the minimum they
recommend, we had to liberalize our criteria of
choice twice. We increased the temporal width of
the study from 10 years to 15, and reduced the
proportion of years for which we required at least
one data point from 10 to 7. Lystrom, et al. also
extrapolated the data from a single year with at
least 10 seasonally spaced samples to their entire
10-year study. In the arid and semi-arid West,
where year-to-year variation in precipitation can
be significant, we did not feel that extrapolating
a single year of data to the entire study period
was wise. We feel that the scarcity of water
quality data found in this study would be typical of
other western drainages with low human population
densities and little cropland agriculture.

LITERATURE CITED

Anderson, S.H., and D.B. Inkely. 1984. Wyoming land
cover map. Wyoming Cooperative Fish and Wildlife
Unit, University of Wyoming, Laramie, WY.
DeLong, Lewis L. 1977. An analysis of salinity in
streams of the Green River basin, Wyoming.
U.S. Geological Survey Water-Resources
Investigations #77-103. 32pp. USGS Water Resources
Division, Cheyenne, WY.
Edwards, Allen L. 1979. Multiple regression and
the analysis of variance and covariance. 212pp.
W.H. Freeman and Co., San Francisco, CA.
Fannin, Timothy E. 1983. Wyoming lake classification
and survey, volume 1. 226pp. Wyoming Game
and Fish Department, Cheyenne, WY.
Hull, C. Hadlai, and Norman H. Nie. 1981. SPSS update
7-9: new procedures and facilities for
releases 7-9. 402pp. McGraw-Hill Book Company,
New York, NY.
Lowers, A.R. 1960. Climate of Wyoming. Climatography
of the United States #60-48. U.S.
Department of Commerce, Washington, DC.
Lotus Development Corporation. 1983. Lotus 123.
Lotus Development Corporation, Cambridge, MA.
Lystrom, David J., Frank A. Rinella, David A. Rickert,
and Lisa Zimmerman. 1978. Multiple regression
modeling approach for regional water quality
management. EPA-600/7-78-198. 60pp. Environmental
Research Laboratory, Athens, GA.
Milliken, George A. and Dallas E. Johnson. 1984.
The analysis of messy data, volume 1: designed
experiments. 473pp. Lifetime Learning
Publications, Belmont, CA.
Munn, Larry. 1984. Personal correspondence.
Department of Plant Science, University of
Wyoming, Laramie, WY.
Nie, Norman H., C. Hadlai Hull, Jean G. Jenkins,
Karin Steinbrenner, and Dale H. Bent. 1975.
SPSS: statistical package for the social
sciences. 675pp. McGraw-Hill Book Company, New
York, NY.
Parker, Michael, Wayne A. Hubert, and Steve Greb.
1984. A preliminary assessment of eutrophication
in Flaming Gorge Reservoir, Denise
Bierley, ed. 46pp.+app. Wyoming Water
Development Commission, Cheyenne, WY, and Wyoming
Water Research Center, Laramie, WY.
Southwestern Wyoming Water Quality Planning
Association. 1978. Clean water report for
southwestern Wyoming. 313pp. CH2M Hill, Denver, CO.
U.S. Environmental Protection Agency. 1977. Report
on Flaming Gorge Reservoir, Sweetwater County,
Wyoming and Daggett County, Utah, EPA Region
VIII. National Eutrophication Survey Working
Paper #885.
U.S. Bureau of the Census. 1981. 1980 census of the
population; vol. 1: characteristics of the
population; chapter A: number of inhabitants,
part 52 Wyoming. PC80-11-A52. U.S. Bureau of
the Census, Washington, DC.
Wagner, John. 1984. Personal correspondence.
Wyoming Department of Environmental Quality,
Cheyenne, WY.
Welder, George E. 1968. Ground-water reconnaissance
of the Green River basin--southwestern Wyoming.
Hydrologic Investigations Atlas HA-290. U.S.
Geological Survey, Washington, DC.
-----, and Laurence J. McGreevy. 1966. Ground-water
reconnaissance of the Great Divide and
Washakie basins and some adjacent areas,
southwestern Wyoming. Hydrologic Investigations
Atlas HA-219, reprinted 1981. U.S.
Geological Survey, Washington, DC.
Whitcomb, Harold A. and Martin E. Lowry. 1968.
Ground-water resources and geology of the Wind
River basin area, central Wyoming. Hydrologic
Investigations Atlas HA-270. U.S. Geological
Survey, Washington, DC.