Multiple Regression Analysis for Evaluating Non-Point Source Contributions to Water Quality in the Green River, Wyoming¹

Timothy E. Fannin, Michael Parker, and Timothy J. Maret²

¹Paper presented at the first North American Riparian Conference. [The University of Arizona, Tucson, AZ, April 16-18, 1985].

²Timothy E. Fannin and Timothy J. Maret are graduate students, and Michael Parker is Associate Professor, in the Department of Zoology and Physiology, University of Wyoming, Laramie, WY. This paper is based upon research conducted under a grant from the Wyoming Water Research Center, Laramie, WY.

Abstract.--The Green River drains 12,000 mi² of western Wyoming and northern Utah. The basin incorporates a diverse spectrum of geology, topography, soils, and climate. Land use is predominantly range and forest, though an increasing number of industries are locating in the southern half of the drainage. We report on the application of a multiple regression model used to associate various riparian and nonriparian basin attributes (geologic substrate, land use, channel slope, etc.) with previous measurements of phosphorus, nitrate, and dissolved solids in the Green River system. We propose possible reasons for the significant water quality/basin attribute associations, and explain some of the advantages and disadvantages of using such a technique to explore those associations in a large western watershed.

INTRODUCTION

The Green River basin of western Wyoming and northern Utah is a climatologically, topographically, and geologically diverse watershed. Mean temperatures range from -6°F (-21°C) to 86°F (30°C); mean precipitation varies from 11 in. (28 cm) to 41 in. (104 cm), with the latter figure typical for the surrounding mountains. The major vegetative cover in the drainage is range and forest (table 1). Not surprisingly, the area is used by man predominantly for grazing and forestry. The basin is sparsely inhabited (the population of the study area is only about 52,300 people; U.S. Bureau of the Census 1981). Other land uses are mining of trona (sodium carbonate) and farming of two major areas of irrigated cropland.

Table 1.--Land cover by percentage of total basin area in the Green River and Blacks Fork sections (see figure 1) of the Green River Basin.

Land cover type      Green River section    Blacks Fork section
Alpine                        2                      0
Irrigated crops               6                      7
Rock or dunes                 1                      3
Wetlands                      1                      1
Urban                        <1                     <1
Range                        73                     67
Forest                       16                     20
Total                       100                    100
Area (mi²)                 9500                   2920

Figure 1.--Outline map of the study area showing water quality/discharge stations, major streams and the Green River and Blacks Fork Sections of the drainage.

Topographically, the basin is a mixture of extensive flats and rolling hills surrounded on three sides by mountains (fig. 1) which have a maximum elevation of 13,804 feet (4207 m). Mean elevation is 7416 feet (2260 m). Sixty percent of the drainage is underlain by Tertiary formations, with extensive areas of Green River shale.

Though poor water quality has not been a problem in the upper reaches of the basin, the lower reach of the Green River shows a large increase in salinity load as dissolved solids (DeLong 1977). Flaming Gorge Reservoir, immediately downstream of our study area, shows sporadic, though increasingly severe, summer eutrophication which has adversely affected both fishing and body-contact recreation (U.S. Environmental Protection Agency 1977, Southwestern Wyoming Water Quality Planning Association 1978, Fannin 1983, Parker, et al. 1984).
The low human population density, the few industries or facilities requiring surface water discharge permits (Wagner 1984), and the relatively high proportion of agricultural land use support the observation that non-point sources are responsible for 88% of the phosphorus input to Flaming Gorge Reservoir (Southwestern Wyoming Water Quality Planning Association 1978).

No systematic basin-wide investigation of the origin of dissolved and suspended substances in the Green River has yet been done. Such a study would be quite useful as a baseline, both in the accumulation and organization of existing data about water quality and its sources, and in relating present associations of water quality to basin characteristics. Practical applications of such knowledge would be apportioning loadings to a specific source area of the drainage, predicting changes in water quality from changes in basin characteristics such as land use, or investigating whether associations of water quality with basin characteristics change with time.

Perhaps one of the reasons such a basin-wide investigation has not been done is the sheer size of the area. However, Lystrom, et al. (1978) proposed and used a multiple regression modeling approach to associate various basin parameters with water quality in the 27,510 mi² Susquehanna River watershed. We report here the results of an investigation of the association of watershed characteristics with water quality in the Green River basin of Wyoming and Utah, using a similar multiple regression technique. The objectives of this project are to:

1) associate attributes of the Green River watershed with water quality in the Green River system using multiple regression. A prerequisite to this objective is the collection and organization of water quality data and information about the basin which could conceivably affect water quality.

2) estimate water quality changes in Flaming Gorge Reservoir which may be associated with upstream basin characteristics.

3) achieve these objectives by analyzing data existing in published records, reports, papers, and maps. No field work is required.

In this paper, we will:

1) demonstrate and document our application of multiple regression to associate water quality with attributes of the Green River basin.

2) propose possible causes of the significant water quality/basin attribute associations.

3) discuss the general advantages and disadvantages of using multiple regression techniques to model water quality in the basin.

In conducting this research we assumed that water quality is indeed a function of physical, chemical and biological characteristics of the drainage, that multiple regression is suited for associating such characteristics with water quality, and that non-point sources are paramount in determining water quality in the watershed.

MATERIALS AND METHODS

Regression Models

Multiple linear regression describes variation of a single dependent variable as a function of variations in several independent variables. In this case, a single water quality parameter is the dependent variable, and its variation is accounted for by the variation in two or more independent variables of physical, chemical, or biological basin characteristics. The general equation (from Edwards 1979) is:

$$Y' = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

where $Y'$ is the predicted value of the dependent variable, the $X_i$ are the independent variables, the $b_i$ are their regression coefficients, $k$ is the number of independent variables in the equation, and $a$ is the regression constant. By choosing appropriate independent variables (basin parameters), we seek to maximize the correlation between the predicted value of our water quality variable and the actual value of the variable. The basis of our choice of independent variables derives from an interpretation of results from an SPSS (Nie, et al. 1975) multiple regression program, as detailed in Regression, below.

Independent Variables

In this paper, we define an independent variable as the unique numerical measure of some feature of the drainage basin. The five major types of independent variables (also referred to as "basin attributes"), detailed in table 2, roughly correspond to those of Lystrom, et al. (1978), but the individual attributes within each of our categories were dictated by the data available for the Green River basin.

Table 2.--Major categories of basin attributes for the Green River drainage, the number of variables originally within each category, and some examples of independent variables from each category.

Basin attribute category    Number of variables    Examples
GEOLOGY                             51             Glacial area; Tipton shale area; Area of Precambrian rock
SOILS                               19             Soil pH; K factor
CLIMATE                              3             Mean minimum temp
LAND COVER/LAND USE                 75             Area juniper; % area of juniper; Total range area
HYDROLOGY                           16             Bifurcation ratio; Total stream length; 10 year flood cfs
Total number of variables          164

Much of the data from which we derived basin attributes had to be transformed from maps, charts, or lists. We used a COMPAQ microcomputer with a Houston Instruments 11" x 11" digitizer to measure areas from maps, and the LOTUS 123 software (Lotus Development Corporation 1983) to store and manipulate collected information. Sources of information and a description of their transformation into independent variables follow.

Geology

We calculated areas of all geological formations shown on three hydrologic investigations maps (Welder and McGreevy 1966, Whitcomb and Lowry 1968, and Welder 1968). The area of each formation in each of 18 subbasins (see Dependent Variables) was recorded, and areas of geologically similar formations were summed as independent variables.

Soils

From Young and Singleton (1977) we found which soil series were represented in soil associations in the watershed and the area of each association in each subbasin. From corresponding soil series data sheets supplied by Munn (1984), we calculated the characteristics of all soil series within each association and weighted them by area to obtain the subbasin values.

Climate

Maps from Lowers (1960) were enlarged, and minimum and maximum temperatures, weighted by area, were calculated for each subbasin.

Land Cover/Land Use

Anderson et al. (1984) compiled a land cover map of Wyoming from which we obtained values of cover, weighted by area, for each subbasin.

Hydrology

Hydrological variables were estimated using data taken from U.S. Geological Survey 1:250,000 scale topographic maps of the basin. Areas were obtained with the digitizer, and linear measures with a map measuring wheel. Transformations and calculations were performed within Lotus spreadsheet files.

Reduction of Number of Independent Variables

We reduced the number of independent variables from the original 164 by first eliminating variables which were percentages or sums of other variables (except for geological variables, where we kept the sums and eliminated their components). We made a further reduction in the number of variables by dropping variables which were not significantly related to a water quality variable (p=0.05) in a simple bivariate regression. Thus, for every dependent water quality variable, we had a unique set of independent basin attributes for the multiple regression analysis.

Dependent Variables

The Wyoming Water Research Center maintains a copy of the U.S. Geological Survey's surface water quality and discharge data for Wyoming. From this we extracted all water quality data for all sampling stations in the watershed. We selected stations with the greatest number of acceptable water quality parameters. A water quality parameter was acceptable if it had at least seven years of data between water years 1965 (when Flaming Gorge Reservoir's dam closed) and 1979, with at least one year of data comprising ten or more samples. Using these criteria, we found only eight water quality variables at each of eighteen stations. The areas above these eighteen stations also defined the subbasins for which we compiled basin attribute values.

The concentration of many water quality parameters depends upon discharge (Lystrom, et al. 1978). For these parameters, mean loads should be calculated as the sum of instantaneous loads derived from the concentration/discharge relationship. For the parameters considered in this report [phosphorus (P), nitrate nitrogen (NO3), and total dissolved solids (TDS)], only TDS concentration showed such a significant relation. We therefore used TDS loads, and phosphorus and nitrate concentrations, as our dependent variables in the multiple regression analyses.

Regression

Each of the three water quality parameters had a unique set of associated independent variables. A Pearson correlation analysis (Nie, et al. 1975) was used to investigate intercorrelations among the independent variables prior to the regression analysis. For the regression method we chose Hull and Nie's (1981) stepwise NEW REGRESSION, with probabilities of F-to-enter and F-to-remove at default values of 0.05 and 0.10 respectively. All SPSS analyses were conducted on a Control Data Corporation Cyber 760 computer.

Our interpretation of regression results to find the "best" association of water quality with basin attributes hinged on two objective criteria and one somewhat philosophical principle. Our first criterion was that a good regression equation explains more of the variance about the dependent variable (i.e., has a higher adjusted R²), and has a lower measure of error (in this case, a lower residual mean square) than would an equation with a poorer fit.
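
The two fit statistics named in the first criterion can be computed directly from an ordinary least-squares fit. The sketch below is only an illustration in Python, not the SPSS NEW REGRESSION procedure used in the study. The function names (`solve_linear_system`, `fit_ols`, `fit_statistics`) and the six-subbasin data set are hypothetical, invented for the example, and the normal equations are solved with plain Gaussian elimination so that no external library is assumed.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b]; copying keeps the caller's lists unchanged.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(X, y):
    """Return [a, b1, ..., bk] for Y' = a + b1*X1 + ... + bk*Xk."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 carries the constant a
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def fit_statistics(X, y, coef):
    """Return (adjusted R^2, residual mean square) for a fitted equation."""
    n, k = len(y), len(coef) - 1
    predicted = [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_residual / ss_total
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    residual_mean_square = ss_residual / (n - k - 1)
    return adjusted_r2, residual_mean_square


# Hypothetical values for six subbasins (NOT data from the study).
k_factor = [0.20, 0.28, 0.32, 0.37, 0.43, 0.49]
flood_ratio = [0.6, 1.1, 0.8, 1.5, 1.2, 1.9]
phosphorus = [0.01, 0.06, 0.07, 0.13, 0.15, 0.21]  # mg/l

X = list(zip(k_factor, flood_ratio))
coef = fit_ols(X, phosphorus)
adj_r2, rms = fit_statistics(X, phosphorus, coef)
print("coefficients [a, b1, b2]:", coef)
print("adjusted R^2:", adj_r2)
print("residual mean square:", rms)
```

When two candidate equations for the same water quality parameter are compared this way, the first criterion favors the one with the higher adjusted R² and the lower residual mean square.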

Our second criterion was that the equation minimize combinations of strongly intercorrelated independent variables, as defined by a correlation of r² > 0.60.

Given these criteria, we tempered their strict application by the philosophy that "a relationship may be statistically significant without being substantively important" (Milliken and Johnson 1984). Lystrom, et al. (1978) also chose their best models based on other-than-statistical criteria; that is, "conceptual knowledge of the water-quality process." In other words, if a regression was best statistically, but we could find no conceptual reason for the association of its basin attributes with water quality, we chose a statistically weaker but conceptually more sensible model.

RESULTS

From 18 subbasin values for each of a selected set of basin attributes, and eighteen values of three water quality parameters taken one at a time, we obtained three regression models with significant and conceptually acceptable relations between the attributes and the parameter (table 3).

Table 3.--Multiple-regression models of basin attributes associated with water quality in the Green River basin.

PHOSPHORUS CONCENTRATION (mg/l) = -0.144 + 0.563(K FACTOR) + 0.0393(FLOOD RATIO)
    Adjusted R²: 0.978    Residual mean square: 0.000    # attributes considered: 24

NITRATE CONCENTRATION (mg/l) = -2.30 + 2.71(FLOOD RATIO) + 0.0043(CRETACEOUS ROCK [mi²])
    Adjusted R²: 0.893    Residual mean square: 0.442    # attributes considered: 5

TDS LOAD (tons/year) = 9730 + 36.5(TOTAL LENGTH OF CHANNELS [mi]) + 493(IRRIGATED CROPLAND [mi²]) + 135(MIXED RANGELAND [mi²])
    Adjusted R²: 0.993    Residual mean square: 2.21x10^8    # attributes considered: 27

In table 3, FLOOD RATIO is the quotient of the average 10-year flood divided by the maximum discharge recorded for the study period (water years 1965 to 1979). The 5 attributes considered for the nitrate concentration model are a nonintercorrelated subset of an original set of 16 attributes containing some highly intercorrelated members (r² > 0.60).

The results of the regression analyses illustrate the application of our criteria for acceptance of a regression model. PHOSPHORUS CONCENTRATION had a relatively low intercorrelation ratio (0.323), and the variables first selected by the regression analysis made sense conceptually. (The intercorrelation ratio is the number of significant attribute correlations [r² > 0.60] divided by the number of interactions in the correlation matrix.) TDS LOAD, on the other hand, had a high intercorrelation ratio (0.567), but since the variables first selected by the analysis, which implies that they were statistically best, also were conceptually related to TDS, we accepted this model as best. TOTAL LENGTH OF CHANNELS is, however, correlated with IRRIGATED CROPLAND (R² = 0.74), so some caution should be used when applying this model.

NITRATE CONCENTRATION illustrates conceptual acceptability over statistical significance. The intercorrelation ratio was comparable to that of PHOSPHORUS CONCENTRATION, but the initial, or best statistical, regression analysis yielded MEAN JULY MAXIMUM TEMPERATURE as the only significant associated attribute. Because we could think of no process associating temperature with NITRATE CONCENTRATION in the basin, we sequentially removed intercorrelated variables and continued regression analyses after each deletion. The best regression we found, then, was the one in a set of conceptually acceptable models which had the highest adjusted R² and lowest residual mean square.

DISCUSSION

Green River Regression Models

The PHOSPHORUS CONCENTRATION model is conceptually acceptable because phosphorus (as total phosphorus, measured by the U.S. Geological Survey) is associated with particulate matter in streams. Since K FACTOR is a measure of soil erodibility, and FLOOD RATIO an estimate of flooding intensity, we may expect that an increase in either of them could be associated with an increase in particulates and therefore total phosphorus in streams.

For NITRATE CONCENTRATION, positive association of a geologic variable (CRETACEOUS ROCK) and an estimate of flood intensity (FLOOD RATIO) with dissolved nitrate in a river would be expected if the Cretaceous rock bears minerals high in nitrate or perhaps other nitrogenous compounds. The predictors and their relationship to nitrate concentration are therefore conceptually acceptable, although we have not yet investigated whether the mineral components of the Cretaceous rock formations in the subbasins are in fact nitrogenous.

The TDS LOAD model was the only model incorporating land use parameters as predictors of water quality. A positive association of TDS with irrigated cropland is not unexpected, since TDS increases from 223 mg/l (3359 tons/year) above an irrigated area on the Big Sandy River to 2630 mg/l (147,000 tons/year) below it. Disturbed MIXED RANGELAND also could increase TDS LOAD if infiltration increased as a result of reduced plant cover. Increases in TOTAL LENGTH OF CHANNELS may imply an increased chance of infiltrating precipitation being captured by a stream and measured in a sample rather than being "lost" to deeper groundwater.

Application of the Techniques

There are several advantages in applying multiple regression techniques to water quality data from the Green River basin. First, since existing data were on hand or readily available from published sources, no field work was required, reducing costs. This is not a trivial advantage considering the large basin area.

Secondly, from the results of the multiple regression analyses, we have a smaller set of basin parameters to investigate if we wish to determine cause-effect relations between attributes of the drainage and water quality. Multiple regression is an associative technique; simply because an attribute is associated with a water quality parameter does not mean that a change in the attribute will necessarily cause a change in the parameter. This nomination of important parameters is also a nontrivial advantage given that we found, from existing data, 164 basin attributes.

Thirdly, if we assume a cause-effect relationship between basin attributes and water quality, and if the models have been tested and verified, we may use them to predict changes in water quality from changes in the basin attributes. This is most useful for relatively non-static basin attributes: the area of the drainage underlain by Cretaceous rock is not as likely to change as the area under different land cover. We must note that the three models we propose have not yet been tested or verified; they should not be used as predictive equations.

Finally, the water quality database for the portion of the Green River basin we studied was, compared with that of Lystrom, et al. (1978), very sparse. In order to get 18 stations, two fewer than the minimum they recommend, we had to liberalize our criteria of choice twice. We increased the temporal width of the study from 10 years to 15, and reduced the number of years for which we required at least one data point from 10 to 7. Lystrom, et al. also extrapolated the data from a single year with at least 10 seasonally spaced samples to their entire 10-year study. In the arid and semi-arid West, where year-to-year variation in precipitation can be significant, we did not feel that extrapolating a single year of data to the entire study period was wise. We feel that the scarcity of water quality data found in this study would be typical of other western drainages with low human population densities and little cropland agriculture.

LITERATURE CITED

Anderson, S.H., and D.B. Inkely. 1984. Wyoming land cover map. Wyoming Cooperative Fish and Wildlife Unit, University of Wyoming, Laramie, WY.

DeLong, Lewis L. 1977. An analysis of salinity in streams of the Green River basin, Wyoming. U.S. Geological Survey Water-Resources Investigations 77-103. 32pp. USGS Water Resources Division, Cheyenne, WY.

Edwards, Allen L. 1979. Multiple regression and the analysis of variance and covariance. 212pp. W.H. Freeman and Co., San Francisco, CA.

Fannin, Timothy E. 1983. Wyoming lake classification and survey, volume 1. 226pp. Wyoming Game and Fish Department, Cheyenne, WY.

Hull, C. Hadlai, and Norman H. Nie. 1981. SPSS update 7-9: new procedures and facilities for releases 7-9. 402pp. McGraw-Hill Book Company, New York, NY.

Lowers, A.R. 1960. Climate of Wyoming. Climatography of the United States #60-48. U.S. Department of Commerce, Washington, DC.

Lotus Development Corporation. 1983. Lotus 123. Lotus Development Corporation, Cambridge, MA.

Lystrom, David J., Frank A. Rinella, David A. Rickert, and Lisa Zimmerman. 1978. Multiple regression modeling approach for regional water quality management. EPA-600/7-78-198. 60pp. Environmental Research Laboratory, Athens, GA.

Milliken, George A., and Dallas E. Johnson. 1984. The analysis of messy data, volume 1: designed experiments. 473pp. Lifetime Learning Publications, Belmont, CA.

Munn, Larry. 1984. Personal correspondence. Department of Plant Science, University of Wyoming, Laramie, WY.

Nie, Norman H., C. Hadlai Hull, Jean G. Jenkins, Karin Steinbrenner, and Dale H. Bent. 1975. SPSS: statistical package for the social sciences. 675pp. McGraw-Hill Book Company, New York, NY.

Parker, Michael, Wayne A. Hubert, and Steve Greb. 1984. A preliminary assessment of eutrophication in Flaming Gorge Reservoir, Denise Bierley, ed. 46pp.+app. Wyoming Water Development Commission, Cheyenne, WY, and Wyoming Water Research Center, Laramie, WY.

Southwestern Wyoming Water Quality Planning Association. 1978. Clean water report for southwestern Wyoming. 313pp. CH2M Hill, Denver, CO.

U.S. Environmental Protection Agency. 1977. Report on Flaming Gorge Reservoir, Sweetwater County, Wyoming and Daggett County, Utah, EPA Region VIII. National Eutrophication Survey Working Paper #885.

U.S. Bureau of the Census. 1981. 1980 census of the population; vol. 1: characteristics of the population; chapter A: number of inhabitants, part 52 Wyoming. PC80-1-A52. U.S. Bureau of the Census, Washington, DC.

Wagner, John. 1984. Personal correspondence. Wyoming Department of Environmental Quality, Cheyenne, WY.

Welder, George E. 1968. Ground-water reconnaissance of the Green River basin--southwestern Wyoming. Hydrologic Investigations Atlas HA-290. U.S. Geological Survey, Washington, DC.

Welder, George E., and Laurence J. McGreevy. 1966. Ground-water reconnaissance of the Great Divide and Washakie basins and some adjacent areas, southwestern Wyoming. Hydrologic Investigations Atlas HA-219, reprinted 1981. U.S. Geological Survey, Washington, DC.

Whitcomb, Harold A., and Martin E. Lowry. 1968. Ground-water resources and geology of the Wind River basin area, central Wyoming. Hydrologic Investigations Atlas HA-270. U.S. Geological Survey, Washington, DC.