USING MULTIPLE LINEAR REGRESSION TO PREDICT REMOVAL OF COPPER AND ZINC ON VEGETATIVE BIOFILTER STRIPS TREATING HIGHWAY STORMWATER RUNOFF

Jessica Marie Wagner
B.S., University of California, Davis, 2001

PROJECT

Submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE in CIVIL ENGINEERING
at CALIFORNIA STATE UNIVERSITY, SACRAMENTO

FALL 2011

USING MULTIPLE LINEAR REGRESSION TO PREDICT REMOVAL OF COPPER AND ZINC ON VEGETATIVE BIOFILTER STRIPS TREATING HIGHWAY STORMWATER RUNOFF

A Project by Jessica Marie Wagner

Approved by:
__________________________________, Committee Chair
John Johnston, Ph.D., P.E.
__________________________________, Second Reader
Dipen Patel, Ph.D.
____________________________ Date

Student: Jessica Marie Wagner

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

__________________________, Graduate Coordinator ________________ Date
Cyrus Aryani, Ph.D., P.E., G.E.
Department of Civil Engineering

Abstract of
USING MULTIPLE LINEAR REGRESSION TO PREDICT REMOVAL OF COPPER AND ZINC ON VEGETATIVE BIOFILTER STRIPS TREATING HIGHWAY STORMWATER RUNOFF
by Jessica Marie Wagner

This study was conducted to determine if equations could be derived using multiple linear regression (MLR) to predict the removal of total and dissolved copper and zinc by vegetated biofilter strips treating highway runoff. The regression analysis was based on the data set collected during the Caltrans Roadside Vegetative Treatment Site (RVTS) Study. Eight RVTS study sites distributed across California and storms from several years were included. The predictors chosen for the MLR analysis were strip slope, strip width, vegetation coverage, percentage of clay content in the soil, rainfall duration, total event precipitation, and antecedent dry days.
The predictands chosen were effluent concentration (Ce), concentration reduction (Ci-Ce), and fraction of concentration remaining (Ce/Ci). In a second analysis, a first order removal model was assumed to fit the field data, resulting in concentrations that should decline exponentially with strip width. MLR analysis was used to develop predictive equations for the exponential decay coefficients based on the same predictors (except width). Regression models were evaluated using criteria such as the value of the coefficient of determination (R²), whether or not the sign of the predictor coefficient matched expectations from physical processes, and how well the equations conformed to MLR assumptions.

Not all predictors proved to be statistically significant. Predictive equations were produced for effluent concentrations of total copper, total zinc, and dissolved zinc based on vegetation coverage, rainfall duration, total event precipitation, and antecedent dry days. The best equation for effluent dissolved copper concentrations was based on only vegetation coverage, rainfall duration, and antecedent dry days. The R² values for these equations ranged from 0.348 to 0.523. R² values for the best equations predicting Ci-Ce and Ce/Ci were much lower. When the predictive Ce equations were graphed against vegetation coverage, the predicted dissolved copper and zinc concentrations were higher than those for total copper and zinc. Because a dissolved concentration cannot exceed the corresponding total concentration, these equations are unreliable. For the first order decay coefficient, the best fit equation had a low R² value and the regression model could not meet all the assumptions of MLR. Thus, these equations are also unreliable.

_______________________, Committee Chair
John Johnston, Ph.D., P.E.
_______________________ Date

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter
1. INTRODUCTION
2. BACKGROUND OF THE STUDY
2.1 Vegetated Biofilter Strips
2.2 Mechanisms
2.3 Caltrans Roadside Vegetated Treatment Sites (RVTS)
2.4 Biofilter Strip Performance
3. METHODOLOGY
3.1 RVTS Study Data Analysis
3.2 Predictors for Multiple Linear Regression
3.3 Predictands for Multiple Linear Regression
3.4 Using Multiple Linear Regression to Determine First Order Coefficient
3.5 Multiple Linear Regression
4. RESULTS AND DISCUSSION
4.1 Basic Multiple Linear Regression
4.2 Multiple Linear Regression for First Order Decay
4.3 Discussion
5. CONCLUSION
Appendix A RVTS Storm Data
Appendix B RVTS Input Data for MLR
Appendix C Multiple Linear Regression Reports from JMP®
Appendix D MLR Trials
References

LIST OF TABLES

Table 1 RVTS Study Site Characteristics
Table 2 Storm Data Qualifiers
Table 3 Predictors for Multiple Linear Regression
Table 4a Coefficient Values for Models with R² > 0.250 for Total Copper and Dissolved Copper
Table 4b Coefficient Values for Models with R² > 0.250 for Total Zinc and Dissolved Zinc
Table 5 Criteria for Predictor Evaluation of Ce and Ci-Ce
Table 6 Coefficient Values for Total Copper, Dissolved Copper, Total Zinc and Dissolved Zinc

LIST OF FIGURES

Figure 1 Vegetated Biofilter Strip Design
Figure 2 Schematic of RVTS Test Site
Figure 3 Dissolved Zinc Removal as a Function of Width (Sacramento)
Figure 4 Scatter Plot of Residuals for a Linear and Non-linear Relationship
Figure 5 Scatter Plot of Residuals Showing Homoscedasticity and Heteroscedasticity
Figure 6 Total Zinc Residual by Predicted Plot (trial 12 for Ce)
Figure 7 Total Zinc Normality Test for ln(Ce) (trial 12)
Figure 8 Total Copper Predicted Ce vs. Actual Ce
Figure 9 Dissolved Copper Predicted Ce vs. Actual Ce
Figure 10 Total Zinc Predicted Ce vs. Actual Ce
Figure 11 Dissolved Zinc Predicted Ce vs. Actual Ce
Figure 12 Total Zinc Residual by Predicted Plot (trial 8)
Figure 13 Total Zinc Normality Test for ln(k) (trial 8)
Figure 14 Predictive Ce vs. Vegetation Cover (assumed values were antecedent dry days at 1 day, rainfall duration at 1 hour, and total event precipitation at 24 mm)

Chapter 1
INTRODUCTION

The purpose of the federal Clean Water Act is to stop pollutants from being discharged into waterways and to maintain water quality to provide a safe environment for fishing and swimming. In 1987, the Act was amended to require the Environmental Protection Agency (EPA) to establish a program to address stormwater discharges. As part of the Act, the National Pollutant Discharge Elimination System (NPDES) permit program was implemented, which regulates the discharge of pollutants from point sources to waters of the United States. Many California Department of Transportation (Caltrans) properties and facilities fall under the jurisdiction of the NPDES permitting system. Caltrans maintains approximately 15,000 miles of highway, 12,000 bridges, and more than 230,000 acres of right-of-way, so the potential for serious pollution problems is elevated. During every rain event, water washes materials such as oil, grease, and litter from highways, streets, and gutters into Municipal Separate Storm Sewer Systems (MS4s) and eventually into rivers, lakes, and the ocean (Brice & Starring, 2002). To meet the requirements of its NPDES permit, Caltrans created a Storm Water Management Plan (SWMP) in 2003, which guides all Caltrans activities related to stormwater control and treatment (Caltrans, 2003).
Caltrans is concerned that chemical constituents in highway stormwater runoff, particularly copper and zinc, will cause water quality impairments in receiving waters. Zinc in highway stormwater runoff comes from crankcase and lubricating oils, grease, tire wear, and decorative and protective coatings (Yousef et al., 1985). When zinc reaches a concentration of 5.6 μg/L or greater in receiving waters, it affects salmon by altering behavior and blood and serum chemistry, impairing reproduction, and reducing growth (Sprague, 1968).

Copper in highway stormwater runoff comes from bearing wear, brake lining wear, and decorative and protective coatings (Yousef et al., 1985). When copper reaches concentrations of 0.18 to 2.1 μg/L in receiving waters, it affects salmon by disrupting the salmonid smoltification process, interfering with fish sensory systems, inhibiting their ability to avoid predators and migrate, and slowing juvenile growth (Hecht et al., 2007). Copper affects the sense of smell in fish by competing with natural odorants for binding sites in the olfactory tissue, which in turn affects the activation of the olfactory receptor neurons or intracellular signaling in the neurons (Solomon, 2009). Fish rely on their sense of smell to find food, avoid predators, and migrate. Copper also causes gills to fray and lose their ability to regulate the transport of salts such as sodium chloride and potassium chloride into and out of the fish. These salts are important for the normal functioning of the cardiovascular and nervous systems. When the salt balance between the body of a copper-exposed fish and the surrounding water is disrupted, the fish can die (Solomon, 2009). The reproduction rates of aquatic life such as sea scallops and fathead minnows are also affected by copper (Solomon, 2009). Thus, contamination of receiving waters with zinc and copper is a serious concern not only for the salmon population but also for other aquatic organisms.
Vegetative biofilters are a best management practice used to treat stormwater before it is discharged to natural receiving waters. In laboratory studies, vegetative biofilters were found to remove heavy metals at rates in excess of 90% (Blecken et al., 2009). In field studies, the California Department of Transportation (Caltrans) monitored ten vegetative biofilter sites to evaluate heavy metal removal from highway runoff. Zinc and copper were removed, but the results varied from site to site. It is not certain which parameters of biofilter strips and storms are significant in the removal of copper and zinc.

In this project, data collected during the Caltrans Roadside Vegetated Treatment Sites (RVTS) study will be used in a multiple linear regression (MLR) analysis of the biofilter parameters. The parameters considered are strip width, strip slope, average strip vegetation coverage, percentage of clay content in the soil, antecedent dry days, total event precipitation, and rainfall duration. The MLR results will suggest which parameters are significant in copper and zinc removal. With these parameters, an equation will be developed to predict the removal of copper and zinc on the vegetative biofilter strips.

Chapter 2
BACKGROUND OF THE STUDY

This chapter provides background information on vegetated treatment systems, the California Department of Transportation's Roadside Vegetated Treatment Sites (RVTS) study, and additional studies conducted on vegetative biofilter strips.

2.1 Vegetated Biofilter Strips

As defined by the Environmental Protection Agency (EPA), a vegetated biofilter strip is a permanent, maintained strip of planted or indigenous vegetation located between nonpoint sources of pollution and receiving water bodies for the purpose of removing or mitigating the effects of nonpoint source pollutants (Pennsylvania, 2006).
The purpose of a biofilter strip is to pass water over a vegetated surface, remove pollutants by a variety of mechanisms, and provide an opportunity for incidental infiltration of runoff. Biofilter strips function by slowing runoff velocities, filtering out sediment and other pollutants, and providing some infiltration into underlying soils. Pollutants removed include nutrients, heavy metals, toxic materials, floatable materials, oxygen demanding substances, oil, and grease (Ventura, 2009). Biofilters act most effectively when they are designed to receive sheet flow from paved areas (e.g., highways) and to maximize water contact with the biofilter vegetation and soil surface (Ventura, 2009).

Various criteria should be considered when designing a vegetated biofilter strip (see Figure 1). The California Department of Transportation (Caltrans) recommends a preferred slope of 6% with a width of 5 feet to 150 feet and a flow depth of 1 inch (Caltrans, 2007). The Washington Department of Transportation suggests a slope of 2% to 15% and states that the minimum width is dictated by the runoff treatment design flow velocity where the maximum flow depth is 1 inch (WSDOT, 2010). The Texas Transportation Institute suggests a slope of 1% to 20% with 2% to 6% slope being preferred, a minimum width of 24 feet with a flow velocity of 0.5 to 1 ft/s, vegetation density of 80% to 90%, and a flow depth of 1 inch (Storey et al., 2009). Grismer et al. (2006) suggest that a slope of 1% to 3% should have a minimum width of 25 feet, a slope of 4% to 7% should have a minimum width of 35 feet, and a slope of 8% to 10% should have a minimum width of 50 feet. Grismer et al. (2006) also suggest that sturdy, tall perennial grasses do the best job of trapping sediment. The Caltrans Roadside Vegetated Treatment Sites (RVTS) Study (2003) recommends a minimum of 65% vegetation coverage but observed that a rapid decline in performance occurs below 80% vegetation coverage.
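As an illustration only, the slope-dependent minimum widths suggested by Grismer et al. (2006) can be written as a small lookup. The Python sketch below is not part of the original project. The function name `min_width_ft` is hypothetical, and two choices in it are assumptions: slopes that fall between the published bands (for example 3.5%) take the width of the next steeper band, and slopes outside the 1% to 10% range are rejected because no guidance is quoted for them.

```python
def min_width_ft(slope_pct: float) -> float:
    """Minimum strip width (feet) for a given slope (%), after Grismer et al. (2006).

    Assumption: slopes between published bands use the next steeper band.
    """
    if slope_pct < 1 or slope_pct > 10:
        # No width guidance is quoted in the text outside 1% to 10%.
        raise ValueError("slope outside the 1% to 10% range covered by the guidance")
    if slope_pct <= 3:
        return 25.0  # 1% to 3% slope
    if slope_pct <= 7:
        return 35.0  # 4% to 7% slope
    return 50.0      # 8% to 10% slope


# Example: suggested minimum widths for a few candidate slopes
for slope in (2, 6, 9):
    print(f"{slope}% slope -> {min_width_ft(slope):.0f} ft minimum width")
```

A lookup of this kind only screens a candidate slope against one guideline. The other agencies cited above tie width to flow velocity and flow depth, which this sketch does not consider.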
Figure 1 Vegetated Biofilter Strip Design (Shoemaker et al., 2002)

There are advantages in considering a vegetated biofilter strip as a best management practice (BMP), including the following:
• Biofilter strips require minimal maintenance, including erosion prevention and mowing.
• Biofilter strips provide reliable water quality benefits in conjunction with high aesthetic appeal.
• Biofilter strips are cost-effective.
• Biofilter strips were among the best performers in reducing sediment and heavy metals in the BMP Retrofit Pilot Program (Caltrans, 2004).
• Biofilter strips are well suited to being part of a "treatment train" system of BMPs and should be considered whenever siting other BMPs that could benefit from pretreatment, especially infiltration basins and infiltration trenches (Caltrans, 2004).

There are also limitations to biofilter strips, including the following:
• Biofilter strips are not appropriate for industrial sites or locations where spills may occur.
• A thick vegetative cover is needed for these practices to function properly, generally 65% to 80% coverage (Caltrans, 2003).
• Sheet flow must be maintained for the strip to be effective.
• Biofilter strips are impracticable in watersheds where open land is scarce or expensive.
• The width of a biofilter strip must be adequate and flow characteristics acceptable, or water quality performance can be severely limited.
• Biofilter strips do not provide significant attenuation of the increased volume and flow rate of runoff during intense rain events.

2.2 Mechanisms

The design of the vegetated biofilter strip should focus on setting up conditions that facilitate the removal mechanisms for the reduction of total suspended solids (TSS), hydrocarbons, heavy metals, and nutrients. As water flows through a filter strip, pollutants are removed by filtering, infiltration, and settling of particulates due to the slow water velocity.
The filter strip has a high sediment trapping efficiency as long as the flow is shallow and uniform and the filter is only submerged during a rain event (Dillaha et al., 1986). A pea gravel diaphragm, or a small trench, should be used at the top of the slope to act as a pretreatment device that settles out sediment particles before they reach the strip, and as a level spreader that establishes sheet flow as runoff enters the filter strip (see Figure 1). The top and toe of the slope should be as flat as possible to encourage sheet flow and prevent erosion (US EPA, 2006).

To achieve the highest reduction of pollutants, the biofilter strip should be designed with three layers: a root zone, a subsoil horizon, and surface vegetation. The root zone will allow high infiltration rates via macropores that arise with the generally improved soil structure (Grismer et al., 2006). The subsoil horizon must have moderate permeability and fertility with adequate organic matter content and sufficient strength to support both plants and the topsoil (France, 2002). The ideal infiltration rate of the soil is around ½ inch per hour (Cahill et al., 2011). The vegetation of the filter strip slows the velocity of runoff, stabilizes the slope, and stabilizes accumulated sediment in the root zone of the plants (Caltrans, 2003). Decreasing the flow volume and velocity facilitates sediment deposition on the filter because of a decrease in transport capacity. As sediment from runoff is deposited in vegetated zones, sediment-bound nutrients and metals are also removed. Trapping efficiencies in the Caltrans RVTS study were 14% for zinc and 18% for copper (Caltrans, 2008).

The mechanisms that assist in metal removal are adsorption, ion exchange, precipitation, complexation, and phytostabilization. Heavy metals are removed initially by short-term processes, such as adsorption and ion exchange. Adsorption is the accumulation of ions at the interface between a solid phase and an aqueous phase.
The adsorption of heavy metal ions (e.g., zinc or copper ions) depends on the properties of the soil, including clay and organic fractions, pH, water content, and temperature, and on the properties of the metal ion. The solid phase of soil makes up an average of 45% of the soil bulk and consists of mineral particles, organic matter, and organic-mineral particles (Dube et al., 2001). These all play an important role in giving soil the ability to adsorb metal ions. The inorganic colloidal fraction of soil is most responsible for adsorption through its mineral particles, including clay minerals, oxides, sesquioxides, and hydrous oxides of minerals (Dube et al., 2001). Clay minerals such as montmorillonite and kaolinite, as well as iron and manganese oxides in the soil, adsorb metals. Clay particles are usually negatively charged, which is an important factor influencing the sorption properties of the soil. The binding forces between heavy metals and soil fractions are dependent on pH and ion properties such as charge. The binding forces of metal ions to soil particles decrease with increasing pH of the environment (Dube et al., 2001).

The affinity for binding heavy metals varies between different soil mineral constituents and organic material (Dube et al., 2001). There are different types of clay minerals that can be present in soil, including Sepiolite (Orera), Sepiolite (Vallecas), Bentonite, Palygorskite (Bercimuelle), Palygorskite (Torejon), Illite, and Kaolin. Garcia-Sanchez et al. (1998) found that the adsorptive capacity of these minerals is in the order of Sepiolite (Orera) > Sepiolite (Vallecas) > Bentonite > Palygorskite (Bercimuelle) > Illite > Kaolin > Palygorskite (Torejon). The adsorptive capacity of the clay minerals depends on the inner crystal structure features as well as the surface area of the particle, with the greatest surface area showing the strongest adsorptive capacity.
Soil organic material consists of living organisms, biochemicals, and insoluble humic substances. Biochemicals (such as amino acids, proteins, carbohydrates, organic acids, and lignin) and humic substances (such as insoluble polymers of aliphatic and aromatic substances produced through microbial action) provide sites for metal sorption. The binding of metals to organic matter involves a continuum of reactive sites ranging from weak forces of attraction to formation of strong chemical bonds (McLean & Bledsoe, 1992). Organic matter decreases with depth, so the mineral constituents of soil become a more important surface for adsorption at greater depths.

Ion exchange takes place where negatively charged clay or organic matter in the soil attracts positively charged cations, such as zinc and copper, through electrostatic forces. The attractive forces between the positive metal ions and the negatively charged clay particles are strong enough to hold the metal ions in the soil despite the passage of water through the soil (McLean & Bledsoe, 1992). In ion exchange, the metals displace other ions of the same valence or multiple ions of lesser valence, thereby not altering the surface charge.

Another mechanism of metal removal is precipitation. Metals may precipitate to form inorganic compounds including metal oxides, hydroxides, and carbonates (McLean & Bledsoe, 1992). Precipitation is pH dependent, occurring mainly when the pH is greater than 7.5. The precipitates formed will immobilize the metals; if the pH drops below 6.5, however, the metals have been found to be released back into the system (McLean & Bledsoe, 1992).

Complexation changes the adsorptive properties of metals. Metal cations form soluble complexes with inorganic and organic ligands. Inorganic ligands include SO₄²⁻, Cl⁻, OH⁻, PO₄³⁻, NO₃⁻, and CO₃²⁻. Organic ligands include low molecular weight aliphatic, aromatic, and amino acids and soluble constituents of fulvic acids.
The presence of complex species in the soil solution can significantly affect the transport of metals through the soil. With complexation, the resulting metal species may be positively or negatively charged or be electrically neutral. When metals are complexed with inorganic ligands, the positive charge on the complexed metal decreases, reducing its ability to adsorb to a negatively charged surface (McLean & Bledsoe, 1992). In soils where the organic ligand adsorbs to the soil surface, metal adsorption may be enhanced by the complexation of the metal to the surface-adsorbed ligand (McLean & Bledsoe, 1992).

Phytostabilization is the immobilization of a contaminant in soil through absorption and accumulation by roots, adsorption onto roots, or precipitation within the root zone of plants (Brookhaven National Laboratory, 2008). The metals must be bioavailable, that is, ready to be absorbed by roots. Bioavailability depends on metal solubility in soil solution. Only metals present as free metal ions, as soluble metal complexes, or adsorbed to inorganic soil constituents at ion exchange sites are readily available for plant uptake (Lasat, 2000). Metals that are bound to soil organic matter, precipitated as oxides, hydroxides, or carbonates, or embedded in the structure of the silicate minerals are not readily available for plant uptake (Lasat, 2000). The tolerance, biological cycle, and ability of a plant species to grow on unvegetated soils are characteristics that may contribute to the success of phytostabilization in soils contaminated with heavy metals. Plants in biofilters have been shown to accumulate 10% of the metals trapped in the soil by other mechanisms (Blecken et al., 2009).

2.3 Caltrans Roadside Vegetated Treatment Sites (RVTS)

The Roadside Vegetated Treatment Sites (RVTS) study was an eight-year water quality monitoring project that was started to evaluate the pollutant removal efficiency of existing vegetated side slopes adjacent to freeways.
The primary objective of this study was to determine if standard roadway design requirements result in vegetated side slopes with stormwater treatment capabilities equivalent to biofiltration strips specifically engineered for water quality performance. The monitoring of the RVTS sites took place during five wet seasons: October 2001 to April 2002, October 2002 to April 2003, January 2006 to May 2006, October 2006 to April 2007, and October 2007 to April 2008. The ten study sites were located in Sacramento, Redding, Cottonwood, San Rafael, Yorba Linda, Irvine, San Onofre, Murrieta, and Moreno Valley (two sites: A and B). The study included 31 strips (Caltrans, 2008). The Murrieta and Moreno Valley B sites were new for the 2007-2008 monitoring year. Since these sites were not as established as the other sites and data such as vegetation coverage and clay content were not available, these sites are not included in the multiple linear regression analysis performed in this project. The site characteristics of the eight sites included in this project are shown in Table 1.
Table 1 RVTS Study Site Characteristics (Caltrans, 2008)

Sampling Site ID    Width(a) (m)    Slope (%)    Vegetation Coverage (%)    Clay Content (%)

Cottonwood: I-5, Southbound near Cottonwood Exit
2-201               EOP(b)          EOP(b)       EOP(b)                     EOP(b)
2-202               9.3             52           75                         14.4

Irvine: I-405, Northbound at Sand Canyon Ave Exit
12-230              EOP(b)          EOP(b)       EOP(b)                     EOP(b)
12-231              3.0             11           75                         15.2
12-232              6.0             11           63                         23.8
12-233              13.0            11           68                         33.4

Moreno Valley A: SR-60, Eastbound at Frederick St onramp
8-201               EOP(b)          EOP(b)       EOP(b)                     EOP(b)
8-202               2.6             13           7                          18.2
8-203               4.9             13           19                         17.3
8-204               8.0             13           26                         24.4
8-205               9.9             13           22                         16.1

Redding: SR-99, Eastbound near Old Oregon Trail/Shasta College Exits
2-203               EOP(b)          EOP(b)       EOP(b)                     EOP(b)
2-204               2.2             10           88                         11.6
2-205               4.2             10           87                         10.3
2-206               6.2             10           82                         12.5

Sacramento: I-5, Northbound between Pocket and Laguna Exits
3-213               EOP(b)          EOP(b)       EOP(b)                     EOP(b)
3-214               1.1             5(c)         91                         11.3
3-215               4.6             33           86                         31.6
3-216               6.6             33           92                         31.0
3-217               8.4             33           90                         25.0

San Onofre: I-5, Northbound near Basilone Exit
11-204              EOP(b)          EOP(b)       EOP(b)                     EOP(b)
11-205              1.3             8            81                         17.2
11-206              5.3             10           76                         16.1
11-207              9.9             16           73                         22.6

San Rafael: I-101, Northbound at St. Vincent onramp
4-213               EOP(b)          EOP(b)       EOP(b)                     EOP(b)
4-214               8.3             50           84                         20.8

Yorba Linda: SR-91, Eastbound between Weir Canyon Road and SR-241 Exits
12-225              EOP(b)          EOP(b)       EOP(b)                     EOP(b)
12-226              2.3             14           66                         18.5
12-227              5.4             14           85                         21.2
12-228              7.6             14           79                         22.2
12-229              13.0            14           79                         16.2

(a) Width is the distance from the edge of pavement to the collection v-ditch.
(b) EOP is the edge of pavement, the top of the biofilter slope.
(c) Slope is 5% for the first 3 meters.

Figure 2 shows a schematic of the RVTS sites. The collection system at each location consisted of a concrete v-ditch constructed parallel to the roadway to capture highway runoff after it passed through existing vegetated slopes of varying widths perpendicular to the roadway. The concrete v-ditch varied in length from 19 to 30 meters.
Figure 2 Schematic of RVTS Test Site (adapted from Caltrans, 2008)

In the 2008 RVTS study, an Analysis of Variance (ANOVA) was performed using a 95% confidence level on the RVTS data from four locations (Sacramento, Redding, San Onofre, and Yorba Linda) to determine whether statistically significant differences exist between concentration levels of constituents at the edge of pavement and the strips. The results for dissolved copper showed a statistically significant 39.5% reduction in concentration at the 11-207 strip in San Onofre. The remainder of the strips had a 25% reduction in dissolved copper concentration that was not statistically significant. The results for total copper showed a statistically significant 52.7% reduction in concentration at the 3-215, 3-216, and 3-217 strips in Sacramento and the 12-228 and 12-229 strips in Yorba Linda. The remainder of the strips had a 34% reduction in total copper concentration that was not statistically significant.

ANOVA results for dissolved zinc showed a statistically significant 81% reduction in concentration at all four strips, not including the edge of pavement, for Yorba Linda. The remainder of the strips had a 37% reduction in dissolved zinc concentration that was not statistically significant. ANOVA results for total zinc showed a statistically significant 68% reduction in concentration at all the strips, not including the edge of pavement, for Sacramento and San Onofre and at the 2-204 and 2-205 strips in Redding. The remainder of the strips had a 66% reduction in total zinc concentration that was not statistically significant.

In the 2008 RVTS study, a preliminary Multiple Linear Regression (MLR) analysis was performed to investigate how various factors affect the performance of the vegetative strips in terms of concentration. This study tested thirteen constituents, including copper and zinc.
The predictors that were selected for the MLR analysis included influent concentration, strip width, strip slope, average strip vegetative cover, hydraulic residence time, infiltration rate, antecedent dry days, total event precipitation, and rainfall duration. A data set was compiled for each of the thirteen constituents, which included concentration, concentration differences between the edge of pavement and the strip, and associated site characteristics. The data sets were natural-logarithm-transformed and compiled into an input table for the MLR analysis. The initial conclusions of this analysis were that the edge of pavement concentration, infiltration rate, and rainfall duration have a significant impact on strip concentration for total and dissolved copper and zinc; slope and vegetative cover have a significant impact on dissolved copper; and antecedent dry days have a significant impact on dissolved zinc. These are only observations because the MLR analysis lacked a statistical analysis to validate and verify the assumptions of linearity, normality, collinearity, homoscedasticity, and constant variance (Caltrans, 2008).

2.4 Biofilter Strip Performance

In 2004, Caltrans completed a BMP Retrofit Pilot Program in which three biofiltration strips were monitored in Southern California. Two of the strips were located at maintenance stations and the other strip was located along a highway (I-605/SR-91). The design criteria for these strips included a slope of no more than 12 percent (actual slope obtained was 1% to 3%), a minimum width in the direction of flow of 8 meters, no gullies or rills that could concentrate overland flow, and a top edge level with the plane of the adjacent pavement. These biofilter strips removed 85% of total copper, 72% of total zinc, 65% of dissolved copper, and 53% of dissolved zinc.

Yousef et al. (1985) conducted research on two grass swales in Florida located at the Maitland/I-4 Interchange and the EPCOT Interchange.
The slopes of the sites were 8% to 29% and the areas were predominately covered with Bahia grass. The Maitland Interchange sampling covered an eight-month period between August 1982 and March 1983 and samples were taken from six stations. A sharp-crested 90° V-notch sampling weir was constructed 53 m from the pavement. Three experiments were conducted at the Maitland Interchange at different flow velocities. At a flow velocity of 2.58 m/min, 77% of the dissolved zinc but only about 20% of the dissolved copper were removed. At a flow velocity of 1.37 m/min, 92% of the dissolved zinc and 60% of the dissolved copper were removed. Finally, at a flow velocity of 0.90 m/min, 90% of the dissolved zinc was removed, but an accurate analysis of dissolved copper was not achieved in this experiment. The EPCOT Interchange was constructed with two sharp-crested 90° V-notch sampling weirs at 90 m and 170 m from the pavement. The grass coverage at this site was about 80%. At the EPCOT Interchange site, with a flow velocity of 2.44 m/min and a distance of 170 meters, 65% of the dissolved zinc was removed but the removal of dissolved copper was minimal.

Chapter 3
METHODOLOGY

Multiple linear regression (MLR) is a method used to model the linear relationship between a dependent variable (predictand) and one or more independent variables (predictors). The predictors used in the study are site characteristics of the biofilter strips analyzed during the Roadside Vegetated Treatments Study (RVTS) and storm event data. A variety of predictands have been calculated using the RVTS study data obtained from the Caltrans database. These are described below. The predictors and predictands were analyzed by means of multiple linear regression (MLR) using the JMP® 8 statistical program. The method for analyzing the data is described in this chapter.
3.1 RVTS Study Data Analysis

The data in this study were obtained from the Caltrans database, which contains information from the Roadside Vegetated Treatments Study (RVTS). The RVTS storm information and the amounts of copper and zinc found in the influent and effluent waters were downloaded from the Caltrans database. These data can be found in Appendix A. The data were formatted into a spreadsheet and evaluated for quality assurance to ensure that they met the standard of quality needed to evaluate the effectiveness of the filter strips in removing copper and zinc. All storm data were rejected or deemed unusable if they had one or more of the qualifiers listed in Table 2 (Caltrans, 2008).

Table 2 Storm Data Qualifiers (Caltrans, 2008)

Data Qualifier    Reason for Rejection
U                 The analyte was analyzed for, but was not detected above the level of the reported sample quantitation limit.
J                 The result is an estimated quantity. The associated numerical value is the approximate concentration of the analyte in the sample.
J+                The result is an estimated quantity, but the result may be biased high.
J-                The result is an estimated quantity, but the result may be biased low.
R                 The data are unusable. The sample results are rejected due to serious deficiencies in meeting quality assurance criteria. The analyte may or may not be present in the sample.
UJ                The analyte was analyzed for, but was not detected. The reported quantitation limit is approximate and may be inaccurate or imprecise.

In addition to the qualifiers in Table 2, storm data were also rejected for the following reasons:

- The quantity given was based on a "less than" quantity rather than an "equal to" quantity, which means the value of copper or zinc listed is not a true reading.
- The amount of precipitation was missing due to equipment malfunction.
- There were edge of pavement readings with no corresponding strip readings, or strip readings with no corresponding edge of pavement readings.
- The entries were duplicates.
- Dissolved metal concentrations were higher than total metal concentrations.

After analyzing the data for quality assurance, 17 complete storms out of 233 storms (7%) were rejected for dissolved copper. For total copper, 14 complete storms out of 153 storms (9%) were rejected. For dissolved zinc, 30 complete storms out of 233 storms (13%) were rejected. Finally, for total zinc, 8 complete storms out of 153 storms (5%) were rejected.

3.2 Predictors for Multiple Linear Regression

Multiple linear regression (MLR) is a method used to model the linear relationship between a predictand and one or more predictors. In general, the predictand is a variable that is affected by the predictors. The site characteristic predictors chosen to be included in this MLR are strip slope, strip width, vegetation cover, and percentage of clay content in the soil. Table 3 provides a list of the predictors and the metals removal mechanisms that are affected by each predictor.

The slope of the vegetative biofilter strip influences the velocity at which the water passes through the strip. The flow velocity affects contact time, which in turn affects filtration, sedimentation, ion exchange, adsorption, and precipitation of the metals. The lower the flow velocity is, the more contact time the water has with the vegetation and soil. Increased contact time allows for more filtration, sedimentation, adsorption, ion exchange, and precipitation to take place. The width of the strip is important because it also influences contact time.

The vegetation cover that is present on the strip slows down the flow of water, thus increasing contact time. With the decrease in the flow, sediment deposition is facilitated on the filter due to a decrease in transport capacity.
The decaying vegetation cover can also provide more adsorption sites and aid in sediment deposition. The decaying vegetation is known as a humic substance and has been found to strongly influence the adsorption of metals (Dube et al., 2000).

The percentage of clay in the soil is important because it indicates the relative number of negatively charged sites for ion exchange. Clay also provides sites for adsorption.

The storm event predictors include total event precipitation, rainfall duration, and antecedent dry days. The total event precipitation is important when considering the amount of copper or zinc flushed from the roadway during a storm. In the Statewide Discharge Characterization Study (Caltrans, 2003), it was observed that pollutant concentrations tend to be higher for smaller rainfall events and lower for larger events, which implies that pollutants tend to become diluted in larger storms. This observation is consistent with the existence of an event first flush effect where concentrations tend to be highest in the initial portion of runoff events and are diluted as the storm event continues (Caltrans, 2003). This could affect adsorption, ion exchange, and precipitation negatively because the sites for these reactions become occupied during the first flush of a storm event, limiting the removal of copper and zinc during the remainder of a storm event. If equilibrium is achieved in the high concentration first flush, when the cleaner water passes over the soil, adsorbed metals may desorb. Whether desorption cancels the adsorption can be seen in the event mean concentration data. In addition, if the total event precipitation is high due to a large amount of rainfall in a short period, the water will flow faster and deeper over the vegetated biofilter strip, limiting the amount of water that comes in contact with the soil and vegetation of the strip for filtration and sedimentation.
However, if total event precipitation is high due to a long rainfall duration, this should not be an issue. Thus, event volume is an imperfect measure of storm size.

The rainfall duration is important for copper and zinc removal because the longer it rains, the more water will be on the filter strip, saturating the soil. Once the soil is saturated, the strip's infiltration capacity has been reached and overland flow will occur. The water now relies on the vegetation to slow it down so that sedimentation and adsorption can take place.

The antecedent dry days are the number of dry days prior to the start of a storm event. Kim et al. (2003) found that the longer the period of antecedent dry days, the higher the concentration of metals at the edge of pavement. With longer antecedent dry days, the soil becomes dry, which will increase infiltration, allowing for more adsorption and ion exchange within the soil particles. Longer antecedent dry days could cause further reactions, like precipitation, that would sequester the metals and perhaps regenerate the adsorptive capacity of the soil.

Table 3 Predictors for Multiple Linear Regression

Predictor                    Metals Removal Mechanism Affected in the Biofilter Strip
Strip Slope                  Adsorption, Ion Exchange, Precipitation, Filtration, Sedimentation
Strip Width                  Adsorption, Ion Exchange, Precipitation, Filtration, Sedimentation
Vegetation Cover             Adsorption, Ion Exchange, Filtration, Sedimentation
Clay Content                 Adsorption, Ion Exchange
Total Event Precipitation    Adsorption, Ion Exchange, Precipitation, Filtration, Sedimentation
Rainfall Duration            Adsorption, Ion Exchange, Precipitation, Filtration, Sedimentation
Antecedent Dry Days          Adsorption, Ion Exchange, Precipitation, Filtration

3.3 Predictands for Multiple Linear Regression

Several predictands were considered in the multiple linear regression model.
When looking at possible predictands, the natural logarithm (ln) was used to be consistent with the early multiple linear regression analysis in the 2008 RVTS study. Natural logarithms were used because the Discharge Characterization Study (Caltrans, 2003) found that the natural logarithm transformations were needed to satisfy the statistical assumptions of the analyses. The predictands used for the multiple linear regression are the natural logarithm of the effluent concentration (ln(Ce)), change of concentration (ln(Ci-Ce)), and fraction of concentration remaining (ln(Ce/Ci)). Appendix B contains the values of the above listed predictands for each study site, as well as the values for the predictors mentioned in Section 3.2.

3.4 Using Multiple Linear Regression to Determine First Order Coefficient

Kadlec (1999) defined a first-order decay equation that summarizes the performance of a wide range of pollutants in wetlands. This two-parameter model includes k, the area-based removal rate constant, and C*, the irreducible background concentration of the pollutant in the wetland, and assumes plug flow kinetics. The resulting equation is

    C_{out} = C^* + (C_{in} - C^*) e^{-kt}    (1)

where Cout is the effluent concentration, C* is the irreducible background concentration, Cin is the influent concentration, k is the first order decay constant, and t is the detention time.

The Environmental Protection Agency (EPA) defines a removal mechanism using first order decay as

    C = C_o e^{-kt}    (2)

where C is the effluent concentration, Co is the initial concentration, k is the first order decay constant, and t is the hydraulic detention time (Huber et al., 2006).

Based on Kadlec's and EPA's studies, it was decided to try an alternate approach with multiple linear regression. In this approach, the first-order model would be assumed and MLR would be used to predict the first order decay coefficient.
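
The first-order form in Equation (2) becomes a straight line after a natural-logarithm transform, ln(C) = ln(Co) - kt, so k can be recovered as the negative of a least-squares slope. The following is a minimal sketch of that idea in Python, not the procedure used in this study; the function name `first_order_k` and the concentration values are hypothetical and are generated synthetically rather than taken from the RVTS data.

```python
import math

def first_order_k(times, concentrations):
    """Estimate k and Co in C = Co * exp(-k t) from a straight-line fit of ln(C) on t."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx                      # slope of ln(C) versus t equals -k
    intercept = y_mean - slope * t_mean    # intercept equals ln(Co)
    return -slope, math.exp(intercept)

# Hypothetical, noise-free data generated from assumed values k = 0.3 and Co = 50.
times = [0.0, 1.0, 2.0, 4.0]
concentrations = [50.0 * math.exp(-0.3 * t) for t in times]

k_hat, c0_hat = first_order_k(times, concentrations)
print(k_hat, c0_hat)
```

Because the synthetic data follow Equation (2) exactly, the fit returns the assumed k and Co; with field data the points would scatter about the line and the slope would only be an estimate.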
To evaluate a first order decay equation, the following equation was used:

    C_e = C_i e^{-kw}    (3)

where Ce is the effluent concentration, Ci is the influent concentration, k is the first order decay constant, and w is the filter strip width. To use this equation, the relationship between the effluent concentration and the strip width should be exponential in nature. Such a relationship is shown in Figure 3, where the Sacramento RVTS dissolved zinc data are graphed with the effluent concentration on the y-axis and the width on the x-axis. To find the k values from the above equation, the slope function was used in Excel with the natural logarithm of Ce as the dependent (y) variable and strip width as the independent (x) variable. Taking the natural logarithm of Equation (3) gives ln(Ce) = ln(Ci) - kw, so the fitted slope is equal to -k. The multiple linear regression was run using the natural logarithm of the k values as the predictand against the various predictors mentioned above with the exception of width, which is used to calculate the k value.

Figure 3 Dissolved Zinc Removal as a Function of Width (Sacramento); y-axis: Ce (ug/L), x-axis: Width (m)

3.5 Multiple Linear Regression

Multiple linear regression is a method used to model the linear relationship between a dependent variable (predictand) and one or more independent variables (predictors). In MLR the model is fit such that the sum-of-squares of differences of observed and predicted values is minimized. MLR expresses the value of a predictand variable as a linear function of one or more predictor variables and an error term:

    y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_m x_m + \varepsilon    (4)

where y is the predictand variable, b0, b1, …, bm are unknown parameters, x1, x2, …, xm are the predictor variables, and ε is a random error term (Freund & Littell, 2003). Least squares is a technique used to estimate the unknown parameters.
The goal is to find estimates of the parameters, b0, b1, …, bm, that minimize the sum of the squared differences between the actual y values and the values of y predicted (ŷ) by the equation. The estimates are the least-squares estimates and the quantity minimized is the error sum of squares (Freund & Littell, 2003). The principle of least squares is applied to a set of n observed values of y and the associated xj to obtain estimates, b̂0, b̂1, …, b̂m, of the respective parameters b0, b1, …, bm. The method of ordinary least squares minimizes the sum of squared vertical distances between the observed values of the predictand and the values predicted by the linear approximation. The resulting estimated values are expressed in the estimating equation

    \hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \dots + \hat{b}_m x_m    (5)

The MLR model is based on several assumptions, which, if satisfied, show that the regression estimators are optimal in the sense that they are unbiased, efficient, and consistent. An unbiased estimator is one whose expected value is equal to the true value of the parameter. An efficient estimator has a smaller variance than any other unbiased estimator. A consistent estimator has a bias and variance that approach zero as the sample size approaches infinity (Meko, 2011).

The basic assumptions for MLR are linearity, absence of collinearity, zero mean, constant variance, and normality. Multiple linear regression can only accurately estimate the relationship between the predictand and predictors if the relationships are linear in nature. If the relationship between the predictand and a predictor is not linear, the results of the regression analysis will under-estimate the true relationship for that predictor and may over-estimate the effects of other predictors that share the same variance. A method of checking for linearity is examination of residual scatter plots showing the studentized residuals as a function of predicted values (Osborne & Waters, 2002). The residual represents unexplained variation after fitting a regression model.
It is the difference between the observed value of the variable and the value suggested by the regression model. The studentized residual is the residual divided by its estimated standard deviation. The residuals are assumed to be uncorrelated with the predicted values of the predictand. A violation of linearity is indicated by a noticeable pattern of dependence in the scatter plot, such as a flare-out with increasing value of the predictand or curvature in the plot. Figure 4 shows examples of linear and nonlinear relationships of residuals.

Figure 4 Scatter Plot of Residuals for a Linear (left plot) and Non-Linear (right plot) Relationship (Osborne & Waters, 2002)

Collinear variables are a major problem with MLR modeling. Two variables are said to be collinear if they are approximately (or exactly) linearly dependent or if there is a high correlation between them. When collinearity between variables is present, it does not invalidate the regression model in the sense that the predictive value of the equation may still be good as long as the prediction is based on combinations of predictors within the same multivariate space used to calibrate the equation (Meko, 2011). However, there are various negative effects of collinearity to consider. The variance of the regression coefficients can be inflated so much that individual coefficients are not statistically significant even though the overall regression equation is strong and the predictive ability good. The relative magnitudes and even the signs of the coefficients may defy interpretation. Finally, the values of the individual regression coefficients may change radically with the removal or addition of a predictor variable in the equation. The Variance Inflation Factor (VIF) is a statistic that is used by the JMP statistical program to identify collinearity amongst the predictor variables.
VIF provides an index that measures how much the variance of an estimated regression coefficient increases as a result of collinearity. VIF is based on the multiple coefficient of determination obtained when each predictor is regressed on all the other predictors:

    VIF_i = \frac{1}{1 - R_i^2}    (6)

where R_i^2 is the multiple coefficient of determination in a regression of the ith predictor on all other predictors, and VIF_i is the variance inflation factor associated with the ith predictor. Collinearity can be a problem when the variance inflation factor of one or more predictors is greater than five (Meko, 2011).

Zero mean assumes the expected value of the residuals is zero. Zero mean is generally not a problem because the least squares method of estimating regression equations guarantees that the mean of the residuals is zero.

Constant variance assumes the variance of the residuals is constant. The method of least squares used to find the line of best fit assumes that each data point is equally reliable. In some cases, it is known that the predictand is more variable and hence less predictable for certain values of predictors. An example of a violation of this assumption is a scatter plot that shows a pattern of residuals whose variance increases over time (Meko, 2011). The assumption of homoscedasticity means that the variance of errors is the same across all levels of predictors. Heteroscedasticity is indicated when the variance of errors differs at different values of the predictors. When heteroscedasticity is present, it can lead to distortion of the findings and weaken the analysis, thus increasing the possibility of over-estimation of significance. Homoscedasticity can be checked by visual examination of a scatter plot of the studentized residuals by the regression predicted value (Osborne & Waters, 2002). Figure 5 shows examples of a homoscedasticity and heteroscedasticity scatter plot of studentized residuals.
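
As an illustration of the collinearity check in Equation (6), the sketch below computes a variance inflation factor for each predictor by regressing it on the others. It is a hedged, standard-library-only example and does not reproduce the JMP calculation; the helper names (`solve`, `r_squared`, `vif`) and the three predictor columns are hypothetical, with the first two columns deliberately constructed to be nearly collinear.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(predictors):
    """Variance inflation factor of each predictor: 1 / (1 - R_i^2), as in Equation (6)."""
    result = []
    for i, target in enumerate(predictors):
        others = [col for j, col in enumerate(predictors) if j != i]
        result.append(1.0 / (1.0 - r_squared(others, target)))
    return result

# Hypothetical predictor columns: x2 nearly duplicates x1, x3 is unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8, 7.2, 7.9]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

vifs = vif([x1, x2, x3])
print(vifs)
```

With these made-up columns, the two nearly duplicate predictors exceed the threshold of five cited from Meko (2011), while the unrelated predictor stays well below it.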
Figure 5 Scatter Plot of Residuals Showing Homoscedasticity and Heteroscedasticity (Osborne & Waters, 2002)

Multiple linear regression assumes that the residuals are normally distributed. Non-normally distributed residuals can distort relationships and significance tests. Normality can be tested by using the Shapiro-Wilk test, where the null hypothesis is that the data are from a normal distribution. Using a significance level of 0.05, a P-value greater than 0.05 means the hypothesis of normality is not rejected, whereas a P-value less than 0.05 rejects the hypothesis.

The coefficient of determination (R2) is the proportion of variance accounted for, explained, or described by regression. The relative sizes of the sums-of-squares terms indicate how "good" the regression is in terms of fitting the calibration data. If the regression is "perfect", all residuals are zero and R2 is one. If the regression is a total failure, the sum-of-squares of residuals equals the total sum-of-squares (sum-of-squares of regression plus sum-of-squares of errors), no variance is accounted for by regression, and R2 is zero (Meko, 2011).

The MLR modeling in this study was evaluated using JMP®, Version 8, SAS Institute Inc., Cary, NC, 1989-2011. All the data for the MLR were calculated by the author using Microsoft Excel and transferred into a table within the JMP® program. Once the data were in a table in JMP®, the MLR was calculated using the Analyze Data Fit Model where the personality used was "standard least squares" and the emphasis used was "effect leverage". The predictand (ln(Ce), ln(Ci-Ce), etc.) was graphed on the y-axis while the predictors (strip slope, strip width, vegetation cover, etc.) were placed on the x-axis in combinations of two to seven predictors. Refer to Appendix C for a sample report from JMP® showing the MLR results along with a description of the results shown.
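
To make the fitting and R2 bookkeeping described above concrete, the following sketch fits a two-predictor log-log model by ordinary least squares and splits the total sum-of-squares into regression and residual parts. It is an illustrative stand-in written in Python, not the JMP® workflow used in this study; the function names and all numeric values (vegetation cover, antecedent dry days, and Ce) are hypothetical and are not RVTS data.

```python
import math

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using the centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # equals zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

def sums_of_squares(x1, x2, y, coef):
    """Return (regression, residual, total) sums-of-squares for the fitted model."""
    b0, b1, b2 = coef
    my = sum(y) / len(y)
    fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
    ss_reg = sum((f - my) ** 2 for f in fitted)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - my) ** 2 for v in y)
    return ss_reg, ss_res, ss_tot

# Hypothetical observations (not RVTS data), natural-log transformed before fitting.
veg_cover = [40.0, 55.0, 65.0, 70.0, 80.0, 90.0]   # percent
dry_days = [2.0, 10.0, 5.0, 20.0, 3.0, 15.0]       # days
ce = [30.0, 28.0, 20.0, 25.0, 12.0, 14.0]          # ug/L

ln_veg = [math.log(v) for v in veg_cover]
ln_dry = [math.log(v) for v in dry_days]
ln_ce = [math.log(v) for v in ce]

coef = fit_two_predictors(ln_veg, ln_dry, ln_ce)
ss_reg, ss_res, ss_tot = sums_of_squares(ln_veg, ln_dry, ln_ce, coef)
r2 = 1.0 - ss_res / ss_tot
print(coef, r2)
```

The closed-form solution is limited to two predictors to keep the sketch short; a general solver or a statistics package would be needed for the seven-predictor models discussed here.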
After several regression models were produced, the models with R2 values greater than 0.250 were evaluated to make sure the results made physical sense, the predictors were significant, and the assumptions were valid.

Chapter 4
RESULTS AND DISCUSSION

In this chapter the results from multiple linear regression are presented, including the testing of the assumptions and the formulation of predictive equations.

4.1 Basic Multiple Linear Regression

Multiple linear regression (MLR) was performed using the three predictands, ln(Ce), ln(Ci-Ce), and ln(Ce/Ci), and the seven predictors, strip slope, strip width, vegetation coverage, clay content, rainfall duration, total event precipitation, and antecedent dry days, described in Chapter 3. The dissolved zinc regression trials were used to narrow down which combinations of predictors would be used for the regression trials of total zinc and total and dissolved copper. The regression trials for dissolved zinc started by using all seven predictors and then removing one predictor randomly so that only six predictors were used, and then five predictors, and so on. First, each predictor was eliminated one at a time. Then the predictors that were not significant (P-value > 0.05) were eliminated. (The MLR was tested with a 95% confidence level, so the significant predictors are those with a t-statistic P-value less than 0.05, which rejects the null hypothesis that the predictor coefficients are zero.) The effect the eliminated predictors had on the significance of the other predictors dictated which predictors were chosen until numerous combinations had been tried. After several trials with dissolved zinc were completed, only the significant predictors were evaluated in different combinations. Then only the predictor combinations that produced R2 values greater than 0.200 were applied to total zinc and total and dissolved copper.
Appendix D contains the regression model results for total and dissolved copper and total and dissolved zinc showing the t-statistic P-value and whether or not a predictor is significant in the model, which is indicated by the asterisk (*) next to the P-value. Of the predictands, only ln(Ce) and ln(Ci-Ce) produced results that were worthy of further evaluation, as determined by the value of the coefficient of determination (R2). The MLR trial results did not produce any R2 values greater than 0.55, so trials with R2 values of 0.250 or greater were evaluated further. Linear associations between ln(Ce/Ci) and the predictors were not observed. R2 values were less than 0.150. Table 4a presents the predictor coefficients of the regression model trials with R2 values of 0.250 or greater for total copper and dissolved copper while Table 4b presents the predictor coefficients for total zinc and dissolved zinc.

Table 4a Coefficient Values for Models with R2 > 0.250 for Total Copper and Dissolved Copper
Dissolved Copper Total Copper Trial No.
Predictand ln slope 1 ln Ce (0.212)* 2 ln Ce 3 ln Ce (0.302)* 4 ln Ce (0.279)* 5 ln Ce 0.030 6 ln Ce (0.223)* 7 ln Ce (0.202)* 8 ln Ce 0.077 9 ln Ce (0.290)* 10 ln Ce (0.033) 11 ln Ce 12 ln Ce 1 ln Ci-Ce 0.219 2 ln Ci-Ce 3 ln Ci-Ce 0.392* 4 ln Ci-Ce 0.208 5 ln Ci-Ce 0.333 6 ln Ci-Ce 0.215 7 ln Ci-Ce 0.233 9 ln Ci-Ce 0.223 10 ln Ci-Ce 0.572* 12 ln Ci-Ce 0.359* ln Ci-Ce 13 1 2 3 4 5 6 7 8 9 10 11 12 1 5 6 ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce (0.320)* (0.294)* (0.420)* (0.119) (0.334)* (0.299)* (0.077) (0.414)* (0.078) 0.514* 0.550* 0.518* ln ln width vegetation ln % clay (0.131) (0.416)* 0.965* (0.221)* (0.436)* 0.805* (0.404)* 0.933* (0.092) 1.079* (0.083) (0.469)* (0.131) (0.422)* 1.041* (0.162)* (0.421)* 0.957* (0.141) (0.510)* (0.154) 1.278* (0.460)* (0.429)* 0.569* (0.599)* 0.258 (0.122) 0.671 0.370* (0.102) 0.796* (0.147) 0.766 0.267 0.694* 0.314 (0.159) 0.255 (0.113) 0.730* 0.232 (0.163) 0.653 0.200 0.832* (0.197) 0.904* 0.373* 0.810* ln rainfall duration (0.189)* (0.194)* (0.189)* (0.199)* (0.251)* 0.037 (0.123)* (0.107) (0.128)* (0.107) (0.127)* (0.159)* 0.107 0.056 0.039 0.013 0.004 0.061 0.288 0.308 0.288 (0.543)* (0.584)* (0.548)* (0.578)* (0.549)* (0.535)* (0.580)* 0.731* 0.473* 0.735* 0.838* 0.775* 0.722* (0.337)* (0.250)* (0.200)* (0.287)* (0.129) (0.125) (0.126) (0.124) (0.180) (0.222) (0.185) (0.121) ln ln event Antecedent rain Dry Days R2 value (0.264)* 0.324* 0.523 (0.260)* 0.346* 0.513 (0.275)* 0.316* 0.517 (0.270)* 0.379* 0.457 (0.260)* 0.369* 0.452 (0.367)* 0.302* 0.506 0.312* 0.489 0.282* 0.305 0.298* 0.334 (0.267)* 0.362* 0.450 (0.283)* 0.348* 0.488 (0.269)* 0.427* 0.523 (0.166) 0.421* 0.283 (0.178) 0.394* 0.278 (0.133) 0.447* 0.273 (0.186) 0.437* 0.282 (0.153) 0.442* 0.267 (0.249)* 0.408* 0.280 0.416* 0.278 0.398* 0.257 (0.109) 0.479* 0.251 0.425* 0.250 (0.194) 0.409* 0.276 (0.134)* (0.108) (0.129)* (0.108) (0.125)* (0.198)* (0.185)* 0.942* (0.585)* (0.577)* (0.753)* (0.360)* 
(0.371)* (0.361)* * Predictors are significant () Coefficient of the predictor is negative 0.346* 0.250 0.235 (0.160)* (0.134)* (0.212)* 0.022 (0.010) (0.118) (0.122)* (0.330)* (0.318)* (0.317)* 0.296* 0.330* 0.297* 0.358* 0.333* 0.286* 0.292* 0.298* 0.325* 0.336* 0.335* 0.322* 0.423* 0.436* 0.427* 0.445 0.420 0.445 0.340 0.400 0.440 0.437 0.351 0.308 0.399 0.412 0.348 0.256 0.254 0.256 35 Table 4b Coefficient Values for Models with R2 > 0.250 for Total Zinc and Dissolved Zinc Dissolved Zinc Total Zinc Trial No. Predictand ln slope 1 2 3 4 5 6 7 8 10 11 12 1 2 3 4 5 6 7 8 9 10 11 ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce 1 2 3 4 5 7 8 10 12 13 14 15 18 19 20 21 ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce (0.208) (0.266)* (0.302)* (0.028) (0.224)* (0.188) 0.018 (0.061) (0.302) (0.135) (0.376)* (0.048) (0.324) (0.274) 0.009 (0.369)* 0.205 0.015 0.042 (0.131) (0.001) 0.027 0.132 0.143 0.157 (0.052) (0.031) (0.044) 0.105 (0.500) (0.036) (0.143) (0.051) ln ln rainfall ln event ln width vegetation ln % clay duration rain (0.085) (0.171)* (0.034) (0.042) (0.079) (0.122) (0.097) 0.251 0.124 0.279* 0.334* 0.263 0.202 0.265 0.198 (0.551)* (0.571)* (0.544)* (0.592)* (0.551)* (0.553)* (0.605)* (0.587)* (0.566)* (0.863)* (0.472)* (0.497)* (0.484)* (0.536)* (0.473)* (0.477)* (0.563)* 0.829* 0.746* 0.414* 1.195* 0.988* 1.283* 1.347* 1.284* 1.191* (0.203)* (0.211)* (0.200)* (0.202)* (0.248)* (0.246)* (0.238)* (0.256)* (0.249)* (0.242)* (0.360)* (0.346)* (0.246)* (0.212)* (0.258)* (0.267)* (0.279)* (0.275)* (0.269)* (0.330)* (0.247)* (0.260)* (0.307)* (0.291)* (0.277)* (0.258)* (0.298)* (0.288)* (0.439)* (0.433)* 1.556* (0.559)* (0.496)* (0.100) (0.120) (0.020) (0.090) (0.121) (0.070) (0.093) (0.085) (0.083) (0.096) (0.097) (0.054) 0.753* 0.602* 0.729* 0.920* (0.487)* (0.489)* (0.605)* (0.487)* (0.477)* 
(0.522)* (0.519)* (0.516)* 1.127* 0.502* 0.437* 0.643* 0.561* 0.556* 0.668* 0.606* 0.716* (0.484)* (0.483)* (0.603)* (0.471)* 0.475* 0.402* 0.636* 0.521* ln Antecedent Dry Days R2 value 0.315* 0.336* 0.310* 0.375* 0.348* 0.292* 0.311* 0.280* 0.345* 0.337* 0.411* 0.520* 0.550* 0.537* 0.571* 0.577* 0.485* 0.521* 0.491* 0.489* 0.605* 0.550* 0.445 0.437 0.442 0.340 0.410 0.429 0.421 0.299 0.409 0.424 0.483 0.416 0.408 0.407 0.378 0.375 0.403 0.400 0.294 0.311 0.359 0.405 0.455 0.424 0.270 0.446 0.365 0.433 0.407 0.337 0.353 0.320 0.266 0.287 0.451 0.418 0.270 0.359 (0.348)* (0.280)* (0.243) (0.259)* (0.131)* (0.262)* (0.168)* (0.232)* 0.316* 0.317* (0.308)* 0.300* 0.273* 0.332* 0.331* 0.288* 0.365* 0.364* 0.322* 0.389* 0.307* 0.307* (0.167)* (0.282)* (0.211)* (0.131) (0.250)* (0.218)* (0.279)* (0.126)* (0.262)* (0.169)* (0.240)* 0.264* 36 Table 4b Continued Dissolved Zinc Trial No. Predictand ln slope 30 31 32 33 36 37 38 40 41 42 4 5 6 9 10 12 21 22 24 26 ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln ln rainfall ln event ln width vegetation ln % clay duration rain (0.094) (0.103) (0.074) (0.110) (0.486)* (0.485)* (0.622)* (0.474)* (0.488)* (0.486)* (0.476)* (0.508)* 0.513* 0.465* 0.556* 0.575* 0.419* 0.362* 0.465* (0.131)* (0.232)* (0.261)* (0.167)* (0.128)* (0.238)* (0.263)* 0.572* 1.04* 1.02* 0.990* 1.14* 1.11* 1.10* 0.635* 0.637* 0.628* 0.717* (0.687)* (0.651)* (0.619)* (0.663)* (0.629)* (0.588)* (0.630)* (0.748)* (0.783)* (0.766)* (0.774)* (0.807)* (0.794)* (0.782)* (0.804)* (0.788)* (0.807)* 0.457 0.412 0.501 0.319 0.270 0.351 (0.150)* (0.286)* (0.198) (0.111) (0.241) (0.225) (0.257)* (0.094) (0.127) (0.203) (0.162) ln Antecedent Dry Days R2 value 0.314* 0.313* 0.271* 0.311* 0.310* 0.267* 0.275* 0.323* 0.341* 0.493* 0.482* 0.439* 0.510* 0.496* 0.454* 0.426* 0.423* 0.389* 0.401* 0.455 0.424 0.265 0.365 0.450 0.418 0.359 0.329 0.258 0.470 0.325 0.317 0.302 
0.319 0.313 0.296 0.276 0.272 0.261 0.258
* Predictors are significant
( ) Coefficient of the predictor is negative

The regression models presented in this table were then examined to determine whether the coefficients make physical sense and whether the model is a good representation of the data based on the corresponding leverage plots. Then, the regression models were examined to make sure that all the assumptions presented in Chapter 3 were valid. From this process, the best predictive equation was chosen.

One way to check whether a model is consistent with the physical and chemical processes on the ground is to examine the signs of the predictor coefficients. To determine which sign a predictor coefficient should have, it is important to think back to the removal mechanisms and how they are affected by the predictors and how they affect the predictands. When considering Ce, the slope coefficient should be positive whereas the width, vegetative cover, and percentage of clay content coefficients should be negative. As the slope increases, the effluent concentration should increase because the runoff will be flowing faster over the biofilter strip, which will not allow for much infiltration and will decrease contact time, which in turn limits filtration, sedimentation, adsorption, ion exchange, and precipitation. Thus, the slope predictor coefficient sign should be positive. As the width increases, the effluent concentration should decrease because the wider biofilter strip will allow for more filter area and contact time, which should provide more adsorption and ion exchange sites and more vegetation contact area for particle removal. Therefore, the width predictor coefficient sign should be negative. As vegetative cover increases, the effluent concentration should decrease because the vegetation will slow down the flow of water, allowing for more contact time, and provide more sites for adsorption.
Therefore, the sign of the vegetative coverage predictor coefficient should be negative. As the percentage of clay increases, the effluent concentration should decrease because more adsorption and ion exchange sites are available for the uptake of metals. Thus, the sign of the clay predictor coefficient should be negative. When considering Ci-Ce, smaller Ce values lead to larger Ci-Ce values, so all the expected predictor signs should be reversed from what is expected for Ce.

For both effluent concentration and concentration reduction, it is difficult to determine the expected sign for antecedent dry days, rainfall duration, and total event precipitation. As the antecedent dry days increase, the effluent concentration should decrease because during the non-rain inter-event time, the ground dries out, which increases infiltration in the following storm, allowing for more contact with the adsorptive and ion exchange sites within the soil particles. On the other hand, between storms more pollutants build up on the road, which might increase influent concentrations, which may lead to higher effluent concentrations. Thus, the sign of the antecedent dry days predictor coefficient is not obvious.

Although rainfall duration is a good measure of how long it rains and the total event precipitation is a good measure of rainfall volume, neither is a good measure of the intensity of the rain event. If the total event precipitation is high due to a large volume storm with short rainfall duration, water will flow faster and deeper over the strip, limiting the amount of water that comes in contact with soil and vegetation for filtration and sedimentation. From this argument, the sign for the total event precipitation predictor coefficient should be positive and the rainfall duration predictor coefficient should be negative for effluent concentration.
However, if the total event precipitation is high due to long rainfall duration, runoff will have sufficient contact with soil and vegetation and the total event precipitation and rainfall duration coefficients should be positive for effluent concentration. Thus, without further knowledge of the storm event, it is difficult to determine whether the predictor coefficients for rainfall duration and total event precipitation should be positive or negative. Table 5 shows the sign that is expected of the coefficient for each of the predictors.

Table 5 Criteria for Predictor Evaluation of Ce and Ci-Ce

Criteria for Equation Evaluation    Ce    Ci-Ce
Slope                               +     -
Width                               -     +
Veg. Cover                          -     +
% Clay                              -     +
Rainfall Duration                   ?     ?
Total Event Rain                    ?     ?
Antecedent Dry Days                 ?     ?

The set of regression models was narrowed down by eliminating the trials which had two or more coefficient signs that did not match the expected signs and which included insignificant predictors. It was decided not to eliminate those trials that had only one coefficient sign that did not match the expected signs because in only three trials did all coefficient signs match the expected signs. By this process, 17 trials were eliminated for total copper, 14 for dissolved copper, 16 for total zinc, and 34 for dissolved zinc.

The predictors were then evaluated for significance. The significance level was evaluated using the P-value of each predictor and examination of the leverage plot. The equations that contained a predictor with P-values greater than 0.05 were recast with the elimination of the insignificant predictor(s). Different combinations were generated for each metal. For example, in trial 5 for Ce of total copper, strip slope and strip width were insignificant (see Appendix D) and strip slope was insignificant in trial 10. Upon eliminating strip slope and strip width, the MLR was run again and the result is shown as trial 12. A second example is Ci-Ce of total copper.
Trial 2 showed insignificant strip vegetation, rainfall duration, and total event precipitation and trial 13 showed insignificant rainfall duration and total event precipitation. After eliminating these insignificant predictors, the MLR was run again (Trial 14) with only predictors strip width, clay content, and antecedent dry days as predictors. An R2 value of less than 0.250 was produced, so these results are not shown in Table 4a, although the P-values are shown in Appendix D. Total copper had only one equation for Ci-Ce (trial 12) in which all predictor coefficient signs matched the expected values and all predictors were significant. For Ce of dissolved copper, only one equation (trial 12) has only significant predictors. For Ci-Ce of dissolved copper, there were no equations with two or fewer unexpected coefficient signs and all predictors being significant. For Ce of total zinc, only in trial 12 were all predictors significant and all signs matched expectations. For CiCe of total zinc, trials 4 and 11 have all significant predictors, but trial 4 has 3 unexpected signs while trial 11 has only two. For Ce of dissolved zinc, six trials had all significant predictors, but only trial 42 had no unexpected coefficient signs. For Ci-Ce of dissolved zinc, there were no equations with two or fewer unexpected coefficient signs and all significant predictors. In the end, only seven equations were produced in which all the predictors were significant. These regression models were then checked for outliers, which are data points that are distant from the remainder of the data and can skew the results of a predictive equation. The outlier analysis was performed in JMP® by using the Multivariate function. Mahalanobis distance, which is a distance measure based on 41 correlations between predictors by which different patterns can be identified and analyzed, was used to find outliers. 
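The outlier screen can be illustrated with a short sketch. This is not the JMP® Multivariate implementation; it is a minimal, standard-library version that assumes the distance of each observation is measured from the sample mean using the sample covariance matrix. The function names (`mahalanobis_distances`, `flag_outliers`) are hypothetical, and the default cutoff of 3.25 is the value used in this study.

```python
import math

def _solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mahalanobis_distances(rows):
    """Distance of each row from the sample mean, scaled by the sample covariance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    distances = []
    for c in centered:
        z = _solve(cov, c)  # z = cov^-1 · c
        distances.append(math.sqrt(sum(ci * zi for ci, zi in zip(c, z))))
    return distances

def flag_outliers(rows, cutoff=3.25):
    """True for every row whose Mahalanobis distance exceeds the cutoff."""
    return [d > cutoff for d in mahalanobis_distances(rows)]
```

Rows flagged `True` would be the candidates for removal before the regression is re-run.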
By using the Mahalanobis distance, outliers with a distance greater than 3.25 were eliminated and the models were re-run. For Ci-Ce of total copper, the R2 value for trial 12 dropped below 0.250 when outliers were removed so this equation was eliminated. For Ci-Ce of total zinc, the rainfall duration in trial 11 was no longer significant and slope, width, and rainfall duration in trial 4 were no longer significant once the outliers were removed. The remaining four regression models were evaluated to verify the MLR assumptions described in Chapter 3. For the assumption of linearity, constant variance, and homoscedasticity, the residual versus predicted plot was visually inspected. Figure 6 shows the residual versus predicted plot for total zinc trial 12 with the ln(Ce) predictand and the vegetation coverage, rainfall duration, total event rain, and antecedent dry days predictors. The evenly scattered data points about the line at zero and good dispersal of the data points show that the linearity, constant variance, and homoscedasticity assumptions have been met. All of the regression models checked in this way had residual versus predicted plots with evenly scattered data points about the line at zero and good dispersal of the data points, thus verifying the assumptions. 42 Figure 6 Total Zinc Residual by Predicted Plot (trial 12 for Ce) The assumption of zero mean for residuals was not an issue because the least squares method of estimating regression equations guarantees that the mean of the residuals is zero. The remaining four equations were then checked for collinearity. The Variance Inflation Factor (VIF) must be less than five to say that there is no collinearity within the predictors. In all the regression models, the VIF values were less than two. Normality was tested using the Distribution function in JMP® in which the residuals of ln(Ce) were evaluated using the Shapiro-Wilks test. 
Only the residuals of ln(Ce) were used since the remaining equations are all for Ce. To show normal distribution, the Shapiro-Wilk P-value must be greater than 0.05, in which case the null hypothesis that the data are from a normal distribution is not rejected. All of the regression models evaluated have P-values greater than 0.05, so all exhibit normality. Figure 7 shows normality test results for total zinc trial 12.

Goodness-of-Fit Test, Shapiro-Wilk W Test: W = 0.988357, Prob<W = 0.0736
Figure 7 Total Zinc Normality Test for ln(Ce) (trial 12)

The final four equations, all using the ln(Ce) predictand, are trial 12 for total copper, trial 12 for dissolved copper, trial 12 for total zinc, and trial 42 for dissolved zinc. The predictive equations produced from these trials are shown below. In these equations V is the strip vegetation coverage (%), RD is the rainfall duration (hours), EP is the total event precipitation (mm), and DD is the antecedent dry days (days). Equation 8 (dissolved copper) varies from the others in that it does not contain total event precipitation. This is discussed further in Section 4.3. Also shown below are plots of predicted Ce vs. actual Ce, which show how well the predicted values match the actual values. The JMP® reports for the chosen equations can be found in Appendix C.

Total Copper (R2 = 0.523):
$$C_e = \frac{5.846\,(DD)^{0.427}}{(V)^{0.599}\,(RD)^{0.287}\,(EP)^{0.269}} \tag{7}$$

Figure 8 Total Copper Predicted Ce vs. Actual Ce (actual Ce in μg/L plotted against predicted Ce in μg/L, showing the MLR results and the line Ce Predicted = Ce Actual)

Dissolved Copper (R2 = 0.348):
$$C_e = \frac{5.075\,(DD)^{0.322}}{(V)^{0.753}\,(RD)^{0.212}} \tag{8}$$

Figure 9 Dissolved Copper Predicted Ce vs. Actual Ce (actual Ce in μg/L plotted against predicted Ce in μg/L, showing the MLR series and the line Ce Predicted = Ce Actual)

Total Zinc (R2 = 0.483):
$$C_e = \frac{9.130\,(DD)^{0.457}}{(V)^{1.062}\,(RD)^{0.265}\,(EP)^{0.363}} \tag{9}$$

Figure 10 Total Zinc Predicted Ce vs. Actual Ce (actual Ce in μg/L plotted against predicted Ce in μg/L, showing the MLR series and the line Ce Predicted = Ce Actual)

Dissolved Zinc (R2 = 0.470):
$$C_e = \frac{6.277\,(DD)^{0.341}}{(V)^{0.630}\,(RD)^{0.150}\,(EP)^{0.286}} \tag{10}$$

Figure 11 Dissolved Zinc Predicted Ce vs. Actual Ce (actual Ce in μg/L plotted against predicted Ce in μg/L, showing the MLR series and the line Ce Predicted = Ce Actual)

In all of the chosen equations, the predictor exponent signs match the signs expected from the physical mechanisms (Table 5).

4.2 Multiple Linear Regression for First Order Decay

Multiple linear regression (MLR) was performed using the natural logarithm of the first order decay coefficient (ln(k)) as the predictand. The predictors investigated include strip slope, vegetation coverage, clay content, rainfall duration, total event precipitation, and antecedent dry days. The strip width was not used in the MLR since it was used in the calculation of the k value. Table 6 presents the estimated coefficients of the regression model trials for total copper, dissolved copper, total zinc, and dissolved zinc. The highest coefficient of determination (R2) value obtained was 0.227 with most of the values being less than 0.150. The R2 value of 0.227 means that only 22.7% of the variation in ln(k) can be attributed to the linear model.

Total Zinc Dissolved Copper Total Copper Table 6 Coefficient Values for Total Copper, Dissolved Copper, Total Zinc, and Dissolved Zinc Trial ln ln rainfall No.
Predictand ln slope vegetation ln % clay duration 1 ln k 0.165 0.609* (0.934)* 0.031 2 ln k 0.622* (0.736)* 0.037 3 ln k 0.223 (1.117)* 0.046 4 ln k (0.105) 0.665* 0.092 5 ln k 0.166 0.610* (0.947)* 6 ln k 0.158 0.608* (0.942)* (0.053) 7 ln k (0.115) 0.665* 8 ln k 0.213 (1.105)* 9 ln k 0.618* (0.733)* 1 ln k 0.281* 0.278* (0.769)* 0.268* 2 ln k 0.307* (0.401)* 0.293* 3 ln k 0.308* (0.829)* 0.280* 4 ln k 0.058 0.317* 0.323* 5 ln k 0.311* 0.293* (0.879)* 6 ln k 0.295* 0.314* (0.803)* 0.016 7 ln k 0.060 0.358* 8 ln k 0.329* (0.884)* 9 ln k 0.345* (0.432)* 1 ln k (0.103) 0.894* (0.252) 0.279* 2 ln k 0.885* (0.373) 0.275* 3 ln k (0.004) (0.548) 0.285* 4 ln k (0.174) 0.909* 0.295* 5 ln k (0.087) 0.896* (0.358) 6 ln k (0.113) 0.896* (0.289) (0.015) 7 ln k (0.195) 0.914* 8 ln k (0.015) (0.580) 9 ln k 0.905* 0.304* ln event rain (0.147) (0.143) (0.143) (0.155) (0.130) (0.424)* (0.431)* (0.441)* (0.435)* (0.264)* (0.503)* (0.504)* (0.507)* (0.506)* (0.350)* (0.513)* ln Antecedent Dry Days R2 value 0.153* 0.156 0.135* 0.151 0.068 0.07 0.106 0.116 0.156* 0.155 0.145* 0.149 0.098 0.108 0.052 0.063 0.119 0.144 0.040 0.133 0.005 0.113 0.006 0.114 0.000 0.100 0.064 0.110 0.025 0.750 (0.008) 0.037 (0.012) 0.051 (0.007) 0.053 0.170* 0.227 0.180* 0.226 0.062 0.076 0.157* 0.225 0.203* 0.211 0.156* 0.172 0.142* 0.169 0.045 0.02 0.168* 0.218 49 Dissolved Zinc Table 6 (continued) Trial ln ln rainfall ln event No. Predictand ln slope vegetation ln % clay duration rain 1 ln k 0.009 (0.144) 0.157 0.182 0.117 2 ln k (0.144) 0.169 0.183 0.116 3 ln k (0.004) 0.191 0.171 0.135* 4 ln k 0.054 (0.152) 0.169 0.127* 5 ln k 0.040 (0.126) 0.071 0.141* 6 ln k 0.026 (0.094) 0.120 (0.017) 0.121 7 ln k 0.059 (0.102) 0.125* 8 ln k 0.013 0.155 0.129* ln Antecedent Dry Days R2 value (0.335)* 0.061 (0.335)* 0.061 (0.318)* 0.055 (0.332)* 0.059 (0.230)* 0.050 0.023 0.022 0.021 Even though the R2 values are low, the regression models were evaluated for the validity of the assumptions. 
For the assumptions of linearity, constant variance, and homoscedasticity, the residual versus predicted plots were visually inspected. These plots do not show a good dispersal of the data as seen in Figures 4 and 5, meaning the linearity, constant variance, and homoscedasticity assumptions cannot be verified as valid for this model. Figure 12 shows the residual versus predicted plot for total zinc with the ln(k) predictand and strip slope, vegetation coverage, clay content, rainfall duration, total event precipitation, and antecedent dry days predictors. The assumption of zero mean is not a problem because the least squares method of estimating regression equations guarantees that the mean of the residuals is zero. As before, the Variance Inflation Factor (VIF) must be less than five to say that there is no collinearity amongst the predictors and in all the ln(k) regression models, the VIF values were less than two. Normality was checked using the same process as previously mentioned. All of the regression models for ln(k), have Shapiro-Wilks P-values less than 0.05 with least being less than 0.001. 50 Thus, normality cannot be validated for these regression models. Figure 13 shows the normality test results for total zinc. Figure 12 Total Zinc Residual by Predicted Plot (trial 8) Goodness-of-Fit Test Shapiro-Wilk W Test W 0.871170 Prob<W <.0001* Figure 13 Total Zinc Normality Test for ln(k) (trial 8) With the low R2 values and all assumptions being rejected except for zero mean and collinearity, it was determined that even though equation (3) may be a valid representation of first order decay for a vegetated biofilter strip, multiple linear regression is not a useful tool for predicting the first order decay coefficient, k. 51 4.3 Discussion There are several conceptual difficulties with the chosen predictive equations. 
Although equations (7) through (10) were found to be the best fit regression models based on R2 values, expected sign of the predictor coefficients, and validity of the assumptions, the highest R2 value was only 0.523, meaning that the best model accounts for only about half of the variability in the data set. One factor that could affect the goodness-of-fit is variability in the raw data collected at the RVTS sites. These are field measurements made under storm conditions at different locations over several years. In a multiple linear regression analysis of Caltrans highway runoff data from edge of pavement, Kayhanian et al. (2003) concluded that regression models are generally limited to the region or site from which the original data used to develop the model were collected and tend to be unreliable when applied to conditions beyond the range of the original data set. Since the MLR in this study combined study sites from different regions of California, regional variability may be a factor affecting the R2 values.

Another problem with the predictive equations is their apparent conflict with physical process. Although the equations are mathematically correct based on the MLR assumptions, when predicted values of effluent concentration calculated from equations (7) through (10) are graphed (Figure 14) against vegetative cover of the strip, dissolved copper and dissolved zinc show higher effluent concentrations than total copper and total zinc. One possible cause could be the exclusion of influent concentration from this study, which may change the MLR results enough to give higher dissolved metals than total metals. The dissolved metals would be expected to be more sensitive to influent quality because the strips provide less treatment for them. Because the total values include the dissolved values, these results cannot be accurate.
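The conflict can be reproduced with a short calculation. The sketch below is illustrative only: it assumes equations (7) through (10) are read with antecedent dry days in the numerator and vegetation coverage, rainfall duration, and total event precipitation in the denominator, with the leading constants taken as printed. The function names are hypothetical, and the storm values are those stated for Figure 14 (1 antecedent dry day, 1 hour of rainfall, 24 mm of precipitation).

```python
def total_copper(v, rd, ep, dd):
    """Equation (7) as printed: predicted Ce for total copper."""
    return 5.846 * dd**0.427 / (v**0.599 * rd**0.287 * ep**0.269)

def dissolved_copper(v, rd, dd):
    """Equation (8) as printed: no total event precipitation term."""
    return 5.075 * dd**0.322 / (v**0.753 * rd**0.212)

def total_zinc(v, rd, ep, dd):
    """Equation (9) as printed: predicted Ce for total zinc."""
    return 9.130 * dd**0.457 / (v**1.062 * rd**0.265 * ep**0.363)

def dissolved_zinc(v, rd, ep, dd):
    """Equation (10) as printed: predicted Ce for dissolved zinc."""
    return 6.277 * dd**0.341 / (v**0.630 * rd**0.150 * ep**0.286)

# Assumed storm conditions, matching the Figure 14 caption.
DD, RD, EP = 1.0, 1.0, 24.0

for v in (20.0, 40.0, 60.0, 80.0):  # vegetation coverage in percent
    cu_t = total_copper(v, RD, EP, DD)
    cu_d = dissolved_copper(v, RD, DD)
    zn_t = total_zinc(v, RD, EP, DD)
    zn_d = dissolved_zinc(v, RD, EP, DD)
    print(f"V={v:5.1f}%  Cu total={cu_t:.3f} dissolved={cu_d:.3f}  "
          f"Zn total={zn_t:.3f} dissolved={zn_d:.3f}")
```

Under these assumptions the dissolved prediction exceeds the total prediction for both metals at each coverage value in the loop, which is the inconsistency described above.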
Figure 14 Predicted Ce vs. Vegetation Cover (assumed values were antecedent dry days at 1 day, rainfall duration at 1 hour, and total event precipitation at 24 mm). The figure plots predicted Ce (μg/L) against vegetation cover (%) for total copper, dissolved copper, total zinc, and dissolved zinc.

Additionally, antecedent dry days were found to have a positive coefficient; the strip slope, strip width, and clay content predictors are not included in any of the equations; and the total event precipitation predictor is not included in the equation for dissolved copper. The positive antecedent dry days coefficient means as antecedent dry days goes up, the effluent concentration also goes up. Antecedent dry days were usually a significant predictor in the equations with P-values of <0.0001, which is highly indicative of a coefficient that is not zero. In most cases when the antecedent dry days predictor was removed from the regression model, a lower R2 value was obtained and other predictors that were significant became non-significant. Thus, even though the result seems counter-intuitive, it seemed fitting to include antecedent dry days in the predictive equation. Kayhanian et al. (2003) found that longer antecedent dry periods tend to result in higher pollutant concentrations in storm runoff, which is consistent with the “buildup” of pollutants during dry periods. This could be the cause of the positive coefficient seen in equations (7) through (10).

Strip slope and strip width were not included in the predictive equations because they surprisingly did not often prove to be significant predictors. In those regression models where they were significant predictors, their coefficients had the opposite sign from what was expected. The effluent concentration was expected to have a positive strip slope coefficient because as the slope gets steeper, the runoff moves more quickly over the strip minimizing the contact time and treatment.
It does not make physical sense for the effluent concentration to go down as slopes get steeper. Conversely, the effluent concentration was expected to have a negative strip width coefficient because as the strip gets wider, the runoff contact time increases. Thus, it would not make physical sense that the strip width coefficient would be positive. The strips may be acting oddly with respect to strip slope and width due to the water films being shallow and moving so fast because of steep slopes that only minimal treatment is occurring. Thus, the effects of slope and width are not apparent. In Chapter 2 various references were cited suggesting that biofilter slopes should be between 2% and 15% and that the minimum width should be 24 feet. In the Caltrans RVTS study, three sites have slopes of 33%, 50%, and 52% and 14 of the 23 strips have widths less than 24 feet. In a future study, the regression models should be based only on strip slopes and widths that are within the recommended ranges. 54 The clay content was a significant predictor in most of the regression models, but it always had the opposite sign from what was expected. Possibly, if the clay content is too great, the runoff will not be able to infiltrate well and the water film on the surface will be thicker, providing less contact with vegetation and soil and less treatment. This is speculation. Nevertheless, when clay content was eliminated from the predictive equation, the R2 of the revised model did not change substantially. For this reason, it was decided to exclude this predictor from the predictive equation. The total event precipitation was not a significant predictor in the equation for dissolved copper. When it was excluded from the regression model for dissolved copper, the R2 value did not decrease by much and the P-value of predictors that were significant, got closer to 0.05 meaning predictors became less significant. 
Dissolved copper consistently had the most data variability and the fewest number of regression models with R2 values greater than 0.250. If data variability is higher in dissolved copper than the other metals, this could cause one or more predictors such as total event precipitation to drop out of the dissolved copper predictive equation even though it remained a significant predictor in the predictive equations for the other metals. One factor that might be an important influence on equation performance is influent concentration. There is an intuitive sense that effluent concentration may be directly related to influent concentration. When Caltrans performed the preliminary MLR study, influent concentration was included as a predictor and was consistently found to be a significant one (Caltrans, 2008). Although influent concentration was not included as a predictor in this project, it was included as a predictand in Ci-Ce and Ce/Ci. 55 The regression models for Ce/Ci produced R2 values that were all less than 0.250 thus they were not further evaluated. For the models of Ci-Ce, very few produced predictor coefficients with the expected signs. When the regression models that did follow the sign convention were analyzed and outliers were eliminated, two or more of the predictors were no longer significant. One possibility for the relatively poor fit from Ce/Ci and Ci-Ce models is that they rely on two accurately measured concentrations. As noted earlier, there is much variation in the original data. Because Ce models rely on only one concentration measurement, there may be less error in the predictand values. With regards to the first order model, although Kadlec (1999) was able to define a first-order decay equation for the performance of pollutants in wetlands, multiple linear regression did not prove to be an effective tool in predicting the first order decay coefficient for metal removal in vegetative biofilter strips. 
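For reference, the first-order fit behind each decay coefficient can be sketched as follows. Appendix B describes k as the slope of the line with ln(Ce) on the y-axis and strip width on the x-axis, computed with the Excel slope function; the sketch below is a standard-library equivalent that also returns the R2 of that fit. The function name `first_order_fit` is hypothetical, and the sign convention is an assumption: if the model has the form Ce = Ci·exp(−k·width), then k is the negative of the returned slope.

```python
import math

def first_order_fit(widths, ce_values):
    """Least-squares line through (width, ln Ce).

    Returns (slope, intercept, r_squared). Under the assumed model
    Ce = Ci * exp(-k * width), the decay coefficient is k = -slope.
    """
    ys = [math.log(c) for c in ce_values]
    n = len(widths)
    mean_x = sum(widths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in widths)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(widths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 of a simple linear fit; defined as 1 when ln(Ce) does not vary.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

A screening step of the kind suggested in this discussion would keep only those storm and site combinations whose `r_squared` exceeds a chosen threshold before running the MLR on ln(k).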
The first-order decay model assumes plug flow conditions on the vegetated slopes. In reality there is significant potential for channelized flow caused by motor vehicle tracks from maintenance equipment or vehicles leaving the pavement. Burrowing wildlife can also disrupt plug flow conditions on the slopes (Caltrans, 2004). The assumption in this MLR study that all the data fit a first-order model was not always valid, as seen for total copper, where 53% of the R2 values for the first-order decay fit were below 0.85. In further studies, only those cases where the R2 for the first-order fit is greater than a specified value should be used in MLR studies. This may cause the MLR fit to be considerably better.

Chapter 5
CONCLUSION

Based on the Caltrans RVTS study data, MLR was used to formulate equations that might be used in vegetated biofilter strip designs to predict copper and zinc removal rates or effluent concentrations using design parameters such as strip slope, strip width, vegetation cover, percentage of clay content in the soil, rainfall duration, total event precipitation, and antecedent dry days. Predictive equations were formulated for effluent concentration (Ce) for total and dissolved copper and total and dissolved zinc. Although these equations are mathematically correct because they meet the MLR assumptions, the R2 values are relatively low, meaning they explain only a part of the observed variation in the data. In addition, these equations predict that the dissolved metals concentrations will be greater than total metals concentrations, which is physically impossible. Consequently, these equations are unreliable predictors of effluent concentration.

MLR was also used to predict first-order decay coefficients derived from the same data set. The predictand was ln(k) and the same predictors, except strip width, were used. None of the MLR equations produced R2 values above 0.23, and the MLR assumptions were violated.
These equations are also unreliable. Further research should be conducted to determine if including influent concentrations as a predictor in the MLR would improve the resulting equations.

APPENDIX A
RVTS Storm Data

This appendix contains the RVTS storm data that were obtained from the Caltrans database and used in the calculations found in Appendix B. The storm data include the following:

Station Name
Station ID
Collection Date
Fraction (Total or Dissolved)
Constituent (Copper or Zinc)
Reported Value (concentration)
Precipitation Start Date
Precipitation Start Time
Precipitation End Date
Precipitation End Time
Event Rain (mm)
Antecedent Dry Days

Due to the quantity of storm data, this information can be found on the attached CD.

APPENDIX B
RVTS Input Data for MLR

This appendix contains the calculations used to prepare the inputs used in the multiple linear regression. These include the predictands and predictors and their natural logarithm values. The following predictands were calculated in Excel and are included in this appendix:

Ce is equal to the reported value of concentration.
Ci-Ce is calculated by taking the edge of pavement reported value of concentration as Ci and subtracting the reported value of concentration for the strip of interest.
Ce/Ci is the reported value of concentration for the strip of interest divided by the edge of pavement reported value, which is taken as Ci.
First order decay coefficient, k, is the slope of the line with ln(Ce) on the y-axis and strip width on the x-axis. This value was calculated by using the slope function in Excel.

Due to the quantity of storm data, this information can be found on the attached CD.

APPENDIX C
Multiple Linear Regression Reports from JMP®

The JMP® reports in the printed part of this appendix coincide with equations (7), (8), (9) and (10), respectively. The remainder of the JMP® reports, coinciding with the MLR trials summarized in Appendix D, can be found on the attached CD.
To provide a better understanding of the various data and plots shown on the JMP® reports, an explanation of the different components is presented below.

Summary of Fit Table

“R Square” (R2) is the coefficient of determination, which is the proportion of variance accounted for, explained, or described by the regression model. If the regression is perfect, the R2 value is one. If the regression is a failure and the sum of squares of errors equals the total sum of squares, no variance is accounted for by regression and R2 is zero.

“R Square Adj” is the adjusted R2 value, which measures the proportion of the variation in the predictand accounted for by the predictors. Adjusted R2 allows for the degrees of freedom associated with the sums of squares.

“Root Mean Square Error” is the square root of the Error mean square. It estimates the standard deviation of the differences between the values predicted by the model and the observed values being modeled.

“Mean of Response” is the overall mean of the observed values of the predictand. The mean of response is used in the plots as a horizontal reference line.

“Observations” is the number of data points used in the regression model.

Analysis of Variance Table

“Source” has three rows, one for total variability and one for each of the two pieces comprising the total, Model or Regression, and Error or Residual. The “C” in “C Total” stands for corrected.

“DF” is the degrees of freedom. For the Model, the number of degrees of freedom is the number of predictors used for the regression. For the Error, the degrees of freedom is the number of observations minus the number of predictors minus 1. The degrees of freedom for C Total is the sum of the degrees of freedom for the Model and the Error.

“Sum of Squares” measures variability and is the quantity minimized when finding the function that best fits the data. The total variability in the response is calculated from $\sum (y-\bar{y})^2$, where $\bar{y}$ is the sample mean. The “Corrected” in “C Total” refers to subtracting the sample mean before squaring. The total sum of squares is the variation in the data that cannot be accounted for when the sample mean alone is used for prediction. When the regression model is used for prediction, the uncertainty that remains is the variability about the regression line, $\sum (y-\hat{y})^2$, where $\hat{y}$ is the predicted value of the predictand. This is the Error sum of squares. The difference between the Total sum of squares and the Error sum of squares is the Model sum of squares, which is equal to $\sum (\hat{y}-\bar{y})^2$.

“Mean Squares” are the sums of squares divided by the corresponding degrees of freedom.

“F Ratio” is the test statistic used to decide whether a model as a whole has a statistically significant predictive capability, that is, whether the regression sum of squares is big enough, considering the number of variables needed to achieve it. F is the ratio of the Model mean square to the Error mean square.

“Prob>F” is the probability of obtaining an F ratio at least as large as the one observed if the null hypothesis for the full model were true. The null hypothesis is that all of the regression coefficients are zero. The lower the value of Prob>F, the stronger the evidence for rejecting the null hypothesis, which means that some of the regression coefficients are not zero and that the regression equation does have validity in fitting the data (i.e. the predictors are not purely random with respect to the predictand).

Parameter Estimates Table

“Term” is the predictor.

“Estimate” gives the regression coefficients in the regression equation.

“Standard Error” is an estimate of the standard deviation of the regression coefficients.

“t ratio” tests the hypothesis that a population regression coefficient is 0 when the other predictors are in the model. It is the ratio of the sample regression coefficient to its standard error.
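As a worked check of these definitions, the sketch below recomputes several Summary of Fit and Analysis of Variance entries from the sums of squares and degrees of freedom in the JMP® report for equation (7) (total copper), along with one t ratio from its Parameter Estimates table. The input values are copied from that report; the variable names are illustrative.

```python
import math

# Values copied from the JMP report for equation (7), total copper.
ss_model, ss_error, ss_total = 85.37685, 77.84759, 163.22444
df_model, df_error = 4, 210
df_total = df_model + df_error            # C. Total degrees of freedom (214)

ms_model = ss_model / df_model            # Model mean square
ms_error = ss_error / df_error            # Error mean square

r_square = ss_model / ss_total            # JMP reports 0.523064
r_square_adj = 1 - ms_error / (ss_total / df_total)  # JMP reports 0.51398
rmse = math.sqrt(ms_error)                # JMP reports 0.608854
f_ratio = ms_model / ms_error             # JMP reports 57.5777

# t ratio for ln Vegetation: estimate divided by its standard error.
t_vegetation = -0.598525 / 0.099502       # JMP reports -6.02

print(f"R2={r_square:.6f}  R2 adj={r_square_adj:.5f}  RMSE={rmse:.6f}")
print(f"F={f_ratio:.4f}  t(ln Vegetation)={t_vegetation:.2f}")
```

The recomputed values agree with the reported ones to the precision shown in the report, which confirms how the table entries relate to one another.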
“Prob>|t|” or “P-value” labels the observed significance levels for the t statistics. The P-value indicates whether a predictor has statistically significant predictive capability in the presence of the other predictors. If the P-value is less than the significance level, in this case 0.05, then the null hypothesis that the coefficient of the predictor is 0 is rejected. The asterisk (*) next to this value indicates a significant value.

“VIF” is the Variance Inflation Factor. This statistical measure is used to identify collinearity amongst the predictor variables. If the VIF is greater than five, one or more predictors are said to be collinear.

Plots

“Whole Model Actual by Predicted Plot” shows the observed values of y versus its predicted values for the hypothesis that all the predictors for the model are 0. The plot portrays the observation-by-observation composition of the regression sum of squares. This plot can be used to detect linearity, constant variance, or homoscedasticity within the regression model as shown in Chapter 3.

“Residual by Predicted Plot” shows the actual residual values of y versus its predicted values in reference to 0. This plot is used to detect linearity, constant variance, or homoscedasticity within the regression model as shown in Chapter 3.

“Leverage Plot” shows a point-by-point composition of the sum of squares for a hypothesis test. The leverage plot is useful for identifying likely candidate points that might be influential to the hypothesis. This plot shows the confidence limits for the expected value as a function of the predictor. The confidence curve is a hyperbola whose properties make it a useful significance-measuring instrument. If the slope predictor is significantly different from 0, the confidence curve will cross the horizontal line of the response mean. If the slope predictor is not significantly different from 0, the confidence curve will not cross the horizontal line of the response mean.
If the t test for the slope parameter is sitting right on the margin of significance, the confidence curve will have the horizontal line of the response mean as an asymptote.

JMP® report coinciding with equation (7) for total copper:

Response ln Ce

[Plot: Whole Model Actual by Predicted Plot]

Summary of Fit
  RSquare                       0.523064
  RSquare Adj                   0.51398
  Root Mean Square Error        0.608854
  Mean of Response              2.592512
  Observations (or Sum Wgts)    215

Analysis of Variance
  Source      DF    Sum of Squares   Mean Square   F Ratio    Prob > F
  Model        4          85.37685       21.3442   57.5777    <.0001*
  Error      210          77.84759        0.3707
  C. Total   214         163.22444

Parameter Estimates
  Term                      Estimate     Std Error   t Ratio   Prob>|t|   VIF
  Intercept                 5.8459517    0.47692      12.26    <.0001*    .
  ln Vegetation             -0.598525    0.099502     -6.02    <.0001*    1.0938455
  ln Rainfall Duration      -0.286517    0.067715     -4.23    <.0001*    1.7214278
  ln Event Rain             -0.268827    0.067373     -3.99    <.0001*    1.6478328
  ln Antecedent Dry Days    0.4265024    0.041584     10.26    <.0001*    1.2043856

[Plot: Residual by Predicted Plot]
[Plot: ln Vegetation Leverage Plot]
[Plot: ln Rainfall Duration Leverage Plot]
[Plot: ln Event Rain Leverage Plot]
[Plot: ln Antecedent Dry Days Leverage Plot]

JMP® report coinciding with equation (8) for dissolved copper:

Response ln Ce

[Plot: Whole Model Actual by Predicted Plot]

Summary of Fit
  RSquare                       0.348001
  RSquare Adj                   0.341831
  Root Mean Square Error        0.682306
  Mean of Response              1.912928
  Observations (or Sum Wgts)    321

Analysis of Variance
  Source      DF    Sum of Squares   Mean Square   F Ratio    Prob > F
  Model        3          78.76837       26.2561   56.3991    <.0001*
  Error      317         147.57668        0.4655
  C. Total   320         226.34505

Parameter Estimates
  Term                      Estimate     Std Error   t Ratio   Prob>|t|   VIF
  Intercept                 5.0749741    0.462135     10.98    <.0001*    .
  ln vegetation             -0.752602    0.099201     -7.59    <.0001*    1.0598281
  ln rainfall duration      -0.212181    0.049144     -4.32    <.0001*    1.0316266
  ln antecedent dry days    0.3223368    0.040169      8.02    <.0001*    1.090435

[Plot: Residual by Predicted Plot]
[Plot: ln vegetation Leverage Plot]
[Plot: ln rainfall duration Leverage Plot]
[Plot: ln antecedent dry days Leverage Plot]

JMP® report coinciding with equation (9) for total zinc:

Response ln Ce

[Plot: Whole Model Actual by Predicted Plot]

Summary of Fit
  RSquare                       0.483113
  RSquare Adj                   0.473883
  Root Mean Square Error        0.715722
  Mean of Response              3.705415
  Observations (or Sum Wgts)    229

Analysis of Variance
  Source      DF    Sum of Squares   Mean Square   F Ratio    Prob > F
  Model        4         107.24801       26.8120   52.3409    <.0001*
  Error      224         114.74567        0.5123
  C. Total   228         221.99369

Parameter Estimates
  Term                      Estimate     Std Error   t Ratio   Prob>|t|   VIF
  Intercept                 8.167073     0.534948     15.27    <.0001*    .
  ln Vegetation             -0.863429    0.110869     -7.79    <.0001*    1.0530361
  ln Rainfall Duration      -0.257729    0.080265     -3.21    0.0015*    1.6785512
  ln Event Rain             -0.307068    0.077382     -3.97    <.0001*    1.608196
  ln Antecedent Dry Days    0.4108706    0.046002      8.93    <.0001*    1.1338568

[Plot: Residual by Predicted Plot]
[Plot: ln Vegetation Leverage Plot]
[Plot: ln Rainfall Duration Leverage Plot]
[Plot: ln Event Rain Leverage Plot]
[Plot: ln Antecedent Dry Days Leverage Plot]

JMP® report coinciding with equation (10) for dissolved zinc:

Response ln Ce

[Plot: Whole Model Actual by Predicted Plot]

Summary of Fit
  RSquare                       0.470214
  RSquare Adj                   0.461119
  Root Mean Square Error        0.606438
  Mean of Response              3.008192
  Observations (or Sum Wgts)    238

Analysis of Variance
  Source      DF    Sum of Squares   Mean Square   F Ratio    Prob > F
  Model        4          76.05412       19.0135   51.7000    <.0001*
  Error      233          85.68959        0.3678
  C. Total   237         161.74371

Parameter Estimates
  Term                            Estimate     Std Error   t Ratio   Prob>|t|   VIF
  Intercept                       6.2769443    0.445724     14.08    <.0001*    .
  ln Strip Vegetation             -0.630037    0.09109      -6.92    <.0001*    1.0957188
  ln rainfall duration            -0.150002    0.063755     -2.35    0.0195*    1.6422912
  ln total event precipitation    -0.285506    0.064922     -4.40    <.0001*    1.5883739
  ln antecedent dry days          0.3406691    0.037233      9.15    <.0001*    1.1712607

[Plot: Residual by Predicted Plot]
[Plot: ln Strip Vegetation Leverage Plot]
[Plot: ln rainfall duration Leverage Plot]
[Plot: ln total event precipitation Leverage Plot]
[Plot: ln antecedent dry days Leverage Plot]

APPENDIX D

MLR Trials

This appendix contains the results obtained from the multiple linear regression models. In the tables presented below, the predictand is listed in the left column and the predictors are listed across the top. For each predictor that was used in a particular MLR model, the cell contains the probability calculated from its t ratio. This value comes from the JMP® reports described in Appendix C, where it is listed in the Parameter Estimates table as Prob>|t|. If this value is significant as defined in Appendix C, it has an asterisk (*) next to it. The far right column gives the associated coefficient of determination (R2), which is the proportion of variability in a data set that is accounted for by the statistical model.

D-1 TOTAL COPPER
Trial No.
Predictand ln slope ln width 0.0842 0.0282* ln Ce 1 0.0007* ln Ce 2 0.0003* ln Ce 3 0.2524 0.0065* ln Ce 4 0.3008 0.743 ln Ce 5 0.0885 0.0234* ln Ce 6 0.0426* 0.0385* ln Ce 7 0.117 0.4595 ln Ce 8 0.0804 0.0101* ln Ce 9 0.6413 ln Ce 10 ln Ce 11 ln Ce 12 0.1265 0.2552 ln Ci-Ce 1 0.0072* ln Ci-Ce 2 ln Ci-Ce 0.0129* 3 0.1119 0.2776 ln Ci-Ce 4 0.0608 0.0717 ln Ci-Ce 5 0.1296 0.2648 ln Ci-Ce 6 0.1641 0.2258 ln Ci-Ce 7 0.1284 ln Ci-Ce 0.0420* 8 0.231 0.248 ln Ci-Ce 9 ln Ci-Ce <0.0001* 10 ln Ci-Ce 11 ln Ci-Ce 0.0228* 12 0.0066* ln Ci-Ce 13 0.0128* ln Ci-Ce 14 0.4005 0.0032* ln C/Ci 1 0.0046* ln C/Ci 2 ln C/Ci <0.0001* 3 0.576 0.0011* ln C/Ci 4 0.3712 0.0005* ln C/Ci 5 0.4 0.0033* ln C/Ci 6 0.4291 0.0030* ln C/Ci 7 0.4263 0.0004* ln C/Ci 8 0.6419 0.0011* ln C/Ci 9 ln C/Ci <0.0001* 10 ln C/Ci 11 ln C/Ci <0.0001* 12 0.2382 ln k 1 ln k 2 0.1252 ln k 3 0.3595 ln k 4 0.2319 ln k 5 0.257 ln k 6 0.3174 ln k 7 0.1419 ln k 8 ln k 9 ln vegetation <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.5217 0.591 0.4418 0.4038 0.553 0.3832 0.1985 0.3034 0.5309 0.0031* 0.0010* 0.0037* 0.0033* 0.0031* 0.0031* 0.0040* .0042* 0.0016* 0.0049* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* ln event ln Antecedent R2 value Dry Days rain 0.523 <0.0001* <0.0001* 0.513 <0.0001* 0.0001* 0.517 <0.0001* <0.0001* 0.457 <0.0001* 0.0001* 0.452 <0.0001* 0.0002* 0.506 <0.0001* <0.0001* 0.489 <0.0001* 0.305 <0.0001* 0.334 <0.0001* <0.0001* 0.45 <0.0001* 0.0004* 0.0002* 0.488 <0.0001* <0.0001* 0.0036* <0.0001* 0.523 <0.0001* <0.0001* <0.0001* 0.283 <0.0001* 0.2811 0.3695 0.0587 0.278 <0.0001* 0.2483 0.3861 0.0186* 0.273 <0.0001* 0.3846 0.3838 0.0291* 0.282 <0.0001* 0.2182 0.3867 0.0488* 0.267 <0.0001* 0.3251 0.2062 0.28 <0.0001* 0.0428* 0.0363* 0.278 <0.0001* 0.053 0.0653 0.238 <0.0001* 0.257 <0.0001* 0.0171* 0.251 <0.0001* 0.4813 0.1973 0.245 <0.0001* 0.4193 <0.0001* 0.4534 0.25 <0.0001* 0.0088* 0.276 <0.0001* 0.1987 0.3999 0.0161* 0.243 <0.0001* 
0.0162* 0.168 0.0004* 0.5819 0.6521 0.6677 0.136 0.0043* 0.5379 0.7385 0.0606 0.166 0.0003* 0.6391 0.6525 0.6001 0.135 0.0043* 0.6221 0.7154 0.9505 0.168 0.0002* 0.5844 0.5974 0.168 0.0004* 0.3346 0.6102 0.167 0.0005* 0.362 0.6715 0.163 0.0004* 0.132 0.0062* 0.8323 0.165 0.0001* 0.6467 0.5867 0.105 0.0054* 0.7498 0.7969 0.0005* 0.161 0.0002* 0.156 0.0274* 0.1891 0.7855 0.0012* 0.151 0.0454* 0.2031 0.7447 0.0016* 0.07 0.3328 0.2223 0.7018 0.0002* 0.116 0.1244 0.1749 0.4242 0.155 0.0211* 0.1606 0.0008* 0.149 0.0361* 0.5773 0.0011* 0.108 0.1392 0.063 0.4289 0.0002* 0.144 0.0649 0.0014* ln rainfall ln % clay duration <0.0001* 0.0046* <0.0001* 0.0038* <0.0001* 0.0048* <0.0001* 0.0051* 0.0004* <0.0001* <0.0001* <0.0001* 78 D-2 DISSOLVED COPPER Trial No. Predictand ln slope 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln Ce/Ci ln k ln k ln k ln k ln k ln k ln k ln k ln k 0.0001* <0.0001* <0.0001* 0.1165 <0.0001* 0.0003* 0.3223 <0.0001* 0.1506 0.0024* <0.0001* 0.0081* 0.0008* 0.0020* 0.0004* 0.0001* 0.0019* <0.0001* ln width ln vegetation ln % clay 0.5927 0.0317* 0.1535 0.4355 0.5739 0.8502 0.9587 0.4185 0.0743 <0.0001* 0.0411* 0.0521 0.0726 0.1654 0.1954 0.1628 <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0158* 0.0190* 0.0167* 0.0144* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.2164 0.3137 0.2205 0.2852 0.4269 0.3373 <0.0001* 0.0066* 0.0030* 0.4821 0.0028* 0.0057* 0.4772 0.0021* 0.0686 0.0318* 0.067 0.0473* 0.0081* 0.0247 0.0736 0.0283* 0.0933 0.0420* <0.0001* 0.0001* 0.0023* 0.4517 0.1017 0.2855 0.3356 0.4581 0.6428 0.0075* 0.0429* <0.0001* 0.8764 0.5649 0.8282 0.8097 0.9403 0.0526 
0.0249* 0.0144* 0.0023* 0.0294* 0.0122* 0.0173* 0.0034* 0.1068 0.2423 <0.0001* <0.0001* 0.2239 <0.0001* <0.0001* <0.0001* ln event rain <0.0001* <0.0001* <0.0001* <0.0001* 0.0197* 0.0723 0.0112* 0.0075* 0.0692 <0.0001* <0.0001* 0.0002* <0.0001* <0.0001* ln rainfall duration 0.9485 0.3546 0.8252 0.7768 0.933 0.9931 0.7436 0.0005* 0.1568 0.0869 0.0003* 0.6287 0.0004* 0.0003* 0.0003* 0.0003* 0.9134 0.2684 0.0410* 0.0069* 0.5508 0.6505 0.863 0.6368 0.6478 0.8588 0.0024* 0.2972 0.089 0.387 0.2974 0.2632 0.1031 0.1831 0.0005* 0.8801 0.5383 0.8282 0.0084* 0.0039* 0.0030* 0.0060* 0.0039* 0.0011* 0.0016* 0.0110* 0.0005* 0.0215* 0.0002* <0.0001* 0.0004* <0.0001* 0.0149* 0.8791 0.652 0.33 0.3237 0.0032* 0.0014* 0.0023* 0.0004* <0.0001* <0.0001* <0.0001* <0.0001* 0.0004* 0.8288 ln Antecedent Dry Days R2 value <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0002* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0019* <0.0001* <0.0001* <0.0001* 0.0009* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0052* <0.0001* 0.4785 0.9316 0.9168 0.9993 0.2639 0.6664 0.8917 0.8355 0.9066 0.445 0.420 0.445 0.340 0.400 0.440 0.437 0.351 0.308 0.399 0.412 0.348 0.256 0.221 0.244 0.236 0.254 0.256 0.234 0.220 0.203 0.240 0.156 0.195 0.209 0.247 0.142 0.243 0.247 0.216 0.246 0.244 0.213 0.240 0.214 0.046 0.211 0.133 0.113 0.114 0.100 0.110 0.750 0.037 0.051 0.053 79 D-3 TOTAL ZINC Trial No. 
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 Predictand ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln Ci-Ce ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln C/Ci ln k ln k ln k ln k ln k ln k ln k ln k ln k ln slope 0.0645 0.0053* 0.0129* 0.7865 0.0481* 0.1 0.873 0.0193* 0.4437 0.0808 0.3585 0.0337* 0.7724 0.063 0.116 0.9599 0.0463* 0.1084 0.4991 0.4789 0.4873 0.9346 0.4993 0.5457 0.9684 0.5151 0.0368* 0.4962 0.9828 0.1594 0.5661 0.4692 0.126 0.9304 ln width 0.3257 0.0212* 0.7139 0.6324 0.3688 0.1648 0.306 0.322 0.068 0.2884 0.0489* 0.0177* 0.0584 0.1408 0.0711 0.1764 0.0167* 0.0160* 0.0157* 0.0095* 0.0164* 0.0249* 0.0162* 0.0263* ln vegetation ln % clay <0.0001* 0.0001* <0.0001* 0.0009* <0.0001* 0.0002* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0002* <0.0001* <0.0001* <0.0001* <0.0001* 0.0105* <0.0001* 0.0002* 0.0001* <0.0001* 0.0007* 0.0002* <0.0001* <0.0001* <0.0001* 0.0002* <0.0001* 0.0002* 0.0002* <0.0001* <0.0001* <0.0001* 0.0001* <0.0001* 0.9254 0.1492 0.8567 0.2007 0.7621 0.0806 0.1418 0.7799 0.9251 0.1458 0.9202 0.154 0.7562 0.1038 0.5811 0.8123 0.0085* <0.0001* 0.4152 <0.0001* 0.1396 0.1018 <0.0001* <0.0001* 0.2452 <0.0001* 0.365 <0.0001* 0.0863 <0.0001* ln rainfall duration 0.0103* 0.0078* 0.0111* 0.0187* 0.0021* <0.0001* 0.0022* 0.0082* 0.0015* 0.0314* 0.0247* 0.0270* 0.0350* 0.0094* <0.0001* 0.0067* 0.0241* 0.9615 0.9927 0.9834 0.9613 0.8647 0.4008 0.7679 0.9882 0.0268* 0.0287* 0.0384* 0.0179* 0.8886 0.0147* ln event ln Antecedent rain Dry Days R2 value 0.0016* <0.0001* 0.445 0.0023* <0.0001* 0.437 0.0009* <0.0001* 0.442 0.0033* <0.0001* 0.34 0.0025* <0.0001* 0.41 <0.0001* <0.0001* 0.429 <0.0001* 0.421 <0.0001* 0.299 <0.0001* 0.248 0.0018* <0.0001* 0.409 0.0009* <0.0001* 0.424 <0.0001* <0.0001* 0.483 0.0171* <0.0001* 0.416 0.0236* <0.0001* 0.408 
0.0332* <0.0001* 0.407 0.0177* <0.0001* 0.378 0.0219* <0.0001* 0.375 <0.0001* <0.0001* 0.403 <0.0001* 0.4 <0.0001* 0.294 <0.0001* 0.311 0.0525 <0.0001* 0.359 0.0321* <0.0001* 0.405 0.1972 0.0006* 0.121 0.2093 0.0002* 0.119 0.3314 0.0002* 0.099 0.1961 0.0004* 0.121 0.2041 0.0002* 0.113 0.1234 0.0004* 0.121 0.0006* 0.115 0.0003* 0.103 0.0006* 0.112 0.3594 <0.0001* 0.088 0.3371 0.0002* 0.098 <0.0001* 0.0220* 0.227 <0.0001* 0.0132* 0.226 0.0002* 0.79 0.076 <0.0001* 0.0298* 0.225 0.0006* 0.0055* 0.211 0.0413* 0.172 0.0490* 0.169 0.5562 0.02 <0.0001* 0.0200* 0.218 80 D-4 DISSOLVED ZINC Trial No. Predictand ln Slope 1 0.8685 ln Ce 2 0.6515 ln Ce 3 0.2014 ln Ce 4 0.9916 ln Ce 5 0.7815 ln Ce 6 0.2277 ln Ce 7 0.1207 ln Ce 8 0.0994 ln Ce 9 0.9191 ln Ce 10 0.0866 ln Ce 11 0.7361 ln Ce 12 0.6 ln Ce 13 0.7541 ln Ce 14 0.6714 ln Ce 15 0.2652 ln Ce 16 0.5106 ln Ce 17 0.6518 ln Ce 18 0.5202 ln Ce 19 0.6545 ln Ce 20 0.1047 ln Ce 21 0.5411 ln Ce 22 0.1093 ln Ce 23 0.5163 ln Ce 24 0.9468 ln Ce 25 0.2618 ln Ce 26 0.8573 ln Ce 27 0.0209* ln Ce 28 0.0259* ln Ce 29 0.0148* ln Ce 30 ln Ce 31 ln Ce 32 ln Ce 33 ln Ce 34 ln Ce 35 ln Ce 36 ln Ce 37 ln Ce 38 ln Ce 39 ln Ce 40 ln Ce 41 ln Ce 42 ln Ce ln rainfall ln Antecedent ln Event ln Strip R2 Value Rain Dry Days ln Width Vegetation ln % Clay duration 0.1776 0.1104 0.81 0.2274 0.1261 0.7292 0.3469 0.2203 0.7354 0.2877 0.7844 0.3043 0.2411 0.2532 0.5129 0.2503 0.2822 <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0016* 0.0070* 0.0004* 0.0004* 0.0010* 0.0001* 0.0346* <0.0001* 0.0032* 0.0077* <0.0001* 0.0009* 0.0002* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0009* <0.0001* 0.0533 0.0001* 0.0005* <0.0001* <0.0001* <0.0001* 0.0094* <0.0001* <0.0001* <0.0001* <0.0001* 0.0014* 0.0425* <0.0001* 0.0031* <0.0001* <0.0001* 0.0001* 0.0026* 0.0125* 0.0004* 0.0018* 0.0001* <0.0001* 0.0102* <0.0001* 0.0007* 0.7378 0.8119 0.1371 
0.11 0.3049 0.1038 0.2686 0.5363 <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0303* <0.0001* <0.0001* 0.0004* 0.0348* 0.0017* <0.0001* 0.0009* 0.0035* 0.0002* 0.0002* 0.0014* 0.0068* 0.0008* 0.0003* 0.0383* <0.0001* 0.0001* <0.0001* 0.0195* <0.0001* <0.0001* 0.0002* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* <0.0001* 0.0001* <0.0001* 0.455 0.424 0.270 0.446 0.365 0.245 0.433 0.407 0.233 0.337 0.198 0.353 0.320 0.266 0.287 0.031 0.005 0.451 0.418 0.270 0.359 0.244 0.198 0.026 0.217 0.044 0.107 0.090 0.090 0.455 0.424 0.265 0.365 0.240 0.198 0.450 0.418 0.359 0.237 0.329 0.258 0.470 81 D-4 Trial ln Strip ln rainfall ln Antecedent No. Predictand ln Slope ln Width Vegetation ln % Clay duration Dry Days 1 ln Ce/Ci <0.0001* 0.0020* 0.0177* 0.394 0.8424 0.0037* 2 ln Ce/Ci <0.0001* 0.0021* 0.0122* 0.3394 0.8658 0.0043* 3 ln Ce/Ci <0.0001* 0.0073* 0.0022* 0.606 0.6368 4 ln Ce/Ci <0.0001* 0.0019* 0.0174* 0.3653 0.4865 5 ln Ce/Ci <0.0001* 0.0020* 0.0122* 0.346 0.0038* 6 ln Ce/Ci <0.0001* 0.0075* 0.0019* 0.5593 7 ln Ce/Ci <0.0001* 0.0014* 0.0233* 0.7187 0.4392 8 ln Ce/Ci <0.0001* 0.0013* 0.0171* 0.9813 0.0059* 9 ln Ce/Ci <0.0001* 0.0053* 0.0025* 0.5854 10 ln Ce/Ci <0.0001* 0.0013* 0.0168* 0.0049* 11 ln Ce/Ci <0.0001* 0.0054* 0.0022* 12 ln Ce/Ci <0.0001* 0.0030* 0.6371 0.8363 0.5477 13 ln Ce/Ci <0.0001* 0.0035* 0.5851 0.9355 0.0008* 14 ln Ce/Ci <0.0001* 0.0167* 0.9142 0.4802 15 ln Ce/Ci <0.0001* 0.0035* 0.5885 0.0006* 16 ln Ce/Ci <0.0001* 0.0178* 0.9909 17 ln Ce/Ci <0.0001* 0.0026* 0.9997 0.0010* 18 ln Ce/Ci <0.0001* 0.0158* 0.4833 19 ln Ce/Ci <0.0001* 0.0025* 0.0007* 20 ln Ce/Ci <0.0001* 0.0163* 21 ln Ce/Ci <0.0001* 0.0268* 0.2433 0.6859 0.0162* 22 ln Ce/Ci <0.0001* 0.0212* 0.1932 0.8771 0.0176* 23 ln Ce/Ci <0.0001* 0.0049* 0.3722 0.7021 24 ln Ce/Ci <0.0001* 0.0211* 0.1952 0.0161* 25 ln Ce/Ci <0.0001* 0.0043* 0.3417 26 ln Ce/Ci <0.0001* 0.0065* 27 ln Ce/Ci <0.0001* 0.7147 28 ln Ce/Ci <0.0001* 0.5178 29 
ln Ce/Ci <0.0001* 0.0048* 30 ln Ce/Ci <0.0001* 31 ln Ce/Ci 0.9724 0.1061 0.0981 0.4985 0.1162 32 ln Ce/Ci 0.9775 0.092 0.1241 0.9551 0.1252 33 ln Ce/Ci 0.9766 0.0909 0.1212 0.1122 34 ln Ce/Ci 0.9391 0.0340* 0.1007 35 ln Ce/Ci 0.9491 0.0378* 0.0931 0.6896 36 ln Ce/Ci 0.3764 0.0189* 37 ln Ce/Ci 0.0903 0.0822 0.1106 38 ln Ce/Ci 0.0337* 0.0621 39 ln Ce/Ci 0.0546 0.0935 40 ln Ce/Ci 0.0498* 0.0417* 1 ln Ci-Ce 0.2772 0.0301* 2 ln Ci-Ce 0.0035* 0.0813 0.7324 3 ln Ci-Ce 0.0032* 0.0836 4 ln Ci-Ce <0.0001* 0.0006* <0.0001* 0.2452 0.1932 <0.0001* ln Event Rain 0.4957 0.0026* 0.0048* 0.0007* 0.3204 0.6603 0.244 0.4643 R2 Value 0.165 0.162 0.135 0.165 0.162 0.135 0.163 0.159 0.134 0.159 0.133 0.146 0.141 0.103 0.141 0.101 0.140 0.103 0.140 0.101 0.133 0.130 0.110 0.130 0.110 0.107 0.081 0.082 0.108 0.082 0.049 0.044 0.044 0.036 0.036 0.025 0.044 0.036 0.033 0.034 0.058 0.049 0.049 0.325 82 D-4 Trial ln Strip ln rainfall ln Antecedent No. Predictand ln Slope ln Width Vegetation ln % Clay duration Dry Days 5 ln Ci-Ce <0.0001* 0.0010* <0.0001* 0.2942 0.0599 <0.0001* 6 ln Ci-Ce <0.0001* 0.0019* <0.0001* 0.203 <0.0001* 7 ln Ci-Ce 0.0035* 0.0252* <0.0001* 0.0771 8 ln Ci-Ce 0.0036* 0.0233* <0.0001* 0.0907 0.4719 9 ln Ci-Ce <0.0001* 0.0009* <0.0001* 0.1365 <0.0001* 10 ln Ci-Ce <0.0001* 0.0014* <0.0001* 0.0435* <0.0001* 11 ln Ci-Ce 0.0002* 0.0413* <0.0001* 0.3814 12 ln Ci-Ce <0.0001* 0.0029* <0.0001* <0.0001* 13 ln Ci-Ce 0.0002* 0.0465* <0.0001* 14 ln Ci-Ce <0.0001* 0.0005* 0.092 0.2448 <0.0001* 15 ln Ci-Ce 0.0001* 0.0011* 0.1079 0.124 <0.0001* 16 ln Ci-Ce 0.0479* 0.0356* 0.0144* 0.8759 17 ln Ci-Ce 0.0002* 0.0018* 0.0743 <0.0001* 18 ln Ci-Ce 0.0469* 0.0355* 0.0134* 19 ln Ci-Ce <0.0001* 0.0019* 0.085 <0.0001* 20 ln Ci-Ce <0.0001* 0.0034* <0.0001* 21 ln Ci-Ce 0.0017* <0.0001* 0.4294 0.4157 <0.0001* 22 ln Ci-Ce 0.0016* <0.0001* 0.5019 0.1207 <0.0001* 23 ln Ci-Ce 0.0388* <0.0001* 0.1716 0.5493 24 ln Ci-Ce 0.0020* <0.0001* 0.38 <0.0001* 25 ln Ci-Ce 0.0368* <0.0001* 0.1507 26 ln Ci-Ce <0.0001* 
<0.0001* <0.0001* 27 ln Ci-Ce 0.0015* <0.0001* 28 ln Ci-Ce 0.0189* 0.8163 29 ln Ci-Ce 0.0003* <0.0001* 30 ln Ci-Ce 0.0163* 31 ln Ci-Ce 0.2322 0.0001* 0.0033* 0.4366 0.0004* 32 ln Ci-Ce 0.2618 <0.0001* 0.0052* 0.1383 0.0004* 33 ln Ci-Ce 0.3394 <0.0001* 0.0045* 0.4815 34 ln Ci-Ce 0.3025 <0.0001* 0.0033* 0.0011* 35 ln Ci-Ce 0.3586 <0.0001* 0.0035* 36 ln Ci-Ce 0.796 <0.0001* 37 ln Ci-Ce <0.0001* 0.0057* 0.0012* 38 ln Ci-Ce <0.0001* 0.0052* 39 ln Ci-Ce <0.0001* 0.0013* 40 ln Ci-Ce 0.0031* <0.0001* 1 ln k 0.9364 0.2072 0.5253 0.0731 0.0588 2 ln k 0.2072 0.3851 0.0676 0.0524 3 ln k 0.9756 0.4392 0.0915 0.0262* 4 ln k 0.5509 0.1809 0.0884 0.0352* 5 ln k 0.7259 0.2691 0.7708 0.0215* 6 ln k 0.8272 0.4113 0.633 0.8436 0.0557 7 ln k 0.5225 0.3699 0.0378* 8 ln k 0.9117 0.5261 0.0302* ln Event Rain 0.5333 0.5822 0.3 0.4004 0.3138 0.0009* 0.0008* 0.0014* 0.0009* 0.0049* R2 Value 0.317 0.302 0.215 0.218 0.319 0.313 0.204 0.296 0.200 0.243 0.226 0.082 0.215 0.082 0.214 0.200 0.276 0.272 0.194 0.261 0.192 0.258 0.182 0.032 0.160 0.037 0.238 0.234 0.178 0.224 0.175 0.134 0.219 0.171 0.183 0.139 0.061 0.061 0.055 0.059 0.050 0.023 0.022 0.021
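
The quantities in the Appendix C reports are linked by simple arithmetic, so a report can be checked for internal consistency. The following is a minimal sketch in Python using the values from the total copper report (equation (7)); it is not part of the original JMP® analysis, and the variable names (`params`, `t_ratios`, and so on) are illustrative assumptions. It uses the standard relationships: mean square = sum of squares / DF, F ratio = model mean square / error mean square, R2 = model sum of squares / total sum of squares, and t ratio = estimate / standard error.

```python
# Values copied from the Appendix C report for total copper (equation (7)).
n_obs = 215                     # Observations
df_model, df_error = 4, 210     # Degrees of freedom (Model, Error)
ss_model = 85.37685             # Sum of Squares, Model
ss_error = 77.84759             # Sum of Squares, Error
ss_total = 163.22444            # Sum of Squares, C. Total

# Analysis of Variance relationships.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_ratio = ms_model / ms_error

# Summary of Fit relationships.
r2 = ss_model / ss_total
r2_adj = 1 - (1 - r2) * (n_obs - 1) / df_error
rmse = ms_error ** 0.5

# Parameter Estimates: (estimate, standard error) for each term.
params = {
    "Intercept": (5.8459517, 0.47692),
    "ln Vegetation": (-0.598525, 0.099502),
    "ln Rainfall Duration": (-0.286517, 0.067715),
    "ln Event Rain": (-0.268827, 0.067373),
    "ln Antecedent Dry Days": (0.4265024, 0.041584),
}
# The t ratio is the estimate divided by its standard error.
t_ratios = {term: est / se for term, (est, se) in params.items()}

print(f"F ratio     = {f_ratio:.4f}")
print(f"RSquare     = {r2:.6f}")
print(f"RSquare Adj = {r2_adj:.6f}")
print(f"RMSE        = {rmse:.6f}")
for term, t in t_ratios.items():
    print(f"{term:24s} t ratio = {t:6.2f}")
```

The printed values can be compared with the F Ratio, RSquare, RSquare Adj, Root Mean Square Error, and t Ratio entries in the report. The same check applies to the dissolved copper, total zinc, and dissolved zinc reports by substituting their values.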