ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
A Project
Presented to the faculty of the Department of Civil Engineering
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Civil Engineering
by
Lakshmi Jayaprakash
SPRING
2012
© 2012
Lakshmi Jayaprakash
ALL RIGHTS RESERVED
ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
A Project
by
Lakshmi Jayaprakash
Approved by:
__________________________________, Committee Chair
John Johnston, Ph.D., P.E.
__________________________________, Second Reader
Ramzi Mahmood, Ph.D., P.E.
__________________________
Date
Student: Lakshmi Jayaprakash
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Department Chair
Ramzi Mahmood, Ph.D., P.E.
Department of Civil Engineering
___________________
Date
ABSTRACT
of
ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
by
Lakshmi Jayaprakash
In the 2011 revision to the California Department of Transportation (Caltrans)
statewide stormwater permit (Tentative Order No. 2011-XX-DWQ), the California State
Water Resources Control Board required Caltrans to monitor discharge flows for only
three events each year and, using those results, make a determination as to whether the
runoff characteristics exceed water quality objectives (WQO). The process for making
the determination was specified in the permit in the form of two water quality action
levels: (1) three or more exceedances of a water quality objective (WQO) or (2) two or
more exceedances of a WQO by more than 50 percent. The goal of this project is to
assess the accuracy of these rules as a function of the number of field samples collected
per year.
The permit rules were applied to 100,000 randomly chosen samples of specified
sizes drawn from historic Caltrans data sets for total cadmium at three sites. The
California 303(d) listing process was used to determine whether the historic discharges
exceeded the WQO. The results of the permit rules applied to the multiple simulated
sample data sets and the 303(d) listing process (representing the “true” result) were
compared. From these comparisons, the probability or frequency of incorrect conclusion
arising from using the permit rules was calculated. An alternate statistical approach in
which the WQO was compared to the 95 percent lower confidence limit (LCL) of the 90th
percentile of the data was tested in a similar manner (i.e., 100,000 random simulations of
various sample sizes).
_______________________, Committee Chair
John Johnston, Ph.D., P.E.
_______________________
Date
ACKNOWLEDGEMENTS
The author would like to thank Dr. John Johnston, Dr. Ramzi Mahmood and Dr. Dipen
Patel for their support, input, guidance and review of this project. In addition, she would
like to thank Dr. Ramzi Mahmood and Dr. John Johnston again for their support and
guidance throughout the graduate program. Lastly, the author would like to thank her
family for their support throughout the program.
TABLE OF CONTENTS
Page
Acknowledgements ........................................................................................................... vii
List of Tables .......................................................................................................................x
List of Figures ................................................................................................................... xii
Chapters
1. INTRODUCTION ...........................................................................................................1
2. BACKGROUND .............................................................................................................4
2.1 Stormwater Definition .................................................................................. 4
2.2 Stormwater Discharge Characterization Study ........................................... 5
2.2.1 Constituents Monitored ............................................................................ 9
2.2.2 Factors Affecting Runoff Quality ........................................................... 10
2.2.3 Comparison with Water Quality Objectives ........................................... 11
2.3 Provisions of Tentative Order No. 2011-XX-DWQ ................................... 16
2.4 Water Quality Control Policy ..................................................................... 22
2.4.1 California Listing Procedure for Numeric Water Quality Objectives .... 22
2.5 Gibbons' Method ........................................................................................ 26
3. METHODOLOGY ........................................................................................33
3.1 Caltrans Data ............................................................................................... 33
3.2 303(d) Listing Process ................................................................................ 35
3.3 Permit Monitoring Model Program ............................................................ 35
3.4 Alternate Statistical Model ......................................................................... 38
3.5 Delta Procedure ........................................................................................... 40
4. RESULTS AND DISCUSSION ....................................................................41
4.1 Permit Monitoring Model Results .............................................................. 41
4.2 Alternate Statistical Model ......................................................................... 48
4.3 Delta Procedure ........................................................................................... 51
5. CONCLUSION ..............................................................................................................56
Appendix A Flowcharts .....................................................................................................57
Appendix B VBA Program Code ......................................................................................64
References ..........................................................................................................................73
LIST OF TABLES
Tables
Page
Table 1 Water Quality Parameter Monitored in Stormwater Runoff (Table 2-1 of Caltrans
2003b) ....................................................................................................................... 10
Table 2 Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans,
2003b) ....................................................................................................................... 13
Table 3 Comparisons of Caltrans runoff quality data with CTR and other relevant water
quality objectives (Table 3-18, Caltrans, 2003b) ...................................................... 15
Table 4 Water Quality Action Levels (CSWRCB, 2011a) ............................................... 19
Table 5 Minimum number of Exceedances to Place a Water Segment on Section 303(D)
List for Conventional or Other Pollutants (CSWRCB, 2004) .................................. 24
Table 6 One-Sided Factors for 95% Confidence LCL for the 75th and 90th percentile of
the distribution k = 4 to 1000(Gibbons, 2003) .......................................................... 32
Table 7 Input data Total Cadmium (Granicher, 2011) ..................................................... 34
Table 8 User-Defined Input .............................................................................................. 38
Table 9 Permit Monitoring Model Results for Site 1 ....................................................... 41
Table 10 Permit Monitoring Model Results for Site 2 ..................................................... 44
Table 11 Permit Monitoring Model Results for Site 3 ..................................................... 46
Table 12 Alternate Statistical Model Results for Site 1 .................................................... 49
Table 13 Alternate Statistical Model Results for Site 2 .................................................... 50
Table 14 Alternate Statistical Model Results for Site 3 .................................................... 51
Table 15 Synthetic CTR Values ....................................................................................... 51
Table 16 Results for Synthetic CTR = 1.748 .................................................................... 52
Table 17 Results for Synthetic CTR = 2.385 .................................................................... 53
Table 18 Results for Synthetic CTR = 3.022 .................................................................... 53
Table 19 Results for Synthetic CTR = 3.659 .................................................................... 53
Table 20 Results for Synthetic CTR = 4.295 .................................................................... 54
Table 21 Summary Statistics for Non-Parametric 95% LCL of 90th percentile ............... 55
LIST OF FIGURES
Figures
Page
Figure 1- Stormwater Monitoring Sites (Caltrans, 2003b) ................................................. 7
Figure 2- Typical Stormwater Monitoring Facilities (Caltrans, 2003b) ............................. 8
Figure 3- Flowchart of the Water Quality Monitoring Process (CSWRCB, 2011a) ........ 20
Figure 4- 95% LCL of the 90th percentile compared to WQO ......................................... 28
Figure 5- Comparing Simulated and Actual Data for Site 1 ............................................. 43
Figure 6- Comparing Simulated and Actual data for Site 2.............................................. 45
Figure 7- Comparison of Simulated and Actual Data for Site 3 ....................................... 47
Chapter 1
INTRODUCTION
The California Department of Transportation (hereafter Caltrans) operates
under a statewide stormwater permit issued under the National Pollutant Discharge
Elimination System (NPDES) by the State Water Resources Control Board (hereafter the
State Board). On August 18, 2011, the State Board issued a proposed renewal and
revision of the Caltrans permit in the form of Tentative Order No. 2011-XX-DWQ
(CSWRCB, 2011a). The Tentative Order contains a water quality monitoring process
and water quality action levels. If the action levels are exceeded for a direct discharge to
receiving waters, Caltrans must institute monitoring of receiving water, which is an
expensive undertaking. If subsequent monitoring shows that the receiving water does not
exceed the water quality standards, then the monitoring can be discontinued. If water
quality standards are exceeded, a stormwater Best Management Practice (BMP) must be
installed and monitoring of the discharge must continue. The discharge monitoring
required to determine whether or not a site exceeds water quality action levels consists of
only three samples analyzed over the space of a year. The site is deemed to exceed the water quality action levels if all three samples exceed local water quality objectives or if two of the samples exceed the objectives by at least 50 percent (CSWRCB, 2011a).
Given the normal wide variability in stormwater quality, three samples is a very small
number on which to base a statistical determination. It is possible that such small
numbers may lead to many incorrect determinations. Under its current permit (Order No.
99-06-DWQ), Caltrans conducted a Statewide Discharge Characterization Study
(Caltrans, 2003b) to identify concentrations and loads for pollutants of concern in runoff
from Caltrans facilities such as highways, rest stops, and maintenance facilities. This
three-year study generated 60,000 data points. More importantly, the data includes 20-30
samples at each of the discharge locations. These larger site-specific data sets allow an
opportunity to evaluate the reliability of the three-sample, one-year protocol in the
Tentative Order.
In this project, data collected during the Caltrans Statewide Discharge
Characterization Study will be used to evaluate whether runoff from a site exceeds a
water quality objective (WQO) using the legally accepted statistical methods used to
determine whether a receiving water violates WQOs for the Clean Water Act 303(d)
program (CSWRCB, 2004). A computer model will be constructed to simulate runoff by
randomly choosing sample results from existing data at a site. Then the rules proposed in
the Tentative Order will be applied to determine if the stormwater quality exceeds a
WQO. A second method, proposed by Gibbons (2003), will also be investigated. In this
method, the lower confidence limit of the 90th percentile runoff concentration is
compared with the WQO. For both cases, model results will be used to assess the
reliability of the three-sample methods proposed in the permit in correctly determining
whether or not runoff exceeds WQOs. The probabilities of Type I and Type II errors will
be calculated.
The goal of this project is to develop a process for assessing the accuracy of
methods that use a limited number of samples. The process will be tested using total
cadmium data from several highway sites, but full evaluation of all the constituents
discharged from all the sites monitored by Caltrans is beyond the scope of this project.
Chapter 2
BACKGROUND
In 1972, the Federal Clean Water Act (CWA) was amended so that the discharge
of pollutants to waters of the United States from any point source is unlawful unless the
discharge is in compliance with an NPDES permit. In 1987, amendments were made to
the CWA and Section 402(p) was added. This section established a framework for
regulating municipal and industrial stormwater discharges under the NPDES permit
program. On November 16, 1990, the U.S. Environmental Protection Agency (USEPA)
published federal regulations to control the pollutant levels in stormwater runoff
discharges (CSWRCB, 2011). Prior to July 1999, Caltrans stormwater discharges were
regulated by individual permits issued by different Regional Water Quality Control
Boards (hereafter the Regional Boards). On July 15, 1999, the State Water Resources
Control Board issued Caltrans its first statewide stormwater permit (Order No. 99-06-DWQ). This statewide permit regulates all Caltrans facilities including maintenance
stations, equipment storage areas, storage facilities, fleet vehicle parking, and discharges
from highways and all rights-of-way owned by Caltrans. The proposed new permit
(Tentative Order No. 2011-XX-DWQ) will supersede the current order.
2.1 Stormwater Definition
Stormwater is categorized into two types:
1. Stormwater Discharge - Stormwater discharges consist only of discharges that result from precipitation events. Stormwater is defined in 40 C.F.R. § 122.26(b)(13) as stormwater runoff, snowmelt runoff, and surface runoff and drainage (Code of Federal Regulations, 2009).
2. Non-Stormwater Discharge - Non-stormwater discharges consist of all discharges to an MS4 that do not result from precipitation events.
Caltrans discharges are mainly generated from the maintenance and operation of
state- owned rights- of- way, department storage and disposal areas, and other properties.
Stormwater and non-stormwater are discharged directly to surface waters or indirectly
through Municipal Separate Storm Sewer Systems (MS4s).
2.2 Stormwater Discharge Characterization Study
After the first statewide NPDES permit (Order No. 99-06-DWQ) was issued,
Caltrans conducted a three-year discharge characterization study starting in
2000 (Caltrans, 2003b). The study was designed to characterize stormwater discharges
from transportation facilities throughout California. Characterization monitoring was
conducted at over 180 sites statewide yielding more than 60,000 concentration data
points. This study provided broad geographic coverage throughout California (see Figure
1). Some typical stormwater quality monitoring facilities are shown in Figure 2. The key
objectives of the characterization study included:
1. Achieving compliance with NPDES permit requirements;
2. Producing data that are scientifically credible and representative of runoff from the Department's facilities and that can be used to define future monitoring needs;
3. Providing information that can be useful to the Department in designing effective stormwater monitoring and management strategies.
Figure 1- Stormwater Monitoring Sites (Caltrans, 2003b)
Figure 2- Typical Stormwater Monitoring Facilities (Caltrans, 2003b)
The sampling program was designed to produce representative data of runoff for the full
range of transportation facility types, geographic locations, traffic levels and land use
categories. Sampling was conducted over a three-year period with up to eight storm
events annually. The study started during the 2000-01 wet season and was completed at
the end of the 2002-03 wet season. Facilities monitored by Caltrans as part of the study
included highways, maintenance stations, park-and-ride lots, rest areas, toll plazas and
weigh stations.
The highway runoff samples were collected from non-urban (Average Annual
Daily Traffic (AADT) ≤ 30,000 vehicles/day) and urban (AADT > 30,000 vehicles/day) highways throughout the state.
2.2.1 Constituents Monitored
The standard list of water quality constituents monitored in the discharge
characterization studies included:
1. Conventional parameters (pH, temperature, TSS, TDS, conductivity, hardness,
TOC, and DOC).
2. Nutrients (nitrate, TKN, orthophosphate-P, and total P),
3. Total and dissolved metals (As, Cd, Cr, Cu, Pb, Ni and Zn), and
4. Selected pesticides.
The minimum list of constituents is shown in Table 1.
Table 1 Water Quality Parameter Monitored in Stormwater Runoff (Table 2-1 of Caltrans, 2003b)

Constituent                                   Units        Reporting Limit
Conventional Pollutants
  Conductivity                                µmhos/cm     1
  Hardness as CaCO3                           mg/L         2
  pH                                          pH Units     ±0.1
  Temperature                                 ºC           ±0.1
  Total Dissolved Solids                      mg/L         1
  Total Suspended Solids                      mg/L         1
  Dissolved Organic Carbon (DOC)              mg/L         1
  Total Organic Carbon                        mg/L         1
Nutrients
  Nitrate as Nitrogen (NO3-N)                 mg/L         0.1
  Total Kjeldahl Nitrogen (TKN)               mg/L         0.1
  Total Phosphorous                           mg/L         0.03
  Dissolved Ortho-Phosphate                   mg/L         0.03
Metals (total recoverable and dissolved)
  Arsenic                                     µg/L         1
  Cadmium                                     µg/L         0.2
  Chromium                                    µg/L         1
  Copper                                      µg/L         1
  Lead                                        µg/L         1
  Nickel                                      µg/L         2
  Zinc                                        µg/L         5
Herbicides
  Diuron                                      µg/L         1
  Glyphosate                                  µg/L         5
  Oryzalin                                    µg/L         1
  Oxadiazon                                   µg/L         0.05
  Triclopyr                                   µg/L         0.1

2.2.2 Factors Affecting Runoff Quality
Environmental factors that affect the quality of edge-of-pavement runoff are
average annual daily traffic (AADT), temporal trend variations (annual, seasonal, and intra-storm intervals), precipitation characteristics (especially antecedent dry period), cumulative seasonal rainfall and event rainfall amount (Caltrans, 2003b). Pollutant
concentrations were found to be higher for Caltrans facilities with higher AADT,
particularly highways and toll plazas. Concentrations of pollutants were found to be
higher early in the wet season due to build-up of pollutant on the roadway during dry
periods. Longer antecedent dry periods led to higher pollutant concentrations. Runoff
pollutant concentrations decreased as storm size increased; smaller storms produced
higher pollutant concentrations in runoff than storms with larger rainfall amounts.
2.2.3 Comparison with Water Quality Objectives
The results of the study were compared to the California Toxics Rule (CTR) and
other surface water quality objectives considered potentially relevant to stormwater
runoff quality. The other water quality objectives considered included the National
Primary Drinking Water Maximum Contaminant Levels, USEPA Action Plan for
Beaches and Recreational Waters, USEPA Aquatic Life Criteria, and California
Department of Fish and Game recommended criteria for Diazinon and Chlorpyrifos.
These WQOs were considered relevant because they apply to surface waters, which may
receive stormwater discharges from highways and other Caltrans facilities. Table 2
shows summary statistics for highway runoff characteristics. Depending on the
frequency with which the most stringent water quality objectives were exceeded, the
constituents were prioritized as high, medium and low. Constituents were considered
high priority when the frequency of exceedances was greater than 50 percent, medium
priority when exceedance frequencies were 5-50 percent and low priority when
exceedance frequencies were less than 5 percent. Table 3 summarizes the results of
comparisons with the most stringent CTR criteria and other relevant WQOs. The high
priority constituents were found to be lead, copper, zinc, aluminum, diazinon,
chlorpyrifos and iron.
Table 2 Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans, 2003b)
Table 2 (contd) Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans, 2003b)
Table 3 Comparisons of Caltrans runoff quality data with CTR and other relevant water quality objectives (Table 3-18, Caltrans, 2003b)
2.3 Provisions of Tentative Order No. 2011-XX-DWQ
Tentative Order No. 2011-XX-DWQ, issued by the State Water Resources Control Board, renews Caltrans' NPDES and state permits to discharge stormwater and permitted
non-stormwater to waters of California and the United States. The permit governs
Caltrans activities such as:
1. Stormwater discharges to MS4s and receiving waters;
2. Stormwater discharges from the Caltrans’ vehicle maintenance, equipment
cleaning operations facilities and other non-industrial facilities with activities
that have the potential of generating significant quantities of pollutants; and
3. Certain categories of non-stormwater discharges.
As part of the renewal process, Caltrans submitted a Stormwater Management
Plan (SWMP) that addresses stormwater discharges from its properties, facilities and
activities throughout the State of California. Caltrans developed the SWMP to document
procedures and practices that it will follow to reduce the discharge of pollutants
(Caltrans, 2003a).
The requirement of interest to this project is the provision on water quality
monitoring in the Tentative Order (Section E.2.c.2. a. ix). Caltrans is required to sample
stormwater and non-stormwater discharges throughout its system. The sampling
frequency specified is a minimum of three wet weather samples, including the first flush
flows. Two dry weather samples are required at sites discharging non-stormwater. The
Tentative Order also requires toxicity analysis to be conducted on the first wet weather
sample and first dry weather sample at each site. Toxicity testing need not be continued
if toxicity is not present in first wet and dry weather samples.
In the first year of the new permit cycle, Caltrans is directed to establish a
candidate pool of sampling locations that are representative of the diverse geographic,
climatic, hydrologic, demographic and land use conditions in the state. The pool shall
not be limited to locations which do not receive run-off from outside the rights-of-way. It shall eventually contain at least 500 locations. In the first year, 200 sites shall be
identified. In the second year, the pool shall be increased to 400 sites and then 500 sites
in the third year. Sites may be designated for stormwater sampling only, non-stormwater
sampling only, or both. Out of the candidate pool, Caltrans, in consultation with
Regional Boards, shall select a minimum of 100 stormwater and non-stormwater sites for
sampling each year. The State Board shall reevaluate the allocation of sites in year 2.
The Tentative Order contains a list of water quality “action levels” which are
exceedances of WQOs as defined in Table 4. At a site where the action levels are not
exceeded, monitoring will be discontinued and a new site will be selected for monitoring
the following year. Where the action levels are exceeded for indirect discharges to
receiving waters, the Regional Board Executive officer shall determine if the discharge is
a threat to receiving waters. If the discharge does not pose a threat, Caltrans may
discontinue monitoring and select a new site from the candidate pool for the following
year. If the action levels are exceeded for a direct discharge to receiving waters, Caltrans
must monitor the affected receiving water. It may or may not choose to continue
monitoring the discharge. If receiving water monitoring shows that the discharge is not
causing an exceedance of a water quality objective, Caltrans may discontinue monitoring
and select a new site from the candidate pool. If Caltrans or the Regional Board
determines that the discharge is contributing to an exceedance of a water quality
objective, Caltrans shall conduct a watershed analysis to determine whether other
sources are contributing to the exceedance. Then Caltrans shall begin an iterative
process. First it shall install or modify BMPs to address the WQO exceedances and
continue discharge monitoring. If the action levels are not exceeded after revising the
BMPs, Caltrans may discontinue monitoring and select a new site from the candidate
pool. If the action levels continue to be exceeded, however, Caltrans shall resume
receiving water monitoring. The iterative monitoring process contained in the Tentative
Order is shown in Figure 3.
Table 4 Water Quality Action Levels (CSWRCB, 2011a)

Stormwater, Direct Discharge:
  ≥ 3 exceedances of a WQO, or
  ≥ 2 exceedances of a WQO by 50% or more, or
  ≥ 3 TUa > 1

Stormwater, Indirect Discharge:
  ≥ 3 exceedances of a WQO by 10% or more, or
  ≥ 2 exceedances of a WQO by 50% or more, or
  ≥ 3 TUa > 1

Non-Stormwater, Direct Discharge:
  ≥ 2 exceedances of a WQO, or
  ≥ 1 exceedance of a WQO by 50% or more, or
  ≥ 2 TUa > 1 or TUc > 0

Non-Stormwater, Indirect Discharge:
  ≥ 2 exceedances of a WQO by 10% or more, or
  ≥ 1 exceedance of a WQO by 50% or more, or
  ≥ 2 TUa > 1 or TUc > 0

TUa = Acute Toxicity, TUc = Chronic Toxicity, WQO = Water Quality Objective.
Figure 3- Flowchart of the Water Quality Monitoring Process (CSWRCB, 2011a)
The water quality action levels are shown in Table 4. In practice, any WQO for
any constituent can trigger an action level. For this project, though, only metals will be
evaluated and the applicable WQOs are found in the California Toxics Rule (CTR). A
site with direct discharge is said to exceed a water quality action level if three or more
samples exceed the CTR value in a year or if two or more samples are greater than the
WQO by 50 percent. The water quality action levels outlined for toxicity will not be
used in this project because toxicity data was not collected during the Statewide
Discharge Characterization Study.
Essentially the water quality action levels are a means of determining whether
Caltrans discharges are causing or contributing to an exceedance of WQOs in receiving
waters. The objective of this project is to evaluate the risk of making an error in this
determination due to the limited number of samples considered in the water quality action
levels. In other words, what is the probability that a “clean” site is determined to be
“dirty” by the application of the water quality action level methodology? Statistically,
this is a Type I error in which the null hypothesis that the site is “clean” is erroneously
rejected. Minimizing this error is of interest to Caltrans because Type I errors result in
spending scarce funds on monitoring receiving waters when there is no water quality
threat. On the other hand, the State Board is interested in minimizing the converse Type
II error in which a site is erroneously classified as “clean” (null hypothesis is accepted)
when it is not.
For the permit, a site is determined to be “dirty” when it exceeds the water quality
action levels in Table 4. A site with direct discharge is declared “dirty” if it has three or
more samples that exceed the CTR value or if two or more samples are greater than the
CTR by 50 percent. If the water quality action levels are not exceeded then a site is
declared “clean”.
2.4 Water Quality Control Policy
Section 303(d) of the Clean Water Act (CWA) requires states to identify waters
for further monitoring if they do not meet relevant water quality standards. The states
must gather and evaluate all existing and readily available water quality-related data and
information to develop the list and to provide documentation for listing or not listing a
state’s water. The State Board has adopted a standardized approach to developing
California’s 303(d) list (CSWRCB, 2004). Surface waters will be placed on the 303(d)
list if they meet the listing factors described below. These factors can also be applied to
Caltrans discharges to determine whether or not they are causing or contributing to
exceedances of WQOs (i.e. whether or not a site is “clean”).
2.4.1 California Listing Procedure for Numeric Water Quality Objectives
The 303(d) water quality assessment process is a statistical decision problem
where a decision is made using a limited set of data. In the California method, the number of samples in which the receiving water violates a WQO is assumed to follow a binomial distribution (CSWRCB, 2004). A hypothesis test is set up in which the null hypothesis (H0) is that the probability of a given constituent's concentration exceeding the associated WQO is less than or equal to 0.10:

H0: p ≤ 0.10
If the null hypothesis is rejected, then the water body is classified as impaired and
placed on the 303(d) list. If the null hypothesis is not rejected, the water body is
considered to be in compliance with the WQO (i.e. not impaired). The minimum number
of measured exceedances needed to place a water body in the 303(d) list is shown in
Table 5.
Table 5 Minimum number of Exceedances to Place a Water Segment on Section 303(D)
List for Conventional or Other Pollutants (CSWRCB, 2004)
As seen in Table 5, the null hypothesis is that the water body is not impaired and
the alternate hypothesis is that the water body is impaired. If the null hypothesis H0 is
true but rejected in favor of the alternate hypothesis Ha, a false positive or Type I error is
committed. In this context, a Type I error would mean that an unimpaired water is
incorrectly added to the 303(d) list, which triggers management responses under the
Clean Water Act. Regulated facilities may be erroneously required to incur substantial
costs in additional monitoring and treatment that are unnecessary. If the alternate
hypothesis is true but is rejected in favor of the null hypothesis, a false negative or Type
II error occurs. In this case, the impairment of a water body would go unrecognized and
it would not be listed. Ideally, one would like to simultaneously minimize both errors.
Unfortunately, the risk of committing a Type I error (denoted as α) is typically inversely related to the risk of committing a Type II error (denoted as β). In other
words, as α is reduced, β increases. One way to manage the magnitude of β is to increase
the sample size. Typically, the Type I error is chosen by the water quality assessor
(Smith, 2001). If the Type I error is chosen to be 0.10, this determines the cutoff value k
shown in Table 5. The cutoff value is estimated based on the following equation:
π‘˜ = π‘šπ‘–π‘›|𝐡𝑖𝑛(𝑛 − π‘˜, 𝑛, 1 − 0.10) − 𝐡𝑖𝑛(π‘˜ − 1, 𝑛, 0.25)|
n = number of samples
k = cutoff value
Bin = Binomial distribution function
(1)
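For illustration, Equation 1 can be evaluated by brute force in a few lines of VBA. The sketch below is not part of the program listed in Appendix B; the function name is illustrative, and it assumes the Excel worksheet function BinomDist is available for the cumulative binomial distribution.

' Illustrative sketch: search for the cutoff value k of Equation 1 for a given sample size n.
Function ListingCutoff(n As Long) As Long
    Dim k As Long, diff As Double, bestDiff As Double, bestK As Long
    bestDiff = 1E+30
    For k = 1 To n
        ' |Bin(n - k; n, 1 - 0.10) - Bin(k - 1; n, 0.25)|, using cumulative binomial CDFs
        diff = Abs(WorksheetFunction.BinomDist(n - k, n, 0.9, True) - _
                   WorksheetFunction.BinomDist(k - 1, n, 0.25, True))
        If diff < bestDiff Then
            bestDiff = diff
            bestK = k
        End If
    Next k
    ListingCutoff = bestK
End Function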
Although a set of guidelines is available for water quality assessments under Section 303(d) of the CWA (USEPA, 1997), the guidance lacks statistically sound procedures for evaluating data (Gibbons, 2003). The 303(d) listing process described in Table 5 is based
on the observed percentage of samples exceeding a WQO instead of an estimate of the
percentage of the true concentration distribution that exceeds the criterion. The problem
with the 303(d) listing process is that confidence in the results depends on the number of
samples collected (i.e. the smaller the number of samples, the greater the uncertainty in
the percentage of the true concentration distribution that exceeds the regulatory standard).
By relying on the percentage of exceedances, the actual concentrations have minimal
bearing on the decision rule. The problem with this approach is that it provides no
information regarding the confidence with which a percentage of the true concentration
distribution fails to meet a regulatory standard (Gibbons, 2003). Thus, there is a need to
use a more statistically rigorous approach to make impairment decisions.
2.5 Gibbons' Method
An alternative statistical approach to determine whether or not a water body is
impaired is discussed by Gibbons (2003). In this approach, a water quality objective is
compared to the lower confidence limit (LCL1-α, p) of a selected percentile of the
concentration distribution of the associated constituent. If the LCL for the given
percentile exceeds the regulatory standard, the water body is declared in violation of that
standard. Typically, the 95% Lower Confidence Limit (LCL) of the 90th percentile is
compared against the WQO (see Figure 4).
This method offers the following features (Gibbons, 2003):
1. It provides a test of the null hypothesis that a given percentage of the true
concentration distribution fails to meet the regulatory standard.
2. It is appropriate for a variety of distributions (i.e. normal, lognormal and nonparametric).
3. It directly incorporates the magnitudes of the measured concentrations in
testing the hypothesis that a given percentage of the true concentration
distribution exceeds the standard.
4. It has explicit statistical power that describes the probability of detecting a
true impairment conditional on a given number of samples (k),
concentration distribution, and magnitude of exceedance.
Figure 4- 95% LCL of the 90th percentile compared to WQO
The setup for the hypothesis test is shown below:
H0: LCL_P90,95 ≤ WQO (unimpaired)
Ha: LCL_P90,95 > WQO (impaired)
where LCL_P90,95 = the 95% lower confidence limit for the 90th percentile (P90).
Based on the distribution of the analytical field data, either parametric or
nonparametric methods for calculating the LCL may be chosen. Normal, lognormal and
nonparametric forms of the LCL are listed below (Gibbons, 2003):
1. Normal Confidence Limits for a percentile
LCL_(1−α,p) = x̄ + K_(α,p) · s                                    (2)

where
x̄ = simulated sample mean of k measurements
s = simulated sample standard deviation of k measurements
K_(α,p) = one-sided factor for the 95% confidence LCL of the 90th percentile of the distribution for a specific value of k, from Table 6
k = number of sub-samples per sample drawn from the analytical field data.
2. Lognormal Confidence Limits for a Percentile
LCL_(1−α,p) = exp[ȳ + K_(α,p) · s_y]                              (3)

where
ȳ = mean of the natural log-transformed data, y = ln(x)
s_y = standard deviation of the log-transformed data
K_(α,p) = one-sided factor for the 95% confidence LCL of the 90th percentile of the distribution for a specific value of k, from Table 6.
3. Nonparametric Confidence Limits for a Percentile
To construct a nonparametric confidence limit for a percentile of the concentration distribution, the k samples are first rank ordered in ascending order. Then the LCL of the desired percentile at the desired confidence level is found by trial and error. The number of samples falling below the percentile of the distribution out of a set of k samples follows a binomial distribution with parameters k and success probability p, where success is defined as the event that a sample measurement is below the percentile. The cumulative binomial distribution Bin(x; k, p) represents the probability of getting x or fewer successes in k trials with success probability p, and is evaluated as:

Bin(x; k, p) = Σ (from i = 0 to x) C(k, i) · p^i · (1 − p)^(k−i)   (4)

where
C(k, i) = number of combinations of k subsamples taken i at a time
k = number of subsamples
i = number of successes (index of summation).
From the ordered sample arranged from smallest to largest (X(1), X(2), …, X(k)), the LCL of the 90th percentile is calculated by trial and error by computing the level of confidence p (or 1 − α). The level of confidence is estimated as the probability that Y ≥ L*. That is,

p = 1 − α = P(Y ≥ L*) = 1 − P(Y ≤ L* − 1)                         (5)

where Y is a binomial random variable with a probability of success of 0.9 (corresponding to the 90th percentile). Accordingly, for the binomial distribution the confidence level is expressed as:

p = 1 − Bin(L* − 1, k, 0.9)                                       (6)

where Bin(L* − 1, k, 0.9) is the cumulative distribution function (CDF) of a binomial distribution for L* − 1 successes in k trials with a probability of success of 0.9. The trial-and-error process starts by setting L* equal to k and estimating the probability p using Equations 5 and 6. If the probability is less than the desired confidence (e.g., 95%), L* is reduced by one and the probability is recalculated until the desired confidence level is achieved. The final L* is the rank that corresponds to the desired LCL; that is, the LCL is estimated as X(L*). A sketch of these LCL calculations is given below.
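The three forms of the LCL can be computed with short helper functions. The following VBA sketch is for illustration only (the function names are assumptions and differ from the Appendix B code); the K factor is read from Table 6, and the nonparametric routine assumes the k subsample values are stored in a 1-based array that has already been sorted in ascending order.

' Parametric LCL (Equation 2): pass the Table 6 factor for the chosen k.
Function NormalLCL(xbar As Double, s As Double, Kfactor As Double) As Double
    NormalLCL = xbar + Kfactor * s
End Function

' Lognormal LCL (Equation 3): ybar and sy are the mean and standard deviation
' of the natural-log-transformed data.
Function LognormalLCL(ybar As Double, sy As Double, Kfactor As Double) As Double
    LognormalLCL = Exp(ybar + Kfactor * sy)
End Function

' Nonparametric LCL (Equations 5 and 6): decrease L* from k until the confidence
' level reaches the desired value, then return the order statistic X(L*).
Function NonparametricLCL(x() As Double, k As Long, conf As Double, pctl As Double) As Double
    Dim Lstar As Long, p As Double
    Lstar = k
    Do While Lstar > 1
        p = 1 - WorksheetFunction.BinomDist(Lstar - 1, k, pctl, True)   ' Equation 6
        If p >= conf Then Exit Do
        Lstar = Lstar - 1
    Loop
    NonparametricLCL = x(Lstar)
End Function

For example, NonparametricLCL(sortedYear, 20, 0.95, 0.9) returns the order statistic that serves as the 95% LCL of the 90th percentile for a 20-sample year.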
Table 6 One-Sided Factors for 95% Confidence LCL for the 75th and 90th Percentile of the Distribution, k = 4 to 1000 (Gibbons, 2003)

m       75th Percentile    90th Percentile
4       -0.155             0.444
5       -0.063             0.519
6       -0.002             0.575
7        0.052             0.619
8        0.091             0.655
9        0.123             0.686
10       0.150             0.712
11       0.174             0.734
12       0.194             0.754
13       0.212             0.772
14       0.228             0.788
15       0.243             0.802
16       0.256             0.815
17       0.267             0.827
18       0.278             0.839
19       0.288             0.849
20       0.298             0.858
21       0.306             0.867
22       0.314             0.876
23       0.322             0.884
24       0.329             0.891
25       0.335             0.898
26       0.342             0.904
27       0.348             0.911
28       0.353             0.917
29       0.358             0.922
30       0.363             0.928
35       0.385             0.951
40       0.403             0.970
50       0.431             1.000
60       0.451             1.022
120      0.514             1.093
240      0.560             1.146
480      0.592             1.184
1000     0.617             1.282
Chapter 3
METHODOLOGY
The original goal of the project was to assess the accuracy of the method proposed
in the Caltrans statewide permit (Tentative Order No. 2011-XX-DWQ) for determining
whether or not a discharge might cause or contribute to an impairment of a receiving
water body. The method is based on counting the exceedances of water quality
objectives in stormwater discharge samples collected over a year. The method’s
accuracy will be assessed by simulating many hypothetical years using existing data from
the Caltrans Statewide Discharge Characterization Study and comparing the method
results to the “true” determination provided by the California 303(d) process. A second
method found in the literature (Gibbons, 2003) will also be assessed for comparison.
Both methods will be tested using the existing Total Cadmium data from three Caltrans
Sites.
3.1 Caltrans Data
Total cadmium data collected during the Statewide Discharge Characterization
Study were obtained (with permission) from the Caltrans stormwater database. Three
highway sites with greater than 20 samples each were selected. The three sites
represented one of each type: low, medium, or high priority. In low priority sites, less than
5% of the samples from the dataset exceed the CTR; in medium priority sites 5-50% of
the samples from the dataset exceed the CTR; and in high priority sites greater than 50%
of the samples from the dataset exceed the CTR. The three sites chosen for this project
are:
1. 1-34, 299E, Humboldt County (Site 1 - Low Priority)
2. 2-02, 5N, Tehama County (Site 2 - Medium Priority)
3. 4-39, 580W, Alameda County (Site 3 - High Priority)
The total cadmium data for the three sites are shown in Table 7. The CTR value for total cadmium is 0.97 µg/L.

Table 7 Input Data, Total Cadmium (Granicher, 2011)

Low Priority                 Medium Priority              High Priority
1-34, 299E, Humboldt County  2-02, 5N, Tehama County      4-39, 580W, Alameda County
Site 1                       Site 2                       Site 3
Reported Values (µg/L)       Reported Values (µg/L)       Reported Values (µg/L)
0.3                          0.5                          4.8
0.2                          0.3                          0.7
0.2                          0.2                          0.9
0.4                          0.2                          1
0.2                          0.2                          1.8
0.2                          0.5                          0.2
0.2                          0.3                          4.7
0.2                          0.2                          4.1
0.2                          0.3                          2
0.2                          0.4                          0.6
0.2                          30                           1.3
0.2                          0.2                          1.2
0.2                          0.3                          1
0.2                          0.2                          1
0.2                          0.4                          0.9
0.2                          0.6                          1.8
0.2                          1                            3.3
0.2                          0.2                          1.1
0.2                          0.2                          0.9
0.2                          0.2                          1.9
0.2                          0.2                          1.8
0.2                          0.5                          1.5
0.2                          0.52                         1.7
0.2                          0.29
3.2 303(d) Listing Process
Whether or not a discharge is “clean” is determined by applying the California
303(d) listing process to the data from the Statewide Discharge Characterization Study.
The number of samples that exceed the cadmium CTR value is counted and compared to
the values in Table 5. Because none of the sites has more than 30 samples, the number of
exceedances that would indicate an impairment is five. According to this criterion, Site 1
produces a “clean” discharge, Site 2’s discharge is “clean” and Site 3’s is “dirty”.
(“Clean” in this context means the discharge does not exceed the WQO; “dirty” means
that it does, as determined from the California 303(d) listing process.)
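In program terms, this determination reduces to counting the number of data points above the CTR and comparing the count with the Table 5 minimum (five exceedances for the data set sizes used here). A minimal VBA sketch, with an assumed function name that does not appear in Appendix B, is:

' Illustrative sketch: classify a site as "dirty" if its historic data set has at
' least minExceed values above the CTR (per the 303(d) listing table).
Function SiteIsDirty(data() As Double, ctr As Double, minExceed As Long) As Boolean
    Dim i As Long, exceed As Long
    For i = LBound(data) To UBound(data)
        If data(i) > ctr Then exceed = exceed + 1
    Next i
    SiteIsDirty = (exceed >= minExceed)
End Function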
3.3 Permit Monitoring Model Program
To test the accuracy of the permit method, a computer program was written to
check whether stormwater discharge characteristics exceed water quality objectives by
simulating annual datasets created by sampling from the existing data set for each of the
three sites. The program was written using Visual Basic for Applications (VBA) in
EXCELTM. The flowchart for the code is shown in Appendix A. The VBA code is listed
in Appendix B.
The proposed Caltrans permit contains water quality action levels (Table 4) for
determining whether a discharge violates a water quality objective. The first action level
states that three or more exceedances of a water quality objective (WQO) for any
constituent is a violation. This will be referred to as “Rule 1” hereafter. The second
36
action level states that two or more exceedances of a WQO by 50% or more of the WQO
value is a violation. This will be referred to as “Rule 2” hereafter.
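As an illustration of how the two rules can be applied to one simulated annual data set, a simplified VBA function is sketched below; the name and structure are assumptions and do not reproduce the Appendix B code.

' Illustrative sketch: returns True if a simulated year violates Rule 1 and/or Rule 2.
Function ViolatesActionLevels(yearData() As Double, ctr As Double) As Boolean
    Dim i As Long, nExceed As Long, nExceed50 As Long
    For i = LBound(yearData) To UBound(yearData)
        If yearData(i) > ctr Then nExceed = nExceed + 1                ' counts toward Rule 1
        If yearData(i) >= 1.5 * ctr Then nExceed50 = nExceed50 + 1     ' counts toward Rule 2
    Next i
    ' Rule 1: three or more exceedances of the WQO
    ' Rule 2: two or more exceedances of the WQO by 50% or more
    ViolatesActionLevels = (nExceed >= 3) Or (nExceed50 >= 2)
End Function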
To use the program, the user must first import data from the Caltrans database to
the Input worksheet. The input data were provided by the Office of Water Programs with
Caltrans’ permission (Granicher, 2011).
Data were randomly chosen from the analytical data set to create 100,000
simulated years with a user specified number of samples per year. The program looks at
each annual data set and determines whether either or both rules are violated. The α and
β values are calculated as:
If the site is clean:

α = (number of exceedances) / N                                   (7)

where the number of exceedances = false positives, and N = number of simulations.

If the site is dirty:

β = (N − number of exceedances) / N                               (8)

where (N − number of exceedances) = false negatives, and N = number of simulations.
The program code is divided into two subroutines, each controlled by a button
embedded in the spreadsheet.
GetInput Button Subroutine: This subroutine reads the data values from the Input
worksheet and writes them to Output worksheet. It can automatically read and write data
lists of different lengths.
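A typical way to read a column of unknown length in VBA is sketched below; the sheet name and column used are assumptions for illustration and do not reproduce the Appendix B listing.

' Illustrative sketch: read the reported values from column A of the Input worksheet.
Function ReadInputColumn() As Double()
    Dim ws As Worksheet, lastRow As Long, i As Long
    Dim values() As Double
    Set ws = ThisWorkbook.Worksheets("Input")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row   ' last filled row in column A
    ReDim values(1 To lastRow - 1)                       ' assumes one header row
    For i = 2 To lastRow
        values(i - 1) = ws.Cells(i, 1).Value
    Next i
    ReadInputColumn = values
End Function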
ResultsButton Subroutine: This subroutine generates the random annual data sets
and checks them against the two rules. The input variables required from the user are
listed in Table 8. The random annual data sets are generated using the RND() function in
VBA. The results are stored in the ‘Samplearray’ array. Next, the program checks each
annual data set in ‘Samplearray’ to determine whether it violates Rule 1 and/or Rule 2. The results are stored in the ‘Ruleviolations’ array. The total number of times Rule 1 and Rule 2 are exceeded, individually as well as together, is calculated. Next, the mean and standard deviation of each annual data set in ‘Samplearray’ are calculated and stored in the ‘Ruleviolations’ array. From the ‘Ruleviolations’ array, the program calculates the mean of the means of all annual data sets and the standard deviation of all annual data sets and writes them to the Output worksheet. The total number of violations in the ‘Ruleviolations’ array is summed and used to calculate α or β, depending on the status of the site (clean or dirty). The power of the test is calculated as 1 − β.
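The random resampling step can be pictured with the following VBA sketch (variable names are assumptions); each of the nSim simulated years is filled with k values drawn, with replacement, from the historic data set using Rnd().

' Illustrative sketch: build the array of simulated annual data sets.
Sub BuildSampleArray(data() As Double, k As Long, nSim As Long, samplearray() As Double)
    Dim i As Long, j As Long, pick As Long, n As Long
    n = UBound(data) - LBound(data) + 1
    ReDim samplearray(1 To nSim, 1 To k)
    Randomize
    For i = 1 To nSim                                    ' each simulated year
        For j = 1 To k                                   ' each sample within the year
            pick = LBound(data) + Int(Rnd() * n)         ' random index into the field data
            samplearray(i, j) = data(pick)
        Next j
    Next i
End Sub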
A histogram and probability plot of the results are provided in the Output
worksheet. The histogram shows the distribution of data points in the original data set
and the simulated data set to demonstrate that the simulation is similar to the original
data. To generate a histogram, the number of bins is assumed to be 10, and the bin size and frequency are calculated from the ‘Samplearray’. The probability plot is a visual tool to show whether or not the data are normally distributed (USEPA, 2009a). The Shapiro-Wilk test (USEPA, 2009a) was used to test whether the data follow a normal or a lognormal distribution.
Table 8 User-Defined Input

k      Number of subsamples per iteration. k corresponds to the number of samples collected per year. This value has to be less than or equal to the number of results in the dataset.
CTR    CTR is the water quality objective from the California Toxics Rule. It is used to assess the permit rules. The CTR values for different metals were obtained from Table 3-18 of the Statewide Discharge Characterization Report.

3.4 Alternate Statistical Model
The alternate model uses the Gibbons method to determine violations. If the 95%
LCL of the 90th percentile of the concentration distribution is greater than the CTR, a violation is recorded.
For non-parametric data, the subsamples are randomly generated using the Rnd() function in VBA. For normally distributed data, the random subsamples are
generated using the Norm_Inv (Rnd(), mean, standard deviation) function in which the
mean and standard deviation are those of the data set. For a lognormal data set, the
random numbers are generated using the mean and standard deviation of normally
distributed natural log-transformed data. The formulas used for calculating mean and
standard deviation for a lognormal data set are as follows (Singh, 1997):
Mean:  µ1 = exp[µ + 0.5σ²]                                        (9)

Standard deviation:  σ1 = √{ [exp(2µ + σ²)] · [exp(σ²) − 1] }     (10)
Where µ = Mean of natural log-transformed data.
σ= Standard deviation of natural log-transformed data.
µ1 = Mean of the original random variable.
σ1= Standard deviation of the original random variable.
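For illustration, the parametric random draws described above can be written as two small VBA functions (the names are assumptions); the lognormal draw simply exponentiates a normal draw made with the log-space mean and standard deviation.

' Illustrative sketch: random draws for the parametric cases.
Function RandomNormal(mean As Double, sd As Double) As Double
    Dim p As Double
    Do
        p = Rnd()
    Loop While p = 0                       ' Norm_Inv requires 0 < p < 1
    RandomNormal = WorksheetFunction.Norm_Inv(p, mean, sd)
End Function

Function RandomLognormal(mu As Double, sigma As Double) As Double
    ' mu and sigma are the mean and standard deviation of the natural-log-transformed data
    RandomLognormal = Exp(RandomNormal(mu, sigma))
End Function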
Depending on how the data set is distributed (normal, lognormal or
nonparametric), the methods described in Chapter 2 can be used to calculate LCL P90, 95
values for each simulated year and each user chosen number of samples (k). These
values are then compared with the WQO to determine the number of violations and the α
and β values are calculated as in the Permit Monitoring Model.
The program is divided into two subroutines, each controlled by a button embedded in
the spreadsheet:
GetSampleButton Subroutine: This subroutine reads the reported values from the
Input worksheet and writes them to the Alternate worksheet. It can automatically read
samples of different lengths from the Input worksheet. The VBA code is in Appendix B.
CheckButton Subroutine: This subroutine, based on the user defined input variables
(see Table 8), first generates random samples. The results are stored in the ‘Samplearray’
array. Next, it orders the subsamples using a bubble sort technique and replaces the
results in ‘Samplearray’. The Samplearray is now an ordered array with the values in
each simulated annual data set arranged in ascending order. Then the 90th percentile
value for each annual data set is calculated and stored in the array ‘percent’. This
calculation is based on the distribution already determined using the Shapiro Wilk test in
the Permit Monitoring Model. Next, the 95% LCL is calculated and stored in the array
‘LCL’. Finally, the number of times the 95% LCL is less than WQO is counted. Based
on this, α, β and power are calculated as described previously.
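The ordering step mentioned above is a standard ascending bubble sort; a generic VBA version is sketched below for reference (it is not the exact Appendix B routine).

' Illustrative sketch: sort one simulated annual data set in ascending order.
Sub BubbleSortAscending(arr() As Double)
    Dim i As Long, j As Long, tmp As Double
    For i = LBound(arr) To UBound(arr) - 1
        For j = LBound(arr) To UBound(arr) - 1 - (i - LBound(arr))
            If arr(j) > arr(j + 1) Then
                tmp = arr(j)
                arr(j) = arr(j + 1)
                arr(j + 1) = tmp
            End If
        Next j
    Next i
End Sub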
3.5 Delta Procedure
To better understand the relationship between the number of samples, Type I error
(α), Type II error (β), and the relative magnitude of the WQO with respect to the data,
further testing was accomplished using a range of “synthetic” (hypothetical) CTR values.
The synthetic CTR was calculated as follows:
π‘†π‘¦π‘›π‘‘β„Žπ‘’π‘‘π‘–π‘ 𝐢𝑇𝑅 = π‘₯Μ… + π‘Ž ∗ 𝑠
(11)
Where π‘₯Μ… = mean of analytical data
a = number of standard deviations from the mean (0 to 2 chosen by user)
s = standard deviation of analytical data.
In this analysis, the discharge is first determined to be clean or dirty, based on the
303(d) listing process and the synthetic CTR. Then both models are run with a range of k
values. If the site is clean, α is calculated for different sample sizes. If the site is dirty, β
is calculated. For the Alternate Statistical Model the annual data sets are assumed to have
non-parametric distributions and the 95% LCL of the 90th percentile is determined using
the method described in Section 2.5. The 95% LCL of the 90th percentile is compared to
the synthetic CTR to calculate α and β as appropriate.
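Equation 11 is simple enough to evaluate directly; the sketch below (assumed names, with an assumed step of 0.5 standard deviations to give five synthetic values over the range a = 0 to 2) illustrates how a set of synthetic CTR values can be generated.

' Illustrative sketch: list synthetic CTR values for a = 0 to 2 standard deviations.
Sub ListSyntheticCTRs(xbar As Double, s As Double)
    Dim a As Double
    For a = 0 To 2 Step 0.5
        Debug.Print "a = " & a & ", synthetic CTR = " & (xbar + a * s)
    Next a
End Sub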
Chapter 4
RESULTS AND DISCUSSION
Presented in this chapter are the results of the computer simulations.
4.1 Permit Monitoring Model Results
The Permit Monitoring Model was built to check the Water Quality Action Levels
(Table 4) specified in Tentative Order No. 2011-XX-DWQ. The purpose of this model is
to evaluate the risk of making incorrect conclusions about stormwater discharge quality.
The risk is measured by α and β. The expectation is that these risks will decrease with
increasing number of samples. Each site was tested using sample sizes of 3, 5, 10, 15 and
20. A histogram was plotted to compare the actual and simulated data distributions. The
simulation results for Site 1 are shown in Table 9 and Figure 5.
Table 9 Permit Monitoring Model Results for Site 1
Site: 1-34, 299E, Humboldt County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 3    k = 5    k = 10   k = 15   k = 20
Mean                                      0.213    0.212    0.212    0.212    0.213
Standard Deviation                        0.020    0.025    0.032    0.035    0.038
Permit Monitoring Results
Probability of Exceedance of Rule 1       0        0        0        0        0
Probability of Exceedance of Rule 2       0        0        0        0        0
α                                         0        0        0        0        0
β                                         NA       NA       NA       NA       NA
Power of Test                             NA       NA       NA       NA       NA
As seen in Figure 5, the distribution of the simulated data is essentially identical
to that of the original data set. Site 1 produces a “clean” discharge according to 303(d)
listing process. The simulation results indicate that there was no exceedance of the
WQO. Site 1 was tested primarily to demonstrate that the simulated data sets are similar
to the original data set and that the model does not predict an exceedance when there are
none.
The original data set was checked for normality using the Shapiro-Wilk test and was found not to follow a normal distribution. Next, the original data set was checked for lognormality using the Shapiro-Wilk test on the log-transformed data and was found not to follow a lognormal distribution either. Therefore, a non-parametric distribution approach is applicable.
[Histogram comparing the simulated and input data distributions for Site 1: frequency (%) versus concentration bins.]
Figure 5- Comparing Simulated and Actual Data for Site 1
Next, Site 2, which is a medium priority site (8% of the samples exceeded the
CTR) was evaluated. The results are shown in Table 10 and Figure 6. As seen in Figure
6, the distribution of the simulated data is essentially identical to that of the original data
set. The discharge from Site 2 is classified “clean” according to the 303(d) process, so α
values were calculated. The results show that at the permit-required number of
samples (3), α is 0.5%, meaning the risk of incorrectly classifying the site as dirty when it
is clean is relatively small. As the number of samples collected per year (k) increases,
however, this value increases, meaning the risk of making an error increases with greater
amounts of data. As the number of samples increases, the probability of capturing values
that exceed the WQO in the random sample increases. This happens because Rule 1 and Rule 2 are set up based on three samples. As a larger number of samples is collected, the
probability that three samples will exceed the WQO or that two will exceed it by 50%
rises. It is inappropriate, however, to compare risks for different k values because the permit rules are based on three samples. If the State Board had intended to require more than three samples, it would have written the water quality action levels based on that number of samples. The only valid Type I error risk calculated according to the permit
procedure is for k = 3, which is 0.5%.
In the original data set, all of the sample concentrations are ≤ 1 µg/L except one, which is 30 µg/L; this causes the distribution to be positively skewed. For a sample size of 3, the mean is 1.605 µg/L, which is greater than the WQO. A closer look at the
distribution of the original data set is recommended to make an impairment decision
rather than a simple tally of numbers as in the permit procedure.
Table 10 Permit Monitoring Model Results for Site 2
Site: 2-02, 5N, Tehama County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 3      k = 5      k = 10    k = 15    k = 20
Mean                                      1.605      1.587      1.575     1.583     1.574
Standard Deviation                        2.226      2.743      3.561     4.116     4.466
Permit Monitoring Results
Probability of Exceedance of Rule 1       0.00055    0.00467    0.044     0.124     0.229
Probability of Exceedance of Rule 2       0.00511    0.015      0.062     0.127     0.20
α                                         0.005      0.015      0.062     0.127     0.22
β                                         NA         NA         NA        NA        NA
Power of Test                             NA         NA         NA        NA        NA
When the original data set was checked for normality using the Shapiro-Wilk test, it was found that it does not follow either a normal or a lognormal distribution. Therefore, the non-parametric approach is applicable.
[Histogram comparing the simulated and input data distributions for Site 2: frequency (%) versus concentration bins.]
Figure 6- Comparing Simulated and Actual data for Site 2
The last site, Site 3, is a high priority site (74% of the samples exceeded the CTR).
According to the 303(d) process, Site 3 produces a “dirty” discharge. Therefore, the
model calculated β values for various sample sizes as shown in Table 11. The
distribution of simulated data compared to the original data set is shown in Figure 7. As
shown, the distribution of the simulated data is essentially identical to that of the original
data set.
The model results indicate that as the sample size increases the β decreases. β is
the probability of not violating Rule 1 and Rule 2 when the discharge is dirty, i.e.
concluding the discharge is clean when it is not. Consistent with the findings of Site 2,
the probability of violating the rules increases with sample size. Although β is theoretically expected to decrease as sample size increases, applying water quality action levels written for a sample size of 3 to larger sample sizes overestimates the probability of violating the rules, which causes the rapid decline in the value of β shown in the results below.
dirty discharge is 53% at this site. More sites should be evaluated to better understand
this risk for a sample size of 3. Due to the varied data in the actual data set, this site
seems more typical of Caltrans sites than Site 2.
Table 11 Permit Monitoring Model Results for Site 3
Site: 4-39, 580W, Alameda County
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 3    k = 5    k = 10   k = 15   k = 20
Mean                                      1.75     1.746    1.748    1.746    1.749
Standard Deviation                        1.015    1.109    1.185    1.206    1.220
Permit Monitoring Results
Probability of Exceedance of Rule 1       0.401    0.88     0.99     1        1
Probability of Exceedance of Rule 2       0.466    0.78     0.98     0.99     0.99
α                                         NA       NA       NA       NA       NA
β                                         0.53     0.115    0.001    0        0
Power of Test                             0.46     0.88     0.99     1        1
When the data were checked for normality and lognormality using the Shapiro-Wilk test, it was found that the data follow a lognormal distribution.
Figure 7- Comparison of Simulated and Actual Data for Site 3
4.2 Alternate Statistical Model
The Alternate Statistical Model takes the actual distribution of the data into consideration. Based on the type of distribution of the original data set, the corresponding 95% LCL of the 90th percentile is calculated for each of 100,000 simulations. The 95% LCL is compared to the CTR value (i.e., the water quality objective), and the number of times the LCL is less than the CTR value is tabulated. If the site produces a clean discharge according to the 303(d) listing process, the program computes the α value; otherwise it computes the β value. The model was used to test the three sites with different numbers of annual simulated samples. The advantage of this method is that the estimates of α and β are not dependent on a fixed number of exceedances as in the permit procedure. Therefore, for a particular sample size, an accurate estimate of the risk can be made based on the sample size and the CTR. The disadvantage of this method is that a minimum sample size of four is needed to make an accurate determination of the 95% LCL of the 90th percentile (Table 6). For modeling purposes, a sample size of 5 was chosen. As a result, the method cannot be used to evaluate the risk calculated by the permit procedure for a sample size of 3.
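In symbols, with N = 100,000 simulated annual data sets, the tabulation described above amounts to

\hat{\alpha} = \frac{\#\{\,\mathrm{LCL}_{95\%} > \mathrm{CTR}\,\}}{N} \;\; \text{(clean sites)}, \qquad
\hat{\beta} = \frac{\#\{\,\mathrm{LCL}_{95\%} < \mathrm{CTR}\,\}}{N} \;\; \text{(dirty sites)},

which is consistent with the counts and error rates reported in Tables 12 through 14.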
As noted earlier, Site 1 produces a "clean" discharge, and the original data set does not follow a normal or lognormal distribution. Since the discharge is clean, α values are computed, as shown in Table 12. The α values were computed by counting the number of times the 95% LCL of the 90th percentile is greater than the CTR. The 95% LCL of the 90th percentile of each of the 100,000 simulations was determined using Gibbons' method for a non-parametric confidence limit of a percentile. At this site, none of the samples exceed the CTR; therefore α = 0.
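As an illustration only (not the production macro), the sketch below shows how the non-parametric 95% LCL of the 90th percentile can be extracted for a single simulated sample: the sample is sorted and a fixed order statistic is taken, using the same order-statistic indices (3rd of 5, 7th of 10, 11th of 15, 16th of 20) that appear in the Appendix B code. The standalone function form and its name are illustrative.

' Illustrative sketch: non-parametric 95% LCL of the 90th percentile for a
' sample of size k = 5, 10, 15, or 20, using the order-statistic indices
' from the Appendix B code. Note: the passed array is sorted in place and
' is assumed to be 1-based with at least k elements.
Function NonParametricLCL(sample() As Double, k As Long) As Double
    Dim i As Long, p As Long, temp As Double
    'Bubble sort the sample into ascending order
    For p = 1 To k - 1
        For i = p + 1 To k
            If sample(i) < sample(p) Then
                temp = sample(p)
                sample(p) = sample(i)
                sample(i) = temp
            End If
        Next i
    Next p
    'Select the order statistic corresponding to the 95% LCL of the 90th percentile
    If k = 5 Then NonParametricLCL = sample(3)
    If k = 10 Then NonParametricLCL = sample(7)
    If k = 15 Then NonParametricLCL = sample(11)
    If k = 20 Then NonParametricLCL = sample(16)
End Function

In the Alternate Statistical Model, this value is then compared to the CTR; for a clean site, α is incremented whenever the LCL exceeds the CTR.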
Table 12  Alternate Statistical Model Results for Site 1
Site: 1-34, 299E, Humboldt County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           100000    100000    100000    100000
  α                                       0         0         0         0
  β                                       NA        NA        NA        NA
  Power of Test                           NA        NA        NA        NA
Site 2 produces a clean discharge, with only two members of the original data set exceeding the CTR value. The original data set does not follow a parametric (normal or lognormal) distribution, so the 95% LCLs of the 90th percentile for each of the 100,000 simulations were calculated using Gibbons' non-parametric method, as discussed above. The calculated α values are shown in Table 13. It is observed that α increases as the sample size increases. To better understand why α increases, a more detailed analysis was performed; it is described at the end of this chapter. As expected, the increase in α with sample size differs from the results obtained by applying the permit rules to sample sizes greater than three.
Table 13  Alternate Statistical Model Results for Site 2
Site: 2-02, 5N, Tehama County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           99532     99317     99394     97851
  α                                       0.004     0.006     0.006     0.020
  β                                       NA        NA        NA        NA
  Power of Test                           NA        NA        NA        NA
Site 3 produces a dirty discharge whose concentration data follow a lognormal distribution. The 95% LCL of the 90th percentile of each of the 100,000 simulations was calculated using Gibbons' method for determining lognormal confidence limits for a percentile. Because the discharge exceeds the CTR, β values are computed, as shown in Table 14. As seen in Table 14, the β values decrease as the sample size increases. This can be explained by taking a closer look at the original data set. At this site, a large number of samples exceed the CTR; only 6 data points out of 23 are less than the CTR, so as sample size increases, the number of times the 95% LCL is less than the CTR decreases. This site has a low β value because the CTR is lower than the majority of the original data. The need to better understand how the position of the CTR relative to the original data affects the risk estimate led to the delta procedure.
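For reference, in the lognormal case the 95% LCL of the 90th percentile is computed in the Appendix B code as

\mathrm{LCL}_{95\%} = \exp\left(\bar{y} + \kappa(k)\, s_y\right),

where \bar{y} and s_y are the sample mean and standard deviation used by the code (the Mean(j) and stdev(j) variables, interpreted here as log-scale statistics, an assumption consistent with the final exponentiation) and \kappa(k) takes the hard-coded values 0.519, 0.712, 0.802, and 0.858 for k = 5, 10, 15, and 20, respectively.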
Table 14  Alternate Statistical Model Results for Site 3
Site: 4-39, 580W, Alameda County
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 5       k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           4           0         0         0
  α                                       NA          NA        NA        NA
  β                                       0.00004     0         0         0
  Power of Test                           0.99996     1         1         1

4.3    Delta Procedure
To better understand how α and β behave in cases where the runoff is on the borderline between "clean" and "dirty", the Site 3 data were evaluated against a synthetic (hypothetical) WQO that could be moved a distance "delta" in relation to the mean of the data set. The synthetic CTR value was set at various increments of the standard deviation (between 0 and 2) above the mean of the data set (see Section 3.5). The evaluation was performed with the Alternate Statistical Model program using a non-parametric approach. Table 15 shows the synthetic CTR values.
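In formula form, with \bar{x} and s denoting the mean and standard deviation of the Site 3 data set,

\mathrm{CTR}_{\delta} = \bar{x} + \delta\, s, \qquad \delta = 0,\ 0.5,\ 1,\ 1.5,\ 2,

so that, for example, \delta = 0 reproduces the mean (1.748) and \delta = 0.5 gives the value 2.385 listed in Table 15.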
Table 15  Synthetic CTR Values
Site: 4-39, 580W, Alameda County
Mean of Sample = 1.748

Standard deviations above Mean     0        0.5      1        1.5      2
Synthetic CTR                      1.748    2.385    3.022    3.659    4.295
For a synthetic CTR value of 1.748, the discharge was determined to be "dirty" using the 303(d) listing factors. As a result, the β values shown in Table 16 were calculated. Note that the β values are substantially larger than they were for the true CTR of 0.97 µg/L, because the synthetic CTR is higher in relation to the data set. For data sets that lie below, but near, the WQO, the risk of erroneously deciding that the discharge is clean when in fact it is dirty increases significantly. As shown in Table 16, the risk decreases with an increasing number of samples.
Table 16  Results for Synthetic CTR = 1.748
Site: 4-39, 580W, Alameda County
Synthetic CTR: 1.748
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       NA        NA        NA        NA
  β                                       0.69      0.4       0.23      0.05
When the synthetic CTR value is raised in relation to the data to a value of 2.385 (one standard deviation above the mean), the 303(d) classification changes from "dirty" to "clean". This is because only four samples in the original data set are higher than 2.385. As a result, the α values were calculated, as shown in Table 17. The α values are observed to increase with an increasing number of samples, as in the earlier cases.
Table 17  Results for Synthetic CTR = 2.385
Site: 4-39, 580W, Alameda County
Synthetic CTR: 2.385
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.03      0.08      0.1       0.26
  β                                       NA        NA        NA        NA
These calculations were repeated for synthetic CTR values of 3.022, 3.659, and 4.295, as shown in Tables 18, 19, and 20, respectively.
Table 18  Results for Synthetic CTR = 3.022
Site: 4-39, 580W, Alameda County
Synthetic CTR: 3.022
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.04      0.07      0.104     0.25
  β                                       NA        NA        NA        NA
Table 19  Results for Synthetic CTR = 3.659
Site: 4-39, 580W, Alameda County
Synthetic CTR: 3.659
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.017     0.03      0.036     0.19
  β                                       NA        NA        NA        NA
Table 20  Results for Synthetic CTR = 4.295
Site: 4-39, 580W, Alameda County
Synthetic CTR: 4.295
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.005     0.007     0.007     0.0252
  β                                       NA        NA        NA        NA
To better understand why α increases with sample size, the 95% LCL was investigated for each sample size. The simulation was run 1,000 times, the array of 1,000 95% LCLs was sorted in ascending order, and the summary statistics shown in Table 21 were calculated from it. The average 95% LCL of the 1,000 samples increases as the sample size increases. This is consistent with the idea of a lower confidence limit: as the number of samples increases, the width of the confidence interval decreases. As the LCL increases with sample size, the probability of exceeding the fixed CTR value also increases, which increases α. This indicates that the risk of making a Type I error increases with increasing sample size, contrary to the expectation that the probability of error should decrease as the number of data points increases. Consequently, this suggests that Gibbons' method, as described in his paper, is not a good method for calculating α in this bootstrapping-type procedure; comparing the LCL to the CTR does not seem to be the best way to calculate α. Further investigation of the distribution of the simulated 90th percentile is recommended.
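A minimal sketch of the kind of check used here is shown below; it assumes an array LCL(1 To n) of simulated lower confidence limits has already been produced by the Alternate Statistical Model loop, and the use of the empirical 5th and 95th percentiles (the 50th and 950th ordered values for n = 1,000) for the bounds in Table 21 is an assumption.

' Illustrative sketch: summary statistics for an array of simulated 95% LCLs.
' Assumes LCL(1 To n) is already populated (e.g., n = 1000); the empirical
' 5th/95th percentile bounds are an assumption about how Table 21 was built.
Sub SummarizeLCL(LCL() As Double, n As Long)
    Dim i As Long, p As Long, temp As Double, sum As Double, sumsq As Double, avg As Double
    'Sort the LCLs into ascending order (bubble sort, as in the Appendix B code)
    For p = 1 To n - 1
        For i = p + 1 To n
            If LCL(i) < LCL(p) Then
                temp = LCL(p)
                LCL(p) = LCL(i)
                LCL(i) = temp
            End If
        Next i
    Next p
    'Mean and standard deviation of the LCLs
    For i = 1 To n
        sum = sum + LCL(i)
    Next i
    avg = sum / n
    For i = 1 To n
        sumsq = sumsq + (LCL(i) - avg) ^ 2
    Next i
    Debug.Print "Average LCL: "; avg
    Debug.Print "Stdev LCL:   "; Sqr(sumsq / (n - 1))
    Debug.Print "Lower bound: "; LCL(CLng(0.05 * n))
    Debug.Print "Upper bound: "; LCL(CLng(0.95 * n))
End Sub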
Table 21  Summary Statistics for Non-Parametric 95% LCL of 90th Percentile
Site: 4-39, 580W, Alameda County

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Average LCL                               1.44      1.79      1.92      2.36
Stdev LCL                                 0.92      1.097     1.15      1.451
95% LCL                                   0.9       1         1.2       1.7
95% UCL                                   3.3       4.1       4.1       4.7
Chapter 5
CONCLUSION
In this project, annual data sets were simulated by sampling from existing data with two Excel Visual Basic for Applications (VBA) computer models. The purpose of the simulations was to determine the statistical power of the test outlined in the proposed Caltrans stormwater discharge permit (Tentative Order No. 2011-XX-DWQ) and to suggest an alternate method that is applicable to a wide variety of statistical concentration distributions.
When the Water Quality Action Levels specified in the Tentative Order are applied to a sample size of three, the α value is 0.005 at Site 2-02, 5N, Tehama County, and the β value is 0.53 at Site 4-39, 580W, Alameda County. The synthetic CTR analysis indicates that as the CTR moves away from the mean of the data, the chance of making a Type I error decreases. For a given CTR, β was observed to decrease with sample size. The α value was observed to increase with sample size, which is generally not expected. Further research should be conducted to investigate the distribution of the simulated 90th percentile and how it affects the behavior of α.
APPENDIX A
Flowcharts
VBA Logic for Get Input:
1. Start.
2. Count the number of cells in the Reported Values column of the Input worksheet.
3. Re-dimension the array (0 to number of cells).
4. Read the values in the Reported Values column of the Input sheet and add them to the array.
5. Write the array to 'Data_Table' in the Output worksheet and resize the table.
6. End.
VBA Logic for Results Button:
1. Start.
2. Read k and the CTR from the User Defined Input Table in the Output worksheet.
3. Assign 'Data_Table' to the variable DataInput.
4. Re-dimension the Samplearray and Ruleviolations arrays based on k.
5. (A) Initialize j to 1 (iteration count) and i to 1 (sample count).
6. Compute M = 1 + Int(Rnd() * DataInput.Count) and write the randomly selected value to Samplearray(i, j).
7. Increment i; if i < k, repeat step 6. When i = k, increment j; if j < n, return to step 5. When j = n, Samplearray is stored as a virtual array.
8. (B) Initialize j to 1 (iteration count); (E) initialize i to 1 (sample count).
9. Check whether Samplearray(i, j) > rule1 and whether Samplearray(i, j) > rule2; calculate the mean and standard deviation of Samplearray(i, j); write the results to the 'Ruleviolations' array.
10. (C) Increment i; if i < k, repeat step 9. When i = k, increment j; if j < n, return to step 8. When j = n, Ruleviolations is stored as a virtual array.
11. Calculate the number of times rule1 and rule2 are exceeded and the total number of times the rule pair is exceeded, and write the output to the Output Table in the Output worksheet.
12. (F) Set the number of bins to 10, compute the bin length from DataInput, initialize the bin counter p to 1, and compute the class intervals breaks(p) and the frequencies frequency(1 to 10).
13. Check whether Samplearray(i, j) > breaks(p-1) and < breaks(p); if yes, (D) increment frequency(p); if no, move to the next p.
14. Write breaks(p) and the frequencies to the Output worksheet, then return to (E) for the next iteration.
15. End.
APPENDIX B
VBA Program Code
This is the VBA Code for the Permit Monitoring Model:
1. Get Input Button
Sub GetInputbutton()
'Declaring Variables
Dim myinputsheet As Worksheet
Dim myOutputsheet As Worksheet
Dim cnt As Integer, i As Integer, j As Integer, y As Integer
Dim Rng As Range
Dim myArray() As Double
'This macro copies the Reported values from the sheet named "Input" and pastes it into sheet
named "Output".
Set myOutputsheet = ActiveWorkbook.Worksheets("Output")
myOutputsheet.ListObjects("Data_Table").ListColumns(1).DataBodyRange.ClearContents
Set myinputsheet = ActiveWorkbook.Worksheets("Input(Clean)")
'This part of the code is to count the number of data points in the reported values column in the
Input worksheet.
i=9
cnt = 0
Do While (Len(Trim(myinputsheet.Cells(i, 9))) > 0)
cnt = cnt + 1
i=i+1
Loop
'The array is redimensioned depending on the number of data points and the data is written to
the array
ReDim myArray(0 To cnt)
i=9
j=0
For i = 9 To (cnt + 8) Step 1
myArray(j) = myinputsheet.Cells(i, 9)
'Debug.Print myarray(j)
j=j+1
Next i
'The array is written to the Output worksheet
i=8
j=0
x = UBound(myArray)
For i = 8 To (UBound(myArray) + 7) Step 1
myOutputsheet.Cells(i, 2) = myArray(j)
j=j+1
Next i
'Resizes the Data_Table in Outputworksheet
y = cnt + 7
myOutputsheet.ListObjects("Data_Table").Resize Range("B7:C" & y)
End Sub
2. Results Button
Sub Resultsbutton()
'Declare Variables
Dim Samplearray() As Double
'Samplearray is where the randomly generated sub-samples are stored virtually
Dim Ruleviolations() As Variant
'Ruleviolations is where the Rule violations are stored virtually
Dim myworksheet As Worksheet
Dim i As Long, j As Long, n As Integer, k As Long, M As Integer, cnt1 As Integer, cnt2 As Integer, _
    cnt3 As Integer, cnt4 As Integer, cnt5 As Long, Mean As Double, SumSq As Double, _
    StdDev As Double, Mean1 As Double, p As Long, x As Integer, s As Double
'i is the counter for the number of subsamples per iteration,k is the number of subsamples per
iteration,j is the counter for the number of iterations, n is the number of iterations. M is the
temporary variable to store the random number.
'cnt1 is the number of times Rule1 is exceeded in a particular iteration, cnt2 is the number of
times rule 1 is violated, cnt3 is the number of times rule2 is exceededed in a particular iteration,
cnt4 is the number of times rule 1 and 2 is exceeded, cnt 5 is the number of times rule 2 is
exceeded
'mean is the mean of each subsample,Sumsq is the sum of squares of the subsample it helps in
calculating the standard deviation,StdDev is the standard deviation of each subsample, Mean1
is the Mean of the Means of all subsamples, p is the counter variable for the histogram
calculations.
Dim DataInput As Range
Dim rule1 As Double, rule2 As Double
'rule1 is the CTR value. The CTR value is obtained from Table 3-18 of the Statewide Discharge Characterization Report.
'rule1 states that greater than or equal to 3 exceedance of WQO(in this case CTR) is a
violation.
'rule2 states that greater than or equal to 2 exceedances of WQO( in this case CTR) by 50% or
more is a violation.
Dim value As Double, value1 As Double, value2 As Double
'value1 is a variable to hold each array element and help in the mean calculation
'value2 is a variable that holds each array element for the calculation of mean for the summary
statistics
Dim Length As Double
ReDim breaks(10) As Double
ReDim freq(10) As Double
Set myworksheet = ActiveWorkbook.Worksheets("Output")
Set DataInput = myworksheet.ListObjects("Data_Table").ListColumns(2).DataBodyRange
k = Sheets("Output").Range("M8").value
rule1 = Sheets("Output").Range("M9").value 'CTR value in User Defined table
'gets the value for input variable k and CTR from the respective cells in Outputsheet.
ReDim Samplearray(k, 100000)
ReDim Ruleviolations(100000, 6)
'This part of the code is for sampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = 1 + Int(Rnd * DataInput.Count) 'random number generation
Samplearray(i, j) = DataInput.Cells(M, 1) 'Samplearray is the virtual array
Next i
Next j
' This part of the code is for checking Rules 1 and 2.
cnt2 = 0
cnt3 = 0
cnt4 = 0
cnt5 = 0
cnt6 = 0
cnt7 = 0
For j = 1 To 100000 Step 1
cnt1 = 0
cnt3 = 0
sumofvals = 0
sumofvalue1 = 0
For i = 1 To k Step 1
value = Samplearray(i, j) 'value stores elements to check rule1
value1 = Samplearray(i, j) 'value1 stores elements to help in the calculation of Mean
sumofvalue1 = sumofvalue1 + value1 'sumofvalue1 is the sum of values in a sample
example sumof values in sample1
If (value > rule1) Then 'rule1=CTR value
cnt1 = cnt1 + 1
End If
rule2 = rule1 * 1.5 'checking rule2
If (value > rule2) Then
cnt3 = cnt3 + 1
End If
Next i
Ruleviolations(j, 1) = j
If (cnt1 = 3 Or cnt1 > 3) Then 'checking rule 1
Ruleviolations(j, 2) = True ' rule1 is exceeded
cnt2 = cnt2 + 1 'cnt2 is the number of times rule1 is exceeded
Else
Ruleviolations(j, 2) = False 'rule1 is not violated
End If
If (cnt3 = 2 Or cnt3 > 2) Then 'checking rule 2
Ruleviolations(j, 3) = True ' rule2 is exceeded
Cnt5 = cnt5 + 1 'cnt5 is the number of times rule2 is exceeded
Else
Ruleviolations(j, 3) = False
End If
If (Ruleviolations(j, 2) = True And Ruleviolations(j, 3) = True) Then
cnt4 = cnt4 + 1 'cnt4 is the counter variable for the number of times rule 1 and rule2 are
exceeded.
Else
cnt4 = cnt4
End If
Mean = (sumofvalue1 / k) ' code for calculating mean
Ruleviolations(j, 5) = Mean
value2 = Ruleviolations(j, 5)
sumofvalue2 = sumofvalue2 + value2
Mean1 = (sumofvalue2 / 100000) 'Mean1 is the mean of all the iterations that will be
output into Summary Statistics
SumSq = 0 'Code for calculating standard deviation
For i = 1 To k
SumSq = SumSq + (Samplearray(i, j) - Mean) ^ 2
StdDev = Sqr(SumSq / (k - 1))
Ruleviolations(j, 6) = StdDev
Next i
value3 = Ruleviolations(j, 6)
SumSq1 = SumSq1 + value3
StdDev1 = (SumSq1 / 100000) 'average of standard dev
'This part of code is for outputing the bins and Frequency needed for the histogram
Length = (((WorksheetFunction.Max(DataInput)) - (WorksheetFunction.Min(DataInput))) / 10)
'binsize, Considering number of bins to be fixed and No. of bins = 10
For p = 1 To 10 Step 1
breaks(p) = (WorksheetFunction.Min(DataInput)) + (Length * p) 'calculating class
intervals
Next p
For i = 1 To k Step 1
If (Samplearray(i, j) <= breaks(1)) Then freq(1) = freq(1) + 1
If (Samplearray(i, j) > breaks(1) And Samplearray(i, j) <= breaks(2)) Then freq(2) =
freq(2) + 1
If (Samplearray(i, j) > breaks(2) And Samplearray(i, j) <= breaks(3)) Then freq(3) =
freq(3) + 1
If (Samplearray(i, j) > breaks(3) And Samplearray(i, j) <= breaks(4)) Then freq(4) =
freq(4) + 1
If (Samplearray(i, j) > breaks(4) And Samplearray(i, j) <= breaks(5)) Then freq(5) =
freq(5) + 1
If (Samplearray(i, j) > breaks(5) And Samplearray(i, j) <= breaks(6)) Then freq(6) =
freq(6) + 1
If (Samplearray(i, j) > breaks(6) And Samplearray(i, j) <= breaks(7)) Then freq(7) =
freq(7) + 1
If (Samplearray(i, j) > breaks(7) And Samplearray(i, j) <= breaks(8)) Then freq(8) =
freq(8) + 1
If (Samplearray(i, j) > breaks(8) And Samplearray(i, j) <= breaks(9)) Then freq(9) =
freq(9) + 1
If (Samplearray(i, j) >= breaks(9)) Then freq(10) = freq(10) + 1
Next i
Next j
For p = 1 To 10 Step 1
Cells(p + 8, 30) = breaks(p)
Cells(p + 8, 31) = freq(p)
Next p
'To check if the sample is clean, using the 303(d) listing process for a sample size of 5-30
If (s < 5) Then
Sheets("Output").Cells(21, 13) = "Clean"
Else
Sheets("Output").Cells(21, 13) = "Dirty"
End If
'This part of the code is to write the output to the Output worksheet
Sheets("Output").Cells(14, 13) = cnt2
Sheets("Output").Cells(15, 13) = cnt5
Sheets("Output").Cells(18, 13) = cnt4
Sheets("Output").Cells(12, 13) = Mean1
Sheets("Output").Cells(13, 13) = StdDev1
MsgBox "Done"
End Sub
This is the VBA code for the Alternate Statistical Model:
1. Get Sample button
Sub GetSamplebutton()
'Declaring Variables
Dim myinputsheet As Worksheet
Dim myAlternatesheet As Worksheet
Dim cnt As Integer, i As Integer, j As Integer, z As Integer
Dim sample() As Double
'This macro copies the Reported values from the sheet named "Input" and pastes it into sheet
named "Alternate".
Set myAlternatesheet = ActiveWorkbook.Worksheets("Alternate")
myAlternatesheet.Range("Sample_Table").ClearContents
Set myinputsheet = ActiveWorkbook.Worksheets("Input(Clean)")
'This part of the code is to count the number of data points in the reported values column in the
Input worksheet.
i=9
cnt = 0
Do While (Len(Trim(myinputsheet.Cells(i, 9))) > 0)
cnt = cnt + 1
i=i+1
Loop
'The array is redimensioned depending on the number of data points and the data is written to
the array
ReDim sample(0 To cnt)
i=9
j=0
For i = 9 To (cnt + 8) Step 1
sample(j) = myinputsheet.Cells(i, 9)
'Debug.Print sample(j)
j=j+1
Next i
'The array is written to the Alternate worksheet
i=8
j=0
x = UBound(sample)
For i = 8 To (UBound(sample) + 7) Step 1
myAlternatesheet.Cells(i, 2) = sample(j)
j=j+1
Next i
'Resizes the Sample_Table in Alternate worksheet
z = cnt + 7
myAlternatesheet.ListObjects("Sample_Table").Resize Range("B7:B" & z)
End Sub
2. Check Button
a. This is for Non-Parametric data distribution.
Sub Samplerbutton()
'Declaring Variables
Dim Samplearray() As Double
'Samplearray is where the samples are stored virtually
Dim tempcolumn() As Double
Dim percent() As Double
Dim LCL() As Double
Dim myworksheet As Worksheet
Dim i As Long, j As Long, k As Long, M As Integer, t As Long, p As Long, temp As Double, _
    Mean As Double, cnt As Integer, value As Double, StDev As Double, x As Long, WQO As Double
'i is the counter for the number of subsamples per iteration,k is the number of
subsamples per iteration,j is the counter for the number of iterations, n is the number
of iterations, it is a user input. M is the temporary variable to store the random
number.
Dim DataInput As Range
Dim SampleResults As Range
'SampleResults is where the samples will be displayed in Alternate worksheet
Dim Samplepercentile As Range
Dim SampleLCL As Range
Set myworksheet = ActiveWorkbook.Worksheets("Alternate")
Set DataInput = myworksheet.ListObjects("Sample_Table").DataBodyRange
k = Sheets("Alternate").Range("K10").value
WQO = Sheets("Alternate").Range("K7").value
'gets the value for input variable k Alternatesheet.
ReDim Samplearray(k, 100000)
ReDim tempcolumn(k)
ReDim percent(100000)
ReDim LCL(100000)
'This part of the code is for randomsampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = 1 + Int(Rnd * DataInput.Count)
Samplearray(i, j) = DataInput.Cells(M, 1)
Next i
Next j
'This part of the code is for Bubblesort
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
For p = 1 To (UBound(Samplearray) - 1)
For t = p To UBound(Samplearray)
If Val(Samplearray(t, j)) < Val(Samplearray(p, j)) Then
temp = Samplearray(p, j)
Samplearray(p, j) = Samplearray(t, j)
Samplearray(t, j) = temp
End If
Next t
Next p
'This line of code is to check if the bubblesort code is working correctly, the array is
written to the worksheet 'Alternate'.
'SampleResults.Cells(i, j) = Samplearray(i, j)
'From Gibbons' binomial method, the order statistic for the LCL is selected for various sample sizes.
If k = 5 Then LCL(j) = Samplearray(3, j)
If k = 10 Then LCL(j) = Samplearray(7, j)
If k = 15 Then LCL(j) = Samplearray(11, j)
If k = 20 Then LCL(j) = Samplearray(16, j)
'SampleLCL.Cells(j) = LCL(j)
Next i
If (LCL(j) < WQO) Then
'counting the number of times 95% LCL < CTR
x=x+1
Else
x=x
End If
Next j
Sheets("Alternate").Cells(13, 11) = x
MsgBox "Done"
End Sub
b.
This is for Log-Normal Distribution
Sub Samplerbutton()
'Declaring Variables
Dim Samplearray() As Double
'Samplearray is where the samples are stored virtually
Dim tempcolumn() As Double
Dim percent() As Double
Dim Mean() As Double
Dim stdev() As Double
Dim LCL() As Double
Dim myworksheet As Worksheet
Dim i As Integer, j As Long, k As Long, M As Double, t As Long, p As Long, temp As
Double, cnt As Integer, value As Double, sumofvalue As Double, Coeff As Double, x As
Integer
'i is the counter for the number of subsamples per iteration,k is the number of
subsamples per iteration,j is the counter for the number of iterations, n is the number
of iterations, it is a user input. M is the temporary variable to store the random
number.
Dim DataInput As Range
Dim SampleResults As Range
'SampleResults is where the samples will be displayed in Alternate worksheet
Dim Samplepercentile As Range
Dim SampleMean As Range
Dim SampleStdev As Range
Dim SampleLCL As Range
Set myworksheet = ActiveWorkbook.Worksheets("Alternate")
Set DataInput = myworksheet.ListObjects("Sample_Table").DataBodyRange
k = Sheets("Alternate").Range("K10").value
WQO = Sheets("Alternate").Range("K7").value
'gets the value for input variable k Alternatesheet.
ReDim Samplearray(k, 100000)
ReDim tempcolumn(k)
ReDim percent(100000)
ReDim Mean(100000)
ReDim stdev(100000)
ReDim LCL(100000)
'This part of the code is for randomsampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = Application.WorksheetFunction.Norm_Inv(Rnd(), 1.98, 1.41)
Samplearray(i, j) = M
'This part of the code is for Bubblesort
Next i
Next j
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
For p = 1 To (UBound(Samplearray) - 1)
For t = p To UBound(Samplearray)
If Val(Samplearray(t, j)) < Val(Samplearray(p, j)) Then
temp = Samplearray(p, j)
Samplearray(p, j) = Samplearray(t, j)
Samplearray(t, j) = temp
End If
Next t
Next p
'This line of code is to check if the bubblesort code is working correctly, the array
is written to the worksheet 'Alternate'.
' SampleResults.Cells(i, j) = Samplearray(i, j)
Next i
Next j
'This part of code is to calculate 90th percentile
For j = 1 To 100000 Step 1
value = 0
sumofvalue = 0
SumSq = 0
If (k = 5) Then Coeff = 0.519
If (k = 10) Then Coeff = 0.712
If (k = 15) Then Coeff = 0.802
If (k = 20) Then Coeff = 0.858
For i = 1 To k Step 1
tempcolumn(i) = Samplearray(i, j)
percent(j) = Application.WorksheetFunction.Percentile_Inc(tempcolumn, 0.9)
'Samplepercentile.Cells(j) = percent(j)
value = Samplearray(i, j) 'Code for calculating mean
sumofvalue = value + sumofvalue
Mean(j) = (sumofvalue / k)
' SampleMean.Cells(j) = Mean(j)
stdev(j) = Application.WorksheetFunction.stdev(tempcolumn)
'SampleStdev.Cells(j) = stdev(j)
LCL(j) = Exp(Mean(j) + Coeff * stdev(j))
'SampleLCL.Cells(j) = LCL(j)
Next i
If (LCL(j) < WQO) Then
x=x+1
Else
x=x
End If
Next j
Sheets("Alternate").Cells(13, 11) = x
MsgBox "Done"
End Sub
REFERENCES
Caltrans (California Department of Transportation), 2003a. Statewide Stormwater Management Plan. CTSW-RT-02-008.
Caltrans (California Department of Transportation), 2003b. Discharge Characterization Study Report. CTSW-RT-03-065.51.42.
Caltrans (California Department of Transportation), 2009a. BMP Pilot Study Guidance Manual. CTSW-RT-06-171.02.1.
Caltrans (California Department of Transportation), 2009b. Storm Water Monitoring and Data Management. CTSW-RT-03-069.51.42.
Conover, W.J., 1980. Practical Nonparametric Statistics. John Wiley & Sons, Inc., pp. 95-99.
CSWRCB (California State Water Resources Control Board), 2004. Water Quality Control Policy for Developing California's Clean Water Act Section 303(d) List.
CSWRCB (California State Water Resources Control Board), 2011a. National Pollutant Discharge Elimination System (NPDES): Tentative Order No. 2011-XX-DWQ, NPDES No. CAS000003, August 18, 2011.
CSWRCB (California State Water Resources Control Board), 2011b. National Pollutant Discharge Elimination System (NPDES): Fact Sheet, Tentative Order No. 2011-XX-DWQ, NPDES No. CAS000003, August 18, 2011.
Gibbons, J.D., I. Olkin, and M. Sobel, 1977. Selecting and Ordering Populations: A New Statistical Methodology. Wiley, New York.
Gibbons, R.D., 2003. A Statistical Approach for Performing Water Quality Impairment Assessment. Journal of the American Water Resources Association, 39(4):841-849.
Granicher, T., 2011. Office of Water Programs, California State University, Sacramento. Personal Communication.
Hahn, G.J. and W.Q. Meeker, 1991. Statistical Intervals: A Guide for Practitioners. John Wiley & Sons, Inc., pp. 75-169.
Kvam, P.H. and B. Vidakovic, 2007. Nonparametric Statistics with Applications to Science and Engineering. John Wiley & Sons, Inc., pp. 69-75.
Madansky, A., 1988. Prescriptions for Working Statisticians. Springer, pp. 25-30.
Mansfield, R., 2007. Mastering VBA for Microsoft Office 2007. Wiley Publishing, Inc.
Shapiro, S.S. and M.B. Wilk, 1965. An analysis of variance test for normality (complete samples). Biometrika, 52:591-611.
Singh, A.K., A. Singh, and M. Engelhardt, 1997. The Lognormal Distribution in Environmental Applications. U.S. Environmental Protection Agency, EPA/600/S-97/006.
Smith, E.P., K. Ye, C. Hughes, and L. Shabman, 2001. Statistical Assessment of Violations of Water Quality Standards under Section 303(d) of the Clean Water Act. Environmental Science and Technology, 35:606-612.
USEPA (U.S. Environmental Protection Agency), 1989. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Interim Final Guidance. EPA 530/R-89-026.
USEPA (U.S. Environmental Protection Agency), 1997. Guidelines for the Preparation of the Comprehensive State Water Quality Assessments (305(b) Reports) and Electronic Updates: Supplements. EPA-841-B-97-002A.
USEPA (U.S. Environmental Protection Agency), 2009a. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Unified Guidance. EPA 530/R-09-007.
Code of Federal Regulations, 2009. Title 40, Part 122.