ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
A Project
Presented to the faculty of the Department of Civil Engineering
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Civil Engineering
by
Lakshmi Jayaprakash
SPRING
2012
© 2012
Lakshmi Jayaprakash
ALL RIGHTS RESERVED
ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
A Project
by
Lakshmi Jayaprakash
Approved by:
__________________________________, Committee Chair
John Johnston, Ph.D., P.E.
__________________________________, Second Reader
Ramzi Mahmood, Ph.D., P.E.
__________________________
Date
Student: Lakshmi Jayaprakash
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Department Chair
Ramzi Mahmood, Ph.D., P.E.
Department of Civil Engineering
___________________
Date
ABSTRACT
of
ASSESSING THE STATISTICAL ACCURACY OF A MONITORING PROGRAM
PROPOSED FOR THE CALIFORNIA DEPARTMENT OF TRANSPORTATION
STORMWATER DISCHARGE PERMIT
by
Lakshmi Jayaprakash
In the 2011 revision to the California Department of Transportation (Caltrans)
statewide stormwater permit (Tentative Order No. 2011-XX-DWQ), the California State
Water Resources Control Board required Caltrans to monitor discharge flows for only
three events each year and, using those results, make a determination as to whether the
runoff characteristics exceed water quality objectives (WQO). The process for making
the determination was specified in the permit in the form of two water quality action
levels: (1) three or more exceedances of a water quality objective (WQO) or (2) two or
more exceedances of a WQO by more than 50 percent. The goal of this project is to
assess the accuracy of these rules as a function of the number of field samples collected
per year.
The permit rules were applied to 100,000 randomly chosen samples of specified
sizes drawn from historic Caltrans data sets for total cadmium at three sites. The
California 303(d) listing process was used to determine whether the historic discharges
exceeded the WQO. The results of the permit rules applied to the multiple simulated
sample data sets and the 303(d) listing process (representing the “true” result) were
compared. From these comparisons, the probability or frequency of incorrect conclusion
arising from using the permit rules was calculated. An alternate statistical approach in
which the WQO was compared to the 95 percent lower confidence limit (LCL) of the 90th
percentile of the data was tested in a similar manner (i.e., 100,000 random simulations of
various sample sizes).
_______________________, Committee Chair
John Johnston, Ph.D., P.E.
_______________________
Date
ACKNOWLEDGEMENTS
The author would like to thank Dr. John Johnston, Dr. Ramzi Mahmood and Dr. Dipen
Patel for their support, input, guidance and review of this project. In addition, she would
like to thank Dr. Ramzi Mahmood and Dr. John Johnston again for their support and
guidance throughout the graduate program. Lastly, the author would like to thank her
family for their support throughout the program.
TABLE OF CONTENTS
Page
Acknowledgements ........................................................................................................... vii
List of Tables .......................................................................................................................x
List of Figures ................................................................................................................... xii
Chapters
1. INTRODUCTION ...........................................................................................................1
2. BACKGROUND .............................................................................................................4
2.1 Stormwater Definition .................................................................................. 4
2.2 Stormwater Discharge Characterization Study ........................................... 5
2.2.1 Constituents Monitored ............................................................................ 9
2.2.2 Factors Affecting Runoff Quality ........................................................... 10
2.2.3 Comparison with Water Quality Objectives ........................................... 11
2.3 Provisions of Tentative Order No. 2011-XX-DWQ ................................... 16
2.4 Water Quality Control Policy ..................................................................... 22
2.4.1 California Listing Procedure for Numeric Water Quality Objectives .... 22
2.5 Gibbons' Method ........................................................................................ 26
3. METHODOLOGY ........................................................................................33
3.1 Caltrans Data ............................................................................................... 33
3.2 303(d) Listing Process ................................................................................ 35
3.3 Permit Monitoring Model Program ............................................................ 35
3.4 Alternate Statistical Model ......................................................................... 38
3.5 Delta Procedure ........................................................................................... 40
4. RESULTS AND DISCUSSION ....................................................................41
4.1 Permit Monitoring Model Results .............................................................. 41
4.2 Alternate Statistical Model ......................................................................... 48
4.3 Delta Procedure ........................................................................................... 51
5. CONCLUSION ..............................................................................................................56
Appendix A Flowcharts .....................................................................................................57
Appendix B VBA Program Code ......................................................................................64
References ..........................................................................................................................73
LIST OF TABLES
Tables
Page
Table 1 Water Quality Parameter Monitored in Stormwater Runoff (Table 2-1 of Caltrans
2003b) ....................................................................................................................... 10
Table 2 Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans,
2003b) ....................................................................................................................... 13
Table 3 Comparisons of Caltrans runoff quality data with CTR and other relevant water
quality objectives (Table 3-18, Caltrans, 2003b) ...................................................... 15
Table 4 Water Quality Action Levels (CSWRCB, 2011a) ............................................... 19
Table 5 Minimum number of Exceedances to Place a Water Segment on Section 303(D)
List for Conventional or Other Pollutants (CSWRCB, 2004) .................................. 24
Table 6 One-Sided Factors for 95% Confidence LCL for the 75th and 90th percentile of
the distribution k = 4 to 1000(Gibbons, 2003) .......................................................... 32
Table 7 Input data Total Cadmium (Granicher, 2011) ..................................................... 34
Table 8 User-Defined Input .............................................................................................. 38
Table 9 Permit Monitoring Model Results for Site 1 ....................................................... 41
Table 10 Permit Monitoring Model Results for Site 2 ..................................................... 44
Table 11 Permit Monitoring Model Results for Site 3 ..................................................... 46
Table 12 Alternate Statistical Model Results for Site 1 .................................................... 49
Table 13 Alternate Statistical Model Results for Site 2 .................................................... 50
Table 14 Alternate Statistical Model Results for Site 3 .................................................... 51
Table 15 Synthetic CTR Values ....................................................................................... 51
Table 16 Results for Synthetic CTR = 1.748 .................................................................... 52
Table 17 Results for Synthetic CTR = 2.385 .................................................................... 53
Table 18 Results for Synthetic CTR = 3.022 .................................................................... 53
Table 19 Results for Synthetic CTR = 3.659 .................................................................... 53
Table 20 Results for Synthetic CTR = 4.295 .................................................................... 54
Table 21 Summary Statistics for Non-Parametric 95% LCL of 90th percentile ............... 55
LIST OF FIGURES
Figures
Page
Figure 1- Stormwater Monitoring Sites (Caltrans, 2003b) ................................................. 7
Figure 2- Typical Stormwater Monitoring Facilities (Caltrans, 2003b) ............................. 8
Figure 3- Flowchart of the Water Quality Monitoring Process (CSWRCB, 2011a) ........ 20
Figure 4- 95% LCL of the 90th percentile compared to WQO ......................................... 28
Figure 5- Comparing Simulated and Actual Data for Site 1 ............................................. 43
Figure 6- Comparing Simulated and Actual data for Site 2.............................................. 45
Figure 7- Comparison of Simulated and Actual Data for Site 3 ....................................... 47
Chapter 1
INTRODUCTION
The California Department of Transportation (hereafter Caltrans) operates
under a statewide stormwater permit issued under the National Pollutant Discharge
Elimination System (NPDES) by the State Water Resources Control Board (hereafter the
State Board). On August 18, 2011, the State Board issued a proposed renewal and
revision of the Caltrans permit in the form of Tentative Order No. 2011-XX-DWQ
(CSWRCB, 2011a). The Tentative Order contains a water quality monitoring process
and water quality action levels. If the action levels are exceeded for a direct discharge to
receiving waters, Caltrans must institute monitoring of receiving water, which is an
expensive undertaking. If subsequent monitoring shows that the receiving water does not
exceed the water quality standards, then the monitoring can be discontinued. If water
quality standards are exceeded, a stormwater Best Management Practice (BMP) must be
installed and monitoring of the discharge must continue. The discharge monitoring
required to determine whether or not a site exceeds water quality action levels consists of
only three samples analyzed over the space of a year. The site is deemed to exceed the water quality action levels if all three samples exceed local water quality objectives or if two of the samples exceed the objectives by at least 50 percent (CSWRCB, 2011a).
Given the normal wide variability in stormwater quality, three samples is a very small
number on which to base a statistical determination. It is possible that such small
numbers may lead to many incorrect determinations. Under its current permit (Order No.
99-06-DWQ), Caltrans conducted a Statewide Discharge Characterization Study
(Caltrans, 2003b) to identify concentrations and loads for pollutants of concern in runoff
from Caltrans facilities such as highways, rest stops, and maintenance facilities. This
three-year study generated 60,000 data points. More importantly, the data includes 20-30
samples at each of the discharge locations. These larger site-specific data sets allow an
opportunity to evaluate the reliability of the three-sample, one-year protocol in the
Tentative Order.
In this project, data collected during the Caltrans Statewide Discharge
Characterization Study will be used to evaluate whether runoff from a site exceeds a
water quality objective (WQO) using the legally accepted statistical methods used to
determine whether a receiving water violates WQOs for the Clean Water Act 303(d)
program (CSWRCB, 2004). A computer model will be constructed to simulate runoff by
randomly choosing sample results from existing data at a site. Then the rules proposed in
the Tentative Order will be applied to determine if the stormwater quality exceeds a
WQO. A second method, proposed by Gibbons (2003), will also be investigated. In this
method, the lower confidence limit of the 90th percentile runoff concentration is
compared with the WQO. For both cases, model results will be used to assess the
reliability of the three-sample methods proposed in the permit in correctly determining
whether or not runoff exceeds WQOs. The probabilities of Type I and Type II errors will
be calculated.
The goal of this project is to develop a process for assessing the accuracy of
methods that use a limited number of samples. The process will be tested using total
cadmium data from several highway sites, but full evaluation of all the constituents
discharged from all the sites monitored by Caltrans is beyond the scope of this project.
Chapter 2
BACKGROUND
In 1972, the Federal Clean Water Act (CWA) was amended so that the discharge
of pollutants to waters of the United States from any point source is unlawful unless the
discharge is in compliance with an NPDES permit. In 1987, amendments were made to
the CWA and Section 402(p) was added. This section established a framework for
regulating municipal and industrial stormwater discharges under the NPDES permit
program. On November 16, 1990, the U.S. Environmental Protection Agency (USEPA)
published federal regulations to control the pollutant levels in stormwater runoff
discharges (CSWRCB, 2011). Prior to July 1999, Caltrans stormwater discharges were
regulated by individual permits issued by different Regional Water Quality Control
Boards (hereafter the Regional Boards). On July 15, 1999, the State Water Resources
Control Board issued Caltrans its first statewide stormwater permit (Order No. 99-06-DWQ). This statewide permit regulates all Caltrans facilities including maintenance
stations, equipment storage areas, storage facilities, fleet vehicle parking, and discharges
from highways and all rights-of-way owned by Caltrans. The proposed new permit
(Tentative Order No. 2011-XX-DWQ) will supersede the current order.
2.1 Stormwater Definition
Stormwater is categorized into two types:
1. Stormwater Discharge - Stormwater discharges consist only of discharges that result from precipitation events. Stormwater is defined in 40 C.F.R. § 122.26(b)(13) as stormwater runoff, snowmelt runoff, and surface runoff and drainage (Code of Federal Regulations, 2009).
2. Non-Stormwater Discharge - Non-stormwater discharges consist of all discharges to an MS4 that do not result from precipitation events.
Caltrans discharges are mainly generated from the maintenance and operation of
state- owned rights- of- way, department storage and disposal areas, and other properties.
Stormwater and non-stormwater are discharged directly to surface waters or indirectly
through Municipal Separate Storm Sewer Systems (MS4s).
2.2 Stormwater Discharge Characterization Study
After the first statewide NPDES permit (Order No. 99-06-DWQ) was issued,
Caltrans conducted a three-year discharge characterization study starting in
2000 (Caltrans, 2003b). The study was designed to characterize stormwater discharges
from transportation facilities throughout California. Characterization monitoring was
conducted at over 180 sites statewide yielding more than 60,000 concentration data
points. This study provided broad geographic coverage throughout California (see Figure
1). Some typical stormwater quality monitoring facilities are shown in Figure 2. The key
objectives of the characterization study included:
1. Achieving compliance with NPDES permit requirements;
2. Producing data that are scientifically credible and representative of runoff from the Department's facilities and that can be used to define future monitoring needs;
3. Providing information that can be useful to the Department in designing effective stormwater monitoring and management strategies.
Figure 1- Stormwater Monitoring Sites (Caltrans, 2003b)
Figure 2- Typical Stormwater Monitoring Facilities (Caltrans, 2003b)
The sampling program was designed to produce representative data of runoff for the full
range of transportation facility types, geographic locations, traffic levels and land use
categories. Sampling was conducted over a three-year period with up to eight storm
events annually. The study started during the 2000-01 wet season and was completed at
the end of the 2002-03 wet season. Facilities monitored by Caltrans as part of the study
included highways, maintenance stations, park-and-ride lots, rest areas, toll plazas and
weigh stations.
The highway runoff samples were collected from non-urban (Average Annual
Daily Traffic (AADT) ≤ 30,000 vehicles/day) and urban (AADT > 30,000 vehicles/day) highways throughout the state.
2.2.1 Constituents Monitored
The standard list of water quality constituents monitored in the discharge
characterization studies included:
1. Conventional parameters (pH, temperature, TSS, TDS, conductivity, hardness,
TOC, and DOC).
2. Nutrients (nitrate, TKN, orthophosphate-P, and total P),
3. Total and dissolved metals (As, Cd, Cr, Cu, Pb, Ni and Zn), and
4. Selected pesticides.
The minimum list of constituents is shown in Table 1.
Table 1 Water Quality Parameter Monitored in Stormwater Runoff (Table 2-1 of Caltrans, 2003b)

Constituent                                   Units        Reporting Limit
Conventional Pollutants
  Conductivity                                µmhos/cm     1
  Hardness as CaCO3                           mg/L         2
  pH                                          pH Units     ±0.1
  Temperature                                 ºC           ±0.1
  Total Dissolved Solids                      mg/L         1
  Total Suspended Solids                      mg/L         1
  Dissolved Organic Carbon (DOC)              mg/L         1
  Total Organic Carbon                        mg/L         1
Nutrients
  Nitrate as Nitrogen (NO3-N)                 mg/L         0.1
  Total Kjeldahl Nitrogen (TKN)               mg/L         0.1
  Total Phosphorous                           mg/L         0.03
  Dissolved Ortho-Phosphate                   mg/L         0.03
Metals (total recoverable and dissolved)
  Arsenic                                     µg/L         1
  Cadmium                                     µg/L         0.2
  Chromium                                    µg/L         1
  Copper                                      µg/L         1
  Lead                                        µg/L         1
  Nickel                                      µg/L         2
  Zinc                                        µg/L         5
Herbicides
  Diuron                                      µg/L         1
  Glyphosate                                  µg/L         5
  Oryzalin                                    µg/L         1
  Oxadiazon                                   µg/L         0.05
  Triclopyr                                   µg/L         0.1

2.2.2 Factors Affecting Runoff Quality
Environmental factors that affect the quality of edge-of-pavement runoff are
average annual daily traffic (AADT), temporal trend variations (annual, seasonal, and intra-storm intervals), precipitation characteristics (especially antecedent dry period), cumulative seasonal rainfall and event rainfall amount (Caltrans, 2003b). Pollutant
concentrations were found to be higher for Caltrans facilities with higher AADT,
particularly highways and toll plazas. Concentrations of pollutants were found to be
higher early in the wet season due to build-up of pollutant on the roadway during dry
periods. Longer antecedent dry periods led to higher pollutant concentrations. Runoff
pollutant concentrations decreased as storm size increased; smaller storms produced
higher pollutant concentrations in runoff than storms with larger rainfall amounts.
2.2.3 Comparison with Water Quality Objectives
The results of the study were compared to the California Toxics Rule (CTR) and
other surface water quality objectives considered potentially relevant to stormwater
runoff quality. The other water quality objectives considered included the National
Primary Drinking Water Maximum Contaminant Levels, USEPA Action Plan for
Beaches and Recreational Waters, USEPA Aquatic Life Criteria, and California
Department of Fish and Game recommended criteria for Diazinon and Chlorpyrifos.
These WQOs were considered relevant because they apply to surface waters, which may
receive stormwater discharges from highways and other Caltrans facilities. Table 2
shows summary statistics for highway runoff characteristics. Depending on the
frequency with which the most stringent water quality objectives were exceeded, the
constituents were prioritized as high, medium and low. Constituents were considered
high priority when the frequency of exceedances was greater than 50 percent, medium
priority when exceedance frequencies were 5-50 percent and low priority when
exceedance frequencies were less than 5 percent. Table 3 summarizes the results of
comparisons with the most stringent CTR criteria and other relevant WQOs. The high
priority constituents were found to be lead, copper, zinc, aluminum, diazinon,
chlorpyrifos and iron.
Table 2 Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans, 2003b)
Table 2 (contd) Summary Statistics for Highway Runoff Characteristics (Table 3-2 of Caltrans, 2003b)
Table 3 Comparisons of Caltrans runoff quality data with CTR and other relevant water quality objectives (Table 3-18, Caltrans, 2003b)
2.3 Provisions of Tentative Order No. 2011-XX-DWQ
Tentative Order No. 2011-XX-DWQ, issued by the State Water Resources Control Board, renews Caltrans' NPDES and state permits to discharge stormwater and permitted
non-stormwater to waters of California and the United States. The permit governs
Caltrans activities such as:
1. Stormwater discharges to MS4s and receiving waters;
2. Stormwater discharges from the Caltrans’ vehicle maintenance, equipment
cleaning operations facilities and other non-industrial facilities with activities
that have the potential of generating significant quantities of pollutants; and
3. Certain categories of non-stormwater discharges.
As part of the renewal process, Caltrans submitted a Stormwater Management
Plan (SWMP) that addresses stormwater discharges from its properties, facilities and
activities throughout the State of California. Caltrans developed the SWMP to document
procedures and practices that it will follow to reduce the discharge of pollutants
(Caltrans, 2003a).
The requirement of interest to this project is the provision on water quality
monitoring in the Tentative Order (Section E.2.c.2. a. ix). Caltrans is required to sample
stormwater and non-stormwater discharges throughout its system. The sampling
frequency specified is a minimum of three wet weather samples, including the first flush
flows. Two dry weather samples are required at sites discharging non-stormwater. The
Tentative Order also requires toxicity analysis to be conducted on the first wet weather
sample and first dry weather sample at each site. Toxicity testing need not be continued
if toxicity is not present in first wet and dry weather samples.
In the first year of the new permit cycle, Caltrans is directed to establish a
candidate pool of sampling locations that are representative of the diverse geographic,
climatic, hydrologic, demographic and land use conditions in the state. The pool shall
not be limited to locations which do not receive run-off from outside the rights-of-way. It shall eventually contain at least 500 locations. In the first year, 200 sites shall be
identified. In the second year, the pool shall be increased to 400 sites and then 500 sites
in the third year. Sites may be designated for stormwater sampling only, non-stormwater
sampling only, or both. Out of the candidate pool, Caltrans, in consultation with
Regional Boards, shall select a minimum of 100 stormwater and non-stormwater sites for
sampling each year. The State Board shall reevaluate the allocation of sites in year 2.
The Tentative Order contains a list of water quality “action levels” which are
exceedances of WQOs as defined in Table 4. At a site where the action levels are not
exceeded, monitoring will be discontinued and a new site will be selected for monitoring
the following year. Where the action levels are exceeded for indirect discharges to
receiving waters, the Regional Board Executive officer shall determine if the discharge is
a threat to receiving waters. If the discharge does not pose a threat, Caltrans may
discontinue monitoring and select a new site from the candidate pool for the following
year. If the action levels are exceeded for a direct discharge to receiving waters, Caltrans
must monitor the affected receiving water. It may or may not choose to continue
monitoring the discharge. If receiving water monitoring shows that the discharge is not
causing an exceedance of a water quality objective, Caltrans may discontinue monitoring
and select a new site from the candidate pool. If Caltrans or the Regional Board
determines that the discharge is contributing to an exceedance of a water quality
objective, Caltrans shall conduct a watershed analysis to determine whether other
sources are contributing to the exceedance. Then Caltrans shall begin an iterative
process. First it shall install or modify BMPs to address the WQO exceedances and
continue discharge monitoring. If the action levels are not exceeded after revising the
BMPs, Caltrans may discontinue monitoring and select a new site from the candidate
pool. If the action levels continue to be exceeded, however, Caltrans shall resume
receiving water monitoring. The iterative monitoring process contained in the Tentative
Order is shown in Figure 3.
Table 4 Water Quality Action Levels (CSWRCB, 2011a)

Stormwater, Direct Discharge:
  ≥ 3 exceedances of a WQO, or
  ≥ 2 exceedances of a WQO by 50% or more, or
  ≥ 3 TUa > 1

Stormwater, Indirect Discharge:
  ≥ 3 exceedances of a WQO by 10% or more, or
  ≥ 2 exceedances of a WQO by 50% or more, or
  ≥ 3 TUa > 1

Non-Stormwater, Direct Discharge:
  ≥ 2 exceedances of a WQO, or
  ≥ 1 exceedance of a WQO by 50% or more, or
  ≥ 2 TUa > 1 or TUc > 0

Non-Stormwater, Indirect Discharge:
  ≥ 2 exceedances of a WQO by 10% or more, or
  ≥ 1 exceedance of a WQO by 50% or more, or
  ≥ 2 TUa > 1 or TUc > 0

TUa = Acute Toxicity, TUc = Chronic Toxicity, WQO = Water Quality Objective.
Figure 3- Flowchart of the Water Quality Monitoring Process (CSWRCB, 2011a)
The water quality action levels are shown in Table 4. In practice, any WQO for
any constituent can trigger an action level. For this project, though, only metals will be
evaluated and the applicable WQOs are found in the California Toxics Rule (CTR). A
site with direct discharge is said to exceed a water quality action level if three or more
samples exceed the CTR value in a year or if two or more samples are greater than the
WQO by 50 percent. The water quality action levels outlined for toxicity will not be
used in this project because toxicity data was not collected during the Statewide
Discharge Characterization Study.
Essentially the water quality action levels are a means of determining whether
Caltrans discharges are causing or contributing to an exceedance of WQOs in receiving
waters. The objective of this project is to evaluate the risk of making an error in this
determination due to the limited number of samples considered in the water quality action
levels. In other words, what is the probability that a “clean” site is determined to be
“dirty” by the application of the water quality action level methodology? Statistically,
this is a Type I error in which the null hypothesis that the site is “clean” is erroneously
rejected. Minimizing this error is of interest to Caltrans because Type I errors result in
spending scarce funds on monitoring receiving waters when there is no water quality
threat. On the other hand, the State Board is interested in minimizing the converse Type
II error in which a site is erroneously classified as “clean” (null hypothesis is accepted)
when it is not.
For the permit, a site is determined to be “dirty” when it exceeds the water quality
action levels in Table 4. A site with direct discharge is declared “dirty” if it has three or
more samples that exceed the CTR value or if two or more samples are greater than the
CTR by 50 percent. If the water quality action levels are not exceeded then a site is
declared “clean”.
2.4 Water Quality Control Policy
Section 303(d) of the Clean Water Act (CWA) requires states to identify waters
for further monitoring if they do not meet relevant water quality standards. The states
must gather and evaluate all existing and readily available water quality-related data and
information to develop the list and to provide documentation for listing or not listing a
state’s water. The State Board has adopted a standardized approach to developing
California’s 303(d) list (CSWRCB, 2004). Surface waters will be placed on the 303(d)
list if they meet the listing factors described below. These factors can also be applied to
Caltrans discharges to determine whether or not they are causing or contributing to
exceedances of WQOs (i.e. whether or not a site is “clean”).
2.4.1 California Listing Procedure for Numeric Water Quality Objectives
The 303(d) water quality assessment process is a statistical decision problem
where a decision is made using a limited set of data. In the California method, the number of samples in which the receiving water violates a WQO is assumed to follow a binomial distribution (CSWRCB, 2004). A hypothesis test is set up in which the null hypothesis (H0) is that the probability of a given constituent's concentration exceeding the associated WQO is less than or equal to 0.10:

H0: p ≤ 0.10
If the null hypothesis is rejected, then the water body is classified as impaired and
placed on the 303(d) list. If the null hypothesis is not rejected, the water body is
considered to be in compliance with the WQO (i.e. not impaired). The minimum number
of measured exceedances needed to place a water body in the 303(d) list is shown in
Table 5.
Table 5 Minimum number of Exceedances to Place a Water Segment on Section 303(D)
List for Conventional or Other Pollutants (CSWRCB, 2004)
As seen in Table 5, the null hypothesis is that the water body is not impaired and
the alternate hypothesis is that the water body is impaired. If the null hypothesis H0 is
true but rejected in favor of the alternate hypothesis Ha, a false positive or Type I error is
committed. In this context, a Type I error would mean that an unimpaired water is
incorrectly added to the 303(d) list, which triggers management responses under the
Clean Water Act. Regulated facilities may be erroneously required to incur substantial
costs in additional monitoring and treatment that are unnecessary. If the alternate
hypothesis is true but is rejected in favor of the null hypothesis, a false negative or Type
II error occurs. In this case, the impairment of a water body would go unrecognized and
it would not be listed. Ideally, one would like to simultaneously minimize both errors.
Unfortunately, the risk of committing a Type I error (denoted as α) is typically inversely related to the risk of committing a Type II error (denoted as β). In other
words, as α is reduced, β increases. One way to manage the magnitude of β is to increase
the sample size. Typically, the Type I error is chosen by the water quality assessor
(Smith, 2001). If the Type I error is chosen to be 0.10, this determines the cutoff value k
shown in Table 5. The cutoff value is estimated based on the following equation:
π‘˜ = π‘šπ‘–π‘›|𝐡𝑖𝑛(𝑛 − π‘˜, 𝑛, 1 − 0.10) − 𝐡𝑖𝑛(π‘˜ − 1, 𝑛, 0.25)|
n = number of samples
k = cutoff value
Bin = Binomial distribution function
(1)
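For illustration, Equation 1 can be evaluated by brute force in a few lines of VBA. The sketch below is not part of the program listed in Appendix B; the function name is illustrative, and it assumes the Excel worksheet function BinomDist is available for the cumulative binomial distribution.

' Illustrative sketch: search for the cutoff value k of Equation 1 for a given sample size n.
Function ListingCutoff(n As Long) As Long
    Dim k As Long, diff As Double, bestDiff As Double, bestK As Long
    bestDiff = 1E+30
    For k = 1 To n
        ' |Bin(n - k; n, 1 - 0.10) - Bin(k - 1; n, 0.25)|, using cumulative binomial CDFs
        diff = Abs(WorksheetFunction.BinomDist(n - k, n, 0.9, True) - _
                   WorksheetFunction.BinomDist(k - 1, n, 0.25, True))
        If diff < bestDiff Then
            bestDiff = diff
            bestK = k
        End If
    Next k
    ListingCutoff = bestK
End Function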
Although a set of guidelines is available for water quality assessments under Section 303(d) of the CWA (USEPA, 1997), the guidance lacks statistically sound procedures for evaluating data (Gibbons, 2003). The 303(d) listing process described in Table 5 is based
on the observed percentage of samples exceeding a WQO instead of an estimate of the
percentage of the true concentration distribution that exceeds the criterion. The problem
with the 303(d) listing process is that confidence in the results depends on the number of
samples collected (i.e. the smaller the number of samples, the greater the uncertainty in
the percentage of the true concentration distribution that exceeds the regulatory standard).
By relying on the percentage of exceedances, the actual concentrations have minimal
bearing on the decision rule. The problem with this approach is that it provides no
information regarding the confidence with which a percentage of the true concentration
distribution fails to meet a regulatory standard (Gibbons, 2003). Thus, there is a need to
use a more statistically rigorous approach to make impairment decisions.
2.5 Gibbons' Method
An alternative statistical approach to determine whether or not a water body is
impaired is discussed by Gibbons (2003). In this approach, a water quality objective is
compared to the lower confidence limit (LCL1-α, p) of a selected percentile of the
concentration distribution of the associated constituent. If the LCL for the given
percentile exceeds the regulatory standard, the water body is declared in violation of that
standard. Typically, the 95% Lower Confidence Limit (LCL) of the 90th percentile is
compared against the WQO (see Figure 4).
This method offers the following features (Gibbons, 2003):
1. It provides a test of the null hypothesis that a given percentage of the true
concentration distribution fails to meet the regulatory standard.
2. It is appropriate for a variety of distributions (i.e. normal, lognormal and nonparametric).
3. It directly incorporates the magnitudes of the measured concentrations in
testing the hypothesis that a given percentage of the true concentration
distribution exceeds the standard.
4. It has explicit statistical power that describes the probability of detecting a
true impairment conditional on a given number of samples (k),
concentration distribution, and magnitude of exceedance.
Figure 4- 95% LCL of the 90th percentile compared to WQO
The setup for the hypothesis test is shown below:
H0: LCL_P90,95 ≤ WQO (unimpaired)
Ha: LCL_P90,95 > WQO (impaired)
where LCL_P90,95 = the 95% lower confidence limit for the 90th percentile (P90).
Based on the distribution of the analytical field data, either parametric or
nonparametric methods for calculating the LCL may be chosen. Normal, lognormal and
nonparametric forms of the LCL are listed below (Gibbons, 2003):
1. Normal Confidence Limits for a percentile
LCL_(1−α,p) = x̄ + K_(α,p) · s                                    (2)

where
x̄ = simulated sample mean of k measurements
s = simulated sample standard deviation of k measurements
K_(α,p) = one-sided factor for the 95% confidence LCL of the 90th percentile of the distribution for a specific value of k, from Table 6
k = number of sub-samples per sample drawn from the analytical field data.
2. Lognormal Confidence Limits for a Percentile
LCL_(1−α,p) = exp[ȳ + K_(α,p) · s_y]                              (3)

where
ȳ = mean of the natural log-transformed data, y = ln(x)
s_y = standard deviation of the log-transformed data
K_(α,p) = one-sided factor for the 95% confidence LCL of the 90th percentile of the distribution for a specific value of k, from Table 6.
3. Nonparametric Confidence Limits for a Percentile
To construct a nonparametric confidence limit for a percentile of the concentration distribution, the k samples are first rank ordered in ascending order. Then the LCL of the desired percentile at the desired confidence level is found by trial and error. The number of samples falling below the percentile of the distribution out of a set of k samples follows a binomial distribution with parameters k and success probability p, where success is defined as the event that a sample measurement is below the percentile. The cumulative binomial distribution Bin(x; k, p) represents the probability of getting x or fewer successes in k trials with success probability p, and is evaluated as:

Bin(x; k, p) = Σ (from i = 0 to x) C(k, i) · p^i · (1 − p)^(k−i)   (4)

where
C(k, i) = number of combinations of k subsamples taken i at a time
k = number of subsamples
i = number of successes (index of summation).
From the ordered sample arranged from smallest to largest (X(1), X(2), …, X(k)), the LCL of the 90th percentile is calculated by trial and error by computing the level of confidence p (or 1 − α). The level of confidence is estimated as the probability that Y ≥ L*. That is,

p = 1 − α = P(Y ≥ L*) = 1 − P(Y ≤ L* − 1)                         (5)

where Y is a binomial random variable with a probability of success of 0.9 (corresponding to the 90th percentile). Accordingly, for the binomial distribution the confidence level is expressed as:

p = 1 − Bin(L* − 1, k, 0.9)                                       (6)

where Bin(L* − 1, k, 0.9) is the cumulative distribution function (CDF) of a binomial distribution for L* − 1 successes in k trials with a probability of success of 0.9. The trial-and-error process starts by setting L* equal to k and estimating the probability p using Equations 5 and 6. If the probability is less than the desired confidence (e.g., 95%), L* is reduced by one and the probability is recalculated until the desired confidence level is achieved. The final L* is the rank that corresponds to the desired LCL; that is, the LCL is estimated as X(L*). A sketch of these LCL calculations is given below.
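The three forms of the LCL can be computed with short helper functions. The following VBA sketch is for illustration only (the function names are assumptions and differ from the Appendix B code); the K factor is read from Table 6, and the nonparametric routine assumes the k subsample values are stored in a 1-based array that has already been sorted in ascending order.

' Parametric LCL (Equation 2): pass the Table 6 factor for the chosen k.
Function NormalLCL(xbar As Double, s As Double, Kfactor As Double) As Double
    NormalLCL = xbar + Kfactor * s
End Function

' Lognormal LCL (Equation 3): ybar and sy are the mean and standard deviation
' of the natural-log-transformed data.
Function LognormalLCL(ybar As Double, sy As Double, Kfactor As Double) As Double
    LognormalLCL = Exp(ybar + Kfactor * sy)
End Function

' Nonparametric LCL (Equations 5 and 6): decrease L* from k until the confidence
' level reaches the desired value, then return the order statistic X(L*).
Function NonparametricLCL(x() As Double, k As Long, conf As Double, pctl As Double) As Double
    Dim Lstar As Long, p As Double
    Lstar = k
    Do While Lstar > 1
        p = 1 - WorksheetFunction.BinomDist(Lstar - 1, k, pctl, True)   ' Equation 6
        If p >= conf Then Exit Do
        Lstar = Lstar - 1
    Loop
    NonparametricLCL = x(Lstar)
End Function

For example, NonparametricLCL(sortedYear, 20, 0.95, 0.9) returns the order statistic that serves as the 95% LCL of the 90th percentile for a 20-sample year.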
Table 6 One-Sided Factors for 95% Confidence LCL for the 75th and 90th Percentile of the Distribution, k = 4 to 1000 (Gibbons, 2003)

m       75th Percentile    90th Percentile
4       -0.155             0.444
5       -0.063             0.519
6       -0.002             0.575
7        0.052             0.619
8        0.091             0.655
9        0.123             0.686
10       0.150             0.712
11       0.174             0.734
12       0.194             0.754
13       0.212             0.772
14       0.228             0.788
15       0.243             0.802
16       0.256             0.815
17       0.267             0.827
18       0.278             0.839
19       0.288             0.849
20       0.298             0.858
21       0.306             0.867
22       0.314             0.876
23       0.322             0.884
24       0.329             0.891
25       0.335             0.898
26       0.342             0.904
27       0.348             0.911
28       0.353             0.917
29       0.358             0.922
30       0.363             0.928
35       0.385             0.951
40       0.403             0.970
50       0.431             1.000
60       0.451             1.022
120      0.514             1.093
240      0.560             1.146
480      0.592             1.184
1000     0.617             1.282
Chapter 3
METHODOLOGY
The original goal of the project was to assess the accuracy of the method proposed
in the Caltrans statewide permit (Tentative Order No. 2011-XX-DWQ) for determining
whether or not a discharge might cause or contribute to an impairment of a receiving
water body. The method is based on counting the exceedances of water quality
objectives in stormwater discharge samples collected over a year. The method’s
accuracy will be assessed by simulating many hypothetical years using existing data from
the Caltrans Statewide Discharge Characterization Study and comparing the method
results to the “true” determination provided by the California 303(d) process. A second
method found in the literature (Gibbons, 2003) will also be assessed for comparison.
Both methods will be tested using the existing Total Cadmium data from three Caltrans
Sites.
3.1 Caltrans Data
Total cadmium data collected during the Statewide Discharge Characterization
Study were obtained (with permission) from the Caltrans stormwater database. Three
highway sites with greater than 20 samples each were selected. The three sites
represented one of each type: low, medium, or high priority. In low priority sites, less than
5% of the samples from the dataset exceed the CTR; in medium priority sites 5-50% of
the samples from the dataset exceed the CTR; and in high priority sites greater than 50%
of the samples from the dataset exceed the CTR. The three sites chosen for this project
are:
1. 1-34, 299E, Humboldt County (Site 1 - Low Priority)
2. 2-02, 5N, Tehama County (Site 2 - Medium Priority)
3. 4-39, 580W, Alameda County (Site 3 - High Priority)
The total cadmium data for the three sites are shown in Table 7. The CTR value for total cadmium is 0.97 µg/L.

Table 7 Input Data, Total Cadmium (Granicher, 2011)

Low Priority                 Medium Priority              High Priority
1-34, 299E, Humboldt County  2-02, 5N, Tehama County      4-39, 580W, Alameda County
Site 1                       Site 2                       Site 3
Reported Values (µg/L)       Reported Values (µg/L)       Reported Values (µg/L)
0.3                          0.5                          4.8
0.2                          0.3                          0.7
0.2                          0.2                          0.9
0.4                          0.2                          1
0.2                          0.2                          1.8
0.2                          0.5                          0.2
0.2                          0.3                          4.7
0.2                          0.2                          4.1
0.2                          0.3                          2
0.2                          0.4                          0.6
0.2                          30                           1.3
0.2                          0.2                          1.2
0.2                          0.3                          1
0.2                          0.2                          1
0.2                          0.4                          0.9
0.2                          0.6                          1.8
0.2                          1                            3.3
0.2                          0.2                          1.1
0.2                          0.2                          0.9
0.2                          0.2                          1.9
0.2                          0.2                          1.8
0.2                          0.5                          1.5
0.2                          0.52                         1.7
0.2                          0.29
3.2 303(d) Listing Process
Whether or not a discharge is “clean” is determined by applying the California
303(d) listing process to the data from the Statewide Discharge Characterization Study.
The number of samples that exceed the cadmium CTR value is counted and compared to
the values in Table 5. Because none of the sites has more than 30 samples, the number of
exceedances that would indicate an impairment is five. According to this criterion, Site 1
produces a “clean” discharge, Site 2’s discharge is “clean” and Site 3’s is “dirty”.
(“Clean” in this context means the discharge does not exceed the WQO; “dirty” means
that it does, as determined from the California 303(d) listing process.)
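In program terms, this determination reduces to counting the number of data points above the CTR and comparing the count with the Table 5 minimum (five exceedances for the data set sizes used here). A minimal VBA sketch, with an assumed function name that does not appear in Appendix B, is:

' Illustrative sketch: classify a site as "dirty" if its historic data set has at
' least minExceed values above the CTR (per the 303(d) listing table).
Function SiteIsDirty(data() As Double, ctr As Double, minExceed As Long) As Boolean
    Dim i As Long, exceed As Long
    For i = LBound(data) To UBound(data)
        If data(i) > ctr Then exceed = exceed + 1
    Next i
    SiteIsDirty = (exceed >= minExceed)
End Function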
3.3 Permit Monitoring Model Program
To test the accuracy of the permit method, a computer program was written to
check whether stormwater discharge characteristics exceed water quality objectives by
simulating annual datasets created by sampling from the existing data set for each of the
three sites. The program was written using Visual Basic for Applications (VBA) in
EXCELTM. The flowchart for the code is shown in Appendix A. The VBA code is listed
in Appendix B.
The proposed Caltrans permit contains water quality action levels (Table 4) for
determining whether a discharge violates a water quality objective. The first action level
states that three or more exceedances of a water quality objective (WQO) for any
constituent is a violation. This will be referred to as “Rule 1” hereafter. The second
36
action level states that two or more exceedances of a WQO by 50% or more of the WQO
value is a violation. This will be referred to as “Rule 2” hereafter.
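As an illustration of how the two rules can be applied to one simulated annual data set, a simplified VBA function is sketched below; the name and structure are assumptions and do not reproduce the Appendix B code.

' Illustrative sketch: returns True if a simulated year violates Rule 1 and/or Rule 2.
Function ViolatesActionLevels(yearData() As Double, ctr As Double) As Boolean
    Dim i As Long, nExceed As Long, nExceed50 As Long
    For i = LBound(yearData) To UBound(yearData)
        If yearData(i) > ctr Then nExceed = nExceed + 1                ' counts toward Rule 1
        If yearData(i) >= 1.5 * ctr Then nExceed50 = nExceed50 + 1     ' counts toward Rule 2
    Next i
    ' Rule 1: three or more exceedances of the WQO
    ' Rule 2: two or more exceedances of the WQO by 50% or more
    ViolatesActionLevels = (nExceed >= 3) Or (nExceed50 >= 2)
End Function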
To use the program, the user must first import data from the Caltrans database to
the Input worksheet. The input data were provided by the Office of Water Programs with
Caltrans’ permission (Granicher, 2011).
Data were randomly chosen from the analytical data set to create 100,000
simulated years with a user specified number of samples per year. The program looks at
each annual data set and determines whether either or both rules are violated. The α and
β values are calculated as:
If the site is clean:

α = (number of exceedances) / N                                   (7)

where the number of exceedances = false positives, and N = number of simulations.

If the site is dirty:

β = (N − number of exceedances) / N                               (8)

where (N − number of exceedances) = false negatives, and N = number of simulations.
The program code is divided into two subroutines, each controlled by a button
embedded in the spreadsheet.
GetInput Button Subroutine: This subroutine reads the data values from the Input
worksheet and writes them to Output worksheet. It can automatically read and write data
lists of different lengths.
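A typical way to read a column of unknown length in VBA is sketched below; the sheet name and column used are assumptions for illustration and do not reproduce the Appendix B listing.

' Illustrative sketch: read the reported values from column A of the Input worksheet.
Function ReadInputColumn() As Double()
    Dim ws As Worksheet, lastRow As Long, i As Long
    Dim values() As Double
    Set ws = ThisWorkbook.Worksheets("Input")
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row   ' last filled row in column A
    ReDim values(1 To lastRow - 1)                       ' assumes one header row
    For i = 2 To lastRow
        values(i - 1) = ws.Cells(i, 1).Value
    Next i
    ReadInputColumn = values
End Function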
ResultsButton Subroutine: This subroutine generates the random annual data sets
and checks them against the two rules. The input variables required from the user are
listed in Table 8. The random annual data sets are generated using the RND() function in
VBA. The results are stored in the ‘Samplearray’ array. Next, the program checks each
annual data set in ‘Samplearray’ to determine whether it violates Rule 1 and/or Rule 2. The results are stored in the ‘Ruleviolations’ array. The total number of times Rule 1 and Rule 2 are exceeded, individually as well as together, is calculated. Next, the mean and standard deviation of each annual data set in ‘Samplearray’ are calculated and stored in the ‘Ruleviolations’ array. From the ‘Ruleviolations’ array, the program calculates the mean of the means of all annual data sets and the standard deviation of all annual data sets and writes them to the Output worksheet. The total number of violations in the ‘Ruleviolations’ array is summed and used to calculate α or β, depending on the status of the site (clean or dirty). The power of the test is calculated as 1 − β.
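The random resampling step can be pictured with the following VBA sketch (variable names are assumptions); each of the nSim simulated years is filled with k values drawn, with replacement, from the historic data set using Rnd().

' Illustrative sketch: build the array of simulated annual data sets.
Sub BuildSampleArray(data() As Double, k As Long, nSim As Long, samplearray() As Double)
    Dim i As Long, j As Long, pick As Long, n As Long
    n = UBound(data) - LBound(data) + 1
    ReDim samplearray(1 To nSim, 1 To k)
    Randomize
    For i = 1 To nSim                                    ' each simulated year
        For j = 1 To k                                   ' each sample within the year
            pick = LBound(data) + Int(Rnd() * n)         ' random index into the field data
            samplearray(i, j) = data(pick)
        Next j
    Next i
End Sub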
A histogram and probability plot of the results are provided in the Output
worksheet. The histogram shows the distribution of data points in the original data set
and the simulated data set to demonstrate that the simulation is similar to the original
data. To generate a histogram, the number of bins is assumed to be 10, and the bin size and frequency are calculated from the ‘Samplearray’. The probability plot is a visual tool to show whether or not the data are normally distributed (USEPA, 2009a). The Shapiro-Wilk test (USEPA, 2009a) was used to test whether the data follow a normal or a lognormal distribution.
Table 8 User-Defined Input

k      Number of subsamples per iteration. k corresponds to the number of samples collected per year. This value has to be less than or equal to the number of results in the dataset.
CTR    CTR is the water quality objective from the California Toxics Rule. It is used to assess the permit rules. The CTR values for different metals were obtained from Table 3-18 of the Statewide Discharge Characterization Report.

3.4 Alternate Statistical Model
The alternate model uses the Gibbons method to determine violations. If the 95%
LCL of the 90th percentile of the concentration distribution is greater than the CTR, a violation is recorded.
For non-parametric data, the subsamples are randomly generated using the Rnd() function in VBA. For normally distributed data, the random subsamples are
generated using the Norm_Inv (Rnd(), mean, standard deviation) function in which the
mean and standard deviation are those of the data set. For a lognormal data set, the
random numbers are generated using the mean and standard deviation of normally
distributed natural log-transformed data. The formulas used for calculating mean and
standard deviation for a lognormal data set are as follows (Singh, 1997):
Mean:  µ1 = exp[µ + 0.5σ²]                                        (9)

Standard deviation:  σ1 = √{ [exp(2µ + σ²)] · [exp(σ²) − 1] }     (10)
Where µ = Mean of natural log-transformed data.
σ= Standard deviation of natural log-transformed data.
µ1 = Mean of the original random variable.
σ1= Standard deviation of the original random variable.
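For illustration, the parametric random draws described above can be written as two small VBA functions (the names are assumptions); the lognormal draw simply exponentiates a normal draw made with the log-space mean and standard deviation.

' Illustrative sketch: random draws for the parametric cases.
Function RandomNormal(mean As Double, sd As Double) As Double
    Dim p As Double
    Do
        p = Rnd()
    Loop While p = 0                       ' Norm_Inv requires 0 < p < 1
    RandomNormal = WorksheetFunction.Norm_Inv(p, mean, sd)
End Function

Function RandomLognormal(mu As Double, sigma As Double) As Double
    ' mu and sigma are the mean and standard deviation of the natural-log-transformed data
    RandomLognormal = Exp(RandomNormal(mu, sigma))
End Function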
Depending on how the data set is distributed (normal, lognormal or
nonparametric), the methods described in Chapter 2 can be used to calculate LCL P90, 95
values for each simulated year and each user chosen number of samples (k). These
values are then compared with the WQO to determine the number of violations and the α
and β values are calculated as in the Permit Monitoring Model.
The program is divided into two subroutines, each controlled by a button embedded in
the spreadsheet:
GetSampleButton Subroutine: This subroutine reads the reported values from the
Input worksheet and writes them to the Alternate worksheet. It can automatically read
samples of different lengths from the Input worksheet. The VBA code is in Appendix B.
CheckButton Subroutine: This subroutine, based on the user defined input variables
(see Table 8), first generates random samples. The results are stored in the ‘Samplearray’
array. Next, it orders the subsamples using a bubble sort technique and replaces the
results in ‘Samplearray’. The Samplearray is now an ordered array with the values in
each simulated annual data set arranged in ascending order. Then the 90th percentile
value for each annual data set is calculated and stored in the array ‘percent’. This
calculation is based on the distribution already determined using the Shapiro Wilk test in
the Permit Monitoring Model. Next, the 95% LCL is calculated and stored in the array
‘LCL’. Finally, the number of times the 95% LCL is less than WQO is counted. Based
on this, α, β and power are calculated as described previously.
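The ordering step mentioned above is a standard ascending bubble sort; a generic VBA version is sketched below for reference (it is not the exact Appendix B routine).

' Illustrative sketch: sort one simulated annual data set in ascending order.
Sub BubbleSortAscending(arr() As Double)
    Dim i As Long, j As Long, tmp As Double
    For i = LBound(arr) To UBound(arr) - 1
        For j = LBound(arr) To UBound(arr) - 1 - (i - LBound(arr))
            If arr(j) > arr(j + 1) Then
                tmp = arr(j)
                arr(j) = arr(j + 1)
                arr(j + 1) = tmp
            End If
        Next j
    Next i
End Sub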
3.5 Delta Procedure
To better understand the relationship between the number of samples, Type I error
(α), Type II error (β), and the relative magnitude of the WQO with respect to the data,
further testing was accomplished using a range of “synthetic” (hypothetical) CTR values.
The synthetic CTR was calculated as follows:
π‘†π‘¦π‘›π‘‘β„Žπ‘’π‘‘π‘–π‘ 𝐢𝑇𝑅 = π‘₯Μ… + π‘Ž ∗ 𝑠
(11)
Where π‘₯Μ… = mean of analytical data
a = number of standard deviations from the mean (0 to 2 chosen by user)
s = standard deviation of analytical data.
In this analysis, the discharge is first determined to be clean or dirty, based on the
303(d) listing process and the synthetic CTR. Then both models are run with a range of k
values. If the site is clean, α is calculated for different sample sizes. If the site is dirty, β
is calculated. For the Alternate Statistical Model the annual data sets are assumed to have
non-parametric distributions and the 95% LCL of the 90th percentile is determined using
the method described in Section 2.5. The 95% LCL of the 90th percentile is compared to
the synthetic CTR to calculate α and β as appropriate.
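Equation 11 is simple enough to evaluate directly; the sketch below (assumed names, with an assumed step of 0.5 standard deviations to give five synthetic values over the range a = 0 to 2) illustrates how a set of synthetic CTR values can be generated.

' Illustrative sketch: list synthetic CTR values for a = 0 to 2 standard deviations.
Sub ListSyntheticCTRs(xbar As Double, s As Double)
    Dim a As Double
    For a = 0 To 2 Step 0.5
        Debug.Print "a = " & a & ", synthetic CTR = " & (xbar + a * s)
    Next a
End Sub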
Chapter 4
RESULTS AND DISCUSSION
Presented in this chapter are the results of the computer simulations.
4.1 Permit Monitoring Model Results
The Permit Monitoring Model was built to check the Water Quality Action Levels
(Table 4) specified in Tentative Order No. 2011-XX-DWQ. The purpose of this model is
to evaluate the risk of making incorrect conclusions about stormwater discharge quality.
The risk is measured by α and β. The expectation is that these risks will decrease with
increasing number of samples. Each site was tested using sample sizes of 3, 5, 10, 15 and
20. A histogram was plotted to compare the actual and simulated data distributions. The
simulation results for Site 1 are shown in Table 9 and Figure 5.
Table 9 Permit Monitoring Model Results for Site 1
Site: 1-34, 299E, Humboldt County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 3    k = 5    k = 10   k = 15   k = 20
Mean                                      0.213    0.212    0.212    0.212    0.213
Standard Deviation                        0.020    0.025    0.032    0.035    0.038
Permit Monitoring Results
Probability of Exceedance of Rule 1       0        0        0        0        0
Probability of Exceedance of Rule 2       0        0        0        0        0
α                                         0        0        0        0        0
β                                         NA       NA       NA       NA       NA
Power of Test                             NA       NA       NA       NA       NA
As seen in Figure 5, the distribution of the simulated data is essentially identical
to that of the original data set. Site 1 produces a “clean” discharge according to 303(d)
listing process. The simulation results indicate that there was no exceedance of the
WQO. Site 1 was tested primarily to demonstrate that the simulated data sets are similar
to the original data set and that the model does not predict an exceedance when there are
none.
The original data set was checked for normality using the Shapiro-Wilk test and was found not to follow a normal distribution. Next, the original data set was checked for lognormality using the Shapiro-Wilk test on the log-transformed data and was found not to follow a lognormal distribution either. Therefore, a non-parametric distribution approach is applicable.
[Histogram comparing the simulated and input data distributions for Site 1: frequency (%) versus concentration bins.]
Figure 5- Comparing Simulated and Actual Data for Site 1
Next, Site 2, which is a medium priority site (8% of the samples exceeded the
CTR) was evaluated. The results are shown in Table 10 and Figure 6. As seen in Figure
6, the distribution of the simulated data is essentially identical to that of the original data
set. The discharge from Site 2 is classified “clean” according to the 303(d) process, so α
values were calculated. The results show that at the permit-required number of
samples (3), α is 0.5%, meaning the risk of incorrectly classifying the site as dirty when it
is clean is relatively small. As the number of samples collected per year (k) increases,
however, this value increases, meaning the risk of making an error increases with greater
amounts of data. As the number of samples increases, the probability of capturing values
that exceed the WQO in the random sample increases. This happens because Rule 1 and Rule 2 are set up based on three samples. As a larger number of samples is collected, the
probability that three samples will exceed the WQO or that two will exceed it by 50%
rises. It is inappropriate, however, to compare risks for different k values because the permit rules are based on three samples. If the State Board had intended to require more than three samples, it would have written the water quality action levels based on that number of samples. The only valid Type I error risk calculated according to the permit
procedure is for k = 3, which is 0.5%.
In the original data set, all of the sample concentrations are ≤ 1 µg/L except one, which is 30 µg/L; this causes the distribution to be positively skewed. For a sample size of 3, the mean is 1.605 µg/L, which is greater than the WQO. A closer look at the
distribution of the original data set is recommended to make an impairment decision
rather than a simple tally of numbers as in the permit procedure.
Table 10 Permit Monitoring Model Results for Site 2
Site: 2-02, 5N, Tehama County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 3      k = 5      k = 10    k = 15    k = 20
Mean                                      1.605      1.587      1.575     1.583     1.574
Standard Deviation                        2.226      2.743      3.561     4.116     4.466
Permit Monitoring Results
Probability of Exceedance of Rule 1       0.00055    0.00467    0.044     0.124     0.229
Probability of Exceedance of Rule 2       0.00511    0.015      0.062     0.127     0.20
α                                         0.005      0.015      0.062     0.127     0.22
β                                         NA         NA         NA        NA        NA
Power of Test                             NA         NA         NA        NA        NA
When the original data set was checked for normality using the Shapiro-Wilk test, it was found that it does not follow either a normal or a lognormal distribution. Therefore, the non-parametric approach is applicable.
[Histogram comparing the simulated and input data distributions for Site 2: frequency (%) versus concentration bins.]
Figure 6- Comparing Simulated and Actual data for Site 2
The last site, Site 3, is a high priority site (74% of the samples exceeded the CTR).
According to the 303(d) process, Site 3 produces a “dirty” discharge. Therefore, the
model calculated β values for various sample sizes as shown in Table 11. The
distribution of simulated data compared to the original data set is shown in Figure 7. As
shown, the distribution of the simulated data is essentially identical to that of the original
data set.
The model results indicate that as the sample size increases the β decreases. β is
the probability of not violating Rule 1 and Rule 2 when the discharge is dirty, i.e.
concluding the discharge is clean when it is not. Consistent with the findings of Site 2,
the probability of violating the rules increases with sample size. Although β is theoretically expected to decrease as sample size increases, applying water quality action levels written for a sample size of 3 to larger sample sizes overestimates the probability of violating the rules, which causes the rapid decline in the value of β shown in the results below.
dirty discharge is 53% at this site. More sites should be evaluated to better understand
this risk for a sample size of 3. Due to the varied data in the actual data set, this site
seems more typical of Caltrans sites than Site 2.
Table 11 Permit Monitoring Model Results for Site 3
Site: 4-39, 580W, Alameda County
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 3    k = 5    k = 10   k = 15   k = 20
Mean                                      1.75     1.746    1.748    1.746    1.749
Standard Deviation                        1.015    1.109    1.185    1.206    1.220
Permit Monitoring Results
Probability of Exceedance of Rule 1       0.401    0.88     0.99     1        1
Probability of Exceedance of Rule 2       0.466    0.78     0.98     0.99     0.99
α                                         NA       NA       NA       NA       NA
β                                         0.53     0.115    0.001    0        0
Power of Test                             0.46     0.88     0.99     1        1
When the data were checked for normality and lognormality using the Shapiro-Wilk test, it was found that the data follow a lognormal distribution.
Figure 7- Comparison of Simulated and Actual Data for Site 3
4.2 Alternate Statistical Model
The Alternate Statistical Model takes the actual distribution of the data into consideration. Based on the type of distribution of the original data set, the corresponding 95% LCL of the 90th percentile is calculated for each of 100,000 simulations. The 95% LCL is compared to the CTR value (i.e., the water quality objective), and the number of times the LCL is less than the CTR value is tabulated. If the site produces a clean discharge according to the 303(d) listing process, the program computes the α value; otherwise it computes the β value. The model was used to test the three sites with different numbers of annual simulated samples. The advantage of this method is that the estimates of α and β are not dependent on a fixed number of exceedances as in the permit procedure. Therefore, for a particular sample size, an accurate estimate of the risk can be made based on the sample size and the CTR. The disadvantage of this method is that a minimum sample size of four is needed to make an accurate determination of the 95% LCL of the 90th percentile (Table 6). For modeling purposes, a sample size of 5 was chosen. As a result, the method cannot be used to evaluate the risk calculated by the permit procedure for a sample size of 3.
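In symbols, with N = 100,000 simulated annual data sets, the tabulation described above amounts to

\hat{\alpha} = \frac{\#\{\,\mathrm{LCL}_{95\%} > \mathrm{CTR}\,\}}{N} \;\; \text{(clean sites)}, \qquad
\hat{\beta} = \frac{\#\{\,\mathrm{LCL}_{95\%} < \mathrm{CTR}\,\}}{N} \;\; \text{(dirty sites)},

which is consistent with the counts and error rates reported in Tables 12 through 14.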
As noted earlier, Site 1 produces a "clean" discharge, and the original data set does not follow a normal or lognormal distribution. Since the discharge is clean, α values are computed, as shown in Table 12. The α values were computed by counting the number of times the 95% LCL of the 90th percentile is greater than the CTR. The 95% LCL of the 90th percentile of each of the 100,000 simulations was determined using Gibbons' method for a non-parametric confidence limit of a percentile. At this site, none of the samples exceed the CTR; therefore α = 0.
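As an illustration only (not the production macro), the sketch below shows how the non-parametric 95% LCL of the 90th percentile can be extracted for a single simulated sample: the sample is sorted and a fixed order statistic is taken, using the same order-statistic indices (3rd of 5, 7th of 10, 11th of 15, 16th of 20) that appear in the Appendix B code. The standalone function form and its name are illustrative.

' Illustrative sketch: non-parametric 95% LCL of the 90th percentile for a
' sample of size k = 5, 10, 15, or 20, using the order-statistic indices
' from the Appendix B code. Note: the passed array is sorted in place and
' is assumed to be 1-based with at least k elements.
Function NonParametricLCL(sample() As Double, k As Long) As Double
    Dim i As Long, p As Long, temp As Double
    'Bubble sort the sample into ascending order
    For p = 1 To k - 1
        For i = p + 1 To k
            If sample(i) < sample(p) Then
                temp = sample(p)
                sample(p) = sample(i)
                sample(i) = temp
            End If
        Next i
    Next p
    'Select the order statistic corresponding to the 95% LCL of the 90th percentile
    If k = 5 Then NonParametricLCL = sample(3)
    If k = 10 Then NonParametricLCL = sample(7)
    If k = 15 Then NonParametricLCL = sample(11)
    If k = 20 Then NonParametricLCL = sample(16)
End Function

In the Alternate Statistical Model, this value is then compared to the CTR; for a clean site, α is incremented whenever the LCL exceeds the CTR.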
Table 12  Alternate Statistical Model Results for Site 1
Site: 1-34, 299E, Humboldt County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           100000    100000    100000    100000
  α                                       0         0         0         0
  β                                       NA        NA        NA        NA
  Power of Test                           NA        NA        NA        NA
Site 2 produces a clean discharge, with only two members of the original data set exceeding the CTR value. The original data set does not follow a parametric (normal or lognormal) distribution, so the 95% LCLs of the 90th percentile for each of the 100,000 simulations were calculated using Gibbons' non-parametric method, as discussed above. The calculated α values are shown in Table 13. It is observed that α increases as the sample size increases. To better understand why α increases, a more detailed analysis was performed; it is described at the end of this chapter. As expected, the increase in α with sample size differs from the results obtained by applying the permit rules to sample sizes greater than three.
Table 13  Alternate Statistical Model Results for Site 2
Site: 2-02, 5N, Tehama County
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           99532     99317     99394     97851
  α                                       0.004     0.006     0.006     0.020
  β                                       NA        NA        NA        NA
  Power of Test                           NA        NA        NA        NA
Site 3 produces a dirty discharge whose concentration data follow a lognormal distribution. The 95% LCL of the 90th percentile of each of the 100,000 simulations was calculated using Gibbons' method for determining lognormal confidence limits for a percentile. Because the discharge exceeds the CTR, β values are computed, as shown in Table 14. As seen in Table 14, the β values decrease as the sample size increases. This can be explained by taking a closer look at the original data set. At this site, a large number of samples exceed the CTR; only 6 data points out of 23 are less than the CTR, so as sample size increases, the number of times the 95% LCL is less than the CTR decreases. This site has a low β value because the CTR is lower than the majority of the original data. The need to better understand how the position of the CTR relative to the original data affects the risk estimate led to the delta procedure.
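For reference, in the lognormal case the 95% LCL of the 90th percentile is computed in the Appendix B code as

\mathrm{LCL}_{95\%} = \exp\left(\bar{y} + \kappa(k)\, s_y\right),

where \bar{y} and s_y are the sample mean and standard deviation used by the code (the Mean(j) and stdev(j) variables, interpreted here as log-scale statistics, an assumption consistent with the final exponentiation) and \kappa(k) takes the hard-coded values 0.519, 0.712, 0.802, and 0.858 for k = 5, 10, 15, and 20, respectively.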
Table 14  Alternate Statistical Model Results for Site 3
Site: 4-39, 580W, Alameda County
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 5       k = 10    k = 15    k = 20
Alternate Statistical Model
  Number of times 95% LCL < CTR           4           0         0         0
  α                                       NA          NA        NA        NA
  β                                       0.00004     0         0         0
  Power of Test                           0.99996     1         1         1

4.3    Delta Procedure
To better understand how α and β behave in cases where the runoff is on the borderline between "clean" and "dirty", the Site 3 data were evaluated against a synthetic (hypothetical) WQO that could be moved a distance "delta" in relation to the mean of the data set. The synthetic CTR value was set at various increments of the standard deviation (between 0 and 2) above the mean of the data set (see Section 3.5). The evaluation was performed with the Alternate Statistical Model program using a non-parametric approach. Table 15 shows the synthetic CTR values.
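In formula form, with \bar{x} and s denoting the mean and standard deviation of the Site 3 data set,

\mathrm{CTR}_{\delta} = \bar{x} + \delta\, s, \qquad \delta = 0,\ 0.5,\ 1,\ 1.5,\ 2,

so that, for example, \delta = 0 reproduces the mean (1.748) and \delta = 0.5 gives the value 2.385 listed in Table 15.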
Table 15  Synthetic CTR Values
Site: 4-39, 580W, Alameda County
Mean of Sample = 1.748

Standard deviations above Mean     0        0.5      1        1.5      2
Synthetic CTR                      1.748    2.385    3.022    3.659    4.295
For a synthetic CTR value of 1.748, the discharge was determined to be "dirty" using the 303(d) listing factors. As a result, the β values shown in Table 16 were calculated. Note that the β values are substantially larger than they were for the true CTR of 0.97 µg/L, because the synthetic CTR is higher in relation to the data set. For data sets that lie below, but near, the WQO, the risk of erroneously deciding that the discharge is clean when in fact it is dirty increases significantly. As shown in Table 16, the risk decreases with an increasing number of samples.
Table 16  Results for Synthetic CTR = 1.748
Site: 4-39, 580W, Alameda County
Synthetic CTR: 1.748
Results from the 303(d) Listing Process: Dirty

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       NA        NA        NA        NA
  β                                       0.69      0.4       0.23      0.05
When the synthetic CTR value is raised in relation to the data to a value of 2.385 (one standard deviation above the mean), the 303(d) classification changes from "dirty" to "clean". This is because only four samples in the original data set are higher than 2.385. As a result, the α values were calculated, as shown in Table 17. The α values are observed to increase with an increasing number of samples, as in the earlier cases.
Table 17  Results for Synthetic CTR = 2.385
Site: 4-39, 580W, Alameda County
Synthetic CTR: 2.385
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.03      0.08      0.1       0.26
  β                                       NA        NA        NA        NA
These calculations were repeated for synthetic CTR values of 3.022, 3.659, and 4.295, as shown in Tables 18, 19, and 20, respectively.
Table 18  Results for Synthetic CTR = 3.022
Site: 4-39, 580W, Alameda County
Synthetic CTR: 3.022
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.04      0.07      0.104     0.25
  β                                       NA        NA        NA        NA
Table 19  Results for Synthetic CTR = 3.659
Site: 4-39, 580W, Alameda County
Synthetic CTR: 3.659
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.017     0.03      0.036     0.19
  β                                       NA        NA        NA        NA
Table 20  Results for Synthetic CTR = 4.295
Site: 4-39, 580W, Alameda County
Synthetic CTR: 4.295
Results from the 303(d) Listing Process: Clean

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Alternate Statistical Model
  α                                       0.005     0.007     0.007     0.0252
  β                                       NA        NA        NA        NA
To better understand why α increases with sample size, the 95% LCL was investigated for each sample size. The simulation was run 1,000 times, the array of 1,000 95% LCLs was sorted in ascending order, and the summary statistics shown in Table 21 were calculated from it. The average 95% LCL of the 1,000 samples increases as the sample size increases. This is consistent with the idea of a lower confidence limit: as the number of samples increases, the width of the confidence interval decreases. As the LCL increases with sample size, the probability of exceeding the fixed CTR value also increases, which increases α. This indicates that the risk of making a Type I error increases with increasing sample size, contrary to the expectation that the probability of error should decrease as the number of data points increases. Consequently, this suggests that Gibbons' method, as described in his paper, is not a good method for calculating α in this bootstrapping-type procedure; comparing the LCL to the CTR does not seem to be the best way to calculate α. Further investigation of the distribution of the simulated 90th percentile is recommended.
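A minimal sketch of the kind of check used here is shown below; it assumes an array LCL(1 To n) of simulated lower confidence limits has already been produced by the Alternate Statistical Model loop, and the use of the empirical 5th and 95th percentiles (the 50th and 950th ordered values for n = 1,000) for the bounds in Table 21 is an assumption.

' Illustrative sketch: summary statistics for an array of simulated 95% LCLs.
' Assumes LCL(1 To n) is already populated (e.g., n = 1000); the empirical
' 5th/95th percentile bounds are an assumption about how Table 21 was built.
Sub SummarizeLCL(LCL() As Double, n As Long)
    Dim i As Long, p As Long, temp As Double, sum As Double, sumsq As Double, avg As Double
    'Sort the LCLs into ascending order (bubble sort, as in the Appendix B code)
    For p = 1 To n - 1
        For i = p + 1 To n
            If LCL(i) < LCL(p) Then
                temp = LCL(p)
                LCL(p) = LCL(i)
                LCL(i) = temp
            End If
        Next i
    Next p
    'Mean and standard deviation of the LCLs
    For i = 1 To n
        sum = sum + LCL(i)
    Next i
    avg = sum / n
    For i = 1 To n
        sumsq = sumsq + (LCL(i) - avg) ^ 2
    Next i
    Debug.Print "Average LCL: "; avg
    Debug.Print "Stdev LCL:   "; Sqr(sumsq / (n - 1))
    Debug.Print "Lower bound: "; LCL(CLng(0.05 * n))
    Debug.Print "Upper bound: "; LCL(CLng(0.95 * n))
End Sub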
Table 21  Summary Statistics for Non-Parametric 95% LCL of 90th Percentile
Site: 4-39, 580W, Alameda County

Subsample Size in a Hypothetical Year     k = 5     k = 10    k = 15    k = 20
Average LCL                               1.44      1.79      1.92      2.36
Stdev LCL                                 0.92      1.097     1.15      1.451
95% LCL                                   0.9       1         1.2       1.7
95% UCL                                   3.3       4.1       4.1       4.7
Chapter 5
CONCLUSION
In this project, annual data sets were simulated by sampling from existing data with two Excel Visual Basic for Applications (VBA) computer models. The purpose of the simulations was to determine the statistical power of the test outlined in the proposed Caltrans stormwater discharge permit (Tentative Order No. 2011-XX-DWQ) and to suggest an alternate method that is applicable to a wide variety of statistical concentration distributions.
When the Water Quality Action Levels specified in the Tentative Order are applied to a sample size of three, the α value is 0.005 at Site 2-02, 5N, Tehama County, and the β value is 0.53 at Site 4-39, 580W, Alameda County. The synthetic CTR analysis indicates that as the CTR moves away from the mean of the data, the chance of making a Type I error decreases. For a given CTR, β was observed to decrease with sample size. The α value was observed to increase with sample size, which is generally not expected. Further research should be conducted to investigate the distribution of the simulated 90th percentile and how it affects the behavior of α.
APPENDIX A
Flowcharts
VBA Logic for Get Input:
1. Start.
2. Count the number of cells in the Reported Values column of the Input worksheet.
3. Re-dimension the array (0 to number of cells).
4. Read the values in the Reported Values column of the Input sheet and add them to the array.
5. Write the array to 'Data_Table' in the Output worksheet and resize the table.
6. End.
VBA Logic for Results Button:
1. Start.
2. Read k and the CTR from the User Defined Input Table in the Output worksheet.
3. Assign 'Data_Table' to the variable DataInput.
4. Re-dimension the Samplearray and Ruleviolations arrays based on k.
5. (A) Initialize j to 1 (iteration count) and i to 1 (sample count).
6. Compute M = 1 + Int(Rnd() * DataInput.Count) and write the randomly selected value to Samplearray(i, j).
7. Increment i; if i < k, repeat step 6. When i = k, increment j; if j < n, return to step 5. When j = n, Samplearray is stored as a virtual array.
8. (B) Initialize j to 1 (iteration count); (E) initialize i to 1 (sample count).
9. Check whether Samplearray(i, j) > rule1 and whether Samplearray(i, j) > rule2; calculate the mean and standard deviation of Samplearray(i, j); write the results to the 'Ruleviolations' array.
10. (C) Increment i; if i < k, repeat step 9. When i = k, increment j; if j < n, return to step 8. When j = n, Ruleviolations is stored as a virtual array.
11. Calculate the number of times rule1 and rule2 are exceeded and the total number of times the rule pair is exceeded, and write the output to the Output Table in the Output worksheet.
12. (F) Set the number of bins to 10, compute the bin length from DataInput, initialize the bin counter p to 1, and compute the class intervals breaks(p) and the frequencies frequency(1 to 10).
13. Check whether Samplearray(i, j) > breaks(p-1) and < breaks(p); if yes, (D) increment frequency(p); if no, move to the next p.
14. Write breaks(p) and the frequencies to the Output worksheet, then return to (E) for the next iteration.
15. End.
APPENDIX B
VBA Program Code
This is the VBA Code for the Permit Monitoring Model:
1. Get Input Button
Sub GetInputbutton()
'Declaring Variables
Dim myinputsheet As Worksheet
Dim myOutputsheet As Worksheet
Dim cnt As Integer, i As Integer, j As Integer, y As Integer
Dim Rng As Range
Dim myArray() As Double
'This macro copies the Reported values from the sheet named "Input" and pastes it into sheet
named "Output".
Set myOutputsheet = ActiveWorkbook.Worksheets("Output")
myOutputsheet.ListObjects("Data_Table").ListColumns(1).DataBodyRange.ClearContents
Set myinputsheet = ActiveWorkbook.Worksheets("Input(Clean)")
'This part of the code is to count the number of data points in the reported values column in the
Input worksheet.
i=9
cnt = 0
Do While (Len(Trim(myinputsheet.Cells(i, 9))) > 0)
cnt = cnt + 1
i=i+1
Loop
'The array is redimensioned depending on the number of data points and the data is written to
the array
ReDim myArray(0 To cnt)
i=9
j=0
For i = 9 To (cnt + 8) Step 1
myArray(j) = myinputsheet.Cells(i, 9)
'Debug.Print myarray(j)
j=j+1
Next i
'The array is written to the Output worksheet
i=8
j=0
x = UBound(myArray)
For i = 8 To (UBound(myArray) + 7) Step 1
myOutputsheet.Cells(i, 2) = myArray(j)
j=j+1
Next i
'Resizes the Data_Table in Outputworksheet
y = cnt + 7
myOutputsheet.ListObjects("Data_Table").Resize Range("B7:C" & y)
End Sub
2. Results Button
Sub Resultsbutton()
'Declare Variables
Dim Samplearray() As Double
'Samplearray is where the randomly generated sub-samples are stored virtually
Dim Ruleviolations() As Variant
'Ruleviolations is where the Rule violations are stored virtually
Dim myworksheet As Worksheet
Dim i As Long, j As Long, n As Integer, k As Long, M As Integer, cnt1 As Integer, cnt2 As Integer, _
    cnt3 As Integer, cnt4 As Integer, cnt5 As Long, Mean As Double, SumSq As Double, _
    StdDev As Double, Mean1 As Double, p As Long, x As Integer, s As Double
'i is the counter for the number of subsamples per iteration,k is the number of subsamples per
iteration,j is the counter for the number of iterations, n is the number of iterations. M is the
temporary variable to store the random number.
'cnt1 is the number of times Rule1 is exceeded in a particular iteration, cnt2 is the number of
times rule 1 is violated, cnt3 is the number of times rule2 is exceededed in a particular iteration,
cnt4 is the number of times rule 1 and 2 is exceeded, cnt 5 is the number of times rule 2 is
exceeded
'mean is the mean of each subsample,Sumsq is the sum of squares of the subsample it helps in
calculating the standard deviation,StdDev is the standard deviation of each subsample, Mean1
is the Mean of the Means of all subsamples, p is the counter variable for the histogram
calculations.
Dim DataInput As Range
Dim rule1 As Double, rule2 As Double
'rule1 is the CTR value. The CTR value is obtained from Table 3-18 of the Statewide Discharge Characterization Report.
'rule1 states that greater than or equal to 3 exceedance of WQO(in this case CTR) is a
violation.
'rule2 states that greater than or equal to 2 exceedances of WQO( in this case CTR) by 50% or
more is a violation.
Dim value As Double, value1 As Double, value2 As Double
'value1 is a variable to hold each array element and help in the mean calculation
'value2 is a variable that holds each array element for the calculation of mean for the summary
statistics
Dim Length As Double
ReDim breaks(10) As Double
ReDim freq(10) As Double
Set myworksheet = ActiveWorkbook.Worksheets("Output")
Set DataInput = myworksheet.ListObjects("Data_Table").ListColumns(2).DataBodyRange
k = Sheets("Output").Range("M8").value
rule1 = Sheets("Output").Range("M9").value 'CTR value in User Defined table
'gets the value for input variable k and CTR from the respective cells in Outputsheet.
ReDim Samplearray(k, 100000)
ReDim Ruleviolations(100000, 6)
'This part of the code is for sampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = 1 + Int(Rnd * DataInput.Count) 'random number generation
Samplearray(i, j) = DataInput.Cells(M, 1) 'Samplearray is the virtual array
Next i
Next j
' This part of the code is for checking Rules 1 and 2.
cnt2 = 0
cnt3 = 0
cnt4 = 0
cnt5 = 0
cnt6 = 0
cnt7 = 0
For j = 1 To 100000 Step 1
cnt1 = 0
cnt3 = 0
sumofvals = 0
sumofvalue1 = 0
For i = 1 To k Step 1
value = Samplearray(i, j) 'value stores elements to check rule1
value1 = Samplearray(i, j) 'value1 stores elements to help in the calculation of Mean
sumofvalue1 = sumofvalue1 + value1 'sumofvalue1 is the sum of values in a sample
example sumof values in sample1
If (value > rule1) Then 'rule1=CTR value
cnt1 = cnt1 + 1
End If
rule2 = rule1 * 1.5 'checking rule2
If (value > rule2) Then
cnt3 = cnt3 + 1
End If
Next i
Ruleviolations(j, 1) = j
If (cnt1 = 3 Or cnt1 > 3) Then 'checking rule 1
Ruleviolations(j, 2) = True ' rule1 is exceeded
cnt2 = cnt2 + 1 'cnt2 is the number of times rule1 is exceeded
Else
Ruleviolations(j, 2) = False 'rule1 is not violated
End If
If (cnt3 = 2 Or cnt3 > 2) Then 'checking rule 2
Ruleviolations(j, 3) = True ' rule2 is exceeded
Cnt5 = cnt5 + 1 'cnt5 is the number of times rule2 is exceeded
Else
Ruleviolations(j, 3) = False
End If
If (Ruleviolations(j, 2) = True And Ruleviolations(j, 3) = True) Then
cnt4 = cnt4 + 1 'cnt4 is the counter variable for the number of times rule 1 and rule2 are
exceeded.
Else
cnt4 = cnt4
End If
Mean = (sumofvalue1 / k) ' code for calculating mean
Ruleviolations(j, 5) = Mean
value2 = Ruleviolations(j, 5)
sumofvalue2 = sumofvalue2 + value2
Mean1 = (sumofvalue2 / 100000) 'Mean1 is the mean of all the iterations that will be
output into Summary Statistics
SumSq = 0 'Code for calculating standard deviation
For i = 1 To k
SumSq = SumSq + (Samplearray(i, j) - Mean) ^ 2
StdDev = Sqr(SumSq / (k - 1))
Ruleviolations(j, 6) = StdDev
Next i
value3 = Ruleviolations(j, 6)
SumSq1 = SumSq1 + value3
StdDev1 = (SumSq1 / 100000) 'average of standard dev
'This part of code is for outputing the bins and Frequency needed for the histogram
Length = (((WorksheetFunction.Max(DataInput)) - (WorksheetFunction.Min(DataInput))) / 10)
'binsize, Considering number of bins to be fixed and No. of bins = 10
For p = 1 To 10 Step 1
breaks(p) = (WorksheetFunction.Min(DataInput)) + (Length * p) 'calculating class
intervals
Next p
For i = 1 To k Step 1
If (Samplearray(i, j) <= breaks(1)) Then freq(1) = freq(1) + 1
If (Samplearray(i, j) > breaks(1) And Samplearray(i, j) <= breaks(2)) Then freq(2) =
freq(2) + 1
If (Samplearray(i, j) > breaks(2) And Samplearray(i, j) <= breaks(3)) Then freq(3) =
freq(3) + 1
If (Samplearray(i, j) > breaks(3) And Samplearray(i, j) <= breaks(4)) Then freq(4) =
freq(4) + 1
If (Samplearray(i, j) > breaks(4) And Samplearray(i, j) <= breaks(5)) Then freq(5) =
freq(5) + 1
If (Samplearray(i, j) > breaks(5) And Samplearray(i, j) <= breaks(6)) Then freq(6) =
freq(6) + 1
If (Samplearray(i, j) > breaks(6) And Samplearray(i, j) <= breaks(7)) Then freq(7) =
freq(7) + 1
If (Samplearray(i, j) > breaks(7) And Samplearray(i, j) <= breaks(8)) Then freq(8) =
freq(8) + 1
If (Samplearray(i, j) > breaks(8) And Samplearray(i, j) <= breaks(9)) Then freq(9) =
freq(9) + 1
If (Samplearray(i, j) >= breaks(9)) Then freq(10) = freq(10) + 1
Next i
Next j
For p = 1 To 10 Step 1
Cells(p + 8, 30) = breaks(p)
Cells(p + 8, 31) = freq(p)
Next p
'To check if the sample is clean, using the 303(d) listing process for a sample size of 5-30
If (s < 5) Then
Sheets("Output").Cells(21, 13) = "Clean"
Else
Sheets("Output").Cells(21, 13) = "Dirty"
End If
'This part of the code is to write the output to the Output worksheet
Sheets("Output").Cells(14, 13) = cnt2
Sheets("Output").Cells(15, 13) = cnt5
Sheets("Output").Cells(18, 13) = cnt4
Sheets("Output").Cells(12, 13) = Mean1
Sheets("Output").Cells(13, 13) = StdDev1
MsgBox "Done"
End Sub
This is the VBA code for the Alternate Statistical Model:
1. Get Sample button
Sub GetSamplebutton()
'Declaring Variables
Dim myinputsheet As Worksheet
Dim myAlternatesheet As Worksheet
Dim cnt As Integer, i As Integer, j As Integer, z As Integer
Dim sample() As Double
'This macro copies the Reported values from the sheet named "Input" and pastes it into sheet
named "Alternate".
Set myAlternatesheet = ActiveWorkbook.Worksheets("Alternate")
myAlternatesheet.Range("Sample_Table").ClearContents
Set myinputsheet = ActiveWorkbook.Worksheets("Input(Clean)")
'This part of the code is to count the number of data points in the reported values column in the
Input worksheet.
i=9
cnt = 0
Do While (Len(Trim(myinputsheet.Cells(i, 9))) > 0)
cnt = cnt + 1
i=i+1
Loop
'The array is redimensioned depending on the number of data points and the data is written to
the array
ReDim sample(0 To cnt)
i=9
j=0
For i = 9 To (cnt + 8) Step 1
sample(j) = myinputsheet.Cells(i, 9)
'Debug.Print sample(j)
j=j+1
Next i
'The array is written to the Alternate worksheet
i=8
j=0
x = UBound(sample)
For i = 8 To (UBound(sample) + 7) Step 1
myAlternatesheet.Cells(i, 2) = sample(j)
j=j+1
Next i
'Resizes the Sample_Table in Alternate worksheet
z = cnt + 7
myAlternatesheet.ListObjects("Sample_Table").Resize Range("B7:B" & z)
End Sub
2. Check Button
a. This is for Non-Parametric data distribution.
Sub Samplerbutton()
'Declaring Variables
Dim Samplearray() As Double
'Samplearray is where the samples are stored virtually
Dim tempcolumn() As Double
Dim percent() As Double
Dim LCL() As Double
Dim myworksheet As Worksheet
Dim i As Long, j As Long, k As Long, M As Integer, t As Long, p As Long, temp As Double, _
    Mean As Double, cnt As Integer, value As Double, StDev As Double, x As Long, WQO As Double
'i is the counter for the number of subsamples per iteration,k is the number of
subsamples per iteration,j is the counter for the number of iterations, n is the number
of iterations, it is a user input. M is the temporary variable to store the random
number.
Dim DataInput As Range
Dim SampleResults As Range
'SampleResults is where the samples will be displayed in Alternate worksheet
Dim Samplepercentile As Range
Dim SampleLCL As Range
Set myworksheet = ActiveWorkbook.Worksheets("Alternate")
Set DataInput = myworksheet.ListObjects("Sample_Table").DataBodyRange
k = Sheets("Alternate").Range("K10").value
WQO = Sheets("Alternate").Range("K7").value
'gets the value for input variable k Alternatesheet.
ReDim Samplearray(k, 100000)
ReDim tempcolumn(k)
ReDim percent(100000)
ReDim LCL(100000)
'This part of the code is for randomsampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = 1 + Int(Rnd * DataInput.Count)
Samplearray(i, j) = DataInput.Cells(M, 1)
Next i
Next j
'This part of the code is for Bubblesort
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
For p = 1 To (UBound(Samplearray) - 1)
For t = p To UBound(Samplearray)
If Val(Samplearray(t, j)) < Val(Samplearray(p, j)) Then
temp = Samplearray(p, j)
Samplearray(p, j) = Samplearray(t, j)
Samplearray(t, j) = temp
End If
Next t
Next p
'This line of code is to check if the bubblesort code is working correctly, the array is
written to the worksheet 'Alternate'.
'SampleResults.Cells(i, j) = Samplearray(i, j)
'From Gibbons' binomial method, the order statistic for the LCL is selected for various sample sizes.
If k = 5 Then LCL(j) = Samplearray(3, j)
If k = 10 Then LCL(j) = Samplearray(7, j)
If k = 15 Then LCL(j) = Samplearray(11, j)
If k = 20 Then LCL(j) = Samplearray(16, j)
'SampleLCL.Cells(j) = LCL(j)
Next i
If (LCL(j) < WQO) Then
'counting the number of times 95% LCL < CTR
x=x+1
Else
x=x
End If
Next j
Sheets("Alternate").Cells(13, 11) = x
MsgBox "Done"
End Sub
b.
This is for Log-Normal Distribution
Sub Samplerbutton()
'Declaring Variables
Dim Samplearray() As Double
'Samplearray is where the samples are stored virtually
Dim tempcolumn() As Double
Dim percent() As Double
Dim Mean() As Double
Dim stdev() As Double
Dim LCL() As Double
Dim myworksheet As Worksheet
Dim i As Integer, j As Long, k As Long, M As Double, t As Long, p As Long, temp As
Double, cnt As Integer, value As Double, sumofvalue As Double, Coeff As Double, x As
Integer
'i is the counter for the number of subsamples per iteration,k is the number of
subsamples per iteration,j is the counter for the number of iterations, n is the number
of iterations, it is a user input. M is the temporary variable to store the random
number.
Dim DataInput As Range
Dim SampleResults As Range
'SampleResults is where the samples will be displayed in Alternate worksheet
Dim Samplepercentile As Range
Dim SampleMean As Range
Dim SampleStdev As Range
Dim SampleLCL As Range
Set myworksheet = ActiveWorkbook.Worksheets("Alternate")
Set DataInput = myworksheet.ListObjects("Sample_Table").DataBodyRange
k = Sheets("Alternate").Range("K10").value
WQO = Sheets("Alternate").Range("K7").value
'gets the value for input variable k Alternatesheet.
ReDim Samplearray(k, 100000)
ReDim tempcolumn(k)
ReDim percent(100000)
ReDim Mean(100000)
ReDim stdev(100000)
ReDim LCL(100000)
'This part of the code is for randomsampling.
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
M = Application.WorksheetFunction.Norm_Inv(Rnd(), 1.98, 1.41)
Samplearray(i, j) = M
'This part of the code is for Bubblesort
Next i
Next j
For j = 1 To 100000 Step 1
For i = 1 To k Step 1
For p = 1 To (UBound(Samplearray) - 1)
For t = p To UBound(Samplearray)
If Val(Samplearray(t, j)) < Val(Samplearray(p, j)) Then
temp = Samplearray(p, j)
Samplearray(p, j) = Samplearray(t, j)
Samplearray(t, j) = temp
End If
Next t
Next p
'This line of code is to check if the bubblesort code is working correctly, the array
is written to the worksheet 'Alternate'.
' SampleResults.Cells(i, j) = Samplearray(i, j)
Next i
Next j
'This part of code is to calculate 90th percentile
For j = 1 To 100000 Step 1
value = 0
sumofvalue = 0
SumSq = 0
If (k = 5) Then Coeff = 0.519
If (k = 10) Then Coeff = 0.712
If (k = 15) Then Coeff = 0.802
If (k = 20) Then Coeff = 0.858
For i = 1 To k Step 1
tempcolumn(i) = Samplearray(i, j)
percent(j) = Application.WorksheetFunction.Percentile_Inc(tempcolumn, 0.9)
'Samplepercentile.Cells(j) = percent(j)
value = Samplearray(i, j) 'Code for calculating mean
sumofvalue = value + sumofvalue
Mean(j) = (sumofvalue / k)
' SampleMean.Cells(j) = Mean(j)
stdev(j) = Application.WorksheetFunction.stdev(tempcolumn)
'SampleStdev.Cells(j) = stdev(j)
LCL(j) = Exp(Mean(j) + Coeff * stdev(j))
'SampleLCL.Cells(j) = LCL(j)
Next i
If (LCL(j) < WQO) Then
x=x+1
Else
x=x
End If
Next j
Sheets("Alternate").Cells(13, 11) = x
MsgBox "Done"
End Sub
REFERENCES
Caltrans (California Department of Transportation), 2003a. Statewide Stormwater Management Plan. CTSW-RT-02-008.
Caltrans (California Department of Transportation), 2003b. Discharge Characterization Study Report. CTSW-RT-03-065.51.42.
Caltrans (California Department of Transportation), 2009a. BMP Pilot Study Guidance Manual. CTSW-RT-06-171.02.1.
Caltrans (California Department of Transportation), 2009b. Storm Water Monitoring and Data Management. CTSW-RT-03-069.51.42.
Conover, W.J., 1980. Practical Nonparametric Statistics. John Wiley & Sons, Inc., pp. 95-99.
CSWRCB (California State Water Resources Control Board), 2004. Water Quality Control Policy for Developing California's Clean Water Act Section 303(d) List.
CSWRCB (California State Water Resources Control Board), 2011a. National Pollutant Discharge Elimination System (NPDES): Tentative Order No. 2011-XX-DWQ, NPDES No. CAS000003, August 18, 2011.
CSWRCB (California State Water Resources Control Board), 2011b. National Pollutant Discharge Elimination System (NPDES): Fact Sheet, Tentative Order No. 2011-XX-DWQ, NPDES No. CAS000003, August 18, 2011.
Gibbons, J.D., I. Olkin, and M. Sobel, 1977. Selecting and Ordering Populations: A New Statistical Methodology. Wiley, New York.
Gibbons, R.D., 2003. A Statistical Approach for Performing Water Quality Impairment Assessment. Journal of the American Water Resources Association, 39(4):841-849.
Granicher, T., 2011. Office of Water Programs, California State University, Sacramento. Personal Communication.
Hahn, G.J. and W.Q. Meeker, 1991. Statistical Intervals: A Guide for Practitioners. John Wiley & Sons, Inc., pp. 75-169.
Kvam, P.H. and B. Vidakovic, 2007. Nonparametric Statistics with Applications to Science and Engineering. John Wiley & Sons, Inc., pp. 69-75.
Madansky, A., 1988. Prescriptions for Working Statisticians. Springer, pp. 25-30.
Mansfield, R., 2007. Mastering VBA for Microsoft Office 2007. Wiley Publishing, Inc.
Shapiro, S.S. and M.B. Wilk, 1965. An analysis of variance test for normality (complete samples). Biometrika, 52:591-611.
Singh, A.K., A. Singh, and M. Engelhardt, 1997. The Lognormal Distribution in Environmental Applications. U.S. Environmental Protection Agency, EPA/600/S-97/006.
Smith, E.P., K. Ye, C. Hughes, and L. Shabman, 2001. Statistical Assessment of Violations of Water Quality Standards under Section 303(d) of the Clean Water Act. Environmental Science and Technology, 35:606-612.
USEPA (U.S. Environmental Protection Agency), 1989. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Interim Final Guidance. EPA 530/R-89-026.
USEPA (U.S. Environmental Protection Agency), 1997. Guidelines for the Preparation of the Comprehensive State Water Quality Assessments (305(b) Reports) and Electronic Updates: Supplements. EPA-841-B-97-002A.
USEPA (U.S. Environmental Protection Agency), 2009a. Statistical Analysis of Ground-Water Monitoring Data at RCRA Facilities: Unified Guidance. EPA 530/R-09-007.
Code of Federal Regulations, 2009. Title 40, Part 122.