Environ Sci Pollut Res (2014) 21:159–167
DOI 10.1007/s11356-013-1462-y
ENVIRONMENTAL QUALITY BENCHMARKS FOR PROTECTING AQUATIC ECOSYSTEMS
A comparison of statistical methods for deriving freshwater
quality criteria for the protection of aquatic organisms
Liqun Xing & Hongling Liu & Xiaowei Zhang & Markus Hecker &
John P. Giesy & Hongxia Yu
Received: 6 August 2012 / Accepted: 2 January 2013 / Published online: 15 January 2013
© Springer-Verlag Berlin Heidelberg 2013
Abstract Species sensitivity distributions (SSDs) are increasingly used in both ecological risk assessment and derivation of water quality criteria. However, there has been debate about the choice of an appropriate approach for derivation of water quality criteria based on SSDs because the various methods can generate different values. The objective of this study was to compare the differences among various methods. Data sets of acute toxicities of 12 substances to aquatic organisms, representing a range of classes with different modes of action, were studied. Nine typical statistical approaches, including parametric and nonparametric methods, were used to construct SSDs for the 12 chemicals. Water quality criteria, expressed as hazardous concentration for 5 % of species (HC5), were derived by use of each approach. All approaches produced comparable results, and the data generated by the different approaches were significantly correlated. Variability among estimates of HC5 for all included species decreased with increasing sample size, and variability was similar among the statistical methods applied. Of the statistical methods selected, the bootstrap method represented the best-fitting model for all chemicals, while log-triangle and Weibull were the best models among the parametric methods evaluated. The bootstrap method is the primary choice for deriving water quality criteria when data points are sufficient (more than 20). If few data are available, all other methods should be fitted, and the one that best describes the distribution of the data should be selected.
Keywords Species sensitivity distribution (SSD) · Hazardous concentration for 5 % of species (HC5) · Water quality criteria · Statistical methods
Responsible editor: Michael Matthies
L. Xing · H. Liu (*) · X. Zhang · H. Yu (*)
State Key Laboratory of Pollution Control and Resource Reuse,
School of the Environment, Nanjing University,
Nanjing 210046, China
e-mail: hlliu@nju.edu.cn
e-mail: yuhx@nju.edu.cn
M. Hecker
School of Environment and Sustainability and Toxicology Centre,
University of Saskatchewan, Saskatoon, Saskatchewan, Canada
J. P. Giesy
Department of Veterinary Biomedical Sciences and Toxicology
Centre, University of Saskatchewan, Saskatoon, Saskatchewan,
Canada
J. P. Giesy
Department of Biology and Chemistry and State Key Laboratory
for Marine Pollution, City University of Hong Kong, Kowloon,
Hong Kong, SAR, China
Introduction
Water quality criteria (WQC), environmental quality standards (EQS), and predicted no-effect concentrations (PNEC) play
an important role in environmental management and
pollution control (or risk management). These management
standards can be derived by several methods. However, species sensitivity distribution (SSD) methodology, as one typical
method, has been increasingly used in ecological risk assessment procedures (Caldwell et al. 2008; Grist et al. 2006;
Newman et al. 2000; Solomon et al. 1996) and for deriving
WQC (ANZECC and ARMCANZ 2000; CCME 2007; EC
2003) or EQS (EC 2011) in recent decades. SSDs have some
advantages compared to traditional quotient and assessment
factor (AF) approaches (Table 1). Therefore, the SSD method
has become the priority method in deriving water quality
criteria by most countries. SSD approaches are more reasonable and objective than earlier methods used to derive WQC
such as the AF approach. The AF approach derives criteria by multiplying the lowest toxicity value from a minimum data set, which varies among jurisdictions, by a safety factor (TenBrook and Tjeerdema 2006). Because this approach is intended to be protective rather than predictive, it often results in substantially overprotective criteria. In contrast, the SSD method extrapolates from a set of available data, assuming that those data are a random sample of all species, to derive water quality criteria that are designed to protect some portion of the species in an ecosystem (Posthuma et al. 2002). Therefore, SSD approaches have become the preferred method to derive water quality criteria in several countries, such as Canada, New Zealand, and Australia as well as European countries.

Table 1 Comparison between SSD and AF
| SSD | AF |
| --- | --- |
| Probabilistic method | Deterministic method |
| Uses the entire dataset (i.e., all taxa, so that the relative sensitivities of taxa can be examined) | Uses only the toxicity data of the most sensitive species |
| Less uncertainty; more information is given (e.g., confidence interval) | High level of uncertainty, with no expression of that uncertainty |
| Data requirements are demanding (preferably 10–15) | At least one species; assessment factor is from 10 to 1,000 depending on available toxicity data |
| More robust approach | More variable method |
| Different protective levels can be derived | Only one protective level can be derived |
| When the environmental exposure concentrations are available, quantitative risk can be described | When the environmental exposure concentrations are available, qualitative risk can be described |

The purpose of WQC, EQS, or PNEC for aquatic organisms is to protect a known portion of species in an aquatic environment. Usually, the protection goal is 95 % of the species in a given environment, which is achieved by calculating the hazardous concentration for 5 % (HC5) of all considered species based on simulated SSDs. In fact, water quality criteria are risk based: both criteria and risk assessments rest on the exposure–response model (Suter and Cormier 2008). Several approaches for generating SSDs have been developed over the years. SSDs are frequently constructed by plotting logarithmically transformed toxicity threshold values (typically no observed effect concentrations [NOECs] or the concentrations at which 50 % of all test animals in an experiment died [LC50s]) for aquatic species with different sensitivities, generally including multiple trophic levels and a variety of taxa, against the percent rank (cumulative probability) of each value. Currently, several statistical approaches are used for this purpose, including parametric and nonparametric methods. Parametric methods assume a certain probability distribution of the available species toxicity dataset for a chemical, such as log-normal, log-logistic (Aldenberg and Jaworska 2000; Aldenberg and Slob 1993; Caldwell et al. 2008; Pennington 2003; van Straalen 2002), Burr type III (Shao 2000), and some other distributions (CCME 2007). Since
1985, the US Environmental Protection Agency (USEPA) has applied four formulas, based on a log-triangle distribution, to calculate the final acute values (the US-FAV approach for short in this study) from which numeric water quality criteria are derived (Stephan et al. 1985). Nonparametric methods, which make no prior assumption about the distribution and rely on the data alone, have also been applied (Grist et al. 2002; Jagoe and Newman 1997) and developed (Wang et al. 2008). Generally, estimates derived by use of different methods are virtually identical (Wang et al. 2008), especially when estimating a “middle” centile such as the median. However, different methods can result in significantly different estimates when calculating centiles at the lower or upper tails of the distribution curve (e.g., HC5), especially when the data are few. Effects of data quantity and quality on various statistical models to derive SSDs have been studied (Wheeler et al. 2002). The results of that study indicated that both the size of the data set and quality of the data have significant effects on the HC5 (Newman et al. 2000). Those authors suggested a minimum data requirement of 10 data points. The number of species sufficient for an SSD has been suggested to be a median of 30 (range, 15 to 55), which is greater than the sample sizes (number of species) required in recent regulatory documents: toxicity data from at least eight families belonging to three phyla for the USA (Stephan et al. 1985), and 10–15 long-term data from eight taxonomic groups for Europe (EC 2011). The main difference among the approaches to derive SSDs lies in the choice of the underlying distribution. Therefore, a systematic comparative study is needed to inform the selection of the most appropriate model to derive objective water quality criteria.
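The SSD construction described in this section, in which log-transformed toxicity values are plotted against their percent rank, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plotting position rank/(n + 1) is one of several conventions and is assumed here, the function name `empirical_ssd` is hypothetical, and the LC50 values are invented for the example rather than taken from the study.

```python
import math

def empirical_ssd(toxicity_values):
    """Return (log10 concentration, cumulative probability) pairs, most sensitive first."""
    logs = sorted(math.log10(v) for v in toxicity_values)
    n = len(logs)
    # Plotting position rank/(n + 1) is an assumed convention, not the paper's stated choice.
    return [(x, rank / (n + 1)) for rank, x in enumerate(logs, start=1)]

# Hypothetical LC50 values in micrograms per liter, for illustration only.
lc50 = [12.0, 3.5, 150.0, 48.0, 7.2, 930.0, 21.0, 65.0, 310.0]
ssd = empirical_ssd(lc50)
for log_conc, prob in ssd:
    print(f"log10(LC50) = {log_conc:6.3f}   cumulative probability = {prob:.2f}")
```

A parametric or nonparametric model would then be fitted to these points, and the HC5 read off at a cumulative probability of 0.05.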
In the present study, 12 toxicity databases of different chemicals (Table 2) with various modes of action were used to evaluate the effect of data quantity on the variability of the water quality criteria (HC5) derived with different statistical methods. The relationship between different results derived from various SSD methods was discussed, and recommendations regarding preferable SSD models were made.

Table 2 Statistical summary of the HC5s (in micrograms per liter) and their lower confidence limits (LHC5) for 95 % one-tail confidence intervals calculated by different approaches
[Table 2 body omitted: the cell values could not be recovered in order from the extracted text. Statistics reported: HC5 and LHC5 for US-FAV, bootstrap, and bootstrap regression based on log-logistic; HC5, LHC5, R2, AIC, and RSE for log-normal, log-logistic, Weibull, Burr type III, log-Gumbel, and log-triangle. Chemicals: copper (n=89), lead (n=85), DDT (n=56), naphthalene (n=12), fluoranthene (n=14), Aroclor (n=13), dieldrin (n=34), benzenamine (n=25), phenol (n=68), chlorpyrifos (n=49), formalin (n=17), ethanol (n=17).]
n number of available toxicity data; NA not available, due to small sample size (n<20); R2 determination coefficients; AIC Akaike information criterion; RSE residual standard error
Numbers in bold are the better two models, and the ones in italic are the best models among the parametric methods
Materials and methods
Data sources
Twelve substances representing a variety of substance groups with different modes of action were selected. These included metals such as copper (Cu, CAS no. 7440-50-8) and lead (Pb, CAS no. 7439-92-1), and organic compounds including dichlorodiphenyltrichloroethane (DDT, CAS no. 50-29-3), naphthalene (CAS no. 91-20-3), fluoranthene (CAS no. 206-44-0), Aroclor (CAS no. 12674-11-2), dieldrin (CAS no. 60-57-1), benzenamine (CAS no. 62-53-3), phenol (CAS no. 108-95-2), chlorpyrifos (CAS no. 2921-88-2), formalin (CAS no. 50-00-0), and ethanol (CAS no. 57158-54-0). The number of toxicity data per substance ranged from 12 to 89, and seven substances had more than 20 data points. Data on acute toxicity of these chemicals to freshwater species at different trophic levels were primarily extracted from USEPA's Aquatic Information Retrieval database (http://www.epa.gov/ecotox). The acceptable acute-effect endpoints were EC50 or LC50 based on mortality. Selection criteria for acceptable data sets were set to durations of 96 and 48 h for fish and invertebrate (e.g., daphnia) toxicity tests, respectively (CCME 2007; Stephan et al. 1985). Because algae have a rapid cell division rate (reproduction), exposure times shorter than 24 h were considered (CCME 2007). All concentrations were converted to micrograms per liter.
Parametric methods
Seven parametric methods including US-FAV (based on the log-triangle distribution), log-normal, log-logistic, Weibull, Burr type III, log-Gumbel, and log-triangle models were applied to derive criteria based on the toxicity datasets for each of the 12 chemicals. These methods have been extensively described in the literature and are commonly applied in derivation of SSDs (CCME 2007; Pennington 2003; Shao 2000; Stephan et al. 1985; van Straalen 2002). Confidence intervals associated with the HC5 values were estimated by using the bootstrap method (Duboudin et al. 2004; Efron and Tibshirani 1993): the lower (5 %) limit of the 90 % confidence interval of the HC5 is reported as LHC5.
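As a rough sketch of the parametric workflow described above, the snippet below fits a log-normal SSD and estimates the HC5 together with a bootstrap LHC5 (the lower 5 % limit of a 90 % interval). Several choices are assumptions made for illustration: the distribution is fitted by the sample mean and standard deviation of the log10 values rather than by the regression fitting that the reported R2 and RSE statistics suggest, the function names are hypothetical, and the toxicity values are invented.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def hc5_lognormal(values, p=0.05):
    """HC_p of a log-normal SSD fitted by the mean and SD of log10 values (assumed fit)."""
    logs = [math.log10(v) for v in values]
    z = NormalDist().inv_cdf(p)  # standard normal quantile, about -1.645 for p = 0.05
    return 10 ** (mean(logs) + z * stdev(logs))

def lhc5_bootstrap(values, n_boot=2000, seed=1):
    """Lower 5th percentile of bootstrap HC5 estimates (lower limit of a 90 % interval)."""
    rng = random.Random(seed)
    estimates = sorted(
        hc5_lognormal(rng.choices(values, k=len(values)))  # resample with replacement
        for _ in range(n_boot)
    )
    return estimates[int(0.05 * n_boot)]

# Hypothetical LC50 values in micrograms per liter, for illustration only.
lc50 = [12.0, 3.5, 150.0, 48.0, 7.2, 930.0, 21.0, 65.0, 310.0, 5.1, 88.0, 40.0]
print("HC5  =", round(hc5_lognormal(lc50), 3))
print("LHC5 =", round(lhc5_bootstrap(lc50), 3))
```

The same resampling wrapper could be placed around any of the other parametric fits; only `hc5_lognormal` would need to be swapped.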
Nonparametric bootstrap methods
Nonparametric approaches were proposed as an alternative to
parametric models because in some cases, data do not follow
any of the distributions assumed by the parametric models. A
detailed description of the extrapolative process utilized by the
nonparametric bootstrap method has been provided previously
by a number of authors (Efron and Tibshirani 1993; Grist et al.
2002; Jagoe and Newman 1997; Wang et al. 2008). Briefly, a
point estimate such as the HC5 is obtained from a large number
of resamples (typically more than 2,000) drawn at random
from the original sample with replacement. The associated
confidence intervals (e.g., 95 %) are then simply defined as
the centile confidence interval that contains a specific percentage of the computed values over the complete set of bootstrap
samples generated. One limitation of this approach, however,
is that it is a data-demanding approach requiring at least 20 data
points for calculation of a HC5 (Duboudin et al. 2004).
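The resampling procedure outlined above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: taking the median of the bootstrap estimates as the point estimate, the linear interpolation rule in `percentile`, and the data values are all assumptions introduced here.

```python
import random
import statistics

def percentile(sorted_values, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of already sorted data."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def bootstrap_hc5(values, n_boot=2000, seed=1):
    """Return (HC5, lower, upper) from the empirical 5th centile of each resample."""
    rng = random.Random(seed)
    estimates = sorted(
        percentile(sorted(rng.choices(values, k=len(values))), 0.05)
        for _ in range(n_boot)
    )
    # Point estimate: median of the bootstrap estimates (an assumed choice).
    hc5 = statistics.median(estimates)
    # Centile confidence interval containing 95 % of the bootstrap estimates.
    return hc5, percentile(estimates, 0.025), percentile(estimates, 0.975)

# Hypothetical toxicity values (micrograms per liter), 25 species.
data = [0.8, 1.5, 2.2, 3.1, 4.7, 6.0, 8.4, 11.0, 14.5, 19.0, 25.0, 33.0, 41.0,
        52.0, 68.0, 85.0, 110.0, 140.0, 190.0, 250.0, 330.0, 450.0, 600.0,
        820.0, 1100.0]
hc5, lower, upper = bootstrap_hc5(data)
print(f"HC5 = {hc5:.2f} (95 % interval {lower:.2f} to {upper:.2f})")
```

Because no distribution is assumed, the estimate can never fall below the smallest observed value, which is one way to see why the approach needs a reasonably large data set.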
Bootstrap regression method
To overcome the limitations of traditional bootstrap methods, a resampling regression approach was developed for limited data sets for which a nonstandard regression model might achieve a relatively good fit. In brief, the bootstrap regression approach combines the classic bootstrap method with a certain parametric distribution (Grist et al. 2002; Wang et al. 2008). This approach fits a specific parametric regression model (e.g., log-logistic) to the corresponding cumulative frequency distribution associated with each bootstrap sample generated from the resampling of the data, resulting in a large set of bootstrap curves. From this large number of curves, the 95 % confidence intervals are then calculated in the same way as bootstrap confidence intervals by using the points on curve replicates for each centile.
Comparison of the methods
Estimated HC5s, based on all available toxicity values obtained by different statistical methods, were assessed visually for goodness of “curve fit” at the lower centile tail and the parity and conservativeness between HC5 and LHC5. Additionally, quantitative standards including determination coefficients (R2), Akaike information criterion (AIC), and residual standard error (RSE) were calculated to further assess the goodness of fit for the parametric methods (Spiess et al. 2008). It is well known that the goodness of fit improves with increasing number of model parameters at the cost of increased uncertainty. The AIC accounts for the number of parameters in the SSD models (Eq. 1). The lower the AIC value, the better the fit of the model.
$$\mathrm{AIC} = 2k + n \ln\left(\frac{\mathrm{rss}}{n}\right) \qquad (1)$$
where rss=residual sum of squares, k=number of parameters, and n=number of observations.
The variability of HC5, as a function of varying sample size derived by the different parametric and nonparametric SSDs, was assessed by use of a random resampling approach (2,000 resamples drawn from a uniform distribution without replacement). Then, the HC5 was calculated using the different SSD fitting and bootstrapping methods for each resample (Wheeler et al. 2002), and the range of fluctuation was used to evaluate the goodness of fit and the variability of the different models.
Pearson's correlation coefficients between different methods were calculated. All statistical analyses were performed using the statistical computing software R (R Development Core Team, http://www.rproject.org/).
Results
Estimates of HC5s and LHC5s derived by use of the various parametric and nonparametric methods differed somewhat but were all within the same order of magnitude; 67.7 % of the HC5s derived from the different statistical approaches were within a factor of 3.5 and strongly correlated with each other (R>0.93), especially when the number of toxicity data was large enough (Tables 2 and 3). Except for benzenamine, all parametric approaches (except US-FAV, for which fit statistics are not defined because it uses only the four species with cumulative probabilities closest to 0.05) described the toxicity data well, based on coefficients of determination (R2 >0.9) and RSE (RSE<0.5). Among these approaches, the Weibull (5 cases out of 12) and log-triangle (7 cases out of 12) were the best-fitting models for all the datasets as indicated by the quantitative standards AIC (−379 to 76) or RSE (RSE<0.1), followed by Burr type III (Table 2). The R2 of benzenamine was comparatively low (0.69–0.96), which was probably due to the discontinuous toxicity data: several species belonging to crustaceans (10^2 μg/L) were much more sensitive than the others (10^4 μg/L), which may be due to the mode of action of benzenamine. Furthermore, the ratios between the HC5 and LHC5 derived from parametric methods were all within fourfold, and for the majority of methods, less than a twofold difference between the HC5 and LHC5 was observed (Fig. 1). Confidence intervals around the average HC5 for the bootstrap and bootstrap regression methods were greater than those for the other approaches.
Visually, parametric and nonparametric approaches all described the distribution of the toxicity data well (Fig. 2), with Cu and fluoranthene used as examples. However, goodness-of-fit curves modeled by the different approaches differed slightly, especially in the lower tail (less than the 40th centile) of the curves. The nonparametric bootstrap method fitted the lower portion of the SSD curves best compared to all other approaches. Among the parametric approaches, the Burr type III and log-Gumbel methods were the best models for estimating HC5 values. The curve generated by the bootstrap regression based on a log-logistic model was similar to the one generated by conventional log-logistic. However, the combinatory approach resulted in a significantly improved fit in the lower tail of the curve (Table 3, Fig. 2).
Fig. 1 Fold difference between HC5 and LHC5 for the different methods. a US-FAV, b log-normal, c log-logistic, d Weibull, e Burr type III, f log-Gumbel, g log-triangle, h bootstrap, i bootstrap regression
Regardless of the approach used, variability of calculated
HC5 strongly depended on sample size and decreased significantly with increasing sample size (Fig. 3). Variability
was very great for sample sizes of 15 or less, and changes
became negligible at sample sizes greater than 15. The
medians or means of the HC5 did not vary significantly for
sample sizes greater than 10, and mean or median values of
HC5 derived using different sample sizes for one chemical
were almost within a factor of 2.
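The dependence of HC5 variability on sample size can be illustrated with a small resampling experiment. The sketch below is not the study's procedure in detail: it draws subsamples without replacement from an invented data set, uses a log-normal fit by moments as a stand-in for the nine methods compared here, and summarizes the spread as the ratio of the 97.5th to the 2.5th centile of the resulting HC5 estimates. All names and values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def hc5_lognormal(values):
    """HC5 of a log-normal SSD fitted by the mean and SD of log10 values (assumed fit)."""
    logs = [math.log10(v) for v in values]
    return 10 ** (mean(logs) + NormalDist().inv_cdf(0.05) * stdev(logs))

def hc5_fold_range(values, size, n_rep=2000, seed=1):
    """Ratio of the 97.5th to the 2.5th centile of HC5 over random subsamples."""
    rng = random.Random(seed)
    estimates = sorted(
        hc5_lognormal(rng.sample(values, size))  # subsample without replacement
        for _ in range(n_rep)
    )
    return estimates[int(0.975 * n_rep)] / estimates[int(0.025 * n_rep)]

# Hypothetical toxicity values (micrograms per liter), 30 species.
data = [0.8, 1.5, 2.2, 3.1, 4.7, 6.0, 8.4, 11.0, 14.5, 19.0, 25.0, 33.0, 41.0,
        52.0, 68.0, 85.0, 110.0, 140.0, 190.0, 250.0, 330.0, 450.0, 600.0,
        820.0, 1100.0, 1500.0, 2100.0, 2900.0, 4000.0, 5600.0]
for size in (5, 10, 15, 20, 25, 30):
    print(f"n = {size:2d}   fold range of HC5 = {hc5_fold_range(data, size):.2f}")
```

With sampling without replacement the spread necessarily shrinks to a ratio of 1 as the subsample approaches the full data set, so the absolute values depend on the size of the parent data set as well as on the subsample size.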
Fig. 2 Illustration of species sensitivity distributions for copper and fluoranthene derived from different methods. Inset figures magnify the lower percentile region, and the x-axis shows the original concentrations. Data (superimposed open circles) are freshwater acute toxicity 50 % lethal concentration endpoints extracted from the AQUIRE database (http://www.epa.gov/ecotox)

Table 3 Correlation matrix (Pearson correlation coefficients) of HC5s calculated by different approaches
| | US-FAV | Log-normal | Log-logistic | Weibull | Burr type III | Log-Gumbel | Log-triangle | Bootstrap | Bootstrap regression |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| US-FAV | 1.0000 | | | | | | | | |
| Log-normal | 0.9974 | 1.0000 | | | | | | | |
| Log-logistic | 0.9986 | 0.9937 | 1.0000 | | | | | | |
| Weibull | 0.9490 | 0.9479 | 0.9373 | 1.0000 | | | | | |
| Burr type III | 0.9815 | 0.9905 | 0.9709 | 0.9650 | 1.0000 | | | | |
| Log-Gumbel | 0.9774 | 0.9872 | 0.9657 | 0.9671 | 0.9997 | 1.0000 | | | |
| Log-triangle | 0.9970 | 0.9997 | 0.9923 | 0.9556 | 0.9930 | 0.9902 | 1.0000 | | |
| Bootstrap | 1.0000 | 0.9992 | 0.9988 | 0.7902* | 0.9988 | 0.9987 | 0.9990 | 1.0000 | |
| Bootstrap regression | 0.9969 | 0.9916 | 0.9992 | 0.9384 | 0.9679 | 0.9626 | 0.9902 | 0.9794 | 1.0000 |
*p value is 0.034; the other p values are all <0.001

Fig. 3 Distribution of variation of HC5 with sample size. Data used are freshwater acute toxicity 50 % lethal concentration endpoints of lead and phenol extracted from the AQUIRE database (http://www.epa.gov/ecotox). Insets: the y-axis is on a log10 scale; the red points represent the mean of the HC5. Each sample size was calculated from 2,000 resampling events using a bootstrap approach without replacement. The color density represents the frequency at each point

Discussion
The major advantage of using SSDs for performing ecotoxicological risk assessments and deriving WQC is that they fully use all available toxicity data (except US-FAV, which uses only the toxicity data closest to the 0.05 cumulative probability) and therefore are more predictive and have less uncertainty than do classic approaches such as the AF (usually 10–1,000; detailed information is given in the TGD of EC 2003 and 2011) or PNEC methods that rely only on data from single or few experiments and require application of (sometimes) large safety factors (Stephan et al. 1985). Currently, most countries rely on parametric methods, such as log-normal (CCME 2007; European Commission 2003), log-logistic (CCME 2007), and Burr type III (ANZECC and ARMCANZ 2000; CCME 2007) approaches to derive water quality criteria or to conduct ecotoxicological risk assessment. Based on our results, parametric and nonparametric statistical methods all describe SSDs well, but the bootstrap approach was clearly the best-fitting model, which is in agreement with reports by other authors (Wang et al. 2008; Wheeler et al. 2002). However, to date, it is not widely used in deriving WQC because of its more extensive data requirements (at least 20 data points to define HC5). Some regulators are more comfortable with parametric methods due to their mathematical simplicity and limited data demand. In fact, several jurisdictions such as Canada (CCME 2007), the European Union (EC 2003, 2011), and Australia and New Zealand (ANZECC and ARMCANZ 2000) have derived their WQC by use of parametric approaches. Among these, the log-triangle, Weibull, log-Gumbel, and Burr type III were superior to the other parametric methods, particularly for estimating HC in the lower centile range such as the HC5; no single method, however, was consistently the best fit for all toxicity datasets. The US-FAV method has been criticized by some researchers who indicated that it uses only the four genus mean acute values (cumulative probabilities closest to 0.05) in the dataset, ignoring all others in the final derivation of the FAV, and gives poor extrapolations for small datasets (Fisher and Burton 2003).
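To make the criticism concrete, the following sketch shows how a final acute value depends on only four genus mean acute values (GMAVs). The formulas are written from the general form of the log-triangle calculation in the USEPA guidelines (Stephan et al. 1985) as recalled here and should be checked against that document before any real use. Taking the four lowest GMAVs is an assumption that matches "closest to 0.05" only for small to moderate data sets, and the function name and GMAV values are hypothetical.

```python
import math

def final_acute_value(gmavs):
    """Sketch of a log-triangle FAV from the four lowest genus mean acute values."""
    n = len(gmavs)
    if n < 4:
        raise ValueError("at least four genus mean acute values are needed")
    lowest = sorted(gmavs)[:4]                        # assumed: four lowest GMAVs
    ln_vals = [math.log(v) for v in lowest]
    probs = [rank / (n + 1) for rank in range(1, 5)]  # cumulative probability P = R/(N+1)
    sqrt_p = [math.sqrt(p) for p in probs]
    numerator = sum(x * x for x in ln_vals) - sum(ln_vals) ** 2 / 4
    denominator = sum(probs) - sum(sqrt_p) ** 2 / 4
    s = math.sqrt(max(numerator, 0.0) / denominator)  # slope; guard against rounding below zero
    intercept = (sum(ln_vals) - s * sum(sqrt_p)) / 4
    return math.exp(s * math.sqrt(0.05) + intercept)  # value at cumulative probability 0.05

# Hypothetical GMAVs in micrograms per liter; only the four smallest affect the result.
gmavs = [4.2, 9.0, 15.0, 31.0, 120.0, 480.0, 2200.0, 9100.0]
print("FAV =", round(final_acute_value(gmavs), 3))
```

Changing any of the four largest values in `gmavs` leaves the output unchanged, which is the behavior the criticism refers to.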
The HC5s calculated by the different approaches exhibited great variability for sample sizes of 15 or less regardless of the method used. This is consistent with recommendations by Newman et al. (2000) and Wheeler et al. (2002), who estimated optimal sample sizes for HC5 calculations of 20 or more. Based on the findings of this study, when using parametric methods, reliable SSDs can be constructed with 20 or more data points, which is comparable to the data requirements of the nonparametric bootstrap method. However, regulatory minimum data requirements are usually 10 or fewer (e.g., OECD at least 5 data points from 5 taxonomic groups, USEPA at least 8 data points from 8 different families, and EU at least 10 data points from 8 taxonomic groups). Water quality criteria are not only associated with sample size, but also related to the selected group of species of aquatic organisms to be protected. During the random resampling, however, species distribution and representativeness were not considered. Variability at small sample sizes might therefore bias the results relative to those derived from the formal process. Nevertheless, a large random resampling can give insight into the fluctuation in derived water quality criteria. Uncertainties are inevitable in deriving water quality criteria and assessing risk because of limited available toxicity data and lack of knowledge about site species, water quality characteristics, and mode of action.
Based on the findings of this study, it can be concluded
that for all chemicals tested here, the parametric and nonparametric techniques utilized were able to predict the distribution of the data well. However, there were some
differences in the ability of the different models to accurately fit the lower portion of the curve, resulting in somewhat
variable estimates of HC5 values depending on the method
used. In general, if there are sufficient data available, the
bootstrap method was the superior model and is recommended as the preferred method to describe SSDs.
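The recommendation above can be expressed as a simple selection rule. The sketch below is illustrative only: the threshold of more than 20 data points follows the abstract, the AIC is computed as in Eq. 1, and the residual sums of squares and parameter counts in `fits` are invented numbers, not results from Table 2.

```python
import math

def aic(rss, n, k):
    """Akaike information criterion as in Eq. 1: AIC = 2k + n ln(rss / n)."""
    return 2 * k + n * math.log(rss / n)

def choose_method(n_data, parametric_fits, min_bootstrap_n=20):
    """Pick the bootstrap for large data sets, else the parametric model with lowest AIC.

    parametric_fits maps a model name to (residual sum of squares, number of parameters).
    """
    if n_data > min_bootstrap_n:
        return "bootstrap"
    return min(
        parametric_fits,
        key=lambda name: aic(parametric_fits[name][0], n_data, parametric_fits[name][1]),
    )

# Hypothetical fit summaries for a data set of 14 species.
fits = {"log-normal": (0.060, 2), "Weibull": (0.035, 2), "Burr type III": (0.033, 3)}
print(choose_method(14, fits))   # lowest AIC among the parametric candidates
print(choose_method(49, fits))   # enough data for the nonparametric bootstrap
```

In this invented example the three-parameter model has the smallest residual sum of squares but not the smallest AIC, which shows how the penalty term 2k can change the ranking.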
One difficult aspect of using probabilistic approaches is
in risk communication. It is often difficult to explain to the
public that the WQC is designed to allow a set percentage of
taxa to be affected. For this reason, where possible, it is
proposed to compare the WQC to the results (e.g., NOEC)
obtained from community-level studies in the microcosm or
from field studies. The individual toxicity tests used to derive
the HC5 (i.e., 95 % protection level), for instance, give maximum
effects that are often modulated in the environment. Also, there is often
functional redundancy in ecosystems such that functional
replacement of affected taxa occurs. If one taxon is affected
by a toxicant, another more tolerant taxon may replace its
function. Finally, effects can be transient either due to dissipation of the toxicant over time or due to development of
tolerance or resistance of populations to its effects. Also,
WQC are developed for individual contaminants, while taxa
can be exposed to multiple toxicants simultaneously. Effects
of toxicants can be independent, or additive or less than
additive such that the mixture may be no more toxic than the
concentration of the most potent toxicant, or they can interact to make a more or a less toxic mixture. In many cases,
the toxicity of mixtures is in fact independent or less than
additive. In these cases, the established WQC will be protective of ecosystem function. But because in some cases the
effects may be additive or even more than additive, there
may be situations where exposure to the mixture would
result in unacceptable effects. For all of these reasons,
choosing the HC5 does not necessarily mean that 5 % of
species will be affected. It is therefore
suggested that periodic population- and community-level
assessments of critical habitats be conducted to assure that
the selected WQCs are indeed protective of the environment. This would increase further knowledge about the
relationship between chemical, biological, and ecological
status in aquatic ecosystems.
Acknowledgments This research was supported by the fund of National Natural Science (no. 20977047), Major Science and Technology
Program for Water Pollution Control and Treatment of China (no.
2012ZX07506-001, 2012ZX07501-003-02), and the Environmental
Monitoring Research Foundation of Jiangsu Province (no. 1114). The
research was supported by a Discovery Grant from the Natural Science
and Engineering Research Council of Canada (project # 326415–07).
Prof. Giesy was supported by the Canada Research Chair program, an
at large Chair Professorship at the Department of Biology and Chemistry, and State Key Laboratory in Marine Pollution, City University of
Hong Kong, The Einstein Professor Program of the Chinese Academy
of Sciences.
References
Aldenberg T, Jaworska JS (2000) Uncertainty of the hazardous concentration and fraction affected for normal species sensitivity
distributions. Ecotoxicol Environ Saf 46(1):1–18. doi:10.1016/
S0025-326X(01)00327-7
Aldenberg T, Slob W (1993) Confidence-limits for hazardous concentrations based on logistically distributed noec toxicity data. Ecotoxicol Environ Saf 25(1):48–63. doi:10.1006/eesa.1993.1006
ANZECC, ARMCANZ (2000) Australian and New Zealand guidelines
for fresh and marine water quality. National Water Quality Management Strategy Paper No 4. ANZECC and ARMCANZ,
Canberra
Caldwell DJ, Mastrocco F, Hutchinson TH, Lange R, Heijerick D,
Janssen C, Anderson PD, Sumpter JP (2008) Derivation of an
aquatic predicted no-effect concentration for the synthetic hormone, 17 alpha-ethinyl estradiol. Environ Sci Technol 42
(19):7046–7054. doi:10.1021/es800633q
CCME (2007) A protocol for the derivation of water quality guidelines
for the protection of aquatic life. Canadian Council of Ministers of
the Environment, Winnipeg
Duboudin C, Ciffroy P, Magaud H (2004) Effects of data manipulation
and statistical methods on species sensitivity distributions. Environ
Toxicol Chem 23(2):489–499. doi:10.1897/03-159
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman
& Hall, New York
European Commission (2003) Technical guidance document on risk
assessment. Part II. European Commission, Joint Research Centre,
EUR 20418 EN/2
European Commission (2011) Common implementation strategy for
the Water Framework Directive (2000/60/EC). Guidance document no. 27. Technical guidance for deriving environmental quality standards. Technical Report 2011–055
Fisher DJ, Burton DT (2003) Comparison of two US environmental
protection agency species sensitivity distribution methods for
calculating ecological risk criteria. Hum Ecol Risk Assess 9
(3):675–690. doi:10.1080/713609961
Grist EPM, Leung KMY, Wheeler JR, Crane M (2002) Better bootstrap
estimation of hazardous concentration thresholds for aquatic
assemblages. Environ Toxicol Chem 21(7):1515–1524.
doi:10.1897/1551-5028(2002)
Grist EPM, O’Hagan A, Crane M, Sorokin N, Sims I, Whitehouse P
(2006) Bayesian and time-independent species sensitivity distributions for risk assessment of chemicals. Environ Sci Technol 40
(1):395–401. doi:10.1021/es050871e
Jagoe RH, Newman MC (1997) Bootstrap estimation of community
NOEC values. Ecotoxicology 6(5):293–306. doi:10.1023/
A:1018639113818
Newman MC, Ownby DR, Mezin LCA, Powell DC, Christensen
TRL, Lerberg SB, Anderson BA (2000) Applying speciessensitivity distributions in ecological risk assessment:
assumptions of distribution type and sufficient numbers of
species. Environ Toxicol Chem 19(2):508–515. doi:10.1897/
1551-5028(2000)019
Pennington DW (2003) Extrapolating ecotoxicological measures from
small data sets. Ecotoxicol Environ Safe 56(2):238–250.
doi:10.1016/S0147-6513(02)00089-1
Posthuma L, Suter GW II, Traas TP (2002) Species sensitivity distributions in ecotoxicology. Lewis Publishers, Boca Raton
Shao Q (2000) Estimation for hazardous concentrations based on NOEC
toxicity data: an alternative approach. Environmetrics 11(5):583–
595. doi:10.1002/1099-095X(200009/10)
Solomon KR, Baker DB, Richards RP, Dixon DR, Klaine SJ, LaPoint
TW, Kendall RJ, Weisskopf CP, Giddings JM, Giesy JP, Hall LW,
Williams WM (1996) Ecological risk assessment of atrazine in
North American surface waters. Environ Toxicol Chem 15(1):31–
74. doi:10.1897/1551-5028(1996)015
Spiess AN, Feig C, Ritz C (2008) Highly accurate sigmoidal fitting of
real-time PCR data by introducing a parameter for asymmetry.
BMC Bioinforma 9:221
Stephan CE, Mount DI, Hansen DJ, Gentile JH, Chapman GA, Brungs
WA (1985) Guidelines for deriving numerical national water quality
criteria for the protection of aquatic organisms and their uses. PB85227049. National Technical Information Service, Springfield
Suter GW II, Cormier SM (2008) What is meant by risk-based environmental quality criteria? Integr Environ Assess Manag 4(4):486–489.
doi:10.1897/IEAM_2008-017.1
TenBrook PL, Tjeerdema RS (2006) Methodology for derivation of
pesticide water quality criteria for the protection of aquatic life in
the Sacramento and San Joaquin river basins. Phase I: review of
existing methodologies. Final Report Prepared for the Central
Valley Regional Water Quality Control Board, Department of
Environmental Toxicology, University of California, Davis, USA
van Straalen NM (2002) Threshold models for species sensitivity distributions applied to aquatic risk assessment for zinc. Environ Toxicol
Pharmacol 11(3–4):167–172. doi:10.1016/S1382-6689(01)00114-4
Wang B, Yu G, Huang J, Hu HY (2008) Development of species
sensitivity distributions and estimation of HC5 of organochlorine
pesticides with five statistical approaches. Ecotoxicology 17
(8):716–724. doi:10.1007/s10646-008-0220-2
Wheeler JR, Grist EPM, Leung KMY, Morritt D, Crane M (2002)
Species sensitivity distributions: data and model choice. Mar Pollut
Bull 45(1–12):192–202. doi:10.1016/S0025-326X(01)00327-7