Environ Sci Pollut Res (2014) 21:159–167
DOI 10.1007/s11356-013-1462-y

ENVIRONMENTAL QUALITY BENCHMARKS FOR PROTECTING AQUATIC ECOSYSTEMS

A comparison of statistical methods for deriving freshwater quality criteria for the protection of aquatic organisms

Liqun Xing & Hongling Liu & Xiaowei Zhang & Markus Hecker & John P. Giesy & Hongxia Yu

Received: 6 August 2012 / Accepted: 2 January 2013 / Published online: 15 January 2013
© Springer-Verlag Berlin Heidelberg 2013

Responsible editor: Michael Matthies

L. Xing : H. Liu (*) : X. Zhang : H. Yu (*)
State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210046, China
e-mail: hlliu@nju.edu.cn
e-mail: yuhx@nju.edu.cn

M. Hecker
School of Environment and Sustainability and Toxicology Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

J. P. Giesy
Department of Veterinary Biomedical Sciences and Toxicology Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

J. P. Giesy
Department of Biology and Chemistry and State Key Laboratory for Marine Pollution, City University of Hong Kong, Kowloon, Hong Kong, SAR, China

Abstract Species sensitivity distributions (SSDs) are increasingly used in both ecological risk assessment and derivation of water quality criteria. However, there has been debate about the choice of an appropriate approach for deriving water quality criteria from SSDs because the various methods can generate different values. The objective of this study was to compare the results of the various methods. Data sets of acute toxicities of 12 substances to aquatic organisms, representing a range of classes with different modes of action, were studied. Nine typical statistical approaches, including parametric and nonparametric methods, were used to construct SSDs for the 12 chemicals. Water quality criteria, expressed as the hazardous concentration for 5 % of species (HC5), were derived with each approach. All approaches produced comparable results, and the values generated by the different approaches were significantly correlated. Variability among HC5 estimates based on all included species decreased with increasing sample size, and variability was similar among the statistical methods applied. Of the statistical methods selected, the bootstrap method was the best-fitting model for all chemicals, while log-triangle and Weibull were the best models among the parametric methods evaluated. The bootstrap method is the primary choice for deriving water quality criteria when sufficient data are available (more than 20 data points). If few data are available, all other methods should be applied, and the one that best describes the distribution of the data should be selected.

Keywords Species sensitivity distribution (SSD) . Hazardous concentration for 5 % of species (HC5) . Water quality criteria . Statistical methods

Introduction

Water quality criteria (WQC), environmental quality standards (EQS), and predicted no-effect concentrations (PNEC) play an important role in environmental management and pollution control (or risk management). These management standards can be derived by several methods. Among them, the species sensitivity distribution (SSD) methodology has been increasingly used in recent decades in ecological risk assessment procedures (Caldwell et al. 2008; Grist et al. 2006; Newman et al. 2000; Solomon et al. 1996) and for deriving WQC (ANZECC and ARMCANZ 2000; CCME 2007; EC 2003) or EQS (EC 2011). SSDs have some advantages compared to traditional quotient and assessment factor (AF) approaches (Table 1). Therefore, the SSD method has become the preferred method for deriving water quality criteria in most countries.
SSD approaches are more reasonable and objective than earlier methods used to derive WQC, such as the AF approach. The AF approach derives criteria by multiplying the lowest toxicity value from a minimum data set, which varies among jurisdictions, by a safety factor (TenBrook and Tjeerdema 2006). Because this approach is intended to be protective rather than predictive, it often results in substantially overprotective criteria. In contrast, the SSD method extrapolates from a set of available data, assuming that those data are a random sample of all species, to derive water quality criteria designed to protect some portion of the species in an ecosystem (Posthuma et al. 2002). Therefore, SSD approaches have become the preferred method for deriving water quality criteria in several countries, such as Canada, New Zealand, and Australia, as well as European countries.

Table 1 Comparison between SSD and AF

SSD | AF
Probabilistic method | Deterministic method
Uses the entire dataset (i.e., all taxa, so that the relative sensitivities of taxa can be examined) | Uses only toxicity data for the most sensitive species
Less uncertainty; more information is given (e.g., confidence interval) | High level of uncertainty, with no expression of that uncertainty
Data requirements are demanding (preferably 10–15) | At least one species; assessment factor is from 10 to 1,000 depending on available toxicity data
More robust approach | More variable method
Different protective levels can be derived | Only one protective level can be derived
When environmental exposure concentrations are available, quantitative risk can be described | When environmental exposure concentrations are available, qualitative risk can be described

The purpose of WQC, EQS, or PNEC for aquatic organisms is to protect a known portion of species in an aquatic environment. Usually, the protection goal is 95 % of the species in a given environment, which is achieved by calculating the hazardous concentration for 5 % (HC5) of all considered species based on fitted SSDs. In fact, water quality criteria are risk based: like risk assessments, they rest on an exposure–response model (Suter and Cormier 2008).

Several approaches for generating SSDs have been developed over the years. SSDs are frequently constructed by plotting logarithmically transformed toxicity threshold values (typically no observed effect concentrations [NOECs] or concentrations lethal to 50 % of the test organisms [LC50s]) for aquatic species with different sensitivities, generally including multiple trophic levels and a variety of taxa, against the percent rank (cumulative probability) of each value. Currently, several statistical approaches are used for this purpose, including parametric and nonparametric methods. Parametric methods assume a certain probability distribution for the available species toxicity dataset for a chemical, such as log-normal, log-logistic (Aldenberg and Jaworska 2000; Aldenberg and Slob 1993; Caldwell et al. 2008; Pennington 2003; van Straalen 2002), Burr type III (Shao 2000), and some other distributions (CCME 2007). Since 1985, the US Environmental Protection Agency (USEPA) has applied four formulas, based on a log-triangle distribution, to calculate final acute values (the US-FAV approach in this study), from which numeric water quality criteria are derived (Stephan et al. 1985). Nonparametric methods, which make no prior assumption about the form of the distribution and rely solely on the empirical probabilities of the data, have also been applied (Grist et al. 2002; Jagoe and Newman 1997) and developed (Wang et al. 2008). Generally, estimates derived by use of different methods are virtually identical (Wang et al. 2008), especially when estimating a “middle” centile such as the median.
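The four-formula US-FAV calculation mentioned above can be sketched as follows. This is a minimal illustration of the formulas published in Stephan et al. (1985), not the code used in this study; the function name `us_fav` and the example values are hypothetical. The sketch ranks the genus mean acute values (GMAVs), assigns each a cumulative probability P = R/(N + 1), and uses only the four values with P closest to 0.05.

```python
import math

def us_fav(gmavs, p_target=0.05):
    """Final acute value (FAV) from genus mean acute values, after Stephan et al. (1985)."""
    ranked = sorted(gmavs)
    n = len(ranked)
    if n < 4:
        raise ValueError("at least four values are needed")
    # Cumulative probability P = R / (N + 1) for rank R = 1..N
    pairs = [(v, r / (n + 1)) for r, v in enumerate(ranked, start=1)]
    # Keep the four values whose P is closest to 0.05
    # (ties resolve toward the lower rank because sorted() is stable)
    four = sorted(pairs, key=lambda vp: abs(vp[1] - p_target))[:4]
    ln_v = [math.log(v) for v, _ in four]
    sqrt_p = [math.sqrt(p) for _, p in four]
    sum_p = sum(p for _, p in four)
    # S^2 = [sum(ln GMAV)^2 terms - (sum ln GMAV)^2 / 4] / [sum P - (sum sqrt P)^2 / 4]
    s2 = (sum(x * x for x in ln_v) - sum(ln_v) ** 2 / 4) / (sum_p - sum(sqrt_p) ** 2 / 4)
    s = math.sqrt(s2)
    # L = [sum ln GMAV - S * sum sqrt P] / 4 ; A = S * sqrt(0.05) + L ; FAV = e^A
    l = (sum(ln_v) - s * sum(sqrt_p)) / 4
    a = s * math.sqrt(p_target) + l
    return math.exp(a)

# Hypothetical GMAVs in µg/L, for illustration only
example = [12.0, 18.5, 25.0, 40.2, 66.0, 90.1, 150.0, 310.0]
print(round(us_fav(example), 2))
```

Because only four values enter the fit, the result is insensitive to the rest of the data set, which is the property of the US-FAV approach that is criticized later in the paper.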
However, different methods can result in significantly different estimates when calculating centiles at the lower or upper tails of the distribution curve (e.g., HC5), especially when data are few. Effects of data quantity and quality on various statistical models used to derive SSDs have been studied (Wheeler et al. 2002). The results of that study indicated that both the size of the data set and the quality of the data have significant effects on the HC5 (Newman et al. 2000). Those authors suggested a minimum data requirement of 10 data points. The number of species sufficient for an SSD has been suggested to be a median of 30 (range, 15 to 55), which is greater than the sample sizes (number of species) required in recent regulatory documents: toxicity data from at least eight families belonging to three phyla for the USA (Stephan et al. 1985) and 10–15 long-term data from eight taxonomic groups for Europe (EC 2011). The main difference among the approaches to derive SSDs lies in the choice of the underlying distribution. Therefore, a systematic comparative study is needed to inform the selection of the most appropriate model to derive objective water quality criteria. In the present study, toxicity data sets for 12 chemicals (Table 2) with various modes of action were used to evaluate the effect of data quantity on the variability of the water quality criteria (HC5) derived with different statistical methods.
The relationship between different results derived from various SSD methods was discussed, and recommendations regarding preferable SSD models were made.

Table 2 Statistical summary of the HC5s (in microgram per liter) and its lower confidence limit (LHC5) for 95 % one-tail confidence intervals calculated by different approaches

0.66 0.54 0.9775 131.58 100.18 0.9564 −193.01 0.0759 137.79 103.96 0.9603 3.65 0.2415 −68.72 0.1608 61.99 0.62 0.49 0.9747 −158.76 0.0566 51.16 45.51 0.9885 −345.24 0.0310 2.23 0.84 0.72 0.9900 −234.33 0.0288 3.09 0.2407 AIC 42.51 RSE 0.3004 Weibull HC5 4.48 LHC5 4.15 R2 0.9911 AIC −384.98 RSE 0.0272 Burr type III HC5 4.96 LHC5 4.14 R2 0.9820 AIC −280.65 RSE 0.0489 Log-Gumbel HC5 5.13 LHC5 4.36 R2 0.9824 AIC RSE Log-triangle HC5 76.31 0.4616 40.92 33.78 0.9802 2.19 1.64 0.9692 0.23 −30.10 0.1785 0.21 0.11 0.9244 0.22 0.12 0.9235 13.80 0.2642 61.63 52.35 0.9858 −122.53 0.1150 1.22 0.39 DDT (n=56) 2.27 1.74 0.9640 −45.14 0.1836 Lead (n=85) 87.26 35.98 Copper (n=89) 3.47 1.36 US-FAV HC5 LHC5 Log-normal HC5 LHC5 R2 AIC RSE Log-logistic HC5 LHC5 R2 Chemicals
376.70 −0.58 0.2015 632.46 439.66 0.9650 596.99 424.11 0.9657 −30.05 0.0590 308.69 163.53 0.9452 −26.61 0.0681 11.19 0.3290 293.17 168.01 0.9527 341.23 183.28 0.9455 0.17 0.2079 429.36 240.37 Naphthalene (n=12)
10.85 −3.60 0.1855 14.85 12.52 0.9711 14.35 11.57 0.9681 −36.33 0.0576 8.91 6.58 0.9328 −28.86 0.0753 18.34 0.4061 9.44 5.57 0.9297 10.29 6.42 0.9300 3.32 0.2375 16.27 13.46 Fluoranthene (n=14)
4.07 11.69 0.3274 6.27 3.97 0.9088 5.95 3.71 0.9145 −20.82 0.0938 5.78 4.02 0.9618 −33.86 0.0568 10.61 0.3141 3.09 1.85 0.9574 3.66 2.39 0.9614 −4.50 0.1756 4.52 3.60 Aroclor (n=13)
0.80 −15.41 0.1821 1.74 1.37 0.9754 1.66 1.31 0.9751 −97.41 0.0545 2.20 1.99 0.9787 −114.81 0.0422 48.62 0.4668 0.71 0.50 0.9184 0.77 0.35 0.9104 14.08 0.2809 1.91 1.00 Dieldrin (n=34)
287.70 51.69 0.6291 503.87 195.20 0.6943 474.44 179.83 0.7028 −9.68 0.1844 1,673.12 720.05 0.9588 −66.92 0.0587 55.43 0.6780 219.48 149.75 0.8208 258.67 112.73 0.8154 28.77 0.3977 104.20 72.16 Benzenamine (n=25)
4,738.83 21.96 0.2762 7,536.39 6,191.92 0.9471 7,296.06 5,860.56 0.9423 −136.10 0.0864 2,348.69 2,010.22 0.9802 −238.73 0.0406 31.75 0.2968 3,399.69 2,901.55 0.9693 4,672.61 4,048.09 0.9796 −73.04 0.1374 7,585.94 3,220.13 Phenol (n=68)
0.047 55.06 0.4076 0.121 0.076 0.8813 0.111 0.067 0.8804 −62.93 0.1223 0.070 0.050 0.9618 −138.63 0.0565 59.19 0.4251 0.018 0.010 0.9351 0.043 0.026 0.9535 −12.22 0.2051 0.070 0.043 Chlorpyrifos (n=49)
14,878.48 4.70 0.2479 21,708.18 16,932.67 0.9498 20,904.73 16,090.75 0.9478 −35.99 0.0749 25,486.55 22,471.43 0.9670 −48.00 0.0526 25.76 0.4606 12,767.05 8,291.62 0.9123 14,007.77 8,493.02 0.9127 7.35 0.2681 27,737.76 22,825.43 Formalin (n=17)
3,151.30 17.32 0.3594 7,791.23 3,781.97 0.8927 7,008.76 3,278.04 0.8963 −24.62 0.1047 9,885.10 5,710.15 0.9346 −36.78 0.0732 18.50 0.3721 1,177.74 425.81 0.9409 2,678.87 1,245.14 0.9518 −3.28 0.1961 3,911.42 2,363.39 Ethanol (n=17)
0.20 0.079 349.30 99.25 9.20 5.44 2.98 0.72 0.82 0.35 737.10 27.11 3,414.56 2,221.33 0.018 0.0044 13,240.04 6,866.16 1,177.75 200.50

n number of available toxicity data; NA not available, due to small sample size (n<20); R2 determination coefficients; AIC Akaike information criterion; RSE residual standard error
Table 2 (continued)

2.11 1.13 Bootstrap regression based on log-logistic HC5 2.03 41.01 LHC5 1.12 21.66 NA NA NA NA 0.070 0.035 6,734.98 1,555.00 100.00 100.00 NA NA 1.22 0.17 83.89 66.61 NA NA NA NA 1.00 1.00 9,877.36 0.9156 −45.68 0.0563 0.032 0.9605 −173.24 0.0397 4,048.09 0.9808 −290.33 0.0278 132.56 0.8150 −48.69 0.0845 2.75 0.9645 −45.58 0.0362 0.13 0.9197 −158.56 0.0567 1.60 0.9539 −302.48 0.0433 LHC5 R2 AIC RSE Bootstrap HC5 LHC5 50.40 0.9840 −378.96 0.0254 215.29 0.9435 −36.29 0.0455 6.97 0.9296 −39.69 0.0511 0.36 0.9025 −88.81 0.0619 Formalin (n=17) Chlorpyrifos (n=49) Phenol (n=68) Benzenamine (n=25) Dieldrin (n=34) Aroclor (n=13) Fluoranthene (n=14) Naphthalene (n=12) DDT (n=56) Lead (n=85) Copper (n=89) Chemicals 1,567.13 0.9535 −56.39 0.0411 Ethanol (n=17)

Numbers in bold are the better two models, and the ones in italic are the best models among the parametric methods

Materials and methods

Data sources

Twelve substances representing a variety of substance groups with different modes of action were selected. These included metals such as copper (Cu, CAS no. 7440-50-8) and lead (Pb, CAS no. 7439-92-1), and organic compounds including dichlorodiphenyltrichloroethane (DDT, CAS no. 50-29-3), naphthalene (CAS no. 91-20-3), fluoranthene (CAS no. 206-44-0), Aroclor (CAS no. 12674-11-2), dieldrin (CAS no. 60-57-1), benzenamine (CAS no. 62-53-3), phenol (CAS no. 108-95-2), chlorpyrifos (CAS no. 2921-88-2), formalin (CAS no. 50-00-0), and ethanol (CAS no. 57158-54-0). The number of toxicity data per substance ranged from 12 to 89, and seven substances had more than 20 data points. Data on acute toxicity of these chemicals to freshwater species at different trophic levels were primarily extracted from USEPA's Aquatic Information Retrieval database (http://www.epa.gov/ecotox). The acceptable acute-effect endpoints were EC50 or LC50 based on mortality.
Selection criteria for acceptable data sets were set to durations of 96 and 48 h for fish and invertebrate (e.g., daphnia) toxicity tests, respectively (CCME 2007; Stephan et al. 1985). Because algae have a rapid cell division rate (reproduction), exposure times shorter than 24 h were considered (CCME 2007). All concentrations were converted to micrograms per liter.

Parametric methods

Seven parametric methods, namely US-FAV (based on the log-triangle distribution), log-normal, log-logistic, Weibull, Burr type III, log-Gumbel, and log-triangle models, were applied to derive criteria from the toxicity datasets for each of the 12 chemicals. These methods have been extensively described in the literature and are commonly applied in derivation of SSDs (CCME 2007; Pennington 2003; Shao 2000; Stephan et al. 1985; van Straalen 2002). Confidence intervals associated with the HC5 values were estimated by using the bootstrap method (Duboudin et al. 2004; Efron and Tibshirani 1993): the lower (5 %) bound of the 90 % confidence interval of the HC5 is reported as the LHC5.

Nonparametric bootstrap methods

Nonparametric approaches were proposed as an alternative to parametric models because in some cases, data do not follow any of the distributions assumed by the parametric models. A detailed description of the extrapolative process utilized by the nonparametric bootstrap method has been provided previously by a number of authors (Efron and Tibshirani 1993; Grist et al. 2002; Jagoe and Newman 1997; Wang et al. 2008). Briefly, a point estimate such as the HC5 is obtained from a large number of resamples (typically more than 2,000) drawn at random from the original sample with replacement. The associated confidence interval (e.g., 95 %) is then simply defined as the centile interval that contains that percentage of the values computed over the complete set of bootstrap samples generated.
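The parametric and nonparametric estimates described above can be illustrated with a minimal sketch. It assumes a log-normal SSD fitted by the sample mean and standard deviation of log10-transformed values (one of several possible fitting choices), an empirical 5th centile for the nonparametric estimate, and a centile bootstrap for the lower bound (LHC5). The function names, the seed, and the toxicity values are hypothetical and are not taken from this study, whose analyses were performed in R.

```python
import math
import random
from statistics import NormalDist, mean, median, quantiles, stdev

def hc5_lognormal(values):
    """HC5 from a log-normal SSD fitted by moments on log10 values."""
    logs = [math.log10(v) for v in values]
    return 10 ** NormalDist(mean(logs), stdev(logs)).inv_cdf(0.05)

def hc5_empirical(values):
    """Nonparametric HC5: empirical 5th centile of the sample."""
    # quantiles(..., n=20) returns 19 cut points; the first is the 5th centile
    return quantiles(values, n=20, method="inclusive")[0]

def bootstrap_hc5(values, estimator, n_boot=2000, seed=1):
    """Return (median bootstrap HC5, lower 5 % bound) from resampling with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        estimator(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lhc5 = estimates[int(0.05 * n_boot)]  # 5th centile of the bootstrap estimates
    return median(estimates), lhc5

# Hypothetical acute toxicity values in µg/L (24 species), for illustration only
toxicity = [3.1, 4.8, 7.5, 9.9, 12.0, 15.5, 21.0, 26.0, 33.0, 41.0, 55.0, 68.0,
            80.0, 97.0, 120.0, 160.0, 210.0, 290.0, 400.0, 560.0, 800.0, 1200.0,
            1900.0, 3500.0]

print("log-normal HC5:", round(hc5_lognormal(toxicity), 2))
print("bootstrap HC5, LHC5:", bootstrap_hc5(toxicity, hc5_empirical))
```

Passing `hc5_lognormal` instead of `hc5_empirical` to `bootstrap_hc5` gives a bootstrap confidence bound for the parametric estimate, in the spirit of the LHC5 values reported for the parametric methods.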
One limitation of the nonparametric bootstrap, however, is that it is data-demanding, requiring at least 20 data points for calculation of an HC5 (Duboudin et al. 2004).

Bootstrap regression method

To overcome the limitations of traditional bootstrap methods, a resampling regression approach was developed for limited data sets for which a nonstandard regression model might achieve a relatively good fit. In brief, the bootstrap regression approach combines the classic bootstrap method with a specified parametric distribution (Grist et al. 2002; Wang et al. 2008). This approach fits a specific parametric regression model (e.g., log-logistic) to the cumulative frequency distribution of each bootstrap sample generated by resampling the data, which results in a large set of bootstrap curves. From these curves, the 95 % confidence intervals are then calculated in the same way as bootstrap confidence intervals, by using the points on the curve replicates for each centile.

Comparison of the methods

HC5s estimated by the different statistical methods from all available toxicity values were assessed visually for goodness of “curve fit” at the lower centile tail and for the parity and conservativeness between HC5 and LHC5. Additionally, quantitative standards including determination coefficients (R2), Akaike information criterion (AIC), and residual standard error (RSE) were calculated to further assess the goodness of fit for the parametric methods (Spiess et al. 2008). It is well known that the goodness of fit improves with increasing number of model parameters at the cost of increased uncertainty. The AIC accounts for the number of parameters in the SSD models (Eq. 1). The lower the AIC value, the better the fit of the model.

$$\mathrm{AIC} = 2k + n \ln\left(\frac{\mathrm{rss}}{n}\right) \qquad (1)$$

where rss=residual sum of squares, k=number of parameters, and n=number of observations.

The variability of HC5 as a function of sample size, derived by the different parametric and nonparametric SSDs, was assessed by use of a random resampling approach (2,000 resamples drawn from a uniform distribution without replacement). Then, the HC5 was calculated using the different SSD fitting and bootstrapping methods for each resample (Wheeler et al. 2002), and the range of fluctuation was used to evaluate the goodness of fit and the variability of the different models. Pearson's correlation coefficients between different methods were calculated. All statistical analyses were performed using the statistical computing software R (R Development Core Team, http://www.r-project.org/).

Results

Estimates of HC5s and LHC5s derived by use of the various parametric and nonparametric methods differed somewhat but were all within the same order of magnitude; 67.7 % of the HC5s derived from the different statistical approaches were within a factor of 3.5 and were strongly correlated with each other (R>0.93), especially when the number of toxicity data was large enough (Tables 2 and 3). Except for benzenamine, all parametric approaches (except US-FAV, for which fit statistics are not defined because it uses only the four values with cumulative probabilities closest to 0.05) described the toxicity data well, based on coefficients of determination (R2>0.9) and RSE (RSE<0.5). Among these approaches, the Weibull (5 cases out of 12) and log-triangle (7 cases out of 12) were the best-fitting models for all the datasets as indicated by the quantitative standards AIC (−379 to 76) or RSE (RSE<0.1), followed by Burr type III (Table 2). The R2 of benzenamine was comparatively low (0.69–0.96), which was probably due to the discontinuous toxicity data: several crustacean species (10^2 μg/L) were much more sensitive than the other species (10^4 μg/L), which may be due to the mode of action of benzenamine. Furthermore, the ratios between the HC5 and LHC5 derived from parametric methods were all within fourfold, and for the majority of methods, less than a twofold difference between the HC5 and LHC5 was observed (Fig. 1). Confidence intervals around the average HC5 for the bootstrap and bootstrap regression methods were greater than those for the other approaches.

Fig. 1 Fold difference between HC5 and LHC5 by different methods. a US-FAV, b log-normal, c log-logistic, d Weibull, e Burr type III, f log-Gumbel, g log-triangle, h bootstrap, i bootstrap regression

Visually, parametric and nonparametric approaches all described the distribution of the toxicity data well (Fig. 2), with Cu and fluoranthene used as examples. However, the curves modeled by the different approaches differed slightly in goodness of fit, especially in the lower tail (less than the 40th centile). The nonparametric bootstrap method fitted the lower portion of the SSD curves best compared to all other approaches. Among the parametric approaches, the Burr type III and log-Gumbel methods were the best models for estimating HC5 values. The curve generated by the bootstrap regression based on a log-logistic model was similar to the one generated by the conventional log-logistic model. However, the combined approach resulted in a significantly improved fit in the lower tail of the curve (Table 3, Fig. 2).

Regardless of the approach used, variability of the calculated HC5 strongly depended on sample size and decreased significantly with increasing sample size (Fig. 3). Variability was very great for sample sizes of 15 or less, and changes became negligible at sample sizes greater than 15. The medians or means of the HC5 did not vary significantly for sample sizes greater than 10, and mean or median values of HC5 derived using different sample sizes for one chemical were almost all within a factor of 2.

Discussion

Fig. 2 Illustration of species sensitivity distribution for copper and fluoranthene derived from different methods.
Inset figures magnify the lower percentile region, and the x-axis shows the original concentrations. Data (superimposed open circles) are freshwater acute toxicity 50 % lethal concentration endpoints extracted from the AQUIRE database (http://www.epa.gov/ecotox)

The major advantage of using SSDs for performing ecotoxicological risk assessments and deriving WQC is that they use all available toxicity data (except US-FAV, which uses only the toxicity data closest to the 0.05 cumulative probability) and therefore are more predictive and have less uncertainty than classic approaches such as the AF (usually 10–1,000; detailed information is given in the TGD of EC 2003 and 2011) or PNEC methods, which rely only on data from a single or a few experiments and require application of (sometimes) large safety factors (Stephan et al. 1985). Currently, most countries rely on parametric methods, such as log-normal (CCME 2007; European Commission 2003), log-logistic (CCME 2007), and Burr type III (ANZECC and ARMCANZ 2000; CCME 2007) approaches to derive water quality criteria or to conduct ecotoxicological risk assessment.

Table 3 Correlation matrix of HC5s calculated by different approaches (Pearson correlation coefficient)

Method | US-FAV | Log-normal | Log-logistic | Weibull | Burr type III | Log-Gumbel | Log-triangle | Bootstrap | Bootstrap regression
US-FAV | 1.0000
Log-normal | 0.9974 | 1.0000
Log-logistic | 0.9986 | 0.9937 | 1.0000
Weibull | 0.9490 | 0.9479 | 0.9373 | 1.0000
Burr type III | 0.9815 | 0.9905 | 0.9709 | 0.9650 | 1.0000
Log-Gumbel | 0.9774 | 0.9872 | 0.9657 | 0.9671 | 0.9997 | 1.0000
Log-triangle | 0.9970 | 0.9997 | 0.9923 | 0.9556 | 0.9930 | 0.9902 | 1.0000
Bootstrap | 1.0000 | 0.9992 | 0.9988 | 0.7902* | 0.9988 | 0.9987 | 0.9990 | 1.0000
Bootstrap regression | 0.9969 | 0.9916 | 0.9992 | 0.9384 | 0.9679 | 0.9626 | 0.9902 | 0.9794 | 1.0000

*p value is 0.034; the other p values are all <0.001
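The coefficients in Table 3 are Pearson correlations between the HC5s produced by pairs of methods across chemicals. A minimal sketch of that calculation is given below; the function name `pearson_r` and the HC5 values are hypothetical, and whether the original analysis correlated raw or log-transformed HC5s is not stated, so raw values are assumed here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical HC5 estimates (µg/L) for five chemicals from two methods
hc5_method_a = [2.0, 40.0, 0.2, 300.0, 5000.0]
hc5_method_b = [2.5, 55.0, 0.3, 260.0, 7000.0]
print(round(pearson_r(hc5_method_a, hc5_method_b), 4))
```

Because HC5s span several orders of magnitude across chemicals, a correlation on raw values is dominated by the largest concentrations; correlating log-transformed values is a common alternative.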
Although, based on our results, parametric and nonparametric statistical methods all described SSDs well, the bootstrap approach was clearly the best-fitting model, which is in agreement with reports by other authors (Wang et al. 2008; Wheeler et al. 2002). To date, however, it has not been widely used in deriving WQC because of its more extensive data requirements (at least 20 data points to define the HC5). Some regulators are more comfortable with parametric methods because of their mathematical simplicity and limited data demand. In fact, several jurisdictions such as Canada (CCME 2007), the European Union (EC 2003, 2011), and Australia and New Zealand (ANZECC and ARMCANZ 2000) have derived their WQC by use of parametric approaches. Among these, the log-triangle, Weibull, log-Gumbel, and Burr type III models were superior to the other parametric methods, particularly for estimating HC values in the lower centile range such as the HC5; no single method, however, was consistently the best fit for all toxicity datasets. The US-FAV method has been criticized by some researchers because it uses only four genus mean acute values (those with cumulative probabilities closest to 0.05) in the dataset, ignoring all others in the final derivation of the FAV, and gives poor extrapolations for small datasets (Fisher and Burton 2003).

Fig. 3 Distribution of variation of HC5 with sample size. Data used are freshwater acute toxicity 50 % lethal concentration endpoints of lead and phenol extracted from the AQUIRE database (http://www.epa.gov/ecotox). Figure insets: the y-axis is on a log10 scale; the red points represent the mean of the HC5. Each sample size was evaluated with 2,000 resampling events using a bootstrap approach without replacement. The color density represents the frequency at each point

The HC5s calculated by the different approaches exhibited variability for sample sizes of 15 or less regardless of the method used. This is consistent with recommendations by Newman et al.
(2000) and Wheeler et al. (2002), who estimated that optimal sample sizes for HC5 calculations are 20 or more. Based on the findings of this study, reliable SSDs can be constructed with parametric methods when 20 or more data points are available, which is comparable to the data requirements of the nonparametric bootstrap method. However, regulatory minimum data requirements are usually 10 or fewer (e.g., OECD, at least 5 data points from 5 taxonomic groups; USEPA, at least 8 data points from 8 different families; and EU, at least 10 data points from 8 taxonomic groups). Water quality criteria depend not only on sample size but also on the group of aquatic species selected for protection. Species distribution and representativeness were not considered during the random resampling, and the variability at small sample sizes might bias the results relative to those of a formal derivation process. Nevertheless, a large number of random resamples can give insight into the fluctuation in derived water quality criteria. Uncertainties are inevitable in deriving water quality criteria and assessing risk because of limited knowledge about site-specific species, water quality characteristics, and mode of action, and because available toxicity data are limited. Based on the findings of this study, it can be concluded that for all chemicals tested here, the parametric and nonparametric techniques were able to describe the distribution of the data well. However, there were some differences in the ability of the different models to accurately fit the lower portion of the curve, resulting in somewhat variable estimates of HC5 values depending on the method used. In general, if sufficient data are available, the bootstrap method is the superior model and is recommended as the preferred method to describe SSDs. One difficult aspect of using probabilistic approaches is risk communication.
It is often difficult to explain to the public that the WQC is designed to allow a set percentage of taxa to be affected. For this reason, where possible, it is proposed to compare the WQC to the results (e.g., NOEC) obtained from community-level studies in microcosms or from field studies. The individual toxicity tests used to derive the HC5 (i.e., the 95 % protection level), for instance, give maximum effects, which are often modulated in the environment. Also, there is often functional redundancy in ecosystems such that functional replacement of affected taxa occurs. If one taxon is affected by a toxicant, another, more tolerant taxon may take over its function. Finally, effects can be transient, either because the toxicant dissipates over time or because populations develop tolerance or resistance to its effects. Also, WQC are developed for individual contaminants, while taxa can be exposed to multiple toxicants simultaneously. Effects of toxicants can be independent, additive, or less than additive, such that the mixture may be no more toxic than the most potent toxicant at its concentration, or they can interact to make a more or a less toxic mixture. In many cases, the toxicity of mixtures is in fact independent or less than additive. In these cases, the established WQC will be protective of ecosystem function. But because in some cases the effects may be additive or even more than additive, there may be situations where exposure to the mixture would result in unacceptable effects. For all of these reasons, choosing the HC5 does not necessarily mean that 5 % of species will be affected. It is therefore suggested that periodic population- and community-level assessments of critical habitats be conducted to ensure that the selected WQC are indeed protective of the environment.
This would further increase knowledge about the relationship among chemical, biological, and ecological status in aquatic ecosystems.

Acknowledgments This research was supported by the fund of National Natural Science (no. 20977047), the Major Science and Technology Program for Water Pollution Control and Treatment of China (no. 2012ZX07506-001, 2012ZX07501-003-02), and the Environmental Monitoring Research Foundation of Jiangsu Province (no. 1114). The research was also supported by a Discovery Grant from the Natural Science and Engineering Research Council of Canada (project # 326415–07). Prof. Giesy was supported by the Canada Research Chair program; an at-large Chair Professorship at the Department of Biology and Chemistry and State Key Laboratory in Marine Pollution, City University of Hong Kong; and the Einstein Professor Program of the Chinese Academy of Sciences.

References

Aldenberg T, Jaworska JS (2000) Uncertainty of the hazardous concentration and fraction affected for normal species sensitivity distributions. Ecotoxicol Environ Saf 46(1):1–18. doi:10.1016/S0025-326X(01)00327-7
Aldenberg T, Slob W (1993) Confidence limits for hazardous concentrations based on logistically distributed NOEC toxicity data. Ecotoxicol Environ Saf 25(1):48–63. doi:10.1006/eesa.1993.1006
ANZECC, ARMCANZ (2000) Australian and New Zealand guidelines for fresh and marine water quality. National Water Quality Management Strategy Paper No 4. ANZECC and ARMCANZ, Canberra
Caldwell DJ, Mastrocco F, Hutchinson TH, Lange R, Heijerick D, Janssen C, Anderson PD, Sumpter JP (2008) Derivation of an aquatic predicted no-effect concentration for the synthetic hormone, 17 alpha-ethinyl estradiol. Environ Sci Technol 42(19):7046–7054. doi:10.1021/es800633q
CCME (2007) A protocol for the derivation of water quality guidelines for the protection of aquatic life.
Canadian Council of Ministers of the Environment, Winnipeg
Duboudin C, Ciffroy P, Magaud H (2004) Effects of data manipulation and statistical methods on species sensitivity distributions. Environ Toxicol Chem 23(2):489–499. doi:10.1897/03-159
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, New York
European Commission (2003) Technical guidance document on risk assessment. Part II. European Commission, Joint Research Centre, EUR 20418 EN/2
European Commission (2011) Common implementation strategy for the Water Framework Directive (2000/60/EC). Guidance document no. 27. Technical guidance for deriving environmental quality standards. Technical Report 2011–055
Fisher DJ, Burton DT (2003) Comparison of two US environmental protection agency species sensitivity distribution methods for calculating ecological risk criteria. Hum Ecol Risk Assess 9(3):675–690. doi:10.1080/713609961
Grist EPM, Leung KMY, Wheeler JR, Crane M (2002) Better bootstrap estimation of hazardous concentration thresholds for aquatic assemblages. Environ Toxicol Chem 21(7):1515–1524. doi:10.1897/1551-5028(2002)
Grist EPM, O’Hagan A, Crane M, Sorokin N, Sims I, Whitehouse P (2006) Bayesian and time-independent species sensitivity distributions for risk assessment of chemicals. Environ Sci Technol 40(1):395–401. doi:10.1021/es050871e
Jagoe RH, Newman MC (1997) Bootstrap estimation of community NOEC values. Ecotoxicology 6(5):293–306. doi:10.1023/A:1018639113818
Newman MC, Ownby DR, Mezin LCA, Powell DC, Christensen TRL, Lerberg SB, Anderson BA (2000) Applying species-sensitivity distributions in ecological risk assessment: assumptions of distribution type and sufficient numbers of species. Environ Toxicol Chem 19(2):508–515. doi:10.1897/1551-5028(2000)019
Pennington DW (2003) Extrapolating ecotoxicological measures from small data sets. Ecotoxicol Environ Saf 56(2):238–250. doi:10.1016/S0147-6513(02)00089-1
Posthuma L, Suter GW II, Traas TP (2002) Species sensitivity distributions in ecotoxicology. Lewis Publishers, Boca Raton
Shao Q (2000) Estimation for hazardous concentrations based on NOEC toxicity data: an alternative approach. Environmetrics 11(5):583–595. doi:10.1002/1099-095X(200009/10)
Solomon KR, Baker DB, Richards RP, Dixon DR, Klaine SJ, LaPoint TW, Kendall RJ, Weisskopf CP, Giddings JM, Giesy JP, Hall LW, Williams WM (1996) Ecological risk assessment of atrazine in North American surface waters. Environ Toxicol Chem 15(1):31–74. doi:10.1897/1551-5028(1996)015
Spiess AN, Feig C, Ritz C (2008) Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry. BMC Bioinforma 9:221
Stephan CE, Mount DI, Hansen DJ, Gentile JH, Chapman GA, Brungs WA (1985) Guidelines for deriving numerical national water quality criteria for the protection of aquatic organisms and their uses. PB85227049. National Technical Information Service, Springfield
Suter GW II, Cormier SM (2008) What is meant by risk-based environmental quality criteria? Integr Environ Assess Manag 4(4):486–489. doi:10.1897/IEAM_2008-017.1
TenBrook PL, Tjeerdema RS (2006) Methodology for derivation of pesticide water quality criteria for the protection of aquatic life in the Sacramento and San Joaquin river basins. Phase I: review of existing methodologies. Final Report Prepared for the Central Valley Regional Water Quality Control Board, Department of Environmental Toxicology, University of California, Davis, USA
van Straalen NM (2002) Threshold models for species sensitivity distributions applied to aquatic risk assessment for zinc. Environ Toxicol Pharmacol 11(3–4):167–172. doi:10.1016/S1382-6689(01)00114-4
Wang B, Yu G, Huang J, Hu HY (2008) Development of species sensitivity distributions and estimation of HC5 of organochlorine pesticides with five statistical approaches. Ecotoxicology 17(8):716–724. doi:10.1007/s10646-008-0220-2
Wheeler JR, Grist EPM, Leung KMY, Morritt D, Crane M (2002) Species sensitivity distributions: data and model choice. Mar Pollut Bull 45(1–12):192–202. doi:10.1016/S0025-326X(01)00327-7