BASICS OF WET STATISTICS
SETAC Expert Advisory Panel: Performance Evaluation and Data Interpretation

GRAPH THE DATA
[Figure: response (% effect) vs. concentration (% effluent), showing raw data and treatment means]

ANALYZE DATA FOLLOWING THE EPA WET STATISTICAL FLOWCHARTS
• Hypothesis tests
 – NOAEC (acute)
 – NOEC (chronic)
• Point estimation
 – LC50 (acute)
 – EC25 or IC25 (chronic)

PURPOSE OF HYPOTHESIS TESTS AND BASIC CONSIDERATIONS
• Purpose: determine whether two responses are different
• Relevance of the initial (control) condition(s)
• Power of the statistical test

EFFECTS ASSOCIATED WITH THE NOEC IN FATHEAD MINNOW GROWTH DATA
[Figure: effect at the NOEC (%) plotted by test number]

EPA HYPOTHESIS TEST FLOWCHART (MULTIPLE CONCENTRATIONS)
• Test the assumptions of ANOVA
 – Transform data if necessary
 – Normally distributed data: Shapiro-Wilk test
 – Equal variances: Bartlett's test
• Select the appropriate test
 – Parametric tests: assumptions met
 – Non-parametric tests: assumptions NOT met

MULTIPLE CONCENTRATION PARAMETRIC TESTS
• Dunnett's test
 – Equal number of replicates in each treatment
• Multiple t-tests with Bonferroni adjustment
 – Unequal number of replicates in each treatment

MULTIPLE CONCENTRATION NON-PARAMETRIC TESTS
• Steel's many-one rank test
 – Equal number of replicates in each treatment
• Wilcoxon rank sum test
 – Unequal number of replicates in each treatment

PASS/FAIL TESTS
• Compare the control and the critical concentration (IWC, the instream waste concentration)
• Test assumptions
 – Transformation: arc sine square root
 – Normality: Shapiro-Wilk test
 – Homogeneity of variance: F-test
• Test for a statistical difference
 – Normal, homogeneous variances: t-test
 – Non-normal: Wilcoxon rank sum test
 – Normal, heterogeneous variances: modified t-test

PURPOSE OF POINT ESTIMATION AND BASIC CONSIDERATIONS
• Describe the relationship between two parameters
• Selection of a significant response
• Elucidation of the relationship
• Confidence in the relationship
[Figure: schematic relationship between two parameters]

EPA POINT-ESTIMATE METHOD SELECTION
• Binomial data
 – Probit
 – Spearman-Karber (untrimmed or trimmed)
 – Graphical
• Continuous data
 – ICp (linear interpolation)

PROBIT ANALYSIS
• Binomial data only (two outcomes)
 – Dead or alive, normal/abnormal, etc.
• Assumes normally distributed tolerances
• Adjusted for control mortality
 – Abbott's correction
• Requires at least two partial mortalities
• Requires a sufficient fit
 – Chi-square test for heterogeneity
• Designed to estimate the LC50/EC50 and its confidence interval

SPEARMAN-KARBER
• Nonparametric model
• Monotonic concentration response
 – Smoothing
• Adjusted for control mortality
• Requires zero response in the lowest concentration
• Requires 100% response in the highest concentration
• Calculates the LC50/EC50
• Confidence interval calculation requires at least one partial response

TRIMMED SPEARMAN-KARBER
• Same basic procedure as Spearman-Karber
• Requires at least 50% mortality in one concentration
• Trimming is used when the zero and/or 100% response requirements of the Spearman-Karber method are not met

GRAPHICAL METHOD
• Specifics
 – Nonparametric procedure
 – Adjusted for control mortality
 – Monotonic concentration response (smoothing)
 – Linear interpolation of an "all or nothing" response
 – Calculates the LC50/EC50, but no confidence intervals

INHIBITION CONCENTRATION (ICp)
• Specifics
 – Nonparametric procedure
 – Calculates any effect level (p)
 – Monotonic concentration response (smoothing)
 – Assumes random, independent, and representative data
 – Piecewise linear interpolation
 – Bootstrapped confidence intervals

SOFTWARE PROGRAMS
• Many software packages/programs are available
• DO NOT assume they follow the EPA-recommended analysis
• DO verify the software by running the example datasets from the methods manuals

DO THE RESULTS MAKE SENSE???
[Figure: percent effect vs. concentration (% effluent), showing raw data, means, probit fit, % MSD, and EC25]

TOXIC UNITS IN WET TESTS
• Goals
 1) Standardize the results of toxicity tests to simulate chemical-specific criteria
 2) Create a reporting value that increases with sample toxicity
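The ICp procedure listed above has two parts: smoothing the means into a monotonic response, then piecewise linear interpolation. Both can be sketched in a few lines. This is a minimal Python illustration, not EPA's software. It assumes the first entry is the control and that larger responses are better (e.g., growth). It also assumes the manuals' smoothing rule is equivalent to pool-adjacent-violators averaging of the treatment means. The function names and data are hypothetical, and the bootstrapped confidence interval is omitted.

```python
from typing import List, Optional, Sequence


def smooth_nonincreasing(means: Sequence[float]) -> List[float]:
    """Average adjacent treatment means until they never increase with concentration.

    Each concentration carries equal weight, so a pooled block is replaced by
    the simple average of the means it contains (pool-adjacent-violators).
    """
    blocks: List[List[float]] = []  # each entry: [sum of pooled means, number pooled]
    for m in means:
        blocks.append([float(m), 1.0])
        # Merge backwards while a block's average is below the next block's average.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    smoothed: List[float] = []
    for total, count in blocks:
        smoothed.extend([total / count] * int(count))
    return smoothed


def icp(concs: Sequence[float], means: Sequence[float], p: float = 25.0) -> Optional[float]:
    """Interpolated concentration causing a p % reduction from the smoothed control mean.

    concs[0] must be the control. Returns None when the smoothed responses never
    fall below the target, i.e. the ICp is above the highest concentration tested.
    """
    if len(concs) != len(means) or len(concs) < 2:
        raise ValueError("need one mean per concentration, control first")
    m = smooth_nonincreasing(means)
    target = m[0] * (1.0 - p / 100.0)  # response corresponding to a p % reduction
    for j in range(len(m) - 1):
        if m[j] >= target > m[j + 1]:
            # Linear interpolation between concentrations j and j + 1.
            return concs[j] + (target - m[j]) * (concs[j + 1] - concs[j]) / (m[j + 1] - m[j])
    return None


# Hypothetical growth data: the control (10.0) is below the first concentration
# (10.5), so smoothing averages those two means before interpolating.
concentrations = [0, 6.25, 12.5, 25, 50, 100]
mean_growth = [10.0, 10.5, 9.5, 8.0, 5.0, 1.0]
print(round(icp(concentrations, mean_growth, p=25), 2))  # 27.6
```

As the software slide advises, check any implementation like this against the example datasets in the methods manuals before relying on it.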
DEFINITIONS OF TU VALUES
• Acute: TUa = 100/LC50
• Chronic: TUc = 100/NOEC or TUc = 100/IC25
 – where the NOEC is determined by hypothesis testing

SUMMARY OF THE ANALYSIS OF WET DATA
• STEP 1: Graph the data
• STEP 2: Analyze the data by EPA methods
• STEP 3: Do the results make sense?

ANALYSIS OF MULTIPLE CONTROL TOXICITY TESTS

WHAT IS A CONTROL SAMPLE?
• A treatment in a toxicity test that duplicates all the conditions of the exposure treatments but contains no test material. The control is used to determine the absence of toxicity of basic test conditions (e.g., health of test organisms, quality of dilution water). (Rand and Petrocelli, 1985)

WHAT IS A REFERENCE SAMPLE?
• "A reference sample is the 'control' by which to gauge the instream effects of a discharge at a particular site." (Grothe et al., 1996)
 – Site-specific
 – Ecoregional

WHEN ARE MULTIPLE CONTROLS USED?
• When manipulations are made to SOME of the test concentrations or treatments
• To compare "standard" and "alternative" methods
• When testing control and/or reference samples of unknown quality
• When a test sample has physico-chemical properties significantly different from the water in which the surrogate test organisms were cultured
• TIEs (Toxicity Identification Evaluations)

WHEN ARE MULTIPLE CONTROLS USED? Example #1
• When manipulations are made to SOME of the test concentrations or treatments

BRINE ADDITION IN MARINE TESTS
Concentration        Effluent volume  Brine volume     Seawater volume  Salinity
                     (0 ppt)          (68 ppt)         (34 ppt)
Seawater control     0 ml             0 ml             1000 ml          34 ppt
1.25 %               12.5 ml          0 ml             987.5 ml         34 ppt
2.5 %                25 ml            0 ml             975 ml           33 ppt
5 % (IWC)            50 ml            0 ml             950 ml           32 ppt
10 %                 100 ml           100 ml           800 ml           34 ppt
20 %                 200 ml           200 ml           600 ml           34 ppt
Brine control        0 ml             200 ml + 200 ml  600 ml           34 ppt

ANALYSIS OF TWO-CONTROL TOXICITY TESTS WHEN SOME CONCENTRATIONS WERE MANIPULATED
• Both controls valid?
 – Yes: is the t-test between the two controls non-significant?
   • Yes: pool the controls and analyze all data using the EPA flowcharts
   • No: analyze the IWC, the like-treated concentrations, and their matching control using the EPA flowcharts
 – No: is the control treated like the IWC valid?
   • Yes: analyze the IWC, the like-treated concentrations, and that control using the EPA flowcharts
   • No: repeat the test

WHEN ARE MULTIPLE CONTROLS USED? Example #2
• To compare "standard" and "alternative" methods
• To determine treatment effects

EFFECT OF KELP STORAGE ON SENSITIVITY TO COPPER
[Figure: EC1, EC5, EC10, EC15, and EC25 copper concentrations (ppb) for fresh vs. stored kelp, and the change in EC value (stored minus fresh; ppb Cu) at each effect level]

WHEN ARE MULTIPLE CONTROLS USED? Example #3
• When testing control and/or reference samples of unknown quality
 – Use of a reference not previously tested (ambient)
 – Quality of the reference may vary from season to season (ambient)
 – When the potential exists for a sample to be impacted or impaired

EFFECT OF A NON-POINT DISCHARGE ON AN INSTREAM DILUTION WATER
[Figure: C. dubia control survival (%) in the lab control and in upstream water for tests in Apr-96, May-96, Jun-96, Aug-96, and Dec-96]

WHEN ARE MULTIPLE CONTROLS USED? Example #4
• When a test sample has physico-chemical properties significantly different from the water in which the surrogate test organisms were cultured
 – As a natural phenomenon
 – Due to sample manipulation

WHEN ARE MULTIPLE CONTROLS USED? Example #5
• TIEs (Toxicity Identification Evaluations)
 – The methods require multiple controls, called "blanks," which are exact manipulations of the dilution water
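The two-control flowchart above can also be written as a small decision function. Written that way, it is easy to check that every path ends in an action. This is a hypothetical Python sketch with illustrative names. A "valid" control is assumed to mean one that met the test acceptability criteria. The t-test and the EPA-flowchart analyses themselves are not implemented.

```python
from typing import Optional

# Outcomes of the two-control flowchart (wording condensed from the slide).
POOL = "pool controls and analyze all data with the EPA flowcharts"
MATCHED = "analyze the IWC and like-treated concentrations against their matching control"
REPEAT = "repeat the test"


def two_control_decision(both_controls_valid: bool,
                         iwc_matched_control_valid: bool,
                         controls_differ: Optional[bool] = None) -> str:
    """Walk the two-control flowchart for tests in which only some concentrations were manipulated.

    both_controls_valid: both controls met the test acceptability criteria.
    iwc_matched_control_valid: the control handled like the IWC met the criteria.
    controls_differ: result of the t-test between the two controls
        (True = significant); only consulted when both controls are valid.
    """
    if both_controls_valid:
        if controls_differ is None:
            raise ValueError("run the t-test between the two controls first")
        return MATCHED if controls_differ else POOL
    if iwc_matched_control_valid:
        return MATCHED
    return REPEAT


# Brine example: both controls valid and the control t-test is non-significant.
print(two_control_decision(True, True, controls_differ=False))
```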
TAKE HOME POINTS
• Multiple negative controls are a good idea when:
 – Using a new reference or control sample
 – Performing any sample manipulations
 – Comparing "standard" vs. "alternative" methods
• Multiple positive controls (e.g., reference toxicant tests) should be used when:
 – Using multiple organisms with different sensitivities

REFERENCES
• U.S. EPA. 1994. Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms. EPA/600/4-91/002. July 1994.
• U.S. EPA. 1993. Methods for Measuring the Acute Toxicity of Effluents and Receiving Waters to Freshwater and Marine Organisms. EPA/600/4-90/027F. August 1993.
 – These manuals have recommendations for multiple controls under certain conditions.
• U.S. EPA. 1991. Methods for Aquatic Toxicity Identification Evaluations: Phase I Toxicity Characterization Procedures. EPA/600/6-91/003. February 1991.
 – Has recommendations for multiple controls ("blanks").
• Grothe et al. (eds.). 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. SETAC Press, Pensacola, FL, USA.

SUSPICIOUS DATA AND OUTLIER DETECTION

CONCERNS
• Outliers make interpretation of WET data difficult by:
 – Increasing the variability in test responses
 – Biasing mean responses

IDENTIFYING OUTLIERS
• Graph the raw data, means, and residuals
[Figure: proportion alive (raw data and means) vs. copper concentration (ppb), and residuals (predicted - observed) vs. copper concentration (ppb)]

IDENTIFYING OUTLIERS
• Formal statistical test: Chauvenet's criterion
 – Using the previous mysid data, the critical values are:
   • Mean = 0.80, Std. Dev. = 0.302, n = 8
 – Chauvenet's criterion value = n/2 = 4 (%)
 – Z-score = 2.054 (two-tailed probability of 4 %)
 – The calculations are:
   • Equation 1: (Z-score)(Std. Dev.) = (2.054)(0.302) = 0.620
   • Mean ± Equation 1 = 0.80 ± 0.620 = 1.42 and 0.18
 – Outlier range is > 1.42 or < 0.18
   • A value of 0.2 is not an outlier

CAN A CAUSE BE ASSIGNED TO THE OUTLIER(S)?
• Review the analyst's daily observations
• Check the water chemistry data
• Check data entry
• Check calculations
• If a cause can be assigned to the outlier, reanalyze the data without it

DETERMINE EFFECT ON TEST INTERPRETATION
• Keep all data unless a cause is found
• Analyze the data with and without the suspect data
• Determine the effect of the suspect data on test interpretation
• The results reported will depend on the effect of the outlier(s) on test interpretation, best professional judgement, and discussions with the regulatory agency

REPORTING OF RESULTS
• Insignificant effect of the outlier
 – With outlier: IC25 = 131 (96.9-158) ppb; NOEC = 100 ppb; % MSD = 28.1 %
 – Without outlier: IC25 = 124 (93.6-152) ppb; NOEC = 100 ppb; % MSD = 20.9 %
 – Report results with the suspect data included
• Significant effect of the outlier
 – With outlier: IC25 = 131 (96.9-158) ppb; NOEC = 100 ppb; % MSD = 28.1 %
 – Without outlier: IC25 = 106 (83.8-126) ppb; NOEC = 50 ppb; % MSD = 12.2 %
 – Report results from both analyses

CONCENTRATION-RESPONSE CURVES IN WET TESTS

NON-MONOTONICITY VS. HORMESIS
• Hormesis is a toxicological response to a single toxicant characterized by stimulation at low concentrations and inhibition at higher concentrations.
• Non-monotonicity is a relationship in which a smaller response (e.g., mortality) is observed at the higher of two consecutive concentrations.

TYPICAL TRAITS OF HORMESIS (Calabrese and Baldwin, 1998)
• Hormetic concentration range: 10x
• Magnitude of hormetic stimulation: maximum stimulation of 30-60 %
• Range from maximum stimulation to the NOEL (NOEC): 4-5x
[Figure: schematic response vs. concentration, marking the maximum stimulation, the hormetic range, and the NOEL]

WHY IS HORMESIS DIFFICULT TO DETECT IN TOXICITY TESTS?
[Figure: two schematic response vs. concentration panels, a well-defined hormetic response and a poorly defined "hormetic" response]
• Inadequate concentration series
• Inadequate description of the concentration response
• Inadequate statistical power
• Hormesis is not the cause

EFFECTS OF NON-MONOTONIC DATA: NOEC > LOEC
• Limited replicates (4)
• High control and low-concentration variability
• High statistical power
• NOEC > LOEC
[Figure: sea urchin fertilization data, percent fertilized vs. percent effluent (0-6 %), with a statistically significant reduction marked. NOEC = 6.0 %, LOEC = 0.36 %, % MSD = 5.82 %, IC25 > 6.0 %]

EFFECTS OF NON-MONOTONIC DATA: HETEROGENEITY IN PROBIT ANALYSIS
• Limited replicates (5)
• High control and low-concentration variability
• Significant chi-square for heterogeneity
• Inflated confidence intervals
• Reanalyze with nonparametric models
[Figure: proportion responding vs. dose (ppb, log scale, 1-10,000)]

EFFECTS OF NON-MONOTONIC DATA: SMOOTHING IN ICp ANALYSIS
[Figure: Selenastrum cell growth data, actual and smoothed response (% of control) vs. percent effluent (0-100 %)]
• Smoothing is used in all non-parametric models
• The smoothing procedure averages treatment responses
• Smoothing increases the estimated toxicity

REMEDIES FOR PROBLEMS ASSOCIATED WITH NON-MONOTONIC DATA
• Better selection of the concentration series
• Increase the number of replicates
• % MSD limits (NOECs)
• Use of more robust parametric models (Bailer and Oris, 1997; Kerr and Meador, 1996; Baird et al., 1996)
• Concentration-response curve criterion

CONFIRMATION OF A CONCENTRATION-RESPONSE CURVE
• Graphical analysis
• Linear regression analysis
• Correlation analysis

GRAPHIC ANALYSIS OF CONCENTRATION-RESPONSE CURVES
[Figure: response (% effect) vs. concentration (% effluent) with raw data, means, and % MSD; panels "Concentration-response curve absent" and "Concentration-response curve present"]
[Figure: same axes; ambiguous case labeled "Concentration-response curve present???"]

LINEAR REGRESSION ANALYSIS OF CONCENTRATION-RESPONSE CURVES
[Figure: response (% effect) vs. concentration (% effluent) with raw data, means, probit fit, and % MSD. Curve absent: negative slope not significantly different from zero. Curve present: positive slope significantly different from zero]
[Figure: ambiguous case, "Concentration-response curve present???": positive slope not significantly different from zero]

CORRELATION ANALYSIS OF CONCENTRATION-RESPONSE CURVES
[Figure: response (% effect) vs. concentration (% effluent) with raw data, means, and % MSD. Curve present: significant negative correlation (r = -0.965, P < 0.001). Curve absent: insignificant correlation (r = -0.0931, P = 0.593)]
[Figure: ambiguous case, "Concentration-response curve present???": significant negative correlation (r = -0.389, P = 0.021)]

SUMMARY
• Identification of a significant C-R curve is an important QA check
• Graphical analysis is simple but subjective
• Linear regression analysis is objective and conservative but requires parametric analysis
• Correlation analysis is objective and liberal, and non-parametric methods are available
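The correlation check can be made fully non-parametric by using Spearman's rank correlation with a permutation P-value, which avoids assuming normality. This is a hypothetical Python sketch for illustration. The slides do not state which coefficient produced their r and P values, and the function names and data here are assumptions.

```python
import random
from statistics import mean
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the ranks (handles ties)."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm: int = 10000, seed: int = 1) -> float:
    """Two-sided permutation P-value: shuffle responses across concentrations."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps P above zero


# Hypothetical replicate data: % effect at each % effluent concentration.
conc = [0, 0, 0, 1, 1, 1, 2, 2, 2, 4, 4, 4]
effect = [2, 5, 1, 10, 12, 8, 25, 30, 22, 60, 55, 70]
print(round(spearman(conc, effect), 3), permutation_p(conc, effect, n_perm=2000))
```

As the slide examples show, a significant but weak correlation should still be checked against the graph.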
BIOLOGICAL INTERFERENCE IN FATHEAD MINNOW CHRONIC TESTS

TOXICITY CHARACTERISTICS
• Seasonal (cold months)
• Affects only fathead minnows
• High variability
• Poor dose response
• Fungus-like growth
[Photos: normal gills and pharynx; bacterial clogging]

% SURVIVAL ON DAY OF TEST
Rep   Day 3   Day 4   Day 7
1     100     13      0
2     100     25      0
3     100     100     100
4     100     88      88
5     100     50      13

STERILIZATION
[Figure: % survival in 25 %, 50 %, and 100 % effluent, untreated vs. treated by autoclaving, UV light, pasteurization, or antibiotic]

ANTIBIOTIC ADDITION
[Figure: % survival in the "Rec control" and in 32, 42, 56, 75, and 100 % effluent, baseline vs. antibiotic added to the diluent only]
[Figure: the same comparison with antibiotic added to both the diluent and the effluent]

EFFECT OF ISOLATION
[Figure: % alive since the previous day vs. day of test (1-6), with sick fish removed vs. dead fish removed]

CONCLUSION
• "Toxicity" due to a naturally occurring pathogen
• Best viewed as a kind of interference

CONTROLLING BIOLOGICAL INTERFERENCE
• Heat
• Filtration (0.2 µm)
• UV light
• Antibiotics

HEAT
Advantages:
• Simple; no specialized equipment
Disadvantages:
• May be more "intrusive" (e.g., removal of volatile components)
• Must re-aerate the sample

FILTRATION (0.2 µm)
Advantages:
• Usually very effective
Disadvantages:
• Impractical with high suspended solids
• Requires specialized equipment for filtering large volumes
• May remove particulate-bound contaminants

UV LIGHT
Advantages:
• Usually very effective
• Uses common equipment
Disadvantages:
• Less effective with high suspended solids or stained water
• May degrade organic contaminants or enhance organic toxicity (e.g., PAHs)

ANTIBIOTICS
Advantages:
• Usually very effective
• Chemicals are inexpensive and widely available
• Easy to treat large volumes
Disadvantages:
• May require determination of the proper dose

SUMMARY
• Chronic WET tests using fathead minnows may show evidence of interference due to pathogens
• Interference = high variability and a poorly defined dose response
• Most common with surface waters
• Control measures = sample treatment to kill or remove pathogens

STATISTICAL AND BIOLOGICAL SIGNIFICANCE

TOXIC VS. NON-TOXIC
• WET tests were developed to identify toxic samples
• Two methods are used
 – Hypothesis testing: statistical difference
 – Point estimation: standard level of effect

TOXICITY ASSUMPTIONS OF HYPOTHESIS TESTING
• Non-toxic = no statistical difference between the control and critical concentration responses
• Toxic = statistical difference between the control and critical concentration responses

TOXICITY ASSUMPTIONS OF POINT ESTIMATION
• A preselected level of effect is considered toxic
 – Acute test: 50 % effect
 – Chronic test: 25 % effect
• Toxic = ECx/ICx is less than the critical concentration (IWC)
• Non-toxic = ECx/ICx is equal to or greater than the critical concentration (IWC)

BOTH APPROACHES HAVE STRENGTHS AND LIMITATIONS
• Complete discussion in:
 – Grothe et al. (eds.). 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. SETAC Press, Pensacola, FL, USA.
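The two toxicity decision rules above can be sketched together with the toxic-unit definitions given earlier (TUa = 100/LC50; TUc = 100/NOEC or 100/IC25). This is a hypothetical Python illustration. The hypothesis-test branch uses the equal-variance (Student) t statistic from the pass/fail flow. The critical t value must be supplied by the user and is assumed here to be one-tailed at α = 0.05. The data are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence


def pooled_t(control: Sequence[float], iwc: Sequence[float]) -> float:
    """Student's two-sample t statistic (equal variances): control mean minus IWC mean."""
    n1, n2 = len(control), len(iwc)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two replicates per treatment")
    sp2 = ((n1 - 1) * variance(control) + (n2 - 1) * variance(iwc)) / (n1 + n2 - 2)
    if sp2 == 0:
        raise ValueError("pooled variance is zero; the t statistic is undefined")
    return (mean(control) - mean(iwc)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def toxic_by_hypothesis_test(control, iwc, t_critical: float) -> bool:
    """Toxic if the IWC response is significantly below the control.

    t_critical is a table value for df = n1 + n2 - 2 (one-tailed here).
    """
    return pooled_t(control, iwc) > t_critical


def toxic_by_point_estimate(ecx_pct: float, iwc_pct: float) -> bool:
    """Toxic if the ECx/ICx (in % effluent) is less than the IWC."""
    return ecx_pct < iwc_pct


def toxic_units(endpoint_pct: float) -> float:
    """TU = 100 / endpoint, with the LC50, NOEC, or IC25 expressed in % effluent."""
    if endpoint_pct <= 0:
        raise ValueError("endpoint must be a positive % effluent")
    return 100.0 / endpoint_pct


# Hypothetical chronic growth data (mg per organism), 4 replicates each.
control = [0.30, 0.31, 0.29, 0.30]
at_iwc = [0.27, 0.26, 0.28, 0.27]
print(toxic_by_hypothesis_test(control, at_iwc, t_critical=1.943))  # df = 6
print(toxic_by_point_estimate(ecx_pct=30.0, iwc_pct=25.0), toxic_units(30.0))
```

Because TU = 100/ECx, the point-estimate rule "ECx < IWC" is the same as "TU > 100/IWC". For example, an IC25 of 30 % effluent (about 3.3 TUc) is non-toxic at an IWC of 25 % (4 TU).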
STRENGTHS AND LIMITATIONS OF HYPOTHESIS TESTS
• Strengths
 – Suited for comparison of treatments
 – Simple to calculate (no modeling)
 – Not model dependent
• Limitations
 – The NOEC depends on the concentrations tested
 – Variability reduces statistical power and increases the effect size needed for significance
 – No confidence intervals
 – Results are independent of the concentration-response curve

STRENGTHS AND LIMITATIONS OF POINT ESTIMATES
• Strengths
 – Uses the concentration-response curve
 – Not limited to the tested concentrations
 – Confidence intervals
• Limitations
 – Selection of the effect level
 – Accuracy depends on partial responses
 – Model dependent
 – More difficult computations

WHICH METHOD IS BEST?
• Both approaches are supported by the TSD and the methods manuals
• Depends on the purpose of the WET test
 – Hypothesis test: identify a statistical difference from the control response
 – Point estimate: concentration that shows a standard effect

TOXIC MAY NOT = ECOLOGICAL IMPACT
• Hypersensitive hypothesis tests
• Relatively sensitive test species
• Inconsistent exposure parameters between the toxicity test and the receiving water
 – Magnitude, duration, and frequency of exposure
 – Water chemistry
• Population/community structure dynamics

NONTOXIC MAY NOT = NO ECOLOGICAL IMPACT
• Hyposensitive hypothesis tests
• High effect level in point estimates
• Relatively insensitive test species
• Inconsistent exposure parameters between the toxicity test and the receiving water
 – Magnitude, duration, and frequency of exposure
 – Water chemistry
• Undetected biological effects
• Population/community structure dynamics

WHAT CONCLUSIONS CAN BE MADE?
• The sample is toxic/non-toxic as defined by the WET program
• The biological impact was significant/insignificant in the beaker
• The receiving water may or may not become impacted

WAYS TO INCREASE THE ECOLOGICAL RELEVANCE
• Identification of the toxic agent(s)
• Consider the use of indigenous species in toxicity tests
• Consider exposure parameters found in the receiving water
 – Magnitude, duration, and frequency of exposure
 – Water chemistry
 – Ambient water tests
• In situ bioassays
• Detection and study of other biological effects
• Comprehensive study of population/community structure dynamics in the receiving water
• Further studies in a variety of ecosystems that examine the relationship between WET tests and ecological impact

COST OF "ECOLOGICALLY RELEVANT" WET TESTS
• Very expensive
 – Methods research and development
 – Receiving water characterization
 – Field bioassessments
• Loss of comparability
• Increase in the complexity of water quality standards and interpretation

SUMMARY
• WET tests were developed to identify toxic and non-toxic samples
• WET tests are useful in conjunction with chemical and field assessment data to protect aquatic ecosystems
• Adapting WET tests to be ecologically relevant can be helpful but comes at a cost

FALSE POSITIVES, FALSE NEGATIVES

GUIDING PRINCIPLE = REPEATABILITY
• Repeatable test results are taken as "true," "real," or "correct."

FALSE POSITIVES/NEGATIVES IN THE CONTEXT OF WET TESTS
• Depends on the presumed function of WET tests:
 – WET test as a "predictor" of instream effects
 – WET test as a "detector" of toxic amounts of toxic chemicals

WET TEST AS "PREDICTOR" OF INSTREAM EFFECTS
• False positive = false indication of instream effects
• False negative = failure to indicate instream effects

WET TEST AS "DETECTOR" OF TOXIC AMOUNTS OF TOXIC CHEMICALS
• False positive = false indication of the presence of toxic amounts of toxic chemicals
• False negative = failure to indicate the presence of toxic amounts of toxic chemicals

WHAT IS "TOXICITY"?
• A statistically significant difference between an effluent concentration and the control
• An LC50 or other point estimate that is less than some predetermined value
The operational definition of toxicity is often statistical.

TOXICITY AS A STATISTICAL CONCEPT
• False positive = a statistically significant effect that is not "real" (spurious, artifactual)
• False negative = an effect that should be observed but is not

THERE ARE REASONS WHY STATISTICALLY SIGNIFICANT RESULTS HAPPEN
At most, four things are present in a test beaker:
• Diluent
• Sample
• Organism(s)
• Food

TOXICITY NOT DUE TO SAMPLE
• Technician error
• Bias in test chamber location or in assigning organisms to treatments
• Statistical sampling error (Type 1 error)
• Other

TECHNICIAN ERROR
• Expertise
• Experience

BIAS IN ORGANISM/CHAMBER ASSIGNMENT
• Bias in organism assignment is a tendency to assign healthier or less healthy organisms to certain test concentrations
• Systematic arrangement of test chambers can result in systematic bias in organism response (e.g., Selenastrum algal growth test)
• Can be eliminated through proper randomization (see Davis et al., 1998)

STATISTICAL OUTCOMES
Types of errors in hypothesis testing:
                       If H0 is true    If H0 is false
If H0 is rejected      Type I error     No error
If H0 is not rejected  No error         Type II error

HYPOTHESIS TESTING FACTS
• NOECs are not point estimates
• Coefficients of variation and confidence intervals cannot be calculated
• The NOEC is a lower concentration than the LOEC when the dose-response curve is smooth
• The LOEC may represent a different amount of effect from test to test
[Figure: sampling distributions of the mean response with the MSD marked. Null hypothesis TRUE (mean μ0): α = 0.05 = Type 1 error. Null hypothesis FALSE (mean μa): α = 0.05, power = 0.8, β = 0.2 = Type 2 error]

STATISTICAL SAMPLING ERROR
• Type 1 error
• Should be rare (it occurs with probability α when the null hypothesis is true)
• Not repeatable
• Can be reduced by decreasing α, but at the cost of increasing Type 2 error (false negatives)

"UNINTERESTING" TOXICITY
• A toxic response due to a sample that deviates from culture conditions but is still within standard test conditions
 – E.g., the toxic response is due to a slight difference in pH (0.2 units)

FALSE NEGATIVE: FAILURE OF THE TEST SYSTEM TO INDICATE TOXICITY
• Operator error
• Bias
• Type 2 error
• Intrinsically variable data
• Interference

CONCLUSIONS
• False positives/negatives are "wrong" answers
• In the absence of technician error, biased test design, and biased sampling, the false positive and false negative rates equal the Type I and Type II error rates, respectively
• Repeatable results, in the absence of technician error and biased sampling, cannot be false positives/negatives
• An estimate of the false positive rate could be obtained by testing blanks

INTRA- AND INTER-TEST VARIABILITY

TYPES OF VARIABILITY
• Variability is inherent in any analytical procedure
• Intra-test: among and between concentrations
• Intra-lab: within one lab, same method
• Inter-lab: between labs, same method
• Method-specific: within the limits of the method
 – Organism age, test length, dilution water, food type, etc.

INTRA-TEST VARIABILITY
Group     N   Mean survival   s.d.    CV (%)
Control   4   0.975           0.050   5.1
2         4   0.975           0.050   5.1
3         4   0.975           0.050   5.1
4         4   0.950           0.058   6.1
5*        4   0.675           0.150   22.2
6*        4   0.275           0.222   80.6
MSE = 0.033; MSD = 13.9 %

INTRA-TEST VARIABILITY AND ENDPOINT UNCERTAINTY
EC   Conc.   Lower 95% CL   Upper 95% CL   Conf. Int./EC
1    220     95             310            0.98
10   332     196            422            0.68
50   553     440            670            0.41
90   919     744            1416           0.73
99   1392    1024           2906           1.35
(Conf. Int./EC = (upper 95% CL - lower 95% CL) / EC concentration)

POINT ESTIMATE INTRA-LAB VARIABILITY
[Figure: LC50 (mg/L SDS) with 95 % upper and lower confidence limits for a series of tests, and the mean LC50]

HYPOTHESIS TESTS INTRA-LAB VARIABILITY
[Figure: NOEC (ppb Cu) for tests 1-10. Horizontal lines = acceptance limits for two dilution series (red dotted = 0.5; blue dashed = 0.75)]

SOURCES OF INTRA-TEST VARIABILITY
• Genetic variability
• Organism handling and feeding
• Toxicity among and between treatments
• Non-homogeneous sample source
• Sample toxicity
• Abiotic conditions
• Dilution scheme
• Number of organisms/treatment
• Dilution water pathogens
• Randomization is important!

SOURCES OF INTRA-LAB VARIABILITY
• Intra-test sources
• Analyst experience and practice
• Organism age and health
• Acclimation
• Dilution water
• Type of sample
• Sample quality
• Test chamber characteristics
• Organism source
• Food type/rate/source
• Replicate volume
• Test duration
• Procedures

SOURCES OF INTER-LAB VARIABILITY
• All of the previous sources are important
• Differences allowed in methods
 – Could be significant between labs
• Differences in protocols
 – State, federal, local, etc.
 – Use the promulgated standard
• Analyst experience

VARIABILITY AND POINT ESTIMATE UNCERTAINTY
                 Test #1      Test #2
Mean CV (%)      9.9          33.8
IC25 (%)         27.2         26.0
MSE              34.5         290.6
95% CI           25.7-28.5    17.2-31.3

HYPOTHESIS TESTS: HIGH VARIABILITY, LOW STATISTICAL POWER
Group     n   Mean wt (µg/ind)   s.d.   CV (%)
Control   4   632                552    87.4
2         4   727                674    92.7
3         4   1080               408    37.7
4         4   564                493    87.5
5         4   748                235    31.4
MSD = 131 %

HYPOTHESIS TESTS: LOW VARIABILITY, HIGH STATISTICAL POWER
Group     n   Mean wt (mg/survivor)   s.d.    CV
Control   4   0.30                    0.012   4.0%
10%       4   0.30                    0.013   4.3%
18%       4   0.31                    0.008   2.6%
32%       4   0.30                    0.010   3.3%
56%       4   0.27*                   0.013   4.8%
100%      4   0.27*                   0.013   4.8%
MSD = 6.5 %

ACTIONS TO REDUCE VARIABILITY
• Establish performance criteria
• QA program
• Establish and follow strict procedures
• MAXIMIZE ANALYST SKILL
• Contract lab selection
• Additional QA/QC criteria

WHY DETERMINE METHOD VARIABILITY AND WHY CONTROL VARIABILITY?
• If the inherent variability of each method is known, there is less chance of making errors concerning toxicity
• Variability too high: toxicity may not be detected when it is present
• Variability too low: toxicity might be detected when it is not there
• At present there is little incentive to reduce variability

EXAMPLES OF ADDITIONAL QC TEST CRITERIA
• EPA Region IX: upper MSD limits
• Washington: upper MSD limits, change in
• North Carolina: limit control CVs; C. dubia "Practical Sensitivity Criteria"
• EPA Region VI: limit control CV, increase the number of replicates, biological significance

THE CHRONIC TEST GROWTH ENDPOINT

CHANGE IN GROWTH ENDPOINT CALCULATION
• Pre-Nov. 1995 approach: Growth = (dry weight of surviving organisms) / (number of surviving organisms)
• Post-Nov. 1995 approach: Growth = (dry weight of surviving organisms) / (number of initial organisms)

EFFECT ON MEAN TREATMENT RESPONSES
Treatment   % Mortality   Before promulgation   After promulgation
Control     5.1           325                   308
2           2.6           353                   341
3           5.0           345                   329
4           17.9          387                   306
5           47.5          319                   167

INTRA-TREATMENT VARIABILITY AND WEIGHT CALCULATIONS
[Figure: intra-treatment CV (%) of weights before and after the change, by observation]

OLD MSE/NEW MSE RATIO
[Figure: ratio of old to new MSE for tests 1-10 (reference toxicant and effluent tests)]

EFFECTS ON HYPOTHESIS TEST ENDPOINTS
          Before promulgation   After promulgation
Test #    %MSD     NOEC         %MSD     NOEC
1         16.4     50           16.7     50
2         10.8     10           29.1     10
3         11.9     5            39.0     5
4         19.7     25           18.5     25

EFFECTS ON HYPOTHESIS TEST ENDPOINTS
          Before promulgation                 After promulgation
Test #    %MSD    NOEC    Avg. wgt. at NOEC   %MSD    NOEC    Avg. wgt. at NOEC
1         20.9    100     296                 23.4    100     296
2         19.5    100     268                 25.1    100     233
3         22.1    100     254                 24.1    100     227
4         21.4    100     387                 22.8    100     313

EFFECTS ON POINT ESTIMATE ENDPOINTS
Test #    Before: IC25 (95% CI)    After: IC25 (95% CI)
1         56.2 (45.4-79.3)         48.3 (43.3-61.9)
2         NC (NC)                  12.4 (6.4-13.8)
3         NC (NC)                  4.2 (1.5-7.3)
4         33.7 (28.2-40.6)         30.0 (19.4-35.0)

EFFECTS ON POINT ESTIMATE ENDPOINTS
Test #    Before: IC25 (95% CI)    After: IC25 (95% CI)
1         291 (NC)                 234 (191-262)
2         386 (NC)                 176 (140-256)
3         227 (179-258)            138 (111-155)
4         >400 (NC)                144 (104-162)

NOEC/IC25 RELATIONSHIP
Test #    Test type    NOEC      IC25 before   IC25 after
1         Effluent     50%       56.2          48.3
2         Effluent     25%       33.7          30.0
3         Ref. Tox.    100 ppb   291           234
4         Ref. Tox.    100 ppb   386           176
5         Ref. Tox.    100 ppb   227           138
6         Ref. Tox.    100 ppb   >400          144

IMPACT ON TEST INTERPRETATION
• Hypothesis test results: most cases show little change, but not always
• Point estimate results: usually increase the predicted toxicity

ISSUES RELATED TO CHANGE IN APPROACH
• Test growth or biomass?
• Accurate representation of growth?
• Correlation between new results and instream responses?
• Conflict between new results and unchanged effluent quality?
• Effect on reference toxicant control charts
• Relationship between NOEC and IC25

AGE-RELATED SENSITIVITY OF FISH IN ACUTE WET TESTS

REVISIONS TO FISH AGES IN EPA ACUTE TEST MANUALS
• From: 1-90 days old in the 3rd edition of the acute manual (1985; EPA/600/4-85/013)
• To: 1-14 days old (or 9-14 days old for silversides) in the 4th edition of the acute manual (1993; EPA/600/4-90/027F)

COMMONLY USED TEST SPECIES
• Fathead minnows
• Sheepshead minnows
• Silversides (inland, Atlantic, and tidewater)

RATIONALE
• The younger life stage is generally more sensitive than the older life stage
• Reducing the range of acceptable ages from 1-90 to 1-14 days will reduce variability

CONCERN
• Use of younger fish in NPDES testing may show an increase in apparent toxicity without any change in effluent conditions

COMMON QUESTIONS
• Are <14-day-old fish more sensitive to toxicants than <90-day-old fish?
• Does the use of <14-day-old fish reduce inter-test variability compared with <90-day-old fish?
• How do sensitivity and precision vary within the 1- to 14-day-old age range?
SENSITIVITY OF 14-, 30-, AND 90-DAY-OLD FATHEAD MINNOWS
[Figure: mean 96-h LC50 for copper (ppb) and unionized ammonia (ppm) by age (14, 30, and 90 days); letters denote statistically different groups]

INTER-TEST PRECISION OF 14-, 30-, AND 90-DAY-OLD FATHEAD MINNOWS
[Figure: coefficient of variation of LC50s for copper and unionized ammonia by age (14, 30, and 90 days)]

SENSITIVITY OF 1- TO 14-DAY-OLD FATHEAD MINNOWS
[Figure: mean 48-h LC50 for sodium pentachlorophenol, hexavalent chromium, SDS, and unionized ammonia at ages 1, 4, 7, 10, and 14 days; letters denote statistically different groups]

INTER-TEST PRECISION OF 1- TO 14-DAY-OLD FATHEAD MINNOWS
[Figure: coefficient of variation for NaPCP, Cr+6, SDS, and NH3 by age range (1-14, 4-14, 7-14, and 10-14 days)]

SUMMARY
• 14-day-old fathead minnow larvae are more sensitive to copper and ammonia than 90-day-old fish.
• The inter-test precision of 90-day-old fish is equal to or better than that of 14-day-old fish for copper and ammonia.

SUMMARY (CONT.)
• Within the 1-14 day age range, 1-day-old larvae are less sensitive to several toxicants.
• Sensitivity to these toxicants becomes constant after 4-7 days of age.
• Maximum inter-test precision for these toxicants is observed when the age range is limited to 7- to 14-day-old larvae.
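Inter-test precision in the figures above is summarized as a coefficient of variation (CV) of LC50s across tests. The minimal Python sketch below assumes the sample standard deviation (n - 1 denominator) is used. The age-window helper mirrors the 1-14, 4-14, 7-14, and 10-14 day comparison, and the LC50 values are hypothetical, not the study data. The same s.d./mean ratio gives the within-test CVs in the intra-test variability table (e.g., 0.150/0.675 ≈ 22.2 %).

```python
from statistics import mean, stdev
from typing import Dict, Iterable, Sequence, Tuple


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample CV (s.d. / mean), returned as a fraction."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / m


def cv_by_age_window(results: Iterable[Tuple[float, float]],
                     windows: Iterable[Tuple[float, float]]) -> Dict[Tuple[float, float], float]:
    """CV of LC50s for tests whose fish age (days) falls inside each (min, max) window."""
    results = list(results)
    out = {}
    for lo, hi in windows:
        subset = [lc50 for age, lc50 in results if lo <= age <= hi]
        if len(subset) >= 2:
            out[(lo, hi)] = coefficient_of_variation(subset)
    return out


# Hypothetical 48-h LC50s as (age in days, LC50); 1-day-old larvae are less sensitive.
tests = [(1, 2.0), (1, 2.6), (4, 1.2), (7, 1.0), (7, 1.05), (10, 1.0), (14, 0.95)]
for window, cv in cv_by_age_window(tests, [(1, 14), (4, 14), (7, 14), (10, 14)]).items():
    print(window, round(cv, 3))
```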
REASONABLE POTENTIAL AND TOXICITY TEST DESIGN

RP DETERMINATION DEFINITION
• "to determine whether the discharge causes, has the reasonable potential to cause, or contributes to an excursion of numeric or narrative water quality criteria" (TSD, 1991)

REASONABLE POTENTIAL
• 40 CFR 122.44(d)(1) requires that the RP procedure address the following:
  – effluent variability
  – existing controls on all pollution sources
  – available dilution
  – species sensitivity
• A WERF survey of POTWs found that RP is not determined consistently among regulatory agencies

REASONABLE POTENTIAL EXAMPLES
• Virginia's definition: 75% of tests must meet the decision criterion
• Region IX uses a statistical approach adopted from the TSD
• Some states do not issue limits
• Some states issue limits to all major dischargers

VARIABILITY AND RP
• Primarily an inter-test issue
  – effluent variability
  – method variability
• How is it determined?
  – Assumptions
      • TSD
      • Similar facilities
  – Collecting sufficient data
      • Monthly? Quarterly? Annually?

VARIABILITY ASSUMPTION ISSUES
• The TSD default assumption (CV = 0.6) may not be accurate
• Data from similar facilities may be used, which reduces some uncertainty
• Actual data are always best: greater certainty in the decision to issue a limit
• Reduce the potential for erroneous conclusions based on only a few data points
[Figure: 95th percentiles of two data distributions (95%1 and 95%2) relative to the WLA]

HOW TO ADDRESS VARIABILITY THROUGH TEST DESIGN
• Consistency between tests:
  – dilution schemes
  – dilution water type and characteristics
  – test vessel dimensions and material
  – test replicate volume
  – increased sample size per replicate or concentration
  – test organism age (acute tests)
  – species sensitivity (affects variability)

SPECIES SENSITIVITY AND RP
• Two components:
  – Representative of the condition to be protected?
  – Magnitude of toxicity
• Both components are affected by:
  – species
  – age of life stage
  – dilution water quality
  – test type (static, renewal, flow-through)
  – culturing/handling of organisms

SPECIES SENSITIVITY AND REPRESENTATION OF TOXICITY
• Tests must be reliable indicators of toxicity, which depends on several test design parameters:
  – pH
  – hardness
  – alkalinity
  – treatment renewals

TEST AND INSTREAM HARDNESS
• C. dubia is sensitive to hardness
• C. dubia acclimated and tested at 120 ppm hardness
• Instream and effluent hardness are 300 ppm
• Is the test result due to the effluent or to sensitivity to hardness?
• Solution: test a different organism, or C. dubia cultured at a higher hardness

SPECIES SENSITIVITY, TOXICITY, ORGANISM AGE & RP
• Flexibility in organism age tested:
  – acute: significant
  – chronic: minimal
• Data indicate that age affects sensitivity

SPECIES SENSITIVITY, TOXICITY, DILUTION WATER QUALITY & RP
• Example: pH
• If ammonia is present and pH rises artificially in the test beyond real-world levels, ammonia may contribute to toxicity and affect the results used to determine RP
• Solution: control pH in tests at levels occurring at the condition of interest (IWC, 100% discharge, etc.) using direct control (CO2 headspace) or flow-through testing

DILUTION & RP
• EPA's RP approach compares the effluent data distribution to the WLA
• If a specified percentile of the distribution is predicted to exceed the WLA, RP exists
• The WLA incorporates the numeric standard and the available dilution
[Figure: Ceriodaphnia sp. relative frequency distribution of chronic toxic units (CV = 1.06), marking the long-term average, the 95th percentile, WLA1, and WLA2]

ADDRESSING DILUTION & RP IN TEST DESIGN
• Center test dilutions on the respective effluent concentrations of concern
• Test dilutions below and above them
• Avoid testing concentrations/conditions that are unlikely to occur naturally
• Maximize the dilution factor with intra-test and inter-test uncertainty in mind

CHOOSING TEST DILUTIONS
• Example:
  – Chronic IWC = 25%
  – Dilutions of 23%, 24%, 25%, 26%, and 27% may miss toxicity at 28%, which is well within the uncertainty of most chronic endpoints, and may give a false negative indication of toxicity
  – With dilutions of 6.25%, 12.5%, 25%, 50%, and 100%, results at concentrations 4x the IWC have little environmental relevance
  – Choose something in between, such as 12%, 17%, 25%, 35%, and 50% (dilution factor 0.7)

RP TEST DESIGN SUMMARY
• Minimize inter-test method variability
• Ensure representative test results by controlling parameters not limited by the methods
• Account for dilution in tests
• Balance maximum dilution factors in tests against endpoint uncertainty

MOST SENSITIVE SPECIES SELECTION
SETAC Expert Advisory Panel Performance Evaluation and Data Interpretation

MOST SENSITIVE SPECIES (MSS) DETERMINATION
• Purpose
  – To determine which test species is "most sensitive" to an effluent source or ambient water
• Desired Toxicity Information from MSS
  – Variability/Seasonality
  – Magnitude or frequency of "sensitive" response

COMMON CONSIDERATIONS
• Test Frequency
• Species Selection
• Dilution Water Type
• Sample Type
• Concentration Series
• Statistical Analysis

FREQUENCY AND TIMING OF MSS SCREENS
• Balance of Cost and Adequate Information
• Initial Evaluation or Reevaluation
• Seasonal or Summary Information Desired

SELECTION OF TEST SPECIES
• Diversity of Organism Types
  – Plant, vertebrate, invertebrate
• Nature of Receiving Water
  – Salinity, resident species
• Non-promulgated, Resident Species
• Suspected Toxicant(s)
  – USEPA Region 9 & 10 Guidance Document

SELECTION OF DILUTION WATER
• Method-Defined Synthetic Dilution Water
• Natural Receiving Waters
• Receiving-Water-Defined Synthetic Dilution Water

SELECTION OF SAMPLE TYPE
• Whole Effluents
• Receiving Water
• Composite or Grab Samples

CONCENTRATION SERIES SELECTION
• Multiple Concentration Tests
  – Preferred experimental design for MSS screens
  – Select concentrations based on the IWC and on elucidating the concentration-response (CR) relationship
• Single Concentration Tests (Pass/Fail)
  – Effective if cost is prohibitive
  – Control and IWC

STATISTICAL ANALYSIS AND INTERPRETATION
• Multiple Biological Endpoints
• Combining Multiple Screen Results
• Statistical Analysis Method

MULTIPLE BIOLOGICAL ENDPOINT ANALYSIS
• Evaluate each biological endpoint
• Use the most "toxic" endpoint
[Figure: Kelp germination and germ tube length; effluent concentration (%) at the NOEC and EC/IC25 statistical endpoints]

METHODS OF COMBINING MSS RESULTS
• Averaging
• Proportion (X times out of Y screens)
[Figure: Multiple MSS data using FW chronic tests; EC25/IC25 (% effluent) for FH, CD, and SC in screens 1-3]

Species                   Proportion (X/Y)   Average EC25/IC25 (% effluent)
Fathead Minnow (FH)       67% (2/3)*         87%
Ceriodaphnia (CD)         33% (1/3)          70%*
Selenastrum (SC)          0% (0/3)           97%
* Most sensitive species under that method

STATISTICAL ANALYSIS METHODS FOR MSS SCREENS
• NOECs
• Point estimates
• Probability of effect at the critical concentration (pECC)

NOECs
• Experimental question: Which method/species is most likely to identify a change from the control response?
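The two ways of combining MSS screen results, averaging and proportion, can disagree, as the FH and CD rows of the table above show. This Python sketch computes both for hypothetical EC25/IC25 values chosen to reproduce that pattern. The data, the function names, and the rule of entering ">100%" results as 100 are assumptions for illustration only.

```python
from statistics import mean

# HYPOTHETICAL EC25/IC25 results (% effluent) for three MSS screens; illustrative only.
# ">100%" results are entered as 100.0, a simplifying assumption that biases averages low.
screens = {
    "FH": [80.0, 90.0, 85.0],
    "CD": [95.0, 20.0, 95.0],
    "SC": [100.0, 100.0, 90.0],
}


def proportion_most_sensitive(results):
    """Fraction of screens in which each species had the lowest EC25/IC25."""
    species = list(results)
    n_screens = len(next(iter(results.values())))
    counts = {sp: 0 for sp in species}
    for i in range(n_screens):
        lowest = min(results[sp][i] for sp in species)
        for sp in species:
            if results[sp][i] == lowest:  # ties credit every tied species
                counts[sp] += 1
    return {sp: counts[sp] / n_screens for sp in species}


def average_endpoint(results):
    """Mean EC25/IC25 per species across screens; the lowest mean is 'most sensitive'."""
    return {sp: mean(vals) for sp, vals in results.items()}


print(proportion_most_sensitive(screens))
print(average_endpoint(screens))
```

Here the proportion method flags FH (lowest value in two of three screens), while averaging flags CD, because one very low CD result pulls its mean down. Substituting 100 for a censored ">100%" value understates the true average for insensitive species.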
ADVANTAGES OF NOECs
• Common method
• Integrates effect and intra-test variability
[Figure: NOEC (% effluent) for FH, CD, and SC]

DISADVANTAGES OF NOECs
• Cannot separate biological effect from statistical sensitivity
• Cannot be averaged
• NOECs may not be environmentally relevant
[Figure: EC/IC25 and NOEC (effluent concentration, %) for FH, CD, and SC relative to the IWC; some values are >100%]

POINT ESTIMATES
• Experimental question: Which method/species shows the specified effect at the lowest concentration?

ADVANTAGES OF POINT ESTIMATES
• Evaluates a common effect level
• Uses the entire concentration-response curve (parametric models)
• Can use proportion or average analysis
[Figure: effect (%) versus concentration (%); FH EC25/IC25 = 70%, CD EC25/IC25 = 90%, SC EC25/IC25 = >100%]

DISADVANTAGES OF POINT ESTIMATES
• Effect level selection
• Concentration-response required
• Smoothing
• No consideration of endpoint precision
• EC values may not be environmentally relevant
[Figure: the same concentration-response curves (FH EC25/IC25 = 70%, CD = 90%, SC = >100%) with the IWC marked]

PROBABILITY OF EFFECT AT THE CRITICAL CONCENTRATION (pECC)
• Experimental question: At the concentration of environmental concern, which method/species had the greatest effect at the lower 95% confidence limit?
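The pECC question can be sketched as a bootstrap: resample control and IWC replicates, recompute the percent effect each time, and take the lower 5th percentile as a one-sided lower 95% confidence limit. The Python below is a minimal sketch under those assumptions; the percentile-bootstrap variant, the function names, and the reproduction data are hypothetical and are not presented as the panel's procedure.

```python
import random
from statistics import mean


def percent_effect(control, treatment):
    """Percent reduction of the treatment mean relative to the control mean.

    Assumes a positive control mean (e.g., young per female).
    """
    return 100.0 * (mean(control) - mean(treatment)) / mean(control)


def pecc(control, iwc, n_boot=2000, alpha=0.05, seed=1):
    """Return (ECC, pECC): the observed percent effect at the IWC and a one-sided
    lower (1 - alpha) percentile-bootstrap bound.

    Replicates are resampled with replacement within each group (an assumption).
    """
    rng = random.Random(seed)
    boot = sorted(
        percent_effect(rng.choices(control, k=len(control)), rng.choices(iwc, k=len(iwc)))
        for _ in range(n_boot)
    )
    return percent_effect(control, iwc), boot[int(alpha * n_boot)]


# HYPOTHETICAL reproduction data (young per female) for one species; illustrative only.
control = [24, 27, 25, 30, 26, 28, 23, 29, 27, 26]
at_iwc = [22, 19, 25, 21, 24, 20, 23, 18, 22, 21]
ecc, lower = pecc(control, at_iwc)
print(f"ECC = {ecc:.1f}% effect, pECC (lower 95% bound) = {lower:.1f}%")
```

Running the same function for each species and picking the largest lower bound answers the pECC question. Calling pecc([10] * 5, [8] * 5), where all replicates are identical, returns the same value for both numbers, which illustrates the "zero replicate variance" drawback listed for pECC: the bootstrap interval collapses onto the point estimate.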
ADVANTAGES OF pECC
• Considers the precision of the response estimate
• Can use proportion or average analysis
• Environmental relevance
• No concentration-response relationship required
[Figure: ECC and pECC effect (%) for FH, CD, and SC]

DISADVANTAGES OF pECC
• Zero replicate variance
• Bootstrapping
• Obtaining 95% confidence intervals at the IWC
[Figure: ECC and pECC effect (%) for FH, CD, and SC]

SUMMARY
• Discuss the MSS procedure in detail during permit development
• Select a variety of organism types
• Initially, test for trends in toxicity
• Continue periodic screening
• Select the type of statistical analysis carefully
• Make sure that the statistical analysis and the raw results "make sense"

WHOLE EFFLUENT TOXICITY TEST DESIGN

WET TESTING DESIGN
• Important factors
  – discharge concentration of concern
  – type of statistical analysis
  – typical toxicant(s)
  – dilution/control water
  – receiving water quality
  – number of concentrations tested
  – stage in testing program (initial, advanced)

DISCHARGE CONCENTRATION OF CONCERN (COC)
• Acute
  – initial dilution (if allowed) at the edge of the acute mixing zone, multiplied by 3.3 (TSD, 1991) to convert a concentration at the LC1 to a concentration at the LC50
• Chronic
  – dilution available at the edge of the chronic mixing zone

TYPES OF WET TESTS
• COC and control
• Multiple concentrations and control

WET TESTS WITH MULTIPLE CONCENTRATIONS
• Recommended design for discharge monitoring
• Usually includes a small number of replicates
• Focuses more on the concentration-response relationship
• Dilutions center on the COC
• EPA recommends a dilution factor > 0.5
• Maximize the dilution factor with endpoint uncertainty and inter-test variability in mind

WET TESTING ONLY THE COC
• Design used for ambient and some discharge monitoring
• Little flexibility in test design
• Increase the number of replicates and/or organisms to increase confidence in results
• Information on the concentration-response relationship is not available and not considered

WET TESTS & WATER QUALITY PARAMETERS
• Parameters must match the goals of testing, either:
  – the instream condition of the discharge upon dilution, or
  – the inherent toxicity of the discharge independent of the instream condition

WET TEST WATER QUALITY PARAMETERS
• Most common parameters of concern
  – hardness
  – salinity
  – pH
  – temperature
  – conductivity
• Test design solution: extra controls

EXAMPLE OF ADDITIONAL CONTROL TO ADDRESS HARDNESS
• Example goal: test the instream condition of the discharge after dilution
• Daphnids cultured at 120 ppm hardness
• Discharge and receiving water are at 300 ppm hardness
• Prepare extra controls at 300 ppm hardness and compare their results with the dilutions tested

WET TEST DESIGN AND TYPICAL TOXICANTS
• The suspected toxicant(s) determine whether, and which, test conditions are important
• Ammonia is a good example:
  – pH affects ammonia toxicity
  – pH is not strictly limited by the methods
  – pH drift beyond realistic levels may raise unionized ammonia to unrealistic levels
• Test design solution: use pH control in WET tests

WET TEST DESIGN & DILUTION/CONTROL WATER
• Depends on test goals
• Instream mixed discharge condition
  – water from upstream of the discharge is preferred
  – second choice is water similar to upstream water
  – the more culture and dilution water differ, the more important acclimation before testing becomes

WET TESTING FREQUENCY
• Depends on variability in the condition (instream or discharge)
• As variability increases, frequency should increase
• Balance variability and frequency of testing against cost
• The goal is to accurately represent the condition in question

WET TEST DESIGN & STAGE OF TESTING
• Species sensitivity varies with biological endpoints and test conditions
• Frequency of testing and number of endpoints tested can decrease as the data set grows

WET TEST DESIGN & STATISTICS
• The statistical approach used to analyze results affects test design and is usually defined by the permit
• Point estimates benefit from fewer replicates but more treatments
• Hypothesis testing benefits from more replicates, but the number of treatments has minimal effect on results

WET TEST DESIGN SUMMARY
• Focus on the condition to be tested and the question being asked
• Ensure test parameters are representative of the condition being tested
• Testing frequency is driven by temporal variability in the condition
• Design tests to meet the requirements of the statistical approaches to be used

Ambient Water Testing: Experimental Design and Data Analysis
SETAC Expert Advisory Panel Performance Evaluation and Data Interpretation

AMBIENT TOXICITY TESTING

OBJECTIVES OF AMBIENT TOXICITY TESTING
• Objectives vary
  – General assessment of water quality in streams, rivers, bays, and the ocean
• Determine whether a water body should receive more focused assessment
• Assess whether a water body or segment thereof should be placed on or removed from the CWA 303(d) list of impaired waterways
• Ascertain the source(s) of water contamination

OBJECTIVES OF AMBIENT TOXICITY TESTING - Cont.
• Compare results of effluent toxicity tests with receiving water tests
• In conjunction with TIEs and associated chemical analysis, identify the cause(s) of contamination
• Assess the success of remediation efforts
• Determine compliance with the water quality standard for toxicity

INFORMATION PROVIDED BY AMBIENT TOXICITY TESTING
• Toxicity testing procedures with TIEs and chemical analyses have been used effectively to identify the chemical causes and sources of water quality contamination.
• When applied in conjunction with carefully designed sampling regimes (e.g., site selection and timing of collection), these procedures can describe:
  – Magnitude of toxicity
  – Temporal extent (duration and frequency)
  – Spatial/geographic distribution
  – Land use practices responsible for toxicity

STRENGTHS OF SINGLE SPECIES TESTS
• Provide an integrative measure of aggregate, additive toxicity
• Provide a direct measure of toxicity and bioavailability
• In combination with TIEs, can identify the chemical cause(s) of toxicity
• Measure toxicological responses to chemicals for which there are no chemical-specific water quality standards

STRENGTHS OF SINGLE SPECIES TESTS - Cont.
• Reliable predictors of instream impacts
• Afford reliable, repeatable, and comparable results relative to other types of biological and chemical tests
• Furnish an early warning signal so that actions can be taken to minimize ecosystem impacts from toxic chemicals
• Can be performed quickly and inexpensively compared to other biological monitoring procedures

LIMITATIONS OF SINGLE SPECIES TESTS
• Do not characterize the persistence/duration or frequency of exposures in ambient waters without repeated sampling and testing
• Do not directly measure biotic community responses
• Do not encompass the range of species, sensitivities, or functions (endpoints) responsive to toxic chemicals that occur in biological communities

LIMITATIONS OF SINGLE SPECIES TESTS - Cont.
• Do not measure delayed impacts or effects due to bioaccumulation or bioconcentration, mutagenicity, carcinogenicity, teratogenicity, and enrichment
• Laboratory tests do not reflect the multivariate and complex exposure conditions that exist in many aquatic ecosystems
• Results may underestimate biotic community responses to chemicals because of multiple stressors acting on aquatic ecosystems

LIMITATIONS OF SINGLE SPECIES TESTS - Cont.
• Surrogate species may not represent the toxicological sensitivities of some aquatic ecosystems

AMBIENT TESTING METHODS
• Usually U.S. EPA marine or freshwater methods
• Other protocols (e.g., ASTM) or indigenous species tests are sometimes used

DEVIATIONS FROM U.S. EPA EFFLUENT TESTING PROCEDURES
• Ambient water testing follows U.S. EPA protocols for testing effluents, with a few exceptions
• A dilution series usually is not included in testing until TIEs are performed on toxic samples
• Water renewals may be from a single sample
• The number of control replicates may be increased
• Tests are conducted in glass or Teflon containers

"TIERED" APPROACH TO AMBIENT TESTING
• Initial surveys are intended to characterize watershed or waterbody sites over several years or hydrologic cycles; sampling may be monthly
• Focused follow-up studies may include:
  – An increased number of sites and frequency of sampling
  – TIEs
  – Evaluation monitoring to assess toxicity reduction/remediation efforts

EXPERIMENTAL DESIGN
• Centers on the selection of:
  – Surface waterbody or segment(s) thereof to be monitored
  – Number and location of sampling sites
  – Sample type
  – Timing/period and frequency of sampling

FACTORS TO CONSIDER WHEN SELECTING SAMPLING SITES
• Significant source of flow or loads into the watershed?
• Representative type of drainage (agriculture, urban, mining, etc.)?
• Receives runoff from a particular land use?
• Predicted or suspected toxicity?
• "Integrator" site indicative of inputs and/or of the waterway (e.g., near the mouth of a river)?
• Previously identified toxicity?
• Critical or sensitive habitat?
TYPE OF SAMPLE
• Composite collected over various time periods
• Sub-surface grab sample

SELECTING PERIOD AND FREQUENCY OF SAMPLING
• The sampling period depends on the objectives of the investigation
• The sampling frequency relates to defining the duration and frequency of toxic events

DATA ANALYSIS
• EPA recommends t-tests to compare the laboratory control with a single ambient water sample
• ANOVA and Dunnett's multiple comparison test are appropriate for multiple sites/samples

ECOLOGICAL RELEVANCE QUESTION
• Are the results of U.S. EPA tests, or other single species tests, reliable predictors of biotic community responses/impacts?

TWO REVIEWS OF ECOLOGICAL RELEVANCE ISSUE
• Waller W.T., et al. 1996.
• de Vlaming V, Norberg-King T.J. 1999.

ENCAPSULATED CONCLUSIONS OF REVIEWS
• SETAC Panel: "It is unmistakable and clear that when U.S. EPA toxicity test procedures are used properly, they are reliable predictors of environmental impact provided that the duration and magnitude of exposure are sufficient to resident biota." and "a strong predictive relationship exists between ambient toxicity and ecological impact."

ENCAPSULATED CONCLUSIONS OF REVIEWS - Cont.
• de Vlaming and Norberg-King: U.S. EPA and other single species toxicity test results are, in a majority of cases, reliable qualitative predictors of responses in aquatic ecosystem populations.

DE VLAMING AND NORBERG-KING SUMMARY
• The available literature provides a weight-of-evidence demonstration that WET and other indicator species toxicity test results are reliable qualitative predictors of biotic responses.
• There are no empirical data demonstrating that indicator species results consistently fail to provide reliable predictions of instream biological responses.

DE VLAMING AND NORBERG-KING SUMMARY - Cont.
• When toxicity test results fail to provide a reliable prediction, they more frequently underestimate instream biological responses.
• Lab toxicity test results do not tend to overestimate the bioavailability of chemicals.
• The reliability with which toxicity test results predict instream biological responses increases when tests are performed on ambient waters and as the magnitude of toxicity increases.

DE VLAMING AND NORBERG-KING SUMMARY - Cont.
• The reliability with which toxicity test results predict instream biological responses increases with characterization of the persistence and frequency of toxicity.
• The reliability with which toxicity test results predict instream biological responses increases with effective matching of (or accounting for differences between) lab and field exposures.

TIE/TRE TEST DESIGN

TIE/TRE GOAL
• To identify, confirm, and remove toxicant(s) in order to bring the effluent into compliance with water quality standards
• Test design depends on the phase of the TIE and the magnitude/variability of toxicity present
• As toxicity decreases, the number of replicates and identification/confirmation trials may need to increase

TEST DESIGN AND PHASE I TIE
• Use the species used in the testing that suggested toxicity
• Many sample manipulations
• Minimum number of replicates per treatment
• Analyze primarily with hypothesis testing and best professional judgment (BPJ)
• Test at 100% concentration or at a concentration providing a significant response compared with controls

TEST DESIGN AND PHASE III TIE
• May use more than one species to compare sensitivities in support of the hypothesis
• Few sample manipulations
• Number of replicates and treatments similar to normal tests
• May use hypothesis testing or point estimate statistical approaches, depending on the permit
• Usually test at multiple concentrations to support point estimates and to capture concentration-response relationships
• Standard QA/QC

OTHER TIE/TRE TEST DESIGN ISSUES
• Flexibility
• Temporal variability within and between samples
• Screening
• Dilution water
• Controls for manipulations
• QA/QC

FLEXIBILITY
• Be creative
• Do not be constrained by required methods
• Consider toxicology in test design and interpretation
  – rate of action
  – changes with organism age or development
• Consider the magnitude of toxicity for chronic TIEs: can you use acute tests?

REFERENCE TEST APPROACH
FLUORIDE LC50S FOR EFFLUENT AND LAB WATER
Age (days)   Series #1    Series #2    Series #3
2            7.8, 4.7     7.1, 4.4     8.0, 5.0
4            11.0, 6.8    11.7, 6.8    9.5, 7.3
6            16.3, 9.3    17.6, 8.0    18.6, 9.2

TEST DESIGN & TEMPORAL VARIABILITY
• Variability can occur within and between samples, as well as between toxicant(s), over time
• As toxicity persistence within samples decreases, the need for renewals may increase
• As temporal variability in toxicant identity and magnitude of toxicity increases, the number of trials needed increases

TIE/TRE TEST DESIGN AND SCREENING
• Only possible if the screen is a reliable predictor of toxicity in the definitive test
• The utility of screens is reduced when toxicity is not persistent
• A good idea when toxicity is unpredictable between samples; saves resources
• Difficult for chronic TIE/TREs

TIE/TRE TEST DESIGN AND DILUTION WATER
• Should use the same dilution water as in the tests that originally suggested toxicity
• Advisable to test another dilution water to see whether it affects test results
• Dilution water may influence toxicity and TIE interpretation
• Differences may be biological, chemical, or physical

TIE/TRE TEST DESIGN AND ADDITIONAL CONTROLS
• Phase I includes numerous manipulations of the tested sample
• Manipulations may cause toxicity independent of the sample
• Be wary of chemical additions that oxidize or reduce (examples will be provided)
• Solution: treat control water in the same fashion as the sample and add it to the test as another control

TIE/TRE TEST DESIGN SUMMARY
• Design changes with the stage of the study
• Focus resources on issues specific to each stage of the study
• Maintain flexibility and creativity
• Avoid false conclusions with multiple controls and checks
• Expertise
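The fluoride table above pairs LC50s measured in effluent and in lab water for 2-, 4-, and 6-day-old organisms across three test series. The Python sketch below tabulates the per-age means and the mean ratio of the paired values. It assumes that the first number in each pair is the effluent LC50 and the second is the lab-water LC50, following the order in the table title; the slide states neither the pair order nor the units, so swap the labels if the source says otherwise.

```python
from statistics import mean

# Fluoride LC50 pairs transcribed from the table: {age_days: [(first, second), ...]}
# for Series #1-#3. ASSUMPTION: first = effluent LC50, second = lab-water LC50,
# following the order in the table title; units are not stated in the source.
fluoride_lc50 = {
    2: [(7.8, 4.7), (7.1, 4.4), (8.0, 5.0)],
    4: [(11.0, 6.8), (11.7, 6.8), (9.5, 7.3)],
    6: [(16.3, 9.3), (17.6, 8.0), (18.6, 9.2)],
}


def summarize(table):
    """Per age: mean LC50 in each water and the mean effluent/lab-water ratio."""
    summary = {}
    for age, pairs in sorted(table.items()):
        effluent = mean(p[0] for p in pairs)
        lab = mean(p[1] for p in pairs)
        ratio = mean(p[0] / p[1] for p in pairs)
        summary[age] = (effluent, lab, ratio)
    return summary


for age, (effluent, lab, ratio) in summarize(fluoride_lc50).items():
    print(f"day {age}: effluent {effluent:.1f}, lab water {lab:.1f}, ratio {ratio:.2f}")
```

In both waters the LC50s rise with age, consistent with the point that sensitivity changes with organism age or development.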