The Test for Significant Toxicity (TST) – A “New” Hypothesis Testing Approach for Aquatic Bioassay Testing
Philip Markle
Environmental Scientist
pmarkle@lacsd.org

History of the TST
June 2010 – EPA released WET TST guidance (EPA 833-R-10-003)
Also referred to as:
– Bioequivalence Testing
– Alternative Null Hypothesis Testing
Accepted for FDA drug trials and evaluations
Originally proposed for use in toxicity testing in 1995 (Erickson and McDonald)
Recently proposed for CA’s WET Policy

Limitations of the TST
It is still a statistical hypothesis test
– Not very useful for comparing results spatially or temporally
– Pass/Fail test, provides no information on magnitude
Requires knowledge/use of a “threshold” response
– “b” or bioequivalence factor
Probably (and debatably) best suited for regulatory purposes

Statistical Hypothesis Testing 101
Statistically speaking:
– You can’t “prove” anything with a hypothesis test; we can only “disprove”
The “White Swan” Parable:
– You can’t prove that “all swans are white”
– If we see 10,000 white swans and no non-white swans, we fail to reject our hypothesis
– In the absence of evidence to the contrary, we then assume the hypothesis is true

“Proving” with Statistics
However, after observing just one non-white swan, we can confidently reject (disprove) our hypothesis that all swans are white

Statistical Hypothesis Testing - Background
Null or “Initial” Hypothesis (Ho)
– Mean(sample) ≥ Mean(control)
Conduct statistical analyses to try to reject this hypothesis
If unable to reject, we assume the null or “Initial” hypothesis is correct
Type I and Type II error

Type I and Type II Errors
Type I Error
– Probability of rejecting the null or “Initial” hypothesis when it is “true”
– Controlled directly by setting alpha (α)
Type II Error
– Probability of accepting (failing to reject) the null or “Initial” hypothesis when it is “false”
– Denoted beta (β); statistical “power” is 1 − β
– Controlled indirectly

Standard Hypothesis Testing (NOEC)
With the NOEC:
– The initial hypothesis is mean(sample) ≥ mean(control)
  In other words, the sample is non-toxic!
– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true
– Type I error = Identifying a non-toxic sample as toxic

TST Hypothesis
With the TST:
– The hypothesis is mean(effluent) ≤ 0.75 × mean(control)
  In other words, the sample is toxic!
– If we don’t/can’t “prove” this to be incorrect statistically, we assume it is true; that is, we assume the sample is toxic
– Type I error = Identifying a toxic sample as non-toxic

Bioequivalence Factor (b)
In the EPA Guidance
– Set as an unacceptable or “toxic” threshold
For Chronic:
– b = 0.75 = 25% Effect
For Acute:
– b = 0.80 = 20% Effect

Regulatory Management Decisions (RMDs)
Setting the Type I Error Rate – alpha (α)
– How frequently will you reject the Ho when it is true?
EPA desires that no more than 25% of the tests with a 25% effect or more are identified as “non-toxic”
Alpha (α) is then set at 0.05 to 0.25, depending on the test

Test/Species-Specific Alpha

Why the Different Alphas?
EPA’s Second Regulatory Management Decision
– No more than 5% of tests with effects less than 10% should be identified as toxic
– Type II Error Rate – not really a “false positive”
Alpha adjusted down until no more than 5% of tests with effects less than 10% were identified as “toxic”
– Monte Carlo simulations

TST Equation (Welch’s t-test)
$$
t = \frac{\text{Mean(sample)} - 0.75 \times \text{Mean(control)}}{\sqrt{\dfrac{\text{Variance(sample)}}{n(\text{sample})} + \dfrac{0.5625 \times \text{Variance(control)}}{n(\text{control})}}}
$$
(0.5625 = 0.75², the bioequivalence factor squared)
t (calculated) < t (table/critical) = toxic
t (calculated) > t (table/critical) = non-toxic

Factors That Impact Ability to Statistically Reject the Hypothesis
Magnitude of Effect
Number of Replicates
Within Test Variability

TST Equation (Welch’s t-test)
All tests (100%) with an observed effect of 25% will be identified as “toxic”, because the numerator of t is then zero and cannot exceed a positive critical value.
The greater the within test variability, the less likely it is that a sample will be identified as statistically different (non-toxic).
The more replication, the more likely it is that a sample will be identified as statistically different (non-toxic).

Effect of Variability: Standard t-test
[Figure: Mean Young Produced, Control vs. Effluent, two panels: “NOEC = Significant/toxic” and “NOEC = Not significant/non-toxic”; a 25% reduction is marked.]

Example: TST test
[Figure: Mean Young Produced, Control vs. Effluent, two panels: “TST = Non-toxic” and “TST = Toxic”; reference line at 25% Effect or 75% of Control.]

Controllable Factors That Impact Ability to Statistically Reject the Hypothesis
Variability
– The greater the within test variability, the less likely it is that a sample will be identified as statistically different.
– For the “regular” hypothesis test
  • Less frequent identification of “toxicity”
– For the TST
  • Less frequent identification of “no toxicity”
Replication

Procedures That May Reduce Variability
Maximize Mean Response
  • CV = S.D. / Mean
From EPA Test of Significant Toxicity (TST) Document EPA 833-R-10-003

Impact of Control Mean
At the 10th Percentile (17.7) – a 25% effect is a reduction of 4.4 neonates
At the 50th Percentile (25.5) – a 25% effect is a reduction of 6.4 neonates
At the 95th Percentile (35.6) – a 25% effect is a reduction of 8.9 neonates

Procedures That May Increase Mean Response
Dilution Water Selection
– Match sample condition as much as possible
Food Supplements, Combinations
– Specifically allowed (13.6.16.9.2)
Feeding Rates
– Twice or three times per day
– Amount of food

Fathead Minnow Feeding Rate Example
[Figure: Fathead Minnow Growth - Control Mean. Y-axis: Control Mean (mg).]
1000 Artemia/Test Chamber (n = 267, Mean = 0.616 mg)
– 0.4% Exceed 95th Percentile
– 1.1% Exceed 90th Percentile
– 2.2% Exceed 85th Percentile
– 7.1% Exceed 75th Percentile
– 50.9% Exceed 50th Percentile
1500 Artemia/Test Chamber (n = 317, Mean = 0.801 mg)
– 9.5% Exceed 95th Percentile
– 20.5% Exceed 90th Percentile
– 30.3% Exceed 85th Percentile
– 53.3% Exceed 75th Percentile
– 97.2% Exceed 50th Percentile

Impact of Growth on CV
[Figure: Fathead Minnow Growth - Control CV. Y-axis: Control CV (%).]
1000 Artemia/Test Chamber (n = 267)
– 3% Exceed 95th Percentile
– 9.4% Exceed 90th Percentile
– 9.7% Exceed 85th Percentile
– 15.7% Exceed 75th Percentile
– 40.4% Exceed 50th Percentile
1500 Artemia/Test Chamber (n = 317)
– 1.9% Exceed 95th Percentile
– 4.7% Exceed 90th Percentile
– 5.4% Exceed 85th Percentile
– 15.2% Exceed 75th Percentile
– 41.5% Exceed 50th Percentile

Procedures That May Decrease Variability
Set Internal Control CV Criteria
[Figure: Ceriodaphnia dubia Control CV, 2010 through February 2011. X-axis: Date of Test Initiation; y-axis: Control CV (%). Legend: Meets TAC, Failed TAC, National 50th Percentile, National 75th Percentile, National 95th Percentile.]

Procedures That May Decrease Variability
Set Internal Control Mean Criteria
[Figure: Ceriodaphnia dubia Control Reproduction Means, 2010 through February 2011. X-axis: Date of Testing; y-axis: Mean Reproduction in Control. Legend: Meets TAC, Failed TAC, Minimum TAC, National 50th Percentile, National 75th Percentile, National 95th Percentile.]

Statistical and Non-statistical Error
False Determinations of Toxicity
[Figure: USEPA Non-Toxic “Blank” Samples (1), Ceriodaphnia dubia Reproduction Results. Y-axis: Effect Relative to Control (%). Legend: TST Non-Toxic, TST Toxic (14.8%).]
(1) Data Source: USEPA's WET Interlaboratory Validation Study (EPA 821-B-01-004), Table 9.7.

Dose Response Evaluation
Eliminating multiple concentrations may limit the ability to evaluate spurious results.
[Figure: Single Concentration Test vs. Multiple Concentration Test. Y-axis: Number of Neonates. X-axis categories: Control, 100% Effluent; 20%, 40%, 60%, 80%, 100% Effluent. Legend: Non-Toxic, Toxic.]

Conclusions
Same limitations as any hypothesis test
– Implications associated with variability and “power” shifted
Not a magical “black box”
– You need to be aware of the impact variability, QA/QC, and test design may have
May be useful for regulation
– NPDES Permits
– Possible use for remediation goals?

Questions?
Contact info: pmarkle@lacsd.org
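
Supplementary sketch: TST calculation
As a companion to the “TST Equation (Welch’s t-test)” slide, the calculation can be sketched in Python. This is a minimal illustration, not EPA’s reference implementation. The function names `tst_welch` and `tst_decision`, the replicate counts in the usage example, and the Welch-Satterthwaite degrees-of-freedom formula are assumptions that do not come from the slides. The critical value is passed in by the caller, matching the slide’s “t (table/critical)”.

```python
import math
from statistics import mean, variance


def tst_welch(sample, control, b=0.75):
    """Return (t, df) for the TST form of Welch's t-test.

    b is the bioequivalence factor from the slides
    (0.75 for chronic tests, 0.80 for acute tests).
    """
    n_s, n_c = len(sample), len(control)
    # Variance of each mean; the control term is scaled by b**2
    # (0.5625 when b = 0.75, as in the slide's equation).
    var_s = variance(sample) / n_s
    var_c = (b ** 2) * variance(control) / n_c
    se = math.sqrt(var_s + var_c)
    if se == 0:
        raise ValueError("no within-test variability; t is undefined")
    t = (mean(sample) - b * mean(control)) / se
    # Assumption: Welch-Satterthwaite degrees of freedom computed on the
    # scaled control variance. The slides do not give a df formula.
    df = (var_s + var_c) ** 2 / (
        var_s ** 2 / (n_s - 1) + var_c ** 2 / (n_c - 1)
    )
    return t, df


def tst_decision(t_calc, t_crit):
    """Apply the slide's rule: t above the critical value means non-toxic.

    A tie is treated as a failure to reject, so the sample stays "toxic".
    """
    return "non-toxic" if t_calc > t_crit else "toxic"


# Hypothetical neonate counts, for illustration only.
effluent = [18, 20, 22, 24, 26]
control = [20, 22, 24, 26, 28]
t, df = tst_welch(effluent, control)  # t is about 2.26, df is about 7.4
# One-tailed critical t for alpha = 0.05 and 7 df (df rounded down),
# taken from a standard t table.
print(tst_decision(t, t_crit=1.895))  # non-toxic
```

In this hypothetical case the effluent mean (22) is about 8% below the control mean (24), and the variability is low enough that the “toxic” hypothesis is rejected. Widening the spread of either group, or dropping replicates, lowers t and moves the same 8% effect toward a “toxic” call, which is the behavior the variability and replication slides describe.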