Comet analysis – Issues, Limitations and Recommendations Jonathan Bright Discovery Statistics AstraZeneca Nested Design Test substance Vehicle Low Medium High Positive Control Culture 1 2 3 4 5 6 7 8 9 10 … 14 15 … 19 20 … 24 25 Gel (Slide) ab ab … Measurements on 50 cells per gel 2 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 … ab ab AstraZeneca Discovery Statistics Group Vehicle-treated cells showing no genetic damage Positive controltreated cells showing severe genetic damage 3 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Undamaged and Damaged Cells Label 4 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Tail Intensities (linear scale) Label 5 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Tail Intensities (linear, jittered) Label 6 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Tail Intensities (log scale, jittered) Summary and Analysis • Excluding positive control data • Using PROC MIXED • Pairwise contrasts • Main issue is the choice of gel summary, in particular when: • Large number of zeros • Small number of unusually large values 7 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics • Summarise each gel of 50 numbers with a statistic, S • Analyse S • Non-zero part of distribution approx lognormal • Mean is powerful • Fold-change treatment effects BUT • Zeros! • Add delta (e.g. 0.001) to all tail intensities before logging and averaging 8 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Summary Statistic 1. Mean Log Summary Statistic 1. Mean Log (contd) • S = mean(log(tail intensity + delta)) = log(delta) • Delta = 0.001 then S = -3 • Delta = 0.0001 then S = -4 etc • i.e. S depends critically on delta • 50 tail intensities all > 0.1, say • S approx = mean(log(tail intensity)) • i.e. S is approx independent of delta • Treatment effect depends on delta! 9 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics • 50 zeros on a gel 0.7 i.e. 5-fold 1.0 i.e. 10-fold Group 10 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Summary Statistic 1. Example Summary Statistic 2. Percentiles • Robust measure of location • Will fail to detect changes of interest in the upper tail • 90th percentile • Better chance of detecting changes in the upper tail • Not very robust with only 50 values • 75th percentile • May offer a better balance than either of the above • (In the presence of many zeros, even the chosen percentile may equal zero.) 11 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics • Median Group 12 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Summary Statistic 2. Example (linear scale) Summary Statistic 2. Example (linear, jittered) Increase from 0.16 to 0.83 “5-fold” Increase from 1.3 to 3.2 “2.5-fold” Group 13 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Increase from 0.004 to 0.175 “44-fold” Group 14 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics Summary Statistic 2. Example (log, jittered) Two Summary Statistics? • Awkward distribution of many zeros and small values • Too much to ask of a single summary statistic? • Two summary statistics: • Proportion of zeros (or proportion of tail intensities < low threshold) • Mean(log(all other tail intensities) 15 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics plus some extreme values Two Summary Statistics. Example (linear, jittered) 0.36 Group 16 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 0.13 i.e. 1.4-fold Group AstraZeneca Discovery Statistics 0.64 Recommendations Animal_Gel 17 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics • Picture the raw data 22 20 18 14 %TI 12 10 8 6 4 2 0 -2 1a_a 1a_b 1b_a 1b_b 1c_a 1c_b 1d_a 1d_b 1e_a 1e_b 1f_a 1f_b 2a_a Animal_Gel 2a_b 2b_a 2b_b Vehicle 18 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 2c_a 2c_b 2d_a 2d_b Treated 2e_a 2e_b 2f_a 2f_b AstraZeneca Discovery Statistics %TI 16 Recommendations of awkward distributions • Present results as confidence intervals 19 Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics • Picture the raw data • Consider using 2 summary statistics in the presence Two Summary Statistics. Example (linear, jittered) 0.36 20 0.13 i.e. 1.4-fold Group Group 95% 1-sided confidence interval extends up to 0.4 95% 1-sided confidence interval extends up to 2.3-fold 95% 2-sided confidence interval extends from 0.1 up to 0.45 95% 2-sided confidence interval extends from 0.7-fold up to 2.6-fold Jonathan Bright, Comet Analysis – Issues, Limitations and Recommendations, NCSC 2010 AstraZeneca Discovery Statistics 0.64