The Role of the Statistician in Clinical Research Teams
Elizabeth Garrett-Mayer, Ph.D.
Division of Biostatistics, Department of Oncology

Statistics
• Statistics is the art/science of summarizing data.
• Better yet … summarizing data so that non-statisticians can understand it.
• Clinical investigations usually involve collecting a lot of data.
• But, at the end of your trial, what you really want is a “punchline”:
  – Did the new treatment work?
  – Are the two groups being compared similar or different?
  – Is the new method more precise than the old method?
• Statistical inference is the answer!

Do you need a statistician as part of your clinical research team?
• YES!
• Simplest reasons: s/he will help optimize
  – Design
  – Analysis
  – Interpretation of results
  – Conclusions

What if I already know how to calculate sample size and perform a t-test?
• Statisticians might know of a better approach
• Trained more formally in design options
• More “bang for your buck”
• Tend to be less biased
• Adds credibility to your application
• Use resources that are available to you

Different Roles
• Very collaborative
  – Active co-investigator
  – Helps develop aims and design
  – Brought in at an early stage in planning
  – Contributes input throughout trial planning and the ongoing study
• Consultants
  – Inactive co-investigator
  – Often not brought in until either
    • you need a sample size calculation several days before submission,
    • the trial has been criticized/rejected for lack of statistical input, or
    • you’ve finally collected all of the data and don’t know what to do next.
  – Seldom involved in planning or analysis.

Find a statistician early
• Your trial can only benefit from inclusion of a statistician
• Statisticians cannot correct a poorly designed trial after the trial has begun.
• “Statistical adjustment” in the analysis plan does not always work.
• Ignorance is not bliss:
  – Some clinical investigators are trained in statistics …
  – … but usually not in all aspects!
  – Despite an inclination toward a particular design or analysis method, there might be better ways.

Example: Breast Cancer Prevention in Mice
• NOT a clinical study (thank goodness!)
• Mouse study comparing 4 types of treatments
• Her-2/neu transgenic mouse
• Prevention: which treatment is associated with the longest tumor formation time?
[Figure: schematic of mice and their ducts under the four treatment arms: A intra-ductal, B intra-ductal, no treatment, A intravenous]

Example: Breast Cancer Prevention in Mice
• Each oval represents a mouse and dots represent ducts
• Lab researchers developed the design
• Varying numbers of mice for each of the four treatment groups
• Multiple dose levels, not explicitly shown here
[Figure: same schematic: A intra-ductal, B intra-ductal, no treatment, A intravenous]

How to analyze?
• Researchers had created their own survival curves.
• Wanted “p-values”
• Some design issues:
  – What is the unit of analysis?
  – How do we handle the imbalance?
  – Treatment B has no “control” side

Our approach
• Statistical adjustments used
• Treat “side of mouse” as the unit of analysis
  – Cannot use mouse: multiple treatments per mouse in many cases; too many categories (once dose was accounted for).
  – Cannot use duct: hard to model “dependence” within side and within mice.
  – Still need to include dependence within mice.
• Need to adjust for treatment on the “other side”
  – Problems due to imbalance of doses
  – Could only adjust for “active” versus “inactive” treatment on the other side.
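To make the approach above concrete, here is a minimal sketch of this kind of clustered survival analysis: tumor-free time per mouse side, treatment arm and activity of the other side as covariates, and standard errors that respect the within-mouse correlation. The column names and simulated data are hypothetical, and a lifelines Cox model with cluster-robust standard errors is only one reasonable choice, not necessarily the model used in the actual analysis.

```python
# Sketch only: "side of mouse" as the unit of analysis, with dependence within
# mice handled by cluster-robust (sandwich) standard errors. All names and the
# simulated data are hypothetical; this illustrates the idea, not the actual analysis.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
treatments = ["none", "A_intraductal", "B_intraductal", "A_intravenous"]

rows = []
for mouse in range(40):
    frailty = rng.gamma(4.0, 0.25)            # shared within-mouse effect (induces correlation)
    trts = rng.choice(treatments, size=2)     # one treatment per side
    for i, (side, trt) in enumerate(zip(["left", "right"], trts)):
        hazard_mult = 0.6 if trt != "none" else 1.0   # active treatment delays tumors
        t = rng.exponential(100.0 / (frailty * hazard_mult))
        rows.append({
            "mouse_id": mouse,
            "treatment": trt,
            "other_side_active": int(trts[1 - i] != "none"),
            "tumor_free_days": min(t, 200.0),         # administrative censoring at day 200
            "tumor_observed": int(t <= 200.0),
        })
df = pd.DataFrame(rows)

# Dummy-code treatment so that "none" is the reference category.
X = (pd.get_dummies(df, columns=["treatment"], dtype=int)
       .drop(columns=["treatment_none"]))

cph = CoxPHFitter()
cph.fit(X, duration_col="tumor_free_days", event_col="tumor_observed",
        cluster_col="mouse_id")               # robust SEs clustered by mouse
cph.print_summary()
```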
Our approach
• We “saved” this study!
• Not optimal, but the “best” given the design
• Adjustments might not be appropriate
• We “modeled” the data: we made assumptions
• We could not implement all the adjustments we would have liked to
• Better approach?
  – Start with a good design!
  – We would have suggested balance
  – We would NOT have given multiple treatments within mice.

Statisticians: Specific responsibilities
• Design
  – Choose the most efficient design
  – Consider all aims of the study
  – Particular designs that might be useful:
    • Cross-over
    • Pre-post
    • Factorial
  – Sample size considerations
  – Interim monitoring plan

Statisticians: Specific responsibilities
• Assistance in endpoint selection
  – Subjective vs. objective
  – Measurement issues
    • Should any measurement error be considered?
    • What if you are measuring pain? QOL?
  – Multiple endpoints (e.g. safety AND efficacy)
  – Patient benefit vs. biological/PK endpoint
  – Primary vs. secondary
  – Continuous vs. categorical outcomes

Statisticians: Specific responsibilities
• Analysis Plan
  – Statistical method for EACH aim
  – Account for type I and type II errors
  – Stratifications or adjustments are included if necessary
  – Simpler is often better
  – Loss to follow-up: missing data?

Sample size and power
• The most common reason we are contacted
• Sample size is contingent on design, analysis plan, and outcome
• With the wrong sample size, you will either
  – not be able to make conclusions because the study is “underpowered”, or
  – waste time and money because the study is unnecessarily large
• And, with the wrong sample size, you might have problems interpreting your result:
  – Does the lack of a significant result show that the treatment does not work, or is it due to my sample size being too small?
  – Did the treatment REALLY work, or is the effect I saw too small to warrant further consideration of this treatment?
  – This is an issue of CLINICAL vs. STATISTICAL significance

Sample size and power
• Sample size ALWAYS requires the investigator to make some assumptions:
  – How much better do you expect the experimental therapy group to perform compared to the standard therapy group?
  – How much variability do we expect in measurements?
  – What would be a clinically relevant improvement?
• The statistician CANNOT tell you what these numbers should be (unless you provide data)
• It is the responsibility of the clinical investigator to define these parameters

Sample size and power
• Review of power
  – Power = the probability of concluding that the new treatment is effective if it truly is effective
  – Type I error = the probability of concluding that the new treatment is effective if it truly is NOT effective
  – (Type I error = alpha level of the test)
  – (Type II error = 1 – power)
• When your study is too small, it is hard to conclude that your treatment is effective

Example: sample size in multiple myeloma study
• Phase II study in multiple myeloma patients (Borello)
• Primary Aim: Assess clinical efficacy of activated marrow infiltrating lymphocytes (aMILs) + GM-CSF-based tumor vaccines in the autologous transplant setting for patients with multiple myeloma.
• If the clinical response rate is ≤ 0.25, then the investigators are not interested in pursuing aMILs + vaccines further.
• If the clinical response rate is ≥ 0.40, they would be interested in pursuing these further.

Example: sample size in multiple myeloma study
• H0: p = 0.25
• H1: p = 0.40
• We want to know what sample size we need for high power and a small type I error.
  – If the treatment DOES work, then we want a high probability of concluding that H1 is “true.”
  – If the treatment DOES NOT work, then we want a low probability of concluding that H1 is “true.”
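The power values shown on the following slides can be approximated with a few lines of code. This is a minimal sketch assuming a one-sided exact binomial test at a nominal type I error of 0.05, with the rejection cutoff chosen as the smallest responder count whose exact type I error is at or below 0.05; the slide figures may have used a slightly different convention, so the numbers need not match exactly.

```python
# Exact binomial power for H0: p = 0.25 vs. H1: p = 0.40 (one-sided test).
# Assumption: reject H0 when the number of responders reaches the smallest
# cutoff k whose exact type I error is at most alpha.
from scipy.stats import binom

p0, p1, alpha = 0.25, 0.40, 0.05

def exact_power(n):
    # Smallest k such that P(X >= k | p0) <= alpha; binom.sf(k - 1, ...) = P(X >= k).
    k = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
    return k, binom.sf(k - 1, n, p0), binom.sf(k - 1, n, p1)

for n in (10, 30, 75, 120):
    k, type1, power = exact_power(n)
    print(f"n = {n:3d}: reject if responders >= {k:2d}, "
          f"actual type I error = {type1:.3f}, power = {power:.2f}")
```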
Sample size = 10; Power = 0.17
[Figure: distributions of the proportion of responders under H0 (p = 0.25) and H1 (p = 0.40); the vertical line defines the “rejection region”]

Sample size = 30; Power = 0.48
[Figure: same layout]

Sample size = 75; Power = 0.79
[Figure: same layout]

Sample size = 120; Power = 0.92
[Figure: same layout]

Not always so easy
• More complex designs require more complex calculations
• Usually also require more assumptions
• Examples:
  – Longitudinal studies
  – Cross-over studies
  – Correlation of outcomes
• Often, “simulations” are required for a sample size estimate.

Most common problems seen in study proposals when a statistician is not involved
• Outcomes are not clearly defined
• There is no analysis plan for the secondary aims of the study
• The sample size calculation is too simplistic or absent
• Assumptions of the statistical methods are not appropriate

Examples: trials with additional statistical needs
• Major clinical trials (e.g. Phase III studies)
• Continual reassessment method (CRM) studies
• Longitudinal studies
• Studies of the natural history of a disease/disorder
• Studies with “non-random” missing data

Major trials
• Major trials are usually monitored periodically for safety and ethical reasons.
• Monitoring board: Data (Safety and) Monitoring Committee (DMC or DSMC)
• In these trials you would ideally use three statisticians (Pocock, 2004, Statistics in Medicine):
  – Study statistician
  – DMC statistician
  – Independent statistician
• Why?
  – Interim analyses require “unbiased” analysis and interpretation of study data.
• Industry- versus investigator-initiated trials … differences?

Study statistician
• Overall statistical responsibility
• Actively engaged in design, conduct, and final analysis
• Not involved in interim analyses
• Should remain “blinded” until the study is completed

DMC statistician
• Experienced trialist
• Evaluates interim results
• Decides (along with the rest of the DMC) whether the trial should continue
• No conflict of interest

Independent Statistician
• Performs interim analyses
• Writes the interim analysis report
• No conflict of interest
• Only person with full access to “unblinded” data until trial completion

High-maintenance trials
• Some trials require statistical decision-making during the trial process
• Simon’s “two-stage” design (a numerical sketch of this type of rule follows this slide):
  – Stage 1: Treat about half the patients.
  – Stage 2: If efficacy at stage 1 meets a certain standard, the remainder of the patients are enrolled.
• “Adaptive” and “sequential” trials: the final sample size is determined somewhere in the middle of the trial
• Continual Reassessment Method …
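As referenced above, here is a small numerical sketch of how the operating characteristics of a two-stage rule of this type can be computed. The stopping boundaries below (n1 = 20, r1 = 5, n = 40, r = 13) and the response rates (p0 = 0.25, p1 = 0.40) are purely illustrative; they are not an optimized Simon design for any particular trial.

```python
# Operating characteristics of a generic two-stage (Simon-type) rule.
# Stage 1: enroll n1 patients; stop for futility if responses <= r1.
# Stage 2: otherwise enroll to n total; declare promising if total responses > r.
# The boundaries below are illustrative only, not an optimized Simon design.
from scipy.stats import binom

def two_stage_oc(p, n1, r1, n, r):
    pet = binom.cdf(r1, n1, p)                 # probability of stopping after stage 1
    p_reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):           # stage-1 outcomes that continue
        need = r + 1 - x1                      # stage-2 responses still required
        tail = 1.0 if need <= 0 else binom.sf(need - 1, n - n1, p)
        p_reject += binom.pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * (n - n1)
    return pet, p_reject, expected_n

for label, p in [("H0 (p = 0.25)", 0.25), ("H1 (p = 0.40)", 0.40)]:
    pet, p_reject, en = two_stage_oc(p, n1=20, r1=5, n=40, r=13)
    print(f"Under {label}: P(stop early) = {pet:.2f}, "
          f"P(declare promising) = {p_reject:.2f}, E[N] = {en:.1f}")
```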
Continual Reassessment Method
• Phase I trial design
• “Standard” Phase I trials (in oncology) use what is often called the ‘3+3’ design:
  Treat 3 patients at dose K.
  1. If 0 patients experience dose-limiting toxicity (DLT), escalate to dose K+1.
  2. If 2 or more patients experience DLT, de-escalate to level K-1.
  3. If 1 patient experiences DLT, treat 3 more patients at dose level K.
     A. If 1 of 6 experiences DLT, escalate to dose level K+1.
     B. If 2 or more of 6 experience DLT, de-escalate to level K-1.
• The maximum tolerated dose (MTD) is considered the highest dose at which 1 or 0 out of six patients experiences DLT.
• Doses need to be pre-specified.
• Confidence in the MTD is usually poor.

Continual Reassessment Method
• Allows statistical modeling of the optimal dose: the dose-response relationship is assumed to behave in a certain way
• Can be based on a “safety” or “efficacy” outcome (or both).
• The design aims to determine an optimal dose, given a desired toxicity or efficacy level, and does so in an efficient way.
• This design REALLY requires a statistician throughout the trial.
• Example: Phase I/II trial of Samarium 153 in High Risk Osteogenic Sarcoma (Schwartz)

CRM history in brief
• Originally devised by O’Quigley, Pepe and Fisher (1990), where the dose for the next patient was determined based on the responses of patients previously treated in the trial
• Due to safety concerns, several authors developed variants:
  – Modified CRM (Goodman et al., 1995)
  – Extended CRM [2 stage] (Moller, 1995)
  – Restricted CRM (Moller, 1995)
  – and others …

Basic Idea of CRM
• Assume a one-parameter dose-toxicity model:
  p(toxicity | dose d_i) = exp(3 + a·d_i) / (1 + exp(3 + a·d_i))

Modified CRM (Goodman, Zahurak, and Piantadosi, Statistics in Medicine, 1995)
• Carry-overs from the standard CRM:
  – A mathematical dose-toxicity model must be assumed:
    p(toxicity | dose d_i) = exp(3 + a·d_i) / (1 + exp(3 + a·d_i))
  – To do this, we need to think about the dose-response curve and create a preliminary model.
  – We CHOOSE the level of toxicity that we desire for the MTD (p = 0.30).
  – At the end of the trial, we can estimate a dose-response curve.
  – A “prior distribution” for the model parameter (a mathematical subtlety)

Modified CRM by Goodman, Zahurak, and Piantadosi (Statistics in Medicine, 1995)
• Modifications by Goodman et al.:
  – Use a ‘standard’ dose-escalation scheme until the first toxicity is observed:
    • Choose cohort sizes of 1, 2, or 3
    • Use the standard ‘3+3’ design (or, in this case, ‘2+2’)
  – Upon the first toxicity, fit the dose-response model using the observed data:
    • Estimate the model parameter a
    • Find the dose that is closest to a toxicity of 0.3
  – Does not allow escalation to increase by more than one dose level.
  – De-escalation can occur by more than one dose level.
  – Dose levels are discrete: the estimated dose must be rounded off to the closest level.

Example
• Samarium study with cohorts of size 2:
  – 2 patients treated at dose 1 with 0 toxicities
  – 2 patients treated at dose 2 with 1 toxicity
• Fit the CRM model using the equation below:
  p(toxicity | dose d_i) = exp(3 + a·d_i) / (1 + exp(3 + a·d_i))
• Estimated a = 0.77
• Estimated dose for the next patient is 2.0
• Use dose level 2 for the next cohort.

Example
• Samarium study with cohorts of size 2:
  – 2 patients treated at dose 1 with no toxicities
  – 4 patients treated at dose 2 with 1 toxicity
• Fit the CRM model using the equation on the earlier slide
• Estimated a = 0.88
• Estimated dose for the next patient is 2.54
• Round up to dose level 3 for the next cohort.
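To show what “fit the CRM model” means in the examples above, here is a minimal sketch of the fitting and dose-selection step. The standardized dose values attached to the levels are hypothetical (in practice they are calibrated from prior toxicity guesses), and a plain maximum-likelihood fit stands in for the Bayesian updating with a prior distribution described by Goodman et al., so the numbers will not reproduce the slides exactly.

```python
# Sketch of the CRM "fit and choose the next dose" step. The standardized dose
# values attached to the levels are hypothetical (normally calibrated from prior
# toxicity guesses), and plain maximum likelihood stands in for the Bayesian
# updating with a prior distribution described by Goodman et al.
import numpy as np
from scipy.optimize import minimize_scalar

def p_tox(d, a):
    # One-parameter logistic dose-toxicity model with fixed intercept 3.
    return np.exp(3 + a * d) / (1 + np.exp(3 + a * d))

std_dose = {1: -6.0, 2: -5.0, 3: -4.0, 4: -3.0}   # hypothetical standardized doses
target = 0.30                                      # desired DLT rate at the MTD

# Data so far: (dose level, DLT indicator) per patient, as in the second example.
patients = [(1, 0), (1, 0), (2, 0), (2, 0), (2, 0), (2, 1)]

def neg_log_lik(a):
    p = np.array([p_tox(std_dose[lvl], a) for lvl, _ in patients])
    y = np.array([tox for _, tox in patients])
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

a_hat = minimize_scalar(neg_log_lik, bounds=(0.01, 5.0), method="bounded").x

# Standardized dose whose predicted toxicity equals the target, then the closest
# discrete level, without escalating more than one level above what was tried.
d_star = (np.log(target / (1 - target)) - 3) / a_hat
closest = min(std_dose, key=lambda lvl: abs(std_dose[lvl] - d_star))
next_level = min(closest, max(lvl for lvl, _ in patients) + 1)
print(f"a_hat = {a_hat:.2f}, target standardized dose = {d_star:.2f}, "
      f"next dose level = {next_level}")
```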
Example
• Samarium study with cohorts of size 2:
  – 2 patients treated at dose 1 with no toxicities
  – 4 patients treated at dose 2 with 1 toxicity
  – 2 patients treated at dose 3 with 1 toxicity
• Fit the CRM model using the equation on the earlier slide
• Estimated a = 0.85
• Estimated dose for the next patient is 2.47
• Use dose level 2 for the next cohort.

Example
• Samarium study with cohorts of size 2:
  – 2 patients treated at dose 1 with no toxicities
  – 6 patients treated at dose 2 with 1 toxicity
  – 2 patients treated at dose 3 with 1 toxicity
• Fit the CRM model using the equation on the earlier slide
• Estimated a = 0.91
• Estimated dose for the next patient is 2.8
• Use dose level 3 for the next cohort.
• Etc.

Longitudinal Studies
• Multiple observations per individual over time
• Sample size calculations are HARD
• But the analysis is also complex
• The standard assumption of “basic” statistical methods and models is that observations are independent
• With longitudinal data, we have “correlated” measures within individuals

Study of Autism in Young Children (Landa)
• Autism is usually diagnosed at age 3
• But there is evidence that there are earlier symptoms that are indicative of autism
• Children at high risk of autism (kids with older autistic siblings) and “controls” were observed at 6 months, 14 months, and 24 months for symptoms (Mullen)
• Prospective study
• Children were diagnosed at 36 months into three groups:
  – ASD (autism-spectrum disorder)
  – LD (learning disabled)
  – Unaffected
• Earlier symptoms were compared to see if certain symptoms could predict diagnosis.

Mullen Subscales: Mean (SD) by diagnostic group

                              ASD (n=23)       LD (n=11)        Unaffected (n=53)
6 Months
  Gross Motor                 51.50 (7.69)     53.86 (6.31)     49.35 (10.35)
  Visual Reception            52.58 (8.94)     47.43 (10.75)    55.18 (10.66)
  Fine Motor                  46.92 (11.86)    36.86 (7.45)     49.54 (10.76)
  Receptive Language          50.75 (8.21)     44.86 (8.95)     50.18 (7.26)
  Expressive Language         47.25 (10.14)    44.57 (4.57)     43.98 (6.70)
  Early Learning Composite    100.67 (12.75)   87.29 (9.16)     99.59 (10.10)
14 Months
  Gross Motor                 46.91 (12.34)    52.91 (10.38)    58.16 (10.52)
  Visual Reception            48.39 (10.95)    51.00 (9.32)     54.73 (9.03)
  Fine Motor                  50.48 (10.44)    52.82 (9.40)     57.41 (7.28)
  Receptive Language          34.70 (13.38)    39.64 (5.95)     52.59 (12.26)
  Expressive Language         39.04 (15.14)    47.00 (7.64)     52.02 (11.33)
  Early Learning Composite    87.39 (19.97)    95.55 (10.68)    108.27 (13.89)
24 Months
  Gross Motor                 35.43 (8.69)     49.18 (11.04)    52.20 (10.97)
  Visual Reception            43.26 (10.98)    48.91 (11.59)    56.73 (10.48)
  Fine Motor                  36.04 (14.17)    48.91 (7.97)     52.78 (11.07)
  Receptive Language          35.74 (15.25)    42.73 (11.31)    59.22 (10.74)
  Expressive Language         36.65 (15.31)    45.27 (12.03)    60.14 (12.15)
  Early Learning Composite    78.43 (21.68)    93.73 (14.86)    114.98 (15.89)

Statistical modeling can help!
• The previous table was hard to draw conclusions from
• Each time point was analyzed separately, and within time points, groups were compared.
• Use some reasonable assumptions to help interpretation:
  – Children have “growth trajectories” that are continuous and smooth
  – Observations within the same child are correlated.
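A minimal sketch of the modeling idea just described: a linear mixed-effects model in which a random intercept per child captures the within-child correlation and an age-by-group interaction compares average trajectories. The data are simulated and all column names are hypothetical; the published analysis may have used a different model and software.

```python
# Sketch: compare growth trajectories with a linear mixed-effects model.
# A random intercept per child captures the correlation of repeated scores
# within a child; the age-by-group interaction asks whether average
# trajectories differ by diagnostic group. Data and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
groups = {"ASD": 23, "LD": 11, "Unaffected": 53}
slope = {"ASD": -0.6, "LD": 0.0, "Unaffected": 0.4}   # made-up group trends

rows, child = [], 0
for grp, n in groups.items():
    for _ in range(n):
        child += 1
        level = 50 + rng.normal(0, 8)                 # child-specific starting level
        for age in (6, 14, 24):
            rows.append({
                "child_id": child,
                "group": grp,
                "age_months": age,
                "receptive_language": level + slope[grp] * (age - 6) + rng.normal(0, 5),
            })
df = pd.DataFrame(rows)

# Random intercept per child; a random slope could be added via re_formula.
model = smf.mixedlm("receptive_language ~ age_months * group",
                    data=df, groups=df["child_id"])
print(model.fit().summary())
```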
Conclusions are much easier
• The models used to make the graphs are somewhat complicated
• But they are “behind the scenes”
• The important information is presented clearly and concisely
• The ANOVA approach does not “summarize” the data

Concluding Remarks
• Involve your statistician as soon as you begin to plan your study
• Things statisticians do not like:
  – Being contacted a few days before a grant/protocol/proposal is due
  – Rewriting inappropriate statistical sections
  – Analyzing data from a poorly designed trial
• Statisticians have a lot to add:
  – A “fresh” perspective on your study
  – A more efficient study!