Statistics for Health Research
What size of trial do I need?
Peter T. Donnan
Professor of Epidemiology and Biostatistics, Co-Director of TCTU

What size of study do I need? 10 or 10,000?

Answer
•As large as possible!
•Data are information, so the more data the sounder the conclusions
•In the real world, data are limited by resources: access to patients, money, time, etc.

What size of study do I need?
Expand the question: What size of study do I need to answer the question posed, given the size of my practice/clinic, or the number of samples, and given the amount of resources (time and money) I have to collect the information?

What is the question?
•Trial is comparative: new drugs (CTIMP), management of patients, etc.
•Efficacy
•Equivalence
•Non-inferiority

Why bother?
1. You will not get your study past ethics!
2. You will not get your proposal past a statistical review by funders!
3. It will be difficult to publish your results!

Why bother?
•Is the study feasible?
•Is the likely sample size enough to show meaningful differences with statistical significance?
•Does the number planned give enough power, or is a larger number needed?

OBJECTIVES
•Understand the issues involved in estimating sample size
•Sample size is dependent on design and type of analysis
•Parameters needed for sample size estimation

OBJECTIVES
•Understand what is necessary to carry out some simple sample size calculations
•Carry out these calculations with software
•Note: SPSS does not yet have a sample size calculator

What is the measure of outcome?
•Difference in, or change in: scores, physiological measures (BP, cholesterol), QOL, hospitalisations, mortality, etc.
•Choose a PRIMARY OUTCOME
•A number of secondary outcomes, but not too many!
Intervention – Randomised Controlled Trial
1) Randomisation by patient – RCT, crossover trial
2) Randomisation by practice (cluster randomisation)

RANDOMISED CONTROLLED TRIAL (RCT)
Gold standard method to assess efficacy of a treatment

RANDOMISED CONTROLLED TRIAL (RCT)
Random allocation to intervention or control, so all factors affecting outcome are likely to be balanced
Hence any difference in outcome is ‘caused’ by the intervention

Randomised Controlled Trial
Eligible subjects → RANDOMISED → Intervention / Control

INTERVENTION: To improve patient care and/or efficiency of care delivery
•new drug/therapy
•patient education
•health professional education
•organisational change

Example
Evaluate the cost-effectiveness of a new statin
RCT of new statin vs. old
Randomise eligible individuals to receive either the new statin or the old statin

Eligible subjects
Evaluate the cost-effectiveness of the new statin on: Men? Aged over 50? Cardiovascular disease? Previous MI?
Requires precise INCLUSION and EXCLUSION criteria in the protocol

WHAT IS THE OUTCOME?
•Improvement in patients’ health
•Reduction in CV hospitalisations
•More explicitly, a greater reduction in mean lipid levels in those receiving the new statin compared with the old statin
•Reduction in costs

Effect size?
Sounds a bit chicken and egg!
Likely size of effect: what is the minimum effect size you will accept as being clinically or scientifically meaningful?

Effect size?
Change in percentage with total cholesterol < 5 mmol/l:

  New   Old   Difference
  40%   20%   20%
  30%   20%   10%
  25%   20%    5%

Variability of effect?
Variability of the size of effect:
Obtained from previous published studies, and/or
Obtained from pilot work prior to the main study

Variability of effect?
For a comparison of two proportions, the variability of the size of effect depends on:
1) the size of the study, and
2) the size of the proportions or percentages

How many subjects?
•1) Likely size of effect
•2) Variability of effect
•3) Statistical significance level
•4) Power
•5) 1 or 2-sided tests

Statistical significance or type I error
Type I error – rejecting the null hypothesis when it is true: false positive (Prob = α)
Generally use the 5% level (α = 0.05), i.e. accept evidence that the null hypothesis is unlikely when p < 0.05
May decrease this for multiple testing, e.g. with 10 tests accept p < 0.005

1 or 2-sided?
Generally use 2-sided significance tests, unless there is a VERY strong belief that one treatment could not be worse than the other, e.g. the weakest NSAID compared with a new Cox-2 NSAID

How many subjects?
•1) Likely size of effect
•2) Variability of effect
•3) Statistical significance level
•4) Power
•5) 1 or 2-sided tests

Power and type II error
Type II error (false negative): not rejecting the null hypothesis (non-significance) when it is false
Probability of type II error = β
Power = 1 − β, typically 80%

Type I and Type II errors
Analogy with sensitivity and specificity:

  Error         Prob.   Meaning          Screening analogue
  Type I        α       False positive   1 − specificity
  Type II       β       False negative   1 − sensitivity

Power
Acceptable power: 70% – 99%
If sample size is not a problem, go for 90% or 95% power;
if sample size could be problematic, go for lower but still sufficient power, e.g. 80%

Power
In some studies there is a finite limit on the possible size of the study; the question is then rephrased:
What likely effect size will I be able to detect given a fixed sample size?

How many subjects?
•1) Likely size of effect
•2) Variability of effect
•3) Statistical significance level
•4) Power
•5) 1 or 2-sided tests

Sample size for a difference in two proportions
The number needed for the comparison depends on the statistical test used
For a comparison of two proportions or percentages, use the chi-squared (χ²) test

Comparison of two proportions
Number in each arm:

  n = (zα + zβ)² × [ p1(100 − p1) + p2(100 − p2) ] / (p1 − p2)²

where p1 and p2 are the percentages in group 1 and group 2 respectively

Assume 90% power and 5% statistical significance (2-sided):

  n = 10.507 × [ p1(100 − p1) + p2(100 − p2) ] / (p1 − p2)²

zα = 1.96 (5% significance level, 2-sided) and zβ = 1.28 (90% power) are obtained from the Normal distribution, giving (zα + zβ)² ≈ 10.507

Assume 40% reach the lipid target on the new statin and 20% on the old drug:

  n = 10.507 × [ (40 × 60) + (20 × 80) ] / (40 − 20)²

Number in each arm = 105, Total = 210

Comparison of two proportions
Repeat for different effects:

  New   Old   Difference   n      Total
  40%   20%   20%          105    210
  30%   20%   10%          472    944
  25%   20%    5%          1964   3928

N.B. Halving the effect size increases the sample size by a factor of roughly 4!

[Figure: Increase in sample size with decrease in difference – two-group χ² test of equal proportions (odds ratio = 1, equal n’s), α = 0.050 (2-sided), p1 = 0.400, power = 90%; n per group rises steeply as the group 2 proportion approaches 0.40]

[Figure: Increase in power with sample size – two-group χ² test of equal proportions (odds ratio = 1, equal n’s), α = 0.050 (2-sided), p1 = 0.400, p2 = 0.300; power rises with sample size per group]

Comparison of two means has a similar formula
Number in each arm:

  n = 2 (zα + zβ)² σ² / (x̄1 − x̄2)²

where x̄1 and x̄2 are the means in group 1 and group 2 respectively, and σ is the assumed standard deviation

Allowing for loss to follow-up / non-compliance
The number estimated for statistical purposes may need to be inflated if it is likely that a proportion will be lost to follow-up
For example, if you know approx.
20% will drop out, inflate the sample size by a factor of 1/(1 − 0.2) = 1.25

Software and other sample size estimation
The formula depends on the nature of the outcome and the likely statistical test
Numerous texts provide sample size tables and formulae
Software – nQuery Advisor®

SUMMARY
In planning, consider design, type of intervention, outcomes, sample size, power, and ethics together at the design stage

SUMMARY
Sample size follows from the type of analysis, which follows from the design
Invaluable information is gained from pilot work, and a piloted study is also more likely to be funded (CSO)

SUMMARY
A pilot also gives information on the recruitment rate
You may need to inflate the sample size due to:
Loss to follow-up / drop-out
Low compliance

Remember the checklist
•1) Likely size of effect
•2) Variability of effect
•3) Statistical significance level
•4) Power
•5) 1 or 2-sided tests

SUMMARY
Remember, in scientific research: Size Matters
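The formulas above can be sketched as a short program. This is a minimal illustration under the slides’ assumptions (90% power, 5% two-sided significance), not the nQuery Advisor implementation: the function names are my own, and the worked numbers for the two-means case (a 0.5 mmol/l difference with SD 1.0) are hypothetical examples, not from the slides. Only the Python standard library is used.

```python
# Sample size sketches for the two-proportion and two-mean formulas,
# with the drop-out inflation factor. Standard library only.
import math
from statistics import NormalDist


def z_values(alpha=0.05, power=0.90, two_sided=True):
    """z_alpha and z_beta from the Normal distribution (e.g. 1.96 and 1.28)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - alpha / 2) if two_sided else nd.inv_cdf(1 - alpha)
    z_b = nd.inv_cdf(power)
    return z_a, z_b


def n_per_arm_proportions(p1, p2, alpha=0.05, power=0.90):
    """n per arm for comparing two percentages p1, p2 (on a 0-100 scale):
    n = (z_a + z_b)^2 * [p1(100-p1) + p2(100-p2)] / (p1 - p2)^2
    """
    z_a, z_b = z_values(alpha, power)
    numerator = (z_a + z_b) ** 2 * (p1 * (100 - p1) + p2 * (100 - p2))
    return numerator / (p1 - p2) ** 2


def n_per_arm_means(diff, sd, alpha=0.05, power=0.90):
    """n per arm for comparing two means: n = 2 (z_a + z_b)^2 sigma^2 / diff^2."""
    z_a, z_b = z_values(alpha, power)
    return 2 * (z_a + z_b) ** 2 * sd ** 2 / diff ** 2


def inflate_for_dropout(n, dropout):
    """Inflate n by 1 / (1 - dropout) to allow for loss to follow-up."""
    return math.ceil(n / (1 - dropout))


# Statin example from the slides: 40% vs 20% reaching the lipid target,
# 90% power, 5% two-sided significance -> about 105 per arm, 210 in total.
n = n_per_arm_proportions(40, 20)
print(round(n))                       # -> 105
print(inflate_for_dropout(n, 0.20))   # -> 132 allowing for 20% drop-out

# Hypothetical means example: detect a 0.5 mmol/l difference, SD 1.0.
print(round(n_per_arm_means(0.5, 1.0)))
```

Note that the simple (uncorrected) formula reproduces the slides’ 105-per-arm figure for the 40% vs 20% comparison; dedicated software such as nQuery may apply continuity corrections and give somewhat larger numbers for small differences.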