Sample Size Estimation, Power Analysis, and Minimal Detectable Difference Calculations
John Sorkin, M.D., Ph.D.
Chief of Biostatistics and Informatics
Baltimore VA GRECC and University of Maryland School of Medicine
Claude D. Pepper OAIC
7/29/2008

John Sorkin M.D. Ph.D.
• Internal Medicine
• Endocrinology
• Gerontology
• Geriatrics
• Ph.D. (Epidemiology)
  – Out of the closet statistician

What Motivates This Session?
• Sample size, power, and minimal detectable difference calculations play a role in
  – Designing studies
  – IRB applications
  – Grant applications
  – Evaluating study results

© 2001 John Sorkin. Do not use without permission

Aim
• Describe the concepts that must be understood when a
  – sample size,
  – power, or
  – minimal detectable difference calculation is performed.
• This presentation is not intended to be mathematically rigorous.

Sample Size
• The number of subjects that need to be studied if a significant difference between treatment and control groups is to be shown
  – Assuming a known treatment effect
• Sample size analyses help determine the cost of a study.

Power Analyses
• The probability that a treatment effect will be correctly identified if one exists
  – Assuming a known sample size
• Power analyses help determine a study's probability of success, i.e., the probability that a proposed study will find a treatment effect.

Minimal Detectable Difference
• The smallest treatment effect that can be identified.
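The three calculations defined above are one relationship solved for different unknowns. As a sketch only, assuming two equal groups of n subjects, normally distributed measurements with a common known SD σ, a true difference δ between the post-treatment mean (Ha) and the pre-treatment mean (H0), a two-sided significance α, and power 1-β (Φ is the standard normal cumulative distribution function and z_p its p-th quantile):

```latex
\begin{aligned}
n &= \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^{2}\,\sigma^{2}}{\delta^{2}}
  && \text{sample size per group}\\
1-\beta &\approx \Phi\!\left(\frac{\delta}{\sigma\sqrt{2/n}} - z_{1-\alpha/2}\right)
  && \text{power}\\
\delta_{\min} &= (z_{1-\alpha/2} + z_{1-\beta})\,\sigma\sqrt{2/n}
  && \text{minimal detectable difference}
\end{aligned}
```

Exact methods based on the t distribution give somewhat larger sample sizes, lower power, and larger detectable differences when groups are small.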
Primum non Nocere
• Minimal detectable difference analyses (assuming a known sample size) help determine the feasibility of a study.
• It is unethical to perform any experiment that puts subjects at risk for injury or other harm if the study as designed has no hope of answering the question the study was designed to address.

Basic Concepts
• Study design
• Mean value of the characteristic
• Distribution of the measurements
• Variability of the measurements (SD)
• Statistical significance (α error rate)
• Number of subjects per group
• Effect size
• One or two-sided test
• Power (1-β)

Study Design
• Single sample
• Two samples
• Multiple samples

Single Sample
• A characteristic is measured in a group. The group receives a treatment and the characteristic is measured again.

Single Sample – Example
• Serum cholesterol concentration is measured in a single group of subjects. The subjects receive a drug, and their cholesterol concentration is measured again.

Two Samples
• A characteristic is measured in a control group and an experimental group. The experimental group receives a treatment; the control group receives a placebo. The characteristic is measured in both groups after treatment.

Two Samples – Example
• Pulse rate is measured in a control group and an experimental group. The experimental group receives a pill containing caffeine; the control group receives a sugar pill. Pulse rate is measured in both groups after treatment.

Multiple Samples
• A characteristic is measured in a control group and several experimental groups. The experimental groups receive different treatments; the control group receives a placebo. The characteristic is measured in all groups after the treatments.

The Mean Value of the Characteristic Being Studied
• Also called
  – Null hypothesis or
  – H0.
• Source of estimates
  – Pilot study
  – Literature review
  – Prior knowledge.

[Figure: H0, the null hypothesis (probability vs. x)]

Distribution of the Measurements of the Characteristic
• Normal
• Poisson
• Binomial
• Exponential

[Figure: Distribution of the measurements of the characteristic (probability vs. x)]

Variability of the Measurements
• Quantified by the standard deviation (SD) of the characteristic.
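Why the SD matters so much: under the usual normal approximation for two equal groups (an assumption of this sketch, not a formula given on the slides), the number of subjects per group is proportional to (SD / difference)², so doubling the SD roughly quadruples the sample size. The example uses the HDL2 difference of 4 mg/dl from Problem 1; the helper name `raw_n_per_group` is hypothetical.

```python
from statistics import NormalDist


def raw_n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Unrounded subjects per group for a two-sided comparison of two means
    (normal approximation, common known SD)."""
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.80
    return 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2


for sd in (2.5, 5.0):
    # SD 2.5 -> about 6.1 per group; SD 5.0 -> about 24.5 per group
    print(sd, round(raw_n_per_group(sd, delta=4), 2))
```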
[Figure: Variability of the measurements: SD (two panels, probability vs. x)]

Significance
• Also known as:
  – Probability of a Type-I error, α.
  – The probability of incorrectly saying there is a treatment effect when there is none.

Significance: Probability of saying there is a treatment effect when there is none
• Absolute truth (H0: treatment has no effect) vs. what you state:
  – H0 is true, but you state it is false: α
• The probability of saying there is a treatment effect when there is none
[Figure: distribution under H0 (probability vs. x)]

Number of Subjects in Each Group
• Single sample
  – Only one number needed
• Two or more samples
  – Sample size for each group
  – Groups need not be the same size.

Effect Size
• Increase (or decrease) in the mean brought about by treatment.
• Also known as
  – Alternative hypothesis, post-treatment mean, Ha.
• Source of estimates
  – Pilot study
  – Literature review
  – Prior knowledge.
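The Significance slides define α as the probability of saying there is a treatment effect when there is none. A small Monte Carlo sketch (an illustration, not part of the original slides) makes that concrete: it simulates many two-group trials in which the treatment truly has no effect, using the HDL2 figures from Problem 1 (mean 10 mg/dl, SD 2.5 mg/dl) with 8 subjects per group, and counts how often a two-sided z test with known SD rejects H0. The rejection rate should land close to α = 0.05. The function name and its defaults are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(n=8, mean=10.0, sd=2.5, alpha=0.05, trials=20_000, seed=1):
    """Fraction of simulated null trials in which a two-sided z test rejects H0."""
    rng = random.Random(seed)                     # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se = sd * math.sqrt(2 / n)                    # SD treated as known, so z test is exact
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(mean, sd) for _ in range(n)]
        treated = [rng.gauss(mean, sd) for _ in range(n)]  # H0 true: same mean
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1                       # a false claim of treatment effect
    return rejections / trials


print(type_i_error_rate())  # close to 0.05
```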
[Figure: Ha, the post-treatment mean (probability vs. x)]
[Figure: Distribution of measurements resulting in Ha: SDa (probability vs. x)]
[Figure: Distribution of H0 and Ha (probability vs. x)]

One or Two-Sided Test
• Two-sided
  – Can the intervention both increase and decrease the pre-treatment mean?
• One-sided
  – Does the intervention only increase the mean?
  – Does the intervention only decrease the mean?
• Use one-sided tests infrequently!
  – Consider the recent estrogen supplementation study

Power
• Also known as
  – 1-β.
• The probability of correctly saying there is a treatment effect when there is a treatment effect.
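To see why a one-sided test is tempting, and why the slides caution against it, this sketch compares critical values and approximate power for the two choices. It borrows the HDL2 numbers from Problem 1 (SD 2.5 mg/dl, difference 4 mg/dl) with 8 subjects per group, and assumes a z test with known SD; `power_at_n` is a hypothetical helper, not a function from any package named on the slides.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_at_n(sd, delta, n, alpha=0.05, two_sided=True):
    """Approximate power of a z test comparing two groups of n subjects each.

    The far tail of a two-sided test is ignored, a standard simplification
    when the true effect is not close to zero.
    """
    tail = alpha / 2 if two_sided else alpha
    z_crit = Z.inv_cdf(1 - tail)  # critical value that must be exceeded
    se = sd * math.sqrt(2 / n)    # standard error of the difference in means
    return Z.cdf(delta / se - z_crit)


print(round(Z.inv_cdf(1 - 0.05 / 2), 3))                 # two-sided critical value: 1.96
print(round(Z.inv_cdf(1 - 0.05), 3))                     # one-sided critical value: 1.645
print(round(power_at_n(2.5, 4, 8), 3))                   # two-sided power, about 0.89
print(round(power_at_n(2.5, 4, 8, two_sided=False), 3))  # one-sided power, about 0.94
```

The one-sided test looks more powerful only because it ignores the possibility that the drug moves the mean in the opposite direction, which is exactly the risk the estrogen example warns about.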
Power: Probability of saying there is a treatment effect when there is a treatment effect
• Absolute truth (H0: treatment has no effect) vs. what you state:
  – H0 is true, but you state it is false: α
  – H0 is false, but you state it is true: β
  – H0 is false, and you state it is false: 1-β
• The probability of saying there is a treatment effect when there is a treatment effect
[Figure: distributions under H0 and Ha (probability vs. x)]

Software
• DSTPLAN (http://odin.mdacc.tmc.edu/anonftp/)
  – FREE! Excellent documentation.
  – Runs on Macintosh, DOS, and Windows
• Analyst application in SAS
• S-Plus
• PASS 2000
  – NCSS 2000

Books
• Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey, 1988.
• Mace, AE. Sample-Size Determination. Robert E. Krieger Pub Co., Huntington, New York, 1974.

References: Structural Equation Modeling
• MacCallum RC, Brown MW, Sugawara HM. Power analysis and determination of sample size for covariance structure equation modeling. Psychological Methods 1(2), 130-149.
• MacCallum RC, Brown MW, Hong S. Power analysis in covariance structure equation modeling using GFI and AGFI. Multivariate Behavioral Research 32(2), 193-210.

Problem 1 – Sample Size
• A clinical trial of a drug is planned in which half of the subjects will receive an active drug and the other half will receive a placebo.
• The mean HDL2 cholesterol concentration in the population is 10±2.5 mg/dl (mean ± SD).

Problem 1 (cont)
• A pilot study indicates that the drug increases mean HDL2 concentration to 14 mg/dl.
• How many subjects need to be studied in each group assuming a significance (α) of 0.05 and a power (1-β) of 0.80? Assume HDL2 concentration is normally distributed on repeated testing.
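Problem 1, and Problems 2 and 3 that follow, can be worked with normal-approximation formulas for two equal groups sharing a common SD. The sketch below illustrates that approach under those assumptions; it is not the method used by the software listed above, and all function names are hypothetical. Because it uses z rather than t quantiles, it slightly understates the sample size for small groups, so it offers an optional approximate small-sample allowance (adding z²/4 per group, an assumption of this sketch). It also includes the inflation for loss to follow-up recommended in the Suggestions.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def critical_z(alpha, two_sided=True):
    """z value the test statistic must exceed to be declared significant."""
    return Z.inv_cdf(1 - (alpha / 2 if two_sided else alpha))


def n_per_group(sd, delta, alpha=0.05, power=0.80, two_sided=True, t_correction=False):
    """Subjects needed in EACH of two equal groups to detect a mean difference delta."""
    z_alpha, z_beta = critical_z(alpha, two_sided), Z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    if t_correction:
        # Approximate allowance for analysing with a t test instead of a z test.
        n += z_alpha ** 2 / 4
    return math.ceil(n)


def power_two_groups(sd, delta, n, alpha=0.05, two_sided=True):
    """Approximate power with n subjects per group (far tail ignored)."""
    se = sd * math.sqrt(2 / n)  # standard error of the difference in means
    return Z.cdf(delta / se - critical_z(alpha, two_sided))


def minimal_detectable_difference(sd, n, alpha=0.05, power=0.80, two_sided=True):
    """Smallest true mean difference detectable with n subjects per group."""
    z_alpha, z_beta = critical_z(alpha, two_sided), Z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * math.sqrt(2 / n)


def enroll_for_dropout(completers, dropout_rate):
    """Subjects to enroll per group so that `completers` are expected to finish."""
    # The tiny tolerance keeps floating-point noise from adding a whole subject.
    return math.ceil(completers / (1 - dropout_rate) - 1e-9)


print(n_per_group(2.5, 4))                              # Problem 1: 7 using z alone
print(n_per_group(2.5, 4, t_correction=True))           # 8, matching the write-up's figure
print(enroll_for_dropout(8, 0.50))                      # 16 enrolled per group
print(round(power_two_groups(2.5, 4, 10), 3))           # Problem 2: about 0.947
print(round(minimal_detectable_difference(2.5, 5), 2))  # Problem 3: about 4.43 mg/dl
```

With groups this small, exact t-based calculations would report somewhat lower power for Problem 2 and a somewhat larger minimal detectable difference for Problem 3.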
Describing the Power Analysis
• Prior data from our laboratory indicate that the mean HDL2 concentration in our population is 10±2.5 mg/dl (mean ± SD) and that the drug increases HDL2 concentration by an average of 4 mg/dl, to 14 mg/dl.

Describing the Power Analysis (cont)
• Sample size was calculated for a two-tailed comparison of an experimental group to a control group with a significance (α) of 0.05 and a power (1-β) of 0.80. Under these assumptions, we calculate that we will need 8 subjects per group to complete the study. Assuming a 50% drop-out rate, we will enroll 16 subjects per group.

Problem 2 – Power
• A clinical trial of a drug is planned in which half of the subjects will receive an active drug and the other half will receive a placebo.
• The mean HDL2 cholesterol concentration in the population is 10±2.5 mg/dl (mean ± SD).

Problem 2 – Power (cont)
• A pilot study indicates that the drug increases mean HDL2 concentration to 14 mg/dl.
• Ten subjects will be enrolled in the control group and ten subjects in the experimental group.
• What is the power of the study (1-β) assuming a significance (α) of 0.05? Assume HDL2 concentration is normally distributed on repeated testing.

Problem 3 – Minimal Detectable Difference
• A clinical trial of a drug is planned in which half of the subjects will receive an active drug and the other half will receive a placebo.
• The mean HDL2 cholesterol concentration in the population is known to be 10±2.5 mg/dl (mean ± SD).

Problem 3 (cont)
• Five subjects will be enrolled in the control group and five subjects in the experimental group.
• What is the minimal increase in HDL2 concentration that can be detected assuming a significance (α) of 0.05 and a power (1-β) of 0.80? Assume HDL2 concentration is normally distributed on repeated testing.

Suggestions
• Consult a statistician (early and often).
  – Include a statistician in your grant.
• Adjust all calculations for loss to follow-up.
• Avoid one-sided tests.
  – If you use a one-sided test, justify its use!

Suggestions
• Use a significance of 0.05 or better, e.g. 0.01
  – If you use a value >0.05 (e.g. 0.10), justify its use.
• Use a power of 0.80 or better, e.g. 0.90
  – If you use a value <0.80 (e.g. 0.60), justify its use.

How to Contact Me (I Can't Run and I Can't Hide)
John Sorkin, M.D. Ph.D.
University of Maryland School of Medicine
Baltimore VA Medical Center
10 North Greene Street (BT/18/GR)
Baltimore MD 21201
410 605-7119
JSorkin@grecc.umaryland.edu

Response Types: Continuous Outcome
• Analyses done so far assume a continuous outcome (or response)
  – Used to study a continuous variable that can take on a large range of values
    • Serum cholesterol
    • IQ
    • Height

Response Types: Two-State
• Used to study a variable that can take on only one of two states
  – Diseased vs. non-diseased
  – Yes vs. no
  – Dead vs. alive
• Other techniques are needed

Response Types: Two-State Response
• Follow-up time
  – Not important: yes vs. no
  – Important: survival time

Two-State Response: Yes vs. No (Follow-up time not important)
• Follow-up time not of interest
  – Binomial distribution
• Other techniques are needed
  – Not addressed in this presentation.

Two-State Response: Survival Time
• Survival in two groups
  – Treated vs. untreated
  – Follow-up (i.e., survival) time of primary interest.
• Survival time
  – Time to death
  – Time to failure

Survival: Yes vs. No (Follow-up time important)
• Exponential distribution
  – Assumes constant risk of death
    • Risk is NOT a function of age
  – Can be used where age does not influence survival
    • Short-term survival
    • Survival immediately after treatment
    • Analyses matched by age

Exponential Distribution vs. Proportional Hazards Model
• Does age affect survival?
  – No: exponential distribution
  – Yes: proportional hazards model

Normal Distribution: Requires Two Parameters
• $\text{Probability}(x) = \dfrac{e^{-(x-\mu)^{2}/(2\sigma^{2})}}{\sigma\sqrt{2\pi}}$
• μ = Mean
• σ = Standard deviation
[Figure: Normal distribution, probability vs. value (x from -30 to 30), mean 0, SD 5]

Normal Distribution: Standard Deviation Defines Spread
[Figure: probability vs. value for mean 0, SD 5 and mean 0, SD 10]

Normal Distribution: Mean Defines Location
[Figure: probability vs. value for mean -10, SD 5; mean 0, SD 5; and mean 0, SD 10]

Exponential Distribution: Requires One Parameter
• $\text{MortalityRisk}(t) = \lambda e^{-\lambda t}$ (the probability density of the time of death; the risk of death per unit time is the constant λ)
• 1/λ = Mean
• 1/λ = Standard deviation (1/λ² is the variance)
• ln 2/λ ≈ 0.693/λ = Median
[Figure: Exponential distribution, mortality risk vs. time (0 to 4), for λ = 1 and λ = 0.5]

Basic Concepts for Survival
• Study design
• Mean value of the characteristic
  – Mean (or median) survival
• Distribution of the measurements
  – Exponential
• Variability of the measurements (SD)
• Statistical significance (α error rate)
• Number of subjects per group
  – Accrual time and rate
• Effect size
  – Alternative survival (mean or median)
• One or two-sided test
• Power (1-β)
• Follow-up time

Follow-up Study: Accrual and Follow-up Time
[Figure: accrual and follow-up time in a follow-up study]

Survival Time: Taking Age Into Account
• Used for Cox proportional hazards regression
• Simulations
  – Hard to do
  – Take age into account

Problem 4 – Sample Size
• A clinical trial of a drug is planned in which half of the subjects will receive an active drug and the other half will receive a placebo.
• Mean survival in untreated subjects is known to be 2 months and has been successfully modeled assuming mortality risk follows a pattern of exponential decay.

Problem 4 (cont)
• A pilot study indicates that the drug increases mean survival to four months.
• How many subjects need to be studied in each group assuming a significance (α) of 0.05 and a power (1-β) of 0.80?
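A rough check on Problem 4, assuming the exponential model above: the constant hazard is λ = 1/mean, so mean survival of 2 and 4 months corresponds to hazards of 0.5 and 0.25 per month, a hazard ratio of 2. A widely used approximation (Schoenfeld's formula for a log-rank or exponential comparison with 1:1 allocation) gives the total number of deaths needed as 4(z_{1-α/2} + z_{1-β})² / (ln HR)². This is a swapped-in method, not necessarily the one behind the slides' answer: it counts deaths rather than enrolled subjects and ignores accrual, so it need not reproduce the 90 subjects quoted in the write-up. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def exponential_summary(mean_survival):
    """Constant hazard, SD, and median implied by an exponential mean survival."""
    hazard = 1 / mean_survival           # lambda
    return {
        "hazard": hazard,
        "sd": 1 / hazard,                # SD equals the mean for the exponential
        "median": math.log(2) / hazard,  # ln 2 / lambda, about 0.693 / lambda
    }


def deaths_required(mean_control, mean_treated, alpha=0.05, power=0.80):
    """Total deaths needed (both groups, 1:1 allocation), Schoenfeld-type approximation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = z.inv_cdf(power)
    hazard_ratio = (exponential_summary(mean_control)["hazard"]
                    / exponential_summary(mean_treated)["hazard"])
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


for months in (2, 4):
    print(months, exponential_summary(months))
print(deaths_required(2, 4))  # 66 deaths in total, about 33 per group
```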
Describing the Power Analysis
• We performed sample size calculations under the assumption that the risk of mortality in the control and treatment groups follows exponential decay. Our calculations assumed a Type I error rate (α) of 5% and a power (1-β) of 80%.
• Clinical studies have shown an average pretreatment survival of 2 months. Based on preliminary studies in our laboratory, we believe that our new treatment will result in an average post-treatment survival of 4 months.

Describing the Power Analysis (cont.)
• Based on the assumptions noted above, we will need to recruit and enroll a total of 90 subjects (45 control subjects and 45 who will receive our new drug) over a one-year period, and then follow subjects for four years beyond the one-year recruiting period. Given the short duration of the study and the universal need for medical follow-up in these patients, we anticipate 100% follow-up.

Power, Sample Size, Minimum Detectable Difference
07/09/2005: Evaluation of the presentation
(Circle a value that indicates your evaluation of the presentation)
Best ever 10 --- 9 --- 8 --- 7 --- 6 --- 5 --- 4 --- 3 --- 2 --- 1 Worst ever
What I like most about the presentation:
What I like least about the presentation:
Suggestions:
Questions (if you give me your name, I will try to find you and answer your question):