Robert Ploutz-Snyder, Ph.D.
Biostatistician, NASA JSC
USRA / Division of Space Life Sciences
Robert.Ploutz-Snyder-1@nasa.gov
Biostats Lab

Overview
• Review key terms & concepts relevant to sample size considerations
• Demonstrate an easily accessible method for calculating sample size requirements
• Additional strategies & topics

Key Terms & Concepts / Demonstration / Additional Strategies

Primary Outcome(s)
• Primary outcome variable (a.k.a. dependent variable)
  • Is it continuous or discrete?
  • How is it distributed in the population?
  • Do you have any pilot data with this outcome?
  • Is it sensitive enough to detect the effects you are looking for?
    ○ Precision, Reliability, Validity
• Secondary outcomes
  • Do you really need 30 different "primary" outcomes?

Effect
• "Effect": what are you trying to observe happen? What are you hypothesizing will change, or be different?
• Pre vs. Post changes in Y within group
    ○ Main Effect for Time
    ○ "We anticipate subjects' coordination to decrease in response to simulated microgravity, with significant differences observed Post, relative to Pre."
• Mean outcome of Y in Treatment A vs. Treatment B
    ○ Main Effect for Treatment
    ○ "We expect participants randomized to receive our novel intervention to report lower mean back pain ratings relative to participants randomized to receive standard care."
• Pre/Post changes in Y for one group relative to another
    ○ Time x Treatment Interaction Effect
    ○ "We hypothesize that our novel intervention will result in a reduction in the amount of strength loss over time, relative to controls who receive the intervention typically delivered to LD Astronauts."

Effect (cont.)
• Note that "statistical effect" (ex.
main effect, interaction effect) ≠ hypothesized effect
• However, merit review panels tend to respond well to descriptions of your anticipated Effect in terms of:
  • The outcome variable of interest
  • The experimental design
  • The statistical effect for support

Effect Size
• "Effect Size": how big of a change (difference) do you anticipate observing in the proposed study?
  • Measured in the metric of your Outcome variable
  • With a specified standard deviation (σ)
• Typically based on evidence
  • Pilot data collected from your lab
  • Data published by others in a related study
• Even better if consistent with clinically, scientifically, or operationally relevant changes that would be considered impressive

Effect Size (cont.)
• Smaller effects are more difficult to detect
• Variability is an important consideration
• Be realistically optimistic about size estimates
• Does it really matter that you detected a "significant difference" in XYZ, if that amount of difference would not affect …
  • A person's health, safety, performance?
  • The way you'd rehab/treat persons affected "this much (p<.05)" different?
  • Decisions NASA would or could make based on your results (ex. did you have the right control group?)
• Statistical significance isn't the reason we do research

Power
• "Power": the probability of detecting an effect, given the effect size, experimental design & outcome variable(s) you have chosen, and your assumptions regarding Type I (α) and Type II (β) errors.
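To make the link between effect size, variability, sample size and power concrete, here is a rough normal-approximation sketch for a two-sided, one-sample comparison. The symbols Δ (anticipated mean difference), n (number of subjects), Φ (standard normal CDF) and z (standard normal quantile) are notation assumed for this illustration, not taken from the slides, and the tiny contribution of the opposite rejection tail is ignored.

```latex
% Approximation only: an exact calculation would use the noncentral t distribution.
d = \frac{\Delta}{\sigma},
\qquad
\text{Power} = 1 - \beta \;\approx\; \Phi\!\left( d\,\sqrt{n} \;-\; z_{1-\alpha/2} \right)
```

Read this way, a smaller Δ or a larger σ shrinks d, so a larger n is needed to reach the same power, which is the sense in which smaller effects are more difficult to detect.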
Type I & II Errors & Power

  The Truth is:                          H0 Really is True        H0 is Actually False
                                         (there's no effect)      (there is an effect)
  You Rejected H0 Due to a               Wrong Conclusion         Right Conclusion
  Statistically Significant Result
  You Failed to Reject ("Accepted") H0   Right Conclusion         Wrong Conclusion
  Due to a Non-Significant Result

Type I & II Errors & Power (cont.)

  The Truth is:                          H0 Really is True        H0 is Actually False
                                         (there's no effect)      (there is an effect)
  You Rejected H0 Due to a               Type I Error             Power
  Statistically Significant Result       Probability = α          Probability = (1 − β)
  You Failed to Reject ("Accepted") H0   Right Conclusion         Type II Error
  Due to a Non-Significant Result        Probability = 1 − α      Probability = β

What is a Power Analysis?
• Calculations performed when planning a future study that help determine the minimum number of subjects you will need to have a high likelihood of detecting the effect that you expect, given everything discussed thus far.
• Note that if the effect isn't real, no amount of power will help you find it…

Parts of the Calculations
• Anticipated Effect size
• Variation of the outcome (σ)
• Assumed α risk
• Assumed Power
• Sample Size
[Diagram: Alpha, Effect Size & σ, and Power together determine Minimum N]

Other Considerations
• Keep it Simple!
  • Power the PRIMARY outcomes, not all of them
  • Distill your effect into a "t-test"-like comparison if possible
• Be mindful of Experimental Design issues that can increase power
• Use data from multiple sources to validate
• Use software that has been validated
• Consult the Biostatistics lab if you need help!

On-line Power Software?
• Not all of it is good stuff!
• The Biostats Lab uses and recommends http://www.stat.uiowa.edu/~rlenth/Power/
• Drs.
Feiveson or Ploutz-Snyder can assist you if you want to learn how to use it…

Example of Recent Power Analysis
Exercise Lab: Novel Intervention
• Knee extensor strength (KES) and endurance (KEE) decline in ISS astronauts due to the negative effects of space flight
• We have a new intervention that we think will reduce these negative effects

Study Challenges
• For good reasons… we have no pilot data
  • Thus it is difficult to project the likely effect size
• Our desire for a "Usual Care" comparison group is competing with our desire to collect data on participants using our novel intervention
  • And it's largely out of our control because of self-selection

What did we do?
• Hypothesized a reduced decline (i.e. Interaction effect) associated with a novel intervention…
• Powered to detect a wide range of strength differences (pre/post), including
  • Changes similar to historical ISS data
    ○ i.e. no benefit above what we already do
  • 5% reduction in the mean change
  • 10% reduction in the mean change
  • 15% reduction in the mean change
  • 5% increase (i.e. worse) in the mean change

How did we do it?
• Used historical ISS data as a baseline for Pre/Post changes observed
• Assumed a similar SD of change (σ)
• Assumed 2-tailed α = 0.05
• Created Power/Sample Size curves associated with the five different effect sizes described in the prior slide
• Did this for all of the Primary outcomes

Note that
• While our statistical plan was far more sophisticated, the power analysis is very simple
• Simplifying assumptions were conservative on alpha risk
• We did not set a "critical" power to detect, as most NIH grant applications typically assume
  • We allow the reviewer to examine the trade-off between Power and Sample Size
  • But we choose n based on priorities, logistics, etc.
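The curve-building step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it treats each scenario as a one-sample comparison of the mean change against zero, it uses a normal approximation rather than the noncentral-t calculation a validated power package would use (so it runs a little optimistic at small n), and it plugs in the historical KES change quoted in this deck (mean 145.02, SD 155.5). The function and variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(mean_change, sd_change, n, alpha=0.05):
    """Approximate two-sided power to detect a mean change different from zero.

    Normal approximation to the one-sample t-test (an assumption of this sketch).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(mean_change) / sd_change * sqrt(n)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Historical KES change quoted in the deck.
HISTORICAL_MEAN, HISTORICAL_SD = 145.02, 155.5

# Multipliers applied to the historical mean change for the five scenarios.
SCENARIOS = {
    "5% increase": 1.05,
    "historical": 1.00,
    "5% reduction": 0.95,
    "10% reduction": 0.90,
    "15% reduction": 0.85,
}


def power_table(sample_sizes):
    """Return {scenario: [power at each n]} for the five scenarios."""
    return {
        label: [
            approx_power(HISTORICAL_MEAN * factor, HISTORICAL_SD, n)
            for n in sample_sizes
        ]
        for label, factor in SCENARIOS.items()
    }


if __name__ == "__main__":
    sizes = range(10, 26)
    print("scenario        " + " ".join(f"{n:>4d}" for n in sizes))
    for label, powers in power_table(sizes).items():
        print(f"{label:<16}" + " ".join(f"{p:4.2f}" for p in powers))
```

Each row of the printed table is one power/sample-size curve, ready to paste into a spreadsheet for plotting; the point is the trade-off across n, not a single "required" sample size.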
Info Needed to do this:
• Mean & SD of Change from ISS Data
  • KES: mean = 145.02; SD = 155.5
• Projected mean changes under different scenarios
  • 5% reduction ~ 145.02 × 0.95 = 137.8
• MS Excel or similar (graphing)
• Web access to Power software: http://www.stat.uiowa.edu/~rlenth/Power/

How did we do it?
• Choose the one-sample t-test
• Enter Sigma (σ) = 155.5, Mean Diff = 145, α = .05 (default)
  • It doesn't matter what you set Power to, as we'll go into the "Options" menu anyway!
• Click on the "Options" menu, and select "Graph"
• Nice graph for starters… but I want the data, not the picture. Click on "Show Data" to get it.
• Now we have the data that we can copy into Excel (or another program) for comparisons with other power/sample size curves in this model.

Repeat for other effect sizes
• …Here, we hope not to observe the effect with our new intervention that we've seen previously…
• PI determined clinically relevant differences that would be meaningful
  • 5%, 10%, 15% better
• PI also wanted to power the study to detect a worsening of spaceflight effects, should that occur

End result?
[Figure: Power as a function of Sample Size (Knee Extension Endurance). Y-axis: Power, 30% to 100%. X-axis: Sample Size per Group, 10 to 25. Curves: 5% increase; Observed Mean Decrease in KEE (ISS n=17); 5% reduction; 10% reduction; 15% reduction.]

Interpretation
• With as few as 15 subjects in the novel intervention, we exceed 80% power to detect changes in Knee Extensor Endurance ranging from 5% more than what historical ISS data projects to as much as 15% less.

Too much of a good thing?
• Some disciplines have the luxury of very precise and sensitive outcome measures
  • High signal-to-noise ratio
  • Low within- or between-person variability
• Can lead to overpowered studies
  • Studies where even a small n is sufficient to reveal statistically significant differences for very small effects
• "In order to be a difference, it has to make a difference" (Dr. Bill Paloski, former NASA Researcher/Manager)

What if required n were too high?
• Underpowered studies:
  • Less likely to detect differences: inconclusive results & frustration!
  • Waste of resources, given "no-answer" potential
  • Less justifiable risk to participants
  • Displacement of other research that could have taken place
  • Limited scientific contributions
• …but… is it useless?

What can I do to increase my power honestly?
• Choose appropriate & powerful statistical techniques
  • Repeated-measures & mixed-model designs
  • Co-vary nuisance variance contributors
• Choose more sensitive outcome measures
  • Continuously scaled vs. ordinal/categorical
• Choose more reliable outcome measures
  • Reduce error variance
• Narrow subject selection (& inference space)

What can I do to increase my power honestly? (cont.)
• Challenge traditional notions of what constitutes "important" results
  • Is α = .05 really necessary?
  • Is 2-tailed alpha testing appropriate?
  • Is Power = 80% a reasonable cut-off?
• Should you be operating under traditional inferential boundaries at all?
  • Characterization, descriptive, and feasibility studies are part of science too.

What's my alternative?
• Instead of "powering to detect a significant difference," some research is more preliminary or exploratory in nature.
• If so, project the range of effects you will be able to detect, given n.
• In other words, turn the equation around.
  • "Given n=??, the effects that I will be able to detect are of ### magnitude."

Back to our example:
• If I know that I can only recruit 10 astronauts to participate…
  • Solve for effect size, given n, σ, α
• Graph Power (Y) vs. Effect Size (X) to understand what you'll be able to detect…
  • Is it worth it to pursue?

Another Approach
• Report on the information the study will provide, given n¹
• When only able to collect data of n size, why bother estimating a required sample size?
• Instead:
  • "For a sample of size Y, I get information Z"
  • "From sample data of n size, I can characterize the effects that I'm interested in with what level of precision?"

¹ Parker, R. A., & Berman, N. G. (2003). Sample Size: More than Calculations. The American Statistician, 57(3), August 2003.
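Turning the equation around for the 10-astronaut example can be sketched as follows. This is a hedged illustration, not the lab's procedure: it uses a normal approximation to a two-sided, one-sample test, borrows the KES standard deviation of 155.5 quoted earlier in the deck, and assumes 80% power as the target for the "detectable" effect. The function names and the grid of effect sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def min_detectable_effect(n, sd, alpha=0.05, power=0.80):
    """Smallest mean change detectable with the requested power (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * sd / sqrt(n)


def power_for_effect(effect, n, sd, alpha=0.05):
    """Approximate power for a given effect size; the far rejection tail is ignored."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(effect) / sd * sqrt(n) - z_alpha)


if __name__ == "__main__":
    n, sd = 10, 155.5  # n from the example; SD is the KES value quoted in the deck
    print(f"Detectable with 80% power at n={n}: {min_detectable_effect(n, sd):.1f}")
    # Points for a Power (Y) vs. Effect Size (X) curve at fixed n.
    for effect in range(60, 201, 20):
        print(f"effect={effect:3d}  power={power_for_effect(effect, n, sd):.2f}")
```

Under these assumptions the detectable change at n = 10 comes out at roughly 138 in the units of the outcome; whether an effect of that size is plausible or meaningful is the judgment call the slide asks for.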
Precision
• Precision here refers to our accuracy in reporting the effect that we will observe in a future study, given n
• Commonly measured as the width of the 95% Confidence Interval
  • Or "Half-width"
• The narrower the better…

Returning to our example…
• Given the σ of the differences observed on ISS so far, if I collect data from a future study that assumes similar variability (but hopefully less decline in KEE), I will be able to characterize my observed effects with what level of precision?

How would you do this?
• Use historical data for σ (recall σ ≈ 155)
• Apply the usual calculations for CIs
• Except here, we estimate many CIs for different sample sizes, and we do so for our metric of precision based on the distance (D) between the edges of the CI and the middle (i.e. the "half-width"):
    D = t(1 − α/2, n − 1) × σ / √n

Precision (cont.)
• It is common to plot the curves representing the relationship between half-width and sample size.
  • You could do this manually (or in Excel) by applying the formula for CIs
  • You could ask for assistance from the Biostatistics lab
  • You could purchase specialized software (ex. PASS) to assist you: http://www.ncss.com/pass.html

Interpretation?
• GIVEN n = whatever it is… report on the precision.
  • Is that level of precision meaningful enough to pursue?
• Also consider where the slope begins to change, here around n~10
  • Is your possible n close to that?
  • What would it take to get you there?
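A half-width versus sample size table of the kind described above can be produced with a short script. This is a minimal sketch under assumptions: it uses a normal (z) multiplier instead of the t multiplier, so the true half-widths would be somewhat wider at small n, and it uses the historical SD of 155.5 quoted earlier in the deck. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a confidence interval for a mean, using a z multiplier.

    A t multiplier with n - 1 degrees of freedom would give a wider
    interval, noticeably so for small n.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


if __name__ == "__main__":
    sd = 155.5  # historical SD of change quoted in the deck
    for n in (5, 10, 15, 20, 25, 30):
        print(f"n={n:2d}  approx. half-width={ci_half_width(sd, n):6.1f}")
```

Because the half-width shrinks with the square root of n, quadrupling the sample only halves it, which is why the curve flattens and each additional subject buys less precision than the one before.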
Recap
• Sample size & power calculations:
  • Should happen BEFORE you plan your research
  • Require knowledge about anticipated effect sizes & σ, and are best appreciated as a continuum, rather than a cut-off
  • Are affected by experimental design, outcome variables & statistical plan

Recap (cont.)
• Not all studies have the luxury of recruiting the right number of subjects
  • Overpowered studies can send false alarms
  • Underpowered studies can lead to false conclusions
• Not all studies should be designed to "detect statistical differences," and such studies benefit from a different type of sample size analysis that focuses on Precision, rather than Power

What if I need help?
• Consult the Biostatistics lab at JSC in the early phases of planning your study
  • Dr. Al Feiveson
    ○ Director of Biostatistics Lab
    ○ alan.h.feiveson@nasa.gov
  • Dr. Rob Ploutz-Snyder
    ○ robert.ploutz-snyder-1@nasa.gov
• But please allow us adequate time to devote to your study…