Statistical considerations for grants
Brian Healy

Comments from previous class
Change time of course
Available on-line power calculators
– http://www.cs.uiowa.edu/~rlenth/Power/
Two-sided vs. one-sided
Comparison of statistical packages

Review
Type I error
Type II error
Ways to increase power
Power/sample size calculation with continuous outcome

Type I error
We could plot the distribution of the sample means under the null before collecting data
Type I error is the probability that you reject the null given that the null is true
α = P(reject H0 | H0 is true)
Notice that the shaded α area is still part of the null curve, but it is in the tail of the distribution

Type II error
Definition: failing to reject the null hypothesis when the alternative is in fact true
This type of error is based on a specific alternative
β = P(fail to reject H0 | HA is true)

Power
Definition: the probability that you reject the null hypothesis given that the alternative hypothesis is true. This is what we want to happen.
Power = P(reject H0 | HA is true) = 1 - β
Since this is a good thing, we want this to be high

Outline
Aspects of statistical considerations section of a grant
Example statistical analysis section
Worked example from dataset from students in class
Management of data collection/spreadsheet

Aspects of statistical considerations
Overarching statistical issues:
– Data management
– Methodological issues, esp. related to data collection (ex. image processing)
– Handling missing data
– Clustering/correlation of observations
Specific aims:
– Identify outcomes/explanatory variables
– Type of analysis
– Power calculation

Research study
I. Study design
• Experimental question: What are you trying to learn? How will you prove this?
• Sample selection: Who are you going to study?
II. Data collection
• What should be collected?
III. Analysis of data
• Results: Was there any effect?
• Conclusions: What does this all mean? To whom do results apply?

[Flow diagram: Experimental question (What? How?) → Sample selection (Who? How many?) → Collect data → Analysis (Is there an effect?) → Conclusion (To whom?)]

How is statistics related to each stage?
I. Study design
• Experimental question: Define outcome, sources of variability, unit and analysis plan
• Sample selection: Sample size, type of sample

Experimental question
In a grant, the experimental question is written as the specific aims
– Generally, specific aims can be easily translated to a null hypothesis
– If specific aims are more general, the specific null hypotheses are listed in the grant after the aims
– This is the critical step in the grant because everything else is based on the aims
– Usually easiest if the hypothesis can be set up as a yes/no question

Example
Dr. Janet Hall kindly provided a grant to use as an example
One goal of the grant was to investigate whether age had an effect on estrogen treatment in post-menopausal women
– Is there an interaction between estrogen and age?
– The treatment is given to increase resting metabolic activity in the brain as measured by PET and other neuroimaging modalities
In addition, the effect of age on resting metabolic activity at baseline (untreated) was of interest

Specific aim 1
SPECIFIC AIM #1: To determine the effect of aging on changes in baseline (resting state) cortical function and their responses to estrogen using FDG-PET.
– Hypotheses:
– Resting state metabolic activity, as measured by FDG-PET at baseline, is decreased in the dorsolateral prefrontal cortex (DLPFC) and increased in the hippocampus as a function of age.
– Estrogen exposure results in progressive increases in resting metabolic activity in the DLPFC over time in young postmenopausal women that is not seen in their older counterparts.

Hypothesis 1
Resting state metabolic activity, as measured by FDG-PET at baseline, is decreased in the dorsolateral prefrontal cortex (DLPFC) as a function of age.
What is the experimental question?
– Is the FDG-PET level different in the hippocampus or DLPFC for women of different ages?
– What is the outcome?
– What is the explanatory variable?

Types of variables
The outcome is FDG-PET level and this is a continuous variable
The explanatory variable, age, could be considered continuous, but for this grant it was decided to group patients into young postmenopausal women (age 45-55) vs. old postmenopausal women (age 70-80)
What type of analysis would we use in this case?
– Are the data approximately normal?

Sample selection
Our sample selection is based on the definition of the groups
– What is the effect of this definition? Does it affect the generalizability of the findings?
For this study, we plan to sample small groups from a single site
– Could another approach have been used?
– What is the advantage of a single site? Disadvantage?

Sample size calculation
We have defined our null hypothesis, outcome and sample selection
What sample size do we need?
In this case, previous data showed mean (SD) FDG-PET at DLPFC in the young group of 83.0 (7.3) and in the old group of 76.2 (7.3)
– What else do we need for our sample size calculation?
– Power = 0.8, α = 0.05
Assuming equal groups, we need 20 patients per group

Additional considerations
Multiple comparisons
– We have two outcomes, so should we adjust the significance level for the two comparisons?
– Bonferroni correction for significance level
Do any specifics regarding the measurement need to be discussed?
Confounders/adjustment

Abbreviated grant section
Analysis plan: The two groups will be compared using a two-sample t-test.
Power calculation: Previous data have estimated the mean (SD) FDG-PET in the young group at 83.0 (7.3) and in the old group at 76.2 (7.3). Group sample sizes of 20 and 20 achieve 82% power to detect a difference of 6.8 between the two groups, assuming a standard deviation of 7.3 in each group and a significance level of 0.05, using a two-sided two-sample t-test.
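
The sample size and power figures in the grant section above can be roughly checked by hand. The following is a minimal sketch, not the calculator used for the grant: the function names are hypothetical, and it replaces the t distribution with a normal approximation, so its answers are slightly optimistic (it suggests about 19 per group and a power near 0.84 for 20 per group, compared with the 20 per group and 82% quoted from a t-based calculation).

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation (assumption): n = 2 * ((z_alpha + z_power) * sd / delta)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power with n patients per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / (sd * sqrt(2 / n))  # difference in standard-error units
    # Probability of landing in either rejection region under the alternative
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


# Values taken from the slides: means 83.0 vs. 76.2, SD 7.3 in each group
delta = 83.0 - 76.2
print("n per group (normal approx.):", n_per_group(delta, 7.3))
print("power with 20 per group (normal approx.):", round(approx_power(delta, 7.3, 20), 3))
```

A t-based calculator such as the on-line one listed earlier accounts for estimating the SD from the sample, which is why it asks for slightly more patients than this approximation.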

Hypothesis 2
Estrogen exposure results in progressive increases in resting metabolic activity in the DLPFC over time in young postmenopausal women that is not seen in their older counterparts.
What is the experimental question?
– Is the effect of estrogen on the FDG-PET level in the DLPFC different for women of different ages?

Types of variables
One potential outcome is change in FDG-PET level and this is a continuous variable
Age group and treatment are the explanatory variables
How many FDG-PET levels are measured and how many observations contribute to the analysis?
What type of analysis could we use in this case?

Data set-up
We measure the change in four types of patients (young/treated, young/placebo, old/treated, old/placebo)
We can estimate the mean change in all groups, but what is truly of interest for our hypothesis?
– Interaction between the two measures
– Linear regression/two-way ANOVA

            Young                   Old
Treated     Mean(Young, treated)    Mean(Old, treated)
Placebo     Mean(Young, placebo)    Mean(Old, placebo)

[Figure: mean in treated old, untreated old, treated young and untreated young patients]

Sample selection
Our outcome and explanatory variables are now clearly defined
Our sample selection in this case is a little more complex
– Age group is defined by enrollment
– Patients in each group were randomized to treatment or placebo
– What does the randomization get for us?

Sample size calculation
We have defined our null hypothesis, outcome and sample selection
What sample size do we need?
– What preliminary data would we need or what would we need to hypothesize to calculate the sample size?
Some resources for this complex design are available on-line, but you should consider speaking to a statistician

Abbreviated grant section
Analysis plan: The effect of age on the treatment effect of estrogen in post-menopausal women will be investigated using a two-way ANOVA.
The outcome for the analysis will be the change in the FDG-PET level before and after the treatment and the two factors will be age and treatment group. The focus of the analysis will be the interaction between the two factors.
Power calculation: Given our preliminary data and available sample size, we will have 80% power to detect a hypothesized difference of x using a two-way ANOVA.

Alternative analysis strategy
Rather than focusing on the difference between the before and after treatment measurements, we could have included all of the measurements in a single model
– Each patient contributes a before and after treatment measurement rather than a difference
The analysis of this approach requires accounting for the repeated measures within a subject
– Repeated measures ANOVA or mixed effects model

Advantages of this approach
Handles missing data more easily
Generalizes to more than two measurements easily
Power calculations with mixed effects models can be completed as well

Conclusions
Each hypothesis needs an analysis plan that describes the type of data and statistical approach used to analyze the data
Each hypothesis also requires a sample size or power calculation
Additional issues (missing data, confounding) must be included in the statistical analysis section

Worked example
Kidney transplant research
Students in the class are investigating the effect of genetics of the donor/recipient pair on various outcomes
– Creatinine level measured at time of transplant and at 3, 6, 12 and 36 months after transplant
– Time to rejection of the transplant
– Type of rejection (acute/chronic)
Genetic factor of interest is large deletion polymorphisms at 20 sites

Study design
Patients have been followed at 4 different sites since 1995
– Korea
– Finland
– BWH
– MGH
Only HLA genetic data are available at the moment, but we would like to genotype sufficient numbers of patients to determine if there is an effect

Experimental question
Specific aim: To explore the
potential contribution of a new class of large deletion polymorphisms on the development of acute and chronic renal allograft rejection following renal transplantation.
– Hypotheses:
– Donor/recipient pairs with matching deletion polymorphisms will have lower creatinine levels at all time points compared to non-matched pairs
– Donor/recipient pairs with matching deletion polymorphisms will have fewer acute/chronic rejection events compared to non-matched pairs

Definition of groups
Both the donor and recipient for each transplant will be genotyped and classified as either having the deletion or not having the deletion
We decided to treat each group separately initially.
What type of variable is the explanatory variable?

                           Donor with deletion    Donor w/o deletion
Recipient with deletion    Group 1                Group 2
Recipient w/o deletion     Group 3                Group 4

Creatinine levels
Here are the initial values for the creatinine for one of the populations
[Figure: distribution of creatinine values]
Note the outliers at the end of the distribution
These would be very important to model
They turned out to be incorrect data

Analysis plan
Initially, we will compare each creatinine measurement separately
Since we have 4 groups (categorical explanatory variable) and a continuous outcome, we will compare across the groups using ANOVA
– The corrected data look sufficiently normal to make this analysis plan reasonable
– An alternative option would be to use a Kruskal-Wallis test, which is a rank-based test that is not sensitive to the outliers

Abbreviated analysis plan
Analysis plan: The four groups of donor/recipient pairs will be compared using ANOVA. If a significant difference between the groups is observed, the pairwise comparisons will be completed with the appropriate correction for multiple comparisons. Although we could investigate the main effect of the donor's and recipient's deletion status in a two-way ANOVA model, our interest is in the four-group comparison given the relationships seen in previous work.
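
To illustrate why a rank-based test is suggested as a fallback when outliers are present, the sketch below computes the one-way ANOVA F statistic and the Kruskal-Wallis H statistic by hand on a made-up data set. Everything here is an assumption for illustration: the creatinine values are invented (they are not the class data), the function names are hypothetical, and only the test statistics are computed, not p-values. A single misplaced decimal point is introduced to mimic a data-entry error.

```python
from itertools import chain


def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = list(chain.from_iterable(groups))
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic using average ranks (no tie correction applied)."""
    values = sorted(chain.from_iterable(groups))
    n_total = len(values)
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


def with_error(bad_value):
    """Hypothetical creatinine values for Groups 1-4; one Group 1 value is replaced."""
    return [
        [1.00, 1.05, 1.10, 1.15, bad_value],
        [1.12, 1.17, 1.22, 1.27, 1.32],
        [1.24, 1.29, 1.34, 1.39, 1.44],
        [1.36, 1.41, 1.46, 1.51, 1.56],
    ]


clean = with_error(1.20)  # 1.20 is the (invented) correct value
for label, data in [("clean", clean), ("13.0 entered", with_error(13.0)),
                    ("130.0 entered", with_error(130.0))]:
    print(f"{label:>14}: F = {anova_f(data):.2f}, H = {kruskal_wallis_h(data):.2f}")
```

In this invented example the F statistic collapses once the erroneous value is present, while H falls much less, and H is identical whether the bad value is 13.0 or 130.0 because only its rank matters. Correcting the data, as was done in class, remains the better fix when the outliers are known to be errors.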

Additional considerations
Rather than modeling each creatinine measurement separately, should we model them together?
– Trend with time?
– Multiple comparisons if treated separately?
Confounders:
– Age
– Gender
– HLA status
Should we treat all 20 deletions separately?

Power calculation
Unlike the previous example, we have no preliminary data regarding the effect of these deletions
How can we complete a power calculation?
– Option 1: Propose a sample size for each group and determine the difference between groups you could detect
– Option 2: Estimate the effect using an available measurement/literature value

Available measurement
In the dataset, we have HLA status and can calculate the mean (SD) in each of the four groups
Using these preliminary data, we can perform a power calculation and assume the effect size for the deletions will be similar to HLA
– How good a surrogate is HLA for deletion?

Abbreviated power calculation
Power calculation: Our preliminary data have shown that the mean (SD) month 12 creatinine level of the recipient was 1.21 (0.29) in HLA identical donor/recipient pairs and 1.28 (0.33) in HLA non-identical donor/recipient pairs. We anticipate that recipients who are deletion matches will behave like the HLA identical recipients and recipients who are not deletion matches will behave like the HLA non-identical recipients. A sample size of 202 per group is required to have 80% power to detect the proposed difference between the groups at the 0.05 level using one-way ANOVA.

Additional considerations
Pairwise tests
Is there a better approximation for the group means?
Clustering by country

Proportion with acute rejection
Another outcome for the study is the proportion of patients who experience acute rejection
The table at the end of the study would look like this:

                      Dyes/Ryes    Dno/Ryes    Dyes/Rno    Dno/Rno
Acute rejection
No acute rejection

Abbreviated analysis plan
Analysis plan: The proportion of patients who have acute rejection will be compared across the groups using a chi-square test for each deletion separately. In order to investigate the combined effect of deletions, multiple logistic regression models will also be fit.

Power analysis
As previously, there are no preliminary data, but let's try the set sample size approach now
Assume that we have two groups, matched and non-matched, and we have 200 matched patients and 400 non-matched patients
What type of power analysis could we complete?

Abbreviated power analysis
Power analysis: Given our sample size (200 matched patients and 400 non-matched patients) and the assumption that matching would decrease the proportion with acute rejection, we will have at least 80% power to detect the differences presented in Table xx.

Proportion with acute rejection among matches    Proportion with acute rejection among mismatches
0.2                                              0.33
0.3                                              0.45
0.4                                              0.55
0.5                                              0.65

Additional considerations
Clustering by region
– Stratified analysis

Management of data collection
Example Excel sheet
All relevant information should be included as a column (ex. loss to follow-up date)
No symbols in column names (ex. #) and column names should be as short as possible
No empty rows or columns (white space)

Conclusions
Experimental question must be well defined to set up an appropriate analysis plan
Sample size calculation is based on the analysis plan. If uncertain of power calculation, consult a statistician
Attempt to address other aspects of your data

THANK YOU!!!
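
Supplementary sketch for the acute rejection power analysis: with the sample sizes fixed at 200 matched and 400 non-matched patients, the power for each pair of proportions can be approximated as below. This is a minimal sketch under stated assumptions: the function name is hypothetical, it uses a normal approximation to the two-sided comparison of two proportions rather than an exact chi-square power calculation, and the pairing of the matched and mismatched proportions follows the order in which they appear in Table xx, which is an assumption about the slide layout.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation (assumption): pooled standard error under the null,
    unpooled standard error under the alternative. The negligible probability
    of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# (proportion among matches, proportion among mismatches), paired as assumed above
scenarios = [(0.2, 0.33), (0.3, 0.45), (0.4, 0.55), (0.5, 0.65)]
for p_match, p_mismatch in scenarios:
    pw = power_two_proportions(p_match, p_mismatch, n1=200, n2=400)
    print(f"matches {p_match:.2f} vs. mismatches {p_mismatch:.2f}: approx. power {pw:.2f}")
```

Under this approximation every listed scenario exceeds 80% power, which agrees with the "at least 80%" statement in the abbreviated power analysis. It does not account for clustering by region.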