Statistical consideration for grant

Statistical considerations
for grants
Brian Healy
Comments from previous class
• Change time of course
• Available on-line power calculators
– http://www.cs.uiowa.edu/~rlenth/Power/
• Two-sided vs. one-sided
• Comparison of statistical packages

Review
• Type I error
• Type II error
• Ways to increase power
• Power/sample size calculation with continuous outcome

Type I error
• We could plot the distribution of the sample means under the null before collecting data
• Type I error is the probability that you reject the null given that the null is true
α = P(reject H0 | H0 is true)
Notice that the shaded area (α) is still part of the null curve, but it is in the tail of the distribution
Type II error
• Definition: failing to reject the null hypothesis when the alternative is in fact true
• This type of error is based on a specific alternative
β = P(fail to reject H0 | HA is true)

Power

Definition: the probability that you reject
the null hypothesis given that the
alternative hypothesis is true. This is what
we want to happen.
Power = P(reject H0 | HA is true) = 1 - β

Since this is a good thing, we want this to
be high
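The definitions of α, β and power can be made concrete with a small numerical sketch. It assumes a two-sided one-sample z-test with a known standard deviation, which is simpler than most real analyses; the function name `z_test_power` and every input value are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_test_power(mu0, mu_alt, sd, n, alpha=0.05):
    """Power and beta of a two-sided one-sample z-test (known SD) against mu_alt."""
    se = sd / n ** 0.5                           # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = mu0 - z_crit * se                    # reject H0 below this sample mean
    upper = mu0 + z_crit * se                    # reject H0 above this sample mean
    alt = NormalDist(mu_alt, se)                 # distribution of the sample mean under HA
    beta = alt.cdf(upper) - alt.cdf(lower)       # P(fail to reject H0 | HA is true)
    return 1 - beta, beta

# Hypothetical inputs, not taken from the slides
power, beta = z_test_power(mu0=0.0, mu_alt=0.5, sd=1.0, n=32)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

When the alternative mean equals the null mean, the same function returns α, since the rejection region is then the shaded tail area of the null curve.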
Outline
• Aspects of statistical considerations section of a grant
• Example statistical analysis section
• Worked example from dataset from students in class
• Management of data collection/spreadsheet

Aspects of statistical considerations

Overarching statistical issues:
– Data management
– Methodological issues esp. related to data collection
(ex. Image processing)
– Handling missing data
– Clustering/correlation of observations

Specific aims:
– Identify outcomes/explanatory variables
– Type of analysis
– Power calculation
Research study
I. Study design
• Experimental question- What are you trying to learn? How will you prove this?
• Sample selection- Who are you going to study?
II. Data collection
• What should be collected?
III. Analysis of data
• Results- Was there any effect?
• Conclusions- What does this all mean? To whom do results apply?
Experimental question:
What? How?
Sample selection: Who? How many?
Collect Data
Analysis: Is there an effect?
Conclusion: To whom?
How is statistics related to each
stage?
I. Study design
• Experimental question- Define outcome, sources of variability, unit and analysis plan
• Sample selection- Sample size, type of sample
Experimental question

In a grant, the experimental question is written
as the specific aims
– Generally, specific aims can be easily translated to a
null hypothesis
– If specific aims are more general, the specific null
hypotheses are listed in the grant after the aims
– This is the critical step in the grant because
everything else is based on the aims
– Usually easiest if the hypothesis can be set up as a yes/no question
Example
• Dr. Janet Hall kindly provided a grant to use as an example
• One goal of the grant was to investigate whether age modifies the effect of estrogen treatment in post-menopausal women

– Is there an interaction between estrogen and age?
– The treatment is given to increase resting metabolic
activity in the brain as measured by PET and other
neuroimaging modalities

In addition, the effect of age on resting
metabolic activity at baseline (untreated) was of
interest
Specific aim 1

SPECIFIC AIM #1: To determine the effect of
aging on changes in baseline (resting state)
cortical function and their responses to estrogen
using FDG-PET.
– Hypotheses:
– Resting state metabolic activity, as measured by FDG-PET at baseline, is decreased in the dorsolateral
prefrontal cortex (DLPFC) and increased in the
hippocampus as a function of age.
– Estrogen exposure results in progressive increases in
resting metabolic activity in the DLPFC over time in
young postmenopausal women that is not seen in
their older counterparts.
Hypothesis 1
Resting state metabolic activity, as measured by
FDG-PET at baseline, is decreased in the
dorsolateral prefrontal cortex (DLPFC) as a
function of age.
 What is the experimental question?

– Is the FDG-PET level different in the hippocampus or
DLPFC for women of different ages?
– What is the outcome?
– What is the explanatory variable?
Types of variables
• The outcome is FDG-PET level and this is a continuous variable
• The explanatory variable, age, could be considered continuous, but for this grant it was decided to group patients into young postmenopausal women (age 45-55) vs. old postmenopausal women (age 70-80)
• What type of analysis would we use in this case?

– Are the data approximately normal?
Sample selection

Our sample selection is based on the
definition of the groups
– What is the effect of this definition? Does it
affect the generalizability of the findings?

For this study, we plan to sample small
groups from a single site
– Could another approach have been used?
– What is the advantage of a single site?
Disadvantage?
Sample size calculation
• We have defined our null hypothesis, outcome and sample selection
• What sample size do we need?
• In this case, previous data showed mean (SD) FDG-PET at DLPFC of 83.0 (7.3) in the young group and 76.2 (7.3) in the old group

– What else do we need for our sample size calculation?
– Power = 0.8, α = 0.05

Assuming equal groups, we need 20 patients per
group
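A rough check of this sample size can be sketched with the normal approximation to the two-sample t-test. This is an approximation only: a t-based calculator, such as the on-line tools mentioned earlier, typically asks for one or two more patients per group, which is consistent with the 20 per group quoted on the slide. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided two-sample comparison of means."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2

def approx_power(delta, sd, n, alpha=0.05):
    """Approximate power with n patients per group (normal approximation)."""
    se = sd * (2 / n) ** 0.5                 # SE of the difference in means
    return Z.cdf(abs(delta) / se - Z.inv_cdf(1 - alpha / 2))

delta = 83.0 - 76.2                          # difference in means from the previous data
n_raw = n_per_group(delta, sd=7.3)
print("approximate n per group:", ceil(n_raw))
print("approximate power with 20 per group:", round(approx_power(delta, 7.3, 20), 3))
```

The approximation is slightly optimistic for small groups because it ignores the extra uncertainty from estimating the standard deviation.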
Additional considerations

• Multiple comparisons
– We have two outcomes so should we adjust the significance level for the two comparisons?
– Bonferroni correction for significance level
• Do any specifics regarding the measurement need to be discussed?
• Confounders/adjustment
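For the multiple comparisons point, a Bonferroni correction divides the significance level by the number of comparisons, and the stricter level raises the required sample size. The sketch below uses the normal approximation and reuses the DLPFC means and SD purely for illustration; the second outcome would have its own values, which are not given here.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()

def bonferroni_alpha(alpha, n_tests):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_tests

def n_per_group(delta, sd, alpha, power=0.80):
    """Approximate n per group for a two-sided two-sample comparison of means."""
    return 2 * ((Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * sd / delta) ** 2

alpha_adj = bonferroni_alpha(0.05, n_tests=2)    # two outcomes, so two comparisons
n_unadj = n_per_group(delta=6.8, sd=7.3, alpha=0.05)
n_adj = n_per_group(delta=6.8, sd=7.3, alpha=alpha_adj)
print("adjusted alpha:", alpha_adj)
print("approximate n per group, unadjusted vs. adjusted:", ceil(n_unadj), ceil(n_adj))
```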

Abbreviated grant section
• Analysis plan: The two groups will be compared using a two-sample t-test.
• Power calculation: Previous data estimated the mean (SD) FDG-PET as 83.0 (7.3) in the young group and 76.2 (7.3) in the old group. Group sample sizes of 20 and 20 achieve 82% power to detect a difference of 6.8 between the two groups, assuming a standard deviation of 7.3 in each group and a significance level of 0.05, using a two-sided two-sample t-test.

Hypothesis 2
Estrogen exposure results in progressive
increases in resting metabolic activity in
the DLPFC over time in young
postmenopausal women that is not seen
in their older counterparts.
 What is the experimental question?

– Is the effect of estrogen on the FDG-PET level
in the DLPFC different for women of different
ages?
Types of variables
• One potential outcome is change in FDG-PET level and this is a continuous variable
• Age group and treatment are the explanatory variables
• How many FDG-PET levels are measured and how many observations contribute to the analysis?
• What type of analysis could we use in this case?

Data set-up
We measure the change in four types of patients
(young/treated, young/placebo, old/treated,
old/placebo)
 We can estimate the mean change in all groups,
but what is truly of interest for our hypothesis?

– Interaction between the two measures
– Linear regression/two-way ANOVA
            Young                  Old
Treated     Mean(Young, treated)   Mean(Old, treated)
Placebo     Mean(Young, placebo)   Mean(Old, placebo)
[Figure: mean change in treated and untreated patients, shown separately for young and old patients]
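The interaction of interest is a difference of differences: the treatment effect in young patients minus the treatment effect in old patients. The sketch below computes it from the four cell means. The change values are made up for illustration and are not data from the grant.

```python
from statistics import mean

# Hypothetical changes in FDG-PET (after minus before) for each type of patient
changes = {
    ("young", "treated"): [6.0, 4.0, 5.0, 7.0],
    ("young", "placebo"): [1.0, 0.0, 2.0, 1.0],
    ("old", "treated"): [2.0, 1.0, 3.0, 2.0],
    ("old", "placebo"): [1.0, 2.0, 0.0, 1.0],
}
cell_means = {cell: mean(values) for cell, values in changes.items()}

# Treatment effect within each age group
effect_young = cell_means[("young", "treated")] - cell_means[("young", "placebo")]
effect_old = cell_means[("old", "treated")] - cell_means[("old", "placebo")]

# Interaction: does the treatment effect differ between the age groups?
interaction = effect_young - effect_old
print("treatment effect, young:", effect_young)
print("treatment effect, old:", effect_old)
print("interaction estimate:", interaction)
```

In a two-way ANOVA or linear regression with both factors, this quantity corresponds to the interaction term, and the model supplies its standard error and test.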
Sample selection
• Our outcome and explanatory variables are now clearly defined
• Our sample selection in this case is a little more complex

– Age group is defined by enrollment
– Patients in each group were randomized to
treatment or placebo
– What does the randomization get for us?
Sample size calculation
• We have defined our null hypothesis, outcome and sample selection
• What sample size do we need?
– What preliminary data would we need or what would we need to hypothesize to calculate the sample size?
• Some on-line resources exist for this complex design, but you should consider speaking to a statistician
Abbreviated grant section
Analysis plan: The effect of age on the
treatment effect of estrogen in post-menopausal
women will be investigated using a two-way
ANOVA. The outcome for the analysis will be the
change in the FDG-PET level before and after
the treatment and the two factors will be age
and treatment group. The focus of the analysis
will be the interaction between the two factors.
 Power calculation: Given our preliminary data
and available sample size, we will have 80%
power to detect a hypothesized difference of x
using a two-way ANOVA.
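One way to fill in the placeholder x is to ask what size of interaction a given sample size could detect. The sketch below assumes equal numbers per cell, a common SD of the change in every cell, and the normal approximation; the inputs (20 patients per cell, SD of 5.0) are hypothetical, and a statistician would likely refine the calculation for the actual design.

```python
from statistics import NormalDist

Z = NormalDist()

def interaction_se(sd_change, n_per_cell):
    """SE of (treated - placebo in young) - (treated - placebo in old)."""
    # Four independent cell means, each with variance sd^2 / n
    return (4 * sd_change ** 2 / n_per_cell) ** 0.5

def detectable_interaction(n_per_cell, sd_change, alpha=0.05, power=0.80):
    """Smallest interaction detectable with the requested power (approximate)."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return z_sum * interaction_se(sd_change, n_per_cell)

def interaction_power(delta, n_per_cell, sd_change, alpha=0.05):
    """Approximate power to detect an interaction of size delta."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(delta) / interaction_se(sd_change, n_per_cell) - z_crit)

# Hypothetical inputs, not taken from the grant
x = detectable_interaction(n_per_cell=20, sd_change=5.0)
print("detectable interaction:", round(x, 2))
print("power at that difference:", round(interaction_power(x, 20, 5.0), 2))
```

Because the interaction combines four cell means, its standard error is larger than that of a simple two-group difference, so interactions generally need larger samples.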

Alternative analysis strategy

Rather than focusing on the difference between
the before and after treatment measurements,
we could have included all of the measurements
in a single model
– Each patient contributes a before and after treatment
measurement rather than a difference

The analysis of this approach requires
accounting for the repeated measures within a
subject
– Repeated measures ANOVA or mixed effects model
Advantages of this approach
Handles missing data more easily
 Generalizes to more than two
measurements easily
 Power calculations with mixed effects
models can be completed as well

Conclusions
Each hypothesis needs an analysis plan
that describes the type of data and
statistical approach used to analyze the
data
 Each hypothesis also requires a sample
size or power calculation
 Additional issues (missing data,
confounding) must be included in the
statistical analysis section

Worked example
Kidney transplant research

Students in the class are investigating the effect
of genetics of the donor/recipient pair on various
outcomes
– Creatinine level measured at time of transplant and at 3, 6, 12 and 36 months after transplant
– Time to rejection of the transplant
– Type of rejection (acute/chronic)

Genetic factor of interest is large deletion
polymorphisms at 20 sites
Study design

Patients have been followed at 4 different sites
since 1995
– Korea
– Finland
– BWH
– MGH
Only HLA genetic data are available at the moment, but we would like to genotype a sufficient number of patients to determine whether there is an effect
Experimental question

Specific aim: To explore the potential
contribution of a new class of large deletion
polymorphisms on the development of acute and
chronic renal allograft rejection following renal
transplantation.
– Hypotheses:
– Donor/recipient pairs with matching deletion
polymorphisms will have lower creatinine levels at all
time points compared to non-matched pairs
– Donor/recipient pairs with matching deletion
polymorphisms will have fewer acute/chronic
rejection events compared to non-matched pairs
Definition of groups
Both the donor and recipient for each transplant
will be genotyped and classified as either having
the deletion or not having the deletion
 We decided to treat each group separately
initially. What type of variable is the explanatory
variable?

                          Donor with deletion   Donor w/o deletion
Recipient with deletion   Group 1               Group 2
Recipient w/o deletion    Group 3               Group 4
Creatinine levels
• Here are the initial values for the creatinine for one of the populations
• Note the outliers at the end of the distribution. These would be very important to model
• They turned out to be incorrect data

Analysis plan
• Initially, we will compare each creatinine measurement separately
• Since I have 4 groups (categorical explanatory variable) and a continuous outcome, I will compare across the groups using ANOVA

– The corrected data look sufficiently normal to
make this analysis plan reasonable
– An alternative option would be to use a
Kruskal-Wallis test, which is a rank-based test
that is not sensitive to the outliers
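The ANOVA comparison rests on an F statistic: the variation between the group means divided by the variation within the groups. A minimal sketch of that calculation follows. The creatinine values are invented for illustration, and the p-value would come from the F distribution in a statistical package, which is not computed here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical creatinine values for the four donor/recipient groups
groups = [
    [1.0, 1.2, 1.1],
    [1.3, 1.2, 1.4],
    [1.2, 1.3, 1.1],
    [1.4, 1.5, 1.3],
]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on {df_between} and {df_within} degrees of freedom")
```

A single extreme value inflates the within-group sum of squares, which is one reason the incorrect outliers mattered and why a rank-based alternative is listed.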
Abbreviated analysis plan

Analysis plan: The four groups of donor/recipient
pairs will be compared using ANOVA. If a
significant difference between the groups is
observed, the pairwise comparisons will be
completed with the appropriate correction for
multiple comparisons. Although we could
investigate the main effect of the donor’s and
recipient’s deletion status in a two-way ANOVA
model, our interest is in the four group
comparison given the relationships seen in
previous work.
Additional considerations

Rather than modeling each creatinine separately,
should we model them together?
– Trend with time?
– Multiple comparisons if treat separately?

Confounders:
– Age
– Gender
– HLA status

Should we treat all 20 deletions separately?
Power calculation
Unlike the previous example, we have no
preliminary data regarding the effect of
these deletions
 How can we complete a power
calculation?

– Option 1: Propose a sample size from each
group and determine the difference between
groups you could detect
– Option 2: Estimate the effect using an
available measurement/literature value
Available measurement
In the dataset, we have HLA status and
can calculate the mean (SD) in each of the
four groups
 Using this preliminary data, we can
perform a power calculation and assume
the effect size for the deletions will be
similar to HLA

– How good of a surrogate is HLA for deletion?
Abbreviated power calculation

Power calculation: Our preliminary data have shown that the mean (SD) month 12 creatinine level of the recipient was 1.21 (0.29) in HLA identical donor/recipient pairs and 1.28 (0.33) in HLA non-identical donor/recipient pairs. We anticipate that recipients who are deletion matches will behave like the HLA identical recipients and recipients who are not deletion matches will behave like the HLA non-identical recipients. A sample size of 202 per group is required to have 80% power to detect the proposed difference between the groups at the 0.05 level using one-way ANOVA.
Additional considerations
Pairwise tests
 Is there a better approximation for the
group means?
 Clustering by country

Proportion with acute rejection
Another outcome for the study is the proportion
of patients who experience acute rejection
 The table at the end of the study would look like
this:

                     Dyes/Ryes   Dno/Ryes   Dyes/Rno   Dno/Rno
Acute rejection
No acute rejection
Abbreviated analysis plan

Analysis plan: The proportion of patients
who have acute rejection will be compared
across the groups using a chi-square test
for each deletion separately. In order to
investigate the combined effect of
deletions, multiple logistic regression
models will also be fit.
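The chi-square test in the analysis plan compares the observed counts in the rejection-by-group table with the counts expected if the proportion with acute rejection were the same in every group. The sketch below computes the statistic for a 2 x 4 table. The counts are hypothetical, with 100 patients per group, and the column order (Dyes/Ryes, Dno/Ryes, Dyes/Rno, Dno/Rno) is an assumption.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rejection were unrelated to group
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are acute rejection / no acute rejection
table = [
    [20, 30, 30, 40],
    [80, 70, 70, 60],
]
stat, df = chi_square_statistic(table)
# 7.815 is the 0.95 quantile of the chi-square distribution with 3 degrees of freedom
print(f"chi-square = {stat:.2f} on {df} df; exceeds 7.815: {stat > 7.815}")
```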
Power analysis
• As before, there are no preliminary data, but let’s try the set sample size approach now
• Assume that we have two groups, matched and non-matched, and we have 200 matched patients and 400 non-matched patients
 What type of power analysis could we
complete?

Abbreviated power analysis

Power analysis: Given our sample size (200
matched patients and 400 non-matched
patients) and the assumption that matching
would decrease the proportion with acute
rejection, we will have at least 80% power to
detect the differences presented in Table xx.
Proportion with acute rejection among matches   Proportion with acute rejection among mismatches
0.2                                             0.33
0.3                                             0.45
0.4                                             0.55
0.5                                             0.65
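With the sample sizes fixed at 200 matched and 400 non-matched patients, the power for each pair of proportions can be checked with a normal approximation to the two-sided test of two proportions. The sketch assumes the proportions listed on the slide pair up in the order given (0.2 with 0.33, 0.3 with 0.45, 0.4 with 0.55, 0.5 with 0.65); the function name is hypothetical, and a dedicated power calculator may give slightly different values.

```python
from statistics import NormalDist

Z = NormalDist()

def power_two_proportions(p1, n1, p2, n2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    # Unpooled standard error of the difference in proportions
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    return Z.cdf(abs(p2 - p1) / se - Z.inv_cdf(1 - alpha / 2))

# Assumed pairing of (matched, mismatched) proportions from the slide
scenarios = [(0.2, 0.33), (0.3, 0.45), (0.4, 0.55), (0.5, 0.65)]
powers = [power_two_proportions(p_match, 200, p_mismatch, 400)
          for p_match, p_mismatch in scenarios]
for (p_match, p_mismatch), pw in zip(scenarios, powers):
    print(f"matched {p_match:.2f} vs. mismatched {p_mismatch:.2f}: power = {pw:.2f}")
```

The unequal group sizes are handled directly in the standard error, so the same function works for any allocation between matched and non-matched patients.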
Additional considerations

Clustering by region
– Stratified analysis
Management of data collection
• Example Excel sheet
• All relevant information should be included as a column (ex. loss to follow-up date)
• No symbols in column names (ex. #) and column names should be as short as possible
• No empty rows or columns (white space)
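These spreadsheet rules can be checked automatically before the data go to analysis. The sketch below assumes the sheet has been exported to CSV. The function `check_sheet`, the 12-character limit on column names and the sample data are all hypothetical choices for illustration, not requirements from the course.

```python
import csv
import io
import re

def check_sheet(rows, max_name_length=12):
    """Return a list of problems for a sheet given as rows of strings."""
    problems = []
    header, body = rows[0], rows[1:]
    for name in header:
        # Letters, digits and underscores only, so symbols such as # are rejected
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
            problems.append(f"bad column name: {name!r}")
        elif len(name) > max_name_length:
            problems.append(f"long column name: {name!r}")
    for line_no, row in enumerate(body, start=2):
        if all(cell.strip() == "" for cell in row):
            problems.append(f"empty row: {line_no}")
    for col_no in range(len(header)):
        cells = [row[col_no] for row in body if col_no < len(row)]
        if all(cell.strip() == "" for cell in cells):
            problems.append(f"empty column: {col_no + 1}")
    return problems

# Hypothetical export with a symbol in a name, an empty row and an empty column
sample = "id,creat_m12,# rejections,\n1,1.2,0,\n,,,\n2,1.3,1,\n"
problems = check_sheet(list(csv.reader(io.StringIO(sample))))
for problem in problems:
    print(problem)
```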

Conclusions
Experimental question must be well
defined to set up an appropriate analysis
plan
 Sample size calculation based on analysis
plan. If uncertain of power calculation,
consult a statistician
 Attempt to address other aspects of your
data

THANK YOU!!!