
Justifying Small-N for ISS Research: Can NASA Researchers Stray from Traditional Standards?
Robert J. Ploutz-Snyder1,2, Ph.D., PStat; James Fiedler1,2, Ph.D.; Alan H. Feiveson1, Ph.D.
1 Universities Space Research Association, Houston, TX; 2 NASA Johnson Space Center, Houston, TX
Abstract
When submitting research proposals for external funding, principal investigators (PIs) are asked to justify their proposed sample size (n).
Grant writing dogma typically requires PIs to justify a sample size that
achieves at least 80% power to reject the null hypothesis with statistical tests using 2-tailed α = 0.05, an exercise that depends on knowing
variability and the anticipated effect size. This traditional approach is
difficult for any researcher to truly accomplish because of the number of
assumptions required (and often not fully appreciated) and the need to
specify an effect size for research that is, by definition, novel. NASA
investigators have even more difficulty with this tradition because the
availability of our experimental subjects (e.g., long duration astronauts,
high fidelity analog subjects) is incredibly limited, and also because of
the extreme costs and time necessary for conducting large-n studies. We
present some recent ideas that extend Value of Information Theory to
sample size calculations. We argue that these methods are reasonable
alternatives to traditional sample size calculations, and should be considered for research that is highly innovative, costly, difficult to conduct,
or prohibitively difficult to complete within a reasonable time frame.
Background and Significance
Few would argue that well-conceived research is highly valuable when
it produces statistically significant findings that are consistent with our
understanding of the discipline(s). Is it the case that all “non-significant”
research is worthless—i.e., there is no value in a study that fails to reject
the null hypothesis? Is it possible that the null hypothesis is actually
supported in the data, or that the actual effect size is smaller than we
assumed it would be? Alternatively, if the data do not support H0, say at
P = 0.06, is there no value to that information? In other words, is there
something of value other than P < 0.05 for well-conceived research,
or is it the case that these studies are “fundamentally-flawed” as has
been commonly claimed by grant reviewers remarking on studies that
are “underpowered” to detect statistical significance?
Conventional sample size justification is based on hypothesis testing, where one rejects the null hypothesis of no difference if P < 0.05, assuming that we know the effect size a priori and the standard deviation (σ) of our dependent variable(s), and that we accept conventional choices for the Type I and Type II errors we may make in our hypothesis testing. Applying traditional frequentist hypothesis-test theory, there are four possible outcomes following a decision to reject or "accept" a null hypothesis:

                                           Truth about H0
Our decision                               H0 is false       H0 is true
We reject H0 and assume Ha to be true      Power (1 − β)     α
We accept H0 and assume it to be true      β                 1 − α
Convention sets Type I error rates (α), the chance of rejecting the null
and claiming a difference exists when it doesn’t, at α = 0.05. Scientific and statistical thinking has long held that Type I errors are the
most serious type of mistake, and so our statistical tests are set very
conservatively to minimize this possibility.
Type II error rates (β) reflect the chance of failing to reject the null
hypothesis and claiming no effect when in fact there really is an effect.
These errors enter sample size calculations through "power" (1 − β), where
power is the probability of rejecting a false null. Traditional sample size
justification sets power to 80% (thus β = 0.20) so that the likelihood of
detecting a true effect is high.
One obvious problem with this approach is that it requires PIs to
know the effect size that they hope to observe before collecting any
data! This requires a large leap of faith from imperfect pilot data or
marginally related manuscripts that are “close enough” to the proposed
research to be useful, yet not so similar as to render the proposed
research unnecessary and duplicative. A second problem is that the
calculations involved rely heavily on a set of assumptions for which even
small deviations can dramatically affect the “answer” (i.e., minimum
required n), thereby opening the door for miscalculations and/or the
erosion of scientific integrity.
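To make that sensitivity concrete, the short sketch below (ours, not part of the original poster) uses the standard normal-approximation formula for a two-sample comparison at two-tailed α = 0.05 and 80% power; the candidate effect sizes are illustrative assumptions, not NASA data.

import math
from scipy.stats import norm

# Sketch: how strongly the "required n" depends on the assumed effect size.
# Two-sample comparison, two-tailed alpha = 0.05, 80% power, normal approximation.
alpha, power = 0.05, 0.80
z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-tailed test
z_beta = norm.ppf(power)            # quantile corresponding to the desired power

for d in (0.3, 0.4, 0.5, 0.6, 0.8):                  # assumed standardized effect sizes
    n_per_group = 2 * ((z_alpha + z_beta) / d) ** 2  # normal-approximation sample size formula
    print(f"effect size {d:.1f}: about {math.ceil(n_per_group)} subjects per group")

Because n scales with 1/d², halving the assumed effect size roughly quadruples the "required" sample size, which is why modest errors in these assumptions can swing the answer so dramatically.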
The Threshold Myth of Minimum-n
One inherent problem with this paradigm is that, while it certainly is possible to perform these calculations, the result is an answer suggesting that as long as n subjects participate in the study, the research will be successful. The implicit interpretation is that there is a threshold for n below which the study provides no value (i.e., it is "fundamentally flawed") and above which it has high value [1].

Figure 1: Qualitative depiction of the threshold myth. Reproduced from [1] with permission.

Even if one considers only traditional value metrics, like power, there simply is no sample size threshold that separates valuable from valueless research, yet the belief is pervasive. In fact, statistical simulations show that ten common metrics of study value follow roughly the same curve relating value to sample size [3].

Figure 2: Shapes of the relationship between projected value and sample size for 10 measures of study value and situations (scale removed for clarity). Curves include (a) Shannon information with n0 = 100, where n0 is the n equivalent of the prior information; (b) reciprocal of confidence interval width; (c) reduction in Bayesian credible interval width when n0 = 100; (d) reduction in squared error versus using the prior mean when n0 = 100; (e) power for a standardized effect size of 0.2; (f) additional cures from a Bayesian clinical trial with prior means (SDs) for cure rates of 0.4 (0.05) versus 0.4 (0.1); (g) gain in Shannon information with n0 = 2; (h) reduction in squared error versus using a single observation; (i) reduction in squared error versus using the prior mean when n0 = 2; (j) reduction in Bayesian credible interval width when n0 = 2. Reproduced from [3] with permission.
Value of Information Theory
Value of Information (VOI) Theory [5] suggests

Cost Efficiency = Projected Value (given n) / Total Study Cost (given n) = Vn / Cn,    (1)

which is a function that could be maximized to arrive at the most cost-efficient sample size. However, Projected Value (Vn) is an elusive construct that is difficult, if not impossible, to define:
• What is the value of a cure? A missed cure?
• What is the value of a negative side effect?
• What is the value of a scientific publication? Are all journals equal?
• Do your values match mine?
• How would you combine all of the positive and negative values to arrive at "study value"?

BMS's Extension of VOI Model
Bacchetti, McCulloch and Segal (BMS) [3] develop methods that avoid the need to quantify projected value by using a surrogate function for Vn that increases with n at least as fast as any reasonable definition of Vn. Their model posits that there is a positive function f(n) and a value n∗ such that

Vn / f(n) ≤ Vn∗ / f(n∗)   for all n ≥ n∗, and    (2)
Cn∗ / f(n∗) ≤ Cn / f(n)   for all n > n∗,    (3)

which together guarantee

Vn∗ / Cn∗ ≥ Vn / Cn   for all n ≥ n∗.    (4)

If f(n) can be chosen so that condition (2) holds for any n under consideration, then choosing n∗ to minimize Cn / f(n) selects the smallest sample size that meets (2) and (3) and guarantees that the most cost-efficient n is met or exceeded.

They propose two choices of f(n) for implementing their strategy,

f(n) = n    (5)
f(n) = √n    (6)

and suggest choosing either nmin, the smallest n that minimizes cost per subject (5), or nroot, the smallest n that minimizes cost per square root of n (6).
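A minimal sketch of how nmin and nroot could be computed once a total-cost function Cn is in hand; the cost function below is a hypothetical convex cost with placeholder coefficients, not an actual Bedrest budget.

import math

def total_cost(n: int) -> float:
    """Hypothetical total study cost C_n: fixed costs, per-subject costs,
    and a term reflecting costs that grow as recruitment becomes harder."""
    return 500_000 + 40_000 * n + 500 * n ** 2

candidates = range(1, 201)  # sample sizes under consideration

# Smallest n minimizing cost per subject, C_n / n -- BMS's n_min, criterion (5).
n_min = min(candidates, key=lambda n: (total_cost(n) / n, n))

# Smallest n minimizing cost per square root of n, C_n / sqrt(n) -- BMS's n_root, criterion (6).
n_root = min(candidates, key=lambda n: (total_cost(n) / math.sqrt(n), n))

print(f"n_min = {n_min}, n_root = {n_root}")  # for this hypothetical cost model, n_root < n_min

Because √n grows more slowly than n, the root-n criterion penalizes large studies more heavily, so it tends to select the smaller, more conservative sample size.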
Applying BMS’ Model to Bedrest Research
Given the limitations of traditional sample size calculations, and particularly the challenges of conducting NASA research in space or analog environments, we argue that
sample size justification for NASA-funded research should incorporate
broader appreciation of what is valuable. We review one approach that
links study cost to n in order to design the most cost-effective study.
We applied these methods to cost data derived with cooperation from
the NASA Flight Analog Project’s Bedrest Research Program. NASA’s
Bedrest Research Program is designed to operate year-round with continuous subject participation unless disrupted by severe weather events
or other unforeseen circumstances. Given the number of beds in the facility, and assuming 70-day bedrest studies are being conducted (plus
the requisite pre- and post-bedrest support), the facility is able to complete approximately 17 bedrest subject trials per year. Some of the primary
costs involved in running this facility include recruitment, screening, dietary staff and supplies, medical personnel, administrative staff, costs
associated with collecting standard measures (e.g. bloodwork, MRI),
and hospital/facility rental. Investigators might also propose collecting data beyond the standard measures, which would incur additional
costs. Some of these costs are essentially fixed per year and others are dependent on n.

Study Costs per n
We chose Bedrest as a motivating example because there are substantial "step" costs associated with the maximum number of bedrest participants the facility can process per year. For example, if the facility were completely dedicated to a single experiment, the costs associated with n = 18 would be substantially more than with n = 17, because that one extra subject would incur a new block of annual costs that are not pro-rated per subject. In reality, the facility runs subjects in multiple studies simultaneously. We are using the best available estimates of Bedrest cost, but we simplify the model by assuming the facility is used for a single study at a time. We argue that this simplification does not invalidate the application of cost-based sample size determination for similar situations.
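To illustrate this "step" structure, the sketch below models total cost as pro-rated per-subject costs plus annual facility costs that accrue for every started block of 17 subjects. The dollar figures are placeholders for illustration only, not the program's actual cost estimates.

import math

SUBJECTS_PER_YEAR = 17

def total_cost(n: int, per_subject: float = 60_000.0, annual_step: float = 2_000_000.0) -> float:
    """Hypothetical total cost C_n: pro-rated subject costs plus a step cost per facility-year."""
    years_needed = math.ceil(n / SUBJECTS_PER_YEAR)  # each started year incurs the full annual cost
    return per_subject * n + annual_step * years_needed

# Adding an 18th subject forces a second facility-year, so the cost jump is large:
for n in (17, 18):
    print(f"n = {n}: C_n = {total_cost(n):,.0f}")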
Study Value as a Function of Step Costs
Figure 3: Study cost per square root of n vs. sample size (n) with different magnitudes of annual costs (i.e., step costs). With no step costs, the cost per square root of n decreases in a smooth curve until an optimal n is reached and then rises. This is in sharp contrast to the case where step costs are involved, in which the jumps from the annual block costs are apparent. Smaller optimal n results when a larger proportion of the total costs are step costs.
Figure 3 above shows four curves representing the relationship between
the cost per square root of n, using BMS's root-n criterion (6), and the proposed sample size. The most cost-effective sample size is located where each
curve reaches its minimum and begins to increase. The dashed curve serves as a reference: it
represents this relationship if there were no step costs associated with
each block of n = 17 subjects. In the reference curve, n = 55 is the bottom of the curve, where either increasing or decreasing n
increases the cost per square root of n. The other three curves show the
relative impact of moderate, large, and substantial step costs, with
cost-effective sample sizes of n = 51, 34, and 17, respectively.
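As a rough check of this pattern, the sketch below sweeps the annual step-cost magnitude and recomputes nroot = argmin Cn/√n. The fixed, per-subject, and step costs are placeholder values chosen only to reproduce the qualitative behavior in Figure 3; they are not the Bedrest program's actual estimates.

import math

SUBJECTS_PER_YEAR = 17
FIXED = 3_300_000.0       # hypothetical fixed (study-wide) costs
PER_SUBJECT = 60_000.0    # hypothetical pro-rated cost per subject

def n_root(annual_step: float, candidates=range(1, 120)) -> int:
    """Smallest n minimizing C_n / sqrt(n) for a given annual step cost."""
    def cost(n):
        return FIXED + PER_SUBJECT * n + annual_step * math.ceil(n / SUBJECTS_PER_YEAR)
    return min(candidates, key=lambda n: (cost(n) / math.sqrt(n), n))

for step in (0.0, 200_000.0, 500_000.0, 2_000_000.0):  # none, moderate, large, substantial
    print(f"annual step cost {step:>11,.0f}: n_root = {n_root(step)}")

With no step cost the minimum sits at an "interior" sample size, while progressively larger step costs pull the most cost-efficient n back toward a multiple of the 17 subjects the facility can run per year.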
Conclusion
We support Bacchetti and colleagues in arguing that, particularly for
highly innovative, novel and expensive research, the dogmatic application of traditional sample size justification is unreasonable, and new ideas
are necessary. Their extension of Value of Information Theory seems as
reasonable a method as any that we are aware of.
References
[1] Bacchetti, P. (2010). Current Sample Size Conventions: Flaws, Harms, and Alternatives. BMC Medicine, 8:17.
[2] Bacchetti, P., Wolf, L.E., Segal, M.R., & McCulloch, C.E. (2005). Ethics and Sample Size. American Journal of Epidemiology, 161(2):105-110.
[3] Bacchetti, P., McCulloch, C.E., & Segal, M.R. (2008). Simple, Defensible Sample Sizes Based on Cost Efficiency. Biometrics, 64:577-594.
[4] Bacchetti, P., Deeks, S.G., & McCune, J.M. (2011). Breaking Free of Sample Size Dogma to Perform Innovative Translational Research. Science Translational Medicine, 3(87):24.
[5] Yokota, F., & Thompson, K.M. (2004). Value of Information Analysis in Environmental Health Risk Management Decisions: Past, Present and Future. Risk Analysis, 24:635-650.