Reviewer 1 - BioMed Central

Reviewer 1
1. Line 3 of methods (Abstract): “traditional methods or standard methodology” is still very
unspecific. Why not say e.g. “traditional one-stage sample size calculation” ? Are there
really multiple traditional methods, or is there just one ?
Response: The requested change has been made to the Abstract.
We examined the impact of the distribution of the dichotomous prognostic factor on
power and sample size for the interaction effect using traditional one-stage sample size
calculation.
2. Line 4 of results (Abstract): what does negative misspecification mean ? Say e.g.
misspecification such that the actual distribution was more skewed than planned. After
all, given that your planned distribution was always with p ≤ 0.50, negative
misspecification implied more imbalance. With p > 0.50 it would have meant more
balance.
Response: The requested change has been made to the Abstract.
Misspecification such that the actual distribution of the prognostic factor was more
skewed than planned led to a decrease in power, with the greatest loss seen as the
distribution of the prognostic factor became less balanced.
3. P 1 bottom – p 2 top: This depends on the scaling of the interaction effect. In this paper
the interaction effect is defined in equation (2) as a contrast of four cell means, with
contrast coefficients -1 and +1. However, main effects are usually defined as contrasts
with coefficients -0.5 and +0.5: for the main effect of treatment: (mu11+mu12)/2 minus
(mu21+mu22)/2 for the main effect of the prognostic factor: (mu11+mu21)/2 minus
(mu12+mu22)/2. Using these same coefficients for the interaction, by changing equation
(2) into: (mu11 + mu22)/2 minus (mu12+mu21)/2, we see that the interaction effect and
the standard error of its estimator (i.e. the square root of equation (4)) are both divided
by a factor 2, making them comparable to those for the main effects. Your statement on
p 1-2, referring to Brookes, thus depends strongly on how interaction is scaled. With
your scaling in eq (2) the interaction indeed needs to be twice as large as the main effect
to have the same power. With my scaling, which is usual in ANOVA, it does not.
Response: We have clarified how the interaction effect is defined in the Introduction. The
following text was added on page 1:
Two articles by Brookes and colleagues showed that there is low power to detect an
interaction, scaled as a contrast of cell means, when a study is powered only to detect
the main effect, unless the interaction effect is nearly twice as large as the main
effect [6, 7].
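The reviewer's scaling point can be checked numerically. The cell means below are hypothetical, chosen only to illustrate the factor of 2 between the two contrast scalings:

```python
# Hypothetical cell means (not from the manuscript), mu_ij for
# treatment i and prognostic-factor level j.
mu11, mu12, mu21, mu22 = 10.0, 8.0, 7.0, 9.0

# Interaction as a contrast with coefficients -1/+1 (the paper's equation 2).
paper_scale = (mu11 + mu22) - (mu12 + mu21)

# Interaction as a contrast with coefficients -0.5/+0.5 (usual ANOVA scaling).
anova_scale = (mu11 + mu22) / 2 - (mu12 + mu21) / 2

# The paper's interaction effect is exactly twice the ANOVA-scaled one,
# which is why "twice as large as the main effect" is scale-dependent.
assert paper_scale == 2 * anova_scale
```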
4. P 8 equation (7): - give a proof or reference for this equation, and say that (p1+q) =
actual prevalence. - replace nij-1 with N-4 as df for the t-statistic. P 11 halfway: so n1 =
nt/2 ? if so, then say this. If not, then clarify in view of page 10 saying that interim
analysis was done after the first N/2 subjects (where N = nt, I assume).
Response: We have updated the degrees of freedom in equation 7 and have clarified
the description on page 8 of the Methods section.
The effect of misspecifying the distribution of the prognostic factor was evaluated using
power curves. The formula used by Lachenbruch was extended to incorporate the
Student’s t-distribution [9]. Power for the interaction test was calculated as a function
of the actual prevalence of the prognostic factor (p1 + q) and the magnitude of the
interaction effect using equation 7 below, where Ψ is the cdf of the Student’s
t-distribution.
Power = 1 − Ψ[ t(α/2, N−4) − √( N(p1 + q)(1 − p1 − q)θ² / (4σ²) ) ]
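As a rough check of equation 7, the sketch below evaluates the power curve with the standard normal cdf standing in for the Student's t cdf Ψ and its critical value (a close large-sample approximation); the numeric inputs are illustrative, not taken from the paper:

```python
from statistics import NormalDist

def interaction_power(N, prevalence, theta, sigma, alpha=0.05):
    """Approximate power for the interaction test (equation 7 sketch).

    prevalence is the actual prevalence of the prognostic factor, p1 + q.
    The standard normal quantile/cdf stand in for t(alpha/2, N-4) and Psi,
    a large-sample approximation of the t-based formula.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # approximates t(alpha/2, N-4)
    ncp = (N * prevalence * (1 - prevalence) * theta ** 2
           / (4 * sigma ** 2)) ** 0.5
    return 1 - z.cdf(crit - ncp)

# A more skewed actual distribution (prevalence farther from 0.5)
# shrinks prevalence * (1 - prevalence) and therefore the power.
```

For example, with N = 400, θ = 1, σ = 2, power is about 0.71 at a prevalence of 0.5 but only about 0.52 at 0.2, mirroring the loss of power under negative misspecification.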
5. P 12, line -8: equation 10 must be equation (9)
Response: The Reviewer is correct and we have made the change on page 12.
The final sample size n2 and final critical value c2 were chosen so that the conditional
power formula shown in equation 9 was equal to 80%.
6. P 13 halfway: proportion k1 must be p1, or “proportion at level k1”
Response: The Reviewer is correct and we have made the change on Page 13.
Five thousand replications were performed for each combination of the interaction effect
and proportion at level k1.
7. P 13 line 3-4 from bottom: how can you expect quota sampling to have the same power
as the traditional one-stage sampling without quota ? As you say on page 9 quota
sampling prevents misspecification. And as you show on page 15 it preserves power
(table 1) which the traditional method does not (figure 1).
Response: We would expect quota sampling and traditional methods to have the same
power, on average, under no misspecification. We have clarified the statement on
Page 13.
While we did not expect the quota sampling method to have power or type I error
estimates that differ from the traditional one-stage sampling design under no
misspecification, we conducted the simulation study for this study design method to
confirm there was no impact on power and type I error.
8. P 5, line 6-12: make explicit that, given your restriction of the planned distribution to p =
0.10 to 0.50, negative misspecification means less balance/more skew, and positive
misspecification means more balance/less skew. This makes clear why negative
misspecification reduces power and positive misspecification increases power for the
traditional one-stage sample size planning, see your Figure 1 and your equation (4).
Response: We have clarified the definition of misspecification of the prognostic factor
distribution on page 5.
The misspecification of the prognostic factor was defined by the parameter q. The
misspecification could be positive or negative with negative misspecification implying
less balance (more skew) and positive misspecification implying more balance (less
skew).
9. P 6, last lines: add a comment that, as eq (4) shows, the variance of the interaction
effect estimator decreases, and so power increases, as p1 is closer to 0.50. This clarifies
the different effects of negative and positive misspecification in Figure 1.
Response: We have added the comment to Page 7 in the Method section.
It is clear from the equation that as the prevalence of the prognostic factor (p1)
increases toward 0.5, the variance decreases, which implies that the power increases for
a fixed sample size.
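This monotonicity can be verified in one line (our own sketch; the variance expression in equation 4 scales with 1/(p1(1 − p1))):

```python
# Variance factor from equation (4): proportional to 1/(p1 * (1 - p1)),
# minimized (so power maximized) when the factor is balanced at p1 = 0.5.
def variance_factor(p1):
    return 1.0 / (p1 * (1.0 - p1))

# More skew (p1 farther from 0.5) means more variance, hence less power.
assert variance_factor(0.5) < variance_factor(0.3) < variance_factor(0.1)
```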
10. P 7, l 2: j-th factor must be j-th factor level.
Response: We added the word level to Page 7 in the Methods section.
The sample size required for the ith treatment and jth prognostic factor level to detect the
interaction effect described under a balanced design (i.e. p1=0.5) with a two-sided
significance level of α and power equal to 1– β has been previously published by
Lachenbruch [9].
11. P 18, 2nd paragraph: this looks like a rehearsal of the para with the same heading on
page 13, but it is not clear to me what it tries to say. If you use the true design
parameters theta, sigma, p1 in the conditional power formula (9) instead of the interim
estimates, then the x-value in figure 2 will indeed move toward the y=x line, but the
y-value will not, or less so, as it is still the proportion of simulation runs (out of
5000 per dot, see page 13) that lead to rejection of H0.
Response: The paragraph on page 18 that the Reviewer refers to is discussing the
results of validating the conditional power formula. The goal here is to show that the
empirical conditional power is similar to the conditional power calculated using the
formula. Due to some sample variation from the simulations the points do not line up
perfectly on the y=x line but they are very close. With more simulations the points would
likely line up on the y=x line. We have restricted the paragraph on page 18 to make this
more clear.
Figure 1 shows the validation results of the conditional power formula used in this paper.
The points line up along the y=x line, which implies that the conditional power
calculated with the formula was similar to the empirical conditional power. These
results give
us confidence that the sample size re-estimation presented in the next section
performed as expected.
12. P 19, line 2-3: Indeed, table 7 shows that positive misspecification (i.e. actual prevalence
more balanced than planned, giving more power, see Figure 1 and equation (4)) also
leads to a larger sample size than planned. This is rather counterintuitive. Do you have a
good explanation for that ?
Response: The mean sample size increased when using the sample size re-estimation
procedure when there was a positive misspecification. While we agree with the
Reviewer that this seems counterintuitive, there is a reasonable explanation for this
phenomenon. Simulations of the sample size re-estimation procedure produce a few
results with a very large re-estimated sample size that inflates the mean. In table 7, we
also report the median sample size, which was exactly equal to the original planned
sample size in all simulations, except for one scenario. In this scenario (q=+0.15,
p1=0.10, θ=15), the median sample size was half of the original planned sample size.
We note this phenomenon on pages 22-23 of the Discussion.
In particular, the mean sample size tended to be larger than the original planned sample
size in situations where the misspecification led to a more balanced design.
theory, this should increase the power and reduce the sample size needed. However, in
some of the simulations in which the conditional power was low, but did not reach the
futility stopping rule, the new sample size estimated was very large, resulting in an
outlier. These outliers inflated the final mean sample size. To overcome this, we also
reported the median final sample size of the simulations, which was less than or equal to
the originally planned sample size.
13. P 22, line 5: what do you mean with “the most consistent” ? In what respect and
compared to what ? As far as I can tell from your tables, quota sampling gives the most
consistent results (power and type I error risk).
Response: We have clarified this statement on page 22 of the Discussion.
This method resulted in better overall power and type I error estimates than the
modified quota sampling procedure.
14. P 5, line 1-4: I still think this notation is unnecessarily complicated for a dichotomous
prognostic factor. Why not just say: p = the proportion persons with value 1 on the factor,
and 1-p is the proportion with value 0 ?
Response: We appreciate the Reviewer’s concern, but have elected to keep the notation
of p1 so that the reader knows which level of the prognostic factor we refer to in the
paper.
15. P 8 l 5 (step 5): say that this step corrects for the imbalance in the prognostic factor by
first multiplying with the ratio p(1-p) / p1(1-p1), where p = 0.50 (balanced case), and next
multiplying this with 4 because there are four cells in the design.
Response: We appreciate the reviewer’s suggestion, but we think this adds a little bit of
complication to the formula. We have added some description to step 5 to make the
procedure clearer.
5. Lastly, to correct for imbalance in the prognostic factor, multiply n*ij by
1/(p1(1 − p1)) to obtain the final total sample size N.
16. P 16 halfway: in short, if the actual prevalence is closer to 50%, the power is higher than
planned, and if the actual prevalence is farther away from 50%, the power is lower than
planned, see also Figure 1 and equation (4).
Response: We have added the Reviewer’s suggested revision to page 16 in the Results
section.
In short, if the actual prevalence is closer to 50%, the power is higher than planned, and
if the actual prevalence is farther away from 50%, the power is lower than planned, see
also Figure 1 and equation 4.
17. P 18, line 7-8: traditional method, not methods, I guess. Also, refer to Figure 1 here
since there we see what the traditional power under misspecification was.
Response: We now refer to traditional method(s) as the traditional one-stage design. We
have added the reference to Figure 1 as suggested.
18. P 18 line -9: as table 6 shows, the type I error was even well below 5%, suggesting room
for power gain by changing the critical values c1 and c2.
Response: We have updated the sentence on page 19 of the Results section.
The empirical type I error was below 5% for all combinations of θ, planned distribution
of the prognostic factor, and misspecification of the distribution of the prognostic
factor, suggesting room for a gain in power by changing the critical values c1 and c2.
19. P 20, line 5: replace “especially when” with “because”.
Response: We have replaced “especially when” with “because” on page 21 of the
Discussion.
20. P 21, line 2: was compromised, but rarely substantially (Table 3).
Response: We have modified the text on page 21-22 of the Discussion.
However, when the percentage switching was small and there was a negative
misspecification of the distribution of the prognostic factor, the power was compromised,
but rarely substantially.
21. Figure 1: use a different marker for each level of misspecification. In my black-and-white
print I can hardly distinguish colors.
Response: We have updated Figure 1 so that the markers are distinct and the figure is
black-and-white friendly.
Reviewer 2
1. Equation (5) implies that n_{ij} are equal for all i and j, which is only true when p_1=0.5.
Response: We agree with the reviewer’s comment and have modified the sentence
describing the equation. The following text can be found on page x of the Methods
section:
The sample size required for the ith treatment and jth prognostic factor level to detect the
interaction effect described under a balanced design (i.e. p1=0.5) with a two-sided
significance level of α and power equal to 1– β has been previously published by
Lachenbruch [9].