Reviewer 1 - BioMed Central

Reviewer 1
1. Line 3 of methods (Abstract): “traditional methods or standard methodology” is still very
unspecific. Why not say e.g. “traditional one-stage sample size calculation” ? Are there
really multiple traditional methods, or is there just one ?
Response: The requested change has been made to the Abstract.
We examined the impact of the distribution of the dichotomous prognostic factor on
power and sample size for the interaction effect using traditional one-stage sample size
calculation.
2. Line 4 of results (Abstract): what does negative misspecification mean ? Say e.g.
misspecification such that the actual distribution was more skewed than planned. After
all, given that your planned distribution was always with p ≤ 0.50, negative
misspecification implied more imbalance. With p > 0.50 it would have meant more
balance.
Response: The requested change has been made to the Abstract.
Misspecification such that the actual distribution of the prognostic factor was more
skewed than planned led to a decrease in power, with the greatest loss seen as the
distribution of the prognostic factor became less balanced.
3. P 1 bottom – p 2 top: This depends on the scaling of the interaction effect. In this paper
the interaction effect is defined in equation (2) as a contrast of four cell means, with
contrast coefficients -1 and +1. However, main effects are usually defined as contrasts
with coefficients -0.5 and +0.5: for the main effect of treatment: (mu11+mu12)/2 minus
(mu21+mu22)/2 for the main effect of the prognostic factor: (mu11+mu21)/2 minus
(mu12+mu22)/2. Using these same coefficients for the interaction, by changing equation
(2) into: (mu11 + mu22)/2 minus (mu12+mu21)/2, we see that the interaction effect and
the standard error of its estimator (i.e. the square root of equation (4)) are both divided
by a factor 2, making them comparable to those for the main effects. Your statement on
p 1-2, referring to Brookes, thus depends strongly on how interaction is scaled. With
your scaling in eq (2) the interaction indeed needs to be twice as large as the main effect
to have the same power. With my scaling, which is usual in ANOVA, it does not.
Response: We have clarified how the interaction effect is defined in the Introduction. The
following text was added on page 1:
Two articles by Brookes and colleagues showed that there is low power to detect an
interaction, scaled as a contrast of cell means, when a study is powered only to detect
the main effect, unless the interaction effect is nearly twice as large as the main
effect [6, 7].
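The reviewer's scaling point can be checked numerically. The cell means below are hypothetical, chosen only to illustrate the factor of 2 between the two contrast scalings:

```python
# Hypothetical cell means (not from the manuscript), mu_ij for
# treatment i and prognostic-factor level j.
mu11, mu12, mu21, mu22 = 10.0, 8.0, 7.0, 9.0

# Interaction as a contrast with coefficients -1/+1 (the paper's equation 2).
paper_scale = (mu11 + mu22) - (mu12 + mu21)

# Interaction as a contrast with coefficients -0.5/+0.5 (usual ANOVA scaling).
anova_scale = (mu11 + mu22) / 2 - (mu12 + mu21) / 2

# The paper's interaction effect is exactly twice the ANOVA-scaled one,
# which is why "twice as large as the main effect" is scale-dependent.
assert paper_scale == 2 * anova_scale
```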
4. P 8 equation (7): - give a proof or reference for this equation, and say that (p1+q) =
actual prevalence. - replace nij-1 with N-4 as df for the t-statistic. P 11 halfway: so n1 =
nt/2 ? if so, then say this. If not, then clarify in view of page 10 saying that interim
analysis was done after the first N/2 subjects (where N = nt, I assume).
Response: We have updated the degrees of freedom in equation 7 and have clarified
the description on page 8 of the Methods section.
The effect of misspecifying the distribution of the prognostic factor was evaluated using
power curves. The formula used by Lachenbruch was extended to incorporate the
Student’s t-distribution [9]. Power for the interaction test was calculated as a function
of the actual prevalence of the prognostic factor (p1 + q) and the magnitude of the
interaction effect using equation 7 below, where Ψ is the cdf of the Student’s
t-distribution.
Power = 1 − Ψ[ t(α/2, N−4) − √( N(p1 + q)(1 − p1 − q)θ² / (4σ²) ) ]
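As a rough check of equation 7, the sketch below evaluates the power curve with the standard normal cdf standing in for the Student's t cdf Ψ and its critical value (a close large-sample approximation); the numeric inputs are illustrative, not taken from the paper:

```python
from statistics import NormalDist

def interaction_power(N, prevalence, theta, sigma, alpha=0.05):
    """Approximate power for the interaction test (equation 7 sketch).

    prevalence is the actual prevalence of the prognostic factor, p1 + q.
    The standard normal quantile/cdf stand in for t(alpha/2, N-4) and Psi,
    a large-sample approximation of the t-based formula.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)  # approximates t(alpha/2, N-4)
    ncp = (N * prevalence * (1 - prevalence) * theta ** 2
           / (4 * sigma ** 2)) ** 0.5
    return 1 - z.cdf(crit - ncp)

# A more skewed actual distribution (prevalence farther from 0.5)
# shrinks prevalence * (1 - prevalence) and therefore the power.
```

For example, with N = 400, θ = 1, σ = 2, power is about 0.71 at a prevalence of 0.5 but only about 0.52 at 0.2, mirroring the loss of power under negative misspecification.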
5. P 12, line -8: equation 10 must be equation (9)
Response: The Reviewer is correct and we have made the change on page 12.
The final sample size n2 and final critical value c2 were chosen so that the conditional
power formula shown in equation 9 was equal to 80%.
6. P 13 halfway: proportion k1 must be p1, or “proportion at level k1”
Response: The Reviewer is correct and we have made the change on Page 13.
Five thousand replications were performed for each combination of the interaction effect
and proportion at level k1.
7. P 13 line 3-4 from bottom: how can you expect quota sampling to have the same power
as the traditional one-stage sampling without quota ? As you say on page 9 quota
sampling prevents misspecification. And as you show on page 15 it preserves power
(table 1) which the traditional method does not (figure 1).
Response: We would expect quota sampling and traditional methods to have the same
power, on average, under no misspecification. We have clarified the statement on
Page 13.
While we did not expect the quota sampling method to have power or type I error
estimates that differ from the traditional one-stage sampling design under no
misspecification, we conducted the simulation study for this study design method to
confirm there was no impact on power and type I error.
8. P 5, line 6-12: make explicit that, given your restriction of the planned distribution to p =
0.10 to 0.50, negative misspecification means less balance/more skew, and positive
misspecification means more balance/less skew. This makes clear why negative
misspecification reduces power and positive misspecification increases power for the
traditional one-stage sample size planning, see your Figure 1 and your equation (4).
Response: We have clarified the definition of misspecification of the prognostic factor
distribution on page 5.
The misspecification of the prognostic factor was defined by the parameter q. The
misspecification could be positive or negative with negative misspecification implying
less balance (more skew) and positive misspecification implying more balance (less
skew).
9. P 6, last lines: add a comment that, as eq (4) shows, the variance of the interaction
effect estimator decreases, and so power increases, as p1 is closer to 0.50. This clarifies
the different effects of negative and positive misspecification in Figure 1.
Response: We have added the comment to Page 7 in the Method section.
It is clear from the equation that as the prevalence of the prognostic factor (p1)
increases toward 0.5, the variance decreases, which implies that the power increases for
a fixed sample size.
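This monotonicity can be verified in one line (our own sketch; the variance expression in equation 4 scales with 1/(p1(1 − p1))):

```python
# Variance factor from equation (4): proportional to 1/(p1 * (1 - p1)),
# minimized (so power maximized) when the factor is balanced at p1 = 0.5.
def variance_factor(p1):
    return 1.0 / (p1 * (1.0 - p1))

# More skew (p1 farther from 0.5) means more variance, hence less power.
assert variance_factor(0.5) < variance_factor(0.3) < variance_factor(0.1)
```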
10. P 7, l 2: j-th factor must be j-th factor level.
Response: We added the word level to Page 7 in the Methods section.
The sample size required for the ith treatment and jth prognostic factor level to detect the
interaction effect described under a balanced design (i.e. p1=0.5) with a two-sided
significance level of α and power equal to 1– β has been previously published by
Lachenbruch [9].
11. P 18, 2nd paragraph: this looks like a rehearsal of the para with the same heading on
page 13, but it is not clear to me what it tries to say. If you use the true design
parameters theta, sigma, p1 in the conditional power formula (9) instead of the interim
estimates, then the x-value in figure 2 will indeed move toward the y=x line, but the
y-value will not, or less so, as it is still the proportion of simulation runs (out of
5000 per dot, see page 13) that lead to rejection of H0.
Response: The paragraph on page 18 that the Reviewer refers to is discussing the
results of validating the conditional power formula. The goal here is to show that the
empirical conditional power is similar to the conditional power calculated using the
formula. Due to some sample variation from the simulations the points do not line up
perfectly on the y=x line but they are very close. With more simulations the points would
likely line up on the y=x line. We have restricted the paragraph on page 18 to make this
more clear.
Figure 1 shows the validation results of the conditional power formula used in this paper.
The points line up along the y=x line, which implies that the conditional power
calculated with the formula was similar to the empirical conditional power. These
results give
us confidence that the sample size re-estimation presented in the next section
performed as expected.
12. P 19, line 2-3: Indeed, table 7 shows that positive misspecification (i.e. actual prevalence
more balanced than planned, giving more power, see Figure 1 and equation (4)) also
leads to a larger sample size than planned. This is rather counterintuitive. Do you have a
good explanation for that ?
Response: The mean sample size increased when using the sample size re-estimation
procedure when there was a positive misspecification. While we agree with the
Reviewer that this seems counterintuitive, there is a reasonable explanation for this
phenomenon. Simulations of the sample size re-estimation procedure produce a few
results with a very large re-estimated sample size that inflates the mean. In table 7, we
also report the median sample size, which was exactly equal to the original planned
sample size in all simulations, except for one scenario. In this scenario (q=+0.15,
p1=0.10, θ=15), the median sample size was half of the original planned sample size.
We note this phenomenon on pages 22-23 of the Discussion.
In particular, the mean sample size tended to be larger than the original planned sample
size in situations where the misspecification led to a more balanced design.
theory, this should increase the power and reduce the sample size needed. However, in
some of the simulations in which the conditional power was low, but did not reach the
futility stopping rule, the new sample size estimated was very large, resulting in an
outlier. These outliers inflated the final mean sample size. To overcome this, we also
reported the median final sample size of the simulations, which was less than or equal to
the originally planned sample size.
13. P 22, line 5: what do you mean with “the most consistent” ? In what respect and
compared to what ? As far as I can tell from your tables, quota sampling gives the most
consistent results (power and type I error risk).
Response: We have clarified this statement on page 22 of the Discussion.
This method resulted in better overall power and type I error estimates than the
modified quota sampling procedure.
14. P 5, line 1-4: I still think this notation is unnecessarily complicated for a dichotomous
prognostic factor. Why not just say: p = the proportion persons with value 1 on the factor,
and 1-p is the proportion with value 0 ?
Response: We appreciate the Reviewer’s concern, but have elected to keep the notation
of p1 so that the reader knows which level of the prognostic factor we refer to in the
paper.
15. P 8 l 5 (step 5): say that this step corrects for the imbalance in the prognostic factor by
first multiplying with the ratio p(1-p) / p1(1-p1), where p = 0.50 (balanced case), and next
multiplying this with 4 because there are four cells in the design.
Response: We appreciate the reviewer’s suggestion, but we think this adds a little bit of
complication to the formula. We have added some description to step 5 to make the
procedure clearer.
5. Lastly, to correct for imbalance in the prognostic factor, multiply n*ij by
1/(p1(1 − p1)) to obtain the final total sample size N.
16. P 16 halfway: in short, if the actual prevalence is closer to 50%, the power is higher than
planned, and if the actual prevalence is farther away from 50%, the power is lower than
planned, see also Figure 1 and equation (4).
Response: We have added the Reviewer’s suggested revision to page 16 in the Results
section.
In short, if the actual prevalence is closer to 50%, the power is higher than planned, and
if the actual prevalence is farther away from 50%, the power is lower than planned, see
also Figure 1 and equation 4.
17. P 18, line 7-8: traditional method, not methods, I guess. Also, refer to Figure 1 here
since there we see what the traditional power under misspecification was.
Response: We now refer to traditional method(s) as the traditional one-stage design. We
have added the reference to Figure 1 as suggested.
18. P 18 line -9: as table 6 shows, the type I error was even well below 5%, suggesting room
for power gain by changing the critical values c1 and c2.
Response: We have updated the sentence on page 19 of the Results section.
The empirical type I error was below 5% for all combinations of θ, planned distribution
of the prognostic factor, and misspecification of the distribution of the prognostic
factor, suggesting room for a gain in power by changing the critical values c1 and c2.
19. P 20, line 5: replace “especially when” with “because”.
Response: We have replaced “especially when” with “because” on page 21 of the
Discussion.
20. P 21, line 2: was compromised, but rarely substantially (Table 3).
Response: We have modified the text on page 21-22 of the Discussion.
However, when the percentage switching was small and there was a negative
misspecification of the distribution of the prognostic factor, the power was compromised,
but rarely substantially.
21. Figure 1: use a different marker for each level of misspecification. In my black-and-white
print I can hardly distinguish colors.
Response: We have updated Figure 1 so that the markers are distinct and the figure is
black-and-white friendly.
Reviewer 2
1. Equation (5) implies that n_{ij} are equal for all i and j, which is only true when p_1=0.5.
Response: We agree with the reviewer’s comment and have modified the sentence
describing the equation. The following text can be found on page x of the Methods
section:
The sample size required for the ith treatment and jth prognostic factor level to detect the
interaction effect described under a balanced design (i.e. p1=0.5) with a two-sided
significance level of α and power equal to 1– β has been previously published by
Lachenbruch [9].