Here are three additional concerns on BWC’ variance component analysis of multireader ROC data [2] To simplify notation we assume that the experiment has only one replication. If both readers and cases are treated as random, the general linear model in BWC can be written as follows: Aijk i rj ck (mr )ij (mc)ik (rc) jk ijk , (1) where Aijk is the pseudovalue for the ROC curve area of the modality i, reader j, and case sample k, r j is the random effect due to reader j, effect due to case sample k, and ijk ck is the random represents the error term when the experiment is repeated under identical condition. Our first concern is related to the validity of independent and additive random terms in model (1). Since Aijk is an estimate based on data from all n cases in the sample, we feel that the assumption that reader, case sample effects, and their interaction terms, along with the error term are independent and have an additive effect on Aijk may be too strong. Here are our reasons. Let us assume that the values of the decision variable generating the ROC curve area follow a linear mixed effect model, Yijkt t Rit Ckt ( MR )ijt ( MC )ikt ( RC ) jkt Eijkt , (2) where Yijkt is the value of the decision variable for the modality i, reader j, case k, and the true disease state t. Here Eijkt is the sampling error term, and each of the random terms has mean zero and constant variance, which does not depend on the true disease status.This is the same decision variable model used by both BWC and Roe and Metz (1997) in their respective simulation studies. Since Aijk is a complicated non-linear function of the decision variable values Yij11 , , Yijn1 , Yij12 , , Yijm 2 , we can show that if the model (2) is true, Aijk will not follow the model (1) with additive and independent random effects and error terms. Furthermore, the model (1) assumes that cov( Aijk , Aij ' k ' ) 0 if j j' and k k'. However, we note that Aijk is a non-linear function of Yijl1 , l 1, Yij 'l1 , l 1, , n, Yijl 2 , l 1,..., m and that Aij ' k ' is a non-linear functions of , n, Yij 'l 2 , l 1,..., m . Given that Yijlt and Yij ' lt for any l are correlated, we conclude that Aijk and Aij ' k ' are also correlated and hence that cov( Aijk , Aij ' k ' ) 0. Our second concern is related to the range of Aijk in the model (1). Since Aijk is an estimate for the ROC curve area, its range is usually between 0 and 1. However, random variables in the right side of model (1) can range between - and +.Therefore, a transformation, like a logit, may be more appropriate for the model (1). Our third concern is related to the concept of the random effect due to a case sample. Although the "variability due to cases" seems to be a natural notion, it is difficulty to define precisely and operationalize this notion. For example, how is this variability different from the sampling variability we estimate with the standard error of the area under the ROC curve? Since this notion may be related to the concept of a covariate, a better way to deal with this issue may be to extend the regression framework for the area under the ROC curve proposed by Dodd and Pepe (2003) for independent data to multi-reader-modality data. As for our concerns on the bootstrap method proposed in BWC, we would like to detail our precise concern on the proposed method. Using the same notation as above, we can write Aijk as a function of Yij11 , , Yijn1 , Yij12 , , Yijm 2 , Aijk hi (Yij11 , , Yijm 2 ) . , Yijn1 , Yij12 , Because Yij11 , , Yijn1 , Yij12 , , Yijm 2 are not independent, to derive valid bootstrap estimates we have to modify the standard bootstrap method in the i.i.d case to account for the correlation structure of the data. To understand the issues associated with the BWC’s bootstrap method, we first need to understand what is the definition of a bootstrap estimator for the variance of Aijk when Yij11 , , Yijm 2 are correlated. (For a more detailed description on this, , Yijn1 , Yij12 , see Shao and Tu (1995)). Let Fij ( y11, , yn1 , y12 , ym 2 ) be the joint distribution of Yij11 , estimator for Fij ( y11, ,Yijn1 ,Yij12 , , yn1 , y12 , ym 2 ) using the data Yij11 , , Yijm 2 , and F n ,m be an , Yijn1 , Yij12 , , Yijm 2 . Then, the bootstrap estimator for the variance of Aijk is given by the following formula: VarBOOT Var*[hi (Yij*11 , where {Yij*11 , , Yijn* 1, Yij*12 , , Yijn* 1 , Yij*12 , variance given Yij11 , * , Yijm 2 )], (3) * , Yijm 2 } is a sample from F n , m , and Var* is the conditional , Yijn1 , Yij12 , , Yijm 2 . Since we cannot directly calculate the bootstrap variance estimator given in (3), we need to use a Monte Carlo method for approximating VarBOOT . If we can generate B independent samples {Yij*11b , , Yijn*b1, Yij*12b , Am*b,n hi (Yij*11b , *b , Yijm 2 } ,b=1,…,B, from the estimated F n , m , we calculate , Yijn*b1 , Yij*12b , *b , Yijm 2 )], and approximate VarBOOT by B B b 1 b 1 B VarBOOT (1/ B) ( An*,bm (1/ B) An*,bm ) 2 . From this definition for the bootstrap variance estimator, we see that the validity of the bootstrap variance estimator depends on two closely related assumptions: (1) we can have a consistent estimator F n , m for the joint distribution Fij ( y11, , yn1 , y12 , , ym 2 ) , and (2) we can effectively sample from the estimated joint distribution F n , m . In the case where Yij11 , Fij ( y11, , yn1 , y12 , , Yijn1 , Yij12 , , ym 2 ) Fij ( y11 ) , Yijm 2 are i.i.d., we can write Fij ( ym 2 ) and estimate the univariate distribution function Fij(.) by its empirical distribution. We can also easily generate a bootstrap sample {Yij*11 , replacement from Yij11 , , Yijn* 1 , Yij*12 , , Yijn1 , Yij12 , * , Yijm 2 } by a simple random sampling with , Yijm 2 . However, when Yij11 , , Yijn1 , Yij12 , , Yijm 2 are correlated, how to deal with these two issues is not obvious due to the multivariate nature of Fij ( y11, , yn1 , y12 , , ym 2 ) . The method taken by BWC for estimating this multivariate distribution and generating a bootstrap sample from it was to treat reader and cases as either random or fixed. However, it is not clear to us what is the property of the resulting bootstrap procedure. In addition, the proposed re-sampling method seems to treat diseased and nondiseased patients in the same way. However, as we know the distribution of diseased patients is different from that of a non-diseased patient, they should be treated differently. Finally, we must point out that, despite our concerns, the BWC methods seem to have undergone extensive development, testing and use without serious problems arising. Thus the method appears to be robust enough that the concerns raised in our book and in this letter do not seem to constitute "fatal flaws". References: 1. S. V. Beiden, R. F. Wagner, G. Campbell. Components-of-variance models and multiplebootstrap experiments. Acad Radiol 2000; 7: 341-349. 2. L. E. Dodd and M. S. Pepe (2003). Semi-parametric regression for the area under the receiver operating characteristic curve. JASA, in press. J. Shao and D. Tu (1995). The Jackknife and Bootstrap. Springer, New York. 3.