file - BioMed Central

Methods

Estimation of unpaired SSMD

SSMD has recently been proposed for measuring the magnitude of difference between two populations [1]. Let random variables $P_1$ and $P_2$ denote two populations of interest and $D$ denote the difference between $P_1$ and $P_2$. Suppose $P_1$ has mean $\mu_1$ and variance $\sigma_1^2$, and $P_2$ has mean $\mu_2$ and variance $\sigma_2^2$. The covariance between these two populations is $\sigma_{12}$. Then SSMD (denoted as $\beta$) is defined as the ratio of the mean $\mu_D$ to the standard deviation $\sigma_D$ of the difference $D$, namely

$$\beta = \frac{\mu_D}{\sigma_D} = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}}.$$

When the two populations are independent, we are interested in the unpaired difference between the two populations. The SSMD corresponding to the unpaired difference is called "unpaired SSMD", which is

$$\beta = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}.$$

If the two independent populations have equal variances (namely $\sigma_1^2 = \sigma_2^2 = \sigma^2$), then

$$\beta = \frac{\mu_1 - \mu_2}{\sqrt{2\sigma^2}}.$$

SSMD as defined above is a population parameter, which needs to be estimated from observed samples. Suppose we have one sample (with sample size $n_1$, sample mean $\bar{X}_1$ and sample standard deviation $s_1$) from population $P_1$ and another independent sample (with $n_2$, $\bar{X}_2$ and $s_2$) from population $P_2$. Let $N = n_1 + n_2$. Zhang [1] derived the maximum-likelihood estimate (MLE) and the method-of-moment (MM) estimate of unpaired SSMD when the two compared groups have normal distributions with unequal variances. When the two compared groups have normal distributions with equal variance, the uniformly minimal variance unbiased estimate (UMVUE) of unpaired SSMD [2] is

$$\hat{\beta}_{UMVUE} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{2}{K}\left((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right)}}, \qquad K = 2\left(\frac{\Gamma\left(\frac{N-2}{2}\right)}{\Gamma\left(\frac{N-3}{2}\right)}\right)^2 \approx N - 3.5,$$

when $n_1 \ge 2$, $n_2 \ge 2$.
It is well known that if two random variables $X$ and $V$ are independently distributed with $X \sim N(\mu, 1)$ and $V \sim \chi^2(p)$, then the ratio $Y = X / \sqrt{V/p}$ has a noncentral t-distribution with $p$ degrees of freedom and noncentrality parameter $\mu$. We know that

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)\sigma^2\right) \quad\text{and}\quad \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(N - 2),$$

namely

$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\sigma^2}} \sim N\left(\sqrt{\frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}}\,\beta,\ 1\right).$$

Therefore,

$$T = \frac{(\bar{X}_1 - \bar{X}_2)\Big/\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{N - 2}}} \sim \text{noncentral } t\left(N - 2,\ \sqrt{\frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}}\,\beta\right).$$

In primary HTS experiments, $n_1 = 1$ for most investigated siRNAs, and $s_1^2$ does not exist when $n_1 = 1$. In this case, writing $X_{11}$ for the single observation from $P_1$, the UMVUE of unpaired SSMD is

$$\hat{\beta}_{UMVUE} = \frac{X_{11} - \bar{X}_2}{\sqrt{\frac{2}{K}(n_2 - 1)s_2^2}}, \qquad K = 2\left(\frac{\Gamma\left(\frac{n_2 - 1}{2}\right)}{\Gamma\left(\frac{n_2 - 2}{2}\right)}\right)^2 \approx n_2 - 2.5.$$

We know that

$$X_{11} - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \left(1 + \tfrac{1}{n_2}\right)\sigma^2\right) \quad\text{and}\quad \frac{(n_2 - 1)s_2^2}{\sigma^2} \sim \chi^2(n_2 - 1),$$

namely

$$\frac{X_{11} - \bar{X}_2}{\sqrt{\left(1 + \frac{1}{n_2}\right)\sigma^2}} \sim N\left(\sqrt{\frac{2}{1 + \frac{1}{n_2}}}\,\beta,\ 1\right).$$

Therefore,

$$T = \frac{X_{11} - \bar{X}_2}{\sqrt{\left(1 + \frac{1}{n_2}\right)s_2^2}} \sim \text{noncentral } t\left(n_2 - 1,\ \sqrt{\frac{2}{1 + \frac{1}{n_2}}}\,\beta\right).$$

If we set $(n_1 - 1)s_1^2 = 0$ when $n_1 = 1$, then for both $n_1 = 1$ and $n_1 \ge 2$ (i.e., $n_1 \ge 1$),

$$T = \frac{(\bar{X}_1 - \bar{X}_2)\Big/\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{N - 2}}} \sim \text{noncentral } t(\nu, b\beta)$$

and

$$\hat{\beta}_{UMVUE} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{2}{K}\left((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right)}} = kT,$$

where

$$K = 2\left(\frac{\Gamma\left(\frac{N-2}{2}\right)}{\Gamma\left(\frac{N-3}{2}\right)}\right)^2 \approx N - 3.5, \quad \nu = N - 2, \quad b = \sqrt{\frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}} \quad\text{and}\quad k = \sqrt{\frac{K}{2(N - 2)}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

Estimation of paired SSMD

When two populations are correlated, we are usually interested in the paired difference. The SSMD corresponding to the paired difference is called "paired SSMD". Suppose we observe $n$ pairs of samples, $(X_{11}, X_{21}), (X_{12}, X_{22}), \ldots, (X_{1n}, X_{2n})$, from populations $P_1$ and $P_2$ respectively.
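Let $D_j$ be the difference between the $j$th pair of samples, namely $D_j = X_{1j} - X_{2j}$. Let $\bar{D}$ and $s_D$ be the sample mean and sample standard deviation of $D$ respectively, namely

$$\bar{D} = \frac{1}{n}\sum_{j=1}^{n} D_j \quad\text{and}\quad s_D^2 = \frac{1}{n - 1}\sum_{j=1}^{n}(D_j - \bar{D})^2.$$

Assume that $D$ is normally distributed, namely $D \sim N(\mu_D, \sigma_D^2)$. Then the MM, MLE and UMVUE of the paired SSMD are

$$\hat{\beta}_{MM} = \frac{\bar{D}}{s_D}, \qquad \hat{\beta}_{MLE} = \sqrt{\frac{n}{n - 1}}\,\frac{\bar{D}}{s_D} \qquad\text{and}\qquad \hat{\beta}_{UMVUE} = \sqrt{\frac{2}{n - 1}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}\,\frac{\bar{D}}{s_D}$$

respectively. The proof for the MM and MLE estimates is trivial. The proof for the UMVUE is as follows.

When $D \sim N(\mu_D, \sigma_D^2)$, the following properties hold: $(\bar{D}, s_D^2)$ is a complete sufficient statistic of $(\mu_D, \sigma_D^2)$; $\bar{D}$ and $s_D^2$ are independent of each other; and $(n - 1)s_D^2/\sigma_D^2$ is distributed as $\chi^2(n - 1)$. Based on these properties, we have

$$E\left[\left(\frac{(n - 1)s_D^2}{\sigma_D^2}\right)^{-1/2}\right] = \int_0^{\infty} x^{-1/2}\,\frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\,x^{\frac{n-1}{2} - 1}e^{-x/2}\,dx = \frac{\Gamma\left(\frac{n-2}{2}\right)}{\sqrt{2}\,\Gamma\left(\frac{n-1}{2}\right)}$$

and

$$E\left[\frac{\sqrt{2}\,\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}\,\frac{\bar{D}}{\sqrt{(n - 1)s_D^2}}\right] = \frac{\sqrt{2}\,\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}\,E[\bar{D}]\,E\left[\frac{1}{\sqrt{(n - 1)s_D^2}}\right] = \frac{\mu_D}{\sigma_D} = \beta.$$

Set

$$\hat{\beta} = \sqrt{\frac{2}{n - 1}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}\,\frac{\bar{D}}{s_D}.$$

Then $\hat{\beta}$ is a function of the complete sufficient statistic $(\bar{D}, s_D^2)$ and $\hat{\beta}$ is an unbiased estimate of $\beta$. Thus, $\hat{\beta}$ is a UMVUE of $\beta$.

The $D_j$'s are independently distributed as $N(\mu_D, \sigma_D^2)$, so $\bar{D} \sim N(\mu_D, \tfrac{1}{n}\sigma_D^2)$, namely

$$\frac{\sqrt{n}\,\bar{D}}{\sigma_D} \sim N(\sqrt{n}\,\beta,\ 1), \quad\text{and}\quad \frac{(n - 1)s_D^2}{\sigma_D^2} \sim \chi^2(n - 1).$$

Therefore,

$$\frac{\sqrt{n}\,\bar{D}/\sigma_D}{\sqrt{\dfrac{(n - 1)s_D^2/\sigma_D^2}{n - 1}}} \sim \text{noncentral } t(n - 1,\ \sqrt{n}\,\beta), \quad\text{namely}\quad \frac{\sqrt{n}\,\bar{D}}{s_D} \sim \text{noncentral } t(n - 1,\ \sqrt{n}\,\beta).$$

Let $T = \sqrt{n}\,\bar{D}/s_D$. Then $T \sim \text{noncentral } t(\nu, b\beta)$ and $\hat{\beta}_{UMVUE} = kT$, where

$$\nu = n - 1, \quad b = \sqrt{n} \quad\text{and}\quad k = \sqrt{\frac{2}{n(n - 1)}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}.$$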
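The estimators derived above can be illustrated with a short Python sketch. It assumes normally distributed data with equal variance in the unpaired case, uses the convention $(n_1 - 1)s_1^2 = 0$ when $n_1 = 1$, and evaluates the Gamma ratios through log-gamma for numerical stability. The function names (`gamma_ratio`, `ssmd_unpaired_umvue`, `ssmd_paired`) are hypothetical and are not taken from the original work.

```python
import math
from statistics import mean, stdev, variance


def gamma_ratio(a, b):
    """Return Gamma(a) / Gamma(b), computed via log-gamma for stability."""
    return math.exp(math.lgamma(a) - math.lgamma(b))


def ssmd_unpaired_umvue(x1, x2):
    """UMVUE of unpaired SSMD under normality with equal variance.

    Uses the convention (n1 - 1) * s1^2 = 0 when n1 == 1.
    """
    n1, n2 = len(x1), len(x2)
    N = n1 + n2
    if n1 < 1 or n2 < 2 or N < 4:
        raise ValueError("need n1 >= 1, n2 >= 2 and n1 + n2 >= 4")
    ss1 = (n1 - 1) * variance(x1) if n1 > 1 else 0.0
    ss2 = (n2 - 1) * variance(x2)
    # K = 2 * (Gamma((N-2)/2) / Gamma((N-3)/2))^2, roughly N - 3.5
    K = 2.0 * gamma_ratio((N - 2) / 2, (N - 3) / 2) ** 2
    return (mean(x1) - mean(x2)) / math.sqrt(2.0 / K * (ss1 + ss2))


def ssmd_paired(x1, x2):
    """Return (MM, MLE, UMVUE) estimates of paired SSMD from paired samples."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    if n < 3:
        raise ValueError("need at least 3 pairs for the UMVUE")
    ratio = mean(d) / stdev(d)
    mm = ratio
    mle = math.sqrt(n / (n - 1)) * ratio
    umvue = math.sqrt(2.0 / (n - 1)) * gamma_ratio((n - 1) / 2, (n - 2) / 2) * ratio
    return mm, mle, umvue
```

For instance, `ssmd_unpaired_umvue([5.0], [1.0, 2.0, 3.0])` covers the single-replicate case $n_1 = 1$ that is common in primary screens, while `ssmd_paired` expects two equal-length sequences of matched measurements.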
Confidence interval of SSMD estimates

Based on the estimates of SSMD and their distributions derived above, we have $T \sim \text{noncentral } t(\nu, b\beta)$ and $\hat{\beta}_{UMVUE} = kT$ for both unpaired and paired SSMDs, although

$$\nu = n_1 + n_2 - 2, \quad b = \sqrt{\frac{2}{\frac{1}{n_1} + \frac{1}{n_2}}} \quad\text{and}\quad k = \frac{\Gamma\left(\frac{n_1 + n_2 - 2}{2}\right)}{\Gamma\left(\frac{n_1 + n_2 - 3}{2}\right)}\sqrt{\frac{1}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

in unpaired SSMD, and

$$\nu = n - 1, \quad b = \sqrt{n} \quad\text{and}\quad k = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)}\sqrt{\frac{2}{n(n - 1)}}$$

in paired SSMD. Let $F_{t(\nu, b\beta)}(\cdot)$ be the cumulative distribution function of noncentral $t(\nu, b\beta)$ and $T_{obs}$ be the observed value of $T$. Because $T \sim \text{noncentral } t(\nu, b\beta)$, we can find $\beta_L$ and $\beta_U$ such that

$$F_{t(\nu, b\beta_L)}(T_{obs}) = 1 - \frac{\alpha}{2} \quad\text{and}\quad F_{t(\nu, b\beta_U)}(T_{obs}) = \frac{\alpha}{2}.$$

Then $(\beta_L, \beta_U)$ is a $1 - \alpha$ confidence interval of SSMD.

The variance of a noncentral $t(\nu, \delta)$ is

$$\frac{\nu(1 + \delta^2)}{\nu - 2} - \frac{\nu\delta^2}{2}\left(\frac{\Gamma\left(\frac{\nu-1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\right)^2.$$

Thus, the variance of $T$ is

$$\text{Var}(T) = \frac{\nu\left(1 + (b\beta)^2\right)}{\nu - 2} - \frac{\nu(b\beta)^2}{2}\left(\frac{\Gamma\left(\frac{\nu-1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\right)^2.$$

Using $\hat{\beta}_{UMVUE} = kT$, the variance of $\hat{\beta}_{UMVUE}$ is

$$\text{Var}(\hat{\beta}_{UMVUE}) = k^2\,\text{Var}(T) = k^2\left[\frac{\nu\left(1 + (b\beta)^2\right)}{\nu - 2} - \frac{\nu(b\beta)^2}{2}\left(\frac{\Gamma\left(\frac{\nu-1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)}\right)^2\right].$$

False negative rate and restricted false positive rate

Let us focus on the situation where we want to select siRNAs with large positive effects, namely the siRNAs with $\beta \ge c_1$, where $\beta$ denotes SSMD and $c_1$ is the preset lowest value for large effects. In this situation, the FNR is the probability that we conclude $\beta < c_1$ whereas actually $\beta \ge c_1$. The maximum FNR in a decision rule is called the false negative level (FNL). Traditionally, the false positive rate is the probability that we conclude $\beta \ge c_1$ whereas actually $\beta < c_1$. However, in RNAi HTS experiments, scientists are usually interested in controlling the probability of concluding $\beta \ge c_1$ given $\beta \le c_2$, where $0 \le c_2 < c_1$. This probability is called the restricted false positive rate (RFPR) [2, 3]. The maximum RFPR in a decision rule is called the restricted false positive level (RFPL).
For example, for an observed SSMD value $\beta_{obs}$ ($\beta_{obs} > 0$), if we select all the siRNAs with $\hat{\beta} \ge \beta_{obs}$ as hits, the FNR in this process (for $c_1 > c_2 \ge 0$) is

$$FNR = \Pr\left(\hat{\beta} < \beta_{obs} \mid \beta \ge c_1\right) = \Pr\left(T < \tfrac{\beta_{obs}}{k} \mid \beta \ge c_1\right) = \Pr\left(t(\nu, b\beta) < \tfrac{\beta_{obs}}{k} \mid \beta \ge c_1\right) \le F_{t(\nu, bc_1)}\left(\tfrac{\beta_{obs}}{k}\right)$$

and the RFPR in this process is

$$RFPR = \Pr\left(\hat{\beta} \ge \beta_{obs} \mid \beta \le c_2\right) = \Pr\left(T \ge \tfrac{\beta_{obs}}{k} \mid \beta \le c_2\right) = \Pr\left(t(\nu, b\beta) \ge \tfrac{\beta_{obs}}{k} \mid \beta \le c_2\right) \le 1 - F_{t(\nu, bc_2)}\left(\tfrac{\beta_{obs}}{k}\right);$$

thus, the FNL and RFPL in this process are $F_{t(\nu, bc_1)}\left(\tfrac{\beta_{obs}}{k}\right)$ and $1 - F_{t(\nu, bc_2)}\left(\tfrac{\beta_{obs}}{k}\right)$ respectively.

Similarly, for an observed SSMD value $\beta_{obs}$ ($\beta_{obs} < 0$), if we select all the siRNAs with $\hat{\beta} \le \beta_{obs}$ as hits, the FNL and RFPL in this process (for $c_1 < c_2 \le 0$) are $1 - F_{t(\nu, bc_1)}\left(\tfrac{\beta_{obs}}{k}\right)$ and $F_{t(\nu, bc_2)}\left(\tfrac{\beta_{obs}}{k}\right)$ respectively.

Consequently, when we use the SSMD-based ranking method for selecting siRNAs with a large positive value, in the process that we select all the $m$ siRNAs with $\hat{\beta} \ge \beta^*$ ($\beta^* > 0$) as hits, the FNL and RFPL are $F_{t(\nu, bc_1)}\left(\tfrac{\beta^*}{k}\right)$ and $1 - F_{t(\nu, bc_2)}\left(\tfrac{\beta^*}{k}\right)$ respectively; when we use the SSMD-based ranking method for selecting siRNAs with a large negative value, in the process that we select all the $m$ siRNAs with $\hat{\beta} \le \beta^*$ ($\beta^* < 0$) as hits, the FNL and RFPL are $1 - F_{t(\nu, bc_1)}\left(\tfrac{\beta^*}{k}\right)$ and $F_{t(\nu, bc_2)}\left(\tfrac{\beta^*}{k}\right)$ respectively (Selection Criteria Ia and IIa in Table 1).

Hit selection using SSMD-based testing methods

Based on $T \sim \text{noncentral } t(\nu, b\beta)$ and $\hat{\beta} = kT$, we can determine a selection criterion so that a specific FNL or RFPL can be achieved. To select siRNAs with large positive effects, namely the siRNAs with $\beta \ge c_1$ ($c_1 > 0$), the following decision rule (namely Selection Criterion Ib in Table 1) achieves an FNL of $\alpha$.

Decision Rule Ib: declare a hit if $\hat{\beta} \ge k\,Q_{t(\nu, bc_1)}(\alpha)$; do not declare a hit if $\hat{\beta} < k\,Q_{t(\nu, bc_1)}(\alpha)$,

where $Q_{t(\nu, bc_1)}(\alpha)$ is the $\alpha$ quantile of $t(\nu, bc_1)$. The reason is as follows.
The FNR for Decision Rule Ib is the probability that $\hat{\beta} < k\,Q_{t(\nu, bc_1)}(\alpha)$ (i.e., not declaring a hit) given $\beta \ge c_1$. Hence,

$$FNR = \Pr\left(\hat{\beta} < k\,Q_{t(\nu, bc_1)}(\alpha) \mid \beta \ge c_1\right) = \Pr\left(T < Q_{t(\nu, bc_1)}(\alpha) \mid \beta \ge c_1\right) = \Pr\left(t(\nu, b\beta) < Q_{t(\nu, bc_1)}(\alpha) \mid \beta \ge c_1\right) \le \Pr\left(t(\nu, bc_1) < Q_{t(\nu, bc_1)}(\alpha)\right) = \alpha.$$

Therefore, $FNL = \alpha$ when using Selection Criterion Ib.

Using Decision Rule Ib, the RFPR with respect to (w.r.t.) $c_2$ and $c_1$ ($0 \le c_2 < c_1$) is

$$RFPR = \Pr\left(\hat{\beta} \ge k\,Q_{t(\nu, bc_1)}(\alpha) \mid \beta \le c_2\right) = \Pr\left(T \ge Q_{t(\nu, bc_1)}(\alpha) \mid \beta \le c_2\right) = \Pr\left(t(\nu, b\beta) \ge Q_{t(\nu, bc_1)}(\alpha) \mid \beta \le c_2\right) \le \Pr\left(t(\nu, bc_2) \ge Q_{t(\nu, bc_1)}(\alpha)\right) = 1 - F_{t(\nu, bc_2)}\left(Q_{t(\nu, bc_1)}(\alpha)\right),$$

where $F_{t(\nu, bc_2)}(\cdot)$ is the cumulative distribution function of $t(\nu, bc_2)$. Therefore, $RFPL = 1 - F_{t(\nu, bc_2)}\left(Q_{t(\nu, bc_1)}(\alpha)\right)$ when using Selection Criterion Ib. Similarly, we obtain Selection Criteria Ic, IIb and IIc and their FNLs and RFPLs listed in Table 1.

References

1. Zhang XD: A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 2007, 89:552-561.
2. Zhang XD: A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high-throughput screening assays. Journal of Biomolecular Screening 2007, 12:645-655.
3. Zhang XD, Ferrer M, Espeseth AS, Marine SD, Stec EM, Crackower MA, Holder DJ, Heyse JF, Strulovici B: The use of strictly standardized mean difference for hit selection in primary RNA interference high-throughput screening experiments. Journal of Biomolecular Screening 2007, 12:497-509.
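As a numerical illustration of the confidence interval, the FNL and RFPL of the ranking method, and the cutoff of Decision Rule Ib described in the Methods, the following Python sketch may be useful. It is only a sketch under stated assumptions: the noncentral t CDF is approximated here by Simpson integration over the chi density rather than taken from a statistics library, the cutoff uses the $\alpha$ quantile reading of Decision Rule Ib, and all function names are hypothetical and do not come from the original work.

```python
import math


def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def nct_cdf(t, nu, delta, steps=2000):
    """Approximate P(T <= t) for T ~ noncentral t(nu, delta), nu a positive integer.

    Integrates Phi(t * u / sqrt(nu) - delta) against the chi(nu) density of
    u = sqrt(V), V ~ chi-square(nu), using Simpson's rule (steps must be even).
    """
    centre = math.sqrt(max(nu - 0.5, 0.0))
    lo, hi = max(0.0, centre - 10.0), centre + 10.0
    h = (hi - lo) / steps
    log_c = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        if u > 0.0:
            dens = math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * u * u)
        else:
            dens = math.exp(log_c) if nu == 1 else 0.0
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * dens * norm_cdf(t * u / math.sqrt(nu) - delta)
    return total * h / 3.0


def _solve_increasing(f, x0, target):
    """Solve f(x) = target for a continuous increasing f: bracket, then bisect."""
    lo = hi = x0
    step = 1.0
    while f(lo) > target:
        lo -= step
        step *= 2.0
    step = 1.0
    while f(hi) < target:
        hi += step
        step *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ssmd_confidence_interval(t_obs, nu, b, alpha=0.05):
    """(beta_L, beta_U) with F(T_obs; b*beta_L) = 1 - alpha/2, F(T_obs; b*beta_U) = alpha/2."""
    # The CDF at a fixed point decreases in the noncentrality, so -CDF increases.
    neg_cdf = lambda d: -nct_cdf(t_obs, nu, d)
    beta_l = _solve_increasing(neg_cdf, t_obs, -(1.0 - alpha / 2.0)) / b
    beta_u = _solve_increasing(neg_cdf, t_obs, -(alpha / 2.0)) / b
    return beta_l, beta_u


def fnl_rfpl_positive(beta_star, k, nu, b, c1, c2):
    """FNL and RFPL when all siRNAs with beta_hat >= beta_star > 0 are called hits."""
    t_cut = beta_star / k
    return nct_cdf(t_cut, nu, b * c1), 1.0 - nct_cdf(t_cut, nu, b * c2)


def rule_ib_cutoff(alpha, k, nu, b, c1):
    """Cutoff k * Q(alpha), with Q the alpha quantile of noncentral t(nu, b*c1)."""
    q = _solve_increasing(lambda t: nct_cdf(t, nu, b * c1), b * c1, alpha)
    return k * q


def paired_constants(n):
    """(nu, b, k) for paired SSMD with n >= 3 pairs."""
    k = math.sqrt(2.0 / (n * (n - 1))) * math.exp(
        math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2))
    return n - 1, math.sqrt(n), k


if __name__ == "__main__":
    nu, b, k = paired_constants(10)
    print(ssmd_confidence_interval(3.0, nu, b))
    print(fnl_rfpl_positive(1.0, k, nu, b, c1=2.0, c2=0.25))
    print(rule_ib_cutoff(0.05, k, nu, b, c1=2.0))
```

The same functions apply to the unpaired case once $\nu$, $b$ and $k$ are replaced by their unpaired definitions. The fixed integration grid is a simplification, so a library implementation of the noncentral t distribution would be preferable when $|T_{obs}|/\sqrt{\nu}$ is large.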
