Link between non-centrality parameter and effect size for the region-based score test

Adopting the notation of the article, consider the $l$th element of the score vector $U$:

$$U_l = \sum_{i=1}^{N} (y_i - \bar{y})(g_{il} - \bar{g}_l), \quad l = 1, \ldots, L.$$

Denote the number of cases, the number of controls and the total sample size by $N_A$, $N_U$ and $N = N_A + N_U$. After algebraic transformations it can be shown that

$$U_l = 2 N_A N_U (p_l^{+} - p_l^{-})/N, \tag{1}$$

where $p_l^{+}$ and $p_l^{-}$ are the observed MAF in cases and controls, respectively, for the $l$th SNP. Then the vector $Z = CU$ is asymptotically distributed as a multivariate normal random vector with unit covariance matrix and mean $C\,E(U)$, where $E(U)$ is the expectation of the score vector, which can be written as follows:

$$E(U) = E(\{U_l\}_{l=1}^{L}) = \{2 N_A N_U (p_{0l}^{+} - p_{0l}^{-})/N\}_{l=1}^{L}, \tag{2}$$

where $p_{0l}^{+}$ and $p_{0l}^{-}$ are the population MAF in cases and controls for the $l$th variant. If $\gamma_l$ denotes the relative risk of the $l$th SNP and the prevalence of the disease is assumed to be low, it follows that [1]:

$$p_{0l}^{+} = \frac{\gamma_l\, p_{0l}^{-}}{1 + (\gamma_l - 1)\, p_{0l}^{-}}. \tag{3}$$

The score test statistic is the sum of squares of the elements of the vector $Z$. If we define the vector $\mu = \{\mu_l\}_{l=1}^{L} = E(Z) = C\,E(U)$, then the non-centrality parameter (NCP) of the score test statistic under the alternative hypothesis is:

$$\lambda = \sum_{l=1}^{L} \mu_l^{2}. \tag{4}$$

Under the null hypothesis that no variant is associated with the phenotype, which is equivalent to $\gamma_l = 1$, $l = 1, \ldots, L$, it follows from (3) that $p_{0l}^{+} = p_{0l}^{-}$. By (2) this implies $E(U_l) = 0$, $l = 1, \ldots, L$, and $\mu = C\,E(U) = \{0\}_{l=1}^{L}$; thus, $\lambda = 0$.

Description of the assumptions for illustration of the connection between non-centrality parameter and effect size

From the considerations above, it can be seen that the NCP $\lambda$ is a function of the number of cases $N_A$ and controls $N_U$ in the study; the relative risk of each variant $\gamma_l$, $l = 1, \ldots, L$; the population MAF in controls $p_{0l}^{-}$, $l = 1, \ldots, L$; and the covariance matrix $V$ of the score vector (since the matrix $C = (A^{T})^{-1}$, where $V = A^{T}A$).
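As a compact restatement of the dependence on the covariance matrix, the NCP can be written as a quadratic form in the expected score vector. The symbols here are assumptions chosen for this sketch: $U$ is the score vector, $V$ its covariance matrix, $A$ a factor with $V = A^{T}A$, $C = (A^{T})^{-1}$, $\mu = C\,E(U)$, and $\lambda$ the NCP.

```latex
\begin{align*}
\lambda &= \sum_{l=1}^{L} \mu_l^{2} = \mu^{T}\mu = E(U)^{T} C^{T} C\, E(U) \\
        &= E(U)^{T} A^{-1} (A^{T})^{-1} E(U)
         = E(U)^{T} (A^{T}A)^{-1} E(U)
         = E(U)^{T} V^{-1} E(U).
\end{align*}
```

Under this reading, the NCP depends on the factor $A$ only through $V$, so any factorization satisfying $V = A^{T}A$ gives the same value.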
To illustrate the dependence between the NCP and relative risk, let us assume independence of the variants within the region, which implies that the matrix $V$ is diagonal. Thus, $V = \mathrm{diag}(\{v_l\}_{l=1}^{L})$, where $v_l = \mathrm{var}(U_l)$ is the variance of the $l$th SNP in our sample. It follows that

$$\mathrm{var}(U_l) = 2 N_U (1 - p_{0l}^{-})\, p_{0l}^{-} + 2 N_A (1 - p_{0l}^{+})\, p_{0l}^{+},$$

which is the variance of the sum of two independent binomial random variables with numbers of draws $2N_U$ and $2N_A$ and probabilities of success $p_{0l}^{-}$ and $p_{0l}^{+}$, respectively. It follows that

$$C = \mathrm{diag}\left(\frac{1}{\sqrt{2 N_A (1 - p_{0l}^{+})\, p_{0l}^{+} + 2 N_U (1 - p_{0l}^{-})\, p_{0l}^{-}}},\ l = 1, \ldots, L\right). \tag{5}$$

So, given the population MAF of causal variants in controls $p_{0l}^{-}$, the relative risk of causal variants $\gamma_l$, the number of cases $N_A$ and the number of controls $N_U$, we can calculate the corresponding NCP $\lambda$ according to the following algorithm:

1. calculate $p_{0l}^{+}$, the population MAF in cases, from (3)
2. calculate $E(U)$, the expectation of the score vector $U$, from (2)
3. obtain the matrix $C$ from (5)
4. calculate the vector $\mu = \{\mu_l\}_{l=1}^{L} = C\,E(U)$
5. obtain the NCP $\lambda$ from (4).

For the purpose of illustration, let us assume that $N_A = N_U = 500$ and that the population MAF and relative risk are the same for all causal variants. Additional File 7 depicts the non-centrality parameter (vertical axis) as a function of relative risk (horizontal axis) and the number of causal variants (lines within each panel). The population MAF of causal variants in controls was the following: Panel 1: 1%, Panel 2: 0.5%, Panel 3: 0.25%, Panel 4: 0.125%. As can be seen, the non-centrality parameter increases monotonically with increasing relative risk, population MAF in controls and the number of causal variants within a region.

References

1. Sul JH, Han B, He D, Eskin E: An optimal weighted aggregated association test for identification of rare variants involved in common diseases. Genetics 2011, 188(1):181-188.
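The five-step algorithm given in the section on assumptions can be sketched in Python as follows. This is a minimal illustration under the stated assumptions (independent variants, low disease prevalence), not the authors' implementation. The function names `case_maf` and `ncp_independent` and all variable names are hypothetical, introduced here only for the example.

```python
import math


def case_maf(p_ctrl, rel_risk):
    """Equation (3): population MAF in cases, assuming low prevalence."""
    return rel_risk * p_ctrl / (1.0 + (rel_risk - 1.0) * p_ctrl)


def ncp_independent(n_cases, n_controls, p_ctrl, rel_risk):
    """NCP of the score test for independent variants (steps 1 to 5).

    p_ctrl   -- population MAF in controls, one value per causal variant
    rel_risk -- relative risk, one value per causal variant
    """
    if len(p_ctrl) != len(rel_risk):
        raise ValueError("p_ctrl and rel_risk must have the same length")
    n_total = n_cases + n_controls
    ncp = 0.0
    for p_minus, gamma in zip(p_ctrl, rel_risk):
        if not 0.0 < p_minus < 1.0:
            raise ValueError("each control MAF must lie strictly in (0, 1)")
        # Step 1: population MAF in cases, equation (3).
        p_plus = case_maf(p_minus, gamma)
        # Step 2: expectation of the score for this variant, equation (2).
        e_u = 2.0 * n_cases * n_controls * (p_plus - p_minus) / n_total
        # Step 3: variance on the diagonal of V; C holds 1/sqrt(v), equation (5).
        v = (2.0 * n_cases * (1.0 - p_plus) * p_plus
             + 2.0 * n_controls * (1.0 - p_minus) * p_minus)
        # Step 4: element of mu = C E(U).
        mu = e_u / math.sqrt(v)
        # Step 5: accumulate the sum of squares, equation (4).
        ncp += mu * mu
    return ncp


if __name__ == "__main__":
    # Illustrative setting from the text: 500 cases, 500 controls, MAF 1%.
    for k in (1, 5, 10):            # number of causal variants (example values)
        for rr in (1.0, 2.0, 3.0):  # relative risk (example values)
            value = ncp_independent(500, 500, [0.01] * k, [rr] * k)
            print(f"variants={k:2d} RR={rr:.1f} NCP={value:.3f}")
```

Because the variants are treated as independent and identical in this setting, the NCP for several causal variants is simply the single-variant NCP multiplied by the number of variants.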