Power Comparison of Statistical Tests of Association with Multiple SNPs with the GAW16 Rheumatoid Arthritis Data

Wei Zhong, Wei Pan§

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455

§ Corresponding author

Email addresses: Wei Pan: weip@biostat.umn.edu

Abstract

Due to weak association strengths between complex traits and causal DNA variants, it is important to evaluate the statistical power of any test in genome-wide association studies. In this paper, we consider some well-known or newly claimed winners of single-locus and multilocus tests as applied to a block of SNPs, possibly in linkage disequilibrium (LD), for the case-control study design. Taking advantage of the large sample size of the GAW16 Rheumatoid Arthritis (RA) data and several well-established RA-associated genes or chromosome regions, we use repeated subsamples of the data to estimate the power and thus compare the performance of the tests under practical situations. The results confirm the well-known fact that there does not exist a uniformly most powerful test across all scenarios. Hence, we recommend the use of multiple tests, including the univariate single-locus test and four multilocus tests: the Sum test, a multivariate global test (e.g. the score test), the SumSqU and SumSqUw tests.

Background

Given that most complex diseases or traits are only weakly associated with causal DNA variants, the statistical power of a test being used to analyze genome-wide association (GWA) data is critical. In this paper, we consider some well-known or newly claimed winners of single-locus or multilocus tests as applied to multiple SNPs, e.g., in a sliding window or in a linkage disequilibrium (LD) block, for the case-control study design.
A very recent study by Chapman and Whittaker (2008) indicated that a univariate test or an empirical Bayes test proposed by Goeman et al (2006) was often most powerful among several commonly used tests across a range of scenarios. In particular, these two tests were more powerful than the so-called Sum test, which is equivalent to regressing the response (i.e. disease status) on the sum of the coded SNPs, and than a weighted score test (WST) (Wang and Elston 2007). However, Chapman and Whittaker did not consider the simulation set-ups of Wang and Elston (2007), which were considered by Pan (2008). Pan's results indicated that under some simulation set-ups (as considered by Wang and Elston) and for some simulated disease-causing genes in the HapMap data, the Sum test was most powerful, though the univariate test and the Goeman test also worked well. However, arguably because of a certain arbitrariness in simulated data and the relatively small sample size of the HapMap data, some doubts may linger about the practical relevance of the above conclusions. Here, taking advantage of the large sample size of the GAW16 Rheumatoid Arthritis (RA) data and several well-established RA-associated genes or chromosome regions, we use random subsamples of the data to estimate the power and thus compare the performance of several tests under practical situations.

Methods

Statistical tests

Suppose that we have m independent observations $(Y_i, X_i)$, where subject $i$ has trait (e.g. disease status) value $Y_i$ and genotype $X_i = (X_{i1}, \ldots, X_{ik})'$. As in Wang and Elston (2007), we consider the dosage coding of $X_{ij}$ for an additive model: $X_{ij}$ = 0, 1 or 2, representing the copy number of one of the two alleles present in SNP $j$ of subject $i$. With a binary trait, we use a joint logistic model:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_j, \qquad (1)$$

where $Y_i$ = 0 or 1 indicates whether subject $i$ is a control (i.e. without disease) or a case (i.e. with disease).
A global test of any possible association between the trait and the SNPs can be formulated as jointly testing the $\beta_j$'s with the null hypothesis $H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$ by the likelihood ratio test (LRT), Wald test or score test in the context of logistic regression (or, more generally, of generalized linear models, GLM); under $H_0$, each of the three test statistics has an asymptotic chi-squared distribution with degrees of freedom DF = k. Here we use the LRT and call it the logistic-global (L-G) test. For a large k, the test can have low power because of the cost of the large DF.

In contrast to the global test, the other extreme is to conduct univariate or SNP-by-SNP tests: rather than including all k SNPs, we include only one SNP at a time in a marginal logistic model,

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_0 + X_{ij}\beta_j,$$

and test $H_{j,0}: \beta_j = 0$ for each $j = 1, \ldots, k$ sequentially. Each test has only one DF, but an adjustment for multiple testing has to be made, e.g., by permutation or by the Bonferroni adjustment. Because the Bonferroni adjustment is known to be conservative, permutation is more widely used, though it is computationally more demanding; we call the univariate tests based on the Bonferroni adjustment and on permutation the U-B and U-P tests, respectively.

The basic idea of the Sum test is to strike a compromise between joint modeling and its resulting large DF: while all the SNPs in the block are used, we make a key and possibly incorrect working assumption that the SNPs are all equally associated with the trait. That is, rather than aiming to estimate separate $\beta_j$'s in (1), we use a common $\beta_c$ in the logistic regression model:

$$\mathrm{Logit}\,\Pr(Y_i = 1) = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_c. \qquad (2)$$

To address the question of whether there is any association of $Y_i$ with any $X_{ij}$, we test $H_0: \beta_c = 0$, which can be easily done by the LRT, Wald test or score test in fitting model (2). The resulting test is equivalent to summing up all the SNP predictors and conducting a univariate test with the resulting new predictor.
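As a rough illustration of the contrast between the univariate and Sum tests, the sketch below implements a score-test version of each in plain Python. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names (`score_test_1df`, `sum_test`, `univariate_bonferroni`) are hypothetical, no covariates are included, and the score test is chosen only because it has a closed form (the text allows the LRT, Wald or score test).

```python
import math

def score_test_1df(y, z):
    """Score test of H0: slope = 0 in Logit Pr(Y=1) = b0 + z*b (1 DF).

    y: list of 0/1 disease indicators; z: list of predictor values.
    Returns (chi-squared statistic, asymptotic p-value).
    """
    n = len(y)
    y_bar = sum(y) / n
    z_bar = sum(z) / n
    # Score U and its null variance V under the intercept-only model.
    u = sum((yi - y_bar) * zi for yi, zi in zip(y, z))
    v = y_bar * (1.0 - y_bar) * sum((zi - z_bar) ** 2 for zi in z)
    if v == 0.0:
        # Assumption: with no variation in y or z, report no evidence.
        return 0.0, 1.0
    stat = u * u / v
    # Upper tail of chi-squared with 1 DF: P(chi2_1 > s) = erfc(sqrt(s/2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

def sum_test(y, X):
    """Sum test: a 1-DF test on the per-subject sum of the coded SNPs."""
    z = [sum(row) for row in X]
    return score_test_1df(y, z)

def univariate_bonferroni(y, X):
    """U-B style p-value: smallest single-SNP p-value times k, capped at 1."""
    k = len(X[0])
    p_min = min(score_test_1df(y, [row[j] for row in X])[1] for j in range(k))
    return min(1.0, k * p_min)

if __name__ == "__main__":
    # Toy data (made up for illustration): 8 subjects, 2 SNPs in dosage coding.
    y = [1, 1, 1, 0, 0, 0, 1, 0]
    X = [[2, 2], [1, 1], [2, 1], [0, 0], [1, 0], [0, 1], [1, 2], [0, 0]]
    print("Sum test (stat, p):", sum_test(y, X))
    print("U-B adjusted p:", univariate_bonferroni(y, X))
```

Note that if two SNPs are coded in exactly opposite directions (one column equal to 2 minus the other), their sum is constant across subjects and the Sum statistic carries no information, which is one way to see why the choice of SNP coding matters for this test.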
Before applying the Sum test (as for the WST), one may need to flip the codings of some SNPs to maximize the number of positively correlated SNP pairs (Wang and Elston 2007), though it is unclear how to do so optimally. Pan (2008) proposed an algorithm: i) starting from any initial coding, calculate all pairwise correlations; ii) select the SNP, say $s$, that has the largest number, say $n_s$, of negative correlations with other SNPs; iii) if $n_s$ > #SNP/2, then flip the coding for SNP $s$ (i.e. replace $X_{is}$ by $2 - X_{is}$) and go to step i); otherwise, stop. We call the above algorithm method “>” because of the inequality used in step iii). A reasonable modification, called method “≥”, is to replace “>” by “≥” in step iii).

Goeman et al (2006) proposed an empirical Bayes method to test a large number of parameters, as applicable to the $\beta_j$'s in the joint logistic regression model (1). A key idea is to treat the $\beta_j$'s as random, assuming a priori that $\beta = (\beta_1, \ldots, \beta_k)'$ has a distribution with mean $E(\beta) = 0$ and covariance $\mathrm{Cov}(\beta) = \sigma^2 I$. Thus, to test the original $H_0: \beta = 0$, one can instead test a new $H_0: \sigma^2 = 0$ with the test statistic

$$T_{Go} = \frac{1}{2}(Y - \bar{Y})' X X' (Y - \bar{Y}) - \frac{1}{2}\bar{Y}(1 - \bar{Y})\,\mathrm{Trace}\big((X - \bar{X})'(X - \bar{X})\big),$$

where $Y$ and $X$ are the response vector and the design matrix respectively, and $\bar{Y}$ and $\bar{X}$ are the sample means of the response and of $X$ respectively. The null distribution is unknown and has to be estimated by permutation or simulation.

Pan (2008) proposed a test called SumSqU, defined as the sum of squared score statistics, and a weighted version called SumSqUw. The two tests are modifications of the standard score test: the SumSqU test statistic ignores the covariance matrix of the score statistic, while the SumSqUw statistic uses only the diagonal elements of that covariance matrix.
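The coding-selection steps i) to iii) described above for the Sum test can be sketched as follows. This is a minimal sketch under stated assumptions rather than Pan's (2008) code: the function name `select_snp_coding`, the `strict` flag (True for method “>”, False for method “≥”), the tie-breaking rule (lowest SNP index) and the `max_iter` guard are all hypothetical choices made for illustration. Only the sign of each pairwise correlation is needed, so the code uses the sign of the covariance, computed in exact integer arithmetic for dosage-coded genotypes.

```python
def select_snp_coding(X, strict=True, max_iter=1000):
    """Flip SNP codings to reduce the number of negatively correlated pairs.

    X: list of per-subject genotype rows in 0/1/2 dosage coding.
    strict=True uses the rule n_s > k/2 (method ">");
    strict=False uses n_s >= k/2 (method ">=").
    Returns (recoded copy of X, sorted indices of SNPs left flipped).
    """
    X = [list(row) for row in X]  # work on a copy
    n, k = len(X), len(X[0])
    flipped = set()
    for _ in range(max_iter):  # safety guard (an assumption, not in the text)
        cols = [[row[j] for row in X] for j in range(k)]
        sums = [sum(c) for c in cols]

        def cov_sign_term(a, b):
            # n*sum(x*z) - sum(x)*sum(z) has the same sign as the correlation.
            cross = sum(x * z for x, z in zip(cols[a], cols[b]))
            return n * cross - sums[a] * sums[b]

        # Step i): count negative pairwise correlations for every SNP.
        neg = [sum(1 for b in range(k) if b != a and cov_sign_term(a, b) < 0)
               for a in range(k)]
        # Step ii): SNP with the most negative correlations (first on ties).
        s = max(range(k), key=lambda j: neg[j])
        # Step iii): flip SNP s and repeat, or stop.
        if neg[s] > k / 2 or (not strict and neg[s] == k / 2):
            for row in X:
                row[s] = 2 - row[s]
            flipped ^= {s}  # flipping twice restores the original coding
        else:
            break
    return X, sorted(flipped)

if __name__ == "__main__":
    # Toy genotypes (made up): the third SNP is coded opposite to the first.
    X = [[0, 0, 2], [1, 1, 1], [2, 2, 0], [0, 1, 2], [1, 1, 1], [2, 2, 0]]
    recoded, flipped = select_snp_coding(X)
    print("flipped SNP indices:", flipped)
```

The two rules can disagree on small blocks: with k = 2 negatively correlated SNPs, $n_s$ = 1 equals k/2, so method “>” leaves the coding unchanged while method “≥” flips one SNP.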
Pan (2008) found that the SumSqU and SumSqUw tests (along with two others, defined as sums of squared coefficient estimates) performed similarly to each other and to Goeman's test in many situations; a distinguishing advantage of the SumSqU and SumSqUw tests is that their asymptotic null distributions are available and thus can be easily used to obtain p-values.

Data

We considered only a ±10 Mb region around each of the three well-known RA-associated loci: the PTPN22, MHC and TRAF1-C5 loci on chromosomes 1, 6 and 9 respectively (Plenge et al 2007). First, we applied the univariate SNP-by-SNP test (with the dosage coding of genotypes) to the three regions with the whole GAW16 Rheumatoid Arthritis (RA) subject sample, identifying four peaks: SNP rs2476601 on chromosome 1 with -log P = 10.9; SNP rs2395175 on chromosome 6 with -log P = 89.1; SNP rs2900180 on chromosome 9 with -log P = 8.1; and SNP rs872863 on chromosome 9 with -log P = 10.5. The first three peaks corresponded to the three well-known RA-associated PTPN22, MHC and TRAF1-C5 loci (Plenge et al 2007); because SNP rs872863 gave a smaller p-value than rs2900180, and the two SNPs were close to each other, we also included rs872863.

Second, we ran Haploview on the HapMap CEU data to obtain an LD block around each peak; rs2395175 did not belong to any block and was only 150 bp away from the LD block used here, while each of the other three peak loci was included in an LD block. The four LD blocks were sized at 103 Kb, 23 Kb, 100 Kb and 28 Kb, with the starting/ending SNPs in the HapMap data being rs10858018/rs2488457, rs743862/rs2395174, rs10985070/rs2269066 and rs4838054/rs7865976 respectively. There were 10, 16, 16 and 10 SNPs in the four LD blocks in the GAW16 RA data.

Third, we removed any SNP with minor allele frequency (MAF) ≤ 0.05, ending up with k = 10, 15, 14 and 8 SNPs inside the four LD blocks in the GAW16 RA data respectively.
Because of the highly significant p-values in the four LD blocks, corresponding to three well-known RA-associated loci (PTPN22, MHC, TRAF1-C5), and related biological arguments (Plenge et al 2007), it is reasonable to assume that at least one SNP inside or near each LD block was associated with the disease. Thus, to investigate the power of the various tests, we randomly subsampled n = 200 subjects from the cases and n = 200 subjects from the controls, then applied each test to the resulting subsample. We repeated this process 1000 times. The proportion of rejections of the null hypothesis (i.e. no association between RA and any SNP) by a test was its empirical power.

Results

The empirical power of each test is shown in Table 1. Two sets of results are given for the Sum test, corresponding to the use of method “>” or “≥” in selecting SNP codings. For the region on chromosome 1, the U-P test was most powerful; surprisingly, even the U-B test, which is often regarded as conservative, was more powerful than the remaining tests. The L-G test was ranked third, followed by the SumSqUw test, then the Goeman and SumSqU tests, and finally the Sum test, which had minimal power. For chromosome 6, because the signals were so strong, all the tests had power 1. For the region Chrom 9.1, the Goeman and SumSqU tests were most powerful, closely followed by the SumSqUw and Sum tests, then by U-P, U-B and finally L-G; there was a substantial power loss with U-B as compared to U-P. For Chrom 9.2, the Sum test with method “>” and the L-G test were the winners, followed by U-P and U-B; it is not clear why the Sum(≥), SumSqU and Goeman tests had such low power for this region.
The high power of Sum(>) was obtained somewhat incidentally: starting from a different initial coding and applying the same algorithm with “>”, the resulting Sum test yielded an empirical power similar to that of Sum(≥), as in the other three regions. This illustrates the possible sensitivity of the Sum test to SNP coding and a limitation of the proposed algorithm for selecting SNP codings. In summary, no test was uniformly most powerful; in particular, although the good performance of the U-P and Goeman tests was confirmed, they might still lose to the Sum or L-G test. In addition, the SumSqU and Goeman tests had almost equal power, while the SumSqUw test could be more powerful than the SumSqU and Goeman tests.

Conclusions

As expected, there does not exist a single uniformly most powerful test across all scenarios. Although in general Goeman's test and the SumSqU and SumSqUw tests performed well across various scenarios in previous studies (Chapman and Whittaker 2008; Pan 2008), the Sum test (and the global multivariate test) may be more powerful under some practical situations, as demonstrated here with the GAW16 RA data. A main limitation of the Sum test is its possibly diminishing power when the associations between the trait and some SNPs are in opposite directions; hence its performance may be sensitive to the chosen coding of SNPs in an unknown way, as shown for the Chrom 9.2 region. Further studies are needed to sort out how to code SNPs for the Sum test. Nevertheless, additional advantages of the Sum test include its generality and simplicity: it is equally applicable to any application with any generalized linear model; in particular, in contrast to Goeman's test, it does not require the use of permutation to calculate p-values. These advantages are also shared by the SumSqU and SumSqUw tests, which have power similar to that of Goeman's test across a wide range of scenarios.
For these reasons, rather than dismissing the use of the Sum test (and the global test) as done by Chapman and Whittaker (2008), we recommend their use along with the univariate test, the SumSqU and SumSqUw tests.

Acknowledgements

The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from the National Institute of General Medical Sciences. This research was partially supported by NIH grants GM081535 and HL65462.

References

1. Chapman JM, Whittaker J: Analysis of multiple SNPs in a candidate gene or region. Genetic Epidemiology 2008, 32(6): 560-566.
2. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution haplotype structure in the human genome. Nat Genet 2001, 29: 229-232.
3. Goeman JJ, van de Geer SA, van Houwelingen HC: Testing against a high dimensional alternative. J R Stat Soc B 2006, 68: 477-493.
4. Pan W: Asymptotic tests of association with multiple SNPs in linkage disequilibrium. Research Report 2008, Division of Biostatistics, University of Minnesota. Available at http://www.biostat.umn.edu/rrs.php
5. Plenge RM et al: TRAF1-C5 as a Risk Locus for Rheumatoid Arthritis--A Genomewide Study. N Engl J Med 2007, 357: 1199-209.
6. Wang T, Elston RC: Improved power by use of a weighted score test for linkage disequilibrium mapping. Am J Hum Genet 2007, 80: 353-360.

Tables

Table 1 - Empirical power of various tests with nominal significance level α = 0.05.

LD block    U-B    U-P    L-G    Go     Sum(>)  Sum(≥)  SumSqU  SumSqUw
Chrom 1     0.609  0.694  0.580  0.407  0.109   0.029   0.406   0.488
Chrom 6     1.000  1.000  1.000  1.000  1.000   1.000   1.000   1.000
Chrom 9.1   0.462  0.649  0.442  0.771  0.730   0.750   0.770   0.739
Chrom 9.2   0.396  0.550  0.845  0.061  0.846   0.021   0.063   0.143