rr2008-015 - School of Public Health

Power Comparison of Statistical Tests of Association
with Multiple SNPs with the GAW16 Rheumatoid
Arthritis Data
Wei Zhong, Wei Pan§
Division of Biostatistics, School of Public Health, University of Minnesota,
Minneapolis, MN 55455
§ Corresponding author
Email addresses:
Wei Pan: weip@biostat.umn.edu
Abstract
Due to weak association strengths between complex traits and causal DNA variants, it
is important to evaluate statistical power of any test in genome-wide association
studies. In this paper, we consider some well-known or newly claimed winners of
single-locus and multilocus tests as applied to a block of SNPs, possibly in linkage
disequilibrium (LD), for the case-control study design. Taking advantage of the large
sample size of the GAW16 Rheumatoid Arthritis (RA) data and several
well-established RA-associated genes or chromosome regions, we use repeated subsamples
of the data to estimate the power and thus compare the performance of the tests under
practical situations. The results confirm the well-known fact that there does not exist a
uniformly most powerful test across all scenarios. Hence, we recommend the use of
multiple tests, including the univariate single-locus test and four multilocus tests: the
Sum test, a multivariate global test (e.g. the score test), the SumSqU and SumSqUw
tests.
Background
Given that most complex diseases or traits are only weakly associated with causal
DNA variants, the statistical power of a test being used to analyze genome-wide
association (GWA) data is critical. In this paper, we consider some well-known or
newly claimed winners of single-locus or multilocus tests as applied to multiple
SNPs, e.g., in a sliding window or in a linkage disequilibrium (LD) block, for the
case-control study design. A very recent study by Chapman and Whittaker (2008)
indicated that a univariate test or an empirical Bayes test proposed by Goeman et al
(2006) was often most powerful among several commonly used tests across a range of
scenarios; in particular, they were more powerful than a so-called Sum test that is
equivalent to regressing the response (i.e. disease status) on the sum of the coded
SNPs, and than a weighted score test (WST) (Wang and Elston 2007). However, they
did not consider the simulation set-ups of Wang and Elston (2007), which were considered
by Pan (2008). Pan's results indicated that under some simulation set-ups (as
considered by Wang and Elston) and for some simulated disease-causing genes in the
HapMap data, the Sum test was most powerful, though the univariate test and the
Goeman test also worked well. However, arguably because of a certain arbitrariness in
simulated data and the relatively small sample size of the HapMap data, some doubts
may linger about the practical relevance of the above conclusions. Here, taking
advantage of the large sample size of the GAW16 Rheumatoid Arthritis (RA) data
and several well-established RA-associated genes or chromosome regions, we use
random subsamples of the data to estimate the power and thus compare the
performance of several tests under practical situations.
Methods
Statistical tests
Suppose that we have m independent observations (Yi, Xi), where subject i has trait
(e.g. disease status) value Yi and genotype Xi = (Xi1, …, Xik)'. As in Wang and Elston
(2007), we consider the dosage coding of Xij for an additive model: Xij = 0, 1 or 2,
representing the copy number of one of the two alleles present in SNP j of subject i.
With a binary trait, we use a joint logistic model:

$$\text{Logit}\,\Pr(Y_i = 1) = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_j \qquad (1)$$
where Yi = 0 or 1 indicates whether subject i is a control (i.e. without disease) or a
case (i.e. with disease). A global test of any possible association between the trait and
SNPs can be formulated as jointly testing the βj's with the null hypothesis H0:
β1 = β2 = … = βk = 0 by the likelihood ratio test (LRT), Wald test or score test in the
context of logistic regression (or more generally, of GLM); under H0, any of the three
test statistics has an asymptotically chi-squared distribution with degrees of freedom
DF=k. Here we use the LRT and call it the logistic-global (L-G) test. For a large k,
the test can be low-powered because of the cost of the large DF.
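For reference, the LRT behind the L-G test can be written out as below. This is the
standard logistic-regression form; the log-likelihood symbol ℓ, the linear predictor
η_i and the hat/tilde notation for the unrestricted and null fits are introduced here
for illustration and do not appear in the original text.

$$\ell(\beta_0, \beta) = \sum_{i=1}^{m} \left[ Y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right], \qquad \eta_i = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_j,$$

$$T_{LRT} = 2\left\{ \ell(\hat\beta_0, \hat\beta) - \ell(\tilde\beta_0, 0) \right\} \;\to\; \chi^2_k \quad \text{under } H_0.$$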
In contrast to the global test, another extreme is to conduct univariate or SNP-by-SNP
tests: rather than including all the k SNPs, we include only one SNP in a marginal
logistic model at one time:
$$\text{Logit}\,\Pr(Y_i = 1) = \beta_0 + X_{ij}\beta_j$$
and test Hj,0: βj = 0 for each j=1,...,k sequentially. Each test can be done with only one
DF, but a multiple test adjustment has to be made, e.g., based on permutation or the
Bonferroni adjustment. Because the Bonferroni adjustment is known to be
conservative, permutation is more widely used, though it is computationally more
demanding; we call the univariate tests based on the Bonferroni adjustment and
permutation the U-B and U-P tests, respectively.
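The two adjustments can be sketched in Python as follows. This is a minimal
illustration under assumptions that are not in the text: each univariate test is taken
to be the 1-DF score test of the marginal logistic model (the text equally allows the
LRT or Wald test), the permutation version compares the maximum statistic across SNPs
with its permutation distribution, and all function names are hypothetical.

```python
import math
import random


def score_stats(X, y):
    """Per-SNP 1-DF score statistics for the marginal logistic models.

    X is a list of rows of dosages (0/1/2); y is a list of 0/1 labels.
    """
    n, k = len(y), len(X[0])
    ybar = sum(y) / n
    stats = []
    for j in range(k):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        u = sum((x - xbar) * (yi - ybar) for x, yi in zip(col, y))
        v = ybar * (1 - ybar) * sum((x - xbar) ** 2 for x in col)
        # A monomorphic SNP (v == 0) carries no information.
        stats.append(u * u / v if v > 0 else 0.0)
    return stats


def chi2_1_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 1 DF."""
    return math.erfc(math.sqrt(stat / 2.0))


def u_b_pvalue(X, y):
    """Bonferroni-adjusted p-value: k times the smallest univariate p-value."""
    k = len(X[0])
    p_min = min(chi2_1_pvalue(s) for s in score_stats(X, y))
    return min(1.0, k * p_min)


def u_p_pvalue(X, y, n_perm=999, seed=0):
    """Permutation p-value for the largest univariate statistic (assumed form)."""
    rng = random.Random(seed)
    observed = max(score_stats(X, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any trait-genotype association
        if max(score_stats(X, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because permuting the trait labels keeps the LD structure among the SNPs intact, the
permutation version accounts for correlated tests, which the Bonferroni adjustment
ignores.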
The basic idea of the Sum test is to make a compromise between joint modeling and
its resulting large DF: while all the SNPs in the block are to be used, we make a key
and possibly incorrect working assumption that the SNPs are all equally associated
with the trait; that is, rather than estimating a separate βj for each SNP in (1),
we use a common βc in the logistic regression model:

$$\text{Logit}\,\Pr(Y_i = 1) = \beta_0 + \sum_{j=1}^{k} X_{ij}\beta_c \qquad (2)$$
To address the question of whether there is any association of Yi with any Xij, we test
H0: βc = 0, which can be easily done by the LRT, Wald test or score test in fitting
model (2). The resulting test is equivalent to summing up all the SNP predictors and
conducting a univariate test with the resulting new predictor. Before applying the Sum
test (as for the WST), one may need to flip the codings of some SNPs to maximize the
number of positively correlated SNP pairs (Wang and Elston 2007), though it is
unclear how to do so optimally. Pan (2008) proposed an algorithm: i) starting from
any initial coding, calculate all pairwise correlations; ii) select the SNP, say s, that has
the largest number, say ns, of negative correlations with other SNPs; iii) if ns >
#SNP/2, then flip the coding for SNP s (i.e. replace Xis by 2-Xis) and go to step i);
otherwise, stop. We call this algorithm method “>” because of the inequality
used in step iii). A reasonable modification, called method “≥”, is to replace
“>” by “≥” in step iii).
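Steps i) to iii) can be sketched in Python as below. This is an illustrative reading of
the algorithm, not the authors' code: the function name and the `strict` flag (True for
method “>”, False for method “≥”) are hypothetical, ties in step ii) are broken by
taking the first SNP, and only the sign of each pairwise covariance is computed, since
it equals the sign of the correlation.

```python
def flip_codings(X, strict=True, max_iter=1000):
    """Recode SNPs to reduce the number of negatively correlated pairs.

    X is a list of rows of integer dosages (0/1/2). Returns the recoded
    copy of X and the sorted indices of SNPs whose coding ended up flipped.
    """
    X = [list(row) for row in X]
    n, k = len(X), len(X[0])
    flipped = set()
    for _ in range(max_iter):
        # Step i): sign of every pairwise correlation. For integer dosages
        # n * sum(x_j * x_l) - sum(x_j) * sum(x_l) is exact and has the
        # same sign as the correlation.
        sums = [sum(row[j] for row in X) for j in range(k)]
        neg_counts = [0] * k
        for j in range(k):
            for l in range(j + 1, k):
                cross = sum(row[j] * row[l] for row in X)
                if n * cross - sums[j] * sums[l] < 0:
                    neg_counts[j] += 1
                    neg_counts[l] += 1
        # Step ii): SNP with the most negative correlations.
        s = max(range(k), key=lambda j: neg_counts[j])
        ns = neg_counts[s]
        # Step iii): flip if ns exceeds (or reaches) half the number of SNPs.
        if not (ns > k / 2 if strict else ns >= k / 2):
            break
        for row in X:
            row[s] = 2 - row[s]
        flipped ^= {s}  # a second flip of the same SNP restores its coding
    return X, sorted(flipped)
```

With only two negatively correlated SNPs, for example, method “>” leaves the codings
unchanged (1 is not greater than 2/2), whereas method “≥” flips one of them, which
shows how the two variants can end at different codings.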
Goeman et al (2006) proposed an empirical Bayes method to test on a large number of
parameters, as applicable to the βj's in the joint logistic regression model (1). A key
idea is to treat the βj's as random, assuming a priori that β = (β1, …, βk)' has a
distribution with mean E(β) = 0 and covariance Cov(β) = σ²I. Thus, to test on the
original H0: β = 0, one can instead test on a new H0: σ² = 0 with a test statistic
$$T_{Go} = \frac{1}{2}(Y - \bar{Y})' X X' (Y - \bar{Y}) - \frac{1}{2}\bar{Y}(1 - \bar{Y})\,\text{Trace}\big((X - \bar{X})'(X - \bar{X})\big),$$
where Y and X are the response vector and the design matrix respectively, and Ȳ and
X̄ are the sample means of the response and X respectively. The null distribution is
unknown and has to be estimated by permutation or simulation.
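A minimal Python sketch of the statistic and a permutation p-value follows. It assumes
the form T_Go = ½(Y − Ȳ)'XX'(Y − Ȳ) − ½Ȳ(1 − Ȳ)Trace((X − X̄)'(X − X̄)) with a minus
sign between the two terms; the function names and the choice of 999 permutations are
illustrative only.

```python
import random


def goeman_stat(X, y):
    """Goeman-type statistic for rows of dosages X and 0/1 labels y."""
    n, k = len(y), len(X[0])
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]
    quad = 0.0   # (Y - Ybar)' X X' (Y - Ybar) = squared norm of X'(Y - Ybar)
    trace = 0.0  # Trace((X - Xbar)'(X - Xbar)) = sum of squared centered entries
    for j in range(k):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        quad += sum(x * r for x, r in zip(col, resid)) ** 2
        trace += sum((x - xbar) ** 2 for x in col)
    return 0.5 * quad - 0.5 * ybar * (1 - ybar) * trace


def goeman_perm_pvalue(X, y, n_perm=999, seed=0):
    """Estimate the null distribution by permuting the trait labels."""
    rng = random.Random(seed)
    observed = goeman_stat(X, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if goeman_stat(X, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Permuting Y leaves Ȳ and X unchanged, so the second term is constant across
permutations and the permutation p-value is driven by the quadratic form alone.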
Pan (2008) proposed a test called SumSqU as the sum of squared score statistics and
its weighted version called SumSqUw. The two tests are modifications to the standard
score test. Compared to the standard score test, the SumSqU test statistic ignores the
covariance matrix of the score statistic while the SumSqUw statistic uses only the
diagonal elements of the covariance matrix. Pan (2008) found that the two tests (along
with two others based on sums of squared coefficient estimates) performed similarly to
each other and to Goeman's test in many situations; a distinguishing advantage of the
former two tests is that their asymptotic null distributions are available and thus can
be easily used to obtain p-values.
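The two statistics can be sketched in Python as below. The explicit formulas are an
assumption based on the usual score quantities for logistic regression under H0, namely
U = X'(Y − Ȳ) with covariance V = Ȳ(1 − Ȳ)(X − X̄)'(X − X̄), so that SumSqU = U'U and
SumSqUw = U' diag(V)⁻¹ U; the function name is hypothetical, and the asymptotic null
distributions mentioned above are not implemented here.

```python
def sum_sq_u_stats(X, y):
    """Return (SumSqU, SumSqUw) for rows of dosages X and 0/1 labels y."""
    n, k = len(y), len(X[0])
    ybar = sum(y) / n
    sum_sq_u = 0.0
    sum_sq_uw = 0.0
    for j in range(k):
        col = [row[j] for row in X]
        xbar = sum(col) / n
        # j-th element of the score vector U.
        u_j = sum((x - xbar) * (yi - ybar) for x, yi in zip(col, y))
        # j-th diagonal element of the covariance matrix V of U under H0.
        v_jj = ybar * (1 - ybar) * sum((x - xbar) ** 2 for x in col)
        sum_sq_u += u_j ** 2           # ignores V entirely
        if v_jj > 0:                   # skip monomorphic SNPs
            sum_sq_uw += u_j ** 2 / v_jj  # uses only diag(V)
    return sum_sq_u, sum_sq_uw
```

For comparison, the standard score test would use the full matrix, U'V⁻¹U, which is
where the k degrees of freedom of the global test come from.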
Data
We considered only a ±10Mb region around each of the three well-known RA-associated
loci, PTPN22, MHC and TRAF1-C5 (Plenge et al 2007), on
chromosomes 1, 6 and 9 respectively. First, we applied the
univariate SNP-by-SNP test (with the dosage coding of genotypes) to the three
regions with the whole GAW16 Rheumatoid Arthritis (RA) subject sample,
identifying four peaks: SNP rs2476601 on chromosome 1 with -log P=10.9; SNP
rs2395175 on chromosome 6 with -log P=89.1; SNP rs2900180 on chromosome 9
with -log P=8.1; SNP rs872863 on chromosome 9 with -log P=10.5. The first three
peaks corresponded to the three well-known RA-associated PTPN22, MHC and
TRAF1-C5 loci (Plenge et al 2007); because SNP rs872863 gave a smaller p-value
than rs2900180, and the two SNPs were close to each other, we also included the
former. Second, we ran Haploview on the HapMap CEU data to obtain an LD block
around each peak; rs2395175 did not belong to any block but was only 150 bp from the
LD block used here, while each of the other three peak SNPs was included in an
LD block. The four LD blocks were sized at 103Kb, 23Kb, 100Kb and 28Kb with the
starting/ending SNPs in the HapMap data as rs10858018/rs2488457,
rs743862/rs2395174, rs10985070/rs2269066 and rs4838054/rs7865976 respectively.
There were 10, 16, 16 and 10 SNPs in the four LD blocks in the GAW16 RA data.
Third, we removed any SNP with minor allele frequency (MAF) ≤ 0.05, ending up
with k = 10, 15, 14 and 8 SNPs inside the four LD blocks in the GAW16 RA data
respectively.
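The MAF filter in the third step can be sketched in Python as follows, assuming
complete genotypes in dosage coding; the function name is hypothetical and missing
genotypes are not handled.

```python
def maf_filter(X, threshold=0.05):
    """Return indices of SNPs whose minor allele frequency exceeds threshold.

    X is a list of rows of integer dosages (0/1/2), one row per subject.
    """
    n, k = len(X), len(X[0])
    kept = []
    for j in range(k):
        total = sum(row[j] for row in X)   # copies of the counted allele
        minor = min(total, 2 * n - total)  # copies of the rarer allele
        if minor / (2 * n) > threshold:    # drop SNPs with MAF <= threshold
            kept.append(j)
    return kept
```

Taking the minimum on the integer allele counts before dividing avoids a floating-point
artifact at the boundary, where 1 − 0.95 evaluates to slightly more than 0.05.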
Because of the highly significant p-values in the four LD blocks, corresponding to
three well-known RA-associated loci (PTPN22, MHC, TRAF1-C5), and related
biological arguments (Plenge et al 2007), it is reasonable to assume that at least one
SNP inside or near each LD block was associated with the disease. Thus, to
investigate the power of various tests, we randomly subsampled n = 200 subjects from
the cases and n = 200 subjects from the controls respectively, then applied each test to
the resulting subsample. We repeated this process 1000 times. The proportion of the
rejections of the null hypothesis (i.e. no association between RA and any SNP) by a
test was its empirical power.
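The subsampling procedure can be sketched in Python as below. Sampling without
replacement within each group is an assumption, and the function and argument names
are hypothetical; `test_pvalue` stands for any function that maps a genotype matrix and
a 0/1 label vector to a p-value.

```python
import random


def empirical_power(X_cases, X_controls, test_pvalue,
                    n_per_group=200, n_rep=1000, alpha=0.05, seed=0):
    """Proportion of balanced subsamples in which the test rejects H0.

    X_cases and X_controls are lists of genotype rows for the full sample.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        # Draw n_per_group cases and controls (without replacement, assumed).
        cases = rng.sample(X_cases, n_per_group)
        controls = rng.sample(X_controls, n_per_group)
        X = cases + controls
        y = [1] * n_per_group + [0] * n_per_group
        if test_pvalue(X, y) < alpha:
            rejections += 1
    return rejections / n_rep
```

With 1000 replicates, the Monte Carlo standard error of an estimated power p is
sqrt(p(1 − p)/1000), at most about 0.016, which is worth keeping in mind when
comparing tests whose estimated powers differ only slightly.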
Results
The empirical power of each test is shown in Table 1. Two sets of the results were
available for the Sum test, corresponding to the use of method “>” or “≥” in
selecting SNP codings. For the region on chromosome 1, the U-P test was most powerful
(0.694); surprisingly, even the U-B test, often claimed to be conservative, was more
powerful (0.609) than the remaining tests. The L-G test was ranked third, followed by
SumSqUw, then Goeman's and SumSqU tests, and finally by the Sum test with minimal
power. For chromosome 6, because the signals were so strong, all the tests had power 1.
For the region Chrom 9.1, the
Goeman and SumSqU tests were most powerful, closely followed by the SumSqUw
and Sum tests, then by U-P, U-B and finally by L-G; there was substantial power loss
with U-B, as compared to U-P. For Chrom 9.2, the Sum test with method “>” and L-G
were the winners, followed by U-P and U-B; it is not clear why the Sum(≥),
SumSqU and Goeman's tests had such low power for this region.
The high power of Sum(>) was obtained somewhat incidentally: starting from a
different initial coding and applying the same algorithm with “>”, the resulting Sum
test yielded an empirical power similar to that of the Sum(≥), as for the other three
regions, illustrating the possible sensitivity of the Sum test to SNP coding and a
limitation of the proposed algorithm for selecting SNP codings.
In summary, no test was uniformly most powerful; in particular, although the good
performance of the U-P and Goeman tests was confirmed, they might
still lose to the Sum or L-G test. In addition, the SumSqU and Goeman tests had
almost equal power, while the SumSqUw test could be more powerful than the
SumSqU and Goeman tests.
Conclusions
As expected, there does not exist a single uniformly most powerful test across all
scenarios. Although in general, Goeman's test, the SumSqU and SumSqUw tests
performed well across various scenarios in previous studies (Chapman and Whittaker
2008; Pan 2008), the Sum test (and the global multivariate test) may be more
powerful under some practical situations, as demonstrated here with the GAW16 RA
data. A main limitation of the Sum test is its possibly diminishing power when the
associations between the trait and some SNPs are in opposite directions; hence its
performance may be sensitive to the chosen coding of SNPs in an unknown way, as
shown for the Chrom 9.2 region. Further studies are needed to sort out how to code
SNPs for the Sum test. Nevertheless, additional advantages of the Sum test include its
generality and simplicity: it is applicable with any
generalized linear model; in particular, in contrast to Goeman's test, it does not
require the use of permutation to calculate p-values. These advantages are also shared
by the SumSqU and SumSqUw tests, which have similar power to that of Goeman's
test across a wide range of scenarios. For these reasons, rather than dismissing the use
of the Sum test (and the global test) as done by Chapman and Whittaker (2008), we
recommend their use along with the univariate test, the SumSqU and SumSqUw tests.
Acknowledgements
The Genetic Analysis Workshops are supported by NIH grant R01 GM031575 from
the National Institute of General Medical Sciences. This research was partially
supported by NIH grants GM081535 and HL65462.
References
1. Chapman JM, Whittaker J: Analysis of multiple SNPs in a candidate gene
or region. Genetic Epidemiology 2008, 32(6): 560-566.
2. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES: High-resolution
haplotype structure in the human genome. Nat Genet 2001, 29: 229-232.
3. Goeman JJ, van de Geer SA, van Houwelingen HC: Testing against a high
dimensional alternative. J R Stat Soc B 2006, 68: 477-493.
4. Pan W: Asymptotic tests of association with multiple SNPs in linkage
disequilibrium. Research Report 2008, Division of Biostatistics, University
of Minnesota. Available at http://www.biostat.umn.edu/rrs.php
5. Plenge RM et al: TRAF1-C5 as a Risk Locus for Rheumatoid Arthritis--A
Genomewide Study. N Engl J Med 2007, 357: 1199-1209.
6. Wang T, Elston RC: Improved power by use of a weighted score test for
linkage disequilibrium mapping. Am J Hum Genet 2007, 80: 353-360.
Tables
Table 1 - Empirical power of various tests with nominal significance level α =
0.05.
LD block    U-B    U-P    L-G    Go     Sum(>)  Sum(≥)  SumSqU  SumSqUw
Chrom 1     0.609  0.694  0.580  0.407  0.109   0.029   0.406   0.488
Chrom 6     1.000  1.000  1.000  1.000  1.000   1.000   1.000   1.000
Chrom 9.1   0.462  0.649  0.442  0.771  0.730   0.750   0.770   0.739
Chrom 9.2   0.396  0.550  0.845  0.061  0.846   0.021   0.063   0.143