Genotyping
Genotyping of the EIRA samples was performed at the Genome Institute of Singapore
on three Illumina platforms: the HumanHap300 v1.0 chip, containing probes for
317,503 SNPs, on 1966 samples (643 ACPA-negative and 663 ACPA-positive RA cases,
660 controls); the Hap370CNVduo chip, containing probes for 370,404 SNPs (excluding
intensity-only probes), on 686 samples (176 ACPA-negative, 498 ACPA-positive and
12 ACPA-undefined RA cases); and the Hap550duo chip, containing probes for 561,466
SNPs, on 538 samples (10 ACPA-negative, 58 ACPA-positive and 1 ACPA-undefined RA
case, 469 controls). Samples included in the analysis had call rates > 95% and an
inferred gender consistent with clinical records.
Forty-one samples were genotyped twice, with a mean concordance before any SNP QC
filtering of 99.96% (median 99.98%; min 99.24%; max 100%). SNP filtering was
performed separately for each chip type, eliminating SNPs with call rates below 95%,
monomorphic SNPs, SNPs with a minor allele frequency < 0.005, SNPs with a
Hardy-Weinberg equilibrium p-value < 1.0 × 10⁻⁷ in controls, SNPs mapping to
multiple locations and non-autosomal SNPs. The SNPs passing these quality control
filters comprised 306,994 autosomal SNPs on 1966 samples (643 ACPA-negative,
663 ACPA-positive, 660 controls) for Hap300; 324,981 autosomal SNPs on 674 samples
(173 ACPA-negative, 489 ACPA-positive, 12 ACPA-undefined RA cases) for
Hap370CNVduo; and 527,434 autosomal SNPs on 520 samples (10 ACPA-negative,
54 ACPA-positive, 1 ACPA-undefined RA case, 455 controls) for Hap550duo.
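The per-SNP filters above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, genotypes are assumed to be coded as alternative-allele dosages (0, 1, 2) with None for a missing call, and the text does not say which Hardy-Weinberg test was applied, so a 1-df chi-square goodness-of-fit test is used here for brevity (an exact test is a common alternative). The multiple-location and non-autosomal filters need SNP annotation and are omitted.

```python
import math


def hwe_chisq_p(n_hom_ref, n_het, n_hom_alt):
    """Hardy-Weinberg p-value from a 1-df chi-square goodness-of-fit test."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0  # no genotyped controls: nothing to test
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic in controls: test undefined
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def passes_snp_qc(genotypes, is_control, min_call_rate=0.95, min_maf=0.005,
                  min_hwe_p=1e-7):
    """Apply call-rate, monomorphism, MAF and control HWE filters to one SNP.

    genotypes: alternative-allele dosages (0, 1, 2) per sample, None if missing.
    is_control: booleans aligned with genotypes.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or len(called) / len(genotypes) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    if maf == 0 or maf < min_maf:  # monomorphic or too rare
        return False
    controls = [g for g, c in zip(genotypes, is_control) if c and g is not None]
    counts = (controls.count(0), controls.count(1), controls.count(2))
    return hwe_chisq_p(*counts) >= min_hwe_p
```

A SNP with 25/50/25 genotype counts in controls passes, whereas one with no heterozygotes at all (50/0/50) is rejected by the HWE filter.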
We additionally genotyped several SNPs in EIRA using TaqMan allelic discrimination
when evidence for association arose from purely imputed markers. For these SNPs
(rs2037375, rs6799780, rs11815922), we report results from the directly determined
genotypes instead of the imputed genotypes.
Genotyping details for NARAC and WTCCC were published previously [1, 2].
Serologic analysis
We analyzed ACPA levels in the EIRA material with the Immunoscan-RA Mark2
ELISA (Euro-Diagnostica, Arnhem, The Netherlands) as described elsewhere [3].
Thresholds defining positive or negative ACPA status were set according to the
manufacturer's recommendations for the commercially available test kits. In the
NARAC study, ACPA levels were determined using the second-generation commercial
anti-CCP ELISA (INOVA Diagnostics Inc., CA, USA), and cases with serum antibody
levels > 20 U/ml were defined as ACPA-positive. The two brands of anti-CCP tests use the same
antigen for antibody detection.
Statistical Analysis
Closely related individuals were identified with Relpair [4] and PLINK [5], which
revealed 3 monozygotic twin pairs, 1 parent-offspring pair and 2 sibling pairs. The
member of each pair with the lower call rate was dropped from further analysis. To
quantify and control for population stratification, we used the principal components
approach implemented in the EIGENSTRAT software [6]. EIGENSTRAT identified a total
of 141 significant outliers (52 ACPA-negative; 56 ACPA-positive; 33 controls), which
were removed from further analysis. This resulted in a dataset of 1934 RA cases
(1147 ACPA-positive; 774 ACPA-negative; 13 ACPA-undefined) and 1079 controls
on 297,393 SNPs. The genomic inflation factor (λGC) for uncorrected Armitage trend
tests of association was 1.013 for RA overall (1.009 after removal of the HLA region),
1.020 for ACPA-positive RA versus controls (1.015 after removal of the HLA region)
and 1.005 for ACPA-negative RA versus controls (1.004 after removal of the HLA
region). These analyses, together with results from STRUCTURE [7, 8] (data not
shown), indicated that the EIRA study is largely free of significant population
stratification, so no further corrections were applied in subsequent analyses.
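The λGC values above summarize how far the observed trend-test statistics deviate from their null distribution. A minimal sketch of the usual definition (median observed 1-df chi-square statistic divided by the expected null median, about 0.4549) is shown below; the function names are hypothetical, and the HLA window on chromosome 6 is an approximate placeholder, since the boundaries used in the study are not stated.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom: the square
# of the standard normal 75th percentile (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

# Hypothetical HLA window on chromosome 6 (approximate, for illustration only).
HLA_CHROM, HLA_START, HLA_END = "6", 25_000_000, 35_000_000


def lambda_gc(chi2_stats):
    """Genomic inflation factor: median observed statistic / expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_gc_without_hla(records):
    """records: iterable of (chromosome, position, chi2) tuples."""
    kept = [chi2 for chrom, pos, chi2 in records
            if not (chrom == HLA_CHROM and HLA_START <= pos <= HLA_END)]
    return lambda_gc(kept)
```

Values close to 1, as reported for EIRA, mean the bulk of the statistics follow the null distribution; strongly associated HLA SNPs can pull λGC up slightly, which is why it is also reported with that region removed.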
To facilitate comparisons with two previously published GWAS of RA risk [1, 2] that
used different genotyping platforms, we used IMPUTE to impute genotypes in all three
studies, with a reference panel of 2,557,252 SNPs and haplotypes (db125 version)
observed in the CEPH European population sample of the International HapMap
Project. Four separate imputation runs were carried out: the first batch of EIRA
samples [2] (all EIRA samples typed on Hap300), with 303,911 typed SNPs in common
with the reference panel; a second batch of EIRA samples (all samples typed on
Hap370CNVduo and Hap550duo), with 301,703 typed SNPs; NARAC [2], with 458,019
typed SNPs; and WTCCC [1], with 392,293 typed SNPs. Before imputation, typed SNP
alleles were recoded to the forward strand consistent with the alleles present in the
CEPH European panel, and SNPs whose alleles did not match those of the reference
panel were removed. Imputed genotypes with posterior probabilities < 0.9 were coded
as missing, and SNPs with call rates < 0.9, a Hardy-Weinberg P < 1 × 10⁻⁷ in controls
or a minor allele frequency < 0.005 were removed from further analysis. This resulted
in a common set of 1,723,056 SNPs, of which 567,766 were typed in at least one of the
original studies; the 199,263 SNPs directly typed in all EIRA samples comprised the
primary analysis.
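The conversion of imputed genotype probabilities to hard calls can be sketched as below. This assumes a .gen-style layout (five leading columns: SNP id, rsID, position, allele A, allele B, then three genotype probabilities per sample); the helper names are hypothetical. A genotype is kept only if its posterior probability reaches 0.9, otherwise it is coded as missing, and SNPs with a call rate below 0.9 are then dropped. The HWE and MAF filters would follow as for directly typed SNPs.

```python
def hard_call(probs, threshold=0.9):
    """Most probable genotype (0, 1, 2), or None if its posterior is below threshold."""
    best = max(range(3), key=lambda k: probs[k])
    return best if probs[best] >= threshold else None


def parse_gen_line(line, threshold=0.9):
    """Hard-call every sample on one .gen-style line (hypothetical helper)."""
    fields = line.split()
    rsid = fields[1]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3:
        raise ValueError("expected three genotype probabilities per sample")
    calls = [hard_call(probs[i:i + 3], threshold) for i in range(0, len(probs), 3)]
    return rsid, calls


def call_rate(calls):
    """Fraction of samples with a non-missing hard call."""
    return sum(c is not None for c in calls) / len(calls)
```

For example, the line `snp1 rs123 1000 A G 0.95 0.05 0 0.5 0.4 0.1 0 0.02 0.98` yields calls `[0, None, 2]`: the middle sample's best posterior is only 0.5, and the SNP's call rate of 2/3 would fail the 0.9 filter.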
As an initial test of association in Stage I of our study (Figure 1), the genotype
counts of each SNP were compared 1) between 774 ACPA-negative RA cases and 1079
controls from the EIRA study base; 2) between 1147 ACPA-positive RA cases and the
same 1079 controls; and 3) between the two disease subgroups. The analysis was
performed with the PLINK software package [5]. The allele frequencies, odds ratios
and Armitage trend tests of association calculated in this way comprised the primary
statistics for subsequent analysis.
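The two primary statistics can be sketched from a 2 × 3 genotype table as follows. This is a textbook Cochran-Armitage trend test with additive weights (0, 1, 2) and an allelic odds ratio, not PLINK's own code; the function names and the genotype ordering (homozygous reference, heterozygous, homozygous alternative) are assumptions.

```python
import math


def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi-square with 1 df, p-value)."""
    R, S = sum(case_counts), sum(control_counts)
    N = R + S
    n = [r + s for r, s in zip(case_counts, control_counts)]
    t_stat = sum(w * (r * S - s * R)
                 for w, r, s in zip(weights, case_counts, control_counts))
    sum_w2n = sum(w * w * ni for w, ni in zip(weights, n))
    sum_wn = sum(w * ni for w, ni in zip(weights, n))
    # Var(T) = (R*S/N) * [N * sum(w^2 n) - (sum(w n))^2]
    var_t = (R * S / N) * (N * sum_w2n - sum_wn ** 2)
    if var_t == 0:
        raise ValueError("monomorphic SNP: trend test undefined")
    chi2 = t_stat ** 2 / var_t
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def allelic_odds_ratio(case_counts, control_counts):
    """Odds ratio for the alternative allele, cases versus controls."""
    case_alt = case_counts[1] + 2 * case_counts[2]
    case_ref = 2 * sum(case_counts) - case_alt
    ctrl_alt = control_counts[1] + 2 * control_counts[2]
    ctrl_ref = 2 * sum(control_counts) - ctrl_alt
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
```

With additive weights the statistic equals N·r², where r is the correlation between genotype dosage and case status, which is a convenient check on an implementation.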
In Stage II we directly compared the EIRA GWAS of ACPA-negative RA with the EIRA
GWAS of ACPA-positive RA, using the Cochran-Armitage trend test p-values from both
analyses. Our null hypothesis was that, for any SNP, association in the two
independent GWAS is independent and, for the allelic test, that the test statistics
follow a chi-square distribution with 1 degree of freedom. The alternative hypothesis
is that some SNPs show association in both groups, so that the chi-square statistics
for these SNPs deviate from the expected distribution. However, because of the large
number of tests, it is difficult to perform a meta-analysis for all potential
associations. Moreover, a Bonferroni correction within each GWAS would in this case
be too conservative and unsuitable for probable "common" associations, since these
associations are not totally independent. Instead of correcting for multiple testing
for individual SNPs within each GWAS and then comparing the associations between
the two corrected GWAS, we therefore applied the correction to the product of the
Cochran-Armitage trend test p-values for each SNP across the two GWAS. We
considered a risk SNP to be common to ACPA-positive and ACPA-negative RA if both
corresponding p-values were less than 0.05 and their product was less than the
threshold for genome-wide significance (2.9 × 10⁻⁸).
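The selection rule for common risk SNPs can be sketched as below. The function names are hypothetical, and per-SNP p-values are assumed to be held in dictionaries keyed by rsID. The text does not say how 2.9 × 10⁻⁸ was derived; it matches a Bonferroni threshold of 0.05 divided by the 1,723,056 SNPs in the common imputed set, but that reading is our assumption.

```python
# Threshold quoted in the text; close to 0.05 / 1,723,056 (assumed derivation).
GENOME_WIDE_THRESHOLD = 2.9e-8
NOMINAL = 0.05  # per-GWAS limit; "less than 0.05" as in the criterion above


def is_common_risk_snp(p_acpa_pos, p_acpa_neg,
                       nominal=NOMINAL, threshold=GENOME_WIDE_THRESHOLD):
    """Both trend-test p-values nominal and their product genome-wide significant."""
    return (p_acpa_pos < nominal and p_acpa_neg < nominal
            and p_acpa_pos * p_acpa_neg < threshold)


def common_risk_snps(pvals_pos, pvals_neg):
    """pvals_*: dict rsID -> p-value; only SNPs tested in both GWAS are considered."""
    shared = pvals_pos.keys() & pvals_neg.keys()
    return sorted(s for s in shared
                  if is_common_risk_snp(pvals_pos[s], pvals_neg[s]))
```

For instance, p = 10⁻⁴ in both subgroups gives a product of 10⁻⁸ and qualifies, whereas p = 10⁻⁹ in one subgroup combined with p = 0.2 in the other does not.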
The direction of association was checked for every SNP with a significant product of
p-values, and it was consistent in all cases. The additional limit of p ≤ 0.05 in each
association study was applied to restrict the influence of very low uncorrected
p-values (< 5.8 × 10⁻⁷) from a single GWAS. In our study, however, such low values
arose only for SNPs in the HLA locus, and all were highly specific for ACPA-positive RA.
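The 5.8 × 10⁻⁷ figure appears to follow from combining the two criteria; the short derivation below is our own reading of the text, with p₁ and p₂ denoting one SNP's trend-test p-values in the two GWAS:

$$
p_1 \, p_2 < 2.9 \times 10^{-8}, \qquad p_1 \le 0.05, \quad p_2 \le 0.05
$$

$$
p_2 < \frac{2.9 \times 10^{-8}}{0.05} = 5.8 \times 10^{-7}
\;\Longrightarrow\;
p_1 \, p_2 < 0.05 \times 5.8 \times 10^{-7} = 2.9 \times 10^{-8}
\quad \text{for every } p_1 \le 0.05 .
$$

So once one GWAS yields p < 5.8 × 10⁻⁷, nominal significance in the other is enough to pass; without the per-study limit, a single p < 2.9 × 10⁻⁸ would pass regardless of the other study.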
Finally, in Stage III we compared allele frequencies from all EIRA-based GWAS with
those from NARAC and WTCCC where appropriate (Figure 1).
Bibliography for Supplemental materials
1. Wellcome Trust Case Control Consortium, Genome-wide association study of
   14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007.
   447(7145): p. 661-78.
2. Plenge, R.M., M. Seielstad, L. Padyukov, et al., TRAF1-C5 as a risk locus for
   rheumatoid arthritis--a genomewide study. N Engl J Med, 2007. 357(12): p.
   1199-209.
3. Ronnelid, J., M.C. Wick, J. Lampa, et al., Longitudinal analysis of
   citrullinated protein/peptide antibodies (anti-CP) during 5 year follow up in
   early rheumatoid arthritis: anti-CP status predicts worse disease activity and
   greater radiological progression. Ann Rheum Dis, 2005. 64(12): p. 1744-9.
4. Epstein, M.P., W.L. Duren and M. Boehnke, Improved inference of relationship
   for pairs of individuals. Am J Hum Genet, 2000. 67(5): p. 1219-31.
5. Purcell, S., B. Neale, K. Todd-Brown, et al., PLINK: a tool set for whole-genome
   association and population-based linkage analyses. Am J Hum Genet, 2007.
   81(3): p. 559-75.
6. Price, A.L., N.J. Patterson, R.M. Plenge, et al., Principal components analysis
   corrects for stratification in genome-wide association studies. Nat Genet,
   2006. 38(8): p. 904-9.
7. Falush, D., M. Stephens and J.K. Pritchard, Inference of population structure
   using multilocus genotype data: linked loci and correlated allele frequencies.
   Genetics, 2003. 164(4): p. 1567-87.
8. Pritchard, J.K., M. Stephens and P. Donnelly, Inference of population
   structure using multilocus genotype data. Genetics, 2000. 155(2): p. 945-59.