Genotyping
Genotyping of the EIRA samples was performed at the Genome Institute of Singapore on three Illumina chips: the Human Hap300 v1.0 chip, containing probes for 317,503 SNPs, on 1966 samples (643 ACPA-negative, 663 ACPA-positive, 660 controls); the Hap370CNVduo chip, containing probes for 370,404 SNPs (excluding intensity-only probes), on 686 samples (176 ACPA-negative, 498 ACPA-positive, 12 ACPA-undefined RA cases); and the Hap550duo chip, containing probes for 561,466 SNPs, on 538 samples (10 ACPA-negative, 58 ACPA-positive, 1 ACPA-undefined RA case, 469 controls). Samples included for analysis had call-rates > 95% and inferred gender consistent with clinical records. Forty-one samples were genotyped twice, with a mean concordance prior to any SNP QC filtering of 99.96% (median: 99.98%; min: 99.24%; max: 100%). SNP filtering was performed separately for each chip type, eliminating SNPs with call-rates below 95%; monomorphic SNPs; SNPs with a minor allele frequency < 0.005; SNPs with a Hardy-Weinberg equilibrium p-value < 1.0 x 10^-7 in controls; SNPs mapping to multiple locations; and non-autosomal SNPs. After these quality control filters, 306,994 autosomal SNPs on 1966 samples (643 ACPA-negative, 663 ACPA-positive, 660 controls) remained in Hap300; 324,981 autosomal SNPs on 674 samples (173 ACPA-negative, 489 ACPA-positive, 12 ACPA-undefined RA cases) in Hap370CNVduo; and 527,434 autosomal SNPs on 520 samples (10 ACPA-negative, 54 ACPA-positive, 1 ACPA-undefined RA case, 455 controls) in Hap550duo. When evidence for association arose from purely imputed markers, we additionally genotyped the SNPs in question in EIRA using TaqMan allelic discrimination. Directly determined genotypes for these SNPs (rs2037375, rs6799780, rs11815922) are reported instead of results from imputed genotypes. Genotyping details for NARAC and WTCCC were published previously [1, 2].
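The SNP-level filters listed above can be summarized as a single predicate. The following Python sketch is illustrative only: the function name, argument names, and the example values are assumptions, and the per-SNP statistics (call rate, minor allele frequency, Hardy-Weinberg p-value in controls) are taken as precomputed inputs rather than derived from genotypes.

```python
# Thresholds taken from the filtering description in the text.
MIN_CALL_RATE = 0.95   # SNPs with call-rates below 95% are removed
MIN_MAF = 0.005        # SNPs with minor allele frequency < 0.005 are removed
MIN_HWE_P = 1.0e-7     # SNPs with HWE p-value < 1.0 x 10^-7 in controls are removed


def passes_snp_qc(call_rate, maf, hwe_p_controls,
                  autosomal=True, unique_position=True):
    """Return True if a SNP survives all filters (hypothetical helper)."""
    if not autosomal or not unique_position:
        return False   # non-autosomal SNPs and SNPs mapping to multiple locations
    if call_rate < MIN_CALL_RATE:
        return False
    if maf < MIN_MAF:
        return False   # also removes monomorphic SNPs (maf == 0)
    if hwe_p_controls < MIN_HWE_P:
        return False
    return True


# Made-up example values, not data from the study.
print(passes_snp_qc(call_rate=0.99, maf=0.12, hwe_p_controls=0.4))
print(passes_snp_qc(call_rate=0.93, maf=0.12, hwe_p_controls=0.4))
```

Applying such a predicate once per chip type mirrors the per-chip filtering described above; the order of the checks does not affect which SNPs are retained.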
Serologic analysis
We analyzed ACPA levels in the EIRA material with the Immunoscan-RA Mark2 ELISA (Euro-Diagnostica, Arnhem, The Netherlands) as described elsewhere [3]. Thresholds defining positive or negative ACPA status were set according to the recommendations in the commercially available test kits. In the NARAC study, ACPA levels were also determined using the second-generation commercial anti-CCP ELISA (INOVA Diagnostics Inc., CA, USA). Cases with serum antibody levels >20 U/ml were defined as ACPA-positive. The two brands of anti-CCP tests use the same antigen for antibody detection.
Statistical analysis
Closely related individuals were identified with Relpair [4] and PLINK [5], which revealed 3 monozygotic twin pairs, 1 parent-offspring pair, and 2 sibling pairs. The member of each pair with the lower call rate was dropped from further analysis. To quantify and control for population stratification, we used a principal components approach implemented in the EIGENSTRAT software [6]. EIGENSTRAT identified a total of 141 significant outliers (52 ACPA-negative; 56 ACPA-positive; 33 controls), which were removed from further analysis. This resulted in a dataset of 1934 RA cases (1147 ACPA-positive; 774 ACPA-negative; 13 ACPA-undefined) and 1079 controls on 297,393 SNPs. The lambda GC for uncorrected Armitage trend tests of association was 1.013 for RA (1.009 after removal of the HLA region); 1.020 for the ACPA-positive RA versus control analysis (1.015 after removal of the HLA region); and 1.005 for the ACPA-negative RA versus control analysis (1.004 after removal of the HLA region). These analyses, together with results from STRUCTURE [7, 8] (data not shown), convinced us that the EIRA study is largely free of significant population stratification, and no further corrections were applied in subsequent analyses.
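For readers unfamiliar with lambda GC, the quantity is conventionally computed as the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.4549). The sketch below shows that conventional definition in Python; the function name and the input values are illustrative assumptions, and this is not necessarily the exact procedure of the software used in the study.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Lambda GC: observed median test statistic over the expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Made-up trend-test statistics, not data from the study.
example_stats = [0.02, 0.31, 0.46, 0.95, 2.10]
print(round(genomic_inflation(example_stats), 3))
```

Values close to 1, such as those reported above, indicate that the bulk of the test statistics follows the null distribution.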
To facilitate comparisons with two previously published GWAS of RA risk [1, 2] that used different genotyping platforms, we used IMPUTE to impute genotypes in all three studies, with a reference panel consisting of 2,557,252 SNPs and haplotypes of the db125 version observed in the CEPH European population sample of the International HapMap Project. Four separate imputation runs were carried out: the first batch of samples from EIRA [2] (all EIRA samples typed on Hap300), with 303,911 typed SNPs in common with the reference panel; a second batch of samples from EIRA (all samples typed on Hap370CNVduo and Hap550duo), with 301,703 typed SNPs; NARAC [2], with 458,019 typed SNPs; and WTCCC [1], with 392,293 typed SNPs. Before imputation, alleles were recoded to the forward strand, consistent with the alleles present in the CEPH European panel, and SNPs whose alleles did not match those of the reference panel were removed. Imputed genotypes with posterior probability scores <0.9 were coded as missing, and SNPs with call-rates <0.9, Hardy-Weinberg P < 1 x 10^-7 in controls, or minor allele frequencies <0.005 were removed from further analysis. This resulted in a common set of 1,723,056 SNPs, of which 567,766 were typed in at least one of the original studies, and 199,263 SNPs were directly typed in all EIRA samples, comprising the primary analysis. As an initial test of association in Stage I of our study, the genotype counts of each SNP were compared (Figure 1): 1) between 774 ACPA-negative RA cases and a set of 1079 controls from the EIRA study base; 2) between 1147 ACPA-positive RA cases and a set of 1079 common controls from the EIRA study base; and 3) between the two disease subgroups. The analysis was performed using the statistical software package PLINK [5]. Allele frequencies, odds ratios, and Armitage trend tests of association calculated in this way comprised the primary statistics for subsequent analysis.
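As an illustration of the Armitage trend test applied to the genotype counts of a single SNP, the following Python sketch uses the textbook form of the statistic with additive weights (0, 1, 2) and a 1-df chi-square p-value. It is a minimal stand-alone version under those assumptions, not the PLINK implementation; the function names and example counts are hypothetical.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test from genotype counts.

    case_counts and control_counts are (AA, AB, BB) counts; returns (chi2, p).
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n = n_cases + n_controls

    sum_t_r = sum(t * r for t, r in zip(weights, case_counts))
    sum_t_n = sum(t * m for t, m in zip(weights, totals))
    sum_t2_n = sum(t * t * m for t, m in zip(weights, totals))

    denom = n_cases * n_controls * (n * sum_t2_n - sum_t_n ** 2)
    if denom == 0:
        raise ValueError("trend test undefined (monomorphic SNP or empty group)")

    chi2 = n * (n * sum_t_r - n_cases * sum_t_n) ** 2 / denom
    return chi2, chi2_1df_pvalue(chi2)


# Made-up genotype counts, not data from the study.
chi2, p = armitage_trend((300, 350, 124), (500, 450, 129))
print(chi2, p)
```

The same function covers all three Stage I comparisons, since the comparison between the two disease subgroups only changes which group is passed as "cases" and which as "controls".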
In Stage II we directly compared the EIRA GWAS of ACPA-negative RA with the EIRA GWAS of ACPA-positive RA, using the Cochran-Armitage trend test p-values from both analyses. Our null hypothesis was that association in two independent GWAS is independent for any SNP and, for the allelic test, that the test statistics follow a chi-square distribution with df=1. The alternative hypothesis is that some SNPs will show association in both groups and that the chi-square tests for such SNPs will deviate from the expected distribution. However, due to multiple testing, it is difficult to perform meta-analysis for all potential associations. Also, the Bonferroni correction for each GWAS would in this case be too conservative and not suitable for probable “common” associations, since they are not totally independent. Instead of correcting for multiple testing for individual SNPs in each GWAS and comparing associations between the two GWAS after correction, we therefore decided to apply the correction to the product of the Cochran-Armitage trend test p-values for each SNP from the two GWAS. We considered a risk SNP to be common to ACPA-positive RA and ACPA-negative RA if both corresponding p-values were less than 0.05 and their product was less than the threshold for genome-wide significance (2.9 x 10^-8). The direction of association was checked for every SNP with a significant product of p-values, and the direction was consistent in all cases. The additional limit of p≤0.05 for each association study was applied to restrict the overload effect of very low uncorrected values (<5.8 x 10^-7) from a single GWAS. We noticed, however, that in our study such low values arose only from testing of SNPs in the HLA locus, and all were very specific for ACPA-positive RA. Finally, in Stage III we compared allelic frequencies from all EIRA-based GWAS with NARAC and WTCCC and performed comparisons where appropriate (Figure 1).
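The Stage II criterion for a shared risk SNP can be written as a short decision rule. The Python sketch below is an illustration under stated assumptions: the function and argument names are hypothetical, the per-study cut-off is implemented as a strict inequality (p < 0.05), and the direction check is expressed through odds ratios on the same side of 1, which is one possible way to encode "consistent direction". Note that 5.8 x 10^-7 multiplied by 0.05 equals 2.9 x 10^-8, which is how the per-study limit relates to the product threshold.

```python
PRODUCT_THRESHOLD = 2.9e-8   # genome-wide significance threshold from the text
PER_STUDY_THRESHOLD = 0.05   # each GWAS must individually show p < 0.05


def is_shared_risk_snp(p_neg, p_pos, or_neg=None, or_pos=None):
    """Decide whether a SNP counts as common to both RA subsets.

    p_neg, p_pos: trend-test p-values from the ACPA-negative and
    ACPA-positive GWAS. or_neg, or_pos: optional odds ratios used
    for the direction check (an assumption of this sketch).
    """
    if p_neg >= PER_STUDY_THRESHOLD or p_pos >= PER_STUDY_THRESHOLD:
        return False   # one very low p-value alone is not sufficient
    if p_neg * p_pos >= PRODUCT_THRESHOLD:
        return False
    if or_neg is not None and or_pos is not None:
        if (or_neg > 1.0) != (or_pos > 1.0):
            return False   # opposite directions of effect
    return True


# Made-up p-values and odds ratios, not results from the study.
print(is_shared_risk_snp(1e-5, 1e-4, or_neg=1.2, or_pos=1.3))
print(is_shared_risk_snp(1e-12, 0.2))
```

The second call shows the purpose of the per-study limit: a product far below the threshold is rejected because the signal comes from one GWAS only.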
Bibliography for Supplemental materials
1. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 2007. 447(7145): p. 661-78.
2. Plenge, R.M., M. Seielstad, L. Padyukov, et al., TRAF1-C5 as a risk locus for rheumatoid arthritis--a genomewide study. N Engl J Med, 2007. 357(12): p. 1199-209.
3. Ronnelid, J., M.C. Wick, J. Lampa, et al., Longitudinal analysis of citrullinated protein/peptide antibodies (anti-CP) during 5 year follow up in early rheumatoid arthritis: anti-CP status predicts worse disease activity and greater radiological progression. Ann Rheum Dis, 2005. 64(12): p. 1744-9.
4. Epstein, M.P., W.L. Duren and M. Boehnke, Improved inference of relationship for pairs of individuals. Am J Hum Genet, 2000. 67(5): p. 1219-31.
5. Purcell, S., B. Neale, K. Todd-Brown, et al., PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet, 2007. 81(3): p. 559-75.
6. Price, A.L., N.J. Patterson, R.M. Plenge, et al., Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet, 2006. 38(8): p. 904-9.
7. Falush, D., M. Stephens and J.K. Pritchard, Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 2003. 164(4): p. 1567-87.
8. Pritchard, J.K., M. Stephens and P. Donnelly, Inference of population structure using multilocus genotype data. Genetics, 2000. 155(2): p. 945-59.