SUPPLEMENTARY MATERIAL
Trans-ethnic fine mapping using population-specific reference panels in
diverse Asian populations
Wang et al.
CONTENTS
1 Supplementary methods and analyses
1.1 Simulation to test for the rank of association signals at causal variant
1.2 Loci with evidence of multiple association signals
1 Supplementary Methods
1.1 Simulation to test for the rank of association signals at causal variant
Case-control data were simulated using the HAPGEN program with reference data from phase 2 of
the International HapMap Project. We randomly selected 2,000 SNPs that are present in all three
HapMap population panels but not on any of the popular genotyping platforms from Affymetrix and
Illumina (Affymetrix 500K, Affymetrix 6.0 and Illumina 1M) as causal variants. For each causal
variant, we simulated 1,000 cases and 1,000 controls in each of the CEU, JPT+CHB and YRI panels
by assuming a multiplicative disease effect equivalent to an allelic relative risk of r at the minor
allele of the causal variant, with r = 1 for assessing false positives and r = 1.5 for evaluating
statistical power (see Supplementary Material for power simulations when r = 1.3). Genotype data
for SNPs located within 750 kb on either flank of the causal variant were simulated in each HapMap
panel. To mimic the scenario where experiments involving different populations used different
genotyping technologies, we thinned the simulated CEU case-control genotypes to only the SNPs
located on the Affymetrix 500K array. The JPT+CHB and YRI data were reduced to the
SNPs found on the Illumina 1M and Affymetrix 6.0 arrays respectively. To compare against the
ideal scenario in which appropriate population-specific haplotype panels are available for
imputation, we ran IMPUTE (Marchini et al., 2007) on the thinned genotype data from each population
with the corresponding HapMap reference panel, although we excluded the simulated causal variant
to represent an incomplete reference panel. The imputation statistically infers the genotypes of
the SNPs removed by thinning. Frequentist tests for additive disease association were performed
with SNPTEST on the thinned genotype data and the imputed SNPs, with analyses of the latter
incorporating the additional feature in SNPTEST that averages across imputation uncertainty
(http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html). The recommended
effective population sizes of 11,418, 14,269 and 17,469 were adopted for CEU, JPT+CHB and YRI
respectively during the HAPGEN simulations and the imputations with IMPUTE (Marchini et al.,
2007).
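To make the multiplicative disease model concrete, the following minimal Python sketch computes the genotype frequencies expected among cases for a causal variant with minor allele frequency maf and allelic relative risk r. It is an illustration only and is not HAPGEN's code: the function names are hypothetical, and it assumes Hardy-Weinberg equilibrium in the source population and a disease rare enough that controls have approximately the population genotype frequencies.

```python
def case_genotype_freqs(maf, rr):
    """Genotype frequencies among cases for 0, 1 and 2 copies of the minor allele.

    Assumes Hardy-Weinberg equilibrium in the population and a multiplicative
    model in which each copy of the minor allele multiplies disease risk by rr.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    if rr <= 0.0:
        raise ValueError("rr must be positive")
    q = 1.0 - maf
    # Population genotype frequency times relative risk of that genotype.
    weights = [q * q, 2.0 * maf * q * rr, maf * maf * rr * rr]
    total = sum(weights)  # algebraically equal to (q + maf * rr) ** 2
    return [w / total for w in weights]


def case_allele_freq(maf, rr):
    """Minor allele frequency among cases under the same assumptions."""
    return maf * rr / (1.0 - maf + maf * rr)


if __name__ == "__main__":
    # r = 1 (null) leaves the frequencies unchanged; r = 1.5 enriches the minor allele.
    for rr in (1.0, 1.5):
        print(rr, case_genotype_freqs(0.2, rr), case_allele_freq(0.2, rr))
```

With r = 1 the case frequencies equal the population frequencies, which is why that setting is suitable for assessing false positives.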
1.2 Loci with evidence of multiple association signals
Genotype-level data from GWAS in multiple ancestries are rarely available for performing
multiple iterations of conditional analyses, and our study provided a valuable opportunity to
investigate whether there could be multiple independent signals at each reported locus, which may
point to the presence of multiple causal variants. We took an iterative hierarchical approach to
the conditional analyses: the most strongly associated SNP identified in each iteration
of the conditional trans-ethnic meta-analysis was added as a covariate, until the best remaining SNP
yielded a log10 BF < 3.38 (see the Methods section for the derivation of this threshold). This
identified six loci (Table S3): five (ABCA1, ZNF259, LIPC and CETP for HDL-C, and the TOMM40-APOE
cluster for LDL-C) possessed two distinct signals each, and one (COL5A1 for CCT) possessed three
signals.
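The control flow of this iterative procedure can be sketched as follows. This is a schematic in Python, not the analysis code used in the study: score_fn is a hypothetical placeholder for whatever routine returns the log10 Bayes factor of a SNP from the trans-ethnic meta-analysis, conditional on the SNPs already selected as covariates.

```python
from typing import Callable, List, Sequence

LOG10_BF_THRESHOLD = 3.38  # stopping threshold quoted in the text


def iterative_conditional_signals(
    snps: Sequence[str],
    score_fn: Callable[[str, List[str]], float],
    threshold: float = LOG10_BF_THRESHOLD,
) -> List[str]:
    """Return the SNPs selected as distinct signals at one locus.

    score_fn(snp, covariates) is a hypothetical callback that returns the
    log10 BF for `snp` when the SNPs in `covariates` are adjusted for.
    """
    selected: List[str] = []
    remaining = list(snps)
    while remaining:
        # Re-score every remaining SNP conditional on the current covariates.
        scores = {snp: score_fn(snp, selected) for snp in remaining}
        best = max(remaining, key=lambda s: scores[s])
        if scores[best] < threshold:
            break  # best remaining SNP falls below the threshold: stop
        selected.append(best)  # add as a covariate for the next iteration
        remaining.remove(best)
    return selected
```

The number of SNPs returned corresponds to the number of distinct signals declared at the locus under this stopping rule.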
As observed in previous reports, accounting for the presence of multiple signals can increase the
amount of phenotypic variance explained at each locus [1]. In this study, we observed an average
1.7-fold increase in the amount of phenotypic variance explained when multiple signals were included
at each locus rather than only the top SNP (Table S3). The greatest increase in the
explained variance was observed for HDL-C in the Chinese at ABCA1, which rose from 0.44%
to 1.37% (approximately 3.1-fold) after including rs1883025 with the best SNP in the unconditional
analysis (rs1883023). The two SNPs were practically independent in the three populations (r² of 0.00
in both Chinese and Malays, 0.04 in the Indians).
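For readers who wish to check the independence of two SNPs in their own data, the sketch below computes r² as the squared Pearson correlation between allele counts (0, 1 or 2) at the two SNPs. This is a common approximation for unphased genotypes and is an assumption here: the text does not state how the reported r² values were obtained, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def genotype_r2(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Squared Pearson correlation between two vectors of allele counts."""
    if len(g1) != len(g2) or len(g1) < 2:
        raise ValueError("need two genotype vectors of equal length >= 2")
    m1, m2 = mean(g1), mean(g2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var1 * var2)
```

Values close to 0, as reported for the two ABCA1 SNPs, indicate that the second SNP carries information that the first does not tag.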
Multiple independent signals can increase the amount of phenotypic variance explained. Thus,
using a single index SNP to estimate genetic heritability at each locus will underestimate the
contribution of these genetic variants. However, there are significant operational challenges in
performing multiple iterations of conditional analyses in a large multi-study meta-analysis, as
researchers typically share only GWAS summary statistics and not the genotype-level data necessary
for conditional analyses. Deconvoluting multiple independent signals to identify the underlying
functional variants also increases the complexity, although this offers the opportunity for novel
statistical methodologies to be developed, such as CAVIAR by Hormozdiari and colleagues, which
explicitly aims to identify multiple causal variants within a single locus [2].
REFERENCES
1. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies
additional variants influencing complex traits. Nat Genet 44, 369-75, S1-3 (2012).
2. Hormozdiari, F., Kostem, E., Kang, E.Y., Pasaniuc, B. & Eskin, E. Identifying Causal Variants at
Loci with Multiple Signals of Association. Genetics (2014).