SUPPLEMENTARY MATERIAL

Trans-ethnic fine mapping using population-specific reference panels in diverse Asian populations

Wang et al.

CONTENTS
1 Supplementary methods and analyses
1.1 Simulation to test for the rank of association signals at the causal variant
1.2 Loci with evidence of multiple association signals

1 Supplementary Methods

1.1 Simulation to test for the rank of association signals at the causal variant

Case-control data were simulated using the HAPGEN program with reference data from phase 2 of the International HapMap Project. We randomly selected 2,000 SNPs to serve as causal variants. Each selected SNP is present in all three HapMap population panels but absent from the popular genotyping platforms from Affymetrix and Illumina (Affymetrix 500K, Affymetrix 6.0 and Illumina 1M). For each causal variant, we simulated 1,000 cases and 1,000 controls in each of the CEU, JPT+CHB and YRI panels. The simulations assumed a multiplicative disease model with an allelic relative risk of r for the minor allele of the causal variant. We used r = 1 to assess false positives and r = 1.5 to evaluate statistical power (see Supplementary Material for power simulations with r = 1.3). Genotypes for SNPs located within 750 kb on either flank of the causal variant were simulated in each HapMap panel. To mimic experiments in which different populations were genotyped on different platforms, we thinned the simulated CEU case-control genotypes to only the SNPs present on the Affymetrix 500K array. We likewise reduced the JPT+CHB and YRI data to the SNPs found on the Illumina 1M and Affymetrix 6.0 arrays, respectively.
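The multiplicative disease model used in the simulations above can be illustrated with a short sketch. It assumes Hardy-Weinberg equilibrium and a rare disease, so that control genotype frequencies approximate those of the population. Under these assumptions, case genotype frequencies are proportional to (1-p)², 2p(1-p)r and p²r² for 0, 1 and 2 copies of a minor allele with frequency p. HAPGEN itself simulates haplotypes conditional on the HapMap reference panel, and this sketch does not reproduce it. The function names `genotype_freqs` and `simulate_counts` are hypothetical and illustrate only the disease model.

```python
import numpy as np


def genotype_freqs(p, r):
    """Return (control, case) genotype frequencies for 0, 1 and 2 minor alleles.

    Assumes Hardy-Weinberg equilibrium and a rare disease, so control
    genotype frequencies approximate those of the general population.
    """
    controls = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    # Multiplicative model: each copy of the minor allele multiplies risk by r.
    cases = controls * np.array([1.0, r, r ** 2])
    cases = cases / cases.sum()
    return controls, cases


def simulate_counts(p, r, n_cases=1000, n_controls=1000, seed=0):
    """Draw genotype counts (cases, controls) under the model (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    controls, cases = genotype_freqs(p, r)
    return rng.multinomial(n_cases, cases), rng.multinomial(n_controls, controls)
```

With r = 1 the case and control frequencies coincide, which gives the null setting used for false positives. With r > 1 the minor-allele frequency in cases becomes rp / (rp + 1 - p). For example, p = 0.2 and r = 1.5 give 0.3 / 1.1 ≈ 0.273.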
For comparison with the ideal scenario, in which appropriate population-specific haplotype panels are available for imputation, we ran IMPUTE (Marchini et al., 2007) on the thinned genotype data from each population. Each run used the corresponding HapMap reference panel, from which the simulated causal variant had been excluded to represent an incomplete reference panel. IMPUTE statistically infers the genotypes of the SNPs that were removed by thinning. Frequentist tests for additive disease association were performed with SNPTEST on both the thinned genotype data and the imputed SNPs. Analyses of the imputed SNPs used the SNPTEST option that averages across imputation uncertainty (http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html). We adopted the recommended effective population sizes of 11,418, 14,269 and 17,469 for CEU, JPT+CHB and YRI, respectively. These values were used in both the HAPGEN simulations and the IMPUTE runs (Marchini et al., 2007).

1.2 Loci with evidence of multiple association signals

Genotype-level GWAS data from multiple ancestries are rarely available for multiple iterations of conditional analysis. Our study therefore provided a valuable opportunity to investigate whether there could be multiple independent signals at each reported locus, which may point to the presence of multiple causal variants. We took an iterative, hierarchical approach to the conditional analyses. In each iteration of the conditional trans-ethnic meta-analysis, the most strongly associated SNP was added as a covariate. Iterations continued until the best remaining SNP yielded a log10 BF < 3.38 (see the Methods section for the derivation of this threshold). This identified six loci with multiple signals (Table S3). Five loci each possessed two distinct signals: ABCA1, ZNF259, LIPC and CETP for HDL-C, and the TOMM40-APOE cluster for LDL-C. The sixth locus, COL5A1 for CCT, possessed three signals.
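The iterative conditional procedure can be sketched as forward stepwise selection. This sketch does not reproduce the trans-ethnic meta-analysis Bayes factor used in the study. As a swapped-in stand-in, it fits a single-population linear regression and scores each SNP with Wakefield's approximate Bayes factor (ABF). The prior variance `prior_var = 0.04` and all function names are hypothetical assumptions. Only the log10 BF stopping threshold of 3.38 comes from the text.

```python
import numpy as np


def log10_abf(beta, se, prior_var=0.04):
    """Wakefield's approximate Bayes factor (association vs. null) on the log10 scale.

    prior_var is an assumed prior variance for the true effect size.
    """
    v = se ** 2
    z2 = (beta / se) ** 2
    ln_bf = 0.5 * np.log(v / (v + prior_var)) + z2 * prior_var / (2 * (v + prior_var))
    return ln_bf / np.log(10)


def snp_effect(y, covariates, g):
    """OLS effect and standard error of genotype g, adjusting for covariates."""
    n = len(y)
    X = np.column_stack([np.ones(n), covariates, g])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[-1], np.sqrt(cov[-1, -1])


def iterative_conditional(y, G, threshold=3.38, prior_var=0.04):
    """Add the best SNP as a covariate each round until its log10 BF < threshold.

    G is an (n_samples, n_snps) matrix of allele dosages; returns selected columns.
    """
    selected = []
    while len(selected) < G.shape[1]:
        covs = G[:, selected] if selected else np.empty((len(y), 0))
        best_j, best_bf = None, -np.inf
        for j in range(G.shape[1]):
            if j in selected:
                continue
            beta, se = snp_effect(y, covs, G[:, j])
            bf = log10_abf(beta, se, prior_var)
            if bf > best_bf:
                best_j, best_bf = j, bf
        if best_bf < threshold:
            break  # no remaining SNP passes the threshold
        selected.append(best_j)
    return selected
```

Each selected SNP enters later rounds as a covariate. A SNP that only tags an already-selected signal through LD therefore loses its apparent association, which is what allows the procedure to count distinct signals. Monomorphic or perfectly collinear SNPs would make the design matrix singular and should be filtered out beforehand.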
As observed in previous reports, accounting for multiple signals can increase the amount of phenotypic variance explained at each locus [1]. In this study, including multiple signals at each locus increased the explained phenotypic variance by an average of 1.7-fold compared with including only the top SNP (Table S3). The greatest increase was observed for HDL-C in the Chinese at ABCA1. Here the explained variance rose from 0.44% to 1.37% after rs1883025 was included alongside the best SNP from the unconditional analysis (rs1883023). The two SNPs were practically independent in all three populations, with r² of 0.00 in both the Chinese and the Malays and 0.04 in the Indians. Thus, using a single index SNP at each locus to estimate genetic heritability will underestimate the contribution of these loci. However, performing multiple iterations of conditional analysis in a large multi-study meta-analysis poses significant operational challenges. Researchers typically share only GWAS summary statistics, not the genotype-level data needed for conditional analysis. Disentangling multiple independent signals to identify the underlying functional variants also adds complexity. This offers an opportunity to develop novel statistical methods, such as CAVIAR by Hormozdiari and colleagues, which explicitly aims to identify multiple causal variants within a single locus [2].

REFERENCES
1. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44, 369-75, S1-3 (2012).
2. Hormozdiari, F., Kostem, E., Kang, E.Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics (2014).