Supplementary Information

Supplement to: Ramos E, Doumatey A, Elkahloun A, et al. Pharmacogenomics, ancestry and clinical decision making for global populations.

METHODS

DNA SAMPLE PREPARATION

For the directly genotyped samples (IGBO, AKAN, GAA, and HUFS), DNA was extracted following the manufacturer’s instructions (Gentra Puregene Blood Kit Plus, QIAGEN, Valencia, CA). All samples were whole-genome amplified using the Illustra GenomiPhi HY DNA amplification kit (Fisher Scientific, Pittsburgh, PA), followed by a clean-up step using ethanol precipitation as previously described.1 DNA sample concentrations were determined using PicoGreen (Quant-iT™ PicoGreen® dsDNA assay kit, Invitrogen, Carlsbad, CA). Maasai in Kinyawa, Kenya (MKK) samples were purchased from the Coriell Institute for Medical Research (Camden, NJ). These samples were received as genomic DNA; quantity and quality were verified using a spectrophotometer (BioTek µQuant, Winooski, VT).

GENOTYPING

All DNA samples used in this study were normalized to a single concentration of 60 ng/µL as required by the manufacturer; 1 µg of DNA per sample was genotyped using the DMET™ Plus assay (Affymetrix, Santa Clara, CA). The method is based on molecular inversion probes and has been described elsewhere.2 Briefly, 48 samples were processed simultaneously and underwent multiplexed PCR amplification, followed by enzymatic fragmentation of the PCR products, labeling, hybridization, array washing, staining, and scanning. After scanning, the array images were analyzed using the DMET Console software. Genotype calls were made automatically from the sample and intensity files. Genotype data were then exported for analysis.

DATA MANAGEMENT

Sequence variant information for the 1000 Genomes populations was downloaded from the 1000 Genomes website.3
Genotype and allele information for the variants available on the Affymetrix® DMET™ Plus platform, as listed in the current annotation file, was extracted from the sequence data files. Of the 1,936 drug metabolism markers included on the DMET™ gene chips, 1,156 variants were extracted from the available whole-genome sequence datasets. Technical limitations of the samples sequenced by the 1000 Genomes Project, such as low read depth or limited accessibility, reduced the number of markers available for analysis. A subset of variants, deemed actionable [McLeod], was identified by the Food and Drug Administration as important pharmacogenomic biomarkers that are listed in various drug labels4 and that have evidence of clinical utility as reported in the pharmacogenomics database PharmGKB (www.pharmgkb.org). Sequence variant data downloaded from 1000 Genomes and data directly genotyped on the DMET™ Plus assay were merged into one dataset, and minor allele frequencies (MAFs) were calculated using PLINK (--freq).5 The resulting dataset comprised genotype data on 1,478 individuals at 1,156 loci representing 212 different ADME-related genes.

FST ESTIMATES

Pairwise FST, a measure of population differentiation at a given marker between two populations, was calculated using formulas previously described.6 For comparisons between continental groups, the MAF of each marker was averaged across the populations within each continental-level group (AFR, EUR, and EAS). Because of their admixed intercontinental ancestry, the African American populations (ASW and HUFS) and the Latin American populations (AMR) were excluded from these comparisons. Guidelines established by Wright7 were used to interpret FST values as follows: little population differentiation (0 to 0.05), moderate population differentiation (0.05 to 0.15), large population differentiation (0.15 to 0.25), and very large population differentiation (greater than 0.25).
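
The continental averaging, pairwise FST calculation, and Wright's interpretation bands described above can be illustrated with a short Python sketch. This is not the estimator of reference 6, whose formulas are not reproduced in this supplement; it assumes the simple equal-weight form FST = (HT - HS) / HT for a biallelic marker. The function names, the upper-inclusive handling of band boundaries, and the allele frequencies in the usage example are hypothetical, and the frequencies are assumed to refer to the same allele in every population.

```python
import numpy as np

def continental_maf(maf_by_pop, pops):
    """Average per-marker allele frequencies over the populations in one group."""
    return np.mean([maf_by_pop[p] for p in pops], axis=0)

def pairwise_fst(p1, p2):
    """Per-marker FST = (HT - HS) / HT for two groups, weighted equally.

    Assumption: this simple Wright-style form stands in for the formulas
    cited in the text, which may differ (e.g., sample-size corrections).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                          # total heterozygosity
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-group
    with np.errstate(divide="ignore", invalid="ignore"):
        # Markers monomorphic in both groups (HT = 0) are assigned FST = 0.
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def wright_category(fst):
    """Map an FST value to Wright's bands (boundaries assumed upper-inclusive)."""
    if fst <= 0.05:
        return "little"
    if fst <= 0.15:
        return "moderate"
    if fst <= 0.25:
        return "large"
    return "very large"

# Hypothetical frequencies for two markers in four populations.
maf = {
    "YRI": [0.10, 0.50],
    "LWK": [0.20, 0.50],
    "CEU": [0.45, 0.50],
    "TSI": [0.55, 0.50],
}
afr = continental_maf(maf, ["YRI", "LWK"])
eur = continental_maf(maf, ["CEU", "TSI"])
fst = pairwise_fst(afr, eur)
labels = [wright_category(x) for x in fst]
```

With equal weighting, the expression reduces to (p1 - p2)^2 / (4 p̄ (1 - p̄)), which is convenient for checking individual markers by hand.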

PRINCIPAL COMPONENT ANALYSIS

Principal components of ancestry were computed by decomposing the centered genotype matrix of 1,478 individuals and 1,156 markers, with genotypes coded as 0, 1, or 2 copies of the minor allele. The number of significant principal components was estimated using the minimum average partial test.8 The summary statistic is the average squared partial correlation after the first m principal components have been partialed out of the correlation matrix.9 The number of significant principal components equals the value of m that globally minimizes the summary statistic.9

To compare the multidimensional scaling (MDS) plot of ADME markers with the MDS plot of markers randomly sampled from the genotype or sequence data of each population, we performed Procrustes analysis.10 Random autosomal markers were selected by pruning the least dense data set (i.e., HUFS) to pairwise r² < 0.1. These markers were extracted from the 1000 Genomes sequence data (for all 14 of the 1000 Genomes populations) and from genotypes in the HapMap Phase 3 (MKK), AADM (IGBO, GAA, and AKAN), and HUFS datasets. The final data sets used for comparison included 1,437 individuals and 1,156 markers for the ADME data and 1,437 individuals and 12,950 markers for the random data. Based on the 19 population centroids, the two MDS plots were more similar than expected by chance (p < 1 × 10⁻⁸ based on 100 million permutations).

REFERENCES

1. Purification of DNA. Open Wetware, 2010. (Accessed January 10, 2010, at http://openwetware.org/index.php?title=Purification_of_DNA&oldid=430314.)
2. Burmester JK, Sedova M, Shapero MH, Mansfield E. DMET microarray technology for pharmacogenomics-based personalized medicine. Methods Mol Biol 2010;632:99-124.
3. 1000 Genomes Data. 1000 Genomes, 2012. (Accessed November 14, 2011, at ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/.)
4. Information for Healthcare Professionals Abacavir (marketed as Ziagen) and Abacavir-containing Medications. U.S.
Food and Drug Administration, 2008. (Accessed May 11, 2009, at http://www.fda.gov/cder/drug/InfoSheets/HCP/abacavirHCP.htm.)
5. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 2007;81:559-75.
6. Chen G, Shriner D, Zhou J, et al. Development of admixture mapping panels for African Americans from commercial high-density SNP arrays. BMC Genomics 2010;11:417.
7. Wright S. Genetical structure of populations. Nature 1950;166:247-9.
8. Shriner D. Investigating population stratification and admixture using eigenanalysis of dense genotypes. Heredity (Edinb) 2011;107:413-20.
9. Velicer WF. Determining the number of components from the matrix of partial correlations. Psychometrika 1976;41:321-7.
10. Wang C, Szpiech ZA, Degnan JH, et al. Comparing spatial maps of human population-genetic variation using Procrustes analysis. Stat Appl Genet Mol Biol 2010;9:Article 13.

SUPPLEMENTARY TABLES

Supplementary Table S1. Number of participants per population sample.

POPULATION    n
YRI           88
IGBO          74
GAA           75
AKAN          73
LWK           97
MKK           87
ASW           61
HUFS          75
CEU           87
TSI           98
GBR           89
FIN           93
IBS           14
CHS           100
CHB           97
JPT           89
MXL           66
PUR           55
CLM           60
TOTAL         1478

Supplementary Table S2. Number of markers within specified interval of ΔMAF.

ΔMAF               AFR+AA   EUR   EAS   AMR
0.00 to < 0.05     361      552   797   671
0.05 to < 0.10     245      218   241   243
0.10 to < 0.15     242      214   95    132
0.15 to < 0.20     183      109   17    68
0.20 to < 0.25     83       36    6     31
0.25 to < 0.30     29       23    1     10
0.30 to < 0.35     6        3     0     1
0.35 to < 0.40     3        1     0     0
0.40 to < 0.45     1        0     0     0
0.45 to < 0.50     1        0     0     0
0.50 to < 0.55     1        0     0     0
0.55 to < 0.60     0        0     0     0
0.60 to < 0.65     1        0     0     0
0.65 to < 0.70     0        0     0     0
0.70 to < 0.75     0        0     0     0
0.75 to < 0.80     0        0     0     0
0.80 to < 0.85     0        0     0     0
0.85 to < 0.90     0        0     0     0
0.90 to < 0.95     0        0     0     0
0.95 to 1.00       0        0     0     0

SUPPLEMENTARY FIGURES

Figure S1. Reference allele frequencies of actionable ADME SNPs separated by continental/ancestral grouping: A) AFR+AA; B) EUR; C) EAS; D) AMR.

Figure S2. Continental-level differences of ADME SNPs. The markers are sorted by increasing ΔMAF for the full ADME dataset and separated by continental/ancestral grouping: A) AFR+AA; B) EUR; C) EAS; D) AMR.

Figure S3. Principal components analysis across global populations. Pharmacogenomic SNPs shared across all populations were analyzed for population structure. The first and second principal components show population structure at the continental level as well as the suggested relationships of admixed populations. [Plot: x-axis PC1, y-axis PC2; populations in legend: YRI, IGBO, GBR, FIN, GAA, AKAN, LWK, MKK, ASW, IBS, CHB, CHS, JPT, MXL, HUFS, CEU, TSI, PUR, CLM.]

Figure S4. Density plot of ΔMAF for populations grouped by continent/ancestry.