Supplementary Information
Supplement to: Ramos E, Doumatey A, Elkahloun A, et al. Pharmacogenomics, ancestry and clinical
decision making for global populations.
METHODS
DNA SAMPLE PREPARATION
For directly genotyped samples from IGBO, AKAN, GAA, and HUFS, DNA was extracted following the
manufacturer’s instructions (Gentra Puregene Blood Kit Plus, QIAGEN, Valencia, CA). All samples were
whole-genome amplified using the Illustra GenomiPhi HY DNA Amplification Kit (Fisher Scientific,
Pittsburgh, PA), followed by a clean-up step using ethanol precipitation as previously described.1
DNA sample concentrations were determined using PicoGreen (Quant-iT™ PicoGreen® dsDNA assay
kit, Invitrogen, Carlsbad, CA). Maasai in Kinyawa, Kenya (MKK) samples were purchased from the
Coriell Institute for Medical Research (Camden, NJ). These samples were received as genomic DNA;
quantity and quality were verified using a spectrophotometer (BioTek µQuant, Winooski, VT).
GENOTYPING
All DNA samples used in this study were normalized to a single concentration of 60 ng/µL, as required by
the manufacturer; 1 µg of DNA per sample was genotyped using the DMET™ Plus assay (Affymetrix,
Santa Clara, CA). The methodology is based on molecular inversion probes and has been
described elsewhere;2 briefly, 48 samples were processed simultaneously and underwent multiplexed PCR
amplification, followed by enzymatic fragmentation of the PCR products, labeling,
hybridization, array washing, staining, and scanning. After scanning, the array images were analyzed
using the DMET Console software. Genotype calls were made automatically from the sample and
intensity files. Genotype data were then exported for analysis.
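As an illustration of the normalization step, the volumes needed to bring a sample to the working concentration follow from C1V1 = C2V2. The short Python sketch below is not part of the assay protocol. The function name and the example stock concentration of 150 ng/µL are hypothetical; only the 60 ng/µL target and the 1 µg input come from the text above.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=60.0, dna_ng=1000.0):
    """Return (stock_ul, diluent_ul) that give dna_ng of DNA at the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    final_ul = dna_ng / target_ng_per_ul   # total volume at the target concentration
    stock_ul = dna_ng / stock_ng_per_ul    # volume of stock that holds dna_ng of DNA
    return stock_ul, final_ul - stock_ul


# Hypothetical sample quantified at 150 ng/uL.
stock_ul, diluent_ul = dilution_volumes(150.0)
print(f"stock: {stock_ul:.2f} uL, diluent: {diluent_ul:.2f} uL")
```

At 60 ng/µL, 1 µg of DNA corresponds to a volume of about 16.7 µL.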
DATA MANAGEMENT
Sequence variant information for the 1000 Genomes populations was downloaded from the 1000
Genomes website.3 Genotype and allele information for the variants available on the
Affymetrix® DMET™ Plus platform, based on the current annotation file, was extracted from the
sequence data files. Of the 1,936 drug metabolism markers included on the DMET™ gene chips, 1,156
variants were extracted from the available whole-genome sequence datasets. Technical limitations of the
samples sequenced by the 1000 Genomes Project, such as low read depth or accessibility, restricted
the number of markers available for analysis. A subset of variants were deemed actionable [McLeod]: they
have been identified by the Food and Drug Administration as important pharmacogenomic biomarkers
listed in various drug labels4 and have been shown to have evidence of clinical utility as reported in
the pharmacogenomics database PharmGKB (www.pharmgkb.org). Sequence variant data downloaded
from 1000 Genomes and data directly genotyped on the DMET™ Plus assay were merged into one
dataset, and minor allele frequencies (MAFs) were calculated using PLINK (--freq).5 The resulting dataset
comprised genotype data on 1478 individuals at 1156 loci representing 212 different ADME-related
genes.
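The merge and frequency calculation can be illustrated with a minimal Python sketch. It is not the PLINK implementation: it assumes genotypes are coded as 0, 1, or 2 copies of one allele, treats missing calls as None, and uses hypothetical marker IDs and toy genotypes.

```python
def minor_allele_frequency(genotypes):
    """MAF for one marker from genotypes coded as 0, 1, or 2 copies of an allele.

    Missing calls are given as None and are excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable calls for this marker
    freq = sum(called) / (2 * len(called))  # each individual carries two alleles
    return min(freq, 1.0 - freq)            # report the rarer allele


# Hypothetical toy data: per-marker genotypes from two sources.
sequenced = {"rsA": [0, 1, 2, 0], "rsB": [0, 0, 1, None]}
genotyped = {"rsA": [1, 1, 0], "rsB": [2, 2, 2]}

# Keep only markers present in both sources and pool the individuals.
merged = {m: sequenced[m] + genotyped[m] for m in sequenced if m in genotyped}
mafs = {m: minor_allele_frequency(g) for m, g in merged.items()}
print(mafs)
```

Taking the minimum of the allele frequency and its complement means the result does not depend on which allele was counted.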
FST ESTIMATES
Pairwise FST values, which measure population differentiation at a given marker, were calculated
using formulas previously described.6 For calculations between continental groups, MAFs for
each allele were averaged within each continental-level group (AFR, EUR, and EAS). Because of admixture of
intercontinental ancestries, the African American populations (ASW and HUFS) and the Latin American
populations (AMR) were excluded. Guidelines established by Wright7 were used to interpret FST
values as follows: little population differentiation (0 to 0.05), moderate population differentiation (0.05
to 0.15), large population differentiation (0.15 to 0.25), and very large population differentiation
(values greater than 0.25).
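A minimal Python sketch of the calculation for one biallelic marker is given below. The exact estimator of reference 6 is not reproduced here. The sketch assumes the textbook form FST = (HT - HS) / HT with two equally weighted groups, and the allele frequencies are hypothetical. The assignment of values that fall exactly on a category boundary is also an assumption.

```python
def continental_frequency(population_freqs):
    """Average the frequency of the same allele across populations in one group."""
    return sum(population_freqs) / len(population_freqs)


def pairwise_fst(p1, p2):
    """FST for one biallelic marker from the allele frequencies of two groups."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # marker is monomorphic in both groups
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


def wright_category(fst):
    """Map an FST value to the qualitative categories listed in the text."""
    if fst <= 0.05:
        return "little"
    if fst <= 0.15:
        return "moderate"
    if fst <= 0.25:
        return "large"
    return "very large"


# Hypothetical frequencies of one allele in the populations of two groups.
afr = continental_frequency([0.10, 0.12, 0.08])
eur = continental_frequency([0.48, 0.52])
fst = pairwise_fst(afr, eur)
print(round(fst, 4), wright_category(fst))
```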
PRINCIPAL COMPONENT ANALYSIS
Principal components of ancestry were computed by decomposing the centered genotype matrix of 1478
individuals and 1156 markers coded as 0, 1, or 2 copies of the minor allele. The number of significant
principal components was estimated using the minimum average partial test.8 The summary statistic is the
average squared partial correlation after the first m principal components have been partialed out of the
correlation matrix.9 The number of significant principal components equals the value of m that globally
minimizes the summary statistic.9 To compare the multidimensional scaling (MDS) plot of
ADME markers with the MDS plot of markers randomly sampled from the genotype or sequence data of each
population, we performed Procrustes analysis.10 Random autosomal markers were selected by pruning the
least dense data set (i.e., HUFS) to pairwise r² < 0.1. These markers were pulled from 1000 Genomes
sequence data (for all 14 of the 1000 Genomes populations) and from genotypes in the HapMap Phase 3 (MKK),
AADM (IGBO, GAA, and AKAN), and HUFS datasets. The final data sets used for comparison included
1437 individuals and 1156 markers for the ADME data and 1437 individuals and 12,950 markers for the
random data. Based on the 19 centroids, the two MDS plots were more similar than expected by chance
(p < 1 × 10⁻⁸ based on 100 million permutations).
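The Procrustes comparison and its permutation test can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code. It covers two-dimensional configurations only, and the centroid coordinates, the number of permutations, and the (hits + 1) / (n + 1) p-value convention are hypothetical choices. The similarity is t = sqrt(1 - D), where D is the minimized Procrustes sum of squares; for centered, unit-norm configurations t equals the sum of the singular values of the 2 x 2 cross-product matrix, which has a closed form.

```python
import math
import random


def standardize(points):
    """Center a 2-D configuration and scale it to unit sum of squares."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_similarity(a, b):
    """Similarity t = sqrt(1 - D) after translation, scaling, rotation/reflection."""
    a, b = standardize(a), standardize(b)
    # Entries of the 2 x 2 cross-product matrix M = A^T B.
    m11 = sum(p[0] * q[0] for p, q in zip(a, b))
    m12 = sum(p[0] * q[1] for p, q in zip(a, b))
    m21 = sum(p[1] * q[0] for p, q in zip(a, b))
    m22 = sum(p[1] * q[1] for p, q in zip(a, b))
    frob_sq = m11 ** 2 + m12 ** 2 + m21 ** 2 + m22 ** 2
    det = m11 * m22 - m12 * m21
    # (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = ||M||_F^2 + 2*|det M|
    return min(1.0, math.sqrt(frob_sq + 2.0 * abs(det)))


def permutation_p_value(a, b, n_perm=999, seed=0):
    """Permute the point labels of b and count similarities >= the observed one."""
    rng = random.Random(seed)
    observed = procrustes_similarity(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_similarity(a, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical centroids; the second map is the first rotated, scaled, and shifted.
adme_mds = [(0.0, 0.0), (1.0, 0.0), (3.0, 1.0), (2.0, 4.0), (-1.0, 2.0), (0.5, -2.0)]
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
random_mds = [(2 * (x * c - y * s) + 5, 2 * (x * s + y * c) - 3) for x, y in adme_mds]

t_obs, p_val = permutation_p_value(adme_mds, random_mds)
print(round(t_obs, 6), p_val)
```

Because the second toy map is an exact similarity transform of the first, the observed statistic is 1 up to rounding, and only permutations that restore the original labeling can match it.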
REFERENCES
1. Purification of DNA. Open Wetware, 2010. (Accessed January 10, 2010, at
http://openwetware.org/index.php?title=Purification_of_DNA&oldid=430314.)
2. Burmester JK, Sedova M, Shapero MH, Mansfield E. DMET microarray technology for
pharmacogenomics-based personalized medicine. Methods Mol Biol 2010;632:99-124.
3. 1000 Genomes Data. 1000 Genomes, 2012. (Accessed November 14, 2011, at
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20101123/interim_phase1_release/.)
4. Information for Healthcare Professionals Abacavir (marketed as Ziagen) and Abacavir-containing
Medications. U.S. Food and Drug Administration, 2008. (Accessed May 11, 2009, at
http://www.fda.gov/cder/drug/InfoSheets/HCP/abacavirHCP.htm.)
5. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and
population-based linkage analyses. American Journal of Human Genetics 2007;81:559-75.
6. Chen G, Shriner D, Zhou J, et al. Development of admixture mapping panels for African
Americans from commercial high-density SNP arrays. BMC Genomics 2010;11:417.
7. Wright S. Genetical structure of populations. Nature 1950;166:247-9.
8. Shriner D. Investigating population stratification and admixture using eigenanalysis of dense
genotypes. Heredity (Edinb) 2011;107:413-20.
9. Velicer WF. Determining the number of components from the matrix of partial correlations.
Psychometrika 1976;41:321-7.
10. Wang C, Szpiech ZA, Degnan JH, et al. Comparing spatial maps of human population-genetic
variation using Procrustes analysis. Stat Appl Genet Mol Biol 2010;9:Article 13.
SUPPLEMENTARY TABLES
Supplementary Table S1. Number of participants per population sample.
POPULATION       n
YRI             88
IGBO            74
GAA             75
AKAN            73
LWK             97
MKK             87
ASW             61
HUFS            75
CEU             87
TSI             98
GBR             89
FIN             93
IBS             14
CHS            100
CHB             97
JPT             89
MXL             66
PUR             55
CLM             60
TOTAL         1478
Supplementary Table S2. Number of markers within specified interval of ΔMAF.
ΔMAF              AFR+AA    EUR    EAS    AMR
0.00 to < 0.05       361    552    797    671
0.05 to < 0.10       245    218    241    243
0.10 to < 0.15       242    214     95    132
0.15 to < 0.20       183    109     17     68
0.20 to < 0.25        83     36      6     31
0.25 to < 0.30        29     23      1     10
0.30 to < 0.35         6      3      0      1
0.35 to < 0.40         3      1      0      0
0.40 to < 0.45         1      0      0      0
0.45 to < 0.50         1      0      0      0
0.50 to < 0.55         1      0      0      0
0.55 to < 0.60         0      0      0      0
0.60 to < 0.65         1      0      0      0
0.65 to < 0.70         0      0      0      0
0.70 to < 0.75         0      0      0      0
0.75 to < 0.80         0      0      0      0
0.80 to < 0.85         0      0      0      0
0.85 to < 0.90         0      0      0      0
0.90 to < 0.95         0      0      0      0
0.95 to 1.00           0      0      0      0
SUPPLEMENTARY FIGURES
Figure S1. Reference allele frequencies of actionable ADME SNPs separated by
continental/ancestral grouping: A) AFR+AA; B) EUR; C) EAS; D) AMR.
Figure S2. Continental-level differences of ADME SNPs. The markers are sorted by increasing
ΔMAF for the full ADME dataset and separated by continental/ancestral grouping: A) AFR+AA;
B) EUR; C) EAS; D) AMR.
Figure S3. Principal components analysis across global populations. Pharmacogenomic SNPs
shared across all populations were analyzed for population structure. The first and second
principal components show population structure at the continental level as well as the suggested
relationships of admixed populations.
[Plot not reproduced. Axes: PC1 and PC2. Legend: YRI, IGBO, GBR, FIN, GAA, AKAN, LWK, MKK, ASW,
IBS, CHB, CHS, JPT, MXL, HUFS, CEU, TSI, PUR, CLM.]
Figure S4. Density plot of ΔMAF for populations grouped by continent/ancestry.