Original article

CYP1A2 is more variable than previously thought: a genomic biography of the gene behind the human drug-metabolizing enzyme

Sarah L. Browning(a), Ayele Tarekegn(a,c), Endashaw Bekele(c), Neil Bradman(a) and Mark G. Thomas(b,d)

Background and objectives
CYP1A2 metabolizes various drugs, endogenous compounds and procarcinogens. As human genetic diversity has been reported to decrease with distance from Ethiopia, we resequenced CYP1A2 in five Ethiopian ethnic groups representing a rough northeast to southwest transect across Ethiopia to establish: (i) what variation exists in comparison with what is already known globally and (ii) what CYP1A2 pharmacogenetic profiles may be present, as several CYP1A2-metabolized drugs are administered to Ethiopians.

Results and conclusions
We found 49 different variable sites (30 of which are novel), nine nonsynonymous changes (seven of which are novel), one synonymous change and 55 different haplotypes, only three of which have been previously reported. When haplotypes were constructed using only nonsynonymous polymorphisms, to restrict haplotypes to those most likely to affect enzyme structure/function, 10 haplotypes were identified (seven contain previously unidentified nonsynonymous variants and four are predicted to alter the enzyme structure/function). Most individuals have at least one copy of the ancestral haplotype. Comparing these data with those from publicly available databases, Ethiopian groups display twice the variation seen in all other populations combined (gene diversity using nonsynonymous variants: Ethiopia = 0.17 ± 0.02, other populations = 0.08 ± 0.03). Across the entire gene, Ethiopia also evidences all common variation found on a global scale. We provide evidence of weak purifying selection acting on CYP1A2 and show that the time to most recent common ancestor, calculated using variation in a nearby microsatellite, places several variants into a period predating the expansion of modern humans out of Africa less than 100 000 years ago. Pharmacogenetics and Genomics 20:647–664 © 2010 Wolters Kluwer Health | Lippincott Williams & Wilkins.

Keywords: CYP1A2, cytochrome P450 1A2, drug metabolism, Ethiopia, microsatellite, pharmacogenetics, purifying selection, SNPstr, time to most recent common ancestor

(a) The Centre for Genetic Anthropology, (b) Research Department of Genetics, Evolution and Environment, University College London, London, UK, (c) Addis Ababa University, Addis Ababa, Ethiopia and (d) Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

Correspondence to Dr Sarah L. Browning, The Centre for Genetic Anthropology, Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK. Tel: +44 2076795061; fax: +44 2076795052; e-mail: sarah.browning@ucl.ac.uk

Received 18 May 2010 Accepted 22 July 2010

DOI: 10.1097/FPC.0b013e32833e90eb

© 2010 Wolters Kluwer Health | Lippincott Williams & Wilkins 1744-6872. Copyright © Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.

Supplemental digital content is available for this article. Direct URL citations appear in the printed text and are provided in the HTML and PDF versions of this article on the journal’s Website (www.pharmacogeneticsandgenomics.com).

Introduction
Consistent with anatomically modern humans originating in Africa, there is more human genetic diversity in that continent than in all the others [1]. Recent reports have shown decreasing genetic diversity with distance from Ethiopia [2–4] and suggest that anatomically modern humans migrated out of Africa from the northeast (possibly via Ethiopia) by crossing the Bab-el-Mandeb strait at the mouth of the Red Sea [1,5–7]. Evidence of a more recent migration of Semitic-speaking people from Arabia into Ethiopia is also known from genetic, archaeological and linguistic studies [1]. As a consequence, it is possible that more human genetic/phenotypic variation will be observed in Ethiopians than in any other geographically contiguous group of indigenous people of similar number. The distribution of human genetic variation among Ethiopian populations has, however, been little studied.

Limited investigation of CYP1A2 has been undertaken in the Ethiopian population to date. Researchers [8] have carried out CYP1A2 genotype and phenotype studies in 100 Ethiopians living in Ethiopia and 73 living in Sweden. However, this study sequenced the gene in only 12 individuals, genotyping was restricted to intron 1, and the sample set was of mixed Ethiopian origin, with donors from the Oromo, Amhara, Tigriyan and Gurage ethnic groups. CYP1A2 genotype studies have also been performed in Ethiopians as part of a wider study [9]. However, in the latter study only six previously ascertained single-nucleotide polymorphisms (SNPs) were genotyped in six Ethiopians, and the ethnicity of the individuals was not recorded.

We resequenced the coding and exon-flanking regions of CYP1A2 in five Ethiopian ethnic groups representing a rough northeast to southwest transect across Ethiopia to establish what variation exists in comparison with what is already known in other populations. We were also interested in ascertaining what CYP1A2-relevant pharmacogenetic profiles may be present in the Ethiopian population, as several CYP1A2-metabolized drugs are administered to Ethiopians. For example, primaquine and praziquantel [10] are used as first-line treatments for malaria and schistosomiasis, respectively [11]. Furthermore, coffee was first domesticated for human use in Ethiopia [12] and is an integral part of modern Ethiopian culture; the intake of caffeine (a well-known CYP1A2 substrate) is consequently widespread in Ethiopia.
Cytochrome P450 1A2
Human CYP1A2 is mapped to the positive strand of the long arm of chromosome 15 at 15q24.1, at chromosomal location 15:72 828 237–72 835 994 [13], and is mainly expressed in the liver [14]. It is oriented head-to-head with CYP1A1, which is on the negative strand. CYP1A2 and CYP1A1 are separated by a 23.3 kb spacer region whose role in regulating either gene, or in governing the expression of both genes simultaneously, is not yet understood [15]. CYP1A2 is approximately 7.8 kb long, with seven exons and six introns [16]. Exon 1 and the downstream portion of exon 7 are untranslated regions (UTRs). The gene has only one transcript, which is translated into a protein of 516 amino acid residues [13]. The active site is thought to include amino acids T321 in exon 4 and F451 and C458 in exon 7 [17].

CYP1A2 is a clinically important drug-metabolizing enzyme responsible for the oxidative metabolism of a wide variety of pharmaceutical drugs, the biotransformation of endogenous compounds and the metabolic activation of some procarcinogens [18]. The enzyme is induced by a number of compounds and has many inhibitors [18]. Caffeine is frequently used as a substrate in CYP1A2 phenotype studies, but theophylline and melatonin are also used [19]. To date, roughly 125 allelic variants have been reported within CYP1A2 (from exon 1 to exon 7) and over 40 variants have been found within 3000 bases on either side of the gene [20–23]. No variation has been reported in exon 1 (5′ UTR) and no more than seven variants have been found in any of exons 3, 4, 5 and 6, yet more than 20 variants have been found in each of exons 2 and 7 (including the 3′ UTR). Note that none of the variants is found in what is thought to be the active site of the protein. In addition, no copy number variation or gene conversion has been reported in CYP1A2.
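Variant positions in this article use the numbering given in the table footnotes: the A of the ATG initiation codon is +1 and the base before it is –1, so there is no position 0. Because CYP1A2 lies on the positive strand, converting between genomic coordinates and this numbering is a simple offset. The Python sketch below assumes a hypothetical `atg_a_pos` (the genomic coordinate of the A of ATG), which this section does not give; the value used in the usage line is made up.

```python
def genomic_to_atg_relative(genomic_pos: int, atg_a_pos: int) -> int:
    """Convert a plus-strand genomic coordinate to ATG-relative numbering.

    The A of ATG is +1 and the preceding base is -1; position 0 does not exist.
    `atg_a_pos` is a hypothetical input: the genomic coordinate of that A.
    """
    offset = genomic_pos - atg_a_pos
    return offset + 1 if offset >= 0 else offset


def atg_relative_to_genomic(rel_pos: int, atg_a_pos: int) -> int:
    """Inverse of genomic_to_atg_relative (plus-strand genes only)."""
    if rel_pos == 0:
        raise ValueError("ATG-relative numbering has no position 0")
    return atg_a_pos + rel_pos - 1 if rel_pos > 0 else atg_a_pos + rel_pos


# Usage with an illustrative ATG coordinate (not the real CYP1A2 value).
ATG_A_POS = 1_000
print(genomic_to_atg_relative(1_000, ATG_A_POS))  # 1
print(genomic_to_atg_relative(999, ATG_A_POS))    # -1
```

Skipping zero is the only subtlety: a naive `genomic_pos - atg_a_pos` would misnumber every upstream variant by one.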
As many as 36 CYP1A2 haplotypes, including 21 subtypes, have been named by the Human Cytochrome P450 Allele Nomenclature Committee [20]. Following comprehensive sequencing in two studies [24,25], the majority of haplotypes have been reported in Japanese populations, but haplotype frequencies differ among populations worldwide [19]. The functional status associated with each CYP1A2 haplotype also varies [19], and variation in the gene is thought to be associated with differences in drug efficacy and safety [19].

Methods
Samples
DNA samples were prepared from buccal swabs from males aged 18 years or older, unrelated at the paternal grandfather level. All samples were collected anonymously with informed consent and with ethical approval from the National Health Research Ethical Clearance Committee under the Ethiopian Technology and Science Commission Department of Health Research. Sociological data, including age, current residence, birthplace, self-declared ethnic identity and religion, were collected for each individual, together with similar information about the individual’s father, mother, paternal grandfather and maternal grandmother. Samples comprised Afar (n = 76), Amhara (n = 77), Anuak (n = 76), Maale (n = 76) and Oromo (n = 76). Afar were collected from Dubti (11.74°N, 41.09°E) and Asayta (11.56°N, 41.44°E) in the Afar region; Amhara and Oromo from Addis Ababa (9.03°N, 38.70°E) and Jimma (7.67°N, 36.83°E); Anuak from the Gambela region [including Gog (7.58°N, 34.50°E), Itang (8.20°N, 34.27°E) and Akobo (7.82°N, 33.03°E)]; and Maale from Jinka (5.65°N, 36.65°E) in the Bako Gazer woreda in South Omo. Genotype data from 95 individuals [12 Yoruba, 15 African–American, 22 European, 22 Hispanic and 24 East Asian (12 Japanese and 12 Han Chinese)] reported by the National Institute of Environmental Health Sciences (NIEHS) SNPs Programme [23] were incorporated into the analyses to place the Ethiopian data in a worldwide context.
Amplification and sequencing of CYP1A2
Amplification and sequencing conditions for all exons and flanking introns of CYP1A2 are described in Supplementary data 1 and 2, Supplemental digital content 1, http://links.lww.com/FPC/A209. We sequenced all seven CYP1A2 exons, introns 3 and 4, part of introns 1, 2, 5 and 6, the 5′ and 3′ near-gene regions and the 3′ UTR. A total of 88 bases (72834757–72834845) in the 3′ UTR could not be sequenced in either direction because of poly-A/T regions.

Statistical analysis
Pairwise linkage disequilibrium (LD) was measured by the D′ parameter [26] using GOLD software [27]. The following statistics were calculated using Arlequin software [28]: tests for departure of observed genotype frequencies from those expected under Hardy–Weinberg equilibrium; haplotype phase, inferred from unphased population genotype data using the Excoffier–Laval–Balding approach [29]; gene diversity [30]; nucleotide diversity [30]; genetic differences between population samples, assessed using an exact test of population differentiation [31,32]; genetic distance between populations, represented by population pairwise FST values [33]; and the apportionment of diversity within and among more than two populations, analyzed using hierarchical FST values [34]. Principal coordinates analysis [35] was performed using the R statistical package [36] on pairwise similarity matrices as previously described [37]. Effects of amino acid substitutions on the structure and function of CYP1A2 were predicted using PolyPhen software [38]. Median-joining networks [39] were constructed using Network software Version 4.510 [40] and drawn using Network Publisher Version 1.1.0.7 (Fluxus Technology Ltd).
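D′ rescales the raw disequilibrium D by its maximum possible value given the allele frequencies, which is why it reaches 1 whenever one of the four two-locus gametes is absent. The sketch below shows that calculation for two biallelic loci; it assumes phased two-locus haplotypes are already available, which is a simplification of what GOLD does with estimated haplotype frequencies, and it is not GOLD’s code.

```python
def d_prime(haplotypes):
    """|D'| between two biallelic loci from phased two-locus haplotypes.

    `haplotypes` is a list of (allele_at_locus1, allele_at_locus2) tuples.
    |D'| does not depend on which allele is used as reference, so the
    alphabetically first allele at each locus is used.
    """
    n = len(haplotypes)
    alleles_1 = sorted({h[0] for h in haplotypes})
    alleles_2 = sorted({h[1] for h in haplotypes})
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both loci must be biallelic in the sample")
    a, b = alleles_1[0], alleles_2[0]
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # The maximum |D| attainable depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Only three of the four gametes (AB, Ab, ab) are present, so D' = 1
# even though the loci are far from perfectly correlated.
three_gametes = [("A", "B")] * 3 + [("A", "b")] * 2 + [("a", "b")] * 5
print(d_prime(three_gametes))
```

This property explains why most pairs of rare CYP1A2 variants show D′ = 1: with few copies of a derived allele, the fourth gamete is often simply never sampled.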
Tajima’s D [41], the McDonald–Kreitman [42] and Fu and Li’s [43] tests of neutrality were performed using DnaSP software [44]. Evidence of purifying selection at CYP1A2 nonsynonymous SNP sites was assessed using previously described methods [45,46].

Genohaplotyping of an AC microsatellite and a G > C single-nucleotide polymorphism (rs11072507) using a SNPstr system
A 384 bp region containing the AC microsatellite (5.6 kb downstream of the 3′ end of CYP1A2) and the G > C SNP (rs11072507) was amplified using the forward primer TCTCATCTCGCAACTGGGGA and the reverse primer GGGTTGGGGGCCCATTGTCS (S = G or C). As the 3′ end of the reverse primer annealed to the site of the G > C SNP, each allele was amplified independently. The fragment ending with the C allele was specifically amplified using the fluorescently labelled reverse primer FAM-GGGTTGGGGGCCCATTGTCG, while the fragment ending with the mutated G allele was specifically labelled with the reverse primer HEX-GGGTTGGGGGCCCATTGTCC. Each fluorescently labelled PCR product encompassed the SNP at one end and the microsatellite at the other, and the length of the PCR product varied among chromosomes depending solely on the number of microsatellite repeats. Consequently, the gametic phase of the SNP and microsatellite could be determined empirically by electrophoresis on a genetic analyzer using fluorescent detection. Individual sample DNAs were amplified separately with an allele-specific, fluorescently labelled reverse primer and a common forward primer. Two separate PCR reactions per individual were carried out to increase the reliability of the results. DNA was amplified in 96-well plates in 10 µl reaction volumes containing 1 ng of template DNA, 0.3 µmol/l of each primer (forward and reverse), 0.13 units of Taq DNA polymerase (HT Biotech, Cambridge, UK), 9.3 nmol/l TaqStart™ monoclonal antibody (BD Biosciences Clontech, Oxford, UK), 200 µmol/l dNTPs and the reaction buffer supplied with the Taq polymerase.
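The phase logic of the SNPstr assay can be sketched in a few lines: the dye identifies which rs11072507 allele a product carries, and the product length identifies the repeat count on that same chromosome. This is a minimal illustration, not the GeneMapper workflow; the input format is an assumption, and `flank_bp` (the amplicon length excluding the AC tract) is a hypothetical calibration constant that in practice would be set against sequenced alleles.

```python
# Dye-to-allele mapping as described in the text: the FAM-labelled primer ending
# in G amplifies the C allele; the HEX-labelled primer ending in C amplifies G.
DYE_TO_SNP_ALLELE = {"FAM": "C", "HEX": "G"}


def call_genohaplotypes(peaks, flank_bp, repeat_unit_bp=2):
    """Turn (dye, fragment size in bp) peaks into phased (SNP allele, repeat count) pairs.

    flank_bp is HYPOTHETICAL: the amplicon length excluding the AC repeat tract.
    """
    calls = []
    for dye, size_bp in peaks:
        allele = DYE_TO_SNP_ALLELE[dye]
        repeat_bp = round(size_bp) - flank_bp  # sizing is not exactly integral
        if repeat_bp < 0 or repeat_bp % repeat_unit_bp:
            raise ValueError(f"size {size_bp} bp is not a whole number of repeats")
        calls.append((allele, repeat_bp // repeat_unit_bp))
    return sorted(calls)


# A hypothetical rs11072507 heterozygote: one C chromosome, one G chromosome.
print(call_genohaplotypes([("FAM", 150.2), ("HEX", 156.0)], flank_bp=120))
# [('C', 15), ('G', 18)]
```

Because each peak already carries both the SNP allele and the repeat length, no statistical phase inference is needed for this pair of markers.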
The cycling parameters were: 4 min of preincubation at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at 56°C and 30 s at 72°C, with a final elongation step of 7 min at 72°C. A 2 µl aliquot of the diluted PCR product (1 in 5 dilution) was mixed with 9.89 µl of high-purity (HiDi) formamide and 0.11 µl of ROX size standard (Applied Biosystems, Warrington, UK). The mixture was heated for 4 min at 96°C and immediately cooled on ice. Samples were run on an ABI 3100 genetic analyzer and analyzed using GeneMapper software v4.0 (Applied Biosystems, Warrington, UK). Genohaplotypes (rs11072507 genotypes and AC microsatellite haplotypes) were then recorded for each sample. To ensure that the SNPstr assay accurately determined microsatellite lengths (by fragment mobility), a sample of rs11072507 heterozygous individuals also had their microsatellite lengths confirmed by sequencing (Supplementary data 3, Supplemental digital content 1, http://links.lww.com/FPC/A209).

Estimating the time to most recent common ancestor for CYP1A2 variants
Under the stepwise mutation model, the average square distance (ASD) in microsatellite repeat number between all sampled chromosomes and the ancestral haplotype, averaged over loci, has been shown to be linearly related to μt, where μ is the mutation rate and t the coalescence time in generations [47,48]. The AC microsatellite alleles obtained from the SNPstr assay were used to date CYP1A2 variants in this study. As the gametic phase of the SNP (rs11072507) and the AC microsatellite was determined empirically for each sample, rs11072507 was used to determine which microsatellite haplotype the allele being dated was linked to. Phase was inferred for all CYP1A2 variant alleles and rs11072507 in the pooled Ethiopian population by the Excoffier–Laval–Balding approach [29] implemented in Arlequin software [28].
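Because the expected ASD under the simple stepwise model equals μt, a point estimate of the coalescence time is t = ASD/μ generations, converted to years with the generation time. The sketch below applies this to a single microsatellite using the mutation rate (0.0005 per generation) and generation time (32 years) assumed in this study. It is a plain ASD/μ estimate under those assumptions, not a reimplementation of Ytime’s unbiased estimator, and the repeat counts are hypothetical.

```python
from collections import Counter

MUTATION_RATE = 0.0005      # per generation, the value assumed in this study
YEARS_PER_GENERATION = 32   # generation time assumed in this study


def average_square_distance(repeat_counts, ancestral=None):
    """Mean squared difference in repeat number from the ancestral allele.

    If no ancestral allele is given, the modal (most frequent) allele is used,
    mirroring the assumption that the modal haplotype is ancestral.
    """
    if not repeat_counts:
        raise ValueError("need at least one chromosome")
    if ancestral is None:
        ancestral = Counter(repeat_counts).most_common(1)[0][0]
    return sum((r - ancestral) ** 2 for r in repeat_counts) / len(repeat_counts)


def tmrca_years(repeat_counts, mu=MUTATION_RATE, years_per_gen=YEARS_PER_GENERATION):
    """Point estimate: E[ASD] = mu * t under the stepwise model, so t = ASD / mu."""
    generations = average_square_distance(repeat_counts) / mu
    return generations * years_per_gen


# Hypothetical repeat counts on chromosomes carrying the variant being dated.
repeats = [12, 12, 12, 13, 11, 12, 14]
print(round(tmrca_years(repeats)))  # 54857
```

Note how sensitive the result is to μ: halving the assumed mutation rate doubles every date, which is one reason the stated confidence intervals ignore, rather than capture, mutation-rate uncertainty.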
When both alleles of a particular CYP1A2 SNP were found on the background of both the G and the C allele of rs11072507, recombination was assumed. As recombination initiates a new distribution of microsatellite alleles in the evolutionary history of the gene (overlaid on the previous distribution), these variants were dated using microsatellites on the background of rs11072507 C and G separately and together (where possible). Of the date estimates produced from only rs11072507 G or only rs11072507 C alleles, the older date was assumed to indicate the coalescent date of the SNP being dated, whereas the younger was taken as the coalescent date of the recombination event. As recombination between identical haplotypes would not affect coalescent date estimates, recombination between identical CYP1A2 haplotypes was not accounted for in the method.

Table 1 CYP1A2 variants observed in the Ethiopian samples. Bold, private to one population; dbSNP, database single nucleotide polymorphism; grey, novel mutations; f, frequency; n, chromosome number; NCBI, National Center for Biotechnology Information; UTR, untranslated region; white, known mutations. *Shortens protein by 21 amino acids.

Fig. 1 Pairwise linkage disequilibrium (LD) (D′) across CYP1A2 in the combined Ethiopian sample, for loci –739, –505, –163, 1513, 1589, 2159, 2321, 3613, 5105, 5347, 5521, 5620, 5987, 6324 and 6674 (spanning intron 1, exon 3, introns 3, 4 and 6, exon 7 and the 3′ UTR). Monomorphic loci and rare variants (frequency < 0.01) were removed from the datasets before the analysis. CYP1A2 variants and their relative locations within the gene are highlighted in grey, D′ values of 1 are highlighted in pink and significant χ² associations are in bold (P < 0.05). The area bordered in red constitutes an LD block as defined by Haploview (www.hapmap.org/haploview). UTR, untranslated region.

Average square distance and time were calculated using Ytime software, Version 2.08 [49]. The modal haplotype was assumed to be ancestral. The time to most recent common ancestor (unbiased estimate plus confidence interval) was inferred under the simple stepwise mutation model of microsatellite evolution. The AC microsatellite mutation rate per generation was assumed to be 0.0005 [50]. Confidence intervals were obtained on the distance between the assumed ancestral and sampled chromosomes (ignoring uncertainty in the mutation rate) by simulation assuming a star genealogy.
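Under a star genealogy every sampled lineage descends independently from the ancestor, so the spread of the ASD can be simulated lineage by lineage: draw a Poisson number of mutations for each lineage and apply ±1 repeat steps. The sketch below uses that idea to put an approximate interval around an ASD-based date. It is an illustrative parametric simulation under stated assumptions (single locus, symmetric single-step mutations, true time set to the point estimate), not Ytime’s actual procedure.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_asd_star(n, mu, generations, rng):
    """One replicate ASD for n independent lineages (star genealogy)."""
    total = 0
    for _ in range(n):
        steps = poisson(mu * generations, rng)
        displacement = sum(rng.choice((-1, 1)) for _ in range(steps))
        total += displacement ** 2
    return total / n


def star_interval_years(n, observed_asd, mu=0.0005, years_per_gen=32,
                        reps=2000, alpha=0.05, seed=1):
    """Approximate (1 - alpha) interval for the date, in years."""
    rng = random.Random(seed)
    t_hat = observed_asd / mu  # point estimate in generations
    sims = sorted(simulate_asd_star(n, mu, t_hat, rng) / mu * years_per_gen
                  for _ in range(reps))
    lower = sims[int(alpha / 2 * reps)]
    upper = sims[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Hypothetical input: 50 chromosomes with an observed ASD of 1.0 repeat^2.
print(star_interval_years(50, 1.0, reps=500))
```

A star genealogy treats lineages as independent, so it gives narrower intervals than a genealogy with shared internal branches; the justification for that choice follows in the text.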
A star genealogy was assumed because most nonancestral haplotypes were rare (in some cases most were singletons) and negative Tajima’s D values were observed for all Ethiopian populations (Table 6), indicating that the genealogy linking the CYP1A2 chromosomes more closely resembled the star genealogy characteristic of population growth than the genealogy expected with no growth. A generation time of 32 years was assumed, based on previously reported estimates [51].

Results
CYP1A2 variation observed in Ethiopia
A total of 49 different CYP1A2 polymorphic sites were observed in the Ethiopian samples (Table 1). No population’s genotype frequencies deviated significantly from Hardy–Weinberg equilibrium at the 1% significance level, no variant sites were observed within 17 bases on either side of any intron/exon boundary, and all reported catalytic residues (amino acids D320 and T321 in exon 4, and F451 and C458 in exon 7) were monomorphic. Twenty-one (43%) of the variant alleles were private to single populations and 30 (61%) were previously unreported (Table 1), including seven nonsynonymous variants, one of which is a premature stop codon in exon 7 (5384 C > A, resulting in Y495X) observed in Anuak at 3%. Notably, no nonsynonymous variant exceeded a frequency of 11% in any one group, and those predicted to alter the structure/function of the protein were observed at frequencies of 1–3% in their respective populations (Table 1).

Most pairs of CYP1A2 SNP loci are in complete LD (D′ = 1), but several pairs with D′ less than 1 were observed across the gene (Fig. 1). Most of the lower D′ values involved at least one marker towards the 3′ end of the gene, and loci from intron 1 up to and including 5521 in the 3′ UTR constituted an LD block as defined by other investigators (Fig. 1).
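For a biallelic SNP, an exact Hardy–Weinberg test conditions on the allele counts and asks how probable the observed number of heterozygotes is. The sketch below computes that exact conditional distribution with rational arithmetic and sums the probabilities of all outcomes no more likely than the observed one. It illustrates the kind of test reported above; it is not Arlequin’s implementation, and the genotype counts in the usage line are hypothetical.

```python
from fractions import Fraction
from math import factorial


def het_count_distribution(n_individuals, n_allele_a):
    """Exact probabilities of each heterozygote count at a biallelic locus,
    conditional on sample size and allele counts, under HWE."""
    n_allele_b = 2 * n_individuals - n_allele_a
    dist = {}
    # Heterozygote counts must share the parity of the allele count.
    for het in range(n_allele_a % 2, min(n_allele_a, n_allele_b) + 1, 2):
        hom_a = (n_allele_a - het) // 2
        hom_b = (n_allele_b - het) // 2
        numerator = (factorial(n_individuals) * 2 ** het
                     * factorial(n_allele_a) * factorial(n_allele_b))
        denominator = (factorial(hom_a) * factorial(het) * factorial(hom_b)
                       * factorial(2 * n_individuals))
        dist[het] = Fraction(numerator, denominator)
    return dist


def hwe_exact_p(hom_a, het, hom_b):
    """Exact test p-value: total probability of heterozygote counts
    that are no more likely than the observed count."""
    n = hom_a + het + hom_b
    dist = het_count_distribution(n, 2 * hom_a + het)
    observed = dist[het]
    return float(sum(p for p in dist.values() if p <= observed))


# Hypothetical genotype counts for one SNP in a sample of 76 men.
p = hwe_exact_p(60, 14, 2)
print(f"p = {p:.3f}; deviates at the 1% level: {p < 0.01}")
```

Exact fractions avoid the underflow that naive floating-point factorial ratios would hit at sample sizes like these.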
Table 2 CYP1A2 haplotypes observed across the entire gene in the Ethiopian samples. Haplotype frequencies are shown in Table 3. (1) Position from base A in the initiation codon (A in ATG is +1, base before A is –1) from the CYP1A2 genomic reference sequence (NC_000015.8). (2) White cell, allele observed in CYP1A2*1A; grey cell, derived allele. Underlined haplotypes were unambiguously resolved from homozygous genotypes at all loci or from a single-site heterozygote. Haplotypes reported by the CYP450 Allele Nomenclature Committee are named. Three of the variants reported in the Ethiopians were not incorporated into haplotypes because they were polymorphic only in samples with missing data at other polymorphic sites.

Table 3 CYP1A2 entire gene haplotype frequencies in Ethiopian populations

Haplotype ID Afar        Amhara      Anuak       Maale       Oromo       Ethiopia overall
1 (*1B)      42 (0.36)   51 (0.41)   65 (0.46)   42 (0.32)   31 (0.30)   231 (0.375)
2            2 (0.02)    0 (0.00)    3 (0.02)    0 (0.00)    1 (0.01)    6 (0.010)
3 (*1M)      32 (0.27)   42 (0.34)   5 (0.04)    16 (0.12)   24 (0.24)   119 (0.193)
4            3 (0.03)    2 (0.02)    7 (0.05)    3 (0.02)    8 (0.08)    23 (0.037)
5            0 (0.00)    0 (0.00)    11 (0.08)   0 (0.00)    0 (0.00)    11 (0.018)
6            2 (0.02)    0 (0.00)    8 (0.06)    1 (0.01)    2 (0.02)    13 (0.021)
7 (*17)      2 (0.02)    1 (0.01)    2 (0.01)    1 (0.01)    1 (0.01)    7 (0.011)
8            6 (0.05)    2 (0.02)    9 (0.06)    0 (0.00)    1 (0.01)    18 (0.029)
9            4 (0.03)    0 (0.00)    0 (0.00)    0 (0.00)    2 (0.02)    6 (0.010)
10           8 (0.07)    9 (0.07)    1 (0.01)    12 (0.09)   8 (0.08)    38 (0.062)
11 (*18)     2 (0.02)    0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    2 (0.003)
12           2 (0.02)    1 (0.01)    5 (0.04)    0 (0.00)    1 (0.01)    9 (0.015)
13           2 (0.02)    0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    2 (0.003)
14           1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
15 (*19)     2 (0.02)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    3 (0.005)
16           1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
17           1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
18           1 (0.01)    0 (0.00)    5 (0.04)    20 (0.15)   4 (0.04)    30 (0.049)
19           1 (0.01)    1 (0.01)    0 (0.00)    2 (0.02)    0 (0.00)    4 (0.006)
20           1 (0.01)    1 (0.01)    2 (0.01)    2 (0.02)    0 (0.00)    6 (0.010)
21           1 (0.01)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    2 (0.003)
22           1 (0.01)    4 (0.03)    0 (0.00)    8 (0.06)    4 (0.04)    17 (0.028)
23           1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    2 (0.003)
24           0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
25           0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
26           0 (0.00)    2 (0.02)    0 (0.00)    2 (0.02)    0 (0.00)    4 (0.006)
27           0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    2 (0.02)    3 (0.005)
28           0 (0.00)    1 (0.01)    1 (0.01)    0 (0.00)    0 (0.00)    2 (0.003)
29 (*20)     0 (0.00)    1 (0.01)    1 (0.01)    8 (0.06)    1 (0.01)    11 (0.018)
30           0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
31           0 (0.00)    1 (0.01)    1 (0.01)    0 (0.00)    0 (0.00)    2 (0.003)
32           0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.002)
33           0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    1 (0.002)
34           0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    1 (0.002)
35           0 (0.00)    0 (0.00)    3 (0.02)    0 (0.00)    0 (0.00)    3 (0.005)
36           0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    1 (0.002)
37 (*21)     0 (0.00)    0 (0.00)    3 (0.02)    0 (0.00)    0 (0.00)    3 (0.005)
38           0 (0.00)    0 (0.00)    1 (0.01)    1 (0.01)    1 (0.01)    3 (0.005)
39           0 (0.00)    0 (0.00)    1 (0.01)    1 (0.01)    0 (0.00)    2 (0.003)
40           0 (0.00)    0 (0.00)    1 (0.01)    1 (0.01)    1 (0.01)    3 (0.005)
41           0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    1 (0.002)
42           0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    0 (0.00)    1 (0.002)
43           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.01)    2 (0.003)
44           0 (0.00)    0 (0.00)    0 (0.00)    4 (0.03)    1 (0.01)    5 (0.008)
45           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    1 (0.002)
46           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    1 (0.002)
47           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    1 (0.002)
48           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    1 (0.002)
49           0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    0 (0.00)    1 (0.002)
50           0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.002)
51           0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.002)
52           0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    2 (0.02)    2 (0.003)
53           0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.002)
54           0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.002)
55 (*1F)     0 (0.00)    0 (0.00)    0 (0.00)    0 (0.00)    1 (0.01)    1 (0.002)
Total        118 (1.00)  124 (1.00)  140 (1.00)  132 (1.00)  102 (1.00)  616 (1.000)

Values are n (frequency). Haplotypes are shown in Table 2. n, number of chromosomes.

Across the entire CYP1A2 gene, 55 different haplotypes were observed in the Ethiopian samples (Table 2).
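Gene diversity summarizes how evenly chromosomes are spread over haplotypes. The sketch below computes Nei’s unbiased gene diversity, H = n/(n − 1)(1 − Σpᵢ²), together with its sampling standard deviation following Nei (1987), and applies it to the Afar haplotype counts in Table 3. It is an illustration of the statistic, not Arlequin’s code, and it is applied here to whole-gene haplotypes rather than to the nonsynonymous variants behind the values quoted in the abstract.

```python
def gene_diversity(counts):
    """Nei's unbiased gene diversity and its sampling standard deviation.

    counts: numbers of chromosomes carrying each haplotype (or allele).
    H = n/(n-1) * (1 - sum(p_i^2)); the variance follows Nei (1987).
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [c / n for c in counts if c > 0]
    s2 = sum(p ** 2 for p in freqs)
    s3 = sum(p ** 3 for p in freqs)
    h = n / (n - 1) * (1 - s2)
    var = 2 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h, max(var, 0.0) ** 0.5


# Non-zero Afar haplotype counts from Table 3 (118 chromosomes in total).
afar = [42, 2, 32, 3, 2, 2, 6, 4, 8, 2, 2, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1]
h, sd = gene_diversity(afar)
print(f"H = {h:.3f} ± {sd:.3f}")
```

The n/(n − 1) factor removes the downward bias of the plug-in estimate, which matters when comparing samples of different sizes such as the NIEHS panels and the Ethiopian groups.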
Only haplotypes 1 (CYP1A2*1B), 3 (CYP1A2*1M) and 55 (CYP1A2*1F) had been previously reported, so 52 novel haplotypes were found in this study. Of these, haplotypes 7, 11, 15, 29 and 37 have now been named by the CYP450 Allele Nomenclature Committee [20] as CYP1A2*17, *18, *19, *20 and *21, respectively. CYP1A2*1B and *1M were the most frequent haplotypes in the Ethiopians, and many of the novel haplotypes were rare (< 1%) (Table 3) and closely related to those previously reported (Fig. 2).

Fig. 2 Network analysis of CYP1A2 entire gene haplotypes observed in Ethiopian populations. Nodes represent haplotypes, which are named according to the nomenclature outlined in Table 2. Node sizes are proportional to haplotype frequencies within the combined Ethiopian populations (Table 3). White nodes, previously reported haplotypes; grey nodes, novel haplotypes reported by this study. *The alleles named by the P450 Allele Nomenclature Committee.

When haplotypes were constructed using only nonsynonymous polymorphisms (named NS haplotypes hereafter), to restrict the haplotype set to those most likely to affect protein structure/function, 10 NS haplotypes were identified (Table 4). Seven contained previously unidentified nonsynonymous variants and four were predicted to alter the structure/function of the protein (Table 4). The modal NS haplotype (≥ 86%) in all populations was the ancestral NS haplotype 7 (Table 4). Potentially damaging NS haplotypes were observed in Amhara, Anuak and Oromo, but their frequencies never exceeded 3% in any one group (Table 4).
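Constructing NS haplotypes amounts to projecting each whole-gene haplotype onto the nonsynonymous sites and merging haplotypes that become identical. The sketch below shows that step on a toy example; the string encoding (one allele character per variable site) and the site indices are assumptions for illustration, not the study’s data format.

```python
from collections import Counter


def collapse_to_ns(haplotype_counts, ns_site_indices):
    """Project whole-gene haplotypes onto nonsynonymous sites only.

    haplotype_counts maps a haplotype string (one allele character per variable
    site, in a fixed site order) to its chromosome count. Haplotypes that become
    identical once synonymous/noncoding sites are dropped are merged.
    """
    collapsed = Counter()
    for haplotype, count in haplotype_counts.items():
        ns_key = "".join(haplotype[i] for i in ns_site_indices)
        collapsed[ns_key] += count
    return collapsed


# Toy example: four variable sites, of which sites 0 and 3 are nonsynonymous.
toy = {"0010": 5, "0110": 3, "1011": 2}
ns = collapse_to_ns(toy, [0, 3])
total = sum(ns.values())
print({hap: count / total for hap, count in ns.items()})  # {'00': 0.8, '11': 0.2}
```

Merging in this way is why 55 whole-gene haplotypes reduce to only 10 NS haplotypes: most of the variation sits at sites that do not change the protein.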
Notably, diplotype configurations of the NS haplotypes revealed that all individuals have at least one haplotype predicted to code for an unaltered protein, and the majority have at least one copy of the NS haplotype without any nonsynonymous mutations (Table 5).

Analyzing Ethiopian CYP1A2 variation in the context of other populations
All CYP1A2 nucleotide variants and haplotypes (in the regions sequenced in the Ethiopians in this study) found at a frequency of ≥ 3% in the combined NIEHS sample were detected in the Ethiopian samples (Fig. 3). CYP1A2 gene and nucleotide diversities were always highest in African populations and lowest in Europeans (Fig. 4). Notably, the pooled Ethiopian samples were always more diverse than the pooled NIEHS samples, and even single Ethiopian ethnic groups were often more diverse than the combined NIEHS samples (Fig. 4). The majority of Ethiopian and NIEHS populations were significantly different (exact test of population differentiation, P < 0.05) when entire gene haplotypes were considered (Fig. 5a). However, when the haplotype set was restricted to markers that are most likely to affect the structure/function of the protein (i.e. NS haplotypes), considerably less pairwise differentiation was observed, with significant differences occurring only among Ethiopian populations (Fig. 5b). Consistent with this, statistically significant interethnic differentiation was observed in the coding region in the Ethiopians (hierarchical FST based on NS haplotypes = 0.02, P < 0.00001), with 2% of variation occurring among groups. Significant FST values were also observed between Ethiopians and Europeans, and between Ethiopians and East Asians (Fig. 6). Interestingly, as an illustration of intra-Ethiopian variation, a slightly greater FST was observed between Amhara and Anuak (FST = 0.12, P < 0.01) than between Hispanics and East Asians (FST = 0.11, P = 0.05) for CYP1A2 entire gene haplotypes.
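Pairwise differentiation statistics compare diversity within populations with diversity in the pooled sample. The sketch below uses Nei’s G_ST = (H_T − H_S)/H_T as a simple, swapped-in index for two populations. Arlequin’s pairwise FST is computed differently (from an analysis of molecular variance), so this sketch will not reproduce the values reported above; the haplotype counts in the usage line are hypothetical.

```python
def _freqs(counts):
    n = sum(counts.values())
    return {hap: c / n for hap, c in counts.items()}


def nei_gst(counts_1, counts_2):
    """Nei's G_ST = (H_T - H_S) / H_T for two populations, weighted equally.

    counts_* map haplotype labels to chromosome counts. Uses plug-in
    heterozygosities without small-sample correction.
    """
    p1, p2 = _freqs(counts_1), _freqs(counts_2)
    haps = set(p1) | set(p2)
    h1 = 1 - sum(f ** 2 for f in p1.values())
    h2 = 1 - sum(f ** 2 for f in p2.values())
    h_s = (h1 + h2) / 2  # mean within-population diversity
    h_t = 1 - sum(((p1.get(h, 0) + p2.get(h, 0)) / 2) ** 2 for h in haps)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical haplotype counts for two populations.
pop_a = {"*1B": 50, "*1M": 40, "other": 30}
pop_b = {"*1B": 65, "*1M": 5, "other": 70}
print(round(nei_gst(pop_a, pop_b), 3))
```

An index of this kind is bounded by within-population diversity, so values around 0.1 between two groups already reflect substantial frequency differences, as with the Amhara–Anuak comparison.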
Table 4 CYP1A2 NS haplotypes (only nonsynonymous variants). (1) Position from base A in the initiation codon (A in ATG is +1, base before A is –1) from the CYP1A2 genomic reference sequence (NC_000015.8). (2) White cell, allele observed in CYP1A2*1A; grey cell, derived allele. Underlined haplotypes were unambiguously resolved from homozygous genotypes at all loci or from a single-site heterozygote; bold haplotypes contain previously unidentified nonsynonymous variants. (3) Predictions made using PolyPhen software; predicted effects of each NS haplotype are based upon the single amino acid alterations. f, frequency; n, number of chromosomes.

The recent evolutionary history of CYP1A2
Testing for selection in CYP1A2
Tajima’s D was not significantly different from zero in any population, and the Fisher’s exact test P values for each of the McDonald–Kreitman tests were above 0.05 (Table 6). Consequently, the hypothesis of neutrality [52] was not rejected in any case. Fu and Li’s D and F statistics (Table 6) were not significant at the 5% level in any population except Amhara and Oromo. The negative D and F statistics for Amhara and Oromo indicated an excess of recent mutations in the genealogy, which is consistent with purifying or positive selection acting on CYP1A2 [43]. Although a Bonferroni correction for multiple tests (10 in this case) would require a P value of less than 0.005 for significance, negative D and F statistics were observed for all Ethiopian populations. Further analysis was therefore performed to try to determine the type of selection.
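Tajima’s D contrasts two estimates of the population mutation parameter: the mean number of pairwise differences (π), which is driven by intermediate-frequency variants, and the number of segregating sites (S) scaled by a₁, which is inflated by rare variants. An excess of rare variants therefore pushes D below zero. The sketch below implements Tajima’s (1989) formula for aligned sequences without gaps; it does not compute p-values, which DnaSP obtains separately, and it is not DnaSP’s code.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D from aligned, gap-free sequences (at least four)."""
    n = len(sequences)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        return None  # undefined without polymorphism

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(sequences, 2)) / n_pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Bonferroni-corrected per-test threshold for the 10 tests mentioned in the text.
ALPHA_PER_TEST = 0.05 / 10

print(tajimas_d(["A", "A", "A", "T"]))  # one singleton: D < 0
print(tajimas_d(["A", "A", "T", "T"]))  # intermediate frequency: D > 0
```

The two toy calls show the direction of the statistic: a sample dominated by singletons, like many of the rare CYP1A2 haplotypes, gives negative D.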
As CYP1A2 is highly conserved between species (e.g. humans, mice and rats) [8], and noncoding variation is tolerated more than coding variation in humans (Figs 4 and 5), the prior hypothesis was that purifying selection, not positive selection, has been operating on CYP1A2.

Testing for evidence of purifying selection
Following the approach of Hughes et al. [45,46], and with the exception of 5′ noncoding SNPs, lower mean intrapopulation gene diversities and lower mean interpopulation genetic distances were observed for nonsynonymous SNPs (the nonsense SNP and those predicted to cause radical or conservative changes to protein structure) than for SNPs in the same gene that have no effect on protein structure (Fig. 7). Where data were sufficiently informative to permit significance tests, mean gene diversity was significantly lower for radical nonsynonymous SNPs than for intronic and 3′ UTR SNPs (Fig. 7a). Mean interpopulation genetic distance was also significantly lower for radical nonsynonymous SNPs than for conservative nonsynonymous SNPs and SNPs with no effect on protein structure; however, the mean for 5′ noncoding SNPs was significantly lower than that for radical nonsynonymous SNPs (Fig. 7b), which may be explained by small sample size, as there were only two SNPs in the 5′ noncoding category. These results are consistent with purifying selection having acted at nonsynonymous SNP sites predicted to cause radical changes to protein structure [45,46]. Evidence of purifying selection acting on nonsynonymous SNPs causing conservative amino acid changes was also found (Supplementary data 4, Supplemental digital content 1, http://links.lww.com/FPC/A209).
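The logic of this comparison is that purifying selection keeps damaging alleles rare, so SNPs in a damaging category should show lower mean gene diversity than SNPs in the same gene with no effect on the protein. The sketch below computes per-SNP gene diversity and compares two categories with a one-sided permutation test. The permutation test is a generic, swapped-in choice rather than the exact statistics of Hughes et al., and the allele counts are hypothetical.

```python
import random
from statistics import mean


def snp_gene_diversity(derived_count, n_chromosomes):
    """Unbiased gene diversity of a single biallelic SNP."""
    p = derived_count / n_chromosomes
    return n_chromosomes / (n_chromosomes - 1) * (1 - p ** 2 - (1 - p) ** 2)


def permutation_p_lower(group, reference, reps=10_000, seed=1):
    """One-sided permutation p-value that mean(group) is lower than
    mean(reference) by more than expected from random relabelling."""
    rng = random.Random(seed)
    observed = mean(group) - mean(reference)
    pooled = list(group) + list(reference)
    k = len(group)
    hits = 0
    for _ in range(reps):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) <= observed:
            hits += 1
    return (hits + 1) / (reps + 1)


# Hypothetical per-SNP diversities: radical nonsynonymous vs. intronic/3' UTR SNPs.
radical = [snp_gene_diversity(c, 150) for c in (2, 3, 1)]
intronic = [snp_gene_diversity(c, 150) for c in (30, 45, 60, 20)]
print(permutation_p_lower(radical, intronic))
```

With only a handful of SNPs per category the number of distinct relabellings is small, so the attainable p-values are coarse, which echoes the caution above about the two-SNP 5′ noncoding category.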
CYP1A2 chronology: coalescent date estimates for CYP1A2 variants
The CYP1A2 sequences and SNPstr genohaplotypes (which combined the rs11072507 genotypes with the AC microsatellite haplotypes) were informative enough to date nine CYP1A2 variants in the Ethiopian populations, in addition to the G > C SNP (rs11072507) in the SNPstr. Details of the distribution of microsatellite alleles for rs11072507 C and G and for each CYP1A2 variant dated are shown in Supplementary data 5, Supplemental digital content 1, http://links.lww.com/FPC/A209. Coalescent date estimates ranged from 5000 to 383 000 years (Table 7) and were mutually consistent, i.e. compatible with a parsimonious ordering of mutations in the course of CYP1A2 evolution in humans (Supplementary data 6, Supplemental digital content 1, http://links.lww.com/FPC/A209).

Table 5 CYP1A2 diplotypes configured from NS haplotypes observed in Ethiopian populations
CYP1A2 NS diplotype | Afar n (freq.) | Amhara n (freq.) | Anuak n (freq.) | Maale n (freq.) | Oromo n (freq.) | Pooled Ethiopian sample n (freq.)
2/1 | 0 (0.00) | 0 (0.00) | 1 (0.01) | 0 (0.00) | 0 (0.00) | 1 (0.003)
7/3 | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 1 (0.02) | 1 (0.003)
7/4 | 0 (0.00) | 1 (0.01) | 1 (0.01) | 0 (0.00) | 0 (0.00) | 2 (0.006)
1/7 | 0 (0.00) | 0 (0.00) | 3 (0.04) | 0 (0.00) | 0 (0.00) | 3 (0.009)
2/7 | 4 (0.07) | 0 (0.00) | 10 (0.13) | 8 (0.11) | 4 (0.06) | 26 (0.076)
5/7 | 0 (0.00) | 1 (0.01) | 1 (0.01) | 10 (0.14) | 2 (0.03) | 14 (0.041)
6/7 | 2 (0.03) | 0 (0.00) | 0 (0.00) | 1 (0.01) | 0 (0.00) | 3 (0.009)
8/7 | 2 (0.03) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 2 (0.006)
9/7 | 2 (0.03) | 1 (0.01) | 2 (0.03) | 1 (0.01) | 1 (0.02) | 7 (0.020)
10/7 | 0 (0.00) | 1 (0.01) | 1 (0.01) | 0 (0.00) | 1 (0.02) | 3 (0.009)
7/7 | 51 (0.84) | 67 (0.94) | 57 (0.75) | 52 (0.72) | 54 (0.86) | 281 (0.819)
Grand total | 61 (1.00) | 71 (1.00) | 76 (1.00) | 72 (1.00) | 63 (1.00) | 343 (1.000)
Unambiguously inferred diplotypes are underlined. n, number of individuals.
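The chronology above rests on microsatellite variation linked to each dated SNP. The methods are cited to Goldstein et al. [47] and related work, but the exact estimator, mutation rate and generation time are not given in this section. One widely used approach, sketched below in Python purely as an illustration, is the average squared distance (ASD) between repeat lengths and an assumed ancestral allele: under a symmetric single-step stepwise mutation model, E[ASD] = μt, so t = ASD/μ generations. The mutation rate, generation time and allele lengths in the example are hypothetical placeholders, not values from this study, and no confidence interval is computed.

```python
from statistics import mean, mode


def asd_tmrca_generations(repeat_lengths, mu, ancestral=None):
    """TMRCA in generations from the average squared distance (ASD)
    between microsatellite repeat lengths and an ancestral allele.

    Assumes a symmetric single-step stepwise mutation model, under
    which E[ASD] = mu * t, so t = ASD / mu.

    repeat_lengths -- repeat counts on chromosomes carrying the dated SNP
    mu             -- mutation rate per generation (an assumed value)
    ancestral      -- ancestral repeat count; defaults to the modal allele
    """
    if ancestral is None:
        ancestral = mode(repeat_lengths)  # assumption: modal allele is ancestral
    asd = mean((r - ancestral) ** 2 for r in repeat_lengths)
    return asd / mu


def generations_to_years(generations, generation_time):
    """Convert generations to years for an assumed generation time."""
    return generations * generation_time


# Hypothetical placeholders, not the values used in this study.
MU = 1e-3
GENERATION_TIME = 25
alleles = [17, 17, 18, 16, 17, 19, 17, 15]
t = asd_tmrca_generations(alleles, MU)
print(t, generations_to_years(t, GENERATION_TIME))
```

Because the estimate scales as 1/μ, the chosen mutation rate dominates the absolute dates; relative ordering of variants is less sensitive to it.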
Where dates could not be estimated from microsatellite data for nonsynonymous variants, date boundaries were approximated using a mutation network (Supplementary data 7, Supplemental digital content 1, http://links.lww.com/FPC/A209).
Discussion
CYP1A2 variation observed in Ethiopia
Resequencing CYP1A2 (all exons and flanking intronic regions) in five Ethiopian ethnic groups has revealed a substantial amount of previously unreported genetic variation. We found 55 different CYP1A2 haplotypes in the Ethiopian samples alone. This haplotype set is larger than most of those reported to date for individual CYP450 genes, in some instances even when several populations are combined [20]. Studies of genetic diversity in other drug-metabolizing enzymes in Ethiopians should be encouraged, as the extensive genetic variability found for CYP1A2 in this study is likely to apply to other genes. Several of the novel CYP1A2 alleles identified in this study were predicted to change the structure/function of the protein. As they were observed in individuals who were at least 18 years old, these variants, at least in the heterozygous state, are clearly compatible with survival to reproductive age, and functional variation in CYP1A2 is therefore tolerated, at least to some extent. The premature stop codon Y495X was identified in Anuak at 3% [95% confidence interval 0.007–0.066 (exact Clopper–Pearson method)]. Hence, in a population numbering 45 655 (the 1994 census record for Anuak) and assuming Hardy–Weinberg proportions, 2657 people would be expected to carry one copy of the premature stop codon and 41 people two copies. The mutation occurs in the last exon and would consequently be expected (i) to escape nonsense-mediated mRNA decay [53] and (ii) to shorten the protein by only 21 amino acids. Functional studies should be able to determine whether the premature stop codon leads to a nonfunctional enzyme or a protein with reduced function.
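The carrier figures quoted above follow from Hardy–Weinberg proportions: with allele frequency q = 0.03 and p = 1 − q, a fraction 2pq of the population is expected to be heterozygous and q² homozygous. The interval is an exact binomial (Clopper–Pearson) interval. The Python sketch below reproduces the carrier arithmetic and computes a Clopper–Pearson interval by bisection using only the standard library. The function names are illustrative, and because the chromosome counts behind the 3% estimate are not given in this section, the interval example uses hypothetical counts.

```python
import math


def hwe_expected_counts(q, population_size):
    """Expected numbers of heterozygous (2pq) and homozygous (q^2) carriers
    of an allele with frequency q, assuming Hardy-Weinberg proportions."""
    p = 1.0 - q
    return round(2 * p * q * population_size), round(q * q * population_size)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05, tol=1e-12):
    """Exact (Clopper-Pearson) two-sided confidence interval for x/n,
    found by bisection on the binomial tail probabilities."""

    def root(f):
        # f increases with p on [0, 1]; locate f(p) = 0 by bisection
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha / 2
    lower = 0.0 if x == 0 else root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha / 2
    upper = 1.0 if x == n else root(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


print(hwe_expected_counts(0.03, 45655))  # (2657, 41), as quoted in the text
print(clopper_pearson(4, 152))  # hypothetical: 4 derived alleles on 152 chromosomes
```

Bisection is used instead of a beta-distribution quantile so that the sketch needs no third-party packages; the two give the same interval up to the chosen tolerance.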
We are not aware of any previous reports of variation in the coding region of CYP1A2 likely to result in a shortened protein. If the truncated enzyme is nonfunctional and homozygotes do exist, such individuals would be living human CYP1A2 knockouts, whose existence would open interesting possibilities for research into P450-mediated pharmacokinetic activity. CYP1A2 knockout mice are viable and fertile [54]; however, in addition to showing decreased drug metabolism [54], they exhibit altered expression of genes involved in cell-cycle regulation, insulin action, lipogenesis, and fatty acid and cholesterol biosynthesis [55]. Human CYP1A2 knockouts may therefore be invaluable in assessing the precise role of human CYP1A2 in physiological processes. All other variants predicted to alter the structure/function of the protein were very rare, never observed more than once in any one ethnic group. Unrecognized variation cannot be studied in vivo, and a lack of such knowledge may lead to inappropriate therapeutic intervention and increase the risk of adverse drug reactions. However, diplotype configurations indicate that most people in all populations in this study may, depending on variation in the promoter, be expected to have normal CYP1A2 function. Nevertheless, given the frequencies of the nonancestral NS haplotypes, there will be individuals, in different proportions in different ethnic groups, expected to carry two copies of nonancestral NS haplotypes, although in no population in this study is this proportion predicted to exceed 1%.
Analyzing Ethiopian CYP1A2 variation in the context of other populations
Corresponding CYP1A2 sequence data from five additional populations (African–Americans, Yoruba, Europeans, Hispanics and East Asians), generated by the NIEHS SNPs programme [23], were included in the analysis with the Ethiopians. Despite lacking power because of small sample sizes (Supplementary data 8, Supplemental digital content 1, http://links.lww.com/FPC/A209), the NIEHS data proved useful in placing the Ethiopian data, albeit tentatively, into a worldwide context.

Fig. 3 [Bar charts: frequency (%) of each CYP1A2 polymorphism (above; x-axis, CYP1A2 polymorphism) and of each CYP1A2 haplotype (below; x-axis, CYP1A2 haplotype) in the NIEHS sample population]
Ethiopian populations evidence all the common variation observed in the National Institute of Environmental Health Sciences (NIEHS) African–American, Yoruba, European, Hispanic and East Asian sample populations. CYP1A2 polymorphisms observed (in the regions sequenced in the Ethiopians in this study) in the NIEHS samples are shown above; haplotypes are shown below. Variation observed in the Ethiopians is shown in grey; variation not observed in the Ethiopians is shown in black. CYP1A2* alleles and variants are numbered according to the CYP450 Allele Nomenclature Committee system.

Fig. 4 [Bar charts: gene diversity (h) (above) and nucleotide diversity (π) (below) for Afar, Amhara, Anuak, Maale, Oromo, African–American, Yoruba, European, Hispanic and East Asian samples, and for the combined Ethiopian and combined NIEHS samples]
Gene (above) and nucleotide (below) diversity based on variation across the entire CYP1A2 gene and the coding sequence (only nonsynonymous variation). Variation across the entire gene is shown in grey; nonsynonymous variation is shown in white. Error bars represent standard deviation. Consistent with purifying selection acting upon CYP1A2, coding variation is tolerated the least.

Consistent with other studies [56] and with Africa being the birthplace of anatomically modern humans, African populations were the most diverse in this study. Maale (1994 census population 46 458 [57]), Oromo (1994 census population 17 080 318 [57]), Anuak (1994 census population 45 665 [57]) and the combined Ethiopian sample populations were often more diverse than the combined NIEHS data sets. Values were also comparatively high in both the NIEHS African–American and Yoruba data sets. Furthermore, consistent with some anatomically modern humans having migrated out of Africa via Ethiopia, and with a more recent migration of Semitic-speaking peoples from Arabia into Ethiopia, all of the common CYP1A2 variation found outside Ethiopia is present within Ethiopian groups.
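The diplotype frequencies in Table 5 can be summarized per population as the fraction of individuals homozygous for NS haplotype 7 (the 7/7 diplotype) and the fraction carrying at least one copy of it; every diplotype in the table except 2/1 contains haplotype 7. The Python sketch below does this using the counts transcribed from Table 5. The names are illustrative, haplotype numbering follows Table 4, and the sketch makes no independent judgement about which haplotype is ancestral.

```python
# Diplotype counts (numbers of individuals) transcribed from Table 5.
DIPLOTYPES = ["2/1", "7/3", "7/4", "1/7", "2/7", "5/7",
              "6/7", "8/7", "9/7", "10/7", "7/7"]
COUNTS = {
    "Afar":   [0, 0, 0, 0, 4, 0, 2, 2, 2, 0, 51],
    "Amhara": [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 67],
    "Anuak":  [1, 0, 1, 3, 10, 1, 0, 0, 2, 1, 57],
    "Maale":  [0, 0, 0, 0, 8, 10, 1, 0, 1, 0, 52],
    "Oromo":  [0, 1, 0, 0, 4, 2, 0, 0, 1, 1, 54],
}


def summarise(counts, haplotype="7"):
    """Return (individuals, fraction homozygous for `haplotype`,
    fraction carrying at least one copy of `haplotype`)."""
    total = sum(counts)
    homozygous = carriers = 0
    for diplotype, n in zip(DIPLOTYPES, counts):
        copies = diplotype.split("/")  # exact match, so "1" never matches "10"
        if haplotype in copies:
            carriers += n
        if copies == [haplotype, haplotype]:
            homozygous += n
    return total, homozygous / total, carriers / total


for population, counts in COUNTS.items():
    total, hom, carrier = summarise(counts)
    print(f"{population:7s} n={total:3d}  7/7={hom:.2f}  >=1 copy of 7={carrier:.2f}")

# Pooled Ethiopian sample: sum each diplotype column across populations.
pooled = [sum(column) for column in zip(*COUNTS.values())]
print("Pooled", summarise(pooled))
```

With these counts the 7/7 fraction ranges from 0.72 in Maale to 0.94 in Amhara, and only the single Anuak 2/1 individual lacks haplotype 7, which illustrates both the predominance of one NS haplotype and the differences between neighbouring groups.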
Consequently, Ethiopian populations could perhaps serve not only as a suitable reference for developing CYP1A2 diagnostic markers and tests for pharmacogenetic prediction in populations worldwide, but also to help ensure that such tests are not suitable only for developed countries. These findings also highlight the need to conduct population genetic research in Ethiopians if conclusions reached about populations outside Ethiopia are to be interpreted in context. When haplotypes were constructed using only nonsynonymous polymorphisms, so as to restrict the haplotype set to those most likely to affect the protein (although variation in splice sites could also affect protein structure, and variation in the promoter could affect gene expression), Europeans, Hispanics and East Asians were considerably less variable than Ethiopians, African–Americans and Yoruba. As a consequence, public health policy makers may not need to be concerned about variable drug response caused by variation in the protein in substantial proportions of individuals belonging to non-African populations. However, as most drug testing is currently undertaken in non-African populations, more testing in populations other than Europeans and Asians is warranted. With increasing numbers of people of recent African descent living in Europe and the Americas, their pharmacogenetic profiles should be represented in clinical trials. In addition, closer attention should be paid to them in postmarketing surveillance, together with greater awareness of genetic variability among them. Of further practical relevance in healthcare, statistically significant variation exists among Ethiopian indigenous groups living in close geographical proximity. Moreover, the Ethiopian populations were the only groups to be differentiated when NS haplotypes were considered. In light of this, the general Ethiopian population should perhaps not be treated, at the CYP1A2 protein level, as one homogeneous group, a finding that has implications for future therapeutic intervention in Ethiopia.

Fig. 5 [Pairwise matrices for (a) entire-gene and (b) NS haplotypes across Afar, Amhara, Anuak, Maale, Oromo, African–American, Yoruba, European, Hispanic and East Asian samples]
Exact test of population differentiation P values (lower triangle) and significant/not significant (+/–) differences at the 5% threshold (upper triangle) for CYP1A2 entire gene (a) and NS (b) haplotypes. Consistent with purifying selection acting upon CYP1A2, coding variation is tolerated the least.

The recent evolutionary history of CYP1A2
The coalescent date estimates of the CYP1A2 variants in this study were old, and all except 1589 G > T (which was not found in non-Ethiopians) predated the expansion of modern humans out of Africa less than 100 000 years ago [58].
In fact, five variants (−739 G > T and −163 A > C, both in intron 1; 1513 C > A in exon 3, causing S298R; 3613 T > C in intron 6; and 6324 G > del in the 3′ UTR) were estimated to have arisen before the emergence of modern humans in Africa less than 200 000 years ago [1,59,60]. Fu and Li's tests pointed towards selection (purifying or positive) on CYP1A2 in Amhara and Oromo, but Tajima's D and the McDonald–Kreitman test did not detect selection in any of the populations analyzed in this study. These tests are, however, known to lack power.

Fig. 6
Population | Afar | Amhara | Anuak | Maale | Oromo | African–American | Yoruba | European | Hispanic | East Asian
Afar | −0.01 | 0.04 | 0.11 | 0.05 | 0.79 | 0.75 | 0.32 | 0.24 | 0.27 | 0.26
Amhara | 0.02 | −0.01 | 0.01 | <0.01 | 0.12 | 0.28 | 0.08 | 0.99 | 0.99 | 0.99
Anuak | 0.01 | 0.05 | −0.01 | 0.17 | 0.15 | 0.77 | 0.77 | 0.03 | 0.13 | 0.06
Maale | 0.01 | 0.05 | 0.00 | −0.01 | 0.03 | 0.45 | 0.58 | 0.05 | 0.10 | 0.05
Oromo | 0.00 | 0.01 | 0.01 | 0.01 | −0.01 | 0.79 | 0.21 | 0.25 | 0.41 | 0.44
African–American | −0.02 | 0.03 | −0.01 | −0.01 | −0.02 | −0.05 | 0.77 | 0.17 | 0.50 | 0.51
Yoruba | 0.01 | 0.15 | −0.03 | −0.02 | 0.02 | −0.03 | −0.09 | 0.07 | 0.14 | 0.10
European | 0.01 | −0.01 | 0.04 | 0.04 | 0.01 | 0.04 | 0.15 | −0.03 | 0.99 | 0.99
Hispanic | 0.01 | −0.02 | 0.03 | 0.03 | 0.00 | 0.02 | 0.11 | 0.00 | −0.05 | 0.99
East Asian | 0.00 | −0.01 | 0.03 | 0.03 | 0.00 | 0.01 | 0.08 | −0.01 | −0.01 | −0.03
[Principal Coordinates plot: PCO 1 (94.17%) versus PCO 2 (3.73%); labelled points: East Asian, African–American, Yoruba, Amhara, Afar & Oromo, European, Hispanic, Anuak & Maale]
Genetic distances (Fst) between Ethiopian and NIEHS sample populations based on CYP1A2 NS haplotypes. Population pairwise genetic distances (grey) and P values (upper triangle) are shown above. P values below the 5% significance threshold are shown in bold. A Principal Coordinates analysis plot of these Fst values is shown below.
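Fig. 6 summarizes differentiation between populations as pairwise Fst values computed from NS haplotype frequencies. To show the idea behind such a distance, the Python sketch below computes Nei's G_ST, (H_T − H_S)/H_T, where H_S is the mean within-population expected heterozygosity and H_T the heterozygosity of the averaged frequencies. This is a deliberately simple, uncorrected substitute, not the estimator used for Fig. 6: it cannot go below zero, whereas sample-size-corrected estimators can return small negative values such as those in the figure. The haplotype counts and function names are hypothetical.

```python
def frequencies(counts):
    """Convert {haplotype: count} to {haplotype: relative frequency}."""
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i**2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(counts_a, counts_b):
    """Nei's G_ST between two populations from haplotype counts:
    (H_T - H_S) / H_T, with H_S the mean within-population heterozygosity
    and H_T the heterozygosity of the averaged haplotype frequencies."""
    fa, fb = frequencies(counts_a), frequencies(counts_b)
    haplotypes = set(fa) | set(fb)
    pooled = {h: (fa.get(h, 0.0) + fb.get(h, 0.0)) / 2 for h in haplotypes}
    h_s = (heterozygosity(fa) + heterozygosity(fb)) / 2
    h_t = heterozygosity(pooled)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical NS haplotype counts (chromosomes), not data from this study.
pop1 = {"7": 110, "2": 6, "9": 2}
pop2 = {"7": 120, "5": 10, "2": 2}
print(round(nei_gst(pop1, pop2), 4))
```

Values near zero indicate similar haplotype frequencies; a value of 1 would mean the two populations share no haplotypes.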
Furthermore, as recombination was inferred in the Ethiopian datasets and many Ethiopian groups have hierarchical structures [61], it is possible that selective pressures operating on CYP1A2 would not be detected by commonly used neutrality tests. The reduction of both mean intrapopulation gene diversity and mean interpopulation genetic distance for radical nonsynonymous mutations, compared with silent mutations (which have no effect on protein structure), in CYP1A2 was consistent with the hypothesis [45,46] that purifying selection has acted at these nonsynonymous SNP sites. Evidence of purifying selection was also found for conservative nonsynonymous SNPs. Further support for this comes from the approximate coalescence date boundaries of these nonsynonymous mutations, which are consistent with the following hypothesis [45]: mutations thought to be under purifying selection may include variants that drifted to high frequencies in smaller ancestral populations before the substantial population growth experienced by anatomically modern humans approximately 100 000 years ago. As effective population size increased, purifying selection became more effective, and the frequencies and numbers of nonsynonymous alleles decreased gradually over time [45].

Table 6 Results of neutrality tests performed on CYP1A2 in the Ethiopian and NIEHS populations
Population | n | Tajima's D (P value) | Fu and Li's D with an outgroup (P value) | Fu and Li's F with an outgroup (P value) | McDonald–Kreitman test: Fisher's exact test P value (two tailed)
Afar | 118 | −0.88 (P > 0.10) | −0.84 (P > 0.10) | −1.02 (P > 0.10) | 0.34
Amhara | 124 | −1.16 (P > 0.10) | −4.24 (0.01 < P < 0.02) | −3.68 (0.01 < P < 0.02) | 0.32
Anuak | 140 | −1.24 (P > 0.10) | −1.07 (P > 0.10) | −1.35 (P > 0.10) | 0.32
Maale | 132 | −0.86 (P > 0.10) | −0.88 (P > 0.10) | −1.05 (P > 0.10) | 0.34
Oromo | 102 | −0.85 (P > 0.10) | −2.55 (0.02 < P < 0.05) | −2.30 (0.02 < P < 0.05) | 0.34
African–American | 14 | −0.77 (P > 0.10) | −0.81 (P > 0.10) | −0.95 (P > 0.10) | 1.00
Yoruba | 10 | −0.63 (P > 0.10) | −0.95 (P > 0.10) | −1.03 (P > 0.10) | 0.23
European | 24 | 0.78 (P > 0.10) | 0.07 (P > 0.10) | 0.33 (P > 0.10) | 1.00
Hispanic | 14 | −0.53 (P > 0.10) | 0.24 (P > 0.10) | 0.38 (P > 0.10) | 1.00
East Asian | 26 | 0.95 (P > 0.10) | 0.97 (P > 0.10) | 1.13 (P > 0.10) | 1.00
Each test was performed on each of the individual Ethiopian and NIEHS populations to control for the effects of different demographic histories. n, number of chromosomes.

Fig. 7 [Bar charts: (a) mean gene diversity (h) by SNP category (number of SNP sites): radical (2), nonsense (1), conservative (12), synonymous (1), intron (29), 3′ UTR (14), 5′ noncoding (2); (b) mean genetic distance by SNP category (number of interpopulation genetic distances): radical (90), nonsense (45), conservative (540), synonymous (45), intron (1305), 3′ UTR (630), 5′ noncoding (90)]
Evidence of purifying selection acting on CYP1A2. (a) Mean intrapopulation gene diversity at nonsynonymous (radical, nonsense and conservative), synonymous and noncoding single nucleotide polymorphism (SNP) sites in the combined Ethiopian and NIEHS populations. (b) Mean interpopulation genetic distance for all interpopulation comparisons for the same SNP sites. Error bars indicate variance from the mean. One-tailed P values from t-tests of the hypothesis that mean gene diversity/genetic distance for each SNP category equals that for radical nonsynonymous SNP loci are represented as follows: NA, t-test not applicable because of small sample number; NS, not significant, P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001. Radical SNPs = 217G > A (G73R) and 5094T > C (F432S); nonsense SNP = 5284C > A (Y495X); conservative SNPs = 53C > G (S18C), 310G > A (D104N), 331C > T (L111F), 613T > G (F205V), 1460C > T (R281W), 1513C > A (S298R), 3463C > T (T395M), 3468A > C (N397H), 5105G > A (D436N), 5112C > T (T438I), 5253C > G (P485R), 5328G > A (R510Q).

Table 7 Inference of the TMRCA (unbiased estimate plus confidence interval) for CYP1A2 variants and rs11072507
Variants in yellow were assumed to have recombined with rs11072507 (Table S2) and were consequently dated using microsatellites on the background of each of rs11072507 C and G separately and together [C always produced the younger dates (in blue), which were assumed to be the coalescent dates of the recombination events]. All other CYP1A2 variants (in purple) occurred only on the background of rs11072507 G (Table S2). Date estimates in green were assumed to be the coalescent dates of the SNPs. CYP1A2 variants are arranged in order of increasing time to most recent common ancestor (TMRCA). Neither 2159 G > A nor 5347 C > T could be dated on the background of the rs11072507 G allele alone because of small sample numbers. Coalescent date estimates would not, however, have differed significantly between the rs11072507 G and C backgrounds in either case, because no more than two G-linked chromosomes were available for each variant. In both cases, coalescent dates from rs11072507 G and C combined were assumed to be the coalescent dates of the SNPs. G, generations; n, chromosome number; n/a, not applicable; Y, years.
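Table 6 reports the McDonald–Kreitman results as two-tailed Fisher's exact test P values. The test arranges fixed differences between species (D) and within-species polymorphisms (P), each split into nonsynonymous (n) and synonymous (s) sites, in a 2×2 table. The Python sketch below implements a two-tailed Fisher's exact test with the standard library, using the common convention of summing the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The CYP1A2 counts behind Table 6 are not given in this section, so the example uses hypothetical counts, and the function names are illustrative.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so ties in probability are not lost to rounding
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mcdonald_kreitman(dn, ds, pn, ps):
    """MK test: fixed (D) versus polymorphic (P) counts of
    nonsynonymous (n) and synonymous (s) sites."""
    return fisher_exact_two_tailed(dn, ds, pn, ps)


# Hypothetical counts, not data from this study.
print(mcdonald_kreitman(dn=3, ds=5, pn=9, ps=4))
```

A large P value, as in every population in Table 6, means the ratio of nonsynonymous to synonymous changes does not differ detectably between fixed and polymorphic sites.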
As the minor allele frequencies (1% to 11%) at CYP1A2 loci showing evidence of purifying selection are substantially higher than those at genes causing severe Mendelian diseases, the data suggest that the selective forces acting against these nonsynonymous SNPs are weak in comparison with those at SNP sites causing severe disease [45,46]. Furthermore, as mutations associated with complex diseases are expected to be individually only slightly deleterious, unlike the highly deleterious variants associated with Mendelian diseases, it has been claimed that evidence of weak purifying selection may be used to identify candidate alleles for complex disease association studies [45,46]. It may therefore be appropriate to include the nonsynonymous SNPs identified in this study in future studies of complex diseases that have been linked to CYP1A2, for example several cancers [19] and cardiovascular disease [62]. Wooding et al. [56] investigated DNA sequence variation in a 3.7 kb noncoding sequence 5′ of the CYP1A2 gene in more than 100 individuals of recent African, Asian and European ancestry, and presented evidence for positive selection based on an excess of high-frequency derived SNPs in comparisons with outgroup species. We provide evidence of purifying selection within CYP1A2. There are many possible interpretations of these different conclusions, among which are: (i) positive selection may not be acting directly on the 5′ region of CYP1A2, but on genetic loci other than CYP1A2 that are in LD with the 5′ CYP1A2 locus [56]; (ii) CYP1A2 was under positive selection in the course of evolution leading to anatomically modern humans, but has subsequently been subject only to purifying selection; and (iii) although analyses in this and the earlier study may be consistent with selection, random drift alone could explain the observed patterns of variation.
Conclusion
This study has shown Ethiopian populations to be highly diverse compared with populations studied from the rest of the world, and to contain a substantial amount of previously uncharacterized CYP1A2 variation. There is also evidence that much of the variation found on a global scale has been retained. Not only does this serve as further support for the proposition that some anatomically modern humans migrated out of Africa via Ethiopia, but it also emphasizes the value of conducting population genetic research within Ethiopia if appropriate conclusions are to be formulated concerning populations outside Ethiopia. Unrecognized variation can lead to unsuitable healthcare intervention and can increase the risk of an adverse drug reaction. Investigations such as this are therefore not only of benefit to the indigenous populations of Ethiopia, but are also of increasing importance in directing public healthcare policies in the developed world, where the number of individuals of recent Ethiopian descent is growing.
Acknowledgements
The authors thank all DNA sample donors, and Professor Sue Povey and Professor Dallas Swallow for their helpful discussion. Neil Bradman is chairman of The Centre for Genetic Anthropology (TCGA) and an honorary lecturer in the Research Department of Genetics, Evolution and Environment at University College London. He is also joint chairman of the London and City Group of Companies and has extensive business and financial interests, including involvement in biotechnology ventures and educational material used by researchers in biomedicine and the life sciences. Nevertheless, he does not have any specific commercial interest in the subject matter of this study. The study was funded in part by a charitable trust of which Neil Bradman is a trustee. The charitable trust has no intellectual property or other rights whatsoever with respect to the research that forms the subject matter of the paper. All other authors have no conflict of interest to declare. This study was supported by the Biotechnology and Biological Sciences Research Council.
References
1 Campbell MC, Tishkoff SA. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu Rev Genomics Hum Genet 2008; 9:403–433.
2 Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 2008; 319:1100–1104.
3 Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Curr Biol 2005; 15:R159–R160.
4 Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 2005; 102:15942–15947.
5 Forster P, Matsumura S. Evolution. Did early humans go north or south? Science 2005; 308:965–966.
6 Reed FA, Tishkoff SA. African human diversity, origins and migrations. Curr Opin Genet Dev 2006; 16:597–605.
7 Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The genetic structure and history of Africans and African Americans. Science 2009; 324:1035–1044.
8 Aklillu E, Carrillo JA, Makonnen E, Hellman K, Pitarque M, Bertilsson L, et al. Genetic polymorphism of CYP1A2 in Ethiopians affecting induction and expression: characterization of novel haplotypes with single-nucleotide polymorphisms in intron 1. Mol Pharmacol 2003; 64:659–669.
9 Jiang Z, Dragin N, Jorge-Nebert LF, Martin MV, Guengerich FP, Aklillu E, et al. Search for an association between the human CYP1A2 genotype and CYP1A2 metabolic phenotype. Pharmacogenet Genomics 2006; 16:359–367.
10 Li XQ, Bjorkman A, Andersson TB, Gustafsson LL, Masimirembwa CM. Identification of human cytochrome P(450)s that metabolise anti-parasitic drugs and predictions of in vivo drug hepatic clearance from in vitro data. Eur J Clin Pharmacol 2003; 59:429–442.
11 Federal Ministry of Health. Malaria: Diagnosis and treatment guidelines for health workers in Ethiopia. Addis Ababa: Ethiopia; Federal Democratic Republic of Ethiopia, Ministry of Health; 2004.
12 Anthony F, Combes C, Astorga C, Bertrand B, Graziosi G, Lashermes P. The origin of cultivated Coffea arabica L. varieties revealed by AFLP and SSR markers. Theor Appl Genet 2002; 104:894–900.
13 NCBI36: http://www.ncbi.nlm.nih.gov/. [Accessed 2009].
14 Shimada T, Yamazaki H, Mimura M, Inui Y, Guengerich FP. Interindividual variations in human liver cytochrome P-450 enzymes involved in the oxidation of drugs, carcinogens and toxic chemicals: studies with liver microsomes of 30 Japanese and 30 Caucasians. J Pharmacol Exp Ther 1994; 270:414–423.
15 Jiang Z, Dalton TP, Jin L, Wang B, Tsuneoka Y, Shertzer HG, et al. Toward the evaluation of function in genetic variability: characterizing human SNP frequencies and establishing BAC-transgenic mice carrying the human CYP1A1_CYP1A2 locus. Hum Mutat 2005; 25:196–206.
16 Ikeya K, Jaiswal AK, Owens RA, Jones JE, Nebert DW, Kimura S. Human CYP1A2: sequence, gene structure, comparison with the mouse and rat orthologous gene, and differences in liver 1A2 mRNA expression. Mol Endocrinol 1989; 3:1399–1408.
17 Sansen S, Yano JK, Reynald RL, Schoch GA, Griffin KJ, Stout CD, et al. Adaptations for the oxidation of polycyclic aromatic hydrocarbons exhibited by the structure of human P450 1A2. J Biol Chem 2007; 282:14348–14355.
18 Flockhart DA. Drug interactions: Cytochrome P450 drug interaction table. Indiana University School of Medicine 2007: http://medicine.iupui.edu/clinpharm/ddis/table.asp. [Accessed 2010].
19 Gunes A, Dahl ML. Variation in CYP1A2 activity and its clinical implications: influence of environmental factors and genetic polymorphisms. Pharmacogenomics 2008; 9:625–637.
20 Home page of the human Cytochrome P450 (CYP) allele nomenclature committee: http://www.cypalleles.ki.se/. [Accessed 2010].
21 Klein TE, Chang JT, Cho MK, Easton KL, Fergerson R, Hewett M, et al. Integrating genotype and phenotype information: an overview of the PharmGKB project. Pharmacogenetics research network and knowledge base. Pharmacogenomics J 2001; 1:167–170.
22 NCBI dbSNP 129: http://www.ncbi.nlm.nih.gov/. [Accessed 2009].
23 NIEHS SNPs. NIEHS Environmental Genome Project, University of Washington, Seattle, WA: http://egp.gs.washington.edu. [Accessed January 2009].
24 Murayama N, Soyama A, Saito Y, Nakajima Y, Komamura K, Ueno K, et al. Six novel nonsynonymous CYP1A2 gene polymorphisms: catalytic activities of the naturally occurring variant enzymes. J Pharmacol Exp Ther 2004; 308:300–306.
25 Soyama A, Saito Y, Hanioka N, Maekawa K, Komamura K, Kamakura S, et al. Single nucleotide polymorphisms and haplotypes of CYP1A2 in a Japanese population. Drug Metab Pharmacokinet 2005; 20:24–33.
26 Lewontin RC. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 1964; 49:49–67.
27 Abecasis GR, Cookson WO. GOLD–graphical overview of linkage disequilibrium. Bioinformatics 2000; 16:182–183.
28 Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform Online 2005; 1:47–50.
29 Excoffier L, Laval G, Balding D. Gametic phase estimation over large genomic regions using an adaptive window approach. Hum Genomics 2003; 1:7–19.
30 Nei M. Molecular Evolutionary Genetics. New York: Columbia University Press; 1987.
31 Goudet J, Raymond M, de Meeus T, Rousset F. Testing differentiation in diploid populations. Genetics 1996; 144:1933–1940.
32 Rousset F, Raymond M. Testing heterozygote excess and deficiency. Genetics 1995; 140:1413–1419.
33 Reynolds J, Weir BS, Cockerham CC. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 1983; 105:767–779.
34 Excoffier L, Smouse PE, Quattro JM. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics 1992; 131:479–491.
35 Gower JC. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 1966; 53:325–328.
36 The R project for statistical computing: http://www.r-project.org/. [Accessed 2009].
37 Veeramah KR, Thomas MG, Weale ME, Zeitlyn D, Tarekegn A, Bekele E, et al. The potentially deleterious functional variant flavin-containing monooxygenase 2*1 is at high frequency throughout sub-Saharan Africa. Pharmacogenet Genomics 2008; 18:877–886.
38 PolyPhen: Prediction of functional effect of human nsSNPs: http://genetics.bwh.harvard.edu/pph/. [Accessed 2009].
39 Bandelt HJ, Forster P, Rohl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol 1999; 16:37–48.
40 Fluxus-engineering.com: http://www.fluxus-engineering.com/. [Accessed 2009].
41 Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989; 123:585–595.
42 McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 1991; 351:652–654.
43 Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics 1993; 133:693–709.
44 Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009; 25:1451–1452.
45 Hughes AL, Packer B, Welch R, Bergen AW, Chanock SJ, Yeager M. Widespread purifying selection at polymorphic sites in human protein-coding loci. Proc Natl Acad Sci U S A 2003; 100:15754–15757.
46 Hughes AL, Packer B, Welch R, Bergen AW, Chanock SJ, Yeager M. Effects of natural selection on interpopulation divergence at polymorphic sites in human protein-coding loci. Genetics 2005; 170:1181–1187.
47 Goldstein DB, Ruiz LA, Cavalli-Sforza LL, Feldman MW. Genetic absolute dating based on microsatellites and the origin of modern humans. Proc Natl Acad Sci U S A 1995; 92:6723–6727.
48 Slatkin M. A measure of population subdivision based on microsatellite allele frequencies. Genetics 1995; 139:457–462.
49 Behar DM, Thomas MG, Skorecki K, Hammer MF, Bulygina E, Rosengarten D, et al. Multiple origins of Ashkenazi Levites: Y chromosome evidence for both Near Eastern and European ancestries. Am J Hum Genet 2003; 73:768–779.
50 Farrall M, Weeks DE. Mutational mechanisms for generating microsatellite allele-frequency distributions: an analysis of 4558 markers. Am J Hum Genet 1998; 62:1260–1262.
51 Tremblay M, Vezina H. New estimates of intergenerational time intervals for the calculation of age and origins of mutations. Am J Hum Genet 2000; 66:651–658.
52 Kimura M. Rare variant alleles in the light of the neutral theory. Mol Biol Evol 1983; 1:84–93.
53 Maquat LE. Nonsense-mediated mRNA decay in mammals. J Cell Sci 2005; 118:1773–1776.
54 Liang HC, Li H, McKinnon RA, Duffy JJ, Potter SS, Puga A, et al. Cyp1a2(–/–) null mutant mice develop normally but show deficient drug metabolism. Proc Natl Acad Sci U S A 1996; 93:1671–1676.
55 Smith AG, Davies R, Dalton TP, Miller ML, Judah D, Riley J, et al. Intrinsic hepatic phenotype associated with the Cyp1a2 gene as shown by cDNA expression microarray analysis of the knockout mouse. EHP Toxicogenomics 2003; 111:45–51.
56 Wooding SP, Watkins WS, Bamshad MJ, Dunn DM, Weiss RB, Jorde LB. DNA sequence variation in a 3.7-kb noncoding sequence 5′ of the CYP1A2 gene: implications for human population history and natural selection. Am J Hum Genet 2002; 71:528–542.
57 Federal Democratic Republic of Ethiopia Office of Population and Housing Census Commission Central Statistical Authority. The 1994 population and housing census for Ethiopia. Results at country level. Volume 2 analytical report. Addis Ababa: Central Statistical Authority; 1999.
58 Tishkoff SA, Verrelli BC. Patterns of human genetic diversity: implications for human evolutionary history and disease. Annu Rev Genomics Hum Genet 2003; 4:293–340.
59 McDougall I, Brown FH, Fleagle JG. Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature 2005; 433:733–736.
60 White TD, Asfaw B, DeGusta D, Gilbert H, Richards GD, Suwa G, et al. Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature 2003; 423:742–747.
61 Freeman D, Pankhurst A. Peripheral people. The excluded minorities of Ethiopia. United Kingdom: C. Hurst and Co. Ltd; 2003.
62 Cornelis MC, El Sohemy A, Campos H. Genetic polymorphism of CYP1A2 increases the risk of myocardial infarction. J Med Genet 2004; 41:758–762.