Original article
CYP1A2 is more variable than previously thought:
a genomic biography of the gene behind the human
drug-metabolizing enzyme
Sarah L. Browningᵃ, Ayele Tarekegnᵃ,ᶜ, Endashaw Bekeleᶜ, Neil Bradmanᵃ
and Mark G. Thomasᵇ,ᵈ
Background and objectives CYP1A2 metabolizes
various drugs, endogenous compounds and
procarcinogens. As human genetic diversity has been
reported to decrease with distance from Ethiopia, we
resequenced CYP1A2 in five Ethiopian ethnic groups
representing a rough northeast to southwest transect
across Ethiopia to establish: (i) what variation exists in
comparison with what is already known globally and
(ii) what CYP1A2 pharmacogenetic profiles may be
present as several CYP1A2-metabolized drugs are
administered to Ethiopians.
Results and conclusions We found 49 different variable
sites (30 of which are novel), nine nonsynonymous
changes (seven of which are novel), one synonymous
change and 55 different haplotypes, only three of which were
previously reported. When haplotypes were constructed
using only nonsynonymous polymorphisms to restrict
haplotypes to those most likely to affect enzyme structure/
function, 10 haplotypes were identified (seven contain
previously unidentified nonsynonymous variants and four
are predicted to alter the enzyme structure/function).
Most individuals have at least one copy of the ancestral
haplotype. Comparing these data with those from
publicly available databases, Ethiopian groups display
twice the variation seen in all other populations combined
(gene diversity using nonsynonymous variants:
Ethiopia = 0.17 ± 0.02, other populations = 0.08 ± 0.03).
Across the entire gene, Ethiopia also evidences all
common variation found on a global scale. We provide
evidence of weak purifying selection acting on CYP1A2 and
show that the time to most recent common ancestor,
calculated using variation in a nearby microsatellite, places
several variants into a period predating the expansion of
modern humans out of Africa less than 100 000 years
ago. Pharmacogenetics and Genomics 20:647–664 © 2010 Wolters Kluwer Health | Lippincott Williams & Wilkins.
Pharmacogenetics and Genomics 2010, 20:647–664
Keywords: CYP1A2, cytochrome P450 1A2, drug metabolism, Ethiopia,
microsatellite, pharmacogenetics, purifying selection, SNPstr, time to most
recent common ancestor
ᵃThe Centre for Genetic Anthropology, ᵇResearch Department of Genetics,
Evolution and Environment, University College London, London, UK, ᶜAddis
Ababa University, Addis Ababa, Ethiopia and ᵈDepartment of Evolutionary
Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
Correspondence to Dr Sarah L. Browning, The Centre for Genetic Anthropology,
Research Department of Genetics, Evolution and Environment, University College
London, Gower Street, London WC1E 6BT, UK
Tel: +44 2076795061; fax: +44 2076795052;
e-mail: sarah.browning@ucl.ac.uk
Received 18 May 2010 Accepted 22 July 2010
Supplemental digital content is available for this article. Direct URL citations
appear in the printed text and are provided in the HTML and PDF versions of this
article on the journal’s Website (www.pharmacogeneticsandgenomics.com).
1744-6872 © 2010 Wolters Kluwer Health | Lippincott Williams & Wilkins
DOI: 10.1097/FPC.0b013e32833e90eb
Pharmacogenetics and Genomics 2010, Vol 20 No 11
Copyright © Lippincott Williams & Wilkins. Unauthorized reproduction of this article is prohibited.
Introduction
Consistent with anatomically modern humans originating
in Africa, there is more human genetic diversity in that
continent than in all the others [1]. Recent reports have
shown reducing genetic diversity with distance from
Ethiopia [2–4] and suggest that anatomically modern
humans migrated out of Africa from the north east
(possibly via Ethiopia) by crossing the Bab-el-Mandeb
strait at the mouth of the Red Sea [1,5–7]. Evidence of a
more recent migration into Ethiopia, of Semitic-speaking
people from Arabia, is also known from genetic, archaeological
and linguistic studies [1]. As a consequence, it is
possible that more human genetic/phenotypic variation
will be observed in Ethiopians than in any other
geographically contiguous groups of indigenous people
of similar number. The distribution of human genetic
variation among Ethiopian populations has, however, been
studied little.
Limited investigation of CYP1A2 has been undertaken in
the Ethiopian population to date. Researchers [8] have
carried out CYP1A2 genotype and phenotype studies in
100 Ethiopians from Ethiopia and 73 living in Sweden.
However, this study only sequenced the gene in 12
individuals; genotyping was restricted to intron 1 and the
sample set was of mixed Ethiopian origin with donors
from the Oromo, Amhara, Tigriyan and Gurage ethnic
groups. CYP1A2 genotype studies have also been
performed in Ethiopians as part of a wider study [9].
However, in the latter study only six previously ascertained
single-nucleotide polymorphisms (SNPs) in six
Ethiopians were genotyped and the ethnicity of
individuals was not recorded. We resequenced the coding
and exon-flanking regions of CYP1A2 in five Ethiopian
ethnic groups representing a rough northeast to southwest
transect across Ethiopia to establish what variation
exists in comparison with what is already known in other
populations. We were also interested in ascertaining what
CYP1A2 relevant pharmacogenetic profiles may be present
in the Ethiopian population as several CYP1A2-metabolized
drugs are administered to Ethiopians. As
examples, both primaquine and praziquantel [10] are
used as the first line of treatment for malaria and
schistosomiasis, respectively [11]. Furthermore, coffee
was first domesticated for human use in Ethiopia [12]
and is an integral part of modern Ethiopian culture. The
intake of caffeine (a well-known CYP1A2 substrate) is
consequently widespread in Ethiopia.
Cytochrome P450 1A2
Human CYP1A2 is mapped to the positive strand of the
long arm of chromosome 15 at 15q24.1 at chromosomal
location 15:72 828 237–72 835 994 [13] and is mainly
expressed in the liver [14]. It is orientated head-to-head
with CYP1A1, which is on the negative strand. CYP1A2
and CYP1A1 are separated by a 23.3 kb spacer region
whose role in regulating either of the genes, or in governing the expression of both the genes simultaneously, is
not yet understood [15]. CYP1A2 is approximately 7.8 kb
long with seven exons and six introns [16]. Exon 1 and
the downstream sequence of exon 7 are untranslated
regions (UTRs). The gene has only one transcript, which
is translated into a protein of 516 amino acid residues
[13]. The active site is thought to include amino acids
C458 and F451 in exon 7 and T321 in exon 4 [17].
CYP1A2 is a clinically important drug-metabolizing
enzyme and is responsible for the oxidative metabolism
of a wide variety of pharmaceutical drugs, the biotransformation of endogenous compounds, and the metabolic
activation of some procarcinogens [18]. The enzyme is
induced by a number of compounds and has many
inhibitors [18]. Caffeine is frequently used as a substrate
in CYP1A2 phenotype studies, but theophylline and
melatonin are also used [19].
To date, roughly 125 allelic variants have been reported
within CYP1A2 (from exon 1 to exon 7) and over 40
variants have been found within 3000 bases on either side
of the gene [20–23]. No variation has been reported in
exon 1 (5′ UTR), no more than seven variants have been
found in any of exons 3, 4, 5 and 6, yet more than 20
variants have been found in each of the exons 2 and 7
(including 3′ UTR). Note that none of the variants are
found in what is thought to be the active site of the
protein. In addition, no copy number variation or gene
conversion has been reported in CYP1A2. As many as
36 CYP1A2 haplotypes, including 21 subtypes, have
been named by the Human Cytochrome P450 Allele
Nomenclature Committee [20]. Following comprehensive
sequencing in two studies [24,25], the majority of haplotypes have been reported in Japanese populations, but
differences in haplotype frequencies are evident
among populations worldwide [19]. The associated
functional status of each CYP1A2 haplotype also varies
[19] and variation in the gene is thought to be associated
with differences in efficacy and safety of drugs [19].
Methods
Samples
DNA samples were prepared from buccal swabs from
males, 18 years old or older, unrelated at the paternal
grandfather level. All samples were collected anonymously
with informed consent from the National Health Research Ethical Clearance Committee under the Ethiopian
Technology and Science Commission Department of
Health Research. Sociological data, including age, current
residence, birthplace, self-declared ethnic identity and
religion of the individual were collected with similar
information about the individual’s father, mother, paternal grandfather and maternal grandmother. Samples comprised: Afar (n = 76), Amhara (n = 77), Anuak (n = 76),
Maale (n = 76) and Oromo (n = 76). Afar were collected
from Dubti (11.74°N, 41.09°E) and Asayta (11.56°N,
41.44°E) in Afar, Amhara and Oromo from Addis Ababa
(9.03°N, 38.70°E) and Jimma (7.67°N, 36.83°E), Anuak
from the Gambela region [including Gog (7.58°N,
34.50°E), Itang (8.20°N, 34.27°E) and Akobo (7.82°N,
33.03°E)] and Maale from Jinka (5.65°N, 36.65°E) in the
Bako Gazer woreda in South Omo.
Genotype data from 95 individuals [12 Yoruba, 15
African–American, 22 European, 22 Hispanic and 24 East
Asian (12 Japanese and 12 Han Chinese)] reported by the
National Institute of Environmental Health Sciences
(NIEHS) SNPs Programme [23] were incorporated in the
analyses of this study to place the Ethiopian data in a
worldwide context.
Amplification and sequencing of CYP1A2
Amplification and sequencing conditions for all of the
exons and flanking introns of CYP1A2 are described in
Supplementary data 1 and 2, Supplemental digital content 1, http://links.lww.com/FPC/A209. We sequenced all seven
CYP1A2 coding exons, introns 3 and 4 and part of introns
1, 2, 5 and 6, the 5′ and 3′ near gene regions and the 3′
UTR. A total of 88 bases (72834757–72834845) in the 3′
UTR could not be sequenced in either direction because
of the poly A/T regions.
Statistical analysis
Pairwise linkage disequilibrium (LD) was measured by
the D′ parameter [26], using GOLD software [27]. The
following statistics were calculated using Arlequin software [28]: tests for departure of observed genotype
frequencies from those expected under Hardy–Weinberg
equilibrium, haplotype phase inference estimated from
unphased population genotype data using the Excoffier–
Laval–Balding approach [29], gene diversity [30], nucleotide diversity [30], genetic differences between population
samples assessed using an exact test of population differentiation [31,32], genetic distance between populations
as represented by population pairwise FST values [33], and apportionment of diversity within and among more than two
populations analyzed using hierarchical FST values [34].
Principal coordinates analysis [35] was performed using
the R statistical package [36] on pairwise similarity
matrices as previously described [37]. Effects of amino
acid substitutions on the structure and function of
CYP1A2 were predicted using PolyPhen software [38].
Median joining networks [39] were constructed using
Network software Version 4.510 [40] and drawn using
Network publisher Version 1.1.0.7 (Fluxus Technology
Ltd). Tajima’s D [41], the McDonald and Kreitman [42]
and Fu and Li’s [43] tests of neutrality were performed
using DnaSP software [44]. Evidence of purifying
selection at CYP1A2 nonsynonymous SNP sites was
assessed using previously described methods [45,46].
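As an illustration of the gene diversity statistic listed above, the following minimal sketch assumes the standard unbiased estimator, n/(n − 1) × (1 − Σp²), where n is the number of chromosomes and p the frequency of each haplotype. The function name and the example counts are hypothetical and are not taken from the study; the published values were obtained with Arlequin.

```python
from typing import Sequence


def gene_diversity(haplotype_counts: Sequence[int]) -> float:
    """Unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = sum(haplotype_counts)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    # Sum of squared haplotype frequencies (expected homozygosity).
    sum_squared_freqs = sum((count / n) ** 2 for count in haplotype_counts)
    return n / (n - 1) * (1.0 - sum_squared_freqs)


# Hypothetical sample: three haplotypes observed 6, 3 and 1 times.
example_counts = [6, 3, 1]
print(round(gene_diversity(example_counts), 3))
```

A sample carrying a single haplotype gives a diversity of zero, and the value approaches one as haplotypes become more numerous and more evenly distributed.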
Genohaplotyping of an AC microsatellite and a G > C
single-nucleotide polymorphism (rs11072507) using a
SNPstr system
A 384 bp region containing the AC microsatellite (5.6 kb
downstream of the 3′ end of CYP1A2) and G > C SNP
(rs11072507) was amplified using the forward primer
TCTCATCTCGCAACTGGGGA and the reverse primer
GGGTTGGGGGCCCATTGTCS. As the 3′ end of the
reverse primer annealed to the site of the G > C SNP, each
allele was independently amplified. The fragment ending
with the C allele was specifically amplified using the fluorescently labelled FAM-GGGTTGGGGGCCCATTGTCG
reverse primer while the fragment ending with the
mutated G allele was specifically labelled with the HEX-GGGTTGGGGGCCCATTGTCC reverse primer. Each
fluorescently labelled PCR product encompassed the
SNP at one end and the microsatellite at the other, and
the length of the PCR product varied among chromosomes, depending solely on the number of microsatellite
repeats. Consequently, the gametic phase for the SNP and
microsatellite could be empirically determined by electrophoresis on a genetic analyzer using fluorescent detection.
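The logic of the SNPstr read-out can be sketched as follows. The two primer sequences and dye labels are those given above; the allele each primer detects is the complement of its 3′ base. The conversion from fragment length to repeat number assumes a dinucleotide (AC) repeat and a fixed flanking length; the flank value used in the test of this sketch is purely hypothetical, as are the function names.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Allele-specific reverse primers as given in the text, keyed by dye label.
REVERSE_PRIMERS = {
    "FAM": "GGGTTGGGGGCCCATTGTCG",
    "HEX": "GGGTTGGGGGCCCATTGTCC",
}

AC_REPEAT_UNIT_BP = 2  # an AC microsatellite grows in steps of two bases


def allele_detected(reverse_primer: str) -> str:
    """Allele on the template strand that pairs with the primer's 3' base."""
    return COMPLEMENT[reverse_primer[-1]]


def repeat_number(fragment_bp: int, flank_bp: int) -> int:
    """Convert a fragment length to an AC repeat count.

    flank_bp (non-repeat sequence in the amplicon) is a hypothetical
    parameter; its true value is not stated in the text.
    """
    repeat_bp = fragment_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % AC_REPEAT_UNIT_BP:
        raise ValueError("fragment length inconsistent with whole AC repeats")
    return repeat_bp // AC_REPEAT_UNIT_BP


def genohaplotype(dye: str, fragment_bp: int, flank_bp: int) -> Tuple[str, int]:
    """Pair the SNP allele (from the dye) with the repeat count (from length)."""
    allele = allele_detected(REVERSE_PRIMERS[dye])
    return allele, repeat_number(fragment_bp, flank_bp)
```

Because each dye reports one SNP allele and each peak length reports one repeat number, a heterozygous individual yields two dye-and-length pairs, which is what allows phase to be read directly.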
Individual sample DNAs were amplified separately with
an allele-specific, fluorescently labelled reverse primer
and a common forward primer. Two separate PCR reactions per individual were carried out to increase the
reliability of the results. DNA was amplified in 96-well
plates in 10 µl reaction volumes containing 1 ng of template DNA, 0.3 µmol/l of each primer (forward and reverse), 0.13 units Taq DNA polymerase (HT Biotech,
Cambridge, UK), 9.3 nmol/l TaqStart™ monoclonal
antibody (BD Biosciences Clontech, Oxford, UK),
200 µmol/l dNTPs and reaction buffer supplied with
the Taq polymerase. The cycling parameters were: 4 min
of preincubation at 94°C, followed by 35 cycles of 30 s at
94°C, 30 s at 56°C and 30 s at 72°C, with a final elongation step for 7 min at 72°C. A 2 µl aliquot of the diluted
PCR product (1 in 5 dilution) was mixed with 9.89 µl of
high purity (HiDi) formamide and 0.11 µl of ROX size
standard (Applied Biosystems, Warrington, UK). The
mixture was heated for 4 min at 96°C and immediately
cooled on ice. Samples were run on an ABI 3100 genetic
analyzer and analyzed using GeneMapper software v4.0
(Applied Biosystems, Warrington UK). Genohaplotypes
(rs11072507 genotypes and AC microsatellite haplotypes)
were then recorded for each sample. To ensure that the
SNPstr assay was accurately determining microsatellite
lengths (by fragment mobility), a sample of rs11072507
heterozygous individuals also had their microsatellite
lengths confirmed by sequencing (Supplementary data 3,
Supplemental digital content 1, http://links.lww.com/FPC/A209).
Estimating the time to most recent common ancestor
for CYP1A2 variants
Under the stepwise mutation model, the average square
distance in microsatellite allele repeat number between
all sampled chromosomes and the ancestral haplotype,
averaged over loci, has been shown to be linearly related
to μt, where μ is the mutation rate and t the coalescence
time in generations [47,48]. The AC microsatellite alleles
obtained from the SNPstr assay were used to date
CYP1A2 variants in this study. As the gametic phase for
the SNP (rs11072507) and AC microsatellite was
empirically determined from the SNPstr assay for each
sample, the SNP (rs11072507) was used to determine which
microsatellite haplotype was linked to the allele
being dated. Phase was inferred for all CYP1A2
variant alleles and rs11072507 from the pooled Ethiopian
population by the Excoffier–Laval–Balding approach [29]
implemented in Arlequin software [28]. When both the
alleles of any particular CYP1A2 SNP were on the background of both the G and C of rs11072507, recombination
was assumed. As recombination initiates a new distribution
of microsatellite alleles in the evolutionary history of the
gene (overlaid on the previous distribution), these variants
were dated using microsatellites on the background of each
of rs11072507 C and G separately and together (where
possible). Of the date estimates produced from only
rs11072507 G or C alleles, the older dates were assumed
to indicate the coalescent date of the SNP being dated,
whereas the younger was taken as the coalescent date of
the recombination event. As the recombination between
identical haplotypes would not affect coalescent date
estimates, recombination between identical CYP1A2 haplotypes was not accounted for in the method.
Table 1 CYP1A2 variants observed in the Ethiopian samples
Bold, private to one population; dbsnp, database single nucleotide polymorphism; Grey, novel mutations; f, frequency; n, chromosome number; NCBI, National Center for Biotechnology Information; UTR, untranslated
region; White, known mutations.
*Shortens protein by 21 amino acids.
Fig. 1 Pairwise linkage disequilibrium (LD) (D′) across CYP1A2 in the combined Ethiopian sample. Loci shown: –739, –505, –163, 1513, 1589, 2159, 2321, 3613, 5105, 5347, 5521, 5620, 5987, 6324 and 6674; gene regions labelled: intron 1, exon 3, intron 3, intron 4, intron 6, exon 7 and 3′ UTR. Monomorphic loci and rare variants (where frequency
< 0.01) were removed from the datasets before the analysis. CYP1A2 variants and their relative locations within the gene are highlighted in grey,
D′ values of 1 are highlighted in pink, significant χ² associations are in bold (P < 0.05). The area bordered in red constitutes an LD block as defined
by Haploview (www.hapmap.org/haploview). UTR, untranslated region.
Average square distance and time were calculated using
Ytime software, Version 2.08 [49]. The modal haplotype
was assumed to be ancestral. The time to most recent
common ancestor (unbiased estimate plus confidence
interval) was inferred under the Simple Stepwise
Mutation Model of microsatellite evolution. The AC
microsatellite mutation rate per generation was assumed
to be 0.0005 [50]. Confidence intervals were obtained on
the distance between the assumed ancestral and sampled
chromosomes (ignoring uncertainty in mutation rate) by
simulation assuming a star genealogy. This type of
genealogy was assumed because most nonancestral
haplotypes were rare (in some cases most were singletons) and negative Tajima’s D values were observed for all
Ethiopian populations (Table 6), indicating that the
genealogy linking the CYP1A2 chromosomes was more
like the star genealogy characteristic of population growth
than the genealogy associated with no growth. For each
generation, a time period of 32 years was assumed based
on previously reported estimates [51].
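The dating arithmetic described above can be sketched as follows, using the stated mutation rate (0.0005 per generation), generation time (32 years) and the assumption that the modal allele is ancestral. This is only a point estimate from a single microsatellite under the relation ASD = μt; it does not reproduce the unbiased estimator or the simulated confidence intervals of the Ytime software, and the function names and example data are hypothetical.

```python
from collections import Counter
from typing import Sequence

MUTATION_RATE = 0.0005      # per generation, as assumed in the text
YEARS_PER_GENERATION = 32   # as assumed in the text


def average_square_distance(repeat_counts: Sequence[int]) -> float:
    """Mean squared difference in repeat number from the modal allele."""
    if not repeat_counts:
        raise ValueError("no chromosomes supplied")
    # The modal repeat number is taken as the ancestral state.
    modal = Counter(repeat_counts).most_common(1)[0][0]
    return sum((r - modal) ** 2 for r in repeat_counts) / len(repeat_counts)


def tmrca_years(
    repeat_counts: Sequence[int],
    mutation_rate: float = MUTATION_RATE,
    years_per_generation: float = YEARS_PER_GENERATION,
) -> float:
    """Point estimate of coalescence time: (ASD / mu) generations, in years."""
    generations = average_square_distance(repeat_counts) / mutation_rate
    return generations * years_per_generation


# Hypothetical repeat numbers on chromosomes carrying the variant being dated.
example = [20] * 6 + [21] * 2 + [19, 22]
print(average_square_distance(example), tmrca_years(example))
```

In this hypothetical sample the squared distances sum to 7 over 10 chromosomes, so the ASD is about 0.7, which corresponds to roughly 1400 generations or 44 800 years.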
Results
CYP1A2 variation observed in Ethiopia
A total of 49 different CYP1A2 polymorphic sites were
observed in the Ethiopian samples (Table 1). No
genotype frequencies for any population deviated significantly from Hardy–Weinberg equilibrium at the 1%
significance level; variant sites were not observed within
17 bases on either side of each intron/exon boundary; and
all reported catalytic residues (amino acids D320 and
T321 in exon 4, and F451 and C458 in exon 7) were
monomorphic. As many as 21 (43%) of the variant alleles
were private to populations and 30 (61%) were previously
unreported (Table 1), including seven nonsynonymous
variants, one of which is a premature stop codon in exon 7
(5384 C > A resulting in Y495X) observed in Anuak at
3%. Notably, nonsynonymous variants never exceeded
frequencies of 11% in any one group and those predicted
to alter the structure/function of the protein were
observed at frequencies between 1 and 3% in their
respective populations (Table 1). The majority of CYP1A2
SNP loci are in total LD (D′ = 1), but several cases where
D′ was less than 1 were observed across the gene (Fig. 1).
The majority of lower D′ values were evident between
pairs of loci including at least one marker towards the 3′
end of the gene, and loci from intron 1 up to and
including 5521 in the 3′ UTR constituted an LD block as
defined by other investigators (Fig. 1).
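For readers unfamiliar with D′, the following minimal sketch computes it for one pair of biallelic loci from counts of the four two-locus haplotypes, using the standard definition (D scaled by its maximum possible magnitude given the allele frequencies). It assumes phase is already known; the function name and example counts are hypothetical, and the published values were obtained with GOLD.

```python
def d_prime(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Absolute D' for two biallelic loci from two-locus haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no chromosomes supplied")
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    d = n_AB / n - p_A * p_B         # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    if d_max == 0:
        raise ValueError("a locus is monomorphic; D' is undefined")
    return abs(d) / d_max


# Hypothetical counts: one of the four possible haplotypes is never observed.
print(d_prime(50, 0, 10, 40))
```

When one of the four haplotypes is absent, as in the example, D′ equals 1, which corresponds to the "total LD" described above; values below 1 indicate that all four haplotypes are present.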
Table 2 CYP1A2 haplotypes observed across the entire gene in the Ethiopian samples
Haplotype frequencies are shown in Table 3.
1 Position from base A in the initiation codon (A in ATG is + 1, base before A is – 1) from the CYP1A2 genomic reference sequence (NC_000015.8).
2 White cell, allele observed in CYP1A2*1A, grey cell, derived allele. Underlined haplotypes were unambiguously resolved from homozygous genotypes at all loci or from a single site heterozygote. Haplotypes reported by the
CYP450 Allele Nomenclature Committee are named. Three of the variants reported in the Ethiopians were not incorporated into haplotypes because they were only polymorphic in samples with missing data at other polymorphic
sites.
Table 3 CYP1A2 entire gene haplotype frequencies in Ethiopian populations

Haplotype    Afar        Amhara      Anuak       Maale       Oromo       Ethiopia overall
ID            n  f           n  f        n  f        n  f        n  f        n  f
1 (*1B)      42  0.36    51  0.41    65  0.46    42  0.32    31  0.30   231  0.375
2             2  0.02     0  0.00     3  0.02     0  0.00     1  0.01     6  0.010
3 (*1M)      32  0.27    42  0.34     5  0.04    16  0.12    24  0.24   119  0.193
4             3  0.03     2  0.02     7  0.05     3  0.02     8  0.08    23  0.037
5             0  0.00     0  0.00    11  0.08     0  0.00     0  0.00    11  0.018
6             2  0.02     0  0.00     8  0.06     1  0.01     2  0.02    13  0.021
7 (*17)       2  0.02     1  0.01     2  0.01     1  0.01     1  0.01     7  0.011
8             6  0.05     2  0.02     9  0.06     0  0.00     1  0.01    18  0.029
9             4  0.03     0  0.00     0  0.00     0  0.00     2  0.02     6  0.010
10            8  0.07     9  0.07     1  0.01    12  0.09     8  0.08    38  0.062
11 (*18)      2  0.02     0  0.00     0  0.00     0  0.00     0  0.00     2  0.003
12            2  0.02     1  0.01     5  0.04     0  0.00     1  0.01     9  0.015
13            2  0.02     0  0.00     0  0.00     0  0.00     0  0.00     2  0.003
14            1  0.01     0  0.00     0  0.00     0  0.00     0  0.00     1  0.002
15 (*19)      2  0.02     0  0.00     0  0.00     1  0.01     0  0.00     3  0.005
16            1  0.01     0  0.00     0  0.00     0  0.00     0  0.00     1  0.002
17            1  0.01     0  0.00     0  0.00     0  0.00     0  0.00     1  0.002
18            1  0.01     0  0.00     5  0.04    20  0.15     4  0.04    30  0.049
19            1  0.01     1  0.01     0  0.00     2  0.02     0  0.00     4  0.006
20            1  0.01     1  0.01     2  0.01     2  0.02     0  0.00     6  0.010
21            1  0.01     0  0.00     0  0.00     1  0.01     0  0.00     2  0.003
22            1  0.01     4  0.03     0  0.00     8  0.06     4  0.04    17  0.028
23            1  0.01     0  0.00     0  0.00     0  0.00     1  0.01     2  0.003
24            0  0.00     1  0.01     0  0.00     0  0.00     0  0.00     1  0.002
25            0  0.00     1  0.01     0  0.00     0  0.00     0  0.00     1  0.002
26            0  0.00     2  0.02     0  0.00     2  0.02     0  0.00     4  0.006
27            0  0.00     1  0.01     0  0.00     0  0.00     2  0.02     3  0.005
28            0  0.00     1  0.01     1  0.01     0  0.00     0  0.00     2  0.003
29 (*20)      0  0.00     1  0.01     1  0.01     8  0.06     1  0.01    11  0.018
30            0  0.00     1  0.01     0  0.00     0  0.00     0  0.00     1  0.002
31            0  0.00     1  0.01     1  0.01     0  0.00     0  0.00     2  0.003
32            0  0.00     1  0.01     0  0.00     0  0.00     0  0.00     1  0.002
33            0  0.00     0  0.00     1  0.01     0  0.00     0  0.00     1  0.002
34            0  0.00     0  0.00     1  0.01     0  0.00     0  0.00     1  0.002
35            0  0.00     0  0.00     3  0.02     0  0.00     0  0.00     3  0.005
36            0  0.00     0  0.00     1  0.01     0  0.00     0  0.00     1  0.002
37 (*21)      0  0.00     0  0.00     3  0.02     0  0.00     0  0.00     3  0.005
38            0  0.00     0  0.00     1  0.01     1  0.01     1  0.01     3  0.005
39            0  0.00     0  0.00     1  0.01     1  0.01     0  0.00     2  0.003
40            0  0.00     0  0.00     1  0.01     1  0.01     1  0.01     3  0.005
41            0  0.00     0  0.00     1  0.01     0  0.00     0  0.00     1  0.002
42            0  0.00     0  0.00     1  0.01     0  0.00     0  0.00     1  0.002
43            0  0.00     0  0.00     0  0.00     1  0.01     1  0.01     2  0.003
44            0  0.00     0  0.00     0  0.00     4  0.03     1  0.01     5  0.008
45            0  0.00     0  0.00     0  0.00     1  0.01     0  0.00     1  0.002
46            0  0.00     0  0.00     0  0.00     1  0.01     0  0.00     1  0.002
47            0  0.00     0  0.00     0  0.00     1  0.01     0  0.00     1  0.002
48            0  0.00     0  0.00     0  0.00     1  0.01     0  0.00     1  0.002
49            0  0.00     0  0.00     0  0.00     1  0.01     0  0.00     1  0.002
50            0  0.00     0  0.00     0  0.00     0  0.00     1  0.01     1  0.002
51            0  0.00     0  0.00     0  0.00     0  0.00     1  0.01     1  0.002
52            0  0.00     0  0.00     0  0.00     0  0.00     2  0.02     2  0.003
53            0  0.00     0  0.00     0  0.00     0  0.00     1  0.01     1  0.002
54            0  0.00     0  0.00     0  0.00     0  0.00     1  0.01     1  0.002
55 (*1F)      0  0.00     0  0.00     0  0.00     0  0.00     1  0.01     1  0.002
Total       118  1.00   124  1.00   140  1.00   132  1.00   102  1.00   616  1.000

Haplotypes are shown in Table 2.
n, number of chromosomes; f, frequency.
Across the entire CYP1A2 gene, 55 different haplotypes
were observed in the Ethiopian samples (Table 2). Only
haplotypes 1 (CYP1A2*1B), 3 (CYP1A2*1M) and 55
(CYP1A2*1F) had been previously reported; consequently,
52 novel haplotypes were found in this study.
Of the novel haplotypes found in this study, haplotypes 7,
11, 15, 29 and 37 have now been named by the CYP450
Allele Nomenclature Committee [20] as CYP1A2*17, *18,
*19, *20 and *21, respectively. CYP1A2*1B and *1M were
the most frequent haplotypes within the Ethiopians and
many of the novel haplotypes were rare (< 1%) (Table 3)
and closely related to those previously reported (Fig. 2).
When haplotypes were constructed using only nonsynonymous
polymorphisms (named NS haplotypes hereafter),
to restrict the haplotype set to those most likely to affect
protein structure/function, 10 NS haplotypes were
identified (Table 4). Seven contained previously unidentified
nonsynonymous variants and four were predicted
to alter the structure/function of the protein (Table 4). The
modal NS haplotype (≥ 86%) in all populations was the
ancestral NS haplotype 7 (Table 4). Potentially damaging
NS haplotypes were observed in Amhara, Anuak and
Oromo, but their frequencies never exceeded 3% in any
one group (Table 4). Notably, diplotype configurations of
the NS haplotypes revealed that all individuals have at least
one haplotype predicted to code for an unaltered protein,
and the majority have a copy of the NS haplotype without
any mutations (Table 5).
Fig. 2 Network analysis of CYP1A2 entire gene haplotypes observed in Ethiopian populations. Nodes represent haplotypes, which are named according to
the nomenclature outlined in Table 2. Nodes are proportional to haplotype frequencies within the combined Ethiopian populations (Table 3). White
nodes, previously reported haplotypes; grey nodes, novel haplotypes reported by this study. *The alleles named by the P450 Allele Nomenclature
Committee.
Analyzing Ethiopian CYP1A2 variation in the context
of other populations
All CYP1A2 nucleotide variants and haplotypes (in the
regions sequenced in the Ethiopians in this study) found
at a frequency of ≥ 3% in the combined NIEHS sample
were detected in the Ethiopian samples (Fig. 3). CYP1A2
gene and nucleotide diversities were always observed to
be highest in African populations and lowest in Europeans
(Fig. 4). Notably, the pooled Ethiopian samples were
always more diverse than the pooled NIEHS samples and
even single Ethiopian ethnic groups were often more
diverse than the combined NIEHS samples (Fig. 4). The
majority of Ethiopian and NIEHS populations were
significantly different (exact test of population differentiation, P < 0.05) when entire gene haplotypes were
considered (Fig. 5a). However, when the haplotype set
was restricted to markers that are most likely to affect the
structure/function of the protein (i.e. NS haplotypes),
considerably less pairwise differentiation was observed
with significant differences only occurring among Ethiopian populations (Fig. 5b). Consistent with this, statistically significant interethnic differentiation was observed
in the coding region in the Ethiopians (hierarchical FST
based on NS haplotypes = 0.02, P < 0.00001) with 2% of
variation occurring among groups. Significant FST values
were also observed between Ethiopians and Europeans,
and Ethiopians and East Asians (Fig. 6). Interestingly, as
an illustration of intra-Ethiopia variation, a slightly greater
FST was observed between Amhara and Anuak (FST = 0.12,
P < 0.01) than between Hispanics and East Asians (FST =
0.11, P = 0.05) for CYP1A2 entire gene haplotypes.
Table 4 CYP1A2 NS haplotypes (only nonsynonymous variants)
f, frequency; n, number of chromosomes.
1 Position from base A in the initiation codon (A in ATG is + 1, base before A is – 1) from the CYP1A2 genomic reference sequence (NC_000015.8).
2 White cell, allele observed in CYP1A2*1A, grey cell, derived allele. Underlined haplotypes were unambiguously resolved from homozygous genotypes at all loci or from a single site heterozygote, bold haplotypes
contain previously unidentified nonsynonymous variants.
3 Predictions made using PolyPhen software. Predicted effects of each NS haplotype are based upon the single amino acid alterations.
The recent evolutionary history of CYP1A2
Testing for selection in CYP1A2
Tajima’s D was not significantly different from zero in any
population and Fisher’s exact test P values for each of the
McDonald–Kreitman tests were above 0.05 (Table 6).
Consequently, the hypothesis of neutrality [52] was not
rejected in any case. Fu and Li’s D and F statistics
(Table 6) were not significant at the 5% significance
threshold in any population except Amhara and Oromo.
The negative D and F statistics for Amhara and Oromo were
indicative of an excess of recent mutations in the genealogy,
which is consistent with purifying or positive selection
acting on CYP1A2 [43]. Although a Bonferroni correction for
multiple tests (10 in this case) suggests that a P value of
less than 0.005 would be considered significant, negative D
and F test statistics were observed for all Ethiopian populations. As a consequence, further analysis was performed to
try to determine the type of selection. As CYP1A2 is highly
conserved between species, for example, humans, mice and
rats [8], and noncoding variation is tolerated more than
coding variation in humans (Figs 4 and 5), the prior hypothesis was that purifying selection, not positive selection,
has been operating on CYP1A2.
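The Bonferroni threshold quoted above follows from dividing the nominal significance level by the number of tests (0.05 / 10 = 0.005). A minimal sketch of that arithmetic is given below; the function names and the P values in the example are hypothetical and are not results from Table 6.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(
    p_values: Sequence[float], alpha: float = 0.05
) -> List[bool]:
    """Flag each P value that falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Ten tests at a nominal 5% level, as in the text.
print(bonferroni_threshold(0.05, 10))
# Hypothetical P values for illustration only.
print(significant_after_correction([0.004, 0.03]))
```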
Testing for evidence of purifying selection
Following the approach of Hughes et al. [45,46], with the
exception of 5′ noncoding SNPs, lower mean intrapopulation gene diversities and mean interpopulation genetic distances were observed for nonsynonymous SNPs
(nonsense SNP and those predicted to cause radical and
conservative changes to protein structure) than for SNPs in
the same gene that have no effect on protein structure
(Fig. 7). Where data were sufficiently informative to
permit significance tests to be carried out, mean gene
diversity was significantly lower for radical nonsynonymous SNPs than intronic and 3′ UTR SNPs (Fig. 7a).
Mean interpopulation genetic distance was also significantly lower for radical nonsynonymous SNPs than
conservative nonsynonymous SNPs and SNPs with no
effect on protein structure; however, the mean for 5′ noncoding SNPs was significantly lower than that for radical
nonsynonymous SNPs (Fig. 7b) (this may be explained by
small sample size as there were only two SNPs in the 5′
noncoding category). These results are consistent with
purifying selection having acted at nonsynonymous
SNP sites predicted to cause radical changes to protein structure [45,46]. Evidence of purifying selection acting on nonsynonymous SNPs causing conservative
amino acid changes was also shown (Supplementary data
4, Supplemental digital content 1, http://links.lww.com/FPC/A209).
Table 5
CYP1A2 diplotypes configured from NS haplotypes observed in Ethiopian populations

CYP1A2 NS diplotype  Afar          Amhara        Anuak         Maale         Oromo         Pooled Ethiopian sample
                     n   Frequency n   Frequency n   Frequency n   Frequency n   Frequency n   Frequency
2/1                  0   0.00      0   0.00      1   0.01      0   0.00      0   0.00      1   0.003
7/3                  0   0.00      0   0.00      0   0.00      0   0.00      1   0.02      1   0.003
7/4                  0   0.00      1   0.01      1   0.01      0   0.00      0   0.00      2   0.006
1/7                  0   0.00      0   0.00      3   0.04      0   0.00      0   0.00      3   0.009
2/7                  4   0.07      0   0.00      10  0.13      8   0.11      4   0.06      26  0.076
5/7                  0   0.00      1   0.01      1   0.01      10  0.14      2   0.03      14  0.041
6/7                  2   0.03      0   0.00      0   0.00      1   0.01      0   0.00      3   0.009
8/7                  2   0.03      0   0.00      0   0.00      0   0.00      0   0.00      2   0.006
9/7                  2   0.03      1   0.01      2   0.03      1   0.01      1   0.02      7   0.020
10/7                 0   0.00      1   0.01      1   0.01      0   0.00      1   0.02      3   0.009
7/7                  51  0.84      67  0.94      57  0.75      52  0.72      54  0.86      281 0.819
Grand Total          61  1.00      71  1.00      76  1.00      72  1.00      63  1.00      343 1.000

Unambiguously inferred diplotypes are underlined.
n, number of individuals.

CYP1A2 chronology: coalescent date estimates for
CYP1A2 variants
The CYP1A2 sequences and SNPstr genohaplotypes
(which incorporated the rs11072507 genotypes with the
AC microsatellite haplotypes) were informative enough
to date nine CYP1A2 variants, in addition to the G > C
SNP (rs11072507) in the SNPstr, in the Ethiopian
populations. Details regarding the distribution of microsatellite alleles for rs11072507 C and G and for each
CYP1A2 variant dated are shown in Supplementary data 5,
Supplemental digital content 1, http://links.lww.com/FPC/A209. Coalescent date estimates ranged from 5000–
383 000 years (Table 7), and were consistent with each
other (i.e. the dating is consistent with a parsimonious
ordering of mutations) in the course of evolution of CYP1A2
in humans (Supplementary data 6, Supplemental digital
content 1, http://links.lww.com/FPC/A209). Where dates could
not be estimated from microsatellite data for nonsynonymous variants, date boundaries were approximated using a
mutation network (Supplementary data 7, Supplemental
digital content 1, http://links.lww.com/FPC/A209).
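To give a sense of how microsatellite variation can be turned into a date, the sketch below uses one common approach: the average squared distance (ASD) in repeat number from an assumed ancestral allele, divided by the microsatellite mutation rate, under a single-step mutation model. This is an assumption made for illustration and may differ from the estimator actually used in this study. The repeat counts, mutation rate and generation time are hypothetical placeholder values, not data from the paper.

```python
def average_squared_distance(repeat_counts, ancestral_repeats):
    """Mean squared difference in repeat number from the assumed ancestral allele."""
    squared = [(r - ancestral_repeats) ** 2 for r in repeat_counts]
    return sum(squared) / len(squared)


def tmrca_generations(asd, mutation_rate):
    """Under a single-step mutation model, expected ASD = mutation_rate * t."""
    return asd / mutation_rate


def tmrca_years(asd, mutation_rate, years_per_generation):
    """Convert the estimate in generations to years."""
    return tmrca_generations(asd, mutation_rate) * years_per_generation


# Hypothetical inputs for illustration only (not taken from the study).
example_repeats = [10, 10, 11, 9, 12, 10]   # AC repeat counts on variant-bearing chromosomes
example_ancestral = 10                      # assumed ancestral repeat count
example_mutation_rate = 0.001               # assumed mutations per locus per generation
example_generation_time = 25                # assumed years per generation

example_asd = average_squared_distance(example_repeats, example_ancestral)
print(tmrca_years(example_asd, example_mutation_rate, example_generation_time))
```

Because the estimate scales inversely with the mutation rate and linearly with the generation time, the choice of those two parameters dominates the resulting date.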
Discussion
CYP1A2 variation observed in Ethiopia
Resequencing CYP1A2 (all exons and flanking intronic
regions) in five Ethiopian ethnic groups has revealed a
substantial amount of previously unreported genetic variation. We found 55 different CYP1A2 haplotypes in the
Ethiopian samples alone. This haplotype set outnumbers
most of those reported to date, in some instances across
different populations, for each of the CYP450 genes [20].
Studies investigating genetic diversity of a range of drug
metabolizing enzymes in Ethiopians should be encouraged as the great extent of genetic variability evidenced
for CYP1A2 in the Ethiopians in this study is likely to
apply to other genes.
Several of the novel CYP1A2 alleles identified in this study
were predicted to change the structure/function of the
protein. As they were observed in individuals who were
at least 18 years old, it is clear that these variants, at least
in the heterozygous state, are compatible with survival to
reproductive age and that tolerance of functional variation
is, at least to some extent, evident for CYP1A2. The
premature stop codon Y495X was identified in Anuak at 3%
[with a 95% confidence interval of 0.007–0.066 (exact
Clopper–Pearson method)]. Hence, in a population numbering 45 655 (the 1994 census record for Anuak), it is
expected under Hardy–Weinberg proportions that 2657 people would carry one copy of the
premature stop codon while 41 people would carry two
copies. The mutation occurs in the last exon and would
consequently (i) not result in nonsense-mediated mRNA
decay [53] and (ii) only cause the protein to lack 21 amino
acids. Functional studies should be able to determine
whether the premature stop codon leads to a nonfunctional
enzyme or a protein with reduced function. We are not aware
of any previous reports of variation in the coding region of
CYP1A2 likely to result in the shortening of the associated protein. If nonfunctionality is the case and if homozygotes do exist, then such individuals would be living
human CYP1A2 knockouts whose existence would open
interesting possibilities for research into P450-mediated
pharmacokinetic activity. CYP1A2 knockout mice are viable
and fertile [54]; however, in addition to showing decreased
drug metabolism [54], they exhibit alterations in the
expression of genes related to cell–cycle regulation, insulin
action, lipogenesis, and fatty acid and cholesterol biosynthetic pathways [55]. The existence of human CYP1A2
knockouts may therefore be invaluable in assessing the
precise role of human CYP1A2 in physiological processes.
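The expected numbers of carriers quoted above for the Y495X stop codon can be reproduced with a short calculation. The sketch below assumes Hardy–Weinberg proportions, which is an assumption of this example, and uses the allele frequency (3%) and census figure (45 655) given in the text; the function name is hypothetical.

```python
def expected_carriers(allele_frequency, population_size):
    """Expected heterozygotes and homozygotes under Hardy-Weinberg proportions."""
    q = allele_frequency
    p = 1.0 - q
    heterozygotes = 2.0 * p * q * population_size   # one copy of the allele
    homozygotes = q * q * population_size           # two copies of the allele
    return heterozygotes, homozygotes


# Values taken from the text: Y495X at 3% in Anuak, census population 45 655.
het, hom = expected_carriers(0.03, 45_655)
print(round(het), round(hom))
```

The homozygote count depends on the square of the allele frequency, so the wide confidence interval around 3% translates into considerable uncertainty about how many people carry two copies.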
All other variants predicted to alter the structure/function
of the protein were very rare and never exceeded more
than a single observation in any one ethnic group.
Unrecognized variation cannot be studied in vivo, and
paucity of such knowledge may lead to inappropriate
therapeutic intervention and increase the risk of adverse
drug reactions. However, diplotype configurations indicate that most people in all populations in this study may,
depending on variation in the promoter, be expected to
have normal CYP1A2 function. Nevertheless, given the
frequency of the nonancestral NS haplotypes there will
be individuals, in different proportions in different ethnic
groups, expected to have two copies of nonancestral NS
haplotypes, but in this study in no case is this predicted
to be greater than 1%.
Analyzing Ethiopian CYP1A2 variation in the
context of other populations
Corresponding CYP1A2 sequence data from an additional
five populations (African–Americans, Yoruba, Europeans,
Hispanics and East Asians), generated by the NIEHS
SNPs programme [23], were included in the analysis with
the Ethiopians. Despite lacking power because of small
sample sizes (Supplementary data 8, Supplemental digital
content 1, http://links.lww.com/FPC/A209), the NIEHS data
proved useful in placing the Ethiopian data, albeit tentatively, into a worldwide context.
Fig. 3
[Bar charts: frequency (%) of each CYP1A2 polymorphism (above) and of each CYP1A2 haplotype (below) in the NIEHS sample population.]
Ethiopian populations evidence all the common variation observed in the National Institute of Environmental Health Sciences (NIEHS) African–
American, Yoruba, European, Hispanic and East Asian sample populations. CYP1A2 polymorphisms observed (in the regions sequenced in the
Ethiopians in this study) in the NIEHS samples are shown above, haplotypes are shown below. Variation observed in the Ethiopians is shown in grey,
variation not observed in the Ethiopians is shown in black. CYP1A2* alleles and variants are numbered according to the CYP450 Allele
Nomenclature Committee system.
Fig. 4
[Bar charts by population: gene diversity (h), above, and nucleotide diversity (pi), below, for the Afar, Amhara, Anuak, Maale, Oromo, combined Ethiopian, African–American, Yoruba, European, Hispanic, East Asian and combined NIEHS samples.]
Gene (above) and nucleotide (below) diversity based on variation across the entire CYP1A2 gene and the coding sequence (only nonsynonymous
variation). Variation across the entire gene is shown in grey, nonsynonymous variation is shown in white. Error bars represent standard deviation.
Consistent with purifying selection acting upon CYP1A2, coding variation is tolerated the least.
Consistent with other studies [56] and with Africa being
the birthplace of mankind, African populations were the
most diverse in this study. Maale (1994 census population
46 458 [57]), Oromo (1994 census population, 17 080 318
[57]), Anuak (1994 census population 45 665 [57]) and
the combined Ethiopian sample populations were often
more diverse than the combined NIEHS data sets. Values
were also comparatively high in both the NIEHS African–American and Yoruba data sets. Furthermore, consistent
with some anatomically modern humans migrating out
of Africa via Ethiopia, and a more recent migration of
Semitic-speaking peoples from Arabia into Ethiopia, all of
the common CYP1A2 variation found outside Ethiopia
remains present within Ethiopian groups. Consequently,
the Ethiopians could perhaps serve not only as a suitable
population for the development of CYP1A2 diagnostic
markers/tests useful in pharmacogenetic prediction in
populations worldwide, but also to ensure that such tests
were not only suitable for developed countries. These
findings also highlight the need to conduct population
genetic research in Ethiopians if conclusions reached concerning populations outside Ethiopia are to be interpreted
in context.
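For readers unfamiliar with the gene diversity statistic (h) used in these comparisons, the sketch below applies Nei's unbiased estimator, h = n/(n − 1) × (1 − Σp²), to NS haplotype counts derived from the Afar diplotype counts in Table 5. It is an illustration under that assumption only: the study's software may differ in detail, and the result is not claimed to reproduce any value reported in the paper. Function names are hypothetical.

```python
def haplotype_counts_from_diplotypes(diplotype_counts):
    """Count chromosomes per haplotype from {(hap_a, hap_b): individuals}."""
    counts = {}
    for (hap_a, hap_b), individuals in diplotype_counts.items():
        for hap in (hap_a, hap_b):
            counts[hap] = counts.get(hap, 0) + individuals
    return counts


def gene_diversity(haplotype_counts):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(haplotype_counts.values())
    sum_squared = sum((count / n) ** 2 for count in haplotype_counts.values())
    return n / (n - 1) * (1.0 - sum_squared)


# Non-zero Afar NS diplotype counts as listed in Table 5 (61 individuals).
AFAR_DIPLOTYPES = {(2, 7): 4, (6, 7): 2, (8, 7): 2, (9, 7): 2, (7, 7): 51}

afar_counts = haplotype_counts_from_diplotypes(AFAR_DIPLOTYPES)
afar_h = gene_diversity(afar_counts)
print(afar_counts, round(afar_h, 3))
```

A population carrying a single haplotype gives h = 0, and h approaches 1 as chromosomes are spread evenly over many haplotypes, which is why the statistic is a convenient summary for comparing populations of different sample sizes.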
When haplotypes were constructed using only nonsynonymous polymorphisms, so as to restrict the haplotype set
to those most likely to affect the protein (although it is
accepted that variation in splice sites could also affect the
protein structure and variation in the promoter could
affect gene expression), Europeans, Hispanics and East
Asians were considerably less variable than Ethiopians,
African–Americans and Yoruba. As a consequence, public
health policy makers may not have to be concerned about
variable drug response, because of variation in the protein,
in substantial proportions of individuals belonging to non-African populations. However, given that currently most
drug testing is undertaken on non-African populations,
more testing on non-European/Asian populations is warranted. With increasing numbers of people having a recent
African descent living in Europe and the Americas their
pharmacogenetic profiles should be represented in clinical
trials. In addition, there should be closer attention paid to
them in postmarketing surveillance and greater awareness
of genetic variability among them.
Fig. 5
(a)
                  Afar    Amhara  Anuak   Maale   Oromo   Afr–Am  Yoruba  Europ.  Hisp.   E Asian
Afar                      −       +       +       −       +       +       −       −       +
Amhara            0.11            +       +       +       +       +       −       +       +
Anuak             <0.01   <0.01           +       +       +       −       +       +       +
Maale             <0.01   <0.01   <0.01           +       +       +       +       +       +
Oromo             0.13    0.01    <0.01   <0.01           +       +       −       −       −
African–American  <0.01   <0.01   0.02    <0.01   0.02            −       +       −       −
Yoruba            <0.01   <0.01   0.18    <0.01   0.05    0.96            +       +       −
European          0.48    0.61    <0.01   0.01    0.46    <0.01   <0.01           −       +
Hispanic          0.16    0.03    <0.01   0.01    0.46    0.12    0.04    0.06            −
East Asian        0.01    <0.01   <0.01   <0.01   0.07    0.06    0.09    <0.01   0.10
(b)
                  Afar    Amhara  Anuak   Maale   Oromo   Afr–Am  Yoruba  Europ.  Hisp.   E Asian
Afar                      +       −       +       −       −       −       −       −       −
Amhara            0.01            +       +       −       −       −       −       −       −
Anuak             0.07    <0.01           +       −       −       −       −       −       −
Maale             <0.01   <0.01   <0.01           −       −       −       −       −       −
Oromo             0.33    0.19    0.23    0.09            −       −       −       −       −
African–American  0.40    0.21    0.54    0.26    0.35            −       −       −       −
Yoruba            0.16    0.07    0.35    0.18    0.19    0.64            −       −       −
European          0.89    1.00    0.66    0.44    0.86    0.18    0.07            −       −
Hispanic          1.00    1.00    0.76    0.53    1.00    0.48    0.11    1.00            −
East Asian        0.39    0.67    0.29    0.13    0.56    0.31    0.16    1.00    1.00
Exact test of population differentiation P values (lower triangle) and significant/not significant (+/−) differences at the 5% threshold (upper triangle)
for CYP1A2 entire gene (a) and NS (b) haplotypes. Consistent with purifying selection acting upon CYP1A2, coding variation is tolerated the least.
Of further practical relevance in healthcare, statistically
significant variation exists among Ethiopian indigenous
groups living in close geographical proximity. Moreover,
the Ethiopian populations were the only groups to be differentiated when NS haplotypes were considered. In light
of this, the general Ethiopian population should perhaps
not be treated, at the CYP1A2 protein level, as one homogenous group, a finding which undoubtedly has implications for future therapeutic intervention in Ethiopia.
The recent evolutionary history of CYP1A2
The coalescent date estimates of the CYP1A2 variants in
this study were old and all, except for 1589 G > T (which
was not found in non-Ethiopians), predated the expansion of modern humans out of Africa less than 100 000
years ago [58]. In fact, five variants (−739 G > T and
−163 A > C, both of which are in intron 1, 1513 C > A
in exon 3 causing S298R, 3613 T > C in intron 6 and 6324
G > del in the 3′ UTR) were estimated to have arisen
before the emergence of modern humans in Africa less
than 200 000 years ago [1,59,60].
Fig. 6
                  Afar    Amhara  Anuak   Maale   Oromo   Afr–Am  Yoruba  Europ.  Hisp.   E Asian
Afar              −0.01   0.04    0.11    0.05    0.79    0.75    0.32    0.24    0.27    0.26
Amhara            0.02    −0.01   0.01    <0.01   0.12    0.28    0.08    0.99    0.99    0.99
Anuak             0.01    0.05    −0.01   0.17    0.15    0.77    0.77    0.03    0.13    0.06
Maale             0.01    0.05    0.00    −0.01   0.03    0.45    0.58    0.05    0.10    0.05
Oromo             0.00    0.01    0.01    0.01    −0.01   0.79    0.21    0.25    0.41    0.44
African–American  −0.02   0.03    −0.01   −0.01   −0.02   −0.05   0.77    0.17    0.50    0.51
Yoruba            0.01    0.15    −0.03   −0.02   0.02    −0.03   −0.09   0.07    0.14    0.10
European          0.01    −0.01   0.04    0.04    0.01    0.04    0.15    −0.03   0.99    0.99
Hispanic          0.01    −0.02   0.03    0.03    0.00    0.02    0.11    0.00    −0.05   0.99
East Asian        0.00    −0.01   0.03    0.03    0.00    0.01    0.08    −0.01   −0.01   −0.03
[Principal coordinates plot: PCO 1 (94.17%) against PCO 2 (3.73%); labelled points: African–American, East Asian, Yoruba, Amhara, Afar & Oromo, European, Hispanic, Anuak & Maale.]
Genetic distances (Fst) between Ethiopian and NIEHS sample populations based on CYP1A2 NS haplotypes. Population pairwise genetic
distances (grey) and P values (upper triangle) are shown above. P values below the 5% significance threshold are shown in bold. A Principal
Coordinates analysis plot of these Fst values is shown below.
Fu and Li’s tests pointed towards selection (purifying or
positive) in CYP1A2 in Amhara and Oromo, but Tajima’s
D and the McDonald–Kreitman test did not detect
selection in any of the populations analyzed in this study.
These tests are, however, known to lack power. Furthermore, as recombination was inferred in the Ethiopian
datasets and many Ethiopian groups have hierarchical
structures [61], it is possible that selective pressures
operating on CYP1A2 would not be detected by commonly used neutrality tests.
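As background to the neutrality statistics discussed above, the sketch below implements the standard formula for Tajima's D from the number of sequences, the number of segregating sites and the mean number of pairwise differences. It is a generic illustration rather than the software used in this study, and the example inputs are hypothetical values that do not come from the CYP1A2 data.

```python
import math


def tajimas_d(n_sequences, segregating_sites, mean_pairwise_differences):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences."""
    n, s = n_sequences, segregating_sites
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_differences - s / a1) / math.sqrt(variance)


# Hypothetical example: 10 sequences, 16 segregating sites, 3.888889 mean pairwise differences.
example_d = tajimas_d(10, 16, 3.888889)
print(round(example_d, 3))
```

A negative value, as in this example, reflects an excess of low-frequency variants relative to neutral expectation, which is the pattern the text describes as consistent with purifying or positive selection (and which demographic expansion can also produce).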
Reduction of both mean intrapopulation gene diversity
and mean interpopulation genetic distance for radical
nonsynonymous mutations in comparison with silent
mutations (which have no effect on protein structure)
in CYP1A2 was consistent with the hypothesis [45,46]
that purifying selection has acted at these nonsynonymous SNP sites. Purifying selection was also evidenced in
the case of conservative nonsynonymous SNPs. Further
support for this phenomenon comes from the approximate
coalescence date boundaries of these nonsynonymous
mutations being consistent with the following hypothesis
[45]: mutations thought to be under purifying selection
may include variants which drifted to high frequencies in
smaller ancestral populations before the substantial
population growth experienced by anatomically modern
humans approximately 100 000 years ago. As effective
population size increased, purifying selection became
more effective and the frequencies and numbers of
nonsynonymous alleles decreased gradually over time
[45].
Table 6
Results of neutrality tests performed on CYP1A2 in the Ethiopian and NIEHS populations

                       Tajima's test         McDonald–Kreitman test        Fu and Li's test with an outgroup
Population        n    Tajima's D (P value)  Fisher's exact test, P value  D test (P value)           F test (P value)
                                             (two tailed)
Afar              118  −0.88 (P > 0.10)      0.34                          −0.84 (P > 0.10)           −1.02 (P > 0.10)
Amhara            124  −1.16 (P > 0.10)      0.32                          −4.24 (0.01 < P < 0.02)    −3.68 (0.01 < P < 0.02)
Anuak             140  −1.24 (P > 0.10)      0.32                          −1.07 (P > 0.10)           −1.35 (P > 0.10)
Maale             132  −0.86 (P > 0.10)      0.34                          −0.88 (P > 0.10)           −1.05 (P > 0.10)
Oromo             102  −0.85 (P > 0.10)      0.34                          −2.55 (0.02 < P < 0.05)    −2.30 (0.02 < P < 0.05)
African–American  14   −0.77 (P > 0.10)      1.00                          −0.81 (P > 0.10)           −0.95 (P > 0.10)
Yoruba            10   −0.63 (P > 0.10)      0.23                          −0.95 (P > 0.10)           −1.03 (P > 0.10)
European          24   0.78 (P > 0.10)       1.00                          0.07 (P > 0.10)            0.33 (P > 0.10)
Hispanic          14   −0.53 (P > 0.10)      1.00                          0.24 (P > 0.10)            0.38 (P > 0.10)
East Asian        26   0.95 (P > 0.10)       1.00                          0.97 (P > 0.10)            1.13 (P > 0.10)

Each test was performed on each of the individual Ethiopian and NIEHS populations to control for the effects of different demographic histories.
n, number of chromosomes.
Fig. 7
[Bar charts: (a) mean gene diversity (h) by SNP category (number of SNP sites); (b) mean genetic distance by SNP category (number of interpopulation genetic distances). SNP categories (number of SNP sites; number of interpopulation genetic distances): 5′ noncoding (2; 90), radical (2; 90), nonsense (1; 45), conservative (12; 540), intron (29; 1305), 3′ UTR (14; 630), synonymous (1; 45).]
Evidence of purifying selection acting on CYP1A2. Mean intrapopulation gene diversity at nonsynonymous (radical, nonsense and conservative),
synonymous and noncoding single nucleotide polymorphism (SNP) sites in the combined Ethiopian and NIEHS populations is shown above (a).
Mean interpopulation genetic distance for all interpopulation comparisons for the same SNP sites is shown below (b). Error bars indicate variance
from the mean. One tail P values from t-tests of the hypothesis that mean gene diversity/genetic distance for each SNP category equals that for
radical nonsynonymous SNP loci are represented as follows: NA, t-test not applicable because of small sample number; NS, not significant,
P > 0.05; *P < 0.05; **P < 0.01; ***P < 0.001. Radical SNPs = 217G > A (G73R) and 5094T > C (F432S), nonsense SNP = 5284C > A (Y495X),
conservative SNPs = 53C > G (S18C), 310G > A (D104N), 331C > T (L111F), 613T > G (F205V), 1460C > T (R281W), 1513C > A (S298R),
3463C > T (T395M), 3468A > C (N397H), 5105G > A (D436N), 5112C > T (T438I), 5253C > G (P485R), 5328G > A (R510Q).
Table 7
Inference of the TMRCA (unbiased estimate plus confidence interval) for CYP1A2 variants and rs11072507
Variants in yellow were assumed to have recombined with rs11072507 (Table S2) and were consequently dated using microsatellites on the background of each of rs11072507 C and G separately and together [C always
produced the younger dates (in blue) which were assumed to be the coalescent dates of the recombination events]. All other CYP1A2 variants (in purple) only occurred on the background of rs11072507 G (Table S2). Date
estimates in green were assumed to be the coalescent dates of the SNPs. CYP1A2 variants are arranged in the order of increasing time to most recent common ancestor (TMRCA). Both 2159 G > A and 5347 C > T could not be
dated on the background of only the rs11072507 G allele because of small sample numbers. Coalescent date estimates would not however have been significantly different between the rs11072507 G and C background in any
case because no more than two G linked chromosomes were available for each variant. In both cases, coalescent dates from rs11072507 G and C combined were assumed to be the coalescent dates of the SNPs.
G, generations; n, chromosome number; n/a, not applicable; Y, years.
As the minor allele frequencies (1% to 11%) at CYP1A2
loci evidenced to be under purifying selection are substantially higher than those of genes causative of severe
Mendelian diseases, the data suggest that the selective
forces acting against these nonsynonymous SNPs are
weak in comparison with those at SNP sites causative of
severe disease [45,46]. Furthermore, as mutations associated with complex diseases are expected to be individually only slightly deleterious, as opposed to highly
deleterious variants associated with Mendelian diseases,
it has been claimed that evidence of weak purifying selection may be used to identify candidate alleles for complex
disease-association studies [45,46]. As a consequence, it
may be appropriate to include nonsynonymous SNPs
identified in this study in future studies investigating
complex diseases which have been linked to CYP1A2,
for example several cancers [19] and cardiovascular
disease [62].
Wooding et al. [56] investigated DNA sequence variation
in a 3.7 kb noncoding sequence 5′ of the CYP1A2 gene in
more than 100 individuals of recent African, Asian and
European ancestry, and presented evidence for positive
selection based on an excess of high-frequency derived
SNPs in comparisons with outgroup species. We provide
evidence of purifying selection within CYP1A2. There are
many possible interpretations of the different conclusions, among which are: (i) positive selection may not be
directly acting on the 5′ region of CYP1A2, but on genetic
loci other than CYP1A2 in LD with the 5′ CYP1A2 locus
[56], (ii) CYP1A2 has, in the course of evolution leading
to anatomically modern humans, been under positive
selection, but subsequently has only been subject to
purifying selection and (iii) although analysis, in this and
the earlier study, may be consistent with selection, random
drift alone could explain the patterns of variation observed.
Conclusion
This study has shown Ethiopian populations to be highly
diverse compared with populations studied from the rest
of the world and have a substantial amount of previously
uncharacterized CYP1A2 variation. There is also evidence
that much of the variation found on a global scale has
been retained. Not only does this serve as further support
for the proposition that some anatomically modern
humans migrated out of Africa via Ethiopia, but also
emphasizes the value of conducting population genetic
research within Ethiopia if appropriate conclusions are to
be formulated concerning populations outside of Ethiopia. Unrecognized variation can lead to unsuitable healthcare intervention and can increase the risk of an adverse
drug reaction. Investigations such as this are therefore not
only of benefit to the indigenous populations of Ethiopia,
but are also of increasing importance in directing public
healthcare policies in the developed world, where the
number of individuals of recent Ethiopian descent is
growing.
Acknowledgements
The authors thank all DNA sample donors, and Professor
Sue Povey and Professor Dallas Swallow for their helpful
discussion. Neil Bradman is chairman of The Centre for
Genetic Anthropology (TCGA) and an honorary lecturer
in the research department of Genetics, Evolution and
Environment at University College London. He is also
joint chairman of the London and City Group of
Companies and has extensive business and financial
interests including involvement in biotechnology ventures and educational material used by researchers in
biomedicine and the life sciences. Nevertheless, he does
not have any specific commercial interest in the subject
matter of this study. The study was funded in part by a
charitable trust of which Neil Bradman is a trustee. The
charitable trust has no intellectual property or other
rights whatsoever with respect to the research, which
forms the subject matter of the paper. All other authors
have no conflict of interest to declare. This study was
supported by the Biotechnology and Biological Sciences
Research Council.
References
1 Campbell MC, Tishkoff SA. African genetic diversity: implications for human
demographic history, modern human origins, and complex disease mapping.
Annu Rev Genomics Hum Genet 2008; 9:403–433.
2 Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S,
et al. Worldwide human relationships inferred from genome-wide patterns of
variation. Science 2008; 319:1100–1104.
3 Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic
diversity of human populations. Curr Biol 2005; 15:R159–R160.
4 Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA,
Feldman MW, Cavalli-Sforza LL. Support from the relationship of genetic
and geographic distance in human populations for a serial founder effect
originating in Africa. Proc Natl Acad Sci U S A 2005; 102:15942–15947.
5 Forster P, Matsumura S. Evolution. Did early humans go north or south?
Science 2005; 308:965–966.
6 Reed FA, Tishkoff SA. African human diversity, origins and migrations. Curr
Opin Genet Dev 2006; 16:597–605.
7 Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A,
et al. The genetic structure and history of Africans and African Americans.
Science 2009; 324:1035–1044.
8 Aklillu E, Carrillo JA, Makonnen E, Hellman K, Pitarque M, Bertilsson L, et al.
Genetic polymorphism of CYP1A2 in Ethiopians affecting induction and
expression: characterization of novel haplotypes with single-nucleotide
polymorphisms in intron 1. Mol Pharmacol 2003; 64:659–669.
9 Jiang Z, Dragin N, Jorge-Nebert LF, Martin MV, Guengerich FP, Aklillu E, et al.
Search for an association between the human CYP1A2 genotype and CYP1A2
metabolic phenotype. Pharmacogenet Genomics 2006; 16:359–367.
10 Li XQ, Bjorkman A, Andersson TB, Gustafsson LL, Masimirembwa CM.
Identification of human cytochrome P(450)s that metabolise anti-parasitic
drugs and predictions of in vivo drug hepatic clearance from in vitro data.
Eur J Clin Pharmacol 2003; 59:429–442.
11 Federal Ministry of Health. Malaria: Diagnosis and treatment guidelines for
health workers in Ethiopia. Addis Ababa: Ethiopia; Federal Democratic
Republic of Ethiopia, Ministry of Health; 2004.
12 Anthony F, Combes C, Astorga C, Bertrand B, Graziosi G, Lashermes P. The
origin of cultivated Coffea arabica L. varieties revealed by AFLP and SSR
markers. Theor Appl Genet 2002; 104:894–900.
13 NCBI36: http://www.ncbi.nlm.nih.gov/. [Accessed 2009].
14 Shimada T, Yamazaki H, Mimura M, Inui Y, Guengerich FP. Interindividual
variations in human liver cytochrome P-450 enzymes involved in the oxidation
of drugs, carcinogens and toxic chemicals: studies with liver microsomes of 30
Japanese and 30 Caucasians. J Pharmacol Exp Ther 1994; 270:414–423.
15 Jiang Z, Dalton TP, Jin L, Wang B, Tsuneoka Y, Shertzer HG, et al. Toward
the evaluation of function in genetic variability: characterizing human SNP
frequencies and establishing BAC-transgenic mice carrying the human
CYP1A1_CYP1A2 locus. Hum Mutat 2005; 25:196–206.
16 Ikeya K, Jaiswal AK, Owens RA, Jones JE, Nebert DW, Kimura S. Human
CYP1A2: sequence, gene structure, comparison with the mouse and rat
orthologous gene, and differences in liver 1A2 mRNA expression. Mol
Endocrinol 1989; 3:1399–1408.
17 Sansen S, Yano JK, Reynald RL, Schoch GA, Griffin KJ, Stout CD, et al.
Adaptations for the oxidation of polycyclic aromatic hydrocarbons
exhibited by the structure of human P450 1A2. J Biol Chem 2007;
282:14348–14355.
18 Flockhart DA. Drug interactions: Cytochrome P450 drug interaction table.
Indiana University School of Medicine 2007: http://medicine.iupui.edu/
clinpharm/ddis/table.asp. [Accessed 2010].
19 Gunes A, Dahl ML. Variation in CYP1A2 activity and its clinical implications:
influence of environmental factors and genetic polymorphisms.
Pharmacogenomics 2008; 9:625–637.
20 Home page of the human Cytochrome P450 (CYP) allele nomenclature
committee: http://www.cypalleles.ki.se/. [Accessed 2010].
21 Klein TE, Chang JT, Cho MK, Easton KL, Fergerson R, Hewett M, et al.
Integrating genotype and phenotype information: an overview of the
PharmGKB project. Pharmacogenetics research network and knowledge
base. Pharmacogenomics J 2001; 1:167–170.
22 NCBI dbSNP 129: http://www.ncbi.nlm.nih.gov/. [Accessed 2009].
23 NIEHS SNPs. NIEHS Environmental Genome Project, University of
Washington, Seattle, WA: http://egp.gs.washington.edu. [Accessed
January 2009].
24 Murayama N, Soyama A, Saito Y, Nakajima Y, Komamura K, Ueno K, et al.
Six novel nonsynonymous CYP1A2 gene polymorphisms: catalytic
activities of the naturally occurring variant enzymes. J Pharmacol Exp Ther
2004; 308:300–306.
25 Soyama A, Saito Y, Hanioka N, Maekawa K, Komamura K, Kamakura S, et al.
Single nucleotide polymorphisms and haplotypes of CYP1A2 in a Japanese
population. Drug Metab Pharmacokinet 2005; 20:24–33.
26 Lewontin RC. The interaction of selection and linkage. I. General
considerations; heterotic models. Genetics 1964; 49:49–67.
27 Abecasis GR, Cookson WO. GOLD–graphical overview of linkage
disequilibrium. Bioinformatics 2000; 16:182–183.
28 Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated
software package for population genetics data analysis. Evol Bioinform
Online 2005; 1:47–50.
29 Excoffier L, Laval G, Balding D. Gametic phase estimation over large
genomic regions using an adaptive window approach. Hum Genomics
2003; 1:7–19.
30 Nei M. Molecular Evolutionary Genetics. New York: Columbia University
Press; 1987.
31 Goudet J, Raymond M, de Meeus T, Rousset F. Testing differentiation in
diploid populations. Genetics 1996; 144:1933–1940.
32 Rousset F, Raymond M. Testing heterozygote excess and deficiency.
Genetics 1995; 140:1413–1419.
33 Reynolds J, Weir BS, Cockerham CC. Estimation of the coancestry
coefficient: basis for a short-term genetic distance. Genetics 1983;
105:767–779.
34 Excoffier L, Smouse PE, Quattro JM. Analysis of molecular variance inferred
from metric distances among DNA haplotypes: application to human
mitochondrial DNA restriction data. Genetics 1992; 131:479–491.
35 Gower JC. Some distance properties of latent root and vector methods
used in multivariate analysis. Biometrika 1966; 53:325–328.
36 The R project for statistical computing: http://www.r-project.org/.
[Accessed 2009].
37 Veeramah KR, Thomas MG, Weale ME, Zeitlyn D, Tarekegn A, Bekele E, et al.
The potentially deleterious functional variant flavin-containing
monooxygenase 2*1 is at high frequency throughout sub-Saharan Africa.
Pharmacogenet Genomics 2008; 18:877–886.
38 PolyPhen: Prediction of functional effect of human nsSNPs:
http://genetics.bwh.harvard.edu/pph/. [Accessed 2009].
39 Bandelt HJ, Forster P, Röhl A. Median-joining networks for inferring
intraspecific phylogenies. Mol Biol Evol 1999; 16:37–48.
40 Fluxus-engineering.com: http://www.fluxus-engineering.com/. [Accessed
2009].
41 Tajima F. Statistical method for testing the neutral mutation hypothesis by
DNA polymorphism. Genetics 1989; 123:585–595.
42 McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in
Drosophila. Nature 1991; 351:652–654.
43 Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics 1993;
133:693–709.
44 Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics 2009; 25:1451–1452.
45 Hughes AL, Packer B, Welch R, Bergen AW, Chanock SJ, Yeager M.
Widespread purifying selection at polymorphic sites in human
protein-coding loci. Proc Natl Acad Sci U S A 2003; 100:15754–15757.
46 Hughes AL, Packer B, Welch R, Bergen AW, Chanock SJ, Yeager M. Effects
of natural selection on interpopulation divergence at polymorphic sites in
human protein-coding loci. Genetics 2005; 170:1181–1187.
47 Goldstein DB, Ruiz LA, Cavalli-Sforza LL, Feldman MW. Genetic absolute
dating based on microsatellites and the origin of modern humans. Proc Natl
Acad Sci U S A 1995; 92:6723–6727.
48 Slatkin M. A measure of population subdivision based on microsatellite allele
frequencies. Genetics 1995; 139:457–462.
49 Behar DM, Thomas MG, Skorecki K, Hammer MF, Bulygina E,
Rosengarten D, et al. Multiple origins of Ashkenazi levites: Y chromosome
evidence for both Near Eastern and European ancestries. Am J Hum Genet
2003; 73:768–779.
50 Farrall M, Weeks DE. Mutational mechanisms for generating microsatellite
allele-frequency distributions: an analysis of 4558 markers. Am J Hum
Genet 1998; 62:1260–1262.
51 Tremblay M, Vezina H. New estimates of intergenerational time intervals for
the calculation of age and origins of mutations. Am J Hum Genet 2000;
66:651–658.
52 Kimura M. Rare variant alleles in the light of the neutral theory. Mol Biol Evol
1983; 1:84–93.
53 Maquat LE. Nonsense-mediated mRNA decay in mammals. J Cell Sci 2005;
118:1773–1776.
54 Liang HC, Li H, McKinnon RA, Duffy JJ, Potter SS, Puga A, et al.
Cyp1a2(–/–) null mutant mice develop normally but show deficient drug metabolism.
Proc Natl Acad Sci U S A 1996; 93:1671–1676.
55 Smith AG, Davies R, Dalton TP, Miller ML, Judah D, Riley J, et al. Intrinsic
hepatic phenotype associated with the Cyp1a2 gene as shown by cDNA
expression microarray analysis of the knockout mouse. EHP
Toxicogenomics 2003; 111:45–51.
56 Wooding SP, Watkins WS, Bamshad MJ, Dunn DM, Weiss RB, Jorde LB.
DNA sequence variation in a 3.7-kb noncoding sequence 5′ of the CYP1A2
gene: implications for human population history and natural selection. Am J
Hum Genet 2002; 71:528–542.
57 Federal Democratic Republic of Ethiopia Office of Population and Housing
Census Commission Central Statistical Authority. The 1994 population and
housing census for Ethiopia. Results at country level. Volume 2 analytical
report. Addis Ababa: Central Statistical Authority; 1999.
58 Tishkoff SA, Verrelli BC. Patterns of human genetic diversity: implications for
human evolutionary history and disease. Annu Rev Genomics Hum Genet
2003; 4:293–340.
59 McDougall I, Brown FH, Fleagle JG. Stratigraphic placement and age of
modern humans from Kibish, Ethiopia. Nature 2005; 433:733–736.
60 White TD, Asfaw B, DeGusta D, Gilbert H, Richards GD, Suwa G, et al.
Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature 2003;
423:742–747.
61 Freeman D, Pankhurst A. Peripheral people. The excluded minorities of
Ethiopia. United Kingdom: C. Hurst and Co. Ltd; 2003.
62 Cornelis MC, El-Sohemy A, Campos H. Genetic polymorphism of
CYP1A2 increases the risk of myocardial infarction. J Med Genet 2004;
41:758–762.