doi: 10.1111/j.1469-1809.2010.00638.x
Prevalence of Clinically Relevant UGT1A Alleles
and Haplotypes in African Populations
Laura J. Horsfall1, David Zeitlyn2, Ayele Tarekegn3, Endashaw Bekele4, Mark G. Thomas1,5, Neil Bradman3 and Dallas M. Swallow1∗
1 Department of Genetics, Evolution and Environment, University College London, Wolfson House, London NW1 2HE, UK
2 Institute of Social and Cultural Anthropology, School of Anthropology and Museum Ethnography, University of Oxford
3 The Centre for Genetic Anthropology, Department of Genetics, Evolution and Environment, University College London
4 Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
5 Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden
Summary
Variation of a short (TA)n repeat sequence (rs8175347) covering the TATA box of UGT1A1 (UDP-glucuronosyltransferase 1A1) is associated with hyperbilirubinaemia (Gilbert’s syndrome) and adverse drug reactions, and is used for dosage advice for irinotecan. Several reports indicate that the low-activity (risk) alleles ((TA)7 and (TA)8) are very frequent in Africans, but the patterns of association with other variants in the UGT1A gene complex that may modulate these responses are not well known. rs8175347 and two other clinically relevant UGT1A variants (rs11692021 and rs10929302) were assayed in 2616 people from Europe and Africa. Low-activity (TA)n allele frequencies were highest in equatorial Africa, (TA)7 being the most common allele in Cameroon, Ghana, southern Sudan, and in Ethiopian Anuak. Haplotypic diversity was also greatest in equatorial Africa, but in Ethiopia was very variable across ethnic groups. Resequencing of the promoter of a sample subset revealed no novel variations, but rs34547608 and rs887829 were typed and shown to be tightly associated with (TA)n. Our results illustrate the need for investigation of the effect of UGT1A variants other than (TA)n on the risk of irinotecan toxicity, as well as hyperbilirubinaemia due to hemolytic anaemia or human immunodeficiency virus protease inhibitors, so that appropriate pharmacogenetic advice can be given.
Keywords: Drug metabolism, UDP-glucuronosyltransferase 1A1, UDP-glucuronosyltransferase 1A7, haplotype
diversity, allele frequency, population structure, UGT1A gene complex, bilirubin
Introduction
UDP-glucuronosyltransferase 1A isoform 1 (UGT1A1) is a
phase II drug metabolizing enzyme responsible for converting a wide array of drugs to water-soluble glucuronides suitable for renal or biliary elimination (MIM *191740). UGT1A1
is also the main isozyme capable of conjugating bilirubin,
the endogenous yellow pigment resulting from natural haeme
catabolism (Bosma et al., 1994).
The inherited hyperbilirubinaemia known as Gilbert’s syndrome (MIM *143500), for which intermittent episodes of
jaundice are the most widely recognised symptom, is attributable to reduced UGT1A1 activity (Bosma et al., 1995).
∗Corresponding author: Dallas M. Swallow, Department of Genetics, Environment and Evolution, University College London, Wolfson House, London NW1 2HE, UK. Tel: 0207-679-5040; Fax: 0207-387-3496; E-mail: d.swallow@ucl.ac.uk
Annals of Human Genetics (2011) 75, 236–246
Gilbert’s syndrome has been studied mainly in European and
East Asian populations where its prevalence is estimated at
3–9% (Kornberg, 1942; Bosma et al., 1995; Owens & Evans,
1975; Gwee et al., 1992; Buyukasik et al., 2008). The underlying genetic cause in most populations is considered to
be homozygosity for seven thymine–adenine repeats (TA)7
(UGT1A1*28, rs8175347) in the TATA box promoter motif
(Bosma et al., 1995; Borlak et al., 2000) and mean bilirubin
levels of (TA)7 homozygotes are approximately double those
of (TA)6 homozygotes (Lampe et al., 1999; Premawardhena
et al., 2003; Lin et al., 2006). Although bilirubin is neurotoxic at very high levels, particularly in children, it is a potent antioxidant, and moderately elevated levels have been proposed to protect against adult oxidative stress-mediated diseases (Stocker et al., 1987). Indeed, strong negative associations have been observed between bilirubin level and incidence of cancer and cardiovascular disease (Novotny & Vitek, 2003; Zucker et al., 2004; Temme et al., 2001). Raised bilirubin levels can also inhibit replication in vitro of various blood pathogens including pneumococcus, the malaria parasite Plasmodium, and human immunodeficiency virus (HIV) (Najib, 1937; McPhee et al., 1996; Kumar et al., 2008).
© 2011 The Authors
Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London
The prevalence of (TA)7 homozygosity in European
populations is 6–10% (Premawardhena et al., 2003). Even
higher frequencies have been reported in sub-Saharan Africa
(Premawardhena et al., 2003). Though rarely identified in
other populations, two additional repeat alleles, (TA)5 and
(TA)8, are also present at low frequency in people of recent
African descent (Beutler et al., 1998; Premawardhena et al.,
2003). There is a negative association between UGT1A1 expression and repeat length of the four alleles, attributable to
decreasing promoter activity acting via altered affinity for the
TATA-binding protein (Beutler et al., 1998; Hsieh et al.,
2007). Although the (TA)n alleles appear to have similar effects
on bilirubin levels in people of recent African descent (Chaar
et al., 2005; Hong et al., 2007; Carpenter et al., 2008) and
low-activity alleles confer significantly raised risk of developing gallstones requiring surgery (Passon et al., 2001; Heeney
et al., 2003), Gilbert’s syndrome is rarely diagnosed in Africa
(Bougouma et al., 1999).
Homozygosity for (TA)7 has also been associated with
adverse drug reactions (ADRs) due to reduced clearance,
most notably life-threatening toxicity to chemotherapy with
high-dose irinotecan (Hoskins et al., 2007). Data supporting this association led the Food and Drug Administration
(FDA) in 2004 to alter the label to recommend a lower starting dose for patients with the (TA)7 /(TA)7 genotype (New
Drug Application 20-571). Severe hyperbilirubinaemia following treatment with the HIV protease inhibitors indinavir
and atazanavir is also much more frequent in (TA)7 homozygotes due to the inhibitory effect of these drugs on UGT1A1
activity (Danoff et al., 2004; Zhang et al., 2005; Lankisch
et al., 2006; Rodriguez-Novoa et al., 2007). However, it
is likely that the pharmacogenetic effects of (TA)7 are confounded by additional common functional variants located
in the UGT1A1 regulatory regions and in the other enzymes encoded by the UGT1A gene complex (Lankisch et al.,
2006). This gene complex encodes the nine UGT1A isoforms
and in Europeans and East Asians a region of strong linkage disequilibrium (LD) extends across much of the complex
(about 90 kb) (Innocenti et al., 2005). Several other “low-activity” alleles reside on the same haplotype as (TA)7 in European populations (Innocenti et al., 2002; Kohle et al., 2003;
Innocenti et al., 2005; Menard et al., 2009), and probably
play a role in outcomes associated with irinotecan and HIV
therapy (Lankisch et al., 2006; Lankisch et al., 2009). Lower
levels of LD across the UGT1A gene complex have been reported in African–Americans and for the Yoruba of Nigeria
(Innocenti et al., 2002; Odeberg et al., 2006; Hong et al.,
2007), but studies for other indigenous African groups are
lacking.
It is increasingly clear that the people of the African continent show higher levels of genetic diversity and population
substructure than most human populations on other continents. The study of pharmacogenetically relevant variation
in Africa is thus particularly important for identifying groups
potentially at risk of poor drug response or ADRs. While
cancer therapy with irinotecan must be comparatively rare
on the African continent, this drug is used to treat people of
recent African descent in the United States and Europe. In addition, the possible implications of UGT1A variation for the HIV treatments that are subsidised for use across Africa, and the negative interaction of low-activity UGT1A1 (TA)n promoter variants with inherited blood disorders common in parts of Africa, are of considerable clinical importance in Africa itself (Chaar et al., 2005; Kaplan et al., 2008).
Our first aim was therefore to establish the allele frequencies of (TA)n in different parts of the continent in relation
to geography and ethnic origin. The second aim was to determine whether there are differences, across these defined
populations, in the haplotype backgrounds of the (TA)7 allele with respect to other functional single-nucleotide polymorphisms (SNPs), which might indicate greater functional
diversity both in Africa and in people of recent African
descent. For this study, two SNPs were selected that are
thought to play a role in irinotecan metabolism and toxicity
(Innocenti et al., 2004; Cote et al., 2007). In order to determine whether there is any further variation in the immediate
promoter that might modulate expression in some Africans,
we also sequenced the region immediately upstream of the
start of translation of UGT1A1 in a subset of the samples.
Materials and Methods
Samples
The 2616 buccal DNA samples analysed in this study are part
of an in-house collection assembled by The Centre for Genetic
Anthropology at University College London. All samples were
collected from ostensibly healthy individuals unrelated at the paternal grandfather level and were anonymous, since names were
not recorded. They were collected between 1998 and 2007 with
informed consent and ethical approval (UCLH 99/0196). The
samples tested were from 18 countries across six geographic regions defined as follows: North Europe (NE), the Middle East
(ME), North Africa (NA), West Africa (WA), Central East Africa
(CEA), and South East Africa (SEA) (Veeramah et al., 2008). Selfreported cultural identity/ethnicity and language details were
also available for the majority of the panel. For the most detailed
analyses, country subgroups of 40 or more individuals of the
same self-declared cultural identity/language group were tested
separately.
Genotyping
For this study, two SNPs were selected in addition to the (TA)n variant: rs10929302 (−3156G > A, UGT1A1*93), located in the phenobarbital response enhancer module (PBREM) approximately 3 kb upstream of the (TA)n variant, which has been claimed to better predict irinotecan toxicity (Innocenti et al., 2004; Cote et al., 2007); and the nonsynonymous SNP rs11692021 (Trp208Arg, UGT1A7*3), located in the substrate-binding exon of UGT1A7 (MIM *606432) approximately 90 kb upstream of (TA)n, which reduces glucuronidation of SN-38, the active metabolite of irinotecan. All three loci are in strong LD in European populations (Kohle et al., 2003; Innocenti et al., 2002).
The (TA)n variant was assayed by a previously reported technique using high-percentage polyacrylamide gels (Sampietro et al., 1998). The selected SNPs were assayed using TaqMan technology (Applied Biosystems, Foster City, CA). TaqMan probes were designed by Applied Biosystems and polymerase chain reactions were performed in 384-well microplates using a gradient cycler. TaqMan probes are reported in supplementary Table S1A. Fluorescence was measured using an ABI Prism 7000 sequence detection system (Applied Biosystems, Applera UK, Warrington, Cheshire, UK), and genotypes were assigned with 95% confidence using ABI Prism 7000 SDS software version 2.1. A batch of 368 samples from African and non-African groups was first tested to check that there was adequate allelic variation in the populations under study, and of these, 156 samples were replicated in the larger panel to validate typing. Call rates were >95% for rs10929302 and >92% for UGT1A7 rs11692021. In all instances, researchers were blind to the sample origin at the time of typing.
A region upstream (−380) and downstream (+60) of the ATG start site of UGT1A1 (∼−330 to ∼+100 relative to the (TA)n sequence) was resequenced in a subset of 372 African samples chosen to represent each geographic region (65 from Algeria [NA], 82 from Cameroon [WA], 148 from Ethiopia [CEA], and 77 from Malawi [SEA]; most were from Ethiopia, which is the most diverse country) using an ABI 96-capillary 3730xl DNA Analyzer (Applied Biosystems, Applera, UK) (see Table S2 for sequencing primers). This allowed typing of rs34547608 (at −52 bp from (TA)n) and rs887829 (at −310 bp from (TA)n). In all cases, genotypes were inferred assuming no silent alleles.
Data Analyses
All analyses were performed using Arlequin 3.1 unless otherwise
specified (Excoffier et al., 2005).
Exact tests for deviation from Hardy–Weinberg equilibrium
were performed (using 10,000 steps in a Markov chain; 10,000
dememorization steps). For display on the map in Figure 1, (TA)n
genotypes were recoded into three “expression” phenotypes using groups assigned from bilirubin levels in a study on people
with recent African ancestry (African-Caribbean) (Chaar et al.,
2005). Comparisons of genetic distances between populations
(regions, countries, and ethnic groups) based on (TA)n genotype frequencies were made by calculating pairwise FST values (10,000 permutations). Because of the large number of different ethnic groups with very few individuals, we limited the analysis of ethnic groups to those with at least 40 members (n = 1838). To visualize these differences, principal coordinates analysis (PCO) was performed on FST matrices within the R programming environment using routines in the APE package. Genetic similarity was quantified as one minus FST.
Values along the main diagonal, which represent the similarity
of each population to itself, were calculated from the estimated
genetic distance between two copies of the same sample by the
formula n/(n−1).
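The recoding of (TA)n genotypes into three expression classes for Figure 1 can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `expression_class` is hypothetical, the class boundaries follow the Figure 1 caption, and the treatment of every genotype carrying (TA)5 as high expression is an assumption drawn from that caption's wording.

```python
def expression_class(genotype):
    """Classify a (TA)n genotype given as a pair of repeat counts, e.g. (6, 7).

    Assumed classes (from the Figure 1 caption):
      low          -> 7/7, 7/8, 8/8
      intermediate -> 6/7, 6/8
      high         -> 6/6 or any genotype carrying (TA)5
    """
    a, b = sorted(genotype)
    if a not in (5, 6, 7, 8) or b not in (5, 6, 7, 8):
        raise ValueError(f"unexpected (TA)n genotype: {genotype}")
    if a == 5 or (a, b) == (6, 6):
        return "high"
    if a == 6:  # remaining cases with one (TA)6 allele: 6/7 or 6/8
        return "intermediate"
    return "low"  # 7/7, 7/8 or 8/8


if __name__ == "__main__":
    for g in [(6, 6), (7, 6), (7, 7), (5, 8)]:
        print(g, expression_class(g))
```

Sorting the pair first means the caller does not need to order the two alleles.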
The D′ measure of LD between the three genotyped loci was calculated using LDMax, which uses the expectation-maximization algorithm to determine phase and is available as part of the GOLD software package (http://www.sph.umich.edu/csg/abecasis/GOLD/docs/stats.html). Haplotypes were inferred using PHASE v2.1.1 (100 iterations; 500 burn-in).
The resulting haplotype frequencies were used to calculate
Nei’s gene diversity index (h) and population differentiation
using exact tests (Markov chain length 100,000 steps). Where
appropriate, the standard Bonferroni correction for multiple
testing was applied by multiplying the significance value by the
number of comparisons.
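The two summary calculations named above can be illustrated with a short sketch. It assumes the commonly used sample-size-corrected form of Nei's gene diversity, h = n/(n−1) × (1 − Σp²), where n is the number of chromosomes sampled; whether Arlequin applies exactly this form is an assumption here. The function names and the example numbers are hypothetical.

```python
def nei_gene_diversity(freqs, n_chromosomes):
    """Nei's gene diversity h from haplotype frequencies.

    Assumed form: h = n / (n - 1) * (1 - sum(p_i ** 2)),
    with n the number of chromosomes in the sample.
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    homozygosity = sum(p * p for p in freqs)
    return n_chromosomes / (n_chromosomes - 1) * (1.0 - homozygosity)


def bonferroni(p_value, n_comparisons):
    """Standard Bonferroni adjustment: multiply p by the number of tests, cap at 1."""
    return min(1.0, p_value * n_comparisons)


if __name__ == "__main__":
    # Hypothetical sample: two haplotypes at equal frequency on 10 chromosomes.
    print(nei_gene_diversity([0.5, 0.5], 10))
    # Hypothetical p-value adjusted for 18 comparisons.
    print(bonferroni(0.01, 18))
```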
Results
UGT1A1 (TA)n Allele Frequencies
The allele frequencies of UGT1A1 (TA)n are presented in
Table 1 (see Table S3 for genotype frequency data). The allele frequency of (TA)7 ranged from 0.32 in Yemen and the
Chewa of Malawi to 0.60 in the Anuak of Ethiopia. In Tanzania, Uganda, southern Sudan, Nigeria, Ethiopian Anuak, and
all ethnic groups in Cameroon and Ghana, (TA)7 is the most
common variant. The (TA)5 and (TA)8 alleles, which were
not detected in the British sample, were present at low frequencies in all of the other groups tested. Overall, the (TA)5
allele was more prevalent than (TA)8 and reached a frequency
of 0.10 or above in five of the 13 sub-Saharan African countries. The Ethiopian Anuak was the only sub-Saharan ethnic
group without a single occurrence of (TA)5 . Although this
is a dinucleotide repeat (or microsatellite) polymorphism, no
novel alleles were identified.
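Allele frequencies such as those in Table 1 follow from genotype data by simple allele counting. The sketch below is illustrative only: the function name `allele_frequencies` and the example genotypes are hypothetical and are not taken from the study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {repeat_count: frequency} from diploid (TA)n genotypes.

    Each genotype is a pair of repeat counts, e.g. (6, 7); every
    individual contributes two chromosomes to the total.
    """
    counts = Counter()
    for a, b in genotypes:
        counts[a] += 1
        counts[b] += 1
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical sample of four individuals.
    sample = [(6, 6), (6, 7), (7, 7), (6, 7)]
    print(allele_frequencies(sample))
```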
The geographic and ethnic distribution of inferred low-,
intermediate- and high-expression phenotypes based on recoded genotype data are presented in Figure 1. The distribution shows that low-activity genotypes are highly prevalent in
equatorial regions of Africa and that Ethiopia has the highest
within-country variability between ethnic groups.
Pairwise FST results and associated p-values are shown in supplementary Tables S4A–C. The pairwise FST values show significant differentiation between sub-Saharan African regions and regions outside of sub-Saharan Africa (though for CEA, statistical significance did not remain after Bonferroni correction). However, there was little differentiation between countries within regions or between ethnic groups within countries in most cases. The exceptions were Senegal in the WA region and the Ethiopian ethnic groups. A PCO plot derived from pairwise FST measurements between all the distinct ethnic/language groups shows that while the SEA groups cluster, the CEA and WA groups are more differentiated (Fig. 2). The increasing values on the first principal component axis broadly correspond to increasing (TA)7 frequencies.
Figure 1 Distribution of UGT1A1 (TA)n rs8175347 genotypes categorized as low ((TA)7/7, (TA)7/8, or (TA)8/8), intermediate ((TA)6/7 or (TA)6/8), and high ((TA)5 or homozygous (TA)6/6) expression genotypes across countries and country subgroups. See Table 1 for full details of groups. Bantu is short for Bantu language speakers. Note that the frequencies of the low-activity alleles in the different country groups are significantly higher in the equatorial belt (+10 to −10 latitude) than elsewhere (p = 0.000015, Student’s t-test) and also significantly higher for the equatorial belt than the rest of Africa (p = 0.00019). However, also note the interethnic differences, particularly in Ethiopia.
Analysis of the Two SNPs, rs11692021 and rs10929302
The allele frequencies of the two SNPs by country and ethnic
group are shown in Table 1. The globally minor allele of the
UGT1A7 nonsynonymous SNP rs11692021 was at highest
frequency in the countries and individual ethnic groups in
the CEA region (range: 0.33–0.53), at relatively lower frequency in SEA (range: 0.15–0.29) and at intermediate frequency in WA and the regions outside of sub-Saharan Africa
(range: 0.21–0.41). A similar pattern was seen with UGT1A1
rs10929302, though the differences were less marked.
Variability of LD in Different Countries and Ethnic Groups
Pairwise D′ values, which give a measure of recombination, were calculated using data from countries and ethnic groups separately. There were distinct differences in the patterns of LD in each of the groups (see supplementary Table S5 for D′ values). Samples from the countries outside of sub-Saharan Africa showed the highest LD, with D′ of greater than 0.92 between the (TA)n and the UGT1A1 PBREM SNP rs10929302. The CEA region showed the lowest level of LD. Within Ethiopia, significant LD was detected across all three pairs of loci in the Anuak but for none in the Oromo.
Table 1 Allele frequency (≥1%) by country and country subgroup (based on self-declared cultural identity/ethnic group or language group) of (TA)n and the two SNPs, rs10929302 and rs11692021.
Gene locus, rs numbers and * nomenclature: UGT1A1 rs8175347 ((TA)5 = UGT1A1*36, (TA)6 = UGT1A1*1, (TA)7 = UGT1A1*28, (TA)8 = UGT1A1*37); UGT1A1 rs10929302 (−3156 A = UGT1A1*93); UGT1A7 rs11692021 (+622 C = UGT1A7*3/*4).
Region   Country       Group                Number  (TA)5  (TA)6  (TA)7  (TA)8  −3156 A  +622 C
NE       Britain       British              90      0.00   0.64   0.35   0.00   0.33     0.35
ME       Turkey        Anatolian            96      0.00   0.62   0.38   0.01   0.35     0.41
ME       Yemen         All                  120     0.03   0.64   0.32   0.00   0.32     0.35
NA       Morocco       All                  89      0.02   0.63   0.35   0.00   0.30     0.36
NA       Morocco       Ifrane Berbers       72      0.02   0.65   0.33   0.00   0.27     0.35
NA       Algeria       All                  183     0.01   0.64   0.34   0.01   0.32     0.38
WA       Senegal       All                  191     0.10   0.49   0.36   0.04   0.27     0.25
WA       Senegal       Manjak               96      0.13   0.45   0.39   0.03   0.31     0.21
WA       Senegal       Wolof                95      0.07   0.54   0.34   0.05   0.24     0.30
WA       Ghana         All                  135     0.10   0.37   0.47   0.05   0.37     0.29
WA       Ghana         Bulsa                90      0.11   0.39   0.46   0.04   0.40     0.30
WA       Ghana         Kasena               45      0.10   0.33   0.49   0.08   0.30     0.27
WA       Nigeria       Igbo                 99      0.09   0.43   0.45   0.03   0.33     0.24
WA       Cameroon      All                  273     0.05   0.41   0.50   0.04   0.40     0.31
WA       Cameroon      Arabe                90      0.02   0.41   0.53   0.04   0.43     0.37
WA       Cameroon      Kotoko               42      0.06   0.43   0.46   0.05   0.38     0.36
WA       Cameroon      Mambila              76      0.08   0.38   0.51   0.03   0.44     0.26
CEA      Ethiopia      All                  367     0.01   0.54   0.43   0.02   0.36     0.50
CEA      Ethiopia      Amhara               149     0.00   0.63   0.35   0.02   0.31     0.53
CEA      Ethiopia      Anuak                106     0.00   0.37   0.60   0.03   0.47     0.42
CEA      Ethiopia      Oromo                102     0.01   0.58   0.38   0.02   0.36     0.53
CEA      Sudan North   All                  156     0.02   0.52   0.36   0.03   0.32     0.46
CEA      Sudan South   All                  129     0.04   0.39   0.48   0.05   0.40     0.33
CEA      Sudan South   Dinka                41      0.05   0.39   0.53   0.04   0.38     0.40
SEA      Uganda        All                  39      0.06   0.41   0.46   0.06   0.39     0.24
SEA      Tanzania      Chagga               50      0.09   0.43   0.43   0.01   0.36     0.29
SEA      Malawi        All                  260     0.13   0.48   0.33   0.05   0.27     0.19
SEA      Malawi        Chewa                91      0.14   0.51   0.32   0.03   0.28     0.19
SEA      Malawi        Tumbuka              61      0.16   0.48   0.33   0.03   0.28     0.19
SEA      Malawi        Yao                  56      0.11   0.48   0.35   0.06   0.26     0.19
SEA      Mozambique    Sena Bantu speakers  84      0.13   0.48   0.37   0.03   0.28     0.16
SEA      Zimbabwe      All                  56      0.05   0.54   0.35   0.01   0.34     0.20
SEA      South Africa  All                  197     0.10   0.50   0.34   0.06   0.26     0.17
SEA      South Africa  Bantu speakers       101     0.10   0.50   0.32   0.07   0.27     0.15
SEA      South Africa  Lemba                96      0.09   0.48   0.38   0.04   0.26     0.18
Overall                                     2616    0.06   0.51   0.39   0.03   0.33     0.32
Rows marked “All” represent the full dataset for the country, and include the groups listed below them as well as individuals in groups of fewer than 40; frequency data are also shown for more uniform ethnic groups with datasets of more than 40 people. The “Arabe” from near Lake Chad, Cameroon are described by anthropologists as Choa, Shuwa, or Shewa Arabs. The “Bantu speakers” refer to Bantu language speakers, of varied or unrecorded ethnic group, collected in Sena, Mozambique, or in South Africa, where they were considered distinct from the Lemba collected at the same time. The Yemeni samples (various ethnicities) were all collected in the Hadramaut region. All alleles are shown for rs8175347 and only minor alleles (globally) are shown for rs10929302 and rs11692021. After adjusting for multiple comparisons, there was no evidence for significant deviation from Hardy–Weinberg equilibrium within regions, countries, or ethnic groups for any of the loci, with respectively 1/18, 2/54, and 3/66 comparisons across the three loci showing p-values of between 0.05 and 0.01 before correction. Full genotype data are shown in Table S4. NE = Northern Europe; ME = Middle East; NA = North Africa; WA = West Africa; CEA = Central East Africa; SEA = South East Africa.
Figure 2 Principal coordinates plot of the pairwise FST values for the country subgroups, calculated using UGT1A (TA)n frequency data. NE = Northern Europe; ME = Middle East; NA = North Africa; WA = West Africa; CEA = Central East Africa; SEA = South East Africa. See Table 1 for full details of groups. Bantu is short for Bantu language speakers. See supplementary Tables S4A–C for the pairwise FST data and p-values. This plot shows the clustering of the SEA groups, which contrasts with the much greater genetic distances between the CEA groups.
Haplotype frequencies estimated using PHASE are presented in Table 2. Very similar frequencies were obtained using
the expectation-maximization algorithm (data not shown).
The haplotype frequencies and estimated diversity indices (see
supplementary Fig. S1 for Nei’s h values) show that haplotypic
diversity is greater in sub-Saharan Africa. The haplotype encompassing all three “high” activity alleles (TG6: ancestral
T allele of rs11692021 and the G allele of rs10929302 together with (TA)6 ) is the most prevalent in all groups except
for the Ethiopian Anuak (where the low-activity haplotype
CA7 is slightly more frequent). The only other major haplotype background for (TA)6 was CG6. Overall (TA)7 occurs
most frequently as part of haplotype CA7 (derived C allele
of rs11692021 and the derived A allele of rs10929302) but
has a more diverse haplotypic background in the sub-Saharan
groups where the frequencies of TA7 and TG7 were found to be over 0.10 in many instances. Although (TA)8 is relatively rare, its haplotypic background appears the most variable, while the other rare allele, (TA)5, has only one major
background (TG5), again a combination with the ancestral
SNP alleles (see Table S7 for comments on ancestral alleles).
Exact tests of population differentiation using haplotype frequencies (see supplementary Table S6 for p-values) show the
most genetic differentiation between the Europeans (British
and Turkish), and all others, and also between the Ethiopian
Amhara and Oromo, and all others (including the Ethiopian
Anuak).
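For readers less familiar with the LD measure reported here, the normalised disequilibrium coefficient between two loci can be sketched as below. This is a simplified illustration, not the LDMax implementation: it assumes two alleles per locus and haplotype frequencies that are already known, whereas the study estimated phase by expectation-maximization and (TA)n has up to four alleles. The function name `d_prime` and all input values are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalised LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max


if __name__ == "__main__":
    # Hypothetical case: allele A only ever occurs together with allele B.
    print(d_prime(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Hypothetical case: the two loci are independent.
    print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Because the raw coefficient D is bounded by the allele frequencies, dividing by its maximum attainable value makes figures such as "greater than 0.92" comparable across populations with different allele frequencies.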
Resequencing of the UGT1A1 Promoter
As a pilot study to check the promoter sequence context of
the low-activity (TA)n alleles in Africans, sequence of the
immediate UGT1A1 promoter region (−250 from the (TA)n
sequence to +100 of the (TA)n sequence) was scanned for
Table 2 Estimated haplotype frequency (>1%) by country and country sub-group.
Table 3 Haplotypes comprising the three promoter loci (from left rs887829, rs34547608, and (TA)n ) inferred using PHASE for a subset of
African samples (n = 372).
Promoter haplotype
CT6
TT7
CC5
TT8
CT7
TT6
CC6
CT5
CT8
TT5
Total chromosomes
NA
Algeria
Amhara
CEA
Anuak
0.677
0.308
0.700
0.280
0.326
0.612
0.008
0.020
0.054
0.008
Oromo
0.537
0.417
0.009
0.009
0.028
WA
Cameroon
SEA
Malawi
0.470
0.445
0.037
0.043
0.448
0.325
0.156
0.039
0.006
0.006
0.006
0.008
0.006
130
50
138
108
164
0.006
0.006
154
Frequency
0.502
0.410
0.042
0.031
0.007
0.003
0.001
0.001
0.001
0.001
Haplotypes are named according to their allele composition. See Table S3 for details of the extended haplotypes and the Methods section for details of samples. In the vast majority of cases (98.7%), the C allele of rs887829 is found with (TA)5 or (TA)6 and the T allele is found with (TA)7 or (TA)8.
a total of 372 samples. No novel variation was identified,
but the previously reported rs34547608 and rs887829 were
found and typed in all 372 individuals. These SNPs do not
significantly increase the haplotypic diversity, the rs34547608
C allele being very tightly associated with the (TA)5 allele (confirming previous reports based on data from 101 African–Americans; Beutler et al., 1998), and the rs887829 T allele being tightly associated with (TA)7 and also (TA)8. Inferred
three locus haplotype frequencies are reported in Table 3, and
five locus haplotypes in supplementary Table S7.
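The tightness of the association between rs887829 and (TA)n can be checked against the overall haplotype frequencies in Table 3. The sketch below assumes that the values in the Frequency column correspond to the haplotypes in the order listed, and that each haplotype name reads rs887829 allele, rs34547608 allele, repeat number, as stated in the table caption; the names `TABLE3_OVERALL` and `is_concordant` are illustrative.

```python
# Overall three-locus haplotype frequencies as listed in Table 3
# (assumed to be aligned with the haplotype names in the order given).
TABLE3_OVERALL = {
    "CT6": 0.502, "TT7": 0.410, "CC5": 0.042, "TT8": 0.031, "CT7": 0.007,
    "TT6": 0.003, "CC6": 0.001, "CT5": 0.001, "CT8": 0.001, "TT5": 0.001,
}


def is_concordant(haplotype):
    """True if rs887829 C sits with (TA)5/(TA)6, or T with (TA)7/(TA)8."""
    rs887829_allele = haplotype[0]
    repeats = int(haplotype[2])
    if rs887829_allele == "C":
        return repeats in (5, 6)
    return repeats in (7, 8)


concordant = sum(f for h, f in TABLE3_OVERALL.items() if is_concordant(h))

if __name__ == "__main__":
    print(round(concordant, 3))  # 0.987
```

Under these assumptions the concordant haplotypes sum to 0.987, which agrees with the 98.7% quoted in the Table 3 footnote.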
Discussion
In this paper, we confirm previous observations that the promoter variant (TA)7 of UGT1A1, which is associated with
reduced UGT1A1 activity, hyperbilirubinaemia, and specific
ADRs, occurs in a region of strong LD in non-African populations. However, in Africa, where there are also more (TA)n alleles ((TA)5, (TA)6, (TA)7 and (TA)8), we show more heterogeneity of haplotype background as well as large differences in the frequency of the alleles in different regions. Overall there is a geographic trend: the low-activity (TA)n genotypes are more prevalent in the equatorial regions, where haplotype diversity is also greater.
There are also differences between ethnic groups within
a single country, and these are statistically significant in the
case of Ethiopia. The Anuak show much higher frequencies
of the (TA)7 allele and the low-activity haplotype, while the
Oromo show a very high level of haplotype diversity. These
distributions may simply reflect demography but it is interesting to note an apparent correspondence to the distribution of malaria. For example, the Ethiopian Anuak with the
highest frequencies of (TA)7/8 live in the low-lying western
regions around Gambella, where malaria is endemic, whereas
the Amhara and Oromo, with lower frequencies, live in the
eastern highland regions where malaria is infrequent or absent.
It is noteworthy that high levels of unconjugated bilirubin can
inhibit P. falciparum replication, suggesting that low UGT1A1
activity may possibly have conferred a selective advantage by
protection from malaria, similar to other genetic traits such as
glucose-6-phosphate dehydrogenase (G6PD) deficiency and
sickle cell anaemia (Kumar et al., 2008). Others have noted
that high frequencies of the (TA)7 allele occur in other areas where malaria is endemic such as much of the Indian
subcontinent (Premawardhena et al., 2003).
For drugs, such as irinotecan, a combination of (TA)7
and functional SNPs in other UGT1A isoforms, such as
UGT1A7, has been proposed to be a better predictor of drug
toxicity (Lankisch et al., 2008), but these coexist on the same
haplotype (CA7) so that the whole haplotype is predictive of
risk, and it is hard to separate the effects of the TATA box
variation from that of other functional SNPs. In the African
populations studied here the situation is quite different and
recombination has separated the low-activity alleles. In the
TA7 haplotype, for example, (frequency 0.22 in the Tanzanian sample) the low-expression (TA)7 allele is on the same
chromosome as the high-activity UGT1A7 allele while the
converse is true of the CG6 which is frequent in the Ghanaian Bulsa. When, in 2004, the FDA approved a commercial
test to predict a potentially fatal response to irinotecan therapy, they did not consider the complexity of the possible
interaction with other functional SNPs in the UGT1A1 regulatory elements and within other UGT1A isoforms, which
are predicted to lead to intermediate phenotypes. Thus many
African–Americans may be prescribed doses of the drug based
on advice that might not be relevant for people of all ancestries.
The results described in this paper, in particular the evidence of greater haplotype diversity across the UGT1A complex in sub-Saharan Africa than in Europe, emphasise the
need for further investigation of the effect of the other functional UGT1A variants in addition to (TA)n on the risk
of hyperbilirubinaemia due to interactions with hemolytic
anaemia, or treatment with HIV protease inhibitors, as well
as to the risk of irinotecan toxicity. In addition, further resequencing of the rest of the UGT1A gene complex in people
of African ancestry is indicated. Our pilot resequencing of the
UGT1A1 promoter, in a diverse sample set, however, failed
to identify novel SNPs and typing of the previously reported
rs887829 and rs34547608 showed that there is very little recombination with (TA)n, so that even if these alleles modulate
the function of (TA)n , the effect of this would be seen only
in rare individuals. The haplotype and ethnicity information
reported here will help in the construction of appropriate
phenotype–genotype association studies and development of
better diagnostic tests. As well as testing other functional variants, it seems reasonable to suggest that rs887829 might be
useful diagnostically as a marker for (TA)7 and (TA)8 since
it would be easier to incorporate into multiplex assays than
the microsatellite. This would also provide a solution to the
problem that the (TA)8 allele cannot be typed in the commercial assay (Invader® UGT1A1 molecular assay package insert) despite its importance as a risk allele.
Acknowledgements
We thank all the sample donors, and also the DNA collectors:
Leila Laredj, Matthew Forka, Liz Caldwell, M. le Roux, Pieta
Näsänen, Tudor Parfitt, Tankei Helenius, Dr. Fouad Berrada,
Esther William, D. Gomis, H. Babiker, J. Course, Hicram, James
Wilson; Ranji Arasaretnam, Mari Wyn Burley, Heather Elding,
and Anke Liebert for help with electrophoresis and sequencing;
the Melford Charitable Trust for providing funding; Dr. Stephen
Pereira for helpful discussion.
References
Beutler, E., Gelbart, T. & Demina, A. (1998) Racial variability in the
UDP-glucuronosyltransferase 1 (UGT1A1) promoter: A balanced
polymorphism for regulation of bilirubin metabolism? Proc Natl
Acad Sci USA 95, 8170–8174.
Borlak, J., Thum, T., Landt, O., Erb, K. & Hermann, R. (2000)
Molecular diagnosis of a familial nonhemolytic hyperbilirubinemia (Gilbert’s syndrome) in healthy subjects. Hepatology 32, 792–
795.
Bosma, P. J., Chowdhury, J. R., Bakker, C., Gantla, S., De Boer, A.,
Oostra, B. A., Lindhout, D., Tytgat, G. N., Jansen, P. L. & Oude
Elferink, R. P. (1995) The genetic basis of the reduced expression
244
Annals of Human Genetics (2011) 75,236–246
of bilirubin UDP-glucuronosyltransferase 1 in Gilbert’s syndrome.
N Engl J Med 333, 1171–1175.
Bosma, P. J., Seppen, J., Goldhoorn, B., Bakker, C., Oude Elferink,
R. P., Chowdhury, J. R., Chowdhury, N. R. & Jansen, P. L.
(1994) Bilirubin UDP-glucuronosyltransferase 1 is the only relevant bilirubin glucuronidating isoform in man. J Biol Chem 269,
17960–17964.
Bougouma, A., Ilboudo, P. D., Bonkoungou, P., Sombie, R. & Siko,
A. (1999) La maladie de Gilbert chez le Noir Africain: A propos de
4 observations au Centre Hospitalier National de Ouagadougou.
Medecine d’Afrique Noire 45, 613–617.
Buyukasik, Y., Akman, U., Buyukasik, N. S., Goker, H., Kilicarslan,
A., Shorbagi, A. I., Hascelik, G. & Haznedaroglu, I. C. (2008)
Evidence for higher red blood cell mass in persons with unconjugated hyperbilirubinemia and Gilbert’s syndrome. Am J Med Sci
335, 115–119.
Carpenter, S. L., Lieff, S., Howard, T. A., Eggleston, B. & Ware,
R. E. (2008) UGT1A1 promoter polymorphisms and the development of hyperbilirubinemia and gallbladder disease in children
with sickle cell anemia. Am J Hematol 63, 800–803.
Chaar, V., Keclard, L., Diara, J. P., Leturdu, C., Elion, J., Krishnamoorthy, R., Clayton, J. & Romana, M. (2005) Association
of UGT1A1 polymorphism with prevalence and age at onset of
cholelithiasis in sickle cell anemia. Haematologica 90, 188–199.
Cote, J. F., Kirzin, S., Kramar, A., Mosnier, J. F., Diebold, M. D.,
Soubeyran, I., Thirouard, A. S., Selves, J., Laurent-Puig, P. &
Ychou, M. (2007) UGT1A1 polymorphism can predict hematologic toxicity in patients treated with irinotecan. Clin Cancer Res
13, 3269–3275.
Danoff, T. M., Campbell, D. A., Mccarthy, L. C., Lewis, K. F.,
Repasch, M. H., Saunders, A. M., Spurr, N. K., Purvis, I. J.,
Roses, A. D. & Xu, C. F. (2004) A Gilbert’s syndrome UGT1A1
variant confers susceptibility to tranilast-induced hyperbilirubinemia. Pharmacogenomics J 4, 49–53.
Excoffier, L., Laval, G. & Schneider, S. (2005) Arlequin (version
3.0): An integrated software package for population genetics data
analysis. Evol Bioinform Online 1, 47–50.
Gwee, K. A., Koay, E. S. & Kang, J. Y. (1992) The prevalence of
isolated unconjugated hyperbilirubinaemia (Gilbert’s syndrome)
in subjects attending a health screening programme in Singapore.
Singapore Med J 33, 588–589.
Heeney, M. M., Howard, T. A., Zimmerman, S. A. & Ware, R.
E. (2003) UGT1A promoter polymorphisms influence bilirubin
response to hydroxyurea therapy in sickle cell anemia. J Lab Clin
Med 141, 279–282.
Hong, A. L., Huo, D., Kim, H. J., Niu, Q., Fackenthal, D. L.,
Cummings, S. A., John, E. M., West, D. W., Whittemore, A.
S., Das, S. & Olopade, O. I. (2007) UDP-glucuronosyltransferase
1A1 gene polymorphisms and total bilirubin levels in an ethnically
diverse cohort of women. Drug Metab Dispos 35, 1254–1261.
Hoskins, J. M., Goldberg, R. M., Qu, P., Ibrahim, J. G. & Mcleod,
H. L. (2007) UGT1A1∗ 28 genotype and irinotecan-induced neutropenia: Dose matters. J Natl Cancer Inst 99, 1290–1295.
Hsieh, T. Y., Shiu, T. Y., Huang, S. M., Lin, H. H., Lee, T. C.,
Chen, P. J., Chu, H. C., Chang, W. K., Jeng, K. S., Lai, M.
M. & Chao, Y. C. (2007) Molecular pathogenesis of Gilbert’s
syndrome: Decreased TATA-binding protein binding affinity of
UGT1A1 gene promoter. Pharmacogenet Genomics 17, 229–236.
Innocenti, F., Grimsley, C., Das, S., Ramirez, J., Cheng, C., KuttabBoulos, H., Ratain, M. J. & Di Rienzo, A. (2002) Haplotype
structure of the UDP-glucuronosyltransferase 1A1 promoter in
different ethnic groups. Pharmacogenetics 12, 725–733.
C 2011 The Authors
C 2011 Blackwell Publishing Ltd/University College London
Annals of Human Genetics UGT1A in African populations
Innocenti, F., Liu, W., Chen, P., Desai, A. A., Das, S.
& Ratain, M. J. (2005) Haplotypes of variants in the
UDP-glucuronosyltransferase1A9 and 1A1 genes. Pharmacogenet
Genomics 15, 295–301.
Innocenti, F., Undevia, S. D., Iyer, L., Chen, P. X., Das, S., Kocherginsky, M., Karrison, T., Janisch, L., Ramirez, J., Rudin, C. M.,
Vokes, E. E. & Ratain, M. J. (2004) Genetic variants in the UDPglucuronosyltransferase 1A1 gene predict the risk of severe neutropenia of irinotecan. J Clin Oncol 22, 1382–1388.
Kaplan, M., Slusher, T., Renbaum, P., Essiet, D. F., Pam, S.,
Levy-Lahad, E. & Hammerman, C. (2008) (TA)n UDPglucuronosyltransferase 1A1 promoter polymorphism in Nigerian
neonates. Pediatr Res 63, 109–111.
Kohle, C., Mohrle, B., Munzel, P. A., Schwab, M., Wernet, D.,
Badary, O. A. & Bock, K. W. (2003) Frequent co-occurrence
of the TATA box mutation associated with Gilbert’s syndrome (UGT1A1∗ 28) with other polymorphisms of the UDPglucuronosyltransferase-1 locus (UGT1A6∗ 2 and UGT1A7∗ 3) in
Caucasians and Egyptians. Biochem Pharmacol 65, 1521–1527.
Kornberg, A. (1942) Latent liver disease in persons recovered from
catarrhal jaundice and in otherwise normal medical students as
revealed by the bilirubin excretion test. J Clin Invest 21, 299–308.
Kumar, S., Guha, M., Choubey, V., Maity, P., Srivastava, K., Puri, S.
K. & Bandyopadhyay, U. (2008) Bilirubin inhibits Plasmodium falciparum growth through the generation of reactive oxygen species.
Free Radic Biol Med 44, 602–613.
Lampe, J. W., Bigler, J., Horner, N. K. & Potter, J. D. (1999) UDPglucuronosyltransferase (UGT1A1∗ 28 and UGT1A6∗ 2) polymorphisms in Caucasians and Asians: Relationships to serum bilirubin
concentrations. Pharmacogenetics 9, 341–349.
Lankisch, T., Moebius, U., Wehmeier, M., Behrens, G., Manns,
M., Schmidt, R. & Strassburg, C. (2006) Gilbert’s disease and
atazanavir: From phenotype to UDP-glucuronosyltransferase haplotype. Hepatology 44, 1324–1332.
Lankisch, T. O., Behrens, G., Ehmer, U., Mobius, U., Rockstroh, J.,
Wehmeier, M., Kalthoff, S., Freiberg, N., Manns, M. P., Schmidt,
R. E. & Strassburg, C. P. (2009) Gilbert’s syndrome and hyperbilirubinemia in protease inhibitor therapy– an extended haplotype of genetic variants increases risk in indinavir treatment. J
Hepatol 50, 1010–1018.
Lankisch, T. O., Schulz, C., Zwingers, T., Erichsen, T. J., Manns,
M. P., Heinemann, V. & Strassburg, C. P. (2008) Gilbert’s
Syndrome and irinotecan toxicity: Combination with UDPglucuronosyltransferase 1A7 variants increases risk. Cancer Epidemiol Biomarkers Prev 17, 695–701.
Lin, J. P., O’Donnell, C. J., Schwaiger, J. P., Cupples, L. A., Lingenhel, A., Hunt, S. C., Yang, S. & Kronenberg, F. (2006) Association
between the UGT1A1∗ 28 allele, bilirubin levels, and coronary
heart disease in the Framingham Heart Study. Circulation 114,
1476–1481.
McPhee, F., Caldera, P., Bemis, G., Mcdonagh, A., Kuntz, I. &
Craik, C. (1996) Bile pigments as HIV-1 protease inhibitors and
their effects on HIV-1 viral maturation and infectivity in vitro.
Biochem J 320(Pt 2), 681–686.
Menard, V., Girard, H., Harvey, M., Perusse, L. & Guillemette, C.
(2009) Analysis of inherited genetic variations at the UGT1 locus
in the French-Canadian population. Hum Mutat 30, 677–687.
Najib, F. (1937) Defensive role of bilirubinemia in pneumococcal
infection. The Lancet 229, 505–506.
New Drug Application 20-571 - Final label – UGT1A1 Camptosar
R
(irinotecan HCl). http://www.fda.gov/MedWatch/safety/
2005/Jun_PI/Camptosar_PI.pdf.
C
Novotny, L. & Vitek, L. (2003) Inverse relationship between serum
bilirubin and atherosclerosis in men: A meta-analysis of published
studies. Exp Biol Med (Maywood) 228, 568–571.
Odeberg, J. M., Andrade, J., Holmberg, K., Hoglund, P., Malmqvist,
U. & Odeberg, J. (2006) UGT1A polymorphisms in a Swedish
cohort and a human diversity panel, and the relation to bilirubin
plasma levels in males and females. Eur J Clin Pharmacol 62, 829–
837.
Owens, D. & Evans, J. (1975) Population studies on Gilbert’s syndrome. J Med Genet 12, 152–156.
Passon, R. G., Howard, T. A., Zimmerman, S. A., Schultz, W. H.
& Ware, R. E. (2001) Influence of bilirubin uridine diphosphateglucuronosyltransferase 1A promoter polymorphisms on serum
bilirubin levels and cholelithiasis in children with sickle cell anemia. J Pediatr Hematol Oncol 23, 448–451.
Premawardhena, A., Fisher, C. A., Liu, Y. T., Verma, I. C., De Silva,
S., Arambepola, M., Clegg, J. B. & Weatherall, D. J. (2003) The
global distribution of length polymorphisms of the promoters of
the glucuronosyltransferase 1 gene (UGT1A1): Hematologic and
evolutionary implications. Blood Cells Mol Dis 31, 98–101.
Rodriguez-Novoa, S., Martin-Carbonero, L., Barreiro, P.,
Gonzalez-Pardo, G., Jimenez-Nacher, I., Gonzalez-Lahoz, J. &
Soriano, V. (2007) Genetic factors influencing atazanavir plasma
concentrations and the risk of severe hyperbilirubinemia. Aids 21,
41–46.
Sampietro, M., Lupica, L., Perrero, L., Romano, R., Molteni, V. &
Fiorelli, G. (1998) TATA-box mutant in the promoter of the uridine diphosphate glucuronosyltransferase gene in Italian patients
with Gilbert’s syndrome. Ital J Gastroenterol Hepatol 30, 194–198.
Stocker, R., Yamamoto, Y., Mcdonagh, A. F., Glazer, A. N. & Ames,
B. N. (1987) Bilirubin is an antioxidant of possible physiological
importance. Science 235, 1043–1046.
Temme, E. H., Zhang, J., Schouten, E. G. & Kesteloot, H. (2001)
Serum bilirubin and 10-year mortality risk in a Belgian population. Cancer Causes Control 12, 887–894.
Veeramah, K. R., Thomas, M. G., Weale, M. E., Zeitlyn, D.,
Tarekegn, A., Bekele, E., Mendell, N. R., Shephard, E. A., Bradman, N. & Phillips, I. R. (2008) The potentially deleterious functional variant flavin-containing monooxygenase 2∗ 1 is at high frequency throughout sub-Saharan Africa. Pharmacogenet Genomics
18, 877–886.
Zhang, D., Chando, T., Everett, D., Patten, C., Dehal, S. &
Humphreys, W. (2005) In vitro inhibition of UDP glucuronosyltransferases by atazanavir and other HIV protease inhibitors and
the relationship of this property to in vivo bilirubin glucuronidation. Drug Metab Dispos 33, 1729–1739.
Zucker, S. D., Horn, P. S. & Sherman, K. E. (2004) Serum bilirubin
levels in the U.S. population: Gender effect and inverse correlation
with colorectal cancer. Hepatology 40, 827–835.
Supporting Information
Additional supporting information may be found in the online
version of this article:
Table S1 Taqman primers. PBREM = phenobarbital response enhancer module
Table S2 Sequencing primers.
Table S3 Raw genotype data by country and country subgroup.
Table S4A Pairwise FST values for (TA)n for regions with
significance values.
Table S4B Pairwise FST values for (TA)n for countries with
significance values.
Table S4C Pairwise FST values for (TA)n ethnic/uniform
country subgroups (≥40 members) with significance values.
Table S5 Pairwise D values for the UGT1A loci rs8175347,
rs10929302, and rs11692021 in countries and ethnic groups.
Table S6 p-Values with standard deviations for exact tests
of differentiation of UGT1A haplotype frequencies among
samples.
Table S7 Haplotypes comprising all five loci inferred using PHASE (on the separate groups), and their frequencies,
for a subset of African samples compared with the probable
ancestral alleles (as found in primates).
Figure S1 Diversity indices (h) for the UGT1A haplotypes
encompassing loci rs8175347, rs10929302, and rs11692021.
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such
materials are peer-reviewed and may be reorganised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information
(other than missing files) should be addressed to the authors.
Received: 5 August 2010
Accepted: 29 November 2010
© 2011 The Authors
© 2011 Blackwell Publishing Ltd/University College London
Annals of Human Genetics (2011) 75, 236–246