doi: 10.1111/j.1469-1809.2010.00638.x

Prevalence of Clinically Relevant UGT1A Alleles and Haplotypes in African Populations

Laura J. Horsfall1, David Zeitlyn2, Ayele Tarekegn3, Endashaw Bekele4, Mark G. Thomas1,5, Neil Bradman3 and Dallas M. Swallow1*

1 Department of Genetics, Evolution and Environment, University College London, Wolfson House, London NW1 2HE, UK
2 Institute of Social and Cultural Anthropology, School of Anthropology and Museum Ethnography, University of Oxford
3 The Centre for Genetic Anthropology, Department of Genetics, Evolution and Environment, University College London
4 Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
5 Department of Evolutionary Biology, Uppsala University, Uppsala, Sweden

Summary

Variation of a short (TA)n repeat sequence (rs8175347) covering the TATA box of UGT1A1 (UDP-glucuronosyltransferase 1A1) is associated with hyperbilirubinaemia (Gilbert’s syndrome) and adverse drug reactions, and is used for dosage advice for irinotecan. Several reports indicate that the low-activity (risk) alleles ((TA)7 and (TA)8) are very frequent in Africans, but the patterns of association with other variants in the UGT1A gene complex that may modulate these responses are not well known. rs8175347 and two other clinically relevant UGT1A variants (rs11692021 and rs10929302) were assayed in 2616 people from Europe and Africa. Low-activity (TA)n allele frequencies were highest in equatorial Africa, (TA)7 being the most common allele in Cameroon, Ghana, southern Sudan, and in Ethiopian Anuak. Haplotypic diversity was also greatest in equatorial Africa, but in Ethiopia was very variable across ethnic groups. Resequencing of the promoter of a sample subset revealed no novel variations, but rs34547608 and rs887829 were typed and shown to be tightly associated with (TA)n.
Our results illustrate the need for investigation of the effect of UGT1A variants other than (TA)n on the risk of irinotecan toxicity, as well as hyperbilirubinaemia due to hemolytic anaemia or human immunodeficiency virus protease inhibitors, so that appropriate pharmacogenetic advice can be given.

Keywords: Drug metabolism, UDP-glucuronosyltransferase 1A1, UDP-glucuronosyltransferase 1A7, haplotype diversity, allele frequency, population structure, UGT1A gene complex, bilirubin

* Corresponding author: Dallas M. Swallow, Department of Genetics, Environment and Evolution, University College London, Wolfson House, London NW1 2HE, UK. Tel: 0207-679-5040; Fax: 0207-387-3496; E-mail: d.swallow@ucl.ac.uk

Annals of Human Genetics (2011) 75, 236–246. © 2011 The Authors. Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London.

Introduction

UDP-glucuronosyltransferase 1A isoform 1 (UGT1A1) is a phase II drug-metabolizing enzyme responsible for converting a wide array of drugs to water-soluble glucuronides suitable for renal or biliary elimination (MIM*191740). UGT1A1 is also the main isozyme capable of conjugating bilirubin, the endogenous yellow pigment resulting from natural haem catabolism (Bosma et al., 1994). The inherited hyperbilirubinaemia known as Gilbert’s syndrome (MIM*143500), for which intermittent episodes of jaundice are the most widely recognised symptom, is attributable to reduced UGT1A1 activity (Bosma et al., 1995).

Gilbert’s syndrome has been studied mainly in European and East Asian populations, where its prevalence is estimated at 3–9% (Kornberg, 1942; Bosma et al., 1995; Owens & Evans, 1975; Gwee et al., 1992; Buyukasik et al., 2008). The underlying genetic cause in most populations is considered to be homozygosity for seven thymine–adenine repeats, (TA)7 (UGT1A1*28, rs8175347), in the TATA box promoter motif (Bosma et al., 1995; Borlak et al., 2000), and mean bilirubin levels of (TA)7 homozygotes are approximately double those of (TA)6 homozygotes (Lampe et al., 1999; Premawardhena et al., 2003; Lin et al., 2006).
Although bilirubin is neurotoxic at very high levels, particularly in children, it is also a potent antioxidant, and moderately elevated levels have been proposed to protect against adult oxidative stress-mediated diseases (Stocker et al., 1987). Indeed, strong negative associations have been observed between bilirubin level and incidence of cancer and cardiovascular disease (Novotny & Vitek, 2003; Zucker et al., 2004; Temme et al., 2001). Raised bilirubin levels can also inhibit replication in vitro of various blood pathogens including pneumococcus, the malaria parasite Plasmodium, and human immunodeficiency virus (HIV) (Najib, 1937; McPhee et al., 1996; Kumar et al., 2008).

The prevalence of (TA)7 homozygosity in European populations is 6–10% (Premawardhena et al., 2003). Even higher frequencies have been reported in sub-Saharan Africa (Premawardhena et al., 2003). Though rarely identified in other populations, two additional repeat alleles, (TA)5 and (TA)8, are also present at low frequency in people of recent African descent (Beutler et al., 1998; Premawardhena et al., 2003). There is a negative association between UGT1A1 expression and repeat length of the four alleles, attributable to decreasing promoter activity acting via altered affinity for the TATA-binding protein (Beutler et al., 1998; Hsieh et al., 2007). Although the (TA)n alleles appear to have similar effects on bilirubin levels in people of recent African descent (Chaar et al., 2005; Hong et al., 2007; Carpenter et al., 2008), and low-activity alleles confer a significantly raised risk of developing gallstones requiring surgery (Passon et al., 2001; Heeney et al., 2003), Gilbert’s syndrome is rarely diagnosed in Africa (Bougouma et al., 1999).
Homozygosity for (TA)7 has also been associated with adverse drug reactions (ADRs) due to reduced clearance, most notably life-threatening toxicity to chemotherapy with high-dose irinotecan (Hoskins et al., 2007). Data supporting this association led the Food and Drug Administration (FDA) in 2004 to alter the label to recommend a lower starting dose for patients with the (TA)7/(TA)7 genotype (New Drug Application 20-571). Severe hyperbilirubinaemia following treatment with the HIV protease inhibitors indinavir and atazanavir is also much more frequent in (TA)7 homozygotes due to the inhibitory effect of these drugs on UGT1A1 activity (Danoff et al., 2004; Zhang et al., 2005; Lankisch et al., 2006; Rodriguez-Novoa et al., 2007).

However, it is likely that the pharmacogenetic effects of (TA)7 are confounded by additional common functional variants located in the UGT1A1 regulatory regions and in the other enzymes encoded by the UGT1A gene complex (Lankisch et al., 2006). This gene complex encodes the nine UGT1A isoforms, and in Europeans and East Asians a region of strong linkage disequilibrium (LD) extends across much of the complex (about 90 kb) (Innocenti et al., 2005). Several other “low-activity” alleles reside on the same haplotype as (TA)7 in European populations (Innocenti et al., 2002; Kohle et al., 2003; Innocenti et al., 2005; Menard et al., 2009), and probably play a role in outcomes associated with irinotecan and HIV therapy (Lankisch et al., 2006; Lankisch et al., 2009). Lower levels of LD across the UGT1A gene complex have been reported in African–Americans and for the Yoruba of Nigeria (Innocenti et al., 2002; Odeberg et al., 2006; Hong et al., 2007), but studies for other indigenous African groups are lacking.

It is increasingly clear that the people of the African continent show higher levels of genetic diversity and population substructure than most human populations on other continents.
The study of pharmacogenetically relevant variation in Africa is thus particularly important for identifying groups potentially at risk of poor drug response or ADRs. While cancer therapy with irinotecan must be comparatively rare on the African continent, this drug is used to treat people of recent African descent in the United States and Europe. In addition, the possible implications of UGT1A variation with respect to the HIV treatments that are subsidised for use across Africa, and the negative interaction of low-activity UGT1A1 (TA)n promoter variants with inherited blood disorders common in parts of Africa, are of considerable clinical importance in Africa itself (Chaar et al., 2005; Kaplan et al., 2008).

Our first aim was therefore to establish the allele frequencies of (TA)n in different parts of the continent in relation to geography and ethnic origin. The second aim was to determine whether there are differences, across these defined populations, in the haplotype backgrounds of the (TA)7 allele with respect to other functional single-nucleotide polymorphisms (SNPs), which might indicate greater functional diversity both in Africa and in people of recent African descent. For this study, two SNPs were selected that are thought to play a role in irinotecan metabolism and toxicity (Innocenti et al., 2004; Cote et al., 2007). In order to determine whether there is any further variation in the immediate promoter that might modulate expression in some Africans, we also sequenced the region immediately upstream of the start of translation of UGT1A1 in a subset of the samples.

Materials and Methods

Samples

The 2616 buccal DNA samples analysed in this study are part of an in-house collection assembled by The Centre for Genetic Anthropology at University College London. All samples were collected from ostensibly healthy individuals unrelated at the paternal grandfather level and were anonymous, since names were not recorded.
They were collected between 1998 and 2007 with informed consent and ethical approval (UCLH 99/0196). The samples tested were from 18 countries across six geographic regions defined as follows: North Europe (NE), the Middle East (ME), North Africa (NA), West Africa (WA), Central East Africa (CEA), and South East Africa (SEA) (Veeramah et al., 2008). Self-reported cultural identity/ethnicity and language details were also available for the majority of the panel. For the most detailed analyses, country subgroups of 40 or more individuals of the same self-declared cultural identity/language group were tested separately.

Genotyping

For this study, two SNPs were selected in addition to the (TA)n variant. The first, rs10929302 (−3156G > A, UGT1A1*93), lies in the phenobarbital response enhancer module (PBREM) approximately 3 kb upstream of the (TA)n variant and has been claimed to better predict irinotecan toxicity (Innocenti et al., 2004; Cote et al., 2007). The second, the nonsynonymous SNP rs11692021 (Trp208Arg, UGT1A7*3), lies in the substrate-binding exon of UGT1A7 (MIM*606432) approximately 90 kb upstream from (TA)n, and reduces glucuronidation of SN-38, the active metabolite of irinotecan. All three loci are in strong LD in European populations (Kohle et al., 2003; Innocenti et al., 2002).

The (TA)n variant was assayed by a previously reported technique using high-percentage polyacrylamide gels (Sampietro et al., 1998). The selected SNPs were assayed using TaqMan technology (Applied Biosystems, Foster City, CA). TaqMan probes were designed by Applied Biosystems and polymerase chain reactions were performed in 384-well microplates using a gradient cycler. TaqMan probes are reported in the supplementary Table S1A.
Fluorescence was measured using an ABI Prism 7000 sequence detection system (Applied Biosystems, Applera UK, Warrington, Cheshire, UK), and genotypes were assigned with 95% confidence using ABI Prism 7000 SDS software version 2.1. A batch of 368 samples from African and non-African groups was first tested to check that there was adequate allelic variation in the populations under study, and of these, 156 samples were replicated in the larger panel to validate typing. Call rates were >95% for rs10929302 and >92% for UGT1A7 rs11692021. In all instances, researchers were blind to the sample origin at the time of typing.

A region upstream (−380) and downstream (+60) of the ATG start site of UGT1A1 (∼−330 from the (TA)n sequence to ∼+100 of the (TA)n sequence) was resequenced in a subset of 372 African samples chosen to represent each geographic region: 65 from Algeria (NA), 82 from Cameroon (WA), 148 from Ethiopia (CEA), and 77 from Malawi (SEA). The largest share came from Ethiopia, which is the most diverse country. Sequencing used an ABI 96-capillary 3730xl DNA Analyzer (Applied Biosystems, Applera, UK) (see Table S2 for sequencing primers). This allowed typing of rs34547608 (at −52 bp from (TA)n) and rs887829 (at −310 bp from (TA)n). In all cases, genotypes were inferred assuming no silent alleles.

Data Analyses

All analyses were performed using Arlequin 3.1 unless otherwise specified (Excoffier et al., 2005). Exact tests for deviation from Hardy–Weinberg equilibrium were performed (using 10,000 steps in a Markov chain; 10,000 dememorization steps). For display on the map in Figure 1, (TA)n genotypes were recoded into three “expression” phenotypes using groups assigned from bilirubin levels in a study on people with recent African ancestry (African-Caribbean) (Chaar et al., 2005).
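The recoding step can be sketched in a few lines of Python. This is only an illustration, not the authors’ code: the category boundaries are taken from the Figure 1 legend, the function name `expression_class` is hypothetical, and the treatment of every (TA)5-carrying genotype as “high” is an assumption, since the legend does not list each (TA)5 genotype separately.

```python
# Category definitions follow the Figure 1 legend; genotypes are given as
# pairs of (TA) repeat counts, e.g. (6, 7) for (TA)6/(TA)7.
LOW = {(7, 7), (7, 8), (8, 8)}
INTERMEDIATE = {(6, 7), (6, 8)}
KNOWN_ALLELES = (5, 6, 7, 8)


def expression_class(allele_a, allele_b):
    """Recode a (TA)n genotype into an inferred expression category."""
    if allele_a not in KNOWN_ALLELES or allele_b not in KNOWN_ALLELES:
        raise ValueError("expected repeat counts of 5, 6, 7 or 8")
    # Sort so that (7, 6) and (6, 7) are treated as the same genotype.
    genotype = tuple(sorted((allele_a, allele_b)))
    if genotype in LOW:
        return "low"
    if genotype in INTERMEDIATE:
        return "intermediate"
    # Assumption: (TA)6/6 and any genotype carrying (TA)5 count as "high".
    return "high"


if __name__ == "__main__":
    for genotype in [(7, 7), (8, 6), (6, 6), (5, 7)]:
        print(genotype, expression_class(*genotype))
```

Sorting the two alleles first keeps the lookup sets small and makes the classification independent of the order in which the alleles were recorded.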
Comparisons of genetic distances between populations (regions, countries, and ethnic groups) based on (TA)n genotype frequencies were made by calculating pairwise FST values (10,000 permutations). Because of the large number of different ethnic groups with very few individuals, we limited the analysis of ethnic groups to those with at least 40 members (n = 1838). To visualize these differences, principal coordinates analysis (PCO) was performed on FST matrices within the R programming environment using routines in the APE package. Genetic similarity was quantified as one minus FST. Values along the main diagonal, which represent the similarity of each population to itself, were calculated from the estimated genetic distance between two copies of the same sample by the formula n/(n−1).

The D′ measure of LD between the three genotyped loci was calculated using LDMax, which uses the expectation-maximization algorithm to determine phase and is available as part of the GOLD software package (http://www.sph.umich.edu/csg/abecasis/GOLD/docs/stats.html). Haplotypes were inferred using PHASE v2.1.1 (100 iterations; 500 burn-in). The resulting haplotype frequencies were used to calculate Nei’s gene diversity index (h) and population differentiation using exact tests (Markov chain length 100,000 steps). Where appropriate, the standard Bonferroni correction for multiple testing was applied by multiplying the significance value by the number of comparisons.

Results

UGT1A1 (TA)n Allele Frequencies

The allele frequencies of UGT1A1 (TA)n are presented in Table 1 (see Table S3 for genotype frequency data). The allele frequency of (TA)7 ranged from 0.32 in Yemen and the Chewa of Malawi to 0.60 in the Anuak of Ethiopia. In Tanzania, Uganda, southern Sudan, Nigeria, Ethiopian Anuak, and all ethnic groups in Cameroon and Ghana, (TA)7 is the most common variant.
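For readers less familiar with how such allele frequencies are obtained, a minimal Python sketch of direct gene counting follows. It assumes, as stated in the Methods, that there are no silent alleles, so each individual contributes exactly two observed alleles. The function name and the five example genotypes are hypothetical and are not study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by direct gene counting.

    `genotypes` is an iterable of (allele, allele) pairs, one per person.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # two gene copies per individual
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: count / total for allele, count in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical (TA)n genotypes, written as repeat counts.
    sample = [(6, 6), (6, 7), (7, 7), (5, 6), (7, 8)]
    print(allele_frequencies(sample))
```

With the five hypothetical genotypes above there are ten gene copies: four each of (TA)6 and (TA)7, and one each of (TA)5 and (TA)8.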
The (TA)5 and (TA)8 alleles, which were not detected in the British sample, were present at low frequencies in all of the other groups tested. Overall, the (TA)5 allele was more prevalent than (TA)8 and reached a frequency of 0.10 or above in five of the 13 sub-Saharan African countries. The Ethiopian Anuak was the only sub-Saharan ethnic group without a single occurrence of (TA)5. Although this is a dinucleotide repeat (or microsatellite) polymorphism, no novel alleles were identified.

Figure 1 Distribution of UGT1A1 (TA)n rs8175347 genotypes categorized as low ((TA)7/7, (TA)7/8, or (TA)8/8), intermediate ((TA)6/7 or (TA)6/8), and high ((TA)5 or homozygous (TA)6/6) expression genotypes across countries and country subgroups. See Table 1 for full details of groups. Bantu is short for Bantu language speakers. Note that the frequencies of the low-activity alleles in the different country groups are significantly higher in the equatorial belt (+10 to −10 latitude) than elsewhere (p = 0.000015, Student’s t-test) and also significantly higher for the equatorial belt than the rest of Africa (p = 0.00019). However, also note the interethnic differences, particularly in Ethiopia.

The geographic and ethnic distribution of inferred low-, intermediate- and high-expression phenotypes based on recoded genotype data is presented in Figure 1. The distribution shows that low-activity genotypes are highly prevalent in equatorial regions of Africa and that Ethiopia has the highest within-country variability between ethnic groups. Pairwise FST results and associated p-values are shown in supplementary Tables S4A–C. The pairwise FST values show significant differentiation between sub-Saharan African regions and regions outside of sub-Saharan Africa (though for CEA, statistical significance did not remain after Bonferroni correction).
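The Bonferroni adjustment referred to here is the one described in the Data Analyses section, in which each significance value is multiplied by the number of comparisons. A minimal Python sketch is given below; the function name and the example p-values are hypothetical, and capping the adjusted values at 1 is a conventional addition that the Methods do not mention.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons.

    The cap at 1.0 is an assumption added here so that adjusted values
    remain valid probabilities; the paper only describes the multiplication.
    """
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.20]  # hypothetical pairwise-test p-values
    for p, adjusted in zip(raw, bonferroni_adjust(raw)):
        verdict = "significant" if adjusted < 0.05 else "not significant"
        print(f"raw p = {p:.3f}, adjusted p = {adjusted:.3f} ({verdict})")
```

In this hypothetical set, a raw p-value of 0.04 is nominally significant but no longer so after adjustment for three comparisons, which is the kind of change reported above for CEA.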
However, there was little differentiation between countries within regions or between ethnic groups within countries in most cases. The exceptions were Senegal in the WA region and the Ethiopian ethnic groups. A PCO plot derived from pairwise FST measurements between all the distinct ethnic/language groups shows that while the SEA groups cluster, the CEA and WA groups are more differentiated (Fig. 2). The increasing values on the first principal component axis broadly correspond to increasing (TA)7 frequencies.

Figure 2 Principal coordinates plot of the pairwise FST values for the country subgroups, calculated using UGT1A (TA)n frequency data. NE = Northern Europe; ME = Middle East; NA = North Africa; WA = West Africa; CEA = Central East Africa; SEA = South East Africa. See Table 1 for full details of groups. Bantu is short for Bantu language speakers. See supplementary Tables S4A–C for the pairwise FST data and p-values. This plot shows the clustering of the SEA groups that contrasts with the much greater genetic distances between the CEA groups.

Analysis of the Two SNPs, rs11692021 and rs10929302

The allele frequencies of the two SNPs by country and ethnic group are shown in Table 1. The globally minor allele of the UGT1A7 nonsynonymous SNP rs11692021 was at highest frequency in the countries and individual ethnic groups in the CEA region (range: 0.33–0.53), at relatively lower frequency in SEA (range: 0.15–0.29) and at intermediate frequency in WA and the regions outside of sub-Saharan Africa (range: 0.21–0.41). A similar pattern was seen with UGT1A1 rs10929302, though the differences were less marked.

Table 1 Allele frequency (≥1%) by country and country subgroup (based on self-declared cultural identity/ethnic group or language group) of (TA)n and the two SNPs, rs10929302 and rs11692021. Columns give the gene locus, rs number and * nomenclature: (TA)n alleles are UGT1A1 rs8175347, −3156 A is UGT1A1 rs10929302, and +622 C is UGT1A7 rs11692021.

| Region | Country | Group | Number | (TA)5 UGT1A1*36 | (TA)6 UGT1A1*1 | (TA)7 UGT1A1*28 | (TA)8 UGT1A1*37 | −3156 A UGT1A1*93 | +622 C UGT1A7*3/*4 |
| NE | Britain | British | 90 | 0.00 | 0.64 | 0.35 | 0.00 | 0.33 | 0.35 |
| ME | Turkey | Anatolian | 96 | 0.00 | 0.62 | 0.38 | 0.01 | 0.35 | 0.41 |
| | Yemen | | 120 | 0.03 | 0.64 | 0.32 | 0.00 | 0.32 | 0.35 |
| NA | Morocco | | 89 | 0.02 | 0.63 | 0.35 | 0.00 | 0.30 | 0.36 |
| | | Ifrane Berbers | 72 | 0.02 | 0.65 | 0.33 | 0.00 | 0.27 | 0.35 |
| | Algeria | | 183 | 0.01 | 0.64 | 0.34 | 0.01 | 0.32 | 0.38 |
| WA | Senegal | | 191 | 0.10 | 0.49 | 0.36 | 0.04 | 0.27 | 0.25 |
| | | Manjak | 96 | 0.13 | 0.45 | 0.39 | 0.03 | 0.31 | 0.21 |
| | | Wolof | 95 | 0.07 | 0.54 | 0.34 | 0.05 | 0.24 | 0.30 |
| | Ghana | | 135 | 0.10 | 0.37 | 0.47 | 0.05 | 0.37 | 0.29 |
| | | Bulsa | 90 | 0.11 | 0.39 | 0.46 | 0.04 | 0.40 | 0.30 |
| | | Kasena | 45 | 0.10 | 0.33 | 0.49 | 0.08 | 0.30 | 0.27 |
| | Nigeria | Igbo | 99 | 0.09 | 0.43 | 0.45 | 0.03 | 0.33 | 0.24 |
| | Cameroon | | 273 | 0.05 | 0.41 | 0.50 | 0.04 | 0.40 | 0.31 |
| | | Arabe | 90 | 0.02 | 0.41 | 0.53 | 0.04 | 0.43 | 0.37 |
| | | Kotoko | 42 | 0.06 | 0.43 | 0.46 | 0.05 | 0.38 | 0.36 |
| | | Mambila | 76 | 0.08 | 0.38 | 0.51 | 0.03 | 0.44 | 0.26 |
| CEA | Ethiopia | | 367 | 0.01 | 0.54 | 0.43 | 0.02 | 0.36 | 0.50 |
| | | Amhara | 149 | 0.00 | 0.63 | 0.35 | 0.02 | 0.31 | 0.53 |
| | | Anuak | 106 | 0.00 | 0.37 | 0.60 | 0.03 | 0.47 | 0.42 |
| | | Oromo | 102 | 0.01 | 0.58 | 0.38 | 0.02 | 0.36 | 0.53 |
| | Sudan North | | 156 | 0.02 | 0.52 | 0.36 | 0.03 | 0.32 | 0.46 |
| | Sudan South | | 129 | 0.04 | 0.39 | 0.48 | 0.05 | 0.40 | 0.33 |
| | | Dinka | 41 | 0.05 | 0.39 | 0.53 | 0.04 | 0.38 | 0.40 |
| SEA | Uganda | | 39 | 0.06 | 0.41 | 0.46 | 0.06 | 0.39 | 0.24 |
| | Tanzania | Chagga | 50 | 0.09 | 0.43 | 0.43 | 0.01 | 0.36 | 0.29 |
| | Malawi | | 260 | 0.13 | 0.48 | 0.33 | 0.05 | 0.27 | 0.19 |
| | | Chewa | 91 | 0.14 | 0.51 | 0.32 | 0.03 | 0.28 | 0.19 |
| | | Tumbuka | 61 | 0.16 | 0.48 | 0.33 | 0.03 | 0.28 | 0.19 |
| | | Yao | 56 | 0.11 | 0.48 | 0.35 | 0.06 | 0.26 | 0.19 |
| | Mozambique | Sena Bantu speakers | 84 | 0.13 | 0.48 | 0.37 | 0.03 | 0.28 | 0.16 |
| | Zimbabwe | | 56 | 0.05 | 0.54 | 0.35 | 0.01 | 0.34 | 0.20 |
| | South Africa | | 197 | 0.10 | 0.50 | 0.34 | 0.06 | 0.26 | 0.17 |
| | | Bantu speakers | 101 | 0.10 | 0.50 | 0.32 | 0.07 | 0.27 | 0.15 |
| | | Lemba | 96 | 0.09 | 0.48 | 0.38 | 0.04 | 0.26 | 0.18 |
| Overall | | | 2616 | 0.06 | 0.51 | 0.39 | 0.03 | 0.33 | 0.32 |

Values in bold represent the full dataset for the country, and include the groups listed below, as well as individuals in groups of less than 40; frequency data are also shown for more uniform/ethnic groups with datasets of more than 40 people. The “Arabe” from near Lake Chad, Cameroon are described by anthropologists as Choa, Shuwa, or Shewa Arabs. The “Bantu speakers” refer to Bantu language speakers, of varied or unrecorded ethnic group, collected in Sena, Mozambique, or in South Africa, where they were considered distinct from the Lemba collected at the same time. The Yemeni samples (various ethnicities) were all collected in the Hadramaut region. All alleles are shown for rs8175347 and only minor alleles (globally) are shown for rs10929302 and rs11692021. After adjusting for multiple comparisons, there was no evidence for significant deviation from Hardy–Weinberg equilibrium within regions, countries, or ethnic groups for any of the loci, with respectively 1/18, 2/54, and 3/66 comparisons across the three loci showing p-values of between 0.05 and 0.01 before correction. Full genotype data are shown in Table S4. NE = Northern Europe; ME = Middle East; NA = North Africa; WA = West Africa; CEA = Central East Africa; SEA = South East Africa.

Variability of LD in Different Countries and Ethnic Groups

Pairwise D′ values, which give a measure of recombination, were calculated using data from countries and ethnic groups separately. There were distinct differences in the patterns of LD in each of the groups (see supplementary Table S5 for D′ values). Samples from the countries outside of sub-Saharan Africa showed the highest LD, with D′ of greater than 0.92 between (TA)n and the UGT1A1 PBREM SNP rs10929302. The CEA region showed the lowest level of LD. Within Ethiopia, significant LD was detected across all three pairs of loci in the Anuak but for none in the Oromo.
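To make the LD measure more concrete, the following Python sketch computes the raw disequilibrium coefficient D and its normalized form D′ for a pair of biallelic loci. It is a simplification under stated assumptions: the reported measure is taken to be the normalized D′, the multiallelic (TA)n locus is collapsed to “(TA)7 versus all other alleles”, and the haplotype frequency is supplied directly instead of being estimated by expectation-maximization as in LDMax. The function name and the haplotype frequency of 0.32 are hypothetical; the two allele frequencies are the British values from Table 1.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by the largest value it could take given the allele
    # frequencies; the bound depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    # Allele frequencies from Table 1 (British): (TA)7 = 0.35, -3156 A = 0.33.
    # The joint haplotype frequency of 0.32 is a hypothetical value.
    d, d_prime = linkage_disequilibrium(p_ab=0.32, p_a=0.35, p_b=0.33)
    print(f"D = {d:.4f}, D' = {d_prime:.3f}")
```

Because raw D is bounded by the allele frequencies (it cannot exceed 0.25), values such as 0.92 are only possible on the normalized scale, which is why the sketch reports both quantities.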
Haplotype frequencies estimated using PHASE are presented in Table 2. Very similar frequencies were obtained using the expectation-maximization algorithm (data not shown). The haplotype frequencies and estimated diversity indices (see supplementary Fig. S1 for Nei’s h values) show that haplotypic diversity is greater in sub-Saharan Africa. The haplotype encompassing all three “high” activity alleles (TG6: ancestral T allele of rs11692021 and the G allele of rs10929302 together with (TA)6) is the most prevalent in all groups except for the Ethiopian Anuak (where the low-activity haplotype CA7 is slightly more frequent). The only other major haplotype background for (TA)6 was CG6. Overall, (TA)7 occurs most frequently as part of haplotype CA7 (derived C allele of rs11692021 and the derived A allele of rs10929302) but has a more diverse haplotypic background in the sub-Saharan groups, where the frequencies of TA7 and TG7 were found to be over 0.10 in many instances. Although (TA)8 is relatively rare, its haplotypic background appears the most variable, while the other rare allele, (TA)5, has only one major background (TG5), again a combination with the ancestral SNP alleles (see Table S7 for comments on ancestral alleles). Exact tests of population differentiation using haplotype frequencies (see supplementary Table S6 for p-values) show the most genetic differentiation between the Europeans (British and Turkish) and all others, and also between the Ethiopian Amhara and Oromo and all others (including the Ethiopian Anuak).
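Nei’s gene diversity index h, used above to compare haplotypic diversity, can be illustrated with a short Python sketch of the standard unbiased estimator, h = n/(n − 1) × (1 − Σ p_i²), where p_i are the haplotype frequencies and n is the number of gene copies (chromosomes) sampled. The function name and the example frequencies are hypothetical and do not reproduce any value from Table 2 or Fig. S1.

```python
def nei_gene_diversity(haplotype_freqs, n_gene_copies):
    """Unbiased estimate of Nei's gene diversity h.

    haplotype_freqs: haplotype frequencies, expected to sum to about 1
    n_gene_copies: number of chromosomes sampled (2 x individuals)
    """
    if n_gene_copies < 2:
        raise ValueError("at least two gene copies are required")
    # Probability that two randomly drawn chromosomes carry the same haplotype.
    homozygosity = sum(p * p for p in haplotype_freqs)
    return n_gene_copies / (n_gene_copies - 1) * (1.0 - homozygosity)


if __name__ == "__main__":
    # Hypothetical frequencies for three haplotypes in 100 individuals.
    freqs = {"TG6": 0.5, "CA7": 0.3, "CG6": 0.2}
    h = nei_gene_diversity(freqs.values(), n_gene_copies=200)
    print(f"h = {h:.4f}")
```

A population dominated by one haplotype gives h close to 0, whereas many haplotypes at similar frequencies push h towards 1, which is the sense in which diversity is described as greater in sub-Saharan Africa.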
Resequencing of the UGT1A1 Promoter

As a pilot study to check the promoter sequence context of the low-activity (TA)n alleles in Africans, sequence of the immediate UGT1A1 promoter region (−250 from the (TA)n sequence to +100 of the (TA)n sequence) was scanned for a total of 372 samples. No novel variation was identified, but the previously reported rs34547608 and rs887829 were found and typed in all 372 individuals.

Table 2 Estimated haplotype frequency (>1%) by country and country sub-group.

Table 3 Haplotypes comprising the three promoter loci (from left rs887829, rs34547608, and (TA)n) inferred using PHASE for a subset of African samples (n = 372).

Promoter haplotype CT6 TT7 CC5 TT8 CT7 TT6 CC6 CT5 CT8 TT5 Total chromosomes NA Algeria Amhara CEA Anuak 0.677 0.308 0.700 0.280 0.326 0.612 0.008 0.020 0.054 0.008 Oromo 0.537 0.417 0.009 0.009 0.028 WA Cameroon SEA Malawi 0.470 0.445 0.037 0.043 0.448 0.325 0.156 0.039 0.006 0.006 0.006 0.008 0.006 130 50 138 108 164 0.006 0.006 154 Frequency 0.502 0.410 0.042 0.031 0.007 0.003 0.001 0.001 0.001 0.001

Haplotypes are named according to the allele composition. See Table S3 for details of the extended haplotypes and the methods section for details of samples. It can be seen that in the vast majority of cases (98.7%) the C allele of rs887829 is found with (TA)5 or (TA)6 and the T allele is found with (TA)7 or (TA)8.
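The 98.7% figure quoted with Table 3 can be checked from the overall haplotype frequencies. The Python sketch below assumes that the ten values in the “Frequency” column correspond, in order, to the ten haplotypes listed in the table (CT6 to TT5), and that each haplotype name reads rs887829 allele, rs34547608 allele, (TA)n repeat number; the function name is hypothetical.

```python
# Overall haplotype frequencies from the "Frequency" column of Table 3,
# assumed to follow the order in which the haplotypes are listed.
OVERALL_FREQUENCY = {
    "CT6": 0.502, "TT7": 0.410, "CC5": 0.042, "TT8": 0.031, "CT7": 0.007,
    "TT6": 0.003, "CC6": 0.001, "CT5": 0.001, "CT8": 0.001, "TT5": 0.001,
}


def rs887829_tags_repeat(haplotype):
    """True if the rs887829 allele predicts the (TA)n activity class.

    C is expected with (TA)5 or (TA)6, and T with (TA)7 or (TA)8.
    """
    rs887829_allele = haplotype[0]
    repeats = int(haplotype[2])
    if rs887829_allele == "C":
        return repeats in (5, 6)
    return repeats in (7, 8)


concordant = sum(
    freq for hap, freq in OVERALL_FREQUENCY.items() if rs887829_tags_repeat(hap)
)
print(round(concordant, 3))  # 0.987
```

The concordant haplotypes (CT6, CC5, CC6, CT5, TT7, and TT8) sum to 0.987, matching the proportion stated in the table note, while the discordant ones (CT7, TT6, CT8, and TT5) together account for about 1% of chromosomes.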
These SNPs do not significantly increase the haplotypic diversity, the rs34547608 C allele being very tightly associated with the (TA)5 allele (confirming previous reports based on data from 101 African–Americans (Beutler et al., 1998)), and the rs887829 T allele being tightly associated with (TA)7 and also (TA)8. Inferred three-locus haplotype frequencies are reported in Table 3, and five-locus haplotypes in supplementary Table S7.

Discussion

In this paper, we confirm previous observations that the promoter variant (TA)7 of UGT1A1, which is associated with reduced UGT1A1 activity, hyperbilirubinaemia, and specific ADRs, occurs in a region of strong LD in non-African populations. However, in Africa, where there are also more (TA)n alleles ((TA)5, (TA)6, (TA)7, and (TA)8), we show more heterogeneity of haplotype background as well as large differences in the frequency of the alleles in different regions. Overall there is a geographic trend: the low-activity (TA)n genotypes are more prevalent in the equatorial regions, where haplotype diversity is also greater. There are also differences between ethnic groups within a single country, and these are statistically significant in the case of Ethiopia. The Anuak show much higher frequencies of the (TA)7 allele and the low-activity haplotype, while the Oromo show a very high level of haplotype diversity.

These distributions may simply reflect demography, but it is interesting to note an apparent correspondence to the distribution of malaria. For example, the Ethiopian Anuak, with the highest frequencies of (TA)7/8, live in the low-lying western regions around Gambella, where malaria is endemic, whereas the Amhara and Oromo, with lower frequencies, live in the eastern highland regions where malaria is infrequent or absent. It is noteworthy that high levels of unconjugated bilirubin can inhibit P.
falciparum replication, suggesting that low UGT1A1 activity may possibly have conferred a selective advantage by protection from malaria, similar to other genetic traits such as glucose-6-phosphate dehydrogenase (G6PD) deficiency and sickle cell anaemia (Kumar et al., 2008). Others have noted that high frequencies of the (TA)7 allele occur in other areas where malaria is endemic, such as much of the Indian subcontinent (Premawardhena et al., 2003).

For drugs such as irinotecan, a combination of (TA)7 and functional SNPs in other UGT1A isoforms, such as UGT1A7, has been proposed to be a better predictor of drug toxicity (Lankisch et al., 2008), but these coexist on the same haplotype (CA7), so that the whole haplotype is predictive of risk, and it is hard to separate the effects of the TATA box variation from those of other functional SNPs. In the African populations studied here the situation is quite different, and recombination has separated the low-activity alleles. In the TA7 haplotype (frequency 0.22 in the Tanzanian sample), for example, the low-expression (TA)7 allele is on the same chromosome as the high-activity UGT1A7 allele, while the converse is true of the CG6 haplotype, which is frequent in the Ghanaian Bulsa.

When, in 2004, the FDA approved a commercial test to predict a potentially fatal response to irinotecan therapy, they did not consider the complexity of the possible interaction with other functional SNPs in the UGT1A1 regulatory elements and within other UGT1A isoforms, which are predicted to lead to intermediate phenotypes. Thus many African–Americans may be prescribed doses of the drug based on advice that might not be relevant for people of all ancestries.
The results described in this paper, in particular the evidence of greater haplotype diversity across the UGT1A complex in sub-Saharan Africa than in Europe, emphasise the need for further investigation of the effect of the other functional UGT1A variants, in addition to (TA)n, on the risk of hyperbilirubinaemia due to interactions with hemolytic anaemia or treatment with HIV protease inhibitors, as well as on the risk of irinotecan toxicity. In addition, further resequencing of the rest of the UGT1A gene complex in people of African ancestry is indicated. Our pilot resequencing of the UGT1A1 promoter in a diverse sample set, however, failed to identify novel SNPs, and typing of the previously reported rs887829 and rs34547608 showed that there is very little recombination with (TA)n, so that even if these alleles modulate the function of (TA)n, the effect of this would be seen only in rare individuals.

The haplotype and ethnicity information reported here will help in the construction of appropriate phenotype–genotype association studies and the development of better diagnostic tests. As well as testing other functional variants, it seems reasonable to suggest that rs887829 might be useful diagnostically as a marker for (TA)7 and (TA)8, since it would be easier to incorporate into multiplex assays than the microsatellite. This would also provide a solution to the problem that the (TA)8 allele cannot be typed in the commercial assay (Invader® UGT1A1 molecular assay package insert) despite its importance as a risk allele.

Acknowledgements

We thank all the sample donors, and also the DNA collectors: Leila Laredj, Matthew Forka, Liz Caldwell, M. le Roux, Pieta Näsänen, Tudor Parfitt, Tankei Helenius, Dr. Fouad Berrada, Esther William, D. Gomis, H. Babiker, J. Course, Hicram, James Wilson; Ranji Arasaretnam, Mari Wyn Burley, Heather Elding, and Anke Liebert for help with electrophoresis and sequencing; the Melford Charitable Trust for providing funding; and Dr.
Stephen Pereira for helpful discussion.
References
Beutler, E., Gelbart, T. & Demina, A. (1998) Racial variability in the UDP-glucuronosyltransferase 1 (UGT1A1) promoter: A balanced polymorphism for regulation of bilirubin metabolism? Proc Natl Acad Sci USA 95, 8170–8174.
Borlak, J., Thum, T., Landt, O., Erb, K. & Hermann, R. (2000) Molecular diagnosis of a familial nonhemolytic hyperbilirubinemia (Gilbert’s syndrome) in healthy subjects. Hepatology 32, 792–795.
Bosma, P. J., Chowdhury, J. R., Bakker, C., Gantla, S., De Boer, A., Oostra, B. A., Lindhout, D., Tytgat, G. N., Jansen, P. L. & Oude Elferink, R. P. (1995) The genetic basis of the reduced expression of bilirubin UDP-glucuronosyltransferase 1 in Gilbert’s syndrome. N Engl J Med 333, 1171–1175.
Bosma, P. J., Seppen, J., Goldhoorn, B., Bakker, C., Oude Elferink, R. P., Chowdhury, J. R., Chowdhury, N. R. & Jansen, P. L. (1994) Bilirubin UDP-glucuronosyltransferase 1 is the only relevant bilirubin glucuronidating isoform in man. J Biol Chem 269, 17960–17964.
Bougouma, A., Ilboudo, P. D., Bonkoungou, P., Sombie, R. & Siko, A. (1999) La maladie de Gilbert chez le Noir Africain: A propos de 4 observations au Centre Hospitalier National de Ouagadougou. Medecine d’Afrique Noire 45, 613–617.
Buyukasik, Y., Akman, U., Buyukasik, N. S., Goker, H., Kilicarslan, A., Shorbagi, A. I., Hascelik, G. & Haznedaroglu, I. C. (2008) Evidence for higher red blood cell mass in persons with unconjugated hyperbilirubinemia and Gilbert’s syndrome. Am J Med Sci 335, 115–119.
Carpenter, S. L., Lieff, S., Howard, T. A., Eggleston, B. & Ware, R. E. (2008) UGT1A1 promoter polymorphisms and the development of hyperbilirubinemia and gallbladder disease in children with sickle cell anemia. Am J Hematol 63, 800–803.
Chaar, V., Keclard, L., Diara, J. P., Leturdu, C., Elion, J., Krishnamoorthy, R., Clayton, J. & Romana, M.
(2005) Association of UGT1A1 polymorphism with prevalence and age at onset of cholelithiasis in sickle cell anemia. Haematologica 90, 188–199.
Cote, J. F., Kirzin, S., Kramar, A., Mosnier, J. F., Diebold, M. D., Soubeyran, I., Thirouard, A. S., Selves, J., Laurent-Puig, P. & Ychou, M. (2007) UGT1A1 polymorphism can predict hematologic toxicity in patients treated with irinotecan. Clin Cancer Res 13, 3269–3275.
Danoff, T. M., Campbell, D. A., McCarthy, L. C., Lewis, K. F., Repasch, M. H., Saunders, A. M., Spurr, N. K., Purvis, I. J., Roses, A. D. & Xu, C. F. (2004) A Gilbert’s syndrome UGT1A1 variant confers susceptibility to tranilast-induced hyperbilirubinemia. Pharmacogenomics J 4, 49–53.
Excoffier, L., Laval, G. & Schneider, S. (2005) Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform Online 1, 47–50.
Gwee, K. A., Koay, E. S. & Kang, J. Y. (1992) The prevalence of isolated unconjugated hyperbilirubinaemia (Gilbert’s syndrome) in subjects attending a health screening programme in Singapore. Singapore Med J 33, 588–589.
Heeney, M. M., Howard, T. A., Zimmerman, S. A. & Ware, R. E. (2003) UGT1A promoter polymorphisms influence bilirubin response to hydroxyurea therapy in sickle cell anemia. J Lab Clin Med 141, 279–282.
Hong, A. L., Huo, D., Kim, H. J., Niu, Q., Fackenthal, D. L., Cummings, S. A., John, E. M., West, D. W., Whittemore, A. S., Das, S. & Olopade, O. I. (2007) UDP-glucuronosyltransferase 1A1 gene polymorphisms and total bilirubin levels in an ethnically diverse cohort of women. Drug Metab Dispos 35, 1254–1261.
Hoskins, J. M., Goldberg, R. M., Qu, P., Ibrahim, J. G. & McLeod, H. L. (2007) UGT1A1*28 genotype and irinotecan-induced neutropenia: Dose matters. J Natl Cancer Inst 99, 1290–1295.
Hsieh, T. Y., Shiu, T. Y., Huang, S. M., Lin, H. H., Lee, T. C., Chen, P. J., Chu, H. C., Chang, W. K., Jeng, K. S., Lai, M. M. & Chao, Y. C.
(2007) Molecular pathogenesis of Gilbert’s syndrome: Decreased TATA-binding protein binding affinity of UGT1A1 gene promoter. Pharmacogenet Genomics 17, 229–236.
Innocenti, F., Grimsley, C., Das, S., Ramirez, J., Cheng, C., Kuttab-Boulos, H., Ratain, M. J. & Di Rienzo, A. (2002) Haplotype structure of the UDP-glucuronosyltransferase 1A1 promoter in different ethnic groups. Pharmacogenetics 12, 725–733.
Innocenti, F., Liu, W., Chen, P., Desai, A. A., Das, S. & Ratain, M. J. (2005) Haplotypes of variants in the UDP-glucuronosyltransferase1A9 and 1A1 genes. Pharmacogenet Genomics 15, 295–301.
Innocenti, F., Undevia, S. D., Iyer, L., Chen, P. X., Das, S., Kocherginsky, M., Karrison, T., Janisch, L., Ramirez, J., Rudin, C. M., Vokes, E. E. & Ratain, M. J. (2004) Genetic variants in the UDP-glucuronosyltransferase 1A1 gene predict the risk of severe neutropenia of irinotecan. J Clin Oncol 22, 1382–1388.
Kaplan, M., Slusher, T., Renbaum, P., Essiet, D. F., Pam, S., Levy-Lahad, E. & Hammerman, C. (2008) (TA)n UDP-glucuronosyltransferase 1A1 promoter polymorphism in Nigerian neonates. Pediatr Res 63, 109–111.
Kohle, C., Mohrle, B., Munzel, P. A., Schwab, M., Wernet, D., Badary, O. A. & Bock, K. W. (2003) Frequent co-occurrence of the TATA box mutation associated with Gilbert’s syndrome (UGT1A1*28) with other polymorphisms of the UDP-glucuronosyltransferase-1 locus (UGT1A6*2 and UGT1A7*3) in Caucasians and Egyptians. Biochem Pharmacol 65, 1521–1527.
Kornberg, A. (1942) Latent liver disease in persons recovered from catarrhal jaundice and in otherwise normal medical students as revealed by the bilirubin excretion test. J Clin Invest 21, 299–308.
Kumar, S., Guha, M., Choubey, V., Maity, P., Srivastava, K., Puri, S. K. & Bandyopadhyay, U. (2008) Bilirubin inhibits Plasmodium falciparum growth through the generation of reactive oxygen species.
Free Radic Biol Med 44, 602–613.
Lampe, J. W., Bigler, J., Horner, N. K. & Potter, J. D. (1999) UDP-glucuronosyltransferase (UGT1A1*28 and UGT1A6*2) polymorphisms in Caucasians and Asians: Relationships to serum bilirubin concentrations. Pharmacogenetics 9, 341–349.
Lankisch, T., Moebius, U., Wehmeier, M., Behrens, G., Manns, M., Schmidt, R. & Strassburg, C. (2006) Gilbert’s disease and atazanavir: From phenotype to UDP-glucuronosyltransferase haplotype. Hepatology 44, 1324–1332.
Lankisch, T. O., Behrens, G., Ehmer, U., Mobius, U., Rockstroh, J., Wehmeier, M., Kalthoff, S., Freiberg, N., Manns, M. P., Schmidt, R. E. & Strassburg, C. P. (2009) Gilbert’s syndrome and hyperbilirubinemia in protease inhibitor therapy – an extended haplotype of genetic variants increases risk in indinavir treatment. J Hepatol 50, 1010–1018.
Lankisch, T. O., Schulz, C., Zwingers, T., Erichsen, T. J., Manns, M. P., Heinemann, V. & Strassburg, C. P. (2008) Gilbert’s Syndrome and irinotecan toxicity: Combination with UDP-glucuronosyltransferase 1A7 variants increases risk. Cancer Epidemiol Biomarkers Prev 17, 695–701.
Lin, J. P., O’Donnell, C. J., Schwaiger, J. P., Cupples, L. A., Lingenhel, A., Hunt, S. C., Yang, S. & Kronenberg, F. (2006) Association between the UGT1A1*28 allele, bilirubin levels, and coronary heart disease in the Framingham Heart Study. Circulation 114, 1476–1481.
McPhee, F., Caldera, P., Bemis, G., McDonagh, A., Kuntz, I. & Craik, C. (1996) Bile pigments as HIV-1 protease inhibitors and their effects on HIV-1 viral maturation and infectivity in vitro. Biochem J 320(Pt 2), 681–686.
Menard, V., Girard, H., Harvey, M., Perusse, L. & Guillemette, C. (2009) Analysis of inherited genetic variations at the UGT1 locus in the French-Canadian population. Hum Mutat 30, 677–687.
Najib, F. (1937) Defensive role of bilirubinemia in pneumococcal infection. The Lancet 229, 505–506.
New Drug Application 20-571 - Final label – UGT1A1 Camptosar® (irinotecan HCl).
http://www.fda.gov/MedWatch/safety/2005/Jun_PI/Camptosar_PI.pdf.
Novotny, L. & Vitek, L. (2003) Inverse relationship between serum bilirubin and atherosclerosis in men: A meta-analysis of published studies. Exp Biol Med (Maywood) 228, 568–571.
Odeberg, J. M., Andrade, J., Holmberg, K., Hoglund, P., Malmqvist, U. & Odeberg, J. (2006) UGT1A polymorphisms in a Swedish cohort and a human diversity panel, and the relation to bilirubin plasma levels in males and females. Eur J Clin Pharmacol 62, 829–837.
Owens, D. & Evans, J. (1975) Population studies on Gilbert’s syndrome. J Med Genet 12, 152–156.
Passon, R. G., Howard, T. A., Zimmerman, S. A., Schultz, W. H. & Ware, R. E. (2001) Influence of bilirubin uridine diphosphate-glucuronosyltransferase 1A promoter polymorphisms on serum bilirubin levels and cholelithiasis in children with sickle cell anemia. J Pediatr Hematol Oncol 23, 448–451.
Premawardhena, A., Fisher, C. A., Liu, Y. T., Verma, I. C., De Silva, S., Arambepola, M., Clegg, J. B. & Weatherall, D. J. (2003) The global distribution of length polymorphisms of the promoters of the glucuronosyltransferase 1 gene (UGT1A1): Hematologic and evolutionary implications. Blood Cells Mol Dis 31, 98–101.
Rodriguez-Novoa, S., Martin-Carbonero, L., Barreiro, P., Gonzalez-Pardo, G., Jimenez-Nacher, I., Gonzalez-Lahoz, J. & Soriano, V. (2007) Genetic factors influencing atazanavir plasma concentrations and the risk of severe hyperbilirubinemia. AIDS 21, 41–46.
Sampietro, M., Lupica, L., Perrero, L., Romano, R., Molteni, V. & Fiorelli, G. (1998) TATA-box mutant in the promoter of the uridine diphosphate glucuronosyltransferase gene in Italian patients with Gilbert’s syndrome. Ital J Gastroenterol Hepatol 30, 194–198.
Stocker, R., Yamamoto, Y., McDonagh, A. F., Glazer, A. N. & Ames, B. N. (1987) Bilirubin is an antioxidant of possible physiological importance. Science 235, 1043–1046.
Temme, E. H., Zhang, J., Schouten, E. G. & Kesteloot, H.
(2001) Serum bilirubin and 10-year mortality risk in a Belgian population. Cancer Causes Control 12, 887–894.
Veeramah, K. R., Thomas, M. G., Weale, M. E., Zeitlyn, D., Tarekegn, A., Bekele, E., Mendell, N. R., Shephard, E. A., Bradman, N. & Phillips, I. R. (2008) The potentially deleterious functional variant flavin-containing monooxygenase 2*1 is at high frequency throughout sub-Saharan Africa. Pharmacogenet Genomics 18, 877–886.
Zhang, D., Chando, T., Everett, D., Patten, C., Dehal, S. & Humphreys, W. (2005) In vitro inhibition of UDP glucuronosyltransferases by atazanavir and other HIV protease inhibitors and the relationship of this property to in vivo bilirubin glucuronidation. Drug Metab Dispos 33, 1729–1739.
Zucker, S. D., Horn, P. S. & Sherman, K. E. (2004) Serum bilirubin levels in the U.S. population: Gender effect and inverse correlation with colorectal cancer. Hepatology 40, 827–835.
Supporting Information
Additional supporting information may be found in the online version of this article:
Table S1 Taqman primers. PBREM = phenobarbital response enhancer module
Table S2 Sequencing primers.
Table S3 Raw genotype data by country and country subgroup.
Table S4A Pairwise FST values for (TA)n for regions with significance values.
Table S4B Pairwise FST values for (TA)n for countries with significance values.
Table S4C Pairwise FST values for (TA)n ethnic/uniform country subgroups (≥40 members) with significance values.
Table S5 Pairwise D values for the UGT1A loci rs8175347, rs10929302, and rs11692021 in countries and ethnic groups.
Table S6 p-Values with standard deviations for exact tests of differentiation of UGT1A haplotype frequencies among samples.
Table S7 Haplotypes comprising all five loci inferred using PHASE (on the separate groups), and their frequencies, for a subset of African samples compared with the probable ancestral alleles (as found in primates).
Figure S1 Diversity indices (h) for the UGT1A haplotypes encompassing loci rs8175347, rs10929302, and rs11692021.
As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials are peer-reviewed and may be reorganised for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.
Received: 5 August 2010 Accepted: 29 November 2010
© 2011 The Authors. Annals of Human Genetics © 2011 Blackwell Publishing Ltd/University College London