doi: 10.1046/j.1529-8817.2003.00062.x

An Analysis of Consanguinity and Social Structure Within the UK Asian Population Using Microsatellite Data

A. D. J. Overall1,‡,∗, M. Ahmad1, M. G. Thomas2 and R. A. Nichols1
1 School of Biological Sciences, Queen Mary, University of London, London E1 4NS, UK
2 The Centre for Genetic Anthropology, University College London, London WC1E 6BT, UK

Summary

We analysed microsatellite genotypes sampled from the Pakistani and Indian communities in Nottingham, UK, to investigate the genetic consequences of substructuring mediated by traditional marriage customs. The application of a recently developed likelihood approach identified significant levels of population substructure within the Pakistani community as a whole, as well as within the finer divisions of castes and biradheri. In addition, high levels of cryptic or unacknowledged consanguinity were detected within subgroups of this community, including biradheri. The Indian sample showed no significant evidence of either substructure or consanguinity. We demonstrate that estimates of disease gene frequencies can be inaccurate unless they are made jointly with estimates of population substructure and consanguinity (θ ≡ F_ST and C). The magnitude of these estimates also highlights the importance of accounting for the finer scale of social structuring when making decisions regarding the risk of recessive disorders in offspring.

Keywords: Consanguinity, substructure, inbreeding, microsatellites, UK Asian population.

Introduction

The UK Asian population, now numbering over a million individuals, was largely founded by migrants from the Indian sub-continent during the 1950s. This particular episode of migration was motivated by numerous factors, including large dam building projects in Kashmir and the booming textile industry throughout many of the UK Midland cities at this time.
The majority of these early founders originated from just a few locations throughout the sub-continent, namely the Punjab, Gujarat and, to a smaller extent, Bangladesh. Each of these regions is quite distinctive in its religions and culture which, in turn, have influenced the pattern of settlement of family members and the subsequent development of communities (Ballard, 1994). Although marriage traditions were not abandoned on arrival, the distance between Britain and the sub-continent has modified the traditional relationships between couples, particularly within cultures known to have quite elaborate marriage practices, such as consanguineous unions within Pakistani Muslim communities. An earlier study (Darr & Modell, 1988) identified an increase in consanguinity within a British-Pakistani Muslim community and suggested that this increase resulted from the direct effects of migration. Generally, however, studies of social structure have focused largely on the people of the sub-continent itself. For example, consanguinity has been estimated to range between 38 and 49% within the Punjab (Bittles et al. 1992) and differentiation between eight Indian Hindu castes has been estimated to be high (G_ST = 0.04; Mukherjee et al. 1999). How these studies relate to the fine-scale social structure of the present Asian population of the UK is largely unknown.

∗ Correspondence to: Dr. Andrew D. J. Overall, Institute of Cell, Animal and Population Biology, Ashworth Laboratories, West Mains Road, University of Edinburgh, Edinburgh EH9 3JT, UK. Tel: +44 (0)0131 651 3047. Fax: +44 (0)0131 650 5455. E-mail: andy.overall@ed.ac.uk
‡ Present address: Institute of Cell, Animal and Population Biology, Ashworth Laboratories, West Mains Road, University of Edinburgh, Edinburgh EH9 3JT, UK.
© University College London 2003. Annals of Human Genetics (2003) 67, 525–537.

For this study,
genetic and sociological data were collected from the Asian community of Nottingham, UK, to measure the degree of consanguinity and substructure within this community and to identify any correlation between social networks, such as caste and biradheri (extended family networks) and patterns of multilocus homozygosity. Consanguinity and substructure within populations both have their consequences for health. Consanguinity, usually defined as marriage between second cousins or closer, can lead to an increased birth prevalence of recessive disorders through inheritance of a copy of a recessive allele from each parent (Devi et al. 1987; Chitty & Winter, 1989; Özlap et al. 1990; Bundey & Alam, 1993; Zlotogora, 1997; Hutchesson et al. 1998; Modell & Darr, 2002). There is also evidence that consanguinity can lead to an increased likelihood of spontaneous abortion (Hussain, 1998) and it has been estimated that first cousin progeny experience 4.4% more pre-reproductive mortalities than non-consanguineous offspring (Bittles & Neel, 1994). There is also evidence that consanguineous couples are more likely to have had consanguineous parents (Darr & Modell, 1988; Hussain, 1998), resulting in an accumulation of identity of alleles at loci above that expected simply through the consideration of parental relatedness. A detailed review of the outcomes of consanguineous offspring and its relevance to clinical genetics is given by Bittles (2001). Substructure within populations can also lead to a localised increase in the frequency of a recessive allele, and hence disorder (Heinisch et al. 1995; Bittles, 2001), but there are important differences with the case of consanguinity (Overall et al. 2002). In a subdivided population, genetic drift and founder events can elevate the local frequency of deleterious alleles, especially in societies that have strongly endogamous kinship groups. 
Because the increased frequency is due to chance effects, different alleles are likely to benefit in different subpopulations. For example, certain inherited disorders have been found to occur almost exclusively within individual tribal groups in Oman (Rajib & Patton, 1999). On the other hand, polygenic disorders due to epistatic interactions between alleles at multiple genes are more likely in populations with consanguineous marriage, as consanguineous offspring have an elevated probability of identity by descent (IBD) across the whole genome.

Despite these differences in pattern of disease within populations, differentiating consanguinity from population substructure using genetic information is not practicable using conventional genetic analyses. The alternative of asking about marriage patterns has its own difficulties. In part, this is because of the sensitive nature of kin-relationships; there is often difficulty in obtaining reliable information orally. Furthermore, there may be unrecognised patterns due to the interplay between traditional marriage practices and other factors such as small population size or elevated endogamy in poorly defined subgroups. Small population size in some UK Asian populations, for example, may lead to cryptic consanguinity within traditionally caste endogamous communities. In addition, substructure may occur within castes through the formation of extended families, or biradheri. It could be important to recognise this finer scale of social structuring within the UK Asian communities, particularly when offering advice relating to the potential risks of genetic disorders within future offspring. This study addresses these issues by quantifying the relative contributions of consanguinity and substructure, at the caste and biradheri level, using a recently developed method that distinguishes between the two by applying genotypic information (Overall & Nichols, 2001).
Materials and Methods

Study Populations

A research worker of Punjabi descent and fluent in Punjabi collected information using a simple questionnaire. The location of the subject's family origin was noted, where the origin refers to the previous home within the sub-continent of the subject's family. Information regarding caste and parental relatedness was also collected. For the questionnaire, both caste and biradheri were recorded. Pakistani biradheri have been treated in the literature as equivalent traditional social/occupational groups to the Indian castes (Shami et al. 1994; Wang et al. 2000). Biradheri has also been used to identify a sub-caste (Vertovec, 1994) or a localised endogamous kin-network, within which endogamous marriages are arranged (Ballard, 1990; Shaw, 1994). By this definition, individuals belonging to a caste, such as Rajput, can belong to different and relatively unconnected biradheri within this caste. We attempted to identify individuals belonging to a tight-knit kin-network within otherwise potentially substructured castes. A combination of both substructure and consanguinity may therefore be expected within the UK Asian communities.

The data were obtained on a voluntary and anonymous basis at a GP surgery in Nottingham, in accordance with ethical committee guidelines (ELCHA Research Ethics Committee). Staff ensured that samples were not obtained from close relatives, although more distantly related individuals, such as those belonging to the same biradheri, could not be easily identified. In total, 188 individuals contributed to the questionnaire and had buccal-scrape samples collected for subsequent genetic analysis.

Genetic Data

DNA was extracted from buccal swabs using the Chelex procedure (Walsh et al.
1991) and amplified at ten microsatellite loci: D2S1338, D3S1358, D8S1179, D16S539, D18S51, D19S433, D21S11, FGA, THO1 and vWA, using SGM Plus fluoro-labelled primers (P.E. Biosystems, 1999). The PCR products were run through a 5% polyacrylamide gel on an ABI 377 sequencer and the fragments were analysed using GeneScan and Genotyper software.

Genetic Analysis

The heterozygosity for each of the loci is given in Table 2a, along with the heterozygosity for two broadly sampled populations for comparison: US Afro-Caribbean (n = 195) and US Caucasian (n = 200); data obtained from the P.E. Biosystems User Manual (1999). Differences in allele frequency distributions were quantified as F_ST values estimated for each locus using the Arlequin software package (Schneider et al. 2000), and results are given in Table 2b. Genetic identity was treated in a traditional hierarchical fashion that requires identification of alleles within individuals, subpopulations and total populations (Weir & Cockerham, 1984). For the subsets of individuals who could be identified as belonging to a specific group, such as a caste or ancestral country of origin, F-statistics were calculated using the Arlequin software package (Schneider et al. 2000). Estimates of F_ST values for each identified Asian location represented by approximately 20 individuals (India, Pakistan, Jullundur, Mirpur and Kashmir) are given in Table 2c. Among the Indian castes, only two qualified (Jat-Sikh and Hindu-Khatri), whereas among the Pakistani castes only Rajput numbered above 20 individuals. The remainder of the castes were pooled and are referred to as 'non-Rajput'. It is believed by some historians that the Rajput now represents the amalgamation of 36 Hindu warrior castes on conversion to Islam, as well as comprising individuals self-identifying with this group (Wang et al. 2000).
Nevertheless, inter-caste differentiation may still be recognisable due to ancient isolation, albeit with an expected reduction in power through comparing heterogeneous populations. Estimates of F_IS were calculated for the same groups and are given in Table 3.

This hierarchical F-statistics approach is possible when individuals can be reliably allocated into a series of subgroups. Using AMOVA (Arlequin 2000; Schneider et al. 2000), for example, would identify which level of subdivision results in the most differentiation. However, this approach requires that individuals be allocated to specific groups, and this allocation is difficult for the UK Asian population as social boundaries, rather than geographic barriers, may be influencing reproductive isolation. Unfortunately, social barriers within any community are difficult to delineate. Identifying castes within Pakistani communities, therefore, unlike Indian communities, may not accurately reflect endogamous groups. Incorrectly pooling differentiated subgroups will result in an increase in homozygosity over Hardy-Weinberg expectations, the Wahlund effect, which might be falsely attributed to consanguinity. In those cases where there was unexplained excess homozygosity we used the method of Overall & Nichols (2001) to analyse the genetic data. This approach estimates the relative contributions of substructure and consanguinity to excess homozygosity without prior assignment of specific subgroups.

Maximum Likelihood Estimation of Consanguinity and Substructure

The logic underpinning the method to distinguish consanguinity and substructure can be clarified by considering the elevated allelic homozygosity that can be observed within subpopulations, which is quantified by F_IS (Wright, 1921; Cockerham, 1973). In a large randomly mating subpopulation, the number of closely related couples is small.
A pattern of marriage that favours higher frequencies of related pairings would elevate the probability of genes being IBD. Consequently, homozygosity in individuals (I) is inflated above random expectations based on the subpopulation (S) frequencies (hence the subscripts I and S). The expected effect on F_IS estimates is essentially given by

$$\hat{F}_{IS} = C \sum_{g=1}^{k} c_g R_g .$$

Here, $c_g$ is the proportion of the consanguines (C) inbred to degree $R_g$ (e.g., $c_1$ is the proportion of the consanguines inbred to degree $R_1$, where $R_1 = 1/16$ for offspring of first cousins; $c_2$ could be the proportion inbred to degree $R_2$, where $R_2 = 1/8$ for offspring of half sibs, and so on, where $\sum_{g=1}^{k} c_g = 1$). Generally, the effect on individuals with $R < 1/32$ is negligible, so practical calculations need only consider the first five categories of inbreeding (up to $k = 5$).

If subpopulations comprising the total population have different allele frequencies, then there will be an excess of homozygosity over expectation for an undivided panmictic population, the Wahlund effect. Some alleles will be locally common in a particular subpopulation (S), and pairs of such alleles drawn from the same subpopulation will occur at a higher rate than pairs of alleles drawn from the total population, predicted from the allele frequencies in the total population (T). This correlation between alleles from the same subpopulation is measured by F_ST, or its equivalent θ (Cockerham, 1973).

It follows that either population substructure, or consanguinity, or some combination of the two could explain the excess homozygosity observed in a population. Consider the case where the excess homozygosity over HW expectations is F = 0.03. Where there is both population substructure and consanguinity this excess implies

$$C \sum_{g=1}^{k} c_g \left[ R_g + (1 - R_g)\theta \right] + (1 - C)\theta = 0.03 .$$
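As a quick numerical check of the expression for the excess homozygosity under combined substructure and consanguinity, the following Python sketch evaluates it for chosen values of C, θ and a set of inbreeding classes. The function name `excess_homozygosity`, its argument layout and the example values are illustrative assumptions and are not part of the original analysis.

```python
import math


def excess_homozygosity(C, theta, c, R):
    """Expected excess homozygosity over Hardy-Weinberg proportions.

    C     -- proportion of the population with consanguineous parents
    theta -- magnitude of substructure (F_ST)
    c, R  -- parallel sequences: c[g] is the share of consanguines inbred
             to degree R[g]; the shares must sum to 1
    (Names and layout are hypothetical, chosen for this illustration.)
    """
    if len(c) != len(R):
        raise ValueError("c and R must have the same length")
    if not math.isclose(sum(c), 1.0):
        raise ValueError("the proportions c must sum to 1")
    # Consanguines carry their own inbreeding plus the background substructure.
    consanguines = C * sum(cg * (Rg + (1 - Rg) * theta) for cg, Rg in zip(c, R))
    # Everyone else is affected by substructure only.
    others = (1 - C) * theta
    return consanguines + others


FIRST_COUSIN = 1 / 16  # R for offspring of first cousins

# Substructure only: the excess equals theta.
substructure_only = excess_homozygosity(0.0, 0.03, [1.0], [FIRST_COUSIN])

# Consanguinity only: C / 16 = 0.03 requires C = 0.48, about half the sample.
consanguinity_only = excess_homozygosity(0.48, 0.0, [1.0], [FIRST_COUSIN])

# An arbitrary mixture of the two effects.
mixed = excess_homozygosity(0.2, 0.02, [1.0], [FIRST_COUSIN])

print(substructure_only, consanguinity_only, mixed)
```

The first two calls reproduce the same excess of 0.03 from entirely different causes, which is why the excess alone cannot separate the two effects.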
Notice that in the extreme case of no consanguineous pairings (C = 0) this reduces to θ = 0.03, so that the excess is explained entirely by differentiation in allele frequencies between the subpopulations, in accordance with Wright's island model (Wright, 1931). Conversely, if there is no population substructure (θ = 0) we obtain $C \sum_{g=1}^{k} c_g R_g = 0.03$, so that the effect is accounted for by consanguinity.

In the case of substructure alone, it is important to appreciate that θ relates to the increased probability of IBD at each locus within every individual. The effect of consanguinity is quite different. For example, if the excess homozygosity has come about through first cousin unions, such that $C \times 1/16 = 0.03$, then, clearly, only a proportion of the population is contributing to this excess, about 50% (C ≈ 0.5). It is only these individuals that are expected to have an increased probability of alleles being IBD at each of their loci. The remaining 50% of the sample are expected to have genotypes corresponding to HW expectations. For this reason, the distribution of the number of homozygous loci within an individual is different for each of these two scenarios. It is these differences in the distribution of homozygous loci within individuals that allow the relative contributions of consanguinity and substructure to be estimated (Overall & Nichols, 2001).

The parameters we are interested in estimating are C, the proportion of the population inbred through consanguineous parents, and θ, the magnitude of substructure. Because we expect consanguinity to be largely between first-cousin unions, as observed by Darr & Modell (1988), we set, from our general treatment above, $c_g = 1$ and $R = 1/16$.
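With R fixed, estimation becomes a search over two parameters. The sketch below shows one way the per-individual mixture likelihood that the paper gives as equation (1) could be evaluated on a grid. It is a minimal illustration, not the authors' implementation: the function names, the data layout, the toy genotypes and allele frequencies, and the grid bounds (C from 0 to 1, θ from 0 to 0.10, with 100 steps each for 10,000 combinations) are all assumptions.

```python
import math

R_FIRST_COUSIN = 1 / 16  # inbreeding coefficient assumed for all consanguines


def genotype_prob(p_i, p_j, homozygous, F):
    """Single-locus genotype probability given a total IBD probability F."""
    if homozygous:
        return p_i * (F + (1 - F) * p_i)
    return 2 * p_i * p_j * (1 - F)


def log_likelihood(sample, freqs, C, theta, R=R_FIRST_COUSIN):
    """Log-likelihood of the sample for one (C, theta) pair.

    sample -- list of individuals; each is a list of (allele, allele) tuples
    freqs  -- list of dicts, one per locus, mapping allele -> frequency
    """
    F_out = theta                  # non-consanguineous: substructure only
    F_in = R + (1 - R) * theta     # consanguineous: R on top of substructure
    total = 0.0
    for individual in sample:
        like_out = 1.0
        like_in = 1.0
        for locus, (a, b) in enumerate(individual):
            p_a, p_b = freqs[locus][a], freqs[locus][b]
            like_out *= genotype_prob(p_a, p_b, a == b, F_out)
            like_in *= genotype_prob(p_a, p_b, a == b, F_in)
        # Each individual is a mixture of the two classes.
        total += math.log((1 - C) * like_out + C * like_in)
    return total


def grid_search(sample, freqs, steps=100, theta_max=0.10):
    """Evaluate steps x steps parameter pairs; return (logL, C, theta) at the maximum."""
    best = (-math.inf, None, None)
    for i in range(steps):
        C = i / (steps - 1)
        for j in range(steps):
            theta = theta_max * j / (steps - 1)
            ll = log_likelihood(sample, freqs, C, theta)
            if ll > best[0]:
                best = (ll, C, theta)
    return best


# Toy data (invented for illustration): 3 loci, 4 individuals.
freqs = [
    {"a": 0.5, "b": 0.3, "c": 0.2},
    {"a": 0.6, "b": 0.4},
    {"a": 0.25, "b": 0.25, "c": 0.5},
]
sample = [
    [("a", "a"), ("a", "b"), ("c", "c")],
    [("a", "b"), ("a", "a"), ("b", "c")],
    [("b", "b"), ("b", "b"), ("a", "a")],
    [("a", "c"), ("a", "b"), ("a", "c")],
]
best_ll, best_C, best_theta = grid_search(sample, freqs)
print(best_ll, best_C, best_theta)
```

Working with log-likelihoods avoids numerical underflow when many individuals are multiplied together. With only a handful of individuals and loci, as in this toy sample, the surface is expected to be very flat, so the maximum carries little information.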
The θ and C parameters are estimated using $L$, the likelihood of the sample:

$$L = \prod_{\text{Individuals}} \left[ (1 - C) \prod_{\text{Loci}} \begin{cases} p_i \left( \theta + (1 - \theta) p_j \right) & i = j \\ 2 p_i p_j (1 - \theta) & i \neq j \end{cases} \; + \; C \prod_{\text{Loci}} \begin{cases} p_i \left( R + (1 - R)(\theta + (1 - \theta) p_j) \right) & i = j \\ 2 p_i p_j (1 - R)(1 - \theta) & i \neq j \end{cases} \right] \qquad (1)$$

where $p_i$ is the allele frequency for allele i (Overall & Nichols, 2001). This method was applied to 10,000 combinations of values for C and θ, resulting in a grid of likelihood values. Figure 1 illustrates the results of equation (1) for sub-samples of the data that gave positive estimates of F_IS (Table 3).

Results

The Asian population of Nottingham broadly consists of individuals originating from two principal regions of the Punjab: Mirpur and Jullundur. Mirpur is located within the Kashmir area of Pakistan and Jullundur in the northwest of India, approximately 150 km southeast of Mirpur. Although these two locations are the most common regions of origin in our sample, a further 19 locations of variable specificity were identified through a questionnaire (Table 1). The sampled individual's caste is also presented in Table 1, along with the number of individuals identified as being offspring of 1st and 2nd cousin unions and the number belonging to a biradheri.

Multiple origins were not observed in our sample (i.e., the recent ancestry of the volunteer was reported to have originated in the same location in India or Pakistan through both parents). The age structure of the volunteers was broad, between 20 and 65 years (data not shown), and included migrants as well as offspring and grandchildren of the migrant generation. The specificity of the locations recorded varies; for example, nine individuals gave the Punjab as a location and 28 gave Kashmir, both of which cover large areas. Other individuals specified cities. However, the majority of the individuals' ancestries could be traced back to just three locations: Jullundur (51% of Indians), Mirpur and Kashmir (38% and 27% of Pakistanis respectively). These proportions are typical of those observed throughout British Pakistanis and Indians (Ballard, 1994). Castes are also unevenly represented, with Jat-Sikh being the most prevalent of the Indian castes (at 65%), and Rajput the most prevalent of the Pakistani castes (56%). The prevalence of the Rajput caste might be indicative to some degree of the self-identification of non-Rajput individuals that occurred on their conversion to Islam (Wang et al. 2000).

Of the 188 individuals scored for 10 loci, 96% of the genotypes gave reliable results, with 5 samples failing to amplify at any loci. Table 2b indicates that both the Indian and Pakistani samples are significantly different from both Afro-Caribbean and US Caucasian over all loci, albeit with low F_ST values.

Table 1 Sample data: numbers of individuals (I), 1st cousin offspring (1C), 2nd cousin offspring (2C) and individuals belonging to biradheri (B), grouped into country of origin, location within country and caste.

| Country of origin | I | 1C | 2C | B |
|---|---|---|---|---|
| India | 87 | 1 | 0 | 2 |
| Pakistan | 101 | 21 | 10 | 27 |

| Country | Location | I |
|---|---|---|
| India | Jullundur | 44 |
| India | Ludihana | 9 |
| India | Punjab | 9 |
| India | Amristar | 8 |
| India | Hoshiapur | 7 |
| India | Phagare | 3 |
| India | Delhi | 1 |
| India | Gujurat | 1 |
| India | Jammu | 1 |
| India | Khushab | 1 |
| India | Purawara | 1 |
| India | India | 2 |
| Pakistan | Mirpur | 39 |
| Pakistan | Kashmir | 28 |
| Pakistan | Lahore | 11 |
| Pakistan | Jhelum | 6 |
| Pakistan | Rawalpindi | 6 |
| Pakistan | Islamabad | 4 |
| Pakistan | Sargodar | 2 |
| Pakistan | Chawaal | 1 |
| Pakistan | Faislabad | 1 |
| Pakistan | Gujaranwala | 1 |
| Pakistan | Sailkot | 1 |
| Pakistan | Sial | 1 |

| Country | Caste | I |
|---|---|---|
| India | Jat-Sikh | 56 |
| India | Hindu-Khatri | 19 |
| India | Hindu-Brahmin | 4 |
| India | Hindu-Pandit | 1 |
| India | Hindu-Rai | 1 |
| India | Sikh-Ramgara | 1 |
| India | Sikh-Sein | 1 |
| India | Muslim-Gujur | 2 |
| India | Muslim-Jat | 1 |
| India | None stated | 1 |
| Pakistan | Rajput | 41 |
| Pakistan | Bains | 13 |
| Pakistan | Gujar | 6 |
| Pakistan | Kashmiri | 6 |
| Pakistan | Moghul | 4 |
| Pakistan | Jat | 2 |
| Pakistan | Sheikh | 2 |
| Pakistan | Bhatt | 1 |
| Pakistan | Chaudri | 1 |
| Pakistan | Syed | 1 |
| Pakistan | None stated | 24 |

[Table 1 also lists 1C, 2C and B counts by location and by caste. Their row alignment is lost in this copy; the values as they appear are, for locations: 1; 1 1 13 7 1 6 4 10 9 1 1 3 1 1 1; and for castes: 1 8 5 1 2 1; 1 1 4 3 1 1 14 5 1 1 2 1 1 3 1 1 3.]
Table 2c gives the values of F_ST for those groups that are numerically well represented (N ≈ 20). Whether countries (India and Pakistan), regions (Jullundur, Mirpur and Kashmir) or castes (Jat-Sikh, Hindu-Khatri and Rajput) were compared with each other, significant differences calculated as F_ST were not observed. Artificial clusters such as Jullundur vs non-Jullundur and Mirpur/Kashmir vs non-Mirpur/Kashmir also gave non-significant values of F_ST (results not shown). The largest value of F_ST observed was that between Rajput and the remainder of the Pakistani caste members (non-Rajput).

Table 2a Expected heterozygosities for the 10 SGM Plus loci (AC = Afro-Caribbean, USC = US Caucasian from P.E. Biosystems (1999))

| Locus | India | Pakistan | AC | USC |
|---|---|---|---|---|
| D2 | 0.885 | 0.875 | 0.893 | 0.885 |
| D3 | 0.732 | 0.614 | 0.754 | 0.787 |
| D8 | 0.793 | 0.792 | 0.788 | 0.796 |
| D16 | 0.838 | 0.707 | 0.804 | 0.750 |
| D18 | 0.855 | 0.716 | 0.876 | 0.872 |
| D19 | 0.768 | 0.772 | 0.847 | 0.782 |
| D21 | 0.813 | 0.890 | 0.863 | 0.838 |
| FGA | 0.885 | 0.928 | 0.861 | 0.861 |
| THO1 | 0.840 | 0.850 | 0.744 | 0.761 |
| vWA | 0.780 | 0.750 | 0.817 | 0.806 |

Table 2b F_ST values (AC = Afro-Caribbean, USC = US Caucasian from P.E. Biosystems (1999))

| Locus | India-AC | Pakistan-AC | India-USC | Pakistan-USC | AC-USC |
|---|---|---|---|---|---|
| D2 | 0.006∗ | 0.010∗ | 0.000 | 0.001 | 0.006∗ |
| D3 | 0.000 | 0.001 | 0.002 | 0.000 | 0.003 |
| D8 | 0.006∗ | 0.006∗ | 0.002 | 0.003 | 0.009∗ |
| D16 | 0.000 | 0.000 | 0.001 | 0.004 | 0.005∗ |
| D18 | 0.003 | 0.000 | 0.004 | 0.005∗ | 0.007∗ |
| D19 | 0.003 | 0.002 | 0.003 | 0.003 | 0.001∗ |
| D21 | 0.005 | 0.001 | 0.001 | 0.002 | 0.003 |
| FGA | 0.000 | 0.000 | 0.004 | 0.003 | 0.004∗ |
| THO1 | 0.015∗ | 0.019∗ | 0.004 | 0.013∗ | 0.027∗ |
| vWA | 0.003 | 0.007∗ | 0.003 | 0.003 | 0.008∗ |
| Total | 0.003∗ | 0.003∗ | 0.002∗ | 0.003∗ | 0.007∗ |

∗ F_ST significantly greater than 0 (p < 0.05).

Table 2c F_ST estimates between identifiable subgroups of Asian sample

| Group | F_ST |
|---|---|
| India, Pakistan | 0.0006 |
| Jullundur, Mirpur, Kashmir | 0.0000 |
| Jat-Sikh, Hindu-Khatri | 0.0000 |
| Rajput, Non-Rajput | 0.0058 |
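To make the quantities in Tables 2a to 2c concrete, the following sketch computes expected heterozygosity from allele frequencies and a simple heterozygosity-based F_ST between two populations. It is an illustration only: it uses the textbook ratio (H_T − H_S)/H_T with equal population weights and no sample-size correction, which is an assumption made here and is not the estimator implemented in Arlequin. The function names and the example allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one locus.

    freqs -- dict mapping allele -> frequency (frequencies sum to 1)
    """
    return 1.0 - sum(p * p for p in freqs.values())


def fst_two_populations(freqs_a, freqs_b):
    """Heterozygosity-based F_ST for two equally weighted populations.

    H_S is the mean within-population heterozygosity and H_T the
    heterozygosity of the pooled allele frequencies.
    """
    alleles = set(freqs_a) | set(freqs_b)
    pooled = {
        allele: 0.5 * (freqs_a.get(allele, 0.0) + freqs_b.get(allele, 0.0))
        for allele in alleles
    }
    h_s = 0.5 * (expected_heterozygosity(freqs_a) + expected_heterozygosity(freqs_b))
    h_t = expected_heterozygosity(pooled)
    if h_t == 0.0:
        return 0.0  # a locus fixed for the same allele carries no information
    return (h_t - h_s) / h_t


# Invented two-allele example: slightly different frequencies in two groups.
pop_a = {"A": 0.6, "B": 0.4}
pop_b = {"A": 0.5, "B": 0.5}
print(expected_heterozygosity(pop_a), fst_two_populations(pop_a, pop_b))
```

Even a ten-point difference in allele frequency gives a small value here, which helps put the low but significant F_ST values of Table 2b in context.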
Table 3 gives estimates of F_IS for the same groups. Small, but non-significant values were generally observed within the Indian groups, with a negative value for the Hindu-Khatri caste. The Pakistani groups, on the other hand, all gave significant positive values of F_IS.

Figures 1 (A-C) show the likelihood plots for the Pakistani, Mirpuri and biradheri samples respectively. The axes are C, the estimated proportion of the sample with consanguineous parents (R = 1/16), and θ, the estimated magnitude of substructure. The envelopes enclose the 10, 50 and 90% critical likelihood values (CLV). These values are found by taking the cumulative sum of the likelihood values such that the corresponding cumulative sum is just less than or equal to 10, 50 or 90% of the total cumulative sum.

Figure 1 Joint estimates of the degree of population substructure (θ) and the proportion of consanguineous unions (C). (A) Likelihood surface for Pakistani sample with 10%, 50% and 90% CLV envelopes. N = 101. (B) Likelihood surface for Mirpuri sample with 10%, 50% and 90% CLV envelopes. N = 39. (C) Likelihood surface for Pakistani biradheri with 10%, 50% and 90% CLV envelopes. N = 28.

Table 3 F_IS estimates for each of the identifiable subgroups of the Asian sample.

| Level | Group | F_IS |
|---|---|---|
| Country | India | 0.00844 |
| Country | Pakistan | 0.04378∗∗ |
| Region | Jullundur | 0.01081 |
| Region | Mirpur | 0.04971∗ |
| Region | Kashmir | 0.08416∗∗ |
| Caste | Jat-Sikh | 0.00887 |
| Caste | Hindu-Khatri | −0.03352 |
| Caste | Rajput | 0.04257∗ |
| Caste | Non-Rajput | 0.05733∗∗ |

∗ p < 0.05, ∗∗ p < 0.01

Discussion

Relative to most European societies, the social structure of the UK Asian population is complex, being largely a result of marriage traditions. Hindu and Sikh communities, for example, are often caste-endogamous. Strict rules of exogamy can also prevail, so that marriage into the father's, mother's, and both parents' mothers' descent group (got) is prohibited, in addition to the reciprocal exchange of women between families (Ballard, 1990). It may be that the negative F_IS value observed for the Hindu-Khatri caste (Table 3) reflects caste endogamy combined with got exogamy. Marriage within Pakistani Muslim societies, however, has fewer exclusions: first order relatives, that is, parents, sibs and parents' sibs (Ballard, 1990). Amongst Punjabi Muslims, an active preference for first cousin marriage is also common (Bittles et al. 1991) and there is evidence that such arrangements are increasing in prevalence (Darr & Modell, 1988). One other important aspect of Pakistani Muslim societies is the extent of family networks that develop through biradheri endogamy, where marriages within these large kin-networks often occur between related individuals (Darr & Modell, 1988; Ballard, 1990).

Despite the recognition of such groups, it is often not possible to identify a simple nested hierarchy of population divisions within which individuals can be placed. It is more often the case that identifiable divisions overlap, making population substructure difficult to quantify. The F_ST values presented in Table 2c show that no significant differentiation was found between the more obvious divisions of the Asian sample. The F_IS estimates in Table 3, however, vary widely between the divisions, with significant positive values estimated for the major Pakistani groups (caste and locations) as well as the total Pakistani sample itself. These samples were inspected further using the likelihood approach (equation (1)).

The excess homozygosity observed for the Pakistani sample appears to be partly attributable to substructure (Figure 1(A)). The maximum likelihood values are θ = 0.029 and C = 0.1.
With this magnitude of variation within the Pakistani subgroup, the F_ST value presented in Table 2c is likely to be an underestimate. The questionnaire identified 20% of the Pakistani individuals as having parents related as first cousins. This proportion appears to be low relative to other studies on UK Pakistani communities (Darr & Modell, 1988; Qureshi, 2003), but is corroborated by the genetic data. The contour of Figure 1(A) shows that the top 10% most likely parameter combinations exclude values of C above 0.3.

The locations with significant F_IS values are Mirpur and Kashmir. The Mirpuri sample (Figure 1(B)) is of interest as, similar to a previous study on the Mirpuri (Overall & Nichols, 2001), it shows high levels of consanguinity, the most likely parameter combination being 39% consanguines (R = 1/16) and θ = 0.015. With N = 39, the CLV envelopes are broad, though the 10% CLV excludes zero for both parameters. Again, Figure 1(B) is in accordance with the questionnaire, which identifies one third of the Mirpuris as having parents related as first cousins. The estimates of C for the Mirpuris are in keeping with other studies of UK Pakistani communities, where estimates are around 55% first cousin marriages (Darr & Modell, 1988; Modell, 1991), a value not excluded by the 10% most likely parameter values. This result suggests some variation in the proportion of consanguinity in this Pakistani sample. The majority of Pakistani immigrants trace their origins to rural Mirpur and, as other studies have observed (Rao & Inbaraj, 1977; Bittles et al. 1991; Bittles, 1994), the highest rates of consanguinity are to be found in rural communities. It appears that this trend has continued in their immigrant descendants. The fact that the Pakistani sample as a whole has a lower value for C might reflect a large proportion of non-Mirpuri individuals descending from more urban locations on the sub-continent.
The likelihood method was unable to distinguish the most likely parameter combination for the Kashmiri sample, most probably due to the small sample size (N = 28). The method is sensitive to sample size either when C is close to 1 or when the magnitude of θ is equivalent to the R value being considered, as the value in Table 3 suggests is the case. Under these conditions the distribution of homozygous genotypes is expected to be similar for both complete consanguinity and substructure, and the two cannot be distinguished. The limitations of the method are detailed in Overall & Nichols (2001). In addition to increasing sample size, the resolution of the method is also greatly improved by the addition of more polymorphic loci.

An issue not generally considered in previous studies of UK Pakistanis is the heterogeneity that can exist within communities through more complex social structuring, which is in contrast to the detailed investigations conducted on the sub-continent (e.g., Hussain, 1998; Wang et al. 2000) and elsewhere around the world (Bittles, 2001). Both of the Pakistani caste groupings gave significant positive F_IS estimates. The non-Rajput group (figure not shown), as expected given the numerous castes and biradheri that it comprises, shows evidence of substructure, with the most likely parameter combination of θ = 0.028 and C = 0. The single Rajput caste (figure not shown) also has a maximum likelihood of θ = 0.037 and C = 0, supporting the Y-chromosome haplotype diversity observed by Wang et al. (2000) in showing evidence of substructure within the Rajput.

Figure 1(C) shows the likelihood contour for individuals whose parents were identified as belonging to the same 'biradheri'. Individuals who gave this response to the questionnaire did so as an alternative to consanguinity, and it is unclear how closely related the parents were.
The excess homozygosity observed within this group is high, and corresponds to both substructure and consanguinity, with a maximum likelihood value of consanguinity equivalent to 47% first cousin offspring and θ = 0.033. Because the estimation of C requires a prior estimate of R, we cannot be certain that the magnitude of excess homozygosity refers to first cousin offspring. If an R-value corresponding to second cousin offspring is input into equation (1) (R = 0.03125), a maximum likelihood estimate of C of around 100% results. It is possible, due to the sensitivity of the questionnaire, that respondents may have preferred to claim that their parents were from the same biradheri rather than acknowledge first cousin consanguinity. Table 1 reveals that, unlike consanguinity, which is largely confined to the rural Mirpur and Kashmir, individuals whose parents belong to the same biradheri are far more widespread, suggesting that a proportion of those questioned gave an accurate response. Whatever the cause, this magnitude of consanguinity is clearly relevant to the genetic health of the UK Pakistani population, as relationships other than acknowledged first cousin offspring have received little attention.

The maximum likelihood estimate for substructure (θ = 0.033) is indicative of the isolation expected to exist between large kin-networks. Such social structuring has been compared with the tribes of the Middle East (Modell & Darr, 2002), with the implication that inherited disorders may be unevenly distributed as a result of restricted intermarriage. The evidence of substructure within the UK Pakistani biradheri group certainly suggests this as a possibility. The magnitude of substructure and consanguinity within our biradheri grouping is consistent with that found by Wang et al. (2000), where biradheri were identified as groups equivalent to the Indian caste. Wang et al.
(2000) identified three biradheri in a study of communities throughout the Punjab: the Awan, Khattar and Rajpoot, and found significant differentiation between them and inbreeding within them at autosomal loci. The result of our likelihood analysis points to a similar conclusion, although it is important to note that the likelihood estimate is an average of the different degrees of endogamy expected within different biradheri, as well as between the individual families within biradheri; differences that were observed in the Pakistani population by Wang et al. (2000). Because only a proportion of those individuals identifying with a caste claimed to belong to a biradheri in our survey, numbers were too few to make meaningful comparisons between biradheri from different caste groups, which would have been a more appropriate comparison with the Wang et al. (2000) study. However these groups are stratified into endogamous units, the Rajput example does provide an indication that social groupings can contain significant diversity, possibly through past social and geographic isolation.

A further comparison was used to test the link with reported consanguinity: estimates from Pakistani individuals who report that their parents are unrelated and not from the same biradheri. The likelihood surface (not shown) for this group, as would be expected if the parents' relationship was accurately reported, gave the most likely parameter combination of zero for both C and θ.

The Indian groups did not show any significant deviations from Hardy-Weinberg equilibrium. This result at first appears surprising, given that caste endogamy may be expected to generate substructure in the total population, as is seen on the sub-continent with Hindu caste and tribal groups (Mukherjee et al. 1999). This may reflect the small sample sizes for all but the Jat-Sikh caste in our data set.
In addition, Sikh caste endogamy is not expected to be adhered to in marriage arrangements as strongly as within Hindu communities (Ballard, 1994). The Jat-Sikh caste (N = 56) was represented by individuals who could trace their ancestry to five different Indian locations. Unlike the Pakistani Rajput caste, which was similarly composed of individuals from various origins, the Jat-Sikh did not show any evidence of substructure. This possibly reflects the fact that the majority (34/56) of the Jat-Sikhs trace their origins to Jullundur, a prosperous and well-connected region of the Punjab likely to have experienced significant influxes from surrounding areas, particularly during the partition (Ballard, 1990). Although many Indian Sikhs are also known to be twice migrants, having first migrated to East Africa before moving on to the UK from the 1960s onwards, these predominantly belong to the Ramgara (Ramgarhia) caste, with the Jat-Sikhs mostly migrating to the UK directly from the Indian sub-continent (Bhachu, 1986). Another possible cause of this result is that the traditional caste endogamy that may still be present on the sub-continent has been disrupted by a combination of restrictive migration policies, settlement, and a community adapting and assimilating into a western society (Ballard, 1990). There is thus little evidence from the Jat-Sikhs that caste endogamy plays an important role within this UK community.

In a previous study on the Jullundur community of Nottingham (Overall & Nichols, 2001), excess homozygosity was observed where the most likely explanation was substructure, and it was suggested that this could be through caste endogamy. However, the Jullundur subsample obtained for this study, with care being taken not to re-sample the same individuals, showed no significant excess of homozygosity (Table 3).
Relevance of Consanguinity and Substructure to Genetic Health

One important implication of our results is that simple methods of estimating disease gene frequencies may need to be modified. Some studies (Hutchesson et al. 1998; Hall et al. 1999) use an equation derived from Dahlberg (1930) and Edwards (1989) to estimate q, the gene frequency of specific autosomal recessive inborn errors of metabolism, from estimates of D, the incidence of the disorder; C, the proportion of the population from consanguineous unions; and R, the inbreeding coefficient, where

$$D = C\,[q\,(R + (1 - R)\,q)] + (1 - C)\,q^{2}.$$

Our results suggest that this approach needs additional modification to include a parameter for substructure. The Pakistani group, for example, showed evidence of substructure where the frequency of a disease gene could vary between subpopulations; estimating q from such a substructured population would lead to an overestimate due to a Wahlund effect. In a case such as this a more complete formulation of disease incidence would be:

$$D = C\,[q\,(R + (1 - R)(\theta + (1 - \theta)\,q))] + (1 - C)\,[q\,(\theta + (1 - \theta)\,q)],$$

where it then follows that

$$q = \frac{(CR(1 - \theta) + \theta) - \sqrt{(CR(1 - \theta) + \theta)^{2} - 4\,(CR(1 - \theta) + \theta - 1)\,D}}{2\,(CR(1 - \theta) + \theta - 1)}. \qquad (2)$$

Figure 2 illustrates the effect of substructure and consanguinity on the estimated frequency, q, where the coefficient of consanguinity is R = 0.0625 and where the observed incidence is 1:10000. This figure shows that, as expected, a given allele frequency produces more homozygotes as consanguinity increases, through the addition of autozygous homozygotes. For a given incidence, therefore, the estimated frequency of the disease allele declines with increasing consanguinity. This effect is more marked with rarer disorders, reflecting the fact that carriers of rare mutations are unlikely to mate with carriers of the same disorder unless they are related (Modell & Darr, 2002). Substructure has a similar effect on this estimation. In these examples, a halving of the estimated gene frequency occurs when either C = 0.614 or θ = 0.048 for D = 1:1000, and when either C = 0.223 or θ = 0.015 for D = 1:10000.

Figure 2 The change in frequency estimates for a recessive allele (q) with C and θ with R = 0.0625 and observed incidence of 1:10000. [Surface plot; axes: C, θ and q; D = 1:10000.]

The magnitude of substructure detected in our survey can have appreciable effects. As an illustration, equation (2) was used to modify the results of Hutchesson et al. (1998) on the estimation of 10 autosomal recessive gene frequencies. Figures 3(A) and 3(B) show, respectively, the disease frequencies and gene frequencies estimated from a North-Western European and a Pakistani group from the West Midlands, UK. Data were obtained from the West Midlands neonatal screening programme and the National Census. Figure 3(A) shows that five out of the ten disorders have significantly different disease frequencies (higher in the Pakistani sample). Figure 3(B) shows the corresponding gene frequency estimates. One set takes into account only consanguinity (filled circles, C = 0.7 and R = 0.0686 according to Hutchesson et al. (1998)) and our new estimates allow for both consanguinity and substructure. One uses the parameter estimates from the Mirpuri (open circles, C = 0.4, R = 0.0625 and θ = 0.01); another from the biradheri (closed triangles, C = 0.5, R = 0.0625 and θ = 0.03) samples. The general trend is as noted by Hutchesson et al. (1998), with only PKU showing any significant differentiation between the two ethnic groups for all three parameter combinations considered.

Figure 3 (A) Incidence frequencies of ten autosomal recessive inborn errors of metabolism in NW European (open squares) and Pakistani (filled circles) populations of the West Midlands, UK. Data from Hutchesson et al. (1998). The error bars represent 95% confidence intervals where the observed cases of each disorder are considered to be Poisson distributed, according to the method of Hutchesson et al. (1998). (B) Estimated gene frequencies of ten autosomal recessive inborn errors of metabolism. Error bars as in Figure 3(A). Open squares: NW European; estimated as √(incidence). Filled circles: Pakistani; estimated using eq. (2) with C = 0.7, R = 0.0686, θ = 0 as in Hutchesson et al. (1998). Open circles: Pakistani; estimated using eq. (2) with C = 0.4, R = 0.0625, θ = 0.01. Filled triangle: Pakistani; estimated using eq. (2) with C = 0.5, R = 0.0625, θ = 0.03. [Vertical axes: disease frequency (A) and gene frequency (B). Disorders shown: phenylketonuria, tyrosinaemia type 1, galactosaemia, cystinosis, hyperoxaluria type 1, MCADD, mucopolysaccharidosis type 1, San Fillipo disease type C, non-ketotic hyperglycinaemia, Niemann Pick disease type C.]
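Equation (2) can be evaluated numerically. The following Python sketch is an illustration only, not the authors' code: the function names `incidence` and `allele_frequency` are hypothetical, and the incidence model is assumed to be D = C[q(R + (1 − R)(θ + (1 − θ)q))] + (1 − C)[q(θ + (1 − θ)q)], which reduces to the Dahlberg-type equation when θ = 0. Collecting terms gives D = aq + (1 − a)q² with a = CR(1 − θ) + θ, and the code takes the root of this quadratic that lies between 0 and 1, which is algebraically the same root as equation (2).

```python
import math


def incidence(q, C, R, theta):
    """Forward model: expected incidence D for recessive allele frequency q."""
    # Probability that the second allele matches the first when it is not
    # identical by descent through the parents' relationship.
    match = theta + (1.0 - theta) * q
    consanguineous = q * (R + (1.0 - R) * match)
    non_consanguineous = q * match
    return C * consanguineous + (1.0 - C) * non_consanguineous


def allele_frequency(D, C, R, theta):
    """Invert the forward model for q (same root as equation 2)."""
    a = C * R * (1.0 - theta) + theta  # linear coefficient; (1 - a) multiplies q**2
    if math.isclose(a, 1.0):
        return D  # degenerate case: the model is linear in q
    discriminant = a * a + 4.0 * (1.0 - a) * D
    return (-a + math.sqrt(discriminant)) / (2.0 * (1.0 - a))


# Parameter sets quoted in the Figure 3 caption, applied to an
# illustrative incidence of 1:10000 (the value used for Figure 2).
D = 1.0 / 10000
for label, C, R, theta in [
    ("no consanguinity or substructure", 0.0, 0.0625, 0.0),
    ("Hutchesson et al. (1998)", 0.7, 0.0686, 0.0),
    ("Mirpuri", 0.4, 0.0625, 0.01),
    ("biradheri", 0.5, 0.0625, 0.03),
]:
    print(f"{label}: q = {allele_frequency(D, C, R, theta):.5f}")
```

Writing the root with a positive discriminant term and a positive denominator avoids dividing one negative quantity by another, but it is only a rearrangement of equation (2).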
In general, the modified gene frequency estimates incorporating the Mirpuri parameters tend to be higher than those from Hutchesson et al. (1998), while those incorporating the biradheri parameters tend to be lower. The biradheri type parameters decrease the gene frequency estimate for tyrosinaemia type 1 such that the differences between the two ethnic groups are no longer significant. The Mirpuri type parameters increase the gene frequency estimate for MCADD (medium chain acyl-CoA dehydrogenase deficiency) such that the differences for this disorder are no longer significant.

Higher disease incidence in Pakistani communities relative to NW European populations has generally been attributed to consanguinity. The results of this study indicate that the magnitude of consanguinity practised amongst certain divisions of the population varies. For example, individuals whose family origins trace back to rural Pakistan, such as Mirpur, are more likely to be consanguines. The Hutchesson et al. (1998) model of inferring gene frequency is appropriate for such cases. Substructure may also contribute to a higher incidence, as we found with the Nottingham Pakistani sample. If the subpopulations could be identified, then greater accuracy would be obtained by treating them separately. However, our results provide evidence of cryptic population structure. Equation (2) is therefore of use to estimate the average frequency of the recessive allele within a substructured population, and takes into account any differing incidence of disease between the subpopulations.

The UK Pakistani population has been studied in detail, in particular because of the clear implications a traditional consanguineous society has for recessive disorders (Terry et al. 1985; Darr & Modell, 1988; Chitty & Winter, 1989; Bundey et al. 1991).
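The Wahlund effect behind this argument can be illustrated with a small numerical sketch. The two subpopulation allele frequencies below are hypothetical values chosen for illustration, not data from this study, and the sketch assumes equally sized subpopulations, random mating within each, and θ defined as the variance in q among subpopulations divided by q̄(1 − q̄). Under those assumptions the pooled incidence is q̄(θ + (1 − θ)q̄), so the square root of the incidence overestimates the average allele frequency, while inverting the substructure model (the C = 0 case of the incidence formulation) recovers it.

```python
import math

# Hypothetical allele frequencies in two equally sized subpopulations
# (illustrative values only, not estimates from the study).
sub_q = [0.005, 0.015]

mean_q = sum(sub_q) / len(sub_q)

# With random mating inside each subpopulation, the pooled incidence of the
# recessive disorder is the average of the within-subpopulation q**2 values.
pooled_D = sum(q * q for q in sub_q) / len(sub_q)

# Naive estimate that ignores substructure.
naive_q = math.sqrt(pooled_D)

# Theta as the standardised variance of allele frequency among subpopulations.
var_q = sum((q - mean_q) ** 2 for q in sub_q) / len(sub_q)
theta = var_q / (mean_q * (1.0 - mean_q))

# Solve D = q * (theta + (1 - theta) * q) for q, taking the root in [0, 1].
corrected_q = (-theta + math.sqrt(theta ** 2 + 4.0 * (1.0 - theta) * pooled_D)) / (
    2.0 * (1.0 - theta)
)

print(f"average q:   {mean_q:.5f}")
print(f"naive q:     {naive_q:.5f}")
print(f"theta:       {theta:.5f}")
print(f"corrected q: {corrected_q:.5f}")
```

In practice θ would come from marker data, as in the likelihood analysis, rather than from known subpopulation frequencies; the sketch only shows the direction and source of the bias.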
The results of this study suggest that simply identifying broad estimates of the proportion of consanguineous unions, along with an average coefficient of consanguinity, may provide too vague a picture of how disease incidence relates to the risk of congenital disorders. The substructure identified by our likelihood analysis indicates the potential for heterogeneity in disease gene frequencies across the various Pakistani subpopulations. It would be helpful to those offering health advice to be able to identify whether such heterogeneity is accounted for by caste/biradheri differentiation, or differentiation due to some other identifiable subgroup such as the Mirpuris or Kashmiris.

The Indian community, on the other hand, shows no significant evidence of substructuring, despite indications of a Wahlund effect in a previous study on the Jullundur community (Overall & Nichols, 2001) and the expectation of caste endogamy. This is perhaps not so surprising considering how successful the Indian community has been in settling and establishing itself within UK society (Ballard, 1990). In contrast to Indian customs, the Pakistani marriage strategies, in particular those of the Mirpuris, remain very much as they were during the early immigration period and continue to involve a high proportion of trans-continental arrangements (Ballard, 1990). Traditional marriage practices will continue to generate the social structuring of the Pakistani population for some time, and for this reason the influence of both consanguinity and substructure on the assessment of risk of congenital recessive disorders needs to be pursued.

Acknowledgements

This study was supported by The Sir Jules Thorn Charitable Trust grant 98/28A. We are grateful for the comments made by two anonymous referees, which improved the manuscript considerably.
We also thank Roger Ballard (Centre for Applied South Asian Studies) for guidance in setting up the sampling procedure, and the communities of Nottingham who participated in the project.

References

Ballard, R. (1990) Migration and Kinship: the differential effect of marriage rules on the process of Punjabi migration into Britain. In: Migration and Ethnicity (eds C. Clarke & C. Peach), Cambridge University Press.
Ballard, R. (1994) The emergence of Desh Pardesh. In: Desh Pardesh: The South Asian presence in Britain (Ed. R. Ballard). London: Oxford University Press.
Bhachu, P. (1986) Twice Migrants. London: Tavistock.
Bittles, A.H. (1994) The role and significance of consanguinity as a demographic variable. Pop Dev Rev 20, 561–584.
Bittles, A.H. (2001) Consanguinity and its relevance to clinical genetics. Clin Genet 60, 89–98.
Bittles, A.H., Mason, W.M., Greene, J. & Appaji Rao, N. (1991) Reproductive behavior and health in consanguineous marriages. Science 252, 789–794.
Bittles, A.H. & Neel, J.V. (1994) The costs of human inbreeding and their implications for variations at the DNA level. Nature Genet 8, 117–121.
Bittles, A.H., Shami, S.A. & Appaji Rao, N. (1992) Consanguineous marriage in South Asia: incidence, causes and effects. In: Minority Populations: Genetics, Demography and Health (eds A.H. Bittles & D.F. Roberts), pp. 102–117. London: Macmillan.
Bundey, S. & Alam, H. A five year prospective study of the health of children in different ethnic groups, with particular reference to the effect of inbreeding. Eur J Hum Genet 1, 206–219.
Bundey, S., Alam, H., Kaur, A., Mir, S. & Lancashire, R. (1991) Why do Pakistani babies have high perinatal and neonatal mortality rates? Paediatr Per Epidemiol 5, 101–114.
Chitty, L.S. & Winter, R.M. (1989) Perinatal mortality in different ethnic groups. Arch Dis Child 64, 1036–1041.
Cockerham, C.C. (1973) Analysis of gene frequencies.
Genetics 74, 679–700.
Dahlberg, G. (1930) Inzucht bei Polyhybridat beim Menschen. Hereditas 14, 83–96.
Darr, A. & Modell, B. (1988) The frequency of consanguineous marriage among British Pakistanis. J Med Genet 25, 186–190.
Devi, A.R.R., Appaji Rao, N. & Bittles, A.H. (1987) Inbreeding and the incidence of childhood genetic disease in Karnataka, South India. J Med Genet 24, 362–365.
Edwards, J.H. (1989) Familiarity, recessivity and germline mosaicism. Ann Hum Genet 53, 33–47.
Hall, S.K., Hutchesson, A.C.J. & Kirk, J.M. (1999) Congenital hypothyroidism, seasonality and consanguinity in the West Midlands, England. Acta Paediatr 88, 212–215.
Heinisch, U., Zlotogora, J., Kafert, S. & Gieselmann, V. (1995) Multiple mutations are responsible for the high frequency of metachromatic leukodystrophy in a small geographic area. Am J Hum Genet 56, 51–57.
Hussain, R. (1998) The role of consanguinity and inbreeding as a determinant of spontaneous abortion in Karachi, Pakistan. Ann Hum Genet 62, 147–157.
Hutchesson, A.C.J., Bundey, S., Preece, M.A., Hall, S.K. & Green, A. (1998) A comparison of disease and gene frequencies of inborn errors of metabolism among different ethnic groups in the West Midlands, UK. J Med Genet 35, 366–370.
Modell, B. (1991) Social and genetic implications of customary consanguineous marriage among British Pakistanis. Report of a meeting held at the Ciba Foundation on 15 January 1991. J Med Genet 28, 720–723.
Modell, B. & Darr, A. (2002) Genetic counselling and customary consanguineous marriage. Nat Rev Genet 3, 225–229.
Mukherjee, N., Majumder, P.P., Roy, B., Roy, M., Dey, B., Chakraborty, M. & Banerjee, S. (1999) Variation at 4 short tandem repeat loci in 8 population groups of India. Human Biology 71, 439–446.
Overall, A.D.J. & Nichols, R.A. (2001) A method for distinguishing consanguinity and population substructure using multilocus genotype data. Mol Biol Evol 18, 2048–2056.
Overall, A.D.J., Ahmad, M.
& Nichols, R.A. (2002) The effect of reproductive compensation on recessive disorders within consanguineous human populations. Heredity 88, 474–479.
Özlap, I., Coskun, T., Tokol, S., Demircin, G. & Mönch, E. (1990) Inherited metabolic disorders in Turkey. J Inher Metab Dis 13, 732–738.
PE Biosystems (1999) AmpFℓSTR SGM Plus user manual. California: Perkin-Elmer.
Qureshi, N., Gilbert, P. & Raeburn, J.A. (2003) Consanguinity and genetic morbidity in a British primary care setting: a pilot study with trained linkworkers. Ann Hum Biol 30, 140–147.
Rajib, A. & Patton, M.A. (1999) Analysis of the population structure in Oman. Commun Genet 2, 23–26.
Rao, P.S.S. & Inbaraj, S.G. (1977) Inbreeding in Tamil Nadu, South India. Soc Biol 24, 281–288.
Schneider, S., Roessli, D. & Excoffier, L. (2000) Arlequin ver. 2000: A software for population genetics analysis. University of Geneva, Switzerland: Genetics and Biometry Laboratory.
Shami, S.A., Grant, J.C. & Bittles, A.H. (1994) Consanguineous marriage within social/occupational class boundaries in Pakistan. J Biosoc Sci 26, 91–96.
Shaw, A. (1994) The Pakistani community in Oxford. In: Desh Pardesh: The South Asian presence in Britain (Ed. R. Ballard). London: Oxford University Press.
Terry, P.B., Bissenden, J.G., Condie, R.G. & Mathew, P.M. (1985) Ethnic differences in congenital malformations. Arch Dis Child 62, 866–879.
Vertovec, S. (1994) Caught in an ethnic quandary: Indo-Caribbean Hindus in London. In: Desh Pardesh: The South Asian presence in Britain (Ed. R. Ballard). London: Oxford University Press.
Walsh, P.S., Metzgar, D.A. & Higuchi, R. (1991) Chelex® 100 as a medium for the simple extraction of DNA for PCR based typing from forensic material. BioTechniques 1, 91–98.
Wang, W., Sullivan, S.G., Ahmed, S., Chandler, D., Zhivotovsky, L.A. & Bittles, A.H. (2000) A genome-based study of consanguinity in three co-resident endogamous Pakistan communities. Ann Hum Genet 64, 41–49.
Weir, B.S. & Cockerham, C.C.
(1984) Estimating F statistics for the analysis of population structure. Evolution 38, 1358–1370.
Wright, S. (1921) Systems of mating. Genetics 6, 111–178.
Wright, S. (1931) Evolution in Mendelian populations. Genetics 16, 97–159.
Zlotogora, J. (1997) Genetic disorders among Palestinian Arabs. Am J Med Genet 68, 472–475.

Received: 19 March 2003
Accepted: 11 July 2003

Annals of Human Genetics (2003) 67, 525–537