Supporting Information

[Figure S5: expected heterozygosity ($H_e$) against generations ($t$), with curves for $N_e$ = 2, 5, 20, 50 and 100.]

Fig. S5. Loss in simulated and theoretically predicted mean (±SEM) gene diversity (expected heterozygosity, $H_e$) by random genetic drift. The mean (±SEM) simulated $H_e$ was calculated over 100 simulations in populations with different effective sizes over 100 generations. The theoretical heterozygosity at generation $t$ was calculated as $H_{e,t} = H_0 \prod \{1 - [1/(2N_{e,t} + 1)]\}$, where $H_{e,t}$ is the expected heterozygosity at generation $t$, $H_0$ is the expected heterozygosity at $t=0$, and $N_{e,t}$ is the effective population size at generation $t$. The initial genetic variation in the population at $t=0$ was $H_0=0.5$ and there were two alleles. There was no mutation or selection. The simulated loss in $H_e$ was in good agreement with the theoretically expected values for different values of $N_e$.

[Figure S6: linkage disequilibrium (LD) against generations ($t$), with curves for $c$ = 0.001, 0.01 and 0.1.]

Fig. S6. Decay in theoretical and simulated mean (±SEM) linkage disequilibrium ($D$) between neutral alleles at two partially linked loci. The mean (±SEM) decay in linkage disequilibrium was calculated over 100 simulations for 500 generations and is in good agreement with the theoretically expected values for different values of $c$. The theoretical linkage disequilibrium was calculated as $\mathrm{LD} = (1 - c)^{t}$, where $c$ is the recombination rate and $t$ the generation number.

[Figure S7: fitness ($w$) against cumulative mutation rate ($U$, from $10^{-5}$ to $5\times10^{-3}$), with theoretical and simulated values for recessive ($h=0$) and co-dominant ($h=0.5$) mutations.]

Fig. S7. Theoretical and simulated mean (±SEM) fitness in populations with recessive ($h=0$) and co-dominant ($h=0.5$) mutations with selection coefficient $s=0.1$, total mutation rate $U=5\times10^{-3}$ to $1\times10^{-5}$, and effective population size $N_e=100$.
The equilibrium fitness was reached after 20,000 generations, and the values of 50 simulations were used to calculate the mean (±SEM) fitness. The theoretically predicted fitness values were calculated as $w=e^{-U}$ for completely recessive mutations ($h=0$) and $w=e^{-2U}$ for co-dominant mutations ($h=0.5$), where $U$ is the total mutation rate across the linked region (i.e. haplotype block). Assuming a single-base mutation rate of $\mu=1\times10^{-8}$ and a linked region of $10^{5}$ bp (Stenzel et al. 2004), the total mutation rate would equate to $U=1\times10^{-3}$. The simulated values approach the theoretically expected values reasonably well, although the fitness values in the simulations are marginally higher than the theoretically expected values. The explanation for this small bias is that the theoretical allele frequency of deleterious mutations at mutation-selection balance assumes an infinitely large population. In finite populations, however, a low level of inbreeding can reduce the frequency of these mutations appreciably below the frequency expected in infinite populations (Crow & Kimura 1970). Consequently, the equilibrium fitness in the simulated populations is marginally higher than the theoretically expected fitness value.

[Figure S8: effective number of alleles ($n_e$) against overdominant selection coefficient ($S$), with curves for $\mu$ = $10^{-5}$, $10^{-4}$ and $10^{-3}$.]

Fig. S8. Theoretically expected and mean (±SEM) simulated effective number of alleles ($n_e$) maintained in a population with size $N_e=1000$ across a range of overdominant selection coefficients ($S$) and mutation rates ($\mu$). Populations were simulated with effective size $N_e=1000$, overdominant selection coefficients $S$ = 0.01, 0.05, 0.1, 0.2 and 0.5, and mutation rates $\mu$ = $10^{-5}$, $10^{-4}$ and $10^{-3}$. The figure shows that the simulated values for $n_e$ are in good agreement with the theoretically expected values over the entire range of $S$ and for different values of $\mu$.
The theoretically expected values were calculated using $n_e \approx 2(N_eS)^{1/2} / (4.6 \log_{10}\{0.4 / [2N_e\mu / (N_eS)^{1/2}]\})^{1/2}$ when $2N_e\mu / (N_eS)^{1/2} < 0.1$. When $2N_e\mu / (N_eS)^{1/2} \geq 0.1$, the following approximation was used: $n_e \approx 3.7N_e\mu + (N_eS)^{1/2}$. These equations are derived from equations 9.7.19, 9.7.29 and 9.7.30 in Crow & Kimura (1970).

[Figure S9: linkage disequilibrium (LD) against generations ($t$), with curves for $E$ = 0, 0.01, 0.08 and 0.16.]

Fig. S9. Decay in simulated mean (±SEM) linkage disequilibrium (LD) between haplotype blocks in populations with size $N_e=1000$. Both haplotypes consist of two haplotype blocks separated by a recombination hotspot with recombination rate $c=0.01$. Each block carried a single (unique) recessive deleterious mutation with selection coefficients $s$ = 0, 0.1, 0.2 and 0.4 and dominance coefficient $h=0$ (see Figure 2). The solid line represents the theoretical values of LD for two haplotype blocks without epistatic selection (i.e. haplotype blocks fixed for mutations with $s=0$). Epistatic selection of $E=0.16$ maintains a high level of linkage disequilibrium between haplotype blocks and can extinguish the recombination hotspot. Note that the simulated recombination rate ($c=0.01$) represents an extremely “hot” recombination spot. The median map distance induced by a hotspot is 0.043 cM (or one crossover per 2,300 meioses), and the hottest hotspot identified in the human genome is 1.2 cM (one crossover per 80 meioses, i.e. $c=0.012$) (The International HapMap Consortium 2007).

[Figure S10: population differentiation ($G_{ST}$) against generations ($t$), with series "ABC: s = 0.5, no bottleneck" and "Overdominance: S = 0.5, no bottleneck".]

Fig. S10. Population differentiation ($G_{ST}$) with ABC evolution and overdominant selection in simulated source-sink metapopulations. Selection coefficients are $s=0.5$ (ABC evolution, solid symbols) and $S=0.5$ (overdominance, open symbols).
The source population has an infinitely large population size ($N=\infty$), and the sink population has a constant size $N=5000$ (circles). Migration is unidirectional with rate $2Nm=1$. Overdominant selection resulted in a rapid homogenization of the gene pools (open circles), whereas populations remained genetically differentiated with ABC evolution (solid circles). ABC evolution thus appears to be more consistent with the high level of MHC differentiation commonly observed in vertebrate populations (see e.g. Muirhead 2001; Richman et al. 2003).

Table S1. Haplotype genealogy of a simulated population subject to ABC evolution over $>3\times10^{5}$ generations (data of Fig. 3a). The parameters used were: overdominant mutation rate $\mu=10^{-5}$, overdominance selection $S=0.05$, total mutation rate of completely recessive ($h=0$) deleterious ($s=0.01$) mutations $U=10^{-3}$, and size $N_e=1000$. Simulations with incomplete linkage ($c=0.001$) were run as well and gave qualitatively similar results (data not shown). The first column shows the overdominant allele (labelled with the generation number in which it arose), the second column the generation in which it was last observed (rounded to the nearest 100), and the third column the derived mutant allele by which it was replaced (the derived mutant allele is also labelled by its generation number). The fourth column shows the number of generations the parental allele coexisted with its derived mutant, and the fifth column the total number of generations it existed. The sixth column shows the total number of mutations the haplotype received at its overdominant gene. The last column shows the number of deleterious mutations that were fixed in the haplotype when it went extinct.

Allele    Time in generations                                   Mutations
          Extinct by  Replaced by  Time co-existed  Total time  Total no. of  Total no. of
                                   with mutant      existed     overdominant  bad mutations
                                                                mutations     received
                                                                received
0         12800       11035        1765             12800       103           1
670       14800       12190        2610             14130       70            4
1566      11500       9604         1896             9934        54            1
7922      22500       21770        730              14578       61            4
9604      44500       41843        2657             34896       162           8
11035     299000      298318       682              287965      923           136
12190     19600       17509        2091             7410        24            4
17509     22800       21461        1339             5291        26            8
21461     56900       55676        1224             35439       113           18
21770     57900       57155        745              36130       157           17
31288     56000       55933        67               24712       83            18
41843     48500       47307        1193             6657        20            9
47307     56700       49826        6874             9393        39            9
49826     109300      107715       1585             59474       263           22
55676     111600      111199       401              55924       229           35
55933     75600       75064        536              19667       44            27
57155     >300400     Extant H2    *                243245      968           131
75064     78400       77667        733              3336        3             27
77667     86600       86306        294              8933        25            30
86306     192400      191859       541              106094      329           83
107715    117200      114704       2496             9485        38            24
111199    122300      121525       775              11101       54            38
114704    122600      119364       3236             7896        27            24
119364    239600      238473       1127             120236      596           66
121525    130200      129714       486              8675        37            40
129714    139200      138560       640              9486        46            47
138560    >300400     Extant H5    *                161840      626           124
191859    >300400     Extant H4    *                108541      353           140
238473    >300400     Extant H3    *                61927       265           97
298318    >300400     Extant H1    *                2082        4             138

Text S1. Genealogies during ABC evolution.

The genealogies presented in Figure 3 are representative examples taken from many simulation runs. ABC evolution resulted in genealogies and a pattern of genetic differentiation that are characteristic of the MHC in two important respects: (1) little divergence from the ancestral allele, and (2) large genetic differentiation of alleles in the extant population.

Firstly, some extant alleles have diverged only a little from the ancestral type. For example, haplotype H1 in Figure 3a has diverged from the ancestral type by only two mutations (i.e. steps in the genealogy) and H3 has diverged by three mutations. This compares to 17 mutations for the least diverged allele in the overdominant genealogy (Fig. 3b).
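The column definitions of Table S1 appear to imply two simple identities: the time an allele co-existed with its mutant equals the generation it was last observed minus the generation label of the replacing mutant, and the total time it existed equals the generation it was last observed minus its own generation label. The sketch below checks these identities on a few rows transcribed from the table. The identities are a reading of the caption rather than a stated method, and the names `TABLE_S1_SAMPLE`, `row_is_consistent` and `inconsistent_rows` are hypothetical.

```python
# A few rows transcribed from Table S1, all values in generations:
# (arose, last_observed, replaced_by, coexisted, total_existed)
TABLE_S1_SAMPLE = [
    (0, 12800, 11035, 1765, 12800),
    (670, 14800, 12190, 2610, 14130),
    (1566, 11500, 9604, 1896, 9934),
    (11035, 299000, 298318, 682, 287965),
    (31288, 56000, 55933, 67, 24712),
]


def row_is_consistent(arose, last_observed, replaced_by, coexisted, total_existed):
    """Check the two identities assumed from the column definitions."""
    # Co-existence runs from the origin of the mutant to the loss of the parent.
    coexisted_ok = last_observed - replaced_by == coexisted
    # Total lifetime runs from the origin of the allele to its loss.
    total_ok = last_observed - arose == total_existed
    return coexisted_ok and total_ok


def inconsistent_rows(rows):
    """Return the rows that violate either identity (an empty list if none do)."""
    return [row for row in rows if not row_is_consistent(*row)]


if __name__ == "__main__":
    print(inconsistent_rows(TABLE_S1_SAMPLE))
```

Because the generation of last observation is rounded to the nearest 100, both derived columns inherit that rounding.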
Secondly, despite this high level of genetic conservation, ABC evolution resulted in considerable genetic differentiation in the extant population. The haplotypes in Figure 3a have diverged from each other by a combined total of 2 + 3 + 10 + 8 + 9 - 2 = 30 mutations by generation 300,000. (Note that H3 and H4 share coancestry and have two mutations in common, namely the overdominant mutations in generations 1566 and 9604; hence, two mutations were deducted.) The genetic variation in the population with a gene under overdominant selection is considerably lower: alleles in the genealogy of Figure 3b have diverged by a combined total of six mutations after 300,000 generations.

Some alleles are reminiscent of trans-species polymorphism. For example, overdominant allele 11035 (in haplotype H1, Fig. 3a) persisted for 287,965 generations. This long persistence time is particularly remarkable given the relatively small population size ($N_e=1000$) and high overdominant mutation rate ($\mu=10^{-5}$), because the rate of allelic turnover increases with a high mutation rate and a small $N_e$. Nevertheless, the allele survived 923 overdominant mutations before it was replaced by the invading mutant allele 298318 at generation 299,000 (Table S1).

Text S2. Demographic scenarios simulated.

Aguilar et al. (2004) found that “a severe bottleneck (to an effective size of 10 individuals or fewer for one or two generations, followed by ≈12 generations of population growth) was necessary to explain near monomorphism at the 18 [microsatellite] loci”. I simulated various bottleneck scenarios and found that a two-generation single-pair bottleneck with subsequent population growth ($r=0.28$; Aguilar et al. 2004) to a final size of $N=10^{4}$ was a realistic bottleneck scenario. Under this scenario, a neutral microsatellite locus with initial heterozygosity $H_e=0.36$ and stepwise mutation rate $\mu=10^{-4}$ becomes monomorphic in 85% of simulations.
The initial heterozygosity was based on the observed mean heterozygosity at 18 microsatellite loci in Santa Catalina, the most polymorphic fox population analysed by Aguilar et al. (2004). The mutation rate and model were also taken from Aguilar et al. (2004). With this demographic scenario, the probability that 18 (unlinked) microsatellite loci become monomorphic equals $p=0.85^{18}=0.053$.

References for Supporting Information

Aguilar A, Roemer G, Debenham S, Binns M, Garcelon D, et al. (2004) High MHC diversity maintained by balancing selection in an otherwise genetically monomorphic mammal. Proc Natl Acad Sci USA 101: 3490–3494. doi:10.1073/pnas.0306582101

Crow JF, Kimura M (1970) An introduction to population genetics theory. Harper & Row Publishers, New York.

Muirhead CA (2001) Consequences of population structure on genes under balancing selection. Evolution 55: 1532–1541.

Richman AD, Herrera LG, Nash D, Schierup MH (2003) Relative roles of mutation and recombination in generating allelic polymorphism at an MHC class II locus in Peromyscus maniculatus. Genet Res Camb 82: 89–99.

Stenzel A, Lu T, Koch WA, Hampe J, Guenther SM, et al. (2004) Patterns of linkage disequilibrium in the MHC region on human chromosome 6p. Hum Genet 114: 377–385. doi:10.1007/s00439-003-1075-5

The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–862. doi:10.1038/nature06258

van Oosterhout C, Joyce DA, Cummings SM, Blais J, Barson NJ, et al. (2006) Balancing selection, random genetic drift and genetic variation at the Major Histocompatibility Complex (MHC) in two wild populations of guppies (Poecilia reticulata). Evolution 60: 2562–2574.
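As an aid to checking the theoretical curves, the closed-form expressions quoted in the captions of Figs. S5 to S8 and the probability in Text S2 can be written as a few short Python functions. This is a minimal sketch, not the simulation code used for the figures. All function names are hypothetical, and `drift_heterozygosity` assumes that the effective size is constant, so that the product over generations reduces to a power.

```python
import math


def drift_heterozygosity(h0, ne, t):
    """Expected heterozygosity after t generations of drift (Fig. S5 expression).

    Assumption: Ne is constant over time, so the product becomes a power.
    """
    return h0 * (1.0 - 1.0 / (2.0 * ne + 1.0)) ** t


def ld_decay(c, t):
    """Linkage disequilibrium after t generations (Fig. S6): (1 - c)^t."""
    return (1.0 - c) ** t


def total_mutation_rate(mu_per_base, n_bases):
    """Total mutation rate U across a linked region of n_bases base pairs."""
    return mu_per_base * n_bases


def equilibrium_fitness(u, recessive):
    """Equilibrium fitness (Fig. S7): exp(-U) for h = 0, exp(-2U) for h = 0.5."""
    return math.exp(-u) if recessive else math.exp(-2.0 * u)


def effective_number_of_alleles(ne, s, mu):
    """Approximate effective number of alleles under overdominance (Fig. S8)."""
    root = math.sqrt(ne * s)          # (Ne * S)^(1/2)
    ratio = 2.0 * ne * mu / root      # 2 * Ne * mu / (Ne * S)^(1/2)
    if ratio < 0.1:
        return 2.0 * root / math.sqrt(4.6 * math.log10(0.4 / ratio))
    return 3.7 * ne * mu + root


def prob_all_loci_monomorphic(p_single, n_loci):
    """Probability that n_loci unlinked loci all become monomorphic (Text S2)."""
    return p_single ** n_loci


if __name__ == "__main__":
    u = total_mutation_rate(1e-8, 1e5)
    print("He after 100 generations, Ne = 20:", drift_heterozygosity(0.5, 20, 100))
    print("LD after 100 generations, c = 0.01:", ld_decay(0.01, 100))
    print("w for recessive mutations:", equilibrium_fitness(u, recessive=True))
    print("ne for Ne = 1000, S = 0.1, mu = 1e-5:",
          effective_number_of_alleles(1000, 0.1, 1e-5))
    print("P(18 loci monomorphic):", prob_all_loci_monomorphic(0.85, 18))
```

The two branches of `effective_number_of_alleles` follow the threshold of 0.1 given in the caption of Fig. S8. The fitness function covers only the two dominance cases for which expressions are given.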