Lecture SS Population Genetics I
Lecture 7: Balancing selection

Balancing selection: overdominant selection

Balancing selection is the only form of natural selection that preserves genetic variation in populations. Under balancing selection two or more alleles are beneficial and are maintained over time. There are three kinds of balancing selection:

1. Heterozygote advantage (overdominant selection)
2. Frequency-dependent selection
3. Spatial/temporal variation

Heterozygote advantage means that individuals carrying two different alleles have a higher fitness than either homozygote. Imagine a locus with two alleles, A and a. In a diploid population the three genotypes (AA, Aa, aa) have different relative fitnesses, and the heterozygote has the highest. The selection coefficients s and t denote the selective disadvantage of the two homozygotes relative to the heterozygote.

Genotype              AA     Aa    aa
Relative fitness w    1-s    1     1-t

The fitness scheme shows why both alleles are maintained in the population. This does not necessarily mean that the allele frequencies do not change. We can calculate the change $\Delta_s p$ due to selection from the frequencies of the A allele (p) and the a allele (q = 1 - p):

$$\Delta_s p = \frac{pq\,[\,p(w_{AA} - w_{Aa}) + q(w_{Aa} - w_{aa})\,]}{\bar{w}},$$

$$\Delta_s p = \frac{pq\,(qt - ps)}{\bar{w}},$$

where $\bar{w} = p^2(1-s) + 2pq + q^2(1-t)$ is the mean fitness of the population. The second form follows from the fitness scheme, because $w_{AA} - w_{Aa} = -s$ and $w_{Aa} - w_{aa} = t$.

Nevertheless, for s, t > 0 a stable equilibrium can be reached: if $qt - ps = 0$, then $\Delta_s p = 0$. When p is below this point, $qt - ps > 0$ and p increases; when p is above it, $qt - ps < 0$ and p decreases. Figure 1 indicates that the equilibrium point is therefore stable.

[Figure 1: $\Delta_s p$ plotted against p; the curve crosses zero at the stable equilibrium.]

Example 1:
One famous case of heterozygote advantage was found in the human population. A mutation at the locus for β-globin, a protein present in the erythrocytes, causes an amino acid substitution at the 6th position of the β-chain (glutamic acid → valine). This brings about a structural disorder that leads to sickle-shaped erythrocytes. Such a sickle cell can neither bind oxygen effectively nor pass through the capillaries.
This dysfunction leads to sickle-cell anemia. Sickle-cell anemia is controlled by a single locus and shows a simple Mendelian inheritance pattern. Following the two-allele example above, we define the A allele as 'normal' or 'wild type' and the a allele as the sickle-cell allele. The fitness scheme then looks like:

Genotype              AA        Aa                                     aa
Relative fitness w    1-s       1                                      1-t
Phenotype             Normal    Heterozygote for sickle-cell anemia    Sickle-cell anemia

The prevalence of sickle-cell anemia is strongly correlated with the distribution of malaria in tropical Africa, Asia and South America. The reason for this overlap is the partial resistance of the atypically formed erythrocytes to Malaria tropica. This selective advantage is greatest in the heterozygotes because they suffer less from both the anemia and malaria. In terms of the fitness scheme, the consequence of the anemia is denoted by the selection coefficient t, whereas the impact of malaria sensitivity is denoted by s. If we assume that the frequencies of both alleles have already reached a stable equilibrium ($\tilde{p}$, $\tilde{q}$), we can calculate s, the selection coefficient for malaria:

$$0 = \tilde{q}\,t - \tilde{p}\,s$$
$$0 = (1 - \tilde{p})\,t - \tilde{p}\,s$$
$$\tilde{p}\,s = t - \tilde{p}\,t$$

or

$$\tilde{p} = \frac{t}{s+t} \quad \text{and} \quad \tilde{q} = \frac{s}{s+t}, \text{ respectively.} \qquad (1)$$

Now we can calculate s, the selection coefficient for Malaria tropica in the human population. First we need to know $\tilde{q}$, which is easy to ascertain. We estimate it from the genotype frequencies in a human population from Yemen:

$$\tilde{q} = f_{aa} + \tfrac{1}{2} f_{Aa} = 0.12,$$

where $f_{aa} \approx 0$ because the lifespan of this genotype is very short. It follows from eq. (1) that

$$0.12 = \frac{s}{s+1},$$

where $t \approx 1$ because the frequency of the sickle-cell anemia genotype is close to 0. Solving for s gives $s = 0.12 / (1 - 0.12) \approx 0.14$. The selection coefficient for Malaria tropica is thus much smaller than t, but still very high.
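The recursion for $\Delta_s p$ and the equilibrium in eq. (1) can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function names are made up for illustration, and the only inputs are the fitness scheme (1-s, 1, 1-t) and the Yemen estimate of 0.12 for the equilibrium frequency of a, with t taken as 1.

```python
def mean_fitness(p, s, t):
    """Mean fitness for genotype fitnesses AA: 1-s, Aa: 1, aa: 1-t."""
    q = 1.0 - p
    return p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)


def delta_p(p, s, t):
    """One-generation change in the frequency p of allele A under selection."""
    q = 1.0 - p
    return p * q * (q * t - p * s) / mean_fitness(p, s, t)


def equilibrium_p(s, t):
    """Equilibrium frequency of A from eq. (1): p = t / (s + t)."""
    return t / (s + t)


def iterate(p0, s, t, generations):
    """Apply p -> p + delta_p repeatedly and return the final frequency."""
    p = p0
    for _ in range(generations):
        p += delta_p(p, s, t)
    return p


def s_from_equilibrium_q(q_eq, t):
    """Solve q = s / (s + t) for s, given the equilibrium frequency q of a."""
    return q_eq * t / (1.0 - q_eq)


# Sickle-cell example: equilibrium frequency 0.12 for allele a, t assumed 1.
s_malaria = s_from_equilibrium_q(0.12, 1.0)
print("s =", round(s_malaria, 2))

# Starting below or above the equilibrium, p moves towards t / (s + t).
for start in (0.05, 0.99):
    print(start, "->", round(iterate(start, s_malaria, 1.0, 500), 4))
print("predicted equilibrium:", round(equilibrium_p(s_malaria, 1.0), 4))
```

Starting the iteration on either side of the equilibrium is a simple way to see the stability argument in action: the sign of qt - ps pushes p back towards t/(s + t) from both directions.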
Example 2: MHC/HLA polymorphism

The immune system of vertebrates is able to identify a huge number of pathogens by targeting particular surface structures that are characteristic of each pathogen. This complex process is governed by the major histocompatibility complex (MHC), known in humans as the HLA system (human leukocyte antigen). The MHC is a large gene cluster in vertebrates comprising more than 100 genes, most of which are involved in immune response and tissue tolerance, hence the term 'histocompatibility'. In humans this gene cluster is located on chromosome 6. The term MHC also refers to the protein complexes that act as surface antigens on all nucleated cells and therefore account for the immunological identity of an organism.

The first step of an immune response comprises the identification of foreign proteins and the presentation of the corresponding antigens to T lymphocytes, which then orchestrate the immune system's response to infected or degenerated cells. This initial step is carried out by the MHC and is defined as MHC restriction. Because of this function, some MHC loci have been shown to be highly polymorphic.

[Figure 2: The MHC class I molecule triggers the MHC restriction.]

This especially holds for the antigen recognition site (ARS), also called the peptide binding site. It belongs to the first of three MHC classes, MHC class I (Figure 2). The MHC class I pathway triggers the cell-mediated immune response by activating cytotoxic T lymphocytes, which then destroy infected or degenerated cells via lysis. The composition of a class I MHC molecule is somewhat similar to that of an antibody (immunoglobulin, Ig). It consists of a large membrane-bound subunit, the heavy chain (HC), and a smaller soluble subunit called β2 microglobulin (B2M).
The HC can in turn be divided into three extracellular domains (α1, α2, α3), one transmembrane region and a cytoplasmic tail. The α3 domain and the B2M molecule are non-covalently bound and have a similar structure. The crucial part of the class I MHC molecule, however, consists of the α1 and α2 domains. Their fold comprises eight anti-parallel β-strands and forms a groove of 57 amino acid residues, the peptide binding site (ARS).

The three corresponding genetic loci HLA-A, HLA-B and HLA-C show great variability in the human population. As the frequencies of the possible alleles are quite balanced, it is very unlikely that an individual carries the same allele twice. Therefore, most humans are heterozygous at the HLA loci. The combination of MHC alleles is called the MHC haplotype. Accordingly, the MHC is both polygenic and polymorphic, which allows it to bind a wide range of possible ligands.

Nucleotide diversity across synonymous and non-synonymous sites at the HLA-A locus is calculated to be π = 0.043, whereas neutral 4-fold degenerate sites at other loci show π = 0.001. Because of this high diversity, we have to correct for multiple hits and use K (in percent) instead of π.

                              Antigen recognition    Remaining codons in     Domain α3
                              site (ARS) (N=57)      α1 and α2 (N=125)       (N=92)
Locus (number of sequences)   KS      KA             KS      KA              KS      KA
Human
  HLA-A (5)                   3.5     13.3***        2.5     1.6             9.5     1.6**
  HLA-B (4)                   7.1     18.1**         6.9     2.4*            1.5     0.5
  HLA-C (3)                   3.8     8.8            10.4    4.8             2.1     1.0
  Overall mean                4.7     14.1***        5.1     2.4             5.8     1.1**

Asterisks mark significant differences between KS and KA.

Besides the large excess of segregating sites found at the ARS compared to other genes, the number of nucleotide polymorphisms at nonsynonymous sites is much higher than at synonymous sites. The numbers in the table show that, within the ARS, KA is significantly higher than KS for most human HLA loci.
This results in a dN/dS ratio greater than 1 at the ARS, which clearly deviates from neutral expectations:

$d_N/d_S = 1$: neutrally evolving
$d_N/d_S < 1$: negative selection removes nonsynonymous changes (α3)
$d_N/d_S > 1$: positive selection accelerates substitutions (HLA genes)

Positive Darwinian selection accelerates substitutions at the HLA loci. Note that the gene products of the HLA loci can differ at up to 20 amino acid positions. By contrast, the non-ARS regions exhibit far fewer substitutions and a dN/dS ratio smaller than one. This is due to negative selection acting on amino acid substitutions at these sites.

Transspecies polymorphism

A phylogenetic analysis of the HLA-A and HLA-B alleles from humans and their corresponding alleles in chimpanzees (C-A, C-B) shows that species-specific alleles do not cluster together (Figure 3). For example, the human allele H-A11 is closer to the chimpanzee allele C-A108 than to any other human allele; the same holds for H-B13 and C-B1, and for others. This implies that the human and chimpanzee alleles diverged before the split between the two species. Whereas humans and chimpanzees shared a most recent common ancestor (MRCA) 5-8 million years ago, the splitting time for certain MHC allele lineages is more than 20 million years ago. Such polymorphisms, where the divergence of alleles predates the divergence of species, are called transspecies polymorphisms. Due to balancing selection these alleles were neither fixed nor lost.

[Figure 3: Phylogeny of human HLA-A and HLA-B alleles and the corresponding chimpanzee alleles.]

In brief, there are three remarkable facts about the MHC in humans:

1. large polymorphism in the ARS region,
2. non-synonymous changes in the ARS region exceed synonymous changes by far,
3. the allele lineages are very old and conserved across different species.
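The comparison of KA and KS across the three regions can be tabulated as follows. This is an illustrative Python sketch only: the values are copied from the per-locus rows of the table above, while the dictionary layout, the function names and the simple "ratio above or below 1" labels are assumptions made here. A bare ratio says nothing about statistical significance, which the original table indicates with asterisks.

```python
# (KS, KA) in percent, copied from the per-locus rows of the HLA table.
K_VALUES = {
    "HLA-A": {"ARS": (3.5, 13.3), "rest of a1/a2": (2.5, 1.6), "a3": (9.5, 1.6)},
    "HLA-B": {"ARS": (7.1, 18.1), "rest of a1/a2": (6.9, 2.4), "a3": (1.5, 0.5)},
    "HLA-C": {"ARS": (3.8, 8.8), "rest of a1/a2": (10.4, 4.8), "a3": (2.1, 1.0)},
}


def ka_ks_ratio(ks, ka):
    """Ratio of nonsynonymous to synonymous divergence."""
    return ka / ks


def describe(ratio):
    """Hypothetical label based only on whether the ratio is above or below 1."""
    if ratio > 1.0:
        return "excess of nonsynonymous changes"
    if ratio < 1.0:
        return "deficit of nonsynonymous changes"
    return "as expected under neutrality"


for locus, regions in K_VALUES.items():
    for region, (ks, ka) in regions.items():
        ratio = ka_ks_ratio(ks, ka)
        print(f"{locus:6s} {region:14s} KA/KS = {ratio:5.2f}  {describe(ratio)}")
```

In this tabulation every ARS entry has a ratio above 1 and every non-ARS entry a ratio below 1, which is the pattern the text attributes to positive selection at the ARS and negative selection elsewhere.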
Balancing selection: Frequency-dependent selection

Frequency-dependent selection is a type of balancing selection in which the fitness of a genotype depends on its frequency in the population. Genetic variation is thus preserved, but in contrast to overdominant selection this can lead to large fluctuations in allele frequencies. There are two possible modes of frequency-dependent selection. Under positive frequency-dependent selection the fitness of an allele increases as it becomes more abundant in the population. By contrast, negative frequency-dependent selection occurs when the fitness of a genotype increases as it becomes rare in the population. This is often a consequence of species interactions such as predation, parasitism and competition.

Example:
One example of negative frequency-dependent selection was found in Arabidopsis thaliana. This plant species carries R-gene loci, which are involved in pathogen recognition. The Rpm1 gene is of particular interest. It confers the ability to recognize Pseudomonas carrying AvrRpm1 or AvrB. There are two alleles leading to two different genotypes: one lacks Rpm1 and is therefore susceptible to Pseudomonas infection; the other carries the Rpm1 gene, which provides pathogen resistance. This disease resistance polymorphism is attributable to a recombination event that caused a deletion/insertion polymorphism in an Arabidopsis ancestor. Studies of randomly chosen alleles throughout the species range showed a frequency of 0.52 for the resistance allele (R-allele). That is, both genotypes, resistant and susceptible, are about equally common in the population. Genetic data suggest that the two alleles diverged ~9.8 million years ago and have been preserved in the population since. The frequency of the resistance allele with Rpm1, however, may have fluctuated over time.
[Figure 4: Stahl et al. (1999). Dynamics of disease resistance polymorphism at the Rpm1 locus of Arabidopsis. Nature 400, p. 667.]

If Rpm1 is an effective resistance gene against Pseudomonas infections, why did it not become fixed in the whole population? Temporal fluctuations occur in host-pathogen interactions because a given genotype is advantageous only at certain times (Figure 4). A low frequency of the resistance allele in the host plant allows the pathogen to spread throughout the host population. The resulting selection pressure increases the frequency of the resistance allele, which in turn represses the pathogen. Presumably, expressing resistance genes like Rpm1 carries a cost at times when no Pseudomonas is around, so the resistance allele then declines again. The appropriate model of the Arabidopsis-Pseudomonas interaction treats these advances and retreats of the resistance allele frequency as a dynamic polymorphism and provides a reason why neither allele becomes fixed in the population. We can use the term 'trench warfare' to describe this form of host-pathogen interaction. Dynamic polymorphisms of ancient, substantially different alleles within a species are typical of this form of coevolution.
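The verbal argument above (resistance is favoured when rare and costly when common) can be illustrated with a toy model. The Python sketch below is not the model of Stahl et al. (1999). It rests on assumptions introduced here purely for illustration: haploid selection, a constant fitness cost of resistance (`cost`), and pathogen damage to susceptible plants that is proportional to the frequency of susceptible hosts (`damage * (1 - r)`). All parameter values are hypothetical. Because the pathogen responds without any time lag in this sketch, the allele frequency settles at an interior equilibrium instead of fluctuating. Reproducing the advances and retreats shown in Figure 4 would require explicit pathogen dynamics.

```python
def next_resistance_freq(r, cost, damage):
    """One generation of haploid selection on the resistance allele frequency r.

    Hypothetical assumptions:
      - resistant plants always pay the fitness cost `cost`;
      - susceptible plants lose `damage * (1 - r)`, i.e. pathogen pressure is
        taken to be proportional to the frequency of susceptible hosts.
    """
    w_resistant = 1.0 - cost
    w_susceptible = 1.0 - damage * (1.0 - r)
    mean_w = r * w_resistant + (1.0 - r) * w_susceptible
    return r * w_resistant / mean_w


def simulate(r0, cost, damage, generations):
    """Return the list of resistance allele frequencies over time."""
    r = r0
    trajectory = [r]
    for _ in range(generations):
        r = next_resistance_freq(r, cost, damage)
        trajectory.append(r)
    return trajectory


def interior_equilibrium(cost, damage):
    """Frequency at which both alleles are equally fit: cost = damage * (1 - r)."""
    return 1.0 - cost / damage


# Hypothetical parameters, not estimated from any data.
COST, DAMAGE = 0.05, 0.2
for start in (0.05, 0.95):
    final = simulate(start, COST, DAMAGE, 2000)[-1]
    print(f"start {start:.2f} -> {final:.4f}")
print("interior equilibrium:", interior_equilibrium(COST, DAMAGE))
```

The resistance allele increases when it starts rare and decreases when it starts common, so neither allele reaches fixation. This is the defining feature of negative frequency-dependent selection.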