Balancing Selection

Lecture 7: Balancing selection
Balancing selection: overdominant selection
Balancing selection is the only form of natural selection which preserves genetic
variation in populations. In this process two or more alleles are beneficial and are
maintained over time. There are three kinds of balancing selection:
1. Heterozygote advantage (overdominant selection)
2. Frequency dependent selection
3. Spatial/temporal variation
The heterozygote advantage is characterized by a higher fitness of individuals that
carry two different alleles compared to those that are homozygotes. Imagine a locus
with two alleles A and a. In a diploid population the three genotypes (AA, Aa, aa)
will have different relative fitnesses. However, the highest fitness is linked to the
heterozygote genotype. The selection coefficients s and t denote the selective
disadvantage of both homozygotes compared to the heterozygote genotype.
Genotype              AA      Aa      aa
Relative fitness w    1-s     1       1-t
The fitness scheme shows why both alleles are maintained in the population. This
does not necessarily mean that the frequencies of both alleles do not change. We
can calculate the change Δs p by using the allele frequencies of the A allele (p) and
the a allele (q = 1 − p), respectively:
Figure 1: Δs p as a function of p; the curve crosses zero at the stable equilibrium.
Lecture SS
Population Genetics I
\[ \Delta_s p = \frac{pq\,[\,p(w_{AA} - w_{Aa}) + q(w_{Aa} - w_{aa})\,]}{\bar{w}}, \]
\[ \Delta_s p = \frac{pq\,(qt - ps)}{\bar{w}}. \]
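Nevertheless, under some conditions a stable equilibrium can be reached. If
qt − ps = 0, then Δs p = 0. Figure 1 indicates that this equilibrium point is stable:
Δs p is positive below the equilibrium frequency and negative above it, so p is
pushed back towards the equilibrium from either side.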
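The recursion above can be iterated numerically to see the approach to equilibrium. The following is a minimal sketch, not part of the lecture: the function names `delta_p` and `iterate` are hypothetical, and the values s = 0.1 and t = 0.3 are chosen purely for illustration. Setting qt − ps = 0 with q = 1 − p gives the equilibrium p = t/(s + t), which the sketch uses for comparison.

```python
def delta_p(p, s, t):
    """One-generation change in the frequency p of allele A
    under heterozygote advantage (fitnesses 1-s, 1, 1-t)."""
    q = 1.0 - p
    # Mean fitness of the population under Hardy-Weinberg proportions
    w_bar = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    return p * q * (q * t - p * s) / w_bar


def iterate(p0, s, t, generations=2000):
    """Apply the recursion p -> p + delta_p repeatedly, starting from p0."""
    p = p0
    for _ in range(generations):
        p += delta_p(p, s, t)
    return p


# Illustrative (assumed) selection coefficients, not taken from the lecture
s, t = 0.1, 0.3
p_equilibrium = t / (s + t)

for p0 in (0.05, 0.5, 0.95):
    print(p0, iterate(p0, s, t), p_equilibrium)
```

Starting frequencies below and above the equilibrium are included on purpose, since stability means that both should end up at the same value.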
Example 1:
One famous case of heterozygote advantage was found in the human population.
A mutation in the gene for β-globin, a protein present in erythrocytes, causes
an amino acid substitution at the 6th position of the β-chain (glutamic acid → valine).
This brings about a structural defect leading to sickle-shaped erythrocytes. These
so-called sickle cells can neither bind oxygen effectively nor pass through the
capillaries. This dysfunction leads to sickle-cell anemia. As a Mendelian trait,
sickle-cell anemia is controlled by a single locus and shows a simple Mendelian
inheritance pattern.
According to our 'two allele example' above we can define the A-allele to be
'normal' or 'wild type' and the a-allele to be the sickle-cell allele. The fitness scheme
then looks like:
Genotype              AA        Aa                                    aa
Relative fitness w    1-s       1                                     1-t
Phenotype             Normal    Heterozygote for sickle-cell anemia   Sickle-cell anemia
The prevalence of sickle-cell anemia is strongly correlated with the distribution of
malaria in tropical Africa, Asia and South America. The reason for this overlap is the
partial resistance of the atypically formed erythrocytes against Malaria tropica. This
selective advantage is greatest in the heterozygotes because they suffer less from
both the anemia and malaria. In terms of the fitness scheme, the consequence of
the anemia is denoted by the selection coefficient t, whereas the impact of malaria
susceptibility is denoted by s. If we assume that the frequencies of both alleles
have already reached a stable equilibrium, we can calculate s, the selection
coefficient for malaria.
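\[ 0 = \tilde{q}\,t - \tilde{p}\,s \]
\[ 0 = (1 - \tilde{p})\,t - \tilde{p}\,s, \]
\[ \tilde{p}\,s = t - \tilde{p}\,t \quad \text{or} \]
\[ \tilde{p} = \frac{t}{s+t} \quad \text{and} \quad \tilde{q} = \frac{s}{s+t}, \text{ respectively.} \qquad (1) \]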
Now we can calculate s, the selection coefficient for Malaria tropica in the human
population. First we need to know q̃, which is easy to ascertain. We estimate it
from the genotype frequencies in a human population from Yemen:
\[ \tilde{q} = f(aa) + \tfrac{1}{2} f(Aa) = 0.12, \]
where f(aa) ≈ 0 because the lifespan of this genotype is very short. It follows from eq. (1) that
\[ 0.12 = \frac{s}{s+1}, \]
where t ≈ 1 because the frequency of the sickle-cell anemia genotype is close to 0.
The selection coefficient for Malaria tropica is thus s = 0.12/0.88 ≈ 0.14. It is
smaller than t but nevertheless very high.
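The arithmetic of this estimate can be checked in a few lines. This is only a sketch: the helper name `s_from_equilibrium` is hypothetical, and it simply rearranges q̃ = s/(s + t) to s = q̃t/(1 − q̃).

```python
def s_from_equilibrium(q_eq, t):
    """Solve q_eq = s / (s + t) for s, given the equilibrium
    frequency q_eq of the a allele and the coefficient t."""
    return q_eq * t / (1.0 - q_eq)


# Values used in the text: equilibrium frequency 0.12 and t close to 1
s = s_from_equilibrium(0.12, 1.0)
print(round(s, 2))  # 0.14
```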
Example 2:
MHC/HLA polymorphism
The immune system of vertebrates is able to identify a huge number of pathogens
by targeting particular surface structures that are characteristic of each pathogen.
This complex process is governed by the major histocompatibility complex
(MHC), known in humans as the HLA system (Human Leukocyte Antigen). The
MHC is a large gene cluster in vertebrates comprising more than 100 genes,
most of which are involved in immune response and tissue tolerance, which is
expressed by the term 'histocompatibility'. In humans this gene cluster is located on
chromosome 6. The term MHC also refers to the encoded protein complexes, which
act as surface antigens on all cells carrying a nucleus. They therefore account for
the immunological identity of an organism.
The first step of an immune response comprises the identification of foreign
proteins and the presentation of the corresponding antigens to T lymphocytes,
which then orchestrate the immune system's response to infected or degenerated
cells. This initial step is carried out by the MHC and is defined as MHC restriction.
Because of this function, some MHC loci have been shown to be highly polymorphic.
Figure 2: The MHC class I molecule triggers the MHC restriction.
This especially holds for the antigen recognition site (ARS), also called the peptide
binding site. It belongs to the first of three MHC classes, MHC class I (Figure
2). The MHC class I pathway triggers the cell-mediated immune response by
activating cytotoxic T cells, which then destroy infected or degenerated cells via
lysis. The structure of a class I MHC molecule is somewhat similar to that of an
antibody (immunoglobulin, Ig). It consists of a large membrane-bound subunit, the
heavy chain (HC), and a smaller soluble subunit called β2 microglobulin (B2M). The
HC can in turn be divided into three extracellular domains (α1, α2, α3), one
transmembrane region and a cytoplasmic tail. The α3 domain and the B2M
molecule are non-covalently associated and have a similar structure. The crucial
part of the class I MHC, however, consists of the α1 and α2 domains. Their fold
comprises 8 anti-parallel β strands and forms a groove, the peptide binding site
(ARS), which involves 57 amino acid residues. The three corresponding genetic loci
HLA-A, HLA-B and HLA-C show great variability in the human population. As the
frequencies of the possible alleles are quite balanced, it is very unlikely that an
individual carries the same allele twice. Therefore, most humans are heterozygous
at the HLA loci. The combination of MHC alleles is called the MHC haplotype.
Accordingly, the MHC is polygenic and polymorphic in order to provide the target
for a wide range of possible ligands.
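The link between balanced allele frequencies and heterozygosity can be made concrete with the Hardy-Weinberg expectation H = 1 − Σ p_i², where p_i are the allele frequencies at a locus. The sketch below is illustrative only: the function name `expected_heterozygosity` and the allele numbers are assumptions, not HLA data.

```python
def expected_heterozygosity(freqs):
    """Expected fraction of heterozygotes under random mating,
    H = 1 - sum(p_i^2), for allele frequencies p_i that sum to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)


# Hypothetical loci with k equally frequent alleles: H = 1 - 1/k
for k in (2, 10, 50):
    print(k, expected_heterozygosity([1.0 / k] * k))

# A skewed two-allele locus for comparison
print(expected_heterozygosity([0.9, 0.1]))
```

With many alleles at similar frequencies, H approaches 1, which is the sense in which most individuals are expected to be heterozygous.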
The nucleotide diversity across synonymous and non-synonymous sites at the HLA-A
locus is calculated to be π = 0.043, whereas the neutral 4-fold degenerate sites at
other loci show π = 0.001. Because of this high diversity, we have to correct for
multiple hits, using K (in percent) instead of π.
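The lecture does not state which multiple-hit correction is used. As an illustration only, the sketch below assumes the Jukes-Cantor correction, K = −(3/4) ln(1 − (4/3)p), where p is the observed proportion of differing sites; the function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p
    of differing sites (assumed model; defined only for 0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction is negligible at low divergence and grows with p
for p in (0.001, 0.043, 0.2):
    print(p, jukes_cantor(p), 100.0 * jukes_cantor(p))  # last column in percent
```

The corrected value always exceeds the observed proportion, because some sites have been hit more than once and show only one visible difference.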
Locus                  Antigen recognition    Remaining codons in    Domain α3 (N=92)
(number of sequences)  site (ARS) (N=57)      α1 and α2 (N=125)
                       KS      KA             KS      KA             KS      KA
Human
HLA-A (5)              3.5     13.3***        2.5     1.6            9.5     1.6**
HLA-B (4)              7.1     18.1**         6.9     2.4*           1.5     0.5
HLA-C (3)              3.8     8.8            10.4    4.8            2.1     1.0
Overall mean           4.7     14.1***        5.1     2.4            5.8     1.1**
Besides the large excess of segregating sites found at the ARS compared to other
genes, the number of nucleotide polymorphisms at nonsynonymous sites is much
higher than at synonymous sites. The numbers in the table show that KA is
significantly higher than KS in the ARS of most human HLA loci. This gives a
dN/dS ratio greater than 1, which clearly deviates from neutral expectations:
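\[ \frac{d_N}{d_S} = 1 \;\Rightarrow\; \text{neutrally evolving}, \]
\[ \frac{d_N}{d_S} < 1 \;\Rightarrow\; \text{negative selection removes nonsynonymous changes } (\alpha_3), \]
\[ \frac{d_N}{d_S} > 1 \;\Rightarrow\; \text{positive selection accelerates substitutions (HLA genes)}. \]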
Positive Darwinian selection accelerates substitutions at the HLA loci. Note that
the gene products of the HLA loci can differ by up to 20 amino acids. In contrast,
the non-ARS regions exhibit far fewer substitutions and a dN/dS ratio
smaller than one. This is due to negative selection acting on amino acid
substitutions at these sites.
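The comparison can be reproduced by computing KA/KS for each locus and region from the table of KS and KA values. This is a rough sketch: the dictionary layout, the region labels and the function name `ka_ks_ratio` are assumptions, and the ratio alone says nothing about statistical significance (the asterisks in the table).

```python
# (K_S, K_A) in percent per locus and region, copied from the table
table = {
    "HLA-A": {"ARS": (3.5, 13.3), "rest_a1_a2": (2.5, 1.6), "a3": (9.5, 1.6)},
    "HLA-B": {"ARS": (7.1, 18.1), "rest_a1_a2": (6.9, 2.4), "a3": (1.5, 0.5)},
    "HLA-C": {"ARS": (3.8, 8.8), "rest_a1_a2": (10.4, 4.8), "a3": (2.1, 1.0)},
}


def ka_ks_ratio(ks, ka):
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    return ka / ks


for locus, regions in table.items():
    for region, (ks, ka) in regions.items():
        ratio = ka_ks_ratio(ks, ka)
        # Ratios above 1 point towards positive selection, below 1 towards
        # negative (purifying) selection
        direction = "> 1" if ratio > 1.0 else "< 1"
        print(locus, region, round(ratio, 2), direction)
```

In this table every ARS ratio lies above 1 and every non-ARS ratio lies below 1, which is the pattern the text describes.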
Transspecies polymorphism
A phylogenetic analysis of the HLA-A and HLA-B alleles from humans and their
corresponding alleles in chimpanzees (C-A, C-B) shows that species-specific alleles
do not cluster together (Figure 3). For example, the human allele H-A11 is closer
to the chimpanzee allele C-A108 than to any other human allele; the same holds
for H-B13 and C-B1, and for others. This implies that the human and chimpanzee
alleles diverged before the split between the two species. Whereas humans and
chimpanzees shared a most recent common ancestor (MRCA) 5-8 million years ago,
the splitting time for certain MHC allele lineages is more than 20 million years ago.
Such polymorphisms, where the divergence of alleles predates the divergence of
species, are called transspecies polymorphisms. Due to balancing selection these
alleles were neither fixed nor lost.
Figure 3: Phylogeny of human (H-) and chimpanzee (C-) HLA-A and HLA-B alleles.
In brief there are three remarkable facts about the MHC in humans:
1. large polymorphism in the ARS region,
2. non-synonymous changes in the ARS region exceed synonymous changes by
far,
3. the allele lineages are very old and conserved through different species.
Balancing selection: Frequency-dependent selection
Frequency-dependent selection is a type of balancing selection in which the
fitness of a genotype depends on its frequency in the population. Thus, genetic
variation will be preserved, but in contrast to overdominant selection it can lead to
great fluctuations in allele frequencies. There are two possible modes of
frequency-dependent selection:
Under positive frequency-dependent selection the fitness of a genotype increases
as it becomes more abundant in the population. By contrast, negative
frequency-dependent selection occurs when the fitness of a genotype increases as it
becomes rare in the population. This is often a consequence of species interactions
such as predation, parasitism and competition.
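Negative frequency dependence can be illustrated with a deliberately simple haploid model, which is an assumption made here and not a model from the lecture: the two types A and a have fitnesses 1 − a·p and 1 − a·(1 − p), so each type is fitter when it is rarer. The names `next_freq` and `run` and the strength parameter `a` are hypothetical.

```python
def next_freq(p, a):
    """One generation of haploid selection in which each type's
    fitness declines linearly with its own frequency."""
    w_A = 1.0 - a * p          # fitness of type A at frequency p
    w_a = 1.0 - a * (1.0 - p)  # fitness of the alternative type
    w_bar = p * w_A + (1.0 - p) * w_a
    return p * w_A / w_bar


def run(p0, a, generations):
    """Iterate the recursion from the starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, a)
    return p


# The rare type increases from either side (a = 0.5 is illustrative)
for p0 in (0.05, 0.95):
    print(p0, run(p0, 0.5, 500))
```

Because this sketch has no time lag between frequency and fitness, the frequency settles smoothly at 0.5 rather than fluctuating; sustained fluctuations need a delayed feedback such as a co-evolving pathogen.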
Example:
One example of negative frequency-dependent selection was found in
Arabidopsis thaliana. This plant species carries R-gene loci, which are involved in
pathogen recognition. The Rpm1 gene is of particular interest. It has the
ability to recognize Pseudomonas species carrying AvrRpm1 and AvrB alleles. There
are two alleles leading to two differing genotypes: one lacks Rpm1 and is
therefore susceptible to a Pseudomonas infection, the other carries the Rpm1
gene, which provides pathogen resistance. This disease resistance polymorphism is
attributable to a recombination event that caused a deletion/insertion
polymorphism in an Arabidopsis ancestor. Studies of randomly chosen alleles
throughout the species range showed a frequency of 0.52 for the resistance allele
(R-allele). That is, both genotypes, resistant and susceptible, are about equally
common in the population. Genetic data suggest that both alleles diverged ~9.8
million years ago and have been preserved in the population ever since. The
frequency of the resistance allele with Rpm1, however, may have fluctuated over time.
Figure 4: Stahl et al. (1999). Dynamics of disease resistance polymorphism at the Rpm1 locus of
Arabidopsis. Nature 400, p. 667
If Rpm1 is an effective resistance gene against Pseudomonas infections, why did it
not fix in the whole population?
Temporal fluctuations occur in host-pathogen interactions because a certain
genotype is advantageous at a certain point in time (Figure 4). A low resistance
allele frequency in the host plant allows the pathogen to spread throughout the host
population. This selection pressure leads to an increase in the resistance allele
frequency, which in turn suppresses the pathogen. Presumably, there is a cost to
expressing resistance genes like Rpm1 at times when there is no Pseudomonas
around, so the resistance allele then declines again. The appropriate model
describing the Arabidopsis-Pseudomonas interactions treats the advances and
retreats of the resistance allele frequency as a dynamic polymorphism and provides
a reason why the resistance allele does not become fixed in the population. We can
use the term 'trench warfare' to describe this form of host-pathogen interaction.
Dynamic polymorphisms of ancient, substantially different alleles within a species
are typical for this form of coevolution.
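The verbal argument above can be turned into a toy recursion. The sketch below is not the model of Stahl et al. (1999); it is a minimal haploid caricature under stated assumptions. Resistant plants pay a fixed cost, susceptible plants lose fitness in proportion to the disease pressure d, and d grows when susceptible plants are common and shrinks when they are rare. All names and parameter values (`step`, `simulate`, `cost`, `damage`, `growth`, `threshold`) are hypothetical.

```python
def step(r, d, cost=0.05, damage=0.3, growth=0.5, threshold=0.5):
    """Advance resistance allele frequency r and disease pressure d
    (both between 0 and 1) by one generation."""
    w_res = 1.0 - cost        # resistant plants always pay the cost
    w_sus = 1.0 - damage * d  # susceptible plants suffer from disease
    w_bar = r * w_res + (1.0 - r) * w_sus
    r_next = r * w_res / w_bar
    # Disease pressure rises when susceptible hosts exceed the threshold
    # frequency and falls otherwise (logistic-style update)
    d_next = d + growth * d * (1.0 - d) * ((1.0 - r) - threshold)
    return r_next, d_next


def simulate(r0, d0, generations):
    """Return the list of (r, d) pairs, including the starting point."""
    r, d = r0, d0
    history = [(r, d)]
    for _ in range(generations):
        r, d = step(r, d)
        history.append((r, d))
    return history


# Print every 20th generation of an illustrative run
for gen, (r, d) in enumerate(simulate(0.1, 0.5, 200)):
    if gen % 20 == 0:
        print(gen, round(r, 3), round(d, 3))
```

The point of the design is the sign structure: resistance spreads only while disease pressure is high enough to outweigh its cost, and disease pressure in turn depends on how many susceptible hosts remain.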