Genetic Variation and Inheritance - Evolutionary Biology

Genetic Variation and Inheritance
In the last lecture, we identified three conditions for evolution by natural selection:
1) Organisms must vary
2) At least some of this variation must be genetic ("heritable")
3) Some of the heritable variation must affect fitness
In this section, we take a closer look at these points.
POINT 1: Individuals of a population vary in almost any character we may measure. This is
obvious for morphological characters (just take a look around), but it also holds for every other
character type, from the most complex behavioral or life-history traits (such as longevity) to
biochemical traits like the amino acid composition of a protein.
Continuous and discrete characters: There are two types of characters. In most morphological or
behavioral traits, variation is continuous (e.g. body size); in other traits, it falls into distinct
categories. Examples of the latter include an individual’s sex and amino acid variants in proteins.
POINT 2: For natural selection to drive evolution, at least some of these phenotypic
differences must be heritable. That many morphological characteristics of individuals are heritable
had, of course, been known long before Darwin; soon after the publication of the Origin of
Species, heritability became a major subject of scientific inquiry.
For a continuous (or quantitative) character, the part of the phenotypic variation that is heritable
and can be used by natural selection is measured by the so-called (“narrow-sense”) heritability
$h^2$, a number between 0 (none of the variation is heritable) and 1 (all of it is). It is estimated as
the slope of the mid-parent-offspring regression, i.e. the regression of offspring values on the
mean of the two parents’ values:
[Figure: offspring height plotted against mid-parent height; the slope of the regression line estimates $h^2$.]
Typical heritabilities for quantitative traits are around $h^2 \approx 0.2$ to $0.5$.
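The slope estimate can be made concrete with a short Python sketch. The function name `midparent_offspring_slope` and the family data are hypothetical, made up for illustration: the offspring values were constructed to lie exactly on a line of slope 0.4, which real data never do. The sketch also assumes that trait values are already comparable across sexes and that shared environments do not inflate parent-offspring resemblance.

```python
from statistics import mean


def midparent_offspring_slope(mother, father, offspring):
    """Least-squares slope of offspring values regressed on mid-parent values."""
    if not (len(mother) == len(father) == len(offspring) >= 2):
        raise ValueError("need at least two families, one value per mother, father and offspring")
    midparent = [(m + f) / 2 for m, f in zip(mother, father)]
    mx, my = mean(midparent), mean(offspring)
    # slope = covariance(midparent, offspring) / variance(midparent)
    cov = sum((x - mx) * (y - my) for x, y in zip(midparent, offspring))
    var = sum((x - mx) ** 2 for x in midparent)
    return cov / var


# Hypothetical heights (cm) for five families, constructed to have slope 0.4.
mother = [160.0, 165.0, 170.0, 158.0, 175.0]
father = [175.0, 180.0, 185.0, 172.0, 190.0]
offspring = [167.0, 169.0, 171.0, 166.0, 173.0]
print(midparent_offspring_slope(mother, father, offspring))  # approximately 0.4
```

With real data the points scatter around the line, and the slope is an estimate of $h^2$ with sampling error, not an exact value.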
So far, we have concentrated on a particular phenotypic trait and asked how much of its naturally
occurring variation is heritable. In Darwin’s time and until the 1960s, this indirect “phenotype to
genotype” approach was practically the only way to measure heritable variation. Since the
discovery of DNA made the genotype directly accessible, powerful molecular methods have
allowed heritable variation to be measured directly. A question that one may ask in this context
is: How much of the genome is variable?
Protein gel electrophoresis is a method that works at the protein level. It proceeds as follows:
• Non-denatured soluble proteins with different net charges migrate at different rates through
starch or acrylamide gels to which an electric current is applied.
• The charge characteristics stem primarily from the 3 amino acids with positively charged
side chains (lysine, arginine, histidine) and the 2 amino acids with negatively charged side
chains (aspartic acid and glutamic acid).
• The net charge determines whether a protein moves toward the anode or the cathode; size and
shape also influence its migration.
• The gel is stained with protein-specific chemicals. After staining, bands appear on the gel,
which can usually be interpreted in simple genetic terms.
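To illustrate how the charged side chains listed above set the direction of migration, here is a crude Python sketch. It simply counts each lysine, arginine or histidine as +1 and each aspartic or glutamic acid as -1. This is a simplifying assumption: it ignores buffer pH (histidine, for instance, is only partly charged at many working pH values), the terminal amino and carboxyl groups, and the effects of size and shape. The function names are hypothetical.

```python
# Crude model: every charged side chain carries exactly one unit of charge.
POSITIVE = set("KRH")  # lysine, arginine, histidine
NEGATIVE = set("DE")   # aspartic acid, glutamic acid


def net_charge(protein):
    """Approximate net charge from a one-letter amino acid sequence."""
    seq = protein.upper()
    return sum(aa in POSITIVE for aa in seq) - sum(aa in NEGATIVE for aa in seq)


def migration_direction(protein):
    """Electrode the protein would move toward in this simplified model."""
    charge = net_charge(protein)
    if charge > 0:
        return "cathode"  # positively charged proteins move to the negative electrode
    if charge < 0:
        return "anode"    # negatively charged proteins move to the positive electrode
    return "none"


print(net_charge("MKDEEL"), migration_direction("MKDEEL"))  # -2 anode
```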
Protein electrophoresis detects so-called allozymes. While different allozymes correspond to
different alleles at the DNA level, the reverse is not true: not all amino acid substitutions affect
mobility on the gel. Protein electrophoresis therefore tends to underestimate the amount of
amino acid variation. Another, inevitable limitation is that variation in the nucleotide sequence
that does not alter the amino acid sequence cannot be detected. This includes silent
substitutions, which change the codon for a given amino acid but not the amino acid itself, and
noncoding substitutions in introns or upstream or downstream of a gene.
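The distinction between silent substitutions and amino acid replacements that do or do not change the charge can be sketched in Python with the standard genetic code. The helper names are hypothetical, and the charge rule is the same crude one-unit-per-side-chain model as in the list above, so a "charge-neutral" replacement might still occasionally be detected through changes in size or shape.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Crude charge model: one unit per charged side chain, pH effects ignored.
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}


def translate(cds):
    """Translate a coding DNA sequence codon by codon."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def net_charge(protein):
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in protein)


def classify_difference(cds_a, cds_b):
    """Label the difference between two aligned coding sequences of equal length."""
    prot_a, prot_b = translate(cds_a), translate(cds_b)
    if prot_a == prot_b:
        return "silent"           # invisible on a protein gel
    if net_charge(prot_a) != net_charge(prot_b):
        return "charge-changing"  # likely a distinct allozyme band
    return "charge-neutral"       # may go undetected by electrophoresis


# Hypothetical one-codon examples: an aspartic acid codon (GAT) mutated three ways.
for variant in ["GAC", "AAT", "GAA"]:
    print("GAT ->", variant, classify_difference("GAT", variant))
```

GAT to GAC keeps aspartic acid (silent), GAT to AAT gives asparagine and removes a negative charge, and GAT to GAA gives glutamic acid, which carries the same charge.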
The advantage of protein gel electrophoresis over full DNA sequencing is that it is cheap and
quick: 1 gel containing extracts from 25 individuals can be sectioned into 5 replicate slices, and
each slice incubated with a different protein-specific stain. About 20 such gels can be run per day
in an active lab. Thus, in a single day, a total of 25 × 5 × 20 = 2500 genotypes could reasonably be
scored.
DNA sequencing: Full DNA sequencing also works by electrophoresis. After the DNA target
sequence has been amplified by cloning or PCR, the standard technique is the dideoxy (Sanger)
method.
• Starting from a specific primer, a copy of the target sequence is synthesized by DNA
polymerase.
• A low concentration of fluorescently labelled dideoxynucleotides (ddNTPs, deoxynucleotides
lacking the 3′ hydroxyl group) is added to the reaction. Whenever the polymerase happens to
incorporate a ddNTP, chain elongation stops.
• The result is a collection of fragments of increasing size, each ending in a labelled nucleotide.
• The fragments are separated by size by gel electrophoresis, shorter fragments migrating faster.
Maximal length of fragments: about 500 – 700 base pairs.
• The colors are read by a laser scanner, translated into nucleotides and printed out.
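The logic of reading a dideoxy sequencing run can be simulated in a few lines of Python. This is an idealized sketch with hypothetical function names. It assumes that the template downstream of the primer is written 3′→5′ (so the new strand is its base-by-base complement) and that termination yields exactly one fragment of every length, and it stands in for the laser scanner by reading the last character of each fragment.

```python
import random

# Watson-Crick pairing used to build the newly synthesized strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def dideoxy_fragments(template_3to5):
    """All chain-terminated products of an idealized sequencing reaction.

    Termination at position i yields the first i bases of the new strand;
    the last base is the fluorescently labelled ddNTP.
    """
    new_strand = template_3to5.upper().translate(COMPLEMENT)
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_gel(fragments):
    """Sort fragments by length (shortest runs fastest) and read the end labels."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = dideoxy_fragments("TACGGA")
random.Random(1).shuffle(fragments)  # the reaction yields an unordered mixture
read = read_gel(fragments)
print(read)                          # ATGCCT, the new strand
print(read.translate(COMPLEMENT))    # TACGGA, the template recovered
```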
Measures of molecular variability
Variation in the amino acid and nucleotide sequences is discrete. There are two standard
measures for discrete variation:
1. A locus of interest (e.g. a gene) is called polymorphic if more than one allelic variant is
found. In practice, a locus is usually counted as polymorphic only if the frequency of its most
common allele is below 95%, so that very rare variants alone do not make it polymorphic. Loci
that are not polymorphic are called monomorphic.
2. The heterozygosity of a locus is the fraction of individuals in a population that are
heterozygous at the locus. Heterozygosity is influenced by how equal the frequencies of the
different alleles at a locus are: it tends to be highest when the alleles are equally frequent.
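The two single-locus measures can be sketched in Python. The genotype lists are hypothetical allozyme data with "fast" (F) and "slow" (S) alleles, and the function names are illustrative. The 95% threshold is a parameter. The Hardy-Weinberg expectation $1 - \sum_i p_i^2$ is added, assuming random mating, to show how equal allele frequencies raise heterozygosity.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes such as ("F", "S")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, threshold=0.95):
    """Practical criterion: the most common allele has frequency below threshold."""
    return max(allele_frequencies(genotypes).values()) < threshold


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(freqs):
    """Hardy-Weinberg expectation 1 - sum(p_i^2), assuming random mating."""
    return 1 - sum(p * p for p in freqs)


# Hypothetical allozyme locus: 6 FF, 3 FS, 1 SS individuals.
locus_1 = [("F", "F")] * 6 + [("F", "S")] * 3 + [("S", "S")]
# Second allele present but rare (frequency 2.5%): monomorphic by the 95% criterion.
locus_2 = [("F", "F")] * 19 + [("F", "S")]

print(is_polymorphic(locus_1), observed_heterozygosity(locus_1))  # True 0.3
print(is_polymorphic(locus_2), observed_heterozygosity(locus_2))  # False 0.05
print(expected_heterozygosity([0.5, 0.5]), expected_heterozygosity([0.9, 0.1]))
```

With two alleles, the expected heterozygosity drops from 0.5 at equal frequencies to about 0.18 when one allele has frequency 0.9.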
Polymorphism and heterozygosity can be extended to measures of variability at the genome
level:
• The level of polymorphism P in the genome is the proportion of loci that are polymorphic.
• The genome heterozygosity H is the average heterozygosity over loci, or the average fraction of
loci in an individual that are heterozygous.
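Extending these measures to many loci is a matter of averaging. In this hypothetical Python sketch, `matrix[i][j]` holds the genotype of individual i at locus j, and the function names are illustrative. Averaging the per-individual fraction of heterozygous loci gives the same H as averaging per-locus heterozygosities, as long as every individual is scored at every locus.

```python
from collections import Counter


def proportion_polymorphic(matrix, threshold=0.95):
    """P: share of loci whose most common allele is below `threshold` in frequency."""
    n_loci = len(matrix[0])
    polymorphic = 0
    for j in range(n_loci):
        counts = Counter(allele for row in matrix for allele in row[j])
        if max(counts.values()) / sum(counts.values()) < threshold:
            polymorphic += 1
    return polymorphic / n_loci


def genome_heterozygosity(matrix):
    """H: mean over individuals of the fraction of their loci that are heterozygous."""
    per_individual = [sum(a != b for a, b in row) / len(row) for row in matrix]
    return sum(per_individual) / len(per_individual)


# Hypothetical sample: 3 individuals scored at 4 loci.
sample = [
    [("A", "A"), ("A", "B"), ("A", "A"), ("A", "A")],
    [("A", "B"), ("A", "B"), ("A", "A"), ("A", "A")],
    [("A", "A"), ("B", "B"), ("A", "A"), ("A", "C")],
]
print(proportion_polymorphic(sample))  # 0.75 (locus 3 is monomorphic)
print(genome_heterozygosity(sample))   # about 0.333
```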
From allozyme data, we obtain the following estimates of genome-wide protein polymorphism and
heterozygosity: P is about 30% in humans and 40% in Drosophila, and H about 7% in humans
and 14% in Drosophila. Drosophila has about 13,600 loci (a conservative estimate); with H = 14%,
this would represent about 1900 heterozygous loci per individual (0.14 × 13,600 ≈ 1904).
At the DNA level, the index of heterozygosity is more commonly referred to as nucleotide
diversity, with symbol $\pi$. Nucleotide diversity is not restricted to diploid organisms: it is the
average number of nucleotide differences per site between a pair of DNA sequences drawn at
random from a population. For example, imagine a population sample that consists of three
sequences of length 5: TTAGC, TAAGC and TTACC. The pairwise difference between
sequences 1 and 2 and between sequences 1 and 3 is 1; the difference between sequences 2 and 3
is 2. The average difference per pair is thus (1+1+2)/3 = 4/3, and the average pairwise difference
per site is $\pi = (4/3)/5 = 4/15$. Nucleotide polymorphism is often estimated from a sample of
size $n$ as $\theta = S_n/a_n$ (Watterson’s estimator), where $S_n$ is the proportion of polymorphic
sites in the sample and $a_n = 1 + 1/2 + 1/3 + \dots + 1/(n-1)$. Division by $a_n$ makes estimates
from samples of different sizes comparable. In the sample above, 2 of the 5 sites are polymorphic,
so $\theta = (2/5)/(3/2) = 4/15$.
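Both estimators can be computed directly from aligned sequences. This Python sketch uses exact fractions and reproduces the three-sequence example. The function names are hypothetical, and it assumes equally long, gap-free sequences.

```python
from fractions import Fraction
from itertools import combinations


def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return Fraction(diffs, len(pairs) * length)


def watterson_theta(seqs):
    """theta = S_n / a_n, with S_n the proportion of polymorphic (segregating) sites."""
    n, length = len(seqs), len(seqs[0])
    segregating = sum(len(set(column)) > 1 for column in zip(*seqs))
    a_n = sum(Fraction(1, i) for i in range(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return Fraction(segregating, length) / a_n


sample = ["TTAGC", "TAAGC", "TTACC"]
print(nucleotide_diversity(sample))  # 4/15
print(watterson_theta(sample))       # 4/15
```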
Nucleotide and allozyme variation are not directly comparable: by far the largest portion of
nucleotide variability comes from silent or noncoding sites. In noncoding regions of the X
chromosome in African Drosophila melanogaster populations, estimates of $\pi$ and $\theta$ are both
around 1%, meaning that about 1 in 100 sites is heterozygous in an average female fly and about
20% of all nucleotide sites are polymorphic in a population of 1 million.
• The heritability data for particular traits and the variability data of proteins and DNA show
large amounts of heritable genetic variation. In sexually reproducing species, virtually every
individual is genetically unique.
POINT 3: How much of this variation affects fitness? Fitness, or reproductive success, is usually
much more difficult to measure than a generic phenotypic trait. There is no lack of examples
showing that there is heritable variation in fitness and that evolution by natural selection does
actually happen (such as the evolution of antibiotic and insecticide resistance). The main issue is
to estimate how widespread natural selection is. On the level of the phenotype, this is the
discussion about whether a particular trait is adaptive (see last lecture). There is a similar question
on the genotypic level: Does natural selection explain patterns of nucleotide diversity within
populations and differences in DNA sequences among species? In principle, there are two main
forces that could be responsible: natural selection and random genetic drift. Methods to
distinguish these two cases will be discussed later in this lecture. Again, there are many examples
that clearly show natural selection. A particularly striking result is that even differential codon
usage for a given amino acid seems to have an effect on fitness (probably due to differences in
translational accuracy or speed), as revealed by patterns of so-called codon bias. Nevertheless,
the relative contribution of natural selection and drift to evolution is still a matter of active
research.
Mendelian Inheritance
Even if there is heritable variation in fitness in a population, evolution by natural selection will only
work if inheritance has certain special properties. Most biologists in the 19th century, including
Darwin, believed in blending inheritance. With blending inheritance, offspring produce gametes
that are intermediate between the gametes they inherited from their parents. Offspring of a
black and a white moth would be gray, and offspring of a gray and a white moth light gray, etc.
Put mathematically, one can show that under blending inheritance the variation (as measured by
the variance) is halved each generation.
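A short derivation makes the halving explicit. It assumes (a simplification) that each offspring’s trait value is exactly the mean of its parents’ values, that parents mate at random so that their values are independent with common variance $\sigma^2$, and that there is no environmental variation or new mutation:

$$
\operatorname{Var}(X_{\text{offspring}})
= \operatorname{Var}\!\left(\frac{X_{\text{mother}} + X_{\text{father}}}{2}\right)
= \frac{\operatorname{Var}(X_{\text{mother}}) + \operatorname{Var}(X_{\text{father}})}{4}
= \frac{2\sigma^2}{4} = \frac{\sigma^2}{2}
$$

Iterating gives $\sigma_t^2 = \sigma_0^2 / 2^t$, so after 10 generations less than 0.1% of the original variance would remain.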
For natural selection to work, there must not only be heritable variation; there must also be a
mechanism that maintains new variants. Darwin knew this problem and was very worried about it,
but did not resolve it in his lifetime. The problem was only solved in 1918, when Ronald A.
Fisher pointed out that natural selection would work under Mendelian inheritance (Mendel 1866,
rediscovered about 1900). With Mendelian inheritance, only phenotypes may blend, while
genotypes are preserved. As a consequence, a new variant can increase in frequency and
eventually take over the entire population. Ironically, the early Mendelians in 1900 – 1920
opposed natural selection and instead believed in evolution by macromutational jumps. The
synthesis of Darwinism and Mendelism was only possible after Mendelian inheritance was applied
to continuous traits.
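The contrast Fisher drew can be put into numbers with a deterministic Python sketch. It compares the trait variance under blending with the genotypic variance at a single biallelic locus under Mendelian inheritance, assuming an infinitely large, randomly mating population without selection, drift or mutation (all simplifications). The function names are hypothetical, and the starting variance of 0.5 for blending is chosen to match the Mendelian variance $2pq$ at $p = 0.5$.

```python
def blending_variance(v0, generations):
    """Trait variance under blending inheritance: halved every generation."""
    return [v0 / 2**t for t in range(generations + 1)]


def mendelian_variance(p, generations):
    """Genotypic variance at one biallelic locus under Mendelian inheritance.

    Genotype value = number of copies of allele A (0, 1 or 2).
    """
    variances = []
    for _ in range(generations + 1):
        q = 1 - p
        f_AA, f_Aa = p * p, 2 * p * q  # Hardy-Weinberg frequencies of AA and Aa
        mean_value = 2 * f_AA + f_Aa
        variances.append(4 * f_AA + f_Aa - mean_value**2)  # equals 2pq
        p = f_AA + f_Aa / 2  # allele frequency passed on to the next generation
    return variances


print(blending_variance(0.5, 4))   # [0.5, 0.25, 0.125, 0.0625, 0.03125]
print(mendelian_variance(0.5, 4))  # [0.5, 0.5, 0.5, 0.5, 0.5]
```

Under blending the variance drains away, while under Mendelian inheritance the alleles are passed on intact and the variance stays at $2pq$ until some other force changes the allele frequency.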
For a long time it was believed that Mendelian inheritance could not explain the smooth
transitions and correlations among relatives that had been measured for continuous traits. Only
later was it realized that smooth distributions and evolutionary change (as seen in breeding
experiments) can be explained by the combined action of many Mendelian factors of small effect.
This is largely a consequence of the central limit theorem, which says that a trait determined by
the sum of a large number of independent factors will be approximately normally distributed.
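The central-limit argument can be checked numerically. In this Python sketch (an idealized model with hypothetical function names), each of `n_loci` diploid loci carries two allele copies that independently add one unit to the trait with probability `p`. The trait then follows a binomial distribution, and its largest deviation from the matching normal density shrinks as the number of loci grows.

```python
from math import comb, exp, pi, sqrt


def trait_distribution(n_loci, p=0.5):
    """Probability of each trait value 0..2*n_loci under the additive model."""
    n = 2 * n_loci  # number of allele copies
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def normal_density(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def max_gap_to_normal(n_loci, p=0.5):
    """Largest difference between the exact distribution and a normal density."""
    n = 2 * n_loci
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return max(abs(prob - normal_density(k, mu, sigma))
               for k, prob in enumerate(trait_distribution(n_loci, p)))


for loci in (1, 5, 50):
    # number of phenotypic classes, and distance from the normal curve
    print(loci, len(trait_distribution(loci)), round(max_gap_to_normal(loci), 4))
```

With one locus there are only three phenotypic classes; with 50 loci there are 101 closely spaced classes whose probabilities nearly coincide with the normal curve.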
Today it is commonly believed that most evolution occurs in a large number of small steps. The
study of continuous (or quantitative) traits, their co-variation and heritabilities, and relation to the
underlying genetic basis is the subject of quantitative genetics. Most quantitative genetics does
not use DNA sequence data, but has an abstract notion of genes and loci that is derived from
phenotypic data. In contrast, molecular genetics works with sequence data but, in
most cases, provides no information about phenotypes. Closing the gap between the study
of evolution at the level of phenotypes and at the level of molecular genotypes remains one of
the major challenges in evolutionary biology.