Genetic Variation and Inheritance

In the last lecture, we identified three conditions for evolution under natural selection:

1) Organisms must vary.
2) At least some of this variation must be genetic ("heritable").
3) Some of the heritable variation must affect fitness.

In this section, we take a closer look at these points.

POINT 1: Individuals of a population vary for almost any character we may measure. This is obvious for morphological characters (just take a look around), but it also holds for any other type of character, from the most complex behavioral or life-history traits (such as longevity) to biochemical traits like the amino acid composition of a protein.

Continuous and discrete characters: There are two types of characters. For most morphological or behavioral traits, variation is continuous (e.g. body size); for other traits, it falls into distinct categories. Examples of the latter include an individual’s sex and amino acid variants in proteins.

POINT 2: In order for natural selection to drive evolution, at least some of these phenotypic differences must be heritable. That many morphological characteristics of individuals are heritable had, of course, been known long before Darwin. Soon after the publication of the Origin of Species, heritability became a major subject of scientific inquiry.

For a continuous (or quantitative) character, the part of the phenotypic variation that is heritable and can be used by natural selection is measured by the so-called ("narrow sense") heritability h². h² is a number between 0 and 1 (from none to all of the variation being heritable). It is measured as the slope of the regression of offspring trait values on the mid-parent value, i.e. the average of the two parents.

[Figure: regression of offspring height on mid-parent height]

Typical heritabilities for quantitative traits are around h² ≈ 0.2 to 0.5.

So far, we have concentrated on a particular phenotypic trait and asked how much of its naturally occurring variation is heritable.
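The regression estimate of h² described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name and the height values are invented for this example and are not data from the lecture, and the slope is the ordinary least-squares slope of offspring value on mid-parent value.

```python
def midparent_offspring_slope(parent_pairs, offspring):
    """Least-squares slope of offspring value on mid-parent value.

    parent_pairs: list of (parent1, parent2) trait values, one pair per family.
    offspring:    list of offspring trait values, one per family.
    The returned slope is the regression estimate of narrow-sense heritability.
    """
    if len(parent_pairs) != len(offspring) or len(offspring) < 2:
        raise ValueError("need at least two families with matching data")

    # Mid-parent value = average of the two parents.
    midparents = [(p1 + p2) / 2 for p1, p2 in parent_pairs]

    mean_x = sum(midparents) / len(midparents)
    mean_y = sum(offspring) / len(offspring)

    # slope = Cov(midparent, offspring) / Var(midparent)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(midparents, offspring))
    var_x = sum((x - mean_x) ** 2 for x in midparents)
    if var_x == 0:
        raise ValueError("mid-parent values do not vary")
    return cov_xy / var_x


# Hypothetical heights in cm, invented purely for illustration.
families = [(160, 170), (165, 175), (170, 180), (175, 185), (180, 190)]
children = [171, 173, 175, 177, 179]

h2 = midparent_offspring_slope(families, children)
print(f"estimated h2 = {h2:.2f}")
```

With real data the points scatter around the regression line, and the estimate is only as good as the sample; the toy values here lie exactly on a line so that the slope is easy to check by hand.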
In Darwin’s time and until the 1960s, this indirect "phenotype to genotype" approach was practically the only way to measure heritable variation. Since the discovery of DNA made the genotype directly accessible, powerful molecular methods allow heritable variation to be measured directly. A question one may ask in this context is: How much of the genome is variable?

Protein gel electrophoresis is a method that works at the protein level. It works as follows:

• Nondenatured soluble proteins with different net charges migrate at different rates through starch or acrylamide gels to which an electric current is applied.
• The charge characteristics stem primarily from the 3 amino acids with positively charged side chains (lysine, arginine, histidine) and the 2 amino acids with negatively charged side chains (aspartic acid and glutamic acid).
• The net charge determines whether a protein moves toward the anode or the cathode; size and shape also influence its migration.
• The gel is stained with protein-specific chemicals. After staining, bands appear on the gel, which can usually be interpreted in simple genetic terms.

Protein electrophoresis detects so-called allozymes. While different allozymes correspond to different alleles at the DNA level, the reverse is not true: not all amino acid substitutions affect mobility on the gel. Protein electrophoresis therefore tends to underestimate the amount of amino acid variation. Another, unavoidable limitation is that variation in the nucleotide sequence that does not alter the amino acid sequence cannot be detected. This includes silent substitutions, which change the codon for a given amino acid but not the amino acid itself, and noncoding substitutions, which occur in introns or upstream or downstream of a gene.
The advantage of protein gel electrophoresis over full DNA sequencing is that it is cheap and quick: one gel containing extracts from 25 individuals can be sectioned into 5 replicate slices, and each slice incubated with a different protein-specific stain. About 20 such gels can be run per day in an active lab. Thus, in a single day, a total of 25 × 5 × 20 = 2500 genotypes could reasonably be scored.

DNA sequencing

Full DNA sequencing also relies on electrophoresis. After amplification of the DNA target sequence by cloning or PCR, the standard technique is the dideoxy method.

• Starting from a specific primer, a copy of the target sequence is synthesized by DNA polymerase.
• A low concentration of fluorescently labelled dideoxynucleotides (ddNTPs, nucleotides lacking the 3' hydroxyl group) is added to the reaction. Whenever the polymerase happens to incorporate a ddNTP, synthesis of that copy stops.
• The result is a collection of fragments of increasing size, each with a labelled end nucleotide.
• Fragments are separated by length by gel electrophoresis. Maximal length of fragments: about 500 to 700 base pairs.
• The colors are read by a laser scanner, translated into nucleotides, and printed out.

Measures of molecular variability

Variation in amino acid and nucleotide sequences is discrete. There are two standard measures for discrete variation:

1. A locus of interest (e.g. a gene) is called polymorphic if more than one allelic variant is found (in practice, a locus is usually counted as polymorphic if the frequency of its most common allele is below 95%). Loci that are not polymorphic are called monomorphic.
2. The heterozygosity of a locus is the fraction of individuals in a population that are heterozygous at the locus. Heterozygosity depends on how equal the frequencies of the different alleles at a locus are: the more even the frequencies, the higher it tends to be.
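The two measures just defined can be computed directly from scored genotypes. The sketch below is illustrative only: the function names, the allele labels "F" and "S" (standing in for a fast and a slow allozyme band) and the sample are all hypothetical. The last function additionally assumes random mating, which the text does not state, to show how heterozygosity depends on the evenness of allele frequencies.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Relative frequency of each allele in a list of diploid genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(genotypes, threshold=0.95):
    """Practical criterion: most common allele has frequency below threshold."""
    return max(allele_frequencies(genotypes).values()) < threshold


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """1 - sum of squared allele frequencies.

    Assumption (not from the lecture text): random mating, so that the
    chance of drawing two identical alleles is the sum of p_i squared.
    """
    return 1.0 - sum(p * p for p in allele_frequencies(genotypes).values())


# Hypothetical sample of 10 individuals scored at one allozyme locus.
sample = [("F", "F"), ("F", "S"), ("F", "S"), ("S", "S"), ("F", "F"),
          ("F", "S"), ("F", "F"), ("F", "F"), ("F", "S"), ("F", "F")]

print(allele_frequencies(sample))
print(is_polymorphic(sample))
print(observed_heterozygosity(sample))
print(expected_heterozygosity(sample))
```

Under the random-mating assumption, a locus with two alleles at frequencies 0.5 and 0.5 has a higher expected heterozygosity than one at 0.9 and 0.1, which is the sense in which heterozygosity reflects how equal the allele frequencies are.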
Polymorphism and heterozygosity can be extended to measures of variability at the genome level:

• The level of polymorphism P in the genome is the proportion of loci that are polymorphic.
• The genome heterozygosity H is the average heterozygosity over loci, or equivalently the fraction of all loci that are heterozygous in an average individual.

From allozyme data, we obtain the following estimates for genome-wide protein polymorphism and heterozygosity: P is about 30% in humans and 40% in Drosophila, and H is about 7% in humans and 14% in Drosophila. Drosophila has about 13,600 loci (a conservative estimate); with H = 14%, this corresponds to about 0.14 × 13,600 ≈ 1,900 heterozygous loci per individual.

At the DNA level, the index of heterozygosity is more commonly referred to as nucleotide diversity, with symbol π. Nucleotide diversity is defined not only for diploid organisms: it is the average number of nucleotide differences per site between a pair of DNA sequences drawn at random from a population. For example, imagine a population sample that consists of three sequences of length 5: TTAGC, TAAGC, and TTACC. The pairwise difference between sequences 1 and 2 and between sequences 1 and 3 is 1; the difference between sequences 2 and 3 is 2. The average difference per pair is thus (1 + 1 + 2)/3 = 4/3, and the average pairwise difference per site is π = (4/3)/5 = 4/15.

Nucleotide polymorphism is often estimated from a sample of size n as θ = S_n / a_n (Watterson’s estimator), where S_n is the proportion of polymorphic sites in the sample and a_n = 1 + 1/2 + 1/3 + … + 1/(n − 1). Division by a_n makes estimates from samples of different size comparable. For the sample in the example above, two of the five sites are polymorphic and a_3 = 1 + 1/2 = 3/2, so θ = (2/5)/(3/2) = 4/15.

Nucleotide and allozyme variation are not directly comparable; by far the largest portion of nucleotide variability comes from silent or noncoding sites. In noncoding regions of the X chromosome in African Drosophila melanogaster populations, estimates of π and θ are both around 1%, meaning that about 1 in 100 sites is heterozygous in an average female fly and about 20% of all nucleotide sites are polymorphic in a population of 1 million.

• The heritability data for particular traits and the variability data for proteins and DNA show large amounts of heritable genetic variation. In sexually reproducing species, virtually every individual is genetically unique.

POINT 3: How much of this variation affects fitness? Fitness, or reproductive success, is usually much more difficult to measure than a generic phenotypic trait. There is no lack of examples showing that there is heritable variation in fitness and that evolution by natural selection actually happens (such as the evolution of antibiotic and insecticide resistance). The main issue is to estimate how widespread natural selection is. At the level of the phenotype, this is the discussion about whether a particular trait is adaptive (see last lecture). There is a similar question at the genotypic level: Does natural selection explain patterns of nucleotide diversity within populations and differences in DNA sequences among species? In principle, there are two main forces that could be responsible: natural selection and random genetic drift. Methods to distinguish these two cases will be discussed later in this lecture. Again, there are many examples that clearly show natural selection. A particularly striking result is that even differential codon usage for a given amino acid seems to have an effect on fitness (probably due to differences in translational accuracy or speed), as revealed by patterns of so-called codon bias. Nevertheless, the relative contribution of natural selection and drift to evolution is still a matter of active research.

Mendelian Inheritance

Even if there is heritable variation in fitness in a population, evolution by natural selection will only work if inheritance has certain special properties.
Most biologists in the 19th century, including Darwin, believed in blending inheritance. With blending inheritance, offspring produce gametes that are intermediate between the gametes they inherited from their parents. Offspring of a black and a white moth would be gray, offspring of a gray and a white moth light gray, and so on. Put mathematically, one can show that variation (as measured by the variance) is reduced by half each generation under blending inheritance. For natural selection to work, heritable variation alone is not enough; we also need a mechanism that maintains new variants. Darwin knew of this problem and was very worried about it, but it was not resolved in his lifetime.

The problem was only solved in 1918, when Ronald A. Fisher pointed out that natural selection would work under Mendelian inheritance (Mendel 1866, rediscovered about 1900). With Mendelian inheritance, only phenotypes may blend, but genotypes are preserved. As a consequence, a new variant can increase in frequency and eventually take over the entire population.

Ironically, the early Mendelians of 1900 to 1920 opposed natural selection and instead believed in evolution by macromutational jumps. The synthesis of Darwinism and Mendelism only became possible after Mendelian inheritance was applied to continuous traits. For a long time it was believed that Mendelian inheritance could not explain the smooth transitions and the correlations among relatives that had been measured for continuous traits. Only later was it realized that smooth distributions and evolutionary change (as seen in breeding experiments) can be explained by many Mendelian factors of small effect. This is largely a consequence of the central limit theorem, which says that the distribution of a trait that is influenced by a large number of independent factors will be approximately normal. Today it is commonly believed that most evolution occurs in a large number of small steps.
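The central limit theorem argument above can be illustrated with a deliberately simple model. The sketch assumes, purely for illustration, that a trait is the count of "+" alleles an individual carries at n unlinked loci with equal, additive effects, that each "+" allele has frequency p, and that mating is random and the environment has no effect. None of these modelling choices come from the lecture text. Under them, the trait follows a binomial distribution, which can be compared with a normal curve of the same mean and variance.

```python
import math


def plus_allele_distribution(n_loci, p=0.5):
    """Exact distribution of the number of '+' alleles in a diploid individual.

    Assumes n_loci independent loci, two alleles per locus, '+' allele
    frequency p at every locus, and random mating (illustrative assumptions).
    """
    n_alleles = 2 * n_loci
    return [math.comb(n_alleles, k) * p ** k * (1 - p) ** (n_alleles - k)
            for k in range(n_alleles + 1)]


def normal_density(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(
        2 * math.pi * variance)


def max_gap_to_normal(n_loci, p=0.5):
    """Largest absolute gap between the exact distribution and the normal curve."""
    n_alleles = 2 * n_loci
    mean = n_alleles * p
    variance = n_alleles * p * (1 - p)
    pmf = plus_allele_distribution(n_loci, p)
    return max(abs(prob - normal_density(k, mean, variance))
               for k, prob in enumerate(pmf))


# The gap shrinks as more loci of small effect contribute to the trait.
for n_loci in (1, 2, 10, 50):
    print(n_loci, round(max_gap_to_normal(n_loci), 5))
```

With a single locus the trait takes only three values, while with dozens of loci the distribution is already hard to distinguish from a smooth bell curve, which is the sense in which many Mendelian factors of small effect can produce continuous variation.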
The study of continuous (or quantitative) traits, their covariation and heritabilities, and their relation to the underlying genetic basis is the subject of quantitative genetics. Most quantitative genetics does not use DNA sequence data, but works with an abstract notion of genes and loci that is derived from phenotypic data. This contrasts with molecular genetics, which works with sequence data but, in most cases, does not provide information about phenotypes. Closing the gap between the study of evolution at the level of phenotypes and at the level of molecular genotypes remains one of the major challenges in evolutionary biology.