Chapter 6. Multi-locus coevolution, epistasis, and linkage disequilibrium

Biological Motivation

Obviously, more than a single locus is involved. Here we develop a basic framework for studying two-locus systems, introducing the concepts of epistasis, recombination, and linkage disequilibrium. After studying how coevolution proceeds in a simple two-locus system (motivated by ???) we move on to explore ???

INTRODUCE EPISTASIS AND LINKAGE DISEQUILIBRIUM

Key Questions:

• What patterns of epistasis are likely to be generated by species interactions?
• How do these patterns of epistasis influence the dynamics and outcome of coevolution?
• What patterns of linkage disequilibrium do we expect to emerge in coevolving systems?

Building a 2-locus model of coevolution

Our goal is to develop the simplest possible model that captures the potentially important consequences of multi-locus gene-for-gene interactions for coevolution between X and X. The simplest starting point is to focus on a single pair of loci in haploid sexual species. In haploid sexuals, recombination occurs in a transient diploid phase but selection occurs in the haploid phase. Thus, we avoid the complexities of diploidy that we struggled with in the previous chapter. Of course, ignoring diploidy comes at the cost of reduced realism, since both XX and XX are, indeed, diploid species. We imagine that rust and flax individuals encounter one another at random, and that when a flax individual with genotype i encounters a rust individual with genotype j, an infection results with probability $\alpha_{i,j}$.
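If we assume that infection has negative fitness consequences for the flax and positive fitness consequences for the rust, the fitness of the four possible flax genotypes is given by:

\begin{align}
W_{X,AB} &= 1 - s_X \left( Y_{AB}\,\alpha_{AB,AB} + Y_{Ab}\,\alpha_{AB,Ab} + Y_{aB}\,\alpha_{AB,aB} + Y_{ab}\,\alpha_{AB,ab} \right) \tag{1a} \\
W_{X,Ab} &= 1 - s_X \left( Y_{AB}\,\alpha_{Ab,AB} + Y_{Ab}\,\alpha_{Ab,Ab} + Y_{aB}\,\alpha_{Ab,aB} + Y_{ab}\,\alpha_{Ab,ab} \right) \tag{1b} \\
W_{X,aB} &= 1 - s_X \left( Y_{AB}\,\alpha_{aB,AB} + Y_{Ab}\,\alpha_{aB,Ab} + Y_{aB}\,\alpha_{aB,aB} + Y_{ab}\,\alpha_{aB,ab} \right) \tag{1c} \\
W_{X,ab} &= 1 - s_X \left( Y_{AB}\,\alpha_{ab,AB} + Y_{Ab}\,\alpha_{ab,Ab} + Y_{aB}\,\alpha_{ab,aB} + Y_{ab}\,\alpha_{ab,ab} \right) \tag{1d}
\end{align}

Similarly, the fitness of the four possible rust genotypes is given by:

\begin{align}
W_{Y,AB} &= 1 - s_Y \left( 1 - X_{AB}\,\alpha_{AB,AB} - X_{Ab}\,\alpha_{Ab,AB} - X_{aB}\,\alpha_{aB,AB} - X_{ab}\,\alpha_{ab,AB} \right) \tag{2a} \\
W_{Y,Ab} &= 1 - s_Y \left( 1 - X_{AB}\,\alpha_{AB,Ab} - X_{Ab}\,\alpha_{Ab,Ab} - X_{aB}\,\alpha_{aB,Ab} - X_{ab}\,\alpha_{ab,Ab} \right) \tag{2b} \\
W_{Y,aB} &= 1 - s_Y \left( 1 - X_{AB}\,\alpha_{AB,aB} - X_{Ab}\,\alpha_{Ab,aB} - X_{aB}\,\alpha_{aB,aB} - X_{ab}\,\alpha_{ab,aB} \right) \tag{2c} \\
W_{Y,ab} &= 1 - s_Y \left( 1 - X_{AB}\,\alpha_{AB,ab} - X_{Ab}\,\alpha_{Ab,ab} - X_{aB}\,\alpha_{aB,ab} - X_{ab}\,\alpha_{ab,ab} \right) \tag{2d}
\end{align}

Here $X_i$ and $Y_j$ are the frequencies of genotype i in the flax and genotype j in the rust, $s_X$ scales the fitness cost of infection to the flax, and $s_Y$ scales the fitness cost to the rust of failing to infect.

Mathematica Resources: http://www.webpages.uidaho.edu/~snuismer/Nuismer_Lab/the_theory_of_coevolution.htm

Now, if we assume that the probability of survival to mating for the various flax and rust genotypes depends on these fitnesses, we can calculate the frequency of each genotype after selection but prior to random mating. As before, we can calculate these frequencies by multiplying the current frequency by its relative fitness.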
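Equations (1) and (2) are just weighted sums over the interaction matrix, which a few lines of code make concrete. The following is a minimal Python sketch, not the accompanying Mathematica notebook; the function names, the ordering of genotypes, and the numerical values of `alpha`, `s_x`, and `s_y` are all assumptions chosen purely for illustration.

```python
# Genotype order assumed throughout this sketch: AB, Ab, aB, ab.
GENOTYPES = ["AB", "Ab", "aB", "ab"]


def host_fitness(alpha, Y, s_x):
    """Equations (1a-d): W_X,i = 1 - s_X * sum_j Y_j * alpha[i][j]."""
    return [1.0 - s_x * sum(Y[j] * alpha[i][j] for j in range(4))
            for i in range(4)]


def parasite_fitness(alpha, X, s_y):
    """Equations (2a-d): W_Y,j = 1 - s_Y * (1 - sum_i X_i * alpha[i][j])."""
    return [1.0 - s_y * (1.0 - sum(X[i] * alpha[i][j] for i in range(4)))
            for j in range(4)]


# Illustrative infection probabilities alpha[i][j] (host genotype i in rows,
# parasite genotype j in columns). These values are placeholders.
alpha = [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 1, 0],
         [1, 1, 1, 1]]

X = [0.25, 0.25, 0.25, 0.25]  # host genotype frequencies (assumed)
Y = [0.25, 0.25, 0.25, 0.25]  # parasite genotype frequencies (assumed)

W_X = host_fitness(alpha, Y, s_x=0.1)
W_Y = parasite_fitness(alpha, X, s_y=0.1)
```

Writing the sums as loops over `alpha[i][j]` keeps the index convention of the text visible: the first index is always the host genotype and the second the parasite genotype.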
For the flax, this yields the following expressions:

\begin{align}
X'_{AB} &= \frac{X_{AB}\,W_{X,AB}}{\bar{W}_X} \tag{3a} \\
X'_{Ab} &= \frac{X_{Ab}\,W_{X,Ab}}{\bar{W}_X} \tag{3b} \\
X'_{aB} &= \frac{X_{aB}\,W_{X,aB}}{\bar{W}_X} \tag{3c} \\
X'_{ab} &= \frac{X_{ab}\,W_{X,ab}}{\bar{W}_X} \tag{3d}
\end{align}

where, as usual, the symbol $\bar{W}_X$ is the population mean fitness of species X and is given by:

\begin{align}
\bar{W}_X = X_{AB}\,W_{X,AB} + X_{Ab}\,W_{X,Ab} + X_{aB}\,W_{X,aB} + X_{ab}\,W_{X,ab} \tag{3e}
\end{align}

The same procedure can now be applied to the rust population to calculate the frequency of two-locus genotypes there after selection but prior to mating:

\begin{align}
Y'_{AB} &= \frac{Y_{AB}\,W_{Y,AB}}{\bar{W}_Y} \tag{4a} \\
Y'_{Ab} &= \frac{Y_{Ab}\,W_{Y,Ab}}{\bar{W}_Y} \tag{4b} \\
Y'_{aB} &= \frac{Y_{aB}\,W_{Y,aB}}{\bar{W}_Y} \tag{4c} \\
Y'_{ab} &= \frac{Y_{ab}\,W_{Y,ab}}{\bar{W}_Y} \tag{4d}
\end{align}

where the symbol $\bar{W}_Y$ is the population mean fitness of species Y and is given by:

\begin{align}
\bar{W}_Y = Y_{AB}\,W_{Y,AB} + Y_{Ab}\,W_{Y,Ab} + Y_{aB}\,W_{Y,aB} + Y_{ab}\,W_{Y,ab} \tag{4e}
\end{align}

OK, so now we know what the frequencies of the various genotypes are just before mating ensues. How can we move forward to incorporate the changes to genotype frequencies that accrue during mating? If we are willing to assume that both flax and rust mate at random and have quite large population sizes, we can derive basic expressions for changes in genotype frequencies. The long and tedious way to go about this is to first tabulate the frequency of offspring with various genotypes that are produced by all possible combinations of parents (Table 1).

RECOMBINATION! INTRODUCE IT HERE

Table 1.
Genotype frequencies produced by random matings (shown for the flax, X; mating frequencies use the post-selection genotype frequencies, as in equation 5a)

Maternal|Paternal   Frequency of      Offspring genotype
genotypes           mating            AB         Ab         aB         ab
AB|AB               X'_AB X'_AB       1          0          0          0
AB|Ab               X'_AB X'_Ab       1/2        1/2        0          0
AB|aB               X'_AB X'_aB       1/2        0          1/2        0
AB|ab               X'_AB X'_ab       (1-r)/2    r/2        r/2        (1-r)/2
Ab|AB               X'_Ab X'_AB       1/2        1/2        0          0
Ab|Ab               X'_Ab X'_Ab       0          1          0          0
Ab|aB               X'_Ab X'_aB       r/2        (1-r)/2    (1-r)/2    r/2
Ab|ab               X'_Ab X'_ab       0          1/2        0          1/2
aB|AB               X'_aB X'_AB       1/2        0          1/2        0
aB|Ab               X'_aB X'_Ab       r/2        (1-r)/2    (1-r)/2    r/2
aB|aB               X'_aB X'_aB       0          0          1          0
aB|ab               X'_aB X'_ab       0          0          1/2        1/2
ab|AB               X'_ab X'_AB       (1-r)/2    r/2        r/2        (1-r)/2
ab|Ab               X'_ab X'_Ab       0          1/2        0          1/2
ab|aB               X'_ab X'_aB       0          0          1/2        1/2
ab|ab               X'_ab X'_ab       0          0          0          1

What Table 1 provides is the raw material for calculating the frequency of the various genotypes in the offspring generation. All we need to do now is sum the entries in each column, weighting each entry by the frequency with which the two relevant parental genotypes encounter one another at random and mate. Mathematically, this amounts to evaluating the following expression for each of the four possible offspring genotypes, i:

\begin{align}
X''_i = \sum_{j=1}^{4} \sum_{k=1}^{4} X'_j\, X'_k\, \Pi_{X,\,j+k \to i} \tag{5a}
\end{align}

and the following expression for the four possible offspring genotypes in the rust:

\begin{align}
Y''_i = \sum_{j=1}^{4} \sum_{k=1}^{4} Y'_j\, Y'_k\, \Pi_{Y,\,j+k \to i} \tag{5b}
\end{align}

where $\Pi_{X,\,j+k \to i}$ and $\Pi_{Y,\,j+k \to i}$ are the probabilities that two parents with genotypes j and k produce an offspring of genotype i within the flax and rust populations, respectively, and are given in the offspring genotype columns of Table 1 (with r replaced by $r_X$ or $r_Y$ as appropriate). Although equations (5) help us see, mechanistically, how the genotype frequencies within one generation are translated into those of the next through segregation and recombination, they are quite clunky and not terribly insightful. Fortunately, these equations can be greatly simplified and re-expressed in a way that is much easier to implement from a practical standpoint.
Specifically, plugging away at equations (5) algebraically for a while (or, more realistically, a very long while) allows them to be re-written as:

\begin{align}
X''_{AB} &= X'_{AB} - r_X D'_X \tag{6a} \\
X''_{Ab} &= X'_{Ab} + r_X D'_X \tag{6b} \\
X''_{aB} &= X'_{aB} + r_X D'_X \tag{6c} \\
X''_{ab} &= X'_{ab} - r_X D'_X \tag{6d}
\end{align}

in the flax and as:

\begin{align}
Y''_{AB} &= Y'_{AB} - r_Y D'_Y \tag{7a} \\
Y''_{Ab} &= Y'_{Ab} + r_Y D'_Y \tag{7b} \\
Y''_{aB} &= Y'_{aB} + r_Y D'_Y \tag{7c} \\
Y''_{ab} &= Y'_{ab} - r_Y D'_Y \tag{7d}
\end{align}

in the rust. In these equations, $D_X$ and $D_Y$ quantify linkage disequilibrium, a measure of the statistical association (i.e., the covariance) between alleles at the A and B loci. Specifically, $D'_X = X'_{AB} X'_{ab} - X'_{Ab} X'_{aB}$ and $D'_Y = Y'_{AB} Y'_{ab} - Y'_{Ab} Y'_{aB}$, such that linkage disequilibrium is positive if there is an excess of AB and ab genotypes within a population and negative if it is, instead, the Ab and aB genotypes that are in excess. Recombination removes whichever genotypes are in excess: substituting (6) into the definition of D shows that $D''_X = (1 - r_X) D'_X$, so each round of random mating erodes a fraction $r_X$ of the existing disequilibrium.

A key insight provided by equations (6-7) is that the change in genotype frequencies that occurs in response to random mating depends entirely on the rate of recombination. If no recombination occurs, genotype frequencies within the offspring population remain identical to those within the parental population. If, instead, recombination occurs, genotype frequencies in the offspring generation differ from those in the parental generation by an amount proportional to linkage disequilibrium. Clearly, then, recombination can influence coevolution only in cases where coevolutionary selection, or some other evolutionary force, acts to create linkage disequilibrium within populations of interacting species.

We are now at a point where we have successfully described how genotype frequencies change over the course of a single generation. To maintain some generality, let's wait to substitute in the specific values for fitness corresponding to our GFG model, and simply express how genotype frequencies change in terms of arbitrary fitness values, W.
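The claim that the sixteen-row mating table collapses to a one-line update can be checked numerically. The sketch below, in Python with hypothetical function names, rebuilds the offspring probabilities of Table 1 from first principles (with probability 1 - r the offspring copies one parent's whole genotype; with probability r it takes the A locus from one parent and the B locus from the other), sums over all ordered parental pairs as in equation (5), and compares the result with the compact update in which AB and ab each lose r·D while Ab and aB each gain r·D. The starting frequencies and r are arbitrary illustrative values.

```python
from itertools import product

GENOTYPES = ["AB", "Ab", "aB", "ab"]


def offspring_probs(mother, father, r):
    """Offspring distribution for one haploid mating (one row of Table 1)."""
    probs = dict.fromkeys(GENOTYPES, 0.0)
    # No crossover (probability 1 - r): copy one whole parental genotype.
    probs[mother] += (1 - r) / 2
    probs[father] += (1 - r) / 2
    # Crossover (probability r): A locus from one parent, B locus from the other.
    probs[mother[0] + father[1]] += r / 2
    probs[father[0] + mother[1]] += r / 2
    return probs


def mate_by_enumeration(freqs, r):
    """Equation (5): weight every ordered parental pair by its mating frequency."""
    out = dict.fromkeys(GENOTYPES, 0.0)
    for m, f in product(GENOTYPES, repeat=2):
        for child, prob in offspring_probs(m, f, r).items():
            out[child] += freqs[m] * freqs[f] * prob
    return out


def mate_by_shortcut(freqs, r):
    """Compact form: recombination moves frequencies by r * D toward equilibrium."""
    d = freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]
    return {"AB": freqs["AB"] - r * d, "Ab": freqs["Ab"] + r * d,
            "aB": freqs["aB"] + r * d, "ab": freqs["ab"] - r * d}


# Arbitrary post-selection frequencies with an excess of AB and ab (D = 0.1).
freqs = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
slow = mate_by_enumeration(freqs, r=0.2)
fast = mate_by_shortcut(freqs, r=0.2)
```

The two dictionaries agree to rounding error, which is a useful guard against sign slips when the compact form is typed into a simulation.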
Substituting (3) into (6) and changing from recursion equations to difference equations yields the following expressions for the change in host genotype frequencies that occurs over the course of a single generation:

\begin{align}
\Delta X_{AB} &= \frac{r_X \left( W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB} \right) + X_{AB} \left( W_{X,AB} - \bar{W}_X \right) \bar{W}_X}{\bar{W}_X^{2}} \tag{8a} \\
\Delta X_{Ab} &= \frac{r_X \left( W_{X,ab} W_{X,AB} X_{ab} X_{AB} - W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} \right) + X_{Ab} \left( W_{X,Ab} - \bar{W}_X \right) \bar{W}_X}{\bar{W}_X^{2}} \tag{8b} \\
\Delta X_{aB} &= \frac{r_X \left( W_{X,ab} W_{X,AB} X_{ab} X_{AB} - W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} \right) + X_{aB} \left( W_{X,aB} - \bar{W}_X \right) \bar{W}_X}{\bar{W}_X^{2}} \tag{8c} \\
\Delta X_{ab} &= \frac{r_X \left( W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB} \right) + X_{ab} \left( W_{X,ab} - \bar{W}_X \right) \bar{W}_X}{\bar{W}_X^{2}} \tag{8d}
\end{align}

Equations for the pathogen species, Y, are essentially identical and so are not shown.

We are now at a point where we could, if we wished, simply simulate the process of coevolution by plugging in the values for fitness we derived previously for the GFG system (equations X) and iterating equations (X). Although this approach would surely provide us with some insights into the process of coevolution, a much more insightful and elegant approach is to first make a change of variables (Appendix 3) that allows us to focus on allele frequencies and linkage disequilibrium rather than genotype frequencies. In addition to facilitating biological interpretation and intuition, this change of variables simplifies our model by reducing the number of variables we follow from four in equations (X) to three, which is the actual number of degrees of freedom in the system (the four genotype frequencies must sum to one).

In order to make the change of variables from genotype frequencies to allele frequencies and linkage disequilibrium, we first need to clearly define the new variables. Specifically, we define allele frequencies:

\begin{align}
p_{X,A} &= X_{AB} + X_{Ab} \tag{9a} \\
p_{X,B} &= X_{AB} + X_{aB} \tag{9b} \\
p_{Y,A} &= Y_{AB} + Y_{Ab} \tag{9c} \\
p_{Y,B} &= Y_{AB} + Y_{aB} \tag{9d}
\end{align}

and linkage disequilibrium:

\begin{align}
D_X &= X_{AB} X_{ab} - X_{Ab} X_{aB} \tag{10a} \\
D_Y &= Y_{AB} Y_{ab} - Y_{Ab} Y_{aB} \tag{10b}
\end{align}

for both of the interacting species.
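Definitions (9-10) map four genotype frequencies onto three numbers, and because the genotype frequencies sum to one the map can be inverted. A small Python sketch of both directions follows; the function names and example frequencies are illustrative assumptions, and the inverse map relies only on the identity that each genotype frequency equals the product of its allele frequencies plus or minus D.

```python
def to_allele_space(g):
    """Definitions (9-10): allele frequencies and linkage disequilibrium."""
    p_a = g["AB"] + g["Ab"]
    p_b = g["AB"] + g["aB"]
    d = g["AB"] * g["ab"] - g["Ab"] * g["aB"]
    return p_a, p_b, d


def to_genotype_space(p_a, p_b, d):
    """Inverse map: genotype frequency = product of allele frequencies +/- D."""
    q_a, q_b = 1 - p_a, 1 - p_b
    return {"AB": p_a * p_b + d, "Ab": p_a * q_b - d,
            "aB": q_a * p_b - d, "ab": q_a * q_b + d}


# Arbitrary example frequencies (they must sum to one).
g = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
p_a, p_b, d = to_allele_space(g)
g_back = to_genotype_space(p_a, p_b, d)
```

A round trip through both functions returns the original frequencies, which confirms that three variables carry all of the information in the four genotype frequencies.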
The next step in our change of variables is to write down new recursions that capture the way in which our new variables change over the course of a single generation. The easiest way to do this is to substitute the predicted values for the genotype frequencies in the next generation (e.g., $X''_{AB}$, $X''_{Ab}$, etc.) into expressions (9-10), yielding:

\begin{align}
p''_{X,A} &= \frac{W_{X,Ab} X_{Ab} + W_{X,AB} X_{AB}}{\bar{W}_X} \tag{11a} \\
p''_{X,B} &= \frac{W_{X,aB} X_{aB} + W_{X,AB} X_{AB}}{\bar{W}_X} \tag{11b} \\
D''_X &= \frac{\left( W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB} \right) \left( r_X - 1 \right)}{\bar{W}_X^{2}} \tag{11c}
\end{align}

\begin{align}
p''_{Y,A} &= \frac{W_{Y,Ab} Y_{Ab} + W_{Y,AB} Y_{AB}}{\bar{W}_Y} \tag{12a} \\
p''_{Y,B} &= \frac{W_{Y,aB} Y_{aB} + W_{Y,AB} Y_{AB}}{\bar{W}_Y} \tag{12b} \\
D''_Y &= \frac{\left( W_{Y,aB} W_{Y,Ab} Y_{aB} Y_{Ab} - W_{Y,ab} W_{Y,AB} Y_{ab} Y_{AB} \right) \left( r_Y - 1 \right)}{\bar{W}_Y^{2}} \tag{12c}
\end{align}

Obviously, we still have a bit of a problem! Our equations now contain a mix of old and new variables, which can never be a good thing. The way to move forward is to recognize that the genotype frequencies appearing on the right-hand sides of the equations can be re-written using definitions (9-10) in the following way:

\begin{align}
X_{AB} &= p_{X,A}\, p_{X,B} + D_X \tag{13a} \\
X_{Ab} &= p_{X,A}\, q_{X,B} - D_X \tag{13b} \\
X_{aB} &= q_{X,A}\, p_{X,B} - D_X \tag{13c} \\
X_{ab} &= q_{X,A}\, q_{X,B} + D_X \tag{13d}
\end{align}

\begin{align}
Y_{AB} &= p_{Y,A}\, p_{Y,B} + D_Y \tag{14a} \\
Y_{Ab} &= p_{Y,A}\, q_{Y,B} - D_Y \tag{14b} \\
Y_{aB} &= q_{Y,A}\, p_{Y,B} - D_Y \tag{14c} \\
Y_{ab} &= q_{Y,A}\, q_{Y,B} + D_Y \tag{14d}
\end{align}

where $q = 1 - p$ is the frequency of the alternative allele at each locus (e.g., $q_{X,A} = 1 - p_{X,A}$). Substituting (13 and 14) into (11 and 12) and doing a bit of algebra allows us to finally complete our change of variables and arrive at a set of equations expressed entirely in terms of the new variables.
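\begin{align}
p''_{X,A} &= \frac{p_{X,A} \left( q_{X,B} W_{X,Ab} + p_{X,B} W_{X,AB} \right) + \left( W_{X,AB} - W_{X,Ab} \right) D_X}{\bar{W}_X} \tag{15a} \\
p''_{X,B} &= \frac{p_{X,B} \left( q_{X,A} W_{X,aB} + p_{X,A} W_{X,AB} \right) + \left( W_{X,AB} - W_{X,aB} \right) D_X}{\bar{W}_X} \tag{15b} \\
D''_X &= \frac{\left( W_{X,aB} W_{X,Ab} \left( q_{X,A} p_{X,B} - D_X \right)\left( p_{X,A} q_{X,B} - D_X \right) - W_{X,ab} W_{X,AB} \left( q_{X,A} q_{X,B} + D_X \right)\left( p_{X,A} p_{X,B} + D_X \right) \right) \left( r_X - 1 \right)}{\bar{W}_X^{2}} \tag{15c}
\end{align}

and,

\begin{align}
p''_{Y,A} &= \frac{p_{Y,A} \left( q_{Y,B} W_{Y,Ab} + p_{Y,B} W_{Y,AB} \right) + \left( W_{Y,AB} - W_{Y,Ab} \right) D_Y}{\bar{W}_Y} \tag{16a} \\
p''_{Y,B} &= \frac{p_{Y,B} \left( q_{Y,A} W_{Y,aB} + p_{Y,A} W_{Y,AB} \right) + \left( W_{Y,AB} - W_{Y,aB} \right) D_Y}{\bar{W}_Y} \tag{16b} \\
D''_Y &= \frac{\left( W_{Y,aB} W_{Y,Ab} \left( q_{Y,A} p_{Y,B} - D_Y \right)\left( p_{Y,A} q_{Y,B} - D_Y \right) - W_{Y,ab} W_{Y,AB} \left( q_{Y,A} q_{Y,B} + D_Y \right)\left( p_{Y,A} p_{Y,B} + D_Y \right) \right) \left( r_Y - 1 \right)}{\bar{W}_Y^{2}} \tag{16c}
\end{align}

We now have a set of equations describing how allele frequencies and linkage disequilibrium evolve in response to natural selection and random mating over the course of a single generation. Our last move is to re-write these recursions as difference equations by subtracting their values at the start of the generation from (15-16):

\begin{align}
\Delta p_{X,A} &= \frac{p_{X,A} \left( p_{X,B} W_{X,AB} + q_{X,B} W_{X,Ab} - \bar{W}_X \right) + \left( W_{X,AB} - W_{X,Ab} \right) D_X}{\bar{W}_X} \tag{17a} \\
\Delta p_{X,B} &= \frac{p_{X,B} \left( q_{X,A} W_{X,aB} + p_{X,A} W_{X,AB} - \bar{W}_X \right) + \left( W_{X,AB} - W_{X,aB} \right) D_X}{\bar{W}_X} \tag{17b} \\
\Delta D_X &= \frac{\left( W_{X,aB} W_{X,Ab} \left( q_{X,A} p_{X,B} - D_X \right)\left( p_{X,A} q_{X,B} - D_X \right) - W_{X,ab} W_{X,AB} \left( q_{X,A} q_{X,B} + D_X \right)\left( p_{X,A} p_{X,B} + D_X \right) \right) \left( r_X - 1 \right) - D_X \bar{W}_X^{2}}{\bar{W}_X^{2}} \tag{17c}
\end{align}

and,

\begin{align}
\Delta p_{Y,A} &= \frac{p_{Y,A} \left( q_{Y,B} W_{Y,Ab} + p_{Y,B} W_{Y,AB} - \bar{W}_Y \right) + \left( W_{Y,AB} - W_{Y,Ab} \right) D_Y}{\bar{W}_Y} \tag{18a} \\
\Delta p_{Y,B} &= \frac{p_{Y,B} \left( q_{Y,A} W_{Y,aB} + p_{Y,A} W_{Y,AB} - \bar{W}_Y \right) + \left( W_{Y,AB} - W_{Y,aB} \right) D_Y}{\bar{W}_Y} \tag{18b} \\
\Delta D_Y &= \frac{\left( W_{Y,aB} W_{Y,Ab} \left( q_{Y,A} p_{Y,B} - D_Y \right)\left( p_{Y,A} q_{Y,B} - D_Y \right) - W_{Y,ab} W_{Y,AB} \left( q_{Y,A} q_{Y,B} + D_Y \right)\left( p_{Y,A} p_{Y,B} + D_Y \right) \right) \left( r_Y - 1 \right) - D_Y \bar{W}_Y^{2}}{\bar{W}_Y^{2}} \tag{18c}
\end{align}

Phew! We have done it. With the bulk of the tedious algebraic book-keeping behind us, we can finally move on.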
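Because so much algebra separates the genotype recursions from the difference equations in the new variables, it is worth confirming numerically that the two descriptions agree. The Python sketch below does this for one species with arbitrary fitness values: it advances the genotype frequencies by one generation (selection, then recombination that removes r·D from AB and ab and adds it to Ab and aB) and compares the resulting changes in the A allele frequency and in D with the closed-form expressions of the (17a) and (17c) type. All names and numerical values are assumptions made for illustration.

```python
def mean_fitness(p_a, p_b, d, w):
    """Mean fitness written in terms of allele frequencies and D."""
    q_a, q_b = 1 - p_a, 1 - p_b
    return (w["AB"] * (p_a * p_b + d) + w["Ab"] * (p_a * q_b - d)
            + w["aB"] * (q_a * p_b - d) + w["ab"] * (q_a * q_b + d))


def delta_p_a(p_a, p_b, d, w):
    """Change in the A allele frequency, in the form of equation (17a)."""
    q_b = 1 - p_b
    w_bar = mean_fitness(p_a, p_b, d, w)
    return (p_a * (p_b * w["AB"] + q_b * w["Ab"] - w_bar)
            + (w["AB"] - w["Ab"]) * d) / w_bar


def delta_d(p_a, p_b, d, w, r):
    """Change in linkage disequilibrium, in the form of equation (17c)."""
    q_a, q_b = 1 - p_a, 1 - p_b
    w_bar = mean_fitness(p_a, p_b, d, w)
    bracket = (w["aB"] * w["Ab"] * (q_a * p_b - d) * (p_a * q_b - d)
               - w["ab"] * w["AB"] * (q_a * q_b + d) * (p_a * p_b + d))
    return (bracket * (r - 1) - d * w_bar ** 2) / w_bar ** 2


def genotype_recursion(g, w, r):
    """One generation at the genotype level: selection, then recombination."""
    w_bar = sum(g[k] * w[k] for k in g)
    sel = {k: g[k] * w[k] / w_bar for k in g}
    d_sel = sel["AB"] * sel["ab"] - sel["Ab"] * sel["aB"]
    sign = {"AB": -1, "Ab": 1, "aB": 1, "ab": -1}
    return {k: sel[k] + sign[k] * r * d_sel for k in g}


# Arbitrary illustrative frequencies, fitnesses, and recombination rate.
g = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
w = {"AB": 1.00, "Ab": 0.97, "aB": 0.95, "ab": 0.90}
r = 0.2

p_a = g["AB"] + g["Ab"]
p_b = g["AB"] + g["aB"]
d = g["AB"] * g["ab"] - g["Ab"] * g["aB"]

nxt = genotype_recursion(g, w, r)
change_p_direct = (nxt["AB"] + nxt["Ab"]) - p_a
change_d_direct = (nxt["AB"] * nxt["ab"] - nxt["Ab"] * nxt["aB"]) - d
```

Agreement between `change_p_direct` and `delta_p_a(...)`, and between `change_d_direct` and `delta_d(...)`, is exact up to floating-point error, since no approximation has been made yet.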
Analyzing the model

We can now transform the general two-locus model described by difference equations (17-18) into a specific model of coevolution between flax and flax rust by replacing the general values of fitness W with their specific forms given by equations (X) and the values of the interaction matrix appropriate for our gene-for-gene model:

\begin{align}
\alpha = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{19}
\end{align}

Here, rows correspond to flax genotypes and columns to rust genotypes, each in the order {AB, Ab, aB, ab}, consistent with the definition of $\alpha_{i,j}$. The interaction matrix depicts the outcome of the classical gene-for-gene model, where host resistance genes (A and B) are able to recognize parasite avirulence genes (a and b), but not parasite virulence genes (A and B). Even after working long and hard to simplify the resulting equations, however, I couldn't get them to fit on a single line of this page. As a general rule of thumb, if your equation doesn't fit on a single line, you aren't going to learn much from it. So, what can we do? One option is to charge straight ahead and simply rely on our computer to simulate coevolution by iterating our recursion equations for a large number of parameter combinations. Although there is nothing wrong with this approach, we can actually gain quite a bit of biological insight by using a bit more mathematical finesse and developing approximations that assume selection is not too strong and that recombination occurs with some reasonable frequency.

To be a bit more specific, one way to proceed is to pursue a Quasi-Linkage Equilibrium (QLE) approximation (REFS). Although a great deal of difficult math has gone into rigorously investigating when and where the QLE approximation can be applied (REFS), our approach here will be more informal and, I hope, more practical for those who simply want to learn something about biology rather than mathematics.
As a general rule of thumb, anytime selection is not too strong (less than a 1% difference in fitness among genotypes) and recombination is of a larger magnitude than selection (if the fitness difference among genotypes is 1%, the recombination rate should be at least 0.1), linkage disequilibrium will change much more rapidly than allele frequencies and will, in fact, approach a quasi-equilibrium state where its value is small and a function of the current allele frequencies within the population. What this means to us is that if selection is weak (< 1%) and recombination is frequent (> 10%), linkage disequilibrium will be as small as selection (< 0.01). As a result, as long as we are willing to tolerate some small amount of inaccuracy in our predictions, we can ignore all terms in our difference equations that include things like $s^2$, $D^2$, and $sD$, because these terms will all be very small and quite negligible. To be a bit more formal, if we are willing to assume that recombination is frequent and that selection is weak and of some small order ε, linkage disequilibrium will also be weak and of order ε, allowing us to ignore all terms of order ε² and higher. Clearly, what this means is that our QLE approximation will be more and more accurate as the difference in fitness among genotypes decreases, because the terms we ignore become ever smaller in relation to the terms we keep.

Returning to the specific case of coevolution between flax and flax rust, what we are going to assume is that the fitness consequences of the interaction are relatively weak, such that $s_X$ and $s_Y$ are both of small order ε, and that recombination within both species is relatively frequent (i.e., > ε). Our next step in implementing our QLE approximation is to replace each of our difference equations with its first-order Taylor series expansion in ε.
Using Mathematica, this is an incredibly trivial thing to do, and yields the following approximate expressions for evolutionary change in the flax:

\begin{align}
\Delta p_{X,A} &\approx s_X\, p_{X,A}\, q_{X,A}\, q_{Y,A} \left( 1 - p_{X,B}\, q_{Y,B} \right) \tag{20a} \\
\Delta p_{X,B} &\approx s_X\, p_{X,B}\, q_{X,B}\, q_{Y,B} \left( 1 - p_{X,A}\, q_{Y,A} \right) \tag{20b} \\
\Delta D_X &\approx -s_X \left( 1 - r_X \right) p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B}\, q_{Y,A}\, q_{Y,B} - r_X D_X \tag{20c}
\end{align}

and flax rust:

\begin{align}
\Delta p_{Y,A} &\approx s_Y\, p_{Y,A}\, q_{Y,A}\, p_{X,A} \left( 1 - p_{X,B}\, q_{Y,B} \right) \tag{21a} \\
\Delta p_{Y,B} &\approx s_Y\, p_{Y,B}\, q_{Y,B}\, p_{X,B} \left( 1 - p_{X,A}\, q_{Y,A} \right) \tag{21b} \\
\Delta D_Y &\approx s_Y \left( 1 - r_Y \right) p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B}\, p_{X,A}\, p_{X,B} - r_Y D_Y \tag{21c}
\end{align}

The beauty of the QLE approximation, and the primary reason for using it (other than the fact that it makes truly lovely equations), is that it allows us to "see" things about the biology of a system that we might otherwise spend hours upon hours simulating and still never pick up on. For instance, even a passing inspection of our approximation reveals that coevolutionary change in allele frequencies is independent of linkage disequilibrium. As a result, we can solve for the quasi-equilibrium values of linkage disequilibrium by simply setting (20c and 21c) equal to zero and solving for $D_X$ and $D_Y$:

\begin{align}
\tilde{D}_X &\approx -\frac{s_X \left( 1 - r_X \right) p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B}\, q_{Y,A}\, q_{Y,B}}{r_X} \tag{22a} \\
\tilde{D}_Y &\approx \frac{s_Y \left( 1 - r_Y \right) p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B}\, p_{X,A}\, p_{X,B}}{r_Y} \tag{22b}
\end{align}

Remarkably, this shows that the sign of linkage disequilibrium should be different in flax and flax rust. Specifically, linkage disequilibrium between resistance genes within the flax should always be negative, whereas linkage disequilibrium between virulence genes in the rust should always be positive. The biological reason for this intriguing pattern is that flax individuals receive a fitness benefit by carrying a single resistance gene at either locus (A or B), whereas rust individuals must carry virulence alleles at both loci (A and B) in order to evade detection and elimination by the host. Consequently, within the flax population, the quantity $W_{X,AB} W_{X,ab} - W_{X,Ab} W_{X,aB}$ is negative, indicating it experiences negative epistasis.
In contrast, within the rust population the quantity $W_{Y,AB} W_{Y,ab} - W_{Y,Ab} W_{Y,aB}$ is positive, indicating the rust population experiences positive epistasis.

Our QLE approximation has already unearthed a valuable insight about the form of epistasis and sign of linkage disequilibrium that we expect to emerge from GFG coevolution. Can we push our QLE approximation further to learn about the dynamics and outcomes of coevolution? The place to start is with an analysis of allele frequency change. Inspecting equations (20a-b, 21a-b) reveals that as long as genetic variation exists at all loci, host resistance genes and parasite virulence genes will increase in frequency (Figure 1). Only when the parasite fixes both virulence alleles, or the host has no resistance alleles at either locus, does coevolution cease. This picture of coevolutionary dynamics is remarkably similar to what we saw when we studied gene-for-gene coevolution in a haploid, single-locus model. Only when we study the relative rates of coevolution in the two species do we see the novel twist that multiple loci and epistasis bring to the table. Specifically, if both flax and rust initially have very low frequencies of resistance and virulence alleles, the flax population will evolve resistance much more rapidly than the rust can evolve to overcome it (Figure 1). The reason for this striking difference in coevolutionary rates is, again, epistasis. Because the host realizes fitness benefits by having a resistance allele at a single locus (because that is sufficient to recognize and clear the rust), selection is quite effective at increasing the frequency of even a very rare resistance allele. In contrast, the parasite must carry virulence alleles at both loci in order to avoid recognition by hosts with even a single resistance allele.
Thus, if virulence alleles are initially very rare, rust individuals carrying virulence alleles at both loci are incredibly rare, and selection has a very difficult time increasing the frequency of the virulence alleles (Figure 1). Together with our earlier results about the sign of linkage disequilibrium, this shows that it is epistasis that causes our two-locus model to differ in interesting ways from the single-locus model we studied earlier in Chapter 2.

At this point, you should be wondering just how much we should trust our QLE approximation. As with any approximation, whether it be the QLE, quantitative genetics, or adaptive dynamics, it is important to evaluate robustness using simulations. In addition to allowing the generality of our conclusions to be explored, performing simulations also helps serve as a safeguard against straightforward algebraic errors. Although there are many ways to perform simulations, each of which makes its own set of assumptions, for our purposes it is sufficient to simply iterate the general recursion equations (X). Our goal is to push the limits of our QLE approximation to see when it "breaks". As we already saw in Figure 1, when $s_X = s_Y = 0.01$ and $r_X = r_Y = 0.1$ the predictions of our QLE approximation and exact simulations are more or less indistinguishable, even over 5,000 generations of coevolution. But what about scenarios where the interaction between flax and rust has large fitness consequences for the interacting species, perhaps something more along the lines of $s_X = s_Y = 0.1$? In such cases, Figure 2 shows that errors begin to creep into our QLE approximation which, over sufficient amounts of time, lead to significant quantitative errors in our predictions for allele frequencies and linkage disequilibria. That said, the amount of error accruing over any single generation remains vanishingly small, and qualitative predictions, such as the sign of linkage disequilibrium, remain entirely robust.
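The comparison between exact recursions and the QLE approximation can be reproduced in outline with a short script. What follows is a Python sketch under stated assumptions rather than the code behind Figures 1 and 2: it assumes the gene-for-gene infection matrix with flax genotypes in rows and rust genotypes in columns (order AB, Ab, aB, ab), a life cycle of selection followed by random mating in which recombination removes r·D from the AB and ab genotypes, equal parameters in both species, and a starting population at linkage equilibrium. Function names and the starting frequency are hypothetical.

```python
# Gene-for-gene infection matrix: rows = flax, columns = rust (AB, Ab, aB, ab).
ALPHA = [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 1, 0],
         [1, 1, 1, 1]]


def linkage_equilibrium(p_a, p_b):
    """Genotype frequencies [AB, Ab, aB, ab] with D = 0."""
    return [p_a * p_b, p_a * (1 - p_b), (1 - p_a) * p_b, (1 - p_a) * (1 - p_b)]


def summarize(g):
    """Return (p_A, p_B, D) for genotype frequencies [AB, Ab, aB, ab]."""
    return g[0] + g[1], g[0] + g[2], g[0] * g[3] - g[1] * g[2]


def select_and_mate(g, w, r):
    """Selection, then recombination that erodes a fraction r of D."""
    w_bar = sum(gi * wi for gi, wi in zip(g, w))
    sel = [gi * wi / w_bar for gi, wi in zip(g, w)]
    d = sel[0] * sel[3] - sel[1] * sel[2]
    return [sel[0] - r * d, sel[1] + r * d, sel[2] + r * d, sel[3] - r * d]


def step_exact(x, y, s_x, s_y, r_x, r_y):
    """One generation of the exact genotype recursions for both species."""
    w_x = [1 - s_x * sum(y[j] * ALPHA[i][j] for j in range(4)) for i in range(4)]
    w_y = [1 - s_y * (1 - sum(x[i] * ALPHA[i][j] for i in range(4)))
           for j in range(4)]
    return select_and_mate(x, w_x, r_x), select_and_mate(y, w_y, r_y)


def step_qle(state, s_x, s_y, r_x, r_y):
    """One generation of the QLE approximation, equations (20) and (21)."""
    pxa, pxb, dx, pya, pyb, dy = state
    qxa, qxb, qya, qyb = 1 - pxa, 1 - pxb, 1 - pya, 1 - pyb
    deltas = (
        s_x * pxa * qxa * qya * (1 - pxb * qyb),
        s_x * pxb * qxb * qyb * (1 - pxa * qya),
        -s_x * (1 - r_x) * pxa * qxa * pxb * qxb * qya * qyb - r_x * dx,
        s_y * pya * qya * pxa * (1 - pxb * qyb),
        s_y * pyb * qyb * pxb * (1 - pxa * qya),
        s_y * (1 - r_y) * pya * qya * pyb * qyb * pxa * pxb - r_y * dy,
    )
    return tuple(v + dv for v, dv in zip(state, deltas))


def compare(p0, s, r, generations):
    """Iterate both descriptions from the same start; return final states."""
    x = linkage_equilibrium(p0, p0)
    y = linkage_equilibrium(p0, p0)
    qle = (p0, p0, 0.0, p0, p0, 0.0)
    for _ in range(generations):
        x, y = step_exact(x, y, s, s, r, r)
        qle = step_qle(qle, s, s, r, r)
    return summarize(x) + summarize(y), qle


# Each result is (p_XA, p_XB, D_X, p_YA, p_YB, D_Y).
exact, qle = compare(p0=0.5, s=0.01, r=0.1, generations=200)
```

Re-running `compare` with a larger `s` or a smaller `r` is one way to probe where the two descriptions start to diverge quantitatively, while checking whether the signs of `D_X` and `D_Y` continue to agree.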
What can we conclude from this exercise about the robustness of the QLE approximation? The answer is that it largely depends on the question you wish to address. If the goal is qualitative prediction, the QLE approximation often works remarkably well even when its basic assumptions are grossly violated. If, on the other hand, the goal is quantitative prediction, selection really does need to be quite weak and recombination quite frequent for accuracy to be maintained over thousands of generations.

Answers to key questions

What patterns of epistasis are likely to be generated by species interactions?

Our exploration of coevolution between flax and its pathogen M. lini revealed interesting patterns of epistasis. Specifically, our QLE approximation revealed that the host plant, L. marginale, experiences negative epistasis because, under the assumptions of our gene-for-gene model, a single resistance gene can be sufficient to recognize and clear the pathogen. In contrast, we found that epistasis within the rust, M. lini, was positive, a pattern that emerges because host recognition can be thwarted only by evading recognition at both loci.

How do these patterns of epistasis influence the dynamics and outcome of coevolution?

By studying coevolution using our QLE approximation and numerical simulation, we found that epistasis plays an important role in the dynamics of coevolution while having no real impact on its ultimate outcome. Specifically, because epistasis in the flax is negative, significant fitness gains accrue to individuals carrying a resistance allele at only a single locus. Thus, selection is quite effective at increasing the frequency of resistance genes in the host. In contrast, because epistasis within the rust is positive, significant fitness gains accrue only to individuals carrying virulence alleles at both loci.
Consequently, selection has a very hard time gaining traction on rare virulence alleles, thus slowing the rate at which they increase in frequency within the rust population.

What patterns of linkage disequilibrium do we expect to emerge in coevolving systems?

In light of our observation that epistasis is negative within the flax population but positive within the rust population, it is perhaps not surprising that we also observe differences in the sign of linkage disequilibrium in the two species. Specifically, linkage disequilibrium within the flax population tends to be negative whereas within the rust population it tends to be positive. What this means is that if we sample a flax individual and find that it is carrying a resistance allele at one locus, it is less likely than expected based on allele frequencies that it is also carrying a resistance allele at the other locus. In contrast, if we sample a rust individual and find that it is carrying a virulence allele at one locus, it is more likely than we would expect based on allele frequencies that it is also carrying a virulence allele at the other locus.

New Questions Arising:

Our simple model of gene-for-gene coevolution between flax and flax rust suggests that …. Although thought provoking, the tenuous connection between this prediction and available empirical data immediately raises several important questions:

• Are the patterns of epistasis and linkage disequilibrium we uncovered for our simple gene-for-gene model also likely to occur in other species interactions?
• If species interactions are mediated by quantitative traits, rather than molecular recognition, should we still expect epistasis and linkage disequilibrium to matter?

In the next two sections, we will develop extensions of our two-locus model that will help us to answer these questions and gain further insight into the process of coevolution in multi-locus systems.
Extensions

Extension 1: Evaluating the consistency of epistasis and disequilibrium in coevolving interactions

Just how general are the patterns of epistasis and linkage disequilibrium we observed in our investigation of gene-for-gene coevolution between wild flax and flax rust? Perhaps the easiest way to answer this question is to jump right into developing and analyzing a model of coevolution for a very different interaction. Because it is still fresh in our memory from the previous chapter, let's use the interaction between the snail XX and its schistosome parasite, XX, as our test case. As a quick refresher, recent empirical studies have identified two molecules that have been hypothesized to interact and play an important role in the outcome of an encounter between an individual snail and an individual schistosome. Specifically, snails deploy FREP molecules that bind to specific mucin molecules produced by the schistosome; when the FREP "matches" the mucin, the infection is cleared. In the previous chapter, we developed interaction matrices describing this molecular interaction under various assumptions, all of which involved only a single diploid locus. As many of you probably guessed, the assumption that the structure of these molecules depends on only a single genetic locus is most likely false (REFS).

Given this information, let's revisit this interaction and explore how it is likely to coevolve given a fresh set of genetic assumptions. To keep things simple, we are going to forget all about diploidy and simply focus on a scenario where FREP and mucin are produced by only a pair of haploid, diallelic loci in each species. The particular assumption we are going to make to adapt this interaction to the two-locus modeling framework we developed earlier in this chapter is that each genotype makes a FREP or mucin molecule with a unique conformation.
JUSTIFY USING MOLECULAR SPECIFICS… I THINK THIS MEANS YOU MUST HAVE AN ACTIVE SITE THAT IS ENCODED BY TWO SEPARATE LOCI?

If the genotypes of snail and schistosome match, the host FREP molecule binds to the parasite mucin molecule and the infection is cleared. If, in contrast, the genotypes of the two species do not match, the snail FREP fails to bind to the schistosome mucin and the infection succeeds. Together, these assumptions lead to the following interaction matrix describing the probability that a schistosome genotype evades recognition and successfully infects a host genotype:

\begin{align}
\alpha = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \tag{23}
\end{align}

where snail genotypes are in columns {AB, Ab, aB, ab} and schistosome genotypes are in rows {AB, Ab, aB, ab}. All we need to do now is plug the appropriate values of α from (23) into the general expressions for coevolutionary change in two-locus systems we developed previously (17-18). It probably comes as little surprise, however, that even after a lengthy session of algebra, there is really no way to write these exact recursion equations down in a way that helps us to understand what the hell is going on. As before, this level of complexity suggests that it is time to deploy an approximation; in this case, the best possible approximation for the job is Quasi-Linkage Equilibrium.

Developing a QLE approximation for coevolution between the snail B. glabrata and the schistosome parasite S. mansoni makes the exact same assumptions and follows the exact same steps as when we used it to study coevolution between wild flax and flax rust. In short, we assume selection is weak (order ε) and recombination sufficiently common for linkage disequilibrium to also be small (order ε). We then describe evolutionary change in allele frequencies and linkage disequilibrium using their first-order Taylor series expansions in ε (see accompanying Mathematica notebook).
The result is the following system of difference equations describing evolutionary change in the snail:

\begin{align}
\Delta p_{X,A} &\approx -s_X\, p_{X,A}\, q_{X,A} \left( 1 - 2 p_{Y,A} \right) \left( q_{Y,B} - p_{X,B} \left( 1 - 2 p_{Y,B} \right) \right) \tag{24a} \\
\Delta p_{X,B} &\approx -s_X\, p_{X,B}\, q_{X,B} \left( 1 - 2 p_{Y,B} \right) \left( q_{Y,A} - p_{X,A} \left( 1 - 2 p_{Y,A} \right) \right) \tag{24b} \\
\Delta D_X &\approx s_X\, p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B} \left( 1 - 2 p_{Y,A} \right) \left( 1 - 2 p_{Y,B} \right) \left( 1 - r_X \right) - r_X D_X \tag{24c}
\end{align}

and evolutionary change in the schistosome:

\begin{align}
\Delta p_{Y,A} &\approx s_Y\, p_{Y,A}\, q_{Y,A} \left( 1 - 2 p_{X,A} \right) \left( q_{Y,B} - p_{X,B} \left( 1 - 2 p_{Y,B} \right) \right) \tag{25a} \\
\Delta p_{Y,B} &\approx s_Y\, p_{Y,B}\, q_{Y,B} \left( 1 - 2 p_{X,B} \right) \left( q_{Y,A} - p_{X,A} \left( 1 - 2 p_{Y,A} \right) \right) \tag{25b} \\
\Delta D_Y &\approx -s_Y\, p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B} \left( 1 - 2 p_{X,A} \right) \left( 1 - 2 p_{X,B} \right) \left( 1 - r_Y \right) - r_Y D_Y \tag{25c}
\end{align}

where all terms of order ε² and greater have been ignored. As before, our QLE approximation decouples coevolutionary changes in allele frequencies from changes in linkage disequilibrium, allowing us to solve for the quasi-equilibrium values of linkage disequilibrium:

\begin{align}
\tilde{D}_X &\approx \frac{s_X\, p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B} \left( 1 - 2 p_{Y,A} \right) \left( 1 - 2 p_{Y,B} \right) \left( 1 - r_X \right)}{r_X} \tag{26a} \\
\tilde{D}_Y &\approx -\frac{s_Y\, p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B} \left( 1 - 2 p_{X,A} \right) \left( 1 - 2 p_{X,B} \right) \left( 1 - r_Y \right)}{r_Y} \tag{26b}
\end{align}

Just a quick look at these expressions reveals that they are quite different from those we found for flax and flax rust. Whereas the sign of linkage disequilibrium was constant and predictable for flax and flax rust, these expressions contain terms that allow the sign of linkage disequilibrium to fluctuate as allele frequencies change over time within the snail and schistosome populations. The underlying cause of these changes in the sign and magnitude of linkage disequilibrium is what is often referred to as fluctuating epistasis (REFS). What this means is that the sign and magnitude of epistasis change over time within snail and schistosome populations as differing combinations of alleles become more infective or more resistant than predicted by the individual alleles themselves.
To be more specific, the terms $(1 - 2p_{Y,A})(1 - 2p_{Y,B})$ and $(1 - 2p_{X,A})(1 - 2p_{X,B})$ appearing in equations (26) measure which genotypes (and thus molecular structures of mucin and FREP molecules) are most common within the schistosome and snail populations. If the most common alleles within the schistosome population are A and B (or a and b), then the first of these two terms is positive, indicating epistatic selection in favor of AB (or ab) genotypes within the snail population, and a corresponding excess of these genotypes within the snail population. Similarly, if the second of these terms is positive, because the A and B alleles (or a and b) are the most frequent within the snail population, the negative sign in (26b) indicates epistatic selection against AB and ab genotypes within the schistosome population (these are the combinations most likely to be matched and cleared), and a corresponding excess of Ab and aB genotypes. Although these considerations clearly show that the potential exists for the sign of epistatic selection and linkage disequilibrium to fluctuate within this system, equations (26) make it equally clear that this will only occur if allele frequencies themselves fluctuate over time.

We now know that drawing conclusions about the sign of linkage disequilibrium requires that we understand what is going on with allele frequencies within both snail and schistosome populations. Unfortunately, drawing definitive conclusions about the coevolution of allele frequencies within this system is much more challenging than it was for the gene-for-gene model we studied earlier in this chapter. As a consequence, simple inspection of the QLE approximation no longer suffices, and we must take the more formal mathematical approach of identifying equilibria and evaluating their local stability. If we follow the standard protocol and first identify equilibria by setting equations (24a,b and 25a,b) equal to zero and solving for the allele frequencies that satisfy the equality, we find that there are thirty possible equilibria!
Clearly, it isn't going to be possible to neatly summarize all thirty of these equilibria in a tidy table. Instead, let's focus on just the five equilibria that exist, are not purely unstable, and help us to learn something important about coevolution between the snail B. glabrata and its schistosome parasite S. mansoni (Table 2).

Table 2. A subset of equilibria and their local stability

| $p_{X,A}$ | $p_{X,B}$ | $p_{Y,A}$ | $p_{Y,B}$ | Eigenvalues | Stability |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 1, 1, 1, 1 | Neutrally stable |
| 0 | 1 | 1 | 0 | 1, 1, 1, 1 | Neutrally stable |
| 1 | 0 | 0 | 1 | 1, 1, 1, 1 | Neutrally stable |
| 1 | 1 | 0 | 0 | 1, 1, 1, 1 | Neutrally stable |
| 1/2 | 1/2 | 1/2 | 1/2 | $1 - \tfrac{1}{4} i \sqrt{s_X}\sqrt{s_Y}$ (twice), $1 + \tfrac{1}{4} i \sqrt{s_X}\sqrt{s_Y}$ (twice) | Cycles with increasing amplitude |

Looking through this table of equilibria we see that many of the usual suspects are present, including equilibria corresponding to fixed mismatching of the two species as well as the fully polymorphic equilibrium where allele frequencies equal one half. A quick inspection of the local stability conditions reveals, however, that multi-locus genetics adds an interesting twist. Specifically, the equilibria corresponding to fixed mismatching, where the snail is completely unable to recognize and defend against the schistosome, are neutrally stable. What this means is that even if new mutations arise within a snail population that allow infections by the schistosome population to be recognized and cleared, they will not aggressively spread through the population. Instead, we expect any novel and advantageous mutants to remain hovering near a frequency of zero. How can this be the case? Why do these novel mutations not spread rapidly through the snail population? The answer to these questions can be found in the pattern of epistasis produced by the infection matrix (X).
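The eigenvalues reported for the fully polymorphic equilibrium and for fixed mismatching can be checked numerically. The sketch below assumes the QLE allele-frequency recursion of equations (24a,b) and (25a,b); the function names, the finite-difference step, and the selection coefficients are hypothetical choices made for illustration. It builds the Jacobian by central differences and verifies that $1 + \tfrac{1}{4} i \sqrt{s_X s_Y}$ is an eigenvalue at the polymorphic equilibrium, and that the Jacobian is the identity matrix at the equilibrium (0, 0, 1, 1).

```python
import math


def step(p, s_x, s_y):
    """One generation of the QLE recursion, equations (24a,b) and (25a,b)."""
    p_xa, p_xb, p_ya, p_yb = p
    # Shared factors q_{Y,A} - p_{X,A}(1 - 2 p_{Y,A}) and its locus-B analogue.
    match_a = (1 - p_ya) - p_xa * (1 - 2 * p_ya)
    match_b = (1 - p_yb) - p_xb * (1 - 2 * p_yb)
    d_xa = -s_x * p_xa * (1 - p_xa) * (1 - 2 * p_ya) * match_b
    d_xb = -s_x * p_xb * (1 - p_xb) * (1 - 2 * p_yb) * match_a
    d_ya = s_y * p_ya * (1 - p_ya) * (1 - 2 * p_xa) * match_b
    d_yb = s_y * p_yb * (1 - p_yb) * (1 - 2 * p_xb) * match_a
    return [p_xa + d_xa, p_xb + d_xb, p_ya + d_ya, p_yb + d_yb]


def jacobian(p, s_x, s_y, h=1e-6):
    """Central-difference Jacobian of the one-generation map at point p."""
    n = len(p)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        f_up, f_down = step(up, s_x, s_y), step(down, s_x, s_y)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return jac


s_x, s_y = 0.1, 0.2  # hypothetical selection coefficients

# Fully polymorphic equilibrium: check J v = lam v for the claimed eigenvalue.
jac = jacobian([0.5, 0.5, 0.5, 0.5], s_x, s_y)
mu = complex(0.0, math.sqrt(s_x * s_y) / 4)
lam = 1 + mu
v = [1, 0, mu / (s_x / 4), 0]  # candidate eigenvector
jv = [sum(jac[i][k] * v[k] for k in range(4)) for i in range(4)]
residual = max(abs(jv[i] - lam * v[i]) for i in range(4))

# Fixed mismatching: the Jacobian should be the identity (all eigenvalues 1).
corner = jacobian([0.0, 0.0, 1.0, 1.0], s_x, s_y)

print(residual, abs(lam))
```

Because the modulus of the complex eigenvalue is slightly greater than one, small perturbations spiral slowly outward, whereas at fixed mismatching the linearization predicts no change at all.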
The key feature of this infection matrix that leads to these somewhat counterintuitive results is the requirement that an individual snail must have alleles that match those of the schistosome at both loci for the infection to be recognized and cleared. Because any novel mutations in the snail population that match the corresponding resident allele in the schistosome population will inevitably be very, very rare, the chance of any individual snail carrying the "right" allele at both loci is vanishingly small. As a consequence, even though selection strongly favors increases in the frequency of alleles within the snail population that match those in the schistosome population, epistasis prevents them from spreading efficiently, greatly reducing the rate of adaptation within the snail.

Turning our attention to the fully polymorphic equilibrium reveals that, near this equilibrium, allele frequencies behave in much the same way as we might expect for a single locus model. Specifically, our local stability analysis reveals eigenvalues consisting of a real component equal to one and an imaginary component. As we know well by now, in a discrete time system, eigenvalues like these have a modulus greater than one and so suggest oscillations of increasing amplitude. Although we have observed similar cyclical dynamics in previous models, what makes cycles super cool in this case is their consequences for patterns of linkage disequilibrium. Earlier in this section we showed that the sign and magnitude of linkage disequilibrium in both snail and schistosome could, conceivably, fluctuate over time if allele frequencies also fluctuate.

We can now put this all together and predict what we should observe in the snail-schistosome system. If all alleles are relatively common, we expect to see oscillations in allele frequencies and oscillations in the sign and magnitude of linkage disequilibrium in snail and schistosome.
The reason for this is largely that the particular combinations of mucin alleles that are best able to evade recognition by the most common snail FREP molecules and those combinations of snail FREP molecules best able to recognize the most common schistosome mucin molecules are favored by (epistatic) natural selection. In other words, we have demonstrated the potential for fluctuating epistasis in this interaction, one of the most perennially popular explanations for why coevolution might favor the evolution of sexual recombination (REFS).

Unlike gene-for-gene coevolution between flax and flax rust, it appears that within the snail-schistosome system the presence of two epistatically interacting loci may qualitatively change the dynamics of the system. Specifically, although we still expect allele frequencies to oscillate if alternative forms of the allele are common, if both loci approach monomorphism our local stability analyses seem to suggest the system might get stuck in a state where the snail is unable to recognize and defend against the schistosome. This intriguing conclusion, along with our prediction of fluctuating disequilibrium, however, relies on very simple analyses of our QLE approximation. How robust are the predictions of our QLE approximation to cases where selection is strong or recombination infrequent? The easiest way to answer this important question is by simulating the exact recursion equations (X) and comparing the results to the predictions of our QLE approximation for specific combinations of parameters. Not surprisingly, these simulations reveal our QLE approximation is quite accurate as long as selection is weak and recombination frequent (Figures X and X). As selection becomes stronger relative to recombination, the quantitative accuracy of our QLE approximation begins to break down, although broad qualitative predictions (e.g., fluctuating LD) remain robust.
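A comparison of this kind can be sketched in a few lines. The code below is not the accompanying Mathematica notebook; it is a minimal illustration that assumes (i) the genotype fitnesses given at the start of the chapter, (ii) an infection matrix in which infection fails only when snail and schistosome genotypes match at both loci, and (iii) the standard haploid life cycle of selection followed by random mating and recombination. All function names and parameter values are hypothetical.

```python
# Genotype order throughout: AB, Ab, aB, ab.
# Assumed infection matrix: infection is cleared only when genotypes match
# at both loci, so alpha[i][j] = 0 when i == j and 1 otherwise.
ALPHA = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]


def linkage_equilibrium(p_a, p_b):
    """Genotype frequencies with no linkage disequilibrium."""
    return [p_a * p_b, p_a * (1 - p_b), (1 - p_a) * p_b, (1 - p_a) * (1 - p_b)]


def ld(freqs):
    """Linkage disequilibrium D = x_AB * x_ab - x_Ab * x_aB."""
    return freqs[0] * freqs[3] - freqs[1] * freqs[2]


def select(freqs, fitness):
    """Weight each genotype by its relative fitness."""
    mean_w = sum(f * w for f, w in zip(freqs, fitness))
    return [f * w / mean_w for f, w in zip(freqs, fitness)]


def recombine(freqs, r):
    """Random mating and recombination at rate r in the transient diploid."""
    d = ld(freqs)
    return [freqs[0] - r * d, freqs[1] + r * d, freqs[2] + r * d, freqs[3] - r * d]


def generation(x, y, s_x, s_y, r_x, r_y):
    """One generation of the exact recursion for snail (x) and schistosome (y)."""
    w_x = [1 - s_x * sum(y[j] * ALPHA[i][j] for j in range(4)) for i in range(4)]
    w_y = [1 - s_y * (1 - sum(x[i] * ALPHA[i][j] for i in range(4)))
           for j in range(4)]
    return recombine(select(x, w_x), r_x), recombine(select(y, w_y), r_y)


# Hypothetical starting point: all alleles A and B in the minority.
x = linkage_equilibrium(0.3, 0.4)
y = linkage_equilibrium(0.2, 0.3)
for _ in range(20):
    x, y = generation(x, y, s_x=0.01, s_y=0.01, r_x=0.5, r_y=0.5)

d_x, d_y = ld(x), ld(y)
print(d_x, d_y)
```

With these values the terms $(1 - 2p_{Y,A})(1 - 2p_{Y,B})$ and $(1 - 2p_{X,A})(1 - 2p_{X,B})$ are both positive, so equations (26) predict positive disequilibrium in the snail and negative disequilibrium in the schistosome. Lowering the recombination rates or raising the selection coefficients in the loop is the simplest way to probe where the approximation starts to fail.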
The only place we really wind up in trouble is when recombination becomes very weak; in such cases we can really blow it by using the QLE! For instance, Figure 6 shows just how bad our QLE approximation can become when recombination is quite weak. That the QLE breaks down in this case should not come as any real surprise, however, since we knew going in that we were egregiously violating a critical assumption. This just shows, once again, the importance of evaluating the robustness of any approximation we use to gain analytical insight, whether it be the QLE, classical quantitative genetics, or adaptive dynamics.

Extension 2: Evaluating the importance of epistasis and disequilibrium for quantitative traits

So far in this chapter we have focused on systems where we have some insight, albeit incomplete, into how host and parasite genotypes might interact mechanistically to produce either infection or resistance. In many other cases, however, we may know only how individual phenotypes interact to produce various outcomes of an interaction. For instance, in Chapter 3 we studied coevolution between the cuckoo, XXX, and its warbler host, XXX, assuming interactions depended on egg coloration in the two species. We further assumed that egg coloration in both species was the product of a very large number of genes, each with only a very small impact on phenotype. How would our predictions about the dynamics and outcome of coevolution in this interaction change if egg coloration were instead controlled only by a pair of loci in each species? Fortunately, it is possible to answer this question using the mathematical machinery we have already developed in this chapter.

Our starting point for integrating the interaction between cuckoo and warbler into the two-locus population genetic framework we have developed in this chapter is to specify how individual phenotypes impact the outcome of encounters.
To keep things consistent with our previous studies of coevolution between cuckoo and warbler in Chapter 3, we will assume the probability that the host bird fails to recognize a counterfeit egg within its nest, and rears it as its own, depends on the degree to which cuckoo and warbler egg coloration match. This verbal description of egg recognition can be captured by a phenotype matching function of the form:

\[ P(x, y) = \exp\left[-\omega (x - y)^2\right] \tag{27a} \]

which we saw in Chapter 3 can be approximated by its first order Taylor Series expansion:

\[ P(x, y) \approx 1 - \omega (x - y)^2 \tag{27b} \]

where the parameter $\omega$ quantifies the ability of warblers to discriminate among eggs based on differences in coloration. For the approximation (27b) to be accurate, the product $\omega (x - y)^2$ must be small; that is, $\omega$ must be small relative to the inverse of the squared distance in egg coloration between interacting individuals.

The next step we must take is to specify how genotypes are translated into phenotypes. The simplest way this can be done is to assume phenotypes are determined additively, and that the phenotypic effect of each locus is equal, as shown in Table 3. With this translation between genotype and phenotype in hand, we can generate an interaction matrix by simply plugging the phenotypic values for each genotype from Table 3 into the interaction function (27b).

Table 3. Genotypes and their associated phenotypes

| Genotype | Egg coloration, z |
|---|---|
| ab | 0 |
| aB | 1 |
| Ab | 1 |
| AB | 2 |

The result of this substitution is the following interaction matrix describing the probability that a cuckoo with a particular genotype produces eggs that successfully masquerade as the eggs of a particular warbler genotype and thus avoid ejection:

\[ \alpha = \begin{bmatrix} 1 & 1-\omega & 1-\omega & 1-4\omega \\ 1-\omega & 1 & 1 & 1-\omega \\ 1-\omega & 1 & 1 & 1-\omega \\ 1-4\omega & 1-\omega & 1-\omega & 1 \end{bmatrix} \tag{28} \]

where warbler genotypes are in columns {AB, Ab, aB, ab} and cuckoo genotypes are in rows {AB, Ab, aB, ab}.
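The substitution that produces matrix (28) is mechanical enough to write as a short function. This is a minimal sketch using the phenotypes of Table 3 and the approximation (27b); the names `EGG_COLOR` and `interaction_matrix` and the value of $\omega$ are hypothetical.

```python
import math

ORDER = ["AB", "Ab", "aB", "ab"]                  # row and column order of (28)
EGG_COLOR = {"AB": 2, "Ab": 1, "aB": 1, "ab": 0}  # phenotypes from Table 3


def interaction_matrix(omega):
    """Matrix (28): rows are cuckoo genotypes, columns are warbler genotypes."""
    return [[1 - omega * (EGG_COLOR[c] - EGG_COLOR[w]) ** 2 for w in ORDER]
            for c in ORDER]


omega = 0.05  # illustrative value only
alpha = interaction_matrix(omega)
for row in alpha:
    print(row)

# The approximation is worst for the most different pair, AB versus ab.
exact = math.exp(-omega * (EGG_COLOR["AB"] - EGG_COLOR["ab"]) ** 2)
approx = alpha[0][3]
print(exact - approx)
```

Because Ab and aB produce the same phenotype, the two middle rows (and columns) of the matrix are identical, and because the matching function depends only on the squared difference in coloration, the matrix is symmetric.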
To move forward with our analysis, we just need to substitute the values of the interaction matrix (28) into the general expressions for coevolutionary change (17-18) and analyze the resulting model. Not surprisingly, however, there is a catch. Specifically, making this substitution leads to an algebraic mess. Although it might, in principle, be possible to sort this mess out, a simpler and more efficient approach is to again apply a Quasi-Linkage Equilibrium approximation. As before, this requires that we assume selection is weak (order ε) and that recombination occurs with a sufficient frequency for linkage disequilibrium to be kept small (order ε). We then use first order Taylor Series expansions in ε (see accompanying Mathematica notebook) to derive approximate expressions for coevolutionary change in warbler:

\[ \Delta p_{X,A} \approx \frac{\omega s_X}{1 - s_X}\, p_{X,A}\, q_{X,A}\, \bigl(1 + 2(p_{X,B} - p_{Y,A} - p_{Y,B})\bigr) \tag{29a} \]
\[ \Delta p_{X,B} \approx \frac{\omega s_X}{1 - s_X}\, p_{X,B}\, q_{X,B}\, \bigl(1 + 2(p_{X,A} - p_{Y,A} - p_{Y,B})\bigr) \tag{29b} \]
\[ \Delta D_X \approx \frac{2\omega s_X}{1 - s_X}\, p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B}\, (1 - r_X) - r_X D_X \tag{29c} \]

and cuckoo:

\[ \Delta p_{Y,A} \approx -\omega s_Y\, p_{Y,A}\, q_{Y,A}\, \bigl(1 + 2(p_{Y,B} - p_{X,A} - p_{X,B})\bigr) \tag{30a} \]
\[ \Delta p_{Y,B} \approx -\omega s_Y\, p_{Y,B}\, q_{Y,B}\, \bigl(1 + 2(p_{Y,A} - p_{X,A} - p_{X,B})\bigr) \tag{30b} \]
\[ \Delta D_Y \approx -2\omega s_Y\, p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B}\, (1 - r_Y) - r_Y D_Y \tag{30c} \]

where all terms of order ε² and greater have been ignored. As we have seen previously in this chapter, because the QLE approximation decouples coevolutionary changes in allele frequencies from changes in linkage disequilibrium, we can solve for the quasi-equilibrium values of linkage disequilibrium without thinking about evolution of allele frequencies themselves.
Pursuing this strategy by setting $\Delta D_X$ and $\Delta D_Y$ equal to zero and solving for $D_X$ and $D_Y$ yields the following QLE solutions for linkage disequilibrium in warbler and cuckoo:

\[ \tilde{D}_X \approx \frac{2\omega s_X\, p_{X,A}\, q_{X,A}\, p_{X,B}\, q_{X,B}\, (1 - r_X)}{(1 - s_X)\, r_X} \tag{31a} \]
\[ \tilde{D}_Y \approx -\frac{2\omega s_Y\, p_{Y,A}\, q_{Y,A}\, p_{Y,B}\, q_{Y,B}\, (1 - r_Y)}{r_Y} \tag{31b} \]

If we inspect these expressions for even a minute, it becomes clear that the sign of linkage disequilibrium within warbler and cuckoo populations will be quite predictable. Specifically, equations (31) demonstrate that LD will always be positive within the warbler population but negative within the cuckoo population. Why should this be the case? The easiest way to understand why LD should differ in sign between the two species is to return to thinking about phenotypes and their interaction with fitness. Before we tie these things together, however, it is important to realize that the QLE approximation guarantees that phenotypes follow a unimodal, and unskewed, frequency distribution. The simple reason for this is that when a phenotype is determined by multiple loci, bimodality (or multimodality for that matter) and skew can only arise through the build-up of linkage disequilibrium. Of course, since our QLE approximation applies only under conditions where LD remains small, we know that it also applies only to cases where phenotype distributions remain unimodal and unskewed.

Now, if egg color distributions in both warbler and cuckoo are unimodal and unskewed, the best possible egg coloration for the cuckoo is that which matches the average egg coloration of the warbler population; any cuckoo whose egg coloration score is either smaller than, or larger than, the optimal coloration defined by the warbler will have reduced fitness. As a consequence, selection acting on the cuckoo population is of a stabilizing form, resulting in an excess of intermediate phenotypes corresponding to negative linkage disequilibrium (Figure Xa).
For the warbler, however, the best possible egg coloration is that which is the furthest away from the average egg coloration of the cuckoo population. Thus, selection favors warblers with egg coloration scores that are either larger than the cuckoo average or smaller than the cuckoo average. Selection acting on the warbler is therefore of a disruptive form, favoring the buildup of positive linkage disequilibrium (Figure Xb).

We now know that linkage disequilibrium takes a particularly simple and intuitive form in the interaction between warbler and cuckoo. In order to build on this understanding, we now turn to an analysis of allele frequency dynamics. Unfortunately, our expressions for allele frequency change cannot be studied through inspection alone and we must turn to a formal analysis of equilibria and local stability. Following the same old standard procedure reveals that there are 49 possible equilibria. Of course, many of these equilibria do not even exist (see Mathematica notebook), and of those that do, none are ever locally stable. To be more specific, if we walk down this lengthy list of equilibria and stop to inspect the four eigenvalues associated with each, it becomes apparent that in each and every case, at least one of the eigenvalues has a real component greater than one. As we have seen previously for discrete time systems, having even one eigenvalue with a real component greater than unity is sufficient to guarantee instability. What this observation suggests, particularly when combined with the observation that many of the equilibria are associated with complex eigenvalues, is that the only possible outcome of coevolution between warbler and cuckoo is perpetual allele frequency cycles.
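The constant sign of linkage disequilibrium in equations (31) can be traced directly to the interaction matrix (28). The sketch below assumes the genotype fitnesses take the same form as those given at the start of the chapter, with the warbler in the role of host and the cuckoo in the role of parasite, and measures epistasis as $W_{AB} + W_{ab} - W_{Ab} - W_{aB}$. The function names and numerical values are hypothetical.

```python
ORDER = ["AB", "Ab", "aB", "ab"]
Z = {"AB": 2, "Ab": 1, "aB": 1, "ab": 0}  # phenotypes from Table 3


def alpha(omega, i, j):
    """Entry of matrix (28); symmetric, so the index order does not matter."""
    return 1 - omega * (Z[i] - Z[j]) ** 2


def warbler_fitness(y, omega, s_x):
    """Host fitness: W_X,i = 1 - s_X * sum_j Y_j * alpha_ij."""
    return {i: 1 - s_x * sum(y[j] * alpha(omega, i, j) for j in ORDER)
            for i in ORDER}


def cuckoo_fitness(x, omega, s_y):
    """Parasite fitness: W_Y,j = 1 - s_Y * (1 - sum_i X_i * alpha_ij)."""
    return {j: 1 - s_y * (1 - sum(x[i] * alpha(omega, i, j) for i in ORDER))
            for j in ORDER}


def epistasis(w):
    """Additive epistasis among the four haploid genotypes."""
    return w["AB"] + w["ab"] - w["Ab"] - w["aB"]


omega, s_x, s_y = 0.05, 0.1, 0.2  # illustrative values only

# Two arbitrary (hypothetical) sets of partner genotype frequencies.
freqs_1 = {"AB": 0.1, "Ab": 0.2, "aB": 0.3, "ab": 0.4}
freqs_2 = {"AB": 0.7, "Ab": 0.1, "aB": 0.1, "ab": 0.1}

e_warbler = [epistasis(warbler_fitness(f, omega, s_x)) for f in (freqs_1, freqs_2)]
e_cuckoo = [epistasis(cuckoo_fitness(f, omega, s_y)) for f in (freqs_1, freqs_2)]
print(e_warbler, e_cuckoo)
```

Under these assumptions epistasis equals $2\omega s_X$ in the warbler and $-2\omega s_Y$ in the cuckoo no matter what the partner species is doing, which is why the signs in equations (31) never change. The additional factor $1/(1 - s_X)$ in (31a) is consistent with dividing by warbler mean fitness, which is approximately $1 - s_X$ when $\omega$ is small.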
Hopefully, at this point at least some of you are wondering how this can possibly be correct when back in Chapter 3 we predicted that this same interaction between warblers and cuckoos would lead to either a stable equilibrium where egg coloration in the two species match or an endless coevolutionary chase where egg coloration became ever more extreme in both species. What changed? Although there are many ways to answer this question, the simplest is to point out that the genetic architecture of traits involved in coevolutionary interactions matters, and it matters a lot. Whereas in Chapter 3 we assumed egg coloration depended on a very large number of genes, each with a very small phenotypic effect, we are now assuming egg coloration is controlled by only two loci, each with a substantive impact on phenotype. An important consequence of this difference is that in Chapter 3 the genetic variance associated with egg coloration could not evolve (by assumption), whereas the genetic variance can evolve quite rapidly in the population genetic framework we study here. It is well known that this difference can lead to significantly different outcomes to the coevolutionary process (REFS).

At this point, we have gleaned all the really useful information we can from our QLE approximation. The easiest way to push things a little bit further is to use numerical iteration of the more exact recursion equations defined by (17-18) and (28) to explore the robustness of our QLE predictions. Not too surprisingly, these numerical analyses show that our QLE approximation works very well when the parameter $\omega$, and thus the strength of selection, is small and the rate of recombination is relatively large by comparison (Figure 7). In such cases, the only possible outcome is fluctuations in the allele frequencies of warbler and cuckoo accompanied by positive linkage disequilibrium in the warbler and negative linkage disequilibrium in the cuckoo.
As the parameter $\omega$ becomes larger or the rate of recombination smaller, the quantitative accuracy of our QLE approximation deteriorates as we should expect (Figure 8). Interestingly, however, even under such conditions, it appears that the qualitative predictions of our QLE approximation remain entirely correct: allele frequencies cycle and LD is positive within the warbler population but negative within the cuckoo population. The reason I use the caveat "appears" in this case is that numerical investigations can rarely cover the entire parameter space, leaving open the possibility that somewhere there exist parameter combinations that yield alternative outcomes but which we failed to explore.

Conclusions and Synthesis

Once coevolutionary interactions depend upon multiple loci, epistatic selection and linkage disequilibrium have the potential to become powerful forces in the coevolutionary process. For instance, our exploration of coevolution between rust and flax revealed that although including two loci does not impact the outcome of coevolution, it does allow epistasis to impact the rate. We saw something quite different when we explored coevolution between schistosome and snail. Here epistatic selection changes everything and allows for patterns of fluctuating disequilibria… Finally, our consideration of coevolution between the warbler, x, and the parasitic cuckoo, x, has revealed that the assumptions we make about the genetic architecture underlying the traits of coevolving species can have significant impacts on our expectations. Specifically, in … This result demonstrates that if we are to accurately predict the outcome and dynamics of coevolution we need to know something about the way in which … Much as we saw with dominance in the previous chapter, epistasis plays an important role in the dynamics and outcome of coevolution.
In addition to building up linkage disequilibrium between loci, epistatic selection causes changes in the frequency of alleles at one locus to depend on the frequency of alleles at another. Ultimately, these interact… We know so very little about the actual genetic architecture and patterns of epistasis within real systems that it is almost shocking; certainly humbling. This, along with dominance, is the frontier of coevolutionary genetics. In summary, …

References

Dybdahl, M. F., C. E. Jenkins, and S. L. Nuismer. 2014. Identifying the Molecular Basis of Host-Parasite Coevolution: Merging Models and Mechanisms. American Naturalist 184:1-13.

Mitta, G., C. M. Adema, B. Gourbal, E. S. Loker, and A. Theron. 2012. Compatibility polymorphism in snail/schistosome interactions: From field to theory to molecular mechanisms. Developmental and Comparative Immunology 37:1-8.

Figure Legends
