Chapter 6. Multi-locus coevolution, epistasis, and linkage disequilibrium

Biological Motivation

Obviously, more than a single locus is involved. Here we develop a basic framework for studying two-locus systems, introducing the concepts of epistasis, recombination, and linkage disequilibrium. After studying how coevolution proceeds in a simple two-locus system (motivated by???) we move on to explore ???

INTRODUCE EPISTASIS AND LINKAGE DISEQUILIBRIUM

Key Questions:

• What patterns of epistasis are likely to be generated by species interactions?
• How do these patterns of epistasis influence the dynamics and outcome of coevolution?
• What patterns of linkage disequilibrium do we expect to emerge in coevolving systems?

Building a 2-locus model of coevolution

Our goal is to develop the simplest possible model that captures the potentially important consequences of multi-locus gene-for-gene interactions for coevolution between X and X. Clearly, the simplest starting point is to focus on only a single pair of loci and haploid sexual species. Within haploid sexuals, recombination occurs in a transient diploid phase but selection occurs in the haploid phase. Thus, we avoid the complexities of diploidy that we struggled with in the previous chapter. Of course, ignoring diploidy also comes at the cost of reduced realism since both XX and XX are, indeed, diploid species. We imagine that rust and flax individuals encounter one another at random, and that when a Flax individual with genotype i encounters a rust individual with genotype j, an infection results with probability $\alpha_{i,j}$.
If we assume that infection has negative fitness consequences for the flax and positive fitness consequences for the rust, the fitness of the four possible Flax genotypes is given by:

$$W_{X,AB} = 1 - s_X (Y_{AB}\alpha_{AB,AB} + Y_{Ab}\alpha_{AB,Ab} + Y_{aB}\alpha_{AB,aB} + Y_{ab}\alpha_{AB,ab}) \tag{1a}$$
$$W_{X,Ab} = 1 - s_X (Y_{AB}\alpha_{Ab,AB} + Y_{Ab}\alpha_{Ab,Ab} + Y_{aB}\alpha_{Ab,aB} + Y_{ab}\alpha_{Ab,ab}) \tag{1b}$$
$$W_{X,aB} = 1 - s_X (Y_{AB}\alpha_{aB,AB} + Y_{Ab}\alpha_{aB,Ab} + Y_{aB}\alpha_{aB,aB} + Y_{ab}\alpha_{aB,ab}) \tag{1c}$$
$$W_{X,ab} = 1 - s_X (Y_{AB}\alpha_{ab,AB} + Y_{Ab}\alpha_{ab,Ab} + Y_{aB}\alpha_{ab,aB} + Y_{ab}\alpha_{ab,ab}) \tag{1d}$$

Similarly, the fitness of the four possible Rust genotypes is given by:

$$W_{Y,AB} = 1 - s_Y (1 - X_{AB}\alpha_{AB,AB} - X_{Ab}\alpha_{Ab,AB} - X_{aB}\alpha_{aB,AB} - X_{ab}\alpha_{ab,AB}) \tag{2a}$$
$$W_{Y,Ab} = 1 - s_Y (1 - X_{AB}\alpha_{AB,Ab} - X_{Ab}\alpha_{Ab,Ab} - X_{aB}\alpha_{aB,Ab} - X_{ab}\alpha_{ab,Ab}) \tag{2b}$$
$$W_{Y,aB} = 1 - s_Y (1 - X_{AB}\alpha_{AB,aB} - X_{Ab}\alpha_{Ab,aB} - X_{aB}\alpha_{aB,aB} - X_{ab}\alpha_{ab,aB}) \tag{2c}$$
$$W_{Y,ab} = 1 - s_Y (1 - X_{AB}\alpha_{AB,ab} - X_{Ab}\alpha_{Ab,ab} - X_{aB}\alpha_{aB,ab} - X_{ab}\alpha_{ab,ab}) \tag{2d}$$

Here $X_i$ and $Y_j$ are the frequencies of Flax genotype i and Rust genotype j, and $s_X$ and $s_Y$ scale the fitness consequences of the interaction for each species.

Mathematica Resources: http://www.webpages.uidaho.edu/~snuismer/Nuismer_Lab/the_theory_of_coevolution.htm

Now, if we assume that the probability of survival to mating for the various Flax and Rust genotypes depends on these fitnesses, we can calculate the frequency of each genotype after selection but prior to random mating. As before, we can calculate these frequencies by multiplying the current frequency by its relative fitness.
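As a rough illustration of equations (1) and (2) and of the selection step just described, the following Python sketch computes genotype fitnesses and the genotype frequencies after selection. The function names, the genotype ordering [AB, Ab, aB, ab], the interaction matrix `alpha`, and every numerical value are assumptions chosen purely for illustration; none of them are estimates for flax or rust.

```python
# Genotype order used throughout this sketch (an assumption): AB, Ab, aB, ab.

def host_fitness(alpha, Y, s_x):
    """Equations (1a-d): W_X,i = 1 - s_X * sum_j Y_j * alpha[i][j]."""
    return [1.0 - s_x * sum(Y[j] * alpha[i][j] for j in range(4))
            for i in range(4)]

def parasite_fitness(alpha, X, s_y):
    """Equations (2a-d): W_Y,j = 1 - s_Y * (1 - sum_i X_i * alpha[i][j])."""
    return [1.0 - s_y * (1.0 - sum(X[i] * alpha[i][j] for i in range(4)))
            for j in range(4)]

def select(freqs, W):
    """Post-selection frequencies: frequency times fitness over mean fitness."""
    w_bar = sum(f * w for f, w in zip(freqs, W))
    return [f * w / w_bar for f, w in zip(freqs, W)]

# Hypothetical interaction matrix: infection fails only when genotypes match.
alpha = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]

X = [0.4, 0.3, 0.2, 0.1]        # hypothetical host genotype frequencies
Y = [0.25, 0.25, 0.25, 0.25]    # hypothetical parasite genotype frequencies

W_X = host_fitness(alpha, Y, s_x=0.1)
W_Y = parasite_fitness(alpha, X, s_y=0.1)
X_sel = select(X, W_X)
Y_sel = select(Y, W_Y)
```

Because the hypothetical parasite population is uniform, every host genotype has the same fitness and selection leaves host frequencies unchanged, whereas the parasite genotype that matches the most common host genotype loses ground.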
For the Flax, multiplying each genotype frequency by its relative fitness yields the following expressions:

$$X'_{AB} = \frac{X_{AB} W_{X,AB}}{\bar{W}_X} \tag{3a}$$
$$X'_{Ab} = \frac{X_{Ab} W_{X,Ab}}{\bar{W}_X} \tag{3b}$$
$$X'_{aB} = \frac{X_{aB} W_{X,aB}}{\bar{W}_X} \tag{3c}$$
$$X'_{ab} = \frac{X_{ab} W_{X,ab}}{\bar{W}_X} \tag{3d}$$

where, as usual, the symbol $\bar{W}_X$ is the population mean fitness of species X and is given by:

$$\bar{W}_X = X_{AB} W_{X,AB} + X_{Ab} W_{X,Ab} + X_{aB} W_{X,aB} + X_{ab} W_{X,ab} \tag{3e}$$

The same procedure can now be applied to the rust population to calculate the frequency of two-locus genotypes there after selection but prior to mating:

$$Y'_{AB} = \frac{Y_{AB} W_{Y,AB}}{\bar{W}_Y} \tag{4a}$$
$$Y'_{Ab} = \frac{Y_{Ab} W_{Y,Ab}}{\bar{W}_Y} \tag{4b}$$
$$Y'_{aB} = \frac{Y_{aB} W_{Y,aB}}{\bar{W}_Y} \tag{4c}$$
$$Y'_{ab} = \frac{Y_{ab} W_{Y,ab}}{\bar{W}_Y} \tag{4d}$$

where, as usual, the symbol $\bar{W}_Y$ is the population mean fitness of species Y and is given by:

$$\bar{W}_Y = Y_{AB} W_{Y,AB} + Y_{Ab} W_{Y,Ab} + Y_{aB} W_{Y,aB} + Y_{ab} W_{Y,ab} \tag{4e}$$

OK, so now we know what the frequencies of the various genotypes are just before mating ensues. How can we now move forward to incorporate changes to genotype frequencies that accrue during the process of mating? If we are willing to assume that both Flax and Rust mate at random and have quite large population sizes, we can derive basic expressions for changes in genotype frequencies. The long and tedious way to go about this is to first tabulate the frequency of offspring with various genotypes that are produced by all possible combinations of parents (Table 1).

RECOMBINATION! INTRODUCE IT HERE

Table 1.
Genotype frequencies produced by random matings (shown for the Flax; r is the recombination rate)

Maternal|Paternal   Frequency          Offspring genotype
genotypes           of mating          AB        Ab        aB        ab
AB|AB               $X_{AB}X_{AB}$     1         0         0         0
AB|Ab               $X_{AB}X_{Ab}$     1/2       1/2       0         0
AB|aB               $X_{AB}X_{aB}$     1/2       0         1/2       0
AB|ab               $X_{AB}X_{ab}$     (1-r)/2   r/2       r/2       (1-r)/2
Ab|AB               $X_{Ab}X_{AB}$     1/2       1/2       0         0
Ab|Ab               $X_{Ab}X_{Ab}$     0         1         0         0
Ab|aB               $X_{Ab}X_{aB}$     r/2       (1-r)/2   (1-r)/2   r/2
Ab|ab               $X_{Ab}X_{ab}$     0         1/2       0         1/2
aB|AB               $X_{aB}X_{AB}$     1/2       0         1/2       0
aB|Ab               $X_{aB}X_{Ab}$     r/2       (1-r)/2   (1-r)/2   r/2
aB|aB               $X_{aB}X_{aB}$     0         0         1         0
aB|ab               $X_{aB}X_{ab}$     0         0         1/2       1/2
ab|AB               $X_{ab}X_{AB}$     (1-r)/2   r/2       r/2       (1-r)/2
ab|Ab               $X_{ab}X_{Ab}$     0         1/2       0         1/2
ab|aB               $X_{ab}X_{aB}$     0         0         1/2       1/2
ab|ab               $X_{ab}X_{ab}$     0         0         0         1

What Table 1 provides us with is the raw material for calculating the frequency of the various genotypes in the offspring generation. All we need to do now is sum up the entries in each column, weighting each entry by the frequency with which the two relevant parental genotypes encounter one another at random and mate. Because mating follows selection, these weights are products of the post-selection frequencies from equations (3) and (4). Mathematically, this amounts to evaluating the following expression for each of the four possible offspring genotypes, i, in Flax:

$$X''_i = \sum_{j=1}^{4} \sum_{k=1}^{4} X'_j X'_k \, \Pi_{X,j+k\to i} \tag{5a}$$

and the following expression for the four possible offspring genotypes in Rust:

$$Y''_i = \sum_{j=1}^{4} \sum_{k=1}^{4} Y'_j Y'_k \, \Pi_{Y,j+k\to i} \tag{5b}$$

where $\Pi_{X,j+k\to i}$ and $\Pi_{Y,j+k\to i}$ are the probabilities that two parents with genotypes j and k produce an offspring of genotype i within the Flax and Rust populations, respectively, and are given in the offspring genotype columns of Table 1. Although equations (5) help to see, mechanistically speaking, how the genotype frequencies within one generation are translated into those of the next through the process of segregation and recombination, they are quite clunky and not terribly insightful. Fortunately, these equations can be greatly simplified and re-expressed in a way that is much easier to implement from a practical standpoint.
Specifically, plugging away at equations (5) algebraically for a while (or even a great while) allows them to be re-written as:

$$X''_{AB} = X'_{AB} - r_X D'_X \tag{6a}$$
$$X''_{Ab} = X'_{Ab} + r_X D'_X \tag{6b}$$
$$X''_{aB} = X'_{aB} + r_X D'_X \tag{6c}$$
$$X''_{ab} = X'_{ab} - r_X D'_X \tag{6d}$$

in the Flax and as:

$$Y''_{AB} = Y'_{AB} - r_Y D'_Y \tag{7a}$$
$$Y''_{Ab} = Y'_{Ab} + r_Y D'_Y \tag{7b}$$
$$Y''_{aB} = Y'_{aB} + r_Y D'_Y \tag{7c}$$
$$Y''_{ab} = Y'_{ab} - r_Y D'_Y \tag{7d}$$

in the rust. In these equations, $D_X$ and $D_Y$ quantify linkage disequilibrium, a measure of the statistical association (i.e., the covariance) between alleles at the A and B loci. Specifically, $D'_X = X'_{AB} X'_{ab} - X'_{Ab} X'_{aB}$ and $D'_Y = Y'_{AB} Y'_{ab} - Y'_{Ab} Y'_{aB}$, such that linkage disequilibrium is positive if there is an excess of AB and ab genotypes within a population and negative if it is, instead, the Ab and aB genotypes that are in excess. Recombination breaks up whichever pair of genotypes is in excess, which is why the AB and ab genotypes lose frequency, and the Ab and aB genotypes gain it, when linkage disequilibrium is positive.

A key insight provided by equations (6-7) is that the change in genotype frequencies that occurs in response to random mating depends entirely on the rate of recombination. If no recombination occurs, genotype frequencies within the offspring population remain identical to those within the parental population. If, instead, recombination occurs, genotype frequencies in the offspring generation differ from those in the parental generation by an amount proportional to linkage disequilibrium. Clearly, then, recombination can influence coevolution only in cases where coevolutionary selection, or some other evolutionary force, acts to create linkage disequilibrium within populations of interacting species.

We are now at a point where we have successfully described how genotype frequencies change over the course of a single generation. To maintain some generality, let's wait to substitute in the specific values for fitness corresponding to our GFG model, and simply express how genotype frequencies change in terms of arbitrary fitness values, W.
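The equivalence between the brute-force sum over Table 1 and the compact recombination equations can be checked numerically. The Python sketch below builds each row of Table 1 from first principles (an offspring is non-recombinant with probability 1 - r and recombinant with probability r, and each reciprocal product is equally likely), sums over all sixteen matings, and compares the result with the closed form written in terms of $D = X_{AB}X_{ab} - X_{Ab}X_{aB}$. The function names and the example frequencies are assumptions for illustration only.

```python
from itertools import product

GENOTYPES = ["AB", "Ab", "aB", "ab"]

def offspring_probs(mother, father, r):
    """One row of Table 1: offspring distribution for a single mating."""
    probs = {g: 0.0 for g in GENOTYPES}
    # Non-recombinant offspring are copies of one parent.
    probs[mother] += (1 - r) / 2
    probs[father] += (1 - r) / 2
    # Recombinant offspring take locus A from one parent, locus B from the other.
    probs[mother[0] + father[1]] += r / 2
    probs[father[0] + mother[1]] += r / 2
    return probs

def mate_by_table(freqs, r):
    """Double sum of equation (5) over all maternal/paternal pairs."""
    out = {g: 0.0 for g in GENOTYPES}
    for (m, f_m), (p, f_p) in product(freqs.items(), repeat=2):
        for g, pr in offspring_probs(m, p, r).items():
            out[g] += f_m * f_p * pr
    return out

def linkage_disequilibrium(freqs):
    return freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]

def mate_closed_form(freqs, r):
    """Closed form: AB and ab lose r*D, Ab and aB gain r*D."""
    D = linkage_disequilibrium(freqs)
    return {"AB": freqs["AB"] - r * D, "Ab": freqs["Ab"] + r * D,
            "aB": freqs["aB"] + r * D, "ab": freqs["ab"] - r * D}

# Hypothetical post-selection frequencies with positive D (0.4*0.3 - 0.1*0.2).
pre = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
by_table = mate_by_table(pre, r=0.2)
by_formula = mate_closed_form(pre, r=0.2)
```

In this sketch one round of random mating also shrinks linkage disequilibrium by the factor 1 - r, which is the same factor that appears in the recursions for D later in the chapter.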
Substituting (3) into (6) and changing from recursion equations to difference equations yields the following expressions for the change in host genotype frequencies that occurs over the course of a single generation:

$$\Delta X_{AB} = \frac{r_X (W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB}) + X_{AB} (W_{X,AB} - \bar{W}_X) \bar{W}_X}{\bar{W}_X^2} \tag{8a}$$
$$\Delta X_{Ab} = \frac{r_X (W_{X,ab} W_{X,AB} X_{ab} X_{AB} - W_{X,aB} W_{X,Ab} X_{aB} X_{Ab}) + X_{Ab} (W_{X,Ab} - \bar{W}_X) \bar{W}_X}{\bar{W}_X^2} \tag{8b}$$
$$\Delta X_{aB} = \frac{r_X (W_{X,ab} W_{X,AB} X_{ab} X_{AB} - W_{X,aB} W_{X,Ab} X_{aB} X_{Ab}) + X_{aB} (W_{X,aB} - \bar{W}_X) \bar{W}_X}{\bar{W}_X^2} \tag{8c}$$
$$\Delta X_{ab} = \frac{r_X (W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB}) + X_{ab} (W_{X,ab} - \bar{W}_X) \bar{W}_X}{\bar{W}_X^2} \tag{8d}$$

Equations for the pathogen species, Y, are essentially identical and so are not shown. We are now at a point where we could, if we wished, simply simulate the process of coevolution by plugging in the values for fitness we derived previously for the GFG system (EQUATIONS X) and iterating equations (X). Although this approach would surely provide us with some insights into the process of coevolution, a much more insightful and elegant approach is to first make a change of variables (Appendix 3) that allows us to focus on allele frequencies and linkage disequilibrium rather than genotype frequencies. In addition to facilitating biological interpretation and intuition, this change of variables simplifies our model by reducing the number of variables we follow from four in equations (X) to three, which is the actual number of degrees of freedom in the system (the four genotype frequencies must sum to one).

In order to make the change of variables from genotype frequencies to allele frequencies and linkage disequilibrium, we first need to clearly define the new variables. Specifically, we define allele frequencies:

$$p_{X,A} = X_{AB} + X_{Ab} \tag{9a}$$
$$p_{X,B} = X_{AB} + X_{aB} \tag{9b}$$
$$p_{Y,A} = Y_{AB} + Y_{Ab} \tag{9c}$$
$$p_{Y,B} = Y_{AB} + Y_{aB} \tag{9d}$$

and linkage disequilibrium:

$$D_X = X_{AB} X_{ab} - X_{Ab} X_{aB} \tag{10a}$$
$$D_Y = Y_{AB} Y_{ab} - Y_{Ab} Y_{aB} \tag{10b}$$

for both of the interacting species.
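A minimal Python sketch of this change of variables is given here, assuming the genotype ordering (AB, Ab, aB, ab). The forward map follows definitions (9) and (10); the inverse map is the standard one implied by those definitions, with q = 1 - p denoting the frequency of the lowercase allele. Function names and the example numbers are illustrative assumptions.

```python
def to_alleles(X_AB, X_Ab, X_aB, X_ab):
    """Genotype frequencies -> (p_A, p_B, D), following (9) and (10)."""
    p_A = X_AB + X_Ab
    p_B = X_AB + X_aB
    D = X_AB * X_ab - X_Ab * X_aB
    return p_A, p_B, D

def to_genotypes(p_A, p_B, D):
    """(p_A, p_B, D) -> genotype frequencies (AB, Ab, aB, ab)."""
    q_A, q_B = 1.0 - p_A, 1.0 - p_B
    return (p_A * p_B + D, p_A * q_B - D, q_A * p_B - D, q_A * q_B + D)

# Hypothetical example: a round trip recovers the original frequencies.
genotypes = (0.4, 0.1, 0.2, 0.3)
p_A, p_B, D = to_alleles(*genotypes)
recovered = to_genotypes(p_A, p_B, D)
```

Three numbers are enough to rebuild all four genotype frequencies, which is the sense in which the system has three degrees of freedom per species.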
The next step in our change of variables is to write down new recursions that capture the way in which our new variables change over the course of a single generation. The easiest way to do this is to just substitute the predicted values for the genotype frequencies in the next generation (e.g., $X''_{AB}$, $X''_{Ab}$, etc.) into expressions (9-10), yielding:

$$p''_{X,A} = \frac{W_{X,Ab} X_{Ab} + W_{X,AB} X_{AB}}{\bar{W}_X} \tag{11a}$$
$$p''_{X,B} = \frac{W_{X,aB} X_{aB} + W_{X,AB} X_{AB}}{\bar{W}_X} \tag{11b}$$
$$D''_X = \frac{(W_{X,aB} W_{X,Ab} X_{aB} X_{Ab} - W_{X,ab} W_{X,AB} X_{ab} X_{AB})(r_X - 1)}{\bar{W}_X^2} \tag{11c}$$

$$p''_{Y,A} = \frac{W_{Y,Ab} Y_{Ab} + W_{Y,AB} Y_{AB}}{\bar{W}_Y} \tag{12a}$$
$$p''_{Y,B} = \frac{W_{Y,aB} Y_{aB} + W_{Y,AB} Y_{AB}}{\bar{W}_Y} \tag{12b}$$
$$D''_Y = \frac{(W_{Y,aB} W_{Y,Ab} Y_{aB} Y_{Ab} - W_{Y,ab} W_{Y,AB} Y_{ab} Y_{AB})(r_Y - 1)}{\bar{W}_Y^2} \tag{12c}$$

Obviously, we still have a bit of a problem! Our equations now contain a mix of old and new variables, which can never be a good thing. The way to move forward is to recognize that the genotype frequencies appearing in the right hand sides of the equations can be re-written using definitions (9-10) in the following way, where $q_{X,A} = 1 - p_{X,A}$ and so on denote the frequencies of the a and b alleles:

$$X_{AB} = p_{X,A} p_{X,B} + D_X \tag{13a}$$
$$X_{Ab} = p_{X,A} q_{X,B} - D_X \tag{13b}$$
$$X_{aB} = q_{X,A} p_{X,B} - D_X \tag{13c}$$
$$X_{ab} = q_{X,A} q_{X,B} + D_X \tag{13d}$$

$$Y_{AB} = p_{Y,A} p_{Y,B} + D_Y \tag{14a}$$
$$Y_{Ab} = p_{Y,A} q_{Y,B} - D_Y \tag{14b}$$
$$Y_{aB} = q_{Y,A} p_{Y,B} - D_Y \tag{14c}$$
$$Y_{ab} = q_{Y,A} q_{Y,B} + D_Y \tag{14d}$$

Substituting (13 and 14) into (11 and 12) and doing a bit of algebra allows us to finally complete our change of variables and arrive at a set of equations expressed entirely in terms of the new variables.
$$p''_{X,A} = \frac{p_{X,A} (q_{X,B} W_{X,Ab} + p_{X,B} W_{X,AB}) + (W_{X,AB} - W_{X,Ab}) D_X}{\bar{W}_X} \tag{15a}$$
$$p''_{X,B} = \frac{p_{X,B} (q_{X,A} W_{X,aB} + p_{X,A} W_{X,AB}) + (W_{X,AB} - W_{X,aB}) D_X}{\bar{W}_X} \tag{15b}$$
$$D''_X = \frac{\left(W_{X,aB} W_{X,Ab} (p_{X,A} q_{X,B} - D_X)(q_{X,A} p_{X,B} - D_X) - W_{X,ab} W_{X,AB} (q_{X,A} q_{X,B} + D_X)(p_{X,A} p_{X,B} + D_X)\right)(r_X - 1)}{\bar{W}_X^2} \tag{15c}$$

and,

$$p''_{Y,A} = \frac{p_{Y,A} (q_{Y,B} W_{Y,Ab} + p_{Y,B} W_{Y,AB}) + (W_{Y,AB} - W_{Y,Ab}) D_Y}{\bar{W}_Y} \tag{16a}$$
$$p''_{Y,B} = \frac{p_{Y,B} (q_{Y,A} W_{Y,aB} + p_{Y,A} W_{Y,AB}) + (W_{Y,AB} - W_{Y,aB}) D_Y}{\bar{W}_Y} \tag{16b}$$
$$D''_Y = \frac{\left(W_{Y,aB} W_{Y,Ab} (p_{Y,A} q_{Y,B} - D_Y)(q_{Y,A} p_{Y,B} - D_Y) - W_{Y,ab} W_{Y,AB} (q_{Y,A} q_{Y,B} + D_Y)(p_{Y,A} p_{Y,B} + D_Y)\right)(r_Y - 1)}{\bar{W}_Y^2} \tag{16c}$$

We now have a set of equations describing how allele frequencies and linkage disequilibrium evolve in response to natural selection and random mating over the course of a single generation. Our last move is to re-write these recursions as difference equations by subtracting their values at the start of the generation from (15-16):

$$\Delta p_{X,A} = \frac{p_{X,A} (p_{X,B} W_{X,AB} + q_{X,B} W_{X,Ab} - \bar{W}_X) + (W_{X,AB} - W_{X,Ab}) D_X}{\bar{W}_X} \tag{17a}$$
$$\Delta p_{X,B} = \frac{p_{X,B} (q_{X,A} W_{X,aB} + p_{X,A} W_{X,AB} - \bar{W}_X) + (W_{X,AB} - W_{X,aB}) D_X}{\bar{W}_X} \tag{17b}$$
$$\Delta D_X = \frac{\left(W_{X,aB} W_{X,Ab} (p_{X,A} q_{X,B} - D_X)(q_{X,A} p_{X,B} - D_X) - W_{X,ab} W_{X,AB} (q_{X,A} q_{X,B} + D_X)(p_{X,A} p_{X,B} + D_X)\right)(r_X - 1) - D_X \bar{W}_X^2}{\bar{W}_X^2} \tag{17c}$$

and,

$$\Delta p_{Y,A} = \frac{p_{Y,A} (q_{Y,B} W_{Y,Ab} + p_{Y,B} W_{Y,AB} - \bar{W}_Y) + (W_{Y,AB} - W_{Y,Ab}) D_Y}{\bar{W}_Y} \tag{18a}$$
$$\Delta p_{Y,B} = \frac{p_{Y,B} (q_{Y,A} W_{Y,aB} + p_{Y,A} W_{Y,AB} - \bar{W}_Y) + (W_{Y,AB} - W_{Y,aB}) D_Y}{\bar{W}_Y} \tag{18b}$$
$$\Delta D_Y = \frac{\left(W_{Y,aB} W_{Y,Ab} (p_{Y,A} q_{Y,B} - D_Y)(q_{Y,A} p_{Y,B} - D_Y) - W_{Y,ab} W_{Y,AB} (q_{Y,A} q_{Y,B} + D_Y)(p_{Y,A} p_{Y,B} + D_Y)\right)(r_Y - 1) - D_Y \bar{W}_Y^2}{\bar{W}_Y^2} \tag{18c}$$

Phew! We have done it. With the bulk of the tedious algebraic book-keeping behind us, we can finally move on.
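Because this much algebra invites slips, it can be reassuring to confirm numerically that the recursions in allele frequencies and linkage disequilibrium (15a-c) describe the same generation as selection followed by recombination on genotype frequencies, the route that leads to equations (8). The Python sketch below does this for a single species with arbitrary fitnesses. The function names, the genotype ordering (AB, Ab, aB, ab), and all numbers are hypothetical choices made for illustration.

```python
def step_genotypes(X, W, r):
    """Selection, then random mating with recombination, on genotype frequencies."""
    w_bar = sum(x * w for x, w in zip(X, W))
    S = [x * w / w_bar for x, w in zip(X, W)]      # after selection
    D = S[0] * S[3] - S[1] * S[2]                  # LD after selection
    return [S[0] - r * D, S[1] + r * D, S[2] + r * D, S[3] - r * D]

def step_alleles(pA, pB, D, W, r):
    """The same generation written as recursions (15a-c)."""
    W_AB, W_Ab, W_aB, W_ab = W
    qA, qB = 1.0 - pA, 1.0 - pB
    # Genotype frequencies rebuilt from the new variables, as in (13).
    X_AB, X_Ab, X_aB, X_ab = pA*pB + D, pA*qB - D, qA*pB - D, qA*qB + D
    w_bar = X_AB*W_AB + X_Ab*W_Ab + X_aB*W_aB + X_ab*W_ab
    pA_next = (pA * (qB*W_Ab + pB*W_AB) + (W_AB - W_Ab) * D) / w_bar
    pB_next = (pB * (qA*W_aB + pA*W_AB) + (W_AB - W_aB) * D) / w_bar
    D_next = (W_aB*W_Ab*X_aB*X_Ab - W_ab*W_AB*X_ab*X_AB) * (r - 1.0) / w_bar**2
    return pA_next, pB_next, D_next

# Hypothetical starting point and fitness values.
X0 = [0.4, 0.1, 0.2, 0.3]
W = [0.95, 1.00, 0.90, 0.97]
r = 0.1

X1 = step_genotypes(X0, W, r)
from_genotypes = (X1[0] + X1[1], X1[0] + X1[2], X1[0]*X1[3] - X1[1]*X1[2])
from_recursions = step_alleles(0.5, 0.6, 0.1, W, r)
```

The two tuples should agree to rounding error; subtracting the starting values (0.5, 0.6, 0.1) from either gives the changes described by the difference equations (17).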
Analyzing the model

We can now transform the general two-locus model described by difference equations (17-18) into a specific model of coevolution between Flax and Flax-Rust by replacing the general values of fitness W with their specific values given by equations (X) and the values of the interaction matrix appropriate for our gene-for-gene model:

$$\alpha = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \tag{19}$$

where Flax genotypes {AB, Ab, aB, ab} are in rows and Rust genotypes {AB, Ab, aB, ab} are in columns. Here, the interaction matrix depicts the outcome of the classical gene-for-gene model where host resistance genes (A and B) are able to recognize parasite avirulence genes (a and b), but not parasite virulence genes (A and B). Infection therefore succeeds unless the host carries a resistance allele at a locus where the parasite carries an avirulence allele. Even after working long and hard to simplify the resulting equations, however, I couldn't get them to fit on a single line of this page. As a general rule of thumb, if your equation doesn't fit on a single line, you aren't going to learn much from it. So, what can we do? One option is to charge straight ahead and simply rely on our computer to simulate coevolution by iterating our recursion equations for a large number of parameter combinations. Although there is nothing wrong with this approach, we can actually gain quite a bit of biological insight by using a bit more mathematical finesse, and developing approximations that assume selection is not too strong and that recombination occurs with some reasonable frequency.

To be a bit more specific, one way to proceed is to pursue a Quasi-Linkage Equilibrium (QLE) approximation (REFS). Although a great deal of difficult math has gone into rigorous investigation of when and where the QLE approximation can be applied (REFS), our approach here will be more informal and, I hope, more practical for those who simply want to learn something about biology rather than mathematics.
As a general rule of thumb, anytime selection is not too strong (less than a 1% difference in fitness among genotypes) and recombination is of a larger magnitude than selection (if the fitness difference among genotypes is 1%, the recombination rate should be at least 0.1), linkage disequilibrium will change much more rapidly than allele frequencies and will, in fact, approach a quasi-equilibrium state where its value is small, and a function of the current allele frequencies within the population. What this means to us is that if selection is weak (< 1%) and recombination is frequent (> 10%), linkage disequilibrium will be as small as selection (< 0.01). As a result, as long as we are willing to tolerate some small amount of inaccuracy in our prediction, we can ignore all terms in our difference equations that include things like $s^2$, $D^2$, and $sD$ because these terms will all be very small and quite negligible. To be a bit more formal, if we are willing to assume recombination is frequent, and that selection is weak and of some small order $\varepsilon$, linkage disequilibrium will also be weak and of order $\varepsilon$, allowing us to ignore all terms of order $\varepsilon^2$ and higher. Clearly, what this means is that our QLE approximation will be more and more accurate as the difference in fitness among genotypes decreases, because the terms we ignore become ever smaller in relation to the terms we keep.

Returning to the specific case of coevolution between Flax and Flax Rust, what we are going to assume is that the fitness consequences of the interaction are relatively weak such that $s_X$ and $s_Y$ are both of small order $\varepsilon$, and that recombination within both species is relatively frequent (i.e., $> \varepsilon$). Our next step in implementing our QLE approximation is to replace each of our difference equations with its first order Taylor Series Expansion in $\varepsilon$.
Using Mathematica, this is an incredibly trivial thing to do, and yields the following approximate expressions for evolutionary change in the Flax:

$$\Delta p_{X,A} \approx s_X \, p_{X,A} q_{X,A} \, q_{Y,A} (1 - p_{X,B} q_{Y,B}) \tag{20a}$$
$$\Delta p_{X,B} \approx s_X \, p_{X,B} q_{X,B} \, q_{Y,B} (1 - p_{X,A} q_{Y,A}) \tag{20b}$$
$$\Delta D_X \approx -s_X (1 - r_X) \, p_{X,A} q_{X,A} p_{X,B} q_{X,B} \, q_{Y,A} q_{Y,B} - r_X D_X \tag{20c}$$

and Flax Rust:

$$\Delta p_{Y,A} \approx s_Y \, p_{Y,A} q_{Y,A} \, p_{X,A} (1 - p_{X,B} q_{Y,B}) \tag{21a}$$
$$\Delta p_{Y,B} \approx s_Y \, p_{Y,B} q_{Y,B} \, p_{X,B} (1 - p_{X,A} q_{Y,A}) \tag{21b}$$
$$\Delta D_Y \approx s_Y (1 - r_Y) \, p_{Y,A} q_{Y,A} p_{Y,B} q_{Y,B} \, p_{X,A} p_{X,B} - r_Y D_Y \tag{21c}$$

The beauty of the QLE approximation, and the primary reason for using it (other than the fact that it makes truly lovely equations), is that it allows us to "see" things about the biology of a system that we might otherwise spend hours upon hours simulating and still never pick up on. For instance, even a passing inspection of our approximation reveals that coevolutionary change in allele frequencies is independent of linkage disequilibrium. As a result, we can solve for the quasi-equilibrium values of linkage disequilibrium by simply setting (20c and 21c) equal to zero and solving for $D_X$ and $D_Y$:

$$\hat{D}_X \approx -\frac{s_X (1 - r_X) \, p_{X,A} q_{X,A} p_{X,B} q_{X,B} \, q_{Y,A} q_{Y,B}}{r_X} \tag{22a}$$
$$\hat{D}_Y \approx \frac{s_Y (1 - r_Y) \, p_{Y,A} q_{Y,A} p_{Y,B} q_{Y,B} \, p_{X,A} p_{X,B}}{r_Y} \tag{22b}$$

Remarkably, this shows that the sign of linkage disequilibrium should be different in Flax and Flax rust. Specifically, linkage disequilibrium between resistance genes within the flax should always be negative whereas linkage disequilibrium between virulence genes in the rust should always be positive. The biological reason for this intriguing pattern is that Flax individuals receive a fitness benefit by carrying a single resistance gene at either locus (A or B) whereas rust individuals must carry virulence alleles at both loci (A and B) in order to evade detection and elimination by the host. Consequently, within the Flax population, the quantity $W_{X,AB} W_{X,ab} - W_{X,Ab} W_{X,aB}$ is negative, indicating it experiences negative epistasis.
In contrast, within the Rust population the quantity $W_{Y,AB} W_{Y,ab} - W_{Y,Ab} W_{Y,aB}$ is positive, indicating the rust population experiences positive epistasis.

Our QLE approximation has already unearthed a valuable insight about our expectations for the form of epistasis and sign of linkage disequilibrium that we expect to emerge from GFG coevolution. Can we push our QLE approximation further to learn about the dynamics and outcomes of coevolution? The place to start is with an analysis of allele frequency change. Inspecting equations (20a-b, 21a-b) reveals that as long as genetic variation exists at all loci, host resistance genes and parasite virulence genes will increase in frequency (Figure 1). Only when the parasite fixes both virulence alleles, or the host has no resistance alleles at either locus, does coevolution cease. This picture of coevolutionary dynamics is remarkably similar to what we saw when we studied gene-for-gene coevolution in a haploid, single locus model. Only when we study the relative rates of coevolution in the two species do we see the novel twist that multiple loci and epistasis bring to the table. Specifically, if both Flax and Rust initially have very low frequencies of resistance and virulence alleles, the Flax population will evolve resistance much more rapidly than the rust can evolve to overcome it (Figure 1). The reason for this striking difference in coevolutionary rates is, again, epistasis. Because the host realizes fitness benefits by having a resistance allele at a single locus (because that is sufficient to recognize and clear the rust), selection is quite effective at increasing the frequency of even a very rare resistance allele. In contrast, the parasite must carry virulence alleles at both loci in order to avoid recognition by hosts with even only a single resistance allele.
Thus, if virulence alleles are initially very rare, rust individuals carrying virulence alleles at both loci are incredibly rare, and selection has a very difficult time increasing the frequency of the virulence alleles (Figure 1). Together with our earlier results about the sign of linkage disequilibrium, this shows that it is epistasis that causes our two-locus model to differ in interesting ways from the single locus model we studied earlier in Chapter 2.

At this point, you should be wondering just how much we should trust our QLE approximation. As with any approximation, whether it be the QLE, quantitative genetics, or adaptive dynamics, it is important to evaluate robustness using simulations. In addition to allowing the generality of our conclusions to be explored, performing simulations also helps serve as a safeguard against straightforward algebraic errors. Although there are many ways to perform simulations, each of which makes its own set of assumptions, for our purposes it is sufficient to simply iterate the general recursion equations (X). Our goal is to push the limits of our QLE approximation to see when it "breaks". As we already saw in Figure 1, when $s_X = s_Y = 0.01$ and $r_X = r_Y = 0.1$ the predictions of our QLE approximation and exact simulations are more or less indistinguishable, even over 5,000 generations of coevolution. But what about scenarios where the interaction between Flax and Rust has large fitness consequences for the interacting species, perhaps something more along the lines of $s_X = s_Y = 0.1$? In such cases, Figure 2 shows that significant errors begin to creep into our QLE approximation, which, over sufficient amounts of time, lead to significant quantitative errors in our predictions for allele frequencies and linkage disequilibria. That said, the amount of error accruing over any single generation remains vanishingly small, and qualitative predictions, such as the sign of linkage disequilibrium, remain entirely robust.
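A comparison of this kind can be sketched in a few lines of Python. The sketch below iterates exact genotype-frequency recursions (selection followed by recombination) alongside the QLE allele-frequency approximations (20a-b, 21a-b). Several things here are assumptions rather than the chapter's own implementation: the function names, the starting frequencies, the number of generations, and in particular the interaction matrix, which is built from the verbal gene-for-gene rule (infection fails whenever the host carries a resistance allele, written in uppercase, at a locus where the parasite carries an avirulence allele, written in lowercase).

```python
GENOS = ["AB", "Ab", "aB", "ab"]

def gfg_alpha():
    """alpha[i][j]: infection probability for host genotype i, parasite j
    (assumed rule: uppercase host allele recognizes lowercase parasite allele)."""
    def infects(host, par):
        return not any(h.isupper() and p.islower() for h, p in zip(host, par))
    return [[1.0 if infects(h, p) else 0.0 for p in GENOS] for h in GENOS]

def from_alleles(pA, pB, D=0.0):
    qA, qB = 1.0 - pA, 1.0 - pB
    return [pA*pB + D, pA*qB - D, qA*pB - D, qA*qB + D]

def ld(F):
    return F[0]*F[3] - F[1]*F[2]

def select_and_mate(F, W, r):
    w_bar = sum(f*w for f, w in zip(F, W))
    S = [f*w/w_bar for f, w in zip(F, W)]
    D = ld(S)
    return [S[0] - r*D, S[1] + r*D, S[2] + r*D, S[3] - r*D]

def exact_step(X, Y, alpha, sx, sy, rx, ry):
    WX = [1 - sx*sum(Y[j]*alpha[i][j] for j in range(4)) for i in range(4)]
    WY = [1 - sy*(1 - sum(X[i]*alpha[i][j] for i in range(4))) for j in range(4)]
    return select_and_mate(X, WX, rx), select_and_mate(Y, WY, ry)

def qle_step(pXA, pXB, pYA, pYB, sx, sy):
    """Allele-frequency updates from (20a-b) and (21a-b)."""
    qYA, qYB = 1 - pYA, 1 - pYB
    dXA = sx*pXA*(1 - pXA)*qYA*(1 - pXB*qYB)
    dXB = sx*pXB*(1 - pXB)*qYB*(1 - pXA*qYA)
    dYA = sy*pYA*(1 - pYA)*pXA*(1 - pXB*qYB)
    dYB = sy*pYB*(1 - pYB)*pXB*(1 - pXA*qYA)
    return pXA + dXA, pXB + dXB, pYA + dYA, pYB + dYB

def compare(s=0.01, r=0.1, p0=0.1, generations=200):
    alpha = gfg_alpha()
    X, Y = from_alleles(p0, p0), from_alleles(p0, p0)
    qle = (p0, p0, p0, p0)
    for _ in range(generations):
        X, Y = exact_step(X, Y, alpha, s, s, r, r)
        qle = qle_step(*qle, s, s)
    exact = (X[0] + X[1], X[0] + X[2], Y[0] + Y[1], Y[0] + Y[2])
    max_err = max(abs(e - q) for e, q in zip(exact, qle))
    return max_err, ld(X), ld(Y)

max_err, D_X, D_Y = compare()
```

Calling `compare(s=0.1)` instead gives a feel for how the discrepancy grows when selection is stronger, while the signs of the two disequilibria can be inspected in the same way.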
What can we conclude from this exercise about the robustness of the QLE approximation? The answer is that it largely depends on the question you wish to address. If the goal is qualitative prediction, the QLE approximation often works remarkably well even when its basic assumptions are grossly violated. If, on the other hand, the goal is quantitative prediction, selection really does need to be quite weak and recombination quite frequent for accuracy to be maintained over thousands of generations.

Answers to key questions

What patterns of epistasis are likely to be generated by species interactions?

Our exploration of coevolution between Flax and its pathogen M. lini revealed interesting patterns of epistasis. Specifically, our QLE approximation revealed that the host plant, L. marginale, experiences negative epistasis because under the assumptions of our gene-for-gene model, a single resistance gene can be sufficient to recognize and clear the pathogen. In contrast, we found that epistasis within the rust, M. lini, was positive, a pattern that emerges because host recognition can be thwarted only by evading recognition at both loci.

How do these patterns of epistasis influence the dynamics and outcome of coevolution?

By studying coevolution using our QLE approximation and numerical simulation, we found that epistasis plays an important role in the dynamics of coevolution while having no real impact on its ultimate outcome. Specifically, because epistasis in the Flax is negative, significant fitness gains accrue to individuals carrying a resistance allele at only a single locus. Thus, selection is quite effective at increasing the frequency of resistance genes in the host. In contrast, because epistasis within the rust is positive, significant fitness gains accrue only to individuals carrying virulence alleles at both loci.
Consequently, selection has a very hard time gaining traction on rare virulence alleles, thus slowing the rate at which they increase in frequency within the rust population.

What patterns of linkage disequilibrium do we expect to emerge in coevolving systems?

In light of our observation that epistasis is negative within the Flax population but positive within the Rust population, it is perhaps not surprising that we also observe differences in the sign of linkage disequilibrium in the two species. Specifically, linkage disequilibrium within the Flax population tends to be negative whereas within the rust population it tends to be positive. What this means is that if we sample a Flax individual and find that it is carrying a resistance allele at one locus, it is less likely than expected based on allele frequencies that it is also carrying a resistance allele at another locus. In contrast, if we sample a rust individual and find that it is carrying a virulence allele at one locus, it is more likely than we would expect based on allele frequencies that it is also carrying a virulence allele at another locus.

New Questions Arising:

Our simple model of gene-for-gene coevolution between Flax and Flax-Rust suggests that …. Although thought provoking, the tenuous connection between this prediction and available empirical data immediately raises several important questions:

• Are the patterns of epistasis and linkage disequilibrium we uncovered for our simple gene-for-gene model also likely to occur in other species interactions?
• If species interactions are mediated by quantitative traits, rather than molecular recognition, should we still expect epistasis and linkage disequilibrium to matter?

In the next two sections, we will develop extensions of our two-locus model that will help us to answer these questions and gain further insight into the process of coevolution in multi-locus systems.
Extensions

Extension 1: Evaluating the consistency of epistasis and disequilibrium in coevolving interactions

Just how general are the patterns of epistasis and linkage disequilibrium we observed in our investigation of gene-for-gene coevolution between wild flax and flax rust? Perhaps the easiest way to answer this question is to jump right into developing and analyzing a model of coevolution for a very different interaction. Because it is still fresh in our memory from the previous chapter, let's use the interaction between the snail XX and its schistosome parasite, XX, as our test case. As a quick refresher, recent empirical studies have identified two molecules that have been hypothesized to interact and play an important role in the outcome of an encounter between an individual snail and an individual schistosome. Specifically, snails deploy FREP molecules that bind to specific mucin molecules produced by the schistosome; when the FREP "matches" the mucin, the infection is cleared. In the previous chapter, we developed interaction matrices describing this molecular interaction under various assumptions, all of which involved only a single diploid locus. As many of you probably guessed, the assumption that the structure of these molecules depends on only a single genetic locus is most likely false (REFS).

Given this information, let's revisit this interaction and explore how it is likely to coevolve given a fresh set of genetic assumptions. To keep things simple, we are going to forget all about diploidy and simply focus in on a scenario where FREP and mucin are produced by only a pair of haploid, diallelic loci in each species. The particular assumption we are going to make to adapt this interaction to the two-locus modeling framework we developed earlier in this chapter is that each genotype makes a FREP or mucin molecule with a unique conformation.
If the genotypes of snail and schistosome match, the host FREP molecule binds to the parasite mucin molecule and the infection is cleared. If, in contrast, the genotypes of the two species do not match, the snail FREP fails to bind to the schistosome mucin and the infection succeeds. Together, these assumptions lead to the following interaction matrix describing the probability that a schistosome genotype evades recognition and successfully infects a host genotype:

$$\alpha = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \tag{23}$$

where snail genotypes are in columns {AB, Ab, aB, ab} and schistosome genotypes are in rows {AB, Ab, aB, ab}. All we need to do now is plug the appropriate values of $\alpha$ from (23) into the general expressions for coevolutionary change in two-locus systems we developed previously (17-18). It probably comes as little surprise, however, that even after a lengthy session of algebra, there is really no way to write these exact recursion equations down in a way that helps us to understand what is going on. As before, this level of complexity suggests that it is time to deploy an approximation; in this case, the best possible approximation for the job is Quasi-Linkage Equilibrium.

Developing a QLE approximation for coevolution between the snail B. glabrata and the schistosome parasite S. mansoni makes the exact same assumptions and follows the exact same steps as when we used it to study coevolution between wild flax and flax rust. In short, we assume selection is weak (order $\varepsilon$) and recombination sufficiently common for linkage disequilibrium to also be small (order $\varepsilon$). We then describe evolutionary change in allele frequencies and linkage disequilibrium using their first order Taylor Series expansions in $\varepsilon$ (see accompanying Mathematica notebook).
The result is the following system of approximate difference equations describing evolutionary change in the snail:

$$\Delta p_{X,A} \approx -s_X \, p_{X,A} q_{X,A} (1 - 2p_{Y,A}) \left(q_{Y,B} - p_{X,B} (1 - 2p_{Y,B})\right) \tag{24a}$$
$$\Delta p_{X,B} \approx -s_X \, p_{X,B} q_{X,B} (1 - 2p_{Y,B}) \left(q_{Y,A} - p_{X,A} (1 - 2p_{Y,A})\right) \tag{24b}$$
$$\Delta D_X \approx s_X \, p_{X,A} q_{X,A} p_{X,B} q_{X,B} (1 - 2p_{Y,A})(1 - 2p_{Y,B})(1 - r_X) - r_X D_X \tag{24c}$$

and evolutionary change in the schistosome:

$$\Delta p_{Y,A} \approx s_Y \, p_{Y,A} q_{Y,A} (1 - 2p_{X,A}) \left(q_{Y,B} - p_{X,B} (1 - 2p_{Y,B})\right) \tag{25a}$$
$$\Delta p_{Y,B} \approx s_Y \, p_{Y,B} q_{Y,B} (1 - 2p_{X,B}) \left(q_{Y,A} - p_{X,A} (1 - 2p_{Y,A})\right) \tag{25b}$$
$$\Delta D_Y \approx -s_Y \, p_{Y,A} q_{Y,A} p_{Y,B} q_{Y,B} (1 - 2p_{X,A})(1 - 2p_{X,B})(1 - r_Y) - r_Y D_Y \tag{25c}$$

where all terms of order $\varepsilon^2$ and greater have been ignored. As before, our QLE approximation decouples coevolutionary changes in allele frequencies from changes in linkage disequilibrium, allowing us to solve for the quasi-equilibrium values of linkage disequilibrium:

$$\hat{D}_X \approx \frac{s_X \, p_{X,A} q_{X,A} p_{X,B} q_{X,B} (1 - 2p_{Y,A})(1 - 2p_{Y,B})(1 - r_X)}{r_X} \tag{26a}$$
$$\hat{D}_Y \approx -\frac{s_Y \, p_{Y,A} q_{Y,A} p_{Y,B} q_{Y,B} (1 - 2p_{X,A})(1 - 2p_{X,B})(1 - r_Y)}{r_Y} \tag{26b}$$

Just a quick look at these expressions reveals that they are quite different from those we found for flax and flax rust. Whereas the sign of linkage disequilibrium was constant and predictable for Flax and Flax Rust, these expressions contain terms that allow the sign of linkage disequilibrium to fluctuate as allele frequencies change over time within the snail and schistosome populations. The underlying cause of these changes in the sign and magnitude of linkage disequilibrium is what is often referred to as fluctuating epistasis (REFS). What this means is that the sign and magnitude of epistasis change over time within snail and schistosome populations as differing combinations of alleles become more infective or more resistant than predicted by the individual alleles themselves. To be more specific, the terms $(1 - 2p_{Y,A})(1 - 2p_{Y,B})$ and $(1 - 2p_{X,A})(1 - 2p_{X,B})$ appearing in equations (26) measure which genotypes (and thus molecular structures of mucin and FREP molecules) are most common within the schistosome and snail populations.
If the most common alleles within the schistosome population are A and B (or a and b), then the first of these two terms is positive, indicating epistatic selection in favor of AB and ab genotypes within the snail population, and a corresponding excess of these genotypes (positive linkage disequilibrium). Similarly, if the second of these terms is positive, because the A and B alleles (or a and b) are the most frequent within the snail population, the negative sign in (26b) indicates epistatic selection against AB and ab genotypes within the schistosome population, and a corresponding excess of Ab and aB genotypes (negative linkage disequilibrium). Although these considerations clearly show that the potential exists for the sign of epistatic selection and linkage disequilibrium to fluctuate within this system, equations (26) make it equally clear that this will only occur if allele frequencies themselves fluctuate over time.

Drawing conclusions about the sign of linkage disequilibrium therefore requires that we understand how allele frequencies change within both snail and schistosome populations. Unfortunately, drawing definitive conclusions about the coevolution of allele frequencies within this system is much more challenging than it was for the gene-for-gene model we studied earlier in this chapter. As a consequence, simple inspection of the QLE approximation no longer suffices, and we must take the more formal mathematical approach of identifying equilibria and evaluating their local stability. If we follow the standard protocol and first identify equilibria by setting equations (24a,b) and (25a,b) equal to zero and solving for the allele frequencies that satisfy the equality, we find that there are thirty possible equilibria. Clearly, it isn't going to be possible to neatly summarize all thirty of these equilibria in a tidy table.
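The equilibrium condition just described can be checked numerically for any candidate point, and local behavior can be probed by iterating the recursions from a nearby starting point. The sketch below is a minimal Python illustration, not the analysis in the accompanying notebook: the function names, the selection coefficients, and the starting frequencies are all hypothetical, and iterating from one perturbed point is only a rough stand-in for a formal eigenvalue analysis.

```python
import math


def delta_p(p, s_x, s_y):
    """Per-generation allele frequency changes from (24a,b) and (25a,b).

    p = (p_xa, p_xb, p_ya, p_yb): frequencies of alleles A and B in the
    snail (X) and the schistosome (Y).
    """
    p_xa, p_xb, p_ya, p_yb = p
    # Shared factors: q_YB - p_XB (1 - 2 p_YB) and q_YA - p_XA (1 - 2 p_YA)
    term_b = (1 - p_yb) - p_xb * (1 - 2 * p_yb)
    term_a = (1 - p_ya) - p_xa * (1 - 2 * p_ya)
    return (
        -s_x * p_xa * (1 - p_xa) * (1 - 2 * p_ya) * term_b,  # (24a)
        -s_x * p_xb * (1 - p_xb) * (1 - 2 * p_yb) * term_a,  # (24b)
        s_y * p_ya * (1 - p_ya) * (1 - 2 * p_xa) * term_b,   # (25a)
        s_y * p_yb * (1 - p_yb) * (1 - 2 * p_xb) * term_a,   # (25b)
    )


def is_equilibrium(p, s_x, s_y, tol=1e-12):
    """True if every allele frequency change vanishes (within tol) at p."""
    return all(abs(change) < tol for change in delta_p(p, s_x, s_y))


def distance_after(p_start, p_eq, s_x, s_y, generations):
    """Iterate the recursions and return the final Euclidean distance to p_eq."""
    p = tuple(p_start)
    for _ in range(generations):
        p = tuple(freq + change for freq, change in zip(p, delta_p(p, s_x, s_y)))
    return math.dist(p, p_eq)


# Hypothetical selection coefficients and candidate points (assumptions)
S_X, S_Y = 0.1, 0.1
INTERIOR = (0.5, 0.5, 0.5, 0.5)   # all alleles at frequency 1/2
CORNER = (0.0, 0.0, 1.0, 1.0)     # snail fixed for ab, schistosome for AB
START = (0.51, 0.5, 0.5, 0.49)    # small perturbation of INTERIOR

initial_distance = math.dist(START, INTERIOR)
final_distance = distance_after(START, INTERIOR, S_X, S_Y, generations=1000)
```

Comparing `final_distance` with `initial_distance` indicates whether trajectories starting near the interior point drift toward it or away from it under these assumed parameter values.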
Instead, let's relegate the majority of the equilibria to the accompanying Mathematica notebook and focus on the five equilibria that exist, are not purely unstable, and help us to learn something important about coevolution between the snail B. glabrata and its schistosome parasite S. mansoni (Table X).

Table X. A subset of equilibria and their local stability

$p_{X,A}$   $p_{X,B}$   $p_{Y,A}$   $p_{Y,B}$   Eigenvalues   Stability
0           0           1           1           0, 0, 0, 0    Neutrally stable
0           1           1           0           0, 0, 0, 0    Neutrally stable
1           0           0           1           0, 0, 0, 0    Neutrally stable
1           1           0           0           0, 0, 0, 0    Neutrally stable
1/2         1/2         1/2         1/2         $1 - \tfrac{1}{4} i \sqrt{s_X}\sqrt{s_Y}$ (twice), $1 + \tfrac{1}{4} i \sqrt{s_X}\sqrt{s_Y}$ (twice)   Cycles with increasing amplitude

What we learn from stability… Epistasis matters! Why should the neutrally stable equilibria not be downright unstable? The answer lies in epistasis. Although the host would benefit from escaping, it cannot, because a single mutation or a single rare allele does it no good; only the right combination of two mutations or two rare alleles helps. The host is trapped by epistasis.

Extension 2: Evaluating the importance of epistasis and disequilibrium for quantitative traits

HERE THERE ARE SO MANY POINTS TO MAKE… ASSUMPTIONS OF LANDE LAND, MAPPING BETWEEN FITNESS FUNCTIONS AND EPISTASIS… ADDITIVE TRAITS VS ADDITIVE FITNESSES. MAPPING BETWEEN LD AND VG, ETC…

Conclusions and Synthesis

Like dominance before, epistasis plays an important role in the dynamics and outcome of coevolution. Yet here, too, we know so little about actual patterns of epistasis within real systems that it is almost shocking, and certainly humbling. This, along with dominance, is the frontier of coevolutionary genetics.

Figure Legends

References

Dybdahl, M. F., C. E. Jenkins, and S. L. Nuismer. 2014. Identifying the Molecular Basis of Host-Parasite Coevolution: Merging Models and Mechanisms. American Naturalist 184:1-13.

Mitta, G., C. M. Adema, B. Gourbal, E. S. Loker, and A. Theron. 2012. Compatibility polymorphism in snail/schistosome interactions: From field to theory to molecular mechanisms.
Developmental and Comparative Immunology 37:1-8.