Chapter 6. Multi-locus coevolution, epistasis, and linkage disequilibrium

Biological Motivation

Clearly, coevolution often involves more than a single locus. Here we develop a basic framework for studying two-locus systems, introducing the concepts of epistasis, recombination, and linkage disequilibrium. After studying how coevolution proceeds in a simple two-locus system (motivated by ???), we move on to explore ???

INTRODUCE EPISTASIS AND LINKAGE DISEQUILIBRIUM

Key Questions:

• What patterns of epistasis are likely to be generated by species interactions?
• How do these patterns of epistasis influence the dynamics and outcome of coevolution?
• What patterns of linkage disequilibrium do we expect to emerge in coevolving systems?

Building a 2-locus model of coevolution

Our goal is to develop the simplest possible model that captures the potentially important consequences of multi-locus gene-for-gene (GFG) interactions for coevolution between flax and flax rust. The simplest starting point is to focus on a single pair of loci in haploid sexual species. Within haploid sexuals, recombination occurs in a transient diploid phase but selection occurs in the haploid phase. Thus, we avoid the complexities of diploidy that we struggled with in the previous chapter. Of course, ignoring diploidy comes at the cost of reduced realism, since both flax and flax rust are, indeed, diploid species.
We imagine that rust and flax individuals encounter one another at random, and that infection has negative fitness consequences for the flax and positive fitness consequences for the rust. Assuming random encounters, and that the probability of infection depends upon the two-locus genotypes of flax and rust, the fitness of the four possible flax genotypes is given by:

$$W_{X,AB} = 1 - s_X\left(Y_{AB}\,\alpha_{AB,AB} + Y_{Ab}\,\alpha_{AB,Ab} + Y_{aB}\,\alpha_{AB,aB} + Y_{ab}\,\alpha_{AB,ab}\right) \tag{1a}$$

$$W_{X,Ab} = 1 - s_X\left(Y_{AB}\,\alpha_{Ab,AB} + Y_{Ab}\,\alpha_{Ab,Ab} + Y_{aB}\,\alpha_{Ab,aB} + Y_{ab}\,\alpha_{Ab,ab}\right) \tag{1b}$$

$$W_{X,aB} = 1 - s_X\left(Y_{AB}\,\alpha_{aB,AB} + Y_{Ab}\,\alpha_{aB,Ab} + Y_{aB}\,\alpha_{aB,aB} + Y_{ab}\,\alpha_{aB,ab}\right) \tag{1c}$$

$$W_{X,ab} = 1 - s_X\left(Y_{AB}\,\alpha_{ab,AB} + Y_{Ab}\,\alpha_{ab,Ab} + Y_{aB}\,\alpha_{ab,aB} + Y_{ab}\,\alpha_{ab,ab}\right) \tag{1d}$$

where $Y_j$ is the frequency of rust genotype $j$, $\alpha_{i,j}$ is the probability that a rust of genotype $j$ infects a flax of genotype $i$, and $s_X$ scales the fitness cost of infection to the flax. Similarly, the fitness of the four possible rust genotypes is given by:

$$W_{Y,AB} = 1 - s_Y\left(1 - X_{AB}\,\alpha_{AB,AB} - X_{Ab}\,\alpha_{Ab,AB} - X_{aB}\,\alpha_{aB,AB} - X_{ab}\,\alpha_{ab,AB}\right) \tag{2a}$$

$$W_{Y,Ab} = 1 - s_Y\left(1 - X_{AB}\,\alpha_{AB,Ab} - X_{Ab}\,\alpha_{Ab,Ab} - X_{aB}\,\alpha_{aB,Ab} - X_{ab}\,\alpha_{ab,Ab}\right) \tag{2b}$$

$$W_{Y,aB} = 1 - s_Y\left(1 - X_{AB}\,\alpha_{AB,aB} - X_{Ab}\,\alpha_{Ab,aB} - X_{aB}\,\alpha_{aB,aB} - X_{ab}\,\alpha_{ab,aB}\right) \tag{2c}$$

$$W_{Y,ab} = 1 - s_Y\left(1 - X_{AB}\,\alpha_{AB,ab} - X_{Ab}\,\alpha_{Ab,ab} - X_{aB}\,\alpha_{aB,ab} - X_{ab}\,\alpha_{ab,ab}\right) \tag{2d}$$

where $X_i$ is the frequency of flax genotype $i$ and $s_Y$ scales the fitness cost to the rust of failing to infect.

Mathematica Resources: http://www.webpages.uidaho.edu/~snuismer/Nuismer_Lab/the_theory_of_coevolution.htm

Now, if we assume that the probability of survival to mating for the various flax and rust genotypes depends on these fitnesses, we can calculate the frequency of each genotype after selection but prior to random mating. As before, we calculate these frequencies by multiplying the current frequency of each genotype by its relative fitness.
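The fitness expressions (1-2) and the selection step just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the function names and dictionary layout are introduced here, and the infection matrix built by `example_alpha` is hypothetical, since the text has not specified the values of $\alpha$. The example matrix assumes that an upper-case allele confers resistance in the flax and virulence in the rust, and that infection fails whenever a resistance allele meets a lower-case rust allele at the same locus.

```python
GENOTYPES = ["AB", "Ab", "aB", "ab"]


def example_alpha():
    """Hypothetical infection matrix alpha[host][pathogen] (an assumption).

    Infection fails (alpha = 0) if, at either locus, the host carries an
    upper-case allele and the pathogen a lower-case one; otherwise alpha = 1.
    """
    alpha = {}
    for host in GENOTYPES:
        alpha[host] = {}
        for path in GENOTYPES:
            blocked = any(h.isupper() and p.islower() for h, p in zip(host, path))
            alpha[host][path] = 0.0 if blocked else 1.0
    return alpha


def host_fitness(Y, alpha, s_x):
    """Equations (1a-d): fitness of each host genotype."""
    return {i: 1.0 - s_x * sum(Y[j] * alpha[i][j] for j in GENOTYPES)
            for i in GENOTYPES}


def pathogen_fitness(X, alpha, s_y):
    """Equations (2a-d): fitness of each pathogen genotype."""
    return {j: 1.0 - s_y * (1.0 - sum(X[i] * alpha[i][j] for i in GENOTYPES))
            for j in GENOTYPES}


def select(freqs, W):
    """Frequency after selection = current frequency times relative fitness."""
    w_bar = sum(freqs[g] * W[g] for g in GENOTYPES)  # population mean fitness
    return {g: freqs[g] * W[g] / w_bar for g in GENOTYPES}


X = {g: 0.25 for g in GENOTYPES}  # illustrative flax genotype frequencies
Y = {g: 0.25 for g in GENOTYPES}  # illustrative rust genotype frequencies
alpha = example_alpha()
W_X = host_fitness(Y, alpha, s_x=0.1)
W_Y = pathogen_fitness(X, alpha, s_y=0.1)
X_sel = select(X, W_X)
Y_sel = select(Y, W_Y)
print(W_X, W_Y)
print(X_sel, Y_sel)
```

Keeping `alpha` as an argument means any other infection matrix can be swapped in without touching the fitness or selection functions.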
For the flax, multiplying each genotype frequency by its relative fitness yields the following expressions:

$$X'_{AB} = \frac{X_{AB}\,W_{X,AB}}{\bar{W}_X} \tag{3a}$$

$$X'_{Ab} = \frac{X_{Ab}\,W_{X,Ab}}{\bar{W}_X} \tag{3b}$$

$$X'_{aB} = \frac{X_{aB}\,W_{X,aB}}{\bar{W}_X} \tag{3c}$$

$$X'_{ab} = \frac{X_{ab}\,W_{X,ab}}{\bar{W}_X} \tag{3d}$$

where, as usual, the symbol $\bar{W}_X$ is the population mean fitness of species X and is given by:

$$\bar{W}_X = X_{AB}\,W_{X,AB} + X_{Ab}\,W_{X,Ab} + X_{aB}\,W_{X,aB} + X_{ab}\,W_{X,ab} \tag{3e}$$

The same procedure can now be applied to the rust population to calculate the frequency of two-locus genotypes there after selection but prior to mating:

$$Y'_{AB} = \frac{Y_{AB}\,W_{Y,AB}}{\bar{W}_Y} \tag{4a}$$

$$Y'_{Ab} = \frac{Y_{Ab}\,W_{Y,Ab}}{\bar{W}_Y} \tag{4b}$$

$$Y'_{aB} = \frac{Y_{aB}\,W_{Y,aB}}{\bar{W}_Y} \tag{4c}$$

$$Y'_{ab} = \frac{Y_{ab}\,W_{Y,ab}}{\bar{W}_Y} \tag{4d}$$

where, as usual, the symbol $\bar{W}_Y$ is the population mean fitness of species Y and is given by:

$$\bar{W}_Y = Y_{AB}\,W_{Y,AB} + Y_{Ab}\,W_{Y,Ab} + Y_{aB}\,W_{Y,aB} + Y_{ab}\,W_{Y,ab} \tag{4e}$$

OK, so now we know the frequencies of the various genotypes just before mating ensues. How can we incorporate the changes to genotype frequencies that accrue during mating? If we are willing to assume that both flax and rust mate at random and have quite large population sizes, we can derive basic expressions for changes in genotype frequencies. The long and tedious way to go about this is to first tabulate the frequency of offspring with various genotypes that are produced by all possible combinations of parents (Table 1).

RECOMBINATION! INTRODUCE IT HERE

Table 1.
Genotype frequencies produced by random matings. Here $r$ is the rate of recombination between the A and B loci. Mating frequencies are written for the flax; for the rust, replace $X'$ with $Y'$.

| Maternal × Paternal genotypes | Frequency of mating | Offspring AB | Offspring Ab | Offspring aB | Offspring ab |
|---|---|---|---|---|---|
| AB × AB | $X'_{AB}X'_{AB}$ | 1 | 0 | 0 | 0 |
| AB × Ab | $X'_{AB}X'_{Ab}$ | 1/2 | 1/2 | 0 | 0 |
| AB × aB | $X'_{AB}X'_{aB}$ | 1/2 | 0 | 1/2 | 0 |
| AB × ab | $X'_{AB}X'_{ab}$ | $(1-r)/2$ | $r/2$ | $r/2$ | $(1-r)/2$ |
| Ab × AB | $X'_{Ab}X'_{AB}$ | 1/2 | 1/2 | 0 | 0 |
| Ab × Ab | $X'_{Ab}X'_{Ab}$ | 0 | 1 | 0 | 0 |
| Ab × aB | $X'_{Ab}X'_{aB}$ | $r/2$ | $(1-r)/2$ | $(1-r)/2$ | $r/2$ |
| Ab × ab | $X'_{Ab}X'_{ab}$ | 0 | 1/2 | 0 | 1/2 |
| aB × AB | $X'_{aB}X'_{AB}$ | 1/2 | 0 | 1/2 | 0 |
| aB × Ab | $X'_{aB}X'_{Ab}$ | $r/2$ | $(1-r)/2$ | $(1-r)/2$ | $r/2$ |
| aB × aB | $X'_{aB}X'_{aB}$ | 0 | 0 | 1 | 0 |
| aB × ab | $X'_{aB}X'_{ab}$ | 0 | 0 | 1/2 | 1/2 |
| ab × AB | $X'_{ab}X'_{AB}$ | $(1-r)/2$ | $r/2$ | $r/2$ | $(1-r)/2$ |
| ab × Ab | $X'_{ab}X'_{Ab}$ | 0 | 1/2 | 0 | 1/2 |
| ab × aB | $X'_{ab}X'_{aB}$ | 0 | 0 | 1/2 | 1/2 |
| ab × ab | $X'_{ab}X'_{ab}$ | 0 | 0 | 0 | 1 |

Table 1 provides the raw material for calculating the frequency of the various genotypes in the offspring generation. All we need to do now is sum the entries in each offspring column, weighting each entry by the frequency with which the two relevant parental genotypes encounter one another at random and mate. Mathematically, this amounts to evaluating the following expression for each of the four possible offspring genotypes, $i$, in flax:

$$X''_i = \sum_{j=1}^{4}\sum_{k=1}^{4} X'_j\,X'_k\,\Pi_{X,\,j+k\to i} \tag{5a}$$

and the following expression for each of the four possible offspring genotypes in rust:

$$Y''_i = \sum_{j=1}^{4}\sum_{k=1}^{4} Y'_j\,Y'_k\,\Pi_{Y,\,j+k\to i} \tag{5b}$$

where $\Pi_{X,\,j+k\to i}$ and $\Pi_{Y,\,j+k\to i}$ are the probabilities that two parents with genotypes $j$ and $k$ produce an offspring of genotype $i$ within the flax and rust populations, respectively. Although equations (5) help us see, mechanistically, how the genotype frequencies within one generation are translated into those of the next through segregation and recombination, they are quite clunky and not terribly insightful. Fortunately, these equations can be greatly simplified and re-expressed in a way that is both much easier to implement and much more biologically insightful.
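The double sum in equations (5) can be evaluated directly by generating each row of Table 1 on the fly. The sketch below is illustrative only: the function names, the example frequencies, and the value of `r` are assumptions made here, and the rule used to build each row (a non-recombinant offspring copies one parent, a recombinant offspring takes its A allele from one parent and its B allele from the other) is simply one way to reproduce the entries of Table 1.

```python
GENOTYPES = ["AB", "Ab", "aB", "ab"]


def offspring_probs(mother, father, r):
    """One row of Table 1: offspring genotype probabilities for one mating."""
    probs = {g: 0.0 for g in GENOTYPES}
    # Without recombination (probability 1 - r) the offspring carries the
    # genotype of one parent or the other, each with probability 1/2.
    probs[mother] += (1.0 - r) / 2.0
    probs[father] += (1.0 - r) / 2.0
    # With recombination (probability r) the A and B alleles come from
    # different parents.
    probs[mother[0] + father[1]] += r / 2.0
    probs[father[0] + mother[1]] += r / 2.0
    return probs


def random_mating(freqs, r):
    """Equation (5): sum over all ordered pairs of parental genotypes."""
    new = {g: 0.0 for g in GENOTYPES}
    for j in GENOTYPES:
        for k in GENOTYPES:
            row = offspring_probs(j, k, r)
            for i in GENOTYPES:
                new[i] += freqs[j] * freqs[k] * row[i]
    return new


# Illustrative genotype frequencies after selection (hypothetical values).
after_selection = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
offspring = random_mating(after_selection, r=0.2)
print(offspring)
```

Random mating leaves the allele frequencies unchanged (here the frequency of A stays at 0.5); only the way alleles are packaged into genotypes shifts.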
Plugging away algebraically allows equations (5) to be rewritten as:

$$X''_{AB} = X'_{AB} - r_X D'_X \tag{6a}$$

$$X''_{Ab} = X'_{Ab} + r_X D'_X \tag{6b}$$

$$X''_{aB} = X'_{aB} + r_X D'_X \tag{6c}$$

$$X''_{ab} = X'_{ab} - r_X D'_X \tag{6d}$$

in the flax and as

$$Y''_{AB} = Y'_{AB} - r_Y D'_Y \tag{7a}$$

$$Y''_{Ab} = Y'_{Ab} + r_Y D'_Y \tag{7b}$$

$$Y''_{aB} = Y'_{aB} + r_Y D'_Y \tag{7c}$$

$$Y''_{ab} = Y'_{ab} - r_Y D'_Y \tag{7d}$$

in the rust. In these equations, $r_X$ and $r_Y$ are the rates of recombination between the A and B loci in each species, and $D'_X$ and $D'_Y$ quantify linkage disequilibrium, a measure of the statistical association or covariance between alleles at the A and B loci. Specifically, $D'_X = X'_{AB}X'_{ab} - X'_{Ab}X'_{aB}$ and $D'_Y = Y'_{AB}Y'_{ab} - Y'_{Ab}Y'_{aB}$, such that linkage disequilibrium is positive if there is an excess of AB and ab genotypes within a population and negative if it is, instead, the Ab and aB genotypes that are in excess.

A key insight illuminated by equations (6-7) is that the change in genotype frequencies that occurs in response to random mating depends entirely on the rate of recombination. If no recombination occurs, genotype frequencies within the offspring population remain identical to those within the parental population. If, instead, recombination occurs, genotype frequencies in the offspring generation differ from those in the parental generation by an amount proportional to linkage disequilibrium. Clearly, then, recombination can influence coevolution only in cases where coevolutionary selection, or some other evolutionary force, acts to create linkage disequilibrium within populations of interacting species.

We are now at a point where we have successfully described how genotype frequencies change over the course of a single generation. To maintain some generality, let's wait to substitute in the specific values for fitness corresponding to our GFG model, and simply express how genotype frequencies change in terms of arbitrary fitness values, W.
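The effect of random mating on genotype frequencies can be illustrated with a short sketch. It follows the column sums of Table 1, under which the genotypes that are in excess (AB and ab when $D$ is positive) each lose frequency $rD$ and the other two gain it, so that linkage disequilibrium shrinks by a factor $(1 - r)$ per generation in the absence of selection. The function names and starting frequencies are assumptions chosen for illustration.

```python
def linkage_disequilibrium(f):
    """D = f_AB * f_ab - f_Ab * f_aB."""
    return f["AB"] * f["ab"] - f["Ab"] * f["aB"]


def recombine(f, r):
    """Genotype frequencies after random mating with recombination rate r."""
    d = linkage_disequilibrium(f)
    return {
        "AB": f["AB"] - r * d,  # AB and ab are in excess when D > 0
        "Ab": f["Ab"] + r * d,
        "aB": f["aB"] + r * d,
        "ab": f["ab"] - r * d,
    }


# Hypothetical starting frequencies with positive linkage disequilibrium.
f = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}
for generation in range(1, 6):
    f = recombine(f, r=0.2)
    print(generation, linkage_disequilibrium(f))
```

With `r=0` the function returns the frequencies unchanged, which matches the statement that mating alters nothing when no recombination occurs.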
Substituting (3) into (6) and changing from recursion equations to difference equations yields the following expressions for the change in host genotype frequencies that occurs over the course of a single generation:

$$\Delta X_{AB} = \frac{r_X\left(W_{X,aB}W_{X,Ab}X_{aB}X_{Ab} - W_{X,ab}W_{X,AB}X_{ab}X_{AB}\right) + X_{AB}\left(W_{X,AB} - \bar{W}_X\right)\bar{W}_X}{\bar{W}_X^{2}} \tag{8a}$$

$$\Delta X_{Ab} = \frac{r_X\left(W_{X,ab}W_{X,AB}X_{ab}X_{AB} - W_{X,aB}W_{X,Ab}X_{aB}X_{Ab}\right) + X_{Ab}\left(W_{X,Ab} - \bar{W}_X\right)\bar{W}_X}{\bar{W}_X^{2}} \tag{8b}$$

$$\Delta X_{aB} = \frac{r_X\left(W_{X,ab}W_{X,AB}X_{ab}X_{AB} - W_{X,aB}W_{X,Ab}X_{aB}X_{Ab}\right) + X_{aB}\left(W_{X,aB} - \bar{W}_X\right)\bar{W}_X}{\bar{W}_X^{2}} \tag{8c}$$

$$\Delta X_{ab} = \frac{r_X\left(W_{X,aB}W_{X,Ab}X_{aB}X_{Ab} - W_{X,ab}W_{X,AB}X_{ab}X_{AB}\right) + X_{ab}\left(W_{X,ab} - \bar{W}_X\right)\bar{W}_X}{\bar{W}_X^{2}} \tag{8d}$$

Equations for the pathogen species, Y, are essentially identical and so are not shown. We are now at a point where we could, if we wished, simply simulate the process of coevolution by plugging in the values for fitness we derived previously for the GFG system (equations 1-2) and iterating equations (8). Although this approach would surely provide us with some insights into the process of coevolution, a more insightful and elegant approach is to first make a change of variables (Appendix 3) that allows us to focus on allele frequencies and linkage disequilibrium rather than genotype frequencies. In addition to facilitating biological interpretation and intuition, this change of variables simplifies our model by reducing the number of variables we follow for each species from four in equations (8) to three, which is the actual number of degrees of freedom in the system.

In order to make the change of variables from genotype frequencies to allele frequencies and linkage disequilibrium, we first need to clearly define the new variables. Specifically, we define allele frequencies:

$$p_{X,A} = X_{AB} + X_{Ab} \tag{9a}$$

$$p_{X,B} = X_{AB} + X_{aB} \tag{9b}$$

$$p_{Y,A} = Y_{AB} + Y_{Ab} \tag{9c}$$

$$p_{Y,B} = Y_{AB} + Y_{aB} \tag{9d}$$

and linkage disequilibrium:

$$D_X = X_{AB}X_{ab} - X_{Ab}X_{aB} \tag{10a}$$

$$D_Y = Y_{AB}Y_{ab} - Y_{Ab}Y_{aB} \tag{10b}$$

for both of the interacting species.
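Definitions (9) and (10) translate directly into code, and the mapping can be inverted, because four genotype frequencies that sum to one carry exactly three degrees of freedom. The sketch below is illustrative: the function names and example frequencies are assumptions, and the inverse mapping is obtained by solving the definitions for the genotype frequencies, with $q = 1 - p$ denoting the frequency of the lower-case allele.

```python
def to_alleles(f):
    """Definitions (9) and (10): allele frequencies and linkage disequilibrium."""
    p_a = f["AB"] + f["Ab"]
    p_b = f["AB"] + f["aB"]
    d = f["AB"] * f["ab"] - f["Ab"] * f["aB"]
    return p_a, p_b, d


def to_genotypes(p_a, p_b, d):
    """Inverse mapping from (p_A, p_B, D) back to genotype frequencies."""
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    return {
        "AB": p_a * p_b + d,
        "Ab": p_a * q_b - d,
        "aB": q_a * p_b - d,
        "ab": q_a * q_b + d,
    }


freqs = {"AB": 0.4, "Ab": 0.1, "aB": 0.2, "ab": 0.3}  # hypothetical values
p_a, p_b, d = to_alleles(freqs)
recovered = to_genotypes(p_a, p_b, d)
print(p_a, p_b, d)
print(recovered)
```

The round trip recovers the original frequencies up to floating-point error, which is a convenient check when moving between the two descriptions of the same population.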
The next step in our change of variables is to write down new recursions that capture the way in which our new variables change over the course of a single generation. The easiest way to do this is to substitute the predicted values for the genotype frequencies in the next generation (e.g., $X''_{AB}$, $X''_{Ab}$, etc.) into expressions (9-10), yielding:

$$p''_{X,A} = \frac{W_{X,Ab}X_{Ab} + W_{X,AB}X_{AB}}{\bar{W}_X} \tag{11a}$$

$$p''_{X,B} = \frac{W_{X,aB}X_{aB} + W_{X,AB}X_{AB}}{\bar{W}_X} \tag{11b}$$

$$D''_X = \frac{\left(W_{X,aB}W_{X,Ab}X_{aB}X_{Ab} - W_{X,ab}W_{X,AB}X_{ab}X_{AB}\right)\left(r_X - 1\right)}{\bar{W}_X^{2}} \tag{11c}$$

$$p''_{Y,A} = \frac{W_{Y,Ab}Y_{Ab} + W_{Y,AB}Y_{AB}}{\bar{W}_Y} \tag{12a}$$

$$p''_{Y,B} = \frac{W_{Y,aB}Y_{aB} + W_{Y,AB}Y_{AB}}{\bar{W}_Y} \tag{12b}$$

$$D''_Y = \frac{\left(W_{Y,aB}W_{Y,Ab}Y_{aB}Y_{Ab} - W_{Y,ab}W_{Y,AB}Y_{ab}Y_{AB}\right)\left(r_Y - 1\right)}{\bar{W}_Y^{2}} \tag{12c}$$

Obviously, we still have a bit of a problem! Our equations now contain a mix of old and new variables, which is never a good thing. The way to move forward is to recognize that the genotype frequencies appearing on the right-hand sides of the equations can be rewritten using definitions (9-10) in the following way:

$$X_{AB} = p_{X,A}\,p_{X,B} + D_X \tag{13a}$$

$$X_{Ab} = p_{X,A}\,q_{X,B} - D_X \tag{13b}$$

$$X_{aB} = q_{X,A}\,p_{X,B} - D_X \tag{13c}$$

$$X_{ab} = q_{X,A}\,q_{X,B} + D_X \tag{13d}$$

$$Y_{AB} = p_{Y,A}\,p_{Y,B} + D_Y \tag{14a}$$

$$Y_{Ab} = p_{Y,A}\,q_{Y,B} - D_Y \tag{14b}$$

$$Y_{aB} = q_{Y,A}\,p_{Y,B} - D_Y \tag{14c}$$

$$Y_{ab} = q_{Y,A}\,q_{Y,B} + D_Y \tag{14d}$$

where $q = 1 - p$ is the frequency of the lower-case allele at the corresponding locus (for example, $q_{X,A} = 1 - p_{X,A}$). Substituting (13 and 14) into (11 and 12) and doing a bit of algebra allows us to finally complete our change of variables and arrive at a set of equations expressed entirely in terms of the new variables.
$$p''_{X,A} = \frac{p_{X,A}\left(q_{X,B}W_{X,Ab} + p_{X,B}W_{X,AB}\right) + \left(W_{X,AB} - W_{X,Ab}\right)D_X}{\bar{W}_X} \tag{15a}$$

$$p''_{X,B} = \frac{p_{X,B}\left(q_{X,A}W_{X,aB} + p_{X,A}W_{X,AB}\right) + \left(W_{X,AB} - W_{X,aB}\right)D_X}{\bar{W}_X} \tag{15b}$$

$$D''_X = \frac{\left(W_{X,aB}W_{X,Ab}\left(q_{X,A}p_{X,B} - D_X\right)\left(p_{X,A}q_{X,B} - D_X\right) - W_{X,ab}W_{X,AB}\left(q_{X,A}q_{X,B} + D_X\right)\left(p_{X,A}p_{X,B} + D_X\right)\right)\left(r_X - 1\right)}{\bar{W}_X^{2}} \tag{15c}$$

and,

$$p''_{Y,A} = \frac{p_{Y,A}\left(q_{Y,B}W_{Y,Ab} + p_{Y,B}W_{Y,AB}\right) + \left(W_{Y,AB} - W_{Y,Ab}\right)D_Y}{\bar{W}_Y} \tag{16a}$$

$$p''_{Y,B} = \frac{p_{Y,B}\left(q_{Y,A}W_{Y,aB} + p_{Y,A}W_{Y,AB}\right) + \left(W_{Y,AB} - W_{Y,aB}\right)D_Y}{\bar{W}_Y} \tag{16b}$$

$$D''_Y = \frac{\left(W_{Y,aB}W_{Y,Ab}\left(q_{Y,A}p_{Y,B} - D_Y\right)\left(p_{Y,A}q_{Y,B} - D_Y\right) - W_{Y,ab}W_{Y,AB}\left(q_{Y,A}q_{Y,B} + D_Y\right)\left(p_{Y,A}p_{Y,B} + D_Y\right)\right)\left(r_Y - 1\right)}{\bar{W}_Y^{2}} \tag{16c}$$

PHHHEEEEWWWWYYYY! We have done it. We now have a set of equations describing how allele frequencies and linkage disequilibrium evolve in response to natural selection and random mating over the course of a single generation. With the bulk of the mathematical tedium behind us, we can now move on to use these equations to gain insights into the process of coevolution between rust and flax.

Analyzing the model

With our general model in hand, we can return to flax and flax rust and begin to answer the questions we posed at the beginning of the chapter. In order to do so effectively, however, we need to take one final mathematical leap. Specifically, we introduce the idea of a quasi-linkage equilibrium (QLE) approximation that will allow us to better understand coevolution both here and in later chapters. Although frequently misunderstood, this approximation assumes only that selection is relatively weak and that recombination is relatively frequent. Often, the idea of weak selection causes a knee-jerk reaction that the approximation does not allow us to understand cases of “real world” coevolution. In my opinion, this knee-jerk reaction is generally misguided. When we say “weak selection” in the context of a QLE approximation, we mean, as a rule of thumb, that selection should be less than 5% per generation.
Although this qualifies as weak mathematically, it is certainly quite strong, and not often observed, within natural populations. The second misconception about the QLE approximation is that it assumes linkage disequilibrium is zero. Again, this is untrue. Applying the QLE approximation to our model of GFG coevolution yields the following approximate expressions:

$$\Delta p_{X,A} \approx s_X\,p_{X,A}\,q_{X,A}\,q_{Y,A}\left(1 - p_{X,B}\,q_{Y,B}\right) \tag{17a}$$

$$\Delta p_{X,B} \approx s_X\,p_{X,B}\,q_{X,B}\,q_{Y,B}\left(1 - p_{X,A}\,q_{Y,A}\right) \tag{17b}$$

$$\Delta D_X \approx -s_X\left(1 - r_X\right)p_{X,A}\,q_{X,A}\,p_{X,B}\,q_{X,B}\,q_{Y,A}\,q_{Y,B} - r_X D_X \tag{17c}$$

$$\Delta p_{Y,A} \approx s_Y\,p_{Y,A}\,q_{Y,A}\,p_{X,A}\left(1 - p_{X,B}\,q_{Y,B}\right) \tag{18a}$$

$$\Delta p_{Y,B} \approx s_Y\,p_{Y,B}\,q_{Y,B}\,p_{X,B}\left(1 - p_{X,A}\,q_{Y,A}\right) \tag{18b}$$

$$\Delta D_Y \approx s_Y\left(1 - r_Y\right)p_{Y,A}\,q_{Y,A}\,p_{Y,B}\,q_{Y,B}\,p_{X,A}\,p_{X,B} - r_Y D_Y \tag{18c}$$

Now that is a pretty set of equations! The beauty of the QLE approximation, and the primary reason for using it, is that it allows us to “see” things about the biology of a system that we might otherwise spend hours upon hours simulating and still never pick up on. For instance, here we can immediately see that coevolutionary change in allele frequencies is independent of linkage disequilibrium. As a result, we can solve for the quasi-equilibrium values of linkage disequilibrium very easily by setting (17c and 18c) equal to zero and solving for $D_X$ and $D_Y$:

$$\hat{D}_X \approx -\frac{s_X\left(1 - r_X\right)p_{X,A}\,q_{X,A}\,p_{X,B}\,q_{X,B}\,q_{Y,A}\,q_{Y,B}}{r_X} \tag{19a}$$

$$\hat{D}_Y \approx \frac{s_Y\left(1 - r_Y\right)p_{Y,A}\,q_{Y,A}\,p_{Y,B}\,q_{Y,B}\,p_{X,A}\,p_{X,B}}{r_Y} \tag{19b}$$

A quick inspection of (19) shows that the sign of linkage disequilibrium should differ between flax and flax rust. Specifically, linkage disequilibrium between resistance genes within the flax should always be negative, whereas linkage disequilibrium between virulence genes in the rust should always be positive. Why does this consistent pattern arise? It appears to stem from the fact that the host needs only one resistance (R) gene to “win”, whereas the pathogen needs two virulence (V) genes to “win”. As a result, the sign of epistasis differs between the two species. Our QLE approximation has already unearthed a valuable insight about the form of epistasis and the sign of linkage disequilibrium that we expect to emerge from GFG coevolution.
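The predicted signs of linkage disequilibrium can be checked numerically with a single generation of selection followed by random mating, starting from populations with no linkage disequilibrium. The sketch below is only illustrative and rests on assumptions that are not spelled out in the text: the infection matrix from `example_alpha` is hypothetical (upper-case alleles confer resistance in the flax and virulence in the rust, and infection fails whenever a resistance allele meets a lower-case rust allele at the same locus), and the parameter values and function names are chosen here for convenience. The mating step removes $rD$ from whichever genotypes are in excess, as implied by the column sums of Table 1.

```python
GENOTYPES = ["AB", "Ab", "aB", "ab"]


def example_alpha():
    """ASSUMED infection matrix alpha[host][pathogen]; not given in the text."""
    return {h: {p: 0.0 if any(a.isupper() and b.islower() for a, b in zip(h, p))
                else 1.0 for p in GENOTYPES} for h in GENOTYPES}


def ld(f):
    """Linkage disequilibrium, D = f_AB * f_ab - f_Ab * f_aB."""
    return f["AB"] * f["ab"] - f["Ab"] * f["aB"]


def select(f, W):
    """Selection: multiply each frequency by its relative fitness."""
    w_bar = sum(f[g] * W[g] for g in GENOTYPES)
    return {g: f[g] * W[g] / w_bar for g in GENOTYPES}


def mate(f, r):
    """Random mating: genotypes in excess lose r * D, the others gain it."""
    d = ld(f)
    sign = {"AB": -1.0, "Ab": 1.0, "aB": 1.0, "ab": -1.0}
    return {g: f[g] + sign[g] * r * d for g in GENOTYPES}


def next_generation(X, Y, alpha, s_x, s_y, r_x, r_y):
    """One generation: fitnesses (1-2), selection (3-4), then mating."""
    W_X = {i: 1.0 - s_x * sum(Y[j] * alpha[i][j] for j in GENOTYPES)
           for i in GENOTYPES}
    W_Y = {j: 1.0 - s_y * (1.0 - sum(X[i] * alpha[i][j] for i in GENOTYPES))
           for j in GENOTYPES}
    return mate(select(X, W_X), r_x), mate(select(Y, W_Y), r_y)


# Both species start at linkage equilibrium with all alleles at frequency 0.5.
X = {g: 0.25 for g in GENOTYPES}
Y = {g: 0.25 for g in GENOTYPES}
X1, Y1 = next_generation(X, Y, example_alpha(),
                         s_x=0.05, s_y=0.05, r_x=0.5, r_y=0.5)
print("D_X after one generation:", ld(X1))
print("D_Y after one generation:", ld(Y1))
```

Under these assumptions, one generation is enough to generate negative linkage disequilibrium in the flax and positive linkage disequilibrium in the rust. Iterating `next_generation` would show how these associations change as allele frequencies evolve, although the result depends on the assumed infection matrix.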
Can we push our QLE approximation further to learn about the dynamics and outcomes of coevolution? The place to start is with an analysis of allele frequency change.

Answers to key questions

New Questions Arising:

Extensions

Extension 1: Snails and schistosomes

Extension 2: Quantitative traits

Conclusions and Synthesis

Like dominance before it, epistasis plays an important role in the dynamics and outcome of coevolution. Yet here, too, we know so little about actual patterns of epistasis within real systems that it is almost shocking, and certainly humbling. This, along with dominance, is the frontier of coevolutionary genetics.

References

Dybdahl, M. F., C. E. Jenkins, and S. L. Nuismer. 2014. Identifying the Molecular Basis of Host-Parasite Coevolution: Merging Models and Mechanisms. American Naturalist 184:1-13.

Mitta, G., C. M. Adema, B. Gourbal, E. S. Loker, and A. Theron. 2012. Compatibility polymorphism in snail/schistosome interactions: From field to theory to molecular mechanisms. Developmental and Comparative Immunology 37:1-8.

Figure Legends