MUTATION-SELECTION BALANCE IN MULTI-LOCUS SYSTEMS. I. DUPLICATE GENE ACTION

EVELYN PRITCHETT-EWING

Department of Biology, The University of New Mexico, Albuquerque, New Mexico 87131

Manuscript received October 2, 1979
Revised copy received April 22, 1981

ABSTRACT

A theoretical model is presented that extends the case of selection against homozygous recessives counterbalanced by mutation to a system of n loci. This extension allows analysis of the role of gene duplication in the evolution of new function. The aspect of retention of function for sufficiently long periods of time to allow for divergence vs. silencing of nonfunctional loci is discussed in relation to examples in salmonid and catastomid fishes and in the globin-like clusters.

An important aspect of evolution is the ability to acquire new functions. Gene duplication (OHNO 1970) provides a likely mechanism for allowing divergence in function. Duplicated loci are free to accumulate and “experiment” with mutations, as they are sheltered from selection because of the retention of the normal functioning allele at the ancestral locus. Whether duplications arise tandemly via unequal crossing over or arise from chromosome or genome doubling, there are two possible results: (1) through accumulation of mutations, new function may evolve and be retained, or (2) deleterious nonfunctional products may result and be silenced. The outcome and rate of loss of function depend upon population size (LI 1980; BAILEY, POULTER and STOCKWELL 1978). The importance of the role of duplication in evolution is illustrated quite clearly, for example, in certain fishes, e.g., the salmonids and catastomids that arose from tetraploid ancestors (BAILEY, POULTER and STOCKWELL 1978), and in the evolution of the α- and β-like globin clusters (DAYHOFF 1969; MANIATIS et al. 1980). This problem is addressed here by extending to n loci a two-locus model that was analyzed by CHRISTIANSEN and FRYDENBERG (1977; see also KIMURA and KING 1979).
By incorporating multiple-locus effects, the examples of evolution in some of the fishes and in the globin genes can now be examined rather specifically from a theoretical standpoint. The deterministic model incorporates selection against the completely recessive homozygote, counterbalanced by irreversible mutation. The extension from two to n loci allows for a more complete analysis with respect to the evolution of function. Both tandem and genomic duplications are included, as results hold for all possible recombination values. This type of selection is generally associated with duplicate genes, but can also be applied to genes that interact epistatically.

Genetics 98: 409-415 June, 1981.

THE MODEL

CHRISTIANSEN and FRYDENBERG (1977) examined a two-locus model of duplicate gene action where the relative fitnesses of all genotypes are 1, with the exception of the completely recessive homozygote, whose fitness is $1-s$. In the absence of mutation or some other pressure, the deleterious gamete will, of course, eventually be eliminated, as selection will override any initial effects of recombination to increase the level of this gamete within the population. We note that, depending upon the initial configuration of gametic frequencies and the values of the recombination frequency ($r$) and selection coefficient ($s$), the demise of the deleterious gamete can be extremely slow.

If irreversible mutation is incorporated such that the rates differ at each of the two loci, monomorphism results at one locus and polymorphism at the other. This, then, gives the appearance of a single-locus situation where the frequency of the detrimental gamete is, as in the analogous single-locus model of selection against the recessive, counterbalanced by one-way mutation. If, on the other hand, mutation rates are equal at the two loci, polymorphism at both loci occurs when $r$ is sufficiently larger than the mutation rate.
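The two-locus behavior just described can be explored numerically. The following is a minimal sketch, not the formulation of CHRISTIANSEN and FRYDENBERG: it uses the standard two-locus recursion specialised to the fitness scheme above (all genotypes 1, $aabb$ at $1-s$), followed by one-way mutation. The function names, the gamete ordering (AB, Ab, aB, ab) and all parameter values are assumptions chosen for illustration.

```python
import math

def two_locus_step(x, r, s, mu, nu):
    """One generation: selection against aabb, recombination, one-way mutation.

    x = (x_AB, x_Ab, x_aB, x_ab) are gametic frequencies summing to 1.
    mu is the A -> a mutation rate, nu the B -> b rate (both irreversible).
    """
    x1, x2, x3, x4 = x
    wbar = 1.0 - s * x4 * x4      # mean fitness; only aabb has fitness 1 - s
    d = x1 * x4 - x2 * x3         # linkage disequilibrium
    # Gametes produced after selection and recombination
    # (double heterozygotes have fitness 1, so the r*d term is unweighted).
    y1 = (x1 - r * d) / wbar
    y2 = (x2 + r * d) / wbar
    y3 = (x3 + r * d) / wbar
    # Irreversible mutation, independent at the two loci.
    z1 = (1 - mu) * (1 - nu) * y1
    z2 = (1 - mu) * (y2 + nu * y1)
    z3 = (1 - nu) * (y3 + mu * y1)
    z4 = 1.0 - (z1 + z2 + z3)
    return (z1, z2, z3, z4)

def simulate(r, s, mu, nu, generations, x=(1.0, 0.0, 0.0, 0.0)):
    """Iterate the recursion, starting by default from a population fixed for AB."""
    for _ in range(generations):
        x = two_locus_step(x, r, s, mu, nu)
    return x

if __name__ == "__main__":
    s, mu, nu = 0.5, 0.02, 0.01   # illustrative values with mu > nu
    x = simulate(r=0.5, s=s, mu=mu, nu=nu, generations=5000)
    print("p_A  =", x[0] + x[1])
    print("p_B  =", x[0] + x[2])
    print("x_ab =", x[3], "vs sqrt(nu/s) =", math.sqrt(nu / s))
```

With unequal rates, the locus with the higher rate should lose its functional allele while the frequency of the $ab$ gamete settles near $\sqrt{\nu/s}$, the single-locus value. Setting `mu` equal to `nu` lets one examine the equal-rate case for different values of `r`.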
So, depending upon the relationship of the mutation rates, what is in fact caused by two loci may appear to result from one or two loci. This can then be related to loss or retention of function.

The extension of the model to three or more loci, two alleles per locus, is direct and shows that stability requirements and equilibrium behavior can be predicted from corresponding models involving fewer loci. The major focus will be on the three-locus case.

Consider three loci, two alleles per locus. The eight gametic frequencies are denoted by $x_1 = f(ABC)$, $x_2 = f(ABc)$, $x_3 = f(AbC)$, $x_4 = f(Abc)$, $x_5 = f(aBC)$, $x_6 = f(aBc)$, $x_7 = f(abC)$ and $x_8 = f(abc)$. As before, let the relative fitnesses of all genotypes be 1, with the exception of the completely recessive homozygote, $aabbcc$, whose fitness is $1-s$. Then the new gametic frequencies after selection, $x_i'$, are as given in the recursion relationships listed below, following the notation of STROBECK (1973):

[Equation (1): recursion relationships for $x_1', \ldots, x_8'$; not legible in the source.]

The recursions involve the various measures of disequilibrium; $\bar{w}$, the average fitness over all genotypes, is given by $\bar{w} = 1 - s x_8^2$. Here, $r_1$ denotes the recombination fraction between the A and B loci; $r_2$, the recombination fraction between the B and C loci; and $r_3$, the recombination fraction between the A and C loci.

Now let mutation act to alter the frequencies into the actual gametic frequencies of the next generation. In particular, let the mutation rates be as follows: let $\mu$ be the mutation rate from A to a, $\nu$ the rate from B to b, and $\tau$ the rate from C to c. After mutation, the gametic frequencies can be computed by the equations given below (see KARLIN and MCGREGOR 1971):

$$x_1'' = (1-\mu)(1-\nu)(1-\tau)\,x_1'$$
$$x_2'' = (1-\mu)(1-\nu)\,x_2' + (1-\mu)(1-\nu)\,\tau\,x_1'$$
$$\vdots \tag{2}$$
$$x_8'' = 1 - \sum_{i=1}^{7} x_i''$$

We find that $p_A''$, the frequency of A after selection and mutation, can be computed as follows:

$$p_A'' = x_1'' + x_2'' + x_3'' + x_4'' = (1-\mu)\,p_A/\bar{w}. \tag{3a}$$

Similarly, we have that

$$p_B'' = (1-\nu)\,p_B/\bar{w} \tag{3b}$$

and

$$p_C'' = (1-\tau)\,p_C/\bar{w}. \tag{3c}$$

By analogy with the two-locus case, each of these can be generalized immediately as follows:

$$p_{A,n} = (1-\mu)^n\,p_A \Big/ \prod_{i=1}^{n} \bar{w}_i \tag{4a}$$
$$p_{B,n} = (1-\nu)^n\,p_B \Big/ \prod_{i=1}^{n} \bar{w}_i \tag{4b}$$
$$p_{C,n} = (1-\tau)^n\,p_C \Big/ \prod_{i=1}^{n} \bar{w}_i \tag{4c}$$

where $p_{A,n}$, $p_{B,n}$ and $p_{C,n}$ are the frequencies of the A, B and C alleles, respectively, after $n$ generations and where $\bar{w}_i$ ($i = 1, 2, \ldots, n$) is the average fitness in generation $i$.

Case with differing mutation rates: To analyze the equilibrium behavior, we merely examine equations (4a, b, c) in pairwise combinations as $n \to \infty$. The case $\hat{p}_A = \hat{p}_B = \hat{p}_C = 0$ is unstable. If $\mu > \nu > \tau$, then $p_{A,n} \to 0$ and $p_{B,n} \to 0$ as $n \to \infty$. By setting $\Delta p_C = 0$, i.e.,

$$\Delta p_C = p_C'' - p_C = (p_C/\bar{w})(s x_8^2 - \tau) = 0,$$

we obtain $\hat{x}_8 = \sqrt{\tau/s}$. Then

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (0,\, 0,\, 1 - \sqrt{\tau/s}) \tag{5a}$$

or, in terms of gametic frequencies,

$$(\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4, \hat{x}_5, \hat{x}_6, \hat{x}_7, \hat{x}_8) = (0, 0, 0, 0, 0, 0,\, 1 - \sqrt{\tau/s},\, \sqrt{\tau/s}). \tag{5b}$$

Parallel results emerge for the relationships $\mu > \tau > \nu$ and $\nu > \tau > \mu$. So, for different mutation rates at each of the three loci, there is global convergence to the monogenic state to give the appearance of a single-locus situation.

Case where two mutation rates are identical: The next case to be considered is that where two of the mutation rates are the same. For example, let $\mu = \nu > \tau$. By examining the limiting behavior of the ratios $p_{A,n} : p_{C,n}$ and $p_{B,n} : p_{C,n}$, we have that both $p_{A,n}$ and $p_{B,n}$ converge to zero as $n \to \infty$. Again, by examining $\Delta p_C$, we have that the frequency of the $abc$ gamete is given by $\hat{x}_8 = \sqrt{\tau/s}$. Consequently, we obtain the following:

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (0,\, 0,\, 1 - \sqrt{\tau/s}). \tag{6a}$$

As a result, there is again the appearance of deterioration to a monogenic condition. Further, as this result is global, the extension to the three-locus case now applies to duplicated loci.

If, on the other hand, $\mu = \nu < \tau$, there is the illusion of a two-locus condition. By looking at $p_{C,n} : p_{A,n}$ and $p_{C,n} : p_{B,n}$, we see that $p_{C,n}$ approaches zero, while $p_{A,n}$ and $p_{B,n}$ remain in a constant ratio, since by (4a) and (4b) both change by the same factor each generation.
Therefore,

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (\hat{p}_A,\, c\,\hat{p}_A,\, 0), \tag{6b}$$

where $c$ is the constant ratio $p_B/p_A$ fixed by the initial frequencies. This will be stable for recombination values greater than the corresponding mutation rates. It is unstable for $r_1 = 0$ and hence is applicable to tandemly duplicated loci provided that they have been physically separated over time. Obviously, this also applies to genomically duplicated loci. The same results emerge for other relationships among the three mutation rates when two of the three rates are equal.

Case where all mutation rates are identical: We finally examine the situation where $\mu = \nu = \tau$. Again, we find that the frequency of the detrimental $abc$ gamete is given by $\hat{x}_8 = \sqrt{\tau/s}$ and that $p_{A,n}$, $p_{B,n}$ and $p_{C,n}$ remain in constant ratios, so that

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (\hat{p}_A,\, c_1\hat{p}_A,\, c_2\hat{p}_A)$$

at equilibrium to give the appearance of a trigenic condition.

Generalization to n loci: Now consider an $n$-locus system, two alleles per locus, such that $A_j$ and $a_j$ are the dominant and recessive alleles at locus $j$ ($j = 1, 2, \ldots, n$). Again assume that the relative fitness of each genotype is 1, with the exception of the completely recessive homozygote, whose viability is $1-s$. After selection, then, the average fitness over all genotypes is $\bar{w} = 1 - s x_{2^n}^2$. Again, mutation acts to alter gametic frequencies so that $\mu_j$ is the mutation rate at locus $j$ from $A_j$ to $a_j$. Following selection and mutation, the gene frequency of $A_j$ is given by

$$p_{A_j}'' = (1-\mu_j)\,p_{A_j}/\bar{w} \quad \text{for all } j = 1, 2, \ldots, n. \tag{8}$$

We obtain the generalization

$$p_{A_j,n} = (1-\mu_j)^n\,p_{A_j} \Big/ \prod_{i=1}^{n} \bar{w}_i \quad \text{for all } j = 1, 2, \ldots, n, \tag{9}$$

where $p_{A_j,n}$ is the frequency of $A_j$ after $n$ generations, and $\bar{w}_i$ is the average fitness in generation $i$. As a consequence, if $\mu_j \neq \mu_m$ for all $j \neq m$ ($j, m = 1, 2, \ldots, n$), the limiting process used in the three-locus case can be employed. Thus, $p_{A_j,n} \to 0$ as $n \to \infty$ for every $j \neq m$ ($j = 1, 2, \ldots, n$), and $\hat{p}_{A_m} = 1 - \sqrt{\mu_m/s}$, where $\mu_m$ is the smallest mutation rate.
Once more there is the illusion of a single-locus disorder where the equilibrium frequency of the detrimental gamete is $\hat{x}_{2^n} = \sqrt{\mu_m/s}$ and $n$ is the number of loci. Similarly, if two of the mutation rates are the same, the appearance of monogenicity or digenicity may be given, etc. (See, for example, equations 6a-c.)

CONCLUSIONS

The model presented extends to n loci the classical model of selection against the deleterious recessive genotype counterbalanced by irreversible mutation. The extension, however, allows consideration of the effects of duplications at multiple loci. When chromosomal doubling occurs, many loci with varying functions are, of course, involved. We expect that, when a given locus is duplicated, it and its duplicate will initially have identical mutation rates, but that these rates will vary from locus to locus. This aspect is included in the extended model. Such variation in mutation rates among duplicated loci influences the equilibrium outcome of monomorphism vs. polymorphism. This can then be related to retention or loss of function.

Similarly, an extended model is more appropriate for loci that have undergone multiple tandem duplications over time, such as has occurred in the globin-like clusters. For example, one might expect the mutation rates of two recent duplicates to be more similar to each other than to that of the ancestral locus from which they derived. Again, the extended model incorporates this aspect of evolution.

Whether duplicated loci retain function for sufficient time to allow for divergence or are silenced (NEI 1975) depends in part upon population size. The two-locus version of the model presented here has been examined specifically in terms of size effects (BAILEY, POULTER and STOCKWELL 1978; KIMURA and KING 1979; and LI 1980). BAILEY, POULTER and STOCKWELL (1978) utilized evidence from groups of fishes, the salmonids and catastomids, which are thought to have arisen from tetraploid ancestors.
The evidence indicates that a large percentage of their duplicated loci have remained functional. Their computer simulation suggests that this slow rate of silencing can be explained by a form of the model presented under conditions of large population size. Specifically, for population size $N > 1000$, the time for 50% probability of silencing is about $15N + \mu^{-3/4}$, where $\mu$ is the mutation rate. Thus, the model shows that unlinked duplicates can be retained effectively in the functional state for sufficient time to be available for evolution of new function.

LI (1980) re-examined this question using a different and more extensive simulation approach with application to the fish data. In addition, he included the effects of tandem duplications and linkage disequilibrium. LI showed that if $N\mu > 0.01$, the population remains polymorphic for normal and null alleles, but if $N\mu \le 0.01$, the population becomes monomorphic for the normal allele. Further, his results indicate that if more than two loci are involved, as in the model given above, the rates of gene loss increase, particularly in large populations, as sheltering becomes more effective. He found that genes can persist if the time for diploidization is long, the mutation rate is low, the effective size is large and/or divergence in regulation or function results.

The aspects of the model pertaining to multiple tandem duplications may be exemplified in the evolution of the globin-like genes (see MANIATIS et al. 1980; DAYHOFF 1969; NEI 1975). It is believed that the α- and β-like genes evolved through duplication, separating about 500 million years ago. Recently, they have been mapped to two different chromosomes in man. Both the α- and β-like genes then underwent a series of tandem duplications that involved amino acid sequence divergence, as well as apparent regulatory evolution that involved switches in gene expression during development.
In humans, there are embryonic-fetal-adult switches in the β-like cluster and fetal-adult switches in the α-like cluster. If these developmental switches result from alterations of the flanking sequences, the above model is directly applicable; otherwise, as pointed out by LI, it is only an approximation. Within both clusters, there are silenced loci. In the β-like cluster, for example, five duplicates are active in man: ε in embryonic development, Gγ and Aγ in fetal development, and δ and β in adults. Two pseudogenes, ψβ1 and ψβ2, have been found. They show high sequence homology with the normal adult β-gene, yet do not encode a functional polypeptide, having been silenced as a result of deletions and insertions that alter the reading frame. Thus, this is an excellent example of duplication followed by divergence (OHNO 1970). This sort of structure has been found in all mammalian globin clusters examined (e.g., HARDISON et al. 1979). This process of repeated duplication and evolution of developmental switches is reflected in globin phylogenetic trees (DAYHOFF 1969; see NEI 1975 for discussion).

I wish to thank J. FELSENSTEIN, H. HARPENDING and M. NEI for their valuable comments. This research was supported in part by Public Health Service grant GM-07661.

LITERATURE CITED

BAILEY, G. S., R. T. M. POULTER and P. A. STOCKWELL, 1978 Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc. Natl. Acad. Sci. U.S. 75: 5575-5579.

CHRISTIANSEN, F. B. and O. FRYDENBERG, 1977 Selection-mutation balance for two nonallelic recessives producing an inferior double homozygote. Am. J. Hum. Genet. 29: 195-207.

DAYHOFF, M. O. (ed.), 1969 Atlas of Protein Sequence and Structure. Natl. Biomed. Res. Found., Silver Springs, MD.

HARDISON, R. C., E. T. BUTLER, E. LACY, T. MANIATIS, N. ROSENTHAL and A. EFSTRATIADIS, 1979 The structure and transcription of four linked rabbit β-like globin genes. Cell 18: 1285-1297.

KARLIN, S. and J. MCGREGOR, 1971 On mutation-selection balance for two-locus haploid and diploid populations. Theoret. Pop. Biol. 2: 60-70.

KIMURA, M. and J. L. KING, 1979 Fixation of a deleterious allele at one of two “duplicate” loci by mutation pressure and random drift. Proc. Natl. Acad. Sci. U.S. 76: 2858-2861.

LI, W.-H., 1980 Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics 95: 237-258.

MANIATIS, T., E. F. FRITSCH, J. LAUER and R. M. LAWN, 1980 The molecular genetics of human hemoglobins. Ann. Rev. Genet. 14: 145-178.

NEI, M., 1975 Molecular Population Genetics and Evolution. North-Holland Pub. Co., Amsterdam-Oxford.

OHNO, S., 1970 Evolution by Gene Duplication. Springer, Berlin.

STROBECK, C., 1973 The three locus model with multiplicative fitness values. Genet. Res., Camb. 22: 195-200.

Corresponding editor: M. NEI