Department of Biology, The University of New Mexico, Albuquerque, New Mexico 87131
Manuscript received October 2, 1979
Revised copy received April 22, 1981
A theoretical model is presented that extends the case of selection against
homozygous recessives counterbalanced by mutation to a system of n loci. This
extension allows analysis of the role of gene duplication in the evolution of
new function. The aspect of retention of function for sufficiently long periods
of time to allow for divergence vs. silencing of nonfunctional loci is discussed
in relation to examples in salmonid and catostomid fishes and in the globin-like genes.

An important aspect of evolution is the ability to acquire new functions. Gene duplication (OHNO 1970) provides a likely mechanism for allowing divergence in function. Duplicated loci are free to accumulate and “experiment” with
mutations, as they are sheltered from selection because of the retention of the
normal functioning allele at the ancestral locus. Whether duplications arise tandemly via unequal crossing over or arise from chromosome or genome doubling,
there are two possible results: (1) through accumulation of mutations, new function may evolve and be retained, or (2) deleterious nonfunctional products may
result and be silenced. The outcome and rate of loss of function depend upon population size (LI 1980; BAILEY, POULTER and STOCKWELL 1978). The importance
of the role of duplication in evolution is illustrated quite clearly in
certain fishes, e.g., the salmonids and catostomids that arose from tetraploid ancestors (BAILEY, POULTER and STOCKWELL 1978), and in the evolution of the α- and β-like globin clusters (DAYHOFF 1969; MANIATIS et al. 1980).
This problem is addressed here by extending to n loci a two-locus model that
was analyzed by CHRISTIANSEN and FRYDENBERG (1977; see also KIMURA and KING 1979). By incorporating multiple-locus effects, the examples of evolution in
some of the fishes and in the globin genes can now be examined rather specifically
from a theoretical standpoint. The deterministic model incorporates selection
against the completely recessive homozygote, counterbalanced by irreversible mutation. The extension from two to n loci allows for a more complete analysis with
respect to the evolution of function. Both tandem and genomic duplications are included, as results hold for all possible recombination values. This type of selection
is generally associated with duplicate genes, but can also be applied to genes that
interact epistatically.
Genetics 98: 409-415 June, 1981.
CHRISTIANSEN and FRYDENBERG (1977) examined a two-locus model of duplicate gene action where the relative fitnesses of all genotypes are 1, with the exception of the completely recessive homozygote, whose fitness is $1-s$. In the absence of mutation or some other pressure, the deleterious gamete will, of course,
eventually be eliminated, as selection will override any initial effects of recombination to increase the level of this gamete within the population. We note that,
depending upon the initial configuration of gametic frequencies and the values of
the recombination frequency ($r$) and selection coefficient ($s$), the demise of the
deleterious gamete can be extremely slow. If irreversible mutation is incorporated such that the rates differ at each of the two loci, monomorphism results at one
locus and polymorphism at the other. This, then, gives the appearance of a single-locus situation where the frequency of the detrimental gamete is, as in the analogous single-locus model of selection against the recessive, counterbalanced by
one-way mutation. If, on the other hand, mutation rates are equal at the two loci,
polymorphism at both loci occurs when r is sufficiently larger than the mutation
rate. So, depending upon the relationship of the mutation rates, what is in fact
caused by two loci may appear to result from one or two loci. This can then be
related to loss or retention of function.
The extension of the model to three or more loci, two alleles per locus, is direct
and shows that stability requirements and equilibrium behavior can be predicted
from corresponding models involving fewer loci. The major focus will be on the
three-locus case.
Consider three loci, two alleles per locus. The eight gametic frequencies are
denoted by $x_1 = f(ABC)$, $x_2 = f(ABc)$, $x_3 = f(AbC)$, $x_4 = f(Abc)$, $x_5 = f(aBC)$,
$x_6 = f(aBc)$, $x_7 = f(abC)$ and $x_8 = f(abc)$. As before, let the relative fitnesses of
all genotypes be 1, with the exception of the completely recessive homozygote,
aabbcc, whose fitness is $1-s$. Then the new gametic frequencies after selection are
as given in the recursion relationships listed below, following the notation of
STROBECK (1973):

[recursion relationships for $x_1', \ldots, x_8'$ not recoverable]

These relationships involve the various measures of disequilibrium, and $\bar{w}$, the average fitness over
all genotypes, is given by

$$\bar{w} = 1 - s x_8^2.$$

Here, $r_1$ denotes the recombination fraction between the A and B loci; $r_2$, the
recombination fraction between the B and C loci; and $r_3$, the recombination
fraction between the A and C loci.
Now let mutation act to alter the frequencies into the actual gametic frequencies of the next generation. In particular, let the mutation rates be as follows: let $\mu$
be the mutation rate from A to a, $\nu$ the rate from B to b, and $\tau$ the rate from C to c.
After mutation, the gametic frequencies can be computed by the equations given
below (see KARLIN and MCGREGOR 1971):
$$x_1'' = (1-\mu)(1-\nu)(1-\tau)\,x_1'$$
$$x_2'' = (1-\mu)(1-\nu)\,x_2' + (1-\mu)(1-\nu)\,\tau\,x_1'$$
$$\vdots$$
$$x_8'' = 1 - \sum_{i=1}^{7} x_i''.$$
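The mutation step can be sketched in code. This is a minimal illustration, not the author's implementation: it assumes that mutation acts independently at each locus and only in the direction dominant to recessive, and the names `GAMETES` and `mutate` are hypothetical.

```python
from itertools import product

# Gametes in the paper's order: ABC, ABc, AbC, Abc, aBC, aBc, abC, abc.
# Each gamete is a tuple of 0/1 flags; 1 marks the recessive (mutant) allele.
GAMETES = list(product((0, 1), repeat=3))


def mutate(x, rates):
    """Apply one round of irreversible mutation to gametic frequencies.

    x     : eight gametic frequencies after selection, in GAMETES order
    rates : per-locus mutation rates (A->a, B->b, C->c)
    ASSUMPTION: loci mutate independently and there is no back mutation.
    """
    new = [0.0] * len(GAMETES)
    for src, freq in zip(GAMETES, x):
        for k, dst in enumerate(GAMETES):
            prob = 1.0
            for s_allele, d_allele, u in zip(src, dst, rates):
                if s_allele == 1:
                    # a recessive allele stays recessive
                    prob *= 1.0 if d_allele == 1 else 0.0
                else:
                    # a dominant allele mutates with probability u
                    prob *= u if d_allele == 1 else 1.0 - u
            new[k] += freq * prob
    return new


x_after_selection = [0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
x_next = mutate(x_after_selection, (1e-3, 2e-3, 3e-3))
print(x_next)
```

Under these assumptions the first two entries of `x_next` agree with the expressions for the ABC and ABc gametes, and the eight frequencies still sum to one.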
We find that $p_A''$, the frequency of A after selection and mutation, can be computed as follows:

$$p_A'' = (1-\mu)\,p_A/\bar{w}.$$

Similarly, we have that

$$p_B'' = (1-\nu)\,p_B/\bar{w}$$
$$p_C'' = (1-\tau)\,p_C/\bar{w}.$$
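By analogy with the two-locus case, each of these can be generalized immediately
as follows:

$$p_{A,n} = (1-\mu)^n\, p_{A,0} \Big/ \prod_{i=1}^{n} \bar{w}_i \qquad (4a)$$
$$p_{B,n} = (1-\nu)^n\, p_{B,0} \Big/ \prod_{i=1}^{n} \bar{w}_i \qquad (4b)$$
$$p_{C,n} = (1-\tau)^n\, p_{C,0} \Big/ \prod_{i=1}^{n} \bar{w}_i \qquad (4c)$$

where $p_{A,n}$, $p_{B,n}$ and $p_{C,n}$ are the frequencies of the A, B and C alleles, respectively,
after $n$ generations, $p_{A,0}$, $p_{B,0}$ and $p_{C,0}$ are the initial frequencies, and $\bar{w}_i$ ($i = 1, 2, \ldots, n$) is the average fitness in
generation $i$.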
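As a numerical check on the n-generation expressions, the sketch below iterates the one-generation allele-frequency recursion and compares the result with the closed form. It is illustrative only. The function names `iterate` and `closed_form` are hypothetical, and the frequency of the abc gamete is computed under an assumed linkage equilibrium, which the paper does not require.

```python
def iterate(p0, rates, s, generations):
    """Iterate p'' = (1 - u) * p / wbar for each locus.

    ASSUMPTION: the abc gamete frequency is the product of the recessive
    allele frequencies (linkage equilibrium); this is a simplification.
    Returns the final allele frequencies and the list of mean fitnesses.
    """
    p = list(p0)
    wbars = []
    for _ in range(generations):
        x8 = 1.0
        for pj in p:
            x8 *= 1.0 - pj
        wbar = 1.0 - s * x8 * x8
        wbars.append(wbar)
        p = [(1.0 - u) * pj / wbar for pj, u in zip(p, rates)]
    return p, wbars


def closed_form(p0, rates, wbars):
    """Evaluate (1 - u)^n * p0 / prod(wbar_i) for each locus."""
    n = len(wbars)
    prod_w = 1.0
    for w in wbars:
        prod_w *= w
    return [(1.0 - u) ** n * pj / prod_w for pj, u in zip(p0, rates)]


p0 = (0.9, 0.8, 0.7)
rates = (1e-3, 2e-3, 3e-3)
p_iter, wbars = iterate(p0, rates, s=0.5, generations=50)
p_closed = closed_form(p0, rates, wbars)
print(p_iter)
print(p_closed)
```

The closed form holds for any sequence of mean fitnesses, so the agreement does not depend on the linkage-equilibrium shortcut; that shortcut only supplies concrete values of the mean fitness.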
Case with differing mutation rates: To analyze the equilibrium behavior, we
merely examine equations (4a, b, c) in pairwise combinations as $n \to \infty$. The case
$\hat{p}_A = \hat{p}_B = \hat{p}_C = 0$ is unstable. If $\mu > \nu > \tau$, then $p_{A,n} \to 0$ and $p_{B,n} \to 0$ as $n \to \infty$.
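By setting $\Delta p_C = 0$, i.e.,

$$\Delta p_C = p_C'' - p_C = (p_C/\bar{w})(s x_8^2 - \tau) = 0,$$

we obtain $\hat{x}_8 = \sqrt{\tau/s}$. Therefore,

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (0,\ 0,\ 1 - \sqrt{\tau/s})$$

or, in terms of gametic frequencies,

$$(\hat{x}_1, \hat{x}_2, \hat{x}_3, \hat{x}_4, \hat{x}_5, \hat{x}_6, \hat{x}_7, \hat{x}_8) = (0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 1 - \sqrt{\tau/s},\ \sqrt{\tau/s}).$$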
Parallel results emerge for the relationships $\mu > \tau > \nu$ and $\nu > \tau > \mu$. So, for different mutation rates at each of the three loci, there is global convergence to the
monogenic state to give the appearance of a single-locus situation.
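The convergence to a monogenic state can be illustrated numerically. The sketch below is a simplification rather than the full gametic model: it tracks allele frequencies only and assumes linkage equilibrium when computing the frequency of the abc gamete. The function name `simulate`, the starting frequencies and the deliberately large mutation rates (chosen so that convergence is visible in a few thousand generations) are all illustrative assumptions.

```python
import math


def simulate(p0, rates, s, generations):
    """Iterate selection against the triple recessive plus one-way mutation.

    ASSUMPTION: linkage equilibrium, so the abc gamete frequency is the
    product of the recessive allele frequencies at the three loci.
    """
    p = list(p0)
    for _ in range(generations):
        x8 = math.prod(1.0 - pj for pj in p)
        wbar = 1.0 - s * x8 * x8          # mean fitness, 1 - s * x8^2
        p = [(1.0 - u) * pj / wbar for pj, u in zip(p, rates)]
    return p


# Rates ordered as in the text: largest at A, smallest at C.
mu, nu, tau, s = 0.03, 0.02, 0.01, 0.5
p_A, p_B, p_C = simulate((0.5, 0.5, 0.5), (mu, nu, tau), s, 5000)

print("p_A, p_B:", p_A, p_B)
print("p_C:", p_C, "predicted:", 1.0 - math.sqrt(tau / s))
```

With these inputs the functional alleles at the two loci with the higher mutation rates are driven toward zero, while the frequency at the remaining locus settles near one minus the square root of its mutation rate over the selection coefficient.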
Case where two mutation rates are identical: The next case to be considered is
that where two of the mutation rates are the same. For example, let $\mu = \nu > \tau$. By
examining the limiting behavior of the ratios $p_{A,n} : p_{C,n}$ and $p_{B,n} : p_{C,n}$, we have
that both $p_{A,n}$ and $p_{B,n}$ converge to zero as $n \to \infty$. Again, by examining $\Delta p_C$, we
have that the frequency of the abc gamete is given by $\hat{x}_8 = \sqrt{\tau/s}$, and
we obtain the following:

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (0,\ 0,\ 1 - \sqrt{\tau/s}).$$

As a result, there is again the appearance of deterioration to a monogenic condition. Further, as this result is global, the extension to the three-locus case now
applies to duplicated loci.
If, on the other hand, $\mu = \nu < \tau$, there is the illusion of a two-locus condition.
By looking at $p_{C,n} : p_{A,n}$ and $p_{C,n} : p_{B,n}$, we see that $p_{C,n}$ approaches zero, while $p_A$
approaches $p_B$ at a constant rate. Therefore,

$$(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (\hat{p}_A,\ c\,\hat{p}_A,\ 0),$$

where $c$ is a constant.
This will be stable for recombination values greater than the corresponding mutation rates. It is unstable for $r_1 = 0$ and hence is applicable to tandemly duplicated loci provided that they have been physically separated over time. Obviously, this also
applies to genomically duplicated loci. The same results emerge for other relationships among the three mutation rates when two of the three rates are equal.
Case where all mutation rates are identical: We finally examine the situation
where $\mu = \nu = \tau$. Again, we find that the frequency of the detrimental abc gamete
is given by $\hat{x}_8 = \sqrt{\mu/s}$ and that $p_{A,n}$ approaches $p_{B,n}$ and $p_{C,n}$ at constant rates, so
that $(\hat{p}_A, \hat{p}_B, \hat{p}_C) = (\hat{p}_A,\ c_1\hat{p}_A,\ c_2\hat{p}_A)$ at equilibrium to give the appearance of a
trigenic condition.
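The cases with tied mutation rates can be explored with the same kind of allele-frequency iteration. The sketch below is illustrative only: it assumes linkage equilibrium for the abc gamete frequency, uses large mutation rates so that the limits are reached quickly, and the function name `simulate` and the starting frequencies are hypothetical choices.

```python
import math


def simulate(p0, rates, s, generations):
    """Allele-frequency recursion p'' = (1 - u) * p / wbar at each locus.

    ASSUMPTION: linkage equilibrium, so x8 is the product of the
    recessive allele frequencies.
    """
    p = list(p0)
    for _ in range(generations):
        x8 = math.prod(1.0 - pj for pj in p)
        wbar = 1.0 - s * x8 * x8
        p = [(1.0 - u) * pj / wbar for pj, u in zip(p, rates)]
    return p


s = 0.5
p0 = (0.9, 0.6, 0.3)

# All three rates equal: every locus stays polymorphic.
tri = simulate(p0, (0.01, 0.01, 0.01), s, 3000)
x8_tri = math.prod(1.0 - pj for pj in tri)

# Two equal rates below the third: the third locus loses its functional allele.
di = simulate(p0, (0.01, 0.01, 0.03), s, 3000)
x8_di = math.prod(1.0 - pj for pj in di)

print("trigenic:", tri, "ratios:", tri[1] / tri[0], tri[2] / tri[0], "x8:", x8_tri)
print("digenic: ", di, "ratio:", di[1] / di[0], "x8:", x8_di)
print("sqrt(mu/s):", math.sqrt(0.01 / s))
```

Because loci with equal rates are multiplied by the same factor every generation, their frequency ratios stay at the initial values, which is why the equilibrium allele frequencies depend on the starting point while the frequency of the detrimental gamete does not.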
Generalization to n-loci: Now consider an n-locus system, two alleles per locus,
such that $A_j$ and $a_j$ are the dominant and recessive alleles at locus $j$ ($j = 1, 2, \ldots, n$). Again assume that the relative fitness of each genotype is 1, with the
exception of the completely recessive homozygote, $a_1a_1a_2a_2 \cdots a_na_n$,
whose viability is $1-s$. After selection, then, the average fitness over all genotypes
is $\bar{w} = 1 - s x_{2^n}^2$. Again, mutation acts to alter gametic frequencies so that $\mu_j$ is the
mutation rate at locus $j$ from $A_j$ to $a_j$. Following selection and mutation, the gene
frequency of $A_j$ is given by

$$p_{A_j}'' = (1-\mu_j)\,p_{A_j}/\bar{w} \quad \text{for all } j,\ j = 1, 2, \ldots, n.$$

We obtain the generalization

$$p_{A_j,n} = (1-\mu_j)^n\, p_{A_j,0} \Big/ \prod_{i=1}^{n} \bar{w}_i \quad \text{for all } j,\ j = 1, 2, \ldots, n,$$

where $p_{A_j,n}$ is the frequency of $A_j$ after $n$ generations, and $\bar{w}_i$ is the average fitness
in generation $i$. As a consequence, if $\mu_j \neq \mu_m$ ($j, m = 1, 2, \ldots, n$) for all $j \neq m$, the
limiting process used in the three-locus case can be employed. Thus, $p_{A_j,n} \to 0$ as
$n \to \infty$ where $j \neq m$ ($j = 1, 2, \ldots, n$) and $\hat{p}_{A_m} = 1 - \sqrt{\mu_m/s}$,
where $\mu_m$ is the smallest
mutation rate. Once more there is the illusion of a single-locus disorder where the
equilibrium frequency of the detrimental gamete is $\hat{x}_{2^n} = \sqrt{\mu_m/s}$ and $n$ is the
number of loci. Similarly, if two of the mutation rates are the same, the appearance of monogenicity or digenicity may be given, etc. (See, for example, equations
6a-c).
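The n-locus prediction can be summarized in a small helper. This is a sketch of the result as stated in the text, not a simulation: it returns the predicted equilibrium frequency of the fully recessive gamete and the loci expected to remain polymorphic (those sharing the smallest mutation rate). The function name `predicted_equilibrium` and the example rates are hypothetical.

```python
import math


def predicted_equilibrium(rates, s):
    """Predict the equilibrium for n loci with one-way mutation.

    rates : per-locus mutation rates from dominant to recessive
    s     : selection coefficient against the fully recessive homozygote
    Returns (x_hat, polymorphic), where x_hat = sqrt(min(rates) / s) is the
    frequency of the fully recessive gamete and polymorphic lists the
    zero-based indices of loci that share the smallest rate.
    ASSUMPTION: 0 < min(rates) < s <= 1, so that x_hat lies below 1.
    """
    if not rates:
        raise ValueError("at least one locus is required")
    mu_min = min(rates)
    if not (0.0 < mu_min < s <= 1.0):
        raise ValueError("need 0 < min(rates) < s <= 1")
    x_hat = math.sqrt(mu_min / s)
    polymorphic = [j for j, u in enumerate(rates) if u == mu_min]
    return x_hat, polymorphic


# Distinct rates: a single locus remains polymorphic.
x_hat, loci = predicted_equilibrium([3e-5, 1e-5, 2e-5, 5e-5], s=0.1)
print(x_hat, loci)

# Two loci tied for the smallest rate: the appearance of a digenic condition.
print(predicted_equilibrium([2e-5, 2e-5, 7e-5], s=0.1))
```

When exactly one locus is returned, the predicted frequency of its functional allele is `1 - x_hat`; with ties, the individual allele frequencies depend on the initial conditions and are not predicted by this helper.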
The model presented extends to n loci the classical model of selection against
the deleterious recessive genotype counterbalanced by irreversible mutation. The
extension, however, allows consideration of the effects of duplications at multiple
loci. When chromosomal doubling occurs, many loci with varying functions are,
of course, involved. We expect that, when a given locus is duplicated, it and its
duplicate will initially have identical mutation rates, but that these rates will
vary from locus to locus. This aspect is included in the extended model. Such
variation in mutation rates among duplicated loci influences the equilibrium outcome of monomorphism vs. polymorphism. This can then be related to retention
or loss of function. Similarly, an extended model is more appropriate for loci that
have undergone multiple tandem duplications over time, such as has occurred in
the globin-like clusters. For example, one might expect the mutation rates
of two recent duplicates to be more similar to each other than to that of the ancestral
locus from which they derived. Again, the extended model incorporates this
aspect of evolution.
Whether duplicated loci retain function for sufficient time to allow for divergence or are silenced (NEI 1975) depends in part upon population size. The two-locus version of the model presented here has been examined specifically in terms
of size effects (BAILEY, POULTER and STOCKWELL 1978; KIMURA and KING 1979; LI 1980). BAILEY, POULTER and STOCKWELL (1978) utilized evidence from
groups of fishes, the salmonids and catostomids, which are thought to have arisen
from tetraploid ancestors. The evidence indicates that a large percentage of their
duplicated loci have remained functional. Their computer simulation suggests
that this slow rate of silencing can be explained by a form of the model presented
under conditions of large population size. Specifically, for population size $N > 1000$,
the time for 50% probability of silencing is about $15N + \mu^{-3/4}$,
where $\mu$ is
the mutation rate. Thus, the model shows that unlinked duplicates can be retained
effectively in the functional state for sufficient time to be available for evolution
of new function.
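To give a feel for the magnitudes involved, the approximation quoted above can be evaluated directly. The sketch takes the expression to be 15N + μ^(-3/4) and assumes the result is in generations; the function name `half_silencing_time` and the example values of N and μ are illustrative, not taken from the cited simulations.

```python
def half_silencing_time(N, mu):
    """Approximate time to a 50% probability of silencing.

    Uses 15 * N + mu ** (-3/4), which the text quotes for N > 1000.
    ASSUMPTION: the result is in generations, and the formula is not
    applied outside the quoted range of N.
    """
    if N <= 1000:
        raise ValueError("approximation is quoted only for N > 1000")
    if not (0.0 < mu < 1.0):
        raise ValueError("mutation rate must lie between 0 and 1")
    return 15 * N + mu ** -0.75


for N in (10_000, 100_000, 1_000_000):
    print(N, half_silencing_time(N, mu=1e-5))
```

For mutation rates of this order the population-size term dominates, which is consistent with the slow silencing described for large populations.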
LI (1980) re-examined this question using a different and more extensive
simulation approach with application to the fish data. In addition, he included the
effects of tandem duplications and linkage disequilibrium. LI showed that if $N\mu > 0.01$,
the population remains polymorphic for normal and null alleles, but if $N\mu \le 0.01$,
the population becomes monomorphic for the normal allele. Further, his
results indicate that if more than two loci are involved, as in the model given
above, the rates of gene loss increase, particularly in large populations, as sheltering becomes more effective. He found that genes can persist if the time for diploidization is long, the mutation rate is low, the effective size is large and/or
divergence in regulation or function results.
The aspects of the model pertaining to multiple tandem duplications may be
exemplified in the evolution of the globin-like genes (see MANIATIS et al. 1980;
DAYHOFF 1969; NEI 1975). It is believed that the α- and β-like genes evolved
through duplication, separating about 500 million years ago. Recently, they have
been mapped to two different chromosomes in man. Both the α- and β-like genes
then underwent a series of tandem duplications that involved amino acid sequence
divergence, as well as apparent regulatory evolution that involved switches in
gene expression during development. In humans, there are embryonic-fetal-adult
switches in the β-like cluster and fetal-adult switches in the α-like cluster. If
these developmental switches result from alterations of the flanking sequences,
the above model is directly applicable; otherwise, as pointed out by LI, it is only
an approximation. Within both clusters, there are silenced loci. In the β-like cluster, for example, five duplicates are active in man: ε in embryonic development,
Gγ and Aγ in fetal development, and δ and β in adults. Two pseudogenes, ψβ1
and ψβ2, have been found. They show high sequence homology with the normal
adult β-gene, yet do not encode a functional polypeptide, having been silenced as
a result of deletions and insertions that alter the reading frame. Thus, this is an
excellent example of duplication followed by divergence (OHNO 1970). This
sort of structure has been found in all mammalian globin clusters examined (e.g.,
HARDISON et al. 1979). This process of repeated duplication and evolution of
developmental switches is reflected in globin phylogenetic trees (DAYHOFF 1969; see
NEI 1975 for discussion).
I wish to thank J. FELSENSTEIN,
and M. NEI for their valuable comments.
This research was supported in part by Public Health Service grant GM-07661.
BAILEY, G. S., R. T. M. POULTER and P. A. STOCKWELL, 1978 Gene duplication in tetraploid
fish: model for gene silencing at unlinked duplicated loci. Proc. Natl. Acad. Sci. U.S. 75:
CHRISTIANSEN, F. B. and O. FRYDENBERG, 1977 Selection-mutation balance for two nonallelic
recessives producing an inferior double homozygote. Am. J. Hum. Genet. 29: 195-207.
DAYHOFF, M. O., (ed.), 1969 Atlas of Protein Sequence and Structure. Natl. Biomed. Res.
Found., Silver Springs, MD.
HARDISON, R. C., E. T. BUTLER, E. LACY, T. MANIATIS, N. ROSENTHAL and A. EFSTRATIADIS,
1979 The structure and transcription of four linked rabbit β-like globin genes. Cell 18:
KARLIN, S. and J. MCGREGOR, 1971 On mutation-selection balance for two-locus haploid and
diploid populations. Theoret. Pop. Biol. 2: 60-70.
KIMURA, M. and J. L. KING, 1979 Fixation of a deleterious allele at one of two “duplicate” loci
by mutation pressure and random drift. Proc. Natl. Acad. Sci. U.S. 76: 2858-2861.
LI, W.-H., 1980 Rate of gene silencing at duplicate loci: a theoretical study and interpretation
of data from tetraploid fishes. Genetics 95: 237-258.
MANIATIS, T., E. F. FRITSCH, J. LAUER and R. M. LAWN, 1980 The molecular genetics of human
hemoglobins. Ann. Rev. Genet. 14: 145-178.
NEI, M., 1975 Molecular Population Genetics and Evolution. North-Holland Pub. Co., Amsterdam-Oxford.
OHNO, S., 1970 Evolution by Gene Duplication. Springer, Berlin.
STROBECK, C., 1973 The three locus model with multiplicative fitness values. Genet. Res., Camb.
22: 195-200.
Corresponding editor: M. NEI