Tempo and Mode of Synonymous Substitutions in Mitochondrial DNA of Primates

Jun Adachi and Masami Hasegawa
Department of Statistical Science, The Graduate University for Advanced Studies, and The Institute of Statistical Mathematics, Tokyo

Nucleotide substitutions at the four-fold degenerate sites and at all third codon positions of mitochondrial DNA from human, common chimpanzee, bonobo, gorilla, and orangutan were examined in detail with three alternative Markov models: (1) Hasegawa, Kishino, and Yano's (1985) model, (2) Tamura and Nei's (1993) model, and (3) the general reversible Markov model. These sites are expected to be relatively free from constraint, and therefore their tempo and mode in evolution should reflect those of mutation. Among the alternative models, the general reversible Markov model best approximates the nucleotide substitutions of the four-fold degenerate sites and the total third codon positions, while the maximum likelihood estimates of the numbers of nucleotide substitutions along each branch do not differ significantly among the three models. It was further shown that the transition rate of these sites during evolution, and therefore the transitional mutation rate of mtDNA, is higher in humans than in chimpanzees and gorillas, probably by about two times. However, the transversional mutation rate and the amino acid substitution rate do not differ significantly between humans and the African apes. These and additional observations suggest heterogeneity of the mutation rate, as well as of the constraint operating on the mtDNA-encoded proteins, among different lineages of Hominoidea.

Introduction
Nucleotide substitutions at the third positions of four-fold degenerate codon families are always synonymous and are expected to be relatively free from constraint; their tempo and mode in evolution should therefore reflect those of mutation. Since the evolutionary rate of animal mitochondrial DNA (mtDNA) is much higher than that of nuclear DNA, and hence the multiple-hit effect is great in a comparison between distantly related species, we must compare closely related species in order to examine the pattern of synonymous nucleotide substitutions of mtDNA.
Horai et al. (1995) determined complete mtDNA sequences from human (Homo sapiens), common chimpanzee (Pan troglodytes), pygmy chimpanzee (bonobo; Pan paniscus), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus). Their data provide us with an opportunity to examine in detail the tempo and mode of mtDNA evolution in Hominoidea, and in this paper we will concentrate our attention particularly on those of the synonymous substitutions in Horai et al.'s data. These substitutions have been studied by Kondo et al. (1993) and by Tamura (1994) for Horai et al.'s (1992) earlier data of 4.8 kb, but these authors did not examine the heterogeneity of the evolutionary rate among lineages. We will show that, although the evolutionary rate at the amino acid sequence level seems to be uniform among lineages of the African apes/human clade, as suggested by Horai et al. (1995) and by Adachi and Hasegawa (1995b), the transition rate of synonymous sites is higher in the human than in the African apes.

Key words: mitochondrial DNA, nucleotide substitution, Markov model, third positions of codons, rate heterogeneity among lineages.
Address for correspondence and reprints: Jun Adachi, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan. E-mail adachi@ism.ac.jp.
Mol. Biol. Evol. 13(1):200-208. 1996
© 1996 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Markov Models of Nucleotide Substitutions
Hasegawa, Kishino, and Yano (1985) proposed a stationary Markov model (HKY85 model) for nucleotide substitution, in which a nucleotide i is replaced by another nucleotide j in an infinitesimally short time interval, dt, with a probability of

$$P_{ij}(dt) = \begin{cases} \alpha \pi_j \, dt & \text{for transition } (T \leftrightarrow C,\ A \leftrightarrow G) \\ \beta \pi_j \, dt & \text{for transversion } (T, C \leftrightarrow A, G), \end{cases} \tag{1}$$

where $\pi_j$ is the frequency of nucleotide j, and $\alpha$ and $\beta$ are parameters that determine transition and transversion rates, respectively. This model takes account of the different rates of transition and transversion, and of biased nucleotide composition. The latter is particularly important in analyzing mtDNA, in which G is extremely rare in the L-strand. Recently, Tamura and Nei (1993) proposed a more general model (TN93 model), in which $P_{ij}(dt)$ is given by

$$P_{ij}(dt) = \begin{cases} \alpha_Y \pi_j \, dt & \text{for pyrimidine transition } (T \leftrightarrow C) \\ \alpha_R \pi_j \, dt & \text{for purine transition } (A \leftrightarrow G) \\ \beta \pi_j \, dt & \text{for transversion } (T, C \leftrightarrow A, G), \end{cases} \tag{2}$$

allowing different transition rates for purines and pyrimidines. Tamura (1994) showed that the TN93 model is superior to the HKY85 model in approximating the four-fold degenerate sites, as well as all the third codon positions, in Horai et al.'s (1992) data of 4.8 kb.
Furthermore, Yang (1994) estimated 4 × 4 transition matrices of the most general reversible Markov model (REV model) for ψη-globin pseudogenes of six primate species and for a 0.9-kb segment of mtDNA sequences from nine primates. Saccone et al. (1990) also proposed a similar reversible model. Saccone et al. and Tamura (1994) estimated transition matrices for their respective models from pairwise comparisons of sequences, and hence the matrix differs between different species-pairs of the same gene. They did not propose any method to synthesize the estimated matrices from several comparisons when one deals with three or more species. It would be desirable to estimate a single transition probability matrix from a tree, and Yang (1994) first gave the maximum likelihood (ML) method for estimating the transition probability matrix from a tree with more than three species.
Tamura and Nei (1993) studied the pattern of nucleotide substitution by inferring ancestral sequences with the parsimony method. However, it has been pointed out that the parsimony method gives a biased estimate of the ancestral sequences when there is a bias in nucleotide composition (Perna and Kocher 1995; Collins, Wimberger, and Naylor 1994). Furthermore, the parsimony method has no time structure (Goldman 1990), and therefore it would be desirable to estimate the matrix by using the ML method (Yang 1994).
By using complete sequence data of mitochondrial genomes from 20 vertebrate species, Adachi and Hasegawa (unpublished data) estimated a 20 × 20 transition matrix of amino acids for mtDNA-encoded proteins by the ML method, and, in this paper, by using the same method we will estimate the matrix for the four-fold degenerate sites and for the total of the third codon positions of mtDNA.

Table 1
Numbers of Transition and Transversion Nucleotide Differences among Hominoidea

            Anderson    DCM1        SB17F       Chimp       Bonobo      Gorilla     Orang
Anderson    -           12 (21)     11 (34)     329 (758)   322 (741)   355 (855)   357 (872)
DCM1        0 (1)       -           17 (41)     332 (760)   327 (747)   353 (851)   360 (872)
SB17F       2 (3)       2 (2)       -           327 (757)   320 (738)   350 (848)   355 (869)
Chimp       47 (54)     47 (53)     49 (55)     -           142 (318)   343 (818)   364 (896)
Bonobo      48 (53)     48 (52)     50 (54)     19 (23)     -           328 (793)   344 (873)
Gorilla     92 (118)    92 (117)    94 (119)    81 (110)    82 (111)    -           368 (895)
Orang       204 (263)   204 (262)   206 (264)   195 (257)   198 (258)   200 (267)   -

NOTE.-Numbers of transition nucleotide differences (upper right half) and those of transversion differences (lower left half) of four-fold degenerate sites (1,667 sites) and total of the third codon positions (3,569 sites; in parentheses) among Hominoidea.

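
The rate definitions of equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the nucleotide frequencies are taken from table 2a, and the parameter values are merely chosen to match the ratios quoted in the note to table 3, with $\beta$ arbitrarily set to 1.

```python
BASES = "TCAG"
PYRIMIDINES = {"T", "C"}


def is_transition(i, j):
    """True for T<->C or A<->G (both pyrimidines or both purines)."""
    return (i in PYRIMIDINES) == (j in PYRIMIDINES)


def tn93_rate_matrix(pi, alpha_y, alpha_r, beta):
    """Instantaneous rates q[i][j] following equation (2).

    Off-diagonal entries are alpha_Y * pi_j (T<->C), alpha_R * pi_j (A<->G)
    or beta * pi_j (transversion); each diagonal entry is minus the sum of
    the other entries in its row, so that every row sums to zero.
    """
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i == j:
                continue
            if is_transition(i, j):
                rate = alpha_y if i in PYRIMIDINES else alpha_r
            else:
                rate = beta
            q[i][j] = rate * pi[j]
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


def hky85_rate_matrix(pi, alpha, beta):
    """Equation (1): HKY85 is the special case alpha_Y = alpha_R = alpha."""
    return tn93_rate_matrix(pi, alpha, alpha, beta)


# Illustrative inputs (assumptions): composition of table 2a, beta fixed at 1.
PI = {"T": 0.144, "C": 0.434, "A": 0.381, "G": 0.041}
BETA = 1.0
ALPHA_R = 62.0 / 1.4      # so that (alpha_Y + alpha_R) / (2 * beta) = 31.0
ALPHA_Y = 0.4 * ALPHA_R   # so that alpha_Y / alpha_R = 0.4

Q_HKY = hky85_rate_matrix(PI, alpha=23.6, beta=BETA)
Q_TN = tn93_rate_matrix(PI, ALPHA_Y, ALPHA_R, BETA)
```

Because every off-diagonal rate is a symmetric factor times $\pi_j$, both matrices satisfy $\pi_i q_{ij} = \pi_j q_{ji}$, which is what makes these models reversible.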
Sequence Data
Horai et al.'s (1995) sequences from chimpanzee, bonobo, gorilla, and orangutan are used in this analysis. Furthermore, three human sequences are used: a European (Anderson et al. 1981) revised by Horai et al. (1995) and designated as 'Anderson', a Japanese (Ozawa et al. 1991) designated as DCM1, and an African (Horai et al. 1995) designated as SB17F.
The L-strand sequences of the mtDNA of the following protein-encoding regions encoded by the H-strand are used: ND1 (3310-4260 in the numbering of Anderson et al. 1981), ND2 (4473-5510), COI (5907-7442), COII (7589-8266), ATPase 8 (8369-8524), ATPase 6 (8575-9204; the region overlapping with ATPase 8, 8525-8574, was excluded), COIII (9210-9989), ND3 (10062-10403), ND4L (10473-10757), ND4 (10769-12136), ND5 (12340-14145), and Cyt-b (14750-15886). The total number of deduced codons is 3,569. Among these, the number of codons remaining four-fold degenerate during evolution is 1,667. Table 1 gives numbers of transition and transversion differences between species at the four-fold degenerate sites and at the total of the third codon positions.
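
As a quick consistency check, the codon total can be recomputed from the coordinate ranges listed above. The sketch below assumes each range is an inclusive start-end pair in the numbering of Anderson et al. (1981); the dictionary and function names are hypothetical.

```python
# Inclusive (start, end) coordinates of the regions used in the analysis.
REGIONS = {
    "ND1": (3310, 4260),
    "ND2": (4473, 5510),
    "COI": (5907, 7442),
    "COII": (7589, 8266),
    "ATPase 8": (8369, 8524),
    "ATPase 6": (8575, 9204),   # overlap with ATPase 8 (8525-8574) excluded
    "COIII": (9210, 9989),
    "ND3": (10062, 10403),
    "ND4L": (10473, 10757),
    "ND4": (10769, 12136),
    "ND5": (12340, 14145),
    "Cyt-b": (14750, 15886),
}


def codon_count(start, end):
    """Number of whole codons in an inclusive coordinate range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("range length is not a multiple of three")
    return length // 3


CODONS = {name: codon_count(*span) for name, span in REGIONS.items()}
TOTAL_CODONS = sum(CODONS.values())
```

Under this reading the twelve regions add up to the 3,569 codons stated in the text.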
Transition Probability Matrix of the REV Model
We estimated transition probability matrices of the REV model for the time interval of one substitution per 100 sites from the 1,667 sites data and from the 3,569 sites data by the ML method, based on the tree of the five hominoid species with the three human individuals, ((chimp, bonobo), ((Anderson, DCM1), SB17F), gorilla, orang), and they are given in tables 2a and 2b. These tables show that the occurrence of nucleotide substitution is distinctly asymmetric between the two strands of mtDNA. For the four-fold degenerate sites, the rates of G→A and T→C transitions are, respectively, 0.04124/0.00627 = 6.6 and 0.01895/0.00441 = 4.3 times higher on the L-strand (as represented in table 2a) than on the H-strand. This nucleotide substitution bias is roughly consistent with Tanaka and Ozawa's (1994) estimates from the four-fold degenerate sites of the entire mitochondrial genomes of 43 human individuals; that is, the rates of G→A and T→C transitions are 9 and 1.8 times higher on the L-strand than on the H-strand.

Table 2a
The Transition Probability Matrix of the REV Model for a Unit Time Interval (One Substitution per 100 Sites) Estimated by the ML from the Four-fold Degenerate Sites (1,667 Sites)

From \ To   T          C          A          G
T           0.97893    0.01895    0.00175    0.00037
C           0.00627    0.99306    0.00054    0.00013
A           0.00066    0.00061    0.99432    0.00441
G           0.00132    0.00144    0.04124    0.95599
π           0.144      0.434      0.381      0.041

NOTE.-π refers to nucleotide composition.

Table 2b
The Transition Probability Matrix of the REV Model for a Unit Time Interval (One Substitution per 100 Sites) Estimated by the ML from the Total of Third Codon Positions (3,569 Sites)

From \ To   T          C          A          G
T           0.97980    0.01911    0.00086    0.00024
C           0.00707    0.99251    0.00030    0.00012
A           0.00037    0.00035    0.99491    0.00437
G           0.00086    0.00119    0.03702    0.96093
π           0.159      0.429      0.369      0.043

NOTE.-π refers to nucleotide compositions.

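
The strand-asymmetry ratios quoted above, and the reversibility of the estimated matrix, can be checked directly from the entries of table 2a. The following sketch assumes that rows are the original nucleotide and columns the replacing nucleotide; the helper names are hypothetical.

```python
BASES = "TCAG"

# Table 2a: P_4FOLD[i][j] is the probability that i is replaced by j.
P_4FOLD = {
    "T": {"T": 0.97893, "C": 0.01895, "A": 0.00175, "G": 0.00037},
    "C": {"T": 0.00627, "C": 0.99306, "A": 0.00054, "G": 0.00013},
    "A": {"T": 0.00066, "C": 0.00061, "A": 0.99432, "G": 0.00441},
    "G": {"T": 0.00132, "C": 0.00144, "A": 0.04124, "G": 0.95599},
}
PI = {"T": 0.144, "C": 0.434, "A": 0.381, "G": 0.041}
COMPLEMENT = {"T": "A", "A": "T", "C": "G", "G": "C"}


def strand_ratio(p, i, j):
    """Rate of i->j on the L-strand relative to the same change on the H-strand.

    An i->j change on the H-strand appears on the L-strand as
    complement(i)->complement(j), so that entry serves as the denominator.
    """
    return p[i][j] / p[COMPLEMENT[i]][COMPLEMENT[j]]


G_TO_A = strand_ratio(P_4FOLD, "G", "A")   # 0.04124 / 0.00627
T_TO_C = strand_ratio(P_4FOLD, "T", "C")   # 0.01895 / 0.00441

# Each row of a transition probability matrix should sum to one.
MAX_ROW_ERROR = max(abs(sum(P_4FOLD[i].values()) - 1.0) for i in BASES)

# Reversibility (detailed balance): pi_i * P_ij should equal pi_j * P_ji.
MAX_BALANCE_ERROR = max(
    abs(PI[i] * P_4FOLD[i][j] - PI[j] * P_4FOLD[j][i])
    for i in BASES for j in BASES
)
```

Both error terms are small but nonzero because the table entries are rounded to five decimals and the composition to three.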
The ML tree estimated by the REV model for the four-fold degenerate sites data is shown in figure 1, in which only Anderson was used from human; the number of four-fold degenerate sites is 1,669 for this data set. Table 3 gives the branch lengths of the tree of the four-fold degenerate sites estimated by the ML method, which does not assume rate constancy among lineages, based on the REV, TN93, and HKY85 models. The branch leading to human (2-Human) is significantly longer than those leading to the chimpanzees (2-1-Chimp and 2-1-Bonobo). The longer branch of human than of the chimpanzees also holds when either DCM1 or SB17F is used instead of Anderson.
Among the alternative models, we can select the best model by minimizing the Akaike Information Criterion (Akaike 1973, 1974) defined by AIC = -2 × (log-likelihood) + 2 × (number of parameters). The REV, TN93, and HKY85 models gave AICs of 11,253.3, 11,287.7, and 11,349.2 for the four-fold degenerate sites. The REV model thus turned out to be the best among these models in approximating the evolution of the four-fold degenerate sites, and although the TN93 model is inferior to the REV model, it is much better than the HKY85 model.

FIG. 1.-The ML tree of the four-fold degenerate sites based on the REV model of table 2a. The horizontal length of each branch is proportional to the estimated number of substitutions (scale bar: 0.1 substitutions/site). The root of this tree is arbitrarily placed within the 3-orangutan branch. The NucML program in MOLPHY ver. 2.3 (Adachi and Hasegawa 1995a) for the ML inference of DNA or RNA phylogeny was applied.

Fitting of Models to the Four-Fold Degenerate Sites Data
For the alignment of 5 OTUs, $4^5 = 1{,}024$ configurations of nucleotide sites are possible, and the probabilities of the respective configurations were calculated for the four-fold degenerate sites under the respective models with the branch lengths given in table 3. Grouping these configurations into 11 categories, TTTTT, CCCCC, AAAAA, GGGGG, 1-TC-transition (configurations which could arise from one transition between T and C), 1-AG-transition, 1-GT-transversion, 1-GC-transversion, 1-AT-transversion, 1-AC-transversion, and ≥2-changes (configurations which could not arise from fewer than two changes), a $\chi^2$ test for the REV model gave a P value of 0.005 (table 4). Although this value is much better than those for the TN93 and HKY85 models ($< 10^{-6}$), the transition matrix of table 2a still does not approximate satisfactorily the evolution of four-fold degenerate sites. The discrepancy of the REV model with the data is mostly due to more frequent AT-transversions than expected. The numbers of 0-change configurations expected from the REV and TN93 models coincide with the observed ones, suggesting that the site-heterogeneity for the four-fold degenerate sites is not as important as in the case of amino acid sequences studied by Adachi and Hasegawa (1995b). This is as expected because of the freedom of these sites from constraint.

Table 3
Branch Lengths (Numbers of Substitutions per 100 Sites) of the Four-fold Degenerate Nucleotide Sites (1,669 Sites) for the REV, TN93, and HKY85 Models (ML Estimates)

Branch          REV             TN93            HKY85
Including both chimp and bonobo
  1-Chimp       5.71 ± 0.83     5.64 ± 0.83     5.90 ± 0.83
  1-Bonobo      5.66 ± 0.83     5.75 ± 0.84     5.31 ± 0.80
  2-Human       21.54 ± 1.98    21.25 ± 1.97    21.32 ± 1.98
  3-Gorilla     21.10 ± 2.30    21.35 ± 2.34    20.55 ± 2.36
  3-Orang       82.61 ± 7.00    81.66 ± 7.00    86.37 ± 7.64
  2-1           8.35 ± 1.51     8.52 ± 1.52     8.14 ± 1.52
  3-2           8.86 ± 1.91     8.68 ± 1.91     9.45 ± 1.99
  2-1-Chimp     14.06 ± 1.72    14.16 ± 1.73    14.04 ± 1.73
  2-1-Bonobo    14.01 ± 1.72    14.27 ± 1.74    13.45 ± 1.72

NOTE.-The ML estimates of parameters are as follows: $\alpha/\beta = 23.6$ for the HKY85 model, and $(\alpha_Y + \alpha_R)/(2\beta) = 31.0$ and $\alpha_Y/\alpha_R = 0.4$ for the TN93 model. ± refers to 1 SE.

Table 4
Distribution of Configurations of Four-fold Degenerate Sites (1,669 Sites) for the REV, TN93, and HKY85 Models (ML Estimates)

                               REV model              TN93 model             HKY85 model
Configuration       Observed   Expected   (O-E)^2/E   Expected   (O-E)^2/E   Expected   (O-E)^2/E
TTTTT               22         17.1       1.404       20.1       0.180       12.5       7.178
CCCCC               354        348.2      0.097       333.5      1.260       303.5      8.414
AAAAA               388        390.5      0.016       385.7      0.014       417.3      2.057
GGGGG               0          0.3        0.309       0.3        0.348       2.4        2.417
1-TC-transition     365        366.3      0.005       372.5      0.151       389.7      1.566
1-AG-transition     152        153.2      0.009       156.3      0.118       127.9      4.541
1-GT-transversion   1          1.4        0.114       0.8        0.050       0.9        0.011
1-GC-transversion   5          6.7        0.431       6.1        0.198       6.9        0.523
1-AT-transversion   59         45.7       3.871       29.6       29.20       29.0       31.034
1-AC-transversion   104        102.1      0.035       129.7      5.092       130.1      5.236
≥2-changes          219        237.6      1.456       234.4      1.012       248.8      3.569
Total               1,669      1,669.0                1,669.0                1,669.0
χ²                                        7.747                  37.624                 66.546
df                                        1                      4                      5
P                                         0.005                  < 10^-6                < 10^-6

NOTE.-(O-E)^2/E is (Observed - Expected)^2/Expected.

A more detailed study clarifies additional discrepancies of the REV model from the data. The most significant discrepancy is found for the configuration CCCCT (in the order of chimpanzee, bonobo, human, gorilla, and orangutan). The observed numbers of sites of the CCCCT configuration are 68 and 147 for the four-fold degenerate sites and the total third codon positions, respectively, while the expected numbers for the REV model are 96.6 and 222.8. These discrepancies would be due to the unequal base composition of orangutan from the other Hominoidea species. The $\pi_T$'s of SB17F, chimpanzee, bonobo, gorilla, and orangutan are 0.139, 0.154, 0.153, 0.159, and 0.123, respectively, and the $\pi_C$'s are 0.438, 0.424, 0.424, 0.421, and 0.457. Suppose $n_{ij}$ is the number of sites in which gorilla has a base i and orangutan has j in the four-fold degenerate sites (number of sites n = 1,667), and suppose $n_{i*} = \sum_j n_{ij}$ and $n_{*j} = \sum_i n_{ij}$. Then, $n_{T*} - n_{*T} = 60$, suggesting lower T content in the orangutan than in the gorilla. In order to test whether this difference is significant, the variance of this difference is estimated by the following formula (Hasegawa and Kishino 1989),

$$\mathrm{Var}(n_{T*} - n_{*T}) = n_{T*} + n_{*T} - 2n_{TT} - (n_{T*} - n_{*T})^2/n. \tag{3}$$

The SE of $n_{T*} - n_{*T}$ is estimated to be 18.7, indicating that the orangutan has significantly lower T content than the gorilla. A similar analysis for the C content shows that $n_{C*} - n_{*C} = -60$ and its SE is 19.9, suggesting that the orangutan has significantly higher C content than the gorilla. These significant differences of T and C contents of the orangutan from the other hominoids hold not only for the gorilla but also for human, chimpanzee, and bonobo, and these hold for the total third codon positions (data not shown).
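
The variance formula of equation (3) is easy to evaluate once the marginal and diagonal counts are known. The sketch below is only illustrative: the paper reports the difference (60) and its SE (18.7) but not the underlying counts, so the three counts used here are hypothetical values chosen to be compatible with the reported base compositions and SE, not data from the paper.

```python
import math


def composition_difference(n_row, n_col, n_diag, n_sites):
    """Difference n_i* - n_*i and its SE from equation (3).

    n_row  : sites where the first species has base i (n_i*)
    n_col  : sites where the second species has base i (n_*i)
    n_diag : sites where both species have base i (n_ii)
    """
    diff = n_row - n_col
    variance = n_row + n_col - 2 * n_diag - diff ** 2 / n_sites
    return diff, math.sqrt(variance)


# Hypothetical counts for T in gorilla versus orangutan (assumed, not reported):
# roughly 0.159 * 1667 and 0.123 * 1667 for the margins, 59 shared T sites.
DIFF, SE = composition_difference(n_row=265, n_col=205, n_diag=59, n_sites=1667)
Z_SCORE = DIFF / SE
```

With counts of this size the difference is a little over three standard errors from zero, which is the sense in which the text calls the lower T content of the orangutan significant.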
It is apparent that the transition rate between purines is higher than that between pyrimidines by about two times, and in terms of AIC the TN93 model better approximates the 1,669 sites data than the HKY85 model does. As for the branch lengths, however, the estimates from the three models do not differ significantly (table 3), and therefore the estimates of the evolutionary rate and the branching dates would be robust to the choice among these models. For this reason, and because of its simplicity, we shall use the HKY85 model in estimating the heterogeneity of evolutionary rate and the branching dates in Hominoidea.
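
The model comparisons of this section (AIC values and the $\chi^2$ test of table 4) can be re-derived from the tabulated numbers. The sketch below is an illustration, not the authors' code: the function names are hypothetical, the tail probability is computed with a standard recurrence for the incomplete gamma function, and the degrees of freedom are simply taken from table 4. Because the expected counts are rounded to one decimal, the recomputed statistics differ slightly from the printed ones.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # Q(1/2, x/2)
    while a < df / 2.0:
        # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def pearson_chi2(observed, expected):
    """Sum of (O - E)^2 / E over the configuration categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


OBSERVED = [22, 354, 388, 0, 365, 152, 1, 5, 59, 104, 219]
EXPECTED = {
    "REV": [17.1, 348.2, 390.5, 0.3, 366.3, 153.2, 1.4, 6.7, 45.7, 102.1, 237.6],
    "TN93": [20.1, 333.5, 385.7, 0.3, 372.5, 156.3, 0.8, 6.1, 29.6, 129.7, 234.4],
    "HKY85": [12.5, 303.5, 417.3, 2.4, 389.7, 127.9, 0.9, 6.9, 29.0, 130.1, 248.8],
}
DF = {"REV": 1, "TN93": 4, "HKY85": 5}             # as given in table 4
AIC = {"REV": 11253.3, "TN93": 11287.7, "HKY85": 11349.2}

CHI2 = {m: pearson_chi2(OBSERVED, e) for m, e in EXPECTED.items()}
P_VALUE = {m: chi2_sf(CHI2[m], DF[m]) for m in CHI2}
BEST = min(AIC, key=AIC.get)
DELTA_AIC = {m: AIC[m] - AIC[BEST] for m in AIC}
```

The AIC differences relative to the REV model (about 34 for TN93 and about 96 for HKY85) show the same ranking as the $\chi^2$ test.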
Rate Heterogeneity among Lineages
Figure 1 and table 3 suggest higher evolutionary rate in human than in chimpanzee and bonobo. Kishino and Hasegawa (1990) devised a method to estimate simultaneously the evolutionary rate and the branching dates from difference data of nucleotides (such as given in table 1) by the generalized least-squares. The method assumes the HKY85 model and allows rate variation among lineages; that is, $\alpha$ and $\beta$ in equation (1) can differ among branches, and we can assign different rate parameters to different branches. Among the alternative models for rate variation, we can select the best model by minimizing the AIC, in which a penalty is imposed for introducing too many parameters.
By assuming 13 million years for the orangutan separation (Pilbeam 1988; Andrews 1992; McCrossin and Benefit 1993), we estimate the evolutionary rates and the branching dates based on several models for rate variation (table 5). At first, we use the simplest model, which assumes constant rate (Model 1). For this model, the sum of squares of residuals (SSR) is 51.79. SSR follows a $\chi^2$ distribution with the degrees of freedom (df) equal to s(s - 1) (s: number of OTUs) minus the number of free parameters. The P value of this model is 0.02, which is not satisfactorily good. Figure 2 shows the generalized least-squares fitting of the relationship between S/n and V/n, where S and V refer to the numbers of transition and transversion differences between OTUs, and n is the number of nucleotides, based on Model 1. The plots of human vs. the two chimpanzees (node 2) are above the theoretical curves of the HKY85 and REV models. This deviation would likely be due to a change of the pattern of nucleotide substitution in the human lineage relative to the others. It should be noted that, when the transition probability matrix of table 2a holds for all the lineages, all the plots in figure 2 should be distributed along the theoretical curve of the REV model even if the absolute rate differs among different lineages. Model 2 (table 5), which allows rate variation in the human lineage, improves AIC. The estimate of $\beta$ of human in Model 2 ($\beta_2$) does not differ significantly from that of the other branches ($\beta_1$), and hence, in Model 3, $\beta$ of human is assumed to be identical to that of the other branches. This reduction of the number of free parameters by one from Model 2 improves AIC.

Table 5
Branching Dates and Evolutionary Rates Estimated from the Four-fold Degenerate Sites of mtDNA (1,667 Sites)

                              Constant Rate    Rate Change in Human
                              Model 1          Model 2          Model 3
Rates in branches
  Human                       α1, β1           α2, β2           α2, β1
  Other branches              α1, β1           α1, β1           α1, β1
Rates (10^-3/million years)
  α1                          265.3 ± 60.7     203.4 ± 39.6     200.8 ± 36.6
  α2                          -                449.7 ± 120.8    431.6 ± 97.3
  β1                          10.3 ± 0.9       10.1 ± 0.8       10.2 ± 0.8
  β2                          -                11.1 ± 3.0       -
Branching dates (million years)
  Orang/African apes          13               13               13
  Gorilla/(Chimp, Human)      5.57 ± 0.83      5.88 ± 0.71      5.95 ± 0.66
  Human/Chimp                 3.90 ± 0.63      3.38 ± 0.62      3.49 ± 0.51
  Chimp/Bonobo                1.30 ± 0.27      1.54 ± 0.25      1.56 ± 0.24
  SB17F/(Anderson, DCM1)      0.11 ± 0.03      0.07 ± 0.02      0.07 ± 0.02
  Anderson/DCM1               0.08 ± 0.03      0.05 ± 0.02      0.05 ± 0.02
SSR                           51.79            38.24            38.36
df                            32               30               31
P                             0.02             0.15             0.17
AIC                           163.64           154.08           152.21 (minimum AIC)

NOTE.-± is 1 SE. SSR refers to the sum of squares of residuals in the generalized least squares of D = (. . . , V_ij, . . . , S_ij, . . .), where V_ij and S_ij are numbers of transversion and transition differences between species i and j given in table 1.

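
The goodness-of-fit figures of table 5 can be approximately re-derived from the SSR and df values. The sketch below is an illustration under stated assumptions: the function name is hypothetical, s = 7 counts the seven sequences of table 1, and reading s(s - 1) = 42 as the number of fitted data points (a transition count and a transversion count for each of the 21 pairs) is an interpretation rather than something the text states. The computed tail probabilities come out close to, though not always identical with, the rounded P values in the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)               # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))    # Q(1/2, x/2)
    while a < df / 2.0:
        # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


N_OTUS = 7
N_DATA = N_OTUS * (N_OTUS - 1)    # s(s - 1) = 42

# SSR, df and AIC as tabulated for the three rate-variation models.
MODELS = {
    "Model 1": {"ssr": 51.79, "df": 32, "aic": 163.64},
    "Model 2": {"ssr": 38.24, "df": 30, "aic": 154.08},
    "Model 3": {"ssr": 38.36, "df": 31, "aic": 152.21},
}

for model in MODELS.values():
    # Number of free parameters implied by df = s(s - 1) - parameters.
    model["free_parameters"] = N_DATA - model["df"]
    model["p_value"] = chi2_sf(model["ssr"], model["df"])

BEST = min(MODELS, key=lambda name: MODELS[name]["aic"])
```

Model 3 has one free parameter fewer than Model 2 at almost the same SSR, which is why it attains the smaller AIC.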
Introduction of more complicated models does not improve AIC, and it turned out that Model 3 is the best model in approximating the data of four-fold degenerate sites. The P value for Model 3 is as high as 0.17. The estimates of Model 3 are 6.0 ± 0.7, 3.5 ± 0.5, and 1.6 ± 0.2 million years old (± 1 SE), respectively, for the separations of gorilla, human/chimpanzee, and chimpanzee/bonobo. These estimates are all consistent with those estimated by the ML from the amino acid sequences (Adachi and Hasegawa 1995b).

FIG. 2.-Generalized least-squares fitting of the relationship between S/n and V/n for the 1,669 sites data of four-fold degenerate sites based on the HKY85 model (solid line). The dotted line represents the theoretical curve for the REV model. Vertical and horizontal lines indicate 2 SDs of S/n and V/n, respectively. The numbering corresponds to that of nodes in figure 1, and node 4 corresponds to the orangutan separation.

The estimates of transition rate $u_s = 2(\pi_T\pi_C + \pi_A\pi_G)\alpha$ are 0.0674 ± 0.0152 and 0.0314 ± 0.0057/million years/site for the human and the other lineages, respectively. The estimate of transversion rate $u_v = 2(\pi_T + \pi_C)(\pi_A + \pi_G)\beta$ is 0.0050 ± 0.0004/million years/site for Model 3. Therefore, the estimates of the total substitution rate, $u = u_s + u_v$, of four-fold degenerate sites are 0.0724 ± 0.0154 and 0.0363 ± 0.0059/million years/site, respectively, for human and the African apes.
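
The rate formulas above can be evaluated numerically from the Model 3 parameters of table 5. This sketch assumes that the nucleotide composition of table 2a is the one entering the formulas, which the text does not state explicitly; the function names are hypothetical.

```python
# Nucleotide composition of the four-fold degenerate sites (table 2a);
# using it here is an assumption.
PI = {"T": 0.144, "C": 0.434, "A": 0.381, "G": 0.041}

# Model 3 estimates of table 5, converted from 10^-3 per million years.
ALPHA_HUMAN = 431.6e-3
ALPHA_OTHER = 200.8e-3
BETA = 10.2e-3


def transition_rate(pi, alpha):
    """u_s = 2 (pi_T pi_C + pi_A pi_G) alpha, per site per million years."""
    return 2.0 * (pi["T"] * pi["C"] + pi["A"] * pi["G"]) * alpha


def transversion_rate(pi, beta):
    """u_v = 2 (pi_T + pi_C)(pi_A + pi_G) beta, per site per million years."""
    return 2.0 * (pi["T"] + pi["C"]) * (pi["A"] + pi["G"]) * beta


US_HUMAN = transition_rate(PI, ALPHA_HUMAN)
US_OTHER = transition_rate(PI, ALPHA_OTHER)
UV = transversion_rate(PI, BETA)
U_HUMAN = US_HUMAN + UV
U_OTHER = US_OTHER + UV
```

Under this assumption the computed rates agree with the quoted point estimates to the precision given in the text.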
The ratio of the transition rate parameter $\alpha$ of human to that of the others is 2.15 ± 0.42, suggesting significant acceleration of the transition rate in human. A similar analysis of the total third position data (3,569 sites) also chooses Model 3 as the best (Adachi 1995), and estimates the ratio to be 2.12 ± 0.31. However, the fit of this model to the 3,569 sites data (P = 0.04) is not as good as to the 1,667 sites data (P = 0.17), probably because of the heterogeneity of the rate between third positions of two-fold and four-fold degenerate codons.
Another remarkable feature in figure 2 is that the plots for the orangutan divergence are located significantly below the theoretical curves. By using the 4.8-kb data of Horai et al. (1992) with siamang as an outgroup, Adachi and Hasegawa (1995c) suggested that the transversion rate in the orangutan lineage is higher by about 1.5-fold than in the African apes/human clade, while the transition rate in the orangutan does not differ significantly from the others. The deviation of the plots for the orangutan divergence might be due to this putative difference of the substitution pattern in orangutan from the others.
For this reason, the above estimates of the branching dates within the African apes/human clade might be biased to be young. In order to confirm this and to obtain more reliable estimates, the entire sequence data of siamang or gibbon are needed. Nevertheless, the relative values of the estimates within the African apes/human clade would be reliable, and as long as the gorilla branching was not older than 6-7 million years, the human/chimpanzee branching would be some 4-5 million years old. A higher evolutionary rate in the orangutan lineage than in the African ape lineages was suggested also for the amino acid sequence level (Horai et al. 1992; Adachi and Hasegawa 1995b).
In contrast to the obvious acceleration of the transition rate in human, there is no indication of a higher transversion rate in human than in other hominoids. Furthermore, the numbers of amino acid differences in the 3,569 sites of mtDNA-encoded proteins of gorilla from Anderson, DCM1, SB17F, chimpanzee, and bonobo are 209, 211, 213, 220, and 212, respectively, indicating no higher amino acid substitution rate in human than in chimpanzees.
In spite of the higher transitional mutation rate in human mtDNA, the average amino acid substitution rate seems not to differ significantly between human and chimpanzees, probably because of higher constraints operating on the mtDNA-encoded proteins of human. On the other hand, it is clear that the transversion rate of four-fold degenerate sites and the amino acid substitution rate are higher in orangutan than in the others (Adachi and Hasegawa 1995b, 1995c). The transition rate may also be higher in orangutan than in the African apes, but this is not obvious, probably because of saturation. It is noteworthy that the length of 0.078 ± 0.009 per site for the 4-orang branch (node 4 corresponds to the orangutan separation) estimated by Adachi and Hasegawa (1995b) from the amino acid sequence data including siamang (Horai et al. 1992) is significantly longer than that of 0.054 ± 0.007 for 4-siamang (5-siamang plus 5-4; node 5 refers to the siamang separation), while the estimated length of 0.563 ± 0.089 per site for 4-orang from the four-fold degenerate sites is shorter than that of 1.252 ± 0.158 for 4-siamang (Adachi and Hasegawa 1995c). The ratio of 4-siamang/4-orang for the four-fold degenerate sites is 2.23 ± 0.45, and that for the amino acid sequences is 0.69 ± 0.12, which is significantly smaller than the former. If the substitution rate of four-fold degenerate sites represents the mutation rate, these observations indicate that the mutation rate differs among different lineages and that the extent of constraints operating on proteins also differs among lineages in a different way from that of the mutation rate.
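
The two branch-length ratios and their standard errors can be approximated with a first-order (delta-method) calculation. This is a sketch under the assumption that the two branch-length estimates in each ratio are independent; the paper does not say how its SEs were obtained, and the function name is hypothetical.

```python
import math


def ratio_with_se(num, num_se, den, den_se):
    """Ratio num/den and its first-order SE, treating the estimates as independent."""
    ratio = num / den
    relative_variance = (num_se / num) ** 2 + (den_se / den) ** 2
    return ratio, ratio * math.sqrt(relative_variance)


# 4-siamang / 4-orang branch lengths quoted in the text.
FOURFOLD = ratio_with_se(1.252, 0.158, 0.563, 0.089)
AMINO_ACID = ratio_with_se(0.054, 0.007, 0.078, 0.009)

# Approximate z score for the difference between the two ratios.
DIFFERENCE = FOURFOLD[0] - AMINO_ACID[0]
DIFFERENCE_SE = math.hypot(FOURFOLD[1], AMINO_ACID[1])
Z_SCORE = DIFFERENCE / DIFFERENCE_SE
```

Under this approximation the ratios and SEs land close to the quoted 2.23 ± 0.45 and 0.69 ± 0.12, and the two ratios differ by more than three standard errors.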
Discussion
Various hypotheses have been proposed to explain the apparently higher rate of mtDNA evolution in warm-blooded vertebrates than in cold-blooded ones (Thomas and Beckenbach 1989; Adachi, Cao, and Hasegawa 1993; Martin, Naylor, and Palumbi 1992; Martin and Palumbi 1993). It is known that oxygen radicals damage DNA, and oxidative damage is greatest to mtDNA (Richter, Park, and Ames 1988). Although no single factor can explain all variations in rates of mtDNA evolution, species with higher metabolic rates, and accordingly with a higher content of oxygen radicals, are likely to have a higher mutation rate of mtDNA (Martin, Naylor, and Palumbi 1992; Martin and Palumbi 1993). The rate difference demonstrated in this work, however, is between the closely related species of human and chimpanzee, with presumably similar metabolic rates, and the mechanism of this difference remains to be studied. In this context, the higher rate of oxygen radical production in rat liver mitochondria than in mouse (Sohal, Svensson, and Brunk 1990) is interesting. This might explain why rat has a higher evolutionary rate of mtDNA than mouse in spite of its larger body size and presumably lower metabolic rate (Martin and Palumbi 1993).
It is now clear that no universal clock for the evolution of mtDNA can be assumed in phylogenetic analyses, and this undermines the attempt of dating by using the simple clock. The dating is justified only by careful analyses taking account of the possible rate variation among lineages (Kishino and Hasegawa 1990). Even with these analyses, we must take the estimates as approximate, because the model we use is always approximate. Furthermore, there is always ambiguity in calibrating the clock. If the orangutan separation was 16 million years old, which is probably the oldest limit (Pilbeam 1988; Andrews 1992; McCrossin and Benefit 1993), rather than 13 million years, the estimate of the human/chimpanzee separation from the 1,667 sites data becomes 4.3 ± 0.6 million years old. Taking account of the analyses of the four-fold degenerate sites and of the amino acid sequences (Adachi and Hasegawa 1995b, 1995c) as well, the overall evidence seems to suggest that the human/chimpanzee separation in the mtDNA tree was some 4-5 million years old. Although some nuclear genes suggest earlier divergence between human and chimpanzee (Kishino and Hasegawa 1990), the discrepancy can be regarded as due to ancestral polymorphism of the nuclear genes (Hasegawa 1991; Adachi and Hasegawa 1995b).
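
The effect of the calibration date on the human/chimpanzee estimate can be illustrated with a simple proportional rescaling. This sketch assumes that, with the rate model held fixed, every date estimate and its SE scale linearly with the assumed orangutan separation; that is an assumption made for illustration, and the function name is hypothetical.

```python
def rescale_date(date, se, old_calibration, new_calibration):
    """Rescale a date estimate and its SE to a different calibration point."""
    factor = new_calibration / old_calibration
    return date * factor, se * factor


# Model 3 human/chimpanzee estimate of table 5 (3.49 ± 0.51 million years,
# calibrated at 13 million years), rescaled to a 16-million-year calibration.
HUMAN_CHIMP_16 = rescale_date(3.49, 0.51, old_calibration=13.0, new_calibration=16.0)
```

The rescaled value is consistent with the 4.3 ± 0.6 million years quoted above.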
The higher transitional mutation rate of human suggested in this work is closely relevant to the dating of the deepest root of the human mtDNA tree, and is in favour of the recent origin hypothesis of modern humans (Cann, Stoneking, and Wilson 1987; Vigilant et al. 1991; Hasegawa et al. 1993), because, if this suggestion is real, the dating by a constant rate clock with the human/chimpanzee separation as a reference must give an older estimate. Horai et al.'s (1993) analysis of the control region sequences from 193 humans of various ethnic origins revealed that the African individual SB17F had the most diverged sequence, which coalesced directly into the deepest root of the human mtDNA tree. We estimate the date of the deepest root to be 70,000 ± 20,000 years old (the separation of SB17F from Anderson/DCM1) from the best model (Model 3) in table 5. This age might seem amazingly young, and a shortcoming of the model might have made this a younger estimate than the truth. Model 3 assumes a constant rate clock in the human lineage after the human/chimpanzee separation, with a higher rate than in the chimpanzee lineage; that is, the rate change is assumed to have occurred discontinuously at the point of the human/chimpanzee separation, whereas it must have occurred more continuously in the real evolutionary process. However, we cannot estimate the time period of the rate change and its pattern directly from the data presently available, though the dating is dependent on such details of a model.
From the nonsynonymous sites and the RNA genes, Horai et al. (1995) estimated 450,000 ± 70,000 years old for the deepest root of the human mtDNA tree, while they estimated 143,000 ± 18,000 years from the synonymous sites. Since the relative rate test by interspecies comparison of amino acid differences of mtDNA-encoded proteins does not suggest a higher rate on average in humans than in chimpanzees, the discrepancy between the two estimates might suggest that the acceleration of amino acid substitution has occurred in the human lineage quite recently in the evolutionary time scale, long after the human/chimpanzee separation. Takahata (1993) proposed the hypothesis that the relaxation of selective constraint began with the emergence of Homo sapiens. The above finding is highly interesting in this respect. In this way, the evolutionary rate seems to have changed in a complicated way. Although there remain several uncertainties on the date of the deepest root of the human mtDNA tree, our analysis would reinforce the recent origin hypothesis of modern humans (Cann, Stoneking, and Wilson 1987).
Acknowledgments
We thank H. Kishino and J. Reeves for helpful comments and discussions, and two anonymous reviewers
for comments which contributed to the improvement of the manuscript. This paper constitutes a part of
the research done by J.A. in partial fulfillment of a Ph.D. degree at The Graduate University for
Advanced Studies. This work was supported by grants from the Ministry of Education, Science, and
Culture of Japan to M.H.
LITERATURE CITED
ADACHI, J. 1995. Modeling of Molecular Evolution and Maximum Likelihood Inference of Molecular Phylogeny. Ph.D.
dissertation, The Graduate University for Advanced Studies.
ADACHI, J., Y. CAO, and M. HASEGAWA. 1993. Tempo and
mode of mitochondrial DNA evolution in vertebrates at the
amino acid sequence level: rapid evolution in warm-blooded vertebrates. J. Mol. Evol. 36:270-281.
ADACHI, J., and M. HASEGAWA. 1995a. MOLPHY: Programs for Molecular Phylogenetics, ver. 2.3.
Institute of Statistical Mathematics, Tokyo.
-. 1995b. Improved dating of the human-chimpanzee separation in the mitochondrial DNA tree:
heterogeneity among amino acid sites. J. Mol. Evol. 40:622-628.
-. 1995c. Time scale for the mitochondrial DNA tree of human evolution. Pp. 46-68 in S. BRENNER
and K. HANIHARA, eds. The origin and past of modern humans as viewed from DNA. World Scientific
Publications, Singapore.
AKAIKE, H. 1973. Information theory and an extension of the maximum likelihood principle.
Pp. 267-281 in B. N. PETROV and F. CSAKI, eds. Second International Symposium on Information
Theory, Akademiai Kiado, Budapest.
-. 1974. A new look at the statistical model identification. IEEE Trans. Autom. Contr. 19:716-723.
ANDERSON, S., A. T. BANKIER, B. G. BARRELL et al. (14 coauthors). 1981. Sequence and organization
of the human mitochondrial genome. Nature 290:457-464.
ANDREWS, P. 1992. Evolution and environment in the Hominoidea. Nature 360:641-646.
CANN, R. L., M. STONEKING, and A. C. WILSON. 1987. Mitochondrial DNA and human evolution.
Nature 325:31-36.
COLLINS, T. M., P. H. WIMBERGER, and G. J. P. NAYLOR. 1994. Compositional bias, character-state
bias, and character-state reconstruction using parsimony. Syst. Biol. 43:482-496.
GOLDMAN, N. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to
a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool. 39:345-361.
HASEGAWA, M. 1991. Molecular phylogeny and man’s place in Hominoidea. J. Anthrop. Soc. Nippon
99:49-61.
HASEGAWA, M., A. DI RIENZO, T. D. KOCHER, and A. C. WILSON. 1993. Toward a more accurate time scale for the human mitochondrial DNA tree. J. Mol. Evol. 37:347-354.
HASEGAWA, M., and H. KISHINO. 1989. Heterogeneity of tempo and mode of mitochondrial DNA evolution among mammalian orders. Jpn. J. Genet. 64:243-258.
HASEGAWA, M., H. KISHINO, and T. YANO. 1985. Dating of
the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160-174.
HORAI, S., K. HAYASAKA, R. KONDO, K. TSUGANE, and N. TAKAHATA. 1995. The recent African origin
of modern humans revealed by complete sequences of hominoid mitochondrial DNAs. Proc. Natl.
Acad. Sci. USA 92:532-536.
HORAI, S., R. KONDO, Y. NAKAGAWA-HATTORI, S. HAYASHI, S. SONODA, and K. TAJIMA. 1993. Peopling
of the Americas, founded by four major lineages of mitochondrial DNA. Mol. Biol. Evol. 10:23-47.
HORAI, S., Y. SATTA, K. HAYASAKA, R. KONDO, T. INOUE, T. ISHIDA, S. HAYASHI, and N. TAKAHATA.
1992. Man’s place in Hominoidea revealed by mitochondrial DNA genealogy. J. Mol. Evol.
35:32-43; Erratum 37:89 (1993).
KISHINO, H., and M. HASEGAWA. 1990. Converting distance to
time: an application to human evolution. Methods Enzymol.
183:550-570.
KONDO, R., S. HORAI, Y. SATTA, and N. TAKAHATA. 1993. Evolution of hominoid mitochondrial DNA
with special reference to the silent substitution rate over the genome. J. Mol. Evol. 36:517-531.
MARTIN, A. P., G. J. P. NAYLOR, and S. R. PALUMBI. 1992. Rates of mitochondrial DNA evolution in
sharks are slow compared with mammals. Nature 357:153-155.
MARTIN, A. P., and S. R. PALUMBI. 1993. Body size, metabolic rate, generation time, and the
molecular clock. Proc. Natl. Acad. Sci. USA 90:4087-4091.
MCCROSSIN, M. L., and B. R. BENEFIT. 1993. Recently recovered Kenyapithecus mandible and its
implications for great ape and human origins. Proc. Natl. Acad. Sci. USA 90:1962-1966.
OZAWA, T., M. TANAKA, H. INO et al. (14 co-authors). 1991. Distinct clustering of point mutations
in mitochondrial DNA among patients with mitochondrial encephalomyopathies and Parkinson’s
disease. Biochem. Biophys. Res. Commun. 176:938-946.
PERNA, N. T., and T. D. KOCHER. 1995. Unequal base frequencies and the estimation of substitution rates. Mol. Biol.
Evol. 12:359-361.
PILBEAM, D. 1988. Human origins and evolution. Pp. 89-114
in A. C. Fabian, ed. Origins. Cambridge Univ. Press, Cambridge.
RICHTER, C., J.-W. PARK, and B. N. AMES. 1988. Normal oxidative damage to mitochondrial and
nuclear DNA is extensive. Proc. Natl. Acad. Sci. USA 85:6465-6467.
SACCONE, C., C. LANAVE, G. PESOLE, and G. PREPARATA. 1990. Influence of base composition on
quantitative estimates of gene evolution. Methods Enzymol. 183:570-583.
SOHAL, R. S., I. SVENSSON, and U. T. BRUNK. 1990. Hydrogen
peroxide production by liver mitochondria in different species. Mech. Ageing Dev. 53:209-215.
TAKAHATA, N. 1993. Relaxed natural selection in human populations during the Pleistocene. Jpn. J. Genet. 68:539-547.
TAMURA, K. 1994. Model selection in the estimation of the number of nucleotide substitutions.
Mol. Biol. Evol. 11:154-157.
TAMURA, K., and M. NEI. 1993. Estimation of the number of nucleotide substitutions in the control
region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10:512-526.
TANAKA, M., and T. OZAWA. 1994. Strand asymmetry in human mitochondrial DNA mutations.
Genomics 22:327-335.
THOMAS, W. K., and A. T. BECKENBACH. 1989. Variation in salmonid mitochondrial DNA: evolutionary
constraints and mechanisms of substitution. J. Mol. Evol. 29:233-245.
VIGILANT, L., M. STONEKING, H. HARPENDING, K. HAWKES, and A. C. WILSON. 1991. African populations
and the evolution of human mitochondrial DNA. Science 253:1503-1507.
YANG, Z. 1994. Estimating the pattern of nucleotide substitution. J. Mol. Evol. 39:105-111.
TAKASHI GOJOBORI, reviewing editor
Accepted August 18, 1995