Workshop Handouts - UNC Center for Bioinformatics

advertisement
Introduction to MEGA
Download at: http://www.megasoftware.net/index.html
Manual at: www.megasoftware.net/mega4
Thomas Randall, PhD
tarandal@email.unc.edu
Use of phylogenetic analysis
software tools
Bioinformatics software for biologists
in the genomics era
Sudhir Kumar and Joel Dudley
Bioinformatics 23: 1713-1717
Fig 1(B) Relative impacts of evolutionary analysis software packages over the last 10 years. Only non-commercial software packages
available on-line (without fee) are included, except for two available for a nominal fee (shown with dashed line). Data for both panels were
obtained from the Web of Science (February 2007 edition). For panel B, the numbers of new citation were generated using the ‘Cited
References’ facility with the search arguments for author name, cited work and citation year kindly provided by Joe Felsenstein for MEGA
(www.megasoftware.net), PAUP (paup.csit.fsu.edu), PHYLIP (evolution.genetics.washington.edu/phylip.html), MrBayes
(mrbayes.csit.fsu.edu), Puzzle (www.tree-puzzle.de), PhyML (atgc.lirmm.fr/phyml) andPAML (abacus.gene.ucl.ac.uk/software/paml.html).
MEGA contains all elements
necessary for building a tree
• Import and editing
sequence/chromatographs
• Clustalw for alignment
• Various options for contructing a
phylogeny
• Several options for generating statistical
significance
• Tree viewing function
Basic steps to build a
phylogeny
1.
2.
3.
4.
5.
Import and Align sequences
Select tree building option
Select distance matrix
Choose type of bootstrapping
Manipulate tree with tree viewer
Phylogeny options in MEGA4
•
•
•
•
UPGMA
Neighbor joining
Minimum evolution
Maximum parsimony
Distance methods
General rules
build tree with two independent methodologies for confirmation
in MEGA - one distance method plus parsimony
Maximum parsimony less effective for more distantly related sequences due to
homoplasy (multiple substitutions at same site can accumulate over time)
WARNING: “Phylogenetics has a long history of heated arguments about the relative merits
of different methods—researchers in the field seem preadapted for ideological warfare—”
Huelsenbeck et al., Syst. Biol. 51: 673
UPGMA (Unweighted Pair
Group Method with Arithmatic
Mean )
• UPGMA employs a sequential clustering algorithm (neighbor
joining), in which pairwise distances between sequences are
computed, and the phylogenetic tree is built in a stepwise manner.
We first identify from among all the sequences the two that are most
similar to each other and then treat these as a new single branch.
Subsequently from among the remaining sequences we identify the
pair with the highest similarity, and so on.
• Assumes equal evolutionary rates (a clock)
Neighbor-Joining
An algorithm for constructing phylogenetic trees using distance
data. Once a distance measurement between a set of
sequences has been determined, a neighbor joining algorithm
will find the two closest, group them, then look for the next
closest until all sequences are fit into a tree. Different algorithms
for doing this have been written that either do or do not consider
evolutionary distance.
Examples: clustalw, UPGMA, neighbor (phylip)
Difference between this and UPGMA (also a neighbor joining method)
Is it does not assume a constant evolutionary rate in all lineages
Minimum evolution
All possible trees are produced, the tree with the smallest total branch
Length is chosen as the best tree. Branch length is proportional to the
distance between each sequence.
Maximum Parsimony
–
The selection of the phylogenetic tree requiring the least
number of substitutions from among all possible phylogenetic
trees as the most likely to be the true phylogenetic tree.
–
Usefulness declines with increasing evolutionary distance
Johns Hopkins University - Fall 2003
Phylogenetics & Computational Genomics - 410.640.71
Informative sites in parsimony
OTU 1
2
3
4
1
T
T
T
T
2
C
T
T
T
3
A
A
C
C
4
G
G
G
T
5
A
A
A
A
6
T
A
T
A
7
C
C
C
G
8
T
T
G
G
9
A
A
A
A
10 Sites
G
G
G
C
Invariant sites are not used in parsimony (they yield no information on character
state changes)
Informative sites (at least two different kinds of residues – each present at least two
times) are used by parsimony because they discriminate between topologies –
i.e. different topologies require different numbers of changes between residues
Singleton sites can not be used to discriminate between topologies (they require 1
change for all topologies)
Lecture #7
Page 3
Maximum parsimony (MP)
options
•
Exhaustive
Not an option here, but all possible trees are searched, practically this takes too much time so
various shorcuts (branch and bound, heuristic) have been developed
•
Branch and bound
This is a method of searching through tree space in order to find optimal trees. It is not exhaustive,
trees with a total length longer than those already examined are not considered, reducing the
complexity of the search. Guaranteed to find all MP trees. Becomes time consuming if more than
20 sequences are considered
•
Heuristic
Another approximate search, still using a branch and bound approach but making more
assumptions. More useful for larger trees but no guarantee of finding the MP tree with the shortest
length
•
CNI (Close-Neighbor-Interchange)
In any method, examining all possible topologies is very time consuming. This algorithm reduces
the time spent searching by first producing a temporary tree, and then examining all of the
topologies that are different from this temporary tree by a topological distance of dT = 2 and 4. If
this is repeated many times, and all the topologies previously examined are avoided, one can
usually obtain the tree being sought.
*
Statistical tests of significance
Bootstrapping
*
This is a method of attempting to estimate confidence levels of inferred relationships. The bootstrap
proceeds by resampling the original data matrix with replacement of the characters. It is analagous to
cutting the data matrix into individual columns of data and throwing the characters into a hat. A
character is then drawn at random from this hat and it becomes the first character of the new
datamatrix. The character is then replaced in the hat, the hat is shaken and again another character is
drawn from the hat. This process is repeated until our new pseudoreplicate is the same size as the
original. Some characters will be sampled more than once and some will not be sampled at all. This
process is repeated many times (say, 100-1,000) and phylogenies are reconstructed each time. After
the bootstrap procedure is finished, a majority-rule consensus tree is constructed from the optimal tree
from each bootstrap sample. The bootstrap support for any internal branch is the number of times it was
recovered during the bootstrapping procedure.
Interior Branch Test
Similar to bootstrapping but is unwieldy with a large number of taxa. A t-test, which is computed using the
bootstrap procedure, is constructed based on the interior branch length and its standard error and is
available only for the NJ and Minimum Evolution trees. MEGA shows the confidence probability in the
Tree Explorer; if this value is greater than 95% for a given branch, then the inferred length for that
branch is considered significantly positive.
Other phylogeny software
PHYLIP
MrBayes: Bayesian
Inference of Phylogeny
TREE-PUZZLE 5.2:
Maximum likelihood analysis
MEGA has no ability to do either maximum likelihood analysis
or bayesian inference. These are more sophisticated, and
computationally intensive (and can be more accurate for distantly
related sequences)
Distance
Distance is a phylogenetic method that considers the additive differences between
either nucleotides or amino acids along the entire length of sequence. A
distance measurement is made considering each type of substitution (either
transversion or transition) weighted differently, depending on the distance
algorithm and weighting matrix used. As distances are re-computed for all
possible pairs of sequence during each step of the assembly this can be
computationally intensive.
12 898
Homo_sapie
Pan
Gorilla
Pongo
Hylobates
Macaca_fus
M_mulatta
M_fascicul
M_sylvanus
Saimiri_sc
Tarsius_sy
Lemur_catt
AAGCTTCACC
AAGCTTCACC
AAGCTTCACC
AAGCTTCACC
AAGCTTTACA
AAGCTTTTCC
AAGCTTTTCT
AAGCTTCTCC
AAGCTTCTCC
sAAGCTTCAC
aAAGTTTCAT
AAGCTTCATA
GGCGCAGTCA
GGCGCAATTA
GGCGCAGTTG
GGCGCAACCA
GGTGCAACCG
GGCGCAACCA
GGCGCAACCA
GGCGCAACCA
GGTGCAACTA
CGGCGCAATG
TGGAGCCACC
GGAGCAACCA
TTCTCATAAT
TCCTCATAAT
TTCTTATAAT
CCCTCATGAT
TCCTCATAAT
TCCTTATGAT
TCCTCATGAT
CCCTTATAAT
TCCTTATAGT
ATCCTAATAA
ACTCTTATAA
TTCTAATAAT
CGCCCACGGG
CGCCCACGGA
TGCCCACGGA
TGCCCATGGA
CGCCCACGGA
CGCTCACGGA
TGCTCACGGA
CGCCCACGGG
TGCCCATGGA
TCGCTCACGG
TTGCCCATGG
CGCACATGGC
CTTACATCCT
CTTACATCCT
CTTACATCAT
CTCACATCCT
CTAACCTCTT
CTCACCTCTT
CTCACCTCTT
CTCACCTCTT
CTCACCTCTT
GTTTACTTCG
CCTCACCTCC
CTTACATCAT
12
Homo_sapie
0.288560
Pan
0.315343
Gorilla
0.291143
Pongo
0.309930
Hylobates
0.297051
Macaca_fus
0.036582
M_mulatta
0.000000
M_fascicul
0.098273
M_sylvanus
0.129816
Saimiri_sc
-1.000000
Tarsius_sy
-1.000000
Lemur_catt
0.393103
0.000000
0.310181
0.094328
0.339246
0.110803
0.329470
0.182639
0.330862
0.210562
0.322962
0.286715
0.088360
0.288560
0.098273
0.310181
0.000000
0.321059
0.133409
-1.000000
-1.000000
-1.000000
-1.000000
0.431062
0.407353
0.094328
0.321059
0.000000
0.311692
0.113612
0.304045
0.195508
0.302154
0.219479
0.301975
0.303507
0.135182
0.315343
0.129816
0.339246
0.133409
0.311692
0.000000
-1.000000
-1.000000
-1.000000
-1.000000
0.432920
0.390241
0.110803
-1.000000
0.113612
-1.000000
0.000000
-1.000000
0.189484
-1.000000
0.219367
-1.000000
0.292586
-1.000000
0.291143
-1.000000
0.329470
-1.000000
0.304045
-1.000000
-1.000000
0.000000
-1.000000
0.483555
0.403571
-1.000000
0.182639
-1.000000
0.195508
-1.000000
0.189484
-1.000000
0.000000
-1.000000
0.220062
-1.000000
0.306528
-1.000000
0.309930
-1.000000
0.330862
-1.000000
0.302154
-1.000000
-1.000000
0.483555
-1.000000
0.000000
0.401607
-1.000000
0.210562 0.286715
0.431062
0.219479 0.303507
0.432920
0.219367 0.292586
0.403571
0.220062 0.306528
0.401607
0.000000 0.308618
0.407699
0.308618 0.000000
0.382417
0.297051 0.036582
0.393103
0.322962 0.088360
0.407353
0.301975 0.135182
0.390241
-1.000000 -1.000000
-1.000000
-1.000000 -1.000000
-1.000000
0.407699 0.382417
0.000000
matrix listing all pairwise differences
DNA Distance matrices
A
G
C
T
Jukes-Cantor distance
In the Jukes-Cantor model, the rate of nucleotide
substitution is the same for all pairs of the four
nucleotides A, T, C, and G.
Many more models, with increasing complexity
Distance matrices in Mega
Kimura 2-parameter distance
Kimura’s two parameter model corrects for different substitution
rates between transitions (i.e. purine to purine) and transversions (i.e. purine to pyrimidine).
Tamura-Nei distance
The Tamura-Nei model (1993) corrects for multiple hits, taking into account the
differences in substitution rate between nucleotides and the inequality of nucleotide
frequencies. It distinguishes between transitional substitution rates between purines and
transversional substitution rates between pyrimidines. It also assumes equality of
substitution rates among sites (see related gamma model).
Also:
# differences
Tamura 3-parameter
LogDet
Which DNA distance matrix is
appropriate?
–
When the Jukes-Cantor * estimate of the number of nucleotide substitutions per site (d)
between different sequences is about 0.05 or less (d < 0.05), use the Jukes-Cantor distance
whether there is a transition/transversion bias or not or whether the substitution rate (l) varies
with nucleotide site or not. In this case, the Kimura distance or the gamma distance gives
essentially the same value as the Jukes-Cantor distance. One may also use the p-distance
for constructing a topology.
–
When 0.05 < d < 0.3, use the Jukes-Cantor distance unless the transition/transversion ratio
(R) is high, say R >5. When this ratio is high and the number of nucleotides examined is
large, (>10K) use the Kimura distance or the gamma distances for Kimura's 2-parameter
model.
–
When 0.3 < d < 1 and there is evidence that l varies extensively with site, use gamma
distances. In general, one may choose different gamma distances, estimating a from data.
–
When 0.3 < d < 1 and the frequencies of the four nucleotides (A, T, C, G) deviate
substantially from equality but there is no strong transition/transversion bias, use the TajimaNei distance. When there are strong transition/transversion and G+C content biases, use the
Tamura or Tamura-Nei distance.
–
When d > 1 for many pairs of sequences, the phylogenetic tree estimated is not reliable for a
number of reasons (e.g., large standard errors of d's and sequence alignment errors). We
therefore suggest that these sets of data should not be used.
Protein Distance matrices
in Mega
•
p-distance
This distance is the proportion (p) of amino acid sites at which the two sequences to be compared
are different. It is obtained by dividing the number of amino acid differences by the total number of
sites compared. It does not make any correction for multiple substitutions at the same site or
differences in evolutionary rates among sites.
•
Equal Input Model (Amino acids)
In real data, frequencies usually vary among different kind of amino acids. In this case, the
correction based on the equal input model gives a better estimate of the number of amino acid
substitutions than the Poisson correction distance. Note that this assumes an equality of
substitution rates among sites and the homogeneity of substitution patterns between lineages.
•
Poisson correction
The Poisson correction distance assumes equality of substitution rates among sites and equal
amino acid frequencies while correcting for multiple substitutions at the same site.
•
PAM & JTT
The PAM and JTT distances correct for multiple substitutions based on a model of amino acid
substitution described as substitution-rate matrices.
*
ModelTest does a likelihood analysis on your data to determine
The most appropriate DNA substitution matrix.
WARNING: only for advanced users, also requires PAUP for an input
FindModel – web based version of ModelTest
Input is a concatenated fasta file
http://hcv.lanl.gov/content/hcv-db/findmodel/findmodel.html
Result:
MODEL CONSIDERED:
JC : Jukes-Cantor (model 1)
AIC1 = 27875.89594 lnL = -13937.947970
FindModel output
JC+G : Jukes-Cantor plus Gamma (model 3)
AIC3 = 27877.899848 lnL = -13937.949924
F81 : Felsenstein 1981 (model 5)
AIC5 = 27352.654274 lnL = -13673.327137
F81+G : Felsenstein 1981 plus Gamma (model 7)
AIC7 = 27354.660556 lnL = -13673.330278
K80 : Kimura 2-parameter (model 9)
AIC9 = 27871.085794 lnL = -13934.542897
AIC = Akaike Information Criterion
lnL = maximum likelihood
K80+G : Kimura 2-parameter plus Gamma (model 11)
AIC11 = 27872.977786 lnL = -13934.488893
HKY : Hasegawa-Kishino-Yano (model 13)
AIC13 = 27336.418362 lnL = -13664.209181
HKY+G : Hasegawa-Kishino-Yano plus Gamma (model 15)
AIC15 = 27338.425764 lnL = -13664.212882
AICi = −2 ln Li + 2ki
Model favored is the one with the lowest AIC
TrN : Tamura-Nei (model 21)
AIC21 = 27338.336148 lnL = -13664.168074
TrN+G : Tamura-Nei plus Gamma (model 23)
AIC23 = 27340.335138 lnL = -13664.167569
GTR : General Time Reversible (model 53)
AIC53 = 27342.287716 lnL = -13663.143858
GTR+G : General Time Reversible plus Gamma (model 55)
AIC55 = 27344.30355 lnL = -13663.151775
AIC-SELECTED MODEL: HKY : Hasegawa-Kishino-Yano (model 13)
lnL = -13664.209181
AIC = 27336.418362
DNA Substitution models in ModelFind
Reduced set:
JC : Jukes-Cantor (model 1)
JC+G : Jukes-Cantor plus Gamma (model 3)
F81 : Felsenstein 1981 (model 5)
F81+G : Felsenstein 1981 plus Gamma (model 7)
K80 : Kimura 2-parameter (model 9)
K80+G : Kimura 2-parameter plus Gamma (model 11)
HKY : Hasegawa-Kishino-Yano (model 13)
HKY+G : Hasegawa-Kishino-Yano plus Gamma (model 15)
TrN : Tamura-Nei (model 21)
TrN+G : Tamura-Nei plus Gamma (model 23)
GTR : General Time Reversible (model 53)
GTR+G : General Time Reversible plus Gamma (model 55)
Red indicates models available in MEGA
If a model in black is suggested, use the one immediately below
If GTR is suggested, use LogDet
parallelized clustalw
http://cbsuapps.tc.cornell.edu/clustalw.aspx
non parallelized clustalw
http://inquiry.unc.edu/inquiry/
Many MSA algorithms
PLOS Comp. Biol. 3: e123
Alternative alignment tools
FACT: in published comparisons between
alignment tools, clustalw usually comes
out close to the bottom
T Coffee – better, more computationally intensive
Muscle – better, less intensive than T Coffee
Promals – designed to optimize alignment for
distantly related sequences
Outputs for above need to be put in
Appropriate format (.aln, .phy, .nex)
http://prodata.swmed.edu/promals/promals.php
http://www.drive5.com/muscle/
http://cbsuapps.tc.cornell.edu/t_coffee.aspx
Displaying extensions on a PC
• My Computer > Tools > Folder Options >
View > unclick on “Hide Extensions…”
• Also, Control Panels > Folder Options >
View > unclick on “Hide Extensions…”
Test data sets
Nature 442: 37
Science 320: 499
Computing d
1) Compute Jukes-Cantor distance; examine distance matrix. If d < 0.05 stop and
use Jukes-Cantor substitution model
2) If 0.05 < d < 0.3, check R also; use Kimura 2 parameter option for computing d;
change “Substitutions to Include” option from
d: transitions + transversions to R = s/v and calculate
3) Choose model based on the guide on previous page
Analysis Preferences:
Setting up an analysis
User defined options
Analysis Preferences
(Distance Computation)
Substitution Model - In this set of options, you choose the various attributes of the substitution
models.
•
Model - Here you select a stochastic model for estimating evolutionary distance by clicking on the
ellipses to the right of the currently selected model (click on the lime square to select this row
first). This will reveal a menu containing many different distance methods and models.
•
Substitutions to Include - Depending on the distance model or method selected, the
evolutionary distance can be teased into two or more components. By clicking on the drop-down
button (first click on the lime square to select this row), you will be provided with a list of
components relevant to the chosen model.
•
Transition/Transversion Ratio - This option will be visible if the chosen model requires you to
provide a value for the Transition/Transversion ratio (R).
•
Pattern among Lineages - This option becomes available if the selected model has formulas that
allow the relaxation of the assumption of homogeneity of substitution patterns among lineages.
•
Rates among Sites - This option becomes available if the selected distance model has formulas
that allow rate variation among sites. If you choose gamma-distributed rates, then the Gamma
parameter option becomes visible.
Treatment of gaps
Gaps often are inserted during the alignment of homologous regions of sequences and
represent deletions or insertions (indels). They introduce some complications in
distance estimation. Furthermore, sites with missing information sometimes result
from experimental difficulties; they present the same alignment problems as gaps. In
the following discussion, both of these situations are treated in the same way.
In MEGA, there are two ways to treat gaps. One is to delete all of these sites from the
data analysis. This option, called the Complete-Deletion, is generally desirable
because different regions of DNA or amino acid sequences evolve under different
evolutionary forces. The second method is relevant if the number of nucleotides
involved in a gap is small and if the gaps are distributed more or less randomly. In
that case it may be possible to compute a distance for each pair of sequences,
ignoring only those gaps that are involved in the comparison; this option is called
Pairwise-Deletion. The following table illustrates the effect of these options on
distance estimation with the following three sequences:
Complete-Deletion *
Pairwise-Deletion
Uniform Rates vs. Gamma
distribution
Ignore this option as MEGA has no way to calculate “a”, the value of gamma distribution
A gamma distribution reflects that there is a substitution difference between
different amino acids/nucleotides; a = 1, subsitution variation is very high;
a = infinity, all substitutions are equally likely
Tree Explorer
• Save tree as .emf file (for ppt or word)
northNigeria
57
96
turkey Turkey2005
swan Czech2006
48
36
mallard B avaria2006
26
27
swan Mongolia2005
swan Astrakhan2005
turkey Suzdalka2005
swan Iran2006
mallard Italy2005
• Save tree as .nwk file (for opening in other
tree viewers)
((((northNigeria:0.00240616,((turkey_Turkey2005:0.00314559,swan_Czech2006:0.00255065)0.96:0.00324648,(mallard_Bavaria2006:0.00405158,
swan_Mongolia2005:0.00164881)0.27:0.00003413)0.27:0.00003360)0.57:0.00084134,swan_Astrakhan2005:0.00402428)0.49:0.00081094,
turkey_Suzdalka2005:0.00717544)0.37:0.00028133,swan_Iran2006:0.00464073,mallard_Italy2005:0.09386135);
• Save tree as .mts file (for opening in
MEGA)
LagosSO494
72
60
100
LagosSO493
chicken Egypt2006
92
swan Czech2006
55
turkey Turkey2005
northNigeria
11
26
13
swan Mongolia2005
goose Iraq2006
34
40
Mr Bayes
1,000,000
generations
1.5 hrs-cluster
98
100
4
turkey Turkey20
12
swan Czech2006
13
goose Iraq2006
swan Mongolia20
63
15
31
goose Novo2005
chicken Tula200
20
100
LagosBA209
91
27
LagosBA211
10 0
LagosBA209
99
15
goose Novo2005
37
LagosBA211
100
100
turkey Suzdalka2005
100
35
swan Iran2006
chicken Tula2005
82
LagosBA210
Gull Qinghai200
Gull Qinghai2005
39
duck Kurgan2005
swan Astrakhan2
LagosBA210
87
northNigeria
mallard Bavaria
56
duck Kurgan2005
swan Astrakhan2005
80
LagosSO300
chicken Egypt20
mallard Bavaria2006
24
LagosSO493
100
LagosSO300
63
9
LagosSO452
LagosSO494
66
MEGA
NJ
bootstrapping
<1 min
laptop 1G RAM
LagosSO452
16
78
swan Iran2006
turkey Suzdalka
87
chicken Thai2005
chicken Thai200
duck Jiangxi2005
duck Jiangxi200
chicken Hebei2005
chicken Hebei20
mallard Italy2005
mallard Italy20
swan Iran2
397
PHYLIP
dnapars
bootstrapping
30 min
laptop 1G RAM
456
258
500
416
482
goose Iraq
mallard It
swan Astra
chicken He
chicken Tu
duck Jiang
turkey Suz
chicken Th
goose Novo
turkey Suz
Gull Qingh
chicken Tu
northNiger
goose Iraq
mallard Ba
LagosBA209
247
472
56
LagosBA210
LagosBA211
134
swan Iran2
LagosSO300
303
LagosSO493
213
209
73
297
215
LagosSO494
swan Czech
turkey Tur
422
161
243
65
duck Kurga
77
mallard Ba
swan Mongo
chicken Th
55
duck Jiang
213
405
65
Tree-Puzzle
maximum likelihood
10,000 steps
<1 min
laptop 1G RAM
mallard It
LagosBA211
98
87
LagosBA210
swan Astra
98
duck Kurga
65
swan Czech
turkey Tur
chicken Eg
96
LagosSO493
97
LagosSO300
98
LagosSO494
95
Dataset from Nature 442: 37 Multiple introductions of H5N1 in Nigeria
northNiger
LagosBA209
91
chicken He
chicken Eg
Gull Qingh
swan Mongo
74
LagosSO452
51
goose Novo
99
LagosSO452
Tree Explorer
•
Condensed Trees
When several interior branches of a phylogenetic tree have low statistical support (PC or PB) values, it
often is useful to produce a multifurcating tree by assuming that all interior branches have a
branch length equal to 0. We call this multifurcating tree a condensed tree. In MEGA, condensed
trees can be produced for any level of PC or PB value. For example, if there are several branches
with PC or PB values of less than 50%, a condensed tree with the 50% PC or PB level will have a
multifurcating tree with all its branch lengths reduced to 0.
•
Consensus Tree
The MP method produces many equally parsimonious trees. Choosing this command produces a
composite tree that is a consensus among all such trees, for example, either as a strict
consensus, in which all conflicting branching patterns among the trees are resolved by making
those nodes multifurcating or as a Majority-Rule consensus, in which conflicting branching
patterns are resolved by selecting the pattern seen in more than 50% of the trees.
Importing trees from other phylogenetic tools
Work – outtrees from phylip, .dnd and .phb files from clustalw
TreePuzzle, Mr Bayes (.con file needs a little processing)
MEGA4
Caption View
Caption function gives a publication quality summary of analysis,
and suggested references for publication
About authors
Gene Duplication and Gene
Subsitution in Evolution
Masatoshi Nei
Nature 221: 40
Evolution by the Birth-and-Death
Process in Multigene Families of
the Vertebrate Immune System
Nei, M., et al.
Proc. Natl. Acad. Sci USA 94: 7799
MEGA3: Integrated software for
Molecular Evolutionary Genetics
Analysis and sequence alignment
Sudhir Kumar, K Tamura, and M Nei
Briefings in Bioinformatics 5:150-163
The Neighbor-joining Method: A
New Method for
Reconstructing Phylogenetic Trees
Naruya Saitou and Masatoshi Nei
Mol. Biol. Evol 4: 406
Much of the material in this handout derived from:
Molecular evolution and phylogenetics 2000
M Nei, S Kumar - Oxford Univ. Press, New York
Download