Lecture 1

Claude dePamphilis
Lecture 1, Phylogenetics
Biol 597
Fall 2005
Why tree building?
We can learn a lot from trees – we are almost never interested in the
tree itself as the end product, but in what we can learn from it.




improves sequence alignment
platform for comparative biology
trace history of genomes, genes, organisms
Tree is a way of organizing information, and making it
possible to make inferences or predictions.
The material is roughly divided into five "units". Each unit below lists
topics that you should especially focus on, followed by one or a few
goals or questions.
• Concepts of trees, viewing trees, interpreting trees.
Difference between gene and species trees. Other basic
concepts.
• Methods of tree building: begin with multiple sequence
alignment, several tree building approaches
o Parsimony:
o Distance models for sequence evolution: inference
of constraint and adaptive evolution
o Likelihood methods as a general tool for hypothesis
testing
o Confidence in trees
• Inferences from trees: gene family analysis, constraint
in protein coding regions, inference of ancestors
• Software
• Specialty topics: molecular clocks, Bayesian inference
1. concepts of trees and inferences based on trees
- Graphical hypotheses of evolutionary history (and shared
ancestry)
[Figure labels: coli, worm, human]
increasing complexity - yes
problem: contemporary ancestry (living organisms are not ancestors of one another)
- trees as hypotheses of evolutionary history (and shared
ancestry)
[Figure labels: coli, worm, human; shared ancestry]
• branch points – nodes – ancestors
example
• end points – leaves – terminals – tips – current day
• Views of trees
o Any rotation – same thing
o Topology defines the tree
• Rooted trees, unrooted trees
o significance: rooted trees imply ORDER of events
• branching order (cladogram), or degree of divergence
(phylogram)
o Show branching order only – topology - cladogram
o Show degree of divergence – phylogram
Examples – fish, chicken, mouse, human – rooted
phylogram
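The point that rotation does not change a tree can be checked mechanically. The sketch below is only an illustration: writing trees as nested Python tuples and the helper names `canonical` and `same_topology` are assumptions, not anything used in class. It sorts the children at every node so that two drawings of the same topology compare equal.

```python
# Trees as nested tuples; tips are strings. This representation is an
# assumption made for illustration only.

def canonical(tree):
    """Return the tree with the children of every node in a fixed order."""
    if isinstance(tree, str):
        return tree
    # Sorting on repr() lets tips (str) and subtrees (tuple) be compared.
    return tuple(sorted((canonical(child) for child in tree), key=repr))

def same_topology(a, b):
    """True if two rooted trees differ only by rotation around nodes."""
    return canonical(a) == canonical(b)

t1 = ((("human", "mouse"), "chicken"), "fish")
t2 = ("fish", ("chicken", ("mouse", "human")))   # t1 with its nodes rotated
t3 = ((("human", "chicken"), "mouse"), "fish")   # a different branching order

print(same_topology(t1, t2))  # True
print(same_topology(t1, t3))  # False
```

Branch lengths are ignored here, so this compares cladograms only.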
- species trees [ORTHOLOGOUS SEQUENCES]
o each leaf - different species
o data may be single gene or many genes combined
together
o usually – gene or protein sequence – must be aligned
(MSA). See Clustal paper.
o phylogenetic tree – reveals species relationships
• examples – tree of life
o monophyly, paraphyly, polyphyly (draw tree)
• monophyletic – common ancestor and all descendants
example: mammals, flowering plants
• paraphyletic – common ancestor, but not all descendants
o prokaryotes
o dinosaurs
o “seed plant”
• polyphyletic – common ancestor not included in the group
o organisms with color vision
o common ancestry: ancestral organisms → will be able to infer features of
these ancestors (organisms)
• if tree is rooted, can infer order of the events
• who came first, the chicken or the egg?
Example: pro pro pro pro euk euk euk
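Reading monophyly off a rooted tree amounts to asking whether some node has exactly the group in question as its set of descendants. A minimal sketch of that check follows; the nested-tuple tree and the function names are assumptions for illustration, using the fish, chicken, mouse, human example.

```python
def leaves(tree):
    """Set of tip names below a node (a tip is just a string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def clades(tree):
    """Yield the tip set below every node of a rooted tree, tips included."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if some node has exactly `group` as its descendants."""
    return frozenset(group) in set(clades(tree))

vertebrates = ("fish", ("chicken", ("mouse", "human")))

print(is_monophyletic(vertebrates, ["mouse", "human"]))    # True
print(is_monophyletic(vertebrates, ["fish", "chicken"]))   # False
```

A group that fails this test is either paraphyletic or polyphyletic; telling those two apart needs more than this sketch checks.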
- gene trees I:
o contain different genes from one or more organisms
• example – globin genes
everyone w/ 2 copies
alpha → alpha alpha → alpha beta
o traces history of gene duplications
• paralog – genes whose common ancestor is a
gene duplication event
• ortholog – genes whose common ancestor is a
speciation event
• gene family within one organism – all paralogs
o common ancestry: ancestral genes
• can make predictions about these ancestral genes
• example – motifA present in each gene, so the
ancestor is inferred to have had it
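The paralog/ortholog definitions above depend only on what kind of event sits at the most recent common ancestor of two genes. The sketch below assumes a gene tree whose internal nodes have already been labeled "duplication" or "speciation"; that labeling, the tuple layout `(event, left, right)` and the gene names are hypothetical, chosen to mirror the globin example.

```python
def leaf_names(node):
    """Gene names at the tips below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaf_names(left) | leaf_names(right)

def relationship(node, a, b):
    """Classify genes a and b by the event at their most recent common ancestor."""
    if isinstance(node, str):
        raise ValueError("a and b must be two different tips of the tree")
    event, left, right = node
    for child in (left, right):
        names = leaf_names(child)
        if a in names and b in names:
            # Both genes lie inside this subtree, so their ancestor is deeper.
            return relationship(child, a, b)
    names = leaf_names(node)
    if a not in names or b not in names:
        raise ValueError("both genes must be tips of the tree")
    return "paralogs" if event == "duplication" else "orthologs"

# Hypothetical labeled gene tree: one duplication, then speciation in each copy.
globins = ("duplication",
           ("speciation", "human_alpha", "mouse_alpha"),
           ("speciation", "human_beta", "mouse_beta"))

print(relationship(globins, "human_alpha", "mouse_alpha"))  # orthologs
print(relationship(globins, "human_alpha", "human_beta"))   # paralogs
print(relationship(globins, "human_alpha", "mouse_beta"))   # paralogs
```

Note that the last pair comes from two different species and is still paralogous, because the duplication predates the speciation.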
- common: combination gene and species tree; multiple genes,
multiple species
o example – drawn example of gene and species trees
• e.g., chicken or the egg?
- HOMOPLASY: convergence, parallelism, reversal
- inference of ancestral states using ACCTRAN
* be able to "read" a phylogenetic tree, and draw correct
inferences about the monophyly of groups of organisms or
sequences
* given a tree and a set of data for a given character, be
able to infer the ancestral states of the character using the
method of ACCTRAN
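For practice with ancestral states, here is a simplified sketch, not necessarily the exact procedure shown in class. It assumes a rooted, strictly binary tree written as nested tuples and a single unordered character. A Fitch-style pass from tips to root collects the possible states at each node; a second pass from root to tips keeps the parent's state whenever the node's set allows it. That rule tends to place changes close to the root, which is the behavior ACCTRAN (accelerated transformation) is named for. The tree, the states and all function names are hypothetical.

```python
def downpass(node, tip_states, sets):
    """Tips-to-root pass: fill `sets` with state sets, return the number of changes."""
    if isinstance(node, str):
        sets[node] = {tip_states[node]}
        return 0
    left, right = node                      # assumes a strictly binary tree
    cost = downpass(left, tip_states, sets) + downpass(right, tip_states, sets)
    shared = sets[left] & sets[right]
    if shared:
        sets[node] = shared
    else:
        sets[node] = sets[left] | sets[right]
        cost += 1                           # an empty intersection costs one change
    return cost

def uppass(node, parent_state, sets, assigned):
    """Root-to-tips pass: keep the parent's state whenever the node's set allows it."""
    options = sets[node]
    state = parent_state if parent_state in options else min(options)
    assigned[node] = state
    if not isinstance(node, str):
        for child in node:
            uppass(child, state, sets, assigned)

def reconstruct(tree, tip_states):
    sets, assigned = {}, {}
    cost = downpass(tree, tip_states, sets)
    uppass(tree, None, sets, assigned)
    return cost, assigned

def changes(node, assigned):
    """List (from_state, to_state, child) for every branch with a change."""
    found = []
    if not isinstance(node, str):
        for child in node:
            if assigned[child] != assigned[node]:
                found.append((assigned[node], assigned[child], child))
            found.extend(changes(child, assigned))
    return found

tree = (((("A", "B"), "C"), "D"), "E")
states = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}

cost, assigned = reconstruct(tree, states)
print(cost)                     # 2
print(changes(tree, assigned))  # [(0, 1, (('A', 'B'), 'C')), (1, 0, 'B')]
```

In this example the reconstruction is one gain on the branch leading to (A, B, C) followed by a reversal in B, rather than two separate gains in A and C. Both cost two changes; the first places the change nearer the root.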
2. methods of building phylogenetic trees
- input for tree building: multiple sequence alignment
brief overview of Clustal
- parsimony, distance, likelihood compared and contrasted
- the basic approaches, similarities and differences
- standard (nonparametric) bootstrap in phylogenies: use and
interpretation
- strengths and weaknesses of each of the major methods
- PHYLIP as an intro to computer programs for phylogeny
* be able to perform and interpret a small parsimony
analysis by hand, as we did in class, or using any of the main
approaches, including bootstrap, with PHYLIP
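The standard (nonparametric) bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-replicate, and reports how often a group reappears. The sketch below shows only the resampling and tallying. The tree-building step is left as a caller-supplied function, `build_clades`, which is a hypothetical placeholder for whatever method is used (parsimony, distance, likelihood) and should return the set of tip groups found in the tree. The alignment is made up. In PHYLIP the resampling step is handled by the seqboot program.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def bootstrap_support(alignment, build_clades, group, replicates=100, seed=1):
    """Fraction of pseudo-replicates whose tree contains `group` as a clade.

    `build_clades` is a placeholder: any function that takes an alignment
    and returns a set of frozensets of tip names.
    """
    rng = random.Random(seed)
    group = frozenset(group)
    hits = 0
    for _ in range(replicates):
        if group in build_clades(bootstrap_alignment(alignment, rng)):
            hits += 1
    return hits / replicates

# Made-up alignment, for illustration only.
alignment = {
    "fish":    "ACGTACGT",
    "chicken": "ACGTACGA",
    "mouse":   "ACGAACGA",
    "human":   "ACGAACCA",
}

replicate = bootstrap_alignment(alignment, random.Random(0))
for name, seq in replicate.items():
    print(name, seq)
```

Each pseudo-replicate has the same taxa and length as the original; some columns appear more than once and others not at all.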
3. distance models of sequence evolution
* contrast the different distance models for sequence (or
protein) evolution. What are some advantages and disadvantages?
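As one concrete contrast between distance models, the sketch below compares the uncorrected p-distance (proportion of differing sites) with the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), which allows for multiple substitutions at the same site under equal base frequencies and equal rates. The two sequences are made up, and skipping gapped sites is an assumption about how gaps are handled.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ; sites with a gap are skipped."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)

s1 = "ACGTACGTAC"
s2 = "ACGTTCGTAA"   # differs from s1 at two of ten sites

p = p_distance(s1, s2)
d = jukes_cantor(p)
print(p)             # 0.2
print(round(d, 3))   # 0.233
```

The corrected distance is always at least as large as p, and the gap between them grows as sequences diverge. The price is that the correction becomes undefined once p reaches 0.75.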
4. maximum likelihood as a general tool for hypothesis testing
* what is the likelihood ratio test and how is it used to test
a wide variety of possible hypotheses about sequence evolution,
such as: rates of evolution, monophyly of group or sequences,
similarity of branching history of two trees, etc.
* be able to outline or diagram the goals and basic steps in a
parametric bootstrap analysis, and its use in hypothesis testing
in sequence studies.
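The likelihood ratio test compares the log-likelihoods of a constrained (null) and an unconstrained (alternative) model through the statistic 2(lnL_alt - lnL_null). The sketch below evaluates that statistic in two ways: against a chi-square distribution with 1 degree of freedom, and against a null distribution built from datasets simulated under the null model, which is the role of the parametric bootstrap. All log-likelihoods and simulated values are invented for illustration. The single degree of freedom is an assumption; in practice it equals the difference in the number of free parameters between the two models.

```python
import math

def lrt_statistic(lnL_null, lnL_alt):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (lnL_alt - lnL_null)

def chi2_pvalue_df1(stat):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def bootstrap_pvalue(stat, simulated_stats):
    """Fraction of statistics simulated under the null that reach the observed one."""
    return sum(s >= stat for s in simulated_stats) / len(simulated_stats)

# Hypothetical log-likelihoods from fitting the two models to the same data.
lnL_null, lnL_alt = -1234.5, -1231.0
stat = lrt_statistic(lnL_null, lnL_alt)
print(stat)                    # 7.0
print(chi2_pvalue_df1(stat))   # below 0.01

# Hypothetical statistics from refitting both models to data simulated
# under the null model (the parametric bootstrap step).
simulated = [0.1, 0.4, 0.9, 1.3, 2.2, 2.8, 3.5, 4.1, 5.6, 7.9]
print(bootstrap_pvalue(stat, simulated))   # 0.1
```

The parametric bootstrap is the fallback when the chi-square approximation is doubtful, for example when the hypotheses are not nested.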
5. further concepts and their application
- gene families II, reconciled gene trees
- long branch attraction conditions, causes
* The example given in class of phylogenetic analyses of
invertebrate animals was a good example of a dataset where
different methods gave different results, but exploring the
different results led to a better understanding of the history of
the sequences. What were some "take home lessons" to be gained
from this example?