#29 - Phylogenetics 10/31/07 BCB 444/544 Phylogenetics

BCB 444/544 F07 ISU Terribilini #29- Phylogenetics
BCB 444/544 Fall 07 Dobbs

BCB 444/544
Lecture 29
Phylogenetics
#29_Oct31

Required Reading (before lecture)
Mon Oct 29 - Lecture 28
Promoter & Regulatory Element Prediction
• Chp 9 - pp 113 - 126
Wed Oct 30 - Lecture 29
Phylogenetics Basics
• Chp 10 - pp 127 - 141
Thurs Oct 31 - Lab 9
Gene & Regulatory Element Prediction
Fri Oct 30 - Lecture 29
Phylogenetic Tree Construction Methods & Programs
• Chp 11 - pp 142 - 169

Assignments & Announcements
Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects
Last week of classes will be devoted to Projects
• Written reports due:
• Mon Dec 3 (no class that day)
• Oral presentations (20-30') will be:
• Wed-Fri Dec 5,6,7
• 1 or 2 teams will present during each class period
• See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment
544 Extra#2
Due:
√PART 1 - ASAP
PART 2 - meeting prior to 5 PM Fri Nov 2
Part 1 - Brief outline of Project, email to Drena & Michael
after response/approval, then:
Part 2 - More detailed outline of project
Read a few papers and summarize status of problem
Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week
BCB List of URLs for Seminars related to Bioinformatics:
http://www.bcb.iastate.edu/seminars/index.html
• Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB
• Todd Yeates, UCLA
• TBA - something cool about structure and evolution?
• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
• Bob Jernigan BBMB, ISU
• Control of Protein Motions by Structure

Chp 10 - Phylogenetics
SECTION IV MOLECULAR PHYLOGENETICS
Xiong: Chp 10 Phylogenetics Basics
• Evolution and Phylogenetics
• Terminology
• Gene Phylogeny vs. Species Phylogeny
• Forms of Tree Representation
• Why Finding a True Tree is Difficult
• Procedure of Building a Phylogenetic Tree

Evolution and Phylogenetics
• Evolution – the development of biological form from other preexisting forms
• Evolution proceeds by natural selection

Natural Selection
• Species can produce more offspring than the environment can support. This leads to competition for resources. Genetic variations exist in a population that give some individuals an advantage, others a disadvantage, leading to differential reproductive success.

Phylogenetics
• Phylogenetics is the study of the evolutionary history of living organisms
• Uses tree-like diagrams to represent the pedigrees of the organisms
• Similarities and differences seen in a multiple sequence alignment are easier to make sense of in a phylogenetic tree

Data Used in Phylogenetics
• Fossil records - morphology and timeline of divergence
• Limitations - not available for all species in all areas, morphology determined by multiple genetic factors, fossils for microorganisms are especially rare
• Molecular data - DNA and protein sequences ("molecular fossils")
• Advantages - lots of data, easy to obtain
• Limitations - can be difficult to get sequences from extinct species
• Physical, behavioral, and developmental characteristics can also be used in phylogenetics

Molecular Phylogenetics
• Molecular phylogenetics is the study of evolutionary relationships of genes and other biological macromolecules by analyzing their sequences
• Sequence similarity can be used to infer evolutionary relationships
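
As a rough illustration of how sequence similarity can be quantified before any tree is built, the sketch below computes pairwise percent identity over aligned sequences. The toy alignment, the sequence names, and the choice to skip gapped columns are assumptions made for this example, not something specified in the lecture.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, gap-free columns in which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment (not real data)
alignment = {
    "seqA": "ATGCCGTA",
    "seqB": "ATGCCGTT",
    "seqC": "ATGACG-A",
}

identities = {
    (x, y): percent_identity(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for pair, value in identities.items():
    print(pair, round(value, 1))
```

Higher identity suggests a closer relationship, but raw identity ignores multiple substitutions at the same site, so it is only a starting point.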

Assumptions in Molecular Phylogenetics
• Sequences used are homologous, i.e. share a common ancestor
• Phylogenetic divergence is bifurcating, i.e. parent branch splits into two daughter branches
• Each position in a sequence evolved independently
• Molecular Clock – sequences evolve at constant rates (only used in some methods)

Terminology
[Figure: tree of taxa A-H labeling the taxa (terminal nodes), an internal node, a branch, and the root]
• Clade = group of taxa descended from a common ancestor
• Lineage = branch path depicting ancestor-descendant relationship
• Paraphyletic group = group of taxa that share more than one closest common ancestor

Tree Topology
• Tree topology is the branching pattern in a tree
[Figure: dichotomy (bifurcation) vs. polytomy (multifurcation)]

Rooted vs. Unrooted Trees
[Figure: a rooted tree and an unrooted tree of taxa A-D]
• Unrooted trees have no root node – do not assume knowledge of a common ancestor, just relationships
• Can convert between unrooted and rooted, but first need to determine where the root is
• Two ways to define the root:
• Use an outgroup
• Midpoint rooting – midpoint of the two most divergent groups is assigned to be the root
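
The midpoint rooting idea can be sketched in a few lines, assuming we already have the path-length distance between every pair of taxa on the unrooted tree. The distance values and the function name `midpoint_root` are hypothetical, chosen only for illustration.

```python
# Hypothetical path-length distances between taxa on an unrooted tree.
# These values are additive: they fit a tree with terminal branches
# A=2, B=1, C=1, D=2 and an internal branch of length 4.
distances = {
    ("A", "B"): 3.0,
    ("A", "C"): 7.0,
    ("A", "D"): 8.0,
    ("B", "C"): 6.0,
    ("B", "D"): 7.0,
    ("C", "D"): 3.0,
}


def midpoint_root(dist):
    """Return the most divergent pair of taxa and the root's distance from each."""
    pair = max(dist, key=dist.get)  # pair with the longest path between them
    return pair, dist[pair] / 2.0   # root sits halfway along that path


pair, half = midpoint_root(distances)
print(pair, half)
```

In this toy case the root falls 4.0 units from both A and D, which places it in the middle of the internal branch. Midpoint rooting implicitly assumes roughly equal rates along the two longest lineages.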

Outgroups
• Outgroup is a sequence related to the sequences being studied, but is more distantly related
• Must be distinct from the ingroup, but not too distant
• If outgroup is too distantly related, it can lead to errors in tree construction
• Trick is to find the closest related sequence that is removed from the ingroup

Gene Phylogeny vs. Species Phylogeny
• When using molecular data, we are technically building a phylogeny for just that sequence, not for the species from which the sequences came
• Species evolution is the result of mutations in the entire genome
• Your gene may have evolved differently than other genes in the genome
• To obtain a species phylogeny, we need to use a variety of gene families to construct the tree

Forms of Tree Representation
• Phylogram - branch lengths represent amount of evolutionary divergence
• Cladogram - branch lengths are meaningless, only topology matters
• Newick format – text format for use by computer programs
• Example: (((B,C),A),(D,E))
• Can also have branch lengths
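
To make the Newick example concrete, here is a minimal sketch of a recursive parser that turns a Newick string into nested Python tuples. It assumes the simplest form of the format (taxon names and parentheses only, no branch lengths or internal labels); `parse_newick` and `leaves` are names invented for this illustration.

```python
def parse_newick(text: str):
    """Parse a simple Newick string (names and parentheses only) into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a taxon name up to the next structural character
        start = pos
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing characters")
    return tree


def leaves(node):
    """List taxon names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


tree = parse_newick("(((B,C),A),(D,E))")
print(tree)
print(leaves(tree))
```

Each pair of parentheses becomes one tuple, so the nesting of the tuples mirrors the tree topology directly.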

Consensus Trees
• Multiple trees that are equally optimal – build consensus tree by collapsing disagreements into a single node

Why Finding a True Tree is Difficult
• The number of possible trees grows exponentially with the number of species (or sequences)
• Number of rooted trees: $N_r = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
• Number of unrooted trees: $N_u = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
• To find the best tree, you must explore all possibilities (or must you?)
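
A short calculation shows how quickly the search space grows. The sketch below evaluates the two tree-count formulas, reading them as N_r = (2n-3)! / (2^(n-2) (n-2)!) for rooted trees and N_u = (2n-5)! / (2^(n-3) (n-3)!) for unrooted trees; the function names are made up for this example.

```python
from math import factorial


def num_rooted_trees(n: int) -> int:
    """Number of rooted bifurcating trees for n taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted_trees(n: int) -> int:
    """Number of unrooted bifurcating trees for n taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10, 20):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

With only 10 taxa there are already more than 34 million rooted trees, which is why exhaustive search is impractical for all but the smallest data sets.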

Tree Building Procedure
• Choose molecular markers
• Perform MSA
• Choose a model of evolution
• Determine tree building method
• Assess tree reliability

Choice of Molecular Markers
• Very closely related organisms - nucleic acid sequence will show more differences
• For individuals within a species - faster mutation rate is in noncoding regions of mtDNA
• More distantly related species - slowly evolving nucleic acid sequences like ribosomal RNA or protein sequences
• Very distantly related species - use highly conserved protein sequences

Advantages of Protein Sequences
• More highly conserved - mutations in DNA may not change amino acid sequence
• Third position in a codon especially can vary - violates our assumption of independent evolution of all positions in a sequence
• DNA sequences can be biased by codon usage differences between species - causes variations in sequence that are not attributable to evolution
• In alignments, DNA sequences that are not related can show a lot of similarity due to only 4 letters in alphabet, proteins do not have this problem (at least not as much)
• Introducing gaps in alignments of DNA sequences can cause frameshift errors, making alignment biologically meaningless

Advantages of DNA Sequences
• Better for closely related species
• Show synonymous and non-synonymous mutations, which allows analysis of positive and negative selection events
• Lots of nonsynonymous mutations may mean positive selection for new functions of protein with different amino acid sequence
• Lots of synonymous mutations may mean negative selection - changed amino acid sequence is detrimental
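
The distinction between synonymous and nonsynonymous changes can be illustrated with a small sketch that compares two aligned coding sequences codon by codon using the standard genetic code. The two sequences are hypothetical, and counting whole codons is a simplification of my own: real selection analyses estimate per-site substitution rates rather than tallying differing codons.

```python
# Standard genetic code, built from the conventional TCAG ordering
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codon_changes(seq_a: str, seq_b: str):
    """Count differing codons that are synonymous vs. nonsynonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no change
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid changed
    return synonymous, nonsynonymous


# Hypothetical aligned coding sequences (gap-free, uppercase DNA)
ref = "ATGGCTAAAGGT"  # M A K G
alt = "ATGGCCAGAGGT"  # M A R G

print(count_codon_changes(ref, alt))
```

Here GCT to GCC leaves alanine unchanged (synonymous), while AAA to AGA replaces lysine with arginine (nonsynonymous).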

Multiple Sequence Alignment
• Most critical step in tree building - cannot build correct tree without correct alignment
• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
• Most alignments need manual editing
• Make sure important functional residues align
• Align secondary structure elements
• Use full alignment or just parts

Automatic Editing of Alignments
• Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
• Gblocks – detect and eliminate poorly aligned positions and divergent regions
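
To give a feel for what removing poorly aligned positions means, the sketch below drops alignment columns that are mostly gaps. This is a deliberately simple stand-in and not the actual Gblocks algorithm, which applies more elaborate conservation and block-length criteria; the function name, the 0.5 threshold, and the toy alignment are all assumptions.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of gap characters exceeds the threshold."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*seqs) walks the alignment column by column
    kept = [
        column for column in zip(*seqs)
        if column.count("-") / len(column) <= max_gap_fraction
    ]
    # Transpose the kept columns back into one string per sequence
    trimmed = ["".join(chars) for chars in zip(*kept)] if kept else [""] * len(seqs)
    return dict(zip(names, trimmed))


# Hypothetical toy protein alignment
alignment = {
    "seq1": "MK-LV--A",
    "seq2": "MKTLV--A",
    "seq3": "MK-LIG-A",
}

trimmed = trim_gappy_columns(alignment)
for name, seq in trimmed.items():
    print(name, seq)
```

Columns 3, 6, and 7 are removed because more than half of their entries are gaps, leaving five columns for tree building.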