BCB 444/544 Phylogenetics Lecture 29 #29_Oct31

BCB 444/544

Lecture 29

Phylogenetics

#29_Oct31

BCB 444/544 F07 ISU Terribilini #29- Phylogenetics 10/31/07 1

Required Reading

(before lecture)

Mon Oct 29 - Lecture 28

Promoter & Regulatory Element Prediction

• Chp 9 - pp 113 - 126

Wed Oct 31 - Lecture 29

Phylogenetics Basics

• Chp 10 - pp 127 - 141

Thurs Nov 1 - Lab 9

Gene & Regulatory Element Prediction

Fri Nov 2 - Lecture 30

Phylogenetic Tree Construction Methods & Programs

• Chp 11 - pp 142 - 169

Assignments & Announcements

Mon Oct 29 HW#5

HW#5 = Hands-on exercises with phylogenetics and tree-building software

Due: Mon Nov 5 (not Fri Nov 2 as previously posted)

BCB 544 "Team" Projects

Last week of classes will be devoted to Projects

• Written reports due:

• Mon Dec 3 (no class that day)

• Oral presentations (20-30 min) will be:

• Wed-Fri Dec 5,6,7

• 1 or 2 teams will present during each class period

• See Guidelines for Projects posted online

BCB 544 Only:

New Homework Assignment

544 Extra#2

Due: √PART 1 - ASAP

PART 2 - meeting prior to 5 PM Fri Nov 2

Part 1 - Brief outline of project; email it to Drena & Michael. After response/approval, then:

Part 2 - More detailed outline of project

Read a few papers and summarize status of problem

Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

• Nov 1 Thurs - BBMB Seminar 4:10 in 1414 MBB

• Todd Yeates

UCLA

TBA - something cool about structure and evolution?

• Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI

• Bob Jernigan

BBMB, ISU

• Control of Protein Motions by Structure

Chp 10 - Phylogenetics

SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 10 Phylogenetics Basics

• Evolution and Phylogenetics

• Terminology

• Gene Phylogeny vs. Species Phylogeny

• Forms of Tree Representation

• Why Finding a True Tree is Difficult

• Procedure of Building a Phylogenetic Tree

Evolution and Phylogenetics

• Evolution – the development of biological form from other preexisting forms

• Evolution proceeds by natural selection

Natural Selection

• Species can produce more offspring than the environment can support. This leads to competition for resources. Genetic variations exist in a population that give some individuals an advantage, others a disadvantage, leading to differential reproductive success.

Phylogenetics

• Phylogenetics is the study of the evolutionary history of living organisms

• Uses tree-like diagrams to represent the pedigrees of the organisms

• Similarities and differences seen in a multiple sequence alignment are easier to make sense of in a phylogenetic tree

Data Used in Phylogenetics

• Fossil records - morphology and timeline of divergence

• Limitations - not available for all species in all areas, morphology determined by multiple genetic factors, fossils for microorganisms are especially rare

• Molecular data - DNA and protein sequences ("molecular fossils")

• Advantages - lots of data, easy to obtain

• Limitations - can be difficult to get sequences from extinct species

• Physical, behavioral, and developmental characteristics can also be used in phylogenetics

Molecular Phylogenetics

• Molecular phylogenetics is the study of evolutionary relationships of genes and other biological macromolecules by analyzing their sequences

• Sequence similarity can be used to infer evolutionary relationships
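One simple way to turn sequence similarity into a number is the p-distance: the fraction of aligned positions at which two sequences differ. The sketch below assumes the sequences are already aligned and skips gap positions; the function name and the toy sequences are illustrative, not from the lecture.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip positions where either sequence has a gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


# Hypothetical aligned sequences, for illustration only.
toy_alignment = {
    "seq1": "ATGCTAGCTA",
    "seq2": "ATGCTAGCTT",
    "seq3": "ATGATCGC-A",
}
for name in ("seq2", "seq3"):
    print("seq1 vs", name, p_distance(toy_alignment["seq1"], toy_alignment[name]))
```

Smaller distances suggest a more recent common ancestor, but only under the assumptions that the sequences are homologous and correctly aligned.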

Assumptions in Molecular Phylogenetics

• Sequences used are homologous, i.e. share a common ancestor

• Phylogenetic divergence is bifurcating, i.e. parent branch splits into two daughter branches

• Each position in a sequence evolved independently

• Molecular Clock – sequences evolve at constant rates (only used in some methods)

Terminology

[Figure: rooted tree with taxa A-H as terminal nodes; labels point to a branch, an internal node, and the root]

Terminology

• Clade = group of taxa descended from a common ancestor

• Lineage = branch path depicting ancestor-descendant relationship

• Paraphyletic group = group of taxa that includes a common ancestor and some, but not all, of its descendants

Tree Topology

• Tree topology is the branching pattern in a tree

• Dichotomy (bifurcation) = a node that splits into exactly two branches

• Polytomy (multifurcation) = a node that splits into more than two branches

Rooted vs. Unrooted Trees

[Figure: taxa A, B, C, and D drawn as a rooted tree and as an unrooted tree]

Rooted vs. Unrooted Trees

• Unrooted trees have no root node – do not assume knowledge of a common ancestor, just relationships

• Can convert between unrooted and rooted, but first need to determine where the root is

• Two ways to define the root:

• Use an outgroup

• Midpoint rooting – midpoint of the two most divergent groups is assigned to be the root

Outgroups

• An outgroup is a sequence that is related to the sequences being studied (the ingroup), but more distantly than the ingroup sequences are related to each other

• Must be distinct from the ingroup, but not too distant

• If outgroup is too distantly related, it can lead to errors in tree construction

• The trick is to find the most closely related sequence that still lies outside the ingroup

Gene Phylogeny vs. Species Phylogeny

• When using molecular data, we are technically building a phylogeny for just that sequence, not for the species from which the sequences came

• Species evolution is the result of mutations in the entire genome

• Your gene may have evolved differently than other genes in the genome

• To obtain a species phylogeny, we need to use a variety of gene families to construct the tree

Forms of Tree Representation

• Phylogram - branch lengths represent amount of evolutionary divergence

• Cladogram - branch lengths are meaningless; only topology matters

Forms of Tree Representation

• Newick format – text format for use by computer programs

• Example: (((B,C),A),(D,E)); (a semicolon terminates the tree)

• Can also have branch lengths
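To make the nesting concrete, here is a minimal sketch of a Newick reader. It handles only the basic subset used on this slide (parentheses, commas, names, optional `:length` values) and is not a full implementation of the format; the function names and the branch lengths in the second example are made up for illustration.

```python
def parse_newick(text):
    """Parse a basic Newick string into nested dicts (name, length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(parse_node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if pos >= len(text) or text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
        # Optional node name (leaf label or internal label).
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Optional branch length after a colon.
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaf_names(node):
    """List the taxa (terminal nodes) under a node, left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


topology_only = parse_newick("(((B,C),A),(D,E));")
print(leaf_names(topology_only))

# Branch lengths below are invented purely to show the syntax.
with_lengths = parse_newick("((B:0.1,C:0.2):0.05,A:0.3);")
print(with_lengths["children"][0]["length"])
```

Each pair of parentheses corresponds to one internal node, so the innermost group (B,C) is the most recently diverged pair in the example.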

Consensus Trees

When multiple trees are equally optimal, build a consensus tree by collapsing the branches on which they disagree into a single multifurcating node
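As a rough illustration of the idea, the sketch below computes which clades survive in a strict consensus, i.e. the clades found in every input tree. It assumes rooted trees written as nested Python tuples of taxon names; that representation and the function names are choices made here for brevity, not something prescribed by the lecture.

```python
def clades(tree):
    """Return all clades in a rooted nested-tuple tree as frozensets of taxa."""
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])  # a single taxon
        members = frozenset().union(*(collect(child) for child in node))
        found.add(members)  # every internal node defines one clade
        return members

    collect(tree)
    return found


def strict_consensus_clades(trees):
    """Clades present in every tree; groupings that conflict are dropped."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


# Two hypothetical, equally optimal trees that disagree inside the A/B/C group.
tree_1 = ((("B", "C"), "A"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))

for clade in sorted(strict_consensus_clades([tree_1, tree_2]), key=len):
    print(sorted(clade))
```

Here the two trees agree on the groups {D, E} and {A, B, C} but not on the order of branching within {A, B, C}, so the consensus shows A, B, and C joined at one unresolved node.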

Why Finding a True Tree is Difficult

• The number of possible trees grows exponentially with the number of species (or sequences), n

• Number of rooted trees: $N_r = \dfrac{(2n-3)!}{2^{\,n-2}\,(n-2)!}$

• Number of unrooted trees: $N_u = \dfrac{(2n-5)!}{2^{\,n-3}\,(n-3)!}$

• To find the best tree, you must explore all possibilities (or must you?)
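Evaluating the standard counting formulas for rooted and unrooted bifurcating trees shows how quickly exhaustive search becomes impractical. This is a small sketch; the function names are chosen here for illustration.

```python
from math import factorial


def rooted_tree_count(n: int) -> int:
    """Number of rooted bifurcating trees for n taxa: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_tree_count(n: int) -> int:
    """Number of unrooted bifurcating trees for n taxa: (2n-5)! / (2^(n-3) (n-3)!)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (4, 6, 10, 20):
    print(n, rooted_tree_count(n), unrooted_tree_count(n))
```

For 4 taxa there are only 15 rooted and 3 unrooted trees, but for 10 taxa there are already 34,459,425 rooted and 2,027,025 unrooted trees, which is why most tree-building programs rely on heuristic searches rather than examining every tree.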

Tree Building Procedure

Choose molecular markers

Perform MSA

Choose a model of evolution

Determine tree building method

Assess tree reliability

Choice of Molecular Markers

Very closely related organisms - use nucleic acid sequences, which will show more differences than protein sequences

For individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate

More distantly related species - slowly evolving nucleic acid sequences like ribosomal RNA or protein sequences

Very distantly related species - use highly conserved protein sequences

Advantages of Protein Sequences

• More highly conserved - mutations in DNA may not change amino acid sequence

• Third position in a codon especially can vary - violates our assumption of independent evolution of all positions in a sequence

• DNA sequences can be biased by codon usage differences between species - causes variations in sequence that are not attributable to evolution

• In alignments, unrelated DNA sequences can show a lot of similarity because the alphabet has only 4 letters; proteins do not have this problem (at least not as much)

• Introducing gaps in alignments of DNA sequences can cause frameshift errors, making alignment biologically meaningless

Advantages of DNA Sequences

• Better for closely related species

• Show synonymous and non-synonymous mutations, which allows analysis of positive and negative selection events

• Lots of nonsynonymous mutations may mean positive selection for new functions of protein with different amino acid sequence

• Lots of synonymous mutations may mean negative selection - changed amino acid sequence is detrimental
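The distinction can be illustrated by comparing two aligned coding sequences codon by codon. The sketch below is a crude count of codon differences using the standard genetic code; it is not a proper dN/dS estimate (it does not normalize by the number of synonymous and nonsynonymous sites or correct for multiple substitutions), and the function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(seq_a: str, seq_b: str):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in-frame and of equal length")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # DNA changed, amino acid did not
        else:
            nonsynonymous += 1  # DNA change altered the amino acid
    return synonymous, nonsynonymous


# Hypothetical sequences: GCT -> GCC keeps Ala; AAA -> AGA changes Lys to Arg.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

Information of this kind is lost when only the protein sequences are compared, since the synonymous change is invisible at the amino acid level.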

Multiple Sequence Alignment

• Most critical step in tree building - cannot build correct tree without correct alignment

• Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one

• Most alignments need manual editing

• Make sure important functional residues align

• Align secondary structure elements

• Decide whether to use the full alignment or just the well-aligned parts

Automatic Editing of Alignments

• Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences

• Gblocks – detect and eliminate poorly aligned positions and divergent regions
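To give a feel for what column-level cleanup does, here is a toy filter that drops alignment columns in which too many sequences have a gap. This is a deliberately simplified stand-in and does not reproduce the criteria actually used by Gblocks, Rascal, or NorMD; the function name, the 0.5 threshold, and the example alignment are assumptions made for illustration.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of gap characters exceeds the threshold.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        col
        for col in range(len(rows[0]))
        if sum(row[col] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {
        name: "".join(row[col] for col in keep)
        for name, row in zip(names, rows)
    }


# Hypothetical alignment: the third column is a gap in 2 of 3 sequences.
toy = {"s1": "AC-GT", "s2": "A--GT", "s3": "ACTG-"}
print(drop_gappy_columns(toy))
```

In practice the trimmed alignment, not the raw one, would then be passed to the tree-building step.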
