Detecting horizontal gene transfers using discrepancies in species and gene classifications

Vladimir Makarenkov and Alix Boc
Université du Québec à Montréal
Presentation summary
• Network models in phylogenetic analysis
• What is a horizontal gene transfer (HGT)?
• Description of the new method
• Examples of application
• Future works
• T-REX software
Inferring phylogenetic trees
Four main approaches:
• Distance-based methods
    • UPGMA by Michener and Sokal (1957)
    • ADDTREE by Sattath and Tversky (1977)
    • Neighbor-joining (NJ) by Saitou and Nei (1987)
    • UNJ and BioNJ methods by Gascuel (1997)
    • Fitch by Felsenstein (1997)
    • Weighted least-squares MW by Makarenkov and Leclerc (1999)
• Maximum Parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)
• Maximum Likelihood (Felsenstein 1981)
• Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)
Phylogenetic mechanisms requiring a network representation
• Horizontal gene transfer (i.e. lateral gene transfer)
• Hybridization
• Homoplasy and gene convergence
• Gene duplication and gene loss
Software for building phylogenetic networks
• SplitsTree, Huson (1998)
• T-Rex, Makarenkov (2001)
• NeighborNet, Bryant and Moulton (2002)
Methods for detecting horizontal gene transfers
• Hein (1990) and Hein et al. (1995, 1996)
• von Haeseler and Churchill (1993)
• Page (1994); Page and Charleston (1998)
• Charleston (1998)
• Hallett and Lagergren (2001)
• Mirkin, Fenner, Galperin and Koonin (2003)
• V’yugin, Gelfand and Lyubetsky (2003)
• Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)
Types of horizontal gene transfer
Partial gene transfer versus complete transfer
[Figure: rooted trees on the leaves A to F illustrating a partial transfer and a complete transfer, panels (a) and (b)]
The new model
Basic ideas:
[Figure: a species tree and a gene tree on the taxa A to F]
1) Reconcile the species and gene phylogenetic trees using either a
topological (Robinson and Foulds topological distance) or a metric
(least-squares) criterion
2) Incorporate necessary biological rules into the mathematical model
3) Keep the time complexity of the algorithm polynomial
Partial gene transfer. Incorporating biological rules.
[Figure: rooted tree with leaves i, i1 and j; labelled vertices x, a, y, z, b, w]
Situation when a new HGT branch (a,b) can affect the evolutionary
distance between the species i and j, and cannot affect the distance
between i1 and j.
Partial gene transfer. Incorporating biological rules (2).
[Figure: panels (a), (b) and (c), each a rooted tree with leaves i and j and labelled vertices x, a, y, z, b, w]
Three cases when the evolutionary distance between the
species i and j is not affected by addition of a new HGT
branch (a,b)
Partial gene transfer. Incorporating biological rules (3).
No HGTs can be considered when affected branches are
located on the same lineage
Partial gene transfer. Incorporating biological rules (4).
[Figure: rooted tree with two transfers, LGT1 and LGT2, between Lineage 1 and Lineage 2]
No HGT can be considered when two HGTs affecting
a pair of lineages intersect as shown
Partial gene transfer. Incorporating biological rules (5).
[Figure: panels (a) to (d), each showing the leaves i and j with two HGT branches (a,b) and (a1,b1)]
• Cases (a) and (b): the path between the leaves i and j is allowed to go
through both HGT branches (a,b) and (a1,b1).
• Cases (c) and (d): the path between the leaves i and j is not allowed to
go through both HGT branches (a,b) and (a1,b1).
Algorithmic scheme
Step 1. Construction of the species and gene phylogenies T and T1
• Let X be a set of n taxa (i.e. species or objects).
• Infer a binary species phylogenetic tree T from the sequence or
distance data (using 16S or 23S rRNAs or other genes that are not
supposed to be transferred horizontally). This tree has 2n-3
branches and n leaves.
• T is explicitly rooted.
• Consider the same n taxa from X. Infer a binary gene
phylogenetic tree T1.
Algorithmic scheme
Step 2. LS mapping of the gene tree into the species tree
• If the topologies of T and T1 are identical, no horizontal gene
transfers can be indicated.
• If the topologies of T and T1 are different, it may be the result
of horizontal gene transfers. The gene tree T1 can be mapped into
the species tree T by fitting by least squares the branch lengths of
T to the pairwise distances in T1 (Bryant and Waddell 1998;
Makarenkov and Leclerc 1999).
• Each pair of branches of the species tree T is tested for the
possibility of an HGT. All branch lengths in T are reassessed
according to the pairwise distances in T1.
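As a minimal sketch of the least-squares fitting mentioned in Step 2, the fragment below estimates the branch lengths of a fixed topology from a set of pairwise distances by solving the normal equations. The function names `solve` and `fit_branch_lengths`, the `paths` encoding of the topology and the four-leaf example are assumptions made for illustration; they are not taken from T-REX or from the cited papers.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_branch_lengths(paths, delta):
    """Least-squares branch lengths for a fixed topology.

    paths[(i, j)] is the set of branch indices on the path between leaves
    i and j; delta[(i, j)] is the target distance (here, taken from T1).
    """
    pairs = sorted(paths)
    m = 1 + max(b for p in paths.values() for b in p)
    rows = [[1.0 if b in paths[p] else 0.0 for b in range(m)] for p in pairs]
    # Normal equations: (A^T A) x = A^T delta
    ata = [[sum(r[u] * r[v] for r in rows) for v in range(m)] for u in range(m)]
    atd = [sum(r[u] * delta[p] for r, p in zip(rows, pairs)) for u in range(m)]
    return solve(ata, atd)


# Hypothetical quartet ((1,2),(3,4)): branches 0-3 lead to leaves 1-4,
# branch 4 is the internal branch.
paths = {
    (1, 2): {0, 1}, (3, 4): {2, 3},
    (1, 3): {0, 4, 2}, (1, 4): {0, 4, 3},
    (2, 3): {1, 4, 2}, (2, 4): {1, 4, 3},
}
# Illustrative distances generated from branch lengths 1, 2, 3, 4, 5
delta = {(1, 2): 3.0, (3, 4): 7.0, (1, 3): 9.0,
         (1, 4): 10.0, (2, 3): 10.0, (2, 4): 11.0}
fitted = fit_branch_lengths(paths, delta)
```

This unconstrained fit can return negative branch lengths when the distances are far from tree-like; a practical implementation would need to handle that case.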
Algorithmic scheme
Step 3. Unique and Multiple gene transfer scenarios
Once all pairs of branches in T are tested, an ordered list L of all
possible HGT connections between pairs of branches in T can be
established. Each entry of L is associated with a gain in fit obtained
for a particular HGT. The researchers can then either:
1. Choose the best (most probable) HGTs from this list, taking into
account the HGT order in L as well as any useful knowledge about
the data at hand (Multiple Scenario).
2. Or add to the species tree T the best HGT branch (according to the LS
criterion), then the second, the third, and so forth, computing each
new HGT branch while taking into account all previously added HGTs
(Unique Scenario).
Optimization problem : Least-squares
The least-squares loss function to be minimized with an unknown length l of
the HGT branch (a,b):

$$Q(ab, l) = \sum_{dist(i,j) > l} \left( \min\{ d(i,a) + d(j,b);\ d(j,a) + d(i,b) \} + l - \delta(i,j) \right)^2 + \sum_{dist(i,j) \le l} \left( d(i,j) - \delta(i,j) \right)^2 \to \min$$

d(i,j) - the minimum path-length distance between
the leaves (i.e. taxa) i and j in the tree T
δ(i,j) - the given dissimilarity value between i and j
dist(i,j) = d(i,j) - Min { d(i,a) + d(j,b); d(j,a) + d(i,b) }
[Figure: rooted tree with leaves i and j; labelled vertices x, a, y, z, b, w]
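The loss Q(ab, l) defined above can be evaluated directly once the tree distances are known. The sketch below is illustrative only: the function name `ls_loss`, the dictionary layout and the four-leaf numbers are assumptions, and the coarse grid search stands in for whatever exact minimization over l the authors use.

```python
from itertools import combinations


def ls_loss(leaves, d, delta, d_a, d_b, length):
    """Evaluate Q(ab, l) for a candidate HGT branch (a, b) of length `length`.

    d[i, j]     : path-length distance between leaves i and j in T
    delta[i, j] : given dissimilarity between i and j
    d_a[i]      : tree distance from leaf i to the point a; d_b likewise for b
    """
    total = 0.0
    for i, j in combinations(leaves, 2):
        # Shortest route through the new branch, not counting the branch itself
        detour = min(d_a[i] + d_b[j], d_a[j] + d_b[i])
        dist = d[i, j] - detour
        if dist > length:
            # The path through (a, b) is shorter and replaces d(i, j)
            total += (detour + length - delta[i, j]) ** 2
        else:
            total += (d[i, j] - delta[i, j]) ** 2
    return total


# Hypothetical quartet ((1,2),(3,4)) with unit branch lengths; a sits in the
# middle of the branch leading to leaf 1, b in the middle of the branch
# leading to leaf 3.
leaves = [1, 2, 3, 4]
d = {(1, 2): 2.0, (1, 3): 3.0, (1, 4): 3.0,
     (2, 3): 3.0, (2, 4): 3.0, (3, 4): 2.0}
delta = dict(d)
delta[1, 3] = 1.5  # made-up gene distance: taxa 1 and 3 look closer than in T
d_a = {1: 0.5, 2: 1.5, 3: 2.5, 4: 2.5}
d_b = {1: 2.5, 2: 2.5, 3: 0.5, 4: 1.5}

# Coarse grid search over l (an exact solver would exploit the fact that
# Q is piecewise quadratic in l)
candidates = [k * 0.25 for k in range(13)]
best_length = min(candidates,
                  key=lambda l: ls_loss(leaves, d, delta, d_a, d_b, l))
```

The gain in fit for the branch (a,b) is then the difference between the loss without the branch (any l so large that no pair is rerouted) and the loss at the best l.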
Complete gene transfer
[Figure: step-by-step transformation of the species tree T on the taxa A to F. Panels: Species Tree with upcoming HGT1; Species Tree + HGT1 with upcoming HGT2; Species Tree + HGT2 with upcoming HGT3; Species Tree + HGT3 (Gene Tree). The gene tree T1 is shown for comparison.]
Optimization problem : Robinson and Foulds
topological distance
[Figure: two phylogenetic trees, T and T1, on the leaves A to E]
The topological distance of Robinson and Foulds (1981)
between two phylogenetic trees is equal to the minimum number
of elementary operations consisting of merging or splitting
vertices necessary to transform one tree into another.
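For illustration, the Robinson and Foulds distance can be computed by counting the bipartitions (splits) that occur in exactly one of the two trees, which equals the minimum number of merge and split operations in the definition above. The sketch below assumes trees written as nested tuples of leaf names; the function names and the two five-leaf topologies are hypothetical and are not the trees drawn on the slide.

```python
def splits(tree, taxa):
    """Collect the non-trivial bipartitions of a tree given as nested tuples."""
    taxa = frozenset(taxa)
    ref = min(taxa)  # fixed reference taxon, used to name each split by one side
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        # Always store the side of the split that does not contain `ref`
        side = taxa - below if ref in below else below
        if 2 <= len(side) <= len(taxa) - 2:
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b, taxa):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a, taxa) ^ splits(tree_b, taxa))


taxa = "ABCDE"
first = (("A", "B"), "C", ("D", "E"))
second = (("A", "C"), "B", ("D", "E"))
distance = rf_distance(first, second, taxa)
```

Here the two trees share the split {D, E} and differ in one split each, so one merge followed by one split turns the first tree into the second.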
Robinson and Foulds topological distance
[Figure: trees T and T1 on the leaves A to E]
Robinson and Foulds distance between T and T1 is 2.
The HGT minimizing the Robinson and Foulds topological
distance between the species and gene phylogenetic trees
can be considered as the best candidate to reconcile the
species and gene phylogenies.
HGT detection algorithm : Complete transfer
• Test all connections between pairs of branches in the species tree T.
• Compute the RF distance or LS coefficient for each connection. In
the case of LS optimization, the length of each edge of the species
tree is reassessed according to the gene distance matrix.
• The best HGT found (i.e. the HGT minimizing the optimization
criterion) is added to the species tree, transforming it into another
phylogenetic tree.
• Run the algorithm while the transformed species tree is topologically
different from the gene tree.
• Time complexity: O(kn^4) for k HGTs and n species.
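The loop described above can be sketched as a greedy search. This is a simplified illustration under several assumptions, not the authors' implementation: trees are rooted binary nested tuples, a complete transfer is modelled as pruning the recipient subtree and regrafting it next to the donor branch, the criterion is a rooted (cluster-based) variant of the RF distance, the biological rules are ignored, and the search stops when no single transfer improves the criterion. All function names are hypothetical.

```python
def leafset(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return leafset(tree[0]) | leafset(tree[1])


def subtrees(tree):
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)


def rooted_rf(tree_a, tree_b):
    """Number of clusters (leaf sets of internal nodes) found in only one tree."""
    def clusters(t):
        return {leafset(s) for s in subtrees(t) if not isinstance(s, str)}
    return len(clusters(tree_a) ^ clusters(tree_b))


def prune(tree, target):
    """Remove the subtree `target`, suppressing the node left with one child."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == target:
        return right
    if right == target:
        return left
    return (prune(left, target), prune(right, target))


def graft(tree, donor, moved):
    """Attach `moved` as the sister of the subtree `donor`."""
    if tree == donor:
        return (tree, moved)
    if isinstance(tree, str):
        return tree
    return (graft(tree[0], donor, moved), graft(tree[1], donor, moved))


def detect_complete_hgts(species, gene, max_transfers=10):
    current, transfers = species, []
    while rooted_rf(current, gene) > 0 and len(transfers) < max_transfers:
        best = None
        for recipient in subtrees(current):
            if recipient == current:
                continue  # the whole tree cannot be transferred
            pruned = prune(current, recipient)
            for donor in subtrees(pruned):
                candidate = graft(pruned, donor, recipient)
                score = rooted_rf(candidate, gene)
                if best is None or score < best[0]:
                    best = (score, donor, recipient, candidate)
        if best is None or best[0] >= rooted_rf(current, gene):
            break  # no single transfer improves the fit
        transfers.append((sorted(leafset(best[1])), sorted(leafset(best[2]))))
        current = best[3]
    return transfers, current


# Made-up example: in the gene tree, B groups with D instead of A
species = ((("A", "B"), "C"), (("D", "E"), "F"))
gene = (("A", "C"), ((("B", "D"), "E"), "F"))
transfers, final_tree = detect_complete_hgts(species, gene)
```

Each entry of `transfers` is a (donor leaves, recipient leaves) pair; in this example a single transfer reconciles the two trees.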
Application example 1
Horizontal transfer of the Rubisco Large subunit gene
Delwiche, C.F., and J. D. Palmer. 1996. Rampant
Horizontal Transfer and Duplication of Rubisco Genes in
Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.
rbcL Gene Phylogeny
[Figure: rbcL gene phylogeny (Form I Rubisco). Taxa are grouped as α-, β- and γ-Proteobacteria, Cyanobacteria, Red and Brown Plastids, Glaucophyte Plastid and Green Plastids.]
Red Type (FORM I): Rhodobacter sphaeroides I, Xanthobacter, Alcaligenes H16 chromosomal, Alcaligenes H16 plasmid, Alcaligenes 17707 chromosomal, Mn oxidizing bacterium (SI85-9a1), Cyanidium, Ahnfeltia, Antithamnion, Porphyridium, Cryptomonas, Ectocarpus, Olisthodiscus, Cylindrotheca.
Green Type (FORM I): Prochlorococcus, Hydrogenovibrio L2, Chromatium L, Pseudomonas, Thiobacillus ferrooxidans fe1, Nitrobacter, Thiobacillus ferr. 19859, Thiobacillus denitrificans I, Endosymbiont, Synechococcus, Anabaena, Prochlorothrix, Anacystis, Synechocystis, Prochloron, Cyanophora, Euglena, Pyramimonas, Chlamydomonas, Chlorella, Bryopsis, Coleochaete, Marchantia, Pseudotsuga, Nicotiana, Oryza.
Delwiche and Palmer (1996) - hypotheses of HGTs
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
HGTs of the rbcL gene
[Figure: species tree on the same taxa as the rbcL gene phylogeny, with seven numbered HGT branches (1 to 7). Groups: α-, β- and γ-Proteobacteria, Cyanobacteria, Glaucophyte plastid, Red and Brown algae, Green Plastids.]
HGTs of the rbcL gene - comparison
Hypotheses by Delwiche and Palmer (1996)
1- Cyanobacteria → γ-Proteobacteria
2- α-Proteobacteria → Red and brown algae
3- γ-Proteobacteria → α-Proteobacteria
4- γ-Proteobacteria → β-Proteobacteria
Solution
1. α-Proteobacteria → Red and brown algae
2. -Proteobacteria → β-Proteobacteria
3. -Proteobacteria → γ-Proteobacteria
4. -Proteobacteria → -Proteobacteria
5. γ-Proteobacteria → β-Proteobacteria
6. γ-Proteobacteria → Cyanobacteria
7. γ-Proteobacteria → β-Proteobacteria
Application example 2
Horizontal transfers of the protein rpl12e
Data taken from:
Matte-Tailliez O., Brochier C., Forterre P. &
Philippe H. Archaeal phylogeny based on
ribosomal proteins. (2002). Mol. Biol. Evol. 19,
631-639.
Rpl12e HGTs
Assumed HGTs of the rpl12e gene involved the clusters of
Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004)
[Figure: species tree and rpl12e gene tree on the same 14 archaeal taxa: Pyrobaculum aerophilum, Aeropyrum pernix, Sulfolobus solfataricus, Pyrococcus furiosus, Pyrococcus abyssi, Pyrococcus horikoshii, Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, Methanosarcina barkeri, Haloarcula marismortui, Halobacterium sp., Thermoplasma acidophilum, Ferroplasma acidarmanus.]
Reconciliation scenario
[Figure: species tree on the same 14 archaeal taxa with five numbered HGT branches (1 to 5)]
Future developments
• Maximum Likelihood model
• Maximum Parsimony model
• Validation methods (bootstrapping)
• Decreasing the running time
Bibliography
• Boc, A. and Makarenkov, V. (2003), New Efficient Algorithm for Detection of Horizontal
Gene Transfer Events, Algorithms in Bioinformatics, G. Benson and R. Page (Eds.), 3rd
Workshop on Algorithms in Bioinformatics, Springer-Verlag, pp. 190-201.
• Delwiche, C.F. and Palmer, J.D. (1996), Rampant Horizontal Transfer and Duplication of
Rubisco Genes in Eubacteria and Plastids, Mol. Biol. Evol. 13, 873-882.
• Makarenkov, V. (2001), T-Rex: reconstructing and visualizing phylogenetic trees and
reticulation networks, Bioinformatics 17, 664-668.
• Makarenkov, V., Boc, A. and Diallo, A.B. (2004), Representing Lateral gene transfer in
species classification. Unique scenario, IFCS’2004 proceedings, Chicago.
• Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. (2002), Archaeal phylogeny based
on ribosomal proteins, Mol. Biol. Evol. 19, 631-639.
• Robinson, D.F. and Foulds, L.R. (1981), Comparison of phylogenetic trees, Mathematical
Biosciences 53, 131-147.
T-REX: Tree and Reticulogram Reconstruction (Makarenkov 2001)
Downloadable from http://www.info.uqam.ca/~makarenv/trex.html
Author: Vladimir Makarenkov
Versions: Windows 9x/NT/2000/XP and Macintosh
With contributions from A. Boc, P. Casgrain, A. B.
Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J.
Lapointe, B. Leclerc, and P. Legendre.
Methods available
• Six methods for inferring phylogenetic trees from distance data
• Three methods for reconstructing reticulograms (phylogenetic
networks)
• Four methods for inferring phylogenetic trees from incomplete
distance data
• Visualization and interactive manipulation of phylogenetic
trees and networks
Phylogenetic tree inferring methods
• ADDTREE by Sattath and Tversky (1977)
• Neighbor-joining (NJ) by Saitou and Nei (1987)
• UNJ and BioNJ methods by Gascuel (1997)
• Circular order reconstruction by Makarenkov and Leclerc (1997)
• Weighted least-squares MW by Makarenkov and Leclerc (1999)
Tree reconstruction with missing data
• Ultrametric procedure by De Soete (1984)
• Additive procedure by Landry et al. (1996)
• Triangles by Guénoche and Leclerc (2001)
• MW* by Makarenkov and Lapointe (2004)
Reticulogram reconstruction methods
• Reticulogram with detection of reticulate evolution processes,
hybridization, or recombination events (Legendre and
Makarenkov 2002; Makarenkov and Legendre 2004).
• Reticulogram with detection of horizontal gene transfer among
species. Complete and Partial gene transfer models. Unique and
Multiple scenarios (Boc and Makarenkov 2003; Makarenkov,
Boc and Diallo 2004).
• Graphical representations: Hierarchical, Axial, or Radial views.
Interactive manipulation of trees and reticulograms.
Bioinformatics software
• Horizontal gene transfer detection
• Reticulogram reconstruction methods
• Tree reconstruction options
• T-Rex output
• Results available (tree map + HGTs)
• Reticulogram: Hierarchical view
• Reticulogram: Axial view
• Reticulogram: Radial view
• Color selection option
• Copy as Bitmap or Metafile