T-COFFEE
Multiple Alignments of Orthologous Sequences

WebLogo

Horizontal Gene Transfer (Phylogenetic Trees)

Overview
• T-COFFEE – Tree-based Consistency Objective Function for alignment Evaluation
    • Focuses on orthologous gene sequences
    • Used to generate multiple sequence alignments
• WebLogo
    • Constructed from multiple sequence alignment
• Phylogenetic Trees
    • Used to determine if your gene is derived from horizontal gene transfer

“Click”

Enter ortholog sequences into query box – Where do I get these?

RECALL: What are orthologs?
• Homologs
    – Orthologs
        • Genes duplicated via appearance of new species
            – Identical function in different organisms
    – Paralogs
        • Genes duplicated within a species
            – Perform slightly different tasks in cell
                » Can develop new capabilities
                » Can become pseudogene if functionality lost but sequence similarity retained
Insert Figure 8-41 from Microbiology – An Evolving Science © 2009 W.W. Norton & Company, Inc.

Where do I find orthologs?
Scroll down
Under Homolog Selection, choose “Paralogs/Orthologs” from drop-down menu
Scroll down to table containing list of orthologs

Add the top 5 orthologs to Gene Cart
Notice orthologous genes are from different organisms
The genes are ranked by ascending E-values
Select the genes by clicking these boxes
Scroll down to bottom of page
“Click”

Only 5 genes were selected, why are 6 genes shown in the Gene Cart?
One of the genes shown is your ASSIGNED gene (the one you are annotating)

Generate amino acid sequences for orthologs in FASTA format
Scroll down to “Export Genes”
Select “FASTA Amino Acid format”
“Click”

Amino Acid sequences in FASTA format for all 6 genes will appear
Scroll down
Your assigned gene is located at the bottom of this list (inspect Gene OID number)
Copy / paste all 5 ortholog sequences into your notebook for this module
EXCLUDE your gene, which should already be in your notebook

Recording results in your notebook
Add heading and box
The amino acid sequences in FASTA format for the top 5 orthologs

Return to T-COFFEE database
STEP 1: Copy / paste the amino acid sequence in FASTA format for your assigned gene into the query box for T-COFFEE

T-COFFEE database entries
STEP 2: Copy / paste the amino acid sequences in FASTA format for the top 5 orthologs into the same query box as your gene
Separate individual sequences by a hard return
“Click”
Wait a few moments . . .

T-COFFEE Results
Select “Start JalView” to examine the multiple sequence alignment of the ortholog sequences

Alignment inspection using JalView
Select “Percentage Identity” under “Colour” menu
Reminder:
    Light Blue = Low Frequency
    Dark Blue = High Frequency
Compare to consensus sequence

Return to T-COFFEE Results
Copy / paste this alignment into your lab notebook

Recording results in your notebook
Identify organism in alignment by Gene OID

T-COFFEE complete
On to WebLogo
“Click”
“Right Click” and open in IE tab (not Firefox)

Copy/paste multiple sequence alignment
Scroll down
1- Select “amino acid” as sequence type
2- Select box for multiline logo
“Click”

WebLogo Results
Zoom in
In IE, save picture as .png file for upload to notebook

Recording results in your notebook

WHAT ARE OUR GOALS?
1. Build a phylogenetic tree
2. Determine if assigned genes are derived from horizontal gene transfer

Phylogenetic tree of Bacteria showing established & candidate phyla (organismal phylogeny)
Domain > Phylum > Class > Order > Family > Genus > Species
Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685.

Three bacterial phyla closely related to Planctomycetes by 23S rRNA analysis (organismal phylogeny)
Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.

16S rRNA gene supports the monophyletic grouping Planctomycetales (organismal phylogeny by rDNA analysis)

Closest phylogenetic relatives of P. limnophilus (same family)
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1

How do we build a phylogenetic tree?
• include P. limnophilus gene (first module)
• include the top 5 orthologs (second module)
• include genes from organisms closely related to P. limnophilus (i.e., same family)
• include genes from organisms less closely related to P. limnophilus (i.e., from phyla Verrucomicrobia, Chlamydiae, and Lentisphaerae)
• include genes from organisms that are distantly related to P. limnophilus

Recall: We want to include genes from organisms more closely related to P. limnophilus AND genes from organisms that are less closely related to P. limnophilus.
So, depending on the organisms in your top 5 orthologs, there are 2 paths you can take:
Select top 5 orthologs
• PATH 1: If top 5 are closely related to P. limnophilus, choose 5-10 less closely related organisms
• PATH 2: If top 5 are less closely related to P. limnophilus, choose 5-10 more closely related organisms

Building a phylogenetic tree
EXAMPLE: Organisms closely related to P. limnophilus (i.e., same family)
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1

Building a phylogenetic tree
EXAMPLE: Organisms less closely related to P. limnophilus (i.e., from phyla Verrucomicrobia, Chlamydiae, and Lentisphaerae)
Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.

Inspect your top 5 orthologs: Which path?
Example: PATH #1 – Most are in same family as P. limnophilus, so choose 5-10 sequences from less closely related organisms

Where do I find sequences?
Scroll down
Under Homolog Selection, choose “Paralogs/Orthologs” from drop-down menu
Scroll through the ortholog list and select some genes from less closely related as well as some distantly related organisms
Once 5-10 orthologs are selected, add them to your gene cart

Generate amino acid sequences for orthologs in FASTA format
Scroll down to “Export Genes”
Select “FASTA Amino Acid format”
“Click”

Amino Acid sequences in FASTA format
Remember: Your assigned gene is at the bottom of the list
Scroll down

Recording results in your notebook
Create another box in your lab notebook, and copy/paste ONLY the 5-10 NEW ortholog FASTA sequences (i.e., exclude those already in first & second module)

Recording results in your notebook

What if your top 5 orthologs are distantly related to P. limnophilus?
Example: PATH #2 – Most are not in the same phylum as P. limnophilus, so choose 5-10 sequences from more closely related organisms
Scroll through the ortholog list and select some genes from closely related as well as other distantly related organisms
Once 5-10 orthologs are selected, add them to your gene cart
Copy / paste FASTA format protein sequences into notebook

Use Phylogeny.fr site to create a phylogenetic tree
“Click”

Creating a phylogenetic tree
Select “A la Carte” from menu
1- Select “T-Coffee” for multiple alignment
2- Leave other settings as default
Scroll down
“Click”

Copy/paste sequences in query box:
• Your P. limnophilus gene
• Your top 5 orthologs
• 5-10 new orthologs
Scroll down & select “submit”

Results of phylogenetic analysis
Download and save as .png for upload to notebook

Recording results in your notebook

How do I interpret the tree results?

Possible scenarios resulting from construction of a phylogenetic tree
• Tree labels: P. limnophilus, Blastopirellula, Carboxydothermus, Bacillus
  No HGT since P. limnophilus and Blastopirellula are in the same family and are clustered together (i.e., gene phylogeny matches organismal phylogeny).
• Tree labels: P. limnophilus, Carboxydothermus, P. maris, Pirellula
  Possible HGT since P. limnophilus and Carboxydothermus are very distantly related yet clustered together (i.e., gene phylogeny does NOT match organismal phylogeny).
• Tree labels: Bacillus, Clostridium (clustered); Carboxydothermus, P. limnophilus, P. maris, Blastopirellula (not clustered)
  Maybe HGT, but unsure because there is also an unresolved or multifurcating branch

Interpreting your phylogenetic tree
• If your Planctomyces limnophilus gene is clustered with that from an organism in the P. limnophilus family: probably not horizontal gene transfer
• If your Planctomyces limnophilus gene is clustered with that from an organism that is NOT in the P. limnophilus family: may be horizontal gene transfer
• If your Planctomyces limnophilus gene is clustered with more than one organism in the tree (multifurcating branch): unresolved phylogeny

In the example below, is the gene derived by HGT? Why or why not?
Organisms in the example tree:
• Planctomyces limnophilus
• Blastopirellula marina
• Planctomyces maris
• Pedosphaera parvula (Ellin 514)
• Lentisphaera araneosa
• Verrucomicrobium spinosum
• Sorangium cellulosum
• Escherichia coli K12
• Rhodopirellula baltica
• Thiobacillus denitrificans
• Moorella thermoacetica
• Brucella canis
• Hydrogenivirga sp.
• Clostridium perfringens
• Gemmata obscuriglobus

Recording results in your notebook
In the “Interpretation” box of your lab notebook, answer:
• Is there evidence of horizontal gene transfer?
• What organisms does your gene cluster with? The same family? Or the three more closely related phyla (Verrucomicrobia, Chlamydiae, Lentisphaerae)?
• Do the gene and genome GC content differ by more than 5%?
• Do the neighborhoods for the top 5 orthologs look similar or different to that of your gene in P. limnophilus?
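
Optional: checking your interpretation with a short script
The three interpretation rules and the GC content question above can be sketched in a few lines of Python. This is only an illustration, not part of the module's required workflow. The function names, the reading of "5%" as 5 percentage points, and all numbers and organism sets in the usage lines are assumptions made for the example; substitute the values from your own notebook.

```python
def gc_percent(dna: str) -> float:
    """Percent G+C among the A/C/G/T characters of a nucleotide sequence."""
    dna = dna.upper()
    acgt = sum(dna.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return 100.0 * (dna.count("G") + dna.count("C")) / acgt


def gc_differs(gene_gc: float, genome_gc: float, threshold: float = 5.0) -> bool:
    """True if gene and genome GC content differ by more than `threshold`.

    Assumption: the slide's "5%" is read as 5 percentage points.
    """
    return abs(gene_gc - genome_gc) > threshold


def interpret_cluster(neighbors, same_family):
    """Apply the three slide rules to the organisms your gene clusters with.

    neighbors:   organisms sharing the nearest branch point with your gene
    same_family: organisms you have identified as being in the same family
    """
    if not neighbors:
        raise ValueError("list at least one organism your gene clusters with")
    if len(neighbors) > 1:
        # Clustered with more than one organism (multifurcating branch)
        return "unresolved phylogeny"
    if neighbors[0] in same_family:
        return "probably not horizontal gene transfer"
    return "may be horizontal gene transfer"


if __name__ == "__main__":
    # Hypothetical inputs for illustration only
    family = {"Blastopirellula marina", "Planctomyces maris"}
    print(interpret_cluster(["Blastopirellula marina"], family))
    print(interpret_cluster(["Moorella thermoacetica"], family))
    print(gc_differs(gc_percent("ATGCGCGCTA"), 50.0))
```

The script only restates the rules on the slides. A "may be horizontal gene transfer" result still needs the supporting evidence asked for in the Interpretation box (GC content and gene neighborhoods).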