Multiple Sequence Alignment
ClustalW
TCoffee
Ka, Ks, and Ka/Ks
Anchored alignment

ClustalW
http://www.ebi.ac.uk/clustalw/
Multiple sequence alignment: paste your sequences, set the alignment options, and submit.

Exercise
HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.
Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW.

Download protein sequences

Result
Alignment
Guide tree

TCoffee
http://tcoffee.crg.cat/
TCoffee computes its alignments by combining a collection of smaller alignments.

Alignment at the DNA level based on an alignment at the protein level
The 18-kDa protein plays an important role in fertilization in several abalone species.
Build a multiple sequence alignment using the following sequences.

Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
>gi|604525|gb|AAC37229.1| fertilization protein
MRSLVLLCVLLMAICAADKKSTVSKENAAAMKVAMIKFLDSRTDRFKKRIEKIGYPITPPQYTTLLYYNR
ERLMDWCHNYVEVSKKIILLGGNKLNKKNFARMGRIIGWKNQWILKRRQWHMVRVMRRYKASAIAKKIVA
MKVADLPCN

Choose TCoffee Regular, paste the sequences into the data box, and press Submit.

Download formats
Guide tree

Codon Alignment
In
order to study selection patterns, you will need the corresponding DNA alignment.
PROTOGENE (Protein-to-Gene) in TCoffee transforms the amino-acid alignment into a codon alignment. The procedure involves tBLASTn.

• PROTOGENE (in TCoffee) is time-consuming. Please submit your email address, and the results will be emailed to you.
• PROTOGENE may return more than one DNA sequence for a given protein sequence. For your homework assignment, please choose one sequence for each species.

(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
CCTGTTAGAAAGATACATGGATAA
>gi|604529|gb|AAC37232.1|_G_L36589 _S_ AAC37232 _DESC_ fertilization protein MATCHES_ON Haliotis fulgens fertilization protein
mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGATGGCGGTAGGATGTGTGGCGTTT------
------------------GATGATGTGGTGGTCTCAAGGCAAGAGCAATCTTATGTGCAG
AGAGGGATGGTCAACTTTTTGGATGAAGAAATGCATAAACTGGTTAAACGG---TTTAGA
GATATGCGATGGAATTTAGGGCCAGGCTTTGTATTCCTTCTAAAAAAAGTCAACAGAGAG
AGAATGATGCGCTACTGCATGGATTACGCCAGATATTCCAAAAAGATTTTACAGCTAAAA
CATCTTCCAGTAAATAAGAAGACCCTCACTAAAATGGGTAGATTCGTTGGATATCGAAAC
TATGGGGTCATCAGGGAGTTGTACGCCGACGTATTCAGAGACGTTCAAGGATTTAGGGGG
CCTAAAATGACTGCAGCCATGAGGAAGTACAGCAGCAAGGATCCTGGTACATTTCCTTGC
AAGAACGAGAAACGCCGCGGATGA
>gi|604527|gb|AAC37230.1|_G_L36553 _S_ AAC37230 _DESC_ fertilization protein MATCHES_ON Haliotis sorenseni fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAAACCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
ATAGCTATGATAAAGTTTTTGGATGCGAGGGCGGGTAAATTCAAAAAACGC---GTTGAG
AATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTATACTACAACAGACAG
AGATTGATGGAATGGTGCCATACCTACGTTGAATTTTCCAAAAAGATTATATTGATGGGA
GGTAACAAATTAAATAAGAAGAACTTCACTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGGTTTTGAAAAGGAGGCAATGGGAGATGGTCAGA---------GTGATGAGGCGC
TATAAAAGTACTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604525|gb|AAC37229.1|_G_L36552 _S_ AAC37229 _DESC_ fertilization protein MATCHES_ON Haliotis rufescens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC------
------------------AAAAAATCCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
GTAGCGATGATAAAGTTTTTGGATTCGAGGACGGATAGATTCAAAAAACGC---ATTGAG
AAGATTGGATATCCAATAACCCCTCCGCAATATACAACTCTACTATACTACAACAGAGAG
AGATTGATGGATTGGTGCCATAACTACGTTGAAGTATCCAAAAAGATTATATTGTTGGGA
GGTAACAAATTAAATAAGAAGAACTTCGCTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGATTTTGAAAAGGAGGCAATGGCACATGGTCAGA---------GTGATGAGGCGC
TATAAAGCTTCTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG

SNAP - Ds/Dn Calculation Tool
http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
Calculates synonymous and nonsynonymous substitution rates based
on codon alignments, using the method of Nei and Gojobori (1986).

Input codon alignment
Select output statistics

SNAP - Ds/Dn Calculation Tool
Conclusion: We detect positive selection in six of the comparisons, as did Swanson and Vacquier (1998).

Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
Distmat calculates the evolutionary distance between every pair of sequences in a multiple alignment. Distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.

Feed the DNA alignment of the 18-kDa protein into Distmat. Calculate the distances between the sequences separately for codon positions 1 and 2, and for codon position 3. Do the results agree with those from the dn/ds analysis?

Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual: http://dialign.gobics.de/anchor/manual

Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK

Results
DIALIGN builds alignments from fragments.
The numbers below the alignment give a rough measure of local similarity among the sequences.

Anchored alignment
Now, let us assume that the user has some expert knowledge about a certain domain that is present in all the input sequences.
The domains marked in red in the three sequences are thought to be homologous to one another.
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK

Therefore, the user wants to define this domain as an anchor and have the rest of the sequences aligned automatically.
The anchor is specified as a set of anchor points. Each anchor point is a pair of equal-length segments from two of the input sequences and is defined by:
• first sequence involved
• second sequence involved
• start of anchor in first sequence
• start of anchor in second sequence
• length of anchor

Results
The specified domain is aligned, and the remainder of the sequences is aligned automatically, respecting the constraints given by the anchor points.

Guidance/HoT
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK
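
As a rough illustration of the DIALIGN anchor-point fields (two sequence numbers, two start positions, one length), the Python sketch below builds and bounds-checks a single anchor between seq1 and seq3. The motif KAAY is used only because it occurs in both sequences; it is not necessarily the domain highlighted on the slides. The names AnchorPoint, validate, and anchor_from_motif are hypothetical, positions are assumed to be 1-based, and the actual DIALIGN anchor input may expect additional columns, so check the user manual before relying on this layout.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AnchorPoint:
    """One anchor: an equal-length segment pair from two input sequences.

    Hypothetical helper type; all positions are assumed to be 1-based.
    """
    first_seq: int     # number of the first sequence involved
    second_seq: int    # number of the second sequence involved
    first_start: int   # start of the anchor in the first sequence
    second_start: int  # start of the anchor in the second sequence
    length: int        # length of the anchor

    def as_line(self) -> str:
        """Render the five fields as one whitespace-separated line."""
        fields = (self.first_seq, self.second_seq,
                  self.first_start, self.second_start, self.length)
        return " ".join(str(value) for value in fields)


def validate(anchor: AnchorPoint, sequences: Sequence[str]) -> None:
    """Raise ValueError if the anchor does not fit inside both sequences."""
    if anchor.first_seq == anchor.second_seq:
        raise ValueError("an anchor must involve two different sequences")
    if anchor.length < 1:
        raise ValueError("anchor length must be at least 1")
    for seq_no, start in ((anchor.first_seq, anchor.first_start),
                          (anchor.second_seq, anchor.second_start)):
        if not 1 <= seq_no <= len(sequences):
            raise ValueError(f"no sequence number {seq_no}")
        end = start + anchor.length - 1
        if start < 1 or end > len(sequences[seq_no - 1]):
            raise ValueError(f"anchor runs outside sequence {seq_no}")


def anchor_from_motif(sequences: Sequence[str], first_seq: int,
                      second_seq: int, motif: str) -> AnchorPoint:
    """Anchor the first occurrence of `motif` in two sequences (illustrative)."""
    first_idx = sequences[first_seq - 1].find(motif)
    second_idx = sequences[second_seq - 1].find(motif)
    if first_idx < 0 or second_idx < 0:
        raise ValueError("motif not found in both sequences")
    anchor = AnchorPoint(first_seq, second_seq,
                         first_idx + 1, second_idx + 1, len(motif))
    validate(anchor, sequences)
    return anchor


# The three sequences from the DIALIGN example.
SEQUENCES = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]

anchor = anchor_from_motif(SEQUENCES, 1, 3, "KAAY")
print(anchor.as_line())  # prints: 1 3 17 16 4
```

Checking the bounds before submission catches the most common mistake, an anchor that runs past the end of the shorter sequence, without needing a round trip to the server.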