MSA Programs

Multiple Sequence Alignment
ClustalW
TCoffee
Ka, Ks, and Ka/Ks
Anchored alignment
ClustalW
http://www.ebi.ac.uk/clustalw/
ClustalW
Submission form: paste your sequences, set the multiple sequence alignment options, and press Submit.
Exercise
• HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.
• Download the FASTA sequences of HomoloGene:5276 and align them with ClustalW.
Download protein sequences
Result
Alignment
Guide Tree
TCoffee
http://tcoffee.crg.cat/
TCoffee computes its multiple alignment by combining a collection of smaller, pairwise alignments.
Alignment at the DNA level based on an alignment at the protein level
• The 18-kDa protein plays an important role in the fertilization of several abalone species.
• Build a multiple sequence alignment using the following sequences.
Sequences
>gi|604533|gb|AAC37231.1| fertilization protein
MRSLVLLCVLLMAICAADKKTSVSKENEAAMKVAMMKFLDMKAGVFKEIIEDMGYPITPPQWTTLLYYNR
ERLIEFCRSFLALSKKIILLGGNKLNKANFARMGRILGWKSQWAVRQRQWGMVRVSRRHTSTAIAKRIVA
MKVADLPCN
>gi|604531|gb|AAC37233.1| fertilization protein
MRFLLLLCVLMGAVSQAVCRKRPNVWGKIVVKEKNKAAMKIGFMEYLDAKLVKFKRHWLVGANWKLQKFE
TDEMRYLAIKRLIKVCHGYTIWSQRLIMLKYRPLNEKYFKKVGRYLAWRNYLIVFRMWIGVLKKNLKRSE
ITKPMQKLLDTKDGELPCPVRKIHG
>gi|604529|gb|AAC37232.1| fertilization protein
MRSLVLLCVLMAVGCVAFDDVVVSRQEQSYVQRGMVNFLDEEMHKLVKRFRDMRWNLGPGFVFLLKKVNR
ERMMRYCMDYARYSKKILQLKHLPVNKKTLTKMGRFVGYRNYGVIRELYADVFRDVQGFRGPKMTAAMRK
YSSKDPGTFPCKNEKRRG
>gi|604527|gb|AAC37230.1| fertilization protein
MRSLVLLCVLLMAICAADKKTTVSKENAAAMKIAMIKFLDARAGKFKKRVENMGYPITPPQWTTLLYYNR
QRLMEWCHTYVEFSKKIILMGGNKLNKKNFTRMGRIIGWKNQWVLKRRQWEMVRVMRRYKSTAIAKKIVA
MKVADLPCN
>gi|604525|gb|AAC37229.1| fertilization protein
MRSLVLLCVLLMAICAADKKSTVSKENAAAMKVAMIKFLDSRTDRFKKRIEKIGYPITPPQYTTLLYYNR
ERLMDWCHNYVEVSKKIILLGGNKLNKKNFARMGRIIGWKNQWILKRRQWHMVRVMRRYKASAIAKKIVA
MKVADLPCN
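Before pasting sequences into a web form, it can help to check that the FASTA text parses cleanly and that each record has the expected length. The following is a minimal Python sketch; the function name `read_fasta` and the toy records are illustrative assumptions and are not part of ClustalW or TCoffee.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input (hypothetical records, wrapped over several lines).
example = """>toy1 example protein
MRSLVLLCVL
LMAICA
>toy2 example protein
MRFLLLLCVLMG
"""

for name, seq in read_fasta(example):
    print(name.split()[0], len(seq))
```

The same function can be applied to the five fertilization protein records above to confirm that five sequences were read.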
Choose TCoffee Regular, paste the sequences into the data box, and press Submit.
Download formats
Guide tree
Codon Alignment
• In order to study selection patterns, you will need the corresponding DNA alignment.
• Using PROTOGENE (Protein-to-Gene) in TCoffee, the amino-acid alignment is transformed into a codon alignment. The actual procedure involves tBLASTn.
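The core idea of turning a protein alignment into a codon alignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matching coding sequence is already known and in frame (PROTOGENE finds it with tBLASTn, which is not reproduced here), and the function name `thread_codons` is hypothetical. The sketch does not check that the DNA actually translates to the protein.

```python
def thread_codons(aligned_protein, cds, gap="-"):
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Each residue consumes the next three nucleotides of `cds`; each gap in
    the protein row becomes three gap characters. Nucleotides left over
    after the last residue (for example a stop codon) are ignored.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("CDS is too short for the protein sequence")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


# Toy example: protein row "MR-S" with coding sequence ATG AGG TCT TAG.
print(thread_codons("MR-S", "ATGAGGTCTTAG"))
```

Because gaps are inserted in multiples of three, the reading frame is preserved, which is what downstream synonymous and nonsynonymous rate calculations require.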
• PROTOGENE (in TCoffee) is time consuming. Please submit your email address, and the results will be emailed to you.
• PROTOGENE may return more than one DNA sequence for any given protein sequence. For your homework assignment, please choose one sequence for each species.
(Result) Codon alignment
>gi|604533|gb|AAC37231.1|_G_L36554 _S_ AAC37231 _DESC_ fertilization protein MATCHES_ON Haliotis assimilis fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC-----------------------AAAAAAACCTCGGTCTCGAAGGAAAATGAAGCCGCAATGAAG
GTAGCGATGATGAAGTTTTTGGATATGAAGGCGGGTGTATTCAAAGAAATC---ATTGAG
GATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTGTACTACAACAGAGAG
AGATTGATTGAATTTTGCCGTTCCTTCCTTGCATTGTCCAAAAAGATTATATTGCTGGGA
GGTAACAAATTAAATAAGGCGAACTTCGCTAGGATGGGTCGAATCCTTGGCTGGAAAAGC
CAGTGGGCTGTGAGACAGAGGCAATGGGGGATGGTCAGA---------GTGTCGAGGCGC
CATACAAGTACTGCAATAGCTAAAAGGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604531|gb|AAC37233.1|_G_L36590 _S_ AAC37233 _DESC_ fertilization protein MATCHES_ON Haliotis corrugata fertilization protein mRNA, complete cds
ATGAGGTTTTTGCTGCTTCTCTGTGTTTTGATGGGGGCAGTATCTCAGGCAGTATGCAGA
AAAAGACCTAATGTCTGGGGGAAAATCGTGGTCAAGGAGAAAAATAAAGCCGCAATGAAG
ATAGGGTTTATGGAATATTTGGATGCAAAGTTGGTAAAGTTTAAAAGGCACTGGCTTGTT
GGAGCCAATTGGAAACTTCAAAAATTTGAAACGGATGAAATGAGATACCTCGCCATAAAG
AGACTGATAAAAGTTTGCCATGGATACACTATTTGGTCCCAACGACTAATAATGTTAAAA
TATCGACCATTGAATGAGAAATACTTCAAAAAGGTGGGTCGATACCTTGCCTGGCGAAAC
TACCTCATAGTTTTTCGGATGTGGATCGGCGTTTTG------AAGAAAAATCTTAAAAGA
TCGGAAATAACGAAACCCATGCAAAAACTCCTCGACACAAAGGATGGTGAGTTGCCCTGC
CCTGTTAGAAAGATACATGGATAA
>gi|604529|gb|AAC37232.1|_G_L36589 _S_ AAC37232 _DESC_ fertilization protein MATCHES_ON Haliotis fulgens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGATGGCGGTAGGATGTGTGGCGTTT-----------------------GATGATGTGGTGGTCTCAAGGCAAGAGCAATCTTATGTGCAG
AGAGGGATGGTCAACTTTTTGGATGAAGAAATGCATAAACTGGTTAAACGG---TTTAGA
GATATGCGATGGAATTTAGGGCCAGGCTTTGTATTCCTTCTAAAAAAAGTCAACAGAGAG
AGAATGATGCGCTACTGCATGGATTACGCCAGATATTCCAAAAAGATTTTACAGCTAAAA
CATCTTCCAGTAAATAAGAAGACCCTCACTAAAATGGGTAGATTCGTTGGATATCGAAAC
TATGGGGTCATCAGGGAGTTGTACGCCGACGTATTCAGAGACGTTCAAGGATTTAGGGGG
CCTAAAATGACTGCAGCCATGAGGAAGTACAGCAGCAAGGATCCTGGTACATTTCCTTGC
AAGAACGAGAAACGCCGCGGATGA
>gi|604527|gb|AAC37230.1|_G_L36553 _S_ AAC37230 _DESC_ fertilization protein MATCHES_ON Haliotis sorenseni fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC-----------------------AAAAAAACCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
ATAGCTATGATAAAGTTTTTGGATGCGAGGGCGGGTAAATTCAAAAAACGC---GTTGAG
AATATGGGATATCCAATAACCCCTCCGCAATGGACAACTCTACTATACTACAACAGACAG
AGATTGATGGAATGGTGCCATACCTACGTTGAATTTTCCAAAAAGATTATATTGATGGGA
GGTAACAAATTAAATAAGAAGAACTTCACTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGGTTTTGAAAAGGAGGCAATGGGAGATGGTCAGA---------GTGATGAGGCGC
TATAAAAGTACTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
>gi|604525|gb|AAC37229.1|_G_L36552 _S_ AAC37229 _DESC_ fertilization protein MATCHES_ON Haliotis rufescens fertilization protein mRNA, complete cds
ATGAGGTCTTTGGTGCTTCTCTGTGTTTTGCTGATGGCAATATGTGCGGCGGAC-----------------------AAAAAATCCACGGTCTCGAAGGAAAATGCAGCCGCAATGAAG
GTAGCGATGATAAAGTTTTTGGATTCGAGGACGGATAGATTCAAAAAACGC---ATTGAG
AAGATTGGATATCCAATAACCCCTCCGCAATATACAACTCTACTATACTACAACAGAGAG
AGATTGATGGATTGGTGCCATAACTACGTTGAAGTATCCAAAAAGATTATATTGTTGGGA
GGTAACAAATTAAATAAGAAGAACTTCGCTAGGATGGGTCGAATCATTGGCTGGAAAAAC
CAGTGGATTTTGAAAAGGAGGCAATGGCACATGGTCAGA---------GTGATGAGGCGC
TATAAAGCTTCTGCAATAGCTAAAAAGATCGTCGCCATGAAAGTTGCTGACCTACCCTGT
AAC------------------TAG
SNAP - Ds/Dn Calculation Tool
http://hcv.lanl.gov/content/sequence/SNAP/SNAP.html
Calculates synonymous and nonsynonymous substitution rates based on codon alignments, according to the method of Nei and Gojobori (1986).
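To make the counting behind this kind of calculation concrete, here is a minimal Python sketch in the spirit of the Nei and Gojobori approach. It is not SNAP's code. Assumptions made for illustration: the standard genetic code is used, changes to stop codons count as nonsynonymous when counting sites, mutation pathways that pass through a stop codon are skipped, and codons containing gaps or stops are ignored. All function names are hypothetical.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Number of potentially synonymous sites in one codon (0 to 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_diffs(c1, c2):
    """Average (synonymous, nonsynonymous) differences over mutation paths."""
    diff_pos = [p for p in range(3) if c1[p] != c2[p]]
    total_sd = total_nd = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        cur, sd, nd, valid = c1, 0, 0, True
        for p in order:
            nxt = cur[:p] + c2[p] + cur[p + 1:]
            if CODE[nxt] == "*":      # assumption: skip paths through stops
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if valid:
            total_sd += sd
            total_nd += nd
            n_paths += 1
    if n_paths == 0:
        return 0.0, 0.0
    return total_sd / n_paths, total_nd / n_paths


def ng86(seq1, seq2):
    """Return (S, N, Sd, Nd) for two aligned, in-frame codon sequences."""
    S = Sd = Nd = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue                  # skip gaps, ambiguity codes, stop codons
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = codon_diffs(c1, c2)
        Sd += sd
        Nd += nd
        n_codons += 1
    return S, 3 * n_codons - S, Sd, Nd


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences p < 0.75."""
    return -0.75 * log(1 - 4 * p / 3)
```

With these counts, the proportions are pS = Sd/S and pN = Nd/N, and the corrected rates are `jukes_cantor(pS)` and `jukes_cantor(pN)`. A ratio of nonsynonymous to synonymous rate above 1 is the usual indication of positive selection. Real tools may handle stop codons, gaps, and ambiguous bases differently, so values from this sketch need not match SNAP's output exactly.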
Input codon alignment
Select output statistics
SNAP - Ds/Dn Calculation Tool
Conclusion: We detect positive selection in six of the comparisons, as did Swanson and Vacquier (1998).
Distmat
http://emboss.bioinformatics.nl/cgi-bin/emboss/distmat
Distmat calculates the evolutionary distances between every pair of sequences in a multiple alignment.
The distances are expressed as the number of substitutions per 100 nucleotides or the number of replacements per 100 amino acids.
Distmat
• Feed the DNA alignment of the 18-kDa protein into distmat.
• Calculate separately the distances between the sequences for codon positions 1 and 2, and for codon position 3.
• Are the results in agreement with those from the dn/ds analysis?
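The comparison in this exercise can be sketched offline as well. The Python function below computes an uncorrected distance (differences per 100 compared sites) restricted to chosen codon positions. It is a simplified stand-in and not distmat itself: no multiple-substitution correction is applied, the alignment is assumed to start in frame, and the name `p_distance_per_100` is hypothetical.

```python
def p_distance_per_100(seq1, seq2, codon_positions=(1, 2, 3), gap="-"):
    """Uncorrected differences per 100 compared sites at chosen codon positions.

    Positions are 1-based within each codon. Columns with a gap in either
    sequence are skipped. Assumes the alignment starts in frame.
    """
    compared = mismatches = 0
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if (i % 3) + 1 not in codon_positions:
            continue
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared


# Toy pair differing only at third codon positions.
s1 = "ATGAGGTCT"
s2 = "ATGAGATCC"
print(p_distance_per_100(s1, s2, codon_positions=(1, 2)))
print(p_distance_per_100(s1, s2, codon_positions=(3,)))
```

Third codon positions are mostly synonymous, while first and second positions are mostly nonsynonymous, so a larger distance at positions 1 and 2 than at position 3 points in the same direction as a dn/ds ratio above 1.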
Anchored multiple-sequence alignment with DIALIGN
http://dialign.gobics.de/anchor/submission.php
User manual: http://dialign.gobics.de/anchor/manual
Align the following sequences (use the file dalign_sequences.txt):
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
Results
• DIALIGN builds alignments from fragments.
Results
• Numbers below the alignment reflect a rough degree of local similarity among the sequences.
Anchored alignment
• Now, let us assume that the user has some expert knowledge concerning a certain domain that is present in all the input sequences.
• The domains marked in red in the three sequences are thought to be homologous to one another.
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
• Therefore, the user wants to define this domain as an anchor and align the rest of the sequences automatically.
• To do so, a set of anchor points should be defined. Each anchor point corresponds to an equal-length segment pair involving two of the input sequences.
Each anchor point is specified by:
• first sequence involved
• second sequence involved
• start of anchor in first sequence
• start of anchor in second sequence
• length of anchor
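The five fields listed above can be assembled programmatically. The Python sketch below is an illustration only. It assumes 1-based sequence numbers and positions and a whitespace-separated line per anchor point; the exact file format expected by the server should be checked in the user manual and may require additional fields. The motif `APK` is an arbitrary example chosen because it occurs in seq1 and seq3; it is not the domain marked on the slide. Both function names are hypothetical.

```python
def anchor_line(seq_a, seq_b, start_a, start_b, length):
    """Format one anchor point as five whitespace-separated integers."""
    fields = (seq_a, seq_b, start_a, start_b, length)
    if any(not isinstance(f, int) or f < 1 for f in fields):
        raise ValueError("all fields must be positive integers")
    return " ".join(str(f) for f in fields)


def anchor_from_motif(seqs, a, b, motif):
    """Anchor the first occurrence of `motif` in sequences a and b (1-based)."""
    pos_a = seqs[a - 1].find(motif)
    pos_b = seqs[b - 1].find(motif)
    if pos_a < 0 or pos_b < 0:
        raise ValueError("motif not found in both sequences")
    # str.find is 0-based; the anchor positions are assumed to be 1-based.
    return anchor_line(a, b, pos_a + 1, pos_b + 1, len(motif))


seqs = [
    "WKKNADAPKRAMTSFMKAAY",
    "WNLDTNSPEEKQAYIQLAKDDRIRYD",
    "WRMDSNQKNPDSNNPKAAYNKGDANAPK",
]
print(anchor_from_motif(seqs, 1, 3, "APK"))
```

Since each anchor point involves exactly two sequences, anchoring a domain shared by three sequences takes one line per pair of sequences.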
Results
• The specified domain is aligned, and the remainder of the sequences is aligned automatically, respecting the constraints given by the anchor points:
Guidance/HoT
>seq1
WKKNADAPKRAMTSFMKAAY
>seq2
WNLDTNSPEEKQAYIQLAKDDRIRYD
>seq3
WRMDSNQKNPDSNNPKAAYNKGDANAPK
>seq4
WRMDSNQKNPNNPKAAYNKGDANAPK