Bio/CS – 251
Laboratory #6
Multiple Sequence Alignment
October 18, 2004
Objective: To examine MutS homologs and compare their amino acid sequences via a
multiple sequence alignment. After constructing the alignment we will look at regions
within the gene that appear to have been strongly conserved during the evolutionary
process. We will test our observations in an empirical manner. We will learn to access
tools that are available for making multiple sequence alignments.
Retrieving Amino Acid Sequences:
We will gather the amino acid sequences for human MSH3 (hMSH3), four of its paralogs, and
two of its orthologs. To do this we need to acquire the SwissProt Accession Numbers for
these genes. We can do this by visiting the SwissProt web site at
http://www.expasy.org/sprot/
However, to save time and to guarantee that we are all working with the same amino acid
sequences, we have located the Accession Numbers and placed them in the following table:
Gene                   SwissProt Accession #
MSH3 (Human)           P20585
MSH2 (Human)           P43246
MSH4 (Human)           O15457
MSH5 (Human)           O43196
MSH6 (Human)           P52701
MSH3 (Mus musculus)    P13705
MSH3 (Yeast)           P25336
Question 1: Which of the above sequences are paralogs and which are orthologs?
Question 2: Why might we consider doing multiple sequence alignments with paralogs
and orthologs? What evolutionary information might be gained from such alignments?
For the next part of the investigation we will follow the information in Claverie and
Notredame (BFD), p. 294.
Enter the URL
http://www.expasy.org/sprot/sprot-retrieve-list.html
in the address line of your browser.
a. On the Format line click the FASTA radio button.
b. Enter the accession numbers for the MutS paralogs in the Sequence window
of the page.
c. After entering these numbers, click the Create FTP file button.
Question 3: Copy and paste all of the sequences that are generated into the space below.
You may also want to paste this information into a Notepad file. You will be using this
data shortly to obtain your alignment.
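Before pasting the Question 3 output into ClustalW, it can help to confirm that every sequence came through intact and untruncated. The Python sketch below is a minimal FASTA reader, assuming each record starts with a `>` header line and that the first whitespace-delimited word of the header is a usable name. The function name `parse_fasta` and the two toy records are our own illustration; the toy sequences are made up and are not real MutS data.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Return {name: sequence} for every record in a multi-FASTA string."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines left over from copy-and-paste
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            # use the first word of the header, or a fallback name if it is empty
            name = parts[0] if parts else f"record_{len(records) + 1}"
            chunks = []
        else:
            chunks.append(line.replace(" ", ""))
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Toy input: hypothetical records, not real MutS sequences
example = """>TOY_1 hypothetical protein one
MKVLIC
AAGG
>TOY_2 hypothetical protein two
MRALQV
"""

for rec_name, seq in parse_fasta(example).items():
    print(rec_name, len(seq))  # name and length of each record
```

Comparing the printed lengths against the lengths listed on the SwissProt entries is a quick way to spot a sequence that was cut off while pasting.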
Question 4: Repeat the above procedure for the three orthologs given in the table. If you are
also creating Notepad files, create a separate file for this result.
The data that you have collected is now ready to be fed into ClustalW, the program that
will perform the multiple sequence alignment of our MutS homologs.
Question 5: Before beginning the multiple sequence alignment, which of the two groups
(paralogs or orthologs) are the more functionally constrained? Give the reasons for your
choice.
For this part of our investigation we will be following the material in Claverie and
Notredame (BFD), pp. 296–300.
Enter the URL:
http://www.ebi.ac.uk/clustalw
in the address window of your browser. You are presented with a fairly elaborate page
with several options that can be set. Don’t panic (yet); we will be changing only a few of
these from their default settings. In the meantime, scroll down to the Sequence window.
Question 6: Paste the MutS paralog sequences that you collected in Question 3 into the
Sequence window. After doing this:
a. Choose Full from the Alignment pull-down menu.
b. Choose aln w/numbers from the Output Format menu.
c. Choose Input from the Output Order menu.
d. Click on the Run button at the bottom of the page
e. Review the output and make sure that the Alignment Section appears in the
center of the output. This is important for the rest of our investigation.
f. Save the web page to the Laboratory 6 section of your H drive. Do not close
the page.
On page 305 of your Lab Manual is an explanation of the markings that appear below
each line of the multiple alignment. We review them here. The markings are a star (*), a
colon (:) and a period (.). Their meanings are as follows:
1. (*) The column is conserved for all of the sequences in the multiple sequence
alignment.
2. (:) All residues in the column have roughly the same size and the same
hydropathy, i.e., the position is functionally constrained.
3. (.) The size or hydropathy has been preserved in the course of evolution.
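The lab manual describes the colon and period in terms of size and hydropathy. As far as we know, ClustalW actually assigns them by checking whether every residue in a column falls inside one of a fixed set of "strong" or "weak" residue groups. The sketch below uses the groups we understand the Clustal documentation to list; treat the group lists, and the choice to leave gap-containing columns blank, as assumptions to check against the help page of the version you are using. The function name `conservation_symbol` is our own.

```python
from typing import List

# Residue groups as we understand them from the Clustal documentation (assumption)
STRONG: List[str] = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK: List[str] = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
                   "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column: str) -> str:
    """Return '*', ':', '.', or ' ' for one alignment column (one residue per sequence)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assume columns containing a gap get no symbol
    if len(residues) == 1:
        return "*"  # identical in every sequence
    if any(residues <= set(group) for group in STRONG):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."  # all residues fall in one weak group
    return " "


for col in ["VVVV", "MILV", "ATV", "VCILCMI"]:
    print(col, repr(conservation_symbol(col)))
```

Note that under these groups a column such as V, C, I, L, C, M, I receives no symbol at all, because cysteine does not share a group with the other four residues.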
Your overall goal in a multiple sequence alignment is to identify important positions, in
particular the amino acids that have not mutated or that are functionally constrained. A
good block for starting such an investigation has, for every 10–30 amino acids, at least
one to three stars, five to seven colons, and a few periods sprinkled about. Such a block
may extend over more than one line of the displayed alignment and may be over 100 amino
acids long.
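The rule of thumb above can be turned into a rough screen. The sketch below assumes you have already concatenated the ClustalW symbol lines (the rows of *, : and . under each block, with the leading name-column padding removed) into one string whose index i corresponds to alignment column i + 1. It flags every 20-column window with at least one star and at least five colons and merges overlapping windows into regions. The window size, the thresholds, and the function name `candidate_blocks` are our own reading of the guideline, not fixed values.

```python
from typing import List, Tuple


def candidate_blocks(consensus: str, window: int = 20,
                     min_stars: int = 1, min_colons: int = 5) -> List[Tuple[int, int]]:
    """Return merged 1-based (start, end) column ranges whose windows meet the thresholds."""
    hits: List[Tuple[int, int]] = []
    for start in range(max(len(consensus) - window + 1, 0)):
        chunk = consensus[start:start + window]
        if chunk.count("*") >= min_stars and chunk.count(":") >= min_colons:
            hits.append((start + 1, start + window))
    # merge overlapping windows into contiguous candidate regions
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Toy symbol line: one star and five colons in the middle of empty columns
toy = " " * 30 + "*:::::" + " " * 30
print(candidate_blocks(toy))
```

Loosening `min_stars` to 0 is one way to "relax the criterion on stars" when a more distant sequence is added later in the lab.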
Question 7: Identify the region(s) of your alignment that contain several of what we have
defined as important positions. Give the approximate locations of these regions relative
to the hMSH3 sequence.
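Reporting a region "relative to the hMSH3 sequence" means converting alignment columns into hMSH3 residue numbers, and gap characters (-) occupy columns without being residues. The sketch below shows one way to do that conversion in both directions, assuming you have the gapped hMSH3 row of the alignment as a single string. The function names are hypothetical helpers for illustration, and the toy row is made up.

```python
from typing import Optional


def column_to_residue(aligned_seq: str, column: int) -> Optional[int]:
    """Map a 1-based alignment column to a 1-based residue number (None at a gap)."""
    if aligned_seq[column - 1] == "-":
        return None
    return sum(1 for ch in aligned_seq[:column] if ch != "-")


def residue_to_column(aligned_seq: str, residue: int) -> int:
    """Map a 1-based residue number to its 1-based alignment column."""
    count = 0
    for col, ch in enumerate(aligned_seq, start=1):
        if ch != "-":
            count += 1
            if count == residue:
                return col
    raise ValueError(f"sequence has fewer than {residue} residues")


# Toy gapped row (hypothetical, not the real hMSH3 row)
row = "MK--VLIC"
print(column_to_residue(row, 5))   # column 5 holds the third residue, V
print(residue_to_column(row, 6))   # the sixth residue, C, sits in column 8
```

For a region spanning several columns, convert its first and last non-gap columns to get the hMSH3 residue range.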
Question 8: Is any one of these regions more promising than the others, i.e., does it seem to
have a higher percentage of the so-called important positions?
Open the JalView portion of the ClustalW results. This is actually a Java applet that runs
on your computer. It is used for editing the alignment generated by ClustalW. We are not
planning to do that now; our purpose is just to compare its presentation to the ClustalW
results.
Question 9: What is shown in the graph below the sequence alignment in JalView?
How does this information compare to your answers to Questions 7 and 8 above?
Our final observation concerns the guide tree, or cladogram, shown at the end of the
ClustalW page. DO NOT CONFUSE THIS WITH A PHYLOGENETIC TREE. The tree
shown here merely indicates the order in which ClustalW compared the sequences,
taking the most similar sequences first and then adding in the others.
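The idea of "most similar first, then add the others" can be illustrated with a small clustering sketch. To our knowledge ClustalW builds its default guide tree by neighbor-joining on distances from pairwise alignments; the sketch below swaps in the simpler UPGMA (average-linkage) method and, for brevity, computes p-distances from sequences that are already aligned. It only reports the merge order. The function names and the four toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma_join_order(aligned: Dict[str, str]) -> List[Tuple[str, str]]:
    """Repeatedly merge the two closest clusters (average linkage); return the merge order."""
    size: Dict[str, int] = {name: 1 for name in aligned}
    dist: Dict[FrozenSet[str], float] = {
        frozenset((a, b)): p_distance(aligned[a], aligned[b])
        for a, b in combinations(aligned, 2)
    }
    order: List[Tuple[str, str]] = []
    while len(size) > 1:
        a, b = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({a},{b})"
        for c in size:
            if c not in (a, b):
                # average linkage: weight each merged side by its cluster size
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        order.append((a, b))
    return order


# Toy aligned sequences (made up for illustration)
toy = {"A": "MKVLIC", "B": "MKVLIV", "C": "MRALQV", "D": "WRAGQT"}
for step, pair in enumerate(upgma_join_order(toy), start=1):
    print(step, pair)
```

Because neighbor-joining and UPGMA can disagree, the order printed by this sketch is not guaranteed to match the tree ClustalW shows for the same sequences.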
Question 10: In what order were the sequences added to the comparison? (Start with the
sequences most closely related to hMSH3 and end with those that are least closely related.)
Save this web page to the Lab6 folder in your Bioinformatics folder in your H drive as
ClustalW1.
Now we will repeat the ClustalW process for the three orthologs of MSH3. Add the
three sequences to the Sequence window and choose the same options that you chose for
the alignment of paralogs.
Question 11: Using the position numbers from hMSH3, which regions in this alignment
appear to be strongly conserved?
Question 12: Which of your two multiple sequence alignments seems to be more strongly
aligned?
Save this web page to the Lab6 folder in your Bioinformatics folder in your H drive as
ClustalW2.
If our goal is to find the strongly conserved regions within the sequences, then it does not
make sense to deal with the paralogs and orthologs separately. Return to the ClustalW
home page and once again paste the sequences for the five human paralogs into the Sequence
window, then add the two orthologs (Mus musculus and Yeast) to these sequences.
Now, using the same options that you used in your first two runs, press the Run
button. This will generate a third multiple sequence alignment. Save this web page in
your Lab6 folder as ClustalW3.
Question 13: Using the numbering scheme for the hMSH3 gene, identify the strongly
conserved regions of this alignment.
Question 14: What does your answer to Question 13 say about the relative rates of
evolution among the orthologs versus among the paralogs? Briefly explain your
reasoning.
Finally, we can test the evolutionary strength of the region(s) you have identified. We
will test our alignment against a sequence that is even more distantly related to our
hMSH3 sequence. Return to SwissProt to find such a sequence. We will follow the
procedure laid out earlier in Chapter 9 of our lab manual, pp. 290–295. We begin by
BLASTing the hMSH3 sequence.
Enter the URL
http://www.expasy.org/cgi-bin/BLASTEMBnet-CH.pl
When the ExPASy BLAST page appears:
a. Enter the Accession Number P20585 in the box that is provided.
b. If it is not highlighted (it probably is) click on the blastp radio button
c. Click on the check box “exclude fragment sequences”
d. Scroll down to the Options section and set both the number of best-scoring
sequences and the number of best alignments to 1000.
e. Set the E threshold to 0.1.
f. Click the Run BLAST button.
g. Click NiceBlastView when the next screen appears.
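For reference when reading the E threshold in step e and the e-values below: in the Karlin-Altschul statistics used by BLAST, the E-value of a hit is the number of alignments with at least that score expected by chance in a search of this size. With raw score $S$, query length $m$, total database length $n$, and scoring-system constants $K$ and $\lambda$ (or equivalently the bit score $S'$):

$$
E = K\,m\,n\,e^{-\lambda S} = m\,n\,2^{-S'}
$$

Setting the threshold to 0.1 therefore keeps only hits expected to occur by chance fewer than 0.1 times, and each extra bit of score halves the E-value. BLAST applies length corrections to $m$ and $n$, so treat this as the underlying relationship rather than an exact recipe for reproducing the numbers on the results page.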
This will generate a very long list of information. Scroll down the list until you reach the
lower-valued scores, say around 100–110. These should have e-values in the 10^-20 to
10^-30 range. Choose a sequence from a non-human organism that is similar along the full
length of hMSH3 and that has at least 800 amino acids. Check the box to the left of its score.
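The selection rules in the paragraph above amount to a filter over the BLAST hit list. The sketch below applies them to a few hand-entered records; the `Hit` fields, the placeholder IDs, and the "full length" cutoff of 90% query coverage are all assumptions for illustration, since the results page does not hand you these records in this form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    accession: str         # placeholder IDs below, not real database entries
    organism: str
    score: float           # BLAST score as listed on the results page
    evalue: float
    length: int            # length of the database sequence in amino acids
    query_coverage: float  # fraction of hMSH3 covered by the alignment (0 to 1)


def choose_candidates(hits: List[Hit],
                      score_range=(100.0, 110.0),
                      evalue_range=(1e-30, 1e-20),
                      min_length: int = 800,
                      min_coverage: float = 0.9,  # assumed meaning of "full range"
                      exclude_organism: str = "Homo sapiens") -> List[Hit]:
    """Keep non-human hits in the score and e-value window that are long and full-length."""
    lo_s, hi_s = score_range
    lo_e, hi_e = evalue_range
    return [
        h for h in hits
        if lo_s <= h.score <= hi_s
        and lo_e <= h.evalue <= hi_e
        and h.length >= min_length
        and h.query_coverage >= min_coverage
        and h.organism != exclude_organism
    ]


# Hypothetical hits, not real BLAST output
hits = [
    Hit("HIT_A", "Homo sapiens", 105, 1e-25, 1100, 0.95),
    Hit("HIT_B", "Toy organism", 104, 1e-24, 900, 0.95),
    Hit("HIT_C", "Toy organism", 106, 1e-25, 500, 0.95),
    Hit("HIT_D", "Toy organism", 300, 1e-80, 1100, 0.99),
]
print([h.accession for h in choose_candidates(hits)])  # ['HIT_B']
```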
Question 15: Which sequence did you choose? What is the e-value of the sequence
comparison of this sequence with hMSH3?
In the pull-down menu at the top of the page, next to the phrase Send selected sequences to:,
choose Retrieve sequences (FASTA format) and click Submit.
Question 16: Paste your result here:
Question 17: Return to ClustalW, add this sequence to the other seven sequences, and run
ClustalW again.
Question 18: What can be gained from comparing this sequence, which is rather
distantly related to hMSH3 in terms of score and e-value, with the other seven aligned
sequences?
Question 19: Are there any regions that seem to be functionally conserved? (You may
have to relax your criterion on stars (*) a bit.) Identify the region(s) using the position
numbers from hMSH3.
For Homework
Return to your third alignment, which you saved as ClustalW3. Open the JalView window
and look at the color coding of the sequence alignment and also the graph below the
sequences. Scroll toward the end of the alignment. Notice that the numbering goes
beyond the sequence alignment numbering that appears on the main ClustalW
page.
Question 20: Why is there a difference in the numbering?
Around position 1130 in the JalView presentation of the alignment is a column that is
highlighted in blue. Reading top to bottom, it contains V, C, I, L, C, M, I. We want to
consider this set of amino acids. Please be aware that the JalView display may be temporary
(ClustalW only keeps your results for 24 hours). Therefore, we should locate this column
in the alignment section of the ClustalW web page that we saved as ClustalW3.
Question 21: What is the number of the ClustalW alignment column that corresponds to
the JalView column containing V, C, I, L, C, M, I?
Question 22: What is the most direct and simple route taken by natural selection to
install these hydrophobic, non-polar amino acids at this location in each gene? In other
words, determine the most likely (most parsimonious) pathway of codon
substitutions (minimum number of nucleotide substitutions) that would interconvert
methionine, leucine, isoleucine, valine, and cysteine. Build a pathway, starting from
one of these amino acids, that shows how each of the other four could be obtained by a
minimum number of changes. From this analysis, which amino acid(s), and which
codon, is more likely to be ancestral, i.e., which amino acid and codon was more likely to
reside at this position in the common ancestor of each of these genes?
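A small calculator can help check the step counts in your pathway. The sketch below builds the standard genetic code (NCBI translation table 1, written with T rather than U) and reports, for each pair of amino acids, the minimum number of single-nucleotide changes between any codon of one and any codon of the other, plus the codon pairs that achieve it. It only counts substitutions; it ignores codon usage and transition/transversion bias, so building the pathway and judging the ancestral codon is still up to you. The function names are our own.

```python
from itertools import product
from typing import Dict, List, Tuple

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base varies slowest)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa: str) -> List[str]:
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, residue in CODE.items() if residue == aa]


def hamming(c1: str, c2: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(c1, c2))


def min_nt_changes(aa1: str, aa2: str) -> int:
    """Fewest nucleotide substitutions converting some codon of aa1 into some codon of aa2."""
    return min(hamming(c1, c2) for c1 in codons_for(aa1) for c2 in codons_for(aa2))


def closest_codon_pairs(aa1: str, aa2: str) -> List[Tuple[str, str]]:
    """Codon pairs that achieve the minimum distance."""
    d = min_nt_changes(aa1, aa2)
    return [(c1, c2) for c1 in codons_for(aa1) for c2 in codons_for(aa2)
            if hamming(c1, c2) == d]


if __name__ == "__main__":
    residues = "MLIVC"
    print("   " + "  ".join(residues))
    for a in residues:
        print(a + "  " + "  ".join(str(min_nt_changes(a, b)) for b in residues))
```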