Bio/CS – 251
Laboratory #8
Multiple Sequence Alignment
March 29, 2004
Objective: To examine mutS/hMSH2 homologs and compare their amino acid sequences
via a multiple sequence alignment. After constructing the alignment we will look at
regions within the gene that appear to have been strongly conserved during the
evolutionary process. We will test our observations in an empirical manner. We will
learn to access tools that are available for making multiple sequence alignments.
NOTE: For this lab, check that your browser has Java 1.5 (or higher) enabled. JalView
will not work with Java versions earlier than 1.5.
Retrieving Amino Acid Sequences:
We will gather the amino acid sequences for four paralogs of the human hMSH2 gene
and two orthologs from other species. In order to do this we need to acquire the
SwissProt Accession Numbers for these genes. We can do this by visiting the SwissProt
web site at
http://www.expasy.org/sprot/
However, to save time and to guarantee that we are all working with the same amino
acid sequences, we have located the Accession Numbers and placed them in the
following table:
Gene                    SwissProt Accession #
MSH3 (Human)            P20585
MSH2 (Human)            P43246
MSH4 (Human)            O15457
MSH5 (Human)            O43196
MSH6 (Human)            P52701
MSH3 (Mus musculus)     P13705
MSH3 (Yeast)            P25336
Question 1: Which of the above sequences are paralogs and which are orthologs?
Question 2: Why might we consider doing multiple sequence alignments with paralogs
and orthologs? What evolutionary information might be gained from such alignments?
For the next part of the investigation we will follow the information in Claverie and
Notredame (BFD), p. 294.
Enter the URL
http://www.expasy.org/sprot/sprot-retrieve-list.html
in the address line of your browser.
a. On the Format line click the FASTA radio button.
b. Enter the accession numbers for the mutS paralogs in the Sequence window of
the page
c. After entering these numbers, click the Create FTP file button
Question 3: Copy and paste all of the sequences that are generated into the space below.
Make sure to include the header line that begins with the symbol ‘>’. This is an essential
part of the FASTA-formatted sequence file. You may also want to paste this information
into a NotePad file. You will be using this data shortly to obtain your alignment.
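To see why the header line matters, here is a minimal sketch of how a program might split FASTA-formatted text into records. The function name parse_fasta and the toy records are assumptions made for illustration only; they are not part of the ExPASy or ClustalW tools.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            # Without a '>' line there is no way to tell where a record starts.
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input: these headers and residues are made up for illustration.
example = """>seqA hypothetical record
MSRRKPAS
GGLAAS
>seqB hypothetical record
MAVQPKET
"""

for header, sequence in parse_fasta(example):
    print(header.split()[0], len(sequence))
```

If a header line is dropped while pasting, the residues of that sequence are silently appended to the previous record, which is why each sequence must keep its own '>' line.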
Question 4: Repeat the above procedure for the 3 orthologs given in the table. If you are
also creating NotePad files, create a separate file for this result.
The data that you have collected are now ready to be fed into the ClustalW program that
will do the multiple sequence alignment of our mutS genes.
Question 5: Before beginning the multiple sequence alignment, which of the two groups
(paralogs or orthologs) do you expect to be the more functionally constrained? Give
the reasons for your choice.
For this part of our investigation we will be following the material in Claverie &
Notredame (BFD), pp. 296 – 300.
Enter the URL:
http://www.ebi.ac.uk/clustalw
in the address window of your browser. You are presented with a fairly elaborate page
with several options that can be set. Don’t panic (yet): we will be changing only a few
of these from the default settings. For now, scroll down to the Sequence
window.
Question 6: Copy and paste the sequences for the mutS paralogs from Question 3 above
into the Sequence window. Make sure to include the header line with each sequence.
After doing this:
a. Choose Full from the Alignment pull-down menu.
b. Choose aln w/numbers from the Output Format menu.
c. Choose Input from the Output Order menu.
d. Click on the Run button at the bottom of the page
e. Review the output and make sure that the Alignment Section appears in the
center of the output. This is important for the rest of our investigation.
f. Save the web page to the Laboratory 6 section of your H drive. Do not
close the page.
On page 305 of your lab manual is an explanation of the markings that appear below each
line of the multiple alignment. We review them here. The markings are a star (*), a
colon (:) and a period (.). Their meanings are as follows:
1. (*) The column is conserved for all of the sequences in the multiple sequence
alignment.
2. (:) All amino acid residues in the column have roughly the same size and the
same hydropathy, i.e., they appear to be functionally constrained.
3. (.) The size or the hydropathy (but not both) was preserved in the course of
evolution.
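The three markings can be reproduced with a short sketch. The residue groups below are the "strong" and "weak" groups as commonly documented for ClustalW; treat them as an assumption and check them against the documentation of the Clustal version you are using. The function names and the toy alignment are made up for illustration.

```python
# Residue groups as commonly documented for ClustalW (assumption: verify
# against your Clustal version before relying on them).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return " "  # a column containing a gap gets no marking
    if len(residues) == 1:
        return "*"  # identical residue in every sequence
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall inside one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall inside one weak group
    return " "


def consensus_line(aligned):
    """Build the marking line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned))


# Toy alignment, made up for illustration.
toy = ["MKVSA",
       "MRISC",
       "MKL-S"]
for row in toy:
    print(row)
print(consensus_line(toy))
```

The sketch treats every sequence equally and ignores sequence weighting, so it is only meant to show how a column is classified, not to replace the ClustalW output.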
Your overall goal in a multiple sequence alignment is to identify important positions. In
particular you want to find the amino acids that have not mutated or are functionally
constrained. A good starting point for such an investigation is a block that has
at least one to three stars, five to seven colons, and a few periods sprinkled about for every
10 – 30 amino acids. Such a block may extend over more than one line of the displayed
alignment and may be over 100 amino acids long.
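The rule of thumb above can be turned into a simple sliding-window scan over the marking line. This is a minimal sketch: the function name candidate_blocks, the default window of 30 columns, and the thresholds are assumptions chosen to mirror the guideline, and the positions it reports are alignment columns, not hMSH3 residue numbers.

```python
def candidate_blocks(consensus, window=30, min_stars=1, min_colons=5):
    """Return merged (start, end) column ranges, 1-based and inclusive,
    in which some window meets the star and colon thresholds."""
    hits = []
    for start in range(max(len(consensus) - window + 1, 0)):
        segment = consensus[start:start + window]
        if segment.count("*") >= min_stars and segment.count(":") >= min_colons:
            hits.append((start + 1, start + window))

    # Merge overlapping or adjacent windows into longer blocks.
    merged = []
    for begin, end in hits:
        if merged and begin <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((begin, end))
    return merged


# Toy marking line, made up for illustration: one small conserved patch.
toy_consensus = " " * 40 + "*:::::" + " " * 40
print(candidate_blocks(toy_consensus, window=10))
```

To scan a real result, join the marking lines from all of the alignment blocks into one string first, taking care to keep the leading spaces so the columns stay in register.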
Question 7: Identify the conserved region(s) in your alignment. Give the approximate
locations of these regions relative to the hMSH3 sequence.
Question 8: Is any one of these regions more promising than the others, i.e., does it
seem to have a higher percentage of the so-called important or conserved positions?
Open the JalView portion of the ClustalW results. This is actually a Java Applet that is
running on your computer. It is used for editing the alignment generated by ClustalW.
We are not planning to do that now. Our purpose is just to compare its presentation to
the ClustalW results.
Question 9: What is shown in the graph below the sequence alignment in JalView?
How does this information compare to your answers to Questions 7 and 8 above?
Our final observation concerns the Guide Tree or Cladogram shown at the end of the
ClustalW page. DO NOT CONFUSE THIS WITH A PHYLOGENETIC TREE. The
tree shown here merely indicates the order in which ClustalW compared the sequences by
taking the two most similar sequences first and then adding in the others.
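As a rough illustration of "most similar first", the sketch below ranks pairs of aligned sequences by simple percent identity. This is not how ClustalW builds its guide tree (it works from its own pairwise alignment scores followed by a clustering step); the function names and toy sequences are assumptions made for illustration.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def rank_pairs(aligned):
    """aligned maps a name to its aligned sequence.
    Returns (identity, name1, name2) tuples, most similar pair first."""
    scored = [(percent_identity(aligned[p], aligned[q]), p, q)
              for p, q in combinations(sorted(aligned), 2)]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Toy aligned sequences, made up for illustration.
toy = {"A": "MKVLA", "B": "MKVLS", "C": "MRI-S"}
for identity, first, second in rank_pairs(toy):
    print(first, second, round(identity, 1))
```

The pair at the top of the list is the one a progressive aligner would be expected to join first; the remaining sequences are then added in order of decreasing similarity.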
Question 10: In what order were the sequences added to the comparison? (Start with the
sequences most closely related to hMSH3 and then add those that are less closely related.)
Save this web page to the Lab6 folder in your Bioinformatics folder in your H drive as
ClustalW1.
Now we will repeat the ClustalW process for the three orthologs of MSH3. Add the three
sequences to the Sequence window and choose the same options that you chose for the
alignment of paralogs.
Question 11: Using the location numbers from hMSH3, which regions in this alignment
appear to be strongly conserved?
Question 12: Which of your two multiple sequence alignments seems to be more strongly
aligned?
Save this web page to the Lab 6 folder in your Bioinformatics folder in your H drive as
ClustalW2.
If our goal is to find the strongly conserved regions within the proteins then it does not
make sense to deal with the paralogs and orthologs separately. Return to the ClustalW
home page and once again paste the sequences for the five paralogs into the Sequence
window and then add the two orthologs to these sequences. Now, using the same options
as in your first two runs, press the Run button. This will generate a third multiple
sequence alignment for all 7 protein sequences. Save this web page in your Lab 6 folder
as ClustalW3.
Question 13: Using the numbering scheme for the hMSH3 gene, identify the strongly
conserved regions of this alignment.
Question 14: What does your observation in your answer to Question 13 say about the
relative rates of evolution between the orthologs vs that between the paralogs? Briefly
explain your reasoning.
Finally, we can test the strength of evolutionary conservation in the region(s) you have
identified. To do this, we will test our alignment against a sequence that is even more
distantly related to the human hMSH3 sequence. Return to SwissProt to find such a
sequence. We will follow the procedure laid out earlier in Chapter 9 of our lab manual,
pp. 290 – 295. We begin by BLASTing the hMSH3 gene.
Enter the URL
http://www.expasy.org/cgi-bin/BLASTEMBnet-CH.pl
After the ExPASy page appears:
a. Enter the Accession Number P20585 (for hMSH3) in the box that is provided.
b. If it is not highlighted (it probably is) click on the blastp radio button.
c. Click on the check box “exclude fragment sequences”.
d. Slide down to the Options section and set the number of best scoring
sequences and best alignments to 1000.
e. Set the E-value threshold to 0.1
f. Click the Run BLAST button
g. Click NiceBlastView when the next screen appears.
This will generate a very long list of hits. Scroll down the list until you reach the
lower Scores, say around 100 – 110. These hits should have e-values in the 10^-20 to 10^-30
range. Choose a sequence from a non-human organism that is similar along the full length of
hMSH3 and that has at least 800 amino acids. Check the box to the left of the score.
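The selection criteria above can be written down as a small filter. This is only a sketch: the Hit record, its field names, and the example hits are hypothetical and do not correspond to the actual layout of the NiceBlastView page. Whether a hit is similar along the full length of hMSH3 still has to be judged by eye from the alignment display.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit (fields chosen for illustration)."""
    accession: str
    organism: str
    score: float
    e_value: float
    length: int


def pick_distant_homologs(hits, min_score=100, max_score=110, min_length=800):
    """Keep non-human hits in the score band that are at least min_length residues long."""
    return [hit for hit in hits
            if min_score <= hit.score <= max_score
            and hit.length >= min_length
            and hit.organism.lower() != "homo sapiens"]


# Made-up hits; the accessions and values are not real BLAST output.
example_hits = [
    Hit("HYP001", "Homo sapiens", 105.0, 1e-22, 1137),
    Hit("HYP002", "Hypothetical organism A", 104.0, 3e-22, 870),
    Hit("HYP003", "Hypothetical organism B", 108.0, 2e-23, 412),
    Hit("HYP004", "Hypothetical organism C", 350.0, 1e-98, 1050),
]

for hit in pick_distant_homologs(example_hits):
    print(hit.accession, hit.score, hit.e_value, hit.length)
```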
Question 15: Which sequence did you choose? What is the e-value of the sequence
comparison of this sequence with hMSH3?
In the pull-down menu at the top of the page, next to the phrase “Send Selected
Sequences to:”, choose Retrieve Sequences (FASTA format) and click Submit.
Question 16: Paste your result here:
Question 17: Return to ClustalW and add this sequence to the other 7 sequences and run
ClustalW again.
Question 18: What can be gained from comparing this sequence, which is rather
distantly related to MSH3 in terms of score and e-value, with the other 7 aligned
sequences?
Question 19: Are there any regions that seem to be functionally conserved? (You may
have to relax your criterion on *’s a bit.) Identify the region(s) using the ID numbers
from hMSH3.
For Homework
Return to your third alignment that you saved as ClustalW3. Open the JalView window
and look at the color coding of the sequence alignment and also the graph below the
sequences. Move towards the end of the alignment. Notice that the numbering goes
beyond that of the sequence alignment numbering that appears on the main ClustalW
page.
Question 20: Why is there a difference in the numbering?
Around position 1130 in the JalView presentation of the alignment is a column that is
highlighted in blue. It reads, top to bottom, V, C, I, L, C, M, I. We want to consider this
sequence of amino acids. Please be aware that the JalView display may be temporary
(ClustalW only keeps your results for 24 hours). Therefore, we should locate this column
in the alignment section of the ClustalW web page that we saved as ClustalW3.
Question 21: What is the number of the ClustalW alignment column that corresponds to
the JalView column containing V, C, I, L, C, M, I?
Question 22: What is the most direct and simple route taken by natural selection to
install these hydrophobic, non-polar amino acids at this location in each gene? In other
words, determine the most likely pathway (most parsimonious pathway) of codon
substitutions (minimum number of nucleotide substitutions) that would interconvert
Methionine, Leucine, Isoleucine, Valine, and Cysteine. Build a pathway, starting from
one of these amino acids, that shows how each of the other four could be obtained by a
minimum number of changes. From this analysis, which amino acid(s), and which
codon, are most likely to be ancestral, i.e., which amino acid and codon were most likely to
reside at this position in the common ancestor of each of these genes?
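To check a pathway you have built by hand, the following sketch counts the minimum number of nucleotide substitutions between the codons of any two of these five amino acids. The codons are taken from the standard genetic code, written in the DNA alphabet (T rather than U); the function names are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, restricted to the five amino acids in this column.
CODONS = {
    "Met": ["ATG"],
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ile": ["ATT", "ATC", "ATA"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
    "Cys": ["TGT", "TGC"],
}


def codon_distance(codon1, codon2):
    """Number of positions at which two codons differ."""
    return sum(1 for x, y in zip(codon1, codon2) if x != y)


def closest_codons(aa1, aa2):
    """Return (minimum substitutions, codon pairs that achieve that minimum)."""
    all_pairs = list(product(CODONS[aa1], CODONS[aa2]))
    best = min(codon_distance(a, b) for a, b in all_pairs)
    return best, [(a, b) for a, b in all_pairs if codon_distance(a, b) == best]


# Example call for one pair; repeat for the other pairs to build the pathway.
steps, pairs = closest_codons("Met", "Val")
print(steps, pairs)
```

Calling closest_codons for every pair of amino acids gives the table of minimum changes from which a most parsimonious pathway can be assembled.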