BIO 224 Fall 2010 Dr. Tom Peavy
Lab Assignment 2 (due date Wed, Sept 22)

1. Go to the Dotlet website (http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html) for dotplot comparisons. First, go to the "about" and "new features" links to read more about the program.
A. Then, examine the "learn by example" link and go to the "repeated domains" example. Explain the pattern you see. (note: be sure to describe the characteristic pattern of a repeated domain within a dotplot)
B. Go to the "conserved domains" example, in which two proteins are compared. Explain the dotplot pattern.
C. Create a folder for yourself on the computer and download the FASTA file for the protein sequence you were assigned last week. Then enter the sequence into the Dotlet program to perform a dotplot comparison against itself (in essence, looking for repetitive domains within the protein). Explain the pattern you observe. (note: use the histogram slider bar to reduce noise; see the "about" link for a tutorial on how to do so)
D. Go back to NCBI (http://www.ncbi.nlm.nih.gov/) and search the Homologene database to find another protein sequence that is homologous to your human protein and comes from neither a rodent (e.g. rat) nor a primate (e.g. chimpanzee). The sequence can be as divergent as you want.
What species did you choose? ___________________
What is the accession number? ___________________
E. Download/save the homologous non-rodent/non-primate protein sequence and compare it to the human protein sequence above using Dotlet. Explain the pattern. (note whether a diagonal line is present and whether it is continuous or broken, using similar histogram criteria as above; is there any shift in the alignment? Why?)

2. You are going to perform a series of local pairwise sequence alignments using your human protein, the homologous mouse protein (found within Homologene in the last assignment), and the other non-rodent/non-primate protein sequence you used above in questions 1D and 1E.
A. For a local pairwise comparison, use the "bl2seq" local alignment program found at the NCBI site (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). Be sure to change the program to "blastp" at the top of the page, since the default setting is "blastn" (a nucleotide alignment program). Perform pairwise comparisons with your three proteins to provide the data in the tables below. Replace the question marks embedded in each table with the percent identity and percent similarity (place similarity in parentheses) for each pairwise comparison using the specified scoring matrix. You will need to click on the "Algorithm parameters" arrow at the bottom of the page to alter the scoring matrix.

Scoring Matrix = Blosum80
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

Scoring Matrix = Blosum62 (note this is the default value)
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

Scoring Matrix = Blosum45
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

B. Copy and paste one of your more interesting sequence alignments into this document. Why did you choose this one? (note: copy the alignment into a Word document using a universal font like Courier)
C. What trends do you note when comparing the different matrices for the local alignment comparisons? How do you explain this?

3. Then perform the same comparative alignments using a global alignment program found at the FASTA site (http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi).
You will need to select the box underneath the statement "Compare your own sequences" near the "(A) Program" specification area (which should be listed as "GGSEARCH: global protein:protein"). You can change the scoring matrix at the bottom of the page.
A. Once again, enter the percent identity and percent similarity (similarity in parentheses) as before.

Scoring Matrix = Blosum80
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

Scoring Matrix = Blosum62
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

Scoring Matrix = Blosum50 (note this is the default value)
                 Human protein    Mouse protein    Other protein
Human protein    100% (100%)      -------------    -------------
Mouse protein    ????             100% (100%)      -------------
Other protein    ????             ????             100% (100%)

B. For a visual comparison of sequence alignment programs, copy and paste the global sequence alignment for the same two species and BLOSUM matrix that you pasted in question 2B above. (note: copy the alignment into a Word document using a universal font like Courier) We will discuss these two alignments in question 5.
C. What trends do you note when comparing the different matrices for the global alignment comparisons? How do you explain this?

4. Using the tables you generated for the local and global alignments, compare the values for percent identity/similarity for similar scoring matrices (e.g. BLOSUM62 for local and global). Are the values identical or relatively close? Explain why or why not. What is the trend?

5. Compare your sequence alignments (local vs global) for the same species and scoring matrix. (e.g. Are there conserved regions? Are there variable or divergent regions? Are there gaps, and are they found in the same region? How do the local and global alignments differ? Why would they differ?)

6. Calculate the log-odds score for a PAM250 matrix for a position in the alignment in which a Histidine (H) remains a Histidine (H). (note: show your calculation; relevant information is in table 3-2 [pg 63 new edition; pg 53 old text] and figure 3.13 [pg 68 new edition; pg 57 old text])
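
As a hint for the calculation in question 6, here is a minimal sketch of the arithmetic, assuming the textbook defines the PAM log-odds score as 10 times the base-10 logarithm of the mutation probability (from the PAM250 mutation probability matrix) divided by the normalized frequency of the residue. The function name `log_odds_score` and its parameter names are hypothetical, chosen only for illustration. The inputs shown are values commonly quoted for tryptophan (W) in the Dayhoff model, not histidine; treat them as assumptions and confirm every number against table 3-2 and figure 3.13 before using it.

```python
import math


def log_odds_score(m_ij: float, f_i: float) -> float:
    """Return 10 * log10(m_ij / f_i).

    m_ij: probability (0-1) read from the PAM250 mutation probability matrix
    f_i:  normalized frequency (0-1) of the residue
    Both names are hypothetical; the formula is assumed to match the textbook.
    """
    if m_ij <= 0 or f_i <= 0:
        raise ValueError("probability and frequency must be positive")
    return 10 * math.log10(m_ij / f_i)


# Illustrative inputs only (assumed values for W staying W; check the textbook).
score = log_odds_score(m_ij=0.55, f_i=0.010)
print(round(score, 1))
```

For histidine, substitute the H-to-H entry from the figure and the frequency of H from the table, then round to the nearest whole number as scoring matrices usually do.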