BCB 444/544 Fall 07 Sept 21 Exam 1

BCB 444/544 - F07 Exam 1 (100 pts)    p 1 of 6

Name___________________________________

A. Databases & Literature Resources for Bioinformatics (10 pts TOTAL)

A1. (2pts) In your undergraduate research project, you have identified an especially interesting and, so far, unannotated gene in bacteria, which you have named "BCB1." Your experimental results demonstrate that BCB1 is an essential gene: mutations that knock out its function are lethal. You have a hunch it must be conserved among all life forms. To obtain support for this hypothesis, you would like to identify a homolog of this gene in humans. You log on to the BLAST page at NCBI and choose to run a basic protein BLAST search against only human proteins. However, you obtain no significant hits!!! How should you change your search parameters to increase your chances of detecting a potential human homolog?

A2. (1pt) Despite changing parameters as described above, you were unable to identify a putative human homolog. You decide to change your strategy and run a BLAST search against proteins from all organisms. Great! You find an extensive list of potential homologs across many forms of life -- but you still did not identify any potential homologs in human. As you sit in frustration, your thoughts drift back to your glory days in BCB 444/544 and you remember an alternative BLAST program that takes advantage of a profile or PSSM in an iterative search procedure, thus providing more sensitivity for detecting remote homologs. What is this specific BLAST program called?

A3. (2pts) You tap your foot and wait for your browser to refresh. You recall a few "suggestions" & "caveats" about effective use of PSI-BLAST ;) Hmmm… What is one "tip" for effective use of PSI-BLAST?

A4. (2pts) At last, there it is: you’ve found a significant "hit" in human! This seems like an excellent and fitting end to a long and exciting search!
You pat yourself on the back and are just about to go out to celebrate with a few beers, when your lab partner takes a look at the annotation for your putative human homolog and says: "Hey! I think you've been scooped! I saw a paper describing a human protein with the same annotation from Drena Dobbs's lab last year - it was in Science or Nature, I think -- or maybe it was in NAR, no - it was Proteins, maybe 2 years ago! You'd better check it out!" Aaargh… it is 2 AM & the library is closed… Which online resource would you use to find all papers published by Dobbs in biomedical journals during the past 5 years?

A5. (3pts) Darn! That Dobbs lab must have some amazing students! They did identify your gene in humans -- and actually found two very similar genes. They said one of them is the ortholog of the gene you found in bacteria and the other is actually a paralog. What is an ortholog and how does it differ from a paralog?

B. Dynamic Programming (20 pts TOTAL)

You think Dobbs made an error -- it looks like she confused the ortholog & paralog! A vital piece of evidence that could prove this is an optimal global pairwise alignment between your prokaryotic gene and each of the human homologs. You would love to prove Dobbs wrong, so despite the late hour, you decide to compare the two alignments (in the bar, where you are now drowning your sorrows, while surfing the web on your laptop). Aaaarrrgh! Your battery just died - and you left your charger in lab!! You must perform the alignment by hand. Demonstrate your prowess by reproducing a portion of that global alignment below.

B1. (8pts) Fill out the dynamic programming matrix for determining an optimal global alignment between the sequences TCG and TCCAG. Scoring: +5 for matches; -3 for mismatches and spaces.

              T     C     C     A     G

        T

        C

        G

B2. (2pts) Where is the score of the optimal alignment(s) located in the DP matrix? (Circle it)

B3. (4pts) There are 2 optimal alignments.
For full credit, draw both of them & show your traceback arrows.

B4. (4pts) You don't want to go home yet, so you decide it would be entertaining to set up a DP matrix for local alignment, using the BLOSUM62 matrix (attached to this Exam). But you were able to fill in only the first two rows before the bar closed. Show what you accomplished in the matrix below:

        0     T     C     C     A     G

        T

B5. (2pts) Walking home with a bit of a buzz, it occurs to you that the "rule" for initializing a DP matrix for global alignment - which can cause "end-gap" penalties to accumulate if sequences are of different lengths - would be a problem if you wanted to use global alignment to assemble a set of overlapping sequences into a single long sequence. How would you initialize a DP matrix to identify the region(s) of overlap between two long sequences (a & b), which are known to overlap, but each of which is expected to have some unique sequence on one end?

        a) -----------------------
                      |||||||||
        b)            -------------------------

C. PSSMs & PSI-BLAST (25 pts TOTAL)

C1. (10pts) PSSM matrix - The alignment of four DNA sequences is shown below.

        CAACTG
        CAGCTG
        CAGGTG
        CAGCTT

Which of the position-specific score matrices (PSSMs) shown above is most likely to be correct? Explain.

C2. (5pts) Briefly describe how the PAM and BLOSUM scoring matrices are derived and how they are different.

C3. (5pts) In evaluating the results of a database search using BLAST, why is it sometimes important to consider the bit score, S', instead of only the E-value?

C4. (2pts) In what sense is the Smith-Waterman (local alignment) DP algorithm better than BLAST?

C5. (3pts) Everything else being equal, when does BLAST produce a more significant E-value: when searching a database of size 500,000 or when searching a database of size 1,000,000? Explain your answer.

D. Dot Plots & Misc. (20 pts TOTAL)

D1.
Suppose we are given 2 DNA sequences, A and B. Draw a simple diagram of the dot plots that would result from the following comparisons. To receive full credit, be sure to label both axes.

a) (5pts) DNA sequence A is 1000 bp in length and is identical to sequence B, which is 800 bp in length, except that A has a single 200 bp segment duplicated near the 3' end (right end).

b) (5pts) Explain what the dot plot pattern shown below represents:

D2. (5pts) Which lab did you like best? Why?

D3. (5pts) (From Sean Eddy's paper - and discussed in lecture) Why is "dynamic programming" called that? What does the name mean? Why did Richard Bellman at RAND give it this name?

E. Molecular Biology & Bioinformatics Terms (20 pts TOTAL)

(1pt each) Fill in the box beside each definition with one term that corresponds to the definition provided.

        Term                Definition

E1.     ______________      Genes in different species that evolved from a common ancestral gene and have similar functions

E2.     ______________      A nucleotide or amino-acid sequence pattern that is often conserved and has, or is conjectured to have, functional significance

E3.     ______________      Observable characteristics of an organism

E4.     ______________      Process mediated by RNA polymerase in which information in DNA is copied into RNA

E5.     ______________      Sections of eukaryotic genes that are transcribed, but spliced out of mature mRNA

E6.     ______________      A type of substitution matrix that relies on an explicit evolutionary model and is based on observed differences in closely related proteins

E7.     ______________      A region of a DNA sequence that begins with a START codon and ends with a STOP codon

E8.     ______________      Software that uses progressive alignment heuristics to generate a multiple sequence alignment of related sequences

E9.     ______________      An n x m matrix of log-odds scores, derived from an MSA of related protein sequences, which can be used to represent a (gapless) sequence motif

E10.    ______________      A computational "shortcut" or "rule-of-thumb" that can dramatically shorten the "runtime" required to solve a problem, but cannot guarantee an optimal solution

(2pts each) Short answer: Answer each of the following questions (one phrase or sentence should be sufficient).

E11. What is RNA splicing?

E12. What is meant by 6-frame translation?

E13. What is an affine gap penalty?

E14. Why do we need/use heuristics for aligning sequences?

E15. What are 3 basic computational methods for sequence alignment?

F. The Question I Didn't Ask (5 pts TOTAL)

Describe something you have learned from your reading, lectures or labs that was not asked on this Exam and that you think is worth 5 pts!

Blosum62 matrix