BCB 444/544 - F07 Exam 1 (100 pts) Name___________________________________

BCB 444/544 Fall 07 Sept 21 Exam 1
A. Databases & Literature Resources for Bioinformatics (10 pts TOTAL)
A1. (2pts) In your undergraduate research project, you have identified an especially interesting and, so far,
unannotated gene in bacteria, which you have named "BCB1." Your experimental results demonstrate that BCB1 is
an essential gene: mutations that knock out its function are lethal. You have a hunch it must be conserved among all
life forms. To obtain support for this hypothesis, you would like to identify a homolog of this gene in humans.
You log on to the BLAST page at NCBI and choose to run a basic protein BLAST search against only human
proteins. However, you obtain no significant hits!!! How should you change your search parameters to
increase your chances of detecting a potential human homolog?
A2. (1pt) Despite changing parameters as described above, you were unable to identify a putative human homolog.
You decide to change your strategy and run a BLAST search against proteins from all organisms. Great! You find
an extensive list of potential homologs across many forms of life -- but you still did not identify any potential
homologs in human. As you sit in frustration your thoughts drift back to your glory days in BCB 444/544 and you
remember an alternative BLAST program that takes advantage of a profile or PSSM in an iterative search
procedure, thus providing more sensitivity for detecting remote homologs. What is this specific BLAST program
called?
A3. (2pts) You tap your foot and wait for your browser to refresh. You recall a few "suggestions" & "caveats" about
effective use of PSI-BLAST ; ) Hmmm… What is one "tip" for effective use of PSI-BLAST?
A4. (2pts) At last there it is, you've found a significant "hit" in human! This seems like an excellent and
fitting end to a long and exciting search! You pat yourself on the back and are just about to go out to celebrate with
a few beers, when your lab partner takes a look at the annotation for your putative human homolog and says: "Hey!
I think you've been scooped! I saw a paper describing a human protein with the same annotation from Drena
Dobbs's lab last year - it was in Science or Nature, I think--or maybe it was in NAR, no - it was Proteins, maybe 2
years ago! You'd better check it out!" Aaargh… it is 2 AM & the library is closed…Which online resource would
you use to find all papers published by Dobbs in biomedical journals during the past 5 years?
A5. (3pts) Darn! That Dobbs lab must have some amazing students! They did identify your gene in humans -- and
actually found two very similar genes. They said one of them is the ortholog of the gene you found in bacteria and
the other is actually a paralog. What is an ortholog and how does it differ from a paralog?
B. Dynamic Programming (20 pts TOTAL)
You think Dobbs made an error -- it looks like she confused the ortholog & paralog! A vital piece of evidence that
could prove this is an optimal global pairwise alignment between your prokaryotic gene and each of the human
homologs. You would love to prove Dobbs wrong, so despite the late hour, you decide to compare the two
alignments (in the bar, where you are now drowning your sorrows, while surfing the web on your laptop). Aaaarrrgh!
Your battery just died - and you left your charger in lab!! You must perform the alignment by hand. Demonstrate
your prowess by reproducing a portion of that global alignment below.
B1. (8pts) Fill out the dynamic programming matrix for determining an optimal global alignment between
the sequences TCG and TCCAG. Scoring: +5 for matches; -3 for mismatches and spaces.

     |     |  T  |  C  |  C  |  A  |  G  |
     |     |     |     |     |     |     |
  T  |     |     |     |     |     |     |
  C  |     |     |     |     |     |     |
  G  |     |     |     |     |     |     |
B2. (2pts) Where is the score of the optimal alignment(s) located in the DP matrix? (Circle it)
B3. (4pts) There are 2 optimal alignments. For full credit, draw both of them & show your traceback arrows.
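For checking a hand-filled matrix afterwards, the fill rule used in B1 can be sketched in Python. This is only a minimal sketch: the function name `global_alignment_matrix` and its parameter names are hypothetical, and the defaults simply restate the B1 scoring (+5 for matches, -3 for mismatches and spaces). It fills the matrix but does not perform the traceback asked for in B3.

```python
def global_alignment_matrix(a, b, match=5, mismatch=-3, gap=-3):
    """Fill a global-alignment DP matrix; rows follow a, columns follow b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment: leading spaces are penalized, so the first
    # column and first row accumulate the gap penalty.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # space inserted in b
            left = score[i][j - 1] + gap    # space inserted in a
            score[i][j] = max(diag, up, left)
    return score


if __name__ == "__main__":
    for row in global_alignment_matrix("TCG", "TCCAG"):
        print(row)
```

Calling the function with the rows and columns swapped gives the transposed matrix, so the orientation only needs to match the grid drawn for B1.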
B4. (4pts) You don't want to go home yet, so you decide it would be entertaining to set up a DP matrix for local
alignment, using the BLOSUM62 matrix (attached to this Exam). But, you were able to fill in only the first two
rows before the bar closed. Show what you accomplished in the matrix below:
     |     |  T  |  C  |  C  |  A  |  G  |
     |  0  |     |     |     |     |     |
  T  |     |     |     |     |     |     |
B5. (2pts) Walking home with a bit of a buzz, it occurs to you that the "rule" for initializing a DP matrix for global
alignment - which can cause "end-gap" penalties to accumulate if sequences are of different lengths - would be a
problem if you wanted to use global alignment to assemble a set of overlapping sequences into a single long
sequence. How would you initialize a DP matrix to identify the region(s) of overlap between two long
sequences (a & b), which are known to overlap, but each of which is expected to have some unique sequences
on one end?
a)
-----------------------||||||||||||||||||||||||||
b)
-------------------------
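One common way to handle the situation described in B5 is an alignment with free end gaps. The following is a minimal Python sketch under stated assumptions: the function name `overlap_score` is hypothetical, the scoring values are borrowed from B1 rather than given in B5, and both the zero initialization and the choice to read the result from the last row or last column are one conventional formulation, not necessarily the only acceptable answer.

```python
def overlap_score(a, b, match=5, mismatch=-3, gap=-3):
    """Best alignment score when unaligned sequence ends are not penalized."""
    rows, cols = len(a) + 1, len(b) + 1

    # Row 0 and column 0 stay at 0: a leading unaligned end costs nothing.
    score = [[0] * cols for _ in range(rows)]

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # A trailing unaligned end is free as well, so the result is the best
    # cell on the last row or the last column rather than the corner cell.
    return max(max(score[-1]), max(row[-1] for row in score))


if __name__ == "__main__":
    # Hypothetical example: the end of the first string overlaps the start of the second.
    print(overlap_score("AAAATCG", "TCGCCCC"))
```

Gaps inside the overlapping region are still penalized; only the overhanging ends are free.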
C. PSSMs & PSI-BLAST (25 pts TOTAL)
C1. (10pts) PSSM matrix - The alignment of four DNA sequences is shown below.
CAACTG
CAGCTG
CAGGTG
CAGCTT
Which of the position-specific score matrices (PSSMs) shown above is most likely to be correct? Explain.
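As a way to reason about what a PSSM for this alignment should look like, the per-column counts and log-odds scores can be sketched in Python. This is a minimal sketch with assumptions that are not part of the question: a uniform background frequency of 0.25 per base, a pseudocount of 1 per base, base-2 logarithms, and the hypothetical function names `column_counts` and `log_odds_pssm`. The candidate matrices in the question may have been built with different conventions.

```python
import math


def column_counts(seqs, alphabet="ACGT"):
    """Count how often each base occurs in each alignment column."""
    length = len(seqs[0])
    return [
        {base: sum(s[i] == base for s in seqs) for base in alphabet}
        for i in range(length)
    ]


def log_odds_pssm(seqs, alphabet="ACGT", background=0.25, pseudocount=1.0):
    """Convert column counts to log2(observed frequency / background)."""
    n = len(seqs)
    pssm = []
    for counts in column_counts(seqs, alphabet):
        column = {}
        for base in alphabet:
            # The pseudocount (an assumption) keeps unseen bases from scoring -infinity.
            freq = (counts[base] + pseudocount) / (n + pseudocount * len(alphabet))
            column[base] = math.log2(freq / background)
        pssm.append(column)
    return pssm


if __name__ == "__main__":
    alignment = ["CAACTG", "CAGCTG", "CAGGTG", "CAGCTT"]
    for position, column in enumerate(log_odds_pssm(alignment), start=1):
        print(position, {base: round(value, 2) for base, value in column.items()})
```

Bases seen more often than the background expectation in a column get positive scores and bases seen less often get negative scores, which is the property to look for when comparing candidate matrices.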
C2. (5pts) Briefly describe how the PAM and BLOSUM scoring matrices are derived and how they are
different.
C3. (5pts) In evaluating the results of a database search using BLAST, why is it sometimes important to
consider the bit score, S', instead of only E-value?
C4. (2pts) In what sense is the Smith-Waterman (local alignment) DP algorithm better than BLAST?
C5. (3pts) Everything else being equal, when does BLAST produce a more significant E-value: when searching
a database of size 500,000 or when searching a database of size 1,000,000? Explain your answer.
D. Dot Plots & Misc. (20 pts TOTAL)
D1. Suppose we are given 2 DNA sequences A and B. Draw a simple diagram of dot plots that would result from the
following comparisons. To receive full credit, be sure to label both axes.
a) (5pts) DNA sequence A is 1000 bp in length and is identical to sequence B,
which is 800 bp in length, except that A has a single 200 bp segment duplicated
near the 3' end (right end).
b) (5pts) Explain what the dot plot pattern shown below represents:
D2. (5pts) Which lab did you like best? Why?
D3. (5pts) (From Sean Eddy's paper - and discussed in lecture) Why is "dynamic programming" called
that? What does the name mean? Why did Richard Bellman at RAND give it this name?
E. Molecular Biology & Bioinformatics Terms (20 pts TOTAL)
(1pt each) Fill in the box beside each definition with one term that corresponds to the definition provided.
      Term              Definition
E1.   [______________]  Genes in different species that evolved from a common ancestral gene and have similar
                        functions
E2.   [______________]  A nucleotide or amino-acid sequence pattern that is often conserved and has, or is conjectured to
                        have, functional significance
E3.   [______________]  Observable characteristics of an organism
E4.   [______________]  Process mediated by RNA polymerase in which information in DNA is copied into RNA
E5.   [______________]  Sections of eukaryotic genes that are transcribed, but spliced out of mature mRNA
E6.   [______________]  A type of substitution matrix that relies on an explicit evolutionary model and is based on
                        observed differences in closely related proteins
E7.   [______________]  A region of a DNA sequence that begins with a START codon and ends with a STOP codon
E8.   [______________]  Software that uses progressive alignment heuristics to generate a multiple sequence alignment of
                        related sequences
E9.   [______________]  An n x m matrix of log-odds scores, derived from a MSA of related protein sequences, which
                        can be used to represent a (gapless) sequence motif
E10.  [______________]  A computational "shortcut" or "rule-of-thumb" that can dramatically shorten the "runtime"
                        required to solve a problem, but cannot guarantee an optimal solution
(2pts each) Short answer: Answer each of the following questions (one phrase or sentence should be
sufficient).
E11. What is RNA splicing?
E12. What is meant by 6-frame translation?
E13. What is an affine gap penalty?
E14. Why do we need/use heuristics for aligning sequences?
E15. What are 3 basic computational methods for sequence alignment?
F. The Question I Didn't Ask (5 pts TOTAL)
Describe something you have learned from your reading, lectures or labs that was not asked on this Exam and that you think is worth 5 pts!
Blosum62 matrix