BCB 444/544 - F07 Exam 2 (100 pts) Name_____________________________________

BCB 444/544 Fall 07 Oct 26 Exam 2
A. Cells, Nucleus, Genetic Code, Transcription Factors, etc. (15 pts TOTAL)
Below the DNA sequence shown, write the RNA sequence that would be transcribed from the top
strand of DNA, assuming it is copied completely from the 3' to the 5' end. On the RNA sequence,
circle the START and STOP codons for translation. Translate the RNA sequence into amino acids
and write the deduced protein sequence below, too.
DNA
3'-TATATCGTACAGACGATTTTGATCTATTGACTGG-5'
5'-ATATAGCATGTCTGCTAAAACTAGATAACTGACC-3'
RNA
Protein:
NH2 –
Suppose the DNA sequence in another individual is different, due to a single base-pair deletion,
resulting in the sequence shown below. What effects (if any) would you expect this SNP to have on
the sequence and function of the expected protein product?
3'-TATATCGTACGACGATTTTGATCTATTGACTGG-5'
5'-ATATAGCATGCTGCTAAAACTAGATAACTGACC-3'
RNA
Protein:
NH2 –
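The transcription and translation steps asked for in this section can be checked with a short Python sketch. It assumes the standard genetic code, that translation starts at the first AUG and stops at the first in-frame stop codon, and that the template strand is supplied in the 3'-to-5' orientation printed above. The function names `transcribe` and `translate` are illustrative and do not come from the exam.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# at each of the three positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA template base -> RNA base added by the polymerase.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (5'->3') copied from a template strand written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def translate(rna):
    """Translate from the first AUG up to the first in-frame stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Top (template) strands exactly as printed in the exam, 3'->5'.
wild_type = "TATATCGTACAGACGATTTTGATCTATTGACTGG"
deletion = "TATATCGTACGACGATTTTGATCTATTGACTGG"

for name, template in [("wild type", wild_type), ("deletion", deletion)]:
    rna = transcribe(template)
    print(name, rna, translate(rna))
```

Comparing the two printed protein strings shows how the deletion shifts the reading frame downstream of the start codon.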
B. HMM (20 pts TOTAL)
Consider the occasionally dishonest casino example discussed in class.
The system has 3 states:
B denotes the start state
F denotes the state when a fair die is used
L denotes the state when a loaded die is used
The transition probabilities between these states are shown in the diagram.
The emission probabilities are:
for state F, eF(1) = eF(2) = … = eF(6) = 1/6
for state L, eL(1) = eL(2) = … = eL(5) = 0.1, eL(6) = 0.5
1. What is the most probable sequence of states, starting from state B, to produce the sequence of
die tosses 1,6? For full credit, you must show your work and fill in the table below.
State   1        6
B       ______   ______
F       ______   ______
L       ______   ______
The most probable sequence of states is:
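The table above is filled in with the Viterbi recursion. The Python sketch below shows that recursion for this three-state model. The transition probabilities are NOT given in this text (the diagram is missing), so the values in `TRANS` are assumptions borrowed from the common textbook version of the casino example, with an assumed 0.5/0.5 split out of the start state B. Substitute the values from the exam diagram before relying on the result.

```python
# ASSUMED transition probabilities; the exam's diagram is not reproduced here.
TRANS = {
    "B": {"F": 0.5, "L": 0.5},    # assumption: equal chance of starting fair or loaded
    "F": {"F": 0.95, "L": 0.05},  # assumption: textbook casino values
    "L": {"F": 0.10, "L": 0.90},  # assumption: textbook casino values
}
# Emission probabilities as stated in the exam.
EMIT = {
    "F": {roll: 1 / 6 for roll in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}
STATES = ("F", "L")


def viterbi(rolls):
    """Return (most probable state path starting at B, its probability)."""
    # v[k] is the probability of the best path ending in state k.
    v = {k: TRANS["B"][k] * EMIT[k][rolls[0]] for k in STATES}
    backpointers = []
    for roll in rolls[1:]:
        prev, v, pointer = v, {}, {}
        for k in STATES:
            # Best predecessor state for landing in k at this position.
            best = max(STATES, key=lambda j: prev[j] * TRANS[j][k])
            v[k] = EMIT[k][roll] * prev[best] * TRANS[best][k]
            pointer[k] = best
        backpointers.append(pointer)
    # Trace back from the best final state.
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return ["B"] + path[::-1], v[last]


path, probability = viterbi([1, 6])
print(path, probability)
```

Each intermediate `v` dictionary corresponds to one column of the table, which makes it easy to compare hand calculations against the code.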
2. What is the total probability of the sequence 1,6? Show your work and fill in the table below.
State   1        6
B       ______   ______
F       ______   ______
L       ______   ______
The total probability of the sequence 1,6 is:
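The total probability is obtained with the forward algorithm, which sums over predecessor states where Viterbi takes the maximum. The sketch below is a minimal Python version. As before, the transition probabilities in `TRANS` are assumptions (textbook casino values plus an assumed 0.5/0.5 start), because the exam diagram is not available in this text. No explicit end state is modeled, so the total is simply the sum of the final forward values.

```python
# ASSUMED transition probabilities; replace with the values from the exam diagram.
TRANS = {
    "B": {"F": 0.5, "L": 0.5},    # assumption
    "F": {"F": 0.95, "L": 0.05},  # assumption
    "L": {"F": 0.10, "L": 0.90},  # assumption
}
# Emission probabilities as stated in the exam.
EMIT = {
    "F": {roll: 1 / 6 for roll in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}
STATES = ("F", "L")


def forward(rolls):
    """Return the total probability of the rolls, summed over all state paths."""
    # f[k] is the probability of the rolls so far, ending in state k.
    f = {k: TRANS["B"][k] * EMIT[k][rolls[0]] for k in STATES}
    for roll in rolls[1:]:
        f = {
            k: EMIT[k][roll] * sum(f[j] * TRANS[j][k] for j in STATES)
            for k in STATES
        }
    return sum(f.values())


total = forward([1, 6])
print(total)
```

Under the same assumed parameters, the total can never be smaller than the probability of the single best path, which gives a quick sanity check on a hand-filled table.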
C. Motifs, Domains, Structure & Structure Prediction (10 pts TOTAL)
C1.
(2 pts) Why is a profile more sensitive and flexible than a PSSM for detecting sequence
motifs in proteins?
C2.
(4 pts) Briefly explain the roles of base-pairing vs. base-stacking interactions in RNA
structure prediction.
C3.
(2 pts) Suggest one physical explanation for the fact that secondary structure prediction
algorithms can more accurately identify alpha-helical segments than beta-strands/sheets.
C4.
(2 pts) According to the paper by Ginalski et al., why are meta predictors better than
individual methods for predicting the tertiary structure of proteins?
D. Longer answers/problems (20 pts TOTAL)
D1.
(10 pts)
a) Briefly outline the steps used to predict a protein structure by threading.
b) What were the key "simplifications" exploited in the Ho method to make it fast enough to
be used for genome-wide threading?
D2. (10 pts) Given a single RNA sequence, GGCGCGGCACCGUCCGCGGAACAAACGG, we predict the
structure shown below. We then perform a database search and discover 5 homologous sequences
and align them. The MSA is shown below with nucleotides that are base-paired in the structure
highlighted and numbered so that base-paired positions have the same number above them.
a) Based on the information in this MSA, is our predicted structure likely to be
correct? Explain.
b) Are there any base-pairs in the predicted structure that are unlikely to form in
structures corresponding to the additional sequences in the MSA? Explain.
12345
54321
GGCGCGGCACCGUCCGCGGAACAAACGG
UCCGGGUCACCGUACGCGGAACAAACGG
UCCGUGCCACCGUGCGCGGAACAAACGG
UCCGUGACACCGUUCCCGGAACAAACGG
UCCGAGUCACCGUACGCGGAACAAACGG
UCCGCGUCACCGUACGCGGAACAAACGG
E. Molecular Biology & Bioinformatics Terms (20 pts TOTAL)
(1 pt each) Fill in the box beside each definition with one term or acronym that corresponds to the
definition provided. (Some have more than one correct answer).
E1. Term: Viterbi algorithm
    Definition: Algorithm used to determine the most probable path for a sequence of
    observed variables from a HMM
E2. Term: GOR, CDM, etc
    Definition: A program for predicting protein secondary structure
E3. Term: CpG Island
    Definition: Genomic region with a high frequency of CG dinucleotides, likely to be near the
    transcription start site for genes
E4. Term: CATH, SCOP, DALI
    Definition: A protein structure classification database
E5. Term: PyMol, Cn3D, etc
    Definition: A program for visualizing protein structures
E6. Term: CASP
    Definition: A protein structure prediction "contest"
E7. Term: Cytoskeleton
    Definition: Internal structural scaffold that organizes the cytoplasm in eukaryotic cells
E8. Term: Peptide bond
    Definition: Type of covalent bond that links amino acids in a polypeptide chain
E9. Term: Domain
    Definition: Independent structural or functional unit of a protein
E10. Term: NMR spectroscopy, X-ray crystallography
    Definition: Experimental method for determining the 3-D structure of a macromolecule
(2 pts each) Short answer: Answer each of the following questions (one phrase or sentence should
be sufficient in most cases).
E11. What is the difference between a protein motif and a protein domain?
A protein motif is a short, conserved sequence pattern. A protein domain is a
larger unit, corresponding to a longer sequence, that usually represents a
functionally or structurally independent unit (and may contain one or more motifs).
E12. What is "hidden" in a hidden Markov model?
The actual state is hidden. We can't "see" the underlying state that emits the observed variables.
E13. What is meant by co-variation in the context of RNA structure prediction?
Co-variation is used to determine which residues are likely to be base-paired. In an
MSA, if every time one position changes from an A to a C, another position changes from a
U to a G, those two positions are likely to be base-paired, because the variations
change the sequence of the RNA but conserve the ability to base-pair.
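The co-variation idea can be sketched as a simple column-pair scan over an alignment. The Python example below uses the six aligned sequences from problem D2 and reports column pairs in which every sequence has a Watson-Crick pair and at least two different pair types occur. This strict rule is an illustrative simplification: it ignores G-U wobble pairs and does no statistical scoring, and the function name `covarying_pairs` is hypothetical.

```python
# Canonical Watson-Crick pairs only (G-U wobble pairs are deliberately ignored).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# The alignment from problem D2, one sequence per row.
MSA = [
    "GGCGCGGCACCGUCCGCGGAACAAACGG",
    "UCCGGGUCACCGUACGCGGAACAAACGG",
    "UCCGUGCCACCGUGCGCGGAACAAACGG",
    "UCCGUGACACCGUUCCCGGAACAAACGG",
    "UCCGAGUCACCGUACGCGGAACAAACGG",
    "UCCGCGUCACCGUACGCGGAACAAACGG",
]


def covarying_pairs(msa):
    """Return 1-based column pairs that always pair and show >= 2 pair types."""
    length = len(msa[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            # Set of distinct (base_i, base_j) combinations seen in these columns.
            observed = {(seq[i], seq[j]) for seq in msa}
            if observed <= WATSON_CRICK and len(observed) >= 2:
                hits.append((i + 1, j + 1))
    return hits


print(covarying_pairs(MSA))
```

Requiring at least two pair types separates true co-variation from columns that are merely conserved, since a fully conserved pair carries no evidence of compensatory change.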
E14. Name the 3 basic computational approaches for protein structure prediction.
E15. Which experimental protein structure determination method can provide
information about protein dynamics?
F. Something a bit more thought provoking!! (10 pts TOTAL)
You are given a "mystery" gene sequence (M gene) from a newly discovered bacterium. The M gene has
only one large open reading frame (ORF), which would encode a protein ~200 amino acids in length.
Outline and briefly describe the types of computational analyses you would perform to try to
annotate this gene and its potential product(s). Be sure to provide relevant details and explain
how you would proceed if a proposed approach does not provide useful information. You should
begin like this -- but please replace underlined bits and "x"s with your own words:
First, I would perform a "xBLASTx" search using the M gene sequence to query the "x" database.
If no hits with e-values better than "x" are returned using default parameters, I would….
G. The Question I Didn't Ask (5 pts TOTAL)
Describe something you have learned about an ISU scientist or research project, somehow related
to bioinformatics or computational biology (from lectures, reading, labs, seminars), which you think
is worth 5 pts!
Genetic Code Table