BCB 444/544 - F07 Exam 2 (100 pts) Name_____________________________________

BCB 444/544 Fall 07 Oct 26 Exam 2

A. Cells, Nucleus, Genetic Code, Transcription Factors, etc. (15 pts TOTAL)

Below the DNA sequence shown, write the RNA sequence that would be transcribed from the top strand of DNA, assuming it is copied completely from the 3' to the 5' end. On the RNA sequence, circle the START and STOP codons for translation. Translate the RNA sequence into amino acids and write the deduced protein sequence below, too.

DNA      3'-TATATCGTACAGACGATTTTGATCTATTGACTGG-5'
         5'-ATATAGCATGTCTGCTAAAACTAGATAACTGACC-3'
RNA
Protein: NH2 –

Suppose the DNA sequence in another individual is different, due to a single base-pair deletion, resulting in the sequence shown below. What effects (if any) would you expect this SNP to have on the sequence and function of the expected protein product?

DNA      3'-TATATCGTACGACGATTTTGATCTATTGACTGG-5'
         5'-ATATAGCATGCTGCTAAAACTAGATAACTGACC-3'
RNA
Protein: NH2 –

B. HMM (20 pts TOTAL)

Consider the occasionally dishonest casino example discussed in class. The system has 3 states:

B denotes the start state
F denotes the state when a fair die is used
L denotes the state when a loaded die is used

The transition probabilities between these states are shown in the diagram. The emission probabilities are:

for state F, eF(1) = eF(2) = … = eF(6) = 1/6
for state L, eL(1) = eL(2) = … = eL(5) = 0.1, eL(6) = 0.5

1. What is the most probable sequence of states, starting from state B, to produce the sequence of die tosses 1,6? For full credit, you must show your work and fill in the table below.

         1        6
B
F
L

The most probable sequence of states is:

2. What is the total probability of the sequence 1,6? Show your work and fill in the table below.

         1        6
B
F
L

The total probability of the sequence 1,6 is:

C. Motifs, Domains, Structure & Structure Prediction (10 pts TOTAL)

C1. (2 pts) Why is a profile more sensitive and flexible than a PSSM for detecting sequence motifs in proteins?

C2. (4 pts) Briefly explain the roles of base-pairing vs base-stacking interactions in RNA structure prediction.

C3. (2 pts) Suggest one physical explanation for the fact that secondary structure prediction algorithms can more accurately identify alpha-helical segments than beta-strands/sheets.

C4. (2 pts) According to the paper by Ginalski et al., why are meta predictors better than individual methods for predicting the tertiary structure of proteins?

D. Longer answers/problems (20 pts TOTAL)

D1. (10 pts)
a) Briefly outline the steps used to predict a protein structure by threading.
b) What were the key "simplifications" exploited in the Ho method to make it fast enough to be used for genome-wide threading?

D2. (10 pts) Given a single RNA sequence, GGCGCGGCACCGUCCGCGGAACAAACGG, we predict the structure shown below. We then perform a database search and discover 5 homologous sequences and align them. The MSA is shown below with nucleotides that are base-paired in the structure highlighted and numbered so that base-paired positions have the same number above them.
a) Based on the information in this MSA, is our predicted structure likely to be correct? Explain.
b) Are there any base-pairs in the predicted structure that are unlikely to form in structures corresponding to the additional sequences in the MSA? Explain.

  12345      54321
GGCGCGGCACCGUCCGCGGAACAAACGG
UCCGGGUCACCGUACGCGGAACAAACGG
UCCGUGCCACCGUGCGCGGAACAAACGG
UCCGUGACACCGUUCCCGGAACAAACGG
UCCGAGUCACCGUACGCGGAACAAACGG
UCCGCGUCACCGUACGCGGAACAAACGG

E. Molecular Biology & Bioinformatics Terms (20 pts TOTAL)

(1 pt each) Fill in the box beside each definition with one term or acronym that corresponds to the definition provided. (Some have more than one correct answer).

Term                                           Definition
E1.  Viterbi algorithm                         Algorithm used to determine the most probable path for a sequence of observed variables from a HMM
E2.  GOR, CDM, etc                             A program for predicting protein secondary structure
E3.  CpG Island                                Genomic region with a high frequency of CG dinucleotides, likely to be near the transcription start site for genes
E4.  CATH, SCOP, DALI                          A protein structure classification database
E5.  PyMol, Cn3D, etc                          A program for visualizing protein structures
E6.  CASP                                      A protein structure prediction "contest"
E7.  Cytoskeleton                              Internal structural scaffold that organizes the cytoplasm in eukaryotic cells
E8.  Peptide bond                              Type of covalent bond that links amino acids in a polypeptide chain
E9.  Domain                                    Independent structural or functional unit of a protein
E10. NMR spectroscopy, X-ray crystallography   Experimental method for determining the 3-D structure of a macromolecule

(2 pts each) Short answer: Answer each of the following questions (one phrase or sentence should be sufficient in most cases).

E11. What is the difference between a protein motif and a protein domain?
A protein motif is a short, conserved sequence pattern. A protein domain is a larger unit, corresponding to a longer sequence, that usually represents a functionally or structurally independent unit (and may contain one or more motifs).

E12. What is "hidden" in a hidden Markov model?
The actual state is hidden. We can't "see" the underlying state that emits the observed variables.

E13. What is meant by co-variation in the context of RNA structure prediction?
Co-variation is used to determine which residues are more likely to be base-paired. In a MSA, if every time one position changes from an A to a C, another position changes from a U to a G, those two positions are likely to be base-paired because the mutations/variations change the sequence of the RNA, but conserve the ability to base-pair.

E14. Name the 3 basic computational approaches for protein structure prediction.

E15. Which experimental protein structure determination method can provide information about protein dynamics?

F. Something a bit more thought provoking!! (10 pts TOTAL)

You are given a "mystery" gene sequence (M gene) from a newly discovered bacterium. The M gene has only one large open reading frame (ORF), which would encode a protein ~200 amino acids in length. Outline and briefly describe the types of computational analyses you would perform to try to annotate this gene and its potential product(s). Be sure to provide relevant details and explain how you would proceed if a proposed approach does not provide useful information.

You should begin like this -- but please replace underlined bits and "x"s with your own words:

First, I would perform a "xBLASTx" search using the M gene sequence to query the "x" database. If no hits with e-values better than "x" are returned using default parameters, I would….

G. The Question I Didn't Ask (5 pts TOTAL)

Describe something you have learned about an ISU scientist or research project, somehow related to bioinformatics or computational biology (from lectures, reading, labs, seminars), which you think is worth 5 pts!

Genetic Code Table
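
For readers who want the standard genetic code in a form they can query, the following is a minimal Python sketch of the transcribe-and-translate procedure that part A asks for by hand. It assumes the standard (nuclear) genetic code and the usual compact codon ordering over U, C, A, G. The function names and the short template sequence are hypothetical illustrations and are not taken from the exam.

```python
from itertools import product

# Standard genetic code in the usual compact order: first, second and third
# base each cycle through U, C, A, G. '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA template base -> complementary RNA base
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5):
    """Return the mRNA (written 5'->3') for a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


def translate(rna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical template strand (not the exam sequence), written 3'->5'.
mrna = transcribe("TACGGATTTATC")
print(mrna, translate(mrna))  # AUGCCUAAAUAG MPK
```

The protein is reported in one-letter amino acid codes; translation simply stops at the end of the sequence if no in-frame stop codon is found.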

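The two tables in part B correspond to the Viterbi and forward recursions. The sketch below is one way to check hand calculations in Python. The emission probabilities come from the exam text, but the transition probabilities are ASSUMPTIONS, because the diagram is not reproduced in this copy. The values used (0.5 from B to each die, 0.95/0.05 out of F, 0.10/0.90 out of L) are placeholders to be replaced with the numbers from the diagram. No explicit end state is modeled.

```python
STATES = ("F", "L")

# Emission probabilities as given in the exam text.
EMIT = {
    "F": {roll: 1 / 6 for roll in range(1, 7)},
    "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# ASSUMED transition probabilities (the exam's diagram is missing here).
START = {"F": 0.5, "L": 0.5}  # from start state B
TRANS = {
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.10, "L": 0.90},
}


def viterbi(rolls):
    """Return (most probable state path, probability of that path)."""
    v = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    backpointers = []
    for roll in rolls[1:]:
        pointers, nxt = {}, {}
        for s in STATES:
            # Best predecessor for state s at this position.
            best_prev = max(STATES, key=lambda p: v[p] * TRANS[p][s])
            pointers[s] = best_prev
            nxt[s] = EMIT[s][roll] * v[best_prev] * TRANS[best_prev][s]
        backpointers.append(pointers)
        v = nxt
    last = max(STATES, key=v.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], v[last]


def forward(rolls):
    """Return the total probability of the rolls, summed over all paths."""
    f = {s: START[s] * EMIT[s][rolls[0]] for s in STATES}
    for roll in rolls[1:]:
        f = {
            s: EMIT[s][roll] * sum(f[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(f.values())


path, best_prob = viterbi([1, 6])
print(path, best_prob, forward([1, 6]))
```

The only difference between the two functions is that Viterbi takes the maximum over predecessor states where the forward algorithm takes the sum, so the forward total can never be smaller than the Viterbi path probability.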