Problem I Definitions

Provide a BRIEF description for the terms listed below:
Smith-Waterman
BLOSUM62
UPGMA
Parsimony
ddA (di-deoxyA)
TBLASTX
Shannon entropy
Pseudoknot
Pseudocount
Rotamer library

Problem II Secondary sequence analysis

In secondary structure analysis, both the Chou-Fasman algorithm and the Garnier-Osguthorpe-Robson (GOR) method are inherently statistical in nature. However, the Chou-Fasman method is sometimes described as having some “physical principles” contained within it, while the GOR method is sometimes described as an “information theory”-style approach.

a. Describe the basis for the Chou-Fasman method and explain why some describe it as having some “physical principles” within it.
b. Describe the basis for the GOR secondary structure predictions and explain why some refer to it as an information-theory-style approach.
c. JPRED is a consensus-based approach to secondary structure prediction. Explain what the term “consensus-based” means and explain why this approach gives the highest overall accuracy in predicting the secondary structure of proteins.

Problem III Protein Structure Prediction

Here is information about target T0140 from CASP5 (the same as that given to assessment participants).

CASP5 Target T0140
1. Protein Name: 1b11
2. Organism Name: Synthetic protein
3. Number of amino acids (approx): 103
4. Accession number:
5. Sequence Database:
6. Amino acid sequence:
MRGSHHHHHHGSRLQSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAGSSGAAVRG
NPQQGDRVEGKIKSITDFGIFIGLDGGIDGLVHLSDISWAQAEA
7. Additional Information: 1b11 is a synthetic protein constructed by non-homologous recombination. The N-terminal part derives from cold shock protein A (CspA), while the C-terminal segment comes from the E. coli 30S ribosomal subunit protein S1. (Riechmann L, Winter G. Novel folded protein domains generated by combinatorial shuffling of polypeptide segments. Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10068-73.)
8. Crystallization conditions: include MES pH 5.6. The protein is a tetramer under native conditions, but after denaturation, elutes at approximately the molecular weight of a dimer on gel filtration.
9. X-ray structure: yes
10. Current state of the experimental work: Completed
11. Interpretable map?: yes
12. Estimated date of chain tracing completion: June
13. Estimated date of public release of structure: September
14. Name: unavailable until after public release of structure

Here is the abstract from the Riechmann & Winter article describing how target T0140 was constructed:

It has been proposed that the architecture of protein domains has evolved by the combinatorial assembly and/or exchange of smaller polypeptide segments. To investigate this proposal, we fused DNA encoding the N-terminal half of a beta-barrel domain (from cold shock protein CspA) with fragmented genomic Escherichia coli DNA and cloned the repertoire of chimeric polypeptides for display on filamentous bacteriophage. Phage displaying folded polypeptides were selected by proteolysis; in most cases the protease-resistant chimeric polypeptides comprised genomic segments in their natural reading frames. Although the genomic segments appeared to have no sequence homologies with CspA, one of the originating proteins had the same fold as CspA, but another had a different fold. Four of the chimeric proteins were expressed as soluble polypeptides; they formed monomers and exhibited cooperative unfolding. Indeed, one of the chimeric proteins contained a set of very slowly exchanging amides and proved more stable than CspA itself. These results indicate that native-like proteins can be generated directly by combinatorial segment assembly from nonhomologous proteins, with implications for theories of the evolution of new protein folds, as well as providing a means of creating novel domains and architectures in vitro.

a. Describe the three general strategies used for structure prediction in CASP and when each is appropriate.
b. Outline how you would go about predicting the structure of target T0140, justifying your choice of methods. Describe at least 5 sequential steps you would take.
c. Describe two specific challenges where improvements are needed in protein structure prediction today.

Problem IV Protein Structure Modeling Approaches

For many approaches to protein structure analysis there are two key parts to the problem: (1) how to search the relevant sequence or structure space, and (2) how to evaluate, or score, which sequence or structure is best.

a. Give examples of three distinct problems that we studied in the protein structure part of the class. For each, describe a search algorithm and a scoring function that can be used in combination to address it.
b. Search methods sometimes constrain the energy functions that can be used, and vice versa. Give an example of a scoring function and a search method that can NOT be used together, and describe why they are incompatible.

Problem V Modeling of simple reactions

Consider the following reaction, in which the forward rate constant is k1 and the reverse rate constant is k-1:

[Reaction scheme: A ⇌ A*]

a. On the rate-balance plot below, draw lines indicating the forward reaction and the reverse reaction (and label them accordingly). Indicate the steady-state points on the graph. Is this system bistable or monostable?

[Blank rate-balance plot; y-axis: Rate, x-axis: [A*]/([A]+[A*])]

b. Consider simple linear feedback; that is, A* feeds back and catalyzes the conversion of A to A*. On the rate-balance plot below, draw a curve representing this feedback.

[Blank rate-balance plot; y-axis: Rate]

c. Now draw a rate-balance plot that includes results from both simple linear feedback and the reverse reaction. For simplicity, assume the forward reaction without feedback is negligible. Indicate the two equilibrium states. Are they both stable? That is, can simple linear positive feedback, as illustrated here, generate a bistable system?
Explain why in a sentence or two.

[Blank rate-balance plot; y-axis: Rate, x-axis: [A*]/([A]+[A*])]
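
For checking a hand-drawn rate-balance plot numerically, the curves in Problem V can be tabulated as in the sketch below. It is only a minimal illustration under stated assumptions: mass-action kinetics, a conserved total [A]+[A*] normalized to 1 (so x = [A*]/([A]+[A*])), and made-up rate constants. The names `k1`, `k_minus1`, `kf` (a hypothetical feedback rate constant), and the helper `crossings` are not part of the problem statement.

```python
# Rate-balance sketch for A <-> A*, with x = [A*]/([A]+[A*]).
# All numeric rate constants are illustrative assumptions.


def forward_rate(x, k1=1.0):
    """Basal forward rate A -> A*: proportional to the remaining fraction of A."""
    return k1 * (1.0 - x)


def reverse_rate(x, k_minus1=1.0):
    """Reverse rate A* -> A: proportional to the fraction of A*."""
    return k_minus1 * x


def feedback_rate(x, kf=4.0):
    """Linear positive feedback: A* catalyzes A -> A*, so rate ~ [A*] * [A]."""
    return kf * x * (1.0 - x)


def crossings(f, g, n=1000):
    """Return x in [0, 1] where curves f and g intersect on a uniform grid.

    An intersection is reported where f - g is exactly zero at a grid point
    or changes sign between neighbors (located by linear interpolation).
    """
    xs = [i / n for i in range(n + 1)]
    found = []
    for a, b in zip(xs, xs[1:]):
        da = f(a) - g(a)
        db = f(b) - g(b)
        if da == 0.0:
            found.append(a)
        elif da * db < 0.0:
            found.append(a - da * (b - a) / (db - da))
    if f(xs[-1]) - g(xs[-1]) == 0.0:
        found.append(xs[-1])
    return found


if __name__ == "__main__":
    # Part a: basal forward reaction against the reverse reaction.
    print("forward vs reverse:", crossings(forward_rate, reverse_rate))
    # Part c: feedback-only forward reaction against the reverse reaction.
    print("feedback vs reverse:", crossings(feedback_rate, reverse_rate))
```

The intersections mark where the forward and reverse rates balance. Whether each one is stable still has to be argued from which curve lies above the other on either side of the crossing.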