New Strategies for Protein Folding
Joseph F. Danzer, Derek A. Debe, Matt J. Carlson, William A. Goddard III
Materials and Process Simulation Center
California Institute of Technology

Protein Tertiary Structure Prediction
Given a protein's primary structure (amino acid sequence)
…-HIS-CYS-ALA-ALA-GLY-GLU-ASP-...
can we determine its 3D structure?
• What local structural units does it form?
  – α-Helix (cylinder)
  – β-Sheets (ribbon)
• How do those structural units pack together?

Structure Prediction is a Two-Fold Problem
• Conformational Search Problem
  – Given the exponentially large number of possible states, how do we generate a correct state?
  – With a 6-state (φ,ψ) representation, a 50-residue protein has 6^50 states, on the order of 10^38.
  – Assuming the protein samples 1 state/ps, an exhaustive search would take about 10^19 years to fold.
• Recognition Problem
  – How do we differentiate correct from incorrect folds?

Restrained Generic Protein (RGP) Direct Monte Carlo
A highly efficient, off-lattice residue buildup procedure for generating ensembles of protein conformations that comply with a set of user-defined distance restraints.

Generic Protein Model
• Each residue is a 5.5 Å sphere
• Fixed geometry connects residues: l = 3.8 Å; θ = 120°; typically φ = 0°, 60°, 120°, 180°, 240°, 300° (6 states per residue)
[Figure: chain geometry with labels θ, φ, l]

Restraint Implementation
At residue addition step i, the maximal position of residue i+n in the (z,r) plane is known.
[Figure: positions of residue i+4 relative to residues i-1 and i, plotted in the (z,r) plane]
• Leads to a simple set of trigonometric conditions for restraint satisfaction.
• Satisfies pairwise restraints with >90% efficiency at negligible computational cost.
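
The residue buildup idea on the RGP slides can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the (z,r)-plane look-ahead conditions with a plain check made at the moment the second residue of a restrained pair is placed, and it assumes that θ is the bond angle, that φ is the torsion angle, and that the 5.5 Å sphere size acts as a minimum center-to-center distance between non-adjacent residues. The names `place_next`, `grow_chain`, and `generate_ensemble` are hypothetical.

```python
import math
import random

BOND = 3.8                    # l: distance between consecutive residues (from the slide)
THETA = math.radians(120.0)   # theta from the slide; ASSUMED to be the bond angle
PHI_STATES = [math.radians(a) for a in (0, 60, 120, 180, 240, 300)]
MIN_SEP = 5.5                 # ASSUMPTION: sphere size used as minimum center distance

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    norm = math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)
    return (u[0] / norm, u[1] / norm, u[2] / norm)

def place_next(a, b, c, phi):
    """Place a residue after c with fixed bond length and angle and torsion phi."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))   # normal to the plane of a, b, c
    m = cross(n, bc)                 # completes the local orthonormal frame
    x = -BOND * math.cos(THETA)
    y = BOND * math.sin(THETA) * math.cos(phi)
    z = BOND * math.sin(THETA) * math.sin(phi)
    return tuple(c[k] + x * bc[k] + y * m[k] + z * n[k] for k in range(3))

def grow_chain(n_res, restraints, rng):
    """Grow one chain; restraints are (i, j, lo, hi) tuples with i < j and j >= 3."""
    chain = [(0.0, 0.0, 0.0),
             (BOND, 0.0, 0.0),
             (BOND - BOND * math.cos(THETA), BOND * math.sin(THETA), 0.0)]
    while len(chain) < n_res:
        j = len(chain)
        states = PHI_STATES[:]
        rng.shuffle(states)          # try the six torsion states in random order
        for phi in states:
            pos = place_next(chain[-3], chain[-2], chain[-1], phi)
            # Steric check against every residue except the bonded neighbor.
            if any(math.dist(pos, p) < MIN_SEP for p in chain[:-1]):
                continue
            # Restraint check for pairs completed by residue j.
            if any(not (lo <= math.dist(pos, chain[i]) <= hi)
                   for (i, jj, lo, hi) in restraints if jj == j):
                continue
            chain.append(pos)
            break
        else:
            return None              # dead end: no state is acceptable
    return chain

def generate_ensemble(n_res, restraints, n_keep, seed=0, max_tries=10000):
    """Collect up to n_keep chains that satisfy sterics and restraints."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(max_tries):
        chain = grow_chain(n_res, restraints, rng)
        if chain is not None:
            ensemble.append(chain)
        if len(ensemble) == n_keep:
            break
    return ensemble

# Example: 16 residues, one loose 4-10 Å restraint between residues 3 and 12.
ensemble = generate_ensemble(16, [(3, 12, 4.0, 10.0)], n_keep=5, seed=1)
print(len(ensemble), "conformations generated")
```

Checking a restraint only when its second residue is placed wastes the work spent on chains that were already too far apart to recover. The look-ahead described on the Restraint Implementation slide is what avoids that waste.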

Generate-and-Select Hierarchy
Inputs: Amino Acid Sequence; Inter-residue restraints
• RGP Ensemble Generation → <10^4 topologies
• Static Residue Burial Selection → <500 topologies
• Intact Peptide Backbone; Secondary structure prediction
• Dynamic Residue Burial Selection → <20 topologies
• Local Structure Refinement; Additional Restraints → <10 topologies
• Additional Refinement → <5 topologies

LexA Repressor
        RGP Ensemble         Selected Set               Sec. Prediction
        S^a       CRMS^b     s^c    Rank^d   CRMS^e     Rank^f   CRMS^g
N/36    30,0000   6.85 Å     395    24t      7.46 Å     14t      6.67 Å
N/24    5,000     6.57 Å     209    6t       6.76 Å     2t       6.11 Å
N/12    500       6.28 Å     271    1        6.43 Å     7t       4.45 Å
N/6     -         -          44     2        6.13 Å     1t       5.76 Å
Secondary Structure Prediction: PHD. Burkhard Rost & Chris Sander, J. Mol. Biol. 232, 584 (1993).

Myoglobin
        RGP Ensemble         Selected Set               Sec. Prediction
        S         CRMS       s      Rank     CRMS       Rank     CRMS
N/12    50,000    8.95 Å     117    11       8.77 Å     5        7.01 Å
N/6     -         -          23     1        9.28 Å     1        6.30 Å

Inter-Residue Restraints
If the tertiary structure is unknown, how can we generate distance restraints?
• Experimentally determined disulfide bond connectivity
• Use the PHD prediction algorithm to generate loose restraints¹
  – PHD predicts whether each residue will be buried or exposed to solvent
  – Assume the residues with greatest burial form a hydrophobic core
  – Generate a few loose restraints (4-10 Å) between these residues
1. Burkhard Rost & Chris Sander, J. Mol. Biol. 232, 584 (1993).

Tests using loose restraints were run on two proteins (3icb, 1lea).
*All restraints were picked so that they were incorrect
**All restraints were picked so that they were correct
Protein / # Restraints:   3icb  3  1  3  8*  7**  1lea
Energy Cut-Off:           -26    -23    -27    -18    -30    -27
# Selected Structures:    463    460    172    2242   110    330
# Near Native:            4      2      1      1      3      8
Best CRMS (Å):            7.787  7.827  8.300  8.484  7.001  7.001

Local Structure Refinement
• Dynamic Monte Carlo
  – Make small local deformations to the backbone structure
  – Overall topology must be kept intact
  – Use a simple energy function to determine whether a deformation is accepted or rejected
• Fragment Sewing
  – The Isites¹ library is a database of structural fragments widely observed in the Protein Data Bank.
  – Based on sequence homology, Isites generates a list of fragments whose structures are likely to be found in the protein
  – Local structure can be refined by sewing these fragments into the overall structure
1. C. Bystroff & D. Baker, J. Mol. Biol. 281, 565 (1998).

Dynamic Monte Carlo
Local deformations are made by modifying the position of a single residue.
[Figure labels: Axis of rotation; Cα Atoms; Hydrophilic Side Chain; Hydrophobic Side Chain]
• The circle defines the allowed movement based on the fixed geometry of the model.
• The energy function properly orients side chains: hydrophilic groups point outward and hydrophobic groups point inward.

Fragment Sewing
[Figure labels: Segment's original structure; New structure after sewing; Rest of protein]
Overall topology is still intact, but the local structure is now α-helical rather than a random coil.
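
The accept/reject loop described on the Dynamic Monte Carlo slide can be sketched as follows. Several parts are assumptions rather than details from the slides: the axis of rotation is taken to pass through the two neighboring residues, acceptance uses a Metropolis criterion, and `burial_energy` is a toy stand-in for the "simple energy function" that only rewards hydrophobic residues for sitting near the centroid. Steric and topology checks are left out, and the move keeps bond lengths fixed but not the neighboring bond angles.

```python
import math
import random

def rotate_about_axis(p, a, b, angle):
    """Rotate point p by angle about the line through a and b (Rodrigues' formula)."""
    k = [b[i] - a[i] for i in range(3)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    v = [p[i] - a[i] for i in range(3)]
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return tuple(a[i] + v[i] * cos_a + k_cross_v[i] * sin_a
                 + k[i] * k_dot_v * (1.0 - cos_a) for i in range(3))

def burial_energy(chain, hydrophobic):
    """Toy energy (ASSUMPTION): hydrophobic residues favor the centroid, others avoid it."""
    n = len(chain)
    centroid = tuple(sum(p[i] for p in chain) / n for i in range(3))
    return sum((1.0 if h else -1.0) * math.dist(p, centroid)
               for p, h in zip(chain, hydrophobic))

def dynamic_mc(chain, hydrophobic, n_steps, temperature, max_angle, rng):
    """Single-residue rotation moves with Metropolis acceptance (ASSUMED criterion)."""
    chain = list(chain)
    energy = burial_energy(chain, hydrophobic)
    for _ in range(n_steps):
        i = rng.randrange(1, len(chain) - 1)          # end residues stay fixed
        angle = rng.uniform(-max_angle, max_angle)    # small, local deformation
        trial = chain[:]
        trial[i] = rotate_about_axis(chain[i], chain[i - 1], chain[i + 1], angle)
        e_trial = burial_energy(trial, hydrophobic)
        if e_trial <= energy or rng.random() < math.exp(-(e_trial - energy) / temperature):
            chain, energy = trial, e_trial
    return chain, energy

# Example: planar zigzag of 12 residues (3.8 Å bonds, 120° angles), hypothetical pattern.
start = [(k * 3.8 * math.cos(math.radians(30)),
          (k % 2) * 3.8 * math.sin(math.radians(30)),
          0.0) for k in range(12)]
pattern = [k % 3 == 0 for k in range(12)]
final, e_final = dynamic_mc(start, pattern, n_steps=2000, temperature=1.0,
                            max_angle=math.radians(20), rng=random.Random(0))
print("initial energy:", burial_energy(start, pattern), "final energy:", e_final)
```

Rotating a residue about the line through its two neighbors keeps both adjacent bond lengths unchanged, which is one way to read the circle of allowed movement in the slide's figure.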