Modeling the Structures of Macromolecular Assemblies Frank Alber Maya Topf Andrej Šali http://salilab.org/ Depts. of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco UC S F Contents Modeling of assemblies by optimization (5’, Andrej) Representation for modeling, illustrated by NPC (15’, Frank) Comparative modeling (10’, Andrej) Combined comparative modeling and density fitting (10’, Maya) Acknowledgments http://salilab.org Our group/assemblies EM E. coli ribosome Frank Alber Maya Topf Fred Davis Dmitry Korkin M.S. Madhusudhan Damien Devos Marc A. Martí-Renom Mike Kim Wah Chiu Matt Baker Wen Jiang H.Gao J.Sengupta M.Valle A.Korostelev S.Stagg P.Van Roey R.Agrawal S.Harvey M.Chapman J.Frank Bino John Narayanan Eswar Ursula Pieper Min-yi Shen Structural Genomics Stephen Burley (SGX) John Kuriyan (UCB) NY-SGRC Chandrajit Bajaj Yeast NPC Tari Suprapto Julia Kipper Wenzhu Zhang Liesbeth Veenhoff Sveta Dokudovskaya Mike Rout Brian Chait NIH NSF Sinsheimer Foundation A. P. Sloan Foundation Burroughs-Wellcome Fund Merck Genome Res. Inst. Mathers Foundation I.T. Hirschl Foundation The Sandler Family Foundation SUN IBM Intel Structural Genomix Protein and assembly structure by experiment and computation Sali, Earnest, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003. Determining the structures of proteins and assemblies Use structural information from any source: measurement, first principles, rules, resolution: low or high resolution to obtain the set of all models that are consistent with it. Modeling proteins and macromolecular assemblies by satisfaction of spatial restraints 1) Representation of a system. 2) Scoring function (spatial restraints). 3) Optimization. Sali, Earnest, Glaeser, Baumeister. Nature 422, 216-225, 2003. There is nothing but points and restraints on them. Assembly representation for modeling (visualization) Multi-scale representation: Resolution of subunit structures can vary largely. Hierarchical representation: Spatial information about subunit-subunit interactions may be available at different resolutions. Flexibility of subunits: Subunit structures may be subject to conformational changes upon assembly formation. Points and restraints between them (distance, angle dihedral restraints). Software implementation Assembler, extension of MODELLER Provides a framework for assembly modelling: • multi-scale hierarchical representations, • integration of experimental input data, • calculates models that are consistent with input data. Resolution of restraints 300Å 1Å Cross-linking Cross-linking Cross-linking Cross-linking FRET FRET FRET FRET Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Computational docking Computational docking Computational docking Binding site predictions Binding site predictions Correlated mutations Correlated mutations Binding site predictions Cryo-EM/ tomography Computational docking Binding site predictions Cryo-EM/ tomography Tandem affinity purification Cross-linking FRET Site-directed mutagenesis Computational docking Binding site predictions Cryo-EM/ tomography Tandem affinity purification Immuno-EM Yeast-two-hybrid Cryo-EM/ tomography Resolution of restraints 300Å 1Å Cross-linking Cross-linking FRET FRET Site-directed mutagenesis Site-directed mutagenesis Site-directed mutagenesis Cryo-EM/ tomography Cryo-EM/ tomography Cross-linking FRET Computational docking Binding site predictions Cryo-EM/ tomography Cross-linking FRET Site-directed mutagenesis Computational docking Binding site predictions Cryo-EM/ tomography Tandem affinity purification Cross-linking FRET Site-directed mutagenesis Computational docking Binding site predictions Cryo-EM/ tomography Tandem affinity purification Immuno-EM Yeast-two-hybrid Cryo-EM/ tomography Properties of points Interaction centers: Binding patches Define interaction centers at each structural level. Interaction centers transmit interactions between assembly subunits. Properties of points Interaction centers: Subunit flexibility: q Define interaction centers at each structural level. Interaction centers transmit interactions between assembly subunits. q Allow or restrict conformational degrees of freedom within assembly subunits. Design alternative internal restraints sets for different conformational states. Modeling the Yeast Nuclear Pore complex by satisfaction of spatial restraints Andrej Sali Frank Alber, Damien Devos Depts. Of Biopharmaceutical Sciences and Pharmaceutical Chemistry California Institute for Quantitative Biomedical Research University of California at San Francisco http://salilab.org/ Mike Rout, T. Suprapto, J. Kipper, L. Veenhoff, S. Dokudovskaya Brian Chait, W. Zhang The Rockefeller University, 1230 York Avenue, New York Integrate spatial information 1) NUP Localization NUP- NUP Interactions Symmetry Global shape NUP Shape NUP Stoichiometry 2) 3) Successful optimization Analysis of models Assessing the well scoring models: How similar are the models to each other? Do the models make sense given other data? Search for conservation of : Protein-protein contacts Structural features Visualization of the results: Probability distribution of subunit positions (density maps) Comparative modeling Steps in Comparative Protein Structure Modeling START TARGET Template Search ASILPKRLFGNCEQTSDEGLK IERTPLVPHISAQNVCLKIDD VPERLIPERASFQWMNDK Target – Template Alignment TEMPLATE ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE Model Building Model Evaluation No OK? Yes END A. Šali, Curr. Opin. Biotech. 6, 437, 1995. R. Sánchez & A. Šali, Curr. Opin. Str. Biol. 7, 206, 1997. M. Marti et al. Ann. Rev. Biophys. Biomolec. Struct., 29, 291, 2000. http://salilab.org/ Comparative modeling by satisfaction of spatial restraints MODELLER 3D GKITFYERGFQGHCYESDC-NLQP… SEQ GKITFYERG---RCYESDCPNLQP… 1. Extract spatial restraints 2. Satisfy spatial restraints F(R) = P pi (fi /I) i A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993. J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994. A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000. http://salilab.org/ Comparative modeling of the TrEMBL database Unique sequences processed: 1,182,126 Sequences with fold assignments or models: 659,495 (56%) 70% of models based on <30% sequence identity to template. On average, only a domain per protein is modeled (an “average” protein has 2.5 domains of 175 aa). 9/15/03 ~4 weeks on 660 Pentium CPUs http://salilab.org/modbase Pieper et al., Nucl. Acids Res. 2002. Structural Genomics Sali. Nat. Struct. Biol. 5, 1029, 1998. Sali et al. Nat. Struct. Biol., 7, 986, 2000. Sali. Nat. Struct. Biol. 7, 484, 2001. Baker & Sali. Science 294, 93, 2001. Characterize most protein sequences based on related known structures. The number of “families” is much smaller than the number of proteins. Any one of the members of a family is fine. There are ~16,000 30% seq id families (90%) (Vitkup et al. Nat. Struct. Biol. 8, 559, 2001). Comparative modeling and EM density fitting E. coli ribosome H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003. 1. Cryo-EM density map of the 70S ribosome of E. coli was determined at ~12Å. 2. Calculate comparative models of the proteins and rRNA based on T. thermophilus, H. marismortui, and D. radiodurans ribosome structures. 3. For each protein, select the best model among several alternatives based on model assessment by statistical potentials, model compactness, targettemplate sequence identity, quality of template structure, and fit into the EM map. 4. Obtain an assembly model by multi-rigid body real-space refinement of the fit into the EM map, by RSRef (M. Chapman). 5. Repeat 3 and 4 iteratively to get an optimized model. E. coli ribosome Fitting of comparative models into 12Å cryo-electron density map. All 29 proteins in 30S and 29 of 36 proteins in the 50S subunit could be modeled on 22-73% seq.id. to a known structure. The models generally cover whole folds, but tails. H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003. Errors in comparative models vs. resolution Incorrect template Rigid body movement 20Å Maps courtesy of Wah Chiu. Misalignment 10Å Region without a template Distortion/shifts in aligned regions Sidechain packing 2Å Moulding: iterative alignment, model building, model assessment alignment model building model assessment Models per alignment B. John, A. Sali. Nucl. Acids Res. 31, 3982, 2003. 105 Comparative modeling 104 Moulding Threading 1 1 104 1030 Alignments Moulding Start Sequence profiles of target & template Generate 25 initial target-template alignments Rank alignments by alignment score (top 15 parents) Build all atom models for 15 parent alignments Rank models by the GA341 score Best GA341 score < 0.6? No Apply genetic algorithm operators to parents Number of child alignments >300? No No Select representative alignments Build all-atom models for all representative alignments Select best 10 models by statistical potential score (parents for next iteration) alignment alignment Number of iterations >25? model building Rank models by composite score Select best model Stop model assessment Application to a difficult modeling case 1BOV-1LTS (4.4% sequence identity) final initial Ca RMSD 3.6Å Ca RMSD 10.1Å 1lts structure 1lts model Given a sequence-structure pair and an EM map: Exploring alignment (CM) and rigid body (EM) degrees of freedom 1. Generate a set of comparative models by a standard procedure. 2. Select the model that fits the EM map best. More generally: Use the quality of the best fit into the EM map as one of the model fitness criteria in MOULDER. With Wah Chiu et al. Moulding protocol S fold assignment helixhunter, sheehunter PROF, PSIPRED Initial alignment MODELLER alignment MODELLER model building foldhunter, EM docking algorithm model fitting GA341 score PROSA score EM fitting score alignment model assessment model building model assessment E Application: Rice Dwarf Virus (6.8Å EM) Fold assignment: HELIXHUNTER and DEJAVU - 2 templates: Upper domain (beta sheet) - bluetongue virus VP7 (1bvp) Lower domain (helical) - rotavirus VP6 (1qhd) Generate initial alignments : HELIXHUNTER and secondary structure prediction methods MOULDING: Iterative alignment - restrained by EM data Model building with MODELLER, loop optimization Model assessment (GA341 score and PROSA score) Model fitting (FOLDHUNTER) With Wah Chiu et al. Upper domain - good fit loop optimization: EM-derived restraints + MM forcefield Lower domain - to improve fit MODELLERderived restraints model fitting EM-derived restraints best fitted structure END Acknowledgments http://salilab.org Comparative Modeling Bino John Narayanan Eswar Ursula Pieper Marc A. Martí-Renom Ribosomes H.Gao J.Sengupta M.Valle A.Korostelev S.Stagg P.Van Roey R.Agrawal S.Harvey M.Chapman J.Frank LSD Patsy Babbitt, Fred Cohen, Ken Dill, Tom Ferrin, John Irwin, Matt Jacobson, Tanja Kortemme, Tack Kuntz, Brian Shoichet, Chris Voigt Structural Genomics Stephen Burley (SGX) John Kuriyan (UCB) NY-SGRC NIH NSF Sinsheimer Foundation A. P. Sloan Foundation Burroughs-Wellcome Fund Merck Genome Res. Inst. Mathers Foundation I.T. Hirschl Foundation The Sandler Family Foundation SUN IBM Intel Structural Genomix Analysis Assessing the well scoring models How similar are the models to each other? Do the models make sense given other data? Using “toy” models as benchmarks. Score Analysis frequency Protein-protein contacts NUP type Search conservation of : NUP type Structural features E. coli ribosome H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal, S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003. 1. … 2. Calculate comparative models of the proteins and rRNA based on T. thermophilus, H. marismortui, and D. radiodurans ribosome structures. 3. For each protein, select the best model among several alternatives based on model assessment by statistical potentials, model compactness, target-template sequence identity, quality of template structure, and fit into the EM map. 4. Obtain the final assembly model by rigid body real-space refinement of the fit into the EM map, by RSRef (M. Chapman). 5. Repeat 3 and 4. Combining Modeller and Docking of Atomic Models into EM Density Maps (ModEM?) Improving resolution and accuracy of molecular models determined by cryo-EM and comparative protein structure prediction Map courtesy of Wah Chiu. Typical errors in comparative models Incorrect template Misalignment MODEL X-RAY TEMPLATE Region without a template Distortion/shifts in aligned regions Marti-Renom et al. Annu.Rev.Biophys.Biomol.Struct. 29, 291-325, 2000. Sidechain packing