Sali Assemblies Wah 11Oct03

Modeling the Structures of
Macromolecular Assemblies
Frank Alber
Maya Topf
Andrej Šali
http://salilab.org/
Depts. of Biopharmaceutical Sciences and Pharmaceutical Chemistry
California Institute for Quantitative Biomedical Research
University of California at San Francisco
Contents
Modeling of assemblies by optimization (5’, Andrej)
Representation for modeling, illustrated by NPC (15’, Frank)
Comparative modeling (10’, Andrej)
Combined comparative modeling and density fitting (10’, Maya)
Acknowledgments
http://salilab.org
Our group/assemblies
EM
E. coli ribosome
Frank Alber
Maya Topf
Fred Davis
Dmitry Korkin
M.S. Madhusudhan
Damien Devos
Marc A. Martí-Renom
Mike Kim
Wah Chiu
Matt Baker
Wen Jiang
H.Gao
J.Sengupta
M.Valle
A.Korostelev
S.Stagg
P.Van Roey
R.Agrawal
S.Harvey
M.Chapman
J.Frank
Bino John
Narayanan Eswar
Ursula Pieper
Min-yi Shen
Structural Genomics
Stephen Burley (SGX)
John Kuriyan (UCB)
NY-SGRC
Chandrajit Bajaj
Yeast NPC
Tari Suprapto
Julia Kipper
Wenzhu Zhang
Liesbeth Veenhoff
Sveta Dokudovskaya
Mike Rout
Brian Chait
NIH
NSF
Sinsheimer Foundation
A. P. Sloan Foundation
Burroughs-Wellcome Fund
Merck Genome Res. Inst.
Mathers Foundation
I.T. Hirschl Foundation
The Sandler Family Foundation
SUN
IBM
Intel
Structural Genomix
Protein and assembly structure by
experiment and computation
Sali, Earnest, Glaeser, Baumeister. From words to literature in structural proteomics. Nature 422, 216-225, 2003.
Determining the structures of proteins
and assemblies
Use structural information from any
source (measurement, first principles, rules)
and at any resolution (low or high)
to obtain the set of all models that are consistent with it.
Modeling proteins and macromolecular assemblies
by satisfaction of spatial restraints
1) Representation of a system.
2) Scoring function (spatial restraints).
3) Optimization.
Sali, Earnest, Glaeser, Baumeister. Nature 422, 216-225, 2003.
There is nothing but points and
restraints on them.
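The three steps above (representation, scoring function, optimization) can be illustrated with a toy example. This is a minimal sketch, not the MODELLER implementation: the three points, the harmonic form of the restraints, the target distances, and the steepest-descent optimizer are all assumptions chosen for illustration.

```python
import math
import random

def score(points, restraints):
    """Scoring function: sum of harmonic penalties over all distance restraints."""
    total = 0.0
    for i, j, target, weight in restraints:
        d = math.dist(points[i], points[j])
        total += weight * (d - target) ** 2
    return total

def gradient(points, restraints):
    """Analytical gradient of the score with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in points]
    for i, j, target, weight in restraints:
        d = math.dist(points[i], points[j])
        if d == 0.0:
            continue  # direction is undefined for coincident points
        factor = 2.0 * weight * (d - target) / d
        for k in range(3):
            delta = factor * (points[i][k] - points[j][k])
            grad[i][k] += delta
            grad[j][k] -= delta
    return grad

def optimize(points, restraints, step=0.05, n_steps=2000):
    """Plain steepest descent; returns new coordinates, input is left untouched."""
    pts = [list(p) for p in points]
    for _ in range(n_steps):
        g = gradient(pts, restraints)
        for p, gp in zip(pts, g):
            for k in range(3):
                p[k] -= step * gp[k]
    return pts

# 1) Representation: three points with random starting coordinates.
rng = random.Random(0)
start = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(3)]

# 2) Scoring function: (i, j, target distance, weight); hypothetical values.
restraints = [(0, 1, 3.0, 1.0), (1, 2, 4.0, 1.0), (0, 2, 5.0, 1.0)]

# 3) Optimization: move the points until the restraints are satisfied.
model = optimize(start, restraints)
print(f"score before: {score(start, restraints):.3f}")
print(f"score after:  {score(model, restraints):.6f}")
```

Starting the optimization from different random coordinates yields different models that satisfy the same restraints, which is one way to sample the set of models consistent with the input.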
Assembly representation for modeling
(visualization)
Multi-scale representation:
Resolution of subunit structures can vary widely.
Hierarchical representation:
Spatial information about subunit-subunit interactions may be
available at different resolutions.
Flexibility of subunits:
Subunit structures may be subject to conformational changes
upon assembly formation.
Points and restraints between them (distance, angle, and dihedral
restraints).
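The three geometric features named above can be computed directly from point coordinates. A minimal sketch using only the standard library; the function names are illustrative and the dihedral sign follows the common atan2 convention, which is an assumption about the convention in use.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def distance(p1, p2):
    """Distance between two points."""
    return norm(sub(p1, p2))

def angle(p1, p2, p3):
    """Angle at p2 formed by p1-p2-p3, in degrees."""
    u, v = sub(p1, p2), sub(p3, p2)
    cosine = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def dihedral(p1, p2, p3, p4):
    """Dihedral angle about the p2-p3 axis, in degrees, in (-180, 180]."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = norm(b2) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Four points in a planar "cis" arrangement give a dihedral of zero.
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
```

A restraint is then a penalty on the difference between one of these features and its target value.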
Software implementation
Assembler, extension of MODELLER
Provides a framework for assembly modeling:
• multi-scale hierarchical representations,
• integration of experimental input data,
• calculation of models that are consistent with the input data.
Resolution of restraints
[Figure: sources of spatial restraints placed along a resolution scale from 300Å to 1Å: cross-linking, FRET, site-directed mutagenesis, computational docking, binding site predictions, correlated mutations, cryo-EM/tomography, tandem affinity purification, immuno-EM, yeast two-hybrid.]
Properties of points
Interaction centers:
Define interaction centers at each
structural level.
Interaction centers transmit interactions
between assembly subunits.
[Figure label: binding patches]
Subunit flexibility:
Allow or restrict conformational degrees
of freedom within assembly subunits.
Design alternative internal restraint sets
for different conformational states.
Modeling the Yeast Nuclear Pore
complex by satisfaction of spatial
restraints
Andrej Sali
Frank Alber, Damien Devos
Depts. of Biopharmaceutical Sciences and Pharmaceutical Chemistry
California Institute for Quantitative Biomedical Research
University of California at San Francisco
http://salilab.org/
Mike Rout,
T. Suprapto, J. Kipper, L. Veenhoff,
S. Dokudovskaya
Brian Chait,
W. Zhang
The Rockefeller University,
1230 York Avenue, New York
Integrate spatial information
[Figure: three numbered stages, 1) to 3), integrating the following inputs: NUP localization, NUP-NUP interactions, symmetry, global shape, NUP shape, NUP stoichiometry.]
Successful optimization
Analysis of models
Assessing the well-scoring models:
How similar are the models to each other?
Do the models make sense given other data?
Search for conservation of:
Protein-protein contacts
Structural features
Visualization of the results:
Probability distribution of subunit positions
(density maps)
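The two analysis questions above can be made concrete on an ensemble of well-scoring models. The sketch below is an illustration under assumptions: models are lists of subunit positions, similarity is measured by a distance-matrix RMSD (which needs no superposition), and the localization density assumes all models already share a common frame. The function names and the voxel size are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def drmsd(model_a, model_b):
    """Distance-matrix RMSD: compares internal distances of two models."""
    pairs = list(combinations(range(len(model_a)), 2))
    squared = sum(
        (math.dist(model_a[i], model_a[j]) - math.dist(model_b[i], model_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(squared / len(pairs))

def localization_density(models, subunit, voxel=10.0):
    """Fraction of models that place the given subunit in each voxel."""
    counts = Counter(
        tuple(math.floor(c / voxel) for c in model[subunit]) for model in models
    )
    return {cell: n / len(models) for cell, n in counts.items()}

# Toy ensemble of three models, each with three subunit positions (in Å).
m1 = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
m2 = [(5.0, 5.0, 5.0), (15.0, 5.0, 5.0), (5.0, 15.0, 5.0)]  # m1 translated
m3 = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 10.0, 0.0)]  # subunit 1 moved
models = [m1, m2, m3]

print(drmsd(m1, m2), drmsd(m1, m3))
print(localization_density(models, subunit=1))
```

A narrow localization density for a subunit indicates that the input restraints determine its position well; a broad one indicates that more data are needed.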
Comparative modeling
Steps in Comparative Protein Structure Modeling
START
Target sequence:
ASILPKRLFGNCEQTSDEGLK
IERTPLVPHISAQNVCLKIDD
VPERLIPERASFQWMNDK
1. Template Search
2. Target-Template Alignment
TARGET   ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE
TEMPLATE MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE
3. Model Building
4. Model Evaluation
OK? If no, iterate; if yes, END.
A. Šali, Curr. Opin. Biotech. 6, 437, 1995.
R. Sánchez & A. Šali, Curr. Opin. Str. Biol. 7, 206, 1997.
M. A. Martí-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291, 2000.
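The target-template alignment shown on this slide can be used to illustrate how sequence identity is measured. This is a minimal sketch; counting identities over columns where neither sequence has a gap is one common definition and is an assumption here, as other denominators are also in use.

```python
def percent_identity(target, template):
    """Percent identical residues over aligned columns without gaps."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

# The aligned pair from the slide (target on top, template below).
target = "ASILPKRLFGNCEQTSDEGLKIERTPLVPHISAQNVCLKIDDVPERLIPE"
template = "MSVIPKRLYGNCEQTSEEAIRIEDSPIV---TADLVCLKIDEIPERLVGE"

identity = percent_identity(target, template)
print(f"{identity:.1f}% identity over gap-free columns")
```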
Comparative modeling by satisfaction of spatial restraints
MODELLER
3D GKITFYERGFQGHCYESDC-NLQP…
SEQ GKITFYERG---RCYESDCPNLQP…
1. Extract spatial restraints
2. Satisfy spatial restraints
$F(R) = \prod_i p_i(f_i / I)$
A. Šali & T. Blundell. J. Mol. Biol. 234, 779, 1993.
J.P. Overington & A. Šali. Prot. Sci. 3, 1582, 1994.
A. Fiser, R. Do & A. Šali, Prot. Sci., 9, 1753, 2000.
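The product of restraint terms on this slide is usually optimized in logarithmic form, because maximizing a product of probability densities is equivalent to minimizing the sum of their negative logarithms. The sketch below illustrates this with Gaussian densities on three features; the Gaussian form, the feature names, and the means and standard deviations are assumptions for illustration, not the restraints MODELLER derives.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normal probability density, used here as a stand-in restraint term."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def molecular_pdf(features, restraints):
    """Product over all restraints of p_i(f_i)."""
    value = 1.0
    for name, (mean, sigma) in restraints.items():
        value *= gaussian_pdf(features[name], mean, sigma)
    return value

def objective(features, restraints):
    """Negative logarithm of the product, written as a sum over restraints."""
    return -sum(
        math.log(gaussian_pdf(features[name], mean, sigma))
        for name, (mean, sigma) in restraints.items()
    )

# Hypothetical restraints: feature name -> (mean, standard deviation).
restraints = {
    "ca_ca_distance": (3.8, 0.1),
    "bond_angle": (110.0, 5.0),
    "dihedral": (-60.0, 15.0),
}
good_model = {"ca_ca_distance": 3.8, "bond_angle": 111.0, "dihedral": -55.0}
poor_model = {"ca_ca_distance": 4.3, "bond_angle": 125.0, "dihedral": 10.0}

print(objective(good_model, restraints), objective(poor_model, restraints))
```

Working with the sum avoids numerical underflow when the number of restraints is large.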
Comparative modeling of the TrEMBL database
Unique sequences processed: 1,182,126
Sequences with fold assignments or models: 659,495 (56%)
70% of models based on <30% sequence identity to template.
On average, only one domain per protein is modeled
(an “average” protein has 2.5 domains of 175 aa).
9/15/03
~4 weeks on 660 Pentium CPUs
http://salilab.org/modbase
Pieper et al., Nucl. Acids Res. 2002.
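The percentages on this slide follow from the quoted counts. The short calculation below reproduces them; the last line, the fraction of residues covered, assumes that exactly one average-size domain is modeled per average protein, which is a simplification of the statement above.

```python
unique_sequences = 1_182_126
modeled_sequences = 659_495

# Fraction of sequences with a fold assignment or model.
coverage_percent = 100.0 * modeled_sequences / unique_sequences

# An "average" protein: 2.5 domains of 175 residues each.
domains_per_protein = 2.5
residues_per_domain = 175
average_protein_length = domains_per_protein * residues_per_domain

# Assumption: one average-size domain modeled per average protein.
modeled_residue_fraction = residues_per_domain / average_protein_length

print(f"sequence coverage: {coverage_percent:.1f}%")
print(f"average protein length: {average_protein_length} aa")
print(f"residues covered per modeled protein: {modeled_residue_fraction:.0%}")
```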
Structural Genomics
Sali. Nat. Struct. Biol. 5, 1029, 1998.
Sali et al. Nat. Struct. Biol., 7, 986, 2000.
Sali. Nat. Struct. Biol. 7, 484, 2001.
Baker & Sali. Science 294, 93, 2001.
Characterize most protein sequences based on related known
structures.
The number of “families” is
much smaller than the
number of proteins.
Any one of the members
of a family is fine.
There are ~16,000 30% seq id families (90%)
(Vitkup et al. Nat. Struct. Biol. 8, 559, 2001).
Comparative modeling and EM density fitting
E. coli ribosome
H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal,
S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003.
1. A cryo-EM density map of the 70S ribosome of E. coli was determined at ~12Å.
2. Calculate comparative models of the proteins and rRNA based on T.
thermophilus, H. marismortui, and D. radiodurans ribosome structures.
3. For each protein, select the best model among several alternatives based on
model assessment by statistical potentials, model compactness, target-template
sequence identity, quality of template structure, and fit into the EM
map.
4. Obtain an assembly model by multi-rigid body real-space refinement of the fit
into the EM map, by RSRef (M. Chapman).
5. Repeat 3 and 4 iteratively to get an optimized model.
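Step 3 selects one model per protein from several alternatives using multiple criteria. The slide does not say how the criteria are combined, so the sketch below assumes a weighted sum of criteria that have each been rescaled to the range 0 to 1 with higher values being better. The class, the weights, and the candidate values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """One alternative model; every criterion is rescaled to 0..1, higher is better."""
    name: str
    statistical_potential: float
    compactness: float
    sequence_identity: float
    template_quality: float
    em_fit: float

# Hypothetical equal weights; the original work may have combined criteria differently.
WEIGHTS = {
    "statistical_potential": 0.2,
    "compactness": 0.2,
    "sequence_identity": 0.2,
    "template_quality": 0.2,
    "em_fit": 0.2,
}

def composite_score(candidate, weights=WEIGHTS):
    """Weighted sum of the rescaled criteria."""
    return sum(w * getattr(candidate, key) for key, w in weights.items())

def select_best(candidates, weights=WEIGHTS):
    """Return the candidate with the highest composite score."""
    return max(candidates, key=lambda c: composite_score(c, weights))

candidates = [
    Candidate("model_1", 0.60, 0.70, 0.35, 0.80, 0.62),
    Candidate("model_2", 0.70, 0.80, 0.30, 0.80, 0.75),
]
best = select_best(candidates)
print(best.name, round(composite_score(best), 3))
```

In the iterative scheme of step 5, the EM fit of each candidate would be recomputed after every rigid-body refinement before the selection is repeated.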
E. coli ribosome
Fitting of comparative models
into 12Å cryo-electron density
map.
All 29 proteins in 30S and 29 of
36 proteins in the 50S subunit
could be modeled on 22-73%
seq.id. to a known structure.
The models generally cover
whole folds, except for the tails.
H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal,
S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003.
Errors in comparative models vs. resolution
[Figure: types of errors in comparative models shown against EM map resolution (20Å, 10Å, 2Å): incorrect template, rigid body movement, misalignment, region without a template, distortion/shifts in aligned regions, sidechain packing.]
Maps courtesy of Wah Chiu.
Moulding: iterative alignment,
model building, model assessment
[Plot: models per alignment (axis ticks 1, 10^4, 10^5) against number of alignments (axis ticks 1, 10^4, 10^30), with regions labeled Comparative modeling, Moulding, and Threading.]
B. John, A. Sali. Nucl. Acids Res. 31, 3982, 2003.
Moulding
Start
Sequence profiles of target & template
Generate 25 initial target-template alignments
Rank alignments by alignment score (top 15 parents)
Build all-atom models for 15 parent alignments
Rank models by the GA341 score
Decision: best GA341 score < 0.6?
Apply genetic algorithm operators to parents
Decision: number of child alignments > 300?
Select representative alignments
Build all-atom models for all representative alignments
Select best 10 models by statistical potential score (parents for next iteration)
Decision: number of iterations > 25?
Rank models by composite score
Select best model
Stop
[Flowchart: boxes are grouped into three stages labeled alignment, model building, and model assessment.]
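The flowchart describes a genetic algorithm over alignments. The sketch below shows the general shape of such a loop using the population sizes from the slide (25 initial alignments, 15 parents, 300 children, 10 survivors, 25 iterations). Everything else is an assumption for illustration: an alignment is encoded as a tuple of integer shifts, model building and assessment are replaced by a toy fitness function, and the GA341 threshold test is omitted because its branches are not recoverable from the slide.

```python
import random

def toy_fitness(alignment, reference):
    """Stand-in for model building plus assessment: fraction of matching positions."""
    return sum(a == r for a, r in zip(alignment, reference)) / len(reference)

def crossover(a, b, rng):
    """Single-point crossover of two parent alignments."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(a, rng, max_shift=3):
    """Replace the shift at one random position."""
    i = rng.randrange(len(a))
    return a[:i] + (rng.randint(-max_shift, max_shift),) + a[i + 1:]

def mould(reference, n_initial=25, n_parents=15, n_children=300,
          n_keep=10, n_iterations=25, seed=0):
    """Evolve alignments; returns the best one and the best fitness per iteration."""
    rng = random.Random(seed)

    def fitness(alignment):
        return toy_fitness(alignment, reference)

    initial = [tuple(rng.randint(-3, 3) for _ in reference) for _ in range(n_initial)]
    parents = sorted(initial, key=fitness, reverse=True)[:n_parents]
    history = [fitness(parents[0])]
    for _ in range(n_iterations):
        children = []
        for _ in range(n_children):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Parents compete with children, so the best fitness never decreases.
        ranked = sorted(parents + children, key=fitness, reverse=True)
        parents = ranked[:n_keep]
        history.append(fitness(parents[0]))
    return parents[0], history

# Hypothetical "correct" alignment, known only to the toy fitness function.
reference = (0, 1, -2, 3, 0, -1, 2, 0)
best, history = mould(reference)
print(best, history[0], history[-1])
```

In the real protocol the fitness of an alignment is not known in advance; it is estimated by building a model from the alignment and assessing that model.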
Application to a difficult modeling case
1BOV-1LTS (4.4% sequence identity)
initial model: Cα RMSD 10.1Å
final model: Cα RMSD 3.6Å
1lts structure
1lts model
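The Cα RMSD values compare a model with the experimental structure. A minimal sketch of the calculation follows; it assumes the two coordinate sets list equivalent Cα atoms in the same order and have already been superposed, since the superposition step is omitted here.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in Å: the second set is displaced by 3 Å along z.
structure = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0), (7.6, 0.0, 3.0)]
print(ca_rmsd(structure, model))
```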
Exploring alignment (CM) and rigid body (EM)
degrees of freedom
Given a sequence-structure pair and an EM map:
1. Generate a set of comparative models by a standard procedure.
2. Select the model that fits the EM map best.
More generally:
Use the quality of the best fit into the EM map as one of the
model fitness criteria in MOULDER.
With Wah Chiu et al.
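Selecting the model that fits the EM map best requires a fit measure. The sketch below uses the correlation coefficient between the EM density and a density simulated from each model, both sampled on the same grid and flattened to lists. The choice of measure, the function names, and the toy densities are assumptions; simulating a density from atomic coordinates and searching over rigid-body placements are omitted.

```python
import math

def cross_correlation(map_a, map_b):
    """Pearson correlation between two densities sampled on the same grid."""
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and of equal size")
    n = len(map_a)
    mean_a, mean_b = sum(map_a) / n, sum(map_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_a * var_b)

def best_fitting_model(em_map, model_maps):
    """Name of the model whose simulated density correlates best with the EM map."""
    return max(model_maps, key=lambda name: cross_correlation(em_map, model_maps[name]))

# Toy one-dimensional "maps"; real maps are three-dimensional grids.
em_map = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
model_maps = {
    "model_a": [0.0, 1.0, 3.0, 1.0, 0.0, 0.0],
    "model_b": [3.0, 0.0, 0.0, 0.0, 1.0, 1.0],
}
print(best_fitting_model(em_map, model_maps))
```

Used as one term of the model fitness, this score lets the map discriminate between alternative alignments as well as between alternative placements.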
Moulding protocol
Start
Fold assignment: helixhunter, sheehunter
Initial alignment: PROF, PSIPRED
Alignment: MODELLER
Model building: MODELLER
Model fitting: foldhunter, EM docking algorithm
Model assessment: GA341 score, PROSA score, EM fitting score
(alignment, model building, and model assessment are iterated)
End
Application: Rice Dwarf Virus (6.8Å EM)
Fold assignment: HELIXHUNTER and DEJAVU - 2 templates:
Upper domain (beta sheet) - bluetongue virus VP7 (1bvp)
Lower domain (helical) - rotavirus VP6 (1qhd)
Generate initial alignments: HELIXHUNTER and secondary
structure prediction methods
MOULDING:
Iterative alignment - restrained by EM data
Model building with MODELLER, loop optimization
Model assessment (GA341 score and PROSA score)
Model fitting (FOLDHUNTER)
With Wah Chiu et al.
Upper domain - good fit
loop optimization: EM-derived restraints + MM forcefield
Lower domain - to improve fit
MODELLER-derived restraints
model fitting
EM-derived restraints
best fitted structure
END
Acknowledgments
http://salilab.org
Comparative Modeling
Bino John
Narayanan Eswar
Ursula Pieper
Marc A. Martí-Renom
Ribosomes
H.Gao
J.Sengupta
M.Valle
A.Korostelev
S.Stagg
P.Van Roey
R.Agrawal
S.Harvey
M.Chapman
J.Frank
LSD
Patsy Babbitt, Fred Cohen,
Ken Dill, Tom Ferrin, John Irwin,
Matt Jacobson, Tanja Kortemme,
Tack Kuntz, Brian Shoichet,
Chris Voigt
Structural Genomics
Stephen Burley (SGX)
John Kuriyan (UCB)
NY-SGRC
NIH
NSF
Sinsheimer Foundation
A. P. Sloan Foundation
Burroughs-Wellcome Fund
Merck Genome Res. Inst.
Mathers Foundation
I.T. Hirschl Foundation
The Sandler Family Foundation
SUN
IBM
Intel
Structural Genomix
Analysis
Assessing the well-scoring models
How similar are the models to each other?
Do the models make sense given other data?
Using “toy” models as benchmarks.
Search for conservation of:
Protein-protein contacts
Structural features
[Plots: score; frequency by NUP type]
E. coli ribosome
H.Gao, J.Sengupta, M.Valle, A. Korostelev, N.Eswar, S.Stagg, P.Van Roey, R.Agrawal,
S.Harvey, A.Sali, M.Chapman, and J.Frank. Cell 113, 789-801, 2003.
1. …
2. Calculate comparative models of the proteins and rRNA based on T.
thermophilus, H. marismortui, and D. radiodurans ribosome structures.
3. For each protein, select the best model among several alternatives
based on model assessment by statistical potentials, model
compactness, target-template sequence identity, quality of template
structure, and fit into the EM map.
4. Obtain the final assembly model by rigid body real-space refinement of
the fit into the EM map, by RSRef (M. Chapman).
5. Repeat 3 and 4.
Combining Modeller and Docking of Atomic
Models into EM Density Maps (ModEM?)
Improving resolution and accuracy of
molecular models determined by cryo-EM
and comparative protein structure prediction
Map courtesy of Wah Chiu.
Typical errors in comparative models
Incorrect template
Misalignment
Region without a template
Distortion/shifts in aligned regions
Sidechain packing
[Figure legend: MODEL, X-RAY, TEMPLATE]
Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29, 291-325, 2000.