Structural bioinformatics for
glycobiology
Structural glycoinformatics approaches
• Structural modeling
– Comparative modeling of glycoproteins
– Complex modeling: glycoprotein replacement
• Modeling of the complex of glycans and GBPs and GTs:
– docking
– Analysis of interaction specificities
• Key residues vs. Specific glycan conformations
• Molecular Dynamics
– Modeling the dynamics of the recognition of glycans by
GBPs
– Modeling the enzymology of GTs: quantum mechanical
calculations
Approaches to predicting protein structures
• Obtain sequence (target)
• Sequence-sequence alignment or sequence-structure alignment (fold assignment)
– High identity, long alignment: comparative modeling
– Low identity, fragment alignment: ab initio modeling
• Build, assess model
Comparative modeling of proteins
• Definition:
Prediction of the three-dimensional structure of a target protein from its
amino acid sequence (primary structure), using as a template a homologous
protein for which an X-ray or NMR structure is available.
• Why a Model:
A model is desirable when neither X-ray crystallography nor NMR
spectroscopy can determine the structure of a protein in time, or
at all. The built model shows how the protein functions at the level of
individual residues, e.g. the interactions with ligands, such as those of
GBPs/GTs with glycans.
Comparative Modeling
(or homology modeling)
Target sequence (structure unknown, "?"):
KQFTKCELSQNLYDIDGYGRIALPELICTMF
HTSGYDTQAIVENDESTEYGLFQISNALWCK
SSQSPQSRNICDITCDKFLDDDITDDIMCAK
KILDIKGIDYWIAHKALCTEKLEQWLCEKE
Homologous sequence (shares similar sequence; used as template and model):
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK
FESNFNTQATNRNTDGSTDYGILQINSRWWCND
GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
PDB entries shown: 1alc, 8lyz
Homology models can be very smart!
Homology models have RMSDs less than 2Å more than 70% of the time.
Sequence similarity implies structural similarity?
[Figure: percentage sequence identity/similarity (0 to 100) plotted against number of residues aligned (0 to 250). Two regions are marked: "Sequence identity implies structural similarity" and the "Don't know" region. (B. Rost, Columbia, New York)]
Step 1: Fold Identification
Aim: To find one or more template structures in the Protein Data Bank (PDB)
Pairwise sequence alignment: finds sequences with high homology (BLAST)
Fold recognition programs: find sequences with low homology (threading,
profile-profile alignment)
Improved multiple sequence alignment methods increase
sensitivity to remote homologs (PSI-BLAST, CLUSTAL)
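As a rough illustration of what pairwise sequence alignment does, the sketch below aligns two sequences globally (Needleman-Wunsch) and reports percent identity over the aligned positions. It is not how BLAST works internally (BLAST uses heuristic local alignment), and the scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration rather than a real substitution matrix.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores).

    Returns the two gapped sequences as strings of equal length.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Percent of identical residues among columns without gaps."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```

For example, `percent_identity(*needleman_wunsch("KVFGR", "KQFTK"))` compares the first five residues of the two sequences shown earlier. In practice a substitution matrix such as BLOSUM62 and affine gap penalties would replace the simple scores assumed here.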
Step 2: Model Construction
Aim: To build the three-dimensional (3D) structure of the target protein, i.e. the
coordinates of every atom, from the homologous template
Approach 1: protein structure buildup: cores, loops and sidechains;
Approach 2: whole protein modeling: constraint-based optimization.
Commonly used programs:
Modeller (http://salilab.org/modeller/)
Swiss-model (http://swissmodel.expasy.org/)
Geno3D (http://geno3d-pbil.ibcp.fr/)
……
Step 3: Model Assessment
Modeling of glycan-protein complexes
• Template: glycan-protein complex;
– Case 1: same glycan, different protein
• Glycoprotein replacement: comparative modeling of protein
structure
• Energy minimization, allowing structural flexibility of glycans
– Case 2: same protein, different glycan
• Flexible docking of glycans
– Case 3: different protein and different glycan
• Comparative modeling of proteins
• Flexible docking of glycan
• Can also be applied without a template of complex
Flexible docking
• Semi-flexible (rigid protein, flexible ligand)
– Useful for drug screening
– >150 programs: Dock, AutoDock, FlexX/FlexE, …
• Flexible protein: mainly sidechains (hard)
• Two elements of semi-flexible docking algorithms
– ligand sampling methods
• Pattern matching: Genetic Algorithm, Molecular Dynamics, Monte
Carlo…
– Treatment of intermolecular forces:
• Simplified scoring functions: empirical, knowledge-based and
molecular mechanics e.g. AMBER, CHARMM, GROMOS, ...
• Very simple treatment of solvation and entropy, or completely
ignored!
Flexible docking of glycans to proteins
• Glycan structure sampling
– Automatic generation / sampling of 3D glycan
structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
• Docking of each glycan conformation to the GBP:
Scoring schemes
– Empirical scores
– Forcefield
• GLYCAM: modified AMBER forcefield / MD tools for glycans
(R. Woods group)
– Challenge: water molecules
Flexibility of molecules
• Atoms connected
by covalent bonds
• Bond lengths and
bond angles are
rigid
• Torsion (dihedral)
angles are flexible
Frequently used definitions of
glycosidic torsion angles
Angle               NMR style        C−1 crystallographic style   C+1 crystallographic style
ϕ                   H1—C1—O—C′x      O5—C1—O—C′x                  O5—C1—O—C′x
ψ                   C1—O—C′x—H′x     C1—O—C′x—C′x−1               C1—O—C′x—C′x+1
ψ [(1–6)-linkage]   C1—O—C′6—C′5     C1—O—C′6—C′5                 C1—O—C′6—C′5
ω [(1–6)-linkage]   O—C′6—C′5—H′5    O—C′6—C′5—C′4                O—C′6—C′5—O′5
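Each torsion angle above is defined by four atoms, so it can be computed from their Cartesian coordinates. The following is a minimal sketch, assuming coordinates are given as (x, y, z) tuples; the function name `dihedral_deg` is hypothetical, and the choice of which four atoms to pass (e.g. O5, C1, O, C′x for ϕ in the crystallographic style) is left to the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range [-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)          # central bond, the rotation axis
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)        # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)        # normal of the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))
```

With this definition a cis arrangement gives 0° and a trans arrangement gives ±180°.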
sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/
Induced fit? Rigid receptor hypothesis
Preferred torsion angles of glycans
Cone-like (left) and umbrella-like (right) topologies of
α2-3 and α2-6 sialylated glycans binding to influenza viral
HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
Combine structural analysis with the glycan array analysis: providing structural insights.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Ligand binding by
the scavenger
receptor C-type
lectin (SRCL) and
LSECtin
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Binding of multiple classes
of ligands to DC-SIGN and
the macrophage galactose
receptor. Model of the
binding site in the
macrophage galactose
receptor with a bound
GalNAc residue, based on
the structure of the
galactose-binding mutant of
mannose-binding protein
that was created by
insertion of key binding site
residues from the galactose-binding receptor.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Mechanisms of mannose-binding protein interaction with ligands.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162
Molecular Dynamics: simulation of
molecular motions
• Energy model of conformation
• Two main approaches:
– Monte Carlo - stochastic
– Molecular dynamics – deterministic
• Understand molecular function and
interactions
– Catalysis of enzymes
• Complementary to experiments
• Obtain a movie of the interacting molecules
Basic Concepts of simulation of
molecular motion
1. Compute energy for the interaction between
all pairs of atoms.
2. Move atoms to the next state.
3. Repeat.
Energy Function
• Target function that MD uses to govern the
motion of molecules (atoms)
• Describes the interaction energies of all atoms
and molecules in the system
• Always an approximation
– Closer to real physics --> more realistic, but more
computation time (i.e. smaller time steps and
more interactions increase accuracy)
Scale in Simulations
[Figure: simulation domains arranged by time scale (10^-12 s to 10^-6 s) and length scale (10^-10 m to 10^-4 m): quantum chemistry (Hψ = Eψ), molecular dynamics (F = MA), Monte Carlo (exp(−ΔE/kT)), mesoscale, continuum.]
Taken from Grant D. Smith
Department of Materials Science and Engineering
Department of Chemical and Fuels Engineering
University of Utah
http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt
The energy model
• Proposed by Linus Pauling
in the 1930s
• Bond angles and lengths
are almost always the same
• Energy model broken up
into two parts:
– Covalent terms
• Bond distances (1-2
interactions)
• Bond angles (1-3)
• Dihedral angles (1-4)
– Non-covalent terms
• Forces at a distance
between all non-bonded
atoms
http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
The NIH Guide to Molecular Modeling
The energy equation
Energy =
Stretching Energy +
Bending Energy +
Torsion Energy +
Non-Bonded Interaction Energy
These equations, together with the data (parameters) required to describe
the behavior of different kinds of atoms and bonds, are called a force field.
Bond Stretching Energy
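$E_{\text{stretch}} = \sum_{\text{bonds}} k_b (r - r_0)^2$
k_b is the spring constant of the bond.
r_0 is the bond length at equilibrium.
A unique k_b and r_0 are assigned to each bonded pair of atoms
based on their types (e.g. C-C, O-H).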
Bending Energy
$E_{\text{bend}} = \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2$
k_θ is the spring constant of the bend.
θ_0 is the bond angle at equilibrium.
Unique parameters for angle bending are
assigned to each bonded triplet of atoms
based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)
Torsion Energy
$E_{\text{torsion}} = \sum_{\text{torsions}} A \, [1 + \cos(n\tau - \phi)]$
A controls the amplitude of the curve.
n controls its periodicity.
φ shifts the entire curve along the
rotation angle axis (τ).
The parameters are determined from
curve fitting.
Unique parameters for torsional rotation
are assigned to each bonded quartet of
atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)
Non-bonded Energy
$E_{\text{non-bonded}} = \sum_i \sum_j \left( -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}} \right) + \sum_i \sum_j \frac{q_i q_j}{r_{ij}}$
A determines the degree of attractiveness.
B determines the degree of repulsion.
q is the charge.
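The four energy terms can be written as small functions of the parameters named above. This is a sketch under stated assumptions: the functional forms (harmonic stretch and bend, cosine torsion, 6-12 van der Waals plus Coulomb) are the common molecular mechanics forms and are assumed here, the function names are hypothetical, and units and the Coulomb constant are omitted for brevity.

```python
import math


def bond_stretch(r, k_b, r0):
    """Harmonic stretching energy of one bond (assumed form k_b * (r - r0)^2)."""
    return k_b * (r - r0) ** 2


def angle_bend(theta, k_theta, theta0):
    """Harmonic bending energy of one bond angle, angles in radians."""
    return k_theta * (theta - theta0) ** 2


def torsion(tau, A, n, phi):
    """Torsion energy: A sets the amplitude, n the periodicity, phi the phase shift."""
    return A * (1.0 + math.cos(n * tau - phi))


def non_bonded(r, A, B, q_i, q_j):
    """Non-bonded energy of one atom pair at distance r.

    -A/r^6 is the attractive term, B/r^12 the repulsive term,
    q_i*q_j/r the electrostatic term (Coulomb constant taken as 1).
    """
    return -A / r ** 6 + B / r ** 12 + q_i * q_j / r
```

The total energy of a conformation is the sum of these terms over all bonds, angles, torsions and non-bonded atom pairs, using the parameters the force field assigns to each atom type combination.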
Simulating In A Solvent
• The smaller the system, the more particles on the
surface
– 1000 atom cubic crystal, 49% on surface
– 10^6 atom cubic crystal, 6% on surface
• Would like to simulate infinite bulk surrounding
N-particle system
• Two approaches:
– Implicitly
– Explicitly
• Periodic boundary conditions
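The surface percentages quoted for cubic crystals can be checked with a few lines of arithmetic. The sketch below assumes a simple cubic arrangement with the same number of atoms along each edge; the function name `surface_fraction` is hypothetical.

```python
def surface_fraction(n_per_edge):
    """Fraction of atoms on the surface of an n x n x n simple cubic crystal."""
    total = n_per_edge ** 3
    # Atoms not on the surface form an inner (n - 2)^3 cube.
    interior = max(n_per_edge - 2, 0) ** 3
    return (total - interior) / total


print(surface_fraction(10))    # 1000 atoms
print(surface_fraction(100))   # 10^6 atoms
```

The two calls give about 0.49 and 0.06, matching the figures above and showing why a small simulated system is dominated by surface effects unless the bulk is modeled in some way.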
Schematic representation of periodic
boundary conditions.
http://www.ccl.net/cca/documents/molecularmodeling/node9.html
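Under periodic boundary conditions, a particle that leaves the box on one side re-enters on the opposite side, and each particle interacts with the nearest periodic image of every other particle. A minimal one-dimensional sketch, assuming a cubic box of edge length `box` (both function names are hypothetical):

```python
def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest separation between two particles, considering periodic images."""
    return dx - box * round(dx / box)
```

For a 3D box, the same functions are applied to each coordinate separately before the distance is computed.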
Parameters for MD: Forcefield
• Derived from direct experimental
measurements on small molecules (~10
atoms)
• Commonly used: AMBER, CHARMM,
GROMOS, etc
– GLYCAM for MD of glycoconjugates (derived from
AMBER forcefield)
Monte Carlo
Explore the energy surface by randomly probing the
configuration space by a Markov Chain approach
Metropolis method (avoids local minima):
1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by random displacement.
3. Calculate the change in potential energy, ΔE, corresponding to
this displacement.
4. If ΔE < 0, accept the new coordinates and go to step 2.
5. Otherwise, if ΔE ≥ 0, select a random R in the range [0,1] and:
– If exp(−ΔE/kT) > R, accept and go to step 2
– If exp(−ΔE/kT) ≤ R, reject and go to step 2
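The Metropolis steps above can be sketched as a short loop. This is an illustrative version, assuming the system is a flat list of coordinates and that `energy` is any user-supplied function of that list; the names `metropolis`, `step_size` and `kT` are hypothetical parameters, not taken from a specific package.

```python
import math
import random


def metropolis(energy, x0, n_steps, step_size, kT, seed=0):
    """Metropolis Monte Carlo over a list of coordinates.

    Returns the final coordinates, their energy, and the number of accepted moves.
    """
    rng = random.Random(seed)
    x = list(x0)                      # step 1: initial coordinates
    e = energy(x)
    accepted = 0
    for _ in range(n_steps):
        i = rng.randrange(len(x))     # step 2: pick a coordinate at random
        old = x[i]
        x[i] = old + rng.uniform(-step_size, step_size)
        d_e = energy(x) - e           # step 3: change in potential energy
        # steps 4 and 5: always accept downhill moves; accept uphill moves
        # with probability exp(-dE/kT)
        if d_e < 0 or rng.random() < math.exp(-d_e / kT):
            e += d_e
            accepted += 1
        else:
            x[i] = old                # reject: restore the old coordinate
    return x, e, accepted
```

For example, `metropolis(lambda c: sum(v * v for v in c), [5.0, 5.0], 5000, 0.5, 0.1)` samples a simple quadratic energy surface. Occasionally accepting uphill moves is what lets the walk leave a local minimum.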
Deterministic Approach
• Provides us with a trajectory of the system.
– From atom positions, velocities, and accelerations,
calculate atom positions and velocities at the next
time step.
– Integrating these infinitesimal steps yields the
trajectory of the system for any desired time
range.
• Typical simulations of small proteins, including
surrounding solvent, span picoseconds.
$F = ma$, $F_i = -\dfrac{\partial E}{\partial x_i}$
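The force on each coordinate is the negative derivative of the energy with respect to that coordinate. Production MD codes use analytical derivatives of the force field terms; the sketch below instead approximates the derivative by central finite differences, which is slower but works for any energy function. The function name `numerical_force` and the step `h` are assumptions for illustration.

```python
def numerical_force(energy, coords, h=1e-5):
    """Approximate F_i = -dE/dx_i for every coordinate by central differences."""
    forces = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        forces.append(-(energy(plus) - energy(minus)) / (2.0 * h))
    return forces
```

For a harmonic bond with energy `k * (x - x0) ** 2 / 2`, this returns approximately `-k * (x - x0)`, the restoring force of a spring.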
Deterministic / MD methodology
• From atom positions, velocities, and
accelerations, calculate atom positions and
velocities at the next time step.
• Integrating these infinitesimal steps yields the
trajectory of the system for any desired time
range.
• There are efficient methods for integrating these
elementary steps with Verlet and leapfrog
algorithms being the most commonly used.
MD algorithm
• Initialize system
– Ensure particles do not overlap in initial positions
(can use lattice)
– Randomly assign velocities.
• Move and integrate: {r(t), v(t)} → {r(t+Δt), v(t+Δt)}
Leapfrog algorithm
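A minimal one-dimensional sketch of leapfrog integration follows, written in the kick-drift-kick form (algebraically equivalent to velocity Verlet) so that positions and velocities are both available at whole time steps. The `force` argument is any user-supplied function of position; all names are hypothetical.

```python
def leapfrog(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; returns a list of (x, v) per time step."""
    trajectory = [(x, v)]
    a = force(x) / mass
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half-step velocity update (kick)
        x = x + dt * v_half            # full-step position update (drift)
        a = force(x) / mass            # acceleration at the new position
        v = v_half + 0.5 * dt * a      # second half-step velocity update (kick)
        trajectory.append((x, v))
    return trajectory
```

For example, `leapfrog(1.0, 0.0, lambda x: -x, 1.0, 0.01, 628)` follows a unit harmonic oscillator for roughly one period. The total energy stays close to its initial value over the run, which is the main reason this family of integrators is preferred in MD.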
MD studies of Prion proteins
• Prion protein (PrP) is associated with an unusual class of
neurodegenerative diseases
– Scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru,
Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome
(GSS), and fatal familial insomnia (FFI) in humans
• Protein-only hypothesis (Prusiner, 1982): the disease is caused
by an abnormal form of the 250 amino acid PrP, which
accumulates in plaques in the brain.
• The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure,
and FTIR and CD spectra indicate it has a significantly increased content of β-sheet conformation compared with PrPC
• Glycosylation appears to protect prion protein (PrPC) from the conformational
transition to the disease-associated scrapie form (PrPSc)
PrP is a glyco-protein
• Available NMR structures are for nonglycosylated PrPC only
• Glycosylation appears to protect prion protein
(PrPC) from the conformational transition to
the disease-associated scrapie form (PrPSc)
• Objective: study of the influence of two N-linked glycans (Asn181 and Asn197) and of the
GPI anchor attached to Ser230
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
MD simulations
• Molecular dynamics simulations on the C-terminal region of human prion
protein HuPrP(90–230), with and without the three glycans
• AMBER94 force field in a periodic box model with explicit water molecules,
considering all long-range electrostatic interactions
• HuPrP(127–227) is stabilized overall by addition of the glycans,
specifically by extension of two helices and reduced flexibility of the linking
turn containing Asn197;
• The stabilization appears indirect, by reducing the mobility of the
surrounding water molecules, and not from specific interactions such as H
bonds or ion pairs.
– Asn197 having a stabilizing role, while Asn181 is within a region with already
stable secondary structure
Zuegg et al., Glycobiology, 2000, 10(10):959-974.
Cone-like (left) and umbrella-like (right)
topologies of α2-3 and α2-6 sialylated glycans
binding to influenza viral HAs
A retrospective analysis
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)
MD simulation of glycan binding of
influenza HAs
• A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
– Modeling the ligand-bound state of H5N1 HA using the isolate VN1194
bound to α2,3-sialyllactose as previously crystallized
– Excess mutual information was computed between each residue of each
monomer and the corresponding bound ligand, using the average
mutual information between the residue and all residues as an estimate
of the “background” mutual information.
– Combine these results with sequence analysis of H5N1 mutational data
to predict clusters of residues that undergo coordinated mutation, which
have some capacity to vary but are subject to selective pressure relating
mutation. These residues may be richer targets to change ligand
specificity than residues absolutely conserved or residues that display
uncorrelated mutations (involved in immune escape).
Kasson et al., JACS, 2009, 131 (32), pp 11338–11340
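The excess mutual information idea can be illustrated on discretized time series. The sketch below assumes each residue and the ligand have already been reduced to one discrete state label per simulation frame; how that discretization is done is not specified here and is left as an assumption. The function names are hypothetical, and this is not the cited authors' implementation.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete series."""
    n = len(xs)
    p_xy = Counter(zip(xs, ys))
    p_x = Counter(xs)
    p_y = Counter(ys)
    mi = 0.0
    for (x, y), count in p_xy.items():
        joint = count / n
        mi += joint * math.log2(joint / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def excess_mutual_information(residue, ligand, other_residues):
    """MI(residue, ligand) minus the mean MI between the residue and other residues."""
    background = sum(mutual_information(residue, other)
                     for other in other_residues) / len(other_residues)
    return mutual_information(residue, ligand) - background
```

A residue whose state tracks the ligand more closely than it tracks the rest of the protein gets a high excess value, which is the sense in which the background estimate is used above.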
Experimentally identified
ligand-binding mutations in
red, the top 5% of residues by
dynamics scoring in cyan
(overlap of these two in
magenta), and the six mutation
sites identified by both
dynamics and sequence
analysis in yellow.
The top three mutations
from the ligand dissociation
analyses in yellow. A
modeled α2,3-sialyllactose
is shown in orange.
Prediction of dissociation rate for
HA mutants (in silico mutagenesis)
• Bayesian analysis methods to predict dissociation rates based
on extensive simulation of each mutant and evaluate whether
a mutant has a faster dissociation rate than the influenza
clinical isolate that we use as a wild-type reference.
• These simulations were used to estimate the dissociation rate
for each mutation.
• The mutation sites predicted by analysis of the molecular
dynamics data include both residues immediately contacting
the bound glycan and residues located farther away on the
globular head of the hemagglutinin molecule.