Structural bioinformatics for glycobiology

Structural glycoinformatics approaches
• Structural modeling
  – Comparative modeling of glycoproteins
  – Complex modeling: glycoprotein replacement
• Modeling of the complexes of glycans with glycan-binding proteins (GBPs) and glycosyltransferases (GTs)
  – Docking
  – Analysis of interaction specificities
    • Key residues vs. specific glycan conformations
• Molecular dynamics
  – Modeling the dynamics of the recognition of glycans by GBPs
  – Modeling the enzymology of GTs: quantum mechanical calculations

Approaches to predicting protein structures
1. Obtain the sequence (target).
2. Sequence-sequence alignment or sequence-structure alignment for fold assignment.
3. High identity, long alignment: comparative modeling. Low identity, fragment alignment: ab initio modeling.
4. Build and assess the model.

Comparative modeling of proteins
• Definition: prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using as a template a homologous protein for which an X-ray or NMR structure is available.
• Why a model: a model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time, or at all. The built model provides a wealth of information on how the protein functions, down to the level of individual residue properties, e.g. its interactions with ligands (GBPs/GTs with glycans).

Comparative Modeling (or homology modeling)
Target sequence (structure unknown: ?):
KQFTKCELSQNLYDIDGYGRIALPELICTMF
HTSGYDTQAIVENDESTEYGLFQISNALWCK
SSQSPQSRNICDITCDKFLDDDITDDIMCAK
KILDIKGIDYWIAHKALCTEKLEQWLCEKE
Homologous: the two proteins share similar sequences.
Template sequence:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAK
FESNFNTQATNRNTDGSTDYGILQINSRWWCND
GRTPGSRNLCNIPCSALLSSDITASVNCAKKIV
SDGNGMNAWVAWRNRCKGTDVQAWIRGCRL
Use as template & model. PDB entries shown: 1alc, 8lyz

Homology models can be very smart!
Homology models have RMSDs of less than 2 Å more than 70% of the time.

Sequence similarity implies structural similarity?
[Figure: y-axis, percentage sequence identity/similarity (0 to 100). Upper region: sequence identity implies structural similarity. Lower region: "don't know" region.]
(B. Rost, Columbia, New York; x-axis: number of residues aligned, 0 to 250)

Step 1: Fold Identification
Aim: to find one or more template structures in the Protein Data Bank (PDB)
• Pairwise sequence alignment finds high-homology sequences: BLAST
• Fold recognition programs find low-homology sequences (threading, profile-profile alignment)
• Improved multiple sequence alignment methods increase sensitivity to remote homologs: PSI-BLAST, CLUSTAL

Step 2: Model Construction
Aim: to build the three-dimensional (3D) structure of the protein, i.e. the coordinates of every atom, from its homologous template(s)
• Approach 1: protein structure buildup: cores, loops and side chains
• Approach 2: whole-protein modeling: constraint-based optimization
Commonly used programs:
• Modeller (http://salilab.org/modeller/)
• Swiss-model (http://swissmodel.expasy.org/)
• Geno3D (http://geno3d-pbil.ibcp.fr/)
• …

Step 3: Model Construction

Modeling of glycan-protein complexes
• Template: glycan-protein complex
  – Case 1: same glycan, different protein
    • Glycoprotein replacement: comparative modeling of the protein structure
    • Energy minimization, allowing structural flexibility of the glycans
  – Case 2: same protein, different glycan
    • Flexible docking of the glycan
  – Case 3: different protein and different glycan
    • Comparative modeling of the protein
    • Flexible docking of the glycan
    • Can also be applied without a template complex

Flexible docking
• Semi-flexible (rigid protein, flexible ligand)
  – Useful for drug screening
  – >150 programs: Dock, AutoDock, FlexX/FlexE, …
• Flexible protein: mainly side chains (hard)
• Two elements of semi-flexible docking algorithms
  – Ligand sampling methods
    • Pattern matching, genetic algorithms, molecular dynamics, Monte Carlo, …
  – Treatment of intermolecular forces
    • Simplified scoring functions: empirical, knowledge-based and molecular mechanics (e.g. AMBER, CHARMM, GROMOS, ...)
    • Solvation and entropy are treated very simply, or ignored completely!
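The two elements of semi-flexible docking listed above (ligand sampling plus a simplified scoring function) can be sketched in a few lines. The following is a toy 2D illustration only, not how Dock, AutoDock or FlexX work internally: the receptor coordinates, the three-atom ligand, the single flexible internal angle `bend` (a stand-in for a torsion), the Lennard-Jones-like score and all parameter values are assumptions made up for this example.

```python
import math
import random

# Hypothetical rigid receptor: fixed 2D "atoms" lining a pocket.
RECEPTOR = [(-1.5, 0.0), (1.5, 0.0), (0.0, -1.5), (-1.1, -1.1), (1.1, -1.1)]

def ligand_coords(pose):
    """Build a 3-atom ligand from pose = (tx, ty, rot, bend).

    Bond lengths are fixed at 1; 'bend' is the one flexible internal angle.
    """
    tx, ty, rot, bend = pose
    local = [(0.0, 0.0), (1.0, 0.0), (1.0 + math.cos(bend), math.sin(bend))]
    c, s = math.cos(rot), math.sin(rot)
    # Rigid-body rotation followed by translation.
    return [(tx + c * x - s * y, ty + s * x + c * y) for x, y in local]

def score(pose, sigma=1.0):
    """Simplified 12-6 score summed over all ligand-receptor atom pairs."""
    total = 0.0
    for lx, ly in ligand_coords(pose):
        for rx, ry in RECEPTOR:
            r2 = max((lx - rx) ** 2 + (ly - ry) ** 2, 1e-6)  # avoid division by zero
            s6 = (sigma * sigma / r2) ** 3
            total += s6 * s6 - 2.0 * s6  # minimum of -1 at distance sigma
    return total

def dock(start, steps=5000, temperature=0.3, seed=0):
    """Monte Carlo ligand sampling with Metropolis acceptance."""
    rng = random.Random(seed)
    current, e_cur = start, score(start)
    best, e_best = current, e_cur
    for _ in range(steps):
        # Perturb position, orientation and internal angle together.
        trial = tuple(v + rng.gauss(0.0, 0.2) for v in current)
        e_trial = score(trial)
        delta = e_trial - e_cur
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

start_pose = (0.0, 3.0, 0.0, 0.5)  # ligand starts outside the pocket
best_pose, best_score = dock(start_pose)
print("start score:", score(start_pose), "best score:", best_score)
```

The receptor never moves and only the ligand pose is sampled, which is the "rigid protein, flexible ligand" case. Note that the score has no solvation or entropy term, mirroring the limitation noted above.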
Flexible docking of glycans to proteins
• Glycan structure sampling
  – Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
• Docking of each glycan conformation to the GBP: scoring schemes
  – Empirical scores
  – Forcefield
    • GLYCAM: modified AMBER forcefield / MD tools for glycans (R. Woods group)
  – Challenge: water molecules

Flexibility of molecules
• Atoms are connected by covalent bonds
• Bond lengths and bond angles are rigid
• Torsion (dihedral) angles are flexible

Frequently used definitions of glycosidic torsion angles

Angle               NMR style        C−1 crystallographic style   C+1 crystallographic style
ϕ                   H1-C1-O-C′x      O5-C1-O-C′x                  O5-C1-O-C′x
ψ                   C1-O-C′x-H′x     C1-O-C′x-C′(x-1)             C1-O-C′x-C′(x+1)
ψ [(1–6)-linkage]   C1-O-C′6-C′5     C1-O-C′6-C′5                 C1-O-C′6-C′5
ω [(1–6)-linkage]   O-C′6-C′5-H′5    O-C′6-C′5-C′4                O-C′6-C′5-O′5

[Figure label: ASN] sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/

Induced fit? Rigid receptor hypothesis

Preferred torsion angles of glycans
Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs
Chandrasekaran et al., Nature Biotechnology 26, 107-113 (2008)

Combining structural analysis with glycan array analysis provides structural insights.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Ligand binding by the scavenger receptor C-type lectin (SRCL) and LSECtin
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Binding of multiple classes of ligands to DC-SIGN and the macrophage galactose receptor.
Model of the binding site in the macrophage galactose receptor with a bound GalNAc residue, based on the structure of the galactose-binding mutant of mannose-binding protein that was created by insertion of key binding site residues from the galactose-binding receptor.
M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Mechanisms of mannose-binding protein interaction with ligands.
M. E. Taylor and K.
Drickamer, Glycobiology 2009 19(11):1155-1162

Molecular Dynamics: simulation of molecular motions
• Energy model of conformation
• Two main approaches:
  – Monte Carlo: stochastic
  – Molecular dynamics: deterministic
• Understand molecular function and interactions
  – Catalysis of enzymes
• Complementary to experiments
• Obtain a movie of the interacting molecules

Basic concepts of simulation of molecular motion
1. Compute the energy of the interaction between all pairs of atoms.
2. Move the atoms to the next state.
3. Repeat.

Energy Function
• Target function that MD uses to govern the motion of molecules (atoms)
• Describes the interaction energies of all atoms and molecules in the system
• Always an approximation
  – Closer to real physics → more realistic, but more computation time (i.e. smaller time steps and more interactions increase accuracy)

Scale in Simulations
[Figure: time scale (10^-12 s to 10^-6 s) vs. length scale (10^-10 m to 10^-4 m), with domains for quantum chemistry ($H\psi = E\psi$), molecular dynamics ($F = ma$), Monte Carlo ($\exp(-\Delta E/kT)$), mesoscale and continuum methods.]
Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah
http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt

The energy model
• Proposed by Linus Pauling in the 1930s
• Bond angles and lengths are almost always the same
• Energy model broken up into two parts:
  – Covalent terms
    • Bond distances (1-2 interactions)
    • Bond angles (1-3)
    • Dihedral angles (1-4)
  – Non-covalent terms
    • Forces at a distance between all non-bonded atoms
http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html
The NIH Guide to Molecular Modeling

The energy equation
Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy
These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a forcefield.

Bond Stretching Energy
$$E_{stretch} = \sum_{bonds} k_b (r - r_0)^2$$
• $k_b$ is the spring constant of the bond.
• $r_0$ is the bond length at equilibrium.
• Unique $k_b$ and $r_0$ are assigned to each bonded pair of atom types, e.g. C-C, O-H.

Bending Energy
$$E_{bend} = \sum_{angles} k_\theta (\theta - \theta_0)^2$$
• $k_\theta$ is the spring constant of the bend.
• $\theta_0$ is the bond angle at equilibrium.
• Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.).

Torsion Energy
$$E_{torsion} = \sum_{torsions} A \left[ 1 + \cos(n\tau - \phi) \right]$$
• $A$ controls the amplitude of the curve.
• $n$ controls its periodicity.
• $\phi$ shifts the entire curve along the rotation angle axis ($\tau$).
• The parameters are determined from curve fitting. Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.).

Non-bonded Energy
$$E_{non\text{-}bonded} = \sum_i \sum_j \left( -\frac{A_{ij}}{r_{ij}^6} + \frac{B_{ij}}{r_{ij}^{12}} \right) + \sum_i \sum_j \frac{q_i q_j}{r_{ij}}$$
• $A_{ij}$ determines the degree of attractiveness.
• $B_{ij}$ determines the degree of repulsion.
• $q$ is the charge.

Simulating in a solvent
• The smaller the system, the larger the fraction of particles on the surface
  – 1000-atom cubic crystal: 49% on the surface
  – 10^6-atom cubic crystal: 6% on the surface
• We would like to simulate an infinite bulk surrounding the N-particle system
• Two approaches:
  – Implicitly
  – Explicitly
    • Periodic boundary conditions
Schematic representation of periodic boundary conditions.
http://www.ccl.net/cca/documents/molecularmodeling/node9.html

Parameters for MD: Forcefield
• Derived from direct experimental measurements on small molecules (~10 atoms)
• Commonly used: AMBER, CHARMM, GROMOS, etc.
  – GLYCAM for MD of glycoconjugates (derived from the AMBER forcefield)

Monte Carlo
Explore the energy surface by randomly probing the configuration space using a Markov chain approach.
Metropolis method (helps to escape local minima):
1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by a random displacement.
3. Calculate the change in potential energy, $\Delta E$, corresponding to this displacement.
4. If $\Delta E < 0$, accept the new coordinates and go to step 2.
5. Otherwise, if $\Delta E \geq 0$, select a random R in the range [0,1] and:
   – If $R < e^{-\Delta E/kT}$, accept the new coordinates and go to step 2.
   – If $R \geq e^{-\Delta E/kT}$, reject the move (keep the old coordinates) and go to step 2.

Deterministic Approach
• Provides us with a trajectory of the system.
  – From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
  – Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
• Typical simulations of small proteins, including the surrounding solvent, span picoseconds.
$$F_i = -\frac{\partial E}{\partial x_i}, \qquad F = ma$$

Deterministic / MD methodology
• There are efficient methods for integrating these elementary steps, with the Verlet and leapfrog algorithms being the most commonly used.

MD algorithm
• Initialize the system
  – Ensure particles do not overlap in their initial positions (can use a lattice)
  – Randomly assign velocities.
• Move and integrate: $\{r(t), v(t)\} \rightarrow \{r(t+\Delta t), v(t+\Delta t)\}$

Leapfrog algorithm

MD studies of prion proteins
• Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases
  – Scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans
• Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.
• The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3D structure; FTIR and CD spectra indicate that it has a significantly increased β-sheet content compared with PrPC
• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc); PrP is a glycoprotein
• Available NMR structures are for nonglycosylated PrPC only
• Objective: study the influence of the two N-linked glycans (Asn181 and Asn197) and of the GPI anchor attached to Ser230
Zuegg et al., Glycobiology, 2000, 10(10):959-974.

MD simulations
• Molecular dynamics simulations of the C-terminal region of human prion protein HuPrP(90–230), with and without the three glycans
• AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions
• HuPrP(127–227) is stabilized overall by the addition of the glycans, specifically by extension of two helices and reduced flexibility of the linking turn containing Asn197
• The stabilization appears to be indirect, arising from reduced mobility of the surrounding water molecules rather than from specific interactions such as H bonds or ion pairs.
  – The glycan at Asn197 has a stabilizing role, while Asn181 lies within a region whose secondary structure is already stable
Zuegg et al., Glycobiology, 2000, 10(10):959-974.

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs
A retrospective analysis
Chandrasekaran et al.,
Nature Biotechnology 26, 107-113 (2008)

MD simulation of glycan binding of influenza HAs
• A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
  – The ligand-bound state of H5N1 HA was modeled using the isolate VN1194 bound to α2,3-sialyllactose, as previously crystallized.
  – Excess mutual information was computed between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the "background" mutual information.
  – These results were combined with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation: residues that have some capacity to vary but are subject to selective pressure linking their mutations. These residues may be richer targets for changing ligand specificity than absolutely conserved residues or residues that display uncorrelated mutations (involved in immune escape).
Kasson et al., JACS, 2009, 131(32), pp 11338–11340

[Figure: experimentally identified ligand-binding mutations are shown in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow. The top three mutations from the ligand dissociation analyses are shown in yellow. A modeled α2,3-sialyllactose is shown in orange.]

Prediction of dissociation rate for HA mutants (in silico mutagenesis)
• Bayesian analysis methods were used to predict dissociation rates from extensive simulation of each mutant, and to evaluate whether a mutant has a faster dissociation rate than the influenza clinical isolate used as the wild-type reference.
• These simulations were used to estimate the dissociation rate for each mutation.
• The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.
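
The idea of asking whether a mutant dissociates faster than the wild-type reference can be illustrated with a minimal Bayesian sketch. This is not the analysis of Kasson et al.; it assumes, purely for illustration, that dissociation is a single-rate (exponential) process, that the rate has a conjugate Gamma prior, and that each variant is summarized by a count of dissociation events and the total simulated bound time. The function names, prior parameters and event counts below are hypothetical.

```python
import random

def posterior_rate_samples(n_events, total_time_ns, n_samples, rng,
                           prior_shape=1.0, prior_rate=1.0):
    """Draw samples from the Gamma posterior of a dissociation rate (1/ns).

    With exponentially distributed dissociation times and a
    Gamma(prior_shape, prior_rate) prior, the posterior is
    Gamma(prior_shape + n_events, prior_rate + total_time_ns).
    """
    shape = prior_shape + n_events
    rate = prior_rate + total_time_ns
    # random.gammavariate takes (shape, scale), where scale = 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_samples)]

def prob_faster(mutant, wild_type, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(k_mutant > k_wild_type | data)."""
    rng = random.Random(seed)
    k_mut = posterior_rate_samples(*mutant, n_samples, rng)
    k_wt = posterior_rate_samples(*wild_type, n_samples, rng)
    return sum(m > w for m, w in zip(k_mut, k_wt)) / n_samples

# Hypothetical data: (number of dissociation events, total simulated time in ns).
wild_type = (3, 1000.0)
mutant = (12, 1000.0)

p = prob_faster(mutant, wild_type)
print(f"P(mutant dissociates faster than wild type) = {p:.3f}")
```

A probability near 1 would flag the mutant as a candidate for weakened binding, and a value near 0.5 would mean the simulations cannot distinguish it from the reference.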