Iowa State University Bioinformatics and Computational Biology Graduate Program
Protein Structure Lab
Michael Zimmermann, Ataur Katebi, Ragothaman Yennamalli
CSBSI Short Course, June, 2010

Structures and Bioinformatics
Detailed genetic information informs organism-wide views.

Today’s Plan
1. What are molecular structures?
• Primary, secondary, tertiary, and quaternary structure
• Why we need them
2. Where do we get them?
• PDB, NDB, and EMDB
• Homology modeling
3. How do they interact?
• DIP and docking
4. How do we know what they do?
• Genome annotation (what you’ve been doing)
• Molecular motions
I. Molecular Dynamics
II. Normal Mode Analysis (Elastic Networks)

What Are Molecular Structures? (and why are they important?)
Central Dogma
DNA → RNA → Protein
DNA: CGACGGGGACGA CGGGGACCATTT
RNA: GCUGCCCCUGCU GCCCCUGGUAAA
Protein: AAPAAPGK

Protein secondary structure elements (PDB entry 1arl)
• (H) α-helices
• (E) β-sheets
• (C) Coils
• Molecules are too small to see.
• Artistic depictions are informative.

Size and Scale
http://learn.genetics.utah.edu/content/begin/cells/scale/

Protein Structure
Helix
Parallel β-sheet
Antiparallel β-sheet
Diverse Tertiary Structures

Importance of the problem
• Number of sequences >> number of structures
• Secondary structure may be used as an input for tertiary structure prediction
• Predicting 1D (secondary) structure is easier than predicting 3D structure

Scale of Sequence Versus Structure

How do we get them?
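The Central Dogma slide shows a DNA strand, the RNA transcribed from it, and the protein it encodes. The short Python sketch below reproduces that example. It assumes the DNA shown is the template strand, written so that each base pairs with the RNA base directly below it, and it uses the standard genetic code (NCBI translation table 1); the function names are illustrative and not taken from any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base pairing from a DNA template base to the RNA base built opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA complementary to a DNA template strand (assumed aligned base by base)."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template.upper())


def translate(mrna: str) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "CGACGGGGACGA" + "CGGGGACCATTT"  # the two DNA blocks from the slide
    rna = transcribe(dna)
    print(rna)             # GCUGCCCCUGCUGCCCCUGGUAAA
    print(translate(rna))  # AAPAAPGK
```

Enumerating the 64 codons with `itertools.product` keeps the table compact while still covering every codon exactly once.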
Databases or Structure Prediction

Assignments of secondary structure
• Assigned by crystallographers (subjective)
• Automatic assignments from the PDB coordinates
– Dictionary of Secondary Structure of Proteins (DSSP), Kabsch and Sander 1983, based on the positions of hydrogen bonds
• STRIDE assignments

DSSP assignments
1. (H) α-helix
2. (E) Strand
3. (G) 3₁₀-helix
4. (I) π-helix
5. (B) Bridge (single-residue strand)
6. (T) Turn
7. (S) Bend
8. (C) Coil

Some ambiguity
• There are various translations of the 8 DSSP states into 3 secondary structure states
• Two versions of DSSP
– EMBL (Heidelberg) version: includes interchain hydrogen bonds
– PDB version: excludes interchain hydrogen bonds

Improvement of prediction by using multiple sequence alignments
• Zvelebil et al. 1987
• Levin, Pascarella, Argos & Garnier 1993
• Rost & Sander 1993
• Accuracy of prediction based on single sequences: ~65%
• Accuracy of prediction using multiple sequence alignments: ~75% (for the most successful methods)

New improved algorithm (GOR V)
Kloczkowski, Ting, Jernigan & Garnier
• New database of 513 non-redundant sequences proposed by Cuff and Barton
• Additional statistics of triplets
• Resizable window (the window size is adjusted to the length of the sequence)
• Optimization of parameters
– Decision parameters to increase the accuracy of prediction for β-sheets
• Multiple sequence alignments from PSI-BLAST (FASTA + CLUSTAL in an early version)

GOR V
Example input:
>gi|42572793|ref|NP_974493.1| myb family transcription factor [Arabidopsis thaliana]
MDNHRRTKQPKTNSIVTSSSEVSSLEWEVV
SQEEEDLVSRMHKLVGDRWELIAGRIPGRT
AGEIERFWVMKN
GOR V server: http://gor.bb.iastate.edu/

References
• A. Kloczkowski, K.-L. Ting, R. L. Jernigan and J. Garnier. Protein secondary structure prediction based on the GOR algorithm incorporating multiple sequence alignment. Polymer, 2002, 43, 441-449.
• A. Kloczkowski, K.-L. Ting, R. L. Jernigan and J. Garnier. Combining GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins: Structure, Function, and Genetics, 2002, 49, 154-166.

Other methods
• PSIPRED (neural network): http://bioinf.cs.ucl.ac.uk/psipred/psiform.html
• PHD (neural network): http://cubic.bioc.columbia.edu/predictprotein/
• JPRED (neural network): http://www.compbio.dundee.ac.uk/~wwwjpred/submit.html
• SAM-T99 (hidden Markov models): http://www.cse.ucsc.edu/research/compbio/HMMapps/T99-query.html
• Meta-servers: http://cubic.bioc.columbia.edu/predictprotein/submit_meta.html
» Compare predictions with the actual structure
» Problem of turning a secondary structure prediction into a 3D structure

Retrieving, Viewing, and Analyzing Molecular Structure Files

Where to get molecular files
• http://www.rcsb.org/
• http://ndbserver.rutgers.edu
• http://www.emdatabank.org/

Molecule Files
• The Protein Data Bank (PDB) file, example 1T3R:

ATOM      8  N   GLN A   2      25.279  22.419  34.914  1.00 21.01           N
ATOM      9  CA  GLN A   2      23.872  22.620  34.516  1.00 17.82           C
ATOM     10  C   GLN A   2      23.654  24.078  34.247  1.00 18.11           C
ATOM     11  O   GLN A   2      23.996  24.956  35.114  1.00 20.40           O
ATOM     12  CB  GLN A   2      22.926  22.138  35.611  1.00 19.10           C
ATOM     13  CG  GLN A   2      21.447  22.401  35.328  1.00 18.52           C
ATOM     14  CD  GLN A   2      20.558  21.549  36.121  1.00 21.32           C
ATOM     15  OE1 GLN A   2      20.145  20.502  35.662  1.00 22.49           O
ATOM     16  NE2 GLN A   2      20.336  21.926  37.380  1.00 21.05           N

Fields, left to right: record type (ATOM), atom number (Atom#), atom type, residue, chain ID, residue number (Residue#), X, Y, Z coordinates (Å), occupancy, B-factor, element.

Other formats
• sdf, mol2
• MOL2: SYBYL Tripos format
• SMILES: convert to 3D with CORINA

Molecular Visualization
UIUC, UCSF, DeLano Scientific and Schrödinger

Homology Modeling
• Use when sequence identity is > 35%
• 1233 known topologies (CATH)
• ≈70% of protein sequences (~50,000,000)
Workflow: template selection → sequence-to-structure alignment → model building → model selection and refinement

Protein Machines
• Most biochemical processes taking place in vivo are controlled by proteins:
– gene expression and regulation (nuclear receptors)
– metabolic pathways (enzymes)
– immune system (antibodies)
– signal transduction (transmembrane receptors)
– structural support (collagen)
• Fully automated
• Highly specific

Classical Structure Determination
• Protein structures are solved mostly by:
– X-ray crystallography (or SAXS)
– NMR spectroscopy
– Cryo-EM
• All methods require a lot of input from highly trained specialists.
• Time-consuming
• $10,000 to $1,000,000 for one structure

Template Detection
• Sequence-only methods:
– BLAST or FASTA scan against the PDB database
– PSI-BLAST scan against a sequence database
• Profile comparison:
– Profile-to-profile alignment against a structural database
• Threading:
– Optimal fitting of the modeled sequence to structures from the PDB
• Metaservers:
– Combination of all of the above (and others)

Modeling
• The template is used as a rigid scaffold.
• The modeling algorithm rebuilds missing parts (loops).
• The template is used as a semi-flexible scaffold.
• Usually a large number of models is generated.
• Modeller (A. Sali), Rosetta (D. Baker), CABS (A. Kolinski), UNRES (H. Scheraga), I-TASSER (Y. Zhang)

Homology Modeling Example
See “Homology Modeling.pdf”

How do they interact?
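The homology modeling slides give a rule of thumb (use it when sequence identity is > 35%) and list several ways to detect templates. Once a search tool has produced query-template alignments, the template-selection step can be sketched as below. These are hypothetical helpers, not part of BLAST, Modeller, or any listed tool: they assume gaps are written as '-' and normalise identity by the gap-free alignment columns, which is only one of several conventions in use (others divide by the shorter sequence or by the full alignment length).

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(q, t) for q, t in zip(aligned_query, aligned_template)
             if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    identical = sum(q == t for q, t in pairs)
    return 100.0 * identical / len(pairs)


def usable_templates(alignments: dict, threshold: float = 35.0) -> list:
    """Rank candidate templates by identity and keep those above the threshold.

    `alignments` maps a template name to a (aligned_query, aligned_template) pair.
    """
    scored = [(name, percent_identity(q, t)) for name, (q, t) in alignments.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(name, pid) for name, pid in scored if pid > threshold]


if __name__ == "__main__":
    # Hypothetical alignments against two hypothetical templates.
    alignments = {
        "templateA": ("MKV-LLAG", "MKVQLLSG"),
        "templateB": ("MKVLLAGK", "AQTLFSGR"),
    }
    for name, pid in usable_templates(alignments):
        print(f"{name}: {pid:.1f}% identity")  # templateA: 85.7% identity
```

Only templateA passes here; templateB shares 2 of 8 residues (25%), below the 35% rule of thumb.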
DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/dip/Main.cgi
ORGANISM: PROTEINS
Drosophila melanogaster (fruit fly): 7482
Saccharomyces cerevisiae (baker's yeast): 4943
Escherichia coli: 1863
Caenorhabditis elegans: 2650
Homo sapiens (human): 1476
Helicobacter pylori: 712
Mus musculus (house mouse): 502
Rattus norvegicus (Norway rat): 163
Others (266): 2098
INTERACTIONS, EXPERIMENTS: 22881 23178 18440 23034 7447 4043 2292 1428 8884 4090 3438 1430 683 917 215 315

An Introduction to Docking

Outline
• Introduction to docking
• Protein-protein docking
• Protein-ligand docking
• Protein-ligand docking: hands-on

What is docking?
Prediction of the optimal physical configuration of two interacting molecules and of their interaction energy.
Docking:
1. Finds the orientation that maximizes the interaction
2. Searches for the minimum-energy conformation
3. Predicts structural rearrangement

Why docking?
• Predicting biomolecular interactions
• Computer-aided analysis saves time
• Automated prediction of molecular interactions is key to rational drug design
• Measuring the relative strength of interactions in a cluster of interacting proteins
• Drug design: virtual screening
• Growth of drug molecule databases

Different types of docking
• Protein-protein docking: two proteins of approximately the same size
• Protein-ligand docking: a large molecule (the receptor) and a small molecule (the ligand)

Rigid-body and flexible docking
• Rigid-body docking: bond angles, bond lengths, and torsion angles of the components are not modified
• Flexible docking: permits conformational change

Scoring function
• Van der Waals: $E_{\mathrm{vdW}} = \frac{A}{r^{12}} - \frac{B}{r^{6}}$, where A and B are positive constants and r is the distance between the two atoms
• Hydrogen bond: occurs when one molecule has a hydrogen atom close to the docking surface that interacts with an electronegative atom of the second molecule when docking occurs
• Electrostatics: the most significant force, drawing parts of the molecules closer together or pushing them further apart according to their electrical charges

Protein-Protein Docking Examples
Based on performance in the latest CAPRI (Critical Assessment of Predicted Interactions) round:
• ZDOCK
• ClusPro
• AutoDock
• RosettaDock
• PatchDock
• HADDOCK

Protein-Ligand Docking Examples
• DOCK
• AutoDock
• MOE-Dock
• GOLD
• FlexX
• Glide
• Hammerhead
• FLOG

Docking Server: ClusPro
http://cluspro.bu.edu/
ClusPro is the first integrated automated server that incorporates both docking and discrimination steps for structural prediction of protein-protein complexes.
Using ClusPro, one can:
• generate many relative orientations/conformations of the two proteins
• filter them using desolvation and electrostatic potentials
• discriminate via clustering
• find the best fit between the two proteins (closest to the native structure from X-ray crystallography)
Top-ranked ClusPro predictions can then be:
• further refined and discriminated manually, using existing biochemical constraints and analysis to eliminate false positives
• tested in vitro for the binding affinity of promising protein pairs
• used as lead compounds, i.e., starting points for drug development/optimization
ClusPro can be used to screen databases of existing, recombinant, or de novo proteins for their interaction with a protein target of interest.
ClusPro can be used to predict either:
• how a protein drug may bind (and either inhibit or stimulate) a receptor, or
• how two proteins bind, and then, based on the structural details of the interaction, design/screen for a drug that can inhibit that interaction

Protein-protein docking
• Cyclin docked to a yeast transcription factor
• Ubiquitin-conjugating enzyme docked to a yeast transcription factor

DOCK program
• Protein-ligand docking
• Ligand flexibility is permitted
• Ability of the algorithm to find the lowest-energy binding mode
• Force-field-based scoring: a function expressing the energy of a system as a sum of diverse molecular mechanics (or other) terms
• An improved matching algorithm for rigid-body docking and an algorithm for flexible ligand docking

Recap:
• Molecular structures
• Structure databases
• Homology modeling
• Molecular docking
Now, what can we learn from motion?

How do we know what they do?
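The scoring-function and DOCK slides describe a force-field-style score built as a sum of van der Waals, hydrogen-bond, and electrostatic terms. A minimal sketch of such a score for one rigid ligand pose follows, assuming a 12-6 Lennard-Jones form (repulsive A/r¹², attractive B/r⁶), a single hypothetical (A, B) pair for every atom pair, a constant dielectric of 4 (hypothetical), and a Coulomb factor of about 332 kcal·Å/(mol·e²). The hydrogen-bond term is left out. This is not the scoring function of DOCK, AutoDock, or any other listed program.

```python
import numpy as np

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2); exact values
# differ slightly between force fields (assumption).
COULOMB_K = 332.0


def lennard_jones(r, a, b):
    """12-6 van der Waals term A/r^12 - B/r^6 with A, B > 0."""
    return a / r**12 - b / r**6


def coulomb(r, qi, qj, dielectric=4.0):
    """Coulomb term with a constant (hypothetical) dielectric."""
    return COULOMB_K * qi * qj / (dielectric * r)


def pose_score(receptor_xyz, ligand_xyz, receptor_q, ligand_q, a, b, dielectric=4.0):
    """Sum vdW + electrostatic terms over all receptor-ligand atom pairs.

    Lower (more negative) scores mean more favorable poses. One (A, B) pair is
    used for every atom pair and hydrogen bonds are ignored (simplifications).
    """
    rec = np.asarray(receptor_xyz, dtype=float)
    lig = np.asarray(ligand_xyz, dtype=float)
    # Pairwise distance matrix, shape (n_receptor, n_ligand).
    r = np.linalg.norm(rec[:, None, :] - lig[None, :, :], axis=-1)
    qi = np.asarray(receptor_q, dtype=float)[:, None]
    qj = np.asarray(ligand_q, dtype=float)[None, :]
    vdw = lennard_jones(r, a, b).sum()
    elec = coulomb(r, qi, qj, dielectric).sum()
    return float(vdw + elec)


if __name__ == "__main__":
    a, b = 1.0, 2.0
    r_min = (2 * a / b) ** (1 / 6)              # distance of the vdW energy minimum
    print(r_min, lennard_jones(r_min, a, b))    # 1.0 -1.0
    # Same geometry, neutral vs. oppositely charged atoms.
    print(pose_score([[0, 0, 0]], [[1, 0, 0]], [0.0], [0.0], a, b))   # -1.0
    print(pose_score([[0, 0, 0]], [[1, 0, 0]], [1.0], [-1.0], a, b))  # -84.0
```

A docking search would call `pose_score` for many trial positions and orientations of the ligand and keep the lowest-scoring ones. Note that the vdW minimum lies at r = (2A/B)^(1/6) with depth −B²/(4A).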
• Genome annotation (what you’ve been doing)
• Molecular motions
I. Molecular Dynamics
II. Normal Mode Analysis (Elastic Networks)

Ribosome Simulation
http://www.pnas.org/content/102/44/15854.long
• Targeted MD (tMD) simulation of 2,640,030 atoms
• CPU time used ≈ 10^6 hours
• Accommodation occurs at a rate of ≈ 7 s⁻¹
• Simulated for a total time of 20 ns (2 × 10⁻⁸ s)
How do we handle these large systems when MD won’t do?

Coarse-Grained MD
Coarse-graining plug-in
Are existing issues with model parameters smoothed out or compounded?

Elastic Network Models

Force Field Comparison
ENMs use Hookean springs for all interactions.

Elastic Network Models
Kirchhoff (contact) matrix for N points separated by distances d_ij:
$$\Gamma_{ij} = \begin{cases} -1, & i \neq j,\ d_{ij} \le r_c \\ 0, & i \neq j,\ d_{ij} > r_c \\ -\sum_{k=1,\,k \neq i}^{N} \Gamma_{ik}, & i = j \end{cases}$$
Potential energy:
$$V = \frac{\gamma}{2}\, \Delta\mathbf{R}^{T}\, \Gamma\, \Delta\mathbf{R}$$
Motion as a superposition of modes:
$$\Delta\mathbf{R}(t) = \sum_{k} A_k\, \mathbf{Q}_k \cos(\omega_k t + \varepsilon_k)$$
Fluctuation correlations:
$$\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle = \frac{1}{Z_N} \int (\Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j)\, e^{-V_{\mathrm{tot}}/k_B T}\, d\{\Delta\mathbf{R}\} = \frac{3 k_B T}{\gamma} \left[\Gamma^{-1}\right]_{ij} = \frac{3 k_B T}{\gamma} \sum_{k} \lambda_k^{-1} \left[\mathbf{Q}_k\right]_i \left[\mathbf{Q}_k\right]_j$$
Γ⁻¹ is the pseudo-inverse; for a connected network the sum runs over the N − 1 nonzero modes.
γ: spring constant; k_B: Boltzmann constant; r_c: cutoff radius; T: temperature; V: potential energy; N: number of points; λ: eigenvalue; ε: phase angle; Q: eigenvector; ΔR: fluctuation; ω: frequency; A: amplitude; Z_N: partition function

Elastic Network Models
http://ignmtest.ccbb.pitt.edu/cgi-bin/anm/anm1.cgi
1. Locate a structure in the PDB.
2. Determine its primary function.
3. Submit it to oANM.
4. Relate the computed motions to known functions.

Acknowledgements
Secondary structure prediction slides generously provided by Dr. Andrzej Kloczkowski.
The homology modeling section of this presentation is based on a presentation by Mateusz Kurcinski.
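The Elastic Network Models slide defines a Kirchhoff contact matrix Γ (−1 for node pairs within the cutoff r_c, diagonal equal to each node's number of contacts) and relates residue fluctuations to its inverse. Below is a minimal Gaussian Network Model (GNM) sketch with NumPy, assuming Cα-only coordinates in Å, a cutoff of 7 Å (a common choice for GNM, but an assumption here), and γ = k_BT = 1 so the fluctuations are in relative units. It is the isotropic variant; the oANM server computes the anisotropic model (ANM), which works with a 3N × 3N Hessian instead, so this is not that server's code.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """Kirchhoff matrix: -1 for node pairs within the cutoff, 0 otherwise;
    each diagonal entry equals the number of contacts of that node."""
    xyz = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)                  # no self-contacts
    np.fill_diagonal(gamma, -gamma.sum(axis=1))   # Gamma_ii = -sum_{k != i} Gamma_ik
    return gamma


def gnm_fluctuations(coords, cutoff=7.0, spring=1.0, kT=1.0, tol=1e-8):
    """Mean-square fluctuations <dR_i . dR_i> = (3 kT / spring) [Gamma^-1]_ii.

    Gamma is singular (one zero mode for a connected network), so its
    pseudo-inverse is assembled from the nonzero eigenvalues/eigenvectors.
    """
    gamma = kirchhoff(coords, cutoff)
    evals, evecs = np.linalg.eigh(gamma)          # eigenvalues in ascending order
    keep = evals > tol                            # drop the zero mode
    modes = evecs[:, keep]
    gamma_pinv = (modes / evals[keep]) @ modes.T  # sum_k Q_k Q_k^T / lambda_k
    return 3.0 * kT / spring * np.diag(gamma_pinv), evals


if __name__ == "__main__":
    # Hypothetical toy "protein": 5 C-alpha atoms on a straight line, 3.8 A apart,
    # so only sequence neighbours fall within the 7 A cutoff.
    ca = [[3.8 * i, 0.0, 0.0] for i in range(5)]
    msf, evals = gnm_fluctuations(ca)
    print(np.round(evals, 3))
    print(np.round(msf, 3))   # chain ends fluctuate more than the middle
```

A common sanity check is to compare the computed fluctuations with the B-factor column of the PDB file, since B_i = (8π²/3)⟨ΔR_i²⟩.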