Using X-ray structures for bioinformatics
Robbie P. Joosten
Netherlands Cancer Institute
Autumnschool 2013

Introduction: Structures in bioinformatics
• Understand biology
– Direct interpretation
– Data mining
– Homology modeling
• Drug design
• Molecular dynamics
Basic rule: Better structures → Better results

Introduction: Right structure(s) for the job
1. Selection: find (a number of) PDB entries
2. Validation: check the quality of your selection
3. Optimisation: maximise the quality of your selection
Focus on X-ray structures

Selection: X-ray structures have a history
1. Protein expression
2. Crystallisation
3. X-ray diffraction experiment
4. Model building and refinement
5. Deposition at the PDB
All these steps affect the final PDB file

History: Protein expression
A ‘construct’ is made
• Partial proteins
– E.g. only the extracellular domain of a membrane protein
• Frankenstein proteins
– Fusion proteins or chimeras
• Mutants are introduced
– Some by accident!
• Poly-histidine tags added for purification
• Altered glycosylation state
– Large sugars hamper crystallisation

History: Crystallisation
The protein stacks regularly to form a crystal
• Protein still functional in the crystal
• Much solvent in the crystal (~40%)
• Some residues can move
– Disorder: missing loops/side chains
– Alternate conformation

History: Crystallisation
Beware of crystal packing
• One copy of the protein can influence the next

History: Crystallisation
Chemicals are used for crystallisation
• Buffers to stabilise the pH
• Precipitants
– Change solubility of the protein
– Neutralise local charges
– Bind water
– High concentrations are used
• Compounds compete with natural ligands
• Examples:
– Polyethylene glycol (PEG)
– Ammonium sulphate

History: Crystallisation
Beware of the crystallisation conditions

History: X-ray diffraction
[Figure: typical experiment, showing the X-ray source and the detector]

History: X-ray diffraction
• X-rays interact with electrons
– Atoms
with few electrons (H, Li) do not diffract well
• X-rays cause damage to the protein
– Acidic groups (ASP and GLU) can be destroyed
– Disulphide bridges are broken
– Hydrogens are stripped
– Cooling crystals in liquid nitrogen helps
• Glycerol added to the crystal!

History: X-ray diffraction
• We are not using a microscope
• We don’t measure everything we need
$$\rho(x,y,z) = \frac{1}{V}\sum_{hkl} |F_{hkl}| \, e^{-2\pi i\,(hx + ky + lz - \alpha_{hkl})}$$
Measured: $|F_{hkl}|$. Missing: phase $\alpha_{hkl}$
X-ray diffraction gives an indirect and incomplete measurement

History: Model building and refinement
Iterative process
[Figure: iterative cycle linking measured X-ray diffraction data, initial phases, electron density maps, the structure model, and (via FT) phases + calculated X-ray data]

History: Model building and refinement
Two types of maps
1. Regular electron density map (2mFo-DFc)
2. Difference map (mFo-DFc)

History: Model building and refinement
Fitting atoms to the ED map and trying to remove difference density peaks

History: Model building and refinement
• Requires skill and experience
• Requires time and patience
• Requires good software
Lack of any of these can be seen in the final PDB file

History: Deposition at the PDB
• Both coordinates and experimental X-ray data are deposited
• PDB standardises files and adds annotation
• Sometimes things go wrong

History: Deposition at the PDB
LINKs between alternate conformations

History: Deposition at the PDB
Un-biological LINKs (in 1a1a)
LINK   C   ACE C 100    N   PTH C 101
LINK   C   PTH C 101    N   GLU C 102
LINK   CF  PTH C 101    OG  SER A 188
LINK   N   DIP C 103    C   GLU C 102
LINK   C   ACE D 100    N   PTH D 101
LINK   C   PTH D 101    N   GLU D 102
LINK   N   DIP D 103    C   GLU D 102
Think of what happened to the structure before you downloaded it

Validation: X-ray specific validation
Use the experimental data
• Resolution says very little about the structure
• (free) R-factor gives the overall fit of the structure to the experimental data
• For biological interpretation more detail is needed
Use the maps

Validation: X-ray specific validation
Which is the better structure of berenil
bound to DNA?
PDB id   Resolution (Å)   R
268d     2.0              0.160
1d63     2.0              0.183

Validation: X-ray specific validation
The real-space R-factor (RSR)
• A per-residue score of how well the atoms fit the map
• Works like the R-factor (lower is better)

Validation: X-ray specific validation
Maps can help distinguish the good and bad bits of a structure

Validation: Things you can find in maps
Poorly fitted side-chains
Evil peptides

Validation: Things you can find in maps
The wrong drug

Validation: Things you can find in maps
Sequence error K -> R
• Accidental mutant
• Also a missing sulfate

Validation: Things you can find in maps
Missing water
Missing alternate conformation

Validation: Checking maps
• Visualisation in Coot
– http://www2.mrc-lmb.cam.ac.uk/personal/pemsley/coot/
• Get maps and real-space R values from the Electron Density Server
– http://eds.bmc.uu.se/eds/index.html
– Direct interface with Coot
• Get maps and updated models from PDB_REDO (practical session)
Maps show things you cannot see otherwise

Optimisation: Structures in the PDB
• Solved by a diverse group of scientists
– People make errors & gain experience
• Since 1976
– Structures are not updated
• Solved with the methods of their era
– Methods improve over time
Structures in the PDB do not represent the best we can do NOW

Optimisation: Improve structures in PDB
• Take structure + experimental data
• Use latest X-ray crystallography methods
– Decision making: use case-specific methods
– Create new methods when needed
• Improve model quality
– Fit with experimental data
– Geometric quality
• Fix errors
PDB_REDO

Optimisation: PDB_REDO method
Step 1: prepare data
• Clean-up structure and X-ray data
• Data mining
Step 2: establish baseline
• Fit with experimental data (R-factors)
• Geometric quality
– Validation with WHAT_CHECK

Optimisation: PDB_REDO method
Step 3: re-refine structure (with Refmac)
• Improve fit with experimental data
– Use restraints to improve geometric quality
• Improve description of protein dynamics
– Concerted
movement of groups of atoms (TLS)
– Anisotropic movement of individual atoms

Optimisation: PDB_REDO method
Step 4: rebuild structure
• Delete nonsense waters
• Flip peptide planes
• Rebuild side-chains
– Add missing ones
– Optimise H-bonding
Step 5: validate structure
• Geometry
• Density map fit
• Ligand interactions

Availability: PDB_REDO databank
• www.cmbi.ru.nl/pdb_redo
– > 72,000 structures (98%)
– Detailed methods & reprints
• Directly in molecular graphics software
– YASARA
– CCP4mg
– Coot (needs plugin)
– PyMOL (needs plugin)
• Linked via PDBe & RCSB

Optimisation: Does it work? (12,000 structures)
• Improved fit with the data
• Better geometry
[Figure: three bar charts (Ramachandran plot, Fine packing, R-free) showing the percentage of structures that became Worse, Same or Better]

Optimisation: MolProbity validation
[Figure: MolProbity validation of the PDB and PDB_REDO models of 1eoi]

Optimisation: Electrostatics calculations
• ‘Missing’ positive lysine atoms distort electrostatics calculations
• Adding missing atoms correctly describes C-terminus interaction with side chains

Optimisation: Protein-ligand interaction
• Wrong peptide plane in peptide ligand
• Fixed by PDB_REDO
• Better understanding of H-bonds in the interaction

Optimisation: Protein-protein interaction
• Packing interface with poor ionic interactions
• Rebuilt interface properly describes ionic dimerisation interactions
Optimised structures give a better view of the biology of the protein

PDB_REDOers
Amsterdam:
• R Joosten
• K Joosten
• A Perrakis
Nijmegen:
• T te Beek
• M Hekkelman
• G Vriend
Cambridge:
• G Murshudov
• F Long
Key contributors: Eleanor Dodson, Ian Tickle, Paul Emsley, Ethan Merritt, Elmar Krieger, Thomas Lütteke, Rachel Kramer Green, Sanchayita Sen
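
Supplementary note on the (free) R-factor from the validation slides: it is conventionally defined as R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, and R-free is the same quantity computed only over a test set of reflections kept out of refinement. The sketch below is a minimal illustration of that definition, not the code used by Refmac or PDB_REDO. The function names and the amplitude values are hypothetical, and it assumes the observed and calculated amplitudes are already on the same scale.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes both lists hold structure-factor amplitudes for the same
    reflections, in the same order and on the same scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by a boolean test-set flag and return (R-work, R-free)."""
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        if is_free:
            free_obs.append(fo)
            free_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return r_factor(work_obs, work_calc), r_factor(free_obs, free_calc)


if __name__ == "__main__":
    # Made-up amplitudes for four reflections (illustration only).
    f_obs = [100.0, 50.0, 25.0, 25.0]
    f_calc = [90.0, 55.0, 25.0, 30.0]
    free_flags = [False, False, False, True]  # last reflection is in the test set

    print("R      =", r_factor(f_obs, f_calc))
    r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
    print("R-work =", r_work)
    print("R-free =", r_free)
```

Lower values mean a better overall fit, but as the slides stress, this is a single global number; per-residue scores such as the real-space R-factor and the maps themselves are needed for local detail.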