BioSimGRID and BioSimGRID ‘lite’: Towards a worldwide repository for biomolecular simulation
www.biosimgrid.org
Philip C Biggin
http://indigo1.biop.ox.ac.uk
phil@biop.ox.ac.uk

Overview
• Introduction
  - Motivation
  - Consortium
  - Case studies: added value from comparisons
• Design
  - Architecture
  - Data schema
• How to use
  - Deposition
  - Analysis
  - Worldwide application
• The Future
  - Towards computational systems biology

Current Paradigm for MD Simulations
• Target selection: literature based; interesting protein/problem
• System preparation: highly interactive; slow; idiosyncratic
• Simulation: diversity of protocols
• Analysis: highly interactive; slow; idiosyncratic
• Dissemination: traditional (papers, posters, talks)
• Archival: ‘archive’ data … and then mislay the tape!
• No third-party involvement

Integrating Simulations and Structural Biology of Proteins
[Diagram linking bioinformatics & structural biology to simulation: novel structure (RCSB); sequence alignment; biomedically relevant homologue(s); homology model(s); MD simulations (bacterial K channel, mammalian K channel; dynamics in membrane); BioSimGRID biomolecular simulation database; comparative analysis; interaction site dynamics; evaluation/refinement of model; biological and pharmacological simulation & modelling, e.g. drug discovery, drug docking calculations]

Consortium
• Oxford: Mark Sansom, Paul Jeffreys, Bing Wu, Kaihsu Tai
• Southampton: Jon Essex, Simon Cox, Stuart Murdock, Muan Hong Ng, Hans Fangohr, Steven Johnston
• London: David Moss
• Nottingham: Charlie Laughton
• York: Leo Caves
• Bristol: Adrian Mulholland
[Map of sites: York, Nottingham, RAL, Oxford, Bristol, Southampton, London]

Comparative Simulations: Drug Receptors
• Why? To increase the significance of results
• Sampling: long simulations and multiple simulations
• Sampling via biology: exploiting evolution
• Biology emerges from comparisons, e.g. mammalian receptor vs.
bacterial binding protein

Rat GluR2 EC Fragment
[Figure: GluR2 domains D1 and D2 with bound glutamate]
• Major receptor in mammalian brains; drug target
• MD simulations with and without bound ligands
• Analyse inter-domain motions

GluR2: Flexibility & Gating…
[Figure: RMSD (Å) vs. time (ns), 0 to 2 ns, for empty, +Glu (glutamate) and +Kai (kainate) simulations; panel labels “ON” and “OFF”]
• Flexibility depends on ligand occupancy & species
• Gating mechanism: flexibility decreases on channel activation
• But … incomplete sampling
• Need: longer simulations & comparative simulations

GlnBP: A Bacterial Binding Protein
[Figure: X-ray structures and MD simulations, empty vs. Gln bound]
• GlnBP: bacterial two-domain periplasmic binding protein
• Similar fold to mammalian GluR2
• X-ray shows that ligand binding induces domain closure
• MD shows that ligand binding reduces inter-domain motions (cf. GluR2 simulations)

Case Study 2: OMPLA and AChE
[Figure: OMPLA (outer-membrane phospholipase) and AChE (acetylcholinesterase)]
So how do we compare them?
• Similar active sites or similar motions
• Different structures
• Simulated with different MD packages (analysis difficult, if not visualization)
• On different hard drives/tapes/CDs/DVDs
• Under different graduate students’ desks
• Under different postdocs’ beds
• In different rubbish bins!

Answer…
Create a worldwide repository of molecular simulations.
BioSimGrid = BioSimDB + Toolkits + Integration

BioSimGrid Architecture…
• Clients: GUI, web application, Python application (connecting over HTTP(S) or SSH)
• Service layer: Apache / Tomcat / SSL / Python
  - Authentication, authorisation, accounting
  - Data Retrieval Tool, Data Deposition Tool, Analysis Tool, HTML Generator, Trajectory Query Tool, Video/Img Editor, SQL Engine
• Middleware (over TCP/IP): BioSim Data Engine / Storage Resource Broker
• DB/Data layer (over TCP/IP): database and flat files

Storage comparison        Database   Flat files
Size / GB                    7.5        3.0
Random access / s          560.8       18.6
Sequential access / s      389.0        5.5

Cross-software Analysis…
• BioSimDB = PDB (or NDB) for MD: enables discovery of new science (cf.
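To read the database vs. flat-file benchmark (7.5 vs 3.0 GB; 560.8 vs 18.6 for random access; 389.0 vs 5.5 for sequential access) as ratios, the sketch below divides each database figure by its flat-file counterpart. It assumes the "/ s" headers follow the same quantity/unit convention as "Size / GB", so they are times in seconds. The slides do not state that explicitly. The helper name `db_to_flat_ratio` is illustrative only.

```python
# Figures copied from the storage comparison on the architecture slide.
# Assumption: the access rows are elapsed times in seconds (quantity / unit).
TABLE = {
    "size_gb": {"database": 7.5, "flat_files": 3.0},
    "random_access_s": {"database": 560.8, "flat_files": 18.6},
    "sequential_access_s": {"database": 389.0, "flat_files": 5.5},
}


def db_to_flat_ratio(metric: str) -> float:
    """Return the database value divided by the flat-file value for one row."""
    row = TABLE[metric]
    return row["database"] / row["flat_files"]


if __name__ == "__main__":
    for metric in TABLE:
        print(f"{metric}: database / flat files = {db_to_flat_ratio(metric):.1f}x")
```

On this reading, the flat files are 2.5 times smaller and about 30 times (random) and 70 times (sequential) quicker to read. The slides do not say how the benchmark was run.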
genomics/proteomics initiatives)
[Diagram: CHARMM, AMBER, GROMACS, NAMD, LAMMPS and TINKER all feeding BioSimDB]

It’s a Distributed Database
• Nobody has enough disk space in one place anyway
• Distributed and duplicated: any piece of information is stored in at least two sites, for resilience

Current Architecture
• oxford.biosimgrid.org: BioSim Data Engine; Services; IDA; MCAT; DB Interface, DB Engine, Database; SRB Server, SRB Agent; F/F Interface, F/F Engine, Flat Files; Cache
• soton.biosimgrid.org: BioSim Data Engine; Services; SRB Agent, SRB Server; F/F Interface, F/F Engine, Flat Files; Cache; MCAT; IDA; DB Interface, DB Engine, Database

Data Schema
• The hierarchy is like that in the PDB: chain → residue → atom → coordinate
• …but also extended in the time dimension: frames

Metadata…
• …is data about data: MD setup, parameters, instantaneous properties, etc.
• People currently write this in papers
• People forget something
• The disciplined way: a structured schema

Deposition…
• Unified deposition for trajectories from any package

Analysis
BioSimDB Toolkit: analysis tools
• Radius of gyration
• Surface and volume
• RMSD/RMSF
• Centre of mass
• Inter-atomic distances
• Distance matrix
• Internal angles
• Principal component analysis
• Average structure

Current Implementation

New Workflow with BioSimGrid
• Target selection: literature based; interesting protein/problem
• Perform simulation (or use someone else’s)
• Protocols more systematically recorded/checked/confirmed
• Archive data to BioSimGrid
• Analyse shared data (either locally or distributed)
• Dissemination: traditional (papers, posters, talks)
• Store results in BioSimGrid
• Third parties can analyse data you deposit

That’s dandy, but who is this aimed at?
• Novice and expert…
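The schema uses the PDB-like hierarchy of atoms and coordinates and extends it over frames. The sketch below shows how a few of the listed toolkit analyses (centre of mass, radius of gyration, RMSD and RMSF) reduce to simple loops over that frames × atoms × xyz layout. It is a minimal illustration that assumes a trajectory held in memory as a list of frames of (x, y, z) tuples. That layout is not BioSimGrid's storage format. All function names are hypothetical. The RMSD applies no superposition (fitting), and periodic boundaries are ignored, unlike most production tools.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical in-memory layout (not BioSimGrid's storage format):
# a trajectory is a list of frames; each frame holds one (x, y, z) per atom.
Coord = Tuple[float, float, float]
Frame = List[Coord]
Trajectory = List[Frame]


def _sq_dist(a: Coord, b: Coord) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a[k] - b[k]) ** 2 for k in range(3))


def centre_of_mass(frame: Frame, masses: Sequence[float]) -> Coord:
    """Mass-weighted mean position of the atoms in one frame."""
    total = sum(masses)
    x, y, z = (
        sum(m * atom[k] for m, atom in zip(masses, frame)) / total for k in range(3)
    )
    return (x, y, z)


def radius_of_gyration(frame: Frame, masses: Sequence[float]) -> float:
    """Mass-weighted RMS distance of the atoms from their centre of mass."""
    com = centre_of_mass(frame, masses)
    weighted = sum(m * _sq_dist(atom, com) for m, atom in zip(masses, frame))
    return math.sqrt(weighted / sum(masses))


def rmsd(frame: Frame, reference: Frame) -> float:
    """RMSD between two frames with the same atom order; no fitting is applied."""
    total = sum(_sq_dist(a, b) for a, b in zip(frame, reference))
    return math.sqrt(total / len(reference))


def rmsd_series(traj: Trajectory, reference: Frame) -> List[float]:
    """RMSD of every frame against a reference frame (one value per time step)."""
    return [rmsd(frame, reference) for frame in traj]


def rmsf(traj: Trajectory) -> List[float]:
    """Per-atom RMS fluctuation about each atom's time-averaged position."""
    n_frames = len(traj)
    values = []
    for i in range(len(traj[0])):
        mean = tuple(sum(frame[i][k] for frame in traj) / n_frames for k in range(3))
        values.append(
            math.sqrt(sum(_sq_dist(frame[i], mean) for frame in traj) / n_frames)
        )
    return values


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in ref]  # every atom moved 1 Å along x
    traj = [ref, shifted]
    print(rmsd_series(traj, ref))  # [0.0, 1.0]
    print(rmsf(traj))  # [0.5, 0.5, 0.5]
    print(radius_of_gyration(ref, [1.0, 1.0, 1.0]))
```

Each analysis works on whole frames. This is one reason a schema that adds a time axis to the PDB hierarchy suits trajectory data.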
Novice (web/GUI)
• Makes selections
• Guided through the options
• Can only do specific things
• Difficult to make mistakes
Expert (scripting)
• Python interpreter
• Much available
• Reasonably unrestricted

Example sessions
[Screenshots of example sessions]
Even in script mode the syntax is quite informative:

    FC = FrameCollection('2, 100-200')   # frame selection string
    myRMSD = RMSD(FC)                    # RMSD analysis over the selected frames
    myRMSD.createPNG()                   # render the result as a PNG image

Aim: provide biochemists with little computational experience with a means of analysing computational data and obtaining meaningful results.

Example session: viewlet of a session (Demo4.html)

BioSimGrid ‘Lite’
• Light version before the final rollout
• Provides equilibrated lipid bilayer boxes
• Also provides ontogeny, i.e. how the box came about:
  - …metadata
  - …the equilibration process (all the frames)

Deliverables to Date…
• Database schema
• Sample database (with test trajectories)
• Prototype shared between 2 sites
• Analysis tools: preliminary versions (about 14 tools)
• Interface to database for data retrieval
• Python hosting environment

Roadmap
• Dec 2002: project started
• July 2003: (internal) prototype
• September 2003: working prototype (All Hands Meeting)
• November 2003: test ‘real world’ applications
• December 2003: multi-site prototype
• 2004: multi-site deposition of data
• 2005: open up to additional groups for deposition/testing

If you are interested…
The team would like to hear from interested parties, especially those with new ideas.
Benefits to you
• New directions are implemented
• Toolkit suits your needs
• Shared development of code
• Faster and more thorough development
BioSimGrid Benefits
• Larger user community
• More work gets done
• Code is efficient
• BioSimGrid and the community are successful

Future Directions in the GRID context
1. HTMD: simulations coupled to structural genomics
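The script example selects frames with a compact string, `FrameCollection('2, 100-200')`. The slides do not define the grammar of that string. The sketch below is therefore a hypothetical parser that treats it as a comma-separated list of non-negative single indices and inclusive `start-end` ranges. In BioSimGrid the leading `2` may instead identify a trajectory. `parse_selection` is an illustrative name and is not part of the BioSimGrid API.

```python
from typing import List


def parse_selection(spec: str) -> List[int]:
    """Expand a selection string such as '2, 100-200' into sorted, unique indices.

    Hypothetical grammar: comma-separated non-negative integers or inclusive
    'start-end' ranges. BioSimGrid's FrameCollection may interpret fields differently.
    """
    indices = set()
    for field in spec.split(","):
        field = field.strip()
        if not field:
            continue  # tolerate trailing or doubled commas
        if "-" in field:
            start_text, end_text = field.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range in selection: {field!r}")
            indices.update(range(start, end + 1))  # range end is inclusive
        else:
            indices.add(int(field))
    return sorted(indices)


if __name__ == "__main__":
    frames = parse_selection("2, 100-200")
    print(len(frames), frames[:3], frames[-1])  # 102 [2, 100, 101] 200
```

The parser uses inclusive ranges because "100-200" reads that way in prose. A set removes overlaps such as '3, 3-4'.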
   (Diamond Light Source)
2. Computational systems biology: virtual outer membrane
   (HPCx)
3. Multiscale biomolecular simulations: from QM/MM to meso-scale modelling
   (GRID-enabled simulations)

Structural Genomics & HTMD
[Diagram: synchrotron; compute GRID; MD database (BioSimGrid); novel biology…]
• Overall vision: simulation as an integral component of structural genomics
• Needs capacity computation: GRID?
• MD database (distributed): BioSimGRID

Towards a Virtual Outer Membrane (vOM)
[Diagram of outer-membrane proteins and partners: Pi, TolC, OMPLA, OmpT, PiBP, TonB, FhuD, FhuA, OmpX, PhoE, OpcA, OmpF, OmpA, LamB, MalE, d+ Pi]
• First step towards computational systems biology: a suitable system
• Bacterial OMs: 5 or 6 proteins make up 90% of the protein content
• Structures or good homology models of the proteins are available
• Complex lipid: the outer leaflet is lipopolysaccharide (LPS)
• Minimum system size ca. 2.5 × 10^6 atoms; simulation times ca. 50 ns
• cf. current FhuA simulation: 80,000 atoms & 10 ns; need HPCx

Multiscale Biomolecular Simulations
[Diagram: QM (Bristol); drug binding (Southampton); protein motions (Oxford); drug diffusion (London)]
• Membrane-bound enzymes: major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)
• Complex multiscale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
• Need for GRID-based integrated simulations
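The vOM slide compares the target system (ca. 2.5 × 10^6 atoms for ca. 50 ns) with the current FhuA simulation (80,000 atoms for 10 ns). The sketch below puts the two on a common scale. It uses atoms × simulated nanoseconds as a crude workload proxy, which assumes cost grows linearly in both system size and simulated time. That is a simplification: long-range electrostatics methods such as PME scale closer to N log N, and parallel communication overheads are ignored.

```python
def atom_ns(n_atoms: float, sim_ns: float) -> float:
    """Crude workload proxy: number of atoms multiplied by simulated nanoseconds."""
    return n_atoms * sim_ns


VOM = atom_ns(2.5e6, 50)     # virtual outer membrane target from the slide
FHUA = atom_ns(80_000, 10)   # current FhuA simulation from the slide

if __name__ == "__main__":
    print(f"vOM / FhuA workload ratio ~ {VOM / FHUA:.0f}")  # vOM / FhuA workload ratio ~ 156
```

On this crude measure the vOM target is roughly 150 times the FhuA workload. This is consistent with the slide's point that HPCx-class resources are needed.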
Acknowledgements
Oxford: Professor Mark Sansom, Dr Carmen Domene, Dr Alessandro Grottesi, Dr Andrew Hung, Dr Daniele Bemporad, Dr Shozeb Haider, Dr Kaihsu Tai (curation and integration), Dr George Patargias, Oliver Beckstein, Jennifer Johnston, Syma Khalid, Jorge Pikunic, Pete Bond, Zara Sands, Jonathan Cuthbertson, Sundeep Deol, Jeff Campbell, Yalini Pathy, Loredana Vaccaro, Shiva Amiri, Katherine Cox, Robert d’Rozario, John Holyoake, Samantha Kaye, Anthony Ivetac, Sylvanna Ho
Oxford e-Science Centre: Professor Paul Jeffreys, Dr Bing Wu (database management), Matthew Dovey, Ivaylo Kostadinov
Southampton: Dr Stuart Murdock (generic analysis tools), Dr Muan Hong Ng (data retrieval), Dr Hans Fangohr, Steven Johnston, Prof Simon Cox, Dr Jon Essex
Elsewhere: Leo Caves (York), Charles Laughton (Nottingham), David Moss (Birkbeck), Oliver Smart (Birmingham), Adrian Mulholland (Bristol), Marc Baaden (Paris)
Funding: BBSRC, EC (TMR), MRC, DTI, The Wellcome Trust, OeSC (EPSRC & DTI), EPSRC, GSK, OSC (JIF)

More information…
team@biosimgrid.org
www.biosimgrid.org