GRID and e-science in Drug Discovery

Rod Hubbard
Director, Structural Sciences, RiboTargets Ltd
Senior Consultant, Accelrys Ltd
Professor of Structural and Biological Chemistry, University of York

China n+n meeting, E-science centre, Edinburgh, 26th June 2002

The Drug Development Process
[Figure: timeline of the drug development process over 12 years, covering Discovery, Preclinical, Clinical trial phases, Regulatory and Market. Probability of reaching market shown along the timeline: 23%, 31%, 64%, 75%. Source: UBS, 1998. Annotations: Computation (MRC Clinical E-Science Framework); IT and informatics.]

The Discovery Phase
Target Identification / Hit Generation / Lead Optimisation
• Where most of the excitement and interest is in research
• Where most of the technology companies are working to bring new methods for better, faster drug discovery
• Main overlap with Research Council remits
• Main areas where demonstrator / pilot grid projects have been established

Target Identification
Target Identification 6 mo / Hit Generation 12 mo / Lead Optimisation 24 mo
For a particular disease - linking a gene to a particular biology
Validation – interfering with target => desired therapeutic effect
For some companies – large scale screening to identify and validate genes across many therapeutic areas
For all companies – access to published information

Bioinformatics
• Massive amounts of experimental information
• Disparate databases / data types
• Making connections between data
• Capturing error
• MyGrid – EPSRC pilot project
• Advantages? of “new”, computerate community
• To date, relatively open, data sharing

Value of Structural Biology in Target Identification
What does the protein do?
Mechanism – example of estrogen receptor
[Figure: chemical structures of E2 (agonist), RAL (selective antagonist) and ICI (full antagonist)]

Structural Proteomics
[Diagram labels: Sequences; Identify Homology & Build model; From structure => assign function; Database of known structures]
For pharma => target identification / validation
Structure-based proteome annotation – BBSRC Pilot project
Coordination of Sequence and Structural family data – MRC Pilot
Commercial – Inpharmatica and Accelrys

Hit Generation
Finding a compound (hit) that affects target function - virtual screening
Building diverse libraries for screening - combinatorial chemistry and cheminformatics
Structural biology - high throughput crystallography / NMR / mass spec

Virtual Screening
[Diagram labels: Binding Surface; Parallel Virtual Screening (RiboDock®); Virtual Hits; Drug Profiling; Large Compound library; Similarity Searches etc.; Virtual Libraries; ‘Real’ Hits; Lead Optimisation; Low/Medium Throughput Assays; Virtual Screening]

Cheminformatics
• Large libraries of “real” compounds (>2m)
• Very large libraries of virtual compounds
• Methods for generating / handling / analysing / curating such large datasets
• Comb-e-Chem (EPSRC Pilot Project) (but materials?)
• Projects within large pharma investigating how to distribute parallel calculations around company intranet.
  (United Devices, Platform, Entropia)

Lead Optimisation
Simultaneous optimisation of compound properties – cycles of design, synthesis, assay, (structure)
Detailed molecular modelling
ADMET calculation / measurement
Design of focussed combinatorial libraries
Focussed – more informatics demands than computational

“GRID” activity in pharma
Small companies – building distinctive technology platforms
Bioinformatics – Inpharmatica, Accelrys
Cheminformatics – RiboTargets, Astex, de Novo
Large pharma – starting to explore internal deployment of computational chemistry calculations
Discussions on how to exploit distributed databases for knowledge generation
Need for research councils to reconnect once pilot / demonstrators available

Example
• All pharma needs access to compounds
• Many suppliers, generating new compounds in new formats
• Currently – each pharma receiving a CD with structures
• Entered into corporate databases.
• Analysis => selection => ordering.
• Can be 1000s of compounds a month.
• Area for GRID?
• Local databases maintained by suppliers
• Accessed / searched / ordered over the GRID
• Ensures latest compounds available – saves pharma time
• Issues – cultural – getting all suppliers to agree
• Commercial – security for “customers”
• Commercial – competition amongst suppliers
• Many other areas (similar issues)

What’s next
• Increasing interest / investment in distributed computing for computational chemistry
• Importance of pilot / demonstrator projects
• Security – internal and particularly external
• Going beyond just providing massive parallel computing power
• Successful demonstrators will stimulate the industry
• Cultural and commercial issues
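
Illustrative sketch (not from the talk)
The "Example" slide proposes that suppliers keep their own compound databases and that pharma searches them over the GRID instead of loading CDs. A minimal Python sketch of that idea follows. Everything in it is an assumption made for illustration: the Compound and SupplierCatalogue classes, the federated_search function, the supplier names and the catalogue IDs are hypothetical, the "remote" query is an in-memory filter, and duplicates are detected by naive SMILES string matching. A real service would also need the security and commercial agreements listed on the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    supplier: str
    catalogue_id: str
    smiles: str        # structure written as a SMILES string
    mol_weight: float  # g/mol


class SupplierCatalogue:
    """Hypothetical stand-in for a database a supplier maintains locally."""

    def __init__(self, name, compounds):
        self.name = name
        self._compounds = list(compounds)

    def search(self, max_mol_weight):
        # In a real deployment this query would run remotely, behind authentication.
        return [c for c in self._compounds if c.mol_weight <= max_mol_weight]


def federated_search(catalogues, max_mol_weight):
    """Query every supplier in parallel and merge hits, one entry per structure."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the catalogues list.
        per_supplier = list(pool.map(lambda cat: cat.search(max_mol_weight), catalogues))
    merged = {}
    for hits in per_supplier:
        for compound in hits:
            # Naive de-duplication: identical SMILES strings count as the same
            # structure, and the first supplier offering it is kept.
            merged.setdefault(compound.smiles, compound)
    return sorted(merged.values(), key=lambda c: c.mol_weight)


# Made-up catalogue entries for two hypothetical suppliers.
catalogues = [
    SupplierCatalogue("SupplierA", [
        Compound("SupplierA", "A-001", "CCO", 46.07),
        Compound("SupplierA", "A-002", "c1ccccc1", 78.11),
    ]),
    SupplierCatalogue("SupplierB", [
        Compound("SupplierB", "B-101", "CCO", 46.07),
        Compound("SupplierB", "B-102", "CC(=O)Oc1ccccc1C(=O)O", 180.16),
    ]),
]

hits = federated_search(catalogues, max_mol_weight=100.0)
for hit in hits:
    print(hit.supplier, hit.catalogue_id, hit.smiles, hit.mol_weight)
```

Because each supplier answers from its own current data, the caller always sees the latest catalogue without maintaining a local copy, which is the time saving the slide points to.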