GRID and e-science in Drug Discovery

Rod Hubbard
Director, Structural Sciences, RiboTargets Ltd
Senior Consultant, Accelrys Ltd
Professor of Structural and Biological Chemistry, University of York
China n+n meeting, E-science centre, Edinburgh, 26th June 2002
The Drug Development Process
[Figure: timeline in years (1 to 12) across the stages Discovery, Preclinical, Clinical trial phase, Regulatory and Market, with the probability of reaching market marked at 23%, 31%, 64% and 75%. Source: UBS, 1998. Annotations on the figure: Computation, (MRC Clinical E-Science Framework), IT and informatics.]
The Discovery Phase
Target Identification => Hit Generation => Lead Optimisation
• Where most of the excitement and interest is in research
• Where most of the technology companies are working to bring new methods for better, faster drug discovery
• Main overlap with Research Council remits
• Main areas where demonstrator / pilot grid projects have been established
Target Identification
Target Identification (6 mo) => Hit Generation (12 mo) => Lead Optimisation (24 mo)
For a particular disease - linking a gene to a particular biology
Validation – interfering with target => desired therapeutic effect
For some companies – large scale screening to identify and validate genes across many therapeutic areas
For all companies – access to published information
Bioinformatics
• Massive amounts of experimental information
• Disparate databases / data types
• Making connections between data
• Capturing error
• MyGrid – EPSRC pilot project
• Advantages? of “new”, computerate community
• To date, relatively open data sharing
Value of Structural Biology in Target Identification
What does the protein do?
Mechanism – example of estrogen receptor
[Figure: chemical structures of three estrogen receptor ligands: E2 (agonist), RAL (selective antagonist) and ICI (full antagonist)]
Structural Proteomics
[Diagram: Sequences => Identify Homology & Build model, drawing on a database of known structures]
From structure => assign function
For pharma => target identification / validation
Structure-based proteome annotation – BBSRC Pilot project
Coordination of Sequence and Structural family data – MRC Pilot
Commercial – Inpharmatica and Accelrys
Hit Generation
Finding a compound (hit) that affects target function
- virtual screening
Building diverse libraries for screening
- combinatorial chemistry and cheminformatics
Structural biology
- high throughput crystallography / NMR / mass spec
Virtual Screening
[Diagram: virtual screening workflow. Boxes: Binding Surface; Large Compound library; Similarity Searches etc.; Virtual Libraries; Parallel Virtual Screening (RiboDock®); Virtual Hits; Drug Profiling; Low/Medium Throughput Assays; ‘Real’ Hits; Lead Optimisation]
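The "Similarity Searches" step in the workflow is not specified on the slide. As one common illustration only, the sketch below ranks library compounds against a query by Tanimoto coefficient over sets of fingerprint bits. The fingerprints, compound names and the 0.5 threshold are all made up for the example and do not come from the talk.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query, library, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer is the index of a set bit.
library = {
    "cmpd-A": frozenset({1, 4, 7, 9}),
    "cmpd-B": frozenset({1, 4, 8}),
    "cmpd-C": frozenset({2, 3, 5}),
}
query = frozenset({1, 4, 7})

hits = similarity_search(query, library)
print(hits)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.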
Virtual Screening
Cheminformatics
• Large libraries of “real” compounds (>2m)
• Very large libraries of virtual compounds
• Methods for generating / handling / analysing / curating such large datasets
• Comb-e-Chem (EPSRC Pilot Project) (but materials?)
• Projects within large pharma investigating how to distribute parallel calculations around company intranet. (United Devices, Platform, Entropia)
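Scoring each compound is independent of the others, which is why this kind of calculation distributes well. A minimal sketch of the pattern follows, assuming a thread pool as a stand-in for machines on an intranet. The function `score_compound` is a hypothetical placeholder, not RiboDock or any vendor's API, and its "score" has no chemical meaning.

```python
from concurrent.futures import ThreadPoolExecutor


def score_compound(compound: str) -> float:
    # Hypothetical placeholder for a docking score (lower is better here).
    # A real score would come from a docking or modelling program.
    return -float(len(set(compound)))


def chunked(items, size):
    """Split the library into work units of at most `size` compounds."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def score_chunk(chunk):
    """Work done by one worker: score every compound in its chunk."""
    return [(c, score_compound(c)) for c in chunk]


def screen(library, workers=4, chunk_size=2, top_n=3):
    """Farm chunks out to workers, gather the results, keep the best."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(score_chunk, chunked(library, chunk_size)))
    scored = [pair for chunk in results for pair in chunk]
    return sorted(scored, key=lambda pair: pair[1])[:top_n]


# Tiny illustrative library of SMILES strings.
library = ["CCO", "c1ccccc1", "CC(=O)O", "CCN", "OCCO"]
top = screen(library)
print(top)
```

The chunk size is the main tuning knob in such a scheme: larger chunks reduce scheduling overhead, smaller ones balance load better across uneven machines.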
Lead Optimisation
Simultaneous optimisation of compound properties – cycles of design, synthesis, assay, (structure)
Detailed molecular modelling
ADMET calculation / measurement
Design of focussed combinatorial libraries
Focussed – more informatics demands than computational
“GRID” activity in pharma
Small companies – building distinctive technology platforms
Bioinformatics – Inpharmatica, Accelrys
Cheminformatics – RiboTargets, Astex, de Novo
Large pharma – starting to explore internal deployment of computational chemistry calculations
Discussions on how to exploit distributed databases for knowledge generation
Need for research councils to reconnect once pilot / demonstrators available
Example
• All pharma needs access to compounds
• Many suppliers, generating new compounds in new formats
• Currently – each pharma receiving a CD with structures
• Entered into corporate databases.
• Analysis => selection => ordering.
• Can be 1000s of compounds a month.
• Area for GRID?
• local databases maintained by suppliers
• Accessed / searched / ordered over the GRID
• Ensures latest compounds available – saves pharma time
• Issues – cultural – getting all suppliers to agree
• Commercial – security for “customers”
• Commercial – competition amongst suppliers
• Many other areas (similar issues)
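The supplier scenario above can be sketched as a federated search: each supplier keeps its own catalogue, and the pharma client queries all of them and merges the answers instead of loading CDs into a corporate database. Everything in this sketch is an assumption made for illustration, including the class names, the molecular-weight filter and the in-memory catalogues standing in for remote services. Security and commercial access control, which the slide flags as open issues, are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    supplier: str
    catalogue_id: str
    smiles: str
    mol_weight: float


class SupplierCatalogue:
    """Stand-in for a database maintained locally by one supplier."""

    def __init__(self, name, compounds):
        self.name = name
        self._compounds = list(compounds)

    def search(self, max_mol_weight):
        # A real service would expose a richer query interface.
        return [c for c in self._compounds if c.mol_weight <= max_mol_weight]


def federated_search(catalogues, max_mol_weight):
    """Query every supplier and keep the first offer seen per structure.

    Matching on the raw SMILES string is naive; real de-duplication
    would need canonicalised structures.
    """
    first_offer = {}
    for catalogue in catalogues:
        for compound in catalogue.search(max_mol_weight):
            first_offer.setdefault(compound.smiles, compound)
    return list(first_offer.values())


supplier_a = SupplierCatalogue("SupplierA", [
    Compound("SupplierA", "A-001", "CCO", 46.07),
    Compound("SupplierA", "A-002", "c1ccccc1", 78.11),
])
supplier_b = SupplierCatalogue("SupplierB", [
    Compound("SupplierB", "B-001", "CCO", 46.07),
    Compound("SupplierB", "B-002", "CC(=O)O", 60.05),
])

results = federated_search([supplier_a, supplier_b], max_mol_weight=70.0)
for compound in results:
    print(compound.supplier, compound.catalogue_id, compound.smiles)
```

Because each query runs against the suppliers' own live catalogues, the client sees the latest compounds without a local import step, which is the time saving the slide points to.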
What’s next
• Increasing interest / investment in distributed computing for computational chemistry
• Importance of pilot / demonstrator projects
• Security – internal and particularly external
• Going beyond just providing massive parallel computing power
• Successful demonstrators will stimulate the industry
• Cultural and commercial issues