Virtual screening and collaborative computing

UNIVERSITÀ DEGLI STUDI DI MILANO
Facoltà di Scienze del Farmaco
Virtual screening and collaborative computing:
a new frontier in drug discovery
Alessandro Pedretti
XI Congreso Venezolano de Química
Caracas, June 18, 2013
Overview
Collaborative computing applied in a computational chemistry laboratory.
WarpEngine paradigm to distribute the calculations in the local network.
Virtual screening setup to choose the best software and parameters.
Two WarpEngine applications to evaluate its performance.
Short WarpEngine practical session.
What is collaborative computing?
Main definition:
The term “collaborative computing” covers technologies and IT resources
based on a network communication system that allows documents and projects
to be shared between users.
All activities are managed by a variety of devices such as desktops, laptops,
tablets and smartphones.
In a computational chemistry laboratory:
The daily activity of a computational chemist requires sharing not only
information and data between users, but also hardware resources.
Typical scenario in a lab
[Diagram: Internet, firewall, servers, PCs and network devices on an Ethernet infrastructure (100-1000 Mbit/s).]
Several PCs with heterogeneous hardware and operating systems.
Very high computational power, but “fragmented” across the local network.
Difficult to use all the computational power to run a single complex calculation.
Main features
Parallel computing without the grid paradigm.
Client/server architecture with hot-plug capabilities.
Possibility to perform calculations with different pieces of software without
changing the main code.
Expandable by scripting languages.
High-level database interface integrated in the main code supporting the most
common SQL database engines (Access, MySQL, SQLite, SQL Server, etc).
Easy configuration by graphic interface.
High performance and security.
What we need …
… to develop WarpEngine:
High-level database interface.
Fast customizable Web server.
Script engine.
Graphic environment.
[Diagram labels: property calculation, molecule editing, MM / MD calculations, surface mapping, trajectory analysis, file format conversion, database engine, graphic interface, plug-in expandability, scripting languages.]
Server scheme
[Diagram: server scheme. Components shown: main program, VEGA ZZ core, PowerNet plug-in, project manager, job manager, client manager, database engine, UDP server, HTTP server, IP filter.]
Communication to the clients: TCP/IP, HTTP, broadcast, with an optional encrypted tunnel provided by WarpGate.
Client scheme
[Diagram: client scheme. Components shown: main program, VEGA ZZ core, PowerNet plug-in, project manager, multithreaded worker, UDP client, HTTP client.]
Communication to the server: TCP/IP, HTTP, broadcast.
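The role of a multithreaded worker on a client can be illustrated with a minimal sketch. This is not WarpEngine code: `run_job` and `process_batch` are hypothetical names, and the "calculation" is a dummy placeholder standing in for the launch of an external single-threaded program. The sketch only shows how one worker per core lets a multi-core PC process several independent jobs at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(molecule_id):
    """Hypothetical placeholder for one single-threaded calculation.

    A real worker would start an external program here (for example via
    subprocess) and parse its output; this dummy just derives a number
    from the identifier so that the sketch is runnable.
    """
    return molecule_id, -float(len(molecule_id))


def process_batch(molecule_ids, n_workers=None):
    """Run independent jobs concurrently, one worker per available core."""
    n_workers = n_workers or os.cpu_count() or 1
    results = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_job, mol) for mol in molecule_ids]
        for future in as_completed(futures):
            mol, score = future.result()
            results[mol] = score
    return results


batch_results = process_batch(["mol_a", "mol_bb", "mol_ccc"])
print(batch_results)
```

Threads are sufficient in this kind of design when each job spends its time in an external process; pure-Python numerical work would need processes instead.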
Application fields
WarpEngine is easily expandable by scripting languages, so it can
perform several types of calculation:
Semi-empirical calculations
Ab-initio calculations
Rescore of docking poses
Multiple molecular mechanics calculations
Virtual screening
Drug discovery and virtual screening
Today, virtual screening is a very common approach to identifying hit compounds
from large libraries of molecules in the drug discovery process.
It can be classified as:
Ligand-based
The 3D structure of the biological target is unknown and a set of geometric
rules and/or physical-chemical properties (pharmacophore model) obtained by
QSAR studies are used to screen the library.
Structure-based
It involves molecular docking calculations between
each molecule to be tested and the biological target
(usually a protein). To evaluate the affinity, a scoring
function is applied. The 3D structure of the target
must be known.
Advantages and disadvantages of virtual screening
Advantages:
Fast (but it depends on the library size).
Possibility to optimize in-house resources.
Cheap.
Disadvantages:
False positive rate.
Limited chemical space (ligand-based).
Impossibility to discriminate the intrinsic activity (structure-based).
Need to confirm the results by experimental assays.
[Diagram: database → virtual screening → hit compounds]
Choice of docking software for virtual screening
For test purposes, we chose three well-known, freely available docking programs:
AutoDock 4.2
http://autodock.scripps.edu
AutoDock Vina
http://vina.scripps.edu
PLANTS
http://www.tcd.uni-konstanz.de/research/plants.php
and the acetylcholinesterase (AChE) ligand database from the Directory of Useful
Decoys (DUD, http://dud.docking.org), containing:
107 true active molecules
3892 true inactive molecules
All these ligands were docked into the AChE crystal structure downloaded from
the PDB (1EVE) in order to evaluate the predictive power and the performance of
each docking program.
Hit rate evaluation
The hit rate measures the probability of finding active ligands in a set of
molecules and it can be calculated by the following equation:
$$ HR = \frac{\text{Active molecules}}{\text{All molecules}} \cdot 100 $$
Considering the whole dataset:
$$ HR_{Random} = \frac{107}{3999} \cdot 100 = 2.68\% $$
The random hit rate is the probability of finding an active compound by random
selection. In other words, every 100 ligands randomly selected from the data
set contain, on average, 2.68 active compounds.
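The hit rate arithmetic can be reproduced with a few lines of Python. This is a minimal sketch: the function name `hit_rate` is an illustrative choice, while the counts are the DUD AChE figures quoted in these slides (107 actives, 3892 inactives).

```python
def hit_rate(n_active, n_total):
    """Return the hit rate as a percentage of active molecules."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_active / n_total


n_active = 107      # true active molecules (from the slides)
n_inactive = 3892   # true inactive molecules (from the slides)

hr_random = hit_rate(n_active, n_active + n_inactive)
print(f"Random hit rate: {hr_random:.2f}%")  # prints "Random hit rate: 2.68%"
```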
Evaluation of virtual screening performances
The performance of each virtual screening program is evaluated by:
sorting the results by the docking score;
calculating the hit rate in a set of top ranked molecules (1%, 2% and 5% of
the total data set);
calculating the enrichment factor:
$$ EF_{TopN\%} = \frac{HR_{TopN\%}}{HR_{Random}} $$
Every virtual screening calculation must reach at least EF > 1.0, and EF > 2.0
to be considered sufficiently efficient. This means that the screening must
perform at least 2-fold better than random selection.
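The evaluation procedure (sort by score, take the top N%, compare hit rates) can be sketched as follows. This is an illustrative implementation, not the code used for the results in these slides: the function name, the rounding of the top-N% subset size and the toy data set are all assumptions. It also assumes that lower docking scores are better, which must be checked for each program.

```python
def enrichment_factor(scores, is_active, top_fraction, lower_is_better=True):
    """EF = hit rate in the top-ranked fraction / hit rate of the whole set."""
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    n_total = len(scores)
    n_active = sum(is_active)
    if n_active == 0:
        raise ValueError("the data set contains no active molecules")

    # Assumption: the subset size is rounded to the nearest integer (min. 1).
    n_top = max(1, round(top_fraction * n_total))

    # Rank the molecules from best to worst docking score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=not lower_is_better)
    top_active = sum(active for _, active in ranked[:n_top])

    hr_top = 100.0 * top_active / n_top
    hr_random = 100.0 * n_active / n_total
    return hr_top / hr_random


# Toy data set (hypothetical): 10 molecules, 2 of them active.
toy_scores = [-9.1, -8.7, -7.9, -7.5, -7.2, -6.8, -6.5, -6.1, -5.9, -5.2]
toy_active = [True, False, False, False, True,
              False, False, False, False, False]

ef_top20 = enrichment_factor(toy_scores, toy_active, top_fraction=0.20)
print(f"EF(top 20%) = {ef_top20:.2f}")  # prints "EF(top 20%) = 2.50"
```

In the toy example the top 20% holds 2 molecules, 1 of them active (hit rate 50%), against a random hit rate of 20%, hence EF = 2.5.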
AutoDock and Vina results
Two AutoDock runs were performed: one with screening parameters and one with full docking parameters.
One Vina calculation was performed with exhaustiveness set to 7.
Both programs use a similar scoring function based on the Amber force field.
EF = enrichment factor in the top 1%, 2% and 5% of the ranked data set.

Software   Exhaustiveness   Flexible chains   EF 1%   EF 2%   EF 5%   Single CPU time (hours)
AutoDock   Screening        No                4.67    3.27    1.68      44.96
AutoDock   Full docking     No                7.47    4.20    3.55    1344.00
Vina       7                No                1.87    2.34    2.06     342.00
PLANTS results
The PLANTS enrichment performance was evaluated by considering:
all three scoring functions (ChemPLP, PLP and PLP95);
two degrees of exhaustiveness (Speed1 and Speed2);
flexible amino acid side chains (PLP and Speed2 only).
EF = enrichment factor in the top 1%, 2% and 5% of the ranked data set.

Score     Exhaustiveness   Flexible chains   EF 1%   EF 2%   EF 5%   Single CPU time (hours)
ChemPLP   Speed1           No                19.62   11.21   5.98     97.64
ChemPLP   Speed2           No                18.69   10.74   5.23     66.64
PLP       Speed1           No                19.62   10.28   5.23     44.08
PLP       Speed2           No                19.62   10.28   5.23     30.28
PLP       Speed2           Yes               20.56   10.28   5.05    350.80
PLP95     Speed1           No                17.75   10.28   4.86     37.04
PLP95     Speed2           No                16.82    9.81   4.48     34.44
Hardware for the test
1 PC configured as client and server:
Quad-core
9 PCs configured as clients:
1 six-core
7 quad-core
1 dual-core
1 single-core
Operating systems:
6 Windows 7 Pro x64
3 Windows 7 Pro
1 Windows XP Pro
Network connection:
Ethernet 100 Mbit/s
Total resources:
37 cores
42 GB RAM
> 3 TB storage
Software & data for the test
APBS – Adaptive Poisson-Boltzmann Solver
Calculation of solvation energy.
PLANTS – Protein-Ligand ANT system
Structure-based virtual screening.
Database of drugs in .mdb format
174,398 molecules, average MW 353.70.
Human M2 muscarinic receptor
PDB ID: 3UON.
Both programs (APBS and PLANTS) are single-threaded.
Real case tests
APBS – Solvation energy calculation.
174,398 molecules, two APBS calculations for each molecule (reference and
solvated state).
Time required by a single-thread calculation: 13 days 5 hours
Time required by WarpEngine: 8 hours 36 minutes
WarpEngine speed: 339.10 jobs / min.
PLANTS – Virtual screening.
174,398 molecules, M2 target, PLP, speed2.
Time required by a single-thread calculation: 36 days 22 hours
Time required by WarpEngine: 1 day 0 hours 1 minute
WarpEngine speed: 121.00 jobs / min.
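The reported times can be turned into a speedup and a parallel efficiency with a short calculation. This is only a sketch: the timings are those quoted above, while relating the speedup to the 37 cores of the test hardware assumes that all cores took part in both runs, which the slides do not state explicitly.

```python
from datetime import timedelta

N_CORES = 37  # total cores of the test hardware (assumed all in use)


def speedup(serial, parallel):
    """Ratio between single-thread time and distributed time."""
    return serial / parallel  # timedelta / timedelta gives a float


# (single-thread time, WarpEngine time) as reported in the slides.
tests = {
    "APBS": (timedelta(days=13, hours=5), timedelta(hours=8, minutes=36)),
    "PLANTS": (timedelta(days=36, hours=22), timedelta(days=1, minutes=1)),
}

for name, (serial, parallel) in tests.items():
    s = speedup(serial, parallel)
    efficiency = 100.0 * s / N_CORES
    print(f"{name}: speedup {s:.1f}x, efficiency {efficiency:.1f}%")
```

Both tests give a speedup of about 36.9, very close to the number of available cores.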
Test Drive
Graphic interface
Conclusions
Collaborative computing not only helps users to work together on the
same project, but can also be extended efficiently to share
computational resources that often remain unused.
WarpEngine can collect the unused computational power and channel it
into large calculations, such as a virtual screening, without
interfering with normal user activities.
The setup phase of a virtual screening plays a
pivotal role in obtaining good performance in
terms of results and calculation speed.
Acknowledgements
Giulio Vistoli
Matteo Lo Monte
Angelica Mazzolari
www.vegazz.net