Docking_With_ArgusLab

Molecular Docking With ArgusLab
Mark Thompson
Planaria Software
Seattle, WA
http://www.planaria-software.com
Molecular Docking
A complicated search problem
• Find the optimal ligand/protein configurations and accurately (at least consistently) predict their binding free energy without using formal statistical mechanics approaches.
• Ligand is flexible.
• Protein binding site is flexible (side chains and protein backbone).
• Do this in under ~5 seconds on a commodity processor (ideally in under 1 second).
Abstract
We have developed two docking engines and an empirical scoring function in
ArgusLab 4.0.
• ShapeDock: shape-based method, approximates exhaustive search.
• GADock: Lamarckian genetic algorithm similar to AutoDock.
• AScore: scoring function based on XScore of Wang and coworkers.
Typical ShapeDock times for ligands with 10-15 torsions are < 30 seconds on
a 2.4 GHz Pentium laptop computer.
Our docking code is implemented for both interactive docking and screening
of ligand databases.
ArgusLab 4.0
A molecular modeling application that runs on Windows platforms.
ShapeDock
Approximates an exhaustive search
(similarities to Fred, Dock, and Glide)
1. Ligand is described as a torsion tree
Nodes are groups of bonded atoms that do not have rotatable bonds;
connections between nodes are torsions. Topology of tree is crucial to
efficient docking. A balanced tree with a large central node is best.
2. Construct two grids that overlay the binding site
Grid points marked as inside or outside the free volume of binding site.
Fine grid used to determine if atoms of a pose fragment are inside or
outside the binding site. Coarse grid is used to establish the search points
inside the binding site.
3. At each “search point of interest”
Ligand’s root node is placed on a search point and a set of diverse and
energetically favorable rotations is created. Translations near the search
point are allowed to remove bumps with the target.
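The inside/outside grid marking in step 2 can be sketched as follows. This is a minimal illustration, assuming a grid point counts as inside the free volume when it lies farther than a clearance distance from every protein atom. The function name `mark_free_points`, the clearance value, and the grid spacings are hypothetical and are not ArgusLab's actual parameters.

```python
import itertools
import math

def mark_free_points(origin, counts, spacing, protein_atoms, clearance=2.0):
    """Return the set of (i, j, k) grid indices whose points lie farther than
    `clearance` (an assumed cutoff, in angstroms) from every protein atom."""
    free = set()
    for idx in itertools.product(*(range(n) for n in counts)):
        point = tuple(o + spacing * i for o, i in zip(origin, idx))
        if all(math.dist(point, atom) > clearance for atom in protein_atoms):
            free.add(idx)
    return free

# Hypothetical binding site: one protein atom at the corner of a 4 A box.
atoms = [(0.0, 0.0, 0.0)]
# Fine grid: used to test whether atoms of a pose fragment are inside the site.
fine = mark_free_points((0.0, 0.0, 0.0), (9, 9, 9), 0.5, atoms)
# Coarse grid: its free points serve as candidate search points.
coarse = mark_free_points((0.0, 0.0, 0.0), (3, 3, 3), 2.0, atoms)
```

Both grids cover the same box here; only the spacing differs, so the coarse grid yields few search points while the fine grid gives a sharper bump test.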
ShapeDock
4. For each rotation, construct the torsions in breadth-first order
Use pre-defined torsion values based on bond order of central bond.
Use fine grid to test newly added atoms for bumps with protein and
intra-ligand contacts to accept or reject pose fragment. Allow small
torsion adjustments to minimize bumps. (rings are treated as rigid)
5. Score pose candidates
Pose candidates are those that survive the torsion search. They are
ranked to maintain a set of the N lowest-energy poses (N typically 50-150).
Clustering poses as they are found maintains diversity in the final set.
6. Optimize the final set of poses
•Coarse minimization of all poses.
•Re-cluster and rank using more aggressive cluster cutoff.
•Minimize 25 lowest energy poses more aggressively.
•Stochastic search of 25 lowest poses to find nearby minima.
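Step 5 (ranking while clustering) can be sketched as follows. This is only an illustration, assuming poses are compared by coordinate RMSD and that a new pose replaces a nearby kept pose only when its energy is lower. The names `add_pose`, `max_poses`, and `cluster_cutoff` and their default values are hypothetical.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def add_pose(kept, coords, energy, max_poses=50, cluster_cutoff=1.0):
    """Insert a pose into `kept`, a list of (energy, coords) sorted by energy.

    If the new pose is within `cluster_cutoff` of a kept pose (the first such
    pose found), it replaces that pose only when its energy is lower;
    otherwise it is discarded. Both defaults are assumed values.
    """
    for i, (kept_energy, kept_coords) in enumerate(kept):
        if rmsd(coords, kept_coords) < cluster_cutoff:
            if energy >= kept_energy:
                return kept          # a better representative already exists
            del kept[i]
            break
    kept.append((energy, coords))
    kept.sort(key=lambda pose: pose[0])
    del kept[max_poses:]             # keep only the N lowest-energy poses
    return kept
```

Clustering at insertion time keeps near-duplicate poses from filling the list, which is one way to maintain the diversity described above.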
GADock
Lamarckian Genetic Algorithm
Genetic Degrees of Freedom
• Translation
• Rigid-body rotation
• Torsions
Search procedure
• Population of individuals
• Fitness of each is docking score
• Each generation:
  •Select breeding individuals
  •Mutation
  •Crossover
  •Local minimization
  •Elitism
  •Check for convergence
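The generation loop can be illustrated with a toy Lamarckian genetic algorithm. This is a sketch under stated assumptions, not GADock itself: genes are plain real numbers standing in for translation, rotation, and torsion values, the local minimizer is a simple random hill climb, and a fixed generation count replaces a real convergence check. All names and parameter values are hypothetical.

```python
import random

def lamarckian_ga(score, n_genes, pop_size=20, generations=30, seed=0):
    """Minimize `score` over real-valued gene vectors with a toy Lamarckian GA."""
    rng = random.Random(seed)

    def local_min(genes, step=0.1, tries=10):
        # Lamarckian step: the locally improved genes are written back into
        # the individual, so offspring inherit the refined values.
        best, best_score = genes, score(genes)
        for _ in range(tries):
            trial = [g + rng.uniform(-step, step) for g in best]
            trial_score = score(trial)
            if trial_score < best_score:
                best, best_score = trial, trial_score
        return best

    pop = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):         # fixed count instead of a convergence test
        pop.sort(key=score)
        elite = pop[0]                   # elitism: best individual survives unchanged
        parents = pop[: pop_size // 2]   # select breeding individuals
        children = [elite]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]    # one-point crossover
            if rng.random() < 0.2:       # mutation of a single gene
                k = rng.randrange(n_genes)
                child[k] += rng.gauss(0.0, 0.5)
            children.append(local_min(child))
        pop = children
    return min(pop, key=score)
```

In a docking setting `score` would be the docking score of the pose encoded by the genes. The result depends on the random seed, which is the reproducibility issue that stochastic searches of this kind share.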
Compare GADock & ShapeDock
GADock
• Robust & General
• Slow, hard to define convergence
• Not reproducible (Stochastic)
• Can get caught in a local minimum
ShapeDock
• Some ligand/binding site types still cause problems
• Fast!
• Reproducible
• Formally explores all minima
Sample Preparation and Run
Ligand
Hydrogens added
Hybridization and AScore atom types assigned
Target
Crystal waters remain
Hybridization and AScore atom types assigned
Miscellaneous
Atom charges not required
All steps are done automatically inside ArgusLab (no user intervention
required). However, manual modifications to above may be done if desired.
Running the docking calculation:
Select the ligand and binding site. Accept default parameters (grid size and
resolution) or modify them. Run the docking.
ShapeDock: Typical Timings
Target   Ligand            Torsions   Time (sec)
1HPV     VX478             14         21
1HVR     XK263             8          16
4DFR     Methotrexate      9          3
1IEP     Gleevec           7          11
1CBX     Benzylsuccinate   5          5
1STP     Biotin            5          3
3PTB     Benzamidine       0          1
Timings measured on a 2.4 GHz Pentium(R) 4 Dell Inspiron laptop.
AScore
an empirical scoring function
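AScore is based on terms taken from the HPScore piece of XScore [1]
$\Delta G_{bind} = \Delta G_{vdw} + \Delta G_{hydrophobic} + \Delta G_{H\text{-}bond} + \Delta G_{H\text{-}bond(chg)} + \Delta G_{deformation} + \Delta G_{0}$
$\Delta G_{vdw} = C_{VDW} \cdot VDW$
$\Delta G_{hydrophobic} = C_{hydrophobic} \cdot HP$
$\Delta G_{H\text{-}bond} = C_{H\text{-}bond} \cdot HB$
$\Delta G_{H\text{-}bond(chg)} = C_{H\text{-}bond(chg)} \cdot HB$ (chg-chg & chg-neutral pairs)
$\Delta G_{deformation} = C_{rotor} \cdot RT$
$\Delta G_{0} = C_{regression}$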
[1] “Further development and validation of empirical scoring functions for structure-based binding affinity prediction” Wang, R, Lai, L, and Wang, S. J. Comp. Aided Mol.
Design 16, 11-26, 2002
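The linear form of the score can be sketched in a few lines. The dictionary keys and the function name `ascore_total` are hypothetical, the charged H-bond sum is given its own key (`HB_chg`) purely for illustration, and no fitted coefficient values are implied.

```python
def ascore_total(terms, coeffs):
    """Combine raw term values into a binding free energy estimate.

    `terms` holds the geometric sums (VDW, HP, HB, HB_chg, RT) and `coeffs`
    the regression weights; all key names here are illustrative assumptions.
    """
    return (coeffs["vdw"] * terms["VDW"]
            + coeffs["hydrophobic"] * terms["HP"]
            + coeffs["hbond"] * terms["HB"]
            + coeffs["hbond_chg"] * terms["HB_chg"]
            + coeffs["rotor"] * terms["RT"]
            + coeffs["regression"])      # constant offset from the regression
```

Because the score is linear in the coefficients, refitting them against experimental binding free energies is an ordinary least-squares problem once the term values are tabulated.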
AScore
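$VDW = \sum_{i}^{ligand} \sum_{j}^{protein} \left[ \left( \frac{d_{ij,0}}{r_{ij}} \right)^{8} - 2 \left( \frac{d_{ij,0}}{r_{ij}} \right)^{4} \right] + \sum_{i}^{ligand} \sum_{j>i}^{ligand} \left[ \left( \frac{d_{ij,0}}{r_{ij}} \right)^{8} - 2 \left( \frac{d_{ij,0}}{r_{ij}} \right)^{4} \right]$
$d_{ij,0}$ is the sum of the vdW radii of atoms i and j.
The intra-ligand VDW sum excludes 1-2 and 1-3 bonded pairs.
$HP = \sum_{i}^{ligand} \sum_{j}^{protein} f(d_{ij})$
The sum is over hydrophobic ligand-protein atom pairs.
$f(d_{ij}) = \begin{cases} 1.0 & d_{ij} \le d_{ij,0} + 0.5 \text{ Å} \\ \frac{2}{3} \left( d_{ij,0} + 2 - d_{ij} \right) & d_{ij,0} + 0.5 \text{ Å} < d_{ij} \le d_{ij,0} + 2.0 \text{ Å} \\ 0 & d_{ij} > d_{ij,0} + 2.0 \text{ Å} \end{cases}$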
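The two pairwise terms on this slide can be sketched as follows, assuming the softened 8-4 form for VDW and the piecewise-linear f(d) for HP. Function names, argument layout, and the `excluded` set of bonded index pairs are hypothetical.

```python
import math

def vdw_pair(r, d0):
    """8-4 term for one atom pair: (d0/r)^8 - 2 (d0/r)^4, minimum of -1 at r = d0."""
    x = (d0 / r) ** 4
    return x * x - 2.0 * x

def vdw_total(ligand, protein, radii_l, radii_p, excluded=frozenset()):
    """Ligand-protein sum plus intra-ligand sum over pairs j > i.

    `excluded` holds (i, j) ligand index pairs (i < j) that are 1-2 or 1-3 bonded.
    """
    total = 0.0
    for i, a in enumerate(ligand):
        for j, b in enumerate(protein):
            total += vdw_pair(math.dist(a, b), radii_l[i] + radii_p[j])
        for j in range(i + 1, len(ligand)):
            if (i, j) not in excluded:
                total += vdw_pair(math.dist(a, ligand[j]), radii_l[i] + radii_l[j])
    return total

def hp_pair(d, d0):
    """Hydrophobic contact function f(d) for one hydrophobic atom pair."""
    if d <= d0 + 0.5:
        return 1.0
    if d <= d0 + 2.0:
        return (2.0 / 3.0) * (d0 + 2.0 - d)
    return 0.0
```

The 2/3 slope makes f(d) continuous: it equals 1.0 at d0 + 0.5 Å and falls to 0 at d0 + 2.0 Å.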
AScore
$HB = \sum_{i}^{ligand} \sum_{j}^{protein} HB_{ij}$
$HB_{ij} = f(r_{ij}) \, f(\theta_{1,ij}) \, f(\theta_{2,ij})$
$r_{ij}$: distance between donor/acceptor atoms
$\theta_{1,ij}$: angle between donor root-donor-acceptor
$\theta_{2,ij}$: angle between donor-acceptor-acceptor root
Each term varies from 1.0 to 0.0 depending on how close it is to its ideal value. A maximum number of H-bonds per donor/acceptor atom is imposed.
$RT = \sum_{i}^{ligand} RT_{i}$
$RT_{i} = 0$ if atom i is not involved in any torsion.
$RT_{i} = 0.5$ if atom i is involved in 1 torsion.
$RT_{i} = 1.0$ if atom i is involved in 2 torsions.
$RT_{i} = 0.5$ if atom i is involved in > 2 torsions.
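The rotor term is a per-atom lookup followed by a sum, which a short sketch makes concrete. The input format (one torsion count per ligand atom) and the function name are assumptions for illustration.

```python
def rotor_term(torsion_counts):
    """Sum RT_i over ligand atoms, given how many torsions each atom is part of."""
    def rt(n):
        if n == 0:
            return 0.0
        if n == 2:
            return 1.0
        return 0.5      # atom involved in exactly 1 torsion or in more than 2

    return sum(rt(n) for n in torsion_counts)
```

For example, atoms involved in 0, 1, 2, 3, and 2 torsions contribute 0 + 0.5 + 1.0 + 0.5 + 1.0 = 3.0.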
AScore
Differences with XScore
• AScore extends XScore to allow it to be used as the docking objective function.
• Separate H-bond term involving charged donor and/or acceptor groups.
• Max. number of H-bonds per donor/acceptor imposed by uniformly scaling total found to the maximum number allowed for any given ligand atom.
• Ligand has hydrogens added.
• Hydrogens included in the VDW term.
• Crystal waters retained (but hydrogens not added). H-bonds with crystal waters treated as having ideal H-bond geometry but with a scaling factor fit to experiment.
• H-bonds with target metals treated as ideal geometry, but with scaling factor fit to experiment.
• SH treated as H-bond donor/acceptor, >S treated as H-bond acceptor.
• Intra-ligand VDW energy included.
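One possible reading of the H-bond cap is sketched below. It assumes that "total found" means the summed H-bond strengths on one ligand atom and that every contribution is multiplied by the same factor when that sum exceeds the allowed maximum. The function name and this interpretation are assumptions, not a description of the ArgusLab code.

```python
def cap_hbonds(contributions, max_allowed):
    """Uniformly scale one atom's H-bond contributions so their sum does not
    exceed `max_allowed`; contributions under the cap are returned unchanged."""
    total = sum(contributions)
    if total <= max_allowed:
        return list(contributions)
    scale = max_allowed / total
    return [c * scale for c in contributions]
```

Uniform scaling keeps the relative weight of each H-bond and changes smoothly as the pose moves, which suits a function that is also used as the docking objective.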
Parameterization & Validation
(in progress)
•Begin with the published XScore parameters.[1]
•Begin with Wang’s data set of 100 protein-ligand structures.[2]
•Remove incorrect structures to get a final training set of 84 structures:
39 hydrophilic, 20 hydrophobic, 25 mixed
•Modify H-bond parameters & other new parameters to improve the correlation between the score of the x-ray pose and the experimental binding free energy.
Structure Type    Correlation (ΔGbind with ΔGexperiment)   RMSD Binding Affinity (kcal/mol)
Hydrophilic       0.53                                     2.3
Hydrophobic       0.84                                     2.0
Mixed             0.70                                     2.1
All Structures    0.70                                     2.2
[1] “Further development and validation of empirical scoring functions for structure-based binding affinity prediction” Wang, R, Lai, L, and Wang, S. J. Comp. Aided Mol.
Design 16, 11-26, 2002
[2] “Comparative Evaluation of 11 Scoring Functions for Molecular Docking” Renxiao Wang, Yipin Lu, and Shaomeng Wang. J. Med. Chem. 2003, 46, 2287-2303
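The two summary statistics in the table can be computed as in the sketch below. It assumes the reported correlation is a Pearson coefficient between predicted and experimental binding free energies and that RMSD is the root-mean-square error in kcal/mol. The slide does not state which correlation measure was used, so that choice and the function names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rms_error(predicted, measured):
    """Root-mean-square difference between predicted and measured values."""
    n = len(predicted)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)
```

The two numbers answer different questions: correlation measures how well the score ranks complexes, while RMSD measures the absolute error of the predicted free energies.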
Parameterization & Validation
Dock the training set using the ShapeDock engine.
Structure Type    Correlation (ΔGbind with ΔGexperiment)   RMSD Binding Affinity (kcal/mol)   Ave. RMSD (Å)
Hydrophilic       0.43                                     2.4                                1.4
Hydrophobic       0.80                                     2.2                                1.9
Mixed             0.61                                     2.4                                1.7
All Structures    0.64                                     2.3                                1.6
Trial Study: Influenza Virus Neuraminidase [1]
• Glycoprotein enzyme cleaves sialic acid residues from maturing virus particles.
• Eleven conserved residues make up the binding site.
• Dominated by H-bonding & charge-charge group interactions (e.g. carboxyl :
guanidino)
[Figure: DANA and GANA. Annotations: "100,000 x increase in binding affinity", "-10.2 kcal/mol", "-11.8 kcal/mol", "~3x enhancement".]
[1] “The Effect of Small Changes in Protein Structure on Predicted Binding Modes of Known Inhibitors of Influenza Virus Neuraminidase: PMF-Scoring in Dock4” Ingo Muegge, Med. Chem. Res. 9, 1999, 490-500.
Neuraminidase Dockings
ShapeDock
9 of the 10 structures reproduced the experimental binding mode.
Correlation of predicted and measured binding affinities
[Figure: plot of AScore score (kcal/mol) against log IC50. R^2 = 0.70, Ave. RMSD = 1.55 Angstroms.]
Docking in ArgusLab 4.0
• ShapeDock and GADock engines (IDockEngine interface, DockEngineFactory, etc).
• AScore scoring function with modifiable parameter set (IScore interface).
• Easy to make the ligand and binding site groups with one mouse click.
• Dock ligand as flexible, rigid, or using only selected torsions.
• Score current pose, optimize current pose, and full docking.
• Scoring function pre-evaluated on a scoring grid(s).
• Database docking supports SDF file as ligand database (IDataSource).
• Efficient reuse of scoring and docking grids allows user to interactively
modify ligand or choose new ligand and quickly dock new structures.
• Results summarized in external file and in a tree-view. User can click on
poses to view details.
ArgusLab Capabilities
• 3D interactive molecule builder & viewer
• Computational experiments
•QM: Extended Huckel, Semi-empirical (MNDO, AM1, PM3), ZINDO, and ab initio (via interface
to Gaussian 98/03).
•MM: Universal Force Field (UFF), CVFF, AMBER, custom force fields for research. Polarizable
molecular mechanics, Rappe & Goddard’s charge equilibration scheme for UFF.
•Geometry optimizations, electronic excited states, MD simulations, free-energy perturbation, and
potential of mean force.
•QM/MM and QM/MMpol.
•Molecular Docking.
• Properties & misc.
dipole moments, atom-charges, transition properties, surface properties,
animate normal modes, view dock poses, ribbons, solvent-accessible surfaces, SCRF solvent effects,
explicit solvent, periodic boundary conditions, Ewald sums, etc.
• Manage/organize results: treeview tool for editing structures and viewing results. Results and structures can be saved in ArgusLab XML file.
ArgusLab Architecture
• Multi-document interface, multi-threaded.
• Written in C++ (some old legacy C-code is wrapped in C++)
• Uses OpenGL for graphics, Win32 API for windowing system.
• Garbage collection for graphics objects, events, etc.
• Custom hash-tables & containers in addition to use of STL.
• Custom Model-View-Controller (MVC) transport layer.
• 3D editor built on a command processor model (support undo/redo).
Installed User Base
• ~20,000 downloads/licenses.
• Popular in university teaching programs and with students (free).
• Used in several industrial settings.
Score the PDBbind Database
Score the 786 structures from the PDBbind database[1]
(14 incorrect structures were removed from the original 800 in database)
PDBbind Database   Correlation (ΔGbind with ΔGexperiment)   RMSD Binding Affinity (kcal/mol)
786 Structures     0.47                                     2.9
[1] “The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures” Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. J. Med. Chem. 2004, 47, 2977-2980