Molecular Docking With ArgusLab

Mark Thompson
Planaria Software
Seattle, WA
http://www.planaria-software.com

Molecular Docking: A complicated search problem

• Find the optimal ligand/protein configurations and accurately (or at least consistently) predict their binding free energy without using formal statistical mechanics approaches.
• Ligand is flexible.
• Protein binding site is flexible (side chains and protein backbone).
• Do this in under ~5 seconds on a commodity processor (ideally under 1 second).

Abstract

We have developed two docking engines and an empirical scoring function in ArgusLab 4.0.
• ShapeDock: shape-based method, approximates exhaustive search.
• GADock: Lamarckian genetic algorithm similar to AutoDock.
• AScore: scoring function based on XScore of Wang and coworkers.
Typical ShapeDock times for ligands with 10-15 torsions are < 30 seconds on a 2.4 GHz Pentium laptop computer.
Our docking code is implemented for both interactive docking and screening of ligand databases.
ArgusLab 4.0 is a molecular modeling application that runs on Windows platforms.

ShapeDock

Approximates an exhaustive search (similarities to Fred, Dock, and Glide).

1. Ligand is described as a torsion tree
   • Nodes are groups of bonded atoms that do not have rotatable bonds; connections between nodes are torsions.
   • Topology of the tree is crucial to efficient docking. A balanced tree with a large central node is best.
2. Construct two grids that overlay the binding site
   • Grid points are marked as inside or outside the free volume of the binding site.
   • Fine grid is used to determine if atoms of a pose fragment are inside or outside the binding site.
   • Coarse grid is used to establish the search points inside the binding site.
3. At each "search point of interest"
   • Ligand's root node is placed on a search point and a set of diverse and energetically favorable rotations is created.
   • Translations near the search point are allowed to remove bumps with the target.
4. For each rotation, construct the torsions in breadth-first order
   • Use pre-defined torsion values based on the bond order of the central bond.
   • Use the fine grid to test newly added atoms for bumps with the protein and for intra-ligand contacts, to accept or reject the pose fragment.
   • Allow small torsion adjustments to minimize bumps (rings are treated as rigid).
5. Score pose candidates
   • Pose candidates are those that survive the torsion search.
   • They are ranked to maintain a set of the N lowest-energy poses (N typically 50-150).
   • Clustering poses as they are found maintains diversity in the final set.
6. Optimize the final set of poses
   • Coarse minimization of all poses.
   • Re-cluster and rank using a more aggressive cluster cutoff.
   • Minimize the 25 lowest-energy poses more aggressively.
   • Stochastic search of the 25 lowest-energy poses to find nearby minima.

GADock: Lamarckian Genetic Algorithm

Genetic Degrees of Freedom
• Translation
• Rigid-body rotation
• Torsions

Search procedure
• Population of individuals
• Fitness of each is its docking score
• Each generation:
   • Select breeding individuals
   • Mutation
   • Crossover
   • Local minimization
   • Elitism
   • Check for convergence

Compare GADock & ShapeDock

GADock
• Robust & general
• Slow, hard to define convergence
• Not reproducible (stochastic)
• Can get caught in a local minimum

ShapeDock
• Some ligand/binding site types still cause problems
• Fast!
• Reproducible
• Formally explores all minima

Sample Preparation and Run

Ligand
• Hydrogens added
• Hybridization and AScore atom types assigned

Target
• Crystal waters remain
• Hybridization and AScore atom types assigned

Miscellaneous
• Atom charges not required

All steps are done automatically inside ArgusLab (no user intervention required). However, manual modifications to the above may be made if desired.

Running the docking calculation:
1. Select the ligand and binding site.
2. Accept default parameters (grid size and resolution) or modify them.
3. Run the docking.
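
The GADock search procedure listed above (selection, mutation, crossover, local minimization, elitism, convergence check) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ArgusLab's implementation: the names `toy_score`, `local_minimize`, `tournament`, and `lamarckian_ga` are hypothetical, the quadratic `toy_score` stands in for a real docking score, and the gene vector only loosely represents translation, rigid-body rotation, and torsion values. Tournament selection, uniform crossover, Gaussian mutation, and a random hill climb are arbitrary choices made here for brevity.

```python
import random


def toy_score(pose):
    # Stand-in for a docking score (assumption): lower is better, minimum at the origin.
    return sum(x * x for x in pose)


def local_minimize(pose, score, rng, steps=20, step_size=0.1):
    # Random hill climb. The improved coordinates are returned and stored in the
    # individual's genes, which is what makes the GA "Lamarckian".
    best, best_s = list(pose), score(pose)
    for _ in range(steps):
        trial = [x + rng.gauss(0.0, step_size) for x in best]
        s = score(trial)
        if s < best_s:
            best, best_s = trial, s
    return best


def tournament(pop, scores, rng, k=3):
    # Select a breeding individual: best of k randomly chosen members.
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: scores[i])]


def lamarckian_ga(score, n_genes, pop_size=30, generations=50, n_elite=2,
                  mutation_rate=0.1, tol=1e-6, patience=10, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scores = [score(p) for p in pop]
        order = sorted(range(pop_size), key=lambda i: scores[i])
        best_history.append(scores[order[0]])
        # Convergence check: stop when the best score has improved by less
        # than tol over the last `patience` generations.
        if (len(best_history) > patience
                and best_history[-patience - 1] - best_history[-1] < tol):
            break
        # Elitism: the best individuals are copied unchanged.
        next_pop = [list(pop[i]) for i in order[:n_elite]]
        while len(next_pop) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            # Uniform crossover followed by Gaussian mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [g + rng.gauss(0.0, 0.5) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(local_minimize(child, score, rng))
        pop = next_pop
    return min(pop, key=score)


if __name__ == "__main__":
    best = lamarckian_ga(toy_score, n_genes=9)
    print("best toy score:", toy_score(best))
```

Because the random generator is seeded, this sketch returns the same result on every run with the same `seed`. Different seeds generally give different results, which is the stochastic behaviour noted in the GADock comparison above.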
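ShapeDock: Typical Timings

Target   Ligand            Torsions   Time (sec)
1HPV     VX478             14         21
1HVR     XK263             8          16
4DFR     Methotrexate      9          3
1IEP     Gleevec           7          11
1CBX     Benzylsuccinate   5          5
1STP     Biotin            5          3
3PTB     Benzamidine       0          1

Timings measured on a 2.4 GHz Pentium(R) 4 Dell Inspiron laptop.

AScore: an empirical scoring function

AScore is based on terms taken from the HPScore piece of XScore [1].

$$\Delta G_{bind} = \Delta G_{vdw} + \Delta G_{hydrophobic} + \Delta G_{H\text{-}bond} + \Delta G_{H\text{-}bond\,(chg)} + \Delta G_{deformation} + \Delta G_{0}$$

$$
\begin{aligned}
\Delta G_{vdw} &= C_{VDW}\,\mathrm{VDW} \\
\Delta G_{hydrophobic} &= C_{hydrophobic}\,\mathrm{HP} \\
\Delta G_{H\text{-}bond} &= C_{H\text{-}bond}\,\mathrm{HB} \\
\Delta G_{H\text{-}bond\,(chg)} &= C_{H\text{-}bond\,(chg)}\,\mathrm{HB} \quad \text{(chg-chg and chg-neutral)} \\
\Delta G_{deformation} &= C_{rotor}\,\mathrm{RT} \\
\Delta G_{0} &= C_{regression}
\end{aligned}
$$

[1] "Further development and validation of empirical scoring functions for structure-based binding affinity prediction" Wang, R., Lai, L., and Wang, S. J. Comp. Aided Mol. Design 16, 11-26, 2002

AScore: VDW and HP terms

$$
\mathrm{VDW} = \sum_{i}^{ligand} \sum_{j}^{protein} \left[ \left( \frac{d_{ij,0}}{r_{ij}} \right)^{8} - 2 \left( \frac{d_{ij,0}}{r_{ij}} \right)^{4} \right]
+ \sum_{i}^{ligand} \sum_{j>i}^{ligand} \left[ \left( \frac{d_{ij,0}}{r_{ij}} \right)^{8} - 2 \left( \frac{d_{ij,0}}{r_{ij}} \right)^{4} \right]
$$

• $d_{ij,0}$ is the sum of the vdW radii of atoms i and j.
• Intra-ligand VDW excludes 1-2 and 1-3 bonded pairs.

$$\mathrm{HP} = \sum_{i}^{ligand} \sum_{j}^{protein} f(d_{ij})$$

The sum is over hydrophobic ligand-protein atom pairs.

$$
f(d_{ij}) =
\begin{cases}
1.0 & d_{ij} \le d_{ij,0} + 0.5\ \text{Å} \\
\tfrac{2}{3}\,(d_{ij,0} + 2.0 - d_{ij}) & d_{ij,0} + 0.5\ \text{Å} < d_{ij} \le d_{ij,0} + 2.0\ \text{Å} \\
0 & d_{ij} > d_{ij,0} + 2.0\ \text{Å}
\end{cases}
$$

AScore: HB and RT terms

$$\mathrm{HB} = \sum_{i}^{ligand} \sum_{j}^{protein} \mathrm{HB}_{ij}, \qquad \mathrm{HB}_{ij} = f(r_{ij})\, f(\theta_{1,ij})\, f(\theta_{2,ij})$$

• $r_{ij}$: distance between donor and acceptor atoms.
• $\theta_{1,ij}$: angle between donor root, donor, and acceptor.
• $\theta_{2,ij}$: angle between donor, acceptor, and acceptor root.
• Each term varies from 1.0 to 0.0 depending on how close it is to its ideal value.
• A maximum number of H-bonds per donor/acceptor atom is imposed.

$$
\mathrm{RT} = \sum_{i}^{ligand} \mathrm{RT}_{i}, \qquad
\mathrm{RT}_{i} =
\begin{cases}
0 & \text{atom } i \text{ not involved in any torsion} \\
0.5 & \text{atom } i \text{ involved in 1 torsion} \\
1.0 & \text{atom } i \text{ involved in 2 torsions} \\
0.5 & \text{atom } i \text{ involved in more than 2 torsions}
\end{cases}
$$

AScore: Differences with XScore

AScore extends XScore to allow it to be used as the docking objective function.
• Separate H-bond term involving charged donor and/or acceptor groups.
• Max. number of H-bonds per donor/acceptor imposed by uniformly scaling the total found to the maximum number allowed for any given ligand atom.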
• Ligand has hydrogens added. Hydrogens are included in the VDW term.
• Crystal waters are retained (but hydrogens are not added). H-bonds with crystal waters are treated as having ideal H-bond geometry, but with a scaling factor fit to experiment.
• H-bonds with target metals are treated as having ideal geometry, but with a scaling factor fit to experiment.
• SH is treated as an H-bond donor/acceptor; >S is treated as an H-bond acceptor.
• Intra-ligand VDW energy is included.

Parameterization & Validation (in progress)

• Begin with the published XScore parameters. [1]
• Begin with Wang's data set of 100 protein-ligand structures. [2]
• Remove incorrect structures to get a final training set of 84 structures: 39 hydrophilic, 20 hydrophobic, 25 mixed.
• Modify H-bond parameters and other new parameters to improve the correlation between the score of the x-ray pose and the experimental binding free energy.

Structure Type    Correlation (ΔG_bind with ΔG_experiment)   RMSD Binding Affinity (kcal/mol)
Hydrophilic       0.53                                        2.3
Hydrophobic       0.84                                        2.0
Mixed             0.70                                        2.1
All Structures    0.70                                        2.2

[1] "Further development and validation of empirical scoring functions for structure-based binding affinity prediction" Wang, R., Lai, L., and Wang, S. J. Comp. Aided Mol. Design 16, 11-26, 2002
[2] "Comparative Evaluation of 11 Scoring Functions for Molecular Docking" Renxiao Wang, Yipin Lu, and Shaomeng Wang. J. Med. Chem. 2003, 46, 2287-2303

Parameterization & Validation

Dock the training set using the ShapeDock engine.

Structure Type    Correlation (ΔG_bind with ΔG_experiment)   RMSD Binding Affinity (kcal/mol)   Ave. RMSD (Å)
Hydrophilic       0.43                                        2.4                                1.4
Hydrophobic       0.80                                        2.2                                1.9
Mixed             0.61                                        2.4                                1.7
All Structures    0.64                                        2.3                                1.6

Trial Study: Influenza Virus Neuraminidase [1]

• Glycoprotein enzyme that cleaves sialic acid residues from maturing virus particles.
• Eleven conserved residues make up the binding site.
• Dominated by H-bonding & charge-charge group interactions (e.g. carboxyl : guanidino).

DANA              GANA
-10.2 kcal/mol    -11.8 kcal/mol
100,000x increase in binding affinity; ~3x enhancement

[1] "The Effect of Small Changes in Protein Structure on Predicted Binding Modes of Known Inhibitors of Influenza Virus Neuraminidase: PMF-Scoring in Dock4" Ingo Muegge, Med. Chem. Res. 9, 1999, 490-500.

Neuraminidase Dockings

ShapeDock: 9 of the 10 structures reproduced the experimental binding mode.

[Plot: correlation of predicted and measured binding affinities. Axes: AScore score (kcal/mol) and log IC50. R² = 0.70; Ave. RMSD = 1.55 Å]

Docking in ArgusLab 4.0

• ShapeDock and GADock engines (IDockEngine interface, DockEngineFactory, etc.).
• AScore scoring function with modifiable parameter set (IScore interface).
• Easy to make the ligand and binding site groups with one mouse click.
• Dock ligand as flexible, rigid, or using only selected torsions.
• Score current pose, optimize current pose, and full docking.
• Scoring function pre-evaluated on one or more scoring grids.
• Database docking supports SDF file as ligand database (IDataSource).
• Efficient reuse of scoring and docking grids allows the user to interactively modify the ligand or choose a new ligand and quickly dock new structures.
• Results summarized in an external file and in a tree-view. User can click on poses to view details.

ArgusLab Capabilities

• 3D interactive molecule builder & viewer
• Computational experiments
   • QM: Extended Hückel, semi-empirical (MNDO, AM1, PM3), ZINDO, and ab initio (via interface to Gaussian 98/03).
   • MM: Universal Force Field (UFF), CVFF, AMBER, custom force fields for research. Polarizable molecular mechanics, Rappe & Goddard's charge equilibration scheme for UFF.
   • Geometry optimizations, electronic excited states, MD simulations, free-energy perturbation, and potential of mean force.
   • QM/MM and QM/MMpol.
   • Molecular Docking.
• Properties & misc.: dipole moments, atom charges, transition properties, surface properties, animate normal modes, view dock poses, ribbons, solvent-accessible surfaces, SCRF solvent effects, explicit solvent, periodic boundary conditions, Ewald sums, etc.
• Manage/organize results: treeview tool for editing structures and viewing results. Results and structures can be saved in an ArgusLab XML file.

ArgusLab Architecture

• Multi-document interface, multi-threaded.
• Written in C++ (some old legacy C code is wrapped in C++).
• Uses OpenGL for graphics, Win32 API for windowing system.
• Garbage collection for graphics objects, events, etc.
• Custom hash-tables & containers in addition to use of STL.
• Custom Model-View-Controller (MVC) transport layer.
• 3D editor built on a command processor model (supports undo/redo).

Installed User Base

• ~20,000 downloads/licenses.
• Popular in university teaching programs and with students (free).
• Used in several industrial settings.

Score the PDBbind Database

Score the 786 structures from the PDBbind database [1] (14 incorrect structures were removed from the original 800 in the database).

PDBbind Database   Correlation (ΔG_bind with ΔG_experiment)   RMSD Binding Affinity (kcal/mol)
786 Structures     0.47                                        2.9

[1] "The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures" Renxiao Wang, Xueliang Fang, Yipin Lu, and Shaomeng Wang. J. Med. Chem. 2004, 47, 2977-2980
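
The validation tables above report a correlation and an RMSD between predicted and experimental binding free energies. The following Python sketch shows one way such figures could be computed. It assumes that the correlation is the Pearson coefficient and that the RMSD is taken directly between predicted and experimental values; the slides do not state either detail. The function names `pearson_r` and `rmsd` are hypothetical, and the sample numbers are made up for illustration and are not taken from the training set or from PDBbind.

```python
import math


def pearson_r(xs, ys):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmsd(xs, ys):
    # Root-mean-square deviation between paired values.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Illustrative values only (kcal/mol); NOT data from the presentation.
    predicted = [-10.2, -8.1, -6.5, -9.0, -7.3]
    experimental = [-11.0, -7.5, -6.0, -9.8, -6.9]
    print("correlation:", round(pearson_r(predicted, experimental), 2))
    print("RMSD (kcal/mol):", round(rmsd(predicted, experimental), 2))
```

Reporting both numbers is useful because they measure different things: a scoring function can rank ligands well (high correlation) while still being offset from the experimental values (high RMSD), or the reverse.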