docking

Molecular Docking
Ugur Sezerman
Sabanci University
What is docking?
Docking is finding the binding geometry of two interacting molecules
with known structures
The two molecules (“Receptor” and “Ligand”) can be:
- two proteins
- a protein and a drug
- a nucleic acid and a drug
Two types of docking:
- local docking:
the binding site in the receptor is known,
and docking refers to finding the position
of the ligand in that binding site
- global docking: the binding site is unknown. The search
for the binding site and the position of the
ligand in the binding site can then
be performed sequentially or simultaneously
What Are Docking & Scoring?
• To place a ligand (small molecule) into the binding
site of a receptor in a manner appropriate for
optimal interactions with the receptor.
• To evaluate the ligand-receptor interactions in a way
that may discriminate the experimentally observed
mode from others and estimate the binding affinity.
[Figure: ligand + receptor -> docking -> complex -> scoring -> X-ray structure & ΔG, etc.]
Why Do We Do Docking?
• Drug discovery costs are too high: ~$800 million,
8~14 years, ~10,000 compounds (DiMasi et al. 2003;
Dickson & Gagnon 2004)
• Drugs interact with their receptors in a highly specific
and complementary manner.
• Core of the target-based structure-based drug design
(SBDD) for lead generation and optimization.
Lead is a compound that
– shows biological activity,
– is novel, and
– has the potential of being structurally modified for
improved bioactivity, selectivity, and druggability.
Docking Applications
• Determine the lowest free energy structures for the receptor-ligand complex
• Search database and rank hits for lead generation
• Calculate the differential binding of a ligand to two different
macromolecular receptors
• Study the geometry of a particular complex
• Propose modifications of lead molecules to optimize potency
or other properties
• de novo design for lead generation
• Library design
Key aspects of docking
• Scoring Functions
– What are they?
– Which Scoring Functions are feasible?
• Search Methods
– How do they work?
– Which search method should I use?
• Which program should I use?
Docking Challenge
• Both molecules are
flexible and may alter
each other’s structure
as they interact:
– Hundreds to thousands
of degrees of freedom
– Total possible
conformations are
astronomical
Formulation of Docking Problem
• A scoring function that can discriminate
correct (experimentally observed) docking
complex structure from incorrect ones
• A search algorithm that finds the best docking complex structure as measured by the
scoring function
Factors Affecting ∆G0
Intramolecular Forces(covalent)
• Bond lengths
• Bond angles
• Dihedral angles
Intermolecular Forces (noncovalent)
• Electrostatics
• Dipolar interactions
• Hydrogen bonding
• Hydrophobicity
• Van der Waals
Types of Docking Problems
• Docking
– Bound docking : the goal is to reproduce a known
complex
– Unbound docking : complex structure not known
• Protein-Small Molecule Docking
– Rigid receptor, rigid ligand
– Rigid receptor, flexible ligand
– Flexible receptor, flexible ligand
Docking strategies require:
1) Protein representation
2) A search method
3) Final refinement and scoring
1. Protein Structure
• A 3-D structure of the target protein at atomic
resolution must be available
– Crystal and solution structures (PDB)
– Homology models
– Pseudoreceptor models
• Ideally, the resolution of crystal structures
should be better than 2.5 Å
• Even small changes in structure can drastically
alter the outcome
Receptor Structures & Binding Site
Descriptions
• PDB (Protein Data Bank, www.rcsb.org/pdb/) containing
proteins or enzymes:
– X-ray crystal: >60,000 structures; ~10% have resolution ≤ 1.5 Å, ~80% between
1.5 and 2.5 Å
– NMR: ensemble accuracy of 0.4-1 Å in the backbone region, 1.5 Å in
average side chain position (Billeter 1992; Clore et al. 1993)
– (and high quality homology models built from highly similar sequences)
• Limitation of experimental structures (Davis et al. 2003):
– Locations of hydrogen atoms, water molecules, and metal ions
– Identities and locations of some heavy atoms (e.g., ~1/6 of N/O of Asn
& Gln, and N/C of His incorrectly assigned in PDB; up to 0.5 Å
uncertainty in position)
– Conformational flexibility of proteins
• Binding site descriptions: atomic coordinates, surface,
volume, points & distances, bond vectors, grid and
various properties such as electrostatic potential,
hydrophobic moment, polar, nonpolar, atom types, etc. DOCK
Drug, Chemical & Structural Space
• Drug-like: MDDR (MDL Drug Data Report) >147,000 entries, CMC
(Comprehensive Medicinal Chemistry) >8,600 entries
• Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries
• Literature and databases: Beilstein (>8 million compounds), CAS &
SciFinder
• CSD (Cambridge Structural Database, www.ccdc.cam.ac.uk): ~3 million X-ray
crystal structures for >264,000 different compounds and >128,00 organic
structures
• Available compounds
– Available without exclusivity: various vendors (& ACD)
– Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi
Pharma, ChemExplorer, etc.
• Corporate databases: a few million compounds in large pharma companies
3D Structural Information & Ligand
Descriptions
• 2D->3D software: CORINA, OMEGA, CONCORD,
MM2/3, WIZARD, COBRA. (reviewed by Robertson et
al. 2001)
• CSD: <0.1 Å for small molecules, but may not be the
bound conformation in the receptor
• PDB: ligand-bound protein structures ~6000 entries
• Atoms associated with inter-atom distances, physical
and chemical properties, types, charges,
pharmacophore, etc
• Flexibility: conformation ensemble, fragment-based
Scoring Functions
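• A fast and simplified estimation of binding energies:
scores <-> ΔG_binding
$\Delta G_{binding} = -RT \ln K_{affinity}$
$\Delta G_{binding} = \Delta G_{complex/solv} - \Delta G_{ligand/solv} - \Delta G_{protein/solv} = \Delta G_{interaction} - T\Delta S + \Delta\ldots$
[Figure: -scores plotted over configurations of the complex, with the X-ray structure marked "?"]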
3. Scoring Functions
Types of Scoring Functions
• Force field based: nonbonded interaction terms as the score,
sometimes in combination with solvation terms
• Empirical: multivariate regression methods to fit coefficients of
physically motivated structural functions by using a training set of
ligand-receptor complexes with measured binding affinity
• Knowledge-based: statistical atom pair potentials derived
from structural databases as the score
• Other: scores and/or filters based on chemical
properties, pharmacophore, contact, shape
complementarity
• Consensus scoring functions approach
3. Scoring Functions
Force Field Based
• CHARMM [Brooks83]
• AMBER [Cornell95]
Empirical methods:
• ChemScore [Eldridge97]
• GlideScore [Friesner04]
• AutoDock [Morris98]
• AutoDock Vina [Trott09]
Knowledge-based methods
• PMF [Muegge99]
• Bleep [Mitchell99]
• DrugScore [Gohlke00]
Force Field Based Scoring Functions
$E = \sum_{i=1}^{lig} \sum_{j=1}^{rec} \left[ \frac{A_{ij}}{r_{ij}^{a}} - \frac{B_{ij}}{r_{ij}^{b}} + 332 \frac{q_i q_j}{D\, r_{ij}} \right]$
e.g. AMBER FF in DOCK
• Advantages
– FF terms are well studied and have some physical basis
– Transferable, and fast when used on a pre-computed grid
• Disadvantages
– Only parts of the relevant energies, i.e., potential energies
& sometimes enhanced by solvation or entropy terms
– Electrostatics often overestimated, leading to systematic
problems in ranking complexes
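The pairwise sum in the force-field score can be sketched in a few lines. This is a minimal illustration, not DOCK's implementation: the atom dictionaries, the `sqrtA`/`sqrtB` per-atom parameters, and the geometric-mean combination `A_ij = sqrtA_i * sqrtA_j` are assumptions made for the example.

```python
import math

def pair_energy(r, A, B, qi, qj, a=12, b=6, D=1.0):
    """One term of the sum: repulsion, attraction, and Coulomb (332 gives kcal/mol)."""
    return A / r**a - B / r**b + 332.0 * qi * qj / (D * r)

def interaction_energy(ligand, receptor, a=12, b=6, D=1.0):
    """Sum pair energies over all ligand-receptor atom pairs.

    Each atom is a dict with keys 'xyz', 'q', 'sqrtA', 'sqrtB' (hypothetical layout).
    """
    total = 0.0
    for li in ligand:
        for rj in receptor:
            r = math.dist(li["xyz"], rj["xyz"])
            A = li["sqrtA"] * rj["sqrtA"]  # assumed geometric-mean combination rule
            B = li["sqrtB"] * rj["sqrtB"]
            total += pair_energy(r, A, B, li["q"], rj["q"], a, b, D)
    return total
```

With a rigid receptor, the receptor part of each term can be precomputed on a grid, which is why this form is fast in practice.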
Molecular mechanics force fields
• Usually quantify the sum of two energies
– the receptor–ligand interaction energy
– internal ligand energy (such as steric strain induced by binding)
• Interactions between ligand and receptor are most often
described by using van der Waals and electrostatic energy
terms.
Molecular mechanics force fields
• CHARMM
[Brooks83]
Molecular mechanics force fields
• AMBER:
[Cornell95]
FF Scoring: Implementations
• AMBER FF: DOCK, FLOG, AutoDOCK
• CHARMm FF: CDOCK, MC-approach (Caflisch et al. 1997)
• Potential Grid: rigid receptor structure upon docking. The grid-based score
interpolates from eight surrounding grid points only. 100-fold speed up. Examples:
DOCK, CDOCK, and many other docking programs.
• Soften VDW: A soft-core vdw potential is needed for the kinetic accessibility of
the binding site (Vieth et al. 1998). FLOG: 6-9 Lennard-Jones function; GOLD: 4-8
vdw + H-bond, and intraligand energy.
• Solvent Effect on Electrostatic: often approximated by rescaling the in vacuo
coulomb interactions by 1/D, where D = 1-80 or = n*r, n = 1-4, r = distance.
• Solvation and Entropy Terms: Solvation terms decomposed into nonpolar
and electrostatic contributions (e.g., DOCK):
$E_{bind} = E_{nonbond} + E_{solv,elec} + E_{solv,np}$
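The potential-grid idea (interpolating from the eight surrounding grid points) corresponds to trilinear interpolation. The sketch below assumes a cubic grid stored as nested lists `grid[i][j][k]` with a common `spacing`; these names and the layout are illustrative, and no bounds checking is done, so the query point must lie strictly inside the grid.

```python
import math

def trilinear(grid, origin, spacing, point):
    """Interpolate a precomputed potential at `point` from its 8 neighboring grid nodes."""
    # Fractional grid coordinates of the query point
    f = [(p - o) / spacing for p, o in zip(point, origin)]
    i, j, k = (int(math.floor(c)) for c in f)
    tx, ty, tz = f[0] - i, f[1] - j, f[2] - k
    value = 0.0
    # Weight each corner by the volume of the opposite sub-box
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value
```

The cost per ligand atom is eight lookups regardless of receptor size, which is where the speed-up over an explicit pairwise sum comes from.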
Empirical Scoring Functions
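$\Delta G = \Delta G_0 + \Delta G_{rot} N_{rot} + \Delta G_{HB} \sum_{neutral\ H\text{-}bonds} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{ionic\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{aro\ int.} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{lipo.\ cont.} f(\Delta R, \Delta\alpha)$
LUDI & FlexX (Boehm 1994)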
• Goals: reproduce the experimental binding energies, with the global minimum
of the score directed to the X-ray crystal structure
• Advantages: fast & direct estimation of binding affinity
• Disadvantages
–Only a few complexes with both accurate structures & binding
energies known
–Discrepancy in the binding affinities measured from different labs
–Heavy dependence on the placement of hydrogen atoms
–Heavy dependence of transferability on the training set
–No effective penalty term for bad structures
Empirical Scoring: Implementations
Mostly differ by what training set and how many
parameters are used
• Cerius2/Insight2000: LUDI, ChemScore, PLP, LigScore
• SYBYL: FlexX, F-Score
• Hammerhead: 17 parameters for hydrophobic, polar complementary,
entropy, solvation. sLOO = 1.0 logK for 34 complexes
• VALIDATE: 8 parameters for VDW and Coulomb interactions, surface
complementarity, lipophilicity, conformational entropy and enthalpy, lipophilic and
hydrophilic complementarity between receptor and ligand surfaces
• PRO_LEADS: 5 coefficients for lipophilic, metal-binding, H-bond, and a
flexibility penalty term. sLOO = 2 kcal/mol for 82 complexes
• SCORE (Tao & Lai, 2001); ChemScore (GOLD)
Knowledge-based Potentials of
Mean Force Scoring Functions
(PMF)
Assumptions
– An observed crystallographic complex represents the optimum
placement of the ligand atoms relative to the receptor atoms
– The Boltzmann hypothesis converts the frequencies of finding atom A
of the ligand at a distance r from atom B of the receptor into an
effective interaction energy between A and B as a function of r
• Advantages
– Similar to empirical, but more general (much more distance data than
binding energy data)
• Disadvantages
– The Boltzmann hypothesis originates from the statistics of a spatially
uniform liquid, while receptor-ligand complex is a two-component nonuniform medium
– PMF are typically pair-wise, while the probability to find atoms A and B
at a distance r is non-pairwise and depends also on surrounding atoms
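The Boltzmann hypothesis above can be illustrated by inverting a distance histogram into an effective pair potential, A(r) = -k_B T ln(ρ(r)/ρ_bulk). This is a sketch under simplifying assumptions: the reference (bulk) density is taken as the average density over all bins, no volume-correction factor is applied, and the function and argument names are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def pair_potential(counts, shell_volumes, temperature=298.15):
    """Convert observed pair counts per distance bin into a potential of mean force.

    Returns one value per bin in kcal/mol; bins with no observations give None.
    """
    rho_bulk = sum(counts) / sum(shell_volumes)  # assumed reference state
    potential = []
    for n, v in zip(counts, shell_volumes):
        if n == 0:
            potential.append(None)  # ln(0) is undefined; real methods cap or smooth
        else:
            potential.append(-K_B * temperature * math.log((n / v) / rho_bulk))
    return potential
```

Distances observed more often than in the reference state come out with negative (favorable) energies, and under-represented distances come out positive.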
PMF: Implementations
• Verkhivker et al. (1995): 12 atom pairs, 30 complexes (HIV-1
and simian immunodeficiency virus). Test on 7 other HIV-1 protease complexes
• Wallqvist et al. (1995): 38 complexes, 21 atom types (10
C, 5 O, 5 N, 1 S). Test on 8 complexes: sd = 1.5 kcal/mol; and on 20 complexes:
rmsd = 1.0 Å. ΔG_pred is built from pair terms ln P_ij summed over atom-type pairs ij
• Muegge et al. (1999): 697 complexes, 16 atom types
from receptor & 34 from ligand, 282 statistically
significant PMF interactions. Test on 77 diverse compounds: sd = 1.8
log Ki. The PMF was combined with a vdw term to account for short-range
interactions for DOCK4 docking:
$PMF\_score = \sum_{kl,\ r < r^{ij}_{cutoff}} A_{ij}(r)$, where $A_{ij}(r) = -k_B T \ln\left[ f^{j}_{Vol\_corr}(r)\, \frac{\rho^{ij}_{seg}(r)}{\rho^{ij}_{bulk}} \right]$
• DrugScore (Gohlke et al, 2000), FlexX, BLEEP
Two Kinds of Search
Systematic
✽ Exhaustive
✽ Deterministic
✽ Outcome is dependent
on granularity of
sampling
✽ Feasible only for low
dimensional problems
✽ e.g. DOT (6D)
Stochastic
✽ Random
✽ Outcome varies
✽ Must repeat the
search to improve
chances of
success
✽ Feasible for bigger
problems
✽ e.g. AutoDock
Searching Algorithms
• Systematic search
• Molecular dynamics
• Monte Carlo Simulations
• Simulated annealing
• Genetic algorithms
• Lamarckian Genetic Algorithm
• Incremental construction
Systematic Search
• Uniform sampling of search space
– Relative position (3)
– Relative orientation (3)
– Rotatable bonds in ligand (n)
– Rotatable bonds in protein (m)
FRED [Yang04]
Systematic Search
• Uniform sampling of search space
• Exhaustive, deterministic
• Quality dependent on granularity of sampling
• Feasible only for low-dimensional problems
Example: search all rotations
FRED [Yang04]
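A toy version of a systematic search over ligand torsions shows why granularity and dimensionality matter. The sketch assumes a user-supplied `score` function (lower is better) and samples every torsion on a uniform grid; it is an illustration, not the FRED algorithm.

```python
import itertools

def torsion_grid(n_bonds, step_deg):
    """All combinations of torsion angles sampled every `step_deg` degrees."""
    angles = range(0, 360, step_deg)
    return itertools.product(angles, repeat=n_bonds)

def exhaustive_search(score, n_bonds, step_deg):
    """Deterministically return the grid point with the lowest score."""
    return min(torsion_grid(n_bonds, step_deg), key=score)
```

The number of states is (360 / step)^n, so three rotatable bonds at 30° already give 1,728 conformations before position and orientation are sampled.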
Molecular Mechanics
• Energy minimization:
• Start from a random or specific state (position, orientation,
conformation)
• Move in direction indicated by derivatives of energy function
• Stop when a local minimum is reached
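The three steps above amount to steepest descent. The sketch below uses a numerical central-difference gradient so that any energy function can be plugged in; the fixed step size and the stopping tolerance are arbitrary illustrative choices, and docking programs typically use analytic gradients and more robust minimizers.

```python
def minimize(energy, x0, step=0.01, h=1e-6, tol=1e-6, max_iter=10000):
    """Follow the negative energy gradient from x0 until the gradient is ~zero."""
    x = list(x0)
    for _ in range(max_iter):
        grad = []
        for i in range(len(x)):
            xp, xm = x[:], x[:]
            xp[i] += h
            xm[i] -= h
            grad.append((energy(xp) - energy(xm)) / (2 * h))  # central difference
        if sum(g * g for g in grad) < tol * tol:
            break  # local minimum reached (within tolerance)
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x
```

The result depends on the starting state: the search only ever reaches the minimum of the basin it starts in.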
Monte Carlo Simulations
• Tries to dock the ligand inside the receptor site through
many random positions and rotations
• In ICM and MCDOCK, this method is used to make random
moves of the ligand inside a receptor binding site.
• After each random move, a force-field based energy
minimization is applied.
• To avoid trapping in local minima, Monte Carlo methods combine this
procedure with other search methods, such as Simulated
Annealing, Genetic Algorithm and Lamarckian GA
Simulated Annealing
• Global optimization technique based on the
Monte Carlo method :
• Start from a random or specific state
(position, orientation, conformation)
• Make random state changes, accepting up-hill moves
with probability dictated by “temperature”
• Reduce temperature after each move
• Stop after temperature gets very small
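The annealing loop above can be sketched as follows. The geometric cooling schedule, the parameter defaults, and the `propose` callback are assumptions for illustration; the temperature is expressed directly in energy units (k_B absorbed).

```python
import math
import random

def simulated_annealing(energy, state, propose, t_start=10.0, t_end=1e-3,
                        cooling=0.99, rng=None):
    """Metropolis search with a temperature that shrinks after every move."""
    rng = rng or random.Random(0)
    e = energy(state)
    best, best_e = state, e
    t = t_start
    while t > t_end:
        cand = propose(state, rng)
        cand_e = energy(cand)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability
        if cand_e <= e or rng.random() < math.exp(-(cand_e - e) / t):
            state, e = cand, cand_e
            if e < best_e:
                best, best_e = state, e
        t *= cooling  # reduce temperature after each move
    return best, best_e
```

A typical call supplies a random perturbation of position, orientation, or torsions as `propose`, for example `lambda x, rng: x + rng.uniform(-1, 1)` for a one-dimensional toy problem.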
Genetic Algorithm (GA)
• Genetic search of parameter
space:
• Start with a random population
of states
• Perform random crossovers and
mutations to make children
• Select children with highest
scores to populate next
generation
• Repeat for a number of iterations
Gold [Jones95], AutoDock [Morris98]
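A bare-bones version of this loop is sketched below. It treats a state as a list of torsion-like real values and a lower score as fitter (energy-like), and it keeps the best individuals from parents and children together (an elitist variant chosen here so the best score never gets worse). All parameter values are illustrative and are not taken from GOLD or AutoDock.

```python
import random

def genetic_search(score, n_genes, pop_size=30, generations=40,
                   mutation_rate=0.1, rng=None):
    """Return (best individual, best score after each generation)."""
    rng = rng or random.Random(0)
    pop = [[rng.uniform(-180, 180) for _ in range(n_genes)] for _ in range(pop_size)]
    pop.sort(key=score)
    history = [score(pop[0])]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [g + rng.gauss(0, 10) if rng.random() < mutation_rate else g
                     for g in child]  # random mutation
            children.append(child)
        # Selection: keep the pop_size fittest of parents and children
        pop = sorted(pop + children, key=score)[:pop_size]
        history.append(score(pop[0]))
    return pop[0], history
```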
Lamarckian Genetic Algorithm
• LGA finds lowest fitness function (energy)
values first, then maps these values to their
respective genotypes
• Each new child is allowed to create a
new generation
• Genetic algorithm plus Solis and Wets
local search
Better performance than either
simulated annealing or genetic
algorithm alone
Incremental Extension
• Used in DOCK, FLEXX, FLOG and Surflex
• Greedy fragment-based construction:
• Partition ligand into fragments
• Place base fragment (e.g., with geometric hashing)
• Incrementally extend ligand by attaching
fragments
Descriptor Matching Methods
• Distance-compatibility graph in DOCK (Ewing and Kuntz 1997):
distances between sphere centers and distances between ligand heavy atoms
• Interaction site matching in LUDI (Boehm 1992): HBA<->HBD, HYP<->HYP
• Pose clustering and triplet matching in FlexX (Rarey et al. 1996):
HBA<->HBD, HYP<->HYP
• Shape-matching in FRED (OpenEye, www.eyesopen.com)
• Vector matching in CAVEAT (Lauri and Bartlett 1994)
• Steric effects-matching in CLIX (Lawrence and Davis 1992)
• Shape chemical complementarity in SANDOCK (Burkhard et al. 1998)
• Surface complementarity in LIGIN (Sobolev et al. 1996)
• H-bond matching in ADAM (Mizutani et al. 1994)
Fragment-based Methods
• Flexibility and/or de novo design
• Identification and placement of the base/anchor fragment are very
important
• Energy optimization (during or post-docking) is important
• Examples
–Incremental construction in FlexX with triplet matching and pose clustering to
maximize the number of favorable interactions
–Growing and/or joining in LUDI from pre-built fragment and linker libraries
and maximize H-bond and hydrophobic interactions
–Anchor-based fragment joining in DOCK
Molecular Simulation: MD & MC
• Two major components:
– The description of the degrees of freedom
– The energy evaluation
• The local movement of the atoms is performed
– Due to the forces present at each step in MD (Molecular Dynamics)
– Randomly in MC (Monte Carlo)
• Usually time consuming:
– Search from a starting orientation to low-energy configuration
– Several simulations with different starting orientation must be
performed to get a statistically significant result
• Grid for energy calculation. Larger steps or multiple
starting poses are often used for speed and sampling
coverage in MD:
– Di Nola et al. 1994; Mangoni et al. 1999; Pak & Wang 2000; CDOCKER
by Wu et al. 2003.
MC-based Docking
$P = \exp\left( -\frac{E(B) - E(A)}{k_B T} \right)$
where T is reduced according to a so-called cooling schedule, and a grid can be
used for energy calculation.
• An advantage of the MC technique compared with
gradient-based methods (e.g. MD) is that a simple energy
function can be used which does not require derivative
information, and that the search is able to step over energy barriers.
• AutoDOCK (Goodsell & Olson 1990). MCDOCK (Liu &
Wang 1999), PRODOCK (Trosset & Scheraga 1999), ICM
(Abagyan et al. 1994).
• Simulated annealing is used in DockVision (Hart & Read
1992) and Affinity (Accelrys Inc., San Diego, CA)
• Energy minimization is used in QXP (McMartin &
Bohacek 1997).
Genetic Algorithm Docking
• A fitness function is used to decide which individuals
(configurations) survive and produce offspring for the
next iteration of optimization. Degrees of freedom are
encoded into genes or binary strings.
• The collection of genes (chromosome) is assigned a
fitness based on a scoring function. There are three
genetic operators:
– mutation operator randomly changes the value of a gene;
– crossover exchanges a set of genes from one parent chromosome to
another;
– migration moves individual genes from one sub-population to another.
• Requires the generation of an initial population where
conventional MC and MD require a single starting
structure in their standard implementation.
• GOLD (Jones et al. 1997); AutoDock 3.0 (Morris et al.
1998); DIVALI (Clark & Ajay 1995).
DOCK (Kuntz, UCSF)
• Receptor structure (X-ray crystal, NMR, homology) -> binding site -> molecular surface of binding site
• Spheres describing the shape of the binding site and favorable locations of potential ligand atoms
• Ligands: 3D structure, atomic charges, potentials, labeling
• Matching heavy atoms of ligands to centers of spheres to generate thousands of binding orientations
• Filters
• Scoring orientations:
1. Energy scoring (vdw and electrostatic)
2. Contact scoring (shape complementarity)
3. Chemical scoring
4. Solvation terms
• Virtual screening for MTS/HTS and library design: ligands in the order of their best scores
• Binding mode analysis for lead optimization: binding orientations and scores for each ligand
FlexX (Tripos/SYBYL)
• Fragment-based, descriptor matching, empirical scoring
(Rarey et al. 1996)
• Procedures:
– Select a small set of base fragments suitable for placement using a
simple scoring function.
– Place base fragments with the pose clustering algorithm: rigid, triplet
matching of H-bond & hydrophobic interactions, Bohm's scoring
function
– Build up the remainder of the ligand incrementally from other
fragments
• Ligand conformations
– MIMUMBA model with CSD derived low energy torsional angles for
each rotatable bond and ring from CORINA.
– Multiple conformations for each fragment in the ligand building steps
• Other works: Explicit waters are placed into binding site during the docking
procedure using pre-computed water positions(Rarey et al. 1999). Receptor
flexibility using discrete alternative protein conformations (Claussen et al. 2001;
Claussen & Hindle 2003)
GOLD
• GA method, H-bond matching, FF scoring (Jones et al.
1997)
– A configuration is represented by two bit strings:
1. The conformation of the ligand and the protein defined by the
torsions;
2. A mapping between H-bond partners in the protein and the
ligand.
– For fitness evaluation, a 3D structure is created from the chromosome
representation. The H-bond atoms are then superimposed to H-bond
site points in the receptor site.
– Fitness (scoring) function: H-bond, the ligand internal energy, the
protein-ligand van der Waals energy
– Rotational flexibility for selected receptor hydrogens along with full
ligand flexibility
• Highlights:
– Validation test set: 100 complexes, 66 with rmsd < 2 Å.
– The structure generation is biased towards inter-molecular H-bonds.
– Hydrophobic fitting points were added (GOLD 1.2, CCDC, Cambridge,
UK 2001).
LUDI: Matching polar and hydrophobic
groups
• Calculate protein and ligand interaction sites (H-bond or
hydrophobic), which are defined by centers and surface,
from
– non-bonded contact distributions based on a search through the CSD,
– a set of geometric rules,
– the output from the program GRID (Goodford 1985) which calculates
binding energies for a given probe with a receptor molecule.
• Fit fragments onto the interaction sites.
– distance between interaction sites on the receptor
– an RMSD superposition algorithm,
– A hashing scheme to access and match surface triangles onto a
triangle query of a ligand interaction center.
– A list-merging algorithm creates all triangles based on lists of fitting
triangle edges for two of the three query triangle edges.
• Join/grow fragments using the databases of fragments
and the same fitting algorithm.
GLIDE (www.schrodinger.com)
• Funnel: site point search -> diameter test -> subset test -> greedy
score -> refinement -> grid-based energy optimization ->
GlideScore.
• Approximates a complete systematic search of the conformational,
orientational, and positional space of the docked ligand.
• Hierarchical filters, including a rough scoring function that
recognizes hydrophobic and polar contacts, dramatically narrow
the search space
• Torsionally flexible energy optimization on an OPLS-AA nonbonded
potential grid for a few hundred surviving candidate poses.
• The very best candidates are further refined via a MC sampling of
pose conformation.
• A modified ChemScore (Eldridge et al. 1997) that combines
empirical and force-field-based terms.
• Validation: 282 complexes, new ligand conformation, the top-ranked pose: 50% < 1 Å, ~33% > 2 Å.
Matrix of Accuracy & Success
Drug <- Quality Novel Lead <- Active
• Reproduce binding mode (X-ray crystal structures)
• Predict binding affinity (free energies)
• Rank diverse set of compounds (by binding affinity)
• Enhance hit rate for database mining
• Reduce false positives (N_selected - N_hits) and false negatives
(N_all_hits - N_hits)
• Fast enough for iterative SBDD

           active   inactive
active     TRUE     FALSE
inactive   FALSE    TRUE

$EF = \frac{H_{VS}}{H_0} = \frac{(N_{hits} / N_{selected})_{VS}}{(N_{all\_hits} / N_{all})_0}$
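The enrichment factor and the false positive/negative counts defined on this slide are simple ratios and differences; a small sketch makes the bookkeeping explicit. The function names are illustrative.

```python
def enrichment_factor(n_hits, n_selected, n_all_hits, n_all):
    """EF = hit rate among the selected compounds / hit rate of the whole database."""
    hit_rate_vs = n_hits / n_selected
    hit_rate_0 = n_all_hits / n_all
    return hit_rate_vs / hit_rate_0

def false_counts(n_hits, n_selected, n_all_hits):
    """False positives and false negatives as defined on the slide."""
    return {
        "false_positives": n_selected - n_hits,
        "false_negatives": n_all_hits - n_hits,
    }
```

For example, finding 10 actives among 100 selected compounds, when the database of 10,000 holds 50 actives, gives a 20-fold enrichment but still misses 40 actives.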
Accuracy of Docking
• Reality Boundary
– Experimental errors: 0.1-0.25 kcal/mol (18-53%) with MSR (maximum
significant ratio) as much as 3 fold (0.65 kcal/mol)
– Free energy calculation accuracy: ~1 kcal/mol (5.4 fold) starting with
an accurate geometric model & full sampling
– Entropy and solvation estimation need a sufficiently long simulation
run with an accurate force field, an ensemble of explicit water
molecules, and full sampling
• Current
– Reproduce X-ray structure with rmsd < 2 Å: 50-90% achievable
– Binding affinity: 1.5~2 log units (32-100 fold, 2.05-2.73 kcal/mol)
– Correlation between scores and affinities: r^2 < 0.3
– Enthalpy ranking with minimization: ±5 kcal/mol
– Hit rate enhancement: 2~50 fold with hit rate 1-20% (and high false
negative rate if 1~5% of total compounds selected)
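The paired numbers on this slide (kcal/mol versus fold change or log units) follow from ΔG = -RT ln K. The sketch below reproduces them, assuming T = 298.15 K; the function names are illustrative.

```python
import math

R = 1.987204e-3  # gas constant in kcal/(mol K)

def kcal_to_fold(delta_g, temperature=298.15):
    """Fold change in binding constant corresponding to a free energy difference."""
    return math.exp(delta_g / (R * temperature))

def log_units_to_kcal(delta_log_k, temperature=298.15):
    """Free energy difference corresponding to a change in log10 K."""
    return delta_log_k * math.log(10) * R * temperature
```

At this temperature one log unit is about 1.36 kcal/mol, so 1.5 to 2 log units correspond to roughly 2.05 to 2.73 kcal/mol, and 1 kcal/mol corresponds to about a 5.4-fold change in affinity.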
Background & Motivation
• Docking = process of starting with a set of coordinates for two distinct
molecules and generating a model of the bound complex
• Numerous methods that perform protein-protein docking exist today
• Fourier correlation approach (Ritchie and Kemp, 2000) enabled the
generation of billions of possible docked conformations via defined scoring
functions
• Problem: many false positives (good surface complementarity) that are far
from the native complex
• Motivation: need to develop methods to filter and rank the docked
conformations such that near-native complexes can be identified
• ClusPro: an automated, fast rigid-body docking and discrimination algorithm
that:
1) Rapidly filters docked conformations
2) Ranks the conformations using clustering of computed pairwise RMSD
values
Input and Method Outline
CAPRI receptor-ligand pairs: 2,000 docked conformations for 48 receptor-ligand pairs
-> Free energy filtering: 2,000 conformations w/ low desolvation or electrostatic energies
-> Discrimination via clustering: top 10 clusters (centers)
-> Compare with native structure (RMSD)
Part I: Free-Energy Filtering
• Goal: to identify docked conformations having good surface
complementarity by selecting those w/ lowest desolvation and electrostatic
energies
• Surface complementarity is an important criterion due to the observation that
proteins tend to bury large surface areas after complex formation
• Electrostatic and desolvation potentials (capturing the free energy of
association) are used independently since different binding mechanisms are
governed by different ratios of electrostatic/desolvation contributions
• 500 structures w/ lowest values of desolvation free energy retained
• 1500 structures w/ lowest electrostatic energy retained
• Electrostatics more sensitive to small coordinate perturbations -> noisy
• Cannot combine desolvation and electrostatics due to the noisy behavior of
the electrostatic potential
Part II: Clustering based on Pairwise RMSD
• By examining free energy landscapes of partially solvated receptor-ligand
complexes: the native binding site is expected to be characterized by the local
minimum having the greatest width
• In other words, the most probable conformation is expected to be surrounded by
many other low-energy conformations
• Goal: to use a hierarchical clustering method to select and rank docked
conformations having the most “neighbors” given a defined cluster radius (in terms
of C-alpha RMSD)
Procedure:
1) Define the fixed molecule (receptor) and the flexible molecule (ligand)
2) Define the set of relevant ligand residues as those within 10 Å of any atom in the receptor
3) For each docked conformation X, calculate its pairwise ligand RMSD with the 1999 other
conformations
Pairwise ligand RMSD = deviation between the coordinates of X’s defined set of
ligand residues and the corresponding coordinates of another conformation
4) Cluster the set of 2000 docked conformations using a 2000 by 2000 matrix of RMSD
values and a cluster radius constraint of 9 Å RMSD from the center
5) Pick the largest cluster -> rank cluster center -> remove conformations within this
cluster from the matrix
6) Pick the next largest cluster -> rank cluster center -> remove conformations within this
cluster from the matrix -> keep iterating until the matrix is empty
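Steps 4 to 6 of the procedure can be sketched as a greedy loop over a precomputed RMSD matrix. This is an illustration of the idea rather than the ClusPro code: the matrix is assumed to be a symmetric list of lists with zeros on the diagonal, ties are broken by the lowest index, and `max_clusters` is a hypothetical convenience parameter.

```python
def greedy_cluster(rmsd, radius=9.0, max_clusters=None):
    """Repeatedly pick the conformation with the most neighbors within `radius`.

    Returns a ranked list of (center index, sorted member indices).
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining and (max_clusters is None or len(clusters) < max_clusters):
        best_center, best_members = None, []
        for i in sorted(remaining):
            members = [j for j in remaining if rmsd[i][j] <= radius]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, sorted(best_members)))
        remaining -= set(best_members)  # remove this cluster from the matrix
    return clusters
```

Calling `greedy_cluster(rmsd, radius=9.0, max_clusters=10)` would return the ten most populated cluster centers in rank order.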
Results
Result I:
• Tested the discrimination step of the method on a benchmark set of 48
interacting protein pairs (2000 docked conformations each)
• In 31/48 protein pairs, the top 10 predictions include at least one near-native
complex (average RMSD of 5 Å from the native structure)
Result II:
- Tested method in the CAPRI (Critical Assessment of Predictions of
Interactions) experiment and generated predictions for 9 target complexes
- Round 3 (automated server): ClusPro prediction ranked as #3 for Target 8
ClusPro Web Server
• User input: PDB files of the 2 protein structures that the user would like to
analyze in terms of complex formation
• Output: 10 (default) top predictions of docked conformations closest to
the native structure
• First, docking of the 2 proteins is performed using 2 established FFT-based
docking programs (DOT and ZDOCK)
• Then, filtering and discrimination are performed
• Server allows for customization of parameters:
– Clustering radius
Smaller protein -> a smaller radius may be more suitable
– Relative number of desolvation and electrostatic best hits used during
filtering
– Number of predictions to generate (1-30)
Protein Drug Discovery
• Although small-molecule drugs are the more prevalent therapeutics in current drug
discovery, protein drugs are a rapidly growing area in pharmaceuticals
• Protein therapeutics can be much more costly (in terms of R&D and
synthesis) than small-molecule therapeutics, but they can deliver
biological mechanisms that are not possible with small-molecule therapeutics
• Multiple blockbuster protein drugs are currently on the market
• Conservative estimate: there exist between 3,000 and 10,000 possible drug
targets
• Many of these new targets offer great opportunities for the development of protein
drugs
• In 2002, drug companies sold nearly $33 billion in protein drugs
• Rising at an average annual growth rate (AAGR) of 12.2%, this market is expected
to reach $71 billion in 2008.
• Examples of popular classes of drug targets:
1) G-protein-coupled receptors
Compounds will be screened for their ability to inhibit (antagonist) or stimulate
(agonist) the receptor
2) Protein kinases
Compounds will be screened for their ability to inhibit the kinase
Application to Protein Drug Discovery
• Ideal drug: demonstrates high specificity and high affinity for the target protein
• In order to evaluate the affinity of the potential drug for the target, you must first
predict what the binding interface looks like, and the relative positions of the potential
drug and target
• ClusPro is the first integrated automated server that incorporates both docking and
discrimination steps for structural predictions of protein-protein complexes
• Using ClusPro, one can generate many relative orientations/conformations of the 2
proteins -> filter using desolvation + electrostatics potentials -> discriminate via
clustering -> find the best fit (closest to native structure from x-ray crystallography
results) between the 2 proteins
• Top ranked predictions of ClusPro -> further manual refinement and discrimination using
existing biochemical constraints and analysis to eliminate false positives -> test binding
affinity of promising protein pairs in vitro -> lead compounds used as starting points for
drug development/optimization
• Can use ClusPro to screen databases of various existing, recombinant, or de novo
proteins for their interaction with a protein target of interest
• ClusPro can be used to predict either:
– How a protein drug may bind (either inhibit or stimulate) a receptor
– How 2 proteins bind, and based on the structural details of the interaction ->
design/screen for a drug that can inhibit that interaction
2.1 Rigid Docking
• Protein and ligand fixed.
• Search for the relative orientation of the two molecules with
lowest energy
• Fastest way to perform an
initial screening of a small-molecule database
-> virtual-screening initiative
Rotamer Libraries
• Rigid docking of many conformations:
• Precompute all low-energy conformations
• Dock each precomputed conformation as a rigid
body
Glide [Friesner04]
Rigid Docking Methods
• All rigid-body docking methods have in
common that superposition of point sets is a
fundamental sub-problem that has to be
solved efficiently:
• Geometric hashing
• Pose clustering
• Clique detection
Geometric Hashing
• Originates from computer vision technology for
recognizing partially occluded objects in camera
scenes
• Given a picture of a scene and a set of objects
within the picture, both represented by points in
2d space, the goal is to recognize some of the
models in the scene
• Objects with certain geometric features can be
accessed very fast through a geometric hashing
table
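The idea can be sketched in a few lines for 2-D point sets. This is a minimal illustration, not the implementation used by any docking program: the bin width `BIN`, the function names and the toy model are all assumptions made for the example. Each model point is stored under coordinates expressed in a frame defined by an ordered pair of model points, so the stored keys do not change under rotation, translation or uniform scaling. Recognition then picks a pair of scene points as a trial basis and counts the table entries that the remaining scene points hit.

```python
import math
from collections import defaultdict
from itertools import permutations

BIN = 0.25  # hash-bin width (hypothetical choice for this sketch)

def basis_coords(p, b0, b1):
    """Coordinates of p in the frame where b0 maps to (0, 0) and b1 to (1, 0)."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    norm2 = ux * ux + uy * uy
    dx, dy = p[0] - b0[0], p[1] - b0[1]
    # Project onto the basis vector and its perpendicular; dividing by |u|^2
    # removes the dependence on rotation, translation and uniform scale.
    return ((dx * ux + dy * uy) / norm2, (-dx * uy + dy * ux) / norm2)

def quantize(xy):
    return (round(xy[0] / BIN), round(xy[1] / BIN))

def build_table(models):
    """Preprocessing: hash every point of every model under every ordered basis pair."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for k, p in enumerate(pts):
                if k not in (i, j):
                    table[quantize(basis_coords(p, pts[i], pts[j]))].append((name, i, j))
    return table

def vote(table, scene, s0, s1):
    """Recognition: use scene points s0, s1 as a trial basis and count table hits."""
    votes = defaultdict(int)
    for k, p in enumerate(scene):
        if k not in (s0, s1):
            for entry in table.get(quantize(basis_coords(p, scene[s0], scene[s1])), []):
                votes[entry] += 1
    return dict(votes)

# Toy example: the scene is the model rotated by 30 degrees and shifted.
model = [(0.0, 0.0), (1.0, 0.0), (0.3, 0.7), (0.8, -0.4)]
table = build_table({"m": model})
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
scene = [(c * x - s * y + 5.0, s * x + c * y - 2.0) for x, y in model]
votes = vote(table, scene, 0, 1)
```

In this toy case the entry `("m", 0, 1)` collects one vote from each of the two non-basis scene points. In practice the points would be 3-D surface or interaction features and bases would be tried until one collects enough votes.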
Pose-Clustering
• Originally developed to detect objects in 2-D
scenes with unknown camera location
• For each receptor triangle, compute the
transformation to each matching ligand
triangle.
• Cluster transformations.
• If a cluster grows large, a location with a high
number of matching features is found
E.g. the FlexX Method
• The base fragment (the ligand core) is
automatically selected and is placed into the
active site using a pattern recognition
technique called pose clustering
• Next, the remainder of the ligand is built up
incrementally from other fragments.
Clique-Detection
• Nodes are matches between protein and ligand features
• Edges connect distance-compatible pairs of nodes
• In a clique, all pairs of nodes are connected
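A small sketch of this graph construction, under stated assumptions: features are bare 3-D points, the distance tolerance `tol` is an arbitrary illustrative value, and the maximum clique is found with a plain Bron-Kerbosch recursion, which is only practical for small graphs. The function names are hypothetical and are not taken from any docking package.

```python
from itertools import combinations
from math import dist

def correspondence_graph(ligand, receptor, tol=0.5):
    """Nodes are (ligand_index, receptor_index) pairs; edges join distance-compatible nodes."""
    nodes = [(i, j) for i in range(len(ligand)) for j in range(len(receptor))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # one ligand point cannot match two receptor points, and vice versa
        if abs(dist(ligand[i], ligand[k]) - dist(receptor[j], receptor[l])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Largest clique via Bron-Kerbosch without pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Toy example: a 3-4-5 triangle hidden among four receptor points.
ligand = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
receptor = [(10, 10, 10), (13, 10, 10), (10, 14, 10), (30, 30, 30)]
match = sorted(max_clique(correspondence_graph(ligand, receptor, tol=0.1)))
```

The clique found pairs ligand point 0 with receptor point 0, 1 with 1 and 2 with 2. A superposition computed from those pairs would give the candidate orientation.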
E.g. DOCK 6
• The rigid body orienting code is written as a
direct implementation of the isomorphous
subgraph matching method of Crippen and
Kuhl
• Conceptually, the algorithm matches the
centers of the ligand heavy atoms to the
centers of the receptor site spheres.
DOCK 6
• The algorithm follows the steps below:
1) Generate node
2) Label as match if atom and sphere edges
are equivalent
3) Extend match by adding more nodes
4) Exhaustively generate set of nondegenerate matches
5) Use matches to create transformation
matrices to move the entire molecule
node = pairing of one heavy atom and one sphere center
edge length = Euclidean distance between atom or sphere centers
• Once an orientation has been generated, the interaction between the
ligand and the receptor can be energetically optimized (ligand is allowed to
be flexible in optimization)
2.2. Rigid Receptor, Flexible Ligand
 Multiple steps in the receptor – ligand
interaction:
• Approach
• Desolvation of the ligand and the
binding site of a protein
• Penetration into the protein cavity
• Change of the ligand orientation
• Adoption of the correct “active”
conformation
• Establishing of new H-bonds,
electrostatic and hydrophobic contacts
Free energy function :
Challenges
• Predicting energetics of protein-ligand binding
• Searching space of possible poses &
conformations
– Relative position (3 degrees of freedom)
– Relative orientation (3 degrees of freedom)
– Rotatable bonds in ligand (n degrees of freedom)
– Rotatable bonds in protein (m degrees of freedom)
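To make the size of this search space concrete, the degrees of freedom can simply be added up. The sketch below assumes a naive uniform sampling of every degree of freedom, which no real docking program uses; it only illustrates why exhaustive search stops being feasible once torsions are included. Function names are hypothetical.

```python
def search_dimensions(n_ligand_torsions, m_protein_torsions=0):
    """3 translational + 3 rotational degrees of freedom, plus rotatable bonds."""
    return 3 + 3 + n_ligand_torsions + m_protein_torsions

def grid_size(dimensions, samples_per_dof):
    """Number of poses if every degree of freedom is sampled uniformly."""
    return samples_per_dof ** dimensions

rigid = search_dimensions(0)            # rigid ligand, rigid protein
flexible_ligand = search_dimensions(8)  # ligand with 8 rotatable bonds
```

With 10 samples per degree of freedom, the rigid case gives 10^6 poses and the 8-torsion ligand gives 10^14, before any protein flexibility is added.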
2.3. Flexible Receptor, Flexible Ligand
• Protein flexibility can be introduced through
Monte Carlo or Molecular Dynamics
– Protein can be divided into rigid and flexible parts
-> only flexible receptor site atoms are free to move
– The procedure is still very slow
• Leach* developed a docking algorithm that
sequentially fixes the degrees of freedom of the
protein side-chain atoms
• Broughton** reported the use of conformational
samples from short protein MD simulation runs+
*Leach AR. Ligand docking to proteins with discrete side-chain flexibility. J Mol Biol 1994; 235:345–356
**Broughton HB. A method for including protein flexibility in protein–ligand docking: Improving tools for
database mining and virtual screening. J Mol Graph Model 2000;18:247–257
AutoDock 4
• AMBER FF-based energy grid, flexible ligands,
rigid protein as represented in a grid
• GA as a global optimizer combined with
energy minimization as a local search method
• The fitness function:
– a Lennard-Jones 12-6 dispersion/repulsion term
– a directional 12-10 hydrogen bond term
– a coulombic electrostatic potential
– a term proportional to the number of sp3 bonds in the ligand to represent
unfavorable entropy of ligand binding
– a desolvation term
Comparison of Two Recent Versions
Autodock 4
• Scoring Function is based
on AMBER FF
– FF includes electrostatic
interactions, hydrogen bonds,
desolvation energy.
• “Torsion Tree” for Ligand
Flexibility
• Protein Flexibility by side-chain rotations
• Too many torsions are
problematic
Autodock Vina
• Faster than AutoDock 4
• More accurate than
AutoDock 4
• More user-friendly than
AutoDock 4 for the
calculation of grid maps and
clusters
Our Case: Triacylglyceride Docking
into Lipase
• Lipase: Geobacillus thermocatenulatus Lipase (BTL2)
• Crystal Structure in 2009
– 2.2 Å Resolution (Carrasco-López C et al, J Biol Chem.
2009, PMID: 19056729)
• Two Triton X-100 molecules found in the crystal allow
identification of putative binding pockets for the acyl
chains (sn-1, sn-2, sn-3) of the triglyceride.
Tributyrin (4 carbons in chain)
Tricaprylin (8 carbons in chain)
BTL2 (Apo-enzyme in
open-conformation)
Work-Flow of Docking Study
Separating bound molecules from active site cleft
Apo-enzyme
Ligand
Definition of flexible/rigid bonds
Autodock 4.2 and Vina
Assessment of Docking
Outcomes
Poses, Scores
Selection of Best Binding Modes
Preparation of Input Structures: Protein
(BTL2)
[Figure: open-lid conformation displaying catalytic residues for ligand binding; S114 and F17 labelled]
[Figure: locating the search space (grid box) for triglyceride binding; S114 and F17 labelled]
Preparation of Input Structures:
Ligand (tricaprylin)
Preparation of Input Structures:
Ligand (tributyrin)
Results and Evaluation of Poses:
Tricaprylin (8C)
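The predicted binding affinity is in kcal/mol.
rmsd/lb(c1, c2) = max(rmsd'(c1, c2), rmsd'(c2, c1))
where rmsd' matches each atom in one conformation with the closest atom of the
same element type in the other conformation.
rmsd/ub matches each atom in one conformation with itself in the
other conformation, ignoring any symmetry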
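The two RMSD bounds can be sketched as follows, assuming the definitions given in the AutoDock Vina manual: rmsd' pairs each atom with the nearest atom of the same element in the other pose, and the upper bound pairs each atom with itself. Poses are represented here as lists of `(element, (x, y, z))` tuples, a format chosen only for this example.

```python
from math import dist, sqrt

def rmsd_ub(c1, c2):
    """Upper bound: atom k of c1 is compared with atom k of c2 (no symmetry handling)."""
    return sqrt(sum(dist(a[1], b[1]) ** 2 for a, b in zip(c1, c2)) / len(c1))

def rmsd_prime(c1, c2):
    """Each atom of c1 is compared with the closest same-element atom of c2."""
    total = 0.0
    for elem, xyz in c1:
        total += min(dist(xyz, other) ** 2 for e, other in c2 if e == elem)
    return sqrt(total / len(c1))

def rmsd_lb(c1, c2):
    """Lower bound: symmetric combination of the two one-sided values."""
    return max(rmsd_prime(c1, c2), rmsd_prime(c2, c1))

# Toy example: the two oxygens are swapped, which is the same pose by symmetry.
pose_a = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0))]
pose_b = [("C", (0.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]
lb = rmsd_lb(pose_a, pose_b)
ub = rmsd_ub(pose_a, pose_b)
```

For the swapped-oxygen example the lower bound is zero while the upper bound is sqrt(8/3) Å, which shows why the two columns can differ strongly for symmetric ligands such as triglycerides.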
Results and Evaluation of Poses:
Tricaprylin (8C)
[Figures: binding modes 1, 2, 7 and 8; labels mark the S114 hydroxyl (S114_OH), F17 and the tricaprylin oxygen (TCPN_O)]
Results and Evaluation of Poses:
Tributyrin (4C)
[Figures: binding modes 1, 3, 6 and 7; labels mark the S114 hydroxyl (S114_OH), F17 and the tributyrin oxygen (TBTN_O)]
VINA Outcome
After 1ns
3.1. Force field-based scoring functions
• The parameters of the Lennard–Jones potential vary depending on the
desired ‘hardness’ of the potential.
• D-Score: higher exponents (12–6 Lennard–Jones potential) give
increasingly repulsive potentials that are less forgiving of close
contacts between receptor and ligand atoms
• G-Score: lower exponents (8–4 Lennard–Jones potential) make the
potential softer
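The effect of the exponents can be checked numerically. The sketch below uses a generic n-m Lennard-Jones form normalised so that the minimum is -eps at r_min for any pair of exponents; the values of `r_min` and `eps` are placeholders, not parameters of D-Score or G-Score.

```python
def lj(r, n, m, r_min=4.0, eps=1.0):
    """Generic n-m Lennard-Jones potential with minimum -eps at r = r_min."""
    x = r_min / r
    return eps * (m * x ** n - n * x ** m) / (n - m)

hard_at_min = lj(4.0, 12, 6)   # both forms share the same well depth
soft_at_min = lj(4.0, 8, 4)
hard_clash = lj(3.0, 12, 6)    # 1 Angstrom inside the minimum
soft_clash = lj(3.0, 8, 4)
```

At the minimum both potentials give -1.0, but at 3 Å the 12-6 form is roughly 20 energy units against under 4 for the 8-4 form, so the softer potential tolerates small clashes that a rigid receptor model would otherwise reject.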
3.2. Empirical methods
• Goals: reproduce experimental binding energies, with the
global minimum of the function located at the X-ray crystal structure
• Advantages: fast & direct estimation of binding affinity
• Disadvantages
– Only a few complexes with both accurate structures & binding
energies known
– Discrepancy in the binding affinities measured by different labs
– Heavy dependence on the placement of hydrogen atoms
– Heavy dependence of transferability on the training set
– No effective penalty term for bad structures
3.2. Empirical methods
• ChemScore
• GlideScore
• AutoDock 4.0
• AutoDock Vina:
– Combines advantages of empirical methods and
knowledge-based potentials
– AutoDock Vina can be several orders of magnitude
faster than AutoDock 4
3.3. Knowledge-based methods
• Designed to reproduce experimental structures rather than
binding energies.
• Protein–ligand complexes are modelled using relatively
simple atomic interaction-pair potentials.
• Advantages
– Similar to empirical, but more general (much more distance data than binding
energy data)
• Disadvantages
– The Boltzmann hypothesis originates from the statistics of a spatially uniform
liquid, while receptor-ligand complex is a two-component non-uniform
medium
– PMFs are typically pairwise, while the probability of finding atoms A and B at a
distance r is not pairwise and also depends on the surrounding atoms
3.3. Knowledge-based methods
• Potential of Mean Force (PMF) score:
[Equation not recovered; its annotated terms are the Boltzmann constant, the ligand volume
correction factor, and the radial distribution function for a protein atom i and a
ligand atom j]
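As an illustration of how such a potential is derived from structural statistics, the sketch below applies the inverse Boltzmann relation A(r) = -k_B T ln[f_corr(r) · ρ(r)/ρ_bulk]. This is the form commonly quoted for PMF-type scores and is an assumption here, since the slide's own equation is not available; the function name and argument names are hypothetical.

```python
from math import log

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pmf(rho_r, rho_bulk, vol_corr=1.0, temperature=298.15):
    """Pair potential from the observed density of an atom pair at distance r.

    rho_r    : number density of the (protein i, ligand j) pair type at distance r
    rho_bulk : reference density of the same pair type without interaction
    vol_corr : ligand volume correction factor (1.0 means no correction)
    """
    return -K_B * temperature * log(vol_corr * rho_r / rho_bulk)

neutral = pmf(1.0, 1.0)       # observed as often as expected
favourable = pmf(2.0, 1.0)    # observed twice as often as expected
unfavourable = pmf(0.5, 1.0)  # observed half as often as expected
```

Distances that occur more often than expected in known complexes get a negative (favourable) contribution, and the total score is the sum of these pair terms over all protein-ligand atom pairs within a cutoff.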
• DrugScore:
[Gohlke00]
Multiple Method Approach
[Flowchart labels: systematic search; conformations; initial poses; filters; rigid DOCK;
minimization; finer docking; MD/SA (Wang et al. 1999); final scoring; (FRED, GLIDE, DOCK)]
• Similarity-guided MD simulated annealing to improve accuracy (Wu
& Vieth 2004).
• Shape similarity & clustering to speed up conformational search in
docking (Makino & Kuntz 1998).
Better input or constraints for the existing docking engines
Computing Scoring Functions
• Point-based calculation:
• Sum terms computed at positions of ligand atoms
(this will be slow)
Computing Scoring Functions
• Grid-based calculation:
• Precompute “force field” for each term of scoring
function for each conformation of protein (usually only
one)
• Sample force fields at positions of ligand atoms
-> Accelerate calculation of scoring function by 100X
[Huey & Morris]
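A minimal sketch of the grid idea: the receptor term is evaluated once at every grid node, and each ligand atom is then scored by trilinear interpolation between the eight surrounding nodes. The potential used in the example is an arbitrary linear function chosen so the result is easy to check; real programs store one grid per atom type and term. All names are hypothetical.

```python
from math import floor

def build_grid(potential, origin, spacing, shape):
    """Precompute potential(x, y, z) once at every grid node."""
    nx, ny, nz = shape
    return [[[potential(origin[0] + i * spacing,
                        origin[1] + j * spacing,
                        origin[2] + k * spacing)
              for k in range(nz)] for j in range(ny)] for i in range(nx)]

def sample(grid, origin, spacing, point):
    """Trilinear interpolation at a point strictly inside the grid."""
    f = [(point[a] - origin[a]) / spacing for a in range(3)]
    i, j, k = (int(floor(v)) for v in f)
    tx, ty, tz = f[0] - i, f[1] - j, f[2] - k
    value = 0.0
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value

def score(grid, origin, spacing, ligand_atoms):
    """Sum of interpolated grid values at the ligand atom positions."""
    return sum(sample(grid, origin, spacing, atom) for atom in ligand_atoms)

def linear(x, y, z):
    return x + 2 * y + 3 * z  # toy "receptor potential"

origin, spacing = (0.0, 0.0, 0.0), 0.5
grid = build_grid(linear, origin, spacing, (5, 5, 5))
value = sample(grid, origin, spacing, (0.3, 1.1, 0.75))
total = score(grid, origin, spacing, [(0.3, 1.1, 0.75), (1.0, 1.0, 1.0)])
```

The expensive receptor sum is paid once when the grid is built; every pose evaluation afterwards costs eight lookups per ligand atom, independent of the receptor size.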
Consensus Scoring
• Typically evaluate the ranking of binding modes
measured with different scoring functions and favor
those that rank consistently high in several of them
• Reduces false positive rate
• Examples
– SYBYL Cscore (Tripos): FlexX, PMF, DOCK energy, GOLD score
– C2 (Accelrys): LigScore2, PLP, PMF, Ludi, Jain
– FRED (OpenEye): ChemScore, PB-SA, ChemGauss, PLP, ScreenScore
– DOCK: AMBER FF, PMF, contact scores, ChemScore
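One simple way to implement the idea is rank-by-vote: each scoring function nominates its top-ranked poses, and poses are ordered by the number of nominations. This is a sketch of one possible scheme, not the procedure used by any of the packages listed; the scores are made-up numbers and lower is assumed to be better for every function.

```python
def consensus_rank(scores_by_function, top_n=2):
    """Order poses by how many scoring functions place them in their top_n."""
    poses = next(iter(scores_by_function.values())).keys()
    votes = {pose: 0 for pose in poses}
    for scores in scores_by_function.values():
        for pose in sorted(scores, key=scores.get)[:top_n]:
            votes[pose] += 1
    # Most votes first; ties broken alphabetically for reproducibility.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical scores for three poses from three scoring functions.
scores = {
    "force_field": {"A": -9.0, "B": -7.0, "C": -3.0},
    "empirical":   {"A": -8.0, "B": -2.0, "C": -6.0},
    "knowledge":   {"A": -5.0, "B": -1.0, "C": -4.0},
}
ranking = consensus_rank(scores)
```

Pose A is in the top two of all three functions, C of two and B of one, so A is favoured even though the functions disagree on absolute values. Voting on ranks avoids having to put scores with different units on a common scale.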
Flexible ligand-search methods
Random/stochastic
• AutoDock (MC)
• MOE-Dock (MC,TS)
• GOLD (GA)
• PRO_LEADS (TS)
Systematic
• DOCK (incremental)
• FlexX (incremental)
• Glide (incremental)
• Hammerhead (incremental)
• FLOG (database)
Simulation
• DOCK
• Glide
• MOE-Dock
• AutoDock
• Hammerhead
Docking Software: Important Factors
• Sensitivity to and transferability of the parameters,
including the starting conformation
• Adaptability to additional scoring functions, pre- and/or
post-docking processing and filters
• Ability to iteratively refine the docking parameters/protocol
based on new results
• Design, components, and results of validation studies
• Speed, user interface & control, I/O, structural file formats
• User learning curve, customer support, and cost
• Code availability and upgrade possibilities
Docking Software
DOCK 6.0 (Ewing & Kuntz 1997)
AutoDock 4.0 (Morris et al. 1998)
GOLD (Jones et al. 1997)
FlexX (Rarey et al. 1996)
GLIDE (Friesner et al. 2004)
ADAM (Mizutani et al. 1994)
CDOCKER (Wu et al. 2003)
CombiDOCK (Sun et al. 1998)
DIVALI (Clark & Ajay 1995)
DockVision (Hart & Read 1992)
FLOG (Miller et al. 1994)
GEMDOCK (Yang & Chen 2004)
Hammerhead (Welch et al. 1996)
LIBDOCK (Diller & Merz 2001)
MCDOCK (Liu & Wang 1999)
SDOCKER (Wu et al. 2004)
 de novo design tools
LUDI (Boehm 1992),
BUILDER (Roe & Kuntz 1995)
SMOG (DeWitte et al. 1997)
CONCEPTS (Pearlman & Murcko 1996)
DLD/MCSS (Stultz & Karplus 2000)
Genstar (Rotstein & Murcko 1993)
Group-Build (Rotstein & Murcko 1993)
Grow (Moon & Howe 1991)
HOOK (Eisen et al. 1994)
Legend (Nishibata & Itai 1993)
MCDNLG (Gehlhaar et al. 1995)
SPROUT (Gillet et al. 1993)
FRED (OpenEye www.eyesopen.com)
• Systematic, nonstochastic docking
• Multiple active site comparisons
• Multiple simultaneous scoring functions and hit lists
• RMS clustering of hit lists
• Algorithm:
1. Exhaustive Docking
(a) Enumerate all possible poses of the ligand around the active site by rigidly rotating
and translating each conformer within the site.
(b) Filter the resulting pose ensemble by rejecting poses that do not fit within the
larger of the two volumes specified by the receptor file’s shape potential grid and
a contour level.
2. Systematic solid body optimization by Shapegauss, PLP, Chemgauss2, Chemgauss3,
CGO, CGT, Chemscore, OEChemscore or Screenscore
3. Rank poses via the Consensus Structure method and discard all but the top ranked
poses
DOCK 6.4
• Generates many possible orientations/conformations
of a putative ligand within a user-selected region of a
receptor structure
• Orientations may be scored using several schemes
designed to measure steric and/or chemical
complementarity of the receptor-ligand complex
• Evaluate likely orientations of a single ligand, or to rank
molecules from a database
• Search databases for DNA-binding compounds
• Examine possible binding orientations of protein-protein and protein-DNA complexes
• Design combinatorial libraries
GOLD
• GA method, H-bond matching, FF scoring (Jones et al. 1997)
– A configuration is represented by two bit strings:
1. The conformation of the ligand and the protein defined by the torsions;
2. A mapping between H-bond partners in the protein and the ligand.
– For fitness evaluation, a 3D structure is created from the chromosome
representation. The H-bond atoms are then superimposed to H-bond site
points in the receptor site.
– Fitness (scoring) function: H-bond, the ligand internal energy, the protein-ligand van der Waals energy
• Highlights:
– Full ligand flexibility
– Partial protein flexibility, including protein side chain and backbone flexibility
for up to ten user-defined residues
– A choice of GoldScore, ChemScore, Astex Statistical Potential (ASP) or
Piecewise Linear Potential (PLP) scoring functions
– GOLD's genetic algorithm parameters are optimised for virtual screening
applications
Hammerhead
• Focus on screening large databases of small molecules
• The algorithm is fast enough to allow screening of a library of
roughly 100,000 small organic compounds in a few days
• Empirical scoring function
• Start with automatic pocket finder
• Breaking ligands into fragments, and aligning each of these onto the
protein.
• At each stage of the fragment alignment computation, gradient-descent pose optimization improves the conformation and
alignment of the growing ligand
– Relaxing van der Waals surface interpenetrations
– Improving hydrogen bond and hydrophobic surface contact geometries.
LUDI: Matching polar and hydrophobic groups
• Calculate protein and ligand interaction sites (H-bond or hydrophobic),
which are defined by centers and surface, from
– non-bonded contact distributions based on a search through the CSD,
– a set of geometric rules,
– the output from the program GRID (Goodford 1985) which calculates binding energies
for a given probe with a receptor molecule.
• Fit fragments onto the interaction sites.
– distance between interaction sites on the receptor
– an RMSD superposition algorithm,
– A hashing scheme to access and match surface triangles onto a triangle query of a ligand
interaction center.
– A list-merging algorithm creates all triangles based on lists of fitting triangle edges for
two of the three query triangle edges.
• Join/grow fragments using the databases of fragments and the same
fitting algorithm.
GLIDE (www.schrodinger.com)
• Funnel: site point search -> diameter test -> subset test -> greedy score ->
refinement -> grid-based energy optimization -> GlideScore.
• Approximates a complete systematic search of the conformational,
orientational, and positional space of the docked ligand.
• Hierarchical filters, including a rough scoring function that recognizes
hydrophobic and polar contacts, dramatically narrow the search space
• Torsionally flexible energy optimization on an OPLS-AA nonbonded
potential grid for a few hundred surviving candidate poses.
• The very best candidates are further refined via a MC sampling of pose
conformation.
• A modified ChemScore (Eldridge et al. 1997) that combines empirical and
force-field-based terms.
• Validation: 282 complexes, new ligand conformation, the top-ranked pose:
50% < 1 Å, ~33% > 2 Å.
GRAMM v1.03
• Protein-Protein Docking and Protein-Ligand
Docking
• exhaustive 6-dimensional search through the
relative translations and rotations of the
molecules.
• Empirical approach to smoothing the
intermolecular energy function.
• The quality of the prediction depends on the
accuracy of the structures.
CDOCKER & SDOCKER
• Randomly generate ligand seeds in the binding site
• High-temperature MD using a modified version of CHARMM
• Locate minima from all of the MD simulations
• Full minimization
• Cluster on position and geometry
• Rank by energy (interaction + ligand conformation)
• SDOCKER: X-ray structure of complex as template to guide docking
(Wu et al. 2003; Wu et al. 2004)
Docking Webservers
Assessment of CAPRI Predictions 2009
ClusPro Webserver
• Fast rigid-body docking
• Ligand-Protein, Protein-Protein Docking
• Use FFT-based docking programs (DOT and
ZDOCK)
1) Rapidly filters docked conformations
2) Ranks the conformations using clustering of
computed pairwise RMSD values
• Desolvation and Electrostatic energies are
calculated
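The clustering step can be illustrated with a greedy neighbour-count scheme: the pose with the most neighbours within a given RMSD radius becomes a cluster centre, its members are removed, and the procedure repeats. This is a sketch of the general approach under that assumption, not ClusPro's code; the radius and the toy distance matrix are invented for the example.

```python
def greedy_clusters(rmsd, radius):
    """Cluster poses from a symmetric pairwise RMSD matrix (list of lists)."""
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Neighbours of every remaining pose (each pose counts itself).
        neigh = {i: {j for j in remaining if rmsd[i][j] <= radius} for i in remaining}
        # Largest neighbourhood wins; ties go to the lowest index.
        center = max(sorted(remaining), key=lambda i: len(neigh[i]))
        clusters.append((center, sorted(neigh[center])))
        remaining -= neigh[center]
    return clusters

# Toy matrix built from 1-D positions so the expected clusters are obvious.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]
matrix = [[abs(a - b) for b in positions] for a in positions]
clusters = greedy_clusters(matrix, radius=1.5)
```

Clusters come out in order of size, so the first centre is the top-ranked prediction. The underlying assumption is that a broad, well-populated energy funnel is more likely to contain the native complex than an isolated low-energy pose.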
HADDOCK
• Driven by experimental knowledge (e.g., from
mutagenesis, mass spectrometry or a variety of NMR
experiments)
• Protein-Protein Docking server
• Supports nucleic acids
• Algorithm:
1. Rigid-body energy minimization
2. Semi-flexible refinement in torsion angle space
3. Final refinement in explicit solvent
• The HADDOCK score : van der Waals, electrostatic,
desolvation and restraint violation energies together with
buried surface area
GRAMM-X
• Protein-Protein Docking server
• Use FFT for the global search of the best rigid
body conformations.
• Use a smoothed Lennard-Jones potential on a
fine grid
• Ability to smooth the protein surface to account
for possible conformational change
• The smoothing of the intermolecular energy
landscape is achieved by increasing potential
range and lowering the value of the repulsion
part
Softened Lennard-Jones potential function:
PatchDock and SymmDock Server
• Based on a rigid-body geometric hashing
algorithm
• Aim: find transformations that yield good molecular shape
complementarity
• Algorithm divides the Connolly dot surface
representation of the molecules into concave,
convex and flat patches.
• Then, complementary patches are matched in
order to generate candidate transformations.
• Each candidate transformation is further
evaluated by a scoring function that considers
both geometric fit and atomic desolvation energy.
PatchDock detects
transformations with
high shape
complementarity
SymmDock Server
SymmDock restricts its search to
symmetric cyclic transformations
of a given order n.
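The symmetry constraint itself is easy to picture: in a C_n complex every subunit is the previous one rotated by 360/n degrees about a common axis. The sketch below only generates the n copies for a known axis, assumed here to be the z axis through the origin; finding the axis is the actual docking problem and is not shown. The function name is hypothetical.

```python
from math import cos, sin, pi, dist, sqrt

def cn_copies(points, n):
    """Generate the n subunits of a C_n assembly by rotation about the z axis."""
    copies = []
    for k in range(n):
        a = 2 * pi * k / n
        copies.append([(cos(a) * x - sin(a) * y, sin(a) * x + cos(a) * y, z)
                       for x, y, z in points])
    return copies

# Toy monomer consisting of a single point one unit from the axis.
trimer = cn_copies([(1.0, 0.0, 0.0)], 3)
spacing = [dist(trimer[i][0], trimer[(i + 1) % 3][0]) for i in range(3)]
```

For the trimer all neighbouring subunits are separated by the same distance, sqrt(3), which is the property that lets a symmetric search score one interface and replicate it n times.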
FireDock server
• Fast refinement and rescoring of rigid-body docking solutions
• Protein-protein docking
RosettaDock
protein-protein docking server
• Computationally intensive approach that models flexibility
• Multi-start, multi-scale Monte Carlo based algorithm
• Starts with 1000 independent structures; the server returns pictures,
coordinate files and detailed scoring information for the 10 top-scoring models
• The low-resolution phase:
– Random rigid-body perturbations
– Scoring: residue–residue contacts and bumps, knowledge-based terms for residue
environment and residue–residue pair propensities and, for antibody-antigen targets, a score
to favor interactions with antibody complementarity determining regions.
• The high-resolution (all-atom, including hydrogens) phase:
– Smaller rigid-body perturbations, sidechain optimization via rotamer packing and continuous
minimization, and explicit gradient-based minimization of the rigid-body displacement.
– Scoring: the energy is dominated by van der Waals energies, orientation-dependent hydrogen
bonding, implicit Gaussian solvation, side-chain rotamer probabilities and a low-weighted
electrostatics energy.
ZDOCK
HexServer
• To address the main limitations of Cartesian FFT approaches, the ‘Hex’
spherical polar Fourier (SPF) approach uses rotational correlations
and reduces execution times to a matter of minutes
Bold entries in the first column correspond to programmes that can be run on a web
server.
(a) Refined with SMOOTHDOCK.
(b) Uses DOT or ZDOCK as search methods;
(c) Refined with RDOCK
Virtual Screening
• Drug discovery costs are too high: ~$800 million, 8-14 years,
~10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
• Drugs interact with their receptors in a highly specific and
complementary manner.
• Core of the target-based structure-based drug design (SBDD) for
lead generation and optimization.
Lead is a compound that
– shows biological activity,
– is novel, and
– has the potential of being structurally modified for improved bioactivity,
selectivity, and druggability.
Drug, Chemical & Structural Space
• Drug-like: MDDR (MDL Drug Data Report) >147,000 entries, CMC (Comprehensive
Medicinal Chemistry) >8,600 entries
• Non-drug-like: ACD (Available Chemicals Directory) ~3 million entries
• Literature and databases: Beilstein (>8 million compounds), CAS & SciFinder
• CSD (Cambridge Structural Database, www.ccdc.cam.ac.uk): ~3 million X-ray crystal
structures for >264,000 different compounds and >128,00 organic structures
• Available compounds
– Available without exclusivity: various vendors (& ACD)
– Available with limited exclusivity: Maybridge, Array, ChemDiv, WuXi Pharma,
ChemExplorer, etc.
• Corporate databases: a few million compounds in large pharma companies
Docking to Nucleic Acid Targets
• RNA and DNA as potential drug targets
– Ribosome RNA structures (Agalarov et al. 2000; Ban et al. 2000; Filikov et al.
2000; Nissen et al. 2000; Wimberly et al. 2000)
• Highly charged environments, well-defined binding pocket
• DOCK-identified compounds selectively bind to RNA duplexes or DNA
quadruplexes (Chen et al. 1996; Chen et al. 1997). The portions of the DOCK
suite that calculate electrostatics, including solvation, partial charges, and
the scoring function, were recently optimized for RNA targets (Downing et al.
2003; Kang et al. 2004).
• A MC minimization and an empirical scoring function which accounts for
solvation, isomerization free energy, and changes in conformational
entropy were used to rank compounds (Hermann & Westhof 1999).