Calculating Descriptors

Projection of compounds into descriptor space
Calculating Descriptors
Open BindingDB Thrombin Inhibitors file: THR_BDB.mdb
Compute\Descriptors:
Choose “2D” as descriptor Class
Enter “BCUT” as Filter, select and calculate all BCUT descriptors
Principal Component Analysis
Compute\Principal Components
Select all BCUT descriptors
Set Minimum Variance to 95%
Click on “Report”, then OK
New fields are created: PCA1 – PCA6
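The Minimum Variance setting can be read as a cutoff on cumulative explained variance. The sketch below is only an illustration of that idea, not MOE's implementation: it assumes the PCA eigenvalues are already available, and the function name and the example eigenvalues are hypothetical.

```python
def components_needed(eigenvalues, min_variance=0.95):
    """Return the smallest number of leading components whose cumulative
    share of the total variance reaches min_variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= min_variance:
            return count
    return len(ordered)

# Hypothetical eigenvalues, chosen only for illustration
example_eigenvalues = [5.0, 2.0, 1.0, 0.8, 0.5, 0.3, 0.2, 0.2]
n_components = components_needed(example_eigenvalues, 0.95)
```

With these made-up values the first six components are kept, because five components cover 93% of the variance and six cover 96%.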
Showing a 3D Plot of first three PCs
Select Fields PCA1-3
Compute\Analysis\3D Plot
Activity: Thrombin_nM
Click “Plot”, look at the plot in the main MOE window
Click “Close”
Descriptor based compound partitioning
Partitioning using principal components
MOE uses a partitioning scheme in order to assign cluster codes to molecules.
Compute\Cluster Codes\Descriptor Based
We have already done PCA
Uncheck “De-correlate Descriptors”
“Equiprobable subdivisions” creates equally populated partitions.
Select PCA1-3
Set Code Count to 3. Thus, each axis will be divided into 3 parts, leading to 27
(3 x 3 x 3) possible cluster codes
Click “OK”
Select Field “$CLUSTER”
Compute\Sort\Select Unique Entries $CLUSTER
This will select 27 entries
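As a rough illustration of equiprobable partitioning, the sketch below bins each axis by rank so that the bins are as equally populated as possible, then combines the per-axis bin indices into one code. The integer encoding and the function names are assumptions made for this example; the format of MOE's $CLUSTER codes is not described here.

```python
def equiprobable_bins(values, parts=3):
    """Assign each value a bin index 0..parts-1 by rank, so that the bins
    are as equally populated as possible (ties are split arbitrarily)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * parts // len(values)
    return bins

def cluster_codes(columns, parts=3):
    """Combine the per-axis bin indices of each entry into one integer code.
    With 3 axes and 3 parts there are 3 x 3 x 3 = 27 possible codes."""
    per_axis = [equiprobable_bins(col, parts) for col in columns]
    codes = []
    for row in zip(*per_axis):
        code = 0
        for b in row:
            code = code * parts + b
        codes.append(code)
    return codes

# Hypothetical principal component values for six entries
pca1 = [0.1, 5.0, 2.0, 9.0, 3.0, 7.0]
pca2 = [4.0, 1.0, 6.0, 2.0, 8.0, 3.0]
pca3 = [7.0, 7.5, 0.5, 1.5, 3.5, 9.5]
codes = cluster_codes([pca1, pca2, pca3], parts=3)
```

Selecting one entry per distinct code corresponds to the Select Unique step above.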
Visualize clusters:
Compute\Analysis\3D Plot
X,Y,Z: PCA1-3, Activity: $CLUSTER, Threshold: Unique
Select Unique $CLUSTER (as described above)
Entry\Hide unselected
Look at the distribution in the main window
Render\Ball and Stick
Entry\Show all
Calculation of a diverse subset
Diverse Subset
Diverse subsets are calculated based on Euclidean distances for each cluster. To
determine which unranked entry is farthest from all already-ranked entries, the
distance between each unranked entry and each ranked entry is calculated. For each
unranked entry, the minimum of its distances to the ranked entries is found. The entry
with the largest such minimum distance is the farthest and is ranked next.
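The ranking rule described above can be sketched as follows. This is a minimal illustration in plain Python, not MOE's code; how the first entry is chosen is not stated here, so the sketch simply assumes a caller-supplied starting index.

```python
import math

def rank_by_diversity(points, first=0):
    """Return entry indices ordered so that each next entry has the largest
    minimum Euclidean distance to all entries ranked so far."""
    ranked = [first]
    # min_dist[i] is the distance from entry i to its nearest ranked entry
    min_dist = [math.dist(p, points[first]) for p in points]
    remaining = [i for i in range(len(points)) if i != first]
    while remaining:
        nxt = max(remaining, key=lambda i: min_dist[i])
        ranked.append(nxt)
        remaining.remove(nxt)
        # Only distances to the newly ranked entry can lower the minimum
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return ranked

# Hypothetical descriptor coordinates for four entries
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
order = rank_by_diversity(points)
```

The position of an entry in the returned order plays the role of a diversity priority: taking the first two positions within each cluster is analogous to the $DIVPRIO <= 2 selection used below.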
Compute\Diverse Subset\Method: Descriptors
Select PCA1-6
Output limit: 0 (no limit)
This will calculate a $DIVPRIO field.
Entry\Select
$DIVPRIO <= 2
(select two diverse representatives of each cluster)
This will select 54 compounds representing the structural BCUT space covered by the
whole thrombin inhibitor set.
Select the fields “mol” and “Thrombin_nM”
File\Export
MOE molecular database (mdb)
Selected Fields only
Selected entries only
Export to THR_BDB_BCUT_subset.mdb
We now have a diverse reference set of active thrombin inhibitors.
Molecular Fingerprints
Calculating Fingerprints
Open the Thrombin reference set.
Compute\Fingerprints
FP:MACCS (keyed fingerprint, 166 keys)
FP:GpiDAPH3 (2D Pharmacophore)
Fingerprint Model
A fingerprint model saves information about the fingerprint, the similarity metric and
the search strategy to use. Create two separate fingerprint models for the MACCS and
the GpiDAPH3 fingerprints.
File\Fingerprint Model
Score: Maximum (corresponds to nearest neighbor, meaningful since we have
a diverse representative set)
Save the two fingerprint models under THR_MACCS.fpt and THR_Gpi.fpt
Multiple ligand similarity searching
The two fingerprint models can be used in order to sort database compounds
according to fingerprint Tanimoto similarity.
Open Thrombin screening data: THR_screen.mdb
Apply the two models:
Compute\Model-Evaluate
Model File: your .fpt file
Field: $PRED_MACCS or $PRED_Gpi
Evaluate the performance of each model:
Sort descending by $PRED (Compute\Sort)
Select first 100 entries
Entry\Select\“and”\Active = 1
This gives the hit rate for each of the models.
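To make the scoring and evaluation concrete, here is a small sketch assuming fingerprints are represented as sets of on-bit indices. The function names and the toy data are hypothetical and this is not the MOE model file format; it only illustrates Tanimoto similarity, the Maximum score over a reference set, and the hit rate in the top-ranked entries.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def max_similarity(query, references):
    """Score a compound by its most similar reference (nearest neighbor)."""
    return max(tanimoto(query, ref) for ref in references)

def hit_rate(scores, actives, top_n):
    """Fraction of actives among the top_n highest-scoring entries."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:top_n]
    return sum(actives[i] for i in top) / len(top)

# Toy data: two reference fingerprints and four screening compounds
references = [{1, 2, 3}, {7, 8}]
screen = [{1, 2, 3}, {4, 5}, {2, 3, 4}, {7, 9}]
actives = [1, 0, 0, 1]  # 1 = active, 0 = inactive

scores = [max_similarity(fp, references) for fp in screen]
rate = hit_rate(scores, actives, top_n=2)
```

In the tutorial the same idea is applied with the top 100 entries of each $PRED field.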
Focused library design using RECAP
RECAP Analysis
RECAP generates fragments by breaking bonds that are formed by common chemical
reactions.
Perform RECAP Analysis on the Thrombin screening dataset:
Compute\RECAP\Analysis
Activity Field: Active = 1
The Output Database, upon completion of the analysis, will contain the following
fields:
Field         Description
mol           A molecule field containing a 2D depiction [Clark 2006] of each RECAP
              fragment discovered in the analysis.
name          A character field containing the extended SMILES name of each RECAP
              fragment discovered in the analysis. For example, a methoxy fragment
              that was part of an ether group would have the name [OH;ether]C.
apo           The number of attachment points in each RECAP fragment.
freq          The frequency of occurrence of each RECAP fragment.
freq0, freq1  If activity information was specified then these fields will hold the
              frequency of occurrence of each RECAP fragment in inactives (freq0)
              and actives (freq1). The frequency field freq is the sum of freq0 and
              freq1.
freqA         If activity information was specified then freqA will contain an
              activity adjusted frequency value. This value is proportional to the
              joint likelihood Pr(active,fragment) which is calculated as
              Frequency(fragment) * Pr(active | fragment).
This database can be used to generate a focused library:
Compute\RECAP\Synthesis
Output: THR_recapsynth.mdb
Model File: THR_Gpi.fpt, Tc > 0.5
Database1: THR_screen_recap.mdb (created with analysis)
Fragment Weight: freqA
Database Weight: 2
Database2: $MOE/lib/recaplib.mdb
Database weight: 1
The output database will contain the following fields:
Field         Description
mol           A molecule field containing a 2D depiction of each synthesized
              molecule that satisfies the filters and model threshold.
lead          A flag that is 1 if the molecule is lead-like according to the Oprea
              test.
drug          A flag that is 1 if the molecule is drug-like according to the
              Lipinski test.
rings         The number of rings in the molecule.
don+acc       The number of hydrogen bond donors and acceptors in the molecule.
MW            The molecular weight of the molecule.
logS          An estimate of the aqueous solubility of the molecule [Hou 2004].
logP          An estimate of the octanol/water partition coefficient of the
              molecule [Wildman 1999].
reactive      A flag that is 1 if the molecule contains a common reactive group.
model         The value of the predictive model specified in the Model Field.
The database is redundant. Calculate SMILES and select unique SMILES:
Compute\Molecule Names\SMILES
Compute\Sort\Select Unique\SMILES
Entry\Invert Selection
Entry\Delete selected
Exercise:
Cluster the database using BCUT descriptors and select a diverse subset of < 100
molecules based on MACCS structural keys
Combinatorial Library Enumeration
Combinatorial library enumeration in MOE takes an .mdb file with scaffolds and
several .mdb files with R groups as input and enumerates all molecules by attaching R
groups to predefined attachment points in the scaffold.
The MOE tutorial can be found in the moe directory at moe/html/combi/cdgen.htm
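The enumeration idea itself is a Cartesian product over the R-group lists. The sketch below is a toy illustration using string substitution on a SMILES-like template. The scaffold, the R groups and the {R1}/{R2} placeholder syntax are assumptions made for this example and are unrelated to how MOE stores attachment points in .mdb files.

```python
from itertools import product

def enumerate_library(scaffold, r_groups):
    """Build every product by filling each placeholder in the scaffold
    template with every combination of R groups."""
    names = list(r_groups)
    library = []
    for combo in product(*(r_groups[name] for name in names)):
        library.append(scaffold.format(**dict(zip(names, combo))))
    return library

# Hypothetical scaffold with two attachment points and small R-group lists
scaffold = "c1cc({R1})ccc1{R2}"
r_groups = {"R1": ["C", "O"], "R2": ["N", "Cl", "F"]}
library = enumerate_library(scaffold, r_groups)
```

The library size is the product of the R-group list lengths (here 2 x 3 = 6), which is why full enumeration quickly becomes expensive for real libraries.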
Diverse Combinatorial Library
Instead of first enumerating the library and then running a diverse subset calculation,
the combinatorial space can be sampled directly in order to assemble a diverse subset.
MOE accomplishes this in two major steps:
1. Enumeration of a small subset (default is 100 molecules) in order to calculate
principal component space.
2. Random sampling of combinatorial space and simulated annealing in order to
create a diverse subset. For a subset of 50 compounds, the procedure is:
a. Initially, select 50 random molecules from combinatorial space.
b. Iteration: exchange a random set molecule for a random new molecule
from combinatorial space.
c. Calculate the “energy” of the system as the average entropy over all
principal components.
d. Accept the change based on a Metropolis Monte Carlo criterion (is the
new energy smaller than a function of the current energy, temperature,
and a random number?)
The simulated annealing procedure helps to avoid local minima because, in the
beginning, higher-energy states can be accepted. As the temperature is successively
lowered, the system is increasingly restricted to accepting new states only if their
energy is lower than the current energy. The entropy-based energy term rewards
homogeneous coverage of principal component space. This yields a diverse subset
without having to enumerate the entire combinatorial library.
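The annealing loop can be sketched in a few lines. This is an illustration under stated assumptions, not MOE's algorithm: candidates are taken from a pre-computed pool of points in principal component space rather than sampled from combinatorial space, the energy is taken as the negative mean Shannon entropy of per-axis histograms (so that lower energy means more even coverage), and the bin count and the geometric cooling schedule are arbitrary choices.

```python
import math
import random

def axis_entropy(values, lo, hi, bins=5):
    """Shannon entropy of a histogram of values over [lo, hi] (needs hi > lo)."""
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[k] += 1
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts if c)

def energy(subset, pool, bounds):
    """Assumed energy: negative mean entropy over all axes (lower is better)."""
    total = 0.0
    for d, (lo, hi) in enumerate(bounds):
        total += axis_entropy([pool[i][d] for i in subset], lo, hi)
    return -total / len(bounds)

def anneal(pool, size, steps=2000, t_start=1.0, t_end=0.001, seed=0):
    """Pick `size` pool indices by simulated annealing on the energy above."""
    rng = random.Random(seed)
    dims = len(pool[0])
    bounds = [(min(p[d] for p in pool), max(p[d] for p in pool))
              for d in range(dims)]
    subset = rng.sample(range(len(pool)), size)   # step a: random start
    current = energy(subset, pool, bounds)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # cooling
        trial = list(subset)
        outside = [i for i in range(len(pool)) if i not in trial]
        trial[rng.randrange(size)] = rng.choice(outside)      # step b: swap
        new = energy(trial, pool, bounds)                     # step c
        # step d: Metropolis criterion; uphill moves get rarer as temp drops
        if new <= current or rng.random() < math.exp((current - new) / temp):
            subset, current = trial, new
    return subset, current

# Hypothetical pool of 200 points in a 2D principal component space
_rng = random.Random(1)
pool = [(_rng.random(), _rng.random()) for _ in range(200)]
subset, final_energy = anneal(pool, size=10, steps=200)
```

Because only one member is exchanged per iteration, the subset always keeps its size and never contains duplicates.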
The tutorial can be found in the moe directory at moe/html/quasar/qcombi.htm
SAR Viewer (new in MOE 2008.10)
Open cyp450.mdb
Compute\SAReport
Select pot_uM as the potency field, a destination folder, and generate the report
The report can be viewed in a web browser.