Projection of compounds into descriptor space

Calculating Descriptors
Open the BindingDB thrombin inhibitors file: THR_BDB.mdb
Compute\Descriptors: choose “2D” as descriptor class
Enter “BCUT” as filter, then select and calculate all BCUT descriptors

Principal Component Analysis
Compute\Principal Components
Select all BCUT descriptors
Set Minimum Variance to 95%
Click “Report”, then OK
New fields are created: PCA1 to PCA6

Showing a 3D plot of the first three PCs
Select fields PCA1-3
Compute\Analysis\3D Plot
Activity: Thrombin_nM
Click “Plot” and look at the plot in the main MOE window
Click “Close”

Descriptor based compound partitioning

Partitioning using principal components
MOE uses a partitioning scheme to assign cluster codes to molecules.
Compute\Cluster Codes\Descriptor Based
Uncheck “De-correlate Descriptors” (we have already done PCA)
“Equiprobable subdivisions” creates equally populated partitions
Select PCA1-3
Set Code Count to 3. Each axis is then divided into 3 parts, giving 27 (3 x 3 x 3) possible cluster codes
Click “OK”
Select field “$CLUSTER”
Compute\Sort\Select Unique Entries: $CLUSTER
This will select 27 entries

Visualize clusters:
Compute\Analysis\3D Plot
X, Y, Z: PCA1-3, Activity: $CLUSTER, Threshold: Unique
Select unique $CLUSTER (as described above)
Entry\Hide unselected
Look at the distribution in the main window
Render\Ball and Stick
Entry\Show all

Calculation of a diverse subset

Diverse Subset
Diverse subsets are calculated based on Euclidean distances for each cluster. To determine which unranked entry is farthest from all already-ranked entries, the distance between each unranked entry and each ranked entry is calculated. For each unranked entry, the minimum of its distances to the ranked entries is found. The entry with the largest such minimum distance is the farthest.
Compute\Diverse Subset
Method: Descriptors
Select PCA1-6
Output limit: 0 (no limit)
This will calculate a $DIVPRIO field.
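The farthest-entry rule described for the diverse subset can be illustrated with a small sketch. This is not MOE code: the function name `rank_diverse`, the choice of the first entry as seed, and the tie-breaking rule are assumptions made for illustration, and MOE's actual implementation may differ in these details.

```python
import math

def rank_diverse(points, first=0):
    """Return indices of `points` in max-min diversity order.

    points: list of equal-length coordinate tuples (e.g. PCA scores).
    first:  index of the seed entry (an assumption; the tutorial does not
            say how the first ranked entry is chosen).
    """
    n = len(points)
    if n == 0:
        return []
    ranked = [first]
    # min_dist[i] = Euclidean distance from entry i to its nearest ranked entry
    min_dist = [math.dist(p, points[first]) for p in points]
    remaining = set(range(n)) - {first}
    while remaining:
        # The farthest entry has the largest minimum distance;
        # ties go to the lowest index (illustrative choice).
        nxt = max(remaining, key=lambda i: (min_dist[i], -i))
        ranked.append(nxt)
        remaining.remove(nxt)
        # Only distances to the newly ranked entry can lower the minima.
        for i in remaining:
            d = math.dist(points[i], points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return ranked

# Four entries on a line: the far end is picked first, then the middle.
order = rank_diverse([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)])
print(order)
```

The position of an entry in the returned list plays a role comparable to a priority rank. To mimic a ranking within each cluster, the function would be called once per group of entries sharing a cluster code.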
Entry\Select $DIVPRIO <= 2 (selects two diverse representatives of each cluster)
This will select 54 compounds representing the structural BCUT space covered by the whole thrombin inhibitor set.
Select the fields “mol” and “Thrombin_nM”
File\Export: MOE molecular database (mdb)
Selected fields only
Selected entries only
Export to THR_BDB_BCUT_subset.mdb
Now we have a diverse reference set of active thrombin inhibitors.

Molecular Fingerprints

Calculating Fingerprints
Open the thrombin reference set.
Compute\Fingerprints
FP:MACCS (keyed fingerprint, 166 keys)
FP:GpiDAPH3 (2D pharmacophore)

Fingerprint Model
A fingerprint model saves information about the fingerprint, the similarity metric, and the search strategy to use.
Create two separate fingerprint models for the MACCS and the GpiDAPH3 fingerprints.
File\Fingerprint Model
Score: Maximum (corresponds to nearest neighbor; meaningful since we have a diverse representative set)
Save the two fingerprint models as THR_MACCS.fpt and THR_Gpi.fpt

Multiple ligand similarity searching
The two fingerprint models can be used to sort database compounds by fingerprint Tanimoto similarity.
Open the thrombin screening data: THR_screen.mdb
Apply the two models:
Compute\Model-Evaluate
Model File: your .fpt file
Field: $PRED_MACCS or $PRED_Gpi
Evaluate the performance of each model:
Sort descending by $PRED (Compute\Sort)
Select the first 100 entries
Entry\Select\“and”\Active = 1
The number of entries that remain selected, out of the top 100, gives the hit rate for each model.

Focused library design using RECAP

RECAP Analysis
RECAP generates fragments by breaking bonds that are formed by common chemical reactions.
Perform a RECAP analysis on the thrombin screening dataset:
Compute\RECAP\Analysis
Activity Field: Active = 1
Upon completion of the analysis, the output database will contain the following fields:
Field: Description
mol: A molecule field containing a 2D depiction [Clark 2006] of each RECAP fragment discovered in the analysis.
name: A character field containing the extended SMILES name of each RECAP fragment discovered in the analysis. For example, a methoxy fragment that was part of an ether group would have the name [OH;ether]C.
apo: The number of attachment points in each RECAP fragment.
freq: The frequency of occurrence of each RECAP fragment.
freq0, freq1: If activity information was specified, these fields hold the frequency of occurrence of each RECAP fragment in inactives (freq0) and actives (freq1). The frequency field freq is the sum of freq0 and freq1.
freqA: If activity information was specified, freqA contains an activity-adjusted frequency value. This value is proportional to the joint likelihood Pr(active, fragment), which is calculated as Frequency(fragment) * Pr(active | fragment).

This database can be used to generate a focused library:
Compute\RECAP\Synthesis
Output: THR_recapsynth.mdb
Model File: THR_Gpi.fpt, Tc > 0.5
Database1: THR_screen_recap.mdb (created with the analysis)
Fragment Weight: freqA
Database Weight: 2
Database2: $MOE/lib/recaplib.mdb
Database Weight: 1

The output database will contain the following fields:
Field: Description
mol: A molecule field containing a 2D depiction of each synthesized molecule that satisfies the filters and model threshold.
lead: A flag that is 1 if the molecule is lead-like according to the Oprea test.
drug: A flag that is 1 if the molecule is drug-like according to the Lipinski test.
rings: The number of rings in the molecule.
don+acc: The number of hydrogen bond donors and acceptors in the molecule.
MW: The molecular weight of the molecule.
logS: An estimate of the aqueous solubility of the molecule [Hou 2004].
logP: An estimate of the octanol/water partition coefficient of the molecule [Wildman 1999].
reactive: A flag that is 1 if the molecule contains a common reactive group.
model: The value of the predictive model specified in the Model Field.

The database is redundant.
Calculate SMILES and select unique SMILES:
Compute\Molecule Names\SMILES
Compute\Sort\Select Unique\SMILES
Entry\Invert Selection
Entry\Delete selected

Exercise: Cluster the database using BCUT descriptors and select a diverse subset of fewer than 100 molecules based on MACCS structural keys.

Combinatorial Library Enumeration
Combinatorial library enumeration in MOE takes an .mdb file with scaffolds and several .mdb files with R groups as input and enumerates all molecules by attaching R groups to predefined attachment points in the scaffold.
The MOE tutorial can be found in the moe directory at moe/html/combi/cdgen.htm

Diverse Combinatorial Library
Instead of first enumerating the library and then running a diverse subset calculation, the combinatorial space can be sampled directly in order to assemble a diverse subset.
MOE accomplishes this in two major steps:
1. Enumeration of a small subset (default is 100 molecules) in order to calculate the principal component space.
2. Random sampling of combinatorial space and simulated annealing in order to create a diverse subset. For a subset of 50 compounds, the procedure is:
   a. Initially, select 50 random molecules from combinatorial space.
   b. Iteration: exchange a random molecule of the set for a random new molecule from combinatorial space.
   c. Calculate the “energy” of the system as the average entropy over all principal components.
   d. Accept the change based on a Metropolis Monte Carlo criterion (is the new energy smaller than a function of the current energy, temperature, and a random number?).
Simulated annealing helps avoid local minima because, in the beginning, higher-energy states are allowed. As the temperature is successively lowered, the system is restricted to accepting new energies only if they are lower than the current energy. The entropy-based energy term accounts for homogeneous coverage of principal component space.
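Steps a to d can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MOE's implementation: the `sample` callback (which stands in for drawing one random molecule from combinatorial space and returning its principal component coordinates), the histogram binning, the coordinate range, the geometric cooling schedule, and the sign convention (energy as the negative mean entropy, so that lower energy means more even coverage) are all hypothetical choices.

```python
import math
import random

def energy(subset, n_bins=5, lo=-1.0, hi=1.0):
    """Negative mean Shannon entropy of the bin occupancy along each axis.

    Assumption: binning, range and sign convention are illustrative only.
    """
    n_axes = len(subset[0])
    total_entropy = 0.0
    for axis in range(n_axes):
        counts = [0] * n_bins
        for p in subset:
            frac = (p[axis] - lo) / (hi - lo)
            b = min(n_bins - 1, max(0, int(frac * n_bins)))  # clamp to range
            counts[b] += 1
        for c in counts:
            if c:
                q = c / len(subset)
                total_entropy -= q * math.log(q)
    return -total_entropy / n_axes

def anneal_subset(sample, size=50, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over subsets; `sample(rng)` is a hypothetical callback."""
    rng = random.Random(seed)
    subset = [sample(rng) for _ in range(size)]        # step a
    e_cur = energy(subset)
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumption).
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.randrange(size)                        # step b
        old = subset[i]
        subset[i] = sample(rng)
        e_new = energy(subset)                         # step c
        # Step d: Metropolis criterion. Always accept improvements,
        # accept worse states with probability exp(-(e_new - e_cur) / t).
        if e_new <= e_cur or rng.random() < math.exp((e_cur - e_new) / t):
            e_cur = e_new
        else:
            subset[i] = old                            # reject: undo the swap
    return subset, e_cur

def uniform_2d(rng):
    # Hypothetical stand-in for sampling combinatorial space.
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))

chosen, final_energy = anneal_subset(uniform_2d, size=20, steps=300)
```

At high temperature the exponential term is close to 1 for small energy increases, so worse subsets are often accepted; as the temperature drops, the acceptance probability for worse subsets approaches zero.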
This approach yields a diverse subset without the need to enumerate the entire combinatorial library.
The tutorial can be found in the moe directory at moe/html/quasar/qcombi.htm

SAR Viewer (new in MOE 2008.10)
Open cyp450.mdb
Compute\SAReport
Select pot_uM as the potency field, choose a destination folder, and generate the report
The report can be viewed in a web browser.