Schrödinger Workshop 2013
Structure Based Virtual Screening - Various Approaches
Jas Bhachoo, Senior Applications Scientist, Schrödinger

Your Files for Today
• 4 main directories
• Open the latest *.prjzip file for pre-generated results
  – E.g. /Ligand_Preparation
• Raw files for import are also in each directory

Structure Based Virtual Screening
• Virtual screening is a cost-effective early-stage lead generation method
• How do we design a successful structure based virtual screening campaign?

The road to success: What we have found
Based on the experience of our Drug Design Team, who are dedicated to working with customers on new projects:
• Careful protein preparation – Know your target
• Careful ligand preparation – Enumerate states and conformations
• Pilot screening – Know the best combination of constraints and scores
• Screening
• Post-screening processing
• Purchase and assay

Questions we will be Asking Today
• Do we have a crystal structure or a homology model?
• Is the target druggable?
• What is the quality of the model (electron density)?
• Understanding the binding site: is it big, small or buried, and what are its general properties?
• Are there known ligands for this target, and how can we use that information?
• How do we screen compounds?
• How do we improve the quality of screened outputs?
• How do we filter and cluster the output of a screen down to manageable numbers for synthesis or purchase?

Putting everything together – Tailored Protocols for XYZ Protein
• Databases
  1. CACDB2010 lead/drug-like set.
  2. Phase mining, with multiple hypotheses, of the CACDB Phase database using ABCDE as queries (shape similarity > 0.6 or top 0.5%).*
  3. Fingerprint-based similarity search of the whole CACDB using HTS hits as queries (Tanimoto >= 0.6 or top 0.5%).*
• ABCDE pocket of XYZ
  1. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) and/or Res122 (-NH) on chain A.
  2. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) and/or Res122 (-NH) on chain B.
  3. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) on both chain A and B.
• Glide HTVS (25% top scoring)
  – Two conformations of ligands for SP docking
  – Glide shows dependency on input conformations
• Glide SP* (15% top scoring)
  – Three conformations of ligands for XP docking
  – ConfGen/MM/multiple FFs
• Glide XP* (post-processing of ensemble)
  1. Mine for poses with desirable interactions, i.e., hydrogen bonding.
  2. Rescore with Epik state and strain penalties.
  3. Take the top scoring 5000 to the next step for visualization.
  4. Select molecules for sourcing with the help of clustering tools.
*: Combined structures taken directly to XP

Target structure preparation, prediction and characterization

Problems with PDB structures
• XP GlideScore = -6.9 kcal/mol
  Lys58 rotamer: χ1 = -73.5, χ2 = +179.2, χ3 = +154.2, χ4 = +85.4 (extremely rare Lys rotamer)
• Crystallographic refinement (PrimeX): XP GlideScore = -8.7 kcal/mol
  Lys58 rotamer: χ1 = -69.8, χ2 = -178.4, χ3 = +178.2, χ4 = -179.9 (most common Lys rotamer)

Challenges in homology modeling
• Accurate alignment
  – High sequence identity is straightforward and can produce high quality structures
  – Low sequence identity requires assistance and manual editing
  – Depends on human intervention, experimental data
• Model refinement
  – Side chain conformation
  – Loop conformation
  – Binding site conformation
  – Depends strongly on the quality of the force field

High Resolution Protein Structure Prediction: Comparative Modeling
[Workflow diagram. Stages: Query sequence, Blast/PSI-Blast, Template(s), Align, Query/Template alignment, Build, Homology model, Refine, Refined protein structure, Validate, Protein Report. Alignment inputs: PSI-Blast profile (PSSM), query secondary structure predictions, multiple structure alignment profile.]

Alignments of GPCRs: Example
• Sequence alignment between human β2-adrenergic and human melanin-concentrating hormone-1 sequences using the ‘Align GPCR’ mode
  – ‘Align GPCR’ mode correctly aligned all helices, including the challenging helix 5, without any user intervention.
    All gaps in the alignment are located in the intracellular and extracellular loop regions, and not in TM regions.

Representation of loops - Variable Dielectric Constants
\[ G_{es} = \frac{1}{2}\sum_{i<j}\frac{q_i q_j}{r_{ij}\,\varepsilon_{in(ij)}} - \frac{1}{2}\left(\frac{1}{\varepsilon_{in(ij)}} - \frac{1}{\varepsilon_{sol}}\right)\sum_{ij}\frac{q_i q_j}{f_{GB}} \]
\[ \varepsilon_{in(ij)} = \max\left(\varepsilon_{in(i)}, \varepsilon_{in(j)}\right) \]
[Figure: protein regions with internal dielectric ε = 1, ε = 2 and ε = 4, surrounded by continuum solvent with ε = 80]
Reparametrization of internal dielectric constants for charged side chains:
Lys = 4, Glu = 3, Asp = 2, Arg = 2, His = 2, Others = 1

Improvement of solvation model impacts on loop prediction accuracy
RMSD (Å)
Loop length   Number of cases   Uniform dielectric   Variable dielectric   Uniform + hydrophobic   Variable + hydrophobic
6 residue     99                0.48                 0.40                  0.46                    0.41
8 residue     65                0.84                 0.79                  0.76                    0.74
10 residue    41                1.27                 0.73                  1.05                    0.76
13 residue    35                2.73                 1.62                  1.29                    1.08

Tips and tricks
• Induced fit docking to generate bioactive conformations
• Molecular dynamics simulations to measure the stability of your protein model
• Prime side chain predictions, MacroModel side chain conformation search and hand tweaks if needed

Screening 1
• Ligand Based Methods

Putting everything together – Tailored Protocols for XYZ Protein
• Databases
  1. Screening_Inital_Data.SDF
  2. Canvas > Filter using Properties > Filtered Set for 3D Screeing.mae
  3. Convert to 3D with chemical enumeration
  4. Prepare the protein: chemically accurate and minimised
  5. Characterise the protein
  6. Dock the crystal-bound ligand as an initial test
  7. Dock the 3D prepared ligand dataset
• ABCDE pocket of XYZ
  1. Create an e-pharmacophore and use it for ligand based screening of FXA_db
  2. Shape based searching of FX_db using the crystal ligand
• Glide HTVS (25% top scoring)
• Glide SP* (15% top scoring)
• Glide XP* (post-processing of ensemble)
  1. Mine for poses with desirable interactions, i.e., hydrogen bonding.
  2. Rescore with Epik state and strain penalties.
  3. Take the top scoring 5000 to the next step for visualization.
  4. Select molecules for sourcing with the help of clustering tools.

Ligand preparation
2D Perspective
• Smart filtering of your screening deck for optimal druglike and leadlike properties (REOS, Ligparse, Ligfilter)
3D Perspective
• Chemically accurate ligand structures (tautomers, ionization states, stereoisomers...)
• Multiple input ligand conformations – ConfGen and MacroModel
• Estimate state penalties

Fingerprints in Canvas
• Fingerprints are defined by the fragments used
• The rules for making fragments differ: how you define the path through the molecule (linearly or via torsions), or through pre-defined libraries like MACCS
• Available fingerprints (the slide orders them along a "more specific" axis): MACCS and Custom, Torsion, Pairwise, Triplet, Linear (default, ‘Daylight’), Dendritic, Molprint2D, Radial (‘Circular SciTegic’)
• MACCS & Custom are fragment based; the others are based on topology (exhaustive)

High Performance Chemical Spreadsheet
• Structures/properties retrieved from SQLite database as needed
• Scroll smoothly through 10^6 compounds, hundreds of columns
• Create custom views from sorting, filtering, and chart selections

Fingerprints in Canvas
• Linear • Dendritic • Radial • MOLPRINT2D • Pairwise • Triplet • Torsion • MACCS keys

Exercise 1
Fast Screening Using 2D Approaches
• (.../Cheminformatics) Create a Canvas project and import ’FXA_all_initial_data.sdf’ / *ligprep.out
  – Note you may want to start from different points
    • 2D filtering > 3D preparation > 2D filtering ...
    • 3D preparation > Shape filtering > 2D filtering ...
• Generate molecular properties
  – Applications -> Molecular properties
• Incorporate the results
• Filter by properties using
  – Data -> Property Filter
  – Scatter plots
• Similarity searches if you have data on known ligands
• Cluster the data with different clustering methods

Path dependent fingerprint methods
• Linear fingerprint
  – encodes all linear paths up to 7 bonds
  – encodes up to 14 bonds for ring closures
• Dendritic fingerprint
  – encodes branches up to 5 bonds

Circular fingerprint methods
• Radial fingerprint (Extended Connectivity fingerprint)
  – generated by fragmenting a structure into pieces that grow radially from each heavy atom over a series of iterations (4 by default)
  – each atom is identified by its atom type and connecting bond types
• MOLPRINT2D fingerprint
  – each heavy atom in a structure is characterized by an environment that consists of all other heavy atoms within a distance of two bonds

Pairwise, Triplet and Torsion fingerprints
• Pairwise fingerprint
  – two atom types and the distance separating them: Type_i-Type_j-d_ij
• Triplet fingerprint
  – three atoms and the distances separating them
• Torsion fingerprint
  – every fragment consists of a linear path of four atoms that are differentiated by type

What are the recommendations?
• There is no single best setting for all targets and query molecules.
• Pairwise and Triplet methods exhibit size dependency.
• Without prior knowledge about the performance of fingerprint methods for a target, the best choice can be
  – MOLPRINT2D, Dendritic
  – Fingerprint combination
  – Fingerprint averaging: modal fingerprint
• More specific atom typing schemes (Daylight, Mol2, Carhart) are best but probably less suited for lead hopping
Sastry, M.; Lowrie, J.F.; Dixon, S.L.; Sherman, W., "Large-Scale Systematic Analysis of 2D Fingerprint Methods and Parameters to Improve Virtual Screening Enrichments," J. Chem. Inf.
Model., 2010, 50, 771–784

Exercise 2
Preparing 3D Ligands for 3D Screening
• (.../Ligand preparation) In Maestro, import a simple example of starting ligands
  – Import 2D_variations.sdf
  – Note that the first structure has two ionisable groups, an ammonium counter ion and three chiral centres (two marked)
• Run LigPrep
  – Default options
  – Start, and append the new entries as a new group
• Observe the results in Maestro
  – Tile and label the structures to see them individually
  – For the first structure, note that the carboxylate is unprotonated, the pyridine appears both protonated and unprotonated, and there is a variety of R/S chiralities
• In Maestro, import FXA_ligprep-out.mae
  – Only import the first few ligands, using the Advanced options. We do not need to see the entire file as it is very large.

Conformer Generation
• Accurate and efficient bioactive conformational searching
  – It is important to reproduce bioactive ligand geometries in minimally sized conformer sets, and still obtain accuracy and coverage.
  – Imatinib is shown colored according to how ConfGen treats various parts of the molecule. Rotatable torsions are rendered in yellow, templated rings with green carbon atoms, and rigid rings/bonds in cyan.
http://www.schrodinger.com/productpage/14/26/ ConfGen white paper and recent publications

Screening 2
Structure Based Methods

Background to Crystallographically Obtained PDB Files
• Most protein structures solved by X-ray crystallography have a drawback that becomes apparent as soon as the structure is used for molecular simulations and related applications: the electron density traces the shape of the molecules, but does not really allow hydrogen atoms to be identified or the heavier elements C, N and O to be distinguished. Consequently, ambiguities arise if groups of atoms can be rotated without affecting the overall shape.
• If a molecular simulation is run with incorrectly oriented or protonated side-chains, the protein stability can be reduced significantly; in the worst case the protein may even fall apart. The only way to resolve the issue is to infer the correct orientations and protonation patterns from the chemical environment, most importantly the hydrogen bonding possibilities. Since several of the critical side-chains are often found in close contact, a choice made for one side-chain immediately influences others, giving rise to a hydrogen bonding network that must be optimized in one shot.
http://www.yasara.org/hbondnet.htm
• Typical examples in proteins are the side-chains of asparagine and glutamine, whose terminal amide group can be rotated by 180 degrees with almost no impact on the electron density.
• The same applies to the imidazole ring of histidine, which can additionally adopt three different pH dependent protonation patterns, giving rise to six different states that can hardly be distinguished based on the electron density alone.
• Aspartates and glutamates can adopt three different states (negatively charged, or neutral with the hydrogen on either of the two terminal side-chain oxygens), with the neutral states being important mostly for buried residues with strongly shifted pKa values.

ASN, GLN, ASP, GLU, HIS residues
pKa values of the amino group, the carboxyl group and a few R groups. This information tells us that, in a neutral solution:
• The carboxyl group is most likely negatively charged.
• The amino group is most likely positively charged.
• The R groups of aspartate and glutamate are most likely negatively charged.
• The R groups of lysine and arginine are most likely positively charged.
• The R group of tyrosine at pH = 7 is most likely neutral.
• The R group of histidine has a 10% probability of being positively charged at pH = 7, but the probability increases to 50% at pH = 6. Thus, histidine is very sensitive to pH change in the physiological range.
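The histidine percentages quoted above follow from the Henderson-Hasselbalch relation. The sketch below assumes a side-chain pKa of 6.0, chosen only because it reproduces the 50% at pH 6 stated on the slide; pKa values in a real protein environment can be shifted considerably.

```python
def protonated_fraction(pH, pKa):
    """Fraction of a titratable group that carries its proton (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Assumption: idealised histidine side-chain pKa, matching "50% at pH 6" above.
HIS_SIDE_CHAIN_PKA = 6.0

for pH in (6.0, 7.0, 7.4):
    percent = 100.0 * protonated_fraction(pH, HIS_SIDE_CHAIN_PKA)
    print(f"pH {pH:.1f}: {percent:.1f}% of histidine side chains positively charged")
```

At pH 7 the fraction is 1/11, about 9%, which is where the rounded 10% figure comes from.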
http://www.web-books.com/MoBio/Free/Ch2A4.htm

The Importance Of Protein Preparation
• Almost all protein structures require some sort of remediation before they can be used in drug design.
  – Protonation.
    • Most structures come from X-ray crystallography. As protons don’t show up well in X-ray experiments, they are normally missing from structures and need to be added.
  – Missing side-chains.
    • Any side-chain which is too mobile will not diffract well and will not be visible in the electron density. Simply ignoring this side-chain may not be a good idea, as a ligand may well interact with the side-chain and cause it to adopt a fixed position.
  – Missing loops.
    • Similar to the above situation. However, in this case whole residues can be missing from the final structure.
  – Counterions/random small molecules/waters.
    • The crystallisation media will often contain other counterions and small molecules along with water. These frequently show up in the final structure. Sometimes these species reveal important information (particularly water molecules), but in many cases they need to be removed.
  – Bonding/ionisation/tautomerisation.
    • Crystallography only provides the atomic coordinates of the structure. The bonding information needs to be added. For standard amino acids this is trivial; however, other species such as ligands and cofactors will need to be edited manually. Related to this, the ionisation state and tautomerisation state of any non-standard species present will need to be assigned.
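Before any preparation, it can help to see how many of the issues listed above a raw file actually has. The sketch below is a hypothetical triage helper and not part of any Schrödinger tool. It reads the fixed-column ATOM/HETATM records of the PDB format and counts hydrogens, waters and other hetero groups. `make_record` and the `SAMPLE` lines are illustrative only, with dummy coordinates.

```python
from collections import Counter


def make_record(record, serial, atom, res, chain, seq, element):
    """Build one fixed-column PDB coordinate line (demo helper, dummy coordinates)."""
    return (f"{record:<6}{serial:>5} {atom:<4} {res:>3} {chain}{seq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{20.0:6.2f}"
            f"          {element:>2}")


def triage_pdb(pdb_text):
    """Count hydrogens, waters and other hetero groups in raw PDB text."""
    hydrogens = 0
    waters = 0
    hetero_groups = Counter()
    seen = set()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        res_name = line[17:20].strip()          # columns 18-20
        element = line[76:78].strip().upper()   # columns 77-78 (may be blank in old files)
        if element == "H":
            hydrogens += 1
        if record == "HETATM":
            residue_id = (res_name, line[21], line[22:26].strip())
            if residue_id not in seen:          # count each hetero residue once
                seen.add(residue_id)
                if res_name == "HOH":
                    waters += 1
                else:
                    hetero_groups[res_name] += 1
    return {"hydrogens": hydrogens, "waters": waters,
            "hetero_groups": dict(hetero_groups)}


# Synthetic example: two protein atoms, one ion, two waters, no hydrogens.
SAMPLE = "\n".join([
    make_record("ATOM", 1, "N", "ILE", "A", 16, "N"),
    make_record("ATOM", 2, "CA", "ILE", "A", 16, "C"),
    make_record("HETATM", 3, "CA", "CA", "A", 501, "CA"),
    make_record("HETATM", 4, "O", "HOH", "A", 601, "O"),
    make_record("HETATM", 5, "O", "HOH", "A", 602, "O"),
])

print(triage_pdb(SAMPLE))
```

A hydrogen count of zero is the usual signal that protonation still has to be done; the hetero group list shows which ligands, ions and cofactors need a decision before docking.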
Protein preparation wizard: Prepare and repair PDB structures
The preparation wizard (under Applications >) helps us through this process of:
• Cleaning up raw PDB files
  – Assign bond orders
  – Add hydrogen atoms
  – Delete unwanted parts of the system
  – Optimize the hydrogen bond networks (flips of residues like ASN and GLN; tautomer determination, HIE or HID, or the protonated state HIP ...)
  – Remove putative clashes in your structure (ideally with diffraction data)
• Dealing with missing information
  – Important side-chains are missing
  – Important loops are missing

Exercise 3 (.../Virtual Screening)
Preparing a PDB Structure for Virtual Screening
• Download the 1FJS structure in the Protein Preparation Wizard.
  – Extra: go to EDS (http://eds.bmc.uu.se/eds/) and download the CNS format map (2mFo-DFc) for 1FJS; examine the electron density
  – Notes on electron density: do the residues, ligand, protein and waters sit in the electron density, or is there an anomaly? ...
• Prepare 1FJS
• Set up Glide grids (Applications -> Glide -> Receptor grid generation)

Understanding Sigma values
• The sigma level of an electron density map refers to the standard deviation. After taking a Fourier transform of your data and refining it through a variety of methods to find the phases, you're left with an electron density map ρ(x,y,z), which describes the intensity of the electron density at each point in real space. This reflects the fact that regions in space with a higher electron density will scatter more X-rays, although this is not what you measure directly. Once the mean and the standard deviation (σ) of the intensity across the entire map are calculated, the intensity of every point in ρ(x,y,z) can be described in standard deviation (σ) units away from the mean.
• The "sigma level" is the contour level, or cut-off point, in the intensity of any particular 3D representation of the electron density, expressed in standard deviation units.
  For example, a sigma level of 1 would show density only for values in ρ(x,y,z) which have intensities greater than one standard deviation unit above the mean. Lowering the sigma level will give you a 3D map with more density, but if it's too low you won't be certain that what you're seeing is real electron density and not noise, and it will be harder to fit your molecule into it. Raising the sigma level will reduce the noise, but if you raise it too high you will eliminate some of your real data and see gaps in the density at positions where actual atoms happened to have measured intensities lower than the sigma level. Flexible portions of the protein, even when crystallized, will be the first regions of the electron density to go as the sigma level is raised. Of course, the sigma level just reflects cosmetic changes to what you see on the screen; your full dataset with the intensities for every point is unaffected. Contour levels are just a convenient way of representing an extra dimension of intensity in the three dimensional space.
http://www.quora.com/What-does-the-sigma-level-refer-to-in-electron-density-mapping#

Different Sigma Levels in RNase A
• Here is the electron density for the RNase A unit cell at four different contour levels in standard deviation units above the mean: 0.5σ, 1σ, 2σ, and 3σ.
  – Lowering the sigma level will increase the noise
  – Raising the sigma level will reduce the noise
(visualized in Coot)
http://www.quora.com/What-does-the-sigma-level-refer-to-in-electron-density-mapping#

Characterize the binding pocket - SiteMap
• Early stage analysis tool
  – Summarises key parts of the protein structure
• Find potential binding sites
• Characterize known binding sites
• Is the binding site druggable?
• Potential binding sites characterized by
  – Hydrophobic, hydrophilic, H-bond donor/acceptor isosurfaces, volume etc.
  – Scoring used to determine druggability
  – Site points can be used to define Glide grids (treat as ligand entry)
• Validated for site “druggability”
  – Halgren, T., "New Method for Fast and Accurate Binding-site Identification and Analysis", Chem. Biol. Drug Des., 2007, 69, 146–148.
  – Halgren, T., "Identifying and Characterizing Binding Sites and Assessing Druggability", J. Chem. Inf. Model., 2009, 49, 377–389.

SiteMap Feature Detection
Sites identified by SiteMap can easily be used to set up Glide grids for virtual screening experiments.
[Figure: Thrombin (1ett)]

Druggability Dataset
• Categories: Druggable, Prodrug/transporter, Undruggable
• Targets: ACE-1, Aldose reductase, cAbl kinase, CDK2, Cyclooxygenase 2, DNA gyrase B, EGFR kinase, Enoyl reductase, Factor X, Fungal Cyp51, HIV RT (NNRTI site), HIV-1 Protease, HMG CoA reductase, MDM2, P38 kinase, PDE 4D, PDE 5A, Thrombin, Acetylcholinesterase, Cathepsin K, Thrombin, PTP-1B, Neuraminidase, Caspase 1 (ICE-1), IMPDH, HIV integrase, Penicillin binding protein, HIV RT (nucleotide site)
• Diverse set of pharmaceutically relevant targets
• Widely used in benchmark studies to study the properties of binding sites

SiteMap Druggability Results
[Plot; recoverable labels: "Undruggable", "Difficult". MAPPOD is from Cheng et al.]

Exercise 4 (.../Virtual Screening)
Property Mapping the Crystal Structure
• Run SiteMap for 1FJS
  – Try the ”Evaluate...” tasks.
• Analyse the results. Where are the hydrophobic areas and the polar areas? Is the target druggable?
• SiteMap has been parameterized such that: [rule of thumb shown as a figure on the slide]

SiteMap of 1FJS
• The active site of factor Xa is divided into four sub-pockets: S1, S2, S3 and S4. The S1 sub-pocket determines the major component of selectivity and binding. The S2 sub-pocket is small, shallow and not well defined. It merges with the S4 sub-pocket. The S3 sub-pocket is located on the rim of the S1 pocket and is quite exposed to solvent.
  The S4 sub-pocket has 3 ligand binding domains, namely the “hydrophobic box”, the “cationic hole” and the water site.
• Factor Xa inhibitors generally bind in an L-shaped conformation. One group of the ligand occupies the anionic S1 pocket lined by residues Asp189, Ser195 and Tyr228, and another group of the ligand occupies the aromatic S4 pocket lined by residues Tyr99, Phe174 and Trp215.
• Typically, a fairly rigid linker group bridges these two interaction sites.

Tips and tricks
• Make sure your structure is chemically accurate!
• Molecular dynamics simulations to understand flexibility of the structure
• Molecular dynamics simulations to measure the stability of your protein model
• Use multiple structures, even different chains from the same PDB
• Induced fit docking to generate bioactive conformations
• Prime side chain predictions, MacroModel side chain conformation search and hand tweaks if needed

Pre-screen
• Pilot screens to evaluate the performance of different combinations of constraints (EFs; GlideScores; chemical matter, ‘eyeballing for a motif’).
• Check whether the co-crystallised ligand can be docked with standard docking. If the RMSD to the native pose is large, find out why:
  – Are there any crystal mates?
  – Constraints, multiple protein input conformations (see VSW GUI for Glide), QPLD
• Screen your database!

Glide Overview
• The conformations then enter the ‘Glide Filter’
  – This is a series of hierarchical filters which are used to rapidly eliminate poses of the ligand which cannot correspond to a well-docked solution.
• Protein & ligand preparation
• Calculate Coulomb & vdW grids
• Docking algorithm
• 3 modes:
  – HTVS: 3-5 secs/lig
  – SP: 30-50 secs/lig
  – XP: 3-5 mins/lig
[Funnel diagram: Ligand conformations, Site-point search, Diameter test/Subset test, Greedy score, Refinement, Minimisation, Final score, Top hits. Inset: a ligand divided into a core and side-chain groups, with its center and diameter marked.]

Additional Information: Glide Implementation Details
• The value of GlideScore is determined as follows:
  GlideScore = 0.065 E_coul + 0.130 E_vdW + E_Lipo + E_HBond + E_Metal + P_BuryP + P_RotB + Site
• The P_BuryP is a penalty term for burying polar functionality in a hydrophobic environment.
• The P_RotB is a penalty term for freezing rotatable bonds.
• The Site term rewards polar, but non-hydrogen bonding, interactions in the site.

Glide SP: Enrichment Summary
Average enrichment in recovering actives in top 1% of decoys (average of 65 systems)
[Chart: EF(1%)]
• Glide: A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J. Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K. Perry, D. E. Shaw, P. Francis, and P. S. Shenkin. J. Med. Chem. 2004, 47, 1739-1749
• Glide: A New Approach for Rapid, Accurate Docking and Scoring. 2. Enrichment Factors in Database Screening. T. A. Halgren, R. B. Murphy, R. A. Friesner, H. S. Beard, L. L. Frye, W. T. Pollard, and J. L. Banks. J. Med. Chem. 2004, 47, 1750-1759
• Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein-Ligand Complexes. Friesner, R. A.; Murphy, R. B.; Repasky, M. P.; Frye, L. L.; Greenwood, J. R.; Halgren, T. A.; Sanschagrin, P. C.; Mainz, D. T., J. Med. Chem., 2006, 49, 6177–6196.
• Comparative Performance of Several Flexible Docking Programs and Scoring Functions: Enrichment Studies for a Diverse Set of Pharmaceutically Relevant Targets. Zhou, Z.; Felts, A. K.; Friesner, R. A.; Levy, R. M., J. Chem. Inf.
Model., 2007, 47, 1599–1608.

Exercise 5
Virtual Screening: Docking and Visualising Poses
• Generate the Glide grid for 1fjs
  – Use the fully prepared protein and co-crystallised ligand as the starting point for Glide > Receptor Grid Generation
  – Define the ligand inside the Grid panel
  – Start the job (1-2 minutes)
  – ’1fjs-grid-2013.zip’ is the pre-generated output
• Dock the 1fjs ligand using this grid file
  – In Glide > Ligand Docking, Settings tab > browse for the 1fjs grid file, choose SP mode
  – In the Ligands tab > choose selected entry and ensure the ’1fjs ligand only’ entry is highlighted in the Project Table
  – Start the job
  – ’Selfdock-1fjs-sp-pv.mae’ is the pre-generated output
• Use Maestro to view the result(s)... Overlay the SiteMap result!

VSW Interface for automated Glide docking
• Docking the crystal ligand is the first step towards running a full docking study, which may begin with a few thousand or with millions of molecules.
• The Glide Virtual Screening Workflow is a powerful interface for setting up a series of screens that iteratively run through the various precision levels of docking (HTVS > SP > XP), and the interface is especially useful for including an ensemble of receptor structures.
• Applications > Glide > VSW

Post processing methods: What do I do with 1000s of hits?
• Pose filter based on known interactions (Script menu)
• Filter poses based on pharmacophore (Phase)

More Post-processing
• Filter based on Strain Rescore (Script menu)
• Calculating Prime MM-GBSA or MacroModel Embrace
Du, J.; Sun, H.; Xi, L.; Li, J.; Yang, Y.; Liu, H.; Yao, X., "Molecular modeling study of checkpoint kinase 1 inhibitors by multiple docking strategies and Prime/MM-GBSA calculation," J. Comp. Chem., 2011, 32(13), 2800-2809
Lyne, P. D.; Lamb, M. L.; Saeh, J. C., "Accurate Prediction of the Relative Potencies of Members of a Series of Kinase Inhibitors Using Molecular Docking and MM-GBSA Scoring," J. Med.
Chem., 2006, 49, 4805-4808

Selection of Hits
• If resources are limited, only the most diverse ligands are selected for experimental measurements
  – Upcoming exercise: run this script on the VSW results output (96 - VSW 1FOR results)
• Use fingerprints to cluster hits and choose cluster representatives
  – Scripts > Cheminformatics > Canvas Similarity & Clustering…

Structural Interaction Fingerprints
• Original publication: Chuaqui et al., J. Med. Chem. 48 (2005) 121-133 (Biogen)
• Basic algorithm
  – Begin with pre-docked poses
  – For each ligand, generate a fingerprint based on types of contact with the receptor
    • The bit string is translated into a matrix/plot where the ‘1’s and ‘0’s can be easily interpreted (next slide)
  – Use the fingerprints for filtering, similarity searching, and clustering

Structural Interaction Fingerprints
[Figure: interaction matrix with one row per compound (compound 1, compound 2, compound 3 ... compound n) and one block of bits per residue (Residue 1, Residue 2 ... Residue N), plus a pSIFt row of fractional values between 0 and 1.]
Bits per residue:
  bit1 = contact
  bit2 = backbone
  bit3 = side chain
  bit4 = polar
  bit5 = hydrophobic
  bit6 = HB acceptor
  bit7 = HB donor
  bit8 = aromatic (addition to paper)
  bit9 = charge (addition to paper)

Structural Interaction Fingerprints (Demo or Exercise)
• Calculate fingerprints
• Visualize contacts with the Interaction Matrix

Interactive Analysis of Contacts
• Picking in the matrix displays the residue and the interaction
[Figure: Interactive Distance Matrix]

Exercise 6
Post-Docking Analysis; SIFts and Clustering
Using the Workspace Style toolbar is an easy way to visualize the docked poses, along with the ’view poses’ option, reached with a right click on the group in the Project Table.
While eye-balling is crucial, the SIFt interface makes identifying key interactions easy.
• Run the script on the VSW results output (96 - VSW 1FOR results) and analyse the interaction pattern
  – Scripts -> Cheminformatics -> Interaction fingerprints
• Perform clustering, choosing the number of ’desired clusters’

Visual Inspection
• The different filtering methods reduce the number of hits
• The final selection should be done by human visual inspection of the protein-ligand interactions.
• Possible selection criteria:
  – Does the ligand have strange docked conformations?
  – Do the protein-ligand interactions fit the SiteMap results?
  – Does the ligand have a good ligand efficiency?

Post-screen
• Pose filter (Scripts -> Docking post-processing -> Pose filter)
• Filter based on interaction fingerprints, ligand strain energy, ligand efficiency
• Reserve slots for compounds with somewhat lower GlideScores that came via ligand based methods (those showing high similarity using the techniques we’ve seen here)
• Have several people visualize 5-10x the number of compounds to be wet-screened; balance leadlike vs. druglike; tabulate votes
• Cluster results by chemotype (e.g. spectral clustering) and only order up to the 2-4 best examples from each cluster for testing

Glide XP and beyond
• Empirical scoring functions tell us the propensity of binding between two things
• The ability to decompose the scoring function in more detail can tell us more about the reasons for good binding and help to distinguish between close-binding ligands
  – If we know which small factors contribute most to binding, this in turn helps us design better ligands
• The topic of structure based pharmacophores leads on from this concept, but let’s look at Glide XP first...
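Returning to the post-screen rules above, the "best examples per cluster" and ligand efficiency steps are easy to script once cluster labels exist. The sketch below is illustrative only. The hit records, field names and cluster labels are made up, and ligand efficiency is taken as the negative docking score divided by the heavy atom count, which treats GlideScore as a stand-in for binding free energy (an assumption).

```python
from collections import defaultdict


def ligand_efficiency(score, heavy_atoms):
    """Score per heavy atom; more positive means more efficient (score is negative)."""
    return -score / heavy_atoms


def pick_representatives(hits, per_cluster=2):
    """Keep the best-scoring `per_cluster` hits from each cluster."""
    clusters = defaultdict(list)
    for hit in hits:
        clusters[hit["cluster"]].append(hit)
    picked = []
    for cluster_id in sorted(clusters):
        ranked = sorted(clusters[cluster_id], key=lambda h: h["score"])  # most negative first
        picked.extend(ranked[:per_cluster])
    return picked


# Hypothetical docking results; clusters would come from e.g. fingerprint clustering.
HITS = [
    {"name": "cpd-1", "cluster": 1, "score": -9.1, "heavy_atoms": 32},
    {"name": "cpd-2", "cluster": 1, "score": -8.4, "heavy_atoms": 24},
    {"name": "cpd-3", "cluster": 1, "score": -7.9, "heavy_atoms": 30},
    {"name": "cpd-4", "cluster": 2, "score": -8.8, "heavy_atoms": 22},
]

for hit in pick_representatives(HITS, per_cluster=2):
    le = ligand_efficiency(hit["score"], hit["heavy_atoms"])
    print(f'{hit["name"]}: cluster {hit["cluster"]}, score {hit["score"]}, LE {le:.2f}')
```

A selection like this only narrows the list; the visual inspection step described above still decides what is ordered.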
Glide XP Visualiser
You can read in your pv file here.

Exercise 7
Understanding Extra Precision Docking, XP Visualiser
• Do a Glide ’Score in Place’ with the co-crystallised ligand of 1FJS using XP, and toggle on ”Write XP descriptor information”
• Examine the XP descriptors in Applications -> Glide -> XP Visualizer (read in your *xp pv file)
  – A pre-generated file can be used: ”ScoreInPlace_1FJS_XP_pv.maegz”
• Use ’Help...’ to understand the terms in the scoring function
• ... Leads to Exercise 4b

Structure based pharmacophore
https://www.schrodinger.com/productpage/14/41
"Novel Method for Generating Structure-Based Pharmacophores Using Energetic Analysis," N. K. Salam, R. Nuti, W. Sherman, J. Chem. Inf. Model. 2009, 49, 2356-2368
"Energetic analysis of fragment docking and application to structure-based pharmacophore hypothesis generation," K. Loving, N. K. Salam, W. Sherman, J. Comput. Aided Mol. Des. 2009, 23, 541–554.

Introduction to Energetic-Pharmacophores
• By capturing the power of XP to deconvolute the scoring function, we are able to use those components to rank the most important features of binding
• The binding interactions are translated into ‘pharmacophore features’ and presented to the user
  – Only the highest ranked features will make up the final pharmacophore
  – We can use this PH4 to screen for similar compounds that will also exhibit these “key features for binding”

Which are the important features?
This is an example using fragments, but the same concept holds true for a single ligand, as seen in the previous slide.
[Figure: per-feature energy contributions. Non-zero values: -2.2, -1.2, -0.4, -2.4, -1.2 and -0.3; all remaining features are 0.0.]
Optimal Site Selection
[Figure: the selected features, with contributions -2.2, -1.2, -2.4 and -1.2]

Methods
• Protein PDB structure preparation
• Fragment ionization/tautomer states generated
• Glide XP modified settings
  – Increase the number of poses generated for the initial docking stage
  – Wider scoring window for filtration of initial poses
  – Increase the number of poses per ligand for energy minimization
  – Write XP Descriptor Information
    • Writes atom-level energy terms: H-bond; electrostatic and vdW; hydrophobic enclosure; π-π and π-cation

Exercise 8
Generating a Structure Based Pharmacophore for Screening
• ... In the Project Table group **** E-PH4s*****, the XP ligand can be used to generate an e-Pharmacophore with Scripts -> Postdocking processing -> e-Pharmacophore (< 1 min)
  – Single ligand option
  – Create hypothesis
• Note, the input is normally a pose-viewer file such as ’glide-dock-XP-SIP-2013-pv.maegz’.
• Search the Phase database using Applications -> Phase -> Advanced Pharmacophore Screening (< 1 min for the search)
  – Database: /Conf_database/FXA_db.phdb (latest 2013 format)
  – Choose hypothesis in workspace (or selected entry)
  – Use existing conformations
• View results in Maestro
  – Fitness is the output column
  – Use ’right click fix’ on the highlighted row to fix the original pharmacophore in the workspace. Arrow through the results.
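The feature ranking idea from the "Which are the important features?" slides can be sketched in a few lines. This is a simplified illustration and not the logic of the e-Pharmacophore script itself. The feature names are hypothetical, the energies are the non-zero values from the slide plus one zero entry, and the cut-off of four features is an assumption that happens to match the "Optimal Site Selection" figure.

```python
def select_features(feature_energies, max_features=4):
    """Rank features by energy contribution and keep the most favourable ones.

    Features contributing 0.0 (or unfavourably) are dropped before ranking.
    """
    contributing = [(name, energy) for name, energy in feature_energies.items()
                    if energy < 0.0]
    contributing.sort(key=lambda item: item[1])  # most negative first
    return contributing[:max_features]


# Hypothetical feature labels; energies taken from the slide figure.
FEATURES = {
    "site-1": -2.2,
    "site-2": -1.2,
    "site-3": 0.0,
    "site-4": -0.4,
    "site-5": -2.4,
    "site-6": -1.2,
    "site-7": -0.3,
}

for name, energy in select_features(FEATURES):
    print(f"{name}: {energy:+.1f}")
```

With these inputs the four retained contributions are -2.4, -2.2, -1.2 and -1.2, the same set shown on the "Optimal Site Selection" slide.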
Use all possible known information
• Use all available information about the target and its ligand preferences
  – 2D similarity
  – Pharmacophore
  – Shape based

Shape in Phase (latest seminar: http://www.schrodinger.com/seminarprior/19/39/)
• Shape based methods are widely used in industry
  – Literature: especially for virtual screening
  – Fast, easy, sound physical justification
• Based on the principle of rapid initial alignments using atom triplets, followed by refinement and volume overlap scoring
  – Differs from Gaussian-based methods in its output
  – Speed: roughly 500 conformers/sec on a 2.4 GHz Opteron
  – A number of publications: http://www.schrodinger.com/productpublications/14/13/

Shape in Phase: Volume Similarity
• Hard sphere atom volume overlaps are used for similarity assessment:

$$\mathrm{Sim}_{AB} = \frac{V_{A \cap B}}{V_{A \cup B}} \approx \frac{O_{AB}}{\max\{O_{AA},\, O_{BB}\}}, \qquad O_{AB} = \sum_{a \in A} \sum_{b \in B} o_{ab}$$

• Atomic properties are easily incorporated
  – Only include atoms with similar types
• Atom types, in order of increasing specificity:
  – None (shape only)
  – Phase QSAR*
  – Element
  – MacroModel
* hydrophobic, electron withdrawing, H-bond donor, negative ionic, positive ionic, and other

Additional Pharmacophore Mode
• We added a mode to treat each molecule as an assembly of pharmacophore features as opposed to atoms
  – Features include:
    • Hydrophobic • Aromatic
    • Acceptor • Donor
    • Negative • Positive
• Pharmacophore features are treated just like atoms
• The motivation for this mode was to improve virtual screening enrichments by focusing on features rather than all atoms

Atoms for Alignment
• Assessing all possible atomic alignments is computationally intractable
• We simplify the problem by focusing on local atom environments
  1. Compute a distance profile for each atom
  2. Compare atom profiles between different molecules
  3. Align top scoring triplets
  4. Refine alignment by adding more atoms
• Triangular area and binning is used to eliminate edge effects
• Atom types can be easily incorporated

Comparing Atoms
• Based on the atom distance distributions, we can compute similarities between atoms of different molecules

Alignment of 3 Points
• A triad from structure B can be rapidly aligned to a triad in structure A
  1. Shift A & B to a common centroid
  2. Rotate them to the xy plane
  3. Determine the angle θB that minimizes the sum of squared distances between the pairs of atoms (a1, b1), (a2, b2) and (a3, b3)
• ~3x faster than standard 3D least-squares alignment
  – Results in significant speedup due to the number of alignments being considered
• Refinement of top alignments by considering atoms within 0.5 Å of each other

Exercise 9 (if you have energy!)
Shape Based Searching
• Use the VDW shape of a ligand to search for molecules of a similar shape. The 1FJS xtal ligand is the template.
• Applications > Shape Screening...
  – Use Shape query from workspace
  – Generate conformations during search
  – Drop-down options give you more stringency in addition to shape
  – "Shape sim" is the output column
  – View results in Maestro as before

Putting everything together: Tailored Protocols for XYZ Protein
• Databases
  1. CACDB2010 lead/drug-like set.
  2. Phase mining, with multiple hypotheses, of CACDB phase database using half and whole of ABCDE as queries (shape similarity > 0.6 or top 0.5%).*
  3. Fingerprint-based similarity search of whole CACDB using HTS hits as queries (Tanimoto >= 0.6 or top 0.5%).*
• ABCDE pocket of XYZ
  1. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) and/or Res 122 (-NH) on chain A.
  2. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) and/or Res 122 (-NH) on chain B.
  3. Receptor of ABCDE site with HB to Res123 (-C=O and -NH) on both chain A and B
• Glide HTVS (25 % top scoring)
• Glide SP*
  – Two conformations of ligands for XP docking (15 % top scoring)
• Glide XP*
  – Three conformations of ligands for XP docking (post-processing of ensemble)
  1. Mine for poses with desirable interactions, i.e., hydrogen bonding.
  2. Rescore with Epik state and strain penalties
  3. Take top scoring 5000 to next step for visualization
  4. Select molecules for sourcing with help of clustering tools
*: Combined structures taken directly to XP

Summary
• Virtual screening needs careful planning and preparation
• Post-process the results using different tools and re-score, re-rank
• Products and tools that have been discussed today:
  – PrimeX, Prime, MacroModel, SiteMap, Glide, Epik, Canvas, Phase, Prime MM-GBSA, Interaction fingerprint, Spectral clustering, Strain rescore, Pose filter, e-Pharmacophore

Thank you for your attention!
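Backup sketch for the "Shape in Phase: Volume Similarity" slide. The snippet below shows how a score of the form O_AB / max{O_AA, O_BB} could be computed for two molecules that are already aligned. It is a minimal illustration, not the Phase implementation: the pair term o_ab is assumed here to be the exact intersection volume of two hard spheres, and the coordinates and the 1.7 Å radii are invented for the example.

```python
import math
from typing import List, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z, radius (all in Å)


def pair_overlap(a: Atom, b: Atom) -> float:
    """Intersection volume of two hard spheres (assumed form of o_ab)."""
    d = math.dist(a[:3], b[:3])
    ra, rb = a[3], b[3]
    if d >= ra + rb:           # spheres do not touch
        return 0.0
    if d <= abs(ra - rb):      # smaller sphere lies inside the larger one
        return 4.0 / 3.0 * math.pi * min(ra, rb) ** 3
    # Lens-shaped intersection of two partially overlapping spheres
    return (math.pi * (ra + rb - d) ** 2
            * (d * d + 2.0 * d * (ra + rb) - 3.0 * (ra - rb) ** 2)
            / (12.0 * d))


def overlap(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """O_AB: sum of pair overlaps over all atoms a in A and b in B."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_similarity(mol_a: List[Atom], mol_b: List[Atom]) -> float:
    """Sim_AB approximated as O_AB / max(O_AA, O_BB)."""
    return overlap(mol_a, mol_b) / max(overlap(mol_a, mol_a),
                                       overlap(mol_b, mol_b))


# Two hypothetical diatomic "molecules"; B is A shifted by 0.5 Å along y.
mol_a = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
mol_b = [(0.0, 0.5, 0.0, 1.7), (1.5, 0.5, 0.0, 1.7)]

print(round(shape_similarity(mol_a, mol_a), 3))
print(round(shape_similarity(mol_a, mol_b), 3))
```

A molecule compared with itself scores 1, and the score drops as the two shapes move apart. Restricting the double sum to atom pairs of matching type would be one way to mimic the atom-typed modes listed on the slide.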