QSAR
Quantitative Structure-Activity Relationships
- Can one predict activity (or properties, in QSPR) simply on the basis of knowledge of the structure of the molecule?
- In other words, if one systematically changes a component, will it have a systematic effect on the activity?

Choice of Model
Can approach in two directions:
- Simple to complex model
- Complex to simple model

Simplest Model
- Linear relationship between x and y: $Y = mX + b$
- Minimize the error by least squares: $\sum_i (Y_i - Y'_i)^2 = \sum_i [Y_i - (mX_i + b)]^2$
- $Y'_i$ is the predicted value

Least Squares
- Correlation coefficient: $-1 \le r \le 1$
- Another test: is the line better than the mean?

[Figure: four scatter plots with least-squares lines, titled "A circle", "2 lines", "One bad point", and "Wrong model". Fitted lines shown: y = 2.9562x - 0.2597 (R² = 0.8686); y = 0.0676x - 0.3882 (R² = 0.0045); y = 0.0008x + 275.11 and y = 2.8515x - 31.647 (R² = 0.978 and 0.9179).]

Multiple Regression
$Y = f(X_1, X_2, \ldots, X_n)$
Problems:
- Choice of model: linear, polynomial, etc.
- Visualization
- Interpretation
- Computationally demanding

Variable Reduction
Principal Component Analysis
Principal components:
$PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n}x_n$
$PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \cdots + a_{2,n}x_n$
- Keep only those components that possess the largest variation
- PCs are orthogonal to each other

Exploring QSAR
- Pick up the NONLIN program: http://www.trinity.edu/sbachrac/drugdesign2007/
- Unzip and install it on your computer
- Read the Read.Me and Nonlin.doc documentation
- Look at the HeatForm.NLR file with any word processor

Running NONLIN
- Start an MSDOS window
- Change to the directory where the code is:
    cd /d d:\nonlin
- Execute the program with the data file:
    nonlin heatForm > output

Assignment
- Propose a QSAR scheme to predict the $\Delta H_f$ of the alkanes

Early Examples
Hammett (1930s-1940s)
[Scheme: ionization of benzoic acid (COOH to COO⁻ + H⁺, equilibrium constant $K_0$) and of its para- and meta-X-substituted derivatives (equilibrium constants $K_p$ and $K_m$).]
$\sigma_{para} = \log_{10}(K_p / K_0)$
$\sigma_{meta} = \log_{10}(K_m / K_0)$

Hammett (cont.)
Now suppose we have a related series:
[Scheme: ionization of the X-substituted phenylacetic acid (CH2COOH to CH2COO⁻ + H⁺), equilibrium constant $K'_X$.]
$\log_{10}(K'_X / K'_0) = \rho\sigma$
- $\sigma$ reflects sensitivity to the substituent
- $\rho$ reflects sensitivity to the different system

Hammett (cont.)
Linear Free Energy Relationship
$\Delta G = -2.303RT\log_{10}K$
So
$\Delta G - \Delta G_0 = -2.303RT\sigma$
and
$\Delta G' - \Delta G'_0 = -2.303RT\rho\sigma$
Therefore
$\Delta G' - \Delta G'_0 = \rho(\Delta G - \Delta G_0)$

Free-Wilson Analysis
$\log(1/C) = \sum_i a_i + \mu$
where $\log(1/C)$ is the predicted activity (C is the concentration that produces a standard response), $a_i$ is the contribution per group, and $\mu$ is the activity of the reference compound.

Free-Wilson Example
[Structure: analog series drawn as an HCl salt, with Br and N in the scaffold and two variable substituents, X and Y.]
Activity of analogs:
$$\log(1/C) = -0.30[\text{m-F}] + 0.21[\text{m-Cl}] + 0.43[\text{m-Br}] + 0.58[\text{m-I}] + 0.45[\text{m-Me}] + 0.34[\text{p-F}] + 0.77[\text{p-Cl}] + 1.02[\text{p-Br}] + 1.43[\text{p-I}] + 1.26[\text{p-Me}] + 7.82$$
Each bracketed term is 1 if that substituent is present and 0 otherwise.
Problems: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents used in the analysis.

Hansch Analysis
$\log(1/C) = a\pi + b\sigma + c$
where $\pi(X) = \log P_{RX} - \log P_{RH}$ and $P$ is the octanol/water partition coefficient.
This is also a linear free energy relation.

Molecular Descriptors
Simple rules for describing some aspect of a molecule (Structure → Property)
- 2D descriptors use only the atoms and connection information of the molecule
- Internal 3D descriptors use 3D coordinate information about each molecule; however, they are invariant to rotations and translations of the conformation
- External 3D descriptors also use 3D coordinate information but additionally require an absolute frame of reference (e.g., molecules docked into the same receptor)

Descriptor Examples
Physical properties
- MW
- log P (octanol/water partition)
- bp, mp
- Dipole moment
- Solubility

Descriptor Examples
Structural descriptors
2D
- Atom/bond counts
- Number of non-H atoms
- Number of rotatable bonds
- Number of each functional group
- 2C chains, 3C chains, 4C chains, 5C chains, etc.
- Rings and their size
3D
- Number of accessible conformations
- Surface area

Topological Descriptors
Wiener Path Index
[Structure: hydrogen-suppressed graph of seven atoms, numbered 1 to 7.]
$W = \sum_i \sum_{j>i} d_{ij}$
Distance matrix ($d_{ij}$ is the number of bonds on the shortest path between atoms i and j):
    0 1 2 3 4 2 3
    1 0 1 2 3 1 2
    2 1 0 1 2 2 1
    3 2 1 0 1 3 2
    4 3 2 1 0 4 3
    2 1 2 3 4 0 3
    3 2 1 2 3 3 0
$W = 46$

Topological Descriptors
Randić Index
[Structure: seven-vertex graph annotated with the values below.]
- Valence at each vertex: 1, 2, 3, 1, 1, 3, 1
- Bond values as the product of the valences at the two ends: 3, 3, 9, 2, 6, 3
- Edge terms as the reciprocal of the square root of the bond values: 0.577, 0.577, 0.333, 0.707, 0.408, 0.577
- Sum of edge terms: 3.179

Predict bp of Alkanes
[Figure: plot of bp versus Wiener index with least-squares line y = 1.5225x + 7.2917, R² = 0.9547.]

3D Molecular Descriptors
- Potential energy
- Solvation energy
- Water accessible surface area
- Water accessible surface area of all atoms with positive (negative) partial charge

Pharmacophore
- Specification of the spatial arrangement of a small number of atoms or functional groups
- With the model in hand, search databases for molecules that fit this spatial environment

Creating a Pharmacophore
[Figure: example structures bearing O and OH groups.]

3D Pharmacophore Searching
- With the pharmacophore in hand, search databases containing 3-D structures of molecules for molecules that fit
- Can rank these “hits” using the scoring system described later

Pharmacophore Descriptors
- Number of acidic atoms
- Number of basic atoms
- Number of hydrogen bond donor atoms
- Number of hydrophobic atoms
- Sum of VDW surface areas of hydrophobic atoms

Lipinski’s Rule of 5
Potential drug candidates should:
- Have 5 or fewer H-bond donors (expressed as the sum of OHs and NHs)
- Have a MW < 500
- Have a log P less than 5
- Have 10 or fewer H-bond acceptors (expressed as the sum of Ns and Os)
Adv. Drug Delivery Rev., 1997, 23, 3

Docking
Interact a ligand with a receptor. Need to do the following:
A) Select appropriate ligands
B) Select an appropriate conformation of the receptor
C) Select appropriate conformations of the ligands
D) Combine the ligand and receptor (docking)
E) Evaluate these combinations and rank order them

Selection of Ligands
- Want drug-like molecules: 250 < MW < 500, Lipinski’s rules
- Search through databases:
  - Available Chemicals Directory (ACD)
  - World Drug Index
  - NCI Drug database
  - In-house databases

Receptor Conformation
- Usually the receptor is assumed to be static
- Get the structure from an X-ray or NMR experiment
- Protein Data Bank (http://www.rcsb.org/pdb/): 41385 structures

Ligand Conformation
Rigid or flexible
- If rigid, optimize the structure, then use it throughout the docking procedure
- If flexible, can
  A) create a set of low energy conformations and then use this set as a collection of rigid structures in docking
  B) optimize the structure within the active site of the receptor, i.e. dock and optimize together

Docking
Place the ligand in an appropriate location for interacting with the receptor
Methodological problems:
1) No best method for defining shape
2) No general solution for packing irregular objects (the knapsack problem)

Docking Algorithmic Components
- Receptor and ligand description (keep in mind relative errors of structures, etc.)
- Bind the ligand to the receptor (configuration/conformation search)
  - Geometric search (match ligand and receptor site descriptions)
  - Search for minimum energy: molecular dynamics (MD) or Monte Carlo (MC)
- Evaluation of the dock ($\Delta G_{bind}$), also called scoring

Descriptor Matching Method
DOCK program
1) Generate the molecular surface for the receptor
2) Generate spheres to fill the active site (usually 30-50 spheres)
3) Match sphere centers to the ligand atoms (originally just the lowest-energy conformer; now multiple conformers are used, but each is still rigid). This generates 10K orientations per ligand and is shape-driven.
4) Score the interaction

Fragment-Joining Method
FlexX, LUDI
- Place base fragments into microstates of the active site (fragments can be small molecules like benzene, formaldehyde, formamide, naphthol, etc.)
- Optimize the position of the base fragment
- Join fragments with small connecting chains made of CH2, CO, CONH, etc.

Scoring (Evaluation of the Dock)
Want to quickly evaluate the strength of the interaction between ligand and receptor
- Full free energy computation
  - Expensive
  - Requires excellent force fields
- Empirical method
  - Fast and cheap
  - Requires fitting to a broad set of ligand/receptor complexes

Empirical Scoring
Method of Böhm (LUDI, FlexX, etc.)
$$\Delta G_{bind} = \Delta G_0 + \sum_{\text{h-bonds}} \Delta G_{hb}\, f(\Delta R, \Delta\alpha) + \sum_{\text{ionic}} \Delta G_{ion}\, f(\Delta R, \Delta\alpha) + \Delta G_{lipo}\, A_{lipo} + \Delta G_{rot}\, NROT$$
- $\Delta G_0$: reduction in binding energy due to loss of rotation and translation of the ligand
- $\Delta G_{hb}$: contribution from an ideal hydrogen bond
- $\Delta G_{ion}$: contribution from ionic interactions
- $\Delta G_{lipo}$: contribution from lipophilic interactions
- $\Delta G_{rot}$: contribution from freezing rotations within the ligand
These coefficients come from empirical fits.

Böhm Method (cont.)
$f(\Delta R, \Delta\alpha)$ are penalty functions for non-ideal interactions (distances too short or too long, angles not linear):
$$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$$
$$f_1(\Delta R) = \begin{cases} 1, & \Delta R \le 0.2\ \text{Å} \\ 1 - (\Delta R - 0.2)/0.4, & 0.2 < \Delta R \le 0.6\ \text{Å} \\ 0, & \Delta R > 0.6\ \text{Å} \end{cases}$$
$$f_2(\Delta\alpha) = \begin{cases} 1, & \Delta\alpha \le 30° \\ 1 - (\Delta\alpha - 30)/50, & 30° < \Delta\alpha \le 80° \\ 0, & \Delta\alpha > 80° \end{cases}$$
- $\Delta R$ is the deviation from the ideal H...O/N distance of 1.9 Å
- $\Delta\alpha$ is the deviation from the ideal N/O-H…O/N angle of 180°

Böhm Method (cont.)
- $A_{lipo}$ is the lipophilic contact surface, evaluated by a coarse grid of boxes
- NROT is the number of rotatable bonds: acyclic sp3-sp3, sp3-sp2 and sp2-sp2. No terminal groups or flexibility of rings are incorporated.
H.-J. Böhm, J. Comput.-Aided Mol. Des., 1994, 8, 243-256

Scoring Alternatives
- Many variations on the Böhm scheme: buried polar term, desolvation term, different forms for the lipophilic term, include metal bonding, etc.
- Combine scoring functions, i.e. QSAR with scoring functions as variables
- Use the empirical score to select a set of hits, then refine with free energy minimization
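
The Böhm-style empirical score described above can be sketched in a few lines of Python. This is only an illustration of the functional form, not the LUDI or FlexX implementation. The names `f1`, `f2`, `penalty`, `Coefficients` and `bohm_score` are hypothetical, the deviations are assumed to be passed in as magnitudes (the code takes absolute values), and the coefficient values in the usage note are arbitrary placeholders rather than fitted values.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


def f1(delta_r: float) -> float:
    """Distance penalty. delta_r is the deviation (in Å) from the ideal 1.9 Å."""
    delta_r = abs(delta_r)  # assumption: only the magnitude of the deviation matters
    if delta_r <= 0.2:
        return 1.0
    if delta_r <= 0.6:
        return 1.0 - (delta_r - 0.2) / 0.4
    return 0.0


def f2(delta_alpha: float) -> float:
    """Angle penalty. delta_alpha is the deviation (in degrees) from the ideal 180°."""
    delta_alpha = abs(delta_alpha)  # assumption: only the magnitude matters
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0


def penalty(delta_r: float, delta_alpha: float) -> float:
    """Combined penalty f(dR, da) = f1(dR) * f2(da)."""
    return f1(delta_r) * f2(delta_alpha)


@dataclass
class Coefficients:
    """Empirically fitted terms; values must be supplied by the caller."""
    dg0: float      # loss of ligand rotation and translation
    dg_hb: float    # ideal hydrogen bond
    dg_ion: float   # ideal ionic interaction
    dg_lipo: float  # per unit of lipophilic contact surface
    dg_rot: float   # per frozen rotatable bond


def bohm_score(
    coef: Coefficients,
    hbonds: Iterable[Tuple[float, float]],
    ionic: Iterable[Tuple[float, float]],
    a_lipo: float,
    n_rot: int,
) -> float:
    """Sum the five terms of the scoring equation.

    hbonds and ionic are sequences of (delta_r, delta_alpha) pairs,
    one pair per interaction found in the docked pose.
    """
    hb_term = sum(coef.dg_hb * penalty(dr, da) for dr, da in hbonds)
    ion_term = sum(coef.dg_ion * penalty(dr, da) for dr, da in ionic)
    return coef.dg0 + hb_term + ion_term + coef.dg_lipo * a_lipo + coef.dg_rot * n_rot
```

For example, with placeholder coefficients `Coefficients(dg0=5.0, dg_hb=-4.0, dg_ion=-8.0, dg_lipo=-0.2, dg_rot=1.0)`, a pose with one ideal hydrogen bond `(0.0, 0.0)` and one distorted one `(0.4, 55.0)` counts the second bond at a quarter of its ideal value, since `f1(0.4)` and `f2(55.0)` each contribute a factor of one half.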