From: AAAI Technical Report SS-99-01. Compilation copyright © 1999, AAAI (www.aaai.org). All rights reserved.

Multiple Formula Approach for Structure-Cytotoxicity/Antiviral Activity Relationship Studies of Nucleoside Analogs

Mathew L. Lesniewski, Ravi R. Parakulam, Merideth R. Marquis, and Chun-che Tsai*
Department of Chemistry, Kent State University, Kent, Ohio 44242-0001 USA
E-mail: ctsai@kent.edu Phone: (330) 672-2989 Fax: (330) 672-3816

ABSTRACT

Quantitative structure-activity relationships (QSAR) were developed for a series of purine nucleoside analogs with antiviral activity. The correlation between the chemical structure of these purine nucleoside analogs and their toxicity/activity was investigated using molecular similarity analysis. Structure-activity relationship studies and molecular similarity analyses were performed using the molecular descriptors number of atoms and bonds of a molecule (NAB), maximum common substructure (MaCS), and molecular similarity index (MSI). The antiviral activity measurement used in this study was the 50% effective dose (ED50) in µM. The cytotoxicity measurement used in this study was the 50% cytotoxic dose (CD50) in µM. The biological activities and MSI were utilized to generate a series of correlation equations. The multiple formula approach (MuFA) used the top regression correlation equations, based on several reference compounds, to generate the average estimated CD50 and ED50 values for a set of testing compounds. The MuFA integrated the effects of structural similarities and dissimilarities in estimating the cytotoxicity and antiviral activity of testing compounds.

Introduction

Human immunodeficiency virus (HIV), the causative agent of the acquired immunodeficiency syndrome (AIDS), belongs to the class of viruses called retroviruses. The enzyme reverse transcriptase, which catalyzes the conversion of viral RNA into viral DNA, is a target for the design and development of inhibitors.
The clinical use of highly active antiretroviral therapy (HAART) requires a diverse set of chemicals with a variety of activities and modes of action (Cohen 1998); thus a method for rapidly identifying safe and effective compounds is required. In this paper we describe a methodology using quantitative structure-activity relationships (QSARs) and quantitative molecular similarity analysis (QMSA) to investigate the correlation of structure and activity for purine nucleoside analogs. A multiple-formula approach (MuFA) for a similarity-based QSAR is described for estimating the biological activity of a set of purine nucleoside analogs and for identifying new lead compounds.

Methods and Results

Biological Activity Data

A computer database of the biological activities was compiled for anti-HIV purine nucleoside analogs tested in the MT4 cell line. The cytotoxicity measurement used was the 50% cytotoxic dose (CD50) in µM, based on the 50% reduction in the viability of the mock-infected host cells. The antiviral activity measurement used was the 50% effective dose (ED50) in µM, based on the 50% protection of cells against the cytopathic effect of HIV-1. The selectivity index, SI50 = CD50/ED50, indicates the safety of a particular compound: the larger the SI50, the further the cytotoxic dose lies above the effective antiviral dose.

Molecular Descriptors

QSAR requires quantification of a compound’s activity and its chemical structure. A simple descriptor to quantify the chemical structure is the integer value NAB, which denotes the number of atoms and bonds in a molecule. NAB groups compounds into topological isomer groups (topoisomers) and is the foundation of the other topological descriptors used. The maximum common substructure, MaCS(X, Y), is expressed in terms of NAB and is defined as a substructure common to molecule X and molecule Y such that no other common substructure of X and Y has a greater value of NAB. MaCS(X, Y) is allowed to contain isolated atoms and structural fragments.
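As an illustration of the NAB descriptor, the short sketch below counts atoms and bonds for a molecule stored as a graph. It assumes a hydrogen-suppressed graph in which each bond is counted once regardless of bond order; the paper does not spell out these conventions, and the `Molecule` type and the ethanol and benzene examples are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical molecular graph: atoms are element symbols, bonds are index pairs."""
    atoms: list[str]
    bonds: set[frozenset[int]] = field(default_factory=set)

    def add_bond(self, i: int, j: int) -> None:
        # Bonds are stored as unordered pairs, so each bond is counted once.
        n = len(self.atoms)
        if i == j or not (0 <= i < n and 0 <= j < n):
            raise ValueError("a bond must join two distinct, existing atoms")
        self.bonds.add(frozenset((i, j)))


def nab(mol: Molecule) -> int:
    """Number of atoms plus number of bonds (assumed hydrogen-suppressed)."""
    return len(mol.atoms) + len(mol.bonds)


# Ethanol as a heavy-atom graph: C-C-O (3 atoms, 2 bonds).
ethanol = Molecule(atoms=["C", "C", "O"])
ethanol.add_bond(0, 1)
ethanol.add_bond(1, 2)
print(nab(ethanol))  # 5

# Benzene ring: 6 atoms and 6 ring bonds.
benzene = Molecule(atoms=["C"] * 6)
for k in range(6):
    benzene.add_bond(k, (k + 1) % 6)
print(nab(benzene))  # 12
```

Computing MaCS itself requires a maximum common subgraph search and is not attempted here.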
Molecular similarity index (MSI) describes the degree of similarity between two molecules (Tsai et al. 1987). It is defined as:

MSI(X, Y) = [MaCS(X, Y)/NAB(X)] × [MaCS(X, Y)/NAB(Y)]

QSAR Studies

Selection of Learning Set. A learning set of compounds was selected from the compiled database of purine analogs in order to illustrate a topological approach to pharmacophore modeling, which utilized a similarity-based QSAR and a MuFA to estimate the biological activity of new compounds. The criteria for selecting the learning set were: compounds tested in the MT4 cell line, purine nucleoside analogs, SI50 > 20, and NAB < 40. In this similarity-based QSAR approach, stereoisomers are topologically identical even when their activities differ. Because every stereoisomer in a set generates the same MSI(X, Y) values when used as a reference compound, only the most active stereoisomer from each set of stereoisomers was selected. A learning set containing 15 compounds was produced (Table 1 and Figure 1) (Balzarini and De Clercq 1990; Herdewijn and De Clercq 1990; Nasr, Cradock, and Johnston 1992a and 1992b; Masuda et al. 1993).

Table 1 Learning Set Activity Data (CD50 and ED50 in µM)

No.   NAB   CD50    ED50    SI50
1     40    20.8    0.059   353
2     36    148     0.422   350
3     36    945     4.46    226
4     34    27.1    0.136   200
5     36    643     3.80    169
6     38    1314    8.86    148
7     40    486     3.74    130
8     40    196     1.54    127
9     38    404     3.79    106
10    42    271     2.72    100
11    40    237     2.40    99
12    40    360     4.50    80
13    38    486     7.73    63
14    40    788     27.2    29
15    40    489     17.8    27

Figure 1 Learning Set Structures [structure drawings of compounds 1-15; not recoverable from the extracted text]

Similarity-Based QSAR Analysis. Quantitative structure-activity relationships for the compounds of the learning set were derived using molecular similarity and regression analyses for both log CD50-MSI and log ED50-MSI. The ED50-MSI and CD50-MSI relations were expressed as the following linear functions:

log ED50(X) = a + b MSI(X, Y)
log CD50(X) = a + b MSI(X, Y)

where a and b are regression coefficients, X is the Xth compound in the list, and Y is the reference compound. Each of the 15 learning-set compounds was used in turn as the reference compound (Y) in the calculation of MaCS(X, Y) by the program TOPSIM (Durand 1996). The fifteen resulting sets of MSI(X, Y) values were used to produce 15 ED50 and 15 CD50 regression correlation equations using JMP version 3.2.2. The best correlation for the CD50-MSI relation was found using compound 4 as the reference compound:

C4: log CD50(X) = 6.227 - 4.502 MSI(X, 4)    r2 = 0.399, s = 0.414, F(1,13) = 8.63

The best correlation for the ED50-MSI relation was also found using compound 4 as the reference compound:

E4: log ED50(X) = 5.367 - 5.948 MSI(X, 4)    r2 = 0.346, s = 0.613, F(1,13) = 6.87

Since compound 4 was the best reference compound in the learning set, it was systematically modified in order to compile a set of new lead compounds. The sites and types of modifications used to develop the new lead compounds were determined by analysis of structure-activity maps (SAMs) containing the learning set compounds (Lesniewski et al. 1999). A second similarity-based QSAR analysis was run using the original learning set with each of the new compounds as the reference compound (Y). This produced another set of regression correlations. Figure 2 shows the new compounds that were selected because their log CD50-MSI and log ED50-MSI correlation values were greater than or equal to those of compound 4.

Figure 2 New Testing Compounds [structure drawings of compounds 4a-4e; not recoverable from the extracted text]

Cross Validation.
A cross validation data set, containing compounds with known activities that were not used in the learning set, was selected to test the validity of the correlation equations (Hayashi et al. 1988; Jeong et al. 1993a and 1993b; Kim et al. 1993; Murakami et al. 1991). A MuFA estimated the biological activities of the cross validation set and the new compounds 4a, 4b, 4c, 4d, and 4e (Figure 3). The resulting MSI(X, Y) values were used to generate the estimated CD50 and ED50 values for compounds 4, 4a, 4b, 4c, 4d, and 4e and for the 6 cross validation compounds (16-21). Table 3 lists the estimated CD50, ED50, and SI50 values generated by each of the CD50 and ED50 equations. In Tables 3a, 3b, and 3c, each column corresponds to one of the 6 equations and each row gives the estimated CD50, ED50, or SI50 values, respectively, for one compound. The average estimated CD50, ED50, and SI50 values and the available observed values are listed in the last 2 columns. The estimated CD50, ED50, and SI50 values were in good agreement with the observed values.
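The estimation steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' TOPSIM or JMP workflow: the function names are hypothetical, the logarithm is taken as base 10, and the averaging step is the geometric mean named in the MuFA description. The coefficients are those listed for equation C4a in Table 2a (a = 7.353, b = -6.389), and the six per-equation CD50 estimates are those listed for compound 4 in Table 3a; the MaCS and NAB inputs in the first call are made-up numbers.

```python
import math


def msi(macs: float, nab_x: float, nab_y: float) -> float:
    """Molecular similarity index: [MaCS/NAB(X)] * [MaCS/NAB(Y)]."""
    if macs > min(nab_x, nab_y):
        raise ValueError("MaCS cannot exceed the NAB of either molecule")
    return (macs / nab_x) * (macs / nab_y)


def estimate_activity(a: float, b: float, msi_value: float) -> float:
    """Invert log10(activity) = a + b * MSI to get the activity in µM."""
    return 10 ** (a + b * msi_value)


def geometric_mean(values: list[float]) -> float:
    """Geometric mean, i.e. the antilog of the mean of the logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Made-up descriptor values, only to show the MSI arithmetic.
print(round(msi(macs=30, nab_x=36, nab_y=34), 3))  # 0.735

# Equation C4a applied to its own reference compound, where MSI = 1.
print(round(estimate_activity(7.353, -6.389, 1.0), 2))  # 9.2

# Six per-equation CD50 estimates for compound 4 (Table 3a), in µM.
cd50_estimates = [53.7, 43.3, 39.7, 44.6, 91.7, 54.1]
print(round(geometric_mean(cd50_estimates), 1))  # 52.3
```

The second value agrees with the C4a entry of 9.20 µM listed for compound 4a in Table 3a, and the third agrees with the average estimated CD50 of 52.3 µM listed there for compound 4.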
Table 2a log CD50(X) = a + b MSI(X, Y) Equations

Equ       C4a      C4b      C4c      C4d      C4e
r2        0.692    0.690    0.574    0.537    0.399
Y         4a       4b       4c       4d       4e
s         0.295    0.296    0.347    0.362    0.412
F(1,13)   29.1     29.0     17.6     15.0     8.64
a         7.353    7.310    7.142    6.528    6.137
b         -6.389   -6.047   -5.815   -5.103   -4.537

Table 2b log ED50(X) = a + b MSI(X, Y) Equations

Equ       E4a      E4b      E4c      E4d      E4e
r2        0.580    0.602    0.468    0.522    0.345
Y         4a       4b       4c       4d       4e
s         0.491    0.478    0.552    0.524    0.613
F(1,13)   17.9     19.7     11.5     14.2     6.85
a         6.775    6.847    6.416    6.108    5.265
b         -8.334   -8.047   -7.480   -7.171   -6.012

Figure 3 Cross Validation Set [structure drawings of compounds 16-21; not recoverable from the extracted text]

Multiple-Formula Approach (MuFA). The average estimated CD50, ED50, and SI50 values were generated by a multiple-formula approach. This approach corrects for the inability of a single activity(X)-MSI(X, Y) correlation equation to distinguish between compounds with the same MSI(X, Y) value. In estimating the biological activity of a compound, the MuFA used multiple correlation equations to integrate the effects of structural similarity and dissimilarity between a testing compound and multiple reference compounds. The average estimated CD50 and ED50 values were calculated as the geometric mean of the values generated by each of the 6 correlation equations. The average estimated CD50 values and the average estimated ED50 values were used to calculate the average estimated SI50 for the cross validation set.

Table 3a Estimated and observed CD50 values in µM

Compound  C4     C4a    C4b    C4c    C4d    C4e    Avg CD50  Obs.
4         53.7   43.3   39.7   44.6   91.7   54.1   52.3      27.1
4a        158    9.20   38.1   42.9   88.7   157    56.7      N/A
4b        95.0   19.9   18.3   44.2   49.4   95.0   43.9      N/A
4c        95.0   19.9   39.3   21.2   90.9   95.0   48.7      N/A
4d        158    41.5   38.1   86.0   26.6   157    67.0      N/A
4e        72.6   63.7   58.4   64.7   124    39.8   66.6      N/A
21        452    664    607    615    811    457    588       >300
16        428    618    566    575    767    255    506       1000
18        167    185    169    180    292    168    189       >100
17        167    185    169    180    90.9   168    156       >100
19        270    85.6   78.5   169    158    270    153       >100
20        270    85.6   78.5   169    49.0   270    126       >100

Table 3b Estimated and observed ED50 values in µM

Compound  E4     E4a    E4b    E4c    E4d    E4e    Avg ED50  Obs.
4         0.263  0.208  0.176  0.224  0.492  0.269  0.257     0.136
4a        1.11   0.028  0.167  0.213  0.470  1.11   0.288     N/A
4b        0.562  0.076  0.063  0.221  0.207  0.57   0.203     N/A
4c        0.562  0.076  0.174  0.086  0.487  0.57   0.237     N/A
4d        1.11   0.197  0.167  0.521  0.087  1.11   0.350     N/A
4e        0.393  0.345  0.295  0.362  0.761  0.18   0.354     N/A
20        2.26   0.507  0.437  1.24   0.204  2.27   0.813     0.30
19        2.26   0.507  0.437  1.24   1.06   2.27   1.07      2.80
17        1.19   1.38   1.22   1.35   0.487  1.21   1.08      0.500
18        1.19   1.38   1.22   1.35   2.51   1.21   1.42      1.01
16        4.18   6.69   6.06   6.01   9.75   2.10   5.25      10.0
21        4.50   7.32   6.65   6.55   10.5   4.54   6.40      5.50

Table 3c Estimated and observed SI50 values

Compound  SI4    SI4a   SI4b   SI4c   SI4d   SI4e   Avg SI50  Obs.
4         204    208    225    199    186    201    204       200
4a        143    333    228    201    189    142    197       N/A
4b        169    263    290    200    239    167    217       N/A
4c        169    263    226    246    187    167    206       N/A
4d        143    210    228    165    307    142    191       N/A
4e        185    185    198    179    164    222    188       N/A
20        119    169    180    136    240    119    155       >333
17        140    134    139    133    187    139    144       >200
19        119    169    180    136    149    119    144       >35.7
18        140    134    139    133    116    139    133       >99.0
16        102    92.5   93.4   95.7   78.7   121    96.5      100
21        100    90.6   91.2   94.0   76.9   100    91.9      >55.0

Using a MuFA, the estimated CD50, ED50, and SI50 values were within 2 fold of the observed values for all 6 compounds in the cross validation set. The MuFA was based on a learning set containing only the active compounds of a particular type, and was effective only for identifying new lead compounds that were topologically similar to the reference compounds used. This MuFA was not intended for estimating the activity of less active compounds.

Conclusions

In this study we describe a topological approach to pharmacophore mapping for drug-biological response. A similarity-based QSAR was developed, utilizing a pairwise comparison of a set of known compounds to determine an optimized reference compound R. The MaCS(X, R) expresses the largest topological commonality between compounds X and R, which forms an optimized pharmacophore. The MSI(X, R) is the index of the similarity between compounds X and R. Since MSI(X, R) is maximized when X is equal to R, the R compound represents the compound with an optimal biological response. New reference compounds R’, structurally similar to R, can be constructed with improved biological response. New reference compounds R’ contain modifications to R based upon observations made in the various SAMs, and can be selected based upon improved r2 values for the regression correlation equations. This type of pharmacophore modeling can be viewed as a topological analog of molecular shape analysis modeling (Hopfinger 1980).
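The selection of a reference compound by r2 can be sketched as an ordinary least-squares fit of log activity against MSI for each candidate reference, followed by picking the candidate with the largest r2. The sketch below is a minimal illustration, not the JMP analysis used in this study; the function name, the reference labels "R1" and "R2", and all data values are hypothetical.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b, r2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    b = sxy / sxx                   # slope
    a = mean_y - b * mean_x         # intercept
    r2 = (sxy * sxy) / (sxx * syy)  # squared correlation coefficient
    return a, b, r2


# Hypothetical data: log10 activities of four compounds and their MSI values
# against two candidate reference compounds.
log_activity = [1.0, 1.5, 2.1, 2.4]
msi_by_reference = {
    "R1": [0.95, 0.80, 0.70, 0.60],
    "R2": [0.70, 0.90, 0.65, 0.85],
}

fits = {ref: fit_line(msi, log_activity) for ref, msi in msi_by_reference.items()}
best = max(fits, key=lambda ref: fits[ref][2])
print(best)  # R1
```

In this toy data set the fit for R1 has a negative slope, the same sign as the b coefficients reported in Tables 2a and 2b.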
The advantage of this topological approach for pharmacophore mapping is the generation of new lead compounds with improved biological response based upon the chemical structure and the biological response of a set of known drugs, without knowledge of the drug-bioreceptor interaction mechanisms.

References

Balzarini, J., and De Clercq, E. 1990. Acyclic and Carbocyclic Nucleoside Analogues as Inhibitors of HIV Replication. In Design of Anti-AIDS Drugs, 175-194. De Clercq, E. ed. New York, N.Y.: Elsevier.

Cohen, J. 1998. Exploring How to Get at and Eradicate Hidden HIV. Science 278: 1854-1855.

Durand, P. J. 1996. An Improved Program for Topological Similarity Analysis of Molecules. Masters Thesis, Dept. of Mathematics and Computer Science, Kent State Univ.

Hayashi, S.; Phadatare, S.; Zemlicka, J.; Matsukura, M.; Mitsuya, H.; and Broder, S. 1988. Adenallene and Cytallene: Acyclic Nucleoside Analogues that Inhibit Replication and Cytopathic Effect of Human Immunodeficiency Virus in vitro. Proc. Natl. Acad. Sci. USA 85: 6127-6131.

Herdewijn, P., and De Clercq, E. 1990. Dideoxynucleoside Analogues as Inhibitors of HIV Replication. In Design of Anti-AIDS Drugs, 141-174. De Clercq, E. ed. New York, N.Y.: Elsevier.

Hopfinger, A. J. 1980. A QSAR Investigation of Dihydrofolate Reductase Inhibition by Baker Triazines Based upon Molecular Shape Analysis. J. Amer. Chem. Soc. 102: 7196-7206.

Jeong, L. et al. 1993a. Asymmetric Synthesis and Biological Evaluation of β-L-(2R,5S)- and α-L-(2R,5R)-1,3-Oxathiolane-Pyrimidine and -Purine Nucleosides as Potential Anti-HIV Agents. J. Med. Chem. 36: 181-195.

Jeong, L. et al. 1993b. Structure-Activity Relationships of β-D-(2S,5R)- and α-D-(2S,5S)-1,3-Oxathiolanyl Nucleosides as Potential Anti-HIV Agents. J. Med. Chem. 36: 2627-2638.

Kim, Hea O. et al. 1993. 1,3-Dioxolanylpurine Nucleosides (2R,4R) and (2R,4S) with Selective Anti-HIV-1 Activity in Human Lymphocytes. J. Med. Chem. 36: 30-37.

Lesniewski, M.; Parakulam, R.; Marquis II, M.; and Tsai, C.-c. 1999. Internet Journal of Chemistry. Forthcoming.

Masuda, A. et al. 1993. Synthesis and Antiviral Activity of Adenosine Deaminase-Resistant Oxetanocin A Derivatives: 2-Halogeno-Oxetanocin A. J. Antibiotics 46(6): 1034-1037.

Murakami, K. et al. 1991. Escherichia coli Mediated Biosynthesis and in vitro Anti-HIV Activity of Lipophilic 6-Halo-2’,3’-dideoxypurine Nucleosides. J. Med. Chem. 34: 1606-1612.

Nasr, M.; Cradock, J.; and Johnston, M. I. 1992a. Structure-Activity Correlation of Natural Products with Anti-HIV Activity. In Natural Products as Antiviral Agents, 31-56. Chu, C. K., and Cutler, H. G. eds. New York, N.Y.: Plenum Press.

Nasr, M., and Turk, S. R. 1992b. Computer-Assisted Structure-Activity Correlations of Halodideoxynucleoside Analogs as Potential Anti-HIV Drugs. AIDS Research and Human Retroviruses 8: 135-144.

Tsai, C.-c.; Johnson, M.; Nicholson, V.; and Naim, M. 1987. A Topological Approach to Molecular Similarity Analysis and Its Application. Studies in Physical and Theoretical Chemistry 51: 231-236.