Multiple Formula Approach for Structure-Cytotoxicity/Antiviral Activity Relationship Studies of Nucleoside Analogs.

From: AAAI Technical Report SS-99-01. Compilation copyright © 1999, AAAI (www.aaai.org). All rights reserved.
Mathew L. Lesniewski, Ravi R. Parakulam, Merideth R. Marquis, and
Chun-che Tsai*
Department of Chemistry, Kent State University, Kent, Ohio 44242-0001 USA
E-mail: ctsai@kent.edu Phone:(330) 672-2989 Fax:(330) 672-3816
ABSTRACT
Quantitative structure-activity relationships (QSARs) were developed for a series of purine nucleoside analogs with antiviral activity. The correlation between the chemical structures of these analogs and their toxicity and activity was investigated using molecular similarity analysis. Structure-activity relationship studies and molecular similarity analyses were performed using three molecular descriptors: the number of atoms and bonds of a molecule (NAB), the maximum common substructure (MaCS), and the molecular similarity index (MSI). Antiviral activity was measured as the 50% effective dose (ED50) in µM, and cytotoxicity as the 50% cytotoxic dose (CD50) in µM. The biological activities and MSI values were used to generate a series of correlation equations. The multiple formula approach (MuFA) used the top regression correlation equations, based on several reference compounds, to generate average estimated CD50 and ED50 values for a set of test compounds. The MuFA integrates the effects of structural similarities and dissimilarities in estimating the cytotoxicity and antiviral activity of the test compounds.
Introduction
Human immunodeficiency virus (HIV), the causative agent of acquired immunodeficiency syndrome (AIDS), belongs to the class of viruses called retroviruses. The enzyme reverse transcriptase, which catalyzes the conversion of viral RNA into viral DNA, is a target for the design and development of inhibitors. Clinical use of highly active antiretroviral therapy (HAART) requires a diverse set of chemicals with a variety of activities and modes of action (Cohen 1998); thus a method for rapidly identifying safe and effective compounds is needed.
In this paper we describe a methodology that uses quantitative structure-activity relationships (QSARs) and quantitative molecular similarity analysis (QMSA) to investigate the correlation between structure and activity for purine nucleoside analogs. A multiple-formula approach (MuFA) for similarity-based QSAR is described for estimating the biological activity of a set of purine nucleoside analogs and for identifying new lead compounds.
Methods and Results
Biological Activity Data
A computer database of biological activities was compiled for anti-HIV purine nucleoside analogs tested in the MT4 cell line. The toxicity measurement used was the 50% cytotoxic dose (CD50) in µM, based on a 50% reduction in the viability of mock-infected host cells. The antiviral activity measurement used was the 50% effective dose (ED50) in µM, based on 50% protection of cells against the cytopathic effect of HIV-1. The selectivity index, SI50 = CD50/ED50, indicates the safety of a particular compound: the larger SI50 is, the wider the margin between the dose that protects cells from the virus and the dose that harms the cells themselves.
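As a small illustration of the ratio defined above, the following sketch computes SI50 from the CD50 and ED50 values listed for learning-set compounds 1 and 5 in Table 1. The function name selectivity_index is hypothetical and not part of the original work.

```python
def selectivity_index(cd50_um: float, ed50_um: float) -> float:
    """SI50 = CD50 / ED50. Both doses are in µM, so SI50 itself is unitless."""
    if cd50_um <= 0 or ed50_um <= 0:
        raise ValueError("CD50 and ED50 must be positive doses")
    return cd50_um / ed50_um


# CD50 and ED50 (µM) for learning-set compounds 1 and 5 (Table 1).
print(round(selectivity_index(20.8, 0.059)))   # 353
print(round(selectivity_index(643, 3.80)))     # 169
```

Both results match the SI50 column of Table 1.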
Molecular Descriptors
QSAR requires quantification of both a compound's activity and its chemical structure. A simple descriptor for quantifying chemical structure is the integer NAB, the number of atoms and bonds in a molecule. NAB groups compounds into topological isomer groups (topoisomers) and is the foundation of the other topological descriptors used. The maximum common substructure, MaCS(X, Y), is expressed in terms of NAB and is defined as the substructure common to molecule X and molecule Y such that no other common substructure of X and Y has a greater NAB value. MaCS(X, Y) is allowed to contain isolated atoms and disconnected structural fragments. The molecular similarity index (MSI) describes the degree of similarity between two molecules (Tsai et al. 1987). It is defined as:
$$\mathrm{MSI}(X, Y) = \frac{\mathrm{MaCS}(X, Y)}{\mathrm{NAB}(X)} \times \frac{\mathrm{MaCS}(X, Y)}{\mathrm{NAB}(Y)}$$
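The three descriptors can be sketched on toy molecules as follows. This is a minimal illustration under stated assumptions, not the TOPSIM program used in this work. It assumes hydrogen-suppressed molecular graphs with bond order ignored. Under that convention 2',3'-dideoxyadenosine (17 heavy atoms, three rings, hence 19 bonds) has NAB = 36, within the range of Table 1. MaCS is found by brute-force search over partial, element-preserving atom mappings, which allows isolated atoms and fragments but is only feasible for very small molecules. The 1-propanol/2-propanol pair is a hypothetical example, and all function names are illustrative.

```python
from typing import List, Optional, Tuple

Atoms = List[str]              # element symbol of each atom, indexed 0..n-1
Bonds = List[Tuple[int, int]]  # undirected bonds as pairs of atom indices


def nab(atoms: Atoms, bonds: Bonds) -> int:
    """NAB: number of atoms plus number of bonds (hydrogen-suppressed graph)."""
    return len(atoms) + len(bonds)


def macs(x_atoms: Atoms, x_bonds: Bonds, y_atoms: Atoms, y_bonds: Bonds) -> int:
    """NAB of the largest common substructure, fragments and isolated atoms allowed.

    Brute force: every partial one-to-one, element-preserving mapping of X's atoms
    onto Y's atoms is scored as (mapped atoms) + (bonds of X whose images are bonds of Y).
    """
    y_bond_set = {frozenset(b) for b in y_bonds}
    mapping: List[Optional[int]] = [None] * len(x_atoms)
    used = set()
    best = 0

    def score() -> int:
        n_atoms = sum(m is not None for m in mapping)
        n_bonds = sum(
            1
            for i, j in x_bonds
            if mapping[i] is not None
            and mapping[j] is not None
            and frozenset((mapping[i], mapping[j])) in y_bond_set
        )
        return n_atoms + n_bonds

    def search(i: int) -> None:
        nonlocal best
        if i == len(x_atoms):
            best = max(best, score())
            return
        search(i + 1)  # option 1: leave atom i of X unmatched
        for j, element in enumerate(y_atoms):  # option 2: match it to a free atom of Y
            if j not in used and element == x_atoms[i]:
                used.add(j)
                mapping[i] = j
                search(i + 1)
                used.remove(j)
                mapping[i] = None

    search(0)
    return best


def msi(x_atoms: Atoms, x_bonds: Bonds, y_atoms: Atoms, y_bonds: Bonds) -> float:
    """MSI(X, Y) = [MaCS/NAB(X)] * [MaCS/NAB(Y)]; equals 1 for identical graphs."""
    m = macs(x_atoms, x_bonds, y_atoms, y_bonds)
    return (m / nab(x_atoms, x_bonds)) * (m / nab(y_atoms, y_bonds))


# Hypothetical toy pair: 1-propanol (C-C-C-O chain) vs 2-propanol (branched).
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
isopropanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])
print(nab(*propanol))                           # 7
print(macs(*propanol, *isopropanol))            # 6: all 4 atoms, only 2 bonds match
print(round(msi(*propanol, *isopropanol), 3))   # 0.735 (= 36/49)
```

Because MaCS(X, Y) can never exceed NAB(X) or NAB(Y), MSI lies between 0 and 1 and reaches 1 only when X and Y are topologically identical. This is also why stereoisomers cannot be told apart by this descriptor.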
QSAR Studies
Selection of Learning Set. A learning set of compounds was selected from the compiled database of purine analogs to illustrate a topological approach to pharmacophore modeling, which uses a similarity-based QSAR and a MuFA to estimate the biological activity of new compounds. The selection criteria were: purine nucleoside analogs tested in the MT4 cell line, with SI50 > 20 and NAB < 40. Because this similarity-based QSAR approach is purely topological, stereoisomers with different activities are topologically identical: each stereoisomer, when used as a reference compound, generates the same MSI(X, Y) values. Therefore only the most active stereoisomer from each set of stereoisomers was selected. The resulting learning set contains 15 compounds (Table 1 and Figure 1) (Balizarini and De Clercq 1990; Herdewijn and De Clercq 1990; Nasr, Cradock, and Johnston 1992a; Nasr and Turk 1992b; Masuda et al. 1993).
Table 1 Learning Set Activity Data
NO.  NAB  CD50 (µM)  ED50 (µM)  SI50
1    40   20.8       0.059      353
2    36   148        0.422      350
3    36   945        4.46       226
4    34   27.1       0.136      200
5    36   643        3.80       169
6    38   1314       8.86       148
7    40   486        3.74       130
8    40   196        1.54       127
9    38   404        3.79       106
10   42   271        2.72       100
11   40   237        2.40       99
12   40   360        4.50       80
13   38   486        7.73       63
14   40   788        27.2       29
15   40   489        17.8       27
Figure 1 Learning Set Structures [structure drawings of compounds 1-15 not recoverable from the extracted text]

Figure 2 New Testing Compounds [structure drawings of compounds 4a-4e not recoverable from the extracted text]

Similarity-Based QSAR Analysis. Quantitative structure-activity relationships for the compounds of the learning set were derived using molecular similarity and regression analyses for both the log CD50-MSI and the log ED50-MSI relations. These relations were expressed as the following linear functions:

$$\log \mathrm{ED}_{50}(X) = a + b\,\mathrm{MSI}(X, Y)$$

$$\log \mathrm{CD}_{50}(X) = a + b\,\mathrm{MSI}(X, Y)$$

where a and b are regression coefficients, X is the xth compound in the list, and Y is the reference compound.

Each of the 15 compounds was used in turn as the reference compound (Y) in the calculation of MaCS(X, Y) by the program TOPSIM (Durand 1996). The fifteen resulting sets of MSI(X, Y) values were used to produce 15 ED50 and 15 CD50 regression correlation equations with JMP version 3.2.2. The best correlation for the CD50-MSI relation was found using compound 4 as the reference compound:

Equation C4:
$$\log \mathrm{CD}_{50}(X) = 6.227 - 4.502\,\mathrm{MSI}(X, 4)$$
r² = 0.399, s = 0.414, F(1,13) = 8.63

The best correlation for the ED50-MSI relation was also found using compound 4 as the reference compound:

Equation E4:
$$\log \mathrm{ED}_{50}(X) = 5.367 - 5.948\,\mathrm{MSI}(X, 4)$$
r² = 0.346, s = 0.613, F(1,13) = 6.87
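The regression step can be sketched as follows: one least-squares equation is fitted per reference compound, the reference with the highest r² is kept, and the fitted equation is used for prediction. This sketch rests on assumptions. It takes "log" to mean log10 of the dose in µM, which is consistent with the estimates in Tables 3a and 3b, and it uses ordinary least squares, whereas the paper used JMP 3.2.2, not this code. The names fit_log_activity, best_reference and predict_dose are hypothetical. The MSI lists themselves would come from a MaCS program such as TOPSIM, which is not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def fit_log_activity(msi: List[float], dose_um: List[float]) -> Tuple[float, float, float, float, float]:
    """Least-squares fit of log10(dose) = a + b * MSI; returns (a, b, r2, s, F)."""
    n = len(msi)
    y = [math.log10(d) for d in dose_um]
    mean_x, mean_y = sum(msi) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in msi)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(msi, y))
    syy = sum((v - mean_y) ** 2 for v in y)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_reg = b * sxy                  # sum of squares explained by the fit
    ss_res = syy - ss_reg             # residual sum of squares
    r2 = ss_reg / syy
    s = math.sqrt(ss_res / (n - 2))   # residual standard error
    f = ss_reg / (ss_res / (n - 2))   # F statistic with (1, n - 2) degrees of freedom
    return a, b, r2, s, f


def best_reference(msi_by_reference: Dict[str, List[float]], dose_um: List[float]) -> str:
    """Use each candidate reference compound Y in turn; return the one with the highest r2."""
    return max(msi_by_reference,
               key=lambda ref: fit_log_activity(msi_by_reference[ref], dose_um)[2])


def predict_dose(a: float, b: float, msi_value: float) -> float:
    """Estimated dose in µM from a fitted equation log10(dose) = a + b * MSI."""
    return 10 ** (a + b * msi_value)


# Compound 4 compared with itself has MSI(4, 4) = 1:
print(round(predict_dose(6.227, -4.502, 1.0), 1))   # C4 estimate of CD50: 53.1
print(round(predict_dose(5.367, -5.948, 1.0), 3))   # E4 estimate of ED50: 0.262
```

With the printed coefficients, equations C4 and E4 give about 53.1 µM and 0.262 µM for compound 4. These values are close to, though not identical with, the 53.7 µM and 0.263 µM listed for compound 4 in Tables 3a and 3b. For a one-variable fit, F = r²(n - 2)/(1 - r²). This relation reproduces the reported F(1,13) values from the r² values; for example, 0.399 × 13 / 0.601 ≈ 8.63 for C4.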
Since compound 4 was the best reference compound in the learning set, it was systematically modified to compile a set of new lead compounds. The sites and types of modifications used to develop the new lead compounds were determined by analysis of structure-activity maps (SAMs) of the learning set compounds (Lesniewski et al. 1999). A second similarity-based QSAR analysis was then run using the original learning set, with the new compounds as reference compounds (Y); this produced another set of regression correlations. Figure 2 shows the new compounds that were selected because their log CD50-MSI and log ED50-MSI correlation values were greater than or equal to those of compound 4.
Cross Validation. A cross-validation data set, containing compounds with known activities that were not used in the learning set, was selected to test the validity of the correlation equations (Hayashi et al. 1988; Jeong et al. 1993a and 1993b; Kim et al. 1993; Murakami et al. 1991).
A MuFA was used to estimate the biological activities of the cross-validation set (Figure 3) and of the new compounds 4a, 4b, 4c, 4d and 4e. The resulting MSI(X, Y) values were used to generate estimated CD50 and ED50 values for compounds 4, 4a, 4b, 4c, 4d and 4e and for the 6 cross-validation compounds. Tables 3a, 3b and 3c list the estimated CD50, ED50 and SI50 values, respectively, generated by each of the 6 equations: each column corresponds to the equation used and each row to one compound. The last two columns list the average estimated values and, where available, the observed values. The estimated CD50, ED50 and SI50 values were in good agreement with the observed values.
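Agreement between an estimated and an observed dose can be expressed as a fold difference: the larger of estimated/observed and observed/estimated. This is one natural reading of the "within 2 fold" criterion used in the MuFA discussion. The helper fold_difference is hypothetical. It is applied here to compound 4, using its average estimates (Tables 3a and 3b) and observed values (Table 1). Observed values reported only as bounds, such as >100, cannot be compared this way.

```python
def fold_difference(estimated: float, observed: float) -> float:
    """Fold difference between two positive doses; 1.0 means exact agreement."""
    if estimated <= 0 or observed <= 0:
        raise ValueError("doses must be positive")
    ratio = estimated / observed
    return max(ratio, 1.0 / ratio)


# Compound 4: average MuFA estimates vs observed values, both in µM.
print(round(fold_difference(52.3, 27.1), 2))    # CD50: 1.93
print(round(fold_difference(0.257, 0.136), 2))  # ED50: 1.89
```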
Table 2a log CD50(X) = a + b MSI(X, Y) Equations
Equ.  Y    r²     s      F(1,13)  a      b
C4a   4a   0.692  0.295  29.1     7.353  -6.389
C4b   4b   0.690  0.296  29.0     7.310  -6.047
C4c   4c   0.574  0.347  17.6     7.142  -5.815
C4d   4d   0.537  0.362  15.0     6.528  -5.103
C4e   4e   0.399  0.412  8.64     6.137  -4.537

Table 2b log ED50(X) = a + b MSI(X, Y) Equations
Equ.  Y    r²     s      F(1,13)  a      b
E4a   4a   0.580  0.491  17.9     6.775  -8.334
E4b   4b   0.602  0.478  19.7     6.847  -8.047
E4c   4c   0.468  0.552  11.5     6.416  -7.480
E4d   4d   0.522  0.524  14.2     6.108  -7.171
E4e   4e   0.345  0.613  6.85     5.265  -6.012

Figure 3 Cross Validation Set [structure drawings of compounds 16-21 not recoverable from the extracted text]

Table 3a Estimated and observed CD50 values in µM
Cmpd  C4    C4a   C4b   C4c   C4d   C4e   Avg CD50  Obs.
4     53.7  43.3  39.7  44.6  91.7  54.1  52.3      27.1
4a    158   9.20  38.1  42.9  88.7  157   56.7      N/A
4b    95.0  19.9  18.3  44.2  49.4  95.0  43.9      N/A
4c    95.0  19.9  39.3  21.2  90.9  95.0  48.7      N/A
4d    158   41.5  38.1  86.0  26.6  157   67.0      N/A
4e    72.6  63.7  58.4  64.7  124   39.8  66.6      N/A
21    452   664   607   615   811   457   588       >300
16    428   618   566   575   767   255   506       1000
18    167   185   169   180   292   168   189       >100
17    167   185   169   180   90.9  168   156       >100
19    270   85.6  78.5  169   158   270   153       >100
20    270   85.6  78.5  169   49.0  270   126       >100

Table 3b Estimated and observed ED50 values in µM
Cmpd  E4     E4a    E4b    E4c    E4d    E4e    Avg ED50  Obs.
4     0.263  0.208  0.176  0.224  0.492  0.269  0.257     0.136
4a    1.11   0.028  0.167  0.213  0.470  1.11   0.288     N/A
4b    0.562  0.076  0.063  0.221  0.207  0.57   0.203     N/A
4c    0.562  0.076  0.174  0.086  0.487  0.57   0.237     N/A
4d    1.11   0.197  0.167  0.521  0.087  1.11   0.350     N/A
4e    0.393  0.345  0.295  0.362  0.761  0.18   0.354     N/A
20    2.26   0.507  0.437  1.24   0.204  2.27   0.813     0.30
19    2.26   0.507  0.437  1.24   1.06   2.27   1.07      2.80
17    1.19   1.38   1.22   1.35   0.487  1.21   1.08      0.500
18    1.19   1.38   1.22   1.35   2.51   1.21   1.42      1.01
16    4.18   6.69   6.06   6.01   9.75   2.10   5.25      10.0
21    4.50   7.32   6.65   6.55   10.5   4.54   6.40      5.50

Multiple-Formula Approach (MuFA). The average estimated CD50, ED50 and SI50 values were generated by a multiple-formula approach. This approach compensates for the inability of a single activity(X)-MSI(X, Y) correlation equation to distinguish between compounds that have the same MSI(X, Y) value. For example, compounds 4b and 4c both receive an estimated CD50 of 95.0 µM from equation C4 but receive different estimates from C4b, C4c and C4d. In estimating the biological activity of a compound, the MuFA uses multiple correlation equations to integrate the effects of structural similarity and dissimilarity between a test compound and multiple reference compounds. The estimated CD50 and ED50 values generated by each of the 6 correlation equations were combined by taking their geometric mean. The average estimated CD50 and ED50 values were then used to calculate the average estimated SI50 (average CD50 divided by average ED50) for the cross-validation set.

Table 3c Estimated and observed SI50 values (dimensionless)
Cmpd  SI4  SI4a  SI4b  SI4c  SI4d  SI4e  Avg SI50  Obs.
4     204  208   225   199   186   201   204       200
4a    143  333   228   201   189   142   197       N/A
4b    169  263   290   200   239   167   217       N/A
4c    169  263   226   246   187   167   206       N/A
4d    143  210   228   165   307   142   191       N/A
4e    185  185   198   179   164   222   188       N/A
20    119  169   180   136   240   119   155       >333
17    140  134   139   133   187   139   144       >200
19    119  169   180   136   149   119   144       >35.7
18    140  134   139   133   116   139   133       >99.0
16    102  92.5  93.4  95.7  78.7  121   96.5      100
21    100  90.6  91.2  94.0  76.9  100   91.9      >55.0

Using a MuFA, the estimated CD50, ED50 and SI50 values were within 2-fold of the observed values for all 6 compounds in the cross-validation set. The MuFA was based on a learning set containing only the active compounds of a particular type, and it is effective only for identifying new lead compounds that are topologically similar to the reference compounds used. This MuFA was not intended for estimating the activity of less active compounds.
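The MuFA averaging step described above can be sketched as a geometric mean over the per-equation estimates, followed by the ratio of the two averages for SI50. The input values are the six CD50 and six ED50 estimates for compound 4 from Tables 3a and 3b. The function names are hypothetical; Python's statistics.geometric_mean would serve equally well.

```python
import math
from typing import Sequence, Tuple


def geometric_mean(values: Sequence[float]) -> float:
    """exp of the mean natural log, i.e. the antilog of the average log dose."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def mufa_average(cd50_estimates: Sequence[float],
                 ed50_estimates: Sequence[float]) -> Tuple[float, float, float]:
    """Average the per-equation estimates and derive SI50 from the two averages."""
    avg_cd50 = geometric_mean(cd50_estimates)
    avg_ed50 = geometric_mean(ed50_estimates)
    return avg_cd50, avg_ed50, avg_cd50 / avg_ed50


# Per-equation estimates (µM) for compound 4: C4..C4e (Table 3a), E4..E4e (Table 3b).
cd50_4 = [53.7, 43.3, 39.7, 44.6, 91.7, 54.1]
ed50_4 = [0.263, 0.208, 0.176, 0.224, 0.492, 0.269]
cd50, ed50, si50 = mufa_average(cd50_4, ed50_4)
print(round(cd50, 1), round(ed50, 3), round(si50))  # 52.3 0.257 204
```

The three results reproduce the Avg columns of Tables 3a, 3b and 3c for compound 4. Taking the geometric rather than the arithmetic mean is equivalent to averaging the log-dose predictions of the six equations, which matches the log scale on which the equations were fitted.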
Conclusions
In this study we describe a topological approach to pharmacophore mapping of drug-biological response. A similarity-based QSAR was developed that uses pairwise comparisons within a set of known compounds to determine an optimized reference compound R. MaCS(X, R) expresses the largest topological commonality between compounds X and R, which forms an optimized pharmacophore, and MSI(X, R) is the index of similarity between X and R. Because MSI(X, R) is maximized when X is equal to R, compound R represents the compound with an optimal biological response. New reference compounds R', structurally similar to R, can be constructed with improved biological response. These compounds R' contain modifications to R based on observations made in the various SAMs, and can be selected on the basis of improved r² values for the regression correlation equations. This type of pharmacophore modeling can be viewed as a topological analog of molecular shape analysis modeling (Hopfinger 1980). The advantage of this topological approach to pharmacophore mapping is that new lead compounds with improved biological response can be generated from the chemical structures and biological responses of a set of known drugs, without knowledge of the drug-bioreceptor interaction mechanisms.
References
Balizarini, J., and De Clercq, E. 1990. Acyclic and Carbocyclic Nucleoside Analogues as Inhibitors of HIV Replication. In De Clercq, E., ed., Design of Anti-AIDS Drugs, 175-194. New York, N.Y.: Elsevier.
Cohen, J. 1998. Exploring How to Get at and Eradicate Hidden HIV. Science 278: 1854-1855.
Durand, P. J. 1996. An Improved Program for Topological Similarity Analysis of Molecules. Master's thesis, Dept. of Mathematics and Computer Science, Kent State Univ.
Hayashi, S.; Phadatare, S.; Zemlicka, J.; Matsukura, M.; Mitsuya, H.; and Broder, S. 1988. Adenallene and Cytallene: Acyclic Nucleoside Analogues That Inhibit Replication and Cytopathic Effect of Human Immunodeficiency Virus in vitro. Proc. Natl. Acad. Sci. USA 85: 6127-6131.
Herdewijn, P., and De Clercq, E. 1990. Dideoxynucleoside Analogues as Inhibitors of HIV Replication. In De Clercq, E., ed., Design of Anti-AIDS Drugs, 141-174. New York, N.Y.: Elsevier.
Hopfinger, A. J. 1980. A QSAR Investigation of Dihydrofolate Reductase Inhibition by Baker Triazines Based upon Molecular Shape Analysis. J. Am. Chem. Soc. 102: 7196-7206.
Jeong, L., et al. 1993a. Asymmetric Synthesis and Biological Evaluation of β-L-(2R,5S)- and α-L-(2R,5R)-1,3-Oxathiolane-pyrimidine and -purine Nucleosides as Potential Anti-HIV Agents. J. Med. Chem. 36: 181-195.
Jeong, L., et al. 1993b. Structure-Activity Relationships of β-D-(2S,5R)- and α-D-(2S,5S)-1,3-Oxathiolanyl Nucleosides as Potential Anti-HIV Agents. J. Med. Chem. 36: 2627-2638.
Kim, H. O., et al. 1993. 1,3-Dioxolanylpurine Nucleosides (2R,4R) and (2R,4S) with Selective Anti-HIV-1 Activity in Human Lymphocytes. J. Med. Chem. 36: 30-37.
Lesniewski, M.; Parakulam, R.; Marquis II, M.; and Tsai, C.-c. 1999. Internet Journal of Chemistry. Forthcoming.
Masuda, A., et al. 1993. Synthesis and Antiviral Activity of Adenosine Deaminase-Resistant Oxetanocin A Derivatives: 2-Halogeno-Oxetanocin A. J. Antibiotics 46(6): 1034-1037.
Murakami, K., et al. 1991. Escherichia coli Mediated Biosynthesis and in vitro Anti-HIV Activity of Lipophilic 6-Halo-2',3'-dideoxypurine Nucleosides. J. Med. Chem. 34: 1606-1612.
Nasr, M.; Cradock, J.; and Johnston, M. I. 1992a. Structure-Activity Correlation of Natural Products with Anti-HIV Activity. In Chu, C. K., and Cutler, H. G., eds., Natural Products as Antiviral Agents, 31-56. New York, N.Y.: Plenum Press.
Nasr, M., and Turk, S. R. 1992b. Computer-Assisted Structure-Activity Correlations of Halodideoxynucleoside Analogs as Potential Anti-HIV Drugs. AIDS Research and Human Retroviruses 8: 135-144.
Tsai, C.-c.; Johnson, M.; Nicholson, V.; and Naim, M. 1987. A Topological Approach to Molecular Similarity Analysis and Its Application. Studies in Physical and Theoretical Chemistry 51: 231-236.