TABLE OF CONTENTS
CHAPTER
TITLE
PAGE
DECLARATION
ii
DEDICATION
iii
ACKNOWLEDGEMENT
iv
ABSTRACT
v
ABSTRAK
vi
TABLE OF CONTENTS
vii
LIST OF TABLES
xi
LIST OF FIGURES
xiii
LIST OF SYMBOLS
xv
LIST OF ACRONYMS
xvii
LIST OF APPENDICES
xix
1
INTRODUCTION
1.1
Introduction
1
1.2
Quantitative Structure Activity Relationship (QSAR)
2
1.3
History and Development of QSAR
4
1.3.1 Data Set
7
1.3.2 Descriptors
8
1.3.2.1 Topological Descriptors
10
1.3.2.2 Electronic Descriptors
11
1.3.2.3 Geometric Descriptors
11
1.4
Feature Selection
12
1.4.1 Genetic Algorithm (GA)
14
1.5
Tools and Techniques of QSAR
14
1.5.1 Multiple Linear Regression Analysis
15
1.5.2 Partial Least Squares
17
1.6
Applications of QSAR
20
1.7
Overview of Multidrug Resistance Mycobacterium tuberculosis
22
1.7.1 Mycobacterium tuberculosis
23
1.7.2 How Does Tuberculosis Spread
24
1.8
Minimum Inhibition Concentration (MIC)
25
1.8.1 Escherichia coli
26
1.9
Database Mining
27
1.10
Research Scope
28
1.11
Research Objectives
28
1.12
Significance of Research
29
1.13
Layout of the Thesis
29
2
RESEARCH METHODOLOGY
2.1
Introduction
31
2.2
Data Set
32
2.3
Structure Entry and Molecular Modeling
33
2.4
Descriptor Generation
33
2.5
Feature Selection
35
2.5.1 Objective Feature Selection
35
2.5.2 Subjective Feature Selection
36
2.6
Model Development
37
2.6.1 Multiple Linear Regression Analysis
38
2.6.2 Partial Least Squares
39
2.7
Model Validation
40
2.8
Application of QSAR Models to Database Mining
42
2.8.1 Molecular Descriptors and Similarity Calculation
44
2.8.2 Applicability Domain of QSAR Models
45
2.8.3 Biological Activity Predicted Using QSAR Models
46
2.9
Laboratory Testing
46
2.9.1 Material and Method of Agar Diffusion
47
3
DEVELOPMENT OF QSAR MODELS AND DATABASE
MINING FOR ANTI BACTERIAL AGENTS
3.1
Introduction
49
3.2
Selection of Descriptors and Feature Selection
49
3.3
Model Development Using MLRA Method
54
3.4
Model Development Using PLS Method
57
3.5
Model Validation
61
3.6
Application of QSAR Models to Database Mining
63
3.6.1 Application of QSAR Models in AmicBase Database Mining (without Scaling)
66
3.6.2 Application of QSAR Models in AmicBase Database Mining (with Scaling)
68
3.7
Experimental Validation
71
3.8
Effects of Range Scaling and Applicability Domain to Search New Agents
74
4
DEVELOPMENT OF QSAR MODELS AND DATABASE
MINING FOR ANTI TUBERCULOSIS AGENTS
4.1
Introduction
76
4.2
Descriptor Generation and Objective Feature Selection
76
4.3
Development of QSAR Models by using MLRA Method
80
4.4
Development of QSAR Models by using PLS Technique
83
4.5
Model Validation
86
4.6
Application of QSAR Models to Database Mining
89
4.6.1 Application of QSAR Models in AmicBase Database Mining (without Scaling)
90
4.6.2 Application of QSAR Models in AmicBase Database Mining (with Scaling)
93
4.6.3 Effects of Applicability Domain to Search New Agents
96
4.7
Experimental Validation
96
5
CONCLUSIONS AND RECOMMENDATION
5.1
Introduction
100
5.2
Conclusion
100
5.3
Limitation of the study
101
5.4
Future Research Recommendation
102
REFERENCES
103
APPENDIX A
110
APPENDIX B
114
APPENDIX C
121
LIST OF TABLES
TABLE NO.
TITLE
PAGE
2.1
Type of descriptors in TSAR
34
3.1
List of selected descriptors and their statistical analysis
50
3.2
Correlation matrix of descriptors
52
3.3
Statistical output of MLRA model
54
3.4
Descriptors which were included in the QSAR model by using MLRA
55
3.5
Statistical analysis of MLRA method
57
3.6
Statistical output of GA-PLS for each dimension
58
3.7
Statistical output of PLS model
59
3.8
Descriptors which were included in the QSAR model by using PLS
60
3.9
Calculated MIC for compounds in the prediction set
62
3.10
List of probe compounds for database mining
64
3.11
Selected compounds with predicted MIC value
67
3.12
Selected compounds with their predicted biological activity
70
3.13
MIC value of selected compounds (without scaling) using agar diffusion method
71
3.14
MIC value of selected compounds (with scaling) using agar diffusion method
73
4.1
List of selected descriptors and their statistical analysis
77
4.2
Correlation matrix of descriptors
78
4.3
Statistical output of MLRA model
81
4.4
Descriptors which were included in the MLRA model
81
4.5
Statistical plot output of GA-PLS for each dimension
83
4.6
Statistics of the PLS model
84
4.7
Descriptors which were included in the PLS model
86
4.8
Calculated MIC for compounds in the prediction set
87
4.9
Selected compounds with their predicted anti tuberculosis activity
90
4.10
List of probe compounds for database mining
91
4.11
Selected compounds with their predicted MIC value
95
4.12
MIC value of selected compounds (without scaling) using agar diffusion method
97
4.13
MIC value of selected compounds (with scaling) using agar diffusion method
98
LIST OF FIGURES
FIGURE NO.
TITLE
PAGE
1.1
The general QSAR problem
9
1.2
Flow diagram for the genetic algorithm (GA)
15
1.3
Illustration of the difference between PCR and PLS
19
1.4
Structure of E. coli
26
2.1
General QSAR methodology
32
2.2
Genetic algorithm process
38
2.3
Flowchart for the general model building process in QSAR studies
41
2.4
Flowchart of database mining that employs predictive QSAR models
43
3.1
Plot of experimental vs. predicted MIC for MLRA model
56
3.2
Plot of predicted value vs. standard residual for MLRA model
56
3.3
Plot of PRESS vs. No. of components
58
3.4
Plot of experimental vs. predicted MIC for PLS model
59
3.5
Plot of predicted value vs. standard residual for PLS model
61
3.6
Flowchart to select new compounds in AmicBase database
66
3.7
Flowchart to select new compounds in AmicBase database
69
3.8
Inhibition zone of E. coli using (a) m-cresol and (b) eugenol methyl ether
72
3.9
Inhibition zone of E. coli using selective compounds
74
4.1
Plot of experimental value vs. predicted MIC for MLRA
82
4.2
Plot of predicted value vs. standard residual for MLRA model
82
4.3
Plot of PRESS vs. No. of components
84
4.4
Plot of experimental vs. predicted MIC for PLS model
85
4.5
Plot of predicted value vs. standard residual for PLS model
85
4.6
Step to select new compounds against M. tuberculosis
94
4.7
Inhibition zone of active and inactive agents
97
LIST OF SYMBOLS
a, b, c, d
-
Regression coefficient
b̂, b̃
-
Regression vector
b̃̂
-
The estimate of b̃
ĉ
-
Activity of unknown compounds
DT
-
Applicability domain
Es
-
Steric component
ρ
-
Proportionality reaction constant
σ
-
Electronic properties of aromatic compounds, standard deviation of Euclidean distance
π
-
Hydrophobicity of substituents
px
-
Partition coefficients of derivative molecule
pH
-
Partition coefficients of parent molecule
r²
-
How closely the equation fits the data
r² (CV)
-
Predictive power of the model
runk
-
Matrix of the known descriptor
χ
-
Molecular connectivity indices
X
-
Mean value
y
-
Activity observed value
ȳ
-
Mean value, average Euclidean distance
ŷ
-
Predicted value
C
-
Concentration of molecule
D
-
Distance matrix
F
-
Degrees of freedom
R
-
Matrix of descriptor
RT
-
Pseudo-inverse of matrix descriptor
S
-
A diagonal matrix, standard error of the regression model
s.d
-
Standard deviation
U
-
Score matrix from PCA
V
-
Matrix containing the loading
W
-
Wiener index
Z
-
An arbitrary parameter to control the significance level
LIST OF ACRONYMS
BC3
-
Benzo[c]quinolizin-3-ones
CADD
-
Computer assisted drug design
CAMD
-
Computer assisted molecular design
DAT
-
Dopamine transporter
EC50
-
Effect concentration
ED
-
Euclidean distance
EDCs
-
Endocrine disrupting chemicals
EIEC
-
Enteroinvasive
EPEC
-
Enteropathogenic
ETEC
-
Enterotoxigenic
GA
-
Genetic algorithm
GA-MLRA
-
Genetic algorithm-multiple linear regression analysis
GA-PLS
-
Genetic algorithm partial least squares
GSA
-
Genetic simulated annealing
HOMO
-
Highest occupied molecular orbital
IC50
-
Inhibition concentration
KNN
-
K-nearest neighbor
LDA
-
Linear discriminant analysis
LFER
-
Linear free energy relationship
LUMO
-
Lowest unoccupied molecular orbital
MDR
-
Multi drug resistant
MIC
-
Minimum inhibition concentration
MLRA
-
Multiple linear regression analysis
MLR
-
Multivariate linear regression
MRA
-
Multiple regression analysis
NCI
-
National Cancer Institute
PCA
-
Principal component analysis
PCR
-
Principal component regression
PLS
-
Partial least squares
PRESS
-
Predictive sum of squares
QSAR
-
Quantitative structure activity relationship
QSPR
-
Quantitative structure property relationship
RSS
-
Residual sum of squares
TCH
-
Thiophene-2-carboxylic acid hydrazide
SSR
-
Sum of squares
SST
-
Total sum of squares
VTEC
-
Verotoxigenic
VOCs
-
Volatile organic compounds