Receptor-Based Design

An Introduction to QSAR
Dr. Bahram Hemmateenejad
Chemistry Department
Shiraz University
Computer-Aided Molecular Design (CAMD)
• Computer-Aided Ligand Design (CALD)
• Computer-Aided Drug Design (CADD)
Approaches in CAMD
• Receptor-based design
  - Known receptor: protein binding site or supramolecular host
• Ligand-based design
  - Known set of ligands
  - Unknown receptor
Receptor-Based Design
Build or find the key that fits the lock
Receptor-Based Design
• Docking
• Interaction energy
• Molecular alignment
• Pharmacophore modeling
Ligand-Based Design
• Quantitative structure-activity relationship (QSAR)
• Quantitative structure-property relationship (QSPR)
• Quantitative structure-toxicity relationship (QSTR)
• Quantitative structure-retention relationship (QSRR)
• Quantitative structure-migration relationship (QSMR)
• Quantitative structure-electrochemistry relationship (QSER)
• Quantitative structure-function relationship (QSFR)
QSAR/QSPR
• Definition: prediction of the biological activities or chemical properties of organic compounds from their molecular structures by means of mathematical equations
  $Y = f(X_i)$, where Y is the observed biological activity and X_i are the molecular descriptors
• Prediction
Ligand-Based Molecular Design
Infer binding pocket
QSAR
• What to achieve
  - Estimate the values of unknown physical/chemical/biological properties of compounds based on known or computationally accessible properties
• How to achieve it
  - Determine the value of the response variable y as a function of the descriptor variables x_i:
    $y_{ik} = F(A_i, x_k) + B_{ik}$
What do we need for QSAR models?
• A dataset of compounds with known biological activity
• Descriptors that can be obtained for all members of the dataset
• Algorithms for developing the QSAR model
• A validation protocol for evaluating the model
Requirements for QSAR datasets
• Compounds should
  - belong to a congeneric series (more important in 2D QSAR)
  - have the same mechanism of action
  - have a comparable binding mode
  - have a biological activity that correlates with binding affinity
Requirements for QSAR datasets (continued)
• Compounds should have a numerical biological response measured
  - in the same organism/tissue/cell/protein
  - using the same type of measurement (binding/functional/IC50/Ki, etc.)
  - using the same protocol (radioligand, activator, cofactor, pH, buffer, etc.)
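QSAR Origin
Linear Free-Energy Relationships (LFER)
• Hammett equation: $\log \frac{K}{K_0} = \rho\sigma$, where K and K_0 are the equilibrium constants of the substituted and unsubstituted compounds, σ is the substituent constant and ρ is the reaction constant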
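As a worked illustration of the Hammett relation, the sketch below uses the convention that ρ = 1 for the ionization of substituted benzoic acids in water at 25 °C, so σ follows directly from two pKa values (pKa = −log10 Ka). The pKa numbers and the ρ = 2.0 reaction series are hypothetical, and the function names are illustrative only.

```python
def hammett_sigma(pka_substituted: float, pka_parent: float) -> float:
    """Substituent constant from benzoic-acid pKa values (rho = 1 by definition).

    log10(K/K0) = rho * sigma and pKa = -log10(Ka), so with rho = 1:
    sigma = pKa(parent) - pKa(substituted).
    """
    return pka_parent - pka_substituted


def log_k_ratio(sigma: float, rho: float) -> float:
    """Predicted log10(K/K0) for a reaction series with reaction constant rho."""
    return rho * sigma


# Hypothetical pKa values, chosen only to illustrate the arithmetic
sigma = hammett_sigma(pka_substituted=3.50, pka_parent=4.20)
print(round(sigma, 2))                        # 0.7
print(round(log_k_ratio(sigma, rho=2.0), 2))  # 1.4
```

A positive σ means the substituent makes the acid stronger than the parent, i.e. it is electron-withdrawing.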
Free-Wilson analysis
• $\log \frac{1}{C} = \sum_i a_i + \mu$
  - a_i: contributions of the substituents (R1, R2, etc.)
  - μ: activity contribution of the reference compound
[Figure: parent structure with substitution positions R1, R2 and R3]
Free-Wilson analysis (example)
[Table: indicator (0/1) matrix for compounds 1 to 5, marking the presence of OH, Me and Cl substituents at positions R1, R2 and R3]
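A minimal sketch of fitting the Free-Wilson model by ordinary least squares, assuming each compound is encoded as a row of 0/1 indicators: one column per non-reference substituent at each position, with the reference group (here H) getting no column, which keeps the system non-singular. The data set is hypothetical and was generated from known contributions so the fit can be checked; the normal equations are solved in plain Python only to keep the example dependency-free.

```python
def solve_linear(a, b):
    """Solve the square system a . x = b (Gauss-Jordan elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: indicator columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def free_wilson(indicators, log_inv_c):
    """Fit log 1/C = sum_i a_i * I_i + mu by ordinary least squares.

    indicators: one 0/1 row per compound (1 = substituent present);
    the reference substituent at each position gets no column.
    Returns (mu, [a_1, ..., a_m]).
    """
    rows = [[1.0] + [float(v) for v in ind] for ind in indicators]  # leading 1 -> mu
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, log_inv_c)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return beta[0], beta[1:]


# Hypothetical series: R1 in {H, Cl}, R2 in {H, Me}; H is the reference at both positions
# Indicator columns: [Cl at R1, Me at R2]
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
log_inv_c = [2.0, 2.5, 2.3, 2.8]
mu, (a_cl, a_me) = free_wilson(X, log_inv_c)
print(round(mu, 3), round(a_cl, 3), round(a_me, 3))  # 2.0 0.5 0.3
```

Here μ is the activity of the unsubstituted reference compound, matching the definition given for the model.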
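Hansch Analysis
Official birth of QSAR
$\log \frac{1}{C} = a(\log P)^2 + b \log P + c\sigma + \dots + k$
C = molar concentration producing a standard biological effect
P = partition coefficient (usually n-octanol/water)
σ = Hammett electronic substituent constant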
• Linear Hansch model
  $\log \frac{1}{C} = a \log P + b\sigma + c\,\mathrm{MR} + \dots + k$
• Nonlinear Hansch models
  $\log \frac{1}{C} = a(\log P)^2 + b \log P + c\sigma + \dots + k$
  $\log \frac{1}{C} = a \log P - b \log(\beta P + 1) + c\sigma + \dots + k$
• Mixed Hansch/Free-Wilson model
  $\log \frac{1}{C} = a(\log P)^2 + b \log P + c\sigma + \dots + \sum_i a_i + k$
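For the parabolic Hansch model, setting the derivative with respect to log P to zero, $2a \log P + b = 0$, gives the optimum lipophilicity $\log P_0 = -b/(2a)$, which is a maximum only when a < 0. The sketch below evaluates this; the coefficients are hypothetical, not fitted values from any study, and terms beyond cσ are omitted.

```python
def hansch_parabolic(log_p, sigma, a, b, c, k):
    """log 1/C = a*(log P)**2 + b*log P + c*sigma + k (further terms omitted)."""
    return a * log_p ** 2 + b * log_p + c * sigma + k


def optimal_log_p(a, b):
    """d(log 1/C)/d(log P) = 2a*log P + b = 0  ->  log P0 = -b / (2a)."""
    if a >= 0:
        raise ValueError("a must be negative for a maximum in log P")
    return -b / (2 * a)


# Hypothetical coefficients, for illustration only
a, b, c, k = -0.2, 1.0, 0.8, 2.0
lp0 = optimal_log_p(a, b)
print(lp0)                                              # 2.5
print(round(hansch_parabolic(lp0, 0.0, a, b, c, k), 3))  # 3.25
```

Compounds more or less lipophilic than log P0 are predicted to be less active, which is the behaviour the squared term is meant to capture.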
Complementarity principles in the binding of molecules to macromolecular targets
Interaction    | Property         | Descriptors
Steric         | Topology         | Distance, volume, surface
Electrostatic  | Electron density | σ, partial charges, quantum chemical descriptors
Hydrophobic    | Lipophilicity    | log P, π
van der Waals  | Polarizability   | MR, parachor
Descriptor types for QSAR
• Substituent variables: properties of the substituents only
• Molecule variables: properties of the whole molecule
• Interaction variables: properties of a given interaction
Descriptors for QSAR
• Constitutional: MW, number of heteroatoms (N_heteroatoms), number of atoms (N_atoms)
• Topological: connectivity indices, Wiener index, E-state indices
• Electrostatic: polarity, dipole moments, partial charges
• Geometrical: distances, molecular volume, PSA
• Quantum chemical
  - HOMO and LUMO energies
  - Vibrational frequencies
  - Bond orders
  - Energies, enthalpies, entropy
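The Wiener index listed among the topological descriptors is the sum, over all pairs of non-hydrogen atoms, of the number of bonds on the shortest path between them. A small sketch, assuming the hydrogen-suppressed structure of a connected molecule is given as an atom count plus a list of bonded index pairs (an ad hoc input format, not that of any particular cheminformatics toolkit):

```python
from collections import deque
from itertools import combinations


def wiener_index(n_atoms, bonds):
    """Sum of shortest-path (bond-count) distances over all pairs of heavy atoms."""
    adjacency = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    def distances_from(start):
        # Breadth-first search: the graph is unweighted, so BFS gives shortest paths
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nb in adjacency[atom]:
                if nb not in dist:
                    dist[nb] = dist[atom] + 1
                    queue.append(nb)
        return dist

    all_dist = [distances_from(i) for i in range(n_atoms)]
    return sum(all_dist[i][j] for i, j in combinations(range(n_atoms), 2))


# Hydrogen-suppressed carbon skeletons
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]
print(wiener_index(4, n_butane))   # 10
print(wiener_index(4, isobutane))  # 9
```

The branched isomer has the smaller index, which illustrates how W encodes molecular branching.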
Descriptors for QSAR (continued)
• 3D descriptors
  - MEP: Molecular Electrostatic Potential
  - MLP: Molecular Lipophilicity Potential
  - GRID: total interaction energy, the sum of steric (Lennard-Jones), H-bonding and electrostatic terms
  - CoMFA: standard fields are steric and electrostatic; additional fields include H-bonding, indicator, parabolic and others
Conditions for applicability of QSAR
• Selection of compounds
  - The same mechanism of action
  - Homogeneity
  - Representativity
  - Experimental design
• Biological data
  - High quality and reliable
  - Same protocol and same laboratory
  - The level of experimental error
Conditions for applicability of QSAR (continued)
• Type of data
  - Continuous
  - Discrete
• Data scaling and transformation
  - Logarithmic transformation
  - Normalization
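The two transformations listed above can be sketched as follows. Converting a potency concentration C (mol/L) to log(1/C) gives the response form used in the Free-Wilson and Hansch equations; for normalization, autoscaling (zero mean, unit sample standard deviation) and range scaling to [0, 1] are shown as two common choices, not necessarily the ones intended in the slides. The IC50 values are hypothetical.

```python
import math
import statistics


def log_inverse(concentrations_molar):
    """Logarithmic transformation of a potency endpoint: log10(1/C), C in mol/L."""
    return [math.log10(1.0 / c) for c in concentrations_molar]


def autoscale(values):
    """Normalize to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def range_scale(values):
    """Normalize to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


ic50 = [1e-6, 1e-7, 1e-8]              # hypothetical IC50 values, mol/L
print(log_inverse(ic50))               # approximately [6.0, 7.0, 8.0]
print(autoscale([2.0, 4.0, 6.0]))      # [-1.0, 0.0, 1.0]
print(range_scale([2.0, 4.0, 6.0]))    # [0.0, 0.5, 1.0]
```

Scaling parameters (mean, standard deviation, minimum, maximum) would normally be computed on the training set only and then reused for test compounds.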
Conditions for applicability of QSAR (continued)
• Descriptors
  - As meaningful as possible
  - Interpretable
  - Calculation simplicity
  - Calculation uncertainty
  - Software reliability
QSAR Steps
1. Formulation of classes of similar compounds
• Ideal situation: classes of chemically and biologically similar compounds
• All the compounds should be structurally similar and act through the same mode of action
• The compounds must be dissimilar enough to cause some systematic change in biological activity
QSAR Steps
2. Quantitative description of structural variations (descriptor calculation)
• Usually several descriptors are required
• It is difficult to predict which descriptors will be useful
• It is convenient to have a set of independent descriptors
QSAR Steps
3. Selection of training set compounds
• Training set: used to develop and optimize the model
  - Calibration set: used to calculate the model coefficients
  - Validation set: used to validate the constructed model
• External test set
  - Makes no contribution to the model development step
  - Measures the overall predictive ability of the proposed model
• Selection criteria
  - Random
  - Experimental design
  - Classification methods: PCA, classification and regression tree (CART), SIMCA, …
[Figure: example of a PCA-based selection method]
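As one concrete example of the experimental-design criterion, the Kennard-Stone algorithm (swapped in here for illustration; the slides do not name a specific algorithm) picks training compounds that cover descriptor space evenly: it starts from the two most distant compounds and repeatedly adds the compound farthest from its nearest already-selected neighbour. A sketch assuming Euclidean distance on already-scaled descriptors and Python 3.8+ for `math.dist`; the coordinates are hypothetical.

```python
import math


def kennard_stone(x, n_select):
    """Return indices of n_select rows of x chosen by the Kennard-Stone algorithm."""
    n = len(x)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of compounds")
    d = [[math.dist(a, b) for b in x] for a in x]

    # Start with the two compounds farthest apart in descriptor space
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda ij: d[ij[0]][ij[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Repeatedly add the compound whose nearest selected neighbour is farthest away
    while len(selected) < n_select:
        best = max(remaining, key=lambda i: min(d[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical two-descriptor coordinates for five compounds
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
train = kennard_stone(X, 3)
test = [i for i in range(len(X)) if i not in train]
print(train, test)   # [0, 4, 3] [1, 2]
```

The compounds left unselected form the test set, so by construction they lie inside the region spanned by the training compounds.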
QSAR Steps
4. QSAR model development (data analysis)
• Regression method
• Variable selection method
Regression methods
• Linear regression
  - Preferred for simplicity and ease of calculation
  - More descriptive
• Nonlinear regression
  - Usually complex
  - Higher prediction ability
  - High risk of over-fitting
Linear regression
• Multiple linear regression (MLR)
  - The simplest and most widely used method
  - More interpretable
  - Affected by collinearity among descriptors
  - Limited in the number of variables that can be considered in the model
• Factor-analysis-based methods
  - Principal component regression (PCR)
  - Partial least squares (PLS)
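A minimal MLR sketch fitting a linear Hansch-type equation by least squares, with the variance inflation factor (VIF) added as one common collinearity diagnostic. The VIF is not mentioned in the slides, and thresholds of about 5 or 10 are only rules of thumb. The data are hypothetical, generated from 1.0 + 0.5 log P + 1.2σ so the recovered coefficients can be checked.

```python
def solve_linear(a, b):
    """Solve a . x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mlr_fit(x_rows, y):
    """MLR with intercept via the normal equations; returns [b0, b1, ..., bp]."""
    rows = [[1.0] + list(r) for r in x_rows]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def mlr_predict(beta, x_rows):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in x_rows]


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def vif(x_rows, j):
    """Variance inflation factor of descriptor j: 1 / (1 - R^2 of j on the others)."""
    target = [r[j] for r in x_rows]
    others = [[v for k, v in enumerate(r) if k != j] for r in x_rows]
    r2 = r_squared(target, mlr_predict(mlr_fit(others, target), others))
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


# Hypothetical Hansch-type data: columns are [log P, sigma]
X = [[1.0, 0.0], [2.0, 0.2], [3.0, -0.1], [1.5, 0.5], [2.5, 0.3]]
y = [1.50, 2.24, 2.38, 2.35, 2.61]
beta = mlr_fit(X, y)
print([round(b, 3) for b in beta])                    # [1.0, 0.5, 1.2]
print(round(r_squared(y, mlr_predict(beta, X)), 3))   # 1.0
print([round(vif(X, j), 2) for j in range(2)])        # [1.08, 1.08]
```

With only two descriptors both VIFs equal 1/(1 − r²), where r is their correlation coefficient; adding a descriptor close to 2 × log P would make its VIF very large.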
PCR and PLS
• Both overcome collinearity by producing orthogonal variables
• PLS lies on a continuum between MLR and PCR
• PLS is more predictive
• PCR is more descriptive
• PLS generates latent variables
• Two-step model building
  - Variable selection
  - Factor selection
• Higher risk of over-fitting than MLR
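To illustrate how PCR sidesteps collinearity, the sketch below regresses y on the score of the first principal component only. The descriptors are hypothetical and exactly collinear (x2 = 2x1), so $X^\top X$ is singular and MLR has no unique solution, while the one-component PCR fit is exact. Power iteration stands in for a full eigendecomposition to keep the example in the standard library; PLS, which also uses y when building its latent variables, is not shown.

```python
import math


def pcr_one_component(x_rows, y, n_iter=100):
    """Principal component regression using only the first principal component.

    Returns (loading vector, regression coefficient on the score, fitted values).
    """
    n, p = len(x_rows), len(x_rows[0])
    means = [sum(r[j] for r in x_rows) / n for j in range(p)]
    xc = [[r[j] - means[j] for j in range(p)] for r in x_rows]   # mean-centred descriptors
    y_mean = sum(y) / n

    # Cross-product matrix X^T X of the centred descriptors
    xtx = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]

    # Power iteration converges to the dominant eigenvector (first PC loading)
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(xtx[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]

    scores = [sum(r[j] * v[j] for j in range(p)) for r in xc]   # latent variable t
    slope = sum(t * (yy - y_mean) for t, yy in zip(scores, y)) / sum(t * t for t in scores)
    fitted = [y_mean + slope * t for t in scores]
    return v, slope, fitted


# Exactly collinear hypothetical descriptors: x2 = 2 * x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[a, 2.0 * a] for a in x1]
y = [3.0 * a for a in x1]            # y = x1 + x2
loading, slope, fitted = pcr_one_component(X, y)
print([round(c, 3) for c in loading])   # [0.447, 0.894]
print([round(f, 6) for f in fitted])    # [3.0, 6.0, 9.0, 12.0, 15.0]
```

In real data the number of retained components is the "factor selection" step listed above, usually chosen by cross-validation.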
Nonlinear regression
• Artificial neural networks (ANN)
  - Feed-forward
  - Counter-propagation
  - Kohonen networks
  - Wavelet neural networks
  - Neuro-fuzzy
• Nonlinear PCR and PLS
  - Quadratic PCR or PLS
  - PC-ANN
  - PLS-ANN
• Support vector machines (SVM)
• …
Variable selection
• Search strategy: searching different subsets of descriptors
• Scoring function: evaluating the performance of each variable combination
  - Regression methods are used for scoring
  - Variable selection is always coupled with a regression method
Variable selection (continued)
• Feature selection
  - Different variable selection methods: stepwise, genetic algorithm, ant colony optimization, …
• Feature extraction
  - PCA scores
  - Kohonen scores
  - SVM scores
  - Wavelet coefficients
• Combined feature selection/feature extraction
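The coupling of a search strategy with a regression-based scoring function can be sketched with forward selection, the adding half of stepwise selection (no removal step is implemented here). The scoring function is the adjusted R² of an MLR model, an assumption chosen because it penalizes extra descriptors; `min_gain` is a hypothetical stopping threshold. The data are hypothetical, constructed so that y depends only on descriptors 0 and 2.

```python
def solve_linear(a, b):
    """Solve a . x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: descriptors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r2_of_subset(x_rows, y, cols):
    """R^2 of an MLR model (with intercept) using only the descriptor columns in cols."""
    rows = [[1.0] + [r[c] for c in cols] for r in x_rows]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    y_hat = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n compounds and p descriptors (penalizes extra terms)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_selection(x_rows, y, min_gain=1e-3):
    """Greedy forward selection scored by adjusted R^2; returns (columns, score)."""
    n, m = len(x_rows), len(x_rows[0])
    selected, best_score = [], float("-inf")
    while len(selected) < m and n - len(selected) - 2 > 0:   # keep n > p + 1
        scores = {
            c: adjusted_r2(r2_of_subset(x_rows, y, selected + [c]), n, len(selected) + 1)
            for c in range(m) if c not in selected
        }
        best = max(scores, key=scores.get)
        if scores[best] - best_score < min_gain:   # no worthwhile improvement: stop
            break
        selected.append(best)
        best_score = scores[best]
    return selected, best_score


# Hypothetical descriptors [x0, x1, x2]; y was generated as 2*x0 + 3*x2
X = [[1, 3, 0], [2, 1, 1], [3, 4, 0], [4, 1, 1], [5, 5, 0], [6, 9, 1]]
y = [2, 7, 6, 11, 10, 15]
selected, score = forward_selection(X, y)
print(selected, round(score, 3))   # [0, 2] 1.0
```

Swapping in a genetic algorithm or ant colony search would change only the search loop; the scoring function stays a regression fit, as the slides emphasize.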
QSAR Steps
5. Model validation
• The essential part of a QSAR study
• Internal validation
  - Cross-validation
• External validation
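Leave-one-out (LOO) cross-validation, a common internal validation scheme, refits the model n times with one compound held out and summarizes the prediction errors as $Q^2 = 1 - \mathrm{PRESS}/\sum_i (y_i - \bar{y})^2$. For brevity the sketch assumes a one-descriptor linear model and hypothetical data. Because each LOO residual of a least-squares fit is the ordinary residual divided by $1 - h_i$ (h_i being the leverage), Q² can never exceed R².

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a one-descriptor model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2(x, y):
    """Fitted (not cross-validated) R^2 of the one-descriptor model."""
    b0, b1 = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((t - (b0 + b1 * a)) ** 2 for a, t in zip(x, y))
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out Q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # model without compound i
        press += (y[i] - (b0 + b1 * x[i])) ** 2                 # squared prediction error
    my = sum(y) / len(y)
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.3, 3.8, 5.2, 5.9]
print(round(r2(x, y), 3), round(q2_loo(x, y), 3))
```

External validation follows the same idea, but the held-out compounds are the external test set, which never enters model development.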
Some advice
• QSAR models should be
  - Simple
  - Transparent
  - Mechanistically comprehensible
  - Not over-fitted
• Use as few variables as possible
Some advice (continued)
A QSAR model should
• Be associated with a biological end point
• Take the form of an unambiguous and easily applicable algorithm
• Ideally have a clear mechanistic basis
• Be accompanied by a definition of its applicability domain
• Be associated with a measure of goodness of fit
• Be assessed in terms of its predictive capacity
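One common way to define the applicability domain is by leverage, $h = \mathbf{x}^\top (X^\top X)^{-1} \mathbf{x}$, compared with the warning leverage $h^* = 3(p+1)/n$ for p descriptors and n training compounds (the cut-off used in Williams plots). The slides do not prescribe a method, so this is one option among several; for a one-descriptor model with intercept the leverage reduces to the closed form used below. Training values are hypothetical.

```python
def leverage(x_train, x_query):
    """Leverage for a one-descriptor model with intercept:
    h = 1/n + (x - mean)^2 / sum((x_i - mean)^2)."""
    n = len(x_train)
    mean = sum(x_train) / n
    sxx = sum((v - mean) ** 2 for v in x_train)
    return 1.0 / n + (x_query - mean) ** 2 / sxx


def in_applicability_domain(x_train, x_query):
    """True if h <= h* = 3(p + 1)/n, with p = 1 descriptor in this sketch."""
    h_star = 3.0 * (1 + 1) / len(x_train)
    return leverage(x_train, x_query) <= h_star


# Hypothetical training-set descriptor values (n = 10, so h* = 0.6)
x_train = [float(v) for v in range(1, 11)]
print(in_applicability_domain(x_train, 5.0))    # True
print(in_applicability_domain(x_train, 15.0))   # False
```

A prediction for a compound outside the domain (h > h*) is an extrapolation and should be treated with caution, however good the model's fit statistics are.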
One last piece of advice
• Use experimental design together with QSAR to increase the rate at which new compounds are proposed
• For medicinal chemists and drug designers
[Figure: compound sets with good and poor diversity in descriptor space (molecular volume, rotatable bonds, dipole); workflow: synthesis -> biological testing -> QSPR model; plot of predicted vs. actual values]
Models are not real, but sometimes they are helpful.