Chemoinformatics: A Tool for Modern Drug Discovery

Competition and cost have changed the drug design paradigm from the hit-and-trial approach to an automated drug design approach, allowing tailor-made design of active molecules. This has resulted in both targeted drug discovery and a reduced drug development cycle time. The need to introduce newer, superior molecules using automated approaches will make drug discovery highly knowledge-intensive. Some of the techniques that have evolved over time are presented schematically in Fig. 1, which indicates that progressively every step in the drug discovery chain has become automated.
Fig. 1: Progress in drug discovery with time. Predictability of activity (low to high) increases over the years, from hit-and-trial NCE discovery, through the use of libraries of molecules and high-throughput screening, to automated scanning of molecules.
Rapid change in global competition, growth in IT and the emergence of low-cost storage technology have facilitated this paradigm change in drug discovery. Every recent drug on the market has its own story of how it overcame the many hurdles between conceptualization and reality. Knowledge management is playing a major role in almost all chemical and pharmaceutical companies, and new chemoinformatics units are being created to assist ongoing drug discovery programs. Many studies on chemoinformatics have appeared; this paper briefly outlines the managerial issues and the support required for effective implementation of chemoinformatics in both small and large organizations.
What is Chemoinformatics?
Chem(o)informatics is a generic term that encompasses the design, creation, organization, storage, management, retrieval, analysis, dissemination, visualization and use of chemical information, not only in its own right, but as a surrogate or index for other data, information and knowledge1a. Chemoinformatics was defined1b as the "mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization". Chemoinformatics provides the vital link between theoretical design and drug design by extracting information from data and converting it into knowledge (Fig. 2).
Fig. 2: The chemoinformatics pyramid: numbers (data) are transformed into facts (information) and then into rules (knowledge).
In chemoinformatics there are really only two primary questions: (1) what to test next, and (2) what to make next.
Derivation of information and knowledge is only one aspect of chemoinformatics. The use of derived
knowledge in a design and selection support role is an important part of the drug design cycle. The main
processes within drug discovery are lead identification, where a lead is something that has activity in the low
micromolar range, and lead optimization, which is the process of transforming a lead into a drug candidate.
Chemoinformatics methods can be used proactively to design and filter the most appropriate compounds to
work with in the real world.
Fig. 3: The molecular paradigm: molecular target → cloning and expression → dissimilarity selection → automated high-throughput screening → similarity search → lead optimization → development.
Fig. 4: Describing chemical structure information: chemical information systems link analysis and modelling, chemical and physical reference data, spectroscopy, pharmacology, toxicology, regulations, and environmental effects and hazards.
Approach towards Chemoinformatics
For effective implementation of chemoinformatics, the following approaches are followed by different firms and organizations:
• compound registration (database creation)
• library enumeration
• navigating virtual libraries
• access to primary and secondary scientific literature
• QSAR (quantitative structure-activity relationships)
• physical and chemical property calculations
• chemical structure and property databases
These tools include not only methods for analysis of experimental data, but also for generation of calculated
properties of molecules.
Physicochemical Property Predictions ("Drug-Like" Molecules)
Efforts have long been directed towards predicting the properties of chemical species (drugs, drug-like candidates, drug intermediates, etc.). Recent advances in chemoinformatics include new molecular descriptors and pharmacophore techniques, statistical tools and their applications. The ability to predict the so-called ADME (absorption, distribution, metabolism and excretion) properties from molecular structure would have a tremendous impact on the drug discovery process, both in terms of cost and the amount of time required to bring a new compound to market. Over the past several years there has been a tremendous shift toward optimizing ADME properties early in the life of drug discovery programs.
Two strategies are likely to emerge in the area of physicochemical property prediction: those seeking to develop general rules in order to screen large numbers of compounds, and those attempting to provide increasing levels of accuracy for more diverse compounds. Future research will probably focus on developing models built on larger and more diverse collections of compounds with a wide range of chemical functionalities. The applications of chemoinformatics tools to predict physicochemical properties have been reviewed3 and include:
1. Human pharmacokinetic parameters4 and human intestinal absorption5
2. LogP6 (a measure of lipophilicity/hydrophobicity governing the distribution of compounds in biological systems) and ClogP7
3. Computational models of the local absorption rate8
4. Solubility and permeability9-11
5. Reduced ion mobility12
6. Drug absorption13,14
7. Transport phenomena15-17
Although aqueous solubility has been studied extensively, computational methods for estimating this highly important property are only beginning to demonstrate predictive capability for complex molecules. Moreover, aqueous solubility is not an intrinsic property of molecular structure alone and can be greatly affected by crystal polymorphism.
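As an illustration of the kind of property calculation discussed above (added here, not part of the original article), the sketch below computes a few simple physicochemical descriptors with the open-source RDKit toolkit; the molecules are arbitrary examples and the use of RDKit is an assumption, not something the article prescribes.

```python
# Sketch: calculating simple "drug-likeness" properties with RDKit (assumed
# to be installed); the example SMILES are arbitrary illustrations.
from rdkit import Chem
from rdkit.Chem import Descriptors

examples = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
}

for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                                     # skip unparsable structures
        continue
    print(name,
          "MW=%.1f" % Descriptors.MolWt(mol),
          "cLogP=%.2f" % Descriptors.MolLogP(mol),      # Crippen estimate of LogP
          "TPSA=%.1f" % Descriptors.TPSA(mol))          # topological polar surface area
```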
Design and development of structural libraries in an in silico environment
The term in silico is now widely used to describe this virtual world of data, analysis, models and designs that
reside within a computer. All possible compounds and ideas are contained within this virtual world, most of
which we cannot afford to attempt in the real world. The 'real' world of compounds made in a chemistry
laboratory and tested in a biological laboratory is only part of a much larger 'virtual' world where hypotheses
may be computer-generated and tested for practicality. The advent of high-throughput methods for drug
discovery represents the key driving event for the renewed enthusiasm for developing (and re-inventing)
computational methodology, since we will always be able to conceive of more molecules than we can make or
afford to test. Estimates of the number2 of drug-like compounds that could theoretically be made exceed 10^40. Deciding which of these molecules to make or acquire, and test, requires good decision-support systems. Rapid identification of a lead compound, or a lead compound series, remains the primary objective of all high-throughput screening.
Thus, questions of similarity and diversity of chemical structures and libraries become important. In the present scenario, computational tools play a major role in designing, prior to synthesis18,19, libraries that meet defined criteria of similarity or diversity. To address these questions, an appropriate structure coding has to be chosen, one that is somehow related to the biological activity under investigation. Furthermore, the structure coding scheme must produce the same number of descriptors irrespective of the size of the molecule, that is, the number of atoms it contains: the chemical structure has somehow to be transformed to produce a fixed number of descriptors. One such mathematical transformation is autocorrelation, introduced by Moreau and Broto, which is widely used in QSAR studies (a minimal sketch is given after this paragraph). The QSAR technique provides quantitative relationships between a chemical structure and its physical, chemical or biological activity; correlating the chemical structure of drugs with their pharmacological activities is of particular interest.
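The following sketch (my addition, not the authors') illustrates a Moreau-Broto-style topological autocorrelation that yields a fixed-length descriptor regardless of molecule size; the atom property (atomic mass), the maximum lag and the use of RDKit are all illustrative assumptions.

```python
# Sketch: fixed-length Moreau-Broto-style autocorrelation over topological distances.
# The atom property (atomic mass) and maximum lag are illustrative choices.
from rdkit import Chem

def autocorrelation(smiles, max_lag=5):
    mol = Chem.MolFromSmiles(smiles)
    dmat = Chem.GetDistanceMatrix(mol)           # topological (bond-count) distances
    props = [atom.GetMass() for atom in mol.GetAtoms()]
    desc = [0.0] * (max_lag + 1)
    n = mol.GetNumAtoms()
    for i in range(n):
        for j in range(n):
            d = int(dmat[i][j])
            if d <= max_lag:
                desc[d] += props[i] * props[j]   # sum of property products at lag d
    return desc                                  # same length for any molecule

print(autocorrelation("CCO"))        # ethanol  -> 6 numbers
print(autocorrelation("c1ccccc1O"))  # phenol   -> also 6 numbers
```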
In library enumeration, core substructures are identified as templates in which a few positions are left open for substituents (R-groups). By varying the R-groups at the points of substitution, different product structures can be generated. Rebek et al.20a published the synthesis of two combinatorial libraries of semi-rigid compounds that were prepared by condensing a rigid central molecule, functionalized with four acid chloride groups, with a set of 19 different L-amino acids. The more symmetric skeleton gives fewer compounds, as shown in Fig. 5.
Fig. 5: Virtual library generation: condensation of core skeletons bearing four R-group positions with 19 L-amino acids yields 11,191 compounds for the more symmetric skeleton and 65,341 compounds for the less symmetric one.
However, if the core substructure has symmetric geometry, duplicate structures may be created during enumeration. This problem can be overcome by implementing a duplicate-check algorithm based on the connection table and the chirality of the atoms involved.
An alternative approach simulates the actual reaction through a synthetic knowledge base. This more closely replicates the stages involved in the actual synthesis, in which reagents react together according to the rules of synthetic chemistry. A strong background in computer-aided organic synthesis (CAOS) programs helps to generate reasonable structures of synthetic importance.
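A small sketch (added, not from the source) of reaction-based enumeration with duplicate removal via canonical SMILES, assuming the RDKit toolkit; the amide-coupling SMARTS and the reagent lists are invented for illustration and do not reproduce the Rebek library.

```python
# Sketch: enumerate a tiny virtual library by simulating an amide coupling,
# then remove duplicates (e.g. arising from symmetric cores) via canonical SMILES.
from rdkit import Chem
from rdkit.Chem import AllChem

rxn = AllChem.ReactionFromSmarts("[C:1](=O)[OH].[N;H2:2]>>[C:1](=O)[N:2]")
acids = [Chem.MolFromSmiles(s) for s in ("CC(=O)O", "c1ccccc1C(=O)O")]
amines = [Chem.MolFromSmiles(s) for s in ("NCC", "NCc1ccccc1")]

products = set()
for acid in acids:
    for amine in amines:
        for prod_set in rxn.RunReactants((acid, amine)):
            prod = prod_set[0]
            Chem.SanitizeMol(prod)
            products.add(Chem.MolToSmiles(prod))   # canonical SMILES as duplicate key

print(len(products), "unique products")
for smi in sorted(products):
    print(smi)
```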
Similarity measures through structural descriptors, weighting schemes and similarity coefficients20
So far, most attention has been paid to the generation of descriptors for diversity analysis and to studies of fragment substructures or physicochemical properties21. Methods to encode structural features efficiently play an important role, as they serve as fingerprints for similarity analysis. 3D substructural descriptors based upon potential pharmacophoric patterns have also been widely used for diversity analysis22-25, as have physicochemical properties that describe a molecule's topological, electronic, steric, lipophilic or geometric features26-28.
There is a need to select sets of compounds that are as structurally diverse as possible from an existing database, such as a company's corporate collection, a publicly available database or a virtual combinatorial library. Four principal types of selection procedure are cited in the literature, based on clustering, partitioning, dissimilarity and optimization29.
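As a hedged illustration of a dissimilarity-based procedure (not taken from the article), the sketch below uses the MaxMin picker available in RDKit to choose a structurally diverse subset from fingerprint data; the molecules and subset size are arbitrary.

```python
# Sketch: dissimilarity-based (MaxMin) selection of a diverse subset
# using Morgan fingerprints; the input molecules and pick size are illustrative.
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

smiles = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "CCN", "CCCCCC"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]

picker = MaxMinPicker()
# pick 3 compounds that are maximally dissimilar (Tanimoto) to each other
picked = picker.LazyBitVectorPick(fps, len(fps), 3)
print([smiles[i] for i in picked])
```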
Which type of procedure yields the best result, and how factors such as cost, availability and synthetic feasibility are addressed, rests with the user. In parallel, another area gaining importance is the development of filtering procedures that identify molecules exhibiting undesirable characteristics (toxicity, high reactivity, etc.); a simple sketch of such a filter is given below. The advent of chemically aware web languages and cross-platform working is ensuring that chemoinformatics methods become available to all chemists in a more appropriate manner. Library chemistry and high-throughput screening require greater use of chemoinformatics to increase their effectiveness.
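A minimal sketch of such a filtering procedure (my addition): molecules containing undesirable substructures, expressed as SMARTS patterns, are flagged for rejection. The two patterns shown (acyl halide, aldehyde) are merely illustrative reactive groups, and RDKit is assumed.

```python
# Sketch: filter out molecules carrying undesirable (reactive) substructures.
# The SMARTS patterns and test molecules are illustrative only.
from rdkit import Chem

unwanted = {
    "acyl halide": Chem.MolFromSmarts("C(=O)[Cl,Br,I]"),
    "aldehyde":    Chem.MolFromSmarts("[CX3H1](=O)[#6]"),
}

candidates = ["CC(=O)Cl", "CCO", "c1ccccc1C=O", "CC(=O)NC"]

for smi in candidates:
    mol = Chem.MolFromSmiles(smi)
    hits = [name for name, patt in unwanted.items() if mol.HasSubstructMatch(patt)]
    status = "reject (%s)" % ", ".join(hits) if hits else "keep"
    print(smi, "->", status)
```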
Role of Natural Product chemistry in Chemoinformatics
Natural products also form an important sector in the area of drug discovery and development. Most encouraging is the continuing emergence of new natural product chemotypes with interesting structures and biological activities and with potential for generating sub-libraries for targeted screening. Increasingly available as pure compounds, natural products are highly amenable to the much broader screening opportunities presented by the new targets. Regardless of chemical library input, natural products are uniquely well placed to provide structural information from which virtual compounds can be created by computational chemistry and allied technologies. The structural versatility of natural products is expected to play a major role in modern drug discovery programs.
Organizational structures for implementation
IT and drug discovery are distinct competences in the existing organizational environment and hence require close coordination. The success of leveraging chemoinformatics will depend on the ability of firms to use chemoinformatics to reduce the drug discovery cycle and to integrate chemoinformatics into the organizational knowledge-creation process. The main organizational issue is managing the IT and drug discovery processes together.
Organizations have two options for sourcing chemoinformatics competence, namely (a) in-house facilities and (b) outsourcing. Given the way chemoinformatics is developing, it may be easier for firms to outsource it, as it is a specialized competence; the difficulty of finding specialized chemoinformatics experts is likely to grow in the future.
Identifying partners
The major issue in leveraging chemoinformatics is identifying competent partners without losing competitive edge, while at the same time creating new molecules of medicinal importance.
Review on Drug Company status (Growth sector)
The major companies are using chemoinformatics in an integrated manner for areas that have high growth potential. The challenge is to learn rapidly how to leverage chemoinformatics to bring forward newer molecules with highly predictable activity characteristics, thereby minimizing clinical trial costs.
What the future holds
Genomics, proteomics and chemoinformatics will increase the diffusion of IT into the pharmaceutical industry. This will require a higher-level organizational knowledge-integration process that has hitherto been non-existent in the pharmaceutical industry. Drug discovery is moving into the realm of IT, and structural knowledge and drug knowledge are becoming tightly integrated.
Some of the major technologies that are used in chemoinformatics tools are:
 Virtual Chemistry
 Integration of Archival Data
 Diversity Metrics
 Structurally-based Diversity Searches or Comparisons
 Functionally-based Diversity Searches or Comparisons
 Virtual Database Screening
 Extraction of Information from High Throughput Screening Results
 Integration of Screening Results with Structural-based Design Efforts
 Application of Chemoinformatics to Lead Optimization
 Integration of Biological Activity Data
Recent advances in chemoinformatics include new molecular descriptors and pharmacophore techniques,
statistical tools and their applications. Two-dimensional fragment descriptors provide a powerful means of
measuring structural similarity, and their success in this regard has made them a popular tool for diversity
analysis. Visualization methods and hardware development are also opening new opportunities. Much time will
continue to be wasted with incompatible file types without internationally agreed standards.
Fig. 6: Knowledge-based drug design: as knowledge grows from zero knowledge, through protein mechanism and pharmacophore, to a protein X-ray structure, library design moves from diverse primary screening libraries, through focused sets and pharmacophore-based design, to structure-based design; diversity is needed to find a hit.
In classical QSAR, a common free-energy scale relates independent variables to each other, so concepts of
similarity are possible by simple arithmetic difference of values. Concepts of similarity of chemical structure are
more complex because a structure needs to be described in terms of a descriptor space where comparisons can
be carried out. Such descriptors, for example two-dimensional or three-dimensional pharmacophore fingerprints, are not on a common free-energy scale and therefore comparisons are not so intuitive30,31.
New molecular descriptors are continually being developed and used for selection or design of similar
or dissimilar molecules. An interesting example of a new descriptor is the 'feature tree', a novel way of
representing the characteristics of a molecule32.
When used for intermolecular similarities, feature trees break away from comparisons based purely on atomic
connectivity but avoid the need to explicitly go to three-dimensional pharmacophore concepts. Work on three-dimensional pharmacophore and shape representations continues, because these are the methods that should mimic a receptor's viewpoint, rather than a chemist's perception of the internal make-up of a molecule33.
Fig. 7: The drug discovery cycle, in which compounds from collections, natural products, known leads and rational design, driven by medical need and biological hypotheses, pass through primary screens and assays and secondary test systems to give leads, advanced leads and development candidates, followed by IND filing and Phase I and Phase II trials.
Concepts of 'diverse sets' and 'representative sets' of molecules are often used as both subjective and objective
ways of describing and selecting collections of molecules. It should, however, be remembered that the descriptor space that is chosen to work in will always be a molecular-derived one because that is all we can a
priori determine from a molecular structure. Furthermore, a compound that may be chosen as similar or
dissimilar to another molecule is only such in the descriptor space that is used for the selection. Therefore, there
is no such thing as a truly universal set of representative molecules for all bioassays, despite the mathematical
possibility of deriving one in a particular chemical descriptor space.
Selection of subsets of molecules for screening is often carried out by selecting 'representative' molecules from clusters created in a multidimensional chemical descriptor space. For the datasets studied, Bayada et al.34 concluded that Ward's clustering of two-dimensional fingerprints gave the biggest improvement over random selection, while, in a different study, the use of a partitioned chemical descriptor space showed how such a space could be used for diverse subset selection35. This latter method obviates the problem of some clustering techniques, where the clusters change as new molecules are added to a study. Computational library design techniques using appropriate descriptors, particularly methods using genetic algorithms36,37, have become vital because of the need to design more efficient libraries. A minimal sketch of cluster-based representative selection follows below.
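The sketch below (not part of the original text) illustrates the idea of cluster-based representative selection using Ward's hierarchical clustering of two-dimensional fingerprints and taking one member per cluster; RDKit, NumPy and SciPy are assumed, and the molecules and cluster count are arbitrary.

```python
# Sketch: Ward's clustering of 2D (Morgan) fingerprints and selection of one
# representative per cluster; molecules and cluster count are illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1O", "c1ccccc1N"]
fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=512)
       for s in smiles]

# convert bit vectors to a numeric matrix for SciPy
X = np.zeros((len(fps), 512))
for i, fp in enumerate(fps):
    arr = np.zeros((512,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    X[i] = arr

Z = linkage(X, method="ward")                      # Ward's hierarchical clustering
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters

representatives = {}
for idx, lab in enumerate(labels):
    representatives.setdefault(lab, smiles[idx])   # first member as representative
print(representatives)
```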
These methods allow the calculated property profile of a virtual library to be optimised so that it most
effectively matches a desired target, such as the properties of a collection of drug-like molecules. They can also
cope with the huge combinatorial space that must be examined when selecting monomers for a library that is to
be smaller than that theoretically possible. A useful paper by Cramer et al.38 on library design provides a
summary of background issues and extensions while Drewry and Young39 have recently published a
comprehensive review of library design methods. A novel procedure, based on the fragmentation of molecules already known to be active at the target receptor or enzyme, has been described to aid the selection of appropriate monomers for inclusion into focussed libraries40. Experience has shown that library design should preferably be based on calculated properties in product space rather than in monomer space41. This requires efficient means to enumerate the product structures of libraries. Synthetic chemists favour software systems42,43 based on chemical transformations that mimic the actual chemistry carried out, as these are more familiar. Alternative methods that require identifying the common core and appended fragments of a library44,45 are faster once the separate parts of the product have been defined, but this often requires considerable human intervention. Hybrid systems have also been developed45,46. Strategies for more efficient biological screening continue to evolve. Rather than relying on very large screening campaigns, iterative screening strategies are being explored. These involve screening smaller, selected sets of molecules and using the derived results to define descriptors for the rational selection of a further set of molecules. While this obviously mimics the traditional medicinal chemistry approach of responding to new data, it has taken some time for it to be translated effectively into the libraries paradigm. Statistical tools, such as recursive partitioning47, can assist in this process by identifying which descriptors of a lead should be pursued.
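As a hedged illustration of recursive partitioning (added here, not from the paper), a small decision tree from scikit-learn is fitted to calculated descriptors to show which of them separate 'active' from 'inactive' compounds; the molecules, labels and descriptor choice are all invented for the example.

```python
# Sketch: recursive partitioning (a decision tree) to see which calculated
# descriptors separate "active" from "inactive" compounds. All data invented.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.tree import DecisionTreeClassifier, export_text

data = [  # (SMILES, active?) -- purely illustrative labels
    ("CCO", 0), ("CCCCCCCC", 0), ("c1ccccc1O", 1),
    ("c1ccccc1N", 1), ("CC(=O)O", 0), ("c1ccc2ccccc2c1", 1),
]

names = ["MolWt", "MolLogP", "TPSA"]
X, y = [], []
for smi, label in data:
    mol = Chem.MolFromSmiles(smi)
    X.append([Descriptors.MolWt(mol), Descriptors.MolLogP(mol), Descriptors.TPSA(mol)])
    y.append(label)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=names))   # shows which descriptors were used to split
```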
Tools and techniques
In recent years, the computational workhorse of most computational chemistry and informatics groups has been
Silicon Graphics computers, particularly for property calculations, molecular graphics and complex data display.
IBM, Sun and DEC Alpha servers and workstations have also been used extensively. With the advent of
client/server concepts of computing and the deep penetration of WEB technologies into most computing
environments, however, the situation is rapidly changing.
Chemoinformatics has evolved through the individual initiatives of many firms. Software, hardware, applications and systems have emerged and are now being integrated. Since this has not been organic growth, a major drive towards standardization is needed if the applications are to catch up; this is one of the most crucial action imperatives. While use of the Web is widely accepted for text and image handling, its use as an environment for scientific tools is technically more difficult. Although its familiarity to users makes it an attractive option, exploring the true benefits of this type of environment may need to wait for the next generation of web languages. Many tools have been developed and applied in chemoinformatics to address this problem: Molecular Simulations Inc. (San Diego, USA) developed WEBLAB48; the latest web technologies incorporate extensions such as XML (extensible markup language) and its chemical implementation, CML (chemical markup language)49; and other companies50,51 continue to develop web plug-ins such as ChemBeans (based on Java) and environments such as MOE. Several tools for visualizing raw and derived data are now available: Spotfire52, PARTEK53 and DIVA54 are examples of tools that have appeared in the past year and that have value for different aspects of the visualization and analysis of the volumes of data now being generated. While tools for making chemoinformatics methods more accessible to bench scientists are important, the receptiveness of medicinal chemists to these techniques requires that their training in statistics, data analysis, visualization and biomolecular concepts be improved. The interest shown in Lipinski's 'rule of five'55, which succinctly encapsulates some simple parameters concerning drug absorption, shows how eager medicinal chemists are for rules that help design appropriate molecules in the libraries era (a small worked example of the rule follows this paragraph). As chemists are receptive to these simplified rules, however, more sophisticated tools and concepts can easily get bypassed. This illustrates a real need both for better end-user tools and training of medicinal chemists, and for readily accessible experts to apply the more advanced methods effectively.
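To make the rule of five concrete, here is a short sketch (an addition, not from the original) that counts rule-of-five violations with RDKit descriptor functions; the test molecules are arbitrary.

```python
# Sketch: count Lipinski rule-of-five violations, which flag likely poor
# absorption/permeation. Thresholds follow the rule; example molecules are arbitrary.
from rdkit import Chem
from rdkit.Chem import Descriptors

def rule_of_five_violations(mol):
    return sum([
        Descriptors.NumHDonors(mol) > 5,       # OH + NH donors
        Descriptors.NumHAcceptors(mol) > 10,   # N + O acceptors
        Descriptors.MolWt(mol) > 500,
        Descriptors.MolLogP(mol) > 5,
    ])

for smi in ["CC(=O)Oc1ccccc1C(=O)O", "CCCCCCCCCCCCCCCCCCCC(=O)O"]:
    mol = Chem.MolFromSmiles(smi)
    print(smi, "violations:", rule_of_five_violations(mol))
```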
Fig. 8: The drug discovery funnel: understand the disease, select a target, design a primary screen, screen about 100,000 compounds, identify hits, make a final selection of the best leads, and take about 5 compounds into development.
Table 1 (chemical structure databases; source: Daylight), given at the end of this article, shows that every organization has collected and generated its own library, so the same information will be repeated across them. A mechanism or tool should be developed to link all the related information. This would help to build a unique database of global interest with one-point access to chemical information.
Technical Issues
Chemoinformatics software from software houses is expensive! Building and maintaining your own solutions is
also expensive! Thus, if you want good tools to derive and use knowledge, you must be prepared to commit
significant resources to this area, in terms of hardware, software and people-ware (i.e. effective creators and
users of software). Avoiding supplier monopolies and looking for cheaper modules to be substituted for
outdated or overpriced parts helps keep costs down. This, however, requires software to be assembled in a
modular fashion in the first place and to be mutually compatible.
Structure representation in the computer in encoded form is now an almost mature field; however, many organizations follow their own file formats for storing structures, in addition to their in-house research data. Much time will continue to be wasted on incompatible file types without internationally agreed standards. There is a need to develop a unique code for each individual molecule, along with its structural descriptor, to be implemented in all globally available databases as a linking medium, irrespective of database type and location in the e-world. Such a unique code would reduce duplication of information. All compounds, including virtual libraries of molecules, should be referred to by this code, just as researchers use the CAS Registry Number for known molecules and compounds.
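This idea is close to what structure-derived identifiers provide; as a sketch (not from the article), canonical SMILES and InChI keys generated with RDKit can act as database-independent linking keys, assuming an RDKit build with InChI support.

```python
# Sketch: derive structure-based identifiers that can act as a linking key
# across databases; requires RDKit built with InChI support.
from rdkit import Chem

for smi in ["CC(=O)Oc1ccccc1C(=O)O", "OC(=O)c1ccccc1OC(C)=O"]:  # two ways of writing aspirin
    mol = Chem.MolFromSmiles(smi)
    print("input:           ", smi)
    print("canonical SMILES:", Chem.MolToSmiles(mol))
    print("InChIKey:        ", Chem.MolToInchiKey(mol))
    print()
```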
Current Status
Recent advances in virtual screening track computational capability: as the processing power of computers improves, so do screening speed and complexity. Parameters such as structure, function or chemical space allow for a nearly limitless array of screening options. The use of screening data for development decision-making is predicated on the management and interpretation of those data. Extraction of information from the data is the vital link between theoretical design and a drug candidate. Finally, it is the integration of iterative results, from computation to activity, that drives the cycle forward.
Without a proper knowledge base, lead optimization is a search in the vast darkness of chemistry space and may take a drug discovery program in the wrong direction. Establishing a proper database with complete test results can lead to organizational success in drug discovery and development (Fig. 9).
Fig. 9: Need for an effective chemoinformatics filter: identified leads (100% input) pass through an effective 'chemo' filter to give optimized drug candidates with an 85% survival rate.
Combinatorial chemistry has opened new strategies for a more comprehensive parallel approach to sweeping
and searching during lead optimization, which has necessitated the development of suitable and new library
design principles.
Conclusions
The need for improved chemoinformatics systems has been driven by the explosion of raw data coming from
library synthesis and HTS operations. Knowledge gained by analysis of this data is only as good as the quality
of the data in the first place; however, the increase in the amount of data available has often been at the expense
of context and quality. The next phase of the challenge must be to have quality chemoinformatics tools to apply
to quality data. Then at least we will have achieved something other than a new name for a continuing problem.
This integration of chemical information and drug discovery will completely change the drug discovery process
allowing small and innovative firms to be active in drug discovery.
Table 1: Chemical structure databases [Source: Daylight]

Database            Contents                   Supplier
ACD                 238,000                    MDL Information Systems Inc.
Aquire              5,300                      EPA
Asinex              115,000                    AsInEx Ltd.
ChemReact97         470,000 (structures)       InfoChem GmbH
ChemSynth97         170,000                    InfoChem GmbH
IBioScreenSC        16,000                     InterBioscreen Ltd.
Maybridge           62,000 (substances)        Maybridge
MedChem             36,000 (substances)        Pomona/BioByte
NCI96               120,000                    NCI
SPRESI '95          3,200,000                  InfoChem GmbH
SPRESI '95 Preps    2,000,000 (substances)     InfoChem GmbH
SpresiReact         1,800,000                  InfoChem GmbH
TSCA93              100,000                    EPA
WDI                 60,000 (drugs)             Derwent
Table 2: Companies sponsoring chemoinformatics products worldwide
Abbott Laboratories
Affymax Research Institute
Aventis Crop Science (France, UK)
Aventis Pharma (France, Germany, USA)
AstraZeneca UK
Avon Products Inc
Bayer (Germany, USA)
Beiersdorf AG
Birmingham University
Boehringer Ingelheim
Cardiff University
CMBI Nijmegen
Celltech R&D Limited
Firmenich SA
GlaxoWellcome Inc
GlaxoWellcome R & D
GlaxoWellcome SpA
Health & Safety Executive
Henkel KGaA
Hoffmann-La Roche (AG, Inc)
Instituto Quimico de Sarriá
Janssen Pharmaceutica
Novartis Pharma
NV Organon
Pfizer Inc
Procter & Gamble Company
RW Johnson PRI
Schering AG
Searle Pharmaceuticals
SmithKline Beecham Pharmaceuticals
Sanofi-Synthelabo Group
Takeda Chemical Industries
Unilever Research
University of Leeds
Wyeth-Ayerst Research
Table 3: Chemoinformatics Web links (URL)
Company / Organization    Web Site (URL)
NCI 3D
NIST Webbook
Cambridge Soft ACX
Cambridge crystallographic Data
Beilstein Abstracts
Advanced Chemistry Development Inc
Molecular Design Limited Informational Systems Inc
ChemWeb
Daylight Chemical Information Systems Inc
Molecular Simulations Inc, Weblab.
Chemical Computing Group Inc.
Afferent Systems Inc.
Oxford Molecular Inc.
Tripos Inc.
Synopsys Scientific Systems
Glossary for Chemoinformatics
CML Chemical Markup Language: http://www.xml-cml.org/
CIS chemical information system: Must include registration, computed and measured properties, chemical
descriptors and inventory.
Chemoinformatics: Increasingly incorporates "compound registration into databases, including library enumeration; access to primary and secondary scientific literature; QSARs (quantitative structure/activity relationships) and similar tools for relating activity to structure; physical and chemical property calculations; chemical structure and property databases; chemical library design and analysis; structure-based design and statistical methods."
Chemometrics: The chemical discipline that uses mathematical, statistical and other methods employing formal logic (1) to design or select optimal measurement procedures and experiments, and (2) to provide maximum relevant chemical information by analyzing chemical data.
Computational chemistry: A discipline using mathematical methods for the calculation of molecular properties
or for the simulation of molecular behaviour. [IUPAC Med Chem]
Data mining: Nontrivial extraction of implicit, previously unknown and potentially useful information from
data, or the search for relationships and global patterns that exist in databases.
Data mining tools: Tools for data mining, NCBI, US (http://www.ncbi.nlm.nih.gov/Tools/index.html). Provides access to BLAST, Clusters of Orthologous Groups (COGs), ORF finder, Electronic PCR, UniGene, GeneMap99, VecScreen, the Cancer Genome Anatomy Project (CGAP), the Cancer Chromosome Aberration Project (cCAP), Human-Mouse Homology Maps, LocusLink and VAST search.
GUI (Graphical User Interface): The two most useful GUIs are the query interface to the database and the report/analysis interfaces.
in silico: In or by means of a computer simulation; the virtual world of data, analysis, models and designs that reside within a computer. All possible compounds and ideas are contained within this virtual world, which holds more molecules than we can make or afford to test. Estimates of the number of drug-like compounds that could theoretically be made are greater than 10^40.
Lipinski's rule of five: So called because the cutoffs for each of the four parameters are all close to five or a multiple of five. The "rule of 5" states that poor absorption or permeation is more likely when: there are more than 5 H-bond donors (expressed as the sum of OHs and NHs); the molecular weight is over 500; the LogP is over 5 (or MLogP is over 4.15); or there are more than 10 H-bond acceptors (expressed as the sum of Ns and Os). (http://www.acdlabs.com/products/phys_chem_lab/logp/ruleof5.html)
"plug and play" systems: Required for effective chemoinformatics systems. Must be designed backward from
the answer to the data to be captured and systems should be in components where each component has one
simple task.
"silo systems": Legacy method for many information systems, a system built to collect, store and report one
laboratory’s data. Each "silo system" holds the data differently and may be in a different technology and the
results of the systems cannot easily be interchanged.
SAR Structure Activity Relationship: The relationship between chemical structure and pharmacological activity
for a series of compounds.
References
1. Brown FK: Chemoinformatics: what is it and how does it impact drug discovery. Annu Rep Med Chem 1998,
(33) 375–384.
2. Martin YC: Challenges and prospects for computational aids to molecular diversity. Perspect Drug Discov
Des 1997, (7/8): 159–172
3. Blake JF: Chemoinformatics – predicting the physicochemical properties of drug-like molecules. Current
Opinion in Biotechnology 2000, (11) 104-107
4. Obach RS, Baxter JG, Liston TE, Silber MB, MacIntyre F, Rance DJ: The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data. J Pharmacol Exp Ther 1997, 283: 46-58
5. Wessel MD, Jurs PC, Tolan JW, Muskal SM: Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci 1998, 38: 726-735
6. Buchwald P, Bodor N: Octanol-water partition: searching for predictive models. Curr Med Chem 1998, 5: 353-380
7. Anon: ClogP. Daylight Chemical Information Software. Mission Viejo, CA: Daylight Chemical Information Inc.
8. Lennernas H: Human intestinal permeability. J Pharm Sci 1998, 87: 403-410
9. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 1997, 23: 3-25
10. Mitchell BE, Jurs PC: Prediction of aqueous solubility of organic compounds from molecular structure. J Chem Inf Comput Sci 1998, 38: 489-496
11. Huuskonen J, Salo M, Taskinen J: Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci 1995, 35: 1039-1045
12. Wessel MD, Sutter JM, Jurs PC: Prediction of reduced ion mobility constants of organic compounds from molecular structure. Anal Chem 1996, 63: 4237-4243
13. Palm K, Luthman K, Ungell A-L, Strandlund G, Beigi F, Lundahl P, Artursson P: Evaluation of dynamic
polar surface area as a predictor of drug absorption: comparison with other computational and experimental
predictors. J. Med Chem 1998, 41:5382-5392
14. Krarup LH, Christenson IT, Hovgaard L, Frokjaer S: Predicting drug absorption from molecular surface properties based on molecular dynamics simulations. Pharm Res 1998, 15: 972-978
15. Clark DE: Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 1999, 88: 807-814
16. Clark DE: Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J Pharm Sci 1999, 88: 815-821
17. Huibers PDT, Katritzky AR: Correlation of the aqueous solubility of hydrocarbons with molecular structure.
J Chem Inf Comput Sci 1998, 38:283-292
18. Willett P: Chemoinformatics – similarity and diversity in chemical libraries. Current Opinion in Biotechnology 2000, 11: 85-88
19. Leach AR, Hann MM: The in silico world of virtual libraries. Drug Discov Today 2000, 5(8): 326-336
20a. Carell T, Wintner EA, Bashir-Hashemi A, Rebek J: A solution-phase screening procedure for the isolation of active compounds from a library of molecules. Angew Chem Int Ed Engl 1994, 33: 2061-2064
20. Kubinyi H: Similarity and dissimilarity. A medicinal chemists view. Perspect Drug Discov Des 1998, 9–11:
225–252.
21. Brown RD: Descriptors for diversity analysis. Perspect Drug Discov Des 1997, 7/8: 31-49
22. Pickett SD, Mason JS, McLay IM: Diversity profiling and design using 3D pharmacophores:
pharmacophore-derived queries (PDQ) J Chem Inform Comput Sci 1996, 36:1214-1223
23. Parks CA, Crippen GM, Topliss JG: The measurement of molecular diversity by receptor site interaction simulation. J Comput Aided Mol Des 1998, 12: 441-449
24. Kubinyi H, Folkers G, Martin YC: 3D QSAR in drug design. Theory, methods and applications. Perspect Drug Discov Des 1998, 9-11: v-vii
25. Kubinyi H, Folkers G, Martin YC: 3D QSAR in drug design. Theory, methods and applications. Perspect Drug Discov Des 1998, 12-14: v-vii
26. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical databases. J Chem Inform Comput Sci 1999, 39: 1-10
27.Cummins DJ, Andrews CW, Bentley JA, Cory M: Molecular diversity in chemical databases: comparison of
medicinal chemistry knowledge bases and databases of commercially available compounds. J Chem Inform
Comput Sci 1996, 36:750-763
28. Martin EJ, Blaney JM, Siani MA, Spellmeyer DC, Wong AK, Moos WH: Measuring diversity:
experimental design of combinatorial libraries for drug discovery. J Med Chem 1995, 38:1431-1436
29. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical
databases. J. Chem Inform Comput Sci 1999, 39:1-10
30. Willett P, Barnard JM, Downs GM: Chemical similarity searching. J Chem Inform Comput Sci 1998, 38:
983–996.
31. Martin YC, Brown RD, Bures MG: Quantifying diversity. In Combinatorial Chemistry and Molecular
Diversity in Drug Discovery. Edited by Gordon M, Kerwin JF. New York: Wiley–Liss, 1998, 369–385
32.Rarey M, Dixon JS: Feature trees: a new molecular similarity measure based on tree matching. J Comput
Aided Mol Des 1998, 12: 471–490
33. Good AC, Richards WG: Explicit calculation of 3D molecular similarity.
Perspect Drug Discov Des 1998, 9–11: 321–338
34. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical
databases. J Chem Inform Comput Sci 1999, 39: 1–10
35.Menard PR, Mason JS, Morize I, Bauerschmidt S: Chemistry space metrics in diversity analysis, library
design, and compound selection. J Chem Inform Comput Sci 1998, 38: 1204–1213
36. Gillet VJ, Willett P, Bradshaw J, Green DVS: Selecting combinatorial libraries to optimize diversity and
physical properties. J Chem Inform Comput Sci 1999, 39: 169–177
37. Brown RD, Martin YC: Designing combinatorial library mixtures using a genetic algorithm.
J Med Chem 1997, 40: 2304–2313
38. Cramer RD, Patterson DE, Clark RDD, Soltanshahi F, Lawless MS: Virtual compound libraries: a new
approach to decision making in molecular discovery research. J Chem Inform Comput Sci 1998, 38: 1010–1023
39. Drewry D, Young S: Approaches to the design of combinatorial libraries. Chemomet Intell Lab Sys 1999,
48: 1–20
40. Lewell XQ, Judd D, Watson S, Hann M: RECAP-retrosynthetic combinatorial analysis procedure: a
powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial
chemistry. J Chem Inf Comput Sci 1998, 38: 511–522
41. Gillet V, Willett P, Bradshaw J: The effectiveness of reactant pools for generating structurally-diverse
combinatorial libraries. J Chem Inform Comput Sci 1997, 37: 731–740
42. Daylight Chemical Information Systems Inc. on the World Wide Web, URL http://www.daylight.com/.
43. Afferent Systems Inc. on the World Wide Web, URL http://www.afferent.com/.
44. Molecular Design Limited, Information Systems Inc. on the World Wide Web, URL http://www.MDLi.com/tech/centrallib.html/.
45. Tripos, Inc. on the World Wide Web, URL http://www.tripos.com/.
46. Synopsys Scientific Systems on the World Wide Web, URL http://www.synopsys.co.uk/.
47. Chen X, Rusinko A, Young SS: Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J Chem Inform Comput Sci 1998, 38: 1054-1062
48. Molecular Simulations Inc. on the World Wide Web, URL http://www.msi.com.
49. Rzepa HS, Murray-Rust P, Whitaker BJ: The application of chemical multipurpose internet mail extensions (chemical MIME) internet standards to electronic mail and World Wide Web information exchange. J Chem Inform Comput Sci 1998, 38: 976-982
50. Cherwell Scientific Publishing Ltd. on the World Wide Web, URL http://www.cherwell.com/.
51. Chemical Computing Group Inc. on the World Wide Web, URL http://www.chemcomp.com/.
52. Spotfire Inc. on the World Wide Web, URL http://www.spotfire.com/
53. Partek Inc. on the World Wide Web, URL http://www.partek.com/
54. Oxford Molecular Group on the World Wide Web, URL http://www.oxmol.co.uk/.
55. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 1997, 23: 3-25