Chemoinformatics: A Tool for Modern Drug Discovery

Competition and cost have changed the drug design paradigm from a hit-and-trial (trial-and-error) approach to an automated drug design approach that allows tailor-made design of active molecules. This has resulted in both targeted drug discovery and a reduced drug development cycle time. The need to introduce superior new molecules through automated approaches will make drug discovery a highly knowledge-intensive activity. Some of the techniques that have evolved over time are presented schematically in Fig. 1, which indicates that progressively every step in the drug discovery chain has become automated.

[Fig. 1: Progress in drug discovery with time. Predictability of activity (low to high) is plotted against years, moving from hit-and-trial NCE discovery through the use of libraries of molecules and high-throughput screening to automated scanning of molecules.]

Rapid change in global competition, growth in IT and the emergence of low-cost storage technology have facilitated this paradigm change in drug discovery. Any recent drug on the market has its own story to tell of how it overcame the various hurdles between conceptualization and reality. Knowledge management is playing a major role in almost all chemical and pharmaceutical companies, and new chemoinformatics units are being created to assist ongoing drug discovery programs. Many studies on chemoinformatics have appeared; this paper briefly outlines the managerial issues and support required for the successful implementation of chemoinformatics in both small and large organizations.

What is Chemoinformatics

Chem(o)informatics is a generic term that encompasses the design, creation, organization, storage, management, retrieval, analysis, dissemination, visualization and use of chemical information, not only in its own right, but as a surrogate or index for other data, information and knowledge1a. Chemoinformatics has also been defined1b as the "mixing of information resources to transform data into information, and information into knowledge, for the intended purpose of making better decisions faster in the arena of drug lead identification and optimization". Chemoinformatics thus provides the vital link between theoretical design and drug discovery by extracting information from data and converting it into knowledge (Fig. 2).

[Fig. 2: Chemoinformatics pyramid, from numbers (data) through facts (information) to rules (knowledge).]

In chemoinformatics there are really only two primary questions: (1) what to test next and (2) what to make next. Derivation of information and knowledge is only one aspect of chemoinformatics; the use of derived knowledge in a design and selection support role is an important part of the drug design cycle. The main processes within drug discovery are lead identification, where a lead is something that has activity in the low micromolar range, and lead optimization, which is the process of transforming a lead into a drug candidate. Chemoinformatics methods can be used proactively to design and filter the most appropriate compounds to work with in the real world.

[Fig. 3: The molecular paradigm: molecular target, cloning and expression, dissimilarity selection, automated high-throughput screening, similarity search, lead optimization, development.]

[Fig. 4: Describing chemical structure information (functional group, molecular size).]
[Figure: Chemical information systems address all related areas: environmental effects and hazards, analysis and modelling, chemical and physical reference data, spectroscopy, pharmacology, toxicology, regulations.]

Approach towards Chemoinformatics

For the effective implementation of chemoinformatics, the following approaches are followed by different firms and organizations:
- compound registration (database creation)
- library enumeration
- navigating virtual libraries
- access to primary and secondary scientific literature
- QSAR (quantitative structure-activity relationships)
- physical and chemical property calculations
- chemical structure and property databases

These tools include not only methods for the analysis of experimental data, but also for the generation of calculated properties of molecules.

Physicochemical Property Predictions ("Drug-Like" Molecules)

Efforts have long been directed towards predicting the properties of chemical species (drugs, drug-like candidates, drug intermediates, etc.). Recent advances in chemoinformatics include new molecular descriptors and pharmacophore techniques, statistical tools and their applications. The ability to predict so-called ADME (absorption, distribution, metabolism and excretion) properties from molecular structure would have a tremendous impact on the drug discovery process, both in terms of cost and the amount of time required to bring a new compound to market. Over the past several years there has been a tremendous shift toward optimizing ADME properties early in the life of drug discovery programs. Two strategies are likely to emerge in the area of physicochemical property prediction: those seeking to develop general rules in order to screen large numbers of compounds, and those attempting to provide increasing levels of accuracy for more diverse compounds. Future research will probably focus on developing models with data sets that are larger and built around more diverse collections of compounds with a wide range of chemical functionalities. The application of chemoinformatics tools to predicting physicochemical properties has been reviewed3; examples include:
1. human pharmacokinetic parameters4 and human intestinal absorption5
2. LogP6 (a measure of lipophilicity/hydrophobicity governing the distribution of compounds in various biological systems) and ClogP7
3. computational models of the local absorption rate8
4. solubility and permeability9-11
5. reduced ion mobility12
6. drug absorption13,14
7. transport phenomena15-17

Though aqueous solubility has been extensively studied, computational methods for estimating this highly important property are only beginning to demonstrate predictive capability for complex molecules, and it is not an intrinsic property of molecular structure alone: aqueous solubility can be greatly affected by crystal polymorphism.

Design and development of structural libraries in an in silico environment

The term in silico is now widely used to describe the virtual world of data, analysis, models and designs that resides within a computer. All possible compounds and ideas are contained within this virtual world, most of which we cannot afford to attempt in the real world. The 'real' world of compounds made in a chemistry laboratory and tested in a biological laboratory is only part of a much larger 'virtual' world where hypotheses may be computer-generated and tested for practicality; a small illustration of such in silico property filtering follows.
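One practical way of filtering this virtual world is with the calculated physicochemical properties discussed above (the Lipinski parameters, logP and polar surface area). The following is a minimal sketch, assuming the open-source RDKit toolkit as an illustrative choice; this paper does not prescribe any particular software.

```python
# A minimal sketch of "drug-likeness" property filtering, assuming the
# open-source RDKit toolkit is available (an illustrative choice only).
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski

def rule_of_five_profile(smiles: str) -> dict:
    """Compute the four Lipinski parameters plus topological polar surface area."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    profile = {
        "MW": Descriptors.MolWt(mol),        # molecular weight
        "cLogP": Crippen.MolLogP(mol),       # calculated lipophilicity
        "HBD": Lipinski.NumHDonors(mol),     # H-bond donors (OH + NH)
        "HBA": Lipinski.NumHAcceptors(mol),  # H-bond acceptors (N + O)
        "TPSA": Descriptors.TPSA(mol),       # topological polar surface area
    }
    profile["violations"] = sum([
        profile["MW"] > 500,
        profile["cLogP"] > 5,
        profile["HBD"] > 5,
        profile["HBA"] > 10,
    ])
    return profile

if __name__ == "__main__":
    # Aspirin as a worked example; it has no rule-of-five violations.
    print(rule_of_five_profile("CC(=O)Oc1ccccc1C(=O)O"))
```

In a typical drug-likeness pass, structures with more than one rule-of-five violation would be deprioritized before synthesis or acquisition.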
The advent of high-throughput methods for drug discovery represents the key driving event for the renewed enthusiasm for developing (and re-inventing) computational methodology, since we will always be able to conceive of more molecules than we can make or afford to test. Estimates of the number2 of drug-like compounds that could theoretically be made exceed 10^40. Deciding which of these molecules to make or acquire, and test, requires good decision-support systems. Rapid identification of a lead compound, or lead compound series, remains the primary objective of all high-throughput screening. Thus, questions of similarity and diversity of chemical structures and libraries become important. In the present scenario, computational tools play a major role in designing, prior to synthesis, libraries18,19 that meet defined criteria of similarity or diversity. To address these questions, an appropriate structure coding has to be chosen, one that is somehow related to the biological activity under investigation. Furthermore, the structure coding scheme must produce the same number of descriptors irrespective of the size of a molecule, i.e. of the number of atoms it contains; the chemical structure therefore has to be transformed to produce a fixed number of descriptors. One such mathematical transformation is autocorrelation, introduced by Moreau and Broto and widely used for QSAR studies. The QSAR technique provides quantitative relationships between a chemical structure and its physical, chemical or biological activity; correlating the chemical structure of drugs with their pharmacological activities is of particular interest.

In library enumeration, core sub-structures are identified as templates in which a few atoms are left open for substituents (R-groups). By varying the R-groups at the points of substitution, different product structures can be generated. Rebek et al.20a published the synthesis of two combinatorial libraries of semi-rigid compounds, prepared by condensing a rigid central molecule functionalized by four acid chloride groups with a set of 19 different L-amino acids. The more symmetric skeleton gives fewer compounds, as shown in Fig. 5.

[Fig. 5: Virtual library generation. Two tetra-functionalized cores condensed with 19 L-amino acids (R-groups) yield 11,191 and 65,341 product structures respectively; the more symmetric skeleton gives the smaller library.]

However, if the core sub-structure has a symmetric geometry it may create duplicate structures during enumeration. This problem can be overcome by implementing a similarity-check algorithm based on the connection table and the chirality of the atoms involved. An alternative approach is to simulate the actual reaction through a synthetic knowledge base. This more closely replicates the stages involved in the actual synthesis, in which reagents react together according to the rules of synthetic chemistry. A strong background in Computer Aided Organic Synthesis (CAOS) programs helps to generate reasonable structures of synthetic importance.

Similarity measures rely on structural descriptors, a weighting scheme and a similarity coefficient20. So far, more attention has been paid to the generation of descriptors for diversity analysis and to studies of fragment substructures or physicochemical properties21. Methods that encode structural features efficiently play an important role, as they serve as fingerprints for similarity analysis.
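As a concrete illustration of fingerprint-based similarity, the sketch below (again assuming RDKit purely for illustration) encodes structures as 2D circular fingerprints and compares them with the Tanimoto coefficient, a widely used similarity coefficient for bit-string descriptors.

```python
# A minimal sketch of 2D fingerprint similarity, assuming the RDKit toolkit;
# the Morgan (circular) fingerprint and Tanimoto coefficient are illustrative
# choices, not the specific descriptors used in the cited studies.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Encode two structures as circular fingerprints and compare them."""
    mol_a, mol_b = Chem.MolFromSmiles(smiles_a), Chem.MolFromSmiles(smiles_b)
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, radius, nBits=n_bits)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

if __name__ == "__main__":
    aspirin = "CC(=O)Oc1ccccc1C(=O)O"
    salicylic_acid = "O=C(O)c1ccccc1O"
    hexane = "CCCCCC"
    print(tanimoto(aspirin, salicylic_acid))  # structurally related -> higher score
    print(tanimoto(aspirin, hexane))          # unrelated -> low score
```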
3D sub-structural descriptors based upon potential pharmacophoric patterns have also been widely used for diversity analysis22-25, as have physicochemical properties that describe a molecule's topological, electronic, steric, lipophilic or geometric features26-28. There is a need to select sets of compounds that are as structurally diverse as possible from an existing database, such as a company's corporate collection, a publicly available database or a virtual combinatorial-chemistry library. Four principal types of selection procedure are cited in the literature, based on clustering, partitioning, dissimilarity and optimization29; which type of procedure yields the best result, and how to weigh factors such as cost, availability and synthetic feasibility, rests with the user. In parallel, another area gaining importance is the development of filtering procedures that identify molecules exhibiting undesirable characteristics such as toxicity or high reactivity (an illustrative substructure-alert filter is sketched at the end of this section). The advent of a chemically aware web language and cross-platform working is ensuring that chemoinformatics methods become available to all chemists in a more appropriate manner. Library chemistry and high-throughput screening require greater use of chemoinformatics to increase their effectiveness.

Role of Natural Product Chemistry in Chemoinformatics

Natural products also form an important sector in the area of drug discovery and development. Most encouraging is the continuing emergence of new natural product chemotypes with interesting structures and biological activities, and with potential for sub-library generation for targeted screening. Increasingly available as pure compounds, natural products are highly amenable to the much broader screening opportunities presented by the new targets. Regardless of chemical library input, natural products are uniquely well placed to provide structural information from which virtual compounds can be created by computational chemistry and allied technologies. The structural versatility of natural products is expected to play a major role in modern drug discovery programs.

Organizational structures for implementation

IT and drug discovery are distinct competences in the existing organizational environment and hence require close coordination. The success of leveraging chemoinformatics will depend on the ability of firms to use chemoinformatics to reduce the drug discovery cycle and to integrate chemoinformatics into the organizational knowledge creation process. The main organizational issue is managing the IT and drug discovery processes together. Organizations have two options for sourcing chemoinformatics competence: (a) in-house facilities and (b) outsourcing. Given the way chemoinformatics is developing, it may be easier for firms to outsource it, as it is a specialized competence, and the difficulty of finding specialized chemoinformatics experts is likely to compound in future.

Identifying partners

The major issue in leveraging chemoinformatics is identifying competent partners without losing competitive edge, while at the same time creating new molecules of medicinal importance.

Review of drug company status (growth sector)

The major companies are using chemoinformatics in an integrated manner for areas that have high growth potential. The challenge is to learn rapidly how to leverage chemoinformatics to bring forward newer molecules with highly predictable activity characteristics, thereby minimizing clinical trial costs.
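Before turning to the outlook, the filtering of 'undesirable characteristics' mentioned above can be illustrated with a short substructure-alert sketch, again assuming RDKit; the SMARTS patterns are a tiny illustrative selection, not an authoritative alert collection.

```python
# A minimal sketch (assuming RDKit) of a substructure-alert filter that flags
# molecules carrying undesirable or highly reactive functionality before they
# enter a screening or acquisition set. The alert patterns are illustrative only.
from rdkit import Chem

ALERTS = {
    "acyl_halide": "[CX3](=O)[Cl,Br,I]",
    "aldehyde": "[CX3H1](=O)[#6]",
    "michael_acceptor": "C=CC(=O)[#6,#7,#8]",
    "alkyl_halide": "[CX4][Cl,Br,I]",
}
ALERT_PATTERNS = {name: Chem.MolFromSmarts(s) for name, s in ALERTS.items()}

def flag_alerts(smiles: str) -> list:
    """Return the names of all alert substructures found in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return ["unparsable"]
    return [name for name, patt in ALERT_PATTERNS.items() if mol.HasSubstructMatch(patt)]

if __name__ == "__main__":
    print(flag_alerts("CC(=O)Cl"))         # acetyl chloride -> flags acyl_halide
    print(flag_alerts("CC(=O)Oc1ccccc1"))  # phenyl acetate -> no alerts
```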
What the future holds?

Genomics, proteomics and chemoinformatics will increase the diffusion of IT into the pharmaceutical industry. This will require a higher-level organizational knowledge integration process that was hitherto non-existent in the pharmaceutical industry. Drug discovery is moving into the realm of IT, and structural knowledge and drug knowledge are becoming tightly integrated. Some of the major technologies used in chemoinformatics tools are:
- virtual chemistry
- integration of archival data
- diversity metrics
- structurally based diversity searches or comparisons
- functionally based diversity searches or comparisons
- virtual database screening
- extraction of information from high-throughput screening results
- integration of screening results with structure-based design efforts
- application of chemoinformatics to lead optimization
- integration of biological activity data

Recent advances in chemoinformatics include new molecular descriptors and pharmacophore techniques, statistical tools and their applications. Two-dimensional fragment descriptors provide a powerful means of measuring structural similarity, and their success in this regard has made them a popular tool for diversity analysis. Visualization methods and hardware development are also opening new opportunities. Without internationally agreed standards, however, much time will continue to be wasted on incompatible file types.

[Fig. 6: Knowledge-based drug design. As knowledge increases from zero knowledge (primary screening libraries, diversity needed to find a hit) through protein mechanism (focused sets) and pharmacophore (pharmacophore-based design) to protein X-ray structure (structure-based design), library design becomes more directed.]

In classical QSAR, a common free-energy scale relates independent variables to each other, so concepts of similarity are possible by simple arithmetic difference of values. Concepts of similarity of chemical structure are more complex, because a structure needs to be described in terms of a descriptor space in which comparisons can be carried out. Such descriptors, for example two-dimensional or three-dimensional pharmacophore fingerprints, are not on a common free-energy scale and therefore comparisons are not so intuitive30,31. New molecular descriptors are continually being developed and used for the selection or design of similar or dissimilar molecules. An interesting example of a new descriptor is the 'feature tree', a novel way of representing the characteristics of a molecule32. When used for intermolecular similarities, feature trees break away from comparisons based purely on atomic connectivity but avoid the need to go explicitly to three-dimensional pharmacophore concepts. Work on three-dimensional pharmacophore and shape representations continues, because these are the methods that should mimic a receptor's viewpoint, rather than a chemist's perception of the internal make-up of a molecule33.

[Fig. 7: Drug discovery cycle, linking medical need, biological hypothesis, rational design, compound collections, natural products and known 'leads', primary screens and assays, secondary test systems, advanced leads, development candidate(s), IND and Phases I and II.]

Concepts of 'diverse sets' and 'representative sets' of molecules are often used as both subjective and objective ways of describing and selecting collections of molecules. It should, however, be remembered that the descriptor space chosen to work in will always be a molecular-derived one, because that is all we can determine a priori from a molecular structure.
Furthermore, a compound that is chosen as similar or dissimilar to another molecule is only such in the descriptor space used for the selection. Therefore, there is no such thing as a truly universal set of representative molecules for all bioassays, despite the mathematical possibility of deriving one in a particular chemical descriptor space. Selection of subsets of molecules for screening is often carried out by selecting 'representative' molecules from clusters created in a multidimensional chemical descriptor space. For the datasets studied, Bayada et al.34 concluded that Ward's clustering of two-dimensional fingerprints gave the biggest improvement over random selection, while, in a different study, the use of a partitioned chemical descriptor space35 showed how such a space could be used for diverse subset selection. This latter method obviates the problem of some clustering techniques, in which the clusters change as new molecules are added to a study.

Computational library design techniques using appropriate descriptors, particularly methods using genetic algorithms36,37, have become vital because of the need to design more efficient libraries. These methods allow the calculated property profile of a virtual library to be optimised so that it most effectively matches a desired target, such as the properties of a collection of drug-like molecules. They can also cope with the huge combinatorial space that must be examined when selecting monomers for a library that is to be smaller than that theoretically possible. A useful paper by Cramer et al.38 on library design provides a summary of background issues and extensions, while Drewry and Young39 have recently published a comprehensive review of library design methods. A novel procedure, based on the fragmentation of molecules already known to be active at the target receptor or enzyme, has been described to aid the selection of appropriate monomers for inclusion in focussed libraries40. Experience has shown that library design should preferably be based on calculated properties in product space rather than in monomer space41. This requires efficient means to enumerate the product structures of libraries. Synthetic chemists favour software systems42,43 based on chemical transformations that mimic the actual chemistry carried out, as these are more familiar. Alternative methods, which require identifying the common core and appended fragments of a library44,45, are faster once the separate parts of the product have been defined, but this often requires considerable human intervention. Hybrid systems have also been developed45,46.

Strategies for more efficient biological screening continue to evolve. Rather than relying on very large screening campaigns, iterative screening strategies are being explored. These involve screening smaller, selected sets of molecules and using the derived results to define descriptors for the rational selection of a further set of molecules. While this obviously mimics the traditional medicinal chemistry approach of responding to new data, it has taken some time for it to be effectively translated into the libraries paradigm. Statistical tools, such as recursive partitioning47, can assist in this process by identifying which descriptors of a lead should be pursued.
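The cluster-based selection of 'representative' molecules described above can be sketched as follows, assuming RDKit; the Butina clustering algorithm and the 0.35 Tanimoto-distance cutoff are illustrative stand-ins for the Ward's clustering and partitioning schemes of the cited studies.

```python
# A minimal sketch (assuming RDKit) of selecting 'representative' molecules by
# clustering 2D fingerprints; Butina clustering is used purely for illustration.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

def representative_subset(smiles_list, cutoff=0.35):
    """Cluster by Tanimoto distance and keep one centroid per cluster."""
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=1024) for m in mols]
    # Build the lower-triangle distance list expected by Butina.ClusterData.
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)
    # The first member of each cluster is its centroid.
    return [smiles_list[c[0]] for c in clusters]

if __name__ == "__main__":
    library = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "Cc1ccccc1", "Oc1ccccc1"]
    print(representative_subset(library))
```

Tightening the distance cutoff produces more, smaller clusters and therefore a larger representative subset.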
Tools and techniques

In recent years, the computational workhorse of most computational chemistry and informatics groups has been Silicon Graphics computers, particularly for property calculations, molecular graphics and complex data display. IBM, Sun and DEC Alpha servers and workstations have also been used extensively. With the advent of client/server concepts of computing and the deep penetration of web technologies into most computing environments, however, the situation is rapidly changing. Chemoinformatics has evolved through the individual initiatives of many firms; software, hardware, applications and systems have emerged and are now becoming integrated. Since this has not been an organic growth, a major drive towards standardization is needed if the applications are to catch up. This is one of the most crucial action imperatives.

While use of the web is widely accepted for text and image handling, its use as an environment for scientific tools is technically more difficult. Although its familiarity to users makes it an attractive option, exploiting the true benefits of this type of environment may need to wait for the next generation of web languages. Many of the tools developed and applied in chemoinformatics are being adapted to this environment; Molecular Simulations Inc. (San Diego, USA), for example, developed WEBLAB48, a tool addressing this problem. The latest version of HTML incorporates extensions such as XML (extensible mark-up language) and its chemical implementation CML (chemical mark-up language)49. Other companies50,51 continue to develop web plug-ins such as ChemBeans (based on Java) and environments such as MOE. Several tools for visualising raw and derived data are now available: Spotfire52, PARTEK53 and DIVA54 are examples of tools that have appeared in the past year and have value for different aspects of the visualisation and analysis of the volumes of data now being generated.

While tools for making chemoinformatics methods more accessible to bench scientists are important, the receptiveness of medicinal chemists to these techniques requires that their training in statistics, data analysis, visualisation and biomolecular concepts be improved. The interest shown in Lipinski's 'rule of five'55, which succinctly encapsulates some simple parameters concerning drug absorption, shows how eager medicinal chemists are for rules to help design appropriate molecules in the libraries era. As chemists embrace these simplified rules, however, more sophisticated tools and concepts can easily be bypassed. This illustrates a real need both for better end-user tools and training of medicinal chemists, and for readily accessible experts to apply the more advanced methods effectively.

[Fig. 8: Drug discovery funnel. Understand the disease, select the target, design the primary screen, screen about 100,000 compounds, identify 'hits', make the final selection of the best leads, and take about 5 compounds into development.]

Chemical structure databases are listed in Table 1 [source: Daylight]. From the table it is clear that every organization has collected and generated its own library, so the same information is repeated across them. A mechanism or tool should be developed to link all the related information; this would help build a unique database of global interest with one-point access to chemical information.

Technical Issues

Chemoinformatics software from software houses is expensive! Building and maintaining your own solutions is also expensive!
Thus, if you want good tools to derive and use knowledge, you must be prepared to commit significant resources to this area, in terms of hardware, software and people-ware (i.e. effective creators and users of software). Avoiding supplier monopolies and looking for cheaper modules to substitute for outdated or overpriced parts helps keep costs down. This, however, requires software to be assembled in a modular fashion in the first place and to be mutually compatible. Structure representation in the computer in an encoded form is now an almost mature field; nevertheless, many organizations follow their own file formats for storing structures, in addition to their in-house research data. Without internationally agreed standards, much time will continue to be wasted on incompatible file types. There is a need to develop a unique code (a 'unicode') for each molecule, together with its structural descriptors, implemented in all globally available databases as a linking medium irrespective of database type and location in the e-world. Such a unique code would reduce duplication of information. All compounds, including virtual libraries of molecules, should be referred to using this code, just as researchers use the CAS Registry Number for known molecules and compounds.

Current Status

Recent advances in virtual screening track computational capability: as the processing power of computers improves, so do screening speed and complexity. Parameters such as structure, function or chemical space allow for a nearly limitless array of screening options. The use of screening data for development decision-making is predicated on the management and interpretation of the data. Extraction of information from the data is the vital link between theoretical design and the drug candidate. Finally, it is the integration of iterative results, from computation to activity, that drives the cycle forward. Without a proper knowledge base, lead optimization is a search in the vast darkness of chemistry space and may take a drug discovery program in the wrong direction. Establishing a proper database with complete test results can therefore lead to organizational success in drug discovery and development (Fig. 9).

[Fig. 9: Need for an effective chemoinformatics filter. Of the leads identified (100% input), an effective 'chemo' filter during lead optimization gives an 85% survival rate of drug candidates.]

Combinatorial chemistry has opened new strategies for a more comprehensive parallel approach to sweeping and searching chemical space during lead optimization, which has necessitated the development of suitable new library design principles.

Conclusions

The need for improved chemoinformatics systems has been driven by the explosion of raw data coming from library synthesis and HTS operations. Knowledge gained by analysis of these data is only as good as the quality of the data in the first place; however, the increase in the amount of data available has often been at the expense of context and quality. The next phase of the challenge must be to have quality chemoinformatics tools to apply to quality data. Then at least we will have achieved something other than a new name for a continuing problem. The integration of chemical information and drug discovery will completely change the drug discovery process, allowing small and innovative firms to be active in drug discovery.
Table 1: Chemical structure databases [source: Daylight]

Database          | Contents               | Supplier
ACD               | 238,000                | MDL Information Systems Inc.
Aquire            | 5,300                  | EPA
Asinex            | 115,000                | AsInEx Ltd.
ChemReact97       | 470,000 (str)          | InfoChem GmbH
ChemSynth97       | 170,000                | InfoChem GmbH
IBioScreenSC      | 16,000                 | InterBioscreen Ltd.
Maybridge         | 62,000 (subst)         | Maybridge
MedChem           | 36,000 (subst)         | Pomona/BioByte
NCI96             | 120,000                | NCI
SPRESI '95        | 3,200,000              | InfoChem GmbH
SPRESI '95 Preps  | 2.0 million substances | InfoChem GmbH
SpresiReact       | 1,800,000              | InfoChem GmbH
TSCA93            | 100,000                | EPA
WDI               | 60,000 (drugs)         | Derwent

Table 2: Companies sponsoring chemoinformatics products worldwide

Abbott Laboratories; Affymax Research Institute; Aventis Crop Science (France, UK); Aventis Pharma (France, Germany, USA); AstraZeneca UK; Avon Products Inc; Bayer (Germany, USA); Beiersdorf AG; Birmingham University; Boehringer Ingelheim; Cardiff University; CMBI Nijmegen; Celltech R&D Limited; Firmenich SA; GlaxoWellcome Inc; GlaxoWellcome R&D; GlaxoWellcome SpA; Health & Safety Executive; Henkel KGaA; Hoffmann-La Roche (AG, Inc); Instituto Quimico de Sarriá; Janssen Pharmaceutica; Novartis Pharma NV; Organon; Pfizer Inc; Procter & Gamble Company; RW Johnson PRI; Schering AG; Searle Pharmaceuticals; SmithKline Beecham Pharmaceuticals; Sanofi-Synthelabo Group; Takeda Chemical Industries; Unilever Research; University of Leeds; Wyeth-Ayerst Research

Table 3: Chemoinformatics web links (URL)

Company / Organization: NCI 3D; NIST Webbook; CambridgeSoft ACX; Cambridge Crystallographic Data Centre; Beilstein Abstracts; Advanced Chemistry Development Inc; MDL Information Systems Inc; ChemWeb; Daylight Chemical Information Systems Inc; Molecular Simulations Inc (WebLab); Chemical Computing Group Inc; Afferent Systems Inc; Oxford Molecular Inc; Tripos Inc; Synopsys Scientific Systems

Glossary for Chemoinformatics

CML (Chemical Markup Language): http://www.xml-cml.org/

CIS (chemical information system): Must include registration, computed and measured properties, chemical descriptors and inventory.

Chemoinformatics: Increasingly incorporates compound registration into databases, including library enumeration; access to primary and secondary scientific literature; QSARs (quantitative structure/activity relationships) and similar tools for relating activity to structure; physical and chemical property calculations; chemical structure and property databases; chemical library design and analysis; structure-based design and statistical methods.

Chemometrics: The chemical discipline that uses mathematical, statistical and other methods employing formal logic (1) to design or select optimal measurement procedures and experiments, and (2) to provide maximum relevant chemical information by analyzing chemical data.

Computational chemistry: A discipline using mathematical methods for the calculation of molecular properties or for the simulation of molecular behaviour [IUPAC Med Chem].

Data mining: Non-trivial extraction of implicit, previously unknown and potentially useful information from data, or the search for relationships and global patterns that exist in databases.
Data mining tools: Tools for data mining, NCBI, US (http://www.ncbi.nlm.nih.gov/Tools/index.html). Provides access to BLAST, Clusters of Orthologous Groups (COGs), ORF finder, Electronic PCR, UniGene, GeneMap99, VecScreen, Cancer Genome Anatomy Project (CGAP), Cancer Chromosome Aberration Project (cCAP), Human-Mouse Homology Maps, LocusLink, VAST search and genomic data mining.

GUI (Graphical User Interface): The two most useful GUIs are the query interface to the database and the report/analysis interfaces.

in silico: In or by means of a computer simulation; the virtual world of data, analysis, models and designs that resides within a computer. All possible compounds and ideas are contained within this virtual world, more molecules than we can make or afford to test. Estimates of the number of drug-like compounds that could theoretically be made exceed 10^40.

Lipinski's rule of five: So called because the cutoffs for each of the four parameters are all close to five or a multiple of five. The rule states that poor absorption or permeation are more likely when:
- there are more than 5 H-bond donors (expressed as the sum of OHs and NHs);
- the molecular weight (MWT) is over 500;
- the LogP is over 5 (or MLogP is over 4.15);
- there are more than 10 H-bond acceptors (expressed as the sum of Ns and Os).
http://www.acdlabs.com/products/phys_chem_lab/logp/ruleof5.html

"Plug and play" systems: Required for effective chemoinformatics systems. These must be designed backward from the answer to the data to be captured, and should be built from components, each with one simple task.

"Silo systems": The legacy approach of many information systems: a system built to collect, store and report one laboratory's data. Each silo system holds the data differently, may be built on a different technology, and its results cannot easily be interchanged with those of other systems.

SAR (Structure-Activity Relationship): The relationship between chemical structure and pharmacological activity for a series of compounds.

References

1. Brown FK: Chemoinformatics: what is it and how does it impact drug discovery. Annu Rep Med Chem 1998, 33: 375-384.
2. Martin YC: Challenges and prospects for computational aids to molecular diversity. Perspect Drug Discov Des 1997, 7/8: 159-172.
3. Blake JF: Chemoinformatics - predicting the physicochemical properties of drug-like molecules. Curr Opin Biotechnol 2000, 11: 104-107.
4. Obach RS, Baxter JG, Liston TE, Silber MB, MacIntyre F, Rance DJ: The prediction of human pharmacokinetic parameters from preclinical and in vitro metabolism data. J Pharmacol Exp Ther 1997, 283: 46-58.
5. Wessel MD, Jurs PC, Tolan JW, Muskal SM: Prediction of human intestinal absorption of drug compounds from molecular structure. J Chem Inf Comput Sci 1998, 38: 726-735.
6. Buchwald P, Bodor N: Octanol-water partition: searching for predictive models. Curr Med Chem 1998, 5: 353-380.
7. Anon: ClogP. Daylight Chemical Information Software. Mission Viejo, CA: Daylight Chemical Information Inc.
8. Lennernas H: Human intestinal permeability. J Pharm Sci 1998, 87: 403-410.
9. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 1997, 23: 3-25.
10. Mitchell BE, Jurs PC: Prediction of aqueous solubility of organic compounds from molecular structure. J Chem Inf Comput Sci 1998, 38: 489-496.
11. Huuskonen J, Salo M, Taskinen J: Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci 1995, 35: 1039-1045.
12. Wessel MD, Sutter JM, Jurs PC: Prediction of reduced ion mobility constants of organic compounds from molecular structure. Anal Chem 1996, 63: 4237-4243.
13. Palm K, Luthman K, Ungell A-L, Strandlund G, Beigi F, Lundahl P, Artursson P: Evaluation of dynamic polar surface area as a predictor of drug absorption: comparison with other computational and experimental predictors. J Med Chem 1998, 41: 5382-5392.
14. Krarup LH, Christensen IT, Hovgaard L, Frokjaer S: Predicting drug absorption from molecular surface properties based on molecular dynamics simulations. Pharm Res 1998, 15: 972-978.
15. Clark DE: Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 1999, 88: 807-814.
16. Clark DE: Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 2. Prediction of blood-brain barrier penetration. J Pharm Sci 1999, 88: 815-821.
17. Huibers PDT, Katritzky AR: Correlation of the aqueous solubility of hydrocarbons with molecular structure. J Chem Inf Comput Sci 1998, 38: 283-292.
18. Willett P: Chemoinformatics - similarity and diversity in chemical libraries. Curr Opin Biotechnol 2000, 11: 85-88.
19. Leach AR, Hann MM: The in silico world of virtual libraries. Drug Discov Today 2000, 5(8): 326-336.
20a. Carell T, Wintner EA, Bashir-Hashemi A, Rebek J: A solution-phase screening procedure for the isolation of active compounds from a library of molecules. Angew Chem Int Ed Engl 1994, 33: 2061-2064.
20. Kubinyi H: Similarity and dissimilarity: a medicinal chemist's view. Perspect Drug Discov Des 1998, 9-11: 225-252.
21. Brown RD: Descriptors for diversity analysis. Perspect Drug Discov Des 1997, 7/8: 31-49.
22. Pickett SD, Mason JS, McLay IM: Diversity profiling and design using 3D pharmacophores: pharmacophore-derived queries (PDQ). J Chem Inf Comput Sci 1996, 36: 1214-1223.
23. Parks CA, Crippen GM, Topliss JG: The measurement of molecular diversity by receptor site interaction simulation. J Comput Aided Mol Des 1998, 12: 441-449.
24. Kubinyi H, Folkers G, Martin YC: 3D QSAR in drug design. Theory, methods and applications. Perspect Drug Discov Des 1998, 9-11: v-vii.
25. Kubinyi H, Folkers G, Martin YC: 3D QSAR in drug design. Theory, methods and applications. Perspect Drug Discov Des 1998, 12-14: v-vii.
26. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical databases. J Chem Inf Comput Sci 1999, 39: 1-10.
27. Cummins DJ, Andrews CW, Bentley JA, Cory M: Molecular diversity in chemical databases: comparison of medicinal chemistry knowledge bases and databases of commercially available compounds. J Chem Inf Comput Sci 1996, 36: 750-763.
28. Martin EJ, Blaney JM, Siani MA, Spellmeyer DC, Wong AK, Moos WH: Measuring diversity: experimental design of combinatorial libraries for drug discovery. J Med Chem 1995, 38: 1431-1436.
29. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical databases. J Chem Inf Comput Sci 1999, 39: 1-10.
30. Willett P, Barnard JM, Downs GM: Chemical similarity searching. J Chem Inf Comput Sci 1998, 38: 983-996.
31. Martin YC, Brown RD, Bures MG: Quantifying diversity. In Combinatorial Chemistry and Molecular Diversity in Drug Discovery. Edited by Gordon M, Kerwin JF. New York: Wiley-Liss; 1998: 369-385.
32. Rarey M, Dixon JS: Feature trees: a new molecular similarity measure based on tree matching. J Comput Aided Mol Des 1998, 12: 471-490.
33. Good AC, Richards WG: Explicit calculation of 3D molecular similarity. Perspect Drug Discov Des 1998, 9-11: 321-338.
34. Bayada DM, Hamersma H, van Geerestein VJ: Molecular diversity and representativity in chemical databases. J Chem Inf Comput Sci 1999, 39: 1-10.
35. Menard PR, Mason JS, Morize I, Bauerschmidt S: Chemistry space metrics in diversity analysis, library design, and compound selection. J Chem Inf Comput Sci 1998, 38: 1204-1213.
36. Gillet VJ, Willett P, Bradshaw J, Green DVS: Selecting combinatorial libraries to optimize diversity and physical properties. J Chem Inf Comput Sci 1999, 39: 169-177.
37. Brown RD, Martin YC: Designing combinatorial library mixtures using a genetic algorithm. J Med Chem 1997, 40: 2304-2313.
38. Cramer RD, Patterson DE, Clark RD, Soltanshahi F, Lawless MS: Virtual compound libraries: a new approach to decision making in molecular discovery research. J Chem Inf Comput Sci 1998, 38: 1010-1023.
39. Drewry D, Young S: Approaches to the design of combinatorial libraries. Chemometr Intell Lab Syst 1999, 48: 1-20.
40. Lewell XQ, Judd D, Watson S, Hann M: RECAP - retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci 1998, 38: 511-522.
41. Gillet V, Willett P, Bradshaw J: The effectiveness of reactant pools for generating structurally diverse combinatorial libraries. J Chem Inf Comput Sci 1997, 37: 731-740.
42. Daylight Chemical Information Systems Inc. on the World Wide Web, URL http://www.daylight.com/.
43. Afferent Systems Inc. on the World Wide Web, URL http://www.afferent.com/.
44. Molecular Design Limited, Information Systems Inc. on the World Wide Web, URL http://www.MDLi.com/tech/centrallib.html/.
45. Tripos, Inc. on the World Wide Web, URL http://www.tripos.com/.
46. Synopsys Scientific Systems on the World Wide Web, URL http://www.synopsys.co.uk/.
47. Chen X, Rusinko A, Young SS: Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J Chem Inf Comput Sci 1998, 38: 1054-1062.
48. Molecular Simulations Inc. on the World Wide Web, URL http://www.msi.com.
49. Rzepa HS, Murray-Rust P, Whitaker BJ: The application of chemical multipurpose internet mail extensions (chemical MIME) internet standards to electronic mail and World Wide Web information exchange. J Chem Inf Comput Sci 1998, 38: 976-982.
50. Cherwell Scientific Publishing Ltd. on the World Wide Web, URL http://www.cherwell.com/.
51. Chemical Computing Group Inc. on the World Wide Web, URL http://www.chemcomp.com/.
52. Spotfire Inc. on the World Wide Web, URL http://www.spotfire.com/.
53. Partek Inc. on the World Wide Web, URL http://www.partek.com/.
54. Oxford Molecular Group on the World Wide Web, URL http://www.oxmol.co.uk/.
55. Lipinski CA, Lombardo F, Dominy BW, Feeney PJ: Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev 1997, 23: 3-25.

Figure captions
Fig. 1: Progress in drug discovery with time.
Fig. 2: Chemical information systems address all related areas.
Fig. 3: The molecular paradigm.
Fig. 4: Describing chemical structure information (functional group, molecular size).
Fig. 5: Combinatorial synthesis (virtual library generation with 19 L-amino acids).
Fig. 6: Knowledge-based drug design.
Fig. 7: Drug discovery cycle.
Fig. 8: Drug discovery funnel (about 100,000 compounds screened down to about 5 development candidates).
Fig. 9: Leads to drug candidates: the need for an effective chemoinformatics filter.