A Comprehensive Course on Chemoinformatics
Day 1

1. Introduction to Chemoinformatics

The term Chemoinformatics was coined only a few years ago, but it rapidly gained widespread use. Workshops and symposia devoted entirely to Chemoinformatics are being organized, and journals are full of advertisements for Chemoinformatics specialists. Because the name is so new, there are still different views on the scope of the field. Brown, in his definition, confined Chemoinformatics to the drug discovery process, and quite a few people have followed this lead. We prefer a broader definition:

Chemoinformatics is the application of informatics methods to solve chemical problems.

Over the last four decades, attempts have been made in many areas of chemistry to develop informatics methods and software tools that further our insight into chemistry and assist in the solution of chemical problems. In this sense, Chemoinformatics is not a discipline that was born overnight; it has a long history and many roots.

Chemistry is a scientific discipline built largely on experimental observations and data. However, the amount of data and information accumulated is enormous, and this mountain of data (or is it a haystack?) is growing at an ever-increasing rate. The problem is then to extract knowledge from these data and this information in order to make predictions. This is where Chemoinformatics comes in.

Chemoinformatics should assist the chemist in solving some of the most fundamental problems he or she faces in day-to-day work:

- the design of molecules that have a desired property
- the design of reactions and syntheses to make these compounds
- the analysis and elucidation of structures isolated from the environment or obtained in reactions

The relationships embodied in these problems are in most cases too complex to be solved from first principles.
Nevertheless, chemists have been solving such problems for over a century. They have done so by building their knowledge on known facts, ordering these facts in a systematic manner, building models of the relationships inherent in these data, abstracting information to its essential features, and making predictions by analogy. This process is called inductive learning, and it has thus played an essential role in chemistry. Methods and software systems have been developed to assist in this inductive learning process on a scale that now allows the processing of huge amounts of data.

Computer methods have invaded most areas of chemistry, computers are on the desk of nearly every chemist, and computer methods will change the way we do and perceive chemistry. The teaching of Chemoinformatics has therefore become an urgent need. In fact, several universities already offer Chemoinformatics curricula. We do not expect every university to offer an entire curriculum in Chemoinformatics, but each country certainly needs a few places where Chemoinformatics is taught. Furthermore, we think that the essential ideas and methods of Chemoinformatics have to be integrated into any chemistry curriculum.

2. Representation of Chemical Compounds

This chapter focuses on chemical compounds and introduces various methods for representing them in computer-readable form. These methods can be used for the input and output of chemical structures to and from computer programs and databases.

In communicating information on the structure of chemical compounds, chemists use various depths of perception: from characterizing a compound by its name, to drawing a two-dimensional image, to providing a three-dimensional molecular model.
This hierarchy is also reflected in the various levels of sophistication for representing chemical structures in electronic form: from linear notations, through chemical graphs, to 3D structure representations and, finally, to molecular surfaces.

The international language for representing chemical structures is graphical in nature: the structure diagram. Mathematically, a structure diagram can be considered a graph, and many problems faced in processing structure information can therefore build on graph theory. Important tasks in processing structure information are canonical numbering, which yields a unique and unambiguous structure representation, and the perception of constitutional symmetry. The presence of rings in chemical structures profoundly changes their properties; ring perception is therefore an important task in structure analysis.

Molecules are three-dimensional objects, and the stereochemical features of molecules therefore have to be handled. Furthermore, methods have been developed to automatically generate a 3D molecular model from a connection table, that is, from the constitution of a molecule. Generating a 3D molecular model immediately raises the problem of conformational flexibility, since molecules can change their 3D structure by relatively free rotation around single bonds. Conformational flexibility allows, for example, a drug molecule to adjust its shape to the structural requirements of a receptor protein.

The widespread use of ball-and-stick molecular models, with balls representing atoms and sticks representing bonds, should not make us forget that atoms are spheres that penetrate each other on bond formation; molecules thus have surfaces and shape. The human eye is the best pattern recognizer, and any sophisticated communication of information on chemical structures should therefore be visual in nature; the computer visualization of molecular structures has attained a high state of development.
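
To make the graph view of a structure diagram more concrete, the following is a minimal Python sketch, not a production method. It stores a hydrogen-suppressed connection table as an adjacency list, counts the independent rings with the cyclomatic number (bonds - atoms + connected components), and groups atoms into candidate constitutional symmetry classes by iterative neighborhood refinement. All function names and the toluene atom numbering are assumptions made for illustration. The refinement is a simple stand-in for the canonical numbering and symmetry perception algorithms mentioned above: it never separates atoms that are truly equivalent, but it can fail to separate non-equivalent atoms in highly regular graphs. Bond orders and stereochemistry are ignored.

```python
def build_adjacency(n_atoms, bonds):
    """Turn a bond list (pairs of atom indices) into an adjacency list."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return components


def ring_count(adj):
    """Number of independent rings: bonds - atoms + components."""
    n_bonds = sum(len(nbrs) for nbrs in adj.values()) // 2
    return n_bonds - len(adj) + count_components(adj)


def _rank(keys):
    """Replace arbitrary sortable keys by small integer class labels."""
    order = {k: i for i, k in enumerate(sorted(set(keys.values())))}
    return {atom: order[k] for atom, k in keys.items()}


def symmetry_classes(adj, elements):
    """Refine atom classes until the number of classes stops growing.

    Start from (element, degree); in each round an atom's new class is
    determined by its own class plus the sorted classes of its neighbors.
    """
    classes = _rank({a: (elements[a], len(adj[a])) for a in adj})
    while True:
        signatures = {
            a: (classes[a], tuple(sorted(classes[n] for n in adj[a])))
            for a in adj
        }
        refined = _rank(signatures)
        if len(set(refined.values())) == len(set(classes.values())):
            return refined
        classes = refined


# Toluene, heavy atoms only: atom 0 is the methyl carbon, atoms 1-6 the ring.
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
toluene = build_adjacency(7, toluene_bonds)
toluene_classes = symmetry_classes(toluene, ["C"] * 7)

print("rings:", ring_count(toluene))
print("classes:", toluene_classes)
```

For toluene the sketch places the two ortho carbons (atoms 2 and 6) in one class and the two meta carbons (atoms 3 and 5) in another, while the methyl, ipso, and para carbons each form a class of their own. Note that the ring count only says how many independent rings exist; identifying which atoms form which ring is a separate ring perception step that this sketch does not attempt.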