A Comprehensive Course on Chemoinformatics

Day 1
1. Introduction to Chemoinformatics
The term Chemoinformatics was coined only a few years ago, but it rapidly gained
widespread use. Entire workshops and symposia are devoted to Chemoinformatics,
and journals are full of advertisements for positions for Chemoinformatics
specialists.
Because the name Chemoinformatics is so new, there are still different views on the
scope of the field. In his definition, Brown confined Chemoinformatics to the drug
discovery process, and quite a few people have followed this lead.
We prefer a broader definition of Chemoinformatics:
Chemoinformatics is the application of informatics methods to solve chemical
problems.
In the last four decades attempts have been made in many areas of chemistry to
develop informatics methods and software tools to further our insight into chemistry
and to assist in the solution of chemical problems. In this sense, Chemoinformatics is
not a discipline that was born overnight but has a long history and many roots.
Chemistry is a scientific discipline that is largely built on experimental
observations and data. However, the amount of data and information accumulated is
enormous, and this mountain of data (or is it a haystack?) is growing at an
ever-increasing rate. The problem is then to extract knowledge from these data and
information in order to make predictions.
This is where Chemoinformatics comes in. Chemoinformatics should assist the
chemist in solving some of the most fundamental problems he or she faces in
day-to-day work:
• the design of molecules that have a desired property
• the design of reactions and syntheses to make these compounds
• the analysis and elucidation of structures isolated in the environment or obtained
  in reactions
The relationships embodied in these problems are in most cases too complex to be
solved from first principles. Nevertheless, chemists have solved such problems for
over a century. They have done so by building their knowledge on known facts,
ordering these facts in a systematic manner, building models of the inherent
relationships in these data, abstracting information to its essential features, and
making predictions by analogy. This process is called inductive learning, and it
has thus played an essential role in chemistry. Methods and software
systems have been developed to assist in this inductive learning process on a scale
that now allows the processing of huge amounts of data.
Computer methods have entered most areas of chemistry, computers are on the
desk of nearly every chemist, and computer methods will change the way we do and
perceive chemistry. The teaching of Chemoinformatics has therefore become an
urgent need. In fact, several universities already offer Chemoinformatics curricula.
We do not expect every university to offer an entire curriculum in
Chemoinformatics, but each country certainly needs a few places where
Chemoinformatics is taught. Furthermore, we think that the essential ideas and methods
of Chemoinformatics have to be integrated into any chemistry curriculum.
2. Representation of Chemical Compounds
This chapter focuses on chemical compounds and introduces various methods for
their representation in computer-readable form. These methods can be used to
enter chemical structures into computer programs and databases and to output
them again.
In communicating information on the structure of chemical compounds, chemists use
various depths of perception: from characterizing a compound by its name, through
drawing a two-dimensional image, to providing a three-dimensional molecular
model. This hierarchy is also reflected in various levels of sophistication in
representing chemical structures in electronic form, from linear notations, through
chemical graphs, to 3D structure representations and, finally, to molecular surfaces.
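
The first two levels of this hierarchy can be illustrated with a minimal sketch. It stores ethanol once as a linear notation (the SMILES string "CCO") and once as a small connection table, and then derives the molecular formula from the table. The names ETHANOL_ATOMS, ETHANOL_BONDS, VALENCE, implicit_hydrogens and molecular_formula are hypothetical and chosen only for this illustration, and the valence table is a simplifying assumption that holds for neutral C, N and O only.

```python
from collections import Counter

# Linear notation: ethanol as a SMILES string (hydrogens are implicit).
ETHANOL_SMILES = "CCO"

# Connection table: an atom list plus a bond list.
# Each bond is (index of atom 1, index of atom 2, bond order).
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = [(0, 1, 1), (1, 2, 1)]

# Assumed standard valences, used only to fill in implicit hydrogens.
VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(atoms, bonds):
    """Return the number of implicit hydrogens on each heavy atom."""
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    return [VALENCE[element] - u for element, u in zip(atoms, used)]


def molecular_formula(atoms, bonds):
    """Build a formula string: carbon first, hydrogen second, rest alphabetical."""
    counts = Counter(atoms)
    hydrogens = sum(implicit_hydrogens(atoms, bonds))
    if hydrogens:
        counts["H"] += hydrogens
    ordered = [el for el in ("C", "H") if el in counts]
    ordered += sorted(el for el in counts if el not in ("C", "H"))
    return "".join(
        el + (str(counts[el]) if counts[el] > 1 else "") for el in ordered
    )


print(ETHANOL_SMILES, "->", molecular_formula(ETHANOL_ATOMS, ETHANOL_BONDS))
```

The linear notation is compact and easy to type, whereas the connection table makes every atom and bond directly addressable, which is what graph-based processing needs.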
The international language for representing chemical structures is graphical in nature:
the structure diagram. Mathematically, a structure diagram can be considered as a
graph, and therefore many problems faced in processing structure information can
build on Graph Theory. Important tasks in processing structure information are
canonical numbering, in order to arrive at a unique and unambiguous structure
representation, and the perception of constitutional symmetry. The presence of rings
in chemical structures profoundly changes their properties; ring perception is
therefore an important task in structure analysis.
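
As a rough illustration of treating a structure diagram as a graph, the sketch below works on the heavy-atom skeleton of toluene. It counts the independent rings as bonds minus atoms plus connected components (the cyclomatic number of the graph), and it groups atoms into candidate constitutional symmetry classes by iteratively refining atom labels with the labels of their neighbours. This refinement is a Morgan-style scheme chosen for brevity; it is not necessarily the canonical numbering algorithm taught in this course, it ignores element types and bond orders, and in highly regular graphs it can place atoms in one class that are not truly equivalent. All function names are hypothetical.

```python
def build_adjacency(n_atoms, bonds):
    """Turn a bond list of (atom, atom) pairs into a list of neighbour sets."""
    adjacency = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def count_components(adjacency):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in range(len(adjacency)):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


def ring_count(n_atoms, bonds):
    """Number of independent rings: bonds - atoms + components."""
    adjacency = build_adjacency(n_atoms, bonds)
    return len(bonds) - n_atoms + count_components(adjacency)


def symmetry_classes(n_atoms, bonds):
    """Assign a class label to each atom by iterative neighbourhood refinement."""
    adjacency = build_adjacency(n_atoms, bonds)
    labels = [len(adjacency[i]) for i in range(n_atoms)]  # start from the degree
    while True:
        # An atom's signature is its own label plus its neighbours' labels.
        signatures = [
            (labels[i], tuple(sorted(labels[j] for j in adjacency[i])))
            for i in range(n_atoms)
        ]
        ranking = {sig: rank for rank, sig in enumerate(sorted(set(signatures)))}
        new_labels = [ranking[sig] for sig in signatures]
        # Stop once a round no longer splits any class.
        if len(set(new_labels)) == len(set(labels)):
            return new_labels
        labels = new_labels


# Toluene skeleton: atom 0 is the methyl carbon, atoms 1 to 6 form the ring.
TOLUENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

print("rings:", ring_count(7, TOLUENE_BONDS))
print("class of each atom:", symmetry_classes(7, TOLUENE_BONDS))
```

Atoms that receive the same label, such as the two ortho carbons, are candidates for constitutional equivalence. A full canonical numbering would additionally have to break the remaining ties in a reproducible way.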
Molecules are three-dimensional objects and, therefore, the stereochemical features
of molecules have to be handled. Furthermore, methods have been developed to
automatically generate a 3D molecular model from a connection table, that is, from
the constitution of a molecule. With the generation of a 3D molecular model one is
immediately faced with the problem of conformational flexibility as molecules can
change their 3D structure by relatively free rotations around single bonds.
Conformational flexibility allows, for example, a drug molecule to adjust its shape to
the structural requirements of a receptor protein.
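
Rotation around a single bond is usually described by a torsion (dihedral) angle defined by four consecutive atoms. The following sketch, under simplifying assumptions, measures that angle from 3D coordinates and rotates one atom about the central bond to produce a series of conformations. The coordinates are invented for illustration, the function names are hypothetical, and only a single atom is moved, whereas in a real molecule every atom on one side of the bond would rotate together.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    return math.sqrt(dot(a, a))


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range -180 to 180."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = norm(b2) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rotate_about_bond(point, axis_start, axis_end, degrees):
    """Rotate a point about the axis_start -> axis_end bond (Rodrigues formula)."""
    axis = sub(axis_end, axis_start)
    length = norm(axis)
    k = tuple(c / length for c in axis)  # unit vector along the bond
    v = sub(point, axis_start)
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    rotated = tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    )
    return tuple(r + s for r, s in zip(rotated, axis_start))


# Four atoms in an eclipsed arrangement (made-up coordinates).
P0 = (1.0, 0.0, 0.0)
P1 = (0.0, 0.0, 0.0)
P2 = (0.0, 0.0, 1.5)
P3 = (1.0, 0.0, 1.5)

# Sample conformations by rotating the last atom in 60 degree steps.
for step in range(0, 180, 60):
    moved = rotate_about_bond(P3, P1, P2, step)
    print(step, round(torsion_angle(P0, P1, P2, moved), 1))
```

Bond lengths and bond angles stay fixed during such a rotation; only the torsion angle changes, which is why conformational searches typically sample torsion angles rather than Cartesian coordinates.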
The widespread use of ball-and-stick molecular models, the balls representing atoms
and the sticks bonds, should not make us forget that atoms are spheres that
penetrate each other on bond formation, and, thus, molecules have surfaces, have
The human eye is the best pattern recognizer and, therefore, any sophisticated
communication of information on chemical structures should be visual in nature; the
computer visualization of molecular structures has attained a high state of