Student notes - CBRG

Department of Computer Science
ETH Zürich
Scientific Databases
Storing and Sharing Mathematical Objects: OpenMath.
Chemical Formula Representation: InChI™.
Lecture 5
Lorenzo Gatti, Aliaksandr Yudzin
Wednesday 12th December, 2012
5.1 Introduction
Storing and sharing complex scientific objects such as mathematical equations and chemical formulae
is very important but also quite challenging. The first and most obvious difficulty is that a
mathematical formula is much more complex to typeset and visualize than plain text, but the
real challenges start when we have to store a formula and then search for it, or represent objects
in a machine-readable way and exchange them between different applications. In this lecture we
discuss the state of the art in dealing with all these issues.
5.2 Different levels of representation
Mathematical formulae and chemical objects can be encoded at different levels of human and machine
understandability. The first and simplest level is the notational level (representation level). At this
level one captures only what the equation looks like, without caring about the meaning of the stored
object: the only goal is to make sure that the object is rendered properly. Among the most popular
notation-level markup languages for mathematical objects are MathML [2], LaTeX [4] and MathType [5].
At the other extreme is the purely semantic level of representation, at which the application has the
deepest understanding of a mathematical object and can perform computations on it. This
representation level is necessary for Computer Algebra Systems and theorem provers.
Somewhere in between these two extremes lies the content level, whose aim is to encode the
structure and, to a limited extent, the semantics of mathematical formulae. MathML Content[a] and
OpenMath [1] are examples of markup languages that encode formulae at this level. This intermediate
level is also useful for communication between different applications, let's say between a front-end
equation editor and a back-end Computer Algebra System, or between two machines.
5.3 Complex objects representation: problem statement
Our goal is to represent a complex object in a formal, machine-readable way. In essence, we need
an algorithm that converts any formula or chemical object into a formal language and, of course,
converts it back in a unique way. To give an idea of the problems that may arise, let us imagine
having to represent a mathematical object. Some objects, like matrices, cause no issues either at the
representation level or at the semantic level, but others, like expressions, do.
[a] http://www.w3.org/TR/MathML2/chapter4.html#contm.intro
Encoding formulae at the notational level is quite straightforward as well, but the content and
semantic levels are less obvious. If we want to represent (a + b), we need to consider that a and b
are variables and + is an operation. We also have to apply some simple mathematical rules: for
example, the expressions (a + b) and (b + a) are the same from an algebraic point of view (because
of the commutative property) but different at the representation level, and we have to take this into
account. Multiplication behaves differently: the expressions (a × b) and (b × a) are in general equal
only if a and b are not matrices, since matrix multiplication is not commutative. We might also want
to consider some more algebraic rules, such as the following:
• 1 × a = a
• a + 0 = a
• a(b + c) = ab + ac
Deciding where to stop, i.e. restricting the set of algebraic rules to consider, is a critical point: the
desire to take every aspect of mathematical semantics into account could make the system too
cumbersome and the language impossible to use.
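The trade-off above can be made concrete with a minimal Python sketch, assuming a hypothetical `Op` node type and a hypothetical set `MATRICES` of symbols that denote matrices. It applies only the rules 1 × a = a and a + 0 = a plus commutativity (for × only when no matrix is involved), and deliberately leaves out the distributive law so that the rule set stays small.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical expression representation for this sketch: leaves are variable
# names (str) or integers; internal nodes are operators with a tuple of operands.
@dataclass(frozen=True)
class Op:
    name: str                  # "+" or "*"
    args: Tuple["Expr", ...]

Expr = Union[str, int, Op]

# Assumption: these symbol names denote matrices, for which "*" is not commutative.
MATRICES = {"M", "N"}

def to_str(e: Expr) -> str:
    """Plain string form, also used as the sort key for operands."""
    if isinstance(e, Op):
        return "(" + e.name.join(to_str(a) for a in e.args) + ")"
    return str(e)

def is_scalar(e: Expr) -> bool:
    if isinstance(e, Op):
        return all(is_scalar(a) for a in e.args)
    return e not in MATRICES

def normalize(e: Expr) -> Expr:
    """Apply a deliberately small rule set: a + 0 = a, 1 * a = a, and
    commutativity of + (always) and of * (only when no matrix is involved).
    The distributive law a(b + c) = ab + ac is intentionally left out."""
    if not isinstance(e, Op):
        return e
    args = [normalize(a) for a in e.args]
    if e.name == "+":
        args = [a for a in args if a != 0]      # a + 0 = a
        args.sort(key=to_str)                   # a + b = b + a
    elif e.name == "*":
        args = [a for a in args if a != 1]      # 1 * a = a
        if all(is_scalar(a) for a in args):
            args.sort(key=to_str)               # a * b = b * a for scalars only
    if not args:                                # e.g. 0 + 0 or 1 * 1
        return 0 if e.name == "+" else 1
    if len(args) == 1:
        return args[0]
    return Op(e.name, tuple(args))
```

With this rule set, (a + b) and (b + a) normalize to the same object, M × N and N × M stay different, and a(b + c) is not identified with ab + ac because that rule was left out on purpose.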
5.3.1 Representation of complex scientific objects: main idea
The most common way of representing the semantic level of complex objects is a graph. Graphs have
some nice features: for example, they make it possible to apply the commutative law to a formula (see
above) or to represent structural variations of a chemical. Of course, mathematical and chemical
objects each have their own specific features. For example, managing mathematical objects does not
require handling structural variations, thanks to the intrinsic logic of this type of object. Moreover,
unlike chemical structures, mathematical formulae have no cycles or loops, but they come with
symmetry rules (such as commutativity) to account for.
For both mathematical objects and chemicals, once we are able to represent them properly, we would
like each object to have a unique representation.
The main idea is to represent the object as a graph (a binary tree or a generic tree), where nodes
are operands (a, b) or operations (+), connected so as to represent the formula. One possible
representation uses nested parentheses. The correspondence between trees and nested parentheses
was noticed by Arthur Cayley in 1857, and it is the basis of the Newick tree format[b].
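As an illustration (not taken from the notes), the following Python sketch writes a tree as nested parentheses. It assumes a hypothetical layout in which a leaf is a string and an internal node is a pair (children, label); for the expression (7 + 3) ∗ 10 of Figure 5.2 it produces the Newick string with the operators as internal-node labels (written here with the ASCII `*`).

```python
from typing import Union

# A tree is either a leaf label (str) or a pair (children, label) for an
# internal node. This tuple layout is an assumption made for this sketch.
Tree = Union[str, tuple]

def newick(tree: Tree) -> str:
    """Serialize a tree as nested parentheses: (child1,child2,...)label"""
    if isinstance(tree, str):
        return tree
    children, label = tree
    return "(" + ",".join(newick(c) for c in children) + ")" + label

# (7 + 3) * 10: operators label the internal nodes, operands are the leaves.
expr = ([(["7", "3"], "+"), "10"], "*")
print(newick(expr) + ";")   # ((7,3)+,10)*;
```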
Figure 5.1: Generic hierarchical tree structure.
Figure 5.2: Hierarchical tree structure for the mathematical expression (7 + 3) ∗ 10, corresponding to the Newick string ((7, 3)+, 10)∗;
[b] http://www.ctu.edu.vn/~dvxe/Bioinformatic/Software/Rod%20Page/Newick%20tree%20format.htm
The following Newick strings both describe the tree in Figure 5.1:
(A, B), ((C, D), E)    (5.1)
(A, B), ((D, C), E)    (5.2)
The graph representation still has some issues. For example, we have to handle multiple
representations whenever a commutative operator such as + is present, because trees are prone to
branch swapping: strings (5.1) and (5.2) describe the same tree. Conversely, when the operator is not
commutative, as with matrix multiplication, the objects represented by these two strings are not the
same. So, because of branch permutations, describing two subtrees with the same parent can be
ambiguous, and this happens every time we encounter a commutative operator. The same format can
be used to represent chemical compounds as well; however, a compound represented by an acyclic
graph leads to the same problem, because the tree has no intrinsic orientation (no fixed order of its
branches).
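One common way to remove the ambiguity caused by branch swapping is to sort the children of every commutative node before printing. The Python sketch below assumes trees written as a leaf string or a (children, label) pair, and a hypothetical predicate `commutative(label)`; it maps strings (5.1) and (5.2) to the same canonical form, while keeping the branch order of a non-commutative matrix product. For trees this works because sorting the children's strings at every node is enough.

```python
def canonical_newick(tree, commutative=lambda label: True):
    """Newick-style string in which the children of each commutative node are
    sorted, so that swapped branches yield exactly the same string."""
    if isinstance(tree, str):            # leaf
        return tree
    children, label = tree               # internal node: (children, label)
    parts = [canonical_newick(c, commutative) for c in children]
    if commutative(label):
        parts.sort()                     # branch order no longer matters
    return "(" + ",".join(parts) + ")" + label

# The two strings (5.1) and (5.2); internal nodes are unlabelled ("").
t1 = ([(["A", "B"], ""), ([(["C", "D"], ""), "E"], "")], "")
t2 = ([(["A", "B"], ""), ([(["D", "C"], ""), "E"], "")], "")
print(canonical_newick(t1) == canonical_newick(t2))   # True

def keep_order(label):
    """Hypothetical rule: a matrix product must keep its branch order."""
    return label != "matmul"

m1 = (["M", "N"], "matmul")
m2 = (["N", "M"], "matmul")
print(canonical_newick(m1, keep_order) == canonical_newick(m2, keep_order))  # False
```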
Deciding whether two such structures are identical is a hard problem in general: it is an instance of
the "graph isomorphism" problem, for which no polynomial-time algorithm is known when the nodes
are unlabelled, as in Figure 5.3.
Figure 5.3: Graph isomorphism: unlabelled 5-nodes graphs.
The concept of "isomorphism", and of "graph isomorphism" in particular, captures the notion that some
objects have "the same structure" if one ignores individual distinctions of their "atomic" components
(vertices and edges, for graphs). Whenever the individuality of "atomic" components is important for
correctly representing whatever is modelled by graphs, the model is refined by imposing additional
restrictions on the structure, and other mathematical objects are used: digraphs, labelled graphs,
coloured graphs, rooted trees and so on. Two graphs are isomorphic if their nodes can be rearranged
(without breaking or adding any edge) so that the two graphs become identical, ignoring the labels on
the nodes.
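To see why unlabelled graphs are hard, here is a naive Python isomorphism test (an illustration, not an algorithm from the notes): it simply tries all n! ways of renaming the nodes, which is fine for the 5-node graphs of Figure 5.3 but explodes quickly as n grows. Practical software uses much more refined strategies.

```python
from itertools import permutations

def are_isomorphic(n, edges1, edges2):
    """Brute-force isomorphism test for two simple unlabelled graphs on nodes
    0..n-1: try every relabelling (n! of them), so only usable for tiny graphs."""
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    if len(e1) != len(e2):
        return False
    for perm in permutations(range(n)):
        # Rename node u as perm[u] and check whether the edge sets coincide.
        if {frozenset(perm[u] for u in edge) for edge in e1} == e2:
            return True
    return False

cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
pentagram = [(0, 2), (2, 4), (4, 1), (1, 3), (3, 0)]   # also a 5-cycle
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(are_isomorphic(5, cycle, pentagram))   # True
print(are_isomorphic(5, path, star))         # False
```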
5.4 Application: InChI™
IUPAC (International Union of Pure and Applied Chemistry) [6] is the organization responsible for
chemical nomenclature, i.e. the rules that assign a unique name to new chemicals. They recognised the
need to assign a formal description to chemical compounds in order to search databases through a
unique identifier. The kind of problem they faced shows up immediately with simple chemical
formulae, which can represent more than one compound at the same time (Figure 5.4): the formula is
a single string, but it is ambiguous.
A widely used earlier solution is the simplified molecular-input line-entry system (SMILES), which
specifies the structure of chemical molecules as a line notation using short ASCII strings. Basically,
the representation starts at some atom of the molecule and performs a depth-first search, mainly
describing the bonds between the atoms.
Figure 5.4: (a) m-cresol (3-methylphenol), (b) p-cresol (4-methylphenol), (c) o-cresol (2-methylphenol). These compounds all share the same chemical formula C7H8O and the conventional name cresol, while they have different structures according to the positions of their substituents.
In terms of a graph-based computational procedure, a SMILES string is obtained by printing the
symbol nodes encountered in a depth-first traversal of a chemical graph. The chemical graph (Figure
5.5a) is first trimmed to remove hydrogen atoms; then cycles are broken (Figure 5.5b) to turn it into
a spanning tree (Figure 5.5c). Where cycles have been broken, numeric suffix labels are included to
indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
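The procedure just described can be sketched in Python as follows. This is a simplified illustration, not a full SMILES writer: it assumes a hydrogen-suppressed graph with single bonds only (no bond orders, aromaticity or charges) and fewer than ten rings. A first depth-first search separates spanning-tree edges from ring-closure edges; a second pass prints atoms, ring digits and parenthesized branches.

```python
def to_smiles(atoms, bonds, start=0):
    """SMILES-like string for a hydrogen-suppressed graph with single bonds only.
    atoms: list of element symbols; bonds: list of (i, j) index pairs."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    # Pass 1: depth-first search splits the edges into spanning-tree edges
    # and ring-closure edges (the edges that "break" the cycles).
    parent = {start: None}
    children = {i: [] for i in adj}
    ring_edges, seen = [], set()

    def dfs(u):
        for v in adj[u]:
            if v not in parent:                      # unvisited: tree edge
                parent[v] = u
                children[u].append(v)
                dfs(v)
            elif v != parent[u] and frozenset((u, v)) not in seen:
                seen.add(frozenset((u, v)))          # back edge: ring closure
                ring_edges.append((u, v))

    dfs(start)

    # Pass 2: each ring-closure edge gets a digit, printed at both of its ends.
    digits = {i: "" for i in adj}
    for n, (u, v) in enumerate(ring_edges, start=1):
        digits[u] += str(n)                          # assumption: fewer than 10 rings
        digits[v] += str(n)

    def write(u):
        s = atoms[u] + digits[u]
        kids = children[u]
        for c in kids[:-1]:
            s += "(" + write(c) + ")"                # parentheses mark branch points
        if kids:
            s += write(kids[-1])
        return s

    return write(start)

# Cyclohexanol skeleton: a six-carbon ring (atoms 0-5) with an oxygen on atom 0.
atoms = ["C"] * 6 + ["O"]
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
print(to_smiles(atoms, bonds, start=0))   # C1(CCCCC1)O
print(to_smiles(atoms, bonds, start=6))   # OC1CCCCC1
```

Starting the traversal at a different atom gives a different but equally valid string for the same molecule, which is exactly the non-uniqueness problem of SMILES.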
SMILES strings are known not to be unique (the same molecule can be written starting from different
atoms and traversing the branches in different orders), so IUPAC developed InChI™ (International
Chemical Identifier) [3], a standard whose aim is to give a unique string for every molecule.
An example of the main differences in representing the same molecule is shown in Figure 5.6.
Figure 5.5: Deriving the SMILES representation of a chemical molecule. Shown example: ciprofloxacin, a fluoroquinolone antibiotic, chemical formula C17H18FN3O3. Panels: (a) systematic name: 1-cyclopropyl-6-fluoro-4-oxo-7-(piperazin-1-yl)-quinoline-3-carboxylic acid; (b) trimming and cycle breaking; (c) spanning tree reconstruction; (d) SMILES string:
N1CCN(CC1)C(C(F)=C2)=CC(=C2C4=O)N(C3CC3)C=C4C(=O)O
5.4.1 InChI™: string format details
InChI™ strings start with "InChI=", followed by the version number, currently 1 (standard InChIs add
the letter S, as in "InChI=1S/"). An InChI™ is a text string composed of segments (layers) separated
by delimiters (/). The more complicated the structure, the more complicated the string. If a structure
consists of multiple disconnected parts, semicolons within each layer separate them. No white space is
allowed inside an InChI™ string. The format has been designed for compactness, not readability, but
it can be interpreted manually. The length of the string is roughly proportional to the number of
atoms in the compound. Numbers inside the layers refer to the canonical numbering of the atoms in
the chemical formula of the first layer (hydrogen atoms excluded)[c].
The six layers, with their most important sublayers, are:
1. Main layer
a) Chemical formula (no prefix). This is the only sublayer that must occur in every InChI™.
b) Atom connections (prefix: "c"). The atoms in the chemical formula (except for hydrogens)
are numbered in sequence; this sublayer describes which atoms are connected by bonds
to which other ones.
c) Hydrogen atoms (prefix: "h"). Describes how many hydrogen atoms are connected to each
of the other atoms.
2. Charge layer
a) proton sublayer (prefix: "p" for "protons")
b) charge sublayer (prefix: "q")
3. Stereochemical layer
a) double bonds and cumulenes (prefix: "b")
b) tetrahedral stereochemistry of atoms and allenes (prefixes: "t", "m")
c) type of stereochemistry information (prefix: "s")
4. Isotopic layer (prefixes: "i", "h", as well as "b", "t", "m", "s" for isotopic stereochemistry)
5. Fixed-H layer (prefix: "f"); contains some or all of the above types of layers except atom connections; may end with "o" sublayer; never included in standard InChI™
6. Reconnected layer (prefix: "r"); contains the whole InChI™ of a structure with reconnected metal
atoms; never included in standard InChI™
[c] http://www.inchi-trust.org/fileadmin/user_upload/html/inchifaq/inchi-faq.html#2.8
SMILES format: O=Cc1ccc(O)c(OC)c1
InChI™ format: InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-5,10H,1H3
LaTeX chemfig format: \chemfig{*6(-(-OH)=(-O-[::+60]CH_3)-=(-(=[::+60]O)(-[::-60]H))-=)}
Figure 5.6: Vanillin, IUPAC name 4-hydroxy-3-methoxybenzaldehyde, chemical formula C8H8O3: different methods of representing the chemical structure.
The delimiter-prefix format has the advantage that a user can easily use a wildcard search to find
identifiers that match only in certain layers[d].
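A minimal Python sketch of such layer-wise handling, assuming a standard InChI ("InChI=1S/...") without repeated prefixes; the helper names `inchi_layers` and `same_formula` are hypothetical. The wildcard search uses the standard-library `fnmatch` module to match identifiers on the formula layer only.

```python
from fnmatch import fnmatchcase

def inchi_layers(inchi: str) -> dict:
    """Split a standard InChI into its '/'-separated layers, keyed by prefix.
    Simplification: only the first occurrence of each prefix is kept, which is
    enough for main-layer strings like the vanillin example."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = {"version": version, "formula": formula}   # formula has no prefix
    for layer in rest:
        layers.setdefault(layer[0], layer[1:])           # e.g. "c...", "h..."
    return layers

def same_formula(inchi: str, formula: str) -> bool:
    """Wildcard search restricted to the formula layer."""
    return fnmatchcase(inchi, f"InChI=1S/{formula}/*")

VANILLIN = "InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-5,10H,1H3"
print(inchi_layers(VANILLIN))
print(same_formula(VANILLIN, "C8H8O3"), same_formula(VANILLIN, "C7H8O"))  # True False
```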
5.5 Applications: OpenMath & MathML
OpenMath was started in December 1993 and has been developed in a long series of workshops. The
first OpenMath workshop was organized by Gaston Gonnet at ETH Zürich. The OpenMath 1.0 Standard
was introduced in February 2000, and four years later, in June 2004, the OpenMath 2.0 Standard was
released. OpenMath 1.0 fixed the basic language architecture, while OpenMath 2.0 brought better XML
integration.
OpenMath is a markup language that encodes the meaning (semantics) of mathematical formulae.
Its main goals are:
• storing and searching mathematical objects in databases.
• freely exchanging mathematical objects between applications (highly important).
• allowing algebraic applications to understand the meaning of formulae and perform computation.
MathML started somewhat later, in 1998, as a W3C Recommendation. Version 1.01 of the format was
released in July 1999 and version 2.0 appeared in February 2001. Its aim differed from OpenMath's:
MathML was mainly concerned with representing mathematics on the Web, and it was meant to be
simpler than OpenMath and not to care about the semantics of formulae.
Nowadays the two are converging, and MathML in particular has acquired some semantic
representation features. Comparing two short fragments of MathML and OpenMath encoding the same
simple formula 2*a (Figure 5.7) shows the difference:
In MathML:
<mrow>
  <mn>2</mn>
  <mo>&#x2062;<!-- &InvisibleTimes; --></mo>
  <mi>a</mi>
</mrow>
In OpenMath:
<OMA>
  <OMS cd="arith1" name="times"/>
  <OMI>2</OMI>
  <OMV name="a"/>
</OMA>
Figure 5.7: MathML does not actually care about the type of operation (InvisibleTimes): it simply places the operator inside an
<mo></mo> tag as a value. OpenMath, on the contrary, needs to know that the operation is in fact a multiplication and records it
in a dedicated symbol element, so that when parsing <OMS cd="arith1" name="times"/> we obtain the type of operation (times)
as metadata rather than as a value.
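Because the OpenMath fragment carries the operation as a symbol, a program can act on it directly. The Python sketch below parses the fragment of Figure 5.7 with `xml.etree.ElementTree` and evaluates it for a = 5; the table `SYMBOLS`, which gives a meaning to just two `arith1` symbols, is a hypothetical stand-in for what a real content dictionary provides. A complete OpenMath object is normally wrapped in an `<OMOBJ>` element, omitted here as in the figure.

```python
import xml.etree.ElementTree as ET

OPENMATH = """
<OMA>
  <OMS cd="arith1" name="times"/>
  <OMI>2</OMI>
  <OMV name="a"/>
</OMA>
"""

# Meaning of the symbols this sketch supports, keyed by (content dictionary, name).
SYMBOLS = {
    ("arith1", "times"): lambda x, y: x * y,
    ("arith1", "plus"): lambda x, y: x + y,
}

def evaluate(node, env):
    """Evaluate an OpenMath element, looking up variables in env."""
    if node.tag == "OMI":                      # integer literal
        return int(node.text)
    if node.tag == "OMV":                      # variable
        return env[node.get("name")]
    if node.tag == "OMA":                      # application: symbol + arguments
        head, *args = list(node)
        op = SYMBOLS[(head.get("cd"), head.get("name"))]
        values = [evaluate(a, env) for a in args]
        result = values[0]
        for v in values[1:]:
            result = op(result, v)
        return result
    raise ValueError(f"unsupported element {node.tag}")

print(evaluate(ET.fromstring(OPENMATH.strip()), {"a": 5}))   # 10
```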
Another difference between MathML and OpenMath is that for MathML b + a is completely different
from a + b, because the two are rendered differently. For OpenMath, however, the two formulae may
mean the same thing, e.g. when the operands are numbers and the commutative law applies.
OpenMath is aimed at encoding the mathematical semantics and, via its extensible Content Dictionary mechanism, may be applied to arbitrary areas of mathematics without the need for any central
agreement to change the language. MathML on the other hand has no mechanism for describing the
semantics of mathematical objects, although it can attach a pointer to a symbol indicating where its
semantics are defined, for example in an OpenMath Content Dictionary. It also includes a small, fixed
set of symbols whose semantics are defined informally in the MathML Recommendation.
[d] http://en.wikipedia.org/wiki/International_Chemical_Identifier#Format_and_layers
• OpenMath provides a mechanism for describing the semantics of mathematical symbols, while
MathML does not.
• MathML provides a presentation format for mathematical objects, while OpenMath does not.
5.5.1 Mathematical objects sharing and exchanging: content dictionary
[Figure 5.8 diagram: Application A and Application B, each with its Content Dictionaries on top of an Encoding/XML layer, linked through the transport layer; when the CDs are identical, the applications communicate directly at the CD level.]
Figure 5.8: The general OpenMath transport scheme. An application A and an application B want to share a certain mathematical
object. The communication takes place through content dictionaries (CDs), with the objects encoded in XML; the encoding layer is
then used as the transport level. The CDs are specific to each application and define which kind of mathematics the encoded
objects refer to. A special case arises when both applications use the same CDs: this allows a shortcut at the CD level.
Let's say there are two applications that need to communicate, e.g. to send a complex mathematical
object such as a formula. In order to understand each other, they need a dictionary of elementary
mathematical objects and operations on both sides, called a Content Dictionary (CD). CDs are
central to the OpenMath philosophy of transmitting mathematical information: they hold the meanings
of the objects being transmitted. If, for example, we are sending an equation involving the
multiplication of matrices, the applications must agree on what a matrix is, what multiplication is, and
so on. All this information is held in Content Dictionaries that both applications agree upon[e].
A Content Dictionary holds the meanings of (various) mathematical "words". These words are OpenMath basic objects referred to as symbols.
So, the basic workflow is the following:
1. The application gets a formula from a visual editor (or database).
2. It parses and validates that formula against a Content Dictionary.
3. The formula is then encoded in XML.
4. The XML serves as the transport layer for transmitting the formula to another application.
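The four steps can be sketched in Python as follows, under two assumptions: the shared content dictionary is reduced to a set of (cd, name) pairs (a real CD is an XML document with formal and informal definitions), and formulae coming from the editor are nested tuples. The receiving side decodes the XML and gets back exactly the formula that was sent.

```python
import xml.etree.ElementTree as ET

# Toy "content dictionary" shared by both applications (simplified assumption).
SHARED_CD = {("arith1", "plus"), ("arith1", "times")}

def validate(expr):
    """Step 2: every operator must be defined in the shared CD."""
    if isinstance(expr, tuple):
        cd, name, *args = expr
        if (cd, name) not in SHARED_CD:
            raise ValueError(f"symbol {cd}:{name} not in the content dictionary")
        for a in args:
            validate(a)

def encode(expr):
    """Step 3: build the OpenMath XML element for an expression."""
    if isinstance(expr, int):
        el = ET.Element("OMI")
        el.text = str(expr)
        return el
    if isinstance(expr, str):
        return ET.Element("OMV", name=expr)
    cd, name, *args = expr
    el = ET.Element("OMA")
    ET.SubElement(el, "OMS", cd=cd, name=name)
    el.extend([encode(a) for a in args])
    return el

def decode(el):
    """Receiving side: rebuild the expression from the XML element."""
    if el.tag == "OMI":
        return int(el.text)
    if el.tag == "OMV":
        return el.get("name")
    head, *args = list(el)
    return (head.get("cd"), head.get("name"), *[decode(a) for a in args])

formula = ("arith1", "times", 2, ("arith1", "plus", "a", "b"))   # 2*(a+b), step 1
validate(formula)
message = ET.tostring(encode(formula), encoding="unicode")     # step 4: transport
print(decode(ET.fromstring(message)) == formula)               # True
```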
5.5.2 Issues
Due to mathematical semantics, exceptions and variations cause serious issues: as a matter of fact,
depending on the specific case and on the assumptions made, an equality statement might not always
hold, as in the following example:
\frac{x}{x} = 1 \quad \text{if } x \neq 0
whereas if x/x is considered as a formal algebraic expression (a quotient of polynomials), the equality
is always valid. In the following case, the equality holds only when the left-hand side is considered as
a convergent power series, i.e. for |x| < 1:
\sum_{i=0}^{\infty} x^i = \frac{1}{1-x} \quad \text{if } |x| < 1
Depending on the context, encoded in the CDs, one obtains the list of allowed operations. A lot of
effort has been put into the CDs, which are de facto sophisticated ontologies: they describe which
rules are allowed in a particular semantic environment.
[e] http://www.openmath.org/standard/om20-2004-06-30/omstd20html-4.xml
5.6 Complex Object representation process: storing and searching
5.6.1 Mapping process: encoding and decoding processes
In order to store a complex object in a database and then search for it, it is very convenient to map
the object to a string, serializing it through an encoding function. This mapping should ideally be
one-to-one, even though this may not always be the case. When it is not, mapping the object onto one
or many representations and back should at least retrieve the same object, as in Figure 5.9.
The mapping links the real objects with the strings that encode them (encoding process).
Figure 5.9: Mapping process: coherence requirement strictly necessary.
The decoding process is needed to check that the strings are valid, i.e. that they map back to the
object. The objects can then be stored in a scientific database to be exchanged and used in
computations. The whole scheme is shown in Figure 5.10.
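A toy Python illustration of the coherence requirement and of the selection step of Figure 5.10, assuming a hypothetical kind of object: a ring of atom labels, which can be written starting at any atom and in either direction (a 1:n encoding). Choosing the lexicographically first string makes the encoding unique, and decoding followed by re-encoding returns the same string.

```python
def all_encodings(ring):
    """E*: a ring of labels can be written starting at any position and in
    either direction, so one object has many string representations."""
    n = len(ring)
    rotations = [ring[i:] + ring[:i] for i in range(n)]
    reflected = [list(reversed(r)) for r in rotations]
    return {"-".join(r) for r in rotations + reflected}

def encode(ring):
    """E: make the mapping unique by selecting the lexicographically first string."""
    return min(all_encodings(ring))

def decode(s):
    """D: rebuild the ring (as a list of labels) from its string."""
    return s.split("-")

ring = ["C", "N", "C", "C", "O"]
key = encode(ring)
# Coherence requirement: decoding and re-encoding gives back the same string,
# and any other way of writing the same ring selects the same key.
print(encode(decode(key)) == key)                     # True
print(encode(["O", "C", "C", "N", "C"]) == key)       # True (same ring, reversed)
```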
5.6.2 Issues
• Correctness of the encoding process.
• One-to-one mapping.
• Human readability (simple objects exchanged with other people should be understandable at a glance).
• Compactness (strings should not be too long) [e.g. SMILES represents H2O as O and CH4 as C].
• Acceptance in the relevant communities.
• Efficient encoding and decoding.
• Representation should not be too long. A representation might be maximally compact and still
be too long.
– This is very relevant for chemical compounds: for example, if we use CCCCC to represent a
chain of five carbon atoms, the string is maximally compact but still long, so we have to come
up with some tricks (like C5) to represent it in a shorter way.
– We might use such a compact representation as a database key, by which we can quickly
look up the object in our SDB. In practice this is usually done with the help of one of the SHA
(Secure Hash Algorithm) functions.
[Figure 5.10 diagram: complex object representation → good encoding process E(◉), 1:1, unique string | bad encoding process E*(◉), 1:n, set of strings → selection (shortest | lexicographically first); SHA function H(◉) → key (fixed size); D(◉): decoding process to reconstruct the original object; other information in the DB; share/communicate with other users.]
Figure 5.10: Picture of the day. (1a) 1:1 encoding process with a good encoding method. (1b) A 1:n encoding process with a bad
encoding method retrieves a set of strings representing the encoded object; a selection process is required to guarantee the
uniqueness of the generated string. This selection, or filtering, is performed by choosing the lexicographically first string. (2) Decoding
process: reconstruction of the original object encoded in the string. (3a) String referencing through hash values. (3b) Strings
are then stored in the database together with their hash values, which leads to faster database searches. (4) Distributing
the string associated with the complex object.
5.7 Importance of Hashing Functions
5.7.1 Standard Hash
A hashing function is a simple way of mapping a string of variable length (called a key; in our case, a
serialized object) to an integer of fixed length (called a hash value). After applying a hashing function
we have a quite short signature of our string (which in fact encodes a chemical or mathematical
object) that is still unique, or almost unique.
One of the nice features of hashing functions is that they make searching the whole database very
fast, using just the hash of what we are looking for. Let's say we are trying to find a chemical formula
in our scientific database. One way to do that is to convert the formula into a string and then look for
the same string in the database.
This works, but a quicker and more efficient way is to calculate the hash value of the string of
interest and then search for that hash value (which is much smaller) in the database. Then we can use
the hash key we have found to retrieve the full string representing our object. In the relational
database world this technique is known as (hash) indexing, and it is known to significantly improve
the performance of SQL queries.
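A minimal Python sketch of hash-based lookup using `hashlib.sha256` from the standard library. A plain dictionary stands in for a database table indexed by the digest, which is an assumption of this sketch rather than a specific DBMS feature; the full string is compared after the lookup so that a hash collision cannot return the wrong object.

```python
import hashlib

def digest(serialized: str) -> str:
    """Fixed-size key for a serialized object (SHA-256, hex-encoded)."""
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

# Stand-in for a table indexed by digest: digest -> (full string, record).
table = {}

def store(serialized, record):
    table[digest(serialized)] = (serialized, record)

def lookup(serialized):
    """Hash the query, then do a single keyed lookup instead of scanning rows."""
    hit = table.get(digest(serialized))
    # Compare the full string too, so that a (rare) collision is not accepted.
    if hit is not None and hit[0] == serialized:
        return hit[1]
    return None

vanillin = "InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-5,10H,1H3"
store(vanillin, {"name": "vanillin", "formula": "C8H8O3"})
print(lookup(vanillin))          # {'name': 'vanillin', 'formula': 'C8H8O3'}
print(len(digest(vanillin)))     # 64 hex characters = 256 bits
```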
The main problem with hash functions is that the hash representation is not unique. In fact, we
cannot encode 1024 bits of information in just one byte, so there must be some collisions.
A collision occurs when the hash of an object O1 is equal to the hash of another object O2, although
O1 ≠ O2; in other words, when two different objects have the same hash digest:
H(O_1) = H(O_2) \quad \text{while} \quad O_1 \neq O_2
If the hashing function assigns objects to integers effectively at random, two given distinct objects
collide with probability
\Pr(\text{collision}) = \frac{1}{2^M}
where M is the bit length of the hash (e.g. 64, 128, 256 or 1024 bits).
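The effect of a short hash can be observed directly. The Python sketch below builds a toy M-bit hash by keeping only the first M bits of SHA-256 (an assumption made for the demonstration) and hashes "object0", "object1", ... until two different objects share a digest, i.e. H(O1) = H(O2) while O1 ≠ O2. With M = 16 this happens after a few hundred objects on average, far fewer than 2^16, because every new object is compared against all previous ones (the birthday paradox of Section 5.7.2).

```python
import hashlib
from itertools import count

def small_hash(obj: str, bits: int = 16) -> int:
    """Toy M-bit hash: the first `bits` bits of SHA-256 (demo assumption)."""
    full = int.from_bytes(hashlib.sha256(obj.encode()).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash "object0", "object1", ... until two different objects share a digest."""
    seen = {}
    for i in count():
        obj = f"object{i}"
        h = small_hash(obj, bits)
        if h in seen:
            return seen[h], obj, h
        seen[h] = obj

o1, o2, h = find_collision(16)
print(o1, o2, hex(h))   # H(O1) == H(O2) while O1 != O2
```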
5.7.2 Cryptography Hash Functions
Cryptographic hash functions slightly differ from Standard Hash functions not by principle but by
usage and optimization. Hence, these functions are used to encrypt arbitrary blocks of data and to
return a fixed-size bit string, the (cryptographic) hash value, such that an (accidental or intentional)
change to the data will (with very high probability) change the hash value – i.e. applications of this
algorithm are seen in documents signatures, for fingerprinting, to detect duplicate data or uniquely
identify files, and as checksums to detect accidental data corruption. The data to be encoded is often
called the "message", and the hash value is called "digest".
Major applications of cryptographic hash functions are:
• Verifying the integrity of files or messages, for example during transmission. We simply accompany any transmitted block of data with a small hash value, which allows us to easily detect any
bit corrupted during transmission.
• Password verification. For security reasons, passwords are never saved in the clear in databases;
what we keep is their hash values (historically often MD5, which is no longer considered safe for this
purpose). So we can easily check whether a password is correct without ever knowing its real value.
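Both applications can be sketched with Python's standard `hashlib` and `hmac` modules. For password storage the sketch deliberately swaps MD5 for a salted PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`), a slow, salted construction that is preferred today; the iteration count of 100,000 is an illustrative assumption.

```python
import hashlib
import hmac
import os
from typing import Optional

# Integrity check: the data travels together with its digest.
message = b"InChI=1S/C8H8O3/c1-11-8-4-6(5-9)2-3-7(8)10/h2-5,10H,1H3"
sent_digest = hashlib.sha256(message).hexdigest()

corrupted = bytes([message[0] ^ 0x01]) + message[1:]          # flip one bit
print(hashlib.sha256(message).hexdigest() == sent_digest)     # True
print(hashlib.sha256(corrupted).hexdigest() == sent_digest)   # False

# Password verification: store only a salt and a salted, deliberately slow hash.
def hash_password(password: str, salt: Optional[bytes] = None):
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, stored_key)   # constant-time comparison

salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))   # True
print(check_password("wrong guess", salt, stored))     # False
```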
One of the main differences between standard and cryptographic hash functions is that cryptographic ones are much more computationally expensive. For this reason, they are used mainly where users must be
protected against forgeries (the creation of data with the same digest as the expected data).
Examples of cryptographic hash functions (MD5 and the SHA, Secure Hash Algorithm, family):
• MD5 (weak, 128-bit output)
• SHA-0 and SHA-1 (160-bit output, might be vulnerable)
• SHA-2 (224- to 512-bit output, e.g. SHA-256; current choice)
• SHA-3 (Keccak), selected in October 2012[f]
Given a hashing function with the guarantee that collisions occur with probability 1/2^M, if a very
large M is chosen, e.g. M = 1024, this probability becomes
\frac{1}{2^{1024}} \approx 5.56 \times 10^{-309}
This value is astronomically small, so we might think that collisions are not a relevant concern.
In cryptography, however, we have two problems:
• The birthday paradox: the probability that two objects have the same hash value follows the
birthday paradox, i.e. the probability that two people have their birthday on the same day, which for
N people is approximately
P \approx \frac{N^2}{2 N_{\text{year}}}
where N is the number of individuals and N_year the number of days in a year.
Hence, with one million objects, the probability of a collision depends strictly on the
number of bits used by the hash function:
P(\text{collision}) \approx \frac{(10^6)^2}{2 \cdot 2^M}
• Preimage: given a digest K, one can try to build an object O1 such that H(O1) = K; building
objects at random, each attempt succeeds with probability P = 1/2^M.
[f] http://keccak.noekeon.org
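The numbers in this section can be checked with a few lines of Python: the exact birthday probability for 365 days, the approximation P ≈ N^2 / (2 · 2^M) for 10^6 objects and several hash lengths, and the value 1/2^1024 quoted earlier. The approximation is only meaningful while P is small.

```python
def birthday_exact(n: int, days: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

def birthday_approx(n: float, space: float) -> float:
    """First-order approximation P ≈ n^2 / (2 * space), valid when P is small."""
    return n * n / (2 * space)

print(round(birthday_exact(23), 3))            # 0.507
for m in (64, 128, 256):
    print(m, birthday_approx(1e6, 2.0 ** m))   # 10^6 objects, M-bit hash
print(2.0 ** -1024)                            # about 5.56e-309
```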
When cryptographers started thinking about document signatures, they tolerated collisions even though
they did not want them. What they wanted were functions that are impossible to invert: one cannot
figure out which object generated a particular signature (hash value). The history of the SHA
functions and MD5 started as a way to produce a signature of a file. MD5 has been shown to be broken,
since collisions can be found easily, and SHA-0 has likewise been broken from the point of view of
collisions. SHA-1, an improvement of SHA-0, has also been shown to be weaker than intended. The
probability of collisions was further decreased with the development of SHA-2. SHA-3, selected in
October 2012, is based on a "sponge" construction, named after the way it operates: it absorbs the
input and then squeezes out the digest.
5.7.3 Application of SHA in SDB: short recapitulation
As we have previously mentioned, the main role of hashing in storing complex objects in SDB is to
facilitate database searches, i.e. making them significantly faster.
So, once again, the workflow is as follows: we have a user request to find a formula in our database.
First of all, we convert it into a string (we serialize it), then apply a hashing function to have a short
digest of a fixed length. In the next step we perform a fast look up in our database for the same digest.
If a match is found, we retrieve all the information about our complex object from the database,
using the hash as the primary key.
With this in mind, we are interested in generating a hash key (digest) with a very low collision
probability, which we can do with a SHA-family function.
For example, the 128-bit digest of MD5 can take 2^128 ≈ 3.4 × 10^38 different values, while SHA-1 has
a 160-bit digest with 2^160 ≈ 1.5 × 10^48 possible values[g], so a longer digest makes collisions
correspondingly less likely.
[g] http://en.wikipedia.org/wiki/SHA-2#Comparison_of_SHA_functions
Bibliography
1. H. Apiola, E. Barreiro, S. Braham, S. Buswell, A. Capani, A. M. Cohen, O. Caprotti, D. Carlisle, S. Dalmas, J. Davenport, S. Devitt, M. Dewar, A. Diaz, A. Franke, M. Gaëtano, G. Goguazde, G. Gonnet,
V. Harvey, T. Huuskonen, M. Kohlhase, S. Lavirotte, P. Libbrecht, B. Miller, W. Naylor, Y. Papegay, M. Riem, M. Seppälä, E. Smirnova, C. So, A. Solomon, A. Strotmann, B. Sutor, R. Timoney,
C. Traverso, S. Turner, S. Watt, R. Rioboo, S. Xambo, H. Cuipers, C. Müller, N. Müller, F. Rabe,
C. Lange, P. Ion, S. Dooley, M. Hitchcliffe, O. Bringslid, L. Mamane, M. Suzuki, M. Pauna, J.-W. Knopper, R. Verrijzer, R. Eixarch, J. Collins, P. Horn, D. Roozemond, J. Heras, and C. Rowley. Overview
of OpenMath [online]. Available from: http://www.openmath.org/overview/index.html [cited
Friday, 2 November 2012].
2. R. Ausbrooks, S. Buswell, D. Carlisle, S. Dalmas, S. Devitt, A. Diaz, M. Froumentin, R. Hunter, P. Ion,
M. Kohlhase, R. Miner, N. Poppelier, B. Smith, N. Soiffer, R. Sutor, and S. Watt. Mathematical
Markup Language (MathML) Version 2.0 (Second Edition) [online]. 21 October 2003. Available
from: http://www.w3.org/TR/MathML2/ [cited Friday, 2 November 2012].
3. S. Heller. International Chemical Identifier - InChI [online]. Available from: http://www.iupac.org/home/publications/e-resources/inchi.html [cited Friday, 2 November 2012].
4. L. Lamport. LaTeX - A Document Preparation System [online]. Available from: http://www.latex-project.org/ [cited Thursday, 8 November 2012].
5. Design Science. MathType [online]. 2012. Available from: http://www.dessci.com/en/products/mathtype/ [cited Thursday, 8 November 2012].
6. F. A. K. von Stradonitz. IUPAC - International Union of Pure and Applied Chemistry [online].
Available from: http://www.iupac.org/ [cited Friday, 2 November 2012].