ChEBI

Kirill Degtyarenko, EMBL-EBI / EPO
The team
• Rafael Alcántara
• Michael Ashburner *
• Volker Ast *
• Michael Darsow *
• Paula de Matos
• Marcus Ennis
• Janna Hastings
• Alan McNaught *
• Inma Spiteri
• Christoph Steinbeck
• Martin Zbinden *
ChEBI: What is it?
Chemical Entities of Biological Interest – an EBI database/dictionary of ‘biochemical compounds’
What are the ‘biochemical compounds’?
Can be defined as consisting of
“molecules not directly encoded by the genome ... that are either the products of nature or are synthetic products used ... to intervene in the processes of living organisms”
[Michael Ashburner]
Molecular entity
“Any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer etc., identifiable as a separately distinguishable entity”
[IUPAC “Gold Book”]
In fact, ChEBI contains
• Molecular entities, e.g. trans-vaccenic acid
• Groups, e.g. trans-vaccenoyl group
• Classes, e.g. fatty acids
‘Small molecules’?
Yes, but big molecules as well!
• alumina
• amylose
• metaborate
• poly(vinyl alcohol)
Current status (17.12.08)
• ChEBI entries: 16,618
• Synonyms: 43,880
• IUPAC names: 14,847
• Registry Numbers: 15,773
• Formulae: 13,163
• Database Links: 9,196
• Structures: 14,274
1-D ChEBI
• Numeric ID
• Carefully checked terminology
• Unambiguous ChEBI name
• IUPAC names
• Cross-references to free resources
Unambiguous ChEBI name
CHEBI:28918
L-adrenaline
not just ‘adrenaline’
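Because the numeric ID, not the name, is the unambiguous key, tools typically normalize IDs before looking anything up. A minimal Python sketch, assuming IDs are written as `CHEBI:` followed by digits; `normalize_chebi_id`, `chebi_name` and the three-entry lookup table are hypothetical illustrations (built only from IDs shown on these slides), not a ChEBI API:

```python
import re

# Assumed accession syntax: "CHEBI:" prefix plus digits. Accepting a bare
# number or lower-case prefix is a convenience choice of this sketch.
_CHEBI_ID = re.compile(r"^(?:CHEBI:)?(\d+)$", re.IGNORECASE)

# Hypothetical toy lookup using only IDs and names from these slides.
CHEBI_NAMES = {
    "CHEBI:28918": "L-adrenaline",
    "CHEBI:48935": "(E)-roxithromycin",
    "CHEBI:32109": "(Z)-roxithromycin",
}


def normalize_chebi_id(text: str) -> str:
    """Return the canonical 'CHEBI:<number>' form of an identifier."""
    match = _CHEBI_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {text!r}")
    return f"CHEBI:{int(match.group(1))}"


def chebi_name(text: str) -> str:
    """Look up the ChEBI name for an identifier in the toy table above."""
    return CHEBI_NAMES[normalize_chebi_id(text)]


print(normalize_chebi_id("chebi:28918"))  # CHEBI:28918
print(chebi_name("28918"))                # L-adrenaline
```

A name such as ‘adrenaline’ is rejected by `normalize_chebi_id`, which is the point: names are looked up via synonyms, IDs are the key.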
Systematic Name (IUPAC)
2-{[3-(trifluoromethyl)phenyl]amino}benzoic acid
[Structure: flufenamic acid, with ring locants]
Common Name
[Structure: flufenamic acid]
• flufenamic acid (INN English)
• acide flufénamique (INN French)
• ácido flufenámico (INN Spanish)
• acidum flufenamicum (INN Latin)
• Flufenaminsäure (German)
The Unpronounceables
CHEBI:48935 (E)-roxithromycin
[Structure: (E)-roxithromycin]
IUPAC name:
(3R,4S,5S,6R,7R,9R,10E,11S,12R,13S,14R)-4-(2,6-dideoxy-3-C-methyl-3-O-methyl-α-L-ribo-hexopyranosyloxy)-14-ethyl-7,12,13-trihydroxy-10-{[(2-methoxyethoxy)methoxy]imino}-6-[3,4,6-trideoxy-3-(dimethylamino)-β-D-xylo-hexopyranosyloxy]-3,5,7,9,11,13-hexamethyloxacyclotetradecan-2-one
What is the common name of roxithromycin?
• CHEBI:32109 (Z)-roxithromycin
• CHEBI:48935 (E)-roxithromycin, INN: roxithromycin
• CHEBI:48844 roxithromycin
[Structures: (Z)-roxithromycin and (E)-roxithromycin]
What is thiamine?
• CHEBI:18385 thiamine(1+), aka thiamine
• CHEBI:33283 thiamine(1+) chloride, INN: thiamine
• CHEBI:49105 thiamine(2+) dichloride, aka thiamine chloride hydrochloride, aka thiamine hydrochloride
[Structures of the three entries]
Need for 2-D
• “Better to see the face than to hear the name” (Zen proverb)
• Structures and identifiers based on structures offer new ways of cross-linking to other databases
• Structure search
Connection table
[Structure: purine]
ChEBI


  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
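To make the connection table concrete: a minimal Python sketch that reads the V2000 block on this slide (re-typed into the string `MOLFILE`) using the format's fixed column widths, then derives the molecular formula. The helper names are hypothetical, and the implicit-hydrogen rule (default valence minus bond-order sum, for uncharged C, N and O only) is a simplifying assumption; a real toolkit also reads charges, radicals and valence fields.

```python
from collections import Counter

MOLFILE = """ChEBI


  9 10  0  0  0  0            999 V2000
   11.8219   -7.2713    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   11.8219   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6074   -7.0165    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -6.8574    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.6039   -8.3505    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   11.1072   -8.5027    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   13.0886   -7.6818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.3923   -7.2713    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
   10.3888   -8.0922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
  2  5  1  0  0  0  0
  2  6  1  0  0  0  0
  3  7  1  0  0  0  0
  4  8  2  0  0  0  0
  6  9  2  0  0  0  0
  5  7  2  0  0  0  0
  8  9  1  0  0  0  0
M  END
"""

# Assumed default valences for neutral atoms (charges are not read here).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def parse_v2000(text):
    """Return (atoms, bonds) from a V2000 molfile using its fixed columns."""
    lines = text.splitlines()
    counts = lines[3]                       # lines 0-2 are the header block
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # First atom, second atom (1-based), bond order.
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def hill_formula(atoms, bonds):
    """Molecular formula (carbon first, then H, then alphabetical)."""
    bond_order_sum = Counter()
    for a, b, order in bonds:
        bond_order_sum[a] += order
        bond_order_sum[b] += order
    counts = Counter()
    for index, (symbol, *_xyz) in enumerate(atoms, start=1):
        counts[symbol] += 1
        implicit_h = DEFAULT_VALENCE.get(symbol, 0) - bond_order_sum[index]
        counts["H"] += max(implicit_h, 0)
    order = ["C", "H"] + sorted(s for s in counts if s not in ("C", "H"))
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}"
                   for s in order if counts[s])


atoms, bonds = parse_v2000(MOLFILE)
print(len(atoms), len(bonds))      # 9 10
print(hill_formula(atoms, bonds))  # C5H4N4
```

Slicing by column rather than splitting on whitespace matters because V2000 fields can run together once atom numbers reach three digits.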
2-D ChEBI
• One or more 2-D (or 3-D) connection tables
• One is default
• Autogenerated images (PNG)
• Default diagrams should be unambiguous
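One way to picture “one or more connection tables, one of them default” as data. A minimal Python sketch; the classes `ConnectionTable` and `Entry` and their fields are hypothetical, not ChEBI's actual schema, and the molfile texts are placeholders:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConnectionTable:
    molfile: str              # molfile text (placeholder strings below)
    dimension: int = 2        # 2 for 2-D coordinates, 3 for 3-D
    is_default: bool = False


@dataclass
class Entry:
    chebi_id: str
    structures: List[ConnectionTable] = field(default_factory=list)

    def add(self, table: ConnectionTable) -> None:
        """Append a structure while keeping exactly one default."""
        if not self.structures:
            table.is_default = True          # the first one is the default
        elif table.is_default:
            for other in self.structures:    # a new default replaces the old
                other.is_default = False
        self.structures.append(table)

    def default(self) -> ConnectionTable:
        return next(t for t in self.structures if t.is_default)


entry = Entry("CHEBI:28918")
entry.add(ConnectionTable("<2-D molfile>"))
entry.add(ConnectionTable("<3-D molfile>", dimension=3))
print(entry.default().dimension)  # 2
entry.add(ConnectionTable("<alternative 2-D molfile>", is_default=True))
print(entry.default().molfile)    # <alternative 2-D molfile>
```

Enforcing the single-default rule at insertion time means `default()` never has to choose between candidates.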
The Fine Art of chemical drawing
Linear forms of monosaccharides
[Structure diagrams: open-chain forms of monosaccharides]
Pyranose forms of monosaccharides
[Structure diagrams: pyranose forms of monosaccharides]
Fused systems
(R)-camphor
[Two drawings of (R)-camphor, labelled “ambiguous” and “unambiguous”]
Square planar geometry
[Structures: cisplatin and transplatin]
From 2-D back to 1-D
• SMILES
• InChI
SMILES (1)
• Simplified Molecular Input Line Entry Specification
• Developed by David Weininger in 1988
• Extended by others (e.g. Daylight)
• String of standard ASCII characters
• A number of valid SMILES can be produced for the same molecule
SMILES (2)
[Structure: purine]
N1C=NC2=C1C=NC=N2
c1ncc2ncnc2n1
C=1N\C=N/C\2=N/C=N\C=1/2
c1ncnc2/N=C\Nc12
n1cc2c(nc1)ncn2
[H]c1nc([H])c2n([H])c([H])nc2n1
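A crude way to see that the six strings above describe the same atoms is to count heavy atoms in each: all of them should give five carbons and four nitrogens. This Python sketch is not a canonicalizer and not a SMILES validator; the tokenizer covers only the organic subset and bracket atoms (an assumption of the sketch), and equal composition is necessary, not sufficient, for two strings to encode the same molecule.

```python
import re
from collections import Counter

# The six SMILES strings from the slide (raw strings keep the backslashes).
PURINE_SMILES = [
    r"N1C=NC2=C1C=NC=N2",
    r"c1ncc2ncnc2n1",
    r"C=1N\C=N/C\2=N/C=N\C=1/2",
    r"c1ncnc2/N=C\Nc12",
    r"n1cc2c(nc1)ncn2",
    r"[H]c1nc([H])c2n([H])c([H])nc2n1",
]

# Bracket atoms first, then two-letter organic-subset symbols before the
# one-letter ones, then aromatic (lower-case) atoms. Bond symbols,
# ring-closure digits and branch parentheses match nothing and are skipped.
ATOM = re.compile(r"\[([^\]]+)\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|[a-z][a-z]?)")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms by element; no validation is attempted."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        if match.group(1) is not None:
            symbol = BRACKET_SYMBOL.match(match.group(1)).group(1)
        else:
            symbol = match.group(0)
        symbol = symbol.capitalize()  # aromatic 'c' -> 'C', 'n' -> 'N'
        if symbol != "H":
            counts[symbol] += 1
    return counts


compositions = {tuple(sorted(heavy_atom_counts(s).items())) for s in PURINE_SMILES}
print(compositions)  # {(('C', 5), ('N', 4))}
```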
InChI (1)
• IUPAC International Chemical Identifier or InChI
• Open source
• Developed by Stein, Heller, Tchekhovskoi and McNaught
• Used by NIST, PubChem, CML… and ChEBI
InChI (2)
[Structure: purine]
InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H
InChIKey=KDCGOANMDULRCW-QDQILVOLCG
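A sketch of how the slash-separated layers of the InChI above can be pulled apart in Python. The function names are hypothetical; the handling of prefixes (`c` connections, `h` hydrogens, a bare `/f` opening the fixed-H layer) follows the string shown, and the formula parser assumes a single-component formula.

```python
import re

PURINE_INCHI = "InChI=1/C5H4N4/c1-4-5(8-2-6-1)9-3-7-4/h1-3H,(H,6,7,8,9)/f/h7H"


def split_layers(inchi: str):
    """Split an InChI into version, formula, main layers and fixed-H layers.

    Layers are keyed by their one-letter prefix. Everything after the '/f'
    layer is collected separately, since it belongs to the fixed-H layer.
    """
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError("expected a string starting with 'InChI='")
    version, formula, *layers = body.split("/")
    main, fixed_h = {}, {}
    target = main
    for layer in layers:
        if not layer:
            continue
        if layer.startswith("f"):
            target = fixed_h
            if layer[1:]:                 # assumed: '/f' may carry a formula
                fixed_h["formula"] = layer[1:]
            continue
        target[layer[0]] = layer[1:]
    return version, formula, main, fixed_h


def parse_formula(formula: str) -> dict:
    """Element counts for a single-component formula such as C5H4N4."""
    return {el: int(n or 1) for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


version, formula, main, fixed_h = split_layers(PURINE_INCHI)
print(version, formula)        # 1 C5H4N4
print(main)   # {'c': '1-4-5(8-2-6-1)9-3-7-4', 'h': '1-3H,(H,6,7,8,9)'}
print(fixed_h)                 # {'h': '7H'}
print(parse_formula(formula))  # {'C': 5, 'H': 4, 'N': 4}
```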
Limitations (1)
• Stereochemistry other than sp3 tetrahedral and sp2 trigonal planar
• Polymers
• Conformers
• Radicals/different spin state
• Topological isomers
• Mixtures
• Markush structures
Limitations (2)
[Structures: cisplatin and transplatin]
Both are represented by the same InChI:
InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2
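The limitation can be checked mechanically: the cisplatin/transplatin InChI contains no stereo layer at all, so nothing in the string can tell the two apart. A minimal Python sketch, assuming the stereo layers are those prefixed `b`, `t`, `m` and `s`; the standard InChI of L-alanine is included only as a contrast case that does carry tetrahedral stereo layers.

```python
# InChI strings: cisplatin/transplatin from the slide, and (for contrast)
# the standard InChI of L-alanine, which has a stereocentre.
PLATIN_INCHI = "InChI=1/2ClH.2H3N.Pt/h2*1H;2*1H3;/q;;;;+2/p-2"
L_ALANINE_INCHI = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"

# Layer prefixes treated as stereo in this sketch: b = double bond,
# t = tetrahedral centres, m = inversion marker, s = stereo type.
STEREO_PREFIXES = {"b", "t", "m", "s"}


def stereo_layers(inchi: str) -> list:
    """Return the stereo layers of an InChI, skipping version and formula."""
    layers = inchi.split("/")[2:]
    return [layer for layer in layers if layer[:1] in STEREO_PREFIXES]


print(stereo_layers(PLATIN_INCHI))     # []
print(stereo_layers(L_ALANINE_INCHI))  # ['t2-', 'm0', 's1']
```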
3-D ChEBI
cisplatin
Uncertainty and ambiguity in chemistry
• Compositional uncertainty
• Positional uncertainty
• Configurational uncertainty
• Conformational uncertainty
Compositional uncertainty
Examples
• an alkali metal cation
• vanadate(V) anion
• [2H]ethanol
Positional uncertainty
Examples
• L-bromohistidine residue
• pteroic acid (several tautomers)
Configurational uncertainty
Examples
• androstane
• rel-(2R,3R)-2-amino-3-methylpentanoic acid
• tetradec-11-enoic acid
Conformational uncertainty
Examples
• cyclohexane: chair, boat, twist
• protein secondary structure: α, β, …
ChEBI ontology
• Molecular structure ontology
• Subatomic particle ontology
• Role ontology
  – Biological role
  – Application
L-adrenaline
Molecular structure ontology
• catecholamines
Biological role
• hormone
Application
• antiglaucoma
• bronchodilator
• cardiostimulant
The family relations
[Diagram: the L-cysteine family: cysteine, L-cysteine, D-cysteine, L-cysteine zwitterion, L-cysteinium, L-cysteinate(1–), L-cysteinate(2–), L-cysteine(•), L-cystein-S-yl, L-cysteinyl, L-cysteino, and the L-cysteine and L-cysteinate residues]
Relationships in ChEBI
• Is A (∆): generic
• Has Part (⋄): generic
• Is Conjugate Acid Of (♯): specific
• Is Conjugate Base Of (♭): specific
• Is Enantiomer Of: specific
• Is Tautomer Of: specific
• Is Substituent Group From (ℛ): specific
• Has Parent Hydride (ℋ): specific
• Has Functional Parent (ℱ): specific
• Has Role: generic?
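These relationships can be stored as (subject, relation, object) triples, with each directed relationship paired with its inverse reading. A minimal Python sketch, assuming snake_case relation names and the inverse wordings used on the example slides (“is part of”, “is parent hydride of”, “is functional parent of”); the triples are the example statements from those slides, written with ASCII charge signs, and none of this is ChEBI's own API.

```python
# Example statements from the relationship slides (subject, relation, object).
TRIPLES = [
    ("L-cysteine", "is_a", "cysteine"),
    ("L-cysteine", "is_enantiomer_of", "D-cysteine"),
    ("L-cysteine hydrochloride", "has_part", "L-cysteinium"),
    ("L-cysteine", "is_conjugate_acid_of", "L-cysteinate(1-)"),
    ("L-cysteine", "is_tautomer_of", "L-cysteine zwitterion"),
    ("salutaridinol", "has_parent_hydride", "morphinan"),
    ("7-O-acetylsalutaridinol", "has_functional_parent", "salutaridinol"),
]

# Inverse readings; enantiomer and tautomer relations are symmetric.
# Is A gets no inverse here, since none is named on the slides.
INVERSE = {
    "is_conjugate_acid_of": "is_conjugate_base_of",
    "is_conjugate_base_of": "is_conjugate_acid_of",
    "has_part": "is_part_of",
    "is_part_of": "has_part",
    "has_parent_hydride": "is_parent_hydride_of",
    "is_parent_hydride_of": "has_parent_hydride",
    "has_functional_parent": "is_functional_parent_of",
    "is_functional_parent_of": "has_functional_parent",
    "is_enantiomer_of": "is_enantiomer_of",
    "is_tautomer_of": "is_tautomer_of",
}


def with_inverses(triples):
    """Return the triples plus every inverse statement that INVERSE defines."""
    closed = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            closed.add((obj, inverse, subject))
    return closed


def outgoing(graph, subject):
    """Sorted (relation, object) pairs whose subject is the given entity."""
    return sorted((rel, obj) for s, rel, obj in graph if s == subject)


graph = with_inverses(TRIPLES)
print(outgoing(graph, "L-cysteinate(1-)"))  # [('is_conjugate_base_of', 'L-cysteine')]
print(outgoing(graph, "morphinan"))         # [('is_parent_hydride_of', 'salutaridinol')]
```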
Is A relationship
[Structures: L-cysteine ∆ cysteine]
L-cysteine is a cysteine
Is Enantiomer Of
[Structures: L-cysteine and D-cysteine]
L-cysteine is enantiomer of D-cysteine
Has Part
[Structures: L-cysteinium, chloride and L-cysteine hydrochloride]
L-cysteine hydrochloride has part L-cysteinium
L-cysteinium is part of L-cysteine hydrochloride
Is Conjugate Acid Of
[Structures: L-cysteinium ♯ L-cysteine ♯ L-cysteinate(1–) ♯ L-cysteinate(2–)]
L-cysteine is conjugate acid of L-cysteinate(1–)
Is Conjugate Base Of
[Structures: L-cysteinate(2–) ♭ L-cysteinate(1–) ♭ L-cysteine ♭ L-cysteinium]
Acid/base relationships
[Structures: L-cysteinium, L-cysteine, L-cysteinate(1–) and L-cysteinate(2–), linked by ♯ (Is Conjugate Acid Of) and ♭ (Is Conjugate Base Of)]
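Because each species on the acid/base slide differs from the next by one proton, the ♯ and ♭ edges form a chain that can be walked in either direction. A minimal Python sketch; the dictionaries and `walk` are hypothetical and hold only the four L-cysteine species shown (ASCII charge signs).

```python
# Is Conjugate Acid Of edges among the L-cysteine species on the slide:
# each key is the conjugate acid of its value (one proton more).
CONJUGATE_ACID_OF = {
    "L-cysteinium": "L-cysteine",
    "L-cysteine": "L-cysteinate(1-)",
    "L-cysteinate(1-)": "L-cysteinate(2-)",
}
# Is Conjugate Base Of is the same set of edges read the other way.
CONJUGATE_BASE_OF = {base: acid for acid, base in CONJUGATE_ACID_OF.items()}


def walk(start, edges):
    """Follow single-valued edges from start until none remains (cycle-safe)."""
    chain = [start]
    while chain[-1] in edges and edges[chain[-1]] not in chain:
        chain.append(edges[chain[-1]])
    return chain


print(walk("L-cysteinium", CONJUGATE_ACID_OF))
# ['L-cysteinium', 'L-cysteine', 'L-cysteinate(1-)', 'L-cysteinate(2-)']
print(walk("L-cysteinate(2-)", CONJUGATE_BASE_OF))
# ['L-cysteinate(2-)', 'L-cysteinate(1-)', 'L-cysteine', 'L-cysteinium']
```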
Is Tautomer Of
[Structures: L-cysteine and L-cysteine zwitterion]
L-cysteine is tautomer of L-cysteine zwitterion
Is Tautomer Of
[Structures: 1H-pyrrole, 2H-pyrrole and 3H-pyrrole, linked as tautomers]
Has Parent Hydride
[Structures: salutaridinol ℋ morphinan]
salutaridinol has parent hydride morphinan
morphinan is parent hydride of salutaridinol
Has Functional Parent
[Structures: 7-O-acetylsalutaridinol ℱ salutaridinol]
7-O-acetylsalutaridinol has functional parent salutaridinol
salutaridinol is functional parent of 7-O-acetylsalutaridinol
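Taken together, the Has Functional Parent and Has Parent Hydride examples connect 7-O-acetylsalutaridinol to morphinan in two steps. A minimal Python sketch that only follows the two stated edges in order; the `EDGES` table and `follow` are hypothetical, each (entity, relation) pair is assumed to have a single target, and the sketch does not assume that ChEBI itself infers such chains.

```python
# Edges from the two slides: (entity, relation) -> target.
EDGES = {
    ("7-O-acetylsalutaridinol", "has_functional_parent"): "salutaridinol",
    ("salutaridinol", "has_parent_hydride"): "morphinan",
}


def follow(start, *relations):
    """Follow the given relation types in order; None if any step is missing."""
    node = start
    for relation in relations:
        node = EDGES.get((node, relation))
        if node is None:
            return None
    return node


print(follow("7-O-acetylsalutaridinol", "has_functional_parent",
             "has_parent_hydride"))                            # morphinan
print(follow("7-O-acetylsalutaridinol", "has_parent_hydride"))  # None
```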
Is Substituent Group From
[Structures: L-cysteine and the groups L-cysteino, L-cysteinyl and L-cysteine residue, each linked to L-cysteine by ℛ]
The family relations
[Diagram: the L-cysteine family, with members linked by relationship symbols such as ∆, ℱ, ♯, ♭ and ℛ]
Ontology of L-cysteine
Ontology of L-cysteine (1)
Ontology of L-cysteine (2)
Thank you