EMBL-EBI Powerpoint Presentation - European Bioinformatics Institute

Small Molecules in Bioinformatics
EBI Bioinformatics Roadshow
16th March 2011
Dusseldorf
EBI is an Outstation of the European Molecular Biology Laboratory.
Agenda
• Introduction
• Small molecule resources
• ChEMBL
• ChEBI
• Searching and browsing
• Hands-on Exercises
2
13.04.2015
Small molecule resources at the EBI
Annotation of bioinformatics data
• Essential for capturing the understanding and knowledge
associated with core data
• Often captured in free text, which is easier to read and better
for conveying understanding to a human audience, but…
• Difficult for computers to parse
• Quality varies from database to database
• Terminology used varies from annotator to annotator
• Towards annotation using standard vocabularies: ontologies
within bioinformatics
Small molecules participate in
all processes of life
What are Small Molecules?
• A small molecule is defined as a low molecular weight
organic compound.
• Most drugs are small molecules to allow passage over
cell membranes and oral bioavailability.
• They are also able to bind to proteins and enzymes,
thereby altering function, which can lead to a therapeutic
effect.
• Small molecules are used in everyday life.
Some common small molecules: amino acids
Signaling
γ-aminobutyric acid (GABA): chief inhibitory neurotransmitter in the mammalian central nervous system.
In humans, also regulates muscle tone.
• synthesized by neurons
• found mostly as a zwitterion, that is, with the carboxyl group deprotonated and
the amino group protonated
• conformational flexibility of GABA is important for its biological function, as it has
been found to bind to different receptors with different conformations
• GABA deficiency linked to
• anxiety disorder, depression, alcoholism
• multiple sclerosis, action tremors, tardive dyskinesia
Metabolism
Adenosine 5'-triphosphate (ATP): the "molecular unit of currency" of intracellular
energy transfer.
• generated in the cell by energy-consuming processes, broken down by
energy-releasing processes
• many proteins that bind ATP do so in a characteristic protein fold known as the
Rossmann fold, which is a general nucleotide-binding structural domain that
can also bind the cofactor NAD
Enzymes
• Enzyme inhibitors are molecules that bind to enzymes and
decrease their activity.
• Many drugs are enzyme inhibitors. They are also used as herbicides
and pesticides.
• Example: clavulanic acid acts as a suicide inhibitor of bacterial β-lactamase enzymes.
• Enzyme activators bind to enzymes and increase their enzymatic
activity.
• Enzyme activators are often involved in the allosteric regulation of
enzymes in the control of metabolism.
Pathways
http://www.genome.jp/kegg-bin/highlight_pathway?scale=1.0&map=map00231&keyword=tryptophan
Systems biology
BioModels: quantitative models of biochemical and cellular systems
Tryptophan: the D-enantiomer tastes sweet, the L-enantiomer bitter
Drug types 2003 - 2009
'Small molecules' in various shades of blue (http://chembl.blogspot.com/)
Small Molecule Databases
• Small molecule databases can be used to:
• Investigate historical compounds and associated bioactivity data.
• Give fresh insight into previously rejected drugs.
• Create Structure-Activity Relationships (SARs).
• Look at how changing a functional group can change the
biological activity of a compound – before you start your own
synthesis.
• Direct synthesis
• Could reduce number of compounds made – if any similar
compounds have significant toxicity or unfavourable binding data,
you can save time by not making analogues.
• Direct end product testing
• Suggest what testing could be carried out – the database can
give you an idea of what testing has given ‘good’ (i.e. clear)
results.
• Reduce number of compounds put through High Throughput
Screening (HTS).
ChEBI and ChEMBL
What is ChEBI?
• Chemical Entities of Biological Interest
• Freely available
• Focused on ‘small’ chemical entities (no proteins or
nucleic acids)
• Illustrated dictionary of chemical nomenclature
• High quality, manually annotated
• Provides chemical ontology
Access ChEBI at http://www.ebi.ac.uk/chebi/
ChEBI home page
ChEBI data overview (example entry: caffeine)
Nomenclature: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
Ontology: metabolite; CNS stimulant; trimethylxanthines
Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
Database Xrefs: MSDchem: CFF; KEGG DRUG: D00528
Chemical Informatics:
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES: CN1C(=O)N(C)c2ncn(C)c2C1=O
Visualisation
ChEBI entry view
ChEBI – Chemical Entities of Biological Interest
Chemical Structures
• Chemical structure may be interactively explored using the MarvinView applet
• Available in formats
• Image
• Molfile
• InChI and InChIKey
• SMILES
Automatic Cross-references
What is ChEMBL?
• Database of bioactive, drug-like small molecules.
• Contains 2D structures and calculated properties (logP, molecular
weight, Lipinski parameters, etc.)
• Contains abstracted bioactivity data, e.g. binding data
and IC50, from multiple primary scientific journals
• Covers about 30 years of compound synthesis and
testing
• Annotated FDA-approved drugs
Access ChEMBL at https://www.ebi.ac.uk/chembldb/
ChEMBL Main Search Page
[Screenshots: compound report card showing a clickable structure, drug information, calculated properties, structural representations, parent and salt forms, and database links]
ChEBI Link:
This will take you back to ChEMBL
ChemSpider Links:
The links work both ways: TO ChemSpider and FROM ChemSpider.
They link on Standard_Inchi.
Wikipedia Links:
We also have links with Wikipedia. These also use the Standard_Inchi as the
common identifier. These links point to the Compound Report Card in ChEMBL.
The links are added by a ChemoBot and can be updated with each release, if required.
STRUCTURAL REPRESENTATION
Stereoisomers
• Compounds that have the same molecular formula and atom
connectivity, but differ in the 3-dimensional orientation of their atoms.
• A tetrahedral carbon with 4 different groups/atoms attached
is known as a chiral centre.
Stereoisomerism Example - Thalidomide
• Caused thousands of deformities in babies across 46
countries between 1957 and 1961.
• The R isomer was effective against morning sickness, but the S
isomer was teratogenic.
• Sparked more tightly controlled laboratory practices
across the world.
Stereoisomers
• Where known, the stereochemistry of the compound is
noted in the structure and in the name.
• If a stereoisomer of an existing compound is submitted, it
is given a separate id number.
• If data were submitted for a mixture of two stereoisomers, the mixture
is also given a separate id number if the activity of
the individual compounds cannot be separated.
• If you draw a compound without stereochemistry in the structure search,
you will receive data on all stereoisomers.
Ofloxacin, Levofloxacin and Dextrofloxacin
• Fluoroquinolone antibiotics
• Ofloxacin is a racemic (equal) mixture of Levo and Dextro
isomers.
• Levofloxacin is the more active stereoisomer
• Dextrofloxacin is the less active stereoisomer
• ChEMBL has data on each with separate bioactivities.
Tautomers (keto-enol form)
• Two forms readily interconvert via the migration of a
hydrogen to the adjacent oxygen and the swapping of a
single to a double bond, and vice versa.
• ChEMBL does not differentiate between different
tautomers.
• The preferred tautomeric structure is retained.
• ChEBI does differentiate and will store the separate
tautomers.
Salts
• About 50% of marketed drugs are formulated as salts to
aid their delivery and activity.
• Some salts prevent the drug from being absorbed in the mouth.
• Some salts help the drug be activated in the intestines, rather
than the stomach.
• There are approx 40,200 ChEMBL compounds with salts.
• Bioactivity data is recorded against the parent drug and
against the salt.
• Therefore, it’s important to give these compounds
different ChEMBL ids.
Salt Example: Morphine
• Morphine can be administered as many different salts:
• Hydrochloride (HCl)
• Sulphate (SO4)
• Tartrate
• Acetate
• Citrate
• Methobromide (MeBr)
• Hydrobromide (HBr)
• Hydroiodide (HI)
• Lactate
• Chloride (Cl)
• Bitartrate
Dealing with Salts in ChEMBL
• Each compound, if in a salt form, is analysed and
matched to a ‘parent’ – i.e. the base form of the
compound (this does not apply to inorganic compounds).
• For example, morphine hydrochloride (CHEMBL556578),
morphine sulfate (CHEMBL422878) and morphine sulfate
hydrate (CHEMBL1200603) are matched to their parent
morphine (CHEMBL70)
• This relationship is shown on the interface of the
compound page.
• Additionally, when you run a search for a compound, you
will only be brought back the parent form in the results
grid.
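The parent/salt matching described above can be pictured as a simple lookup. The following is only a minimal sketch, not ChEMBL's actual implementation: the ChEMBL ids are the ones quoted on this slide, while the names `PARENT_OF`, `to_parent`, `collapse_to_parents` and `forms_of` are hypothetical helpers introduced for illustration.

```python
# Salt form -> parent compound, using the ids quoted on the slide.
# The mapping table and helper names are illustrative assumptions.
PARENT_OF = {
    "CHEMBL556578": "CHEMBL70",   # morphine hydrochloride
    "CHEMBL422878": "CHEMBL70",   # morphine sulfate
    "CHEMBL1200603": "CHEMBL70",  # morphine sulfate hydrate
}


def to_parent(chembl_id):
    """Return the parent id for a salt form, or the id itself if it has no parent."""
    return PARENT_OF.get(chembl_id, chembl_id)


def collapse_to_parents(hits):
    """Reduce a list of search hits to distinct parent ids, keeping first-seen order."""
    parents = []
    for hit in hits:
        parent = to_parent(hit)
        if parent not in parents:
            parents.append(parent)
    return parents


def forms_of(parent_id):
    """List the parent followed by every salt form mapped to it."""
    salts = sorted(s for s, p in PARENT_OF.items() if p == parent_id)
    return [parent_id] + salts


if __name__ == "__main__":
    print(collapse_to_parents(["CHEMBL556578", "CHEMBL70", "CHEMBL422878"]))
    print(forms_of("CHEMBL70"))
```

Collapsing hits to the parent mirrors the results grid behaviour, while `forms_of` corresponds to the list of salt hyperlinks shown on the parent's compound page.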
Parents and Salts on the Compound Page
[Screenshot: compound report page with the PARENT form and the SALTS shown with hyperlinks beneath]
• Clicking on the Bioactivity Summary pie chart will give
you the bioactivity data for ALL forms of the compound
• To get salt-specific bioactivity data, click on the hyperlink
beneath the salt form of interest to be taken to its
compound page.
[Screenshots: Morphine, all data; Morphine HCl, specific data]
Naming and Classification
Chemical names
Common or trivial names are those that are widely used.
Advantages of common names include
simplicity,
pronounceability and
universal recognition.
The main disadvantage is ambiguity – the same common
name may refer to more than one type of chemical.
Systematic names
A systematic name is one which corresponds to the chemical
structure such that the structure can be determined from the
name, e.g. 1,2-dimethyl-naphthalene
Software packages exist which can generate structures from
the systematic names (e.g. ACD/Name, ChemOffice,
MarvinSketch).
More than one correct systematic name can be assigned to the
same molecular structure, depending on the manner in which
naming rules are applied.
Examples of common and systematic names
Common names:
• caffeine
• guaranine
• theine
• 7-methyltheophylline
Systematic names:
• 1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione
• 1,3,7-trimethyl-2,6-dioxopurine
SEARCHING IN CHEBI
Why?
• Ontological data
• Structure classification
• Chemical entity, e.g. hydrocarbon
• Role, e.g. ligand
• Subatomic particle, e.g. electron
• Links to other databases
• KEGG
• DrugBank
• PDBeChem
• Citations
How?
Text-based
Drawing
The ChEBI ontology
Organised into three sub-ontologies, namely
• Molecular structure ontology
• Subatomic particle ontology
• Role ontology
(R)-adrenaline
Molecular structure ontology
Role ontology
ChEBI ontology relationships
• Generic ontology relationships
• Chemistry-specific relationships
Viewing ChEBI ontology
Simple and advanced text search
• Narrow by category
• Combine terms with AND, OR and BUT NOT
Structure search
[Screenshots: structure drawing tools and search options; in the search results, hover over an entry for the search menu or click to go to the compound page]
Types of structure search
• Identity – based on InChI, e.g. InChI=1/H2O/h1H2
• Substructure – uses fingerprints to narrow search range, then
performs full substructure search algorithm
• Similarity – based on Tanimoto coefficient calculated between the
fingerprints:
Tanimoto(a, b) = c / (a + b - c)
where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both.
Example:
fingerprint a: 0010110010 (4 bits set)
fingerprint b: 1010110111 (7 bits set)
Tanimoto(a, b) = 4 / (4 + 7 - 4) = 0.57
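The fingerprint arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch that treats fingerprints as strings of 0s and 1s, as drawn on the slide; real fingerprints are far longer, and the function names here are illustrative rather than taken from ChEBI's search code. The screening function shows only the fingerprint pre-filter step of a substructure search, not the full substructure matching algorithm.

```python
def bits_set(fp):
    """Number of bits set to 1 in a fingerprint given as a '0'/'1' string."""
    return fp.count("1")


def common_bits(fp_a, fp_b):
    """Number of positions where both fingerprints have a 1."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(1 for x, y in zip(fp_a, fp_b) if x == "1" and y == "1")


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) between two fingerprints."""
    a = bits_set(fp_a)
    b = bits_set(fp_b)
    c = common_bits(fp_a, fp_b)
    if a + b - c == 0:
        return 0.0  # both fingerprints are empty; define similarity as 0
    return c / (a + b - c)


def passes_substructure_screen(query_fp, target_fp):
    """Pre-filter: every bit set in the query must also be set in the target."""
    return common_bits(query_fp, target_fp) == bits_set(query_fp)


if __name__ == "__main__":
    fp_a = "0010110010"
    fp_b = "1010110111"
    print(round(tanimoto(fp_a, fp_b), 2))            # 0.57, as on the slide
    print(passes_substructure_screen(fp_a, fp_b))    # True
```

A target that fails the screen cannot contain the query as a substructure, so only the targets that pass need the slower full comparison.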
Browse via Periodic Table
Molecular entities / Elements
Navigate via links in ontology
Click to follow links
CHEBI SEARCH EXAMPLE
ChEBI example
• Search for ‘Glycine’
• What is the ChEBI ID for this? 15428
• Is it available as a KEGG compound? Yes
• What are the IUPAC names? Glycine, aminoacetic acid
• What is ‘glycine zwitterion’? It is a tautomer of glycine
SEARCHING IN CHEMBL
How to search in ChEMBL:
• Keywords
• Compound name – dopamine, haloperidol
• Assay name – cytotoxicity, liver hepatotoxicity
• Target – RAF-1, IRAK-4
• Structure
• BLAST search – FASTA sequence from UniProt
• Protein or taxonomy hierarchy
Where to search:
Using the search field (found on main page):
• Best for single words
• E.g. ‘dopamine’, ‘Muscarinic’
• Looks for matching text in compound name, key or
synonym
• 3-o-methyl-alpha-methyldopamine
• Muscarinic receptor 4
• Needs an exact match
• Can’t use wildcards, e.g. ‘%’, ‘?’…
Using the Protein Sequence Search
• Useful for searching for a specific protein or a protein
from the same family
• The results returned show the percentage similarity
to the input sequence.
• An exact match will give 100%.
• The same target from a different organism will typically give around 90%.
Compound Drawing
• Can draw the full structure of interest or a partial structure
• Using the Substructure Search you can find compounds containing your
partial structure
• Using the Similarity Search, you can find similar compounds – based on a
percentage score (70-100%)
DOWNLOAD AND ANALYSIS OF CHEMBL RESULTS
• The compounds can be downloaded as an SD file (*.sdf).
• The bioactivity data can be downloaded as a spreadsheet (*.xls).
CHEMBL WORKED EXAMPLE
STRUCTURE ACTIVITY RELATIONSHIPS
Drug design
• Ligand-based: relies on knowledge of other molecules that bind to the
biological target of interest.
• Structure-based: relies on knowledge of the 3D structure of the biological
target.
• A lead has
• evidence that modulation of the target will have therapeutic value: e.g. disease
linkage studies showing associations between mutations in the biological target
and certain disease states.
• evidence that the target is druggable, i.e. capable of binding to a small molecule
and that its activity can be modulated by the small molecule.
• Target is cloned and expressed, then libraries of potential drug compounds
are screened using screening assays
Drug Discovery Process
Overall phases: Discovery, Development, Use
• Target Discovery: target identification; microarray profiling; target validation; assay development; biochemistry; clinical/animal disease models
• Lead Discovery: high-throughput screening (HTS); fragment-based screening; focused libraries; screening collection
• Lead Optimisation: medicinal chemistry; structure-based drug design; selectivity screens; ADMET screens; cellular/animal disease models; pharmacokinetics
• Preclinical Development: toxicology; in vivo safety pharmacology; formulation; dose prediction
• Clinical Trials: Phase 1 (PK, tolerability); Phase 2 (efficacy); Phase 3 (safety & efficacy)
• Launch: indication discovery & expansion
Med. Chem. SAR
• Clinical candidates: ~12,000 candidates
• Drugs: ~2000 drugs
• ChEMBL database: > 2,900,000 bioactivities; > 600,000 compounds; ~30,000 distinct lead series
[Figure: a compound (2D structure) linked through an assay to its target and the measured bioactivity]
Target:
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSY
EEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRS
RYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEG
SSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGD
EEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEAD
CGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVL
TAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLK
KPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVC
KDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFY
THVFRLKKWIQKVIDQFGE
SAR data for the compound: Ki = 4.5 nM; APTT = 11 min
Current Data Content (ChEMBL_09)
• Abstracted from 39,094 papers from 16 journals
• Ongoing curation and clean-up of all data
• 759,220 compound records
• 623,012 distinct compound structures
• 8,091 targets
• 4,912 protein molecular targets
• 3,030,317 experimental bioactivities
• binding measurements, functional assays and ADMET
ChEMBL Assay Data
• ChEMBL contains >3 million data points relating compounds to
targets or effects.
• These activities come from ~490K assays reported in
medicinal chemistry literature.
• Assays can be classified as:
• binding measurements (40%), e.g., IC50
• functional assay endpoints (51%), e.g., vasodilation
• ADME/toxicity data (9%), e.g., LD50
Compound Properties and Selectivity
• Stores a wide range of calculated compound properties
(e.g., mol wt, logP, RO5 violations)
• Can be used to identify compounds most likely to have good in
vivo properties (Absorption, Distribution, Metabolism, Excretion)
• Contains activity information against liability targets (e.g.,
cytochrome P450s, HERG K+ channel)
• If compounds have been tested in these assays, can avoid those
with potential toxicity issues
• Contains data on a wide range of targets
• If compounds have been tested against multiple targets, can get
an idea of their selectivity (important for validation studies)
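The "RO5 violations" property mentioned above is the count of Lipinski rule-of-five criteria that a compound breaks. As a minimal sketch, assuming the four properties have already been calculated elsewhere, the count can be written as below. The `CalcProps` class and the example values are illustrative assumptions, not ChEMBL field names or ChEMBL data.

```python
from dataclasses import dataclass


@dataclass
class CalcProps:
    """Calculated properties for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(props):
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        props.mol_weight > 500,
        props.logp > 5,
        props.h_bond_donors > 5,
        props.h_bond_acceptors > 10,
    ]
    return sum(checks)


if __name__ == "__main__":
    # Illustrative values only, not taken from the database.
    small = CalcProps(mol_weight=194.19, logp=0.0, h_bond_donors=0, h_bond_acceptors=6)
    large = CalcProps(mol_weight=650.0, logp=6.2, h_bond_donors=2, h_bond_acceptors=11)
    print(ro5_violations(small), ro5_violations(large))
```

Compounds with zero or one violation are usually considered more likely to be orally bioavailable, which is why the count is a convenient first filter.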
Identifying Chemical Tools
• Search ChEMBL for protein of interest
• Simple text search against protein names/synonyms
• Browse protein family tree
• Sequence search using BLAST (can find related proteins)
• Identify compounds active against this protein
• Sort/filter by relevant activity types and potency
• E.g., retrieve compounds with IC50/Ki < 100nM
• Retrieve other data for these compounds
• Structures, chemical properties, other activities
Example SAR
1. Run a search on RAF-1
2. Filter on all IC50 values less than 100nM
3. Run the structures through an external source, such as
Pipeline Pilot, to show the most common substructures.
• This will give you an idea of what type of compounds
have a good IC50 for the target RAF-1.
• You can then design a similar compound(s) based on
these substructures.
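Step 2 of the example above (keeping IC50 values below 100nM) can also be done outside the web interface once the bioactivity data has been exported. The sketch below assumes a tab-delimited export; the column names (`compound_id`, `target`, `activity_type`, `value_nM`) and the sample rows are hypothetical and will differ from the real download.

```python
import csv
import io

# Hypothetical tab-delimited export; real column names and values will differ.
SAMPLE = (
    "compound_id\ttarget\tactivity_type\tvalue_nM\n"
    "CPD-1\tRAF-1\tIC50\t12\n"
    "CPD-2\tRAF-1\tIC50\t450\n"
    "CPD-3\tRAF-1\tKi\t30\n"
    "CPD-4\tRAF-1\tIC50\t99.5\n"
)


def potent_compounds(handle, target, activity_type="IC50", cutoff_nm=100.0):
    """Return ids of compounds whose activity against `target` is below the cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        if row["target"] != target or row["activity_type"] != activity_type:
            continue  # different target or a different kind of measurement
        if float(row["value_nM"]) < cutoff_nm:
            hits.append(row["compound_id"])
    return hits


if __name__ == "__main__":
    print(potent_compounds(io.StringIO(SAMPLE), "RAF-1"))
```

The resulting id list is what would then be passed to a substructure analysis tool in step 3.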
Assessing selectivity
• So far we have only identified compounds that may be
active against a target of interest
• Often the aim is to find compounds that are selective for
that target (i.e., not active against other targets)
• Need to consider all of the available activity data for each
compound to see if it is known to be active against any
other targets
• Extract the list of SMILES from the XLS spreadsheet
• Run this through the ChEMBL SMILES list search tool
• Filter the bioactivity for IC50 > 100nM
• Download the filtered bioactivity as another XLS
spreadsheet
• Run a filter on the spreadsheet
• Not RAF-1
• Collect the subset of compounds that showed specificity for
RAF-1
• Selective for RAF-1
• Selective for RAF-1 and inactive for other targets
• You can use external programs like Pipeline Pilot™ and
Spotfire Decision Site™ to analyse the results.
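The selectivity check outlined above can be sketched in a few lines once all activity records for the candidate compounds are in one table. This is only an illustration under stated assumptions: the records, compound ids and the second target name (`TARGET-X`) are invented, and "selective" is taken to mean an IC50 below 100nM against RAF-1 and no IC50 below 100nM against any other tested target.

```python
from collections import defaultdict

# (compound id, target, IC50 in nM); all values are invented for illustration.
RECORDS = [
    ("CPD-1", "RAF-1", 12.0),
    ("CPD-1", "TARGET-X", 2500.0),
    ("CPD-4", "RAF-1", 99.5),
    ("CPD-4", "TARGET-X", 40.0),
    ("CPD-5", "TARGET-X", 10.0),
]


def selective_compounds(records, target, cutoff_nm=100.0):
    """Return compounds active (IC50 < cutoff) against `target` and nothing else."""
    active_on = defaultdict(set)
    for compound, tested_target, ic50_nm in records:
        if ic50_nm < cutoff_nm:
            active_on[compound].add(tested_target)
    return sorted(c for c, targets in active_on.items() if targets == {target})


if __name__ == "__main__":
    print(selective_compounds(RECORDS, "RAF-1"))
```

A compound only appears selective with respect to the targets it has actually been tested against, so a missing record should not be read as evidence of inactivity.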
[Screenshot: Pipeline Pilot protocol to extract all data with an IC50 of 0-100nM]
[Screenshot: IC50 vs Molecular Weight - Spotfire™]
Downloads and programmatic access
Downloading ChEBI flavours
• All downloads come in two flavours
• 3 star only entries (manually annotated ChEBI
entries)
• 2 and 3 star entries (manually annotated ChEBI,
ChEMBL and user submissions)
Downloading ChEBI
• OBO file
• Use with OBO-Edit
• SDF file
• Compatible with chemistry software such as Bioclipse
• Flat file, tab delimited
• Import all the data into Excel
• Parse it into your own database structure
• Oracle binary dumps
• Import into an Oracle database
• Generic SQL insert statements
• Import into a MySQL or PostgreSQL database
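For users who prefer to parse the OBO download themselves rather than open it in an editor, the format is simple enough to read with a short script. The sketch below handles only `[Term]` stanzas and the `id`, `name` and `is_a` tags. The sample text is a hand-written illustration: the glycine id comes from the search example in this presentation, but the parent term `EXAMPLE:1` is invented and does not reflect the real ChEBI hierarchy.

```python
# Hand-written sample in OBO syntax; EXAMPLE:1 is an invented parent term.
SAMPLE_OBO = """format-version: 1.2

[Term]
id: EXAMPLE:1
name: illustrative parent class

[Term]
id: CHEBI:15428
name: glycine
is_a: EXAMPLE:1 ! illustrative parent class

[Typedef]
id: has_role
name: has role
"""


def parse_obo_terms(text):
    """Return {term id: {'id', 'name', 'is_a'}} for every [Term] stanza."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, or non-Term stanzas
        key, value = line.split(":", 1)
        value = value.strip()
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            # Drop the trailing "! comment" that repeats the parent's name.
            current["is_a"].append(value.split("!")[0].strip())
    return terms


def children_of(terms, parent_id):
    """Ids of terms that list `parent_id` as a direct is_a parent."""
    return sorted(t for t, data in terms.items() if parent_id in data["is_a"])


if __name__ == "__main__":
    terms = parse_obo_terms(SAMPLE_OBO)
    print(terms["CHEBI:15428"]["name"], children_of(terms, "EXAMPLE:1"))
```

The same dictionary can then be loaded into your own database structure, much as with the tab-delimited flat file.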
The ChEBI web service
• Programmatic access to a ChEBI entry
• SOAP based Java implementation
• Clients currently available in Java and Perl
• Methods
• getLiteEntity
• getCompleteEntity and getCompleteEntityByList
• getOntologyParents
• getOntologyChildren and getAllOntologyChildrenInPath
• getStructureSearch
• Documented at
http://www.ebi.ac.uk/chebi/webServices.do.
Downloading ChEMBL
• Frequent releases (approx monthly)
• SDFile
• Text
• MySQL
• Oracle
Help and Feedback
• Email addresses for support queries and feedback
• General questions and feedback on ChEMBL interface:
chembl-help@ebi.ac.uk
• Reporting of data errors:
chembl-data@ebi.ac.uk
• General questions, support and feedback on ChEBI
chebi-help@ebi.ac.uk
Thank you