ChEMBL


Overview of ChEMBL Database

Gareth Owen, ChEBI group, EMBL-EBI

Northwestern University

16th October 2012

EBI is an Outstation of the European Molecular Biology Laboratory.


What is ChEMBL?

• Open access database for drug discovery

• Freely available (searchable and downloadable)

• Content:

• 2D structures & calculated properties (logP, MW, Lipinski, etc.)

• Associated bioactivity data extracted from the primary medicinal chemistry journals such as J. Med. Chem.

• Deposited data from neglected disease screening (e.g. malaria)

• Subset of data from PubChem

• Covers ~30 years of compound synthesis and testing

• Annotated FDA-approved drugs

• Secure searching (https://www.ebi.ac.uk/chembldb)

ChEMBL Database

• Content

ChEMBL14

Targets: 9,003

Compounds: 1,376,469

Activities: 10,129,256*

Publications: 46,133

* Includes:

~5,900,000 (PubChem)

~100,000 (Deposited malaria screening sets)

Assays are classified as:

• Binding measurements

• Functional assays

• ADME/toxicity data


60% proteins

20% organisms

20% cell lines


ChEMBL Assays: Binding, Functional, ADMET

Binding Assays

• Assays which directly measure the binding of a compound to a particular target

• E.g., competition binding assays with a radioligand

• Various endpoints measured, but most commonly reported are:

• IC50 (half maximal inhibitory concentration)

• Ki (binding affinity)

• MIC (minimum inhibitory concentration)

• % Inhibition (of activity)
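IC50 values from a competition binding assay depend on the radioligand concentration, whereas Ki does not. The two are commonly related by the Cheng-Prusoff equation, Ki = IC50 / (1 + [L]/Kd). The sketch below is an illustration only and is not taken from ChEMBL; the function names and the example numbers are assumptions.

```python
import math


def ki_from_ic50(ic50_nm: float, ligand_conc_nm: float, ligand_kd_nm: float) -> float:
    """Cheng-Prusoff: Ki = IC50 / (1 + [L]/Kd), all concentrations in nM."""
    if ic50_nm <= 0 or ligand_kd_nm <= 0 or ligand_conc_nm < 0:
        raise ValueError("IC50 and Kd must be positive; [L] must be non-negative")
    return ic50_nm / (1.0 + ligand_conc_nm / ligand_kd_nm)


def p_value(conc_nm: float) -> float:
    """Negative log10 of a molar concentration (e.g. pIC50 or pKi)."""
    if conc_nm <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(conc_nm * 1e-9)


# Hypothetical assay: IC50 = 200 nM, radioligand at 2 nM with Kd = 2 nM.
ki = ki_from_ic50(200.0, 2.0, 2.0)
print(ki)           # Ki in nM
print(p_value(ki))  # pKi
```

Working on the negative log scale (pIC50, pKi) makes values from different assays easier to compare, since a difference of one unit corresponds to a ten-fold change in concentration.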


Functional Assays

Whole organism assays

(e.g., anti-infectives/parasitics)

Disease-derived cell-line

(e.g., human ovarian cancer cell line cytotoxicity)

Tissue or cell-based disease model

(e.g., glucose uptake by adipocytes)

Tissue or cell-based assay for target effect

(e.g., contraction of guinea-pig ileum)

Cell-based assay over-expressing target

(e.g., GPCR calcium mobilisation)

ADMET Assays

• Assays measuring:

Absorption, Distribution, Metabolism, Excretion, Toxicity properties of compounds


• Examples include:

• Half-life of compound in rats

• Tissue distribution of compound

• Levels of metabolites


ChEMBL Targets:

• Protein (e.g., PDE5)

• Protein complex (e.g., Nicotinic acetylcholine receptor)

• Protein family (e.g., Muscarinic receptors)

• Nucleic Acid (e.g., DNA)

• Cell Line (e.g., HEK293 cells)

• Tissue (e.g., Nervous)

• Sub-cellular Fraction (e.g., Mitochondria)

• Organism (e.g., Drosophila)


Protein Targets

• Each protein target linked to a sequence in UniProt

• Information from UniProt used in ChEMBL to allow searching:

• Protein name/description

• Synonyms and gene names

• Organism (and NCBI Tax ID)

• Proteins in ChEMBL also classified according to family

(e.g., Receptor, Kinase, Protease, Transporter, etc.)

• Used for searching by target tree (Browse Targets)
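To make the searchable fields above concrete, here is a minimal sketch of a target record and a name/synonym lookup. It is not ChEMBL's schema: the class, field names and matching rule are assumptions made for this example, and the record values are illustrative and should be checked against UniProt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinTarget:
    accession: str        # UniProt accession
    name: str             # protein name/description
    organism: str
    tax_id: int           # NCBI Taxonomy ID
    family: str           # protein family classification
    synonyms: tuple = ()  # synonyms and gene names


# Illustrative records only.
TARGETS = [
    ProteinTarget("P29274", "Adenosine A2a receptor", "Homo sapiens", 9606,
                  "Receptor", ("ADORA2A", "A2a")),
    ProteinTarget("O76074", "Phosphodiesterase 5A", "Homo sapiens", 9606,
                  "Enzyme", ("PDE5A", "PDE5")),
]


def find_targets(query: str, targets=TARGETS):
    """Case-insensitive substring match on name, synonyms, or gene names."""
    q = query.strip().lower()
    return [t for t in targets
            if q in t.name.lower() or any(q in s.lower() for s in t.synonyms)]


print([t.accession for t in find_targets("adora2a")])
print([t.name for t in find_targets("pde5")])
```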


ChEMBL Compounds

• Chemical structures are stored as .mol files

• If the stereochemistry is known it is drawn as a specific enantiomer

• Tautomers of the same compound are treated as the same compound. The form shown is the one drawn in the source paper

• Unique compounds are identified using Standard InChIs

• Salts and parent molecules are grouped together for displaying bioactivity data, although activity data is recorded against the specific salt
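As a rough illustration of identifying unique compounds by Standard InChI, the sketch below groups records that share the same InChI string. The input records and the function name are assumptions for this example; it does not generate InChIs, which would require a cheminformatics toolkit.

```python
from collections import defaultdict

# Hypothetical input records: (name as reported, Standard InChI).
records = [
    ("ethanol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("ethyl alcohol", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("methanol", "InChI=1S/CH4O/c1-2/h2H,1H3"),
]


def group_by_inchi(rows):
    """Map each Standard InChI to the list of names reported for it."""
    groups = defaultdict(list)
    for name, inchi in rows:
        groups[inchi].append(name)
    return dict(groups)


unique = group_by_inchi(records)
print(len(unique))  # number of distinct structures
for inchi, names in unique.items():
    print(inchi, names)
```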

ChEMBL Home Page

https://www.ebi.ac.uk/chembldb


ChEMBL Main Search Page

Drug Information


Small molecule resources at the EBI

Clickable structure

Parent and Salt Forms


Click to display data


ChEBI Link:


This will take you back to ChEMBL


ChemSpider Links:

The links work both ways: TO ChemSpider and FROM ChemSpider. Compounds are matched on Standard_Inchi.


Wikipedia Links:

We also have links with Wikipedia. These also use the Standard_Inchi as the common identifier. These links point to the Compound Report Card in ChEMBL.

The links are added by a ChemoBot and can be updated with each release, if required.


Use Case 1 - Searching by Target

• What is known about chemical structures that bind to a specific protein (Adenosine A2a)?

• What is known about their potency/selectivity/ADMET properties?

• Is there any protein structure data?


Use Case 1 - Searching by Target in ChEMBL

Choose Sources to include in search

Retrieving Bioactivity Data - Single Target


Bioactivity data for target

Assay data for target

3D Structures

Display all bioactivity data for target

Click pie chart to retrieve particular end-points


Select targets of interest

Filtering Bioactivities

Select required activity types and define cut-offs (e.g., Ki < 100 nM)
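The kind of filter applied at this step can be sketched as follows. This is a simplified illustration, not the ChEMBL interface: the row layout, compound names and unit table are assumptions, and censored values (such as ">50 nM") are simply skipped.

```python
# Conversion factors to nM for the units used in this sketch.
TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}

# Hypothetical activity rows: (compound, activity type, relation, value, units).
activities = [
    ("CPD-1", "Ki", "=", 12.0, "nM"),
    ("CPD-2", "Ki", "=", 0.5, "uM"),
    ("CPD-3", "IC50", "=", 30.0, "nM"),
    ("CPD-4", "Ki", ">", 50.0, "nM"),
    ("CPD-5", "Ki", "=", 800.0, "pM"),
]


def filter_activities(rows, activity_type="Ki", cutoff_nm=100.0):
    """Keep rows of one activity type whose exact value is below the cut-off."""
    hits = []
    for compound, a_type, relation, value, units in rows:
        if a_type != activity_type or relation != "=":
            continue  # skip other end-points and censored values
        value_nm = value * TO_NM[units]  # normalise units before comparing
        if value_nm < cutoff_nm:
            hits.append((compound, value_nm))
    return hits


print(filter_activities(activities))
```

Normalising units before applying the cut-off matters because the same end-point may be reported in different units across publications.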

Bioactivity Results

Compound structures

Activity values


Assay details Target details References


Selectivity Data

For example:

ChEMBL can be searched for all data on compounds that have adenosine A2a Ki values < 100 nM
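One common way to summarise such data is fold-selectivity: the Ki at another target divided by the Ki at the target of interest. The sketch below assumes Ki values have already been collected per compound and target; all values and names are hypothetical.

```python
# Hypothetical Ki values in nM, keyed by compound and then by target.
ki_nm = {
    "CPD-1": {"A2a": 5.0, "A1": 500.0, "A3": 2500.0},
    "CPD-2": {"A2a": 80.0, "A1": 40.0},
    "CPD-3": {"A2a": 300.0, "A1": 3000.0},
}


def selectivity(profile, primary="A2a", cutoff_nm=100.0):
    """Return fold-selectivity (other-target Ki / primary Ki), or None if
    the compound does not meet the potency cut-off at the primary target."""
    primary_ki = profile.get(primary)
    if primary_ki is None or primary_ki >= cutoff_nm:
        return None
    return {t: ki / primary_ki for t, ki in profile.items() if t != primary}


for compound, profile in ki_nm.items():
    print(compound, selectivity(profile))
```

A ratio above 1 means the compound is more potent at the primary target; a ratio below 1 means it binds the other target more tightly.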


ADMET Data

Summary of ChEMBL bioavailability data for compounds with A2a Ki values <100nM

Example of bioavailability data


Use Case 2 – Searching by Structure

• What compounds contain a particular substructure?

• What is known about their bioactivities?

• Are there known drugs or clinical trials?


Lists of Identifiers

Different sketchers

Types of synonyms:

• Research codes

• Trade names

• INN, USAN

Similarity and Substructure Searching

Display/Download Bioactivity Data

Filtering Data on Lipinski Properties etc
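A filter on Lipinski properties can be sketched as a count of rule-of-five violations (molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The class, field names and property values below are assumptions for illustration and are not taken from ChEMBL.

```python
from dataclasses import dataclass


@dataclass
class Props:
    mol_weight: float  # molecular weight in Da
    logp: float        # calculated logP
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def ro5_violations(p: Props) -> int:
    """Count Lipinski rule-of-five violations."""
    return sum([
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


# Hypothetical property values.
compounds = {
    "CPD-A": Props(349.4, 2.1, 2, 6),
    "CPD-B": Props(712.9, 6.3, 4, 12),
}
passing = [name for name, p in compounds.items() if ro5_violations(p) == 0]
print(passing)
```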

Display Bioactivities of subset


Bioactivities


Structure


Bioactivities

Properties

Cross-references

Clinical Trials


Links to Other Resources


PDBe http://www.ebi.ac.uk/pdbe

Marketed Drugs


Select set of interest

Export to Excel or export SDF


Use Case 3 – Similar Targets

• Are there any available data on compounds that bind to proteins similar to IRAK2?

• For these compounds what bioactivity data is there on compounds with related sub-structures?

• Is there any crystal structure data on these proteins?

Protein Sequence Search

• More precise method for identifying targets

• Input is a protein sequence of interest

• Uses the BLAST* algorithm to perform pair-wise comparisons between the input sequence and all proteins in the Target Dictionary, to find the most closely related matches

• Results are scored according to similarity to input sequence (determined by number of amino acids that are identical or have similar properties)


*Altschul SF et al., J Mol Biol. 215(3), p403-10 (1990)
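To illustrate what "identical or have similar properties" means for a pair of sequences, the sketch below scores two sequences that are already aligned. It is not BLAST: BLAST also finds the alignment and scores it with a substitution matrix, whereas the property groups and the function name here are assumptions chosen only for this example.

```python
# Coarse amino acid property groups, chosen only for illustration.
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KRH", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same property group."""
    return a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b))


def score_alignment(seq_a: str, seq_b: str):
    """Return (% identical, % identical-or-similar) over ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != "-" and b != "-"]  # ignore columns containing a gap
    if not columns:
        return 0.0, 0.0
    identical = sum(a == b for a, b in columns)
    similar = sum(is_similar(a, b) for a, b in columns)
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


# Hypothetical aligned fragments; "-" marks a gap.
print(score_alignment("MKVLD-ST", "MRVIDEST"))
```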

Use Case 3 – Similar Targets

Protein sequence of interest, e.g., from UniProt (http://www.uniprot.org)


Data on IRAK1, IRAK3 and IRAK4 but not IRAK2

IRAK1, IRAK3 and IRAK4 data


Identify sub-structure of interest

What other data available on compounds with this sub-structure?

Use Case 4 - Assay keyword search

• Some ChEMBL data (e.g., functional assays) may not be mapped against molecular targets

• May want to perform a more general search (e.g., for a disease process, animal model, cell type of interest)

• Examples:

1. What compounds have been tested in disease models (cholesterol lowering)?

2. What data is available for brain penetration (brain to plasma ratio)?
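A keyword search over assay descriptions can be sketched as a case-insensitive match that requires every search term to appear. The assay IDs and descriptions below are invented for this example, and the matching rule is an assumption rather than the behaviour of the ChEMBL search.

```python
# Hypothetical assay descriptions, written for this example.
assays = {
    "ASSAY-1": "Cholesterol lowering activity in hamsters fed a high-fat diet",
    "ASSAY-2": "Brain to plasma ratio in rat after oral dosing",
    "ASSAY-3": "Inhibition of cholesterol biosynthesis in HepG2 cells",
    "ASSAY-4": "Ratio of brain to plasma concentration in mouse",
}


def search_assays(keywords: str, table=assays):
    """Return IDs of assays whose description contains every keyword."""
    terms = keywords.lower().split()
    return [assay_id for assay_id, text in table.items()
            if all(term in text.lower() for term in terms)]


print(search_assays("cholesterol lowering"))
print(search_assays("brain to plasma"))
```

Requiring all terms keeps the search specific; matching any single term would also return the cholesterol biosynthesis assay for the first query.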


Assay Search for “Cholesterol Lowering”


Assay Search for “Brain to Plasma”


Accessing ChEMBL Data


Useful Links

ChEMBL Blog: http://chembl.blogspot.com

If you would like help: chembl-help@ebi.ac.uk

For ChEMBL news and data releases subscribe to: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce


Acknowledgements

ChEMBL Group

John Overington

Anne Hersey

Anna Gaulton

Mark Davies

Jon Chambers

Louisa Bellis

Kazuyoshi Ikeda

Patricia Bento

Shaun McGlinchey

Yvonne Light

Felix Krueger

Ben Stauch

Ruth Akhtar

Francis Atkinson

Rita Santos

EMBL-EBI

Samuel Kerrien, Sandra Orchard, Bruno Aranda, Rafael Jimenez, Reactome, UniProt and ChEBI teams

Collaborators

Imperial Cancer Research, University of Dundee, University of Cambridge, Sanger Centre, University of Maryland, NCBI, TDR, IUPHAR, Bayer-Schering, Pfizer, GSK, Schering-Plough, MMV, Novartis, St Jude Children's Research Hospital

Former Inpharmatica colleagues


Exercises!
