The Connectivity Map:

A portrait of the Database as Biomedical Laboratory
by Pablo Tamayo, Broad Institute and Oracle Corporation
Part I:
What is the CMAP?
Introduction
and overview
Connectivity Map
In a nutshell: it connects diseases with drugs using the
language of genes.
Connectivity Map
It is organized as a publicly available online
database that contains the signatures of many
drugs in the language of gene expression.
It can be “queried” with genetic signatures of
disease, in an approach known as in silico drug
screening, in order to find matching drugs that
are therefore identified as potential new
treatments for the disease.
Connectivity Map
5186223
(-)-catechin
5186324
fulvestrant
12,13-EODE
5213008
geldanamycin
3-hydroxy-DL-kynurenine
5286656
genistein
BW-B70C
HC
haloperidol
DL-PPMP
calmidazolium
monorden
MG-132
carbamazepine
nordihydroguaiaretic
cytochalasin
prochlorperazine
demecolcine
celastrol
rosiglitazone
doxycycline
sirolimus
minocycline
celecoxib
thioridazine
monensin
clotrimazole
tretinoin
phenanthridinone
troglitazone
colforsin
phentolamine
decitabine
valproic
trichostatin
docosahexaenoic
vorinostat
tyrphostin
ikarugamycin
wortmannin
yohimbine
ionomycin
clozapine
15-delta
pararosaniline
trifluoperazine
17-allylamino-geldanamycin
quercetin
17-dimethylamino-geldanamycin
rottlerin
LY-294002
topiramate
5109870
acetylsalicylic
5182598
5114445
5211181
5140203
5224221
5149715
5230742
5151277
5248896
5152487
5252917
5162773
pirinixic
LM-1685
dopamine
sulindac
NU-1025
imatinib
5253409
fludrocortisone
butein
rofecoxib
5255229
prednisolone
thalidomide
cobalt
5279552
tomelukast
MK-886
quinpirole
Y-27632
sulfasalazine
arachidonic
TTNPB
blebbistatin
amitriptyline
ciclosporin
diclofenac
bucladesine
dexverapamil
nifedipine
clofibrate
depudecin
exemestane
arachidonyltrifluoromethane
felodipine
verapamil
3-aminobenzamide
oligomycin
oxaprozin
chlorpropamide
probucol
oxamic
prazosin
tolbutamide
U0125
fasudil
pyrvinium
mesalazine
splitomicin
raloxifene
resveratrol
metformin
HNMPA-(AM)3
tacrolimus
monastrol
phenformin
dimethyloxalylglycine
butirosin
Phenyl
fisetin
tamoxifen
mercaptopurine
alpha-estradiol
copper
dexamethasone
chlorpromazine
W-13
deferoxamine
2-deoxy-D-glucose
benserazide
tetraethylenepentamine
colchicine
estradiol
1,5-isoquinolinediol
azathioprine
tioguanine
fluphenazine
SC-58125
nitrendipine
paclitaxel
gefitinib
N-phenylanthranilic
pentamidine
staurosporine
flufenamic
novobiocin
indometacin
exisulind
4,5-dianilinophthalimide
sodium
nocodazole
iloprost
5666823
It contains 164 (1079 v2) different drugs
including most FDA approved drugs.
Connectivity Map
The CMAP can significantly speed up the rate
of drug discovery, and find new uses for old
drugs.
The CMAP is housed at the Broad Institute in
Cambridge MA and is publicly available at
www.broad.mit.edu/cmap/
The Broad Institute is a research
collaboration involving the MIT and
Harvard academic and medical
communities.
It was founded in 2003 through the
far-sighted generosity of philanthropists
Eli and Edythe Broad.
The Institute is organized around
interdisciplinary Scientific Programs and
Scientific Platforms to enable scientists to
collaborate on important projects with
the objective of bringing the power of
genomics to medicine.
The CMAP Team
People who have participated in the project include Irene Blat,
Jean-Philippe Brunet, Steve Carr, Jon Clardy, Paul Clemons, Emily
Crawford, Stephen Haggarty, William Hahn, Jim Lerner, Joshua
Modell, David Peck, Xiao Peng, Srilakshmi Raj, Michael Reich,
Kenneth Ross, Aravind Subramanian, David Twomey, Ru Wei and
Matthew Wrobel. Justin Lamb and Todd Golub (shown in photo
below) lead the CMAP team.
Photo courtesy of Justin Ide/Harvard News Office
CMAP reference: Lamb et al. The Connectivity Map: Using Gene-Expression
Signatures to Connect Small Molecules, Genes, and Disease. Science 313 (5795),
1929 (2006).
What type of database is the CMAP?
Web interface
Java Servlets
The CMAP v1 runs on an Oracle Database
10g Enterprise Edition Release 10.1 - 64bit
with partitioning, OLAP and data mining
options.
It captures information about the
experimental process that generates the data
CMAP
It is implemented as a Java/servlet
application with a web interface.
It has more than 6,000 registered users.
It stores the drug and disease signatures
plus entire results sets for each user/query
that can be retrieved at later times.
The Connectivity Map has been useful in identifying
novel therapeutics in leukemia and prostate cancer
Cancer Cell, Volume 10, October 2006
Two articles in this issue of Cancer Cell show the use of
the CMAP in leukemia and prostate cancer research to
predict anticancer activity that was subsequently
demonstrated in additional experiments on model systems.
Later in this presentation we will see the leukemia
example in detail …
Part II:
How does the CMAP work?
How was the CMAP created?
First, 164 (1079 v2) distinct drugs were
selected and used in several doses and times
for a total of 564 (5774 v2) instances…
…on 4 different types of cell lines (breast, prostate, leukemia and melanoma)…
…then those were profiled using Affymetrix arrays of DNA
micro-chips and a scanner…
…then a computer program identified drug signatures.
The drug signatures are ordered lists of genes: genes that go up and genes that go down…
…they were finally stored in the database (CMAP).
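As a rough illustration of what an "ordered list of genes" could look like, the sketch below ranks genes by the change in mean expression between drug-treated and control samples. The function name, gene ids, expression values and the mean-difference measure are all assumptions made for this example; the slides do not say which measure the CMAP program actually uses.

```python
from statistics import mean


def drug_signature(treated, control):
    """Order gene ids from most up-regulated to most down-regulated.

    treated, control: dicts mapping gene id -> list of expression values.
    The mean difference is an illustrative stand-in, not the CMAP's measure.
    """
    change = {gene: mean(treated[gene]) - mean(control[gene]) for gene in treated}
    return sorted(change, key=change.get, reverse=True)


# Hypothetical expression values for four genes (two replicates each).
treated = {"G1": [9.0, 9.4], "G2": [5.0, 5.2], "G3": [2.0, 2.2], "G4": [7.0, 7.1]}
control = {"G1": [6.0, 6.2], "G2": [5.1, 5.1], "G3": [4.0, 4.4], "G4": [6.5, 6.4]}

signature = drug_signature(treated, control)
print(signature)
```

Genes at the start of the list "go up" under the drug and genes at the end "go down"; one such list would be stored per drug instance.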
How is the CMAP queried?
Starting from two patient populations, A and B
(e.g. Disease and Normal)…
…samples are extracted and
profiled using Affymetrix arrays
of DNA micro-chips and a scanner…
…a computer program defines
the disease signature: genes that go up and genes that go down…
...and the disease signature itself becomes the query to the CMAP.
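One possible way to turn two groups of profiles into a query is sketched below: each gene is scored with a signal-to-noise statistic, and the top and bottom genes become the "up" and "down" parts of the signature. The choice of statistic, the function names, the cut-offs and the data are assumptions for illustration only; the slides do not describe how the program defines the signature.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    # (mean_A - mean_B) / (sd_A + sd_B): one common class-comparison score.
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def disease_signature(disease, normal, n_up=2, n_down=2):
    """Return (up, down): genes most increased / most decreased in disease."""
    score = {g: signal_to_noise(disease[g], normal[g]) for g in disease}
    ranked = sorted(score, key=score.get, reverse=True)
    up = [g for g in ranked[:n_up] if score[g] > 0]
    down = [g for g in reversed(ranked[-n_down:]) if score[g] < 0]
    return up, down


# Hypothetical expression values: three samples per population.
disease = {
    "G1": [8.0, 8.4, 8.2], "G2": [3.0, 3.2, 3.1], "G3": [5.0, 5.1, 4.9],
    "G4": [6.9, 7.1, 7.0], "G5": [2.0, 2.3, 2.1],
}
normal = {
    "G1": [5.0, 5.2, 5.1], "G2": [6.0, 6.3, 6.1], "G3": [5.0, 4.9, 5.1],
    "G4": [6.0, 6.2, 6.1], "G5": [4.0, 4.1, 4.3],
}

up, down = disease_signature(disease, normal)
print("up:", up, "down:", down)
```

The two short lists, rather than the full expression profiles, are what would be submitted as the query.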
How to match diseases to drugs?
The Disease X signature (top genes up and top genes down) is matched against all the drugs:
564 (5774 v2) drug instances, each an ordered list of ~22,000 genes.
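One Example in Detail…
The disease signature (e.g. 13 genes: 7 up and 6 down, from populations A and B) is matched
against all the drugs by using a statistical test.
Each of the 564 (5774 v2) drug instances is an ordered list of ~22,000 genes on which
every signature gene is marked as a gene up or a gene down.
The resulting matches range from strong positive and weak positive, through null,
to weak negative and strong negative.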
Notice that the CMAP queries are not standard information
retrieval queries such as:
SELECT <...> FROM CMAP <...>
because the actual link between drugs and disease does
not exist until the query is made!
The match between the disease and the drug signatures is
computed using a statistical test that compares the gene
orderings of both signatures and computes a similarity
score.
Let's see how it works…
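CMAP queries use a Kolmogorov-Smirnov statistical test
Drug x's effect on genes orders all the genes from down to up. The disease signature is compared against that ordering:
Are the genes in the up signature enriched on the up side?
Are the genes in the down signature enriched on the down side?
More formally, for a signature of t genes, where
tup = size of up signature
tdown = size of down signature
n = number of genes
V(j) = position of the j-th signature gene in drug x's ordered gene list (not defined on the slide; this follows Lamb et al. 2006)
$$a = \max_{j=1}^{t}\left[\frac{j}{t} - \frac{V(j)}{n}\right], \qquad b = \max_{j=1}^{t}\left[\frac{V(j)}{n} - \frac{j-1}{t}\right]$$
Kup is computed with t = tup and Kdown with t = tdown:
$$K_{up},\ K_{down} = \begin{cases} a & \text{if } a > b \\ -b & \text{if } b > a \end{cases}$$
Connectivity score:
$$S_x = \begin{cases} 0 & \text{if } \operatorname{sign}(K_{up}) = \operatorname{sign}(K_{down}) \\ K_{up} - K_{down} & \text{otherwise} \end{cases}$$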
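The scoring rule can be sketched in a few lines of Python. This is a minimal illustration rather than the CMAP implementation. It assumes that V(j) is the 1-based position of the j-th signature gene in the drug's ordered list, that position 1 is the most up-regulated gene, and that a tie between a and b (which the slide leaves undefined) falls to the -b branch. The function names and the toy gene list are made up for the example.

```python
def ks_score(tags, ranked_genes):
    """Signed Kolmogorov-Smirnov-style enrichment of `tags` in `ranked_genes`."""
    n = len(ranked_genes)
    # Assumption: position 1 is the most up-regulated gene.
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    v = sorted(position[g] for g in tags)  # V(1) <= V(2) <= ... <= V(t)
    t = len(v)
    # j below is zero-based, so (j + 1) plays the role of j in the formulas.
    a = max((j + 1) / t - v[j] / n for j in range(t))
    b = max(v[j] / n - j / t for j in range(t))
    return a if a > b else -b  # ties fall to -b (undefined on the slide)


def connectivity_score(up_tags, down_tags, ranked_genes):
    k_up = ks_score(up_tags, ranked_genes)
    k_down = ks_score(down_tags, ranked_genes)
    if (k_up > 0) == (k_down > 0):  # same sign: no consistent connection
        return 0.0
    return k_up - k_down


# Toy drug instance: ten genes, G1 most up-regulated, G10 most down-regulated.
ranked = [f"G{i}" for i in range(1, 11)]
up, down = ["G1", "G2"], ["G9", "G10"]

score = connectivity_score(up, down, ranked)                  # drug mimics the signature
reversed_score = connectivity_score(up, down, ranked[::-1])   # drug reverses it
same_side = connectivity_score(["G1", "G2"], ["G3", "G4"], ranked)
print(score, reversed_score, same_side)
```

A positive score means the drug moves genes in the same direction as the query signature, a negative score means it moves them the opposite way, and a zero means the up and down parts disagree.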
CMAP queries use a Kolmogorov-Smirnov statistical test
It can be computed entirely inside the RDBMS:
SELECT stats_ks_test(drug_instance, disease_sig,
'STATISTIC') ks_statistic,
stats_ks_test(drug_instance, disease_sig) p_value
FROM cmap.drugs c, cmap.sig s
WHERE c.gene_id = s.gene_id;
Finally the top scoring drugs are selected
Connectivity scores for the 564 drug instances: S1, S2, S3, …, S564
Drugs are sorted by their
connectivity scores, and hits are
found by the pattern of
dose/time instances of the
same drug.
A (second) test is used to
assess the statistical
significance of each hit.
For example:
Drugs:     Sx      Sy      Sz
Result:    hit +   miss    hit -
p-values:  0.01    0.3     0.02
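The slides do not name the second test. As a stand-in, the sketch below groups instance scores by drug, sorts drugs by mean score, and estimates a p-value with a simple permutation test: how often does a random set of instances of the same size score at least as extremely? The drug names, the scores, the permutation approach and the function names are all assumptions for illustration.

```python
import random
from statistics import mean


def permutation_p_value(drug_scores, all_scores, n_perm=2000, seed=0):
    """Share of random instance sets whose |mean score| >= the drug's."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(drug_scores))
    k = len(drug_scores)
    hits = sum(
        abs(mean(rng.sample(all_scores, k))) >= observed for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one keeps the estimate above zero


# Hypothetical connectivity scores: three instances each for two drugs,
# plus 47 unrelated instances scattered around zero.
by_drug = {
    "drug_x": [0.90, 0.80, 0.85],
    "drug_y": [0.01, -0.02, 0.03],
}
background = [((-1) ** i) * 0.01 * i for i in range(47)]
all_scores = background + by_drug["drug_x"] + by_drug["drug_y"]

ranking = sorted(by_drug, key=lambda d: mean(by_drug[d]), reverse=True)
p_values = {d: permutation_p_value(s, all_scores) for d, s in by_drug.items()}
print(ranking, p_values)
```

In this toy data drug_x would count as a hit because all of its instances score consistently high, while drug_y is indistinguishable from the background.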
Part III:
The CMAP
in action
Finding a way around
glucocorticoid resistance in
leukemia
Cancer is the most common cause of death
from disease in children in developed countries,
and the most frequent childhood malignancy is
acute lymphoblastic leukemia (ALL).
Glucocorticoids such as dexamethasone have been an important
component of the treatment of acute
lymphoblastic leukemia (ALL) for more
than 50 years. However, it is still unknown
what specific factors affect sensitivity and
resistance to these drugs.
With current treatment regimens, the
majority of patients will be long-term
survivors; however, almost one-third of
ALL patients relapse, and most of those die
due to the development of drug
resistance.
The development of resistance to chemotherapy
agents poses a major clinical problem. Many cells
develop resistance not only to the selecting agent
but also exhibit cross-resistance to other
structurally unrelated compounds.
Looking for better ways to deal with this problem, researchers
from a multi-institutional collaboration led by Scott Armstrong
created a 100-gene signature of glucocorticoid
resistance (Wei et al., Cancer Cell 10(4), 331).
[Figure: gene signature of glucocorticoid-sensitive vs. glucocorticoid-resistant samples]
The CMAP shows that the drug sirolimus,
also known as rapamycin, is a top match
Rapamycin instances
Multiple instances of
rapamycin score high
when the leukemia
resistance/sensitivity
signature is used to
query the CMAP.
Good hit, but what is rapamycin…?
It is a natural product from Rapa Nui,
also known as Easter Island.
It was isolated in the
1960s from a bacterium
and developed into
an antifungal drug.
It was also found to have immunosuppressant
properties and in 1999 became an FDA approved
drug for preventing the rejection of kidney
transplants.
Rapamycin regulates one of
the critical nodes in
mammalian cell circuitry:
the mTOR/Akt pathway.
Following up the CMAP discovery Broad Institute researchers
were able to confirm that rapamycin decreases glucocorticoid
resistance in acute lymphoblastic leukemia cells.
Cell survival (resistance) versus higher glucocorticoid concentration:
Without rapamycin, resistant cells remain resistant.
With rapamycin, resistant cells become sensitive.
Rapamycin is currently the subject of
multiple clinical trials in leukemia and
other cancers.
This and other examples have
demonstrated that the CMAP has real
potential for accelerating drug
discovery.
Could we do it the other way?
Score disease samples using the drug signatures:
the top genes up and top genes down of each of the 564 (5774 v2) drug instances
are matched against the Disease X signature, an ordered list of ~22,000 genes.
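CMAP queries “in reverse”
The disease effect on genes orders all the genes from down to up for Disease x. Each drug signature is compared against that ordering:
Are the genes in the up drug signature enriched on the up side?
Are the genes in the down drug signature enriched on the down side?
More formally, for a drug signature of t genes, where
tup = size of up signature
tdown = size of down signature
n = number of genes
V(j) = position of the j-th drug signature gene in the disease's ordered gene list (not defined on the slide; this follows Lamb et al. 2006)
$$a = \max_{j=1}^{t}\left[\frac{j}{t} - \frac{V(j)}{n}\right], \qquad b = \max_{j=1}^{t}\left[\frac{V(j)}{n} - \frac{j-1}{t}\right]$$
Kup is computed with t = tup and Kdown with t = tdown:
$$K_{up},\ K_{down} = \begin{cases} a & \text{if } a > b \\ -b & \text{if } b > a \end{cases}$$
Connectivity score:
$$S_x = \begin{cases} 0 & \text{if } \operatorname{sign}(K_{up}) = \operatorname{sign}(K_{down}) \\ K_{up} - K_{down} & \text{otherwise} \end{cases}$$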
[Figure: disease samples, grouped as Sensitive and Resistant (sample labels ending in _C_S and _C_R), scored with signatures. Signature labels include CML Armstrong et al 2006, Class 1, CMAP v2 AR, Peng et al, RAPA EARLY, RAPA LATE, RAPA LATE II, RAPA EARLY III and RAPA LATE III. P-value = 0.001]
Part IV:
Demonstration of the
CMAP web interface
The Leukemia
example
Part V:
Future plans and
conclusions
Future plans
The CMAP will represent about 1,000 drugs in its next
release (v2). This is already a significant fraction of all FDA
approved drugs.
It will eventually include several additional libraries
of experimental drugs and small molecules.
It will also contain other types of “perturbagens”
such as those produced by silencing every gene in
the genome.
Conclusions
The CMAP has demonstrated the potential of using gene
expression profiles of particular disease states as a tool for drug
screening.
The CMAP allows rapid in silico assessment of molecules and
their ability to reverse signatures associated with specific disease
states or drug resistance profiles. It is a virtual biomedical
laboratory.
The CMAP is a very useful tool to rapidly assess the potential
activity of thousands of drugs and is an approach
complementary and synergistic with other drug screening
methods.
Postscript:
What can we learn from
the CMAP from a
database perspective?
The CMAP represents a type of database where the process
of information retrieval is deeply integrated with an
analytical component, in the case of the CMAP, a
statistical test.
This synergy between databases and analytics is also
becoming more common in other databases where
analytics are at the core of retrieval operations that involve
pattern matching, clustering, regression, forecasting or
prediction.
Since the late 1990s Oracle has incorporated analytical
functions, e.g. statistics and data mining, in the core stack of
database technology. The challenge is now how to combine
and integrate them with more traditional information
retrieval patterns.
At present with a few hundred drugs the CMAP does not push
database technology to the limit… however, once it contains a
few hundred thousand perturbagens and many more online
users worldwide it will.
Using advanced analytic database technology the entire
connectivity score of the CMAP could be computed inside the
database, for example by multiple calls to Oracle’s SQL
Kolmogorov-Smirnov test:
SELECT stats_ks_test(signature, drug_order,
'STATISTIC') ks_statistic
FROM cmap_drugs;
This is a current subject of research at Oracle Data Mining
technologies.
Comments to the author can be sent to tamayo@broad.mit.edu.
For additional information about the CMAP or the Broad Institute
please contact Nicole Davis (ndavis@broad.mit.edu).
For additional information about Oracle Corporation and Oracle
products please contact Charlie Berger (charlie.berger@oracle.com).
The End
Acknowledgements:
Eli and Edythe Broad Institute: Bang Wong, Matt Wrobel, Nicole Davis, Justin Lamb,
Todd Golub and Jill Mesirov.
Oracle Corporation: Jacek Myczkowski, Charlie Berger, Jodi Greenberg and Paul Salinger.
Cell animations from “Inner Life of the Cell”, provided by Robert Lue and Alain Viel,
Harvard University (c) (2007) and created by Alain Viel and Robert Lue in collaboration
with XVIVO, LLC. John Liebler, Lead Animator, and under generous support from the
Howard Hughes Medical Institute's Undergraduate Science Education Program.
Music by:
Part I: “A Dream in the Evening,” DJ Saryon.
Part II: “L'arrivée,” Ehma - La plage de Blâne-est. Music from the “Inner Life of the Cell”.
Part III: “Medieval Acoustic,” Vincent Bernay - Etincelle
Part IV: “A Dream in the Evening,” DJ Saryon.
Part V: Music from the “Inner Life of the Cell”.
Postscript: “Spoir,” Vincent Bernay - Etincelle
Public domain images from wikipedia.org
Art work by Daniel Kohn (www.kohnworkshop.com)