Library of Integrated Network-based Cellular Signatures (LINCS)

advertisement
Library of Integrated Network -based
Cellular Signatures
(LINCS)
September 20, 2013
LINCS concept
cell types
• perturbations scalable to genome
• high information content read-outs (e.g. gene expression)
• inexpensive
• mechanism to query database
Look-up table of cellular activity
COMMUNITY
QUERIES
PLATFORMINDEPENDENT
GENOME SCALE
GENETIC
PHARMACOLOGIC
perturbations
database
MODERATE
COMPLEXITY
10’S
COMPLEX
cell types
read out
The LINCS Network (NIH)
Data Production/Analysis Centers
Broad Institute
Harvard Medical School
Computational and
Technology Development Centers
Arizona State
Broad Institute (Jake Jaffe)
Columbia
U. Cincinnati
Miami School of Medicine
Wake Forest
Yale
External Collaborations
•
•
•
•
•
•
•
•
•
•
•
•
•
Snyder Lab, Sanford-Burnham Medical
Research Institute
FDA
GTEx
ENCODE/Epigenomics
Rao Lab, NIH CRM:
Scadden Lab, Massachusetts General
Hospital
McCray Lab, University of Iowa
Loring Lab, Scripps Research Institute
Edenberg Lab, Indiana University
Spria Lab, Boston University
Pandolfi Lab, BIDMC
Chen Lab, NHLBI
Kotton Lab, Boston University
Connectivity Map
diseases
genes
drugs
mRNA Expression Database
453 Affymetrix profiles
164 drugs
> 16,000 users
916 citations
Lamb et al, Science (2006)
CMAP/LINCS is an approach to
functional annotation
perturbagens
cell types
 CMap is limited by profiling cost
 low-cost, high-throughput method would enable…
primary screening libraries
drug-like, non-drug-like, natural products
genomic perturbagens
shRNA, ORF, variants (natural + synthetic)
cellular contexts
tissues, types, culture conditions, genetics
treatment parameters
concentrations, durations, combinations
• re-think: gene content × labeling × detection
observation: gene expression is correlated
genes
samples
Reduced Representation of Transcriptome
reduced
representation
transcriptome
genome-wide
expression profile
computational
inference model
‘landmarks’
100
60
simulation
40
20
100
300
500
700
1000
1000
1500
2000
5000
10000
14812
0
22283
~ 100,000 profiles
% connections
80
80%
number of landmarks measured
1000-plex Luminex bead profiling
AAAA 3'
5'
 RT
3'
5'
5'-PO4
|
TTTT
3'
 ligation
Luminex Beads
(500 colors,
2 genes/color)
5'
5'
 PCR
 hybridization
001
Reagent cost:
$5/sample
“L1000” expression profiling
GeneChip
L1000
measured
content
1
transcripts
inferred
22,000
1
1,000
transcripts
22,000
technology
microarray
Luminex beads
throughput
3× 96 / week
200× 384 / week
$500
$5
unit cost
(reagent)
LINCS Dataset
Current LINCS Dataset
small-molecules
1,000 landmark genes
genomic perturbagens
1,209,824 profiles
21,000 inferred genes
5,178 compounds
15 cell types
• Banked primary cell types
• Cancer cell lines
• Primary hTERTimmortalized
• Patient-derived iPS cells
• 5 community nominated
• 1,300 off-patent FDA-approved drugs
• 700 bioactive tool compounds
• 2,000+ screening hits (MLPCN + others)
3,712 genes (shRNA + cDNA)
•targets/pathways of FDA-approved drugs
(n=900)
•candidate disease genes (n=600)
Coming soon (in beta)
U54 Grant: Progress on Data Access
desc
level 1
Raw data
level 2
Normalized
dataset
level 3
Signatures
(differentially
expressed
genes)
level 4
Queries
format
Plate folders with
Matrix: GCTX
1. mongo DB
2. Matrix: GCTX
JSON objects
availability
common use cases
3,812 folders
new computational approaches to
data pre-processing and
normalization
1.2M+ profiles
deriving signatures
other kinds of analysis
383,788 sigatures
(beta release)
Q1 2014
High-level integration with
analytics and websites e.g
Genes that are modulated by TP53
Genes most correlated to the Akt1
pathway
Genes connected to an external
query signature
findings
1) Large-scale gene-expression analysis
2) Analysis of L1000 shRNA signatures
# of profiles
1400000"
1200000"
1000000"
800000"
600000"
400000"
200000"
0"
CMap"v1"
CMap"v2"
CMap"v3"
Data quality:
correlation between biological replicates
matching cell states
1) define a ‘query’
the set of genes up- and down- regulated in a cellular state of interest
2) assess strength of the query in the profile of all perturbagens in DB
not connected
up-regulated
down-regulated
cumulative score
cumulative score
connected
genes (thousands)
genes (thousands)
3) rank order perturbagens by connectivity strength
rank
perturbagen
1
2
3
.
.
.
.
.
997
998
999
drug Y
drug e
gene S
…
gene n
drug I
drug L
…
drug N
gene E
drug G
conn score
1
0.993
0.791
.
0
0
0
.
-0.877
-0.945
-1
positive connectivity
no connectivity
negative connectivity
reversing drug resistance
signature: glucocorticoid resistant acute lymphoblastic leukemia
resistant sensitive
resistant sensitive
50 ‘sensitive’ and 50 ‘resistant’ markers
(David Twomey and Scott Armstrong)
1
rank
perturbagen
5
6
27
35-sirolimus
42-sirolimus
26-sirolimus
cell
HL60
ssMCF7
MCF7
score
0.804
0.789
0.544
sirolimus
hypothesis:
sirolimus induces
glucocorticoid sensitivity
464
The 1% challenge:
the “tail” of current data is > ENTIRE previous dataset
1400000"
1200000"
1000000"
800000"
600000"
400000"
200000"
0"
CMap"v1"
CMap"v2"
CMap"v3"
query: histone deacetylase inhibitors (Glaser et al 2003)
0.5%
Rank
1
Compound ID
BRD-K69840642
Compound Description
ISOX
Connectivity Score
0.995
2
BRD-K52522949
NCH-51
0.994
3
BRD-K12867552
THM-I-94
0.993
4
BRD-K64606589
apicidin
0.992
5
BRD-K56957086
dacinostat
0.99
6
BRD-A19037878
trichostatin-a
0.989
7
BRD-A94377914
merck-ketone
0.987
8
BRD-K17743125
belinostat
0.987
9
BRD-K75081836
BRD-K75081836
0.986
10
BRD-K81418486
vorinostat
0.986
11
BRD-K68202742
trichostatin-a
0.986
12
BRD-K22503835
scriptaid
0.986
13
BRD-K02130563
panobinostat
0.985
14
BRD-A39646320
HC-toxin
0.983
15
BRD-K13810148
givinostat
0.98
16
BRD-K85493820
KM-00927
0.977
17
BRD-K11663430
pyroxamide
0.977
18
BRD-K74761218
WT-171
0.975
19
BRD-K74733595
APHA-compound-8
0.97
20
BRD-A19248578
latrunculin-b
0.965
21
BRD-K49010888
BRD-K49010888
0.962
22
BRD-K53308430
SA-1017940
0.951
23
BRD-K64890080
BI-2536
0.95
24
BRD-K00627859
tubastatin-a
0.947
25
BRD-K31542390
mycophenolic-acid
0.946
Page 1 / 200
query: compound identified to induce the lysosomal apoptosis pathway (D’Arcy et al
Nature Medicine 2012)
0.5%
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Compound ID
BRD-K78659596
BRD-K60230970
BRD-K88510285
BRD-A55484088
BRD-A18725729
BRD-K74402642
BRD-K50234570
BRD-A58924247
BRD-A39093044
BRD-A72180425
BRD-K50691590
BRD-K19499941
BRD-K09854848
BRD-A76490030
BRD-A36275421
BRD-K28366633
BRD-A11007541
BRD-K37392901
BRD-K66884694
BRD-A83124583
BRD-K10882151
BRD-K44366801
BRD-K61033289
BRD-K07303502
BRD-K02822062
Compound Description
MLN2238
MG-132
bortezomib
BNTX
BRD-A18725729
NSC-632839
EMF-bca1-16
BRD-A58924247
K784-3187
K784-3188
bortezomib
BRD-K19499941
MD-II-008-P
K784-3131
MW-RAS12
BRD-K28366633
BCI-hydrochloride
NSC-632839
BRD-K66884694
EMF-sumo1-39
BO2-inhibits-RAD51
BRD-K44366801
15-delta-prostaglandin-j2
arachidonyl-trifluoro-methane
CT-200783
Connectivity Score
0.998
0.998
0.996
0.993
0.993
0.992
0.992
0.992
0.992
0.992
0.992
0.99
0.988
0.988
0.987
0.987
0.987
0.987
0.987
0.986
0.986
0.985
0.985
0.984
0.984
Page 1 / 200
query: HUVEC cells treated with pitavastatin (cell line not in panel)
0.5%
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Compound ID
BRD-A81772229
BRD-A70155556
BRD-U88459701
BRD-A18763547
BRD-K22134346
BRD-K12994359
BRD-K09416995
BRD-K34581968
BRD-K94176593
BRD-K20285085
BRD-K94441233
BRD-K95785537
BRD-K53414658
BRD-K83213911
BRD-K85606544
BRD-A19248578
BRD-K68588778
BRD-K06750613
BRD-A11678676
BRD-K05653692
BRD-K72420232
BRD-K19796430
BRD-K78513633
BRD-K03618428
BRD-K37940862
Compound Description
simvastatin
lovastatin
atorvastatin
BAX-channel-blocker
simvastatin
valdecoxib
lovastatin
BMS-536924
TWS-119
fostamatinib
mevastatin
PP-2
tivozanib
PF-750
neratinib
latrunculin-b
BRD-K68588778
GSK-1059615
wortmannin
DL-PDMP
WZ-4002
erismodegib
lonidamine
PP-110
BRD-K37940862
Connectivity Score
0.996
0.994
0.991
0.988
0.985
0.983
0.981
0.979
0.975
0.973
0.972
0.971
0.97
0.968
0.968
0.967
0.966
0.966
0.964
0.963
0.961
0.961
0.961
0.961
0.961
Page 1 / 200
query: imatinib-resistant chronic myeloid leukemia (Frank et al Leukemia 2006)
0.5%
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
Compound ID
BRD-K12502280
BRD-K94176593
BRD-K20285085
BRD-K49328571
BRD-K12867552
BRD-K85493820
BRD-A02180903
BRD-K91701654
BRD-K95785537
BRD-K53414658
BRD-A50454580
BRD-K73789395
BRD-K17743125
BRD-K46419649
BRD-K09499853
BRD-K64890080
BRD-K70914287
BRD-K50168500
BRD-U43867373
BRD-U25771771
BRD-K34581968
BRD-K18787491
BRD-K56343971
BRD-K01877528
BRD-K66175015
Compound Description
TG-101348
TWS-119
fostamatinib
dasatinib
THM-I-94
KM-00927
betamethasone
U-0126
PP-2
tivozanib
PD-0325901
ZM-336372
belinostat
U0126
KU-0060648
BI-2536
BIBX-1382
canertinib
WH-4025
WZ-4-145
BMS-536924
U-0126
vemurafenib
TL-HRAS-61
afatinib
Connectivity Score
0.992
0.987
0.975
0.969
0.969
0.969
0.969
0.966
0.965
0.964
0.96
0.96
0.952
0.95
0.949
0.947
0.947
0.946
0.946
0.945
0.943
0.942
0.941
0.937
0.933
Page 1 / 200
findings
1) Large-scale gene-expression analysis
2) Analysis of L1000 shRNA signatures
Current CMap Dataset
biological goal
1.
2.
3.
4.
5.
6.
Connections b/w genes and drugs
GWAS gene lists to pathways
Causal mutation to therapeutic leads
Discovering new cancer pathways
MoA of novel small-molecules
Biological novelty biasing
LINCS as a starting point for
functional follow-up
Core Signature DB
Core Gene signatures from KD (n=1387)
Genes (n=1387)
22268 Features
Genes (n=1387)
Similarity Metric
Signature Diversity
Mining the Similarity Matrix
• Unsupervised
• Global Patterns
• Supervised
•
263 Components explain 80% of the variance
Gene->[Gene,Pathway,Compound]
Global Views of Connections
Connections per gene
Most connected genes
49% of genes have at least 1
connection > 0.4
PC3 cell line
querying LINCS for connections
JAK2 knockdown connects to STAT1 signature
FOS knockdown connects to JUN signature
Cell cycle genes connected (CCND1, CDK2, CDK4, CDK6, CCNE1, E2F1)
ER knockdown connected to ER antagonists & inversely connected to ER
agonists
• JAK2 over-expression signature inversely to JAK2 inhibitor (lestaurtinib)
• HDAC knock-downs connected to HDAC inhibitors (vorinostat, others)
• NRF2 over-expression signature inversely connected to curcumin
• WNT1 gene connections: TCF7L1, GSK3B, CSNK2A2, PRAKACA, SMAD3
…
•
•
•
•
Integrating queries across members of a pathway
AKT1
genes connections
AKT3, FOXO1,
PDPK1, PHLPP1,
PIK3CB
Top 10 small-molecule connections
allele classification
• genes implicated by GWAS
39 genes
associated with T2D
S. Jacobs &
D. Altshuler
– can be many hundreds, most unannotated
• create profiles of ablation (shRNA) in
suitable cells by L1000
– universal functional bioassay
• cluster into “complementation groups”
– assign genes to groups, groups to pathways,
pathways to disease
Target ID
Drug
signature in
MCF7
All MCF7 CGS
Query
Molecular target of Drug A
wtcs score rank
Similar
Dissimilar
An Example where integrating across
many shRNAs improves Connections
Each dot is a dose / timepoint of rapamycin
MTOR shRNA 1
MTOR shRNA 2
MTOR shRNA 3
MTOR shRNA 4
MTOR shRNA 5
MTOR shRNA 6
MTOR shRNA 7
MTOR shRNA 8
MTOR shRNA 9
MTOR shRNA 10
MTOR shRNA 11
MTOR shRNA 12
MTOR shRNA 13
MTOR Consensus
Gene Signature
1
1000
2000
3000
4000
5000
Connectivity Rank of Small Molecules
Query with Vemerafinib,
highlight BRAF shRNAs
Cell line
Each dot is an individual shRNA targeting BRAF
Positive
Correlation
Rank of shRNA (%)
Negative
Correlation
MTOR connects to BEZ235
Rank
1
2
3
4
5
6
7
8
9
10
Rank
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
25
Compound ID
BRD-K12184916
BRD-K69932463
BRD-K67566344
BRD-K67868012
BRD-K77008974
BRD-K94294671
BRD-A45498368
BRD-K13049116
BRD-K87343924
BRD-K67075780
CGS ID
CGS001-2475
CGS001-4609
CGS001-57521
CGS001-2623
CGS001-5245
CGS001-2581
CGS001-9184
CGS001-360023
CGS001-4860
CGS001-11164
CGS001-89849
CGS001-527
CGS001-2065
CGS001-3845
CGS001-4486
CGS001-3479
CGS001-207
CGS001-8607
CGS001-54106
CGS001-5045
CGS001-9533
Compound Description
NVP-BEZ235
AZD-8055
KU-0063794
PI-103
WYE-354
OSI-027
WYE-125132
BMS-754807
wortmannin
TGX-115
Connectivity Score
1
1
1
0.999
0.998
0.998
0.998
0.997
0.996
0.996
Gene Symbol
MTOR
MYC
RPTOR
GATA1
PHB
GALC
BUB3
ZBTB41
PNP
NUDT5
ATG16L2
ATP6V0C
ERBB3
KRAS
MST1R
IGF1
AKT1
RUVBL1
TLR9
FURIN
POLR1C
Connectivity Score
0.999
0.99
0.976
0.972
0.969
0.967
0.965
0.965
0.965
0.964
0.964
0.964
0.961
0.954
0.954
0.951
0.95
0.948
0.948
0.947
0.944
BEZ235: a dual ATP-competitive PI3K and mTOR inhibitor
Dose dependent
connectivity
PIK3CA connects to BEZ235
Current list of significant drug-CGS connectivities
span multiple MoA’s
losartan
MK-2206
10-DEBC
MK-2206
MK-2206
10-DEBC
brefeldin A
gossypol
YM-155
ZM336372
LFM-A13
N9-isopropylolomoucine
BML-259
fumonisin B1
etomoxir
PNU-74654
cyanoquinoline 11
neratinib
tyrphostin AG-1478
AGTR1
AKT1
AKT1
AKT2
AKT3
AKT3
ARF1
BCL2
BIRC5
BRAF
BTK
CDK1
CDK2
CERS4
CPT1A
CTNNB1
EGFR
EGFR
EGFR
tamoxifen
PF-3845
ESR1
FAAH
Merck60
ISOX
2-bromopyruvate
lovastatin acid
linsitinib
selumetinib
Compound 11e
sirolimus
BEZ235
PIK-90
PP-30
parthenolide
triptolide
dexamethasone
olaparib
olaparib
veliparib
GSK-2334470
BX-795
AZD-7545
HDAC1
HDAC6
HK1
HMGCR
IGF1R
MAP2K1
MAPK1
MTOR
MTOR
MTOR
MTOR
NFKB1
NFKB2
NR3C1
PARP1
PARP2
PARP2
PDK1
PDK1
PDK2
TGX-115
BEZ235
PIK-90
Compound 110
GW-843682X
LFM-A13
HA-1004
KU 0060648
AM-580
gemcitabine
fatostatin
RITA
nutlin-3
pifithrin-alpha
SJ-172550
gemcitabine
MK 1775
PIK3C2A
PIK3CA
PIK3CA
PIK3CA
PLK1
PLK1
PRKACB
PRKDC
RARA
RRM1
SREBF2
TP53
TP53
TP53
TP53
TYMS
WEE1
Goal: Given a chemical library:
• identify the bioactive subset of a library
• identify unique bioactivity
Gene-expression as a universal measure of bioactivity
If we see no robust gene expression consequence whatsoever
across a diverse panel of cell types, then it's likely that the
compound has no bioactivity.
L1000 as a sensor of bioactivity
signature strength (S)
S-C plot
signature robustness across replicates (C)
dose titration
active analogs
(high S-C)
inactive analogs
(low S-C)
biological novelty biasing of chemical libraries
• global bioactivity detection using L1000 profiles
– number and magnitude of expression changes, and robustness
• calibrate with 350 known bioactives across 47 cell lines
– median sensitivity of individual cell lines is 42% (90% specificity)
– rationally-designed panel of 7 cell lines achieves 95% sensitivity
• qualification, de-duplication, and novelty biasing
– consolidate and subset libraries based on function
signal strength
20
chemical library
n = 9,875
active
n = 487 (5%)
known MoA
n = 435 (4.5%)
6
0
-1
0
reproducibility
1
novel
n = 52 (0.5%)
de-duplicated
n = 30 (0.3%)
Broad LINCS U54
1. Data Generation: 1.2M+ profiles released to LINCS
2. Data Access: Multiple levels of data matrices, cloudcompute beta released
3. Biologist-friendly web user interfaces
4. Emerging scientific findings
1. Causal mutation to therapeutic leads
2. GWAS gene lists to pathways
3. Discovering new cancer pathways
4. Connecting small-molecules to biology
5. Biological novelty biasing of chemical libraries
CMap Analytical
CMap Data Generation
Rajiv Narayan
Joshua Gould
Corey Flynn
Ted Natoli
David Wadden
Ian Smith
Roger Hu
Larson Hogstrom
Peyton Greenside
David Peck
John Davis
Roger Cornell
Xiaohua Wu
Xiaodong Lu
Melanie Donahue
Broad Platforms
RNAi platform
Chemical Biology
TD/TS
Broad Scientists
Jesse Boehm
Bang Wong
Federica Piccioni
John Doench
David Root
Suzanne Jacobs
Paul Clemons
Stuart Schreiber
Aly Shamji
Todd Golub
Download