Library of Integrated Network -based Cellular Signatures (LINCS) September 20, 2013 LINCS concept cell types • perturbations scalable to genome • high information content read-outs (e.g. gene expression) • inexpensive • mechanism to query database Look-up table of cellular activity COMMUNITY QUERIES PLATFORMINDEPENDENT GENOME SCALE GENETIC PHARMACOLOGIC perturbations database MODERATE COMPLEXITY 10’S COMPLEX cell types read out The LINCS Network (NIH) Data Production/Analysis Centers Broad Institute Harvard Medical School Computational and Technology Development Centers Arizona State Broad Institute (Jake Jaffe) Columbia U. Cincinnati Miami School of Medicine Wake Forest Yale External Collaborations • • • • • • • • • • • • • Snyder Lab, Sanford-Burnham Medical Research Institute FDA GTEx ENCODE/Epigenomics Rao Lab, NIH CRM: Scadden Lab, Massachusetts General Hospital McCray Lab, University of Iowa Loring Lab, Scripps Research Institute Edenberg Lab, Indiana University Spria Lab, Boston University Pandolfi Lab, BIDMC Chen Lab, NHLBI Kotton Lab, Boston University Connectivity Map diseases genes drugs mRNA Expression Database 453 Affymetrix profiles 164 drugs > 16,000 users 916 citations Lamb et al, Science (2006) CMAP/LINCS is an approach to functional annotation perturbagens cell types CMap is limited by profiling cost low-cost, high-throughput method would enable… primary screening libraries drug-like, non-drug-like, natural products genomic perturbagens shRNA, ORF, variants (natural + synthetic) cellular contexts tissues, types, culture conditions, genetics treatment parameters concentrations, durations, combinations • re-think: gene content × labeling × detection observation: gene expression is correlated genes samples Reduced Representation of Transcriptome reduced representation transcriptome genome-wide expression profile computational inference model ‘landmarks’ 100 60 simulation 40 20 100 300 500 700 1000 1000 1500 2000 5000 10000 14812 0 22283 ~ 100,000 profiles % connections 80 80% number of landmarks measured 1000-plex Luminex bead profiling AAAA 3' 5' RT 3' 5' 5'-PO4 | TTTT 3' ligation Luminex Beads (500 colors, 2 genes/color) 5' 5' PCR hybridization 001 Reagent cost: $5/sample “L1000” expression profiling GeneChip L1000 measured content 1 transcripts inferred 22,000 1 1,000 transcripts 22,000 technology microarray Luminex beads throughput 3× 96 / week 200× 384 / week $500 $5 unit cost (reagent) LINCS Dataset Current LINCS Dataset small-molecules 1,000 landmark genes genomic perturbagens 1,209,824 profiles 21,000 inferred genes 5,178 compounds 15 cell types • Banked primary cell types • Cancer cell lines • Primary hTERTimmortalized • Patient-derived iPS cells • 5 community nominated • 1,300 off-patent FDA-approved drugs • 700 bioactive tool compounds • 2,000+ screening hits (MLPCN + others) 3,712 genes (shRNA + cDNA) •targets/pathways of FDA-approved drugs (n=900) •candidate disease genes (n=600) Coming soon (in beta) U54 Grant: Progress on Data Access desc level 1 Raw data level 2 Normalized dataset level 3 Signatures (differentially expressed genes) level 4 Queries format Plate folders with Matrix: GCTX 1. mongo DB 2. Matrix: GCTX JSON objects availability common use cases 3,812 folders new computational approaches to data pre-processing and normalization 1.2M+ profiles deriving signatures other kinds of analysis 383,788 sigatures (beta release) Q1 2014 High-level integration with analytics and websites e.g Genes that are modulated by TP53 Genes most correlated to the Akt1 pathway Genes connected to an external query signature findings 1) Large-scale gene-expression analysis 2) Analysis of L1000 shRNA signatures # of profiles 1400000" 1200000" 1000000" 800000" 600000" 400000" 200000" 0" CMap"v1" CMap"v2" CMap"v3" Data quality: correlation between biological replicates matching cell states 1) define a ‘query’ the set of genes up- and down- regulated in a cellular state of interest 2) assess strength of the query in the profile of all perturbagens in DB not connected up-regulated down-regulated cumulative score cumulative score connected genes (thousands) genes (thousands) 3) rank order perturbagens by connectivity strength rank perturbagen 1 2 3 . . . . . 997 998 999 drug Y drug e gene S … gene n drug I drug L … drug N gene E drug G conn score 1 0.993 0.791 . 0 0 0 . -0.877 -0.945 -1 positive connectivity no connectivity negative connectivity reversing drug resistance signature: glucocorticoid resistant acute lymphoblastic leukemia resistant sensitive resistant sensitive 50 ‘sensitive’ and 50 ‘resistant’ markers (David Twomey and Scott Armstrong) 1 rank perturbagen 5 6 27 35-sirolimus 42-sirolimus 26-sirolimus cell HL60 ssMCF7 MCF7 score 0.804 0.789 0.544 sirolimus hypothesis: sirolimus induces glucocorticoid sensitivity 464 The 1% challenge: the “tail” of current data is > ENTIRE previous dataset 1400000" 1200000" 1000000" 800000" 600000" 400000" 200000" 0" CMap"v1" CMap"v2" CMap"v3" query: histone deacetylase inhibitors (Glaser et al 2003) 0.5% Rank 1 Compound ID BRD-K69840642 Compound Description ISOX Connectivity Score 0.995 2 BRD-K52522949 NCH-51 0.994 3 BRD-K12867552 THM-I-94 0.993 4 BRD-K64606589 apicidin 0.992 5 BRD-K56957086 dacinostat 0.99 6 BRD-A19037878 trichostatin-a 0.989 7 BRD-A94377914 merck-ketone 0.987 8 BRD-K17743125 belinostat 0.987 9 BRD-K75081836 BRD-K75081836 0.986 10 BRD-K81418486 vorinostat 0.986 11 BRD-K68202742 trichostatin-a 0.986 12 BRD-K22503835 scriptaid 0.986 13 BRD-K02130563 panobinostat 0.985 14 BRD-A39646320 HC-toxin 0.983 15 BRD-K13810148 givinostat 0.98 16 BRD-K85493820 KM-00927 0.977 17 BRD-K11663430 pyroxamide 0.977 18 BRD-K74761218 WT-171 0.975 19 BRD-K74733595 APHA-compound-8 0.97 20 BRD-A19248578 latrunculin-b 0.965 21 BRD-K49010888 BRD-K49010888 0.962 22 BRD-K53308430 SA-1017940 0.951 23 BRD-K64890080 BI-2536 0.95 24 BRD-K00627859 tubastatin-a 0.947 25 BRD-K31542390 mycophenolic-acid 0.946 Page 1 / 200 query: compound identified to induce the lysosomal apoptosis pathway (D’Arcy et al Nature Medicine 2012) 0.5% Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Compound ID BRD-K78659596 BRD-K60230970 BRD-K88510285 BRD-A55484088 BRD-A18725729 BRD-K74402642 BRD-K50234570 BRD-A58924247 BRD-A39093044 BRD-A72180425 BRD-K50691590 BRD-K19499941 BRD-K09854848 BRD-A76490030 BRD-A36275421 BRD-K28366633 BRD-A11007541 BRD-K37392901 BRD-K66884694 BRD-A83124583 BRD-K10882151 BRD-K44366801 BRD-K61033289 BRD-K07303502 BRD-K02822062 Compound Description MLN2238 MG-132 bortezomib BNTX BRD-A18725729 NSC-632839 EMF-bca1-16 BRD-A58924247 K784-3187 K784-3188 bortezomib BRD-K19499941 MD-II-008-P K784-3131 MW-RAS12 BRD-K28366633 BCI-hydrochloride NSC-632839 BRD-K66884694 EMF-sumo1-39 BO2-inhibits-RAD51 BRD-K44366801 15-delta-prostaglandin-j2 arachidonyl-trifluoro-methane CT-200783 Connectivity Score 0.998 0.998 0.996 0.993 0.993 0.992 0.992 0.992 0.992 0.992 0.992 0.99 0.988 0.988 0.987 0.987 0.987 0.987 0.987 0.986 0.986 0.985 0.985 0.984 0.984 Page 1 / 200 query: HUVEC cells treated with pitavastatin (cell line not in panel) 0.5% Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Compound ID BRD-A81772229 BRD-A70155556 BRD-U88459701 BRD-A18763547 BRD-K22134346 BRD-K12994359 BRD-K09416995 BRD-K34581968 BRD-K94176593 BRD-K20285085 BRD-K94441233 BRD-K95785537 BRD-K53414658 BRD-K83213911 BRD-K85606544 BRD-A19248578 BRD-K68588778 BRD-K06750613 BRD-A11678676 BRD-K05653692 BRD-K72420232 BRD-K19796430 BRD-K78513633 BRD-K03618428 BRD-K37940862 Compound Description simvastatin lovastatin atorvastatin BAX-channel-blocker simvastatin valdecoxib lovastatin BMS-536924 TWS-119 fostamatinib mevastatin PP-2 tivozanib PF-750 neratinib latrunculin-b BRD-K68588778 GSK-1059615 wortmannin DL-PDMP WZ-4002 erismodegib lonidamine PP-110 BRD-K37940862 Connectivity Score 0.996 0.994 0.991 0.988 0.985 0.983 0.981 0.979 0.975 0.973 0.972 0.971 0.97 0.968 0.968 0.967 0.966 0.966 0.964 0.963 0.961 0.961 0.961 0.961 0.961 Page 1 / 200 query: imatinib-resistant chronic myeloid leukemia (Frank et al Leukemia 2006) 0.5% Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Compound ID BRD-K12502280 BRD-K94176593 BRD-K20285085 BRD-K49328571 BRD-K12867552 BRD-K85493820 BRD-A02180903 BRD-K91701654 BRD-K95785537 BRD-K53414658 BRD-A50454580 BRD-K73789395 BRD-K17743125 BRD-K46419649 BRD-K09499853 BRD-K64890080 BRD-K70914287 BRD-K50168500 BRD-U43867373 BRD-U25771771 BRD-K34581968 BRD-K18787491 BRD-K56343971 BRD-K01877528 BRD-K66175015 Compound Description TG-101348 TWS-119 fostamatinib dasatinib THM-I-94 KM-00927 betamethasone U-0126 PP-2 tivozanib PD-0325901 ZM-336372 belinostat U0126 KU-0060648 BI-2536 BIBX-1382 canertinib WH-4025 WZ-4-145 BMS-536924 U-0126 vemurafenib TL-HRAS-61 afatinib Connectivity Score 0.992 0.987 0.975 0.969 0.969 0.969 0.969 0.966 0.965 0.964 0.96 0.96 0.952 0.95 0.949 0.947 0.947 0.946 0.946 0.945 0.943 0.942 0.941 0.937 0.933 Page 1 / 200 findings 1) Large-scale gene-expression analysis 2) Analysis of L1000 shRNA signatures Current CMap Dataset biological goal 1. 2. 3. 4. 5. 6. Connections b/w genes and drugs GWAS gene lists to pathways Causal mutation to therapeutic leads Discovering new cancer pathways MoA of novel small-molecules Biological novelty biasing LINCS as a starting point for functional follow-up Core Signature DB Core Gene signatures from KD (n=1387) Genes (n=1387) 22268 Features Genes (n=1387) Similarity Metric Signature Diversity Mining the Similarity Matrix • Unsupervised • Global Patterns • Supervised • 263 Components explain 80% of the variance Gene->[Gene,Pathway,Compound] Global Views of Connections Connections per gene Most connected genes 49% of genes have at least 1 connection > 0.4 PC3 cell line querying LINCS for connections JAK2 knockdown connects to STAT1 signature FOS knockdown connects to JUN signature Cell cycle genes connected (CCND1, CDK2, CDK4, CDK6, CCNE1, E2F1) ER knockdown connected to ER antagonists & inversely connected to ER agonists • JAK2 over-expression signature inversely to JAK2 inhibitor (lestaurtinib) • HDAC knock-downs connected to HDAC inhibitors (vorinostat, others) • NRF2 over-expression signature inversely connected to curcumin • WNT1 gene connections: TCF7L1, GSK3B, CSNK2A2, PRAKACA, SMAD3 … • • • • Integrating queries across members of a pathway AKT1 genes connections AKT3, FOXO1, PDPK1, PHLPP1, PIK3CB Top 10 small-molecule connections allele classification • genes implicated by GWAS 39 genes associated with T2D S. Jacobs & D. Altshuler – can be many hundreds, most unannotated • create profiles of ablation (shRNA) in suitable cells by L1000 – universal functional bioassay • cluster into “complementation groups” – assign genes to groups, groups to pathways, pathways to disease Target ID Drug signature in MCF7 All MCF7 CGS Query Molecular target of Drug A wtcs score rank Similar Dissimilar An Example where integrating across many shRNAs improves Connections Each dot is a dose / timepoint of rapamycin MTOR shRNA 1 MTOR shRNA 2 MTOR shRNA 3 MTOR shRNA 4 MTOR shRNA 5 MTOR shRNA 6 MTOR shRNA 7 MTOR shRNA 8 MTOR shRNA 9 MTOR shRNA 10 MTOR shRNA 11 MTOR shRNA 12 MTOR shRNA 13 MTOR Consensus Gene Signature 1 1000 2000 3000 4000 5000 Connectivity Rank of Small Molecules Query with Vemerafinib, highlight BRAF shRNAs Cell line Each dot is an individual shRNA targeting BRAF Positive Correlation Rank of shRNA (%) Negative Correlation MTOR connects to BEZ235 Rank 1 2 3 4 5 6 7 8 9 10 Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 25 Compound ID BRD-K12184916 BRD-K69932463 BRD-K67566344 BRD-K67868012 BRD-K77008974 BRD-K94294671 BRD-A45498368 BRD-K13049116 BRD-K87343924 BRD-K67075780 CGS ID CGS001-2475 CGS001-4609 CGS001-57521 CGS001-2623 CGS001-5245 CGS001-2581 CGS001-9184 CGS001-360023 CGS001-4860 CGS001-11164 CGS001-89849 CGS001-527 CGS001-2065 CGS001-3845 CGS001-4486 CGS001-3479 CGS001-207 CGS001-8607 CGS001-54106 CGS001-5045 CGS001-9533 Compound Description NVP-BEZ235 AZD-8055 KU-0063794 PI-103 WYE-354 OSI-027 WYE-125132 BMS-754807 wortmannin TGX-115 Connectivity Score 1 1 1 0.999 0.998 0.998 0.998 0.997 0.996 0.996 Gene Symbol MTOR MYC RPTOR GATA1 PHB GALC BUB3 ZBTB41 PNP NUDT5 ATG16L2 ATP6V0C ERBB3 KRAS MST1R IGF1 AKT1 RUVBL1 TLR9 FURIN POLR1C Connectivity Score 0.999 0.99 0.976 0.972 0.969 0.967 0.965 0.965 0.965 0.964 0.964 0.964 0.961 0.954 0.954 0.951 0.95 0.948 0.948 0.947 0.944 BEZ235: a dual ATP-competitive PI3K and mTOR inhibitor Dose dependent connectivity PIK3CA connects to BEZ235 Current list of significant drug-CGS connectivities span multiple MoA’s losartan MK-2206 10-DEBC MK-2206 MK-2206 10-DEBC brefeldin A gossypol YM-155 ZM336372 LFM-A13 N9-isopropylolomoucine BML-259 fumonisin B1 etomoxir PNU-74654 cyanoquinoline 11 neratinib tyrphostin AG-1478 AGTR1 AKT1 AKT1 AKT2 AKT3 AKT3 ARF1 BCL2 BIRC5 BRAF BTK CDK1 CDK2 CERS4 CPT1A CTNNB1 EGFR EGFR EGFR tamoxifen PF-3845 ESR1 FAAH Merck60 ISOX 2-bromopyruvate lovastatin acid linsitinib selumetinib Compound 11e sirolimus BEZ235 PIK-90 PP-30 parthenolide triptolide dexamethasone olaparib olaparib veliparib GSK-2334470 BX-795 AZD-7545 HDAC1 HDAC6 HK1 HMGCR IGF1R MAP2K1 MAPK1 MTOR MTOR MTOR MTOR NFKB1 NFKB2 NR3C1 PARP1 PARP2 PARP2 PDK1 PDK1 PDK2 TGX-115 BEZ235 PIK-90 Compound 110 GW-843682X LFM-A13 HA-1004 KU 0060648 AM-580 gemcitabine fatostatin RITA nutlin-3 pifithrin-alpha SJ-172550 gemcitabine MK 1775 PIK3C2A PIK3CA PIK3CA PIK3CA PLK1 PLK1 PRKACB PRKDC RARA RRM1 SREBF2 TP53 TP53 TP53 TP53 TYMS WEE1 Goal: Given a chemical library: • identify the bioactive subset of a library • identify unique bioactivity Gene-expression as a universal measure of bioactivity If we see no robust gene expression consequence whatsoever across a diverse panel of cell types, then it's likely that the compound has no bioactivity. L1000 as a sensor of bioactivity signature strength (S) S-C plot signature robustness across replicates (C) dose titration active analogs (high S-C) inactive analogs (low S-C) biological novelty biasing of chemical libraries • global bioactivity detection using L1000 profiles – number and magnitude of expression changes, and robustness • calibrate with 350 known bioactives across 47 cell lines – median sensitivity of individual cell lines is 42% (90% specificity) – rationally-designed panel of 7 cell lines achieves 95% sensitivity • qualification, de-duplication, and novelty biasing – consolidate and subset libraries based on function signal strength 20 chemical library n = 9,875 active n = 487 (5%) known MoA n = 435 (4.5%) 6 0 -1 0 reproducibility 1 novel n = 52 (0.5%) de-duplicated n = 30 (0.3%) Broad LINCS U54 1. Data Generation: 1.2M+ profiles released to LINCS 2. Data Access: Multiple levels of data matrices, cloudcompute beta released 3. Biologist-friendly web user interfaces 4. Emerging scientific findings 1. Causal mutation to therapeutic leads 2. GWAS gene lists to pathways 3. Discovering new cancer pathways 4. Connecting small-molecules to biology 5. Biological novelty biasing of chemical libraries CMap Analytical CMap Data Generation Rajiv Narayan Joshua Gould Corey Flynn Ted Natoli David Wadden Ian Smith Roger Hu Larson Hogstrom Peyton Greenside David Peck John Davis Roger Cornell Xiaohua Wu Xiaodong Lu Melanie Donahue Broad Platforms RNAi platform Chemical Biology TD/TS Broad Scientists Jesse Boehm Bang Wong Federica Piccioni John Doench David Root Suzanne Jacobs Paul Clemons Stuart Schreiber Aly Shamji Todd Golub