While gene expression data describing mRNA levels in different cancer cell lines are widely available, the molecular regulatory mechanisms responsible for these changes are still poorly understood. Here we developed a rational approach to infer the regulatory mechanisms governing changes in gene expression by integrating datasets of protein-DNA interactions, protein-protein interactions and kinase-substrate interactions collected from prior biological knowledge. We first use data from genome-wide ChIP-on-chip and ChIP-Seq experiments to connect mRNA expression levels of the NCI-60 cancer cell lines to the transcription factors most likely regulating them. The identified transcription factors are then “connected”, using known protein-protein interactions, to form cancer-specific sub-networks. Within these sub-networks we assess the enrichment for protein kinase substrates to infer the protein kinases likely regulating these complexes. Finally, by quantitatively comparing the up- and down-regulated genes for each cancer cell line with the genes affected by FDA-approved drugs applied to cancer cells, we predict the mechanisms of action of these drugs. Following this path, from changes in gene expression to transcription factors to protein kinases, we can provide a more thorough understanding of the regulatory mechanisms behind the observed mRNA levels in the NCI-60 cancer cell lines and other cancer cells. The approach also proposes mechanisms of action for drugs. Wet-lab experimental validation of this approach is still necessary; it can be done using single drugs or combinations of them.
• The NCI-60 database provides mRNA profiles from microarray experiments on 60 commonly studied cancer cell lines
• Although analyzing these mRNA values is a reliable way to measure the mRNA levels of many genes within a cell, it offers few clues about how cells are regulated
• While mRNA profiles indicate changes caused by cancer, understanding the underlying regulatory mechanisms that are dysregulated in different cancers will bring us closer to therapeutics
• In this project we aim to identify the transcription factors, protein complexes and protein kinases responsible for the aberrant expression of genes in the various types of cancer cell lines
Analyze mRNA profiles from the NCI-60 database using statistical techniques to identify over- and under-expressed genes
Identify protein sub-networks that “connect” the transcription factors through additional proteins
Identify the top-ranked protein kinases most likely regulating the protein sub-networks
Wet-lab experimental validation
• Differentially expressed gene lists from the various NCI-60 cancer cell lines are used as input.
• Over-expressed and under-expressed genes are identified for specific cancer cell lines
• The following algorithm was implemented:
• The NCI-60 database was parsed and 18,133 unique genes were identified
• The population mean of the expression of each gene across all 60 cancer cell lines was calculated
• The sample mean and standard deviation for each (gene, cancer cell line) pair were calculated
• A two-sided t-test statistic was computed for each (gene, cancer cell line) pair.
• A gene was called over-expressed if its test statistic exceeded the upper critical t score, and under-expressed if it fell below the lower critical t score, with both thresholds set by the chosen p-value.
• A list of genes that are over- or under-expressed in multiple cancer cell lines was compiled
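Expression matrix M (rows: probes 1…n; columns: cancer cell lines 1…60)

Probes | 1    | 2    | 3    | … | c    | … | 60
1      | M1,1 | M1,2 | M1,3 | … | M1,c | … | M1,60
2      | M2,1 | M2,2 | M2,3 | … | M2,c | … | M2,60
…      | …    | …    | …    | … | …    | … | …
n      | Mn,1 | Mn,2 | Mn,3 | … | Mn,c | … | Mn,60

Population mean: $\mu = \frac{1}{60\,n} \sum_{i=1}^{n} \sum_{j=1}^{60} M_{i,j}$
Sample mean of gene expressions for cancer cell line “c”: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} M_{i,c}$
Standard deviation of gene expressions for cancer cell line “c”: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{x})^2}$, where $X_i = M_{i,c}$
Test statistic: $t = \frac{(\bar{x} - \mu)\,\sqrt{n}}{s}$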
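The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the poster: the function names, the toy matrix, and the critical value of 4.303 (two-sided, p = 0.05, n − 1 = 2 degrees of freedom) are assumptions chosen for the example, as is the use of a symmetric threshold of ±t_critical.

```python
import math
from statistics import mean, stdev


def t_statistic(probe_values, population_mean):
    """Test statistic t = (xbar - mu) * sqrt(n) / s for one cell line."""
    n = len(probe_values)
    xbar = mean(probe_values)
    s = stdev(probe_values)  # sample standard deviation (divides by n - 1)
    return (xbar - population_mean) * math.sqrt(n) / s


def classify(matrix, cell_line, t_critical):
    """Label one cell line as over-expressed, under-expressed or unchanged.

    matrix: list of n probe rows, each holding one value per cell line.
    cell_line: zero-based column index of the cell line to test.
    t_critical: positive critical value (assumed symmetric, two-sided test).
    """
    # Population mean over every probe and every cell line.
    mu = mean(value for row in matrix for value in row)
    column = [row[cell_line] for row in matrix]
    t = t_statistic(column, mu)
    if t > t_critical:
        return "over", t
    if t < -t_critical:
        return "under", t
    return "unchanged", t


# Hypothetical toy data: 3 probes x 4 cell lines (not NCI-60 values).
toy_matrix = [
    [1.0, 1.2, 0.9, 5.0],
    [1.1, 0.8, 1.0, 5.5],
    [0.9, 1.0, 1.1, 4.5],
]
high_label, high_t = classify(toy_matrix, 3, t_critical=4.303)
low_label, low_t = classify(toy_matrix, 0, t_critical=4.303)
print(high_label, round(high_t, 3))
print(low_label, round(low_t, 3))
```

In practice the critical value would be looked up for the actual number of probes and the chosen p-value rather than hard-coded.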
Gene list: PLD1, RAE1, CHPF, PRKCD, CTSF, SART1, GPAA1, GAS7, DHPS, CD302, WARS, SLC29A1, CUL4B, SLC37A4, KAT2B, DNAJB1, IL13RA1, SLC6A8, TCTA, EDNRB, GJB1, NPAS2, HBE1, TYR, STAU1, HLA-DRA, PLP1, AP1S1, SCRG1, RXRG, GLRX, MAGEA1, TXN2, HSP90AA1, ATP7A, TRAIP, PLXNB3, HCG4, CTAG2, TNFRSF14, MAGEA12, EIF2S3, UBFD1, KCNS3, GPR143, ZNF200, FAM3A, CTAG1B, SPIN2B, C14orf109, C1orf144, CLCN2, SLC25A6, C9orf61, ZCCHC24, CCPG1, DPY19L2P2, DUSP10, SLC6A10P, SLC5A4, M6PR, TRAK2, CSNK1E, TIMP2, NOV, DCT, TYRP1, S100A3, DGKI, CAST, RAB27A, NRP2, GPNMB, AKR7A2, GTF2H1, STX7, HPS5, DYNC1I1, TRPM2, SREBF2, USF2, PCOLCE, SNTA1, SMCR7L, BCHE, DDX18, SEC11A, CNOT8, CHMP2B, PTPN18, ACP5, GSTT2, SLC4A3, ASPA, SLC22A18AS, PPP2R4, CGGBP1, PDIA6, SLC1A4, CAPN3, SUCLG2, TUBB4, MAGEA2B, MAGEA5, GK3P, SFXN3, CSRP2, BEST1, KHDRBS3, GYG2, RFNG, AZI1, SLC25A11, MCM7, ART3, DLAT, MORC3, UAP1L1, HLA-DMA, ALDH18A1, BACE2, CRIPT, CADPS2, MGAT4B, MUL1, MTO1, PRR7, FAM86C, METT11D1, HEY1, GAL3ST4, ROPN1B, MRPS18A, FBXL15, NUDT11, C20orf30, RPL23AP7, MICALL1, LDLRAP1, C17orf90, LOC348926, PCOTH, FAM86B1, LPCAT2, KLF11, CSGALNACT1, CEP97, TRIM48, LUZP1, SURF4, ARHGEF3, ITIH5, LONRF3, TH1L, C3orf64, XPO5, FAHD2A, CA14, TINF2, C5orf54, GPR177, PDXP, RINT1, C14orf139, TP53TG3, CDCA3, COL9A1, COPG2, HAGHL, TNFSF13B, FAM167B, ULK3, TOMM40L, FAM160B1, SPRYD5, SNX30, DGAT2, TMEM55A, C2orf30, GGT7, C6orf89, C12orf34, UBL7, C3orf38, HDAC10, LOC400657, AFAP1L1, FAM125A, OLIG1, HSD11B1L, SCARNA15, SMYD4, LOC153364, CAMK2N2, CHRM1, GPR158, RNF175, AARS2, SLITRK4, C5orf35, ANKRD54, KIAA1524, GNASAS, ELOVL3, LOC147645, LOC730259, KIAA1586, ZC3H12C, ST6GALNAC3, TMEM171, C11orf82, PAGE2, LOC730124, GBGT1, LRRC33, DLX1, ENHO, FSTL5, NPHP3, CITED4, CLEC2L, HMCN1
ChEA, Genes2Networks and KEA
ChEA, Genes2Networks, and KEA are web-based tools developed at the Ma’ayan lab that let users predict which transcription factors, protein sub-networks, and protein kinases are most strongly associated with an input seed list
ChEA
• Given the identified up- and down-regulated genes for each cancer cell line as input, ChEA outputs the top-ranked transcription factors (ranked by p-value from Fisher’s exact test) that most likely influence the input seed list
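The ranking by Fisher’s exact test can be illustrated with a short, standard-library Python sketch. It is not ChEA’s code: the one-sided (over-representation) form of the test, the function names, and the toy regulators `TF_A` and `TF_B` with genes `g1`…`g20` are all assumptions made for this example. The same calculation applies when kinase substrates replace transcription factor targets.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, target_size, background_size):
    """One-sided Fisher's exact test p-value (hypergeometric upper tail).

    Probability of seeing at least `overlap` regulator targets in a seed
    list of `list_size` genes drawn from `background_size` genes, of which
    `target_size` are targets of the regulator.
    """
    max_overlap = min(list_size, target_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(target_size, k) * comb(background_size - target_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_regulators(seed_genes, regulator_targets, background_size):
    """Return (p_value, regulator, overlap) tuples, smallest p-value first."""
    seed = set(seed_genes)
    scored = []
    for regulator, targets in regulator_targets.items():
        target_set = set(targets)
        overlap = len(seed & target_set)
        p = fisher_enrichment_p(overlap, len(seed), len(target_set), background_size)
        scored.append((p, regulator, overlap))
    return sorted(scored)


# Hypothetical toy data: 20 background genes, a 5-gene seed list.
seed_list = ["g1", "g2", "g3", "g4", "g5"]
targets_by_regulator = {
    "TF_A": ["g1", "g2", "g3", "g4", "g9"],    # 4 of 5 targets in the seed list
    "TF_B": ["g5", "g10", "g11", "g12", "g13"],  # 1 of 5 targets in the seed list
}
ranking = rank_regulators(seed_list, targets_by_regulator, background_size=20)
for p_value, regulator, overlap in ranking:
    print(regulator, overlap, p_value)
```

Exact integer binomials keep the tail sum free of rounding error until the final division, which is convenient for a background of thousands of genes.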
Genes2Networks
• The transcription factor output for each cancer cell line from ChEA is used as input to Genes2Networks
• Genes2Networks connects lists of transcription factors through other protein intermediates from mammalian protein-protein interaction databases
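The idea of connecting seed proteins through intermediates can be sketched as a shortest-path search over an interaction graph. This is a simplified illustration under stated assumptions, not the Genes2Networks algorithm: the breadth-first search, the `max_edges` limit, and the toy interactions among `TF1`…`TF3` and `P1`…`P3` are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target, max_edges):
    """Breadth-first search; returns a node list, or None if the target
    cannot be reached within max_edges interactions."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_edges:
            continue  # path already uses the allowed number of edges
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def connect_seeds(interactions, seeds, max_edges=2):
    """Build a sub-network linking seed proteins via short paths."""
    graph = {}
    for a, b in interactions:  # undirected protein-protein interactions
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    nodes, edges = set(), set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b, max_edges)
        if path:
            nodes.update(path)
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return nodes, edges


# Hypothetical toy interaction list.
toy_interactions = [
    ("TF1", "P1"), ("P1", "TF2"),
    ("TF2", "P2"), ("P2", "P3"), ("P3", "TF3"),
]
toy_seeds = {"TF1", "TF2", "TF3"}
subnetwork_nodes, subnetwork_edges = connect_seeds(toy_interactions, toy_seeds)
intermediates = subnetwork_nodes - toy_seeds
print(sorted(subnetwork_nodes), sorted(intermediates))
```

Capping the path length keeps the sub-network small; with `max_edges=2`, two seeds are linked only if they interact directly or share one intermediate.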
KEA
• The unique protein sub-networks output by Genes2Networks are then used as input to KEA, which identifies the protein kinases most likely regulating the proteins in each sub-network using Fisher’s exact test.
• At this stage, the top regulating transcription factors, protein sub-networks and kinases have been identified for each of the NCI-60 cancer cell lines
• An integrated matrix can now be created to compare the data holistically by displaying the top regulating elements and their putative effects on the different cell lines
• Future research will analyze additional cancer datasets
• Cluster analysis will be done to group the transcription factors and kinases that were identified
• Additionally, by combining these data with data collected for drug perturbation of these cells, we may be able to suggest which drugs can reverse the observed changes
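One possible shape for the integrated matrix, and a similarity measure that cluster analysis could start from, is sketched below. The binary encoding, the Jaccard index, the function names, and the toy cell lines and regulators are assumptions for illustration only; no results from the NCI-60 analysis are reproduced here.

```python
def build_matrix(top_regulators):
    """Binary regulator-by-cell-line matrix.

    top_regulators: dict mapping cell line name -> set of top-ranked
    transcription factors and kinases for that cell line.
    Returns (sorted cell line names, dict regulator -> list of 0/1 flags).
    """
    cell_lines = sorted(top_regulators)
    regulators = sorted(set().union(*top_regulators.values()))
    rows = {
        regulator: [1 if regulator in top_regulators[line] else 0 for line in cell_lines]
        for regulator in regulators
    }
    return cell_lines, rows


def jaccard(a, b):
    """Jaccard index of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy input: three cell lines with made-up regulator names.
toy_top = {
    "line_A": {"TF1", "TF2", "K1"},
    "line_B": {"TF1", "TF2", "K2"},
    "line_C": {"TF3", "K3"},
}
columns, matrix_rows = build_matrix(toy_top)
print("regulator", *columns)
for regulator, flags in matrix_rows.items():
    print(regulator, *flags)

similarity_ab = jaccard(toy_top["line_A"], toy_top["line_B"])
similarity_ac = jaccard(toy_top["line_A"], toy_top["line_C"])
print(similarity_ab, similarity_ac)
```

Pairwise similarities like these could feed a hierarchical clustering step, so that cell lines sharing regulators group together.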