
Abstract

While gene expression data describing mRNA levels in different cancer cell lines are widely available, the molecular regulatory mechanisms responsible for these changes are still poorly understood. Here we developed a rational approach to infer the regulatory mechanisms governing changes in gene expression by integrating datasets of protein-DNA interactions, protein-protein interactions and kinase-substrate interactions collected from prior biological knowledge. We first use data from genome-wide ChIP-on-chip and ChIP-Seq experiments to connect mRNA expression levels of the NCI-60 cancer cell lines to the transcription factors most likely regulating them. The identified transcription factors are then “connected”, using known protein-protein interactions, to form cancer-specific sub-networks. Within these sub-networks we assess the enrichment for protein kinase substrates to infer the protein kinases likely regulating these complexes. Finally, by quantitatively comparing the up- and down-regulated genes for each cancer cell line with the genes affected by FDA-approved drugs applied to cancer cells, we predict the mechanisms of action of these drugs. Following this path, from changes in gene expression to transcription factors to protein kinases, we can provide a more thorough understanding of the regulatory mechanisms behind the observed mRNA levels in the NCI-60 cancer cell lines and other cancer cells. This approach proposes mechanisms of action for drugs. Wet lab experimental validation of this approach is still necessary; it can be done using single drugs or combinations of them.

Introduction

• The NCI-60 database provides mRNA profiles from microarray experiments on 60 commonly studied cancer cell lines

• Although microarray profiling is a reliable way to measure the mRNA levels of many genes within a cell, it offers few clues about how cells are regulated

• While mRNA profiles indicate changes caused by cancer, understanding the underlying regulatory mechanisms that are dysregulated in different cancers will bring us closer to therapeutics

• In this project we aim to identify the transcription factors, protein complexes and protein kinases responsible for the aberrant expression of genes in the various types of cancer cell lines

Regulatory Signatures of Cancer Cell Lines Inferred from Expression Data

Jayanth (Jay) Krishnan (1,2), Avi Ma’ayan (2)

(1) Mahopac High School, Mahopac, NY 10541

(2) Systems Biology Center New York and Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, NY

Workflow

• Microarray: analyze mRNA profiles from the NCI-60 database, using statistical techniques to compute over/under expressed genes

• Top ranked transcription factors most likely responsible for the observed changes in expression

• Identify protein sub-networks that “connect” the transcription factors through additional proteins

• Top ranked protein kinases most likely regulating the protein sub-networks

• Future Research: wet lab experimental validation

Analyzing the mRNA profile from the NCI-60 database

• Differentially expressed gene lists from the various NCI-60 cancer cell lines are used as input.

• Over expressed and under expressed genes are identified for specific cancer cell lines

• The following algorithm was implemented:

• The NCI-60 database was parsed and 18,133 unique genes were identified

• The population mean for the expression of each of the genes across all the 60 cancer cell lines was calculated

• The sample mean and sigma for each (gene, cancer cell line) pair was calculated

• A two-sided t-test was applied to each (gene, cancer cell line) pair.

• A gene was called over expressed if its test statistic exceeded an upper critical t score, and under expressed if it fell below a lower critical t score, with both critical scores determined by a chosen P value.

• A list of genes which are over/under expressed for multiple cancer cell lines was developed

Expression matrix M (rows: probes 1 to n; columns: cancer cell lines 1 to 60)

Probes   1      2      3      …   c      …   60
1        M1,1   M1,2   M1,3   …   M1,c   …   M1,60
2        M2,1   M2,2   M2,3   …   M2,c   …   M2,60
…
n        Mn,1   Mn,2   Mn,3   …   Mn,c   …   Mn,60

Statistical Methods:

Population mean:

$$\mu = \frac{1}{60\,n} \sum_{i=1}^{n} \sum_{j=1}^{60} M_{i,j}$$

Sample mean of gene expressions for cancer cell line “c”:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} M_{i,c}$$

Std deviation of gene expressions for cancer cell line “c”, with $X_i = M_{i,c}$:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{x}\right)^2}$$

Test statistic:

$$t = \frac{(\bar{x} - \mu)\,\sqrt{n}}{s}$$
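The statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the poster: the toy matrix is made up, the function names `t_statistic` and `classify` are hypothetical, and the critical value 4.303 assumes a two-sided test at P = 0.05 with n - 1 = 2 degrees of freedom (the poster does not state which P value was used).

```python
import math
from statistics import mean, stdev

def t_statistic(probe_values, population_mean):
    """One-sample t statistic: (x_bar - mu) * sqrt(n) / s."""
    n = len(probe_values)
    x_bar = mean(probe_values)
    s = stdev(probe_values)  # sample standard deviation (divides by n - 1)
    return (x_bar - population_mean) * math.sqrt(n) / s

def classify(t, critical_t):
    """Label a (gene, cell line) pair using a symmetric two-sided cutoff."""
    if t > critical_t:
        return "over"
    if t < -critical_t:
        return "under"
    return "unchanged"

# Hypothetical toy data: 3 probes (rows) x 4 cell lines (columns).
# These are NOT NCI-60 values.
matrix = [
    [1.0, 2.0, 9.0, 2.0],
    [2.0, 3.0, 10.0, 3.0],
    [3.0, 4.0, 11.0, 1.0],
]

# Population mean over every entry of the matrix.
mu = mean([value for row in matrix for value in row])

# Test statistic for the cell line in column index 2.
column = [row[2] for row in matrix]
t = t_statistic(column, mu)

# Assumed cutoff: two-sided, P = 0.05, 2 degrees of freedom.
CRITICAL_T = 4.303
print(round(t, 3), classify(t, CRITICAL_T))
```

Repeating the last step for every column, and for every gene, would give the over/under expressed gene lists used in the later stages.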

Example of Process

Top 222 over expressed genes for cancer cell line MDA_N (melanoma):

PLD1, RAE1, CHPF, PRKCD, CTSF, SART1, GPAA1, GAS7, DHPS, CD302, WARS, SLC29A1, CUL4B, SLC37A4, KAT2B, DNAJB1, IL13RA1, SLC6A8, TCTA, EDNRB, GJB1, NPAS2, HBE1, TYR, STAU1, HLA-DRA, PLP1, AP1S1, SCRG1, RXRG, GLRX, MAGEA1, TXN2, HSP90AA1, ATP7A, TRAIP, PLXNB3, HCG4, CTAG2, TNFRSF14, MAGEA12, EIF2S3, UBFD1, KCNS3, GPR143, ZNF200, FAM3A, CTAG1B, SPIN2B, C14orf109, C1orf144, CLCN2, SLC25A6, C9orf61, ZCCHC24, CCPG1, DPY19L2P2, DUSP10, SLC6A10P, SLC5A4, M6PR, TRAK2, CSNK1E, TIMP2, NOV, DCT, TYRP1, S100A3, DGKI, CAST, RAB27A, NRP2, GPNMB, AKR7A2, GTF2H1, STX7, HPS5, DYNC1I1, TRPM2, SREBF2, USF2, PCOLCE, SNTA1, SMCR7L, BCHE, DDX18, SEC11A, CNOT8, CHMP2B, PTPN18, ACP5, GSTT2, SLC4A3, ASPA, SLC22A18AS, PPP2R4, CGGBP1, PDIA6, SLC1A4, CAPN3, SUCLG2, TUBB4, MAGEA2B, MAGEA5, GK3P, SFXN3, CSRP2, BEST1, KHDRBS3, GYG2, RFNG, AZI1, SLC25A11, MCM7, ART3, DLAT, MORC3, UAP1L1, HLA-DMA, ALDH18A1, BACE2, CRIPT, CADPS2, MGAT4B, MUL1, MTO1, PRR7, FAM86C, METT11D1, HEY1, GAL3ST4, ROPN1B, MRPS18A, FBXL15, NUDT11, C20orf30, RPL23AP7, MICALL1, LDLRAP1, C17orf90, LOC348926, PCOTH, FAM86B1, LPCAT2, KLF11, CSGALNACT1, CEP97, TRIM48, LUZP1, SURF4, ARHGEF3, ITIH5, LONRF3, TH1L, C3orf64, XPO5, FAHD2A, CA14, TINF2, C5orf54, GPR177, PDXP, RINT1, C14orf139, TP53TG3, CDCA3, COL9A1, COPG2, HAGHL, TNFSF13B, FAM167B, ULK3, TOMM40L, FAM160B1, SPRYD5, SNX30, DGAT2, TMEM55A, C2orf30, GGT7, C6orf89, C12orf34, UBL7, C3orf38, HDAC10, LOC400657, AFAP1L1, FAM125A, OLIG1, HSD11B1L, SCARNA15, SMYD4, LOC153364, CAMK2N2, CHRM1, GPR158, RNF175, AARS2, SLITRK4, C5orf35, ANKRD54, KIAA1524, GNASAS, ELOVL3, LOC147645, LOC730259, KIAA1586, ZC3H12C, ST6GALNAC3, TMEM171, C11orf82, PAGE2, LOC730124, GBGT1, LRRC33, DLX1, ENHO, FSTL5, NPHP3, CITED4, CLEC2L, HMCN1

With gene input, ChEA identified the top ranked transcription factors

Genes2Networks output of protein subnetworks when top 10 transcription factors from ChEA were given as an input

ChEA Genes2Networks KEA

ChEA, Genes2Networks, and KEA are web-based tools developed at the Ma’ayan lab that let users predict which transcription factors, protein subnetworks, and protein kinases are most strongly associated with their input seed list

• The up- and down-regulated genes identified for each cancer cell line are used as input for ChEA, which outputs the top ranked transcription factors (ranked by p-value from Fisher’s Exact Test) that most likely influence the input seed list
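The ranking step can be illustrated with a minimal sketch of a one-sided Fisher's exact test for gene-set overlap, which is equivalent to the upper tail of the hypergeometric distribution. This is an assumption-laden illustration and not ChEA's implementation: the gene names, the transcription factor names `TF_A` and `TF_B`, their target sets, and the background size are all made up, and `fisher_enrichment_p` and `rank_tfs` are hypothetical helper names.

```python
from math import comb

def fisher_enrichment_p(overlap, input_size, target_size, background_size):
    """One-sided Fisher's exact test: P(X >= overlap) under the
    hypergeometric distribution (drawing input_size genes from the background)."""
    max_overlap = min(input_size, target_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(target_size, k) * comb(background_size - target_size, input_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

def rank_tfs(input_genes, tf_targets, background_size):
    """Return (tf, p_value) pairs sorted from most to least enriched."""
    results = []
    for tf, targets in tf_targets.items():
        overlap = len(input_genes & targets)
        p = fisher_enrichment_p(overlap, len(input_genes), len(targets), background_size)
        results.append((tf, p))
    return sorted(results, key=lambda pair: pair[1])

# Hypothetical toy data: a background of 20 genes named g0..g19.
input_genes = {"g0", "g1", "g2", "g3"}
tf_targets = {
    "TF_A": {"g0", "g1", "g2", "g10", "g11"},    # 3 of 4 input genes are targets
    "TF_B": {"g3", "g12", "g13", "g14", "g15"},  # 1 of 4 input genes is a target
}

ranking = rank_tfs(input_genes, tf_targets, background_size=20)
for tf, p in ranking:
    print(tf, round(p, 4))
```

The same overlap test applies to any pair of gene sets, so a kinase-substrate library could be scored the same way by swapping in substrate sets for target sets.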

Genes2Networks

• The transcription factor output for each cancer cell line from ChEA is used as an input to Genes2Networks

• Genes2Networks connects lists of transcription factors with other protein intermediates from mammalian protein interactions databases
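One simple way to picture "connecting" seed proteins through intermediates is to take the union of short paths between every pair of seeds in an interaction graph. The sketch below does that with a breadth-first search. It is an assumed, simplified stand-in and does not reproduce the actual Genes2Networks algorithm; the edge list, the protein names, and the `max_intermediates` parameter are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency map from a list of (protein, protein) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def shortest_path(adj, source, target, max_edges):
    """Breadth-first shortest path using at most max_edges edges, else None."""
    if source not in adj or target not in adj:
        return None
    parents = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        if depth == max_edges:
            continue  # do not expand beyond the allowed path length
        for neighbor in sorted(adj[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append((neighbor, depth + 1))
    return None

def connect_seeds(edges, seeds, max_intermediates=1):
    """Union of short paths between all seed pairs: (nodes, edges)."""
    adj = build_adjacency(edges)
    sub_edges = set()
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(adj, a, b, max_intermediates + 1)
        if path:
            sub_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    nodes = set().union(*sub_edges) if sub_edges else set()
    return nodes, sub_edges

# Hypothetical toy interaction list (not from any real database).
edges = [("TF1", "P1"), ("P1", "TF2"), ("TF2", "P2"), ("P2", "P3"), ("P3", "TF3")]
seeds = {"TF1", "TF2", "TF3"}

nodes_1, edges_1 = connect_seeds(edges, seeds, max_intermediates=1)
nodes_2, edges_2 = connect_seeds(edges, seeds, max_intermediates=2)
print(sorted(nodes_1))
print(sorted(nodes_2))
```

Allowing more intermediates pulls in more of the network, so the path-length limit controls how large the resulting sub-network becomes.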

KEA

• The unique protein sub-networks output by Genes2Networks can then be used as input to KEA, which uses Fisher’s Exact Test to identify the protein kinases most likely regulating the proteins in the sub-network.

• At this stage top regulating transcription factors, protein sub-networks and kinases have been identified for each of the NCI-60 cancer cell lines

• An integrated matrix can now be created in order to holistically compare the data by displaying the top regulating elements and their putative effects on the different cell lines
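One possible layout for such an integrated matrix is a binary table with one row per cell line and one column per regulator, where 1 means the regulator was among the top ranked for that cell line. The sketch below assumes that layout, which the poster does not specify; the cell line and regulator names are placeholders, not results, and `build_regulator_matrix` is a hypothetical helper.

```python
def build_regulator_matrix(top_regulators):
    """Binary matrix: rows are cell lines, columns are regulators.

    top_regulators maps each cell line to the set of regulators
    (transcription factors or kinases) ranked highly for it.
    """
    cell_lines = sorted(top_regulators)
    regulators = sorted({reg for regs in top_regulators.values() for reg in regs})
    rows = [
        [1 if reg in top_regulators[line] else 0 for reg in regulators]
        for line in cell_lines
    ]
    return cell_lines, regulators, rows

# Hypothetical placeholder results (not taken from the poster).
top_regulators = {
    "cell_line_A": {"TF_1", "TF_2", "KINASE_1"},
    "cell_line_B": {"TF_2", "KINASE_2"},
    "cell_line_C": {"TF_1", "KINASE_1", "KINASE_2"},
}

cell_lines, regulators, rows = build_regulator_matrix(top_regulators)

# Print as a tab-separated table.
print("\t".join(["cell line"] + regulators))
for line, row in zip(cell_lines, rows):
    print("\t".join([line] + [str(value) for value in row]))
```

A matrix in this form can be passed directly to standard clustering or heat map tools to compare cell lines by their shared regulators.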

Future Research

• Future research involves further analyzing other cancer datasets

• Cluster analysis will be done to group the transcription factors or kinases that were identified

• Additionally, by combining such data with data collected for drug perturbation of these cells, we may be able to suggest which drugs can reverse the observed changes
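As a rough illustration of the drug comparison idea, one could score each drug by how many of a cell line's up-regulated genes it pushes down (and vice versa), minus how many changes it reinforces. This scoring rule is an assumption made for the sketch, not a method described on the poster, and the gene and drug names are placeholders.

```python
def reversal_score(cancer_up, cancer_down, drug_up, drug_down):
    """Count of genes the drug moves against the cancer signature,
    minus the count it moves in the same direction."""
    reversed_count = len(cancer_up & drug_down) + len(cancer_down & drug_up)
    reinforced_count = len(cancer_up & drug_up) + len(cancer_down & drug_down)
    return reversed_count - reinforced_count

# Hypothetical signatures (placeholder gene and drug names).
cancer_up = {"g1", "g2", "g3"}
cancer_down = {"g4", "g5"}

# Each drug maps to (genes it up-regulates, genes it down-regulates).
drugs = {
    "drug_X": ({"g4", "g5"}, {"g1", "g2"}),
    "drug_Y": ({"g1", "g4"}, {"g5"}),
}

scores = {
    name: reversal_score(cancer_up, cancer_down, up, down)
    for name, (up, down) in drugs.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

A higher score marks a drug whose expression changes oppose the cell line's signature, which is only a starting hypothesis for wet lab testing.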

Top ranked kinase proteins identified from KEA

Acknowledgements

This research was supported by NIH Grant No. 5P50GM071558
