
Data integration across omics landscapes
Bing Zhang, Ph.D.
Department of Biomedical Informatics
Vanderbilt University School of Medicine
bing.zhang@vanderbilt.edu
Omics data integration
[Figure: elephant illustration with the labels DNA, mRNA, and Protein]
CNCP2012
Informatics approaches to integrate genomic and proteomic data
Sources: The Cancer Genome Atlas (TCGA); Clinical Proteomic Tumor Analysis Consortium (CPTAC)

Data Type               Technology
Genome
  Mutations             Exome Sequencing, RNA-Seq
  Sequence variants     Exome Sequencing, RNA-Seq
  CNV                   arrayCGH, SNP Array
  LOH                   SNP Array
  DNA Methylation       Methylation Array
Transcriptome
  Exon expression       Array, RNA-Seq
  Junction expression   RNA-Seq
  Gene expression       Array, RNA-Seq
Proteome
  Protein expression    MS/MS
  Protein PTM           MS/MS, protein arrays
Informatics approaches to integrate genomic and proteomic data

Genomic data -> improved proteomic data analysis
Genomic data + proteomic data -> novel biological insights

Using genomic data to improve proteomic data analysis

Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics

Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis

Integrating genomic and proteomic data to gain novel biological insights

Project 3. miRNA-mediated regulation: understanding post-transcriptional mechanisms regulating human gene expression

Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context
customProDB: motivation
[Figure: database search against a commonly used database compared with the expressed proteins; labels: proteins with sequence variation, unexpressed proteins, expressed proteins]
Customized protein database from RNA-Seq data

Increased sensitivity

Reduced ambiguity

Variant peptides
Wang et al., J Proteome Res, 2012
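
The idea of a customized database can be illustrated with a minimal Python sketch. It is not the customProDB implementation (customProDB is an R package); the function names, the expression cutoff `min_expression`, and the variant tuple layout are assumptions made for illustration. The sketch keeps only proteins whose transcripts pass the cutoff and appends one extra entry per single amino acid substitution.

```python
from typing import Dict, List, Tuple


def build_custom_db(
    proteins: Dict[str, str],
    expression: Dict[str, float],
    variants: List[Tuple[str, int, str, str]],
    min_expression: float = 1.0,
) -> Dict[str, str]:
    """Return a customized protein database as {entry_id: sequence}.

    proteins: reference protein sequences keyed by protein ID.
    expression: RNA-Seq abundance per protein ID (units are an assumption).
    variants: (protein_id, 1-based position, ref_aa, alt_aa) substitutions.
    min_expression: hypothetical cutoff for calling a protein expressed.
    """
    # Keep only proteins whose transcript passes the expression cutoff.
    custom = {
        pid: seq
        for pid, seq in proteins.items()
        if expression.get(pid, 0.0) >= min_expression
    }
    # Add one extra entry per substitution found in an expressed protein.
    for pid, pos, ref, alt in variants:
        seq = proteins.get(pid)
        if pid not in custom or seq is None:
            continue  # variant lies in an unexpressed or unknown protein
        if not (1 <= pos <= len(seq)) or seq[pos - 1] != ref:
            continue  # variant does not match the reference sequence
        custom[f"{pid}_{ref}{pos}{alt}"] = seq[: pos - 1] + alt + seq[pos:]
    return custom


def to_fasta(db: Dict[str, str]) -> str:
    """Serialize the database in FASTA format."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in db.items())


# Toy example: P2 is not expressed, so it and its variant are dropped.
toy_proteins = {"P1": "MKTAYIAK", "P2": "MSSLLRQE"}
toy_expression = {"P1": 12.5, "P2": 0.0}
toy_variants = [("P1", 3, "T", "A"), ("P2", 2, "S", "N")]
print(to_fasta(build_custom_db(toy_proteins, toy_expression, toy_variants)))
```

Dropping unexpressed proteins shrinks the search space, and the added variant entries make variant peptides identifiable.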
customProDB: moving forward

R package

Compatible with both DNA and RNA sequencing data

Sample specific database and consensus database

Application to the CPTAC project

Spectral library
Wang et al., manuscript in preparation
miRNA regulation: motivation
[Figure: inverse correlation of miRNA expression with mRNA expression (mRNA decay), with the protein/mRNA ratio (translation repression), and with protein expression (combined effect)]
miRNA regulation: data preparation

9 colorectal cancer cell lines

Protein expression data: Current study

mRNA expression data: GSE10843

miRNA expression data: GSE10833
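
With matched miRNA, mRNA, and protein profiles across the same cell lines, the three correlations used in this project can be sketched as follows. This is an illustration under stated assumptions: all values are taken to be on a log2 scale (so the protein/mRNA ratio becomes a difference), and Pearson correlation is used because the slides do not name the correlation measure. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mirna_correlations(
    mirna: Sequence[float],
    mrna: Sequence[float],
    protein: Sequence[float],
) -> Dict[str, float]:
    """Correlate one miRNA with one target gene across cell lines.

    Inputs are assumed to be log2 values in the same cell line order.
    """
    # On a log scale, the protein/mRNA ratio is a simple difference.
    ratio = [p - m for p, m in zip(protein, mrna)]
    return {
        "miRNA-mRNA": pearson(mirna, mrna),        # proxy for mRNA decay
        "miRNA-ratio": pearson(mirna, ratio),      # proxy for translational repression
        "miRNA-protein": pearson(mirna, protein),  # combined effect
    }
```

A negative miRNA-mRNA correlation is read as evidence for mRNA decay, a negative miRNA-ratio correlation as evidence for translational repression, and a negative miRNA-protein correlation as the combined effect.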
miRNA regulation: data analysis workflow
Dissecting microRNA-mediated regulation

Input: 79 miRNAs, 5144 genes

Correlation                  Correlated with miRNA   Effect estimated
miRNA-mRNA correlation       mRNA                    mRNA decay
miRNA-ratio correlation      protein/mRNA ratio      Translational repression
miRNA-protein correlation    protein                 Combined effect

Significant correlation: 7235 functional relationships
Binding evidence (TargetScan, miRanda or MirTarget2): 580 miRNA-target interactions, 60 miRNAs, 423 genes

The contribution of mRNA decay and translation repression

Major contribution       miRNA-mRNA   miRNA-ratio   miRNA-protein
mRNA decay               S            NS            *
Translation repression   NS           S             *
Both weak                NS           NS            S
Both strong              S            S             S

(S: significant correlation; NS: not significant)

Sequence features on site efficacy
Association of sequence features with estimated mRNA decay or translation repression:
Site type
Site location
Local AU-context
Additional 3’ pairing
Liu et al., manuscript in preparation
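
The assignment of a major contribution from the three significance calls can be written as a small decision function. This is a sketch of one reading of the slide, not the authors' code: it assumes the S/NS grid is read per category (mRNA decay: miRNA-mRNA significant, miRNA-ratio not; translation repression: the reverse; both weak: only miRNA-protein significant; both strong: all three significant) and that `*` means the miRNA-protein call may be either. The function name is hypothetical.

```python
from typing import Optional


def major_contribution(
    mrna_sig: bool, ratio_sig: bool, protein_sig: bool
) -> Optional[str]:
    """Classify one miRNA-target pair from three significance calls.

    mrna_sig:    miRNA-mRNA correlation is significant
    ratio_sig:   miRNA-ratio correlation is significant
    protein_sig: miRNA-protein correlation is significant
    Returns None for patterns the grid does not cover.
    """
    if mrna_sig and ratio_sig and protein_sig:
        return "both strong"
    if mrna_sig and not ratio_sig:
        return "mRNA decay"  # protein-level call may be either (*)
    if ratio_sig and not mrna_sig:
        return "translation repression"  # protein-level call may be either (*)
    if protein_sig and not mrna_sig and not ratio_sig:
        return "both weak"
    return None
```

How significance is called (test and cutoff) is left outside the function because the slides do not specify it.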
miRNA regulation: mRNA decay or translational repression?

Early studies suggest a major role of translational repression
Olsen et al., Dev Biol, 1999; Zeng et al., Molecular Cell, 2001

Recent large-scale studies suggest a predominant role of mRNA decay
Baek et al., Nature, 2008; Selbach et al., Nature, 2008; Guo et al., Nature, 2010

Our study suggested equally important roles of mRNA decay and translational repression

Translational repression was involved in 58% and played a major role in 30% of all predicted miRNA-target interactions

Most miRNAs exert their effect through both mRNA decay and translational repression

Sequence features known to drive site efficacy in mRNA decay were generally not applicable to translational repression
miR-138 prefers translational repression
[Figure, panels A-C: network with edge and node legends, and a cell line comparison]
Edge legend: TR, protein & ratio-level; TR_o, ratio-level only; B_w, protein-level only; supported by more than one method or by one method
Node legend: cell adhesion or migration; other functions or unknown

Cell line   Metastasis   miR-138 (log2)
SW620       high         3.06
SW480       poor         6.39
NetGestalt: motivation
[Figure: DNA (mutation, methylation), mRNA (expression, splicing), and Protein (expression, modification) layers, with Network and Phenotype]
NetGestalt: scalable network representation
Proteins
3210

Total number of modules (size >30): 92

Functional homogeneity: 63 (69%)

Spatial homogeneity: 55 (60%)

Dynamic homogeneity: 69 (75%)

Homogeneity of any type: 82 (89%)
NetGestalt: viewing and cross-correlating data
Viewing data as tracks


Heat map (e.g. gene expression data)

Bar chart (e.g. fold changes, p values)

Binary track (e.g. significant genes, GO)
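
As an illustration of how values for a bar-chart track and a binary track might be prepared, the sketch below computes a signed -log(p) value and a significance flag per gene. It is not NetGestalt code: the base-10 logarithm, taking the sign from an effect size such as a log fold change, the cutoff `alpha`, and the function names are all assumptions.

```python
import math
from typing import Dict, Tuple


def signed_neg_log_p(p_value: float, effect: float) -> float:
    """Return -log10(p) carrying the sign of the effect.

    effect: any signed statistic, e.g. a log fold change (assumption).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    if magnitude == 0:
        return 0.0  # p = 1 carries no evidence in either direction
    return math.copysign(magnitude, effect)


def build_tracks(
    stats: Dict[str, Tuple[float, float]], alpha: float = 0.05
) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Build a bar-chart track and a binary track from per-gene statistics.

    stats: {gene: (p_value, effect)}
    alpha: hypothetical significance cutoff for the binary track.
    """
    bar_track = {g: signed_neg_log_p(p, e) for g, (p, e) in stats.items()}
    binary_track = {g: p <= alpha for g, (p, _) in stats.items()}
    return bar_track, binary_track
```

The sign lets one track show both the strength and the direction of a difference between two groups.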
Comparing binary tracks


Clickable Venn diagram
Enrichment analysis


Network modules

GO terms

Pathways
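
Enrichment of a binary track in a network module, GO term, or pathway is commonly assessed with a hypergeometric test. The sketch below uses that test as an assumption; the slides do not state which statistic NetGestalt applies, and the function names are hypothetical.

```python
from math import comb
from typing import Set


def hypergeom_enrichment_p(
    universe: int, annotated: int, module: int, overlap: int
) -> float:
    """P(X >= overlap) for a hypergeometric draw.

    universe:  number of genes in the network
    annotated: genes flagged in the binary track
    module:    genes in the module (or GO term, pathway)
    overlap:   flagged genes that fall inside the module
    """
    upper = min(annotated, module)
    favorable = sum(
        comb(annotated, k) * comb(universe - annotated, module - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(universe, module)


def module_enrichment_p(
    track_genes: Set[str], module_genes: Set[str], universe_genes: Set[str]
) -> float:
    """Enrichment p-value of a binary track within one module."""
    # Restrict both sets to the universe before counting the overlap.
    track = track_genes & universe_genes
    module = module_genes & universe_genes
    return hypergeom_enrichment_p(
        len(universe_genes), len(track), len(module), len(track & module)
    )
```

Running this once per module and keeping the modules with small p-values (after multiple-testing correction) would give a list of enriched modules for a track.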
Navigating at different scales


Zoom

Pan

2D graph visualization
Shi et al., manuscript under revision
NetGestalt workflow: browsing data sources, viewing data as tracks, comparing tracks, identifying modules, moving across scales, annotating modules

[Screenshot: ruler and network modules with -log(p) signed tracks labeled Luminal B and Basal for PNNL proteomics, Vandy proteomics, and TCGA microarray data, plus binary tracks of differential proteins and differential genes]
[Screenshot: the same tracks with track comparison; percentages shown: 51%, 45%, 0%, 4%]
[Screenshot: enriched modules for the Vandy, PNNL, and microarray tracks]
[Screenshot: MRM targets and the DNA damage response module, with gene symbols and -log(p) signed tracks (Vandy, PNNL, microarray)]
[Screenshot: T cell activation module, with proteomics and microarray -log(p) signed tracks and proteomics enriched modules]
Informatics approaches to integrate genomic and proteomic data

Using genomic data to improve proteomic data analysis

Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics

Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis

Integrating genomic and proteomic data to gain novel biological insights

Project 3. miRNA-mediated regulation: understanding post-transcriptional mechanisms regulating human gene expression

Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context
Acknowledgement

Qi Liu

Dan Liebler

Jing Wang

Rob Slebos

Xiaojing Wang

Dave Tabb

Jing Zhu

Zhiao Shi
Funding:
NIGMS R01GM088822
NCI U24CA159988
NCI P50CA095103