Data integration across omics landscapes
Bing Zhang, Ph.D.
Department of Biomedical Informatics
Vanderbilt University School of Medicine
bing.zhang@vanderbilt.edu
CNCP2012

Omics data integration
[Figure labels: DNA, mRNA, Protein, Elephant]

Informatics approaches to integrate genomic and proteomic data
Programs: The Cancer Genome Atlas (TCGA); Clinical Proteomic Tumor Analysis Consortium (CPTAC)

  Data Type                        Technology
  Genome
    Mutations, sequence variants   Exome Sequencing, RNA-Seq
    CNV                            arrayCGH, SNP Array
    LOH                            SNP Array
  Epigenome (EG)
    DNA Methylation                Methylation Array
  Transcriptome
    Exon expression                Array, RNA-Seq
    Junction expression            RNA-Seq
    Gene expression                Array, RNA-Seq
  Proteome
    Protein expression             MS/MS
    Protein PTM                    MS/MS, protein arrays

Informatics approaches to integrate genomic and proteomic data
[Diagram: Genomic data -> Improved proteomic data analysis; Genomic data + Proteomic data -> Novel biological insights]
Using genomic data to improve proteomic data analysis
  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics
  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis
Integrating genomic and proteomic data to gain novel biological insights
  Project 3. miRNA-mediated regulation: understanding posttranscriptional mechanisms regulating human gene expression
  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context

customProDB: motivation
[Diagram labels: Proteins with sequence variation; Unexpressed proteins; Expressed proteins; Commonly used database; Database search]

Customized protein database from RNA-Seq data
  Increased sensitivity
  Reduced ambiguity
  Variant peptides
Wang et al., J Proteome Res, 2012

CustomProDB: moving forward
  R package
  Compatible with both DNA and RNA sequencing data
  Sample-specific database and consensus database
  Application to the CPTAC project
  Spectral library
Wang et al., manuscript in preparation

miRNA regulation: motivation
Inverse correlation between miRNA expression and:
  mRNA expression: mRNA decay
  Protein/mRNA ratio: translation repression
  Protein expression: combined effect

miRNA regulation: data preparation
  9 colorectal cancer cell lines
  Protein expression data: current study
  mRNA expression data: GSE10843
  miRNA expression data: GSE10833

miRNA regulation: data analysis workflow
Dissecting microRNA-mediated regulation
  miRNA-mRNA correlation: mRNA decay
  miRNA-ratio correlation (protein/mRNA ratio): translational repression
  miRNA-protein correlation: combined effect
[Figure labels: 79 miRNAs; 5144 genes; Significant correlation; 7235 functional relationships; Binding evidence (TargetScan, miRanda or MirTarget2); miRNA-target interactions; 580 interactions, 60 miRNAs, 423 genes]
Sequence features on site efficacy
  Association of sequence features with estimated mRNA decay or translation repression
  Site type
  Site location
  Local AU-context
  Additional 3' pairing
The contribution of mRNA decay and translation repression (S: significant correlation; NS: not significant)

  miRNA-mRNA   miRNA-ratio   miRNA-protein   Major contribution
  S            NS            *               mRNA decay
  NS           S             *               Translation repression
  NS           NS            S               Both weak
  S            S             S               Both strong

Liu et al., manuscript in preparation

miRNA regulation: mRNA decay or translational repression?
Early studies suggest a major role of translational repression (Olsen et al., Dev Biol, 1999; Zeng et al., Molecular Cell, 2001)
Recent large-scale studies suggest a predominant role of mRNA decay (Baek et al., Nature, 2008; Selbach et al., Nature, 2008; Guo et al., Nature, 2010)
Our study suggested equally important roles of mRNA decay and translational repression
  Translational repression was involved in 58% and played a major role in 30% of all predicted miRNA-targeted interactions
  Most miRNAs exert their effect through both mRNA decay and translational repression
  Sequence features known to drive site efficacy in mRNA decay were generally not applicable to translational repression

miR-138 prefers translational repression
[Figure, panels A to C. Legend:
  Edge: TR, protein & ratio-level; TR_o, ratio-level only; B_w, protein-level only
  Edge support: supported by more than one method; supported by one method
  Node: cell adhesion or migration; other functions or unknown]

                   SW620   SW480
  metastasis       high    poor
  miR-138 (log2)   3.06    6.39

NetGestalt: motivation
[Diagram labels: DNA (mutation, methylation); mRNA (expression, splicing); Protein (expression, modification); Network; Phenotype]

NetGestalt: scalable network representation
[Figure labels: Proteins 3210]
  Total number of modules (size >30): 92
  Functional homogeneity: 63 (69%)
  Spatial homogeneity: 55 (60%)
  Dynamic homogeneity: 69 (75%)
  Homogeneity of any type: 82 (89%)

NetGestalt: viewing and cross-correlating data
Viewing data as tracks
  Heat map (e.g. gene expression data)
  Bar chart (e.g. fold changes, p values)
  Binary track (e.g. significant genes, GO)
Comparing binary tracks
  Clickable Venn diagram
  Enrichment analysis: network modules, GO terms, pathways
Navigating at different scales
  Zoom
  Pan
  2D graph visualization
Shi et al., manuscript under revision

  Browsing data sources
  Viewing data as tracks
  Comparing tracks
  Identifying modules
  Moving across scales
  Annotating modules
[Screenshots, one per step above. Tracks shown: Ruler; Network modules; Proteomics (Vandy, PNNL) and TCGA Microarray, each as signed -log(p) with Luminal B and Basal labels; Diff proteins; Diff genes; Enriched Modules; MRM targets; Gene symbol. Venn diagram percentages: 51%, 45%, 0%, 4%. Annotated modules: DNA damage response; T cell activation.]

Acknowledgement
  Qi Liu, Dan Liebler, Jing Wang, Rob Slebos, Xiaojing Wang, Dave Tabb, Jing Zhu, Zhiao Shi
  Funding: NIGMS R01GM088822; NCI U24CA159988; NCI P50CA095103
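
Illustrative sketch: miRNA regulation workflow
The miRNA data analysis workflow slide correlates miRNA expression with mRNA expression, with the protein/mRNA ratio, and with protein expression, then assigns a major contribution from the pattern of significant (S) and non-significant (NS) correlations: S/NS/any gives mRNA decay, NS/S/any gives translation repression, NS/NS/S gives both weak, S/S/S gives both strong. The Python sketch below is one possible reading of that rule, not the authors' implementation. The slides do not state the correlation statistic or the significance test, so Pearson correlation on log2 values and a fixed cutoff (r <= -0.6) are assumptions made here for illustration, as are the function names `pearson` and `classify_regulation` and the example numbers.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classify_regulation(mirna, mrna, protein, cutoff=-0.6):
    """Classify one miRNA-target pair from log2 profiles across the same samples.

    ASSUMPTION: "significant" is approximated by an inverse correlation
    with r <= cutoff; a real analysis would use a proper statistical test.
    """
    # log2(protein/mRNA) is the difference of the log2 values
    ratio = [p - m for p, m in zip(protein, mrna)]
    s_mrna = pearson(mirna, mrna) <= cutoff        # miRNA-mRNA
    s_ratio = pearson(mirna, ratio) <= cutoff      # miRNA-ratio
    s_protein = pearson(mirna, protein) <= cutoff  # miRNA-protein

    if s_mrna and s_ratio:
        return "both strong" if s_protein else "unclassified"
    if s_mrna:
        return "mRNA decay"
    if s_ratio:
        return "translation repression"
    if s_protein:
        return "both weak"
    return "unclassified"


# Made-up log2 profiles for 9 samples (hypothetical values, not data from the study)
mirna = [1, 2, 3, 4, 5, 6, 7, 8, 9]
mrna = [9, 8, 7, 6, 5, 4, 3, 2, 1]
protein = [9, 9, 7, 5, 5, 5, 3, 1, 1]
print(classify_regulation(mirna, mrna, protein))
```

In this made-up example the mRNA profile falls as the miRNA rises while the protein/mRNA ratio does not track the miRNA, which is the pattern the slide labels as mRNA decay. Patterns not listed on the slide (for example S/S/NS) are returned as "unclassified" rather than guessed.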