ONCOMINE: A Bioinformatics Infrastructure for Cancer Genomics

Dan Rhodes
Chinnaiyan Laboratory
Bioinformatics Program
Cancer Biology Training Program
Medical Scientist Training Program
University of Michigan Medical School

Outline
Background
– DNA Microarrays and the Cancer Transcriptome
ONCOMINE
– Data collection, normalization & storage
– Statistical Analysis
– Visualization of Data and Analysis
ONCOMINE Data Integration
– Therapeutic Targets / Biomarkers
– Metabolic and Signaling Pathways
– Known protein-protein Interactions
ONCOMINE tutorial

The Cancer Transcriptome
180+ studies profiling human cancer
Each profiling 5 – 100+ samples
We estimate > 10,000 microarrays
10k chips measuring 20k genes = 200+ million data points

Oncomine
oncology + data-mining = oncomine
105 independent datasets (90 analyzed)
7,292 cancer microarrays
79 million gene expression measurements
382 distinct cancer signatures
> 5 million tests of differential expression
> 5 million tests of gene set enrichment
> 5 billion pairwise correlations

Oncomine
Database – relational, Oracle 9.2
Statistical computing – R, Perl, Java
Front End – Java Server Pages
Server – Apache/Tomcat
Graphics – Scalable Vector Graphics (SVG)

Data Collection
Monthly PubMed searches (cancer + microarray + transcriptome + tumor + gene expression profiling)
Gene Expression Repositories
– Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)
– ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
– Stanford Microarray Database (http://genomewww5.stanford.edu/)
– Whitehead Cancer Genomics (http://www.broad.mit.edu/cancer/)

Data Normalization
Global normalization
– same scaling factors applied to all microarray features
– mean and variance normalization
Affymetrix - Quantile normalization
Spotted cDNA - Loess normalization
– normalize an M vs. A plot

Data Storage
Generic data structures to accommodate a variety of data
– Samples
– Microarray Features / Genes
– Normalized Data
– Statistical Tests
– Gene Sets

Differential Expression Analysis
Two-sided t-test for each gene
False discovery rate correction for multiple hypothesis testing
R, Oracle, RODBC

Oncomine Tutorial Part I
• Gene Differential Expression
• Gene Co-Expression
• Study Differential Expression
WWW.ONCOMINE.ORG
EMAIL: SHORTCOURSE
PASSWORD: MCBI

Therapeutic Targets / Biomarkers
Gene Ontology Consortium
– Biological Process (apoptosis, cell cycle)
– Cellular Component (cytoplasmic membrane, extracellular)
– Molecular Function (kinase, phosphatase, protease, etc.)
Known Therapeutic Targets
– NCI Clinical Trials Database
– Therapeutic Target Database

Therapeutic Target Database
338 proteins with literature-documented inhibitor, antagonist, blocker, etc.
http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp

Known Drug Targets Expressed in Bladder Cancer

Secreted proteins highly expressed in Ovarian Cancer

Metabolic & Signaling Pathways
KEGG – Kyoto Encyclopedia of Genes & Genomes
– 87 metabolic pathways, 1700 gene assignments
Biocarta
– Signaling pathways reviewed and entered by ‘expert’ biologists
– 215 signaling pathways, 3700 gene assignments

Pathway enrichment analysis
Identify pathways and functional groups of genes deregulated in particular cancer types
Enrichment Analysis using Kolmogorov-Smirnov Scanning (Lamb et al)

Kolmogorov-Smirnov Scanning (Lamb et al)
(1, 2, 3, 4, …, 19, 20) vs. (2, 4, 6, 7, 18)

Pathway Enrichment
Liver vs. other Normal tissues

Pathway Enrichment cont.

Pathway enrichment analysis
A search for the Biocarta pathways most enriched in a medulloblastoma signature (C2) uncovered involvement of the Ras/Rho pathway

Pathway enrichment analysis cont.
A direct link to the Biocarta pathway provides the details (medulloblastoma genes shown in red boxes)

Known Protein-Protein Interactions
HPRD – Human Protein Reference Database
– Manually curated
– 20,000+ papers, 15,000+ distinct interactions
PKDB – Protein Kinase Database
– Natural Language Processing
– 60,000+ abstracts suggest interaction, 16,000 distinct interactions
– Error prone
Co-RIF – LocusLink Reference into Function
– 12,000+ co-RIFs

Human Interactome Map (www.himap.org)

INTERACT

Oncomine Tutorial Part II
Gene set filtering to identify therapeutic targets and biomarkers
Enrichment Analysis to identify pathways and processes deregulated in cancer
Pathway and protein interaction networks deregulated in cancer

Acknowledgements
Chinnaiyan Lab – Radhika, Terry, Vasu, Jianjun, Scott, Soory
Pandey Lab IOB – Shanker, Nandan
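
Supplementary sketch: quantile normalization
The Data Normalization slide lists quantile normalization for Affymetrix arrays. The Python sketch below shows the general idea only: every array is forced to share the same distribution by replacing each value with the mean of the values holding the same rank across arrays. The function name quantile_normalize and the toy intensities are hypothetical, ties are broken by position rather than averaged, and the actual Oncomine pipeline (R, Perl, Java) may differ in its details.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per microarray).

    Hypothetical helper for illustration; ties are not averaged.
    """
    n_features = len(arrays[0])
    n_arrays = len(arrays)

    # Sort each array, then average the values found at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[r] for s in sorted_arrays) / n_arrays for r in range(n_features)
    ]

    normalized = []
    for a in arrays:
        # order[r] is the index of the feature with rank r in this array.
        order = sorted(range(n_features), key=lambda i: a[i])
        out = [0.0] * n_features
        for r, i in enumerate(order):
            out[i] = rank_means[r]
        normalized.append(out)
    return normalized


# Toy example: two arrays measuring three features each.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(toy))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain the same set of values (1.5, 3.5, 5.5), and only the order of features within each array differs.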
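
Supplementary sketch: false discovery rate correction
The Differential Expression Analysis slide names false discovery rate correction for the per-gene t-tests but does not say which procedure is used. The sketch below assumes the Benjamini-Hochberg step-up method, a common choice, and takes the p-values as given. The function name benjamini_hochberg and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order.

    Assumption: the slides do not name the FDR method; BH is used here
    purely as an illustration.
    """
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over its own rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy example: p-values for four genes (made-up numbers).
pvals = [0.01, 0.04, 0.03, 0.20]
print([round(q, 4) for q in benjamini_hochberg(pvals)])
```

Genes whose q-value falls below a chosen threshold would then be reported as differentially expressed at that false discovery rate.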
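
Supplementary sketch: Kolmogorov-Smirnov scanning
The Kolmogorov-Smirnov Scanning slide compares a ranked list (1 through 20) with the ranks of a gene set (2, 4, 6, 7, 18). The sketch below shows one common way to score such a comparison: walk down the ranked list, step up at each gene-set member and down at every other gene, and report the largest deviation of the running sum from zero. This is a generic unweighted running-sum statistic, not necessarily the exact procedure of Lamb et al, and the function name ks_running_sum is hypothetical. Significance testing (for example by permutation) is omitted.

```python
def ks_running_sum(n_genes, hit_ranks):
    """Return the maximum-deviation running-sum score for a gene set.

    n_genes:   length of the ranked gene list (ranks 1..n_genes)
    hit_ranks: ranks at which gene-set members occur
    Illustrative only; not claimed to match the method of Lamb et al.
    """
    hits = set(hit_ranks)
    step_up = 1.0 / len(hits)                # increment at a gene-set member
    step_down = 1.0 / (n_genes - len(hits))  # decrement at any other gene
    running = 0.0
    best = 0.0
    for rank in range(1, n_genes + 1):
        running += step_up if rank in hits else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# The example from the slide: 20 ranked genes, set members at ranks 2, 4, 6, 7, 18.
score = ks_running_sum(20, [2, 4, 6, 7, 18])
print(round(score, 3))  # 0.6
```

A positive score means the gene set clusters toward the top of the ranked list. Here the peak is reached at rank 7, after four of the five members have been seen.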