Oncomine 2.0 - University of Michigan

ONCOMINE: A Bioinformatics Infrastructure for Cancer Genomics
Dan Rhodes
Chinnaiyan Laboratory
Bioinformatics Program
Cancer Biology Training Program
Medical Scientist Training Program
University of Michigan Medical School
Outline

Background
– DNA Microarrays and the Cancer Transcriptome

ONCOMINE
– Data collection, normalization & storage
– Statistical Analysis
– Visualization of Data and Analysis

ONCOMINE Data Integration
– Therapeutic Targets / Biomarkers
– Metabolic and Signaling Pathways
– Known protein-protein Interactions

ONCOMINE tutorial
The Cancer Transcriptome
180+ studies profiling human cancer
Each profiling 5 – 100+ samples
We estimate > 10,000 microarrays
10k chips measuring 20k genes
= 200+ million data points
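A quick check of the estimate above, using the slide's round figures. The variable names are illustrative only.

```python
# Back-of-the-envelope size of the cancer transcriptome,
# using the round numbers quoted on the slide.
n_arrays = 10_000          # "10k chips"
genes_per_array = 20_000   # "20k genes"

data_points = n_arrays * genes_per_array
print(f"{data_points:,} data points")  # 200,000,000 data points
```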
Oncomine
oncology + data-mining = oncomine







105 independent datasets (90 analyzed)
7,292 cancer microarrays
79 million gene expression measurements
382 distinct cancer signatures
> 5 million tests of differential expression
> 5 million tests of gene set enrichment
> 5 billion pairwise correlations
Oncomine





Database – relational, Oracle 9.2
Statistical computing – R, Perl, Java
Front End – Java Server Pages
Server – Apache/Tomcat
Graphics – Scalable Vector Graphics (SVG)
Data Collection


Monthly PubMed searches (cancer + microarray + transcriptome + tumor + gene expression profiling)
Gene Expression Repositories
– Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/)
– ArrayExpress (http://www.ebi.ac.uk/arrayexpress/)
– Stanford Microarray Database (http://genome-www5.stanford.edu/)
– Whitehead Cancer Genomics (http://www.broad.mit.edu/cancer/)
Data Normalization



Global normalization – same scaling factors applied to all microarray features (mean and variance normalization)
Affymetrix – Quantile normalization
Spotted cDNA – Loess normalization
– normalize an M vs. A plot
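The two normalization ideas above can be sketched in a few lines. This is a minimal illustration, not the Oncomine pipeline: the function names and the toy matrix are assumptions, ties are broken by order of appearance, and the loess fit itself is left out (only the M and A values that a loess curve would be fitted to are computed).

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Simplification: ties are broken by order of appearance.
    """
    order = np.argsort(x, axis=0, kind="stable")       # per-array sort order
    ranks = np.argsort(order, axis=0)                  # rank of each gene within its array
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)   # mean of the k-th smallest values
    return mean_quantiles[ranks]                       # give every array the same distribution

def ma_values(red, green):
    """M (log ratio) and A (mean log intensity) for a two-color array."""
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    return m, a

# Toy data: 4 genes measured on 3 arrays (made-up numbers).
x = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
qn = quantile_normalize(x)

m, a = ma_values(np.array([4.0, 8.0]), np.array([1.0, 8.0]))
```

After quantile normalization every array has the same sorted values, while each gene keeps its rank within its own array. For the spotted-array case, loess normalization would fit a smooth curve of M against A and subtract it from M.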
Data Storage






Generic data structures to accommodate a variety of data
Samples
Microarray Features / Genes
Normalized Data
Statistical Tests
Gene Sets
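One way to picture the five generic structures listed above is as linked relational tables. The sketch below is hypothetical throughout: the table names, column names, and sample rows are invented for illustration, and an in-memory SQLite database stands in for the Oracle database named earlier so that the example runs anywhere.

```python
import sqlite3

# Hypothetical schema; none of these names come from Oncomine itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample  (sample_id INTEGER PRIMARY KEY, study TEXT, class_label TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE normalized_data (
    sample_id  INTEGER REFERENCES sample(sample_id),
    feature_id INTEGER REFERENCES feature(feature_id),
    value      REAL,
    PRIMARY KEY (sample_id, feature_id));
CREATE TABLE statistical_test (
    test_id    INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature(feature_id),
    comparison TEXT, t_stat REAL, p_value REAL);
CREATE TABLE gene_set (set_name TEXT, gene_symbol TEXT);
""")

# Made-up rows, just to show how the tables join.
conn.execute("INSERT INTO sample VALUES (1, 'demo study', 'tumor')")
conn.execute("INSERT INTO feature VALUES (10, 'GENE_A')")
conn.execute("INSERT INTO normalized_data VALUES (1, 10, 1.25)")

row = conn.execute("""
    SELECT s.class_label, f.gene_symbol, d.value
    FROM normalized_data d
    JOIN sample  s ON s.sample_id  = d.sample_id
    JOIN feature f ON f.feature_id = d.feature_id
""").fetchone()
print(row)  # ('tumor', 'GENE_A', 1.25)
```

Keeping samples, features, and measurements in separate tables is what lets one layout hold data from many array platforms and studies.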
Differential Expression Analysis


Two-sided t-test for each gene
False discovery rate correction for multiple hypothesis testing
R, Oracle, RODBC
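The per-gene test and correction above can be sketched as follows. The slides name R as the computing environment; this Python version is only a stand-in. Two assumptions are made because the slides do not specify them: the t-test is the unequal-variance (Welch) variant, and the false discovery rate correction is the Benjamini-Hochberg procedure. The expression data are synthetic.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(adjusted, 1.0)
    return q

# Synthetic data: 50 genes, 10 tumor and 10 normal samples.
rng = np.random.default_rng(0)
n_genes = 50
tumor = rng.normal(size=(n_genes, 10))
normal = rng.normal(size=(n_genes, 10))
tumor[0] += 5.0  # spike gene 0 so that it is truly differentially expressed

# Two-sided t-test for each gene (one row per gene); Welch variant assumed.
t_stat, p = stats.ttest_ind(tumor, normal, axis=1, equal_var=False)
q = benjamini_hochberg(p)
significant = np.flatnonzero(q < 0.05)
```

With thousands of genes tested per study, raw p-values alone would flag many genes by chance, which is why the q-values are the quantity to threshold.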
Oncomine Tutorial part I
• Gene Differential Expression
• Gene Co-Expression
• Study Differential Expression
WWW.ONCOMINE.ORG
EMAIL: SHORTCOURSE
PASSWORD: MCBI
Therapeutic Targets / Biomarkers

Gene Ontology Consortium
– Biological Process (apoptosis, cell cycle)
– Cellular Component (cytoplasmic membrane, extracellular)
– Molecular Function (kinase, phosphatase, protease, etc.)

Known Therapeutic Targets
– NCI Clinical Trials Database
– Therapeutic Target Database
Therapeutic Target Database
338 proteins with a literature-documented inhibitor, antagonist, blocker, etc.
http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp
Known Drug Targets Expressed in Bladder Cancer
Secreted proteins highly expressed in Ovarian Cancer
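Filtering a cancer signature by a gene set, as in the two examples above, amounts to intersecting the significant genes with the set of interest. The sketch below is illustrative only: the gene identifiers, q-values, and the `filter_by_gene_set` helper are hypothetical and are not taken from Oncomine or the Therapeutic Target Database.

```python
# Hypothetical inputs: gene -> q-value for over-expression in one cancer type,
# and a set of genes annotated as known drug targets.
overexpressed_q = {"GENE_A": 0.001, "GENE_B": 0.20, "GENE_C": 0.01, "GENE_D": 0.03}
known_drug_targets = {"GENE_A", "GENE_D", "GENE_X"}

def filter_by_gene_set(q_values, gene_set, max_q=0.05):
    """Return significant genes that belong to gene_set, most significant first."""
    hits = [gene for gene, q in q_values.items() if q <= max_q and gene in gene_set]
    return sorted(hits, key=q_values.get)

candidates = filter_by_gene_set(overexpressed_q, known_drug_targets)
print(candidates)  # ['GENE_A', 'GENE_D']
```

The same filter works for any annotation, for example a set of secreted proteins when looking for candidate biomarkers.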
Metabolic & Signaling Pathways

KEGG
– Kyoto Encyclopedia of Genes & Genomes
– 87 metabolic pathways, 1700 gene assignments

Biocarta
– Signaling pathways reviewed and entered by ‘expert’ biologists
– 215 signaling pathways, 3700 gene assignments
Pathway enrichment analysis

Identify pathways and functional groups of genes deregulated in particular cancer types
Enrichment Analysis using Kolmogorov-Smirnov Scanning (Lamb et al)
Kolmogorov-Smirnov Scanning (Lamb et al)
[Figure: genes ranked 1 to 20, with gene-set members marked *; the full ranking (1,2,3,4…,19,20) is compared vs. the gene-set ranks (2,4,6,7,18)]
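The comparison in the figure can be worked through numerically. The sketch below is one reading of it, not necessarily the exact procedure of Lamb et al: it walks down the ranked list and tracks the gap between the empirical distribution of the gene-set ranks and that of all ranks, reporting the largest gap as the Kolmogorov-Smirnov statistic D. The function name `ks_scan` is an assumption, and no p-value is computed.

```python
def ks_scan(all_ranks, set_ranks):
    """Return (D, rank): the largest gap between the two empirical CDFs
    and the rank at which it occurs."""
    members = set(set_ranks)
    n_all, n_set = len(all_ranks), len(members)
    best_d, best_rank = 0.0, None
    seen_set = 0
    for seen_all, rank in enumerate(sorted(all_ranks), start=1):
        if rank in members:
            seen_set += 1
        # Fraction of the gene set seen so far minus fraction of all genes seen so far.
        d = seen_set / n_set - seen_all / n_all
        if abs(d) > abs(best_d):
            best_d, best_rank = d, rank
    return best_d, best_rank

# The example from the figure: 20 ranked genes, gene set at ranks 2, 4, 6, 7, 18.
d, rank = ks_scan(range(1, 21), [2, 4, 6, 7, 18])
print(round(d, 3), rank)  # 0.45 7
```

A positive D means the gene set is concentrated toward the top of the ranking: by rank 7, four of the five set members (80%) have been seen, against 35% of all genes.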
Pathway Enrichment
Liver vs. other normal tissues
Pathway Enrichment cont.
Pathway enrichment analysis
A search for the Biocarta pathways most enriched in a medulloblastoma signature (C2) uncovered involvement of the Ras/Rho pathway
Pathway enrichment analysis cont.
A direct link to the Biocarta pathway provides the details (medulloblastoma genes shown with red boxes)
Known Protein-Protein Interactions

HPRD
– Human Protein Reference Database
– Manually curated
– 20,000+ papers, 15,000+ distinct interactions

PKDB
– Protein Kinase Database
– Natural Language Processing
– 60,000+ abstracts suggest interaction, 16,000 distinct interactions
– Error prone

Co-RIF
– Locus Link Reference into Function
– 12,000+ co-RIFs
Human Interactome Map (www.himap.org)
INTERACT
Oncomine Tutorial Part II



• Gene set filtering to identify therapeutic targets and biomarkers
• Enrichment Analysis to identify pathways and processes deregulated in cancer
• Pathway and protein interaction networks deregulated in cancer
Acknowledgements

Chinnaiyan Lab
– Radhika, Terry, Vasu, Jianjun, Scott, Soory

Pandey Lab

IOB
– Shanker, Nandan