Visualizing RNA Expression Data
John Quackenbush
VIZBI
16 March 2011
Northern Blots:
Before the dawn of Time
Northern Blots
Quantitative RT-PCR
The Pre-Modern Era
Quantitative PCR
Quantitative PCR and other Methods
Large-scale Quantitative RT-PCR:
The Dawn of the Modern Age
An Aside: The Birth of Clustering
Our World Today:
A Microarray Overview
History is written by the victors
(or those who produce software):
The Birth of Clustering
This was also the start of tormenting
the red-green color-blind.
Truth is determined by the person
giving the talk:
MeV is the best clustering tool ever!
http://www.tm4.org
Public Microarray Data
ArrayExpress
• 20,423 Experiments (572,682 hybs/arrays)
GEO
• 21,320 Experiments (529,108 arrays)
CIBEX
• 148 Experiments (2,711 arrays)
SMD
• 21,521 Expts (80,319 incl private data)
>1,000,000 arrays x $500 = $500,000,000
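The back-of-the-envelope figure above can be checked with a few lines of Python. This is only a sketch: it treats the parenthesized number for each repository as an array count (an assumption for SMD, where the slide does not say what is being counted), it ignores any overlap between repositories, and the variable names are illustrative.

```python
# Counts copied from the slide; the SMD figure is ASSUMED to be an array count.
array_counts = {
    "ArrayExpress": 572_682,
    "GEO": 529_108,
    "CIBEX": 2_711,
    "SMD": 80_319,
}
COST_PER_ARRAY_USD = 500  # per-array cost used on the slide

# Sum of the listed counts, and the cost at the slide's per-array price.
total_arrays = sum(array_counts.values())
total_cost = total_arrays * COST_PER_ARRAY_USD

# The slide's rounded lower bound: 1,000,000 arrays at $500 each.
lower_bound_cost = 1_000_000 * COST_PER_ARRAY_USD

print(f"{total_arrays:,} arrays listed, about ${total_cost:,}")
print(f"slide's lower bound: ${lower_bound_cost:,}")
```

The listed counts sum to more than one million, which is consistent with the ">1,000,000 arrays" on the slide.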
Cancer Studies account for >14% of all studies in databases…
EBI’s Expression Atlas Rocks!
Disease Progression and Personalized Care
Figure: timeline of the Natural History of Disease, from Birth through Treatment to Death. Labels along the timeline: Genetic Risk, Environment + Lifestyle, Early Detection, Biomarkers, Patient Stratification, Disease Staging, Treatment Options, Clinical Care, Outcomes, Quality of Life.
Welcome to the post-Modern World:
Next-Gen Technologies have Dramatically
Expanded our Genomic Universe
Browser-mania rules!
Back to Excel, Man’s Best Friend
RNA-Seq data of 7 FFPE blocks
And more websites are integrating
data
Cells Converge to Attractive States
Stuart Kauffman presented the idea of a gene expression landscape with attractors.
• Each of the ~250 stable cell types represents an attractor.
• Cells can be "pushed" or induced to converge to an attractor.
• Once in the attractor, a cell is robust to small perturbations.
Jess Mar
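The attractor picture can be illustrated with a toy model. The sketch below is not Kauffman's network model or the analysis from this talk: it assumes a hypothetical one-dimensional "landscape" V(x) = x^4/4 - x^2/2 with two valleys at x = -1 and x = +1, and lets a state roll downhill. The function names and starting values are made up for illustration.

```python
def step(x, dt=0.01):
    """One Euler step of dx/dt = x - x**3, i.e. downhill on V(x) = x**4/4 - x**2/2."""
    return x + dt * (x - x ** 3)


def settle(x, n_steps=5000):
    """Follow the dynamics long enough for the state to reach a valley."""
    for _ in range(n_steps):
        x = step(x)
    return x


# Different starting states converge to one of the two attractors (-1 or +1).
starts = [-2.0, -0.3, 0.2, 1.7]
finals = [settle(x0) for x0 in starts]

# A small perturbation away from the +1 attractor relaxes back to +1 ...
small_push = settle(1.0 + 0.3)
# ... while a push across the ridge at x = 0 ends up in the other attractor.
large_push = settle(1.0 - 1.5)

for x0, xf in zip(starts, finals):
    print(f"start {x0:+.1f} -> settles near {xf:+.3f}")
```

In this toy picture, "robust to small perturbations" means the state returns to the same valley, and "pushed to converge" means the push was large enough to cross the ridge between valleys.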
Differentiation of Promyelocytes into
Neutrophil-Like Cells
Time 0: Promyelocytes (HL-60 Cell Line)
Dimethyl Sulfoxide (DMSO)
All-Trans Retinoic Acid (ATRA)
~6 days
Day 7: Neutrophil-like Cells
Collins et al. PNAS 1978
Affymetrix GeneChip
RA is used in differentiation therapy for acute promyelocytic leukemia. Combined with chemotherapy, complete remission rates as high as 90-95% can be achieved.
Jess Mar
Huang et al. PRL 2005
GEDI: Cells Display Divergent
Trajectories That Eventually Converge as
they Differentiate
DMSO, ATRA
Graphical representation of the results from a Self-Organizing Map clustering.
Expression data from a single sample (time point) clustered according to a grid.
What factors drive this divergent-then-convergent behavior?
Huang et al. PRL 2005
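To make the Self-Organizing Map idea concrete, here is a minimal pure-Python sketch. It is not the GEDI implementation: the grid size, learning schedule, and toy expression profiles are all assumptions, and each tile is colored by its SOM weight as a stand-in for the average expression of the genes assigned to that tile. The SOM is trained on gene profiles (one value per time point), and one mosaic is then drawn per time point.

```python
import math
import random


def best_tile(weights, x):
    """Grid position whose weight vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda rc: sum((a - b) ** 2 for a, b in zip(weights[rc], x)))


def train_som(profiles, rows, cols, n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Fit a rows x cols SOM to gene profiles (one vector per gene)."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialise each tile's weight vector from a randomly chosen gene profile.
    weights = {(r, c): list(rng.choice(profiles))
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays towards 0
        sigma = sigma0 * (1.0 - frac) + 0.3   # neighbourhood radius shrinks
        x = rng.choice(profiles)
        bmu = best_tile(weights, x)
        for (r, c), w in weights.items():
            d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
            for k in range(dim):
                w[k] += lr * h * (x[k] - w[k])
    return weights


def mosaic(weights, rows, cols, sample_index):
    """One sample's mosaic: each tile shows its weight for that time point."""
    return [[weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)]


# Toy data (hypothetical): 60 genes, 4 time points, three underlying patterns.
rng = random.Random(1)
patterns = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 2, 2, 0]]
profiles = [[v + rng.gauss(0, 0.1) for v in patterns[g % 3]] for g in range(60)]

weights = train_som(profiles, rows=3, cols=3)
mosaics = [mosaic(weights, 3, 3, s) for s in range(4)]

for s, m in enumerate(mosaics):
    print(f"time point {s}")
    for row in m:
        print("  " + " ".join(f"{v:5.2f}" for v in row))
```

Because the tile layout is fixed after training, the same tile refers to the same group of genes in every mosaic, so mosaics from different time points or treatments can be compared side by side.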
Our Hypothesis
Figure: transition from State A to State B. Each Observed Trajectory (Perturbation 1, Perturbation 2) is drawn alongside its Transient Pathway (Perturbation 1, Perturbation 2) and a Core Differentiation Pathway. (Jess Mar)
Observed Trajectory
Figure: ATRA and DMSO time courses at 2, 4, 8, 12 and 18 hrs and at 1, 2, 3, 4, 5, 6 and 7 days. (Jess Mar)
Transient Trajectory
Figure: ATRA and DMSO time courses at 2, 4, 8, 12 and 18 hrs and at 1, 2, 3, 4, 5, 6 and 7 days. (Jess Mar)
Core Trajectory
Figure: ATRA and DMSO time courses at 2, 4, 8, 12 and 18 hrs and at 1, 2, 3, 4, 5, 6 and 7 days. (Jess Mar)
Ultimately, we’d like to get to pathways:
Functional Roles Are Associated with Constraint
High-variance genes tend to function as cell surface receptors.
Low-variance genes function as kinases and transferases.
Figure: cellular compartments (Extracellular, Membrane, Cytoplasm, Nuclear) with genes marked as high variance or low variance.
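One simple way to split genes into high- and low-variance groups is sketched below. The slide does not say how variance was measured or where the cutoff was placed, so the median split, the gene names, and the expression values here are all hypothetical.

```python
from statistics import median, variance

# Hypothetical expression values: one list of samples per gene.
expression = {
    "gene_a": [5.1, 7.9, 3.2, 9.4],
    "gene_b": [6.0, 6.1, 5.9, 6.0],
    "gene_c": [2.0, 4.5, 1.1, 6.3],
    "gene_d": [8.2, 8.0, 8.3, 8.1],
}

# Sample variance of each gene across samples.
gene_variance = {gene: variance(values) for gene, values in expression.items()}

# ASSUMPTION: split at the median variance; the talk may have used another rule.
cutoff = median(gene_variance.values())
high_variance = sorted(g for g, v in gene_variance.items() if v > cutoff)
low_variance = sorted(g for g, v in gene_variance.items() if v <= cutoff)

print("high variance:", high_variance)
print("low variance:", low_variance)
```

Each group can then be checked for enriched functional roles (receptors, kinases, transferases) with whatever annotation source is at hand.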
But the tools are very primitive
Variance Constraints Alter
Network Topology
Degree distributions for the MAPK module are significantly different
(Kolmogorov-Smirnov test).
Figure: density of node degree for high-variance and low-variance genes, one panel each for the SZ Group, Control Group and PD Group. P-values shown: 2.8 × 10^-7, 2.5 × 10^-4, 3.5 × 10^-4.
Degree of statistical significance is altered by disease status.
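For readers who have not met the two-sample Kolmogorov-Smirnov test, the sketch below compares two degree distributions in pure Python. The degree lists are made up, and the p-value uses the standard large-sample (asymptotic) series, which is only an approximation for samples this small; it is not the computation behind the P-values on the slide.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = bisect_right(a, v) / len(a)   # fraction of a that is <= v
        fb = bisect_right(b, v) / len(b)   # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0   # the series is numerically 1 here and converges slowly
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical node degrees for two gene groups within one network module.
high_variance_degrees = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6]
low_variance_degrees = [4, 5, 6, 6, 7, 8, 8, 9, 10, 12]

d = ks_statistic(high_variance_degrees, low_variance_degrees)
p = ks_pvalue(d, len(high_variance_degrees), len(low_variance_degrees))
print(f"D = {d:.2f}, approximate p = {p:.4f}")
```

A small p-value says the two degree distributions differ somewhere along their range; it does not say where or in which direction, which is why the density plots are shown alongside the test.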
So we’re back to Heat Maps
The transcriptional profiles of ONS XS cells from SZ patients more closely
resemble those of healthy fibroblasts than any other stem cell signature.
And of course, we’ve left out the
interesting stuff, like where genes are
expressed.
LGRC Research Portal
PAGE DETAILS
Search
-Facets
-Search within results
-Keyword prompts
-Search history
Table:
-Paged results
-Sortable columns
Actions:
-Go to Gene detail page
-Add genes to ‘gene set’
PAGE DETAILS
Annotation summary & summary view for each assay/data type:
Accordion style sections
Annotation Summary
Gene Expression Summary
-GEXP – expression profile across major Dx categories
-RNASeq – Exon structure of the gene
-SNPs – Table of SNPs in region of gene, highlighting association with major Dx group
-Methylation – Methylation profile in region around gene
-Genomic alterations – table of CNVs & alterations observed w/ freq in region around gene
Actions:
-Click through to assay detail page
-Add gene to set
RNASeq
LGRC Research Portal
PAGE DETAILS
-View aggregate statistics
-View cohort details
-Build cohort sets
-Build composite phenotypes
Actions:
-Go to data download for selected cohort
-Go to assay detail for selected cohort
-Go to cohort manager
LGRC Research Portal
Analysis Tools
Cohort 1: Set 1
Cohort 2: Set 2
Job name: My job 1
View analysis parameters
Start Analysis
Job Status: Running
PAGE DETAILS
-Very minimal parameters and options…here just 2 cohorts of interest, maybe p-value cutoff
-Generates comprehensive report
-Edit in place results – Don’t set parameters, edit the results
-Analysis goes into queue, email notification when finished
Analysis of Differential Expression: My Job 1
Supervised Analysis
Unsupervised analysis
Meta analysis
PAGE DETAILS
-Very minimal parameters and options.
-Generates comprehensive report
-Edit in place results – Don’t set parameters, edit the results
-Accordion style result sections
-Generate PDF report of analysis
-Analysis goes into queue, email notification when finished
Before I came here I was confused
about this subject.
After listening to your lecture,
I am still confused but at a higher level.
- Enrico Fermi, (1901-1954)
Genomics is here to stay
Acknowledgments
The Gene Index Team
Corina Antonescu
Valentin Antonescu
Fenglong Liu
Geo Pertea
Razvan Sultana
John Quackenbush
Array Software Hit Team
Katie Franklin
Eleanor Howe
Sarita Nair
Jerry Papenhausen
John Quackenbush
Dan Schlauch
Raktim Sinha
Joseph White
Eskitis Institute
Christine Wells
Alan Mackay-Sim
<johnq@jimmy.harvard.edu>
Center for Cancer Computational Biology
Microarray Expression Team
Stefan Bentink
Mick Correll
Thomas Chittenden
Howie Goodell
Aedin Culhane
Kristina Holton
Jerry Papenhausen
Jane Pak
Patricia Papastamos
Renee Rubio
John Quackenbush
(Former) Stellar Students
http://cccb.dfci.harvard.edu
Martin Aryee
Kaveh Maghsoudi
Jess Mar
Systems Support
Stas Alekseev, Sys Admin
Assistant
Joan Coraccio
Juliana Coraccio
http://compbio.dfci.harvard.edu
Shameless self-promotion