Visualizing RNA Expression Data
John Quackenbush
VIZBI, 16 March 2011

Northern Blots: Before the Dawn of Time

Northern Blots

Quantitative RT-PCR: The Pre-Modern Era

Quantitative PCR

Quantitative PCR and other Methods

Large-scale Quantitative RT-PCR: The Dawn of the Modern Age

An Aside: The Birth of Clustering

Our World Today: A Microarray Overview

History is written by the victors (or those who produce software): The Birth of Clustering
This was also the start of tormenting the red-green color-blind.

Truth is determined by the person giving the talk: MeV is the best clustering tool ever!
http://www.tm4.org

Public Microarray Data
ArrayExpress: 20,423 experiments (572,682 hybs/arrays)
GEO: 21,320 experiments (529,108 arrays)
CIBEX: 148 experiments (2,711 arrays)
SMD: 21,521 experiments (80,319 incl. private data)
>1,000,000 arrays x $500 = $500,000,000
Cancer studies account for >14% of all studies in databases…

EBI’s Expression Atlas Rocks!

Disease Progression and Personalized Care
[Diagram. Timeline: Birth, Treatment, Death (Natural History of Disease). Other labels: Clinical Care, Environment + Lifestyle, Outcomes, Treatment Options, Disease Staging, Patient Stratification, Early Detection, Genetic Risk, Biomarkers, Quality of Life]

Welcome to the post-Modern World: Next-Gen Technologies have Dramatically Expanded our Genomic Universe

Browser-mania rules!

Back to Excel, Man’s Best Friend
RNA-Seq data of 7 FFPE blocks

And more websites are integrating data

Cells Converge to Attractive States
Stuart Kauffman presented the idea of a gene expression landscape with attractors.
• ~250 stable cell types each represent attractors.
• Cells can be "pushed" or induced to converge to an attractor.
• Once in the attractor, a cell is robust to small perturbations.
Jess Mar

Differentiation of Promyelocytes into Neutrophil-Like Cells
Time 0: Promyelocytes (HL-60 Cell Line)
Dimethyl Sulfoxide (DMSO)
All-Trans Retinoic Acid (ATRA), ~6 days
Day 7: Neutrophil-like Cells
Collins et al. PNAS 1978
Affymetrix GeneChip
RA is used in differentiation therapy for acute promyelocytic leukemia.
Combined with chemotherapy, complete remission rates as high as 90-95% can be achieved.
Jess Mar
Huang et al. PRL 2005

GEDI: Cells Display Divergent Trajectories That Eventually Converge as they Differentiate
DMSO, ATRA
Graphical representation of the results from a Self-Organizing Map clustering.
Expression data from a single sample (time point) are clustered according to a grid.
What factors drive this divergent-then-convergent behavior?
Huang et al. PRL 2005

Our Hypothesis
[Diagram labels: State A, State B; Observed Trajectory (Perturbation 1), Observed Trajectory (Perturbation 2); Core Differentiation Pathway; Transient Pathway (Perturbation 1), Transient Pathway (Perturbation 2)]
Jess Mar

Observed Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar

Transient Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar

Core Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar

Ultimately, we’d like to get to pathways:

Functional Roles Are Associated with Constraint
High-variance genes tend to function as cell surface receptors.
Low-variance genes function as kinases and transferases.
[Figure: cell compartments (Extracellular, Membrane, Cytoplasm, Nuclear); legend: high variance, low variance]

But the tools are very primitive

Variance Constraints Alter Network Topology
Degree distributions for the MAPK module are significantly different (Kolmogorov-Smirnov test).
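The Kolmogorov-Smirnov comparison of two degree distributions can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis behind the slide: the degree lists are made up, the function name ks_two_sample is hypothetical, and the p-value uses the large-sample Kolmogorov approximation, which is rough for samples this small.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test.

    Hypothetical helper for illustration only.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # D is the largest gap between the two empirical CDFs,
    # checked at every observed value.
    d = 0.0
    for x in set(a_sorted) | set(b_sorted):
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))

    # Large-sample approximation: P(D > d) ~ 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2)
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is 1 to many decimals.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Made-up node degrees for high-variance and low-variance genes (assumption).
high_variance_degrees = [1, 1, 2, 2, 3, 3, 4, 5]
low_variance_degrees = [4, 5, 6, 6, 7, 8, 9, 12]

d_stat, p_value = ks_two_sample(high_variance_degrees, low_variance_degrees)
print(f"D = {d_stat:.3f}, approximate p = {p_value:.3g}")
```

For real data, a library routine such as scipy.stats.ks_2samp is likely the better choice, since it also offers exact p-values for small samples.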
[Figure: density of node degree for high-variance vs. low-variance genes; panels: SZ Group, Control Group, PD Group; P-values shown: 2.8 × 10^-7, 2.5 × 10^-4, 3.5 × 10^-4]
Degree of statistical significance is altered by disease status.

So we’re back to Heat Maps
The transcriptional profiles of ONS XS cells from SZ patients more closely resemble those of healthy fibroblasts than any other stem cell signature.

And of course, we’ve left out the interesting stuff, like where genes are expressed.

LGRC Research Portal
PAGE DETAILS
Search:
- Facets
- Search within results
- Keyword prompts
- Search history
Table:
- Paged results
- Sortable columns
Actions:
- Go to Gene detail page
- Add genes to ‘gene set’

PAGE DETAILS
Annotation summary & summary view for each assay/data type (accordion style sections):
Annotation Summary
Gene Expression Summary
- GEXP: expression profile across major Dx categories
- RNASeq: exon structure of the gene
- SNPs: table of SNPs in region of gene, highlighting association with major Dx group
- Methylation: methylation profile in region around gene
- Genomic alterations: table of CNVs & alterations observed w/ freq in region around gene
Actions:
- Click through to assay detail page
- Add gene to set
RNASeq

LGRC Research Portal
PAGE DETAILS
- View aggregate statistics
- View cohort details
- Build cohort sets
- Build composite phenotypes
Actions:
- Go to data download for selected cohort
- Go to assay detail for selected cohort
- Go to cohort manager

LGRC Research Portal
Analysis Tools
[Form: Cohort 1: Set 1; Cohort 2: Set 2; Job name: My job 1; View analysis parameters; Start Analysis; Job Status: Running]
PAGE DETAILS
- Very minimal parameters and options…here just 2 cohorts of interest, maybe p-value cutoff
- Generates comprehensive report
- Edit-in-place results: don’t set parameters, edit the results
- Analysis goes into queue, email notification when finished

Analysis of Differential Expression: My Job 1
PAGE DETAILS
- Very minimal parameters and options.
- Generates comprehensive report
- Edit-in-place results: don’t set parameters, edit the results
- Accordion style result sections
- Generate PDF report of analysis
- Analysis goes into queue, email notification when finished
[Result sections shown: Supervised Analysis, Meta analysis, Unsupervised analysis]

"Before I came here I was confused about this subject. After listening to your lecture, I am still confused but at a higher level."
- Enrico Fermi (1901-1954)

Genomics is here to stay

Acknowledgments
<johnq@jimmy.harvard.edu>

The Gene Index Team: Corina Antonescu, Valentin Antonescu, Fenglong Liu, Geo Pertea, Razvan Sultana, John Quackenbush

Array Software Hit Team: Katie Franklin, Eleanor Howe, Sarita Nair, Jerry Papenhausen, John Quackenbush, Dan Schlauch, Raktim Sinha, Joseph White

Eskitis Institute: Christine Wells, Alan Mackay-Sim

Center for Cancer Computational Biology / Microarray Expression Team: Stefan Bentink, Mick Correll, Thomas Chittenden, Howie Goodell, Aedin Culhane, Kristina Holton, Jerry Papenhausen, Jane Pak, Patricia Papastamos, Renee Rubio, John Quackenbush
http://cccb.dfci.harvard.edu

(Former) Stellar Students: Martin Aryee, Kaveh Maghsoudi, Jess Mar

Systems Support: Stas Alekseev, Sys Admin

A: Joan Coraccio, Juliana Coraccio

http://compbio.dfci.harvard.edu

Shameless self-promotion