Visualization of Large-Scale Biological Data

Nils Gehlenborg (nils@hms.harvard.edu)
Center for Biomedical Informatics, Harvard Medical School

Miriah Meyer (miriah@seas.harvard.edu)
School of Engineering and Applied Sciences, Harvard University

Bio-IT World 2011

Instructors: Nils Gehlenborg
- background in bioinformatics, PhD thesis on visualization and exploration of gene expression data
- Research Associate at Center for Biomedical Informatics at Harvard Medical School; Associated Researcher at the Broad Institute, working on The Cancer Genome Atlas project
- research interests in information visualization, machine learning, information retrieval applied to large-scale biological data sets
- IEEE Symposium on Biological Data Visualization (http://www.biovis.net); Workshop on Visualization of Biological Data (http://www.vizbi.org)
- developer of various software tools and visualization methods for transcriptomics and proteomics data

Visualization of Large-Scale Biological Data / Bio-IT World 2011 / N Gehlenborg & M Meyer

Instructors: Miriah Meyer
- background in computer science, PhD thesis on processing and visualizing three-dimensional data
- Postdoctoral Research Fellow in the School of Engineering and Applied Sciences at Harvard University, focusing on visualizing genomics and molecular biology data
- Visiting scientist at the Broad Institute of MIT and Harvard, cofounder of the Data Visualization Initiative
- research interests in visualization and human-computer interaction applied to complex biological data sets
- developer of various software tools: MizBee (www.mizbee.org); Pathline (www.pathline.org); MulteeSum (www.multeesum.org)

Participants: Who are you?
- Where do you work? In industry or academia?
- What is your primary field? Biology? Bioinformatics? Computer Science?
- What is your job title?
- What is your relationship to visualization software?
Are you a user or a developer?
- What do you hope to learn today?

What is this course about?
- challenges of large, biological data sets:
  - scale: store, process and access
  - heterogeneity: interpret and integrate
- course:
  - how to use visual representations to interpret complex data sets
  - starting with basic principles through examples for biological data types
  - pointers to resources

Overview
1. Principles of Visualization
   1. Visual Representation
   2. Multiple Views
   3. Design Process
2. Key Methods and Software Tools
   1. Applications for Visualization
   2. Methods and Tools
   3. Design of Visualization Systems

Part 1
Principles of Visualization

Engadget
Exercise: Critique
[Figure: pie chart “U.S. SmartPhone Marketshare”; legend: RIM, Apple, Palm, Motorola, Nokia, Other; slice labels: 21.2%, 39.0%, 3.1%, 7.4%, 9.8%, 19.5%]

Definition: Visualization
The use of computer-supported, interactive, visual representations of data to amplify cognition.
Card, Mackinlay & Shneiderman 1999

1.1 Visual Representation

slide adapted from Munzner 2011, Visualization Principles
Visual Encoding of Data
data
- tabular
  - categorical: apples, oranges, bananas
  - ordered
    - ordinal: small, medium, large
    - quantitative: 10 inches, 13 inches, 18.5 inches
- relational: trees, networks
- spatial: intrinsic position

slide from Munzner 2011, Visualization Principles
Visual channel types and rankings

slide from Munzner 2011, Visualization Principles
Power of the plane: only position works for all!

slide from Munzner 2011, Visualization Principles
Ranking differs for all other channels

Where do rankings come from?
- user studies, psychophysical experiments, principles from graphic design
- accuracy, discriminability, separability, popout

Using Rankings
- Effectiveness Principle: encode most important data attributes with highest ranked channels
[Figure labels: time, value, low, high]

Using Rankings
[Figure: Year 1 and Year 2 values for categories A, B, C, D; bar chart axes from 0 to 27]

Visual Encoding of Data
data
- abstract
  - tabular
    - categorical
    - ordered
      - ordinal
      - quantitative
  - relational
- spatial

- using spatial encoding for spatial data versus abstract (nonspatial) data

Common Pitfalls
1. Color
2. 3D
3. Animation

after Borland & Taylor 2007, IEEE CG&A
1. Color Pitfalls: Rainbow Color Map
[Figure labels: hard to order, easy to order, lower resolution, creates artifacts]

Rogowitz & Treinish 1996, http://researchweb.watson.ibm.com/people/l/lloydt/color/color.HTM
1. Color Pitfalls: Rainbow Color Map
Southeastern United States and Gulf of Mexico
Problems:
- zero crossing not explicit
- lack of ordering of colors makes it hard to interpret the map

Wong 2010, Nature Methods
1.
Color Pitfalls: Relativity
Color is a relative medium and context matters

Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms. Sinha and Meller. Bioinformatics 2007
1. Color Pitfalls: Discriminability
Only 6-12 colors are visually discernable

estimate: Howard Hughes Medical Institute, http://www.hhmi.org/senses/b130.html
1. Color Pitfalls: Color Blindness
Normal Vision
Deuteranope Vision (“Red-Green Blindness”)
~ 7% of male population affected

Ware 2008, Visual Thinking for Design
2. 3D
- spatial encoding ranking for planar spatial position, not depth
- how we see in 3D:
  - rapid eye-movement
  - head and body movements
- legitimate for 3D spatial data, difficult to justify for abstract data

Moore et al. 2011, Proceedings of Pacific Symposium on Biocomputing 2011
2. 3D Pitfalls: Perspective
- perspective distortion: interferes with size channel encoding
- shading: interferes with color, lightness, and saturation channel encodings

BioLayout 3D 2.0 sample dataset, http://www.biolayout.org
2. 3D Pitfalls: Occlusion

2. 3D Pitfalls: Text Legibility
Mukherjea and Foley 1995, Visualizing the World-Wide Web with the Navigational View Builder.

3.
Animation
- external versus internal memory
  - easy to compare by moving eyes between views
  - hard to compare view to memory of what you saw
- when to use animation?
  - good: chronological storytelling
  - good: transition between states
  - poor: multiple states with multiple changes

3. Animation Pitfall
Global comparisons are difficult

3. Animation Pitfall
Small Multiples: one view per state
- show time with space
Barsky, Munzner, Gardy, Kincaid 2008, Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context.

1.2 Multiple Views

Roberts 2007, Coordinated and Multiple Views in Exploratory Visualization
Linked Views
- beyond static views, multiple linked views
- “allow the user to have a dialog with the data”
- technique that allows for data exploration
- interactive, multiple views of the data

[Figure: scatterplot “large-pse outliers”, both axes from 0.1 to 0.9; images courtesy of Angela DePace and Charles Fowlkes]

Meyer et al.
2010, MulteeSum: A Tool for Comparative Spatial and Temporal Gene Expression Data

1.3 Design Process

[Figure: design process stages: target, translate, design, implement, validate; annotations: user-centered design, usability engineering, participatory design, evaluate]

Carpendale 2008, Evaluating Information Visualizations
Validation Methods

Engadget
Exercise: Critique
[Figure: pie charts “U.S. SmartPhone Marketshare”; legend: RIM, Apple, Palm, Motorola, Nokia, Other; slice labels: 21.2%, 39.0%, 3.1%, 7.4%, 9.8%, 19.5%]

U.S. SmartPhone Marketshare (bar chart, y-axis 0% to 40%)
RIM       39.0
Other     21.2
Apple     19.5
Palm       9.8
Motorola   7.4
Nokia      3.1

Part 2
Key Methods and Software Tools

Applications for Data Visualization
1. Presentation
   “A picture is worth a thousand words.”
   “A good sketch is better than a long speech.” (Napoleon Bonaparte)
2. Confirmation
   “I believe it when I see it.”
3.
Exploration

Minard 1869
Presentation: March on Moscow

Anscombe 1973, The American Statistician
Confirmation: Anscombe’s Quartet
mean(X) = 9, var(X) = 11, mean(Y) = 7.5, var(Y) = 4.12, cor(X,Y) = 0.816, linear regression line Y = 3 + 0.5*X

Exploration: Hypothesis Generation
[Figure labels: trends, gaps, outliers, clusters]
- A large data set is given and the goal is to learn something about it.
- Visualization is employed to perform pattern detection using the human visual system.
- The goal is to generate hypotheses that can be tested with statistical methods or follow-up experiments.

Exploration: Hypothesis Generation
- Visualization for exploration is an “Exploratory Data Analysis” technique (Tukey 1977). Statistical graphics such as box plots and scatter plots are early examples.
- When there is a specific question that can easily be answered algorithmically (“What is the highest value?”), then visualization is usually not the right tool.
- When it is not clear what should be asked or when the answer cannot be summarized easily (“What is the distribution of the values?”), then visualization is an excellent choice.
- Visualization for exploration is challenging because data sets keep getting bigger and more heterogeneous.
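
The summary statistics quoted for Anscombe’s Quartet can be checked with a few lines of code. The following is a minimal Python sketch using only the standard library. The data values are the quartet as it is commonly reproduced from Anscombe 1973, and the names `QUARTET` and `summarize` are hypothetical, chosen for this illustration.

```python
from statistics import mean, variance

# Anscombe's quartet as commonly reproduced from Anscombe (1973).
# Data sets I-III share the same x values; data set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return mean, sample variance, correlation and least-squares line for (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "cor": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, s in SUMMARIES.items():
        print(name, {key: round(float(value), 3) for key, value in s.items()})
```

All four data sets yield nearly identical numbers, so the summary alone cannot distinguish them; only plotting the points reveals how different they are.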
Shneiderman 1996, in Proceedings IEEE Symposium on Visual Languages
Exploration: Information Seeking Mantra
- In explorative settings the user is normally dealing with large amounts of data.
- Impossible to grasp everything at once.
- Solution: Make visualizations interactive to support the user in exploring subsets of the data at different resolutions.
- Ben Shneiderman’s Information Seeking Mantra:
  - Overview first, zoom and filter, then details on demand.
  - Overview first, zoom and filter, then details on demand.
  - Overview first, zoom and filter, then details on demand.

2.1 Key Methods and Software Tools

Biological Data Types
- Experimental data and knowledge
- Detail and overview
  - example: 3D structure of a protein versus metabolome map of an organism
- Complex relationships
  - example: gene expression data, protein-DNA interactions, sequence motifs

Biological data is heterogeneous, complex and often very large!
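
As a rough illustration of the Information Seeking Mantra from the earlier slide, the three steps can be mapped onto three small operations on a data set. The sketch below is in Python and uses invented records; the record layout `(feature_id, position, value)` and the function names `overview`, `zoom_and_filter` and `details` are assumptions made for this example, not part of any tool mentioned in the course.

```python
from statistics import mean

# Hypothetical records: (feature_id, genomic_position, value).
features = [("f%d" % i, i * 100, float(i % 7)) for i in range(50)]


def overview(records, n_bins, start, end):
    """Overview first: mean value per equal-width position bin (None if a bin is empty)."""
    width = (end - start) / n_bins
    bins = [[] for _ in range(n_bins)]
    for _, pos, value in records:
        if start <= pos < end:
            index = min(int((pos - start) / width), n_bins - 1)
            bins[index].append(value)
    return [mean(b) if b else None for b in bins]


def zoom_and_filter(records, start, end, min_value=float("-inf")):
    """Zoom and filter: keep records inside [start, end) with value >= min_value."""
    return [r for r in records if start <= r[1] < end and r[2] >= min_value]


def details(records, feature_id):
    """Details on demand: return the full record for one feature, or None."""
    return next((r for r in records if r[0] == feature_id), None)


if __name__ == "__main__":
    print(overview(features, 5, 0, 5000))          # coarse summary of everything
    print(zoom_and_filter(features, 1000, 1300))   # a small region of interest
    print(details(features, "f10"))                # one record in full
```

An interactive tool would redraw after each step; the point here is only that each step works on a smaller subset at a finer resolution than the one before.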
Exercise: Biological Data Types
data
- tabular
  - categorical
  - ordered
    - ordinal
    - quantitative
- relational
- spatial
Examples: protein structure, gene expression data, genome sequence, pathway, phylogeny, sequence alignment

Overview: Biological Data Types
- Sequences: genes, alignments, genomes
- Multivariate Data: gene and protein expression levels, metabolite concentrations
- Networks: protein interactions, gene regulation, metabolic pathways

http://host13.bioinfo3.ifom-ieo-campus.it/fancygene/
Sequences: Genes
- Genes are linear sequences, nucleotide or amino acid alphabet
- Visualization of primary sequence and additional annotation data (e.g. gene architecture, isoforms)
NM_000546 gattggggttttcccctcccatgtgctcaagactggcgctaaaagttttgagcttctcaaaagtctagagc
NP_000537 MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA

Sequences: Alignments
A multiple sequence alignment is basically a matrix:
- rows correspond to sequences
- columns correspond to aligned sites

conservation marks (column spacing lost): ** * * * ** ** ** * ** ** * * ** ** ** ** * ** ** ** *********** ** ** ** *
gi|40254597|ref|NM_009871.2|    TCTGAG---GTGGGCTCCGACCATGAGCTCCAGGCTGTCCTGCTGACCTGTCTGTACCTCTCCTATTCCTACATGGGCAATGAGATCTCCT 88
gi|16758761|ref|NM_053891.1|    TCTGAG---GTGGGCTCGGACCACGAGCTCCAGGCTGTCCTGCTGACCTGTCTGTACCTCTCCTATTCCTACATGGGCAATGAAATCTCCT 88
gi|34304373|ref|NM_003885.2|    TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|114668067|ref|XM_001158783.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|297700499|ref|XM_002827237.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCGTACTCCTACATGGGCAACGAGATCTCCT 88
gi|297272338|ref|XM_001113136.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACGTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|296202033|ref|XM_002748377.  TCGGAG---GTGGGCTCAGATCACGAGCTCCAGGCCGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAAATCTCCT 88
gi|301753157|ref|XM_002912351.  TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCATCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|73966855|ref|XM_548274.2|    TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCATCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|194217295|ref|XM_001501617.  TCTGAG---GTGGGCTCCGACCACGAGCTCCAGGCTGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|148539985|ref|NM_174512.3|   TCCGAG---GTGGGTTCCGACCACGAGCTCCAGGCGGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|162287175|ref|NM_001101816.  TCCGAG---GTGGGCTCCGACCACGAGCTCCAGGCTGTCCTGCTGACCTGCCTGTACCTTTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|291405555|ref|XM_002718944.  TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCGTGCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|126313846|ref|XM_001368043.  TCTGAG---GTCGCCACGGACCATGAGCTACAGGCTGTCCTGTTGACCTGCCTGTACCTCTCCTATTCCTACATGGGCAATGAGATCTCCT 88
gi|149599543|ref|XM_001510922.  CCCGAG---CTGGCCGCCGACCACGAGCTGCAGGCCGTCCTGCTCACCTGCCTCTACCTGTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|301626334|ref|XM_002942302.  GGGGACTCTGTGGCCACCGAACATGACTTGCAAGCCACCCTCTTGACCTGCCTCTATTTGTCCTACTCCTACATGGGCAACGAGATATCCT 91
gi|147905168|ref|NM_001085672.  GGGGACTCTGTGGCCACCGAACATGACTTGCAAGCCACCCTTCTAACCTGCCTCTACTTGTCCTACTCTTACATGGGCAACGAGATATCCT 91
gi|50540099|ref|NM_001002515.1  TCTGAG---GTGGCCACAGAGCACGAGCTGCAGGCCGTCCCGCTGACCTGCCTCTACCTGTCTTACTCATACATGGGCAATGAGATCTCGT 88
                                ..1150......1160......1170......1180......1190......1200......1210......1220......1230.....
Example from ClustalW http://www.clustal.org/

http://www.jalview.org
Sequences: Alignments
Amino acid sequence alignment with amino acid color code

http://www.jalview.org
Sequences: Alignments
Amino acid sequence alignment with hydrophobicity color code

Sequences: Alignments
Tools to visualize alignments need to support
1. Computation of alignments
2. Various color maps for nucleotides / amino acids / chemical properties
3. Editing and analysis of the sequences

Sequences: Genomes
- Raw data
  - reads from sequencing
- Primary data
  - DNA sequence: chromosomes are either linear or circular
- Annotation
  - proteins
  - gene models (exon-intron structure etc.)
  - ontology

Sequences: Genome “Browsers”
- display genomic data in a “position-centric” view
- genome serves as reference for positions
- usually track-based
- varying levels of interactivity
- browsing vs exploration
- web-browser-based or desktop applications

http://genome.ucsc.edu
Sequences: UCSC Genome Browser
- most commonly used browser
- supports basically any data type that can be mapped to the genome
- “classic implementation”: images are rendered on the server and embedded in the webpage

Sequences: UCSC Genome Browser
[Figure labels: “squished”, “dense”, “packed”, “full”]

http://www.genomeview.org
Sequences: GenomeView
- next-generation genome browser
- annotation editor: sequences, annotation, multiple alignments, syntenic mappings, short read alignments and more can be displayed

http://www.broadinstitute.org/igv
Robinson et al. 2011, Nature Biotechnology
Sequences: Integrative Genomics Viewer (IGV)
- visualization tool for interactive exploration of large, integrated datasets
- supports a wide variety of data types including sequence alignments, expression data, copy number variation, RNA-seq, annotations

Robinson et al.
2011, Nature Biotechnology
Sequences: Integrative Genomics Viewer (IGV)

http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
[Figure labels: reference sequence, read coverage, reads]

http://mkweb.bcgsc.ca/circos
Krzywinski et al. 2009, Genome Research
Sequences: Circos
Clark et al. 2009, PLoS Genetics
Jones et al.
2010, Genome Biology

Sequences: Comparative Genomics
[Figures: MizBee, http://www.mizbee.org; VISTA, http://genome.lbl.gov/vista]

http://compbio.med.harvard.edu/flychromatin/
Kharchenko et al. 2010, Nature
Sequences: Epigenomics

Lieberman-Aiden et al. 2009, Science
Sequences: 3D Genome Structure

Multivariate Data
- typical “omics” data: transcriptomics, proteomics, metabolomics
- expression/concentration levels of many biological entities (transcripts, proteins, etc.)
  across many different conditions/time points
- entity levels measured per sample on a “genome-wide” scale
- often entities are not measured directly
[Figure: level across conditions for Entity A and Entity B]

Multivariate Data
[Figure labels: Interaction Networks and Pathways integrated with Expression and Concentration Data; Metabolite Map; Peptide Map; NMR Spectra; Mass Spectra; Protein Map; 1D/2D Gel Data; Gene Expression Matrix; Microarray Image; Protein Expression Matrix; Metabolite Concentration Matrix; RNA-seq Reads; Protein-Protein/Protein-Nucleotide Interactions; Graph; Matrix; Pathway; Insight; various techniques]

Multivariate Data: “Raw” Data
[Figure: the same diagram, with the label “Raw” Data]

Multivariate Data: Transcriptomics
- Microarray scans as images
- Scatterplot: comparison of two distributions (experiments) of expression values
- Profile plot: individual gene expression across experiments, often used in combination with clustering
- Heatmap: colored view on full expression matrix, used in combination with clustering to place similar profiles next to each other
- Dendrogram: hierarchical clustering of genes or experiments, often combined with heatmap to provide more information about the cluster structures

Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Transcriptomics
[Figure: scatterplot, panel title “after normalization”]
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!!! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! !! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! !!!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! !! ! !! ! ! ! ! ! ! !!!!!!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !!! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 
! ! ! !! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! ! ! ! !!! ! ! ! ! ! ! !! ! ! ! ! ! ! !!! ! ! ! ! ! ! !!!!!! ! ! ! !!! ! ! ! !!!!! ! ! ! ! ! !! ! !! ! ! !! ! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! !! ! ! ! ! ! ! ! !!! !! !!! !! ! ! ! ! ! !! !! ! ! ! ! ! ! ! ! ! !!!!! ! ! ! ! ! ! ! ! ! !! ! ! ! !! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! ! !! !! ! ! ! !! ! ! ! ! !! ! ! ! ! ! ! ! ! ! !! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !! ! !! ! ! ! ! ! !! !! ! ! !! !! ! !!! ! ! ! ! ! ! !! !! ! ! ! ! !! ! ! !! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! !!! ! ! ! ! !! !! ! ! ! ! ! !! ! ! ! ! ! !! ! ! ! ! !!! ! ! ! ! ! ! ! ! ! ! ! ! ! ! log expression ratio ! ! −2 ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! −4 ! ! −4 16 6 8 10 A = 0.5 * log(R*G) 12 14 16 Array3 14 Array3 A = 0.5 * log(R*G) 12 Array2 10 Array1 8 Array3 6 Array2 ! Array1 M = log(R/G) 2 ! b 4 a 1 ! ! 
MA Plot: 1 array. Box Plot: 3 arrays.
1 = before normalization
2 = after within-array normalization
3 = after between-array normalization

Multivariate Data: Proteomics
- quantitative proteomics tries to measure the expression level of “all” proteins (as many as possible) in a sample
- quantitative shotgun proteomics produces large and complex datasets (hundreds of GB per run)
- data is obtained from liquid chromatography coupled with mass spectrometry (LC-MS or LC-MS/MS) measurements

Multivariate Data: Proteomics
[Figure: Pep3D, www.proteomecenter.org]

Multivariate Data: Proteomics
[Figure: www.open-ms.de]

Multivariate Data: Proteomics
[Figure: TOPPView, panels a and b, www.open-ms.de]

Multivariate Data: Derived Matrices
[Diagram of data types, section label: Derived Matrices. Labels: 1D, 2D, Matrix, Graph; NMR Spectra, Mass Spectra, Metabolite Map, Peptide Map, Protein Map, Microarray Image, Gel Data, RNA-seq Reads, various techniques; Gene Expression Matrix, Protein Expression Matrix, Metabolite Concentration Matrix; Pathway, Protein-Protein/Protein-Nucleotide Interactions; Interaction Networks and Pathways integrated with Expression and Concentration Data]

Multivariate Data: Derived Matrices
- matrices of multi-dimensional vectors
- usually abundance profiles, e.g.
transcript or protein levels, metabolite concentrations

[Figure 4.2: Expression matrix with associated gene and sample attributes. Labels: Meta Information, Sample Attributes (Factor, Factor Value; Wild Type, Gene A-/-, Gene B-/-; M, D1, D2), Gene Attributes, Expression Profile, Sample. See Figure 1.1]

Multivariate Data: Derived Matrices (Gehlenborg et al. 2010, Nature Methods)
[Figure: Scatter Plot of Principal Component 2 against Principal Component 1, both axes from −4 to 2]

Multivariate Data: Derived Matrices
- Scatter Plots and Dimensionality Reduction
  - used to visualize high-dimensional profiles as projections in lower-dimensional spaces (usually 2D, sometimes also 3D)
  - there is always a loss of information in the process; the goal is to minimize that loss
  - many different algorithms: Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Isomap, etc.
- Pros
  - good choice to get an idea of the overall structure of the whole data set: clusters, outliers, gaps in the data
- Cons
  - because of the dimensionality reduction, the original profiles are not accessible in the visualization
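
To make the projection idea concrete, here is a minimal sketch of PCA in plain Python. It is not taken from the slides: the function names and the six example profiles are hypothetical, and power iteration with deflation is used only because it is short enough to read in one go (real tools typically rely on an SVD routine). Each row is one profile; the result is one (PC1, PC2) point per profile plus the fraction of variance each component keeps, which quantifies the "loss of information" mentioned above.

```python
import math

def center(rows):
    """Subtract each column mean so every dimension has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: leading eigenvector/eigenvalue of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # fixed, asymmetric start vector
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no variance left in this matrix
            return v, 0.0
        v = [x / norm for x in w]
    return v, dot(v, [dot(row, v) for row in matrix])

def pca_2d(rows):
    """Project profiles onto their first two principal components."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    pc1, var1 = dominant_eigenpair(cov)
    # Deflation: remove the variance already explained by PC1.
    deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, var2 = dominant_eigenpair(deflated)
    coords = [(dot(row, pc1), dot(row, pc2)) for row in x]
    total = sum(cov[i][i] for i in range(d))
    explained = (var1 / total, var2 / total) if total > 0 else (0.0, 0.0)
    return coords, explained

# Hypothetical profiles: three rising and three falling over four samples.
profiles = [
    [0.1, 0.5, 1.0, 1.5], [0.2, 0.6, 1.1, 1.4], [0.0, 0.4, 0.9, 1.6],
    [1.5, 1.0, 0.5, 0.1], [1.4, 1.1, 0.6, 0.2], [1.6, 0.9, 0.4, 0.0],
]
coords, explained = pca_2d(profiles)
for point in coords:
    print(point)
print("fraction of variance kept by PC1, PC2:", explained)
```

In the scatter plot of these coordinates the rising and falling profiles fall on opposite sides of the first axis, but the individual profile shapes can no longer be read off, which is the trade-off listed under Cons.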
Multivariate Data: Derived Matrices (Gehlenborg et al. 2010, Nature Methods)
Profile Plot a.k.a. Parallel Coordinates
[Figure: profile plot of log expression ratio (−3 to 3) against Time (min), 0 to 119 in steps of 7]

Multivariate Data: Derived Matrices
- Profile Plot/Parallel Coordinate Plots
- Pros
  - encoding by position: profiles easy to read
  - color-coding of expression profiles (groups) very efficient
- Cons
  - overplotting
  - grows horizontally with every additional sample

Multivariate Data: Derived Matrices (Gehlenborg et al. 2010, Nature Methods)
Profile Plot a.k.a. Parallel Coordinates
[Figure: four profile plots, each of log expression ratio against Time (min)]

Multivariate Data: Derived Matrices (Gehlenborg et al. 2010, Nature Methods)
[Figure: Heat Map with Dendrogram; columns are Time (min), 0 to 119 in steps of 7]

Multivariate Data: Derived Matrices
- Heatmap
- Pros
  - no overplotting, yet a very dense information display
  - can be combined with dendrogram and additional information can be encoded in further columns or in the height of rows
- Cons
  - only qualitative interpretation possible due to color coding
  - grows horizontally with every additional sample and grows vertically with every additional profile

Multivariate Data: Summary
[Figure: profile plot, scatter plot (Principal Component 1, Principal Component 2) and heat map, arranged along a scale labeled "few, high-res" to "many, low-res"]

Problem: Very Large Expression Matrices (data: Lukk et al., 2010, Nature Biotechnology)
- Power Wall (7x4 screens = 11,200x4,800), University of Leeds
- 1000 transcripts, 5372 samples

New Visualization Method: Space Maps (Gehlenborg and Brazma, 2009, BMC Bioinformatics)
[Figure: Space Maps with levels L1 to L5 and Observations I, II and III]

Networks
[Diagram of data types as shown for Derived Matrices, here with the section label Networks]

Networks
- Data-derived
  - protein-protein interaction or protein-DNA interaction networks derived from Chromatin Immunoprecipitation (ChIP) or Yeast-2-Hybrid (Y2H) measurements
  - gene regulatory networks inferred from gene expression data
  - correlation networks derived from gene expression data
- Knowledge-derived
  - biochemical pathway maps
  - other curated networks derived from the literature
- Combination of networks and multivariate data

Protein-Protein Interaction Networks
- Protein-protein interaction (PPI) networks are graphs containing an edge for each PPI.
- They show significant functional clustering: proteins with related function often form densely connected subgraphs.
- Visualization of PPIs requires automated layout algorithms, e.g. force-directed layout or circular layout, to arrange the nodes on the screen according to some optimization criterion.
- Gene regulatory networks, correlation networks and protein-DNA interaction networks are visualized in a very similar way; the major differences are the types of edges (directed, undirected and other types).
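
As an illustration of what such layout algorithms do, the following is a minimal sketch in plain Python, not the implementation used by any of the tools in this course. `circular_layout` spaces the nodes evenly on a circle; `force_directed_layout` is a simplified spring-embedder in the style of Fruchterman and Reingold, where all node pairs repel and connected nodes attract. The parameter names (`k`, `max_step`), their default values and the protein names P1 to P6 are assumptions made for this example.

```python
import math

def circular_layout(nodes, radius=1.0):
    """Place nodes evenly on a circle, in list order."""
    n = len(nodes)
    return {v: (radius * math.cos(2 * math.pi * i / n),
                radius * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(nodes)}

def force_directed_layout(nodes, edges, iterations=200, k=0.5, max_step=0.05):
    """Simplified spring embedder, started from the circular layout."""
    pos = {v: list(p) for v, p in circular_layout(nodes).items()}
    for it in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes (strength k^2 / distance).
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k * k / dist
                force[u][0] += dx / dist * f
                force[u][1] += dy / dist * f
                force[v][0] -= dx / dist * f
                force[v][1] -= dy / dist * f
        # Attraction along edges (strength distance^2 / k).
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            f = dist * dist / k
            force[u][0] -= dx / dist * f
            force[u][1] -= dy / dist * f
            force[v][0] += dx / dist * f
            force[v][1] += dy / dist * f
        # Move each node along its net force, capped by a shrinking step size.
        step = max_step * (1.0 - it / iterations)
        for v in nodes:
            fx, fy = force[v]
            mag = math.hypot(fx, fy) or 1e-9
            move = min(mag, step)
            pos[v][0] += fx / mag * move
            pos[v][1] += fy / mag * move
    return {v: (p[0], p[1]) for v, p in pos.items()}

# Hypothetical PPI network: two densely connected groups joined by one edge.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
                ("P4", "P5"), ("P5", "P6"), ("P4", "P6"),
                ("P3", "P4")]
for name, (x, y) in force_directed_layout(proteins, interactions).items():
    print(name, round(x, 2), round(y, 2))
```

The optimization criterion here is implicit: the layout settles where attraction and repulsion balance, so densely connected subgraphs end up as compact groups. Computing all pairwise repulsions costs quadratic time per iteration, which is one reason large interactomes need more elaborate algorithms.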
Networks: Layout Algorithms
[Figure: Circular Layout and Force-directed Layout]

Yeast Protein-Protein Interaction Network (Barabasi & Oltvai 2004, Nature Reviews Genetics)
- This network shows the largest connected component of the yeast interactome as determined by Yeast-2-Hybrid
- This component contains 78% of all proteins
- Nodes are color-coded by the effect of a knock-out mutant:
  - Red: lethal
  - Green: non-lethal
  - Orange: slow growth
  - Yellow: unknown
- Hubs are often colored red!

Networks: Hairballs, Ridiculograms & Friends
[Figure: AraNet, NSF Visualization Challenge 2011 Honorable Mention]

Protein-Protein Interaction Networks (Gehlenborg et al. 2010, Nature Methods)
[Figures: five slides of protein-protein interaction network visualizations]

Cytoscape
http://www.cytoscape.org/

NAViGaTOR
http://ophid.utoronto.ca/navigator

Gephi
http://www.gephi.org

Pathways - KEGG Pathway
http://www.genome.jp/kegg/pathway.html
- “Wiring diagrams of molecular interactions, reactions, and relations”
- A collection of network diagrams (manual layout!)

Pathways
- KEGG Pathway provides so-called reference pathways containing most metabolic pathways.
- Species-specific pathways highlight in green those enzymes available in a specific organism.

Pathways - BioCyc (EcoCyc and MetaCyc)
http://www.biocyc.org
- 2009: 1355 pathways with 7837 reactions, 5792 enzymes
- Three types of databases
  - Tier 1: intensively curated databases
  - Tier 2: computationally derived databases subject to moderate curation
  - Tier 3: computationally derived databases without curation
- visualize individual metabolic pathways, or view the complete metabolic map of an organism

Pathways
- BioCyc draws pathways interactively with varying level of detail
Pathways: Metabolomic Map
http://www.biocyc.org
- Nodes represent metabolites
- Shape indicates class of metabolite (see key to right).
- Lines represent reactions.
- Moving the mouse over a metabolite icon identifies it.

Gene Regulatory Networks
- a gene regulatory network is the set of activating and repressing genes or gene products and their interactions
- networks are derived from (transcriptomics) datasets using network inference techniques
- resulting networks are visualized as a network graph G = (V, E) where
  - V is the set of nodes representing the involved genes/gene products
  - E is the set of edges representing the transcriptional regulation of the genes
  - Edges can be either activating (+) or repressing (-)

Gene Regulatory Networks (Westenberg et al. 2008, Comp Graph Forum; Westenberg et al. 2010, Bioinformatics)
SpotXplore
- maps expression profiles onto regulatory network
- statistics can be visualized
- interaction
- highlight subnetworks
- Cytoscape plugin

Networks and Multivariate Data (Gehlenborg et al. 2010, Nature Methods)
[Figure: Cerebral (Cytoscape)]

Networks and Multivariate Data (Gehlenborg et al. 2010, Nature Methods)
[Figures: Lichen, Prometra, VistaClara (Cytoscape), GENeVis, VisANT, VANTED]

Networks and Multivariate Data: Choice!?
1. Small multiples? (one value per node, all networks shown simultaneously)
2. Animation? (one value per node, one network shown at a time)
3. Complex glyphs? (multiple values per node)
4. Combination of multiple views? (network linked to heat map or profile plot)

Networks and Abundance Data: Choice!? (Saraiya et al., 2005, InfoVis 2005 Proceedings)
[Figure]

Part 3
Design of Visualization Systems

Challenge: Heterogeneity
Pathline: A Tool for Comparative Functional Genomics Data
joint work with: Bang Wong, Mark Styczynski, Tamara Munzner, Hanspeter Pfister
Pathline: A Tool for Comparative Functional Genomics. M. Meyer et al., IEEE/Eurographics EuroVis 2010.

target translate design implement validate

functional genomics
how do genes work together to perform different functions in a cell?

functional genomics data
- gene expression
- molecular pathways

gene expression is ...
- ... the measured level of how much a gene is on or off ...
- a single quantitative value
biologists measure it ...
- ... for many genes ...
- in many samples (time points, tissue types, species)
visualized with heatmaps [Wilkinson09] [Saldanha04] [Seo02] [Eisen98] [Gehlenborg10] [Weinstein08]
- encode value with color
- augmented with clustering [Eisen98]
[Figure: table of genes by samples holding values between 0.0 and 1.0 (0.2, 0.4, 1.0, 1.0, 1.0, 0.8, ...), shown next to its heatmap]

functional genomics data: molecular pathways
- the functioning of a cell is controlled by many interrelated chemical reactions performed by genes
[Diagram: input, genes, output / input, genes, output = cell function; pathways: glycolysis, tca cycle; www.genome.jp/kegg/]

comparative functional genomics
how do the gene interactions vary across different species?
- collaborators: Regev Lab at the Broad Institute
- biology: metabolism in yeast
- data: multiple genes, multiple time points, multiple related species, multiple pathways
- problem: existing tools can only look at a subset of this data
target translate design implement validate

gene expression
- 6000 genes and 140 metabolites
- 6 time points
- 14 species of yeast
- 3D table
[Figure: stacked tables labeled s1 to s6, with rows g1 to g8 and m1 to m3 and columns t1 to t6]

metabolic pathways
- 10 to 50 pathways of interest
- inputs/outputs called metabolites
- directed graph
[Figure: glycolysis and tca cycle]

similarity scores
- aggregate time series for a gene/metabolite over species
- similarity of expression across species (in the figure: similarity = 0.83)
- aggregate: Pearson, Spearman, others
[Figure: time series t1 to t6 of gene g1 in species s1, s2, s3, ... combined into one score]

phylogeny
- evolutionary relationship
- binary tree
- quantitative value
[Figure: phylogenetic tree]

tasks
- study expression data as a time series
- compare a limited number of time series
- compare similarity scores along a pathway(s)
- comparison of multiple similarity scores
[Diagram: metabolic pathways, similarity scores, gene expression, phylogeny]

target translate design implement validate

Power of the plane: only position works for all! (slide from Munzner 2011, Visualization Principles)

encode quantitative values with spatial position
[Figure: topological layout (www.win.tue.nl/~mwestenb/genevis/) and linearized pathway]
[Figure: heatmap (courtesy of M. Styczynski, from JavaTreeview, jtreeview.sourceforge.net/) and curvemap]

Pathline
linearized pathway representation
- common axes to compare similarity scores
  - multiple similarity scores
- bars and circles
  - visual layers for selective attention
  - color-code gene direction
[Figure: pathway nodes listed vertically against a similarity score axis from 0.0 to 1.0]
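
The similarity scores placed on these axes are described in the data slides only as an aggregate (Pearson, Spearman, others) of a gene's or metabolite's time series over species. The sketch below shows one plausible reading, and it is an assumption rather than Pathline's documented method: the score is the mean Pearson correlation over all pairs of species. The function names and the three example time series are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equally long time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant series
    return sxy / math.sqrt(sxx * syy)

def similarity_score(series_by_species):
    """Assumed aggregate: mean Pearson correlation over all species pairs."""
    pairs = list(combinations(series_by_species.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical expression of one gene at six time points in three species.
gene_g1 = {
    "s1": [0.2, 0.4, 1.0, 1.0, 1.0, 0.8],
    "s2": [0.1, 0.5, 0.9, 1.0, 0.9, 0.7],
    "s3": [0.3, 0.3, 0.8, 1.0, 1.0, 0.9],
}
print("similarity score for g1:", round(similarity_score(gene_g1), 2))
```

A score near 1 means the gene behaves alike in all species, a score near 0 or below means its temporal pattern differs. Replacing `pearson` with a rank correlation would give the Spearman variant mentioned in the data slides.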
linearized pathway representation
- multiple pathways

pathway to ordered list of nodes
[Figure: linearization steps, including cut, unroll and reinsert]

shared coordinate frame and stylized marks
[Figure: linearized pathway representation]

putting it together . . .
- topology is secondary
- use spatial position for similarity scores

Pathline
curvemap
- inspired by heatmaps
- base visual unit is a curve
- filled, framed line charts to enhance shape perception
- rows are species
- columns are genes/metabolites
- overlays to enhance trends
[Figure: curvemap; each cell plots expression against time]

target translate design implement validate

Demo

target translate design implement validate

case study
- qualitative research method
- in-depth study of individual or group
- real-world setting
- description and interpretation
[Figures: Pathline screenshots]
68986:8 (* %) %"&'()*+&"$ %"&,+-$ (, () %&' %&* D)3# D,3* (8/6:5=?@0!65@A) (8/6:5=BCC DF3& 2&3& 0;.;68-<;5=/> (8/6:5=?@0!65@A& %** (& &3++ 2&3* 2*3) 2,3* DF3" (' 2)3* $& $' highlights - Pathline - multiple genes, time points, species, and pathways - linearized pathway representation - curvemap - tool deployment open source - used daily by several collaborators - www.pathline.org Challenge: Scale - Example: The Cancer Genome Atlas (NCI & NHGRI) - around 20 cancer types - at least 500 patients per cancer type - all: mRNA transcript expression levels, copy number variation, SNPs, microRNA expression levels, methylation, clinical data - some: whole genome sequencing - so far: more than 4.5 TB of data (for around 3000 patients) - tools for remote visualization needed - (new visualization methods needed as well!) Visualization of Large-Scale Biological Data / Bio-IT World 2011 / N Gehlenborg & M Meyer 177 Challenge: Scale - access to public biological data sets typically through web-interfaces - large-scale data sets are analyzed on compute clusters or cloud infra structures - web browser-based visualization options: - browser plugins: Java Applets and Adobe Flash - native support: Scalable Vector Graphics, HTML5 Canvas, WebGL - alternative: desktop applications with client/server architecture (e.g. 
IGV, GenomeView)

Visualization Toolkits for the Web: Examples
- Java applets: Processing, Prefuse*
- Flash: Flare*
- JavaScript
  - SVG: Google Chart Tools*, Flot*, ProtoVis*, Raphael, TheJIT
  - HTML5 Canvas: Three.js, ProcessingJS
  - WebGL: Three.js, PhiloGL
* indicates high-level visualization library

Collaboration: Web-based Tools
- Payao: www.payaologue.org
- WikiPathways: www.wikipathways.org
- IBM Many Eyes: manyeyes.alphaworks.ibm.com

Visual Analytics
Builds the bridge between Visualization and Analytical Reasoning
(Keim et al. 2009, Visual Data Mining: Theory, Techniques and Tools for Visual Analytics)

Visual Analytics
- formation of abstract visual metaphors in combination with human interaction
- enables detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces
- knowledge is gained from visualization, automatic analysis, as well as the preceding interactions between visualizations, models, and the human analysts

Visual Analytics of Biological Data
- Biological data is very heterogeneous, complex and often very large
- Visualization of biological data plays a central role
- Complementing visualization with computational analysis methods and interaction accelerates the process of gaining insight into biological processes and modeling
them
- Applications in all areas of biology where large amounts of heterogeneous data need to be interpreted

Acknowledgements
Tamara Munzner (University of British Columbia, Canada) gave us permission to use slides from her talks.
Kay Nieselt (University of Tübingen, Germany) gave us permission to use slides and helped to design an earlier version of this course.

Resources

Scientific and Information Visualization
- Scientific Visualization (“scivis”) and Information Visualization (“infovis”) are very ill-defined terms
- Scientific Visualization is often used to describe visualization of data that is intrinsically spatial (such as medical imaging data, fluid flows or protein structures)
- Information Visualization is typically used to describe visualization of abstract data (such as gene expression data or interaction networks)
- there is plenty of overlap and the separation is quite arbitrary
- both Scientific and Information Visualization are used to visualize scientific data

Recommended Books
- Information Visualization - Perception for Design. Colin Ware, Morgan Kaufmann, 2004
- Information Visualization - Using Vision to Think. Stuart K Card, Jock D Mackinlay, Ben Shneiderman, Morgan Kaufmann, 1999
- The Visual Display of Quantitative Information (2nd Edition). Edward R Tufte, Graphics Press, 2001
- Fundamentals of Computer Graphics (3rd Edition). Peter Shirley, Steve Marschner, AK Peters Publishers, 2009 (in particular: “Chapter 27 - Visualization”, also as free PDF from Tamara Munzner’s website)
- The Non-Designer’s Design Book (3rd Edition). Robin Williams, Peachpit Press, 2008

Recommended Resources on Color
- A Field Guide to Digital Color. Maureen C Stone, AK Peters Publishers, 2003
- ColorBrewer 2.0. Cynthia Brewer, Mark Harrower, http://www.colorbrewer2.org
- VisCheck. http://www.vischeck.com
- Color Oracle. http://colororacle.cartography.ch

Recommended Journals
- Nature Methods Special Issue on Visualizing Biological Data. http://www.nature.com/nmeth/journal/v7/n3s
- IEEE Transactions on Visualization and Computer Graphics. http://www.computer.org/portal/web/tvcg
- IEEE Computer Graphics and Applications. http://www.computer.org/portal/web/cga/home

Recommended Meetings
- IEEE Symposium on Biological Data Visualization - BioVis. www.biovis.net
- Workshop on Visualizing Biological Data - VIZBI. www.vizbi.org
- IEEE VisWeek with InfoVis, Vis and VAST Conferences. www.visweek.org

Tools for Interaction Network Visualization

Name | Cost | Availability | Description | URL

Stand-alone
Arena 3D | Free | Win Mac Linux | Visualization of biological multi-layer networks in 3D | http://www.arena3d.org
BiNA | Free | Win Mac Linux | Exploration and interactive visualization of pathways | http://www.bnplusplus.org/bina
BioLayout Express 3D | Free | Win Mac Linux | Generation and cluster analysis of networks with 2D/3D visualization | http://www.biolayout.org
BiologicalNetworks 2 | Free | Win Mac Linux | Analysis suite; visualizes networks and heat map; maps abundance data | http://www.biologicalnetworks.org
Cytoscape | Free | Win Mac Linux | Network analysis; extensive list of plug-ins for advanced visualization | http://www.cytoscape.org
GENeVis | Free | Win Mac Linux | Network and pathway visualization; abundance data | http://tinyurl.com/genevis
Medusa | Free | Win Mac Linux | Basic network visualization tool | http://coot.embl.de/medusa
NBrowse | Free | Win Mac Linux | Network visualization software for heterogeneous interaction data | http://www.gnetbrowse.org
NAViGaTOR | Free | Win Mac Linux | Visualization of large protein-protein interaction data sets; abundance data | http://tinyurl.com/navigator1
Ondex | Free | Win Mac Linux | Integrative workbench; large network visualizations; abundance data | http://www.ondex.org
Osprey | Free | Win Mac Linux | Tool for visualization of interaction networks | http://tinyurl.com/osprey1
Pajek | Free | Win | Generic network visualization and analysis tool | http://pajek.imfm.si
ProViz | Free | Win Mac Linux | Software for visualization and exploration of interaction networks | http://tinyurl.com/proviz
SpectralNET | Free | Win | Network visualizations; scatter plots for dimensionality reduction methods | http://tinyurl.com/spectralnet
Tulip | Free | Win Mac Linux | Generic visualization and analysis tool; extremely large networks; 3D support | http://tulip.labri.fr/TulipDrupal
VANTED | Free | Win Mac Linux | Combined visualization of abundance data and pathways | http://tinyurl.com/vanted
yEd | Free | Win Mac Linux | Generic network visualization software; offers many layout algorithms | http://tinyurl.com/yEdGraph

Cytoscape Plug-ins
BiNoM | Free | Win Mac Linux | Extensive support for common systems biology network formats | http://tinyurl.com/binom1
BioModules | Free | Win Mac Linux | Detects modules in networks; maps abundance data onto nodes and modules | http://tinyurl.com/biomodules
Cerebral | Free | Win Mac Linux | Biologically motivated layout algorithm; maps abundance data; clustering | http://tinyurl.com/cerebral1
MCODE | Free | Win Mac Linux | Network clustering algorithm; support for manual cluster refinement | http://preview.tinyurl.com/MCODE123
VistaClara | Free | Win Mac Linux | Mapping of abundance data to nodes and “heat strips”; provides heat map | http://www.cytoscape.org/plugins

Web-based
Graphle | Free | Win Mac Linux | Distributed client/server network exploration and visualization tool | http://tinyurl.com/graphle
Lichen | Free | - | Library for web-based visualization of network and abundance matrix data | http://tinyurl.com/Lichen1
MAGGIE Data Viewer | Free | - | Visualization of networks; abundance data in heat maps and profile plots | http://maggie.systemsbiology.net
STITCH 2 | Free | - | Construction and visualization of networks from a wide range of sources | http://stitch.embl.de
VisANT | Free | - | Analysis, mining and visualization of pathways and integrated omics data | http://visant.bu.edu

Tools for Pathway Visualization

Name | Cost | Availability | Description | URL

Stand-alone
BioTapestry | Free | Win Mac Linux | Visualization of genetic regulatory networks, also with experimental data | http://www.biotapestry.org
Caleydo | Free | Win Linux | Interactive framework for pathway and expression data; 3D “bucket” view | http://www.caleydo.org
CellDesigner | Free | Win Mac Linux | Drawing and simulation of pathways and models, supports SBGN | http://www.celldesigner.org
Edinburgh Pathway Editor | Free | Win Mac Linux | Construction and visualization of pathway diagrams, supports SBGN | http://tinyurl.com/EdinburghPE
GenMAPP 2 | Free | Win | Pathway visualization and construction; abundance data | http://www.genmapp.org
IngenuityPathways | $ | Win Mac Linux | Full analysis suite; network and pathway visualizations; abundance data | http://tinyurl.com/IngenuityPath
JDesigner | Free | Win | Drawing and simulation of pathways and models | http://tinyurl.com/jdesigner
KaPPA View | Free | Win | Analysis and visualization of plant pathways and mapped abundance data | http://tinyurl.com/kappa-view
KEGG Atlas | Free | Win Mac Linux | Visualization of abundance data on interactive KEGG pathways | http://www.genome.jp/kegg
MetaCore | $ | Win Mac Linux | Pathway, network and omics data analysis and visualization suite | http://www.genego.com
PathVisio | Free | Win Mac Linux | Visualization and editing pathways, supports mapping of omics data | http://www.pathvisio.org
VitaPad | Free | Win Mac Linux | Editing of pathway diagrams, integration of abundance data | http://tinyurl.com/vitapad

Web-based
ArrayXPath | Free | - | Mapping of abundance data to pathway visualizations | http://tinyurl.com/ArrayXPath
GEPA | Free | - | Analysis suite; visualization of transcriptomics data on pathways maps | http://tinyurl.com/GEPAT1
iPath | Free | - | Visualization and exploration of combined KEGG pathways | http://pathways.embl.de
MapMan | Free | - | Application that visualizes abundance data on metabolic pathways | http://tinyurl.com/MapManApp
Omics Viewer | Free | - | Tool that maps abundance data to BioCyc pathway diagrams | http://www.biocyc.org
Pathway Explorer | Free | - | Visualization of abundance data on pathways | http://tinyurl.com/pathwayexp
PATIKA | Free | - | Extensive pathway visualization tool; good support for signaling pathways | http://www.patika.org
Payaologue | Free | - | Collaborative pathway annotation and visualization tool | http://celldesigner.org/payao
ProMeTra | Free | - | Maps abundance matrices of multiple omics data types on pathways | http://tinyurl.com/ProMeTra
Reactome SkyPainter | Free | - | Visualization of overrepresented pathways and reactions from gene lists | http://reactome.org
WikiPathways | Free | - | Wiki-based, community-driven pathway curation and visualization tool | http://www.wikipathways.org

Tools for Visualization of Multivariate Data

Name | Cost | OS | Description | URL

Stand-alone
BicOverlapper | Free | Win Mac Linux | Visualization of biclusters combined with profile plots and heat maps | http://vis.usal.es/bicoverlapper/
BiGGEsTS | Free | Win Mac Linux | Heat map-based bicluster visualization | http://tinyurl.com/BiGGEsTS
Brain Explorer | Free | Win Mac | Visualization of 3D transcription data in the central nervous system | http://tinyurl.com/brainExplorer
Caryoscope | Free | Win Mac Linux | Abundance data mapped to chromosomal location | http://tinyurl.com/caryoscope
Data Matrix Viewer | Free | Win Mac Linux | Simple profile plot visualization; supports Gaggle | http://gaggle.systemsbiology.net
EXPANDER | Free | Win Linux | Heat maps, scatter plots and profile plots of cluster averages | http://acgt.cs.tau.ac.il/expander
GENESIS | Free | Win Mac Linux | Analysis suite; offers several interactive visualizations | http://genome.tugraz.at
GeneSpring GX | $ | Win Mac Linux | Analysis suite; interactive and linked visualizations; also networks | http://tinyurl.com/genespring
GeneVAnD | Free | Win Mac Linux | Linked heat maps, dendrograms and 2D/3D scatter plots | http://tinyurl.com/GeneVAnD
geWorkbench | Free | Win Mac Linux | Modular suite; heat maps, dendrograms, profile and scatter plots | http://tinyurl.com/geWorkbench
Hierarchical Clustering Explorer | Free | Win | Linked heat map, profile and scatter plots; systematic exploration | http://tinyurl.com/HCExplorer
Java TreeView | Free | Win Mac Linux | Linked heat maps, karyoscopes, sequence alignments, scatter plots | http://jtreeview.sourceforge.net
Mayday | Free | Win Mac Linux | Modular suite; many linked visualizations; enhanced heat map | http://tinyurl.com/maydaywp
MultiExperiment Viewer | Free | Win Mac Linux | Analysis suite; heat maps, dendrograms, profile and scatter plots | http://www.tm4.org
PointCloudXplore | Free | Win Mac Linux | Visualization of 3D transcription data in Drosophila embryos | http://tinyurl.com/PointCloudXplore
Spotfire Functional Genomics | $ | Win | Analysis suite; many linked visualizations and exploration tools | http://spotfire.tibco.com
TimeSearcher | Free | Win | Exploration and analysis of time series; advanced profile plots | http://tinyurl.com/timesearcher

R/BioConductor
Geneplotter | Free | Win Mac Linux | Karyoscope-style plots and other visualizations | http://www.bioconductor.org

Web-based
ExpressionProfiler | Free | - | Transcriptomics data analysis suite with basic visualizations | http://tinyurl.com/exprespro
GenePattern | Free | - | Modular analysis platform; several visualization modules available | http://tinyurl.com/GenePatt
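
Heat maps recur throughout the multivariate tools listed above, and Scalable Vector Graphics is one of the browser-native rendering options named under Challenge: Scale. As a rough illustration only, the following Python sketch standardizes each row of a small, made-up abundance matrix and emits an SVG heat map. The function names (zscore_rows, diverging_color, heatmap_svg), the clipping limit of 2 and the example values are assumptions made for this sketch; it is not taken from any of the listed tools.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each row (e.g. one gene) to mean 0 and unit variance."""
    scaled = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        # A constant row has no variation; map it to all zeros.
        scaled.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in row])
    return scaled


def diverging_color(z, limit=2.0):
    """Map a z-score to a blue-white-red hex color, clipped at +/- limit."""
    t = max(-limit, min(limit, z)) / limit  # now in [-1, 1]
    fade = int(round(255 * (1 - abs(t))))
    if t >= 0:
        return f"#ff{fade:02x}{fade:02x}"  # white towards red
    return f"#{fade:02x}{fade:02x}ff"  # white towards blue


def heatmap_svg(matrix, cell=20):
    """Return an SVG document with one <rect> per matrix cell."""
    scaled = zscore_rows(matrix)
    n_rows, n_cols = len(scaled), len(scaled[0])
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{n_cols * cell}" height="{n_rows * cell}">'
    ]
    for r, row in enumerate(scaled):
        for c, z in enumerate(row):
            parts.append(
                f'<rect x="{c * cell}" y="{r * cell}" width="{cell}" '
                f'height="{cell}" fill="{diverging_color(z)}"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical abundance values: 3 genes (rows) x 4 samples (columns).
example = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 5.0, 5.0, 5.0],
    [9.0, 1.0, 9.0, 1.0],
]
svg = heatmap_svg(example)
```

Writing the returned string to a .svg file and opening it in a browser shows the grid. Row-wise standardization is one common choice for heat maps because it emphasizes the shape of each profile over its absolute level; other scalings are equally valid depending on the question.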