Visualization of
Large-Scale Biological Data
Nils Gehlenborg (nils@hms.harvard.edu)
Center for Biomedical Informatics, Harvard Medical School
Miriah Meyer (miriah@seas.harvard.edu)
School of Engineering and Applied Sciences, Harvard University
Bio-IT World 2011
Instructors: Nils Gehlenborg
- background in bioinformatics, PhD thesis on visualization and exploration of gene
expression data
- Research Associate at Center for Biomedical Informatics at Harvard Medical School;
Associated Researcher at the Broad Institute, working on The Cancer Genome Atlas
project
- research interests in information visualization, machine learning, information retrieval
applied to large-scale biological data sets
- IEEE Symposium on Biological Data Visualization (http://www.biovis.net); Workshop on
Visualization of Biological Data (http://www.vizbi.org)
- developer of various software tools and visualization methods for transcriptomics and
proteomics data
Visualization of Large-Scale Biological Data / Bio-IT World 2011 / N Gehlenborg & M Meyer
Instructors: Miriah Meyer
- background in computer science, PhD thesis on processing and visualizing three-dimensional data
- Postdoctoral Research Fellow in the School of Engineering and Applied Sciences at
Harvard University, focusing on visualizing genomics and molecular biology data
- Visiting scientist at the Broad Institute of MIT and Harvard, cofounder of the Data
Visualization Initiative
- research interests in visualization and human-computer interaction applied to
complex biological data sets
- developer of various software tools: MizBee (www.mizbee.org); Pathline
(www.pathline.org); MulteeSum (www.multeesum.org)
Participants: Who are you?
- Where do you work? In industry or academia?
- What is your primary field? Biology? Bioinformatics? Computer Science?
- What is your job title?
- What is your relationship to visualization software? Are you a user or a developer?
- What do you hope to learn today?
What is this course about?
- challenges of large, biological data sets:
- scale: store, process and access
- heterogeneity: interpret and integrate
- course:
- how to use visual representations to interpret complex data sets
- starting with basic principles through examples for biological data types
- pointers to resources
Overview
1. Principles of Visualization
   1. Visual Representation
   2. Multiple Views
   3. Design Process
2. Key Methods and Software Tools
   1. Applications for Visualization
   2. Methods and Tools
   3. Design of Visualization Systems
Part 1
Principles of Visualization
Engadget
Exercise: Critique
[Figure: pie chart "U.S. SmartPhone Marketshare" with segments for RIM, Apple, Palm, Motorola, Nokia and Other (shares of 39.0%, 21.2%, 19.5%, 9.8%, 7.4% and 3.1%)]
Definition: Visualization
The use of computer-supported, interactive, visual
representations of data to amplify cognition.
Card, Mackinlay & Shneiderman 1999
1.1
Visual Representation
slide adapted from Munzner 2011, Visualization Principles
Visual Encoding of Data
data
- tabular
  - categorical (e.g. apples, oranges, bananas)
  - ordered
    - ordinal (e.g. small, medium, large)
    - quantitative (e.g. 10 inches, 13 inches, 18.5 inches)
- relational: trees, networks
- spatial: intrinsic position
slide from Munzner 2011, Visualization Principles
Visual channel types and rankings
slide from Munzner 2011, Visualization Principles
Power of the plane: only position works for all!
slide from Munzner 2011, Visualization Principles
Ranking differs for all other channels
Where do rankings come from?
- user studies, psychophysical experiments, principles from graphic design
- accuracy, discriminability, separability, popout
Using Rankings
- Effectiveness Principle: encode the most important data attributes with the highest-ranked channels
[Figure: example chart (labels: time, value, low, high)]
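The Effectiveness Principle can be read as a greedy assignment: walk through the attributes from most to least important and give each one the best-ranked channel that is still free. The Python sketch below illustrates that idea. The channel lists are a simplified, hypothetical ordering (planar position first, then length, angle, area and color) chosen for illustration; they are not a reproduction of the ranking figure on the slides, and `assign_channels` is an invented helper.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified rankings (best first). Planar position leads both
# lists, echoing "only position works for all"; the remaining order is illustrative.
ORDERED_CHANNELS = ["x position", "y position", "length", "angle", "area", "luminance"]
CATEGORICAL_CHANNELS = ["x position", "y position", "hue", "shape"]


def assign_channels(attributes: List[Tuple[str, str]]) -> Dict[str, str]:
    """Greedily map attributes to channels.

    `attributes` lists (name, kind) pairs from most to least important, where
    kind is "ordered" (ordinal or quantitative) or "categorical". Each attribute
    gets the highest-ranked channel for its kind that is not already in use.
    """
    used = set()
    mapping: Dict[str, str] = {}
    for name, kind in attributes:
        if kind == "ordered":
            ranking = ORDERED_CHANNELS
        elif kind == "categorical":
            ranking = CATEGORICAL_CHANNELS
        else:
            raise ValueError(f"unknown attribute kind {kind!r}")
        channel = next((c for c in ranking if c not in used), None)
        if channel is None:
            raise ValueError(f"no free channel left for attribute {name!r}")
        used.add(channel)
        mapping[name] = channel
    return mapping


if __name__ == "__main__":
    print(assign_channels([
        ("expression level", "ordered"),
        ("sample", "categorical"),
        ("tissue", "categorical"),
    ]))
    # {'expression level': 'x position', 'sample': 'y position', 'tissue': 'hue'}
```

Because any ranking of this kind places position and length above angle, such an assignment favors bar charts over pie charts when shares are to be compared.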
Using Rankings
[Figure: values for categories A, B, C and D in Year 1 and Year 2]
Using Rankings
[Figure: the same Year 1 and Year 2 data for categories A, B, C and D, plotted on a common value axis from 0 to 27]
Visual Encoding of Data
data
- abstract (nonspatial)
  - tabular: categorical; ordered (ordinal, quantitative)
  - relational
- spatial
Using spatial encoding for spatial data versus abstract (nonspatial) data.
Common Pitfalls
1. Color
2. 3D
3. Animation
after Borland & Taylor 2007, IEEE CG&A
1. Color Pitfalls: Rainbow Color Map
hard to order
easy to order
lower resolution
creates artifacts
Rogowitz & Treinish 1996, http://researchweb.watson.ibm.com/people/l/lloydt/color/color.HTM
1. Color Pitfalls: Rainbow Color Map
Southeastern United States and Gulf of Mexico
Problems:
- zero crossing not explicit
- lack of ordering of colors makes it hard to interpret the map
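One way to see why the rainbow map is "hard to order" is to compute the luminance of its colors: it rises and falls instead of changing monotonically. The sketch below uses only Python's standard library to compare a red-to-blue hue sweep with a single-hue ramp from white to dark blue. The specific colors, the sRGB relative-luminance formula (Rec. 709 weights) and the helper names are our own choices for illustration.

```python
import colorsys
from typing import List, Tuple

RGB = Tuple[float, float, float]


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color (Rec. 709 / sRGB primaries)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rainbow(n: int) -> List[RGB]:
    """n colors sweeping hue from red (0) to blue (2/3) at full saturation."""
    return [colorsys.hsv_to_rgb(2 / 3 * i / (n - 1), 1.0, 1.0) for i in range(n)]


def single_hue(n: int) -> List[RGB]:
    """n colors interpolated linearly from white to a dark blue."""
    dark = (0.03, 0.19, 0.42)
    return [tuple(1 - (1 - d) * i / (n - 1) for d in dark) for i in range(n)]


def is_monotonic(values: List[float]) -> bool:
    """True if the sequence never changes direction."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


if __name__ == "__main__":
    for name, cmap in [("rainbow", rainbow(7)), ("single hue", single_hue(7))]:
        lum = [round(relative_luminance(c), 3) for c in cmap]
        print(name, lum, "monotonic:", is_monotonic(lum))
```

The rainbow's luminance peaks at yellow and drops again toward blue, so perceived lightness does not follow the data order; the single-hue ramp darkens steadily.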
Wong 2010, Nature Methods
1. Color Pitfalls: Relativity
Color is a relative medium and context matters
Only 6-12 colors are visually discernible
Cinteny: flexible analysis and visualization of synteny and genome
rearrangements in multiple organisms. Sinha and Meller. Bioinformatics 2007
1. Color Pitfalls: Discriminability
estimate: Howard Hughes Medical Institute, http://www.hhmi.org/senses/b130.html
1. Color Pitfalls: Color Blindness
Normal Vision
Deuteranope Vision
(“Red-Green Blindness”)
~ 7% of male population affected
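For categorical data, a practical guard against both the discriminability limit and red-green color blindness is to draw from a small, fixed palette designed for color-vision deficiencies and to refuse to go beyond it. The sketch below uses the eight-color Okabe-Ito palette; the function name `categorical_colors` and the decision to raise an error rather than recycle colors are our own assumptions.

```python
from typing import Dict, List

# Okabe-Ito palette: eight colors chosen to stay distinguishable for
# common forms of color-vision deficiency.
OKABE_ITO = [
    "#000000",  # black
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def categorical_colors(categories: List[str]) -> Dict[str, str]:
    """Assign one palette color per distinct category, in first-seen order.

    Raises ValueError instead of reusing colors once the palette runs out,
    since recycled hues would make categories indistinguishable.
    """
    distinct = list(dict.fromkeys(categories))
    if len(distinct) > len(OKABE_ITO):
        raise ValueError(
            f"{len(distinct)} categories exceed the {len(OKABE_ITO)}-color palette; "
            "consider grouping categories or using another channel such as position"
        )
    return {cat: OKABE_ITO[i] for i, cat in enumerate(distinct)}
```

Needing nine or more colors is usually a sign that color alone should not carry the distinction.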
Ware 2008,Visual Thinking for Design
2. 3D
- the high ranking of spatial position applies to planar position, not to depth
- how we see in 3D:
  - rapid eye movements
  - head and body movements
- legitimate for 3D spatial data, difficult to justify for abstract data
Moore et al. 2011, Proceedings of Pacific Symposium on Biocomputing 2011
2. 3D Pitfalls: Perspective
- perspective distortion: interferes with size channel encoding
- shading: interferes with color, lightness, and saturation channel encodings
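The size distortion follows from perspective projection: in a simple pinhole camera model, an object's projected length scales with focal length divided by depth. The sketch below assumes that model with an arbitrary focal length of 1 and shows two bars encoding the same value receiving different on-screen lengths purely because of their depth.

```python
def projected_length(true_length: float, depth: float, focal: float = 1.0) -> float:
    """On-screen length of an object at distance `depth` from a pinhole camera."""
    if depth <= 0:
        raise ValueError("depth must be positive (object in front of the camera)")
    return focal * true_length / depth


if __name__ == "__main__":
    # Two bars encoding the same value (length 2.0) at different depths.
    near = projected_length(2.0, depth=2.0)
    far = projected_length(2.0, depth=4.0)
    print(near, far)  # 1.0 0.5: the farther bar looks half as large
```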
BioLayout 3D 2.0 sample dataset, http://www.biolayout.org
2. 3D Pitfalls: Occlusion
2. 3D Pitfalls: Text Legibility
Mukherjea and Foley 1995, Visualizing the World-Wide Web
with the Navigational View Builder.
3. Animation
- external versus internal memory
  - easy to compare by moving eyes between views
  - hard to compare view to memory of what you saw
- when to use animation?
  - good: chronological storytelling
  - good: transition between states
  - poor: multiple states with multiple changes
3. Animation Pitfall
Global comparisons are difficult
Barsky et al. 2008, Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context
3. Animation Pitfall
Small Multiples: one view per state
- show time with space
Barsky, Munzner, Gardy, Kincaid 2008, Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context.
1.2
Multiple Views
Roberts 2007, Coordinated and Multiple Views in Exploratory Visualization
Linked Views
- beyond static views, multiple linked views
- “allow the user to have a dialog with the data”
- technique that allows for data exploration
- interactive, multiple views of the data
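Linking is commonly implemented by having all views share one selection model: a brush in any view updates the shared selection, and every registered view is notified so it can redraw its highlighted items. The Python sketch below assumes this observer-style design; the `LinkedSelection` class and its callback signature are hypothetical and not taken from any specific tool.

```python
from typing import Callable, List, Set

Listener = Callable[[Set[int]], None]


class LinkedSelection:
    """Selection of record indices shared by several coordinated views."""

    def __init__(self) -> None:
        self._selected: Set[int] = set()
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a view's redraw callback."""
        self._listeners.append(listener)

    def brush(self, indices: Set[int]) -> None:
        """Replace the selection (e.g. after a rectangle brush) and notify all views."""
        self._selected = set(indices)
        for listener in self._listeners:
            listener(set(self._selected))

    @property
    def selected(self) -> Set[int]:
        return set(self._selected)


if __name__ == "__main__":
    selection = LinkedSelection()
    # Two hypothetical views that only report what they would highlight.
    selection.subscribe(lambda s: print("scatterplot highlights", sorted(s)))
    selection.subscribe(lambda s: print("table highlights", sorted(s)))
    selection.brush({3, 7, 11})  # brushing in one view updates both
```

Keeping the selection outside the views means a new view only has to subscribe; no view needs to know about any other.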
[Figure: linked views highlighting "large−pse outliers"; plot axes range from 0.1 to 0.9]
images courtesy of Angela DePace and Charles Fowlkes
Meyer et al. 2010, MulteeSum: A Tool for Comparative Spatial and Temporal Gene Expression Data
1.3
Design Process
[Figure: design process diagram with the stages target, translate, design, implement and validate, shown alongside user-centered design, usability engineering, participatory design and evaluate]
Carpendale 2008, Evaluating Information Visualizations
Validation Methods
Engadget
Exercise: Critique
[Figure: pie chart "U.S. SmartPhone Marketshare" (two versions)]
[Figure: bar chart "U.S. SmartPhone Marketshare" on a 0% to 40% axis; bars in the order RIM, Other, Apple, Palm, Motorola, Nokia with values 39.0, 21.2, 19.5, 9.8, 7.4 and 3.1 (%)]
Part 2
Key Methods and
Software Tools
Applications for Data Visualization
1. Presentation
“A picture is worth a thousand words.”
“A good sketch is better than a long speech.” (Napoleon Bonaparte)
2. Confirmation
“I believe it when I see it.”
3. Exploration
Minard 1869
Presentation: March on Moscow
Anscombe 1973,The American Statistician
Confirmation: Anscombe’s Quartet
mean(X) = 9, var(X) = 11, mean(Y) = 7.5, var(Y) = 4.12,
cor(X,Y) = 0.816, linear regression line Y = 3 + 0.5*X
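The summary statistics above can be reproduced with a few lines of standard-library Python using Anscombe's published values. All four data sets agree to two or three decimal places, even though their scatterplots look nothing alike. The helper name `summarize` is our own.

```python
from statistics import mean, variance
from typing import List, Tuple

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x: List[float], y: List[float]) -> Tuple[float, ...]:
    """Mean and sample variance of x and y, Pearson r, least-squares intercept and slope."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = my - slope * mx
    return mx, variance(x), my, variance(y), r, intercept, slope


if __name__ == "__main__":
    print("set  mean(X) var(X) mean(Y) var(Y) cor  intercept slope")
    for name, (x, y) in QUARTET.items():
        print(name, " ".join(f"{v:.3f}" for v in summarize(x, y)))
```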
Anscombe 1973,The American Statistician
Confirmation: Anscombe’s Quartet
Exploration: Hypothesis Generation
trends
gaps
outliers
clusters
- A large data set is given and the goal is to learn something about it.
- Visualization is employed to perform pattern detection using the human visual system.
- The goal is to generate hypotheses that can be tested with statistical methods or
follow-up experiments.
Exploration: Hypothesis Generation
- Visualization for exploration is an “Exploratory Data Analysis” technique (Tukey
1977). Statistical graphics such as box plots and scatter plots are early examples.
- When there is a specific question that can easily be determined algorithmically
(“What is the highest value?”), then visualization is usually not the right tool.
- When it is not clear what should be asked, or when the answer cannot be
summarized easily (“What is the distribution of the values?”), visualization is an
excellent choice.
- Visualization for exploration is challenging because data sets keep growing in
size and heterogeneity.
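The contrast between the two kinds of questions is easy to see in code: the highest value is a one-line computation, while a distribution only becomes graspable once it is drawn. The sketch below prints a crude text histogram with equal-width bins; the helper names, the bin count and the sample values are arbitrary choices for illustration.

```python
from typing import List, Sequence


def histogram_counts(values: Sequence[float], bins: int) -> List[int]:
    """Count values in `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes into the last bin
        counts[idx] += 1
    return counts


def text_histogram(values: Sequence[float], bins: int = 10) -> str:
    """Render one row of '#' characters per bin."""
    return "\n".join("#" * c for c in histogram_counts(values, bins))


if __name__ == "__main__":
    data = [1.2, 1.9, 2.1, 2.2, 2.4, 3.0, 3.1, 7.8, 8.0, 8.1]
    print("highest value:", max(data))   # a question with a single answer
    print(text_histogram(data, bins=5))  # shows two groups separated by a gap
```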
Shneiderman 1996, in Proceedings IEEE Symposium on Visual Languages
Exploration: Information Seeking Mantra
- In exploratory settings, the user is usually dealing with large amounts of data.
- It is impossible to grasp everything at once.
- Solution: Make visualizations interactive to support the user in exploring subsets of
the data at different resolutions.
- Ben Shneiderman’s Information Seeking Mantra:
- Overview first, zoom and filter, then details on demand.
- Overview first, zoom and filter, then details on demand.
- Overview first, zoom and filter, then details on demand.
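The mantra maps naturally onto three operations over a data set. The sketch below applies them to a hypothetical list of gene records (the `Gene` type and its fields are invented for illustration): an aggregated overview, a zoom-and-filter step that narrows to one chromosome and an expression threshold, and a details-on-demand lookup for a single item.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Gene:
    """Hypothetical record: name, chromosome, and one expression value."""
    name: str
    chrom: str
    expression: float


def overview(genes: List[Gene]) -> Dict[str, int]:
    """Overview first: aggregate counts per chromosome instead of showing every gene."""
    return dict(Counter(g.chrom for g in genes))


def zoom_and_filter(genes: List[Gene], chrom: str, min_expression: float) -> List[Gene]:
    """Zoom into one chromosome and filter out weakly expressed genes."""
    return [g for g in genes if g.chrom == chrom and g.expression >= min_expression]


def details_on_demand(genes: List[Gene], name: str) -> Optional[Gene]:
    """Details on demand: the full record for the item the user asked about."""
    return next((g for g in genes if g.name == name), None)
```

In an interactive tool, each function would back one interaction: the overview is drawn at startup, filtering is driven by sliders or brushing, and the detail lookup by a click or hover.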
2.1
Key Methods and
Software Tools
Biological Data Types
- Experimental data and knowledge
- Detail and overview
- example: 3D structure of a protein versus metabolome map of an organism
- Complex relationships
- example: gene expression data, protein-DNA interactions, sequence motifs
Biological data is heterogeneous, complex and
often very large!
Exercise: Biological Data Types
[Figure: data taxonomy (tabular: categorical, ordered (ordinal, quantitative); relational; spatial) alongside the biological data types protein structure, gene expression data, genome sequence, pathway, phylogeny and sequence alignment]
Overview: Biological Data Types
Sequences
genes, alignments, genomes
Multivariate Data
gene and protein expression levels, metabolite concentrations
Networks
protein interactions, gene regulation, metabolic pathways
http://host13.bioinfo3.ifom-ieo-campus.it/fancygene/
Sequences: Genes
- Genes are linear sequences over a nucleotide or amino acid alphabet
- Visualization of primary sequence and additional annotation data (e.g.
gene architecture, isoforms)
NM_000546 gattggggttttcccctcccatgtgctcaagactggcgctaaaagttttgagcttctcaaaagtctagagc
NP_000537 MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAA
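Drawing gene architecture boils down to mapping genomic coordinates onto a fixed number of screen columns. The sketch below draws exons as `#` and introns as `-` on a text track. The half-open exon coordinates, the column-overlap rule and the function name are our assumptions, not the convention of a particular browser.

```python
from typing import List, Tuple


def render_gene_model(length: int, exons: List[Tuple[int, int]], width: int) -> str:
    """Draw a gene of `length` bases as `width` text columns.

    Exons are half-open intervals [start, end) in gene coordinates. A column is
    drawn as exon ('#') if any exon overlaps the bases it covers, else as intron ('-').
    """
    bases_per_col = length / width
    track = []
    for col in range(width):
        col_start = col * bases_per_col
        col_end = (col + 1) * bases_per_col
        in_exon = any(start < col_end and end > col_start for start, end in exons)
        track.append("#" if in_exon else "-")
    return "".join(track)


if __name__ == "__main__":
    print(render_gene_model(100, [(0, 20), (50, 70), (90, 100)], width=20))
```

Isoforms could be shown the same way, one track per transcript, stacked under a shared coordinate scale.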
Sequences: Alignments
A multiple sequence alignment is basically a matrix:
- rows correspond to sequences
- columns correspond to aligned sites
gi|40254597|ref|NM_009871.2|    TCTGAG---GTGGGCTCCGACCATGAGCTCCAGGCTGTCCTGCTGACCTGTCTGTACCTCTCCTATTCCTACATGGGCAATGAGATCTCCT 88
gi|16758761|ref|NM_053891.1|    TCTGAG---GTGGGCTCGGACCACGAGCTCCAGGCTGTCCTGCTGACCTGTCTGTACCTCTCCTATTCCTACATGGGCAATGAAATCTCCT 88
gi|34304373|ref|NM_003885.2|    TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|114668067|ref|XM_001158783.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|297700499|ref|XM_002827237.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACATGCCTGTACCTCTCGTACTCCTACATGGGCAACGAGATCTCCT 88
gi|297272338|ref|XM_001113136.  TCCGAG---GTGGGCTCGGATCACGAGCTCCAGGCCGTCCTGCTGACGTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|296202033|ref|XM_002748377.  TCGGAG---GTGGGCTCAGATCACGAGCTCCAGGCCGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAAATCTCCT 88
gi|301753157|ref|XM_002912351.  TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCATCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|73966855|ref|XM_548274.2|    TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCATCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|194217295|ref|XM_001501617.  TCTGAG---GTGGGCTCCGACCACGAGCTCCAGGCTGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|148539985|ref|NM_174512.3|   TCCGAG---GTGGGTTCCGACCACGAGCTCCAGGCGGTCCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|162287175|ref|NM_001101816.  TCCGAG---GTGGGCTCCGACCACGAGCTCCAGGCTGTCCTGCTGACCTGCCTGTACCTTTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|291405555|ref|XM_002718944.  TCCGAG---GTGGGCTCGGACCACGAGCTCCAGGCCGTGCTGCTGACCTGCCTGTACCTCTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|126313846|ref|XM_001368043.  TCTGAG---GTCGCCACGGACCATGAGCTACAGGCTGTCCTGTTGACCTGCCTGTACCTCTCCTATTCCTACATGGGCAATGAGATCTCCT 88
gi|149599543|ref|XM_001510922.  CCCGAG---CTGGCCGCCGACCACGAGCTGCAGGCCGTCCTGCTCACCTGCCTCTACCTGTCCTACTCCTACATGGGCAACGAGATCTCCT 88
gi|301626334|ref|XM_002942302.  GGGGACTCTGTGGCCACCGAACATGACTTGCAAGCCACCCTCTTGACCTGCCTCTATTTGTCCTACTCCTACATGGGCAACGAGATATCCT 91
gi|147905168|ref|NM_001085672.  GGGGACTCTGTGGCCACCGAACATGACTTGCAAGCCACCCTTCTAACCTGCCTCTACTTGTCCTACTCTTACATGGGCAACGAGATATCCT 91
gi|50540099|ref|NM_001002515.1  TCTGAG---GTGGCCACAGAGCACGAGCTGCAGGCCGTCCCGCTGACCTGCCTCTACCTGTCTTACTCATACATGGGCAATGAGATCTCGT 88
                                ..1150......1160......1170......1180......1190......1200......1210......1220......1230.....
Example from ClustalW http://www.clustal.org/
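Treating the alignment as a matrix makes column-wise summaries straightforward. The sketch below recomputes a ClustalW-style conservation line, marking with `*` every column in which all sequences carry the same non-gap character. ClustalW's additional `:` and `.` similarity marks for protein alignments are deliberately left out, and the function name is our own.

```python
from typing import List


def conservation_line(alignment: List[str]) -> str:
    """Return '*' for fully conserved, gap-free columns and ' ' otherwise.

    `alignment` holds equal-length rows (sequences); columns are aligned sites.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*alignment):  # iterate over the columns of the matrix
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


if __name__ == "__main__":
    rows = ["TCTGAG---GTGGGC", "TCCGAG---GTGGGC", "GGGGACTCTGTGGCC"]
    for row in rows:
        print(row)
    print(conservation_line(rows))
```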
http://www.jalview.org
Sequences: Alignments
Amino acid sequence alignment with amino acid color code
http://www.jalview.org
Sequences: Alignments
Amino acid sequence alignment with hydrophobicity color code
Sequences: Alignments
Tools to visualize alignments need to support
1. Computation of alignments
2. Various color maps for nucleotides / amino acids / chemical properties
3. Editing and analysis of the sequences
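A property-based color code like the hydrophobicity view can be approximated by mapping each residue's hydropathy value onto a diverging red-white-blue scale. The sketch below uses the Kyte-Doolittle hydropathy index. The exact colors, the linear interpolation and the function name are our assumptions and will not match Jalview's scheme pixel for pixel.

```python
from typing import Optional

# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_color(residue: str) -> Optional[str]:
    """Hex color for one residue: red = hydrophobic, blue = hydrophilic.

    Gaps and unknown symbols return None so the viewer can leave them blank.
    """
    value = KYTE_DOOLITTLE.get(residue.upper())
    if value is None:
        return None
    t = max(-1.0, min(1.0, value / 4.5))  # scale to [-1, 1]
    fade = round(255 * (1 - abs(t)))      # 255 at zero (white), 0 at the extremes
    if t >= 0:
        r, g, b = 255, fade, fade         # white -> red
    else:
        r, g, b = fade, fade, 255         # white -> blue
    return f"#{r:02x}{g:02x}{b:02x}"


if __name__ == "__main__":
    print([hydrophobicity_color(aa) for aa in "MEEPQ-"])
```

Swapping the lookup table is enough to switch to another property, which is the flexibility the list above asks for.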
Sequences: Genomes
- Raw data
- reads from sequencing
- Primary data
- DNA sequence: chromosomes are either linear or circular
- Annotation
- proteins
- gene models (exon-intron structure etc.)
- ontology
Sequences: Genome “Browsers”
- display genomic data in a “position-centric” view
- genome serves as reference for positions
- usually track-based
- varying levels of interactivity
- browsing vs exploration
- web-browser-based or desktop applications
http://genome.ucsc.edu
Sequences: UCSC Genome Browser
- most commonly used browser
- supports basically any data type that
can be mapped to the genome
- “classic implementation”: images are
rendered on the server and
embedded in the webpage
http://genome.ucsc.edu
Sequences: UCSC Genome Browser
http://genome.ucsc.edu
Sequences: UCSC Genome Browser
Sequences: UCSC Genome Browser
“squished”
“dense”
“packed”
“full”
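Three of these modes can be understood as different row-assignment strategies: "dense" collapses every feature into a single row, "full" gives each feature its own row, and "packed" reuses a row whenever features do not overlap. The Python sketch below shows a greedy packing of that kind. It rests on our own assumptions (half-open intervals, no padding between features), uses invented function names, and is not the UCSC Genome Browser's implementation.

```python
from typing import List, Tuple


def pack_rows(features: List[Tuple[int, int]]) -> List[int]:
    """Assign each half-open interval [start, end) a row so overlaps never share one.

    Features are processed by start position; each goes into the first row whose
    last feature ends at or before its start. Returns row indices in input order.
    """
    order = sorted(range(len(features)), key=lambda i: features[i][0])
    row_ends: List[int] = []  # end coordinate of the last feature in each row
    rows = [0] * len(features)
    for i in order:
        start, end = features[i]
        for r, last_end in enumerate(row_ends):
            if last_end <= start:
                row_ends[r] = end
                rows[i] = r
                break
        else:  # no free row: open a new one
            row_ends.append(end)
            rows[i] = len(row_ends) - 1
    return rows


def dense_rows(features: List[Tuple[int, int]]) -> List[int]:
    """'dense': every feature drawn in a single row."""
    return [0] * len(features)


def full_rows(features: List[Tuple[int, int]]) -> List[int]:
    """'full': one row per feature."""
    return list(range(len(features)))
```

Packing keeps the track compact while still showing every feature separately, which is why it sits between the other two modes in vertical space.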
http://www.genomeview.org
Sequences: GenomeView
- next-generation genome browser
- annotation editor: sequences,
annotation, multiple alignments,
syntenic mappings, short read
alignments and more can be
displayed
http://www.broadinstitute.org/igv
Robinson et al. 2011, Nature Biotechnology
Sequences: Integrative Genomics Viewer (IGV)
- visualization tool for interactive
exploration of large, integrated
datasets.
- supports a wide variety of data
types including sequence
alignments, expression data, copy
number variation, RNA-seq,
annotations
Robinson et al. 2011, Nature Biotechnology
Sequences: Integrative Genomics Viewer (IGV)
Robinson et al. 2011, Nature Biotechnology
Sequences: Integrative Genomics Viewer (IGV)
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
reference sequence
read coverage
reads
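The read coverage track is derived from the read alignments drawn below it: the coverage at each position is the number of reads spanning that position. A difference-array sketch in Python follows. Reads are assumed to be given as half-open [start, end) intervals on the reference, a simplification of real alignment records (no gaps, clipping or strand), and the function name is our own.

```python
from itertools import accumulate
from typing import List, Tuple


def read_coverage(ref_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-base depth for reads given as half-open [start, end) intervals."""
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(ref_length, end)  # clip to the reference
        if start < end:
            diff[start] += 1  # coverage rises where a read begins
            diff[end] -= 1    # and falls just past where it ends
    return list(accumulate(diff))[:ref_length]


if __name__ == "__main__":
    print(read_coverage(10, [(0, 5), (3, 8)]))
```

The running sum visits each read once and each base once, so coverage for a region can be computed quickly before it is drawn at any zoom level.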
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://www.savantbrowser.com
Fiume et al. 2010, Bioinformatics
Sequences: Savant Genome Browser
http://mkweb.bcgsc.ca/circos
Krzywinski et al. 2009, Genome Research
Sequences: Circos
Clark et al. 2009, PLoS Genetics
Jones et al. 2010, Genome Biology
Sequences: Comparative Genomics
[Figure: screenshots of comparative genomics tools]
http://www.mizbee.org
http://genome.lbl.gov/vista
http://compbio.med.harvard.edu/flychromatin/
Kharchenko et al. 2010, Nature
Sequences: Epigenomics
Lieberman-Aiden et al. 2009, Science
Sequences: 3D Genome Structure
Multivariate Data
- typical “omics” data: transcriptomics, proteomics, metabolomics
- expression/concentration levels of many biological entities (transcripts, proteins, etc.)
across many different conditions/time points
- entity levels measured per sample on a “genome-wide” scale
- often entities are not measured directly
[Figure: levels of Entity A and Entity B across conditions]
Multivariate Data
[Figure: diagram "Interaction Networks and Pathways integrated with Expression and Concentration Data"; elements: microarray image, RNA-seq reads, 1D and 2D gel data, NMR spectra, mass spectra, protein-protein/protein-nucleotide interactions (various techniques), peptide map, protein map, metabolite map, gene expression matrix, protein expression matrix, metabolite concentration matrix, matrix, graph, pathway, insight]
Multivariate Data: “Raw” Data
[Figure: the same diagram "Interaction Networks and Pathways integrated with Expression and Concentration Data", with the “Raw” Data elements highlighted]
Multivariate Data: Transcriptomics
- Microarray scans as images
- Scatterplot: comparison of two distributions (experiments) of expression values
- Profile plot: individual gene expression across experiments, often used in combination
with clustering
- Heatmap: colored view on full expression matrix, used in combination with clustering
to place similar profiles next to each other
- Dendrogram: hierarchical clustering of genes or experiments, often combined with
heatmap to provide more information about the cluster structures
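Heatmap rows are usually reordered by hierarchical clustering so that similar profiles sit next to each other, and the same merge tree is what the dendrogram draws. The sketch below is a deliberately naive average-linkage clustering with Euclidean distance, suitable only for small matrices; it returns the merge tree, whose leaf order gives the heatmap row order. Real tools use optimized implementations and often other distances, such as one minus the correlation. All function names are our own.

```python
import math
from typing import List, Sequence, Tuple, Union

# A merge tree is either a row index (leaf) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leaves(tree: Tree) -> List[int]:
    """Left-to-right leaf order of a merge tree (the heatmap row order)."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def average_linkage(rows: List[Sequence[float]]) -> Tree:
    """Repeatedly merge the two closest clusters, where cluster distance is the
    mean pairwise distance between their member rows."""
    if not rows:
        raise ValueError("need at least one row")
    clusters: List[Tree] = list(range(len(rows)))
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = leaves(clusters[i]), leaves(clusters[j])
                d = sum(euclidean(rows[p], rows[q]) for p in a for q in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


if __name__ == "__main__":
    expression = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0], [1.1, 2.1, 3.2], [9.8, 10.2, 10.1]]
    print(leaves(average_linkage(expression)))  # similar profiles end up adjacent
```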
Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Transcriptomics
[Figure: scatterplot of expression values, panel "after normalization"]
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! ! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!!! !
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
!
!
!
!
!!!!!!! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! !
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!!
!
!
!!! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!!!
!
!
!
!
! !
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!
!
!
!
!!!
!
!
!
!
!
! !! ! ! ! !
!
!
!!!
!
!
!
!
!
!
!!!!!! !
!
!
!!!
!
!
! !!!!!
!
!
!
!
!
!!
!
!!
!
!
!!
!
!
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
! !!
!!
!
!
!
! !
!
!
!!!
!!
!!!
!!
!
!
!
!
!
!!
!! !
!
!
!
! !
!
!
!
!!!!!
!
!
!
!
!
!
!
!
!
!! !
!
!
!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!! !
! !!
!!
!
!
! !! ! ! !
! !!
!
!
!
!
!
!
!
!
! !! !
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!!
!
!!
!
!
! !
!
!!
!! ! !
!!
!!
! !!!
!
!
!
! !
! !! !! ! ! !
! !!
! ! !!
!
! !
!
!!!
! !
!
!
!
! !
!
! !!!
!
! !
!
!!
!! !
!
! !
!
!! ! ! !
!
!
!!
!
!
!
!
!!!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
log expression ratio
!
!
−2
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
!
−4
!
!
−4
16
6
8
10
A = 0.5 * log(R*G)
12
14
16
Array3
14
Array3
A = 0.5 * log(R*G)
12
Array2
10
Array1
8
Array3
6
Array2
!
Array1
M = log(R/G)
2
!
b
4
a
1
!
!
Array2
before normalization
Box Plot: 3 arrays
Array1
MA Plot: 1 array
1 = before normalization
2 = after within-array normalization
3 = after between-array normalization
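The two axes of the MA plot follow directly from the red (R) and green (G) channel intensities of each spot. The sketch below assumes base-2 logarithms, a common convention for two-color arrays (the slide only writes log), and uses made-up intensities:

```python
import math


def ma_values(red, green):
    """Return (M, A) for each pair of red/green spot intensities.

    M = log2(R/G) is the log ratio and A = 0.5 * log2(R*G) the average
    log intensity. Base 2 is an assumption of this sketch.
    """
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive")
        pairs.append((math.log2(r / g), 0.5 * math.log2(r * g)))
    return pairs


# Hypothetical intensities for three spots
for m, a in ma_values([1024.0, 256.0, 4096.0], [512.0, 256.0, 1024.0]):
    print(f"M = {m:.2f}, A = {a:.2f}")
# M = 1.00, A = 9.50
# M = 0.00, A = 8.00
# M = 2.00, A = 11.00
```

Spots with M close to 0 are equally bright in both channels; within-array normalization typically tries to center the point cloud around M = 0 across the whole range of A.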
Multivariate Data: Proteomics
- quantitative proteomics aims to measure the expression level of “all” proteins (as many as possible) in a sample
- quantitative shotgun proteomics produces large and complex datasets (hundreds of GB per run)
- the data are obtained from liquid chromatography coupled with mass spectrometry (LC-MS or LC-MS/MS) measurements
Multivariate Data: Proteomics
Pep3D www.proteomecenter.org
www.open-ms.de
Multivariate Data: Proteomics
Multivariate Data: Proteomics
TOPPView www.open-ms.de
Multivariate Data: Derived Matrices
[Figure: overview of data types, grouped as 1D, 2D, Matrix and Graph data: Mass Spectra, NMR Spectra, Microarray Image, Gel Data, RNA-seq Reads (various techniques), Peptide Map, Protein Map, Metabolite Map, Gene Expression Matrix, Protein Expression Matrix, Metabolite Concentration Matrix, Pathway, Protein-Protein/Protein-Nucleotide Interactions, and Interaction Networks and Pathways integrated with Expression and Concentration Data. Highlighted: Derived Matrices.]
Multivariate Data: Derived Matrices
- matrices of multi-dimensional vectors
- usually abundance profiles, e.g. transcript or protein levels, metabolite concentrations
[Figure 4.2: Expression matrix with associated gene and sample attributes. See Figure 1.1. Labels: Meta Information; Sample Attributes (Factor and Factor Value, e.g. Wild Type, Gene A-/-, Gene B-/-, each with samples M, D1 and D2); Gene Attributes; Expression Profile; Sample.]
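The matrix and its attribute tables in Figure 4.2 map naturally onto a small container type. This is a sketch only: the class name `ExpressionMatrix`, its fields and the `genotype` factor in the usage lines are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix with attribute tables for rows and columns."""
    genes: list[str]
    samples: list[str]
    values: list[list[float]]  # values[i][j]: level of genes[i] in samples[j]
    gene_attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_attributes: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        if len(self.values) != len(self.genes):
            raise ValueError("one row of values is required per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("each row needs one value per sample")

    def profile(self, gene: str) -> list[float]:
        """Expression profile (one row) of a gene across all samples."""
        return self.values[self.genes.index(gene)]

    def samples_with(self, factor: str, value: str) -> list[str]:
        """Samples whose attribute `factor` has the given factor value."""
        return [s for s in self.samples
                if self.sample_attributes.get(s, {}).get(factor) == value]


# Hypothetical data: 2 genes x 3 samples, one experimental factor
m = ExpressionMatrix(
    genes=["g1", "g2"],
    samples=["s1", "s2", "s3"],
    values=[[0.2, 0.4, 1.0], [0.0, 0.8, 0.5]],
    sample_attributes={"s1": {"genotype": "Wild Type"},
                       "s2": {"genotype": "Gene A-/-"},
                       "s3": {"genotype": "Wild Type"}},
)
print(m.profile("g2"))                          # [0.0, 0.8, 0.5]
print(m.samples_with("genotype", "Wild Type"))  # ['s1', 's3']
```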
Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Derived Matrices
[Figure: Scatter Plot of Principal Component 1 against Principal Component 2]
Multivariate Data: Derived Matrices
- Scatter Plots and Dimensionality Reduction
- used to visualize high-dimensional profiles as projections into lower-dimensional spaces (usually 2D, sometimes 3D)
- some information is always lost in the projection; the goal is to minimize this loss
- many different algorithms: Principal Component Analysis (PCA), Multidimensional Scaling (MDS), Isomap, etc.
- Pros - a good choice to get an idea of the overall structure of the whole data set: clusters, outliers, gaps in the data
- Cons - because of the dimensionality reduction, the original profiles are not accessible in the visualization
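PCA, the first algorithm listed, can be sketched without external libraries: center the profiles, find the two directions of largest variance (eigenvectors of the covariance matrix), and project onto them. The function below uses power iteration for the eigenvectors, a simplification chosen to keep the example dependency-free; the data are made up.

```python
import math
import random


def pca_2d(rows, iterations=500, seed=0):
    """Project each row (profile) onto the first two principal components.

    Power iteration with deflation on the covariance matrix: fine for a small
    sketch, but an SVD-based library routine is preferable in practice.
    """
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two profiles")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() + 0.1 for _ in range(d)]  # random start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance along v (the eigenvalue)
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove this direction so the next pass finds the next one
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * pc[j] for j in range(d)) for pc in components]
            for c in centered]


# Hypothetical profiles: 5 genes x 3 samples, varying mostly along one direction
rows = [[1.0, 1.1, 0.0], [2.0, 1.9, 0.2], [3.0, 3.2, -0.1],
        [4.0, 3.9, 0.1], [5.0, 5.1, 0.0]]
for x, y in pca_2d(rows):
    print(f"PC1 = {x:6.2f}  PC2 = {y:6.2f}")
```

Because only two coordinates survive per profile, the values in the original samples can no longer be read off the plot, which is the drawback listed above.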
Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Derived Matrices
[Figure: Profile Plot a.k.a. Parallel Coordinates; x-axis: Time (min), 0 to 119; y-axis: log expression ratio, −3 to 3]
Multivariate Data: Derived Matrices
- Profile Plots/Parallel Coordinate Plots
- Pros
- encoding by position: profiles are easy to read
- color-coding groups of expression profiles is very effective
- Cons
- overplotting
- grows horizontally with every additional sample
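The position encoding of a profile plot reduces to a coordinate mapping: each sample or time point becomes an evenly spaced vertical axis, and each value becomes a height. This sketch assumes one value scale shared by all axes, as in a typical expression profile plot (general parallel-coordinates plots often scale each axis separately); the function name and screen sizes are illustrative.

```python
def profile_polylines(profiles, width, height, margin=10.0):
    """Map expression profiles to screen polylines for a profile plot.

    Each sample becomes a vertical axis at an evenly spaced x position; each
    value is encoded by its vertical position on a scale shared by all axes.
    Screen y grows downward, so larger values get smaller y.
    """
    n_axes = len(profiles[0])
    lo = min(min(p) for p in profiles)
    hi = max(max(p) for p in profiles)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    step = (width - 2 * margin) / max(n_axes - 1, 1)
    lines = []
    for p in profiles:
        pts = []
        for k, value in enumerate(p):
            x = margin + k * step
            y = margin + (hi - value) / span * (height - 2 * margin)
            pts.append((x, y))
        lines.append(pts)
    return lines


# Two hypothetical profiles over three time points
for line in profile_polylines([[0.0, 1.0, 2.0], [2.0, 0.0, -2.0]], width=220, height=120):
    print(line)
# [(10.0, 60.0), (110.0, 35.0), (210.0, 10.0)]
# [(10.0, 10.0), (110.0, 60.0), (210.0, 110.0)]
```

With thousands of profiles these polylines pile on top of each other (overplotting), and each additional sample adds another axis and widens the plot.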
Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Derived Matrices
[Figure: Profile Plot a.k.a. Parallel Coordinates, with the profiles split into four smaller panels; x-axis: Time (min); y-axis: log expression ratio]
Gehlenborg et al. 2010, Nature Methods
Multivariate Data: Derived Matrices
[Figure: Heat Map with Dendrogram; columns: Time (min), 0 to 119]
Multivariate Data: Derived Matrices
- Heatmap
- Pros
- no overplotting, yet a very dense information display
- can be combined with dendrogram and additional information can be encoded
in further columns or in the height of rows
- Cons
- only qualitative interpretation possible due to color coding
- grows horizontally with every additional sample and grows vertically with
every additional profile
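The color coding that makes heatmaps dense but only qualitatively readable can be sketched as a diverging color scale for log expression ratios. The blue-white-red scale and the clipping value `vmax = 3` are assumptions for illustration; classic microarray heatmaps often used red and green instead.

```python
def diverging_color(value, vmax=3.0):
    """Map a log expression ratio to an RGB triple on a blue-white-red scale.

    Values are clipped to [-vmax, vmax]; 0 maps to white, -vmax to pure blue
    and +vmax to pure red.
    """
    t = max(-1.0, min(1.0, value / vmax))  # normalize and clip to [-1, 1]
    if t >= 0:
        level = round(255 * (1 - t))  # fade green and blue out toward red
        return (255, level, level)
    level = round(255 * (1 + t))      # fade red and green out toward blue
    return (level, level, 255)


def heatmap_rows(matrix, vmax=3.0):
    """Convert each row of a numeric matrix into a row of hex color strings."""
    return [["#%02x%02x%02x" % diverging_color(v, vmax) for v in row]
            for row in matrix]


print(heatmap_rows([[-3.0, 0.0, 1.5], [6.0, -1.5, 0.0]]))
# [['#0000ff', '#ffffff', '#ff8080'], ['#ff0000', '#8080ff', '#ffffff']]
```

The value 6.0 and a hypothetical 3.0 both come out as the same pure red, which illustrates why a color-coded cell supports only a rough, qualitative reading.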
Multivariate Data: Summary
[Figure: scatter plot (Principal Component 1 against Principal Component 2), profile plot (log expression ratio over Time (min)) and heat map (Time (min)) compared on a scale from "few, high-res" to "many, low-res"]
data: Lukk et al., 2010, Nature Biotechnology
Problem: Very Large Expression Matrices
Power Wall (7 × 4 screens = 11,200 × 4,800 pixels), University of Leeds
1000 transcripts, 5372 samples
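A rough calculation shows why a wall-sized display is needed. Assuming the 5372 samples run horizontally and the 1000 transcripts vertically (the orientation is an assumption), each cell of the matrix gets only a few pixels:

```latex
\begin{aligned}
\frac{11\,200\ \text{px}}{5372\ \text{samples}} &\approx 2.08\ \text{px per column}, \qquad
\frac{4\,800\ \text{px}}{1000\ \text{transcripts}} = 4.8\ \text{px per row},\\
\frac{11\,200 \times 4\,800\ \text{px}}{1000 \times 5372\ \text{cells}}
  &= \frac{53\,760\,000}{5\,372\,000} \approx 10\ \text{px per cell}.
\end{aligned}
```

A single screen of the wall (11,200 / 7 = 1600 by 4,800 / 4 = 1200 pixels) would offer only about 0.3 pixels per sample column (1600 / 5372).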
Gehlenborg and Brazma, 2009, BMC Bioinformatics
New Visualization Method: Space Maps
Gehlenborg and Brazma, 2009, BMC Bioinformatics
New Visualization Method: Space Maps
[Figure: Space Maps with levels L1 to L5 and Observation I, Observation II and Observation III]
Gehlenborg and Brazma, 2009, BMC Bioinformatics
New Visualization Method: Space Maps
Networks
[Figure: overview of data types, grouped as 1D, 2D, Matrix and Graph data: Mass Spectra, NMR Spectra, Microarray Image, Gel Data, RNA-seq Reads (various techniques), Peptide Map, Protein Map, Metabolite Map, Gene Expression Matrix, Protein Expression Matrix, Metabolite Concentration Matrix, Pathway, Protein-Protein/Protein-Nucleotide Interactions, and Interaction Networks and Pathways integrated with Expression and Concentration Data. Highlighted: Networks.]
Networks
- Data-derived
- protein-protein interaction networks derived from Yeast-2-Hybrid (Y2H) measurements, or protein-DNA interaction networks derived from Chromatin Immunoprecipitation (ChIP) measurements
- gene regulatory networks inferred from gene expression data
- correlation networks derived from gene expression data
- Knowledge-derived
- biochemical pathway maps
- other curated networks derived from the literature
- Combination of networks and multivariate data
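A correlation network, one of the data-derived network types above, can be built by linking every pair of genes whose expression profiles are strongly correlated. This is a minimal sketch assuming an absolute Pearson correlation cutoff of 0.8, which is arbitrary; the gene names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_network(profiles, threshold=0.8):
    """Undirected edges (gene, gene, r) for all pairs with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
for edge in correlation_network(profiles):
    print(edge)
```

Keeping the sign of r lets a viewer style positively and negatively correlated edges differently; geneD stays unconnected because its correlations fall below the cutoff.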
Protein-Protein Interaction Networks
- Protein-protein interaction (PPI) networks are graphs with a node for each protein and an edge for each PPI.
- They show significant functional clustering: proteins with related functions often form densely connected subgraphs.
- Visualizing PPI networks requires automated layout algorithms, e.g. force-directed or circular layout, that arrange the nodes on the screen according to some optimization criterion.
- Gene regulatory networks, correlation networks and protein-DNA interaction networks are visualized in a very similar way; the main difference is the type of edges (directed, undirected and others).
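Force-directed layout treats edges as springs and nodes as mutually repelling particles, then moves the nodes step by step toward a low-energy arrangement. The sketch below follows the general scheme of Fruchterman and Reingold (repulsion k²/d, attraction d²/k, a cooling limit on each move) but is simplified, e.g. it does not keep nodes inside a frame; the parameter values and the example graph are illustrative.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=100, size=1.0, seed=42):
    """Simplified Fruchterman-Reingold-style spring embedder.

    nodes: list of node ids; edges: list of (u, v) pairs.
    Returns {node: (x, y)}.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size * math.sqrt(1.0 / len(nodes))  # ideal distance between nodes
    temperature = size / 10                 # maximum move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between all pairs of nodes: magnitude k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: magnitude d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node, capped by the temperature, then cool down
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] += dx / length * step
            pos[v][1] += dy / length * step
        temperature *= 0.95
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Hypothetical network: two triangles joined by one edge
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
for node, (x, y) in force_directed_layout(nodes, edges).items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason large PPI networks need faster approximations or end up as "hairballs".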
Networks: Layout Algorithms
Circular Layout
Force-directed Layout
Barabasi & Oltvai 2004, Nature Reviews Genetics
Yeast Protein-Protein Interaction Network
- This network shows the largest connected
component of the yeast interactome as
determined by Yeast-2-Hybrid
- This component contains 78% of all proteins
- Nodes are color-coded by the effect of
a knock-out mutant:
Red: lethal
Green: non-lethal
Orange: slow growth
Yellow: unknown
- Hubs (highly connected proteins) are often colored red, i.e. knocking them out is lethal!
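The link between hubs and lethality can be checked by comparing each protein's degree (its number of interaction partners) with its knock-out phenotype. In this sketch the hub cutoff `min_degree=3` is an arbitrary choice and the proteins and phenotypes are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count interaction partners (node degree) for each protein."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def lethal_fraction(edges, lethal, min_degree=3):
    """Fraction of lethal knock-outs among hubs and among all other proteins.

    A 'hub' is defined here, arbitrarily, as a protein with degree >= min_degree.
    """
    deg = degrees(edges)
    hubs = [p for p, d in deg.items() if d >= min_degree]
    others = [p for p, d in deg.items() if d < min_degree]

    def fraction(group):
        return sum(p in lethal for p in group) / len(group) if group else 0.0

    return fraction(hubs), fraction(others)


# Hypothetical interactions and knock-out phenotypes
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P1", "P5"),
         ("P5", "P6"), ("P5", "P7"), ("P6", "P7")]
lethal = {"P1", "P5", "P3"}
print(lethal_fraction(edges, lethal))  # (1.0, 0.2)
```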
NSF Visualization Challenge 2011 Honorable Mention: AraNet
Networks: Hairballs, Ridiculograms & Friends
Gehlenborg et al. 2010, Nature Methods
Protein-Protein Interaction Networks
http://www.cytoscape.org/
Cytoscape
http://ophid.utoronto.ca/navigator
NAViGaTOR
http://www.gephi.org
Gephi
http://www.genome.jp/kegg/pathway.html
Pathways
- KEGG Pathway
“Wiring diagrams of molecular interactions, reactions, and
relations”
A collection of network diagrams (manual layout!)
http://www.genome.jp/kegg/pathway.html
Pathways
KEGG Pathway provides so-called reference pathways covering most metabolic pathways.
http://www.genome.jp/kegg/pathway.html
Pathways
Species-specific pathways highlight in green the enzymes present in a specific organism
http://www.biocyc.org
Pathways
- BioCyc (EcoCyc and MetaCyc)
2009: 1355 pathways with 7837 reactions, 5792 enzymes
Three types of databases
Tier 1: intensively curated databases
Tier 2: Computationally derived databases subject to moderate curation
Tier 3: Computationally derived databases without curation
- visualize individual metabolic pathways or view the complete metabolic map of an organism
http://www.biocyc.org
Pathways
BioCyc
draws pathways interactively
with varying levels of detail
http://www.biocyc.org
Pathways: Metabolomic Map
- Nodes represent metabolites; shape indicates the class of metabolite (see key to right).
- Lines represent reactions.
- Moving the mouse over a metabolite icon identifies it.
BioCyc
draws pathways interactively
with varying levels of detail
Gene Regulatory Networks
- a gene regulatory network is the set of activating and
repressing genes or gene products and their interactions
- networks are derived from (transcriptomics) datasets using
network inference techniques
- resulting networks are visualized as a network graph
G = (V, E), where
- V is the set of nodes representing the involved genes/gene products
- E is the set of edges representing the transcriptional regulation of the genes
- edges can be either activating (+) or repressing (-)
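The definition G = (V, E) with signed edges translates into a small data structure: a node set plus a map from (regulator, target) pairs to +1 (activating) or -1 (repressing). The class and method names are hypothetical, chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class RegulatoryNetwork:
    """Directed graph G = (V, E) whose edges are activating (+1) or repressing (-1)."""
    nodes: set[str] = field(default_factory=set)
    edges: dict[tuple[str, str], int] = field(default_factory=dict)

    def add_regulation(self, regulator: str, target: str, sign: int) -> None:
        """Add a directed edge regulator -> target with the given sign."""
        if sign not in (1, -1):
            raise ValueError("sign must be +1 (activating) or -1 (repressing)")
        self.nodes.update((regulator, target))
        self.edges[(regulator, target)] = sign

    def regulators_of(self, target: str) -> dict[str, int]:
        """Map each regulator of `target` to the sign of its effect."""
        return {u: s for (u, v), s in self.edges.items() if v == target}


grn = RegulatoryNetwork()
grn.add_regulation("TF1", "geneX", +1)   # TF1 activates geneX
grn.add_regulation("TF2", "geneX", -1)   # TF2 represses geneX
grn.add_regulation("geneX", "TF2", +1)   # feedback loop
print(grn.regulators_of("geneX"))  # {'TF1': 1, 'TF2': -1}
```

A drawing tool could map the sign to the edge style, e.g. an arrowhead for activation and a flat bar for repression.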
Westenberg et al. 2008, Comp Graph Forum
Westenberg et al. 2010, Bioinformatics
Gene Regulatory Networks
SpotXplore
- maps expression profiles
onto regulatory network
- statistics can be visualized
- interaction
- highlight subnetworks
- Cytoscape plugin
Gehlenborg et al. 2010, Nature Methods
Networks and Multivariate Data
Cerebral (Cytoscape)
Gehlenborg et al. 2010, Nature Methods
Networks and Multivariate Data
Lichen
Prometra
VistaClara (Cytoscape)
GENeVis
VisANT
VANTED
Networks and Multivariate Data: Choice!?
1. Small multiples? (one value per node, all networks shown simultaneously)
2. Animation? (one value per node, one network shown at a time)
3. Complex glyphs? (multiple values per node)
4. Combination of multiple views? (network linked to heat map or profile plot)
Saraiya et al., 2005, InfoVis 2005 Proceedings
Networks and Abundance Data: Choice!?
Part 3
Design of
Visualization Systems
Challenge: Heterogeneity
Pathline
A Tool for Comparative Functional Genomics Data
joint work with:
Bang Wong, Mark Styczynski, Tamara Munzner, Hanspeter Pfister
Pathline: A Tool for Comparative Functional Genomics
M. Meyer et al., IEEE/Eurographics EuroVis 2010.
target
translate
design
implement
validate
functional genomics
how do genes work together to perform
different functions in a cell?
functional genomics data
gene expression
molecular pathways
gene expression is ...
... the measured level of how much a gene is on or off
... a single quantitative value
biologists measure it ...
... for many genes
... in many samples (time points, tissue types, species)
visualized with heatmaps [Wilkinson09] [Saldanha04] [Seo02] [Eisen98] [Gehlenborg10] [Weinstein08]
encode value with color
[Figure: example expression matrix (genes × samples) with values from 0.0 to 1.0, each encoded with color]
augmented with clustering
[Eisen98]
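Clustering reorders the heatmap rows so that similar profiles sit next to each other, and the merge history becomes the dendrogram. The sketch below uses average-linkage agglomerative clustering with 1 minus the Pearson correlation as distance, similar in spirit to the approach in [Eisen98] but not a reproduction of it; the profiles are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0 for flat profiles)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def cluster_order(profiles):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r.

    Returns the leaf order of the dendrogram, i.e. the row order a clustered
    heatmap would use. Roughly O(n^3) time, so only for small inputs.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = [[name] for name in names]  # each cluster lists its leaves in order

    def linkage(c1, c2):
        # average distance over all cross-cluster pairs
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Hypothetical profiles: g1/g3 rise over time, g2/g4 fall
profiles = {
    "g1": [0.2, 0.4, 1.0, 1.0],
    "g2": [1.0, 0.8, 0.0, 0.0],
    "g3": [0.1, 0.5, 0.9, 1.0],
    "g4": [0.9, 0.7, 0.1, 0.0],
}
print(cluster_order(profiles))  # ['g2', 'g4', 'g1', 'g3']
```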
functional genomics data
gene expression
molecular pathways
the functioning of a cell is controlled by many interrelated chemical reactions carried out by gene products
[Figure: input → genes → output/input → genes → output = cell function]
pathways, e.g. glycolysis and the TCA cycle (www.genome.jp/kegg/)
functional genomics
how do genes work together to perform
different functions in a cell?
comparative functional genomics
how do the gene interactions vary across
different species?
collaborators: Regev Lab at the Broad Institute
biology: metabolism in yeast
data: multiple genes
multiple time points
multiple related species
multiple pathways
problem: existing tools can only look at a subset of this data
target
translate
design
implement
validate
[Figure: data abstraction]
gene expression
- 3D table: genes × time points × species
- about 6000 genes, 14 species of yeast
metabolic pathways (e.g. glycolysis, TCA cycle)
- 10 to 50 pathways of interest
- genes and 140 metabolites
- inputs/outputs are called metabolites
- directed graph
similarity scores
- aggregate the time series for a gene/metabolite over species into a single quantitative value, e.g. a similarity of expression across species of 0.83
- aggregate: Pearson, Spearman, others
phylogeny
- evolutionary relationship between the species
- binary tree
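A similarity score condenses the time series of one gene or metabolite in all species into a single quantitative value. The slide names Pearson and Spearman correlation as possible aggregates; this sketch assumes, as one concrete choice, the mean Pearson correlation over all pairs of species. The gene and its values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long time series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def similarity_score(series_by_species):
    """Aggregate one gene's time series over species into a single value.

    Assumption: the aggregate is the mean Pearson correlation over all pairs
    of species; Spearman or another measure could be substituted.
    """
    if len(series_by_species) < 2:
        raise ValueError("need time series from at least two species")
    pairs = combinations(series_by_species.values(), 2)
    return mean(pearson(a, b) for a, b in pairs)


# Hypothetical expression of one gene at six time points in three species
g1 = {
    "s1": [0.2, 0.4, 1.0, 1.0, 1.0, 1.0],
    "s2": [0.2, 0.5, 0.9, 1.0, 1.0, 1.0],
    "s3": [1.0, 1.0, 0.8, 0.5, 0.4, 0.2],
}
print(round(similarity_score(g1), 2))
```

Here species s3 behaves differently from s1 and s2, which pulls the score down; one such score per gene or metabolite is what gets plotted along a pathway.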
tasks
- study expression data as a time series
- compare a limited number of time series
- compare similarity scores along one or more pathways
- compare multiple similarity scores
[Figure: data types: metabolic pathways, gene expression, similarity scores, phylogeny]
target
translate
design
implement
validate
slide from Munzner 2011, Visualization Principles
Power of the plane: only position works for all!
encode quantitative values with spatial position
[Figure: alternative views of the data: linearized pathway, topological layout (GENeVis, www.win.tue.nl/~mwestenb/genevis/), heatmap (courtesy of M. Styczynski, from JavaTreeview, jtreeview.sourceforge.net/) and curvemap]
Pathline
linearized pathway representation
- common axes to compare similarity scores (similarity score from 0.0 to 1.0)
- bars and circles
  - visual layers for selective attention
  - color-code gene direction
- multiple similarity scores
- multiple pathways
[Figure: linearized pathway representation in Pathline]
pathway to ordered list of nodes (legend: metabolite, gene(s))
(a) [...] and cut
(b) unroll
(c) reinsert
(d) shared coordinate frame and stylized marks
[Figure: the steps leading to the linearized pathway representation]
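Turning a pathway into an ordered list of nodes requires breaking its cycles first (the TCA cycle is one). The sketch below is one way to carry out the cut, unroll and reinsert steps, not necessarily Pathline's algorithm: a depth-first search cuts edges that close a cycle, orders the remaining acyclic graph topologically, and returns the cut edges so they can be reinserted, e.g. drawn as arcs. The simplified TCA-cycle input is illustrative only.

```python
def linearize(graph, start):
    """Order the nodes of a directed pathway graph along a line.

    graph maps each node to a list of successors. Edges that close a cycle
    during depth-first search are cut; the remaining acyclic graph is ordered
    by reverse DFS finishing time (a topological order). The cut edges are
    returned so they can be reinserted, e.g. as arcs over the line.
    """
    visited, on_stack = set(), set()
    finished, cut_edges = [], []

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack:
                cut_edges.append((node, nxt))  # back edge closes a cycle: cut it
            elif nxt not in visited:
                dfs(nxt)
        on_stack.discard(node)
        finished.append(node)

    dfs(start)
    for node in graph:  # nodes not reachable from `start`
        if node not in visited:
            dfs(node)
    return finished[::-1], cut_edges


# Simplified TCA cycle as a chain of metabolites (illustrative only)
tca = {
    "acetyl-CoA": ["citrate"],
    "citrate": ["isocitrate"],
    "isocitrate": ["2-oxoglutarate"],
    "2-oxoglutarate": ["succinyl-CoA"],
    "succinyl-CoA": ["succinate"],
    "succinate": ["fumarate"],
    "fumarate": ["malate"],
    "malate": ["oxaloacetate"],
    "oxaloacetate": ["citrate"],
}
order, cut = linearize(tca, "acetyl-CoA")
print(" -> ".join(order))
print("cut:", cut)  # cut: [('oxaloacetate', 'citrate')]
```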
putting it together ...
- topology is secondary
- use spatial position for similarity scores
Pathline
curvemap
- inspired by heatmaps
- base visual unit is a curve (expression over time)
- filled, framed line charts to enhance shape perception
- rows are species
- columns are genes/metabolites
- overlays to enhance trends
[Figure: curvemap in Pathline]
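A curvemap cell is a small filled, framed line chart, and the grid places species in rows and genes/metabolites in columns. This sketch emits plain SVG; the shared value scale across all cells, the colors and the cell sizes are assumptions, and the overlays mentioned above are left out.

```python
def curvemap_svg(data, species, genes, cell_w=60, cell_h=30, pad=4):
    """Render a curvemap-style grid as an SVG string.

    data[(species, gene)] holds a time series. Rows are species, columns are
    genes/metabolites; each cell is a framed line chart whose area under the
    curve is filled. All cells share one value scale (an assumption here).
    """
    values = [v for series in data.values() for v in series]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    w, h = cell_w - 2 * pad, cell_h - 2 * pad  # plotting area inside a cell
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{len(genes) * cell_w}" height="{len(species) * cell_h}">']
    for r, sp in enumerate(species):
        for c, gene in enumerate(genes):
            left, top = c * cell_w + pad, r * cell_h + pad
            series = data[(sp, gene)]
            step = w / max(len(series) - 1, 1)
            pts = [(left + k * step, top + (hi - v) / span * h)
                   for k, v in enumerate(series)]
            bottom = top + h
            # Close the curve along the bottom edge so it can be filled
            outline = [(pts[0][0], bottom)] + pts + [(pts[-1][0], bottom)]
            points = " ".join(f"{x:.1f},{y:.1f}" for x, y in outline)
            parts.append(f'<rect x="{left}" y="{top}" width="{w}" height="{h}" '
                         'fill="none" stroke="black"/>')
            parts.append(f'<polygon points="{points}" fill="steelblue" '
                         'stroke="navy"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical data: 2 species x 3 genes, 4 time points each
species = ["s1", "s2"]
genes = ["g1", "g2", "g3"]
data = {(s, g): [0.2, 0.4, 1.0, 0.8] if s == "s1" else [0.0, 0.5, 0.7, 1.0]
        for s in species for g in genes}
svg = curvemap_svg(data, species, genes)
print(svg.splitlines()[0])
```

Writing the string to a `.svg` file lets any browser display the grid; a shared scale keeps curve heights comparable across species and genes, at the cost of flattening genes with small ranges.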
target
translate
design
implement
validate
Demo
target
translate
design
implement
validate
case study
- qualitative research method
- in-depth study of individual or group
- real-world setting
- description and interpretation
highlights
- Pathline
  - multiple genes, time points, species, and pathways
  - linearized pathway representation
  - curvemap
- tool deployment
  - open source
  - used daily by several collaborators
  - www.pathline.org
Challenge: Scale
- Example: The Cancer Genome Atlas (NCI & NHGRI)
- around 20 cancer types
- at least 500 patients per cancer type
- all: mRNA transcript expression levels, copy number variation, SNPs, microRNA
expression levels, methylation, clinical data
- some: whole genome sequencing
- so far: more than 4.5 TB of data (for around 3000 patients)
- tools for remote visualization needed
- (new visualization methods needed as well!)
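A back-of-the-envelope calculation puts the TCGA numbers in per-patient terms. This Python sketch assumes decimal units (1 TB = 10^12 bytes) and naive linear scaling. Because the slide says "more than 4.5 TB" and only some patients receive whole-genome sequencing, the projection is a rough illustration, not a forecast.

```python
# Decimal units are an assumption; binary units (TiB, GiB) would change the numbers somewhat.
TB = 10**12
GB = 10**9

total_bytes = 4.5 * TB   # "more than 4.5 TB" so far, so this is a lower bound
patients_so_far = 3000   # "around 3000 patients"
per_patient_gb = total_bytes / patients_so_far / GB

# Minimum planned cohort implied by the slide: ~20 cancer types x >= 500 patients.
planned_patients = 20 * 500

# Naive linear extrapolation: assumes future patients resemble current ones,
# which additional whole-genome sequencing would likely push upward.
projected_tb = per_patient_gb * planned_patients * GB / TB

print(f"~{per_patient_gb:.1f} GB per patient")                   # ~1.5 GB per patient
print(f"{planned_patients} patients -> ~{projected_tb:.0f} TB")  # 10000 patients -> ~15 TB
```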
Challenge: Scale
- access to public biological data sets is typically through web interfaces
- large-scale data sets are analyzed on compute clusters or cloud infrastructures
- web browser-based visualization options:
- browser plugins: Java Applets and Adobe Flash
- native support: Scalable Vector Graphics, HTML5 Canvas, WebGL
- alternative: desktop applications with client/server architecture (e.g. IGV,
GenomeView)
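For remote visualization, one common way to keep data transfer manageable is to summarize on the server at roughly the resolution of the display before anything reaches the browser or desktop client. The Python sketch below shows min/max binning of a long per-position signal into one bin per pixel column. The function name `bin_min_max`, the 800-pixel track width, and the synthetic coverage values are assumptions; the sketch is not taken from IGV, GenomeView, or any other tool named above.

```python
from typing import List, Tuple


def bin_min_max(signal: List[float], n_bins: int) -> List[Tuple[float, float]]:
    """Summarize a long signal as one (min, max) pair per output bin.

    Keeping both extremes preserves narrow peaks that averaging would flatten.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    n = len(signal)
    pairs = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across bins.
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        chunk = signal[start:end]
        if chunk:  # bins are empty when n_bins exceeds len(signal)
            pairs.append((min(chunk), max(chunk)))
    return pairs


# Hypothetical per-position coverage for a 1 Mb region, reduced server-side
# to one bin per pixel column of an assumed 800-pixel-wide track.
coverage = [float(i % 97) for i in range(1_000_000)]
summary = bin_min_max(coverage, 800)
print(len(summary))  # 800
```

Sending 800 (min, max) pairs instead of a million values keeps the client light, and a narrow spike still appears as a tall bar after reduction.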
Visualization Toolkits for the Web: Examples
- Java applets: Processing, Prefuse*
- Flash: Flare*
- JavaScript
- SVG: Google Chart Tools*, Flot*, Protovis*, Raphael, TheJIT
- HTML5 Canvas: Three.js, Processing.js
- WebGL: Three.js, PhiloGL
* indicates high-level visualization library
Collaboration: Web-based Tools
Payao www.payaologue.org
Collaboration: Web-based Tools
WikiPathways www.wikipathways.org
Collaboration: Web-based Tools
IBM Many Eyes manyeyes.alphaworks.ibm.com
Keim et al. 2009, Visual Data Mining: Theory, Techniques and Tools for Visual Analytics
Visual Analytics
Builds the bridge between Visualization and Analytical Reasoning
Visual Analytics
- formation of abstract visual metaphors in combination with human
interaction
- enables detection of the expected and discovery of the unexpected
within massive, dynamically changing information spaces
- knowledge is gained from visualization, automatic analysis, as well as
the preceding interactions between visualizations, models, and the
human analysts
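As a small illustration of pairing automatic analysis with a visualization, the Python sketch below computes per-gene z-scores and returns the samples that deviate strongly. A view could then highlight those samples so the analyst can explore them interactively. The z-score rule, the threshold of 2, the function name `flag_unexpected`, and the example profiles are hypothetical stand-ins, not the method of Keim et al.

```python
import statistics
from typing import Dict, List


def flag_unexpected(profiles: Dict[str, List[float]],
                    threshold: float = 2.0) -> Dict[str, List[int]]:
    """Return, per gene, the indices of samples whose |z-score| exceeds threshold.

    This is the 'automatic analysis' half; a linked view would highlight
    the flagged cells for the human analyst.
    """
    flagged = {}
    for gene, values in profiles.items():
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)  # population standard deviation
        if sd == 0:
            continue  # a constant profile has nothing unusual to flag
        hits = [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]
        if hits:
            flagged[gene] = hits
    return flagged


# Hypothetical expression profiles across eight samples.
profiles = {
    "geneA": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 6.0],  # one spike in the last sample
    "geneB": [2.0] * 8,                                 # flat profile
}
print(flag_unexpected(profiles))  # {'geneA': [7]}
```

With the population standard deviation, a single outlier among n samples can reach a z-score of at most sqrt(n - 1), so with very few samples a fixed threshold may never trigger; the threshold has to be chosen with the sample size in mind.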
Visual Analytics of Biological Data
- Biological data is very heterogeneous, complex and often very large
- Visualization of biological data plays a central role
- Complemented with computational analysis methods and interaction, visualization
accelerates the process of gaining insight into biological processes and
modeling them
- Applications in all areas of biology where large amounts of
heterogeneous data need to be interpreted
Acknowledgements
Tamara Munzner (University of British Columbia, Canada) gave us permission to use
slides from her talks.
Kay Nieselt (University of Tübingen, Germany) gave us permission to use slides and
helped to design an earlier version of this course.
Resources
Scientific and Information Visualization
- Scientific Visualization (“scivis”) and Information Visualization (“infovis”) are very ill-defined terms
- Scientific Visualization is often used to describe visualization of data that is intrinsically
spatial (such as medical imaging data, fluid flows or protein structures)
- Information Visualization is typically used to describe visualization of abstract data
(such as gene expression data or interaction networks)
- there is plenty of overlap and the separation is quite arbitrary
- both Scientific and Information Visualization are used to visualize scientific data
Recommended Books
Information Visualization - Perception for Design
Colin Ware, Morgan Kaufmann, 2004
Information Visualization - Using Vision to Think
Stuart K Card, Jock D Mackinlay, Ben Shneiderman, Morgan Kaufmann, 1999
The Visual Display of Quantitative Information (2nd Edition)
Edward R Tufte, Graphics Press, 2001
Recommended Books
Fundamentals of Computer Graphics (3rd Edition)
Peter Shirley, Steve Marschner, AK Peters Publishers, 2009
(in particular: “Chapter 27 - Visualization”, also as free PDF from Tamara Munzner’s website)
The Non-Designer’s Design Book (3rd Edition)
Robin Williams, Peachpit Press, 2008
Recommended Resources on Color
A Field Guide to Digital Color
Maureen C Stone, AK Peters Publishers, 2003
ColorBrewer 2.0
Cynthia Brewer, Mark Harrower, http://www.colorbrewer2.org
VisCheck
http://www.vischeck.com
Color Oracle
http://colororacle.cartography.ch
Recommended Journals
Nature Methods Special Issue on Visualizing Biological Data
http://www.nature.com/nmeth/journal/v7/n3s
IEEE Transactions on Visualization and Computer Graphics
http://www.computer.org/portal/web/tvcg
IEEE Computer Graphics and Applications
http://www.computer.org/portal/web/cga/home
Recommended Meetings
IEEE Symposium on Biological Data Visualization - BioVis
www.biovis.net
Workshop on Visualizing Biological Data - VIZBI
www.vizbi.org
IEEE VisWeek with InfoVis, Vis and VAST Conferences
www.visweek.org
Tools for Interaction Network Visualization
Name | Cost | Availability | Description | URL
Stand-alone
Arena 3D | Free | Win Mac Linux | Visualization of biological multi-layer networks in 3D | http://www.arena3d.org
BiNA | Free | Win Mac Linux | Exploration and interactive visualization of pathways | http://www.bnplusplus.org/bina
BioLayout Express 3D | Free | Win Mac Linux | Generation and cluster analysis of networks with 2D/3D visualization | http://www.biolayout.org
BiologicalNetworks 2 | Free | Win Mac Linux | Analysis suite; visualizes networks and heat map; maps abundance data | http://www.biologicalnetworks.org
Cytoscape | Free | Win Mac Linux | Network analysis; extensive list of plug-ins for advanced visualization | http://www.cytoscape.org
GENeVis | Free | Win Mac Linux | Network and pathway visualization; abundance data | http://tinyurl.com/genevis
Medusa | Free | Win Mac Linux | Basic network visualization tool | http://coot.embl.de/medusa
NBrowse | Free | Win Mac Linux | Network visualization software for heterogeneous interaction data | http://www.gnetbrowse.org
NAViGaTOR | Free | Win Mac Linux | Visualization of large protein-protein interaction data sets; abundance data | http://tinyurl.com/navigator1
Ondex | Free | Win Mac Linux | Integrative workbench; large network visualizations; abundance data | http://www.ondex.org
Osprey | Free | Win Mac Linux | Tool for visualization of interaction networks | http://tinyurl.com/osprey1
Pajek | Free | Win | Generic network visualization and analysis tool | http://pajek.imfm.si
ProViz | Free | Win Mac Linux | Software for visualization and exploration of interaction networks | http://tinyurl.com/proviz
SpectralNET | Free | Win | Network visualizations; scatter plots for dimensionality reduction methods | http://tinyurl.com/spectralnet
Tulip | Free | Win Mac Linux | Generic visualization and analysis tool; extremely large networks; 3D support | http://tulip.labri.fr/TulipDrupal
VANTED | Free | Win Mac Linux | Combined visualization of abundance data and pathways | http://tinyurl.com/vanted
yEd | Free | Win Mac Linux | Generic network visualization software; offers many layout algorithms | http://tinyurl.com/yEdGraph
Cytoscape Plug-ins
BiNoM | Free | Win Mac Linux | Extensive support for common systems biology network formats | http://tinyurl.com/binom1
BioModules | Free | Win Mac Linux | Detects modules in networks; maps abundance data onto nodes and modules | http://tinyurl.com/biomodules
Cerebral | Free | Win Mac Linux | Biologically motivated layout algorithm; maps abundance data; clustering | http://tinyurl.com/cerebral1
MCODE | Free | Win Mac Linux | Network clustering algorithm; support for manual cluster refinement | http://preview.tinyurl.com/MCODE123
VistaClara | Free | Win Mac Linux | Mapping of abundance data to nodes and “heat strips”; provides heat map | http://www.cytoscape.org/plugins
Web-based
Graphle | Free | Win Mac Linux | Distributed client/server network exploration and visualization tool | http://tinyurl.com/graphle
Lichen | Free | - | Library for web-based visualization of network and abundance matrix data | http://tinyurl.com/Lichen1
MAGGIE Data Viewer | Free | - | Visualization of networks; abundance data in heat maps and profile plots | http://maggie.systemsbiology.net
STITCH 2 | Free | - | Construction and visualization of networks from a wide range of sources | http://stitch.embl.de
VisANT | Free | - | Analysis, mining and visualization of pathways and integrated omics data | http://visant.bu.edu
Tools for Pathway Visualization
Name | Cost | Availability | Description | URL
Stand-alone
BioTapestry | Free | Win Mac Linux | Visualization of genetic regulatory networks, also with experimental data | http://www.biotapestry.org
Caleydo | Free | Win Linux | Interactive framework for pathway and expression data; 3D “bucket” view | http://www.caleydo.org
CellDesigner | Free | Win Mac Linux | Drawing and simulation of pathways and models; supports SBGN | http://www.celldesigner.org
Edinburgh Pathway Editor | Free | Win Mac Linux | Construction and visualization of pathway diagrams; supports SBGN | http://tinyurl.com/EdinburghPE
GenMAPP 2 | Free | Win | Pathway visualization and construction; abundance data | http://www.genmapp.org
IngenuityPathways | $ | Win Mac Linux | Full analysis suite; network and pathway visualizations; abundance data | http://tinyurl.com/IngenuityPath
JDesigner | Free | Win | Drawing and simulation of pathways and models | http://tinyurl.com/jdesigner
KaPPA View | Free | Win | Analysis and visualization of plant pathways and mapped abundance data | http://tinyurl.com/kappa-view
KEGG Atlas | Free | Win Mac Linux | Visualization of abundance data on interactive KEGG pathways | http://www.genome.jp/kegg
MetaCore | $ | Win Mac Linux | Pathway, network and omics data analysis and visualization suite | http://www.genego.com
PathVisio | Free | Win Mac Linux | Visualization and editing of pathways; supports mapping of omics data | http://www.pathvisio.org
VitaPad | Free | Win Mac Linux | Editing of pathway diagrams; integration of abundance data | http://tinyurl.com/vitapad
Web-based
ArrayXPath | Free | - | Mapping of abundance data to pathway visualizations | http://tinyurl.com/ArrayXPath
GEPA | Free | - | Analysis suite; visualization of transcriptomics data on pathway maps | http://tinyurl.com/GEPAT1
iPath | Free | - | Visualization and exploration of combined KEGG pathways | http://pathways.embl.de
MapMan | Free | - | Application that visualizes abundance data on metabolic pathways | http://tinyurl.com/MapManApp
Omics Viewer | Free | - | Tool that maps abundance data to BioCyc pathway diagrams | http://www.biocyc.org
Pathway Explorer | Free | - | Visualization of abundance data on pathways | http://tinyurl.com/pathwayexp
PATIKA | Free | - | Extensive pathway visualization tool; good support for signaling pathways | http://www.patika.org
Payaologue | Free | - | Collaborative pathway annotation and visualization tool | http://celldesigner.org/payao
ProMeTra | Free | - | Maps abundance matrices of multiple omics data types on pathways | http://tinyurl.com/ProMeTra
Reactome SkyPainter | Free | - | Visualization of overrepresented pathways and reactions from gene lists | http://reactome.org
WikiPathways | Free | - | Wiki-based, community-driven pathway curation and visualization tool | http://www.wikipathways.org
Tools for Visualization of Multivariate Data
Name | Cost | OS | Description | URL
Stand-alone
BicOverlapper | Free | Win Mac Linux | Visualization of biclusters combined with profile plots and heat maps | http://vis.usal.es/bicoverlapper/
BiGGEsTS | Free | Win Mac Linux | Heat map-based bicluster visualization | http://tinyurl.com/BiGGEsTS
Brain Explorer | Free | Win Mac | Visualization of 3D transcription data in the central nervous system | http://tinyurl.com/brainExplorer
Caryoscope | Free | Win Mac Linux | Abundance data mapped to chromosomal location | http://tinyurl.com/caryoscope
Data Matrix Viewer | Free | Win Mac Linux | Simple profile plot visualization; supports Gaggle | http://gaggle.systemsbiology.net
EXPANDER | Free | Win Linux | Heat maps, scatter plots and profile plots of cluster averages | http://acgt.cs.tau.ac.il/expander
GENESIS | Free | Win Mac Linux | Analysis suite; offers several interactive visualizations | http://genome.tugraz.at
GeneSpring GX | $ | Win Mac Linux | Analysis suite; interactive and linked visualizations; also networks | http://tinyurl.com/genespring
GeneVAnD | Free | Win Mac Linux | Linked heat maps, dendrograms and 2D/3D scatter plots | http://tinyurl.com/GeneVAnD
geWorkbench | Free | Win Mac Linux | Modular suite; heat maps, dendrograms, profile and scatter plots | http://tinyurl.com/geWorkbench
Hierarchical Clustering Explorer | Free | Win | Linked heat map, profile and scatter plots; systematic exploration | http://tinyurl.com/HCExplorer
Java TreeView | Free | Win Mac Linux | Linked heat maps, karyoscopes, sequence alignments, scatter plots | http://jtreeview.sourceforge.net
Mayday | Free | Win Mac Linux | Modular suite; many linked visualizations; enhanced heat map | http://tinyurl.com/maydaywp
MultiExperiment Viewer | Free | Win Mac Linux | Analysis suite; heat maps, dendrograms, profile and scatter plots | http://www.tm4.org
PointCloudXplore | Free | Win Mac Linux | Visualization of 3D transcription data in Drosophila embryos | http://tinyurl.com/PointCloudXplore
Spotfire Functional Genomics | $ | Win | Analysis suite; many linked visualizations and exploration tools | http://spotfire.tibco.com
TimeSearcher | Free | Win | Exploration and analysis of time series; advanced profile plots | http://tinyurl.com/timesearcher
R/BioConductor
Geneplotter | Free | Win Mac Linux | Karyoscope-style plots and other visualizations | http://www.bioconductor.org
Web-based
ExpressionProfiler | Free | - | Transcriptomics data analysis suite with basic visualizations | http://tinyurl.com/exprespro
GenePattern | Free | - | Modular analysis platform; several visualization modules available | http://tinyurl.com/GenePatt