Last time …

[…] how viewers interpret data. In this work, we introduce perceptual kernels: distance ma[…]ments. Perceptual kernels represent perceptual differences between and within visual […] applicable to visualization evaluation and automated design. We report results from crowd[…] color, shape, size and combinations thereof. We analyze kernels estimated using five […]ngs among pairs, ordinal triplet comparisons, and manual spatial arrangement, and […] We derive recommendations for collecting perceptual similarities, and then demonstrate […] automate visualization design decisions.
Hot off the press …
Network showing similarities of perceptual kernels,
Heer et al 2014
Fig. 1: (Left) A crowd-estimated perceptual kernel for a shape palette.
The kernel was obtained using ordinal triplet matching. (Right) A
two-dimensional projection of the palette shapes obtained via multidimensional scaling of the perceptual kernel.
What is similar, what is not? Five different ways of asking, very similar answers
Fig. 8: Experiment 1 Results. Univariate perceptual kernels for the shape, color and size palettes across different judgment types. Darker colors
indicate higher perceptual similarity. For each palette, the matrices exhibit consistent structures across judgment types.
same paper: color discrimination
Fig. 9: (a) A crowd-estimated perceptual kernel elicited using triplet
matching (Tm) for the color palette. (b) A two-dimensional projection of the palette colors obtained via multidimensional scaling of the
perceptual kernel.
Munsell color hue wheel
Self-test
• Test for hues (Farnsworth Munsell 100 Hue Test): http://www.xrite.com/online-color-test-challenge
Out of these two charts,
which one is better?
1st option: Network
2nd option: Chord Diagram
Inference with
Graphics
stat/engl 332
Heike Hofmann
Outline
• Visual inference:
introduce lineups and properties
• Define power of lineups
• Case study: efficiency of airports
Lineup Example
Which plot is the most different?
[Figure: lineup of numbered plot panels; axes: Housing, Climate-terrain]
Lineups
• data plot is placed randomly among decoys; “police lineup”
• are we able to still identify the data? … yes? - that’s evidence that the data is different from the decoy plots
• Probability to identify data ‘accidentally’: 1 in m
• quantify difference as visual p-value: Pr(at least x out of n observers identified the data)
• 1st example: 5 out of 9 responses picked data: P(X ≥ 5) ≤ 10^-4
Visual p-value
Assume we have N independent observers. Let X be the number of observers who pick the data plot from a lineup of size m.
Under the null hypothesis, $X \sim B_{N,1/m}$, and the data plot is not distinguishable from the null plots.
If k observers pick the data plot from the lineup, we get an estimate of a visual p-value as
$$\text{p-value} = P(X \ge k) = \sum_{i=k}^{N} \binom{N}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{N-i}$$
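The visual p-value is a binomial tail probability, so it can be sketched in a few lines of Python. The function name `visual_p_value` is hypothetical, and the lineup size m = 20 used in the example is an assumption: the slides give the 5-out-of-9 example but do not state m.

```python
from math import comb


def visual_p_value(k, n_observers, m):
    """P(X >= k) for X ~ Binomial(n_observers, 1/m).

    k: number of observers who picked the data plot
    n_observers: number of independent observers (N on the slide)
    m: number of plots in the lineup
    """
    p = 1.0 / m  # chance of picking the data plot 'accidentally'
    return sum(
        comb(n_observers, i) * p**i * (1 - p) ** (n_observers - i)
        for i in range(k, n_observers + 1)
    )


# 5 of 9 observers pick the data plot; m = 20 is an assumed lineup size.
p_val = visual_p_value(5, 9, 20)
print(f"visual p-value: {p_val:.2e}")
```

Under this assumed lineup size the result falls below 10^-4, in line with the bound quoted for the first example.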
Power of a design
• Premise: given a choice of plot designs, the better design is the one that makes it easiest for an observer to identify the signal
• Power: Pr(pick data plot from lineup)
• first example: 5 out of 9 people picked the data plot: Power is 5/9
Compare Designs
2. Create lineups from competing designs: using the same data, render lineups of all competing designs.
3. Evaluate Lineups: by presenting the lineups to independent observers. Assess both signal strength and time needed by individuals to come to a decision. Note that each observer should only be exposed to each lineup data once.
4. Evaluate Competing Designs: differences in signal strength or time to decision are due to differences in the design. In the case that individuals were shown multiple lineups (as part of a bigger study), it is possible to correct outcome measurements for an individual’s visual ability.
Simplest Scenario
• One data set, two designs:
n1 observers evaluate design 1, x1 identify data
n2 observers evaluate design 2, x2 identify data
• t-test for $\hat\pi_1 = x_1/n_1$ and $\hat\pi_2 = x_2/n_2$
Comparing power of competing designs therefore involves comparing percentages of correct responses $\hat\pi_1$ and $\hat\pi_2$. An $\alpha \cdot 100\%$ confidence interval for this comparison is given as
$$\hat\pi_1 - \hat\pi_2 \pm t_{1-\alpha/2,\,n} \sqrt{\hat\pi_1 (1 - \hat\pi_1)/n_1 + \hat\pi_2 (1 - \hat\pi_2)/n_2}, \qquad (1)$$
where n is the Welch-Satterthwaite [27] estimate of the degrees of freedom. Note that we use $\hat\pi_i = (x_i + 1)/(n_i + 1)$ and $n_i + 2$ for a better coverage of the confidence interval [1]. In the case of more than two […]
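The two-design comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it uses the plain proportions x/n from the slide rather than the adjusted estimates, the counts in the example are hypothetical, and it swaps in a normal quantile for the t quantile because the Python standard library has no t distribution. The Welch-Satterthwaite degrees of freedom are computed with the usual two-sample formula and returned so that a t quantile can be looked up separately.

```python
from math import sqrt
from statistics import NormalDist


def compare_designs(x1, n1, x2, n2, alpha=0.05):
    """Difference in power between two designs, with an approximate interval.

    x1, x2: observers who identified the data plot under design 1 and 2
    n1, n2: observers who evaluated design 1 and 2
    Assumes 0 < x/n < 1 for both designs, so the variances are nonzero.
    """
    p1, p2 = x1 / n1, x2 / n2
    v1 = p1 * (1 - p1) / n1  # estimated variance of the first proportion
    v2 = p2 * (1 - p2) / n2  # estimated variance of the second proportion
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite estimate of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    # Assumption: normal quantile used in place of the t quantile
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff, (diff - crit * se, diff + crit * se), df


# Hypothetical counts: 30 of 50 correct under design 1, 15 of 50 under design 2.
diff, (lower, upper), df = compare_designs(30, 50, 15, 50)
print(f"difference: {diff:.2f}, interval: ({lower:.3f}, {upper:.3f}), df: {df:.1f}")
```

With roughly 98 degrees of freedom in this example, the normal and t quantiles differ only slightly; for small n1 or n2 the t quantile should be used instead.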
More interesting:
What affects Power?
Add in covariates and assess power of
• signal strength
• design
• other problem specific properties
• individuals’ visual abilities
Statistical Method:
logistic regression with random effect for individuals
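One common way to write such a model is sketched below. The notation is an assumption, not taken from the slides: $Y_{ij}$ is 1 if observer $j$ picks the data plot in lineup $i$, $x_{ij}$ collects the covariates (signal strength, design, other problem-specific properties), and $u_j$ is the random effect for observer $j$.

```latex
\operatorname{logit} P(Y_{ij} = 1) = x_{ij}^{\top} \beta + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

The fixed effects $\beta$ describe how the covariates change power, while $\sigma_u^2$ measures how much individuals' visual abilities vary.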
Airport Efficiency and
Wind Direction
• Data: Wheel-on and -off events for three years (FAA),
combined with weather (wind condition) for each
event (restricted to normal operating hours between
6 am and 10 pm)
• results in approx. 500k events
• efficiency: time in mins between wheel events
SEA airport
Displaying wind-efficiency relationship
• Wind direction is measured in angles (discrete, in 10 degree intervals)
• Fill color indicates time between wheel events
• Additional white helper line
[Figure: polar chart panels for SEA; angle: Wind direction (N, NE, E, SE, S, SW, W, NW); fill legend: Minutes between Wheel Events]
[…] most in time needed for these directions
Displaying wind-efficiency relationship
• Orthogonal instead of polar layout:
[Figure: the same data in polar layout (N, NE, E, SE, S, SW, W, NW) and in orthogonal layout (x axis: Wind Direction, N, E, S, W)]
Designs & Experimental Setup
• design: polar versus orthogonal, with and without grid lines
• sample size (in %): 2, 4, 6, 8, 10, 24
• shifts in direction (in °): 0, 90, 180, 270
• two replicates each
• results in 192 different plots, included in as many lineups
[Figure: polar and orthogonal charts, with and without grid lines; axis: Wind Direction]
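The count of 192 plots follows from crossing the four factors of the setup. A short Python sketch makes the arithmetic explicit; the design labels are descriptive names chosen here for illustration.

```python
from itertools import product

# Factor levels as listed on the slide; design labels are illustrative.
designs = [
    "polar, no grid lines",
    "polar, grid lines",
    "orthogonal, no grid lines",
    "orthogonal, grid lines",
]
sample_sizes = [2, 4, 6, 8, 10, 24]  # in % of the original data
shifts = [0, 90, 180, 270]           # shift in wind direction, in degrees
replicates = [1, 2]

# Full factorial crossing: one plot per combination of levels.
plots = list(product(designs, sample_sizes, shifts, replicates))
print(len(plots))  # 4 * 6 * 4 * 2 = 192
```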
show ten lineups to
each participant in
user study
Amazon Mechanical Turk
‘Automaton’ Chess
Player from 18th century
(cheap) web service for
jobs that humans are
better at solving than
machines
Evaluation
• 958 evaluations by 100 participants
• use one of ten lineups as reference - if people don’t get a
very easy one correct, we will exclude their data from the
study
[Figure: histograms of time taken (in seconds, log scale) for answer, by design (euclid, polar) and by answer (correct, wrong); y axis: count]
Comparison of Designs
[Figure: example charts and power estimates for the four designs (P−, P+, E−, E+), one panel per sample size (2, 4, 6, 8, 10, 24); letters a and b mark groups of designs; y axis: Predicted Power]
Power and Power Comparisons
• Polar charts perform significantly worse
• No significant benefit from helper lines (except in people’s confidence)
• Shift in wind direction does not have an impact on performance
[Figure: proportion correct (0.0 to 1.0) by design, one panel per sample size]
Fig. 8. Power results for four competing designs: polar versus cartesian, each with and without a reference line; panels are facetted by sample size (as percentage of original data). Dots show estimated power, surrounded by intervals of standard errors. The letters at the front of each panel allow comparisons across all designs [18]: all designs with different letters have significantly different power (at α = 0.05). This is ...
Effect of shifts
[Figure: predicted power by sample size (in %) for the euclid and polar designs, by shift in wind direction (0, 90, 180, 270)]
• average power drawn by thick solid lines
• subject-specific power shown with thin lines
• subject specific effects quite large - how do we get power observers?
Figure 4: Histograms of random effects. […] (right).
Figure 5: Effects plot of model (1). […]
Conclusions for Seattle
• overwhelming evidence that winds from SE lead to
least efficient traffic flow
• BUT: winds from NW lead to most efficient traffic
flow
• naive conclusion: use runways in other direction
for days with SE winds?
Conclusions
• Use lineup scenario to get valid p-values for visual findings
• useful in situations where conventional methods break down (large data)
• define power (function) for lineups to evaluate
- competing designs
- measure impact of other co-variates on display
• Airport study: euclidean charts better at detecting patterns
• Paper has 2nd case study