Last time …

… how viewers interpret data. In this work, we introduce perceptual kernels: distance ma… …ments. Perceptual kernels represent perceptual differences between and within visual … …cable to visualization evaluation and automated design. We report results from crowd… … color, shape, size and combinations thereof. We analyze kernels estimated using five … …ngs among pairs, ordinal triplet comparisons, and manual spatial arrangement … and … We derive recommendations for collecting perceptual similarities, and then demonstrate … …omate visualization design decisions.

Hot off the press …
Network showing similarities of perceptual kernels, Heer et al 2014

Fig. 1: (Left) A crowd-estimated perceptual kernel for a shape palette. The kernel was obtained using ordinal triplet matching. (Right) A two-dimensional projection of the palette shapes obtained via multidimensional scaling of the perceptual kernel.

What is similar, what is not?
Five different ways of asking, very similar answers

Fig. 8: Experiment 1 Results. Univariate perceptual kernels for the shape, color and size palettes across different judgment types. Darker colors indicate higher perceptual similarity. For each palette, the matrices exhibit consistent structures across judgment types.

same paper: color discrimination

Fig. 9: (a) A crowd-estimated perceptual kernel elicited using triplet matching (Tm) for the color palette. (b) A two-dimensional projection of the palette colors obtained via multidimensional scaling of the perceptual kernel.
Munsell color hue wheel

Self-test
• Test for hues (Farnsworth Munsell 100 Hue Test): http://www.xrite.com/online-color-test-challenge

Out of these two charts, which one is better?
1st option: Network
2nd option: Chord Diagram

Inference with Graphics
stat/engl 332
Heike Hofmann

Outline
• Visual inference: introduce lineups and properties
• Define power of lineups
• Case study: efficiency of airports

Lineup Example
Which plot is the most different?
[Figure: lineup of 12 scatterplots, panels numbered 1 to 12; axes: Housing and Climate-terrain]

Lineups
• data plot is placed randomly among decoys; “police lineup”
• are we able to still identify the data?
… yes? - that’s evidence that the data is different from the decoy plots

Visual p-value
• Probability to identify data ‘accidentally’: 1 in m
• quantify difference as visual p-value: Pr(at least x out of n observers identified the data)

Assume we have N independent observers. Let X be the number of observers who pick the data plot from a lineup of size m. Under the null hypothesis, X ~ B_{N,1/m}, and the data plot is not distinguishable from the null plots. If k observers pick the data plot from the lineup, we get an estimate of a visual p-value as
\[ \text{p-value} = P(X \ge k) = \sum_{i=k}^{N} \binom{N}{i} \left(\frac{1}{m}\right)^{i} \left(1 - \frac{1}{m}\right)^{N-i} \]
• study example: 5 out of 9 1st responses picked data: P(X ≥ 5) ≤ 10^-4

Power of a design
• Premise: given a choice of plot designs, the better design is the one that makes it easiest for an observer to identify the signal
• Power: Pr(pick data plot from lineup)
• first example: 5 out of 9 people picked the data plot, so power is 5/9

Compare Designs
2. Create lineups from competing designs: using the same data, render lineups of all competing designs.
3. Evaluate Lineups: by presenting the lineups to independent observers. Assess both signal strength and time needed by individuals to come to a decision. Note that each observer should only be exposed to each lineup data once.
4. Evaluate Competing Designs: differences in signal strength or time to decision are due to differences in the design. In the case that individuals were shown multiple lineups (as part of a bigger study), it is possible to correct outcome measurements for an individual’s visual ability.

Simplest Scenario
One data set, two designs:
• n1 observers evaluate design 1, x1 identify data
• n2 observers evaluate design 2, x2 identify data
• t-test for π̂_1 = x1/n1 and π̂_2 = x2/n2

Comparing power of competing designs therefore involves comparing percentages of correct responses π̂_1 and π̂_2. A (1 − α)·100% confidence interval for this comparison is given as
\[ \hat{\pi}_1 - \hat{\pi}_2 \pm t_{1-\alpha/2,\,n} \sqrt{\hat{\pi}_1 (1 - \hat{\pi}_1)/n_1 + \hat{\pi}_2 (1 - \hat{\pi}_2)/n_2}, \qquad (1) \]
where n is the Welch-Satterthwaite [27] estimate of the degrees of freedom. Note that we use π̂_i = (x_i + 1)/(n_i + 1) and n_i + 2 for a better coverage of the confidence interval [1]. In the case of more than two …

More interesting: What affects Power?
Add in covariates and assess power of
• signal strength
• design
• other problem specific properties
• individuals’ visual abilities
Statistical Method: logistic regression with random effect for individuals

Airport Efficiency and Wind Direction
• Data: Wheel-on and -off events for three years (FAA), combined with weather (wind condition) for each event (restricted to normal operating hours between 6 am and 10 pm)
• results in approx. 500k events
• efficiency: time in mins between wheel events
SEA airport

Displaying wind-efficiency relationship
• Wind direction is measured in angles (discrete, in 10 degree intervals)
• Fill color indicates time between wheel events
• Additional white helper line … most in time needed for these directions
[Figure: lineup of 20 polar charts for SEA, panels numbered 1 to 20; compass directions N, NE, E, SE, S, SW, W, NW; fill color: minutes between wheel events, 0 to 10]

Displaying wind-efficiency relationship
• Orthogonal instead of polar layout:
[Figure: polar chart next to the orthogonal chart of the same data; x-axis of the orthogonal chart: Wind Direction (N, E, S, W); scale 0.0 to 1.0]

Designs & Experimental Setup
• design: polar versus orthogonal, with and without grid lines
• sample size (in %): 2, 4, 6, 8, 10, 24
• shifts in direction (in °): 0, 90, 180, 270
• two replicates each
• results in 192 different plots (4 designs × 6 sample sizes × 4 shifts × 2 replicates), included in as many lineups
• show ten lineups to each participant in user study
[Figure: the four designs, polar and orthogonal, each with and without lines]

Amazon Mechanical Turk
‘Automaton’ Chess Player from 18th century
(cheap) web service for jobs that humans are better at solving than machines

Evaluation
• 958 evaluations by 100 participants
• use one of ten lineups as reference - if people don’t get a very easy one correct, we will exclude their data from the study
[Figure: histograms of time taken (in seconds, log scale from 10^1 to 10^2.5) for answer, by design (euclid, polar) and by answer (correct, wrong); y-axis: count]

Comparison of Designs
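Interval (1) for the difference in power between two designs can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function name `power_difference_ci` and the counts for the second design are hypothetical, the adjusted estimates use the plus-one/plus-two form (x_i + 1)/(n_i + 2) with n_i + 2 observers (one reading of the adjustment quoted from the paper), and a normal quantile is swapped in for the t quantile with Welch-Satterthwaite degrees of freedom so that only the Python standard library is needed.

```python
from math import sqrt
from statistics import NormalDist


def power_difference_ci(x1, n1, x2, n2, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for the difference in power."""
    # Adjusted estimates: add one success and one failure per design (assumption).
    m1, m2 = n1 + 2, n2 + 2
    p1, p2 = (x1 + 1) / m1, (x2 + 1) / m2
    se = sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    # Normal quantile used in place of the t quantile from equation (1).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Design 1: 5 of 9 observers picked the data plot (from the slides).
# Design 2: 2 of 9 is a made-up count for illustration only.
diff, lower, upper = power_difference_ci(5, 9, 2, 9)
print(f"difference in power: {diff:.3f}, 95% CI: ({lower:.3f}, {upper:.3f})")
```

With only nine observers per design the interval spans zero, so counts of this size alone would not separate the two designs.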
[Figure: power results for the four designs P−, P+, E−, E+ (polar and euclidean, each without and with a reference line), one panel per sample size (2, 4, 6, 8, 10, 24 %); x-axis: proportion correct, 0.0 to 1.0; second panel set: predicted power]

Power and Power Comparisons
• Polar charts perform significantly worse
• No significant benefit from helper lines (except in people’s confidence)
• Shift in wind direction does not have an impact on performance ...

Fig. 8. Power results for four competing designs: polar versus cartesian, each with and without a reference line; panels are facetted by sample size (as percentage of original data). Dots show estimated power, surrounded by intervals of standard errors. The letters at the front of each panel allow comparisons across all designs [18]: designs with different letters have significantly different power (at α = 0.05). This is …

Fig. 9. Pred…

Effect of shifts
[Figure: predicted power against sample size (in %) for euclid and polar designs, by shift in wind direction (0, 90, 180, 270); histograms of subject-specific random effects, range −1.0 to 1.0; y-axis: count]
• average power drawn by thick solid lines
• subject-specific power shown with thin lines
• subject specific effects quite large - how do we get power observers?

Figure 4: Histograms of random effects. Su… (right).
Figure 5: Effects plot of model (1). Scenar…
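The gap between average and subject-specific power follows from the logistic model with a random effect for individuals. Below is a minimal sketch, assuming a linear predictor `eta` for some design and sample size plus a subject effect `u` on the logit scale. The function names are hypothetical, and the numeric values are placeholders spanning the plotted range of roughly −1 to 1, not estimates from the study.

```python
from math import exp


def inv_logit(z):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + exp(-z))


def subject_power(eta, u):
    """Power for one observer: fixed-effect predictor eta plus random intercept u."""
    return inv_logit(eta + u)


eta = 0.0  # hypothetical fixed-effect predictor (power 0.5 when u = 0)
subject_effects = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical random intercepts

powers = [subject_power(eta, u) for u in subject_effects]
for u, p in zip(subject_effects, powers):
    print(f"u = {u:+.1f} -> power = {p:.2f}")
```

Under these placeholder values, a subject effect of one unit on the logit scale moves power from 0.50 to about 0.73, which illustrates why random effects of this size are described as quite large.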
Conclusions for Seattle
• overwhelming evidence that winds from SE lead to least efficient traffic flow
• BUT: winds from NW lead to most efficient traffic flow
• naive conclusion: use runways in other direction for days with SE winds?

Conclusions
• Use lineup scenario to get valid p-values for visual findings
• useful in situations where conventional methods break down (large data)
• define power (function) for lineups to evaluate
- competing designs
- measure impact of other co-variates on display
• Airport study: euclidean charts better at detecting patterns
• Paper has 2nd case study

funded by
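As a worked illustration of the first conclusion, the visual p-value P(X ≥ k) is the upper tail of a Binomial(N, 1/m) distribution and can be computed directly. This is a sketch only: the function name `visual_p_value` is hypothetical, and the lineup size m = 20 is an assumption, since the slides do not state m for the 5-out-of-9 example.

```python
from math import comb


def visual_p_value(k, n_observers, m):
    """P(X >= k) for X ~ Binomial(n_observers, 1/m)."""
    p = 1.0 / m
    return sum(
        comb(n_observers, i) * p**i * (1 - p) ** (n_observers - i)
        for i in range(k, n_observers + 1)
    )


# 5 of 9 observers picked the data plot (from the slides);
# m = 20 panels per lineup is an assumption.
p_val = visual_p_value(5, 9, 20)
print(f"visual p-value: {p_val:.2e}")
```

Under the assumed m = 20 this gives roughly 3.3e-5, in line with the bound P(X ≥ 5) ≤ 10^-4 quoted for the study example.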