Evaluation, When and How
Tamara Munzner, University of British Columbia
Evaluation: How Much Evaluation Is Enough? Panel, VIS13
"She generally gave herself very good advice, (though she very seldom followed it)." - Lewis Carroll

Wrestling with the question
• in private: ponder on case-by-case basis
  - as an author
  - as a reviewer
• in public: series of 4 meta-papers
  - process and pitfalls, Dagstuhl07
  - nested model of design and validation, InfoVis09
  - design study methodology, InfoVis12
  - NM revisited: blocks & guidelines, BELIV12, Inf Vis Journal 13

When to evaluate?
"Begin at the beginning and go on till you come to the end; then stop." - Lewis Carroll
• as you go, to make it better: formative
  - iterate!
• at the end, to show you got it right: summative
• nested model: match the validation method to the design level
  - problem domain: observe target users using existing tools; measure adoption
  - data/task abstraction: observe target users post-deployment ("field study")
  - encoding/interaction technique: justify design wrt alternatives; analyze results qualitatively; measure human time with lab experiment ("user study")
  - algorithm: analyze computational complexity; measure system time
• mismatch: cannot show technique good with system timings
• mismatch: cannot show abstraction good with lab study
[A Nested Model for Visualization Design and Validation. Munzner. TVCG 15(6):921-928, 2009 (Proc. InfoVis 09).]

Design study methodology
• PRECONDITION, personal validation: learn, winnow, cast
• CORE, inward-facing validation: discover, design, implement, deploy
• ANALYSIS, outward-facing validation: reflect, write
• ideal: collaborate with experts... several times!
[Design Study Methodology: Reflections from the Trenches and from the Stacks. Sedlmair, Meyer, Munzner. IEEE TVCG 18(12):2431-2440, 2012 (Proc. InfoVis 2012).]

Victories and challenges: I
• user studies: tricky to do right
  - many ways to go wrong that are not obvious from reading finished papers
• evolving sophistication: the user study pendulum swings
  - no user studies at all
  - a few dubious ones, lacking rigor
  - some good ones appear
  - rigorous studies are common
• but pushes to change culture often overshoot...
  - some reviewers expect all papers to have user studies
  - some authors do user studies without understanding why or how
http://www.biologycorner.com/resources/pendulum.jpg

Victories and challenges: II
"If you don't know where you are going, any road will get you there." - Lewis Carroll
• we've come a long way!
• evaluation: broadly interpreted
• BELIV workshops
  - 06 AVI, 08 CHI, 10 CHI, 12 VisWeek
• good resources
  - Experimental Human-Computer Interaction, Helen Purchase, Cambridge Univ Press
  - How to Design and Report Experiments, Andy Field & Graham Hole, SAGE
• significance testing with controlled experiments (see the sketch below)
• qualitative vs quantitative
  - different axis from lab/field
• new frontier: thinking beyond time and error
[Cognitive measurements of graph aesthetics. Ware, Purchase, Colpoys, and McGill. Information Visualization 1(2):103-110, 2002.]
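
The slides stop at the bullet level; as a concrete illustration of what "significance testing with controlled experiments" can look like, here is a minimal sketch in Python. Everything in it is hypothetical: the two technique conditions, the sample sizes, and the timing values are invented for illustration, not taken from the talk.

# Minimal sketch (not from the talk): a significance test on task-completion
# times from a hypothetical controlled lab experiment comparing two
# visualization techniques, A and B. All data below are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical completion times in seconds, 20 participants per condition.
times_a = rng.normal(loc=34.0, scale=6.0, size=20)  # technique A
times_b = rng.normal(loc=29.0, scale=5.0, size=20)  # technique B

# Welch's t-test: does not assume equal variances across conditions.
t_stat, p_value = stats.ttest_ind(times_a, times_b, equal_var=False)

# Report an effect size (Cohen's d) alongside the p-value.
pooled_sd = np.sqrt((times_a.var(ddof=1) + times_b.var(ddof=1)) / 2)
cohens_d = (times_a.mean() - times_b.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")

A real study would also need counterbalancing, checks of test assumptions (or a nonparametric alternative such as the Mann-Whitney U test), and corrections when making multiple comparisons.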

Victories and challenges: III
• post-deployment studies with target users
  - we've moved beyond "my friends liked it"
  - we've moved beyond "I'm the only one who's used it"
  - new frontier: multiple regression for comparison
  - new frontier: post-adoption studies
• Seven Scenarios: only 5 out of 800!
[Empirical Studies in Information Visualization: Seven Scenarios. Lam, Bertini, Isenberg, Plaisant, and Carpendale. TVCG 18(9):1520-1536, 2012.]
• what happens after you get that first paper out?...

Of course...
• ... you should evaluate your work
  - use appropriate methods!
• ... you should not have a user study in every paper
  - avoid litmus test and cargo cult thinking
http://en.wikipedia.org/wiki/File:Litmus_paper.JPG
http://blog.bhargreaves.com/wp-content/uploads/2010/04/cargo-cult.jpg