CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Rolf Molich, DialogDesign, Denmark
molich@dialogdesign.dk

Take a web-site. Take nine professional usability teams. Let each team usability test the web-site. Are the results similar?

What Have We Done?
– Nine teams have usability tested the same web-site
  – Seven professional teams
  – Two student teams
– Test web-site: www.hotmail.com, a free e-mail service

Panel Format
– Introduction (Rolf Molich)
– Five-minute statements from five participating teams
– The customer's point of view (Meeta Arcuri, Hotmail)
– Conclusions (Rolf Molich)
– Discussion (30 minutes)

Purposes of Comparison
– Survey the state of the art in professional usability testing of web-sites
– Investigate the reproducibility of usability test results

Non-Purposes of Comparison
– To pick a winner
– To make a profit

Basis for Usability Test
– Web-site address: www.hotmail.com
– Client scenario
– Access to the client through an intermediary
– Three weeks to carry out the test

What Each Team Did
– Run a standard usability test
– Anonymize the usability test report
– Send the report to Rolf Molich

Problems Found
Total number of different usability problems found: 300
  Found by seven teams:    1
  Found by six teams:      1
  Found by five teams:     4
  Found by four teams:     4
  Found by three teams:   15
  Found by two teams:     49
  Found by one team:     226 (75%)

Comparative Usability Evaluation 2
– Barbara Karyukina, SGI (USA)
– Klaus Kaasgaard & Ann D. Thomsen, KMD (Denmark)
– Lars Schmidt and others, Networkers (Denmark)
– Meghan Ede and others, Sun Microsystems, Inc. (USA)
– Wilma van Oel, P5 (The Netherlands)
– Meeta Arcuri, Hotmail, Microsoft Corp. (USA) (customer)
– Rolf Molich, DialogDesign (Denmark) (coordinator)

Comparative Usability Evaluation 2
– Joseph Seeley, NovaNET Learning Inc. (USA)
– Kent Norman, University of Maryland (USA)
– Torben Norgaard Rasmussen and others, Technical University of Denmark
– Marji Schumann and others, Southern Polytechnic State University (USA)

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Barbara Karyukina, SGI, Wisconsin, USA
barbarak@sgi.com

Challenges
– Twenty functional areas + user preference questions

Possible Solutions
– Two usability tests
– Surveys
– User notes
– Focus groups

Results
– 26 tasks + 10 interview questions
– 100 findings

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Klaus Kaasgaard, Kommunedata, Denmark
kka@kmd.dk
(Slides currently not available)

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Lars Schmidt, Framtidsfabriken Networkers, Denmark
ls@networkers.dk

Team E: Framtidsfabriken Networkers Testlab, Denmark

Key Learnings from CUE-2
Setting up the test:
– Insist on dialog with the customer
– Secure a complete understanding of user groups and user tasks
– Narrow down the test goals
Writing the report:
– Use screen dumps
– State conclusions; skip the premises
– Test the usability of the usability report

Improving Test Methodology
Searching for usability and usefulness:
– Hook up with different methodologies (e.g. interviews)
Focus on web-site context:
– Test against e.g. YahooMail
– Test against software-based e-mail clients

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Meghan Ede, Sun Microsystems, California, USA
meghan.ede@sun.com

Hotmail Study Requests
– 18 specific features (e.g. registration, login, compose, ...)
– 6 questions (e.g. "How do users currently do email?")
– 24 potential study areas in total

Usability Methods
– Expert review: 6 reviewers, 6 questions
– Usability study: 6 participants (3 + 3), 5 tasks (with sub-tasks)

Report Description
1. Executive summary: 4 main high-level themes; brief study description
2. Debriefing meeting summary: 7 areas (e.g. overall, navigation, power features, ...)
3. Findings: 31 sections covering study requests, extra areas, bugs, task times, and study Q & A
4. Study description
Total: 36 pages, 150 findings

Lessons Learned
– Importance of close contact with the product team
– Consider including: severity ratings, more specific recommendations, screen shots

Discussion Issues
– How can we measure the usability of our reports?
– How do we deal with the difference between the number of problems found and the number included in the report?

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Wilma van Oel, P5, The Netherlands
w.vanoel@p5-adviseurs.nl

P5 adviseurs voor produkt- & kwaliteitsbeleid (quality & product management consultants), Amsterdam, The Netherlands

Structure of Presentation
1. Introduction
2. Deviations in approach
   – Test design
   – Results and recommendations
3. Lessons for the future
   – Change in approach?
   – Was it worth the effort?

Introduction
– Company: P5 Consultants
– Personal background: psychologist

Test Design
– Subjects: n = 11, pilot, 'critical users', 1-hour sessions
– Data collection: logging software, video recording
– Methods: lab evaluation + informal approach
– Techniques: exploration, task execution, think-aloud, interview, questionnaire
– Tool: SUS

A Test Session

Results and Recommendations
– Results: 'general'; severity? Negative: n = median; positive: n > mean
– Recommendations: general, not 'how'

Lessons for the Future
Change in approach?
– Methods: add a usability inspection method
– Procedure: more extensive analysis, add session time
– Results: less general; severity?
Was it worth the effort?
– Company: to gain experience & benchmarking
– Personally: to improve skills and knowledge

CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Meeta Arcuri, Microsoft Corporation, California, USA
meeta@hotmail.com

CUE-2: The Customer's Perspective
Meeta Arcuri, User Experience Manager, Microsoft Corp., San Jose, CA

Customer Summary of Findings
– New findings: ~4%
– Validation of known issues: ~67%
  – Previous findings from our lab tests
  – Findings from on-going inspections
– Remainder: beyond Hotmail usability
  – Business reasons for not changing
  – Out of Hotmail's control (partner sites)
  – Problems generic to the web

Report Content: Positive Observations
– Quick-and-dirty results
– Recommendations for problem fixes
– Participant quotes, to get the tone/intensity of feedback
– Exact number of participants who encountered each issue
– Background of participants
– Environment (browser, speed of connection, etc.)

Additional Strengths of Reports
– Fresh perspectives
– Lots of data on non-US users
– Recommendations from participants
– Trend reporting
– Reports of outdated material on the site (some help files)
– Appreciated positive findings and comments

Report Content: Weaknesses
– Some recommendations not sensitive to web issues (performance, security)
– At least one finding irreproducible (not preserving fields in the registration form)
– Frequency of issues reported was sometimes vague
– Some descriptions terse or vague; had to be deciphered

How Hotmail Will Use the Results
– Cross-validate new findings with Hotmail Customer Service reports
– Lots of good data to cite in planning meetings
– Some good recommendations given by labs and participants

Conclusion
– Focused, iterative testing would give better results
– Wide array of user data very valuable
– Overall: good qualitative and quantitative data to help prioritize, schedule, and improve the usability of Hotmail
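Team P5's test design listed SUS (the System Usability Scale) among its questionnaire tools. As a side note, the standard SUS scoring rule maps ten 1-5 Likert responses to a 0-100 score; the sketch below implements that well-known rule (from Brooke's published SUS description, not from the CUE-2 reports), and the variable names are illustrative only.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten
    1-5 Likert responses, using the standard SUS scoring rule."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses in the range 1-5")
    total = 0
    for i, r in enumerate(responses):
        # Items 1, 3, 5, 7, 9 (index 0, 2, ...) are positively worded:
        # they contribute (response - 1). Items 2, 4, ... are negatively
        # worded: they contribute (5 - response).
        total += (r - 1) if i % 2 == 0 else (5 - r)
    # The summed contributions (0-40) are scaled to 0-100.
    return total * 2.5

# A neutral respondent (all 3s) lands exactly in the middle.
print(sus_score([3] * 10))  # 50.0
```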
CHI99 Panel: Comparative Evaluation of Usability Tests
Presentation by Rolf Molich, DialogDesign, Denmark
molich@dialogdesign.dk

Comparison of Tests
– Based only on the test reports
– Liberal scoring
– Focus on major differences
– Two generally recognized textbooks:
  – Dumas and Redish, "A Practical Guide to Usability Testing"
  – Jeff Rubin, "Handbook of Usability Testing"

Resources
Team                          A    B    C    D    E    F    G    H    J
Person hours used for test  136  123   84  (16) 130   50  107   45  218
# Usability professionals     2    1    1    1    3    1    1    3    6
Number of tests               7    6    6   50    9    5   11    4    6

Usability Results
Team                  A    B    C    D    E    F    G    H    J
# Positive findings   0    8    4    7   24   25   14    4    6
# Problems           26  150   17   10   58   75   30   18   20
% Exclusive          42   24   10   57   51   33   56   60   71

Usability Results (continued)
Team                          A    B    C    D    E    F    G    H    J
# Problems                   26  150   17   10   58   75   30   18   20
% Core problems (100% = 26)  38   73   35    8   58   54   50   27   31
Person hours used for test  136  123   84   NA  130   50  107   45  218

Conclusion
– If Hotmail is typical, then the total number of usability problems for a typical web-site is huge, much larger than you can hope to find in one series of usability tests
– Usability testing techniques can be improved
– We need more awareness of the usability of usability work

Download Test Reports and Slides
http://www.dialogdesign.dk/cue2.htm
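The "found by n teams" distribution and the "% Exclusive" column above can both be derived from per-team sets of problem identifiers. The sketch below shows one plausible way to compute them; the `team_problems` data is made up for illustration and is not the actual CUE-2 data.

```python
from collections import Counter
from itertools import chain

# Hypothetical example data: each team's set of problem IDs.
# (The real CUE-2 study had nine teams and 300 distinct problems.)
team_problems = {
    "A": {1, 2, 3},
    "B": {2, 3, 4, 5},
    "C": {5, 6},
}

# How many teams reported each problem?
counts = Counter(chain.from_iterable(team_problems.values()))

# Distribution: how many problems were found by exactly n teams.
distribution = Counter(counts.values())

# % exclusive per team: share of its problems that no other team found.
exclusive = {
    team: 100 * sum(1 for p in probs if counts[p] == 1) / len(probs)
    for team, probs in team_problems.items()
}

print(dict(distribution))  # {1: 3, 2: 3}
print(exclusive)           # {'A': 33.33..., 'B': 25.0, 'C': 50.0}
```

With nine teams, the same `distribution` counter would reproduce the "Problems Found" slide, and `exclusive` the "% Exclusive" row.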