Usability Evaluation

www.id-book.com ©2011

The aims
- Explain the key concepts used in evaluation.
- Introduce different evaluation methods.
- Show how different methods are used for different purposes at different stages of the design process and in different contexts of use.
- Show how evaluators mix and modify methods.
- Discuss the practical challenges.

Why, what, where, and when to evaluate
Iterative design and evaluation is a continuous process that examines:
- Why: to check that users can use the product and that they like it.
- What: a conceptual model, early prototypes of a new system and, later, more complete prototypes.
- Where: in natural and laboratory settings.
- When: throughout design; finished products can be evaluated to collect information to inform new products.
Designers need to check that they understand users' requirements.

Bruce Tognazzini tells you why you need to evaluate
"Iterative design, with its repeating cycle of design and testing, is the only validated methodology in existence that will consistently produce successful results. If you don't have user-testing as an integral part of your design process you are going to throw buckets of money down the drain."
See AskTog.com for topical discussions about design and evaluation.

Types of evaluation
- Controlled settings involving users, e.g., usability testing and experiments in laboratories and living labs.
- Natural settings involving users, e.g., field studies to see how the product is used in the real world.
- Any settings not involving users, e.g., consultants critique, predict, analyze, and model aspects of the interface; analytics.

Usability lab
http://iat.ubalt.edu/usability_lab/

Living labs
People's use of technology in their everyday lives can be evaluated in living labs. Such evaluations are too difficult to do in a usability lab.
E.g., http://www.sfu.ca/westhouse.html (currently occupied by participants!)

Usability testing and field studies can be complementary

Evaluation case studies
- Experiment to investigate a computer game:
  Regan L. Mandryk and Kori M. Inkpen (2004). Physiological indicators for the evaluation of co-located collaborative play. In Proc. ACM Conf. on Computer Supported Cooperative Work (CSCW '04), 102-111. http://doi.acm.org/10.1145/1031607.1031625
- In-the-wild field study of skiers
- Crowdsourcing

Challenge and engagement in a collaborative immersive game
Physiological measures were used. Engagement was measured in two conditions:
1. Playing against another person.
2. Playing against a computer.

What does this data tell you?
Rate each experience state on a scale from 1 to 5 (1 = low, 5 = high):

Experience state   Playing against computer   Playing against friend
                   Mean    St. Dev.           Mean    St. Dev.
Boring             2.3     0.949              1.7     0.949
Challenging        3.6     1.08               3.9     0.994
Easy               2.7     0.823              2.5     0.850
Engaging           3.8     0.422              4.3     0.675
Exciting           3.5     0.527              4.1     0.568
Frustrating        2.8     1.14               2.5     0.850
Fun                3.9     0.738              4.6     0.699

Source: Mandryk and Inkpen (2004).

Question
What precautionary measures did the evaluators have to take?

Evaluation case studies
- Experiment to investigate a computer game
- In-the-wild field study of skiers:
  Francis Jambon and Brigitte Meillon (2009). User experience evaluation in the wild. In Ext. Abstracts of the Int. Conf. on Human Factors in Computing Systems (CHI EA '09), 4069-4074. http://doi.acm.org/10.1145/1520340.1520619
- Crowdsourcing

Why study skiers in the wild?
Jambon et al. (2009) User experience in the wild. In: Proceedings of CHI '09, ACM Press, New York, p. 4072.

e-skiing system components
Jambon et al. (2009) User experience in the wild.
In: Proceedings of CHI '09, ACM Press, New York, p. 4072.

Evaluation case studies
- Experiment to investigate a computer game
- In-the-wild field study of skiers
- Crowdsourcing:
  Jeffrey Heer and Michael Bostock (2010). Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In Proc. of the Int. Conf. on Human Factors in Computing Systems (CHI '10), ACM, New York, 203-212. http://doi.acm.org/10.1145/1753326.1753357

Crowdsourcing: when might you use it?

Evaluating an ambient system
The Hello Wall is a new kind of system that is designed to explore how people react to its presence. What are the challenges of evaluating systems like this?

Evaluation methods
Method           Controlled settings   Natural settings   Without users
Observing        x                     x
Asking users     x                     x
Asking experts   x                                        x
Testing          x
Modeling                                                  x

The language of evaluation
Analytics, analytical evaluation, controlled experiment, expert review or crit, field study, formative evaluation, heuristic evaluation, in-the-wild study, living laboratory, predictive evaluation, summative evaluation, usability laboratory, user studies, usability testing, users or participants.

Key points
- Evaluation and design are closely integrated in user-centered design.
- Some of the same techniques are used in evaluation as for establishing requirements, but they are used differently (e.g., observation, interviews, and questionnaires).
- Three types of evaluation: laboratory-based with users, in the field with users, and studies that do not involve users.
- The main methods are: observing, asking users, asking experts, user testing, inspection, modeling users' task performance, and analytics.
- Dealing with constraints is an important skill for evaluators to develop.
Butterfly Ballot
"The Butterfly Ballot: Anatomy of a Disaster" was written by Bruce Tognazzini: http://www.asktog.com/columns/042ButterflyBallot.html
Try it as a design exercise: create a similar ballot, conduct a usability test (did people actually vote as they intended?), create a modified ballot, conduct a second test, and see if the results are improved.

AN EVALUATION FRAMEWORK

The aims are:
- Introduce and explain the DECIDE framework.
- Discuss the conceptual, practical, and ethical issues involved in evaluation.

DECIDE: a framework to guide evaluation
- Determine the goals.
- Explore the questions.
- Choose the evaluation methods.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, analyze, interpret, and present the data.

Determine the goals
What are the high-level goals of the evaluation? Who wants it and why? The goals influence the methods used for the study. Goals vary and could be to:
- identify the best metaphor for the design;
- check that user requirements are met;
- check for consistency;
- investigate how technology affects working practices;
- improve the usability of an existing product.

Explore the questions
Questions help to guide the evaluation. The goal of finding out why some customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
- What are customers' attitudes to e-tickets?
- Are they concerned about security?
- Is the interface for obtaining them poor?
What questions might you ask about the design of a cell phone?

Choose the evaluation approach and methods
The evaluation approach influences how data is collected, analyzed, and presented. For example, field studies typically:
- involve observation and interviews;
- involve users in natural settings;
- do not involve controlled tests;
- produce qualitative data.
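The six DECIDE steps can be treated as a planning checklist. As a minimal sketch (every field name and value below is illustrative, not drawn from any actual study), a plan might be represented and checked like this:

```python
# Illustrative sketch: a DECIDE evaluation plan as a checklist structure.
# All values are made up for demonstration purposes.

decide_plan = {
    "determine_goals": "Improve the usability of the e-ticket purchase flow",
    "explore_questions": [
        "What are customers' attitudes to e-tickets?",
        "Are they concerned about security?",
        "Is the interface for obtaining them poor?",
    ],
    "choose_methods": ["field study", "interviews"],
    "identify_practical_issues": ["selecting users", "budget", "schedule"],
    "decide_ethical_issues": ["informed consent form", "data privacy"],
    "evaluate_data": ["reliability", "validity", "biases", "scope", "ecological validity"],
}

def missing_steps(plan):
    """Return the DECIDE steps the plan has left empty."""
    return [step for step, value in plan.items() if not value]

print(missing_steps(decide_plan))  # prints [] when every step is addressed
```

A checklist like this makes it easy to spot, before the study starts, which parts of the framework have not yet been thought through.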
Identify the practical issues
For example, how to:
- select users;
- find evaluators;
- select equipment;
- stay on budget;
- stay on schedule.

Decide about ethical issues
Develop an informed consent form. Participants have a right to:
- know the goals of the study;
- know what will happen to the findings;
- privacy of personal information;
- leave when they wish;
- be treated politely.

Evaluate, interpret, and present data
The methods used influence how data is evaluated, interpreted, and presented. The following need to be considered:
- Reliability: can the study be replicated?
- Validity: is it measuring what you expected?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the findings (e.g., the Hawthorne effect)?

Key points
- There are many issues to consider before conducting an evaluation study, including: the goals of the study; whether or not users are involved; the methods to use; practical and ethical issues; and how data will be collected, analyzed, and presented.
- The DECIDE framework provides a useful checklist for planning an evaluation study.

An exercise left for you...
- Find an evaluation study (proceedings of CHI, CSCW, SOUPS).
- Use the DECIDE framework to analyze it.
- Describe the aspects of DECIDE that are explicitly addressed in the report and which are not.
- On a scale of 1-5, where 1 = poor and 5 = excellent, how would you rate this study?

EVALUATION STUDIES: FROM CONTROLLED TO NATURAL SETTINGS

The aims:
- Explain how to do usability testing.
- Outline the basics of experimental design.
- Describe how to do field studies.

Usability testing
- Involves recording the performance of typical users doing typical tasks.
- Controlled settings.
- Users are observed and timed.
- Data is recorded on video, and key presses are logged.
- The data is used to calculate performance times, and to identify and explain errors.
- User satisfaction is evaluated using questionnaires and interviews.
- Field observations may be used to provide contextual understanding.

Experiments and usability testing
Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables. Usability testing is applied experimentation: developers check that the system is usable by the intended user population for their tasks. Experiments may also be done in usability testing.

Usability testing vs. research
Usability testing                            Experiments for research
Improve products                             Discover knowledge
Few participants                             Many participants
Results inform design                        Results validated statistically
Usually not completely replicable            Must be replicable
Conditions controlled as much as possible    Strongly controlled conditions
Procedure planned                            Experimental design
Results reported to developers               Scientific report to scientific community

Usability testing
- Goals and questions focus on how well users perform tasks with the product.
- Comparison of products or prototypes is common.
- Focus is on time to complete a task and the number and type of errors.
- Data is collected by video and interaction logging.
- Testing is central.
- User satisfaction questionnaires and interviews provide data about users' opinions.

Usability lab with observers watching a user and assistant

Portable equipment for use in the field

Mobile head-mounted eye tracker
Picture courtesy of SensoMotoric Instruments (SMI), copyright 2010

Testing conditions
- Usability lab or other controlled space.
- Emphasis on selecting representative users and developing representative tasks.
- 5-10 users typically selected.
- Tasks usually last no more than 30 minutes.
- The test conditions should be the same for every participant.
- An informed consent form explains procedures and deals with ethical issues.

Typical metrics
- Time to complete a task.
- Time to complete a task after a specified time away from the product.
- Number and type of errors per task.
- Number of errors per unit of time.
- Number of navigations to online help or manuals.
- Number of users making a particular error.
- Number of users completing a task successfully.

Other metrics?
Not everything is performance driven...

Usability engineering orientation
The aim is improvement with each version, measured against:
- the current level of performance;
- the minimum acceptable level of performance;
- the target level of performance.

How many participants is enough for user testing?
The number is a practical issue. It depends on:
- the schedule for testing;
- the availability of participants;
- the cost of running tests.
Typically 5-10 participants. Some experts argue that testing should continue until no new insights are gained.
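The 5-10 participant rule of thumb is often justified with Nielsen and Landauer's problem-discovery model: the proportion of usability problems found by n users is estimated as 1 - (1 - L)^n, where L is the average probability that a single user reveals a given problem. A commonly quoted value is L ≈ 0.31; treat that figure as an assumption here, since it is not part of these slides. A quick sketch:

```python
# Sketch of the problem-discovery model for sizing usability tests:
# proportion of problems found by n users = 1 - (1 - L)**n,
# where L is the chance that one user uncovers a given problem
# (L = 0.31 is the commonly quoted average; it is an assumption here).

def proportion_found(n_users, L=0.31):
    return 1 - (1 - L) ** n_users

for n in (1, 3, 5, 10):
    print(n, round(proportion_found(n), 2))
```

With these assumptions, five users would be expected to uncover roughly 85% of the problems, which is consistent with the 5-10 participant guideline; the "test until no new insights are gained" position corresponds to continuing until this curve flattens out.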
Name 3 features for each that can be tested by usability testing
iPad

CASE STUDY: IPAD EVALUATION

Nielsen Norman Group study
http://www.nngroup.com/reports/mobile/ipad/
"These reports are based on 2 rounds of usability studies with real users, reporting how they actually used a broad variety of iPad apps as well as websites accessed on the iPad."
"The first edition of our iPad report (from 2010) describes the original research we conducted with the first crop of iPad apps immediately after the launch of this tablet device."
"The second edition of our iPad report (from 2011) describes the follow-up research we conducted a year later, after apps and websites had been given the chance to improve based on the early findings."

Research goals
- Understand how the interactions with the device affected people.
- Provide feedback to their clients and developers.
- Report on whether the iPad lived up to the hype.
- Are user expectations different for the iPad compared with the iPhone?

Methodology: usability testing
- With a think-aloud protocol.
- Conducted in Chicago and Fremont, CA.
- Aim of understanding the typical usability issues that people encounter when using apps and accessing websites on the iPad.
- 7 participants: experienced iPhone users (3+ months' experience) who used a variety of apps; varied ages (20s to 60s) and occupations; 3 male, 4 female.

Questions
- Do you think the selection of participants was appropriate? Why?
- What problems may occur when asking participants to think out loud as they complete tasks?
The tests
- 90 minutes of exploring applications on the iPad (self-directed exploration).
- Researcher sat beside the participant, observed, took notes.
- Session was video recorded.
- Specific tasks: open a specific app/website, then carry out one or more tasks "as if they were on their own".
- Balanced presentation order of tasks.
- 60 tasks from 32 different sites.

The equipment
- A camera (pointing down at the iPad) recorded the participant's interactions and gestures when using the iPad.
- The recording was streamed to a laptop computer.
- A webcam was used to record the expressions on the participants' faces and their think-aloud commentary.
- The laptop ran Morae, which synched the two data streams.
- Up to 3 observers watched the video streams (including the in-room moderator) so that the person's personal space was not invaded.

App or website   Task
iBook            Download a free copy of Alice's Adventures in Wonderland and read through the first few pages.
Craigslist       Find some free mulch for your garden.
eBay             You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine    Browse through the magazine and find the best pictures of the week.
Epicurious       You want to make an apple pie for tonight. Find a recipe and see what you need to buy in order to prepare it.
Kayak            You are planning a trip to Death Valley in May this year. Find a hotel located in or close to the park.

How representative do you think these tasks are?

Question
What aspects are considered to be important for good usability and user experience in this case?
Metrics of interest
Usability           User experience
Efficient           Support creativity
Effective           Be motivating
Safe                Be helpful
Easy to learn       Be satisfying to use
Easy to remember    Have good utility

Results
Website interactions were not optimal:
- Links were too small to tap reliably.
- Fonts could be difficult to read.
- Participants sometimes did not know where to tap on the iPad to select options such as buttons and menus.
- Participants got lost (no back button to return to the home page).
- Inconsistencies between portrait and landscape presentations.

Usability experiments
- Predict the relationship between two or more variables.
- The independent variable is manipulated by the researcher.
- The dependent variable depends on the independent variable.
- Typical experimental designs have one or two independent variables.
- Validated statistically and replicable.

Experimental designs
- Between subjects (different participants): a single group of participants is allocated randomly to the experimental conditions.
- Within subjects (same participants): all participants appear in both conditions.
- Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.

Between, within, and matched participant designs
Design    Advantages                                                     Disadvantages
Between   No order effects                                               Many subjects needed; individual differences are a problem
Within    Few individuals; no individual differences                     Counterbalancing needed because of ordering effects
Matched   Same as different participants, but individual differences reduced   Cannot be sure of perfect matching on all differences

Field studies
- Field studies are done in natural settings.
- "In the wild" is a term for prototypes being used freely in natural settings.
- The aim is to understand what users do naturally and how technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.

Data collection and analysis
- Observation and interviews: notes, pictures, recordings, video, logging.
- Analysis: data is categorized; the categories can be provided by theory, e.g., grounded theory or activity theory.

Data presentation
The aim is to show how the products are being appropriated and integrated into their surroundings. Typical presentation forms include: vignettes, excerpts, critical incidents, patterns, and narratives.

Key points 1
- Usability testing is done in controlled conditions.
- Usability testing is an adapted form of experimentation.
- Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
- The experimenter controls the independent variable(s) but not the dependent variable(s).

Key points 2
- There are three types of experimental design: different participants, same participants, and matched participants.
- Field studies are done in natural environments.
- "In the wild" is a recent term for studies in which a prototype is freely used in a natural setting.
- Typically, observation and interviews are used to collect field study data.
- Data is usually presented as anecdotes, excerpts, critical incidents, patterns, and narratives.
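The participant-allocation schemes behind the three experimental designs can be sketched in a few lines. This is an illustrative sketch only (participant IDs and condition names are made up), assuming random assignment for between-subjects designs and a simple rotated ordering for within-subjects counterbalancing; a real study with more than two conditions would typically use a full Latin square, and matched designs would also need a pairing step:

```python
# Sketch of participant allocation for two of the experimental designs:
# between-subjects = random assignment to exactly one condition;
# within-subjects  = every participant sees all conditions, with the
# order rotated across participants to offset order effects.
import random

def between_subjects(participants, conditions, seed=0):
    """Randomly assign each participant to a single condition."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return {p: rng.choice(conditions) for p in participants}

def within_subjects(participants, conditions):
    """Give every participant all conditions, rotating the starting point."""
    k = len(conditions)
    return {p: [conditions[(i + j) % k] for j in range(k)]
            for i, p in enumerate(participants)}

people = ["P1", "P2", "P3", "P4"]
print(between_subjects(people, ["A", "B"]))
print(within_subjects(people, ["A", "B"]))
# e.g., P1 sees A then B, P2 sees B then A, and so on.
```

The rotation makes the counterbalancing concrete: with two conditions, half the participants start with each, which is exactly why within-subjects designs need counterbalancing to cancel ordering effects.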