Chapter 7 - Evaluation
HCI: Developing Effective Organizational Information Systems
Dov Te'eni, Jane Carey, Ping Zhang
Copyright 2006 John Wiley and Sons, Inc.

Road Map
[Road map diagram relating the book's chapters, grouped under Context, Foundation, and Application: Introduction; Context: Org & Business; Interactive Technologies; Physical Engineering; Cognitive Engineering; Affective Engineering; Evaluation; Principles & Guidelines; Methodology; Organizational Tasks; Componential Design; Relationship, Collaboration & Organization; Social & Global Issues; and Additional Context: Changing Needs of IT Development & Use. This chapter is Chapter 7, Evaluation.]

Learning Objectives
- Explain what evaluation is and why it is important.
- Understand the different types of HCI concerns and their rationales.
- Understand the relationships of HCI concerns with various evaluations.
- Understand usability, usability engineering, and universal usability.
- Understand different evaluation methods and techniques.
- Select appropriate evaluation methods for a particular evaluation need.
- Carry out effective and efficient evaluations.
- Critique reports of studies done by others.
- Understand the reasons for setting up industry standards.

Evaluation
Evaluation: the determination of the significance, worth, condition, or value of something by careful appraisal and study.

HCI Methodology and Evaluation
[Diagram: evaluation within the HCI development methodology. Formative evaluations accompany every phase: project selection and planning; analysis (context analysis, user analysis, task analysis, requirements determination, user needs test, and alternative selection); interface specification design (metaphor, media, dialogue, and presentation design); and coding/implementation. Summative evaluation follows implementation. HCI principles and guidelines supply the evaluation metrics throughout.]

What to Evaluate? Four Levels of HCI Concerns
- Physical: the system fits our physical strengths and limitations and does not cause harm to our health. Sample measures: legible; audible; safe to use.
- Cognitive: the system fits our cognitive strengths and limitations and functions as a cognitive extension of our brain. Sample measures: fewer errors and easy recovery; easy to use; easy to remember how to use; easy to learn.
- Affective: the system satisfies our aesthetic and affective needs and is attractive for its own sake. Sample measures: aesthetically pleasing; engaging; trustworthy; satisfying; enjoyable; entertaining; fun.
- Usefulness: using the system would provide rewarding consequences. Sample measures: supports the individual's tasks; can do some tasks that would not be possible without the system; extends one's capability; rewarding.

Why Evaluate?
The goal of evaluation is to provide feedback in software development, thus supporting an iterative development process (Gould and Lewis 1985).

When to Evaluate
- Formative evaluation: conducted during the development of a product in order to form or influence design decisions.
- Summative evaluation: conducted after the product is finished, to ensure that it possesses certain qualities, meets certain standards, or satisfies certain requirements set by the sponsors or other agencies.
- Use and impact evaluation: conducted during the actual use of the product by real users in a real context.
- Longitudinal evaluation: involves the repeated observation or examination of a set of subjects over time with respect to one or more evaluation variables.

[Figure 7.1 Evaluation as the Center of Systems Development: evaluation sits at the hub of a star life cycle whose points are task analysis/functional analysis, requirements specification, conceptual design/formal design, prototyping, and implementation.]
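To make these distinctions concrete, consider how an evaluation team might log its findings so that each one is tagged with the type of evaluation that produced it and the HCI concern it touches. This is a minimal sketch, not anything the chapter prescribes; the names (`Finding`, `summarize`) and the sample findings are invented for illustration.

```python
from dataclasses import dataclass
from collections import Counter

# The four levels of HCI concerns discussed in the chapter.
CONCERNS = {"physical", "cognitive", "affective", "usefulness"}
# When the finding was collected, per the chapter's timing distinctions.
EVAL_TYPES = {"formative", "summative", "use-and-impact", "longitudinal"}

@dataclass
class Finding:
    description: str  # e.g., "error message uses system-oriented jargon"
    concern: str      # one of CONCERNS
    eval_type: str    # one of EVAL_TYPES

    def __post_init__(self):
        assert self.concern in CONCERNS and self.eval_type in EVAL_TYPES

def summarize(findings):
    """Count findings per HCI concern, so the team can see where problems cluster."""
    return Counter(f.concern for f in findings)

findings = [
    Finding("Labels illegible at the default font size", "physical", "formative"),
    Finding("No undo available after delete", "cognitive", "formative"),
    Finding("Start-up screen feels cluttered and unappealing", "affective", "summative"),
]
print(summarize(findings))
# Counter({'physical': 1, 'cognitive': 1, 'affective': 1})
```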
Issues in Evaluation: The Evaluation Plan
An evaluation plan should take into account:
- Stage of design (early, middle, late)
- Novelty of the product (well defined versus exploratory)
- Number of expected users
- Criticality of the interface (e.g., a life-critical medical system versus museum-exhibit support)
- Costs of the product and the finances allocated for testing
- Time available
- Experience of the design and evaluation team

Usability and Usability Engineering
Usability: the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use.

[Figure 7.2 System Acceptability and Usability]

Table 7.4 Nielsen's Definitions
- Usefulness: the issue of whether the system can be used to achieve some desired goal.
- Utility: the question of whether the functionality of the system in principle can do what is needed.
- Usability: the question of how well users can use that functionality.
- Learnability: the system should be easy to learn, so that the user can rapidly start getting some work done with it.
- Efficiency: the system should be efficient to use, so that once the user has learned it, a high level of productivity is possible.
- Memorability: the system should be easy to remember, so that the casual user is able to return to it after some period of not having used it without having to learn everything all over again.
- Errors: the system should have a low error rate, so that users make few errors during use, and so that if they do make errors they can easily recover from them. Further, catastrophic errors must not occur.
- Satisfaction: the system should be pleasant to use, so that users are subjectively satisfied when using it; they like it.

Usability Engineering
Usability engineering: a process through which usability characteristics are specified, quantitatively and early in the development process, and measured throughout the process.
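Because usability engineering calls for quantitative targets tracked throughout development, a team can encode its specification and check each round of measurements against it. The sketch below is illustrative only: the target values and the names (`SPEC`, `measured`, `check`) are invented, and the three measures follow the effectiveness/efficiency/satisfaction framing of the usability definition above.

```python
from statistics import mean

# Hypothetical quantitative usability specification for one benchmark task:
# attribute -> (worst acceptable level, planned target level).
SPEC = {
    "effectiveness": (0.80, 0.95),    # proportion of users completing the task
    "efficiency":    (180.0, 120.0),  # mean time on task in seconds (lower is better)
    "satisfaction":  (3.0, 4.0),      # mean rating on a 1-5 scale
}

def measured(completions, times, ratings):
    """Turn raw session data into the three ISO-style usability measures."""
    return {
        "effectiveness": sum(completions) / len(completions),
        "efficiency": mean(times),
        "satisfaction": mean(ratings),
    }

def check(values):
    """Compare measurements with the spec; efficiency is better when lower."""
    for attr, (worst, target) in SPEC.items():
        v = values[attr]
        acceptable = v <= worst if attr == "efficiency" else v >= worst
        on_target = v <= target if attr == "efficiency" else v >= target
        status = "meets target" if on_target else ("acceptable" if acceptable else "FAILS")
        print(f"{attr}: {v:.2f} -> {status}")

# Fabricated data from eight test sessions.
check(measured(
    completions=[1, 1, 1, 0, 1, 1, 1, 1],
    times=[95, 130, 110, 240, 100, 125, 140, 105],
    ratings=[4, 5, 3, 2, 4, 4, 5, 4],
))
```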
Evaluation Methods
Evaluation methods can be organized by research strategy:
- Field strategies (settings kept as natural as possible): field studies (ethnography and interaction analysis; contextual inquiry) and field experiments (beta testing of products; studies of technological change).
- Respondent strategies (settings are muted or made moot): judgment studies (usability inspection methods, e.g., heuristic evaluation) and sample surveys (questionnaires; interviews).
- Experimental strategies (settings concocted for research purposes): experimental simulations (usability testing; usability engineering) and laboratory/controlled experiments.
- Theoretical strategies (no observation of behavior required): formal theory (design theory, e.g., Norman's seven stages; behavioral theory, e.g., color vision) and computer simulation (e.g., human information processing theory).

Analytical Methods: Heuristic Evaluation
- Heuristics: high-level design principles used in practice to guide designs; also called rules of thumb.
- Heuristic evaluation: a group of experts, guided by a set of high-level design principles or heuristics, evaluates whether interface elements conform to those principles.

Table 7.6 Ten Usability Heuristics
1. Visibility of system status: The system should always keep users informed about what is going on, through appropriate feedback within reasonable time.
2. Match between system and the real world: The system should speak the users' language, with words, phrases, and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.
3. User control and freedom: Users often choose system functions by mistake and will need a clearly marked "emergency exit" to leave the unwanted state without having to go through an extended dialogue. Support undo and redo.
4. Consistency and standards: Users should not have to wonder whether different words, situations, or actions mean the same thing. Follow platform conventions.
5. Error prevention: Even better than good error messages is a careful design that prevents a problem from occurring in the first place.
6. Recognition rather than recall: Make objects, actions, and options visible. The user should not have to remember information from one part of the dialogue to another. Instructions for use of the system should be visible or easily retrievable whenever appropriate.
7. Flexibility and efficiency of use: Accelerators, unseen by the novice user, may often speed up the interaction for the expert user, so that the system can cater to both inexperienced and experienced users. Allow users to tailor frequent actions.
8. Aesthetic and minimalist design: Dialogues should not contain information that is irrelevant or rarely needed. Every extra unit of information in a dialogue competes with the relevant units of information and diminishes their relative visibility.
9. Help users recognize, diagnose, and recover from errors: Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.
10. Help and documentation: Even though it is better if the system can be used without documentation, it may be necessary to provide help and documentation. Any such information should be easy to search, focused on the user's task, list concrete steps to be carried out, and not be too large.
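In practice, each expert inspects the interface against these heuristics independently, and the problems found are then aggregated across evaluators. The sketch below shows one plausible way to do that bookkeeping, using Nielsen's commonly cited 0-4 severity scale; the names (`reports`, `aggregate`) and the sample findings are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Nielsen's commonly used severity scale:
# 0 = not a problem, 1 = cosmetic, 2 = minor, 3 = major, 4 = catastrophe.

# Each evaluator independently reports (heuristic violated, problem, severity).
reports = {
    "evaluator_A": [
        ("Visibility of system status", "No progress indicator on long saves", 3),
        ("Error prevention", "Delete has no confirmation", 4),
    ],
    "evaluator_B": [
        ("Error prevention", "Delete has no confirmation", 3),
        ("Consistency and standards", "'OK' and 'Apply' used interchangeably", 2),
    ],
}

def aggregate(reports):
    """Group identical problems across evaluators and average their severities."""
    by_problem = defaultdict(list)
    for findings in reports.values():
        for heuristic, problem, severity in findings:
            by_problem[(heuristic, problem)].append(severity)
    # Most severe problems first, so fixes can be prioritized.
    ranked = sorted(by_problem.items(), key=lambda kv: -mean(kv[1]))
    for (heuristic, problem), sevs in ranked:
        print(f"[{mean(sevs):.1f}] {heuristic}: {problem} (reported by {len(sevs)})")

aggregate(reports)
```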
Table 7.7 Eight Golden Rules for User Interface Design (Shneiderman and Plaisant 2005)
1. Strive for consistency: This rule is the most frequently violated one, and following it can be tricky because there are many forms of consistency. Consistent sequences of actions should be required in similar situations; identical terminology should be used in prompts, menus, and help screens; consistent color, layout, capitalization, fonts, and so on should be employed throughout. Exceptions, such as required confirmation of the delete command or no echoing of passwords, should be comprehensible and limited in number.
2. Cater to universal usability: Recognize the needs of diverse users and design for plasticity, facilitating transformation of content. Novice-expert differences, age ranges, disabilities, and technology diversity each enrich the spectrum of requirements that guides design. Adding features for novices, such as explanations, and features for experts, such as shortcuts and faster pacing, can enrich the interface design and improve perceived system quality.
3. Offer informative feedback: For every user action, there should be some system feedback. For frequent and minor actions, the response can be modest, whereas for infrequent and major actions, the response should be more substantial. Visual presentation of the objects of interest provides a convenient environment for showing changes explicitly.
4. Design dialogs to yield closure: Sequences of actions should be organized into groups with a beginning, middle, and end. Informative feedback at the completion of a group of actions gives operators the satisfaction of accomplishment, a sense of relief, the signal to drop contingency plans from their minds, and a signal to prepare for the next group of actions.
5. Prevent errors: As much as possible, design the system so that users cannot make serious errors. If a user makes an error, the interface should detect the error and offer simple, constructive, and specific instructions for recovery. Erroneous actions should leave the system state unchanged, or the interface should give instructions about restoring the state.
6. Permit easy reversal of actions: As much as possible, actions should be reversible. This feature relieves anxiety, since the user knows that errors can be undone, thus encouraging exploration of unfamiliar options. The unit of reversibility may be a single action, a data-entry task, or a complete group of actions, such as entry of a name and address block.
7. Support internal locus of control: Experienced operators strongly desire the sense that they are in charge of the interface and that the interface responds to their actions. Surprising interface actions, tedious sequences of data entries, inability to obtain or difficulty in obtaining necessary information, and inability to produce the desired action all build anxiety and dissatisfaction.
8. Reduce short-term memory load: The limitation of human information processing in short-term memory requires that displays be kept simple, multiple-page displays be consolidated, window-motion frequency be reduced, and sufficient training time be allotted for codes, mnemonics, and sequences of actions. Where appropriate, online access to command-syntax forms, abbreviations, codes, and other information should be provided.

Table 7.8 HOMERUN Heuristics for Commercial Websites (Nielsen 2000)
- High-quality content
- Often updated
- Minimal download time
- Ease of use
- Relevant to users' needs
- Unique to the online medium
- Net-centric corporate culture

Cognitive Walkthrough
The following steps are involved in a cognitive walkthrough:
1. The characteristics of typical users are identified and documented, and sample tasks are developed that focus on the aspects of the design to be evaluated.
2. A designer and one or more expert evaluators then come together to do the analysis.
3. The evaluators walk through the action sequences for each task, placing each within the context of a typical scenario, and as they do this they try to answer the following questions:
   - Will the correct action be sufficiently evident to the user?
   - Will the user notice that the correct action is available?
   - Will the user associate and interpret the response from the action correctly?
4. As the walkthrough is being done, a record of critical information is compiled, in which assumptions about what would cause problems, and why, are recorded. This involves explaining why users would face difficulties. Notes about side issues and design changes are also made.
5. A summary of the results is compiled.
6. The design is then revised to fix the problems identified.
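The walkthrough record from step 4 can be kept as simple structured data, one entry per action step, answering the three questions above. The sketch below shows one possible format; the task, steps, and field names are hypothetical, not part of the method's definition.

```python
# One record per action step in the walkthrough. The three booleans mirror the
# three questions in the chapter; 'notes' captures assumed causes of failure.
walkthrough = {
    "task": "Save a draft email",  # hypothetical sample task
    "steps": [
        {
            "action": "Click the disk icon on the toolbar",
            "action_evident": False,   # Will the correct action be evident?
            "action_noticed": True,    # Will the user notice it is available?
            "response_clear": True,    # Will the response be interpreted correctly?
            "notes": "Floppy-disk icon metaphor may be unfamiliar to younger users.",
        },
        {
            "action": "Confirm the folder in the pop-up dialog",
            "action_evident": True,
            "action_noticed": True,
            "response_clear": False,
            "notes": "No feedback that the draft was actually saved.",
        },
    ],
}

# Summarize: any 'False' answer marks a step as a likely usability problem.
for i, step in enumerate(walkthrough["steps"], 1):
    failed = [k for k in ("action_evident", "action_noticed", "response_clear")
              if not step[k]]
    if failed:
        print(f"Step {i} ({step['action']}): problem on {', '.join(failed)}")
        print(f"  -> {step['notes']}")
```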
Pluralistic Walkthroughs
Pluralistic walkthroughs are "another type of walkthrough in which users, developers and usability experts work together to step through a task scenario, discussing usability issues associated with dialog elements involved in the scenario steps" (Nielsen and Mack 1994).

Inspection with Conceptual Frameworks such as the TSSL Model
Another structured analytical evaluation method is to use a conceptual framework as the basis for evaluation and inspection. One such framework is the TSSL model introduced earlier in the book.

Example 1: Evaluating Option/Configuration Specification Interfaces
[Figure 7.3 A Sample Dialog Box]
[Figure 7.4 A Sample Tabbed Dialog Box; the tabs act as a menu for the dialog]
[Figure 7.5 The Preferences Dialog Box with Tree Menu; a title area and a tree menu organize the options]
[Further variations shown: additional tabs, navigators, and a tabbed drop-down menu]

Example 2: Yahoo, Google, and Lycos Web Portals and Search Engines
Compare and contrast the three sites' displays for the top searches of 2003. Which uses color most effectively? Layout? Ease of understanding? Why?

Empirical Methods
- Surveys and questionnaires: used to collect information from a large group of respondents.
- Interviews (including focus groups): used to collect information from a small set of key respondents.
- Experiments: used to determine the best design features from among many options (a simple analysis sketch follows this list).
- Field studies: results are more generalizable, since they occur in real settings.
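For instance, a two-condition usability experiment might compare task-completion times for two alternative designs and test whether the difference is statistically reliable. The data below are fabricated, and the use of scipy's two-sample t-test is simply one convenient choice, not something the chapter prescribes.

```python
from statistics import mean
from scipy import stats  # one convenient choice for the significance test

# Fabricated task-completion times (seconds) from a between-subjects experiment:
# each participant used exactly one of two candidate dialog designs.
design_tabbed = [41, 38, 52, 45, 39, 47, 44, 50]
design_tree   = [55, 49, 61, 47, 58, 60, 52, 56]

t, p = stats.ttest_ind(design_tabbed, design_tree)
print(f"tabbed mean: {mean(design_tabbed):.1f}s, tree mean: {mean(design_tree):.1f}s")
print(f"t = {t:.2f}, p = {p:.4f}")  # a small p suggests a real difference in times
```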
Table 7.11 Comparison of Evaluation Methods
- Heuristic evaluation. Lifecycle stage: any (early stages benefit most). System status: any (mock-up, prototype, final product). Environment: any. Real users' participation: none. User tasks used: none. Main advantage: finds individual problems; can address expert-user issues. Main disadvantage: does not involve real users, so it may not find problems related to real users in a real context; does not link to users' tasks.
- Guideline preview. Lifecycle stage: any (early stages benefit most). System status: any. Environment: any. Real users: none. User tasks: none. Main advantage: finds individual problems. Main disadvantage: does not involve real users; does not link to users' tasks.
- Cognitive walkthrough. Lifecycle stage: any (early stages benefit most). System status: any. Environment: any. Real users: none. User tasks: yes; tasks must be identified first. Main advantage: less expensive. Main disadvantage: does not involve real users; limited to the experts' view.
- TSSL-based inspection. Lifecycle stage: any. System status: any. Environment: any. Real users: none. User tasks: yes; tasks must be identified first. Main advantage: direct link to user tasks; structured, with fewer steps to go through. Main disadvantage: does not involve real users; limited to the tasks identified.
- Survey. Lifecycle stage: any. System status: any. Environment: any. Real users: yes, many. User tasks: yes or no. Main advantage: finds subjective reactions; easy to conduct and compare. Main disadvantage: questions need to be well designed; a large sample is needed.
- Interview. Lifecycle stage: task analysis. System status: mock-up, prototype. Environment: any. Real users: yes. User tasks: none. Main advantage: flexible, in-depth probing. Main disadvantage: time consuming; hard to analyze and compare.
- Lab controlled experiment. Lifecycle stage: design, implementation, or use. System status: prototype, final product. Environment: lab. Real users: yes. User tasks: yes, most often artificially designed to mimic real tasks. Main advantage: provides fact-based measurements; results are easy to compare. Main disadvantage: requires an expensive facility, setup, and expertise.
- Field study with observation and monitoring. Lifecycle stage: design, implementation, or use. System status: prototype, final product. Environment: real work setting. Real users: yes. User tasks: none. Main advantage: easily applicable; reveals users' real tasks; can highlight difficulties in real use. Main disadvantage: observation may affect user behavior.

Standards
Standards: prescribed ways of discussing, presenting, or doing things, intended to achieve consistency across products of the same type.

[Figure 7.10 Categories of HCI-Related Standards: quality in use (user performance and satisfaction), product quality (the product itself), process quality (the product development process and the life-cycle process), and organizational capability.]

Table 7.12 Sources for HCI and Usability Related Standards
- Published ISO standards: www.iso.ch/projects/programme.html
- ISO national member bodies: www.iso.ch/addresse/membodies.html
- BSI (British Standards Institution): www.bsi.org.uk
- ANSI (American National Standards Institute): www.ansi.org
- NSSN (A National Resource for Global Standards): www.nssn.org
- TRUMP list of HCI and usability standards: www.usability.serco.com/trump/resources/standards.htm

Common Industry Format (CIF)
Common Industry Format (CIF): a standard method for reporting summative usability test findings. The type of information and the level of detail required in a CIF report are intended to ensure that:
- Good practice in usability evaluation has been adhered to.
- There is sufficient information for a usability specialist to judge the validity of the results.
- If the test were replicated on the basis of the information given in the CIF, it would produce essentially the same results.
Specific effectiveness and efficiency metrics must be used, and satisfaction must also be measured.

Common Industry Format (CIF)
According to NIST, the CIF can be used in the following ways.
For purchased software:
- Require that suppliers provide usability test reports in CIF format.
- Analyze the reports for reliability and applicability.
- Replicate the test within the agency if required.
- Use the data to select products.
For developed software (in-house or subcontracted):
- Define measurable usability goals.
- Conduct formative usability testing as part of user-interface design activities.
- Conduct a summative usability test using the CIF to ensure the goals have been met.
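As a rough illustration, the skeleton of a CIF-style summative report can be pictured as structured data. The field names below merely paraphrase the requirements just listed (context sufficient for replication, plus effectiveness, efficiency, and satisfaction measures); they are not the official CIF schema, so consult the standard itself for the authoritative format.

```python
# Skeleton of a summative usability report in the spirit of the CIF:
# enough context to judge validity and replicate the test, plus the required
# effectiveness, efficiency, and satisfaction measures. All values fabricated.
report = {
    "product": "ExampleApp 2.1 (hypothetical)",
    "test_context": {
        "participants": 12,
        "user_profile": "clerical staff, 1-5 years of experience",
        "environment": "usability lab, 17-inch display",
        "tasks": ["create invoice", "find customer record"],
    },
    "metrics": {
        "effectiveness": {"completion_rate": 0.83, "errors_per_task": 1.4},
        "efficiency": {"mean_time_on_task_s": 212},
        "satisfaction": {"questionnaire": "SUS", "mean_score": 68},
    },
}

# Replication depends on reporting every required section.
required = ("product", "test_context", "metrics")
assert all(k in report for k in required), "report is missing required sections"
for kind, values in report["metrics"].items():
    print(f"{kind}: {values}")
```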
Summary
- Evaluations are driven by the ultimate concerns of human-computer interaction. This chapter presented four types of such concerns, along four dimensions of human needs: physical (ergonomic), cognitive, affective, and extrinsic motivational (usefulness).
- Evaluations should occur throughout the system development process, after the system is finished, and during the period in which the system is actually used.
- The chapter introduced several commonly used evaluation methods and compared and discussed their pros and cons.
- The chapter also provided several useful instruments and heuristics.
- Standards play an important role in practice and are discussed in the chapter. One particular standard, the Common Industry Format, is described, and its detailed format is given in the appendix.

Copyright 2006 John Wiley and Sons, Inc.