“Retrospective vs. concurrent think-aloud protocols: usability testing of an online library catalogue.”
Presented by: Aram Saponjyan & Elie Boutros

Overview
• The article discusses the think-aloud techniques used in usability tests.
• The two main think-aloud approaches, retrospective (RTA) and concurrent (CTA), are compared in a test of an online library catalogue.
• The three main points of comparison are: the usability problems detected, overall task performance, and the participants' experiences.

Think Aloud Protocol
• A method of usability evaluation.
• It lets researchers understand the thought process of testers as they use a given product or device.
• It gives software designers a way to interact with potential users and to improve their designs based on user feedback.

RTA vs. CTA
• In RTA, participants perform a set of tasks silently (while being videotaped) and then verbalize their experience at the end of the session while watching themselves on tape.
• In CTA, participants explain their thoughts as they test the product. A facilitator is always present to remind them to “think aloud” if they fall silent.
• Both the retrospective and the concurrent techniques are used for usability tests of websites, GUIs, and database front ends.
• Both techniques are valid, useful, and widely adopted in usability tests.
• Both methods yield nearly unbiased software evaluations, since participants do not have to recall their thoughts long after performing the tasks.

Advantages of CTA
• CTA tends to involve less biased thoughts, since users verbalize their thinking process during task performance. “CTA is more faithful representative of a strictly task oriented usability test”.
• CTA reveals more problems that can be observed during task performance, whereas RTA depends heavily on the user's verbalizations, which take place after task completion.

Disadvantages of CTA
• Users may feel uncomfortable verbalizing their thoughts while performing the task at hand (especially if they are not speaking their native language).
• Participants carry the extra burden of speaking their thoughts while performing the tasks, whereas RTA users have more time to verbalize problems after task completion.

Effects of CTA disadvantages on test results
• This burden did not slow down task completion. However, the success rate was affected: CTA participants were less successful in completing their tasks than RTA participants.

Advantages of RTA
• Participants are not burdened with the extra task of verbalizing their thoughts as they test. This makes things easier for non-native English speakers, who have more time to think and to translate their thoughts from their native language into English.
• Another benefit of RTA is a potential decrease in reactivity: participants can execute a task at their own pace and are not rushed in a way that affects their normal software usage, so they are more likely to perform neither better nor worse than usual.

Disadvantages of RTA
• RTA may describe the user experience less precisely than CTA, since users describe their experience only after finishing their tasks. This delay may introduce bias, because participants may forget specific things they experienced during task performance.
• Overall session time is longer in RTA than in CTA, since RTA participants not only perform their tasks but also watch them in retrospect.

Test Object
• The online library catalogue was chosen because it combines the characteristics of a search engine and a website, which makes it complex enough for novice users.
• The participants were a group of 40 university students recruited by means of email announcements and printed forms.
• The participants were aged 18 to 24 and took part in return for a financial reward.

Tasks
• The tasks were all equally difficult and independent of one another, to prevent participants from getting stuck.
• They were defined to cover the catalogue's main search functions.
• Those search functions included simple search, advanced search, sorting results, and filtering results.

Questionnaires
• Two different questionnaires were given to the participants: one at the beginning of the test session and the other at the end.
• The first asked for demographic details such as age, gender, and education.
• The second asked how participants felt about participating in the experiment.

Processing of the data
The total number of usability problems detected in each condition was examined. After that, a distinction was made according to the way the usability problems had surfaced in the data:
• through observation of the behavioral data
• through verbalization by the participant
• through a combination of observation and verbalization.

Problem Types
• Layout problems: The participant fails to spot a particular element within a screen of the catalogue;
• Terminology problems: The participant does not comprehend part(s) of the terminology used in the catalogue;
• Data entry problems: The participant does not know how to conduct a search (i.e. enter a search term, use dropdown windows, or start the actual searching);
• Comprehensiveness problems: The catalogue lacks information necessary to use it effectively;
• Feedback problems: The catalogue fails to give relevant feedback on searches conducted.

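The data-processing step above (count detections per condition, then split them by how they surfaced and by problem type) could be tallied roughly as follows. This is a minimal sketch, not the study's actual procedure: the `Detection` record, its field names, and the sample records are all hypothetical and only illustrate the bookkeeping.

```python
from collections import Counter
from dataclasses import dataclass

# Problem types as listed on the "Problem Types" slide.
PROBLEM_TYPES = ("layout", "terminology", "data entry",
                 "comprehensiveness", "feedback")


@dataclass(frozen=True)
class Detection:
    """One detected usability problem (hypothetical record layout)."""
    condition: str      # "CTA" or "RTA"
    problem_type: str   # one of PROBLEM_TYPES
    observed: bool      # surfaced in the behavioral data
    verbalized: bool    # surfaced in what the participant said

    def __post_init__(self):
        if self.problem_type not in PROBLEM_TYPES:
            raise ValueError(f"unknown problem type: {self.problem_type}")
        if not (self.observed or self.verbalized):
            raise ValueError("a detection must be observed, verbalized, or both")

    @property
    def source(self) -> str:
        """How the problem surfaced: observed, verbalized, or both."""
        if self.observed and self.verbalized:
            return "both"
        return "observed" if self.observed else "verbalized"


def tally(detections):
    """Count detections per (condition, source) and per (condition, type)."""
    by_source = Counter((d.condition, d.source) for d in detections)
    by_type = Counter((d.condition, d.problem_type) for d in detections)
    return by_source, by_type


# Made-up records for illustration only; these are not data from the study.
sample = [
    Detection("CTA", "layout", observed=True, verbalized=False),
    Detection("CTA", "terminology", observed=True, verbalized=True),
    Detection("RTA", "feedback", observed=False, verbalized=True),
    Detection("RTA", "layout", observed=True, verbalized=True),
]
by_source, by_type = tally(sample)
```

Keeping `observed` and `verbalized` as separate flags, rather than storing the source label directly, makes the "combination" category fall out of the data instead of being coded by hand.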
Results
• 93% of all comments made by CTA participants corresponded to an observable problem in their task execution, compared with 54% of the comments made by RTA participants.
• Of the 72 problems that were detected, 47% were reported in both conditions, 31% were detected exclusively in the CTA condition, and another 22% were detected exclusively in the RTA condition.
• This table shows that 89% of all the problem detections involved problems that were experienced by participants in both conditions.

What do these tables show?
• The CTA participants had to verbalize and work at the same time, which left them less time to comment on problems that were not acute.
• While the CTA method reveals more problems that can be observed during task performance, the RTA method depends more on the participants' verbalizations.
• Verbal protocols in this study serve less to reveal problems than to verbally support problems that are otherwise observable.

Task performance
• Does the double workload in CTA have an effect on the participants' task performance?
• Indicators: the successful completion of the seven tasks and the time it took the participants to complete them.
• Result: No significant difference was found in completion time, but CTA participants were less successful in completing the tasks.

Participant experiences
• Questions: experiences with the concurrent or retrospective thinking-aloud method, the method of working, and the presence of the facilitator and the recording equipment.
• Result: No significant differences in how the participants in the two conditions experienced CTA and RTA.
• However, CTA participants found the test situation less disturbing than RTA participants did.
• Explanation: RTA participants are given more time to fill in the questionnaire. The presence of the facilitator during the first part of the RTA test (silent task performance) is less functional than in a CTA design, and it may be confronting for participants to see their actions played back on video.
The CTA participants had to actively perform tasks and think aloud, which considerably reduced the amount of attention they could spare for noticing the facilitator and the recording equipment.

Conclusion
• Although both methods are comparable in terms of quantitative output, they differed significantly in how this output was established.
• The RTA method proved more effective in revealing problems that were not observable and could only be detected by means of verbalization.
• RTA participants tended to give explanations and suggestions, while CTA participants more often limited themselves to describing their actions.
• The participants' verbalizations contributed very little to the outcome of the usability test (in terms of user problems detected).

Conclusion (continued)
• Concurrently verbalizing their thoughts caused the participants to make more errors while performing the tasks and to be less successful in completing the seven tasks.
• The less successful performance under the CTA method can be attributed to the participants' workload: the difficulty of the tasks given to the participants may have been a crucial factor in this study.
• A strong and new argument in favor of RTA protocols is that they may be less susceptible to the influence of task difficulty, both in terms of reactivity and in terms of completeness of the verbalizations.
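The overlap figures reported in the Results (47% of the 72 problems found in both conditions, 31% only in CTA, 22% only in RTA) are the kind of split that set operations produce. The sketch below is an illustration under stated assumptions: the function name and the problem IDs are hypothetical, and the counts 34 / 22 / 16 are inferred here only because they sum to 72 and round to the reported percentages; the slides themselves give percentages, not counts.

```python
def overlap_shares(cta_problems, rta_problems):
    """Return the share of all distinct problems found in both conditions,
    only in CTA, and only in RTA."""
    cta, rta = set(cta_problems), set(rta_problems)
    total = len(cta | rta)
    if total == 0:
        raise ValueError("no problems detected in either condition")
    return {
        "both": len(cta & rta) / total,
        "cta_only": len(cta - rta) / total,
        "rta_only": len(rta - cta) / total,
    }


# Hypothetical problem IDs. The group sizes are an assumption chosen to be
# consistent with the reported percentages of 72 problems.
shared = {f"S{i}" for i in range(34)}
cta_only = {f"C{i}" for i in range(22)}
rta_only = {f"R{i}" for i in range(16)}

shares = overlap_shares(shared | cta_only, shared | rta_only)
rounded = {name: round(value * 100) for name, value in shares.items()}
```

The shares are computed over the union of both problem sets, so they always sum to 1 and each problem is counted once no matter how many participants ran into it.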