Think aloud technique

Reports
Gordon Rugg, January 2006
Reports involve the respondent reporting aloud about something. There are various
well-established forms of report, such as think aloud technique (where the respondent
thinks aloud about what they are doing) and critical incident technique (where the
respondent describes some past incident which was in some way critical).
These forms of report can be fairly neatly categorised in terms of two criteria.
One criterion is the tense – future, present or past. Sometimes the reporting is about
something that might happen in the future (as in scenarios, where you ask the
respondent what would happen if something were to happen). Sometimes it’s about
what is happening now, as when a respondent is thinking aloud about their experience
of trying to use a piece of software. Sometimes it’s about the past, as in critical
incident technique.
Another criterion is the person. Some reports involve the respondent reporting their
own future/present/past actions: first person reports. Others involve the respondent
reporting on the future/present/past actions of somebody else: third person reports.
Third person reports overlap with projective techniques, when you ask someone to
respond as if they were some other specified person (e.g. asking carers to respond as
if they were a patient). They can be useful for getting at “back” versions (what actually happens, as opposed to the official “front” version) and for
handling tasks where it’s logistically difficult to do first-person reports (for instance,
any task involving language, such as calming down an angry customer). There is,
however, the risk that the results from third person reports will be affected by
attributions – there’s a relevant literature on attribution theory.
It’s also possible in principle to do second-person reports, for example when piloting
instructions and materials for a study. You can ask one or two pilot respondents to
work through the instructions etc., and to give a running commentary on what they
think you’re trying to do in the study. This could reveal some useful things about how
your intended respondents might construe the instructions – for instance, whether they
think there’s some sort of trick question involved. Fixing such things should improve
the validity and robustness of your findings. This approach isn’t widely used, but it’s
one we’re going to investigate in more depth.
This article isn’t a comprehensive summary of all the types; it focuses mainly on the
ones that we happen to use most often because they’re most suited to our purposes.
We haven’t given a comprehensive bibliography, but we have given enough keywords
to make it pretty easy to track down the original texts when you want more detail.
Think aloud technique
Think aloud technique is pretty much what it sounds like. You ask someone to do a
task, and to think aloud about what they are doing while they are doing it. This is
useful for a lot of purposes, and allows you to get at various kinds of knowledge
which are difficult or impossible to reach via other methods. The most obvious of
these is short-term memory, but think aloud technique is also useful for giving
insights into whether people are tackling a task using pattern matching or sequential
reasoning; it’s also useful for identifying which things they bother with, and which
things they don’t notice. As ever, there are various limitations; for instance, the action
of thinking aloud interferes with some types of task, so you don’t get valid insights,
and analysing the output can be challenging. It can be very useful in preliminary
investigations, as a way of identifying things worth following up with other
techniques.
Think aloud technique has been around for a long time, under various names. These
include concurrent verbalisation and on-line self-report. In some fields, learners are
taught to think aloud as part of the learning process, so that the instructor can check
that the learner is paying attention to the right things (for example, when driving a
car). It’s closely related to other techniques such as critical incident technique and
scenarios (which involve reporting on past and hypothetical future events
respectively). It can also be used projectively, when you ask the respondent to answer
as if they were someone else (for instance, asking a nurse to answer as if they were an
elderly patient). This can be useful for identifying where there are systematic
misunderstandings between groups.
The basic concept is simple: you tell the respondent what the task is, and ask them to
think aloud while doing it. If they are silent for more than a set length of time (e.g.
five seconds) then you use a prearranged prompt to get them talking again (e.g.
“Could you tell me what you’re thinking about now?”). These prompts should not be
leading questions (e.g. “Are you looking at the background of the picture?”).
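If you are checking a recording afterwards to see where prompts should have been given, the silence rule can be sketched in a few lines. This is only an illustration: the function name, the (start, end) timing format and the sample timings are our assumptions, and the five-second threshold is just the example figure used above.

```python
from typing import List, Tuple


def prompt_times(utterances: List[Tuple[float, float]],
                 max_silence: float = 5.0) -> List[float]:
    """Return the times (in seconds) at which a prompt would fall due.

    `utterances` holds (start, end) times for each stretch of speech,
    sorted by start time (an assumed format). A prompt falls due
    `max_silence` seconds after the respondent stops talking, if they
    have not started talking again by then.
    """
    due = []
    for (_, prev_end), (next_start, _) in zip(utterances, utterances[1:]):
        # "More than" the set length of time, so an exact match does not count.
        if next_start - prev_end > max_silence:
            due.append(prev_end + max_silence)
    return due


# Hypothetical timings: a 2.5 second pause, then a 7 second pause.
speech = [(0.0, 4.0), (6.5, 12.0), (19.0, 25.0)]
print(prompt_times(speech))  # [17.0]
```

In a live session you would simply watch a clock; the sketch is only meant to make the rule explicit.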
The task needs to be one where thinking aloud won’t cause interference. There are
obvious problems with some verbal tasks such as interpreting or negotiating; these
can be tackled to some extent by first recording the respondent doing the task in their
normal way, and then playing back the recording to them, and asking them to give a
commentary based around the recording. There are less obvious problems with some
tasks which involve problem-solving and compiled skills, where the fact that the
respondent is thinking explicitly about what they are doing causes interference
(probably because they are shifting into sequential reasoning for a task they would
normally tackle using pattern matching and/or parallel processing). The document on
questioning methodology elsewhere on this website explains these terms, if you’re not
already familiar with them.
The actual data collection using this technique is usually pretty straightforward. One
thing to watch for is respondents using visual signals which will be lost if you use
only audio recording (for instance, saying “that bit of the page” and pointing to an
area of the web page they’re commenting on). The other most common problem
during recording is respondents talking either too much or too little. This problem can
be reduced by doing a quick demonstration as part of the respondent’s briefing, in
which you do a think-aloud about something completely unrelated to what they will
describe, so you don’t cue them. For instance, if they’re doing a think-aloud about car
advertisements, you might do a think-aloud about a painting or diagnosing an
electrical fault, or whatever area of expertise you happen to have – hobbies are useful
for this.
If you’re using this technique for reconnaissance, then you won’t need to do elaborate
analysis. If you’re using it for your main data collection, then you will probably hit
problems of some sort with the analysis.
A major source of problems is that the data from this technique is usually messy,
unclear and unstructured. If you have a clearly defined research question, you may be
able to analyse the results straight off the tapes; if you have to transcribe the data, then
this can be very time-consuming (in the order of ten hours of transcription per hour of
tape, depending on how good your typing is and how loquacious your respondents
are).
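The arithmetic is worth doing before you commit to transcribing everything. Here is a rough sketch, assuming the ten-to-one ratio mentioned above; the function name and the session figures are made up for illustration, and your own ratio may well differ.

```python
def transcription_hours(minutes_per_session: float,
                        sessions: int,
                        hours_per_tape_hour: float = 10.0) -> float:
    """Estimate transcription time for a set of recorded sessions.

    The default ratio of 10 is the rough figure quoted in the text;
    treat it as an assumption, not a measured value.
    """
    tape_hours = minutes_per_session * sessions / 60.0
    return tape_hours * hours_per_tape_hour


# Hypothetical study: 12 respondents, 20 minutes of tape each.
print(transcription_hours(20, 12))  # 40.0
```

Four hours of tape turning into about a working week of typing is a good argument for analysing straight off the recording when your research question allows it.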
One thing worth looking at is what your respondents do in the first few seconds after
starting the task. For some tasks you’ll get an instant response, within a second – for
instance, an immediate response to an advertisement or to a Web page. This tells you
that they’re using pattern matching, and responding to the image as a whole even
before they’ve read any of the words on it. (What the implications are in a given case
is another question, which should give you plenty of food for thought.) For this
reason, it’s worth structuring your data collection so that the respondent doesn’t see
the task until after you’ve started recording, so you can record that immediate
response.
Another thing worth looking at is where your respondents go quiet and start thinking.
This tells you something about where problematic areas might be. A similar issue is
swearwords; these are invaluable indicators of problems, particularly if you’re
looking at task design or product design. You should not discourage respondents from
swearing. If you are getting someone else to transcribe for you, then make sure that
they don’t sanitise the transcript by leaving out or changing the swearwords. The
same applies to silent areas: the convention on transcripts is to use one full stop per
second of silence (so “....” shows four seconds of silence). “Um” and “er” sounds are
also worth noting, for the same reason, particularly when the respondent is otherwise
articulate.
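If your transcripts follow the full-stop convention, the silences can be pulled out automatically. The sketch below is one possible approach, not a standard tool. It assumes that a run of two or more full stops marks a silence, because a single full stop cannot be told apart from the end of a sentence; the function name and the sample line are invented for illustration.

```python
import re

# Assumption: two or more consecutive full stops are a silence marker.
SILENCE = re.compile(r"\.{2,}")


def silences(transcript: str) -> list:
    """Return the length in seconds of each marked silence, in order."""
    # Word processors often turn three full stops into a single ellipsis
    # character, so expand it back before counting.
    text = transcript.replace("\u2026", "...")
    return [len(match.group()) for match in SILENCE.finditer(text)]


# Hypothetical transcript line.
line = "I'd click on that bit of the page .... um, no, maybe the menu .. yes."
print(silences(line))       # [4, 2]
print(sum(silences(line)))  # 6
```

One-second silences are missed by this rule, so if those matter to you, ask your transcriber to use a marker that can't be confused with ordinary punctuation.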
You can analyse the output qualitatively, by identifying the things which are
mentioned, and/or by representations such as cognitive causal maps. You can also
analyse it quantitatively, by recording which things are mentioned by which
respondents. A frequently used way of doing this is a table, with each column
representing a respondent, and each row showing a thing which has been mentioned;
each cell then shows whether or not the appropriate respondent has mentioned that
thing. Alternatively, the table can show how often each respondent mentions each
thing, but this is considerably more work, and makes an implicit assumption that the
number of mentions corresponds directly to the thing’s importance, which may not be
the case. The table below shows some hypothetical data arranged in this way, with
responses for nurses and for patients (n1-n4 and p1-p4 respectively) dealing with
problems affecting hospital out-patients with hand injuries.
                                          n1  n2  n3  n4  p1  p2  p3  p4
bathing oneself                           ●   ●   ●   ●   ●   ●   ●   ●
feeding oneself                           ●   ●   ●   ●   ●   ●   ●   ●
combing hair                              ●   ●   ●   ●   ●   ●   ●   ●
doing washing up                          ●   ●   ●       ●   ●   ●   ●
changing TV channels with remote control  ●               ●   ●   ●   ●
(Which particular nurses are marked in the last two rows is arbitrary.)
These fictitious results show that all the nurses, and all the patients, correctly
identified bathing, feeding and combing hair as important; most of the nurses
identified doing the washing up as important, but only one nurse identified changing
TV channels with the remote control as being an important potential problem.
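A table of this sort is easy to build once you have noted what each respondent mentioned. The sketch below uses made-up data in the spirit of the hypothetical results above; the shortened item names, the choice of which nurses mention which items, and the function names are all our assumptions.

```python
# Made-up data: what each respondent mentioned during their session.
BASIC = {"bathing", "feeding", "combing hair"}
mentions = {
    "n1": BASIC | {"washing up", "TV remote"},
    "n2": BASIC | {"washing up"},
    "n3": BASIC | {"washing up"},
    "n4": set(BASIC),
    "p1": BASIC | {"washing up", "TV remote"},
    "p2": BASIC | {"washing up", "TV remote"},
    "p3": BASIC | {"washing up", "TV remote"},
    "p4": BASIC | {"washing up", "TV remote"},
}


def mention_table(mentions):
    """Map each item to {respondent: True/False} (mentioned or not)."""
    items = sorted({item for said in mentions.values() for item in said})
    return {item: {r: item in said for r, said in mentions.items()}
            for item in items}


def count_by_group(table, prefix):
    """Count how many respondents whose label starts with `prefix`
    mentioned each item (e.g. "n" for nurses, "p" for patients)."""
    return {item: sum(hit for r, hit in row.items() if r.startswith(prefix))
            for item, row in table.items()}


table = mention_table(mentions)
print(count_by_group(table, "n")["TV remote"])  # 1
print(count_by_group(table, "p")["TV remote"])  # 4
```

Recording presence or absence, rather than the number of mentions, keeps the work down and avoids the assumption that more mentions means more importance.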
Think aloud technique has been applied to a lot of areas. It works well for
investigating people’s perceptions of artefacts and products (it’s widely used for
investigating perceptions of advertisements and of software interfaces, including Web
pages); it also works well with assessing how usable something is, and with
investigating how people tackle tasks and problems (which can be useful in training
and education, especially if you compare how experts and novices tackle something).
One problem with data from this technique is that respondents often raise tantalising
points, but don’t unpack them. One simple solution is to use think-aloud technique in
combination with laddering, and either unpack the points when they are mentioned, or
work through them all at the end of the think-aloud session in a follow-up laddering
session.
The classic source for think-aloud technique is Newell & Simon’s work. It’s also
described in the textbook on Human Computer Interaction by Dix et al.
Some examples of student projects using this technique:
Glenn McIntyre and Kim Ridsdale used it to investigate which features of a website
were perceived as important for assessing the security of the website and the quality
of product being offered by the website respectively.
Mira Chernikova used this technique to compare perceptions of websites for
Australian tourist sites among Russian, Dutch and British respondents, and found
some interesting differences between these cultures in relation to their perceptions of
the websites. (This involved collecting data in three languages, and then translating it
before analysing it, which is why this technique is seldom used for cross-cultural
work, even though it gives useful insights – Andy Hurd’s use of card sorts for cross-cultural elicitation shows a different way of tackling this problem.)
Colette Best used think-aloud technique to investigate which things users wanted
from websites providing online tutorials (as opposed to what the literature suggested
to be the key things which online tutorials should provide).
Zoe Szymansky used think-aloud technique to investigate gender bias in website
design.
Scenarios
Scenarios involve asking respondents to report on what they/some other person would
do if a given situation arose. This can be useful for handling what-if cases and cases
which couldn’t be handled via present-tense reports: for instance, dangerous situations
or ones which would be logistically difficult to arrange.
Scenarios can be used to explore possibilities systematically: for instance, exploring
all of the options identified as possibilities in a public consultation exercise, or all the
logically possible solutions to a design problem. Neil Maiden and colleagues
produced an elegant example of this via a software tool which took the prototypical
script for an interaction (in this case, using a “hole in the wall” machine to withdraw
money), and then automatically generated scenarios for various types of error which
could occur at each point in this script (for instance, the scenario that the respondent
entered their PIN incorrectly).
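The general idea of crossing each step of a script with each type of error can be sketched very simply. This is not a reconstruction of the tool described above, which we have not seen the internals of; the steps, the error types and the wording of the generated scenarios are all invented for illustration.

```python
from itertools import product


def generate_scenarios(script_steps, error_types):
    """Pair every step of a script with every error type and phrase
    each pairing as a what-if question for the respondent."""
    scenarios = []
    numbered_steps = enumerate(script_steps, start=1)
    for (number, step), error in product(numbered_steps, error_types):
        scenarios.append(
            f"At step {number} ({step}), the user {error}. "
            "What would happen next?"
        )
    return scenarios


# Hypothetical script for withdrawing money, and hypothetical error types.
steps = ["insert card", "enter PIN", "choose amount", "take cash"]
errors = ["does it incorrectly", "skips it", "is interrupted"]

for scenario in generate_scenarios(steps, errors):
    print(scenario)
```

Four steps and three error types give twelve scenarios, so the list grows quickly; in practice you would probably prune the combinations that make no sense before showing them to respondents.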
Scenarios can be very useful, but need to be handled with care. There is considerable
evidence that people are very bad at predicting their own future behaviour in
situations that they haven’t encountered before. The “heuristics and biases” literature
contains numerous examples of this (Kahneman, Slovic & Tversky’s classic text is a
good place to start, though their findings have been re-interpreted by researchers such
as Gigerenzer, Wright and Ayton).
Critical incident technique
This technique, as its name suggests, involves focusing on an event which was in
some way critical – sometimes because it involved something going horribly wrong,
sometimes because the incident involved some particularly important illustrative
issues. In this respect it is similar to techniques such as hard case technique (which
uses a difficult case to demonstrate a particular point that may not be so obvious in
easy cases, or to elicit knowledge about how experts tackle the cases which novices
can’t handle) and illuminative incident technique (which focuses on incidents which
illuminate some underlying problem whose nature is usually not clearly visible).
Critical incident technique is well established in domains such as accident analysis,
and has a well formulated procedure which has been described in detail by various
authors.
There are obvious potential problems with critical incident technique, which are well
recognised among users of the technique.
One set of potential problems involves deliberate human bias – those reporting on the
incident may have vested interests in presenting a version of events which displays
them in a favourable light. A related potential problem involves unintentional human
bias, particularly when those involved are depending on their memories of the events.
Human memory is not a passive recording process resulting in something like a
grainy photograph; it is a process which is active at both the point of encoding and the
point of retrieval, more like a sketch drawing than a photograph, with the artist
deciding what to draw and what to omit, and also having to work out afterwards what
was represented by a particular set of lines. Just because a memory is vivid, that does
not mean that it is necessarily accurate; even vivid memories from an impartial
outside observer may suffer from various biases and failings, such as misremembering the sequence of events.
These problems are more of an issue for some uses of this technique than for other
uses. For instance, if you are studying the espoused practices of an organisation (i.e.
the practices which that organisation claims to follow), then the factual accuracy of
versions of a critical incident is less important than the symbolic importance of that
event to members of the organisation. If, on the other hand, you are trying to find out
the factual events leading to an accident, so you can prevent a similar accident in the
future, then these problems are clearly more important. Techniques such as document
analysis and indirect observation can be used to complement critical incident
technique, as a way of independently checking some facts and of establishing others.