
Questioning methodology
Gordon Rugg and Peter McGeorge
Working paper
Faculty of Management and Business
University College Northampton
99/03
ISBN 1 901 547 008
1999
The Authors:
Gordon Rugg is Reader in Technology Acceptance at University College
Northampton.
Contact details:
Dr Gordon Rugg
Reader in Technology Acceptance
School of Accountancy, Information Systems and Law
University College Northampton
Boughton Green Road
Northampton, NN2 7AL
UK
Email: Gordon.Rugg@northampton.ac.uk
Tel: +44 (0)1604 735500
Peter McGeorge is Senior Lecturer in Psychology at Aberdeen University.
Contact details:
Dr Peter McGeorge
Department of Psychology
University of Aberdeen
Aberdeen
AB24 2UB
Scotland, UK
Email: psy144@mailserv.abdn.ac.uk
Phone: +44 (0)1224 272248
Acknowledgements
Any work of synthesis and integration is likely to include a significant amount of input of
ideas and influence from many other people, and this paper is no exception.
The sections of this work dealing with knowledge representation and integration derive at
least in part from experience in developing a Knowledge Elicitation Workbench while in the
Artificial Intelligence Group at the Department of Psychology, University of Nottingham,
working with Nigel Shadbolt, Mike Burton and Han Reichgelt. The sections on implicit
knowledge were grounded in Peter McGeorge’s PhD work in the same department, with
Mike Burton. The concept of accessing different versions of knowledge via different elicitation
techniques derives from Gordon Rugg’s PhD work with Wyn Bellin, while in the Department
of Psychology, Reading University. The idea of accessing different types of memory via
different elicitation techniques in the context of requirements acquisition was developed with
Neil Maiden, HCI Design Group, School of Business Computing, City University. The
extension of the requirements acquisition work to the wider concept of questioning
methodology was largely inspired by work with Ann Blandford, School of Computing
Science, Middlesex University.
We would also like to record our gratitude to everyone else who helped us in this work,
particularly those who provided constructive suggestions on previous drafts, and the long-suffering respondents who provided us with the practical experience of elicitation on which
this work was based.
Abstract
A central problem in many disciplines is the elicitation of a complete, correct, valid and
reliable set of information from human beings – finding out what people want, think, know or
believe. Examples include social science research, market and product research, opinion polls
and client briefs. Although numerous elicitation techniques exist, there has traditionally been
little theoretically-driven guidance available on choice, sequencing and integration of
techniques. Choice of technique has been largely a matter of individual preference, with
interviews and questionnaires usually being chosen, regardless of how suitable they are for
the task being approached. This paper discusses the issues involved in providing guidance
about choice of technique, then describes a framework for providing such guidance.
A central feature of this paper is the distinction between various types of memory and
knowledge. Some of these can be accessed via interviews or questionnaires. Others, however,
can only be accessed by one technique, and are inaccessible to interviews and questionnaires.
These types are listed in the framework, and matched with corresponding recommended
elicitation techniques. The framework is illustrated by case studies, including two from the
authors’ industrial experience.
The paper concludes that questioning methodology fills a methodological gap between
experimental design and statistics, and should be established as a discipline in its own right.
Contents:
1: Introduction
2: A framework for categorising techniques
3: Selecting and integrating techniques
4: Method fragments
5: Case studies
6: Discussion
7: Future work
8: Conclusion
Bibliography
Figure 1: A three layer graph
Table 1: Recommended and contra-indicated techniques for handling each knowledge type
1 INTRODUCTION
A central problem in many disciplines is finding out what people want, or think, or believe, or
know. This problem is at the heart of any research involving human behaviour or attitudes –
the social sciences, in effect – and a surprising range of other fields. In computing science, for
example, elicitation of expertise is central to knowledge acquisition for knowledge based
systems, and elicitation of client requirements is at the heart of systems analysis and of
requirements acquisition.
The problem is not caused by a lack of research in the area, or of elicitation techniques; a
recent book on qualitative research methods alone runs to over six hundred pages (Denzin
and Lincoln 1994), and an overview article on requirements acquisition listed a dozen major
techniques, with clear recognition that there were numerous other techniques in existence, as
well as numerous versions of both the major and minor techniques (Maiden and Rugg, 1996).
The problem is more to do with choice of the appropriate technique or techniques, and with
using them in the correct way. The same problem occurs in a wide range of disciplines.
Traditionally, there have been three main approaches to choice of questioning technique. One
is to view choice of technique as unimportant; a second is to use the techniques traditionally
used in the discipline, and the third is to view the issue as important, but not yet well enough
understood to enable an informed choice. The first main point which emerges clearly from the
findings described below is that choice of the correct questioning technique is not just
important, but essential, in any discipline which involves eliciting information from people.
The second main point which emerges is that there is now a theoretically grounded and
practical way of approaching this area. These issues are the central theme of this paper.
A short example demonstrates the type of issue involved. One of the authors recently
supervised an undergraduate project which was investigating hassles (minor stresses and
irritations) affecting IT managers. This is a topic which is of considerable importance both
theoretically (in relation to stress research) and practically (staff turnover among IT managers
is a major problem to companies with a high IT presence). The student did a thorough piece
of work, establishing a good rapport with the IT managers she was studying, and using
several techniques to investigate different aspects of the topic, including interviews and
“hassle diaries”. These provided an interesting insight into the nature of an IT manager’s role,
with enough detail and breadth of coverage to produce the basis of a good dissertation.
However, the interviews and hassle diaries all missed a major feature which was only
detected by use of shadowing (i.e. following the managers around while they worked),
namely that the managers quite often had no lunch break because of pressure of work.
If this were an isolated case, then there would be little cause for concern. However, it is such a
typical case that the authors now routinely use “compare and contrast” designs involving
different elicitation techniques as a standard basis for student projects. Although these
projects also focus on an interesting domain, so that analysis can concentrate on the domain if
the different techniques do not produce different findings, in practice the different techniques
have reliably and systematically produced different findings across a range of domains and
techniques. The following sections discuss reasons for this, and the implications which
follow.
There has been considerable exchange of concepts and techniques between disciplines. For
instance, laddering was developed by Hinkle (Hinkle, 1965) from Kelly’s Personal Construct
Theory (Kelly, 1955), and has since then been used in clinical psychology (Bannister and
Fransella, 1980; Fransella and Bannister, 1977), architecture (Honikman, 1977), market
research (Reynolds and Gutman, 1988), knowledge acquisition (Rugg and McGeorge, 1995)
and requirements acquisition (Maiden and Rugg, 1996). Ethnographic approaches in various
forms have been applied outside traditional ethnography to fields such as criminology
(Patrick, 1973) and requirements acquisition for air traffic control systems (Sommerville,
Rodden, Sawyer, Bentley and Twidale, 1993).
This exchange, however, has traditionally been at the level of individual concepts and
techniques, rather than in terms of larger frameworks. This is in interesting contrast to the
situation with statistics, experimental design and with survey methods, which have
historically been viewed as semi-autonomous disciplines in their own right, with the same
textbooks and journals being used by researchers from a wide range of disciplines. The reason
for this difference is probably quite simple, namely that there has in the past been little in the
way of higher-level frameworks and metalanguage to handle elicitation techniques as a
whole. It is, however, a critically important absence, because statistics, experimental design
and survey methods cannot make up for damage caused by incorrect selection or use of
questioning technique.
The aim of this article is to describe a framework which will help remedy this situation, by
providing theoretically grounded and systematic guidance on choice of techniques. This
framework is intended to be applicable to a range of disciplines, and to provide a common
ground for the establishment of questioning methodology as a discipline in its own right. This
new discipline would complement survey methods, experimental design, and statistics,
thereby providing researchers with a complete set of conceptual tools and methods for
research involving human behaviour.
This paper is divided into four main sections. The first section briefly describes existing
questioning techniques. The second section describes and discusses knowledge and memory
types, and the implications of these for choice of questioning technique. The third section
describes a framework for selection and integration of questioning technique. The fourth
section provides a brief description of knowledge representation and related concepts, to
provide some further metalanguage.
These are followed by two short case studies and a discussion of implications for further
work.
1.2 Existing techniques
This section provides a brief overview of the main questioning techniques, to set the
subsequent theoretical analysis in context. It is tempting to derive guiding frameworks from
the techniques themselves, or from practical issues involved in technique choice, such as time
or equipment required. Although this can be useful, it is only part of what is needed.
Technique-based frameworks are derived from existing solutions, rather than from the
problem, and it is the problem which is central. This issue is discussed in detail below. It
should be emphasised that the ordering of the list of techniques in this section is largely
arbitrary, and is not intended as a classification in its own right; classification is described
later in this paper.
The descriptions of techniques are intended as a brief overview so that readers know what the
various techniques are before encountering the sections on selection and integration of
techniques – few readers are likely to be familiar with all of them. For clarity, these have been
kept deliberately brief. There is a separate section later in this paper which deals with further
concepts relevant to techniques, such as knowledge representation; some topics which are
only tersely outlined in the descriptions of techniques, such as hierarchical structures of
knowledge, are discussed in more detail in the “further concepts” section.
1.3 The main elicitation techniques
There is a considerable literature on the individual techniques, and on the philosophical,
theoretical and methodological issues associated with them – for instance, the role of the
observer, and the nature of subjectivity in data collection. A good introduction to this
literature is provided by Denzin and Lincoln (1994). Although these are important issues, for
reasons of space they are not discussed in detail in this paper, which concentrates instead on
the interaction between knowledge types and elicitation techniques. Some of the techniques
described below can be tracked back to a key source, or can be illustrated by a classic study;
other techniques, such as interviews, are ubiquitous and have no clear origin. The
descriptions below are intended to give a brief overview of the main techniques in use, and
include references to further reading where a technique is likely to be unfamiliar to most
readers.
Ethnographic approaches usually involve spending extensive amounts of time with the
group being studied so as to gain a thorough first-hand understanding of how their physical
and conceptual world is structured. Varieties include participant observation, where the
observer participates in the group’s activities, which may in turn be either undisclosed
participant observation (in which the observer does not disclose to the group that their
participation is for purposes of research) or disclosed (in which the observer does not attempt
to conceal the purpose of the participation).
A classic example of using disclosed participant observation is Mead’s (1928) study of sexual
behaviour in Samoa. A more recent example is Sommerville et al’s (Sommerville et al, 1993)
study of the behaviour of air traffic controllers. Classic examples of undisclosed participant
observation include Patrick’s study of a Glasgow gang (Patrick, 1973) and Rosenhan’s study
of behaviour in a psychiatric ward (Rosenhan, 1973).
Observation involves observing the activity in question. Varieties include participant
observation (described above, under ethnographic approaches), direct observation and
indirect observation. In direct observation, the activity itself is observed; in the case of indirect
observation, the by-products of the target activity are observed, usually when the target
activity itself cannot be directly observed. A familiar example of direct observation is
shadowing, where the researcher follows the respondent around, usually in the context of the
respondent’s work. An example of indirect observation is examination of illegitimacy rates as
an indicator of the incidence of premarital sex.
Reports involve the respondent verbally reporting on the target activity. There are numerous
varieties, some of which would traditionally be considered as techniques in their own right
(e.g. scenarios). The underlying similarity in deep structure, however, is great enough for
classifying them together to be sensible. Varieties include self report and report of others.
Each of these can in turn be subdivided into on-line and off-line reporting. In self report, the
respondent reports on their own actions; in reports of others, the respondent reports on the
actions of others. In on-line report, the reporting occurs while the action is taking place; in off-line report, the reporting occurs after the action. Scenarios are a special form of report in
which the respondent reports on what they think would happen in a particular situation (i.e.
scenario), which may involve themselves and/or others. Critical incident technique, and
several closely related techniques such as illuminative incident analysis (Cortazzi and Roote,
1975), involve asking the respondent to describe and discuss a particularly instructive past
incident.
Interviews are one of the most familiar and widely used elicitation techniques. The core
concept is of a question and answer session between elicitor and respondent, but the term
“interview” is used so loosely, and to cover so many variants, that it is of debatable value. A
traditional distinction is made between structured and unstructured interviews. In the former,
the elicitor has a series of prepared topics or specific questions; in the latter, the agenda is left
open and unstructured. Interviews may overlap with scenarios, by asking about possible
situations, and with critical incident technique, by asking about important past events, as well
as with other techniques such as laddering, when clarifying the meaning of technical terms.
The Personal Construct Theory techniques are a range of techniques deriving from Kelly’s
Personal Construct Theory (PCT). These include repertory grids, card sorts and laddering.
PCT is based on a set of assumptions explicitly described by Kelly (1955). These cluster round
a model in which people make sense of the world by dividing it up into things (elements)
which can then be described by appropriate attributes (constructs). There are various
assumptions about the nature of entities and constructs: for instance, that there is enough
similarity across individuals to allow us to communicate with each other, but enough
divergence for each individual to be different. This model is reflected in the elicitation
techniques based on PCT. Repertory grids are entity:construct (broadly equivalent to
object:attribute) matrices with values in the resulting cells of the matrix; card sorts involve
repeatedly sorting entities into groups on the basis of different criteria; laddering is a
hierarchically-structured technique similar to a highly restricted interview, for eliciting
categorisations, hierarchies and levels of explanation. These techniques are often formally
linked to each other in elicitation, with output from one being used directly as input for
another. Examples include Kelly’s original work describing PCT (Kelly, 1955); Bannister and
Fransella’s more accessible descriptions of PCT and of repertory grid technique (Bannister
and Fransella, 1980 and Fransella and Bannister, 1977 respectively). Personal Construct
Theory and its associated techniques have been applied to knowledge acquisition and
requirements acquisition by Boose, Gaines and Shaw (e.g. Shaw, 1980, Shaw and Gaines,
1988, and Boose, Shema and Bradshaw, 1989). Card sorts are described in detail in Rugg and
McGeorge (1997). Laddering was used by Honikman in architecture (Honikman, 1977) and by
Reynolds and Gutman in advertising research (Reynolds and Gutman, 1988).
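To make the repertory grid format concrete, the fragment below is a minimal sketch in Python; the elements (holiday destinations), construct poles and 1-to-7 ratings are invented purely for illustration, not drawn from any study cited above.

    # Sketch of a repertory grid: an entity:construct matrix with a rating
    # in each cell. All elements, constructs and ratings are invented.
    elements = ["Paris", "Reykjavik", "Cairo"]

    # Each construct is a bipolar pair; ratings run from 1 (left pole)
    # to 7 (right pole).
    grid = {
        ("warm", "cold"):       {"Paris": 3, "Reykjavik": 7, "Cairo": 1},
        ("cheap", "expensive"): {"Paris": 6, "Reykjavik": 5, "Cairo": 2},
        ("familiar", "exotic"): {"Paris": 2, "Reykjavik": 5, "Cairo": 6},
    }

    for poles, ratings in grid.items():
        print(poles, [ratings[e] for e in elements])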
Questionnaires are lists of questions or statements, usually administered in written form, but
sometimes used in spoken form (e.g. via telephone sessions). When in spoken form, they
overlap with structured interviews (described above). A traditional distinction in
questionnaires is between open questions, in which the respondent may use their own words,
and closed questions, in which the respondent has to choose between possible responses on a
list supplied by the elicitor.
Prototyping is an approach used under different names in different disciplines. Versions
include architects’ models, engineering prototypes, software prototypes, artists’ impressions
in architecture, etc. The prototype is shown to the respondent, who then critiques it; the
results from this are then usually fed back into another iteration of design. It should be noted
that this is a completely separate concept from prototype theory, which is discussed
separately in section 3.2.2 below.
2 A FRAMEWORK FOR CATEGORISING TECHNIQUES
The choice of structure for a framework is an important issue. A single hierarchical tree, with
classes and sub-classes, is only able to represent a single way of classifying the entities
involved. In the case of elicitation techniques, however, it is necessary to categorise in several
ways, which might include time taken to use the technique (minutes in the case of card sorts,
months or years in the case of some ethnographic work) or equipment needed to use the
technique (extremely sophisticated recording equipment for observation of human:computer
interaction, or a notepad and pen in the case of laddering).
The approach used by Maiden and Rugg (Maiden and Rugg, 1996) and used in the present
paper is a faceted one, in which several different categorisations are used and treated as
orthogonal (i.e. separate from, and uncorrelated with, each other, but applied to the same
entities). This has considerable advantages in terms of clarity. It also has the advantage of
handling range of convenience much more elegantly than is the case with non-faceted
approaches, such as matrix representations or elaborated trees. “Range of convenience” is a
concept in Personal Construct Theory (Kelly, 1955), which refers to the way in which a
particular term can only be used meaningfully within a certain range of settings. For instance,
“IBM compatible” is meaningful only when applied to computer equipment, and is
meaningless when applied to a dug-out canoe. In slot and filler representations such as
matrices, such cases have to be handled by a “not applicable” value, and in extreme cases the
“not applicable” cases can outnumber the meaningful values. These issues are discussed in
more detail below in relation to knowledge representation and its role in questioning
methodology.
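As a minimal sketch of the difference, assuming invented facet names and values: in a slot-and-filler matrix every technique must carry a value for every facet, so "not applicable" entries proliferate, whereas a faceted description simply omits a facet outside its range of convenience.

    # Slot-and-filler matrix: every cell must be filled, so "n/a" values
    # appear wherever a facet falls outside its range of convenience.
    matrix = {
        "card sorts":  {"time": "minutes", "recording equipment": "n/a"},
        "observation": {"time": "days",    "recording equipment": "video"},
    }

    # Faceted description: a facet is stated only where it is meaningful.
    faceted = {
        "card sorts":  {"time": "minutes"},
        "observation": {"time": "days", "recording equipment": "video"},
    }

    for technique, facets in faceted.items():
        print(technique, "->", facets)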
Although the technique-driven facets are important, they are not enough. A technique-based
classification would be analogous to a representation of illness which was based on the
medicines and treatments which were available, but which contained no systematic
description of the illnesses which the medicines and treatments were designed to cure.
The most important facet of the Maiden and Rugg framework involves what the authors
termed “internal representations”. This term covers types of memory, of knowledge and of
communication filter which affect the quantity and type of information which can be elicited.
An initial distinction can be made between what is termed “new system” knowledge and
“existing domain” knowledge in the Maiden and Rugg framework. The former refers to
knowledge about things which do not yet exist, the latter to knowledge about things which do
exist, or have already existed. This is a distinction with important implications for the degree
of validity which can reasonably be expected, and will be discussed in more depth later.
Existing domain knowledge is divided into three types of internal representation, namely
tacit, semi-tacit and non-tacit knowledge, which form the bulk of the Maiden and Rugg
framework.
2.2 Tacit knowledge is knowledge which is not available to conscious introspection, and can
be subdivided into implicit learning (Seger, 1994) and compiled skills (Neves and
Anderson, 1981; Anderson, 1990). Implicit learning occurs without any conscious learning
process being involved; the learning proceeds straight from the training set of large numbers
of examples into the brain without any intermediate conscious cognitive processes. Compiled
skills were initially learned explicitly, but subsequently became habitualised and speeded up
to the point where the conscious component was lost. Everyday examples include touch
typing and changing gear when driving a car.
In such cases, asking the respondent about the skill will produce valid responses only by
chance. Touch typists, for instance, do not usually have significant explicit memory for the
position of keys on the keyboard; if asked which key is to the right of “g”, for instance, they
will usually have to visualise themselves typing, and observe the answer. Similarly, car
drivers will not usually be able to recall the precise sequence of hand and foot movements
which they made when going round a roundabout. Asking respondents to describe what they
are doing while using a compiled skill usually leads to breakdown of performance because of
the intrusion of a slower conscious component into the task. Tacit knowledge may include a
significant amount of pattern matching, which is a very fast, massively parallel form of search
quite different from the sequential reasoning used for other tasks; an everyday example of
pattern matching is recognition of a familiar face. Because of its massively parallel nature,
pattern matching is not amenable to being broken down into lower-level explanations, with
consequent implications for elicitation. One of the most strikingly unexpected results from
research into expertise was the extent to which experts use matching against a huge learned
set of previous instances, rather than sequential logic, as a way of operating (e.g. Chi, Glaser
and Farr, 1988; Ellis, 1989).
2.3 Explicit (non-tacit) knowledge is defined as knowledge which is available to conscious introspection.
This type of knowledge is in principle accessible using any elicitation technique, although it
may be subject to various biases and distortions.
2.4 Semi-tacit knowledge is a term which applies to a wide range of memory types and
communication filters. These include short term memory; recall versus recognition; taken for
granted knowledge; preverbal construing, and front and back versions. The common factor
shared by these is that each can be accessed only via certain elicitation routes and not others.
2.4.1 Short term memory is probably the most widely known of these types, and is well understood as a result of considerable research in psychology. It is a limited-capacity, short-term store, with a capacity of about seven items, plus or minus two (Miller, 1956), and
duration of a few seconds. Long term memory, in contrast, has enormous capacity, and can
last for tens of years. In complex cognitive tasks, short term memory is often used as a sort of
scratchpad, with the information involved never reaching long term memory. This means that
any attempt to access that information after the task (e.g. via interviews) is doomed to failure,
since the information was lost from memory within seconds of being used. Short term
memory is only accessible via contemporaneous techniques such as on-line self-report, or
indirectly via observation.
2.4.2 Recall versus recognition is another aspect of memory structure. Recall is active
memory, when information is deliberately retrieved from memory; recognition is passive
memory, when a specified item is compared to what is stored in memory to search for a
match. Recognition is normally considerably more powerful than recall (cf. Eysenck and
Keane, 1995). A simple example involves trying to recall the names of the states in the USA,
where most people can only recall a small number, but can correctly recognise a much larger
number if shown a list of names.
2.4.3 Taken for granted knowledge (TFG knowledge) is knowledge which one participant in
a communication assumes to be known by the other participant or participants (Grice, 1975).
The concept is related to Norman's distinction between knowledge in the head and
knowledge in the world, i.e. knowledge explicitly represented in the external world, for
example as instructions on street signs (Norman, 1990). TFG knowledge is normally not
stated explicitly during communication; for instance, one does not say “My aunt, who is a
woman” because it can be taken for granted that aunts, by definition, are women. This
principle increases the efficiency of normal communication by leaving out superfluous
information. Unfortunately, filtering out of TFG knowledge is based on the assumption that
the other participant or participants share the knowledge, and this assumption can be false.
This is particularly the case when experts are dealing with non-experts, and are describing
everyday features of their area of expertise. Precisely because these features are so familiar to
them, experts are likely to take them for granted, and to assume that they are equally familiar
to the non-expert. Initial evidence from research into semi-tacit knowledge suggests that TFG
knowledge is one of the more common, and more serious, reasons for incomplete elicitation of
information.
2.4.4 Preverbal construing is a term used in Personal Construct Theory to describe construing
which occurs without a verbal label for the constructs involved. This effect is what is referred
to in lay language by expressions such as “I can’t put it into words, but…” In some cases, this
may refer to constructs which are fairly explicitly understood by the respondent, but which
happen not to have a verbal label; in other cases, some form of tacit knowledge is involved. A
striking effect which sometimes happens when using PCT techniques is that the respondent
suddenly has an “aha” experience, when a construct changes from preverbal to verbal status.
This is usually accompanied by expressions such as “I’d always known there was a
difference, but I’d never been able to put my finger on it before”.
2.4.5 Front and back versions are, respectively, the “public consumption” and “behind the
scenes” versions of reality which members of a group present to outsiders (in the case of front
versions) and insiders (in the case of back versions). These terms are derived from Goffman’s
(1959) dramaturgical metaphor of the stage performance. This metaphor has the advantage of
not implying any intention to deceive in the front version; the front version in many
professions is viewed by group members as a professional image to be maintained, not as an
extended lie to be fed to the public. It has been anecdotally reported that members of the US
Air Force about to testify to public hearings are given three pieces of advice: firstly, don’t lie,
secondly, don’t try to be funny, and thirdly, don’t panic and blurt out the truth. Although this
does not map exactly onto the distinction between front and back versions, it does neatly
capture the distinction between telling the whole truth on the one hand and not telling a lie on
the other.
Any outsider, such as a researcher or analyst, coming into an organisation is likely to be given
the front version. Although this may not be dishonest, it is also unlikely to be the whole truth,
and the missing information can be extremely important. An extensive literature dating back
to Weber (e.g. Weber, 1924) has consistently found that in most organisations there are
usually unofficial short-cuts in working practices which are not officially allowed, but without
which the system would be too unwieldy to work. A simple illustration of this is the
work-to-rule, a form of industrial action in which the participants follow the official procedures
exactly. This usually reduces productivity dramatically. The distinction between front and
back versions is not an absolute one, but more of a spectrum. Outsiders may become
gradually accepted by the group, and given access to increasingly sensitive back versions of
events.
2.4.6 The so-called “stranger on a train” effect is a paradoxical effect, in which people are
prepared to discuss extremely personal and sensitive information if the situation is one of
anonymity (such as talking to a sympathetic stranger on a train whom one does not expect
ever to meet again). This may be used by investigators, but requires careful setting up – for
instance, it is advisable only to use a single elicitation session with each respondent, and to
make it clear that the respondent will not be identifiable in the published outcome of the
research.
2.4.7 Future system knowledge is the term used by Maiden and Rugg to describe knowledge
about future systems in the context of software development; a more appropriate term for
general questioning would be "predictive knowledge". This involves quite different issues from the knowledge
types described above. In the case of the knowledge types described above, the relevant
knowledge exists somewhere, and the key problem is accessing this information reliably and
validly. The term “accessing” is an important one in this context. “Elicitation” describes the
process of extracting information from the respondent, via the respondent; however, some
types of knowledge, such as tacit knowledge, have to be acquired by indirect induction rather
than directly from the respondent.
An example would be the use of observation to identify key actions during performance of a
compiled skill; it would in principle be possible to produce a complete and correct description
of this skill without the respondent ever knowing what was in the description. In knowledge
acquisition, this sort of situation occurs in relation to machine learning, where the salient
variables may be identified via explicit elicitation from a human respondent, but the correct
weightings and correlations between these variables are then worked out by software. This
approach can lead to a system which performs better than the human experts from whom the
variables were elicited (Michalski and Chilausky, 1980; Kahneman, Slovic and Tversky, 1982);
the reasons for this have important implications for questioning methodology, and are
discussed in more detail below. The distinction between elicitation and acquisition is now
generally accepted in Artificial Intelligence (AI) and in requirements engineering, with
elicitation of knowledge or requirements being recognised as subsets of knowledge
acquisition or requirements acquisition respectively.
2.5 Predicting requirements and behaviour. When a new product is being developed, it is
not normally possible for any single individual to predict what the requirements will be. One
reason for this is that usually more than one stakeholder is involved, leading to the need for
negotiation of requirements between stakeholders.
Another reason involves what is known in Information Science as the Anomalous State of
Knowledge (Belkin, Oddy and Brooks, 1982). An Anomalous State of Knowledge (ASK)
exists when a person wants something (e.g. a relevant reference or a new system), but does
not have enough knowledge of the possibility space to be able to know what is possible and
what could therefore meet their requirements. This is particularly striking in the case of
software development, where users may be utterly unaware of what is technically feasible,
and may dramatically alter their requirements when they see what can be done.
A third major reason for problems identifying future needs involves people’s weakness in
predicting future events and behaviours. This is well recognised in attitude theory, where it
has long been known that people’s expressed attitudes correlate weakly at best with their
actions (e.g. Wicker, 1969). The same principle applies to people’s predictions about their own
behaviours in situations such as seeing smoke come from underneath the door in a waiting
room. Some personality theorists have gone so far as to argue that an individual’s own
predictions about their behaviour in a given situation are no higher in validity than the
predictions of someone else who knows that person well, and that our mental models of our
personalities are derived from observation of our own behaviour, rather than being the cause
of our own behaviour. Although more recent research has shown that it is possible to reduce
significantly the gap between expressed attitudes and actual behaviours by concentrating on
key variables in the research design and the data collection (Myers, 1990), the gap is still a
long way from closed, and the topic needs to be addressed with care.
This issue may well be a subset of a more general principle, namely human weakness in
dealing with multivariate information. A considerable literature in judgement and decision
making has consistently found that humans are bad at identifying randomness in multivariate
data, with a corresponding tendency to see correlations and patterns where none exist
(Kahneman, Slovic and Tversky, 1982). When correlations and patterns do exist, people are
consistently poor at weighting the variables correctly. An elegant example of this is a study by
Ayton (1998), involving prediction of football scores. The first part of this study involved
asking British football fans and Turkish students with no interest in football to predict British
football results. The result was that the Turkish students performed at a similar level to the
British fans, at well above the level which would be expected by chance. The Turkish students
were using the only information available to them, namely whether or not they had heard of
the teams or the towns where they were based. These tended to be the larger and/or more
famous examples, and these tended to beat smaller or less famous rivals. This effect was a
strong one, and the other variables used in predictions by the British fans were comparatively
weak predictors; the British fans, however, weighted these other variables too heavily in
relation to the main one.
An obvious way of dealing with this problem, and one already used in knowledge
acquisition, is to use elicitation techniques to identify the salient variables, and then use
statistical or computational techniques to identify the appropriate weightings for these
variables. This approach seems to have been comparatively little used in the social sciences,
although multivariate approaches are routinely applied to the variables identified by the
researchers involved. If human weakness in handling multivariate data is as prevalent as it
appears, then attempts to extract accurate predictions from people will usually be attempts to
find something which does not exist, and will therefore be a waste of time and effort.
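A minimal sketch of this division of labour, assuming numpy is available; the predictor values and outcomes below are invented. The salient variables (here, two hypothetical ones such as team fame and recent form) are taken to have been elicited from respondents, and ordinary least squares then finds the weightings.

    # Elicit the salient variables from people; let a statistical technique
    # (here, ordinary least squares) find the weightings. Data are invented.
    import numpy as np

    # Rows are cases; columns are the elicited variables.
    X = np.array([[0.9, 0.2],
                  [0.3, 0.8],
                  [0.7, 0.5],
                  [0.1, 0.1]])
    y = np.array([1.0, 0.0, 1.0, 0.0])  # observed outcomes

    # Add an intercept column and solve for the least-squares weightings.
    X1 = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    print("intercept and weightings:", weights)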
It should be noted as a parenthesis that, although the findings on human judgement and
decision making (J/DM) described above are reliable and robust, there has been debate about
their validity. The naturalist school of J/DM research argues that the effects found by the
"heuristics and biases" school are largely artefacts of the statistical representation used. The
“heuristics and biases” school have generally used a probabilist presentation, i.e. one
involving probability judgements, when framing the experimental task. Researchers such as
Gigerenzer argue that if the same task is reframed in a frequentist format, i.e. one involving
frequency judgements, then the biases and distortions described above no longer occur
(Gigerenzer, 1994). This debate is unlikely to be resolved in the near future, and is closely
linked with a long-running debate in statistics about the relative meaningfulness and validity
of probabilist and frequentist representations.
It is likely that future research will identify further types of memory and knowledge filter; for
instance, the authors are currently investigating the potential semi-tacit category of “not
worth mentioning” knowledge, and intend to investigate tacit knowledge in more detail.
3 SELECTING AND INTEGRATING TECHNIQUES
It is clear from the account above that no single technique is likely to be able to deal with all
the types of knowledge involved in any given situation. Selection and integration of the
appropriate techniques is therefore necessary. There are various facets on which selection and
integration can be described, such as knowledge types involved, equipment needed and input
and output formalisms. For brevity, only selection and integration on the basis of knowledge
type are described in any detail here. Table 1 below is not exhaustive or set in tablets of stone;
its main function is to provide a clear overview of the recommendations arising from the
analysis of knowledge types and of techniques above. The reasons for the recommendations
should be clear from the preceding text.
Table 1: Recommended and contra-indicated techniques for handling each knowledge type.

Predictive knowledge
  Recommended: any technique, but with problems of validity
  Contra-indicated: none

Non-tacit knowledge
  Recommended: any technique, but there may be problems with validity of memory
  Contra-indicated: none

Semi-tacit knowledge:

Short term memory
  Recommended: on-line self-report
  Contra-indicated: all others (see list in section 1.2)

Recall v. recognition
  Recommended: techniques involving showing examples to the respondent (e.g. reports, picture sorts, item sorts)
  Contra-indicated: techniques which do not involve showing examples to the respondent (e.g. interviews)

Taken for granted knowledge
  Recommended: observation; laddering
  Contra-indicated: all others

Preverbal construing
  Recommended: repertory grid; card sorts; laddering; possibly reports and interviews if handled with care
  Contra-indicated: all others

Front and back versions
  Recommended: observation; possibly interviews, critical incident technique and reports once good rapport has been established with the respondent
  Contra-indicated: all others

Tacit knowledge:

Compiled skill
  Recommended: observation and experimentation
  Contra-indicated: all others

Implicit learning
  Recommended: observation and experimentation
  Contra-indicated: all others
One important part of questioning is the identification of which knowledge types are most
salient in the situation being investigated. Practical considerations of time and resources
usually limit the amount of investigation which can be undertaken, so it is important to
identify the most important aspects of the situation and to choose the appropriate techniques
for them. A certain amount of information can often be gained informally during the initial
meetings with potential respondents, gatekeepers and other members of the organisation
when a study is being set up. If the research is to take place in a commercial company, for
instance, it is often possible to use direct and indirect observation when on the way to the
contact person’s office – for instance, the demeanour of the staff, the information and other
resources available to them (e.g. manuals on desks) and the speed with which they perform
tasks. Demonstrations of tasks allow the identification of tacit knowledge; a standard
indicator of this is that the demonstrator is able to talk while performing the task, with the
conversation ceasing when conscious thought is required to perform the task. This kind of
information is difficult or impossible to gather using preliminary interviews; however helpful
the respondents are, they will omit to mention taken for granted knowledge, and will
probably never have noticed the extent to which they use tacit knowledge. This issue is
discussed in more detail in the case studies described below.
Once the types of knowledge involved have been identified, it is then possible to start
prioritising the topics which need to be investigated further, and to select the appropriate
techniques to handle the knowledge involved. It is advisable to proceed this way round,
rather than selecting the issues first and then profiling the knowledge involved, because the
profiling may well reveal serious misconceptions in the elicitor’s initial model of the area. An
effective demonstration of this is to ask a geologist to give an on-line self-report on how they
identify a rock specimen, leading up to identifying it, and then to follow this immediately by
asking the same geologist to identify a rock specimen and then explain how they knew that it
was the stated type of rock. For the second task, experienced field geologists will usually be
able to identify a rock before the elicitor has finished putting it on the table; the on-line self-report, however, can go on for as much as half an hour. It is clear that the actual identification
is accomplished by some form of tacit knowledge (in this case, pattern matching) and that the
tasks described in the on-line self-report are a reconstructed version of how to proceed, used
only for teaching students or for difficult specimens. Such differences can easily mislead the
inexperienced elicitor depending on initial briefing interviews; a moment spent in observation
is seldom wasted.
3.2 Terminology
One historical legacy of the separate evolution of elicitation techniques is that there has been
only partial and unsystematic transfer of concepts across techniques and disciplines, so that
concepts viewed as indispensable in one area are practically unknown in another. This section
describes a range of concepts which are relevant across disciplines and techniques, and which
are among the conceptual tools of questioning methodology as an integrated discipline.
The terminology described below derives from a variety of sources, but primarily from
knowledge representation, which is a fairly recent but well established and extensive field
within Artificial Intelligence. A good introduction is provided by Reichgelt (1991). Knowledge
representation is also important in other areas of computing, such as requirements engineering
(Jarke, Pohl, Jacobs, Bubenko, Assenova, Holm, Wangler, Rolland, Plihon, Schmitt, Sutcliffe,
Jones, Maiden, Till, Vassilou, Constantopoulos and Spandoudakis, 1993).
A full description of the topic is beyond the scope of this paper; however, it provides an
important basis for a metalanguage for questioning methodology. One significant advantage
of using this literature as a foundation is that there has been considerable work on the formal
semantics of the various representations used. This allows a more systematic, clean and
rigorous terminology than would otherwise be the case. The following account draws heavily
on this literature, with additions from other literatures where appropriate.
3.2.1 Validity and reliability
An important initial distinction is between validity and reliability, used here in the sense in
which the terms are employed in statistics and experimental design. “Validity” describes the
extent to which what is elicited corresponds to reality; “reliability” describes the extent to
which the same finding occurs repeatedly, whether between different elicitors, different
respondents, different occasions, or whatever other variable is involved. The standard
metaphor is target shooting, where “validity” refers to how near the bullets are to the target,
and “reliability” refers to how near the bullets are to each other. Bullets may be near to each
other while very distant from the target, which is generally less desirable than the converse;
however, it is usually easier to assess reliability than validity, and it is tempting to hope for
the best if the results are reliable.
An everyday example of this is the Father Christmas effect. If a number of respondents are
separately asked to describe Father Christmas, then their accounts are likely to agree closely
(white bearded man, somewhat overweight, in long red coat and hood with white trim –
probably a more detailed description than in many crime reports). However, this reliability
does not mean that there is a real Father Christmas, only that there is a widely known
stereotype, which all adult respondents know to be fictitious.
Human memory is subject to numerous distortions, biases and imperfections, and should
therefore be treated with caution. The clarity and detail of a memory are not valid indicators
of its accuracy. Distortions can be significant, such as complete reversals of a sequence of
events. There is a considerable literature on this topic, dating from Bartlett’s early work
(Bartlett, 1932) to more recent work by e.g. Loftus and Palmer (1974) and Baddeley (1990).
Robust findings include the active nature of memory, which involves encoding of events into
memory rather than a passive recording of them. This encoding frequently leads to
schematisation of the memory so that it fits into a familiar schema, even though this may
involve a reversal of the sequence of events, or of the role of the participants involved.
3.2.2 Category theory and fuzzy representations.
Categorisation is an important part both of everyday cognition and of expertise. Categories
are usually defined in terms of the set of attributes which are specific to the category in
question – for instance, the category “bird” in lay language is defined in terms of having
feathers, being able to fly, making nests and laying eggs. However, many categories are not
watertight, in the sense of having no exceptions or ambiguities, and there may be similar
uncertainty about the individual attributes. In the case of birds, for instance, penguins do not
fly and some penguins do not make nests, while laying eggs is not unique to birds, since most
reptiles also lay eggs. Within individual attributes,
an attribute may be defined in terms of several sub-components, and these, like the attribute,
may be “fuzzy” attributes. This term refers to attributes whose applicability is not a clear-cut
“either-or” issue, but rather a question of extent. The concept “tall”, for instance, applies
strongly to someone two metres high, but there is no unambiguous cut-off point at which a
height is described as “average” rather than “tall”. This lack of precision, however, does not
stop the attribute from being meaningful; it means, rather, that the metalanguage needed to
describe it needs to be sufficiently sophisticated.
Category theory and more specifically prototype theory have been investigated in some
depth by Rosch (Rosch, 1983) and other researchers in the same tradition, who use the
concept of core membership of a category, with increasing degrees of variation from the
prototypical core membership. A robin, in this approach, is a prototypical bird exhibiting all
the usual attributes of membership of the category “bird”; a puffin is less prototypical, and a
penguin is on the edge of the category. Various branches of set theory and of formal
semantics also deal with the same issue of categorisation, which is an important and
ubiquitous one.
At a practical level, categorisation has major implications for any bureaucracy, and
particularly for a bureaucracy trying to automate its procedures, a point noted since
Weber’s research into bureaucracies (Weber, 1924); the same is true for the law. For instance,
assessment of welfare entitlements, or of tax liability, often involves a considerable amount of
decision-making about the appropriate category in which to put a particular issue; once the
category has been decided, the rest of the assessment is comparatively trivial. At a theoretical
level, the topic of categorisation is of particular interest to social anthropologists, in terms of
the social construction of defining features of social structure, such as in-groups and out-groups.
Fuzziness is the topic of an extensive literature on fuzzy logic, dating back to Zadeh’s original
work (Zadeh, 1965). This literature uses a mathematical approach to describe degrees of
membership of fuzzy sets, and has proved a powerful tool in handling data of this sort. The
basic concept is that set membership is quantified on a scale from zero (not a member) to one
(completely a member), with intermediate membership being given an intermediate numeric
score, such as 0.3 or 0.7.
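A minimal sketch of such a membership function, using the construct "tall" discussed above; the 1.6 m and 2.0 m breakpoints are invented for illustration.

    # Degree of membership in the fuzzy set "tall", on a scale from 0 to 1.
    # The 1.6 m and 2.0 m breakpoints are invented for illustration.
    def membership_tall(height_m):
        if height_m <= 1.6:
            return 0.0                    # clearly not tall
        if height_m >= 2.0:
            return 1.0                    # completely a member of "tall"
        return (height_m - 1.6) / 0.4     # intermediate degrees in between

    for h in (1.5, 1.7, 1.8, 2.1):
        print(h, "->", round(membership_tall(h), 2))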
There are also extensive literatures in statistics and psychology, particularly judgement and
decision-making (J/DM) dealing with areas such as uncertainty, stochastic events, imperfect
knowledge and incomplete knowledge, which are different from fuzzy knowledge, but may
overlap with it. Uncertainty refers to knowledge which may or may not be true; stochastic events
happen or do not happen on a probabilistic basis; imperfect knowledge contains errors;
incomplete knowledge is simply incomplete. Thus, for example, a doctor may think that a
patient has a particular disease, but not be sure of the diagnosis (uncertainty); the disease may
be known to cause delirium at unpredictable intervals (stochastic events); the medical records
may contain errors, although the doctor does not know which parts of the records are correct
and which are incorrect (imperfect knowledge); and the medical records may not contain any
information about one aspect of the patient’s previous health (incomplete knowledge). Each
of these has different implications for theory and practice.
3.2.3 Terms from knowledge representation
The standard literature on knowledge representation in Artificial Intelligence deals in depth
with formalisms for representing knowledge, including facts, relationships and actions.
Although these provide a powerful language for handling the output from elicitation
sessions, this is too broad a topic to be covered in detail in this paper, so only an outline is
given below.
Three well-established formalisms for representing relationships are nets, frames and rules.
Nets, i.e. semantic networks, have the advantage of considerable flexibility in handling
different types of relationship (e.g. “is-a” and “part-of” links) but the disadvantage of unclear
semantics and of lack of structure. Frames involve a slot and filler notation, in which the
various relevant semantic categories are listed in advance and then filled in for each instance
being described. These have the advantage of clarity and completeness, but the disadvantage
of rigidity. Rules represent information in terms of conditions and consequences (e.g. IF
condition A AND condition B THEN action C). This is useful for representing knowledge
about actions, but can lead to problems of obscurity with regard to precedence, concurrency,
etc in large rule sets. Although all of these formalisms are relevant to elicitation, the most
immediately relevant is semantic networks, whose terminology is explicitly used in laddering
and in category theory (described above).
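As a minimal sketch, with invented domain content, the fragment below shows a semantic network as node-link-node triples with "is-a" and "part-of" links, and a rule in the condition/consequence form given above.

    # A tiny semantic network as (node, link, node) triples; content invented.
    triples = [
        ("canary", "is-a", "bird"),
        ("bird",   "is-a", "animal"),
        ("wing",   "part-of", "bird"),
    ]

    def is_a_chain(node):
        # Follow "is-a" links upwards from a node to the top of the net.
        chain = [node]
        while True:
            parents = [t for s, link, t in triples
                       if s == node and link == "is-a"]
            if not parents:
                return chain
            node = parents[0]
            chain.append(node)

    print(is_a_chain("canary"))   # ['canary', 'bird', 'animal']

    # A rule: IF condition A AND condition B THEN action C.
    facts = {"condition A", "condition B"}
    if {"condition A", "condition B"} <= facts:
        print("action C fires")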
Another set of representations from AI with implications for questioning methodology deals
with classes, attributes, and inheritance. Classes are categories which may be composed of
sub-classes and of sub-sub-classes. Eventually all classes end in instances, i.e. specific, unique
entities which belong to that class. A familiar example is zoological classification, in which the
class (using knowledge representation terminology) of canids includes the sub-class of dogs,
and the sub-class of dogs in turn contains instances consisting of all the dogs in the world.
Each class has a set of attributes which define and/or describe it; for instance, the class of
mammals includes the attributes of giving birth to live young and suckling the young with
milk.
The concept of inheritance refers to the situation where a sub-class has not only its own
attributes, but also inherits the attributes belonging to any higher-level classes to which that
class belongs. Although computationally and semantically attractive because of its parsimony
and elegance, this concept encounters representational problems with inheritance from
different sets of higher-level classes and with exceptions which over-ride the inherited
attributes; it therefore needs to be applied with caution. The classic example is Tweety the
bird: the class of “bird” normally has the attribute “able to fly”, but if Tweety is a penguin,
then this inherited attribute has to be over-ridden at the level of the class of “penguin” with
the attribute “unable to fly”.
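The Tweety example maps directly onto class inheritance with over-riding; a minimal sketch:

    # The Tweety example: a sub-class over-riding an inherited attribute.
    class Bird:
        can_fly = True        # default attribute, inherited by sub-classes

    class Penguin(Bird):
        can_fly = False       # exception: over-rides the inherited attribute

    tweety = Penguin()
    print(tweety.can_fly)     # False: the over-ride wins over inheritance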
3.2.4 Terms from Personal Construct Theory (PCT)
Personal Construct Theory makes an initial distinction between elements (the entities being
described) and constructs (the concepts used to describe them). This distinction is very
similar to the distinction in AI between objects and attributes respectively. Considerable
emphasis is placed in PCT on the elicitation of respondents’ own categorisation in the form of
elements and constructs. Although elicitation of constructs may appear to a novice to be an
endless task, in fact the number of constructs relevant to a particular domain of discourse is
usually quite small (typically fewer than twenty, and often significantly fewer).
Part of the reason for this is that the domain of discourse is only relevant to a sub-set of the
constructs which the respondent knows; another part of the reason is that respondents will
explicitly state that they know of more constructs which are applicable, but which are not
particularly important. Since the elicited constructs are usually tersely described (two or three
words) and tractable in number, it is possible to compare results across different respondents
more easily than is the case with interviews, etc, and with more validity than is the case with
e.g. questionnaires, which normally impose the elicitor’s constructs on the respondent rather
than eliciting the respondent’s constructs.
PCT has an explicitly defined set of terminology and concepts, such as focus of convenience
(the core area to which a construct can be applied) and range of convenience (the range of
contexts to which a construct can meaningfully be applied). Focus of convenience and range
of convenience are the most immediately relevant to questioning methodology, and space
prevents a more exhaustive listing, but PCT terminology is an area which could profitably be
studied by elicitors working in a range of disciplines and approaches in which it is currently
little known, such as discourse analysis. In particular, its combination of flexibility and
formalism would make it well suited to areas which have in the past used structuralism or
semiotics; PCT is at least as flexible and formal as these, but considerably richer and better
defined. This flexibility is also a factor in the authors’ preference for PCT over approaches
such as Q methodology. For instance, the classic Q sort, in which cards are sorted into a
predetermined distribution, is diametrically opposed in its approach to the PCT practice of
examining a respondent’s repertory grid specifically to see whether the responses show an
unusual distribution. One potential link between PCT and grounded theory (Glaser and
Strauss, 1967) could repay investigation: grounded theory’s concept of tracing inferencing
through a series of levels of abstraction of data has clear similarities to some of the concepts in
laddering. In particular, laddering on explanations can be used to check whether concepts
have been fully explained, as described below in the section on graph theory.
3.2.5 Graph theory
A relevant literature which is comparatively little known in most non-mathematical
disciplines is graph theory. This provides a clear, useful notation for representing knowledge
in a way which allows qualitative analysis to be combined with quantitative. The term
“graph” in this context refers not to graphs in the sense of plotting sets of values against each
other, but to items linked to each other by lines, as in the simplified diagram below.
Figure 1: a three-layer graph

                 A
               /   \
             A1     A2
            /  \   / | \
         A1a  A1b A2a A2b A2c

(Each labelled point is a node; each connecting line is an arc.)
In this case, the top-level node (A) is joined by two arcs (connecting lines) to two lower-level
nodes (A1 and A2). Node A1 is joined by two arcs to leaf-level (bottom-level) nodes (A1a and
A1b); the node on the right (A2) is joined by three arcs to leaf-level nodes. The graph has a total
depth of three levels; the leaf-level nodes are the children of nodes A1 and A2, which in turn
are the children of node A. The terms "nodes" and "arcs" are widely used in a range of
disciplines in the sense described above, although formal graph theory favours the terms
"vertices" and "edges" respectively for the same concepts.
There are various forms of graph, such as trees (graphs in which each node may have an
upwards connection to a parent, and may have downwards connections to one or more
children, but no sideways connections to other nodes) and nets (graphs which do not have the
hierarchical structure of trees, and in which sideways links may occur). Graphs may be
directed (each arc may be followed in one direction only) or undirected (each arc may be
followed in either direction).
Using a very simple tree as an example, it is possible to see how graphs offer a powerful and
flexible formalism for representation of relationships. For instance, it is possible to count the
layers of nodes in the graph, as an index of hierarchical organisation of structure, or to count
the number of nodes at a particular level of the graph, as an index of differentiation and
breadth at that point. An obvious application is the study of organisational behaviour, where
such indices can be used to describe the structure of the organisation; however, the same
concept can be applied to other areas. It has, for instance, been applied to elucidatory depth,
i.e. the number of successive layers of explanation needed to reach public domain terms or
tacit knowledge (Rugg and McGeorge, 1995), and can be applied in the same way to
fabricatory depth, i.e. the way in which tools are used to make tools to make tools, as an index
of the depth and breadth of a culture's technological infrastructure (currently being
investigated by the authors).
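These indices lend themselves to mechanical computation. The sketch below (one possible representation among many; the dictionary-of-children encoding is our own choice for illustration) computes the depth and per-level breadth of the tree shown in Figure 1:

    # The tree from Figure 1 as a mapping from each node to its children;
    # leaf-level nodes simply have no entry.
    tree = {"A": ["A1", "A2"], "A1": ["A1a", "A1b"], "A2": ["A2a", "A2b", "A2c"]}

    def depth(node):
        # Number of layers from this node down to leaf level:
        # an index of hierarchical organisation.
        children = tree.get(node, [])
        return 1 if not children else 1 + max(depth(c) for c in children)

    def breadth_at(node, level):
        # Number of nodes at a given level below this node (level 0 is
        # the node itself): an index of differentiation at that point.
        if level == 0:
            return 1
        return sum(breadth_at(c, level - 1) for c in tree.get(node, []))

    print(depth("A"))          # 3: the graph has three layers
    print(breadth_at("A", 2))  # 5: five leaf-level nodes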
Facet theory, as used by Rugg and McGeorge (1995), is derived largely from graph theory,
with the concept of separate trees orthogonal to each other but sharing some or all of the same
leaf-level instances. This concept is conveniently similar to the concept of “views” in software
engineering, and is becoming increasingly used in that field. A similar concept is well
established in information science (Vickery, 1960), though without the same underlying
mathematical formalisms. Facet theory makes it possible to describe complex multivariate
structures as a set of separate and comparatively simple structures, and is applicable to a
wide range of uses. For instance, an organisation may have one tree describing its commercial
structure, another for union membership within it and another for safety officers.
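A small sketch of the facet concept (the facets and staff names are invented for illustration): two orthogonal trees classify the same leaf-level instances, and can therefore be cross-referenced:

    # Invented example: two facets ("views") over the same leaf-level instances.
    commercial_view = {"sales": ["jones", "patel"], "production": ["smith", "brown"]}
    union_view = {"member": ["jones", "smith"], "non-member": ["patel", "brown"]}

    # Because the facets share their leaves, they can be cross-tabulated:
    print(set(commercial_view["sales"]) & set(union_view["member"]))  # {'jones'}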
3.2.6 Schema theory
One of the features which Bartlett discovered in his research on memory (Bartlett, 1932) was
that the processes of memory tend to organise events and facts into regular templates, which
Bartlett termed schemata. The same underlying concept has been re-worked repeatedly in
psychology since then, for instance in the form of script theory (Schank and Abelson, 1977).
This phenomenon is important to questioning methodology for two main reasons. The first is
that it explains and predicts certain types of error in memory, particularly in recall, which is
salient to questioning techniques dependent on the respondent’s memory of the past. The
second is that it helps explain the way in which respondents, particularly experts, structure
parts of their expertise.
This has important implications for elicitation of information about values and judgements,
and can explain apparent inconsistencies in them, although there appears to have been
comparatively little work on this. In the field of software metrics, for instance, the majority of
work appears to have concentrated on the elicitation of individual metrics for evaluating
software, rather than on finding out which categories respondents use to cluster software into
groups, and which metrics are relevant to each of those groups. In the domain of car design,
there are well-established categories of car, such as the town car, the luxury car and the estate
car. The metric of "size" is applicable to all of these, but the desired value is very different for
the different categories: in a town car, small size is an asset, whereas in a luxury car it is a
drawback.
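The point can be made concrete with a minimal sketch (the categories and values below are invented simplifications): the same metric is evaluated against a different desired value within each schema:

    # Invented example: the desired value of the "size" metric depends on
    # the category (schema) into which the respondent places the car.
    desired_size = {"town car": "small", "luxury car": "large", "estate car": "large"}

    def evaluate(category, size):
        # A metric value is an asset only relative to the category's schema.
        return "asset" if size == desired_size[category] else "drawback"

    print(evaluate("town car", "small"))    # asset
    print(evaluate("luxury car", "small"))  # drawback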
Techniques such as laddering are well-suited to the elicitation of schemata, and it will be
interesting to see what comes of future work using this approach. The field of software design
appears to be particularly ready for such work, which would complement the existing
literature on customising software to the individual user, and on identifying generic user
types.
4 METHOD FRAGMENTS
The traditional unit of analysis and discussion in elicitation is the method/technique: for
instance, the interview, or the questionnaire, or repertory grid technique. There are, however,
significant problems with this approach when looking at the bigger picture. One problem is
that for most techniques there is no single standard form, so any description of the technique
has to include descriptions of the main variants of the technique. Another is that the various
techniques tend to blur into each other – the distinction between a self-report and an
interview in which the respondent uses materials to demonstrate a point, for instance, is hard
to draw. A related further problem is that the same features may occur in two or more
techniques, leading to duplication of description in any systematic account of the techniques.
These problems, and others like them, make it difficult to provide a systematic, clear, precise
set of descriptions and prescriptions about methods/techniques and their use.
One solution to this problem is to use a finer-grained unit of analysis. Instead of treating each
method or technique as an integral whole, one can instead treat it as being composed of a
number of sub-components. For instance, in scenarios the elicitor presents the respondent
with a prepared set of information, then asks natural language questions, to which the
respondent gives natural language answers. This is quite different from the structure of a
repertory grid session, in which the elicitor uses a prepared grid format and encourages the
respondent to identify constructs which describe the elements used in the grid. It does,
however, share some components with a structured interview, which also involves natural
language questions and natural language responses.
We introduce the term “method fragments” to describe these sub-components. Method
fragments can be identified at various levels of granularity. The coarsest grained level consists
of fragments such as “natural language question,” with finer grained levels such as “natural
language question about a future event” and “natural language question phrased as a
probability value.”
Method fragments have obvious practical advantages in any systematic work involving
elicitation techniques and methods. They can be used to reduce or remove repetition when
two or more techniques share common method fragments. In such cases, it is only necessary
to cover each method fragment once, and to state which techniques involve that fragment.
They can also be used in a “pick and mix” way to create the appropriate customised variant
of a technique, or blend of two or more techniques, to fit a particular situation. One of our
recent student projects, for instance, involved asking respondents to say what was going on in
a photo, then followed this up with a short set of previously prepared questions, the
responses to which were in turn probed using laddering. These fragments allowed
investigation of attributional effects via the report on the photos (for instance, investigation of
how women’s status was perceived in photos where the women were using IT equipment)
which could then be compared with the accounts obtained via the interviews; the laddering
allowed identification of attributes which were perceived as status markers, which could in
turn be compared with results from the other two fragments.
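One way of making this concrete (the fragment and technique names below are invented simplifications, not a definitive taxonomy) is to model each technique as a set of method fragments, so that fragments shared by two techniques need only be described once:

    # Invented example: techniques modelled as sets of method fragments.
    techniques = {
        "scenario": {"prepared information", "natural language question",
                     "natural language response"},
        "structured interview": {"prepared question list", "natural language question",
                                 "natural language response"},
        "repertory grid": {"prepared grid format", "construct elicitation"},
    }

    # Fragments shared between two techniques: document these once only.
    print(techniques["scenario"] & techniques["structured interview"])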
A more profound advantage is that the use of fine-grained method fragments makes it
possible to provide grounded advice about use of appropriate formats. In the case of “natural
language question phrased as probability value,” for instance, there is a considerable
literature within the Judgement and Decision-Making (J/DM) area of psychology dealing
with the various cognitive biases which are associated with probabilist and frequentist
presentations of the same underlying question. Similarly, the literature on attribution theory
provides strong guidance about outcomes from phrasing a question in the second or the third
person (“What would you do…” versus “what would most people do…”).
Although it might be thought that the number of potential method fragments would be
enormous, our initial work in this area suggests that the number is in fact quite tractable. Our
research so far has been both bottom-up, working from practical experience towards theory,
and top-down, working from theory towards practice. There is still a considerable amount of
work to be done in this area, but it holds great potential.
5 CASE STUDIES
5.1 Bulk carriers
The first case study described here was one of the precipitating events leading to the
development of the framework described above. The case study involved following the
development of software to be used in the loading of bulk carrier ships. As part of this
process, one of the authors wanted not only to interview the software development team, but also to
observe them in action, and to observe loading in progress. The interviews were
unproblematic, but there were practical and security problems with access to the loading.
During the negotiations about this, the software developers decided to undertake their own
visit to observe loading, since their knowledge of the process had until then come solely from
requirements documentation.
The developers soon found several important aspects of the loading process which had
serious implications for system design, but which had not been mentioned anywhere in their
documentation. For instance, the developers had assumed that loading would occur at a fairly
constant rate, making it possible to predict loading strains on the hull reasonably well in
advance; however, this assumption turned out not to be correct. It also transpired that hull
stresses could very quickly change from safe to dangerous if the cargo being loaded was a
dense one, such as iron ore, where a large weight of cargo could be loaded very quickly. The
developers had also not realised how much noise, glare and vibration were associated with
the loading process, which had serious implications for the design of any computer-based
warning system.
In this example, the system analysis had been carried out competently by professionals, but
had failed to record several important facts in the documentation. These facts were
discovered in less than an hour by developers with no formal training in observation, leading
one to wonder how many more might have been uncovered by a trained specialist.
Interestingly, all the missing factors in this example appear to have been cases of taken for
granted knowledge.
5.2 Industrial printing
The second case study was undertaken by Blandford and Rugg (in preparation) as part of an
assessment of the feasibility of integrating requirements acquisition for real-world software
systems with usability evaluation in general and Programmable User Models in particular.
The domain involved was industrial printing of, for example, sell-by dates onto products; the
company involved was a market leader in the field. The case study consisted of two main
phases, the first of which was undertaken at the company’s premises, and the second of
which was undertaken in a client’s food processing factory, where the equipment could be
seen in action.
The first phase involved interviews with stakeholders, conducted separately to identify any
differences between stakeholders with regard to requirements, and also demonstrations of the
equipment, which were combined with observation and on-line self-report. The
demonstrations showed that the demonstrators did not use the equipment often enough to
have compiled the skills involved in using it, and also showed that using it was not a trivially
easy task. (Since the equipment is normally set up to print a sell-by date a given number of
days in the future, and can automatically update the date to be printed, simply showing the
user the date being printed is not enough; the equipment also needs to be able to show the
length of time by which the date is being offset.)
It became clear that user navigation through the equipment and security issues associated
with the password protection for the equipment were particularly important potential
problems. Programmable User Models were used to identify particular problems which
might arise, after which the visit to the client’s site was conducted to see how well these
predictions corresponded with reality.
The security issue turned out to have been solved by a passive work-around; the equipment
was positioned next to packing teams, making it extremely difficult for anyone to use the
equipment without authorisation. One device, however, was positioned in an isolated part of
the factory, and there had been concerns about its security, as predicted by the authors.
(There had been one occasion when a print code had mysteriously changed in the middle of a
shift.)
The user navigation issue turned out to be an interesting one in several ways. The first phase
of investigation had shown that frequency of use of the device would be an important
variable (and one where the software development stakeholder and the training stakeholder
had different perceptions of how often typical users would use the device). The authors had
predicted that if the device was used frequently enough, then the skills involved would
become compiled, and navigation would not be a problem; however, if the device was used
less frequently, then navigation would be a problem, with various likely errors.
The site manager told the authors that there were different levels of training for the different
staff involved, who used the device at various levels of sophistication. He mentioned that he
and some of the other senior staff had been on the full training course, and were familiar with
the device. This is what would be expected as a front version, and there was the prospect that
the back version would be very different. However, when the manager demonstrated a
feature of the device, the speed at which he operated it was clearly the result of a compiled
skill, which indicated considerable use of the device, which in turn indicated that there was
not a significantly different back version. For staff who used the device less often, for simple
tasks, there were printed “crib sheets” (aides-memoires) attached to the device. This was an
interesting finding, since an earlier interviewee had told the authors that this approach would
not be used in the food industry because of the need to clean the outside of the device
frequently to comply with health and safety regulations.
In addition to these expected issues, some serendipitous findings emerged. It had been
expected that observation would identify issues missed in the previous sessions, but it was
not possible to predict what these would be. An example of this was that the air in the second
site contained a high proportion of suspended dust particles from the dried foods being
processed. This was not directly relevant to the requirements for the equipment design, but
had important indirect implications. The amount of dust was sufficient to make wearing
spectacles inconvenient, since the lenses soon became dusty. A significant proportion of the
staff on site were middle aged, and needed glasses to read small text, such as that on the
device’s display screen. Since the site dealt with food processing, health and safety legislation
meant that staff had to wear white coats. This combination of factors meant that for a
significant proportion of staff, checking the display on the device involved taking out
spectacles from under a white coat, putting the spectacles on, reading the display, cleaning
the spectacles, and then putting them away again. This in turn meant that it was not possible
to depend on staff glancing at the display in passing as a routine method of checking the
device, with consequent implications for working practices.
Although the client had a long relationship with the company, and the sales representative
who accompanied the authors to the site was on good terms with the manager and clearly
knew the site well, there had been no previous mention of the dust issue and its implications.
The most likely explanation was, once again, taken for granted knowledge which had gone
unmentioned and undetected until observation was used.
5.3 Women's working dress
A study of perceptions of women’s clothing at work used card sorts to investigate
categorisation of women’s working dress by male and female respondents (Gerrard, 1995).
This is an area which had previously been investigated by other researchers using a range of
familiar techniques. However, there had not been any previous work using card sorts, which
appeared to be a particularly appropriate technique for this area. In the study, each card held
a different picture of a set of women’s clothing worn by a model. Respondents were asked to
sort the cards repeatedly into groups of their choice, using a different criterion to categorise
all of the cards each time (individual cards could be sorted into a group such as “not
applicable” or “don’t know” if necessary). One finding was that half of the male respondents,
but none of the female respondents, used the criterion of whether the women depicted were
married or unmarried. This was something which had not emerged as an issue in any of the
previous research in this area. It was also of interest because the pictures did not show the
faces or the hands of the models, because of the risk of distraction from cues other than the
clothing itself, so the respondents were unable to see wedding rings or other indications of
marital status, and were therefore categorising solely on the basis of the clothing.
5.4 Change at work
Management of change is a topic which has received considerable attention from researchers
and practitioners. Change of apparently trivial factors can have knock-on implications which
connect to high-level values and goals in those affected by the change, and which can in turn
lead to strong emotions and often resistance to the proposed change. This appeared a
particularly suitable area for investigation via laddering, and was investigated using both
laddering and a questionnaire in the same organisation, which was about to bring in a new IT
system (Andrews, 1999).
The results obtained via the two techniques had some similarities; for instance, the theme of
improved communication via the proposed new IT system ran through responses from both
techniques. However, there were also some striking differences. For example, only 5% of the
respondents stated in questionnaires that the new technology would affect their job security,
whereas this was explicitly mentioned by 43% of the respondents when laddering was used.
Another interesting result emerged during the quantitative analysis of the laddering results.
This involved counting the average number of levels of higher-level goals and implications of
the new system elicited from respondents in different positions in the organisation. The
average number of levels used by respondents with higher positions in the organisation was
1.9, whereas respondents lower in the organisation used an average of 3.7 levels. This result is
counter-intuitive, since one would expect the more senior staff to have thought through more
implications than the less senior staff. However, what was happening with the responses was
that often the more senior staff were proceeding directly to the implications for the
organisation, whereas the less senior staff were first proceeding to the implications for them
personally, then moving to the implications for the organisation, and then proceeding to
further implications for them personally.
5.5 Case studies: summary
The case studies are simply case studies; wholesale testing of the framework will be a much
larger operation. However, it is significant that in all cases, important issues were missed by
previous work, and emerged only when different questioning techniques were introduced to
the field, as predicted by the framework. It is also interesting that taken for granted
knowledge, missed by interviews, featured prominently in the first two cases.
The problem of missing knowledge cannot be simply resolved by using observation in
addition to whichever other technique the elicitor happens to favour; although observation
happened to be an appropriate method in two of these case studies, there are other situations
where it is impractical or impossible. An example of this occurred when one of the authors
was investigating staff and student perceptions of what constituted a good dissertation. Staff
and students agreed that presentation was an important factor, but elucidation via laddering
of what was meant by “good presentation” showed that students interpreted “good
presentation” quite differently from staff. The systematic nature of laddering made it possible
to uncover the different interpretations in a way which would not have been practical via
observation (which would have required an enormous range of examples of dissertations, and
even then could not have been guaranteed to identify all the rare features). It also improved the
chances of identifying that there were different interpretations of the same term: because
laddering usually proceeds down until a term has bottomed out, it elicits a fairly full
description of how a term is being used. Interviews can do this, but it is not an inherent
feature of interviews per se, and the degree of elucidation is normally decided by the
preferences of the interviewer rather than any systematic principle. Selection and integration
of techniques is clearly a critical factor in eliciting valid, reliable information, and needs to be
considered carefully.
6 DISCUSSION
It should be clear from the examples above that questioning methodology spans a wide range
of areas, and that a considerable amount of work remains to be done. The following sections
discuss the main issues involved.
6.1 Questioning methodology
It is clear that choice of questioning technique is something which draws on a wide body of
findings from numerous disciplines, and which is not trivially simple. It would therefore
make sense to treat questioning methodology as a field in its own right, analogous to, and
complementary to, statistics and survey methods. The commonality of methodological and
theoretical issues in questioning across disciplines is sufficient to make cross-fertilisation both
possible and desirable.
It is also clear that no single technique is adequate for handling the full range of knowledge
types likely to be encountered, and that elicitors should expect to use more than one
technique. Choice of the appropriate technique is an important issue, and there is a need for a
guiding framework which is empirically validated and which is theoretically grounded in
knowledge types and information filters, rather than a hopeful collection of ad hoc rules of
thumb.
6.2 Validation
The framework described here is built of components which are individually validated, but
this does not mean that the way in which they have been assembled is necessarily valid. An
important piece of future work is validation of the framework, so that errors can be identified
and corrected. The authors are currently working on this, via a combination of case studies
and formal experiments. Initial results from case studies are consistent with the predictions of
the framework, particularly in the case of semi-tacit knowledge; the role of taken for granted
knowledge has been particularly striking.
6.3 Training needs
A practical point arising from the discussion above is that if the framework is validated by
further work, then elicitors will need to be trained in relevant techniques before undertaking
questioning work. This has serious implications in terms of training needs, and it is likely that
questioning courses, analogous to statistics courses, would need to be set up as a routine part
of academic and industrial infrastructure. The authors’ experience is that such a course is
feasible, especially if the initial emphasis is on providing an overview of the framework and
the main issues, with more detailed coverage of the particular techniques needed for an
individual project.
6.4 Cross-discipline work
One of the reasons that a framework has taken so long to emerge is almost certainly that the
relevant knowledge was scattered across so many different disciplines that most researchers
would never see enough to obtain an overview. The brief descriptions above of the various
techniques and concepts do not do justice to the wealth of knowledge which has been built
up in the different disciplines, and there is much to be gained from exchange of knowledge
across disciplines.
The need is not only for exchange of information, but also for comparisons across disciplines,
domains and cultures, to assess the degree of commonality and of difference which exists
across them. Repertory grid technique, for instance, makes clear and explicit assumptions
about some aspects of human cognition; it would be interesting, and very much in the
tradition of PCT, to see whether these hold true across different cultures.
6.5 Future work
The most immediate and obvious need for further work involves empirical validation of the
framework described above. Although the framework is composed of established parts, this
does not guarantee that the way in which they have been fitted together is correct. Validation
of this sort requires large data sets; initial results from case studies indicate that the
framework is sound, and provides useful guidance. One striking effect in the case studies has
been the prominence of taken for granted knowledge as a source of missed requirements
when only interviews are used. Another interesting feature is the frequent use of tacit
knowledge by experts, often as a sub-component of a wider skill or task (described in more
detail below).
An area which is attracting increasing attention is elicitation of information across cultures –
for instance, when a high-cost product such as a building or an aircraft is being designed and
built for a client from another culture. Even within a single country, different organisations
can have quite different corporate cultures, with different implications for the suitability of a
particular product to their context: this has been the subject of considerable work, much of it
by researchers following the sociotechnical approach pioneered by groups such as the
Tavistock Institute. The framework described above provides a structured and systematic
way of approaching such problems, but does not in itself guarantee that the appropriate
techniques exist to solve them.
One promising approach to such problems is the use of laddering. Laddering allows the
elicitor to break down terms used by the respondent into progressively more specific
components, until the explanation “bottoms out” at a level which cannot be explained further.
The components at this level may be of several types. One type is externally observable
features, such as size and colour; the other main type involves pattern matching in the
broadest sense (shape, texture and sound), which may in turn be either public domain pattern
matching (i.e. using patterns known to the lay public) or expert pattern matching, which is
often associated with implicit learning and compiled skills. These components can then be
compared across respondents.
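A hedged sketch of how such laddered explanations might be stored and their bottom-level components classified for comparison (the domain content, type labels and tree encoding are all invented for illustration):

    # Invented example: a laddered explanation stored as a nested tree.
    # Each leaf is tagged with the type at which explanation bottomed out.
    ladder = {
        "good glaze": {
            "even colour": ("observable feature", "colour"),
            "right surface feel": ("expert pattern matching", "texture"),
        }
    }

    def bottom_components(node):
        # Collect the (type, component) pairs at which laddering bottomed out.
        if isinstance(node, tuple):
            return [node]
        found = []
        for child in node.values():
            found.extend(bottom_components(child))
        return found

    print(bottom_components(ladder))
    # [('observable feature', 'colour'), ('expert pattern matching', 'texture')]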
A similar approach can be used to tackle issues such as script theory and chunking, where
different respondents have different ways of grouping items together into higher-level
structures. These structures differ between disciplines and professions. The usual example of
a script (Schank and Abelson, 1977) is a series of actions linked in a predictable way, each of
which may in turn be composed of several sub-actions; eating at a restaurant, for instance,
usually consists of several main actions, such as “booking a table”, “hanging up coats” and
“ordering”. “Chunking” is a similar concept involving the grouping together of individual
items into a higher-level group.
One of the major differences between experts and novices is the extent to which series of
actions are scripted or chunked up. In field archaeology, for instance, the script of “drawing a
section” (i.e. a cross-section through an archaeological feature being excavated) is composed
of a large number of sub-tasks such as establishing a reference height relative to the site
temporary bench mark, with each of these in turn being composed of other lower-level tasks
such as setting up the surveying equipment. It is possible to elicit these scripts and chunks
using laddering, and then to use graph-theoretic approaches to show the number and nature
of them, thus combining qualitative and quantitative analysis.
Interestingly, the concept of script theory can be applied to the concept of agenda setting in
discourse analysis, as an example of implicit or explicit debate about the script to be used by
participants in the discourse. It should in principle be possible to use the same laddering-based approach as described above to investigate the nature and number of scripts available
for a particular situation, and to approach the area of social interaction from a social cognition
perspective.
6.6 Conclusion
In the past, questioning methodology was a Cinderella discipline compared to the elegant
sisters of statistics and survey methods. It is now clear, however, that questioning
methodology is as important and as rich a discipline as its sisters. The next steps are the
traditional ones for a newly emerging discipline: the bringing together of knowledge from
parent disciplines, the establishment of new research agendas and approaches, and the setting
up of the infrastructure to support this, in such forms as workshops, textbooks, conferences,
and journals. It will be interesting to see what emerges from the work ahead; whether any
previously intractable problems turn out to be tractable after all, and what new challenges
appear to take their place. Traditionally, living in interesting times was treated as a curse, but
in academia, living in interesting times is what every researcher hopes for. This certainly
appears to be the most likely future for researchers in questioning methodology.
BIBLIOGRAPHY
Anderson, J.R. The Adaptive Character of Thought. Erlbaum, Hillsdale N.J., 1990
Andrews, S. An assessment of end user attitudes and motivation towards new technologies in the
workplace and the behaviours arising from them. Unpublished undergraduate thesis, University
College Northampton, 1999
Ayton, P. Pers. com. April 1998
Baddeley, A.D. Human memory: Theory and practice. Lawrence Erlbaum Associates, Hove, 1990
Bannister, D. and Fransella, F. Inquiring man. Penguin, Harmondsworth, 1980
Bartlett, F.C. Remembering: A study in experimental and social psychology. Cambridge University
Press, Cambridge, 1932
Belkin, N.J., Oddy, R.N. and Brooks, H.M. ASK for Information Retrieval: Part I, Background
and Theory. Journal of Documentation, 38(2), pp. 61-71, June 1982.
Blandford, A. and Rugg, G. Integration of Programmable User Model Approaches with
Requirements Acquisition: A case study. In preparation
Boose, J.H., Shema, D.B. and Bradshaw, J.M. Recent progress in AQUINAS: A knowledge
acquisition workbench. Knowledge Acquisition, 1, pp. 185-214, 1989
Chi, M.T.H., Glaser, R. and Farr, M.J. (eds.) The Nature of Expertise. Lawrence Erlbaum
Associates, London, 1988.
Cortazzi, D. and Roote, S. Illuminative Incident Analysis. McGraw-Hill, London, 1975
Denzin, N.K. and Lincoln, Y.S. (eds.) Handbook of Qualitative Research. Sage, London, 1994
Ellis, C. (ed.) Expert Knowledge and Explanation: The Knowledge-Language Interface. Ellis
Horwood, Chichester, 1989.
Eysenck, M.W. and Keane, M.T. Cognitive Psychology. Psychology Press, Hove, 1995
Fransella, F. and Bannister, D. A manual for repertory grid technique. Academic Press, London,
1977
Gerrard, S. The working wardrobe: perceptions of women’s clothing at work. Unpublished Master’s
thesis, London University, 1995
Gigerenzer, G. Why the distinction between single event probabilities and frequencies is
important for psychology (and vice versa). In Wright, D. and Ayton, P. (eds.) Subjective
probability. John Wiley and Sons, Chichester, 1994
Glaser, B.G. and Strauss, A.L. The Discovery of Grounded Theory. Aldine, New York, 1967
Goffman, E. The Presentation of Self in Everyday Life. Doubleday, New York, 1959
Grice, H.P. Logic and Conversation. In Cole, P. and Morgan, J.L. (eds.) Syntax and Semantics 3.
Academic Press, New York, 1975
Hinkle, D. The change of personal constructs from the viewpoint of a theory of construct implications.
Unpublished PhD thesis, Ohio State University, 1965. Cited in Bannister, D. and Fransella, F.
Inquiring man. Penguin, Harmondsworth, 1980
Honikman, B. Construct Theory as an Approach to Architectural and Environmental Design.
In Slater, P. (ed.) The Measurement of Interpersonal Space by Grid Technique: Volume 2:
Dimensions of Interpersonal Space. John Wiley and Sons, London, 1977
Jarke, M., Pohl, K., Jacobs, S., Bubenko, J., Assenova, P., Holm, P., Wangler, P., Rolland, C.,
Plihon, V., Schmitt, J., Sutcliffe, A.G., Jones, S., Maiden, N.A.M., Till, D., Vassiliou, Y.,
Constantopoulos, P. and Spanoudakis, G. Requirements Engineering: An Integrated View of
Representation. In Sommerville, I. and Paul, M. (eds.) Proceedings 4th European Software
Engineering Conference, Garmisch-Partenkirchen, 1993. Springer-Verlag, Lecture Notes in
Computer Science 717, pp. 100-114.
Kahneman, D., Slovic, P. and Tversky, A. (Eds.), Judgement under Uncertainty: Heuristics and
Biases. Cambridge University Press, Cambridge, 1982
Kelly, G.A. The Psychology of Personal Constructs. W.W. Norton, New York, 1955
Loftus, E.F. and Palmer, J.C. Reconstruction of automobile destruction: An example of the
interaction between language and memory. Journal of Verbal Learning and Verbal Behaviour, 13,
585-589, 1974
Maiden, N.A.M. and Rugg, G. ACRE: a framework for acquisition of requirements.
Software Engineering Journal, pp. 183-192, 1996
Mead, M. Coming of Age in Samoa. William Morrow, New York, 1928
Michalski, R.S. and Chilausky, R.L. Learning by being told and learning from examples: an
experimental comparison of the two methods of knowledge acquisition in the context of
developing an expert system for soybean disease diagnosis. International Journal of Policy
Analysis and Information Systems, 4, pp. 125-161, 1980
Miller, G.A. The magical number seven, plus or minus two: Some limits on our capacity for
processing information. Psychological Review, 63, pp. 81-97, 1956
Myers, D.G. Social Psychology (3rd edition). McGraw Hill, New York, 1990
Neves, D.M. and Anderson, J.R. Knowledge compilation: mechanisms for the
automatization of cognitive skills. In Anderson, J.R. (ed.) Cognitive Skills and their Acquisition.
Erlbaum, Hillsdale, N.J., 1981
Norman, D. The design of everyday things. Doubleday/Currency, New York, 1990
Patrick, J. A Glasgow Gang Observed. Eyre Methuen, London, 1973
Reichgelt, H. Knowledge Representation. Ablex Publishing Corp., Norwood, 1991
Reynolds, T.J. and Gutman, J. Laddering Theory, Method, Analysis, and Interpretation.
Journal of Advertising Research, February-March 1988, pp. 11-31.
Rosch, E. Prototype Classification and Logical Classification: the Two Systems. In Scholnick,
K. (ed.) New Trends in Conceptual Representation: Challenges to Piaget's Theory. Lawrence
Erlbaum Associates, Hillsdale N.J., 1983
Rosenhan, D.L. On Being Sane in Insane Places. Science, 179, pp. 250-258, 1973
Rugg, G. and McGeorge, P. Laddering. Expert Systems, 12(4), pp. 339-346, 1995
Rugg, G. and McGeorge, P. The sorting techniques: a tutorial paper on card sorts, picture
sorts and item sorts. Expert Systems, 14(2), 1997
Schank, R.C. and Abelson, R.P. Scripts, plans, goals and understanding. Lawrence Erlbaum
Associates, Hillsdale, N.J., 1977
Seger, C.A. Implicit learning. Psychological Bulletin, 115(2), pp. 163-196, 1994
Shaw, M.L.G. Recent Advances in Personal Construct Theory. Academic Press, London, 1980.
Shaw, M.L.G. and Gaines, B.R. A methodology for recognising consensus, correspondence,
conflict and contrast in a knowledge acquisition system. Proceedings of Workshop on Knowledge
Acquisition for Knowledge-Based Systems, Banff, Canada, Nov 7-11, 1988.
Sommerville, I., Rodden, T., Sawyer, P., Bentley, R. and Twidale, M. Integrating Ethnography
into the Requirements Engineering Process. Proceedings of IEEE Symposium on Requirements
Engineering, IEEE Computer Society Press, pp. 165-173, 1993
Vickery, B.C. Faceted Classification: A Guide to the Construction and Use of Special Schemes. Aslib,
London, 1960.
Weber, M. Legitimate Authority and Bureaucracy (1924) In Pugh, D.S. (ed.)
Organisation theory: selected readings (third edition) Penguin, London, 1990
Wicker, A.W. Attitudes versus actions: The relationship of verbal and overt behavioral
responses to attitude objects. Journal of Social Issues, 25(4), pp. 41-78, 1969
Zadeh, L. Fuzzy sets. Information and Control, 8, pp. 338-353, 1965