Questioning methodology

Gordon Rugg and Peter McGeorge

Working paper 99/03
Faculty of Management and Business
University College Northampton
ISBN 1 901 547 008
1999

The Authors:

Gordon Rugg is Reader in Technology Acceptance at University College Northampton. Contact details: Dr Gordon Rugg, Reader in Technology Acceptance, School of Accountancy, Information Systems and Law, University College Northampton, Boughton Green Road, Northampton, NN2 7AL, UK. Email: Gordon.Rugg@northampton.ac.uk Tel: +44 (0)1604 735500

Peter McGeorge is Senior Lecturer in Psychology at Aberdeen University. Contact details: Dr Peter McGeorge, Department of Psychology, University of Aberdeen, Aberdeen AB24 2UB, Scotland, UK. Email: psy144@mailserv.abdn.ac.uk Phone: +44 (0)1224 272248

Acknowledgements

Any work of synthesis and integration is likely to include a significant amount of input of ideas and influence from many other people, and this paper is no exception. The sections of this work dealing with knowledge representation and integration derive at least in part from experience in developing a Knowledge Elicitation Workbench while in the Artificial Intelligence Group at the Department of Psychology, University of Nottingham, working with Nigel Shadbolt, Mike Burton and Han Reichgelt. The sections on implicit knowledge were grounded in Peter McGeorge’s PhD work in the same department, with Mike Burton. The concept of accessing different versions of knowledge via different elicitation techniques derives from Gordon Rugg’s PhD work with Wyn Bellin, while in the Department of Psychology, Reading University. The idea of accessing different types of memory via different elicitation techniques in the context of requirements acquisition was developed with Neil Maiden, HCI Design Group, School of Business Computing, City University. The extension of the requirements acquisition work to the wider concept of questioning methodology was largely inspired by work with Ann Blandford, School of Computing Science, Middlesex University. We would also like to record our gratitude to everyone else who helped us in this work, particularly those who provided constructive suggestions on previous drafts, and the long-suffering respondents who provided us with the practical experience of elicitation on which this work was based.

Abstract

A central problem in many disciplines is the elicitation of a complete, correct, valid and reliable set of information from human beings – finding out what people want, think, know or believe. Examples include social science research, market and product research, opinion polls and client briefs. Although numerous elicitation techniques exist, there has traditionally been little theoretically-driven guidance available on choice, sequencing and integration of techniques. Choice of technique has been largely a matter of individual preference, with interviews and questionnaires usually being chosen, regardless of how suitable they are for the task being approached. This paper discusses the issues involved in providing guidance about choice of technique, then describes a framework for providing such guidance. A central feature of this paper is the distinction between various types of memory and knowledge. Some of these can be accessed via interviews or questionnaires. Others, however, can only be accessed by one technique, and are inaccessible to interviews and questionnaires.
These types are listed in the framework, and matched with corresponding recommended elicitation techniques. The framework is illustrated by case studies, including two from the authors’ industrial experience. The paper concludes that questioning methodology fills a methodological gap between experimental design and statistics, and should be established as a discipline in its own right.

Contents:

1: Introduction
2: A framework for categorising techniques
3: Selecting and integrating techniques
4: Method fragments
5: Case studies
6: Discussion
7: Future work
8: Conclusion
Bibliography

Figure 1: A three layer graph
Table 1: Recommended and contra-indicated techniques for handling each knowledge type

1 INTRODUCTION

A central problem in many disciplines is finding out what people want, or think, or believe, or know. This problem is at the heart of any research involving human behaviour or attitudes – the social sciences, in effect – and of a surprising range of other fields. In computing science, for example, elicitation of expertise is central to knowledge acquisition for knowledge based systems, and elicitation of client requirements is at the heart of systems analysis and of requirements acquisition. The problem is not caused by a lack of research in the area, or by a shortage of elicitation techniques; a recent book on qualitative research methods alone runs to over six hundred pages (Denzin and Lincoln, 1994), and an overview article on requirements acquisition listed a dozen major techniques, with clear recognition that there were numerous other techniques in existence, as well as numerous versions of both the major and minor techniques (Maiden and Rugg, 1996). The problem is more to do with choosing the appropriate technique or techniques, and with using them in the correct way. The same problem occurs in a wide range of disciplines. Traditionally, there have been three main approaches to the choice of questioning technique. One is to view choice of technique as unimportant; a second is to use the techniques traditionally used in the discipline; the third is to view the issue as important, but not yet well enough understood to enable an informed choice. The first main point which emerges clearly from the findings described below is that choice of the correct questioning technique is not just important, but essential, in any discipline which involves eliciting information from people. The second main point which emerges is that there is now a theoretically grounded and practical way of approaching this area. These issues are the central theme of this paper. A short example demonstrates the type of issue involved. One of the authors recently supervised an undergraduate project which investigated hassles (minor stresses and irritations) affecting IT managers. This is a topic of considerable importance both theoretically (in relation to stress research) and practically (staff turnover among IT managers is a major problem for companies with a high IT presence). The student did a thorough piece of work, establishing a good rapport with the IT managers she was studying, and using several techniques to investigate different aspects of the topic, including interviews and “hassle diaries”. These provided an interesting insight into the nature of an IT manager’s role, with enough detail and breadth of coverage to produce the basis of a good dissertation. However, the interviews and hassle diaries all missed a major feature which was only detected by use of shadowing (i.e.
following the managers around while they worked), namely that the managers quite often had no lunch break because of pressure of work. If this were an isolated case, then there would be little cause for concern. However, it is such a typical case that the authors now routinely use “compare and contrast” designs involving different elicitation techniques as a standard basis for student projects. Although these projects are also centred on an interesting domain, so that the analysis can concentrate on the domain if the different techniques do not produce different findings, in practice the different techniques have reliably and systematically produced different findings across a range of domains and techniques. The following sections discuss the reasons for this, and the implications which follow. There has been considerable exchange of concepts and techniques between disciplines. For instance, laddering was developed by Hinkle (Hinkle, 1965) from Kelly’s Personal Construct Theory (Kelly, 1955), and has since been used in clinical psychology (Bannister and Fransella, 1980; Fransella and Bannister, 1977), architecture (Honikman, 1977), market research (Reynolds and Gutman, 1988), knowledge acquisition (Rugg and McGeorge, 1995) and requirements acquisition (Maiden and Rugg, 1996). Ethnographic approaches in various forms have been applied outside traditional ethnography to fields such as criminology (Patrick, 1973) and requirements acquisition for air traffic control systems (Sommerville, Rodden, Sawyer, Bentley and Twidale, 1993). This exchange, however, has traditionally been at the level of individual concepts and techniques, rather than in terms of larger frameworks. This is an interesting contrast with statistics, experimental design and survey methods, which have historically been viewed as semi-autonomous disciplines in their own right, with the same textbooks and journals being used by researchers from a wide range of disciplines. The reason for this difference is probably quite simple, namely that there has in the past been little in the way of higher-level frameworks and metalanguage to handle elicitation techniques as a whole. It is, however, a critically important absence, because statistics, experimental design and survey methods cannot make up for damage caused by incorrect selection or use of questioning technique. The aim of this article is to describe a framework which will help remedy this situation, by providing theoretically grounded and systematic guidance on choice of techniques. This framework is intended to be applicable to a range of disciplines, and to provide common ground for the establishment of questioning methodology as a discipline in its own right. This new discipline would complement survey methods, experimental design and statistics, thereby providing researchers with a complete set of conceptual tools and methods for research involving human behaviour. This paper is divided into four main sections. The first section briefly describes existing questioning techniques. The second section describes and discusses knowledge and memory types, and the implications of these for choice of questioning technique. The third section describes a framework for selection and integration of questioning techniques. The fourth section provides a brief description of knowledge representation and related concepts, to provide some further metalanguage. These are followed by two short case studies and a discussion of implications for further work.
1.2 Existing techniques

This section provides a brief overview of the main questioning techniques, to set the subsequent theoretical analysis in context. It is tempting to derive guiding frameworks from the techniques themselves, or from practical issues involved in technique choice, such as the time or equipment required. Although this can be useful, it is only part of what is needed. Technique-based frameworks are derived from existing solutions, rather than from the problem, and it is the problem which is central. This issue is discussed in detail below. It should be emphasised that the ordering of the list of techniques in this section is largely arbitrary, and is not intended as a classification in its own right; classification is described later in this paper. The descriptions of techniques are intended as a brief overview so that readers know what the various techniques are before encountering the sections on selection and integration of techniques – few readers are likely to be familiar with all of them. For clarity, the descriptions have been kept deliberately brief. There is a separate section later in this paper which deals with further concepts relevant to techniques, such as knowledge representation; some topics which are only tersely outlined in the descriptions of techniques, such as hierarchical structures of knowledge, are discussed in more detail in that “further concepts” section.

1.3 The main elicitation techniques

There is a considerable literature on the individual techniques, and on the philosophical, theoretical and methodological issues associated with them – for instance, the role of the observer, and the nature of subjectivity in data collection. A good introduction to this literature is provided by Denzin and Lincoln (1994). Although these are important issues, for reasons of space they are not discussed in detail in this paper, which concentrates instead on the interaction between knowledge types and elicitation techniques. Some of the techniques described below can be traced back to a key source, or can be illustrated by a classic study; other techniques, such as interviews, are ubiquitous and have no clear origin. The descriptions below are intended to give a brief overview of the main techniques in use, and include references to further reading where a technique is likely to be unfamiliar to most readers. Ethnographic approaches usually involve spending extensive amounts of time with the group being studied so as to gain a thorough first-hand understanding of how their physical and conceptual world is structured. Varieties include participant observation, where the observer participates in the group’s activities; this may in turn be either undisclosed participant observation (in which the observer does not disclose to the group that their participation is for purposes of research) or disclosed (in which the observer does not attempt to conceal the purpose of the participation). A classic example of disclosed participant observation is Mead’s (1928) study of sexual behaviour in Samoa. A more recent example is Sommerville et al’s (Sommerville et al, 1993) study of the behaviour of air traffic controllers. Classic examples of undisclosed participant observation include Patrick’s study of a Glasgow gang (Patrick, 1973) and Rosenhan’s study of behaviour in a psychiatric ward (Rosenhan, 1973). Observation involves observing the activity in question.
Varieties include participant observation (described above, under ethnographic approaches), direct observation and indirect observation. In direct observation, the activity itself is observed; in indirect observation, the by-products of the target activity are observed, usually when the target activity itself cannot be observed directly. A familiar example of direct observation is shadowing, where the researcher follows the respondent around, usually in the context of the respondent’s work. An example of indirect observation is the examination of illegitimacy rates as an indicator of the incidence of premarital sex. Reports involve the respondent verbally reporting on the target activity. There are numerous varieties, some of which would traditionally be considered as techniques in their own right (e.g. scenarios). The underlying similarity in deep structure, however, is great enough for classifying them together to be sensible. Varieties include self report and report of others. Each of these can in turn be subdivided into on-line and off-line reporting. In self report, the respondent reports on their own actions; in reports of others, the respondent reports on the actions of others. In on-line report, the reporting occurs while the action is taking place; in off-line report, the reporting occurs after the action. Scenarios are a special form of report in which the respondent reports on what they think would happen in a particular situation (i.e. scenario), which may involve themselves and/or others. Critical incident technique, together with several closely related techniques such as illuminative incident analysis (Cortazzi and Roote, 1975), involves asking the respondent to describe and discuss a particularly instructive past incident. Interviews are one of the most familiar and widely used elicitation techniques. The core concept is of a question and answer session between elicitor and respondent, but the term “interview” is used so loosely, and to cover so many variants, that it is of debatable value. A traditional distinction is made between structured and unstructured interviews. In the former, the elicitor has a series of prepared topics or specific questions; in the latter, the agenda is left open and unstructured. Interviews may overlap with scenarios, by asking about possible situations, and with critical incident technique, by asking about important past events, as well as with other techniques such as laddering, when clarifying the meaning of technical terms. The Personal Construct Theory techniques are a range of techniques deriving from Kelly’s Personal Construct Theory (PCT). These include repertory grids, card sorts and laddering. PCT is based on a set of assumptions explicitly described by Kelly (1955). These cluster round a model in which people make sense of the world by dividing it up into things (elements) which can then be described by appropriate attributes (constructs). There are various assumptions about the nature of elements and constructs: for instance, that there is enough similarity across individuals to allow us to communicate with each other, but enough divergence for each individual to be different. This model is reflected in the elicitation techniques based on PCT.
Repertory grids are entity:construct (broadly equivalent to object:attribute) matrices with values in the resulting cells of the matrix; card sorts involve repeatedly sorting entities into groups on the basis of different criteria; laddering is a hierarchically-structured technique, similar to a highly restricted interview, for eliciting categorisations, hierarchies and levels of explanation. These techniques are often formally linked to each other in elicitation, with output from one being used directly as input for another. Key sources include Kelly’s original work describing PCT (Kelly, 1955), and Bannister and Fransella’s more accessible descriptions of PCT and of repertory grid technique (Bannister and Fransella, 1980 and Fransella and Bannister, 1977 respectively). Personal Construct Theory and its associated techniques have been applied to knowledge acquisition and requirements acquisition by Boose, Gaines and Shaw (e.g. Shaw, 1980; Shaw and Gaines, 1988; Boose, Shema and Bradshaw, 1989). Card sorts are described in detail in Rugg and McGeorge (1997). Laddering was used by Honikman in architecture (Honikman, 1977) and by Reynolds and Gutman in advertising research (Reynolds and Gutman, 1988). Questionnaires are lists of questions or statements, usually administered in written form, but sometimes used in spoken form (e.g. via telephone sessions). When in spoken form, they overlap with structured interviews (described above). A traditional distinction in questionnaires is between open questions, in which the respondent may use their own words, and closed questions, in which the respondent has to choose between possible responses on a list supplied by the elicitor. Prototyping is an approach used under different names in different disciplines. Versions include engineering prototypes, software prototypes, and architects’ models and artists’ impressions in architecture. The prototype is shown to the respondent, who then critiques it; the results from this are then usually fed back into another iteration of design. It should be noted that this is a completely separate concept from prototype theory, which is discussed separately in section 3.2.2 below.

2 A FRAMEWORK FOR CATEGORISING TECHNIQUES

The choice of structure for a framework is an important issue. A single hierarchical tree, with classes and sub-classes, is only able to represent a single way of classifying the entities involved. In the case of elicitation techniques, however, it is necessary to categorise in several ways, which might include the time taken to use the technique (minutes in the case of card sorts, months or years in the case of some ethnographic work) or the equipment needed to use it (extremely sophisticated recording equipment for observation of human:computer interaction, or a notepad and pen in the case of laddering). The approach used by Maiden and Rugg (Maiden and Rugg, 1996) and used in the present paper is a faceted one, in which several different categorisations are used and treated as orthogonal (i.e. separate from, and uncorrelated with, each other, but applied to the same entities). This has considerable advantages in terms of clarity. It also has the advantage of handling range of convenience much more elegantly than is the case with non-faceted approaches, such as matrix representations or elaborated trees. “Range of convenience” is a concept in Personal Construct Theory (Kelly, 1955) which refers to the way in which a particular term can only be used meaningfully within a certain range of settings. For instance, “IBM compatible” is meaningful only when applied to computer equipment, and is meaningless when applied to a dug-out canoe. In slot and filler representations such as matrices, such cases have to be handled by a “not applicable” value, and in extreme cases the “not applicable” cases can outnumber the meaningful values. These issues are discussed in more detail below in relation to knowledge representation and its role in questioning methodology.
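By way of illustration, the following Python sketch (the elements, constructs and range assignments are invented for the example) contrasts a flat slot and filler matrix, which must be padded with “not applicable” values, with a faceted representation in which each construct carries its own range of convenience:

    # A flat matrix needs a value for every element:construct pairing, so
    # out-of-range pairings must be padded with "not applicable" fillers.
    flat_grid = {
        ("laptop", "IBM compatible"): "yes",
        ("dug-out canoe", "IBM compatible"): "n/a",  # meaningless, but the slot must be filled
        ("laptop", "seaworthy"): "n/a",
        ("dug-out canoe", "seaworthy"): "yes",
    }

    # A faceted representation instead records the range of convenience of
    # each construct; the construct is simply never applied outside that range.
    range_of_convenience = {
        "IBM compatible": {"computer equipment"},
        "seaworthy": {"watercraft"},
    }
    element_kind = {"laptop": "computer equipment", "dug-out canoe": "watercraft"}

    def applicable(construct, element):
        # True only within the construct's range of convenience
        return element_kind[element] in range_of_convenience[construct]

    print(applicable("IBM compatible", "dug-out canoe"))  # False: no "n/a" cell needed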
Although the technique-driven facets are important, they are not enough. A technique-based classification would be analogous to a representation of illness based on the medicines and treatments which were available, but containing no systematic description of the illnesses which those medicines and treatments were designed to cure. The most important facet of the Maiden and Rugg framework involves what the authors termed “internal representations”. This term covers the types of memory, of knowledge and of communication filter which affect the quantity and type of information which can be elicited. An initial distinction can be made between what are termed “new system” knowledge and “existing domain” knowledge in the Maiden and Rugg framework. The former refers to knowledge about things which do not yet exist, the latter to knowledge about things which do exist, or have already existed. This is a distinction with important implications for the degree of validity which can reasonably be expected, and will be discussed in more depth later. Existing domain knowledge is divided into three types of internal representation, namely tacit, semi-tacit and non-tacit knowledge, which form the bulk of the Maiden and Rugg framework.

2.2 Tacit knowledge is knowledge which is not available to conscious introspection, and can be subdivided into implicit learning (Seger, 1994) and compiled skills (Neves and Anderson, 1981; Anderson, 1990). Implicit learning occurs without any conscious learning process being involved; the learning proceeds straight from a training set of large numbers of examples into the brain without any intermediate conscious cognitive processes. Compiled skills were initially learned explicitly, but subsequently became habitualised and speeded up to the point where the conscious component was lost. Everyday examples include touch typing and changing gear when driving a car. In such cases, asking the respondent about the skill will produce valid responses only by chance. Touch typists, for instance, do not usually have significant explicit memory for the position of keys on the keyboard; if asked which key is to the right of “g”, for instance, they will usually have to visualise themselves typing, and observe the answer. Similarly, car drivers will not usually be able to recall the precise sequence of hand and foot movements which they made when going round a roundabout. Asking respondents to describe what they are doing while using a compiled skill usually leads to a breakdown of performance, because of the intrusion of a slower conscious component into the task. Tacit knowledge may include a significant amount of pattern matching, which is a very fast, massively parallel form of search quite different from the sequential reasoning used for other tasks; an everyday example of pattern matching is recognition of a familiar face. Because of its massively parallel nature, pattern matching is not amenable to being broken down into lower-level explanations, with consequent implications for elicitation.
One of the most strikingly unexpected results from research into expertise was the extent to which experts use matching against a huge learned set of previous instances, rather than sequential logic, as a way of operating (e.g. Chi, Glaser and Farr, 1988; Ellis, 1989).

2.3 Explicit knowledge is defined as knowledge which is available to conscious introspection. This type of knowledge is in principle accessible using any elicitation technique, although it may be subject to various biases and distortions.

2.4 Semi-tacit knowledge is a term which applies to a wide range of memory types and communication filters. These include short term memory; recall versus recognition; taken for granted knowledge; preverbal construing; and front and back versions. The common factor shared by these is that they can be accessed via some elicitation routes, but not via others.

2.4.2 Short term memory is probably the most widely known of these types, and is well understood as a result of considerable research in psychology. It is a limited capacity, short term store, with a capacity of about seven items, plus or minus two (Miller, 1956), and a duration of a few seconds. Long term memory, in contrast, has enormous capacity, and can last for tens of years. In complex cognitive tasks, short term memory is often used as a sort of scratchpad, with the information involved never reaching long term memory. This means that any attempt to access that information after the task (e.g. via interviews) is doomed to failure, since the information was lost from memory within seconds of being used. Short term memory is only accessible via contemporaneous techniques such as on-line self-report, or indirectly via observation.

2.4.3 Recall versus recognition is another aspect of memory structure. Recall is active memory, in which information is deliberately retrieved from memory; recognition is passive memory, in which a specified item is compared with what is stored in memory to search for a match. Recognition is normally considerably more powerful than recall (cf. Eysenck and Keane, 1995). A simple example involves trying to recall the names of the states in the USA: most people can only recall a small number, but can correctly recognise a much larger number if shown a list of names.

2.4.4 Taken for granted knowledge (TFG knowledge) is knowledge which one participant in a communication assumes to be known by the other participant or participants (Grice, 1975). The concept is related to Norman’s concept of knowledge in the head, as opposed to knowledge in the world, i.e. knowledge explicitly represented in the external world, for example as instructions on street signs (Norman, 1990). TFG knowledge is normally not stated explicitly during communication; for instance, one does not say “My aunt, who is a woman” because it can be taken for granted that aunts, by definition, are women. This principle increases the efficiency of normal communication by leaving out superfluous information. Unfortunately, the filtering out of TFG knowledge is based on the assumption that the other participant or participants share the knowledge, and this assumption can be false. This is particularly the case when experts are dealing with non-experts, and are describing everyday features of their area of expertise. Precisely because these features are so familiar to them, experts are likely to take them for granted, and to assume that they are equally familiar to the non-expert.
Initial evidence from research into semi-tacit knowledge suggests that TFG knowledge is one of the more common, and more serious, reasons for incomplete elicitation of information.

2.4.5 Preverbal construing is a term used in Personal Construct Theory to describe construing which occurs without a verbal label for the constructs involved. This effect is what is referred to in lay language by expressions such as “I can’t put it into words, but…” In some cases, this may refer to constructs which are fairly explicitly understood by the respondent, but which happen not to have a verbal label; in other cases, some form of tacit knowledge is involved. A striking effect which sometimes occurs when using PCT techniques is that the respondent suddenly has an “aha” experience, when a construct changes from preverbal to verbal status. This is usually accompanied by expressions such as “I’d always known there was a difference, but I’d never been able to put my finger on it before”.

2.4.6 Front and back versions are, respectively, the “public consumption” and “behind the scenes” versions of reality which members of a group present to outsiders (in the case of front versions) and insiders (in the case of back versions). These terms are derived from Goffman’s (1959) dramaturgical metaphor of the stage performance. This metaphor has the advantage of not implying any intention to deceive in the front version; the front version in many professions is viewed by group members as a professional image to be maintained, not as an extended lie to be fed to the public. It has been anecdotally reported that members of the US Air Force about to testify to public hearings are given three pieces of advice: firstly, don’t lie; secondly, don’t try to be funny; and thirdly, don’t panic and blurt out the truth. Although this does not map exactly onto the distinction between front and back versions, it does neatly capture the distinction between telling the whole truth on the one hand and not telling a lie on the other. Any outsider, such as a researcher or analyst, coming into an organisation is likely to be given the front version. Although this may not be dishonest, it is also unlikely to be the whole truth, and the missing information can be extremely important. An extensive literature dating back to Weber (e.g. Weber, 1924) has consistently found that in most organisations there are unofficial short-cuts in working practices which are not officially allowed, but without which the system would be too unwieldy to work. A simple illustration of this is the work-to-rule, a form of industrial action in which the participants follow the official procedures exactly; this usually reduces productivity dramatically. The distinction between front and back versions is not an absolute one, but more of a spectrum. Outsiders may become gradually accepted by the group, and given access to increasingly sensitive back versions of events.

2.4.7 The so-called “stranger on a train” effect is a paradoxical effect, in which people are prepared to discuss extremely personal and sensitive information if the situation is one of anonymity (such as talking to a sympathetic stranger on a train whom one does not expect ever to meet again). This may be used by investigators, but requires careful setting up – for instance, it is advisable to use only a single elicitation session with each respondent, and to make it clear that the respondent will not be identifiable in the published outcome of the research.
2.4.8 Future system knowledge is the term used by Maiden and Rugg to describe knowledge about future systems, in the context of software development; a more appropriate term for general questioning would be “predictive knowledge”. This involves quite different issues from the knowledge types described above. In the case of those knowledge types, the relevant knowledge exists somewhere, and the key problem is accessing it reliably and validly. The term “accessing” is an important one in this context. “Elicitation” describes the process of extracting information from the respondent, via the respondent; however, some types of knowledge, such as tacit knowledge, have to be acquired by indirect induction rather than directly from the respondent. An example would be the use of observation to identify key actions during the performance of a compiled skill; it would in principle be possible to produce a complete and correct description of this skill without the respondent ever knowing what was in the description. In knowledge acquisition, this sort of situation occurs in relation to machine learning, where the salient variables may be identified via explicit elicitation from a human respondent, but the correct weightings and correlations between these variables are then worked out by software. This approach can lead to a system which performs better than the human experts from whom the variables were elicited (Michalski and Chilausky, 1980; Kahneman, Slovic and Tversky, 1982); the reasons for this have important implications for questioning methodology, and are discussed in more detail below. The distinction between elicitation and acquisition is now generally accepted in Artificial Intelligence (AI) and in requirements engineering, with elicitation of knowledge or requirements being recognised as subsets of knowledge acquisition or requirements acquisition respectively.

2.5 Predicting requirements and behaviour. When a new product is being developed, it is not normally possible for any single individual to predict what the requirements will be. One reason for this is that usually more than one stakeholder is involved, leading to the need for negotiation of requirements between stakeholders. Another reason involves what is known in Information Science as the Anomalous State of Knowledge (Belkin, Oddy and Brooks, 1982). An Anomalous State of Knowledge (ASK) exists when a person wants something (e.g. a relevant reference or a new system), but does not have enough knowledge of the possibility space to know what is possible and what could therefore meet their requirements. This is particularly striking in the case of software development, where users may be utterly unaware of what is technically feasible, and may dramatically alter their requirements when they see what can be done. A third major reason for problems in identifying future needs involves people’s weakness in predicting future events and behaviours. This is well recognised in attitude theory, where it has long been known that people’s expressed attitudes correlate weakly at best with their actions (e.g. Wicker, 1969). The same principle applies to people’s predictions about their own behaviour in situations such as seeing smoke come from underneath the door in a waiting room.
Some personality theorists have gone so far as to argue that an individual’s own predictions about their behaviour in a given situation are no higher in validity than the predictions of someone else who knows that person well, and that our mental models of our personalities are derived from observation of our own behaviour, rather than being the cause of that behaviour. Although more recent research has shown that it is possible to reduce significantly the gap between expressed attitudes and actual behaviours by concentrating on key variables in the research design and the data collection (Myers, 1990), the gap is still a long way from closed, and the topic needs to be addressed with care. This issue may well be a subset of a more general principle, namely human weakness in dealing with multivariate information. A considerable literature in judgement and decision making has consistently found that humans are bad at identifying randomness in multivariate data, with a corresponding tendency to see correlations and patterns where none exist (Kahneman, Slovic and Tversky, 1982). When correlations and patterns do exist, people are consistently poor at weighting the variables correctly. An elegant example of this is a study by Ayton (1998) involving the prediction of football scores. The first part of this study involved asking British football fans, and Turkish students with no interest in football, to predict British football results. The result was that the Turkish students performed at a similar level to the British fans, at well above the level which would be expected by chance. The Turkish students were using the only information available to them, namely whether or not they had heard of the teams or of the towns where they were based. These tended to be the larger and/or more famous examples, and these tended to beat smaller or less famous rivals. This effect was a strong one, and the other variables used in predictions by the British fans were comparatively weak predictors; the British fans, however, weighted these other variables too heavily in relation to the main one. An obvious way of dealing with this problem, and one already used in knowledge acquisition, is to use elicitation techniques to identify the salient variables, and then to use statistical or computational techniques to identify the appropriate weightings for these variables. This approach seems to have been comparatively little used in the social sciences, although multivariate approaches are routinely applied to the variables identified by the researchers involved. If human weakness in handling multivariate data is as prevalent as it appears, then attempts to extract accurate predictions from people will usually be attempts to find something which does not exist, and will therefore be a waste of time and effort.
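By way of illustration, the following Python sketch shows this division of labour in miniature, with invented data, and with ordinary least squares standing in for whatever statistical or computational technique is appropriate: elicitation supplies the salient variables and the past cases, and the software supplies the weightings.

    import numpy as np

    # Each row is a past case, described in terms of the salient variables
    # identified by elicitation (here: an intercept term plus two invented
    # variables, e.g. team fame and home advantage); values are illustrative.
    cases = np.array([[1.0, 0.9, 0.2],
                      [1.0, 0.1, 0.8],
                      [1.0, 0.5, 0.5],
                      [1.0, 0.7, 0.1]])
    outcomes = np.array([1.0, 0.0, 0.6, 0.9])  # observed results for those cases

    # The weightings are estimated from the historical record, rather than
    # from the respondents' own (typically mis-weighted) judgements.
    weights, *_ = np.linalg.lstsq(cases, outcomes, rcond=None)
    print(weights)  # one fitted weighting per elicited variable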
It should be noted as a parenthesis that, although the findings on human judgement and decision making (J/DM) described above are reliable and robust, there has been debate about their validity. The naturalist school of J/DM research argue that the effects reported by the “heuristics and biases” school are largely artefacts of the statistical representation which that school uses. The “heuristics and biases” school have generally used a probabilist presentation, i.e. one involving probability judgements, when framing the experimental task. Researchers such as Gigerenzer argue that if the same task is reframed in a frequentist format, i.e. one involving frequency judgements, then the biases and distortions described above no longer occur (Gigerenzer, 1994). This debate is unlikely to be resolved in the near future, and is closely linked with a long-running debate in statistics about the relative meaningfulness and validity of probabilist and frequentist representations. It is likely that future research will identify further types of memory and knowledge filter; for instance, the authors are currently investigating the potential semi-tacit category of “not worth mentioning” knowledge, and intend to investigate tacit knowledge in more detail.

3 SELECTING AND INTEGRATING TECHNIQUES

It is clear from the account above that no single technique is likely to be able to deal with all the types of knowledge involved in any given situation. Selection and integration of the appropriate techniques is therefore necessary. There are various facets on which selection and integration can be described, such as the knowledge types involved, the equipment needed, and input and output formalisms. For brevity, only selection and integration on the basis of knowledge type are described in any detail here. Table 1 below is not exhaustive, nor set in stone; its main function is to provide a clear overview of the recommendations arising from the analysis of knowledge types and of techniques above. The reasons for the recommendations should be clear from the preceding text.

Table 1: Recommended and contra-indicated techniques for handling each knowledge type.

Knowledge type | Recommended technique(s) | Contra-indicated technique(s)
Predictive knowledge | Any technique, but problems with validity | None
Non-tacit knowledge | Any technique, but there may be problems with validity of memory | None
Semi-tacit knowledge: | |
  Short term memory | On-line self-report | All others (see list in section 1.2)
  Recall v. recognition | Techniques involving showing examples to the respondent (e.g. reports, picture sorts, item sorts) | Techniques which do not involve showing examples to the respondent (e.g. interviews)
  Taken for granted knowledge | Observation; laddering | All others
  Preverbal construing | Repertory grid; card sorts; laddering; possibly reports and interviews if handled with care | All others
  Front and back versions | Observation; possibly interviews, critical incident technique and reports once good rapport has been established with respondent | All others
Tacit knowledge: | |
  Compiled skill | Observation and experimentation | All others
  Implicit learning | Observation and experimentation | All others
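The recommendations in Table 1 lend themselves to a simple lookup structure. The following Python sketch is one possible rendering, with the entries abbreviated; the table itself remains the authoritative statement of the recommendations:

    # Table 1 as a lookup structure (abbreviated entries).
    RECOMMENDED = {
        "predictive knowledge":        ["any technique (validity problems)"],
        "non-tacit knowledge":         ["any technique (memory validity problems)"],
        "short term memory":           ["on-line self-report"],
        "recall v. recognition":       ["reports", "picture sorts", "item sorts"],
        "taken for granted knowledge": ["observation", "laddering"],
        "preverbal construing":        ["repertory grid", "card sorts", "laddering"],
        "front and back versions":     ["observation", "interviews once rapport is established"],
        "compiled skill":              ["observation", "experimentation"],
        "implicit learning":           ["observation", "experimentation"],
    }

    def techniques_for(salient_types):
        # Collect the recommended techniques for the knowledge types judged
        # salient in the situation being investigated.
        return {technique for k in salient_types for technique in RECOMMENDED[k]}

    print(techniques_for(["taken for granted knowledge", "compiled skill"]))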
One important part of questioning is the identification of which knowledge types are most salient in the situation being investigated. Practical considerations of time and resources usually limit the amount of investigation which can be undertaken, so it is important to identify the most important aspects of the situation and to choose the appropriate techniques for them. A certain amount of information can often be gained informally during the initial meetings with potential respondents, gatekeepers and other members of the organisation when a study is being set up. If the research is to take place in a commercial company, for instance, it is often possible to use direct and indirect observation when on the way to the contact person’s office – for instance, of the demeanour of the staff, the information and other resources available to them (e.g. manuals on desks) and the speed with which they perform tasks. Demonstrations of tasks allow the identification of tacit knowledge; a standard indicator of this is that the demonstrator is able to talk while performing the task, with the conversation ceasing whenever conscious thought is required to perform the task. This kind of information is difficult or impossible to gather using preliminary interviews; however helpful the respondents are, they will omit to mention taken for granted knowledge, and will probably never have noticed the extent to which they use tacit knowledge. This issue is discussed in more detail in the case studies described below. Once the types of knowledge involved have been identified, it is then possible to start prioritising the topics which need to be investigated further, and to select the appropriate techniques to handle the knowledge involved. It is advisable to proceed this way round, rather than selecting the issues first and then profiling the knowledge involved, because the profiling may well reveal serious misconceptions in the elicitor’s initial model of the area. An effective demonstration of this is to ask a geologist to give an on-line self-report on how they identify a rock specimen, leading up to identifying it, and then to follow this immediately by asking the same geologist to identify a rock specimen and then explain how they knew that it was the stated type of rock. For the second task, experienced field geologists will usually be able to identify a rock before the elicitor has finished putting it on the table; the on-line self-report, however, can go on for as much as half an hour. It is clear that the actual identification is accomplished by some form of tacit knowledge (in this case, pattern matching) and that the tasks described in the on-line self-report are a reconstructed version of how to proceed, used only for teaching students or for difficult specimens. Such differences can easily mislead the inexperienced elicitor who depends on initial briefing interviews; a moment spent in observation is seldom wasted.

3.2 Terminology

One historical legacy of the separate evolution of elicitation techniques is that there has been only partial and unsystematic transfer of concepts across techniques and disciplines, so that concepts viewed as indispensable in one area are practically unknown in another. This section describes a range of concepts which are relevant across disciplines and techniques, and which are among the conceptual tools of questioning methodology as an integrated discipline. The terminology described below derives from a variety of sources, but primarily from knowledge representation, which is a fairly recent but well established and extensive field within Artificial Intelligence. A good introduction is provided by Reichgelt (1991). Knowledge representation is also important in other areas of computing, such as requirements engineering (Jarke, Pohl, Jacobs, Bubenko, Assenova, Holm, Wangler, Rolland, Plihon, Schmitt, Sutcliffe, Jones, Maiden, Till, Vassilou, Constantopoulos and Spandoudakis, 1993). A full description of the topic is beyond the scope of this paper; however, it provides an important basis for a metalanguage for questioning methodology. One significant advantage of using this literature as a foundation is that there has been considerable work on the formal semantics of the various representations used. This allows a more systematic, clean and rigorous terminology than would otherwise be the case.
The following account draws heavily on this literature, with additions from other literatures where appropriate.

3.2.1 Validity and reliability

An important initial distinction is between validity and reliability, used here in the sense in which the terms are employed in statistics and experimental design. “Validity” describes the extent to which what is elicited corresponds to reality; “reliability” describes the extent to which the same finding occurs repeatedly, whether between different elicitors, different respondents, different occasions, or whatever other variable is involved. The standard metaphor is target shooting, where “validity” refers to how near the bullets are to the target, and “reliability” refers to how near the bullets are to each other. Bullets may be near to each other while very distant from the target, which is generally less desirable than the converse; however, it is usually easier to assess reliability than validity, and it is tempting to hope for the best if the results are reliable. An everyday example of this is the Father Christmas effect. If a number of respondents are separately asked to describe Father Christmas, then their accounts are likely to agree closely (white bearded man, somewhat overweight, in long red coat and hood with white trim – probably a more detailed description than in many crime reports). However, this reliability does not mean that there is a real Father Christmas, only that there is a widely known stereotype, which all adult respondents know to be fictitious. Human memory is subject to numerous distortions, biases and imperfections, and should therefore be treated with caution. The clarity and detail of a memory are not valid indicators of its accuracy. Distortions can be significant, such as complete reversals of a sequence of events. There is a considerable literature on this topic, dating from Bartlett’s early work (Bartlett, 1932) to more recent work by, for example, Loftus and Palmer (1974) and Baddeley (1990). Robust findings include the active nature of memory, which involves encoding of events into memory rather than passive recording of them. This encoding frequently leads to schematisation of the memory so that it fits into a familiar schema, even though this may involve a reversal of the sequence of events, or of the roles of the participants involved.
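The target shooting metaphor can be made numeric. In the following Python sketch, with invented values, the shots agree closely with each other but not with the target – the Father Christmas case of high reliability and low validity:

    from statistics import mean

    target = 0.0
    shots = [4.1, 3.9, 4.0, 4.2]  # tightly clustered, but far from the target

    validity_error = abs(mean(shots) - target)    # distance from the target: 4.05
    reliability_spread = max(shots) - min(shots)  # spread among shots: about 0.3

    # High reliability (small spread) coexisting with low validity (large error).
    print(validity_error, reliability_spread)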
3.2.2 Category theory and fuzzy representations

Categorisation is an important part both of everyday cognition and of expertise. Categories are usually defined in terms of the set of attributes which are specific to the category in question – for instance, the category “bird” in lay language is defined in terms of having feathers, being able to fly, making nests and laying eggs. However, many categories are not watertight, in the sense of having no exceptions or ambiguities, and there may be similar uncertainty about the individual attributes. In the case of birds, for instance, penguins do not fly, most reptiles also lay eggs, and some penguins do not make nests. Within individual attributes, an attribute may be defined in terms of several sub-components, and these, like the attribute itself, may be “fuzzy” attributes. This term refers to attributes whose applicability is not a clear-cut “either-or” issue, but rather a question of extent. The concept “tall”, for instance, applies strongly to someone two metres high, but there is no unambiguous cut-off point at which a height is described as “average” rather than “tall”. This lack of precision, however, does not stop the attribute from being meaningful; it means, rather, that the metalanguage needed to describe it must be sufficiently sophisticated. Category theory, and more specifically prototype theory, has been investigated in some depth by Rosch (1983) and other researchers in the same tradition, who use the concept of core membership of a category, with increasing degrees of variation from the prototypical core membership. A robin, in this approach, is a prototypical bird, exhibiting all the usual attributes of membership of the category “bird”; a puffin is less prototypical, and a penguin is on the edge of the category. Various branches of set theory and of formal semantics also deal with the same issue of categorisation, which is an important and ubiquitous one. At a practical level, categorisation has major implications for any bureaucracy, and particularly for a bureaucracy trying to automate its procedures, a point which has been noted since Weber’s research into bureaucracies (Weber, 1924); the same is true for the law. For instance, assessment of welfare entitlements, or of tax liability, often involves a considerable amount of decision-making about the appropriate category in which to put a particular issue; once the category has been decided, the rest of the assessment is comparatively trivial. At a theoretical level, the topic of categorisation is of particular interest to social anthropologists, in terms of the social construction of defining features of social structure, such as in-groups and out-groups. Fuzziness is the topic of an extensive literature on fuzzy logic, dating back to Zadeh’s original work (Zadeh, 1965). This literature uses a mathematical approach to describe degrees of membership of fuzzy sets, and has proved a powerful tool in handling data of this sort. The basic concept is that set membership is quantified on a scale from zero (not a member) to one (completely a member), with intermediate membership being given an intermediate numeric score, such as 0.3 or 0.7.
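By way of illustration, a minimal membership function for the “tall” example above might look as follows (the anchor heights and the linear ramp are invented for the example):

    def tallness(height_m):
        # Degree of membership of the fuzzy set "tall", from 0.0 to 1.0,
        # with a linear ramp between two (invented) anchor heights.
        if height_m <= 1.60:
            return 0.0
        if height_m >= 1.90:
            return 1.0
        return (height_m - 1.60) / 0.30

    for height in (1.55, 1.75, 2.00):
        print(height, round(tallness(height), 2))  # 0.0, 0.5 and 1.0 respectively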
There are also extensive literatures in statistics and in psychology, particularly in judgement and decision-making (J/DM), dealing with areas such as uncertainty, stochastic events, imperfect knowledge and incomplete knowledge, which are different from fuzzy knowledge, but may overlap with it. Uncertainty refers to knowledge which may or may not be true; stochastic events happen or do not happen on a probabilistic basis; imperfect knowledge contains errors; incomplete knowledge is simply incomplete. Thus, for example, a doctor may think that a patient has a particular disease, but not be sure of the diagnosis (uncertainty); the disease may be known to cause delirium at unpredictable intervals (stochastic events); the medical records may contain errors, although the doctor does not know which parts of the records are correct and which are incorrect (imperfect knowledge); and the medical records may not contain any information about one aspect of the patient’s previous health (incomplete knowledge). Each of these has different implications for theory and practice.

3.2.3 Terms from knowledge representation

The standard literature on knowledge representation in Artificial Intelligence deals in depth with formalisms for representing knowledge, including facts, relationships and actions. Although these provide a powerful language for handling the output from elicitation sessions, this is too broad a topic to be covered in detail in this paper, so only an outline is given below. Three well-established formalisms for representing relationships are nets, frames and rules. Nets, i.e. semantic networks, have the advantage of considerable flexibility in handling different types of relationship (e.g. “is-a” and “part-of” links), but the disadvantages of unclear semantics and of lack of structure. Frames involve a slot and filler notation, in which the various relevant semantic categories are listed in advance and then filled in for each instance being described. These have the advantage of clarity and completeness, but the disadvantage of rigidity. Rules represent information in terms of conditions and consequences (e.g. IF condition A AND condition B THEN action C). This is useful for representing knowledge about actions, but can lead to problems of obscurity with regard to precedence, concurrency, etc. in large rule sets. Although all of these formalisms are relevant to elicitation, the most immediately relevant is semantic networks, whose terminology is explicitly used in laddering and in category theory (described above). Another set of representations from AI with implications for questioning methodology deals with classes, attributes and inheritance. Classes are categories which may be composed of sub-classes and of sub-sub-classes. Eventually all classes end in instances, i.e. specific, unique entities which belong to that class. A familiar example is zoological classification, in which the class (using knowledge representation terminology) of canids includes the sub-class of dogs, and the sub-class of dogs in turn contains instances consisting of all the dogs in the world. Each class has a set of attributes which define and/or describe it; for instance, the class of mammals includes the attributes of giving birth to live young and suckling the young with milk. The concept of inheritance refers to the situation where a sub-class has not only its own attributes, but also inherits the attributes belonging to any higher-level classes to which it belongs. Although computationally and semantically attractive because of its parsimony and elegance, this concept encounters representational problems with inheritance from different sets of higher-level classes, and with exceptions which over-ride the inherited attributes; it therefore needs to be applied with caution. The classic example is Tweety the bird: the class of “bird” normally has the attribute “able to fly”, but if Tweety is a penguin, then this inherited attribute has to be over-ridden at the level of the class of “penguin” with the attribute “unable to fly”.
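The Tweety example translates directly into code. The following Python sketch shows inheritance with an over-riding exception at the more specific class:

    class Bird:
        can_fly = True        # default attribute of the class "bird"

    class Robin(Bird):
        pass                  # inherits can_fly = True unchanged

    class Penguin(Bird):
        can_fly = False       # exception: over-rides the inherited attribute

    tweety = Penguin()
    print(Robin().can_fly, tweety.can_fly)  # True False: the more specific class wins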
3.2.4 Terms from Personal Construct Theory (PCT)

Personal Construct Theory makes an initial distinction between elements (the entities being described) and constructs (the concepts used to describe them). This distinction is very similar to the distinction in AI between objects and attributes respectively. Considerable emphasis is placed in PCT on the elicitation of respondents’ own categorisations, in the form of elements and constructs. Although elicitation of constructs may appear to a novice to be an endless task, in fact the number of constructs relevant to a particular domain of discourse is usually quite small (usually less than twenty, and often significantly less than that). Part of the reason for this is that the domain of discourse is only relevant to a sub-set of the constructs which the respondent knows; another part is that respondents will explicitly state that they know of more constructs which are applicable, but which are not particularly important. Since the elicited constructs are usually tersely described (two or three words) and tractable in number, it is possible to compare results across different respondents more easily than is the case with interviews, etc., and with more validity than is the case with e.g. questionnaires, which normally impose the elicitor’s constructs on the respondent rather than eliciting the respondent’s own constructs. PCT has an explicitly defined set of terminology and concepts, such as focus of convenience (the core area to which a construct can be applied) and range of convenience (the range of contexts to which a construct can meaningfully be applied). Focus of convenience and range of convenience are the most immediately relevant to questioning methodology, and space prevents a more exhaustive listing, but PCT terminology is an area which could profitably be studied by elicitors working in a range of disciplines and approaches in which it is currently little known, such as discourse analysis. In particular, its combination of flexibility and formalism would make it well suited to areas which have in the past used structuralism or semiotics; PCT is at least as flexible and formalist as these, but considerably richer and better defined. This flexibility is also a factor in the authors’ preference for PCT over approaches such as Q methodology. For instance, the classic Q sort, in which cards are sorted into a predetermined distribution, is diametrically opposed in its approach to the PCT practice of examining a respondent’s repertory grid specifically to see whether the responses show an unusual distribution. One potential link between PCT and grounded theory (Glaser and Strauss, 1967) could repay investigation: grounded theory’s concept of tracing inferencing through a series of levels of abstraction of data has clear similarities to some of the concepts in laddering. In particular, laddering on explanations can be used to check whether concepts have been fully explained, as described below in the section on graph theory.

3.2.5 Graph theory

A relevant literature which is comparatively little known in most non-mathematical disciplines is graph theory. This provides a clear, useful notation for representing knowledge in a way which allows qualitative analysis to be combined with quantitative analysis. The term “graph” in this context refers not to graphs in the sense of plotting sets of values against each other, but to items linked to each other by lines, as in the simplified diagram below.

Figure 1: A three layer graph (a tree whose top-level node A is joined by arcs to nodes A1 and A2; A1 has the leaf-level children A1a and A1b, and A2 has the leaf-level children A2a, A2b and A2c)

In this figure, the top-level node (A) is joined by two arcs (connecting lines) to two lower-level nodes (A1 and A2). Node A1 is joined by two arcs to leaf level (bottom level) nodes (A1a and A1b); the node on the right (A2) is joined by three arcs to leaf level nodes. The graph has a total depth of three levels; the leaf-level nodes are the children of nodes A1 and A2, which in turn are the children of node A. The terms “nodes” and “arcs” are widely used in a range of disciplines in the sense described above, although formal graph theory favours the terms “vertices” and “edges” respectively for the same concepts.
There are various forms of graph, such as trees (graphs in which each node may have an upwards connection to a parent, and downwards connections to one or more children, but no sideways connections to other nodes) and nets (graphs which do not have the hierarchical structure of trees, and in which sideways links may occur). Graphs may be directed (each arc may be followed in one direction only) or undirected (each arc may be followed in either direction). Using a very simple tree as an example, it is possible to see how graphs offer a powerful and flexible formalism for representing relationships. For instance, it is possible to count the layers of nodes in the graph, as an index of hierarchical organisation of structure, or to count the number of nodes at a particular level of the graph, as an index of differentiation and breadth at that point. An obvious application is the study of organisational behaviour, where such indices can be used to describe the structure of the organisation; however, the same concept can be applied to other areas. It has, for instance, been applied to elucidatory depth, i.e. the number of successive layers of explanation needed to reach public domain terms or tacit knowledge (Rugg and McGeorge, 1995), and can be applied in the same way to fabricatory depth, i.e. the number of successive layers in which tools are used to make tools to make tools, as an index of the depth and breadth of a culture’s technological infrastructure (currently being investigated by the authors).

Facet theory, as used by Rugg and McGeorge (1995), is derived largely from graph theory, with the concept of separate trees orthogonal to each other but sharing some or all of the same leaf-level instances. This concept is conveniently similar to the concept of “views” in software engineering, and is becoming increasingly used in that field. A similar concept is well established in information science (Vickery, 1960), though without the same underlying mathematical formalisms. Facet theory makes it possible to describe complex multivariate structures as a set of separate and comparatively simple structures, and is applicable to a wide range of uses. For instance, an organisation may have one tree describing its commercial structure, another describing union membership within it, and another describing responsibility for safety.

3.2.6 Schema theory

One of the features which Bartlett discovered in his research on memory (Bartlett, 1932) was that the processes of memory tend to organise events and facts into regular templates, which he termed schemata. The same underlying concept has been re-worked repeatedly in psychology since then, for instance in the form of script theory (Schank and Abelson, 1977). This phenomenon is important to questioning methodology for two main reasons. The first is that it explains and predicts certain types of error in memory, particularly in recall, which is salient to questioning techniques dependent on the respondent’s memory of the past. The second is that it helps explain the way in which respondents, particularly experts, structure parts of their expertise. This has important implications for the elicitation of information about values and judgements, and can explain apparent inconsistencies in them, although there appears to have been comparatively little work on this.
In the field of software metrics, for instance, the majority of work appears to have concentrated on the elicitation of individual metrics for evaluating software, rather than on finding out which categories respondents use to cluster software into groups, and which metrics are relevant to each of those groups. Similarly, in the domain of car design there are well-established groups of car, such as the town car, the luxury car and the estate car. The metric of “size” is applicable to all of these, but the desired value is very different for the different groups: in the case of a town car, small size is an asset, whereas in the case of a luxury car it is a drawback. Techniques such as laddering are well suited to the elicitation of schemata, and it will be interesting to see what comes of future work using this approach. The field of software design appears to be particularly ready for such work, which would complement the existing literature on customising software to the individual user and on identifying generic user types.

4 METHOD FRAGMENTS

The traditional unit of analysis and discussion in elicitation is the method or technique: for instance, the interview, the questionnaire, or repertory grid technique. There are, however, significant problems with this approach when looking at the bigger picture. One problem is that for most techniques there is no single standard form, so any description of a technique has to include descriptions of its main variants. Another is that the various techniques tend to blur into each other: the distinction between a self-report and an interview in which the respondent uses materials to demonstrate a point, for instance, is hard to draw. A related problem is that the same features may occur in two or more techniques, leading to duplication of description in any systematic account of the techniques. These problems, and others like them, make it difficult to provide a systematic, clear and precise set of descriptions and prescriptions about methods and techniques and their use.

One solution to this problem is to use a finer-grained unit of analysis. Instead of treating each method or technique as an integral whole, one can instead treat it as being composed of a number of sub-components. For instance, in scenarios the elicitor uses a prepared set of information for the respondent; the elicitor then asks natural language questions; the respondent answers using natural language responses. This is quite different from the structure of a repertory grid session, where the elicitor uses a prepared grid format, and encourages the respondent to identify constructs which describe the elements used in the grid. It does, however, share some components with a structured interview, which also involves natural language questions and natural language responses. We introduce the term “method fragments” to describe these sub-components. Method fragments can be identified at various levels of granularity. The coarsest-grained level consists of fragments such as “natural language question”, with finer-grained levels such as “natural language question about a future event” and “natural language question phrased as a probability value”. Method fragments have obvious practical advantages in any systematic work involving elicitation techniques and methods. They can be used to reduce or remove repetition when two or more techniques share common method fragments.
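The fragment-based view also lends itself to simple mechanical treatment. The sketch below (Python, purely for illustration; the catalogue of fragments and techniques is abbreviated from the examples above, and a real catalogue would be far finer grained) treats each technique as a set of fragments, from which the shared fragments can be identified automatically:

    from collections import Counter

    # Each technique described as a set of method fragments.
    techniques = {
        "structured interview": {"natural language question",
                                 "natural language response"},
        "scenario":             {"prepared information set",
                                 "natural language question",
                                 "natural language response"},
        "repertory grid":       {"prepared grid format",
                                 "construct elicitation"},
    }

    # Count how many techniques use each fragment; fragments used by two or
    # more techniques need to be described only once.
    counts = Counter(f for fragments in techniques.values() for f in fragments)
    shared = sorted(f for f, n in counts.items() if n > 1)
    print(shared)   # ['natural language question', 'natural language response']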
Where two or more techniques share a fragment, it is only necessary to cover that fragment once, and to state which techniques involve it. Method fragments can also be used in a “pick and mix” way to create an appropriately customised variant of a technique, or a blend of two or more techniques, to fit a particular situation. One of our recent student projects, for instance, involved asking respondents to say what was going on in a photo, then following this up with a short set of previously prepared questions, the responses to which were in turn probed using laddering. These fragments allowed investigation of attributional effects via the reports on the photos (for instance, investigation of how women’s status was perceived in photos where the women were using IT equipment), which could then be compared with the accounts obtained via the interviews; the laddering allowed identification of attributes which were perceived as status markers, which could in turn be compared with the results from the other two fragments.

A more profound advantage is that the use of fine-grained method fragments makes it possible to provide grounded advice about the use of appropriate formats. In the case of “natural language question phrased as a probability value”, for instance, there is a considerable literature within the Judgement and Decision-Making (J/DM) area of psychology dealing with the various cognitive biases which are associated with probabilistic and frequentist presentations of the same underlying question. Similarly, the literature on attribution theory provides strong guidance about the outcomes of phrasing a question in the second or the third person (“What would you do…” versus “What would most people do…”). Although it might be thought that the number of potential method fragments would be enormous, our initial work in this area suggests that the number is in fact quite tractable. Our research so far has been both bottom-up, working from practical experience towards theory, and top-down, working from theory towards practice. There is still a considerable amount of work to be done in this area, but it holds great potential.

5 CASE STUDIES

5.1 Bulk carriers

The first case study described here was one of the precipitating events leading to the development of the framework described above. The case study involved following the development of software to be used in the loading of bulk carrier ships. As part of this process, one of the authors wanted not only to interview the software development team, but also to observe them in action, and to observe loading in progress. The interviews were unproblematic, but there were practical and security problems with access to the loading. During the negotiations about this, the software developers decided to undertake their own visit to observe loading, since their knowledge of the process came from requirements given to them as documentation. The developers soon found several important aspects of the loading process which had serious implications for system design, but which had not been mentioned anywhere in their documentation. For instance, the developers had assumed that loading would occur at a fairly constant rate, making it possible to predict loading strains on the hull reasonably well in advance; this assumption turned out not to be correct. It also transpired that hull stresses could change from safe to dangerous very quickly if the cargo being loaded was a dense one, such as iron ore, since a large weight of cargo could be loaded in a short time.
The developers had also not realised how much noise, glare and vibration were associated with the loading process, which had serious implications for the design of any computer-based warning system. In this example, the system analysis had been carried out competently by professionals, but had failed to record several important facts in the documentation. These facts were discovered in less than an hour by developers with no formal training in observation, leading one to wonder how many more might have been uncovered by a trained specialist. Interestingly, all the missing factors in this example appear to have been cases of taken for granted knowledge.

5.2 Industrial printing

The second case study was undertaken by Blandford and Rugg (in preparation) as part of an assessment of the feasibility of integrating requirements acquisition for real-world software systems with usability evaluation in general and Programmable User Models in particular. The domain involved was the industrial printing of, for example, sell-by dates onto products; the company involved was a market leader in the field. The case study consisted of two main phases, the first of which was undertaken at the company’s premises, and the second at a client’s food processing factory, where the equipment could be seen in action. The first phase involved interviews with stakeholders, conducted separately to identify any differences between stakeholders with regard to requirements, and also demonstrations of the equipment, which were combined with observation and on-line self-report. The demonstrations showed that the demonstrators did not use the equipment often enough to have compiled the skills involved in using it, and also that using it was not a trivially easy task. (Since the equipment is normally set up to print a sell-by date a given number of days in the future, and can automatically update the date to be printed, simply showing the user the date being printed is not enough; the equipment also needs to be able to show the length of time by which the date is being offset.) It became clear that user navigation through the equipment, and the security issues associated with the password protection for the equipment, were particularly important potential problems. Programmable User Models were used to identify particular problems which might arise, after which the visit to the client’s site was conducted to see how well these predictions corresponded with reality. The security issue turned out to have been solved by a passive work-around: the equipment was positioned next to packing teams, making it extremely difficult for anyone to use it without authorisation. One device, however, was positioned in an isolated part of the factory, and there had been concerns about its security, as predicted by the authors. (There had been one occasion when a print code had mysteriously changed in the middle of a shift.) The user navigation issue turned out to be an interesting one in several ways. The first phase of investigation had shown that frequency of use of the device would be an important variable (and one where the software development stakeholder and the training stakeholder had different perceptions of how often typical users would use the device).
The authors had predicted that if the device was used frequently enough, then the skills involved would become compiled, and navigation would not be a problem; if the device was used less frequently, however, then navigation would be a problem, with various likely errors. The site manager told the authors that there were different levels of training for the different staff involved, who used the device at various levels of sophistication. He mentioned that he and some of the other senior staff had been on the full training course, and were familiar with the device. This is what would be expected as a front version, and there was the prospect that the back version would be very different. However, when the manager demonstrated a feature of the device, the speed at which he operated it was clearly the result of a compiled skill, which indicated considerable use of the device, and in turn suggested that there was no significantly different back version. For staff who used the device less often, for simple tasks, there were printed “crib sheets” (aides-mémoire) attached to the device. This was an interesting finding, since an earlier interviewee had told the authors that this approach would not be used in the food industry, because of the need to clean the outside of the device frequently to comply with health and safety regulations.

In addition to these expected issues, some serendipitous findings emerged. It had been expected that observation would identify issues missed in the previous sessions, but it was not possible to predict what these would be. An example was that the air in the second site contained a high proportion of suspended dust particles from the dried foods being processed. This was not directly relevant to the requirements for the equipment design, but had important indirect implications. The amount of dust was sufficient to make wearing spectacles inconvenient, since the lenses soon became dusty. A significant proportion of the staff on site were middle-aged, and needed glasses to read small text, such as that on the device’s display screen. Since the site dealt with food processing, health and safety legislation meant that staff had to wear white coats. This combination of factors meant that for a significant proportion of staff, checking the display on the device involved taking spectacles out from under a white coat, putting them on, reading the display, cleaning the spectacles, and then putting them away again. This in turn meant that it was not possible to depend on staff glancing at the display in passing as a routine method of checking the device, with consequent implications for working practices. Although the client had a long relationship with the company, and the sales representative who accompanied the authors to the site was on good terms with the manager and clearly knew the site well, there had been no previous mention of the dust issue and its implications. The most likely explanation was, once again, taken for granted knowledge which had gone unmentioned and undetected until observation was used.

5.3 Women’s working dress

A study of perceptions of women’s clothing at work used card sorts to investigate the categorisation of women’s working dress by male and female respondents (Gerrard, 1995). This is an area which had previously been investigated by other researchers using a range of familiar techniques.
However, there had not been any previous work using card sorts, which appeared to be a particularly appropriate technique for this area. In the study, each card held a different picture of a set of women’s clothing worn by a model. Respondents were asked to sort the cards repeatedly into groups of their choice, using a different criterion to categorise all of the cards each time (individual cards could be sorted into a group such as “not applicable” or “don’t know” if necessary). One finding was that half of the male respondents, but none of the female respondents, used the criterion of whether the women depicted were married or unmarried. This was something which had not emerged as an issue in any of the previous research in this area. It was also of interest because the pictures did not show the faces or the hands of the models (because of the risk of distraction from cues other than the clothing itself); the respondents were therefore unable to see wedding rings or other indications of marital status, and were categorising solely on the basis of the clothing.

5.4 Change at work

Management of change is a topic which has received considerable attention from researchers and practitioners. Change of apparently trivial factors can have knock-on implications which connect to high-level values and goals in those affected by the change, which can in turn lead to strong emotions and often to resistance to the proposed change. This appeared a particularly suitable area for investigation via laddering, and was investigated using both laddering and a questionnaire in the same organisation, which was about to bring in a new IT system (Andrews, 1999). The results obtained via the two techniques had some similarities; for instance, the theme of improved communication via the proposed new IT system ran through the responses from both techniques. However, there were also some striking differences. For example, only 5% of the respondents stated in questionnaires that the new technology would affect their job security, whereas this was explicitly mentioned by 43% of the respondents when laddering was used. Another interesting result emerged during the quantitative analysis of the laddering results. This involved counting the average number of levels of higher-level goals and implications of the new system elicited from respondents in different positions in the organisation. The average number of levels used by respondents in higher positions in the organisation was 1.9, whereas respondents lower in the organisation used an average of 3.7 levels. This result is counter-intuitive, since one would expect the more senior staff to have thought through more implications than the less senior staff. What was happening, however, was that the more senior staff often proceeded directly to the implications for the organisation, whereas the less senior staff first proceeded to the implications for themselves personally, then moved to the implications for the organisation, and then proceeded to further implications for themselves personally.

Case studies: summary

The case studies are simply case studies; wholesale testing of the framework will be a much larger operation. However, it is significant that in all cases important issues were missed by previous work, and emerged only when different questioning techniques were introduced to the field, as predicted by the framework.
It is also interesting that taken for granted knowledge, missed by interviews, featured prominently in the first two cases. The problem of missing knowledge cannot simply be resolved by using observation in addition to whichever other technique the elicitor happens to favour; although observation happened to be an appropriate method in two of these case studies, there are other situations where it is impractical or impossible. An example of this occurred when one of the authors was investigating staff and student perceptions of what constituted a good dissertation. Staff and students agreed that presentation was an important factor, but elucidation via laddering of what was meant by “good presentation” showed that students interpreted the term quite differently from staff. The systematic nature of laddering made it possible to uncover the different interpretations in a way which would not have been practical via observation (which would have required an enormous range of examples of dissertations, and even then could not have been guaranteed to identify all the rare features). It also improved the chances of identifying that there were different interpretations of the same term: because laddering usually proceeds downwards until a term has bottomed out, it elicits a fairly full description of how a term is being used. Interviews can do this, but it is not an inherent feature of interviews per se, and the degree of elucidation is normally decided by the preferences of the interviewer rather than by any systematic principle. Selection and integration of techniques is clearly a critical factor in eliciting valid, reliable information, and needs to be considered carefully.

6 DISCUSSION

It should be clear from the examples above that questioning methodology spans a wide range of areas, and that a considerable amount of work remains to be done. The following sections discuss the main issues involved.

6.1 Questioning methodology

It is clear that choice of questioning technique is something which draws on a wide body of findings from numerous disciplines, and which is not trivially simple. It would therefore make sense to treat questioning methodology as a field in its own right, analogous and complementary to statistics and survey methods. The commonality of methodological and theoretical issues in questioning across disciplines is sufficient to make cross-fertilisation both possible and desirable. It is also clear that no single technique is adequate for handling the full range of knowledge types likely to be encountered, and that elicitors should expect to use more than one technique. Choice of the appropriate technique is an important issue, and there is a need for a guiding framework which is empirically validated and theoretically grounded in knowledge types and information filters, rather than a hopeful collection of ad hoc rules of thumb.

6.2 Validation

The framework described here is built of components which are individually validated, but this does not mean that the way in which they have been assembled is necessarily valid. An important piece of future work is validation of the framework, so that errors can be identified and corrected. The authors are currently working on this, via a combination of case studies and formal experiments. Initial results from case studies are consistent with the predictions of the framework, particularly in the case of semi-tacit knowledge; the role of taken for granted knowledge has been particularly striking.
6.3 Training needs

A practical point arising from the discussion above is that if the framework is validated by further work, then elicitors will need to be trained in the relevant techniques before undertaking questioning work. This has serious implications in terms of training needs, and it is likely that questioning courses, analogous to statistics courses, would need to be set up as a routine part of academic and industrial infrastructure. The authors’ experience is that such a course is feasible, especially if the initial emphasis is on providing an overview of the framework and the main issues, with more detailed coverage of the particular techniques needed for an individual project.

6.4 Cross-discipline work

One of the reasons that a framework has taken so long to emerge is almost certainly that the relevant knowledge was scattered across so many different disciplines that most researchers would never see enough of it to obtain an overview. The brief descriptions above of the various techniques and concepts do not do justice to the wealth of knowledge which has been built up in the different disciplines, and there is much to be gained from the exchange of knowledge across disciplines. The need is not only for exchange of information, but also for comparisons across disciplines, domains and cultures, to assess the degree of commonality and of difference which exists across them. Repertory grid technique, for instance, makes clear and explicit assumptions about some aspects of human cognition; it would be interesting, and very much in the tradition of PCT, to see whether these hold true across different cultures.

6.5 Future work

The most immediate and obvious need for further work involves empirical validation of the framework described above. Although the framework is composed of established parts, this does not guarantee that the way in which they have been fitted together is correct. Validation of this sort requires large data sets; initial results from case studies indicate that the framework is sound and provides useful guidance. One striking effect in the case studies has been the prominence of taken for granted knowledge as a source of missed requirements when only interviews are used. Another interesting feature is the frequent use of tacit knowledge by experts, often as a sub-component of a wider skill or task (described in more detail below). An area which is attracting increasing attention is the elicitation of information across cultures – for instance, when a high-cost product such as a building or an aircraft is being designed and built for a client from another culture. Even within a single country, different organisations can have quite different corporate cultures, with different implications for the suitability of a particular product to their context: this has been the subject of considerable work, much of it by researchers following the sociotechnical approach pioneered by groups such as the Tavistock Institute. The framework described above provides a structured and systematic way of approaching such problems, but does not in itself guarantee that the appropriate techniques exist to solve them. One promising approach to such problems is the use of laddering. Laddering allows the elicitor to break down terms used by the respondent into progressively more specific components, until the explanation “bottoms out” at a level which cannot be explained further. The components at this level may be of several types.
One type is externally observable features, such as size and colour; the other main type involves pattern matching in the broadest sense (shape, texture and sound), which may in turn be either public domain pattern matching (i.e. using patterns known to the lay public) or expert pattern matching, which is often associated with implicit learning and compiled skills. These components can then be compared across respondents. A similar approach can be used to tackle issues such as script theory and chunking, where different respondents have different ways of grouping items together into higher-level structures; these structures differ between disciplines and professions. The usual example of a script (Schank and Abelson, 1977) is a series of actions linked in a predictable way, each of which may in turn be composed of several sub-actions; eating at a restaurant, for instance, usually consists of several main actions, such as “booking a table”, “hanging up coats” and “ordering”. “Chunking” is a similar concept involving the grouping together of individual items into a higher-level group. One of the major differences between experts and novices is the extent to which series of actions are scripted or chunked. In field archaeology, for instance, the script of “drawing a section” (i.e. a cross-section through an archaeological feature being excavated) is composed of a large number of sub-tasks, such as establishing a reference height relative to the site’s temporary bench mark, with each of these in turn being composed of other lower-level tasks, such as setting up the surveying equipment. It is possible to elicit these scripts and chunks using laddering, and then to use graph-theoretic approaches to show their number and nature, thus combining qualitative and quantitative analysis; a minimal sketch is given at the end of this section. Interestingly, the concept of script theory can be applied to the concept of agenda setting in discourse analysis, as an example of implicit or explicit debate about the script to be used by participants in the discourse. It should in principle be possible to use the same laddering-based approach as described above to investigate the nature and number of scripts available for a particular situation, and to approach the area of social interaction from a social cognition perspective.
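As with the tree in Figure 1, an elicited script can be treated as a graph and analysed quantitatively. The sketch below (Python, purely for illustration) encodes the restaurant script as a nested structure; the sub-actions of “ordering” are our own invention, added solely to show a second level of chunking:

    # The restaurant script as a nested structure of (action, sub-actions) pairs.
    # The sub-actions of "ordering" are invented here purely for illustration.
    script = ("eating at a restaurant", [
        ("booking a table", []),
        ("hanging up coats", []),
        ("ordering", [
            ("reading the menu", []),
            ("telling the waiter the order", []),
        ]),
    ])

    def count_actions(node):
        # Total number of actions and sub-actions in the script.
        name, subs = node
        return 1 + sum(count_actions(s) for s in subs)

    def depth(node):
        # Number of levels of sub-actions: an index of chunking.
        name, subs = node
        return 1 if not subs else 1 + max(depth(s) for s in subs)

    print(count_actions(script))   # 6 actions in all
    print(depth(script))           # 3 levels

The same counts could then be compared across respondents, so that, for instance, an expert’s deeply chunked script could be distinguished quantitatively from a novice’s flatter one.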
6.6 Conclusion

In the past, questioning methodology was a Cinderella discipline compared to the elegant sisters of statistics and survey methods. It is now clear, however, that questioning methodology is as important and as rich a discipline as its sisters. The next steps are the traditional ones for a newly emerging discipline: the bringing together of knowledge from the parent disciplines, the establishment of new research agendas and approaches, and the setting up of the infrastructure to support this, in such forms as workshops, textbooks, conferences and journals. It will be interesting to see what emerges from the work ahead: whether any previously intractable problems turn out to be tractable after all, and what new challenges appear to take their place. Traditionally, living in interesting times was treated as a curse, but in academia, living in interesting times is what every researcher hopes for. This certainly appears to be the most likely future for researchers in questioning methodology.

BIBLIOGRAPHY

Anderson, J.R. The Adaptive Character of Thought. Erlbaum, Hillsdale, N.J., 1990
Andrews, S. An assessment of end user attitudes and motivation towards new technologies in the workplace and the behaviours arising from them. Unpublished undergraduate thesis, University College Northampton, 1999
Ayton, P. Personal communication, April 1998
Baddeley, A.D. Human Memory: Theory and Practice. Lawrence Erlbaum Associates, Hove, 1990
Bannister, D. and Fransella, F. Inquiring Man. Penguin, Harmondsworth, 1980
Bartlett, F.C. Remembering: A Study in Experimental and Social Psychology. Cambridge University Press, Cambridge, 1932
Belkin, N.J., Oddy, R.N. and Brooks, H.M. ASK for Information Retrieval: Part I, Background and Theory. Journal of Documentation, 38(2), pp. 61-71, 1982
Blandford, A. and Rugg, G. Integration of Programmable User Model Approaches with Requirements Acquisition: A case study. In preparation
Boose, J.H., Shema, D.B. and Bradshaw, J.M. Recent progress in AQUINAS: A knowledge acquisition workbench. Knowledge Acquisition, 1, pp. 185-214, 1989
Chi, M.T.H., Glaser, R. and Farr, M.J. (eds.) The Nature of Expertise. Lawrence Erlbaum Associates, London, 1988
Cortazzi, D. and Roote, S. Illuminative Incident Analysis. McGraw-Hill, London, 1975
Denzin, N.K. and Lincoln, Y.S. (eds.) Handbook of Qualitative Research. Sage, London, 1994
Ellis, C. (ed.) Expert Knowledge and Explanation: The Knowledge-Language Interface. Ellis Horwood, Chichester, 1989
Eysenck, M.W. and Keane, M.T. Cognitive Psychology. Psychology Press, Hove, 1995
Fransella, F. and Bannister, D. A Manual for Repertory Grid Technique. Academic Press, London, 1977
Gerrard, S. The working wardrobe: perceptions of women’s clothing at work. Unpublished Master’s thesis, London University, 1995
Gigerenzer, G. Why the distinction between single event probabilities and frequencies is important for psychology (and vice versa). In Wright, G. and Ayton, P. (eds.) Subjective Probability. John Wiley and Sons, Chichester, 1994
Glaser, B.G. and Strauss, A.L. The Discovery of Grounded Theory. Aldine, New York, 1967
Goffman, E. The Presentation of Self in Everyday Life. Doubleday, New York, 1959
Grice, H.P. Logic and Conversation. In Cole, P. and Morgan, J.L. (eds.) Syntax and Semantics 3. Academic Press, New York, 1975
Hinkle, D. The change of personal constructs from the viewpoint of a theory of construct implications. Unpublished PhD thesis, Ohio State University, 1965. Cited in Bannister, D. and Fransella, F. Inquiring Man. Penguin, Harmondsworth, 1980
Honikman, B. Construct Theory as an Approach to Architectural and Environmental Design. In Slater, P. (ed.) The Measurement of Interpersonal Space by Grid Technique, Volume 2: Dimensions of Interpersonal Space. John Wiley and Sons, London, 1977
Jarke, M., Pohl, K., Jacobs, S., Bubenko, J., Assenova, P., Holm, P., Wangler, P., Rolland, C., Plihon, V., Schmitt, J., Sutcliffe, A.G., Jones, S., Maiden, N.A.M., Till, D., Vassiliou, Y., Constantopoulos, P. and Spanoudakis, G. Requirements Engineering: An Integrated View of Representation. In Sommerville, I. and Paul, M. (eds.) Proceedings of the 4th European Software Engineering Conference, Garmisch-Partenkirchen, 1993. Springer-Verlag, Lecture Notes in Computer Science 717, pp. 100-114
Kahneman, D., Slovic, P. and Tversky, A. (eds.) Judgement under Uncertainty: Heuristics and Biases. Cambridge University Press, Cambridge, 1982
Kelly, G.A. The Psychology of Personal Constructs. W.W. Norton, New York, 1955
Loftus, E.F. and Palmer, J.C. Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning and Verbal Behavior, 13, pp. 585-589, 1974
Maiden, N.A.M. and Rugg, G. ACRE: a framework for acquisition of requirements. Software Engineering Journal, pp. 183-192, 1996
Mead, M. Coming of Age in Samoa. William Morrow, New York, 1928
Michalski, R.S. and Chilausky, R.L. Learning by being told and learning from examples: an experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. International Journal of Policy Analysis and Information Systems, 4, pp. 125-161, 1980
Miller, G.A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, pp. 81-93, 1956
Myers, D.G. Social Psychology (3rd edition). McGraw-Hill, New York, 1990
Neves, D.M. and Anderson, J.R. Knowledge compilation: mechanisms for the automatization of cognitive skills. In Anderson, J.R. (ed.) Cognitive Skills and Their Acquisition. Erlbaum, Hillsdale, N.J., 1981
Norman, D. The Design of Everyday Things. Doubleday/Currency, New York, 1990
Patrick, J. A Glasgow Gang Observed. Eyre Methuen, London, 1973
Reichgelt, H. Knowledge Representation. Ablex Publishing Corp., Norwood, 1991
Reynolds, T.J. and Gutman, J. Laddering Theory, Method, Analysis, and Interpretation. Journal of Advertising Research, February-March 1988, pp. 11-31
Rosch, E. Prototype Classification and Logical Classification: The Two Systems. In Scholnick, E.K. (ed.) New Trends in Conceptual Representation: Challenges to Piaget’s Theory. Lawrence Erlbaum Associates, Hillsdale, N.J., 1983
Rosenhan, D.L. On Being Sane in Insane Places. Science, 179, January 19, 1973
Rugg, G. and McGeorge, P. Laddering. Expert Systems, 12(4), pp. 339-346, 1995
Rugg, G. and McGeorge, P. The sorting techniques: a tutorial paper on card sorts, picture sorts and item sorts. Expert Systems, 14(2), 1997
Schank, R.C. and Abelson, R.P. Scripts, Plans, Goals and Understanding. Lawrence Erlbaum Associates, Hillsdale, N.J., 1977
Seger, C.A. Implicit learning. Psychological Bulletin, 115(2), pp. 163-196, 1994
Shaw, M.L.G. Recent Advances in Personal Construct Theory. Academic Press, London, 1980
Shaw, M.L.G. and Gaines, B.R. A methodology for recognising consensus, correspondence, conflict and contrast in a knowledge acquisition system. Proceedings of the Workshop on Knowledge Acquisition for Knowledge-Based Systems, Banff, Canada, November 7-11, 1988
Sommerville, I., Rodden, T., Sawyer, P., Bentley, R. and Twidale, M. Integrating Ethnography into the Requirements Engineering Process. Proceedings of the IEEE Symposium on Requirements Engineering, IEEE Computer Society Press, pp. 165-173, 1993
Vickery, B.C. Faceted Classification: A Guide to the Construction and Use of Special Schemes. Aslib, London, 1960
Weber, M. Legitimate Authority and Bureaucracy (1924). In Pugh, D.S. (ed.) Organisation Theory: Selected Readings (third edition). Penguin, London, 1990
Wicker, A.W. Attitudes versus actions: The relationship of verbal and overt behavioral responses to attitude objects. Journal of Social Issues, 25(4), pp. 41-78, 1969
Zadeh, L. Fuzzy sets. Information and Control, 8, pp. 338-353, 1965