Verification of Expertise and Reasoning (Verifier)

Gordon Rugg, Joanne Hyde, Marian Petre & Aspassia Daskalopulu

September 2004

Brief summary: The Verifier approach was designed to tackle long-standing problems where research does not appear to be making much progress; particularly problems where researchers believe that they have most of the pieces of the jigsaw, but can't get them to fit together. The approach involves establishing what the key assumptions and concepts are in the domain, then critically re-examining them for possible errors. This is difficult to do properly, and involves a significant number of different literatures and disciplines. We estimate that it would take about fifteen years to learn all the relevant skills to the appropriate level. The method can help identify key assumptions which are not solidly grounded, and/or other possibilities which have been overlooked. It should be particularly useful for apparently intractable long-standing problems. It can in principle be used in any field, since it focuses on classic shortcomings in human reasoning, which tend to be constant across disciplines. We are preparing an article describing this method, and the case study on the Voynich manuscript, in more detail.

Introduction: This document describes a method for critically re-examining previous research into difficult problems. The method integrates several bodies of literature on topics such as expert reasoning, the craft skills of research, human error and normative logics. The result is a systematic approach which can be used to identify points where previous research may have gone down the wrong route as a result of human error; it can also help identify other routes which had previously been missed. The first author has tested this method informally on a long-standing problem from cryptography (the Voynich manuscript). The previous consensus had been that the manuscript was too linguistically complex to be a hoax; re-examination showed that it was possible to hoax something of comparable complexity using sixteenth-century methods (Rugg, 2004a).

The main stages and literatures involved in this method are as follows. Some literatures appear in more than one stage.

Assessment of expertise: Has the topic been studied by experts in the appropriate areas, or has some area of expertise been missed?

There is a substantial and robust literature on the nature of expertise, which is relevant to various aspects of Verifier. One well-established finding in this literature is that expertise is tightly bounded and brittle. Experts in one area are not automatically experts in an area which appears closely related. Experts are frequently, but not always, well aware of the boundaries of their expertise.

Key literatures: The classic texts in this area include de Groot's study of chess masters, and a general examination of expertise by Chi et al. Expertise has been studied from various perspectives, including the development of Artificial Intelligence expert systems, expert problem solving, and experts' use of mental images in problem-solving. It has been consistently found that experts in a wide range of fields show similar behaviour. For instance, expertise is typically built on a huge amount of knowledge about the expert's field, rather than on reasoning ability; experts structure their knowledge about their field in a richer and deeper manner than novices; experts often use pattern-matching against previous cases as a fast way of solving problems; and expertise is tightly bounded.
Implications: One implication of this literature is that experts operating outside their own area are likely to make mistakes – for instance, if they judge something to be unlikely or impossible, since they will be unaware of possible solutions arising from other fields. Another implication is that if an area of expertise is missing from research into a particular problem, then a set of possible solutions or promising approaches may be missed. Although expertise in a given area depends on considerable knowledge specific to that area, the literature on human error shows that the types of mistakes which experts make are fairly consistent across different areas (in brief, getting something right depends on specific knowledge of the field, but getting it wrong doesn't require expertise).

Establishing how research is actually conducted in the area: What are the methods, craft skills, assumptions, etc., which are used, and what are the implications?

If the expertise is relevant, then the next step is to understand how that expertise actually operates (as opposed to how the books say it operates). There are various relevant literatures, such as the sociology of science, and the literature on craft skills.

Key literatures: The literature on the history and sociology of science contains numerous instances of social conventions and shared assumptions within a field which have led to misunderstandings and errors. Stephen Jay Gould's essays are a good introduction to this. This literature is the subject of considerable debate. For instance, Latour's well-publicised ethnographic study of scientists in action is open to the accusation that, by deliberately studying only the surface manifestation of what the scientists were doing, he systematically missed the underlying reasoning and knowledge which made those actions meaningful. Feynman's analogy of cargo cults is particularly relevant here: during the Second World War, some Melanesian islanders imitated the outward appearance of wartime airstrips and their radio equipment, in the hope that this would lead to aircraft bringing in goods for them. Imitating the outward appearance of a radio was of no use: only a real radio would fulfil the functions which the islanders required. Similarly, anyone studying expert behaviour runs the risk of seeing only the external appearances of what the experts are doing, and of missing the underlying features which make the experts' activity meaningful.

When studying how experts operate, it is important to understand how they really operate, as opposed to the "public consumption" version. The literature on elicitation methods is particularly relevant, since much expertise involves tacit knowledge, which cannot be accessed reliably or validly via interviews and questionnaires: other methods are needed. An important subset of expertise involves "craft skills" – skills usually viewed as not important or formal enough to merit description in textbooks – which are largely tacit and are taught through mentoring and experiential learning.

A related issue is that a technology, or a categorisation system, can steer researchers in a particular direction. In the case of the Voynich manuscript research, for instance, most previous researchers had used probability theory as a research tool, which had led them away from some possible solutions. This is not the same as the Sapir-Whorf hypothesis, at least in its stronger and popularised form; although plausible-looking, that hypothesis has been largely discredited within linguistics. (A sketch of the kind of text-generating mechanism which a statistically oriented framing can overlook is given below.)
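To make the Voynich example above more concrete, the following sketch shows, in outline, the kind of table-and-grille mechanism discussed in Rugg (2004a): a table of syllables read through a card with holes cut in it. The syllable lists, table size and grille movements below are invented for illustration, and the sketch is not the actual procedure used in the case study; it is only meant to show how a simple mechanical procedure, well within sixteenth-century technology, can produce meaningless text with word-like regularity.

    # A minimal sketch of a table-and-grille text generator.
    # The syllable lists, table size and grille movement are invented for
    # illustration; this is NOT the exact procedure described in Rugg (2004a).
    import random

    random.seed(1)  # fixed seed so the illustration is reproducible

    PREFIXES = ["qo", "o", "che", "da", "sho", ""]
    MIDFIXES = ["ke", "te", "l", "k", "ee", "a"]
    SUFFIXES = ["dy", "in", "aiin", "ol", "y", ""]

    # Build a table whose columns cycle prefix / midfix / suffix.
    ROWS, COLS = 12, 9
    table = [[random.choice((PREFIXES, MIDFIXES, SUFFIXES)[c % 3]) for c in range(COLS)]
             for r in range(ROWS)]

    def read_through_grille(row, col):
        """Read the three cells exposed by a card with three holes (covering one
        prefix, one midfix and one suffix cell) and join them into one 'word'."""
        col -= col % 3  # snap the grille to the start of a prefix/midfix/suffix triple
        return "".join(table[row][col + k] for k in range(3))

    # Sliding the grille over the table produces meaningless text which nevertheless
    # shows word-like regularities: recurring prefixes and suffixes, plausible
    # word lengths, and repetitive structure.
    words = [read_through_grille(r % ROWS, (3 * r) % COLS) for r in range(12)]
    print(" ".join(words))

The point is not the details of this toy version: it is that quasi-linguistic regularities of this general kind can arise from a simple generative device, which is the possibility that a purely statistical framing tended to obscure.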
Implications: Finding out how experts actually operate is essential; without a clear and correct understanding of this, any attempt to evaluate the experts' reasoning is built on sand. It is also difficult to do.

Establishing how knowledge is represented, structured and used in this area: What metalanguage, concepts and classifications do people use when doing research in this area?

Once it is clear what the experts are actually doing in the problem area, it is necessary to examine the conceptual tools that they are using.

Key literatures: Key literatures include knowledge representation, category theory, set theory and semantics.

Implications: This stage is important because the experts may be using language, classifications or technologies which subtly channel them towards a particular route, and away from other possible solutions – for instance, towards a single hierarchy rather than a faceted classification system.

Identifying key assertions in this area: What are the key assertions, and what conclusions have researchers drawn from them?

After the conceptual tools have been identified, the next stage is to identify the key assertions underpinning most work in the problem area, and to examine how these have been interpreted and used by experts in the area. These assertions are often classic findings from famous past studies – for instance, Miller's paper on the limitations of human short-term memory.

Key literatures: Relevant literatures include the sociology and history of science, which provide insights into how researchers use evidence; other literatures such as bibliometrics are also relevant. Knowledge representation is also important, and may be complemented by approaches such as argumentation and cognitive causal mapping.

Implications: This stage is important because a key belief, or a key conclusion, may be based on a misunderstanding, a logical fallacy, an inaccurate summary of an original source in a more widely read secondary source, or a primary source which is itself erroneous. If so, then any reasoning based on that source is fundamentally flawed.

What sorts of errors are people likely to make in this context: Are there classic types of error that are particularly likely here?

Once we know what the problem area is like, what sorts of expertise are involved, what the conceptual tools are, and what the key assertions are, we can look for classic types of human error which are particularly likely to occur in this context.

Key literatures: There are substantial and robust literatures on human error, on human judgment and decision-making, on disasters, and on various other related topics. For instance, if a task involves making estimates of probability, then humans are prone to various well-established types of error; if the same task involves making estimates of frequencies, however, then these errors are less likely, even if the underlying task is the same (a worked example is given below). Two of the main research traditions which we have used are the "heuristics and biases" approach associated with Kahneman, Slovic & Tversky, and the naturalist/frequentist approach associated with Gigerenzer, Wright, Ayton and colleagues. There is also a literature on human error, as in Reason's work, and a considerable literature on human factors and disasters, as in Perrow's work. There are also relevant research traditions which are less widely known, such as Edwards' pioneering work on errors in reasoning, and the literature on naïve physics. Much expertise involves categorisation, and there are rich literatures on categorisation and set theory, including Rosch's work on psychological prototypes and subsequent work on "natural categories". In addition, there is an extensive literature on logical fallacies and on the rhetorical devices relating to them. Some of these literatures appear in more than one stage of the Verifier approach.
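As a worked illustration of the probability-versus-frequency point above, the sketch below uses the standard illustrative screening figures from the judgment and decision-making literature (a 1% base rate, an 80% hit rate and a 9.6% false positive rate; these are textbook numbers, not data from the Verifier work). Posed as a single-event probability, the question invites base-rate neglect; posed as frequencies out of 1,000 people, the correct answer is much easier to see, even though the arithmetic is identical.

    # Worked example of base-rate neglect, using standard illustrative figures
    # from the judgment and decision-making literature (not project data).
    base_rate   = 0.01    # 1% of the population has the condition
    sensitivity = 0.80    # P(positive test | condition)
    false_pos   = 0.096   # P(positive test | no condition)

    # Probability format: Bayes' theorem. People typically overestimate this
    # badly, because the low base rate is neglected.
    p_positive = base_rate * sensitivity + (1 - base_rate) * false_pos
    p_condition_given_positive = (base_rate * sensitivity) / p_positive
    print(f"P(condition | positive) = {p_condition_given_positive:.3f}")  # about 0.078

    # Frequency format: the same numbers expressed as counts out of 1,000 people.
    people          = 1000
    with_cond       = people * base_rate                  # 10 people have the condition
    true_positives  = with_cond * sensitivity             # 8 of them test positive
    false_positives = (people - with_cond) * false_pos    # about 95 others also test positive
    print(f"{true_positives:.0f} of roughly {true_positives + false_positives:.0f} "
          f"positive tests come from people with the condition")

Only the representation changes between the two formats; this is the kind of representational difference emphasised in the frequentist tradition mentioned above.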
Implications: Although expertise is tightly bounded, the evidence strongly suggests that human error shows considerable regularities even across widely different domains.

Have people actually made errors here, and if so, what are they?

This stage involves using appropriate formal logics to assess whether the experts' reasoning is correct for each of their key conclusions. The stages described above provide the foundations and materials for this; without them, the assessor risks working from faulty assumptions about how the experts actually operate.

Key literatures: The relevant literatures here include those on logic, formal methods and decision theory, together with the judgment and decision-making literature.

Implications: This stage is important because researchers may have based their work on an underlying logical error which has steered them in the wrong direction. It is also more complex than might be supposed, since choosing the correct logic is non-trivial, and depends on a clear and detailed knowledge of the preconditions for using each logic, and of whether those preconditions have been satisfied. This has been a recurrent theme in previous attempts to assess human reasoning within the heuristics and biases tradition, where a frequent problem has been the estimation of realistic base rates, combined with an understanding of how the subjects construed the experimental task.

Summary: The Verifier approach works by bringing together a substantial range of complementary disciplines, to establish how experts have actually tackled a problem, and to identify areas where the experts may have proceeded from faulty assumptions, or may have missed possible routes. What is novel about this approach is the integration of these disciplines. The integration is an important part of the method: using any of the individual disciplines in isolation would probably not work, or would have only limited success. For instance, formal assessment of the experts' reasoning depends on a correct understanding of what the experts' reasoning actually is, which in turn depends on correct use of knowledge representation and on eliciting the experts' reasoning in context. Using this approach properly requires significant knowledge of several different disciplines. We are about to investigate whether this knowledge can be supplied by a team of people with complementary skills, or whether it needs to be supplied by a single individual who can see the "big picture" of the problem area in a way that a team could not. We also plan to test the approach on a variety of problems, to gather more information about the types of problems for which it is best suited.

Frequently asked questions:

Q: Why hasn't this been done before?
A: It depends on a large number of different literatures from different disciplines.
It takes a long time to become familiar enough with all of these to be able to use them properly in combination. People have used the individual components before in isolation, with mixed results – a fair amount of success, but usually encountering problems of validity and generalisability. For example, previous critical assessments of expert reasoning have typically been criticised on the grounds that they were not based on an accurate understanding of the context within which the experts were operating, or on realistic tasks. Getting that right requires a detailed knowledge of how to elicit an accurate and complete picture of how the experts really operate, which is a large and difficult problem in its own right.

Q: Can I do this myself?
A: Yes, if you're prepared to read up on the research literature on the topics above. Remember that you'll need to read the journal articles, etc., rather than the simplified textbook accounts.

Q: Is this just another version of lateral thinking, or of bringing in a fresh pair of eyes?
A: No. Lateral thinking is about the creative generation of possible new solutions; this approach is about re-examining what has been done before, to see where and why research has become stuck. A fresh pair of eyes may see a new solution, but long-standing problems will already have been seen by many fresh pairs of eyes without success; this approach involves a systematic examination, based on knowledge of classic errors.

Q: Would this be applicable to areas like research into AIDS?
A: We don't know. This approach focuses on problems where research has become stuck because of mistaken assumptions, rather than through lack of knowledge. In many fields, such as AIDS research, it's clear that we don't yet have all the parts of the puzzle.

Q: Could you use a team-based approach to provide all the relevant expertise?
A: We're looking into that. There would be obvious practical advantages, but it may be that some key stages depend on one person having a big-picture overview.

Q: Did you show that the Voynich manuscript was a hoax?
A: No: the Voynich case study showed that a hoax of comparable complexity was possible using sixteenth-century techniques. That's subtly different. Whether or not you can prove unequivocally that something is a meaningless hoax is an interesting question, and is the subject of considerable discussion in the Voynich research community.

Q: How did the Voynich manuscript come into the story?
A: We were planning to bid for research funding to apply Verifier to Alzheimer's disease, and Gordon wanted some practical experience of using Verifier before we wrote the bid. The manuscript looked like a suitable test case – a long-standing, well-defined problem involving an apparent paradox. (And, yes, he really did work at the kitchen table, and his research budget really was about fifteen pounds – for paper, ink and a calligraphic pen.)

Q: What areas will you look at next?
A: We're considering various possibilities in a range of disciplines, including medicine, biochemistry and physics.

Contact: Queries should be addressed to the first author: g.rugg@cs.keele.ac.uk

References:
Rugg, G. (2004a). An elegant hoax? A possible solution to the Voynich manuscript. Cryptologia, XXVIII(1), 31-46. ISSN 0161-1194.
Rugg, G. (2004b). The Voynich Manuscript. Scientific American, July 2004, 104-109.