Verification of Expertise and Reasoning (Verifier)

Gordon Rugg, Joanne Hyde, Marian Petre & Aspassia Daskalopulu
September 2004
Brief summary: the Verifier approach was designed to tackle long-standing problems
where research does not appear to be making much progress; particularly problems
where researchers believe that they have most of the pieces of the jigsaw, but can’t get
them to fit together. The approach involves establishing what the key assumptions and
concepts are in the domain, then critically re-examining them for possible errors. This
is difficult to do properly, and involves a significant number of different literatures
and disciplines. We estimate that it would take about fifteen years to learn all the
relevant skills to the appropriate level. The method can help identify key assumptions
which are not solidly grounded, and/or other possibilities which have been
overlooked. It should be particularly useful for apparently intractable long-standing
problems. It can in principle be used in any field, since it focuses on classic
shortcomings in human reasoning, which tend to be constant across disciplines. We
are preparing an article describing this method, and the case study on the Voynich
manuscript, in more detail.
Introduction:
This document describes a method for critically re-examining previous research into
difficult problems. This method integrates several bodies of literature on topics such
as expert reasoning, the craft skills of research, human error and normative logics.
The result is a systematic approach which can be used to identify points where
previous research may have gone down the wrong route as a result of human error; it
can also help identify other routes which had previously been missed.
The first author has tested this method informally on a long-standing problem from
cryptography (the Voynich manuscript). The previous consensus had been that the
manuscript was too linguistically complex to be a hoax; re-examination showed that it
was possible to hoax something of comparable complexity using sixteenth century
methods (Rugg, 2004).
The main stages and literatures involved in this method are as follows. Some
literatures appear in more than one stage.
Assessment of expertise:
Has the topic been studied by experts in the appropriate areas, or has some area of
expertise been missed?
There is a substantial and robust literature on the nature of expertise, which is relevant
to various aspects of Verifier. One well established finding in this literature is that
expertise is tightly bounded and brittle. Experts in one area are not automatically
experts in an area which appears closely related. Experts are frequently, but not
always, well aware of the boundaries of their expertise.
Key literatures: The classic texts in this area include de Groot’s study of chess
masters, and a general examination of expertise by Chi et al. Expertise has been
studied from various perspectives, including the development of Artificial Intelligence
expert systems, expert problem solving, and experts’ use of mental images in
problem-solving. It has been consistently found that experts in a wide range of fields
show similar behaviour. For instance, expertise is typically built on a huge amount of
knowledge about the expert’s field, rather than on reasoning ability; experts structure
their knowledge about their field in a richer and deeper manner than novices; experts
often use pattern-matching against previous cases as a fast way of solving problems;
and expertise is tightly bounded.
Implications: One implication of this literature is that experts operating outside their
own area are likely to make mistakes, for instance when judging something to be unlikely
or impossible, since they may be unaware of possible solutions arising from other fields.
Another implication is that if an area of expertise is missing from research into a
particular problem, then a set of possible solutions or promising approaches may be
missed.
Although expertise in a given area is dependent on considerable knowledge which is
specific to that area, the literature on human error shows that the types of mistakes
which experts make are fairly consistent across different areas (in brief, getting
something right depends on specific knowledge of the field, but getting it wrong
doesn’t require expertise).
Establishing how research is actually conducted in the area:
What are the methods, craft skills, assumptions, etc, which are used, and what are the
implications?
If the expertise is relevant, then the next step is to understand how that expertise
actually operates (as opposed to how the books say it operates). There are various
relevant literatures, such as the sociology of science, and the literature on craft skills.
Key literatures: The literature on the history and sociology of science contains
numerous instances of social conventions and shared assumptions within a field
which have led to misunderstandings and errors. Stephen Jay Gould’s essays are a good
introduction to this. This literature is the subject of considerable debate. For instance,
Latour’s well-publicised ethnographic study of scientists in action is open to the
accusation that by deliberately only studying the surface manifestation of what the
scientists were doing, he systematically missed the underlying reasoning and
knowledge which made those actions meaningful. Feynman’s analogy of cargo cults
is particularly relevant here: during the Second World War, some Melanesian islanders
imitated the outward appearance of wartime airstrips, complete with mock control huts
and radio equipment, in the hope that this would lead to aircraft bringing in goods for
them. Imitating the outward appearance of a radio was of no use: only a real radio
would fulfil the functions which the islanders required. Similarly, anyone studying
expert behaviour runs the risk of
seeing only the external appearances of what the experts are doing, and of missing the
underlying features which make the experts’ activity meaningful.
When studying how experts operate, it is important to understand how they really
operate, as opposed to the “public consumption” version. The literature on elicitation
methods is particularly relevant, since many parts of expertise involve tacit
knowledge, which cannot be accessed reliably or validly via interviews and
questionnaires: other methods are needed. An important part of expertise involves
“craft skills”, skills usually viewed as not important or formal enough to merit
description in textbooks; these are usually tacit skills, taught through mentoring
and experiential learning.
A related issue is that a technology, or a categorisation system, can steer researchers
in a particular direction. In the case of the Voynich manuscript research, for instance,
most previous researchers had used probability theory as a research tool, which had
led them away from some possible solutions. This is not the same as the Sapir-Whorf
hypothesis, at least in its stronger and popularised form; although plausible-looking,
that hypothesis has been largely discredited within linguistics.
Implications: Finding out how experts actually operate is essential; without a clear and
correct understanding of this, any attempt to evaluate the experts’ reasoning is built
on sand. Finding out how experts actually operate is also difficult.
Establishing how knowledge is represented, structured and used in this area:
What metalanguage, concepts and classification do people use when doing research
in this area?
Once it is clear what the experts are actually doing in the problem area, then it is
necessary to examine the conceptual tools that they are using.
Key literatures: Key literatures include knowledge representation, category theory, set
theory and semantics.
Implications: This is important because the experts may be using language,
classifications or technologies which subtly channel them towards a particular route,
and away from other possible solutions – for instance, towards a single hierarchy
rather than a faceted system.
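To make this concrete, here is a minimal hypothetical sketch in Python (the study names
and facets are invented, and nothing in it is part of the Verifier method itself) of how
a question that cuts across the branches of a single hierarchy stays easy to ask when the
same items are described by independent facets:

    # Hypothetical illustration: the same studies in a single hierarchy and in a
    # faceted scheme. A hierarchy forces each item into exactly one branch.
    hierarchy = {
        "linguistics": ["study_A"],
        "cryptography": ["study_B"],
        "history": ["study_C"],
    }

    # A faceted scheme records a value for every facet of every item, so
    # cross-cutting questions remain visible.
    faceted = [
        {"id": "study_A", "discipline": "linguistics", "method": "statistical"},
        {"id": "study_B", "discipline": "cryptography", "method": "statistical"},
        {"id": "study_C", "discipline": "history", "method": "archival"},
    ]

    # "Which studies, in any discipline, used statistical methods?" cuts across
    # the branches of the hierarchy, but is a one-line query over the facets.
    cross_cut = [item["id"] for item in faceted if item["method"] == "statistical"]
    print(cross_cut)  # ['study_A', 'study_B']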
Identifying key assertions in this area:
What are the key assertions, and what are the conclusions that researchers have
drawn from them?
After the conceptual tools have been identified, the next stage is to identify the key
assertions underpinning most work in the problem area, and to examine how these
have been interpreted and used by experts in this area. These assertions are often
classic findings from famous past studies – for instance, Miller’s paper on the
limitations of human short-term memory.
Key literatures: Relevant literatures include the sociology and history of science,
which provide insights into how researchers use evidence; other literatures such as
bibliometrics are also relevant. Knowledge representation is also important, and may
be complemented by approaches such as argumentation and cognitive causal
mapping.
Implications: This stage is important because a key belief, or a key conclusion, may
be based on a misunderstanding, a logical fallacy, an inaccurate summary of an
original source in a more widely-read secondary source, or on a primary source which
is erroneous. If this is the case, then any reasoning based on this source is
fundamentally flawed.
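As a hedged illustration of what a simple argumentation or causal map can contribute at
this stage, the toy Python sketch below (the assertion and conclusion names are invented,
and the code is ours rather than a tool from the literatures above) traces which
conclusions rest, directly or indirectly, on a suspect assertion:

    # Toy illustration: which conclusions depend, directly or indirectly, on
    # which assertions. The names are invented for the sake of the example.
    depends_on = {
        "conclusion_X": ["assertion_A", "assertion_B"],
        "conclusion_Y": ["conclusion_X", "assertion_C"],
        "conclusion_Z": ["assertion_C"],
    }

    def affected(suspect):
        # Return every claim resting, directly or indirectly, on the suspect assertion.
        hit = set()
        changed = True
        while changed:
            changed = False
            for claim, supports in depends_on.items():
                if claim not in hit and (suspect in supports or hit & set(supports)):
                    hit.add(claim)
                    changed = True
        return hit

    # If assertion_A turns out to rest on a misquoted secondary source,
    # conclusions X and Y need re-examination, but Z does not.
    print(affected("assertion_A"))  # conclusion_X and conclusion_Y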
What sorts of errors are people likely to make in this context:
Are there classic types of errors that are particularly likely here?
Once we know what the problem area is like, what sorts of expertise are involved,
what the conceptual tools are, and what the key assertions are, then we can look for
classic types of human error which are particularly likely to occur in this context.
Key literatures: There are substantial and robust literatures on human error; on human
judgment and decision-making; on disasters, and various other related topics. For
instance, if a task involves making estimates of probability, then humans are prone to
various well-established types of error; if the same task involves making estimates of
frequencies, however, then these errors are less likely, even if the underlying task is
the same. Two of the main research traditions which we have used are the “heuristics
and biases” approach associated with Kahneman, Slovic & Tversky, and the
naturalist/frequentist approach associated with Gigerenzer, Wright, Ayton and
colleagues. There is also a literature on human error, as in Reason’s work, and a
considerable literature on human factors and disaster, as in Perrow’s work. There are
also relevant research traditions which are less widely known, such as Edwards’
pioneering work on errors in reasoning, and the literature on naïve physics.
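To illustrate the probability-versus-frequency point with a worked example, here is a
short Python sketch using invented numbers (they are not taken from any of the studies
cited here) that poses the same diagnostic question first in probability form, via
Bayes’ rule, and then in natural-frequency form:

    # Invented numbers, for illustration only.
    prevalence = 0.01       # 1% of the population has the condition
    sensitivity = 0.80      # 80% of true cases test positive
    false_positive = 0.10   # 10% of healthy people also test positive

    # Probability framing: apply Bayes' rule directly.
    p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive
    p_condition = prevalence * sensitivity / p_positive
    print(f"P(condition | positive test) = {p_condition:.1%}")   # about 7.5%

    # Frequency framing: the same arithmetic, as counts out of 1,000 people.
    true_pos = 1000 * prevalence * sensitivity            # 8 people
    false_pos = 1000 * (1 - prevalence) * false_positive  # 99 people
    print(f"{true_pos:.0f} of the {true_pos + false_pos:.0f} people who test "
          f"positive actually have the condition")

The arithmetic is identical in both framings; the literature cited above reports that
errors such as neglecting the base rate are far more common under the probability
framing.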
Much expertise involves categorisation, and there are rich literatures on categorisation
and set theory, including approaches such as Rosch’s work on psychological
prototypes, and subsequent work on “natural categories”. In addition, there is an
extensive literature on logical fallacies and on rhetorical devices relating to these.
Some of these literatures appear in more than one stage of the Verifier approach.
Implications: Although expertise is tightly bounded, the evidence strongly suggests
that human error shows considerable regularities even across widely different
domains.
Assessing whether errors have actually been made in this area:
Have people actually made errors here, and if so, what are they?
This stage involves using appropriate formal logics to assess whether the experts’
reasoning is correct for each of their key conclusions. The stages described above
provide the foundations and materials for this; without them, the assessor may be
working from faulty assumptions about how the experts are operating.
Key literatures: The relevant literatures here include those on logic, formal methods
and decision theory, and also the judgment and decision-making literature.
Implications: This stage is important because researchers may have based their work
on an underlying logical error which has steered them in the wrong direction. It is also
a stage which is more complex than might be supposed, since choice of the correct
logic is non-trivial, and depends on a clear and detailed knowledge of the
preconditions for using each logic, and of whether these preconditions have been
satisfied. This has been a recurrent theme in previous attempts to assess human
reasoning within the heuristics and biases tradition, where recurring problems have
been estimating realistic base rates and understanding how the subjects construed the
experimental task.
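As a minimal sketch of what a formal check can look like (our own toy example; in real
cases the hard part is choosing the appropriate logic and checking its preconditions, as
noted above), the Python fragment below enumerates a truth table to show that modus
ponens is valid while affirming the consequent is not:

    # Toy illustration: brute-force truth-table check of two argument forms.
    from itertools import product

    def implies(a, b):
        # Material implication: false only when a is true and b is false.
        return (not a) or b

    def valid(premises, conclusion):
        # Valid if the conclusion holds in every row where all premises hold.
        return all(conclusion(p, q)
                   for p, q in product([True, False], repeat=2)
                   if all(prem(p, q) for prem in premises))

    # Modus ponens: from (p -> q) and p, infer q.
    print(valid([lambda p, q: implies(p, q), lambda p, q: p], lambda p, q: q))  # True

    # Affirming the consequent: from (p -> q) and q, infer p.
    print(valid([lambda p, q: implies(p, q), lambda p, q: q], lambda p, q: p))  # False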
Summary:
The Verifier approach works by bringing together a substantial range of
complementary disciplines, to establish how experts have actually tackled a problem,
and to identify areas where the experts may have proceeded using faulty assumptions,
or have missed possible routes.
What is novel about this approach is the integration of these disciplines. The
integration is an important part of the method: using any of the individual disciplines
in isolation would probably not work, or would have only limited success. For instance,
formal assessment of the experts’ reasoning depends on a correct understanding of
what the experts’ reasoning actually is, which in turn depends on correct use of
knowledge representation and on eliciting the experts’ reasoning in context.
Using this approach properly requires a significant knowledge of several different
disciplines. We are about to investigate whether this knowledge can be supplied by a
team of people with complementary skills, or whether it needs to be supplied by a
single individual who can see the “big picture” of the problem area in a way that a
team could not. We also plan to test this approach on a variety of problems, to gather
more information about the types of problems for which it is best suited.
Frequently asked questions:
Q: Why hasn’t this been done before?
A: It depends on a large number of different literatures from different disciplines. It
takes a long time to become familiar enough with all of these to be able to use them
properly in combination. People have used the individual components before in
isolation, with mixed results – a fair amount of success, but usually encountering
problems with validity and generalisability. For example, previous critical
assessment of expert reasoning has typically been criticised on the grounds that it
wasn’t based on an accurate understanding of the context within which the experts
were operating, or that it did not use realistic tasks. Getting that right requires a
detailed knowledge
of how to elicit an accurate and complete picture of how the experts really operate,
which is a large and difficult problem in its own right.
Q: Can I do this myself?
A: If you’re prepared to read up on the research literature on the topics above. Remember
that you’ll need to read the journal articles, etc, rather than the simplified textbook
accounts.
Q: Is this just another version of lateral thinking, or bringing in a fresh pair of eyes?
A: No. Lateral thinking is about creative generation of possible new solutions; this
approach is about re-examining what’s been done before, to see where and why
research has become stuck. The fresh pair of eyes may see a new solution, but long-standing problems will already have been seen by many fresh pairs of eyes without
success; this approach involves a systematic examination, based on knowledge of
classic errors.
Q: Would this be applicable to areas like research into AIDS?
A: We don’t know. This approach focuses on problems where research has hit
problems because of mistaken assumptions, rather than lack of knowledge. In many
fields, such as AIDS research, it’s clear that we don’t yet have all the parts of the
puzzle.
Q: Could you use a team-based approach to provide all the relevant expertise?
A: We’re looking into that. There would be obvious practical advantages in this, but it
may be that some key stages depend on one person having a big-picture overview.
Q: Did you show that the Voynich manuscript was a hoax?
A: No: the Voynich case study showed that a hoax of comparable complexity was
possible using sixteenth century techniques. That’s subtly different. Whether or not
you can prove unequivocally that something is a meaningless hoax is an interesting
question, and is the subject of considerable discussion in the Voynich research
community.
Q: How did the Voynich manuscript come into the story?
A: We were planning to apply for research funding to apply Verifier to Alzheimer’s,
and Gordon wanted to get some practical experience of using Verifier before we
wrote the bid. The manuscript looked like a suitable test case – a long-standing, well-defined problem involving an apparent paradox. (And, yes, he really did work at the
kitchen table, and his research budget really was about fifteen pounds – for paper, ink
and a calligraphic pen.)
Q: What areas will you look at next?
A: We’re considering various possibilities in a range of disciplines, including
medicine, biochemistry and physics.
Contact:
Queries should be addressed to the first author:
g.rugg@cs.keele.ac.uk
References:
Rugg, G. (2004a). An elegant hoax? A possible solution to the Voynich manuscript.
Cryptologia, XXVIII(1), January 2004, pp. 31-46. ISSN 0161-1194.
Rugg, G. (2004b). The Voynich manuscript. Scientific American, July 2004, pp. 104-109.