Hoffman, R. R. (2002, September). “An Empirical Comparison of Methods for Eliciting and Modeling Expert
Knowledge.” In Proceedings of the 46th Meeting of the Human Factors and Ergonomics Society. Santa Monica, CA:
Human Factors and Ergonomics Society.
AN EMPIRICAL COMPARISON OF METHODS
FOR ELICITING AND MODELING EXPERT KNOWLEDGE
Robert R. Hoffman, Ph.D.
John W. Coffey, Ed.D.
Mary Jo Carnot, MA
Joseph D. Novak, Ph.D.
Institute for Human and Machine Cognition
University of West Florida
The goal of this project was to apply a variety of methods of Cognitive Task Analysis (CTA) and Cognitive
Field Research (CFR) to support a process going all the way from knowledge elicitation to system
prototyping, and also to use this as an opportunity to empirically compare and evaluate the methods. The
research relied upon the participation of expert, journeyman, and apprentice weather forecasters at the
Naval Training Meteorology and Oceanography Facility at Pensacola Naval Air Station. Methods included
protocol analysis, a number of types of structured interviews, workspace and work patterns analysis, the
Critical Decision Method, the Knowledge Audit, Concept Mapping, and the Cognitive Modeling
Procedure. The methods were compared in terms of (1) their yield of information that was useful in
modeling expert knowledge, (2) their yield in terms of identification of leverage points (where the
application of new technology might bring about positive change), and (3) their efficiency. Efficiency was
gauged in terms of total effort (time to prepare to run a procedure, plus time to run the procedure, plus time
to analyze the data) relative to the yield (number of leverage points identified, number of propositions
suitable for use in a model of domain knowledge). CTA/CFR methods supported the identification of
dozens of leverage points and also yielded behaviorally-validated models of the reasoning of expert
forecasters. Knowledge modeling using Concept-Mapping resulted in thousands of propositions covering
domain knowledge. The Critical Decision Method yielded a number of richly-populated case studies with
associated Decision Requirements Tables. Results speak to the relative efficiency of various methods of
CTA/CFR, and also the strengths of each of the methods. In addition to extending our empirical base on
the comparison of knowledge elicitation methods, a deliverable from the project was a knowledge model
that illustrates the integration of training support and performance aiding in a single system.
INTRODUCTION
The empirical comparison of knowledge elicitation (KE)
methods is nearly 20 years old, dating from Duda and
Shortliffe (1983), who recognized what came to be called the
"knowledge acquisition bottleneck"—that it took longer for
computer scientists to interview experts and build a
knowledge base than it did to actually write the software for
the expert system. The first systematic comparisons of
knowledge elicitation methods (i.e., Burton, Shadbolt,
Hedgecock, & Rugg, 1987; Hoffman, 1987), and the first wave
of psychological research on expertise (e.g., Chi, Feltovich, &
Glaser, 1981; Chi, Glaser, & Farr, 1988; Glaser, et al., 1985;
Hoffman, 1992; Shanteau, 1992; Zsambok & Klein, 1997),
resulted in some guidance concerning knowledge elicitation
methodology (see Cooke, 1994; Hoffman, Shadbolt, Burton,
& Klein, 1995). In the subsequent years, new methods were
developed, including the Critical Decision Method (see
Hoffman, Crandall, & Shadbolt, 1998) and the Cognitive
Modeling Procedure (Hoffman, Coffey, & Carnot, 2000). In
addition, a number of research projects have attempted to
extend our empirical base on knowledge elicitation
methodology, including Thordsen's (1991) comparison of
Concept Mapping with the Critical Decision Method, and
Evans, Jentsch, Hitt, Bowers, and Salas' (2001) comparison of
Concept Mapping with methods for rating and ranking domain
concepts.
A factor that has made interpretation difficult is that
some studies have used college-age participants (and, of
course, assessments of the sorts of knowledge that they would
possess, e.g., sports, fashion). The transfer of the findings to
knowledge elicitation for genuine experts in significant
domains is questionable. A second and major difficulty in the
comparison of KE methods is the selection of dependent
variables. Hoffman (1987) compared methods in terms of
relative efficiency—the number of useful propositions
obtained per total task minute, where total task time is the
time taken to prepare to run the KE procedure, plus the time taken
to run the procedure, plus the time taken to analyze the data and
cull out the useful propositions; and where the adjective
"useful" was applied to any proposition that was not already
contained in the first-pass knowledge base that had been
constructed on the basis of a documentation analysis. (A
somewhat similar metric, number of elicited procedural rules,
was utilized in the work of Burton et al., 1987.) Hoffman's
initial purpose for creating an efficiency metric involved the
need of computer scientists to assess the usefulness of the
results in terms of building knowledge bases for expert
systems. While a somewhat reasonable metric from the
standpoint of first-generation expert systems, it would not
work for all of the purposes of either computer science or
experimental psychology.
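(For concreteness, the metric can be written out as a simple calculation. The sketch below is our illustration; the function name and the example figures are hypothetical, not values reported in the original studies.)

```python
# A minimal sketch of Hoffman's (1987) efficiency metric: useful propositions
# elicited per "total task minute." The function name and example figures are
# illustrative only, not values reported in the original studies.

def elicitation_efficiency(useful_propositions: int,
                           prep_minutes: float,
                           session_minutes: float,
                           analysis_minutes: float) -> float:
    """Useful propositions per total task minute (prep + session + analysis)."""
    total_task_minutes = prep_minutes + session_minutes + analysis_minutes
    return useful_propositions / total_task_minutes

# Hypothetical session: 90 new propositions, 20 min of preparation,
# 60 min of elicitation, and 45 min of data analysis.
print(elicitation_efficiency(90, 20, 60, 45))  # -> 0.72 propositions per minute
```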
For their dependent variable, Evans et al. (2001)
generated correlations of the similarity ratings among domain
concepts. This correlation approach makes it possible to lock
down the relative similarity of domain concepts and scale the
convergence among alternative methods (e.g., ranking versus
Concept-Mapping), but raw pairwise similarity of domain
concepts glosses over the meaning and content that are
necessary for the construction of models. Another factor that
clouds the interpretation of results from some studies that have
used the Concept-Mapping procedure is that it is often
apparent that the Concept Maps that are created (either by
domain practitioners or by practitioners in a collaboration with
the researchers) are lacking in the qualities that define Concept
Maps. These criteria, and their foundations in the theory of
meaningful learning, have been discussed by Novak and his
colleagues (e.g., Ausubel, Novak, & Hanesian, 1978; Novak,
1998). Criteria include semi-hierarchical morphology,
propositional coherence, labeled links, the use of cross-links,
and the avoidance of certain pitfalls that characterize Concept
Maps made by unpracticed individuals (including the creation
of "fans," "stacks," sentence-like "spill-overs," and other
features).
A final factor that makes interpretation difficult is the
fact that some studies involve apples-oranges comparisons.
For instance, to those who are familiar with the techniques, it
would make little sense to compare a concept sorting task to
Concept Mapping in terms of their ability to yield models
of expert reasoning—in fact, neither method is suited to that
purpose. One goal of the present research was to create a
comparison that involved a reasonable mix of alternative
methods, but also to put all of the methods on a more level
playing field. Hoffman's efficiency metric was re-defined as
the yield of useful propositions, useful in that they could be
used in a variety of ways (and not just in creating a knowledge
base for an expert system). One could seek to create models
of expert knowledge, or create models of expert reasoning. In
addition, a second metric was used to carve out the
applications aspect of KE research—the yield of leverage
points. A leverage point was defined as any aspect of the
domain or work practice where an infusion of new tools
(simple or complex) might result in an improvement in the
work. Leverage points were initially identified by the
researchers but were then affirmed by the domain practitioners
themselves. Also, there was ample opportunity for
convergence in that leverage points could be identified in the
results from more than one KE method.1
METHODS
Participants
Participants (n = 22) were senior expert civilian
forecasters, junior Aerographers (i.e., Apprentices who were
qualified as Observers) and senior Aerographers (i.e.,
Advanced Journeymen and Journeymen who were qualified as
Forecasters) at the Meteorology and Oceanography Training
Facility at Pensacola Naval Air Station.
Methods
The following methods of CTA/CFR were utilized:
1. Bootstrapping (documentation analysis, analysis of SOP
documents, the Recent Case Walkthrough method),
2. Proficiency Scaling (Participant Career Interviews;
comparison of experience versus forecast hit rates as a
measure of actual performance),
3. Client (i.e., pilots and pilot trainers) Interviews,
4. Workspace Analysis (Repeated photographic surveys,
detailed workspace mapping),
5. Workpatterns Analysis (live and videotaped Technical
Training Briefings, Watchfloor observations),
6. The Knowledge Audit,
7. Decision Requirements Analysis,
8. The Critical Decision Method,
9. The Cognitive Modeling Procedure (see Hoffman, et al.,
2000),
10. Protocol Analysis,
11. Concept Mapping using the CMap Tools software.
RESULTS AND DISCUSSION
The conduct of some methods was relatively easy and
quick. For example, the Knowledge Audit procedure took a
total of 70 minutes. Others were quite time-consuming. For
instance, we conducted over 60 hours of Concept Mapping
sessions.
Full protocol analysis of a single knowledge modeling
session took a total of 18 hours of data collection and analysis.
Results for protocol analysis confirm a finding from previous
studies (Burton, et al., 1990; Hoffman, et al., 1995), that full
protocol analysis (i.e., transcription and functional coding of
audiotaped protocol statements, with independent coders) is so
time consuming and effortful as to have a relatively low
effective yield. Knowledge models and reasoning models can
be developed, refined, and validated much more efficiently
(i.e., by orders of magnitude), using such procedures as
Concept Mapping and the Cognitive Modeling Procedure.
The CDM
The CDM worked effectively as a method for
generating rich case studies. However, the present results
provide a useful qualification to previous reports on the CDM
(e.g., Hoffman, et al., 1998). A lesson learned in the present
project was that in this domain and organizational context, the
conduct of each CDM session had to span more than one day.
On the first day the researcher would conduct the first 3 steps
in the CDM, then retreat to the lab to input the results into the
method's boilerplate forms. The researcher returned to the
workplace on a subsequent day to complete the procedure.
Weather forecasting cases are rich (in part because weather
phenomena can span days and usually involve dozens of data
types and scores of data fields). More importantly, expert
forecasters' memories of cases are often remarkably rich.
Indeed, there is a tradition in meteorology to convey important
lessons by means of case reports (e.g., Buckley & Leslie,
2000; any issue of The Monthly Weather Review). The impact
of this domain feature was that the conduct of the CDM was
time-consuming and effortful. Previous studies had suggested
that the CDM procedure takes about 2 hours, but those
measurements only looked at session time. The present study
involved a more inclusive measure of effort, total task time,
and in the present research context, the conduct of the CDM
took about 10 hours per case.
Concept Mapping
We are led to qualify a conclusion of Thordsen (1991),
who also used the CDM in conjunction with Concept
Mapping. Thordsen argued that the strength of the CDM lies
in eliciting "tacit knowledge" whereas Concept Mapping has
its strength in supporting the domain practitioner in laying out
a model of their tasks. Putting aside legitimate (and overdue)
debate about the meaning of the phrase "tacit knowledge," we
see the greatest strength of the CDM to be the generation of
rich case studies, including information about cues,
hypothetical reasoning, strategies, etc. (i.e., decision
requirements), all of which can be useful in the modeling of
the reasoning procedures or strategies. The strength of
Concept Mapping lies in generating models of domain
knowledge. Concept Mapping (either paper-and-pencil or
through the use of the CMap Tools software) can be used to
concoct diagrams that look like flow diagrams or decision
trees. Our experience is that it is easy for novices to see
Concept Maps as being flow-diagrams or models of
procedural knowledge. However, good Concept Maps can just
as easily describe the domain in a way that is task and device
independent. (And therefore the Concept Mapping procedure
can provide a window into the nature of the "true work," as in
Vicente, 1999.)
To put a fine point on it, our calculations of yield
(number of mappable propositions generated per total task
minute) place Concept Mapping right on the mark in terms of
rate of gain of information for knowledge modeling. Previous
guidance (Hoffman, 1987) was that the "effective" knowledge
elicitation techniques yield two or more informative
propositions per total task minute. (Again by comparison, full
protocol analysis was calculated to yield less than one
informative proposition per total task minute.) In the present
research, it took about 1.5 to 2 hours to create, refine, and
verify each Concept-Map. (The Concept Maps contained an
average of 47 propositions. Verification took about seven
propositions per minute, for about seven minutes per Concept Map.) The rate of gain for Concept Mapping was just about
two mappable propositions per session minute. If one takes
into account the fact that for the Concept Mapping procedure,
session time actually is total task time (i.e., there is no
preparation time and the result from a session is the final
product), it can be safely concluded that Concept Mapping is
at least as efficient at generating models of domain
knowledge as any other method of knowledge elicitation.
Indeed, it is quite probably much more efficient.
Leverage Points
In terms of effectiveness at the identification of
leverage points, 35 in all were identified. The leverage points
ranged all the way from simple interventions (e.g., a tickle
board to remind the forecasters of when certain tasks need to
be conducted) to the very complex (e.g., an AI-enabled fusion
box to support the forecaster's creation of a visual
representation of their mental models of atmospheric
dynamics). All of the leverage points were affirmed as being
leverage points by one or more of the participating experts.2
Furthermore, all of the leverage points were confirmed
by their identification in more than one method. The leverage
points were placed into broad categories (e.g., decision-aids
for the forecaster, methods of presenting weather data to
pilots, methods of archiving organizational knowledge, etc.).
None of the CTA/CFR methods resulted in leverage points
that were confined to any one category. We found it
interesting that, overall, the observational methods (e.g.,
Watchfloor observations) had a greater yield of identified
leverage points. On the other hand, acquiring those leverage
points took more time. For example, we observed 15 weather
briefings that were presented either to pilots or to the other
forecasters, resulting in 15 identified leverage points. But the
yield was 15/954 minutes = 0.016 leverage points per
observed minute.
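(Spelled out as a calculation, using the figures reported above; the variable names are ours.)

```python
# Observational yield of leverage points, using the figures reported in the text.
leverage_points_identified = 15   # from the 15 observed weather briefings
observation_minutes = 954

print(leverage_points_identified / observation_minutes)  # ~0.016 per observed minute
```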
APPLICATION TO SYSTEM DESIGN
After the preservation of local weather forecasting
expertise was identified as an organizationally relevant leverage
point for a prototyping effort, the models of reasoning that
were created using the Cognitive Modeling Procedure, the
models of knowledge that were created using the Concept
Mapping Procedure, and the case studies that were created
using the CDM were all integrated into a Concept Map-based
Knowledge Model. This model contained 24 Concept-Maps,
which themselves contained a total of 1,129 propositions and
420 individual multimedia resources. This "System To
Organize Representations in Meteorology-Local Knowledge"
(STORM-LK) is not an expert system but instead uses the
Concept-Mapsa model of the expert's knowledgeto be the
interface to support the trainee or practicing forecaster as they
navigate through the work domain. A screen shot of a
Concept-Map is presented in Figure 1, below. The screen shot
in Figure 2 shows a Concept-Map overlaid with examples of
some of the kinds of resources that are directly accessible
from the clickable icons that are appended to many of the
concept-nodes. These include satellite images, charts, and
digitized videos that allow the apprentice to "stand on the expert's
shoulders" by viewing mini-tutorials.
Also appended to concept-nodes are Concept Map icons
that take one to the Concept Map indicated by the concept-node to which the icon is attached. The Top Map serves as a
"Map of Maps" in that it contains concept-nodes that designate
all of the other Concept-Maps (e.g., cold fronts,
thunderstorms, etc.). At the top node in every other Concept
Map is an icon that takes one back to the Top Map and to all
of the immediately associated Concept-Maps. For example,
the Top Map contains a concept-node for Hurricanes, and
appended to that are links to both of the Concept-Maps that
are about hurricanes (i.e., hurricane dynamics and hurricane
developmental phases). Through the use of these clickable
icons, one can meaningfully navigate from anywhere in the
knowledge model to anywhere else, in two clicks at most.
Disorientation in webspace becomes a non-issue.
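(To illustrate why two clicks always suffice, the navigation scheme can be sketched as a graph in which the Top Map links to every other Concept Map and every other map links back to the Top Map and to its immediately associated maps. The sketch below is our illustration, not the STORM-LK implementation; the map names are examples drawn from the description above.)

```python
# A minimal sketch (not the STORM-LK implementation) of the "Map of Maps"
# navigation scheme: the Top Map links to every other Concept Map, and every
# other map links back to the Top Map plus its immediately associated maps.

knowledge_model = {
    "Top Map": {"Hurricane Dynamics", "Hurricane Developmental Phases",
                "Cold Fronts", "Thunderstorms"},
    "Hurricane Dynamics": {"Top Map", "Hurricane Developmental Phases"},
    "Hurricane Developmental Phases": {"Top Map", "Hurricane Dynamics"},
    "Cold Fronts": {"Top Map"},
    "Thunderstorms": {"Top Map"},
}

def clicks_between(src: str, dst: str) -> int:
    """Breadth-first search over map-to-map links; returns the click count."""
    frontier, visited, depth = {src}, {src}, 0
    while frontier:
        if dst in frontier:
            return depth
        frontier = {n for m in frontier for n in knowledge_model[m]} - visited
        visited |= frontier
        depth += 1
    raise ValueError("unreachable")

# Any map is reachable from any other in at most two clicks via the Top Map.
assert all(clicks_between(a, b) <= 2
           for a in knowledge_model for b in knowledge_model)
```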
STORM-LK contains all of the information in the
"Local Forecasting Handbook," and since the Concept Maps
are web-enabled, they allow real-time access to actual weather
data (radar, satellite, computer forecasts, charts, etc.)—within a
context that provides the explanatory glue for the weather
understanding process. STORM-LK is intended also for use in
distance learning and collaboration, acceleration of the
acquisition of expertise, and knowledge preservation at the
organizational level. Evaluations and extensions of STORM-LK are currently underway.
CONCLUSION
Our understanding of the strengths and weaknesses of
alternative CTA/CFR methods is becoming more refined, as is
our understanding that knowledge elicitation is one part of a
larger process of co-creative system design and evaluation
(see Hoffman & Woods, 2000; Hollnagel & Woods, 1983;
Potter, Roth, Woods, & Elm, 2000; Rasmussen, 1992;
Vicente, 1999), a larger process that embraces both the science
and aesthetics of the design of complex cognitive systems.
However, there remains a need for more work along these
lines, especially including studies in domains of expertise
having characteristics that differ from those of the domains
that have been studied to date. Additional KE methods can be
examined as well.
Footnotes
1. To be sure, other researchers might have identified leverage points
other than the ones we identified.
2. We can note also that leverage point affirmation also took the form
of concrete action on the basis of our recommendations. For instance,
the physical layout of the watchfloor was changed.
References
Ausubel, D. P., Novak, J. D., & Hanesian, H. (1978).
Educational psychology: A cognitive view (2nd ed.). New York:
Holt, Rinehart and Winston.
Buckley, B. W., & Leslie, L. M. (2000). The Australian
Boxing Day storm of 1998: Synoptic description and numerical
simulations. Weather & Forecasting, 16, 543-558.
Burton, A. M., Shadbolt, N. R., Hedgecock, A. P., & Rugg,
G. (1987). A formal evaluation of knowledge elicitation techniques
for expert systems: Domain 1. In D. S. Moralee (Ed.), Research and
development in expert systems, Vol. 4 (pp. 35-46). Cambridge:
Cambridge University Press.
Chi, M. T. H, Feltovich, P. J., & Glaser, R. (1981).
Categorization and representation of physics problems by experts and
novices. Cognitive Science, 5, 121-152.
Chi, M. T. H., Glaser, R., & Farr, M. J. (Eds.) (1988). The
nature of expertise. Mahwah, NJ: Erlbaum.
Cooke, N. J. (1994). Varieties of knowledge elicitation
techniques. International Journal of Human-Computer Studies, 41,
801-849.
Duda, R. O., & Shortliffe, E. H. (1983). Expert systems
research. Science, 220, 261-268.
Evans, A. W., Jentsch, F., Hitt, J. M., Bowers, C., & Salas,
E. (2001). Mental model assessments: Is there convergence among
different methods? In Proceedings of the Human Factors and
Ergonomics Society 45th Annual Meeting, (pp. 293-296). Santa
Monica, CA: Human Factors and Ergonomics Society.
Glaser, R., Lesgold, A., Lajoie, S., Eastman, R., Greenberg,
L., Logan, D., Magone, M., Weiner, A., Wolf, R., & Yengo, L.
(1985). Cognitive task analysis to enhance technical skills training
and assessment. Report, Learning Research and Development
Center, University of Pittsburgh, Pittsburgh, PA.
Hoffman, R. R. (1987, Summer). The problem of extracting
the knowledge of experts from the perspective of experimental
psychology. The AI Magazine, 8, 53-67.
Hoffman, R. R. (Ed.). (1992). The psychology of expertise:
Cognitive research and empirical AI. New York: Springer Verlag.
Hoffman, R. R., Coffey, J. W., & Carnot, M. J. (2000,
November). Is there a "fast track" into the black box?: The Cognitive
Models Procedure. Poster presented at the 41st annual meeting of the
Psychonomic Society, New Orleans, LA.
Hoffman, R. R., Crandall, B., & Shadbolt, N. (1998). A
case study in cognitive task analysis methodology: The Critical
Decision Method for the elicitation of expert knowledge. Human
Factors, 40, 254-276.
Hoffman, R. R., Shadbolt, N., Burton, A. M., & Klein, G.
A. (1995). Eliciting knowledge from experts: A methodological
analysis. Organizational Behavior and Human Decision Processes,
62, 129-158.
Hoffman, R. R., & Woods, D. D. (2000). Studying
cognitive systems in context. Human Factors, 42, 1-7.
Hollnagel, E. & Woods, D. D. (1983). Cognitive Systems
Engineering: New wine in new bottles. International Journal of Man-Machine Studies, 18, 583-600.
Novak, J. D. (1998). Learning, creating, and
using knowledge. Mahwah, NJ: Erlbaum.
Potter, S. S., Roth, E. M., Woods, D. D., & Elm, W. C.
(2000). Bootstrapping multiple converging cognitive task analysis
techniques for system design. In J. M. Schraagen & S. F. Chipman
(Eds.), Cognitive task analysis (pp. 317-340). Mahwah, NJ: Erlbaum.
Rasmussen, J. (1992). Use of field studies for design of
workstations for integrated manufacturing systems. In M. Helander &
N. Nagamachi (Eds.), Design for manufacturability: A systems
approach to concurrent engineering and ergonomics (pp. 317-338).
London: Taylor and Francis.
Shanteau, J. (1992). Competence in experts: The role of
task characteristics. Organizational Behavior and Human Decision
Processes, 53, 252-266.
Thordsen, M. L. (1991). A comparison of two tools for
cognitive task analysis: Concept Mapping and the Critical Decision
Method. In Proceedings of the Human Factors Society 35th Annual
Meeting (pp. 283-285). Santa Monica, CA: Human Factors Society.
Vicente, K. (1999). Cognitive work analysis: Toward safe,
productive, and healthy computer-based work. Mahwah, NJ:
Erlbaum.
Zsambok, C. E., & Klein, G. (Eds.) (1997). Naturalistic
decision making. Mahwah, NJ: Erlbaum.
Figure 1. A screen shot from STORM-LK showing a Concept Map.
Figure 2. A screen shot from STORM-LK showing example resources.