Explanation Support for the Case-Based Reasoning Tool myCBR

Daniel Bahls and Thomas Roth-Berghofer
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, 67663 Kaiserslautern, Germany, and
Department of Computer Science, University of Kaiserslautern,
P.O. Box 3049, 67653 Kaiserslautern
{daniel.bahls,thomas.roth-berghofer}@dfki.de
Abstract
Case-Based Reasoning, in short, is the process of solving new problems based on solutions of similar past problems, much like humans solve many problems. myCBR, an extension of the ontology editor Protégé, provides such similarity-based retrieval functionality. Moreover, the user is supported in modelling appropriate similarity measures by forward and backward explanations.
Case-Based Reasoning
Case-Based Reasoning (CBR), according to Aamodt and Plaza (1994), basically follows this pattern: One formulates a problem as a query case, and the repository of previously experienced problem-solution pairs (the case base) is ordered by similarity to the given query. The most similar cases are used to generate the solution for the posed problem. After a solution is retrieved, the new case (consisting of the new problem and the retrieved solution) is stored in the case base. This new experience can be used in the next retrieval: the CBR system learns.
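To make the retrieval step concrete, the following minimal Java sketch ranks a case base by descending similarity to a query. The Case record and the sim parameter are illustrative stand-ins for this sketch, not part of myCBR's actual API.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.function.BiFunction;

    public class RetrievalSketch {

        // A case is just a bag of attribute values in this sketch.
        record Case(Map<String, Object> attributes) {}

        // Orders the case base by descending similarity to the query.
        static List<Case> retrieve(Case query, List<Case> caseBase,
                                   BiFunction<Case, Case, Double> sim) {
            List<Case> ranking = new ArrayList<>(caseBase);
            ranking.sort(Comparator.comparingDouble(
                    (Case c) -> sim.apply(query, c)).reversed());
            return ranking;
        }
    }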
CBR systems’ knowledge can be divided into four knowledge containers (Richter 1995):
• Vocabulary This knowledge container is the basis for the
three other containers. It defines attributes and classes
for query and case descriptions. In object-oriented CBR
systems the vocabulary consists of numerical, symbolic,
plain text, and instance type attributes.
• Case Base This is the collection of previously experienced cases (traditional view) or products.
• Similarity Measure The degree of similarity between a query and a case is defined by metrics. Local similarity measures define similarities for each attribute. Global similarity measures, e.g., a weighted sum, minimum, or maximum, aggregate the local similarity values into one similarity value on each class level (see the sketch after this list).
• Adaptation Rules This container provides knowledge for
adapting the solution of a case to fit the query. This is
often realised with rules.
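To illustrate how a global measure aggregates local ones, here is a minimal Java sketch of a normalised weighted sum. The LocalSim interface and the map-based case representation are assumptions made for this example, not myCBR's classes.

    import java.util.Map;

    public class GlobalSimilarity {

        interface LocalSim { double sim(Object q, Object c); }

        // sim(q, c) = ( sum_i w_i * sim_i(q_i, c_i) ) / ( sum_i w_i )
        static double weightedSum(Map<String, Object> query,
                                  Map<String, Object> kase,
                                  Map<String, LocalSim> localSims,
                                  Map<String, Double> weights) {
            double num = 0.0, den = 0.0;
            for (Map.Entry<String, LocalSim> e : localSims.entrySet()) {
                double w = weights.getOrDefault(e.getKey(), 1.0);
                num += w * e.getValue().sim(query.get(e.getKey()),
                                            kase.get(e.getKey()));
                den += w;
            }
            return den == 0.0 ? 0.0 : num / den;
        }
    }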
Adaptation rules are outside the scope of this work. We concentrate on the support of similarity measure modelling.
The Open-Source Tool myCBR

myCBR¹ is an open-source plug-in for the open-source ontology editor Protégé². It follows in the footsteps of the integrated, Smalltalk-based CBR shell CBR-Works (Schulz 1999) with its rich point-and-click user interface. We also implemented a basic export interface for exchanging similarity measures with jColibri³, a powerful Java-based CBR framework that allows more complex reasoning.

In Protégé, users define classes and attributes in an object-oriented way. Protégé also manages instances of these classes, which we interpret as cases. The vocabulary and case base knowledge containers are thus already handled by Protégé.

The myCBR plug-in adds several similarity measure editors, which can be applied to the classes and attributes of an ontology. Its retrieval engine finds similar cases for a specified query. Additionally, CSV files can be imported, for which a simple similarity model is built automatically if none exists. A standalone retrieval engine allows for easy integration into other applications. Figure 1 shows a screenshot of some of the available editors.

Figure 1: Similarity measure editors of myCBR
¹ http://mycbr-project.net
² http://protege.stanford.edu/
³ http://gaia.fdi.ucm.es/projects/jcolibri/
For symbolic attribute types, similarity values for all possible attribute values are defined using a table. The columns are headed by the symbols defined in the ontology, as are the rows. The similarity value for a query q and a case c can then be found in row q at column c (see the lower half of Figure 1). This works fine for a small set of symbols; otherwise, the effort of setting up and maintaining such a table becomes cumbersome, since the number of entries grows quadratically with the number of symbols. Therefore, another editing mode, based on taxonomies, is available (see the right half of Figure 1). One can build a tree structure over the symbols in which the distance between symbols indicates their similarity.
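The following Java sketch illustrates one common taxonomy scheme, in which each inner node carries a similarity value and two symbols are as similar as the value stored at their deepest common ancestor. The Node structure is an assumption for this illustration and may differ from myCBR's internal representation.

    import java.util.HashSet;
    import java.util.Set;

    // Taxonomy-based similarity: collect the ancestors of the query
    // symbol, then walk up from the case symbol until a common ancestor
    // is found. Leaves typically carry the value 1.0, so sim(q, q) = 1.0.
    public class TaxonomySimilarity {

        static class Node {
            final String symbol;
            final double simValue;  // similarity assigned to this node
            final Node parent;      // null for the root
            Node(String symbol, double simValue, Node parent) {
                this.symbol = symbol;
                this.simValue = simValue;
                this.parent = parent;
            }
        }

        static double sim(Node q, Node c) {
            Set<Node> ancestors = new HashSet<>();
            for (Node n = q; n != null; n = n.parent) ancestors.add(n);
            for (Node n = c; n != null; n = n.parent) {
                if (ancestors.contains(n)) return n.simValue;
            }
            return 0.0;  // disjoint trees: treat as entirely dissimilar
        }
    }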
To define similarity measures for numerical attributes, some simplification is helpful. In order to offer an easy-to-use interface (see the left half of Figure 1), we reduce the dimensionality of the similarity function sim(q, c) by calculating either the quotient q/c or the difference q − c. This value then serves as the parameter of a helper function h : D → [0, 1], whose graph is editable by pointing and clicking. Note that the domain D is [min − max, max − min] in difference mode and [min/max, max/min] in quotient mode, where min and max are the range limits of the numerical attribute. Obviously, the quotient mode is only applicable to numerical attributes whose value range does not contain zero.
This is just a selection of available local similarity measures, for which explanation support exists.
Explanation Support in myCBR

Explanations, in principle, are answers to questions. In this paper we concentrate on questions the knowledge engineer might have during similarity measure modelling (see Roth-Berghofer (2004) for more details on explanation sources in CBR systems). myCBR provides two kinds of explanations: forward and backward explanations (Richter & Roth-Berghofer 2007). Forward explanations explain indirectly by showing different ways to optimise a given result. They open up possibilities for the exploratory use of a device or application. Backward explanations explain the result of a process and how it was obtained. Here, we provide a way to understand the results of myCBR's similarity calculation (backward explanations) and to explore the case base contents (forward explanations).

Backward Explanations After the vocabulary has been set up, some cases have been injected into the case base, and similarity measures have been defined, the CBR system is ready for retrieval. For a query, the system delivers a ranking of the case base. But since the model may be complex, the retrieval result may be quite surprising and in need of some explanation. To increase transparency, myCBR creates an explanation object for each case during retrieval. This tree-like data structure stores global and local similarity values as comments for each attribute. These retrieval details are presented to the user as tool tips.
Another valuable feature is the option to find the most similar cases with respect to a single attribute within this ranking. If the user wants to know why the case with the highest similarity for a certain attribute is not among the best five regarding total similarity, he or she should use this option and examine the case's remaining attributes for their similarity to the query.
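A minimal Java sketch of such an explanation object might look as follows; the class and field names are illustrative assumptions, not myCBR's actual data structures.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Tree-like explanation object: for one retrieved case it records the
    // global similarity and, per attribute, the local similarity together
    // with a comment that a GUI could show, e.g., as a tool tip.
    public class CaseExplanation {

        static class AttributeExplanation {
            final double localSimilarity;
            final String comment;  // human-readable retrieval detail
            AttributeExplanation(double localSimilarity, String comment) {
                this.localSimilarity = localSimilarity;
                this.comment = comment;
            }
        }

        final String caseId;
        final double globalSimilarity;
        final Map<String, AttributeExplanation> attributes =
                new LinkedHashMap<>();

        CaseExplanation(String caseId, double globalSimilarity) {
            this.caseId = caseId;
            this.globalSimilarity = globalSimilarity;
        }
    }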
Forward Explanations While developing a CBR system, an important question is: does the similarity measure lead to the appropriate cases for a given query? Forward explanations predict the behaviour of the system at modelling time and explain the interdependencies between similarity measure and case base. For this, a central explanation component analyses the case base and gathers statistics. It caches the value distribution of each attribute and makes it available to peripheral explanation components. The value distribution itself can already be quite helpful: it may reveal parts of a similarity measure that are in fact never used, or the absence of a border case, which is important for the treatment of exceptions.

But we want to go a little further. As soon as we have a history of submitted queries, we can obtain a distribution of (q, c) pairs and examine which parts of the similarity measures are used often. Thus, we can find out which parts of the similarity measures are of high or low relevance. When setting up a CBR system, no such history exists yet. But, assuming that the value distribution in the case base equals the value distribution of real queries, one can build up a distribution as described above. Although this assumption may be critical, it still delivers useful information.
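The statistics gathering can be pictured with the following Java sketch, which counts how often each value of one attribute occurs in the case base. The map-based case representation is again an assumption made for illustration; under the assumption stated above, these counts can also approximate the query distribution.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Value distribution of one attribute over the case base, as gathered
    // by the central explanation component in this sketch.
    public class ValueDistribution {

        static Map<Object, Integer> distribution(
                List<Map<String, Object>> caseBase, String attribute) {
            Map<Object, Integer> counts = new HashMap<>();
            for (Map<String, Object> c : caseBase) {
                counts.merge(c.get(attribute), 1, Integer::sum);
            }
            return counts;
        }
    }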
Conclusion

CBR systems need to become easier to set up and maintain. For that, in our view, their inner workings must be easier to comprehend. myCBR is intended to be an integrated, yet open experimentation platform for improved communication between a complex information system and its users.
Acknowledgements We thank our colleagues Armin Stahl and Andreas Rumpf for their help in realising this work, which has been funded in part by the federal state of Rhineland-Palatinate in the project ADIB (Adaptive Provision of Information).
References
Aamodt, A., and Plaza, E. 1994. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications 7(1):39–59.
Richter, M. M., and Roth-Berghofer, T. 2007. Explanation, information and utility. Unpublished.
Richter, M. M. 1995. The knowledge contained in similarity measures. Invited talk at ICCBR’95, Sesimbra, Portugal.
Roth-Berghofer, T. R. 2004. Explanations and case-based reasoning: Foundational issues. In Funk, P., and González-Calero, P. A., eds., Advances in Case-Based Reasoning, 389–403. Springer-Verlag.
Schulz, S. 1999. CBR-Works: A state-of-the-art shell for case-based application building. In Melis, E., ed., Proceedings of GWCBR’99, Würzburg, Germany, 166–175. University of Würzburg.