
From: AAAI Technical Report WS-02-11. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Ontology Analysis for the Semantic Web
Mala Mehrotra
Pragati Synergetic Research, Inc.
922 Liberty Ct.
Cupertino, CA 95014
mm@pragati-inc.com
• retrospective analysis aid (by providing a
hindsight mechanism to reevaluate the efficacy
of the declared ontological commitments in light
of their actual usage) (Mehrotra 1996)
• discovery mechanism for reusable software
patterns in the ontology and knowledge base (by
discovering recurring software patterns in the
ontology and knowledge base) (Mehrotra et al.
1998)
• meta-property annotation aid (for noting subtle
and intricate relationships across various concept
terms in the ontology), and
• merging and alignment aid across multiply-authored
ontologies and knowledge bases (by
extracting usage patterns in axiom clusters
having common functionality)
In this paper, we will highlight our experiences with
applying the MVP-CA tool to the analysis of various
knowledge bases and their ontologies. The analysis is
performed with a view to extracting subtle inter-concept
relationships as well as reusable usage patterns. In the
Rapid Knowledge Formation (RKF) project for DARPA
(DARPA 2000), our efforts are currently focused on
providing an infrastructure for efficient analysis of
knowledge bases in the Shaken (SRI-led team) and
Kraken (Cycorp-led team) systems. Both the Shaken and
Kraken systems provide frameworks for enabling subject
matter experts (SMEs) to author knowledge bases. The
former takes a component-driven approach, founded on
the belief that a handful of generic core components in the
base ontology should suffice as building blocks for
complex knowledge authoring tasks. Kraken strives to
enable SMEs to author knowledge bases by axiomatizing
various types of common-sense knowledge in its upper
ontology, the Integrated Knowledge Base (IKB). We
have applied MVP-CA analysis to the core KB of Shaken
as well as to Cyc’s IKB. In another independent project
for ONR, we have analyzed ontologies for multi-agent
command and control knowledge-based systems designed
to support Navy and Marines logistics operations. We will
be sharing some of our experiences with all these systems
in this paper.
We expect that the emerging DAML/RuleML
framework for the Semantic Web will encounter many
similar issues of comprehension, reliability and reuse
which we expose through the MVP-CA tool; hence, a
similar analysis would be applicable. We believe that the
annotation frameworks for the semantic web should have
the necessary support infrastructure to capture the types of
Introduction
Ontology engineering will become an increasingly
important discipline as ontologies scale up on the
semantic web. The enabling mechanism for the semantic
web will undoubtedly lie in the construction of ontologies
to address the diversity of web services. For such a
dynamic environment to be realized, the underlying
ontological infrastructure will need to be extremely
adaptable and reliable (Everett and Bobrow 2002). In
addition, since sizeable effort will be spent specifying
these ontologies for various domains, it is important that
they be built and maintained cost-effectively. A high-level
and domain-independent approach is needed to annotate
these ontologies appropriately, so that they can be
maximally utilized. Smart tools are required that allow
developers to
• familiarize themselves rapidly with the terms and
concepts in the ontology for a knowledge
base (KB),
• exploit and reuse preexisting knowledge
(Chaudhri et al. 2000) through intelligent
analysis, and
• merge and align concepts across different
knowledge bases reliably and efficiently.
Knowledge comprehension can be aided by graphical
browsing and editing tools (Paley et al. 1997; Fikes et al.
1997), and ontology merging tools such as Chimaera
(McGuinness et al. 2000) and PROMPT (Noy and Musen
2000). However, most such tools are limited to taxonomic
knowledge. In particular, most tools do not support the
comprehension of collections of rules in the context of
their usage. Rather, they provide only a single viewpoint
of a system and do not focus on software engineering
issues, such as comprehension, maintenance, management
and verification, for collections of rules.
The Multi-ViewPoint Clustering Analysis (MVP-CA)
technology, developed by Pragati Inc., provides an
infrastructure for ontology analysis and knowledge base
evaluation by clustering the axioms that utilize the
ontology (Mehrotra 1995). This type of clustering allows
the terms and concepts in a knowledge base to be exposed
in the context of their usage (as opposed to the context of
their declaration). Given this approach for analysis, the MVP-CA tool can be especially useful to ontological engineers
as a:
• navigational aid (by familiarizing them rapidly
with the knowledge base artifacts in their
situated contexts)
clusters. Graphical representations of the clustering
process, such as dendrograms, further aid the user in
establishing links across various concept terms in the
knowledge base. In addition, the tool provides several
views of the clusters at the pattern, rule, and cluster levels
to aid the user in identifying the relevant clusters. We
currently have a limited repertoire of automatic detection
routines for flagging clusters that are relevant according
to common analysis goals.
information we are about to discuss. Usually we find that
pseudo-partitions—that is, islands of concepts that mostly
refer to each other and only occasionally to concepts
outside the island—appear as a KB evolves. Clustering
offers a very effective way to identify these candidates for
a partitioning and annotation mechanism. It is very
important to understand trends of usage patterns as
ontologies evolve on the semantic web, so as to enable their
reuse and sharing in an effective manner.
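The notion of a pseudo-partition can be made concrete with a simple cohesion measure. The following is a minimal sketch, not part of the MVP-CA tool: it assumes the KB has already been reduced to a concept reference graph, and the function name `island_cohesion` and the toy graph are hypothetical illustrations that merely borrow concept names from the knowledge bases discussed in this paper.

```python
def island_cohesion(references, island):
    """Fraction of references made by concepts in `island` that stay inside it."""
    island = set(island)
    internal = external = 0
    for concept in island:
        for target in references.get(concept, ()):
            if target in island:
                internal += 1
            else:
                external += 1
    total = internal + external
    return internal / total if total else 0.0


# Hypothetical reference graph: concept -> concepts its axioms mention.
references = {
    "Place": ["Spatial-Entity", "Place"],
    "Spatial-Entity": ["Place", "Spatial-Entity"],
    "Move": ["Spatial-Entity", "Place", "Action"],
    "Action": ["Event"],
}

spatial_island = {"Place", "Spatial-Entity", "Move"}
cohesion = island_cohesion(references, spatial_island)
print(f"cohesion of candidate island: {cohesion:.2f}")
```

A candidate island whose cohesion is close to 1 refers mostly to itself and only occasionally to outside concepts, which is the property described above; any threshold for accepting a partition would be a choice left to the analyst.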
In the next section we provide a brief overview of the
MVP-CA approach. In the Experimental Results section,
we present three broad categories of reusable ontological
information, discovered in the course of our analysis. In
the final section we conclude with some of the open
issues that need to be addressed with respect to extending
our work for web ontology markup.
Experimental Results
Clustering through the MVP-CA tool has exposed a range
of potential reuse opportunities in the knowledge bases
analyzed. Through our analysis we show how higher-level
axioms can be developed for the ontological concepts in
these knowledge bases. Formation of higher-level axioms
can encapsulate valuable meta-information for the
concept terms and their usage patterns in a very succinct
manner. We have categorized this meta-information into
three broad classes: templates, reified meta-concepts and
clichés. Before we discuss these categories in detail, we
first briefly describe the three knowledge bases used
in our experiments, to give the reader the
context in which our analysis was performed.
The core knowledge base in KM (Knowledge Machine)
(Clark and Porter 1999), developed at University of Texas
at Austin, contains a relatively wide range of concepts and
forms the ontological basis for knowledge entry in RKF’s
Shaken system. It forms the backbone ontology of Shaken
for development of pump-priming as well as SME-authored knowledge. The bulk of concepts in the core KB
deal with spatial entities, relationships, and events. There
are also concepts that address agent actions, biological
organisms, etc. Most concepts have one axiom that
defines their “owns” slots (properties of the class itself)
and a second axiom that defines their “member” slots
(instance properties). Many concepts also have additional
axioms that describe supplementary information such as
text generation and test cases.
The CYC IKB provides the base ontology for RKF’s
Kraken system. We analyzed the spatial slice of Cycorp's
IKB (IKB is a subset of the Cyc Knowledge Base licensed
to the participants of DARPA's RKF program) which has
been developed over several years (Lenat and Guha
1989). In the course of this development, it has evolved
considerably. We clustered these rules several times using
different parameters in order to analyze the axiom set
from multiple perspectives. Higher level concept groups,
such as Orientation, Containment, Portals, etc. emerged
from the merging of closely related smaller clusters. It is
interesting to note that formal investigations of spatial
representation have also identified similar key concepts
including topology, mereology, orientation, distance, size,
and shape. There has been substantive work on each of
these aspects of spatial representation (Forbus 1984; Cohn
and Hazarika 2001), and numerous formal models of
spatial representation are available. None of the existing
Multi-ViewPoint Clustering Analysis
(MVP-CA) Approach
Pragati’s Multi-ViewPoint Clustering Analysis (MVP-CA) tool facilitates analysis of knowledge-based systems
by clustering KB rules that share significant common
properties. It exposes ontology developers to the
semantics of a knowledge-based system through semi-automatic clustering of its rules.
The MVP-CA tool consists of three stages: parsing,
cluster generation, and cluster analysis. In the parsing
phase, a front-end parser translates a knowledge base’s
axioms from their original form into a language-independent representation. The user can specify
numerous rule and pattern transformations in this stage to
eliminate noise in the data. The cluster generation phase
applies a hierarchical agglomerative clustering algorithm
to the transformed rules. During each iteration, the two
most similar clusters merge to form a new cluster. The
order of merging forms a hierarchy of clusters. Similarity
between rules is defined by a set of heuristic distance
metrics (Mehrotra and Wild 1995), which the user
chooses based on the nature of the task performed by the
rule base (e.g., classification, diagnosis, control, etc.)
(Chandrasekharan 1986). These heuristic-based distance
metrics have evolved from our experiences with different
types of knowledge bases. In the cluster analysis phase,
the user interacts with the tool to pinpoint the relevant
clusters from the generated pool of clusters. A cluster’s
relevance depends entirely on the objective of the
analysis. However, the tool does provide some automatic
support for common analysis goals, such as flagging
clusters whose rules contain very similar clauses.
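The cluster generation phase can be illustrated with a small sketch of hierarchical agglomerative clustering. This is an assumption-laden illustration rather than the MVP-CA implementation: the Jaccard distance over concept terms and the average-linkage rule are stand-ins for the heuristic distance metrics of (Mehrotra and Wild 1995), and the rule identifiers and term sets are hypothetical inputs loosely modeled on the Cyc rules discussed later in this paper.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two rules, each given as a set of concept terms."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_distance(c1, c2, terms):
    """Average pairwise rule distance between two clusters (average linkage)."""
    total = sum(jaccard_distance(terms[r1], terms[r2]) for r1 in c1 for r2 in c2)
    return total / (len(c1) * len(c2))


def agglomerate(terms):
    """Repeatedly merge the two closest clusters; return the order of merges."""
    clusters = [frozenset([rule]) for rule in sorted(terms)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], terms))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append(a | b)
    return merges


# Hypothetical parsed rules: rule id -> concept terms used by the rule.
terms = {
    "rule98": {"suspendedIn", "physicalStructuralAttributes", "Pourable"},
    "rule140": {"in-ImmersedFully", "physicalStructuralAttributes", "Pourable"},
    "rule160": {"in-ContFullOf", "physicalStructuralAttributes", "Pourable"},
    "ruleX": {"is-near", "abuts"},
}

merges = agglomerate(terms)
for step, cluster in enumerate(merges, start=1):
    print(step, sorted(cluster))
```

The recorded merge order is the cluster hierarchy; in this toy input the three rules sharing `Pourable` merge before the unrelated rule joins them at the root.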
In its primary role as a comprehension aid, MVP-CA
provides support for homing in on clusters that provide
insight into the important conceptual regions in the
knowledge base. In order to identify such seed concepts,
the tool generates information about patterns and clusters,
such as pattern frequency, cluster size, the dominant
patterns of a cluster, etc. The user can utilize this
information to assess the quality and relevance of the
formal models, however, combine these different aspects
of spatial representation into one theory as done in Cyc.
The IMMACCS (Integrated Marine Multi-Agent
Command and Control System) system, developed by the
CAD Research Center (Pohl et al. 1999), provides near
real-time decision support for military command and
control personnel in the form of enhanced situation
awareness. Critical to achieving this goal is the
IMMACCS Object Model, an ontology that represents
relevant objects in terms of their behavioral
characteristics and relationships to other objects. The
agent engine takes charge of the dynamic problem-solving aspects in the environment and generates the
desired views of the battle space to support the planning
and training activities. The informational aspects of the
objects are thus separated from the logic aspects of the
system. The FIRES agent, for example, responds to “Call
for Fire” messages in the system. In response to such a
message its purpose is to select the best weapon based on
availability, deliverability and acceptability. To
accomplish this goal it accesses concepts such as range,
time of flight, target type, urgency, circular error of
probability (CEP), effective casualty rate (ECR),
availability and rules of engagement (ROE) concepts
from the IMMACCS object model. The deconfliction
rules in the FIRES agent also address the trajectory of the
munitions relative to the position of other friendly assets
and infrastructure objects.
(every Spatial-Entity has
(location ((must-be-a Place)))
(is-outside ((must-be-a Spatial-Entity)))
(does-not-enclose ((must-be-a Spatial-Entity)))
(is-inside ((must-be-a Spatial-Entity)))
(encloses ((must-be-a Spatial-Entity)))
(abuts ((must-be-a Spatial-Entity)
(excluded-values (Self))))
(is-above ((must-be-a Spatial-Entity)))
(is-below ((must-be-a Spatial-Entity)))
(is-along ((must-be-a Spatial-Entity)))
(is-at ((must-be-a Spatial-Entity)))
(is-behind ((must-be-a Spatial-Entity)))
(is-in-front-of ((must-be-a Spatial-Entity)))
(is-on ((must-be-a Spatial-Entity)))
(has-on-it ((must-be-a Spatial-Entity)))
(is-opposite ((must-be-a Spatial-Entity)))
(is-over ((must-be-a Spatial-Entity)))
(is-under ((must-be-a Spatial-Entity)))
(is-near ((must-be-a Spatial-Entity)))
..)
Figure 1
structure in a higher-level axiom/rule, having open slots
to be instantiated with the appropriate bindings as slot
fillers. Templating thus reduces to basically being able to
parameterize a set of axioms or rules so that future
extensions or modifications of the knowledge base can
proceed in an intelligent, focused manner. In our
experience with various types of knowledge bases,
opportunities for template formation arise repeatedly,
regardless of the domain or the representation language.
Below we present an example from each of the three
knowledge bases analyzed recently. Some of the axioms
have been abbreviated to conserve space.
Templatization of Usage Patterns
Clustering often juxtaposes rules that share an overall
structural similarity, but differ at a small number of
variation points. There is generally a conceptual
coherence to these sets of axioms, which can be identified
and represented in terms of a template. Once the common
structure across rules has been recognized, the problem of
template formation is reduced to factoring out this
Templates Applied to Slot Propagation in Concepts:
Spatial-Entity, Place, and Move. The axiom cluster
(every Place has
(location ((exactly 0 Place)))
(is-near ((forall (the location-of of Self) (the is-near of It))))
(abuts ((forall (the location-of of Self) (the abuts of It))))
(is-above ((forall (the location-of of Self) (the is-above of It))))
(is-below ((forall (the location-of of Self) (the is-below of It))))
(is-along ((forall (the location-of of Self) (the is-along of It))))
(is-at ((forall (the location-of of Self) (the is-at of It))))
(is-at-of ((forall (the location-of of Self) (the is-at-of of It))))
(is-beside ((forall (the location-of of Self) (the is-beside of It))))
(is-between ((forall (the location-of of Self) (the is-between of It))))
(is-behind ((forall (the location-of of Self) (the is-behind of It))))
(is-in-front-of ((forall (the location-of of Self) (the is-in-front-of of It))))
(is-inside ((forall (the location-of of Self) (the is-inside of It))))
(encloses ((forall (the location-of of Self) (the encloses of It))))
(is-on ((forall (the location-of of Self) (the is-on of It))))
(has-on-it ((forall (the location-of of Self) (the has-on-it of It))))
(is-opposite ((forall (the location-of of Self) (the is-opposite of It))))
(is-outside ((forall (the location-of of Self) (the is-outside of It))))
(does-not-enclose ((forall (the location-of of Self) (the does-not-enclose of It))))
(is-over ((forall (the location-of of Self) (the is-over of It))))
(is-under ((forall (the location-of of Self) (the is-under of It))))
)
Figure 2
((every Move has
...
(destination ((must-be-a Spatial-Entity)))
...
(add-list
((if (has-value (the destination of Self))
then
(forall (the object of Self)
(:set
(:triple It is-near (the is-near of (the destination of Self)))
(:triple It abuts (the abuts of (the destination of Self)))
(:triple It is-beside (the is-beside of (the destination of Self)))
(:triple It is-between (the is-between of (the destination of Self)))
... )))))
(del-list
((forall (the object of Self)
(:set
(:triple It location (the location of It))
(forall2 (the is-near of It)
(if (not ((the is-near of (the destination of Self)) includes It2))
then (:triple It is-near It2)))
(forall2 (the abuts of It)
(if (not ((the abuts of (the destination of Self)) includes It2))
then (:triple It abuts It2)))
(forall2 (the is-beside of It)
(if (not ((the is-beside of (the destination of Self)) includes It2))
then (:triple It is-beside It2)))
(forall2 (the is-between of It)
(if (not ((the is-between of (the destination of Self)) includes It2))
then (:triple It is-between It2)))
... )))))
Figure 3
through our tool. Eighteen slots are common to all three axioms.
Each concept is either defined by or manipulates a long
list of shared slots such as is-near, abuts, etc.
These slots describe how a spatial entity is situated
relative to other spatial entities. Even though this is a
useful and appropriate way to represent such
relationships, there is a potentially serious KB
maintenance issue in that each of the slots must be
repeatedly dealt with every time one of the concepts, such
as Spatial-Entity, is specialized (Place) or
manipulated (Move). While the issue is not
overwhelming for just three concepts, it becomes more
serious as the knowledge base scales up and, perhaps,
other KBs are constructed on top of the core.
shown in Figures 1, 2 and 3 consists of the member
slot definition axioms for three concepts: Spatial-Entity, Place, and Move, respectively, in UT’s KM-core. Superficially, the connections between the three
concepts are not obvious, since they are not lexically
close. Since Place is a subclass of Spatial-Entity,
this relationship could, in principle, be obtained through
the taxonomic link. However, Move makes use of both
Spatial-Entity and Place. This connection is not
obvious from the ontological hierarchy, as Move is
derived from the Action class, whereas Spatial-Entity and Place are derived from the Entity class.
A close inspection of the definitions of these three
concepts reveals why the concepts clustered together
(<property> ((forall (the location-of of Self) (the <property> of It))))
Figure 4: Template 1
(:triple It <property> (the <property> of (the destination of Self)))
Figure 5: Template 2
(forall2 (the <property> of It)
(if (not ((the <property> of (the destination of Self)) includes It2))
then (:triple It <property> It2)))
Figure 6: Template 3
One possible way to deal, at least partially, with
repetitive slot references is to parameterize the frame
definitions. For example, the slots for Place that
specify its spatial relationships share a similar
characteristic: they propagate to all of the spatial entities
located in that Place. There is no significant difference
between how is-near propagates and how abuts
propagates. Most of the pre/post-conditions for Move
display parallel, though not identical, slot propagation
behavior.
For Place, a template can take the form of Template 1
in Figure 4.
To create a concrete axiom, one would bind the
<property> parameter to an appropriate property
name. By applying the template to the abuts property,
for instance, we would end up with the following:
If Spatial-Entity has slot with <property> then
Place has clause <Template 1> and
Move has clause <Template 2> in add-list
and clause <Template 3> in del-list
Figure 7: Higher Level Axiom
retrievable annotations so that this knowledge can be
reused.
A hindsight point to be noted here is that even for the
concepts in this cluster, several slot properties, such as
is-beside and is-between, are mentioned by
Place and Move but not by Spatial-Entity.
Move, for instance, refers to the is-between slot of
its destination, where destination is
constrained to Spatial-Entity rather than to
Place, and thus might not have the is-between slot
defined. Such violations in domain constraints are
commonplace in our experience. Clustering makes it very
easy to detect such differences across axioms as they
surface easily when we align the axioms for analysis.
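Once the axioms are aligned, this kind of check amounts to a set difference over slot names. The sketch below is a hypothetical illustration rather than a tool feature: the slot lists are transcribed from Figures 1 and 2 as printed, and because Figure 1 is abbreviated, a reported slot may simply be one that was elided from the figure rather than one missing from the KB.

```python
# Slots listed for Spatial-Entity in Figure 1 (the figure is abbreviated).
spatial_entity_slots = {
    "location", "is-outside", "does-not-enclose", "is-inside", "encloses",
    "abuts", "is-above", "is-below", "is-along", "is-at", "is-behind",
    "is-in-front-of", "is-on", "has-on-it", "is-opposite", "is-over",
    "is-under", "is-near",
}

# Slots listed for Place in Figure 2.
place_slots = {
    "location", "is-near", "abuts", "is-above", "is-below", "is-along",
    "is-at", "is-at-of", "is-beside", "is-between", "is-behind",
    "is-in-front-of", "is-inside", "encloses", "is-on", "has-on-it",
    "is-opposite", "is-outside", "does-not-enclose", "is-over", "is-under",
}


def undeclared_slots(used, declared):
    """Slots that a specialization uses but the parent concept never declares."""
    return sorted(used - declared)


missing = undeclared_slots(place_slots, spatial_entity_slots)
print("used by Place but not listed for Spatial-Entity:", missing)
```

Any slot reported this way is a candidate domain-constraint violation to be confirmed by inspecting the full axioms.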
For the most part we found that concepts in KM are
thoughtfully designed and would be applicable to a
variety of knowledge-based applications. However, even
the best-designed software systems inevitably encounter
issues as they scale up. At some point in the KB’s
evolution, this broad scope might become a limiting
factor. Grouping together many concepts from different
subject areas may make it more difficult for KB builders
to find and extend the most appropriate concepts. Some
method of grouping the knowledge base into more-or-less
discrete “chunks” can add significant value. By applying
MVP-CA technology to the UT core knowledge base, we
have discovered a number of clusters that can aid in
creating more reusable components, partitioning
knowledge where appropriate, and exposing errors.
(abuts ((forall (the location-of of Self)
(the abuts of It))))
From the concept of Move, Templates 2 and 3, as
shown in Figures 5 and 6, result.
A higher level axiom can be formed connecting the
concepts Spatial-Entity, Place, and Move
along the lines shown in Figure 7.
Of course, such templates wouldn’t be applicable to
every slot—there is obviously a need to give special
treatment to certain slots, such as location in Place.
However, templates would ensure more uniform
treatment of similar logic—both within a concept and,
perhaps more importantly, across concepts. The effort
then reduces to listing all the shared slots for each concept
that uses them.
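As a minimal sketch of this parameterization, the template text of Figures 4 and 5 can be instantiated by plain string substitution. The helper name `instantiate` and the short slot list are hypothetical; a real implementation would more likely operate on parsed KM expressions than on strings.

```python
# Template text copied from Figures 4 and 5; <property> is the open slot.
TEMPLATE_1 = "(<property> ((forall (the location-of of Self) (the <property> of It))))"
TEMPLATE_2 = "(:triple It <property> (the <property> of (the destination of Self)))"


def instantiate(template, prop):
    """Bind the <property> parameter of a template to a concrete slot name."""
    return template.replace("<property>", prop)


# Hypothetical excerpt of the shared slot list; location is deliberately
# excluded because it needs special treatment in Place.
shared_slots = ["is-near", "abuts", "is-above"]

place_clauses = [instantiate(TEMPLATE_1, slot) for slot in shared_slots]
move_add_clauses = [instantiate(TEMPLATE_2, slot) for slot in shared_slots]

for clause in place_clauses:
    print(clause)
```

With such a mechanism, adding a new spatial slot would mean extending one list rather than editing every axiom that propagates the slot.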
Templates might be implemented in a variety of ways,
ranging from language extensions to a simple user-interface mechanism. The important message is to be able
to record the connections across the concepts as
Rule # 98
(#$implies (#$suspendedIn ?OBJ ?FLU)
(#$physicalStructuralAttributes ?FLU #$Pourable))
Rule # 140
(#$implies (#$in-ImmersedFully ?OBJ ?FLU)
(#$physicalStructuralAttributes ?FLU #$Pourable))
Rule # 160
(#$implies (#$in-ContFullOf ?STUFF ?CONT)
(#$physicalStructuralAttributes ?STUFF #$Pourable))
Figure 8: Pourable Cluster in Cyc
(#$implies (<some-property> <obj1> <obj2>)
(#$physicalStructuralAttributes <obj1> <some-physical-structural-attribute>))
[from #160]
Figure 9: Template 1 from Pourable cluster
(#$implies (<some-property> <obj1> <obj2>)
(#$physicalStructuralAttributes <obj2> <some-physical-structural-attribute>))
[from #98, #140]
Figure 10: Template 2 from Pourable Cluster
Figure 11: Conflicts Cluster
therefore, that Pourable would be better represented as
a property of fluids, rather than being declared as a
physicalStructuralAttributes. It would make
more sense to structure the axioms so that if an object is
immersed in something we can first infer that the
something must be a fluid, and derive the fluid attributes
from there. Thus this cluster suggests an intermediate
concept of "fluid" or "fluid properties" in the
ontology. This would enable Pourable to participate
directly in concepts such as suspension and immersion
that are currently only tangentially related (through fluids)
to pourability. Such observations can be made only when
one sees concepts situated in the context of their usage, as
opposed to in the context of their declaration in the
ontology hierarchy.
Even when ontological engineers take great care to
define the right information at the right conceptual level
in the ontology, it is often the case that not all the
different aspects of object and problem definition can be
foreseen a priori in the forward engineering phase of the
project. Often certain subtle but important relationships
become evident through time, after studying the patterns
of data/information accesses in the system.
Applying Templates to Cyc Spatial Axioms: We list
here similar experiences in templatization when we
analyzed the spatial clusters from the Cyc IKB. In Figure
8 we present a cluster that shows several fluid-related
concepts, specifically those linked to "Pourable".
Upon close examination of the structural aspects of this
cluster where "physicalStructuralAttributes" seems to be a
dominant concept, rules 98, 140, and 160 naturally give
rise to one of templates 1 and 2 as shown in Figures 9 and
10. The second template is the same as the first, except
the argument order for <some-property> happens to be
reversed.
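The assignment of rules to the two templates can be sketched mechanically. The tuple encoding of the rules and the helper names below are assumptions made for illustration only; the rule content is taken from Figure 8.

```python
from collections import defaultdict

# Each rule: ((predicate, arg1, arg2), (subject of the consequent, attribute)).
rules = {
    98: (("suspendedIn", "?OBJ", "?FLU"), ("?FLU", "Pourable")),
    140: (("in-ImmersedFully", "?OBJ", "?FLU"), ("?FLU", "Pourable")),
    160: (("in-ContFullOf", "?STUFF", "?CONT"), ("?STUFF", "Pourable")),
}


def template_number(rule):
    """Template 1 if the consequent is about the first argument, else Template 2."""
    (_, arg1, _), (subject, _) = rule
    return 1 if subject == arg1 else 2


def ignored_argument(rule):
    """The antecedent argument that the consequent never mentions."""
    (_, arg1, arg2), (subject, _) = rule
    return arg2 if subject == arg1 else arg1


by_template = defaultdict(list)
for number, rule in sorted(rules.items()):
    by_template[template_number(rule)].append(number)

print(dict(by_template))
print({number: ignored_argument(rule) for number, rule in rules.items()})
```

Listing the ignored argument for each rule makes visible that one of the two arguments to the antecedent predicate plays no role in the conclusion.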
However, an argument to be made here is that the
clustering points to a possible flaw in the ontology design
for the KB, by showing that one of the arguments to
<some-property> is always ignored. The cluster shown
above is part of the Object Attributes cluster, where the
predicate term physicalStructuralAttributes
brings together various shapes asserted in the IKB slice
related to “sheets & corners” and
“LongAndThin”. However, the term Pourable falls
under the umbrella of object attributes even though it is
not really a shape concept, because the Cyc axioms assert
Pourable as a physicalStructuralAttribute,
just like LongAndThin, SheetShaped, Corner2d, etc. The rules are designed to find the fluid in a fluid-object or fluid-container relationship; they then conclude
some property about the fluid. This approach seems
overly specific: the fact that an object happens to have
some relation to the fluid (being immersed in it, for
instance) doesn't really make any difference to the
fundamental properties of the fluid. It is evident
Applying Templates to IMMACCS Axioms: In our
experience with the IMMACCS system, we discovered
that even though the rules in the agents are well-organized
in their respective groups, each rule is two to
three pages long, with many clauses repeated across rules.
Since each rule references many classes defined in the
object model, the knowledge base becomes very opaque
from the standpoint of human comprehension. In order to
understand each rule, one has to undergo multiple context
much wider scope of applicability for the different types
of conflict situations that can arise for weapon targeting.
switches. Many sets of rules are slight variations on a
base concept, implying a need for some sort of factoring:
either by breaking up the rules themselves into
shared/unshared pieces, or creating a new superclass in
the ontology to represent the similarities in the objects
manipulated by the rules, or both.
One IMMACCS cluster dealing with firing conflicts
contains two parallel sets of rules that describe two
possible sources of conflict: buildings and rotary wings.
Since the rules are very long we list here just the rule
names. The two subclusters under the Conflicts cluster
are Conflict-due-to-blocking-Building, with rule names:
Structure_Trajectory_Weapon,
Structure_Trajectory_Entity,
Structure_Trajectory_Platform
and Conflict-due-to-blocking-Rotary-Wing, with rule
names:
RotaryWing_Trajectory_Weapon,
RotaryWing_Trajectory_Entity,
RotaryWing_Trajectory_Platform.
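The parallel structure of the two subclusters is already visible in the rule names. The sketch below factors the names into a shared part and a varying part; it is only an illustration of the factoring idea, the function name `factor_rule_names` is hypothetical, and it assumes the naming convention seen in the six names above (blocking object first, then an underscore).

```python
from collections import defaultdict

rule_names = [
    "Structure_Trajectory_Weapon",
    "Structure_Trajectory_Entity",
    "Structure_Trajectory_Platform",
    "RotaryWing_Trajectory_Weapon",
    "RotaryWing_Trajectory_Entity",
    "RotaryWing_Trajectory_Platform",
]


def factor_rule_names(names):
    """Group rules by shared suffix; the varying prefix is the blocking object."""
    groups = defaultdict(list)
    for name in names:
        blocker, shared = name.split("_", 1)
        groups[shared].append(blocker)
    return dict(groups)


factored = factor_rule_names(rule_names)
for shared, blockers in factored.items():
    print(f"<blocking-object>_{shared}  instantiated with {blockers}")
```

Each shared suffix corresponds to one candidate base rule, and each prefix to one object with which that base rule would be instantiated.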
In Figure 11, the green nodes represent concepts from the
Object Model that are referenced by rules in both
subclusters. The greyed objects in the figure are the
concepts that are referenced by only one of the two
subclusters for Conflict. The concept in red is the label we
have assigned to the cluster based on our analysis. Our
proposed change was to formulate a more general concept
of Conflict-due-to-blocking-object in the object model
ontology, formulate a templatized base rule for this and
instantiate the base rule with the objects, rotary wing or
building, as and when the need arises. Details for this can
be obtained from (Mehrotra and Bobrovnikoff 2001).
Abstracting rules to this level of generality provides a
Reification
The purpose of reification in knowledge-based systems is
to represent and store certain useful behavior
patterns in the system in a parsimonious manner for easy
recall and reuse. In this spirit, by clustering similar
axioms/rules together, the MVP-CA tool exposes regions in
the knowledge base that can be identified as possible
candidates for reifiable concepts. We distinguish this
exercise from the previous section’s templating exercise
by asserting that reification involves conceptualizing and
representing a new term that encapsulates the intended
reusable nature of that concept. Templating and
reification nevertheless share the same origins in our
research, as both get flagged through the repetitious usage
of particular concept terms in the knowledge base.
Reifying Slot Propagation in KM-Core concepts:
Duplicate and Divide: A reification opportunity was
identified through the MVP-CA tool by studying the
various propagation modes of a certain set of slots in the
concepts Duplicate and Divide, as shown below.
Duplicate and Divide belong to different branches
of the ontology stemming from the base concept of
Action. Duplicate derives from the Create branch
of the Action concept, whereas Divide derives from the
Destroy branch. These two axioms were
brought together through clustering because they access
similar slots, such as material, age, animacy, area,
breakability, etc. These are valid properties to be
addressed in the development of certain concepts in cell
(every Divide has
(add-list ((:triple
Self
result
(an instance of (the instance-of of (the object of Self)) with
(material ((forall (the material of (the object of Self))
((an instance of (the instance-of of It))))))
;; age is smaller than the object's age
(animacy ((the animacy of (the object of Self))))
;; area is smaller than the object
(breakability ((the breakability of (the object of Self))))
...
) [Divide-add-1])
(:triple
Self
result
(an instance of (the instance-of of (the object of Self)) with
(material ((forall (the material of (the object of Self))
((an instance of (the instance-of of It))))))
;; age is smaller than the object's age
(animacy ((the animacy of (the object of Self))))
;; area is smaller than the object
(breakability ((the breakability of (the object of Self))))
...
Figure 12: Concept of Divide
(every Duplicate has
(object
((a Tangible-Entity)))
; An exact Duplicate of all the features of the object!
(add-list
((:triple
Self
result
(a Tangible-Entity with
(instance-of ((the instance-of of (the object of Self))))
; Duplicate certain relevant properties of the original
(material ((forall (the material of (the object of Self))
((an instance of (the instance-of of It))))))
(age
((the age of (the object of Self))))
(animacy
((the animacy of (the object of Self))))
(area
((the area of (the object of Self))))
(breakability ((the breakability of (the object of Self))))
...
Figure 13: Concept of Duplicate
<x> propagates to <class-object> using <mode-x> propagation
where
mode-x may take increasing, same or decreasing values
Figure 14: Reification of Propagation Mode
(for example, the area stays the same for Duplicate). A more
sophisticated representation could be sought here to
encode such propagation properties in the slot definitions.
Another important point to be noted is that templates,
as described in the previous section, could also be applied
here for capturing the various types of slot propagation.
However, we have chosen to demonstrate the alternative
approach of reifying attribute propagation which can
ensure that particular modes of slot propagation are
applied uniformly and can be reusable in different
situations. We can define a new family of “propagation”
artifacts that are responsible for propagating attributes in
various ways easily across different concepts.
biology, such as that of cell replication and cell-division.
In studying the two concepts above, we would like to
draw attention to the different modes in which slot
propagation takes place, rather than the slot properties
themselves. Thus for Duplicate and Divide, various
property attributes (material, age, animacy, ...)
need to be propagated from an original object either to
two new, smaller objects (Divide), as indicated by the
comments for Divide in Figure 12, or to an exact copy
of the original (Duplicate), as shown in Figure 13.
For both actions, the propagation takes place in a
relatively uniform manner. However, not every property
is propagated in precisely the same way. In the case of
Divide, for instance, some properties are propagated as-is
with straight propagation, such as animacy, breakability,
temperature, taste, and texture, while others
have decreasing propagation (in degree, size, etc.), such
as age, area, depth, height, and length, in the new
objects. Only a handful of propagation modes are
required in order to cover all types of required attribute
propagation for both Duplicate and Divide. The
important lesson is that the nature of such types of
attribute propagation can be reused in different
situations. A solution for recording such an observation is
to reify the various propagation modes and express them as
shown in Figure 14. The slot propagation mode then
needs to be annotated on the appropriate set of slots.
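To make the reification of Figure 14 concrete, the following is a minimal Python sketch rather than KM code; it is not part of the MVP-CA tool or of KM-core. The PropagationMode enumeration, the SLOT_MODES table, and the slots_with_mode helper are hypothetical names introduced for illustration; only the slot names and their modes follow the Divide and Duplicate examples discussed above.

```python
from enum import Enum


class PropagationMode(Enum):
    """Hypothetical reification of the three modes named in Figure 14."""
    INCREASING = "increasing"
    SAME = "same"
    DECREASING = "decreasing"


# Illustrative annotation of slots with a propagation mode per action.
# Slot names come from the text; the table itself is an assumption.
SLOT_MODES = {
    "Divide": {
        "animacy": PropagationMode.SAME,
        "breakability": PropagationMode.SAME,
        "age": PropagationMode.DECREASING,
        "area": PropagationMode.DECREASING,
    },
    "Duplicate": {
        "animacy": PropagationMode.SAME,
        "breakability": PropagationMode.SAME,
        "age": PropagationMode.SAME,
        "area": PropagationMode.SAME,
    },
}


def slots_with_mode(action, mode):
    """Return the sorted slot names that an action propagates with the given mode."""
    return sorted(slot for slot, m in SLOT_MODES[action].items() if m is mode)
```

Under these assumptions, querying Divide for the decreasing mode yields the age and area slots, while the same query for Duplicate yields nothing, since Duplicate copies every listed slot unchanged. Keeping the mode as a separate annotation means a new action only needs a new row in the table, not a new propagation mechanism.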
A side note on the ontological design issue that
clustering raised is that these specific slots should have
been placed within the objects that Divide and
Duplicate manipulate rather than in the actions
Divide & Duplicate. The problem is that one may
want to propagate a slot a particular way in one context
Reifying Continuous vs Discrete Value Propagation in
KM-Core concepts: A number of clusters found through
the MVP-CA tool contain rules that share identical or
very similar clauses. Upon close inspection of these
clusters we saw that the type of slot propagation
depends on whether the properties of the slots take
continuous values (expressed through greater-than,
less-than, or same-as concepts), as shown by
a representative axiom for Brightness-Value in Figure 16,
or discrete values (expressed by referring to the
categorical-constant-class), as shown by a representative
axiom for Color-Value in Figure 15 from the
clusters.
Other concepts that fall into the class of discrete slot
propagation values are, for example, Direction-Value,
Sex-Value, and Sentience-Value.
Brightness-Value has properties like less-than and
greater-than, which imply the need for continuous
values to be propagated through the axioms for concepts
that are nonetheless related by their usage, demonstrating
the value of clustering over simple pattern matching. The
cluster below deals with abstract spatial relationships such
as hulls, interior, and borders. The axioms in this
cluster concern functions that return the axis,
the interior, the hull, etc. of an object. The
operational commonality that brings these axioms
together is that the functions return the same regions, no
matter how many times they are applied.
Since the essence of the above axioms is to return the
same value for a unique function, regardless of the
number of times the function may be applied, one can
encapsulate the functionality by forming a new class of
functions called “UniqueFn” and have functions such as
“ConvexHullFn”, “InteriorFn”, etc. become
members of this class. The reified function defining all
unique functions can be expressed along the lines
shown in Figure 18.
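The property that UniqueFn is meant to capture, namely that applying the function to its own result changes nothing, can be illustrated with a minimal Python sketch. This is not CycL and makes no claim about how Cyc evaluates such functions; is_idempotent_on and bounding_interval are hypothetical names, and bounding_interval is only a one-dimensional stand-in for something like ConvexHullFn.

```python
def is_idempotent_on(fn, samples):
    """Spot-check that fn(fn(x)) == fn(x) for every sample (a test, not a proof)."""
    return all(fn(fn(x)) == fn(x) for x in samples)


def bounding_interval(points):
    """One-dimensional stand-in for a hull function: the smallest interval
    covering the given points, returned as a (low, high) tuple."""
    return (min(points), max(points))
```

Applied to sample point sets such as (3, 1, 2) or (0, 9, 4), the check succeeds for bounding_interval, because the interval of an interval is itself; it fails for a function such as incrementing a number, whose result keeps changing on reapplication.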
Cyc has recognized and implemented this aspect of
reusability by declaring these types of occurrences as
macropredicates. These are terse representations of
recurring and useful patterns in the Cyc axioms,
recognized generally in the development phase of Cyc.
However, through our analysis of existing Cyc axioms,
we can expose further opportunities for their formation
which may have been overlooked during the forward
engineering phase or can only be obtained in hindsight,
after the KB has evolved to a certain level.
(every Color-Value has
(color-of ((must-be-a Tangible-Entity)))
(value ((possible-values (the instances
of (the categorical-constant-class
of color)))))
(same-as ((must-be-a Color-Value))))
Figure 15: Axiom for Color-Value
(every Brightness-Value has
(brightness-of ((must-be-a Tangible-Entity)))
(less-than ((must-be-a Brightness-Value)))
(greater-than ((must-be-a Brightness-Value)))
(same-as ((must-be-a Brightness-Value))))
Figure 16: Axiom for Brightness-Value
like Brightness. Other concepts that fall under this
umbrella are Capacity-Value, Density-Value,
Depth-Value, Height-Value, etc.
These concepts may not appear as related in the declared
ontological hierarchy; however, there is an aspect
common to all of them that needs to be
recognized so that it can be exploited en masse when an
application needs to address it. Clustering exposed these
two classes of closely related slot property propagation
characteristics by having them appear in sibling clusters
from the MVP-CA tool. Reification of these concepts
would proceed by specifying discrete value propagation
versus continuous value propagation, and the appropriate
concepts would need to be annotated as such.
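As a rough illustration of the annotation step, the following Python sketch labels a value concept as continuous or discrete from the slots it declares. The slot sets are transcribed from Figures 15 and 16; the rule itself (an ordering slot such as less-than or greater-than implies continuous values) is a simplifying assumption made for this example, and DECLARED_SLOTS, ORDERING_SLOTS, and propagation_kind are hypothetical names.

```python
# Slots declared by each value concept, transcribed from Figures 15 and 16.
DECLARED_SLOTS = {
    "Color-Value": {"color-of", "value", "same-as"},
    "Brightness-Value": {"brightness-of", "less-than", "greater-than", "same-as"},
}

# Assumption: declaring an ordering slot signals continuous value propagation.
ORDERING_SLOTS = {"less-than", "greater-than"}


def propagation_kind(concept):
    """Label a concept 'continuous' if it declares an ordering slot, else 'discrete'."""
    if DECLARED_SLOTS[concept] & ORDERING_SLOTS:
        return "continuous"
    return "discrete"
```

With these two entries, Color-Value is labeled discrete and Brightness-Value continuous. The other concepts named above could be added to the table in the same way once their slot definitions are transcribed.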
Clichés
A cliché is a pattern that captures action semantics
applicable to multiple concepts (Barker 2000). Clichés arise
from the fact that certain classes of actions differ in the same
way in their slot definitions. If these differing
features can be extracted into clichés, new actions can be
defined in terms of these clichés or their combinations.
Reified Function in Cyc IKB’s Spatial Slice. Another
interesting cluster from the Cyc IKB,
shown in Figure 17, contains a wide mix of distinct terms
Rule # 12
(#$implies (#$and (#$termOfUnit ?CONVEXHULLFN (#$ConvexHullFn ?OBJECT))
(#$termOfUnit ?CONVEXHULLFN-1 (#$ConvexHullFn ?CONVEXHULLFN)))
(#$equals ?CONVEXHULLFN ?CONVEXHULLFN-1))
Rule # 56
(#$implies (#$and (#$termOfUnit ?INTERIORFN (#$InteriorFn ?INTERIORFN-1))
(#$termOfUnit ?INTERIORFN-1 (#$InteriorFn ?ANYOBJECT)))
(#$equals ?INTERIORFN ?INTERIORFN-1))
Rule # 125
(#$implies (#$and (#$termOfUnit ?BORDERBETWEENFN (#$BorderBetweenFn ?REG2 ?REG1))
(#$termOfUnit ?BORDERBETWEENFN-1 (#$BorderBetweenFn ?REG1 ?REG2)))
(#$equals ?BORDERBETWEENFN ?BORDERBETWEENFN-1))
Figure 17: Cluster of Reifiable Functions in Cyc
(#$implies (#$and (#$UniqueFn ?UNIQUEFN)
(#$termOfUnit ?UNIQUEFN-TERM-1 (?UNIQUEFN ?OBJECT))
(#$termOfUnit ?UNIQUEFN-TERM-2 (?UNIQUEFN ?UNIQUEFN-TERM-1)))
(#$equals ?UNIQUEFN-TERM-1 ?UNIQUEFN-TERM-2))
Figure 18: Reified UniqueFn
Discussion
(every Enter has-definition
(instance-of (Move-Into Reflexive-Cliche)))
We have shown through our analysis that important meta-properties
of a knowledge base are exposed when it is
abstracted, structured, and partitioned in a meaningful
manner. Such structuring can often reveal parallel
construction of axioms, leading to a higher level of
understanding about the ontological distinctions and
design choices in the system. Understanding the trends or
prototypical ways in which concept relationships have
been exercised in the axioms that utilize the ontology
allows us to take a fresh perspective on
ontology design issues. Often, studying the “alignable differences”
(Forbus 1984) among these axioms exposes infelicitous
knowledge entry patterns as well, some of which can be
traced back to ontology design issues.
Recognizing recurring patterns enables better
knowledge organization by suggesting ways to either
more optimally structure the knowledge base rules or
build higher-level axioms that capture knowledge about
such axiom clusters as meta-properties in the ontology. In
addition, discovery of usage patterns can also suggest
opportunities for componentization, leading to a higher
degree of reusability (Clarke and Porter 1997).
Structuring also reveals the actual context in which the
terms in an ontology have been used, by exposing the
other concept terms used in their vicinity.
As shown by our analysis, partitioning through the
MVP-CA tool provides several major benefits. Firstly, it
makes it easier for users to focus on concepts relevant to a
particular area of interest. Secondly, it enables the
representation of multiple perspectives on the same
knowledge. Thirdly, if an ontological engineer is aware
of such partitions in the KB, he/she may combat the urge
to over-generalize, which often results in complex,
bloated representations that attempt to cover every
possible application at the expense of modularity and
understandability. Finally, partitions can also support
greater inferencing efficiency, though this is
implementation-dependent.
We propose that the extraction, annotation and retrieval
of such information become part of ontological
engineering practice for the semantic web. Our research
has thus far focused on the extraction of such information
from knowledge bases. We would like to propose that the
annotations in DAML/RuleML provide the infrastructure
to express these meta-properties of an ontology so that the
• intended meaning of properties, classes and
relationships in an ontology becomes apparent,
• commitments about various design choices in the
ontology can be reevaluated and revamped as
ontologies evolve, and
• reusable regions in the ontology can be captured
either through templates, reification, or clichés,
so that they are easily and reliably extensible.
In the future, we will be exploring these issues in the
context of DAML/RuleML representations. We believe
that cluster-based analysis is an important tool for
(every Exit has-definition
(instance-of (Move-Out-Of Reflexive-Cliche)))
(every Reflexive-Cliche has-definition
(instance-of (Thing))
(agent ((the object of Self)))
(object ((the agent of Self))))
(every Reflexive-Cliche has
(agent ((exactly 1 Entity)
(the object of Self)))
(object (((the agent of Self)&(a Entity)))))
Figure 19: Reflexive Cliché declared in KM-core
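The effect of the reflexive cliché in Figure 19, where the agent and object slots of an action are constrained to hold the same entity, can be sketched in Python as follows. This is an illustrative analogue and not a translation of the KM semantics; the dictionary representation of an action and the names is_reflexive and apply_reflexive_cliche are assumptions made for this example.

```python
def is_reflexive(action):
    """True when the agent and object slots are both filled by the same entity."""
    agent = action.get("agent")
    return agent is not None and agent == action.get("object")


def apply_reflexive_cliche(action):
    """Return a copy of the action in which a missing agent or object slot
    is filled from the other one, mirroring the two-way constraint."""
    filled = dict(action)
    if filled.get("agent") is None:
        filled["agent"] = filled.get("object")
    if filled.get("object") is None:
        filled["object"] = filled.get("agent")
    return filled
```

For instance, an Enter action that states only its agent acquires the same entity as its object, which matches the reading that the agent moves itself. An action whose agent and object are already distinct is left unchanged and is not classified as reflexive.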
Annotating such information either in the inheritance
hierarchy in the ontology or as special operators to
generate new types of actions for a given class can be a
desirable feature for assuring reuse of concepts. UT has
taken some initial steps toward specifying clichés through
a forward-engineering approach; in a complement to their
efforts, we have been able to expose candidate patterns by
reverse-engineering existing KBs. Our analysis of the
KM-core was able to extract existing clichés, as well as
other action concepts with a potential for being
represented as clichés.
It is possible to implement clichés using a variety of
techniques, including the creation of special operators to
generate new types of actions and annotations of the
ontology.
Reflexive Cliché: Enter/Exit. The cluster in Figure 19
shows Reflexive-Cliche (the only cliché currently
defined in KM core) being used to automatically classify
instances of Enter and Exit. Reflexive-Cliche
means that the agent slot and the object slot for an
instance are the same—e.g., in the case of Enter, an
agent moves itself. This is defined in the following
manner in KM.
Inverse Cliché: Increase/Decrease; Come-Together/Disperse.
There are many more instances of
actions that have been discovered through our axiom
clusters which can fall under the category of clichés.
Increase & Decrease are concepts that can belong
to inverse classes and the clichés associated with them
need to address the change in degree, that is, greater-than
and less-than aspects of the concept.
Come-Together and Disperse could potentially
fall into near-inverse classes because, in addition to a
reversal of origin and destination slots, they incorporate
different sub-event classes in their definitions: Come-Together
has a Go-To sub-event, whereas Disperse
has a Leave sub-event.
Symmetric Cliché: Move-Together/Move-Apart.
Move-Together and Move-Apart involve a reversal
of origin and destination slots; hence, these
would fall under symmetric clichés.
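The reversal of origin and destination slots that characterizes these clichés can be sketched in Python as a single reusable operator. This is a hypothetical illustration, not KM code: the dictionary representation, the slot names as keys, and the function reverse_origin_destination are assumptions, and the place names in the usage note are invented.

```python
def reverse_origin_destination(action, new_type):
    """Return a new action of type new_type whose origin and destination
    slots are those of the given action, swapped. Other slots are copied."""
    reversed_action = dict(action)
    reversed_action["type"] = new_type
    reversed_action["origin"] = action.get("destination")
    reversed_action["destination"] = action.get("origin")
    return reversed_action
```

Given a Move-Together action from a place A to a place B, the operator produces a Move-Apart action from B to A while leaving the original untouched. Near-inverse pairs such as Come-Together and Disperse would need more than this swap, since their sub-events also differ.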
comprehending, maintaining, and improving KBs and
ontologies throughout their life cycles.
Acknowledgements
We deeply appreciate the support provided by Pragati’s
consultant Dmitri Bobrovnikoff in analyzing our results.
We are also extremely grateful to the staff scientists at
Cycorp, SRI, UT at Austin and CDM Technologies who
have provided us valuable insight about their systems.
Our special thanks to Pat Hayes (University of Florida)
and Vinay Chaudhri (SRI) for the very insightful
discussions we have had with them during this research.
This research was supported in part by the DARPA RKF
program under contract N66001-00-C-8019 and ONR
contract N00014-00-M-0205. We would like to thank
both Murray Burke, DARPA Program Manager and Dr.
Philip Abraham, ONR Program Manager for their
ongoing support of our work.
References
Barker, K. 2000. http://www.cs.utexas.edu/users/kbarker/working_notes/cliches.html. Updated August 2000.
Chandrasekaran, B. 1986. Generic tasks in knowledge-based reasoning: High-level building blocks for expert systems design. IEEE Expert, Fall 1986.
Chaudhri, V. K., Stickel, M. E., Thomere, J. F., Waldinger, R. J. 2000. Using Prior Knowledge: Problems and Solutions. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, 436-442. Menlo Park, CA: AAAI Press.
Clarke, P. and Porter, B. 1999. KM-The Knowledge Machine: Users manual. Technical Report, AI Lab, Univ. of Texas at Austin. http://www.cs.utexas.edu/users/mfkb/km.html
Clarke, P. and Porter, B. 1997. Building Concept Representations from Reusable Components. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 369-376. Menlo Park, CA: AAAI Press.
Cohn, A. G. and Hazarika, S. M. 2001. Qualitative Spatial Representation and Reasoning: An Overview. Fundamenta Informaticae 46 (1-2):1-29.
DARPA 2000. The Rapid Knowledge Formation Project: http://reliant.teknowledge.com/RKF/2000
Everett, J., Bobrow, D. G., et al. 2002. Making Ontologies Work for Resolving Redundancies across Documents. Communications of the ACM, February 2002, Vol. 45, No. 2, 55-60.
Fikes, R., Farquhar, A., and Rice, J. 1997. Tools for Assembling Modular Ontologies in Ontolingua. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 436-441. Menlo Park, CA: AAAI Press.
Forbus, K. 1984. Qualitative Process Theory. Artificial Intelligence, 24:85-168.
Guarino, N. and Welty, C. 2002. Evaluating Ontological Decisions with ONTOCLEAN. Communications of the ACM, February 2002, Vol. 45, No. 2, 61-65.
Guha, R. V. 1990. Micro-theories and Contexts in Cyc Part I: Basic issues. MCC Technical Report ACT-CYC-129-9, MCC.
Lenat, D. B. and Guha, R. V. 1989. Building Large Knowledge-based Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley.
McGuinness, D., Fikes, R., et al. 2000. The Chimaera Ontology Environment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, 1123-1124. Menlo Park, CA: AAAI Press.
Mehrotra, M. 1995. Requirements and Capabilities of the Multi-ViewPoint Clustering Analysis Methodology. In Notes for the IJCAI-95 Workshop on Verification, Validation and Testing of Knowledge-Based Systems, 49-56. Menlo Park, CA: AAAI Press.
Mehrotra and Bobrovnikoff 2001. MVP-CA Analysis for IMMACCS. ONR Final Report, March 2001.
Mehrotra, M. 1996. Application of Multi-ViewPoint Clustering Analysis to an Expert Systems Advocate Advisor. Technical Report FHWA-RD-97022, Federal Highway Administration, McLean, VA.
Mehrotra, M., Alvarado, S., and Wainwright, R. 1999. Laying a Foundation for Software Engineering of Knowledge Bases in Spacecraft Ground Systems. In Proceedings of FLAIRS-99 Conference, 73-77. Menlo Park, CA: AAAI Press.
Mehrotra, M. and Wild, C. 1993. Multi-View Point Clustering Analysis. In Proceedings of 1993 Goddard Conference on Space Applications of Artificial Intelligence, 217-231. Greenbelt, MD: NASA Conference Publications.
Mehrotra, M. and Wild, C. 1995. Analyzing Knowledge-Based Systems Using Multi-ViewPoint Clustering Analysis. Journal of Systems and Software 29:235-249.
Noy, N. F. and Musen, M. A. 2000. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment. In Proceedings of the AAAI-2000, 450-455. Menlo Park, CA: AAAI Press.
Paley, S. M., Lowrance, J. D., and Karp, P. D. 1997. A Generic Knowledge Base Browser and Editor. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 1045-1051. Menlo Park, CA: AAAI Press.
Pohl, J., Porczak, M. et al. 1999. IMMACCS: A Multi-Agent Decision-Support System. CAD Research Center. Design Institute Report: CADRU-12-99.