From: AAAI Technical Report WS-02-11. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Ontology Analysis for the Semantic Web

Mala Mehrotra
Pragati Synergetic Research, Inc.
922 Liberty Ct.
Cupertino, CA 95014
mm@pragati-inc.com

• retrospective analysis aid (by providing a hindsight mechanism to reevaluate the efficacy of the declared ontological commitments in light of their actual usage) (Mehrotra 1996)
• discovery mechanism for reusable software patterns in the ontology and knowledge base (by discovering recurring software patterns in the ontology and knowledge base) (Mehrotra et al. 1998)
• meta-property annotation aid (for noting subtle and intricate relationships across various concept terms in the ontology), and
• merging and alignment aid across multiply-authored ontologies and knowledge bases (by extracting usage patterns in axiom clusters having common functionality)

In this paper, we will highlight our experiences with applying the MVP-CA tool to the analysis of various knowledge bases and their ontologies. The analysis is performed with a view to extracting subtle inter-concept relationships as well as reusable usage patterns. In the Rapid Knowledge Formation (RKF) project for DARPA (DARPA 2000), our efforts are currently focused on providing an infrastructure for efficient analysis of knowledge bases in the Shaken (SRI-led team) and Kraken (Cycorp-led team) systems. Both the Shaken and Kraken systems provide frameworks for enabling subject matter experts (SMEs) to author knowledge bases. The former takes a component-driven approach, based on the belief that a handful of generic core components in the base ontology should suffice as building blocks for complex knowledge authoring tasks. Kraken strives to enable SMEs to author knowledge bases by axiomatizing various types of common-sense knowledge in its upper ontology, the Integrated Knowledge Base (IKB).
We have applied MVP-CA analysis to the core KB of Shaken as well as to Cyc’s IKB. In another independent project for ONR, we have analyzed ontologies for multi-agent command and control knowledge-based systems designed to support Navy and Marines logistics operations. We will be sharing some of our experiences with all these systems in this paper. We expect that the emerging DAML/RuleML framework for the Semantic Web will encounter many similar issues of comprehension, reliability and reuse which we expose through the MVP-CA tool; hence, a similar analysis would be applicable.

Introduction

Ontology engineering will become an increasingly important discipline as ontologies scale up on the semantic web. The enabling mechanism for the semantic web will undoubtedly lie in the construction of ontologies to address the diversity of web services. For such a dynamic environment to be realized, the underlying ontological infrastructure will need to be extremely adaptable and reliable (Everett and Bobrow 2002). In addition, since sizeable effort will be spent specifying these ontologies for various domains, it is important that they be built and maintained cost-effectively. A high-level and domain-independent approach is needed to annotate these ontologies appropriately, so that they can be maximally utilized. Smart tools are required that allow developers to
• familiarize themselves rapidly with the terms and concepts in the ontology for a knowledge base (KB),
• exploit and reuse preexisting knowledge (Chaudhri et al. 2000) through intelligent analysis, and
• merge and align concepts across different knowledge bases reliably and efficiently.

Knowledge comprehension can be aided by graphical browsing and editing tools (Paley et al. 1997; Fikes et al. 1997), and ontology merging tools such as Chimaera (McGuinness et al. 2000) and PROMPT (Noy and Musen 2000). However, most such tools are limited to taxonomic knowledge. In particular, most tools do not support the comprehension of collections of rules in the context of their usage. Rather, they provide only a single viewpoint of a system and do not focus on software engineering issues, such as comprehension, maintenance, management and verification, for collections of rules.

The Multi-ViewPoint Clustering Analysis (MVP-CA) technology, developed by Pragati Inc., provides an infrastructure for ontology analysis and knowledge base evaluation by clustering the axioms that utilize the ontology (Mehrotra 1995). This type of clustering allows the terms and concepts in a knowledge base to be exposed in the context of their usage (as opposed to the context of their declaration). Given this approach to analysis, the MVP-CA tool can be especially useful to ontological engineers as a:
• navigational aid (by familiarizing them rapidly with the knowledge base artifacts in their situated contexts)

clusters. Graphical representations of the clustering process, such as dendrograms, further aid the user in establishing links across various concept terms in the knowledge base. In addition, the tool provides several views of the clusters at the pattern, rule, and cluster levels to aid the user in identifying the relevant clusters. We currently have a limited repertoire of automatic detection routines for flagging clusters that are relevant according to common analysis goals.

We believe that the annotation frameworks for the semantic web should have the necessary support infrastructure to capture the types of information we are about to discuss. Usually we find that pseudo-partitions, that is, islands of concepts that mostly refer to each other and only occasionally to concepts outside the island, appear as a KB evolves. Clustering offers a very effective way to identify these candidates for a partitioning and annotation mechanism.
It is very important to understand trends of usage patterns as ontologies evolve on the semantic web, so as to enable their reuse and sharing in an effective manner.

In the next section we provide a brief overview of the MVP-CA approach. In the Experimental Results section, we present three broad categories of reusable ontological information discovered in the course of our analysis. In the final section we conclude with some of the open issues that need to be addressed with respect to extending our work for web ontology markup.

Experimental Results

Clustering through the MVP-CA tool has exposed a range of potential reuse opportunities in the knowledge bases analyzed. Through our analysis we show how higher-level axioms can be developed for the ontological concepts in these knowledge bases. Formation of higher-level axioms can encapsulate valuable meta-information for the concept terms and their usage patterns in a very succinct manner. We have categorized this meta-information into three broad classes: templates, reified meta-concepts and clichés. Before we discuss these categories in detail, we will first describe briefly the three knowledge bases used in our experiments so as to provide the reader with the context in which our analysis was performed.

The core knowledge base in KM (Knowledge Machine) (Clark and Porter 1999), developed at University of Texas at Austin, contains a relatively wide range of concepts and forms the ontological basis for knowledge entry in RKF’s Shaken system. It forms the backbone ontology of Shaken for development of pump-priming as well as SME-authored knowledge. The bulk of concepts in the core KB deal with spatial entities, relationships, and events. There are also concepts that address agent actions, biological organisms, etc. Most concepts have one axiom that defines their “owns” slots (properties of the class itself) and a second axiom that defines their “member” slots (instance properties).
Many concepts also have additional axioms that describe supplementary information such as text generation and test cases.

The CYC IKB provides the base ontology for RKF’s Kraken system. We analyzed the spatial slice of Cycorp's IKB (the IKB is a subset of the Cyc Knowledge Base licensed to the participants of DARPA's RKF program), which has been developed over several years (Lenat and Guha 1989). In the course of this development, it has evolved considerably. We clustered these rules several times using different parameters in order to analyze the axiom set from multiple perspectives. Higher-level concept groups, such as Orientation, Containment, Portals, etc., emerged from the merging of closely related smaller clusters. It is interesting to note that formal investigations of spatial representation have also identified similar key concepts including topology, mereology, orientation, distance, size, and shape. There has been substantive work on each of these aspects of spatial representation (Forbus 1984; Cohn and Hazarika 2001), and numerous formal models of spatial representation are available. None of the existing

Multi-ViewPoint Clustering Analysis (MVP-CA) Approach

Pragati’s Multi-ViewPoint Clustering Analysis (MVP-CA) tool facilitates analysis of knowledge-based systems by clustering KB rules that share significant common properties. It exposes ontology developers to the semantics of a knowledge-based system through semi-automatic clustering of its rules. The MVP-CA tool consists of three stages: parsing, cluster generation, and cluster analysis. In the parsing phase, a front-end parser translates a knowledge base’s axioms from their original form into a language-independent representation. The user can specify numerous rule and pattern transformations in this stage to eliminate noise in the data. The cluster generation phase applies a hierarchical agglomerative clustering algorithm to the transformed rules.
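A hierarchical agglomerative pass of this kind can be sketched in a few lines of Python. This is only an illustration and not the MVP-CA implementation: the representation of each rule as a set of slot names, the Jaccard distance, and the single-linkage rule are all assumptions made for the example, and the tool's own heuristic distance metrics are not reproduced here. The four sample rules are hypothetical reductions of concepts discussed in this paper.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two pattern sets: 1 - |a & b| / |a | b| (assumed metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_distance(c1, c2, rules):
    """Single linkage (assumed): distance between the closest pair of rules."""
    return min(jaccard_distance(rules[i], rules[j]) for i in c1 for j in c2)


def agglomerate(rules):
    """Merge the two closest clusters until one remains; return the merge order."""
    clusters = [frozenset([name]) for name in sorted(rules)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], rules),
        )
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((c1, c2))
    return merges


# Hypothetical rules, each reduced to the slot names it mentions.
rules = {
    "Spatial-Entity": {"is-near", "abuts", "is-above", "location"},
    "Place": {"is-near", "abuts", "is-above", "location-of"},
    "Divide": {"material", "age", "animacy"},
    "Duplicate": {"material", "age", "animacy", "area"},
}
merges = agglomerate(rules)
for left, right in merges:
    print(sorted(left), "+", sorted(right))
```

The recorded merge order is what a dendrogram would display; cutting it at different depths gives coarser or finer clusters.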
During each iteration, the two most similar clusters merge to form a new cluster. The order of merging forms a hierarchy of clusters. Similarity between rules is defined by a set of heuristic distance metrics (Mehrotra and Wild 1995), which the user chooses based on the nature of the task performed by the rule base (e.g., classification, diagnosis, control, etc.) (Chandrasekharan 1986). These heuristic-based distance metrics have evolved from our experiences with different types of knowledge bases.

In the cluster analysis phase, the user interacts with the tool to pinpoint the relevant clusters from the generated pool of clusters. A cluster’s relevance depends entirely on the objective of the analysis. However, the tool does provide some automatic support for common analysis goals, such as flagging clusters whose rules contain very similar clauses. In its primary role as a comprehension aid, MVP-CA provides support for homing in on clusters that provide insight into the important conceptual regions in the knowledge base. In order to identify such seed concepts, the tool generates information about patterns and clusters, such as pattern frequency, cluster size, the dominant patterns of a cluster, etc. The user can utilize this information to assess the quality and relevance of the

formal models, however, combine these different aspects of spatial representation into one theory as done in Cyc.

The IMMACCS (Integrated Marine Multi-Agent Command and Control System) system, developed by the CAD Research Center (Pohl et al. 1999), provides near real-time decision support for military command and control personnel in the form of enhanced situation awareness. Critical to achieving this goal is the IMMACCS Object Model, an ontology that represents relevant objects in terms of their behavioral characteristics and relationships to other objects.
The agent engine takes charge of the dynamic problem-solving aspects in the environment and generates the desired views of the battle space to support the planning and training activities. The informational aspects of the objects are thus separated from the logic aspects of the system. The FIRES agent, for example, responds to “Call for Fire” messages in the system. In response to such a message, its purpose is to select the best weapon based on availability, deliverability and acceptability. To accomplish this goal it accesses concepts such as range, time of flight, target type, urgency, circular error of probability (CEP), effective casualty rate (ECR), availability and rules of engagement (ROE) from the IMMACCS object model. The deconfliction rules in the FIRES agent also address the trajectory of the munitions relative to the position of other friendly assets and infrastructure objects.

Templatization of Usage Patterns

Clustering often juxtaposes rules that share an overall structural similarity, but differ at a small number of variation points. There is generally a conceptual coherence to these sets of axioms, which can be identified and represented in terms of a template. Once the common structure across rules has been recognized, the problem of template formation is reduced to factoring out this structure in a higher-level axiom/rule, having open slots to be instantiated with the appropriate bindings as slot fillers. Templating thus reduces to parameterizing a set of axioms or rules so that future extensions or modifications of the knowledge base can proceed in an intelligent, focused manner. In our experience with various types of knowledge bases, opportunities for template formation arise repeatedly, regardless of the domain or the representation language. Below we present an example from each of the three knowledge bases analyzed recently. Some of the axioms have been abbreviated to conserve space.

Templates Applied to Slot Propagation in Concepts: Spatial-Entity, Place, and Move.

(every Spatial-Entity has
  (location ((must-be-a Place)))
  (is-outside ((must-be-a Spatial-Entity)))
  (does-not-enclose ((must-be-a Spatial-Entity)))
  (is-inside ((must-be-a Spatial-Entity)))
  (encloses ((must-be-a Spatial-Entity)))
  (abuts ((must-be-a Spatial-Entity) (excluded-values (Self))))
  (is-above ((must-be-a Spatial-Entity)))
  (is-below ((must-be-a Spatial-Entity)))
  (is-along ((must-be-a Spatial-Entity)))
  (is-at ((must-be-a Spatial-Entity)))
  (is-behind ((must-be-a Spatial-Entity)))
  (is-in-front-of ((must-be-a Spatial-Entity)))
  (is-on ((must-be-a Spatial-Entity)))
  (has-on-it ((must-be-a Spatial-Entity)))
  (is-opposite ((must-be-a Spatial-Entity)))
  (is-over ((must-be-a Spatial-Entity)))
  (is-under ((must-be-a Spatial-Entity)))
  (is-near ((must-be-a Spatial-Entity)))
  ..)
Figure 1
The axiom cluster shown in Figures 1, 2 and 3 consists of the member slot definition axioms for three concepts: Spatial-Entity, Place, and Move, respectively, in UT’s KM-core. Superficially, the connections between the three concepts are not obvious, since they are not lexically close. Since Place is a subclass of Spatial-Entity, this relationship could, in principle, be obtained through the taxonomic link. However, Move makes use of both Spatial-Entity and Place. This connection is not obvious from the ontological hierarchy, as Move is derived from the Action class, whereas Spatial-Entity and Place are derived from the Entity class. A close inspection of the definitions of these three concepts reveals why the concepts clustered together through our tool. Eighteen slots are common to all three axioms. Each concept is either defined by or manipulates a long list of shared slots such as is-near, abuts, etc. These slots describe how a spatial entity is situated relative to other spatial entities. Even though this is a useful and appropriate way to represent such relationships, there is a potentially serious KB maintenance issue in that each of the slots must be repeatedly dealt with every time one of the concepts, such as Spatial-Entity, is specialized (Place) or manipulated (Move). While the issue is not overwhelming for just three concepts, it becomes more serious as the knowledge base scales up and, perhaps, other KBs are constructed on top of the core.

(every Place has
  (location ((exactly 0 Place)))
  (is-near ((forall (the location-of of Self) (the is-near of It))))
  (abuts ((forall (the location-of of Self) (the abuts of It))))
  (is-above ((forall (the location-of of Self) (the is-above of It))))
  (is-below ((forall (the location-of of Self) (the is-below of It))))
  (is-along ((forall (the location-of of Self) (the is-along of It))))
  (is-at ((forall (the location-of of Self) (the is-at of It))))
  (is-at-of ((forall (the location-of of Self) (the is-at-of of It))))
  (is-beside ((forall (the location-of of Self) (the is-beside of It))))
  (is-between ((forall (the location-of of Self) (the is-between of It))))
  (is-behind ((forall (the location-of of Self) (the is-behind of It))))
  (is-in-front-of ((forall (the location-of of Self) (the is-in-front-of of It))))
  (is-inside ((forall (the location-of of Self) (the is-inside of It))))
  (encloses ((forall (the location-of of Self) (the encloses of It))))
  (is-on ((forall (the location-of of Self) (the is-on of It))))
  (has-on-it ((forall (the location-of of Self) (the has-on-it of It))))
  (is-opposite ((forall (the location-of of Self) (the is-opposite of It))))
  (is-outside ((forall (the location-of of Self) (the is-outside of It))))
  (does-not-enclose ((forall (the location-of of Self) (the does-not-enclose of It))))
  (is-over ((forall (the location-of of Self) (the is-over of It))))
  (is-under ((forall (the location-of of Self) (the is-under of It))))
)
Figure 2

((every Move has
  ...
  (destination ((must-be-a Spatial-Entity)))
  ...
  (add-list
    ((if (has-value (the destination of Self))
      then (forall (the object of Self)
        (:set
          (:triple It is-near (the is-near of (the destination of Self)))
          (:triple It abuts (the abuts of (the destination of Self)))
          (:triple It is-beside (the is-beside of (the destination of Self)))
          (:triple It is-between (the is-between of (the destination of Self)))
          ... )))))
  (del-list
    ((forall (the object of Self)
      (:set
        (:triple It location (the location of It))
        (forall2 (the is-near of It)
          (if (not ((the is-near of (the destination of Self)) includes It2))
            then (:triple It is-near It2)))
        (forall2 (the abuts of It)
          (if (not ((the abuts of (the destination of Self)) includes It2))
            then (:triple It abuts It2)))
        (forall2 (the is-beside of It)
          (if (not ((the is-beside of (the destination of Self)) includes It2))
            then (:triple It is-beside It2)))
        (forall2 (the is-between of It)
          (if (not ((the is-between of (the destination of Self)) includes It2))
            then (:triple It is-between It2)))
        ... )))))
Figure 3

One possible way to deal, at least partially, with repetitive slot references is to parameterize the frame definitions. For example, the slots for Place that specify its spatial relationships share a similar characteristic: they propagate to all of the spatial entities located in that Place. There is no significant difference between how is-near propagates and how abuts propagates. Most of the pre/post-conditions for Move display parallel, though not identical, slot propagation behavior. For Place, a template can take the form of Template 1 in Figure 4. To create a concrete axiom, one would bind the <property> parameter to an appropriate property name. By applying the template to the abuts property, for instance, we would end up with the following:

(abuts ((forall (the location-of of Self) (the abuts of It))))

(<property> ((forall (the location-of of Self) (the <property> of It))))
Figure 4: Template 1

(:triple It <property> (the <property> of (the destination of Self)))
Figure 5: Template 2

(forall2 (the <property> of It)
  (if (not ((the <property> of (the destination of Self)) includes It2))
    then (:triple It <property> It2)))
Figure 6: Template 3

If Spatial-Entity has slot with <property> then
  Place has clause <Template 1>
  and Move has clause <Template 2> in add-list
  and clause <Template 3> in del-list
Figure 7: Higher Level Axiom

retrievable annotations so that this knowledge can be reused. A hindsight point to be noted here is that even for the concepts in this cluster, several slot properties, such as is-beside and is-between, are mentioned by Place and Move but not by Spatial-Entity. Move, for instance, refers to the is-between slot of its destination, where destination is constrained to Spatial-Entity rather than to Place, and thus might not have the is-between slot defined. Such violations in domain constraints are commonplace in our experience.
Clustering makes it very easy to detect such differences across axioms, as they surface readily when we align the axioms for analysis.

For the most part, we found that concepts in KM are thoughtfully designed and would be applicable to a variety of knowledge-based applications. However, even the best-designed software systems inevitably encounter issues as they scale up. At some point in the KB’s evolution, this broad scope might become a limiting factor. Grouping together many concepts from different subject areas may make it more difficult for KB builders to find and extend the most appropriate concepts. Some method of grouping the knowledge base into more-or-less discrete “chunks” can add significant value. By applying MVP-CA technology to the UT core knowledge base, we have discovered a number of clusters that can aid in creating more reusable components, partitioning knowledge where appropriate, and exposing errors.

(abuts ((forall (the location-of of Self) (the abuts of It))))

From the concept of Move, Templates 2 and 3, as shown in Figures 5 and 6, result. A higher-level axiom can be formed connecting the concepts Spatial-Entity, Place, and Move along the lines shown in Figure 7. Of course, such templates wouldn’t be applicable to every slot; there is obviously a need to give special treatment to certain slots, such as location in Place. However, templates would ensure more uniform treatment of similar logic, both within a concept and, perhaps more importantly, across concepts. The effort then reduces to listing all the shared slots for each concept that uses them. Templates might be implemented in a variety of ways, ranging from language extensions to a simple user-interface mechanism.
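At the simple end of that range, the three templates of Figures 4 to 6 can be treated as parameterized strings and expanded once per shared slot. The Python sketch below illustrates this idea only; it is not a KM language extension, the function and variable names are hypothetical, and only the template text itself is taken from the figures.

```python
# Template text taken from Figures 4-6; "<property>" is the open slot.
TEMPLATE_1 = "(<property> ((forall (the location-of of Self) (the <property> of It))))"
TEMPLATE_2 = "(:triple It <property> (the <property> of (the destination of Self)))"
TEMPLATE_3 = (
    "(forall2 (the <property> of It) "
    "(if (not ((the <property> of (the destination of Self)) includes It2)) "
    "then (:triple It <property> It2)))"
)


def instantiate(template, prop):
    """Bind the <property> parameter of a template to a concrete slot name."""
    return template.replace("<property>", prop)


def expand(shared_slots):
    """Generate the Place, Move add-list, and Move del-list clauses per slot."""
    return {
        slot: {
            "place": instantiate(TEMPLATE_1, slot),
            "move_add": instantiate(TEMPLATE_2, slot),
            "move_del": instantiate(TEMPLATE_3, slot),
        }
        for slot in shared_slots
    }


clauses = expand(["is-near", "abuts"])
print(clauses["abuts"]["place"])
# prints: (abuts ((forall (the location-of of Self) (the abuts of It))))
```

With such a mechanism, adding a new shared slot means adding one name to the list passed to `expand`, rather than editing three axioms by hand.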
The important message is to be able to record the connections across the concepts as

Applying Templates to Cyc Spatial Axioms: We list here similar experiences in templatization when we analyzed the spatial clusters from the Cyc IKB. In Figure 8 we present a cluster that shows several fluid-related concepts, specifically those linked to "Pourable". Upon close examination of the structural aspects of this cluster, where "physicalStructuralAttributes" seems to be a dominant concept, rules 98, 140, and 160 naturally give rise to one of templates 1 and 2 as shown in Figures 9 and 10. The second template is the same as the first, except the argument order for <some-property> happens to be reversed. However, an argument to be made here is that the clustering points to a possible flaw in the ontology design for the KB, by showing that one of the arguments to <some-property> is always ignored.

Rule # 98
(#$implies
  (#$suspendedIn ?OBJ ?FLU)
  (#$physicalStructuralAttributes ?FLU #$Pourable))
Rule # 140
(#$implies
  (#$in-ImmersedFully ?OBJ ?FLU)
  (#$physicalStructuralAttributes ?FLU #$Pourable))
Rule # 160
(#$implies
  (#$in-ContFullOf ?STUFF ?CONT)
  (#$physicalStructuralAttributes ?STUFF #$Pourable))
Figure 8: Pourable Cluster in Cyc

(#$implies
  (<some-property> <obj1> <obj2>)
  (#$physicalStructuralAttributes <obj1> <some-physical-structural-attribute>))
[from #160]
Figure 9: Template 1 from Pourable cluster

(#$implies
  (<some-property> <obj1> <obj2>)
  (#$physicalStructuralAttributes <obj2> <some-physical-structural-attribute>))
[from #98, #140]
Figure 10: Template 2 from Pourable Cluster

The cluster shown above is part of the Object Attributes cluster, where the predicate term physicalStructuralAttributes brings together various shapes asserted in the IKB slice related to “sheets & corners” and “LongAndThin”. However, the term Pourable falls under the umbrella of object attributes even though it is not really a shape concept, because the Cyc axioms assert Pourable as a physicalStructuralAttribute, just like LongAndThin, SheetShaped, Corner2d, etc. The rules are designed to find the fluid in a fluid-object or fluid-container relationship; they then conclude some property about the fluid. This approach seems overly specific: the fact that an object happens to have some relation to the fluid (being immersed in it, for instance) doesn't really make any difference to the fundamental properties of the fluid. It is evident, therefore, that Pourable would be better represented as a property of fluids, rather than being declared as a physicalStructuralAttributes. It would make more sense to structure the axioms so that if an object is immersed in something we can first infer that the something must be a fluid, and derive the fluid attributes from there. Thus this cluster suggests an intermediate concept of "fluid" or "fluid properties" in the ontology. This would enable Pourable to participate directly in concepts such as suspension and immersion that are currently only tangentially related (through fluids) to pourability.

Such observations can be made only when one sees concepts situated in the context of their usage, as opposed to in the context of their declaration in the ontology hierarchy. Even when ontological engineers take great care to define the right information at the right conceptual level in the ontology, it is often the case that not all the different aspects of object and problem definition can be foreseen a priori in the forward engineering phase of the project. Often certain subtle but important relationships become evident through time, after studying the patterns of data/information accesses in the system.

Applying Templates to IMMACCS Axioms: In our experience with the IMMACCS system, we discovered that even though the rules in the agents are well-organized in their respective groups, each rule is two to three pages long, with many clauses repeated across rules. Since each rule references many classes defined in the object model, the knowledge base becomes very opaque from the standpoint of human comprehension. In order to understand each rule, one has to undergo multiple context switches. Many sets of rules are slight variations on a base concept, implying a need for some sort of factoring: either by breaking up the rules themselves into shared/unshared pieces, or creating a new superclass in the ontology to represent the similarities in the objects manipulated by the rules, or both.

One IMMACCS cluster dealing with firing conflicts contains two parallel sets of rules that describe two possible sources of conflict: buildings and rotary wings. Since the rules are very long we list here just the rule names. The two subclusters under the Conflicts cluster are Conflict-due-to-blocking-Building, with rule names Structure_Trajectory_Weapon, Structure_Trajectory_Entity, Structure_Trajectory_Platform, and Conflict-due-to-blocking-Rotary-Wing, with rule names RotaryWing_Trajectory_Weapon, RotaryWing_Trajectory_Entity, RotaryWing_Trajectory_Platform. In Figure 11, the green nodes represent concepts from the Object Model that are referenced by rules in both subclusters. The greyed objects in the figure are the concepts that are referenced by only one of the two subclusters for Conflict. The concept in red is the label we have assigned to the cluster based on our analysis.

Figure 11: Conflicts Cluster

Our proposed change was to formulate a more general concept of Conflict-due-to-blocking-object in the object model ontology, formulate a templatized base rule for this, and instantiate the base rule with the objects, rotary wing or building, as and when the need arises. Details for this can be obtained from (Mehrotra and Bobrovnikoff 2001). Abstracting rules to this level of generality provides a much wider scope of applicability for the different types of conflict situations that can arise for weapon targeting.

Reification

The purpose of reification in knowledge-based systems is to be able to represent and store certain useful behavior patterns in the system in a parsimonious manner for easy recall and reuse. In this spirit, by clustering similar axioms/rules together, the MVP-CA tool exposes regions in the knowledge base which can be identified as possible candidates for reifiable concepts. We distinguish this exercise from the previous section’s templating exercise by asserting that reification involves conceptualizing and representing a new term that encapsulates the intended reusable nature of that concept. Templating and reification nevertheless share the same origins in our research, as both get flagged through the repetitious usage of particular concept terms in the knowledge base.

Reifying Slot Propagation in KM-Core concepts: Duplicate and Divide: A reification opportunity was identified through the MVP-CA tool by studying the various propagation modes of a certain set of slots in the concepts Duplicate and Divide, as is shown below. Duplicate and Divide belong to different branches of the ontology stemming from the base concept of Action. Duplicate derives from the Create branch of the Action concept, whereas Divide derives from the Destroy branch. These two axioms were brought together through clustering due to access of similar slots, such as material, age, animacy, area, breakability, etc.
These are valid properties to be addressed in the development of certain concepts in cell

(every Divide has
  (add-list
    ((:triple Self result
       (an instance of (the instance-of of (the object of Self)) with
         (material ((forall (the material of (the object of Self))
                      ((an instance of (the instance-of of It))))))
         ;; age is smaller than the object's age
         (animacy ((the animacy of (the object of Self))))
         ;; area is smaller than the object
         (breakability ((the breakability of (the object of Self))))
         ... )
       [Divide-add-1])
     (:triple Self result
       (an instance of (the instance-of of (the object of Self)) with
         (material ((forall (the material of (the object of Self))
                      ((an instance of (the instance-of of It))))))
         ;; age is smaller than the object's age
         (animacy ((the animacy of (the object of Self))))
         ;; area is smaller than the object
         (breakability ((the breakability of (the object of Self))))
         ...
Figure 12: Concept of Divide

(every Duplicate has
  (object ((a Tangible-Entity)))
  ; An exact Duplicate of all the features of the object!
  (add-list
    ((:triple Self result
       (a Tangible-Entity with
         (instance-of ((the instance-of of (the object of Self))))
         ; Duplicate certain relevant properties of the original
         (material ((forall (the material of (the object of Self))
                      ((an instance of (the instance-of of It))))))
         (age ((the age of (the object of Self))))
         (animacy ((the animacy of (the object of Self))))
         (area ((the area of (the object of Self))))
         (breakability ((the breakability of (the object of Self))))
         ...
Figure 13: Concept of Duplicate

<x> propagates to <class-object> using <mode-x> propagation
where mode-x may take increasing, same or decreasing values
Figure 14: Reification of Propagation Mode

(for example, the area stays same for Duplicate). A more sophisticated representation could be sought here to encode such propagation properties in the slot definitions.
Another important point to be noted is that templates, as described in the previous section, could also be applied here for capturing the various types of slot propagation. However, we have chosen to demonstrate the alternative approach of reifying attribute propagation, which can ensure that particular modes of slot propagation are applied uniformly and can be reused in different situations. We can define a new family of “propagation” artifacts that are responsible for propagating attributes in various ways easily across different concepts.

biology, such as that of cell replication and cell division. In studying the two concepts above, we would like to draw attention to the different modes in which slot propagation takes place, rather than the slot properties themselves. Thus for Duplicate and Divide, various property attributes (material, age, animacy, ...) need to be propagated from an original object either to two new, smaller objects (Divide), as indicated by the comments for Divide in Figure 12, or to an exact copy of the original (Duplicate), as shown in Figure 13. For both actions, the propagation takes place in a relatively uniform manner. However, not every property is propagated in precisely the same way. In the case of Divide, for instance, some properties are propagated as-is with straight propagation, such as animacy, breakability, temperature, taste, texture, while others have decreasing propagation (in degree, size, etc.) in the new objects, such as age, area, depth, height, length, etc. Only a handful of propagation modes are required in order to cover all types of required attribute propagation for both Duplicate and Divide. The important lesson is that the nature of such types of attribute propagation can be reused in different situations. A solution for recording such an observation is to reify the various propagation modes and express them as shown in Figure 14.
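The reified propagation modes can be pictured as a small data structure. The following Python sketch is one reading of Figure 14, not KM code: the enum, the mode tables, and the factor of two used for increasing and decreasing propagation are hypothetical choices made only so that the example runs. The mode assignments for Divide follow the text above, and Duplicate copies the same slots unchanged, as in Figure 13.

```python
from enum import Enum


class Mode(Enum):
    SAME = "same"
    INCREASING = "increasing"
    DECREASING = "decreasing"


# Mode annotations per action (slot names from Figures 12 and 13).
DIVIDE_MODES = {
    "animacy": Mode.SAME,
    "breakability": Mode.SAME,
    "age": Mode.DECREASING,
    "area": Mode.DECREASING,
}
DUPLICATE_MODES = {slot: Mode.SAME for slot in DIVIDE_MODES}


def propagate(value, mode):
    """Apply one propagation mode to a slot value.

    The factor of 2 is a placeholder: the source only states that a value
    increases, stays the same, or decreases.
    """
    if mode is Mode.SAME or not isinstance(value, (int, float)):
        return value
    return value / 2 if mode is Mode.DECREASING else value * 2


def apply_action(obj, modes):
    """Build the result object by propagating each annotated slot."""
    return {
        slot: propagate(obj[slot], mode)
        for slot, mode in modes.items()
        if slot in obj
    }


cell = {"animacy": "animate", "breakability": "fragile", "age": 4.0, "area": 10.0}
divided = apply_action(cell, DIVIDE_MODES)
duplicated = apply_action(cell, DUPLICATE_MODES)
print(divided)
print(duplicated)
```

Keeping the mode in a table separate from the action is the point of the reification: a new action only needs a new table, not new propagation logic.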
The slot propagation mode then needs to be annotated on the appropriate set of slots. A side note on the ontological design issue that clustering raised is that these specific slots should have been placed within the objects that Divide and Duplicate manipulate rather than in the actions Divide & Duplicate. The problem is that one may want to propagate a slot a particular way in one context Reifying Continuous vs Discrete Value Propagation in KM-Core concepts: A number of clusters found through the MVP-CA tool contain rules that shared identical or very similar clauses. Upon close inspection of these clusters we saw how the type of slot propagation depended on whether the properties of the slots could take continuous values (by expressing greater-than, less-than, or same-as concepts), as shown by a representative axiom for Color-Value in Figure 15, or discrete values by referring to the categoricalconstant-class, as shown by a representative axiom for Brightness-Value in Figure 16 from the clusters. Other concepts that fell into the class of discrete slot propagation values were, for example: DirectionValue, Sex-Value, Sentience-Value, etc. Brightness-value has properties like less-than, greater-than etc. which implies the need for continuous values to be propagated through the axioms for concepts 8 that are nonetheless related by their usage, demonstrating the value of clustering over simple pattern matching. The cluster below deals with abstract spatial relationships such as hulls, interior, and borders. The functionality in this cluster seems to be of functions returning either the axis or the interior or the hull, etc. of an object. The operational commonality that brings these axioms together is that the functions return the same regions, no matter how many times they get applied. 
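The commonality noted for the spatial cluster, that reapplying a function returns the same region, is the property usually called idempotence. The following Python sketch shows one way membership in a class like the proposed UniqueFn could be checked empirically. Everything in it is illustrative: is_idempotent, hull_1d and grow are hypothetical names, and the one-dimensional "hull" over integers is only a toy stand-in for Cyc's ConvexHullFn.

```python
def is_idempotent(fn, samples):
    """True if applying fn twice equals applying it once on every sample."""
    return all(fn(fn(x)) == fn(x) for x in samples)


def hull_1d(points):
    """Toy 1-D hull: every integer between the smallest and largest point."""
    return frozenset(range(min(points), max(points) + 1))


def grow(points):
    """Counter-example: each application adds one more point."""
    return points | {max(points) + 1}


samples = [frozenset({1, 4}), frozenset({7}), frozenset({2, 3, 9})]

# Candidate members of a UniqueFn-like class: functions that pass the check.
unique_fns = {fn.__name__ for fn in (hull_1d, grow) if is_idempotent(fn, samples)}
```

A check over samples can only refute idempotence, not prove it, so a result like this would be a suggestion for a knowledge engineer to confirm rather than an axiom to assert automatically.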
Since the essence of the above axioms is to return the same value for a unique function, regardless of the number of times the function maybe applied, one can encapsulate the functionality by forming a new class of functions called “UniqueFn” and have functions such as “ConvexHullFn”, “InteriorFn” etc. become members of this class. The reified function defining all unique functions can be expressed along the lines expressed in Figure 18. Cyc has recognized and implemented this aspect of reusability by declaring these types of occurrences as macropredicates. These are terse representations of recurring and useful patterns in the Cyc axioms, recognized generally in the development phase of Cyc. However, through our analysis of existing Cyc axioms, we can expose further opportunities for their formation which may have been overlooked during the forward engineering phase or can only be obtained in hindsight, after the KB has evolved to a certain level. (every Color-Value has (color-of ((must-be-a Tangible-Entity))) (value ((possible-values (the instances of (the categorical-constant-class of color))))) (same-as ((must-be-a Color-Value)))) Figure 15: Axiom for Color-Value (every Brightness-Value has (brightness-of ((must-be-a TangibleEntity))) (less-than ((must-be-a BrightnessValue))) (greater-than ((must-be-a BrightnessValue))) (same-as ((must-be-a Brightness-Value)))) Figure 16: Axiom for Brightness-Value like Brightness. Other concepts which fall under this umbrella are: Capacity-Value, DensityValue, Depth-Value, Height-Value, etc. These concepts may not appear as related in the declared ontological hierarchy; however, there is an aspect that is common to all these concepts which needs to be recognized so that it can be exploited in mass when an application needs to address it. Clustering exposed these two classes of closely related slot property propagation characteristics by having them appear in sibling clusters from the MVP-CA tool. 
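As a rough illustration of the two classes of value concepts, the sketch below assigns a propagation class from the slots that appear in the axioms of Figures 15 and 16. It is an assumption-laden simplification, not how MVP-CA clusters axioms: the dictionary encoding of the axioms, the names AXIOMS, ORDERING_SLOTS and propagation_class, and the rule itself (ordering slots imply continuous values, a reference to categorical-constant-class implies discrete values) are all hypothetical.

```python
# Slots of the two representative axioms, transcribed from Figures 15 and 16.
# The constraint strings are abbreviated paraphrases, not KM syntax.
AXIOMS = {
    "Color-Value": {
        "color-of": "must-be-a Tangible-Entity",
        "value": "possible-values of categorical-constant-class of color",
        "same-as": "must-be-a Color-Value",
    },
    "Brightness-Value": {
        "brightness-of": "must-be-a Tangible-Entity",
        "less-than": "must-be-a Brightness-Value",
        "greater-than": "must-be-a Brightness-Value",
        "same-as": "must-be-a Brightness-Value",
    },
}

# Slots that only make sense for values that can be ordered.
ORDERING_SLOTS = {"less-than", "greater-than"}


def propagation_class(slots):
    """Classify a value concept by the slots its axiom declares."""
    if ORDERING_SLOTS.intersection(slots):
        return "continuous"
    if any("categorical-constant-class" in c for c in slots.values()):
        return "discrete"
    return "unknown"


annotations = {name: propagation_class(slots) for name, slots in AXIOMS.items()}
```

The resulting annotations dictionary plays the role of the meta-property annotation discussed in the text: once each value concept carries its class, an application can treat all continuous (or all discrete) concepts uniformly.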
Reification of these concepts would proceed by specifying a Discrete value propagation vs Continuous Value propagation and the appropriate concepts will need to be annotated as such. Clichés A cliché is a pattern that captures action semantics applicable to multiple concepts [Barker ]. Clichés arise out of the fact that certain classes of action have the same difference in their slot definitions. If these differing features can be extracted into clichés, new actions can get defined in terms of these clichés or their combinations. Reified Function in Cyc IKB’s Spatial Slice. An interesting feature of another cluster from Cyc IKB, shown in Figure 17, found a wide mix of distinct terms Rule # 12 (#$implies (#$and (#$termOfUnit ?CONVEXHULLFN (#$ConvexHullFn ?OBJECT)) (#$termOfUnit ?CONVEXHULLFN-1 (#$ConvexHullFn ?CONVEXHULLFN))) (#$equals ?CONVEXHULLFN ?CONVEXHULLFN-1)) Rule # 56 (#$implies (#$and (#$termOfUnit ?INTERIORFN (#$InteriorFn ?INTERIORFN-1)) (#$termOfUnit ?INTERIORFN-1 (#$InteriorFn ?ANYOBJECT))) (#$equals ?INTERIORFN ?INTERIORFN-1)) Rule # 125 (#$implies (#$and (#$termOfUnit ?BORDERBETWEENFN (#$BorderBetweenFn ?REG2 ?REG1)) (#$termOfUnit ?BORDERBETWEENFN-1 (#$BorderBetweenFn ?REG1 ?REG2))) (#$equals ?BORDERBETWEENFN ?BORDERBETWEENFN-1)) Figure 17: Cluster of Reifable Functions in Cyc (#$implies (#$and (#$UniqueFn ?UNIQUEFN) (#$termOfUnit ?UNIQUEFN-TERM-1 (?UNIQUEFN ?OBJECT)) (#$termOfUnit ?UNIQUEFN-TERM-2 (?UNIQUEFN ?UNIQUEFN-RESULT-1))) (#$equals ?UNIQUEFN-TERM-1 ?UNIQUEFN-TERM-2)) Figure 18: Reified UniqueFn 9 Discussion (every Enter has-definition (instance-of (Move-Into Reflexive-Cliche))) We have shown through our analysis that important metaproperties of a knowledge base are exposed when it is abstracted, structured, and partitioned in a meaningful manner. Such structuring can often reveal parallel construction of axioms leading to a higher level of understanding about the ontological distinctions and design choices in the system. 
Understanding the trends or prototypical ways in which concept relationships have been exercised in the axioms that utilize the ontology, allows us to take a fresh perspective on the ontology design issues. Often studying the “alignable differences” (Forbus 1984) among these axioms, exposes infelicitous knowledge entry patterns as well, some of which can be traced back to ontology design issues. Recognizing recurring patterns enables better knowledge organization by suggesting ways to either more optimally structure the knowledge base rules or build higher-level axioms that capture knowledge about such axiom clusters as meta-properties in the ontology. In addition, discovery of usage patterns can also suggest opportunities for componentization, leading to a higher degree of reusability (Clarke and Porter 1997). Structuring also reveals the actual context in which the terms in an ontology have been used, by exposing various other concept terms used in its vicinity. As shown by our analysis, partitioning through the MVP-CA tool provides several major benefits. Firstly, it makes it easier for users to focus on concepts relevant to a particular area of interest. Secondly, it enables the representation of multiple perspectives on the same knowledge. Thirdly, if an ontological engineer is aware of such partitions in the KB, he/she may combat the urge to over-generalize, which often results in complex, bloated representations that attempt to cover every possible application at the expense of modularity and understandability. Finally, partitions can also support greater inferencing efficiency, though this is implementation-dependent. We propose that the extraction, annotation and retrieval of such information become part of ontological engineering practice for the semantic web. Our research has thus far focused on the extraction of such information from knowledge bases. 
We would like to propose that the annotations in DAML/RuleML provide the infrastructure to express these meta-properties of an ontology so that the • intended meaning of properties, classes and relationships in an ontology become apparent, • commitments about various design choices in the ontology can be reevaluated and revamped as ontologies evolve, and • reusable regions in the ontology can be captured either through templates, reification, or clichés, so that they are easily and reliably extensible. In future, we will be exploring these issues in the context of DAML/RuleML representations. We believe that cluster-based analysis is an important tool for (every Exit has-definition (instance-of (Move-Out-Of Reflexive-Cliche))) (every Reflexive-Cliche has-definition (instance-of (Thing)) (agent ((the object of Self))) (object ((the agent of Self)))) (every Reflexive-Cliche has (agent ((exactly 1 Entity) (the object of Self))) (object (((the agent of Self)&(a Entity))))) Figure 19: Reflexive Cliché declared in KM-core Annotating such information either in the inheritance hierarchy in the ontology or as special operators to generate new types of actions for a given class can be a desirable feature for assuring reuse of concepts. UT has taken some initial steps toward specifying clichés through a forward-engineering approach; in a complement to their efforts, we have been able to expose candidate patterns by reverse-engineering existing KBs. Our analysis of the KM-core was able to extract existing clichés, as well as other action concepts with a potential for being represented as clichés. It is possible to implement clichés using a variety of techniques, including the creation of special operators to generate new types of actions and annotations of the ontology. Reflexive Cliché: Enter/Exit. The cluster in Figure 19 shows Reflexive-Cliche (the only cliché currently defined in KM core) being used to automatically classify instances of Enter and Exit. 
Reflexive-Cliche means that the agent slot and the object slot for an instance are the same—e.g., in the case of Enter, an agent moves itself. This is defined in the following manner in KM. Inverse Cliché: Increase/Decrease; ComeTogether/Disperse. There are many more instances of actions that have been discovered through our axiom clusters which can fall under the category of clichés. Increase & Decrease are concepts that can belong to inverse classes and the clichés associated with them need to address the change in degree, that is, greater-than and less-than aspects of the concept. Come-Together and Disperse could potentially fall into near-inverse classes because in addition to reversal of origin and destination slots, they incorporate different sub-event classes in their definition; ComeTogether has Go-To sub-event whereas Disperse has Leave sub-event in its definition. Symmetric Cliché: Move-Together/Move-Apart. Move-Together and Move-Apart have the aspect of reversal of origin and destination slots; hence, these would fall under symmetric clichés. 10 comprehending, maintaining, and improving KBs and ontologies throughout their life cycles. Guarino N. and Welty C. Evaluating Ontological Decisions with ONTOCLEAN Communications of the ACM February 2002 Vol 45 No.2 61-65. Acknowledgements Guha, R.V. 1990. Micro-theories and Contexts in Cyc Part I: Basic issues. MCC Technical Report ACT-CYC-129-9, MCC. We deeply appreciate the support provided by Pragati’s consultant Dmitri Bobrovnikoff in analyzing our results. We are also extremely grateful to the staff scientists at Cycorp, SRI, U T at Austin and CDM Technologies who have provided us valuable insight about their systems. Our special thanks to Pat Hayes (University of Florida) and Vinay Chaudhri (SRI) for the very insightful discussions we have had with them during this research. This research was supported in part by the DARPA RKF program under contract N66001-00-C-8019 and ONR contract N00014-00-M-0205. 
We would like to thank both Murray Burke, DARPA Program Manager and Dr. Philip Abraham, ONR Program Manager for their ongoing support of our work. Lenat, D.B. and Guha, R.V. 1989. Building Large Knowledgebased Systems: Representation and Inference in the Cyc Project. Reading, MA: Addison-Wesley. McGuinness, D., Fikes R., et al. 2000. The Chimaera Ontology Environment. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, 1123-1124. Menlo Park, CA: AAAI Press. Mehrotra, M. 1995. Requirements and Capabilities of the MultiViewPoint Clustering Analysis Methodology. In Notes for the IJCAI-95 Workshop on Verification, Validation and Testing of Knowledge-Based Systems, 49-56. Menlo Park, CA: AAAI Press. References Barker, K. Mehrotra and Bobrovnikoff 2001. MVP-CA Analysis for IMMACCS. ONR Final Report March 2001. 2000. Mehrotra, M. 1996. Application of Multi-ViewPoint Clustering Analysis to an Expert Systems Advocate Advisor, Technical Report FHWA-RD-97022, Federal Highway Administration, McLean, VA. http://www.cs.utexas.edu/users/kbarker/working_notes/cliches.h tml. Updated August 2000. Chandrasekharan, B 1986. Generic tasks in knowledgebased reasoning: High-level building blocks for expert systems design. IEEE Expert, Fall 1986. Mehrotra, M., Alvarado, S., and. Wainwright R. 1999. Laying a Foundation for Software Engineering of Knowledge Bases in Spacecraft Ground Systems. In Proceedings of FLAIRS-99 Conference, 73-77. Menlo Park, CA: AAAI Press. Chaudhri, V. K, Stickel, M.E., Thomere, J.F., Waldinger, R. J. 2000. Using Prior Knowledge: Problems and Solutions. Proceedings of the Seventeenth National Conference on Artificial Intelligence, 436-442. Menlo Park, CA: AAAI Press. Mehrotra, M. and Wild, C. 1993. Multi-View Point Clustering Analysis. In Proceedings of 1993 Goddard Conference on Space Applications of Artificial Intelligence. 217-231. Greenbelt, MD:.NASA Conference Publications. Clarke, P. and Porter, B. 1999. 
KM-The Knowledge Machine: Users manual. Technical Report, AI Lab, Univ. of Texas at Austin. http://www.cs.utexas.edu/users/mfkb/km.html Mehrotra, M. and Wild, C. 1995. Analyzing Knowledge-Based Systems Using Multi-ViewPoint Clustering Analysis. Journal of Systems and Software 29:235-249. Clarke, P. and Porter, B. 1997. Building Concept Representations from Reusable Components. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 369-376. Menlo Park, CA: AAAI Press. Noy, N.F. and Musen, M.A. 2000. PROMPT: Algorithm and Tool for Automated Ontology Merging and Alignment In Proceedings of the AAAI-2000, 450-455. Menlo Park, CA: AAAI Press. Cohn, A. G. and. Hazarika, S. M. 2001. Qualitative Spatial Representation and Reasoning: An Overview. Fundamenta Informaticae 46 (1-2):1-29. Paley, S. M., Lowrance, J. D., and Karp P. D. 1997. A Generic Knowledge Base Browser and Editor. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 10451051. Menlo Park, CA: AAAI Press. DARPA 2000. The Rapid Knowledge Formation Project: http://reliant.teknowledge.com/RKF/2000 Everett, J., Bobrow, D.G., et.al 2002 Making Ontologies Work for Resolving Redundancies across Documents Communications of the ACM February 2002 Vol 45 No.2 55-60. J. Pohl., Porczak, M. et. al 1999. IMMACCS A Multi-Agent Decision-Support System. CAD Research Center. Design Institute Report: CADRU-12-99. Fikes, R., Farquhar, A., and Rice, J. 1997. Tools for Assembling Modular Ontologies in Ontolingua. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, 436441. Menlo Park, CA: AAAI Press. Forbus, K. 1984. Qualitative Process Theory. Artificial Intelligence, 24:85-168. 11