From: AAAI Technical Report WS-96-05. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved. The K-Rep System Robert Weida, Eric Mays, Chihong Robert Liang, Architecture Dionne, Meir Laker, Frank J. Oles Brian White, IBM T. J. Watson Research Center* P.O. Box 218 Yorktown Heights, NY 10598 1 Introduction ist independent of process termination or system shutdown. K-Rep is an object-oriented knowledge representation Distribution: Knowledge bases must be simultanesystem based on description logic [Woods and Schmolze, ously accessible from multiple client systems in a 1992]. K-Repprovides a language to express descriptions networked environment. (e.g., of medical concepts such as drugs, treatments, and diseases), inference mechanismsfor reasoning about deEmbeddability: K-Rep must offer terminology serscriptions and their relationship to one another, and convices as a seamlessly integrated componentof larger siderable supporting infrastructure. This paper presents application systems. an overview of K-Rep’s system architecture, which has Performance: K-Rep must be able to respond at been motivated in part by the needs of several applicainteractive speeds in critical production environtions in the area of clinical information systems [Mays ments. et al., 1996]. It has becomeclear that accurate and comprehensive Authoring support: K-l:tep must provide a profescontrolled medical terminologies are crucial for advanced sional development environment to facilitate the clinical information systems, e.g., to achieve comparacollaborative knowledgeengineering and visualizable data across the enterprise in support of outcomes tion activities of domainexperts in an intuitive and analysis and inter-operation of ancillary systems, conappealing manner. eeptual indexing of clinical advice, and knowledge-based Consistency maintenance: Revision and reclassifientry of orders for drugs, procedures, etc. Organization cation of descriptions is essential, both for interacand maintenance of such a terminology is a formidable tive knowledgebase editing and for run-time infertask: there are at least 100,000 concepts of initial intereneing. est, both national (or international) standardization and local customization of content must coexist, and the terScope: K-Rep must accommodate a wide variety of minology will evolve significantly over time. The state information within descriptions, including both seof the art in medical terminologies is characterized by mantic roles and primitives, and non-semantic inad hoe definitions, manualclassification, little inferenformation such as bitmaps, synonyms, abbreviatial power, and no provision for collaborative developtions, and codes. ment. Therefore, the area of clinical information sysThese considerations led us naturally to our current detems has becomeone of our key applications. In October 1994, after several years of experience with a Common sign point, illustrated in Figure 1 and discussed below. Lisp implementation of the K-Rep description logic system [Mays et al., 1991], we embarked upon a substantial 2 Discussion redesign effort to accomplish the following objectives: K-Rep addresses scalability and persistence using ObSealability: K-Rep must support extremely large jectStore, an object-oriented database managementsysknowledge bases, potentially beyond the limits of tem with a persistent memory model. Persistent memvirtual storage. ory extends virtual memorysimilar to the way that virtual memoryextends real memory,but also includes the Perslstenee: Knowledge bases must be immediately database characteristics of atomic, serializable, and reavailable to applications, and must continue to excoverable transactions. A particularly important feature of ObjectStore’s persistent memoryimplementation *The first author may be contacted via email at is the mapping of persistent memorypages into virtual weida@cs.columbia.edu.The other authors, respectively, memorypages using pointer swizzling [Lambet al., 1991]. maybe contacted at {emays, dionne, melt, bfwhite, cliang, oles} @watson.ibm.com. For example, once a persistent page’s addresses have - 197- Application K-Rep C++Client ,=‘ ,o. i. 0 r 5;v;op;,;n;", II Environment i ~ ......... I/f,’ I I D =’,R,moteK-Ropl ~ r l ~ " ,,.,t,,t,.,,, ~" ....... ’ P o E IoE A 7’ ..... "~.... ’ I A I I K-.ep if : .pp.,=,o°s : ..v.C,,on, ............ , ........ I I ’1 ~1 OStore Server I I S I I I1" /, /I, I KBI !__OS_tor_e Client_.... - i~-Hen~erver r" J ;" OStor’e’S;~;; "’’& .... ’’’E" , , "i~ I= .~:,==--’~------’ ’ I I ~ u~tore~erver ........ Figure 1: System Architecture been mappedinto virtual memory,dereference of a persistent pointer to that page is implementedwith a single load instruction. An address reference may occur outside of the mappedrange, in which case a persistent page fault occurs. K-Repuses segment allocation and clustering to improve locality of reference, and its algorithms are specially designed to minimize persistent memory references. Although relational database management systems are now evolving to support objects in the form 1, contemporary systems are simply unable to of BLOBS match ObjectStore in terms of speed. Nonetheless, we are actively exploring appropriate uses of relational technology as our framework evolves. C++, however grotesque, was an obvious pragmatic choice for our core implementation language due to its object-orientation, portability, and amenability to integration with database systems and applications. (Java’s cleaner design, potentially superior portability, and especially its automatic memorymanagementmake it an intriguing future alternative.) K-Rep’s C++class library defines an application program interface (API) which presents an interface to the underlying kernel. The API encapsulates numerous implementation details such as the persistent storage mechanism. It also protects the integrity of internal data structures, including all those which persist. An identical interface is presented by an ephemeral (or transient) memoryversion of K-Rep which does not use ObjectStore. Using ephemeral knowledge bases consumes fewer system resources and facili- tates both application development and development of smaller knowledge bases. In addition to K-Rep applications, both the K-Rep development enviromment and a set of K-Rep system utilities access the kernel strictly through the API. The K-Rep kernel is divided into definition space (DSPACE) and semantic space (SSPACE) components, which are responsible for managing descriptions and their classification, respectively. The DSPAC~. maintains consistency amongseveral representations of a description: 1. A source form is a character string which is suitable for text file import/export operations, and can be replaced through an API call, e.g., when using the development environment’s text editor. 1Binary Large Objects. - 198- 2. A normalized form is a canonical, recursive data structure for the description per se, which is suitable for tabular presentation and structural editing. It is updatable through API calls, e.g., to implement "direct manipulation" editing in the graphical development environment. Amongother things, the normalized form includes both normalized roles for semantic attributes and facets for non-semantic attributes of concepts and roles as exemplified earlier. 3. A completed form captures the description’s meaning in the overall context of the knowledge base, i.e., it incorporates the results of inferences such as inheritance. It is only updated indirectly through changes to source or normalized forms. Thereis a many-to-one mappingfrom DSPACEcon- design systems support long running transactions, Kceptsto theircounterparts in the SSPACE. Taxonomic Rep has a model of collaborative work. It records a jourrelationships amongDSPACEconcepts aredetermined by hal of changes as textual representations. The journals reference to thecorresponding SSPACEconcepts, which associated with several terminology developers maythen areorganized in an explicit subsumption taxonomy. This be examined and merged. A methodology for automatapproach neatlyhandles the coexistence of equivalent ing this process in association with K-Repis the subject concepts. Semantic descriptions onlyrecord localdiffer- of[Campbell et al., 1996]. Key concerns in concurrent, distributed development of terminologies include: encesfrom immediate ancestorsin the taxonomy.We takeadvantage of thisto reduce thenumber of testsre¯ Conflict detection and resolution quiredby ourclassification algorithm, andto to mini¯ Distribution of approved terms mizethesizeof SSPACE concepts. In thecontext of ObjectStore, theDSPACE/SSPACE separation provides dra¯ Update of local versions maticperformance benefits: SSPACEconcepts areclus- Of course, subsumption and related inferences are exteredin separate ObjectStore segment(s), so onlySS- tremely useful for conflict detection and reporting. PACE pagesaremappedduringclassification (andonly forthoseSSPACE concepts encountered whiletraversing Conclusion the taxonomy). In addition, the smallsizeof SSPACE 3 Bringing K-Rep to the point of around-the-clock proconcepts reduces thenumberof pagesaccessed. K-Repcan operatein eithersinglesystemor dis- duction use at Kaiser-Permanente in Colorado [Mays et al., 1996] has been a non-trivial endeavor. Other tributed client/server mode.Distribution, in turn,is level of achieved in twodifferent ways,eitherthrough Object- projects are underway at the inter-regional Store’s client/server facility usingK-Rep’s nativeC++ Kaiser-Permanente, Barnes Jewish and Christian HosAPI,or withK-Rep’s ownclient/server queryfacility, pital, Columbia Presbyterian Medical Center, the Mayo Clinic, and Stanford University. In our experience, rigwherea run-time serverprovides accessto K-Repfor lightweight clients viaa scripting language (parameter- orous attention to a wide range of "systems issues" is izedqueries canbe parsedandregistered forefficient crucial, both to elicit interest from developmentpartners and commercial customers, and to win their acceptance. reuse).Theuseof a scripting language minimizes the numberof separate client/server transactions, andenablesdesktop applications wheresystemresources areat References a premium (asmaybethecaseina clinical workstation).[Campbell et al., 1996] K. E. Campbell, S. P. Cohn, BothC++ and Javaclientsare implemented usingthe C. C. Chute, G. Rennels, and E. H. Shortliffe. Galascripting language. The server provides a range of lexpagos: Computer-based support for evolution of a ical pattern matching capabilities which desktop clients convergent medical terminology. In Proceedings of may use to locate terms. A desktop client can also navthe 1996 AMIA Annual Fall Symposium, Washington, igate the structure of a knowledgebase to locate items DC, 1996. To appear. by their classification. Various forms of navigation are [Lamb et al., 1991] C. Lamb, G. Landis, J. Orenstein, possible depending on the user interaction technique. and D. Weinreb. The objectstore database system. K-Rep’s development environment includes a direct Communications of the ACM, 34(10):50-63, Oct 1991. manipulation graphical user interface, which features browse, query, and edit capabilities, a commit/abort [Mays et al., 1991] E. Mays, R. Dionne, and R. Weida. protocol for modifications on a per concept basis (orK-rep system overview. SIGARTBulletin, 2(3):93-97, thogonal to database transaction commits/aborts), and June 1991. Special Issue on Implemented Knowledge a journal facility. Considerable thought and effort was Representation and Reasoning Systems. required to make the user interface suitable for serious [Mays et al., 1996] E. Mays, R. Weida, R. Dionne, knowledgeengineering activities. A taxonomyviewer alM. Laker, B. White, C. Liang, and F. J. Oles. Scalable lows users to navigate the taxonomy in an incremental, and expressive medical terminologies. In Proceedings top-down manner. A user can focus his or her attenof the 1996 AMIAAnnual Fall Symposium, Washingtion by selectively revealing and hiding portions of the ton, DC, 1996. To appear. taxonomy. Other viewers, such as the definition viewer [Woods and Schmolze, 1992] W. A. Woods and J. G. which presents the normalized form and the significance Schmolze. The kl-one family. Computers and Matheviewer which presents the completed form, help users matics with Applications, 74(2-5), 1992. to assimilate definitions, their semantic import, and the relationship between the two. Incremental terminology development is fostered by a variety of mechanismsfor description "copy and edit" and role restriction refinement. During terminology development, K-Rep operates on a database in exclusive write mode, because modifications may have non-local effects. In the same way that - 199-