From: AAAI Technical Report WS-96-05. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.
The K-Rep System
Robert
Weida,
Eric
Mays,
Chihong
Robert
Liang,
Architecture
Dionne,
Meir Laker,
Frank J. Oles
Brian
White,
IBM T. J.
Watson Research Center*
P.O. Box 218
Yorktown Heights,
NY 10598
1
Introduction
ist independent of process termination or system
shutdown.
K-Rep is an object-oriented knowledge representation
Distribution:
Knowledge bases must be simultanesystem based on description logic [Woods and Schmolze,
ously accessible from multiple client systems in a
1992]. K-Repprovides a language to express descriptions
networked environment.
(e.g., of medical concepts such as drugs, treatments, and
diseases), inference mechanismsfor reasoning about deEmbeddability: K-Rep must offer terminology serscriptions and their relationship to one another, and convices as a seamlessly integrated componentof larger
siderable supporting infrastructure. This paper presents
application systems.
an overview of K-Rep’s system architecture, which has
Performance:
K-Rep must be able to respond at
been motivated in part by the needs of several applicainteractive speeds in critical production environtions in the area of clinical information systems [Mays
ments.
et al., 1996].
It has becomeclear that accurate and comprehensive
Authoring support: K-l:tep must provide a profescontrolled medical terminologies are crucial for advanced
sional development environment to facilitate
the
clinical information systems, e.g., to achieve comparacollaborative knowledgeengineering and visualizable data across the enterprise in support of outcomes
tion activities of domainexperts in an intuitive and
analysis and inter-operation of ancillary systems, conappealing manner.
eeptual indexing of clinical advice, and knowledge-based
Consistency maintenance: Revision and reclassifientry of orders for drugs, procedures, etc. Organization
cation of descriptions is essential, both for interacand maintenance of such a terminology is a formidable
tive knowledgebase editing and for run-time infertask: there are at least 100,000 concepts of initial intereneing.
est, both national (or international) standardization and
local customization of content must coexist, and the terScope: K-Rep must accommodate a wide variety of
minology will evolve significantly over time. The state
information within descriptions, including both seof the art in medical terminologies is characterized by
mantic roles and primitives, and non-semantic inad hoe definitions, manualclassification, little inferenformation such as bitmaps, synonyms, abbreviatial power, and no provision for collaborative developtions, and codes.
ment. Therefore, the area of clinical information sysThese considerations led us naturally to our current detems has becomeone of our key applications. In October
1994, after several years of experience with a Common sign point, illustrated in Figure 1 and discussed below.
Lisp implementation of the K-Rep description logic system [Mays et al., 1991], we embarked upon a substantial
2
Discussion
redesign effort to accomplish the following objectives:
K-Rep addresses scalability and persistence using ObSealability:
K-Rep must support extremely large
jectStore, an object-oriented database managementsysknowledge bases, potentially beyond the limits of
tem with a persistent memory model. Persistent memvirtual storage.
ory extends virtual memorysimilar to the way that virtual memoryextends real memory,but also includes the
Perslstenee:
Knowledge bases must be immediately
database characteristics of atomic, serializable, and reavailable to applications, and must continue to excoverable transactions. A particularly important feature of ObjectStore’s persistent memoryimplementation
*The first author may be contacted via email at
is the mapping of persistent memorypages into virtual
weida@cs.columbia.edu.The other authors, respectively,
memorypages using pointer swizzling [Lambet al., 1991].
maybe contacted at {emays, dionne, melt, bfwhite, cliang,
oles} @watson.ibm.com.
For example, once a persistent page’s addresses have
- 197-
Application
K-Rep
C++Client
,=‘
,o.
i. 0
r 5;v;op;,;n;",
II
Environment
i
~
.........
I/f,’
I
I
D
=’,R,moteK-Ropl ~ r l ~
"
,,.,t,,t,.,,,
~" .......
’
P
o
E IoE
A
7’ ..... "~.... ’
I
A
I
I K-.ep
if : .pp.,=,o°s
:
..v.C,,on,
............
,
........
I
I
’1
~1 OStore Server I
I
S
I
I
I1"
/,
/I,
I KBI
!__OS_tor_e
Client_....
- i~-Hen~erver
r"
J
;" OStor’e’S;~;;
"’’& ....
’’’E"
,
,
"i~
I=
.~:,==--’~------’
’
I
I
~
u~tore~erver
........
Figure 1: System Architecture
been mappedinto virtual memory,dereference of a persistent pointer to that page is implementedwith a single
load instruction. An address reference may occur outside of the mappedrange, in which case a persistent page
fault occurs. K-Repuses segment allocation and clustering to improve locality of reference, and its algorithms
are specially designed to minimize persistent memory
references. Although relational
database management
systems are now evolving to support objects in the form
1, contemporary systems are simply unable to
of BLOBS
match ObjectStore in terms of speed. Nonetheless, we
are actively exploring appropriate uses of relational technology as our framework evolves.
C++, however grotesque, was an obvious pragmatic
choice for our core implementation language due to its
object-orientation, portability, and amenability to integration with database systems and applications. (Java’s
cleaner design, potentially superior portability, and especially its automatic memorymanagementmake it an intriguing future alternative.) K-Rep’s C++class library
defines an application program interface (API) which
presents an interface to the underlying kernel. The API
encapsulates numerous implementation details such as
the persistent storage mechanism. It also protects the
integrity of internal data structures, including all those
which persist. An identical interface is presented by
an ephemeral (or transient) memoryversion of K-Rep
which does not use ObjectStore. Using ephemeral knowledge bases consumes fewer system resources and facili-
tates both application development and development of
smaller knowledge bases. In addition to K-Rep applications, both the K-Rep development enviromment and a
set of K-Rep system utilities access the kernel strictly
through the API.
The K-Rep kernel is divided into definition space
(DSPACE) and semantic space (SSPACE) components,
which are responsible for managing descriptions and
their classification, respectively. The DSPAC~.
maintains
consistency amongseveral representations of a description:
1. A source form is a character string which is suitable
for text file import/export operations, and can be
replaced through an API call, e.g., when using the
development environment’s text editor.
1Binary Large Objects.
- 198-
2. A normalized form is a canonical, recursive data
structure for the description per se, which is suitable for tabular presentation and structural editing.
It is updatable through API calls, e.g., to implement "direct manipulation" editing in the graphical
development environment. Amongother things,
the normalized form includes both normalized roles
for semantic attributes and facets for non-semantic
attributes of concepts and roles as exemplified earlier.
3. A completed form captures the description’s meaning in the overall context of the knowledge base,
i.e., it incorporates the results of inferences such as
inheritance. It is only updated indirectly through
changes to source or normalized forms.
Thereis a many-to-one
mappingfrom DSPACEcon- design systems support long running transactions,
Kceptsto theircounterparts
in the SSPACE. Taxonomic Rep has a model of collaborative work. It records a jourrelationships
amongDSPACEconcepts
aredetermined
by
hal of changes as textual representations. The journals
reference
to thecorresponding
SSPACEconcepts,
which associated with several terminology developers maythen
areorganized
in an explicit
subsumption
taxonomy.
This be examined and merged. A methodology for automatapproach
neatlyhandles
the coexistence
of equivalent ing this process in association with K-Repis the subject
concepts.
Semantic
descriptions
onlyrecord
localdiffer- of[Campbell et al., 1996]. Key concerns in concurrent,
distributed development of terminologies include:
encesfrom immediate
ancestorsin the taxonomy.We
takeadvantage
of thisto reduce
thenumber
of testsre¯ Conflict detection and resolution
quiredby ourclassification
algorithm,
andto to mini¯ Distribution of approved terms
mizethesizeof SSPACE concepts.
In thecontext
of ObjectStore,
theDSPACE/SSPACE separation
provides
dra¯ Update of local versions
maticperformance
benefits:
SSPACEconcepts
areclus- Of course, subsumption and related inferences are exteredin separate
ObjectStore
segment(s),
so onlySS- tremely useful for conflict detection and reporting.
PACE pagesaremappedduringclassification
(andonly
forthoseSSPACE concepts
encountered
whiletraversing
Conclusion
the taxonomy).
In addition,
the smallsizeof SSPACE 3
Bringing K-Rep to the point of around-the-clock proconcepts
reduces
thenumberof pagesaccessed.
K-Repcan operatein eithersinglesystemor dis- duction use at Kaiser-Permanente in Colorado [Mays
et al., 1996] has been a non-trivial endeavor. Other
tributed
client/server
mode.Distribution,
in turn,is
level of
achieved
in twodifferent
ways,eitherthrough
Object- projects are underway at the inter-regional
Store’s
client/server
facility
usingK-Rep’s
nativeC++ Kaiser-Permanente, Barnes Jewish and Christian HosAPI,or withK-Rep’s
ownclient/server
queryfacility, pital, Columbia Presbyterian Medical Center, the Mayo
Clinic, and Stanford University. In our experience, rigwherea run-time
serverprovides
accessto K-Repfor
lightweight
clients
viaa scripting
language
(parameter- orous attention to a wide range of "systems issues" is
izedqueries
canbe parsedandregistered
forefficient crucial, both to elicit interest from developmentpartners
and commercial customers, and to win their acceptance.
reuse).Theuseof a scripting
language
minimizes
the
numberof separate
client/server
transactions,
andenablesdesktop
applications
wheresystemresources
areat
References
a premium
(asmaybethecaseina clinical
workstation).[Campbell et al., 1996] K. E. Campbell, S. P. Cohn,
BothC++ and Javaclientsare implemented
usingthe
C. C. Chute, G. Rennels, and E. H. Shortliffe. Galascripting language. The server provides a range of lexpagos: Computer-based support for evolution of a
ical pattern matching capabilities which desktop clients
convergent medical terminology. In Proceedings of
may use to locate terms. A desktop client can also navthe 1996 AMIA Annual Fall Symposium, Washington,
igate the structure of a knowledgebase to locate items
DC, 1996. To appear.
by their classification. Various forms of navigation are
[Lamb et al., 1991] C. Lamb, G. Landis, J. Orenstein,
possible depending on the user interaction technique.
and D. Weinreb. The objectstore
database system.
K-Rep’s development environment includes a direct
Communications
of
the
ACM,
34(10):50-63,
Oct 1991.
manipulation graphical user interface, which features
browse, query, and edit capabilities,
a commit/abort
[Mays et al., 1991] E. Mays, R. Dionne, and R. Weida.
protocol for modifications on a per concept basis (orK-rep system overview. SIGARTBulletin, 2(3):93-97,
thogonal to database transaction commits/aborts), and
June 1991. Special Issue on Implemented Knowledge
a journal facility. Considerable thought and effort was
Representation and Reasoning Systems.
required to make the user interface suitable for serious
[Mays et al., 1996] E. Mays, R. Weida, R. Dionne,
knowledgeengineering activities. A taxonomyviewer alM. Laker, B. White, C. Liang, and F. J. Oles. Scalable
lows users to navigate the taxonomy in an incremental,
and expressive medical terminologies. In Proceedings
top-down manner. A user can focus his or her attenof the 1996 AMIAAnnual Fall Symposium, Washingtion by selectively revealing and hiding portions of the
ton, DC, 1996. To appear.
taxonomy. Other viewers, such as the definition viewer
[Woods and Schmolze, 1992] W. A. Woods and J. G.
which presents the normalized form and the significance
Schmolze. The kl-one family. Computers and Matheviewer which presents the completed form, help users
matics with Applications, 74(2-5), 1992.
to assimilate definitions, their semantic import, and the
relationship between the two. Incremental terminology
development is fostered by a variety of mechanismsfor
description "copy and edit" and role restriction refinement.
During terminology development, K-Rep operates on
a database in exclusive write mode, because modifications may have non-local effects. In the same way that
- 199-