MIE 2003 Data Representation in Healthcare 2 25 03

advertisement
Data Representation in Healthcare: A design for a freely-available
openly-developed international reference terminology for healthcare.
Authors: Peter L. Elkin, MD1, Steven H. Brown, MD2, Michael J. Lincoln, MD2,
Mark Pittelkow, MD1, Brent A. Bauer, MD1, Dietlind Wahner-Roedler, MD1,
Rhonda Thomas, PhD3, Larry Bergstrom, MD1
Affiliations: 1 – Mayo Foundation for Medical Education and Research, 2 – Veterans Administration, 3 –
Conceptual Health Solutions
Abstract:
The importance of data representation in healthcare has been of concern for centuries. In the last fifty years
there has been an increasing awareness of the need for formal representations of terminological systems.
We propose that terminological system development should be consensus driven and the product of
iterative open refinement in order to practically serve the needs of the healthcare community. The system
of development and maintenance of such a system must involve recruitment of the best and the brightest in
the medical community to take responsibility for insuring the accuracy and completeness of such an effort.
We suggest a method for algorithmically building a starting point for such a reference terminology. Further
we suggest a distributed authoring environment that would allow domain experts to contribute regardless of
their location or language. Open intellectual contribution is the necessary ingredient for a consensus based
international health reference terminology.
Keywords:
Controlled Health Terminologies, Ontologies, Reference Terminologies
Introduction:
The benefit of common, language independent knowledge representation is that healthcare practitioners
will be able to speak anyway they like and then automated systems will be able to “understand” what they
mean. This requires a high degree of synonymy, normalization of words and noun phrases, term
completion, word completion and a detailed warehousing service that can reconcile compositional
expressions with the same or similar meaning. This is exemplified in the USA VHA’s HealthePeople
program which advocates for standards based Health IT practices using a common data infrastructure to
support interoperability.
To accomplish the goal of near complete content coverage we need to build a terminological system that
supports post-coordination of concepts, i.e., a compositional system. The LSVT trial clearly showed that
large pre-coordinated terminologies provide limited exact content coverage (58% coverage by UMLS of
the terms submitted for the LSVT). At the Mayo Clinic we have been able to successfully encode 95.12%
of the diagnoses generated by clinicians on a busy General Medical Hospital Service using SNOMED-RT
in a system that supports post-coordination.
We propose an approach to create a set of systematic (i.e. human readable) and formal description logic
definitions for Health and Healthcare as a step towards achieving unambiguous comparable data (please see
Figures # 1 and 2). This approach takes description logic based terminological representation one-step
further by linking it to a formal level two ontology for medical knowledge. In doing so, we hope to
facilitate inferencing and decision support that will help meet goals such as patient safety. To quote
William J. Mayo, MD “The best interest of the patient is the only interest to be considered.”
Model:
1.
Hierarchical Models will be used. Within the field of Health several different terminological
domains can be identified. Each of these domains requires a different description logic
representation to ensure both adequate content coverage and comparable data be achieved. We
have identified a number of domains, which fall into two categories: Basic Science and Clinicial.
The Basic Science area includes the terminological domains of: Genetics, Proteinomics,
Anatomy, Physiology, Biochemistry, Microbiology, Pharmacology, Pathophysiology,
Epidemiology, Biomedical Informatics. The Clinical area domains include: Diagnoses, Findings,
Tests, Procedures, Medications, Genetics.
For each of these domains a hierarchy should be built using a classifier based on the description
logic definitions for each of the concepts contained within that domain. For example, within the
Clinical area: 1) Diagnoses roles should be “has-Location,” “has-KernelDiagnosis,” and “hasReason.” 2) Findings roles should be “has-Location,” “has-Manifestation,” and “has-Reason.” 3)
Tests should use the LOINC fields as the description logic representation to be compatible with
HL7 and most lab test and ordering systems (“has-Measured-Component, has-Method, hasProperty, has-Scale-Type, has-Specimen, has-Subject-of-Observation, and has-Time-Aspect”). 4)
Procedures should be modeled using the roles “has-Reason,” “has-Approach,” “has-Extent,”
“Uses-Equipment,” “Uses-anesthesia,” “has-Location,” “Uses-Medication,” and “hasComplication.” And, 5) in Anatomy we should use the “Part-of” and “has-Function”, “hasAdjacency” relations using the hierarchies provided within the Digital Anatomist Project (Rosse,
Shapiro et al. 1998). Furthermore, in Physiology we should use the roles “has-Location,” “hasFunction,” and “has-Measurement-Method”and, in Genetics we should follow the NCI and Gene
Ontology conventions for genes and proteins as a start.
2.
Candidate terms should be identified.
3.
Terms should be modeled and definitions both systematic and formal definitions should be
assigned to each concept.
Method:
First cut terminology development
As our terminology needs to “understand” health and healthcare we should “teach” it medicine. An early
lesson should address anatomy by using the Foundational Model of Anatomy. Next we will add
physiology. Lacking a standard terminology we will agglomerate indices from textbooks. From there we
will teach it Pathophysiology, Microbiology, Pharmacology, Epidemiology, Biomedical Informatics.
Subsequent domains include: Diagnoses, Findings, Tests, Procedures, and Medications . For Medications
we should use the NDF-RT for medications and LOINC for test results. For Diagnoses we should use
ICD10 and for procedures we should use ICD10-PCS. Findings can be derived from the Mayo database.
We will identify candidate terms from our Master-sheet index, which has been accumulating clinical
impressions (Diagnoses and Findings) regarding patient conditions since 1911. This rich source of
terminological material can be combined with the Surgical Index, which has been accumulating procedural
data for almost as long. We have identified a substrate of at least 10,000 terms that have been registered as
either Findings or Diagnoses by the Mayo practice.
We would build a semantically integrated terminology by initially adding domains that are relatively
“atomic,” followed by domains that are increasingly “molecular.” With each additional domain, new terms
could be dissected algorithmically using the evolving reference terminology to create description logic
definitions whenever possible. By going in logical order we will have the substrate necessary to parse each
subsequent domain. For example, we will add representations of genes, proteins, anatomy, and physiology
sequentially to the evolving model. This will generate a single model with a consistent non-overlapping set
of hierarchies. Figure # 1 provides a pictorial depiction of the proposed method. The end product of this
process will be expressed in the Web Ontology Language (OWL), communicated using the Resource
Description Framework Schema (RDFS) of the World Wide Web consortium (w3c). As a description logic
framework we will employ the SHIQ description logic representation.
Making something ugly beautiful
The true beauty of this approach lies in its support for a consensus process, A web-based application should
be deployed to allow anyone to contribute content. The content should be parsed and matched to the
nomenclature. If the submitted term is well defined and not currently in the nomenclature, it should be
dissected and considered a candidate for inclusion in the the terminology. A content expert who is the
editor for that domain should review suggested changes to existing concepts. Each editor should employ
other specialists and sub-specialists in their field to assist them in the maintenance tasks. Synonyms should
be added in much the same way. Rearranging the hierarchies is considered equivalent to changing the
content and will need a consensus review. Synonyms should necessarily be multi-lingual. Associated
language models should be stored to assist with reconstruction of concepts from one language to another.
Terminology Modeling
We propose a “squad” system of modeler organization, where each squad consists of a number of term
modelers and a squad editor (these would be organized by subspecialty). A chief editor should supervise
the squad editors. A common pool of domain and technical experts should be available to work with all
modeling groups (figure 3).
Subspecialty
modeling
consultants
Modeling
Group 1
Modeling
Group 2
Editor 1
Editor 2
…
Modeling
Group X
Editor X
CHIEF
EDITOR
Figure #3: Modeling Workflow
Term Modeling Process Overview
For internal review purposes, the chief editor should assign terminology domain areas to the various
modeling groups and monitor progress. The domain editors should assign terms to individual modelers on
their team and monitor progress.
Modelers in each modeling group should receive assignments from their editor. They should use a webbased terminology-authoring environment and a suite of medical knowledge sources (e.g., computerized
books, Web resources, and dictionaries) to create description logic models of their assigned concepts. They
should decide how to model each concept, or else consult with a reference subspecialty expert. All
modelers should receive performance feedback from their editors regarding their modeling consistency. To
insure consistent, high-quality modeling, they should use a modeling style guide that addresses specific
modeling issues and choices. They should also participate in quarterly modeling meetings together with
the editors and project leaders in order to resolve modeling issues and improve the style guide.
Similar processes and procedures have been tested in production at Kaiser-Permanente, the College of
American Pathologists, the National Center for Biotechnology Information (NCBI) and National Library of
Medicine.
Domain Editor’s Role and Responsibilities
The domain editor should be charged with using shared resources (e.g. specialty consultants, informatics
consultants) to accomplish modeling tasks. The editor should have a deep understanding of the domain,
domain models and modeling process. The domain editor should have at least a rudimentary knowledge of
description logic theory (figure 4). Examples of tasks to be performed include:
Distribute and track modeling assignments
Mediate ‘unresolvable’ conflicts
Pair modelers for mentoring and definition ‘double-checking’
Educate new modelers
First line tech support for squad members
Reference for style guide interpretation issues
Bring modeling issues to style guide discussions
Participate in quarterly modelers meetings
Participate in weekly teleconferences
Status reporting to chief editor
Public relations/educationon an “as needed” basisAssess inter-modeler agreement and provide remedial
training when needed; Work with chief editor on inter-squad modeling consistency.
Modeler’s Roles and Responsibilities
The modeler is charged with creating description logic definitions for assigned concepts, and participating
in workflow management and quality control processes. Specific responsibilities include:
Term modeling
Participation in workflow process to allow status monitoring
Read, understand, and comply with the style guide
Raise style issues with the squad editor
Participate in weekly calls
Participate in quarterly modelers meetings
Provide accurate work completion time frames to domain editor
Provide accurate availability for tasking information to domain editor
Resolve conflicts collaboratively and according to style guide
Do not ‘sub optimize’ efforts. When uncertain of a definition, pass it immediately to the next level modeler.
Assist in creating, validating, and modifying partition term model as assigned
Participation in quality assurance within and between squads as directed by the domain editor
Recruit, train and mentor new modelers
Willingly step aside if unable to constructively contribute in an ongoing manner
Discussion:
A common freely available openly developed international reference terminology for health is an essential
component of the information infrastructure needed to move healthcare into the 21 st Century. We propose
one method that may be used to develop a beginning ontology, which we propose to name the Foundational
Linguistic Ontology of Health (FLOH). We further recommend an open process for authoring and
contributing to the ontology. This will encourage content experts to contribute and criticize the work,
necessarily leading to iterative improvement. An editorial process for handling conflicts designed to
encourage best practice in ontology creation and domain representation will temper this fully open
authoring environment.
Finding methods of incorporating various language models will assist our ability to be truly language
independent. If developed, we believe that such a reference terminology would be widely accepted which
would in turn support its ongoing maintenance. Quality data representation that supports full postcoordination and normalization is essential if we are to represent the details necessary for decision support.
We believe that this open development process will make progress towards achieving the goals of
interoperability and diversity of surface forms.
Limitations of this method include possible insertion of errors into the ontology both in content and
semantics. We believe that with time this will be expunged through the editorial process documented in the
methods section. Furthermore as we evolve over time there will be version control, obsolescence marking
and pointers with dates for concept evolution (as we learn more in medicine regarding particular medical
entities).
The terminology should be concept oriented and should avoid using “Not Elsewhere Classified” (NEC) in
the terminology. Principles of quality in terminological authoring will be adhered to in this project (see
ISO TS17117). Incorporation of suggestions should be in a timely basis. Suggestions without conflicts
that are fully specified (i.e., classify automatically) will be added to the terminology algorithmically. This
will facilitate rapid inclusion of less controversial concepts. Official versions should be released at least
every three months.
We conclude that the domains of biology and health need a common reference terminology to support their
data representation needs. These needs include but are not limited to patient safety, patient care at large,
epidemiology, public health reporting and decision support. In this manuscript we have outlined one
method by which such a distributed knowledge-based freely available, openly developed reference
terminology for health could be developed.
Acknowledgements:
This work has been supported in part by a grant from the National Library of Medicine
(LM06918-A101). The authors are grateful to Cornelius Rosse and Onard Mejino for
their comments regarding this manuscript.
References:
1.
2.
3.
4.
5.
6.
Elkin PL, et al; “Standard specification of Quality Indicators for Controlled Health Vocabularies,
ASTM / ANSI Standard (ASTM E2087).
Elkin PL, et al; “Health Informatics. Controlled health vocabularies - Vocabulary structure and highlevel indicators”, ISO TC 215 Technical Specification (TS17117).
http://www.daml.org/2001/03/daml+oil-index
Elkin PL, et al; “Standard Problem List Generation, Utilizing the Mayo Canonical Vocabulary
Embedded within the Unified Medical Language System”; Journal of the American Medical
Informatics Association (JAMIA) Supplement 1997, p. 500-504.
Rogers J, Rector A; “GALEN’s Model of Parts and Wholes: Experience and Comparisons”,
Proceedings/AMIA Annual Symposium, 2000, pages 714-718.
Cimino JJ; “Providing concept-oriented Views for Clinical Data using a Knowledge-Based System: an
Evaluation”, Journal of the Americal Medical Informatics Association 2002, p. 294-305.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
Rector A, Rossi A, Consorti MF, Zanstra P; “Practical Development of Re-usable Terminologies:
GALEN-IN-USE and the GALEN Organisation”, International Journal of Medical Informatics,
February 1998, 48(1-3) pages 71-84.
Horrocks U, Tobies, S; “Practical reasoning for description logics with functional restrictions, inverse
and transitive roles, and role hierarchies”, Proceedings of the first workshop on Methods for Modalities
(M4M-1), May 1999.
Horrocks U, Tobies, S; “Optimisation of terminological reasoning”, Proceedings of the 2000
International Workshop on Description Logics (DL2000), pages 183-192, 2000.
Baenziger J, et al; “Development of the Logical Observation Identifier Names and Codes (LOINC)
Vocabulary”, Journal of the American Medical Informatics Association, May-June 1998, 5(3) pages
276-292.
Elkin PL, et al; “A Randomized Controlled Trial of Automated Term Composition”, JAMIA, 1998,
Symposium, Supplement, l:765-769.
Elkin PL, Tuttle MS, Cesnik B, McCray AT, Scherrer JR; “The Role of Compositionality in
Standardized Problem List Generation”, Ninth World Congress on Medical Informatics; IOS Press;
1998. p. 660-664.
Rosse C, Mejino JL, Modayur BR, Jakobovits R, Hinshaw KP, Brinkley JF. 1998. Motivation and
organizational principles for anatomical knowledge representation: the Digital Anatomist Symbolic
Knowledge Base. J. Am. Med. Informatics Assoc.5:17-40.
Rosse C, Shapiro LG, Brinkley JF. 1998. The Digital Anatomist Foundational Model: Principles for
Defining and Structuring its Concept Domain. IN Chute EG (ed): A paradigm shift in health care
information systems: clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98:
820-824
Mejino JL, Rosse C. 1998. The Potential of the Digital Anatomist Foundational Model for assuring
consistency in UMLS sources. IN Chute EG (ed): A paradigm shift in health care information systems:
clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98:825-829
Neal PJ, Shapiro LG, Rosse C. 1998. The Digital Anatomist Spatial Abstraction: a scheme for the
spatial description of anatomical entities. IN Chute EG (ed): A paradigm shift in health care information
systems: clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98:423-427.
Mejino JLV, Rosse C . 1999. Conceptualizations of Anatomical Spatial Entities in the Digital Anatomist
Foundational Model. J. Am. Med. Assoc. AMIA ’99 Symp. Suppl. '99: 112-116.
Agoncillo A, Mejino JLV, Rosse C. 1999. Influence of the Digital Anatomist Foundational Model on
Traditional Representations of Anatomical Concepts. J. Am. Med. Assoc. AMIA ’99 Symp. Suppl. '99:
2-6.
Michael J, Mejino JLV, Rosse C. 2001. The role of definitions in biomedical concept representation.
JAMIA Symposium Supplement. '01:463-467.
Hahn JS, Burnside E, Brinkley JF, Rosse C, Musen MA. 1999. Representing the Digital Anatomist
Foundational Model in a Protégé ontology. JAMIA Symposium Supplement. '99:1070p
Rosse C, Mejino JLV, Shapiro LG, Brinkley JF. 2000. Visible Human, Know Thyself: The Digital
Anatomist Structural Abstraction. In: The Third Visible Human Project Conference Proceedings.
Bethesda: NLM; 85-86.
Mejino JLV, Noy NF, Musen M, Rosse C. 2001. Representation of structural relationships in the
Foundational Model of anatomy. JAMIA Symposium Supplement. '01:937p.
Noy NF, Mejino JLV Jr., Musen MA, Rosse C. 2002. Pushing the envelope: challenges in frame-based
representation of human anatomy. Submitted. Data & Knowledge
PL Elkin, BA Bauer, et al; “A Randomized Double-Blind Controlled Trial of Automated Term
Dissection”, JAMIA, Supplement, 1999.
Decision
Support
Practice Mgt.
Decision
Support
Practice Mgt.
Patient Safety
Research
Vocabulary Services
Informatics Team
Microbiology
Biochemistry
Anatomy
Genomics
Pathology
Physiology Laboratory Pharmacology
Radiology
Radiology
Common
Reference
Terminology
Model
Common
Reference
Terminology
Model
Figure #1: A proposed jigsaw puzzle of the reference terminology
development model for health. This is meant to depict the sort of disciplines
represented but not the full enumeration of important contributions to the
representational schema.
Evolution of Reference
Terminologies

Basic
Science
Clinical
Science
Clinical
Practice
Anatomy
Biochemistry
Physiology
Microbiology
Laboratory
Pathology
Radiology
Genomics
Pharmacology
Findings
Tests
Procedures
Medications
Genetics
Data
Collection
&
Analysis
Vocabulary
Practice
Knowledge
Management
Enhancement
Standards
Practice
Management
Coding
Research
Statistics
Decision
Support
Data Mining
Patient Safety
Education
Collaboration
Outcome
Improvement
Reduction in
Adverse Events
Figure #2: The evolution of reference terminology development and use.
Figure 4: Domain characterization for IRT.
International
Reference
Terminology
Genetics
Anatomy
Proteinomics
Physiology
Biochemistry
Microbiology
Pharmacology
Pathophysiology
Epidemiology
Biomedical
Informatics
Term
Modeling
Process
Basic Science
Domains
Clinical
Domains
Diagnosis
Findings
Tests
Procedures
Medications
Genetics
Download