Data Representation in Healthcare: A design for a freely-available openly-developed international reference terminology for healthcare. Authors: Peter L. Elkin, MD1, Steven H. Brown, MD2, Michael J. Lincoln, MD2, Mark Pittelkow, MD1, Brent A. Bauer, MD1, Dietlind Wahner-Roedler, MD1, Rhonda Thomas, PhD3, Larry Bergstrom, MD1 Affiliations: 1 – Mayo Foundation for Medical Education and Research, 2 – Veterans Administration, 3 – Conceptual Health Solutions Abstract: The importance of data representation in healthcare has been of concern for centuries. In the last fifty years there has been an increasing awareness of the need for formal representations of terminological systems. We propose that terminological system development should be consensus driven and the product of iterative open refinement in order to practically serve the needs of the healthcare community. The system of development and maintenance of such a system must involve recruitment of the best and the brightest in the medical community to take responsibility for insuring the accuracy and completeness of such an effort. We suggest a method for algorithmically building a starting point for such a reference terminology. Further we suggest a distributed authoring environment that would allow domain experts to contribute regardless of their location or language. Open intellectual contribution is the necessary ingredient for a consensus based international health reference terminology. Keywords: Controlled Health Terminologies, Ontologies, Reference Terminologies Introduction: The benefit of common, language independent knowledge representation is that healthcare practitioners will be able to speak anyway they like and then automated systems will be able to “understand” what they mean. This requires a high degree of synonymy, normalization of words and noun phrases, term completion, word completion and a detailed warehousing service that can reconcile compositional expressions with the same or similar meaning. This is exemplified in the USA VHA’s HealthePeople program which advocates for standards based Health IT practices using a common data infrastructure to support interoperability. To accomplish the goal of near complete content coverage we need to build a terminological system that supports post-coordination of concepts, i.e., a compositional system. The LSVT trial clearly showed that large pre-coordinated terminologies provide limited exact content coverage (58% coverage by UMLS of the terms submitted for the LSVT). At the Mayo Clinic we have been able to successfully encode 95.12% of the diagnoses generated by clinicians on a busy General Medical Hospital Service using SNOMED-RT in a system that supports post-coordination. We propose an approach to create a set of systematic (i.e. human readable) and formal description logic definitions for Health and Healthcare as a step towards achieving unambiguous comparable data (please see Figures # 1 and 2). This approach takes description logic based terminological representation one-step further by linking it to a formal level two ontology for medical knowledge. In doing so, we hope to facilitate inferencing and decision support that will help meet goals such as patient safety. To quote William J. Mayo, MD “The best interest of the patient is the only interest to be considered.” Model: 1. Hierarchical Models will be used. Within the field of Health several different terminological domains can be identified. Each of these domains requires a different description logic representation to ensure both adequate content coverage and comparable data be achieved. We have identified a number of domains, which fall into two categories: Basic Science and Clinicial. The Basic Science area includes the terminological domains of: Genetics, Proteinomics, Anatomy, Physiology, Biochemistry, Microbiology, Pharmacology, Pathophysiology, Epidemiology, Biomedical Informatics. The Clinical area domains include: Diagnoses, Findings, Tests, Procedures, Medications, Genetics. For each of these domains a hierarchy should be built using a classifier based on the description logic definitions for each of the concepts contained within that domain. For example, within the Clinical area: 1) Diagnoses roles should be “has-Location,” “has-KernelDiagnosis,” and “hasReason.” 2) Findings roles should be “has-Location,” “has-Manifestation,” and “has-Reason.” 3) Tests should use the LOINC fields as the description logic representation to be compatible with HL7 and most lab test and ordering systems (“has-Measured-Component, has-Method, hasProperty, has-Scale-Type, has-Specimen, has-Subject-of-Observation, and has-Time-Aspect”). 4) Procedures should be modeled using the roles “has-Reason,” “has-Approach,” “has-Extent,” “Uses-Equipment,” “Uses-anesthesia,” “has-Location,” “Uses-Medication,” and “hasComplication.” And, 5) in Anatomy we should use the “Part-of” and “has-Function”, “hasAdjacency” relations using the hierarchies provided within the Digital Anatomist Project (Rosse, Shapiro et al. 1998). Furthermore, in Physiology we should use the roles “has-Location,” “hasFunction,” and “has-Measurement-Method”and, in Genetics we should follow the NCI and Gene Ontology conventions for genes and proteins as a start. 2. Candidate terms should be identified. 3. Terms should be modeled and definitions both systematic and formal definitions should be assigned to each concept. Method: First cut terminology development As our terminology needs to “understand” health and healthcare we should “teach” it medicine. An early lesson should address anatomy by using the Foundational Model of Anatomy. Next we will add physiology. Lacking a standard terminology we will agglomerate indices from textbooks. From there we will teach it Pathophysiology, Microbiology, Pharmacology, Epidemiology, Biomedical Informatics. Subsequent domains include: Diagnoses, Findings, Tests, Procedures, and Medications . For Medications we should use the NDF-RT for medications and LOINC for test results. For Diagnoses we should use ICD10 and for procedures we should use ICD10-PCS. Findings can be derived from the Mayo database. We will identify candidate terms from our Master-sheet index, which has been accumulating clinical impressions (Diagnoses and Findings) regarding patient conditions since 1911. This rich source of terminological material can be combined with the Surgical Index, which has been accumulating procedural data for almost as long. We have identified a substrate of at least 10,000 terms that have been registered as either Findings or Diagnoses by the Mayo practice. We would build a semantically integrated terminology by initially adding domains that are relatively “atomic,” followed by domains that are increasingly “molecular.” With each additional domain, new terms could be dissected algorithmically using the evolving reference terminology to create description logic definitions whenever possible. By going in logical order we will have the substrate necessary to parse each subsequent domain. For example, we will add representations of genes, proteins, anatomy, and physiology sequentially to the evolving model. This will generate a single model with a consistent non-overlapping set of hierarchies. Figure # 1 provides a pictorial depiction of the proposed method. The end product of this process will be expressed in the Web Ontology Language (OWL), communicated using the Resource Description Framework Schema (RDFS) of the World Wide Web consortium (w3c). As a description logic framework we will employ the SHIQ description logic representation. Making something ugly beautiful The true beauty of this approach lies in its support for a consensus process, A web-based application should be deployed to allow anyone to contribute content. The content should be parsed and matched to the nomenclature. If the submitted term is well defined and not currently in the nomenclature, it should be dissected and considered a candidate for inclusion in the the terminology. A content expert who is the editor for that domain should review suggested changes to existing concepts. Each editor should employ other specialists and sub-specialists in their field to assist them in the maintenance tasks. Synonyms should be added in much the same way. Rearranging the hierarchies is considered equivalent to changing the content and will need a consensus review. Synonyms should necessarily be multi-lingual. Associated language models should be stored to assist with reconstruction of concepts from one language to another. Terminology Modeling We propose a “squad” system of modeler organization, where each squad consists of a number of term modelers and a squad editor (these would be organized by subspecialty). A chief editor should supervise the squad editors. A common pool of domain and technical experts should be available to work with all modeling groups (figure 3). Subspecialty modeling consultants Modeling Group 1 Modeling Group 2 Editor 1 Editor 2 … Modeling Group X Editor X CHIEF EDITOR Figure #3: Modeling Workflow Term Modeling Process Overview For internal review purposes, the chief editor should assign terminology domain areas to the various modeling groups and monitor progress. The domain editors should assign terms to individual modelers on their team and monitor progress. Modelers in each modeling group should receive assignments from their editor. They should use a webbased terminology-authoring environment and a suite of medical knowledge sources (e.g., computerized books, Web resources, and dictionaries) to create description logic models of their assigned concepts. They should decide how to model each concept, or else consult with a reference subspecialty expert. All modelers should receive performance feedback from their editors regarding their modeling consistency. To insure consistent, high-quality modeling, they should use a modeling style guide that addresses specific modeling issues and choices. They should also participate in quarterly modeling meetings together with the editors and project leaders in order to resolve modeling issues and improve the style guide. Similar processes and procedures have been tested in production at Kaiser-Permanente, the College of American Pathologists, the National Center for Biotechnology Information (NCBI) and National Library of Medicine. Domain Editor’s Role and Responsibilities The domain editor should be charged with using shared resources (e.g. specialty consultants, informatics consultants) to accomplish modeling tasks. The editor should have a deep understanding of the domain, domain models and modeling process. The domain editor should have at least a rudimentary knowledge of description logic theory (figure 4). Examples of tasks to be performed include: Distribute and track modeling assignments Mediate ‘unresolvable’ conflicts Pair modelers for mentoring and definition ‘double-checking’ Educate new modelers First line tech support for squad members Reference for style guide interpretation issues Bring modeling issues to style guide discussions Participate in quarterly modelers meetings Participate in weekly teleconferences Status reporting to chief editor Public relations/educationon an “as needed” basisAssess inter-modeler agreement and provide remedial training when needed; Work with chief editor on inter-squad modeling consistency. Modeler’s Roles and Responsibilities The modeler is charged with creating description logic definitions for assigned concepts, and participating in workflow management and quality control processes. Specific responsibilities include: Term modeling Participation in workflow process to allow status monitoring Read, understand, and comply with the style guide Raise style issues with the squad editor Participate in weekly calls Participate in quarterly modelers meetings Provide accurate work completion time frames to domain editor Provide accurate availability for tasking information to domain editor Resolve conflicts collaboratively and according to style guide Do not ‘sub optimize’ efforts. When uncertain of a definition, pass it immediately to the next level modeler. Assist in creating, validating, and modifying partition term model as assigned Participation in quality assurance within and between squads as directed by the domain editor Recruit, train and mentor new modelers Willingly step aside if unable to constructively contribute in an ongoing manner Discussion: A common freely available openly developed international reference terminology for health is an essential component of the information infrastructure needed to move healthcare into the 21 st Century. We propose one method that may be used to develop a beginning ontology, which we propose to name the Foundational Linguistic Ontology of Health (FLOH). We further recommend an open process for authoring and contributing to the ontology. This will encourage content experts to contribute and criticize the work, necessarily leading to iterative improvement. An editorial process for handling conflicts designed to encourage best practice in ontology creation and domain representation will temper this fully open authoring environment. Finding methods of incorporating various language models will assist our ability to be truly language independent. If developed, we believe that such a reference terminology would be widely accepted which would in turn support its ongoing maintenance. Quality data representation that supports full postcoordination and normalization is essential if we are to represent the details necessary for decision support. We believe that this open development process will make progress towards achieving the goals of interoperability and diversity of surface forms. Limitations of this method include possible insertion of errors into the ontology both in content and semantics. We believe that with time this will be expunged through the editorial process documented in the methods section. Furthermore as we evolve over time there will be version control, obsolescence marking and pointers with dates for concept evolution (as we learn more in medicine regarding particular medical entities). The terminology should be concept oriented and should avoid using “Not Elsewhere Classified” (NEC) in the terminology. Principles of quality in terminological authoring will be adhered to in this project (see ISO TS17117). Incorporation of suggestions should be in a timely basis. Suggestions without conflicts that are fully specified (i.e., classify automatically) will be added to the terminology algorithmically. This will facilitate rapid inclusion of less controversial concepts. Official versions should be released at least every three months. We conclude that the domains of biology and health need a common reference terminology to support their data representation needs. These needs include but are not limited to patient safety, patient care at large, epidemiology, public health reporting and decision support. In this manuscript we have outlined one method by which such a distributed knowledge-based freely available, openly developed reference terminology for health could be developed. Acknowledgements: This work has been supported in part by a grant from the National Library of Medicine (LM06918-A101). The authors are grateful to Cornelius Rosse and Onard Mejino for their comments regarding this manuscript. References: 1. 2. 3. 4. 5. 6. Elkin PL, et al; “Standard specification of Quality Indicators for Controlled Health Vocabularies, ASTM / ANSI Standard (ASTM E2087). Elkin PL, et al; “Health Informatics. Controlled health vocabularies - Vocabulary structure and highlevel indicators”, ISO TC 215 Technical Specification (TS17117). http://www.daml.org/2001/03/daml+oil-index Elkin PL, et al; “Standard Problem List Generation, Utilizing the Mayo Canonical Vocabulary Embedded within the Unified Medical Language System”; Journal of the American Medical Informatics Association (JAMIA) Supplement 1997, p. 500-504. Rogers J, Rector A; “GALEN’s Model of Parts and Wholes: Experience and Comparisons”, Proceedings/AMIA Annual Symposium, 2000, pages 714-718. Cimino JJ; “Providing concept-oriented Views for Clinical Data using a Knowledge-Based System: an Evaluation”, Journal of the Americal Medical Informatics Association 2002, p. 294-305. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. Rector A, Rossi A, Consorti MF, Zanstra P; “Practical Development of Re-usable Terminologies: GALEN-IN-USE and the GALEN Organisation”, International Journal of Medical Informatics, February 1998, 48(1-3) pages 71-84. Horrocks U, Tobies, S; “Practical reasoning for description logics with functional restrictions, inverse and transitive roles, and role hierarchies”, Proceedings of the first workshop on Methods for Modalities (M4M-1), May 1999. Horrocks U, Tobies, S; “Optimisation of terminological reasoning”, Proceedings of the 2000 International Workshop on Description Logics (DL2000), pages 183-192, 2000. Baenziger J, et al; “Development of the Logical Observation Identifier Names and Codes (LOINC) Vocabulary”, Journal of the American Medical Informatics Association, May-June 1998, 5(3) pages 276-292. Elkin PL, et al; “A Randomized Controlled Trial of Automated Term Composition”, JAMIA, 1998, Symposium, Supplement, l:765-769. Elkin PL, Tuttle MS, Cesnik B, McCray AT, Scherrer JR; “The Role of Compositionality in Standardized Problem List Generation”, Ninth World Congress on Medical Informatics; IOS Press; 1998. p. 660-664. Rosse C, Mejino JL, Modayur BR, Jakobovits R, Hinshaw KP, Brinkley JF. 1998. Motivation and organizational principles for anatomical knowledge representation: the Digital Anatomist Symbolic Knowledge Base. J. Am. Med. Informatics Assoc.5:17-40. Rosse C, Shapiro LG, Brinkley JF. 1998. The Digital Anatomist Foundational Model: Principles for Defining and Structuring its Concept Domain. IN Chute EG (ed): A paradigm shift in health care information systems: clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98: 820-824 Mejino JL, Rosse C. 1998. The Potential of the Digital Anatomist Foundational Model for assuring consistency in UMLS sources. IN Chute EG (ed): A paradigm shift in health care information systems: clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98:825-829 Neal PJ, Shapiro LG, Rosse C. 1998. The Digital Anatomist Spatial Abstraction: a scheme for the spatial description of anatomical entities. IN Chute EG (ed): A paradigm shift in health care information systems: clinical infrastructures for the 21st century. JAMIA Symposium Supplement. '98:423-427. Mejino JLV, Rosse C . 1999. Conceptualizations of Anatomical Spatial Entities in the Digital Anatomist Foundational Model. J. Am. Med. Assoc. AMIA ’99 Symp. Suppl. '99: 112-116. Agoncillo A, Mejino JLV, Rosse C. 1999. Influence of the Digital Anatomist Foundational Model on Traditional Representations of Anatomical Concepts. J. Am. Med. Assoc. AMIA ’99 Symp. Suppl. '99: 2-6. Michael J, Mejino JLV, Rosse C. 2001. The role of definitions in biomedical concept representation. JAMIA Symposium Supplement. '01:463-467. Hahn JS, Burnside E, Brinkley JF, Rosse C, Musen MA. 1999. Representing the Digital Anatomist Foundational Model in a Protégé ontology. JAMIA Symposium Supplement. '99:1070p Rosse C, Mejino JLV, Shapiro LG, Brinkley JF. 2000. Visible Human, Know Thyself: The Digital Anatomist Structural Abstraction. In: The Third Visible Human Project Conference Proceedings. Bethesda: NLM; 85-86. Mejino JLV, Noy NF, Musen M, Rosse C. 2001. Representation of structural relationships in the Foundational Model of anatomy. JAMIA Symposium Supplement. '01:937p. Noy NF, Mejino JLV Jr., Musen MA, Rosse C. 2002. Pushing the envelope: challenges in frame-based representation of human anatomy. Submitted. Data & Knowledge PL Elkin, BA Bauer, et al; “A Randomized Double-Blind Controlled Trial of Automated Term Dissection”, JAMIA, Supplement, 1999. Decision Support Practice Mgt. Decision Support Practice Mgt. Patient Safety Research Vocabulary Services Informatics Team Microbiology Biochemistry Anatomy Genomics Pathology Physiology Laboratory Pharmacology Radiology Radiology Common Reference Terminology Model Common Reference Terminology Model Figure #1: A proposed jigsaw puzzle of the reference terminology development model for health. This is meant to depict the sort of disciplines represented but not the full enumeration of important contributions to the representational schema. Evolution of Reference Terminologies Basic Science Clinical Science Clinical Practice Anatomy Biochemistry Physiology Microbiology Laboratory Pathology Radiology Genomics Pharmacology Findings Tests Procedures Medications Genetics Data Collection & Analysis Vocabulary Practice Knowledge Management Enhancement Standards Practice Management Coding Research Statistics Decision Support Data Mining Patient Safety Education Collaboration Outcome Improvement Reduction in Adverse Events Figure #2: The evolution of reference terminology development and use. Figure 4: Domain characterization for IRT. International Reference Terminology Genetics Anatomy Proteinomics Physiology Biochemistry Microbiology Pharmacology Pathophysiology Epidemiology Biomedical Informatics Term Modeling Process Basic Science Domains Clinical Domains Diagnosis Findings Tests Procedures Medications Genetics