Terminologies, Classifications and Groupings
Dr M N Kamel Boulos
MIM Centre, City University, London
XaBpR (aka Dermatologist)
E-mail: dk708@city.ac.uk
2001

Why Use a Clinical Terminology? Free-Text Search Weaknesses
• Brute-force free-text search techniques cannot locate relevant knowledge efficiently, for three reasons:
1 The sought page might use a different term (a synonym) that points to the same concept. "Myocardial infarction" and "coronary thrombosis" cannot be matched, although they mean the same thing.
2 Spelling mistakes and variants are treated as different terms in a computer environment. For example, psoriasis (correct spelling) and psoriaisis (typographical error) cannot be matched. Similarly, anaemia (correct UK spelling) and anemia (correct US spelling) cannot be matched.

Why Use a Clinical Terminology? Free-Text Search Weaknesses (Cont’d)
3 Unlike in the bibliographic world (e.g., MEDLINE), search engines cannot process HTML intelligently. For example, searching for resources on ‘psoriasis’ will retrieve all the documents containing this word, but many of these resources might not be relevant, i.e., psoriasis is only mentioned in passing and is not their actual topic. Some documents might mention psoriasis within a ‘See also: psoriasis’ sentence, e.g., at the bottom of a page covering another papulosquamous disease, or under the differential diagnosis section of a page covering another disease, e.g., Reiter’s disease or seborrhoeic dermatitis. You might search on the term ‘stroke’ (cerebral infarction) and end up with documents that teach you about the workings of the two-stroke motorcycle engine. The nondiscriminatory free-text method of document retrieval inevitably produces a number of irrelevant leads, or noise (Kiley, 1999).

Solution
• Structured (headings), coded data entry.
• It is essential that healthcare professionals agree on the nature and content of the component data sets of the different EPRs (e.g., record structures and headings related to the different medical, surgical and nursing specialities), so that consistent basic models of these records can be constructed and shared in a reliable way. (The framework of headings Web site http://www.nhsia.nhs.uk/headings/ is a good example of well-organised standardisation efforts.)
• Headings are very important. For example, the same term “schizophrenia” takes on different meanings and implications depending on whether it appears under the diagnosis, present/past history or family history heading in an electronic patient record.

Introduction to Clinical Coding
• Clinical coding lies at the heart of successful implementation of the EPR and integrated decision support modules.
• EPR coding is done using a fine-granularity terminology (or controlled healthcare vocabulary) like Read CTV3 (Read Clinical Terms Version 3).
• A controlled healthcare vocabulary is a system of concepts to populate electronic healthcare applications. Controlled healthcare vocabularies are products of the electronic era, designed to support computer-based functionality.
• Read CTV3 allows not only the coding of diagnoses and drugs (treatment), but also the coding of symptoms and signs, and of different tests and investigations. Moreover, Read CTV3 is a compositional terminology, which means that concepts can be constructed from primitive building blocks, with rules controlling the permitted combinations.

Introduction to Clinical Coding (Cont’d)
• On the other hand, a clinical classification allows categorisation of clinical data according to intrinsic rules. Formal clinical classifications have existed for over 100 years, initially for mortality, but more recently for morbidity and interventions.
Classifications like the WHO's ICD-10 (The International Statistical Classification of Diseases and Related Health Problems, tenth revision) offer a coarser granularity (1000s of entries vs. 100,000s of entries in clinical terminologies) and only single parentage (so that an item cannot be counted twice under different headings), and are therefore more suitable for statistical reporting (national statistics and international comparisons) using aggregated data.

Introduction to Clinical Coding (Cont’d)
• Groupings like HRGs (Healthcare Resource Groups) have an even coarser granularity, lumping together tens of different conditions into single groups according to their resource consumption (100s of groups). Grouping information from aggregated EPRs helps in resource management, planning and budget negotiations.

How Do Concept Codes Help in Decision Support and Research?
All the concepts represented by the terms in Read CTV3 are arranged in a hierarchy (multiple parentage allowed), i.e., they are semantically defined within the clinical vocabulary. The hierarchy describes which concepts are types of something else. Consider this example:
Myocardial infarction (and its synonyms) is a type of
Ischaemic heart disease (and its synonyms), which is a type of
Disorder of heart (and its synonyms), which is a type of
Cardiovascular disorder (and its synonyms), which is a type of
Disorders (or diseases), which is a type of
Clinical findings

How Do Concept Codes Help in Decision Support and Research? (Cont’d)
Now suppose that a doctor wishes to prescribe a drug that must not be used by anyone with a heart disorder. Because the clinical terminology knows every condition that is a type of heart disorder, it can automatically check the patient’s record to see whether the patient has any of these conditions. This could not have been achieved with a free-text patient record.
For instance, angina is a type of heart disorder, but this could not have been detected in a text-based patient record, where the best that can be achieved is to search for the word "heart" in the text. Nor would it have been practical to search for the words "heart OR angina OR coronary OR myocardial OR ...", as there are over 1000 types of heart disorder listed in the Read Codes, for example. The Read Codes’ hierarchy of concepts supports all sorts of research questions, such as "list all my patients who have eczema, and list these according to the type of eczema that they have." The possibilities are endless...

Controlled Clinical Terminology Desirable Features (Cimino)
1 Concept based
2 Completeness (the compositional feature of a terminology ensures completeness)
3 Synonymy (this makes the terminology richer and less restrictive; all synonyms of a concept point to it and are semantically associated with it)
4 Hierarchical
5 Multiple classification and multiple parentage
6 Compositional
7 Semantic definition of concepts
8 Mapped to classifications (e.g., Read -> ICD-10; maps can be one-to-one or one-to-many; usually some detail is lost, as classifications have coarser granularity than terminologies)
9 Language-independent model

No Ambiguity or Redundancy
• No duplicate concepts are allowed, i.e., there cannot be two different ways of coding the same thing or concept. For example, "Heart attack" and "Myocardial infarction" cannot be considered two different concepts and given two concept codes; they are just synonyms.
• "Paget disease" cannot be a concept’s preferred term or label, because it is ambiguous; it can point to "Paget disease of the breast" as well as "Paget disease of bone".
• Each concept has one unambiguous preferred term and any number of synonyms. Synonyms may be shared with other concepts, e.g., "Ventricle" is a synonym (but cannot be the preferred term) of both "Cardiac ventricle" and "Brain ventricle".
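The contraindication check and the synonym rules described above can be sketched in a few lines. This is only an illustration under stated assumptions: the tiny hierarchy, the term index and the function names (`concepts_for`, `is_a`, `conflicts`) are hypothetical, concept labels stand in for real concept codes, and nothing here is actual Read CTV3 data.

```python
from collections import deque

# Hypothetical mini-terminology. Concept labels stand in for concept codes;
# none of these structures are real Read CTV3 data.
PARENTS = {  # child concept -> parent concepts (multiple parentage allowed)
    "Myocardial infarction": ["Ischaemic heart disease"],
    "Angina": ["Ischaemic heart disease"],
    "Ischaemic heart disease": ["Disorder of heart"],
    "Disorder of heart": ["Cardiovascular disorder"],
    "Cardiovascular disorder": ["Disorders"],
    "Eczema": ["Skin disorder"],
    "Skin disorder": ["Disorders"],
    "Disorders": ["Clinical findings"],
}

# Every term, preferred or synonym, points at the concept(s) it may denote.
TERMS = {
    "myocardial infarction": {"Myocardial infarction"},
    "heart attack": {"Myocardial infarction"},
    "angina": {"Angina"},
    "eczema": {"Eczema"},
    "ventricle": {"Cardiac ventricle", "Brain ventricle"},  # shared synonym
}


def concepts_for(term):
    """Return the set of concepts a typed term may refer to."""
    return TERMS.get(term.strip().lower(), set())


def is_a(concept, ancestor):
    """True if `concept` is `ancestor` or a descendant of it in the hierarchy."""
    queue, seen = deque([concept]), set()
    while queue:
        current = queue.popleft()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            queue.extend(PARENTS.get(current, []))
    return False


def conflicts(coded_record, excluded_concept):
    """List the coded concepts in a record that fall under the excluded one."""
    return [c for c in coded_record if is_a(c, excluded_concept)]


# A coded record: each entry was resolved to one concept at data entry.
record = [next(iter(concepts_for("Heart attack"))), "Eczema"]
print(conflicts(record, "Disorder of heart"))
print(sorted(concepts_for("Ventricle")))
```

The record never contains the word "heart", yet the check still finds the conflict because it follows the hierarchy rather than matching text. The shared synonym "Ventricle" resolves to two candidate concepts, which is why it cannot serve as a preferred term.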
Concept and Term Codes
• "Plaque psoriasis" (concept label) has the Read concept code ‘M1614’. The code (TermId) of its preferred term, ‘Plaque psoriasis’ again, is ‘Y50HZ’, and its synonymous terms have their own TermIds: "Discoid psoriasis" is ‘Y50Ha’ and "Nummular psoriasis" is ‘Y50Hc’. This gives four codes for this example: one concept code and three TermIds.

Arranging Concepts
• Concepts can be arranged orthographically (by spelling, i.e., A to Z), like a dictionary (e.g., Apple, Dog, Orange, Zebra). However, arranging concepts semantically (by meaning), like a thesaurus, is much more useful (e.g., Fruits [Apple, Orange], Animals [Dog, Zebra]).

Directed Acyclic Graph (DAG)
• A DAG allows multiple parentage and allows concepts to be moved and reclassified as medical knowledge changes (cf. the rigid, code-dependent hierarchy of ICD). With a DAG, unlimited hierarchy depths can be reached (cf. only four levels in ICD), but all these features come at the expense of increased complexity for implementers.

Enumerative Vs Compositional Terminologies
• Enumerative (pre-coordinated) terminologies, where every possible concept is listed explicitly, result in combinatorial explosion, and you can never be sure that you have listed all the possibilities.
• A compositional (post-coordinated) terminology, on the other hand, like Read CTV3, seeks to construct concepts from primitive building blocks, governed by validation rules.
• OAV (Object-Attribute-Value) triples constitute the description logic scheme used in Read CTV3, and help achieve semantic definition of concepts.
• In SNOMED (Systematised Nomenclature of Human and Veterinary Medicine, College of American Pathologists), the description logic is KRSS (Knowledge Representation System Specification), while in GALEN it is GRAIL.
• N.B.: Read Codes are due to be merged with SNOMED by 2002, to create a new worldwide standard clinical coding scheme. This will be called SNOMED Clinical Terms (SNOMED-CT).
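To make the idea of post-coordination with validation rules more concrete, here is a minimal sketch of composing a concept from Object-Attribute-Value triples. The `Triple` class, the `ALLOWED` rule table and the `compose` function are hypothetical names chosen for illustration, and the attributes and values are invented; they are not the actual qualifier tables of Read CTV3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    obj: str        # the core concept being qualified
    attribute: str  # the qualifying attribute
    value: str      # the chosen value for that attribute


# Hypothetical validation rules: permitted values per (object, attribute).
# These are illustrative only and are not taken from Read CTV3.
ALLOWED = {
    ("Fracture of neck of femur", "Fracture type"): {"Open", "Closed"},
    ("Fracture of neck of femur", "Laterality"): {"Left", "Right"},
}


def compose(obj, qualifiers):
    """Build validated OAV triples for `obj`, rejecting disallowed combinations."""
    triples = []
    for attribute, value in qualifiers.items():
        if value not in ALLOWED.get((obj, attribute), set()):
            raise ValueError(f"{attribute}={value!r} is not valid for {obj!r}")
        triples.append(Triple(obj, attribute, value))
    return triples


expression = compose(
    "Fracture of neck of femur",
    {"Fracture type": "Open", "Laterality": "Left"},
)
for triple in expression:
    print(triple)
```

One object and two small rule tables cover all four fracture type and laterality combinations, whereas an enumerative terminology would have to list each combination as a separate concept. An attribute that the rules do not allow for the object, such as a made-up "Severity", raises a `ValueError`.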
Description Logics
• Description logics (DLs) lie at the heart of any clinical terminology. DLs are languages that allow reasoning about information, in particular supporting the classification of descriptions by working out how concepts and their instances relate to one another based on their roles. They can thus infer knowledge implied by an ontology.

Terminology Servers
• Medical terminologies are foundational ontologies used by many applications. Hence they should not be embedded in client applications, but should be shared and reused as distributed resources by implementing them as services through terminology servers.
• A terminology server is a special type of ontology server that allows retrieval of related concepts (parent, child, sibling, cousin and uncle concepts) and synonyms, and querying and cross-mapping of multiple terminologies/classifications at the same time. Ideally, it should also support concept mapping, which involves processing free-text queries to identify corresponding terms from a controlled vocabulary; this relieves users of any restrictions while ensuring accurate results (contextual relevancy), and can also support multiple languages.

Terminology Servers (Cont’d)
• Chute et al. list the following desiderata for a clinical terminology server: word normalisation, word completion, target terminology specification, spelling correction, lexical matching, term completion, semantic locality, term composition and decomposition.
• Examples of terminology servers include Saphire International (http://www.ohsu.edu/cliniweb/saphint/) and jTerm (http://www.jterm.org), a Java-based open source terminology server.

Classifications
• A classification is a system of categories to which entities are assigned according to some established criteria, e.g., anatomy, disease process or pathology, aetiology, clinical (like obstetrics), or a combination of these.
Categories are limited in number, all-encompassing and stable over time. Common and important entities are assigned to specific categories, while uncommon and less significant entities are included within other categories.

Classifications (Cont’d)
Rules of engagement for entities in ICD:
1 Index, e.g., 443.1 Buerger's disease (ICD-9)
2 Inclusions (when an entity is less significant), e.g., 443.8 Other (incl. Acrocyanosis, Diabetic peripheral angiopathy) [excl. Chilblains, Frostbite, Immersion foot] (ICD-9)
3 Exclusions (see the example above; an exclusion tells you not to count an entity under this code, as it is listed elsewhere within the classification under another code)
4 Otherwise specified categories (OS): include other specific but less significant entities
5 Unspecified (NOS, Not Otherwise Specified), e.g., 443.9 Unspecified (incl. Intermittent claudication, Spasm of artery) [excl. Spasm of cerebral artery (435)] (ICD-9)
6 Extensions (5th digit), e.g., to differentiate between closed and open fracture of the neck of femur, as an open fracture is much more liable to infection and complications
7 Dagger and asterisk, e.g., {Cause} 265.0 Beriberi, {Effect} 425.7* Nutritional cardiomyopathies

Characteristics of a Classification
• All concepts can find a single place in a suitable classification (i.e., it is all-inclusive and mutually exclusive). A single concept cannot be classified under two different headings (i.e., no multiple parentage); this prevents double counting of a condition, which is essential for reliable statistics and central returns (remember: statistics are the main raison d’être of classifications).
• In classifications, you lose detail (related concepts are aggregated and counted together; no distinction between them is made at the code and statistics levels).
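The two characteristics above, single placement and loss of detail, can be seen in a small counting sketch. It assumes a hypothetical many-to-one map (`CATEGORY_OF`) from fine-grained terminology concepts to coarse category labels; the labels are illustrative and are not real ICD-10 codes.

```python
from collections import Counter

# Hypothetical many-to-one map from fine-grained terminology concepts to
# coarse classification categories (labels only, not real ICD-10 codes).
CATEGORY_OF = {
    "Plaque psoriasis": "Psoriasis",
    "Guttate psoriasis": "Psoriasis",
    "Angina": "Ischaemic heart diseases",
    "Myocardial infarction": "Ischaemic heart diseases",
}


def tally(coded_episodes):
    """Count episodes per category; each concept maps to exactly one category."""
    return Counter(CATEGORY_OF[concept] for concept in coded_episodes)


episodes = ["Plaque psoriasis", "Guttate psoriasis", "Angina",
            "Myocardial infarction", "Plaque psoriasis"]
counts = tally(episodes)
print(counts)
```

Because every concept has exactly one category, the category totals always add up to the number of episodes, so nothing is counted twice. The price is that the counts can no longer distinguish plaque from guttate psoriasis.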
Characteristics of a Classification (Cont’d)
• Classifications become less accurate with time and will eventually need revision, e.g., when new diseases are discovered. The question is where to put these diseases: if we put them under existing categories, the meaning of those categories drifts with time, making comparisons with previous years' statistics compiled using the same classification less accurate and reliable. Updating a classification also implies preparing an equivalence mapping table (to compare statistics compiled using different versions of the classification).
• In addition to ICD (for primary diagnosis), OPCS-4 (a classification of surgical operations and procedures) is used in the UK.

The Pyramid

Mapping and Grouping Tools
• CamsCoder is a good example of a mapping tool that translates Read CTV3 terms into ICD-10 and OPCS-4 terms and codes.
• CamsCoder presents us with a review screen showing the diagnostic and operative procedures entered. The episode’s HRG is also displayed.
• CamsCoder allows the entered statements (which may contain multiple terms and codes) to be re-ordered or deleted using on-screen buttons.
• CamsCoder automatically validates the information presented. If there are any problems, a message is displayed explaining what the coder needs to do. If the coding is invalid, clicking on the “Action” button takes the coder through the process of making the episode valid.
• The coder can set the episode coding to be complete when they are happy with it, and then clicks on “OK” to finish the coded episode.

The LOINC Codes
• The LOINC database provides a set of universal names and ID codes for identifying laboratory and clinical observations. The goal of LOINC is to facilitate the exchange and pooling of clinical laboratory results, such as blood haemoglobin or serum potassium, for clinical care, outcomes management, and research.
• Currently, many laboratories use HL7 or similar standards to send laboratory results electronically from producer laboratories to clinical care systems in hospitals. Most laboratories identify tests in these messages by means of their internal (and idiosyncratic) code values, so the receiving systems cannot fully "understand" the results they receive unless they either adopt the producer's laboratory codes (which is impossible if they receive results from multiple source laboratories), or invest in work to map each laboratory's code system to the receiver's internal code system.

The LOINC Codes (Cont’d)
• If laboratories all used the LOINC codes to identify their results in data transmissions, this problem would disappear. A receiving system with LOINC codes in its master vocabulary file would be able to understand and properly file HL7 results messages that also use the LOINC codes. Similarly, government agencies would be able to pool and analyse results for tests from many sites if they were reported electronically using the LOINC codes.
• You may download and use the LOINC database browser free of charge from http://www.regenstrief.org/loinc/loinc.htm

2001 MeSH (Medical Subject Headings)
• MeSH was originally developed by the United States’ National Library of Medicine (NLM) to index the world medical literature in MEDLINE (MeSH provides bibliographic headings for indexing); the latest MeSH version is 2001 MeSH. MeSH also forms an essential part of the NLM’s Unified Medical Language System (UMLS).
• MeSH qualifiers or subheadings are used to better define a topic, narrow retrieval, or express a certain aspect of a main heading.
• It should be noted that MeSH is not an efficient indexing language for tasks such as classifying episodes of patient care. Clinical coding systems (e.g., Read Codes/Clinical Terms Version 3) are better suited to coding the Electronic Patient Record.
2001 MeSH (Medical Subject Headings) (Cont’d)
MeSH Descriptor Data for Psoriasis, a skin disease.

2001 MeSH Tree Structures
• The MeSH hierarchy allows broader (parents or ancestors), sibling, and narrower (children or descendants) concept relationships. Moreover, within this hierarchy, a single concept may appear as a narrower concept of more than one broader concept, e.g., "Psoriatic Arthritis" appears under both "Joint Diseases" and "Skin Diseases". Compare this with ICD, and remember that each coding language or scheme is best suited to particular purposes.
• MeSH Browser: http://www.nlm.nih.gov/mesh/MBrowser.html

UMLS (Unified Medical Language System)
• The UMLS project (http://umls.nlm.nih.gov/) is a long-term research and development project at the United States' National Library of Medicine (NLM) whose goal is to help health professionals and researchers to intelligently retrieve and integrate information from a wide range of disparate electronic biomedical information sources. It can be used to overcome variations in the way similar concepts are expressed in different sources. This makes it easier for users to link information from patient record systems, bibliographic databases, factual databases, expert systems, etc. The UMLS Knowledge Services can also assist in data creation and indexing applications.

UMLS (Unified Medical Language System) (Cont’d)
• The UMLS includes machine-readable "Knowledge Sources" that can be used by a wide variety of application programs to compensate for differences in the way concepts are expressed in different machine-readable sources and by different users, and to identify the information sources most relevant to a user inquiry.
• The Metathesaurus contains mappings to MeSH, ICD-9-CM, SNOMED, CPT, and a number of other coding systems.
• The UMLS is not itself a standard; it is a cross-referenced collection of standards and other data and knowledge sources.
It is a very valuable resource for solving the most difficult problem in exchanging healthcare information: the multiplicity of coding systems in use today.
• One on-line use of the UMLS is the Medical World Search site (http://www.mwsearch.com/). When a user searches the Web for a medical concept, Medical World Search uses UMLS to include synonyms in the query.

The UMLS Project and its Components
• The project is directed by a multidisciplinary team, including clinicians, computer and information scientists, and linguists, and involves collaboration with many medical informatics research groups. The project work has resulted in a set of knowledge sources and accompanying programs that are updated and distributed regularly on CD-ROM. Online access to the UMLS knowledge sources is provided through the Internet-based UMLS Knowledge Source Server, which includes an application programming interface (API) and a World Wide Web interface. The Web site requires registration (http://umlsks.nlm.nih.gov/).

UMLS Metathesaurus
• The Metathesaurus contains information about biomedical concepts and terms from a large number of controlled terminologies and thesauri. The Metathesaurus preserves the information encoded in the source vocabularies, such as the hierarchical contexts of the terms, their meanings and other attributes. The Metathesaurus is organised by concepts, which means that alternate names (synonyms, lexical variants, and translations) for the same meaning are all linked together as one concept. The Metathesaurus adds information to the concepts, including semantic types, definitions, and inter-concept relationships.

UMLS Metathesaurus (Cont’d)
• The Metathesaurus contains hundreds of thousands of concepts from a broad range of vocabularies.
These include, for example, all or portions of the following terminologies:
– the Systematised Nomenclature of Medicine (SNOMED International),
– the Read Thesaurus,
– the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM),
– the Universal Medical Device Nomenclature System,
– the WHO Adverse Drug Reaction Terminology,
– the Classification of Nursing Diagnoses (NANDA),
– the Home Health Care Classification of Nursing Diagnoses and Interventions,
– the Physicians' Current Procedural Terminology (CPT),
– the Medical Subject Headings (MeSH),
– the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV), and
– the Thesaurus of Psychological Index Terms.
• In addition, translations of some of the terminologies into languages other than English are included.

UMLS Semantic Network
• The Semantic Network, through its high-level semantic types, or categories, provides a consistent categorisation of all concepts represented in the Metathesaurus. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain. There are semantic types for organisms, anatomical structures, biologic function, chemicals, events, physical objects, and concepts or ideas. The primary relationship is the "is_a" link, and there are five major categories of additional relationships: physical, spatial, temporal, functional, and conceptual relationships.

Software Implementations You Can Experiment With
• Agora’s Web-based READ 3.1 browser, a lightweight browser and search engine for the Read Codes clinical thesaurus: http://www.agora.co.uk:1080/read/gp.htm
Software Implementations You Can Experiment With (Cont’d)
• CLUE (CIC Look Up Engine from The Clinical Information Consultancy, UK) is a freeware clinical coding solution that helps you add NHS Clinical Terms Version 3 capabilities to clinical applications in hours rather than months (you may use Visual Basic, for example, to access CLUE's API). CLUE also offers a ready-to-use Read Codes browser. You may download the full CLUE package free of charge, but be aware that a terminology like Read CTV3, with more than 200,000 concepts, nearly 300,000 terms and over a million access keys, is not a small download (over 20MB). http://www.clinical-info.co.uk/ClueDownload.htm
• ICD9 CodeFinder lets you search and browse ICD-9 categories and codes. You may download CodeFinder free of charge (298 KB). http://www.winsite.com/info/pc/win95/misc/cfind20.zip/
• e-MDs Online ICD-9 Search: http://www.e-mds.com/icd9/index.html

Recommended Web Links and Papers
• Bechhofer SK, Goble CA, Rector AL, Solomon WD and Nowlan WA. Terminologies and Terminology Servers for Information Environments. In: Proceedings of STEP '97 Software Technology and Engineering Practice, 1997. URI: http://citeseer.nj.nec.com/354766.html
• Chute CG, Elkin PL, Sherertz DD and Tuttle MS. Desiderata for a Clinical Terminology Server. In: Proceedings of AMIA'99 Annual Symposium, 1999. URI: http://www.amia.org/pubs/symposia/D005782.PDF
• Rector AL. Clinical Terminology: Why Is it so Hard? Methods Inf Med. 1999;38(4-5):239-52
• The British Association of Clinical Terminology Specialists: http://www.bacts.org.uk/
• OpenGALEN: http://www.opengalen.org/
• Read Codes Engines: http://www.cams.co.uk/ and http://www.visualread.com
• See also “Related Web Links” section at: https://wwws.soi.city.ac.uk/intranet/students/courses/mim/mi/lect2_2.htm