Part I: Biomedical Ontologies: A Critical Survey Barry Smith http://ontology.buffalo.edu/smith 1 I: Biomedical Ontologies: A Critical Survey Ontologies, terminologies and thesauri are now in common use in the domain of biomedical informatics. Their goal is to support search and retrieval, but also to advance genuine reasoning about biomedical phenomena and to enable re-use of heterogeneous data through the use of common systems of annotations. We examine a representative collection of biomedical ontologies in light of these criteria, and draw (somewhat sad) conclusions as to the current state of the field. II. The Ontology of Biomedical Reality (terminology) Ontologies to support scientific research and clinical medicine have special characteristics, which we shall outline in terms of a distinction between three levels: (1) the level of reality; (2) the level of cognitive representations; and (3) the level of the publicly accessible concretizations of such cognitive representations for example in ontologies. Against this background we shall clarify the relations between ontologies, terminologies, information models, databases, and similar artifacts. III. The OBO Foundry Project: Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. The primary objective is to establish gold standard reference ontologies, one for each core domain of biomedical science. We shall describe how this objective is already being realized, and show how it can not only help solve the problems of data retrieval and re-use but also foster the development of the powerful tools that will be needed to reason with biomedical data in the future. 
2 Problem: how to reason with data deriving from different sources, each of which uses its own system of classification? 3 Solution: Ontology! 4 Examples of current needs for ontologies in biomedicine to enforce semantic consistency within a database to enable data retrieval, sharing and reuse to enable data integration (bridging across data at multiple granularities) to allow querying 5 General trend on the part of NIH, FDA and other bodies to consolidate ontology-based standards for the communication and processing of biomedical data. 6 Old approach gather terminologies in libraries Unified Medical Language System National Library of Medicine 7 UMLS SNOMED 8 New Approach MusicBeanz 9 http://www.w3.org/ 10 Semantic Web deposits Pet Profile Ontology Review Vocabulary Band Description Vocabulary Musical Baton Vocabulary MusicBrainz Metadata Vocabulary Kissology 11 http://www.w3.org/ Beer Ontology all instances of hops that have ever existed are necessarily ingredients of beer. 12 Both UMLS- and OWL-type responses involve ad hoc creation of new terminologies by each separate community, and an open-door policy for admission. Many of these terminologies remain as torsos, gather dust, poison the wells, ... 13 OWL’s syntactic regimentation is not enough to ensure high-quality ontologies – the use of a common syntax and logical machinery and the careful separating out of ontologies into namespaces does not solve the problem of ontology integration 14 from Ontological Engineering location =def. a spatial point identified by a name (p. 12) arrivalPlace =def. a journey ends at a location (p. 13) facet = def. ternary relation that holds between a frame, a slot, and the facet (p. 51) an example of function is Pays, which obtains the price of a room after applying a discount (p.
13) 15 from Handbook of Ontology On ‘achieving consistency from multiple sources’: if exact semantic identity is lacking, terms can be unified at a higher level, and information that is possibly related can be retrieved as well. When the application objective is to study and understand, the end-user can reject misleading records. (p. 94) owl:InverseFunctionalProperty defines a property that for which two different objects cannot have the same value, e.g. isTheSocialSecurityNumberOf (a social number is assigned to one person only) (p. 78) 16 UMLS The Good, the Bad, and the UGLY SNOMED 17 A methodology for quality assurance of ontologies tested thus far in the biomedical domain on: FMA GO + other OBO Ontologies FuGO SNOMED UMLS Semantic Network NCI Thesaurus ICF (International Classification of Functioning, Disability and Health) ISO Terminology Standards HL7-RIM 18 The Good Foundational Model of Anatomy (FMA) Pro clear statement of scope: structural human anatomy, at all levels of granularity, from the whole organism to the biological macromolecule Powerful treatment of definitions, from which the entire FMA hierarchy is generated – can serve as basis for formal reasoning Con Some unfortunate artifacts in the ontology deriving from its specific computer representation (Protégé) 19 it’s better manually 20 Anatomical Structure Anatomical Space Organ Cavity Subdivision Organ Cavity Organ Serous Sac Cavity Subdivision Serous Sac Cavity Serous Sac Organ Component Organ Subdivision Pleural Sac Pleural Cavity Parietal Pleura Interlobar recess Organ Part Mediastinal Pleura Pleura(Wall of Sac) Visceral Pleura Mesothelium of Pleura Tissue The Foundational Model of Anatomy Follows formal rules for ‘Aristotelian’ definitions When A is_a B, the definition of ‘A’ takes the form: an A =def. a B which ... a human being =def. an animal which is rational 22 FMA Example Cell =def.
an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus Plasma membrane =def. a cell part that surrounds the cytoplasm 23 The FMA regimentation Each definition reflects the position in the hierarchy to which a defined term belongs. The position of a term within the hierarchy enriches its own definition by incorporating automatically the definitions of all the terms above it. The entire information content of the FMA’s term hierarchy can be translated very cleanly into a computer representation 24 Principle Use Aristotelian definitions An A is a B which C’s. 25 Intermediate GALEN Pro Allows formal representation of clinical information Allows multiple views of relevant detail as needed Uses powerful Description Logic (DL)-based formal structure Makes definitions easy to formulate Con Remains only partially developed Contains errors: Vomitus contains carrot – which DLs did not prevent 26 Principle An ontology should not remain a torso 27 Principle An ontology should have a properly personed help desk 28 Principle An ontology should have procedures for updating in light of scientific advance 29 Intermediate The Gene Ontology Con Poor formal architecture Full of errors: menopause part_of death Poor support for automatic reasoning and error-checking Poor treatment of definitions Not trans-granular No relation to time or instances 30 The Gene Ontology Pro Open Source Cross-Species ... has recognized the need for reform, including explicit representation of granular levels 31 Old GO Definitions hemolysis =def.
the causes of hemolysis 32 GO now adopting structured definitions which contain both genus and differentiae Species =def Genus + Differentiae neuron cell differentiation =def differentiation by which a cell acquires features of a neuron Ontology alignment One of the current goals of GO is to align: Cell Types in GO with cone cell fate commitment Cell Types in the Cell Ontology retinal_cone_cell keratinocyte differentiation keratinocyte adipocyte differentiation fat_cell dendritic cell activation dendritic_cell lymphocyte proliferation lymphocyte T-cell homeostasis T_lymphocyte garland cell differentiation garland_cell heterocyst cell differentiation heterocyst Alignment of the two ontologies will permit the generation of consistent and complete definitions GO id: CL:0000062 name: osteoblast def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629] is_a: CL:0000055 relationship: develops_from CL:0000008 relationship: develops_from CL:0000375 + Cell type = Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix. New Definition Other Ontologies to be aligned with GO Chemical ontologies 3,4-dihydroxy-2-butanone-4-phosphate synthase activity Anatomy ontologies metanephros development 36 Principle Exploit existing ontologies when formulating definitions 37 The Bad Reactome Pro Rich catalogue of biological process Con Incoherent treatment of categories: ReferentEntity (embracing e.g. small molecules) is a sibling of PhysicalEntity (embracing complexes, molecules, ions and particles). Similarly CatalystActivity is a sibling of Event. 38 Principle An ontology should be in agreement with the truths of basic science (e.g. 
that molecules are physical entities) 39 The Ugly Disease Ontology / ICD-10 Other problems with special functions Tuberculosis of unspecified bones and joints, tubercle bacilli not found by bacteriological or histological examination, but tuberculosis confirmed by other methods (inoculation of animals) Other mineral salts, not elsewhere classified, causing adverse effects in therapeutic use Other general medical examination for administrative purposes Assault by other specified means 40 The Ugly Disease Ontology / ICD-10 Other accidental submersion or drowning in water transport accident injuring other specified person Accident to powered aircraft, other and unspecified, injuring occupant of military aircraft, any rank Other accidental submersion or drowning in water transport accident injuring occupant of other watercraft - crew 41 The Ugly Disease Ontology / ICD-10 Normal pregnancy Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered Railway accident involving collision with rolling stock and injuring pedal cyclist Injury due to war operations by lasers Nontraffic accident involving motor-driven snow vehicle injuring pedestrian 42 The Ugly Disease Ontology / ICD-10 Donors of other specified organ or tissue Fitting and adjustment of wheelchair Hot (boiling) tap water Training in use of lead dog for the blind Person consulting on behalf of another person 43 Principle An ontology should have a clearly specified domain (captured by its root node) 44 “Circular Hierarchical Relationships in the UMLS: Etiology, Diagnosis, Treatment, Complications and Prevention” Olivier Bodenreider Topographic regions: General terms Physical anatomical entity Anatomical spatial entity Anatomical surface Body regions Topographic regions 45 Principle Avoid cycles 46 MeSH National Socialism is_a Political Systems National Socialism is_a Anthropology ... 
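The "Avoid cycles" principle can be checked mechanically. The following is a minimal sketch, not a description of any UMLS tooling: it runs a depth-first search over a term-to-parents map and reports the first loop it finds. The function name `find_cycle` is hypothetical, and reading the Bodenreider chain so that each term points to the next one listed is an assumption made only for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(parents: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle (first term repeated at the end), or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}
    path: List[str] = []

    def visit(term: str) -> Optional[List[str]]:
        state[term] = GREY
        path.append(term)
        for parent in parents.get(term, []):
            seen = state.get(parent, WHITE)
            if seen == GREY:
                # The parent is already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if seen == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        state[term] = BLACK
        return None

    for term in list(parents):
        if state.get(term, WHITE) == WHITE:
            cycle = visit(term)
            if cycle:
                return cycle
    return None


# Chain as listed on the slide; the edge direction (each term points to the
# next one listed) is an assumption.
chain = [
    "Topographic regions",
    "General terms",
    "Physical anatomical entity",
    "Anatomical spatial entity",
    "Anatomical surface",
    "Body regions",
    "Topographic regions",
]
umls_fragment = {a: [b] for a, b in zip(chain, chain[1:])}
cycle = find_cycle(umls_fragment)
print(" -> ".join(cycle) if cycle else "no cycle")
```

A check of this kind only finds loops in the asserted hierarchy; it says nothing about whether the individual is_a links are true.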
47 Principle Use singular nouns 48 MeSH National Socialism is_a MeSH Descriptor 49 Plant Ontology cell = def. structural and physiological unit of a living organism; it (i.e., plant cell) consists of protoplast and cell wall; ... 50 Principle For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings (Don’t use ‘cell’ when you mean ‘plant cell’) 51 ICNP: International Classification of Nursing Procedures water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings. 52 MORE UGLY National Cancer Institute Thesaurus (NCIT) 53 The NCIT reflects a recognition of the need for high quality shared ontologies and terminologies the use of which by clinical researchers in large communities can ensure re-usability of data collected by different research groups 54 NCIT “a biomedical vocabulary that provides consistent, unambiguous codes and definitions for concepts used in cancer research” “exhibits ontology-like properties in its construction and use”. 55 Goals to make use of current terminology “best practices” to relate relevant concepts to one another in a formal structure, so that computers as well as humans can use the Thesaurus for a variety of purposes, including the support of automatic reasoning; to speed the introduction of new concepts and new relationships in response to the emerging needs of basic researchers, clinical trials, information services and other users. 56 Formal Definitions of 37,261 nodes, 33,720 were stipulated to be primitive in the DL sense Thus only a small portion of the NCIT ontology can be used for purposes of automatic classification and error-checking by using OWL. 
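How small that "small portion" is can be read off the two figures on the slide. The sketch below is simple arithmetic; it assumes that every node not stipulated as primitive carries a full formal definition, which the slide does not state explicitly.

```python
# Figures as quoted on the slide.
total_nodes = 37_261
primitive_nodes = 33_720

# Assumption: all non-primitive nodes are fully defined in the DL sense.
defined_nodes = total_nodes - primitive_nodes
defined_share = defined_nodes / total_nodes

print(defined_nodes, f"{defined_share:.1%}")  # 3541 9.5%
```

On this reading, fewer than one node in ten can take part in automatic classification.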
57 Principle Supply definitions wherever possible (both human-understandable natural language definitions, and equivalent formal definitions) 58 Verbal Definitions About half the NCIT terms are assigned verbal definitions Unfortunately some are assigned more than one 59 Disease Progression Definition1 Cancer that continues to grow or spread. Definition2 Increase in the size of a tumor or spread of cancer in the body. Definition3 The worsening of a disease over time. This concept is most often used for chronic and incurable diseases where the stage of the disease is an important determinant of therapy and prognosis. 60 Principle Each term should have at most one definition* *which may have both natural-language and formal versions 61 To make matters worse Disease Progression has as subclass: Cancer Progression Definition: The worsening of a cancer over time. This concept is most often used for incurable cancers where the stage of the cancer is an important determinant of therapy and prognosis. 62 Cancer a process (of getting better or worse) an object (which can grow and spread) 63 Principle Distinguish continuant entities (molecule, cell, tumor, organism) from occurrent entities (processes of growth, change, ...) 64 Two kinds of entities occurrents (processes, events, happenings) cell division, ovulation, death continuants (objects, qualities, ...) cell, ovum, organism, temperature of organism, ... 65 NCIT confuses definitions with descriptions Tuberculosis Definition A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. Tuberculosis (TB) may affect almost any tissue or organ of the body with the lungs being the most common site of infection. The clinical stages of TB are primary or initial infection, latent or dormant infection, and recrudescent or adult-type TB. Ninety to 95% of primary TB infections may go unrecognized. Histopathologically, tissue lesions consist of granulomas which usually undergo central caseation necrosis. 
Local symptoms of TB vary according to the part affected; acute symptoms include hectic fever, sweats, and emaciation; serious complications include granulomatous erosion of pulmonary bronchi associated with hemoptysis. If untreated, progressive TB may be associated with a high degree of mortality. This infection is frequently observed in immunocompromised individuals with AIDS or a history of illicit IV drug use. 67 A better definition Tuberculosis Definition: A chronic, recurrent infection caused by the bacterium Mycobacterium tuberculosis. IS THIS CORRECT? (An infection is not a disease) 68 the use-mention confusion Conceptual Entities =Def. An organizational header for concepts representing mostly abstract entities.
Confuses use and mention (swimming is healthy and has eight letters) 69 Principle Don’t confuse an entity with the name of an entity 70 Duratec, Lactobutyrin, Stilbene Aldehyde are classified by the NCIT as Unclassified Drugs and Chemicals 71 Problematic synonyms Anatomic Structure, System, or Substance ~ Anatomic Structures and Systems Does ‘anatomic’ apply only to structure or also to system and substance? Biological Function ~ Biological Process some biological processes are the exercises of biological functions others (e.g. pathological processes, side effects) not Genetic Abnormality ~ Molecular Abnormality (with subtype: Molecular Genetic Abnormality) (definitions not supplied) 72 Three disjoint classes of plants Vascular Plant Non-vascular Plant Other Plant 73 Three kinds of cells Abnormal Cell is a top-level class (thus not subsumed by Cell) Normal Cell is a subclass of Microanatomy. Cell is a subclass of Other Anatomic Concept (so that cells themselves are concepts) 74 NCIT as now constituted will block automatic reasoning Neither Normal Cells nor Abnormal Cells are Cells within the context of the NCIT 75 Some consolations NCIT is open source NCIT has broad coverage NCIT has some formal structure (OWL-DL) NCIT is much, much better than (for example) the HL7-RIM NCIT has realized the errors of its ways 76 What might have been http://www.cbdnet.com/index.php/search/show/938464 = “Review of NCI Thesaurus and Development of Plan to Achieve OBO Compliance” 77 Welcome to the Pre-NCIT: http://nciterms.nci.nih.gov/NCIBrowser/Dictionary.do Fragment of Pre-NCIT Hierarchy Murine Tissue Type Body Fluids and Substances (MMHCC) Cardiovascular System (MMHCC) Blood Vessel (MMHCC) Heart (MMHCC) Digestive System (MMHCC) 78 More UGLY 79 MeSH MeSH Descriptors Index Medicus Descriptor Anthropology, Education, Sociology and Social Phenomena (MeSH Category) Social Sciences Political Systems National Socialism National Socialism is_a Political Systems National Socialism is_a
Anthropology ... 80 MeSH National Socialism is_a MeSH Descriptors The Bodenreider Defence: MeSH is not an ontology 81 BIRNLex 82 BIRNLex The eye =def. The eyeball and its constituent parts, e.g. retina mouse =def. common name for the species mus musculus 83 BIRNLex 84 BIRNLex 85 Principle Avoid circular definitions (The term defined should not appear in its own definition) 86 The UMLS Semantic Network 87 More Ugly UMLS Semantic Network Pros Broad coverage; no multiple inheritance Cons Incoherent use of ‘conceptual entities’ (e.g. the digestive system as a conceptual part of the organism) Full of errors 88 UMLS Semantic Network Edges in the graph represent merely “possible significant (= some-some) relations”: Bacterium causes Experimental Model of Disease Experimental Model of Disease affects Fungus Experimental model of disease is_a Pathologic Function 89 UMLS Semantic Network Unclear what the nodes of the graph are: Drug Delivery Device contains Clinical Drug Drug Delivery Device narrower_in_meaning_than Manufactured Object The use-mention confusion: “Swimming is healthy and has 8 letters” 91 a pudding of ‘concepts’ 92 location_of Fungus location_of Vitamin Tissue location_of Mental or Behavioral Dysfunction 93 Fungus location_of Vitamin Every instance of vitamin is located in some fungus? Some instances of vitamin are located in some fungi? Some instances of fungi have instances of vitamin located in them? Every instance of vitamin is located in every instance of fungus? 94 what are the nodes in this graph?
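The candidate readings of "Fungus location_of Vitamin" listed on slide 93 come apart as soon as instance-level data are available. The sketch below is illustrative only: the instance identifiers and the `located_in` pairs are invented toy data, and the three helper functions are hypothetical names for the all-some, some-some and all-all readings. The second and third questions on the slide are logically equivalent, so both fall under the some-some reading.

```python
from typing import Set, Tuple

Pairs = Set[Tuple[str, str]]


def all_some(xs: Set[str], ys: Set[str], rel: Pairs) -> bool:
    """Every x stands in rel to at least one y."""
    return all(any((x, y) in rel for y in ys) for x in xs)


def some_some(xs: Set[str], ys: Set[str], rel: Pairs) -> bool:
    """At least one x stands in rel to at least one y."""
    return any((x, y) in rel for x in xs for y in ys)


def all_all(xs: Set[str], ys: Set[str], rel: Pairs) -> bool:
    """Every x stands in rel to every y."""
    return all((x, y) in rel for x in xs for y in ys)


# Invented toy instances: two vitamin portions, two fungi, and one
# located_in fact (vitamin instance, fungus instance).
vitamins = {"vitamin_1", "vitamin_2"}
fungi = {"fungus_1", "fungus_2"}
located_in: Pairs = {("vitamin_1", "fungus_1")}

readings = {
    "every vitamin is located in some fungus": all_some(vitamins, fungi, located_in),
    "some vitamin is located in some fungus": some_some(vitamins, fungi, located_in),
    "every vitamin is located in every fungus": all_all(vitamins, fungi, located_in),
}
for reading, holds in readings.items():
    print(f"{reading}: {holds}")
```

With this toy data only the some-some reading holds, which is the weak reading the Semantic Network's edges are said to express; an edge that licenses no inference about arbitrary instances gives a reasoner very little to work with.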
97 NCIT inherits this ontological and terminological incoherence from source vocabularies in UMLS Conceptual Entities =def An organizational header for concepts representing mostly abstract entities. Includes as subtypes: action, change, color, death, event, fluid, injection, temperature 98 The UMLS Unified Medical Language System Metathesaurus Semantic Network (SN) 99 BIRNLex and UMLS-SN Rest =SN Daily or Recreational Activity Principal Investigator =SN Professional or Occupational Group Left handedness =SN Organism Attribute Ambidextrous =SN Finding Brain Imaging =SN Diagnostic Procedure Brain Mapping =SN Diagnostic Procedure & Research Activity Healthy Adult =SN Finding 100 To build a high quality shared ontology requires hard work and staying power You cannot cheat by borrowing from UMLS UMLS (= the UMLS Metathesaurus) is not an ontology 101 is_a (sensu UMLS) A is_a B =def ‘A’ is narrower in meaning than ‘B’ grows out of the heritage of dictionaries, which reflect meanings, not biological reality 102 Concepts, Concept Names, and their Identifiers in the UMLS The Metathesaurus is organized by concept. One of its primary purposes is to connect different names for the same concept from many different vocabularies. 103 The desperate search for ‘mappings’ A concept is a meaning. A meaning can have many different names. A key goal of Metathesaurus construction is to understand the intended meaning of each name in each source vocabulary and to link all the names from all of the source vocabularies that mean the same thing (the synonyms). 104 The desperate search for ‘mappings’ This is not an exact science. ... Metathesaurus editors decide what view of synonymy to represent in the Metathesaurus concept structure.
Please note that each source vocabulary’s view of synonymy is also present in the Metathesaurus, irrespective of whether it agrees or disagrees with the Metathesaurus view. 105 These strange mappings between names as they appear in different source vocabularies created for widely different purposes can still be very useful but the source vocabularies themselves are of variable quality (not all mappings are created equal) and the sorts of search which the UMLS supports reflect an already outmoded technology 106 is_a (sensu UMLS) congenital absent nipple is_a nipple surgical procedure not carried out because of patient’s decision is_a surgical procedure cancer documentation is_a cancer disease prevention is_a disease living subject is_a information object representing an animal or complex organism individual allele is_a act of observation limb is_a tissue 107 is_a (sensu UMLS) both testes is_a testis plant leaves is_a plant smoking is_a individual behavior walking is_a social behavior 108 Advantages of the methodology of shared coherently defined ontologies once the interoperable gold standard reference ontologies are there, it will make sense to reformulate parts of existing incompatible terminologies (e.g. in UMLS) in terms of the standard ontologies in order to achieve greater domain coverage and alignment of different but veridical views. Thus not everything that was done in the past turns out to be a waste. 109 is_a (sensu UMLS) A is_a B =def ‘A’ is narrower in meaning than ‘B’ grows out of the heritage of dictionaries (which ignore the basic distinction between universals and instances) 110 The really ugly 111 112 HL7 Marketing HL7 V3 claims to be: “The foundation of healthcare interoperability” “The data standard for biomedical informatics” from blood banks to Electronic Health Records to clinical genomics 113 HL7 Incredibly Successful adopted by Oracle as basis for its Electronic Health Record technology; supported by IBM, GE, Sun ...
embraced as US federal standard central part of $35 billion program to integrate all UK hospital information systems 114 Problem V3 of HL7 is designed to address in HL7 V2 the realization of the messaging task allows ad hoc interpretations of the standard by each sending or receiving institution. Result: vendor products never properly interoperable, and always require mapping software. 115 The solution to this problem (V3) is the HL7 RIM or Reference Information Model = a world standard for exchange of information between clinical information systems 116 The V3 solution Remove optionality by having the RIM serve as a master model of all health information, from blood banks to Electronic Health Records to clinical genomics 117 The hype “HL7 V3 is the standard of choice for countries and their initiatives to create national EHR and EHR data exchange standards as it provides a level of semantic interoperability unavailable with previous versions and other standards. Significant V3 national implementations exist in many countries, e.g. in the UK (e.g. the English NHS), the Netherlands, Canada, Mexico, Germany and Croatia.” 118 The reality (I asked them) “None of the implementations have a national scope” (e.g. Stockholm City Council) The paradigm Dutch national HL7 V3 EHR implementation uses HL7 technology exclusively for exchanging data (i.e. messaging). The EHR architectures themselves are HL7-free. 119 The Oracle Healthcare Transaction Base (HTB) Oracle itself refers (April 2006) to three implementations of HTB described as being 'live for EHR projects': 1) Byrraju Foundation (BSRF) in India (Live) 2) Stockholm County (planned to go live by May 2006) 3) Louisiana (planned to go live by May 2006) 120 Regarding the Byrraju case, I am told that there is no V3 application running in India today and that the Byrraju Foundation is presently not using any telemedicine application that utilizes HL7. As to the Stockholm case, the HTB was purchased and deployed in late 2004. 
An attempt to port a pilot system was made during the spring of 2005. This attempt was abandoned, as I understand from my Swedish colleagues, partly because of poor performance (the new application performed significantly less well than the system it was designed to replace, even though it was being run on considerably more expensive hardware), and partly because of a lack of fault tolerance, which made it inadequate as a mechanism for integrating legacy systems marked by a high degree of variation in data quality. During the spring of 2006, it seems, an attempt will be made to construct a new pilot application, this time with the more modest goal of handling referrals. 121 The hype The RIM is “credible, clear, comprehensive, concise, and consistent” It is “universally applicable” and “extremely stable” 122 The reality • HL7 V3 documentation is 542,458 KB, divided into 7,573 files • It remains subject to frequent revisions • It is very difficult to understand 123 The reality The decision to adopt the RIM was made already in 1996, yet the promised benefits of interoperability still, after 10 years, remain elusive. HL7 has bet the farm on the RIM – technology has advanced in these 10 years 124 RIM NORMATIVE CONTENT 125 to design a message, choose from here 126 Too many combinations as the traffic on HL7’s own vocabulary mailing list reveals, there is no adequate mechanism for ensuring that the vast number of combinations of coded terms within actual messages can be controlled in such a way that messages will be understood in the same way by designers, senders and receivers. 127 128 These pre-defined attributes code, class_code, mood_code, status_code, etc. yield a combinatorial explosion: class_code (61 values) x mood_code (13 values) x code (estimate 200) x status_code (10 codes) = 1.58 million combinations. Adding in the other codes this becomes 810 billion. 129 Why does the RIM embody so many combinations?
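The count on slide 128 can be reproduced directly from the four value counts it quotes. This is plain arithmetic rather than anything drawn from HL7 tooling. The slide does not list the "other codes", so the last line only derives the multiplier they would have to contribute for the total to reach 810 billion; that factor is an inference, not a figure from the source.

```python
from math import prod

# Value counts exactly as quoted on the slide; the 200 for "code" is the
# slide's own estimate.
attribute_values = {
    "class_code": 61,
    "mood_code": 13,
    "code": 200,
    "status_code": 10,
}

combinations = prod(attribute_values.values())
print(f"{combinations:,}")  # 1,586,000, the slide's "1.58 million"

# Inferred, not stated in the source: the factor the unnamed other codes
# would need to supply to reach the quoted 810 billion.
implied_factor = 810_000_000_000 / combinations
print(f"{implied_factor:,.0f}")
```

Each additional coded attribute multiplies the space again, which is why the total grows so quickly.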
To ensure in advance that everything can be said in conformity to the standard 130 The RIM methodology defines a set of ‘normative’ classes (Act, Role, and so on), with which are associated a rich stock of attributes from which one must make a selection when applying the RIM to each new domain (pharmacy, clinical genomics ...), Compare: attempting to create manufacturing software by drawing from a store containing preestablished parts (so that the store would need to have the bits needed for making every conceivable manufacturable thing, be it a lawnmower, a refrigerator, a hunting bow, and so on). 131 The RIM methodology are there examples where a methodology of this sort has been made to work? Does the RIM yield a coherent basis for constructing well-designed software artifacts for functions like the EHR or computerized decision support? 132 This methodology does not impede the formation of local dialects Different teams produce different message designs for the very same topic. In the UK, the £ 35 bn. NHS National Program “Connecting for Health” has applied the RIM rigorously, using all the normative elements, and it discovered that it needed to create dialects of its own to make the V3-based system work for its purposes (it still does not work) 133 The RIM documentation • is subject to multiple and systematic internal inconsistencies and unclarities: • is marked by sloppy and unexplained use of terms such as ‘act’, ‘Act’, ‘Acts’, ‘action’, ‘ActClass’ ‘Act-instance’, ‘Act-object’ • and uncertain cross-referencing to other HL7 documents • no publicly available teaching materials (no HL7 for Dummies) 134 from HL7 email forum (do not circulate) “I am ... frightened when I contemplate the number of potential V3ers who ... simply are turned away by the difficulty of accessing the product. “Some of them attend V3 tutorials which explain V3 as the hugely complex process of creating a message and are turned off. 
[They] simply do not have the stamina, patience, endurance, time, or brain-cells to understand enough for them to feel comfortable contributing to debates / listserves, etc., so they remain silent.” 135 Problems of scope Only two main classes in the RIM Act = roughly: intentional action Entity = persons, places, organizations, material How can the RIM deal transparently with information about, say, disease processes, drug interactions, wounds, accidents, bodily organs, documents? 136 Diseases in the RIM ... are not Acts ... are not Entities ... are not Roles, Participations ... So what are they? At best: a case of pneumonia is identified as the Act of Observation of a case of pneumonia Note: RIM’s treatment of SNOMED codes 137 HL7 Clinical Document Architecture defines a document as an Act HL7’s Clinical Genomics Standard Specifications defines an individual allele as an Act of Observation 138 Why the centrality of ‘Act’ because of HL7’s roots in US hospital messaging – and thus in US hospital billing: intentional actions are what can be billed 139 Mayo RIM discussion of the meaning of ‘Act’ as “intentional action” Is a snake bite or bee sting an "intentional action"? Is a knife stabbing an intentional action? Is a car accident an intentional action? When a child swallows the contents of a bottle of poison is that an intentional action? 140 The RIM has no coherent criteria for deciding For this reason, too, dialects are formed – and the RIM does not do its job. One health information system might conceive snakebites and gunshots as Procedures. Another might classify them with diseases, and so treat them as Observations. If basic categories cannot be agreed upon for common phenomena like snakebites, then the RIM is in serious trouble. 141 Are definitions like this a good basis for achieving semantic interoperability in the biomedical domain?: LivingSubject Definition: A subtype of Entity representing an organism or complex animal, alive or not. 
142 Person (from HL7 Glossary) Definition: A Living Subject representing single human being [sic] who is uniquely identifiable through one or more legal documents 143 The Problem of Circularity A Person =def. A person with documents ‘An A is an A which is B’ – useless in practical terms, since neither we nor the machine can use it to find out what ‘A’ means – incorporates a vicious infinite regress – has the effect of making it impossible to refer to A’s which are not Bs, for example to undocumented persons 144 Katrina 145 Katrina 146 What is the RIM about? blood pressure measurement = an information item blood pressure = something in reality which exists independently of any recording of information, and which the measurement measures Q: Is the RIM about information, or about the reality to which such information relates? A: There is no difference between the two 147 RIM Philosophy “The truth about the real world is constructed through a combination and arbitration of attributed statements ... “As such, there is no distinction between an activity and its documentation.” 148 The RIM as an Information Model ‘a static (UML) model of health and health care information’ The scope of the RIM’s class hierarchy consists in packets of information: the information content of invoices, statements of observations, lab reports, … 149 A good, general constraint on a theory of meaning For each linguistic expression ‘E’ ‘E’ means E ‘snow’ means snow ‘pneumonia’ means pneumonia 150 From the perspective of the RIM on the Information Model conception ‘medication’ does not mean: medication rather it means: the record of medication in an information system ‘stopping a medication’ does not mean: stopping a medication rather it means: change of state in the record of a Substance Administration Act from Active to Aborted 151 The RIM’s Entity class persons, places, organizations, material 152 States of Entity • active: The state representing the fact that the Entity is currently active. 
• nullified: The state representing the termination of an Entity instance that was created in error. • inactive: The state representing the fact that an entity can no longer be an active participant in events. • normal: The “typical” state. Excludes “nullified”, which represents the termination state of an Entity instance that was created in error 153 Persons are Entities What do ‘active’ and ‘nullified’ mean as applied to Person? Is there a special kind of death-through-nullification in the case of those instances of Person who were created in error? 154 HL7 Glossary Definition of Animal: A subtype of Living Subject representing any animal-of-interest to the Personnel Management domain. An Animal is not an animal. Rather (an) Animal represents an animal: it is an information item which represents a certain highly specific kind of animal-of-interest, namely an animal that is of interest to the Personnel Management domain. 155 Double Standards The RIM is a confusion of two separate artifacts: 1. an “information model”, relating to names of persons, records of observations, social security numbers, etc. 2. a reference ontology, relating to persons, observations, documents, acts, etc. 156 The examples provided to illustrate the RIM’s classes are almost always in conformity with the Reference Ontology Conception of the RIM. They involve the familiar kinds of things and processes in reality (medication, patients, devices, paper documents, surgery, diet, supply of bedding) with which healthcare messages are concerned. 157 HL7 Glossary: Instances of Person include: John Smith, RN, Mary Jones, MD, etc. not: information about John Smith ...
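The contrast drawn on these slides, between a person and an information item about that person, can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class names `Person`, `PersonRecord` and `RecordState`, and the `is_about` field, are hypothetical and are not taken from HL7 or the RIM. Only the state names echo the RIM's ‘States of Entity’. The point is that lifecycle states such as ‘nullified’ make sense when attached to the record, not to the human being.

```python
from dataclasses import dataclass
from enum import Enum


class RecordState(Enum):
    """Lifecycle states of a record (hypothetical enum; names echo the RIM's states)."""
    ACTIVE = "active"
    INACTIVE = "inactive"
    NULLIFIED = "nullified"


@dataclass(frozen=True)
class Person:
    """A human being in reality. Deliberately carries no record-lifecycle state."""
    name: str


@dataclass
class PersonRecord:
    """An information item that is about a Person (hypothetical class)."""
    is_about: Person
    state: RecordState = RecordState.ACTIVE

    def nullify(self) -> None:
        """Withdraw a record that was created in error; the person is unaffected."""
        self.state = RecordState.NULLIFIED


john = Person("John Smith")
record = PersonRecord(is_about=john)
record.nullify()
```

After `record.nullify()`, only the record changes state. `john` is the same value as before, and there is nothing on `Person` that could be ‘nullified’.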
158 Some of the RIM’s definitions are in conformity with the Information Model Conception 159 HL7’s backbone ‘Act’ class Definition of Act: A record of something that is being done, has been done, can be done, or is intended or requested to be done An Act is the record of an Act “There is no difference between an activity and its documentation” 160 Acts are records: but the examples of Act given by the RIM are as follows: “The kinds of acts that are common in health care are (1) a clinical observation, (2) an assessment of health condition (such as problems and diagnoses), (3) healthcare goals, (4) treatment services (such as medication, surgery, physical and psychological therapy), ... 161 The class Procedure (a subclass of Act) Definition of Procedure: An Act whose immediate and primary outcome (postcondition) is the alteration of the physical condition of the subject Examples: chiropractic treatment, acupuncture, straightening rivers, draining swamps. 162 What is an information model ? Is it a model of entities in reality (an ontology)? Or of information about entities in reality (an ontology)? The RIM is an incoherent mixture of the two Does this matter? 163 What’s gone wrong? 
People of good will are making mistakes because of insufficient concern for clarity and consistency. Even large ontologies are built in the spirit of the amateur hobbyist. Money is wasted on megasystems that cannot be used. 164 Lessons for Semantic Interoperability Clear and easily accessible documentation – based on an intuitive ontology (understandable to all classes of users). Business model should be such that those responsible for creating documentation do not have an incentive for it to be unclear. Centralized control of documentation, to ensure consistency (too much democracy is a bad thing). 165 Lessons for Standards for Semantic Interoperability Create standards on the basis of thorough pilot testing. (Avoid systems like the RIM, which is imposed from the top down, on a wing and a prayer.) 166 What should take the place of the RIM? 1. A Reference Ontology of the types of biomedical entity such as thing, process, person, disease, infection, molecule, procedure, etc. 2. A Reference Ontology of the types of biomedical information entity such as message, document, record, image, diagnosis, interpretation, etc. The first provides a high-level framework in terms of which the lower-level types captured in vocabularies like SNOMED CT could be coherently organized. The second helps to specify how information can be combined into meaningful units and used for further processing. 167