International Standard Bad Philosophy
Barry Smith
http://ontologist.com

[Figure: levels of biological organization, from organism and organ (10^-1 m) through tissue and cell (10^-5 m) to organelle, protein and DNA (10^-9 m)]

A new golden age of classification
30,000 genes in human
200,000 proteins
100s of cell types
100,000s of disease types
1,000,000s of biochemical pathways (including disease pathways)
…
legacy of Human Genome Project
“annotations”
controlled vocabularies

How to overcome incompatibilities between different scientific index terms?
immunology
genetics
cell biology

Open Biological Ontologies Consortium
Gene Ontology
Cell Ontology
Sequence Ontology
Mouse Anatomy Ontology
etc.
http://obo.sourceforge.net/

Unified Medical Language System (UMLS)
UMLS Metathesaurus:
1 million biomedical concepts
2.8 million concept names from more than 100 controlled vocabularies and classifications
built by US National Library of Medicine

UMLS
a compendium of source vocabularies including:
SNOMED (Systematized Nomenclature of Medicine)
ICD (International Classification of Diseases)
MeSH (Medical Subject Headings)
Foundational Model of Anatomy
LOINC (Logical Observation Identifiers Names and Codes)

To reap the benefits of standardization we need to make ONE SYSTEM out of many different terminologies = UMLS

“Semantic Network”
nearest thing to an “ontology” in the UMLS

UMLS SN
134 Semantic Types
54 types of edges (relations)
yielding a graph containing more than 6,000 edges

Fragment of UMLS SN

Axioms

UMLS Semantic Network
[Figure labels: entity, is_a, physical object, organism, conceptual entity]

[Figure labels: Fruit similarTo Vegetable; NarrowerTerm Orange; synonymWith Apfelsine]
Graph with labelled edges (similarTo, Narrower, synonymWith)
Fixed set of edge labels (a.k.a. relations)
Goble & Shadbolt

UMLS SN
is_a =def. If one item ‘is_a’ another item then the first item is more specific in meaning than the second item. (Italics added)
fish is_a vertebrate
copulation is_a biological process
both testes is_a testis
plant parts is_a plant

Fragment of UMLS SN
What are the nodes in this graph?
Almost all nodes are linked to other nodes by a multiplicity of different types of edges

Compare:
swimming is healthy
swimming has 8 letters

Semantic Network Definition:
Concept =def. An abstract concept, such as a social, religious, or philosophical concept

How can concepts figure as relata of these relations?
part_of =def. Composes, with one or more other physical units, some larger whole
causes =def. Brings about a condition or an effect.
contains =def. Holds or is the receptacle for fluids or other substances.

How can a concept serve as a receptacle for fluids or other substances?
How can concepts stand in relations such as affects or causes?

connected_to =def. Directly attached to another physical unit as tendons are connected to muscles.
How can a concept be directly attached to another physical unit?

Fragment of UMLS SN
Experimental Model of Disease affects Fungus
Bacterium causes Experimental Model of Disease
Biomedical or Dental Material causes Mental or Behavioral Dysfunction
Manufactured Object causes Disease or Syndrome

part of the UMLS Semantic Network

UMLS Semantic Network
entity
  physical object
  event
  conceptual entity

conceptual entity
  Organism Attribute
  Finding
  Idea or Concept
  Occupation or Discipline
  Organization
  Group
  Group Attribute
  Intellectual Product
  Language

Conceptual Entity
  Idea or Concept
    Functional Concept
    Qualitative Concept
    Quantitative Concept
    Spatial Concept
      Body Location or Region
      Body Space or Junction
      Geographic Area
      Molecular Sequence
        Amino Acid Sequence
        Carbohydrate Sequence
        Nucleotide Sequence

Tonawanda
Tonawanda is an Idea or Concept

gene part_of cell component
body system conceptual_part_of fully formed anatomical structure

conceptual entity
  idea or concept
    functional concept
      body system

But: Gene or Genome is defined as: “A specific sequence … of nucleotides along a molecule of DNA or RNA …”
and nucleotide sequence is_a conceptual entity

entity
  physical object
  conceptual entity
    idea or concept
      functional concept
        body system
confusion of entity and concept

Functional Concept:
Body system is_a Functional Concept.
but: Concepts do not perform functions or have physical parts.
This: [image] is not a concept

Problem: Confusion of Is_A and Has_Role
Physical Entity
  Chemical Entity
    Chemical Viewed Structurally
    Chemical Viewed Functionally

Chemical
  Chemical Viewed Structurally
    Inorganic Chemical
    Organic Chemical
  Chemical Viewed Functionally
    Enzyme
    Biomedical or Dental Material

Chemical Viewed Structurally vs. Chemical Viewed Functionally
reflects a distinction between types of classification – not between types of entity
compare a classification of people into: tall people, people who play tennis, people who look like flies from a distance etc.

The Hydraulic Equation
BP = CO*PVR
arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR)

Confusion of Ontology and Epistemology
blood pressure is an Organism Function,
cardiac output is a Laboratory or Test Result or Diagnostic Procedure
BP = CO*PVR thus asserts that blood pressure is proportional either to a laboratory or test result or to a diagnostic procedure

Disease History is classified by UMLS under Health Care Activity
This runs together the history or course of a disease on the side of the patient (ontology) with the act of eliciting that history (epistemology).
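
The questions raised above about how a concept could be part of, contain, or be attached to anything can be made concrete with a small consistency check. What follows is only a minimal sketch under stated assumptions: the `IS_A` table is a toy fragment that simplifies the hierarchy shown in the slides, the function names are hypothetical, and the two assertions checked at the end are invented for illustration rather than taken from the UMLS.

```python
# Toy fragment of the is_a hierarchy described in the slides (child -> parent).
# Assumption: "fully formed anatomical structure" is placed directly under
# "physical object" to keep the example short.
IS_A = {
    "physical object": "entity",
    "conceptual entity": "entity",
    "idea or concept": "conceptual entity",
    "functional concept": "idea or concept",
    "body system": "functional concept",
    "spatial concept": "idea or concept",
    "molecular sequence": "spatial concept",
    "nucleotide sequence": "molecular sequence",
    "fully formed anatomical structure": "physical object",
}

# Relations whose quoted definitions speak of physical units or substances.
PHYSICAL_RELATIONS = {"part_of", "contains", "connected_to"}


def ancestors(node):
    """Walk the is_a chain upwards and return every ancestor of node."""
    found = []
    while node in IS_A:
        node = IS_A[node]
        found.append(node)
    return found


def is_physical(node):
    """True if node is, or descends from, 'physical object'."""
    return node == "physical object" or "physical object" in ancestors(node)


def misplaced_relata(subject, relation, obj):
    """Return the relata that are not physical although the relation requires it."""
    if relation not in PHYSICAL_RELATIONS:
        return []
    return [n for n in (subject, obj) if not is_physical(n)]


if __name__ == "__main__":
    # Hypothetical assertions, chosen only to exercise the check.
    print(misplaced_relata("body system", "part_of",
                           "fully formed anatomical structure"))
    print(misplaced_relata("nucleotide sequence", "contains", "physical object"))
```

In this toy hierarchy both body system and nucleotide sequence descend from conceptual entity, so any assertion that makes them relata of a physically defined relation is flagged. This is the kind of error the slides attribute to confusing entities with concepts.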
Further Principles
univocity: terms should have the same meanings (and thus point to the same referents) on every occasion of use
UMLS-SN:
‘organization’ = body plan
‘organization’ = social organization

rules for definitions
intelligibility: the terms used in a definition should be simpler (more intelligible) than the term to be defined
otherwise the definition provides no assistance to the understanding (for humans) or is unprocessable (for machines)

UMLS-SN Semantic Relations
Semantic Relation: functionally_related_to
TUI: T139
Definition: Related by the carrying out of some function or activity.
Inverse: functionally_related_to

An unintuitive top-level with unintuitive (or no) rules for classification and definition leads to
coding errors
difficulties in training of curators
obstacles to alignment with other ontology and terminology systems
obstacles to harvesting content in automatic reasoning systems

The UMLS Semantic Network is ‘an upper-level ontology … in which all concepts are given a consistent and semantically coherent representation’.
Alexa McCray, “An upper level ontology for the biomedical domain”. Comp Functional Genomics 2003; 4: 80-84.

CEN/TC251
ENV 12264: This ENV is applicable to the description of the categorial structure of systems of concepts supporting computer-based terminological systems, including coding systems, for health-care.
– concept: “unit of thought constituted through abstraction on the basis of properties common to a set of one or more referents”
BUT THEY NEVER IN FACT LOOK AT THE REFERENTS AT ALL!

ISO/TC215/N142: Health informatics – Vocabulary of terminology
– The purpose of this International Standard is to define a set of basic concepts required to describe and discuss formal representation of concepts and characteristics, for use especially in formal computer based concept representation systems.
– concept: “unit of knowledge created by a unique combination of characteristics”
THEY ARE ALREADY TWO LEVELS REMOVED FROM THE REFERENT!

CEN/TC 251
Europe-wide acceptance of the need for a comprehensive, communicable and secure pan-European Electronic Health Record as a prerequisite for high-quality healthcare.

A problem for terminology integration
EHRs across Europe need to use equivalent terms for equivalent disorders.
standardized clinical terminologies now exist in an abundance of different flavors.
the Unified Medical Language System (UMLS) contains over 100 systems, with in all some 3 million medical “concepts”
we need international standards for terminologies
responsibility of ISO Technical Committee (TC) 37

ISO TC 37
founded in 1952 by Eugen Wüster (1898-1977)
businessman, saw-manufacturer, and professor of woodworking machinery in the Vienna Agricultural College
fan of the Vienna Circle unified science movement
devotee of Esperanto

Wüster
chaired TC 37 for the first 20 years of its existence
was principal author of the documents which have served as the basis for work in terminology standardization ever since.
astonishing influence due to normative character of ISO definitions

1935

Wüster’s theory of concept acquisition
All knowledge of concepts starts out from sensory experience:
The new-born infant finds itself “constantly amidst a panoply of diverse sensory impressions”.
Soon, it begins to “analyse” this sensory mosaic and forms the opinion that the perceived impressions start from objective constructs which partly belong to its own body and partly are separated from it.

The child begins thereupon to mentally subdivide the sensory mosaic into individual objects
(and Wüster stresses repeatedly in this connection that objects in reality are constructed by human beings, and that there is a high degree of arbitrariness and variability to such construction)

Initially the child deals only with “individual objects”
Every object “is for the child something unique, like a particular person.”
But the child can also remember objects, and a memory that is not associated with sensory impressions “constitutes a ‘concept’”.
The concept of an individual is an ‘individual concept’.
Examples are: “‘Napoleon’ or the concept of my fountain pen.”
Concepts originate in memory.

Then the child notices that there are individual objects which are “interchangeably alike”
e.g. apples or bricks or cans of paint – objects which are also given the same name by older speakers of the language.
Note that the thesis according to which concepts are acquired via perception of similars has long since been abandoned by cognitive scientists [6].

Here general concepts enter the scene
“The child learns to blend the individual concepts of such objects in its thinking” and thus arrives at general concepts, which are, like individual concepts, “thought (=mental) objects. They exist only in the heads of people.”

Communication
If “a speaker wishes to draw the attention of an interlocutor to a particular individual object, which is visible to both parties or which he carries with him, he only has to point to it”. ...
Gulliver, The Academicians of Lagado
Recall: the Academicians of Lagado held that
since Words are only Names for Things, it would be more convenient for all Men to carry about them, such Things as were necessary to express the particular Business they are to discourse on …
which hath only this Inconvenience attending it, that if a Man’s Business be very great, and of various kinds, he must be obliged in Proportion to carry a greater bundle of Things upon his Back

Otherwise, “the only thing available is the individual concept of the object, provided that it is readily accessible in the heads of both persons.”
(Those engaged in communication about, say, Napoleon, are thus somehow required to gain access to the interiors of each other’s heads.)

individual concepts can be grouped together into general concepts
“Several individual apples, for example, provide together the general concept ‘apple’. [This,] together with the concepts ‘pear’, ‘plum’ etc. then yield the superordinate concept ‘fruit’.”
The formation of concepts at this level, too, is “highly dependent on human discretion.”
so general concepts can be grouped together into concepts of higher degrees of abstraction.

Concepts and their extensions
Wüster hereby runs together individual object and individual concept, as is manifested in his notion of the extension of a concept, which he defines as both
“the totality of all subordinated concepts” and
“the totality of all individual objects which fall under the concept”.

There is a ‘realm’ (Reich) of concepts
Terminology work is designed to provide clear delineations of the concepts in this realm, and only when such delineations have been achieved can terms be assigned.

Terms materialize concepts
Concepts can also be ‘materialised’ through token individuals, e.g. a chess piece or a playing card, which are called “representatives”.
But the most important materializations are signs:

Terms materialize concepts
A proper name such as ‘Napoleon’, for example, can serve as a “substitute object”, which can be used to bring the corresponding individual concept “to the consciousness of the interlocutor”.

The sign is available as a substitute object “if it is a concept [sic] which can easily be materialised at any time; that is, a suitable general concept ... in the form of a phonetic or graphic sign. For a phonetic concept (a phoneme or phoneme combination) and a graphic sign concept can easily be materialised at any time.”

In this way, objects and concepts are confused not only with each other, but also with signs.
For even signs are special kinds of concepts, in Wüster’s thinking, in spite of the fact that, as we recall, concepts “exist only in the heads of people”.

Concepts and Characteristics
before we can assign a term to a concept we must first “delineate” the concept, which means: list the totality of characteristics which form what Wüster calls its content or intension.

The characteristics of the concept bulb are:
lamp
light-emitting stuff
solid stuff
emission of light through electrically generated heat.

What are characteristics?
In some passages Wüster refers to them as if they were themselves concepts (so that characteristics, like concepts, would be in the heads of people).
In other passages he refers to characteristics as if they were properties of objects.

More recent ISO documents have sought to resolve this conflict
ISO-1087:1990 defines a concept as:
A unit of thought constituted through abstraction on the basis of properties common to a set of objects.
A characteristic it defines as:
A mental representation of a property of an object serving to form and delimit its concept.

ISO in 2000
Concept = A unit of knowledge created by a unique combination of characteristics.
Characteristic = An abstraction of a property of an object or of a set of objects.
Object = Anything perceivable or conceivable (a unicorn being given as a specific example of the latter).

The problem
The concept-based approach leaves those involved in the authoring and maintenance of terminologies unsure as to whether their task is the representation of ideas in people’s heads, or of meanings of words, or of types of entities and relations in the world.

SNOMED-CT:
“Disorders are concepts in which there is an explicit or implicit pathological process causing a state of disease which tends to exist for a significant length of time under ordinary circumstances.”
Given SNOMED’s definition of concepts as “unique units of thought”, this would imply that all disorders are imagined.

Wüsterianism in Medicine
Wüster: concepts are formed on the basis of much human discretion and arbitrariness
his ideas are well-suited to the area of medical terminology, which is subject to the constant coinage of novel terms.

But in medicine we often have to deal with families of entities which manifest no or very few characteristics “identifiable in encounters of similars”, and certainly insufficiently many such characteristics to allow definitions of corresponding concepts.
Hence some 85% of SNOMED-CT’s concepts remain undefined.

A tumour
starts out as (initially undetectable) mutations in a small number of cells and then becomes transformed by degrees into a full-fledged object in its own right on the scale of coarse anatomy.

processes in medicine
embryological development, aging, the history of a disease
No way to isolate in perception certain “essential properties” which could be identified as characteristics of corresponding general concepts.
(Vienna Circle idea of logical reduction)

Wüster’s notion of concept
which underlies the terminology standards of TC 37 has nothing to do with medicine at all.
He was concerned primarily with standardization in the domain of artefacts, of manufactured products

The Machine Tool. An Interlingual Dictionary of Basic Concepts
Artefacts truly are such as to manifest characteristics identifiable in encounters of similars – because they have been manufactured as such.
Vocabulary itself is treated by Wüster and his TC 37 followers “as if it could be standardised in the same way as types of paint and varnish [TC 35] or aircraft and space vehicles [TC 20]” [12, p. 12].

Object
ISO/IEC JTC1 SC36 N0579: an object is anything that can be perceived or conceived.
Some objects, concrete objects such as a machine, a diamond, or a river, shall be considered material;
other objects are to be considered immaterial or abstract, such as each manifestation of financial planning, gravity, flowability, or a conversion ratio;
still others are to be considered purely imagined, for example, a unicorn, a philosopher’s stone or a literary character.

Are processes objects? Are they concrete or abstract?
Are characteristics objects?
Are concepts objects?
Are dispositions, functions, qualities, limbs, organs, bodily cavities, blood flow, apoptosis objects?
Are they concrete or abstract? Material or immaterial? Real or imagined?

Ontology
= the task of creating a coherent, principled framework in which coherent answers to such questions can be given.
This task is of increasing importance to the future of medical coding and of the EHR.

ISO makes clear its position as to the importance of this task for the future of terminology research:
In the course of producing a terminology, philosophical discussions on whether an object actually exists in reality are beyond the scope of this standard and are to be avoided. Objects are assumed to exist and attention is to be focused on how one deals with objects for the purposes of communication.
Unfortunately
ISO’s definitions of terms like ‘object’ and ‘concept’ have been propagated in ever wider circles through all subsequent generations of relevant standards because of ISO’s own rules governing re-use.
But they are so vague as to leave the putative user of the corresponding standards entirely in the dark.

An Ontological Basis for Coding Systems and the EHR
European and international efforts towards the standardization of biomedical terminology and electronic healthcare records have been stymied by International Standard Bad Philosophy.

True, some critical remarks about certain conceptions in ISO TC 37 documents have been recently advanced, and the proposed alternative certainly represents an advance on Wüster in its treatment of individual objects.
As concerns what is general, however, it still runs together objects and concepts, identifying specific kinds or types of phenomena in the world with the general concepts created by human beings. [14]
In this way, like Wüster, it leaves itself with no benchmark in relation to which given concepts or concept-systems could be established as correct or incorrect.

Bacteria would still have properties different from those of trees if there were no humans able to form the corresponding concepts.

Now, however, it is time to do better, and to absorb the best ontological theories and tools which contemporary philosophy has to offer – and this means above all the right sort of ontology, an ontology that is able explicitly and unambiguously to relate to the universal kinds or types in reality as well as to the individual tokens which are their instances [15].
Such kinds or types are organism, cell, neurulation, sleep, death.

It is the job of medical terminology systems to represent universals (types, kinds), not concepts in people’s heads.
A role must be played in improving biomedical terminologies and coding systems by the resolute imposition of a coherent ontology of universals in place of the obfuscations of Eugen Wüster.
Electronic Health Records
Our idea is that such an ontology will enable us to introduce into EHR and coding systems a coherent representation of the different categories of entities in reality and of the relations between them as a substitute for the confused treatments of ‘object’, ‘concept’ and ‘characteristic’ that have predominated hitherto.
In this way, it can help us also in ensuring that the coding systems and terminologies developed henceforth are compatible with each other and with the EHRs which they were designed to support.

Many hold that it will suffice to establish communication standards for the EHR if we can only establish a way to refer unambiguously to “concepts” as units of knowledge agreed upon by domain experts and defined in formal ways.
But even under such ideal conditions the focus on concepts would be misplaced.

To allow clinical data registered in EHRs to be used for further automated processing, it should be clear whether entities in the associated coding system refer to diseases, or to statements made about diseases, to acts on the part of physicians, or to documents in which such acts are recorded, or to observations of such acts, or to statements about such observations.

Applying a sound realist ontology to coding systems and to EHR architectures means in the first place ensuring that the latter are calibrated not to the denizens of Wüster’s “realm of concepts” but rather to those entities in reality – such as particular patients, diseases, therapies and the universals which they instantiate – which form the subject matter of healthcare.

Better documentation
coding systems built with the aid of a robust realist ontology will be consistent – not, perhaps, with information models concocted by database designers from afar – but rather with those commonsensical intuitions about the objects and processes in reality which are shared by patients and healthcare providers.

In sum
GO, UMLS, etc. remain at the level of TERMINOLOGY
What we need is a REFERENCE ONTOLOGY
= a formal theory of the foundational relations which hold ONTOLOGIES together

The solution
we need to distinguish clearly between concepts and universals:
concepts are creatures of cognition
universals are invariants (types, kinds) out there in reality

NCOR – The future National Center for Ontological Research
Buffalo
Stanford
(OBO Consortium: Berkeley; Jackson Labs, ...)
(University of Washington, Seattle)
... (EBI, Cambridge UK; Swiss Bioinformatics Institute, ...)

Note
we are not claiming that to establish the ontology of the world of medical universals will be a simple task.
There is, it is clear, no single unified perspective on which all reasonable persons must agree if they would only open their eyes –
Hence the popularity of T. S. Kuhn’s ideas on conflicting paradigms (and of Wüster’s own ideas on the human-induced arbitrariness involved in the “construction” of both objects and concepts).

Against both Kuhn and Wüster
we accept the existence of a plurality of different perspectives on the world (perspectives corresponding, for example, to different life science disciplines, or to different biomedical terminologies, or to the different axes in SNOMED).
But the world itself is one.
Because of its immense complexity this one world is accessible to us only in terms of a wide variety of different sorts of complementary perspectives.

These different perspectives correspond broadly to the concept-systems of Wüster and his followers.
But the latter’s running together of concept and object means that they lack any benchmark in relation to which the integration of concept-systems could be effected.
For us, in contrast, the world itself is able to serve also as benchmark for such integration.

Thus first, we need to bring down the International Standards Organization

The End