Realism, Concepts and Categories
Or: how realism can be pragmatically useful for information systems
Barry Smith

Conceptual Spaces
Great book
The theory of conceptual spaces can be applied not only to concepts but also to the universals in reality (concepts of shell shapes + invariants among the shell shapes themselves)

PG: I’M A CONCEPTUALIST
There is a reality (an unknowable Ding an sich)
But I am concerned with perceptions, our experiences
The concepts are in our heads
There may be things out there, but they are not very useful for us (pragmatic standpoint)

PG: I’M A CONCEPTUALIST
The concepts are in our heads
There may be universals out there corresponding to our concepts, but they are not very useful for us (pragmatic standpoint)
Do we need realism to solve practical problems in the theory of conceptual spaces?

Experiments are a good way to come into contact with reality
Thus it is significant that, in describing his experiment with shells, PG distinguished between the space of shells in reality and the conceptual space of shells

Experiments are a good way to come into contact with reality
But in appealing to experiments in general (by other cognitive scientists) Peter is assuming that these scientists, too, are able to apprehend invariants in reality
-- in children’s responses to stimuli
-- in language uses
-- in shapes and patterns in the stimuli themselves (for example pictures of cups, jugs, bowls)

In talking about pigeons, and in giving an argument for his own doctrine of conceptualism, Peter presupposed a distinction between the universals green and mixture-of-blue-and-yellow out there in the world
And also a distinction between pigeons and humans
Therefore Peter is a realist

Peter’s response: I am just pretending to believe that there is such a mind-independent distinction
BUT THEN THIS DEPRIVES HIS ARGUMENT OF ALL RHETORICAL FORCE

Peter’s second response: I believe that there is such a mind-independent distinction only when I’m arguing for this point in my theory, but when I’m arguing for other points in my theory I don’t believe it any more
Indeed I deny it, because I want to remain faithful to my conceptualism

PG: I’M A CONCEPTUALIST
But he needs realism about universals to defend his argument
Do we need universals for other (pragmatic) reasons?
Do we need realism to solve practical problems?

Medicine
Medical learning
What medical students know
What doctors know -- highly multi-dimensional concept spaces
What a typical patient knows -- a lower-dimensional concept space

Medicine
What a doctor knows -- a multi-dimensional concept space
What there is to be known -- many medical phenomena we just can’t explain -- an even more highly multi-dimensional invariant space

A practical medical problem
How to choose a doctor?
What a doctor knows vs.
What there is to be known -- many medical phenomena we just can’t explain

PG: I’M A CONCEPTUALIST
But he needs realism to defend his argument
Do we need realism or not to solve practical problems in information systems?

PG Axiom: Concept systems have to be learnable
Therefore there is an upper limit on the number of dimensions they can have, and on the topography by which these dimensions are organized

Unified Medical Language System (UMLS)
5 million words
1.3 million concepts
divided into 135 dimensions corresponding to the UMLS system of semantic types

Computers mean that we can break out of the restraints of learnability
And then we do not break out at random; rather we break out in reflection of the invariants we find in the reality studied by medical scientists

Ontology, like cartography, must work with maps at different scales
How do these maps (conceptual grids) fit together into a single system?
Consider them as grids transparent to reality, allowing our directedness towards the objects beyond

Cartographic Projection
intentionality = the directedness towards objects via conceptual grids
conceptual grids treated always only as mediators towards objects in reality
(not: intentionality = the directedness towards concepts)
[Diagram labels: object; concepts]

Intentional directedness …
is effected via conceptual grids
we are able to reach out to the objects themselves because our conceptual grids are transparent

Kantianism = the inability to appreciate the fact that our conceptual grids can be transparent to the reality beyond
= Midas touch epistemology

there are many compatible map-like partitions at different scales, which are all transparent to the reality beyond
[Diagram labels: animal, bird, fish, canary, ostrich (ontology of biological species); ontology of DNA space; Universe/Periodic Table]
both are transparent
partitions of one and the same reality

Ontological Zooming
The job of the ontologist is to understand how different partitions of the same reality interrelate

Back to our practical problem
Why Neokantianism makes for bad information systems ontologies

IFOMIS
Institute for Formal Ontology and Medical Information Science
http://ifomis.de

The problem
Different communities of medical researchers use different and often incompatible category systems in expressing the results of their work

Example: Medical Nomenclature
UMLS: blood is a tissue
MeSH: blood is a body fluid

different concept systems need not interconnect at all
for example they may relate to entities of different granularity
we cannot make incompatible terminology systems interconnect just by looking at concepts, or knowledge, or language
to decide which of a plurality of competing definitions to accept we need some tertium quid
we need, in other words, to take the world itself into account

For medical students: patients are the solution
In information systems: ‘ontology is the solution’

Two alternative readings
Ontologies are special sorts of terminology systems = currently popular IT conception, with roots in KR
Ontologies are special sorts of theories about entities in reality = traditional philosophical conception, embraced by IFOMIS

Example: The Gene Ontology (GO)
hormone ; GO:0005179
%digestive hormone ; GO:0046659
%peptide hormone ; GO:0005180
%adrenocorticotropin ; GO:0017043
%glycopeptide hormone ; GO:0005181
%follicle-stimulating hormone ; GO:0016913
% = subsumption (lower term is_a higher term)
[Figure: the same terms drawn as a tree rooted at hormone]

GO is very useful for purposes of standardization in the reporting of genetic information
but it is not much more than a telephone directory of standardized designations organized into hierarchies

GO can in practice be used only by trained biologists
whether a GO-term stands in the
subsumption relationship depends on the context in which the term is used (for example on the type of organism)

A still more important problem:
GDB: Genome Database of Human Genome Project
GenBank: National Center for Biotechnology Information, Washington DC
etc.

What is a gene?
GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
GO uses ‘gene’ in its term hierarchy, but it does not tell us which of these definitions is correct

GO has no robust formal organization
no capability to be aligned with systems which would have the power to use it to reason with genetic information

GO deals with basic ontological notions very haphazardly
GO’s three main term-hierarchies are: component, function and process
But GO confuses functions with structures, and also with executions of functions
and has no clear account of the relation between functions and processes

BFO vs. KR
In the knowledge engineering world, in which information systems ontology has its home, terms and concepts come first
- the job is to validate them by building working programs
In the BFO world robust ontology (with all its reasoning power) comes first
and terms and term-hierarchies must be subjected to the constraints of ontological coherence

IFOMIS: Get basic ontological organization right and problems of formalization (consistency, portability) will become easier to solve later
Current orthodoxy focuses instead on issues of representation (XML) and reasoning (Description logics)

Description logics
decidable logics, thus expressively weaker than first-order predicate logic
used for ensuring consistency of definitions of terms and for computing relations of subsumption
ontologically neutral (i.e.
neutral as between good ontology and ontological nonsense)

Concept hierarchy ontology:
Ontologies are inside the computer
thus subject to severe constraints on expressive power (effectively the expressive power of description logic)

Concept hierarchy ontology
cannot solve the data-integration problem because of its roots in knowledge representation/knowledge mining

Concept hierarchy ontology
has its philosophical roots also in Quine’s doctrine of ontological commitment and in the ‘internal metaphysics’ of Carnap/Putnam
Roughly, for a concept hierarchy ontology the world and the semantic model are one and the same
What exists = what the system says exists

Quineanism:
ontology is the study of the ontological commitments or presuppositions embodied in scientific theories (or in the beliefs of experts)

Quineanism, too, faces the integration problem
If an ontology is the set of ontological commitments of a theory, how can we cope with questions pertaining to the relations between the objects to which different theories are committed?
(Recall the Vienna Circle program of the Unity of Science)

What is needed
is some sort of wider common framework sufficiently rich and nuanced to allow concept systems deriving from different theoretical/data sources to be hand-calibrated

What is needed
is not a Concept Hierarchy Ontology but a Reference Ontology (something like old-fashioned, realist, metaphysics)

Reference Ontology
An ontology is a theory of a domain of entities in the world
Ontology is outside the computer
seeks maximal expressiveness and adequacy to reality
and sacrifices computational tractability for the sake of representational adequacy

Reference Ontology
a theory of the tertium quid, called reality, needed to hand-calibrate database/terminology systems

Methodology
Get ontology right first (realism; descriptive adequacy; rather powerful logic); solve tractability problems later

DL: ontology deals with ‘simplified models’
Tom Gruber (1993): An ontology should make as few claims as possible about the world being modeled … specifying the weakest theory (allowing the most models) and defining only those terms that are essential to the communication of knowledge consistent with that theory.
Belnap
“it is a good thing logicians were around before computer scientists; if computer scientists had got there first, then we wouldn’t have numbers because arithmetic is undecidable”

It is a good thing Aristotelian metaphysics was around before description logic, because otherwise we would have only hierarchies of concepts/universals/classes and no individual instances …

SNOMED-RT
Systematized Nomenclature of Medicine
A Reference Terminology with Legal Force

Example 2: SNOMED-RT
- 121,000 concepts
- 340,000 relationships
- “common reference point for comparison and aggregation of data throughout the entire healthcare process”

Problems with both UMLS and SNOMED
Each is a ‘fusion’ of several source vocabularies, some of dubious quality
They were fused without an ontological system being established first
They contain circularities, taxonomic gaps, and unnatural ad hoc determinations

SNOMED RT (2000)
already has description logic definitions
but it also has some bad coding, which derives from failure to pay attention to ontological principles:
e.g. both testes is_a testis

DL is supposed to allow future SNOMED
to reason from data formulated in a structured way
to handle multiple relationship types, in addition to is_a
to take account of context-sensitivity in the use of terms

The long march of Description Logic
Today SNOMED
Tomorrow THE WORLD

The Semantic Web Initiative
The Web is a vast edifice of heterogeneous data sources
Needs the ability to query and integrate across different conceptual systems

How to resolve such incompatibilities?
enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which
1. satisfy the constraints of a description logic (DL)
2.
are applied as meta-tags to websites

Semantic Web effort thus far
devoted primarily to developing systems for standardized representation of web pages and web processes (= ontology of web typography)
not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

Metadata: the new Silver Bullet
agree on a metadata standard for washing machines as concerns size, price, etc.
create machine-readable databases and put them on the net
consumers can query multiple sites simultaneously and search for highly specific, reliable, context-sensitive results
A world of exhaustive, reliable metadata would be a utopia.

Ontology Learning
Semi-automatic tuning of ontology with human intervention (cooperative paradigm)
[Diagram labels: ontology; ontology learning; candidate new concepts; new ontology]

Verizon
The promise of Web Services, augmented with the Semantic Web, is to provide THE major solution for integration, the largest IT cost / sector, at $ 500 BN/year.
The Web Services and Semantic Web trends are heading for a major failure (i.e., the most recent Silver Bullet).
In reality, Web Services, as a technology, is in its infancy. ... There is no technical solution (i.e., no basis) other than fantasy for the rest of the Web Services story.
Analyst claims of maturity and adoption (...) are already false. ...
Verizon must understand it so as not to invest too heavily in technologies that will fail or that will not produce a reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT
OntoWeb Meeting, Innsbruck, December 16-18, 2002

PLAN
General problems with the Semantic Web initiative
(Partial) solutions to these general problems in the medical domain
Problems specific to the medical domain

General problems with the Semantic Web initiative

Problem 1: People lie
Meta-utopia is a world of reliable metadata.
But poisoning the well can confer benefits on the poisoners
Metadata exists in a competitive world.
Some people are crooks.
Some people are cranks.
Some people are Neokantians.

Problem 2: People are lazy
Half the pages on Geocities are called “Please title this page”

Problem 3: People are stupid
The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate
Will internet users learn to accurately tag their information with whatever DL hierarchy they're supposed to be using?

Problem 4: Multiple descriptions
“Requiring everyone to use the same vocabulary denudes the cognitive landscape, enforces homogeneity in ideas.” (Cory Doctorow)

Problem 5: Ontology Impedance
= semantic mismatch between ontologies being merged
This problem is recognized in the Semantic Web literature:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Solution 1: treat it as (inevitable) ‘impedance’
and learn to find ways to cope with the disturbance which it brings
Suggested here:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

Solution 2: resolve the impedance problem on a case-by-case basis
Suppose two databases are put on the web.
Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.
http://www.w3.org/DesignIssues/Semantic.html

Both solutions fail
1. treating mismatches as ‘impedance’ ignores the problem of error propagation (and is inappropriate in an area like medicine)
2.
resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

(Partial) solutions to these general problems in the medical domain

Solutions in the medical domain
Problem 1: People lie
Problem 2: People are lazy
Problem 3: People are stupid
None of these is true in the world of medical informatics
Achieve quality control via division of labour

Division of Labour
1. Clinical activities
2. Structured data representation
3. Software coding (e.g. for NLP)
4. Ontology building
Use 4. to constrain 2. and 3. to achieve better data processing via quality control

DL-Division of Labour
1. Clinical activities
2. Structured data representation
3. Software coding
4. Ontology building
For DL, 4. is a special case of 3.

For DL
Ontologies are software tools
thus limited in their expressive power and in their effectiveness as quality controls

IFOMIS idea: distinguish two separate tasks:
- the task of developing computer applications capable of running in real time
- the task of developing an expressively rich ontology of a sort which will allow sophisticated quality control

Problems specific to, or made more acute within, the medical domain

Problem 4: Multiple descriptions
Requiring everyone to use the same vocabulary to describe their material is not always medically practicable
Clinicians often do not use category systems at all; they use unstructured text from which usable data has to be extracted in a further step
Why?
Because every case is different, much patient data is context-dependent

Problem 5: Ontology Impedance
= semantic mismatch between ontologies
‘gene’ used in websites issued by
- biotech companies involved in gene patenting
- medical researchers interested in the role of genes in predisposition to smoking
- insurance companies

Other problems with DL-based ontologies
DL poor when dealing with context-dependent information/usages of terms
DL poor when it comes to dealing with information about instances (rather than concepts or classes)
DL also poor when it comes to dealing with time

SARS is NOT Severe Acute Respiratory Syndrome
it is THIS collection of instances of Severe Acute Respiratory Syndrome associated with THIS coronavirus and ITS mutations

BFO = basic formal ontology

BFO
ontology not the ‘standardization’ or ‘specification’ of concepts (not a branch of knowledge or concept engineering)
but an inventory of the types of entities existing in reality

BFO goal:
to remove ontological impedance by constraining terminology systems with good ontology

BFO
not a computer application
but a reference ontology
(not a reference terminology in the sense of SNOMED)

Recall:
GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

Ontology
‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’ ... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ …
are ontological terms in the sense of traditional (philosophical) ontology

UMLS Semantic Network
a tool to find our way around the UMLS Metathesaurus
(the January 2003 version consists of 135 Semantic Types + 54 links)
can be arranged in the form of a graph whose vertices are the Semantic Types and whose edges are the links.

UMLS Semantic Network
arranged in a double tree structure, with two superclasses: Entities and Events.
Entity = “A broad type for grouping physical and conceptual entities”.
Event = “A broad type for grouping activities, processes and states”.

Basic Formal-Ontological Distinctions
1. Continuant vs. Occurrent (= SNAP vs. SPAN)
2. Dependent vs. Independent
3. Universals vs. Particulars

Continuant vs. Occurrent (= SNAP vs. SPAN)
continuants = entities which continue to exist through time, e.g. organisms, cells, chromosomes
occurrents = entities which unfold themselves through time in successive temporal phases, e.g. an intravenous drug infusion
continuant/occurrent = (roughly) the UMLS distinction between Entity and Event

Dependent vs. Independent
independent = has an inherent ability to exist without reference to other entities, e.g. molecules, organisms, planets
dependent = requires support from other entities in order to exist, e.g. cellular motion (which requires reference to a cell which moves), or viral infection (which requires reference to some carrier)

Need to find ways to deal with time in medical informatics
Guidelines
Workflow
need to be clear about the distinction between continuants and occurrents

occurrents (in medicine) are always dependent entities.
Thus of the four abstractly possible combinations only three are instantiated

Independent and Dependent Continuants
Independent Continuants = substances, objects, things
Dependent Continuants =
qualities (your height, your skin-color)
states or conditions (your diabetes)
roles (your role as student, as doctor)
functions (of a drug, of a machine)

UMLS Semantic Network
Conceptual Entity, with subclasses:
Organism Attribute
Finding
Idea or Concept
Occupation or Discipline
Organization
Group
Group Attribute
Intellectual Product
Language

Conceptual Entities are dependent on minds
but Organism Attributes can exist without minds
and Groups (e.g. a group of macaque monkeys) can exist without minds

UMLS Semantic Tree with root Event
Event has subclasses:
Activity
Phenomenon or Process
Natural Phenomenon or Process
Biologic Function
Physiologic Function
Pathologic Function
runs together functions, which are continuants, with processes, which are occurrents

Functions are continuants
Functions exist self-identically through time; they have no temporal phases and exist even when not being exercised
The exercise of a function unfolds itself through its temporal phases

The compilers of UMLS
have confused what exists dispositionally in a thing, and is the product of design or evolution,
with what the thing does episodically, and is the product of intentionality or immediate causal influence

UMLS Semantic Type Collections
Chen, Perl et al. and Geller, Perl et al. partition the UMLS Semantic Network into more meaningful units called Semantic Type Collections.
problems revealed by the BFO analysis especially in:
Pathologic Function
Physiologic Function
Idea or Concept

Subclasses of Pathologic Function
Experimental Model of Disease
Cell or Molecular Dysfunction
Disease or Syndrome
Mental or Behavioral Dysfunction

Subclasses of Physiologic Function
Organ or Tissue Function
Mental Process
Molecular Function
Genetic Function
Cell Function

Subclasses of Idea or Concept
Functional Concept
Body System
Temporal Concept
Qualitative Concept
Quantitative Concept
Spatial Concept
Geographic Area
Body Location or Region
Body Space or Junction
Molecular Sequence
Carbohydrate Sequence
Amino Acid Sequence
Nucleotide Sequence

CIRCULATORY SYSTEM
bodily systems are parts of organisms (like fingers and hands)

UMLS has ontological problems, too
Idea or Concept
  Functional Concept
  Qualitative Concept
  Quantitative Concept
  Spatial Concept
    Body Location or Region
    Body Space or Junction
    Geographic Area
    Molecular Sequence
      Amino Acid Sequence
      Carbohydrate Sequence
      Nucleotide Sequence
Copenhagen is an Idea or Concept

Spatial Concept
Body Location or Region: An area, subdivision, or region of the body demarcated for the purpose of topographical description.
Body Space or Junction: An area enclosed or surrounded by body parts or organs or the place where two anatomical structures meet or connect.
Geographic Area: A geographic location, generally having definite boundaries.
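
The ‘Copenhagen is an Idea or Concept’ point follows mechanically once is_a is treated as transitive. A minimal sketch of that inference is given here. The names IS_A, INSTANCE_OF, ancestors and types_of are hypothetical, and the nesting (for example Molecular Sequence under Spatial Concept) is an assumption about how the flattened listing above was indented; this is not UMLS software.

```python
# Hypothetical parent links read off the "Idea or Concept" listing above.
# The nesting is an assumption about how the original slide was indented.
IS_A = {
    "Functional Concept": "Idea or Concept",
    "Qualitative Concept": "Idea or Concept",
    "Quantitative Concept": "Idea or Concept",
    "Spatial Concept": "Idea or Concept",
    "Body Location or Region": "Spatial Concept",
    "Body Space or Junction": "Spatial Concept",
    "Geographic Area": "Spatial Concept",
    "Molecular Sequence": "Spatial Concept",
    "Amino Acid Sequence": "Molecular Sequence",
    "Carbohydrate Sequence": "Molecular Sequence",
    "Nucleotide Sequence": "Molecular Sequence",
}

# Instance-level assertion used as the example on the slide.
INSTANCE_OF = {"Copenhagen": "Geographic Area"}


def ancestors(semantic_type):
    """Return every type reached by following is_a links upward."""
    chain = []
    while semantic_type in IS_A:
        semantic_type = IS_A[semantic_type]
        chain.append(semantic_type)
    return chain


def types_of(instance):
    """Return the asserted type of an instance plus all inherited types."""
    direct = INSTANCE_OF[instance]
    return [direct] + ancestors(direct)


if __name__ == "__main__":
    print(types_of("Copenhagen"))
    print(ancestors("Amino Acid Sequence"))
```

Nothing in the hierarchy itself blocks the conclusion that a city is an idea; only a prior distinction between entities in reality and concepts about them would do so.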
Idea or Concept
Functional Concept: A concept which is of interest because it pertains to the carrying out of a process or activity.
Body System: A complex of anatomical structures that performs a common function.

Case Study: Regulation of Blood Pressure
UMLS:
hypertension is a Disease or Syndrome or a Sign or Symptom
blood pressure is an Organism Function.
Both are dependent SNAP entities: they endure identically for a certain period of time and they depend for their existence on their bearer.

The hydraulic equation:
BP = CO × PVR
arterial blood pressure is directly proportional to the product of blood flow (cardiac output, CO) and peripheral vascular resistance (PVR).

UMLS:
blood flow is an Organism Function
cardiac output is a Laboratory or Test Result (SNAP) and a Diagnostic Procedure (SPAN)

Blood pressure is_proportional_to_a laboratory or test result?
Blood pressure is_proportional_to_a diagnostic procedure?
An amino acid sequence is_an idea or concept
Copenhagen is_a spatial concept

How to eliminate this nonsense?

Basic Formal-Ontological Distinctions
1. Continuant vs. Occurrent (= SNAP vs. SPAN)
2. Dependent vs. Independent
3. Universals vs. Particulars

Replace concepts in people’s heads (e.g. in UMLS) with universals in re
teach medical terminology systems the distinction between universals and particulars
distinguish clearly between ontology (the study of reality) and epistemology/psychology (the study of people’s concepts)

UMLS confuses epistemology with ontology
it confuses the results of our attempts to gain knowledge of specific aspects of the organism (functions, qualities, processes) with those aspects themselves.

What would a better UMLS top level look like?
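
Returning to the blood-pressure case study, the typing problem can be made concrete with a small sketch. It tags each term with the UMLS semantic types quoted above and checks whether those types fall on one side of the SNAP/SPAN divide. The tables, the function names and the numeric values are hypothetical; treating Organism Function as SNAP follows the slide's reading of blood pressure as a dependent SNAP entity and is an assumption, not UMLS's own classification.

```python
# SNAP/SPAN tags for the semantic types named in the case study.
# "Organism Function" is tagged SNAP as an assumption, following the slide's
# claim that blood pressure is a dependent SNAP entity.
CATEGORY = {
    "Organism Function": "SNAP",
    "Laboratory or Test Result": "SNAP",
    "Diagnostic Procedure": "SPAN",
}

# Semantic types assigned to each term, as quoted in the case study.
UMLS_TYPES = {
    "blood pressure": ["Organism Function"],
    "blood flow": ["Organism Function"],
    "cardiac output": ["Laboratory or Test Result", "Diagnostic Procedure"],
}


def categories(term):
    """Return the set of SNAP/SPAN tags implied by a term's semantic types."""
    return {CATEGORY[t] for t in UMLS_TYPES[term]}


def is_coherent(term):
    """A term is coherently typed if all its types fall on one side of SNAP/SPAN."""
    return len(categories(term)) == 1


def blood_pressure(cardiac_output, resistance):
    """Hydraulic equation from the slide: BP = CO x PVR."""
    return cardiac_output * resistance


if __name__ == "__main__":
    # Illustrative values only: CO in L/min, PVR in mmHg*min/L.
    print(blood_pressure(5.0, 18.0))
    for term in UMLS_TYPES:
        print(term, sorted(categories(term)), is_coherent(term))
```

The arithmetic needs a quantity on both sides of the equation, while the typing offers cardiac output as both a test result and a procedure; the coherence check flags exactly that term.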
The Reference Ontology Community
IFOMIS (Leipzig)
Laboratories for Applied Ontology (Trento/Rome, Turin)
Foundational Ontology Project (Leeds)
Ontology Works (Baltimore)
Ontek Corporation (Buffalo/Leeds)
Language and Computing (L&C) (Belgium/Philadelphia)

Domains of Current Work
IFOMIS Leipzig: Medicine, Bioinformatics
Laboratories for Applied Ontology
Trento/Rome: Ontology of Cognition/Language
Turin: Law
Foundational Ontology Project: Space, Physics
Ontology Works: Genetics, Molecular Biology
Ontek Corporation: Biological Systematics
Language and Computing: Natural Language Understanding

Two basic BFO oppositions
Granularity (of molecules, genes, cells, organs, organisms ...)
SNAP vs. SPAN
getting time right is of crucial importance for medical informatics

BFO = SNAP/SPAN + Theory of Granular Partitions +
- theory of universals and instances
- theory of part and whole
- theory of boundaries
- theory of functions, powers, qualities, roles
- theory of environments
- theory of spatial and spatiotemporal regions

MedO: medical domain ontology
- universals and instances and normativity
- theory of part and whole and absence
- theory of boundaries/membranes
- theory of functions, powers, qualities, roles, (mal)functions, bodily systems
- theory of environments: inside and outside the organism
- theory of spatial and spatiotemporal regions: anatomical mereotopology

MedO: medical domain ontology
- theory of granularity relations between
- molecule ontology
- gene ontology
- cell ontology
- anatomical ontology
- etc.