Ontologies
CSE 5810

Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486-4818

Motivation
• Ontologies: Biomedical and Clinical
  • What are they?
  • How are they Used?
  • What Issues will Ontologies Face in the Future?
• Each HIT System has its Own Ontology
• HIE Requires Integration of Patient Data
  • Dealing with Semantic Differences (one EMR records weight in lbs, another in kg)
• Reconciling Ontologies
  • Each HIT System has its Own Ontology for the Same Info
  • Ontology + Data Impacts Integration
  • How do we Resolve Dramatic Differences?

Placing Ontologies into Perspective
• Historical Evolution of WWW
• Ontology Definition and Description
• RDF and OWL
• Present Biomedical Ontologies
• Applications of Biomedical Ontologies
  • Clinical Trials
  • OASIS: Integration Technique
  • Clinical Decision Support System

Current Information Systems on WWW
• First Generation: raw data, largely hand-coded by the user, was published online
  • For example, static web pages
• Second Generation: dynamic content generation driven by MDA and databases
  • Machines generate the respective HTML
• Third Generation: Semantic Web
  • Generates machine-processable information whose content is machine-understandable, enabling intelligent services such as information brokers, search agents and information filters to process domain-related information

What other Advances have Taken Place?
• XML
  • XML was designed to store and transport data
  • XML was designed to be both human- and machine-readable
  • W3C recommended XML 1.0 on 2/10/1998
• HTML5
  • Fifth revision of the HTML markup language used for structuring and presenting content on the World Wide Web
  • Published by W3C in October 2014

What are Ontologies?
• Definition (from Philosophy): Ontology is the study of being or existence and forms the basic subject matter of metaphysics.
  It seeks to describe the basic categories and relationships of being or existence in order to define entities and types of entities within its framework.
• Definition (from Computer Science): In computer science, an ontology is a “specification of a conceptualization”, that is, “a data model that represents a set of concepts within a domain and the relationships between those concepts”.

Advantages of Ontology
• Semantic way of representing knowledge of the domain
• Intelligent systems can provide reasoning
• Systems can make inferences within the ontology
• Two main Objectives
  • Share the common structure of information
  • Reuse a similar ontology in another domain

Development of Ontology
• Determine the domain and scope (range) of the knowledge
• Look for an existing ontology in a similar domain
  • Reuse without change (will it be possible?)
  • Basis to evolve to a domain-specific solution
• List all of the terminology or concepts of the domain
• List all of the classes and instances to be created in the ontology
• Create the properties that will relate these concepts in the ontology

Example of Ontology
  [Figure: Class: Wine; Individual: Australian Yellow Tail; Properties: Color (Yellow), Flavor (Delicate), Maker (Australia, German)]

Parkinson’s Disease Management Ontology
Parkinson’s Treatment Ontology
Neurological-Disease Ontology
Excerpt of Medical Condition Ontology
Patient Ontology

Skeleton Ontology
• What is Phenotypic?
  • A phenotype is the composite of an organism's observable characteristics or traits

How do Ontologies Relate to other Models?
UML Model
• Substance (Id: Integer, name: String, statusCode: String, effectiveTime: Date, repeatNumber: Int)
• Observation (Id: Integer, statusCode: String, name: String, value: String)
• Person (Id: Integer, name: Name, address: Address, bday: String, tel: String)
• Name (family-name: String, given-name: String, prefix: String, suffix: String)
• Address (street: String, locality: String, region: String, country: String)
• Patient (Ethnicity: String, prefLang: String, race: String, Email: String, gender: String)
  • Operations: getAllergies(), get_clinical_notes(), get_demographics(), get_medications(), get_immunizations()
• Provider (deaNumber: String, npiNumber: String, Ethnicity: String, race: String, Email: String, gender: String)
• Associations: hasMedicalObservations, takesPrescribedMedication

How do Ontologies Relate to other Models?
Entity Relationship Diagram
  [Figure 3.3: Sample EHR Model in ERD. Entities: Patient, Observation, Substance. Attributes shown: id, name, statusCode, value, effectiveTime, repeatNumber, Ethnicity, prefLang, race, address, tel, bday.]

How do Ontologies Relate to other Models?
XML Schema
    <xs:element name="Patient">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="id" type="xs:integer"/>
          <xs:element name="ethnicity" type="xs:string"/>
          <xs:element name="race" type="xs:string"/>
          ……….
          <xs:element name="tel" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="Substance">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="id" type="xs:integer"/>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="statusCode" type="xs:string"/>
          ……….
          <xs:element name="repeatNumber" type="xs:integer"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="Observation">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="id" type="xs:integer"/>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="value" type="xs:string"/>
          <xs:element name="statusCode" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <!-- Associations: each relates a Patient to another element. -->
    <xs:element name="takesPrescribedMedication">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Patient"/>
          <xs:element ref="Substance"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:element name="hasMedicalObservation">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="Patient"/>
          <xs:element ref="Observation"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

How do we Model Ontologies?
• Researchers proposed the Semantic Web Stack, a hierarchy of languages in which each layer exploits and uses the capabilities of the layers below
• OWL and RDF belong to the family of knowledge representation languages
  • RDF: Resource Description Framework http://www.w3.org/RDF/
  • OWL: Web Ontology Language http://www.w3.org/TR/owl-features/
• RDF is reminiscent of Semantic Networks, which were popular in the 1970s

Introduction to RDF / OWL

RDF: Resource Description Framework
• RDF represents knowledge in triple format: Subject, Predicate, Object
• For example:
    Students     registerTo     Classes
    (Subject)    (Predicate)    (Object)
• One triple in RDF is referred to as a statement
• RDF is a grammar-based language whose syntax is similar to XML
• RDFS (RDF Schema) has a syntax similar to RDF and provides a schema grammar for RDF.
  For example, rdfs:Class, rdfs:subClassOf, etc.

RDF: Resource Description Framework
• RDF syntax of the above example:
    <rdfs:Class rdf:about="http://www.example.com/examle#Students" rdfs:label="Students">
    </rdfs:Class>
    <rdfs:Class rdf:about="http://www.example.com/examle#Classes" rdfs:label="Classes">
    </rdfs:Class>
• All the concepts described in RDF are identified using a URI (e.g., http://www.example.com/examle#Students).
• RDF can be viewed as a standardized framework for providing metadata about domain concepts.

OWL: Web Ontology Language
• OWL is placed on top of the semantic web stack, utilizing the features offered by the layers below (RDF, RDFS, XML)
• OWL's design has been influenced by description logic and knowledge representation paradigms: SHIQ, Semantic Networks, Frames, SHOE, DAML, OIL, DAML+OIL.
• OWL provides richer semantic capabilities than its predecessor RDF
  • For example, in the previous example, the predicate registerTo is of type rdf:Property.

OWL: Web Ontology Language
• OWL differentiates between properties by defining
  • owl:ObjectProperty for connecting two concepts (registerTo), and
  • owl:DatatypeProperty for connecting a concept to a datatype (datatypes are taken from XML Schema)
• These two properties inherit from rdf:Property
• OWL also defines owl:AnnotationProperty for embedding metadata onto classes, rules and axioms
• The following slide illustrates the use of OWL, RDF and RDFS (taken from a cardiac ontology built in OWL using the Protégé tool)

OWL: Web Ontology Language
    <owl:Class rdf:ID="Veins">
      <rdfs:subClassOf>
        <owl:Class rdf:ID="Heart"/>
      </rdfs:subClassOf>
    </owl:Class>
    <Veins rdf:ID="Pulmonary_Vein"/>
  [Figure: Heart, Vein, Pulmonary Vein]
• In this markup, Pulmonary_Vein is declared as an individual (instance) of the class Veins, and Veins is declared as a subclass of Heart.
• The next slide illustrates OWL properties and the expressive power of OWL to restrict the domain and range values accepted by these properties.
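As a rough illustration of the triple model and of what a subclass declaration lets a reasoner conclude, the sketch below stores statements as (subject, predicate, object) tuples in plain Python and derives types by following rdfs:subClassOf links. It is a toy, not an RDF library: the names `TRIPLES`, `superclasses` and `inferred_types` are made up for this example, and only the Students/Classes and Veins/Heart statements come from the slides.

```python
# Toy triple store: every statement is a (subject, predicate, object) tuple.
# The three statements mirror the examples on the slides.
TRIPLES = {
    ("Students", "registerTo", "Classes"),
    ("Veins", "rdfs:subClassOf", "Heart"),
    ("Pulmonary_Vein", "rdf:type", "Veins"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(individual, triples):
    """Asserted rdf:type values plus all of their superclasses."""
    asserted = {o for s, p, o in triples if s == individual and p == "rdf:type"}
    result = set(asserted)
    for cls in asserted:
        result |= superclasses(cls, triples)
    return result


if __name__ == "__main__":
    # Pulmonary_Vein is asserted to be a Veins; Heart follows from subClassOf.
    print(sorted(inferred_types("Pulmonary_Vein", TRIPLES)))
```

The point of the sketch is the second function: only one type is asserted for Pulmonary_Vein, and the other is derived, which is the kind of inference an OWL reasoner performs at much larger scale.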
OWL: Web Ontology Language
    <owl:ObjectProperty rdf:ID="Complications">
      <rdfs:domain rdf:resource="#Cardiology_Diseases"/>
      <rdfs:range>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Cardiology_Complications"/>
            <owl:Class rdf:about="#Cardiology_Diseases"/>
            <owl:Class rdf:about="#Cardiology_Causes"/>
          </owl:unionOf>
        </owl:Class>
      </rdfs:range>
    </owl:ObjectProperty>
• The object property “Complications” can take domain values from the class “Cardiology_Diseases” and range values from a combination (union) of classes
• OWL combined with RDF/RDFS provides an environment for developing domain ontologies by organizing and describing the domain concepts

Disease Ontology
  [Figure: Hierarchical organization of Cardiology Diseases; Instances of Mitral_Valve_Disorders]

Disease Ontology
  [Figure: Property Defined; Representation of “Mitral_Valve_Prolapse” knowledge using properties and instances]

Implemented Ontology in OWL Format
    …………..
    <Congenital_Heart_Disease rdf:ID="Atrial_septal_defect">
      <Complications>
        <Cardiac_Arrhythmias rdf:ID="Arrhythmia">
          <Has_Intervention rdf:datatype="http://www.w3.org/2001/XMLSchema#string">defibrillation</Has_Intervention>
          <Have_Symptoms>
            <Cardiology_Symptoms rdf:ID="Dyspnea"/>
          </Have_Symptoms>
          <Has_Diagnosis_Test>
            <Cardiology_Diagnosis_Test rdf:ID="Coronary_Angiography">
              <Has_Synonyms rdf:datatype="http://www.w3.org/2001/XMLSchema#string">coronary catheterization</Has_Synonyms>
    ………………..
Bio-Medical Ontologies
• Review a Wide Range of Available Ontologies and Standards:
  • OpenCyc
  • WordNet
  • Galen
  • UMLS
  • SNOMED-CT
  • FMA
  • Gene Ontology

Sample EHR Model in UML via HL7 CDA
• Vitals (Id: Integer, effectiveTime: IVL_TS)
• Immunization (Id: Integer, code: CD, statusCode: CS, effectiveTime: IVL_TS, product: CD, routeCode: CD)
• Procedure (Id: Integer, code: CD, statusCode: CS, effectiveTime: IVL_TS, approachSiteCode: CD, targetSiteCode: CD, methodCode: CE)
• Observation (Id: Integer, statusCode: String, effectiveTime: IVL_TS, code: CD, value: ANY, targetSiteCode: CD)
• Visit (Id: Integer, pId: Integer, visitDate: Date)
• Substance Administration (Id: Integer, name: String, statusCode: CS, effectiveTime: IVL_TS, doseQuantity: IVL_PQ, routeCode: CE, repeatNumber: ANY)
• Patient Provider (pId: Integer, providerId: Integer)
• Person (Id: Integer, name: Name, address: Address, bday: String, tel: String)
• Name (family-name: String, given-name: String, prefix: String, suffix: String)
• Address (street: String, locality: String, region: String, country: String)
• Provider (deaNumber: String, npiNumber: String, Ethnicity: String, race: String, Email: String, gender: String)
• Patient (Ethnicity: String, prefLang: String, race: String, Email: String, gender: String)
  • Operations: getAllergies(), get_clinical_notes(), get_demographics(), get_medications(), get_immunizations()
• Associations: hasVitals, hasImmunizationRecords, hasMedicalObservations, perfomedProcedures, encounters, patientList
** CD, CE, CS, IVL_TS, ANY: HL7 CDA datatypes

OWL Equivalent for Observation
    <owl:Class rdf:ID="IVL_TS">
      <owl:DatatypeProperty rdf:ID="Low"/>
      <owl:DatatypeProperty rdf:ID="High"/>
      <owl:DatatypeProperty rdf:ID="width"/>
      <owl:DatatypeProperty rdf:ID="center"/>
      <owl:DatatypeProperty rdf:ID="lowClosed"/>
      <owl:DatatypeProperty rdf:ID="highClosed"/>
    </owl:Class>

    <owl:Class rdf:ID="Observation">
      <owl:DatatypeProperty rdf:ID="id"/>
      <owl:DatatypeProperty rdf:ID="hasStatusCode"/>
      <owl:Attribute rdf:ID="hasEffectiveTime"/>
      <owl:Attribute rdf:ID="hasCode"/>
      <owl:Attribute rdf:ID="hasValue"/>
      <owl:Attribute rdf:ID="hasTargetSite"/>
    </owl:Class>
• Note: owl:Attribute, owl:Domain and owl:Range are schematic shorthand on these slides; standard OWL expresses them with owl:ObjectProperty, rdfs:domain and rdfs:range.

OWL Equivalent for Observation
    <owl:Class rdf:ID="CD">
      <owl:Attribute rdf:ID="text"/>
      <owl:DatatypeProperty rdf:ID="code"/>
      <owl:DatatypeProperty rdf:ID="codeSystem"/>
      <owl:DatatypeProperty rdf:ID="codeSystemName"/>
      <owl:DatatypeProperty rdf:ID="codeSysteVersion"/>
      <owl:DatatypeProperty rdf:ID="displayName"/>
    </owl:Class>

    <owl:Attribute rdf:ID="hasEffectiveTime">
      <owl:Domain rdf:ID="Observation"/>
      <owl:Range rdf:ID="IVL_TS"/>
    </owl:Attribute>
    <owl:Attribute rdf:ID="hasCode">
      <owl:Domain rdf:ID="Observation"/>
      <owl:Range rdf:ID="CD"/>
    </owl:Attribute>
    <owl:Attribute rdf:ID="hasValue">
      <owl:Domain rdf:ID="Observation"/>
      <owl:Range rdf:ID="ANY"/>
    </owl:Attribute>
    <owl:Attribute rdf:ID="hasTargetSiteCode">
      <owl:Domain rdf:ID="Observation"/>
      <owl:Range rdf:ID="CD"/>
    </owl:Attribute>

Sample OWL Ontology Model
  [Figure: (a) Diagnosis Ontology Model; (b) Test Ontology Model; (c) Anatomy Ontology Model. Legend labels: Class, Attribute, Association, Datatype Attribute.]

Ontology Example: Open Cyc
• Open Cyc is an upper-level ontology developed by Cycorp Inc.
• Open Cyc has 60,000 hand-coded assertions that capture “common sense language”, so that AI algorithms can perform human-like reasoning, and it contains 6,000 concepts

Example of Open Cyc

Ontology Example: WordNet
• WordNet is an electronic lexical database developed at Princeton University that serves as a resource for applications in natural language processing and information retrieval.
• Example: senses of “cancer”
  • cancer, malignant neoplastic disease: any malignant growth or tumor caused by abnormal and uncontrolled cell division; it may spread to other parts of the body through the lymphatic system or the blood stream
  • Cancer, Crab: (astrology) a person who is born while the sun is in Cancer
  • Cancer: a small zodiacal constellation in the northern hemisphere; between Leo and Gemini
  • Cancer, Cancer the Crab, Crab: the fourth sign of the zodiac; the sun is in this sign from about June 21 to July 22
  • Cancer, genus Cancer: type genus of the family Cancridae

Unified Medical Language System
• UMLS is developed by the National Library of Medicine
• Disease is a semantic type with around 392 relations (109 semantic relations and 22 other relations).
• Pneumonia is categorized under one semantic type, Disease, but has hundreds of relations.

Example Ontology: SNOMED-CT
• SNOMED CT stands for Systematized Nomenclature of Medicine Clinical Terms.
• SNOMED-CT is the result of merging two ontologies: SNOMED-RT and Clinical Terms.

Example Ontology: Clinical Trials
• Low participation in clinical trials is a major problem in clinical and translational research.
• Matching patient records to clinical trials is presently a manual and tedious procedure.
• Need a Semantic Bridge between clinical ontologies (SNOMED-CT, etc.) and raw patient data for retrieving matching patient records, clinical guidelines and clinical decision support systems (CDSS).

Technical Challenges
• Challenges to be faced in a real-time scenario:
  • Knowledge Engineering
  • Scalability
  • Noisy or Incomplete Data
• Knowledge Engineering
  • The clinical ontology has the concept “Drug”, which describes the active composition of the various drugs
    • However, the patient record contains a list of vendor-specific drug names
  • The clinical ontology describes the cause of the disorder
    • The patient records only specify the presence or absence of the disorder and where the clinical test was conducted.
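To make the vendor-specific drug problem concrete, here is a minimal sketch of one way a product string from a patient record could be normalized and looked up against an ingredient-level concept, with anything unmatched flagged for manual review. Everything in it is an assumption for illustration: the table `PRODUCT_TO_INGREDIENT`, the `STRENGTH` pattern and both function names are hypothetical, and the only string taken from this deck is the Cerner example "Lactulose Syrup 20G/30ml".

```python
import re

# Hypothetical lookup from a normalized product name to its active ingredient.
# A real system would draw this from a terminology service, not a literal dict.
PRODUCT_TO_INGREDIENT = {
    "lactulose syrup": "lactulose",
}

# Trailing strength such as "20G/30ml" or "500 mg" (illustrative pattern only).
STRENGTH = re.compile(
    r"\s*\d+(\.\d+)?\s*(mg|g|ml|mcg)(/\d+(\.\d+)?\s*(mg|g|ml|mcg))?\s*$",
    re.IGNORECASE,
)


def normalize_product(raw):
    """Trim the vendor string, drop a trailing strength, and lower-case it."""
    return STRENGTH.sub("", raw.strip()).lower()


def map_to_ingredient(raw):
    """Return (ingredient, needs_manual_review) for a vendor drug string."""
    ingredient = PRODUCT_TO_INGREDIENT.get(normalize_product(raw))
    return ingredient, ingredient is None


if __name__ == "__main__":
    print(map_to_ingredient("Lactulose Syrup 20G/30ml"))
    print(map_to_ingredient("Unknown Brand 5 mg"))
```

The second return value matters as much as the first: a lookup like this only covers the strings someone has already curated, so the unmatched remainder still needs a person or a more capable mapping technique.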
Architecture of Solution
  [Figure: components labeled Clinical Trials, Patient Data, SNOMED-CT, Query, Ontology, ABox, TBox, Reasoner]

Implementation Approach
• Mapping Patient Data Terminology to SNOMED-CT
  • Using UMLS as an intermediate target
  • NLP mapping techniques
  • Manual Mapping
• Map the raw patient data to SNOMED-CT terminology.
  • Example:
      Cerner Drug: Lactulose Syrup 20G/30ml
      SNOMED-CT: administeredSubstance
• Allow the user to specify which terms in the definition are to be matched.
• Last Bullet Means Ontology Matching is NOT Fully Automated!
• This is a Real Problem for Interoperating Data!

Contrast in Representation
• Example:
  • SNOMED-CT:
      Disease1 hasAgent Virus007
      Infection due to Bacteria001
      Infection due to MicroBacteria007
  • Patient Record:
      Disease1 Positive.
• Because the patient record carries so little information, the query reasoner cannot find the records that hold only partial data.

How are Observations Reconciled?
    Clinical Trial    Description
    NCT00084266       Patients with MRSA
    NCT00288808       Patients with warfarin
    NCT00298870       Patients on steroids
    NCT00304382       Patients with Pneumonia, source of Blood or Sputum

    ∃ associatedObservation.MRSA
    ∃ associatedObservation.PneumococcalPneumonia ⊓ ∃ hasSpecimanSource.(Blood ⊔ Sputum)

Clinical Decision Support System
• Clinical Decision Support Systems (CDSS) are interactive computer programs designed to assist physicians and other health professionals with decision-making tasks
• Components of CDSS:
  • Knowledge Base
  • Rule Based Engine
  • Case Base
  • Business Models

Example of Usage of Rules
    IF “RULE 1” & “RULE 2” & “RULE 3” ….. “RULE n”
    THEN “INTERVENTION 1 or RULE m”

    IF p.getGender() = “male” & p.getAge() = 34 & p.getBP() < 140 & p.getInsulinLevel() < 20
    THEN “Asthma Intervention Level 2”

    Class Patient: hasGender “male” ⊓ hasAge “34” ⊓ hasBP LessThan 140 ⊓ hasInsulinLevel LessThan 20

Ontology Integration
• All ontologies developed have a common aim: describing the domain knowledge
• Integration of ontologies is becoming very critical
  • Applications tend to use multiple ontologies
  • Concepts in the various ontologies overlap, or the same concept is described in multiple ways.
• For example, the concept “Blood” is described differently:
  • “Fluid” in one ontology
  • “Substance” in another ontology
  • “semi-solid” in a third ontology
• Need to Reconcile these Differences When Attempting to “Combine” Data that Originates from Different Ontologies

Example of Conflicting Ontologies
• Ontology 1: Disease References Symptoms which References Treatments
  Hierarchy of:
  • Disease
    • Respiratory Disease
    • Cardio Disease
    • Nervous Disease
  • Symptoms
    • General Symptoms
    • Behavioral Symptoms
  • Treatment
    • General Treatment
    • Surgical Treatments
• Ontology 2: Symptoms References Diseases which References Treatments
  Hierarchy of:
  • Symptoms
    • General Symptoms
    • Behavioral Symptoms
  • Disease
    • Respiratory Disease
    • Cardio Disease
    • Nervous Disease
  • Treatment
    • General Treatment
    • Surgical Treatments
• Previously Discussed Issues:
  • How do you Integrate Ontologies Across HIT to Support HIE and Virtual Chart?
  • How do you Merge Data Intensive Conflicting Ontologies?
  • How do you Query from Inside Out?

Ontology Integration
• Semantic vs Structural Integration?
• Difficulties of integration arise when integrating similar, identical and complementary ontologies.
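One small, concrete first step toward reconciling such differences is simply detecting them. The sketch below, under the assumption that each ontology can be reduced to a concept-to-parent mapping, reports every concept that is filed under different parents in different ontologies. The ontology names and the function `find_conflicts` are hypothetical; only the three placements of “Blood” come from the slides.

```python
from collections import defaultdict

# Each ontology maps a concept to the parent class it is filed under.
# Only the "Blood" rows come from the slides; the ontology names are made up.
ONTOLOGIES = {
    "ontology_1": {"Blood": "Fluid"},
    "ontology_2": {"Blood": "Substance"},
    "ontology_3": {"Blood": "semi-solid"},
}


def find_conflicts(ontologies):
    """Return {concept: {parent: [ontology names]}} where parents disagree."""
    placements = defaultdict(lambda: defaultdict(list))
    for name, parents in ontologies.items():
        for concept, parent in parents.items():
            placements[concept][parent].append(name)
    return {
        concept: {parent: sorted(names) for parent, names in by_parent.items()}
        for concept, by_parent in placements.items()
        if len(by_parent) > 1  # more than one distinct parent means a conflict
    }


if __name__ == "__main__":
    for concept, by_parent in find_conflicts(ONTOLOGIES).items():
        print(concept, by_parent)
```

Detection of this kind is purely structural; deciding whether “Fluid”, “Substance” and “semi-solid” can be treated as compatible is the semantic part of integration and is not attempted here.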
OASIS Ontology Mapping and Integration Framework

Summary - Ontologies
• Ontology Definition and Descriptions
• Many Examples in Practice
• OWL and RDF
• Biomedical Ontology
  • Open Cyc
  • WordNet
  • SNOMED-CT
• Application of Biomedical Ontology
  • Clinical Trials
  • OASIS: Integration Technique
  • Clinical Decision Support System
• Integration of Ontologies