Introduction to Databases: From Data to Knowledge Bases
Instructors: Bertram Ludaescher, Kai Lin
Introduction to KR, B. Ludaescher & K. Lin

Overview
• 08:30-09:30 Introduction to KR (1h)
• 09:30-09:45 BREAK (15’)
• 09:45-11:20 Intro to KR (1h45’)
• 11:20-11:50 Demos (30’)
• 11:50-13:15 LUNCH (1h25’)
• Demonstrations/Hands-on (~30’)
  • Ontology-enabled data integration
  • Concept map creation tool
  • Ontology creation tool

The Problem: Scientific Data Integration
or: … from Questions to Queries …

Ontology Cheat Sheet (1/2)
• What is an ontology? An ontology usually …
  – specifies a theory (a set of logic models) by …
  – defining and relating …
  – concepts representing features of a domain of interest
• Also overloaded (sloppy) for:
  – Controlled vocabularies
  – Database schema (relational, XML Schema/DTD, …)
  – Conceptual schema (ER, UML, …)
  – Thesauri (synonyms, broader term/narrower term)
  – Taxonomies (classifications)
  – Informal/semi-formal knowledge representations
    • “Concept spaces”, “concept maps”
    • Labeled graphs / semantic networks (RDF)
  – Formal ontologies, e.g., in [Description] Logic (OWL)
    • “formalization of a specification” constrains possible interpretation of terms

Ontology Cheat Sheet (2/2)
• What are ontologies used for?
  – Conceptual models of a domain or application (communication means, system design, …)
  – Classification of …
    • concepts (taxonomy) and
    • data/object instances through classes
  – Analysis of ontologies, e.g.
    • Graph queries (reachability, path queries, …)
    • Reasoning (concept subsumption, consistency checking, …)
  – Targets for semantic data registration
  – Conceptual indexes and views for
    • searching,
    • browsing,
    • querying, and
    • integration of registered data

Ontologies as Metadata++
Ontologies = Smarter Metadata TM
Smarter (Meta)data I: Logical Data Views
Adoption of a standard (meta)data model => wrap data sets into unified virtual views
Source: NADAM Team (Boyan Brodaric et al.)

Smarter Metadata II: Multihierarchical Rock Classification for “Thematic Queries” (GSC)
or: Taxonomies are not only for biologists ...
• “smart discovery & querying” via multiple, independent concept hierarchies (controlled vocabularies): Genesis, Fabric, Composition, Texture
• data at different description levels can be found and processed

Smarter Metadata III: Source Contextualization & Ontology Refinement
Biomedical Informatics Research Network http://nbirn.net
The next frontier: Capturing Knowledge about Dynamic Processes (“Process Ontologies”)

Ontology-Enabled Application Example: Geologic Map Integration
Show formations where AGE = ‘Paleozoic’
[Figure: Nevada map, query result without age ontology vs. with age ontology (domain knowledge)]
+/- a few hundred million years

Integrated querying of multiple datasets via different “ontologies” (conceptual views)

Querying by Geologic Age …

Querying by Geologic Age: Result

Querying by Chemical Composition …

Querying by Chemical Composition: Results
Note the fine differences in shades of gray:
• DO know: It’s NOT there!
• DON’T know! (not registered)
OK – we got to work on the color coding ;-)

Querying w/ British Rock Classification
Uses a GSC-BRC inter-ontology articulation mapping
British Rock Classification Query: Results
Uses a GSC-BRC inter-ontology articulation mapping

Different views on State Geological Maps

The Query: Show sedimentary rocks
The Puzzle: Find the 17 differences in the results… but first: what states are we looking at?

Sedimentary Rocks: BGS Ontology

Sedimentary Rocks: GSC Ontology

Differing Conceptual Views: Why?
• We are looking at the same datasets – why do they look different?
  – Different rock classifications (GSC, BGS) are used as “targets” for registering the data
  – Not every rock name/rock type found in the raw data is found in both classifications
  – The mapping (“articulation”) between the classifications is only an approximation
• Yet: having “conceptual views” (even if different) on the data really seems like a good idea…

Geologic Map Integration
• Given:
  – Geologic maps from different state geological surveys (shapefiles w/ different data schemas)
  – Different ontologies:
    • Geologic age ontology
    • Rock classification ontologies:
      – Multiple hierarchies (chemical, fabric, texture, genesis) from Geological Survey of Canada (GSC)
      – Single hierarchy from British Geological Survey (BGS)
• Problem:
  – Support uniform queries using different ontologies
  – Support registration w/ ontology A, querying w/ ontology B

A Multi-Hierarchical Rock Classification “Ontology” (really: Taxonomy)
Genesis, Fabric, Composition, Texture

Implementation in OWL: Not only “for the machine” …
Demonstration of Ontology-enabled Map Integration (OMI) v2
[Figure: three columns. Data sets: several Data boxes. Ontologies: ontology A, ontology B, ontology C. Applications: Ontology-enabled Map Integrator {A,B}, Application (B), Application (C)]

Ontology Mapping: Overview
• Align ontologies
• Integrate data sets which are registered to different ontologies
• Query data sets through different ontologies
[Figure: Data set 1 registers to Ontology 1, Data set 2 registers to Ontology 2; ontology mappings link the two ontologies; queries go through the ontologies]

Geology Workbench: Initial State
An Ontology-based Mediator
• click on Ontologies
• click on Datasets
• click on Applications

Geology Workbench: Uploading Ontologies
• click on Ontology Submission
• Choose an OWL file to upload
• Click to check its detail
• Name Space: can be used to import this ontology into others

Geology Workbench: Data (to Ontology!) Registration
Step 1: Choose Classes
• Click on Submission
• Data set name
• Select a shapefile
• Choose an ontology

Geology Workbench: Data Registration
Step 2: Choose Columns for Selected Classes
Columns: AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL
Callout on the column list: “It contains information about geologic age”

Geology Workbench: Data Registration
Step 3: Resolve Mismatches
• Two terms do not match any ontology terms
• Manually mapping algonkian into the ontology

Geology Workbench: Ontology-enabled Map Integrator
• Click on the name
• Choose interesting Classes
• All areas with the age Paleozoic
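The “All areas with the age Paleozoic” query above relies on the subclass relation of the age ontology: a formation registered as, say, Devonian should be returned because Devonian is a sub-period of the Paleozoic. The following is a minimal Python sketch of that expansion. The hierarchy fragment, the formation records, and the function names are illustrative assumptions, not taken from the workbench.

```python
# Hypothetical fragment of a geologic age ontology: child -> parent (subclass-of).
AGE_PARENT = {
    "Cambrian": "Paleozoic",
    "Devonian": "Paleozoic",
    "Permian": "Paleozoic",
    "Triassic": "Mesozoic",
    "Jurassic": "Mesozoic",
    "Paleozoic": "Phanerozoic",
    "Mesozoic": "Phanerozoic",
}


def is_subclass_of(concept, ancestor):
    """True if concept equals ancestor or reaches it via subclass-of links."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = AGE_PARENT.get(concept)
    return False


def query_by_age(formations, age):
    """Return names of formations whose registered age falls under `age`."""
    return [name for name, registered in formations
            if is_subclass_of(registered, age)]


# Made-up formation records: (name, age concept the record is registered to).
FORMATIONS = [("F1", "Devonian"), ("F2", "Jurassic"),
              ("F3", "Paleozoic"), ("F4", "Permian")]

# Plain string match, i.e. querying without the age ontology.
without_ontology = [n for n, a in FORMATIONS if a == "Paleozoic"]
# Ontology-enabled query: also matches sub-periods of the Paleozoic.
with_ontology = query_by_age(FORMATIONS, "Paleozoic")

print(without_ontology)  # ['F3']
print(with_ontology)     # ['F1', 'F3', 'F4']
```

This covers only the simple case where the ontology contains nothing but the subclass relation.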
Geology Workbench: Change Ontology
• New query interface
• Switch from Canadian Rock Classification to British Rock Classification
• Ontology mapping between British Rock Classification and Canadian Rock Classification
• Submit a mapping
• Run it

Ontology Repository
• Accept user-defined ontologies in OWL
• Any ontology saved in the system or otherwise accessible can be imported into another user-defined ontology (inter-ontology references)
• Provide a tool to browse the ontologies in the repository

composition.owl
  ……………
  <owl:Ontology>
    <owl:imports rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl"/>
  </owl:Ontology>
  ……………
  <owl:Class rdf:ID="Ultramafite">
    <rdfs:subClassOf rdf:resource="#Ultramafic"/>
    <rdfs:subClassOf rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl#Igneous"/>
  </owl:Class>
  ……………

Ontology-Enabled Map Integration: Where do we stand?
• The simple case (done): ontologies contain only the subclass relation
• More complicated cases (coming soon):
  – ontologies contain classes with attributes
  – ontologies with constraints in Description Logic
• Implementation:
  – v1, v2 prototypes: detail-level registration to ontology
  – v3 (portal): item-level registration to ontology

Current Ontology Registration (Item-level) v3
[Figure labels: Domain Knowledge, Ontologies, Arizona]

GEON Search: Concept-based Querying
System Overview
[Figure: architecture diagram. Access paths: User Access (via Portal); Client Access (via web services) for other distributed apps (Kepler, DLESE, …). Components: ResourceRegistration (myOntology.owl with metadata, myDataset.foo with metadata); GEON Catalog; GEONsearch (search conditions: spatial, temporal, concept); GEONworkbench (user actions: add, delete, manipulate); GEON Workspace (user); SRB; Log; GEONmiddleware; external services (Gazetteer, DLESE, …); Geologic Age, Chronos, …]

Introduction to Knowledge Representation and Ontologies

Complex Multiple-Worlds Mediation and XML
• XML is Syntax
  – DTDs talk about element nesting
  – XML Schema schemas give you data types
  – need anything else? => write comments!
• Domain Semantics is complex:
  – implicit assumptions, hidden semantics
  – sources seem unrelated to the non-expert
• Need Structure and Semantics beyond XML trees!
  – employ richer OO models
  – make domain semantics and “glue knowledge” explicit
  – use ontologies to fix terminology and conceptualization
  – avoid ambiguities by using formal semantics

XML-Based vs. Model-Based Mediation
[Figure: two mediation stacks over raw data.
 XML-based: Integrated-DTD := XQuery(Src1-DTD, ...); no domain constraints; structural constraints (DTDs): Parent, Child, Sibling, ..., e.g. A = (B*|C),D; XML elements; XML models.
 Model-based: Integrated-CM := CM-QL(Src1-CM, ...); CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …}; CM-QL ~ {F-Logic, OWL, …}; ontologies, DMs, PMs; logical domain constraints (IF ... THEN ...); classes, relations, is-a, has-a, ...; (XML) objects; conceptual models.]

Knowledge Representation: Relating Theory to the World via Formal Models
Source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations
“All models are wrong, but some models are useful!”

What is an ontology??
Glossary (wordreference.com)
• ontology noun
  1 (Philosophy) the branch of metaphysics that deals with the nature of being
  2 (Logic) the set of entities presupposed by a theory
• taxonomy noun
  1 a the branch of biology concerned with the classification of organisms into groups based on similarities of structure, origin, etc. b the practice of arranging organisms in this way
  2 the science or practice of classification
  [ETYMOLOGY: 19th Century: from French taxonomie, from Greek taxis order + -nomy]
• thesaurus noun (plural: -ruses, -ri [-raı])
  1 a book containing systematized lists of synonyms and related words
  2 a dictionary of selected words or topics
  3 (rare) a treasury
  [ETYMOLOGY: 18th Century: from Latin, Greek: treasure]

Glossary (wordreference.com)
• concept noun
  1 an idea, esp. an abstract idea; example: the concepts of biology
  2 (Philosophy) a general idea or notion that corresponds to some class of entities and that consists of the characteristic or essential features of the class
  3 (Philosophy) a the conjunction of all the characteristic features of something b a theoretical construct within some theory c a directly intuited object of thought d the meaning of a predicate
  4 [modifier] (of a product, esp. a car) created as an exercise to demonstrate the technical skills and imagination of the designers, and not intended for mass production or sale
  [ETYMOLOGY: 16th Century: from Latin conceptum something received or conceived, from concipere to take in, conceive]
• contingent adjective
  1 [when postpositive, often foll by on or upon] dependent on events, conditions, etc., not yet known; conditional
  2 (Logic) (of a proposition) true under certain conditions, false under others; not necessary
  3 (in systemic grammar) denoting contingency (sense 4)
  4 (Metaphysics) (of some being) existing only as a matter of fact; not necessarily existing
  5 happening by chance or without known cause; accidental
  6 that may or may not happen; uncertain
• glossary noun (plural: -ries); an alphabetical list of terms peculiar to a field of knowledge with definitions or explanations. Sometimes called: gloss
  [ETYMOLOGY: 14th Century: from Late Latin glossarium; see gloss2]

1st Attempt: Ontologies in CS
• An ontology is ...
  – an explicit specification of a conceptualization [Gruber93]
  – a shared understanding of some domain of interest [Uschold, Gruninger96]
• Different aspects:
  – a formal specification (reasoning and “execution”)
  – ... of a conceptualisation of a domain (community)
  – ... of some part of the world of interest (application, science domain)
• Provides:
  – A common vocabulary of terms
  – Some specification of the meaning of the terms (semantics)
  – A shared “understanding” for people and machines

Ontology as a philosophical discipline
• Ontology as a philosophical discipline, which deals with the nature and the organization of reality:
  – Ontology as such is usually contrasted with Epistemology, which deals with the nature and sources of our knowledge [a.k.a. Theory of Knowledge].
    Aristotle defined Ontology as the science of being as such: unlike the special sciences, each of which investigates a class of beings and their determinations, Ontology regards "all the species of being qua being and the attributes which belong to it qua being" (Aristotle, Metaphysics, IV, 1).
• In this sense Ontology tries to answer the question: What is being? What exists?
  – the nature of being, not an enumeration of “stuff” around us…

Some different uses of the word “Ontology” [Guarino’95]
1. Ontology as a philosophical discipline
2. Ontology as an informal conceptual system
3. Ontology as a formal semantic account
4. Ontology as a specification of a “conceptualization”
5. Ontology as a representation of a conceptual system via a logical theory
   5.1 characterized by specific formal properties
   5.2 characterized only by its specific purposes
6. Ontology as the vocabulary used by a logical theory
7. Ontology as a (meta-level) specification of a logical theory
http://ontology.ip.rm.cnr.it/Papers/KBKS95.pdf

Ontologies vs Conceptualizations
• Given a logical language L ...
  – ... a conceptualization is a set of models of L which describes the admittable (intended) interpretations of its non-logical symbols (the vocabulary)
  – ... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
[Figure: set of all models M(L); conceptualization C(L); ontology; logic theories (consistent sets of sentences; closed under logical consequence)]
[Guarino96] http://www-ksl.stanford.edu/KR96/Guarino-What/P003.html

Ontologies vs Knowledge Bases
• An ontology is a particular KB, describing facts assumed to be always true by a community of users:
  – in virtue of the agreed-upon meaning of the vocabulary used (analytical knowledge):
    • black => not white
  – ... whose truth does not descend from the meaning of the vocabulary used (non-analytical, common knowledge)
    • Rome is the capital of Italy
• An arbitrary KB may describe facts which are contingently true, and relevant to a particular epistemic state:
  – Mr Smith’s pathology is either cirrhosis or diabetes

Formal Ontology [Guarino’96]
• Theory of formal distinctions
  – among things
  – among relations
• Basic tools
  – Theory of parthood
    • What counts as a part of a given entity? What properties does the part relation have? Are there different kinds of parts?
  – Theory of integrity
    • What counts as a whole? In which sense are its parts connected?
  – Theory of identity
    • How can an entity change while keeping its identity? What are its essential properties? Under which conditions does an entity lose its identity? Does a change of “point of view” change the identity conditions?
  – Theory of dependence
    • Can a given entity exist alone, or does it depend on other entities?

Ontology: Definition and Scope [Sowa]
• The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D. The types in the ontology represent the predicates, word senses, or concept and relation types of the language L when used to discuss topics in the domain D.
• An uninterpreted logic, such as predicate calculus, conceptual graphs, or KIF, is ontologically neutral. It imposes no constraints on the subject matter or the way the subject may be characterized. By itself, logic says nothing about anything, but the combination of logic with an ontology provides a language that can express relationships about the entities in the domain of interest.
http://users.bestweb.net/~sowa/ontology/index.htm

Ontology: Definition and Scope [Sowa]
• An informal ontology may be specified by a catalog of types that are either undefined or defined only by statements in a natural language. A formal ontology is specified by a collection of names for concept and relation types organized in a partial ordering by the type-subtype relation. Formal ontologies are further distinguished by the way the subtypes are distinguished from their supertypes:
  – an axiomatized ontology distinguishes subtypes by axioms and definitions stated in a formal language, such as logic or some computer-oriented notation that can be translated to logic
  – a prototype-based ontology distinguishes subtypes by a comparison with a typical member or prototype for each subtype.
• Large ontologies often use a mixture of definitional methods: formal axioms and definitions are used for the terms in mathematics, physics, and engineering; and prototypes are used for plants, animals, and common household items.
http://users.bestweb.net/~sowa/ontology/index.htm

Why develop an ontology?
• To make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand, update, and integrate legacy data => data integration
• To separate domain knowledge from operational knowledge
  – Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means.
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

What is being shared?
Metadata
• Data describing the content and meaning of resources and services.
• But everyone must speak the same language…
Terminologies
• Shared and common vocabularies
• For search engines, agents, curators, authors and users
• But everyone must mean the same thing…
Ontologies
• Shared and common understanding of a domain
• Essential for search, exchange and discovery
Ontologies aim at sharing meaning
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

Origin and History
• Humans require words (or at least symbols) to communicate efficiently. The mapping of words to things is indirect. We do it by creating concepts that refer to things.
• The relation between symbols and things has been described in the form of the meaning triangle:
[Figure: meaning triangle; labels: Concept, “Jaguar”]
Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc.
before: Frege, Peirce; see [Sowa 2000]
[Carole Goble, Nigel Shadbolt, Ontologies and the Grid Tutorial]

Human and machine communication [Maedche et al., 2002]
[Figure: Human Agent 1 and Human Agent 2 (HA1, HA2) exchange symbols, e.g. via nat. language; Machine Agent 1 and Machine Agent 2 (MA1, MA2) exchange symbols, e.g. via protocols; internal models and formal models commit to an Ontology (Ontology Description, Formal Semantics) for a specific domain, e.g. animals; Meaning Triangle: Symbol “JAGUAR”, Concept, Things]

Introduction to Description Logics
References:
• F. Baader, W. Nutt. Basic Description Logics. In the Description Logic Handbook, edited by F. Baader, D. Calvanese, D.L. McGuinness, D. Nardi, P.F. Patel-Schneider, Cambridge University Press, 2002, pages 47-100.
• Description Logics Tutorial, Ian Horrocks and Ulrike Sattler, ECAI-2002, Lyon, France, July 23rd, 2002.
• Emerging Sparrow toolkit (Bowers, Ludaescher)
Example: Description Logic
• DL definition of “Happy Father” (Example from Ian Horrocks, Ulrike Sattler, U Manchester)

Science Example: Ontology for SYNAPSE and NCMIR
Domain Expert Knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: Domain Map (DM); DM in Description Logic]

Source Contextualization, Ontology Refinement
In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map...
sources can register new concepts at the mediator ...

Some Description Logics History
• “Structured Inheritance Networks” [Brachman 1977]
• KL-ONE [Brachman, Schmolze 1985]
• Core ideas:
  – Building blocks: atomic concepts (unary predicates), atomic roles (binary predicates), individuals (constants)
  – Constructors for building complex concepts and roles from simpler ones
  – Automated inference for concept subsumption and instance classification (is-a/is-instance-of are not explicitly given by the user, but inferred from concept definitions/instance properties)

Source: Description Logics Tutorial, Ian Horrocks and Ulrike Sattler, ECAI-2002, Lyon, France, July 23rd, 2002
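As a reminder of what the constructors in the “Core ideas” above mean, here is the usual set-theoretic semantics of the four constructors used later in this tutorial (intersection, negation, existential restriction, value restriction). The notation is the standard textbook one and is an assumption here, not copied from the slides: an interpretation $\mathcal{I}$ has a domain $\Delta^{\mathcal{I}}$, maps each concept $C$ to a set $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and each role $r$ to a binary relation $r^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$.

```latex
\begin{align*}
(C \sqcap D)^{\mathcal{I}} &= C^{\mathcal{I}} \cap D^{\mathcal{I}} \\
(\neg C)^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}} \\
(\exists r.C)^{\mathcal{I}} &= \{\, x \mid \text{there is a } y \text{ with } (x,y) \in r^{\mathcal{I}} \text{ and } y \in C^{\mathcal{I}} \,\} \\
(\forall r.C)^{\mathcal{I}} &= \{\, x \mid \text{for all } y,\ (x,y) \in r^{\mathcal{I}} \text{ implies } y \in C^{\mathcal{I}} \,\}
\end{align*}
```

Concept subsumption $C \sqsubseteq D$ then means $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ in every interpretation $\mathcal{I}$ that satisfies the knowledge base.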
Knowledge Base (DL-Style)
• Terminological Knowledge (TBox)
  – Concept Definition (naming of concepts):
  – Axiom (constraining of concepts):
  => a mediator’s “glue knowledge source”
• Assertional Knowledge (ABox) about Individuals
  – n27_img118 : Neuron
  => the concrete instances/individuals of the concepts/classes that your sources export

Example TBox
Atomic concepts = {P, F, W, M1, …}
Base concepts = {P, F}
Defined concepts = {W, M1, M2, …}
Roles = {h1, h2}
Concept Definition: A ≡ C
Axiom: C ⊑ D
where A is an atomic concept and C, D are complex concept expressions

Example TBox
• Base concepts = {Person, Female} … occur on the RHS only
• Defined concepts = {P, F, W, …} … occur on the LHS (& maybe RHS)
• Base interpretation J: interpret base concepts only
• Extension I of J: on same domain as J and agrees (on base) with J
• TBox T is definitorial if every base interpretation has exactly one extension that is a model of T

Brains-On (Hands-off) Session TM

What do we mean here?
Starting with the base interpretation of
• I(Person) := “the class of persons”
• I(Female) := “the class of females”
… what is the meaning of the defined concepts?
… what role do the roles play in this process?

And the answer is …
• atomic concept
• atomic concept
• concept def. w/ intersection
• … plus negation
• … existential restriction
• … value restriction

Digression: “Sparrow” (Prolog) Syntax for DL
• Sparrow “Grammar” and “Parser”
• Example in Sparrow Syntax

Back to Reasoning with the Family ...
• concept definition: MyConcept ≡ DL-formula
• concept inclusion: MyConcept ⊑ DL-formula
• a finite set of definitions is a terminology or TBox if for every atomic concept A there is at most one axiom whose lhs is A

Expansion of Terminologies
• For acyclic T we can “unfold” concept definitions until every defined concept is specified in terms of primitive concepts only => the expansion of a TBox T
• Example:

Reasoning in the Tableaux calculus
• From this TBox
• We want to show this
• Expansion
• In First-order (LeanTap) syntax

Reasoning Services
• Remember the distinction between evaluating a query (over a DB) vs reasoning with queries (symbolic expressions)?
• The former can be very hard, esp. for large databases and complex queries
• The latter is much harder still, even for small queries and knowledge bases, ontologies
• Specialized DL reasoners (FaCT, Racer, …) are better than general purpose FO reasoners

OK – enough of that jazz… let’s look at some demos …

Tools for Editing and Processing Ontologies
1. Protégé 2000 (RDF, OWL) http://protege.stanford.edu/
2. CmapTools (concept map) http://cmap.ihmc.us/
3. Java API
   – Jena http://www.hpl.hp.com/semweb/jena.htm
   – OWL API http://sourceforge.net/projects/owlapi
Geology Map Integration Demo: http://geon01.sdsc.edu:8080/workbench/jsp/onto-list.jsp

ANOTHER APPLICATION OF ONTOLOGIES:
An Ontology-Driven Framework for Data Transformation in Scientific Workflows (from DILS’04)
Shawn Bowers, Bertram Ludäscher
San Diego Supercomputer Center, University of California, San Diego
Outline
• Background (SEEK Project)
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Science Environment for Ecological Knowledge (SEEK)
• Domain Science Driver
  – Ecology (LTER), biodiversity, …
• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – {application,upper}ware
• Semantic Mediation System
  – Data Integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies (this paper)
Architecture (cf. US cyberinfrastructure, UK e-Science)

Outline
• The SEEK Project
• Scientific Workflows
  – Focus: analysis & component integration on top of data integration
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

Promoter Identification in Kepler [SSDBM’03]
• Problems
  – Many components (web services) are NOT designed to fit! “The problem P that X solves is simple, and X doesn’t solve it well”
  – Semantically meaningful connections are structurally incompatible
• Approach
  – Distinguish structural type and semantic type
  – Structural type: e.g. XML Schema
  – Semantic type: e.g. OWL expressions
  – Exploit the (optional!) semantic type as much as possible

Service Reusability
• A scientist wishes to connect two (independent) services
• In Ptolemy II/Kepler (and in web services), input and output ports (message parts) have structural types (XML Schema)
• Unless “designed to fit,” independent services are structurally incompatible
  – Generally, the source output type will not be a subtype of the target input type
• A transformation mapping is required to connect the services … artificially creating subtype compatibility
  – If such a mapping exists, the services are “structurally feasible”
• We can annotate services with semantic types for discovery and interoperability of services
• Services can be semantically compatible, but structurally incompatible
[Figure: Source Service (output port Ps), Desired Connection, Target Service (input port Pt). Structural Type Ps is incompatible (⋠) with Structural Type Pt; the transformation mapping applied to Ps is a subtype (≺) of Structural Type Pt. Ontologies (OWL): Semantic Type Ps is compatible (⊑) with Semantic Type Pt.]

Example Structural Types (XML)
[Figure: workflow with services S1 and S2 and ports P1 to P5; port annotations: (life stage property), (mortality rate for period)]

structType(P2)
  root population = (sample)*
  elem sample     = (meas, lsp)
  elem meas       = (cnt, acc)
  elem cnt        = xsd:integer
  elem acc        = xsd:double
  elem lsp        = xsd:string

  <population>
    <sample>
      <meas>
        <cnt>44,000</cnt>
        <acc>0.95</acc>
      </meas>
      <lsp>Eggs</lsp>
    </sample>
    …
  </population>

structType(P3)
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase       = xsd:string
  elem obs         = xsd:integer

  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
    …
  </cohortTable>

Example Semantic Types
Portion of SEEK measurement ontology
[Figure: class diagram. Classes: Observation, MeasContext, MeasProperty, Entity, EcologicalProperty, AccuracyQualifier, AbundanceCount, LifeStageProperty, SpatialLocation, NumericValue. Relations: hasContext, appliesTo, hasProperty, itemMeasured, hasLocation, hasCount, hasValue, with cardinalities 0:*, 1:*, and 1:1.]

Same in OWL, a description logic standard (here, Sparrow syntax):
  Observation subClassOf forall hasContext/MeasContext and
    forall hasProperty/MeasProperty and
    exists itemMeasured/Entity.
  MeasContext subClassOf exists appliesTo/Entity and atMost 1/appliesTo.
  EcologicalProperty subClassOf Entity.
  LifeStageProperty subClassOf EcologicalProperty.
  AbundanceCount subClassOf EcologicalProperty and
    exists hasLocation/SpatialLocation and atMost 1/hasLocation and
    exists hasCount/NumericValue and atMost 1/hasCount.
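For readers more used to standard description logic notation, the Sparrow-syntax definitions of the SEEK measurement ontology can be transcribed as follows. This is a sketch under an assumed reading of the Sparrow operators (`forall r/C` as $\forall r.C$, `exists r/C` as $\exists r.C$, `atMost 1/r` as $\le 1\, r$, `subClassOf` as $\sqsubseteq$, `and` as $\sqcap$); the slides do not spell out this correspondence.

```latex
\begin{align*}
\mathit{Observation} &\sqsubseteq \forall \mathit{hasContext}.\mathit{MeasContext}
  \sqcap \forall \mathit{hasProperty}.\mathit{MeasProperty}
  \sqcap \exists \mathit{itemMeasured}.\mathit{Entity} \\
\mathit{MeasContext} &\sqsubseteq \exists \mathit{appliesTo}.\mathit{Entity}
  \sqcap {\le} 1\, \mathit{appliesTo} \\
\mathit{EcologicalProperty} &\sqsubseteq \mathit{Entity} \\
\mathit{LifeStageProperty} &\sqsubseteq \mathit{EcologicalProperty} \\
\mathit{AbundanceCount} &\sqsubseteq \mathit{EcologicalProperty}
  \sqcap \exists \mathit{hasLocation}.\mathit{SpatialLocation}
  \sqcap {\le} 1\, \mathit{hasLocation} \\
 &\qquad \sqcap \exists \mathit{hasCount}.\mathit{NumericValue}
  \sqcap {\le} 1\, \mathit{hasCount}
\end{align*}
```

Under this reading, pairing an existential restriction with an at-most-1 restriction on the same role expresses a 1:1 cardinality.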
Example Semantic Types
Semantic types for P2 and P3
Diagram: semType(P3) is an Observation with hasContext to a MeasContext that appliesTo a LifeStageProperty, and itemMeasured an AbundanceCount with hasCount a NumberValue; semType(P2) ⊑ semType(P3) additionally has hasProperty to an AccuracyQualifier with hasValue. Every edge is labeled 1:1. The workflow diagram (services S1 and S2, ports P1 to P5) is repeated.

  semType(P3) subClassOf Observation and
    exists hasContext/(MeasurementContext and
      exists appliesTo/LifeStageProperty and
      atMost 1/appliesTo) and
    exists itemMeasured/AbundanceCount and
    atMost 1/itemMeasured.

  semType(P2) subClassOf Observation and
    exists hasContext/(MeasurementContext and
      exists appliesTo/LifeStageProperty and
      atMost 1/appliesTo) and
    exists itemMeasured/AbundanceCount and
    atMost 1/itemMeasured and
    exists hasProperty/AccuracyQualifier and
    atMost 1/hasProperty.

Outline
• The SEEK Project
• Scientific Workflows
• The Problem: Reusing Structurally Incompatible Services
• The Ontology-Driven Framework
• Future Work

The Ontology-Driven Framework
• Define semantic registration mappings (“semantic views”) to connect structural and semantic types.
• Use registration mappings to (semi-)automate transformation, based on derived structural correspondences.
• Depending on the ontologies and registration mappings, it may not be possible to find an appropriate … (since the correspondence is often under-specified).
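As a rough illustration of why semType(P2) ⊑ semType(P3) holds for the two definitions given in the semantic types example, each definition can be treated as a set of conjuncts. If every conjunct of semType(P3) also appears in semType(P2), then P2 is at least as constrained as P3. The names `CORE`, `SEM_TYPE_P2`, `SEM_TYPE_P3`, and `syntactically_subsumed` are hypothetical, and this is a sufficient syntactic check only; an OWL reasoner would be needed in general.

```python
# Conjuncts shared by both semantic types, copied as opaque strings.
CORE = frozenset({
    "Observation",
    "exists hasContext/(MeasurementContext and "
    "exists appliesTo/LifeStageProperty and atMost 1/appliesTo)",
    "exists itemMeasured/AbundanceCount",
    "atMost 1/itemMeasured",
})

SEM_TYPE_P3 = CORE
SEM_TYPE_P2 = CORE | {
    "exists hasProperty/AccuracyQualifier",
    "atMost 1/hasProperty",
}


def syntactically_subsumed(sub: frozenset, sup: frozenset) -> bool:
    """Sufficient (not necessary) test: every conjunct of `sup` occurs in `sub`."""
    return sup <= sub


if __name__ == "__main__":
    print(syntactically_subsumed(SEM_TYPE_P2, SEM_TYPE_P3))
    print(syntactically_subsumed(SEM_TYPE_P3, SEM_TYPE_P2))
```

The check succeeds in one direction only, which matches the intended data flow: output port P2 produces observations that carry everything input port P3 requires, plus an accuracy qualifier.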
The Ontology-Driven Framework
Diagram: Ontologies (OWL); Semantic Type Ps compatible (⊑) with Semantic Type Pt; Registration Mapping (Output) and Registration Mapping (Input) connecting each structural type to its semantic type; Structural Type Ps at the Source Service; Structural Type Pt at the Target Service; desired connection from Ps to Pt.

Registration Example (simple XPaths)
structType(P2):
  root population  = (sample)*
  elem sample      = (meas, lsp)
  elem meas        = (cnt, acc)
  elem cnt         = xsd:integer
  elem acc         = xsd:double
  elem lsp         = xsd:string

Registration mapping for P2:
  /population/sample                  == semType(P2)
  /population/sample/meas/cnt         == semType(P2).itemMeasured
  /population/sample/meas/cnt/text()  == semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc         == semType(P2).hasProperty
  /population/sample/meas/acc/text()  == semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()       == semType(P2).hasContext.appliesTo

• Each sample is an instance of the semantic type.
Registration Example (simple XPaths), continued
• /population/sample/meas/cnt == semType(P2).itemMeasured: each sample’s cnt represents the itemMeasured object.
• /population/sample/meas/cnt/text() == semType(P2).itemMeasured.hasCount: each sample’s cnt’s value represents the hasCount value of the corresponding itemMeasured object.
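The P2 registration mapping can be read as a small table of (XPath, semantic path) pairs that is evaluated against a document. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. Several things are assumptions made for illustration: the helper names `select` and `semantic_view`, the second sample in `DOC` (invented data), and the count written as 44000 without the comma so that it is a valid integer. ElementTree supports only a subset of XPath, so the leading root step and the trailing `text()` step are handled by hand.

```python
import xml.etree.ElementTree as ET

# First sample follows the slide; the second one is invented for illustration.
DOC = """<population>
  <sample><meas><cnt>44000</cnt><acc>0.95</acc></meas><lsp>Eggs</lsp></sample>
  <sample><meas><cnt>3513</cnt><acc>0.90</acc></meas><lsp>Instar I</lsp></sample>
</population>"""

REGISTRATION_P2 = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]


def select(root, xpath):
    """Evaluate a simple absolute child-step XPath, optionally ending in text()."""
    steps = xpath.strip("/").split("/")
    want_text = steps[-1] == "text()"
    if want_text:
        steps = steps[:-1]
    if steps[0] != root.tag:
        return []
    if len(steps) == 1:
        nodes = [root]
    else:
        nodes = root.findall("./" + "/".join(steps[1:]))
    return [n.text for n in nodes] if want_text else nodes


def semantic_view(root, registration):
    """Map each registered semantic path to the nodes or values it selects."""
    return {sem: select(root, xpath) for xpath, sem in registration}


view = semantic_view(ET.fromstring(DOC), REGISTRATION_P2)

if __name__ == "__main__":
    print(view["semType(P2).itemMeasured.hasCount"])
    print(view["semType(P2).hasContext.appliesTo"])
```

Each `sample` element becomes one instance of the semantic type, and the registered sub-paths select the parts of that instance that play the itemMeasured, hasCount, and appliesTo roles.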
Registration Example (simple XPaths)
structType(P3):
  root cohortTable = (measurement)*
  elem measurement = (phase, obs)
  elem phase       = xsd:string
  elem obs         = xsd:integer

  <cohortTable>
    <measurement>
      <phase>Eggs</phase>
      <obs>44,000</obs>
    </measurement>
    …
  </cohortTable>

Registration mapping for P3 (similarly to P2):
  /cohortTable/measurement               == semType(P3)
  /cohortTable/measurement/obs           == semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()    == semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()  == semType(P3).hasContext.appliesTo

The Ontology-Driven Framework
Diagram: as before (Ontologies (OWL), Semantic Type Ps compatible (⊑) with Semantic Type Pt, input and output registration mappings, desired connection from Ps to Pt), now with a Correspondence between Structural Type Ps and Structural Type Pt.

Correspondence Example
Source-side semantic registration mapping:
  /population/sample                  == semType(P2)
  /population/sample/meas/cnt         == semType(P2).itemMeasured
  /population/sample/meas/cnt/text()  == semType(P2).itemMeasured.hasCount
  /population/sample/meas/acc         == semType(P2).hasProperty
  /population/sample/meas/acc/text()  == semType(P2).hasProperty.hasValue
  /population/sample/lsp/text()       == semType(P2).hasContext.appliesTo

Target-side semantic registration mapping:
  /cohortTable/measurement               == semType(P3)
  /cohortTable/measurement/obs           == semType(P3).itemMeasured
  /cohortTable/measurement/obs/text()    == semType(P3).itemMeasured.hasCount
  /cohortTable/measurement/phase/text()  == semType(P3).hasContext.appliesTo

Source structure: population; sample *; meas; cnt (xsd:integer); acc (xsd:double); lsp (xsd:string).
Target structure: cohortTable; measurement *; obs (xsd:integer); phase (xsd:string).
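One simple way to line up the source-side and target-side registration mappings is to pair XPaths that are registered to the same property path below their semantic type (for example `itemMeasured.hasCount`). The sketch below does exactly that. It is an illustration under assumptions, not the deck's algorithm: `sem_suffix` and `compose` are hypothetical names, and matching on identical property paths is only justified here because semType(P2) ⊑ semType(P3); in general the ontology would have to be consulted.

```python
SOURCE_REG = [
    ("/population/sample", "semType(P2)"),
    ("/population/sample/meas/cnt", "semType(P2).itemMeasured"),
    ("/population/sample/meas/cnt/text()", "semType(P2).itemMeasured.hasCount"),
    ("/population/sample/meas/acc", "semType(P2).hasProperty"),
    ("/population/sample/meas/acc/text()", "semType(P2).hasProperty.hasValue"),
    ("/population/sample/lsp/text()", "semType(P2).hasContext.appliesTo"),
]

TARGET_REG = [
    ("/cohortTable/measurement", "semType(P3)"),
    ("/cohortTable/measurement/obs", "semType(P3).itemMeasured"),
    ("/cohortTable/measurement/obs/text()", "semType(P3).itemMeasured.hasCount"),
    ("/cohortTable/measurement/phase/text()", "semType(P3).hasContext.appliesTo"),
]


def sem_suffix(sem_path: str) -> str:
    """Drop the leading semType(...) step: 'semType(P2).a.b' -> 'a.b'."""
    _, _, rest = sem_path.partition(".")
    return rest


def compose(source_reg, target_reg):
    """Pair source and target XPaths registered to the same property path."""
    by_suffix = {sem_suffix(sem): xpath for xpath, sem in source_reg}
    return [
        (by_suffix[sem_suffix(sem)], xpath)
        for xpath, sem in target_reg
        if sem_suffix(sem) in by_suffix
    ]


correspondences = compose(SOURCE_REG, TARGET_REG)

if __name__ == "__main__":
    for src, tgt in correspondences:
        print(src, "<->", tgt)
```

The source paths for `acc` have no target counterpart, so they drop out of the result; the target simply does not ask for an accuracy qualifier.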
Correspondence Example (continued)
• We want to “compose” the source and target registration mappings to obtain structural correspondences between the population and cohortTable structures.
Correspondence Example: corresponding fragments
• These fragments correspond, because they are registered to the same part of the semantic type:
  – /population/sample and /cohortTable/measurement
  – /population/sample/meas/cnt and /cohortTable/measurement/obs
  – /population/sample/meas/cnt/text() and /cohortTable/measurement/obs/text()
  – /population/sample/lsp/text() and /cohortTable/measurement/phase/text()
The Ontology-Driven Framework
Diagram: Ontologies (OWL); Semantic Type Ps compatible (⊑) with Semantic Type Pt; Registration Mapping (Output) and Registration Mapping (Input); Correspondence between Structural Type Ps and Structural Type Pt, from which the Transformation of Ps is generated; desired connection from the Source Service port Ps to the Target Service port Pt.

Example Result (XQuery)
Based on the structural correspondences and certain assumptions, we derive the transformation XQuery:

  <cohortTable> {
    for $s in /population/sample
    return
      <measurement>
        { for $c in $s/meas/cnt return <obs>{$c/text()}</obs> }
        { for $l in $s/lsp return <phase>{$l/text()}</phase> }
      </measurement>
  } </cohortTable>

Assumptions Made (or why this may not work for you…)
• Common XPath prefixes refer to the same element
• Elements in correspondences have compatible cardinalities
  – source is equivalent to or stricter than target (e.g., + is stricter than *)
• Primitive data types are compatible
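For readers without an XQuery processor at hand, the derived transformation can be mirrored in Python with the standard `xml.etree.ElementTree` module. This is a sketch that follows the XQuery above step by step; the function name `population_to_cohort_table` and the one-sample `SOURCE` document (with the count written as 44000) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def population_to_cohort_table(population):
    """Mirror the XQuery: one <measurement> per <sample>, cnt -> obs, lsp -> phase."""
    cohort = ET.Element("cohortTable")
    for sample in population.findall("./sample"):      # for $s in /population/sample
        measurement = ET.SubElement(cohort, "measurement")
        for cnt in sample.findall("./meas/cnt"):       # for $c in $s/meas/cnt
            ET.SubElement(measurement, "obs").text = cnt.text
        for lsp in sample.findall("./lsp"):            # for $l in $s/lsp
            ET.SubElement(measurement, "phase").text = lsp.text
    return cohort


SOURCE = (
    "<population><sample><meas><cnt>44000</cnt><acc>0.95</acc></meas>"
    "<lsp>Eggs</lsp></sample></population>"
)

result = population_to_cohort_table(ET.fromstring(SOURCE))

if __name__ == "__main__":
    print(ET.tostring(result, encoding="unicode"))
```

Like the XQuery, this sketch emits `obs` before `phase`, while structType(P3) declares the sequence (phase, obs); a validating target service might therefore require the two inner loops to be swapped. The `acc` element is dropped because it has no correspondence on the target side.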