Semantic Mediation, Ontologies and Scientific Workflows and all the rest (+/– Web Services)
Bertram Ludäscher
Knowledge-Based Information Systems Lab, San Diego Supercomputer Center, University of California San Diego
http://seek.ecoinformatics.org
http://www.geongrid.org

Outline
• Motivation (SEEK, GEON, …)
• Ontologies 101
• Semantic Mediation, Data Registration, …
• Application Examples (Stargazing with Kepler…)
SDSC/LTER Workshop Feb’2004

Kepler Team, Projects, Sponsors
• Ilkay Altintas (SDM)
• Chad Berkley (SEEK)
• Shawn Bowers (SEEK)
• Jeffrey Grethe (BIRN)
• Christopher H. Brooks (Ptolemy II)
• Zhengang Cheng (SDM)
• Efrat Jaeger (GEON)
• Matt Jones (SEEK)
• Edward A. Lee (Ptolemy II)
• Kai Lin (GEON)
• Ashraf Memon (GEON)
• Bertram Ludaescher (BIRN, GEON, SDM, SEEK)
• Steve Mock (NMI)
• Steve Neuendorffer (Ptolemy II)
• Mladen Vouk (SDM)
• Yang Zhao (Ptolemy II)
• …

Ptolemy II

SEEK: Science Environment for Ecological Knowledge
• EcoGrid
– Uniform interfaces to manage environmental data
• Kepler
– Modeling scientific workflows
• Semantic Mediation System
– “Smart” data discovery and integration
• Knowledge Representation (SEEK-KR)
• Classification and Nomenclature (SEEK-TAXON)
• Biodiversity and Ecological Analysis and Modeling (SEEK-BEAM)

SEEK Overview

Building the EcoGrid
(Figure: EcoGrid nodes (Metacat, VegBank, Xanthoria, SRB, DiGIR, legacy systems) and example LTER sites (NTL, AND, HBR, VCR, LUQ).)
Networks:
• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)

Heterogeneous Data Integration
• Requires advanced metadata and processing
– Attributes must be semantically typed
– Collection protocols must be known
– Units and measurement scale must be known
– Measurement relationships must be known, e.g., that ArealDensity = Count/Area

Semantic Mediation
• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
(Figure: data and workflow components, both linked to an ontology.)
• Use reasoning engines to generate transformation steps
– Beware of analytical constraints
• Use reasoning engines to discover relevant components

Ecological Ontologies
• What was measured (e.g., biomass)
• Type of measurement (e.g., energy)
• Context of measurement (e.g., Psychotria limonensis)
• How it was measured (e.g., dry weight)
• SEEK intends to enable community-created ecological ontologies using OWL
– The ontologies represent a controlled vocabulary for ecological metadata
• More about this in Bertram’s talk

Ontologies 101 (based on a tutorial by Shawn Bowers and CSE291)
• Ontologies basics
• Ontologies and data management
• Benefits of ontologies
• Constructing ontologies
• Breakout Exercises

What are ontologies?
• It depends on who you ask; we focus on the data-management view
• Generally speaking, an ontology specifies a theory (a model) by defining and relating generic concepts that represent features of the real or abstract world (a domain of interest) [Bunge]

Concepts, Symbols, and Things [Carole Goble, Nigel Shadbolt]
• Humans use symbols (e.g., words) to communicate
• Words are mapped to things indirectly, through concepts that denote (refer to) things
• Symbols and concepts are not precise
– The same symbol can stand for multiple things
– The same thing can have multiple symbols
– Concepts are usually not well-defined
• An ontology attempts to define and relate specific concepts for certain sets of things via agreed-upon symbols
(Figure: concept, symbol “Jaguar”, and thing; after Ogden, C. K. & Richards, I. A. 1923. "The Meaning of Meaning." 8th Ed. New York, Harcourt, Brace & World, Inc.)

What are ontologies?
Ontologies are typically created to:
• Commit to a definition (a model) of a domain
• Explicitly state assumptions concerning the definition
• Have a wide scope (be general)
• Support exchange and integration of heterogeneous data sources and applications (more on this later…)

Ontologies may be expressed:
• Informally, using natural language (e.g., in philosophy and sometimes biology)
• Formally, using a mathematical language, e.g., first-order logic
We focus on formal ontologies, to be precise about what the theory proposes.

Formal ontologies can vary in detail (listed in order of increasing expressiveness):
• Controlled vocabulary (list of terms)
• Simple thesaurus (synonyms)
• Thesaurus (broader/narrower terms)
• Classification (class, instance, is-a, maybe part-of)
• Classification (value, cardinality constraints)
• Classification (axioms such as disjoint, union, etc.)
• Classification (general logic constraints)

Class, Instance, and Is-a
• Jaguar is-a Animal: “Every Jaguar is an Animal”, i.e., ∀x. Jaguar(x) → Animal(x)
• The class Animal denotes a set of things (instances); the class Jaguar denotes a subset of that set

Properties and Cardinality Constraints
(Figure: Jaguar is-a Carnivore is-a Animal; Carnivore eats Animal.)
• A cardinality constraint might state that carnivores must eat at least one Animal
• Question: Must Jaguars eat at least one Animal?

Value Restrictions
• A value restriction for Jaguar might restrict the eats property to the specific animals eaten by Jaguars, e.g., Jaguars restrict the eats relationship to Marsh Deer, …
(Figure: Carnivore and Herbivore are Animals; Jaguar is-a Carnivore; Marsh Deer is-a Herbivore; Carnivore eats Animal; Jaguar eats Marsh Deer.)
• Does anyone see a problem with this choice of representation?
(Figure: an alternative with the classes Animal, JaguarFood, Herbivore, Carnivore, Marsh Deer, Peccary, and Jaguar, where Jaguar eats JaguarFood.)
• These different representations propose the same basic underlying theory
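The is-a and restriction slides above can be made concrete with a small closed-world checker. This is a minimal Python sketch, not an OWL reasoner: the class names follow the Jaguar example, while the restriction encoding ("min" and "only" tuples) and the helper names are hypothetical. It answers the slide's question by inheritance: because Jaguar ⊑ Carnivore, every Jaguar inherits the constraint that carnivores eat at least one Animal.

```python
# Toy ontology from the Jaguar example: class -> direct superclasses (is-a edges).
IS_A = {
    "Animal": set(),
    "Carnivore": {"Animal"},
    "Herbivore": {"Animal"},
    "JaguarFood": {"Animal"},
    "Jaguar": {"Carnivore"},
    "MarshDeer": {"Herbivore", "JaguarFood"},
    "Peccary": {"Herbivore", "JaguarFood"},
}

# Hypothetical restriction encoding attached to classes:
#   ("min", property, n, class) -> at least n fillers of that class (cardinality constraint)
#   ("only", property, class)   -> every filler must be of that class (value restriction)
RESTRICTIONS = {
    "Carnivore": [("min", "eats", 1, "Animal")],
    "Jaguar": [("only", "eats", "JaguarFood")],
}


def ancestors(cls):
    """cls plus all of its superclasses (reflexive-transitive closure of is-a)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def is_a(sub, sup):
    return sup in ancestors(sub)


def inherited_restrictions(cls):
    """Restrictions declared on cls or on any of its superclasses apply to cls."""
    return [r for c in ancestors(cls) for r in RESTRICTIONS.get(c, [])]


def violations(cls, fillers):
    """Closed-world check of one individual of class `cls`.

    `fillers` maps a property name to the classes of the individuals it points to.
    """
    problems = []
    for restriction in inherited_restrictions(cls):
        kind, prop = restriction[0], restriction[1]
        values = fillers.get(prop, [])
        if kind == "min":
            n, target = restriction[2], restriction[3]
            if sum(is_a(v, target) for v in values) < n:
                problems.append(f"{prop}: needs at least {n} {target}")
        elif kind == "only":
            target = restriction[2]
            bad = sorted(v for v in values if not is_a(v, target))
            if bad:
                problems.append(f"{prop}: {bad} not in {target}")
    return problems


print(is_a("Jaguar", "Animal"))                       # True
print(violations("Jaguar", {"eats": ["MarshDeer"]}))  # []
print(violations("Jaguar", {"eats": []}))             # ['eats: needs at least 1 Animal']
```

Under OWL's open-world semantics, a Jaguar with no recorded eats value is not inconsistent; a reasoner would instead infer that some eaten Animal exists. The closed-world reading used here is closer to validating registered data than to ontological reasoning.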
An (informal) ontology of wine [OWL Guide]:
• Wines are potable liquids made by wineries within regions and with specific vintages
• Wines are characterized by the type of grape they are made with, their color (white, rosé, red), their sugar (dry, off-dry, or sweet), their body (light, medium, full), and their flavor (delicate, moderate, strong)
• Sauvignon Blanc, Merlot, Pinot Noir, and Riesling are types of wines

Exercise
With a partner, take 5 minutes and try to define a “formal” ontology for the wine example:
– Select two or three classes
– Identify some relationships between them
– List any constraints (cardinality or value restrictions) that exist between them

What are ontologies? (Philosophy) [Bunge]
An ontological theory can answer “ontological” questions:
– Is Merlot a potable liquid?
– Are there wines made of things other than grapes?
– How are Pinot Gris and Pinot Noir related?
– Are there white wines that are dry, full, and strong made in Napa Valley?
We will look at other uses later.

Ontologies and Data Management
Where do ontologies fit within data management architectures?
There is no specific answer to this question… However, an ontology resembles a schema or conceptual model (if one exists), but it is:
– Developed independently of a particular application
– Probably given in a different language
– Inherently more general
– Usually not a very good schema (weak structure)

Ontologies and Data Management (see also Semantic Data Registration, later)
(Figure: design artifacts, i.e., conceptual models and schemas, together with metadata and data, use concepts from an ontology, explicitly or implicitly.)

Benefits of Ontologies
• Ontologies are often developed within a community and are interdisciplinary
• They explicitly capture “knowledge” about a domain
– Standard terms (symbols) for metadata values and schema design
– Enable advanced searching techniques (via reasoning)
– Enable exchange and integration

Ontologies for metadata keywords
• Datasets annotated with keywords: {sonoma region, wine}, {cabernet sauvignon, sonoma region, …}, {medium, red, dry, …}
• Query: Find information about dry california red wines
• We use the ontology to “expand” and/or “focus” the query, e.g., that cabernet sauvignon is red and dry, and that the sonoma region is in california

• Question: What regional characteristics produce the best-selling wines?
• Datasets (wines by regions; wine sales; region characteristics) → Integrate → Analysis
• Integration can be extremely complex due to structural (schema and values) and semantic (ontological) differences
• Ontologies can help!
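The keyword-expansion idea on the slides above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dataset names, the ontology triples, and the scoring rule (number of query terms matched) are hypothetical, and every triple is read as "the subject implies the object", which is a simplification; a real system would treat locatedIn, hasColor, and is-a differently. Only "expanding" is shown; "focusing" would filter results instead.

```python
# Hypothetical keyword annotations for three datasets (cf. the keyword sets above).
DATASETS = {
    "d1": {"sonoma region", "wine"},
    "d2": {"cabernet sauvignon", "sonoma region"},
    "d3": {"medium", "red", "dry"},
}

# Illustrative ontology facts as (subject, property, object) triples, following
# "cabernet sauvignon is red and dry; the sonoma region is in california".
TRIPLES = {
    ("cabernet sauvignon", "hasColor", "red"),
    ("cabernet sauvignon", "hasSugar", "dry"),
    ("cabernet sauvignon", "isA", "wine"),
    ("sonoma region", "locatedIn", "california"),
}


def expand(term):
    """The term plus every term that implies it, following triples backwards transitively."""
    result, frontier = {term}, [term]
    while frontier:
        target = frontier.pop()
        for subject, _prop, obj in TRIPLES:
            if obj == target and subject not in result:
                result.add(subject)
                frontier.append(subject)
    return result


def search(query_terms, use_ontology=True):
    """Score each dataset by how many query terms it matches (directly or via expansion)."""
    expanded = {q: expand(q) if use_ontology else {q} for q in query_terms}
    return {
        name: sum(1 for q in query_terms if keywords & expanded[q])
        for name, keywords in DATASETS.items()
    }


QUERY = {"dry", "california", "red", "wine"}  # "dry california red wines"
print(search(QUERY, use_ontology=False))  # {'d1': 1, 'd2': 0, 'd3': 2}
print(search(QUERY))                      # {'d1': 2, 'd2': 4, 'd3': 2}
```

Without the ontology, d2 matches nothing, even though a cabernet sauvignon from the sonoma region is exactly a dry california red wine; with expansion it becomes the best match.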
• Ontologies provide a uniform view of disparate sources
• Registering datasets with ontologies:
– Map structure (schema) to concepts
– Map data to classes/instances (various ways to do this…)

Constructing Ontologies
Various Web-based standards are emerging for defining ontologies:
– XML Schema
• Mainly for defining “vocabularies” and less-formal ontologies (term-based is-a, some constraints)
• Mainly a structural/schema representation
– Topic Maps
• For advanced thesauri, subject indexes
– RDF/RDFS/OWL
• Formal ontologies based on description logics (decidable fragments of first-order logic) and semantic networks (more informal)

Resource Description Framework (RDF)
A simple data model that consists of:
– Resources (uniquely identified via URIs)
– Properties
– Values (resources or character strings)
Data are organized into triples (subject, property, value), e.g., SonomaRegion (subject, a resource), locatedIn (property, a resource), CaliforniaRegion (value, a resource), written locatedIn(SonomaRegion, CaliforniaRegion).

RDF Schema
• Adds a set of pre-defined properties to define classes and properties
• Allows instances to be connected to classes
• Sub-class and sub-property (is-a) relationships
Example: Region is a class; locatedIn is a property that connects Regions; SonomaRegion and CaliforniaRegion each have rdf:type Region, and SonomaRegion locatedIn CaliforniaRegion.

OWL
• Adds further pre-defined properties to constrain an ontology (see http://www.w3.org/TR/owl-guide/)
• Note: RDF(S) and OWL are usually serialized in XML
• Some graphical tools exist (e.g., Protégé)
Example: Vintage is a class that is a subclass of an unnamed class whose instances always have exactly one hasVintageYear property:

    <owl:Class rdf:ID="Vintage">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasVintageYear"/>
          <owl:cardinality>1</owl:cardinality>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>

Note the uglified XML syntax… The good news: it is meant for parsers, not humans!

Protégé

Description Logic
A language and syntax for describing “concept” logics:
– Concept names C (denote sets of instances)
– Class definitions D (denote sets of instances)
– Subclass definition C ⊑ D
– Equivalence definition C ≡ D
– Definition constructors
• Intersection D ⊓ D
• Union D ⊔ D
• Property existence ∃hasProp.D
• Property restriction ∀hasProp.D
• Cardinality (=1 hasProp.D), (≥1 hasProp.D), (≤2 hasProp.D)

Description Logic examples
• Wine ⊑ PotableLiquid ⊓ ∃hasColor.{Red, Rose, White}
The class Wine is a subclass of the PotableLiquids that have at least one hasColor value drawn from {Red, Rose, White}.
• WhiteWine ≡ Wine ⊓ ∃hasColor.{White}
WhiteWines are exactly the Wines whose color is White.
• WhiteBurgundy ⊑ WhiteWine ⊓ Burgundy
The set of WhiteBurgundy wines is a subset of the set of WhiteWines intersected with the Burgundy wines.
• SauvignonBlanc ⊑ WhiteWine ⊓ (=1 madeFromGrape.SauvignonBlancGrape)
Every SauvignonBlanc is a WhiteWine with exactly one madeFromGrape value that is a SauvignonBlancGrape.

Constructing Ontologies
In general, creating an ontology is hard:
– Requires general agreement and understanding of a domain
– Requires a clear, concise, and unambiguous definition
– May provoke controversy
– Is a hard data-modeling problem (complex constraints, broad domain)

Breakout Exercises
Divide into the same groups as yesterday. Develop an ontology for the domain you worked on:
• Define relevant concepts
• Define relationships among concepts
• If you have time, work on simple constraints (cardinality, value restrictions)
Capture your ontology (on paper, or in PPT if you feel ambitious) in whatever way makes sense to you (e.g., as circle-line drawings or as a list of terms and properties). What assumptions did you make in creating your ontology?
If you have time, develop a scenario for your ontology in terms of your workflow, for example, to show how your ontology could help integration or querying.

Some References
• Mario Bunge. Treatise on Basic Philosophy, Vol. 3, Ontology I: The Furniture of the World. D. Reidel Publishing Company, 1977.
• Nicola Guarino. Formal ontology and information systems. In Proc. of Formal Ontology in Information Systems, IOS Press, pp. 3-15, 1998.
• Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Formal Ontology in Conceptual Analysis and Knowledge Representation, Kluwer Academic Publishers, 1993.
• Jeffrey Parsons and Yair Wand. Emancipating instances from the tyranny of classes in information modeling. ACM Transactions on Database Systems, 25(2):228-268, 2000.
• Michael Smith, Chris Welty, and Deborah McGuinness. OWL Web Ontology Language Guide. W3C Proposed Recommendation. http://www.w3.org/TR/owl-guide/. Includes the Wine Ontology.
• Protégé. Stanford Medical Informatics. http://protege.stanford.edu/index.html. Freely available, with many plug-ins.

Data Registration
What is Data Registration?
• A mechanism by which data sources are published in a repository or registry for the purpose of
– data discovery, querying, retrieval (“get”, “copy”), update, transformation, migration, application binding, query planning, concept-based rewriting, …

Things to Register (with metadata!), aka Registration Objects
• Data files (individual files)
– e.g., a shapefile as a blob (+ file type)
• Collections (of files or subcollections; nested; e.g., satellite data)
• Databases (have a schema and can be queried)
– e.g., a shapefile as a DB with a registered schema
– registered schemas (relational, XML, …)
– local integrity constraints
– access information (connection mechanism, protocols, query capabilities, handles to actual data)
– registration constraints to (identifiable/registered) ontologies (aka “registration mappings”)
• Ontologies
• Services (web + grid services)
• Other/external applications

Connecting Datasets to Ontologies
Ontology (snippet):
DataCollectionEvent ⊑ ∃contains.Measurement
Measurement ⊑ ∃measureOf.MeasurableItem ⊓ ∃hasContext.MeasurementContext
MeasurementContext ⊑ ∃hasTime.DateTime ⊓ ∃hasLocation.Location
MeasurableItem ⊑ ∃hasUnit.Unit ⊓ ∃hasValue.UnitValue
SpeciesCount ⊑ MeasurableItem ⊓ ∃hasSpecies.Species ⊓ ∃hasUnit.RatioUnit …
SpeciesAbundance ⊑ Measurement ⊓ ∃measureOf.SpeciesCount
AbundanceCollectionEvent ⊑ DataCollectionEvent ⊓ ∃contains.SpeciesAbundance
Location ⊑ ∃position.Coordinate
LTERSite ⊑ Location
SBLTERSite ⊑ LTERSite ⊓ ∃position.SBLTERCoordinate
{naples, …} ⊑ SBLTERSite

Dataset:
Date        Site  Transect  SP_Code  Count
2000-09-08  CARP  1         CRGI     0
2000-09-08  CARP  4         LOCH     0
2000-09-08  CARP  7         MUCA     1
2000-09-22  NAPL  7         LOCH     1
2000-09-18  NAPL  1         PAPA     5
2000-09-28  BULL  1         CYOS     57

How can we “register” the dataset to concepts in the ontology?

Purpose of Semantic Registration
Expose “hidden” information:
– What do attributes represent?
– What do specific values represent?
– What conceptual “objects” are in the dataset?
Capture connections between the dataset and ontology to:
– Find existing datasets (or parts of datasets) via ontological concepts (discovery)
– Enable fine-grained integration of datasets (mediation)
– Generate metadata for new data products (in a pipeline)

Semantic Registration Framework
Step 1: The data provider selects relevant ontological concepts for the dataset.
Step 2: The semantic registration system creates a structural representation based on the chosen concepts (the data provider refines it if needed).
Step 3: The data provider maps the dataset information to the generated structural representation.

Step 1: Selecting Relevant Concepts
Concepts from an ontology, selected alongside the dataset (Date, Site, Transect, SP_Code, Count):
• DataCollectionEvent, AbundanceCollectionEvent
• Location, LTERSite, SBLTERSite, naples
• Measurement, Abundance, SpeciesAbundance, MeasurementContext, …
• MeasurableItem, SpeciesCount, Species, …

Step 2: Generate Object Model
(Figure: AbundanceCollectionEvent contains SpeciesAbundance; SpeciesAbundance measureOf SpeciesCount; further properties in the generated model: hasTime (DateTime), hasLoc (SBLTERSite), hasValue (RatioValue), hasSpecies (Species), hasUnit (RatioUnit).)

A System for Semantic Integration of Geologic Maps via Ontologies
Kai Lin, Bertram Ludäscher

Geologic Map Integration
• Given:
– Geologic maps from different state geological surveys (shapefiles with different data schemas)
– Different ontologies:
• Geologic age ontology
• Rock classification ontologies:
– Multiple hierarchies (chemical, fabric, texture, genesis) from the Geological Survey of Canada (GSC)
– Single hierarchy from the British Geological Survey (BGS)
• Problem:
– Support uniform queries using different ontologies
– Support registration with ontology A, querying with ontology B

Geologic Map Integration
(Figure: domain knowledge, with labels such as Nevada, “+/- a few hundred million years”, Igneous, and “Metamorphism Equation: +/- Energy”; Geoscientists + Computer Scientists = Geoinformaticists (GEON).)

A Multi-Hierarchical Rock Classification Ontology (GSC)
Hierarchies: Genesis, Fabric, Composition, Texture

Implementation in OWL: Not only “for the machine” …

System Overview
(Figure: datasets registered to ontologies A, B, and C; an ontology-enabled map integrator working over {A, B}; applications using ontologies B and C.)

Ontology Repository
• Accepts user-defined ontologies in OWL
• Any ontology saved in the system can be imported into a user-defined ontology (→ inter-ontology references)
• Provides a tool to browse the ontologies in the repository
Example (excerpt of composition.owl):

    …
    <owl:Ontology>
      <owl:imports rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl"/>
    </owl:Ontology>
    …
    <owl:Class rdf:ID="Ultramafite">
      <rdfs:subClassOf rdf:resource="#Ultramafic"/>
      <rdfs:subClassOf rdf:resource="http://compute5.sdsc.geongrid.org:8080/workbench/jsp/ontologies/genesis.owl#Igneous"/>
    </owl:Class>
    …

Ontology Mapping: Motivation
• Align ontologies
• Integrate datasets that are registered to different ontologies
• Query datasets through different ontologies
• Ontology parameterization
(Figure: dataset 1 is registered to ontology 1 and dataset 2 to ontology 2; ontology mappings connect the two ontologies, and queries are posed through them.)

Ontology Mapping: Definition
An ontology mapping from Oa to Ob consists of:
• a class mapping f: a partial mapping from the class set of Oa to the class set of Ob that preserves the subclass relation (where f is defined, A1 ⊑ A2 in Oa implies f(A1) ⊑ f(A2) in Ob)
• a property mapping g: a partial mapping from the property set of Oa to the property set of Ob such that if p is a property between A1 and A2 in Oa, then g(p) is a property between f(A1) and f(A2) in Ob

Ontology Mapping: Combining Ontologies
The result O of combining ontologies Oa and Ob is the pushout of the ontology mappings f: Oc → Oa and g: Oc → Ob, where Oc captures what Oa and Ob have in common.
(Figure: example with classes A, A1, A2, B1, B2 and properties p, q.)

Ontology Switching
Given an ontology mapping f from Oa to Ob, Oa can be used to query any datasets that are registered to Ob.
(Figure: datasets 1 and 2 are registered to ontology Ob; queries are posed through ontology Oa via the mapping.)

Geology Workbench: Initial State
An ontology-based mediator: click on Ontologies, Datasets, or Applications.

Geology Workbench: Uploading Ontologies
• Click on Ontology Submission
• Choose an OWL file to upload
• Click on an ontology to check its details
• Name space: can be used to import this ontology into others

Geology Workbench: Data (to Ontology!) Registration
Step 1: Choose Classes. Click on Submission, enter a dataset name, select a shapefile, and choose an ontology class.
Step 2: Choose Columns for Selected Classes. Select the column that contains information about geologic age; the shapefile columns are AREA, PERIMETER, AZ_1000, AZ_1000_ID, GEO, PERIOD, ABBREV, DESCR, D_SYMBOL, P_SYMBOL.
Step 3: Resolve Mismatches. Two terms do not match any ontology terms; the term algonkian is mapped into the ontology manually.

Geology Workbench: Ontology-enabled Map Integrator
Click on the name and choose classes of interest; the result shows, e.g., all areas with the age Paleozoic.

Geology Workbench: Change Ontology
Submit an ontology mapping between the British and the Canadian Rock Classification, switch from the Canadian to the British Rock Classification, and run the query through the new query interface.

Back to Scientific Workflows, Kepler (and yes, web services…)

Web Services & Scientific Workflows in Kepler
• Web services = individual components (“actors”)
• “Minute-Made” Application Integration:
– Plugging in and harvesting web service components is easy and fast
• Rich scientific workflow (SWF) modeling semantics (“directors” and more):
– Different and precise dataflow models of computation
– Clear and composable component interaction semantics
• → Web service composition and application integration tool
• Coming soon:
– Shrink-wrapped, pre-packaged “Kepler-to-Go” (v0.8)
– SWFs with structural and semantic data types (better design support)
– Grid-enabled web services (for big data, big computations, …)
– Different deployment models (SWF → WS, web site, applet, …)

Genomics Example: Promoter Identification Workflow
Source: Matt Coleman (LLNL)

Ecology: GARP Analysis Pipeline for Invasive Species Prediction
(Figure: EcoGrid queries against registered EcoGrid databases return species presence & absence points (a) and environmental layers (b) for the native range and for the invasion area; layer integration produces integrated layers (c); sample data are split into training and test samples (d); a data calculation step produces a GARP rule set (e); map generation produces native-range and invasion-area prediction maps (f); validation yields model quality parameters (g); the user selects prediction maps (h), which are archived to the EcoGrid with generated metadata.)
Source: NSF SEEK (Deana Pennington et al., UNM)

(Figure.) Source: NIH BIRN (Jeffrey Grethe, UCSD)

KEPLER Core Capabilities (1/2)
• Capturing scientific workflows
– Accessing available workflows through the Grid
• Designing scientific workflows
– Composition of actors (tasks) to perform a scientific WF
• Actor prototyping
• Accessing heterogeneous data
– Data access wizard to search and retrieve Grid-based resources
– Relational DB access and query
– Ability to link to EML data sources

KEPLER Core Capabilities (2/2)
• Data transformation actors to link heterogeneous data
• Executing scientific workflows
– Distributed and/or local computation
– Various models for computational semantics and scheduling
– SDF (Synchronous Dataflow) and PN (Process Networks): most common for scientific workflows
• External computing environments:
– C++, Python, C (… Perl planned …)
• Deploying scientific tasks and workflows as web services (… planned …)

The KEPLER GUI (Vergil)
Drag-and-drop utilities, director and actor libraries.
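Kepler builds on Ptolemy II, which is written in Java, and its directors implement precise models of computation such as SDF and PN. The Python sketch below is not Kepler's API; it is a toy that shows the actor/director separation in the simplest case, a single-rate SDF-style schedule in which every actor consumes and produces one token per port per firing, so any topological order of the workflow graph is a valid static schedule. The class names, actor names, and functions are hypothetical.

```python
import itertools
from graphlib import TopologicalSorter  # Python 3.9+


class Actor:
    """Toy actor: a named function from input tokens to one output token."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)  # names of upstream actors, in argument order


class SingleRateDirector:
    """Toy director: computes a static schedule once, then fires each actor once per iteration."""

    def __init__(self, actors):
        self.actors = {a.name: a for a in actors}
        # TopologicalSorter expects node -> predecessors; any topological order is a valid
        # schedule when every actor consumes and produces exactly one token per firing.
        graph = {a.name: set(a.inputs) for a in actors}
        self.schedule = list(TopologicalSorter(graph).static_order())

    def run(self, iterations):
        for _ in range(iterations):
            tokens = {}  # the token each actor produced in this iteration
            for name in self.schedule:
                actor = self.actors[name]
                tokens[name] = actor.fn(*(tokens[i] for i in actor.inputs))


counter = itertools.count(1)
results = []
director = SingleRateDirector([
    Actor("source", lambda: next(counter)),
    Actor("square", lambda x: x * x, inputs=["source"]),
    Actor("offset", lambda x: x + 10, inputs=["source"]),
    Actor("sink", lambda a, b: results.append((a, b)), inputs=["square", "offset"]),
])
director.run(3)
print(director.schedule[0], director.schedule[-1])  # source sink
print(results)  # [(1, 11), (4, 12), (9, 13)]
```

A real SDF director also handles multirate actors by solving balance equations for a repetition vector, and a PN director runs actors as concurrent processes with blocking reads. Keeping scheduling in the director, separate from the actors, is what lets a workflow change its model of computation without rewriting its actors.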
Running the Workflow

Distributed SWFs in KEPLER
• Web and Grid Service plug-ins
– WSDL, GWSDL
– ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
• WS Harvester
– Imports all the operations of a specific WS (or of all the WSs in a UDDI repository) as Kepler actors
• WS-deployment interface (…ongoing work…)
• XSLT and XQuery transformers to link non-fitting services together

A Generic Web Service Actor
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that operation.
• Configure: select the service operation
• Set parameters and commit
• WS actor after instantiation

Web Service Harvester
• Imports the web services in a repository into the actor library
• Can search for web services by keyword

Composing 3rd-Party WSs
(Figure: the output of the previous web service passes through user interaction and transformations before becoming the input of the next web service.)

GEON Kepler Examples
• GEON Classifier (Efrat): a workflow for classifying igneous rocks
• Geologic Map Information Integration: a workflow for map rendering using web services (created by Ilkay and Ashraf)
• Database Access (Efrat): generic actors for connecting to and querying a database

Problem Description
• Classification of igneous rocks
• Datasets:
– Virginia rock database (provides mineral composition)
– Igneous rock diagrams and a transition table for traversing between diagrams
• Method:
– Iterations of finer descriptive levels using a point-in-polygon algorithm
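The slides name a point-in-polygon algorithm but do not show its code. A common choice is the even-odd ray-casting test, sketched below in Python together with the level-by-level refinement described under Method. The diagram names, coordinates, transition table, and the `to_point` helper are hypothetical stand-ins for the igneous rock diagrams (which the workflow obtains from SVG) and for the mineral-composition calculation. Points lying exactly on a shared edge may fall on either side.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: toggle `inside` at each edge crossed by a ray from (x, y) toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(point, diagram):
    """Return the name of the first region (polygon) of `diagram` containing `point`, else None."""
    for region, polygon in diagram.items():
        if point_in_polygon(point[0], point[1], polygon):
            return region
    return None


def classify_iteratively(composition, start, diagrams, transitions, to_point):
    """Refine level by level: classify in one diagram, then follow the transition table."""
    path, current = [], start
    while current is not None:
        region = classify(to_point(composition, current), diagrams[current])
        if region is None:
            break
        path.append(region)
        current = transitions.get((current, region))
    return path


# Hypothetical two-level example (region names and coordinates are made up).
DIAGRAMS = {
    "level1": {
        "field_lower": [(0, 0), (1, 0), (0, 1)],
        "field_upper": [(1, 0), (1, 1), (0, 1)],
    },
    "level2": {
        "fine_a": [(0, 0), (0.5, 0), (0.5, 1), (0, 1)],
        "fine_b": [(0.5, 0), (1, 0), (1, 1), (0.5, 1)],
    },
}
TRANSITIONS = {("level1", "field_lower"): "level2"}
sample = {"level1": (0.2, 0.2), "level2": (0.7, 0.4)}  # precomputed point per level
print(classify_iteratively(sample, "level1", DIAGRAMS, TRANSITIONS,
                           lambda comp, level: comp[level]))  # ['field_lower', 'fine_b']
```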
British Classification of Igneous Rocks

Mineral Classification of Igneous Rocks
• Inputs:
– A row id from the Virginia rock database (contains mineral composition)
– A dataset of diagrams for classification
• Outputs:
– The rock name
– A browser display of each classification level (a new feature added in Kepler)
• Execution:
– Divided into levels; each provides a finer level of granularity
– At each level, a point is classified within a diagram using a PointInPolygon algorithm

Classifying with Kepler
(Figure: extract the mineral composition for the row id; igneous rock diagram information; output: the rock name.)
Detail (figure):
– Finer granularity at each level
– The point is extracted from the mineral composition and this level’s diagram coordinates
– Diagram information and transitions between diagrams
– Classifier: locates the point’s region
– SVG to polygons
– Displays the point in the diagram for this level

Geologic Map Integration
• Ontology-enabled Map Integration (OMI)
– Integration of heterogeneous geological datasets
• Datasets
– State geology map datasets (Rocky Mountain area)
– State boundaries and coastlines
• Rock type ontologies

SWF Designed in Kepler

DataMapper Sub-Workflow

Result launched via the BrowserUI actor

Providing DB Access through Kepler
• Database connection actor:
– Opens a database connection and passes it to all actors accessing this database
• Database query actor:
– A generic actor that queries a database and provides its result
• DBConnection type and DBConnectionToken:
– A new IOPort type and a token to distinguish a database connection from any general type
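The split between a connection actor and a generic query actor, with a dedicated token type for connections, can be sketched in Python with the standard sqlite3 module. The names `DBConnectionToken`, `open_db_connection`, and `database_query` are hypothetical analogues of the Kepler actors, not their API; the output options mirror the Database Query actor's parameters (Record, String, or XML; one token per row or one for the whole result). The sample rows are the species-count dataset from the registration example, stored in a made-up table named `abundance`.

```python
import sqlite3
from dataclasses import dataclass
from xml.sax.saxutils import escape

OUTPUT_TYPES = ("record", "string", "xml")


@dataclass(frozen=True)
class DBConnectionToken:
    """Hypothetical analogue of a DBConnectionToken: marks a value as a live DB connection."""
    connection: sqlite3.Connection


def open_db_connection(database):
    """Toy connection actor: connection information in, connection token out."""
    return DBConnectionToken(sqlite3.connect(database))


def _format_row(columns, row, output):
    """Render one result row as a Record (dict), String, or XML token."""
    if output == "record":
        return dict(zip(columns, row))
    if output == "string":
        return ", ".join(str(v) for v in row)
    # XML: assumes column names are valid XML element names.
    cells = "".join(f"<{c}>{escape(str(v))}</{c}>" for c, v in zip(columns, row))
    return f"<row>{cells}</row>"


def database_query(token, sql, output="record", each_row=True):
    """Toy generic query actor: returns the list of tokens it would emit."""
    if not isinstance(token, DBConnectionToken):
        raise TypeError("connection port expects a DBConnectionToken")
    if output not in OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output!r}")
    cursor = token.connection.execute(sql)
    columns = [d[0] for d in cursor.description]
    rows = [_format_row(columns, row, output) for row in cursor.fetchall()]
    if each_row:
        return rows                 # one token per result row
    if output == "record":
        return [rows]               # one token holding all records
    if output == "string":
        return ["\n".join(rows)]
    return ["<result>" + "".join(rows) + "</result>"]


token = open_db_connection(":memory:")
token.connection.execute(
    "CREATE TABLE abundance (Date TEXT, Site TEXT, Transect INTEGER, SP_Code TEXT, Count INTEGER)")
token.connection.executemany("INSERT INTO abundance VALUES (?, ?, ?, ?, ?)", [
    ("2000-09-08", "CARP", 1, "CRGI", 0), ("2000-09-08", "CARP", 4, "LOCH", 0),
    ("2000-09-08", "CARP", 7, "MUCA", 1), ("2000-09-22", "NAPL", 7, "LOCH", 1),
    ("2000-09-18", "NAPL", 1, "PAPA", 5), ("2000-09-28", "BULL", 1, "CYOS", 57),
])
QUERY = "SELECT Site, SP_Code, Count FROM abundance WHERE Count > 0 ORDER BY Count DESC, SP_Code"
print(database_query(token, QUERY, output="xml")[0])
# <row><Site>BULL</Site><SP_Code>CYOS</SP_Code><Count>57</Count></row>
```

The `isinstance` check plays the role of the DBConnection port type: the query actor refuses anything that is not a connection token, so a connection cannot be confused with an ordinary data value.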
Database Connection Actor
OpenDBConnection actor:
• Input: database connection information
• Output: a DBConnectionToken, a reference to a database connection instance, through a DBConnection output port

Database Query Actor
Database Query actor:
• Input: a query string (SQL) and a database connection reference
• Parameters:
– Output type: XML, Record, or String
– Output each row separately or all at once
• Process: execute the query and produce results according to the parameters

Querying Example

KEPLER and YOU
• Kepler …
– is a community-based, cross-project, open source collaboration
– uses web services as basic building blocks
– has a joint CVS repository, mailing lists, web site, …
– is gaining momentum thanks to contributors and contributions
• A BSD-style license allows commercial spin-offs
– a pre-packaged, shrink-wrapped version (“Kepler-to-Go”) is coming soon to a place near you…