WDO-It! 101 Workshop: Creating an abstraction of a process
UTEP’s Trust Laboratory

Cyber-ShARE Center of Excellence
• Established at UTEP in 2007 with NSF funding
• Focused on
  – Cross-disciplinary research in science, engineering, and technology at UTEP
  – Training workshops on using cyberinfrastructure
  – Education and outreach
  – Resource acquisition and sharing (documentation of processes)
Sharing resources via cyberinfrastructure to advance research and education.

Objectives of workshop
• This workshop covers the basics of creating Semantic Abstract Workflows (SAWs).
  – SAWs are useful for formalizing your understanding of a process and communicating it to others in a graphical format.
• After completing this workshop, you will be able to use the WDO-It! tool to create SAWs of the processes that you use in your everyday work.

Overview
• Motivation
• Purpose of creating a process abstraction
• Guiding example: Description of a process
• Task 1: Identify process vocabulary
• Task 2: Create a process abstraction
• Group exercise
• What is next…

Motivation
• How easy is it for scientists to publish their scientific data and artifacts (e.g., a map) on the web?
  – How easy would it be for programmers to publish their latest code on the web?
• How easy is it for scientists to publish their raw data and tool parameters and to link them to the derived product?
  – How easy would it be for programmers to publish documentation about their code on the web and keep the two linked?

Provenance: An Example (1/2)
[Figure: provenance graph relating (http://utep.edu/myDataSet.xls) and (http://utep.edu/myGravityMap.gif); edge labels: isMemberOf, hasCreator, hasAgent, hasSource, isConsequentOf]
• How accurate is this map?
• What are the sources for this map?
• Can I trust that this is a quality map?

Provenance: An Example (2/2)
• How accurate is this map?
• How accurate is this sensor?
• Can I trust that this is a quality map?
• How reliable is this tool?
[Figure: provenance graph relating (http://utep.edu/myDataSet.xls) and (http://utep.edu/myGravityMap.gif); edge labels: isMemberOf, hasCreator, hasAgent, hasSource, isConsequentOf]

On the Use of Provenance
• How to represent provenance?
  – PML ontology (the UTEP team has 7 years of experience representing provenance)
• How to capture provenance?
  – One needs to understand scientific processes to capture provenance

Side note:
• What is ontology?
  – Branch of philosophy that deals with what entities exist and how those entities relate to each other
  – The computer science version
    • An inventory of concepts about a specific domain of interest, and the relations between those concepts
    • Encoded in a computer language, e.g., OWL
    • E.g., ontology of vehicles:
        <owl:Class rdf:ID="Vehicle"/>
        <owl:Class rdf:ID="SUV">
          <rdfs:subClassOf rdf:resource="#Vehicle"/>
        </owl:Class>

CI-Miner
• A comprehensive approach to facilitate provenance encoding in PML:
  WDO-It! 101 workshop

Purpose of creating a process abstraction
• Purpose
  – Model your understanding of a process
  – Identify appropriate vocabulary for describing a process
  – Identify the parts of a process that are of interest to you
• Benefits
  – Share your understanding of a process with others
  – Guide the development of systems that implement your understanding of a process
  – Enhance existing systems to provide functionality aligned with your understanding of a process

Guiding example: A process (1/2)
• Description:
  – Geo-referenced datasets are usually built from sparse field measurements, where the location
of each point is given by longitude/latitude coordinates
  – To create a map model of a geo-referenced dataset, e.g., a contour map, two steps are usually needed:
    1. Pre-process the dataset to create a grid of uniformly spaced data points
    2. Create the map model from the uniformly distributed dataset

Guiding example: A process (2/2)
1. Sparse geo-referenced dataset
   Longitude      Latitude     OBS
   -074.4244296   40.0049488   4176.40
   -074.9746118   40.0051130   4189.60
   -074.4245976   40.0051168   4176.40
   -074.7647730   40.0059447   4199.07
   -074.7647730   40.0059447   4199.10
   -074.3714268   40.0099501   4173.71
   -074.3714268   40.0101141   4173.69
   -074.2129201   40.0109512   4159.90
   -074.3237562   40.0139483   4172.05
   -074.3237562   40.0139483   4172.10
2. Uniformly distributed dataset (Grid)
   ncols 5
   nrows 5
   cellsize=2
   4604 4599 4619 4618 4596
   4599 4551 4562 4532 4512
   4598 4611 4598 4575 4482
   4602 4586 4593 4572 4449
   4606 4566 4585 4535 4459
3. Map model

Task 1: Identify process vocabulary
• Launch WDO-It!
  – Download from http://trust.utep.edu/wdo/downloads
• Capture process vocabulary in a Workflow-Driven Ontology (WDO)
• WDOs:
  – Are OWL documents
  – Can import vocabulary terms from other existing OWL ontologies

Task 1: Identify process vocabulary
1. Click
2. Enter a namespace (URI-like format)
Recommendation: Use a namespace that matches the URL where you will publish the WDO
Example: http://trust.utep.edu/2009/ContourMapWDO
You can change namespaces later with a text editor that supports the “Replace All” operation.
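The note about changing namespaces with “Replace All” can be illustrated with a short Python sketch. This is not WDO-It! functionality and is only a sketch under stated assumptions: the sample OWL text, the `#` separator between namespace and concept name, the replacement namespace http://example.org/wdo/ContourMapWDO, and the helper names `rename_namespace` and `class_uris` are all hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

OLD_NS = "http://trust.utep.edu/2009/ContourMapWDO"  # example namespace from the slide
NEW_NS = "http://example.org/wdo/ContourMapWDO"      # hypothetical new publishing URL

# Hypothetical, hand-written stand-in for a WDO file (not real WDO-It! output).
SAMPLE_WDO = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://trust.utep.edu/2009/ContourMapWDO">
  <owl:Ontology rdf:about="http://trust.utep.edu/2009/ContourMapWDO"/>
  <owl:Class rdf:about="http://trust.utep.edu/2009/ContourMapWDO#GriddedDataset">
    <rdfs:label>GriddedDataset</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://trust.utep.edu/2009/ContourMapWDO#ContourMap">
    <rdfs:label>ContourMap</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def rename_namespace(owl_text, old_ns, new_ns):
    """Do what a text editor's "Replace All" would do.

    Returns the rewritten text and the number of occurrences replaced.
    """
    count = owl_text.count(old_ns)
    return owl_text.replace(old_ns, new_ns), count


def class_uris(owl_text):
    """Parse the OWL text as XML and list the rdf:about URI of each owl:Class."""
    root = ET.fromstring(owl_text)
    return [el.get(f"{{{RDF}}}about") for el in root.iter(f"{{{OWL}}}Class")]


if __name__ == "__main__":
    new_text, replaced = rename_namespace(SAMPLE_WDO, OLD_NS, NEW_NS)
    print("occurrences replaced:", replaced)
    for uri in class_uris(new_text):
        print(uri)
```

Parsing the rewritten text with `class_uris` is a quick check that the document is still well-formed XML and that every concept URI moved to the new namespace.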
Task 1: Identify process vocabulary
Loaded OWL Documents tree
• WDO namespace on the root
• Imported Ontologies subtree reflects the <owl:imports> statements of the OWL document
Note 1: All WDOs import the wdo.owl ontology
Note 2: The WDO at the root of the tree is the one being edited; imported ontologies are not modified.

Task 1: Identify process vocabulary
• WDOs capture the vocabulary of a process in two main categories:
  – Data
  – Method
• What are the data concepts of our process?
  – ContourMap, GriddedDataset
• What are the method concepts of our process?
  – Gridding, Contouring

Task 1: Identify process vocabulary
Adding Data and Method concepts to the WDO
1. Click
2. Choose type
3. Add a label (<rdfs:label>) and optionally a comment (<rdfs:comment>)
Note: URIs are generated automatically by WDO-It! relative to the namespace assigned to the WDO document, which facilitates renaming.

Task 1: Identify process vocabulary
Start with the more general concepts of your process and build your WDO hierarchies from there.
• To add a child concept: select a concept from the Data or Method tree, then click the Add Concept icon
• You can remove concepts that do not have children by selecting them and clicking the Remove Concept icon
• You can rename concepts and add comments to them by selecting them and clicking the Edit Concept icon
Note: The Concept Hierarchy shows ALL concepts defined in the selected OWL document

Task 2: Create process abstraction
• Now that the process vocabulary has been captured in a WDO, create the process abstraction as a Semantic Abstract Workflow (SAW)
• SAWs:
  – are OWL documents
  – do not include vocabulary (or class) definitions
  – reuse vocabulary definitions from a WDO
    • i.e., import a ‘Source WDO’
    • e.g., <owl:imports rdf:resource="ContourMapWDO"/>
  – use a graphical notation

Task 2: Create
process abstraction
1. Click
2. Enter a namespace (URI-like format)
Recommendation: Use a namespace that matches the URL where you will publish the SAW
Example: http://trust.utep.edu/2009/CreateGravityContourMapSAW

Task 2: Create process abstraction
Loaded OWL Documents tree
• WDO namespace on the root (Source WDO)
• Workflows subtree reflects the <owl:imports> statement of the OWL document with respect to the source WDO
• SAWs do not contain concept definitions; hence, the concept hierarchy is empty when a SAW is selected
• The Workflow area is enabled when a SAW is selected

Task 2: Create process abstraction
Adding instances to the SAW
1. Click and hold on a WDO concept
2. Drag and drop on the Workflow area
• Data concepts are rendered as directed edges with beginning and ending ‘Sources’
  Note: ‘Sources’ are instances of the ‘pmlp:Source’ class
• Methods are rendered as rectangles

Task 2: Create process abstraction
Removing instances from the SAW
1. Select a Data or a Method instance
2. Press the Delete key
Note: ‘Source’ instances cannot be deleted individually, since Data instances depend on them
Editing instances
1. Select a Data, Method, or Source instance
2. Click the Edit Instance icon

Task 2: Create process abstraction
Editing SAW instances:
• pmlp:Source instances
  – Source instances can be specialized into other subtypes, as per the pmlp ontology, e.g., pmlp:Person
• wdo:Data instances
  – Data instances can be assigned a pmlp:Format instance (URI), which will be used during the creation of data annotators to encode provenance with PML
• wdo:Method instances
  – Method instances can be assigned a pmlp:InferenceEngine instance (URI), which will be used during the creation of data annotators to encode provenance with PML

Task 2: Create process abstraction
Assembling the process graph: no control flow, just data flow!
Connecting edges and nodes:
1. Click on a Source instance and hold
2.
Drag and drop on top of a Method instance to connect (dropping Method instances onto Source instances works too)
Note: Data instances need to be attached to a Source instance or to a Method instance on each side of the edge
Disconnecting edges and nodes:
1. Select and hold an endpoint of an edge
2. Drag and drop on another part of the Workflow area to disconnect
Note: ‘Sources’ cannot be disconnected from Data edges

Task 2: Create process abstraction
• Sources can be merged, as long as they are attached to the same direction of their corresponding edges

Task 2: Create process abstraction
Exercise: Build the following SAW
Let’s describe Contouring in more detail

Task 2: Create process abstraction
Creating a ‘subworkflow’:
1. Right-click on the Contouring method instance
2. Select ‘Edit DetailedBy property…’
3. Select ‘New Workflow…’
4. Enter a namespace for the new workflow and start creating the new abstract workflow
   E.g., http://trust.utep.edu/2009/ContouringSAW
5. Notice the updated Workflows subtree in the Loaded OWL Documents section

Task 2: Create process abstraction
Exercise: Build the following subworkflow for ‘Contouring’

Group exercise
• As a group
  – Choose a volunteer to drive the WDO-It! tool
  – Identify a common process

Group exercise
• As a group
  – Build a WDO for the process
  – Hint:
    • Data and Method concepts are intended to classify things from a perspective that is meaningful to the end user of the process
    • Ex1. Digital Elevation Map and Gravity Map are probably good Data concepts for a geophysicist
    • Ex2.
PDF Map and JPG Map are probably not good classifications for a geophysicist

Group exercise
• As a group
  – Build a SAW for the process
  – Hint:
    • Focus on the valid flows of data as input/output of Methods
    • Disregard control structures such as selection and iteration

What is next? (1/3)
• Create HTML reports of your WDOs and SAWs
  – On the menu toolbar
    • Choose Tools/Generate Reports…
    • Select the OWL documents for which to generate a report
    • Select a storage location

What is next? (2/3)
• Use SAWs to capture provenance
  – Use WDO-It! to generate provenance-annotator modules to instrument existing systems
  – Use WDO-It! to manually link artifacts to document the provenance of manually executed processes
• Look for the follow-up workshop: Using process abstractions to capture provenance

What is next? (3/3)
• Collaborate with others to make your process abstractions more robust
  – CI-Server/CI-Client
  – WF-Talk
• Publish your CI resources
  – CI-Server/CI-Client
• Look for the follow-up workshop: Using the CI-Server collaboration framework to develop and publish CI resources

Feedback
• Please take a few minutes to complete the following surveys:
  – Feedback about the approach
    • Contact information optional
    • Handout
  – Feedback about the presentation of the workshop
    • Anonymous feedback
    • Online: http://trust.utep.edu/wdo/doc

Thank you!
For more information please contact:
Leonardo Salayandia, leonardo@utep.edu
Paulo Pinheiro da Silva, paulo@utep.edu
Aida Gandara, agandara1@miners.utep.edu
Nick Del Rio, ndel2@miners.utep.edu