Advanced Semantic Technologies
Prof. Deborah McGuinness and Dr. Patrice Seyed
CSCI 6965 - 01, CSCI 4967 - 01, ITWS 6962 - 01, ITWS 4963 - 01
TA: Justin Karpenski (kaprej2@rpi.edu)
Week 2, January 29, 2013

Introduction to SemantEco
Patrice Seyed
(NSF) DataONE Postdoctoral Fellow at Rensselaer Polytechnic Institute (RPI), joint with University of New Mexico (UNM)
*A version of this slide deck was used in this year’s AGU Ignite talk and was a revision of the previous year’s AGU Ignite talk by Professor McGuinness.
Project page: http://tw.rpi.edu/web/project/SemantEco

Introduction
• Heterogeneity of syntax and schema makes data integration difficult
• To address this, we apply the W3C standards Resource Description Framework (RDF) and Web Ontology Language (OWL) to structure data and domain knowledge

Use Case (Hydro-Eco)
• Aid scientists in discovering data about water quality conditions and potential correlation with population counts of certain organisms.
• Model and query from facets:
  – geographic regions
  – time periods
  – measured variables
  – organisms
• Combining data from:
  – Water data from EPA and USGS
  – Bird count data from eBird

Semantic Structure to Data: Translation Into RDF
• Knowledge pattern inspired by:
  – Open Geospatial Consortium (OGC) Observations and Measurements (O&M)
  – Extensible Observation Ontology (OBOE)
• Focus on measurements:
  – ‘Of Entity’ / ‘has Characteristic’ (feature of interest), e.g., Horned Owl (species), Arsenic (chemical)
  – ‘Has Value’ (result of measuring procedure), e.g., 2, 0.400
  – ‘Has Unit’, e.g., Integer, mg/l
  – ‘From Source’, e.g., eBird, EPA

Encoding Domain Knowledge: (SemantEco) Pollution Ontology
• Extends existing best-practice ontologies, e.g., SWEET, OWL-Time.
• Includes terms for relevant pollution concepts.
• Can be used to show: any water source that has a measurement outside of its allowable range is a polluted water source.

Encoding Domain Knowledge: (SemantEco) Regulation Ontology
• Simple reusable regulation ontology
  – Models the federal and state water quality regulations for drinking water sources
  – Can be used to define, for example, that in California the limit for Arsenic is a measured value of 0.01 mg/L
  – Combined with the pollution ontology, we can infer “any water source that contains 0.01 mg/L of Arsenic is a polluted water source.”
[Figure: Portion of Cal. Regulation Ontology.]
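
The measurement pattern and the California Arsenic limit described above can be sketched in plain Python. This is only an illustration of the idea, not the SemantEco ontologies themselves, which are expressed in RDF/OWL and evaluated by a reasoner. The class and function names, the site identifiers, and the second measurement value are hypothetical, and treating a value equal to the limit as a violation is an assumption based on the wording of the inference on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measurement, following the slide's pattern (field names are hypothetical)."""
    site: str            # hypothetical identifier of the water source
    characteristic: str  # 'has Characteristic', e.g. "Arsenic"
    value: float         # 'Has Value'
    unit: str            # 'Has Unit'
    source: str          # 'From Source', e.g. "EPA"


# Regulation table: (jurisdiction, characteristic) -> (limit, unit).
# Only the California Arsenic limit comes from the slides.
LIMITS = {("California", "Arsenic"): (0.01, "mg/L")}


def is_violation(m: Measurement, jurisdiction: str) -> bool:
    """Return True if the measurement reaches or exceeds the regulated limit.

    Assumption: a value equal to the limit counts as a violation.
    """
    rule = LIMITS.get((jurisdiction, m.characteristic))
    if rule is None:
        return False  # no regulation known for this characteristic
    limit, unit = rule
    if m.unit.lower() != unit.lower():
        raise ValueError(f"unit mismatch: {m.unit!r} vs {unit!r}")
    return m.value >= limit


def polluted_sites(measurements, jurisdiction):
    """A site is polluted if any of its measurements violates a limit."""
    return {m.site for m in measurements if is_violation(m, jurisdiction)}


measurements = [
    Measurement("site-A", "Arsenic", 0.400, "mg/l", "EPA"),   # value from the slide
    Measurement("site-B", "Arsenic", 0.002, "mg/l", "USGS"),  # hypothetical value
]
print(polluted_sites(measurements, "California"))
```

In the actual system this inference is drawn from the combined pollution and regulation ontologies rather than from hand-written conditionals; the sketch only makes the rule's logic explicit.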

SemantEco Workflow
[Figure: workflow diagram. Labels: Publish, CSV2RDF4LOD Direct, Reason, Visualize, CSV2RDF4LOD Enhance, Archive; arrows labeled derive and archive.]

Value of semantic technologies: Taxonomic Classification using eBird/Clements Taxonomy, GeoSpecies Ontology

Value of semantic technologies: (Left) Chemical categorizations from the ontology of the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI)

SemantEco: A Tree-Faceted Search
• Leverage representations derived from ontologies of:
  – Bird species and
  – Chemicals measured in water
• To narrow the search for:
  – Bird counts and
  – Water measurement sites

SemantEco (Portal)
• Enable/empower citizens and scientists to explore pollution sites, facilities, regulations, and health impacts along with provenance.
• Demonstrates semantic monitoring possibilities.
• Map presentation of analysis
• Explanations and provenance available
http://was.tw.rpi.edu/swqp/map.html and http://aquarius.tw.rpi.edu/projects/semantaqua
1. Map view of analyzed results
2. Explanation of pollution
3. Plotting specific measurement data alone or alongside species count data
4. Filtering by facet to select type of data

SemantEco
1. A zip code is entered along with the selected type of data and sources.
2. A water site is selected in the map.
3. Option to plot water measurements from the site alone or with bird count data.

SemantEco: Selecting Characteristics for Plotting Data

SemantEco: Searching for Nearby Bird Count Data
• Use of branch siblings
  – Make suggestions to the user when no data is present
• “You attempted to plot bird count data for the species “Bare-legged Owl”, but none is available. There is, however, bird count data on other species in the family “Owl”, including “Bare-shanked Screech Owl” and “Brown-Fish Owl”. Would you like to plot this data instead?”

SemantEco: Plotting Water Data with Bird Count Data
[Figure panels: Water Measurement Data Plotted Alone; Water Measurement Data Plotted With Nearby Bird Count Data.]

SemantEco: Access Into Bird Count Data

In the Near Future
• Use hierarchies as association links for DataONE SOLR index search
• Extend SemantEco’s methodology and architecture to other measurement domains (e.g., air quality data from EPA) and other species (e.g., algae/size)
• Flexibly find, explore, and visualize data

Static Demos
• Water Quality Portal
  – http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• SemantEco (recent work)
  – http://tw.rpi.edu/web/node/2778
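
As a supplement to the “Searching for Nearby Bird Count Data” slide, the branch-sibling suggestion can be sketched as a lookup over a parent map. This is a minimal sketch under stated assumptions, not the portal's implementation: the function name, the dictionary-based taxonomy, the extra "Mallard" entry, and all count values are hypothetical, and the species names are copied verbatim from the slide.

```python
def suggest_siblings(species, parent_of, counts):
    """Suggest sibling species (same parent in the taxonomy) that have count data.

    Returns an empty list when the requested species already has data
    or when its parent is unknown.
    """
    if counts.get(species):
        return []  # data exists, so no suggestion is needed
    parent = parent_of.get(species)
    if parent is None:
        return []
    return sorted(
        other
        for other, other_parent in parent_of.items()
        if other_parent == parent and other != species and counts.get(other)
    )


# Hypothetical taxonomy fragment: species -> family.
parent_of = {
    "Bare-legged Owl": "Owl",
    "Bare-shanked Screech Owl": "Owl",
    "Brown-Fish Owl": "Owl",
    "Mallard": "Duck",  # hypothetical extra entry outside the Owl branch
}

# Hypothetical bird counts near the selected water site.
counts = {
    "Bare-shanked Screech Owl": 3,
    "Brown-Fish Owl": 1,
    "Mallard": 12,
}

suggestions = suggest_siblings("Bare-legged Owl", parent_of, counts)
print(suggestions)
```

A flat parent map is enough to show the idea; a deeper hierarchy would need a choice about how far up the tree to search before giving up.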