Rensselaer Polytechnic Institute

Advanced Semantic Technologies
Prof. Deborah McGuinness and Dr. Patrice Seyed
CSCI 6965 - 01
CSCI 4967 - 01
ITWS 6962 - 01
ITWS 4963 - 01
TA: Justin Karpenski (kaprej2@rpi.edu)
Week 2, January 29, 2013
Introduction to SemantEco
Patrice Seyed
(NSF) DataONE Postdoctoral Fellow
at Rensselaer Polytechnic Institute (RPI) joint with University of New Mexico (UNM)
*A version of this slide deck was used in this year’s AGU Ignite talk and was a revision of the
previous year’s AGU Ignite talk by Professor McGuinness.
Project page:
http://tw.rpi.edu/web/project/SemantEco
Introduction
• Heterogeneity of Syntax and Schema Makes Data Integration Difficult
• To address this, we apply the W3C standards Resource Description Framework (RDF) and Web Ontology Language (OWL) to structure data and domain knowledge
Use Case (Hydro-Eco)
• Aid scientists in discovering data about water quality conditions and potential correlations with population counts of certain organisms.
• Model and query from facets:
– geographic regions
– time periods
– measured variables
– organisms
• Combining data from:
– Water data from EPA and USGS
– Bird count data from eBird
Semantic Structure to Data:
Translation Into RDF
• Knowledge pattern inspired by:
– Open Geospatial Consortium (OGC) Observations & Measurements (O&M)
– Extensible Observation Ontology (OBOE)
• Focus on measurements:
  • ‘Of Entity’/‘Has Characteristic’ (feature of interest)
  – e.g., Horned Owl (species), Arsenic (chemical)
  • ‘Has Value’ (result of the measuring procedure)
  – e.g., 2, 0.400
  • ‘Has Unit’
  – e.g., Integer, mg/L
  • ‘From Source’
  – e.g., eBird, EPA
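A minimal sketch of the measurement pattern, assuming a hypothetical CSV layout with one column per slot: each row becomes a measurement node linked to its characteristic, value, unit, and source. It emits plain Python tuples rather than using an RDF library, and the namespace `http://example.org/semanteco#` and property names such as `hasValue` are illustrative placeholders, not SemantEco's published vocabulary. The conversion only stands in for the CSV-to-RDF step; it does not reproduce CSV2RDF4LOD's behavior.

```python
import csv
import io

# Hypothetical namespace and property names mirroring the slide's slots;
# they are NOT SemantEco's or CSV2RDF4LOD's actual vocabulary.
EX = "http://example.org/semanteco#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical CSV column -> property mapping, one entry per slot of the pattern.
COLUMN_TO_PROPERTY = {
    "characteristic": EX + "hasCharacteristic",  # 'Of Entity'/'Has Characteristic'
    "value": EX + "hasValue",                    # result of the measuring procedure
    "unit": EX + "hasUnit",
    "source": EX + "fromSource",
}


def rows_to_triples(csv_text):
    """Turn each CSV row into a measurement node plus one triple per slot."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        node = f"{EX}measurement{i}"
        triples.append((node, RDF_TYPE, EX + "Measurement"))
        for column, prop in COLUMN_TO_PROPERTY.items():
            # Store the measured value as a number so later rules can compare it.
            obj = float(row[column]) if column == "value" else row[column]
            triples.append((node, prop, obj))
    return triples


SAMPLE = """characteristic,value,unit,source
Horned Owl,2,Integer,eBird
Arsenic,0.400,mg/L,EPA
"""

if __name__ == "__main__":
    for triple in rows_to_triples(SAMPLE):
        print(triple)
```

A real pipeline would serialize these statements as RDF (for example, Turtle) with typed literals and provenance; those steps are omitted here.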
Encoding Domain Knowledge:
(SemantEco) Pollution Ontology
• Extends existing best-practice ontologies, e.g., SWEET and OWL-Time.
• Includes terms for relevant pollution concepts.
• Can be used to express that any water source that has a measurement outside its allowable range is a polluted water source.
Encoding Domain Knowledge:
(SemantEco) Regulation Ontology
• Simple, reusable regulation ontology
– Models the federal and state water quality regulations for drinking water sources
– Can be used to define, for example, that in California “0.01 mg/L is the limit for Arsenic in any measurement”
– Combined with the pollution ontology, we can infer that “any water source that contains more than 0.01 mg/L of Arsenic is a polluted water source.”
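The combined inference can be mimicked procedurally. The sketch below assumes a hypothetical lookup table keyed by (state, characteristic) that holds an allowable range; only the California Arsenic limit of 0.01 mg/L comes from the slide, while the lower bound of 0.0 and the site names are assumptions. SemantEco expresses this rule with OWL ontologies and a reasoner; plain Python here only reproduces the outcome of the rule, not OWL semantics.

```python
from dataclasses import dataclass

# Hypothetical regulation table: (state, characteristic) -> allowable (low, high) in mg/L.
# Only the California Arsenic limit of 0.01 mg/L is taken from the slide;
# the lower bound of 0.0 is an assumption.
REGULATIONS = {("CA", "Arsenic"): (0.0, 0.01)}


@dataclass
class Measurement:
    site: str
    state: str
    characteristic: str
    value: float  # mg/L


def polluted_sites(measurements, regulations):
    """A site is polluted if any measurement lies outside the allowable range for its state."""
    polluted = set()
    for m in measurements:
        limits = regulations.get((m.state, m.characteristic))
        if limits is None:
            continue  # no applicable regulation, so nothing is inferred
        low, high = limits
        if not low <= m.value <= high:
            polluted.add(m.site)
    return polluted


# Hypothetical sites and readings for illustration.
measurements = [
    Measurement("site-A", "CA", "Arsenic", 0.400),
    Measurement("site-B", "CA", "Arsenic", 0.005),
    Measurement("site-C", "NY", "Arsenic", 0.400),
]
print(sorted(polluted_sites(measurements, REGULATIONS)))  # ['site-A']
```

Site C is not flagged only because this toy table has no New York entry; which limits apply in practice depends on what the regulation ontology models for each state.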
Portion of Cal. Regulation Ontology.
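SemantEco Workflow
[Figure: workflow diagram with stages Publish, Enhance, Archive, Direct, Reason, and Visualize; CSV2RDF4LOD is shown at the Publish and Enhance stages; connecting arrows are labeled “derive” and “archive”.]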
Value of semantic technologies:
Taxonomic Classification using
eBird/Clements Taxonomy, GeoSpecies Ontology 1
Value of semantic technologies:
Taxonomic Classification using
eBird/Clements Taxonomy, GeoSpecies Ontology 2
Value of semantic technologies:
(Left) Chemical categorizations from the
Ontology of the Consortium of Universities for the
Advancement of Hydrologic Science, Inc. (CUAHSI)
SemantEco:
A Tree-Faceted Search
• Leverages representations derived from ontologies of:
– bird species and
– chemicals measured in water
• To narrow the search for:
– bird counts and
– water measurement sites
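A tree facet narrows results by selecting a node in a hierarchy and keeping every record classified under it. The sketch assumes a small, hypothetical child-to-parent map standing in for the bird taxonomy; the same traversal would apply to a chemical hierarchy. The function names `descendants` and `facet_filter` are illustrative, not part of SemantEco.

```python
from collections import defaultdict

# Hypothetical child -> parent fragment standing in for the bird taxonomy facet.
PARENT = {
    "Owl": "Birds",
    "Bare-legged Owl": "Owl",
    "Brown-Fish Owl": "Owl",
    "Duck": "Birds",
    "Mallard": "Duck",
}


def descendants(node, parent_map):
    """Return the node itself plus every term below it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in parent_map.items():
        children[parent].append(child)
    found, stack = set(), [node]
    while stack:  # iterative depth-first traversal
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def facet_filter(records, selected, parent_map):
    """Keep only records whose species falls under the selected facet node."""
    allowed = descendants(selected, parent_map)
    return [r for r in records if r["species"] in allowed]


counts = [
    {"species": "Brown-Fish Owl", "count": 2},
    {"species": "Mallard", "count": 15},
]
print(facet_filter(counts, "Owl", PARENT))
```

Selecting a higher node such as "Birds" widens the filter to every species beneath it, which is what lets one click in the tree cover a whole branch.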
SemantEco (Portal)
• Enables citizens and scientists to explore pollution sites, facilities, regulations, and health impacts, along with provenance.
• Demonstrates semantic monitoring possibilities.
• Map presentation of analysis
• Explanations and provenance available
http://was.tw.rpi.edu/swqp/map.html and
http://aquarius.tw.rpi.edu/projects/semantaqua
1. Map view of analyzed results
2. Explanation of pollution
3. Plotting specific measurement data alone or alongside species count data
4. Filtering by facet to select type of data
SemantEco
1. A zip code is entered along with the selected type of data and sources.
2. A water site is selected on the map.
3. Option to plot water measurements from the site alone or with bird count data.
SemantEco:
Selecting Characteristics for
Plotting Data
SemantEco:
Searching for Nearby Bird Count Data
• Use of branch siblings
– Make suggestions to the user when no data is present
• “You attempted to plot bird count data for the species ‘Bare-legged Owl’, but none is available. There is, however, bird count data on other species in the family ‘Owl’, including ‘Bare-shanked Screech Owl’ and ‘Brown-Fish Owl’. Would you like to plot this data instead?”
SemantEco:
Plotting Water Data with Bird Count Data
Water Measurement Data Plotted Alone
Water Measurement Data Plotted
With Nearby Bird Count Data
SemantEco:
Access Into Bird Count Data
In the Near Future
• Use hierarchies as association links for DataONE Solr index search
• Use SemantEco’s methodology and architecture to extend to other measurement domains (e.g., air quality data from EPA) and other species (e.g., algae/size)
• Flexibly find, explore, and visualize data
Static Demos
• Water Quality Portal
– http://inferenceweb.org/wiki/Semantic_Water_Quality_Portal
• SemantEco (recent work)
– http://tw.rpi.edu/web/node/2778