- Tetherless World Constellation

Linking Disparate Datasets of the
Earth Sciences with the SemantEco
Annotator
Session: Managing Ecological Data for
Effective Use and Reuse
Patrice Seyed [1,2], Katherine Chastain [1], and Deborah McGuinness [1]
[1] Tetherless World Constellation, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180
[2] DataONE, University of New Mexico, 1 University Boulevard N.E., Albuquerque, NM 87131
Overview
• Introduction
• Semantics and Linked Data
• Use Case: SemantEco
• SemantEco Annotator
– Concept
– Getting started
– Overview
• Ontologies
• Capabilities
• Integration with Semantic Applications
• Future Work
• Quick Look Video
• Summary
1
Introduction
• How can we take datasets from different
sources and make them
– Easy to search and to discover?
– Easy to use and to re-use?
– Easy to integrate with each other for
visualization and other applications?
2
Semantics and Linked Data
• We need a way to describe the relationships
between tabular data columns…
Linked-data formats such as the Resource Description
Framework (RDF) capture such relationships in
subject-predicate-object triples.
• … and we need a method of description that
is both standardized and machine-readable.
Communities can develop, use, and reuse common
vocabulary with ontologies, expressed in a
computer-readable format: the Web Ontology Language (OWL).
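To make the triple idea concrete, here is a minimal sketch in plain Python that writes two subject-predicate-object statements in N-Triples syntax. The `http://example.org/` namespace and the `collectedAt` property are illustrative assumptions, not vocabulary from the project; only `rdfs:label` is a standard term.

```python
# Minimal sketch: two triples serialized as N-Triples lines.
# EX and the "collectedAt" property are placeholders (assumptions).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Render (subject, predicate, object) terms as N-Triples lines."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


triples = [
    # A sample is related to the lake where it was collected.
    (uri(EX + "sample/1"), uri(EX + "collectedAt"), uri(EX + "lake/BigMooseLake")),
    # The lake has a human-readable label.
    (uri(EX + "lake/BigMooseLake"), uri(RDFS_LABEL), literal("Big Moose Lake")),
]

lines = to_ntriples(triples)
for line in lines:
    print(line)
```

In practice an RDF library would handle serialization; the sketch only shows the shape of the data.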
3
Semantics and Linked Data
• Linked-data formats aid interoperability, making
data easier to share.
• Use existing URIs to refer to well-defined
entities and concepts:
– How do you make sure that everyone using
your data understands that the string “NY”
refers to the US state of New York?
– What more can you learn if you can easily
discover other datasets that also refer to the
US state of New York?
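A minimal sketch of the idea, assuming the `dbpedia:` prefix expands to `http://dbpedia.org/resource/` and that `New_York` identifies the state. The lookup table, the two tiny datasets, and the function name `resolve` are hypothetical and only for illustration.

```python
# Sketch: different strings resolve to one shared URI, so datasets that
# spell the state differently can still be matched.
# The namespace expansion and the label table are assumptions.
DBPEDIA = "http://dbpedia.org/resource/"

STATE_LABELS = {
    "ny": DBPEDIA + "New_York",
    "new york state": DBPEDIA + "New_York",
}


def resolve(label):
    """Return the URI for a known label, or None if the label is unknown."""
    return STATE_LABELS.get(label.strip().lower())


# Two hypothetical datasets that name the state differently.
water_rows = [{"state": "NY", "site": "Big Moose Lake"}]
bird_rows = [{"region": "New York State"}]

water_uris = {resolve(row["state"]) for row in water_rows}
bird_uris = {resolve(row["region"]) for row in bird_rows}

# Datasets sharing a URI are candidates for integration.
shared = water_uris & bird_uris
print(shared)
```

Once both datasets carry the same URI, discovering related data becomes a set intersection rather than a string-matching problem.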
4
Use Case: SemantEco
• SemantEco is a data visualization
environment that allows a user to
explore ecological data through a map-based interface.
• Data comes from a variety of sources:
– Federal, such as the USGS and EPA.
– Local, such as the Darrin Freshwater
Institute of Upstate New York.
– … each with different notations and best practices for gathering and recording.
5
Conceptually....
• Represent data independent of the
schema by which it was recorded
• This enables comparisons across
data from different sources
• In SemantEco, we look at Measurements:
– Water quality
– Air quality
– Birds
– Fish
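A minimal sketch of schema-independent representation: two hypothetical source layouts are converted into one common `Measurement` record so their values can be compared. The column names of both sources, the class, and all sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Common representation, independent of any source schema."""
    site: str
    characteristic: str
    value: float
    unit: str


def from_source_a(row):
    # Hypothetical source A: one row per measurement, explicit columns.
    return Measurement(row["SiteName"], row["Characteristic"],
                       float(row["Result"]), row["Unit"])


def from_source_b(row):
    # Hypothetical source B: analyte and unit are encoded in the header.
    return Measurement(row["Lake Name"], "Ammonium",
                       float(row["NH4+ (mg/L)"]), "mg/L")


# Made-up rows, one per source layout.
a = from_source_a({"SiteName": "Site A", "Characteristic": "Ammonium",
                   "Result": "0.05", "Unit": "mg/L"})
b = from_source_b({"Lake Name": "Big Moose Lake", "NH4+ (mg/L)": "0.02"})

# Once both are Measurements, comparison no longer depends on the source.
comparable = a.characteristic == b.characteristic and a.unit == b.unit
print(comparable, a.value > b.value)
```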
6
SemantEco Annotator
Allows a user to:
• Translate data into linked-data formats such as RDF:
– Linked data triples describe how columns in a data table relate
to each other, and to the data in that column.
– OWL ontologies provide standard vocabularies for describing
these relationships.
– Resulting enriched RDF data can be used immediately within
RDF stores or hosted as linked data.
• Or use semantics to annotate data:
– Column headers correspond to OWL properties
– Data cell values can correspond to OWL classes or datatypes
– Organizational best-practices and terminology can be defined in
the data files themselves.
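The column-to-property idea can be sketched as a small mapping applied to one table row. This is not the Annotator's implementation: the `http://example.org/vocab#` properties, the `row_to_triples` helper, and the sample row are hypothetical; the XSD datatype URIs are standard.

```python
# Sketch: column headers map to properties, cell values become typed literals.
# EX properties are placeholders (assumptions); XSD datatypes are standard.
EX = "http://example.org/vocab#"
XSD = "http://www.w3.org/2001/XMLSchema#"

COLUMN_MAP = {
    "Lake Name": (EX + "lakeName", XSD + "string"),
    "NH4+": (EX + "ammoniumConcentration", XSD + "decimal"),
}


def row_to_triples(subject, row, column_map):
    """Turn one table row into (subject, property, (value, datatype)) triples."""
    triples = []
    for column, value in row.items():
        if column not in column_map:
            continue  # unannotated columns are skipped in this sketch
        prop, datatype = column_map[column]
        triples.append((subject, prop, (value, datatype)))
    return triples


# Made-up row; "Notes" has no annotation and is therefore ignored.
row = {"Lake Name": "Big Moose Lake", "NH4+": "0.02", "Notes": "n/a"}
triples = row_to_triples("http://example.org/row/1", row, COLUMN_MAP)
for triple in triples:
    print(triple)
```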
7
SemantEco Annotator:
Getting Started
8
Provenance and Metadata
• The Annotator asks the user to provide
metadata about the dataset.
• This also becomes part of the final RDF,
facilitating the dataset’s discoverability.
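As a rough sketch, user-supplied metadata could be expressed as triples using Dublin Core terms. Which fields the Annotator actually asks for is not stated here, so the field names, the dataset URI, and the values below are assumptions; `title`, `creator`, and `source` are existing Dublin Core terms.

```python
# Sketch: dataset-level metadata expressed as Dublin Core triples.
# The chosen fields and all values are illustrative assumptions.
DCTERMS = "http://purl.org/dc/terms/"


def metadata_triples(dataset_uri, metadata):
    """Return one (dataset, dcterms:<field>, value) triple per metadata field."""
    return [(dataset_uri, DCTERMS + field, value)
            for field, value in sorted(metadata.items())]


metadata = {
    "title": "Lake water quality samples",
    "creator": "Example Institute",
    "source": "http://example.org/files/lakes.csv",
}

triples = metadata_triples("http://example.org/dataset/lakes", metadata)
for triple in triples:
    print(triple)
```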
9
SemantEco Annotator
-- Tabular data view
10
SemantEco Annotator
-- Ontology loader
-- Ontology facets
11
SemantEco Annotator
-- Global settings
12
SemantEco Annotator
-- Drag-and-drop to make assignments
-- Work directly on tabular data
13
Ontologies
• Load one or more ontologies from the dropdown menu.
• Or import from a URI.
• Annotator also maintains a list of recent imports for re-use.
14
Capabilities
• Provide a definition for “Accession Code”
• Specify which standard was used to record the Date
• Group “Lake Name”, “Z Max” and “Sample Z” together as a single
entity: the location where the sample was taken
• Make explicit that “NH4+” is the same thing as “Ammonium”, and that
the units (mg/L) apply to each number in that column.
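The grouping and unit annotations above can be sketched as a transformation on one row. The output structure, the `annotate_row` helper, the `SamplingLocation` type name, and the sample values are hypothetical; only the column names, "Ammonium", and mg/L come from the slide.

```python
# Sketch: group location columns into one entity and make the analyte
# and unit explicit. Structure and sample values are assumptions.
LOCATION_COLUMNS = ("Lake Name", "Z Max", "Sample Z")


def annotate_row(row):
    """Return a measurement that points at a single grouped location entity."""
    location = {"type": "SamplingLocation"}
    for column in LOCATION_COLUMNS:
        location[column] = row[column]
    return {
        "characteristic": "Ammonium",  # what the "NH4+" header stands for
        "value": float(row["NH4+"]),
        "unit": "mg/L",                # applies to every number in the column
        "location": location,
    }


# Made-up row using the column names from the slide.
row = {"Lake Name": "Big Moose Lake", "Z Max": "20", "Sample Z": "5", "NH4+": "0.02"}
annotated = annotate_row(row)
print(annotated)
```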
15
Integration with Semantic
Applications
• Identify the application’s requirements:
– E.g., a piece of data with lat-long
coordinates can be plotted on a map.
• We brought in data from the
Darrin Freshwater Institute
containing water quality data
for lakes in Upstate New
York, augmenting existing
data from the U.S. Geological
Survey.
[Figure label: “Big Moose Lake”]
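A minimal sketch of checking an application requirement: a record is plottable only if it carries both coordinates. The field names, the `meets_requirements` helper, and the coordinate values are illustrative placeholders, not surveyed data.

```python
# Sketch: filter records by what the map application needs.
# Field names and coordinate values are placeholders (assumptions).
REQUIRED_FOR_MAP = ("latitude", "longitude")


def meets_requirements(record, required=REQUIRED_FOR_MAP):
    """True if every required field is present and not None."""
    return all(record.get(field) is not None for field in required)


records = [
    {"site": "Big Moose Lake", "latitude": 43.8, "longitude": -74.9},
    {"site": "Unlocated sample"},  # no coordinates, cannot be plotted
]

plottable = [r["site"] for r in records if meets_requirements(r)]
print(plottable)
```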
16
Integration with Semantic
Applications
• Linking data to well-defined entities and
concepts by URI enhances searchability.
[Figure: the strings “NY”, “New York State”, and “New York” linked to the URIs dbpedia:New_York and dbpedia:New_York_City]
17
Future Work
• Automatic mappings directed to a particular graph closed
under a predicate/object pair; use of OWL domain and
range restriction axioms to guide the user in vocabulary
selection.
• Use of OWL class definitions to enable a top-down
approach for modeling data.
• Ability to load enhancement files, both to facilitate
translation of multiple similar datasets, and to make
corrections easier.
• Construction of a platform for better management of
linked data, within which the Annotator plays a vital role.
• Use of application requirements to create “templates” for
new data sources to be integrated more easily.
18
Summary
• “SemantEco Annotator” component for
ease of translation into RDF.
• Multi-purpose: translation, annotation,
and generalized mapping.
• Part of a future “suite” that couples
annotation and search.
19
SemantEco Annotator
Project Page
Want more info? Interested in collaborating?
See Evan Patton or email Deborah McGuinness (dlm@cs.rpi.edu).
We also have a project page with screenshots
and demonstration videos:
http://tw.rpi.edu/web/project/SemantEcoAnnotator
20
Acknowledgements
• Rensselaer Polytechnic Institute
• Tetherless World Constellation at RPI
• DataONE
21
SemantEco: More Info
For additional information about SemantEco:
“Addressing the Challenges of Multi-Domain
Data Integration with the SemantEco
Framework”
Friday @ 10:35am, IN52B-02.
E.W. Patton; P. Seyed; D.L. McGuinness
22