DiGIR2 : A Semantic Data Provider for Communities

Steve Perry (smperry@ku.edu)

© 2006 University of Kansas

DiGIR2 :: A Semantic Data Provider | RDF, Ontologies and Meta-Data Workshop

The Problem

RDF/OWL is useful for the modeling and exchange of scientific data but…

• Most of the data are in relational databases

• Many of the organizations that want to publish data have insufficient IT resources

• Much of the work on building semantic web communities has focused on ontology development at the expense of issues such as versioning and trust

© 2006 KU BRC 18-Apr-20


Critical Issues for a Semantic Data Provider

• Synchronization : keeping published RDF graphs in synch with an underlying database

• Resource identification : unique identifiers for resources, descriptions, or graphs

• Versioning : tracking changes to resource descriptions over time

• Persistence : long term storage of descriptions

• Harvesting : acquisition of large amounts of data for the purpose of indexing or archiving

• Provenance and trust : communicating who published what and what their intentions are regarding their data


DiGIR2 :: A Semantic Web Publishing System

[Architecture diagram; labeled components: Harvest Service, SPARQL Service, LSID Authority, Public Web Services, Triple Store, Synchronizer, Data Source]

• Implemented in Java

• Jena API and triple store

• Multiple services including SPARQL and OAI-PMH

• Can support multiple datasets (not shown)


Synchronization

Updating a graph of resource descriptions based on changes in an underlying data source such as a relational database.

DiGIR2 provides a functional language for writing mapping programs that generate RDF from SQL queries and files.

The synchronizer examines a data source, transforms it into RDF, then updates the triple store to reflect changes.

The synchronizer handles resource identifier assignment, mapping and transformation, versioning, and long-term persistence of updated and deleted resource descriptions.
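The examine, transform, update cycle can be pictured as a diff between freshly generated triples and the current contents of the store. The following is a minimal sketch in plain Java under stated assumptions: it does not use DiGIR2's mapping language or the Jena API, and the row format, the `example.org` base URI, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not DiGIR2's mapping language or the Jena API.
class SyncSketch {
    // Assumed base for resource URIs; a real community would agree on its own scheme.
    static final String BASE = "http://example.org/specimen/";

    // Map one database row (column name -> value) to triples written as "subject predicate object".
    static List<String> rowToTriples(Map<String, String> row) {
        String subject = "<" + BASE + row.get("id") + ">";
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getKey().equals("id")) continue; // the key column becomes the subject
            triples.add(subject + " <" + BASE + "terms/" + col.getKey() + "> \"" + col.getValue() + "\"");
        }
        return triples;
    }

    // Bring the store in line with the generated triples; returns the number of changes applied.
    static int synchronize(Set<String> store, Set<String> generated) {
        Set<String> toAdd = new LinkedHashSet<>(generated);
        toAdd.removeAll(store);          // present in the source, missing from the store
        Set<String> toRemove = new LinkedHashSet<>(store);
        toRemove.removeAll(generated);   // present in the store, gone from the source
        store.removeAll(toRemove);
        store.addAll(toAdd);
        return toAdd.size() + toRemove.size();
    }

    public static void main(String[] args) {
        Set<String> store = new LinkedHashSet<>();
        Set<String> generated = new LinkedHashSet<>(rowToTriples(Map.of("id", "42", "genus", "Puma")));
        System.out.println(synchronize(store, generated) + " change(s)");
        store.forEach(System.out::println);
    }
}
```

Running the synchronization a second time against an unchanged source applies no changes, which is the property that makes repeated synchronization safe.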


Identifiers

Participants in a community should ideally agree on how to identify resources (URLs, URNs, LSIDs, etc.).

DiGIR2 supports pluggable resource URI assignment

In the future, DiGIR2 will likely support URIs for named graphs

We’ve built a custom LSID Authority that can talk to multiple DiGIR2 servers in a domain in order to resolve a given LSID
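As an illustration of an authority fronting several servers, the sketch below parses an LSID of the form `urn:lsid:authority:namespace:object[:revision]` and asks each server in turn until one returns a description. The try-each-server strategy, the in-memory "servers", and all names are assumptions made for the example; the slides do not say how the DiGIR2 authority actually selects a server.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: not the DiGIR2 LSID Authority.
class LsidSketch {
    // Split an LSID into { authority, namespace, object, revision }; revision is "" when absent.
    static String[] parse(String lsid) {
        String[] parts = lsid.split(":");
        if (parts.length < 5 || parts.length > 6
                || !parts[0].equalsIgnoreCase("urn") || !parts[1].equalsIgnoreCase("lsid")) {
            throw new IllegalArgumentException("Not an LSID: " + lsid);
        }
        String revision = parts.length == 6 ? parts[5] : "";
        return new String[] { parts[2], parts[3], parts[4], revision };
    }

    // Ask each server in turn; the first one that knows the LSID answers.
    static Optional<String> resolve(String lsid, List<Function<String, Optional<String>>> servers) {
        parse(lsid); // reject malformed identifiers before contacting any server
        for (Function<String, Optional<String>> server : servers) {
            Optional<String> description = server.apply(lsid);
            if (description.isPresent()) return description;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two in-memory maps stand in for two servers in the same domain.
        Map<String, String> birds = Map.of("urn:lsid:example.org:birds:123", "description of bird 123");
        Map<String, String> fish = Map.of("urn:lsid:example.org:fish:7", "description of fish 7");
        List<Function<String, Optional<String>>> servers = List.of(
                id -> Optional.ofNullable(birds.get(id)),
                id -> Optional.ofNullable(fish.get(id)));
        System.out.println(resolve("urn:lsid:example.org:fish:7", servers).orElse("not found"));
    }
}
```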


Change Tracking and Versioning

Different domains demand different treatment with regard to versioning and persistence. DiGIR2 offers four options for versioning resource descriptions:

• No tracking, no versioning

– Every time synchronization occurs, the triple store is dropped

• Tracking, no versioning

– New versions replace old, but new resources are not created

• Tracking, non-persistent versioning

– New resources are created and related to old with predicates

• Tracking, persistent versioning

– New resources created and old descriptions moved to persistent storage
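The four options above can be sketched as a policy switch applied during synchronization. This is an illustration only: the enum names, the in-memory maps standing in for the triple store and persistent storage, and the `replaces` map standing in for a linking predicate are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the four versioning options; not DiGIR2 code.
class VersioningSketch {
    enum Policy { NO_TRACKING, TRACKING_ONLY, NON_PERSISTENT_VERSIONS, PERSISTENT_VERSIONS }

    final Map<String, String> store = new HashMap<>();    // live, queryable descriptions
    final Map<String, String> archive = new HashMap<>();  // stand-in for persistent storage
    final Map<String, String> replaces = new HashMap<>(); // new id -> old id, stand-in for a predicate

    // Called at the start of each synchronization run.
    void beginSync(Policy policy) {
        if (policy == Policy.NO_TRACKING) store.clear(); // the store is dropped and rebuilt
    }

    // Apply a changed description; returns the identifier that now holds the current version.
    String update(Policy policy, String id, String newId, String description) {
        switch (policy) {
            case NO_TRACKING:
            case TRACKING_ONLY:
                store.put(id, description);   // new content replaces old under the same identifier
                return id;
            case NON_PERSISTENT_VERSIONS:
                store.put(newId, description);
                replaces.put(newId, id);      // the old version stays in the store, linked to the new one
                return newId;
            case PERSISTENT_VERSIONS:
                String old = store.remove(id);
                if (old != null) archive.put(id, old); // the old version moves to persistent storage
                store.put(newId, description);
                return newId;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        VersioningSketch v = new VersioningSketch();
        v.store.put("urn:lsid:example.org:birds:123:1", "first description");
        String current = v.update(Policy.PERSISTENT_VERSIONS,
                "urn:lsid:example.org:birds:123:1", "urn:lsid:example.org:birds:123:2", "revised description");
        System.out.println("current: " + current);
        System.out.println("archived: " + v.archive.keySet());
    }
}
```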


Persistence

Full persistence means every version of every resource is maintained forever.

A community that requires full persistence puts a significant burden on data providers and can also slow down query processing, inconveniencing data consumers.

DiGIR2 provides the ability to store persistent versions in a secondary store. This makes them unavailable for query, but versioned resource descriptions are still available through LSID resolution.
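The trade-off between query and resolution can be illustrated with two stores: queries scan only the primary store, while identifier resolution falls back to the secondary one. This is a sketch with in-memory maps and hypothetical names, not DiGIR2's storage layer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: two maps stand in for the triple store and the secondary store.
class PersistenceSketch {
    final Map<String, String> primary = new HashMap<>();   // current descriptions, queryable
    final Map<String, String> secondary = new HashMap<>(); // superseded versions, resolvable only

    // Queries see only the primary store, so archived versions add no query cost.
    List<String> query(String keyword) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : primary.entrySet()) {
            if (e.getValue().contains(keyword)) hits.add(e.getKey());
        }
        return hits;
    }

    // Resolution by identifier checks the primary store first, then the secondary store.
    Optional<String> resolve(String lsid) {
        String description = primary.get(lsid);
        if (description == null) description = secondary.get(lsid);
        return Optional.ofNullable(description);
    }

    public static void main(String[] args) {
        PersistenceSketch p = new PersistenceSketch();
        p.primary.put("urn:lsid:example.org:birds:123:2", "Puma concolor, revised");
        p.secondary.put("urn:lsid:example.org:birds:123:1", "Puma concolor, original");
        System.out.println(p.query("Puma"));
        System.out.println(p.resolve("urn:lsid:example.org:birds:123:1").orElse("not found"));
    }
}
```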


Harvesting

Indexers, caching services, and data consumers must be able to efficiently “harvest” data from providers.

We’re currently building an experimental implementation of the OAI Protocol for Metadata Harvesting (OAI-PMH) that can return resource descriptions.

When a dataset is configured to use the appropriate change tracking and versioning system, clients can selectively harvest by date range.
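Selective harvesting by date range depends on the provider recording when each description last changed. The sketch below filters identifiers by an inclusive time window, in the spirit of OAI-PMH's `from` and `until` arguments. The change log as a map of identifier to timestamp and all names are assumptions for illustration, not the experimental DiGIR2 implementation.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of date-range selection over tracked changes.
class HarvestSketch {
    // Return identifiers whose last change falls within [from, until], both bounds inclusive.
    static List<String> changedBetween(Map<String, Instant> lastModified, Instant from, Instant until) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastModified.entrySet()) {
            Instant t = e.getValue();
            if (!t.isBefore(from) && !t.isAfter(until)) ids.add(e.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Assumed change log: identifier -> time of last change recorded by the synchronizer.
        Map<String, Instant> lastModified = new LinkedHashMap<>();
        lastModified.put("urn:lsid:example.org:birds:1", Instant.parse("2006-01-15T00:00:00Z"));
        lastModified.put("urn:lsid:example.org:birds:2", Instant.parse("2006-03-01T00:00:00Z"));
        lastModified.put("urn:lsid:example.org:birds:3", Instant.parse("2006-04-10T00:00:00Z"));
        System.out.println(changedBetween(lastModified,
                Instant.parse("2006-03-01T00:00:00Z"), Instant.parse("2006-04-30T00:00:00Z")));
    }
}
```

Without change tracking there are no timestamps to filter on, so a harvester would have to fetch everything on each pass.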


Provenance and Trust

In a data publishing framework, triples can be seen as assertions made by some publisher.

When consuming graphs, clients must make trust decisions about whether to accept or reject a graph based on who is asserting it and what their intention is.

In future we may implement the ability to communicate a warrant with any graph. Warrants record who asserts a graph and what they intend by it.
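Since warrants are described only as a possible future feature, the following is purely a sketch of the idea: a warrant pairs a graph with its publisher and intention, and a client applies its own trust policy to it. The `Intention` values, the fields, and the acceptance rule are all hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of a warrant and a client-side trust decision; not a DiGIR2 feature.
class WarrantSketch {
    // Assumed intentions: the publisher either stands behind the graph or merely repeats it.
    enum Intention { ASSERTS, QUOTES }

    static final class Warrant {
        final String graphUri;
        final String publisher;
        final Intention intention;

        Warrant(String graphUri, String publisher, Intention intention) {
            this.graphUri = graphUri;
            this.publisher = publisher;
            this.intention = intention;
        }
    }

    // Example policy: accept only graphs that a trusted publisher actually asserts.
    static boolean accept(Warrant warrant, Set<String> trustedPublishers) {
        return warrant.intention == Intention.ASSERTS && trustedPublishers.contains(warrant.publisher);
    }

    public static void main(String[] args) {
        Set<String> trusted = Set.of("http://example.org/museum");
        Warrant w = new Warrant("http://example.org/graphs/birds",
                "http://example.org/museum", Intention.ASSERTS);
        System.out.println(accept(w, trusted) ? "accept" : "reject");
    }
}
```

Each client can substitute its own policy, since the warrant only communicates who asserts a graph and what they intend by it.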


Conclusion

Building a semantic data sharing community is more complicated than placing RDF files on a web server.

A data provider framework can make it easier for communities to provide consistent access to RDF graphs by standardizing the use of identifiers, synchronization, versioning, persistence, and basic data access protocols.
