Storing and Accessing Semantic Data

Semantic Data Access
Semantic CMS Community
Co-funded by the European Union
Copyright IKS Consortium
Part I: Foundations
  (1) Introduction of Content Management
  (2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
  (3) Knowledge Interaction and Presentation
  (4) Knowledge Representation and Reasoning
  (5) Semantic Lifting
  (6) Storing and Accessing Semantic Data
Part III: Methodologies
  (7) Requirements Engineering for Semantic CMS
  (8) Designing Semantic CMS
  (9) Semantifying your CMS
  (10) Designing Interactive Ubiquitous IS
www.iks-project.eu
What is this Lecture about?
- We have learned ...
  - ... which languages can be used to model knowledge.
  - ... how to extract knowledge from content in an automatic way (semantic lifting).
- We need a way ...
  - ... to store the extracted knowledge technically in an accessible way.
Outline
- Semantic Data
  - Semantic Web
  - RDF
- Semantic Data Storage
  - Triple Stores
- Semantic Data Access
  - SPARQL
  - RQL
  - API Calls
Semantic Data
- Stands for machine-understandable information
- Allows computers to interpret the data without user intervention
- Allows computers to act intelligently without being programmed for each task
Semantic Data
- Provides infrastructure to get practical results
  - Applications find further information by following existing relations (e.g. Eiffel Tower -> Paris -> France)
- Allows reasoning capabilities
  - Enables extraction of related information that is not directly linked
Semantic Web
- A classical generic description: "Web of data"
- Extends the World Wide Web by encouraging:
  - A common language for representing data, transformable to and from disparate sources such as relational databases, XML, etc. (RDF)
  - A common reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.)
  - Rules that enable applications to reason over the information (SWRL)
Semantic Web Layer Cake
[Figure: Semantic Web Layer Cake. Image source: http://www.w3.org/2007/03/layerCake.svg]
Semantic Web
- Many organizations publish their data in different domains
  - Media
  - Geographic
  - Government
  - …
- The whole set contains approximately 30 billion triples
- One of the largest collections is DBpedia
  - Semantified version of Wikipedia
- Example:
  - Obtain cities of China that have a population over 20 million
- Needs efficient storage and querying of semantic data
Representation of Semantic Data
- RDF
  - The common data format
  - An abstract model with several serialization formats
  - Consists of statements, referred to as triples, of the form (subject, predicate, object), where
    - Subject: any resource identifier
    - Predicate: a resource identifier of any property
    - Object: either a resource identifier or a literal value
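The triple form described above can be sketched as a small Java data type. This is only an illustration, not the API of any RDF library: the names `Node`, `Iri`, `Literal`, `Triple` and `TripleDemo` are assumptions introduced here, and the `http://example.org/` identifiers are invented sample data.

```java
import java.util.List;

// Sketch only: all type names here are illustrative (requires Java 16+ for records).
interface Node { String value(); }

// A resource identifier (URI/IRI).
record Iri(String value) implements Node { }

// A literal value, kept in its lexical (string) form.
record Literal(String value) implements Node { }

// One RDF statement: subject and predicate are identifiers,
// the object is either an identifier or a literal.
record Triple(Iri subject, Iri predicate, Node object) {
    boolean hasLiteralObject() { return object instanceof Literal; }
}

class TripleDemo {
    static final String EX = "http://example.org/";

    // One statement with a resource object and one with a literal object.
    static List<Triple> sample() {
        Iri tower = new Iri(EX + "EiffelTower");
        return List.of(
            new Triple(tower, new Iri(EX + "locatedIn"), new Iri(EX + "Paris")),
            new Triple(tower, new Iri(EX + "name"), new Literal("Eiffel Tower")));
    }

    public static void main(String[] args) {
        for (Triple t : sample()) {
            System.out.println(t.subject().value() + " " + t.predicate().value()
                + " " + t.object().value() + (t.hasLiteralObject() ? " (literal)" : ""));
        }
    }
}
```

Keeping resources and literals as separate types makes the "object is either a resource identifier or a literal" rule explicit in the type of the third field.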
Storing Semantic Data
- Need for specialized designs for triple collections
- Two modalities:
  - Relational databases
  - Triple stores
    - Mostly used for storage
    - Lots of implementations
    - They can also be RDB based
Triple Store
- A purpose-built database for the storage and retrieval of RDF data
  - Optimized for adding, removing and querying triples
  - Each triple in the triple store has the form (subject, predicate, object)
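The add, remove and query operations named above can be illustrated with a toy in-memory store. This is a hedged sketch: `TinyTripleStore`, `Statement` and the `ex:` identifiers are hypothetical names, terms are plain strings, and a real triple store would add indexes and persistence.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: a toy in-memory store, not a real triple store implementation.
class TinyTripleStore {
    // Terms are kept as plain strings to keep the example short.
    record Statement(String subject, String predicate, String object) { }

    private final Set<Statement> statements = new LinkedHashSet<>();

    // Returns false if the triple was already present.
    boolean add(String s, String p, String o) {
        return statements.add(new Statement(s, p, o));
    }

    // Returns false if the triple was not present.
    boolean remove(String s, String p, String o) {
        return statements.remove(new Statement(s, p, o));
    }

    // Pattern query: a null argument acts as a wildcard for that position.
    List<Statement> match(String s, String p, String o) {
        List<Statement> result = new ArrayList<>();
        for (Statement st : statements) {
            if ((s == null || s.equals(st.subject()))
                    && (p == null || p.equals(st.predicate()))
                    && (o == null || o.equals(st.object()))) {
                result.add(st);
            }
        }
        return result;
    }

    int size() { return statements.size(); }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris");
        store.add("ex:Paris", "ex:locatedIn", "ex:France");
        // Everything that is directly located in France:
        System.out.println(store.match(null, "ex:locatedIn", "ex:France"));
    }
}
```

The linear scan in `match` is the part a purpose-built store replaces with indexes over the subject, predicate and object positions.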
Considering XML Databases
- XML databases are existing storage systems for semi-structured data
  - Idea: transform RDF to XML and store it in XML databases
- Yet the XML data model is not exactly the same as the semantic data model
  - The XML data model is a tree-like structure
  - RDF data is represented as a graph without a hierarchy
Considering XML Databases
- XML databases are not suitable for storing and querying RDF
  - Only simple manipulations can be handled through XML query languages
  - RDF Schema processing and inference are not possible
  - The standard RDF/XML mapping is unsuitable
Monolithic approach for DB Based Triple Stores
- Generic representation for all RDF schemas
- Only two tables are used
  - Resources table
  - Triples table
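The two-table layout can be mimicked with in-memory collections to show the idea: every URI is stored once in a resources table, and the triples table holds only integer ids. This is a sketch under stated assumptions; `MonolithicStore` and its methods are hypothetical names, the prefixed terms are sample data, and literal values (the objvalue column) are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: mimics the two-table layout with in-memory collections.
class MonolithicStore {
    // "Resources" table: uri -> id (ids are assigned in insertion order, starting at 1).
    private final Map<String, Integer> resources = new LinkedHashMap<>();
    // "Triples" table: each row holds {predid, subid, objid}.
    private final List<int[]> triples = new ArrayList<>();

    // Returns the id of a URI, adding a row to the resources table if needed.
    int idOf(String uri) {
        Integer id = resources.get(uri);
        if (id == null) {
            id = resources.size() + 1;
            resources.put(uri, id);
        }
        return id;
    }

    void addTriple(String subject, String predicate, String object) {
        triples.add(new int[] { idOf(predicate), idOf(subject), idOf(object) });
    }

    int resourceCount() { return resources.size(); }

    int tripleCount() { return triples.size(); }

    // Counts triples with the given predicate by comparing integer ids only.
    long countByPredicate(String predicate) {
        Integer predId = resources.get(predicate);
        if (predId == null) {
            return 0;
        }
        return triples.stream().filter(row -> row[0] == predId).count();
    }

    public static void main(String[] args) {
        MonolithicStore store = new MonolithicStore();
        store.addTriple("ex:Hotel", "rdf:type", "rdfs:Class");
        store.addTriple("ex:hotel1", "rdf:type", "ex:Hotel");
        System.out.println(store.resourceCount() + " resources, "
            + store.tripleCount() + " triples");
    }
}
```

Because the schema never changes, any RDF schema fits into the same two tables; the cost is that every query has to join the triples table back to the resources table to recover URIs.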
Monolithic approach for DB Based Triple Stores

Resources table
id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   …rl

Triples table
predid  subid  objid  objvalue
6       2      1
5       3      7
5       1      8
5       9      2
3       9             Sunscale
Triple Stores
- Can be categorized into 3 categories:
  - In-memory triple stores
    - Used for certain operations like benchmarking, caching, etc.
  - Native triple stores
    - Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
  - Non-memory, non-native triple stores
    - Are built on third-party databases (Jena SDB, Kaon, …)
Functionalities provided by Triple Stores
- RDBMS support
- General RDF model access
- Query language support in the store, such as RQL and SPARQL
- Some stores provide:
  - Provenance: tracking of who said what
  - APIs for accessing the triple store over a network
- Very few stores provide:
  - Full-text search
  - Inference and rule languages
Example Triple Store implementations
- RDF Suite
  - Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
  - Based on an ORDBMS model
- Sesame
  - http://www.openrdf.org/
  - Relational databases (MySQL, PostgreSQL, Oracle)
- Jena
  - http://www.hpl.hp.com/semweb/jena2.htm
  - Relational databases (MySQL, PostgreSQL, Oracle)
- Virtuoso
  - http://virtuoso.openlinksw.com/
  - Native RDF Quad Storage (Physical Quads)
RDFSuite (ICS-Forth)*
* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses
How triples are stored and accessed in RDF Suite
- Separate tables are created to store resources
- Properties, subClasses, subProperties and instances
- Indices on attributes like URI, source and target
- Querying is possible through RQL
[Figure from *]
*Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
Sesame Architecture
- DBMS-independent API for accessing triple repositories
- SAIL (Storage And Inference Layer) API
  - A set of Java interfaces between the other modules and the repository
  - Abstracts from the actual storage mechanism
- Query Module
  - RQL support
- Different ways to communicate with clients
  - Through protocol handlers
*Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
SAIL API over PostgreSQL
- PostgreSQL
  - Object-relational DBMS
  - Supports sub-table relations between its tables, which are used to provide RDF Schema class and property subsumption
  - Individuals are represented in separate tables created for resources
  - Adding tables when the schema changes is difficult
SAIL API over MySQL
- MySQL
  - The database schema does not change when the RDFS changes
  - Has an advantage where the RDFS is unstable
Jena2 Architecture
[Figure from *]
*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
Jena2
- Jena2
  - Denormalized schema
    - Avoids unnecessary joins by storing URIs and literals directly in the statement table
  - Multiple statement tables
    - Better locality and caching
  - Property tables
Normalized vs Denormalized
Tables
Property Tables

Triple Store Only
Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male

Person Property Table
ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male

Triple Store
Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8
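The split shown in the tables above can be sketched in a few lines: statements whose property has a dedicated column go into one row per subject, and everything else stays in the generic triple table. This is an illustrative sketch and not how Jena2 is implemented; `PropertyTableDemo`, `Stmt` and the fixed column set are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: splits statements into a property table plus a residual triple table.
class PropertyTableDemo {
    record Stmt(String subject, String property, String object) { }

    // Columns of the (hypothetical) person property table.
    static final Set<String> COLUMNS = Set.of("name", "age", "gender");

    // subject -> (column -> value); one entry corresponds to one table row.
    final Map<String, Map<String, String>> propertyTable = new LinkedHashMap<>();
    // Statements whose property has no column stay in the generic triple table.
    final List<Stmt> tripleTable = new ArrayList<>();

    void add(Stmt st) {
        if (COLUMNS.contains(st.property())) {
            propertyTable.computeIfAbsent(st.subject(), k -> new LinkedHashMap<>())
                         .put(st.property(), st.object());
        } else {
            tripleTable.add(st);
        }
    }

    // A subset of the statements from the slide.
    static PropertyTableDemo sample() {
        PropertyTableDemo db = new PropertyTableDemo();
        db.add(new Stmt("person1", "name", "Alice"));
        db.add(new Stmt("person1", "age", "32"));
        db.add(new Stmt("person1", "twinOf", "person2"));
        db.add(new Stmt("person2", "name", "Bob"));
        db.add(new Stmt("person2", "age", "35"));
        db.add(new Stmt("person2", "gender", "male"));
        db.add(new Stmt("person2", "friendOf", "person8"));
        return db;
    }

    public static void main(String[] args) {
        PropertyTableDemo db = sample();
        System.out.println(db.propertyTable);
        System.out.println(db.tripleTable);
    }
}
```

Reading all of a person's name, age and gender then touches a single row instead of three triple rows, which is the locality benefit the property table is meant to provide.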
Jena Persistence Options
- SDB
  - Scalable storage and query for RDF
  - Specifically designed for SPARQL support
  - Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
  - Scales to graphs of 100 million triples
Jena Persistence Options
- TDB
  - Provides large-scale storage and query of RDF datasets using a pure Java engine
  - Supports SPARQL
  - A non-transactional, faster database solution for use by a single system
  - Scales well beyond SDB and is simpler to set up
Virtuoso
- General-purpose RDBMS with extensive RDF adaptations
- RDF data is stored as RDF quads, i.e. it supports RDF with named graphs
  - i.e. (graph, subject, predicate, object) tuples
  - The columns are G for graph, P for predicate, S for subject and O for object
Querying Semantic Data
- Semantic data can be queried from triple stores by
  - Various query languages
    - SPARQL
      - Different endpoints provided
    - RQL
    - RDQL
    - SeRQL
    - …
  - API calls
    - Through proprietary APIs of different projects
  - Linked Data
SPARQL
- Is an RDF query language
  - Standardized by the W3C
  - Similar in concept to SQL for databases
    - Syntactically resembles SQL
    - Queries RDF graphs instead of databases
SPARQL Endpoints
- Provide functionality to query the knowledge base via the SPARQL language
- Accept queries and return results over HTTP
- Query results can be in different formats such as
  - RDF
  - XML
  - HTML
  - JSON
  - CSV
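How a query reaches an endpoint over HTTP can be sketched with the JDK's own HTTP classes. The code only builds the request and does not send it. `SparqlRequestDemo` and `build` are hypothetical names, and the query text is an assumed formulation of the "cities of China" example: the `dbo:` and `dbr:` terms are a guess at suitable DBpedia vocabulary and may need adjusting against the live data.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch only: builds (but does not send) a SPARQL query request over HTTP GET.
class SparqlRequestDemo {
    // Assumed example query; property and class names are not taken from the slides.
    static final String QUERY = String.join("\n",
        "PREFIX dbo: <http://dbpedia.org/ontology/>",
        "PREFIX dbr: <http://dbpedia.org/resource/>",
        "SELECT ?city ?population WHERE {",
        "  ?city a dbo:City ;",
        "        dbo:country dbr:China ;",
        "        dbo:populationTotal ?population .",
        "  FILTER (?population > 20000000)",
        "}");

    // The query travels URL-encoded in the "query" parameter;
    // the Accept header selects the result format (JSON here).
    static HttpRequest build(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://dbpedia.org/sparql", QUERY);
        System.out.println(request.uri());
        // To execute it, pass the request to java.net.http.HttpClient#send.
    }
}
```

Changing the Accept header is how a client would ask the same endpoint for one of the other result formats listed above, provided the endpoint supports it.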

Semantic Data Access With API Calls
- Open source projects provide APIs to manipulate RDF data
  - Jena
  - Apache Clerezza
  - Sesame
  - JRDF
Jena
- Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
  - Model to represent graphs
  - CRUD methods for triples
  - Querying methods for existing resources
- See the next slide for the code snippet…
Jena Code Snippet
// Package names are those of Apache Jena 3 and later;
// Jena2-era releases used the com.hp.hpl.jena prefix instead.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.VCARD;

String personURI = "http://somewhere/JohnSmith";
String givenName = "John";
String familyName = "Smith";
String fullName = givenName + " " + familyName;
// create an empty Model which represents an RDF graph
Model model = ModelFactory.createDefaultModel();
// create the resource which will produce the triples in the next slide
Resource johnSmith
    = model.createResource(personURI)
        .addProperty(VCARD.FN, fullName)
        // the structured name hangs off a blank node
        .addProperty(VCARD.N,
            model.createResource()
                .addProperty(VCARD.Given, givenName)
                .addProperty(VCARD.Family, familyName));
Jena
- Triples created by the code snippet in the previous slide:
  (<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
  (<http://somewhere/JohnSmith>, VCARD.N, _)
  (_, VCARD.Given, "John")
  (_, VCARD.Family, "Smith")
- Note that the _ symbol represents a blank node
Apache Clerezza
- Provides an API that is independent of the different triple stores it supports
- Its API provides a model to represent RDF graphs and manipulate those graphs
- Also provides a SPARQL endpoint to query the stored knowledge
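Apache Clerezza Code Snippet
- Simple code snippet adding two triples to the graph:
// Imports assume the Clerezza RDF core API (org.apache.clerezza.rdf.core)
import org.apache.clerezza.rdf.core.LiteralFactory;
import org.apache.clerezza.rdf.core.MGraph;
import org.apache.clerezza.rdf.core.UriRef;
import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
import org.apache.clerezza.rdf.core.impl.TripleImpl;

String base = "http://www.example.org#";
UriRef john = new UriRef(base + "JohnSmith");
MGraph g = new SimpleMGraph();
// (JohnSmith, rdf:type, foaf:Person)
g.add(new TripleImpl(john,
    new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    new UriRef("http://xmlns.com/foaf/0.1/Person")));
// (JohnSmith, vcard:FN, "John")
g.add(new TripleImpl(john,
    new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
    LiteralFactory.getInstance().createTypedLiteral("John")));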
Linked Data
- Interrelated datasets on the Web so that computers can explore them
- Has a standard format to be accessed and managed
- Provides integration and reasoning on a huge amount of data on the Web
Linked Data
- Four famous principles of Linked Data presented by Tim Berners-Lee
  - Use URIs as names of things
  - Use HTTP URIs so that people can dereference those names
  - When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL)
  - Provide links to other URIs to make discovery of related data possible
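Dereferencing an HTTP URI for its RDF description usually relies on HTTP content negotiation, which can be sketched as follows. The code builds the request without sending it. `DereferenceDemo` and `rdfRequest` are hypothetical names, the DBpedia URI is just an example resource, and whether a given server honours the requested media types is an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a request that dereferences an HTTP URI.
class DereferenceDemo {
    // Asks the server for an RDF serialization of the resource rather than an HTML page.
    static HttpRequest rdfRequest(String resourceUri) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
            .header("Accept", "text/turtle, application/rdf+xml;q=0.9")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // Example URI chosen for illustration.
        HttpRequest request = rdfRequest("http://dbpedia.org/resource/Paris");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

The links found in the returned RDF are themselves HTTP URIs, so a client can repeat the same request on them to discover related data.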
Linking Open Data Project
- Is a W3C SWEO project
- Aims to make data freely available to everyone
- Aims to publish open datasets as RDF and set semantic relationships between them
  - Serves information in a machine-readable format
  - Enriches content
  - Reduces duplication
- Linked datasets are increasing rapidly
  - A large number of datasets are linked already
Linked Datasets As of October 2008
Linked Datasets As of September 2010
Linked Datasets As of 2011
Access Data In The Cloud
- Follow the RDF links representing the "things"
- SPARQL endpoints
- Ready-to-use software to discover linked data (see the next slide)
Linked Data Applications
- Lots of applications on top of the linked data
  - RDF Browsers
    - Tabulator
    - Marbles
    - OpenLink RDF Browser
    - …
  - RDF Crawlers
  - Just google
  - Also see the following link containing a number of linked data applications:
    - http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications
Available SPARQL Endpoints
- http://dbpedia.org/sparql
- http://www4.wiwiss.fu-berlin.de/dblp/
- To find SPARQL endpoints that provide data about a certain URI, see
  - http://void.rkbexplorer.com/endpoint-search/
- See also a list of live SPARQL endpoints
  - http://www.w3.org/wiki/SparqlEndpoints
References
- http://www.w3.org/TR/rdf-sparql-query
- http://jena.sourceforge.net/tutorial/RDF_API/index.html
- http://www.slideshare.net/ldodds/sparql-tutorial
- http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semanticweb?src=related_normal&rel=1702851
- http://www.cambridgesemantics.com/2008/09/sparql-by-example
- http://linkeddata-specs.info/
- http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
- Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
- Jeen Broekstra, Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
- Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases
- http://jena.sourceforge.net/DB/index.html
- http://virtuoso.openlinksw.com/