RDF Databases

By:
Chris Halaschek
Outline



Motivation / Requirements
Storage Issues
Sesame






General Introduction
Architecture
Scalability
RQL Introduction
Demo
Future Directions
Motivation

Having metadata available is not enough


Need tools to process, transform, and reason with
the information
Need a way to store the metadata and
interact with it
Requirements



Scalable
Good performance
Useful query language
Storage Issues

How to store the data?

In relational database as tables



Querying requires many joins…costly
Triples
Native graph structure

Querying requires graph traversals…need efficient
algorithms
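A minimal sketch of why the relational approach leads to joins, assuming a single three-column table of statements. The table name, column names, and sample data are hypothetical and are not taken from Sesame's actual schema. Following a two-step path through the graph already needs one self-join, and each further step adds another.

```python
import sqlite3

# Hypothetical layout: every RDF statement is one row of (subject, predicate, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("picasso", "paints", "guernica"),
        ("guernica", "technique", "oil on canvas"),
        ("rodin", "sculpts", "the thinker"),
    ],
)

# Painter -> painting -> technique: two edges, so the table is joined with itself once.
rows = conn.execute(
    """
    SELECT t1.s, t1.o, t2.o
    FROM triples AS t1
    JOIN triples AS t2 ON t2.s = t1.o
    WHERE t1.p = 'paints' AND t2.p = 'technique'
    """
).fetchall()
print(rows)
```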
Sesame - Introduction



Open source RDF Schema-based
repository and querying facility
Developed as a research prototype by
Aidministrator Nederland bv
NLnet Foundation sponsors its further
development as open source software
Sesame - Introduction


Can handle RDF data in XML-serialized
RDF and N-Triples format
Can extract the contents of a Sesame
repository in XML-serialized RDF, N-Triples, and N3 format
Sesame – Architecture
Repository

Many options due to Repository Abstraction
Layer (RAL)




DBMS – relational, object-relational, etc
Existing RDF stores
RDF files
RDF network services
Repository Abstraction Layer
(RAL)


Interface that translates RDF-specific
methods to a specific DBMS
Defined by an RDF API

Created their own set of interfaces rather than
adopt or extend the existing RDF API proposal


Existing API targeted main memory model
Theirs offers specific operations that support RDF
Schema semantics (i.e. subsumption reasoning)
RAL Continued


Several of Sesame’s functional modules are
clients of the RAL
Problems:

Must read from repository – performance
decrease


Solution – selectively caching data in memory
For small repositories, all data can be cached
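The caching idea can be sketched as a thin layer that remembers answers from the repository. This is only an illustration: the class and method names are hypothetical and do not reflect Sesame's real interfaces, and the sketch ignores cache invalidation when data is added or removed.

```python
class SlowRepository:
    """Stand-in for a persistent store; counts how often it is read."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.reads = 0

    def statements_about(self, subject):
        self.reads += 1
        return [t for t in self._triples if t[0] == subject]


class CachingLayer:
    """Answers repeated lookups from memory instead of the repository."""

    def __init__(self, repository):
        self._repository = repository
        self._cache = {}

    def statements_about(self, subject):
        # Only the first request for a subject reaches the repository.
        if subject not in self._cache:
            self._cache[subject] = self._repository.statements_about(subject)
        return self._cache[subject]


repo = SlowRepository([("picasso", "paints", "guernica")])
layer = CachingLayer(repo)
first = layer.statements_about("picasso")
second = layer.statements_about("picasso")
print(repo.reads)
```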
Functional Modules

Interact with RAL

RQL query module
Evaluates RQL queries

RDF administration module
Allows uploading RDF data and schema information,
as well as deleting information

RDF export module
Allows extraction of schema and/or data from
repository
RQL Query Module

Proposed RQL:
Developed within the European IST project C-Web
Follow-up project by ICS at FORTH, in Greece
Adopts the syntax of OQL

Sesame’s implementation of RQL is slightly different
from the proposed RQL
Better compliance with W3C specifications
Support for optional domain and range restrictions

Queries are translated into sets of calls to the RAL
Note: Also supports RDQL – based on SquishQL
RQL Query Module
Admin Module

Main functions:





Add RDF data/schema information
Clear repository
Retrieves information from an RDF(S) source
and parses it using the SiRPAC RDF parser
Parser delivers information to the admin module
in statement form – (S,P,O)
Module checks statements for consistency and
then inserts the data
RDF Export Module


Exports the contents of a repository formatted
in XML-serialized RDF
Supplies a basis for using Sesame in
combination with other RDF tools
Communication with Sesame

Multiple options for various contexts




HTTP
RMI
SOAP
Intermediaries between the functional
modules and their clients
Sesame – Architecture
Sesame - Scalability

Performance Tests




Uploaded and queried a collection of nouns from
WordNet – 400,000 RDF statements
Performed on Sun UltraSPARC 5, 256 MB RAM
Used Java Servlets running on a web server to
communicate over HTTP
PostgreSQL version 7.1.2 repository
Scalability Continued

Uploading nouns



94 minutes
71 statements per second
Querying was much slower than expected

Due to distributed storage over multiple tables

Retrieving data required doing many joins
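As a quick check of the upload figures above, the reported rate follows directly from the statement count and the elapsed time. The variable names are only for illustration.

```python
# Figures taken from the performance test: 400,000 statements in 94 minutes.
statements = 400_000
minutes = 94

# Convert minutes to seconds and divide to get the upload throughput.
rate = statements / (minutes * 60)
print(round(rate))  # statements per second, rounded to the nearest integer
```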
Sesame’s Future


Migration of Sesame to alternate repositories
to boost performance
DAML + OIL support
RQL Introduction

Museum schema example
RQL - Syntax

Query typically built upon three clauses

Select
Projection over query results

From
Bind variables to specific locations in graph model

Where
Optional – constraint on values of variables in the from
clause
RQL - Example
select X, @P
from {X} @P {Y}
where Y like "Pablo"




X and Y are bound to nodes
@P is bound to a connecting edge; the @ prefix signifies that the
variable is bound to properties
The $ prefix signifies classes
http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
RQL - Namespaces



In RDF, nodes and edges are identified by
URIs
Can be very long
Namespace abbreviation mechanism

Extra clause


using namespace
cult = http://www.icom.com/schema.rdf#
Simply type: cult:paints
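The abbreviation mechanism amounts to a prefix lookup. A minimal sketch, assuming a plain dictionary of prefixes; the `expand` helper is hypothetical and is not part of Sesame or RQL.

```python
# Prefix table mirroring the "using namespace" clause from the slide.
NAMESPACES = {"cult": "http://www.icom.com/schema.rdf#"}


def expand(name, namespaces=NAMESPACES):
    """Replace a known prefix with its full namespace URI."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    # Unknown prefixes (including full URIs such as http:...) pass through unchanged.
    return name


print(expand("cult:paints"))
```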
RQL – Path Expressions

Specify a linear path through the graph
select PAINTER, PAINTING, TECH
from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}
using namespace cult = http://www.icom.com/schema.rdf#

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
RQL – Querying Schema

Retrieving the class of a resource
select X, $X, Y
from {X : $X} cult:paints {Y}
using namespace cult = http://www.icom.com/schema.rdf#

Variable $X is matched to the class of the
resource value of X

http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum
RQL – Querying Schema

Constraining resources to a schema
select X, Y
from {X : cult:Cubist } cult:paints {Y}
using namespace cult = http://www.icom.com/schema.rdf#
RQL – Standard Functions




Class (also Property)
subClassOf (also subPropertyOf)
typeOf
In all of the above, use ^ for only direct descendants
(e.g. subClassOf^( cult:Painter ) )
RQL – subClassOf

Example:
select X, @P, Y
from {X} @P {Y}
where X in subClassOf^( cult:Painter )
using namespace cult = http://www.icom.com/schema.rdf#
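The difference between subClassOf and subClassOf^ can be illustrated on a small class hierarchy. This is a sketch under assumptions: the hierarchy below is invented for the example (only cult:Painter and cult:Cubist appear in the slides), and each class is given a single parent for simplicity, which RDF Schema does not require.

```python
# Hypothetical hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "cult:Painter": "cult:Artist",
    "cult:Cubist": "cult:Painter",
    "cult:AnalyticCubist": "cult:Cubist",  # invented extra level for illustration
}


def direct_subclasses(cls):
    """Analogue of subClassOf^: only immediate children."""
    return {child for child, parent in SUBCLASS_OF.items() if parent == cls}


def all_subclasses(cls):
    """Analogue of subClassOf: children at any depth."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for child in direct_subclasses(current):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(direct_subclasses("cult:Painter"))
print(all_subclasses("cult:Painter"))
```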
RQL – Advanced Queries

Set Operators




Union, Intersection, Difference
Logical Operators
Domain and Range Constraints
Comprehensive List:
http://sesame.aidministrator.nl/publications/rql-tutorial.html
Future of RDF Databases


Standard query language
Improved storage structures

Native graph model
References / Links



Sesame:
http://sesame.aidministrator.nl/
NLnet Foundation:
http://www.nlnet.nl/
Original Specifications of RQL:
http://139.91.183.30:9090/RDF/RQL