RDF Databases
By: Chris Halaschek

Outline
- Motivation / Requirements
- Storage Issues
- Sesame
  - General Introduction
  - Architecture
  - Scalability
- RQL
  - Introduction
  - Demo
- Future Directions

Motivation
- Having metadata available is not enough
- Need tools to process, transform, and reason with the information
- Need a way to store the metadata and interact with it

Requirements
- Scalable
- Good performance
- Useful query language

Storage Issues
- How to store the data?
- In a relational database, as tables
  - Querying requires many joins (costly)
- Triples
- Native graph structure
  - Querying requires graph traversals (needs efficient algorithms)

Sesame - Introduction
- Open source RDF Schema-based repository and querying facility
- Developed as a research prototype by Aidministrator Nederland bv
- NLnet Foundation sponsors its further development as open source software

Sesame - Introduction Continued
- Can handle RDF data in XML-serialized RDF and N-Triples format
- Can extract the contents of a Sesame repository in XML-serialized RDF, N-Triples, and N3 format

Sesame - Architecture
- Repository
  - Many options due to the Repository Abstraction Layer (RAL)
    - DBMS: relational, object-relational, etc.
    - Existing RDF stores
    - RDF files
    - RDF network services

Repository Abstraction Layer (RAL)
- Interface that translates RDF-specific methods into calls to a specific DBMS
- Defined by an RDF API
- Created their own set of interfaces rather than adopt or extend the existing RDF API proposal
  - Existing API targeted a main-memory model
  - Theirs offers specific operations that support RDF Schema semantics (e.g. subsumption reasoning)

RAL Continued
- Several of Sesame's functional modules are clients of the RAL
- Problem: data must be read from the repository, which decreases performance
- Solution: selectively caching data in memory
  - For small repositories, all data can be cached

Functional Modules
- Interact with the RAL
- RQL query module
  - Evaluates RQL queries
- RDF administration module
  - Allows uploading RDF data and schema information, as well as deleting information
- RDF export module
  - Allows extraction of schema and/or data from the repository

RQL Query Module
- Proposed RQL:
  - Developed within the European IST project C-Web and its follow-up project, by ICS at FORTH, in Greece
  - Adopts the syntax of OQL
- Sesame's implementation of RQL is slightly different from the proposed RQL
  - Better compliance with W3C specifications
  - Support for optional domain and range restrictions
- Queries are translated into sets of calls to the RAL
- Note: Also supports RDQL, which is based on SquishQL

Admin Module
- Main functions:
  - Add RDF data/schema information
  - Clear repository
- Retrieves information from an RDF(S) source and parses it using the SiRPAC RDF parser
- Parser delivers information to the admin module in statement form: (S, P, O)
- Module checks statements for consistency and then inserts the data

RDF Export Module
- Exports the contents of a repository formatted in XML-serialized RDF
- Supplies a basis for using Sesame in combination with other RDF tools

Communication with Sesame
- Multiple options for various contexts
  - HTTP
  - RMI
  - SOAP
- Intermediaries between the functional modules and their clients

Sesame - Scalability
- Performance tests
  - Uploaded and queried a collection of nouns from WordNet: 400,000 RDF statements
  - Performed on a Sun UltraSPARC 5, 256 MB RAM
  - Used Java Servlets running on a web server to communicate over HTTP
  - PostgreSQL version 7.1.2 repository

Scalability Continued
- Uploading nouns
  - 94 minutes
  - 71 statements per second
- Querying was much slower than expected
  - Due to distributed storage over multiple tables
  - Retrieving data required doing many joins

Sesame's Future
- Migration of Sesame to alternate repositories to boost performance
- DAML+OIL support

RQL Introduction
- Museum schema example

RQL - Syntax
- Query typically built upon three clauses
  - Select: projection over query results
  - From: bind variables to specific locations in the graph model
  - Where: optional; constraint on values of variables in the from clause

RQL - Example
    select X, @P
    from {X} @P {Y}
    where Y like "Pablo"
- X and Y are bound to nodes
- @P is bound to a connecting edge
  - @ prefix signifies the variable is bound to properties
  - $ prefix signifies classes
- http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum

RQL - Namespaces
- In RDF, nodes and edges are identified by URIs
  - Can be very long
- Namespace abbreviation mechanism
  - Extra clause: using namespace cult = http://www.icom.com/schema.rdf#
  - Simply type: cult:paints

RQL - Path Expressions
- Specify a linear path through the graph
    select PAINTER, PAINTING, TECH
    from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}
    using namespace cult = http://www.icom.com/schema.rdf#
- http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum

RQL - Querying Schema
- Retrieving the class of a resource
    select X, $X, Y
    from {X : $X} cult:paints {Y}
    using namespace cult = http://www.icom.com/schema.rdf#
- Variable $X is matched to the class of the resource value of X
- http://sesame.aidministrator.nl/sesame/actionFrameset.jsp?repository=museum

RQL - Querying Schema Continued
- Constraining resources to a schema
    select X, Y
    from {X : cult:Cubist} cult:paints {Y}
    using namespace cult = http://www.icom.com/schema.rdf#

RQL - Standard Functions
- Class (also Property)
- subClassOf (also subPropertyOf)
- typeOf
- In all of the above, use ^ for direct descendants only (e.g. subClassOf^( cult:Painter ))

RQL - subClassOf
- Example:
    select X, @P, Y
    from {X} @P {Y}
    where X in subClassOf^( cult:Painter )
    using namespace cult = http://www.icom.com/schema.rdf#

RQL - Advanced Queries
- Set operators
  - Union, Intersection, Difference
- Logical operators
- Domain and range constraints
- Comprehensive list: http://sesame.aidministrator.nl/publications/rql-tutorial.html

Future of RDF Databases
- Standard query language
- Improved storage structures
  - Native graph model

References / Links
- Sesame: http://sesame.aidministrator.nl/
- NLnet Foundation: http://www.nlnet.nl/
- Original Specifications of RQL: http://139.91.183.30:9090/RDF/RQL
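
Appendix - Sketch of Triple Matching

The path-expression and subClassOf slides can be illustrated with a minimal in-memory sketch. This is not Sesame's implementation or its RAL API: the sample triples, the `ex:` resources, the `cult:Artist` class, and the helper names `match`, `path_query`, `subclasses_of`, and `instances_of` are all assumptions made for illustration. It shows statements stored as (S, P, O) triples, a two-step path evaluated as nested pattern matches (the analogue of the joins a relational store performs), and the difference between direct subclasses (the `^` form) and the transitive closure.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data, loosely modeled on the museum example.
TRIPLES: Set[Triple] = {
    ("cult:Painter", "rdfs:subClassOf", "cult:Artist"),
    ("cult:Cubist", "rdfs:subClassOf", "cult:Painter"),
    ("ex:picasso", "rdf:type", "cult:Cubist"),
    ("ex:picasso", "cult:paints", "ex:guernica"),
    ("ex:guernica", "cult:technique", "oil on canvas"),
    ("ex:rembrandt", "rdf:type", "cult:Painter"),
    ("ex:rembrandt", "cult:paints", "ex:nightwatch"),
    ("ex:nightwatch", "cult:technique", "oil on canvas"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield stored triples matching the pattern; None is a wildcard."""
    for t in sorted(TRIPLES):  # sorted only to make the output order stable
        if (s is None or t[0] == s) and (p is None or t[1] == p) \
                and (o is None or t[2] == o):
            yield t


def path_query() -> List[Triple]:
    """Rough analogue of:
    select PAINTER, PAINTING, TECH
    from {PAINTER} cult:paints {PAINTING}. cult:technique {TECH}
    """
    results = []
    for painter, _, painting in match(p="cult:paints"):
        # Second step of the path: follow cult:technique from the painting.
        for _, _, tech in match(s=painting, p="cult:technique"):
            results.append((painter, painting, tech))
    return results


def subclasses_of(cls: str, direct_only: bool = False) -> Set[str]:
    """Direct subclasses (like subClassOf^) or all descendants."""
    direct = {s for s, _, _ in match(p="rdfs:subClassOf", o=cls)}
    if direct_only:
        return direct
    found, frontier = set(direct), list(direct)
    while frontier:
        for s, _, _ in match(p="rdfs:subClassOf", o=frontier.pop()):
            if s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls: str) -> Set[str]:
    """Instances of cls, including instances of its subclasses (subsumption)."""
    classes = {cls} | subclasses_of(cls)
    return {s for c in classes for s, _, _ in match(p="rdf:type", o=c)}


if __name__ == "__main__":
    for row in path_query():
        print(row)
    print(sorted(subclasses_of("cult:Artist", direct_only=True)))
    print(sorted(subclasses_of("cult:Artist")))
    print(sorted(instances_of("cult:Painter")))
```

Each call to `match` scans every triple, so the nested loop in `path_query` does one scan per painting. That is the same cost pattern the scalability slides attribute to joins over multiple tables, and it is why indexing or native graph storage matters at 400,000 statements.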