Features of an Enterprise-ready Triple Store Ben Szekely June, 2006

© 2006 IBM Corporation
IBM Internet Technology
Most examples of RDF triple stores focus on specific difficult problems
• Focused on inference or standards
• Preoccupied with “Billions of Triples”
• Little thought given to application programming model
• Not multi-user (limited security)
Features of an Enterprise-ready Triple Store – Metadata and Ontologies Workshop
Boca Overview – Multi-user, distributed enterprise RDF repository
• Selective RDF replication from server to client machines
• Security, including named-graph-based RDF access control
• Audit trails of changes to data within named graphs
• Near real-time event notifications
• Sophisticated programming model
Named Graphs
• A named graph is the logical unit of RDF storage in Boca.
• Each triple exists in exactly one named graph
  – A triple that appears in more than one named graph is stored as a separate copy in each graph.
  – Adding and removing triples is done in the context of a named graph
• Each named graph has a metadata graph, containing information such as ACLs
• Named graphs can be exposed via LSIDs, URLs, Web Services
• Named Graph applications
  – LSID metadata
  – Workflow documents
  – Atom feeds
  – FOAF profiles
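
The named-graph rule above can be illustrated with a minimal in-memory sketch. This is not Boca's API: the class `NamedGraphStore`, its methods, and the `ex:` identifiers are hypothetical names chosen for illustration. It only shows that adds and removes always happen in the context of one graph, and that the same triple in two graphs is two independent copies.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store (hypothetical, not Boca's API): triples are kept per named graph."""

    def __init__(self):
        # graph URI -> set of (subject, predicate, object) triples
        self.graphs = defaultdict(set)

    def add(self, graph_uri, triple):
        # Every add happens in the context of exactly one named graph.
        self.graphs[graph_uri].add(triple)

    def remove(self, graph_uri, triple):
        # Removal only touches the copy held by this graph.
        self.graphs[graph_uri].discard(triple)

    def count(self, triple):
        """Number of stored copies of the triple across all graphs."""
        return sum(triple in triples for triples in self.graphs.values())


store = NamedGraphStore()
t = ("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:graphA", t)
store.add("ex:graphB", t)     # same triple, second graph: a second copy
store.remove("ex:graphA", t)  # the copy in ex:graphB is unaffected
```

After the removal, the triple is still present in `ex:graphB`, which is the behaviour the slide describes as the triple existing once per graph.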
Underlying Technologies
• Relational Database (DB2, Oracle, MySQL)
  – RDF triples stored in a table (subject, predicate, object, graphid)
  – Space saved by normalizing URIs and strings to integer ids
  – Extra tables for history, ACLs, replication
• J2EE (Jetty, Tomcat, WebSphere)
  – Jetty: standalone server; check out from CVS and run for testing
  – WebSphere Application Server (WAS): enterprise-ready Web application server for real deployment
• JMS Server (ActiveMQ, WebSphere MQ)
  – Pub-sub messaging used for real-time notifications of triple updates
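
The relational layout described above can be sketched with SQLite from the Python standard library. The table and column names (`nodes`, `statements`) and the helper functions are assumptions made for illustration; the slides only state that triples are stored as (subject, predicate, object, graphid) and that URIs and strings are normalized to integer ids. The sketch also ignores the distinction between URIs and literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per distinct URI or string (hypothetical table name).
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, value TEXT UNIQUE NOT NULL);
    -- Statements refer to nodes by integer id only.
    CREATE TABLE statements (
        subject   INTEGER NOT NULL REFERENCES nodes(id),
        predicate INTEGER NOT NULL REFERENCES nodes(id),
        object    INTEGER NOT NULL REFERENCES nodes(id),
        graphid   INTEGER NOT NULL REFERENCES nodes(id),
        UNIQUE (subject, predicate, object, graphid)
    );
""")


def node_id(value):
    """Return the integer id for a URI or string, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO nodes (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM nodes WHERE value = ?", (value,)).fetchone()
    return row[0]


def add_statement(s, p, o, g):
    """Store one triple in the context of graph g."""
    conn.execute(
        "INSERT OR IGNORE INTO statements VALUES (?, ?, ?, ?)",
        (node_id(s), node_id(p), node_id(o), node_id(g)),
    )


add_statement("ex:alice", "foaf:knows", "ex:bob", "ex:graphA")
add_statement("ex:alice", "foaf:knows", "ex:carol", "ex:graphA")
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
statement_count = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
```

Repeated values such as `ex:alice` and `foaf:knows` are stored once in `nodes` and referenced by id from each statement row, which is where the space saving comes from.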
Replication
• Boca clients have a persistent local RDF store that mirrors a subset of the triples on the Boca server.
• Replicated subset specified by:
  – Triple patterns; e.g.
    (<http://tdwg.org/meetings/GUID-2#>, <http://tdwg.org/preds/hasParticipant>,*)
  – Named graph URIs
  – Triple patterns within named graphs
• When a replication is initiated, the service computes what has changed in the subset based on pattern and graph subscriptions.
• Replication can work as a background process on the client, or be explicitly initiated.
• Applications can query/write against graphs in the local and server models.
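
How a replicated subset might be selected from the three kinds of subscription above can be sketched as follows. The function names, the `*` wildcard handling, and the `example.org` data are assumptions for illustration, not Boca's implementation; only the triple pattern itself is taken from the slide.

```python
WILDCARD = "*"


def matches(pattern, triple):
    """True if every non-wildcard position of the pattern equals the triple's."""
    return all(p == WILDCARD or p == t for p, t in zip(pattern, triple))


def replicated_subset(quads, patterns=(), graphs=(), graph_patterns=()):
    """Select (s, p, o, graph) quads covered by any subscription.

    patterns:       triple patterns applied across all graphs
    graphs:         named graph URIs replicated in full
    graph_patterns: (graph URI, triple pattern) pairs
    """
    selected = set()
    for s, p, o, g in quads:
        triple = (s, p, o)
        if (
            g in graphs
            or any(matches(pat, triple) for pat in patterns)
            or any(g == pg and matches(pat, triple) for pg, pat in graph_patterns)
        ):
            selected.add((s, p, o, g))
    return selected


MEETING = "http://tdwg.org/meetings/GUID-2#"
HAS_PARTICIPANT = "http://tdwg.org/preds/hasParticipant"
G1 = "http://example.org/graphs/g1"  # hypothetical graph URIs
G2 = "http://example.org/graphs/g2"

server_quads = {
    (MEETING, HAS_PARTICIPANT, "http://example.org/people/ann", G1),
    (MEETING, "http://example.org/preds/title", "a title", G1),
    ("http://example.org/people/ann", "http://example.org/preds/name", "Ann", G2),
}

subset = replicated_subset(
    server_quads,
    patterns=[(MEETING, HAS_PARTICIPANT, WILDCARD)],
    graphs=[G2],
)
```

Here the pattern subscription selects the participant statement from the first graph and the graph subscription selects everything in the second; the title statement is not replicated.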
Notification – maintaining the replica in real-time
• Updates to named graphs on server are published in near real-time to clients.
• Local replicas can be kept up-to-date between replications.
• Notification is central to distributed RDF applications
  – Ex: workflow, collaboration
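
The publish/subscribe idea above can be sketched in-process, without a JMS server. `GraphNotifier`, `apply_update`, and the `ex:` identifiers are hypothetical names; a real deployment would deliver these messages through a JMS topic rather than direct callbacks.

```python
from collections import defaultdict


class GraphNotifier:
    """Toy in-process stand-in for a pub-sub topic per named graph."""

    def __init__(self):
        # graph URI -> list of subscriber callbacks
        self.subscribers = defaultdict(list)

    def subscribe(self, graph_uri, callback):
        self.subscribers[graph_uri].append(callback)

    def publish(self, graph_uri, added=(), removed=()):
        # Deliver the change set to every subscriber of this graph.
        for callback in self.subscribers[graph_uri]:
            callback(graph_uri, set(added), set(removed))


# Client side: a local replica kept current between full replications.
local_replica = defaultdict(set)


def apply_update(graph_uri, added, removed):
    local_replica[graph_uri] -= removed
    local_replica[graph_uri] |= added


notifier = GraphNotifier()
notifier.subscribe("ex:workflow", apply_update)

t1 = ("ex:task1", "ex:status", "started")
t2 = ("ex:task1", "ex:status", "done")
notifier.publish("ex:workflow", added=[t1])
notifier.publish("ex:workflow", added=[t2], removed=[t1])
notifier.publish("ex:other", added=[t1])  # no subscriber: replica unchanged
```

Only updates to subscribed graphs reach the replica, which mirrors the way notifications keep a replicated subset current without a full replication pass.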
Access Controls
• Boca users can have the following system-wide permissions:
  – canInsertNamedGraphs: a user must have this permission in order to create a new named graph (i.e. insert statements into a graph that does not yet exist in the system)
• Boca users can have the following per-named-graph permissions (these also apply to the system graph):
  – canRead: a user with this permission may view the triples in the named graph and in its metadata graph
  – canAdd: a user with this permission may insert new triples into the named graph
  – canRemove: a user with this permission may remove triples from the named graph
  – canChangeNamedGraphACL: a user with this permission may change the ACL triples in the metadata graph
  – canRemoveNamedGraph: a user with this permission may entirely remove the named graph from the system
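
A minimal sketch of how these permissions could gate operations is shown below. The permission names come from the slide; the `AclStore` class, its methods, and the rule that a graph's creator receives every per-graph permission are assumptions made for illustration, not documented Boca behaviour.

```python
from collections import defaultdict

GRAPH_PERMISSIONS = {
    "canRead", "canAdd", "canRemove",
    "canChangeNamedGraphACL", "canRemoveNamedGraph",
}


class AccessDenied(Exception):
    pass


class AclStore:
    """Toy store (hypothetical) that checks permissions before each operation."""

    def __init__(self):
        self.system_acl = defaultdict(set)  # user -> system-wide permissions
        self.graph_acl = {}                 # graph URI -> {user: permissions}
        self.graphs = {}                    # graph URI -> set of triples

    def _require(self, user, graph_uri, permission):
        if permission not in self.graph_acl[graph_uri].get(user, set()):
            raise AccessDenied(f"{user} lacks {permission} on {graph_uri}")

    def add(self, user, graph_uri, triple):
        if graph_uri not in self.graphs:
            # Inserting into a graph that does not exist yet creates it.
            if "canInsertNamedGraphs" not in self.system_acl[user]:
                raise AccessDenied(f"{user} lacks canInsertNamedGraphs")
            self.graphs[graph_uri] = set()
            # Assumption: the creator starts with all per-graph permissions.
            self.graph_acl[graph_uri] = {user: set(GRAPH_PERMISSIONS)}
        self._require(user, graph_uri, "canAdd")
        self.graphs[graph_uri].add(triple)

    def read(self, user, graph_uri):
        self._require(user, graph_uri, "canRead")
        return set(self.graphs[graph_uri])


store = AclStore()
store.system_acl["alice"].add("canInsertNamedGraphs")
store.add("alice", "ex:notes", ("ex:doc1", "ex:status", "draft"))
store.graph_acl["ex:notes"]["bob"] = {"canRead"}  # bob may read only

bob_view = store.read("bob", "ex:notes")
try:
    store.add("bob", "ex:notes", ("ex:doc1", "ex:status", "final"))
    bob_can_add = True
except AccessDenied:
    bob_can_add = False
```

Because the check is per named graph, a single sensitive statement can be protected by placing it in its own graph with a restrictive ACL.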
Versioning
• SVN-like approach to versioning
• When a triple is added to or removed from a named graph, a new revision of that named graph is created.
• Simple API for reading old revisions
• Provides a straightforward mechanism for concurrent distributed computing.
  – When a client submits an update to a named graph, it may specify the version number that it currently has. The update will fail if the graph has been more recently modified.
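
The revision check described above is a form of optimistic concurrency control, and can be sketched as follows. `VersionedGraph`, `StaleRevisionError`, and the parameter `expected_revision` are hypothetical names; storing a full snapshot per revision is a simplification for illustration and says nothing about how Boca stores history.

```python
class StaleRevisionError(Exception):
    pass


class VersionedGraph:
    """Toy named graph that keeps one immutable snapshot per revision."""

    def __init__(self):
        self.revisions = [frozenset()]  # revision 0 is the empty graph

    @property
    def revision(self):
        return len(self.revisions) - 1

    def read(self, revision=None):
        """Return the triples at a given revision (latest by default)."""
        index = self.revision if revision is None else revision
        return self.revisions[index]

    def update(self, added=(), removed=(), expected_revision=None):
        # If the client states which revision it has, reject stale updates.
        if expected_revision is not None and expected_revision != self.revision:
            raise StaleRevisionError(
                f"client has revision {expected_revision}, graph is at {self.revision}"
            )
        snapshot = (self.read() - set(removed)) | set(added)
        self.revisions.append(frozenset(snapshot))
        return self.revision


graph = VersionedGraph()
t1 = ("ex:doc1", "ex:status", "draft")
t2 = ("ex:doc1", "ex:reviewer", "ex:bob")

r1 = graph.update(added=[t1])
r2 = graph.update(added=[t2], expected_revision=r1)
try:
    # A second client still holding r1 tries to write: the graph has moved on.
    graph.update(removed=[t1], expected_revision=r1)
    conflict = False
except StaleRevisionError:
    conflict = True
```

The rejected client would re-read the graph at its latest revision and retry, much as a Subversion user updates a working copy before committing.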
The Boca Programming Model
• Named Graphs
• Commands
• Transactions
• Versioning
• Replication
• Notification
Abandoned features – Collections, Statement ACLs & Reification
• Collections – a statement can exist in multiple collections
  – A more difficult programming model: what happens when a statement is deleted in the context of one collection?
  – Expensive to maintain
  – Not a widely accepted programming model (as named graphs are)
• Statement-level ACLs
  – Too expensive
  – Difficult to program
  – Not particularly useful, other than for the odd, very important statement
    – In that case, such a statement can live in its own named graph
• Reification
  – Queries were very difficult to formulate
  – Most RDF applications do not deal with reification
  – Reification semantics often confused with true quoting
  – Reification is an arbitrary layer of indirection that can be solved with ontologies
Future Features
• Arbitrary query-based replication/notification
• Distributed servers
• Open source
Applications
• Executing OWL-S in a distributed fashion
• Storing annotations
• Providing LSID metadata
• Web 2.0 application backend
  – Wikis, Blogs, Tagging, Atom
• National Cancer Institute research platform