Semantic Data Access
Semantic CMS Community

Lecturer
Organization
Date of presentation

Co-funded by the European Union
Copyright IKS Consortium
www.iks-project.eu

Part I: Foundations
  (1) Introduction of Content Management
  (2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
  (3) Knowledge Interaction and Presentation
  (4) Knowledge Representation and Reasoning
  (5) Semantic Lifting
  (6) Storing and Accessing Semantic Data
Part III: Methodologies
  (7) Requirements Engineering for Semantic CMS
  (8) Designing Semantic CMS
  (9) Semantifying your CMS
  (10) Designing Interactive Ubiquitous IS

What is this Lecture about?
  We have learned ...
    ... which languages can be used to model knowledge.
    ... how to extract knowledge from content in an automatic way (semantic lifting).
  We need a way ...
    ... to store the extracted knowledge technically in an accessible way.

Outline
  Semantic Data
    Semantic Web
    RDF
  Semantic Data Storage
    Triple Stores
  Semantic Data Access
    SPARQL
    RQL
    API Calls

Semantic Data
  Stands for machine-understandable information
  Allows computers to interpret the data without user intervention
  Allows computers to act intelligently without being programmed for each task

Semantic Data
  Lets applications find further information by following existing relations (e.g. Eiffel Tower -> Paris -> France)
  Provides an infrastructure for obtaining practical results
  Allows reasoning capabilities
    Extraction of related information that is not directly linked

Semantic Web
  A classical generic description: "Web of data"
  Extends the World Wide Web by encouraging:
    A common language for representing data, transformable to and from disparate sources such as relational databases, XML, etc. (RDF)
    A common reusable data model to represent data from different domains in common terms (RDFS, OWL, etc.)
    Rules that enable applications to reason over the information (SWRL)

Semantic Web Layer Cake
  [Figure: Semantic Web Layer Cake. Image source: http://www.w3.org/2007/03/layerCake.svg]

Semantic Web
  Many organizations publish their data in different domains
    Media
    Geographic
    Government
    …
  The whole set contains approximately 30 billion triples
  One of the largest collections is DBpedia
    A semantified version of Wikipedia
    Example: obtain the cities of China that have a population over 20 million
  Needs efficient storage and querying of semantic data

Representation of Semantic Data
  RDF
    The common data format
    An abstract model with several serialization formats
    Consists of statements, referred to as triples, of the form (subject, predicate, object), where
      Subject: any resource identifier
      Predicate: the resource identifier of a property
      Object: either a resource identifier or a literal value

Storing Semantic Data
  Need for specialized designs for triple collections
  Two modalities:
    Relational databases
    Triple stores
      Mostly used for storage
      Lots of implementations
      They can also be RDB based

Triple Store
  A purpose-built database for the storage and retrieval of RDF data.
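A minimal sketch of what "storage and retrieval of RDF data" can look like, assuming a purely in-memory store: statements are kept in a set and retrieved by matching a (subject, predicate, object) pattern in which null acts as a wildcard. The class name TripleStoreSketch, its methods, and the ex: identifiers are hypothetical and are not the API of any product named in this lecture. Real triple stores use indexes and persistent storage rather than the linear scan shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical in-memory triple store, for illustration only.
final class TripleStoreSketch {

    // One RDF statement. URIs and literals are both plain strings here for brevity.
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = Objects.requireNonNull(subject);
            this.predicate = Objects.requireNonNull(predicate);
            this.object = Objects.requireNonNull(object);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Triple)) {
                return false;
            }
            Triple t = (Triple) other;
            return subject.equals(t.subject)
                    && predicate.equals(t.predicate)
                    && object.equals(t.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(subject, predicate, object);
        }

        @Override
        public String toString() {
            return "(" + subject + ", " + predicate + ", " + object + ")";
        }
    }

    // A set, because a graph holds each statement at most once.
    private final Set<Triple> triples = new LinkedHashSet<>();

    // Returns false if the statement was already present.
    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    // Returns false if the statement was not present.
    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    int size() {
        return triples.size();
    }

    // Returns every stored triple that matches the pattern; null means "any value".
    List<Triple> match(String s, String p, String o) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TripleStoreSketch store = new TripleStoreSketch();
        store.add("ex:EiffelTower", "ex:locatedIn", "ex:Paris");
        store.add("ex:Paris", "ex:locatedIn", "ex:France");
        // Pattern (?, ex:locatedIn, ?): everything that is located in something.
        for (Triple t : store.match(null, "ex:locatedIn", null)) {
            System.out.println(t);
        }
    }
}
```

The pattern-matching method is the part that query languages build on: a query is essentially a set of such patterns whose wildcard positions are joined on shared variables.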
  An optimized place to add, remove, and query triples.
  Each triple in the triple store has the form (subject, predicate, object).

Considering XML Databases
  XML databases are existing storage systems for semi-structured data
  Idea: transform RDF to XML and store it in XML databases
  Yet the XML data model is not exactly the same as that of semantic data
    The XML data model is a tree-like structure
    RDF data is represented as a graph without a hierarchy

Considering XML Databases
  XML databases are not suitable for storing and querying RDF
    Only simple manipulations can be handled through XML query languages
    RDF Schema processing and inference are not possible
    The standard RDF/XML mapping is unsuitable

Monolithic Approach for DB-Based Triple Stores
  Generic representation for all RDF schemas
  Only two tables are used
    Resources table
    Triples table

Monolithic Approach for DB-Based Triple Stores
  Resources table
    id  uri
    1   http://www.iks.og/topics.rdfs#Hotel
    2   http://www.iks.og/topics.rdfs#HotelDirections
    3   http://www.oclc.org/dublincore.rdfs#title
    4   http://www.iks.og/schema.rdf#Ext.Resource
    5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
    6   http://www.w3.org/2000/01/rdf-schema#subClassOf
    7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
    8   http://www.w3.org/2000/01/rdf-schema#Class
    9   rl
  Triples table
    Columns: predid, subid, objid, objvalue
    [Cell values as extracted from the slide: 6 2 5 1 3 7 5 1 8 5 9 2 3 9; objvalue: Sunscale]

Triple Stores
  Can be grouped into three categories:
    In-memory triple stores
      Used for certain operations such as benchmarking, caching, etc.
    Native triple stores
      Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
    Non-memory, non-native triple stores
      Built on third-party databases (Jena SDB, Kaon, …)

Functionalities Provided by Triple Stores
  RDBMS support
  General RDF model access
  Query language support in the store, such as RQL and SPARQL
  Some stores provide:
    Provenance: tracking of who said what
    APIs for accessing the triple store over the network
  Very few stores provide:
    Full-text search
    Inference and rule languages

Example Triple Store Implementations
  RDF Suite
    Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001.
    Based on an ORDBMS model
  Sesame
    http://www.openrdf.org/
    Relational databases (MySQL, PostgreSQL, Oracle)
  Jena
    http://www.hpl.hp.com/semweb/jena2.htm
    Relational databases (MySQL, PostgreSQL, Oracle)
  Virtuoso
    http://virtuoso.openlinksw.com/
    Native RDF quad storage (physical quads)

RDFSuite (ICS-FORTH)*
  * IST-1999-13479 C-Web, IST-2000-26074 Mesmuses

How Triples Are Stored and Accessed in RDF Suite
  Separate tables are created to store resources
    Properties, subClasses, subProperties, and instances
  Indices on attributes such as URI, source, and target
  Querying is possible through RQL

How Triples Are Stored and Accessed in RDF Suite
  [Figure from *]
  * Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001.

Sesame Architecture*
  DBMS-independent API for accessing triple repositories
  SAIL API
    A set of Java interfaces between the other modules and the repository
    Abstracts from the actual storage mechanism
  Query Module
    RQL support
  Different ways to communicate with clients
    Through protocol handlers
  * Jeen Broekstra, Arjohn Kampman, Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. Proceedings of the First International Semantic Web Conference, 2002.

SAIL API over PostgreSQL (Broekstra et al., 2002)
  PostgreSQL
    Object-relational DBMS
    Supports sub-table relations between its tables, which provide RDF Schema class and property subsumption
    Individuals are represented in separate tables created for resources
    Adding new tables is difficult

SAIL API over MySQL (Broekstra et al., 2002)
  MySQL
    The database schema does not change when the RDFS changes
    Has an advantage where the RDFS is unstable

Jena2 Architecture*
  * Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2. Proceedings of SWDB'03, the First International Workshop on Semantic Web and Databases.

Jena2
  Denormalized schema
    Avoids unnecessary joins by merging URIs and literals into the statements table
  Multiple statement tables
    Better locality and caching
  Property tables

Normalized vs Denormalized Tables

Property Tables (Wilkinson et al., SWDB'03)
  Triple store only
    Subject   Property    Object
    person1   name        Alice
    person1   age         32
    person1   twinOf      person2
    person1   faxPhone    x1234
    person1   adminPh     x5678
    person2   name        Bob
    person2   age         35
    person2   adopteeOf   person6
    person2   friendOf    person8
    person2   gender      male
  Triple store with a Person property table
    Person property table
      ID   name    age   gender
      p1   Alice   32    -
      p2   Bob     35    male
    Triple store
      Subject   Property    Object
      person1   twinOf      person2
      person1   faxPhone    x1234
      person1   adminPh     x5678
      person2   adopteeOf   person6
      person2   friendOf    person8

Jena Persistence Options
  SDB
    Scalable storage and query for RDF
    Specifically designed for SPARQL support
    Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server, and IBM DB2
    Scales to graphs of 100 million triples

Jena Persistence Options
  TDB
    Provides large-scale storage and query of RDF datasets using a pure Java engine
    Supports SPARQL
    A non-transactional, faster database solution for use by a single system
    Scales well beyond SDB and is simpler to set up

Virtuoso
  General-purpose RDBMS with extensive RDF adaptations
  RDF data is stored as RDF quads, i.e. (graph, subject, predicate, object) tuples, so it supports RDF with named graphs
  The columns are G for graph, P for predicate, S for subject, and O for object

Querying Semantic Data
  Semantic data can be queried from triple stores by:
    Various query languages
      SPARQL
        Different endpoints provided
      RQL
      RDQL
      SeRQL
      …
    API calls
      Through the proprietary APIs of different projects
    Linked Data

SPARQL
  Is an RDF query language
  Standardized by the W3C
  Plays a role similar to that of SQL for databases
  Syntactically resembles SQL
  Queries RDF graphs instead of databases

SPARQL Endpoints
  Provide functionality to query the knowledge base via the SPARQL language
  Accept queries and return results over HTTP
  Query results can be in different formats such as
    RDF
    XML
    HTML
    JSON
    CSV

Semantic Data Access With API Calls
  Open source projects provide APIs to manipulate RDF data
    Jena
    Apache Clerezza
    Sesame
    JRDF

Jena
  Jena provides a rich API to manipulate the RDF stored in the underlying triple store.
    Model to represent graphs
    CRUD methods for triples
    Querying methods for existing resources
  See the next slide for the code snippet…

Jena Code Snippet
    // Package names are those of Apache Jena 3 and later; Jena 2.x used com.hp.hpl.jena.* instead.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    public class JenaExample {
        public static void main(String[] args) {
            String personURI = "http://somewhere/JohnSmith";
            String givenName = "John";
            String familyName = "Smith";
            String fullName = givenName + " " + familyName;

            // create an empty Model, which represents an RDF graph
            Model model = ModelFactory.createDefaultModel();

            // create the resource; this produces the triples shown on the next slide
            Resource johnSmith = model.createResource(personURI)
                    .addProperty(VCARD.FN, fullName)
                    .addProperty(VCARD.N, model.createResource()   // blank node for the structured name
                            .addProperty(VCARD.Given, givenName)
                            .addProperty(VCARD.Family, familyName));
        }
    }

Jena
  Triples created by the code snippet on the previous slide:
    (<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
    (<http://somewhere/JohnSmith>, VCARD.N, _)
    (_, VCARD.Given, "John")
    (_, VCARD.Family, "Smith")
  • Note that the _ symbol represents a blank node

Apache Clerezza
  Provides an API that is independent of the different triple stores it supports
  Its API provides a model to represent RDF graphs and to manipulate those graphs
  Also provides a SPARQL endpoint to query the stored knowledge

Apache Clerezza Code Snippet
  Simple code snippet adding two triples to the graph:
    import org.apache.clerezza.rdf.core.LiteralFactory;
    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.core.impl.TripleImpl;

    public class ClerezzaExample {
        public static void main(String[] args) {
            String base = "http://www.example.org#";
            MGraph g = new SimpleMGraph();
            // JohnSmith rdf:type foaf:Person
            g.add(new TripleImpl(
                    new UriRef(base + "JohnSmith"),
                    new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
                    new UriRef("http://xmlns.com/foaf/0.1/Person")));
            // JohnSmith vcard:FN "John"
            g.add(new TripleImpl(
                    new UriRef(base + "JohnSmith"),
                    new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
                    LiteralFactory.getInstance().createTypedLiteral("John")));
        }
    }

Linked Data
  Interrelated datasets on the Web, published so that computers can explore them
  Has a standard format in which it is accessed and managed
  Provides integration of and reasoning on a huge amount of data on the Web

Linked Data
  Four famous principles of linked data, presented by Tim Berners-Lee:
    Use URIs as names of things
    Use HTTP URIs so that people can dereference those names
    When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL)
    Provide links to other URIs to make discovery of related data possible

Linking Open Data Project
  Is a W3C SWEO project
  Aims to make data freely available to everyone
  Aims to publish open data sets as RDF and to set semantic relationships between them
    Serves information in a machine-readable format
    Enriches content
    Reduces duplication
  The number of linked datasets is increasing rapidly
    A large number of datasets are linked already

Linked Datasets
  [Figures: linked datasets as of October 2008, as of September 2010, and in 2011]

Access Data In The Cloud
  Follow the RDF links representing the "things"
  SPARQL endpoints
  Ready-to-use software to discover linked data (see the next slide)

Linked Data Applications
  Lots of applications on top of the linked data
  Just google:
    RDF browsers
      Tabulator
      Marbles
      OpenLink RDF Browser
      …
    RDF crawlers
  Also see the following link containing a number of linked data applications:
    http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications

Available SPARQL Endpoints
  http://dbpedia.org/sparql
  http://www4.wiwiss.fu-berlin.de/dblp/
  To find SPARQL endpoints that provide information about a certain URI, see
    http://void.rkbexplorer.com/endpoint-search/
  See also a list of live SPARQL endpoints:
    http://www.w3.org/wiki/SparqlEndpoints

References
  http://www.w3.org/TR/rdf-sparql-query
  http://jena.sourceforge.net/tutorial/RDF_API/index.html
  http://www.slideshare.net/ldodds/sparql-tutorial
  http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semanticweb?src=related_normal&rel=1702851
  http://www.cambridgesemantics.com/2008/09/sparql-by-example
  http://linkeddata-specs.info/
  http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
  http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
  Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases. SemWeb, 2001.
  Jeen Broekstra, Arjohn Kampman, Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. Proceedings of the First International Semantic Web Conference, 2002.
  Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2. Proceedings of SWDB'03, the First International Workshop on Semantic Web and Databases.
  http://jena.sourceforge.net/DB/index.html
  http://virtuoso.openlinksw.com/