gStore: Answering SPARQL Queries Via Subgraph Matching

Presented by Guan Wang
Kent State University
October 24, 2011
Outline

RDF & SPARQL

Previous Solutions for SPARQL Queries

Overview of gStore

Encoding Technique

VS*-tree & Query Algorithm

Experiments

Conclusions
What is RDF


RDF (Resource Description Framework) is a general-purpose framework that
provides structured, machine-understandable metadata for the Web.
It is based on the idea of making statements about
resources in the form of subject-predicate-object
expressions. These expressions are known as triples in
RDF.
Statement: Subject -- Predicate --> Object
RDF Model Example
Subject      Predicate    Object
page.html    Creator      Guan
page.html    Creator      Guan's Home Page
What is SPARQL

SPARQL is a query language for RDF. It provides a
standard format for writing queries that target RDF data
and a set of standard rules for processing those queries
and returning the results.

The building blocks of a SPARQL query are graph
patterns that include variables. The result of the query
is the set of values that these variables must take for
the patterns to match the RDF graph.
Example of SPARQL
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}

Names beginning with a ? or a $ are variables.
Graph patterns are given as a list of triple patterns
enclosed within braces {}.
The variables named after the SELECT keyword are the
variables that will be returned as results (as in SQL).
Each conjunction, denoted by a dot,
corresponds to a join.
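
To make the "each dot is a join" point concrete, here is a minimal sketch of evaluating the query above against a toy in-memory triple list. The data, the vertex identifiers (`lincoln`, `darwin`) and the helper names (`match_pattern`, `evaluate`) are illustrative assumptions, not part of gStore or of any SPARQL engine.

```python
def is_var(term):
    """SPARQL variables start with '?' or '$'."""
    return term.startswith(("?", "$"))


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new_binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            name = p_term[1:]
            if new_binding.setdefault(name, t_term) != t_term:
                return None  # variable is already bound to another value
        elif p_term != t_term:
            return None  # constant term does not match
    return new_binding


def evaluate(patterns, triples):
    """Evaluate a conjunction of triple patterns; each pattern acts as a join."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Toy data set (illustrative only).
triples = [
    ("lincoln", "hasName", "Abraham Lincoln"),
    ("lincoln", "BornOnDate", "1809-02-12"),
    ("lincoln", "DiedOnDate", "1865-04-15"),
    ("darwin", "hasName", "Charles Darwin"),
    ("darwin", "BornOnDate", "1809-02-12"),
    ("darwin", "DiedOnDate", "1882-04-19"),
]
query = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]
names = [b["name"] for b in evaluate(query, triples)]
print(names)
```

The shared variable ?m is what turns the three patterns into joins: a binding survives only if the same vertex satisfies all of them.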
RDF Graph
SPARQL Queries
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
Query Graph
Subgraph Match vs. SPARQL Queries
Existing Solutions-Three Column Table
SPARQL Query:
SELECT ?name WHERE {
  ?m <hasName> ?name .
  ?m <BornOnDate> "1809-02-12" .
  ?m <DiedOnDate> "1865-04-15" .
}
Shortcoming:
Too many self-joins
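
The self-join problem can be seen in a small sketch that stores triples in a single three-column table and answers the query above in SQL. The table name `triples`, its column names and the sample rows are assumptions made for illustration; Python's built-in `sqlite3` module stands in for whatever relational engine a real triple store would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("lincoln", "hasName", "Abraham Lincoln"),
        ("lincoln", "BornOnDate", "1809-02-12"),
        ("lincoln", "DiedOnDate", "1865-04-15"),
        ("darwin", "hasName", "Charles Darwin"),
        ("darwin", "BornOnDate", "1809-02-12"),
        ("darwin", "DiedOnDate", "1882-04-19"),
    ],
)

# One alias of the same table per triple pattern:
# three patterns already require two self-joins.
sql = """
SELECT t1.object
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
JOIN triples AS t3 ON t3.subject = t1.subject
WHERE t1.predicate = 'hasName'
  AND t2.predicate = 'BornOnDate' AND t2.object = '1809-02-12'
  AND t3.predicate = 'DiedOnDate' AND t3.object = '1865-04-15'
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Every additional triple pattern in the query adds another alias of the same table, so the number of self-joins grows with the size of the query.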
Existing Solutions-Property Table
Shortcoming:
A big waste of space
Existing Solutions-Vertically Partitioned
Shortcoming:
Too many merge joins
Existing Solutions-RDF-3X
Exploits the fact that an RDF triple has only three
elements (subject, predicate and object).
Builds all six possible indexes and optimizes the order
of the merge joins.
Shortcoming: difficult to handle updates
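
The "six possible indexes" are the six orderings of subject (S), predicate (P) and object (O). The following is a minimal sketch of that idea using sorted Python lists; the helper names `build_indexes` and `prefix_lookup` are hypothetical, and the sketch makes no claim about how RDF-3X actually lays out or compresses its indexes.

```python
from bisect import bisect_left
from itertools import permutations


def build_indexes(triples):
    """Return one sorted list of reordered triples per ordering of S, P, O."""
    indexes = {}
    for order in permutations("SPO"):
        positions = ["SPO".index(c) for c in order]
        indexes["".join(order)] = sorted(
            tuple(t[i] for i in positions) for t in triples
        )
    return indexes


def prefix_lookup(index, prefix):
    """Return all entries of a sorted index that start with `prefix`."""
    start = bisect_left(index, prefix)
    result = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        result.append(entry)
    return result


# Toy data set (illustrative only).
triples = [
    ("lincoln", "BornOnDate", "1809-02-12"),
    ("lincoln", "DiedOnDate", "1865-04-15"),
    ("darwin", "BornOnDate", "1809-02-12"),
]
indexes = build_indexes(triples)
print(sorted(indexes))

# The POS ordering answers "who has this predicate and object?" with one range scan.
hits = prefix_lookup(indexes["POS"], ("BornOnDate", "1809-02-12"))
subjects = [entry[2] for entry in hits]
print(subjects)
```

Because every ordering is available, any triple pattern with bound terms can be answered by a range scan whose results come back sorted, which is what makes merge joins applicable. Keeping six sorted copies consistent is also one plausible reason why updates are hard.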
Overview of gStore (Store)

Represent an RDF dataset by an RDF graph G and
store it by its adjacency list table.
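
As a rough illustration of the storage step, the sketch below turns a list of triples into an adjacency list keyed by subject vertex. The function name `build_adjacency_list`, the sample triples and the exact layout (a list of (edge label, neighbor) pairs per vertex) are assumptions; the slide only states that an adjacency list table is used.

```python
from collections import defaultdict


def build_adjacency_list(triples):
    """Map each subject vertex to its outgoing (edge label, neighbor) pairs."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return dict(adjacency)


# Toy data set (illustrative only).
triples = [
    ("lincoln", "hasName", "Abraham Lincoln"),
    ("lincoln", "BornOnDate", "1809-02-12"),
    ("darwin", "hasName", "Charles Darwin"),
]
adjacency = build_adjacency_list(triples)
for vertex, edges in adjacency.items():
    print(vertex, edges)
```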
Overview of gStore (Encoding)


Encode each entity and class vertex into a bitstring,
called a signature.
Link these vertex signatures to form a data signature
graph G* according to the structure of the RDF graph G.
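
The slides do not spell out the encoding, so the following is only a sketch of one way a hash-based vertex signature could work. The signature length (`SIG_BITS`), the use of SHA-256, and the choice of one bit per edge label and one per neighbor are all assumptions made for illustration, not the scheme used by gStore.

```python
import hashlib

SIG_BITS = 64  # assumed signature length; the slides do not give one


def bit_for(text, salt):
    """Deterministically map a string to one bit position."""
    digest = hashlib.sha256(f"{salt}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % SIG_BITS


def edge_signature(edge_label, neighbor):
    """Bitstring for one adjacent edge: one bit for the label, one for the neighbor."""
    return (1 << bit_for(edge_label, "edge")) | (1 << bit_for(neighbor, "node"))


def vertex_signature(adjacent_edges):
    """OR together the bitstrings of all adjacent (edge label, neighbor) pairs."""
    signature = 0
    for edge_label, neighbor in adjacent_edges:
        signature |= edge_signature(edge_label, neighbor)
    return signature


def may_match(query_sig, data_sig):
    """Filter: every bit set in the query signature must be set in the data signature."""
    return (query_sig & data_sig) == query_sig


# Toy example (illustrative only).
lincoln_sig = vertex_signature([
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])
query_sig = vertex_signature([
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])
print(format(lincoln_sig, f"0{SIG_BITS}b"))
print(may_match(query_sig, lincoln_sig))
```

A vertex whose edges include everything the query asks for always passes this test. Because different strings can hash to the same bit, a vertex that does not match may also pass, so surviving candidates still have to be verified against the actual graph.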
Overview of gStore (VS*-tree)
Encoding Technique
VS*-tree




Each leaf node of the tree corresponds to one vertex
signature in G*.
Given two leaf nodes d1 and d2 in the tree, we
introduce an edge between them if and only if there is
an edge between d1 and d2 in G*.
Given nodes d1 and d2 in the tree, we introduce a
super edge from d1 to d2 if and only if there is at least
one edge from d1's children to d2's children.
The label of the super edge d1 → d2 is obtained by
performing a bitwise OR over all edge labels from
d1's children to d2's children.
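
The super-edge rule can be sketched in a few lines. In this sketch edge labels are plain integers used as bitstrings, and the names `super_edges`, `parent_of` and `leaf_edges` are hypothetical; it covers only one level of the tree and says nothing about how the tree itself is built.

```python
from collections import defaultdict


def super_edges(parent_of, child_edges):
    """Lift labeled edges between children to super edges between their parents.

    parent_of:   maps each child node to its parent node.
    child_edges: maps (child_from, child_to) to an integer edge-label bitstring.
    """
    labels = defaultdict(int)
    for (src, dst), label in child_edges.items():
        # A super edge exists as soon as one child edge exists;
        # its label is the bitwise OR of all contributing child labels.
        labels[(parent_of[src], parent_of[dst])] |= label
    return dict(labels)


# Toy example (illustrative only): d1 has two children, d2 has one.
parent_of = {"d1a": "d1", "d1b": "d1", "d2a": "d2"}
child_edges = {
    ("d1a", "d2a"): 0b0010,
    ("d1b", "d2a"): 0b0100,
}
lifted = super_edges(parent_of, child_edges)
print({edge: format(label, "04b") for edge, label in lifted.items()})
```

Applying the same function again, with the parents in the role of children, would give the super edges one level further up.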
Query Algorithm
Experiments


Datasets: YAGO and DBLP, two popular
semantic datasets with millions of triples.
Data size: approximately 4 GB.
Experiments (Exact Queries)
Experiments (Wildcard Queries)
Conclusions



gStore stores and queries RDF data from a graph
database perspective.
It uses the VS*-tree as an index over the vertex
bitstrings (signatures), which supports SPARQL queries in a
scalable manner.
False positives: signature-based filtering may return candidates that do not actually match.
Reference





[ICDE09] Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano,
"Top-k Exploration of Query Candidates for Efficient Keyword Search on
Graph-Shaped (RDF) Data", DOI 10.1109/ICDE.2009.119.
[VLDB07] Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate
Hollenbach, "Scalable Semantic Web Data Management Using Vertical
Partitioning", VLDB '07, September 23-28, 2007, Vienna, Austria.
[PVLDB08] Cathrin Weiss, Panagiotis Karras, Abraham Bernstein,
"Hexastore: Sextuple Indexing for Semantic Web Data
Management", PVLDB '08, August 23-28, 2008, Auckland, New Zealand.
[PVLDB08] Thomas Neumann, Gerhard Weikum, "RDF-3X: a RISC-style
Engine for RDF", PVLDB '08, August 23-28, 2008, Auckland, New Zealand.
[VLDB11] Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao,
"gStore: Answering SPARQL Queries via Subgraph Matching",
VLDB '11, August 29 - September 3, 2011, Seattle, Washington.
Thank you!