Graph Algebra
with Pattern Matching and Aggregation Support
Graph Data Today

Variety of Sources
◦ Scientific Studies
◦ Business Activities
◦ Social Needs
◦ Internet

Data are often
◦ Large Scale
◦ Highly Linked
◦ Schema-less
Managing Graph Data

Primary Role of a Database
◦ Persistent storage
◦ Efficient querying

RDBMS
◦ Storage model: vertices and edges as tuples
◦ Query: links are followed by joins

Graph Database
◦ Storage model: graphs
◦ Query: path traversal
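The contrast between the two query styles can be sketched in a few lines of Python. The toy data, the tuple layout, and the function names are assumptions made for illustration; they do not describe any particular RDBMS or graph database.

```python
# Hypothetical toy data: the same "friend" links stored in two ways.
edge_rows = [("a", "b"), ("b", "c"), ("b", "d")]              # relational: one tuple per edge
adjacency = {"a": ["b"], "b": ["c", "d"], "c": [], "d": []}   # graph: neighbours per vertex


def two_hop_by_join(rows, start):
    """Friends-of-friends as a self-join: r1 JOIN r2 ON r1.dst = r2.src."""
    return sorted({d2
                   for (s1, d1) in rows if s1 == start
                   for (s2, d2) in rows if s2 == d1})


def two_hop_by_traversal(adj, start):
    """Friends-of-friends by walking adjacency lists from the start vertex."""
    return sorted({w for v in adj[start] for w in adj[v]})


join_result = two_hop_by_join(edge_rows, "a")
walk_result = two_hop_by_traversal(adjacency, "a")
```

Both functions return the same vertices; the join version scans the edge table once per hop, while the traversal only touches the neighbours it visits.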
Why Not an RDBMS?

Schema Issues
◦ Each inserted record may have a different schema (e.g., a Web graph)
◦ Hard to represent semi-structured information

Scalability Issues
◦ ACID properties vs. the CAP theorem

Query Performance
◦ Join-intensive queries are difficult to optimize
Graph Databases and Query Languages

No universal language!
No Universal Language Like SQL?

No commonly agreed algebra

Relational Algebra?
◦ Expressive, and has stood the test of time
◦ NOT suitable for graphs

Graph Algebra?
◦ Work is still at a preliminary stage
Issues with Relational Algebra (RA)

Defined on tuples or sets of tuples
◦ Mismatch with the nature of graphs
◦ Operators lose their semantics
  ▪ What do Union, Intersection, and Join mean on a graph?
◦ Input/output type?
  ▪ Tables, not graphs

Domain-centric, not data-centric
◦ Does not anticipate out-of-order data
◦ Treats tuples as independent
  ▪ Is not aware of the links among tuples
  ▪ Queries written in RA are verbose and complex
Advantages of Graph Algebra

An algebra is itself a query language
◦ Makes it easy to work out a language with strong theoretical support

Evaluating the expressiveness of given languages
◦ Justifies when to use which: Gremlin, Cypher, etc.

Query Optimization
◦ Operator order EQUALS execution plan
◦ Algebraic equivalence IMPLIES query optimization
Advantages of Graph Algebra

Separation of query and system:
◦ A query can be written for any system, as long as that system supports the common algebra.
◦ Knowing RA, one can write SQL, PL/SQL, or MS SQL on MySQL, Oracle, or SQL Server.

Integrating new operators into the database:
◦ Current graph database systems do not support newly developed queries:
  ▪ Graph OLAP, Graph Cube, Graph Aggregation, etc.
◦ A proper algebra can incorporate these operators
Existing Work on Graph Algebra

Graph QL [1]
◦ A graph-based algebra; operators are defined on graphs
◦ Selection
◦ Join: not properly defined
◦ Template

VAQL [2]
◦ Focused on visualization
◦ Selection
◦ Aggregation: restricted
◦ Visualization

Selection is restricted to isomorphism
Aggregation is not defined over edges
No algebraic equivalence

[1] He, Huahai, and Ambuj K. Singh. "Graphs-at-a-time: query language and access methods for graph databases." Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008.
[2] Shaverdian, Anna A., et al. "A graph algebra for scalable visual analytics." Computer Graphics and Applications, IEEE 32.4 (2012): 26-33.
What Do We Want from a Graph Algebra?

Universal
◦ Independent of graph type:
  ▪ Directed vs. undirected, simple vs. hyper, homogeneous vs. heterogeneous

Expressive
◦ Able to answer typical graph queries:
  ▪ Pattern matching, reachability, path finding, etc.
◦ Covers Relational Algebra (RA)
  ▪ This ensures that a graph database can handle relational data as well

Scalable
◦ Able to manage data at scale
  ▪ Supports queries that summarize and aggregate data
Extended Algebra – Graph Model

$G(V, E, A, B, C)$ is an attributed graph
$V$ is the vertex set; each $v \in V$ has a unique ID
$E$ is the edge set, $E \subseteq V \times V$
$A$ contains the attributes of each vertex
$B$ contains the attributes of each edge
◦ Edges carry an identifier as well
◦ In a simple graph, an edge can be represented by its endpoints
$C$ contains information about the graph as a whole
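One possible in-memory reading of this model is sketched below in Python. The class name, the use of dictionaries for A, B, and C, and the `is_well_formed` check are assumptions for illustration only; the slides do not prescribe a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Sketch of G(V, E, A, B, C); the field layout is an assumption."""
    V: set = field(default_factory=set)    # vertex IDs
    E: set = field(default_factory=set)    # edges as (source, target) pairs
    A: dict = field(default_factory=dict)  # vertex ID -> attribute dict
    B: dict = field(default_factory=dict)  # edge -> attribute dict
    C: dict = field(default_factory=dict)  # graph-level information

    def is_well_formed(self):
        """Check that E is a subset of V x V and attributes refer to existing items."""
        return (all(u in self.V and w in self.V for (u, w) in self.E)
                and set(self.A) <= self.V
                and set(self.B) <= self.E)


g = AttributedGraph(
    V={1, 2},
    E={(1, 2)},
    A={1: {"name": "Ann"}, 2: {"name": "Bob"}},
    B={(1, 2): {"type": "friend"}},
    C={"source": "toy example"},
)
```

Because this is a simple graph, each edge is identified by its endpoints, as the slide allows; a multigraph would need a separate edge identifier.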
Extended Algebra – Operators

Projection 𝜋

Restriction σ

Unification ⊕

Pattern Matching Γ

Aggregation Σ
Operators: Projection 𝜋

Purpose:
◦ Select the data the user is interested in from the base graph

Syntax:
◦ $\pi(G, A, B, C) \to G'$
  ▪ $A$, $B$, $C$ are the attribute lists for vertices, edges, and the graph
  ▪ The result is a new graph whose attributes are trimmed to $A$, $B$, $C$
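A minimal Python sketch of projection follows, assuming a graph is held as a dictionary with keys "V", "E", "A", "B", and "C". That layout and the name `project` are illustrative assumptions, not part of the algebra's definition.

```python
def project(graph, vertex_attrs, edge_attrs, graph_attrs):
    """Return a new graph that keeps only the listed attribute names."""
    def trim(attrs, keep):
        return {k: v for k, v in attrs.items() if k in keep}

    return {
        "V": set(graph["V"]),  # structure is untouched
        "E": set(graph["E"]),
        "A": {v: trim(a, vertex_attrs) for v, a in graph["A"].items()},
        "B": {e: trim(b, edge_attrs) for e, b in graph["B"].items()},
        "C": trim(graph["C"], graph_attrs),
    }


base = {
    "V": {1, 2},
    "E": {(1, 2)},
    "A": {1: {"name": "Ann", "age": 30}, 2: {"name": "Bob", "age": 25}},
    "B": {(1, 2): {"type": "friend", "since": 2010}},
    "C": {"title": "toy", "owner": "demo"},
}
trimmed = project(base, {"name"}, {"type"}, {"title"})
```

The base graph is left unmodified; only the attribute dictionaries of the result are narrowed.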
Operators: Restriction σ

Purpose:
◦ Restrict the base graph by attribute values

Syntax:
◦ $\sigma_v(G, p) \to \{G'\}$
◦ $\sigma_e(G, p) \to \{G'\}$
◦ $\sigma_G(G, p_A, p_B, p_C) \to \{G'\}$
  ▪ $\sigma_v$: vertex restriction, selects all the vertices (and their induced edges) that match predicate $p$
  ▪ $\sigma_e$: edge restriction, selects all the edges (and their endpoints) that match predicate $p$
  ▪ $\sigma_G$: graph restriction, selects the graphs in which every vertex matches predicate $p_A$, every edge matches $p_B$, and the graph matches $p_C$
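The three restrictions can be sketched in Python as follows. The dictionary layout, the function names, and the choice to return a single graph from the vertex and edge restrictions (rather than a set of graphs) are simplifying assumptions for illustration.

```python
def subgraph(g, vertices, edges):
    """Copy of g limited to the given vertices and edges."""
    return {
        "V": set(vertices),
        "E": set(edges),
        "A": {v: g["A"][v] for v in vertices if v in g["A"]},
        "B": {e: g["B"][e] for e in edges if e in g["B"]},
        "C": dict(g["C"]),
    }


def restrict_vertices(g, p):
    """Vertex restriction: vertices matching p plus the edges they induce."""
    keep = {v for v in g["V"] if p(g["A"].get(v, {}))}
    edges = {(u, w) for (u, w) in g["E"] if u in keep and w in keep}
    return subgraph(g, keep, edges)


def restrict_edges(g, p):
    """Edge restriction: edges matching p plus their endpoints."""
    edges = {e for e in g["E"] if p(g["B"].get(e, {}))}
    return subgraph(g, {x for e in edges for x in e}, edges)


def restrict_graphs(graphs, p_a, p_b, p_c):
    """Graph restriction: keep graphs whose every vertex, every edge, and graph info match."""
    return [g for g in graphs
            if all(p_a(g["A"].get(v, {})) for v in g["V"])
            and all(p_b(g["B"].get(e, {})) for e in g["E"])
            and p_c(g["C"])]


g = {
    "V": {1, 2, 3},
    "E": {(1, 2), (2, 3), (1, 3)},
    "A": {1: {"age": 30}, 2: {"age": 17}, 3: {"age": 40}},
    "B": {(1, 2): {"type": "friend"}, (2, 3): {"type": "comment"}, (1, 3): {"type": "friend"}},
    "C": {},
}
is_adult = lambda a: a.get("age", 0) >= 18
adults = restrict_vertices(g, is_adult)
friends = restrict_edges(g, lambda b: b.get("type") == "friend")
all_adult_graphs = restrict_graphs([g, adults], is_adult, lambda b: True, lambda c: True)
```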
Operator: Unification ⊕

Purpose:
◦ Concatenate graphs

Syntax:
◦ $\oplus_v(\{G\}) \to \{G\}$
◦ $\oplus_e(G, p) \to \{G\}$
◦ $\oplus_a(G, attr) \to \{G\}$

$\oplus_v$: vertex unification, unifies vertices with identical IDs
$\oplus_e$: edge unification, adds an edge between every two vertices that match $p$
$\oplus_a$: attribute unification, creates a virtual vertex for each distinct value of $attr$
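The sketch below gives one possible Python reading of the three unifications. Several details are assumptions that the slides do not specify: graphs are dictionaries with keys "V", "E", "A", "B"; each function returns a single graph; edge unification skips self-pairs; and attribute unification links each vertex to the virtual vertex for its value.

```python
def unify_vertices(graphs):
    """Vertex unification: merge graphs, identifying vertices that share an ID."""
    out = {"V": set(), "E": set(), "A": {}, "B": {}}
    for g in graphs:
        out["V"] |= g["V"]
        out["E"] |= g["E"]
        for v, attrs in g["A"].items():
            out["A"].setdefault(v, {}).update(attrs)
        for e, attrs in g["B"].items():
            out["B"].setdefault(e, {}).update(attrs)
    return out


def unify_edges(g, p):
    """Edge unification: add (u, w) for each pair of distinct vertices with p true."""
    new_edges = {(u, w) for u in g["V"] for w in g["V"]
                 if u != w and p(g["A"].get(u, {}), g["A"].get(w, {}))}
    return {**g, "E": g["E"] | new_edges}


def unify_attribute(g, attr):
    """Attribute unification: one virtual vertex per distinct value of attr."""
    out = {"V": set(g["V"]), "E": set(g["E"]), "A": dict(g["A"]), "B": dict(g["B"])}
    for v in g["V"]:
        attrs = g["A"].get(v, {})
        if attr in attrs:
            virtual = (attr, attrs[attr])      # hypothetical ID scheme for virtual vertices
            out["V"].add(virtual)
            out["A"][virtual] = {attr: attrs[attr]}
            out["E"].add((v, virtual))         # assumed link from vertex to its value
    return out


g1 = {"V": {1, 2}, "E": {(1, 2)},
      "A": {1: {"name": "Ann"}, 2: {"name": "Bob"}}, "B": {(1, 2): {"type": "friend"}}}
g2 = {"V": {2, 3}, "E": {(2, 3)},
      "A": {2: {"city": "Oslo"}, 3: {"city": "Oslo"}}, "B": {}}

merged = unify_vertices([g1, g2])
same_city = lambda a, b: "city" in a and a.get("city") == b.get("city")
linked = unify_edges(merged, same_city)
by_city = unify_attribute(merged, "city")
```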
Operator: Unification ⊕
P(v1,v1) and P(v4,v5) are true
Operator: Pattern Matching Γ

Purpose:
◦ Find the subgraphs of the base graph that match a given pattern

Syntax:
◦ $\Gamma(R_M, G) \to \{G\}$
◦ $\Gamma^*(R_M, G) \to \{G\}$
  ▪ $R_M$ is a pattern, which is also a graph. The definition comes from [1]
  ▪ $\Gamma$ returns all the matching graphs $\{G\}$
  ▪ $\Gamma^*$ returns the abstracted matches, in which only the vertices that appear in $R_M$ are returned

[1] Fan, Wenfei, et al. "Adding regular expressions to graph reachability and pattern queries." Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011.
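To make the difference between the two matching operators concrete, here is a deliberately simplified Python sketch. It is not the regular-pattern semantics of [1]: as an assumption, each pattern vertex carries a predicate, each pattern edge matches a path of at most `max_len` edges, and matches are found by brute force. All names and the data layout are hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_path(adj, src, dst, max_len):
    """BFS for a path from src to dst using 1..max_len edges; None if there is none."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_len:
            continue
        for nxt in adj.get(path[-1], ()):
            if nxt == dst:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def match(pattern, graph, abstract=False):
    """Full matches by default; abstract=True keeps only vertices named in the pattern."""
    names = list(pattern["vertices"])
    results = []
    for combo in permutations(graph["A"], len(names)):
        m = dict(zip(names, combo))
        if not all(pattern["vertices"][n](graph["A"][m[n]]) for n in names):
            continue
        paths = [shortest_path(graph["adj"], m[x], m[y], k)
                 for (x, y, k) in pattern["edges"]]
        if any(p is None for p in paths):
            continue
        if abstract:
            vertices = set(combo)
            edges = {(m[x], m[y]) for (x, y, _) in pattern["edges"]}
        else:
            vertices = set(combo) | {v for p in paths for v in p}
            edges = {(a, b) for p in paths for a, b in zip(p, p[1:])}
        results.append({"V": vertices, "E": edges})
    return results


graph = {
    "A": {1: {"role": "person"}, 2: {"role": "post"}, 3: {"role": "person"}},
    "adj": {1: [2], 2: [3], 3: []},
}
is_person = lambda a: a["role"] == "person"
pattern = {"vertices": {"x": is_person, "y": is_person}, "edges": [("x", "y", 2)]}

full = match(pattern, graph)
abstracted = match(pattern, graph, abstract=True)
```

In the full match the intermediate post vertex is part of the result; in the abstracted match only the two person vertices remain, joined by a single edge.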
Operator: Aggregation Σ

Purpose:
◦ Summarize a given graph

Syntax:
◦ $\Sigma_G(G, \Sigma_v, \Sigma_e) \to G'$
◦ $\Sigma_v(\{V\}, A_v, f_v) \to G'$
◦ $\Sigma_e(\{E'\}, A_e, f_e) \to G'$

$\Sigma_G$: graph aggregation, every vertex is supplied to $\Sigma_v$ and every edge set is supplied to $\Sigma_e$
$\Sigma_v$: vertex aggregation, given a set of vertices, groups them by $A_v$
$\Sigma_e$: edge aggregation, given a set of edges, groups them by $A_e$
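A compact Python sketch of graph aggregation is given below. It folds the vertex and edge steps into one function and makes assumptions the slides leave open: `f_v` and `f_e` are treated as summary functions applied to each group, and edges are bundled by the groups of their endpoints rather than by their own attributes. Names and data layout are hypothetical.

```python
from collections import defaultdict


def aggregate_graph(g, group_by, f_v, f_e):
    """Group vertices by the attributes in group_by and summarize groups and edge bundles."""
    group_of = {v: tuple(g["A"].get(v, {}).get(a) for a in group_by) for v in g["V"]}

    members = defaultdict(list)            # group key -> member vertices
    for v, key in group_of.items():
        members[key].append(v)

    bundles = defaultdict(list)            # (source group, target group) -> original edges
    for (u, w) in g["E"]:
        bundles[(group_of[u], group_of[w])].append((u, w))

    return {
        "V": set(members),
        "E": set(bundles),
        "A": {key: f_v(vs) for key, vs in members.items()},
        "B": {key: f_e(es) for key, es in bundles.items()},
    }


g = {
    "V": {1, 2, 3},
    "E": {(1, 2), (1, 3), (2, 3)},
    "A": {1: {"city": "Oslo"}, 2: {"city": "Oslo"}, 3: {"city": "Rome"}},
}
count = lambda items: {"count": len(items)}
summary = aggregate_graph(g, ["city"], count, count)
```

Here each city becomes one summary vertex carrying a member count, and each summary edge carries the number of original edges it replaces.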
Expressiveness

This set of operators is more expressive than Relational Algebra and Graph QL

It can represent many graph queries
◦ Reachability
◦ Graph Cube computation
◦ I-OLAP and T-OLAP
Algebra Equivalence

When operators are chained, they form a query execution plan:
$\oplus_v(\pi(\sigma_v(\Gamma(R_M, G), \mathit{birthday} > 1989), \mathit{v.name}))$

[Figure: execution plan with stages Base Graph, Matched Result (pattern edges: friend, Comment), Restriction (birthday > 1989), v.name, V-Unification]

Find the network induced by the persons whose friends comment on each other's posts and whose birthday is greater than 1989. Output their names as a graph.
Algebra Equivalence

To generate multiple execution plans for the same query, we need theoretical support:

Identity Equivalence:
◦ An operator can be represented by other operators

$\Gamma^{I}(R_M, G) \equiv \Gamma^{I}(R_M - p, \sigma(G, p))$
◦ // $p$ is a common attribute predicate

$\Gamma(G, P) = \Gamma(\oplus_v(\Gamma(G, D(P))), P)$
◦ $D(P)$ decomposes a pattern $P$ into edges

$G = \oplus_v(\pi(A', G), \pi(A - A', G))$
◦ // $A$ is the attribute list of $G$

$\sigma_G(\Gamma(R_M, G), p) \equiv \Gamma(R_M, \sigma_v(G, p))$
...
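The projection and unification rule, $G = \oplus_v(\pi(A', G), \pi(A - A', G))$, can be checked on a toy graph. The Python sketch below assumes a dictionary layout with keys "V", "E", "A" and restricts projection to vertex attributes; the helper names are hypothetical.

```python
def project_vertices(g, keep):
    """Projection restricted to vertex attributes: keep only the names in `keep`."""
    return {
        "V": set(g["V"]),
        "E": set(g["E"]),
        "A": {v: {k: x for k, x in a.items() if k in keep} for v, a in g["A"].items()},
    }


def unify_vertices(graphs):
    """Vertex unification: merge graphs, combining attributes of vertices with the same ID."""
    out = {"V": set(), "E": set(), "A": {}}
    for g in graphs:
        out["V"] |= g["V"]
        out["E"] |= g["E"]
        for v, attrs in g["A"].items():
            out["A"].setdefault(v, {}).update(attrs)
    return out


g = {
    "V": {1, 2},
    "E": {(1, 2)},
    "A": {1: {"name": "Ann", "age": 30}, 2: {"name": "Bob", "age": 25}},
}
all_attrs = {"name", "age"}            # A, the attribute list of G
subset = {"name"}                      # A'

left = project_vertices(g, subset)
right = project_vertices(g, all_attrs - subset)
rebuilt = unify_vertices([left, right])
```

Splitting the attributes and unifying the two projections yields the original graph, which is what lets an optimizer evaluate the two halves separately.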
Conclusion

Graph algebra plays an important role in graph database development

We take one step forward by proposing a graph algebra that:
◦ extends existing algebraic work with
  ▪ Regular pattern matching
  ▪ Aggregation
◦ is expressive and well-defined
◦ contains equivalence rules for further query optimization