Graph Algebra with Pattern Matching and Aggregation Support

Graphs Nowadays
Graph data come from a variety of sources:
◦ Scientific studies
◦ Business activities
◦ Social needs
◦ The Internet
The data are often:
◦ Large-scale
◦ Highly linked
◦ Schema-less

Managing Graph Data
Primary role of a database:
◦ Persistent storage
◦ Efficient queries
RDBMS:
◦ Storage model: vertices and edges stored as tuples
◦ Query: links are followed via joins
Graph database:
◦ Storage model: graphs
◦ Query: path traversal

Why Not an RDBMS?
Schema issues:
◦ Every inserted item may have a different schema (e.g., the Web graph)
◦ Semi-structured information is hard to represent
Scalability issues:
◦ ACID properties vs. the CAP theorem
Query performance:
◦ Join-intensive queries are difficult to optimize

Graph Databases and Query Languages
No universal language!

No Universal Language Like SQL?
There is no commonly agreed-upon algebra.
Relational algebra?
◦ Expressive, and proven effective over time
◦ NOT suitable for graphs
Graph algebra?
◦ Still at a preliminary stage

Issues with Relational Algebra (RA)
Defined on tuples or sets of tuples:
◦ Mismatch with the nature of graphs
◦ Operators lose their semantics: what do union, intersection, and join mean on a graph?
◦ Input and output types are tables, not graphs
Domain-centric, not data-centric:
◦ Does not anticipate out-of-order data
◦ Treats tuples as independent, unaware of the links among them
Queries written in RA are verbose and complex.

Advantages of Graph Algebra
An algebra is itself a query language:
◦ A language with strong theoretical support is easy to derive from it
Evaluate the expressiveness of existing languages:
◦ Justify when to use which: Gremlin, Cypher, etc.
Query optimization:
◦ Operator order EQUALS execution plan
◦ Algebraic equivalence IMPLIES query optimization

Advantages of Graph Algebra (continued)
Separation of query and system:
◦ A query can be written for any system, as long as the common algebra is supported.
◦ Knowing RA, one can write SQL, PL/SQL, or MS SQL for MySQL, Oracle, or SQL Server.
Integrating new operators into the database:
◦ Current graph database systems do not support newly developed queries: graph OLAP, graph cube, graph aggregation, etc.
◦ A proper algebra can incorporate these operators.

Existing Work on Graph Algebra
Graph QL [1]:
◦ A graph-based algebra; operators are defined on graphs
◦ Selection
◦ Join: not properly defined
◦ Template
VAQL [2]:
◦ Focused on visualization
◦ Selection
◦ Aggregation: restricted
◦ Visualization
Limitations:
◦ Selection is restricted to isomorphism
◦ Aggregation is not defined over edges
◦ No algebraic equivalence rules

[1] He, Huahai, and Ambuj K. Singh. "Graphs-at-a-time: query language and access methods for graph databases." Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. ACM, 2008.
[2] Shaverdian, Anna A., et al. "A graph algebra for scalable visual analytics." IEEE Computer Graphics and Applications 32.4 (2012): 26-33.

What Do We Want from a Graph Algebra?
Universal:
◦ Independent of graph type: directed vs. undirected, simple vs. hyper, homogeneous vs. heterogeneous
Expressive:
◦ Able to answer typical graph queries: pattern matching, reachability, path finding, etc.
◦ Cover relational algebra (RA), which ensures that a graph database can handle relational data as well
Scalable:
◦ Able to manage data at scale
◦ Support queries that summarize and aggregate data

Extended Algebra: Graph Model
$G(V, E, A, B, C)$ is an attributed graph:
$V$ is the vertex set; each $v \in V$ has a unique ID.
$E \subseteq V \times V$ is the edge set.
$A$ contains the attributes of each vertex.
$B$ contains the attributes of each edge:
◦ Edges carry an identifier as well
◦ In a simple graph, an edge can be represented by its endpoints
$C$ contains information about the graph as a whole.

Extended Algebra: Operators
◦ Projection $\pi$
◦ Restriction $\sigma$
◦ Unification $\oplus$
◦ Pattern matching $\Gamma$
◦ Aggregation $\Sigma$

Operator: Projection $\pi$
Purpose:
◦ Select the data the user is interested in from the base graph
Syntax:
◦ $\pi(G, A, B, C) \to G'$
$A$, $B$, and $C$ are the attribute lists for vertices, edges, and the graph. The result is a new graph whose attributes are trimmed to $A$, $B$, and $C$.

Operator: Restriction $\sigma$
Purpose:
◦ Restrict the base graph by attribute values
Syntax:
◦ $\sigma_v(G, p) \to \{G'\}$
◦ $\sigma_e(G, p) \to \{G'\}$
◦ $\sigma_G(\{G\}, p_A, p_B, p_C) \to \{G'\}$
$\sigma_v$: vertex restriction; selects all vertices (and their induced edges) that match predicate $p$.
$\sigma_e$: edge restriction; selects all edges (and their endpoints) that match predicate $p$.
$\sigma_G$: graph restriction; selects the graphs in which every vertex matches $p_A$, every edge matches $p_B$, and the graph itself matches $p_C$.

Operator: Unification $\oplus$
Purpose:
◦ Combine (concatenate) graphs
Syntax:
◦ $\oplus_v(\{G\}) \to \{G\}$
◦ $\oplus_e(\{G\}, p) \to \{G\}$
◦ $\oplus_a(\{G\}, \mathit{attr}) \to \{G\}$
$\oplus_v$: vertex unification; merges vertices with identical IDs.
$\oplus_e$: edge unification; adds an edge between each pair of vertices that matches $p$.
$\oplus_a$: attribute unification; creates a virtual vertex for each distinct value of $\mathit{attr}$.

Operator: Unification $\oplus$ (examples)
(Figure: unification examples, in which $P(v_1, v_1)$ and $P(v_4, v_5)$ are true.)

Operator: Pattern Matching $\Gamma$
Purpose:
◦ Find the subgraphs of the base graph that match a given pattern
Syntax:
◦ $\Gamma(R_M, G) \to \{G\}$
◦ $\Gamma^*(R_M, G) \to \{G\}$
$R_M$ is a pattern, which is itself a graph.
The definition comes from [3].
$\Gamma$ returns all matching graphs $\{G\}$.
$\Gamma^*$ returns an abstractive matching, in which only the vertices appearing in $R_M$ are returned.

[3] Fan, Wenfei, et al. "Adding regular expressions to graph reachability and pattern queries." 2011 IEEE 27th International Conference on Data Engineering (ICDE). IEEE, 2011.

Operator: Pattern Matching $\Gamma$ (example)
(Figure: pattern matching example.)

Operator: Aggregation $\Sigma$
Purpose:
◦ Summarize a given graph
Syntax:
◦ $\Sigma_G(G, \Sigma_v, \Sigma_e) \to G'$
◦ $\Sigma_v(\{V\}, A_v, f_v) \to G'$
◦ $\Sigma_e(\{E'\}, A_e, f_e) \to G'$
$\Sigma_G$: graph aggregation; the vertex set is passed to $\Sigma_v$ and the edge set to $\Sigma_e$.
$\Sigma_v$: vertex aggregation; groups a given set of vertices by $A_v$ and summarizes each group with $f_v$.
$\Sigma_e$: edge aggregation; groups a given set of edges by $A_e$ and summarizes each group with $f_e$.

Operator: Aggregation $\Sigma$ (example)
(Figure: aggregation example.)

Expressiveness
This set of operators is more expressive than relational algebra and Graph QL. It can represent many graph queries:
◦ Reachability
◦ Graph cube computation
◦ I-OLAP and T-OLAP

Algebra Equivalence
When operators are chained, they form a query execution plan, for example:
$\oplus_v(\pi(\sigma_v(\Gamma(R_M, G), \mathit{birthday} > 1989), v.\mathit{name}))$
Query: find the network induced by people whose friends comment on each other's posts, restricted to people whose birthday is after 1989, and output their names as a graph.
(Figure: base graph with friend and comment edges → pattern match → restriction $\mathit{birthday} > 1989$ → projection $v.\mathit{name}$ → vertex unification.)

Algebra Equivalence (continued)
To generate multiple execution plans for the same query, we need theoretical support.
Identity equivalence: an operator can be expressed in terms of other operators.
◦ $\Gamma_I(R_M, G) \equiv \Gamma_I(R_M - p, \sigma(G, p))$, where $p$ is a common attribute predicate
◦ $\Gamma(P, G) = \Gamma(P, \oplus_v(\Gamma(D(P), G)))$, where $D(P)$ decomposes a pattern $P$ into its edges
◦ $G = \oplus_v(\pi(G, A'), \pi(G, A - A'))$, where $A$ is the attribute list of $G$
◦ $\sigma_G(\Gamma(R_M, G), p) \equiv \Gamma(R_M, \sigma_v(G, p))$
...
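The graph model and the projection, restriction, unification, and aggregation operators can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes a simple directed graph in which each edge is identified by its endpoints (which the graph model permits for simple graphs); restriction returns a single induced subgraph rather than a set; vertex unification resolves attribute conflicts by letting later graphs win; and aggregation groups vertices on one attribute and summarizes the edges between groups with f_e (a count by default). The names Graph, project, restrict_v, unify_v, and aggregate, and the toy data, are illustrative only. The last lines check the identity G = ⊕_v(π(G, A′), π(G, A − A′)) from the equivalence slide, passing the full edge and graph attribute lists to both projections.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical sketch of the attributed graph G(V, E, A, B, C) as a simple directed graph:
# each edge is identified by its endpoints (u, v).
@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)     # V with A: vertex id -> attribute dict
    edges: dict = field(default_factory=dict)        # E with B: (u, v) -> attribute dict
    graph_attrs: dict = field(default_factory=dict)  # C: attributes of the whole graph


def _keep(attrs: dict, names: set) -> dict:
    """Copy of attrs restricted to the attribute names in `names`."""
    return {k: x for k, x in attrs.items() if k in names}


def project(g: Graph, a: set, b: set, c: set) -> Graph:
    """pi(G, A, B, C): keep every vertex and edge, trim their attributes to A, B and C."""
    return Graph(
        {v: _keep(attrs, a) for v, attrs in g.vertices.items()},
        {e: _keep(attrs, b) for e, attrs in g.edges.items()},
        _keep(g.graph_attrs, c),
    )


def restrict_v(g: Graph, p: Callable[[dict], bool]) -> Graph:
    """sigma_v(G, p): vertices satisfying p plus the edges they induce (one graph, not a set)."""
    kept = {v: dict(attrs) for v, attrs in g.vertices.items() if p(attrs)}
    induced = {(u, w): dict(attrs) for (u, w), attrs in g.edges.items() if u in kept and w in kept}
    return Graph(kept, induced, dict(g.graph_attrs))


def unify_v(graphs: list) -> Graph:
    """oplus_v({G}): merge graphs, identifying vertices (and edges) with identical ids.
    Attribute conflicts are resolved by letting later graphs win (an assumption)."""
    out = Graph()
    for g in graphs:
        for v, attrs in g.vertices.items():
            out.vertices.setdefault(v, {}).update(attrs)
        for e, attrs in g.edges.items():
            out.edges.setdefault(e, {}).update(attrs)
        out.graph_attrs.update(g.graph_attrs)
    return out


def aggregate(g: Graph, group_by: str, f_v: Callable, f_e: Callable = len) -> Graph:
    """Sigma_G sketch: Sigma_v groups vertices by one attribute and summarizes each group with f_v;
    Sigma_e groups edges by (source group, target group) and summarizes each group with f_e."""
    groups: dict = {}
    for attrs in g.vertices.values():
        groups.setdefault(attrs.get(group_by), []).append(attrs)
    summary = Graph({k: {group_by: k, "agg": f_v(members)} for k, members in groups.items()})
    buckets: dict = {}
    for (u, w), attrs in g.edges.items():
        key = (g.vertices[u].get(group_by), g.vertices[w].get(group_by))
        buckets.setdefault(key, []).append(attrs)
    summary.edges = {k: {"agg": f_e(members)} for k, members in buckets.items()}
    return summary


# Toy data (hypothetical).
g = Graph(
    vertices={
        1: {"name": "Ann", "birthday": 1991, "city": "Austin"},
        2: {"name": "Bob", "birthday": 1985, "city": "Austin"},
        3: {"name": "Cat", "birthday": 1993, "city": "Boston"},
    },
    edges={(1, 2): {"label": "friend"}, (1, 3): {"label": "friend"}, (3, 1): {"label": "comment"}},
    graph_attrs={"source": "toy"},
)

young = restrict_v(g, lambda a: a["birthday"] > 1989)  # keeps vertices 1 and 3
names = project(young, {"name"}, {"label"}, set())     # only the name attribute survives on vertices
by_city = aggregate(g, "city", len)                     # one summary vertex per city, with its size

# Identity from the equivalence slide: G = oplus_v(pi(G, A'), pi(G, A - A')).
all_a, a_prime = {"name", "birthday", "city"}, {"name"}
split = unify_v([project(g, a_prime, {"label"}, {"source"}),
                 project(g, all_a - a_prime, {"label"}, {"source"})])
assert split == g
```

Because projection keeps every vertex and edge and only trims attributes, vertex unification can reassemble the original graph from the two projected halves, which is what lets the identity serve as a rewrite rule.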
Conclusion
Graph algebra plays an important role in graph database development. We take a step forward by proposing a graph algebra that:
◦ extends existing algebraic work with regular pattern matching and aggregation
◦ is expressive and well defined
◦ contains equivalence rules for further query optimization
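The point about equivalence rules for query optimization can be illustrated with the rule σ_G(Γ(R_M, G), p) ≡ Γ(R_M, σ_v(G, p)): restricting vertices before matching yields the same matches as filtering the matches afterward, but the matcher runs on a smaller graph. The sketch below is hypothetical. Γ is implemented as brute-force subgraph isomorphism, a deliberately simpler semantics than the regular-expression pattern queries of Fan et al.; each match is represented by its vertex set and its labelled edges; and σ_G checks only a vertex predicate (the edge and graph predicates are taken to be always true). The function names and the toy friend/comment data are illustrative only.

```python
from itertools import permutations
from typing import Callable


def restrict_v(vertices: dict, edges: dict, p: Callable[[dict], bool]):
    """sigma_v(G, p): keep vertices satisfying p and the edges they induce."""
    kept = {v: a for v, a in vertices.items() if p(a)}
    return kept, {(u, w): lbl for (u, w), lbl in edges.items() if u in kept and w in kept}


def match(pattern_vertices: list, pattern_edges: dict, vertices: dict, edges: dict) -> set:
    """Gamma(R_M, G) sketch: brute-force subgraph isomorphism.
    Each match is (frozenset of matched vertex ids, frozenset of ((u, v), label))."""
    results = set()
    for image in permutations(vertices, len(pattern_vertices)):
        m = dict(zip(pattern_vertices, image))  # injective map: pattern vertex -> data vertex
        if all(edges.get((m[x], m[y])) == lbl for (x, y), lbl in pattern_edges.items()):
            matched = frozenset(((m[x], m[y]), lbl) for (x, y), lbl in pattern_edges.items())
            results.add((frozenset(image), matched))
    return results


def restrict_graphs(matches: set, vertices: dict, p: Callable[[dict], bool]) -> set:
    """sigma_G({G}, p, true, true): keep the matches in which every vertex satisfies p."""
    return {m for m in matches if all(p(vertices[v]) for v in m[0])}


def born_after_1989(attrs: dict) -> bool:
    return attrs["birthday"] > 1989


# Toy data (hypothetical): vertex id -> attributes, (u, v) -> edge label.
vertices = {1: {"birthday": 1991}, 2: {"birthday": 1985}, 3: {"birthday": 1993}, 4: {"birthday": 1995}}
edges = {(1, 2): "friend", (1, 3): "friend", (3, 4): "friend",
         (2, 1): "comment", (3, 1): "comment", (4, 3): "comment"}

# Pattern R_M: x is a friend of y, and y comments on x.
pattern_vertices = ["x", "y"]
pattern_edges = {("x", "y"): "friend", ("y", "x"): "comment"}

# Plan A: match on the full graph, then restrict the matched graphs.
plan_a = restrict_graphs(match(pattern_vertices, pattern_edges, vertices, edges),
                         vertices, born_after_1989)
# Plan B: push the restriction below the pattern match.
plan_b = match(pattern_vertices, pattern_edges, *restrict_v(vertices, edges, born_after_1989))
assert plan_a == plan_b
```

Both plans return the matches on vertices {1, 3} and {3, 4}. With brute-force matching, a k-vertex pattern over n data vertices enumerates n!/(n − k)! candidate mappings (12 here for plan A, 6 for plan B), so shrinking the graph first is where an optimizer would gain.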