Web Data Management
Bisimulation

In this lecture
• Semistructured data model
• Graph Simulation and Bisimulation
• Computing (bi)simulation

Resources
• Adding structure to semistructured data, by Buneman, Davidson, Fernandez, Suciu, in ICDT 97
• Data on the Web, by Abiteboul, Buneman, Suciu: section 6.4

The Semistructured Data Model
Object Exchange Model (OEM)
[Figure: OEM graph. The root Bib (&o1) is a complex object with paper, paper, and book edges to &o12, &o24, and &o29. Further edges are labeled references, author, title, year, http, publisher, page, first, last, firstname, and lastname; they lead to other complex objects and to atomic objects such as “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122, and 133.]

Syntax for Semistructured Data
May omit oids:
{ paper: { author: “Abiteboul”,
           author: { firstname: “Victor”,
                     lastname: “Vianu” },
           title: “Regular path queries …”,
           page: { first: 122, last: 133 } } }

Set Semantics for Trees
Want to say that {a, a, b} = {a, b}
Define equality for trees first, then for graphs
Definition. Two trees t, t’ are equal, t = t’, if either:
1. They are both atomic values with the same value, or
2. t = {t1, ..., tm}, t’ = {t1’, ..., tn’} and:
   – ∀ i = 1,...,m, ∃ j = 1,...,n s.t. ti = tj’
   – ∀ j = 1,...,n, ∃ i = 1,...,m s.t. ti = tj’

Set Semantics: Example
[Figure: trees with edges labeled a, b, c, d, e and leaf values 1, 2, 3, shown equal under set semantics.]

Set Semantics for Graphs
• The previous definition does not apply directly to graphs with cycles
• Need to adapt it: bisimulation
• First, we will define a simulation

Graph Simulation
Definition. Given two edge-labeled graphs G1, G2, a simulation is a relation R between their nodes such that:
• if (x1, x2) ∈ R and (x1, a, y1) ∈ G1, then there exists (x2, a, y2) ∈ G2 (same label) s.t. (y1, y2) ∈ R
[Figure: edge x1 -a-> y1 in G1 and edge x2 -a-> y2 in G2, with R relating x1 to x2 and y1 to y2.]
Note: if we insist that R be a function, we get a graph homomorphism

Graph Bisimulation
Definition. Given two edge-labeled graphs G1, G2, a bisimulation is a relation R between their nodes s.t. both R and R⁻¹ are simulations

Set Semantics for Semistructured Data
Definition. Two rooted graphs G1, G2 are equal if there exists a bisimulation R from G1 to G2 such that (root(G1), root(G2)) ∈ R
• Notation: G1 ≈ G2
• For trees, this is precisely our earlier definition

Examples of Bisimilar Graphs
[Figure: pairs of bisimilar graphs with edges labeled a, b, c, including a graph whose a edges repeat indefinitely.]

Examples of non-Bisimilar Graphs
[Figure: graphs G1 and G2, each with edges labeled a, b, c; G2 has two a edges.]
• This is a simulation but not a bisimulation
  – Why?
• Notice: G1, G2 have the same sets of paths

Examples of Simulation
• Simulation acts like “subset”
  – {a, b} is simulated by {a, b, c}
  – {a, b:{c}} is simulated by {d, a:{e,f}, b:{c,g}}
[Figure: the graphs for the two examples above.]
• Question: if DB1 is simulated by DB2 and DB2 is simulated by DB1, is DB1 ≈ DB2?

Answer
If DB1 is simulated by DB2 and DB2 is simulated by DB1, is DB1 ≈ DB2? No. Here is a counterexample:
[Figure: counterexample graphs DB1 (edges a, b) and DB2 (edges a, a, b).]
DB1 is simulated by DB2 and DB2 is simulated by DB1, but NOT DB1 ≈ DB2

Facts About a (Bi)Simulation
• The empty set is always a (bi)simulation
• If R, R’ are (bi)simulations, so is R ∪ R’
• Hence, there always exists a maximal (bi)simulation:
  – Checking if DB1 = DB2: compute the maximal bisimulation R, then test whether (root(DB1), root(DB2)) ∈ R

Computing a (Bi)Simulation
• Computing the maximal (bi)simulation:
  – Start with R = nodes(G1) × nodes(G2)
  – While there exists (x1, x2) ∈ R that violates the definition, remove (x1, x2) from R
• This runs in polynomial time. Better algorithms exist:
  – O((m+n) log(m+n)) for bisimulation
  – O(m n) for simulation
  – Compare to finding a graph homomorphism: NP-complete
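
The naive procedure from the "Computing a (Bi)Simulation" slide can be sketched in Python as follows. This is only an illustration under stated assumptions: a graph is encoded as a set of node names plus a set of (source, label, target) triples, atomic values are ignored, and the function names are hypothetical rather than taken from the lecture. It implements the simple polynomial-time removal loop, not the faster O((m+n) log(m+n)) or O(m n) algorithms. The example graphs are one possible reading of the DB1/DB2 counterexample: DB1 = {a: {b}} and DB2 = {a, a: {b}}.

```python
from itertools import product


def successors(edges):
    """Map each node to its set of outgoing (label, target) pairs."""
    succ = {}
    for x, a, y in edges:
        succ.setdefault(x, set()).add((a, y))
    return succ


def _edges_matched(x, x_other, succ, succ_other, related):
    """True if every edge x -a-> y has a match x_other -a-> y_other
    with related(y, y_other)."""
    return all(
        any(b == a and related(y, y_other)
            for (b, y_other) in succ_other.get(x_other, ()))
        for (a, y) in succ.get(x, ())
    )


def maximal_relation(nodes1, edges1, nodes2, edges2, bisim=False):
    """Naive fixpoint: start from all pairs, drop violating pairs.

    With bisim=False the result is the maximal simulation from G1 to G2;
    with bisim=True it is the maximal bisimulation.
    """
    s1, s2 = successors(edges1), successors(edges2)
    r = set(product(nodes1, nodes2))
    changed = True
    while changed:
        changed = False
        for (x1, x2) in list(r):
            # R must be a simulation from G1 to G2.
            ok = _edges_matched(x1, x2, s1, s2,
                                lambda y1, y2: (y1, y2) in r)
            # For a bisimulation, the inverse of R must be one too.
            if ok and bisim:
                ok = _edges_matched(x2, x1, s2, s1,
                                    lambda y2, y1: (y1, y2) in r)
            if not ok:
                r.discard((x1, x2))
                changed = True
    return r


# Assumed encoding of the counterexample: DB1 = {a: {b}}, DB2 = {a, a: {b}}.
db1_nodes = {"r1", "p", "q"}
db1_edges = {("r1", "a", "p"), ("p", "b", "q")}
db2_nodes = {"r2", "u", "v", "w"}
db2_edges = {("r2", "a", "u"), ("r2", "a", "v"), ("v", "b", "w")}

sim12 = maximal_relation(db1_nodes, db1_edges, db2_nodes, db2_edges)
sim21 = maximal_relation(db2_nodes, db2_edges, db1_nodes, db1_edges)
bis = maximal_relation(db1_nodes, db1_edges, db2_nodes, db2_edges, bisim=True)

print(("r1", "r2") in sim12)  # True: DB1 is simulated by DB2
print(("r2", "r1") in sim21)  # True: DB2 is simulated by DB1
print(("r1", "r2") in bis)    # False: the roots are not bisimilar
```

The roots are related by a simulation in each direction, yet the maximal bisimulation excludes the root pair: the leaf reached by DB2's first a edge has no bisimilar counterpart among DB1's a successors.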