Slides - Computer Science @ UC Davis

Implementing Mapping Composition

Todd J. Green*, University of Pennsylvania
with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), Alan Nash (UC San Diego)

VLDB 2006, Seoul, Korea

*Work partially supported by NSF grants IIS0513778 and IIS0415810


Schema mappings

• Mapping: a correspondence between instances of different schemas

    S1: Students(Name, Address)
    S2: Names(SID, Name), Addresses(SID, Address)

    m:  Students [?] π_{Name,Address}(Names ⋈ Addresses)


Applications of mappings: schema evolution

    S1: Students(Name, Address, Country)
    S2: Names(SID, Name), Addresses(SID, Address, Country)
    S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)
    ...

    m12: Students [?] π_{Name,Address,Country}(Names ⋈ Addresses)
    m23: Names [?] Names
         σ_{Country=KR}(Addresses) [?] π_{SID,Address}(Local) × {KR}
         σ_{Country≠KR}(Addresses) [?] Foreign


Applications of mappings: data integration, data exchange

    Schemas S1, ..., Sn−1, Sn and mappings m1, ..., mn over the relations:
      Addresses(SID, Address, Country), Names(SID, Name)
      Students(Name, Address, Country)
      Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)

    m1: Students [?] π_{Name,Address}(Names ⋈ Addresses)
    mn: Names [?] Names
        Local [?] π_{SID,Address}(σ_{Country=KR}(Addresses))
        Foreign [?] σ_{Country≠KR}(Addresses)


Requirements for constraints

• "First attribute in R is a key for R"
    π_{2,4}(R ⋈_{1=3} R) ⊆ π_{2,2}(R)
• "View V equals R joined with S"
    V ⊆ R ⋈ S,  V ⊇ R ⋈ S
• "Second attribute of R is a foreign key in S"
    π_{2}(R) ⊆ π_{1}(S)
    π_{2,4}(S ⋈_{1=3} S) ⊆ π_{2,2}(S)
• Data integration, data exchange: GLAV
    R ⋈ S ⊆ T ⋈ U


Mapping composition

    S1: Students(Name, Address, Country)
    S2: Names(SID, Name), Addresses(SID, Address, Country)
    S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)

    m12: Students [?] π_{Name,Address,Country}(Names ⋈ Addresses)
    m23: Names [?] Names
         σ_{Country=KR}(Addresses) [?] π_{SID,Address}(Local) × {KR}
         σ_{Country≠KR}(Addresses) [?] Foreign

    m12 ∘ m23: Students [?] π_{Name,Address,Country}(Names ⋈ (π_{SID,Address}(Local) × {KR} ∪ Foreign))


Composition is hard

• Hard part: write composition in the same language as the input mappings. Depending on language:
    ◦ Not always possible
    ◦ Not even decidable whether possible
• Strategy 1: use powerful (second-order) mapping language closed under composition [FKPT04]
    ◦ Not supported by DBMS today
    ◦ Expensive to check
    ◦ Source-target restriction
• Strategy 2: settle for partial solutions [NBM05]
    ◦ Containment mappings, easier integration with DBMS
    ◦ The strategy we adopt in this work


Our contributions

• New algorithm for composition problem
• Incorporates view unfolding and left composition (new technique)
• Makes best effort in failure cases
• Algebraic rather than logic-based mappings
• Use of monotonicity to handle more operators
• Modular and extensible factoring of algorithm
• First implementation of composition
• Experimental evaluation


Formal definition of composition

• Mapping: set of pairs of instances of db schemas
• The composition m12 ∘ m23 is the mapping
    {⟨A,C⟩ : (∃B)(⟨A,B⟩ ∈ m12 and ⟨B,C⟩ ∈ m23)}
  where A, B, C are instances of S1, S2, S3
• Composition problem: find constraints in same language as input mappings giving the composition of the input mappings
• Example: S1 = {R}, S2 = {S,T}, S3 = {U,V,W}

    S1: R(∙,∙,∙)
    S2: S(∙,∙), T(∙,∙)
    S3: U(∙,∙,∙), V(∙,∙), W(∙,∙)

    m12: R ⊆ S ⋈ T
    m23: S ⊆ π(U), T = V − W

    R ⊆ S ⋈ T,  S ⊆ π(U),  T = V − W  ⇒  R ⊆ π(U) ⋈ (V − W)


Best-effort composition problem

• Composition not always possible
• "Best-effort" composition problem: compute set of constraints equivalent to input constraints, but with as many symbols from S2 eliminated as possible

    R ⊆ U                            R ⊆ V
    π_{1,4}(σ_{2=3}(U × U)) ⊆ U      π_{1,4}(σ_{2=3}(V × V)) ⊆ V
    U ⊆ T                            V ⊆ T

  Can eliminate U (cross out left column) or V (right column), but not both [NBM05]


Composition algorithm overview

    For each relation R in S2
        Try to eliminate R via (1) view unfolding
    Replace = by pairs of ⊆, ⊇
    For each relation R in S2 not yet eliminated
        Try to eliminate R via (2) left compose
        Else, try to eliminate R via (3) right compose
    Output: new constraints and list of relations successfully eliminated


(1) View unfolding

• Idea: exploit equality constraints (if we have any)
• Standard technique: substitute view definition for occurrences of view relation in mappings

    T = V − W,  R ⊆ S ⋈ T,  T [?] X ⊆ π(U)
    ⇒  R ⊆ S ⋈ (V − W),  (V − W) [?] X ⊆ π(U)

• Body must not mention view relation itself
• Doesn't matter what else is in body
• Can substitute everywhere


(2) Left compose

• "View unfolding" for containment constraints

    π(V) ⊆ R − U,  R ⊆ S ⋈ T  ⇒  π(V) ⊆ (S ⋈ T) − U

• Needs monotonicity of expressions in R:

    E1 ⊆ E2(R),  R ⊆ E3  ≡  E1 ⊆ E2(E3)    if E2(R) is monotone in R (and R not in E3)

• Partial check for monotonicity
    "Is S − (T − R) monotone in R?"


Normalization for left compose

• Need one constraint of form R ⊆ E1
• Use identities to normalize, e.g.:

    R ⊆ E1 and R ⊆ E2   iff   R ⊆ E1 ∩ E2
    E1 ∪ E2 ⊆ E3        iff   E1 ⊆ E3 and E2 ⊆ E3
    π(E1) ⊆ E2          iff   E1 ⊆ E2 × D^r

• More identities in paper
• After left compose, try to eliminate D


(3) Right compose

• Dual to left compose, from [NBM05]
• Example:

    S ⋈ T ⊆ R,  R − U ⊆ π(V)  ⇒  (S ⋈ T) − U ⊆ π(V)

• Monotonicity check needed here too
• Normalization may introduce Skolem functions

    E1 ⊆ π(E2)   iff   f(E1) ⊆ E2

• Must eliminate Skolem functions after composition
• Lots of effort coding this step!


User-defined operators

• User specifies:
    ◦ Monotonicity of operator in its arguments
        "If E1 monotone in R and E2 antimonotone in R or independent of R, then E1 * E2 monotone in R"
        "If E1 monotone in R or independent of R and E2 antimonotone in R, then E1 * E2 monotone in R"
    ◦ Identities for normalization
        "E1 * E2 ⊆ E3 iff E1 ⊆ E2 ∪ E3"
• User-defined operators and standard relational operators treated uniformly


Implementation

• 12K lines of C# code, command-line tool

    # Test case 13: PODS05 example 2
    SCHEMA R(2), S(2), T(2)
    CONSTRAINTS
      R <= S,
      P_{0,2} J_{0,1:1,2} (S S) <= R,
      S <= T
    ELIMINATE S;

    Output:
      P_{0,2} J_{0,1:1,2}(R R) <= R,
      R <= T


Experimental evaluation

• First attempt at a composition benchmark
• Schema editing and schema reconciliation scenarios
    ◦ "Add a column to R to produce S": [?](R) = S
• Measure
    ◦ % of symbols eliminated
    ◦ Running time
• As a function of
    ◦ Editing primitives allowed, length of edit sequence, presence/absence of keys, starting schema size, …
• Synthetic data


Summary of results

• Algorithm often effective in eliminating most or even all relation symbols from S2
• Running time in subsecond range even for large problems containing hundreds of constraints
• Certain schema editing primitives problematic
• Key constraints did not reduce effectiveness, although they did increase running time (and output size)


Schema editing

[Plot: execution time (sec), 0 to 3.5, vs. run number, 0 to 100]

• Random starting schema (30 relations of 2-10 attributes)
• 100 random edits
• 100 different runs, sorted by execution time


Schema reconciliation (1)

[Plot: fraction of symbols eliminated and execution time (sec) vs. number of edits, 10 to 210]

• Random schema (30 relations of 2-10 attributes), random edits
• Point represents median time of reconciliation step of 500 runs


Schema reconciliation (2)

[Plot: fraction of symbols eliminated vs. schema size, 10 to 100; curves: complete, no view unfolding, no right compose]

• Random schema (variable # relations of 2-10 attributes)
• 100 random edits
• 100 different runs, sorted by execution time


Related work

• [MH03] J. Madhavan, A. Y. Halevy. Composing mappings among data sources. VLDB, 2003.
• [FKPT04] R. Fagin, Ph. G. Kolaitis, L. Popa, W. C. Tan. Composing schema mappings: second-order dependencies to the rescue. PODS, 2004.
• [NBM05] A. Nash, P. A. Bernstein, S. Melnik. Composition of mappings given by embedded dependencies. PODS, 2005.


Conclusion and future work

• We motivated and described the mapping composition problem
• We presented an implementation of a practical new algorithm for the composition problem
• We also presented an experimental evaluation
• To do: theoretical analysis of impact of user-defined operators
• To do: output constraints from algorithm can be a mess! How to clean up?
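
As an illustration of step (2), left compose, and its partial monotonicity check, here is a minimal Python sketch. It is not the C# tool described in the slides: the nested-tuple encoding of expressions and the names polarity, substitute, mentions and left_compose are assumptions made for this example, projection is written without its attribute list, and the normalization identities are left out (the sketch simply gives up unless there is exactly one constraint of the form R ⊆ E3).

```python
# Expressions are relation names (str) or tuples: ("proj", e), ("join", a, b),
# ("union", a, b), ("diff", a, b). A constraint (lhs, rhs) means lhs ⊆ rhs.
# This encoding is a hypothetical one chosen for the sketch.
MONO, ANTI, INDEP, UNKNOWN = "monotone", "antimonotone", "independent", "unknown"

def flip(p):
    """Swap monotone and antimonotone; leave the other two values alone."""
    return {MONO: ANTI, ANTI: MONO}.get(p, p)

def combine(p, q):
    """Polarity of an operator that is monotone in both of its arguments."""
    if UNKNOWN in (p, q):
        return UNKNOWN
    if p == INDEP:
        return q
    if q == INDEP or p == q:
        return p
    return UNKNOWN  # mixed monotone/antimonotone: the partial check gives up

def polarity(expr, rel):
    """Partial (sound but incomplete) check of how expr depends on rel."""
    if isinstance(expr, str):
        return MONO if expr == rel else INDEP
    op, *args = expr
    if op == "proj":
        return polarity(args[0], rel)
    if op in ("join", "union"):
        return combine(polarity(args[0], rel), polarity(args[1], rel))
    if op == "diff":  # monotone in the first argument, antimonotone in the second
        return combine(polarity(args[0], rel), flip(polarity(args[1], rel)))
    return UNKNOWN

def mentions(expr, rel):
    """True if the relation symbol rel occurs anywhere in expr."""
    if isinstance(expr, str):
        return expr == rel
    return any(mentions(a, rel) for a in expr[1:])

def substitute(expr, rel, repl):
    """Replace every occurrence of rel in expr by repl."""
    if isinstance(expr, str):
        return repl if expr == rel else expr
    op, *args = expr
    return (op, *(substitute(a, rel, repl) for a in args))

def left_compose(constraints, rel):
    """Try to eliminate rel; return the new constraints, or None on failure."""
    bounds = [rhs for lhs, rhs in constraints if lhs == rel]
    if len(bounds) != 1 or mentions(bounds[0], rel):
        return None  # need exactly one constraint rel ⊆ E3 with rel not in E3
    result = []
    for lhs, rhs in constraints:
        if lhs == rel:
            continue
        if mentions(lhs, rel) or polarity(rhs, rel) not in (MONO, INDEP):
            return None  # rel on a left side, or right side not known monotone
        result.append((lhs, substitute(rhs, rel, bounds[0])))
    return result

# The left-compose example from the slides: π(V) ⊆ R − U, R ⊆ S ⋈ T
example = [(("proj", "V"), ("diff", "R", "U")), ("R", ("join", "S", "T"))]
composed = left_compose(example, "R")
```

Here composed holds the single constraint π(V) ⊆ (S ⋈ T) − U, and polarity(("diff", "S", ("diff", "T", "R")), "R") answers the slide's question "Is S − (T − R) monotone in R?" with "monotone". Returning None instead of raising mirrors the best-effort spirit of the algorithm: a relation that cannot be eliminated is simply left in place.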
