Data Conflict Resolution Using Trust Mappings
Wolfgang Gatterbauer & Dan Suciu
University of Washington, Seattle
June 8, SIGMOD 2010
Project web page: http://db.cs.washington.edu/beliefDB

Conflicts & Trust mappings in Community DBs

Background 1: Conflicting beliefs (example: Indus script*)
“Beliefs”: annotated (key,value) pairs

glyph   origin      user
1       ship hull   Alice
1       cow         Bob
1       jar         Charlie
2       fish        Bob
2       knot        Charlie
3       arrow       Charlie

Background 2: Trust mappings with priorities
• Alice trusts Bob (100)
• Alice trusts Charlie (50)
• Bob trusts Alice (80)

Resulting beliefs per user (glyph: origin)
• Alice: 1: ship hull, 2: fish, 3: arrow
• Bob: 1: cow, 2: fish, 3: arrow
• Charlie: 1: jar, 2: knot, 3: arrow
“Explicit belief”: a value the user entered (e.g. Bob’s “cow”). “Implicit belief”: a value adopted through a trust mapping (e.g. Alice’s “fish”).

Recent work on community databases: Orchestra [SIGMOD’06, VLDB’07], Youtopia [VLDB’09], BeliefDB [VLDB’09]
* Current state of knowledge on the Indus Script: Rao et al., Science 324(5931):1165, May 2009

Problems due to transient effects
1. Incorrect inserts – value depends on order of inserts
   – t1: Charlie: jar; t2: Alice: jar; t3: Bob: cow
   – Alice would have preferred Bob’s value (priority 100) over Charlie’s (priority 50)
2. Incorrect updates – mis-handling of revokes
   – t1: Charlie: jar; t2: Alice: jar; t3: Bob: jar; t4: Charlie: cow
   – Alice and Bob trust each other most, but have lost “justification” for their beliefs

This paper: Automatic conflict resolution with trust mappings
1. How to define a globally consistent solution?
2. How to calculate it efficiently?
3. Some extensions

Agenda
1. Stable solutions – how to define a unique and consistent solution?
2. Resolution algorithm – how to calculate the solution efficiently?
3. Extensions – how to deal with “negative beliefs”?
Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Possible / Certain semantics
  – a stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions, per user
* each node with at least one ancestor with explicit belief

Example (arc priorities in the figure: 10, 10, 20, 20)
• User A believes value v (A:v); user B believes value w (B:w)
• User C has no explicit belief (C:?)
• User D is user C’s “preferred parent” (D:?)

SS1=(A:v, B:w, C:v, D:v)
SS2=(A:v, B:w, C:w, D:w)

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅

Extended example with E:u (arc priority 5)
SS1=(A:v, B:w, C:v, D:v, E:u)
SS2=(A:v, B:w, C:w, D:w, E:u)
SS3=(A:v, B:w, C:u, D:u, E:u)
Is SS3 a stable solution? No! Parent “B:w (10)” dominates and is inconsistent with “E:u (5)”.

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅
E   {u}       {u}

Now how to calculate poss / cert?

Logic programs (LP) with stable model semantics
• LP & stable model semantics: “Declarative imperative”*
• Natural correspondence
  – Brave (credulous) reasoning ~ possible tuple semantics
  – Cautious (skeptical) reasoning ~ certain tuple semantics
• Previous work on consistent query answering & peer data exchange: Greco et al. [TKDE’03], Arenas et al. [TLP’03], Barcelo, Bertossi [PADL’03], Bertossi, Bravo [LPAR’07]
• But solving LPs is hard
[Plot: time [sec] (log scale) vs. size of the network** for DLV, a state-of-the-art LP solver]
* keynote Joe Hellerstein
** size of the network = users + mappings; simple network of several “oscillators” (see paper)

How can we calculate poss / cert efficiently?

Resolution Algorithm
Focus on binary trust network
Keep 2 sets: closed / open
Initialize closed with explicit beliefs
MAIN
  Step 1: if there are preferred edges from open to closed: follow them

Example: nodes A to L; arcs are either preferred or non-preferred.
Initially: closed = A{v}, B{w}, C{u}; open = D, E, F, G, H, J, K, L

X        A    B    C    D    E    F    G    H    J    K    L
poss(X)  {v}  {w}  {u}  ?    ?    ?    ?    ?    ?    ?    ?
cert(X)  {v}  {w}  {u}  ?    ?    ?    ?    ?    ?    ?    ?

Applying Step 1 repeatedly assigns D{v}, then E{w}, then F{u}:

X        A    B    C    D    E    F    G    H    J    K    L
poss(X)  {v}  {w}  {u}  {v}  {w}  {u}  ?    ?    ?    ?    ?
cert(X)  {v}  {w}  {u}  {v}  {w}  {u}  ?    ?    ?    ?    ?
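Step 1 of the algorithm can be sketched as a simple propagation. This is a minimal illustration, not the authors' implementation: the function name `follow_preferred_edges` and the preferred-parent map are hypothetical, and the arcs are invented so that the outcome matches the values shown on the slides (the actual arcs of the example figure are not recoverable from the text).

```python
from collections import deque


def follow_preferred_edges(explicit, preferred_parent):
    """Step 1 only: repeatedly copy poss sets along preferred edges
    that lead from an open node to an already closed node."""
    # closed maps node -> set of possible values; start with explicit beliefs
    closed = {node: {value} for node, value in explicit.items()}

    # index children by their preferred parent for fast lookup
    children = {}
    for child, parent in preferred_parent.items():
        children.setdefault(parent, []).append(child)

    queue = deque(closed)
    while queue:
        parent = queue.popleft()
        for child in children.get(parent, []):
            if child not in closed:          # explicit beliefs are never overwritten
                closed[child] = set(closed[parent])
                queue.append(child)
    return closed


# Hypothetical preferred arcs (child -> preferred parent), assumed for illustration.
explicit = {"A": "v", "B": "w", "C": "u"}
preferred_parent = {
    "D": "A", "E": "B", "F": "C", "H": "E",   # reachable from closed nodes
    "G": "J", "J": "K", "K": "G", "L": "K",   # preferred parents stay open
}
closed = follow_preferred_edges(explicit, preferred_parent)
still_open = sorted(set(preferred_parent) - set(closed))
```

Under these assumed arcs, D, E, F and H are closed by Step 1 alone, while G, J, K and L remain open because their preferred parents are themselves open. Those remaining nodes are what Step 2 has to handle.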
Resolution Algorithm (continued)
Keep 2 sets: closed / open
Initialize closed with explicit beliefs
MAIN
  Step 1: if there are preferred edges from open to closed: follow them
  Step 2: else construct SCC graph of open, resolve minimal SCCs

Step 1 applies once more and assigns H{w}:

X        A    B    C    D    E    F    G    H    J    K    L
poss(X)  {v}  {w}  {u}  {v}  {w}  {u}  ?    {w}  ?    ?    ?
cert(X)  {v}  {w}  {u}  {v}  {w}  {u}  ?    {w}  ?    ?    ?

No preferred edge from open to closed remains, so Step 2 applies.
• “Minimal SCC”: no incoming edge from other SCC
• For every cyclic or acyclic directed graph:
  – the Strongly Connected Components graph is a DAG
  – it can be calculated in O(n), Tarjan [1972]

Resolving the minimal SCCs assigns G{v,w}, J{v,w}, K{v,w}; L is still open:

X        A    B    C    D    E    F    G      H    J      K      L
poss(X)  {v}  {w}  {u}  {v}  {w}  {u}  {v,w}  {w}  {v,w}  {v,w}  ?
cert(X)  {v}  {w}  {u}  {v}  {w}  {u}  ∅      {w}  ∅      ∅      ?
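The graph part of Step 2 can be illustrated with a short sketch of Tarjan's algorithm followed by a filter for minimal SCCs (components without an incoming edge from another component). The function names and the example arcs among the open nodes are assumptions made for this illustration; the slides do not list the arcs. The sketch covers only the SCC construction, not how values are resolved inside a component.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm (recursive form). edges are (source, target) pairs."""
    succ = {n: [] for n in nodes}
    for a, b in edges:
        succ[a].append(b)

    index, low = {}, {}
    stack, on_stack = [], set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:               # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(frozenset(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    return components


def minimal_sccs(nodes, edges):
    """Components with no incoming edge from a different component."""
    comps = strongly_connected_components(nodes, edges)
    comp_of = {n: c for c in comps for n in c}
    has_incoming = {comp_of[b] for a, b in edges if comp_of[a] != comp_of[b]}
    return [c for c in comps if c not in has_incoming]


# Hypothetical arcs among the open nodes (parent -> child), assumed for illustration.
open_nodes = ["G", "J", "K", "L"]
open_edges = [("G", "J"), ("J", "K"), ("K", "G"), ("K", "L")]
roots = minimal_sccs(open_nodes, open_edges)
```

With these assumed arcs, G, J and K form one cycle and L hangs off it, so the only minimal SCC is {G, J, K}. That is consistent with the order on the slides, where G, J and K are resolved before L. The recursive form is kept for brevity; very deep graphs would need an iterative variant to avoid Python's recursion limit.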
Resolution Algorithm: result
The last open node is assigned L{v,w,u}:

X        A    B    C    D    E    F    G      H    J      K      L
poss(X)  {v}  {w}  {u}  {v}  {w}  {u}  {v,w}  {w}  {v,w}  {v,w}  {v,w,u}
cert(X)  {v}  {w}  {u}  {v}  {w}  {u}  ∅      {w}  ∅      ∅      ∅

PTIME resolution algorithm
• O(n^2) worst case
• O(n) on reasonable graphs

3 semantics for negative beliefs
(one of the three is marked “Our recommendation” on the slide)

              Agnostic     Eclectic     Skeptic
w/o cycles*   O(n)         O(n)         O(n)
w cycles      NP-hard**    NP-hard**    O(n^2)

[Figure: the same example network (nodes A to J) shown once per semantics, annotated with positive beliefs such as {v+} and negative beliefs such as {w−}]
* with a variation of the resolution algorithm, assuming a total order on parents for each node
** checking if a belief is possible at a given node is NP-hard, checking if it is certain is co-NP-hard

Take-aways automatic conflict resolution
Problem
• Given explicit beliefs & trust mappings, how to assign a consistent value to every user?
Our solution
• Stable solutions with possible/certain value semantics
• PTIME algorithm [O(n^2) worst case, O(n) in experiments]
• Several extensions
  – negative beliefs: 3 semantics, two hard, one O(n^2)
  – in the paper & TR: bulk inserts, agreement checking, consensus value, lineage computation
Please visit us at the poster session Th, 3:30pm or at: http://db.cs.washington.edu/beliefDB

Poster

1. Conflicts & Trust mappings in Community DBs
Background 1: Conflicting beliefs*
“Beliefs”: annotated (key,value) pairs

glyph   origin      user
1       ship hull   Alice
1       cow         Bob
1       jar         Charlie
2       fish        Bob
2       knot        Charlie
3       arrow       Charlie

Background 2: Trust mappings with priorities
• Alice trusts Bob (100)
• Alice trusts Charlie (50)
• Bob trusts Alice (80)
Resulting beliefs per user (glyph: origin)
• Alice: 1: ship hull, 2: fish, 3: arrow
• Bob: 1: cow, 2: fish, 3: arrow
• Charlie: 1: jar, 2: knot, 3: arrow
Recent work on community databases: Orchestra [SIGMOD’06, VLDB’07], Youtopia [VLDB’09], BeliefDB [VLDB’09]
* Current state of knowledge on the Indus Script: Rao et al., Science 324(5931):1165, May 2009
How to unambiguously assign beliefs to all users?

2. Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Possible / Certain semantics
  – a stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions, per user
* each node with at least one ancestor with explicit belief

Example (arc priorities in the figure: 10, 10, 5, 20, 20; A:v, B:w, E:u explicit)
SS1=(A:v, B:w, C:v, D:v, E:u)
SS2=(A:v, B:w, C:w, D:w, E:u)

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅
E   {u}       {u}

3. Logic programs with stable model semantics
Step 1: Binarization
[Figure: node E with parents A, B, C, D and priorities 30, 20, 10, 10 (a partial order), binarized using auxiliary nodes E’ and E’’]
Step 2: Logic program
Node C with preferred parent A and non-preferred parent B:
  1: accept all poss of preferred parent
     P(C,x) ← P(A,x)
  2: accept poss from non-preferred parent that are not conflicting with an existing value
     F(C,B,y) ← P(B,y), P(C,x), x≠y
     P(C,y) ← P(B,y), ¬F(C,B,y)
Node C whose parents A and B are both treated as non-preferred:
     F(C,A,y) ← P(A,y), P(C,x), x≠y
     P(C,y) ← P(A,y), ¬F(C,A,y)
     F(C,B,y) ← P(B,y), P(C,x), x≠y
     P(C,y) ← P(B,y), ¬F(C,B,y)

4. Resolution Algorithm (1/2)
Keep 2 sets: closed / open
Initialize closed with explicit beliefs
MAIN
  Step 1: if there are preferred edges from open to closed: follow them

X        A    B    C    D    E    F    G    H    J    K    L
poss(X)  {v}  {w}  {u}  {v}  ?    ?    ?    ?    ?    ?    ?
cert(X)  {v}  {w}  {u}  {v}  ?    ?    ?    ?    ?    ?    ?

5. Resolution Algorithm (2/2)
MAIN
  Step 1: if there are preferred edges from open to closed: follow them
  Step 2: else construct SCC graph of open, resolve minimal SCCs
“Minimal SCC”: can be calculated in O(n), Tarjan [1972]
PTIME resolution algorithm: O(n^2) worst case, O(n) on reasonable graphs

X        A    B    C    D    E    F    G      H    J      K      L
poss(X)  {v}  {w}  {u}  {v}  {w}  {u}  {v,w}  {w}  {v,w}  {v,w}  ?
cert(X)  {v}  {w}  {u}  {v}  {w}  {u}  ∅      {w}  ∅      ∅      ?

6. Detail: Strongly Connected Components (SCCs)
For every cyclic or acyclic directed graph:
• the Strongly Connected Components graph is a DAG
• it can be calculated in O(n), Tarjan [1972]
“Minimal SCCs”: no incoming edge from other SCC = root node(s) in SCC graph
[Figure: a directed graph on nodes A to H and its SCC graph with components SCC1 to SCC4]

7. Experiments on large network data
Calculating poss / cert for fixed key
• DLV: state-of-the-art logic programming solver
• RA: Resolution algorithm
Networks
• Network 1: “Oscillators”
• Network 2: “Web link data”
• Network 3: “Worst case” O(n^2)
Web data set with 5.4m links between 270k domain names. Approach:
• Sample links with increasing ratio
• Include both nodes in sample
• Assign explicit beliefs randomly
[Plots: time [sec] vs. size of the network [users + mappings] for DLV and RA, one plot per network; reference curves shown include y = 1e-5 x and y = 1e-7 x^2]

8. Three semantics for negative beliefs
(one of the three is marked “Our recommendation” on the poster)

              Agnostic     Eclectic     Skeptic
w/o cycles*   O(n)         O(n)         O(n)
w cycles      NP-hard**    NP-hard**    O(n^2)

* with a variation of the resolution algorithm, assuming a total order on parents for each node
** checking if a belief is possible at a given node is NP-hard, checking if it is certain is co-NP-hard

9. Take-aways automatic conflict resolution
Problem
• Given explicit beliefs & trust mappings, how to assign a consistent value to every user?
Our solution
• Stable solutions with possible/certain value semantics
• PTIME algorithm [O(n^2) worst case, O(n) in experiments]
• Several extensions
  – negative beliefs: 3 semantics, two hard, one O(n^2)
  – not covered in the talk: bulk inserts, agreement checking, consensus value, lineage computation
Slides soon available on our project page: http://db.cs.washington.edu/beliefDB

Backup

Binarization for Resolution Algorithm*
• Example Trust Network (TN): 6 nodes, 9 arcs (size 15); 3 explicit beliefs: A:v, B:w, C:u
• Corresponding Binary TN (BTN): 8 nodes, 12 arcs (size 20)
• Size increase: ≤ 3
[Figure: the TN on nodes A to F with arc priorities, and the BTN with the additional nodes D’ and E’]
* Note that binarization is not necessary, but greatly simplifies the presentation

Stable solutions: example 2
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Certain values
  – each stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions
* each node with at least one ancestor with explicit belief

Network: D:v, E:w, F:u hold explicit beliefs; G, A, B, C hold none (arc priorities in the figure: 30, 20, 100, 90, 80, 70, 70, 60).
• G:v, A:v, B:v is part of a stable solution, so poss(G) = {v,...}
• G:w, A:w, B:w is part of a stable solution, so poss(G) = {v,w,...}
• G:u, A:u, B:u is not stable! F→G is dominated by E→G
Result: poss(G) = {v,w}, cert(G) = ∅

O(n^2) worst case for Resolution Algorithm
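The possible/certain semantics used throughout the deck (poss = values taken in some stable solution, cert = intersection over all stable solutions, per user) can be checked on the two stable solutions SS1 and SS2 given in the talk. The sketch below assumes the stable solutions are already known; it does not compute them, and the function name `poss_and_cert` is made up for this illustration.

```python
def poss_and_cert(stable_solutions):
    """Derive poss and cert per user from an explicit list of stable solutions.

    Each stable solution is a dict user -> value. poss is the union of the
    values a user takes; cert is the intersection (empty if solutions disagree).
    """
    users = stable_solutions[0].keys()
    poss = {u: {s[u] for s in stable_solutions} for u in users}
    cert = {u: set.intersection(*({s[u]} for s in stable_solutions)) for u in users}
    return poss, cert


# SS1 and SS2 as listed in the talk (example with E:u).
ss1 = {"A": "v", "B": "w", "C": "v", "D": "v", "E": "u"}
ss2 = {"A": "v", "B": "w", "C": "w", "D": "w", "E": "u"}
poss, cert = poss_and_cert([ss1, ss2])
```

For these inputs, C and D each have poss = {v, w} and an empty cert, while A, B and E have a single possible value that is also certain, which agrees with the table shown in the talk. Enumerating stable solutions explicitly is only practical for tiny networks; the resolution algorithm exists precisely to avoid this enumeration.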