Data Conflict Resolution Using Trust Mappings (presentation)

Data Conflict Resolution
Using Trust Mappings
Wolfgang Gatterbauer & Dan Suciu
University of Washington, Seattle
June 8, SIGMOD 2010
Project web page: http://db.cs.washington.edu/beliefDB
Conflicts & Trust mappings in Community DBs

Background 1: Conflicting beliefs (example: Indus script*)
“Beliefs”: annotated (key,value) pairs

glyph   origin      user
1       ship hull   Alice
1       cow         Bob
1       jar         Charlie
2       fish        Bob
2       knot        Charlie
3       arrow       Charlie

Background 2: Trust mappings with priorities
Alice trusts Bob (100)
Alice trusts Charlie (50)
Bob trusts Alice (80)

Resulting beliefs per user (glyph: origin)
Alice:    1: ship hull, 2: fish, 3: arrow
Bob:      1: cow, 2: fish, 3: arrow
Charlie:  1: jar, 2: knot, 3: arrow
Values a user entered are “explicit beliefs”; values adopted through a trust mapping are “implicit beliefs”

Recent work on community databases:
Orchestra [SIGMOD’06, VLDB’07]
Youtopia [VLDB’09], BeliefDB [VLDB’09]
* Current state of knowledge on the Indus Script: Rao et al., Science 324(5931):1165, May 2009
Problems due to transient effects
1. Incorrect inserts
– Value depends on order of inserts
Example (glyph origin):
t1: Charlie inserts “jar”
t2: Alice adopts “jar” from Charlie (priority 50)
t3: Bob inserts “cow” (Alice trusts Bob with priority 100)
Alice would have preferred Bob’s value over Charlie’s
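The order dependence can be illustrated with a minimal sketch. It assumes a deliberately naive propagation rule (a user without a value adopts the first value offered by anyone she trusts and never revises it); this rule, the `TRUST` table layout, and both function names are hypothetical illustrations, not the system described in the talk.

```python
# Hypothetical trust priorities taken from the running example:
# Alice trusts Bob (100) and Charlie (50).
TRUST = {"Alice": {"Bob": 100, "Charlie": 50}}


def naive_propagation(inserts):
    """Apply inserts in order; a follower keeps the first value it is offered."""
    beliefs = {}
    for user, value in inserts:
        beliefs[user] = value
        for follower, parents in TRUST.items():
            if user in parents and follower not in beliefs:
                beliefs[follower] = value  # adopted once, never revised
    return beliefs


def priority_resolution(inserts):
    """Ignore arrival order: each follower takes the value of its
    highest-priority parent that has a value (single hop only)."""
    explicit = dict(inserts)
    result = dict(explicit)
    for follower, parents in TRUST.items():
        if follower in explicit:
            continue
        trusted = [p for p in parents if p in explicit]
        if trusted:
            best = max(trusted, key=parents.get)
            result[follower] = explicit[best]
    return result


charlie_first = [("Charlie", "jar"), ("Bob", "cow")]
bob_first = [("Bob", "cow"), ("Charlie", "jar")]

naive_a = naive_propagation(charlie_first)["Alice"]
naive_b = naive_propagation(bob_first)["Alice"]
stable_a = priority_resolution(charlie_first)["Alice"]
stable_b = priority_resolution(bob_first)["Alice"]
print(naive_a, naive_b, stable_a, stable_b)
```

Under the naive rule Alice's value differs between the two insert orders, while the priority-based choice gives her Bob's value in both.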
Problems due to transient effects
1. Incorrect inserts
– Value depends on order of inserts
2. Incorrect updates
– Mis-handling of revokes
Example (glyph origin):
t1: Charlie inserts “jar”
t2: Alice adopts “jar” from Charlie (priority 50)
t3: Bob adopts “jar” from Alice (priority 80)
t4: Charlie replaces “jar” with “cow”
Alice and Bob trust each other most, but have lost “justification” for their beliefs

This paper:
Automatic conflict resolution with trust mappings:
1. How to define a globally consistent solution?
2. How to calculate it efficiently?
3. Some extensions
Agenda
1. Stable solutions
– how to define a unique and consistent solution?
2. Resolution algorithm
– how to calculate the solution efficiently?
3. Extensions
– how to deal with “negative beliefs”?
Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes A:v, B:w, C:?, D:? and arc priorities 10, 10, 20, 20]
– User A believes value v
– User C has no explicit belief
– User D is user C’s “preferred parent”
Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes A:v, B:w, C:v, D:v and arc priorities 10, 10, 20, 20; the lineage of the value v is highlighted]
SS1=(A:v, B:w, C:v, D:v)
Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Possible / Certain semantics
  – a stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions, per user
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes A:v, B:w, C:w, D:w and arc priorities 10, 10, 20, 20]
SS1=(A:v, B:w, C:v, D:v)
SS2=(A:v, B:w, C:w, D:w)

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅
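The possible/certain semantics above can be sketched in a few lines. This is only an illustration: the function name `poss_cert` and the dictionary layout are assumptions, and the stable solutions are taken as given (SS1 and SS2 from the slide) rather than computed.

```python
def poss_cert(stable_solutions):
    """Derive poss(X) and cert(X) per user from a list of stable solutions.

    Each stable solution is a dict {user: value}. poss is the union of the
    values a user takes across solutions, cert is their intersection.
    """
    users = stable_solutions[0].keys()
    poss = {x: {ss[x] for ss in stable_solutions} for x in users}
    cert = {
        x: set.intersection(*[{ss[x]} for ss in stable_solutions])
        for x in users
    }
    return poss, cert


# The two stable solutions listed on the slide.
SS1 = {"A": "v", "B": "w", "C": "v", "D": "v"}
SS2 = {"A": "v", "B": "w", "C": "w", "D": "w"}

poss, cert = poss_cert([SS1, SS2])
for user in sorted(poss):
    print(user, sorted(poss[user]), sorted(cert[user]))
```

Enumerating all stable solutions first is exactly what becomes expensive on larger networks; the sketch only shows how poss and cert follow once the solutions are known.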
Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Possible / Certain semantics
  – a stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions, per user
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes A:v, B:w, E:u, C:u, D:u and arc priorities 10, 10, 5, 20, 20]
Is this a stable solution? No!
Parent “B:w (10)” dominates and is inconsistent with “E:u (5)”
SS1=(A:v, B:w, C:v, D:v, E:u)
SS2=(A:v, B:w, C:w, D:w, E:u)
SS3=(A:v, B:w, C:u, D:u, E:u) is not stable

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅
E   {u}       {u}

Now how to calculate poss / cert?
Logic programs (LP) with stable model semantics
LP & Stable model semantics
• “Declarative imperative”*
• Natural correspondence
  Brave (credulous) reasoning ~ possible tuple semantics
  Cautious (skeptical) reasoning ~ certain tuple semantics
• Previous work on consistent query answering & peer data exchange
  Greco et al. [TKDE’03]
  Arenas et al. [TLP’03]
  Barcelo, Bertossi [PADL’03]
  Bertossi, Bravo [LPAR’07]
But solving LPs is hard
[Figure: Time [sec] (log scale, 0.01 to 10,000) vs. size of the network** (50 to 200) for DLV, a state-of-the-art LP solver]
How can we calculate poss / cert efficiently?
* keynote Joe Hellerstein
** size of the network = users + mappings; simple network of several “oscillators” (see paper)
Agenda
1. Stable solutions
– how to define a unique and consistent solution?
2. Resolution algorithm
– how to calculate the solution efficiently?
3. Extensions
– how to deal with “negative beliefs”?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
Focus on binary trust network (edges: preferred / non-preferred)
closed: A{v}, B{w}, C{u}
open: D, E, F, G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   ?         ?
E   ?         ?
F   ?         ?
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}
open: D, E, F, G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   ?         ?
E   ?         ?
F   ?         ?
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}, D{v}
open: E, F, G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   ?         ?
F   ?         ?
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}, D{v}, E{w}
open: F, G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   ?         ?
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}
open: G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, H{w}
open: G, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   ?         ?
H   {w}       {w}
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
  Step 2: else → construct SCC graph of open
“Minimal SCC”: no incoming edge from other SCC
For every cyclic or acyclic directed graph:
- The Strongly Connected Components graph is a DAG
- can be calculated in O(n), Tarjan [1972]
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, H{w}
open: G, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   ?         ?
H   {w}       {w}
J   ?         ?
K   ?         ?
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
  Step 2: else → construct SCC graph of open
                → resolve minimum SCCs
“Minimal SCC”: no incoming edge from other SCC
For every cyclic or acyclic directed graph:
- The Strongly Connected Components graph is a DAG
- can be calculated in O(n), Tarjan [1972]
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, H{w}
open: G{v,w}, J{v,w}, K{v,w}, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   {v,w}     ∅
H   {w}       {w}
J   {v,w}     ∅
K   {v,w}     ∅
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
  Step 2: else → construct SCC graph of open
                → resolve minimum SCCs
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, G{v,w}, H{w}, J{v,w}, K{v,w}
open: L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   {v,w}     ∅
H   {w}       {w}
J   {v,w}     ∅
K   {v,w}     ∅
L   ?         ?
Resolution Algorithm
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
  Step 2: else → construct SCC graph of open
                → resolve minimum SCCs
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, G{v,w}, H{w}, J{v,w}, K{v,w}, L{v,w,u}
open: (empty)
PTIME resolution algorithm
O(n²) worst case
O(n) on reasonable graphs

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   {v,w}     ∅
H   {w}       {w}
J   {v,w}     ∅
K   {v,w}     ∅
L   {v,w,u}   ∅
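The two steps of the walkthrough can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input layout (`parents` maps each node without an explicit belief to a list of `(parent, is_preferred)` pairs), the function names, and the small example network are all hypothetical. SCCs are found by a simple quadratic reachability check instead of Tarjan's linear-time algorithm, to keep the sketch short, and cert is taken to be poss when poss is a singleton and empty otherwise.

```python
def reachable(start, succ):
    """Nodes reachable from start (including start) via succ edges."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def resolve(explicit, parents):
    """explicit: {node: value}; parents: {node: [(parent, is_preferred), ...]}."""
    poss = {x: {v} for x, v in explicit.items()}  # closed nodes
    open_nodes = set(parents) - set(poss)
    while open_nodes:
        # Step 1: follow preferred edges from closed parents to open nodes.
        followed = [(x, p) for x in open_nodes
                    for p, preferred in parents[x] if preferred and p in poss]
        if followed:
            for x, p in followed:
                poss[x] = set(poss[p])
                open_nodes.discard(x)
            continue
        # Step 2: build the SCCs of the open subgraph ...
        succ = {x: [] for x in open_nodes}
        for x in open_nodes:
            for p, _ in parents[x]:
                if p in open_nodes:
                    succ[p].append(x)
        reach = {x: reachable(x, succ) for x in open_nodes}
        sccs = {frozenset(y for y in reach[x] if x in reach[y])
                for x in open_nodes}
        # ... and resolve the minimal ones (no incoming edge from another SCC).
        minimal = [s for s in sccs
                   if not any(p in open_nodes and p not in s
                              for x in s for p, _ in parents[x])]
        for s in minimal:
            values = set()
            for x in s:
                for p, _ in parents[x]:
                    if p in poss:  # closed parent of the component
                        values |= poss[p]
            for x in s:
                poss[x] = set(values)
        for s in minimal:
            open_nodes -= s
    cert = {x: (set(vals) if len(vals) == 1 else set())
            for x, vals in poss.items()}
    return poss, cert


# Hypothetical network: C and D prefer each other (an "oscillator"),
# A and B are their non-preferred parents, and E prefers A.
explicit = {"A": "v", "B": "w"}
parents = {
    "C": [("A", False), ("D", True)],
    "D": [("B", False), ("C", True)],
    "E": [("A", True), ("C", False)],
}
poss, cert = resolve(explicit, parents)
for node in sorted(poss):
    print(node, sorted(poss[node]), sorted(cert[node]))
```

In this example E is closed in Step 1 through its preferred edge from A, while C and D form one minimal SCC that is resolved in Step 2 with the union of the values of its closed parents.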
Agenda
1. Stable solutions
– how to define a unique and consistent solution?
2. Resolution algorithm
– how to calculate the solution efficiently?
3. Extensions
– how to deal with “negative beliefs”?
3 semantics for negative beliefs
[Figure: example trust networks with positive (e.g. v+) and negative (e.g. v−) beliefs, resolved under each of the three semantics]

              Agnostic    Eclectic    Skeptic (our recommendation)
w/o cycles*   O(n)        O(n)        O(n)
w cycles      NP-hard**   NP-hard**   O(n²), with a variation of resolution algorithm

* assuming total order on parents for each node
** checking if a belief is possible at a given node is NP-hard, checking if it is certain is co-NP-hard
Take-aways automatic conflict resolution
Problem
• Given explicit beliefs & trust mappings, how to assign consistent values to users?
Our solution
• Stable solutions with possible/certain value semantics
• PTIME algorithm [O(n²) worst case, O(n) in experiments]
• Several extensions
  – negative beliefs: 3 semantics, two hard, one O(n²)
  – bulk inserts
  – agreement checking
  – consensus value
  – lineage computation
  (bulk inserts, agreement checking, consensus value, lineage computation: in the paper & TR)
Please visit us at the poster session Th, 3:30pm
or at: http://db.cs.washington.edu/beliefDB
poster
1. Conflicts & Trust mappings in Community DBs

Background 1: Conflicting beliefs*
“Beliefs”: annotated (key,value) pairs

glyph   origin      user
1       ship hull   Alice
1       cow         Bob
1       jar         Charlie
2       fish        Bob
2       knot        Charlie
3       arrow       Charlie

Background 2: Trust mappings with priorities
Alice trusts Bob (100)
Alice trusts Charlie (50)
Bob trusts Alice (80)

Resulting beliefs per user (glyph: origin)
Alice:    1: ship hull, 2: fish, 3: arrow
Bob:      1: cow, 2: fish, 3: arrow
Charlie:  1: jar, 2: knot, 3: arrow
Values a user entered are “explicit beliefs”; values adopted through a trust mapping are “implicit beliefs”

Recent work on community databases:
Orchestra [SIGMOD’06, VLDB’07]
Youtopia [VLDB’09], BeliefDB [VLDB’09]
* Current state of knowledge on the Indus Script: Rao et al., Science 324(5931):1165, May 2009

How to unambiguously assign beliefs to all users?
2. Stable solutions
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Possible / Certain semantics
  – a stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions, per user
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes A:v, B:w, E:u, C:v, D:v and arc priorities 10, 10, 5, 20, 20; the lineage of the value v is highlighted]
SS1=(A:v, B:w, C:v, D:v, E:u)
SS2=(A:v, B:w, C:w, D:w, E:u)

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {v,w}     ∅
D   {v,w}     ∅
E   {u}       {u}
3. Logic programs with stable model semantics
Step 1: Binarization
[Figure: node E with parents A, B, C, D (priorities 30, 20, 10, 10; partial order) is replaced by a binary network with the additional nodes E’ and E’’]

Step 2: Logic program
Case 1: C has preferred parent A and non-preferred parent B
P(C,x) ← P(A,x)
F(C,B,y) ← P(B,y), P(C,x), x≠y
P(C,y) ← P(B,y), ¬F(C,B,y)
Case 2: C has two non-preferred parents A and B
F(C,A,y) ← P(A,y), P(C,x), x≠y
P(C,y) ← P(A,y), ¬F(C,A,y)
F(C,B,y) ← P(B,y), P(C,x), x≠y
P(C,y) ← P(B,y), ¬F(C,B,y)

1: accept all poss of preferred parent
2: accept poss from non-preferred parent that are not conflicting with an existing value
4. Resolution Algorithm (1/2)
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
closed: A{v}, B{w}, C{u}, D{v}
open: E, F, G, H, J, K, L

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   ?         ?
F   ?         ?
G   ?         ?
H   ?         ?
J   ?         ?
K   ?         ?
L   ?         ?
5. Resolution Algorithm (2/2)
• Keep 2 sets: closed / open
  Initialize closed with explicit beliefs
• MAIN
  Step 1: if ∃ preferred edges from open to closed → follow
  Step 2: else → construct SCC graph of open
                → resolve minimum SCCs
“Minimal SCC”: can be calculated in O(n), Tarjan [1972]
closed: A{v}, B{w}, C{u}, D{v}, E{w}, F{u}, H{w}
open: G, J, K, L
PTIME resolution algorithm
O(n²) worst case
O(n) on reasonable graphs

X   poss(X)   cert(X)
A   {v}       {v}
B   {w}       {w}
C   {u}       {u}
D   {v}       {v}
E   {w}       {w}
F   {u}       {u}
G   {v,w}     ∅
H   {w}       {w}
J   {v,w}     ∅
K   {v,w}     ∅
L   ?         ?
6. Detail: Strongly Connected Components (SCCs)
For every cyclic or acyclic directed graph:
- The Strongly Connected Components graph is a DAG
- can be calculated in O(n), Tarjan [1972]
“Minimal SCCs”: no incoming edge from other SCC
= root node(s) in SCC graph
[Figure: example directed graph with nodes A to H and its SCC graph with components SCC1 to SCC4]
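A minimal sketch of how minimal SCCs can be found with Tarjan's algorithm. The adjacency-list layout, the function names, and the example graph are assumptions made for illustration (the graph is not the one from the poster figure). The recursive form is used for brevity; very deep graphs would need an iterative variant because of Python's recursion limit.

```python
def tarjan_scc(graph):
    """graph: {node: [successors]}. Returns a list of SCCs (sets of nodes)."""
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def minimal_sccs(graph):
    """SCCs without an incoming edge from another SCC (roots of the SCC DAG)."""
    sccs = tarjan_scc(graph)
    comp = {v: i for i, scc in enumerate(sccs) for v in scc}
    has_incoming = set()
    for v, successors in graph.items():
        for w in successors:
            if comp[v] != comp[w]:
                has_incoming.add(comp[w])
    return [scc for i, scc in enumerate(sccs) if i not in has_incoming]


# Hypothetical graph: three 2-cycles, with edges B->C and F->D between them.
graph = {
    "A": ["B"], "B": ["A", "C"],
    "C": ["D"], "D": ["C"],
    "E": ["F"], "F": ["E", "D"],
}
roots = sorted(sorted(scc) for scc in minimal_sccs(graph))
print(roots)
```

Here {A, B} and {E, F} are minimal, while {C, D} is not because it has incoming edges from both other components.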
7. Experiments on large network data
Calculating poss / cert for fixed key
- DLV: State-of-the-art logic programming solver
- RA: Resolution algorithm

Network 1: “Oscillators”
[Plot: Time [sec] vs. size of the network for DLV and RA; trend line y = 1e-5 x]

Network 2: “Web link data”
Web data set with 5.4m links between 270k domain names. Approach:
• Sample links with increasing ratio
• Include both nodes in sample
• Assign explicit beliefs randomly
[Plot: Time [sec] vs. size of the network for DLV and RA; trend line y = 1e-5 x]

Network 3: “Worst case” O(n²)
[Plot: Time [sec] vs. size of the network for DLV and RA; trend line y = 1e-7 x²]

All plots: Time [sec] (0.01 to 10) over Size of the network [users + mappings] (10 to 1,000,000), both on log scales
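To get a feel for the two trend lines annotated in the plots (y = 1e-5 x and y = 1e-7 x²), the short sketch below evaluates them at the tick values of the size axis. The function names are hypothetical, and the numbers are values of the trend lines only, not measured running times.

```python
import math


def linear_trend(size):
    """Trend line y = 1e-5 x annotated in the plots (seconds)."""
    return 1e-5 * size


def quadratic_trend(size):
    """Trend line y = 1e-7 x^2 annotated in the plots (seconds)."""
    return 1e-7 * size ** 2


for size in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"size={size:>9,}  linear={linear_trend(size):10.3f}s  "
          f"quadratic={quadratic_trend(size):14.3f}s")

# Size at which each trend line reaches the 10-second mark.
linear_at_10s = 10 / 1e-5
quadratic_at_10s = math.sqrt(10 / 1e-7)
```

The linear trend reaches 10 seconds at a network size of about one million, while the quadratic trend reaches it at a size of about ten thousand.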
8. Three semantics for negative beliefs
[Figure: example trust networks with positive (e.g. v+) and negative (e.g. v−) beliefs, resolved under each of the three semantics]

              Agnostic    Eclectic    Skeptic (our recommendation)
w/o cycles*   O(n)        O(n)        O(n)
w cycles      NP-hard**   NP-hard**   O(n²), with a variation of resolution algorithm

* assuming total order on parents for each node
** checking if a belief is possible at a given node is NP-hard, checking if it is certain is co-NP-hard
9. Take-aways automatic conflict resolution
Problem
• Given explicit beliefs & trust mappings, how to assign consistent values to users?
Our solution
• Stable solutions with possible/certain value semantics
• PTIME algorithm [O(n²) worst case, O(n) in experiments]
• Several extensions
  – negative beliefs: 3 semantics, two hard, one O(n²)
  – bulk inserts
  – agreement checking
  – consensus value
  – lineage computation
  (bulk inserts, agreement checking, consensus value, lineage computation: not covered in the talk)
Slides soon available on our project page:
http://db.cs.washington.edu/beliefDB
backup
Binarization for Resolution Algorithm*
Example Trust Network (TN)
– 6 nodes, 9 arcs (size 15)
– 3 explicit beliefs: A:v, B:w, C:u
Corresponding Binary TN (BTN)
– 8 nodes, 12 arcs (size 20)
Size increase: ≤ 3
[Figure: TN with nodes A{v}, B{w}, C{u}, D, E, F and arc priorities 20, 70, 60, 30, 80, 120, 20, 100, 100; BTN with the additional nodes D’ and E’]
* Note that binarization is not necessary, but greatly simplifies the presentation
Stable solutions: example 2
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Certain values
  – each stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes G:?, A:?, B:?, C:?, D:v, E:w, F:u and arc priorities 30, 20, 100, 90, 80, 70, 70, 60]
Stable solutions: example 2
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Certain values
  – each stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes G:v, A:v, B:v, C, D:v, E:w, F:u and arc priorities 30, 20, 100, 90, 80, 70, 70, 60]
poss(G) = {v,...}
Stable solutions: example 2
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Certain values
  – each stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes G:w, A:w, B:w, C, D:v, E:w, F:u and arc priorities 30, 20, 100, 90, 80, 70, 70, 60]
poss(G) = {v,w,...}
Stable solutions: example 2
• Priority trust network (TN)
  – assume a fixed key
  – users (nodes): A, B, C
  – values (beliefs): v, w, u
  – trust mappings (arcs) from “parents”
• Stable solution
  – assignment of values to each node*, s.t. each belief has a “non-dominated lineage” to an explicit belief
• Certain values
  – each stable solution determines, for each node, a possible value (“poss”)
  – certain value (“cert”) = intersection of all stable solutions
* each node with at least one ancestor with explicit belief

[Figure: trust network with nodes G:u, A:u, B:u, C, D:v, E:w, F:u and arc priorities 30, 20, 100, 90, 80, 70, 70, 60]
not stable! F→G dominated by E→G
poss(G) = {v,w}
cert(G) = ∅
O(n²) worst case for Resolution Algorithm