Selecting Stars: The k Most Representative Skyline Operator

Xuemin Lin, Yidong Yuan, Qing Zhang, Ying Zhang
ICDE 2007

Introduction

Preliminary

Method
◦ Two-dimensional Space
  ▪ Dynamic Programming Based Algorithm
◦ Multi-dimensional Space
  ▪ Greedy Algorithm
  ▪ FM-based Algorithm

Experiment

Conclusion
Top-k representative skyline points (Top-k RSP)

Given a set P of points and an integer k, compute
a set S of k skyline points such that |D(S)| is
maximized, where D(S) is the set of points in P
dominated by at least one point in S.
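The definition can be checked on small inputs with a brute-force sketch. It assumes that smaller values are better in every dimension and that points are tuples; the function names are hypothetical and chosen only for illustration. It enumerates every k-subset of the skyline, so it is a reference for tiny examples rather than a practical algorithm.

```python
from itertools import combinations


def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def dominated_set(S, points):
    """D(S): points dominated by at least one member of S."""
    return {p for p in points if any(dominates(s, p) for s in S)}


def top_k_rsp_bruteforce(points, k):
    """Try every k-subset of the skyline and keep the one maximizing |D(S)|."""
    sky = skyline(points)
    k = min(k, len(sky))  # cannot pick more skyline points than exist
    return max(combinations(sky, k),
               key=lambda S: len(dominated_set(S, points)))
```

For example, calling `top_k_rsp_bruteforce(points, 2)` on a list of 2D tuples returns a pair of skyline points whose combined dominated set is largest.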

Top-1 RSPP6
dominate {p3,p5,p7}

Top-2 RSP P4,P6 dominate {p3,p5,p7}
P2,P6 dominate {p1,p3,p5,p7}

mindist=x+y
N7={N3,N4,N5}
N3={i,g,h}
N6={N1,N2}
N1={a,b,c}
N4={l,k}
Action       | Heap contents                                | Skyline points
Access root  | <N7,4> <N6,6>                                | none
Expand N7    | <N3,5> <N6,6> <N5,8> <N4,10>                 | none
Expand N3    | <i,5> <N6,6> <h,7> <N5,8> <N4,10> <g,11>     | i
Expand N6    | <h,7> <N5,8> <N1,9> <N4,10> <g,11>           | i
Expand N1    | <a,10> <N4,10> <g,11> <b,12> <c,12>          | i, a
Expand N4    | <k,10> <g,11> <b,12> <c,12> <l,14>           | i, a, k
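The heap trace above can be imitated with a small best-first traversal. This is a minimal sketch under stated assumptions: it uses a hand-built tree of nested `Node` objects instead of a real R-tree, smaller values are better, and mindist is the coordinate sum of a point or of a node's lower corner (mindist = x + y in 2D). The `Node` class and the `bbs` function are hypothetical names.

```python
import heapq
from itertools import count


def dominates(a, b):
    """a dominates b: no worse in every dimension and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


class Node:
    """Stand-in for an R-tree node: children are points (tuples) or Nodes."""

    def __init__(self, children):
        self.children = children
        corners = [c.low if isinstance(c, Node) else c for c in children]
        # Lower corner of the bounding box over everything below this node.
        self.low = tuple(min(v) for v in zip(*corners))


def lower_corner(entry):
    return entry.low if isinstance(entry, Node) else entry


def bbs(root):
    """Pop entries in ascending mindist; prune anything already dominated."""
    tie = count()  # tie-breaker so the heap never compares entries directly
    heap = [(sum(root.low), next(tie), root)]
    skyline = []
    while heap:
        _, _, entry = heapq.heappop(heap)
        if any(dominates(s, lower_corner(entry)) for s in skyline):
            continue  # dominated since it was pushed
        if isinstance(entry, Node):
            for child in entry.children:
                low = lower_corner(child)
                if not any(dominates(s, low) for s in skyline):
                    heapq.heappush(heap, (sum(low), next(tie), child))
        else:
            skyline.append(entry)  # a popped, undominated point is a skyline point
    return skyline
```

Because entries leave the heap in ascending mindist order, a point that is popped and not dominated by the current skyline cannot be dominated by anything processed later.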

Δ(si, sj) denotes the set of data points that
are dominated by si but not dominated by sj (j < i)
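One way to build a dynamic program on Δ(si, sj) in 2D is sketched below. The recurrence is an assumption, not copied from the slides: with the skyline sorted by x, any point dominated by si and by some earlier sl (l < j) is also dominated by sj, so extending a set whose last point is sj by si adds exactly |Δ(si, sj)| new points. The function name and the table layout are hypothetical; smaller values are taken to be better and points are assumed distinct.

```python
def dominates(a, b):
    """a dominates b: no worse in every dimension and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def top_k_rsp_2d(points, k):
    """Return (|D(S)|, S) for a best set S of k skyline points in 2D."""
    sky = sorted(p for p in points if not any(dominates(q, p) for q in points))
    m = len(sky)
    k = min(k, m)
    if k == 0:
        return 0, []
    dom = [{p for p in points if dominates(s, p)} for s in sky]

    # best[t][i]: max |D(S)| over sets of t + 1 skyline points ending at sky[i].
    best = [[len(dom[i]) for i in range(m)]]
    back = [[None] * m]
    for t in range(1, k):
        row, prev = [-1] * m, [None] * m
        for i in range(t, m):
            for j in range(t - 1, i):  # j < i
                gain = len(dom[i] - dom[j])  # |Delta(s_i, s_j)|
                if best[t - 1][j] + gain > row[i]:
                    row[i], prev[i] = best[t - 1][j] + gain, j
        best.append(row)
        back.append(prev)

    # Pick the best end point, then follow back-pointers to recover S.
    i = max(range(k - 1, m), key=lambda idx: best[k - 1][idx])
    total, chosen = best[k - 1][i], []
    for t in range(k - 1, -1, -1):
        chosen.append(sky[i])
        i = back[t][i]
    return total, chosen[::-1]
```

The sketch precomputes every dominated set explicitly, which keeps it short but is slower than an index-based implementation would be.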

top-2 RSP={S1,S2}


Greedy Algorithm
FM-based Algorithm
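The greedy algorithm is only named here, so the following is a sketch of the standard greedy heuristic for maximum coverage, which is assumed to be what the slide refers to: repeatedly pick the skyline point that dominates the most not-yet-covered points. The function name is hypothetical, smaller values are taken to be better, and exact dominated sets are used in place of any index.

```python
def dominates(a, b):
    """a dominates b: no worse in every dimension and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def greedy_rsp(points, k):
    """Pick k skyline points one at a time, each maximizing the newly covered points."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    dom = {s: {p for p in points if dominates(s, p)} for s in sky}
    chosen, covered = [], set()
    for _ in range(min(k, len(sky))):
        best = max((s for s in sky if s not in chosen),
                   key=lambda s: len(dom[s] - covered))
        chosen.append(best)
        covered |= dom[best]
    return chosen, len(covered)
```

The sketch works for any number of dimensions because it relies only on the dominance test.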


BBS computes skyline points
FM sketches estimate every |D({Sp})|
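A rough sketch of how these two steps could fit together: each skyline point gets an FM sketch of the points it dominates, sketches are merged with bitwise OR, and the greedy choice is driven by the estimated size of the union. Everything here is an illustrative assumption: a plain scan replaces BBS and the R-tree, SHA-256 stands in for the random hash function h(), and the function names, `num_hashes` and `length` are hypothetical.

```python
import hashlib

PHI = 0.7735  # correction constant used in the FM estimate


def dominates(a, b):
    """a dominates b: no worse in every dimension and not identical."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def fm_sketch(items, num_hashes=16, length=32):
    """One integer bitmap per hash function; each item sets one bit per bitmap."""
    sketch = [0] * num_hashes
    for item in items:
        for h in range(num_hashes):
            digest = hashlib.sha256(f"{h}:{item}".encode()).digest()
            # Force the top bit so the lowest 1 bit always lies inside the bitmap.
            value = int.from_bytes(digest[:8], "big") | (1 << (length - 1))
            sketch[h] |= value & -value  # keep only the least significant 1 bit
    return sketch


def fm_union(a, b):
    """The sketch of a union is the bitwise OR of the sketches."""
    return [x | y for x, y in zip(a, b)]


def fm_estimate(sketch):
    """Average the index of the lowest 0 bit over all bitmaps, then scale."""
    def lowest_zero(bitmap):
        r = 0
        while (bitmap >> r) & 1:
            r += 1
        return r
    mean_r = sum(lowest_zero(b) for b in sketch) / len(sketch)
    return 2 ** mean_r / PHI


def fm_greedy_rsp(points, k, num_hashes=16):
    """Greedy selection driven by estimated, not exact, coverage."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    sketches = {s: fm_sketch([p for p in points if dominates(s, p)], num_hashes)
                for s in sky}
    chosen, current = [], [0] * num_hashes
    for _ in range(min(k, len(sky))):
        best = max((s for s in sky if s not in chosen),
                   key=lambda s: fm_estimate(fm_union(current, sketches[s])))
        chosen.append(best)
        current = fm_union(current, sketches[best])
    return chosen, fm_estimate(current)
```

Since the estimates are probabilistic, the selected set can differ from the exact greedy answer, especially on small inputs where the sketches are coarse.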


The FM algorithm is a bitmap-based technique
that can efficiently estimate the number of
distinct elements (here, data points).
h() is a randomly generated hash function
that hashes each element ID to an integer,
which selects one bit of the bitmap.
Given a bitmap B of length L, indexed [0 … L-1] (bit 0 is written leftmost)
▪ L = 8, B = 00000000 (bitmap)
▪ h(p) = 3 = 00000011 (binary)
Keep only the least significant 1 bit: 00000001
p.fm = 10000000
▪ h(q) = 6 = 00000110 → 00000010
q.fm = 01000000
S = {p, q}
S.fm = 10000000 ∨ 01000000 = 11000000
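The bitmap example can be reproduced with a few lines of Python. This is a minimal sketch: bitmaps are kept as strings with bit 0 on the left to match the notation above, hash values are passed in directly instead of being produced by h(), and the function names are hypothetical.

```python
def lsb_position(value):
    """Index of the least significant 1 bit of a positive integer."""
    return (value & -value).bit_length() - 1


def fm_bitmap(hash_value, length=8):
    """Bitmap with a single 1 at the position of the hash value's lowest 1 bit."""
    if hash_value <= 0:
        raise ValueError("hash value must be a positive integer")
    pos = min(lsb_position(hash_value), length - 1)  # clamp to the bitmap
    bits = ["0"] * length
    bits[pos] = "1"
    return "".join(bits)


def fm_union(a, b):
    """Bitwise OR of two bitmaps of equal length."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


p_fm = fm_bitmap(3)          # h(p) = 3
q_fm = fm_bitmap(6)          # h(q) = 6
s_fm = fm_union(p_fm, q_fm)  # sketch of S = {p, q}
```

Merging by OR is what lets the sketch of a set be assembled from the sketches of its parts without revisiting the elements.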


100000001,3,5….


1, 11, 111
010000002,6,10….
 10,110,1010…

001000004,12,20….
 100,1100,10100


B = 11100000
min(B): index of the leftmost bit whose value is 0; here min(B) = 3
Estimate: 2^min(B) / 0.7735 = 8 / 0.7735 ≈ 10.34
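The estimate can be computed directly from a bitmap string. The sketch below assumes bit 0 is written leftmost, as in the example, and uses the constant 0.7735 from the slide. The averaged variant for several bitmaps is a common refinement of FM counting and is an assumption here, not something stated on the slide; the function names are hypothetical.

```python
PHI = 0.7735  # correction constant from the slide


def min_zero_index(bitmap):
    """Index of the leftmost 0 bit; the bitmap length if every bit is 1."""
    idx = bitmap.find("0")
    return len(bitmap) if idx == -1 else idx


def fm_estimate(bitmap):
    """Estimated number of distinct elements from a single bitmap."""
    return 2 ** min_zero_index(bitmap) / PHI


def fm_estimate_avg(bitmaps):
    """Assumed refinement: average min(B) over several bitmaps before scaling."""
    mean_index = sum(min_zero_index(b) for b in bitmaps) / len(bitmaps)
    return 2 ** mean_index / PHI


estimate = fm_estimate("11100000")
```

A single bitmap can only produce estimates of the form 2^r / 0.7735, which is why averaging over several hash functions is the usual way to smooth the result.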
[symbol missing]: number of hash functions
Action       | Heap                                                   | Skyline
Access root  | <e6,5>, <e7,7>                                         | none
Expand e6    | <e1,5>, <e7,7>, <e2,8>, <e3,13>                        | none
Expand e1    | <S2,6>, <e7,7>, <e2,8>, <S1,8>, <P1,9.5>, <e3,13>      | S2
Expand e7    | <e2,8>, <S1,8>, <e4,8>, …                              | S2

L = 4, [symbol missing] = 1
(S2.fm ∨ e2.fm)
Skyline points: S2, S1, S3, S4, S5



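Add a second heap H2
Skyline points: S2, S1, S3, S4, S5
To keep H2 from growing too big, entries e are deleted from H2 by comparing their maxdist with the mindist of entries in H1:
▪ e2 ∈ H2: maxdist = 11.5
▪ S3 ∈ H1: mindist = 8.5
▪ S4 ∈ H1: mindist = 12
11.5 < 12, so e2 is removed from H2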



An efficient exact algorithm based on dynamic
programming is presented for 2D space.
An efficient, scalable, index-based
randomized algorithm is developed for
multi-dimensional space by applying the FM
probabilistic counting technique.