Xuemin Lin, Yidong Yuan, Qing Zhang, Ying Zhang
ICDE 2007

Introduction
Preliminary
Method
◦ Two-dimensional Space
  Dynamic Programming Based Algorithm
◦ Multi-dimensional Space
  Greedy Algorithm
  FM-based Algorithm
Experiment
Conclusion

Top-k representative skyline points (Top-k RSP)
Given a set P of points and an integer k, compute a set S of k skyline points such that |D(S)|, the number of points dominated by at least one point in S, is maximized.

Top-1 RSP: p6 dominates {p3, p5, p7}
Top-2 RSP: {p4, p6} dominates {p3, p5, p7}; {p2, p6} dominates {p1, p3, p5, p7}

mindist = x + y
N7 = {N3, N4, N5}, N3 = {i, g, h}, N6 = {N1, N2}, N1 = {a, b, c}, N4 = {l, k}

Action        Heap contents                               Skyline points
Access root   <N7,4><N6,6>
Expand N7     <N3,5><N6,6><N5,8><N4,10>
Expand N3     <i,5><N6,6><h,7><N5,8><N4,10><g,11>         i
Expand N6     <h,7><N5,8><N1,9><N4,10><g,11>              i
Expand N1     <a,10><N4,10><g,11><b,12><c,12>             i, a
Expand N4     <k,10><g,11><b,12><c,12><l,14>              i, a, k

Δ(si, sj) denotes the set of data points that are dominated by si but not dominated by sj.
[Recurrence of the dynamic program lost in extraction; only the index condition j < i survives.]

Top-2 RSP = {S1, S2}

Greedy Algorithm
FM-based Algorithm

BBS computes the skyline points.
FM sketches estimate every |D({Sp})|.

The FM (Flajolet-Martin) algorithm is a bitmap-based method that efficiently estimates the number of distinct elements (here, data points).
h() is a randomly generated hash function that maps each element ID to an integer, which selects one bit position in the bitmap.

Given a bitmap B of length L with positions [0 ... L-1]:
L = 8, B = 00000000 (bitmap)
h(p) = 3 = 00000011 (binary). Keep only the least significant 1 bit: 00000001. p.fm = 10000000
h(q) = 6 = 00000110 (binary). Keep only the least significant 1 bit: 00000010. q.fm = 01000000
S = {p, q}: S.fm = 10000000 ∨ 01000000 = 11000000

Bitmap      Hash values      Binary
10000000    1, 3, 5, ...     1, 11, 101
01000000    2, 6, 10, ...    10, 110, 1010, ...
00100000    4, 12, 20, ...   100, 1100, 10100
Union: 11100000
min(B): the position of the leftmost bit whose value is 0 (here 3).
Estimate: 2^min(B) / 0.7735 = 8 / 0.7735 = 10.34

[Symbol lost in extraction]: number of hash functions

Action        Heap                                                      Skyline
Access root   <e6,5>, <e7,7>                                            none
Expand e6     <e1,5>, <e7,7>, <e2,8>, <e3,13>                           none
Expand e1     <S2,6>, <e7,7>, <e2,8>, <S1,8>, <P1,9.5>, <e3,13>         S2
Expand e7     <e2,8>, <S1,8>, <e4,8>, ...                               S2

L = 4, [symbol lost in extraction] = 1
(S2.fm ∨ e2.fm)
Skyline points: S2, S1, S3, S4, S5

Add a second heap H2.
Skyline points: S2, S1, S3, S4, S5
To keep H2 from growing too large, entries e are deleted from it:
e2 ∈ H2, maxdist = 11.5
S3 ∈ H1, mindist = 8.5
S4 ∈ H1, mindist = 12
11.5 < 12, so e2 is removed from H2.

Conclusion
An efficient, exact, dynamic-programming-based algorithm is presented for 2-d space.
An efficient, scalable, index-based randomized algorithm is developed for multi-dimensional space by applying the FM probabilistic counting technique.
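
The FM bitmap example from the slides (h(p) = 3, h(q) = 6, L = 8, estimate 2^min(B) / 0.7735) can be reproduced with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names (fm_bit, fm_sketch, fm_union, fm_min, fm_estimate, show) are hypothetical, hash values are passed in directly instead of coming from a random hash function, and a single bitmap is used rather than several hash functions.

```python
PHI = 0.7735  # correction constant quoted on the slides


def fm_bit(hash_value, length):
    """Position of the least significant 1 bit of hash_value, capped at length - 1."""
    if hash_value <= 0:
        return length - 1  # assumption: a zero hash falls into the last position
    index = (hash_value & -hash_value).bit_length() - 1
    return min(index, length - 1)


def fm_sketch(hash_values, length=8):
    """Build a bitmap (position 0 first) by setting one bit per hash value."""
    bitmap = [0] * length
    for value in hash_values:
        bitmap[fm_bit(value, length)] = 1
    return bitmap


def fm_union(a, b):
    """Sketch of a set union is the bitwise OR of the two sketches."""
    return [x | y for x, y in zip(a, b)]


def fm_min(bitmap):
    """Position of the leftmost 0 bit (len(bitmap) if every bit is set)."""
    for position, bit in enumerate(bitmap):
        if bit == 0:
            return position
    return len(bitmap)


def fm_estimate(bitmap):
    """Estimated number of distinct elements: 2^min(B) / PHI."""
    return 2 ** fm_min(bitmap) / PHI


def show(bitmap):
    """Render the bitmap in the slides' left-to-right notation."""
    return "".join(str(bit) for bit in bitmap)


p_fm = fm_sketch([3])          # h(p) = 3
q_fm = fm_sketch([6])          # h(q) = 6
s_fm = fm_union(p_fm, q_fm)    # S = {p, q}
print(show(p_fm), show(q_fm), show(s_fm))  # 10000000 01000000 11000000

full = fm_sketch([1, 3, 5, 2, 6, 10, 4, 12, 20])
print(show(full), round(fm_estimate(full), 2))  # 11100000 10.34
```

Because the sketch of a union is just a bitwise OR, the number of points dominated by a set of skyline points can be estimated from the per-point sketches without revisiting the data, which is the property the FM-based algorithm relies on.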