Monochromatic reverse top-k

Reverse Top-k Queries
Akrivi Vlachou*, Christos Doulkeridis*, Yannis Kotidis#, Kjetil Nørvåg*
* Norwegian University of Science and Technology (NTNU), Trondheim, Norway
# Athens University of Economics and Business (AUEB), Greece

Outline
- Motivation & Preliminaries
- Monochromatic Reverse Top-k Queries
- Bichromatic Reverse Top-k Queries
- Threshold-based Algorithm
- Materialized Views
- Experimental Evaluation
- Conclusions & Future Work

Rank-aware Query Processing
- Huge amount of available data
- Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries)

Top-k Query
- Given a scoring function f(), retrieve the k objects that best match the user preferences
- Linear scoring function: f_w(p) = Σ_i w[i]·p[i]
- Weight w[i]: relative importance of attribute i
- Definition TOPk(w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum f_w(p) scores
- Query line of w at point p: defines the score of p
- Query space of w defined by point p: the number of enclosed points determines the rank of p

Reversing the Top-k Query
- From the perspective of manufacturers, it is important that a product is returned in the highest-ranked positions for as many user preferences as possible, in order to:
  - estimate the impact of a product compared to competitors' products
  - advertise a product to potential customers
- Which customers would be interested?
(Figure: a sales representative and several customers.)

Reversing the Top-k Query
- Reverse top-k query: Given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set?
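The TOPk(w) definition and the reverse question can be made concrete with a small brute-force sketch. It assumes points and weighting vectors are plain tuples and that lower scores are better, as in the definition. The names `score`, `top_k` and `q_in_top_k` are hypothetical helpers, not the authors' code. The membership test uses an equivalent formulation: q is in the top-k set for w exactly when fewer than k points of S score strictly lower than q.

```python
from typing import List, Sequence


def score(w: Sequence[float], p: Sequence[float]) -> float:
    """Linear scoring function f_w(p) = sum_i w[i] * p[i]; lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(w: Sequence[float], S: List[Sequence[float]], k: int) -> List[Sequence[float]]:
    """TOPk(w): the k points of S with the minimum scores (ties keep input order)."""
    return sorted(S, key=lambda p: score(w, p))[:k]


def q_in_top_k(q: Sequence[float], w: Sequence[float], S: List[Sequence[float]], k: int) -> bool:
    """Reverse top-k membership test for a single weighting vector w.

    Assumes q is not in S and |S| >= k. q qualifies iff fewer than k points
    of S score strictly lower than q, which is equivalent to "some p in
    TOPk(w) has f_w(q) <= f_w(p)".
    """
    fq = score(w, q)
    return sum(1 for p in S if score(w, p) < fq) < k


if __name__ == "__main__":
    S = [(1.0, 4.0), (3.0, 1.0), (2.0, 2.0)]  # hypothetical 2-d points
    w = (0.5, 0.5)
    print(top_k(w, S, 2))                 # [(3.0, 1.0), (2.0, 2.0)]
    print(q_in_top_k((1.5, 1.5), w, S, 1))  # True
```

Sorting all of S per query is the simplest correct choice for illustration. A real system would use an index or a top-k algorithm instead.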
- Two different versions:
  - Monochromatic: the sales representative has no knowledge of user preferences
  - Bichromatic: a dataset with user preferences is given

Car Database Example
- A database contains information about different cars
- Different users have different preferences
- Bob prefers a cheap car and does not care much about age: the best choice (top-1) for Bob is car p1, with score 2.5
- Tom prefers a newer car rather than a cheap one: the best choice for Tom and Max is car p2

Car Database Example (continued)
- Query point q = p2, k = 1:
  - Bichromatic reverse top-k: {(0.2, 0.8), (0.5, 0.5)}, so the product should be advertised to Tom and Max
  - Monochromatic reverse top-k: the line segment w[price] ∈ [1/7, 5/6], which estimates the impact of p2 as 69% (5/6 - 1/7 = 29/42 ≈ 0.69)
- Query point q = p3, k = 1: the bichromatic query returns an empty result set

Monochromatic Reverse Top-k Query
- mRTOPk(q): Given a point q, a positive integer k and a dataset S, the result set of the monochromatic reverse top-k query is the locus of weighting vectors w_i for which there exists p in TOPk(w_i) such that f_wi(q) ≤ f_wi(p).
- The solution space W can be split into a finite set of non-adjacent partitions such that query point q has the same rank for all the weighting vectors within a partition.
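In 2-d, the monochromatic result can be computed exactly by scanning the weight line. The sketch below assumes weighting vectors are normalized as w = (a, 1 - a) with a = w[0] in [0, 1], consistent with the car example's vectors (0.2, 0.8) and (0.5, 0.5). For each p, f_w(p) - f_w(q) is linear in a, so q's rank can change only where that difference is zero. The sketch collects those breakpoints, tests one point inside each piece and merges the qualifying pieces into closed intervals. `mrtopk_2d` is a hypothetical name, and this sweep is an illustration rather than the paper's algorithm. Measure-zero isolated points where several ties coincide are ignored.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def _better_count(q: Sequence, S: List[Sequence], a: Fraction) -> int:
    """Number of points in S scoring strictly lower than q under w = (a, 1 - a)."""
    fq = a * q[0] + (1 - a) * q[1]
    return sum(1 for p in S if a * p[0] + (1 - a) * p[1] < fq)


def mrtopk_2d(q: Sequence, S: List[Sequence], k: int) -> List[Tuple[Fraction, Fraction]]:
    """Closed intervals of a = w[0] in [0, 1] for which q is in TOPk((a, 1 - a))."""
    # Exact rational arithmetic keeps the breakpoints free of rounding error.
    q = tuple(Fraction(x) for x in q)
    S = [tuple(Fraction(x) for x in p) for p in S]
    # f_w(p) - f_w(q) = (p1 - q1) + a * ((p0 - q0) - (p1 - q1)) is linear in a.
    cuts = {Fraction(0), Fraction(1)}
    for p in S:
        slope = (p[0] - q[0]) - (p[1] - q[1])
        if slope != 0:
            a = (q[1] - p[1]) / slope  # where p and q tie
            if 0 < a < 1:
                cuts.add(a)
    cuts = sorted(cuts)
    intervals: List[Tuple[Fraction, Fraction]] = []
    for lo, hi in zip(cuts, cuts[1:]):
        # q's rank is constant on (lo, hi), so testing the midpoint suffices.
        if _better_count(q, S, (lo + hi) / 2) < k:
            if intervals and intervals[-1][1] == lo:
                intervals[-1] = (intervals[-1][0], hi)  # merge adjacent pieces
            else:
                intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    result = mrtopk_2d((2, 2), [(1, 4), (4, 1)], 1)
    print(result)  # [(Fraction(1, 3), Fraction(2, 3))]
    # Fraction of the weight line covered, comparable to the slides' "impact".
    print(sum(hi - lo for lo, hi in result))  # 1/3
```

Summing the interval lengths gives the same kind of figure as the 69% impact quoted for p2 in the car example. The car data itself is not reproduced here, so the points above are hypothetical.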
- For the monochromatic case, we focus on the 2-d space.
(Figure: mRTOP1(q) in the 2-d solution space.)

Geometric Interpretation (d = 2, k = 1)
- If q belongs to the convex hull, then there exists exactly one partition in mRTOP1(q)
- Weighting vectors that are perpendicular to pq and qr define the line segment
- For weighting vectors with smaller or larger slopes than w1, the relative order of p and q changes

Monochromatic Reverse Top-k (k > 1)
- The solution space may contain more than one partition

Bichromatic Reverse Top-k Query
- bRTOPk(q): Given a point q, a positive integer k and two datasets S and W, where S contains data points and W contains weighting vectors, a weighting vector w_i belongs to the result set if and only if there exists p in TOPk(w_i) such that f_wi(q) ≤ f_wi(p)
- Naïve approach:
  - for each weighting vector, process the top-k query
  - test whether query point q is in the top-k list

Threshold-based Algorithm (RTA)
- Goal: reduce the number of top-k evaluations by discarding weighting vectors
- Threshold-based Algorithm (RTA):
  - sort the weighting vectors based on pairwise similarity (top-k queries defined by similar vectors have similar result sets)
  - evaluate the first top-k query and calculate a threshold
  - for each weighting vector:
    - possibly prune it based on the threshold
    - refine the threshold

Example of RTA Algorithm (k = 2)
- Buffer: p1, p2
- Evaluate the top-2 query for w1
- Set the threshold based on w2: f_w2(q) > threshold, so discard w2
- Refine the threshold for w3
(Figure: data points p1 to p10 and query point q, with weighting vectors W = [w1, w2, w3].)

Materialized Views
- The Threshold-based Algorithm (RTA) reduces the number of top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set
- However, RTA still processes at least as many top-k evaluations as the cardinality of the result set
- Materialized views: find weighting vectors that definitely belong to the result without any top-k evaluation

Materialized Views (continued)
- Grid-based space partitioning: each cell Ci has a lower-left corner CiL and an upper-right corner CiU
- For each cell Ci, we store the results of the reverse top-k queries for corners CiL and CiU
(Figure: grid cells and weighting vectors w1, w2, w3.)
- Given a point q enclosed in cell Ci:
  - all weighting vectors in RTOPk(CiU) belong to the result set of q
  - only the weighting vectors in RTOPk(CiL) - RTOPk(CiU) have to be examined
  - this works because CiL ≤ q ≤ CiU in every dimension, so for non-negative weights f_w(CiL) ≤ f_w(q) ≤ f_w(CiU)
- Materialized views can be generalized for arbitrary values k < K
(Figure: weighting vectors w1, w2, w3, w4.)

Experimental Setup
- Comparison between Naïve and RTA (varying dimensionality, cardinality and data distribution; real data)
- Queries: uniform and k-skyband points
- Metrics: time, I/Os, number of top-k evaluations

RTA vs. Naïve
- Uniform distribution of S and uniform weights W; |S| = 10K, |W| = 10K, k = 10, skyband query points
- RTA outperforms Naïve by 1 to 2 orders of magnitude
- As dimensionality increases, |RTOPk(q)| decreases, leading to fewer top-k evaluations

Scalability of RTA Algorithm
- Various distributions (UN, AC, CO) of S and uniform weights W; |S| = 10K or |W| = 10K, d = 5, k = 10, skyband query points
- Naïve requires |W| top-k query evaluations
- |W| = 5K, correlated dataset:
  - RTA needs only 544 out of 5000 top-k evaluations (saving 89.12% of the cost)
  - the average size of the result set is 459

Performance of RTA on Real Data
- NBA: 17265 tuples, d = 5 (points scored, rebounds, assists, steals and blocks)
- HOUSE: 127930 tuples, d = 6 (income spent on gas, electricity, water, heating, insurance and property tax)
- Uniform and clustered weights W (|W| = 10K)
- Clustered weights lead to fewer top-k evaluations

Conclusions and Future Work
- We introduced reverse top-k queries:
  - geometric interpretation of the solution space
  - efficient algorithm for the bichromatic reverse top-k query
  - materialized reverse top-k views
- Future work:
  - interpretation of the solution space in higher dimensions (monochromatic reverse top-k)
  - improve the performance of the bichromatic reverse top-k computation

Thank you!

Related work:
- Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries"
- Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"

More information: http://www.idi.ntnu.no/~vlachou/
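The bichromatic naïve approach and the RTA pruning rule from the slides can be compared in a small brute-force sketch. The pruning rule works as follows. If the buffer holds k points from an earlier top-k result and f_w(q) exceeds the largest buffered score under the next vector w, then at least k points already beat q. In that case w cannot be in bRTOPk(q) and its top-k query is skipped. Several details are assumptions, not necessarily the paper's choices:

- the ordering is a plain lexicographic `sorted(W)`, used as a stand-in for the similarity-based ordering;
- the buffer is simply replaced by each newly evaluated top-k result;
- top-k queries are answered by sorting S;
- q is not in S, and |S| ≥ k.

All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def score(w: Sequence[float], p: Sequence[float]) -> float:
    """Linear scoring function f_w(p); lower is better."""
    return sum(wi * pi for wi, pi in zip(w, p))


def top_k(w: Vec, S: List[Vec], k: int) -> List[Vec]:
    """Brute-force TOPk(w) over S."""
    return sorted(S, key=lambda p: score(w, p))[:k]


def qualifies(q: Vec, w: Vec, topk: List[Vec]) -> bool:
    """bRTOPk condition: some p in TOPk(w) has f_w(q) <= f_w(p)."""
    fq = score(w, q)
    return any(fq <= score(w, p) for p in topk)


def naive_brtopk(q: Vec, S: List[Vec], W: List[Vec], k: int):
    """One top-k evaluation per weighting vector."""
    result = [w for w in W if qualifies(q, w, top_k(w, S, k))]
    return result, len(W)


def rta_brtopk(q: Vec, S: List[Vec], W: List[Vec], k: int):
    """Threshold-based pruning; returns (result, number of top-k evaluations)."""
    result: List[Vec] = []
    evaluations = 0
    buffer: List[Vec] = []  # most recently evaluated top-k result
    for w in sorted(W):  # stand-in for a similarity-based ordering (assumption)
        if buffer:
            # The k buffered points bound the k-th best score under w from above.
            threshold = max(score(w, p) for p in buffer)
            if score(w, q) > threshold:
                continue  # k points already beat q, so w cannot qualify
        buffer = top_k(w, S, k)
        evaluations += 1
        if qualifies(q, w, buffer):
            result.append(w)
    return result, evaluations


if __name__ == "__main__":
    rng = random.Random(0)
    d, k = 3, 10
    S = [tuple(rng.random() for _ in range(d)) for _ in range(1000)]
    W = []
    for _ in range(500):
        raw = [rng.random() for _ in range(d)]
        total = sum(raw)
        W.append(tuple(x / total for x in raw))  # normalized weighting vector
    q = (0.2, 0.2, 0.2)
    naive, n_naive = naive_brtopk(q, S, W, k)
    rta, n_rta = rta_brtopk(q, S, W, k)
    print(len(naive), n_naive, len(rta), n_rta)
```

Both functions return the same set of weighting vectors, and RTA never performs more top-k evaluations than Naïve. How many it saves depends on how well the ordering places similar vectors next to each other, which is the point of the slides' similarity-based sorting.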