Finding Top-k Profitable Products
Qian Wan, Raymond Chi-Wing Wong, Yu Peng
The Hong Kong University of Science & Technology
Prepared by Yu Peng

Product Manager's Dilemma
• ipad 3, ipad 2, ipad
• ipad: $499; Suit: $600

              Weight (g)   Processor   Camera   Price ($)
  ipad        730/608      Apple A4    None     499 -> 399
  ipad 2      613/601      Apple A5    2        499
  ipad 3      ?            ?           ?        ?

              Weight (g)   Processor   Camera   Price ($)   Cost
  ipad 3 v1   500          Apple A5    2        ?           500
  ipad 3 v2   500          Apple A6    2        ?           600
  ipad 3 v3   100          Apple A6    4        ?           2000

Which to produce?

Outline
• Problem Definition
• Related Work
• Proposed Algorithms
• Experiments
• Conclusion

Problem Definition
• Background
  – Skyline SKY(X): SKY(X) contains every element of X for which no other element of X is better.

  Products   Distance-to-beach   Price
  p1         7.0                 200
  p2         4.0                 350
  p3         1.0                 500
  p4         3.0                 600

  X = {p1, p2, p3, p4}
  SKY(X) = {p1, p2, p3}

Problem Definition
• Background

  Products   Distance-to-beach (A_1)   Price (A_2)   Cost (C)
  p1         7.0                       200
  p2         4.0                       350
  p3         1.0                       500
  p4         3.0                       600
  q1         5.0                       ?             100
  q2         4.5                       ?             200
  q3         1.5                       ?             400

  X = {p1, p2, p3, p4, q1, q2, q3}
  SKY(X) = {p1, p2, p3, q1, q2, q3}
  What prices should we set for q1, q2 and q3?

Problem Definition
• Scenario
  • Given
    • a set P of 4 products in the current market
    • a set Q of 3 new products we want to produce
  • Objective
    • select a set Q' of k = 2 products from set Q
    • determine the prices of the 2 products to gain as much profit as possible.

Problem Definition
• Notation
  – Attributes of products: {A_i}, i ∈ [1, l]
  • In the table above, l = 2
  • A_1 is "Distance-to-beach", A_2 is "Price".
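The skyline definition from the Background slides can be sketched in a few lines of Python. This is only an illustration: the names `dominates` and `skyline` are hypothetical, and it assumes that a smaller value is better for both Distance-to-beach and Price, which the slides imply but do not state.

```python
from typing import Dict, Set, Tuple

Product = Tuple[float, float]  # (distance_to_beach, price)


def dominates(a: Product, b: Product) -> bool:
    """a dominates b: no worse on every attribute, strictly better on at least one.

    Assumption: smaller values are better for every attribute.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(products: Dict[str, Product]) -> Set[str]:
    """Names of the products that no other product dominates."""
    return {
        name
        for name, attrs in products.items()
        if not any(
            dominates(other, attrs)
            for other_name, other in products.items()
            if other_name != name
        )
    }


# Existing products from the Background slide.
P = {"p1": (7.0, 200), "p2": (4.0, 350), "p3": (1.0, 500), "p4": (3.0, 600)}
print(sorted(skyline(P)))  # ['p1', 'p2', 'p3']
```

Here p4 drops out because p3 is both closer to the beach and cheaper.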
Problem Definition
• Notation
  – Price Assignment Vector v = (v_1, v_2, v_3)

  Products   Distance-to-beach (A_1)   Price (A_2)   Cost (C)
  p1         7.0                       200
  p2         4.0                       350
  p3         1.0                       500
  p4         3.0                       600
  q1         5.0                       v_1           100
  q2         4.5                       v_2           200
  q3         1.5                       v_3           400

  • |Q| = 3, k = 2;
  • v = (200, 400, 0) is a price assignment vector.
  • v = (200, 300, 0) is a feasible price assignment vector.

Problem Definition
• Notation
  – Profit Δ(q_j, v_j) of q_j: Δ(q_j, v_j) = v_j − q_j.C;
  – Profit Profit(Q', v) of Q': Profit(Q', v) = Σ_{q_j ∈ Q'} Δ(q_j, v_j)
  • |Q| = 3, k = 2, Q' = {q1, q2}, v = (200, 300, 0);
  • Δ(q1, v_1) = v_1 − q1.C = 200 − 100 = 100;
  • Δ(q2, v_2) = v_2 − q2.C = 300 − 200 = 100;
  • Profit(Q', v) = Δ(q1, v_1) + Δ(q2, v_2) = 200.

Problem Definition
• Notation
  – The Optimal Profit Profit_o(Q') of Q':
    Profit_o(Q') = Profit(Q', v_o) = max_{v'} Profit(Q', v'), where v' ranges over the feasible price assignment vectors
  • |Q| = 3, k = 2, Q' = {q1, q2};
  • Δ(q2, v_2) ≤ 300 − 200 = 100; Δ(q1, v_1) ≤ 250 − 100 = 150;
  • ∴ v_o = (250, 300, 0), Profit_o(Q') = 250.

Problem Definition
• Finding Top-k Profitable Products (TPP)
  Given a set P of existing products and a set Q of possible new products, the goal is to find a subset Q' of Q such that
  • |Q'| = k;
  • ∀ q_j ∈ Q', q_j ∈ SKY(P ∪ Q');
  • Profit_o(Q') = max_{Q'' ⊂ Q, |Q''| = k} Profit_o(Q'').
  Example with |Q| = 3, k = 2 (same products as above):
  • When Q' = {q1, q2},
    • v_o = (250, 300, 0)
    • Profit_o(Q') = 250.
  • When Q' = {q2, q3},
    • v_o = (0, 300, 450)
    • Profit_o(Q') = 150.
  • When Q' = {q1, q3},
    • v_o = (250, 0, 450)
    • Profit_o(Q') = 200.
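The notions of a feasible price assignment vector and of Profit(Q', v) can be checked against the slide numbers with a small Python sketch. The helper names `is_feasible` and `profit` are hypothetical, and the sketch assumes that smaller is better for both attributes and that "feasible" means every selected product lies in SKY(P ∪ Q') at its assigned price.

```python
from typing import Dict, Iterable, Tuple

# Running example from the slides.
P: Dict[str, Tuple[float, float]] = {  # name -> (distance_to_beach, price)
    "p1": (7.0, 200), "p2": (4.0, 350), "p3": (1.0, 500), "p4": (3.0, 600),
}
Q: Dict[str, Tuple[float, float]] = {  # name -> (distance_to_beach, cost)
    "q1": (5.0, 100), "q2": (4.5, 200), "q3": (1.5, 400),
}


def dominates(a, b) -> bool:
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def is_feasible(selected: Iterable[str], v: Dict[str, float]) -> bool:
    """True if every selected q_j, priced at v[q_j], is in SKY(P ∪ Q')."""
    selected = list(selected)
    market = dict(P)
    market.update({q: (Q[q][0], v[q]) for q in selected})
    return all(
        not any(dominates(attrs, market[q]) for name, attrs in market.items() if name != q)
        for q in selected
    )


def profit(selected: Iterable[str], v: Dict[str, float]) -> float:
    """Profit(Q', v): sum of v_j - q_j.C over the selected products."""
    return sum(v[q] - Q[q][1] for q in selected)


print(is_feasible(["q1", "q2"], {"q1": 200, "q2": 400}))  # False: p2 dominates q2 at 400
print(is_feasible(["q1", "q2"], {"q1": 200, "q2": 300}))  # True
print(profit(["q1", "q2"], {"q1": 250, "q2": 300}))       # 250
```

The sketch only evaluates a given vector v; it does not search for the optimal one.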
Related Work
• Skyline Concept
  – Admissible points [1]
  – Maximal vectors [2]
  – Skyline in database [3]
• Variations of Skyline
  – Computation of Skyline
    • Bitmap [4]
    • Nearest Neighbor (NN) [5]
    • Branch and Bound Skyline (BBS) [6]
  – Top-K queries
    • Ranked Skyline [6]
    • Representative skyline queries [7][8]
    • Reverse Skyline queries [9]
  – Create "Skyline" queries [10]

Proposed Algorithms
• Analyses
  – Price Correlation

  Products   Distance-to-beach (A_1)   Price (A_2)   Cost (C)
  p1         7.0                       200
  p2         4.0                       350
  p3         1.0                       500
  p4         3.0                       600
  q1         5.0                       ? (300)       100
  q2         4.5                       ?             200
  q3         1.5                       ?             400

  Example: |Q| = 3, k = 2, Q' = {q1, q2};
  Δ(q1, v_1) ≤ 300 − 100 = 200; ∴ v_1 = 300
  Δ(q2, v_2) ≤ 300 − 200 = 100; ∴ v_2 = 300
  However, q2 is better than q1!
  In order to avoid price correlation, we sort all the products in Q.

Proposed Algorithms
• Flow
  [Figure: flow diagram over Q = {q_1, q_2, q_3, ..., q_n} and Q' = {q'_1, q'_2, q'_3, ..., q'_k} with the stages "Select k products into Q'", "Find Optimal Price of q'_i", "Compare" and "Top-k profitable products".]

Proposed Algorithms
• Find optimal price assignment of a given Q'
  – Quasi-dominate
    q quasi-dominates q' if and only if one of the following holds:
    1. q dominates q' with respect to the first l − 1 attributes;
    2. q has the same first l − 1 attribute values as q'.
  Example (products as in the table above):
    q2 quasi-dominates q1
    q3 quasi-dominates q1, q2 and q3 (condition 2 covers q3 itself)

Proposed Algorithms
• Find optimal price assignment vector of Q'
  – Quasi-dominate
  – Order Function f
    ∀ q_j ∈ Q, f(q_j) = q_j.A_1 + ... + q_j.A_{l−1}

  Products   f(q_j)
  q1         5.0
  q2         4.5
  q3         1.5

Proposed Algorithms
• Find optimal price assignment of a given Q'
  – Quasi-dominate
  – Order Function
  – Lemma
    Suppose q and q' are in Q. If q quasi-dominates q', then f(q) is smaller than or equal to f(q').
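Quasi-dominance and the order function f can be tried out on the running example with the following Python sketch. The function names are hypothetical; the sketch assumes each product is a tuple of all l attributes with price stored last (so only the first l − 1 entries are compared) and that smaller values are better.

```python
from itertools import product as all_pairs


def dominates(a, b) -> bool:
    """Assumption: smaller is better on every attribute."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def quasi_dominates(a, b) -> bool:
    """Compare only the first l - 1 attributes; the last entry (price) is ignored."""
    a_rest, b_rest = tuple(a[:-1]), tuple(b[:-1])
    return a_rest == b_rest or dominates(a_rest, b_rest)


def order_f(q) -> float:
    """f(q) = q.A_1 + ... + q.A_{l-1}."""
    return sum(q[:-1])


# New products of the running example; prices are still unknown (None).
Q = {"q1": (5.0, None), "q2": (4.5, None), "q3": (1.5, None)}

print(quasi_dominates(Q["q2"], Q["q1"]))     # True
print(order_f(Q["q2"]) <= order_f(Q["q1"]))  # True

# Lemma check on the example: quasi-dominance implies f(q) <= f(q').
assert all(
    order_f(a) <= order_f(b)
    for a, b in all_pairs(Q.values(), repeat=2)
    if quasi_dominates(a, b)
)
```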
  Example: f(q1) = 5.0, f(q2) = 4.5, f(q3) = 1.5. Since q2 quasi-dominates q1, f(q2) < f(q1).

Proposed Algorithms
• Find optimal price assignment of a given Q'
  – Quasi-dominate
  – Order Function
  – Lemma
  – Main idea
    • First sort all the products in Q' according to their f values.
    • For each q_j, find the set S containing all the products in P ∪ Q' which quasi-dominate q_j.
    • Set v_j to (min_{s ∈ S} s.price) − σ.
    As the q_j are sorted, no price correlation will happen.

Proposed Algorithms
• Find optimal price assignment of a given Q'

  Products   Distance-to-beach (A_1)   Price (A_2)   Cost (C)
  p1         7.0                       200
  p2         4.0                       350
  p3         1.0                       500
  p4         3.0                       600
  q1         5.0                       ?             100
  q2         4.5                       ?             200
  q3         1.5                       ?             400

  Products (before sorting)   f(q_j)        Products (after sorting)   f(q_j)
  q1                          5.0           q3                         1.5
  q2                          4.5           q2                         4.5
  q3                          1.5           q1                         5.0

  Suppose Q' = Q, σ = 50.
  1. Sort: Q' = {q3, q2, q1}.
  2. Find S: for q3, S = {p3}.
  3. Set v_j to (min_{s ∈ S} s.price) − σ: v_3 = min{p3.price} − σ = 450.
  4. Run Steps 2 and 3 iteratively until every v_j is set.
  This algorithm is called AOPA. The iteration process (Steps 2 and 3) can be expressed as a function v_j = α(q_j, Q', v).

Proposed Algorithms
• With AOPA/α, we propose three algorithms
  – Dynamic Programming (DP) for l = 2
  – Greedy Algorithm 1 (GR1) for l > 2
  – Greedy Algorithm 2 (GR2) for l > 2
• Theorem
  When l > 2, problem TPP is NP-hard.

Dynamic Programming (DP)
• Main Steps
  • Start selecting products into Q' from |Q'| = 1.
  • Whether q_j is selected depends on whether the optimal profit of Q' is larger after q_j is added.
  • Increase |Q'| by 1 and compute the optimal profit of Q' from the previous results.
  • Terminate when |Q'| = k.

Greedy Algorithm 1 (GR1)
• Main Steps
  • Compute the optimal profit of Q' = {q_j} for every j (k = 1).
  • Choose the k products which have the top-k optimal profits.
• Approximation
  – additive error guarantee
  – multiplicative error guarantee
• Disadvantage
  Price correlation is not considered.

Greedy Algorithm 2 (GR2)
• Main Steps
  • Iteratively select one product from Q into Q'. In each iteration, add the q_j that brings the greatest profit increase to Q', as computed by the AOPA algorithm.
  • Terminate when |Q'| = k.
• Advantage
  In each iteration, price correlation is handled by the AOPA algorithm. Therefore, the result of GR2 has no price correlation.

Experiments
• Algorithms
  – DP
  – GR1
  – GR2
  – BF
• Datasets
  – Real dataset
    • Packages (hotel and flights) from Priceline.com and Expedia.com
    • 149 round trip packages (P) with 6 attributes (l = 5)
    • 1014 hotels and 4394 flights
    • 4787 new packages (Q)
  – Synthetic datasets
    • Small synthetic dataset with |P| = 10,000, |Q| = 10,000, l = 2.
    • Large synthetic dataset with |P| ∈ [0.5M, 2.0M], |Q| ∈ [0.5M, 3.0M].
• Other settings
  – The discount rate of q is denoted by d; we set q.C = (1 − d) · q.price.

Experiments (cont.)
• Real dataset
• Small synthetic dataset
• Large synthetic dataset

Conclusion
• Contribution
  – We tackle the problem of finding top-k profitable products.
  – Three algorithms are proposed for solving it.
  – The effectiveness and efficiency of the proposed algorithms are verified experimentally.
• Interesting future work
  – Find top-k profitable products with dynamic data
  – Consider additional constraints (e.g., supply and demand, unit profit)

Reference
[1] O. B.-N. et al. On the distribution of the number of admissible points in a vector random sample. In Theory of Probability and its Applications, 11(2), 1966.
[2] J. L. B. et al. On the average number of maxima in a set of vectors and applications. In Journal of the ACM, 25(4), 1978.
[3] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, 2001.
[4] K.-L. Tan, P. Eng, and B. Ooi.
Efficient progressive skyline computation. In VLDB, 2001.
[5] D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. In VLDB, 2002.
[6] D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive skyline computation in database systems. In ACM Transactions on Database Systems, 30(1), 2005.
[7] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang. Selecting stars: the k most representative skyline operator. In ICDE, 2007.
[8] Y. Tao, L. Ding, X. Lin, and J. Pei. Distance-based representative skyline. In ICDE, pages 892–903, 2009.
[9] E. Dellis and B. Seeger. Efficient computation of reverse skyline queries. In VLDB, 2007.
[10] Q. Wan, R. C.-W. Wong, I. F. Ilyas, M. T. Ozsu, and Y. Peng. Creating competitive products. In VLDB, 2009.

Thank you!
Q&A

Backup Slides

Dynamic Programming
• Notation
  – Q(i): all the products in Q which quasi-dominate q_i.
  – Q(i, t): a size-t subset of Q(i) that has the greatest profit among all the size-t subsets of Q(i).
  – v(i, t): the optimal price assignment vector of Q(i, t).
  – M(i, t): the optimal profit of Q(i, t).
• Main idea
  – The optimal price assignment of set Q(i, t) is obtained either from α(q_i, Q(i − 1, t − 1), v(i − 1, t − 1)) or from v(i − 1, t).
  – By comparing the maximum profit of the size-t subsets of Q that include q_i with that of the subsets that do not include q_i, we decide whether q_i is in the final selection.

Dynamic Programming (cont.)
• Main Steps
  – Maximum Profit:
    • Case 1: q_i is included in the final selection of size t.
      – v(i, t) = v(i − 1, t − 1) extended with v_i = α(q_i, Q(i − 1, t − 1), v(i − 1, t − 1))
      – Q(i, t) = Q(i − 1, t − 1) ∪ {q_i}
      – M(i, t) = M(i − 1, t − 1) + Δ(q_i, v_i)
    • Case 2: q_i is not included in the final selection of size t.
      – v(i, t) = v(i − 1, t)
      – Q(i, t) = Q(i − 1, t)
      – M(i, t) = M(i − 1, t)
  – Comparison:
    Let inProfit = M(i − 1, t − 1) + Δ(q_i, v_i) and outProfit = M(i − 1, t).
    If inProfit > outProfit, select q_i into the final selection set.
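The recurrence on these backup slides can be sketched in Python for the l = 2 running example. This is an illustrative reading of the slides, not a verified reproduction of the authors' algorithm: `alpha` and `top_k_dp` are hypothetical names, σ = 50 is taken from the AOPA example, a selected product is assumed to contribute its profit v_i − q_i.C (per the definition of Δ), and every new product is assumed to have at least one quasi-dominator in P.

```python
SIGMA = 50  # price step σ, as in the AOPA example

P = [(7.0, 200), (4.0, 350), (1.0, 500), (3.0, 600)]        # (A_1, price)
Q = {"q1": (5.0, 100), "q2": (4.5, 200), "q3": (1.5, 400)}  # name -> (A_1, cost)


def alpha(q, chosen, v):
    """One AOPA step for l = 2: price q just below its cheapest quasi-dominator.

    Raises ValueError if q has no quasi-dominator (assumed not to happen here).
    """
    a1 = Q[q][0]
    prices = [price for (dist, price) in P if dist <= a1]
    prices += [v[s] for s in chosen if Q[s][0] <= a1]
    return min(prices) - SIGMA


def top_k_dp(k):
    """Return (profit, selected products, prices) following the M(i, t) recurrence."""
    order = sorted(Q, key=lambda q: Q[q][0])  # sort by f(q) = A_1
    # best[t] holds (M, Q(i, t), v(i, t)) for the products processed so far.
    best = [(0, [], {})] + [None] * k
    for q in order:
        new = list(best)  # default is Case 2: q not included
        for t in range(1, k + 1):
            prev = best[t - 1]
            if prev is None:
                continue  # no size-(t-1) selection exists yet
            m, chosen, v = prev
            price = alpha(q, chosen, v)
            in_profit = m + price - Q[q][1]  # M(i-1, t-1) + Δ(q_i, v_i)
            if best[t] is None or in_profit > best[t][0]:
                new[t] = (in_profit, chosen + [q], {**v, q: price})  # Case 1
        best = new
    return best[k]


profit, selected, prices = top_k_dp(2)
print(profit, selected, prices)  # 250 ['q2', 'q1'] {'q2': 300, 'q1': 250}
```

On this example the sketch selects {q1, q2} with prices 250 and 300, matching the optimal profit of 250 reported in the problem definition.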