Department of Computer Science and Engineering

Finding Top-k Profitable Products
Qian Wan, Raymond Chi-Wing Wong, Yu Peng
The Hong Kong University of Science & Technology
Prepared by Yu Peng
Product Manager’s Dilemma
iPad 3
iPad 2
iPad
Product Manager’s Dilemma
iPad: $499
Suit: $600
Product Manager’s Dilemma
Product | Weight (g) | Processor | Cameras | Price ($)
iPad    | 730/608    | Apple A4  | None    | 499 → 399
iPad 2  | 613/601    | Apple A5  | 2       | 499
iPad 3  | ?          | ?         | ?       | ?
Product   | Weight (g) | Processor | Cameras | Price ($) | Cost
iPad 3 v1 | 500        | Apple A5  | 2       | ?         | 500
iPad 3 v2 | 500        | Apple A6  | 2       | ?         | 600
iPad 3 v3 | 100        | Apple A6  | 4       | ?         | 2000
Which to produce?
Outline
• Problem Definition
• Related Work
• Proposed Algorithms
• Experiments
• Conclusion
Problem Definition
• Background
– Skyline SKY(X)
π‘†πΎπ‘Œ(𝑋) contains all the elements
in 𝑋 such that any other elements
in 𝑋 are not better than them.
Products | Distance-to-beach | Price | Cost
p1       | 7.0               | 200   |
p2       | 4.0               | 350   |
p3       | 1.0               | 500   |
p4       | 3.0               | 600   |
X={p1,p2,p3,p4}
SKY(X)={p1,p2,p3}
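A minimal Python sketch of SKY(X) for this table, assuming that smaller values are better in every attribute (as the example implies: 𝑝3 beats 𝑝4 on both distance and price). It uses a plain quadratic pairwise check rather than an optimised skyline algorithm; the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Names of the points that no other point dominates (naive O(n^2) scan)."""
    return {
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    }


# (distance-to-beach, price) of the existing products
X = {"p1": (7.0, 200), "p2": (4.0, 350), "p3": (1.0, 500), "p4": (3.0, 600)}
print(sorted(skyline(X)))  # ['p1', 'p2', 'p3']
```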
Problem Definition
• Background
Products | Distance-to-beach | Price | Cost
p1       | 7.0               | 200   |
p2       | 4.0               | 350   |
p3       | 1.0               | 500   |
p4       | 3.0               | 600   |
q1       | 5.0               | ?     | 100
q2       | 4.5               | ?     | 200
q3       | 1.5               | ?     | 400
X={p1,p2,p3,p4,q1,q2,q3}
SKY(X)={p1,p2,p3,q1,q2,q3}
What prices should we set for q1, q2 and q3?
Problem Definition
• Scenario
• Given
• a set 𝑃 of 4 products in the current market
• a set 𝑄 of 3 new products we want to produce
• Objective
• select a set 𝑄′ of 𝑘 = 2 products from 𝑄
• determine the prices of these 2 products so as to maximize the total profit.
Problem Definition
• Notation
– Attributes of products {𝐴𝑖}, 𝑖 ∈ [1, 𝑙]
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400
• 𝑙 = 2
• 𝐴1 is “Distance-to-beach” and 𝐴2 is “Price”; the price is the last attribute, 𝐴𝑙.
Problem Definition
• Notation
– Price Assignment Vector 𝒗 = (𝑣1, 𝑣2, 𝑣3)
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | 𝑣1         | 100
q2       | 4.5                    | 𝑣2         | 200
q3       | 1.5                    | 𝑣3         | 400
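• |𝑄| = 3, 𝑘 = 2; here 𝑄′ = {𝑞1, 𝑞2}, so 𝑣3 = 0 (𝑞3 is not produced).
• 𝒗 = (200, 400, 0) is a price assignment vector, but it is not feasible: 𝑝2 = (4.0, 350) dominates 𝑞2 = (4.5, 400), so 𝑞2 would not be in the skyline.
• 𝒗 = (200, 300, 0) is a feasible price assignment vector: both 𝑞1 and 𝑞2 remain in SKY(𝑃 ∪ 𝑄′).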
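The feasibility test can be sketched in Python as follows. This assumes, as in the example, that 𝑣𝑖 = 0 marks a product that is not selected and that a smaller distance and a lower price are better; `is_feasible` is a hypothetical helper written for illustration.

```python
P = {"p1": (7.0, 200), "p2": (4.0, 350), "p3": (1.0, 500), "p4": (3.0, 600)}
DIST = {"q1": 5.0, "q2": 4.5, "q3": 1.5}  # distance-to-beach of the new products


def dominates(a, b):
    # Smaller is better in every attribute.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def is_feasible(v, names=("q1", "q2", "q3")):
    """v[i] > 0: names[i] is produced at price v[i]; v[i] == 0: it is not selected."""
    chosen = {n: (DIST[n], price) for n, price in zip(names, v) if price > 0}
    market = {**P, **chosen}  # P ∪ Q'
    # Every selected product must stay in SKY(P ∪ Q').
    return all(
        not any(dominates(market[o], point) for o in market if o != n)
        for n, point in chosen.items()
    )


print(is_feasible((200, 400, 0)))  # False: p2 = (4.0, 350) dominates q2 = (4.5, 400)
print(is_feasible((200, 300, 0)))  # True
```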
Problem Definition
• Notation
– Profit of 𝑞𝑖: $\Delta(q_i, v_i) = v_i - q_i.C$;
– Profit of 𝑄′: $\mathit{Profit}(Q', \boldsymbol{v}) = \sum_{q_i \in Q'} \Delta(q_i, v_i)$.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | 𝑣1         | 100
q2       | 4.5                    | 𝑣2         | 200
q3       | 1.5                    | 𝑣3         | 400
𝑛 = 3, 𝑘 = 2, 𝑄′ = {𝑞1, 𝑞2}, 𝒗 = (200, 300, 0):
$\Delta(q_1, v_1) = v_1 - q_1.C = 200 - 100 = 100$;
$\Delta(q_2, v_2) = v_2 - q_2.C = 300 - 200 = 100$;
$\mathit{Profit}(Q', \boldsymbol{v}) = \Delta(q_1, v_1) + \Delta(q_2, v_2) = 200$.
Problem Definition
• Notation
– The optimal profit $\mathit{Profit}^o(Q')$ of 𝑄′:
$\mathit{Profit}^o(Q') = \mathit{Profit}(Q', \boldsymbol{v}^o) = \max_{\boldsymbol{v}' \in V} \mathit{Profit}(Q', \boldsymbol{v}')$, where 𝑉 is the set of feasible price assignment vectors of 𝑄′.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | 𝑣1         | 100
q2       | 4.5                    | 𝑣2         | 200
q3       | 1.5                    | 𝑣3         | 400
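• 𝑛 = 3, 𝑘 = 2, 𝑄′ = {𝑞1, 𝑞2}, with prices in steps of 50;
• 𝑞2 must stay cheaper than 𝑝2 (350), so $\Delta(q_2, v_2) \le 300 - 200 = 100$; 𝑞1 must then stay cheaper than 𝑞2, which is closer to the beach, so $\Delta(q_1, v_1) \le 250 - 100 = 150$;
• ∴ $\boldsymbol{v}^o = (250, 300, 0)$ and $\mathit{Profit}^o(Q') = 250$.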
Problem Definition
• Finding Top-k Profitable Products (TPP)
Given a set 𝑃 of existing products and a set 𝑄 of possible new products, the goal is to find a subset 𝑄′ of 𝑄 such that
• |𝑄′| = 𝑘;
• $\forall q_i \in Q',\ q_i \in \mathit{SKY}(P \cup Q')$;
• $\mathit{Profit}^o(Q') = \max_{Q'' \subseteq Q,\ |Q''| = k} \mathit{Profit}^o(Q'')$.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400
Example (𝑛 = 3, 𝑘 = 2):
• When 𝑄′ = {𝑞1, 𝑞2}: $\boldsymbol{v}^o = (250, 300, 0)$, $\mathit{Profit}^o(Q') = 250$.
• When 𝑄′ = {𝑞2, 𝑞3}: $\boldsymbol{v}^o = (0, 300, 450)$, $\mathit{Profit}^o(Q') = 150$.
• When 𝑄′ = {𝑞1, 𝑞3}: $\boldsymbol{v}^o = (300, 0, 450)$, $\mathit{Profit}^o(Q') = 250$; without 𝑞2, the price of 𝑞1 only has to stay below that of 𝑝2 (350).
• Hence {𝑞1, 𝑞2} and {𝑞1, 𝑞3} are both optimal, with a profit of 250.
Related Work
• Skyline Concept
– Admissible points [1]
– Maximal vectors [2]
– Skyline in database [3]
• Variations of Skyline
– Computation of Skyline
• Bitmap [4]
• Nearest Neighbor (NN)[5]
• Branch and Bound Skyline (BBS)[6]
– Top-K queries
• Ranked Skyline [6]
• Representative skyline queries [7][8]
• Reverse Skyline queries [9]
– Creating competitive (skyline) products [10]
Proposed Algorithms
• Analyses
– Price Correlation
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400
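Example: 𝑛 = 3, 𝑘 = 2, 𝑄′ = {𝑞1, 𝑞2}. Pricing each product only against 𝑃:
$\Delta(q_1, v_1) \le 300 - 100 = 200$, ∴ $v_1 = 300$;
$\Delta(q_2, v_2) \le 300 - 200 = 100$, ∴ $v_2 = 300$.
However, at equal prices 𝑞2 (distance 4.5) dominates 𝑞1 (distance 5.0), so 𝑞1 drops out of the skyline: the price chosen for one new product decides whether another stays competitive. This is price correlation.
To avoid price correlation, we sort all the products in 𝑄 before assigning prices.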
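The price-correlation problem can be reproduced in a few lines of Python. The sketch prices each new product on its own, just below the cheapest existing product that is at least as close to the beach, using an assumed price gap σ = 50; the helper names are illustrative.

```python
P = {"p1": (7.0, 200), "p2": (4.0, 350), "p3": (1.0, 500), "p4": (3.0, 600)}
DIST = {"q1": 5.0, "q2": 4.5}  # distance-to-beach of the two selected products
SIGMA = 50                     # assumed price gap


def dominates(a, b):
    # Smaller is better in every attribute.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def independent_price(q):
    """Price q just below the cheapest existing product at least as close to the beach."""
    rivals = [price for dist, price in P.values() if dist <= DIST[q]]
    return min(rivals) - SIGMA


v = {q: independent_price(q) for q in DIST}
print(v)  # {'q1': 300, 'q2': 300}
# With equal prices, the closer q2 dominates q1, so q1 drops out of the skyline.
print(dominates((DIST["q2"], v["q2"]), (DIST["q1"], v["q1"])))  # True
```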
Proposed Algorithms
• Flow: 𝑄 → select 𝑘 products into 𝑄′ (candidate sets 𝑄1′, 𝑄2′, 𝑄3′, ..., 𝑄𝑛′) → find the optimal price of each 𝑄𝑖′ → compare → top-𝑘 profitable products
Proposed Algorithms
• Find optimal price assignment of a given 𝑄′
– Quasi-dominate
𝑝 quasi-dominates 𝑝′ if and only if one of the following holds:
1. 𝑝 dominates 𝑝′ with respect to the first 𝑙 − 1 attributes;
2. 𝑝 has the same values as 𝑝′ on the first 𝑙 − 1 attributes.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400
Example:
𝑝2 quasi-dominates 𝑞1.
𝑝3 quasi-dominates 𝑞1, 𝑞2 and 𝑞3.
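A direct Python transcription of the quasi-dominance test, assuming each product is represented by the tuple of its first 𝑙 − 1 attribute values and that smaller values are better:

```python
def quasi_dominates(p, q):
    """p, q: values of the first l-1 attributes (price excluded); smaller is better.

    True if p dominates q on these attributes, or has exactly the same values.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return (no_worse and strictly_better) or tuple(p) == tuple(q)


# With l = 2 only A1 (distance-to-beach) is compared.
A1 = {"p1": (7.0,), "p2": (4.0,), "p3": (1.0,), "p4": (3.0,),
      "q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}
new = ["q1", "q2", "q3"]
print([q for q in new if quasi_dominates(A1["p2"], A1[q])])  # ['q1', 'q2']
print([q for q in new if quasi_dominates(A1["p3"], A1[q])])  # ['q1', 'q2', 'q3']
```

The first line of output also shows that 𝑝2 quasi-dominates 𝑞2, since 4.0 < 4.5.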
Proposed Algorithms
• Find optimal price assignment vector of 𝑄′
– Quasi-dominate
– Order Function 𝑓
$\forall q_i \in Q,\ f(q_i) = q_i.A_1 + \cdots + q_i.A_{l-1}$
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400

Products | 𝑓(𝑞𝑖)
q1       | 5.0
q2       | 4.5
q3       | 1.5
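Computing 𝑓 and the resulting processing order takes only a few lines. A minimal sketch, with each new product represented by its first 𝑙 − 1 attribute values (an illustrative encoding):

```python
def f(attrs):
    """Order function: sum of the first l-1 attribute values (price excluded)."""
    return sum(attrs)


# First l-1 = 1 attribute (distance-to-beach) of each new product.
Q = {"q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}
order = sorted(Q, key=lambda name: f(Q[name]))
print([(q, f(Q[q])) for q in order])  # [('q3', 1.5), ('q2', 4.5), ('q1', 5.0)]
```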
Proposed Algorithms
• Find optimal price assignment of a given 𝑄′
– Quasi-dominate
– Order Function
– Lemma
Suppose 𝑝 and 𝑝′ are in 𝑋. If 𝑝 quasi-dominates 𝑝′, then 𝑓(𝑝) is smaller than or equal to 𝑓(𝑝′), since each of the first 𝑙 − 1 attribute values of 𝑝 is at most the corresponding value of 𝑝′.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400

Products | 𝑓(𝑞𝑖)
q1       | 5.0
q2       | 4.5
q3       | 1.5
Example:
Since 𝑞2 quasi-dominates 𝑞1, 𝑓(𝑞2) = 4.5 < 𝑓(𝑞1) = 5.0.
Proposed Algorithms
• Find optimal price assignment of a given 𝑄′
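– Main idea
• First sort all the products in 𝑄′ in ascending order of their 𝑓 values.
• For each 𝑞𝑖 in this order, find 𝑌, the set of all products in 𝑃 ∪ 𝑄′ that quasi-dominate 𝑞𝑖.
• Set $v_i = \min_{p \in Y} p.\mathit{Price} - \sigma$, where σ > 0 is a small price gap.
As the 𝑞𝑖 are sorted, every product that quasi-dominates 𝑞𝑖 has a smaller or equal 𝑓 value (by the lemma) and is therefore priced before 𝑞𝑖, so no price correlation will happen.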
Proposed Algorithms
• Find optimal price assignment of a given 𝑄′
Suppose 𝑄′ = 𝑄, σ = 50.
Products | Distance-to-beach (𝐴1) | Price (𝐴2) | Cost (𝐶)
p1       | 7.0                    | 200        |
p2       | 4.0                    | 350        |
p3       | 1.0                    | 500        |
p4       | 3.0                    | 600        |
q1       | 5.0                    | ?          | 100
q2       | 4.5                    | ?          | 200
q3       | 1.5                    | ?          | 400
1. Sort: 𝑄′ = {𝑞3, 𝑞2, 𝑞1}
Products | 𝑓(𝑞𝑖)
q3       | 1.5
q2       | 4.5
q1       | 5.0
2. Find 𝑌: for 𝑞3, 𝑌 = {𝑝3}.
3. Set $v_i = \min_{p \in Y} p.\mathit{Price} - \sigma$: $v_3 = \min\{p_3.\mathit{Price}\} - \sigma = 500 - 50 = 450$.
4. Run Steps 2 and 3 iteratively until every 𝑣𝑖 is set.
➢ This algorithm is called AOPA. The iteration process (Steps 2 and 3) can be expressed as a function $v_i = \alpha(q_i, Q', \boldsymbol{v})$.
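A Python sketch of AOPA on the running example, assuming σ = 50 and smaller-is-better attributes. `alpha` plays the role of 𝛼 but takes the already-priced products of 𝑄′ as one dictionary instead of separate 𝑄′ and 𝒗 arguments; raising an error when 𝑌 is empty is an assumption of this sketch, since the slides do not cover that case.

```python
SIGMA = 50  # price gap used in the example
# Existing products: (first l-1 attributes, price).
P = {"p1": ((7.0,), 200), "p2": ((4.0,), 350), "p3": ((1.0,), 500), "p4": ((3.0,), 600)}
Q = {"q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}  # first l-1 attributes of new products
COST = {"q1": 100, "q2": 200, "q3": 400}


def quasi_dominates(p, q):
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return (no_worse and strictly_better) or tuple(p) == tuple(q)


def alpha(q, priced):
    """Steps 2-3: price q given the already-priced products of Q' (name -> price)."""
    # Y: products of P and of the priced part of Q' that quasi-dominate q.
    y = [price for attrs, price in P.values() if quasi_dominates(attrs, Q[q])]
    y += [price for o, price in priced.items() if quasi_dominates(Q[o], Q[q])]
    if not y:
        # Assumption: the slides do not say what happens when Y is empty.
        raise ValueError(f"no product quasi-dominates {q}; its price is unbounded")
    return min(y) - SIGMA


def aopa(selected):
    """Step 1: sort by f (sum of the first l-1 attributes), then price in that order."""
    prices = {}
    for q in sorted(selected, key=lambda name: sum(Q[name])):
        prices[q] = alpha(q, prices)
    return prices


v = aopa(["q1", "q2", "q3"])
print(v)  # {'q3': 450, 'q2': 300, 'q1': 250}
print(sum(v[q] - COST[q] for q in v))  # 300
```

For 𝑄′ = 𝑄 the sketch yields 𝑣3 = 450, 𝑣2 = 350 − 50 = 300 and 𝑣1 = 300 − 50 = 250, for a total profit of 300.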
Proposed Algorithms
• With AOPA/𝛼, we propose three algorithms
– Dynamic Programming (DP) for 𝑙 = 2
– Greedy Algorithm 1 (GR1) for 𝑙 > 2
– Greedy Algorithm 2 (GR2) for 𝑙 > 2
• Theorem
When 𝑙 > 2, problem TPP is NP-hard.
Dynamic Programming (DP)
• Main Steps
• Start selecting products from 𝑄 into 𝑄′, beginning with |𝑄′| = 1.
• Whether 𝑞𝑖 is selected or not depends on whether the optimal profit of 𝑄′ is larger after 𝑞𝑖 is added.
• Increase |𝑄′| by 1 and compute the optimal profit of 𝑄′ from the previous results.
• Terminate when |𝑄′| = 𝑘.
Greedy Algorithm 1 (GR1)
• Main Steps
• Compute the optimal profit of 𝑄′ = {𝑞𝑖} for every 𝑖 (i.e., with 𝑘 = 1).
• Choose the 𝑘 products that have the top-𝑘 optimal profits.
• 𝜀-Approximation
– additive error guarantee
– multiplicative error guarantee
• Disadvantage
Price correlation is not considered.
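GR1 can be sketched by pricing each candidate on its own against 𝑃 (AOPA with 𝑄′ = {𝑞𝑖}) and keeping the 𝑘 most profitable ones. A minimal sketch on the running example with σ = 50; the function names are hypothetical, and it assumes every candidate has at least one quasi-dominating product in 𝑃.

```python
SIGMA = 50
P = {"p1": ((7.0,), 200), "p2": ((4.0,), 350), "p3": ((1.0,), 500), "p4": ((3.0,), 600)}
Q = {"q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}
COST = {"q1": 100, "q2": 200, "q3": 400}


def quasi_dominates(p, q):
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return (no_worse and strictly_better) or tuple(p) == tuple(q)


def solo_price(q):
    """AOPA with Q' = {q}: undercut the cheapest product of P that quasi-dominates q."""
    return min(price for attrs, price in P.values() if quasi_dominates(attrs, Q[q])) - SIGMA


def gr1(k):
    """Rank products by their stand-alone optimal profit and keep the top k."""
    solo = {q: solo_price(q) - COST[q] for q in Q}
    return sorted(solo, key=solo.get, reverse=True)[:k], solo


chosen, solo = gr1(2)
print(solo)    # {'q1': 200, 'q2': 100, 'q3': 50}
print(chosen)  # ['q1', 'q2']
print({q: solo_price(q) for q in chosen})  # {'q1': 300, 'q2': 300}
```

On this example GR1 picks 𝑞1 and 𝑞2, each priced at 300 in isolation, which is the price-correlated pair from the earlier example.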
Greedy Algorithm 2 (GR2)
• Main Steps
• Iteratively select one product from 𝑄 into 𝑄′. In each iteration, add the 𝑞𝑖 that brings the greatest profit increase to 𝑄′, with prices computed by the AOPA algorithm.
• Terminate when |𝑄′| = 𝑘.
• Advantage
In each iteration, price correlation is handled by the AOPA algorithm. Therefore, the result of GR2 has no price correlation.
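GR2 reruns AOPA for every remaining candidate in each round. A Python sketch on the running example, assuming σ = 50 and smaller-is-better attributes; the function names are illustrative, and ties are broken by iteration order.

```python
SIGMA = 50
P = {"p1": ((7.0,), 200), "p2": ((4.0,), 350), "p3": ((1.0,), 500), "p4": ((3.0,), 600)}
Q = {"q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}
COST = {"q1": 100, "q2": 200, "q3": 400}


def quasi_dominates(p, q):
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return (no_worse and strictly_better) or tuple(p) == tuple(q)


def aopa(selected):
    """Price the products of `selected` in ascending order of f (AOPA)."""
    prices = {}
    for q in sorted(selected, key=lambda name: sum(Q[name])):
        y = [price for attrs, price in P.values() if quasi_dominates(attrs, Q[q])]
        y += [price for o, price in prices.items() if quasi_dominates(Q[o], Q[q])]
        prices[q] = min(y) - SIGMA  # assumes Y is non-empty, as in the example
    return prices


def profit(prices):
    return sum(v - COST[q] for q, v in prices.items())


def gr2(k):
    """Greedy: each round, add the product whose AOPA re-pricing gives the best total.

    Within a round the current total is fixed, so the best new total is also the
    greatest profit increase.
    """
    chosen = []
    for _ in range(k):
        best_q, best_total = None, float("-inf")
        for q in Q:
            if q not in chosen:
                total = profit(aopa(chosen + [q]))
                if total > best_total:  # ties keep the first candidate
                    best_q, best_total = q, total
        chosen.append(best_q)
    return chosen, aopa(chosen)


chosen, prices = gr2(2)
print(chosen, prices, profit(prices))  # ['q1', 'q2'] {'q2': 300, 'q1': 250} 250
```

Here GR2 first picks 𝑞1 (profit 200 on its own) and then adds 𝑞2, re-pricing 𝑞1 to 250 so that it is not dominated by 𝑞2 at 300.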
Experiments
• Algorithms
– DP
– GR1
– GR2
– BF
• Datasets
– Real dataset
• Packages (hotel and flights) from Priceline.com and Expedia.com
• 149 round trip packages (𝑃) with 6 attributes (𝑙 = 5)
• 1014 hotels and 4394 flights
• 4787 new packages (𝑄)
– Synthetic datasets
• Small synthetic dataset with |𝑃| = 10,000, |𝑄| = 10,000, 𝑙 = 2.
• Large synthetic dataset with |𝑃| ∈ [0.5M, 2.0M], |𝑄| ∈ [0.5M, 3.0M].
• Other settings
– The discount rate of 𝑞 is denoted by 𝑑; we set $q.C = (1 - d) \cdot q.\mathit{Price}$.
Experiments (cont.)
• Real Dataset
Experiments (cont.)
• Small synthetic dataset
Experiments (cont.)
• Small synthetic dataset
Experiments (cont.)
• Large synthetic dataset
Experiments (cont.)
• Large synthetic dataset
Conclusion
• Contribution
– We tackle the problem of finding top-𝑘 profitable products.
– Three algorithms are proposed for solving it.
– The effectiveness and efficiency of the proposed algorithms are verified experimentally.
• Interesting future work
– Find top-𝑘 profitable products with dynamic data
– Consider additional constraints (e.g., supply and demand and unit profit)
Reference
[1] O. B.-N. et al. On the distribution of the number of admissible points in a vector random sample. In Theory of Probability and its Applications, 11(2), 1966.
[2] J. L. B. et al. On the average number of maxima in a set of vectors and applications. In Journal of the ACM, 25(4), 1978.
[3] S. Borzsonyi, D. Kossmann, and K. Stocker. The skyline operator. In ICDE, 2001.
[4] K.-L. Tan, P. Eng, and B. Ooi. Efficient progressive skyline computation. In VLDB, 2001.
[5] D. Kossmann, F. Ramsak, and S. Rost. Shooting stars in the sky: An online algorithm for skyline queries. In VLDB, 2002.
[6] D. Papadias, Y. Tao, G. Fu, and B. Seeger. Progressive skyline computation in database systems. In ACM Transactions on Database Systems, 30(1), 2005.
[7] X. Lin, Y. Yuan, Q. Zhang, and Y. Zhang. Selecting stars: the k most representative skyline operator. In ICDE, 2007.
[8] Y. Tao, L. Ding, X. Lin, and J. Pei. Distance-based representative skyline. In ICDE, pages 892–903, 2009.
[9] E. Dellis and B. Seeger. Efficient computation of reverse skyline queries. In VLDB, 2007.
[10] Q. Wan, R. C.-W. Wong, I. F. Ilyas, M. T. Ozsu, and Y. Peng. Creating competitive products. In VLDB, 2009.
Thank you!
Q&A
Backup Slides
Dynamic Programming
• Notation
– 𝑄(𝑖): all the products in 𝑄 which quasi-dominate 𝑞𝑖.
– 𝑆(𝑖, 𝑡): a size-𝑡 subset of 𝑄(𝑖) that has the greatest profit among all the size-𝑡 subsets of 𝑄(𝑖).
– 𝒗(𝑖, 𝑡): the optimal price assignment vector of 𝑆(𝑖, 𝑡).
– 𝑇(𝑖, 𝑡): the optimal profit of 𝑆(𝑖, 𝑡).
• Main idea
– The optimal price assignment of 𝑆(𝑖, 𝑡) is either 𝒗(𝑖 − 1, 𝑡 − 1) extended with $v_i = \alpha(q_i, S(i-1, t-1), \boldsymbol{v}(i-1, t-1))$ (when 𝑞𝑖 is selected) or $\boldsymbol{v}(i-1, t)$ (when it is not).
– By comparing the maximum profit of the size-𝑡 subsets of 𝑄(𝑖) that include 𝑞𝑖 with that of the subsets that do not, we decide whether 𝑞𝑖 is in the final selection.
Dynamic Programming (cont.)
• Main Steps
– Maximum Profit:
• Case 1: 𝑞𝑖 is included in the final selection of size 𝑡.
– $\boldsymbol{v}(i, t)$ is $\boldsymbol{v}(i-1, t-1)$ extended with $v_i = \alpha(q_i, S(i-1, t-1), \boldsymbol{v}(i-1, t-1))$
– $S(i, t) = S(i-1, t-1) \cup \{q_i\}$
– $T(i, t) = T(i-1, t-1) + \Delta(q_i, v_i)$
• Case 2: 𝑞𝑖 is not included in the final selection of size 𝑡.
– $\boldsymbol{v}(i, t) = \boldsymbol{v}(i-1, t)$
– $S(i, t) = S(i-1, t)$
– $T(i, t) = T(i-1, t)$
– Comparison:
Let $T_{select} = T(i-1, t-1) + \Delta(q_i, v_i)$ and $T_{notselect} = T(i-1, t)$.
If $T_{select} > T_{notselect}$, select 𝑞𝑖 in the final selection set (Case 1); otherwise Case 2 applies.
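A Python sketch of this recurrence for 𝑙 = 2 on the running example, assuming σ = 50. 𝑄(𝑖) is taken as the first 𝑖 products in ascending 𝑓 order, the profit of a newly selected product is added as ∆(𝑞𝑖, 𝑣𝑖) = 𝑣𝑖 − 𝑞𝑖.𝐶, and `alpha` is a simplified stand-in for 𝛼 that takes the already-priced products as a dictionary. All names are illustrative.

```python
SIGMA = 50
P = {"p1": ((7.0,), 200), "p2": ((4.0,), 350), "p3": ((1.0,), 500), "p4": ((3.0,), 600)}
Q = {"q1": (5.0,), "q2": (4.5,), "q3": (1.5,)}
COST = {"q1": 100, "q2": 200, "q3": 400}


def quasi_dominates(p, q):
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return (no_worse and strictly_better) or tuple(p) == tuple(q)


def alpha(q, priced):
    """Price q just below every product of P and `priced` that quasi-dominates it."""
    y = [price for attrs, price in P.values() if quasi_dominates(attrs, Q[q])]
    y += [price for o, price in priced.items() if quasi_dominates(Q[o], Q[q])]
    return min(y) - SIGMA  # assumes Y is non-empty, as in the example


def dp(k):
    """DP for l = 2; returns (T(n, k), v(n, k)) with v as a name -> price dict."""
    order = sorted(Q, key=lambda name: sum(Q[name]))  # ascending f
    n = len(order)
    # table[i][t] = (T(i, t), v(i, t)); (-inf, None) marks an impossible state.
    table = [[(float("-inf"), None)] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = (0, {})
    for i in range(1, n + 1):
        q = order[i - 1]
        for t in range(1, min(i, k) + 1):
            t_prev, v_prev = table[i - 1][t - 1]
            v_i = alpha(q, v_prev)                # q_i selected
            t_select = t_prev + (v_i - COST[q])   # add Delta(q_i, v_i)
            t_notselect, v_not = table[i - 1][t]  # q_i not selected
            if t_select > t_notselect:
                table[i][t] = (t_select, {**v_prev, q: v_i})
            else:
                table[i][t] = (t_notselect, v_not)
    return table[n][k]


print(dp(2))  # (250, {'q2': 300, 'q1': 250})
```

On the example, `dp(2)` selects 𝑄′ = {𝑞1, 𝑞2} with 𝑣1 = 250, 𝑣2 = 300 and a profit of 250.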