Online Learning for Online Pricing Problems Maria Florina Balcan

Three versions (easiest to hardest)
Algorithmic
– Customers’ shopping lists / valuations are known to the algorithm (the seller knows the market well).
Incentive-compatible auction
– Customers submit lists / valuations to a mechanism, which decides who gets what for how much. It must be in customers’ interest to report truthfully.
On-line pricing
– Customers arrive one at a time and buy what they want at the current prices. The seller modifies prices over time.
Adaptive algorithms for pricing a single good
(Connections to experts and bandit problems)
Pricing a single good
• Say you are selling lemonade (or a cool new software tool, or
tickets to the world’s fair).
• Protocol #1: for t = 1, 2, …, T
– Seller posts price $p_t$
– Buyer arrives with valuation $v_t$
– If $v_t \ge p_t$, buyer purchases and pays $p_t$; else doesn’t.
– $v_t$ is revealed to the algorithm.
• Protocol #2: same as Protocol #1 but without the last step.
• Assume all valuations are in $[1,h]$.
• Goal: do nearly as well as the best fixed price in hindsight.
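For reference, the benchmark in the goal can be computed directly once all valuations are known. This is a minimal Python sketch with illustrative function names (not from the slides): a fixed price p earns p times the number of buyers with $v_t \ge p$, and since raising a price up to the next valuation never loses a buyer, it suffices to try each observed valuation as the price.

```python
def fixed_price_revenue(price, valuations):
    """Revenue of posting `price` every round: each buyer with v >= price pays it."""
    return sum(price for v in valuations if v >= price)


def best_fixed_price(valuations):
    """Best fixed price in hindsight (OPT) and its revenue.

    Revenue is increasing in the price between consecutive valuations,
    so some optimal price equals one of the valuations.
    """
    return max(
        ((p, fixed_price_revenue(p, valuations)) for p in set(valuations)),
        key=lambda pair: pair[1],
    )


# Buyers with valuations 1, 5, 2, 5: price 5 earns 10, price 2 earns 6, price 1 earns 4.
print(best_fixed_price([1, 5, 2, 5]))  # (5, 10)
```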
• Bad algorithm: “best price in past”
– What if the sequence of buyers is 1, h, 1, …, 1, h, 1, …, 1, h, …?
– Alg makes T/h, OPT makes T.
A factor of h worse!
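The bad case can be reproduced with a small simulation. This sketch assumes valuations are only 1 or h, that ties in past revenue (including the empty history) go to price 1, and an adaptive adversary that sends a low buyer whenever the seller posts h and a high buyer whenever it posts 1; the result is a sequence of the shape above, with each h followed by a run of 1s. The function name is illustrative.

```python
def simulate(T, h):
    """Run 'best price in past' against an adaptive adversary; return (ALG, OPT)."""
    n_low = n_high = 0  # past buyers with valuation 1 and with valuation h
    alg = 0
    for _ in range(T):
        # Past revenue of price 1 is n_low + n_high; of price h it is h * n_high.
        # Ties (including the first round) go to the lower price 1.
        p = h if h * n_high > n_low + n_high else 1
        if p == h:
            n_low += 1   # adversary sends a buyer with valuation 1: no sale
        else:
            n_high += 1  # adversary sends a buyer with valuation h: sale at price 1
            alg += 1
    opt = max(n_low + n_high, h * n_high)  # best fixed price is 1 or h
    return alg, opt


print(simulate(T=10_000, h=100))  # (100, 10000)
```

With T = 10,000 and h = 100 the algorithm only sells to the 100 high buyers, each time at price 1, while either fixed price earns 10,000.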
• Good algorithm: “combining expert advice”
– Define one expert for each price $p = (1+\epsilon)^i \in [1,h]$.
– The best price of this form gives profit $\ge OPT/(1+\epsilon)$.
– Run the RWM algorithm. Get expected gain at least:
$OPT/(1+\epsilon)^2 - O(\epsilon^{-1} h \log(\epsilon^{-1} \log h))$
[extra factor of h coming from the range of gains; there are $n = O(\epsilon^{-1} \log h)$ experts]
Only an arbitrarily small constant factor worse, with $O(h \log\log h)$ additive loss (for constant $\epsilon$)!
Can’t hope to do much better: e.g., if a single high bidder dominates all the rest.
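The expert-advice approach for Protocol #1 can be sketched in Python as follows. It assumes RWM (randomized weighted majority) in its gains form, with each expert's gain divided by h so that the multiplicative update $w_i \leftarrow w_i (1+\epsilon)^{g_i/h}$ has an exponent in [0, 1]. The names `price_grid` and `rwm_pricing` are hypothetical, and the sketch is not tuned to the constants in the bound above.

```python
import random


def price_grid(h, eps):
    """Experts: the prices (1 + eps)**i that lie in [1, h]."""
    prices, p = [], 1.0
    while p <= h:
        prices.append(p)
        p *= 1 + eps
    return prices


def rwm_pricing(valuations, h, eps, rng=random):
    """Protocol #1: post a price drawn by RWM, then learn v_t and update every expert."""
    prices = price_grid(h, eps)
    weights = [1.0] * len(prices)
    revenue = 0.0
    for v in valuations:
        p = rng.choices(prices, weights=weights)[0]  # price drawn proportional to weight
        if v >= p:
            revenue += p
        # v_t is revealed, so each expert's gain (q if v >= q else 0) is known;
        # dividing by h keeps the exponent in [0, 1].
        weights = [w * (1 + eps) ** ((q if v >= q else 0.0) / h)
                   for w, q in zip(weights, prices)]
        top = max(weights)
        weights = [w / top for w in weights]  # renormalize to avoid float overflow
    return revenue


rng = random.Random(0)
valuations = [rng.uniform(1, 50) for _ in range(5000)]
print(rwm_pricing(valuations, h=50, eps=0.1, rng=rng))
```

Renormalizing by the largest weight does not change the sampling distribution; it only keeps the numbers representable over long runs.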
• What about Protocol #2? [we only see the accept/reject decision]
– Now we can’t run RWM directly, since we don’t know how to penalize the experts!
– This is called the “adversarial bandit problem”.
– How can we solve that?
Exponential Weights for Exploration and Exploitation (exp3)
[Figure: Exp3 wraps RWM. RWM outputs distribution $p^t$; Exp3 plays expert $i \sim q^t$, receives gain $g_i^t$, and passes gain vector $\hat g^t$ back to RWM.]
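$q^t = (1-\gamma)p^t + \gamma \cdot \mathrm{unif}$
$\hat g^t = (0,\dots,0,\ g_i^t/q_i^t,\ 0,\dots,0)$, where $g_i^t/q_i^t \le nh/\gamma$ since $q_i^t \ge \gamma/n$
1. RWM believes its gain is: $p^t \cdot \hat g^t = p_i^t (g_i^t/q_i^t) \equiv g^t_{RWM}$
2. $\sum_t g^t_{RWM} \ge \widehat{OPT}/(1+\epsilon) - O(\epsilon^{-1} \tfrac{nh}{\gamma} \log n)$, where $\widehat{OPT} = \max_j \sum_t \hat g_j^t$
3. Actual gain at time t is: $g_i^t = g^t_{RWM} (q_i^t/p_i^t) \ge g^t_{RWM}(1-\gamma)$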
3.5. Actual overall gain $\ge (1-\gamma)\,\widehat{OPT}/(1+\epsilon) - O(\epsilon^{-1} \tfrac{nh}{\gamma} \log n)$
4. $E[\widehat{OPT}] \ge OPT$. Because $E[\hat g_j^t] = (1-q_j^t)\cdot 0 + q_j^t (g_j^t/q_j^t) = g_j^t$,
so $E[\max_j \sum_t \hat g_j^t] \ge \max_j E[\sum_t \hat g_j^t] = OPT$.
Conclusion ($\gamma = \epsilon$):
$E[\mathrm{Exp3}] \ge OPT/(1+\epsilon)^2 - O(\epsilon^{-2} h \log(h) \log\log(h))$
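Putting the pieces together for Protocol #2, here is a minimal Python sketch of Exp3 for pricing. It assumes the same geometric price grid as in the full-information case and the gains form of RWM; because the estimates $\hat g_i^t$ can be as large as $nh/\gamma$, the sketch divides by that bound before the multiplicative update. Names and parameter values are illustrative only.

```python
import random


def price_grid(h, eps):
    """Experts: the prices (1 + eps)**i that lie in [1, h]."""
    prices, p = [], 1.0
    while p <= h:
        prices.append(p)
        p *= 1 + eps
    return prices


def exp3_pricing(valuations, h, eps, gamma, rng=random):
    """Protocol #2: the seller only learns whether each buyer accepted."""
    prices = price_grid(h, eps)
    n = len(prices)
    weights = [1.0] * n
    revenue = 0.0
    for v in valuations:
        total = sum(weights)
        p_dist = [w / total for w in weights]                # RWM's distribution p^t
        q = [(1 - gamma) * pi + gamma / n for pi in p_dist]  # q^t = (1-gamma) p^t + gamma unif
        i = rng.choices(range(n), weights=q)[0]              # play expert i ~ q^t
        gain = prices[i] if v >= prices[i] else 0.0          # v is used only for accept/reject
        revenue += gain
        # g_hat^t is gain / q_i at coordinate i and 0 elsewhere. Entries are at
        # most n*h/gamma, so dividing by that bound keeps the exponent in [0, 1];
        # zero entries leave the other weights unchanged.
        weights[i] *= (1 + eps) ** ((gain / q[i]) / (n * h / gamma))
        top = max(weights)
        weights = [w / top for w in weights]                 # avoid float overflow
    return revenue


rng = random.Random(0)
print(exp3_pricing([rng.uniform(1, 10) for _ in range(20000)],
                   h=10, eps=0.2, gamma=0.2, rng=rng))
```

Only the played expert's weight moves each round, which is what makes the large $nh/\gamma$ range (and hence the larger additive loss) appear.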
Algorithmic Problem, Single-minded Bidders
• n item types (coffee, cups, sugar, apples), with unlimited
supply of each.
• m customers.
• Each customer i has a shopping list $L_i$ and will only shop if the total cost of the items in $L_i$ is at most some amount $w_i$ (otherwise he will go elsewhere).
• Say all marginal costs to you are 0 [revisit this in a bit], and you know all the $(L_i, w_i)$ pairs.
What prices on the items will make you the most money?
• Easy if all $L_i$ are of size 1.
• What happens if all $L_i$ are of size 2?
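To see why lists of size 1 are easy: items then never interact, so each item is a separate single-good problem, and the best price for an item is one of its customers' values $w_i$. A minimal Python sketch, assuming a hypothetical input format of (item, $w_i$) pairs:

```python
from collections import defaultdict


def price_size_one_lists(customers):
    """customers: (item, w) pairs, i.e. every shopping list has exactly one item."""
    by_item = defaultdict(list)
    for item, w in customers:
        by_item[item].append(w)
    prices, total = {}, 0
    for item, ws in by_item.items():
        # Single good: try each customer value as the price, keep the most profitable.
        best = max(set(ws), key=lambda p: p * sum(w >= p for w in ws))
        prices[item] = best
        total += best * sum(w >= best for w in ws)
    return prices, total


customers = [("coffee", 3), ("coffee", 5), ("coffee", 5), ("sugar", 2), ("sugar", 10)]
print(price_size_one_lists(customers))  # ({'coffee': 5, 'sugar': 10}, 20)
```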
Algorithmic Pricing, Single-minded Bidders
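• A multigraph G with values $w_e$ on edges e (each edge is a customer whose list has size 2).
• Goal: assign prices $p_v \ge 0$ on vertices to maximize total profit, where:
$\text{Profit} = \sum_{e=(u,v):\ w_e \ge p_u + p_v} (p_u + p_v)$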
• APX-hard [GHKKKM’05].
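The objective is easy to evaluate even though optimizing it is hard. A short Python sketch of the profit of a given price vector, assuming edges are given as (u, v, $w_e$) triples so that parallel edges (several customers wanting the same pair) are simply repeated entries; the function name and example numbers are illustrative.

```python
def profit(prices, edges):
    """Graph vertex pricing objective.

    Each edge (u, v, w) is a customer who wants both u and v and buys iff
    w >= p_u + p_v; a buying customer pays p_u + p_v.
    """
    total = 0
    for u, v, w in edges:
        cost = prices[u] + prices[v]
        if w >= cost:
            total += cost
    return total


# A small multigraph: ("a", "b") appears twice, as two different customers.
edges = [("a", "b", 10), ("a", "b", 5), ("b", "c", 15), ("a", "c", 30)]
print(profit({"a": 5, "b": 5, "c": 10}, edges))  # 40
```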
A Simple 2-Approx. in the Bipartite Case
• Same setting: a multigraph G with values $w_e$ on edges e; assign prices $p_v \ge 0$ on vertices so as to maximize total profit.
Algorithm
• Set prices in R to 0 and separately fix the price of each node in L (with R free, each node in L is a single-good pricing problem over its incident edges).
• Set prices in L to 0 and separately fix the price of each node in R.
• Take the best of both options.
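Proof: simple! Let $OPT_L$ and $OPT_R$ be the profit OPT collects from its prices on L and on R, so $OPT = OPT_L + OPT_R$. Every edge that buys under OPT has $w_e \ge p_u + p_v \ge p_u$, so with R free it still buys at OPT’s price on its L endpoint. Hence the first option earns at least $OPT_L$, the second at least $OPT_R$, and the better one at least $OPT/2$.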
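A minimal Python sketch of the bipartite algorithm, assuming each edge is a (u, v, $w_e$) triple with one endpoint on each side and that `side_of` maps every vertex to "L" or "R" (both conventions are illustrative). With one side free, each vertex on the other side faces the values of its incident edges and is priced like a single good. The example numbers are arbitrary.

```python
from collections import defaultdict


def best_single_price(values):
    """Single-good pricing: best price among the values, and its revenue."""
    price = max(set(values), key=lambda p: p * sum(w >= p for w in values))
    return price, price * sum(w >= price for w in values)


def price_one_side(side, edges, side_of):
    """Price the vertices on `side`; every vertex on the other side costs 0."""
    values = defaultdict(list)
    for u, v, w in edges:
        # With the other endpoint free, this edge buys iff w >= its priced endpoint.
        values[u if side_of[u] == side else v].append(w)
    prices, total = {}, 0
    for x, ws in values.items():
        prices[x], revenue = best_single_price(ws)
        total += revenue
    return prices, total


def bipartite_two_approx(edges, side_of):
    """Return the better of (price L, R free) and (price R, L free)."""
    return max(price_one_side("L", edges, side_of),
               price_one_side("R", edges, side_of),
               key=lambda result: result[1])


side_of = {"a": "L", "b": "L", "x": "R", "y": "R"}
edges = [("a", "x", 15), ("a", "y", 25), ("b", "x", 35), ("b", "y", 15),
         ("a", "x", 25), ("b", "y", 5), ("b", "x", 40)]
print(bipartite_two_approx(edges, side_of))  # ({'a': 25, 'b': 35}, 120)
```

Here pricing L earns 120 and pricing R earns 105, so the L option is returned.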
A 4-Approx. for Graph Vertex Pricing
• Same setting: a multigraph G with values $w_e$ on edges e; assign prices $p_v \ge 0$ on vertices to maximize total profit.
Algorithm
• Randomly partition the vertices into two sets L and R.
• Ignore the edges whose endpoints are on the same side
and run the alg. for the bipartite case.
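Proof: simple! Each edge has its endpoints on different sides with probability 1/2, so in expectation half of OPT’s profit is from edges with one endpoint in L and one endpoint in R. The bipartite algorithm recovers at least half of that, giving at least $OPT/4$ in expectation.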
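A sketch of the randomized reduction in Python, under the same assumed (u, v, $w_e$) edge format; the bipartite step is repeated inline so the block stands alone, and the names are illustrative. The returned profit is computed over all edges with unpriced vertices at 0, so same-side edges can only add to what the crossing edges earn.

```python
import random
from collections import defaultdict


def best_single_price(values):
    price = max(set(values), key=lambda p: p * sum(w >= p for w in values))
    return price, price * sum(w >= price for w in values)


def price_one_side(side, edges, side_of):
    """Bipartite step: price vertices on `side`, the other side is free."""
    values = defaultdict(list)
    for u, v, w in edges:
        values[u if side_of[u] == side else v].append(w)
    prices = {x: best_single_price(ws)[0] for x, ws in values.items()}
    return prices, sum(best_single_price(ws)[1] for ws in values.values())


def profit(prices, edges):
    """Profit over all edges; vertices without a price cost 0."""
    total = 0
    for u, v, w in edges:
        cost = prices.get(u, 0) + prices.get(v, 0)
        if w >= cost:
            total += cost
    return total


def random_partition_pricing(vertices, edges, rng=random):
    """Random L/R split, keep only crossing edges, run the bipartite 2-approx."""
    side_of = {x: rng.choice("LR") for x in vertices}
    crossing = [e for e in edges if side_of[e[0]] != side_of[e[1]]]
    prices, _ = max(price_one_side("L", crossing, side_of),
                    price_one_side("R", crossing, side_of),
                    key=lambda result: result[1])
    return prices, profit(prices, edges)


edges = [("a", "b", 10), ("b", "c", 15), ("a", "c", 30), ("c", "d", 5), ("a", "b", 5)]
print(random_partition_pricing("abcd", edges, random.Random(0)))
```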
Algorithmic Pricing, Single-minded Bidders, k-hypergraph Problem
What about lists of size ≤ k?
Algorithm
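– Put each node in L with probability 1/k, in R with probability 1 − 1/k.
– Let GOOD = set of edges with exactly one endpoint in L. Set prices in R to 0 and optimize L w.r.t. GOOD.
• Let $OPT_{j,e}$ be the revenue OPT makes selling item j to customer e. Let $X_{j,e}$ be the indicator RV for $j \in L$ & $e \in$ GOOD.
• If e buys under OPT, then OPT’s price on j is at most $w_e$, so with R free e still buys j at that price; hence our profit is at least $\sum_{j,e} OPT_{j,e} X_{j,e}$, and our expected profit is at least (with $|e| \le k$ the size of e’s list):
$\sum_{j,e} OPT_{j,e} \Pr[X_{j,e}=1] = \sum_{j,e} OPT_{j,e} \cdot \frac{1}{k}\left(1-\frac{1}{k}\right)^{|e|-1} \ge \frac{1}{k}\left(1-\frac{1}{k}\right)^{k-1} OPT \ge \frac{OPT}{ek}$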
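A minimal Python sketch of the k-hypergraph algorithm, assuming customers are given as (set of items, $w$) pairs with at most k items each; the function names are illustrative. Non-GOOD customers are ignored when choosing prices, but `profit` charges every customer, and they can only add revenue (customers with no item in L pay 0).

```python
import random
from collections import defaultdict


def k_hypergraph_pricing(items, customers, k, rng=random):
    """customers: list of (set_of_items, w) pairs, each set of size <= k."""
    # Each item goes to L with probability 1/k, otherwise to R (price 0).
    in_L = {j: rng.random() < 1 / k for j in items}
    # GOOD customers want exactly one item in L; group their values by that item.
    values = defaultdict(list)
    for bundle, w in customers:
        in_bundle = [j for j in bundle if in_L[j]]
        if len(in_bundle) == 1:
            values[in_bundle[0]].append(w)
    # R is free, so each L item is priced as a single good over its GOOD customers.
    return {j: max(set(ws), key=lambda p: p * sum(w >= p for w in ws))
            for j, ws in values.items()}


def profit(prices, customers):
    """Total profit over all customers; items without a price cost 0."""
    total = 0
    for bundle, w in customers:
        cost = sum(prices.get(j, 0) for j in bundle)
        if w >= cost:
            total += cost
    return total


items = ["coffee", "cups", "sugar", "apples"]
customers = [({"coffee", "cups"}, 10), ({"coffee", "sugar", "apples"}, 12),
             ({"sugar"}, 4), ({"cups", "apples"}, 7)]
prices = k_hypergraph_pricing(items, customers, k=3, rng=random.Random(0))
print(prices, profit(prices, customers))
```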
On-line Pricing
Customers arrive one at a time and buy or don’t buy at the current prices.
In the (full-information) auction model, we know the valuation info for customers 1, …, i−1 when customer i arrives.
In the posted-price model, we only know who bought what for how much.
Goal: do well compared to the best fixed set of item prices.
Our O(k)-approx. alg. can be naturally adapted to the online setting by using results of [BH’05] and [BKRW’03] for the online digital goods auction.
We can run separate online auctions over the items in L and the customers in GOOD (customers who want exactly one item in L).
Guarantee: perform comparably to the best fixed set of item prices (for items in L, customers in GOOD).
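A sketch of this online adaptation in Python. It assumes the full-information auction model (each customer's value is revealed after they are served) and, as a stand-in for the [BH’05] auction, runs a separate RWM price learner for each item in L; it is not the [BH’05] mechanism, and the class and function names are hypothetical. Customers outside GOOD contribute nothing in this sketch.

```python
import random
from collections import defaultdict


class RWMPricer:
    """Stand-in online single-good pricer: RWM over prices (1 + eps)**i in [1, h]."""

    def __init__(self, h, eps, rng):
        self.prices, p = [], 1.0
        while p <= h:
            self.prices.append(p)
            p *= 1 + eps
        self.weights = [1.0] * len(self.prices)
        self.h, self.eps, self.rng = h, eps, rng

    def post_price(self):
        return self.rng.choices(self.prices, weights=self.weights)[0]

    def update(self, w):
        # Full information: every candidate price's gain on this customer is known.
        self.weights = [x * (1 + self.eps) ** ((p if w >= p else 0.0) / self.h)
                        for x, p in zip(self.weights, self.prices)]
        top = max(self.weights)
        self.weights = [x / top for x in self.weights]


def online_item_pricing(customers, items, k, h, eps, rng=random):
    """Customers arrive one at a time as (bundle, w); returns total revenue."""
    in_L = {j: rng.random() < 1 / k for j in items}   # fixed once, before arrivals
    pricers = defaultdict(lambda: RWMPricer(h, eps, rng))
    revenue = 0.0
    for bundle, w in customers:
        chosen = [j for j in bundle if in_L[j]]
        if len(chosen) != 1:
            continue  # not in GOOD: this sketch collects nothing from them
        pricer = pricers[chosen[0]]  # items in R are free, so only this price matters
        price = pricer.post_price()
        if w >= price:
            revenue += price
        pricer.update(w)  # w is revealed after the customer is served
    return revenue


rng = random.Random(0)
stream = [({"sugar"}, rng.uniform(1, 10)) for _ in range(1000)]
print(online_item_pricing(stream, ["sugar", "coffee"], k=2, h=10, eps=0.1, rng=rng))
```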
Let $OPT_i$ be the best profit achievable (from item i) using a fixed price for item i, from the customers in GOOD whose bundle contains item i.
Using the [BH’05] auction, the expected profit of the online auction for item i is:
Overall, we achieve profit at least:
Profit of the offline approx. alg.