Learning Theory and Algorithms for Revenue Optimization in Second-Price Auctions with Reserve
Andrés Muñoz Medina
Joint work with Mehryar Mohri
Courant Institute of Mathematical Sciences
Friday, October 10, 2014

Second price auctions

AdExchanges

Reserve prices

Motivation
✦ Online advertising is a billion-dollar industry.
✦ The revenue of online publishers is directly related to AdExchanges.
✦ Many ads in an AdExchange are sold at very low prices or not sold at all.
✦ Current pricing techniques are naïve.
✦ The number of available transaction logs keeps increasing.

Related work
✦ Game-theoretic results and incentive-compatible mechanisms. [Riley and Samuelson '81; Milgrom and Weber '82; Nisan et al. '07; Blum et al. '03; Balcan et al. '07]
✦ Estimation of empirical probabilities. [Cui et al. '11; Ostrovsky and Schwarz '11]
✦ Bandit algorithms with censored information. [Cesa-Bianchi, Gentile and Mansour '12]

Outline
✦ Definitions
✦ Learning guarantees
✦ Algorithms
‣ No features
‣ General case
✦ Experimental results

Definitions
✦ The auction's revenue depends on the highest bids. For a bid pair b = (b^(1), b^(2)) and reserve price r, the revenue is b^(2) if r ≤ b^(2), r if b^(2) < r ≤ b^(1), and 0 if r > b^(1).
✦ Equivalent loss defined by L(r, b) = −b^(2) 1_{r ≤ b^(2)} − r 1_{b^(2) < r ≤ b^(1)}.

Machine Learning formulation
✦ x ∈ X: user information.
✦ B ⊂ R^2: bid space.
✦ H = {h : X → R}: hypothesis set.
✦ D: distribution over X × B.
✦ Generalization error: E[L(h(x), b)].
Generalization bounds?

Loss function
✦ Non-differentiable.
✦ Non-convex.
✦ Discontinuous.

Learning Guarantees
✦ Let (x_1, b_1), …, (x_n, b_n) be a training sample. If d denotes the pseudo-dimension of H, then with high probability, for every h ∈ H:
    E[L(h(x), b)] ≤ (1/n) Σ_{i=1}^n L(h(x_i), b_i) + O(√(d/n))
✦ How can we effectively minimize this loss?

Algorithms

No feature case
✦ Find the optimal reserve price:
    min_{r ∈ R} Σ_{i=1}^n L(r, b_i)
✦ The optimal reserve is one of the highest bids.
✦ Naïve search: O(n^2).
✦ With sorting: O(n log n).

Algorithm idea
[Plot: the empirical loss r ↦ Σ_i L(r, b_i) for two bid pairs, a piecewise-linear function of the reserve with breakpoints at the bids and values between −(b_1^(2) + b_2^(2)) and 0.]

Surrogate Loss
✦ Theorem: there exists no consistent convex surrogate loss that is not constant.

Continuous surrogate
[Plot: the continuous surrogate loss L_γ.]

Consistency results
✦ Define the continuous surrogate loss L_γ as above.
✦ Theorem: let M = sup_{b ∈ B} b^(1) and let H be a Banach space. If h*_γ = argmin_{h ∈ H} E[L_γ(h(x), b)], then
    E[L(h*_γ(x), b)] ≤ E[L_γ(h*_γ(x), b)] + γM

Learning Guarantees
✦ Let (x_1, b_1), …, (x_n, b_n) be a training sample and M = sup_{b ∈ B} b^(1). With high probability, for every h ∈ H:
    E[L_γ(h(x), b)] ≤ (1/n) Σ_{i=1}^n L_γ(h(x_i), b_i) + (2/γ) ℜ_n(H) + M √(log(1/δ) / (2n))
  where ℜ_n(H) denotes the Rademacher complexity of H.

Optimization via DC-programming
✦ L_γ is a difference of convex functions.

DC-programming algorithm [Tao and Hoai '97]
✦ Idea: sequentially minimize a convex upper bound on the objective.
✦ If F(w) = f(w) − g(w) with f and g convex, the following iterates converge to a local minimum:
    w_{t+1} = argmin_w f(w) − g(w_t) − ∇g(w_t) · (w − w_t)
✦ Matches the CCCP algorithm. [Yuille and Rangarajan '02]

Line-search
✦ The function L_γ is positive homogeneous: L_γ(tr, tb) = t L_γ(r, b) for all t > 0.
✦ Fix a direction w_0 and rewrite
    Σ_{i=1}^n L_γ(λ w_0·x_i, b_i) = Σ_{i=1}^n (w_0·x_i) L_γ(λ, b_i / (w_0·x_i))
✦ Equivalent to the no-feature minimization (see the sketches below).
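The one-dimensional problem in the line search is exactly the no-feature minimization from the "No feature case" slide. The following Python sketch (illustrative code, not from the talk) shows the O(n log n) search over candidate reserves, assuming each training example provides the two highest bids (b^(1), b^(2)):

from bisect import bisect_left

def best_reserve(bids):
    """No-feature case: reserve price maximizing empirical revenue.

    `bids` is a list of pairs (b1, b2) with b1 >= b2, the two highest bids of
    each auction.  At reserve r an auction yields b2 if r <= b2, r if
    b2 < r <= b1, and 0 if r > b1.  Since an optimal reserve lies among the
    highest bids, it suffices to evaluate the empirical revenue at each b1;
    sorting, suffix sums and binary search give O(n log n) instead of the
    naive O(n^2) scan.
    """
    n = len(bids)
    b1s = sorted(b for b, _ in bids)
    b2s = sorted(b for _, b in bids)

    # suffix_b2[k] = sum of b2s[k:], the total payment of the auctions whose
    # second bid is at least b2s[k].
    suffix_b2 = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        suffix_b2[k] = suffix_b2[k + 1] + b2s[k]

    def revenue(r):
        k1 = bisect_left(b1s, r)              # auctions with b1 < r go unsold
        k2 = bisect_left(b2s, r)              # auctions with b2 >= r pay b2
        return suffix_b2[k2] + r * (k2 - k1)  # k2 - k1 auctions pay the reserve

    return max([0.0] + b1s, key=revenue)

For example, best_reserve([(10, 4), (8, 7), (3, 1)]) evaluates the empirical revenue at reserves 0, 3, 8 and 10 and returns 8 (revenue 8 + 8 + 0 = 16), whereas selling at the second price alone (reserve 0) yields only 4 + 7 + 1 = 12.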
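The outer iterate on the "DC-programming algorithm" slide can be sketched generically as follows (again illustrative code, not from the talk). Here f and g stand for any convex decomposition of the objective, for instance the two convex pieces of the empirical L_γ loss, and scipy's general-purpose minimizer stands in for whichever convex solver is used for the inner step.

import numpy as np
from scipy.optimize import minimize

def dca(f, g, grad_g, w0, max_iter=50, tol=1e-6):
    """DC-programming (DCA / CCCP) sketch for minimizing F(w) = f(w) - g(w),
    with f and g convex and grad_g a (sub)gradient of g.

    Each step linearizes g at the current iterate, which by convexity of g
    gives a convex upper bound on F; minimizing that bound yields the next
    iterate.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        gw = np.asarray(grad_g(w))

        def upper_bound(v):
            # f(v) - [g(w) + grad_g(w) . (v - w)]  >=  f(v) - g(v)
            return f(v) - (g(w) + gw.dot(np.asarray(v) - w))

        # Any convex solver works for this step; scipy's general-purpose
        # minimizer just keeps the sketch short.
        w_next = minimize(upper_bound, w).x
        if np.linalg.norm(w_next - w) <= tol:
            return w_next
        w = w_next
    return w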
Algorithms
✦ No Features
✦ Regression
✦ Convex surrogate
✦ DC

Experimental results
[Plots: Improvement (%) as a function of the training sample size (200–2400); one panel compares DC with Convex, the other compares DC with Ridge.]

Experimental results
[Plot: revenue as a function of the training sample size (200–3200) for DC, Convex, and No Features.]

Realistic data
Surrogate   No Feat   DC      Lowest   Highest
31.73       29.58     37.19   29.53    52.85

Conclusion
✦ Machine learning is crucial for revenue optimization.
✦ Extension to GSP auctions.
✦ A better DC algorithm?
✦ Better initialization.