A Primal-Dual Approach to Online Optimization Problems Seffi Naor Computer Science Dept. Technion, Haifa, Israel Based on joint papers with Nikhil Bansal, Niv Buchbinder, and Kamal Jain Road Map Introducing the framework: • Ski rental • Online set cover • Virtual circuit routing The general framework: • General covering/packing linear programs More applications: • The ad-auctions problem • Weighted caching Further research What is an Online Algorithm? • Input is given “in pieces” over time, each piece is called a “request” • Request sequence: ¾ = ¾1, … , ¾n • Upon arrival of request ¾i : • Online algorithm has to serve the request • Previous decisions for requests ¾1, …, ¾i-1 cannot be changed Performance Evaluation: Competitive Factor • How to evaluate performance of online algorithm A? • For every request sequence ¾ = ¾1, … , ¾n: compare cost of A to the cost of an optimal offline algorithm which is given the request sequence in advance • Competitive factor of online algorithm A is ® if for every request sequence ¾ = ¾1, … , ¾n: Linear Programming Linear constraints, linear objective Min x1 – 3 x2 + 2 x3 x1 – x2 + 7 x3 <= 2 x2 - x3 >= 0 x1 + 3 x2 + x3 >= -3 Polynomial time solvable. Lies at the heart of optimization. The Ski Rental Problem • Buying costs $B. • Renting costs $1 per day. Problem: • Number of ski days is not known in advance. Goal: Minimize the total cost. Ski Rental – Integer Program 1 - B uy 1 - R ent on day i zi x 0 - D on't rent on day i 0 D on't B uy k m in B x z i i 1 Subject to: For each day i: x z i 1 x , z i {0,1} Ski Rental – Relaxation P: Primal Covering D: Dual Packing k m in B x z k i i 1 For each day i: x z i 1 x , zi 0 m ax y i i 1 For each day i: y i 1 k yi B i 1 Online setting: • Primal: New constraints arrive one by one. • Requirement: Upon arrival, constraints should be satisfied. • Monotonicity: Variables can only be increased. Ski Rental – Algorithm P: Primal Covering D: Dual Packing k m in B x z k m ax y i i i 1 For each day i: x z i 1 i 1 For each day i: y i 1 x , zi 0 k i 1 Initially x 0 Each new day (new constraint): if x<1: zi 1-x x x(1+ 1/B) + 1/(c*B) yi 1 - ‘c’ later. yi B Analysis of Online Algorithm Proof of competitive factor: 1. Primal solution is feasible. 2. In each iteration, ΔP ≤ (1+ 1/c)ΔD. 3. Dual is feasible. Conclusion: Algorithm is (1+ 1/c)-competitive Initially x 0 Each new day (new constraint): if x<1: zi 1-x x x(1+ 1/B) + 1/(c*B) yi 1 - ‘c’ later. Analysis of Online Algorithm 1. Primal solution is feasible. If x ≥1 the solution is feasible. Otherwise set: zi 1-x. 2. In each iteration, ΔP ≤ (1+ 1/c)ΔD: Algorithm: If x≥1, ΔP =ΔD=0 When new constraint Otherwise: arrives, if x<1: zi1-x • Change in dual: 1 x x(1+ 1/B) + 1/c*B • Change in primal: y 1 BΔx + zi = x+ 1/c+ 1-x = 1+1/c i Analysis of Online Algorithm 3. Dual is feasible: k Need to prove: y i B i 1 We prove that after B days x≥1 x is a sum of geometric sequence a1 = 1/(cB), q = 1+1/B B 1 1 1 1 B x 1 cB 1 1 B B 1 1 1 B c Algorithm: When new constraint arrives, if x<1: zi1-x x x(1+ 1/B) + 1/c*B yi1 B 1 c 1 1 e 1 B 1 e 1 c e 1 Randomized Algorithm 0 1 X: X1 X2 X3 X4 • Choose d uniformly in [0,1] • Buy on the day corresponding to the “bin” d falls in • Rent up to that day Analysis: • Probability of buying on the i-th day is xi • Probability of renting on the i-th day is at most zi Going Beyond the Ski Problem • Ski problem: coefficients in the constraint matrix belong to {0,1} • What can be said about general constraint matrices with coefficients from {0,1}? The Online Set-Cover Problem • Elements: e1, e2, …, en • Set system: s1, s2, … sm • Costs: c(s1), c(s2), … c(sm) Online Setting: • Elements arrive one by one. • Upon arrival elements need to be covered. • Sets that are chosen cannot be “unchosen”. Goal: Minimize the cost of the chosen sets. Online Set-Cover: Lower Bound • elements: 1, … ,n • sets: all (n choose n) subsets of cardinality n • cost: unit cost Adversary’s strategy: • While possible: pick an element that is not covered (# of elements offered ≥ n) Competitive ratio: n (cost of online: = n, cost of OPT = 1) But, … n ≈ log(n choose n). polylog(m,n) is not ruled out! Set Cover – Linear Program P: Primal Covering D: Dual Packing m ax y ( e ) m in c ( s ) x ( s ) e E s S e E s e s x(s) 1 s S y (e) c ( s ) e s Online setting: • Primal: constraints arrive one by one. • Requirement: each constraint is satisfied. • Monotonicity: variables can only be increased. Set Cover – Algorithm P: Primal Covering D: Dual Packing m ax y ( e ) m in c ( s ) x ( s ) e E s S e E x(s) 1 s S y (e) c ( s ) e s s e s Initially x(s) 0 When new element arrives, while x ( s ) 1 : • y(e) y(e)+1 s e s • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Analysis of Online Algorithm Proof of competitive factor: 1. Primal solution is feasible. 2. In each iteration, ΔP ≤ 2ΔD. 3. Dual is (almost) feasible. Conclusion: We will see later. Initially x(S) 0 When new element e arrives, while s e s x(s) 1 : • y(e) y(e)+1 • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Analysis of Online Algorithm 1. Primal solution is feasible. We increase the primal variables until the constraint is feasible. Initially x(S) 0 When new element e arrives, while s e s x(s) 1 : • y(e) y(e)+1 • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Analysis of Online Algorithm 2. In each iteration, ΔP ≤ 2ΔD. In each iteration: • ΔD = 1 x(s) 1 P = c(s) x ( s ) c(s) c ( s ) m c ( s ) s|es s|es s|e s x (s) 1/m 2 s|e s Initially x(S) 0 When new element e arrives, while s e s x(s) 1 : • y(e) y(e)+1 • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Analysis of Online Algorithm 3. Dual is (almost) feasible: • We prove that: s S , y(e) c(s)O (log m ) e S • If y(e) increases, then x(s) increases (for e in S). • x(s) is a sum of a geometric series: a1 = 1/[mc(s)], q = (1+ 1/c(s)) Initially x(S) 0 When new element e arrives, while s e s x(s) 1 : • y(e) y(e)+1 • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Analysis of Online Algorithm After c(s)O(log m) rounds: x(s) 1 1 1 / c ( s ) c ( s ) O (log m ) 1 1 1 / c ( s ) 1 m c(s) 1 1 / c ( s ) c ( s ) O (log m ) 1 1 m We never increase a variable x(s)>1! Initially x(S) 0 When new element e arrives, while s e s x(s) 1 : • y(e) y(e)+1 • . s e s x ( s ) x ( s ) 1 1 / c ( s ) 1 / m c ( s ) Conclusion • The dual is feasible with cost 1/O(log m) of the primal. The algorithm produces a fractional set cover that is O(log m)-competitive. • Remark: No online algorithm can perform better in general. What about an integral solution? • Round fractional solution. (With O(log n) amplification.) • Can be done deterministically online [AAABN03]. • Competitive ratio is O(log m log n). Online Virtual Circuit Routing Network graph G=(V, E) capacity function u: E Z+ Requests: ri = (si, ti) • Problem: Connect si to ti by a path, or reject the request. • Reserve one unit of bandwidth along the path. • No re-routing is allowed. • Load: ratio between reserved edge bandwidth and edge capacity. • Goal: Maximize the total throughput. Routing – Linear Program y ( ri , p ) P ( ri ) = Amount of bandwidth allocated for ri on path p - Available paths to serve request ri m ax ri y ( ri , p ) p P ( ri ) s.t: For each ri: y ( ri , p ) 1 p P ( ri ) For each edge e: ri p P ( ri ) e p y ( ri , p ) u ( e ) Routing – Linear Program P: Primal Covering m in u ( e ) x ( e ) e E D: Dual Packing z (r ) m ax i ri ri ri , p P ( ri ) : ri x(e) z ( r ) 1 i e p y ( ri , p ) p P ( ri ) y ( ri , p ) 1 p P ( ri ) e : ri y ( ri , p ) u ( e ) p P ( ri ) e p Online setting: • Dual: new columns arrive one by one. • Requirement: each dual constraint is satisfied. • Monotonicity: variables can only be increased. Routing – Algorithm P: Primal Covering m in u ( e ) x ( e ) e E D: Dual Packing z (r ) m ax i ri ri ri , p P ( ri ) : ri x(e) z ( r ) 1 i e p y ( ri , p ) p P ( ri ) y ( ri , p ) 1 p P ( ri ) e : ri y ( ri , p ) u ( e ) p P ( ri ) e p Initially x(e) 0 When new request arrives, if p P ( ri ), x(e) 1 : e p z(ri) 1 1 1 . e p : x ( e ) x ( e ) 1 u (e ) n u (e ) y(ri,p) 1 Analysis of Online Algorithm Proof of competitive factor: 1. Primal solution is feasible. 2. In each iteration, ΔP ≤ 3ΔD. 3. Dual is (almost) feasible. Conclusion: We will see later. Initially x(e) 0 When new request arrives, if p P ( ri ), x(e) 1 : e p z(ri) 1 1 1 .e p : x ( e ) x ( e ) 1 u (e ) n u (e ) y(ri,p) 1 Analysis of Online Algorithm 1. Primal solution is feasible. If p P ( r ), x(e) 1 : the solution is feasible. i e p Otherwise: we update z(ri) 1 Initially x(e) 0 When new request arrives, if p P ( ri ), x(e) 1 : e p z(ri) 1 1 1 . e p : x ( e ) x ( e ) 1 u (e ) n u (e ) y(ri,p) 1 Analysis of Online Algorithm In each iteration: ΔP ≤ 3ΔD. If p P ( ri ) : x(e) 1 ΔP = ΔD=0 e p Otherwise: ΔD=1 2. P e p u ( e ) x ( e ) z ( ri ) x (e) 1 u (e) 1 3 e p u (e) n u (e) Initially x(e) 0 When new request arrives, if p P ( ri ), x(e) 1 : e p z(ri) 1 1 1 . e p : x ( e ) x ( e ) 1 u (e ) n u (e ) y(ri,p) 1 Analysis of Online Algorithm 3. Dual is (almost) feasible. We prove: • For each e, after routing u(e)O(log n) on e, x(e)≥1 x(e) is a sum of a geometric sequence x(e)1 = 1/(nu(e)), q = 1+1/u(e) After u(e)O(log n) requests: u ( e ) O (log n ) 1 1 1 u (e) 1 x (e) n u (e ) 1 1 1 u (e) 1 1 u ( e ) u ( e ) O (log n ) 1 1 n D: Dual Packing m n min cx i i i 1 a i 1 b j yj j 1 m n j, 1 j m max ij xi b j i, 1 i n a ij y j ci j1 P* = D* • ci, aij, bj and the variables xi are all non-negative • Many approximation/exact algorithms are actually primal-dual: shortest path, matching, flows, cuts .. Dual solutions P: Primal Covering Primal solutions Covering/Packing Linear Programs Key Idea for Online Primal-Dual Primal: Min i ci xi Dual: Max t bt yt Step t, new constraint: a1x1 + a2x2 + … + ajxj ≥ bt New variable yt + bt yt in dual objective xi (1+ ai/ci) xi yt yt + 1 (mult. update) (additive update) primal cost = = Dual Cost Online Primal-Dual Approach • Can the offline problem be cast as a linear covering/packing program? • Can the online process be described as: – New rows appearing in a covering LP? – New columns appearing in a packing LP? Yes ?? • Upon arrival of a new request: – Update primal variables in a multiplicative way. – Update dual variables in an additive way. Online Primal Dual Approach Next Prove: 1. Primal solution is feasible (or nearly feasible). 2. In each round, ΔP ≤ c ΔD. 3. Dual is feasible (or nearly feasible). Got a fractional solution, but need an integral solution ?? • • Randomized rounding techniques might work. Sometimes, even derandomization (e.g., method of conditional probabilities) can be applied online! Simple Lower Bound min x1 + x2 + … + xn x1 + x2 + x3 + … + xn ≥ 1 x2 + x3 + … + xn ≥ 1 x3 + … + xn ≥ 1 … xn ≥ 1 Set all xi to 1/n Increase x2 ,x3,…,xn to 1/n-1 … Increase xn to 1 Online = H(n) ≈ lnn (1+1/2+ 1/3+ … + 1/n) Opt = 1 ( xn=1 suffices) General Covering/Packing Results {0,1} covering/packing matrix: • [Buchbinder Naor 05] Competitive ratio O(log D) (D: max number of non-zero entries in a constraint). Remarks: • • • Fractional solutions randomized algorithm (online rounding). Number of constraints/variables can be exponential. Possible tradeoff: Competitive ratio/violation of constraints. General Covering/Packing Results For a general covering/packing matrix [BN05] : Covering: • Competitive ratio O(log n) (n – number of variables). Packing: • Competitive ratio O(log n + log [a(max)/a(min)]) a(max), a(min) – max/min non-zero entry Remarks: • Results are tight. • Can add “box” constraints to covering LP (e.g. x≤1) Online Primal-Dual Approach Advantages: 1. Generic ideas and algorithms applicable to many online problems. 2. Linear Program helps detecting the difficulties of the online problem. 3. General recipe for the design and analysis of online algorithms. 4. No potential function appearing “out of nowhere”. 5. Competitiveness with respect to a fractional optimal solution. Known Results via P-D Approach Covering Online Problems (Minimization): • O(log k)-algorithm for weighted caching [BBN07] • Ski rental, Dynamic TCP Acknowledgement • Parking Permit Problem [Meyerson 05] • • Online Set Cover [AAABN03] Online Graph Covering Problems [AAABN04]: – – – – Non-metric facility location Generalized connectivity: pairs arrive online Group Steiner: groups arrive online Online multi-cut: (s,t)--pairs arrive online Known Results via P-D Approach Packing Online Problems (maximization): • Online Routing/Load Balancing Problems [AAP93, AAPFW93, BN06]. • • • General Packing/routing e.g. Multicast trees. Online Matching [KVV91] – Nodes arrive one-by-one. Ad-Auctions Problem [MSVV05] – In a bit … What are Ad-Auctions? You type in a search engine: Vacation Eilat You get: And … Advertisements Algorithmic Search results How do search engines sell ads? • Each advertiser: – Sets a daily budget – Provides bids on interesting keywords • Search Engine (on each keyword): – Selects ads – Advertiser pays bid if user clicks on ad. Goal (of Search engine): Maximize revenue Mathematical Model • Buyer i: – Has a daily budget B(i) • Online Setting: – Items (keywords) arrive one-by-one. – Each buyer gives a bid on each of the items (can be zero) • Algorithm: – Assigns each item to some interested buyer. Assumption: Bids are small compared to the daily budget. Results [MSVV FOCS 05]: • (1-1/e)-competitive online algorithm. • Bound is tight. • Analysis uses tradeoff revealing family of LP’s - not very intuitive. Our Results [Buchbinder, Jain, N, 2007]: • A different approach based on the primal-dual method. • Techniques are also applicable to further extensions of the problem. Ad-Auctions – Linear Program B(i) – Budget of buyer i b(i,j) – bid of buyer i on item j I - Set of buyers. J - Set of items. y (i , j ) = 1 j-th ad-auction is sold to buyer i. m ax i I s.t: For each item j: j J y (i , j ) 1 i I For each buyer i: b (i, j ) y (i, j ) Each item is Buyers do not sold once. exceed their budget b (i , j ) y (i , j ) B (i ) j J Ad-auctions Primal and Dual P: Primal Covering m in B ( i ) x ( i ) i I z( j) j J For each item j and buyer i: b ( i , j ) x ( i ) z ( j ) b ( i , j ) D: Dual Packing m ax i I For each item j: b (i, j ) y (i, j ) j J y (i , j ) 1 i I For each buyer i: b (i , j ) y (i , j ) B (i ) j J The Primal-Dual Algorithm • Initially: for each buyer i: x(i) 0 • When new item j arrives: • Assign the item to the buyer i that maximizes: b ( i , j ) 1 x ( i ) • if x(i)≥1 do nothing, otherwise: y (i , j ) 1 z ( j ) b ( i , j ) 1 x ( i ) b (i , j ) b (i , j ) x ( i ) x ( i ) 1 B ( i ) B ( i ) c 1 - ‘c’ later. The Paging/Caching Problem Universe of of n pages Cache of size k ¿ n Request sequence of pages: 1, 6, 4, 1, 4, 7, 6, 1, 3, … If requested page is in cache: no penalty. Else, cache miss! load requested page into cache, evicting some other page. Goal: minimize number of cache misses. Question: which page to evict in case of a cache miss? Previous Results: Paging Paging (Deterministic) [Sleator Tarjan 85]: • Any det. algorithm ≥ k-competitive. • LRU is k-competitive (also other algorithms) Paging (Randomized): • Rand. Marking O(log k) [Fiat, Karp, Luby, McGeoch, Sleator, Young 91]. • Lower bound Hk [Fiat et al. 91], tight results known. The Weighted Paging Problem One small change: • Each page i has a different load cost w(i). • Models scenarios in which the cost of bringing pages is not uniform: Main memory, disk, internet … web Goal • Minimize the total cost of cache misses. Weighted Paging (Previous Work) Randomized Deterministic Paging Lower bound k LRU k competitive O(log k) Randomized Marking [Fiat et al. 91] Weighted Paging k-competitive [Chrobak, Karloff, Payne, Vishwanathan 91] [Young 94] using duality! O(log k) for two weight classes [Irani 02] No o(k) algorithm known even for 3 weight classes. The k-server Problem • k servers (fire trucks) lie in an n-point metric space. • Requests arrive at points. • Serving a request: move a server to request point. Goal: Minimize total distance traveled by servers. Example: The k-server Problem • Paging = k-server on a uniform metric – every page is a point – A page is in the cache iff a server is at the point • Weighted paging = k-server on a weighted star metric Deterministic Results: • General metric spaces: (2k-1)-competitive work function algorithm [Koutsoupias-Papadimitriou 95] • Tree metric: k-competitive algorithm [Chrobak et al. 91] Randomized Results: • No o(k) algorithm known (even for very simple spaces). • Best lower bound: (log k) Fractional Weighted Paging Model: • Fractions of pages are kept in cache: probability distribution over pages p1,…,pn • The total sum of fractions of pages in the cache is at most k. • If pi changes by , cost = w(i) k units of cache Randomized Algorithm [Bansal, Buchbinder, N., FOCS 07] - O(log k)-competitive for weighted paging High level idea: 1. Design a primal-dual O(log k)-competitive algorithm for fractional weighted paging. 2. Obtain a randomized algorithm while losing only a constant factor. Setting up the Linear Program time line Page i Page j Page p Page i Page j t B(t): Set of pages requested until time t (including pt) We can only keep k pages out of the B(t) pages Evict ≥ [|B(t)| - 1 - (k- 1)] = [|B(t)|-k] pages from B(t)\{pt} Weighted paging – Linear Program n m in i 1 t i B (t)\{p t } r ( i ,t ) w (i ) x (i , j ) j 1 x ( i , r ( i , t )) B ( t ) k 0 x (i, j ) 1 • Idea: charge for evicting pages instead of fetching pages x(i,j) – indicator for the event that page i is evicted from the cache between the j-th and (j+1)-st times it is requested r(i,t) - number of times page i is requested till time t, including t Primal and Dual Programs P: Primal Covering n m in i 1 t r ( i ,t ) w (i ) x (i , j ) j 1 x ( i , r ( i , t )) B ( t ) k i B (t)\{p t } 0 x (i, j ) 1 D: Dual Packing m ax B (t ) k y (t ) t n r ( i ,t ) i 1 z (i , j ) j 1 For each page i and the j th time it is asked: t ( i , j 1) 1 y (t ) z (i , j ) w (i ) t t ( i , j ) 1 Sketch of Primal-Dual algorithm New primal constraint: i B (t)\{p t } Dual variable y(t) & constraint x ( i , r ( i , t )) B ( t ) k t ( i , j 1) 1 y (t ) z (i , j ) w (i ) t t ( i , j ) 1 While primal constraint infeasible: Increase y(t) When dual constraint tight, x(i,j) = 1/k From then on, increase exponentially (until x(i,j)=1) The growth function of x(i,j) x (i , j ) 1 1/k 0 Dual is tight Page fully in memory (marked) Dual violated by O(log k) Page is “unmarked” Corresponding Dual constraint Page fully evicted Analysis of Online Algorithm Proof of competitive factor: 1. Primal solution is feasible. 2. Primal ≤ 2 Dual. 3. Dual is feasible up to O(log k) factor Conclusion (weak duality): algorithm is O(log k)-competitive Generalized Caching Pages have both sizes and fetching costs. • Motivation: Web-Caching Special models: • Bit model: Fetching cost proportional to size (minimize traffic) • Fault model: Fetching cost is uniform (minimize number of times a user has to wait for a page) Results (Bansal, Buchbinder, N., 2008): • O(log k)-competitive algorithms for Bit and Fault models. • O(log2k)-competitive algorithm for the general model. • Requires new ideas and interesting analysis in the rounding phase. Further Research (1) • More applications. • Extending the general framework beyond packing/covering. Further Research (2) • Are these techniques strong enough to resolve the randomized k-server conjecture? Answer: maybe … but we don’t know how. • certainly looks like the right direction. Main Difficulty: • k-server = Online Min Cost Flow. • Might need to be able to solve more general LPs … Revisiting the Framework • • Positive objective function. New covering constraints arrive one-by-one. • Feasible region is not known in advance (gradually exposed as more covering constraints appear). When out of feasible region - should move back inside and pay movement cost. • Cost of “infinity” for staying outside the feasible region. Learning from expert advice Problem: • N experts (TV channels) trying to predict “rain/sun” every day. • Decide whether to take umbrella or sunglasses? • Goal: Predict (almost) as good as the best expert. Learning from expert advice Feasible region: distribution yt on n points (simplex) Each round: Cost vectors ct define cost in every point (right/wrong) Algorithm: can change the distribution yt-1 yt Payments: • Service cost: ct yt-1 (cost of staying in a point) • Movement cost: None. [BBN09]: Primal-dual algorithm: same guarantees as weighted majority. Metrical Task System (uniform) Feasible region: distribution yt on n points (simplex) Each round: cost vectors ct defines cost in every point. Algorithm: can change the distribution yt-1 yt Payments: • service cost: ct yt (cost of staying in a point) • movement cost: Earthmover distance metric between distributions. Metrical task system (MTS) [BBN09]: A new algorithm for MTS that for any 0<ɛ<1: 1. Pays service cost at most (1+ɛ)OPT 2. Pays movement cost at most log(n/ɛ)OPT • Known using learning algorithms [BBBT97]. • Leads to best known algorithm for general metrics [FM00]. • Very intuitive primal-dual algorithm. • Uses a non-covering LP. Commercial More about this approach: The Design of Competitive Online Algorithms via a Primal-Dual Approach By Niv Buchbinder and Seffi Naor Foundations and Trends® in Theoretical Computer Science Thank you