Dual fitting in online algorithms
Naveen Garg, IIT Delhi
Joint work with S. Anand, Syamantak Das, Anamitra Chowdhary and Amit Kumar (IIT Delhi)

What is dual fitting?
• A way to argue about the approximation/competitive ratio of an algorithm.
• In many cases the algorithm is greedy.
• A dual solution is built using the properties of the algorithm.
• Has been successful for problems in facility location, online allocation and online scheduling.

Plan of the tutorial
• Warmup: analyse set cover (offline) using dual fitting
• Dual fitting for online algorithms
– Scheduling with speed augmentation
– Scheduling with rejection

Set Cover
We are given n elements U = {1, 2, 3, …, n} and subsets S_1, S_2, S_3, …, S_m of U. A set cover is a collection of subsets which includes (covers) every element of U. We want to find a set cover of minimum size; this problem is NP-hard.

Greedy Algorithm
Pick the set which covers the largest number of uncovered elements. Repeat this step until all elements are covered.

An integer program for set cover
• x_S is an indicator variable: 1 if S is picked in the set cover and 0 otherwise.

min Σ_S x_S
s.t. ∀e: Σ_{S : e ∈ S} x_S ≥ 1   (every element e is included in at least one picked set)
     ∀S: x_S ∈ {0, 1}

An LP relaxation for set cover
• Replace the integrality constraint x_S ∈ {0, 1} by the linear constraint 0 ≤ x_S ≤ 1.

min Σ_S x_S
s.t. ∀e: Σ_{S : e ∈ S} x_S ≥ 1
     ∀S: 0 ≤ x_S ≤ 1

• The constraint x_S ≤ 1 is superfluous, so we drop it.

Primal and dual LPs
Primal: min Σ_S x_S   s.t.  ∀e: Σ_{S : e ∈ S} x_S ≥ 1,   ∀S: x_S ≥ 0
Dual:   max Σ_{e ∈ U} y_e   s.t.  ∀S: Σ_{e ∈ S} y_e ≤ 1,   ∀e: y_e ≥ 0

Analyzing GREEDY
Construct a dual solution from the run of GREEDY. By weak duality its value is at most the LP optimum, which in turn is at most the size of the optimal set cover. Then show that the dual value and the number of sets picked by GREEDY are close to each other.

Building the dual solution
When greedy picks a set and covers k new elements, each of these elements is assigned y_e = 1/k. Hence Σ_e y_e equals the number of sets picked by greedy. However, this dual solution need not be feasible.

Making the dual feasible
• Consider a set S. We would like Σ_{e ∈ S} y_e ≤ 1.
• Rename the elements of S so that if greedy covers e_i before e_j then i < j.
• When greedy picked the set that covers e_i, it could instead have picked S and covered at least |S| − i + 1 new elements.
• Greedy picked the set which covered the most new elements.
• So y_{e_i} ≤ 1/(|S| − i + 1).

Analysis of Greedy
• Hence Σ_{e ∈ S} y_e ≤ 1/1 + 1/2 + … + 1/|S| = H_{|S|}, which is at most ln |S| + 1.
• If the largest set has size k, then y_e / H_k is a dual feasible solution.
• Hence the number of sets picked by greedy is at most H_k ≈ ln k times the value of a dual feasible solution, and therefore at most H_k times the optimum.

Plan of the tutorial
• Warmup: analyse set cover (offline) using dual fitting
• Dual fitting for online algorithms
– Scheduling with speed augmentation
– Scheduling with rejection

Analysing online algorithms using dual fitting
• The scheduling world is a rich source of problems for the online-algorithms community.
• In this talk: how to schedule jobs arriving online so that the average response time (flow time) is minimised.

Problem Definition
Given: a set of machines, a set of jobs, and a matrix p_ij of processing times of job j on machine i. Each job j has a release date r_j and a weight w_j.
• Pre-emption is allowed; migration is not allowed.
• The flow time of job j is F_j = C_j − r_j, where C_j is its completion time.
• Goal: find a schedule which minimises Σ_j w_j F_j.
• Equivalently, the weighted flow time equals Σ_t (total weight of jobs alive at time t).

Special Cases
• Parallel: all machines are identical, p_ij = p_j.
• Related: machines have different speeds, p_ij = p_j / s_i.
• Subset parallel: parallel, except that a job can only go on a subset of the machines.
• Subset related: defined analogously.
All of these are NP-hard (even in the unweighted case).

Off-line versus on-line
Off-line: the whole input (the jobs with their processing times, weights and release dates) is given up front.
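Going back to the warmup: the greedy set-cover algorithm and its dual-fitting certificate can be sketched in a few lines of Python. This is a minimal sketch under the slides' assumptions (the function names and the small instance are illustrative); exact rationals keep the dual values precise.

```python
from fractions import Fraction

def greedy_set_cover(sets):
    """Pick the set covering the most uncovered elements until the
    universe (the union of all sets) is covered. Alongside, assign
    the dual value y_e = 1/k to each of the k newly covered elements."""
    uncovered = set().union(*sets)
    chosen, y = [], {}
    while uncovered:
        # the set with the largest number of uncovered elements
        i = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        newly = sets[i] & uncovered
        for e in newly:                      # each new element pays 1/k
            y[e] = Fraction(1, len(newly))
        uncovered -= newly
        chosen.append(i)
    return chosen, y

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, exactly."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
chosen, y = greedy_set_cover(sets)
# the duals sum to the number of sets picked by greedy ...
assert sum(y.values()) == len(chosen)
# ... and scaling by H_{|S|} makes them feasible for every set S
assert all(sum(y[e] for e in S) <= harmonic(len(S)) for S in sets)
```

Dividing every y_e by H_k, where k is the size of the largest set, then gives a feasible dual solution, which is exactly the certificate behind the ln k bound above.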
Approximation ratio = (flow time of the algorithm) / (flow time of the optimal solution).
On-line: the algorithm learns about a job only at its release date.
Competitive ratio = (flow time of the on-line algorithm) / (flow time of the optimal off-line solution).

Single Machine
Unweighted case: SRPT (at each time, schedule the job with the smallest remaining processing time) is optimal, and it is on-line.
Weighted case: no constant-factor approximation is known. On-line: there is a super-constant lower bound for deterministic algorithms [Bansal, Chan '09]. Off-line: an O(log log P) approximation, where P = max_j p_j / min_j p_j [Bansal, Pruhs '10].

Parallel Machines
Unweighted case: O(log P)-competitive algorithms [Leonardi, Raz '97, …], with almost matching lower bounds [Leonardi, Raz '97; GK '07].
Weighted case: no logarithmic approximation algorithms are known, and there are strong lower bounds for the on-line case.

Unweighted flow time: known bounds
• Parallel machines: on-line O(log P) and Ω(log P) [LR '97; GK '07]; off-line hardness Ω(log^{1−ε} P) [GK '07].
• Related machines: on-line O(log P) [GK '07].
• Subset parallel: on-line competitive ratio unbounded [GK '07]; off-line O(log P) [GK '07].
• Unrelated machines: off-line hardness Ω(log P / log log P) [GK '07]; off-line O(min(log² n, log n · log P)) [AGK; BK '14].

A bad example
[Figure: an instance on the time horizon [0, T + L] built from job lengths A and B with A + B = T and A > T/2.] On this instance any on-line algorithm accumulates flow time at least A·L > T·L/2, while OPT's flow time is O(T² + L). This gives an Ω(T) lower bound for any on-line algorithm.

Other Models
• What if we allow the algorithm extra resources?
• In particular, suppose the algorithm can process (1+ε) units of work in one time unit. This is the resource augmentation model [first proposed by Kalyanasundaram, Pruhs '95].

Resource Augmentation
• For a single machine, many natural scheduling algorithms are O(1/ε^{O(1)})-competitive with respect to any L_p norm [Bansal, Pruhs '03].
• Parallel machines: randomly assign each job to a machine; this is O(1/ε^{O(1)})-competitive [Chekuri, Goel, Khanna, Kumar '04].
• Unrelated machines: O(1/ε²)-competitive, even for the weighted case
[Chadha, Garg, Kumar, Muralidhara '09].

The plan for the next 30 minutes
• An LP formulation.
• A greedy algorithm in the resource augmentation model for unrelated machines (all weights are 1).
• An analysis that builds a suitable dual.
• Extensions.

Fractional flow-time
Recall that the flow time of job j is Σ_{t = r_j}^{C_j} 1. Let p_j(t) be the remaining processing time of job j at time t; the remaining fraction of j at time t is p_j(t) / p_ij, where i is the machine running j. The fractional flow time of j is Σ_{t ≥ r_j} p_j(t) / p_ij.

[Figure: a job whose remaining fraction is 1 on [0, 2), 2/3 on [2, 5) and 1/3 on [5, 12).] Its fractional flow time is 1·2 + (2/3)·3 + (1/3)·7. Fractional flow time can be much smaller than (integral) flow time.

Integer Program
Define 0-1 variables x(i, j, t): 1 iff job j is processed on machine i during [t, t+1]. Write the constraints and the objective in terms of these variables; the fractional flow time of j is Σ_{i, t ≥ r_j} (t − r_j) · x(i, j, t) / p_ij.

LP Relaxation
min Σ_{i,j,t} (t − r_j) · x(i, j, t) / p_ij
s.t. Σ_{i,t} x(i, j, t) / p_ij = 1   for all j
     Σ_j x(i, j, t) ≤ 1             for all i, t
     x(i, j, t) ≥ 0                 for all i, j, t

One caveat: under this relaxation a job can be processed simultaneously on many machines, so its fractional flow time can be almost 0. We therefore add a term for the processing time to the objective:

min Σ_{i,j,t} [(t − r_j)/p_ij + 1] · x(i, j, t)
s.t. Σ_{i,t} x(i, j, t) / p_ij = 1   for all j
     Σ_j x(i, j, t) ≤ 1             for all i, t
     x(i, j, t) ≥ 0                 for all i, j, t

Our Algorithm
When a job arrives, we immediately dispatch it to one of the machines. Each machine then follows the optimal single-machine policy, Shortest Remaining Processing Time (SRPT). What is the dispatch policy? GREEDY.

When a job j arrives at time t, compute for each machine i the increase in flow time if we dispatch j to i. Let j_1, …, j_s be the jobs queued on machine i, ordered so that p_{ij_1}(t) ≤ p_{ij_2}(t) ≤ …, and let r be the index with p_{ij_r}(t) < p_ij < p_{ij_{r+1}}(t). The increase in flow time is
p_{j_1}(t) + … + p_{j_r}(t) + p_ij + p_ij · (s − r):
j waits for the r shorter jobs and for its own processing, and each of the s − r longer jobs is delayed by p_ij. Dispatch j to the machine for which the increase in fractional flow time is minimum.

Analyzing our algorithm
Construct a dual solution from the run of the algorithm, and show that the dual value and the algorithm's flow time are close to each other.

Dual LP
Associating a dual variable α_j with the constraint of job j, and β_it with the capacity constraint of machine i at time t, the dual is

max Σ_j α_j − Σ_{i,t} β_it
s.t. α_j / p_ij − β_it ≤ (t − r_j)/p_ij + 1   for all i, j, t with t ≥ r_j
     β_it ≥ 0

Setting the Dual Values
• When a job j arrives at time t and is dispatched greedily to machine i, set α_j to the resulting increase in flow time: α_j = p_{j_1}(t) + … + p_{j_r}(t) + p_ij + p_ij(s − r). Thus Σ_j α_j equals the total flow time of the algorithm.
• Set β_it to the number of jobs waiting for machine i at time t, i.e. β_it = s. Thus Σ_{i,t} β_it also equals the total flow time.

Dual Feasibility
Fix a machine i', a job j (released at time t) and a time t' ≥ t. Suppose p_{i'j_l}(t) < p_{i'j} < p_{i'j_{l+1}}(t) in i''s queue at time t. Since greedy dispatched j to the machine with the smallest increase,
α_j ≤ p_{j_1}(t) + … + p_{j_l}(t) + p_{i'j} + p_{i'j}(s − l).
We need to verify that
α_j / p_{i'j} ≤ β_{i't'} + (t' − t)/p_{i'j} + 1.

What happens when t' = t? Each of p_{j_1}(t), …, p_{j_l}(t) is less than p_{i'j}, so
α_j / p_{i'j} ≤ [p_{j_1}(t) + … + p_{j_l}(t)] / p_{i'j} + 1 + (s − l) ≤ s + 1 = β_{i't} + 1.

What happens when t' = t + Δ? Suppose at time t' job j_k is being processed.

Case 1: k ≤ l. The jobs j_1, …, j_{k−1} have already completed, so p_{j_1}(t) + … + p_{j_{k−1}}(t) ≤ t' − t, and β_{i't'} = s − k + 1 jobs are still waiting. Using p_{j_k}(t), …, p_{j_l}(t) < p_{i'j},
α_j / p_{i'j} ≤ [p_{j_1}(t) + … + p_{j_{k−1}}(t)] / p_{i'j} + [p_{j_k}(t) + … + p_{j_l}(t)] / p_{i'j} + 1 + (s − l)
≤ (t' − t)/p_{i'j} + (l − k + 1) + 1 + (s − l)
= (t' − t)/p_{i'j} + 1 + (s − k + 1) = (t' − t)/p_{i'j} + 1 + β_{i't'}.

Case 2: k > l. Each of p_{j_{l+1}}(t), …, p_{j_{k−1}}(t) exceeds p_{i'j}, so
α_j / p_{i'j} ≤ [p_{j_1}(t) + … + p_{j_l}(t)] / p_{i'j} + 1 + (s − l)
≤ [p_{j_1}(t) + … + p_{j_{k−1}}(t)] / p_{i'j} + 1 + (s − k + 1)
≤ (t' − t)/p_{i'j} + 1 + (s − k + 1) = (t' − t)/p_{i'j} + 1 + β_{i't'}.

Hence, for every machine i', time t' and job j,
α_j / p_{i'j} ≤ β_{i't'} + (t' − t)/p_{i'j} + 1,
so (α_j, β_it) is dual feasible. But Σ_j α_j and Σ_{i,t} β_it both equal the total flow time, and hence the dual objective value is Σ_j α_j − Σ_{i,t} β_it = 0.

Incorporating machine speed-up
For any machine i', time t' and job j,
α_j / ((1+ε) p_{i'j}) ≤ β_{i't'} / (1+ε) + (t' − t)/((1+ε) p_{i'j}) + 1.
So the values (α_j, β_it/(1+ε)) are dual feasible for an instance with processing times larger by a factor (1+ε). Equivalently, schedule the given instance on machines of speed (1+ε) to determine α_j and β_it; then the values (α_j, β_it/(1+ε)) are dual feasible for the original instance.

Dual Objective Value
Since Σ_{i,t} β_it = Σ_j α_j, the value of this dual is
Σ_j α_j − Σ_{i,t} β_it/(1+ε) = (ε/(1+ε)) Σ_j α_j.
The dual value is at most the optimum fractional flow time. Hence the flow time of our solution, Σ_j α_j, is at most (1 + 1/ε) times the optimum fractional flow time.

Extensions
• The analysis extends to the weighted L_p norm of flow time, giving an O(1/ε^{2+1/p})-competitive algorithm. This improves slightly on a result of Im and Moseley.
• The analysis also extends to minimising the sum of weighted flow time and energy on unrelated machines, giving an O(γ²)-competitive algorithm when the energy function is of the form s^γ. This was subsequently improved to O(γ / log γ) [Devanur, Huang 2014].

Plan of the tutorial
• Warmup: analyse set cover (offline) using dual fitting
• Dual fitting for online algorithms
– Scheduling with speed augmentation
– Scheduling with rejection

What when p = ∞?
• We want to minimise the maximum (weighted) flow time.
• When w_j = p_j^{−1}, this is the same as minimising the max stretch.
• The earlier results do not apply, since the competitive ratio is linear in p.
• If the optimum T* is known, then this is the same as deadline scheduling.
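To make the connection to deadline scheduling concrete: given T*, job j receives the deadline r_j + T*/w_j, and we ask whether some schedule meets all deadlines. Below is a minimal single-machine sketch for unit jobs using Earliest Deadline First as the feasibility test (the discrete-time setting and all names are illustrative, not the talk's algorithm).

```python
# Sketch: with the optimum max weighted flow time T* known, job j gets
# deadline r_j + T*/w_j. For unit jobs on one machine, running the
# alive job with the earliest deadline (EDF) meets all deadlines
# whenever any schedule does.
import heapq

def edf_feasible(jobs, T_star):
    """jobs: list of (r_j, w_j) unit-size jobs. Returns True iff EDF
    finishes every job by its deadline r_j + T*/w_j."""
    jobs = sorted((r, r + T_star / w) for r, w in jobs)  # (release, deadline)
    heap, t, i = [], 0, 0
    while i < len(jobs) or heap:
        if not heap:
            t = max(t, jobs[i][0])          # idle until the next release
        while i < len(jobs) and jobs[i][0] <= t:
            heapq.heappush(heap, jobs[i][1])  # keyed by deadline
            i += 1
        d = heapq.heappop(heap)
        t += 1                               # run the earliest-deadline job
        if t > d:
            return False                     # a deadline is missed
    return True

# three unit jobs released together, unit weights: feasible iff T* >= 3
assert not edf_feasible([(0, 1), (0, 1), (0, 1)], 2)
assert edf_feasible([(0, 1), (0, 1), (0, 1)], 3)
```

A binary search over guesses of T* with this test is the usual way such a reduction is exploited.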
Job j has deadline r_j + T*·w_j^{−1}, and all jobs should finish by their deadlines.

The current status
Here (a, b) denotes a b-competitive algorithm on machines with speed-up a.
• Single machine: max flow time is solvable in polynomial time; max stretch: lower bound (1, Ω(P^{0.4})) and upper bound (1, O(P^{0.5})); max weighted flow time: (1 + ε, O(ε^{−2})).
• Parallel machines: max flow time (1, 2); max weighted flow time (1 + ε, O(ε^{−1})).
• Related machines: max weighted flow time (1 + ε, O(ε^{−3})).
• Subset parallel: lower bounds (1, Ω(m)) and (O(1), Ω(log m)).
• Unrelated machines: max weighted flow time (1 + ε, O(ε^{−1})).

Immediate vs. non-immediate dispatch
• Insisting on immediate dispatch will quickly get you into trouble.
• A simple example (the load-balancing lower bound of Azar et al. '95): in the subset parallel setting with unit length, unit weight jobs, there is no constant-competitive algorithm, even with constant speed-up.
• Hence all these results assume non-immediate dispatch.

Bad example for immediate dispatch
• 2m jobs arrive at time 0, but in log m batches.
• Batch i consists of m/2^i jobs which can be processed only on the m/2^i machines having the highest load.
• When batch i arrives, the m/2^i most loaded machines have average load at least i; hence the online algorithm has flow time at least log m.
• The offline optimum is 2: the jobs of batch i are scheduled on the machines on which batch i+1 cannot be processed.
• Any constant speed-up will not help.

The rejection model
• Speed augmentation does not help if
– we want immediate dispatch and to minimise max flow time, or
– we want to minimise max stretch in the subset parallel model.
• Instead, the online algorithm is allowed to reject an ε-fraction of the jobs / of the total weight.
• Jobs once rejected cannot be revoked.
• We want non-migratory, immediate-dispatch algorithms; pre-emption is allowed.

Rejection model (contd.)
• Rejection is a natural assumption in many settings:
– terms in Service Level Agreements;
– the "Server busy: please try again later" message when accessing popular websites.
• The model is intuitively stronger than speed augmentation:
– speed augmentation can be simulated by rejecting every (1/ε)-th job on each machine;
– here, however, the ε-fraction rejection budget is over all machines, not on every machine.

Our results (CDGK '15)
• Load balancing:
– unit size jobs: an O(log(1/ε))-competitive algorithm;
– arbitrary sizes: an O(log²(1/ε))-competitive algorithm;
– lower bound: Ω(log(1/ε)).
• Flow time (L_∞):
– unit size, unit weight: an O(1/ε)-competitive algorithm;
– arbitrary size and weight: an O(1/ε⁴)-competitive algorithm;
– rejection weight different from flow-time weight (max stretch): an O(1/ε⁶)-competitive algorithm;
– lower bound: Ω(1/ε).
(The algorithms with immediate rejection were highlighted in red on the slide.)

This talk: max flow time
• Jobs are unit size and unweighted.
• Job j is released at time r_j and can be processed on a subset S_j of the machines.
• Our algorithm is immediate-dispatch, immediate-reject and non-migratory.
• The algorithm rejects at most an ε-fraction of the jobs, and its maximum flow time is at most T*/ε, where T* is the optimum.

The algorithm: GREEDY
• Each machine keeps a queue of the jobs assigned to it but not yet processed.
• Let T be our current guess of the optimum; T is doubled if things go wrong.
• When job j arrives, assign it to the machine in S_j with the least load (smallest queue).
• If all machines in S_j have load T/ε, reject j.

Why is the flow time at most T*/ε?
• Machine i processes jobs in the order in which they were assigned to it.
• Hence the flow time of a job is the number of jobs ahead of it in its queue.
• Queue sizes never exceed T*/ε.
• It remains to bound the number of rejected jobs.
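The dispatch-or-reject rule of GREEDY can be sketched as follows. This is a static sketch of a single pass over the arrivals, ignoring the FIFO draining of the queues over time; the function and variable names are illustrative.

```python
def greedy_with_rejection(num_machines, jobs, T, eps):
    """jobs: a list of permitted-machine lists S_j for unit jobs, in
    arrival order. Assign each job to the least-loaded machine in S_j,
    rejecting it if every machine in S_j already holds T/eps jobs.
    Returns (assignment, rejected job indices)."""
    limit = T / eps
    load = [0] * num_machines            # current queue sizes
    assignment, rejected = [], []
    for j, S in enumerate(jobs):
        i = min(S, key=lambda m: load[m])  # least-loaded permitted machine
        if load[i] >= limit:
            assignment.append(None)        # all of S_j is full: reject j
            rejected.append(j)
        else:
            load[i] += 1
            assignment.append(i)
    return assignment, rejected

# five unit jobs restricted to machine 0, with T/eps = 2:
# the first two are queued, the remaining three are rejected
assignment, rejected = greedy_with_rejection(2, [[0]] * 5, T=1, eps=0.5)
assert rejected == [2, 3, 4]
```

The analysis below shows that, for the right guess T, the rejected jobs form at most an ε-fraction of all jobs.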
MaxFlowTime algorithm (illustration)
[Figure: the queues of three machines at times t = 1, 2, 3, with queue-size limit T*/ε = 2.]

Primal and Dual LPs
With x_ij indicating that job j is assigned to machine i, and T the maximum flow time:

min T
s.t. ∀(I, i): T ≥ Σ_{j : r_j ∈ I, i ∈ S_j} x_ij − length(I)
     ∀j: Σ_{i ∈ S_j} x_ij = 1
     ∀i, j: x_ij ≥ 0

(The first constraint says that the unit jobs released in an interval I and assigned to machine i need Σ x_ij units of processing, of which at most length(I) can be done within I; the excess lower-bounds the max flow time.)

max Σ_j α_j − Σ_{(I,i)} length(I) · β_{I,i}
s.t. ∀j, ∀i ∈ S_j: α_j ≤ Σ_{(I,i) : r_j ∈ I} β_{I,i}
     Σ_{(I,i)} β_{I,i} ≤ 1
     ∀(I, i): β_{I,i} ≥ 0

Complementary slackness says that β_{I,i} should be non-zero only for machine-intervals which are tight: Σ_{j : r_j ∈ I, i ∈ S_j} x_ij = T* + length(I).

How to set β_{I,i}
• There are too many intervals which are "tight".
• On each machine we pick a laminar subset of these intervals.
• These intervals are assigned β_{I,i} = 1; we scale by the number of such intervals to ensure Σ_{I,i} β_{I,i} ≤ 1.

Machine intervals
[Figure: queue size of machine i over time, with levels T*, 2T*, …, 5T* and nested interval families I_1 = {I_1^1}, I_2 = {I_2^1, I_2^2}, I_3 = {I_3^1, I_3^2, I_3^3}, I_4 = {I_4^1}, I_5 = {I_5^1}.]
• For each machine and each k = 1, …, 1/ε, define a set of disjoint class-k intervals: each is a minimal interval such that the queue size at its left endpoint is (k−1)T* and at its right endpoint is (at least) kT*. β_{I,i} is set to 1 for all these intervals.
• The machine intervals are nested: every interval in I_k is contained in an interval of I_{k−1}.
• If at time t the queue size of machine i is at least kT*, then t belongs to a machine interval in I_k(i).
• Suppose job j, arriving at time r_j, is dispatched to a machine with queue size at least kT*. Since greedy chose the least-loaded machine in S_j, every machine i' ∈ S_j has queue size at least kT* at r_j, and so r_j belongs to an interval of I_k(i') for every machine i' on which j could have been dispatched.

Choosing feasible α_j
• For a job j:
– α_j = 1/ε if j is rejected;
– α_j = k if the queue size of the machine j is assigned to lies in [kT*, (k+1)T*).
• r_j belongs to an interval of I_k(i') for every machine i' on which j could have been dispatched.
• Hence, by nestedness, on each machine i' ∈ S_j the release date r_j belongs to k intervals from our collection (one from each of the classes 1, …, k), and so α_j ≤ Σ_{(I,i) : r_j ∈ I, i ∈ S_j} β_{I,i}.

By weak duality
• A feasible dual solution has value at most the primal optimum T*.
• Our solution becomes feasible after scaling by the total number of intervals |(I, i)|; hence Σ_j α_j − Σ_{(I,i)} length(I) · β_{I,i} ≤ T* · |(I, i)|.
• Consider all intervals of class k, and let n_k be their number and L_k their total length. Then Σ_j α_j − Σ_k L_k ≤ T* Σ_k n_k.

The Analysis
• Consider a job j which is not rejected and is assigned to machine i. If α_j = k, then r_j lies in some interval of machine i of each of the classes k, k−1, …, 2, 1.
• Hence Σ_{j ∈ A} α_j, summed over the accepted jobs A, is at least the total number of jobs assigned during intervals of classes 2, 3, … (a job arriving during a class-k interval sees a queue of size at least (k−1)T*, so its α is at least k−1, and the intervals containing it are nested).

The analysis (contd.)
• The number of jobs assigned during an interval is at least its length + T*: over the interval the queue grows by T*, while the busy machine also completes one unit job per time step.
• Hence Σ_{j ∈ A} α_j ≥ Σ_{k>1} L_k + T* Σ_{k>1} n_k.
• Combining with Σ_j α_j − Σ_k L_k ≤ T* Σ_k n_k, we get Σ_{j ∈ R} α_j ≤ L_1 + T* n_1, where R is the set of rejected jobs.
• The total number of jobs is at least L_1 + T* n_1, and each rejected job has α_j = 1/ε; so the number of rejected jobs is at most an ε-fraction of all jobs.

Summary
• Dual fitting is a powerful technique for analysing algorithms, both offline and online.
• In this talk I showed how dual fitting can work with faster machines and with job rejection.

Thank you for your attention