COMP 382: Reasoning about algorithms
Fall 2014
Swarat Chaudhuri & John Greiner

Unit 7: Greedy algorithms

Algorithm: Interval scheduling

Interval scheduling problem:
  Job $j$ starts at $s_j$ and finishes at $f_j$ (no weights).
  Two jobs are compatible if they don’t overlap.
  Goal: find the largest subset of mutually compatible jobs.

Use greedy algorithm?

Natural possibilities:
1. [Earliest start time] Consider jobs in ascending order of start time $s_j$.
2. [Earliest finish time] Consider jobs in ascending order of finish time $f_j$.
3. [Shortest interval] Consider jobs in ascending order of interval length $f_j - s_j$.
4. [Fewest conflicts] For each job, count the number of conflicting jobs $c_j$. Schedule in ascending order of conflicts $c_j$.
Only one of these works!

Greedy strategies

[Figures: counterexamples that break earliest start time, shortest interval, and fewest conflicts.]

Greedy algorithm

Consider jobs in increasing order of finish time. Take each job provided it’s compatible with the ones already taken.

  Sort jobs by finish times so that $f_1 \le f_2 \le \dots \le f_n$.
  A = ∅
  for j = 1 to n {
    if (job j compatible with A)
      A = A ∪ {j}
  }
  return A

Complexity: $O(n \log n)$

Optimality

Theorem: The greedy algorithm is optimal.
Proof: By contradiction.
  Assume greedy is not optimal, and let’s see what happens.
  Let $i_1, i_2, \dots, i_k$ denote the set of jobs selected by greedy.
  Any optimal solution must eventually disagree with greedy; consider the one that disagrees at the latest possible point.
  Let $j_1, j_2, \dots, j_m$ denote the set of jobs in this optimal solution, with $i_1 = j_1$, $i_2 = j_2$, ..., $i_r = j_r$ for the largest possible $r$.
  Job $i_{r+1}$ finishes no later than job $j_{r+1}$.

  Greedy: $i_1, i_2, \dots, i_r, i_{r+1}$
  OPT:    $j_1, j_2, \dots, j_r, j_{r+1}, \dots$
  Why not replace job $j_{r+1}$ with job $i_{r+1}$?
  The resulting solution is still feasible and optimal, but it agrees with greedy on $r + 1$ jobs, which contradicts the maximality of $r$.

Exercise: Selecting breakpoints

Road trip from Houston to Palo Alto along a fixed route.
Refueling stations at certain points along the way.
Fuel capacity = $C$.
Goal: make as few refueling stops as possible.
Truck-driver’s algorithm: Go as far as you can before refueling.
Question: Is this optimal?

Algorithm: Interval partitioning

Interval partitioning:
  Lecture $j$ starts at $s_j$ and finishes at $f_j$.
  Goal: Find the minimum number of classrooms to schedule all lectures so that no two occur at the same time in the same room.
Example: This schedule uses 4 classrooms to schedule 10 lectures.
[Figure: lectures a through j on a timeline from 9:00 to 4:30, assigned to 4 classrooms.]
Example: This schedule uses only 3 classrooms.
[Figure: lectures a through j on a timeline from 9:00 to 4:30, assigned to 3 classrooms.]

Lower bound on optimal solution

Definition: The depth of a set of open intervals is the maximum number of intervals that contain any single point in time.
Key observation: Number of classrooms needed ≥ depth.
Example: The depth of the intervals in the 3-classroom schedule above is 3, so that schedule is optimal.

Greedy algorithm

Consider intervals in increasing order of start time. Assign each interval to any compatible classroom.

  Sort intervals by starting time so that $s_1 \le s_2 \le \dots \le s_n$.
  d = 0
  for j = 1 to n {
    if (lecture j is compatible with some classroom k)
      schedule lecture j in classroom k
    else {
      allocate a new classroom d + 1
      schedule lecture j in classroom d + 1
      d = d + 1
    }
  }

Complexity: $O(n \log n)$

Interval Partitioning: Greedy Analysis

Observation: The greedy algorithm never schedules two incompatible lectures in the same classroom.
Theorem: The greedy algorithm is optimal.
Proof:
  Let $d$ = number of classrooms that the greedy algorithm allocates.
  Classroom $d$ is opened because we needed to schedule a lecture, say $j$, that is incompatible with all $d - 1$ other classrooms.
  Since we sorted by start time, all these incompatibilities are caused by lectures that start no later than $s_j$.
  Thus, we have $d$ lectures overlapping at time $s_j + \delta$ for a sufficiently small $\delta > 0$.
  The key observation implies that all schedules use ≥ $d$ classrooms.

Algorithm: Scheduling to minimize lateness

Minimizing lateness:
  A single resource processes one job at a time.
  Job $j$ requires $t_j$ units of processing time and is due at time $d_j$.
  If $j$ starts at time $s_j$, it finishes at time $f_j = s_j + t_j$.
  Lateness: $\ell_j = \max\{0, f_j - d_j\}$.
  Goal: schedule all jobs to minimize the maximum lateness $L = \max_j \ell_j$.

  j      1   2   3   4   5   6
  t_j    3   2   1   4   3   2
  d_j    6   8   9   9  14  15

Greedy algorithms

Greedy template: Consider jobs in some order.
  [Shortest processing time first] Consider jobs in ascending order of processing time $t_j$.
  [Smallest slack] Consider jobs in ascending order of slack $d_j - t_j$.
  [Earliest deadline first] Consider jobs in ascending order of deadline $d_j$.

Counterexamples

[Shortest processing time first] $(t_1, d_1) = (1, 100)$; $(t_2, d_2) = (10, 10)$.
  Running job 1 first makes job 2 finish at time 11 (lateness 1); running job 2 first gives lateness 0.
[Smallest slack] $(t_1, d_1) = (1, 2)$; $(t_2, d_2) = (10, 10)$.
  Job 2 has the smaller slack, but running it first makes job 1 finish at time 11 (lateness 9); running job 1 first gives maximum lateness 1.

Earliest deadline first

  Sort n jobs by deadline so that $d_1 \le d_2 \le \dots \le d_n$.
  t = 0
  for j = 1 to n
    Assign job j to interval [t, t + t_j]
    s_j = t, f_j = t + t_j
    t = t + t_j
  output intervals [s_j, f_j]

[Figure: earliest-deadline-first schedule of the six jobs above on a timeline from 0 to 15; max lateness = 1.]

No idle time

Observation: There exists an optimal schedule with no idle time (i.e., time when the processor stays unused).
Observation: The greedy schedule has no idle time.
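The earliest-deadline-first rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the course's reference code: the function name `edf_schedule` and the `(t, d)` tuple layout are assumptions made for this example, and the job list is the six-job instance from the table above.

```python
def edf_schedule(jobs):
    """Schedule jobs by earliest deadline first.

    jobs: list of (t, d) pairs (processing time, deadline); this layout is an
    assumption of the sketch. Returns (schedule, max_lateness), where schedule
    is a list of (job_index, start, finish) in execution order.
    """
    # Sort job indices by deadline; Python's sort is stable, so ties keep input order.
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][1])
    t = 0
    schedule = []
    max_lateness = 0  # starting at 0 implements lateness = max{0, f - d}
    for j in order:
        tj, dj = jobs[j]
        start, finish = t, t + tj
        schedule.append((j, start, finish))
        max_lateness = max(max_lateness, finish - dj)
        t = finish  # no idle time: the next job starts immediately
    return schedule, max_lateness


# The six-job instance from the table (0-based job indices).
jobs = [(3, 6), (2, 8), (1, 9), (4, 9), (3, 14), (2, 15)]
schedule, lateness = edf_schedule(jobs)
print(schedule)
print("max lateness =", lateness)
```

On this instance the only late job is the fourth one, which finishes at time 10 against a deadline of 9.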
Inversions

Definition: An inversion in schedule $S$ is a pair of jobs $i$ and $j$ such that $d_i < d_j$ but $j$ is scheduled before $i$.

The greedy schedule has no inversions.
All schedules with no idle time and no inversions have the same maximum lateness: they can only differ in the way jobs with identical deadlines are ordered.
If a schedule (with no idle time) has an inversion, it has one with a pair of inverted jobs scheduled consecutively.

Inversions

[Figure: before the swap, j runs immediately before i, and i finishes at time $f_i$; after the swap, i runs first and j finishes at time $f'_j = f_i$.]

Claim: Swapping two adjacent, inverted jobs reduces the number of inversions by one and does not increase the max lateness.
Proof: Let $\ell$ be the lateness before the swap, and let $\ell'$ be the lateness afterwards.
  $\ell'_k = \ell_k$ for all $k \ne i, j$.
  $\ell'_i \le \ell_i$.
  If job $j$ is late:
    $\ell'_j = f'_j - d_j$      (definition of lateness)
          $= f_i - d_j$       ($j$ finishes at time $f_i$)
          $\le f_i - d_i$     (definition of inversion)
          $\le \ell_i$.

Analysis of Greedy Algorithm

Theorem: The greedy schedule $S$ is optimal.
Proof: Let $S^*$ be an optimal schedule that has the fewest number of inversions, and let’s see what happens.
  We can assume $S^*$ has no idle time.
  If $S^*$ has no inversions, then $S$ and $S^*$ have the same maximum lateness, so $S$ is optimal.
  If $S^*$ has an inversion, let $i$-$j$ be an adjacent inversion.
  Swapping $i$ and $j$ does not increase the maximum lateness and strictly decreases the number of inversions.
  This contradicts the definition of $S^*$.

Greedy analysis strategies

1. Greedy algorithm stays ahead: Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm’s.
2. Exchange argument: Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.
3. Structural: Discover a simple "structural" bound asserting that every possible solution must have a certain value.
   Then show that your algorithm always achieves this bound.

Recap: “Greedy stays ahead” proofs

1. Consider the greedy solution $A$ and an optimal solution OPT.
2. Find a measure by which greedy stays ahead of OPT.
   Let $a_1, \dots, a_k$ be the measures of the choices of the greedy algorithm, and let $o_1, \dots, o_m$ be the measures of the choices of the optimal algorithm.
   Show inductively that the partial solutions constructed by greedy are always at least as good as the partial solutions constructed by optimal.
3. Consider the point where greedy starts disagreeing with all optimal solutions. Show that if optimal agreed with greedy for one more step, the optimal solution wouldn’t lose quality.

Recap: Exchange argument proofs

1. Consider the greedy solution $A$ and an optimal solution OPT.
2. Compare $A$ and OPT. Assume that $A$ costs more than OPT (otherwise you are done). Now identify differences between $A$ and OPT. For example:
   There’s an element of OPT that’s not in $A$ and an element of $A$ that’s not in OPT.
   There are two consecutive elements in OPT in a different order than they are in $A$ (i.e., there is an inversion).
3. Swap the elements in question in OPT (replace one element by another in the first case above; swap the elements’ order in the second case). Argue that you have a solution that’s no worse than before.
4. Argue that if you continue swapping, you can eliminate all differences between OPT and $A$.

Exercise: Coin changing

Problem: Given currency denominations 1 (P), 5 (N), 10 (D), 25 (Q), and 100, devise a method to pay a given amount to a customer using the fewest number of coins.
Example: 34 = 25 + 5 + 1 + 1 + 1 + 1
Cashier’s algorithm: At each iteration, add a coin of the largest value that doesn’t take you past the amount to be paid.
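The cashier rule above can be sketched as follows. This is a small illustration under stated assumptions: the function name `cashier_change` and its parameters are hypothetical, and the sketch assumes a denomination of value 1 is present so that every amount can be paid exactly.

```python
def cashier_change(amount, denominations=(1, 5, 10, 25, 100)):
    """Greedy change-making: repeatedly take the largest coin that fits.

    Assumes amount is a non-negative integer and that 1 is among the
    denominations; both are assumptions of this sketch.
    """
    coins = []
    for c in sorted(denominations, reverse=True):
        # Take as many coins of value c as fit into the remaining amount.
        count, amount = divmod(amount, c)
        coins.extend([c] * count)
    return coins


print(cashier_change(34))
```

Taking as many coins of one value as fit in a single step is the same as adding them one per iteration, since a smaller coin is only considered once the larger one no longer fits.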
Goal: Prove that the cashier’s algorithm is optimal.

Analysis of greedy algorithm

Suppose that at the point where greedy diverges from all optimal solutions, we need to make change for amount $x$.
Let $c_k \le x < c_{k+1}$: greedy takes coin $k$.
Claim: any optimal solution must also take coin $k$.
  If not, it needs enough coins of type $c_1, \dots, c_{k-1}$ to add up to $x$.
  The table below indicates that no optimal solution can do this.

  k   c_k   Constraint on OPT   Max. value of coins 1, 2, ..., k-1 in OPT
  1   1     P ≤ 4               (none)
  2   5     N ≤ 1               4
  3   10    N + D ≤ 2           4 + 5 = 9
  4   25    Q ≤ 3               20 + 4 = 24
  5   100   no limit            75 + 24 = 99

Case when greedy is suboptimal

The greedy algorithm is suboptimal for US postal denominations: 1, 10, 21, 34, 70, 100, 350, 1225, 1500.
Counterexample: 140 cents
  Greedy: 100, 34, 1, 1, 1, 1, 1, 1 (8 stamps)
  Optimal: 70, 70 (2 stamps)

Dijkstra’s shortest-path algorithm

Shortest path network:
  Directed graph $G = (V, E)$.
  Source $s$, destination $t$.
  $\ell_e$: length of edge $e$.
  Goal: Find the shortest directed path from $s$ to $t$.

Dijkstra’s algorithm

1. Maintain a set of explored nodes $S$ for which we have determined the shortest path distance $d(u)$ from $s$ to $u$.
2. Initialize $S = \{s\}$, $d(s) = 0$.
3. Repeatedly choose the unexplored node $v$ which
   minimizes $\pi(v) = \min_{e = (u, v) : u \in S} \left( d(u) + \ell_e \right)$.
   Add $v$ to $S$, and set $d(v) = \pi(v)$.

Correctness by exchange argument

Invariant: For each node $u \in S$, $d(u)$ is the length of the shortest $s$-$u$ path.
Proof by induction on $|S|$:
  Base case: $|S| = 1$ is trivial.
  Inductive hypothesis: Assume true for $|S| = k \ge 1$.
  Let $v$ be the next node added to $S$, and let $u$-$v$ be the chosen edge.
  The shortest $s$-$u$ path plus $(u, v)$ is an $s$-$v$ path of length $\pi(v)$.
  Consider any $s$-$v$ path $P$. Let $x$-$y$ be the first edge in $P$ that leaves $S$, and let $P'$ be the subpath to $x$.
  $P$ is already at least as long as $\pi(v)$ as soon as it leaves $S$: assuming edge lengths are nonnegative,
    $\ell(P) \ge \ell(P') + \ell_{(x, y)} \ge d(x) + \ell_{(x, y)} \ge \pi(y) \ge \pi(v)$.

[Figure: path P from s leaves the explored set S along edge x-y; P' is the part of P inside S; the chosen edge is u-v.]

Implementing Dijkstra’s algorithm

For each unexplored node, explicitly maintain $\pi(v) = \min_{e = (u, v) : u \in S} \left( d(u) + \ell_e \right)$.
  Next node to explore = node with minimum $\pi(v)$.
  When exploring $v$, for each incident edge $e = (v, w)$, update $\pi(w) = \min\{\pi(w), \pi(v) + \ell_e\}$.
Maintain a priority queue of unexplored nodes, prioritized by $\pi(v)$.
  $O(mn)$ time if you’re using an array for a priority queue: insertion and finding the minimum take $O(n)$ time.
  $O(m \log n)$ time if you’re using a binary heap: $O(\log n)$ time to reorganize the heap after each update to $\pi(w)$.

Minimum spanning tree

Minimum spanning tree: Given a connected graph $G = (V, E)$ with real-valued edge weights $c_e$, an MST is a subset of the edges $T \subseteq E$ such that $T$ is a spanning tree whose sum of edge weights is minimized.
[Figure: example graph with weighted edges.]

Cayley’s theorem: There are $n^{n-2}$ spanning trees of $K_n$.

MST algorithms: recap from COMP 182

Kruskal’s algorithm: Start with $T = \emptyset$. Consider edges in ascending order of cost. Insert edge $e$ in $T$ unless doing so would create a cycle.
Reverse-Delete algorithm: Start with $T = E$. Consider edges in descending order of cost. Delete edge $e$ from $T$ unless doing so would disconnect $T$.
Prim’s algorithm: Start with some root node $s$ and greedily grow a tree $T$ from $s$ outward. At each step, add the cheapest edge $e$ to $T$ that has exactly one endpoint in $T$.

Correctness: cuts and cycles

Simplifying assumption: All edge costs $c_e$ are distinct.
Cut property: Let $S$ be any subset of nodes, and let $e$ be the min cost edge with exactly one endpoint in $S$. Then the MST contains $e$.
Cycle property: Let $C$ be any cycle, and let $f$ be the max cost edge belonging to $C$. Then the MST does not contain $f$.

[Figure: left, edge e crossing the cut S is in the MST; right, edge f on cycle C is not in the MST.]

Cycles and cuts

Cycle: Set of edges of the form $a$-$b$, $b$-$c$, $c$-$d$, ..., $y$-$z$, $z$-$a$.
Cutset: A cut is a subset of nodes $S$. The corresponding cutset $D$ is the subset of edges with exactly one endpoint in $S$.

Cycles and cuts

Claim: A cycle and a cutset intersect in an even number of edges.
Example (graph on nodes 1 through 8):
  Cycle C = 1-2, 2-3, 3-4, 4-5, 5-6, 6-1
  Cutset D = 3-4, 3-5, 5-6, 5-7, 7-8
  Intersection = 3-4, 5-6
Proof: Each time the cycle crosses from $S$ to $V - S$, it must cross back to return to its starting point.
[Figure: cycle C crossing between S and V - S.]
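The claim that a cycle and a cutset intersect in an even number of edges can be checked by brute force on the example cycle. This is only a sanity check under assumptions: the helper name `crossing_edges` is hypothetical, and the cut S = {4, 5, 8} is inferred from the listed cutset rather than stated in the slides.

```python
from itertools import combinations

# The example cycle on nodes 1..8 from the slide.
cycle = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
nodes = range(1, 9)


def crossing_edges(edge_list, S):
    """Return the edges of edge_list with exactly one endpoint in the node set S."""
    return [(u, v) for u, v in edge_list if (u in S) != (v in S)]


# Assumed cut S = {4, 5, 8}: inferred from the cutset 3-4, 3-5, 5-6, 5-7, 7-8.
example = crossing_edges(cycle, {4, 5, 8})
print(example)

# Every subset S of the 8 nodes crosses the cycle an even number of times.
all_even = all(
    len(crossing_edges(cycle, set(S))) % 2 == 0
    for k in range(len(nodes) + 1)
    for S in combinations(nodes, k)
)
print(all_even)
```

Checking all 256 subsets is not a proof, but it matches the parity argument: every crossing out of S is paired with a crossing back in.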
MSTs and cut property

Simplifying assumption: All edge costs $c_e$ are distinct.
Cut property: Let $S$ be any subset of nodes, and let $e$ be the min cost edge with exactly one endpoint in $S$. Then the MST $T^*$ contains $e$.
Proof by exchange argument:
  Suppose $e$ does not belong to $T^*$, and let’s see what happens.
  Adding $e$ to $T^*$ creates a cycle $C$ in $T^* \cup \{e\}$.
  Edge $e$ is both in the cycle $C$ and in the cutset $D$ corresponding to $S$.
  Since a cycle and a cutset intersect in an even number of edges, there exists another edge, say $f$, that is in both $C$ and $D$.
  $T' = T^* \cup \{e\} - \{f\}$ is also a spanning tree.
  Since $c_e < c_f$, $\mathrm{cost}(T') < \mathrm{cost}(T^*)$.
  This is a contradiction.

MSTs and cycle property

Simplifying assumption: All edge costs $c_e$ are distinct.
Cycle property: Let $C$ be any cycle in $G$, and let $f$ be the max cost edge belonging to $C$. Then the MST $T^*$ does not contain $f$.
Proof by exchange argument:
  Suppose $f$ belongs to $T^*$, and let’s see what happens.
  Deleting $f$ from $T^*$ creates a cut $S$ in $T^*$.
  Edge $f$ is both in the cycle $C$ and in the cutset $D$ corresponding to $S$.
  Therefore, there exists another edge, say $e$, that is in both $C$ and $D$.
  $T' = T^* \cup \{e\} - \{f\}$ is also a spanning tree.
  Since $c_e < c_f$, $\mathrm{cost}(T') < \mathrm{cost}(T^*)$.
  This is a contradiction.

Prim’s algorithm

Prim’s algorithm [Jarník 1930, Dijkstra 1957, Prim 1959]:
  Initialize $S$ = any node.
  Apply the cut property to $S$.
  Add the min cost edge in the cutset corresponding to $S$ to $T$, and add one new explored node $u$ to $S$.
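The rule above (repeatedly add the cheapest edge in the cutset of S) can be sketched with Python's `heapq` module. This is one possible sketch, not the course's implementation: the name `prim_mst`, the 0-based node labels, and the `(u, v, cost)` edge triples are assumptions, and the heap holds candidate edges and skips stale ones instead of using a decrease-key operation.

```python
import heapq


def prim_mst(n, edges, root=0):
    """Grow a spanning tree from root by repeatedly taking the cheapest cut edge.

    n: number of nodes, labelled 0..n-1 (an assumption of this sketch).
    edges: list of undirected edges (u, v, cost); the graph is assumed connected.
    Returns the tree edges as (u, v, cost), in the order they were added.
    """
    adj = [[] for _ in range(n)]
    for u, v, c in edges:
        adj[u].append((c, u, v))
        adj[v].append((c, v, u))

    explored = [False] * n
    explored[root] = True
    heap = list(adj[root])  # candidate edges leaving the explored set
    heapq.heapify(heap)
    tree = []
    while heap and len(tree) < n - 1:
        c, u, v = heapq.heappop(heap)
        if explored[v]:
            continue  # stale entry: this edge no longer crosses the cut
        explored[v] = True
        tree.append((u, v, c))
        for edge in adj[v]:
            if not explored[edge[2]]:
                heapq.heappush(heap, edge)
    return tree


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
tree = prim_mst(4, edges)
print(tree, sum(c for _, _, c in tree))
```

Each edge is pushed at most twice, so this lazy variant still runs in O(m log n) time, at the price of a heap that can hold up to O(m) entries.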
Implementing Prim’s algorithm

Use a priority queue à la Dijkstra.
  Maintain a set of explored nodes $S$.
  For each unexplored node $v$, maintain the attachment cost $a[v]$ = cost of the cheapest edge from $v$ to a node in $S$.
  $O(mn)$ with an array; $O(m \log n)$ with a binary heap.

  Prim(G, c) {
    foreach (v ∈ V) a[v] = ∞
    Initialize an empty priority queue Q
    foreach (v ∈ V) insert v onto Q
    Initialize set of explored nodes S = ∅
    while (Q is not empty) {
      u = delete min element from Q
      S = S ∪ { u }
      foreach (edge e = (u, v) incident to u)
        if ((v ∉ S) and (c[e] < a[v]))
          decrease priority a[v] to c[e]
    }
  }

Kruskal’s algorithm

Kruskal’s algorithm [Kruskal, 1956]
Consider edges in ascending order of weight.
  Case 1: If adding $e$ to $T$ creates a cycle, discard $e$ according to the cycle property.
  Case 2: Otherwise, insert $e = (u, v)$ into $T$ according to the cut property, where $S$ = set of nodes in $u$’s connected component.

[Figure: Case 1, edge e closes a cycle; Case 2, edge e = (u, v) leaves the component S containing u.]

Implementing Kruskal’s algorithm

Implementation: Use the union-find data structure.
  Build the set $T$ of edges in the MST.
  Maintain a set for each connected component.
  $O(m \log n)$ for sorting and $O(m \, \alpha(m, n))$ for union-find; $\alpha(m, n)$ is essentially a constant.

  Kruskal(G, c) {
    Sort edge weights so that c_1 ≤ c_2 ≤ ... ≤ c_m
    T = ∅
    foreach (u ∈ V) make a set containing singleton u
    for i = 1 to m {
      (u, v) = e_i
      if (u and v are in different sets) {
        T = T ∪ {e_i}
        merge the sets containing u and v
      }
    }
    return T
  }

Exercise: Membership in MST

Suppose you are given a connected graph $G$ (edge costs are assumed to be distinct). A particular edge $e$ of $G$ is specified. Give a linear-time algorithm to decide if $e$ appears in an MST of $G$.
Solution idea

Observation: Edge $e = (v, w)$ does not belong to an MST if and only if $v$ and $w$ can be joined by a path made exclusively of edges cheaper than $e$.
Algorithm: Delete from $G$ the edge $e$, as well as all edges that are more expensive than $e$. Now check connectivity: $e$ belongs to the MST if and only if $v$ and $w$ are no longer connected.

Exercise: Unique MST

Suppose you are given a connected graph $G$ with edge costs that are all distinct. Prove that $G$ has a unique minimum spanning tree.

Exercise: Near-tree

Let us say a graph $G = (V, E)$ is a near-tree if it is connected and has at most $n + 8$ edges, where $n = |V|$. Give an algorithm with running time $O(n)$ that takes a near-tree with costs on its edges, and returns a minimum spanning tree of $G$. You may assume that all the edge costs are distinct.

Boruvka’s algorithm

  Boruvka(G):
    Initialize a forest T to be a set of one-node trees
    while T has more than one component:
      for each component C of T:
        S := empty set of edges
        for each node v in C:
          Find the cheapest edge from v to a node outside C, and add it to S
        Add the cheapest edge in S to T
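The pseudocode above can be sketched in Python as follows. This is a sketch under stated assumptions, not a definitive implementation: the name `boruvka_mst`, the 0-based node labels, and the `(u, v, cost)` edge triples are hypothetical. It finds the cheapest outgoing edge of every component with one pass over the edge list instead of iterating over the nodes of each component, and it tracks components with a simple union-find. Edge costs are assumed to be distinct, as in the rest of this unit.

```python
def boruvka_mst(n, edges):
    """Boruvka's algorithm on nodes 0..n-1 with undirected edges (u, v, cost).

    Assumes a connected graph with distinct edge costs (assumptions of this
    sketch). Returns the list of tree edges.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    components = n
    while components > 1:
        cheapest = {}  # component representative -> cheapest edge leaving it
        for u, v, c in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # both endpoints already in the same component
            for r in (ru, rv):
                if r not in cheapest or c < cheapest[r][2]:
                    cheapest[r] = (u, v, c)
        if not cheapest:
            raise ValueError("graph is not connected")
        for u, v, c in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # the same edge may be cheapest for both of its components
                parent[ru] = rv
                tree.append((u, v, c))
                components -= 1
    return tree


edges = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8)]
mst = boruvka_mst(4, edges)
print(sorted(mst), sum(c for _, _, c in mst))
```

Each round at least halves the number of components, so there are O(log n) rounds of O(m) edge scans each.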