CSCE 411
Design and Analysis of Algorithms
Set 4: Greedy Algorithms
Prof. Jennifer Welch
Spring 2012

CSCE 411, Spring 2012: Set 4

Greedy Algorithm Paradigm
- Characteristics of greedy algorithms:
  - make a sequence of choices
  - each choice is the one that seems best so far, only depends on what's been done so far
  - choice produces a smaller problem to be solved
- In order for greedy heuristic to solve the problem, it must be that the optimal solution to the big problem contains optimal solutions to subproblems

Designing a Greedy Algorithm
- Cast the problem so that we make a greedy (locally optimal) choice and are left with one subproblem
- Prove there is always a (globally) optimal solution to the original problem that makes the greedy choice
- Show that the choice together with an optimal solution to the subproblem gives an optimal solution to the original problem

Some Greedy Algorithms
- fractional knapsack algorithm
- Huffman codes
- Kruskal's MST algorithm
- Prim's MST algorithm
- Dijkstra's SSSP algorithm
- …

Knapsack Problem
- There are n different items in a store
- Item i:
  - weighs w_i pounds
  - worth $v_i
- A thief breaks in
- Can carry up to W pounds in his knapsack
- What should he take to maximize the value of his haul?

0-1 vs. Fractional Knapsack
- 0-1 Knapsack Problem:
  - the items cannot be divided
  - thief must take entire item or leave it behind
- Fractional Knapsack Problem:
  - thief can take partial items
  - for instance, items are liquids or powders
  - solvable with a greedy algorithm…

Greedy Fractional Knapsack Algorithm
- Sort items in decreasing order of value per pound
- While still room in the knapsack (limit of W pounds) do
  - consider next item in sorted list
  - take as much as possible (all there is or as much as will fit)
- O(n log n) running time (for the sort)

Greedy 0-1 Knapsack Alg?
- 3 items:
  - item 1 weighs 10 lbs, worth $60 ($6/lb)
  - item 2 weighs 20 lbs, worth $100 ($5/lb)
  - item 3 weighs 30 lbs, worth $120 ($4/lb)
- knapsack can hold 50 lbs
- greedy strategy:
  - take item 1
  - take item 2
  - no room for item 3

0-1 Knapsack Problem
- Taking item 1 is a big mistake globally although it looks good locally
  - greedy haul: items 1 and 2, 30 lbs, $160
  - better haul: items 2 and 3, 50 lbs, $220
- Later we'll see a different algorithm design paradigm that does work for this problem

Finding Optimal Code
- Input: data file of characters and number of occurrences of each character
- Output: a binary encoding of each character so that the data file can be represented as efficiently as possible ("optimal code")

Huffman Code
- Idea: use short codes for more frequent characters and long codes for less frequent

  char       a    b    c    d    e     f     total bits
  #          45   13   12   16   9     5
  fixed      000  001  010  011  100   101   300
  variable   0    101  100  111  1101  1100  224

How to Decode?
- With fixed length code, easy: break up into 3's, for instance
- For variable length code, ensure that no character's code is the prefix of another
  - no ambiguity
- Example: 101111110100 splits as 101 111 1101 0 0 and decodes to b d e a a

Binary Tree Representation
- [Figure: binary tree for the fixed length code; edges labeled 0 and 1, leaves a, b, c, d, e, f all at depth 3]
- [Figure: binary tree for the variable length code; edges labeled 0 and 1, leaf a at depth 1, leaves c, b, d at depth 3, leaves f, e at depth 4]
- cost of code is sum, over all chars c, of number of occurrences of c times depth of c in the tree

Algorithm to Construct Tree Representing Huffman Code
- Given set C of n chars, c occurs f[c] times

  insert each c into priority queue Q using f[c] as key
  for i := 1 to n-1 do
      x := extract-min(Q)
      y := extract-min(Q)
      make a new node z w/ left child x (label edge 0), right child y (label edge 1), and f[z] = f[x] + f[y]
      insert z into Q

<board work>

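The tree-construction pseudocode above can be sketched in Python. This is a minimal sketch, not the course's reference implementation: the names `huffman_codes` and `cost` are made up here, the priority queue is Python's `heapq`, characters are assumed to be strings, and ties between equal frequencies are assumed to be broken by insertion order.

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {char: bit string} by repeatedly merging the two least frequent nodes."""
    order = count()  # assumption: ties between equal keys are broken by insertion order
    # Heap entries are (frequency, tie-breaker, tree).
    # A tree is either a character (leaf) or a (left, right) pair (internal node).
    heap = [(f, next(order), c) for c, f in freq.items()]
    heapq.heapify(heap)
    for _ in range(len(freq) - 1):
        fx, _tx, x = heapq.heappop(heap)  # x := extract-min(Q)
        fy, _ty, y = heapq.heappop(heap)  # y := extract-min(Q)
        # New node z with left child x, right child y, f[z] = f[x] + f[y]
        heapq.heappush(heap, (fx + fy, next(order), (x, y)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left edge is labeled 0
            walk(node[1], prefix + "1")  # right edge is labeled 1
        else:
            codes[node] = prefix or "0"  # single-character input still gets one bit

    walk(root, "")
    return codes


def cost(freq, codes):
    """Sum over all chars of occurrences times code length (= depth in the tree)."""
    return sum(freq[c] * len(codes[c]) for c in freq)


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman_codes(freq)
print(codes, cost(freq, codes))
```

On the frequencies from the Huffman Code table, this sketch reproduces the variable length codes shown there, for a total of 224 bits.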
Graphs
- Directed graph G = (V,E)
  - V is a finite set of vertices
  - E is the edge set, a binary relation on V (ordered pairs of vertices)
- In an undirected graph, edges are unordered pairs of vertices
- If (u,v) is in E, then v is adjacent to u and (u,v) is incident from u and incident to v; v is a neighbor of u
  - in an undirected graph, u is also adjacent to v, and (u,v) is incident on u and v
- Degree of a vertex is number of edges incident from or to (or on) the vertex

More on Graphs
- A path of length k from vertex u to vertex u' is a sequence of k+1 vertices, starting with u and ending with u', s.t. there is an edge from each vertex to the next in the sequence
  - In this case, u' is reachable from u
- A path is simple if no vertices are repeated
- A path forms a cycle if last vertex = first vertex
- A cycle is simple if no vertices are repeated other than first and last

Graph Representations
- So far, we have discussed graphs purely as mathematical concepts
- How can we represent a graph in a computer program? We need some data structures.
- Two most common representations are:
  - adjacency list: best for sparse graphs (few edges)
  - adjacency matrix: best for dense graphs (lots of edges)

Adjacency List Representation
- Given graph G = (V,E)
- array Adj of |V| linked lists, one for each vertex in V
- The adjacency list for vertex u, Adj[u], contains all vertices v s.t.
  (u,v) is in E
- each vertex in the list might be just a string, representing its name, or it might be some other data structure representing the vertex, or it might be a pointer to the data structure for that vertex
- Requires Θ(V+E) space (memory)
- Requires Θ(degree(u)) time in the worst case to check if (u,v) is in E

Adjacency Matrix Representation
- Given graph G = (V,E)
- Number the vertices 1, 2, ..., |V|
- Use a |V| x |V| matrix A with (i,j)-entry of A being
  - 1 if (i,j) is in E
  - 0 if (i,j) is not in E
- Requires Θ(V^2) space (memory)
- Requires Θ(1) time to check if (u,v) is in E

More on Graph Representations
- Read Ch 22, Sec 1 for
  - more about directed vs. undirected graphs
  - how to represent weighted graphs (some value is associated with each edge)
  - pros and cons of adjacency list vs. adjacency matrix

Minimum Spanning Tree
- [Figure: example connected undirected graph with edge weights 2 through 18]
- Given a connected undirected graph with edge weights, find subset of edges that spans all the nodes, creates no cycle, and minimizes sum of weights

Facts About MSTs
- There can be many spanning trees of a graph
- In fact, there can be many minimum spanning trees of a graph
- But if every edge has a unique weight, then there is a unique MST

Uniqueness of MST
- Suppose for contradiction that every edge has a unique weight but there are 2 MSTs, M1 and M2.
- Let e be the edge with minimum weight that is in one MST but not the other (say it is in M1).
- If e is added to M2, a cycle is formed.
- Let e' be an edge in the cycle that is not in M1 (one exists, since M1 contains no cycle)
- [Figure: M2 with e added; e is in M1 but not M2; e' is in M2 but not M1]
- wt of e' is greater than wt of e, by choice of e and uniqueness of weights
- Replacing e' with e in M2 creates a new spanning tree M3 whose weight is less than that of M2, contradiction

Kruskal's MST algorithm
- [Figure: example connected undirected graph with edge weights 2 through 18]
- consider the edges in increasing order of weight, add in an edge iff it does not cause a cycle

Why is Kruskal's Greedy?
- Algorithm manages a set of edges s.t. these edges are a subset of some MST
- At each iteration:
  - choose an edge so that the MST-subset property remains true
  - subproblem left is to do the same with the remaining edges
- Always try to add cheapest available edge that will not violate the tree property
  - locally optimal choice

Correctness of Kruskal's Alg.
- Let e_1, e_2, …, e_{n-1} be sequence of edges chosen
- Clearly they form a spanning tree
- Suppose it is not minimum weight
- Let e_i be the edge where the algorithm goes wrong
  - {e_1,…,e_{i-1}} is part of some MST M
  - but {e_1,…,e_i} is not part of any MST
- [Figure: gray edges are part of MST M, which contains e_1 to e_{i-1}, but not e_i]
- Adding e_i to M forms a cycle
- e*: min wt. edge in the cycle, other than e_i, that is not in e_1 to e_{i-1}
- wt(e*) > wt(e_i), since e* creates no cycle with e_1 to e_{i-1} and so was also available when the algorithm chose e_i
- replacing e* w/ e_i forms a spanning tree with smaller weight than M, contradiction!

Note on Correctness Proof
- Argument on previous slide works for case when every edge has a unique weight
- Algorithm also works when edge weights are not necessarily unique
- Modify proof on previous slide: with ties, wt(e*) >= wt(e_i), so the swap gives a spanning tree no heavier than M, i.e., an MST containing e_1 to e_i; this contradicts the assumption that {e_1,…,e_i} is not part of any MST

Implementing Kruskal's Alg.
- Sort edges by weight
  - efficient algorithms known
- How to test quickly if adding in the next edge would cause a cycle?
  - use disjoint set data structure (covered later)

Another Greedy MST Alg.
- Kruskal's algorithm maintains a forest that grows until it forms a spanning tree
- Alternative idea is to keep just one tree and grow it until it spans all the nodes
- Prim's algorithm
- At each iteration, choose the minimum weight outgoing edge (one endpoint in the tree, the other outside) to add
  - greedy!
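Both MST algorithms can be sketched side by side in Python. This is a rough sketch under stated assumptions, not the course's implementation: the function names `kruskal` and `prim`, the `(weight, u, v)` edge format, the vertex numbering 0..n-1, and the small test graph are all made up for illustration. The disjoint-set structure is a bare-bones version (path halving only, no union by rank), and Prim's frontier is a `heapq` of candidate edges.

```python
import heapq


def kruskal(n, edges):
    """Kruskal: scan edges by increasing weight, keep an edge iff it joins two components."""
    parent = list(range(n))  # simple disjoint-set forest (assumed; no union by rank)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # different components, so adding (u, v) causes no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def prim(n, edges, start=0):
    """Prim: grow one tree from `start`, always adding the cheapest outgoing edge."""
    adj = [[] for _ in range(n)]
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree = [False] * n
    in_tree[start] = True
    frontier = list(adj[start])  # candidate edges leaving the tree
    heapq.heapify(frontier)
    tree = []
    while frontier and len(tree) < n - 1:
        w, u, v = heapq.heappop(frontier)
        if in_tree[v]:
            continue  # stale entry: both endpoints are already in the tree
        in_tree[v] = True
        tree.append((w, u, v))
        for edge in adj[v]:
            if not in_tree[edge[2]]:
                heapq.heappush(frontier, edge)
    return tree


# Hypothetical 5-vertex graph with unique edge weights, given as (weight, u, v).
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 3, 4), (6, 1, 4), (7, 0, 3)]
print(kruskal(5, edges))
print(prim(5, edges))
```

Since the weights in the sample graph are unique, both functions should select the same set of edges, as the uniqueness argument predicts.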