CSE 431/531: Analysis of Algorithms
Lecture 4-8: Greedy Algorithms
Lecturer: Shi Li
Department of Computer Science and Engineering, University at Buffalo
Spring 2016, MoWeFr 3:00-3:50pm, Knox 110

Announcements
- There will be no lecture to replace the one we skipped last Friday.
- My office hours: 12:30-2:30pm, Mondays, 328 Davis Hall.
- Office hours of TA: 12:30-2:30pm, Fridays, 203 Davis Hall.

Outline
1 Interval Scheduling
2 Offline Caching
3 Data Compression and Huffman Code
  (Huffman Code; Efficient Implementation Using Heap)

Interval Scheduling

Input: n jobs, where job i has start time s_i and finish time f_i. Jobs i and j are compatible if the intervals [s_i, f_i) and [s_j, f_j) are disjoint.
Output: a maximum-size subset of mutually compatible jobs.
[Figure: jobs drawn as intervals on a time line from 0 to 9.]

Greedy Algorithm for Interval Scheduling

Which job is "safe" to include, and why?
- The job with the smallest size? No!
- The job conflicting with the smallest number of other jobs? No!
- The job with the earliest finish time? Yes!
[Figures: counterexamples on a time line from 0 to 9 for the first two rules.]

Lemma: It is safe to include the job j with the earliest finish time: there is an optimum solution that includes j.

Proof. Take an arbitrary optimum solution S. If it contains j, we are done. Otherwise, replace the first job in S (the one finishing earliest in S) with j to obtain a new schedule S′. Since j has the earliest finish time overall, j finishes no later than the replaced job, so S′ is still a set of mutually compatible jobs with |S′| = |S|, i.e., an optimum solution that includes j.
[Figure: schedules S, job j, and S′ on a time line.]

What is the remaining task after we have decided to schedule j? We must pick a maximum-size compatible subset of the jobs that do not conflict with j, i.e., the jobs starting at or after f_j. Is this another instance of the interval scheduling problem? Yes!

Schedule(s, f, n)
  A ← {1, 2, ..., n}, S ← ∅
  while A ≠ ∅
    j ← arg min_{j′ ∈ A} f_{j′}
    S ← S ∪ {j}; A ← {j′ ∈ A : s_{j′} ≥ f_j}
  return S

Running time of the algorithm?
- Naive implementation: O(n²) time.
- Clever implementation: O(n lg n) time (a runnable sketch is given after the design steps below).

Clever Implementation of Greedy Algorithm

Schedule(s, f, n)
  sort jobs according to f values
  t ← 0, S ← ∅
  for every j ∈ [n] in non-decreasing order of f_j
    if s_j ≥ t then
      S ← S ∪ {j}
      t ← f_j
  return S
[Figure: the algorithm run on 9 jobs over a time line from 0 to 9; the jobs are considered in the order 2, 5, 9, 4, 8, 3, 1, 7, 6 of their finish times.]

Steps of Designing Greedy Algorithms
1 Design a greedy choice.
2 Prove that it is "safe" to make the greedy choice: there is an optimum solution that is consistent with the greedy choice. This is usually done by an "exchange argument".
3 Show that the remaining task after applying the greedy choice is to solve a (or many) smaller instance(s) of the same problem. This step is usually trivial.
Then, by induction, the greedy algorithm gives an optimum solution.
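To make the clever O(n lg n) implementation concrete, here is a minimal Python sketch (our own illustration, not from the slides; the function name schedule and the (start, finish) pair representation are assumptions):

```python
def schedule(jobs):
    """Greedy interval scheduling.

    jobs: list of (start, finish) pairs; job i occupies [start, finish).
    Returns the indices of a maximum-size set of mutually compatible jobs.
    """
    # Consider jobs in non-decreasing order of finish time.
    order = sorted(range(len(jobs)), key=lambda j: jobs[j][1])
    selected, t = [], 0  # t = finish time of the last selected job;
                         # t = 0 assumes nonnegative start times, as on the slides
    for j in order:
        s, f = jobs[j]
        if s >= t:            # job j is compatible with everything selected
            selected.append(j)
            t = f
    return selected

# Example: four jobs; jobs 0, 1 and 3 are mutually compatible.
print(schedule([(0, 3), (3, 5), (4, 7), (5, 9)]))  # -> [0, 1, 3]
```

Sorting dominates the cost, so the sketch runs in O(n lg n) time; the scan afterwards is a single O(n) pass.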
Offline Caching
- A cache can store k pages.
- Requests arrive one by one, each requesting a page (T requests in total).
- Cache hit: the requested page is already in the cache.
- Cache miss: the requested page is not in the cache; we must bring the requested page into the cache and evict some existing page.
- Goal: minimize the number of cache misses.
- Decisions: when a cache miss happens, decide which page to evict.

Offline Caching: Example
[Figure: two eviction schedules for the same request sequence, starting from an empty cache (⊥, ⊥, ⊥) of size 3; one schedule incurs 7 misses, the other 6.]

Offline Caching Problem
Input:
  k: the size of the cache
  n: the number of pages
  ρ_1, ρ_2, ρ_3, ..., ρ_T ∈ [n]: the sequence of requests
Output:
  i_1, i_2, i_3, ..., i_T ∈ {0} ∪ [n]: the indices of the pages to evict (0 means evicting no page)

Offline Caching: Potential Greedy Algorithms
- FIFO (First-In-First-Out): always evict the page that entered the cache first.
- LRU (Least-Recently-Used): evict the page whose most recent access was earliest.
- LFU (Least-Frequently-Used): evict the page that was least frequently requested.
None of these algorithms is optimum!

FIFO does not give an optimum solution:
[Figure: starting from an empty cache of size 3, FIFO on the request sequence 1, 2, 3, 4, 1 evicts page 1 to serve the request for 4, and then misses again on the final request for 1.]

Optimum Offline Caching
Furthest-in-Future (FF): evict the page that is not requested until furthest in the future.
[Figure: on the sequence 1, 2, 3, 4, 1, FIFO incurs 5 misses while FF incurs 4. A second example runs FF on the request sequence 1, 5, 4, 2, 5, 3, 2, 4, 3, 1, 5, 3 with a cache of size 3.]

Online vs Offline
- Online algorithms: decisions must be made before seeing future requests. FIFO, LRU and LFU are online algorithms.
- Offline algorithms: the whole request sequence is known ahead of time. FF is an offline algorithm.
Which one is more realistic? Online algorithms. Why study offline algorithms then? Competitive analysis: compare an online algorithm against the best offline algorithm.

Recall the steps of designing greedy algorithms: design a greedy choice; prove that it is safe (usually by an exchange argument); show that the remaining task is a smaller instance of the same problem. To make the remaining task after the first request again an instance of the same problem, we re-state the problem with the initial cache contents as part of the input:

Offline Caching Problem
Input:
  k: the size of the cache
  n: the number of pages
  ρ_1, ρ_2, ρ_3, ..., ρ_T ∈ [n]: the sequence of requests
  p_1, p_2, ..., p_k ∈ {⊥} ∪ [n]: the initial set of pages in the cache (⊥ denotes an empty slot)
Output:
  i_1, i_2, i_3, ..., i_T ∈ {0, ⊥} ∪ [n], where 0 means evicting no page and ⊥ means "evicting" an empty slot.

Lemma: Assume that at time 1 a page fault happens. Then it is safe to evict the page that is not requested until furthest in the future.

Proof idea (exchange argument): take any optimum solution S. If at time 1 it evicts some page other than the furthest-in-future page, transform it into a solution S′ that evicts the furthest-in-future page at time 1 and afterwards mimics S with the roles of the two pages swapped; S′ incurs no more misses than S.
[Figure: the caches of S and S′ shown side by side over the request sequence.]
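To see the Furthest-in-Future rule in action, here is a small Python simulation (our own sketch, not from the slides; the function name furthest_in_future and the list-of-page-ids input format are assumptions). For example, on the request sequence 1, 2, 3, 4, 1 with k = 3, FF incurs 4 misses while FIFO incurs 5.

```python
def furthest_in_future(requests, k):
    """Simulate Furthest-in-Future eviction; return the number of misses.

    requests: sequence of page ids; k: cache size; the cache starts empty.
    """
    cache = set()
    misses = 0
    for t, page in enumerate(requests):
        if page in cache:
            continue                      # cache hit
        misses += 1                       # cache miss: page must be brought in
        if len(cache) == k:               # cache full: evict one page
            def next_use(p):
                # Index of the next request for p; infinity if never again.
                for u in range(t + 1, len(requests)):
                    if requests[u] == p:
                        return u
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return misses

print(furthest_in_future([1, 2, 3, 4, 1], 3))  # -> 4 (FIFO would incur 5)
```

The linear scan in next_use keeps the sketch simple but makes it O(T²) in the worst case; precomputing the next occurrence of each page in one backward pass over the requests would avoid the repeated scans.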
Data Compression and Huffman Code

Example
[Figure: a Huffman tree for the frequencies A: 27, B: 15, C: 11, D: 9, E: 8, F: 5. The root carries the total 75, with children 47 and 28; leaves C, D and E, F hang below internal nodes with sums 20 and 13. Left edges are labeled 0, right edges 1.]
The resulting code:
  A : 00   B : 10   C : 010   D : 011   E : 110   F : 111

Algorithm for Huffman Codes

Huffman-tree(k, f)
  S ← {1, 2, 3, ..., k}
  r ← k
  while |S| > 1
    i_1 ← arg min_{i ∈ S} f(i)
    i_2 ← arg min_{i ∈ S \ {i_1}} f(i)
    r ← r + 1
    f(r) ← f(i_1) + f(i_2)
    lchild[r] ← i_1, rchild[r] ← i_2
    S ← S \ {i_1, i_2} ∪ {r}
  return (r, lchild, rchild)

Algorithm using Priority Queue

Huffman-tree(k, f)
  H ← build-pqueue({1, 2, 3, ..., k}, f)
  r ← k
  while H contains more than one element
    i_1 ← extract-min(H)
    i_2 ← extract-min(H)
    r ← r + 1
    f(r) ← f(i_1) + f(i_2)
    lchild[r] ← i_1, rchild[r] ← i_2
    insert(H, r)
  return (r, lchild, rchild)

Efficient Implementation Using Heap

Priority Queue
Def. A priority queue is a data structure that maintains a set S of elements, each v ∈ S with a key value key(v), and supports the following operations:
- get-min: return the element v ∈ S with the smallest key(v)
- insert: insert an element v into S
- delete: delete an element from S
- ...

Simple Implementations of Priority Queues

  data structure               get-min    insert     delete
  array                        O(n)       O(1)       O(n)
  sorted array                 O(1)       O(n)       O(n)
  sorted doubly-linked list    O(1)       O(n)       O(1)
  heap                         O(1)       O(lg n)    O(lg n)

Heap
The elements in a heap are organized using a complete binary tree:
- Vertices are indexed as {1, 2, 3, ..., n}.
- Parent of vertex i: ⌊i/2⌋.
- Left child of vertex i: 2i; right child of vertex i: 2i + 1.
[Figure: a complete binary tree on 10 vertices, indexed 1 through 10 level by level.]

A heap H satisfies the following property: for any two vertices i and j such that i is the parent of j, we have key(H[i]) ≤ key(H[j]). Here H[i] denotes the element at vertex i of the tree for the heap H.
[Figure: a heap; the numbers in the circles denote key values.]

get-min(H)
  return H[1]

insert(H, v)
  H.size ← H.size + 1
  H[H.size] ← v
  heapify-up(H, H.size)

delete(H, i)  \\ need to delete H[i]
  H[i] ← H[H.size]
  H.size ← H.size − 1
  if i ≤ H.size then
    heapify-up(H, i)
    heapify-down(H, i)

[Figure: inserting an element into a heap; heapify-up repairs the heap property along the leaf-to-root path.]

heapify-up(H, i)
  while i > 1
    j ← ⌊i/2⌋
    if key(H[i]) < key(H[j]) then
      swap H[i] and H[j]
      i ← j
    else break

[Figure: deleting an element from a heap; heapify-down pushes the replacement key downwards.]

heapify-down(H, i)
  while 2i ≤ H.size
    if 2i = H.size or key(H[2i]) < key(H[2i + 1]) then
      j ← 2i
    else
      j ← 2i + 1
    if key(H[j]) < key(H[i]) then
      swap the array entries H[i] and H[j]
      i ← j
    else break

Proof for the Correctness of Insertion and Deletion
Def. We say that H is almost a heap, except that key(H[i]) is too small, if we can increase the key value of H[i] to make H a heap.
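The heap pseudocode above translates almost line by line into Python. Below is a minimal sketch (our own illustration; it stores bare keys rather than elements with separate key values, and keeps the array 1-indexed so the parent/child arithmetic matches the pseudocode):

```python
class MinHeap:
    """1-indexed binary min-heap following the lecture's pseudocode."""

    def __init__(self):
        self.a = [None]  # slot 0 is unused; the root lives at index 1

    @property
    def size(self):
        return len(self.a) - 1

    def get_min(self):
        return self.a[1]  # the heap property puts the minimum at the root

    def insert(self, key):
        self.a.append(key)           # place key at the first free leaf
        self._heapify_up(self.size)  # repair the heap property upwards

    def delete(self, i):
        self.a[i] = self.a[self.size]  # move the last element into slot i
        self.a.pop()
        if i <= self.size:
            self._heapify_up(i)    # the moved key may be too small here...
            self._heapify_down(i)  # ...or too large; at most one loop acts

    def extract_min(self):
        m = self.a[1]
        self.delete(1)
        return m

    def _heapify_up(self, i):
        while i > 1:
            j = i // 2  # parent of vertex i
            if self.a[i] < self.a[j]:
                self.a[i], self.a[j] = self.a[j], self.a[i]
                i = j
            else:
                break

    def _heapify_down(self, i):
        while 2 * i <= self.size:
            # j = the child of i with the smaller key
            if 2 * i == self.size or self.a[2 * i] < self.a[2 * i + 1]:
                j = 2 * i
            else:
                j = 2 * i + 1
            if self.a[j] < self.a[i]:
                self.a[i], self.a[j] = self.a[j], self.a[i]
                i = j
            else:
                break

# Usage: get-min is O(1); insert and delete are O(lg n).
h = MinHeap()
for x in [5, 3, 8, 1]:
    h.insert(x)
print(h.get_min())      # -> 1
print(h.extract_min())  # -> 1; extract-min = get-min followed by delete(1)
```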
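Finally, the priority-queue version of Huffman-tree above can be run directly. The sketch below (again our own illustration) uses Python's built-in heapq module in place of the heap above, and returns the tree as nested tuples rather than lchild/rchild arrays; with k symbols it performs k − 1 merges, each costing O(lg k) heap time, for O(k lg k) in total.

```python
import heapq

def huffman_tree(freq):
    """Build a Huffman tree for the given symbol frequencies.

    freq: dict mapping symbol -> frequency.
    Returns the tree as nested pairs (left, right); leaves are symbols.
    """
    # Heap entries are (frequency, tiebreaker, subtree); the unique integer
    # tiebreaker prevents Python from ever comparing two subtrees directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # two subtrees of smallest frequency
        f2, _, t2 = heapq.heappop(heap)
        counter += 1
        heapq.heappush(heap, (f1 + f2, counter, (t1, t2)))
    return heap[0][2]

def codes(tree, prefix=""):
    """Read the codewords off the tree: left edge = 0, right edge = 1."""
    if not isinstance(tree, tuple):  # a leaf holds a single symbol
        return {tree: prefix}
    left, right = tree
    return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

# The example frequencies from the slides: A and B receive 2-bit codewords,
# the four rarer symbols receive 3-bit codewords.
tree = huffman_tree({"A": 27, "B": 15, "C": 11, "D": 9, "E": 8, "F": 5})
print(codes(tree))
```

The exact codewords depend on tie-breaking and on the left/right orientation of each merge; only the codeword lengths, and hence the total encoding length, are determined by the frequencies.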