CSE 431/531: Analysis of Algorithms
Lecture 4-8: Greedy Algorithms
Lecturer: Shi Li
Department of Computer Science and Engineering
University at Buffalo
Spring 2016
MoWeFr 3:00-3:50pm
Knox 110
Announcements
There will be no make-up lecture for the one we skipped last Friday
My office hours: 12:30-2:30pm, Mondays, 328 Davis Hall
Office hours of TA: 12:30-2:30pm, Fridays, 203 Davis Hall
Outline
1 Interval Scheduling
2 Offline Caching
3 Data Compression and Huffman Code
    Huffman Code
    Efficient Implementation Using Heap
Interval Scheduling
Input: n jobs, job i with start time s_i and finish time f_i
i and j are compatible if [s_i, f_i) and [s_j, f_j) are disjoint
Output: a maximum-size subset of mutually compatible jobs
[Figure: an example instance of jobs drawn as intervals on a timeline from 0 to 9.]
Greedy Algorithm for Interval Scheduling
Which job is “safe” to be included, and why?
The job with the smallest size? No!
The job conflicting with the smallest number of other jobs? No!
The job with the earliest finish time? Yes!
[Figure: timeline examples over 0–9 giving counterexamples for the first two rules.]
Greedy Algorithm for Interval Scheduling
Lemma It is safe to include the job j with the earliest finish time: there is an optimum solution that includes j.
Proof.
Take an arbitrary optimum solution S.
If it contains j, we are done.
Otherwise, replace the first job in S with j to obtain a new schedule S′. Since j finishes no later than the first job of S, job j does not conflict with any other job in S, so S′ is a valid schedule of the same size and hence also optimum.
[Figure: the schedule S, the job j, and the exchanged schedule S′ on a timeline.]
Greedy Algorithm for Interval Scheduling
What is the remaining task after we decided to schedule j?
Is it another instance of the interval scheduling problem? Yes: we must schedule the jobs that do not conflict with j, i.e., those starting at or after f_j.
[Figure: the remaining jobs on the 0–9 timeline after fixing j.]
Greedy Algorithm for Interval Scheduling
Schedule(s, f, n)
  A ← {1, 2, …, n}, S ← ∅
  while A ≠ ∅
    j ← argmin_{j′ ∈ A} f_{j′}
    S ← S ∪ {j}; A ← {j′ ∈ A : s_{j′} ≥ f_j}
  return S
[Figure: a run of the algorithm on the 0–9 timeline.]
Running time of the algorithm?
Naive implementation: O(n²) time
Clever implementation: O(n lg n) time
Clever Implementation of Greedy Algorithm
Schedule(s, f, n)
  sort jobs according to f values
  t ← 0, S ← ∅
  for every j ∈ [n] according to non-decreasing order of f_j
    if s_j ≥ t then
      S ← S ∪ {j}
      t ← f_j
  return S
[Figure: the jobs of the running example on the 0–9 timeline, numbered in order of finish time.]
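The clever implementation translates directly into a short program. Below is a minimal Python sketch; the (start, finish) pair representation and the sample instance are illustrative assumptions, not from the slides.

    def schedule(jobs):
        """Return the indices of a maximum-size set of mutually compatible jobs.
        jobs: list of (start, finish) pairs, read as intervals [start, finish)."""
        # Process jobs in non-decreasing order of finish time.
        order = sorted(range(len(jobs)), key=lambda j: jobs[j][1])
        selected = []
        t = float("-inf")  # finish time of the last selected job
        for j in order:
            s, f = jobs[j]
            if s >= t:  # job j is compatible with everything selected so far
                selected.append(j)
                t = f
        return selected

    # A made-up instance with three mutually compatible jobs:
    print(schedule([(0, 3), (3, 6), (6, 9), (1, 8)]))  # prints [0, 1, 2]

Sorting dominates, so the sketch runs in O(n lg n) time, matching the analysis above.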
Steps of Designing Greedy Algorithms
1 Design a greedy choice.
2 Prove it is “safe” to make the greedy choice: there is an optimum solution that is consistent with the greedy choice.
    Usually done by an “exchange argument”.
3 Show that the remaining task after applying the greedy choice is to solve a (many) smaller instance(s) of the same problem.
    This step is usually trivial.
Then by induction, the greedy algorithm gives an optimum solution.
Outline
1 Interval Scheduling
2 Offline Caching
3 Data Compression and Huffman Code
    Huffman Code
    Efficient Implementation Using Heap
Offline Caching
Cache that can store k pages
Sequence of m requests, each requesting a page
Cache hit: requested page already in cache
Cache miss: requested page not in cache; bring the requested page into the cache and evict some existing page
Goal: minimize the number of cache misses.
Decisions: when a cache miss happens, decide which page to
evict
Offline Caching: Example
[Figure: a cache of size 3 serving a request sequence; one eviction strategy incurs 7 misses, while a better one incurs only 6.]
Offline Caching Problem
Input: k: the size of the cache
  n: the number of pages
  ρ_1, ρ_2, ρ_3, …, ρ_T ∈ [n]: the sequence of requests
Output: i_1, i_2, i_3, …, i_T ∈ {0} ∪ [n]: indices of pages to evict (0 means evicting no page)
Offline Caching: Potential Greedy Algorithms
FIFO (First-In-First-Out): always evict the page that entered the cache first
LRU (Least-Recently-Used): evict the page whose most recent access was earliest
LFU (Least-Frequently-Used): evict the page that was least frequently requested
None of these algorithms is optimum!
FIFO does not give optimum solution
[Figure: a size-3 cache on requests 1, 2, 3, 4, …; after the miss on page 4, FIFO evicts page 1 and pays an extra miss when page 1 is requested again.]
Optimum Offline Caching
Furthest-in-Future (FF): evict the item that is not requested until furthest in the future
[Figure: FIFO vs. FF on the same request sequence: FIFO incurs 5 misses while FF incurs 4. A second table traces FF on the request sequence 1, 5, 4, 2, 5, 3, 2, 4, 3, 1, 5, 3 with a size-3 cache.]
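Once the whole request sequence is known, the FF rule is easy to simulate. Here is a sketch in Python; the empty initial cache and the sample sequence are assumptions, and the linear scan for the next use is written for clarity rather than efficiency.

    def furthest_in_future(requests, k):
        """Simulate Furthest-in-Future; return the number of cache misses."""
        cache = set()
        misses = 0
        for t, p in enumerate(requests):
            if p in cache:
                continue  # cache hit
            misses += 1
            if len(cache) == k:  # cache full: apply the FF eviction rule
                def next_use(q):
                    # Index of q's next request after time t; a page that is
                    # never requested again is the ideal eviction candidate.
                    for u in range(t + 1, len(requests)):
                        if requests[u] == q:
                            return u
                    return float("inf")
                cache.remove(max(cache, key=next_use))
            cache.add(p)
        return misses

    # Made-up sequence with a size-3 cache: FF pays 4 misses
    # (three cold misses plus one for page 4); FIFO would pay 6.
    print(furthest_in_future([1, 2, 3, 4, 1, 2], 3))  # prints 4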
Online vs. Offline
Online algorithms: decisions must be made before seeing future requests
  FIFO (First-In-First-Out): always evict the page that entered the cache first
  LRU (Least-Recently-Used): evict the page whose most recent access was earliest
  LFU (Least-Frequently-Used): evict the page that was least frequently requested
Offline algorithms: all requests are known ahead of time
  FF (Furthest-in-Future): evict the item that is not requested until furthest in the future
Online vs. Offline
Online algorithms: decisions must be made before seeing future requests
Offline algorithms: all requests are known ahead of time
Which setting is more realistic? Online algorithms.
Why study offline algorithms?
Competitive analysis: compare an online algorithm against the best offline algorithm.
Recall: Steps of Designing Greedy Algorithms
1 Design a greedy choice.
2 Prove it is “safe” to make the greedy choice: there is an optimum solution that is consistent with the greedy choice.
    Usually done by an “exchange argument”.
3 Show that the remaining task after applying the greedy choice is to solve a (many) smaller instance(s) of the same problem.
    This step is usually trivial.
Offline Caching Problem
Input: k: the size of the cache
  n: the number of pages
  ρ_1, ρ_2, ρ_3, …, ρ_T ∈ [n]: the sequence of requests
  p_1, p_2, …, p_k ∈ {⊥} ∪ [n]: the initial set of pages in the cache
Output: i_1, i_2, i_3, …, i_T ∈ {0, ⊥} ∪ [n]
  0 means evicting no page
  ⊥ means “evicting” an empty page
Lemma Assume a page fault happens at time 1. It is safe to evict the page that is not requested until furthest in the future.
[Figure: exchange argument on two request sequences, comparing an optimum solution S with the modified solution S′ that makes the furthest-in-future eviction at time 1.]
Outline
1 Interval Scheduling
2 Offline Caching
3 Data Compression and Huffman Code
    Huffman Code
    Efficient Implementation Using Heap
Example
[Figure: a Huffman coding tree with edges labeled 0 and 1; the leaves are the characters A–F with their frequencies, and each internal node carries the total frequency of its subtree (75 at the root).]
The resulting code: A : 00, B : 10, C : 010, D : 011, E : 110, F : 111
Algorithm for Huffman Codes
Huffman-tree(k, f)
  S ← {1, 2, …, k}
  r ← k
  while |S| > 1
    i_1 ← argmin_{i ∈ S} f(i)
    i_2 ← argmin_{i ∈ S \ {i_1}} f(i)
    r ← r + 1
    f(r) ← f(i_1) + f(i_2)
    lchild[r] ← i_1, rchild[r] ← i_2
    S ← (S \ {i_1, i_2}) ∪ {r}
  return (r, lchild, rchild)
Algorithm using Priority Queue
Huffman-tree(k, f)
  H ← build-pqueue({1, 2, …, k}, f)
  r ← k
  while H.size > 1
    i_1 ← extract-min(H)
    i_2 ← extract-min(H)
    r ← r + 1
    f(r) ← f(i_1) + f(i_2)
    lchild[r] ← i_1, rchild[r] ← i_2
    insert(H, r)
  return (r, lchild, rchild)
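With Python's heapq module standing in for the priority queue, the algorithm can be sketched as follows; node numbering mirrors the pseudocode (leaves 1..k, internal nodes k+1, k+2, …), and the sample frequencies are made up.

    import heapq

    def huffman_tree(freq):
        """freq: dict mapping leaf index 1..k to its frequency.
        Returns (root, lchild, rchild) as in the pseudocode."""
        k = len(freq)
        f = dict(freq)
        lchild, rchild = {}, {}
        heap = [(f[i], i) for i in range(1, k + 1)]
        heapq.heapify(heap)  # build-pqueue
        r = k
        while len(heap) > 1:  # repeatedly merge the two lightest trees
            f1, i1 = heapq.heappop(heap)  # extract-min
            f2, i2 = heapq.heappop(heap)  # second extract-min
            r += 1
            f[r] = f1 + f2
            lchild[r], rchild[r] = i1, i2
            heapq.heappush(heap, (f[r], r))  # insert the merged tree
        return r, lchild, rchild

    # Hypothetical frequencies for six characters numbered 1..6:
    root, L, R = huffman_tree({1: 20, 2: 15, 3: 13, 4: 11, 5: 9, 6: 8})

Each of the k − 1 iterations does O(lg k) heap work, so the total running time is O(k lg k).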
Outline
1 Interval Scheduling
2 Offline Caching
3 Data Compression and Huffman Code
    Huffman Code
    Efficient Implementation Using Heap
Priority Queue
A priority queue is a data structure that maintains
  a set S of elements,
  each v ∈ S with a key value key(v),
and supports the following operations:
  get-min: return the element v ∈ S with the smallest key(v)
  insert: insert an element v into S
  delete: delete an element from S
  …
Simple Implementations of Priority Queues
data structure              get-min   insert    delete
array                       O(n)      O(1)      O(n)
sorted array                O(1)      O(n)      O(n)
sorted doubly-linked-list   O(1)      O(n)      O(1)
heap                        O(1)      O(lg n)   O(lg n)
Heap
The elements in a heap are organized using a complete binary tree:
[Figure: a complete binary tree with vertices numbered 1–10, level by level, left to right.]
Vertices are indexed as {1, 2, 3, …, n}
Parent of vertex i: ⌊i/2⌋
Left child of vertex i: 2i
Right child of vertex i: 2i + 1
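With this 1-indexed layout the tree needs no pointers; navigation is pure index arithmetic, as in this small Python sketch.

    def parent(i):
        return i // 2  # ⌊i/2⌋

    def left(i):
        return 2 * i

    def right(i):
        return 2 * i + 1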
Heap
A heap H satisfies the following property: for any two vertices i, j such that i is the parent of j, we have key(H[i]) ≤ key(H[j]).
H[i]: the element at vertex i of the tree for the heap H
[Figure: a heap; numbers in the circles denote key values.]
get-min(H)
  return H[1]

insert(H, v)
  H.size ← H.size + 1
  H[H.size] ← v
  heapify-up(H, H.size)

delete(H, i)  \\ delete the element H[i]
  H[i] ← H[H.size]
  H.size ← H.size − 1
  if i ≤ H.size then
    heapify-up(H, i)
    heapify-down(H, i)
Insert an Element to a Heap: Example
[Figure: a new key is appended at the next free leaf and heapify-up swaps it with its parent until the heap property is restored.]
heapify-up(H, i)
  while i > 1
    j ← ⌊i/2⌋
    if key(H[i]) < key(H[j]) then
      swap H[i] and H[j]
      i ← j
    else break
Delete an Element from a Heap
[Figure: the last element of the heap replaces the deleted one, and heapify-down pushes it down until the heap property is restored.]
heapify-down(H, i)
  while 2i ≤ H.size
    if 2i = H.size or key(H[2i]) < key(H[2i + 1]) then
      j ← 2i
    else
      j ← 2i + 1
    if key(H[j]) < key(H[i]) then
      swap the array entries H[i] and H[j]
      i ← j
    else break
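Putting the operations together, here is a runnable Python version of these heap operations; the 1-indexed array with an unused slot at index 0 and the key-function parameter are implementation choices for this sketch, not prescribed by the slides.

    class Heap:
        def __init__(self, key=lambda v: v):
            self.H = [None]  # H[0] unused; elements live in H[1..size]
            self.key = key

        @property
        def size(self):
            return len(self.H) - 1

        def get_min(self):
            return self.H[1]

        def insert(self, v):
            self.H.append(v)
            self._heapify_up(self.size)

        def delete(self, i):
            self.H[i] = self.H[self.size]  # move the last element into slot i
            self.H.pop()
            if i <= self.size:  # slot i still exists
                self._heapify_up(i)
                self._heapify_down(i)

        def _heapify_up(self, i):
            H, key = self.H, self.key
            while i > 1 and key(H[i]) < key(H[i // 2]):
                H[i], H[i // 2] = H[i // 2], H[i]  # swap with parent
                i //= 2

        def _heapify_down(self, i):
            H, key = self.H, self.key
            while 2 * i <= self.size:
                j = 2 * i  # pick the smaller child
                if j < self.size and key(H[j + 1]) < key(H[j]):
                    j += 1
                if key(H[j]) < key(H[i]):
                    H[i], H[j] = H[j], H[i]
                    i = j
                else:
                    break

    # Small usage example:
    h = Heap()
    for x in [5, 3, 8, 1]:
        h.insert(x)
    print(h.get_min())  # prints 1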
Proof for the Correctness of Insertion and Deletion
Def. We say that H is almost a heap except that key(H[i]) is
too small if we can increase the key value of H[i] to make H a
heap.