Set 4: Greedy Algorithms

advertisement
CSCE
411
Design and Analysis of
Algorithms
Set 4: Greedy Algorithms
Prof. Jennifer Welch
Spring 2012
CSCE 411, Spring 2012: Set 4
1
Greedy Algorithm Paradigm

Characteristics of greedy algorithms:




make a sequence of choices
each choice is the one that seems best so far, only
depends on what's been done so far
choice produces a smaller problem to be solved
In order for greedy heuristic to solve the
problem, it must be that the optimal solution
to the big problem contains optimal solutions
to subproblems
CSCE 411, Spring 2012: Set 4
2
Designing a Greedy Algorithm



Cast the problem so that we make a greedy (locally
optimal) choice and are left with one subproblem
Prove there is always a (globally) optimal solution to
the original problem that makes the greedy choice
Show that the choice together with an optimal
solution to the subproblem gives an optimal solution
to the original problem
CSCE 411, Spring 2012: Set 4
3
Some Greedy Algorithms






fractional knapsack algorithm
Huffman codes
Kruskal's MST algorithm
Prim's MST algorithm
Dijkstra's SSSP algorithm
…
CSCE 411, Spring 2012: Set 4
4
Knapsack Problem


There are n different items in a store
Item i :





weighs wi pounds
worth $vi
A thief breaks in
Can carry up to W pounds in his knapsack
What should he take to maximize the value of
his haul?
CSCE 411, Spring 2012: Set 4
5
0-1 vs. Fractional Knapsack

0-1 Knapsack Problem:



the items cannot be divided
thief must take entire item or leave it behind
Fractional Knapsack Problem:



thief can take partial items
for instance, items are liquids or powders
solvable with a greedy algorithm…
CSCE 411, Spring 2012: Set 4
6
Greedy Fractional Knapsack
Algorithm


Sort items in decreasing order of value per
pound
While still room in the knapsack (limit of W
pounds) do



consider next item in sorted list
take as much as possible (all there is or as much as
will fit)
O(n log n) running time (for the sort)
CSCE 411, Spring 2012: Set 4
7
Greedy 0-1 Knapsack Alg?

3 items:





item 1 weighs 10 lbs, worth $60 ($6/lb)
item 2 weighs 20 lbs, worth $100 ($5/lb)
item 3 weighs 30 lbs, worth $120 ($4/lb)
knapsack can hold 50 lbs
greedy strategy:



take item 1
take item 2
no room for item 3
CSCE 411, Spring 2012: Set 4
8
0-1 Knapsack Problem


Taking item 1 is a big mistake globally
although looks good locally
Later we'll see a different algorithm design
paradigm that does work for this problem
CSCE 411, Spring 2012: Set 4
9
Finding Optimal Code

Input:



data file of characters and
number of occurrences of each character
Output:


a binary encoding of each character so that the
data file can be represented as efficiently as
possible
"optimal code"
CSCE 411, Spring 2012: Set 4
10
Huffman Code

Idea: use short codes for more frequent
characters and long codes for less frequent
char
a
b
c
d
e
f
#
45
13
12
16
9
5
fixed
000 001 010 011 100
101
300
variable
0
1100
224
101 100 111 1101
CSCE 411, Spring 2012: Set 4
total bits
11
How to Decode?

With fixed length code, easy:


break up into 3's, for instance
For variable length code, ensure that no
character's code is the prefix of another

no ambiguity
101111110100
b
d
e
CSCE 411, Spring 2012: Set 4
aa
12
Binary Tree Representation
fixed length code
0
0
1
0
1
0
1 0
1 0
1
a
b c
d e
f
cost of code is sum,
over all chars c, of
number of occurrences
of c times depth of c in
the tree
CSCE 411, Spring 2012: Set 4
13
Binary Tree Representation
0
1
variable length code
a 0
1
0
1
0
1
c
b 0
1
f
e
d
cost of code is sum,
over all chars c, of
number of occurrences
of c times depth of c in
the tree
CSCE 411, Spring 2012: Set 4
14
Algorithm to Construct Tree
Representing Huffman Code



Given set C of n chars, c occurs f[c] times
insert each c into priority queue Q using f[c] as key
for i := 1 to n-1 do




x := extract-min(Q)
y := extract-min(Q)
make a new node z w/ left child x (label edge 0), right child y
(label edge 1), and f[z] = f[x] + f[y]
insert z into Q
CSCE 411, Spring 2012: Set 4
15
<board work>
CSCE 411, Spring 2012: Set 4
16
Graphs

Directed graph G = (V,E)




In an undirected graph, edges are unordered pairs of
vertices
If (u,v) is in E, then v is adjacent to u and (u,v) is incident
from u and incident to v; v is neighbor of u


V is a finite set of vertices
E is the edge set, a binary relation on E (ordered pairs of vertice)
in an undirected graph, u is also adjacent to v, and (u,v) is incident
on u and v
Degree of a vertex is number of edges incident from or to
(or on) the vertex
CSCE 411, Spring 2012: Set 4
17
More on Graphs

A path of length k from vertex u to vertex u’ is
a sequence of k+1 vertices s.t. there is an
edge from each vertex to the next in the
sequence




In this case, u’ is reachable from u
A path is simple if no vertices are repeated
A path forms a cycle if last vertex = first
vertex
A cycle is simple if no vertices are repeated
other than first and last
CSCE 411, Spring 2012: Set 4
18
Graph Representations



So far, we have discussed graphs purely as
mathematical concepts
How can we represent a graph in a computer
program? We need some data structures.
Two most common representations are:


adjacency list – best for sparse graphs (few edges)
adjacency matrix – best for dense graphs (lots of
edges)
CSCE 411, Spring 2012: Set 4
19
Adjacency List Representation



Given graph G = (V,E)
array Adj of |V| linked lists, one for each vertex in V
The adjacency list for vertex u, Adj[u], contains all
vertices v s.t. (u,v) is in E



each vertex in the list might be just a string, representing its
name, or it might be some other data structure
representing the vertex, or it might be a pointer to the data
structure for that vertex
Requires Θ(V+E) space (memory)
Requires Θ(degree(u)) time to check if (u,v) is in E
CSCE 411, Spring 2012: Set 4
20
Adjacency Matrix Representation



Given graph G = (V,E)
Number the vertices 1, 2, ..., |V|
Use a |V| x |V| matrix A with (i,j)-entry of A being




1 if (i,j) is in E
0 if (i,j) is not in E
Requires Θ(V2) space (memory)
Requires Θ(1) time to check if (u,v) is in E
CSCE 411, Spring 2012: Set 4
21
More on Graph Representations

Read Ch 22, Sec 1 for



more about directed vs. undirected graphs
how to represented weighted graphs (some value is
associated with each edge)
pros and cons of adjacency list vs. adjacency matrix
CSCE 411, Spring 2012: Set 4
22
Minimum Spanning Tree
16
5
12
4
7
3
6
11
14
9
8
10
2
15
13
17
18
Given a connected undirected graph with edge weights,
find subset of edges that spans all the nodes,
creates no cycle, and minimizes sum of weights
CSCE 411, Spring 2012: Set 4
23
Facts About MSTs



There can be many spanning trees of a graph
In fact, there can be many minimum
spanning trees of a graph
But if every edge has a unique weight, then
there is a unique MST
CSCE 411, Spring 2012: Set 4
24
Uniqueness of MST




Suppose in contradiction there are 2 MSTs,
M1 and M2.
Let e be edge with minimum weight that is in
one MST but not the other (say it is in M1).
If e is added to M2, a cycle is formed.
Let e' be an edge in the cycle that is not in M1
CSCE 411, Spring 2012: Set 4
25
Uniqueness of MST
e: in M1 but not M2
M2:
e': in M2 but not M1;
wt of e' is less than wt of e,
by choice of e
Replacing e with e' creates a new MST M3
whose weight is less than that of M2
CSCE 411, Spring 2012: Set 4
26
Kruskal's MST algorithm
16
5
12
4
7
3
6
11
14
9
8
10
2
15
13
17
18
consider the edges in increasing order of weight,
add in an edge iff it does not cause a cycle
CSCE 411, Spring 2012: Set 4
27
Why is Kruskal's Greedy?

Algorithm manages a set of edges s.t.


At each iteration:



these edges are a subset of some MST
choose an edge so that the MST-subset property
remains true
subproblem left is to do the same with the remaining
edges
Always try to add cheapest available edge that will
not violate the tree property

locally optimal choice
CSCE 411, Spring 2012: Set 4
28
Correctness of Kruskal's Alg.




Let e1, e2, …, en-1 be sequence of edges
chosen
Clearly they form a spanning tree
Suppose it is not minimum weight
Let ei be the edge where the algorithm goes
wrong


{e1,…,ei-1} is part of some MST M
but {e1,…,ei} is not part of any MST
CSCE 411, Spring 2012: Set 4
29
Correctness of Kruskal's Alg.
M:
ei, forms a cycle in M
wt(e*) > wt(ei)
e* :
min wt. edge
in cycle not in
e1 to ei-1
replacing e* w/ ei forms
a spanning tree with
smaller weight than M,
contradiction!
gray edges are part of MST M, which contains e1 to ei-1,
but not ei
CSCE 411, Spring 2012: Set 4
30
Note on Correctness Proof



Argument on previous slide works for case
when every edge has a unique weight
Algorithm also works when edge weights are
not necessarily unique
Modify proof on previous slide: contradiction
is reached to assumption that ei is not part of
any MST
CSCE 411, Spring 2012: Set 4
31
Implementing Kruskal's Alg.

Sort edges by weight


efficient algorithms known
How to test quickly if adding in the next edge
would cause a cycle?

use disjoint set data structure, later
CSCE 411, Spring 2012: Set 4
32
Another Greedy MST Alg.


Kruskal's algorithm maintains a forest that
grows until it forms a spanning tree
Alternative idea is keep just one tree and
grow it until it spans all the nodes


Prim's algorithm
At each iteration, choose the minimum weight
outgoing edge to add

greedy!
CSCE 411, Spring 2012: Set 4
33
Download