Lecture 10

Graph Algorithms
Definitions
Graph is a set of vertices V, with edges connecting some of the
vertices (edge set E).
An edge can connect two vertices.
An edge between vertex u and v is denoted as (u, v).
Motivating example
In a directed graph (digraph) all edges have directions. In
an undirected graph, an edge does not have a direction.
Unless otherwise mentioned a graph is undirected
A vertex v is adjacent to vertex u, if there is an edge
(u, v)
In an undirected graph, existence of edge (u, v)
means both u and v are adjacent to each other.
In a digraph, existence of edge (u, v) does not mean u
is adjacent to v.
An edge may have a weight.
A path in a graph is a sequence of vertices w_1, w_2,
..., w_P such that consecutive vertices w_i, w_(i+1) have
an edge between them, i.e., w_(i+1) is adjacent to w_i
A path in a graph is simple if all vertices are distinct, except
possibly the first and the last one.
Length of a path is the number of edges in the path.
What is the length of a simple path of P vertices?
A cycle is a path of length at least 1 such that the
first and the last vertices are equal
A cycle is simple if the path is simple.
For undirected graph, we require a cycle to have distinct edges.
Length of a cycle is the number of edges in the cycle.
What is the length of a simple cycle of P vertices?
Connected Graphs
A graph is connected if there is a path from every vertex to
every other vertex.
A directed graph with this property is strongly
connected.
If a directed graph is not strongly connected, but
underlying undirected graph is connected then the
directed graph is weakly connected.
Subgraphs and Components
A subgraph of a graph is a graph which has a
subset of vertices and a subset of edges of the
original graph.
A component is a subgraph which satisfies two
properties
is connected, and
Maximal with respect to this property.
"Connected" will be replaced by "strongly
connected" for digraphs.
Complete Graph
A graph which has edge between any vertex pair is
complete.
A complete digraph has edges between any two
vertices.
How many edges can a complete graph of N
vertices have?
N(N-1)/2
How about a digraph? N(N-1)
Representation of Graphs
Adjacency matrix:
Let there be N vertices, 0, 1, ..., N-1
Declare an N x N array A (adjacency matrix)
A[j][k] = 1 if there is an edge (j, k)
= 0 otherwise
Storage: O(N^2)
What will be the structure of A for an undirected graph?
A[j][k] = weight of edge (j, k) if edges have weights
= a very large or very small value if edge (j, k)
does not exist
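A minimal Python sketch of the adjacency matrix described above, assuming vertices are numbered 0, 1, ..., N-1. The function name build_adjacency_matrix and the sample edges are illustrative and not from the lecture.

```python
def build_adjacency_matrix(n, edges, directed=False):
    """Return an n x n 0/1 matrix A with A[j][k] = 1 iff edge (j, k) exists."""
    a = [[0] * n for _ in range(n)]
    for j, k in edges:
        a[j][k] = 1
        if not directed:
            a[k][j] = 1  # undirected edge: record both directions
    return a


# Hypothetical 4-vertex undirected graph: 0 - 1 - 2 - 3
A = build_adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

The matrix always takes N x N cells, no matter how few edges are present.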
Adjacency List
If the graph is complete or almost complete, then adjacency
matrix representation is fine; otherwise O(N^2) storage is
used even though there are far fewer edges.
For such graphs an adjacency list is a more efficient storage.
Every vertex has a linked list of the vertices which
are adjacent to it.
Note down the representation from the board?
Storage: O(V+E)
Read Section 9.1
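A minimal Python sketch of an adjacency list. It assumes Python lists stand in for the linked lists used on the board; build_adjacency_list and the sample edges are illustrative names and data.

```python
from collections import defaultdict


def build_adjacency_list(edges, directed=False):
    """Map each vertex to the list of vertices adjacent to it."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected edge appears in both lists
    return adj


# Hypothetical undirected triangle on vertices 0, 1, 2
adj = build_adjacency_list([(0, 1), (0, 2), (1, 2)])
```

Each undirected edge is stored twice, once in each endpoint's list, so the total list length is 2E.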
However, the problem with an adjacency list is that one may
have to traverse the entire linked list corresponding to a
vertex in order to locate an edge.
This can be somewhat reduced by having links
between edges.
There is a link from edge (1, 3) to edge (3, 1) for
example.
Get the representation from the board.
Sparse and Dense Graphs
Sparse graphs have Θ(V) edges.
Dense graphs have Θ(V^2) edges.
What is the storage in adjacency list for sparse graphs?
dense graphs?
adjacency matrix for sparse graphs?
dense graphs?
Adjacency list is better for sparse graphs, adjacency matrix
for dense graphs
Degree Relations
Number of edges incident on a vertex is the degree of the
vertex in a graph.
Note down example from the board
Number of edges ending at a vertex is the indegree of the
vertex in a digraph.
Number of edges originating from a vertex is the outdegree
of the vertex in a digraph.
For a graph,
Sum of degrees of all vertices = 2 x (number of edges)
For a digraph, sum of indegrees of all vertices
= sum of outdegrees of all vertices
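The two degree relations can be checked on small examples. The following Python sketch assumes graphs are given as lists of vertex pairs; the function names and the sample edges are illustrative.

```python
def degree_sums(edges):
    """For undirected edges, return (sum of all degrees, number of edges)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1  # edge counts once at u
        degree[v] = degree.get(v, 0) + 1  # and once at v
    return sum(degree.values()), len(edges)


def in_out_degree_sums(arcs):
    """For directed edges, return (sum of indegrees, sum of outdegrees)."""
    indeg, outdeg = {}, {}
    for u, v in arcs:
        outdeg[u] = outdeg.get(u, 0) + 1  # edge originates at u
        indeg[v] = indeg.get(v, 0) + 1    # edge ends at v
    return sum(indeg.values()), sum(outdeg.values())


# Hypothetical undirected graph with 4 edges
total_degree, num_edges = degree_sums([(0, 1), (1, 2), (2, 0), (2, 3)])
```

Every edge contributes exactly two to the degree total, and exactly one each to the indegree and outdegree totals, which is why both relations hold.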
Real life examples where graphs
are useful
Cities and Roads
Networks:
Routers are vertices
Links are edges
Shortest path between vertices.
Constraint representation:
If A is there, B can not be there, etc.
Graph Traversal
Breadth First Search
Starts from a vertex s (source), and discovers all vertices
which are reachable from s
A vertex v is reachable from s if there is a path from s
to v.
Vertices are discovered in order of increasing shortest
simple path lengths.
Initially all vertices are colored white.
When a vertex is "discovered", it is colored gray
First source is discovered, then the vertices adjacent
to source, etc.
When all vertices adjacent to a vertex v have been discovered,
the algorithm finishes processing v, and colors it black.
We use a FIFO queue in the search
Notation:
d[u] is the length of the shortest path from s to u
color[u] is the color of vertex u
pred[u] is the predecessor of u in the search
Pseudocode
BFS(G, s)
{
  For each v in V
  {
    color[v] = white;
    d[v] = INFINITY;
    pred[v] = NULL;
  }
  color[s] = gray;
  d[s] = 0;
  Queue = {s};
  While Queue is nonempty
  {
    u = Dequeue(Queue);
    For each v in Adj[u]
    {
      if (color[v] == white)     /* v is undiscovered */
      {
        color[v] = gray;         /* discover v */
        d[v] = d[u] + 1;         /* set distance of v */
        pred[v] = u;             /* set pred of v */
        Enqueue(Queue, v);       /* put v in Queue */
      }
    }
    color[u] = black;            /* done with u */
  }
}
Note down example from board
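A runnable Python sketch of the BFS above, assuming the graph is a dict mapping each vertex to a list of adjacent vertices. The explicit colors are folded into the d dictionary here: a vertex is white exactly when it has no entry in d. The sample graph is illustrative.

```python
from collections import deque


def bfs(adj, s):
    """Return (d, pred) for every vertex reachable from the source s."""
    d = {s: 0}          # d[u]: shortest path length from s to u
    pred = {s: None}    # pred[u]: predecessor of u in the search
    queue = deque([s])  # FIFO queue of discovered (gray) vertices
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in d:          # v is still white
                d[v] = d[u] + 1
                pred[v] = u
                queue.append(v)
    return d, pred


# Hypothetical undirected graph stored with both directions of each edge
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
d, pred = bfs(adj, 0)
```

Following pred back from any vertex to s gives a shortest path in reverse.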
Complexity Analysis
A vertex is visited once.
Thus the while loop is executed at most V times.
Complexity of operations inside the for loop is constant.
We want to compute the number of times the for loop is
executed.
For each vertex v the for loop is executed at most (deg v + 1)
times.
The factor 1 comes as for a 0 degree vertex we need a
constant complexity
Thus the for loop is executed v (deg v + 1) times
This equals V + 2E
Initialization complexity is V
Thus overall we have complexity V + V + 2E, i.e.
O(V+E)
Depth First Search
Another graph traversal process.
We want to visit all rooms in a castle.
Start from a room
Move from room to room till you reach an undiscovered room
Draw a graffiti in each undiscovered room
Once you reach a discovered room take a door which you
have not taken before.
Will have 3 possible colors for a vertex:
white for an undiscovered vertex
gray for a discovered vertex
black for a finished vertex
Will store predecessor
Will store 2 numbers for each vertex (timestamps)
When we first discover a vertex store a counter d[u]
When you finish off store another f[u]
d[u] is not the distance
Pseudocode
DFS(G)
{
  For each v in V
  {
    color[v] = white;
    pred[v] = NULL;
  }
  time = 0;
  For each u in V
    If (color[u] == white) DFSVISIT(u);
}
DFSVISIT(u)
{
  color[u] = gray;
  d[u] = ++time;
  For each v in Adj[u] do
    If (color[v] == white)
    {
      pred[v] = u;
      DFSVISIT(v);
    }
  color[u] = black;
  f[u] = ++time;
}
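A runnable Python sketch of DFS with timestamps, assuming the graph is a dict in which every vertex appears as a key mapped to its adjacency list. The sample digraph is illustrative. The recursion mirrors DFSVISIT, so very deep graphs may hit Python's recursion limit.

```python
def dfs(adj):
    """Return discovery times d, finish times f, and predecessors pred."""
    color = {u: "white" for u in adj}
    pred = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = "gray"
        time += 1
        d[u] = time               # timestamp when u is discovered
        for v in adj[u]:
            if color[v] == "white":
                pred[v] = u
                visit(v)
        color[u] = "black"
        time += 1
        f[u] = time               # timestamp when u is finished

    for u in adj:
        if color[u] == "white":
            visit(u)
    return d, f, pred


# Hypothetical digraph; "e" is not reachable from "a"
adj = {"a": ["b", "c"], "b": ["c"], "c": [], "e": ["a"]}
d, f, pred = dfs(adj)
```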
Complexity Analysis
Note down example from board
There is only one DFSVISIT(u) for each vertex u.
Let us analyze the complexity of a DFSVISIT(u)
Ignoring the recursion calls the complexity is O(deg(u)+1)
We consider the recursive calls in separate DFSVISIT(v)
Initialization complexity is O(V)
Overall complexity is O(V + E)
DFS Tree Structure
Consider a directed graph.
Observe that if u is predecessor of v in DFS, there is an
edge (u, v) in the graph.
All such edges are predecessor edges or tree edges.
The predecessor edges constitute an acyclic graph (tree)
If there is a path from u to v in the original graph, such
that all edges in path (u, v) are predecessor edges, then
u is an ancestor of v in DFS tree and v is a descendant
of u.
An edge (u, v) in the graph such that v is an
ancestor of u is a "back edge".
An edge (u, v) that is not a tree edge, where v is a proper
descendant of u, is a forward edge.
An edge (u, v) where u is neither an ancestor nor a descendant
of v is a cross edge.
Do cross edges exist in undirected graphs?
Relation between timestamps and
ancestry
u is an ancestor of v if and only if [d[u], f[u]] ⊇ [d[v], f[v]]
u is a descendant of v if and only if [d[u], f[u]] ⊆ [d[v], f[v]]
u and v are not related if and only if [d[u], f[u]] and [d[v], f[v]]
are disjoint
These relations are called the parenthesis lemma
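The interval tests can be written directly in terms of the timestamps. The Python sketch below assumes d and f come from a DFS that increments the counter at each discovery and each finish. The timestamps are a hand-made example for a tree a -> b -> c plus a separate root e, and the function names are illustrative.

```python
def is_ancestor(u, v, d, f):
    """True iff [d[v], f[v]] lies inside [d[u], f[u]] (u == v counts)."""
    return d[u] <= d[v] and f[v] <= f[u]


def unrelated(u, v, d, f):
    """True iff the two intervals are disjoint."""
    return f[u] < d[v] or f[v] < d[u]


# Hand-made timestamps: a discovers b, b discovers c, e is a later root
d = {"a": 1, "b": 2, "c": 3, "e": 7}
f = {"a": 6, "b": 5, "c": 4, "e": 8}
```

With these values is_ancestor("a", "c", d, f) holds, while "e" and "b" are unrelated because their intervals [7, 8] and [2, 5] do not overlap.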
Applications of DFS
DFS can be used to find out whether a graph or a
digraph contains a cycle.
Consider a digraph. It has a cycle if and only if the
graph has a back edge. The same holds for graphs.
Run DFS
Check the nature of every edge (How do you know whether
an edge is a back edge or not?)
If there is a back edge, then the graph has a cycle.
Complexity?
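A Python sketch of cycle detection for a digraph, assuming the graph is a dict in which every vertex is a key. An edge (u, v) found while v is gray is a back edge, since the gray vertices are exactly the ancestors still being processed. The undirected case needs extra care with the edge leading back to the predecessor and is not covered here. The name has_cycle is illustrative.

```python
def has_cycle(adj):
    """Return True iff the digraph contains a cycle (i.e. a back edge)."""
    color = {u: "white" for u in adj}

    def visit(u):
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "gray":      # (u, v) is a back edge
                return True
            if color[v] == "white" and visit(v):
                return True
        color[u] = "black"
        return False

    # Start a new search from every vertex that is still white
    return any(color[u] == "white" and visit(u) for u in adj)
```

For example, has_cycle({0: [1], 1: [2], 2: [0]}) returns True, while the same graph with the edge (2, 0) removed gives False.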
Now we show that a digraph has a cycle if and only if there
is a back edge.
If there is a back edge there is a cycle.
We show that if there is a cycle, there is a back edge.
Consider an edge (u, v) in a digraph. If it is a back
edge, then f[u] ≤ f[v]. Otherwise (for tree, forward,
cross edges) f[u] > f[v]
We show it as follows. For tree, back and forward edges,
the result follows from the parenthesis lemma.
For cross edge (u, v) note that the intervals [d[u], f[u]]
and [d[v], f[v]] are disjoint.
When we were processing u, v was not white; otherwise
(u, v) would be a predecessor edge
Thus processing v started before processing u.
Thus d[v] < d[u].
Since the intervals are disjoint, this means f[v] < f[u].
Now we show that if there is a cycle, there is a back edge.
Suppose there is no back edge.
Move along any path. All edges are tree, forward or cross
edges. Thus the finish times decrease monotonically.
Hence we don’t come back to the same vertex. Thus there
is no cycle.
Topological Sort
A DAG (directed acyclic graph) is a digraph without
any cycle.
Topological sort of DAG is ordering the vertices such that if
there is an edge (u, v) then u must come before v in the order.
Application: You have a set of tasks to be completed in a
factory. There are relations between some tasks such that A
must be finished before B begins (Example: To build second
floor you must construct first floor first, but there is no relation
between electrical wiring and plumbing).
We need to order the tasks.
We represent the tasks by vertices and there is an edge
(u, v) if u must be finished before v begins.
Next we do a topological sort on them.
(There is something wrong with the task relations if
the representation has a cycle. This can be detected by
a DFS cycle detection. So we assume that the graph is
a DAG).
Note that any ordering of the vertices, such that there is no
edge from a vertex later in the order to another which is
ahead in the order is a valid topological order.
Note that we have a DAG (no cycle). Thus there is no
back edge.
Thus for any edge (u, v), f[u] > f[v].
Thus the vertices, ordered by decreasing finish time,
are in topological order.
This can be attained as follows.
While running DFS, whenever a vertex is
colored black add it to the front of a linked list.
Output the linked list at the end.
Complexity?
Pseudocode
Topological Sort(G)
{
  For each v in V
  {
    color[v] = white;
    pred[v] = NULL;
  }
  time = 0;
  Linked list = empty;
  For each u in V
    If (color[u] == white) Topology-DFSVISIT(u);
  Output Linked list;
}
Topology-DFSVISIT(u)
{
  color[u] = gray;
  d[u] = ++time;
  For each v in Adj[u] do
    If (color[v] == white)
    {
      pred[v] = u;
      Topology-DFSVISIT(v);
    }
  color[u] = black;
  f[u] = ++time;
  Add u to the front of the linked list;
}
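A runnable Python sketch of DFS-based topological sort. It assumes the input is a DAG given as a dict with every vertex as a key, and it uses a deque in place of the linked list so that adding to the front is constant time. The task names are hypothetical.

```python
from collections import deque


def topological_sort(adj):
    """Return the vertices of a DAG in a topological order."""
    color = {u: "white" for u in adj}
    order = deque()

    def visit(u):
        color[u] = "gray"
        for v in adj[u]:
            if color[v] == "white":
                visit(v)
        color[u] = "black"
        order.appendleft(u)   # finished vertex goes to the FRONT

    for u in adj:
        if color[u] == "white":
            visit(u)
    return list(order)


# Hypothetical tasks: an edge (u, v) means u must finish before v begins
tasks = {"t1": ["t2", "t3"], "t2": ["t4"], "t3": ["t4"], "t4": [], "t5": ["t3"]}
order = topological_sort(tasks)
```

Several orders can be valid for the same DAG; the check that matters is that every edge (u, v) has u earlier than v.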
Strong Components
A strong component of a digraph is a subgraph which is
Strongly connected
Maximal w.r.t. this property
How many strong components does a strongly
connected digraph have?
A variant of DFS gives all strong components of a
digraph
If we start DFS from any vertex in a strongly connected
component, we will finish all other vertices in the strong
component before finishing this vertex,
And possibly finish a few other vertices. (leaking into
other components)
If there is no leaking, we are done!
If we know the strong components a priori, we
can prevent leaking by choosing the DFS order
properly.
Replace each strong component by a single vertex.
There is an edge from vertex A to vertex B if there
is an edge from a vertex u in component A to
another vertex v in component B.
The resulting graph is a DAG. Why?
Order the vertices of the new DAG in a reverse
topological order (reverse order of the normal
topological order).
Now, if you choose vertices in the above order, you
are done. E.g., let the reverse topological order be
A, B, C: first choose a vertex of A and finish it off; when
you need to choose a new vertex, choose one from
B, etc.
However, we don’t know the components ahead of time. But,
the previous argument tells us that we need to use reverse
topological order somehow.
The following actually works!
Run DFS
Sort the vertices in decreasing order of finish times
Reverse the digraph.
Run DFS. Each time you need to choose a vertex
choose it in the sorted order.
Whenever you return to the main loop, you actually
start a new strongly connected component.
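The four steps above can be sketched in Python as follows, assuming the digraph is a dict in which every vertex (including every edge target) is a key. Vertices are appended to a list as they finish, so reading that list backwards gives decreasing finish time. The function name and the sample digraph are illustrative.

```python
def strong_components(adj):
    """Return the strong components of a digraph as lists of vertices."""
    # Step 1: DFS on the original digraph, recording vertices as they finish
    visited, finished = set(), []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)

    for u in adj:
        if u not in visited:
            visit(u)

    # Step 3: reverse every edge of the digraph
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev[v].append(u)

    # Step 4: DFS on the reversed digraph, roots in decreasing finish time
    assigned, components = set(), []

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in rev[u]:
            if v not in assigned:
                collect(v, comp)

    for u in reversed(finished):      # step 2: decreasing finish time
        if u not in assigned:         # back in the main loop: new component
            comp = []
            collect(u, comp)
            components.append(comp)
    return components


# Hypothetical digraph: cycle 0 -> 1 -> 2 -> 0, edge 2 -> 3, cycle 3 <-> 4
comps = strong_components({0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3]})
```

On this example the two components are {0, 1, 2} and {3, 4}.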
Complexity Analysis
What sort would you use?
What is the overall complexity?