production span

advertisement
Introduction
Topological sorting is a common operation performed on
directed acyclic graphs (dags). It arises in numerous applications
including task scheduling, serialization and dependency
determinations, and graph drawing. The goal of this project is to
manipulate graph representations, perform topological sorting on a
dag, and use the topological numbers in a longest path computation
as well as in a heuristic solving a scheduling problem.
Degree-based Topological Sort
Given is a dag G=(V,E), V={0, 1, … ,n-1}, having n vertices and
m edges. The graph is represented in the form of adjacency lists. For
each vertex u, Aout[u] points to a list containing all vertices v such
that <u,v> is an edge from u to v and Ain[u] points to a list containing
all vertices v such that <v,u> is an edge from v to u. The length of list
Aout[u] is the out-degree of node u and the length of list Ain[u] is the
in-degree of node u.
In any dag there exists at least one vertex having in-degree 0
(called the sources) and at least one vertex with out-degree 0 (called
the sinks). The dags used in the project can have an arbitrary number
of sources, but will always have one sink. The dag shown in Figure 1
has two sources (vertices A and G) and one sink (vertex F).
Figure 1: A 7-vertex
dag with two sources and one sink.
Topological sorting assigns to each vertex an integer (also
called its topological number). See Chapter 4.2 in the text for details.
The topological numbers represent a linear ordering of the vertices
such that for every edge <u,v> vertex u appears before v in the
ordering. For the graph in Figure 1, the ordering A G B D C E F is a
topological ordering and the topological numbers are A=0, B=2, C=4,
D=3, E=5, F=6, G=1. Note that topological orderings are not unique.
We note that the textbook presents topological sorting as an
application of DFS (Algorithm 4.5 in Chapter 4.2). Another approach
is a non-recursive implementation based on a degree-driven graph
exploration. In this project you need to implement the degree-based
topological sorting approach described below.
All source vertices are ready to be placed. For Figure 1,
vertices A and G can be placed right away. Once a vertex u has all
the vertices incident to its incoming edges placed, u can be placed.
This observation forms the basis of degree-based topological sorting
algorithm.
The algorithm uses a queue TSQ. Whether one uses a FIFO or
a LIFO (i.e., stack) queue does not matter for determining a
topological ordering (different queues will produce different
orderings). The topological numbers are maintained in an array, say
array T.
The high-level description of the degree-driven algorithm is as
follows:
1 Make all initializations; includes a variable count set to 0 and
placing all sources in queue TSQ.
2 u = dequeue() and we set T[u]=count
3 All of u’s out-going edges are examined. For each edge <u,v>,
reduce the number of edges into v by 1. Once all incoming
edges of vertex v have received topological numbers, this count
would be zero and v is placed into TSQ.
4 Repeat steps 2 to 3 until TSQ is empty.
The needed book-keeping details are yours to work out. In
particular, you need to decide how to manage the counts into vertices
used in step 3. Keep in mind that the running time of step 3 needs to
be proportional to the number of v's outgoing edges.
You can assume that all sinks can be reached from at least one
source. The algorithm implementation needs to run in O(n+m) time.
Longest Path Computation and Scheduling
Assume the dag describes a manufacturing process and each
vertex corresponds to one production step. Production step u cannot
start until all production steps associated with the incoming edges
incident to u have been completed. A topological ordering now
represents one possible production sequence. Manufacturing times at
different production steps vary in applications, but in this project we
assume that every production step takes the same amount of time.
When there is no limit on the number of production steps that can be
carried out simultaneously, what is the minimum time in which all
production steps can be completed? We refer to the time needed to
complete a manufacturing process as the production span. The
production span cannot be shorter than the length of the longest path
from any source to the sink, plus 1 (because the longest path counts
the number of edges).
The longest paths is also called the critical path length (critical
as reducing the production span needs to shorten the length of the
longest paths in the dag.) You need to determine the length of the
longest path in the dag and a manufacturing schedule which
achieving minimum production span. The number of production steps
that can execute at the same time is not limited.
Figure 2: A 8-vertex dag with a longest path of 3.
The DAG in Figure 2 has a longest path of length of 3. This
means that it is possible to schedule the eight production steps in a
production span of 4. Here is a possible schedule: A, (B, C, D), (E, F,
G), H. At time 0, production step A is the only one executing; at time
1, three production steps are executing simultaneously, etc. At any
point, no more than 3 stations are used simultaneously (and thus no
more than 3 production steps execute simultaneously).
Algorithm 4.10 in Section 4.4 of the text presents an O(n+m) time
shortest path algorithm for a DAG. By changing the vertex relaxation,
the algorithm can be changed into the needed longest path algorithm.
See your class notes for more detail.
Scheduling with k Stations
Assume now that at most k manufacturing stations can be used
at any time. You have already developed solutions for two cases:
when k=1 and when the value of k can be as large as needed.
Assume k>1. It turns out that generating a schedule using at most k
stations simultaneously and minimizing the production span is not an
easy problem. You will implement an algorithm that is a natural
generalization of the two earlier algorithms. The algorithm will never
use more than k manufacturing stations, but the resulting production
span may not be the smallest possible (for k stations). This means
that the algorithm will make decisions on which production steps to
assign to stations at times steps that may to not result in the best
possible production span. However, your algorithm will be be fast run in O(n+m) time - and will never use more than k stations.
You are asked to develop an O(n+m) time algorithm for
scheduling the n manufacturing steps when at any time at most k
stations are available. Thus, k tasks can execute simultaneously at
any time. The value of k is given. Your algorithm should assign to
each vertex u a level number L(u) such that
▪ every level contains at most k vertices, and
▪ for every edge <u,v>, we have L(u)<L(v).
The largest level represents the achieved production span.
You need to work out a number of implementation details in your
scheduling algorithm. Make sure it runs in O(n+m) time. Your
algorithm should be based on the following approach:
1 The vertices are considered in topological order. For each vertex v,
assign v to a level p if (i) level p is at least one level higher than
the largest level assigned to any of v’s incoming edges, and (ii)
level p does not already have k vertices assigned to it. If more
than one vertex qualifies for placement, an arbitrary one is
chosen.
2 Repeat the process until all vertices have been scheduled.
3 Report the schedule and the achieved production span.
Summary of Tasks
This project asks you to implement the following tasks. Each task
needs to have an O(n+m) time implementation.
1 For a given dag G, determine the topological number of each vertex
using the degree-based topological sorting algorithm.
2 For a given dag G, determine for the length of the longest path.
3 For a given dag G, use the topological numbers and the longest
paths entries to generate a schedule that completes all
manufacturing steps in the shortest production span. Return the
schedule.
4 For a given dag G and an integer k, 1< k < n, use the topological
numbers to determine a schedule using at most k stations.
Return the schedule.
For those interested in using graph visualization tools, we provide the
following links to resources:
▪ Network Workbench - http://nwb.cns.iu.edu/index.html
▪ graphviz http://www.linuxdevcenter.com/pub/a/linux/2004/05/06/graphviz
_dot.html
▪ Cytoscape - http://www.cytoscape.org/
Programming Assignment
The skeleton code contains a number example dags. Each dag file
has the following format:
<number of vertices> vertices, <number of edges> edges
0:
In: <incoming edges for vertice>
Out: <outgoing edges for vertice>
1:
In: <incoming edges for vertice>
Out: <outgoing edges for vertice>
...
n:
In: <incoming edges for vertice>
Out: <outgoing edges for vertice>
For example, for a 4-vertex dag the input can look like:
4 vertices, 6 edges
0:
In: 2
Out: 1 3
1:
In: 0 2
Out: 3
2:
In:
Out: 3 0 1
3:
In: 2 0 1
Out:
Represent each dag so that it maintains for each vertex the list
of out-edges as well as in-edges. You need to define a class Digraph
that you can adapt from the one given from Princeton library. Add
additional entries as needed.
For tasks 3 and 4, the generated schedule is represented as
described in Schedule.java. The data structure resembles the
structure of adjacency lists: it is an array of lists. The size of the array
equals the production span and the list at the j-th entry of the array
contains the steps executing simultaneously at time j (listed in
arbitrary order). Do not modify Schedule.java. You can use methods
in Schedule.java to construct a schedule.
You must implement the following methods of the
DagUtilities class:
▪ public static int[] topologicalSort(Digraph G)
▪
perform topological sort on dag G and assign topological
numbers.
▪ public static int longestPath(Digraph G)
▪
return the length of the longest path in the dag G.
▪ public static Schedule minProdSpanSchedule(Digraph G)
▪
for dag G, return a schedule minimizing the production span.
▪ public static Schedule spanKStations(Digraph G, int k)
▪
for dag G and integer k, return a schedule using at most k
stations.
▪ public static Digraph readGraphFromFile(File file)
▪
Read a dag from input file and return the dag
Testing
You have been provided with several example trees which you
should use to test your code. To run our main function, use the
following command:
$ java -cp <colon separated path(s) to classes>
Project5Test [Graph_File_Path]
If you want to specify the number of stations for task 4, use the
following command:
$ java -cp <colon separated path(s) to classes>
Project5Test [Graph_File_Path] -s [Number_Of_Stations]
Analysis Questions
Your report must answer each question briefly and clearly.
1 State and explain the asymptotic performance of your algorithm
determining the topological numbers.
2 State and explain the asymptotic performance of your algorithm
determining the longest path in the dag.
3 State and explain the asymptotic performance of your algorithm
determining a schedule achieving minimum production span.
4 State and explain the asymptotic performance of your algorithm
determining a schedule using no more than k stations.
5 Give an example of a dag consisting of at least 15 vertices on
which your k-station algorithm fails to find a schedule
minimizing the production span for k>1. Explain at what point
your algorithm makes the wrong decision.
6 Give an example of a dag consisting of at least 15 vertices on
which your k-station algorithm generates a schedule achieving
minimum production span, k>1.
7 For the dag you used in question 6: Does your algorithm produce a
schedule achieving minimum production span when the
topological ordering used is the one generated by executing a
DFS on the dag? Explain and illustrate your answer.
For the dag in file dagQ8.txt(400 vertices, 850 edges) create a plot
having k on the x-axis (starting with k=1) and the resulting production
span on the y-axis. Note that the dag has a longest path length of 17
and thus the minimum production span possible is 18 (it requires 107
stations). Discuss your plot.
Project 5: Implement four tasks on dags, each having O(n+m) time
performance.
1.For a given dag G, determine the topological number of each vertex using the
degree-based topological sorting algorithm.
• Do not use the DFS approach, but the described degree-based approach
2.For a given dag G, determine for the length of the longest path.
• Use the topological numbers and edge relaxation(on max)
17



Longest path of length is3
Shortest production span is4
Possible schedule:
A, (B, C, D), (E, F, G), H

Possible schedules if only two
stations are available:
A, (B,C), (D,E), (F,G), H A, (B,D), (C,F), (E,G, H
18
Project 5: implement four tasks on dags, each having O(n+m) time
performance.
3. For a given dag G, use the topological numbers and the longest paths entries
to generate a schedule that completes all manufacturing steps in the shortest
production span. Return the schedule.
4. For a given dag G and an integer k, 1< k < n, use the topological numbers to
determine a schedule using at most k stations. Return the schedule.
 The schedule for k=1 is the topological order
 No benefit when having more stations than long

About the dags and schedules

 Take a careful look at the input format
 Dags can have multiple sources, but have only one sink
 Determine what is needed in class Digraph on entries and methods
 Class schedule cannot be changed
Size of the array corresponds to the production span
Linked list at index j lists production steps executing at time j Longest linked list
determines how many stations are used

 There are other representations of schedule, but you need to use the one given



20
Analysis question 5
Give an example of a dag consisting of at least 15 vertices on which your k-station algorithm
fails to find a schedule minimizing the production span for k>1.
Explain at what point your algorithm makes the wrong decision.
For small dags, you can check whether your schedule is feasible and whether it
generates minimum production span.
For large dags, you can write a simple methods to check whether your schedule
is feasible. However, determining whether it produces minimum production span
is not easy.
Choose a k (2 or 3) and a dag such that your scheduling algorithm makes a
mistake.You have to do more than find a counterexample. The example has to fool
your algorithm!
21
About the schedules




 The graph were generated by a dag library and they seem to be “schedule
friendly”
 For k>1, two students may generate schedules having a different production
span.
 For every production span, there generally exist a number of valid schedules.
 Finding an example where your algorithm fails corresponds
Download