Algorithms for Design-Automation: Mastering Nanoelectronic Systems
Semester: second semester
Author: Yijun Qu
Title: A New Scheduling Algorithm Based on SDC Formulation
Advisor: Prof. Dr. Martin Radetzki
Date: 08/05/2007

Contents

ABSTRACT ............................................................ 3
1. INTRODUCTION ..................................................... 4
  1.1 Behavioral synthesis .......................................... 4
  1.2 Scheduling .................................................... 6
2. RELATED CONCEPTS ................................................. 6
  2.1 Scheduling methods ............................................ 7
    2.1.1 Data-flow-based scheduling ................................ 7
    2.1.2 Control-flow-based scheduling ............................. 7
  2.2 CDFG .......................................................... 7
  2.3 Scheduling variables .......................................... 8
  2.4 Difference constraints ........................................ 9
  2.5 System of difference constraints ............................. 10
  2.6 Linear programming ........................................... 10
3. SDC-BASED SCHEDULING ............................................ 11
  3.1 Modeling scheduling constraints .............................. 12
    3.1.1 Dependency constraints ................................... 12
    3.1.2 Timing constraints ....................................... 13
    3.1.3 Resource constraints ..................................... 16
  3.2 Finding objective functions .................................. 17
    3.2.1 ASAP and ALAP scheduling ................................. 17
    3.2.2 Optimizing longest path latency .......................... 18
    3.2.3 Optimizing expected overall latency ...................... 18
  3.3 LP problem solving and complexity analysis ................... 19
4. CONCLUSIONS ..................................................... 20
5. REFERENCES ...................................................... 20

A New Scheduling Algorithm Based on SDC Formulation

Yijun Qu
INFOTECH, Uni-Stuttgart
quyijun@gmail.com

ABSTRACT

As designs get more and more complex and the competitive market demands a shorter time-to-market, design at a higher level of abstraction is becoming necessary. Instead of focusing on the details of the RT level, designers outline the design in a broader and more abstract manner, which gives more flexibility and efficiency. Scheduling plays an important role in high-level synthesis; however, existing scheduling methods are either inefficient or cannot support various constraints. In this report we discuss a new scheduling algorithm, which converts the scheduling constraints into a system of difference constraints (SDC) and performs the optimization with a mathematical programming tool, linear programming (LP). We will see that the SDC-based scheduling algorithm can handle constraints such as dependency constraints, timing constraints, and resource constraints, and can optimize the longest path latency and the expected overall latency.
By solving the LP problem, we can map the solution into an FSM-like state transition graph which represents the scheduler of the controller. This part is application-dependent and is not shown in this report.

Keywords: Behavioral synthesis, Linear programming, Scheduling, System of difference constraints (SDC)

1. INTRODUCTION

In this section we discuss the motivation for behavioral synthesis and its central task, scheduling.

1.1 Behavioral synthesis

In the early 1990s, the introduction and development of RTL synthesis let hardware designers move from a technology-dependent design platform, which requires many low-level details, to a higher, technology-independent level which describes the functionality in a more abstract way. Meanwhile, new languages such as VHDL and Verilog became popular. Fortunately or unfortunately, things kept getting more complicated. The systems being designed, typically Very Large Scale Integration (VLSI) circuits, have become much more complex, so the required design time increases. On the other hand, economic pressure demands a shorter time-to-market as well, which is essential for a profitable chip. How to produce a much more complex chip in less time has become a serious problem for the designer. Furthermore, for a given design project there is a large number of possible hardware architectures. Finding a suitable architecture is an experience-dependent and time-consuming task even for an experienced designer, which makes it practically impossible to change the architecture in the middle of the design procedure. Given the problems discussed above, the idea of designing at a more abstract level came up. Behavioral synthesis, or high-level synthesis, is one solution to this problem.
Behavioral synthesis is a process which transforms a behavioral specification directly into an RTL specification that can be used as input to the gate-level synthesis flow. This process results in a so-called data path and a controller description. The general process of behavioral synthesis is shown in Figure 1.1. The behavioral description, which specifies the functionality of the chip and how the chip communicates with the outside environment, is turned into a data path and a controller description by high-level synthesis. The data path normally consists of three parts: functional units (such as adders, multipliers, ALUs), memory (such as RAM, ROM, registers), and interconnect to transfer data between memory and functional units (such as buses and wires). The controller description gives the information about how the flow of data in the data path is organized; it is represented with states and state transitions.

Figure 1.1: the process of behavioral synthesis (behavioral description → high-level synthesis → data path and control description → logic synthesis / module generation → gate network and module description → layout generation → layout description)

Thanks to the higher level of abstraction, high-level specifications have the following advantages over RTL:
- Smaller and less complex.
- Easier to understand and debug.
- Easier to write and maintain.
- Faster to simulate.
- Architectural flexibility, since a behavioral description is architecture-independent.

Nevertheless, behavioral synthesis still has some limitations [5]. It is not a wise idea to implement every design with a high-level synthesis tool; certain designs should be realized with different or more specialized tools. In other words, behavioral synthesis should not be used in the following situations:
- The design is asynchronous, which cannot be implemented by high-level synthesis technology.
- The required architecture is well known; it is inefficient to try a more abstract description when the structure is already clear.
- The design is close to the area or performance limits of behavioral synthesis, because the technology obtains its flexibility at the cost of more area and lower performance.

Given the behavioral description together with constraints and goals, the task of high-level synthesis is to find the best architectural solution. Changing the constraints leads to a modification of the architecture. The constraints differ from application to application, e.g. resources, time, and area.

Behavioral synthesis aims at generating the RTL description from the behavioral description automatically. To design such a synthesis tool, we have to look at the process of high-level synthesis in detail. Before data-path and controller generation, we have to decide: when the operations can be executed, known as scheduling; what kinds of resources are used in the data path, known as selection; how many resources are needed in the data path, known as allocation; and finally which operations should be executed on which resources, known as binding [4].

1.2 Scheduling

Scheduling plays an important role in behavioral synthesis, as mentioned before. The quality of the scheduling algorithm has a significant influence on the quality of the synthesis result. Its function is to arrange the times at which the different computations and communications are performed; in other words, it controls the parallelism of the implementation. It is worth mentioning that scheduling is not a new technique, even though commercial behavioral synthesis tools are relatively new. Existing scheduling algorithms can be classified into different categories according to their application.
One general classification distinguishes unconstrained designs from designs under special constraints. Another distinguishes data-flow-based (DF-based) scheduling from control-flow-based (CF-based) scheduling. In the next section, I will give some important examples of DF-based and CF-based scheduling.

2. RELATED CONCEPTS

This section gives a general idea of the basic concepts used in the following sections, including scheduling methods, linear programming (LP), scheduling variables, difference constraints, and systems of difference constraints.

2.1 Scheduling methods

There are many ways to classify scheduling; one of them divides scheduling algorithms into two categories:

2.1.1 Data-flow-based scheduling

Data-flow-based scheduling targets data-flow-intensive applications and can be further divided into two classes: time-constrained scheduling and resource-constrained scheduling. Time-constrained scheduling schedules the operations within a given time frame in order to reduce the required resources. A common method is force-directed scheduling, which calculates a force value for each operation and then schedules the operation with the smallest force. Resource-constrained scheduling does exactly the opposite: given the number of resources, it tries to minimize the time used by the operations. The most popular way to solve these problems is list scheduling, which assigns each operation a priority, forms a list, and then schedules the operations in order while fulfilling the resource constraints.

2.1.2 Control-flow-based scheduling

Control-flow-based scheduling is used in control-flow-intensive applications such as controllers and network protocol processors. One of the earliest methods is path-based scheduling. Path-based scheduling algorithms consider all possible sequences of operations (called paths) in a control-flow graph.
The mechanism is to schedule each individual path as soon as possible; the resulting solution uses the minimum number of control steps, taking into account constraints which limit the number of operations that can be placed in one control step. This scheduling algorithm is used in practice, although its complexity is proportional to the number of paths in the control flow.

2.2 CDFG

Definition [1]: A CDFG is a directed graph G(VG, EG) where VG = Vbb ∪ Vop and EG = Ec ∪ Ed. Vbb is a set of basic blocks. A basic block is code that has one entry point, one exit point, and no jump instructions contained within it. The basic blocks form the vertices of the control part of the graph; inside each basic block, the CDFG contains a data-flow graph, in which the nodes represent basic computations and the edges show the data dependencies. Furthermore, Vop is the set of all operation nodes in G, and each operation node in Vop belongs to exactly one basic block. Data edges in Ed denote the data dependencies between operation nodes. Control edges in Ec represent the control dependencies between the basic blocks. Each control edge ec ∈ Ec is associated with a branching condition.

Figure 2.1 is an example of a CDFG which shows the algorithm of a greatest common divisor (GCD).

Figure 2.1: an example of a CDFG. The square boxes represent the basic blocks, solid lines show the data dependencies, and dotted lines show the control dependencies between nodes and blocks.

2.3 Scheduling variables

A scheduling variable describes the schedule of an operation node in the CDFG. The definition is given below:

Definition [1]: Given a CDFG G(Vbb ∪ Vop, Ec ∪ Ed), each node v ∈ Vop is associated with a set of scheduling variables {svi(v) | i ∈ [0, Lv]} where Lv = Latency(v). We introduce scheduling variables of operation nodes to represent their pipeline latency.
The value of a scheduling variable captures the temporal position (in terms of control states) of an operation node in the final schedule (its mapping into the state transition graph). In other words, one operation can span several clock cycles according to its latency. In terms of the final state machine, the value of the scheduling variable of a node represents the longest path (maximum number of clock cycles) from the initial state to the state where the node is executed. For example, the operation node in Figure 2.2 (a) is part of a CDFG. We apply a set of scheduling variables to this node according to its pipeline latency; in Figure 2.2 (b) the latency is two clock cycles, so we introduce two scheduling variables svbeg = k and svend = k + 1 (k unknown). In the final state transition graph, Figure 2.2 (c), operation node A starts executing in state s1 and ends in state s2; the value of k is the maximum number of clock cycles from the initial state to s1.

Figure 2.2: (a) part of a CDFG, (b) clock cycle diagram, (c) part of the state transition graph

2.4 Difference constraints

As mentioned before, in order to model the scheduling constraints given by the design task, we introduce the concept of difference constraints:

Definition [1]: An integer difference constraint is a formula of the form x – y ≤ b for integer variables x and y and a constant b.

Using the scheduling variables, we can model the scheduling constraints as a set of difference constraints which can later be used to solve the scheduling problem.

2.5 System of difference constraints

Definition [1]: A system of difference constraints SDC(X, C) consists of a set X of variables and a set C of linear inequalities of the form xj – xi ≤ bk, where 1 ≤ i, j ≤ n and 1 ≤ k ≤ m; here n is the number of variables and m is the number of constraints.
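As a concrete illustration of this definition, a small SDC can be stored simply as a list of (xj, xi, bk) triples, one per inequality xj – xi ≤ bk; the variable names and bounds below are invented for the example:

```python
# A hypothetical SDC over variables x1..x3, stored as (xj, xi, b) triples
# meaning xj - xi <= b.
sdc = [
    ("x2", "x1", 0),   # x2 - x1 <= 0
    ("x3", "x2", 2),   # x3 - x2 <= 2
    ("x1", "x3", -1),  # x1 - x3 <= -1
]

def satisfies(assignment, constraints):
    """Check whether an integer assignment satisfies every difference constraint."""
    return all(assignment[xj] - assignment[xi] <= b for xj, xi, b in constraints)

print(satisfies({"x1": 3, "x2": 2, "x3": 4}, sdc))  # True
print(satisfies({"x1": 0, "x2": 1, "x3": 5}, sdc))  # False (x2 - x1 = 1 > 0)
```

Note that a solution, if one exists, is never unique: shifting every variable by the same constant preserves all differences.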
Furthermore, a system of difference constraints (SDC) can be given a graph representation, in which each variable is a vertex and an edge of weight b from x to y represents the inequality x – y ≤ b. Using this constraint graph we can check whether the SDC is feasible:

Theorem: An SDC is feasible if and only if its constraint graph contains no negative cycle.

A negative cycle is a cycle whose edge weights sum to a negative value; it can be detected by solving a single-source shortest-path problem [brun97] on the graph.

2.6 Linear programming

In mathematics, a linear programming problem optimizes a linear objective function subject to linear equality and inequality constraints.

Definition: Linear programming is a method to solve optimization problems with a given linear objective function under linear equality and inequality constraints. The standard form of an LP is:

Maximize cT x
subject to Ax ≤ b
where x ≥ 0

Here x is the vector of variables, c and b are vectors of coefficients, and A is a matrix of coefficients. The function to be maximized or minimized is called the objective function; in this formulation, cT x is the objective function. When applying LP to SDC-based scheduling problems, note that each row of the constraint matrix A contains exactly two nonzero entries (+1 and −1), because an SDC consists only of difference constraints.

There are many algorithms for solving LP problems. The most common one is the simplex algorithm [3], which constructs an initial solution at one vertex of the constraint polyhedron and then walks along the edges of the polyhedron from vertex to vertex until it finds the optimal solution. It is an exact method that finds the optimal solution whenever one exists. Even though the simplex method is the usual choice and quite efficient in practice, many other algorithms have been proposed, such as interior point methods.
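The feasibility theorem from Section 2.5 is easy to check in code with a Bellman–Ford pass; this is a minimal sketch, where the (x, y, b) triple encoding of constraints is an assumption of the example, not part of the SDC formulation in [1]:

```python
def sdc_feasible(variables, constraints):
    """Check SDC feasibility via Bellman-Ford negative-cycle detection.

    `constraints` is a list of (x, y, b) triples meaning x - y <= b.
    Each triple contributes one weighted edge to the constraint graph;
    the SDC is feasible iff that graph contains no negative cycle
    (cycle weights are the same under either edge-orientation convention).
    """
    # Initializing every distance to 0 emulates an artificial source
    # with a 0-weight edge to every vertex.
    dist = {v: 0 for v in variables}
    edges = [(y, x, b) for (x, y, b) in constraints]
    # Relax all edges |V| times (the artificial source adds one vertex).
    for _ in range(len(variables)):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, a negative cycle exists.
    return all(dist[u] + w >= dist[v] for u, v, w in edges)

# x1 - x2 <= 0 and x2 - x1 <= -1 form a cycle of weight -1: infeasible.
print(sdc_feasible(["x1", "x2"], [("x1", "x2", 0), ("x2", "x1", -1)]))  # False
# Relaxing the second bound to 1 removes the negative cycle: feasible.
print(sdc_feasible(["x1", "x2"], [("x1", "x2", 0), ("x2", "x1", 1)]))   # True
```

When the system is feasible, the final `dist` values themselves form one integer solution of the SDC.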
Sometimes we apply LP to physical problems whose solutions must be integers rather than arbitrary reals. For example, if we want to minimize the cost of buying computers for a university, a simplex result of 120.5 computers makes no sense. For such problems we need integer linear programming (ILP).

Definition: An ILP is an LP whose variables are all required to be integers.

While LP problems can be solved efficiently even in the worst case, ILP problems are in general very hard. (The LP relaxation of an ILP is the LP that arises when the integrality constraints are ignored.) When solving the LP relaxation, if the solution happens to be integral, it is also the optimal solution of the ILP; if not, it at least gives an upper or lower bound. One direct way to handle a non-integral solution is rounding [2], which takes the closest integer to the fractional solution. Fortunately, when the coefficient matrix of the constraint inequalities is totally unimodular, it can be proved that the optimal solution of the LP relaxation is exactly integral.

Definition: A matrix is totally unimodular if every nonsingular square submatrix has a determinant of 1 or −1.

Thanks to this property, the complexity of solving an ILP via its relaxation decreases significantly.

3. SDC-BASED SCHEDULING

This section presents the procedure of solving scheduling problems with SDC-based scheduling. The problem we are trying to solve is stated as follows:

Given:
a.) A CDFG G(VG, EG);
b.) A set of scheduling constraints C, which may include dependency constraints, relative timing constraints, resource constraints, latency constraints, and cycle time constraints.

Goal: Obtain the optimal schedule by solving an ILP problem.
The scheduler can further be represented by an FSM-style state transition graph (STG), which is not discussed in this report.

3.1 Modeling scheduling constraints

Using scheduling variables, we can convert the given scheduling constraints into a set of difference constraints. All formulas below are written in terms of the CDFG definition.

3.1.1 Dependency constraints

Dependency constraints fall into two categories, data dependencies and control dependencies, both of which are exposed in the CDFG.

a.) Data dependency constraints

Data dependency constraints ensure the correct functionality of the description: according to the CDFG, if there is an edge from node a to node b, then node b cannot be executed before node a has finished, where a and b represent different operations. The standard form of data dependency constraints, using the CDFG definition, is [1]:

∀ e(vi, vj) ∈ Ed : svend(vi) – svbeg(vj) ≤ 0

where vi and vj are operation nodes and e(vi, vj) is a data edge from vi to vj; the largest scheduling variable of vi, svend(vi), must not exceed the smallest scheduling variable of vj, svbeg(vj). As an example, in Figure 3.1, svend(vi) – svbeg(vj) ≤ 0.

Figure 3.1: an example of data dependency constraints

b.) Control dependency constraints

Control dependency constraints also help to ensure the correct functionality of the operations. Assume we have two basic blocks bbi and bbj, with a control edge from bbi to bbj. Then none of the operation nodes in bbj may be scheduled until all operation nodes in bbi have executed. To formulate this relationship, we use the notation ssnk(bbi) and ssrc(bbj) for the super-sink of bbi and the super-source of bbj. Super-sinks and super-sources are artificial nodes used to polarize each basic block.
The standard form of control dependency constraints, using the CDFG definition, is [1]:

∀ e(bbi, bbj) ∈ Ec : svend(ssnk(bbi)) – svbeg(ssrc(bbj)) ≤ 0

For example, in Figure 3.2, solid lines represent data flow and dotted lines represent control flow. The control edge points from bbi to bbj, which means that the operation nodes of bbj cannot be executed until all operations in bbi have finished.

Figure 3.2: an example of control dependency constraints

3.1.2 Timing constraints

Timing constraints can also be efficiently transformed into difference constraints for behavioral synthesis. The three most common kinds follow.

a.) Relative timing constraints

A minimum time constraint lij between node vi and node vj guarantees that the gap between the earliest schedules of the two nodes is at least lij clock cycles [1]:

svbeg(vj) – svbeg(vi) ≥ lij

Figure 3.3 shows an example of relative timing constraints; the schedule of the second operation node vj shown there is not valid.

Figure 3.3: an example of relative timing constraints

A maximum time constraint uij between node vi and node vj ensures that the gap between the earliest schedules of the two nodes is at most uij clock cycles [1]:

svbeg(vj) – svbeg(vi) ≤ uij

b.) Latency constraints

A latency constraint gives the maximum acceptable latency for a subgraph of the CDFG. The subgraph must have an entry block bbi and an exit block bbj; it may be, for instance, a single basic block or a loop body. Assume the latency constraint for the subgraph is Tlat [1]:

svend(ssnk(bbj)) – svbeg(ssrc(bbi)) ≤ Tlat

which means that the subgraph with entry block bbi and exit block bbj may not take more than Tlat clock cycles to execute. Figure 3.4 shows an example of latency constraints.

Figure 3.4: an example of latency constraints

c.)
Cycle time constraints

To meet the target operating frequency of the synthesized RTL implementation, a cycle time constraint limits the maximum combinational delay within one clock cycle. Consider a pair of nodes vi and vj in sequential order; a combinational path between them is written cp(vi, vj), and the combinational path with the longest delay is the critical combinational path ccp(vi, vj), so D(ccp(vi, vj)) = max{D(cp(vi, vj))}. Suppose the designed cycle time is Tclk and the total delay between the node pair is D(ccp(vi, vj)) > Tclk. Then we can form the difference constraint [1]:

svbeg(vi) – svbeg(vj) ≤ –(⌈D(ccp(vi, vj)) / Tclk⌉ – 1)

which means that if the total delay for a pair of nodes exceeds the designed cycle time, the combinational path must be distributed over at least ⌈D(ccp(vi, vj)) / Tclk⌉ clock cycles. Figure 3.5 shows an example of cycle time constraints, where (b) shows the case ⌈D(ccp(vi, vj)) / Tclk⌉ = ⌈2.1⌉ = 3; the constraint then reads svbeg(vi) – svbeg(vj) ≤ –2, i.e., vj must start at least two cycles after vi, which is why 1 is subtracted from ⌈D(ccp(vi, vj)) / Tclk⌉.

Figure 3.5: an example of cycle time constraints

3.1.3 Resource constraints

Since resource-constrained scheduling is NP-hard, we convert the resource constraints heuristically into a set of difference constraints. To do so, we first introduce a set of linear orders. The linear order of the operations is quite important in this approach; for instance, we obtain the linear order for each basic block by using ASAP (as soon as possible) values as the primary key and ALAP (as late as possible) values as the tie-breaking key. This is a list-scheduling-based ordering, but any algorithm that generates feasible linear orders can be used.
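A minimal sketch of this ordering step, assuming hypothetical (name, ASAP, ALAP) records for the operations of one basic block (the tuple layout is an assumption of this example, not an API from [1]):

```python
# Hypothetical (name, asap, alap) records for the operations of one basic block.
ops = [("mul1", 1, 2), ("add1", 0, 3), ("add2", 1, 1), ("mul2", 0, 0)]

# Primary key: ASAP value; tie-breaker: ALAP value.
linear_order = sorted(ops, key=lambda op: (op[1], op[2]))

print([name for name, _, _ in linear_order])  # ['mul2', 'add1', 'add2', 'mul1']
```

Since Python's sort is stable, operations that tie on both keys keep their original program order, which is itself a feasible linear order.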
Then, given a linear order for each basic block, we can check the availability of resources. Suppose the number of available resources of type resk is c(resk). For any node pair vi and vj with res(vi) = res(vj) = resk, if there are c(resk) − 1 nodes of resource type resk between vi and vj in the basic block, we obtain the difference constraint [1]:

svbeg(vi) – svbeg(vj) ≤ –Latency(vi)

which means node vj must be scheduled in a separate state after vi. After adding this kind of constraint for every c(resk) nodes of the same type, we obtain c(resk) precedence chains among the operation nodes of type resk. In Figure 3.6, the nodes from A to B have been linearly ordered; A and B are of the same type add, the resource count of type add is 3, and between them there are 3 − 1 nodes of type add, so node B must be scheduled in a separate state after node A.

Figure 3.6: an example of resource constraints

3.2 Finding objective functions

From the concepts section we know that a system of difference constraints together with a linear objective function forms an ILP problem; moreover, because the constraint matrix of an SDC is totally unimodular, the complexity of solving the ILP via its LP relaxation is dramatically reduced. In this section, our task is to find the objective function we want to optimize; in the following we will see how the optimization is handled with the constraint matrix.

3.2.1 ASAP and ALAP scheduling

First of all, let us see how to obtain an ASAP schedule using the following objective function [1]:

Min Σv ∈ Vop svbeg(v)

The objective function states that if the sum of the starting scheduling variables of all operation nodes is minimized, every operation node executes at its earliest possible schedule. This may seem a bit strange, because in practice we will not offer unlimited resources so that every operation node can execute as soon as possible.
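With dependency constraints alone, minimizing this sum simply drives every svbeg down to its dependency-driven minimum, i.e. an ASAP schedule, which a forward pass in topological order computes directly. A sketch with a hypothetical four-operation DAG and latencies (not taken from [1]):

```python
# Hypothetical DAG: op -> list of predecessor ops, plus per-op latencies.
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
latency = {"a": 1, "b": 2, "c": 1, "d": 1}
topo = ["a", "b", "c", "d"]  # a topological order of the DAG

# ASAP: each op starts as soon as all of its predecessors have finished.
sv_beg = {}
for v in topo:
    sv_beg[v] = max((sv_beg[p] + latency[p] for p in preds[v]), default=0)

print(sv_beg)  # {'a': 0, 'b': 1, 'c': 1, 'd': 3}
```

Each value here is exactly the minimum allowed by the constraints svbeg(vj) ≥ svend(vi), so the sum of the values is minimal.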
Similarly, we can optimize the reverse situation, scheduling as late as possible, using the following objective function [1]:

Max Σv ∈ Vop svbeg(v)

3.2.2 Optimizing longest path latency

The longest path latency is the maximum number of clock cycles required to execute a simple path from entry to exit of the CDFG, where a simple path is a path without repeated operations. As we saw in Section 2.3, the value of the scheduling variable of a node represents the longest path (maximum clock cycles) from the initial state to the state where the node is executed in the final state machine. Thus we can optimize the longest path latency through the scheduling variable of the super-sink of the exit basic block, exit-bb(G):

Min svend(ssnk(exit-bb(G)))

3.2.3 Optimizing expected overall latency

A schedule that optimizes the longest path latency is not always optimal for the overall latency: the longest path only considers simple paths and therefore ignores repeated operations (loops), so in the real case, where loops matter, extra latency can appear. In Figure 3.7 (a) the longest path latency is 2; suppose the loop executes 50 times, so both s1 and s2 execute 50 times, and the final (overall) latency is 100 clock cycles. In Figure 3.7 (b), on the other hand, the schedule is restructured so that the longest path latency is 3, yet, due to the way the loop is scheduled, the overall latency is only 52 clock cycles, much better than the first schedule. Therefore we should also take the overall latency into consideration.

Figure 3.7: two alternative schedules

Since the actual execution depends heavily on the data in the real case, it is hard to statically estimate the final latency of a CDFG.
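The arithmetic behind Figure 3.7 can be reproduced by weighting each state by its expected execution count; the state sets and the trip count of 50 below are assumptions matching the figure's description:

```python
def overall_latency(exec_counts):
    """Overall latency = sum over states of (expected execution count) * 1 cycle."""
    return sum(exec_counts.values())

# Schedule (a): both states s1 and s2 sit inside a loop that runs 50 times.
a = {"s1": 50, "s2": 50}
# Schedule (b): the longest path has 3 states, but only s2 stays inside the loop.
b = {"s1": 1, "s2": 50, "s3": 1}

print(overall_latency(a))  # 100
print(overall_latency(b))  # 52
```

This is exactly the kind of trip-count weighting an expected-overall-latency objective has to approximate statically.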
So we approximate the latency by a linear function of the scheduling variables. Since it is often difficult to generate the objective function directly for a CDFG with a complex structure, we use an iterative approach that walks along the loops and branches in a bottom-up manner: in each iteration we compute the expression for the innermost loops, then collapse those loops, and build up the linear expression incrementally until the outermost loop is reached.

3.3 LP problem solving and complexity analysis

As stated in Section 2.6, solving an ILP problem becomes easier with the SDC property that the constraint matrix is totally unimodular: the LP relaxation of the ILP then guarantees optimal integer solutions in polynomial time. The details of computing the solution are not listed here, since they depend on the application of the SDC and the constraints it imposes, but the general time complexity is O(n²(m + n log n) log n) [1], where m is the number of constraints and n is the number of scheduling variables. After solving this ILP problem, every scheduling variable gets an integer value. These values can then be translated into the states of the finite state machine which represents the controller of the design.

4. CONCLUSIONS

As the complexity of high-level synthesis designs increases, and since a design is usually a combination of computations, communications, and control, subject to a wide range of constraints, the existing scheduling algorithms are either not very efficient or do not support some kinds of constraints. For instance, DF-based scheduling cannot handle control-flow-intensive designs well, and some CF-based scheduling methods have exponential time complexity in the worst case.
In this report we discussed a new algorithm, SDC-based scheduling, which can convert a wide range of constraints into a system of difference constraints. We can then optimize the performance objective function by solving the LP problem efficiently, thanks to the special property of the SDC constraint matrix.

5. REFERENCES

[1] Jason Cong, Zhiru Zhang: "An Efficient and Versatile Scheduling Algorithm Based on SDC Formulation", DAC, 2006.
[2] Alan Sultan: "Linear Programming: An Introduction with Applications", Academic Press, Inc., 1993, pp. 363-366.
[3] George B. Dantzig, Mukund N. Thapa: "Linear Programming: Introduction", Hamilton Printing Co., Rensselaer, NY, 1997, pp. 63-71.
[4] M. J. M. Heijligers: "The Application of Genetic Algorithms to High-Level Synthesis", Technische Universiteit Eindhoven, 1996, p. 3.
[5] John P. Elliott: "Understanding Behavioral Synthesis: A Practical Guide to High-Level Design", Kluwer Academic Publishers, 1999, pp. 7-9.