2/10/2014 TDTS 01 Lecture 6 High-Level Synthesis I Zebo Peng Embedded Systems Laboratory IDA, Linköping University H.L. Describe and Synthesize Description of a design in terms of behavioral specification. Refinement of the design towards an implementation by adding structural details. Evaluation of the design in terms of a cost function and the design is optimized for the cost function. For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... P R O clk enable Zebo Peng, IDA, LiTH 2 TDTS01 Lecture Notes – Lecture 6 1 2/10/2014 Lecture 6 Definition and motivation A typical HLS process Scheduling techniques Advanced scheduling issues Zebo Peng, IDA, LiTH 3 TDTS01 Lecture Notes – Lecture 6 Definition HLS generates automatically a register-transfer level design from a behavioral specification. HLS is also called architectural synthesis, behavioral synthesis, or algorithmic synthesis. Layout Synthesis For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... Logic Synthesis RTL Synthesis HLS p High-Level Synthesis O R clk enable Zebo Peng, IDA, LiTH 4 TDTS01 Lecture Notes – Lecture 6 2 2/10/2014 Input to HLS The behavioral specification. Design constraints (timing, performance, cost, power consumption, pin-count, testability, etc.). An optimization (cost) function. A module library representing the available components. For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... HLS p O R clk enable Zebo Peng, IDA, LiTH 5 TDTS01 Lecture Notes – Lecture 6 Output of HLS RTL implementation structure (netlist). Controller (captured usually as a symbolic FSM). Other attributes, such as geometrical information, to guide the back-end design tasks. C1 For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... HLS p Controller C2 C3 O R clk enable Zebo Peng, IDA, LiTH 6 TDTS01 Lecture Notes – Lecture 6 3 2/10/2014 Goal of HLS To generate a RTL design that implements the specified behavior, satisfies the design constraints, and leads to the optimization of the given cost function. Ex. Cost = DP-area + Cont-area + Exe-time C1 For I=0 To 2 Loop Wait until clk’event and clk=´1´; If (rgb[I] < 248) Then P=rgb[I] mod 8; ... Controller C2 C3 HLS p O R clk enable Zebo Peng, IDA, LiTH 7 TDTS01 Lecture Notes – Lecture 6 Why HLS? Efforts spent in high-level design tasks are now dominating the design process. Physical design → Logic design → High-level design. Improve design productivity. Increase re-targetability. New technology generations. ASICs vs. FPGAs. New platforms. Support design re-use. Get the design correct the first time. Zebo Peng, IDA, LiTH 8 TDTS01 Lecture Notes – Lecture 6 4 2/10/2014 Lecture 6 Definition and motivation A typical HLS process Scheduling techniques Advanced scheduling issues Zebo Peng, IDA, LiTH 9 TDTS01 Lecture Notes – Lecture 6 HLS Phase 1 — Specification Which language to use? Procedural languages Functional languages Graphics notations VHDL vs. Verilog vs. SystemC Explicit parallelism or not? Requirement specification Timing Power consumption PROCEDURE Simple; VAR A,B,C,D,E,F,G: integer; BEGIN Read(A,B,C,D,E); F := E*(A+B); G := (A+B)*(C+D); ... END; Ares cost … Zebo Peng, IDA, LiTH 10 TDTS01 Lecture Notes – Lecture 6 5 2/10/2014 Phase 2 — Data-Flow Analysis Parallelism extraction. PROCEDURE Simple; VAR A,B,C,D,E,F,G:integer; BEGIN Read(A,B,C,D,E); F := E*(A+B); G := (A+B)*(C+D); ... END; Eliminating high-level language constructs (procedure call, complex operator, etc.). Program transformation (e.g, loop invariant movement). E A B C + Loop unrolling. Common sub-expression detection. D + * * Dead code elimination. F Zebo Peng, IDA, LiTH G 11 TDTS01 Lecture Notes – Lecture 6 Phase 3 — Operation Scheduling To decide when to perform each operation. E A Performance/cost tradeoff. B C + D + Clocking strategy. Performance estimation. Zebo Peng, IDA, LiTH 12 * * F G TDTS01 Lecture Notes – Lecture 6 6 2/10/2014 Phase 4 — Data-Path Allocation E To decide the basic datapath structure. Operator selection and binding. Register/memory allocation. A B C + E<0:7> A<0:7> B<0:7> D + * * F G Reg R4 Interconnection generation. Reg R1 Reg R2 Hardware minimization. * + M Reg R3 Zebo Peng, IDA, LiTH 13 partial data-path TDTS01 Lecture Notes – Lecture 6 Phase 5 — Control Allocation Selection of control style (PLA, micro-code, random logic, etc.). Controller description: Controller generation. S1: Load R1 and R2 next S2; E<0:7> A<0:7> B<0:7> S2: Add, M=1, Load R3 next S3; S3: Load R4 next S4; S4: Mult, M=0, Load R3 next S5; ... Reg R4 Reg R1 Reg R2 * + M Reg R3 Zebo Peng, IDA, LiTH 14 TDTS01 Lecture Notes – Lecture 6 7 2/10/2014 Phase 6 — Module Binding and Cont. Imp. Selection of physical modules. Decision on module parameters and constraints. Controller implementation. A<0:7> B<0:7> Controller ROM: M1 Reg R1 L1 Reg R2 Adder 8 Module Lib Zebo Peng, IDA, LiTH Reg R3 L3 15 L2 0000 0001 0010 0011 : : : : 11000000 00100000 00011000 01000000 0001 0010 0011 0100 ... TDTS01 Lecture Notes – Lecture 6 The Basic Issues Scheduling — Assignment of each operation to a time slot corresponding to a clock cycle or time interval. Resource Allocation — Selection of the types of hardware components and the number for each type to be included. Module Binding — Assignment of operation to the allocated hardware components. Controller Synthesis — Design of control style and clocking scheme as well as the generation of the controller. Zebo Peng, IDA, LiTH 16 TDTS01 Lecture Notes – Lecture 6 8 2/10/2014 The Basic Issues (Cont’d) Compilation — To convert the input specification into an internal representation. Parallelism Extraction — To extract the inherent parallelism of the original solution. It is usually done with data flow analysis techniques. Operation Decomposition — Implementation of complex operations by simple operators available in the module library. Zebo Peng, IDA, LiTH 17 TDTS01 Lecture Notes – Lecture 6 Lecture 6 Definition and motivation A typical HLS process Scheduling techniques Advanced scheduling issues Zebo Peng, IDA, LiTH 18 TDTS01 Lecture Notes – Lecture 6 9 2/10/2014 The Scheduling Problems Resource-constrained (RC) scheduling: Given (1) a set O of operations with a partial ordering, (2) a set K of functional unit types, (3) a type function, : O K, to map the operations into the functional unit types, and (4) resource constraints mk for each functional unit type. Find a (optimal) schedule for the set of operations that (1) obeys the partial ordering, and (2) utilizes only the available functional units. Time-constrained (TC) scheduling Zebo Peng, IDA, LiTH 19 TDTS01 Lecture Notes – Lecture 6 A RC-Scheduling Example + + a := i1 + i2; o1 := (a - i3) * 3; o2 := i4 + i5 + i6; d := i7 * i8; g := d + i9 + i10; o3 := i11 * 7 * g; a - * d + + o2 * + * g * o1 o3 1 adder, 1 multiplier : {+, -} Adder {*} Multiplier Zebo Peng, IDA, LiTH 20 TDTS01 Lecture Notes – Lecture 6 10 2/10/2014 ASAP (As Soon As Possible) Sch. Sort operations topologically according to their dependence. Schedule operations, one by one, in the sorted order by placing them in the earliest possible control step. 1 - 1 6 + + 2 3 7 4 + * + * 5 8 9 * * 1 10 2 + 3 4 - 5 * 4 6 9 + 7 + 8 * 7 Zebo Peng, IDA, LiTH 5 * 6 Sorted DFG 3 * + 2 Control Steps + + 21 10 TDTS01 Lecture Notes – Lecture 6 ALAP (As Late As Possible) Sch. Sort operations topologically according to their dependence. Schedule operations in the reversed order by placing them, one by one, in the latest possible control step. - 6 + + 2 3 + * 7 + 4 * N-5 + 5 Control Steps 1 + 8 9 * * 10 Sorted DFG 1 + N-4 2 N-3 * N-2 N-1 N * 3 + 4 6 * 9 + + 7 5 8 * 10 steps needed needed = Optimum 66 steps Zebo Peng, IDA, LiTH 22 TDTS01 Lecture Notes – Lecture 6 11 2/10/2014 List Scheduling For each control step, the operations that are available to be scheduled are kept in a list; The list is ordered by some priority function, e.g.: The length of path from the operation to the end of the block (critical path); Mobility: the number of control steps from the earliest to the latest feasible control step. Each operation on the list is scheduled one by one if the resources it needs are free; otherwise it is deferred to the next control step. Zebo Peng, IDA, LiTH 23 TDTS01 Lecture Notes – Lecture 6 List Scheduling Example 1 * 1 6 + + 2 7 * 3 + + 4 * 8 9 * 10 * 2 Sorted list: +1(2), *3(2), +4(2), +2(1), *5(1) Zebo Peng, IDA, LiTH 1 5 Control Steps + + 24 4 - 5 * 6 + 4 + 8 * 5 2 + 3 3 6 9 + 7 * 10 TDTS01 Lecture Notes – Lecture 6 12 2/10/2014 Time-Constrained Scheduling TC scheduling — to find a schedule for the operations that obeys the partial ordering and utilizes the minimal amount of hardware. Force-Directed Method The basic idea is to balance the concurrency of operations. ASAP and ALAP schedules, assuming no resource constraints, are calculated to derive the time frames for all operations. For each type of operations, a distribution graph is built to denote the possible control steps for each operation. If an operation could be done in k steps, then 1/k is added to each of these k steps. Zebo Peng, IDA, LiTH 25 TDTS01 Lecture Notes – Lecture 6 Force-Directed Scheduling Example + + a1 a2 ALAP + * + a3 * + * a1 a1 a2 * + a2 a3 a3 Distribution Graph for Addition Time Constraint: 3 cycles Force( ( oi ) sj ) DG ( sj ) Zebo Peng, IDA, LiTH The ideal distribution ASAP The algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and selecting the smallest one. 26 ALAP ( oi ) 1 DG(s) T (oi ) s ASAP ( oi ) TDTS01 Lecture Notes – Lecture 6 13 2/10/2014 Force-Directed Scheduling Example + + a1 a2 ALAP + * + a3 * + a1 a1 a2 * + * a2 a3 a3 The distribution ASAP The algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and selecting the smallest one. Distribution Graph for Addition Time Constraint: 3 cycles 1 ALAP ( a3) Force(( a3) c 2) 1.5 DG ( s ) 1.5 1 0.5 2 s ASAP ( a3) Zebo Peng, IDA, LiTH 27 TDTS01 Lecture Notes – Lecture 6 Force-Directed Scheduling Example + + a1 a2 ALAP + * + a3 * + * a1 a1 a2 * + a2 a3 a3 The final distribution ASAP The algorithm tries to balance the distribution graph by calculating the force of each operation-to-control step assignment and selecting the smallest one. Distribution Graph for Addition Time Constraint: 3 cycles 1 ALAP ( a3) Force(( a3) c3) 0.5 DG ( s ) 0.5 1 0.5 2 s ASAP ( a3) Zebo Peng, IDA, LiTH 28 TDTS01 Lecture Notes – Lecture 6 14 2/10/2014 Classification of Sch. Approaches, I Constructive scheduling — One operation is assigned to one control step at a time and this process is iterated from control step (operation) to control step (operation). ASAP ALAP List scheduling Zebo Peng, IDA, LiTH 29 TDTS01 Lecture Notes – Lecture 6 Classification of Sch. Approaches, II Global scheduling — All control step and all operations are considered simultaneously when operations are assigned to control steps. Force-directed scheduling Neural net scheduling Integer Linear Programming algorithms Zebo Peng, IDA, LiTH 30 TDTS01 Lecture Notes – Lecture 6 15 2/10/2014 Classification of Sch. Approaches, III Transformational scheduling — Starting from an initial schedule, a final schedule is obtained by successively transformations. Performance Cost Power Testability ... Transformation-Based Approach Zebo Peng, IDA, LiTH 31 TDTS01 Lecture Notes – Lecture 6 Lecture 6 Definition and motivation A typical HLS process Scheduling techniques Advanced scheduling issues Zebo Peng, IDA, LiTH 32 TDTS01 Lecture Notes – Lecture 6 16 2/10/2014 Advanced Scheduling Issues Chaining and multi-cycling: + + 100ns 100ns * * Two adders needed + 200ns 50ns 100ns + * + A multi-cycle multiplication -> Complex in analysis, verification and testing. Pipelined operations. Zebo Peng, IDA, LiTH + Two chained additions 33 TDTS01 Lecture Notes – Lecture 6 Summary on Scheduling Scheduling has long been recognized as a very important step in high-level synthesis, since it determines the performance of the final design. A wide variety of algorithms exist in the literature for efficiently performing the task of HLS scheduling. These algorithms can be classified in different ways: Objectives: basic scheduling algorithms, resource constrained, or time constrained. Approaches: constructive, global, or transformational. Zebo Peng, IDA, LiTH 34 TDTS01 Lecture Notes – Lecture 6 17