High-Level Synthesis I H.L. Describe and Synthesize Zebo Peng Embedded Systems Laboratory

advertisement
1/29/2013
TDTS 01 Lecture 6
High-Level Synthesis I
Zebo Peng
Embedded Systems Laboratory
IDA, Linköping University
H.L. Describe and Synthesize
 Description of a design in terms of behavioral
specification.
 Refinement of the design towards an implementation
by adding structural details.
 Evaluation of the design in terms of a cost function
and the design is optimized for the cost function.
For I=0 To 2 Loop
Wait until clk’event
and clk=´1´;
If (rgb[I] < 248) Then
P=rgb[I] mod 8;
...
P
R
O
clk
enable
Zebo Peng, IDA, LiTH
2
TDTS01 Lecture Notes – Lecture 6
1
1/29/2013
Lecture 6
 Definition and motivation
 A typical HLS process
 Scheduling techniques
 Advanced scheduling issues
Zebo Peng, IDA, LiTH
3
TDTS01 Lecture Notes – Lecture 6
Definition
HLS generates automatically a register-transfer
level design from a behavioral specification.
HLS is also called architectural synthesis,
behavioral synthesis, or algorithmic synthesis.
Layout
Synthesis
For I=0 To 2 Loop
Wait until clk’event
and clk=´1´;
If (rgb[I] < 248) Then
P=rgb[I] mod 8;
...
Logic
Synthesis
RTL
Synthesis
HLS
p
High-Level
Synthesis
O
R
clk
enable
Zebo Peng, IDA, LiTH
4
TDTS01 Lecture Notes – Lecture 6
2
1/29/2013
Input to HLS
The behavioral specification.
Design constraints (timing, performance, cost,
power consumption, pin-count, testability, etc.).
An optimization (cost) function.
A module library representing the available
components.
For I=0 To 2 Loop
Wait until clk’event
and clk=´1´;
If (rgb[I] < 248) Then
P=rgb[I] mod 8;
...
HLS
p
O
R
clk
enable
Zebo Peng, IDA, LiTH
5
TDTS01 Lecture Notes – Lecture 6
Output of HLS
RTL implementation structure (netlist).
Controller (captured usually as a symbolic FSM).
 Other attributes, such as geometrical
information, to guide the back-end design tasks.
C1
For I=0 To 2 Loop
Wait until clk’event
and clk=´1´;
If (rgb[I] < 248) Then
P=rgb[I] mod 8;
...
HLS
p
Controller
C2
C3
O
R
clk
enable
Zebo Peng, IDA, LiTH
6
TDTS01 Lecture Notes – Lecture 6
3
1/29/2013
Goal of HLS
 To generate a RTL design that
 implements the specified behavior,
 satisfies the design constraints, and
 leads to the optimization of the given cost
function.
Ex. Cost =  DP-area +  Cont-area +  Exe-time
C1
For I=0 To 2 Loop
Wait until clk’event
and clk=´1´;
If (rgb[I] < 248) Then
P=rgb[I] mod 8;
...
Controller
C2
C3
HLS
p
O
R
clk
enable
Zebo Peng, IDA, LiTH
7
TDTS01 Lecture Notes – Lecture 6
Why HLS?
Efforts spent in system design are now dominating
the design cycle.
 Physical design → Logic design → System design.
Improve design productivity.
Increase re-targetability.
 New technology generations.
 ASICs vs. FPGAs.
 New platforms.
Suport design re-use.
Get the design correct the first time.
Zebo Peng, IDA, LiTH
8
TDTS01 Lecture Notes – Lecture 6
4
1/29/2013
Lecture 6
 Definition and motivation
 A typical HLS process
 Scheduling techniques
 Advanced scheduling issues
Zebo Peng, IDA, LiTH
9
TDTS01 Lecture Notes – Lecture 6
HLS Phase 1 — Specification
 Which language to use?
 Procedural languages
 Functional languages
 Graphics notations
 VHDL vs. Verilog vs.
SystemC
 Explicit parallelism or not?
 Requirement specification
 Timing
 Power consumption
PROCEDURE Test;
VAR
A,B,C,D,E,F,G:
integer;
BEGIN
Read(A,B,C,D,E);
F := E*(A+B);
G := (A+B)*(C+D);
...
END;
 Ares cost
…
Zebo Peng, IDA, LiTH
10
TDTS01 Lecture Notes – Lecture 6
5
1/29/2013
Phase 2 — Data-Flow Analysis
 Parallelism extraction.
PROCEDURE Test;
VAR
A,B,C,D,E,F,G:integer;
BEGIN
Read(A,B,C,D,E);
F := E*(A+B);
G := (A+B)*(C+D);
...
END;
 Eliminating high-level
language constructs
(procedure call, complex
operator, etc.).
 Program transformation (e.g,
loop invariant movement).
E
A
B
C
+
 Loop unrolling.
 Common sub-expression
detection.
D
+
*
*
 Dead code elimination.
F
Zebo Peng, IDA, LiTH
G
11
TDTS01 Lecture Notes – Lecture 6
Phase 3 — Operation Scheduling
To decide when to perform
each operation.
E
A
 Performance/cost tradeoff.
B
C
+
D
+
 Clocking strategy.
 Performance estimation.
Zebo Peng, IDA, LiTH
12
*
*
F
G
TDTS01 Lecture Notes – Lecture 6
6
1/29/2013
Phase 4 — Data-Path Allocation
E
To decide the basic datapath structure.
 Operator selection and
binding.
 Register/memory
allocation.
A
B
C
+
E<0:7> A<0:7> B<0:7>
D
+
*
*
F
G
Reg R4
 Interconnection
generation.
Reg R1 Reg R2
 Hardware minimization.
*
+
M
Reg R3
Zebo Peng, IDA, LiTH
13
partial data-path
TDTS01 Lecture Notes – Lecture 6
Phase 5 — Control Allocation
 Selection of control
style (PLA, micro-code,
random logic, etc.).
Controller description:
 Controller generation.
S1: Load R1 and R2
next S2;
E<0:7> A<0:7> B<0:7>
S2: Add, M=1, Load
R3 next S3;
S3: Load R4 next
S4;
S4: Mult, M=0, Load
R3 next S5;
...
Reg R4
Reg R1 Reg R2
*
+
M
Reg R3
Zebo Peng, IDA, LiTH
14
TDTS01 Lecture Notes – Lecture 6
7
1/29/2013
Phase 6 — Module Binding and Cont. Imp.
 Selection of physical modules.
 Decision on module parameters and constraints.
 Controller implementation.
A<0:7>
B<0:7>
Controller ROM:
M1
Reg R1
L1 Reg R2
Adder 8
Module
Lib
Zebo Peng, IDA, LiTH
Reg R3
L3
15
L2
0000
0001
0010
0011
:
:
:
:
11000000
00100000
00011000
01000000
0001
0010
0011
0100
...
TDTS01 Lecture Notes – Lecture 6
The Basic Issues
 Scheduling — Assignment of each operation to a
time slot corresponding to a clock cycle or time
interval.
 Resource Allocation — Selection of the types of
hardware components and the number for each
type to be included.
 Module Binding — Assignment of operation to the
allocated hardware components.
 Controller Synthesis — Design of control style and
clocking scheme.
Zebo Peng, IDA, LiTH
16
TDTS01 Lecture Notes – Lecture 6
8
1/29/2013
The Basic Issues (Cont’d)
 Compilation — To convert the input specification
into an internal representation.
 Parallelism Extraction — To extract the inherent
parallelism of the original solution.
 It is usually done with data flow analysis
techniques.
 Operation Decomposition — Implementation of
complex operations in the behavioral
specification.
Zebo Peng, IDA, LiTH
17
TDTS01 Lecture Notes – Lecture 6
Lecture 6
 Definition and motivation
 A typical HLS process
 Scheduling techniques
 Advanced scheduling issues
Zebo Peng, IDA, LiTH
18
TDTS01 Lecture Notes – Lecture 6
9
1/29/2013
The Scheduling Problems
Resource-constrained (RC) scheduling:
Given a set O of operations with a partial ordering,
a set K of functional unit types,
a type function, : O  K, to map the operations into the
functional unit types, and
resource constraints mk for each functional unit type.
Find a (optimal) schedule for the set of operations that
obeys the partial ordering and utilizes only the available
functional units.
Time-constrained (TC) scheduling
Zebo Peng, IDA, LiTH
19
TDTS01 Lecture Notes – Lecture 6
A RC-Scheduling Example
+
+
a := i1 + i2;
o1 := (a - i3) * 3;
o2 := i4 + i5 + i6;
d := i7 * i8;
g := d + i9 + i10;
o3 := i11 * 7 * g;
a
-
*
d
+
+
o2
*
+
*
g
*
o1
o3
1 adder, 1 multiplier
: +,-  Adder
*  Multiplier
Zebo Peng, IDA, LiTH
20
TDTS01 Lecture Notes – Lecture 6
10
1/29/2013
ASAP (As Soon As Possible) Sch.
 Sort operations topologically according to their dependence.
 Schedule operations in the sorted order by placing them in the
earliest possible control step.
1
-
1
6
+
+
2
3
7
4
+
*
+
*
5
8
9
*
*
1
10
2
+
3
4
-
5
*
4
6
9
+
7
+
8
*
7
Zebo Peng, IDA, LiTH
5
*
6
Sorted DFG
3
*
+
2
Control Steps
+
+
21
10
TDTS01 Lecture Notes – Lecture 6
ALAP (As Late As Possible) Sch.
 Sort operations topologically according to their dependence.
 Schedule operations in the reversed order by placing them in the
latest possible control step.
-
6
+
+
2
3
+
*
7
+
4
*
N-5 +
5
Control Steps
1
+
8
9
*
*
10
Sorted DFG
1
+
N-4
2
N-3
*
N-2 N-1
N
*
3
+
4
6
*
9
+
+
7
5
8
*
10
steps needed
needed = Optimum
66 steps
Zebo Peng, IDA, LiTH
22
TDTS01 Lecture Notes – Lecture 6
11
1/29/2013
List Scheduling
For each control step, the operations that are
available to be scheduled are kept in a list;
The list is ordered by some priority function:
The length of path from the operation to the end of
the block (critical path);
Mobility: the number of control steps from the
earliest to the latest feasible control step.
Each operation on the list is scheduled one by one
if the resources it needs are free; otherwise it is
deferred to the next control step.
Zebo Peng, IDA, LiTH
23
TDTS01 Lecture Notes – Lecture 6
List Scheduling Example
Schedulable operations
*
1
6
+
+
2
7
*
3
+
+
4
*
8
9
*
10
1
*
2
Sorted list: +1(2), *3(2), +4(2), +2(1), *5(1)
Zebo Peng, IDA, LiTH
+
5
Control Steps
+
1
24
4
-
5
*
6
+
4
+
8
*
5
2
+
3
3
6
9
+
7
*
10
TDTS01 Lecture Notes – Lecture 6
12
1/29/2013
Time-Constrained Scheduling
TC scheduling — to find a schedule for the operations
that obeys the partial ordering and utilizes the minimal
amount of hardware.
Force-Directed Method
 The basic idea is to balance the concurrency of
operations.
 ASAP and ALAP schedules, assuming no resource
constraints, are calculated to derive the time frames for
all operations.
 For each type of operations, a distribution graph is built
to denote the possible control steps for each operation.
If an operation could be done in k steps, then 1/k is
added to each of these k steps.
Zebo Peng, IDA, LiTH
25
TDTS01 Lecture Notes – Lecture 6
Force-Directed Scheduling Example
+
+
a1
a2
ALAP
+
*
+
a3
*
+
*
a1
a1
a2
*
+
a2
a3
a3
Distribution Graph for Addition
Time Constraint: 3 cycles
Zebo Peng, IDA, LiTH
The ideal distribution
ASAP
26
TDTS01 Lecture Notes – Lecture 6
13
1/29/2013
Force-Directed Scheduling Example
+
+
a1
a2
ALAP
+
*
+
a3
*
+
a1
The ideal distribution
ASAP
The algorithm tries to balance the
distribution graph by calculating the force
of each operation-to-control step
assignment and select the smallest force.
a1
a2
*
+
*
a2
a3
a3
Distribution Graph for Addition
Time Constraint: 3 cycles
Force(  ( oi )  sj )  DG ( sj ) 
Zebo Peng, IDA, LiTH
ALAP ( oi )
1
 DG(s)
T (oi ) s ASAP ( oi )
27
TDTS01 Lecture Notes – Lecture 6
Force-Directed Scheduling Example
+
+
a1
a2
ALAP
+
*
+
a3
*
+
*
a1
a1
a2
*
+
a2
a3
a3
Distribution Graph for Addition
Time Constraint: 3 cycles
Force(( a3)  c 2)  1.5 
Zebo Peng, IDA, LiTH
The ideal distribution
ASAP
The algorithm tries to balance the
distribution graph by calculating the force
of each operation-to-control step
assignment and select the smallest force.
1 ALAP ( a3)
DG ( s )  1.5  1  0.5

2 s ASAP ( a3)
28
TDTS01 Lecture Notes – Lecture 6
14
1/29/2013
Force-Directed Scheduling Example
+
+
a1
a2
ALAP
+
*
+
a3
*
+
*
a1
a1
a2
*
+
a2
a3
a3
The ideal distribution
ASAP
The algorithm tries to balance the
distribution graph by calculating the force
of each operation-to-control step
assignment and select the smallest force.
Distribution Graph for Addition
Time Constraint: 3 cycles
1 ALAP ( a3)
Force(( a3)  c3)  0.5 
DG ( s )  0.5  1  0.5

2 s ASAP ( a3)
Zebo Peng, IDA, LiTH
29
TDTS01 Lecture Notes – Lecture 6
Classification of Sch. Approaches, I
Constructive scheduling —
One operation is assigned to one control
step at a time and this process is iterated
from control step (operation) to control step
(operation).
 ASAP
 ALAP
 List scheduling
Zebo Peng, IDA, LiTH
30
TDTS01 Lecture Notes – Lecture 6
15
1/29/2013
Classification of Sch. Approaches, II
Global scheduling —
All control step and all operations are
considered simultaneously when
operations are assigned to control steps.
 Force-directed scheduling
 Neural net scheduling
 Integer Linear Programming algorithms
Zebo Peng, IDA, LiTH
31
TDTS01 Lecture Notes – Lecture 6
Classification of Sch. Approaches, III
 Transformational scheduling —
Starting from an initial schedule, a final schedule
is obtained by successively transformations.
Performance
Cost
Power
Testability
...
Transformation-Based Approach
Zebo Peng, IDA, LiTH
32
TDTS01 Lecture Notes – Lecture 6
16
1/29/2013
Lecture 6
 Definition and motivation
 A typical HLS process
 Scheduling techniques
 Advanced scheduling issues
Zebo Peng, IDA, LiTH
33
TDTS01 Lecture Notes – Lecture 6
Advanced Scheduling Issues
 Chaining and multi-cycling:
+
+
100ns
100ns
*
*
Two adders needed
+
200ns
50ns
100ns
+
*
+
A multi-cycle
multiplication
Complex in analysis,
verification and testing.
 Pipelined operations.
Zebo Peng, IDA, LiTH
+
Two chained
additions
34
TDTS01 Lecture Notes – Lecture 6
17
1/29/2013
Summary on Scheduling
 Scheduling has long been recognized as a very
important step in high-level synthesis, since it
determines the performance of the final design.
 A wide variety of algorithms exist in the literature for
efficiently performing the task of HLS scheduling.
 These algorithms can be classified in different ways:
 Objectives: Basic scheduling algorithms, Time
constrained, or Resource constrained.
 Approaches: Constructive, Global, or Transformational.
Zebo Peng, IDA, LiTH
35
TDTS01 Lecture Notes – Lecture 6
18
Download