High-Level Synthesis II

2014-02-13
TDTS 01 Lecture 7
Zebo Peng
Embedded Systems Laboratory
IDA, Linköping University
Lecture 7



Allocation and binding
Control unit synthesis
Advanced HLS issues
Zebo Peng, IDA, LiTH
TDTS01 Lecture Notes – Lecture 7
Allocation and Binding

Allocation (unit selection) —— To determine the type and
number of hardware resources required, including
Functional units
Storage elements
Buses

Binding —— Assignment to resource instances:
Operations to functional unit instances
Values to be stored to instances of storage elements
Data transfers to bus instances

Allocation and binding generate the datapath of the design.
Allocation and Binding Principle
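[Figure: a data-flow graph with four additions (o1 to o4) over the values a to h, with step labels s1 and s2, next to the resulting datapath: values b, e, g share one register, values c, f, h share another, and the additions are bound to two adders (+1, +3 and +2, +4).]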
Resource sharing: Allow multiple non-concurrent operations to
share the same hardware as much as possible.
Optimization goal:
Minimize total cost of functional units, registers, bus drivers,
and multiplexers.
Minimize total interconnection length (placement info needed).
Constraint on critical path delay.
Allocation/Binding — Approach 1

Constructive — start with an empty datapath and add
functional, storage and interconnection components as needed.
Greedy algorithms — perform allocation/binding for one control
step at a time.
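[Figure: a scheduled data-flow graph over control steps 1 to 3 with additions a1 to a4 and multiplications m1, m2, next to the datapath built from it: one adder executes a1, a3, a4, a second adder executes a2, one multiplier executes m1, m2, and a register holds f.]
Rule-based: used to select the type and number of functional units, especially prior to scheduling.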
Allocation/Binding — Approach 2

Graph-theoretical formulations: sub-tasks are mapped into well-defined problems in graph theory.
• Clique partitioning.
• Left-edge algorithm.
• Graph coloring.
Clique Partitioning

G = (V, E), an undirected graph
with a set V of vertices and a set
E of edges.

A clique is a set of vertices that
form a complete subgraph of G.

The Clique Partitioning Problem:
To partition G into a minimal
number of cliques such that
each vertex belongs to exactly
one clique.
[Figure: a clique partitioning example on the vertices a1 to a4, with two cliques marked.]
Allocation as Clique Partitioning
Functional unit allocation:

Each vertex represents an operation.

An edge connects two vertices iff:
• The two operations are scheduled into different control steps, and
• There exists a functional unit that is capable of carrying out both operations.

[Figure: the scheduled data-flow graph (additions a1 to a4, multiplications m1, m2, control steps 1 to 3) and the graph derived from it.]
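The two edge conditions above can be sketched in a few lines of Python. The schedule and the component library below are hypothetical illustrations, not the exact example from the slide, and `compatibility_edges` is a name chosen here.

```python
from itertools import combinations

# Hypothetical schedule: operation -> (control step, operation type).
schedule = {
    "a1": (1, "+"),
    "a2": (1, "+"),
    "m1": (1, "*"),
    "a3": (2, "+"),
    "m2": (2, "*"),
    "a4": (3, "+"),
}

# Hypothetical component library: unit type -> operation types it can execute.
library = {"adder": {"+"}, "multiplier": {"*"}}


def compatibility_edges(schedule, library):
    """Return the edges of the graph whose cliques are sets of operations
    that may share one functional unit."""
    edges = set()
    for u, v in combinations(sorted(schedule), 2):
        step_u, type_u = schedule[u]
        step_v, type_v = schedule[v]
        # Condition 1: the operations are in different control steps.
        different_steps = step_u != step_v
        # Condition 2: some unit can carry out both operations.
        shared_unit = any({type_u, type_v} <= ops for ops in library.values())
        if different_steps and shared_unit:
            edges.add(frozenset((u, v)))
    return edges


edges = compatibility_edges(schedule, library)
for edge in sorted(tuple(sorted(e)) for e in edges):
    print(edge)
```

Adding an ALU entry such as `{"+", "*"}` to the library would also connect additions to multiplications in different steps, which is how the choice of library changes the graph.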
Storage Allocation as Clique Partitioning

Storage allocation as a clique partitioning problem:
Each value that needs to be stored is mapped to a vertex.
Two vertices are connected iff the life-times of the two values
do not intersect.

The clique partitioning problem is NP-complete.

Efficient heuristics must be developed.
Example: Tseng developed a polynomial-time algorithm, based on
step-wise grouping, which generates very good results.
Tseng’s Algorithm

A super-graph is derived from the original graph.

Find two connected super-nodes such that they have
the maximum number of common neighbors.

Merge the two nodes and repeat from the first step
until no more mergers can be carried out.
[Figure: example graph with vertices V1 to V5.]
Edge      Common neighbors
(V1,V3)   1
(V1,V4)   1
(V2,V3)   0
(V2,V5)   0
(V3,V4)   1
(V4,V5)   0
Tseng’s Algorithm (Cont’d)
After merging V1 and V3 into the super-node S1-3:
Edge        Common neighbors
(S1-3,V4)   0
(V2,V5)     0
(V4,V5)     0

After merging S1-3 and V4 into the super-node S1-3-4:
Edge        Common neighbors
(V2,V5)     0

Merging V2 and V5 leaves two cliques: {V1, V3, V4} and {V2, V5}.
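The step-wise grouping can be sketched as follows, using the five-vertex graph of the example. This is a minimal sketch of the merging rule as stated on the slide, not Tseng's full heuristic: the tie-breaking rule (take the first pair in sorted order) and the function name `tseng_clique_partition` are assumptions made here.

```python
def tseng_clique_partition(vertices, edges):
    """Greedy clique partitioning by repeatedly merging the connected pair
    of super-nodes with the most common neighbors.
    A super-node is a sorted tuple of original vertex names."""
    adj = {(v,): set() for v in vertices}
    for u, v in edges:
        adj[(u,)].add((v,))
        adj[(v,)].add((u,))

    while True:
        # All connected pairs, in a deterministic order.
        pairs = sorted((u, v) for u in adj for v in adj[u] if u < v)
        if not pairs:
            break
        # Assumed tie-break: max() keeps the first pair with the best score.
        u, v = max(pairs, key=lambda p: len(adj[p[0]] & adj[p[1]]))
        # Only neighbors common to both nodes stay connected to the merger.
        shared = adj[u] & adj[v]
        for node in (u, v):
            for n in adj.pop(node):
                adj[n].discard(node)
        merged = tuple(sorted(u + v))
        adj[merged] = set(shared)
        for n in shared:
            adj[n].add(merged)
    return sorted(adj)


vertices = ["V1", "V2", "V3", "V4", "V5"]
edges = [("V1", "V3"), ("V1", "V4"), ("V2", "V3"),
         ("V2", "V5"), ("V3", "V4"), ("V4", "V5")]
cliques = tseng_clique_partition(vertices, edges)
print(cliques)
```

On this graph the sketch merges V1 with V3, then adds V4, then merges V2 with V5, following the same sequence as the tables above.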
Left-Edge (LE) Algorithm

Originally used in channel routing (layout design) to connect
points with as few tracks as possible. The goals there are:
• To minimize the number of tracks needed.
• To reduce wire lengths.
• To avoid wire crossings.
LE Algorithm for Register Allocation

Map the birth time of a value to the left (top) edge of a wire, and its
death time to the right (bottom) edge.
[Figure: a scheduled data-flow graph with inputs i1 to i5, intermediate values a to g and outputs o1 to o3, next to a chart of the life-times of these values.]
The Left-Edge Algorithm
1. The values are sorted in increasing order of their birth times.
2. The first value is assigned to the first register.
3. The list is then scanned for the next value whose birth time is equal to or larger than the death time of the previous value.
4. This value is assigned to the current register.
5. The list is scanned until no more values can share the same register.
6. A new register is then introduced to hold the next value in the sorted list, and the algorithm iterates from step 3.
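The six steps can be sketched in Python as below. The life-times are hypothetical (they are not the values of the slide example), and `left_edge` is a name chosen here; a value may reuse a register when its birth time is equal to or larger than the death time of the previous value in that register, as in step 3.

```python
def left_edge(lifetimes):
    """Left-edge register allocation.

    lifetimes: dict mapping value name -> (birth, death).
    Returns a list of registers, each a list of value names."""
    # Step 1: sort by birth time (name as a deterministic tie-break).
    pending = sorted(lifetimes, key=lambda name: (lifetimes[name][0], name))
    registers = []
    while pending:
        current = []        # values packed into the register being filled
        last_death = None   # death time of the last value placed in it
        remaining = []
        # Steps 2 to 5: scan the sorted list once for this register.
        for name in pending:
            birth, death = lifetimes[name]
            if last_death is None or birth >= last_death:
                current.append(name)
                last_death = death
            else:
                remaining.append(name)
        registers.append(current)
        # Step 6: open a new register for whatever is left.
        pending = remaining
    return registers


# Hypothetical life-times: value -> (birth step, death step).
lifetimes = {"a": (1, 3), "b": (1, 2), "c": (2, 4), "d": (3, 5), "e": (4, 6)}
registers = left_edge(lifetimes)
for index, values in enumerate(registers, start=1):
    print(f"R{index}: {values}")
```

With these life-times at most two values are alive at the same time, and the sketch packs the five values into two registers.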
LE Algorithm Example
[Figure: three charts for the values i1 to i5, a to g and o1 to o3: the original life-times, the list sorted by birth time, and the resulting allocation to the registers R1 to R5.]
LE Algorithm Discussions

The algorithm is guaranteed to allocate the minimum
number of registers.

However, it has two disadvantages:
Not every life-time table can be interpreted as a set of intersecting
intervals on a line, for example in the presence of:
• Loops
• Conditional branches
The assignment is neither unique nor necessarily optimal in
terms of, for example, the number of multiplexers.
Allocation/Binding — Approach 3

Transformational allocation: starting from an initial
allocation and binding, a final design is obtained by
successive transformations.
Usually it starts with a maximal allocation (each operation has
its dedicated physical unit).
The design is then improved by merging, step-by-step,
physical units so that hardware resources are shared as much
as possible.
[Figure: two "+" units, labelled Si and Sj, are merged into a single shared "+" unit labelled Si,j.]
Control-Unit Synthesis

Two basic approaches are widely used:
Microcode.
Hard-wired.

The basic assumptions:
A synchronous controller is used.
A schedule is given, together with the set of activation signals
(for enabling, multiplexer input selection, bus control, etc.).
The controller is modeled as a finite-state machine.
Microcoded Control Synthesis

To store the control information in an organized fashion.

A microcode ROM with λ words is used, where λ is the number of
schedule steps.

The ROM must have ⌈log2 λ⌉ address bits (note: ⌈x⌉ denotes
the ceiling function).

A synchronous counter with a reset signal is used to address
the ROM.

The counter is controlled by the system clock.

The ROM contents can be implemented as horizontal or
vertical microcode.
Horizontal Microcode
Each activation signal is associated with one bit of the
microcode word.
[Figure: a counter, driven by Clock and Reset, addresses a ROM of λ microwords whose bits are the activation signals.]
Address   Microword
00        11000101010
01        00100010101
10        00010000000
11        00001000000

The word length is usually much larger than λ, so the ROM is
wider than it is tall.

Each bit is connected directly to an activation signal: high
performance.

There are many zeros: wasted storage.
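A small Python sketch of the counter-plus-ROM controller, using the four microwords of the example. The function names are chosen here for illustration, and signal 1 is assumed to be the leftmost bit of each word.

```python
# Microwords of the example: one word per schedule step,
# one bit per activation signal (signal 1 is the leftmost bit).
ROM = ["11000101010", "00100010101", "00010000000", "00001000000"]


def address_bits(num_steps):
    """ceil(log2(num_steps)), computed exactly with integer arithmetic."""
    return (num_steps - 1).bit_length()


def active_signals(word):
    """Numbers of the activation signals whose bit is set in a microword."""
    return [i for i, bit in enumerate(word, start=1) if bit == "1"]


def run_controller(rom):
    """Model the synchronous counter: start at 0 on reset, then advance
    one address per clock cycle and read the addressed microword."""
    counter = 0
    for _ in rom:
        yield counter, active_signals(rom[counter])
        counter += 1


print("address bits:", address_bits(len(ROM)))
for address, signals in run_controller(ROM):
    print(format(address, "02b"), signals)
```

No decoding happens between the ROM output and the activation signals, which is the source of the speed advantage noted above.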
Vertical Microcode

A fully vertical microcode encodes the n activation signals with
⌈log2 n⌉ bits to reduce the width of the ROM.
• Several words may be needed for a schedule step.
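Horizontal microwords for the activation signals 1 to 11 (n = 11):
11000101010
00100010101
00010000000
00001000000

Vertical ROM contents (one 4-bit code per active signal, fed to a decoder that drives the activation signals):
Step 1: 0001 0010 0110 1000 1010
Step 2: 0011 0111 1001 1011
Step 3: 0100
Step 4: 0101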
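The conversion from horizontal words to fully vertical code words can be sketched as below. As in the example, the code word of a signal is assumed to be its number in binary; the width is therefore taken as the bit length of n, which equals ⌈log2 n⌉ unless n is a power of two. The function name `vertical_encode` is chosen here.

```python
HORIZONTAL = ["11000101010", "00100010101", "00010000000", "00001000000"]


def vertical_encode(horizontal_words):
    """Return a list of (schedule step, code word) pairs, one pair per
    active signal, so a step with k active signals needs k ROM words."""
    n = len(horizontal_words[0])
    # Assumption: the code of signal i is i in binary (code 0 is unused).
    width = n.bit_length()
    rom = []
    for step, word in enumerate(horizontal_words, start=1):
        for signal, bit in enumerate(word, start=1):
            if bit == "1":
                rom.append((step, format(signal, f"0{width}b")))
    return rom


vertical_rom = vertical_encode(HORIZONTAL)
for step, code in vertical_rom:
    print(step, code)
```

The first schedule step expands into five ROM words here, which illustrates why a fully vertical scheme either lengthens the schedule or has to read several words per step.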
Vertical Microcode Issues

A decoder is needed, which can be implemented by
another ROM to form a two-stage control store.

Operation concurrency may not be fully supported.
Reserve code-words for concurrent operations.
• e.g., using “1100” to denote activation of the first group
of activation signals.

Vertical control schemes can be implemented by:
Lengthening the schedule, or
Reading multiple ROM words in each step.
Both have, however, some disadvantages.

[Figure: the vertical ROM with its eleven 4-bit code words, followed by the decoder that drives the activation signals.]
Microcode Optimization


Goal: find the shortest encoding of the words such that full
concurrency is preserved. This is the microcode compaction
problem, which is intractable.
Microcode compaction can be approached by partitioning the
operations into groups such that at most one operation is active
in each group at a time, so that vertical encoding can be used
within each group.
[Figure: the 11 activation signals of the horizontal microwords are reordered and partitioned into groups (labelled A to E); each group is vertically encoded, and the decoders D1 to D4 regenerate the activation signals.]
Microcode Compaction


To minimize the number of groups.
Construct a conflict graph, where the vertices correspond
to the operations and the edges represent concurrency.
[Figure: a conflict graph on the vertices 1 to 6, shown before and after coloring.]
A minimum coloring of this graph gives the minimum number
of groups needed.
Note: this does not necessarily lead to the minimum number
of word bits (e.g., 10 operations can be divided as 5+5, which
needs 3+3 = 6 bits, or as 7+3, which needs 3+2 = 5 bits).
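The grouping step can be sketched with a greedy graph coloring, applied here to the four horizontal microwords used earlier in the lecture. Greedy coloring is a heuristic swapped in for illustration and does not give a minimum coloring in general. The bit count assumes that each group reserves one extra code for "no signal active"; that reservation and all function names are assumptions made here.

```python
from itertools import combinations

WORDS = ["11000101010", "00100010101", "00010000000", "00001000000"]


def conflict_edges(words):
    """Two signals conflict if they are active in the same microword."""
    edges = set()
    for word in words:
        active = [i for i, bit in enumerate(word, start=1) if bit == "1"]
        edges.update(combinations(active, 2))
    return edges


def greedy_groups(num_signals, edges):
    """Color the conflict graph greedily, highest degree first.
    Each color class is one group of mutually exclusive signals."""
    neighbors = {s: set() for s in range(1, num_signals + 1)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    color = {}
    for s in sorted(neighbors, key=lambda s: (-len(neighbors[s]), s)):
        used = {color[n] for n in neighbors[s] if n in color}
        color[s] = next(c for c in range(num_signals) if c not in used)
    groups = {}
    for s, c in color.items():
        groups.setdefault(c, []).append(s)
    return [sorted(g) for _, g in sorted(groups.items())]


def word_bits(groups):
    """Bits per compacted word: ceil(log2(k + 1)) for a group of k signals,
    where the extra code means that no signal of the group is active."""
    return sum(len(g).bit_length() for g in groups)


groups = greedy_groups(len(WORDS[0]), conflict_edges(WORDS))
print(groups, word_bits(groups))
```

For these microwords the sketch finds five groups and a 10-bit compacted word, compared with 11 bits for the horizontal format.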
Hard-Wired Control Synthesis

Generate a Moore-type finite-state machine from a schedule.
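Microwords for the activation signals 1 to 11, one word per schedule step:
11000101010
00100010101
00010000000
00001000000

Resulting FSM (Reset leads to S1):
State   Active signals
S1      1, 2, 6, 8, 10
S2      3, 7, 9, 11
S3      4
S4      5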
Synthesize the FSM model.
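A minimal sketch of deriving such a Moore machine from the schedule: one state per schedule step, with the active signals as the state's outputs. The transition from the last state back to S1 is an assumption made here (the slide only shows the reset into S1), and the function names are illustrative.

```python
WORDS = ["11000101010", "00100010101", "00010000000", "00001000000"]


def build_moore_fsm(words):
    """Return (reset state, next-state table, output table).
    Outputs depend only on the current state (Moore type)."""
    states = [f"S{k}" for k in range(1, len(words) + 1)]
    outputs = {
        state: [i for i, bit in enumerate(word, start=1) if bit == "1"]
        for state, word in zip(states, words)
    }
    # Assumption: after the last step the machine wraps around to S1.
    next_state = {
        state: states[(k + 1) % len(states)] for k, state in enumerate(states)
    }
    return states[0], next_state, outputs


def simulate(fsm, cycles):
    """Trace (state, active signals) for a number of clock cycles after reset."""
    reset_state, next_state, outputs = fsm
    state = reset_state
    trace = []
    for _ in range(cycles):
        trace.append((state, outputs[state]))
        state = next_state[state]
    return trace


fsm = build_moore_fsm(WORDS)
for state, signals in simulate(fsm, 5):
    print(state, signals)
```

A synthesis tool would go on to encode the states and minimize the next-state and output logic; the sketch stops at the symbolic state table.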
Advanced Issues of HLS

Many-to-many mapping between operations
and physical components.
[Figure: operations (+, x, -) mapped to physical components (Adder, Mult, ALU, Subs); bit-width compatibility is marked as a further consideration.]

Re-use of previous designs (partial structure).

Synthesis with commercially available subsystems, IP-based synthesis.

HLS with testability consideration.
Summary

High-level synthesis is one of the most important design
steps in the design process of electronic systems.

The use of efficient HLS tools has greatly improved
design productivity.

The two most important tasks are scheduling and
allocation/binding, which are interdependent.

Controller design is also an important task, and its
interaction with datapath design should be considered.

The HLS tasks are usually formulated as optimization
problems and heuristic algorithms are used.