iwls05-phy_aware_data_comm

Physically Aware Data Communication Optimization for Hardware Synthesis
Ryan Kastner, Wenrui Gong, Xin Hao, Forrest Brewer
Dept. of Electrical and Computer Engineering, University of California, Santa Barbara
Adam Kaplan, Philip Brisk and Majid Sarrafzadeh
Computer Science Department, University of California, Los Angeles
Hardware Compilation
[Figure: Application specified in high level language → Compiler → HDL (behavioral, structural) → Synthesis and Physical Design → Chip, bitstream, …]
• We focus our efforts on mapping an application written in a high-level language to a hardware description
• We desire this mapping to have optimal characteristics (area, latency, etc.)
• In this talk, we focus on the problem of minimizing data communication in the final hardware
Obligatory Design Flow Slide
[Figure: Application Specification → SUIF: Syntactic & Semantic Analysis → AST → Machine SUIF: Compiler Backend → SSA → CDFG; an example dataflow graph of + and * operations; a CFG entity composed of Entity 1 to Entity 4]
Basic Block Entity
1. Create interface
2. Transform instruction list to dataflow graph
3. Transform dataflow graph to behavioral HDL code
   entity basic_block is
   …
   architecture behavioral of basic_block
   …
4. Synthesize behavioral HDL code to RTL code (Behavioral Synthesis)
CFG Entity
5. Create CFG interface
6. Determine structural control and data communication between basic block entities
7. Generate synthesizable RTL code
   entity cfg is
   …
   architecture behavioral of cfg
   …
8. Synthesize RTL code (Logical & Physical Synthesis)
Characterizing Data Communication
• Examples of data communication schemes
  • Distributed: data communication = wire
  • Centralized: data communication = memory access
[Figure: Distributed scheme with Control Nodes 1 to 4 wired to one another; Centralized scheme with Control Nodes 1 to 4 connected by a Bus to a Memory (Register Bank, RAM)]
Identifying Data Communication
• Determine relationship between place(s) where data is defined and where data is used
[Figure: example CFG whose blocks define and use the variables a, b and c]
• Naïve method: all use-points of a variable depend on all definitions of that variable
  • Not all use points “use” a variable
  • Global Data Communication = 5 variables
• Need analysis to minimize the amount of data communication
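To make the naïve method concrete, here is a minimal sketch that connects every definition of a variable to every use of it. The block names (`B1` to `B4`) and the `defs`/`uses` layout are hypothetical and are not taken from the slide's figure.

```python
def naive_links(defs, uses):
    """Naive scheme: every use of a variable depends on every definition.

    defs and uses map a variable name to the list of blocks that
    define or use it (hypothetical layout, for illustration only).
    """
    links = set()
    for var, def_blocks in defs.items():
        for d in def_blocks:
            for u in uses.get(var, ()):
                if d != u:  # communication inside one block needs no link
                    links.add((d, u, var))
    return links


# Hypothetical example: three variables defined in several blocks.
defs = {"a": ["B1", "B2", "B3"], "b": ["B1", "B2"], "c": ["B2"]}
uses = {"a": ["B4"], "b": ["B3"], "c": ["B3"]}
links = naive_links(defs, uses)
print(len(links))  # number of (definition, use, variable) links
```

A more precise analysis keeps only the links whose definition can actually reach the use, which is what the SSA-based approach in the following slides provides.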
Use of SSA in Compilation
• Must determine relationship between where data is generated and where data is used
• Problem formulations
  • [DAC02]: Minimize the total number of bits communicated between all pairs of control nodes
  • Today: Minimize overall wirelength
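The two formulations can rank the same design differently. The sketch below contrasts them under an assumed cost model: wirelength is taken as bit width times the Manhattan distance between block centers. That model, the function names and the coordinates are assumptions for illustration; the slides do not define the exact wirelength metric.

```python
def total_bits(links):
    """[DAC02]-style objective: bits communicated over all links."""
    return sum(bits for _, _, bits in links)


def weighted_wirelength(links, centers):
    """Assumed wirelength model: bits * Manhattan distance between centers."""
    total = 0
    for src, dst, bits in links:
        (x1, y1), (x2, y2) = centers[src], centers[dst]
        total += bits * (abs(x1 - x2) + abs(y1 - y2))
    return total


# Hypothetical links: (source node, destination node, bit width).
links = [("n1", "n2", 32), ("n1", "n3", 8)]
near = {"n1": (0, 0), "n2": (1, 0), "n3": (5, 5)}  # wide link is short
far = {"n1": (0, 0), "n2": (5, 5), "n3": (1, 0)}   # wide link is long

print(total_bits(links))                 # 40 in both layouts
print(weighted_wirelength(links, near))  # 32*1 + 8*10 = 112
print(weighted_wirelength(links, far))   # 32*10 + 8*1 = 328
```

The bit count is identical for both layouts, while the wirelength differs by almost a factor of three, which is why the optimization needs physical information.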


SSA (Static Single Assignment)
• Changes each variable to have a unique definition point
• Must add Φ-nodes to merge definitions
[Figure: the example CFG in SSA form, with definitions a1, b1, a2, b2, c1 and a3, uses of c1 and b1, and a Φ-node a4 ← Φ(a2, a3) placed before the use of a4]
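As a small illustration of the renaming, the sketch below converts a diamond-shaped CFG (entry, two branches, join) by hand. It is not a general SSA construction: the helper names and the tuple-based statement format are hypothetical, and real compilers place Φ-nodes using dominance frontiers.

```python
def rename_block(stmts, versions, counter):
    """Rename one straight-line block of (target, operands) statements."""
    versions = dict(versions)  # do not disturb the predecessor's view
    out = []
    for target, operands in stmts:
        ops = [versions.get(o, o) for o in operands]
        counter[target] = counter.get(target, 0) + 1
        versions[target] = f"{target}{counter[target]}"
        out.append((versions[target], ops))
    return out, versions


def merge(left, right, counter):
    """At a join, add a phi for each variable whose versions differ."""
    merged, phis = {}, []
    for var in sorted(set(left) & set(right)):
        if left[var] == right[var]:
            merged[var] = left[var]
        else:
            counter[var] += 1
            merged[var] = f"{var}{counter[var]}"
            phis.append((merged[var], "phi", [left[var], right[var]]))
    return phis, merged


counter = {}
entry, v0 = rename_block([("a", []), ("b", [])], {}, counter)
then_blk, v1 = rename_block([("a", ["b"])], v0, counter)
else_blk, v2 = rename_block([("a", ["a"])], v0, counter)
phis, v3 = merge(v1, v2, counter)
print(phis)  # [('a4', 'phi', ['a2', 'a3'])]
```

Only `a` needs a Φ-node at the join, because `b` reaches it with the same version along both paths.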
Physically Aware Compiler Transforms
Let’s Get Physical!
• Consider layout information during compilation
  • Modify transforms to consider physical info
• Ideal: full physical synthesis; extremely accurate, but way too time consuming
• Approximate using floorplanning
  • Much faster
  • Gives “good enough” high level physical picture
• Previous data communication work
  • No physical information
  • Can lead to negative results
[Figure: Hardware Compilation passes the application to Physical Synthesis, which is approximated by a Floorplanner]
Physically Aware Data Communication
• Modify placement of Φ-functions to consider wirelength
Φ-Placement Algorithm
1.  Given a CFG Gcfg(Vcfg, Ecfg)
2.  perform_ssa(Gcfg)
3.  calculate_def_use_chains(Gcfg)
4.  remove_back_edges(Gcfg)
5.  topological_sort(Gcfg)
6.  foreach vertex v ∈ Vcfg
7.    foreach Φ-node φ ∈ v
8.      s ← φ.sources
9.      d ← |def_use_chain(φ.dest)|
10.     IDF ← iterated_dominance_frontier(s)
11.     PossiblePlacements ← findPlacementOptions(IDF)
12.     place(φ) ← selectBest(PossiblePlacements)
13.     distribute/duplicate φ to place(φ)
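Step 10 relies on the iterated dominance frontier of the Φ-node's sources. A minimal sketch of the standard worklist computation follows, assuming the per-block dominance frontiers have already been computed and are passed in as a dictionary. The `df` mapping and the block names are hypothetical.

```python
def iterated_dominance_frontier(df, blocks):
    """Close the dominance frontier of `blocks` under repeated application.

    df maps each block to the set of blocks in its dominance frontier
    (assumed to be precomputed by the compiler).
    """
    result = set()
    worklist = list(blocks)
    while worklist:
        block = worklist.pop()
        for frontier_block in df.get(block, ()):
            if frontier_block not in result:
                result.add(frontier_block)
                worklist.append(frontier_block)
    return result


# Hypothetical diamond: entry -> left/right -> join.
df = {"entry": set(), "left": {"join"}, "right": {"join"}, "join": set()}
print(iterated_dominance_frontier(df, {"left", "right"}))  # {'join'}
```

For a diamond the result is just the join block, which is where classic SSA construction would put the Φ-node; the algorithm above then searches from there for placements with shorter wires.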
FindPlacementOptions Algorithm
1.  Given a set of CFG Nodes R
2.  Φ-options ← ∅
3.  insert(R) into Φ-options
4.  foreach instruction i ∈ R
5.    if (i is a destination of Φ-function f)
6.      return Φ-options
7.  temp_Φ-options ← ∅
8.  foreach non-dominated child c of R
9.    temp_Φ-options ← crossProductJoin(temp_Φ-options, findPlacementOptions(c))
10. return Φ-options ∪ temp_Φ-options
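The recursion above can be sketched as follows. Several details are assumptions rather than the authors' implementation: each placement option is modeled as a set of blocks that would each hold a copy of the Φ-node, `children` is assumed to list only the children the slide calls non-dominated, `uses_phi` marks blocks containing an instruction that consumes the Φ result, and joining with an empty option set is assumed to return the other operand.

```python
from itertools import product


def cross_product_join(left, right):
    """Combine every option in `left` with every option in `right`."""
    if not left:  # assumption: an empty set acts as the identity
        return set(right)
    return {a | b for a, b in product(left, right)}


def find_placement_options(node, children, uses_phi):
    """Return the set of placement options rooted at `node`."""
    options = {frozenset([node])}  # option: keep the phi in this block
    if node in uses_phi:           # the value is consumed here, stop
        return options
    temp = set()
    for child in children.get(node, ()):
        temp = cross_product_join(
            temp, find_placement_options(child, children, uses_phi)
        )
    return options | temp


# Hypothetical region: r branches to a and b; a branches to c and d.
children = {"r": ["a", "b"], "a": ["c", "d"]}
uses_phi = {"b", "c", "d"}
for option in sorted(find_placement_options("r", children, uses_phi), key=sorted):
    print(sorted(option))
```

Each printed option is one way to distribute or duplicate the Φ-node; selectBest in the Φ-placement algorithm would then pick among them.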
Algorithm in Action
• FAST function from MediaBench testsuite
[Figure: CFG region between nodes N3 and N9 with T and F branches, showing the Φ-nodes for nn_4, i_2 and nn_5, i_3 before and after the algorithm changes their placement]
Full Floorplanning Results
• Simple iterative approach
  1. Initial optimization minimizes data communication
  2. Full SA based floorplanning
  3. Reoptimization to minimize wirelength, based on the floorplan
  4. Full SA based floorplanning
• Spectacularly negative results
[Figure: Floorplan Wirelength. Wirelength (logarithmic, 1 to 10000000) per benchmark for WL (first) and WL (second); benchmarks: FAST, FR4TR, det, adpcm_coder, adpcm_decoder, internal_filter, internal_expand, compress_output, Decode_MPEG2_Intra_Block, Decode_MPEG2_Non_Intra_Block, decode_motion_vector]
[Figure: Hardware Compilation iterating with Full Physical Synthesis, approximated by a Floorplanner]
Incremental Floorplanning
• Incremental Placement [Coudert et al]: Given an optimized placement and a set of changes to the netlist (e.g., due to technology remapping), modify the placement to improve it.
• Equally applicable to floorplanning: Given a floorplan and a set of changes to the modules (e.g., due to Φ-function movement), modify the floorplan to improve it.
[Figure: Initial Floorplan of modules 1, 2, 3, 4 and 6; Perturbations produce the Modified Floorplan]
Our Incremental Floorplanner
1. Calculate area & room of each node: bottom up slicing tree traversal
2. Area redistribution
   • Top down traversal
   • Increase area if necessary
     • Not enough space at root
     • Aspect ratios become too distorted
• Simple, yet effective
  • Other more complicated algorithms might work better
[Figure: Initial Floorplan → Perturbations → Modified Floorplan → Incremental Floorplanner → Incremental Floorplan, for modules 1, 2, 3, 4 and 6; slicing tree nodes annotated area/room: 32/36 at the root, then 5/5.6, 27/30.4, 11/12.4, 16/18, 2/2.3 and 9/10.1]
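The two traversals can be sketched as below. The leaf areas (5, 2, 9, 16) and the root room (36) are read off the area/room annotations in the slide's figure; the tree shape, the proportional sharing rule and all names are assumptions inferred from those numbers, not the authors' code. Aspect-ratio handling is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    area: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    room: float = 0.0


def compute_area(node):
    """Bottom-up pass: an internal node's area is the sum of its children."""
    if node.left is not None and node.right is not None:
        node.area = compute_area(node.left) + compute_area(node.right)
    return node.area


def distribute_room(node, room):
    """Top-down pass: share the room among children in proportion to area."""
    node.room = room
    if node.left is not None and node.right is not None:
        total = node.left.area + node.right.area
        distribute_room(node.left, room * node.left.area / total)
        distribute_room(node.right, room * node.right.area / total)


def fit(root, available):
    """Grow the root when the available room cannot hold the modules."""
    needed = compute_area(root)
    distribute_room(root, max(available, needed))
    return root


# Assumed tree shape: root = (5, (( 2, 9 ), 16)).
n11 = Node("n11", left=Node("m_a", 2), right=Node("m_b", 9))
n27 = Node("n27", left=n11, right=Node("m_c", 16))
root = fit(Node("root", left=Node("m_d", 5), right=n27), available=36)
print(root.area, root.room)  # 32 36
print(n27.area, n27.room)    # 27 30.375
```

Under these assumptions the computed rooms (30.375, 12.375, 18, 10.125, 5.625) round to the figure's annotations, which suggests proportional sharing is at least consistent with the slide.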
MediaBench Functions
#   Benchmark        Blocks    Φ  Links  Weight  Initial WL
1   adpcm coder          33   31     54    2688       35568
2   adpcm decoder        26   23     44    1952       21588
3   internal filter      10  143     60   17088      411637
4   internal expand     101   94    257   14336      317031
5   compress output      34   17     60    2368       29114
6   mpeg2dec block       62   13     66    2272       34510
7   mpeg2dec vector      16    4     26    1024        4366
8   FAST                 14    4     15     704        3714
9   FR4TR                77   87    155     704      340697
10  det                  12    5     13    7936        3772
Incremental Floorplanning Results
[Figure: Normalized Wirelength (0 to 1.2) for benchmarks 1 to 10 and their average; series: Initial, Overall Optimal, Overall Incremental, Phi Optimal, Phi Incremental]
• “Optimal” Approach:
  • 12% Overall Wirelength Reduction
  • 25% Phi-node Wirelength Reduction
• Our Approach:
  • 6% Overall Wirelength Reduction
  • 8% Phi-node Wirelength Reduction
Related Work
• Hardware compilation projects using SSA
  • PDG+SSA form [UCSB]
  • CASH [CMU]
  • SA-C [UCR]
  • Sea Cucumber [BYU]
• Physically aware behavioral synthesis techniques
  • SA for scheduling, binding and floorplanning [Prabhakaran97]
  • SA for binding and floorplanning [Yung-Ming94]
  • Scheduling, allocation and binding [Dougherty00]
  • Fasolt: bus topology [Knapp92]
  • High level synthesis [Tarafdar00]
• Incremental CAD
  • Problem overview/challenges [Coudert00]
  • Floorplanning [Crenshaw99]
Conclusions
• It’s been a long strange trip…
• SSA is a nice IR for hardware compilation
  • Explicitly shows data flow
  • Useful for exploiting parallelism
• Compiler techniques applied to hardware design can reduce wirelength
  • They must be aware of physical information
  • They must use incremental floorplanning