Lecture 7

ECE 506
Reconfigurable Computing
Lecture 7
FPGA Placement
Placement
° VLSI Design Flow
• Objective:
- Minimize total chip area
- Sustain a routable circuit within the timing budget
° FPGA Flow
• Area fixed
• Objective:
- Assign LUTs in the netlist to available logic blocks in the
array within utilization and performance constraints
- (Interconnect) Locate functional blocks such that the interconnect
required to route the signals between them is minimized.
• Target Architecture determines the cost function
Placement algorithm
° two basic inputs:
• netlist with functional blocks and
connections between them
• device map (architecture)
° algorithm selects a legal location
for each block such that the
circuit wiring is optimized.
Significance of Placement
° Good placement is extremely important
• sets constraints for routability
• even if the circuit does route, a poor placement will
still lead to a lower maximum operating speed and
increased power consumption.
° Finding a good placement is challenging
• A large commercial FPGA contains over 500,000
functional blocks,
- so there are on the order of 500,000! (factorial) possible placements.
• Exhaustive evaluation is therefore impossible.
• Placement is a computationally hard problem,
- there is no known algorithm that produces optimal results in
practical CPU time.
• Development of fast and effective heuristic
placement algorithms is a critical research area.
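To get a feel for the size of 500,000!, the number of decimal digits can be estimated without building the huge integer. This is a minimal sketch; the helper name `log10_factorial` is made up for illustration.

```python
import math


def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function (lgamma(n + 1) = ln(n!))."""
    return math.lgamma(n + 1) / math.log(10)


if __name__ == "__main__":
    digits = log10_factorial(500_000)
    # The count of decimal digits of n! is floor(log10(n!)) + 1.
    print(f"500,000! has roughly {digits:,.0f} decimal digits")
```

The result is a number with more than 2.6 million digits, which is why exhaustive evaluation is ruled out.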
Device Legality Constraints
° All resources are prefabricated in an FPGA
• leads to a variety of placement legality constraints:
° A legal placement must place a functional block
only in a location on the chip that can
accommodate it.
• RAM block must be placed in a RAM location, and a
lookup table (LUT) must be placed in a LUT location.
° Some groups of functional blocks must be placed
in a specific relative orientation to make use of
special, dedicated routing resources.
• Arithmetic logic cells: to use the dedicated carry-chain hardware, the logic cells forming a carry chain
must be placed adjacent to each other in the
sequence required by the carry structure.
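The two kinds of legality constraint above (site type and relative placement) can be sketched as a simple checker. This is an illustrative sketch only: the data layout, the name `is_legal`, and the assumption that a carry chain runs upward within one column are all hypothetical and not taken from any particular device or tool.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]  # (x, y) coordinate of a site on the device


def is_legal(
    placement: Dict[str, Location],
    block_type: Dict[str, str],
    site_type: Dict[Location, str],
    carry_chains: List[List[str]],
) -> bool:
    """Check site-type compatibility, site exclusivity, and carry-chain adjacency."""
    used = set()
    for block, loc in placement.items():
        if loc in used:
            return False  # two blocks on the same site
        used.add(loc)
        if site_type.get(loc) != block_type[block]:
            return False  # e.g. a RAM block on a LUT site
    # Assumption: each chain must occupy consecutive rows of one column, in order.
    for chain in carry_chains:
        for lower, upper in zip(chain, chain[1:]):
            (x0, y0), (x1, y1) = placement[lower], placement[upper]
            if x1 != x0 or y1 != y0 + 1:
                return False
    return True
```

A move generator could call such a check before accepting a swap, or the cost function could penalize a failed check instead.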
FPGA Placement Constraints
° FPGA interconnect is prefabricated,
• Amount of interconnect in each region of a device is
fixed
° Routing congestion
• Occurs when the interconnect demand approaches or
exceeds the fabricated wiring capacity in some part of
the FPGA.
• A placement that requires more interconnect in a
device region than that region contains cannot be
routed.
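The congestion condition can be sketched as a per-region comparison of demand against capacity. The region names, the track counts, and the 0.9 "approaching capacity" threshold are assumptions chosen for illustration.

```python
from typing import Dict


def congested_regions(
    demand: Dict[str, int], capacity: Dict[str, int], warn_ratio: float = 0.9
) -> Dict[str, str]:
    """Flag regions whose track demand approaches or exceeds the fixed capacity."""
    report = {}
    for region, tracks_needed in demand.items():
        ratio = tracks_needed / capacity[region]
        if ratio > 1.0:
            report[region] = "unroutable"  # needs more wiring than was fabricated
        elif ratio >= warn_ratio:
            report[region] = "congested"  # close to the fabricated limit
    return report
```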
FPGA Placement Constraints
° Stratix-II is an island-style FPGA that contains
routing segments that span 4, 16, and 24 logic
blocks.
• Programmable switches allow routing segments in the
same direction (horizontal or vertical) to be connected
at their endpoints to create longer routes.
• Other programmable switches allow some horizontal
routing segments to connect to vertical routing
segments where they cross and vice versa.
[Figure: island-style routing in the X and Y directions with segments of length 1, 2, and 4]
Placement Objective – Routability Driven
° Create a placement that minimizes the total
interconnect required,
° Increase the probability of successful routing
° Consequently, some routability-driven placement
algorithms minimize not only the total wiring
required by the design but also the amount of
routing congestion.
Placement Objective – Timing Driven
° In addition to optimizing for routability, timing-driven algorithms use timing analysis
• to identify critical paths and/or connections
• to optimize the delay of those connections.
° Most delays in an FPGA are due to the
programmable interconnect
• timing-driven placement can achieve a large
improvement in circuit speed over routability-driven
approaches.
Level of Control on Placement
° Commercial FPGA placement tools allow
designers to control the placement
° Common types of placement directives.
° 1) Exact location of a block
• The most restrictive
• Typical uses
- to lock down the design I/Os at the locations required by the
circuit board or to lock down the elements of a performance-critical intellectual property (IP) core.
° 2) Area specific
• less restrictive
• forces blocks to go into a specific 2D area,
• allows a designer to guide the placement tool
Level of Control on Placement
° 3) Relative location
• specify the relative location of several blocks,
• placement tool chooses exactly where to locate the
block group.
• Typical use
- for library components where a designer knows a good
placement of the component blocks relative to each other.
° 4) Floating region
• specifies that some logic should be placed within a
tight region
• placement tool can choose where that region should
be on the device.
Placement Algorithms
• Constructive methods:
- Begin from netlist and generate an initial placement.
- Partitioning method: Mincut
- First address placement of partitions individually
– Significant amount of reduction in search space
- Then address placement of partitions relative to each
other
- Not suitable for FPGAs
– Especially island style FPGA with limited routing resources
– Method postpones the impact of inter-partition connections
– Leads to increased demand on routing tracks
Placement
• Placement has a set of competing goals.
• Can’t optimize locally and globally simultaneously.
• Use heuristic approaches to evaluate quality.
[Figure: placement example; labels A, B, C, D, E, F, LUT1, LUT2]
Getting Stuck with Local Minima
• pick a random starting point
• repeatedly swap,
• if the new state has a lower cost, it is accepted,
• otherwise the current state is retained.
• greedily accept good moves
• Problem: large number of local minima
• The circuit placed as shown in the figure is in a local minimum.
• No swap of logic or I/O functions will reduce the total
wirelength.
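The greedy procedure described above can be sketched as follows. Bounding-box wirelength is used as the cost here as an assumption, and the names `hpwl` and `greedy_place` are illustrative only.

```python
import random
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def hpwl(nets: List[List[str]], placement: Dict[str, Location]) -> int:
    """Sum of half-perimeter bounding-box wirelengths over all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def greedy_place(nets, placement, attempts=1000, seed=0):
    """Swap random block pairs; keep a swap only if it lowers the cost."""
    rng = random.Random(seed)
    place = dict(placement)
    blocks = sorted(place)
    cost = hpwl(nets, place)
    for _ in range(attempts):
        a, b = rng.sample(blocks, 2)
        place[a], place[b] = place[b], place[a]
        new_cost = hpwl(nets, place)
        if new_cost < cost:
            cost = new_cost  # accept the improving move
        else:
            place[a], place[b] = place[b], place[a]  # undo: retain current state
    return place, cost
```

Because every accepted move must lower the cost, the loop stops improving as soon as no single swap helps, which is exactly the local-minimum trap.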
Technology Mapping to Placement
Mapping onto 5-LUT
Iterative Placement Algorithms
° Iterative improvement
• Begin with random or constructive placement.
• Iterate to improve it.
• Pairwise interchange
• Hill climbing
- To avoid getting trapped in local minima, consider a “hill-climbing” approach
- Need to accept worse solutions or make “bad” moves to reach the
global minimum.
- Acceptance is probabilistic. Only accept cost-increasing
moves some of the time.
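One common way to make acceptance probabilistic is the Metropolis rule, which accepts a cost-increasing move with probability exp(-ΔC / T). The slides do not state the exact rule, so treat this as an assumed, standard choice; the name `accept_move` is illustrative.

```python
import math
import random


def accept_move(delta_cost: float, temperature: float, rng=random) -> bool:
    """Always accept improvements; accept a worse move with probability exp(-dC/T)."""
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature degenerates to greedy acceptance
    return rng.random() < math.exp(-delta_cost / temperature)
```

With `delta_cost` equal to `temperature`, roughly 37% of such bad moves are accepted; the fraction shrinks as the temperature falls.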
Iterative Placement Algorithms
° Methods
• Force-directed methods (classical mechanics)
- Force vector computed on each module corresponding to all
nets
- Solve set of non-linear differential equations.
– FD relaxation
– FD pairwise exchange
• Simulated annealing (statistical mechanics)
- Model a physical annealing process which optimizes
energy.
- Analogous to slowly cooling (annealing) metal, as opposed to rapid “quenching.”
- Generates best results
- Can be time consuming
• Macro-based approaches
- Genetic algorithms
Physical Annealing
• Take a metal and heat to high temperature
• Allow it to cool slowly; metal is annealed to a low
temperature
• Atoms in the metal are at lower energy states after annealing
• The higher the initial temperature and the slower the cooling, the
tougher the metal becomes.
• Atoms transition to high energy states and then move to low
energy.
Simulated Annealing
• Optimization strategy based on physical annealing process
• Generate random moves.
- Initially, accept both cost-decreasing and
cost-increasing moves.
• As temperature decreases, the probability of accepting bad
moves decreases.
• Eventually, default to greedy algorithm
- Only accept cost-decreasing moves
• Determine when to terminate.
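The steps above can be sketched as a small annealing loop. Several choices here are assumptions for illustration: a fixed geometric cooling factor `alpha`, a fixed number of moves per temperature, bounding-box wirelength as the cost, and remembering the best placement seen.

```python
import math
import random


def wirelength(nets, placement):
    """Half-perimeter bounding-box wirelength summed over all nets."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_placement(nets, placement, t_start=10.0, t_end=0.01,
                     alpha=0.9, moves_per_t=50, seed=0):
    """Simulated annealing over pairwise swaps; returns (best placement, best cost)."""
    rng = random.Random(seed)
    place = dict(placement)
    blocks = sorted(place)
    cost = wirelength(nets, place)
    best, best_cost = dict(place), cost
    t = t_start
    while t > t_end:  # terminate once the temperature is low enough
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)  # generate a random move
            place[a], place[b] = place[b], place[a]
            new_cost = wirelength(nets, place)
            delta = new_cost - cost
            # Bad moves are accepted often at high t, rarely at low t.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(place), cost
            else:
                place[a], place[b] = place[b], place[a]  # reject: undo the swap
        t *= alpha  # cool down
    return best, best_cost
```

As `t` approaches `t_end`, `exp(-delta / t)` becomes negligible for any positive `delta`, so the loop behaves like the greedy algorithm.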
Simulated Annealing
Bounding Box and Cost Function
° Bounding box underestimates wirelength
• q(n) is a compensation factor
- q is 1 for 2- and 3-terminal nets
- increases to 2.79 for 50-terminal nets
• Cav is channel capacity (tracks) in x and y directions
over the bounding box of net n
- penalizes placements which require more routing in areas of
the FPGA that have narrower channels.
- However, Cav is constant since channel width is fixed for
island-style FPGAs
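The terms described above match the shape of a bounding-box cost in which each net contributes q(n) times its bounding-box width and height, each divided by the channel capacity raised to a power beta. The formula itself is not in the extracted text, so the form below is an assumption. The straight-line interpolation of q(n) between the two stated values (1 at 3 terminals, 2.79 at 50) is a deliberate simplification for illustration and is not the real compensation table.

```python
from typing import List, Tuple

Pin = Tuple[int, int]


def q_factor(num_terminals: int) -> float:
    """Illustrative q(n): 1.0 up to 3 terminals, 2.79 at 50 or more, linear between."""
    if num_terminals <= 3:
        return 1.0
    if num_terminals >= 50:
        return 2.79  # assumption: clamp beyond 50 terminals
    return 1.0 + (2.79 - 1.0) * (num_terminals - 3) / (50 - 3)


def net_cost(pins: List[Pin], cav_x: float = 1.0, cav_y: float = 1.0,
             beta: float = 1.0) -> float:
    """q(n) * (bb_x / Cav_x**beta + bb_y / Cav_y**beta) for a single net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    bb_x = max(xs) - min(xs)  # bounding-box width (span convention assumed)
    bb_y = max(ys) - min(ys)  # bounding-box height
    return q_factor(len(pins)) * (bb_x / cav_x ** beta + bb_y / cav_y ** beta)


def placement_cost(nets: List[List[Pin]], cav_x: float = 1.0,
                   cav_y: float = 1.0, beta: float = 1.0) -> float:
    """Total cost: sum of the per-net costs."""
    return sum(net_cost(pins, cav_x, cav_y, beta) for pins in nets)
```

With a constant channel capacity, as on an island-style FPGA, the division only rescales every net equally, so the cost reduces to a q-weighted bounding-box wirelength.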
Placement Flow
Wire length measures
° Estimate wire length by distance
between components.
° Possible distance measures (x and y are the horizontal and
vertical separations between two components):
• Euclidean distance (sqrt(x^2 + y^2));
• Manhattan distance (|x| + |y|).
° Multi-point nets must be broken up
into trees for good estimates.
[Figure: Euclidean vs. Manhattan distance]
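The two distance measures can be compared directly. This is a minimal sketch; the function names are illustrative.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def manhattan(a: Point, b: Point) -> float:
    """Distance along horizontal and vertical tracks only."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```

For components at (0, 0) and (3, 4) the Euclidean distance is 5 and the Manhattan distance is 7. The Manhattan figure is closer to what horizontal and vertical routing segments can achieve.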
Weighted Graph -> Distance Table
° Geometric distance is NOT accurate!
° Need a weighted graph
• Edge weights model the cost of routing resources
° Finding the shortest path at each step of
annealing is costly
• Hence the need for a precomputed lookup table
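One way to realize the lookup-table idea is to run a shortest-path search once over the weighted routing graph and index the results by (dx, dy) offset. The sketch below assumes a uniform grid whose edge costs are the same everywhere, so that cost depends only on the offset; the graph model and the function names are hypothetical.

```python
import heapq
from typing import Callable, Dict, Tuple

Node = Tuple[int, int]


def shortest_costs(width: int, height: int,
                   edge_cost: Callable[[Node, Node], float],
                   source: Node = (0, 0)) -> Dict[Node, float]:
    """Dijkstra's algorithm over a width x height grid graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue  # stale heap entry
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height:
                nd = d + edge_cost((x, y), nxt)
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt] = nd
                    heapq.heappush(heap, (nd, nxt))
    return dist


def build_table(width, height, edge_cost):
    """Precompute routing cost for every (dx, dy) offset, once, before annealing."""
    return dict(shortest_costs(width, height, edge_cost))


def lookup(table, a: Node, b: Node) -> float:
    """Constant-time cost estimate during annealing."""
    return table[(abs(a[0] - b[0]), abs(a[1] - b[1]))]
```

During the anneal each cost evaluation then becomes a dictionary lookup instead of a graph search.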
Simulated Annealing – Moves per iteration
Moves_per_iteration = B * N^(4/3)
• N = # of logic blocks and I/O pads
• B = scaling factor
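The move count per temperature can be computed directly from this formula. The default value of the scaling factor used here is an assumption chosen for illustration.

```python
def moves_per_temperature(num_blocks: int, scale: float = 10.0) -> int:
    """Return B * N^(4/3), rounded to the nearest whole number of moves."""
    return round(scale * num_blocks ** (4.0 / 3.0))
```

For a 1,000-block circuit with B = 10 this gives 100,000 moves at each temperature, so the inner loop grows faster than linearly with circuit size.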
Simulated Annealing – Swapping Range
• Swap distance is adjusted
based on the acceptance
rate as well.
• Initially set to entire FPGA
• As T drops, distance drops.
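The slide does not give the update rule. One published rule, from the VPR placer, scales the range limit by (1 - 0.44 + acceptance rate) and clamps it between 1 and the device dimension. The sketch below uses that rule as an assumption.

```python
def update_range_limit(d_old: float, accept_rate: float, max_dim: int,
                       target: float = 0.44) -> float:
    """Shrink the swap range when acceptance is below target, grow it when above."""
    d_new = d_old * (1.0 - target + accept_rate)
    # Never below one block, never larger than the whole FPGA.
    return min(max(d_new, 1.0), float(max_dim))
```

At high temperature almost every move is accepted, so the limit stays pinned at the full device; as acceptance falls below the target, the range contracts toward local swaps.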
Simulated Annealing
• New T depends on the
fraction of attempted moves
that were accepted.
• Reduces rapidly when
acceptance rate is high
• When the temperature is
less than a small fraction of
the average cost of a net, it
is unlikely that any move
that results in a cost
increase will be accepted,
so we terminate the anneal.
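The schedule table is not in the extracted text. The sketch below uses the acceptance-rate thresholds published for the VPR placer as an assumption, together with the 0.005 stopping fraction listed under Annealing Criteria.

```python
def next_temperature(t: float, accept_rate: float) -> float:
    """Pick the cooling factor from the fraction of accepted moves."""
    if accept_rate > 0.96:
        alpha = 0.5   # nearly everything accepted: cool quickly
    elif accept_rate > 0.8:
        alpha = 0.9
    elif accept_rate > 0.15:
        alpha = 0.95  # most productive range: cool slowly
    else:
        alpha = 0.8   # few moves accepted: finish up
    return alpha * t


def should_stop(t: float, cost: float, num_nets: int,
                epsilon: float = 0.005) -> bool:
    """Stop when T falls below a small fraction of the average cost per net."""
    return t < epsilon * cost / num_nets
```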
Annealing Criteria
• Contemporary FPGA packages use the following
parameters:
1. Starting temp – 20 * std_dev(cost of N swaps)
2. Cost function – weighted sum of wire length and
delay
3. Inner loop – B * N^(4/3)
• Beta cost function
4. Stopping criteria –
• T < 0.005 * Cost / N_nets
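The starting-temperature rule can be sketched by performing N random swaps, recording the cost after each, and scaling the standard deviation by 20. The function name, the swap-based move, and the use of the population standard deviation are assumptions for illustration.

```python
import random
import statistics
from typing import Callable, Dict, Tuple

Location = Tuple[int, int]


def initial_temperature(cost_fn: Callable[[Dict[str, Location]], float],
                        placement: Dict[str, Location],
                        num_swaps: int, seed: int = 0) -> float:
    """Return 20 * std_dev of the costs seen over num_swaps random swaps."""
    rng = random.Random(seed)
    place = dict(placement)  # work on a copy; the caller's placement is untouched
    blocks = sorted(place)
    costs = []
    for _ in range(num_swaps):
        a, b = rng.sample(blocks, 2)
        place[a], place[b] = place[b], place[a]  # every swap is kept
        costs.append(cost_fn(place))
    return 20.0 * statistics.pstdev(costs)
```

A large spread in cost across random swaps yields a high starting temperature, so that almost any move is accepted at the beginning of the anneal.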
Strengths of SA making it suitable for FPGA
° Can enforce all the legality constraints imposed
by the FPGA architecture fairly directly
• By forbidding the creation of illegal placements in the
move generator
• By adding a penalty cost to illegal placements.
° Can directly model the impact of the FPGA
routing architecture on circuit delay and routing
congestion
• By creating an appropriate cost function