ECE 506 Reconfigurable Computing
Lecture 7: FPGA Placement

Placement
° VLSI Design Flow
  • Objective:
    - Minimize total chip area
    - Sustain a routable circuit within the timing budget
° FPGA Flow
  • Area is fixed
  • Objective:
    - Assign LUTs in the netlist to available logic blocks in the array within utilization and performance constraints
    - (Interconnect) Locate functional blocks such that the interconnect required to route the signals between them is minimized.
  • Target architecture determines the cost function

Placement algorithm
° Two basic inputs:
  • netlist with functional blocks and the connections between them
  • device map (architecture)
° The algorithm selects a legal location for each block such that the circuit wiring is optimized.

Significance of Placement
° Good placement is extremely important
  • sets constraints for routability
  • even if the circuit does route, a poor placement will still lead to a lower maximum operating speed and increased power consumption.
° Finding a good placement is challenging
  • A large commercial FPGA contains over 500,000 functional blocks
    - on the order of 500,000! (factorial) possible placements.
  • Exhaustive evaluation is therefore impossible.
  • Placement is a computationally hard problem
    - no known algorithm produces optimal results in practical CPU time.
  • Development of fast and effective heuristic placement algorithms is a critical research area.

Device Legality Constraints
° All resources are prefabricated in an FPGA
  • leads to a variety of placement legality constraints
° A legal placement must place a functional block only in a location on the chip that can accommodate it.
  • A RAM block must be placed in a RAM location, and a lookup table (LUT) must be placed in a LUT location.
° Some groups of functional blocks must be placed in a specific relative orientation to make use of special, dedicated routing resources.
  • Arithmetic logic cells: to use the dedicated carry-chain hardware, the logic cells forming a carry chain must be placed adjacent to each other in the sequence required by the carry structure.

FPGA Placement Constraints
° FPGA interconnect is prefabricated
  • The amount of interconnect in each region of a device is fixed
° Routing congestion
  • Occurs when the interconnect demand approaches or exceeds the fabricated wiring capacity in some part of the FPGA.
  • A placement that requires more interconnect in a device region than that region contains cannot be routed.

FPGA Placement Constraints (cont.)
° Stratix-II is an island-style FPGA that contains routing segments that span 4, 16, and 24 logic blocks.
  • Programmable switches allow routing segments in the same direction (horizontal or vertical) to be connected at their endpoints to create longer routes.
  • Other programmable switches allow some horizontal routing segments to connect to vertical routing segments where they cross, and vice versa.
[Figure: routing segments of length 1, 2, and 4 in the X and Y directions]

Placement Objective: Routability Driven
° Create a placement that minimizes the total interconnect required
° Increase the probability of successful routing
° Consequently, some routability-driven placement algorithms minimize not only the total wiring required by the design but also the amount of routing congestion.

Placement Objective: Timing Driven
° In addition to optimizing for routability, timing-driven algorithms use timing analysis
  • to identify critical paths and/or connections
  • to optimize the delay of those connections.
° Most delays in an FPGA are due to the programmable interconnect
  • timing-driven placement can achieve a large improvement in circuit speed over routability-driven approaches.

Level of Control on Placement
° Commercial FPGA placement tools allow designers to control the placement
° Common types of placement directives:
° 1) Exact location of a block
  • The most restrictive
  • Typical uses: to lock down the design I/Os at the locations required by the circuit board, or to lock down the elements of a performance-critical intellectual property (IP) core.
° 2) Area specific
  • less restrictive
  • forces blocks to go into a specific 2D area
  • allows a designer to guide the placement tool

Level of Control on Placement (cont.)
° 3) Relative location
  • specifies the relative location of several blocks
  • the placement tool chooses exactly where to locate the block group.
  • Typical use: library components where a designer knows a good placement of the component blocks relative to each other.
° 4) Floating region
  • specifies that some logic should be placed within a tight region
  • the placement tool can choose where that region should be on the device.

Placement Algorithms
• Constructive methods:
  - Begin from the netlist and generate an initial placement.
  - Partitioning method: min-cut
    - First address the placement of each partition individually
      - Significant reduction in search space
    - Then address the placement of partitions relative to each other
    - Not suitable for FPGAs
      - Especially island-style FPGAs with limited routing resources
      - The method postpones the impact of inter-partition connections
      - Leads to increased demand on routing tracks

Placement
• Placement has a set of competing goals.
• Can’t optimize locally and globally simultaneously.
• Use heuristic approaches to evaluate quality.
[Figure: example with signals A-F, LUT1, and LUT2]

Getting Stuck with Local Minima
• Pick a random starting point
• Repeatedly swap:
  - if the new state has a lower cost, it is accepted;
  - otherwise the current state is retained.
• In other words, greedily accept good moves
• Problem: large number of local minima
  - The circuit placed as shown at left is in a local minimum.
  - No swap of logic or I/O functions will reduce the total wirelength.
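
The greedy procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular tool: the function names `wirelength` and `greedy_place`, the dictionary-based placement (block name to (x, y) location), and the half-perimeter bounding-box cost are all choices made for the example.

```python
import itertools

def wirelength(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets (assumed cost)."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def greedy_place(placement, nets):
    """Apply improving pairwise swaps until no swap lowers the cost."""
    placement = dict(placement)  # work on a copy
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(sorted(placement), 2):
            before = wirelength(placement, nets)
            placement[a], placement[b] = placement[b], placement[a]
            if wirelength(placement, nets) < before:
                improved = True  # lower cost: accept the swap
            else:
                # not better: retain the current state by undoing the swap
                placement[a], placement[b] = placement[b], placement[a]
    return placement

# Hypothetical example: four blocks in one row, connected as a chain.
nets = [("A", "B"), ("B", "C"), ("C", "D")]
start = {"A": (0, 0), "C": (1, 0), "B": (2, 0), "D": (3, 0)}
final = greedy_place(start, nets)
```

The loop stops as soon as no single swap reduces the cost, so the result is only guaranteed to be a local minimum. On larger designs that stopping point can be far from the best placement, which is the problem the slide describes.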

Technology Mapping to Placement
° Mapping onto 5-LUT
[Figure: technology mapping followed by placement]

Iterative Placement Algorithms
° Iterative improvement
  • Begin with a random or constructive placement.
  • Iterate to improve it.
  • Pairwise interchange
  • Hill climbing
    - To avoid getting trapped in local minima, consider a “hill-climbing” approach
    - Need to accept worse solutions, i.e. make “bad” moves, to reach the global minimum.
    - Acceptance is probabilistic: only accept cost-increasing moves some of the time.

Iterative Placement Algorithms (cont.)
° Methods
  • Force-directed methods (classical mechanics)
    - Force vector computed on each module corresponding to all nets
    - Solve a set of non-linear differential equations.
      - FD relaxation
      - FD pairwise exchange
  • Simulated annealing (statistical mechanics)
    - Models a physical annealing process which optimizes energy.
    - Analogous to slowly cooling a heated metal (cooling too quickly, as in “quenching”, freezes in a poor state).
    - Generates the best results
    - Can be time consuming
  • Macro-based approaches
    - Genetic algorithms

Physical Annealing
• Take a metal and heat it to a high temperature
• Allow it to cool slowly; the metal is annealed to a low temperature
• Atoms in the metal are at lower energy states after annealing
• The higher the initial temperature and the slower the cooling, the tougher the metal becomes.
• Atoms transition to high energy states and then move to low energy states.

Simulated Annealing
• Optimization strategy based on the physical annealing process
• Generate random moves.
  - Initially, accept both moves that decrease cost and moves that increase cost.
• As temperature decreases, the probability of accepting bad moves decreases.
• Eventually, default to the greedy algorithm
  - Only accept moves that reduce cost
• Determine when to terminate.
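
The slide does not give the acceptance rule itself. A common choice, assumed here, is the Metropolis criterion: a cost-increasing move with cost change ΔC is accepted with probability exp(-ΔC / T). The sketch below shows how that one rule produces the behaviour listed above. The function name `accept_move` and its parameters are hypothetical.

```python
import math
import random

def accept_move(delta_cost, temperature, rng=random):
    """Metropolis acceptance test (an assumed rule, not stated on the slide).

    delta_cost  -- new cost minus old cost for the proposed move
    temperature -- current annealing temperature T
    rng         -- any object with a random() method returning [0, 1)
    """
    if delta_cost <= 0:
        return True   # moves that do not increase cost are always accepted
    if temperature <= 0:
        return False  # at T = 0 the rule degenerates to the greedy algorithm
    # Cost-increasing moves are accepted only some of the time;
    # the probability shrinks as T drops or as the move gets worse.
    return rng.random() < math.exp(-delta_cost / temperature)

# Count how often a move with delta_cost = 1 is accepted at two temperatures.
rng = random.Random(0)
accepted_hot = sum(accept_move(1.0, 100.0, rng) for _ in range(1000))
accepted_cold = sum(accept_move(1.0, 0.01, rng) for _ in range(1000))
```

At the high temperature nearly every bad move is accepted, while at the low temperature essentially none is, so the same loop moves from random exploration to greedy refinement as T falls.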

Simulated Annealing: Bounding Box and Cost Function
° Bounding box underestimates wirelength
  • q(n) is a compensation factor
    - q is 1 for 2- and 3-terminal nets and increases to 2.79 for 50-terminal nets
  • C_av is the average channel capacity (tracks) in the x and y directions over the bounding box of net n
    - penalizes placements which require more routing in areas of the FPGA that have narrower channels.
    - However, C_av is constant since the channel width is fixed for an island-style FPGA

Placement Flow
[Figure: placement flow]

Wire length measures
° Estimate wire length by the distance between components.
° Possible distance measures, for horizontal and vertical separations x and y:
  • Euclidean distance: $\sqrt{x^2 + y^2}$
  • Manhattan distance: $x + y$
° Multi-point nets must be broken up into trees for good estimates.
[Figure: Euclidean vs. Manhattan distance]

Weighted Graph -> Distance Table
° Geometric distance is NOT accurate!
° Need a weighted graph
  • Cost of routing resources
° Finding the shortest path at each step of annealing is costly
  • Need for a lookup table

Simulated Annealing: Moves per iteration
  Moves_per_iteration = $B \cdot N^{4/3}$
• N = number of logic blocks and I/O pads
• B = scaling factor

Simulated Annealing: Swapping Range
• Swap distance is adjusted based on the acceptance rate as well.
• Initially set to the entire FPGA
• As T drops, the distance drops.

Simulated Annealing (cont.)
• The new T depends on the fraction of attempted moves that were accepted.
• T is reduced rapidly when the acceptance rate is high
• When the temperature is less than a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.

Annealing Criteria
• Contemporary FPGA packages use the following parameters:
  1. Starting temperature: 20 * std_dev(cost of N swaps)
  2. Cost function: weighted sum of wire length and delay
  3. Inner loop: $B \cdot N^{4/3}$ moves
     • Beta cost function
  4. Stopping criterion: $T < 0.005 \cdot \mathrm{Cost} / N_{nets}$

Strengths of SA making it suitable for FPGA
° Can enforce all the legality constraints imposed by the FPGA architecture fairly directly
  • By forbidding the creation of illegal placements in the move generator
  • By adding a penalty cost to illegal placements.
° Can directly model the impact of the FPGA routing architecture on circuit delay and routing congestion
  • By creating an appropriate cost function
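
The schedule parameters listed under Annealing Criteria translate directly into small helper functions. The sketch below is an illustration only: the function names are hypothetical, the use of the sample standard deviation is an assumption (the slides do not say which estimator is meant), and the temperature update rule is left out because the slides describe it only qualitatively.

```python
import statistics

def initial_temperature(swap_costs):
    """Starting temperature: 20 x standard deviation of the costs seen
    over N trial swaps (sample standard deviation is an assumption)."""
    return 20.0 * statistics.stdev(swap_costs)

def moves_per_temperature(num_blocks, b):
    """Inner-loop length B * N^(4/3), where N counts logic blocks and
    I/O pads and B is the scaling factor."""
    return round(b * num_blocks ** (4.0 / 3.0))

def should_stop(temperature, cost, num_nets, fraction=0.005):
    """Stopping criterion: T < 0.005 * Cost / N_nets."""
    return temperature < fraction * cost / num_nets

# Hypothetical numbers, chosen only to exercise the functions.
t0 = initial_temperature([1, 2, 3, 4, 5])
inner = moves_per_temperature(8, b=2.0)
done = should_stop(temperature=0.001, cost=100.0, num_nets=50)
```

Because the inner loop grows as N^(4/3), doubling the number of blocks multiplies the moves per temperature by about 2.5, which is why B is the usual knob for trading placement quality against run time.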