11_T_Placement

Temporal Placement Wasted Resources • • Wasted resource wr(vi) of a node vi:  Unused area occupied by the node vi during the computation of a partition. wr(vi) = (t(Pi)−ti)×ai, t(Pi): run-time of partition Pi. ti: run-time of the component vi ai: area of vi Wasted resource avoidance:  A component is placed on the chip only when its computation is required  and remains on the device only for time it is active. −  Idle components can be replaced by new ones. • Assumptions:  There is partial reconfiguration support.  Blocks are hard − For soft library modules, temporal partitioning + 2d placement may suffice. 2 Temporal Placement • • Temporal placement:  Time-dependent placement  Management of task execution at run-time Graphical Representation:  Z axis: life time of a configuration.  Some overlaps (resource sharing) in 2-D is allowed: − If not at the same time.  A horizontal cut: RFU configuration at a given time 3 Temporal Placement • Off-line (compile time)  Pre-defined placement sequence − In DSP applications, program flow can be predicted. • On-line (during execution)  Computation sequence is not known in advance. −  Dynamically at run time.  Needs to solve computational intensive problems in a fraction of millisecond: − Efficient management of the free space − Selection of the best site for a new component − Management of communication 4 Off-Line Temporal Placement • Off-line temporal placement:  Given a DFG G = (V,E) and a reconfigurable device H (with length Hx and width Hy), a temporal placement is a 3-dimensional vector function: p = (px, py, pt) : V → R3, pt defines a feasible schedule (pt(vi): Starting time of vi.) px(vi), py(vi): Coordinates of vi on H, 5 Off-Line Temporal Placement [Bazargan] 6 Off-Line Temporal Placement • Algorithms: 1. First fit 2. Best fit 3. Packing 4. Simulated annealing (KAMER) 7 First/Best Fit Algorithm • First fit: 1. Select a node among ready nodes. − i.e. nodes whose predecessors have been placed • • 2. Place it in the first free location. Advantages:  Fast (linear w.r.t. number of free locations) Disadvantages:  Unused resources very high  Fragmentation 8 First/Best Fit Algorithm • • • Best fit: 1. Select a node among ready nodes. 2. Place it in the best free location. Advantages:  Better area utilization Disadvantages:  Much slower: CLBs Frames − Complete list of free locations must be searched. •  Fragmentation Simplifying the problem:  Restricting the modules to columns (Virtex II) or part of a column (Virtex 4 / Virtex 5 / Virtex 6). 9 First Fit Algorithm • Algorithm: First-fit temporal placement of clusters 1. While all the clusters are not placed do 1.1. Select one cluster Cact from the list of ready-to-run clusters 1.2. From the clusters already placed, select the cluster Ctop with the smallest finishing-time such that Ctop ≤ Cact 1.3. Place the cluster Cact on top of Ctop 2. end while 10 First Fit Algorithm • Configuration sequence produced: a) {0, 1, 2, 3, 4} b) {5, 1, 2, 3, 4} c) {5, 1 , 6, 7, 4} d) {5, 1 , 6, 7, 8} e) {5, 9 , 6, 7, 8} f) {10, 9 , 6, 7, 8} 11 ILP • ILP:  Can construct a constraint set  An objective function • Advantage:  Exact solution • Limitation:  Only for small size problems. 12 Packing Approach • Variations of Packing Problem: 1. Base Minimization Problem (BMP):  Given a set of boxes B and a height H, find a container with minimal size (x, y, H) that can accommodate the set of boxes B  i.e. in temporal placement: − 2. Strip Packing Problem (SPP):  Given a set of boxes and a base (X, Y), find the minimum height h container with size (X, Y, h) that can hold all the boxes.  i.e. in temporal − • Find the device with minimum size (x, y) on which a set of components can be implemented given an overall run-time t = H. Find the minimum run-time t = h for a set of components given a reconfigurable device with size (X, Y). The device is usually fixed   SPP is more common. 13 KAMER KAMER Model of an RCS • Larger picture:  An RFU operation ri can be either accepted or rejected − based on availability of RFU real-estate  If rejected, run on CPU −  running time penalty • Assumption:  No communications among RFUs  After running an operation on RFU, results are saved in CPU registers 15 KAMER Model of an RCS • On-line temporal placement • RFUOPS = {r1, …, rn| ri = (wi, hi, si, ei)}  ei - si : time span the operation is resident in the system • Placement of RFUOPS:  Modeled as 3D template placement • Empty Rectangle (ER):  A rectangle that does not overlap a placed module on the chip • Maximum Empty Rectangle (MER):  An ER not included in any other rectangle than the device bounding box 16 Example • Non-ER:  (E,F) • ER:  (E,D): MER  (A,D): MER  (E,C): not MER • KAMER method:  keeping all maximum empty rectangles: − Permanently keeps track of all MERs − Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v.  Possible to have many MERs in which v can fit   Strategies: first-fit or best-fit 17 KAMER  Once a rectangle is chosen: − Candidate points are those that do not allow an overlap with the external part of the rectangle. • Problem:  Number of empty rectangles does not grow linear with the number of components included 18 KAMER  Run-time of free rectangle placement: O(n2) • Heuristic:  Keep only the non-overlapping empty rectangles −  Linear time, lower quality • Problem 1:  Several non-overlapping rectangle representations 19 Keeping Non-Overlapping ERs • Problem 2:  Non-overlapping empty rectangles are not necessarily maximal −  A module may exists that could fit onto the device, but cannot be placed Can’t fit Can fit (bad decomposition) 20 Keeping Non-Overlapping ERs • Problem 2: Can’t fit Can fit 21 Keeping Non-Overlapping ERs • Incremental update:  Whenever a new component v1 is placed in a rectangle, two possibilities: − Horizontal splitting − Vertical splitting  Negative impact on next components 22 Keeping Non-Overlapping ERs • Horizontal Can’t fit • Vertical •Solution: Delaying the split decision for a number of steps later Can fit 23 Keeping Non-Overlapping ERs  KAMER always finds a rectangle to place a new component, if one exists.  But position to place the component within the rectangle must be selected from a set of points − whose number is the area of the rectangle in worst case • In [Bazargan00]:  Bottom-left position − Note: No communication is assumed between the rectangle to be placed and those already placed. • If communication exists:  An optimal algorithm should consider any single point • Solution:  [Ahmadinia04] 24 KAMER Off-Line Placement • Uses simulated annealing in a 3D space: 1. Places X% largest modules − X is a parameter of the algorithm − Idea: Remove small modules −  Open space for larger ones −  Small modules fragment more  Initial placement: from KAMER- best fit  Perturb: − Can displace a module on the RFU − No change in start or end time − Can move an operation from CPU to RFU and vice versa 2. Use zero-temperature SA for fine tuning  Place as many (100-X)% modules as it can 25 References  [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications,” Springer, 2007.  [Hauck08] S. Hauck, A. DeHon, "Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation" MorganKaufmann, 2007  [Mehdipour06] F. Mehdipour*, M. Saheb Zamani, M. Sedighi, “An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems,” Journal of Microprocessors and Microsystems, Elsevier, v30, 2006, pp. 52– 62.  [Bazargan00] K. Bazargan, R. Kastner and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000.  [Bazargan] Bazargan, “Fast Placement Methods for Reconfigurable Computing Systems,” http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/  [Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, “A new approach for on-line placement on reconfigurable devices,” in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004. 35

11_T_Placement

Related documents

Products

Support

11_T_Placement

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib