Component Based Design using Constraint Programming for Module Placement on FPGAs Alexander Wold, Dirk Koch, Jim Torresen Department of Informatics, University of Oslo, Norway Email: {alexawo,koch,jimtoer}@ifi.uio.no Abstract—Constraint satisfaction modeling is both an efficient, and an elegant approach to model and solve many real world problems. In this paper, we present a constraint solver targeting module placement in static and partial run-time reconfigurable systems. We use the constraint solver to compute feasible placement positions. Our placement model incorporates communication, implementation variants and device configuration granularity. In addition, we model heterogeneous resources such as embedded memory, multipliers and logic. Furthermore, we take into account that logic resources consist of different types including logic only LUTs, arithmetic LUTs with carry chains, and LUTs with distributed memory. Our work targets state of the art field-programmable gate arrays (FPGAs) in both design-time and run-time applications. In order to evaluate our placement model and module placer implementation, we have implemented a repository containing 200 fully functional, placed and routed relocatable modules. The modules are used to implement complete systems. This validates the feasibility of both the model and the module placer. Furthermore, we present simulated results for run-time applications, and compare this to other state of the art research. In run-time applications, the results point to improved resource utilization. This is a result of using a finer tile grid and complex module shapes. Keywords—Constraint programming, FPGA, relocatable modules, reconfigurable architectures, partial run-time reconfiguration. I. I NTRODUCTION Run-time reconfigurable systems have been an active research field since the first FPGAs were introduced. At that time, research into run-time reconfiguration was motivated by the fact that FPGA devices were too small to fit many designs [1]. However, FPGA vendors have been quick to exploit advances in semiconductor process technology. This has allowed larger device sizes implementing more complex designs. The methodology used to design FPGA-based systems however, has not seen the same advances. On large devices, implementation of systems with current FPGA tools can take significant time. For example, to place and route complex systems implemented on large devices can take several days [2]. A component-based design flow has been proposed to address long place and route times [3]. Here, fully placed and routed modules are used to implement the system. In addition, using a component-based design flow to implement run-time reconfigurable systems can yield improvements in terms of area and power [4]. Placement positions are computed either at design time (i.e. offline) or at run-time, depending on the application. The work presented in this paper targets static componentbased designs and run-time reconfigurable systems. The contribution of this paper is the improvement in modeling module placement with realistic real world constraints. In particular, we improve upon previous work in the field by taking into account that logic resources are heterogeneous. In Xilinx FPGAs, there are different types of logic resources: logic only look up tables (LUTs), arithmetic LUTs with carry chains, and LUTs with distributed memory. In addition, as in previous work in the area [5], we model other heterogeneous resources such as embedded memory, multipliers and logic. Optimal packing of shapes into a bin is a NP-hard problem [6]. However, given the constraints present in modern FPGA devices, the solution space is reduced. Therefore, in this paper, we investigate the scalability of using constraint programming to place modules given real world constraints. Module placement is constrained by heterogeneous resources, configuration granularity, communication and available free area. The solution space is also determined by the device size, number of modules, and module layout variants. With the given constraints, modules can no longer be freely placed on the reconfigurable device. This is exploited in our implementation to compute feasible module placements meeting all constraints. The computed module locations are used to implement complete systems. In this work, we have extended our previous model [17] and improved the implementation in the following areas: • Heterogeneous logic resources (in addition to other heterogeneous resources such as memory and multipliers) are taken into consideration when placing modules. • Bus-based and module-to-module communication can be modeled. • Configuration aware placement based on the addressing granularity of the FPGA. 978-1-4673-6180-4/13/$31.00 ©2013 IEEE Table I. OVERVIEW OF RELATED WORK Authors Design-/ Run-time Placement Style Module layout Implementation variants Heterogeneous resources Communication Configuration granularity Bazargan and Sarrafzadeh [7] Both 2D Rectangular No No No No Fekete, Kohler and Teich [8] Design time 2D Rectangular No No No No Yuh, Yang and Chang [9] Design time 2D Rectangular No No Yes No Singhal and Bozorgzadeh [10] Design time 2D Rectangular No No No Yes Singhal and Bozorgzadeh [5] Design time 2D Rectangular No Yes No No Monotone, Sciuto and Santambrogio [11] Design time 2D Rectangular No Yes Yes Yes Belaid, Muller and Benjemaa [12] Design time 2D Rectangular No Yes No No Koester, Porrmann and Kalte [13] Run-time 1D/2D Rectangular No Yes No NA for 1D Koester, Luk, Kalte, Hagemeyer and Porrmann [14] Run-time 2D Rectangular Yes Yes Yes Yes Angermeier, Sibirko Wildermann and Teich [15] Run-time 2D Rectangular No Yes Yes Yes Feng and Mehta [16] Design time 2D Arbitrary No Yes No No Wold, Koch and Torresen [17] Design time 2D Arbitrary Yes Yes No No This work Both 2D Arbitrary Yes Yes Yes Yes The remainder of the paper is organized as follows: background and related work are presented in the following section. Section III describes the placement model. In Section IV, we present the module placer implementation. This is followed by experimental results in Section V, and a conclusion in Section VI. II. BACKGROUND AND RELATED WORK A large amount of previous research has been undertaken in the area of module placement targeting reconfigurable devices. An overview of related work is presented in Table I. The different approaches can be classified by their properties as presented in the following sections: A. Module Placement Algorithms In [7]–[10], module placement is approached as a fully flexible 2D packing problem. As stated in the introduction, finding the optimal solution is known to be a NP-hard problem [6]. Heuristics developed for module placement are typically the same as that of generic 2D bin packing problems. A survey on placement algorithms is presented in [18]. Besides heuristics, exact placement techniques and combinations of exact techniques and heuristics have been proposed. This includes ILP (integer linear programming) [19] and constraint programming. B. Module Implementation Variants In [16], floorplanning for static only systems without presynthesized and implemented modules were examined. This achieves higher resource utilization compared to models considering presynthesized and implemented modules. This is a consequence of fragmentation that occurs when fitting functions in module bounding boxes. One method to increase resource utilization when placing presynthesized and implemented modules is to use implementation variants [14]. An implementation variant can be a different layout (i.e. a different shape of the module) or a different design with the same functionality. Implementation variants improve the average resource utilization as modules can be placed with less fragmentation. However, implementation variants increase the search space. This results in longer execution time for the placer as reported in [17]. For online module placement, implementation variants are selected at run-time. Storing several module variants requires more memory in the embedded system. However, not all possible variants that are available at design time will be used by the actual system at run-time. Consequently, only selected modules have to be stored in the system. Table II. Type SliceM SliceL SliceX Logic Yes Yes Yes Table III. S LICES ON S PARTAN -6 X ILINX FPGA S Carry chains Distributed Memory Yes Yes No CLBXL Yes No No CLBXM SLICEX Shift Register LUT Yes No No LUT FF LUT Type Logic Carry chains Distributed Memory Shift Register SliceM SliceL Yes Yes Yes Yes Yes No Yes No C. Heterogeneous Resources As observed by Montone et al. [11], when the placement is more constrained (i.e. more realistically modeled), the results appear less impressive. For example, homogeneous models [7]–[10] typically achieve better packing results than heterogeneous models [5], [11]–[17]. However, most FPGAs today are heterogeneous devices. These devices have different resources which are limiting the placement flexibility. Several different methods have been proposed for heterogeneous module placement. Koester [13], [14], takes heterogeneous resources into account by determining the likelihood for placement positions on an FPGA. By assigning a higher cost to likely placement positions, the overall module reject rate is improved in a run-time reconfigurable system. All related publications considering heterogeneous resources, consider logic primitives as a single resource type. However, on Xilinx FPGAs, there are different types of logic primitives, called configurable logic blocks (CLBs). For example, on Spartan-6 FPGAs, a CLB consists of two slices [20], [21]. There are three different types of slices, SliceM, SliceL and SliceX. Table II and Table III show the different Slices in Spartan-6 and Virtex-6 devices. SliceM can be used as logic, distributed memory, or as a shift register. In addition, SliceM has carry chains for arithmetic operations. SliceL can be used as logic, and has carry chains. SliceX does not have carry chains. A CLB (on a Spartan-6 device) contains one SliceX and one SliceM, or SliceL, as depicted in Figure 1. A SliceM can act as a SliceL and SliceX. A SliceL can act as a SliceX. Therefore, we can relocate modules using SliceL to SliceM positions. The opposite is however, not possible. Therefore, we have refined our model to take the different logic primitives available on recent FPGAs into account. D. Communication Infrastructure For run-time reconfigurable systems, different communication architectures have been proposed [15], [22]–[24]. This includes buses, network on chip (NoC), and point-topoint connections. The main difference between a traditional approach and a communication architecture for componentbased design (and for partial reconfiguration), is to provide compatible interfaces to different modules. This is particularly FF FF FF SLICEL S LICES ON V IRTEX -6 X ILINX FPGA S SLICEX SLICEM LUT RAM SRL FF Carry Logic FF Carry Logic FF FF Figure 1. Two CLBs consisting of SliceX and SliceL, and SliceX and SliceM, respectively. challenging with relocatable modules to be connected when placed in different locations, as modeled in this work. Systems providing regularly structured communication architectures have been presented in [15], [23], [24]. In these systems, the communication architectures are built from preallocated resources that provide a compatible interface for several tiles within a reconfigurable region. In this work, we use GoAhead [25] to implement the communication infrastructure. FPGA vendor tools (e.g., Xilinx PlanAhead) do not target systems with many relocatable modules. However, both OpenPR [26] and GoAhead [25] target such systems. The latter one can generate regularly structured communication architectures. In addition, GoAhead allows very fine grained tiling of the reconfigurable resources. Moreover, it can be used to generate complex shaped modules as depicted in Figure 2. This reduces fragmentation and allows implementation variants. E. Configuration Granularity In run-time reconfigurable systems, routing modules online is infeasible. This is because of long execution time and memory requirements incurring run-time overhead. This is commonly implemented by tiling a reconfigurable area and using a communication architecture, as described in the previous section. In this case, no additional routing steps are needed after placing a module. This implies that modules are not freely relocatable (i.e. in terms of CLBs on Xilinx FPGAs), but relocatable on a coarser grained tile grid. When defining a tile grid for recent Xilinx FPGAs, the grid should take into account that the smallest addressable reconfigurable unit is a configuration frame. A configuration frame contains the configuration data for a set of vertically aligned resources (e.g. 16 CLBs, 4 BRAMs, or 4 DSP blocks on Spartan-6 FPGAs). The configuration frame is atomically updated (i.e. the whole frame is activated in one operation). In order to prevent hazards due to reconfiguration, two modules must not share the same frame. As a consequence, the tile grid should be coarsely grained in vertical direction with respect to the frame granularity. In horizontal direction, a fine tile grid is preferable, as this reduces internal fragmentation. The tile grid is applicable to 2D floorplanners where dynamic partial reconfiguration is modeled [5], [7]–[12]. Even when considering the very fine grained configuration capabilities of Altera Stratix V FPGAs, where the configuration granularity can be as little as an individual LUT [27], reconfigurable systems will typically be coarser tiled than what is available by the underlying FPGA fabric. III. R0 (x1 = 32, y1 = 210) R1 (x1 = 56, y1 = 210) P LACEMENT MODEL Using set theory, we formulate the placement model as follows: A module, M , consist of a sequence of one or more layouts, M = {L1 , ..., Ln }. A layout, L is an implementation variant of the module, consisting of a sequence of one or more resources, L = {R1 , ..., Rn }. A resource, R, represents the physical resource required by the implementation alternative. R is defined as a bounding box, R(x0 , y0 , x1 , y1 , k, c), in which x and y represent the bounding box. k is a finite sequence where its elements denote the type and order of resource primitives (i.e. actual resource primitives on the FPGA) the module requires. c is a set defining compatible communication interfaces. The model allows a module to be represented with complex layout. In Figure 2, a fully functional polyomino shaped module has been placed on a Xilinx Virtex-6 FPGA. The module is defined (using FPGA coordinates) as follows: R0 (x0 = 13, y0 = 0, x1 = 32, y1 = 210, k = {CLEXL, ..., DSP }), M0 = (1) R (x = 32, y0 = 126, x1 = 56, y1 = 210, 1 0 k = {CLEXM, ..., CLEXM }) Similarly the module definition, the FPGA-area is modeled as the sequence of resources, F P GA = {R1 , ..., Rn }. The FPGA-area can also be of complex shape. For example, an FPGA with a non-reconfigurable region can be modeled. We use this to accurately model the reconfigurable resources in Xilinx Virtex-6 (xc6vlx240) device as depicted in Figure 2. Montone et al. [11] and Smith et al. [19] establish real world constraints present in modern FPGA devices. This includes constraints such as intersection between modules (i.e. preventing multiple modules to use the same area). Using constraint programming syntax ( [28]), we define the intersection (between R bounding boxes) constraint as follows: R0 (x0 = 13, y0 = 0) CLEXM CLEXL R1 (x0 = 32, y0 = 126) BRAM DSP Figure 2. A relocatable module placed on a part of a Xilinx Virtex-6 (xc6vlx240tff784-3) FPGA. Used resources within the bounding box (gray area) are depicted in black. The remaining resources within the bonding box are unused (i.e. internal fragmentation). The figure is generated from the XDL file of a fully functional, placed and routed, SHA-256 module. GoAhead was used to constraint both logic and routing resources to within the bounding boxes depicted in the figure. The module took 18 minutes to implement (synthesis, mapping and place and route) from VHDL source code on a Xeon X5690. sequence, Rk , of the FPGA model. The constraint can only be met when the module is placed within the reconfigurable region. Thus, this constraint implicitly constrains the module to within the bounding box of the reconfigurable region. We formulate this as follows: {(L(Rk ), F P GA(Rk )|L(Rk ) == F P GA(Rk )} (3) For communication we specify a producer/consumer constraint as follows: {(Ln (Rc ), Lm (Rc ))|Ln (Rc ) == Lm (Rc ))} (4) This abstract constraint definition allows any type of communication to be modeled where a module communications with another module or the static part of the system. {((Ax0 , Ax1 , Ay0 , Ay1 ), (Bx0 , Bx1 , By0 , By1 ))| Ax0 < Bx1 , Ax1 > Bx0 , Ay0 < By1 , Ay1 > By0 } (2) The resource constraint is satisfied when the resource sequence, Rk , of the layout is a subsequence of the resource IV. C ONSTRAINT S OLVER I MPLEMENTATION We have implemented the model, with the flow as depicted in Figure 3. The module placer is implemented as a constraint solver which computes feasible placement positions for Input Constraint program Constraint computation FPGA Model Constraint solver Relocatable modules Constraints Figure 3. Constraint program generation Constraint program Computed placement positions The figure depicts the components used in the constraint solver to compute feasible floorplans. relocatable modules. Computing feasible placement positions consist of executing a search and evaluating whether the constraints are met. If all constraints are met for a module at a placement location, the placement location is feasible. If all constraints are met for all modules, the a feasible solution exists. State 0 State 1 State 2 State 5 State 3 State 6 State 4 State 7 A. Constraint Evaluation We implement two types of constraints: static and dynamic constraints. 1) Static Constraints: Static constraints are constraints which, when evaluated, always yield the same result. We evaluate the static constraints before the search is executed. This removes part of the search space, and avoids constraint recomputation during search. For example, FPGAs often have a heterogeneous fabric with resources of the same type in rows over the device. This efficiently reduces the horizontal domain for the modules. The search space in vertical direction can be relatively large (particularly when not considering configuration frame granularity). However, due to the heterogeneous resources, the search space will be more constrained in horizontal direction. The search space is further reduced when taking the reconfiguration granularity into account. The reconfiguration granularity is required for partial run-time reconfigurable systems. We take advantage of this by implementing vertical shifts at the granularity of resources within a configuration frame. This results in a reduction of the vertical domain, and hence the search space. 2) Dynamic Constraints: Constraints which depend on the placement of other modules are dynamic. These constraints need to be computed to evaluate whether a location is valid. For example, if module A has a communication constraint related to another module, B, the position of module B affects whether the placement of module A is valid. Similarly, the intersection constraint depends on the placement of other modules. State 8 Figure 4. Example of possible module placement during constraint solving. The rightmost branch results in a feasible floorplan. For clarity, we have only used rectangular modules in the figure. B. Constraint Search The constraint search is implemented with a recursive depth first search (DFS) function. For each recursive call, the dynamic constraints are evaluated. The search backtracks if the constraints cannot be met. figure 4 illustrates the search and backtracking during module placement. The search returns the computed placement positions when all modules have been placed. C. Search Heuristics Constraint solvers typically improve performance (i.e. time it takes to solve the constraint problem) by reducing the search space. The order in which modules are placed, and the order in which constraints are tested, have impact on how fast the search space is pruned. Typically, the execution time of the module placer improves when placing the most constrained modules first. Therefore, we order the modules by the largest Implementation of modules Module placement 18 Module implementations (source code) Bounding box constraints (GoAhead) Generating placement problem 15 21 16 11 19 20 Synth/MAP/PAR Solve placement (constraint solver) 17 14 9 8 22 7 12 Implemented modules Floorplan 23 2 5 3 10 13 Constraint computation System implementation 1 Figure 5. An overview of the experimental tool flow used. A set of modules know to fill 100% of the FPGA is generated. The constraint solved computes feasible placement for the modules. The computed placement is used to implement the design from a set of modules. module first (LMF) heuristic. For example, larger modules are more constrained with respect to the placement as the heterogeneous resources reduce the horizontal domain. It is trivial to generate the initial placement order from the module size. Thus, the module size order is a good compromise compared to a heuristic ordering based explicitly on how constrained the modules are. V. E XPERIMENTAL R ESULTS We have undertaken experiments of the module placer both in a simulated run-time application and in a designtime application. Currently, no common benchmark is available to determine the performance (i.e. execution time, packing quality) of module placement. To evaluate the module placer in a design-time environment, we have generated synthetic module placement problems. The synthetic placement problems consist of placing implemented (i.e. fully functional, placed and routed) relocatable modules. The modules have been implemented using CalyptoC, a high level synthesis (HLS) tool and Xilinx tools. We have constrained the module logic and routing to bounding boxes using the GoAhead tool flow introduced in Beckhoff et al. [25]. The complete experimental flow is depicted in figure 5. We have used the experimental scenario provided by Koester et al. [14] for our run-time experiments. In the runtime environment we simulate the benchmark scenario with the modules parameters (i.e. not implemented modules) given in [14]. However, we use a finer tiled grid as this is one of the advantages with GoAhead. The experimental results were obtained as follows: 4 6 0 Figure 6. The figure depicts a floorplan computed with the constraint solver. The FPGA model is a Xilinx Virtex-6 (xc6vlx240tff784-3) device. The middle part is allocated for the system CPU. The shadings are used to depict the different modules. A. Design-time Benchmarks We first generated a repository of 200 placed and routed relocatable modules. This was accomplished using a custom build program supporting multiple cores on multiple computers. The build program automates the implementation of the modules according to the flow depicted in Figure 5. In addition the build program allows implementation of many modules concurrently. This process took approximately 2 days on a 24 core Xeon X5690 host. This includes time spent on modules that were not possible to fully route successfully. We have used modules from the YaSSL [29] project. YaSSL has implementations of standardized cryptographic and hashing algorithms including AES, DES, MD5 and SHA variants. The YaSSL modules are written in C. We have implemented the YaSSL modules as hardware modules using the HLS tool, GoAhead, the Xilinx tool chain, and the concurrent build program. One of the advantages of using a HLS tool to implement hardware from software, is the ability to implement multiple hardware variants from a single software implementation. For example, different transformations including partial or complete loop unrolling and various number of pipeline stages are parameterizable and automated in the HLS tool. For our experiments, this allows us to implement many working modules with different resource and performance qualities. From the repository of implemented relocatable modules, we have generated test cases. The test cases are all solvable (i.e. a given placement exists where the constraints are met). All test cases utilized 100% of reconfigurable area allocated for modules. The test cases are artificial generated placement problems, using actual placed and routed modules. Each test case consist of relocatable modules of different shapes and Table IV. S IMULATED RUN - TIME B ENCHMARK RESULTS Number of modules Average unsuccessful placements Average unsuccessful placements with our improvements 3 4 5 6 7 8 0% 0% 10% 25% 0% 0% 5% 14% 33% 52% on the FPGA. For each iteration, the earliest placed module is removed, and a new random module is placed. If the module cannot be placed within the area, it must either be delayed or rejected. The average number of unsuccessful placements is shown in the Table IV. This occurs when a task is scheduled to be to be placed, but it cannot be placed. Figure 7. The figure depicts a system implemented from a set of modules. The figure is generated from the XDL file of the final system based on modules from the repository and the computed floorplan. The depicted device is a Xilinx Virtex-6 (xc6vlx240tff784-3) FPGA. resource requirements. The generated placement problems contain up to 30 different modules. In average, the test cases contain 15% complex shaped ( , , and shaped variants) modules, and 15% of the modules span multiple clock regions. The modules were placed on a model of the Xilinx Virtex-6 (xc6vlx240tff784-3) FPGA. We used arbitrary module layouts, implementation variants and a finer vertical grid granularity than Koester et al. The selected grid granularity consist of 64 slices (i.e. 2 CLB columns), 4 BRAMs, or 4 DSP blocks. A finer vertical grid allows the modules to have less internal fragmentation, thus utilizing the resources better. ReCoBus/GoAhoad allows implementation of communication for fine grained placement grids. This allows us to extend the reconfigurable area slightly while maintaining the average resource utilization. Thus, our simulations point to improved results compared to [14]. As shown Table IV, we were able to place more modules with less placement violations in our simulations. VI. The generated test cases were solved by the module placer. This was accomplished by solving the constraint problem for the set of modules in the test case. Experimental results of a solved placement problem are depicted in Figure 6. The Figure shows the computed floorplan. The computed floorplan is used to implement the system. In Figure 7, a system implemented with modules from the repository and placed by the module placer is depicted. The Figure is generated by parsing the XDL file of the implemented system. Currently, the generated systems do not perform any function other than to show the feasibility of our module placer when integrated with FPGA tools. B. Run-time Benchmarks For run-time experiments, we have used the scenario provided in [14] by Koester et al. We have used the same FPGA fabric (Xilinx Virtex4 FX100), modules, schedule, and the same number of iterations (10000). Our run-time experimental results are based on simulations. In order to get a valid comparison between the work of Koester et al. and our implementation, we performed the experiments under the same conditions. In the benchmark scenario by Koester et al. [14], we place n modules. These modules are executed concurrently C ONCLUSION In this paper, we have introduced an improved placement model and module placer implementation. Our work targets systems implemented on state of the art FPGAs using relocatable modules. We have presented a realistic model. Our placement model supports heterogeneous logic resources, implementation variants, communication and device configuration granularity. In addition, we model heterogeneous resources such as embedded memory, multipliers and logic. The module placer is implemented using constraint solver techniques. We have combined the module placer fwith other tools to implement FPGA based systems from a repository of pre-implemented modules. We have generated complete placed and routed designs in order to validate that both the model is realistic, and that our approach is feasible. Simulations of run-time module placement points to improved results compared to previous work in the area. Here, the benefit is the result of reduced internal fragmentation and a finer tile grid. We have reduced the internal fragmentation by modeling complex shaped modules. ACKNOWLEDGMENT This work is funded by T HE R ESEARCH C OUNCIL OF N ORWAY as part of the Context Switching Reconfigurable Hardware for Communication Systems (COSRECOS) [30] project, under grant 191156V30. [16] D. Mehta, “Heterogeneous floorplanning for FPGAs,” in 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design (VLSID’06). IEEE, 2006, p. 6 pp. [17] [2] S. Areibi, G. Grewal, D. Banerji, and P. Du, “Hierarchical FPGA placement,” Canadian Journal of Electrical and Computer Engineering, vol. 32, no. 1, pp. 53–64, Jan. 2007. A. Wold, D. Koch, and J. Torresen, “Enhancing Resource Utilization with Design Alternatives in Runtime Reconfigurable Systems,” in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum. Anchorage: IEEE, May 2011, pp. 264–270. [18] [3] C. Lavin, M. Padilla, S. Ghosh, B. Nelson, B. Hutchings, and M. Wirthlin, “Using Hard Macros to Reduce FPGA Compilation Time,” 2010 International Conference on Field Programmable Logic and Applications, pp. 438–441, Aug. 2010. A. Lodi, S. Martello, and M. Monaci, “Two-dimensional packing problems: A survey,” European Journal of Operational Research, vol. 141, no. 2, pp. 241–252, Sep. 2002. [19] A. M. Smith, G. A. Constantinides, and P. Y. K. Cheung, “Integrated Floorplanning, Module-Selection, and Architecture Generation for Reconfigurable Devices,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 6, pp. 733–744, Jun. 2008. [20] Xilinx, “Configurable logic block with AND gate for efficient multiplication in FPGAS,” pp. 1–58, 2010. [Online]. Available: www.xilinx.com/support/documentation/user guides/ug384.pdf [21] ——, “Spartan-6 Family Overview Summary of Spartan-6 FPGA Features,” pp. 1–11, 2011. [Online]. Available: http://www.xilinx.com/ support/documentation/data sheets/ds160.pdf [22] M. Shelburne, C. Patterson, P. Athanas, M. Jones, B. Martin, and R. Fong, “MetaWire: using FPGA configuration circuitry to emulate a network-on-chip,” IET Computers & Digital Techniques, vol. 4, no. 3, p. 159, 2010. [23] A. Oetken, S. Wildermann, J. Teich, and D. Koch, “A Bus-based SoC Architecture for Flexible Module Placement on Reconfigurable FPGAs,” in 2010 International Conference on Field Programmable Logic and Applications. Milan, Italy: IEEE, Aug. 2010, pp. 234–239. [24] T. S. T. Mak, N. P. Sedcole, P. Y. K. Cheung, W. Luk, T. T. Mak, P. Sedcole, and P. K. Cheung, “On-FPGA Communication Architectures and Design Factors,” in 2006 International Conference on Field Programmable Logic and Applications. IEEE, 2006, pp. 1–8. [9] P.-H. Yuh, C.-L. Yang, and Y.-W. Chang, “Temporal floorplanning using the three-dimensional transitive closure subGraph,” ACM Transactions on Design Automation of Electronic Systems, vol. 12, no. 4, pp. 37–es, Sep. 2007. [25] C. Beckhoff, D. Koch, and J. Torresen, “Go Ahead: A Partial Reconfiguration Framework,” in 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines. IEEE, Apr. 2012, pp. 37–44. [10] L. Singhal and E. Bozorgzadeh, “Multi-layer Floorplanning on a Sequence of Reconfigurable Designs,” in 2006 International Conference on Field Programmable Logic and Applications. IEEE, 2006, pp. 1–8. [26] A. A. Sohanghpurwala, P. Athanas, T. Frangieh, and A. Wood, “OpenPR: An Open-Source Partial-Reconfiguration Toolkit for Xilinx FPGAs,” in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, no. Xdl. IEEE, May 2011, pp. 228–235. [27] M. Bourgeault, “Altera’s Partial Reconfiguration Flow,” 2011. [Online]. Available: http://www.eecg.utoronto.ca/∼jayar/FPGAseminar/ FPGA Bourgeault June23 2011.pdf [28] K. Apt, Principles of Constraint Programming. Press, 2003. [29] WolfSSL, “Yassl,” 2013, available from http://www.yassl.com/yaSSL/benchmarks-cyassl.html. [30] “Context switching reconfigurable hardware for communication systems,” available from http://www.mn.uio.no/ifi/english /research/projects/cosrecos. R EFERENCES [1] M. Wirthlin and B. Hutchings, “A dynamic instruction set computer,” in Proceedings IEEE Symposium on FPGAs for Custom Computing Machines. IEEE Comput. Soc. Press, 1995, pp. 99–107. [4] S. Liu, R. N. Pittman, A. Form, and J.-L. Gaudiot, “On energy efficiency of reconfigurable systems with run-time partial reconfiguration,” in ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors. IEEE, Jul. 2010, pp. 265–272. [5] L. Singhal and E. Bozorgzadeh, “Heterogeneous Floorplanner for FPGA,” in 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2007). IEEE, Apr. 2007, pp. 311–312. [6] S. Martello and D. Vigo, “Exact Solution of the Two-Dimensional Finite Bin Packing Problem,” Management Science, vol. 44, no. 3, pp. 388–399, Mar. 1998. [7] K. Bazargan and M. Sarrafzadeh, “Fast online placement for reconfigurable computing systems,” in Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375). IEEE Comput. Soc, 1999, pp. 300–302. [8] S. Fekete, E. Kohler, and J. Teich, “Optimal FPGA module placement with temporal precedence constraints,” in Proceedings Design, Automation and Test in Europe. Conference and Exhibition 2001. IEEE Comput. Soc, 2001, pp. 658–665. [11] A. Montone, M. D. Santambrogio, D. Sciuto, and S. O. Memik, “Placement and Floorplanning in Dynamically Reconfigurable FPGAs,” ACM Transactions on Reconfigurable Technology and Systems, vol. 3, no. 4, pp. 1–34, Nov. 2010. [12] I. Belaid, F. Muller, and M. Benjemaa, “Off-line placement of hardware tasks on FPGA,” in 2009 International Conference on Field Programmable Logic and Applications. IEEE, Aug. 2009, pp. 591–595. [13] M. Koester, M. Porrmann, and H. Kalte, “Task placement for heterogeneous reconfigurable architectures,” in Proceedings. 2005 IEEE International Conference on Field-Programmable Technology, 2005., no. m. IEEE, 2005, pp. 43–50. [14] M. Koester, W. Luk, J. Hagemeyer, M. Porrmann, and U. Ruckert, “Design Optimizations for Tiled Partially Reconfigurable Systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 6, pp. 1048–1061, Jun. 2011. [15] J. Angermeier, S. Wildermann, E. Sibirko, and J. Teich, “Placing Streaming Applications with Similarities on Dynamically Partially Reconfigurable Architectures,” in 2010 International Conference on Reconfigurable Computing and FPGAs. IEEE, Dec. 2010, pp. 91–96. Cambridge University