Run-time Layered Scalable Partial reconfigurable architecture AbstractThe partial dynamically reconfiguration provide reuse one or more smaller devices for high performance computational task. Hence, the reconfiguration delay overhead is the major shortcoming of current architecture which focus on the reconfiguration time and the size of reconfiguration file. Partial reconfiguration is one of solutions for reducing the reconfiguration overhead like: configuration compression, configuration caching and perfecting, configuration relocation and defragmentation, the utilizing multiple contexts and using partial dynamic reconfiguration[1]. These techniques have been used in order to reduce the reconfiguration latency on the overall performance. From another point of view of the platform hardware, we found that there was still a space of research by optimizing the platform prototype. Our aim is to minimize the reconfiguration latency overhead by redesigning and optimizing the platform prototype. The proposed approach involves evolution of some existent methods and some especial hardware designs. To demonstrate the applicability of the analysis, we present following: 1) proposition of architecture with the method of implementation ; 2) different restricting condition of the reconfiguration time 3) experiment tested and the results for a partial reconfigurable hardware prototype implemented on a FPGA board. What I understand here is, we have a configuration time problem, so to reduce these reconfiguration time we use communication interface based on … the question is how this interface deals with reconfiguration time and how it would reduce it ? “An abstract is an accurate representation of the contents of a document in an abbreviated form” Ask yourself: Why would another researcher be interested in this research? What are the most important aspects of the research? What should a reader be sure to know about the research? What information will the reader have to have in order to understand the most important aspects? What are the main points from each section of your report? Summarize each section in one sentence, if possible. The introduction prepares readers for the discussion that follows by introducing the purpose, scope, and background of the research Background Information includes facts that the reader must know in order to understand the discussion that follows. These facts may include descriptions of conditions or events that caused the project to be authorized or assigned and details of previous work and reports on the problem or closely related problems. Run-time Layered Scalable Partial reconfigurable architecture Ask yourself: What facts does the reader need to know in order to understand the discussion that follows? Who has done previous work on this problem? What will the reader know about the subject already and what will you need to tell them so they can understand the significance of your work? the partial and dynamic reconfiguration for FPGA is a hot topic in the research domains now, even though some new generations of FPGAs provide support a more flexible and effective solutions for implementing parallel application. Especially, the re-use of component produce significant performance improvements when we need update continually the system, the reconfiguration latency overhead is still one major shortcoming of current reconfigurable architecture. Some techniques have been described for reducing the reconfiguration latency overhead. There include configuration compression, configuration caching, configuration relocation and defragmentation, multiple contexts, and Partial Dynamic Reconfiguration (PDR) [1]. Nevertheless, they may not produce significant improvements for the fact that the reconfiguration latency brought by the complex communication aspect of hardware platform which could be explained by those two fact: 1) the reconfiguration frequency isincreasing a cause of , in many case the communication interface is reconfigurable. 2) because of the design of System-On-Chip is more and more based upon the reuse IP core. the interface of IP is became more and more complex, if every reuse IP have one especial interface, the communication interface is no more a simple part in the system. If every interface is needed to reconfigure that will be a great reconfiguration time. in this paper, the focus is the proposition of our hybrid reconfigurable architecture. The reconfiguration latency can be reduced by optimizing the architecture. so based on the two drawback, we suggest two steps work in our approach: 1) Optimization of application; we optimize the application by compressing the configuration file , the aims is to minimize the reconfiguration time by reducing the reconfiguration file . in the same, all the hardware context are defined with the scheduling of configuration 2) Integration design of communication system correspond set of application; design a communication interface that not only can provide support of communication bewteen set hardware context, but also can self-adapt the change of interface when the reconfiguration is required at run-time. To demonstrate the applicability of the analysis, we present following: the related work is presented in the following section. Next, we talk about the reconfiguration problem and some condition parameters, the proposed architecture and method are detailed in the third section. In the fourth section, experiment of partial reconfiguration and the experimental result are explained , finally we present the conclusion and future work. It is very important to consider the purpose of your research in the introduction. If you do not completely understand what the purpose is, there is little chance that the reader will understand your Run-time Layered Scalable Partial reconfigurable architecture purpose either. The following questions will help you to think about the purpose of your research and your reason for writing a report: What did your research discover or prove? What kind of problem did you work on? Why did you work on this problem? Why are you writing this paper? What should the reader know or understand when they are finished reading the paper? Scope refers to the ground covered by the report and will outline the method of investigation used in the project. Considering the scope of your project in the introduction will help readers to understand the parameters of your research. It will also help you to identify limiting factors on your research. For example, “if 18 methods for improving packaging are investigated in a project but only 4 are discussed in the report, the scope indicates what factors (such as cost, delivery time, and availability of space) limited the selection” (Blicq and Moretto 165). Scope may also include defining important terms. These questions will help you to think about the scope of both your research and your report: How did you work on the research problem? Why did you work on the problem the way you did? Were there other obvious approaches you could have taken to this problem? What were the limitations you faced that prevented your trying other approaches? What factors contributed to the way you worked on this problem? What factor was most important in deciding how to approach the problem? Introduction The reconfigurable computing [1] has been already developed in recent years. Because of the wide variety of applications is potential to greatly accelerate, the reconfigurable computing is very popular now. The key advantage is support optimising the hardware to improve the computational performance, while retaining much of the flexibility of a software solution. In another hand, the designing complexity, verification, and the time-to-market pressures encourage reuse of components for one or set applications. So Now, the new general FPGAs provide support more and more the partial and dynamic reconfiguration[2], there has been a large amount of work on dynamic modular systems in FPGAs predicated on the property of dynamic reconfiguration (see for example[3][4][5][6][7][8]). Run-time Layered Scalable Partial reconfigurable architecture However, the performance of the system is very limited by the synchronisation between the processor and the reconfigurable component. There is one important factor to restrict the performance of the reconfigurable system that is the reconfiguration latency. One example shown in [9], which tell us that 30% of Virtex-II v6000 FPGAs are occupies when using partial reconfiguration in loading a JPAG decoder task, and 12ms required with the reconfiguration circuitry running at maximum speed. For instance, in the [10], the multimedia standards MPEG-4, which support digital video ans 3D-game application where the number and type of objects to decode and visualize can change at run-time from one frame to another. Moreover, to reach the standard quality level, the system must decode and visualize a frame in less than 33ms. Another important drawback to limit the use of dynamically reconfigurable Hardware is the lack of programming support for dynamic task placement. That is to say the operating system could load the configuration file onto different regions on the FPGAs and execute in parallel. One task finish, the next can be reconfigured in the same region. So the designers need to find out a solution to schedule and to allocate the tasks and to adapt inter-communication of task. In this paper, based on these two drawbacks, we propose one solution for reducing and hiding reconfiguration latency and automatic reconfigurable task placement based on the commercial plat form Xilinx Virtex-4[11]. In the follow chapter highlight relevant related work to talk about the granularity of system. Chapter two introduces our approach with a selfadaptive interface architecture basis. The design flow for the partial reconfiguration tested on the Xilinx Virtex-4 will be shown in the third chapter with some experiment results. Finally, the summary of the main points and the future work is given. The introduction must be rewritten so that, Background, purpose and scope will be clearly identified Background and Related work ???? The reconfigurable hybrid architecture has been provided long time, The work is based on the traditional design flow TOP-DOWN of system level design [12] as shown in Figure 1. Run-time Layered Scalable Partial reconfigurable architecture Figure 1 traditional System Level Design flow Following this flow, the design kernel is on the optimising of method of reconfiguration of application, and the corresponding architecture. A few researchers work on the meliorating of application architecture like the solutions told in [13], which propose a static design-time scheduling technique to attempt to hide reconfiguration latency by organising the scheduling approaches, otherwise some research like[14] propose a technique for allocating data transfers and configurations to minimize overall execution time. Those methods focus to find out a solution to hide the reconfiguration latency. Our goal is to design a new reconfigurable platform prototype that can be used to reduce the reconfiguration overhead, and to make the partial reconfiguration resources effective even for highly dynamic application. [15] Propose one good idea of reconfiguration manager for the organisation of reconfiguration, as well as the [16] show us one profit of self reconfiguration interface and some problem for design the reconfiguration method. Also we have developed the tool and method for partitioning the application [17], and the method [18][19] that propose some solution for reducing and hiding the reconfiguration latency . 3 infrastructure of Partial dynamic Reconfigureable architecture the reconfiguration time can be explain simply by the equation : After the reconfiguration of a task it is necessary to modify the interface to adapt the new task. This adaptation of the interface shall be performed automatically. The self-adaptive Run-time Layered Scalable Partial reconfigurable architecture interface is triggered by the task reconfiguration. The different interface models are registered for the adaptation of the correspondent task. Our infrastructure that based on one self-adaptive communication interface(SCI), can automatically match the corresponding interface of re-configurable component at run-time, the implementation method is developing in the same time, Described in some detail below. 3.1 The auto-reconfiguration environment method and scheduling the design of auto-reconfiguration must begin from the generation of hardware context. For example, show in the figure (5), a simple system S has two functional models F1, F2. Between F1 and F2, the task1 is the public task, and task2 and task3 are the reconfigurable part. In the traditional partial reconfiguration, task2 and task3 can be exchanged at run-time if the system requests it. In such case, task2 and task3 are the partial reconfigurable modules (PRM) and the task1 is the static module(SM). In our approach, we would like compare task2 and task3 to find out one more public model task4 for reducing the size of reconfigurable file. So we distinguish three different steps of FPGA reconfiguration: Step1: for one or set of application, these tasks of application is defined, the re-configurable module and the sequence of reconfiguration are triggered by the designer. Figure 5: partitioning process Step 2: compare each two of these re-configurable module, the common module will be noted and putted into the static module. In the next new sequence of system will be generated like in the figure (6). In this step, the hardware context for the Static Module and the reconfigurable module with the particular interface are generated and scheduled. Run-time Layered Scalable Partial reconfigurable architecture SMs PRM1 PRM2 RPM1_bis SMs_ 1 RPM2_bis RPM1_bis SMs_ 0 + SMs_ 1 RPM2_bis Figure 6: compression of reconfigurable module and re-generation of sequence of system Step3: schedule the reconfiguration and build the connection which corresponds to the sequence of task reconfiguration the interface. Figure 7: PRR (Partial Reconfigurable Region) adapt the different hardware contexts (RPM1_bis or RPM2_bis) in the same time to build the connection , the scheduling environment is built. all the information of scheduling will be built in the SCI and controlled by FSMs. for example, at the same time of running the task0, the T1can be configured if the systems requires it. Simultaneity, the SCI can load the corresponding interface mode for the T1, as shown in figure (8). Run-time Layered Scalable Partial reconfigurable architecture R u n T0 T 0 i n C o n f i gu r e s t at i c m o d u l e T 1 1 T1 R u n 2 T2 T3 R u n 3 4 5 T5 6 T3 T1 T 3 T 3 C o n f i gu r e T 4 SM T0 R u n T4 T 2 T 2 C o n f i gu r e R u n T5 T 1 C o n f i gu r e R u n T4 T2 T 4 C o n f i gu r e NRI T 5 T 5 Figure8: scheduling design 3.2 The SCI Macro structure The key point of the SCI macro structure is using of a set of Finite State Machine (FSMs) shown in Figure (2) to load the different model de communication which at run-time to adapt the set of RM. The data and handshake sign communicate in the horizontal direction, in a given case, the FSM can load automatically on of these model of communication for building the connection for the communication between SM and RM. for example, like in the Figure (2), M1 is the model of communication for SM and RM1, M2 is the model for SM and RM2, M3 is a expanding model for the future. Of cause, this is the simplest model in the Figure. One more useful model is like in the Figure (3). One SCI connect two re-configurable regions. The signal active of Slice Macro is controlled by the SCI. The Boolean state is the easiest way to control the active of communication that is the raison why we connect only to Re-configurable region with one SCI. The communication with the others SCI could be active with the expanding model in the SCI with the request of system. Run-time Layered Scalable Partial reconfigurable architecture RM2 SM FSM RM1 RM1 SM M1 M2 M3 SCI EN Data_out SMI SCI Data_in Figure 2: 2architectural design of Communication Interface Block used for one reconfigurable region and one static module All the communication between Static Module(SM) and Reconfigurable Module(RM) or between two RM, must be passed by the SCI. the two type of connection can be shown below: First type SM-RPM: in this type, SM can be communicated in horizontal direction with any of the two RPM like the figure (3).the scheduling environment of the working will be expatiated in the next. SM SM RPM SM RPM SCI Figure 3: SM-to-RPM Second type SM- RPM&RPM: here, two re-configurable regions are connecting together by the RNI as shown in the figure (4). These two parts can be work like one application block and communicate with SM. Run-time Layered Scalable Partial reconfigurable architecture RPM2 SliceM RPM1 SMs RPR1 SCI RPM4 SliceM RPM3 RPR2 application block Figure 4: SM-to-RPM&RPM 3.3 Formalization However the reconfiguration latency is the major shortcoming of current architecture. even though the partial reconfiguration can decrease one great part of latency, the reconfiguration latency brought by the improving of the configuration frequency and the improving of size of configuration file have far exceeded it. for explaining more clearly, Application Task1 , Task 2,.....Taskn; Task T c o nf Tconf Ttotal Lt a s k F Ltask2 Tresetup F N 1 N ( L Treset upn) F n0 taskn n0 with the equation we could not clearly the reconfiguration time is dependant with the size of reconfiguration file and the configuration frequency. some techniques (configuration compression, configuration caching and perfecting, configuration relocation and defragmentation, utilizing multiple contexts and using Partial dynamic Reconfiguration [1].) have been described for reducing the reconfiguration latency overhead. Run-time Layered Scalable Partial reconfigurable architecture 4 The partial dynamic reconfiguration method and design flow Our approach for the partial and dynamic reconfigurable architecture implemented in Xilinx Virtex-4 FPGAs [11]. We make use of the existing partial reconfiguration mechanisms provided by the Xilinx FPGA and trying to develop a self-adaptive communication interface for the adaptation the component interface at run-time. In our experiment, two tests have been done as are shown in the figure (9). Two operations will be implemented in the plate form Virtex-4 which like the green block for the static function and yellow block for the reconfigurable function in the next figure. Additionally, there are two task in the system, one task used like the static function (A and B=C), another is the partial reconfigurable module (E1 and E2=S). PC provides partial bitstreams for the reconfiguration region. Figure 9 : two examples of partial reconfiguration with Xilinx Virtex-4(XC4VFX12-10) A and B =C (static function module) -------- green Run-time Layered Scalable Partial reconfigurable architecture block E1 and E2 = S (reconfigurable function module) --------blue block 4.1 Design flow We test the partial reconfiguration with a Xilinx FPGA (Virtex-4 XC4VFX12). The method of implementation is based on the module-based reconfiguration in the application note of Xilinx XAPP290 [20]. Hence, the new platform hardware member of Virtex family (Virtex-4) can provide partial reconfiguration slice by slice. Xilinx propose a new method, the merge reconfiguration method, that is different the reconfiguration on the Virtex-II/pro. Unfortunately, there are no more documents to explain completely this method. We have been already succeeded to implement the partial reconfiguration on the Virtex-4 with our design flow which is based on the method proposed by the Xilinx, at the same time few researchers were working on it also, like [21]. Hereinafter, the new partial and dynamic reconfiguration flow will be shown. Two great part innovation of this method are: firstly the bitstream of module are not loaded directly, but merged with the existing configuration state. Secondly not all resources within a reconfigurable module region are exclusive to the module; some are reserved for the static part of the design. The first major innovation is in the way the partial bitstream is loaded. Rather than writing the total bitstream directly to the configuration memory, the current configuration is read back from the device and modified with information from the partial bitstream before being written back. A module may occupy less than the full height of a frame, by only modifying tiles which fall within a given boundary region. As a result, it is also possible to have two or more module regions vertically aligned within the same frame space. The second major innovation is in the reconfigurable module region where not all resources are exclusive to the module; some are reserved for the static part of the design. A statically routed signal can pass through a reconfigured region unperturbed provide the configuration bits associated with the route persist in the new configuration. The design flow is beginning from the analysis of application, the traditional hw/sw co-design method [1]. In the next, the system is built correspond with the commercial tool and platform. The tool EDK is used in our project for generating the netlist of the entire module. After get the netlist. The design flow is separated by two parts: On the one hand, the bitstreams of the static part is continued to be generated. On another hand, comparing the file of netlist of system in which the reconfigurable module and static module are including, and the bitstreams of static part for getting out the different part. Just now, we get two bitstreams, one of the static module, another of the just the reconfigurable module. Finally, we implement these bitstreams on the FPGAs with the tool IMACT by passing the port JTAP. Run-time Layered Scalable Partial reconfigurable architecture Figure 10: the partial reconfiguration flow 4.2 Experiment result We test one reconfigurable part with the Xilinx FPGA (Virtex-4, XC4VFX12). The table 1 shows us some technical data from our experiment. Run-time Layered Scalable Partial reconfigurable architecture FPGA size Static part Reconfigurable part Slices 5,472 40 4 CLB’s 1,368 10 1 KB ≈87552 582 33 Tableau 1: FPGA area utilization The Xilinx Virtex-4 provides nearly 1,368 CLB. These CLB are arranged as 64x64 arrays. With the profile of Virtex-4, in our demonstrator design we use just only one CLB for the reconfigurable region and 2 CLB use like the reconfigurable interface that are the two slice macro. The new platform and the new design flow could let us choose the more accurate size of reconfigurable region for the reconfigurable module, unlike in the Virtex-II the reconfigurable region must be all full high of the platform. The partial reconfiguration reduce the area occupied of reconfiguration in the possible, the advance can be shown in the reducing of reconfiguration latency. Because of we can calculate the possible theoretical value for the reconfiguration of Virtex-4 FPGA. With a configuration clock frequency 100MHz, the transfer rate r is 100MB/s. the length L of the tested partial reconfiguration file is, for example, 33KB. The reconfiguration time can be obtained by this equation: Tconf =L/r with 3, 3 ms. This value is less than 10 double of the configuration time of static part. So we can image that the partial reconfiguration help us to get a new functional system by reconfiguring one part of the system. we don’t need reconfigure all of the system. However, the reconfiguration time shown is the theoretical value, in the real test we find out the real value is much higher than the theoretical value. which can be caused by two raison: one hand, due to the current design frequency may be less that the maximum value, one the other hand, in our experiment we choose the host pc to transfer the bitstreams data from the external configuration memory to the FPGA has not yet been optimized. Some times is wasted in the road. Conclusion: ( la conclusion va être ajouté jusqu’à la fin de la correction) we optimize a design flow for the partial reconfiguration which has been implemented in the Plat form hardware Xilinx Virtex-4. Partial and Dynamic reconfiguration arrow reconfigure parts of system. Our approach introduced an infrastructure with. One kernel part of our approach is a self-adaptive communication interface part. Further investigations will consider more complex example and different protocols REFERENCES [1] K.COMPTON “Reconfigurable Computing: A Survey of Systems and Software” , ACM Computing Surveys,Vol.34, No.2,june2002, pp.171-210. Run-time Layered Scalable Partial reconfigurable architecture [2]Moscu Panainte, E., Bertels, K., Vassiliadis, S.: Dynamic hardware recon_gurations: Performance impact on mpeg2. In: Proceedings of SAMOS. Volume 3133., Samos, Greece, Springer- Verlag Lecture Notes in Computer Science (LNCS) (2004) 284. 292 [3] Panainte, E.M., Bertels, K., Vassiliadis, S.: Instruction scheduling for dynamic hardware con_gurations. In: Proceedings of Design, Automation and Test in Europe (DATE 05), Munich, Germany (2005) 100.105 [2] P. Lysaght, “Dynamic Reconfiguration of FPGAs”, W. Moore and W. Luk, (editors), More FPGAs, pp 82–94, Abingdon EE & CS Books, England, 1994. [3]G. Brebner and O. Diessel, “Chip-Based Reconfigurable Task Management”, Pro-ceedings Field-Programmable Logic and Applications, 2001. [4] J. Burns, A. Donlin, J. Hogg, S. Singh and M. de Wit, “A Dynamic Reconfiguration Run-Time System”, Proceedings IEEE Symposium on FPGAs for Custom Computing Machines, 1997. [5]R. Maestre, F. J. Kurdahi, M. Fern´andez, R. Hermida, N. Bagherzadeh and H. Singh, “A Framework for Reconfigurable Computing: Task Scheduling and Context Management”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 9, No. 6, pp 858–873, December 2001. [6]J-Y. Mignolet, V. Nollet, P. Coene, D.Verkest, S. Vernalde and R. Lauwereins, “Infrastructure for Design and Management of Relocatable Tasks in a Heterogeneous Reconfigurable System-on-Chip”, Proceedings Design, Automation and Test in Europe, 2003. [7]C. Steiger, H. Walder and M. Platzner, “Heuristics for Online Scheduling Real-Time Tasks to Partially Reconfigurable Devices”, Proceedings Field-Programmable Logic and Applications, 2003. [8]G. B. Wigley, D. A. Kearny and D. Warren, “Introducing ReConfigMe: An Operating System for Reconfigurable Computing”, Proceedings Field-Programmable Logic and Applications, 2002. [9] R. Hartenstein, “A Decade of Reconfigurable Computing: A Visionary Retrospective,” Proc. Design, Automation and Test in Europe (DATE 01), IEEE Press, 2001, pp. 642-649. [10]Z. Li and S. Hauck, “Configuration Compression for Virtex FPGAs,” Proc. 9th Ann. Symp. FPGAs for Custom Computing Machines (FCCM 01), IEEE Press, 2001, pp. 147-159. [11] Xilinx, Inc.2004 Virtex-4 Data sheet, Xilinx, Inc. San Jose, CA [12] K. Keutzer, Department of EECS, University of California at Berkeley Department of EE, Princeton University ,System Level Design: Orthogonalization of Concerns and Platform-Based Design,IEEE Transactions on Computer-Aided Design of Circuits and Systems, Vol. 19, No. 12, December 2000. [13] L.Shang and N.K.Jha “Hardware/Software Co-Synthesis of Low Power Real-time Distributed Embedded Systems with Dynamically Reconfigurable FPGAs” Proc.Asia South Pacific Design Automation Conf.(ASP-DAC 02). ACM Press. 2002, PP. 345-354. [14] R.Maestre et al., “Configuration Management in Multi-context Reconfigurable Systems for Simultaneous Performance and Power Optimizations,” Proc.13 Int’l Symp. System Synthesis(ISSS 00), IEEE Press,2000,pp.107-113. [15] Javier Resano and Daniel Mozos, Francky Catthoor, Diederik Verkest, Universidad Complutense de Madrid, Katholieke Universiteit Leuven and IMEC, “ A Reconfiguration Manager for Dynamically Reconfigurable Hardware”, IEEE Design and Test of Computers 2005 , pp 452-460 [16] André Meisel, Markus Visarius, Wolfram Hardt, Chemnitz University of Technology Faculty of Computer Science Gemany, “Self-Reconfigurable of communication Interfaces”, 15 IEEE International workshop on Rapid System Prototyping(RSP’04) 2004. [17] Y.BERVILLER, C.TANOUGAST, S.WEBER, P.BRUNET, H.RABAH, "Outil de partitionnement temporel pour la conception de systèmes reconfigurables embarqués de traitement du signal",Traitement du signal volume 22 n°6 2006 [18] Tangougast, C et al, 2003, Automated RTR Temporal Partitioning for Reconfigurable Embedded Real-time System Design, Proceddings of the International Parallel and Distributed Processing Symposium(IPDPS’03), IEEE Computer Society Press. [19] T.LIU, C.TANOUGAST, P.BRUNET, Y.BERVILLER, H.RABAH, S.WEBER,"An Optimized FGPA Implementation of an AES Algorithm for Embedded Applications.", ISBN 972-99353-8-6 Proceeding of the 2004 International Workshop on Applied Reconfigurable Computing, Algarve, Portugal, February 22, 2005. [20] Xilinx Inc. Two flows for partial reconfiguration: module -based or difference based. Xilinx App. Note 290 Sep., 2004. [21]Shannon Koh and Oliver Diessel, “COMMA: A Communications Methodology for Dynamic Module Reconfiguration in FPGA” (Extended Abstract), IEEE, FCCM’06 2006