Transformation of a synthesizable subset of ANSI C code into behavioral SystemC code

Piotr Dziurzanski, Vladimir Beletskyy
Faculty of Computer Science & Information Systems, Technical University of Szczecin, ul. Zolnierska 49, 71-210 Szczecin, Poland
e-mail: pdziurzanski@wi.ps.pl, bielecki@man.szczecin.pl

Abstract: This paper gives a preliminary description of a system under development for translating code written in ANSI C into behavioral SystemC code. The limitations on the translatable ANSI C constructs are described and implementation details are stressed.

Key words: High-level synthesis, synthesizable subset, ANSI C, SystemC

1. INTRODUCTION

Different hardware description languages (HDLs) are used as input to behavioral synthesis. The most commonly used are VHDL and Verilog, but since designers often write system-level models in programming languages, the use of software languages is becoming increasingly popular. Software languages make SW/HW cosynthesis easier, which accelerates the design process and improves the flexibility of software/hardware migration. Moreover, system performance estimation and verification of functional correctness are easier, as software languages offer fast simulation and a sufficient amount of legacy code and libraries that facilitate system modelling.

To implement in hardware the parts of a design modelled in C/C++, designers must translate these parts into a synthesizable subset of an HDL, which is then synthesized into a logic netlist. The leading position of ANSI C/C++ among software languages has given rise to a large number of HDLs based on these languages, for example SystemC, Cynapps, Accellera, and SpecC. Choosing such an HDL makes rewriting C/C++ code into an equivalent HDL description less time-consuming and less error-prone, which results in a shorter time to market and higher quality [4].
In this paper, we analyze the transformation of ANSI C code into SystemC code. SystemC is open and supported by Synopsys, Cadence, Mentor Graphics, Xilinx and other vendors.

2. PRINCIPLES OF SYSTEMC

In this section, we introduce basic SystemC concepts and nomenclature. In SystemC, a modelled system is composed of modules with one or more processes that specify combinational or sequential logic. Processes define the parallel behavior of a module: they execute concurrently, whereas the code within a single process executes sequentially. Besides processes, a module contains ports, internal signals, internal data variables, and member functions. It may also include other modules for hierarchical design.

A process is defined like a C++ function: it is declared as a member function of a module class and then registered as a process in the constructor of the module. There are three types of SystemC processes. Since we aim at creating synthesizable models, we use the only synthesizable type of SystemC process, the SC_METHOD process, which is either level-sensitive or edge-sensitive with respect to a set of signals called its sensitivity list.

The module constructor is defined with the C++ macro SC_CTOR. In its body, processes are registered and their sensitivity lists are declared. The sensitivity list is defined with the sensitive( ), sensitive_pos( ), and sensitive_neg( ) functions or with the sensitive, sensitive_pos, and sensitive_neg streams.

As in other HDLs, processes in SystemC communicate with their environment through ports, whereas for communication between processes internal signals or internal variables can be used.
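The edge-sensitive behavior of an SC_METHOD process described above can be mimicked in plain C++ without the SystemC library. The following is only a sketch under stated assumptions: BoolSignal, Kernel, add_signal, on_posedge and step are hypothetical names invented for illustration and are not part of the SystemC API, and a real SystemC kernel is considerably more elaborate.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a boolean signal with deferred update:
// a written value becomes visible to readers only after update().
struct BoolSignal {
    bool current = false;
    bool next = false;
    void write(bool v) { next = v; }
    bool read() const { return current; }
    // Commit the pending value and report whether a rising edge occurred.
    bool update() {
        bool rising = !current && next;
        current = next;
        return rising;
    }
};

// Hypothetical scheduler: a process is a callback bound to one signal
// and runs whenever that signal shows a positive edge.
class Kernel {
public:
    void add_signal(BoolSignal& s) { signals_.push_back(&s); }

    // Plays the role of registering a process with a positive-edge sensitivity.
    void on_posedge(BoolSignal& s, std::function<void()> process) {
        assert(process && "process must be callable");
        bindings_.push_back({&s, std::move(process)});
    }

    // One step: commit all pending writes, then run every process whose
    // signal has just risen. Returns the number of processes that ran.
    int step() {
        std::vector<BoolSignal*> risen;
        for (BoolSignal* s : signals_) {
            if (s->update()) risen.push_back(s);
        }
        int ran = 0;
        for (auto& b : bindings_) {
            if (std::find(risen.begin(), risen.end(), b.signal) != risen.end()) {
                b.process();
                ++ran;
            }
        }
        return ran;
    }

private:
    struct Binding {
        BoolSignal* signal;
        std::function<void()> process;
    };
    std::vector<BoolSignal*> signals_;
    std::vector<Binding> bindings_;
};
```

In this sketch, a process bound to a signal runs only in the step in which the signal changes from false to true. Writing true again without an intermediate false does not trigger it, which is the intuition behind a positive-edge sensitivity list.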
Internal variables are, however, not advisable for this purpose, as during simulation the processes are executed in an unspecified order, which can lead to nondeterminism. Ports are declared with the template classes sc_in<port_type>, sc_out<port_type>, and sc_inout<port_type>, depending on their direction, and signals are declared with the template class sc_signal<port_type>. Internal variables are declared as in ANSI C.

3. ANSI C TO SYSTEMC TRANSLATION

Because SystemC is a C++ class library, and C++ accepts most ANSI C constructs, a one-to-one translation between the two is possible in most cases. The vast majority of ANSI C statements are supported by SystemC, but some of them are nonsynthesizable. In the next section, synthesizable and nonsynthesizable statements are enumerated.

An ANSI C program defines a sequential process, whereas SystemC processes run in parallel. Thus one of the crucial points of transforming ANSI C code into SystemC code is to establish groups of ANSI C functions which should be executed in a single SystemC process. The three basic approaches to this problem are as follows.

1. Treating the whole ANSI C program as a single SystemC process,
2. Treating each ANSI C function as a separate SystemC process,
3. Forming partitions from ANSI C functions and then implementing each partition as a separate SystemC process (a hybrid approach).

(a)
    int a;
    void add() { a++; }
    int main() { add(); return 1; }

(b)
    SC_MODULE(ex1) {
      sc_in<bool> start;
      sc_out<int> output;
      int a;
      void add() { a++; }
      void main() { add(); output=1; }
      SC_CTOR(ex1) {
        SC_METHOD(main); sensitive_pos(start);
      }
    };

(c)
    SC_MODULE(ex1) {
      sc_in<bool> start;
      sc_out<int> output;
      sc_signal<int> a;
      sc_signal<bool> CS;                // control signal that activates add
      void add() { a = a.read() + 1; }   // read the signal, write the incremented value
      void main() { CS=false; CS=true; output=1; }
      SC_CTOR(ex1) {
        SC_METHOD(main); sensitive_pos(start);
        SC_METHOD(add);  sensitive_pos(CS);
      }
    };

(d) [Block diagram; recoverable labels: start, main, output]
(e) [Block diagram; recoverable labels: start, main, CS, output, a, add]

Fig. 1. The ANSI C program (a), corresponding SystemC realizations (b, c) and block diagrams of the processes (d, e) (Example 1)

In approach 1, the whole program is executed serially. This approach is sensible if the ANSI C functions share many variables or if the functions themselves are so simple that the time spent on synchronization would outweigh the benefits of parallelization. The entry point of the ANSI C program (usually the function named main) is then declared as the only process. This approach is straightforward and makes data dependency analysis unnecessary. Consequently, no control buses need to be added for synchronization. Such an approach, however, forfeits all benefits of possible parallelization, as a single process is executed serially: a function that calls another one has to wait until the called function finishes.

(a)
    int a;
    void add() { a++; }
    int main() { add(); return a; }

(b)
    SC_MODULE(ex1) {
      sc_in<bool> start;
      sc_out<int> output;
      int a;
      void add() { a++; }
      void main() { add(); output=a; }
      SC_CTOR(ex1) {
        SC_METHOD(main); sensitive_pos(start);
      }
    };

(c)
    SC_MODULE(ex2) {
      sc_in<bool> start;
      sc_out<int> output;
      sc_signal<int> a;
      sc_signal<bool> CS;    // control signal that activates add
      sc_signal<bool> RDY;   // set by add when it has finished
      void add() { a = a.read() + 1; RDY=true; }
      void main() {
        CS=false; RDY=false; CS=true;
        while(RDY==false);   // wait for add to finish
        output=a;
      }
      SC_CTOR(ex2) {
        SC_METHOD(main); sensitive_pos(start);
        SC_METHOD(add);  sensitive_pos(CS);
      }
    };

(d) [Block diagram; recoverable labels: start, main, output]
(e) [Block diagram; recoverable labels: start, main, CS, output, a, RDY, add]

Fig. 2. The ANSI C program (a), corresponding SystemC realizations (b, c) and block diagrams of the processes (d, e) (Example 2)

Approach 2 is worth considering when functions are loosely coupled, i.e., when there is not much communication between processes.
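Whether functions are tightly or loosely coupled, as discussed above, can be estimated from the variables they share. The following minimal sketch groups functions that (directly or transitively) access a common variable into the same process, using a union-find structure. It is an illustrative assumption, not the partitioning method of the described system: AccessMap and partition_by_shared_variables are hypothetical names, and the access sets are assumed to be supplied by some earlier analysis.

```cpp
#include <map>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: for each C function, the set of global variables
// it reads or writes (assumed to come from a prior analysis step).
using AccessMap = std::map<std::string, std::set<std::string>>;

// Returns, for each function name, the index of the process it is assigned to.
// Functions that directly or transitively share a variable get the same index.
std::map<std::string, int> partition_by_shared_variables(const AccessMap& accesses) {
    std::vector<std::string> names;
    for (const auto& entry : accesses) names.push_back(entry.first);

    // Union-find over function indices; initially every function is alone.
    std::vector<int> parent(names.size());
    std::iota(parent.begin(), parent.end(), 0);
    auto find = [&parent](int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    };

    // The first function seen accessing a variable represents that variable;
    // every later function accessing it is merged into the same group.
    std::map<std::string, int> first_user;
    for (int i = 0; i < static_cast<int>(names.size()); ++i) {
        for (const auto& var : accesses.at(names[i])) {
            auto it = first_user.find(var);
            if (it == first_user.end()) {
                first_user[var] = i;
            } else {
                parent[find(i)] = find(it->second);
            }
        }
    }

    // Number the groups densely in order of first appearance.
    std::map<int, int> group_of_root;
    std::map<std::string, int> result;
    for (int i = 0; i < static_cast<int>(names.size()); ++i) {
        int root = find(i);
        auto inserted = group_of_root.emplace(root, static_cast<int>(group_of_root.size()));
        result[names[i]] = inserted.first->second;
    }
    return result;
}
```

With the access sets of Fig. 2a (both add and main access a), the two functions fall into one group, which corresponds to approach 1. With those of Fig. 1a (only add accesses a), they fall into separate groups, which corresponds to approach 2. A real partitioner would also have to weigh synchronization cost, which this sketch ignores.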
One complication of approach 2 is the need to implement blocking actions before accesses to shared variables. These actions can be implemented as wait statements, which are left when synchronization signals from other modules are set. These signals can, however, complicate the implementation to the point where the resulting realization is unacceptably large.

Approach 3 leads to the best results, but the function partitioning problem is computationally expensive. One data structure that can help to optimize this stage is a dependency graph [5].

Example 1. Let us consider the ANSI C code given in Fig. 1a, where the function add does not share any variables with the function main. Consequently, main does not have to wait until add finishes. As there is no data dependence between main and add, they can be executed in parallel and, consequently, realized in one SystemC module as two processes. As main executes add, a control connection between the corresponding processes is necessary. Fig. 1c and e depict, respectively, the realization in which a positive edge activates the add function and the block diagram of the processes. If we use approach 1, we obtain a single process and gain nothing from the possible parallelization (Fig. 1b and d).

Example 2. Since in the code presented in Fig. 2a both functions main and add share the variable a, the execution of main has to wait until add finishes its execution. To synchronize the execution of the processes, a signal which is set when add finishes is added. Fig. 2c gives the realization in which a positive edge activates the add function and RDY is the synchronization signal. The block diagram of the processes is depicted in Fig. 2e. This case can also be implemented with approach 1, which leads to a single SystemC process, given in Fig. 2b and d.

To benefit more from the transformation, statements inside a function can also be parallelized. The next example shows the parallelization of a for loop.

Example 3. In the code presented in Fig. 3a, there is a for loop with no dependencies among its iterations. It can therefore be split into a few processes that run in parallel. Fig. 3b and c depict the realization and the diagram with three processes (only the synchronization wires are visible).

(a)
    int main() {
      int i;
      char a[100];
      char b[100];
      int n=100;
      for(i=0;i<100;i++) b[i]=a[i];
      return 1;
    }

(b)
    SC_MODULE(ex3) {
      sc_in<bool> start;
      sc_out<int> output;
      sc_signal<int> a[100], b[100];
      sc_signal<bool> CS1, CS2, CS3;      // activate the three loop processes
      sc_signal<bool> RDY1, RDY2, RDY3;   // set when the respective part is done
      void loop_body(int From, int To) {
        int i;
        for(i=From; i<To; i++) b[i]=a[i];
      }
      void main() {
        RDY1=false; RDY2=false; RDY3=false;
        CS1=false; CS1=true;
        CS2=false; CS2=true;
        CS3=false; CS3=true;
        // wait until all three parts have finished
        while(!(RDY1.read() && RDY2.read() && RDY3.read()));
        output=1;
      }
      void start1(){loop_body(0,33); RDY1=true;}
      void start2(){loop_body(33,66); RDY2=true;}
      void start3(){loop_body(66,100); RDY3=true;}
      SC_CTOR(ex3) {
        SC_METHOD(main);   sensitive_pos(start);
        SC_METHOD(start1); sensitive_pos(CS1);
        SC_METHOD(start2); sensitive_pos(CS2);
        SC_METHOD(start3); sensitive_pos(CS3);
      }
    };

(c) [Block diagram; recoverable labels: start, main, output, and three loop processes, each with its own CS and RDY wire]

Fig. 3. The ANSI C program (a), corresponding SystemC realization (b) and block diagram of the processes (c) (Example 3)

4. ANSI C CONSTRUCTS FOR BEHAVIORAL SYSTEMC SYNTHESIS

For the designer synthesizing hardware from C code, the most useful tool would be a synthesizer that accepts the full ANSI C standard described in [3]. This task, however, turns out to be particularly difficult due to such constructs as dynamic memory allocation, function calls, recursion, jumps, type casts, and pointers [1], [4]. In our implementation, we established synthesizable and nonsynthesizable subsets of the ANSI C constructs as given in Table 1 and Table 2, respectively.

    Bool Datatype                                       Operator sizeof
    Struct                                              Integer Datatypes
    Integer, Character, Enumeration Constants           Enumeration Datatype
    Postfix Incrementation (++, --)                     Arrays
    Unary Operators (+,-)                               Casts
    Logical Negation Operator (!)                       One's Complement Operator (~)
    Additive Operators (+,-)                            Multiplicative Operators (*,/,%)
    Relational Operators                                Shift Operators (<<,>>)
    Bitwise AND, XOR, OR Operators                      Equality Operators
    Conditional Operator (?:)                           Logical AND, OR Operators
    Comma Operator (,)                                  Assignment Expressions
    Declarations                                        Constant Expressions
    Storage Class Specifiers (extern, static, typedef)  Init Declarations
    Array Declarators                                   Type Specifier const
    Labeled Statements                                  Function Declarators
    Selection Statements (if, switch)                   Compound Statement (block)
    Jump Statements (goto, continue, break, return)     Iteration Statements (while, do, for)
    File Inclusion (#include)                           Function Definitions
    Function Overloading                                Conditional Compilation
                                                        Operators Overloading

Tab. 1. Synthesizable ANSI C Constructs

    Floating Datatypes            Pointers
    File Datatype                 Void Datatype
    Union Datatype                Global Variables
    Volatile Qualifier            Storage Classes auto, register
    Address Operator (&)          Indirection Operator (*)
    Floating Constants            Pointer Declarations
    Standard Library Functions    Dynamic Memory Allocation
    Recursions                    Operator ->

Tab. 2. Nonsynthesizable ANSI C Constructs

Although the arbitrary control flow caused by jump statements complicates the scheduling of operations, jump statements have been included in the synthesizable subset. Array types can be synthesized as long as each element is of a synthesizable data type. Constructs which have no hardware meaning, such as file operations, are not synthesizable and thus should be avoided.
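The classification of Table 2 can be encoded as a simple lookup that a translator front end might use to report the constructs it has to reject. The sketch below is a hypothetical illustration only: the function name find_nonsynthesizable and the string spellings of the constructs are assumptions, and the way a real front end identifies constructs in source code is outside its scope.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Returns those entries of used_constructs that appear in Table 2
// (nonsynthesizable constructs), in their original order.
// The spellings of the construct names are assumed for this sketch.
std::vector<std::string> find_nonsynthesizable(
        const std::vector<std::string>& used_constructs) {
    static const std::unordered_set<std::string> table2 = {
        "Floating Datatypes", "File Datatype", "Union Datatype",
        "Volatile Qualifier", "Address Operator", "Floating Constants",
        "Standard Library Functions", "Recursions", "Pointers",
        "Void Datatype", "Global Variables", "Storage Classes auto, register",
        "Indirection Operator", "Pointer Declarations",
        "Dynamic Memory Allocation", "Operator ->"};

    std::vector<std::string> rejected;
    for (const auto& construct : used_constructs) {
        if (table2.count(construct) != 0) rejected.push_back(construct);
    }
    return rejected;
}
```

For a program using arrays, pointers, casts and recursion, this lookup would report pointers and recursion, so the designer knows which parts must be rewritten before translation.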
Floating-point types are not synthesizable, because a straightforward implementation results in hardware that requires an enormous amount of resources, beyond the capabilities of contemporary technology. However, the method described in [2], which derives a fixed-point implementation from a floating-point description, is under consideration.

Dynamic memory allocation and recursion are not synthesizable, as the amount of required memory is unknown at the synthesis stage. Therefore, the synthesis of C code involving dynamic memory allocation would require access to an operating system running in software or the generation of hardware allocators [6].

Pointers are especially difficult to synthesize as they have different applications, such as complex memory management operations, referencing data structures, referencing functions, and passing parameters by reference. SpC, an interesting approach to synthesizing pointers and malloc/free statements, is described in [6]. However, dynamic memory allocation still needs much research before it can be synthesized at a satisfactory level, so the synthesis of pointers is not included in the majority of available systems (BACH C, COWARE, OCAPI, Synopsys COCENTRIC, and NEC CYBER).

5. CONCLUSIONS AND FUTURE WORK

In this paper, we have described a system under development for translating code written in ANSI C into behavioral SystemC code. The method of running functions in parallel is described, and the synthesizable and nonsynthesizable ANSI C subsets are given. For the nonsynthesizable constructs, we have presented a short justification of why it is difficult or impossible to synthesize them. In our future work, we are going to develop methods to synthesize ANSI C code with OpenMP pragmas, which define the parallelization of the code.

6. REFERENCES

[1] ‘Describing Synthesizable RTL in SystemC’, Version 1.2, November 2002, Synopsys, www.synopsys.com
[2] H. Keding, M. Willems, M. Coors, H. Meyr, ‘FRIDGE: a fixed-point design and simulation environment’, In Proceedings of the Design, Automation and Test in Europe, Paris, France, 1998, pp. 429-435
[3] B. Kernighan, D. Ritchie, ‘The C Programming Language’, Prentice Hall Software Series, Englewood Cliffs, NJ, 1988
[4] S. Y. Liao, ‘Towards a new standard for system-level design’, In Proceedings of the Eighth International Workshop on Hardware/Software Codesign, San Diego, CA, USA, 2000
[5] G. De Micheli, ‘Synthesis and Optimization of Digital Circuits’, McGraw-Hill, Hightstown, NJ, 1994
[6] L. Semeria, K. Sato, G. De Micheli, ‘Synthesis of hardware models in C with pointers and complex data structures’, IEEE Transactions on Very Large Scale Integration Systems, vol. 9, no. 6, 2001, pp. 743-756
[7] ‘SystemC Version 2.0 User's Guide’, www.systemc.org, 2002