International Journal of Engineering Trends and Technology (IJETT) - Volume 4 Issue 8 - August 2013
ISSN: 2231-5381, http://www.ijettjournal.org, pp. 3247-3250

VLSI Design of a 16-bit RISC Vector Processor for Computing Applications

Akanksha, M.Tech Scholar, ECE Deptt., Lingayas University, Faridabad, Haryana, India
Arvind Pathak, Asstt. Prof., ECE Deptt., Lingayas University, Faridabad, Haryana, India

Abstract: This paper presents the design of a 16-bit RISC processor and the modeling of its components in VHDL. The implementation strategies are borrowed, to a certain extent, from the popular MIPS architecture. The instruction set adopted here is deliberately simple, which gives insight into the kind of hardware needed to execute it correctly. Along with the sequential and combinational building blocks of a non-pipelined processor, such as adders and registers, more complex blocks, namely the ALU and the memories, have been designed and simulated. The tool used throughout the project work is the Xilinx ISE v14.1 design simulator. For synthesis, the targeted FPGA device technology was Virtex 6. The processor is powerful enough to be used as a stand-alone processing element and generic enough to be used within a multi-processor System on Chip. The processor has been designed to minimize its size, i.e. the space it occupies on the FPGA. We have further tested the processor on a scientific computing task and assessed its performance by executing the same task on it and on several other processors and comparing the execution times.

Keywords: RISC processor, Scientific Computing, VLSI, FPGA, Computer Architecture, Real-time computing systems, System

I. INTRODUCTION

In today's era of high speed systems and ubiquitous computing, the need for real-time computing systems is always on the rise. These systems must operate within stringent requirements that often lie at the intersection of the conflict between speed and area. The increasing complexity of signal, image and control processing in modern computing applications requires very high computational power. This power can be achieved with high performance programmable components such as RISC or CISC processors and DSPs, or with non-programmable application-specific chips such as ASICs or FPGA-based hardware [1]. In this paper we present the design of a 16-bit RISC processor for applications in real-time computing systems. The design is powerful enough to be used as a stand-alone processor and generic enough to be used in a multiprocessor SoC environment. The processor is implemented on an FPGA, as against the tradition of implementing it as an ASIC chip.

FPGAs are bit-programmable computing devices which offer ample logic and register resources that can easily be adapted to develop digital hardware for large applications such as DSP and embedded systems [2-4]. There are two reasons for this choice. One is the reconfigurability of the FPGA: we can always change the system parameters of the design to suit the requirements of the target application. The second is the high cost of ASIC chip fabrication. With off-the-shelf FPGAs, one can use the proposed processor in applications without having to wait for a full fabrication cycle. This approach is most beneficial where time-to-market is critical.

The paper is organized as follows. In the next section we review the design of the processor and discuss its basic working. In Section III we briefly discuss the implementation of the processor. Section IV discusses its advantages and disadvantages, followed by Section V, where we conclude the paper with future directions of this work.

II. PROCESSOR DESIGN

The design of our processor is based on certain assumptions.
One assumption is that the design should have few registers, contrary to the common wisdom of having as many registers as possible. This can be justified by two facts: FPGAs have internal memories that are as fast as registers, and under pre-emptive multitasking a small number of registers leads to faster context switching. Applying these ideas early in the design phase leads to better chip design and resource planning for FPGAs.

Another assumption we made before designing the processor is that a RISC-based processor is more suitable for real-time computing systems than a CISC processor. Our justification is that a RISC architecture reduces the complexity of the design and also makes it possible to shorten execution time. With a CISC processor it is also possible to shorten execution time, but doing so is more difficult. Moreover, with the availability of cheap memory devices thanks to VLSI technology, program code size is no longer as much of a limitation as system performance and throughput are. It has also been observed in the mobile computing realm that RISC processors are better tuned for performance than CISC architectures.

With the above assumptions in mind, we designed our 16-bit processor. The basic block diagram of the processor is shown in Fig. 1. The processor has 8 KB of on-chip RAM for faster data access. It has a 16-bit internal counter clocked at 20 MHz, which is useful for measuring short intervals with high precision (at one count per clock cycle, this corresponds to a 50 ns resolution and a wrap-around after about 3.28 ms). The processor is also equipped with a serial transmitter and receiver for use in communication systems.

We have used vector processing in our design. Vector processors are special purpose computers that match a range of (scientific) computing tasks.
These tasks usually consist of large active data sets, often with poor locality, and long run times. In addition, vector processors provide vector instructions. These instructions operate in a pipeline (sequentially on all elements of vector registers), and in current machines.

All the signals have their usual meanings. The data core of the processor is responsible for computing. The memory signals are also extended to the outside of this module in order to connect to an optional external SRAM and I/O. The figure above shows the basic block diagram of the processor; we have modified this design to reduce the area the processor occupies on the FPGA. In our design the decoder and the ALU are combined into a single execution unit. The idea behind this approach is that vector data can be processed and executed simultaneously if decoding and execution are performed together in a pipelined fashion. The resulting processor design is shown in Fig. 2.

III. PROCESSOR IMPLEMENTATION

The entire processor has been designed using VHDL and implemented on a Virtex FPGA from Xilinx [5] [6]. The Virtex user-programmable gate array comprises two major configurable elements, viz. configurable logic blocks (CLBs) and input/output blocks (IOBs). Each CLB is composed of two slices. A slice contains two 4-input, 1-output look-up tables (LUTs) and two registers. Interconnections between these elements are configured by multiplexers controlled by SRAM cells, which are programmed by the user's bitstream. This structure provides a very powerful method of implementing arbitrarily complex digital logic. The hardware has been simulated in a Windows XP environment. The C compiler used to test the applications is GCC obtained from MinGW, a minimalist package of GNU tools for Windows [7]. C code written for sample applications was converted to assembly language by the compiler and then tested on the processor.
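
To make the LUT description concrete, the following is a minimal Python sketch (not part of the authors' VHDL) of how a 4-input, 1-output look-up table can realize any Boolean function of four inputs. The 16-bit configuration word plays the role of the SRAM cells loaded from the bitstream. The names `make_lut4` and `lut4_eval`, and the bit ordering of the configuration word, are hypothetical and chosen only for illustration.

```python
from itertools import product


def make_lut4(func):
    """Build a 16-bit configuration word for a 4-input Boolean function.

    Bit i of the word stores the output for the input combination whose
    binary encoding (a3 a2 a1 a0) equals i. This ordering is an assumption
    made for the sketch, not taken from the paper.
    """
    init = 0
    for a3, a2, a1, a0 in product((0, 1), repeat=4):
        index = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        if func(a0, a1, a2, a3):
            init |= 1 << index
    return init


def lut4_eval(init, a0, a1, a2, a3):
    """Evaluate the LUT: the four inputs simply select one stored bit."""
    index = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
    return (init >> index) & 1


# Example: configure the LUT as a 4-input XOR (parity) function.
parity_init = make_lut4(lambda a0, a1, a2, a3: a0 ^ a1 ^ a2 ^ a3)
print(hex(parity_init))
```

The point of the sketch is that the logic function is pure data: changing the 16-bit word changes the function without changing the hardware, which is what makes the device reconfigurable.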
The RTL view of the processor generated by Xilinx ISE Design Suite is shown in Fig. 3. The instruction set of the processor contains about 45 instructions for 16-bit and occasionally 32-bit operations. The entire processor uses a WISHBONE-compliant bus and operates in MASTER mode, which makes it easier to interface with other cores in a multi-core SoC environment. The general interface diagram of the WISHBONE specification is shown in Fig. 4.

Fig. 4: Wishbone Interface of Master and Slave

MASTER and SLAVE interfaces are interconnected with a set of signals that permit them to exchange data. For descriptive purposes these signals are collectively known as a bus, and are contained within a functional module called the INTERCON. Address, data and other information are impressed upon this bus in the form of bus cycles.

The FPGA device utilization summary is shown below:

Logic utilization              Used    Available    % Utilization
Number of slices               418     768          54%
Number of slice flip-flops     229     1536         14%
Number of 4-input LUTs         789     1536         51%
Number of bonded IOBs          48      124          38%
Number of BRAMs                2       4            50%
Number of GCLKs                1       8            12%

IV. TESTING AND DESIGN CRITICISM

There are several advantages of our design. The first is the use of vector processing to speed up instruction execution. The second is the generic design of the processor at the HDL level, which makes it possible to reconfigure the processor for specific applications. Another advantage is the use of fewer internal registers and more FPGA memories, which reduces the design complexity of the processor while still maintaining adequate usage of FPGA resources. However, despite these advantages, the design has a few drawbacks.
We always have to write an I/O module specific to the particular FPGA board, and there is no provision for a general I/O module applicable across a broad range of FPGA boards. This problem is bound to arise whenever specific bus or hardware specifications must be maintained. It can be solved, however, by designing a generic I/O module usable across a large number of development boards. Another apparent drawback is that the HDL code is written specifically for a target FPGA (Virtex in this case) and hence may not be directly usable on other FPGA platforms. More general code would have been more useful here. However, the present design was concerned more with demonstrating the power of simple vector processing for fast instruction execution, and with demonstrating that FPGA-based processors can be equally useful for real-time operating systems, than with designing just a single processor.

The designed processor was tested by solving a scientific computational task. The task involved solving the set of Lorenz equations, which have been modeled many times before as a classic example of a chaotic system and are also believed to represent certain dynamics relevant to weather forecasting. Initially, the Lorenz equations were written in C and compiled using the GCC compiler. The compiled target was executed on several processors (including ours) for benchmarking, with the following results:

Parameters                   Processor A     Processor B     Processor C     Our processor
Execution time               1 min 20 sec    1 min 22 sec    1 min 20 sec    1 min 21 sec
Support for debugging        yes             yes             no              no (to be included later)
Multi core support           no              no              maybe           yes
Cycle accurate simulation    yes             yes             yes             yes

In the above tests, Processor A was an STMicro 16-bit core, Processor B was a Freescale 16-bit core, and Processor C was a generic 16-bit benchmark core from OpenCores. All of these basic cores were chosen at random and without regard to their on-chip resource usage.

V. CONCLUSION

In this paper we showed the design of a 16-bit RISC processor. The processor can be used within a large class of general purpose computing applications as well as for scientific computing tasks. It can be used either as a stand-alone processing element or as part of a multi-processor SoC. The processor is WISHBONE compliant. The implementation of vector processing in the architecture substantially enhanced its processing capability, with most instructions executed within a single machine cycle as a rule.

We intend to take this work further by designing a generic I/O module applicable to any standard FPGA. Its performance in the context of a real-time system also needs to be thoroughly evaluated, and some of its features extended. Real-time communication systems, intelligent control and biometric systems are some of the designs we want to explore using this processor.

REFERENCES

[1] L. Kaouane et al., A Methodology to Implement Real-time Applications on Reconfigurable Circuits, available at http://www-rocq.inria.fr/syndex
[2] P. Kohlig et al., FPGA Implementation of High Performance FIR Filters, in Proc. International Symposium on Circuits and Systems, 1997.
[3] M. Shand, Flexible Image Acquisition Using Reconfigurable Hardware, in Proc. of the IEEE Workshop on Field Programmable Custom Computing Machines, April 1995.
[4] J. Villasenor, Video Communication Using Rapidly Reconfigurable Hardware, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 12, pp. 565-567, Dec. 1995.
[5] Xilinx Corporation, "Xilinx breaks one million gate barrier with delivery of new Virtex series", October 1998.
[6] Xilinx Corporation, Virtex Data Sheet, 2000.
[7] www.mingw.org
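
As an illustration of the benchmark task described in Section IV, the sketch below integrates the Lorenz equations dx/dt = sigma (y - x), dy/dt = x (rho - z) - y, dz/dt = x y - beta z with a fixed-step fourth-order Runge-Kutta method. It is written in Python rather than the C used for the paper's tests. The parameter values (sigma = 10, rho = 28, beta = 8/3), the initial state, the step size and the choice of integrator are all assumptions, since the paper does not state them; the function names are likewise hypothetical.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state = (x, y, z).

    The default parameters are the commonly used chaotic values and are
    an assumption; the paper does not specify them.
    """
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""

    def shifted(s, k, scale):
        # Return s + scale * k, component by component.
        return tuple(si + scale * ki for si, ki in zip(s, k))

    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state=(1.0, 1.0, 1.0), dt=0.01, steps=1000):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(lorenz, state, dt)
        trajectory.append(state)
    return trajectory


trajectory = integrate()
print(trajectory[-1])
```

A workload of this kind is dominated by repeated multiply-add operations on a small state vector, which is why it is a plausible stress test for arithmetic throughput; timing a fixed number of steps on each target would give a comparison of the sort reported in the results table.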