International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue8- August 2013
VLSI Design of a 16-bit RISC Vector Processor for Computing Applications
Akanksha
M.Tech Scholar, ECE Deptt., Lingaya's University
Faridabad, Haryana, India
Abstract: This paper presents the design of a 16-bit RISC
processor and the modeling of its components in VHDL. The
implementation strategies are borrowed, to a certain extent,
from the popular MIPS architecture. The instruction set
adopted here is deliberately simple, which gives insight
into the kind of hardware needed to execute the
instructions correctly. Along with the sequential and
combinational building blocks of a non-pipelined
processor, such as adders and registers, more complex
blocks such as the ALU and the memories have been
designed and simulated. The tool used throughout the
project was the XILINX ISE v14.1 design simulator. For
synthesis, the targeted FPGA device technology was
VIRTEX 6. The processor is powerful enough to be used as
a stand-alone processing element and generic enough to be
used within a multi-processor System on Chip.
The processor has been designed to minimize its size,
i.e. the area it occupies on the FPGA. We have further
tested the processor on a scientific computing task and
evaluated its performance by executing the task on it and
on several other processors and comparing the execution
times.
Keywords: RISC processor, Scientific Computing,
VLSI, FPGA, Computer Architecture, Real-time
computing systems, System
Arvind Pathak
Asstt. Prof., ECE Deptt., Lingaya's University
Faridabad, Haryana, India
I. INTRODUCTION
In today's era of high-speed systems and ubiquitous
computing, the need for real-time computing systems is
steadily rising. These systems must operate within
stringent requirements that often sit at the intersection
of the conflict between speed and area. The increasing
complexity of signal, image and control processing in
modern computing applications requires very high
computational power. This power can be achieved with
high-performance programmable components such as RISC or
CISC processors and DSPs, and with non-programmable
application-specific chips such as ASICs or FPGA-based
hardware [1]. In this paper we present the design of a
16-bit RISC processor for applications in real-time
computing systems. The design is powerful enough to be
used as a stand-alone processor and generic enough to be
used in a multi-processor SoC environment.
The processor is implemented on an FPGA, as against the
tradition of implementing it as an ASIC chip. FPGAs are
bit-programmable computing devices that offer ample logic
and register resources, which can easily be adapted to
build digital hardware for large applications such as DSP
and embedded systems [2 – 4]. There are two reasons for
this choice. The first is the reconfigurability of FPGAs:
the system parameters of the design can always be changed
to suit the requirements of the target application. The
second is the high cost of ASIC chip fabrication. With
off-the-shelf FPGAs, the proposed processor can be used in
applications without waiting for the full fabrication
cycle. This approach is most beneficial where
time-to-market is critical.
The paper is organized as follows. Section II reviews the
design of the processor and discusses its basic working.
Section III briefly discusses the implementation of the
processor. Section IV discusses its advantages and
disadvantages, and Section V concludes the paper with
future directions of this work.
ISSN: 2231-5381
II. PROCESSOR DESIGN
The design of our processor is based on certain
assumptions. The first is that the design should have few
registers, contrary to the common wisdom of having as
many registers as possible. This can be justified by the
fact that FPGAs have internal memories that are as fast
as registers, and by the fact that, for pre-emptive
multitasking, a small number of registers leads to faster
context switching. Applying these ideas early in the
design phase leads to better chip design and resource
planning for FPGAs. The second assumption is that a
RISC-based processor is more suitable for real-time
computing systems than a CISC processor. Our
justification is that a RISC architecture reduces the
complexity of the design and makes it easier to shorten
the execution time. A CISC processor can also be made
faster, but doing so is more difficult. Moreover, with
the availability of cheap memory devices thanks to VLSI
technology, program code size is no longer as much of a
limitation as system performance and throughput. The
mobile computing realm has also shown that RISC
processing is better tuned for performance than CISC
architectures.
With the above assumptions in mind, we have designed our
16-bit processor. The basic block diagram of the
processor is shown in Fig. 1.
The processor has 8 KB of on-chip RAM for fast data
access. It has a 16-bit internal counter clocked at
20 MHz, which is useful for measuring short intervals
with high precision. The processor is also equipped with
a serial transmitter and receiver for use in
communication systems.
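The timing implied by a 16-bit counter clocked at 20 MHz can be worked out directly. The short Python sketch below is only an illustration of that arithmetic; the function names are hypothetical and it assumes a free-running counter that wraps from 65535 back to 0, which the paper does not state explicitly.

```python
# Timing arithmetic for a 16-bit counter clocked at 20 MHz.
# Assumption: the counter is free-running and wraps modulo 2**16.
COUNTER_BITS = 16
CLOCK_HZ = 20_000_000


def tick_ns(clock_hz=CLOCK_HZ):
    """Duration of one counter tick in nanoseconds."""
    return 1e9 / clock_hz


def wrap_period_ms(bits=COUNTER_BITS, clock_hz=CLOCK_HZ):
    """Time until the counter wraps around, in milliseconds."""
    return (2 ** bits) / clock_hz * 1e3


def elapsed_ticks(start, end, bits=COUNTER_BITS):
    """Ticks between two readings, allowing for at most one wrap-around."""
    return (end - start) % (2 ** bits)


if __name__ == "__main__":
    print(tick_ns())                 # resolution of one tick
    print(wrap_period_ms())          # longest interval measurable without overflow
    print(elapsed_ticks(65530, 10))  # interval that spans a wrap-around
```

Under these assumptions one tick lasts 50 ns and the counter wraps after roughly 3.28 ms, so longer intervals would need overflow handling in software.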
We have used vector processing in our design. Vector
processors are special-purpose computers suited to a
range of (scientific) computing tasks. These tasks
usually involve large active data sets, often with poor
locality, and long run times. In addition, vector
processors provide vector instructions, which operate in
a pipeline, sequentially on all elements of the vector
registers.
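The difference between a scalar and a vector instruction can be sketched in a few lines of Python. This is a behavioural illustration only, not the processor's actual instruction set: the names `scalar_add` and `vector_add` are hypothetical, and 16-bit wrap-around arithmetic is assumed from the 16-bit data path.

```python
# Behavioural sketch of a vector add on 16-bit elements.
# Assumption: results wrap modulo 2**16, as on a 16-bit data path.
MASK16 = 0xFFFF


def scalar_add(a, b):
    """One scalar instruction: add two 16-bit values."""
    return (a + b) & MASK16


def vector_add(va, vb):
    """One vector instruction: the same add applied to every element pair."""
    if len(va) != len(vb):
        raise ValueError("vector registers must have the same length")
    return [scalar_add(x, y) for x, y in zip(va, vb)]


if __name__ == "__main__":
    print(vector_add([1, 2, 0xFFFF], [10, 20, 1]))
```

A scalar machine would issue one add per element, while a vector machine issues a single instruction and streams the elements through the pipeline.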
All the signals have their usual meanings. The data
core of the processor is responsible for computing.
The memory signals are also extended to the outside
of this module in order to connect to an optional
external SRAM and I/O.
The figure above shows the basic block diagram of the
processor. We have modified this design to reduce the
area the processor occupies on the FPGA by combining the
decoder and the ALU into a single execution unit. The
idea behind this approach is that vector data can be
processed and executed simultaneously if decoding and
execution are carried out together in a pipelined way.
Fig. 2 shows the design of the resulting processor.
III. PROCESSOR IMPLEMENTATION
The entire processor has been designed in VHDL and
implemented on a Virtex FPGA from Xilinx [5] [6]. The
Virtex user-programmable gate array comprises two major
configurable elements: configurable logic blocks (CLBs)
and input/output blocks (IOBs). Each CLB is composed of
two slices. Each slice contains two 4-input, 1-output
look-up tables (LUTs) and two registers.
Interconnections between these elements are configured
by multiplexers controlled by SRAM cells, which are
programmed by the user's bitstream. This structure
provides a very powerful means of implementing
arbitrarily complex digital logic.
The hardware has been simulated in a Windows XP
environment. The C compiler used to test the
applications is GCC, obtained from MinGW, a minimalist
package of GNU tools for Windows [7]. C code written
for sample applications was converted to assembly
language by the compiler and then tested on the
processor.
The RTL view of the processor generated by Xilinx ISE
Design Suite is shown in Fig. 3.
The instruction set of the processor contains about 45
instructions for 16-bit and occasionally 32-bit
operations. The entire processor uses a
WISHBONE-compliant bus and operates in MASTER mode,
which makes it easier to interface with other cores in a
multi-core SoC environment. The general interface
diagram of the WISHBONE specification is shown in
Figure 4.
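In a WISHBONE classic bus cycle, the MASTER asserts the CYC and STB signals together with the address and data, and the SLAVE answers with ACK. The Python sketch below models that handshake at a behavioural level only. The class names, the memory size and the zero-wait-state SLAVE are assumptions made for illustration and are not taken from the paper's VHDL.

```python
# Behavioural sketch of a WISHBONE classic single read/write cycle.
# Assumptions: class names, memory size and a SLAVE with no wait states.


class RamSlave:
    """Hypothetical SLAVE: word-addressed 16-bit memory that acknowledges at once."""

    def __init__(self, size_words):
        self.mem = [0] * size_words

    def respond(self, cyc, stb, we, adr, dat_w):
        """Return (ack, dat_r) for the signal values driven by the MASTER."""
        if not (cyc and stb):          # no valid bus cycle in progress
            return False, 0
        if we:                         # write cycle: store the 16-bit word
            self.mem[adr] = dat_w & 0xFFFF
            return True, 0
        return True, self.mem[adr]     # read cycle: return the stored word


class Master:
    """Hypothetical MASTER: asserts CYC and STB and checks for ACK."""

    def __init__(self, slave):
        self.slave = slave

    def write(self, adr, data):
        ack, _ = self.slave.respond(cyc=True, stb=True, we=True, adr=adr, dat_w=data)
        if not ack:
            raise RuntimeError("SLAVE did not acknowledge the write")

    def read(self, adr):
        ack, data = self.slave.respond(cyc=True, stb=True, we=False, adr=adr, dat_w=0)
        if not ack:
            raise RuntimeError("SLAVE did not acknowledge the read")
        return data


if __name__ == "__main__":
    bus_master = Master(RamSlave(size_words=16))
    bus_master.write(adr=3, data=0x1234)
    print(hex(bus_master.read(adr=3)))
```

A real SLAVE may insert wait states by delaying ACK; that case is left out here to keep the sketch short.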
The FPGA device utilization summary is shown below:

Logic utilization            Used   Available   % Utilization
Number of slices              418         768             54%
Number of slice flip-flops    229        1536             14%
Number of 4-input LUTs        789        1536             51%
Number of bonded IOBs          48         124             38%
Number of BRAMs                 2           4             50%
Number of GCLKs                 1           8             12%
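The percentages in the utilization summary follow from the used and available counts. The Python check below reproduces them; the dictionary layout and function name are illustrative only. It also shows that the reported figures match truncation rather than rounding (for example, 229 of 1536 flip-flops is about 14.9%, reported as 14%).

```python
# Recompute the utilization percentages from the used/available counts.
# The figures are copied from the summary; the names are illustrative.
UTILIZATION = {
    "slices": (418, 768),
    "slice flip-flops": (229, 1536),
    "4-input LUTs": (789, 1536),
    "bonded IOBs": (48, 124),
    "BRAMs": (2, 4),
    "GCLKs": (1, 8),
}


def percent_used(used, available):
    """Whole-number percentage, truncated toward zero."""
    return (100 * used) // available


if __name__ == "__main__":
    for resource, (used, available) in UTILIZATION.items():
        print(f"{resource}: {percent_used(used, available)}%")
```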
Fig. 4: Wishbone Interface of Master and Slave
MASTER and SLAVE interfaces are interconnected by a set
of signals that permit them to exchange data. For
descriptive purposes these signals are collectively
known as a bus and are contained within a functional
module called the INTERCON. Address, data and other
information are impressed upon this bus in the form of
bus cycles.
IV. TESTING AND DESIGN CRITICISM
Our design has several advantages. The first is the use
of vector processing to speed up instruction execution.
The second is the generic design of the processor at the
HDL level, which makes it possible to reconfigure the
processor for specific applications. Another advantage
is the use of fewer internal registers and more FPGA
memories, which reduces the design complexity of the
processor while still maintaining adequate usage of FPGA
resources.
Despite these advantages, the design has a few
drawbacks. The I/O module always has to be written for
a particular FPGA board, and there is no provision for
a general I/O module that applies across a broad range
of FPGA boards. This problem is bound to exist whenever
specific bus or hardware specifications must be
maintained. It could be solved by designing a generic
I/O module that can be used across a large number of
development boards.
Another apparent drawback is that the HDL code is
written specifically for a target FPGA (Virtex in this
case) and hence may not be directly usable on other
FPGA platforms. More general code would have been more
useful here. However, the present design
was concerned more with demonstrating the power of
simple vector processing for fast instruction
execution, and with demonstrating that FPGA-based
processors can be equally useful for real-time
operating systems, than with designing just a single
processor.
The designed processor was tested by solving a
scientific computing task: the set of Lorenz equations,
which have been modeled many times as a classic example
of a chaotic system and are also believed to represent
certain dynamics relevant to weather forecasting.
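For reference, the Lorenz system is dx/dt = sigma (y - x), dy/dt = x (rho - z) - y, dz/dt = x y - beta z. The Python sketch below integrates it with a forward Euler step. It is not the benchmark code used in the tests: the classical parameter values (sigma = 10, rho = 28, beta = 8/3), the step size and the choice of integrator are all assumptions, since the paper does not list them.

```python
# Forward Euler integration of the Lorenz equations.
# Assumptions: classical parameters, Euler method and step size;
# none of these are specified in the paper.


def lorenz_derivatives(x, y, z, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system at the point (x, y, z)."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return dx, dy, dz


def euler_step(state, dt):
    """Advance the state (x, y, z) by one forward Euler step of size dt."""
    x, y, z = state
    dx, dy, dz = lorenz_derivatives(x, y, z)
    return x + dt * dx, y + dt * dy, z + dt * dz


def integrate(state, dt, steps):
    """Apply `steps` Euler steps and return the final state."""
    for _ in range(steps):
        state = euler_step(state, dt)
    return state


if __name__ == "__main__":
    print(integrate((1.0, 1.0, 1.0), dt=0.01, steps=1000))
```

Forward Euler is the simplest choice and keeps the per-step work to a handful of multiplications and additions; a higher-order method would be more accurate for the same step size.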
Initially, the Lorenz equations were written in C and
compiled using the GCC compiler. The compiled target
was executed on several processors (including ours) for
benchmarking, and the results are as follows:

Parameters                  Processor A    Processor B    Processor C    Our processor
Execution time              1 min 20 sec   1 min 22 sec   1 min 20 sec   1 min 21 sec
Support for debugging       Yes            Yes            No             No (to be included later)
Multi-core support          No             No             Yes            Yes
Cycle-accurate simulation   Yes            Yes            Maybe          Yes

In the above tests, Processor A was an STMicro 16-bit
core, Processor B was a Freescale 16-bit core, and
Processor C was a generic 16-bit benchmark core by
OpenCores. All of these basic cores were chosen at
random, without regard to their on-chip resource usage.
V. CONCLUSION
In this paper we showed the design of a 16-bit RISC
processor. The processor can be used within a large
class of general-purpose computing applications as well
as for scientific computing tasks. It can be used
either as a stand-alone processing element or as part
of a multi-processor SoC. The processor is WISHBONE
compliant. The implementation of vector processing in
the architecture substantially enhanced its processing
capability, so that most instructions execute within a
single machine cycle as a rule.
We intend to take this work forward by designing a
generic I/O module applicable to any standard FPGA. Its
performance in the context of a real-time system also
needs to be thoroughly evaluated and some of its
features extended. Real-time communication systems,
intelligent control and biometric systems are some of
the designs that we want to explore using this
processor.
REFERENCES
[1] L. Kaouane et al, A Methodology to Implement Real-time
Applications on Reconfigurable Circuits, available at
http://www-rocq.inria.fr/syndex
[2] P. Kohlig et al, FPGA Implementation of high
performance FIR Filters, In Proc. International Symposium
on Circuits and Systems, 1997.
[3] M. Shand, Flexible Image Acquisition using
reconfigurable hardware, In Proc. of the IEEE Workshop
on Field Programmable Custom Computing Machines,
April 1995.
[4] J. Villasenor, Video Communication using rapidly
reconfigurable hardware, IEEE Transactions on Circuits
and Systems for Video Technology, Vol. 5, No. 12, pp. 565
– 567, Dec. 1995.
[5] Xilinx Corporation, Xilinx breaks one million gate
barrier with delivery of new Virtex series, October 1998.
[6] Xilinx Corporation, Virtex Data Sheet, 2000.
[7] www.mingw.org
http://www.ijettjournal.org