16-Bit RISC Processor Design for Convolution Application Anand Nandakumar Shardul

International Journal of Advancements in Research & Technology, Volume 2, Issue 9, September-2013
ISSN 2278-7763
E & TC Department, Acropolis Institute of Research & Technology, RGPV University, Bhopal
shardul.anand@gmail.com
Abstract— In this project, we propose a 16-bit non-pipelined
RISC processor intended for signal processing applications.
The processor consists of five blocks, namely the program counter,
clock control unit, ALU, IDU and registers. Advantageous
architectural modifications have been made to the incrementer
circuit used in the program counter and to the carry select adder
unit of the ALU in the RISC CPU core. Furthermore, a high speed and
low power modified multiplier has been designed and
introduced in the ALU. The RISC processor has been
designed to execute a set of 27 instructions. It is expandable up to
32 instructions, based on the user requirements.
Keywords— Arithmetic and Logical Unit (ALU), Program
Counter (PC), Register file (REG), Instruction Decoder Unit
(IDU) and Clock Control Unit (CCU).
INTRODUCTION
Before the RISC philosophy became prominent,
many computer architects tried to bridge the so-called semantic gap, i.e. to design instruction sets that directly
supported high-level programming constructs such as
procedure calls, loop control, and complex addressing modes,
allowing data structure and array accesses to be combined into
single instructions. Instructions were also typically highly
encoded in order to further enhance the code density. The
compact nature of such instruction sets resulted in
smaller program sizes and fewer (slow) main memory
accesses, which at the time (early 1960s and onwards) resulted
in tremendous savings on the cost of computer memory and
disc storage, as well as faster execution. It also meant
good programming productivity even in assembly language,
as high-level languages such as FORTRAN were not always
available or appropriate.
The trend in the recent past shows RISC processors clearly
outperforming the earlier CISC processor architectures. The
reasons are their simple, flexible and fixed instruction format
and their hardwired control logic, which pave the way for higher
clock speeds by eliminating the need for microprogramming. The
combined advantages of high speed, low power, area efficiency
and operation-specific design possibilities have made the RISC
processor ubiquitous. The main feature of the RISC processor is
its ability to support single cycle operation, meaning that
instructions are fetched from the instruction memory at the
maximum speed of the memory. RISC processors are generally
designed to achieve this by pipelining, in which clock cycles
may be stalled because of a wrong instruction fetch when jump
type instructions are encountered. This reduces the efficiency of
the processor. This paper describes a RISC architecture in
which single cycle operation is obtained without using a
pipelined design, thereby averting such stalls.
The development of CMOS technology provides very
high density and high performance integrated circuits. The
performance provided by existing devices has created a
never-ending demand for ever better performing devices.
This points to the use of a whole RISC processor as a basic
device by the year 2020.
Copyright © 2013 SciResPub.
Incitements and benefits
In the 1970s, analysis of high-level languages indicated some
complex machine language implementations, and it was
determined that new instructions could improve performance.
Some instructions were added that were never intended to be
used in assembly language but fit well with compiled
high-level languages. Compilers were updated to take advantage of
these instructions. The benefits of semantically rich
instructions with compact encodings can be seen in modern
processors as well, particularly in the high performance
segment where caches are a central component (as opposed to
most embedded systems). This is because these fast, but
complex and expensive, memories are inherently limited in
size, making compact code beneficial. Of course, the
fundamental reason they are needed is that main memories
(i.e. dynamic RAM today) remain slow compared to a (high
performance) CPU-core.
The RISC idea
The circuitry that performs the actions defined by the
microcode in many (but not all) CISC processors is, in itself, a
processor which in many ways is reminiscent in structure to
very early CPU designs. In the early 1970s, this gave rise to
ideas to return to simpler processor designs in order to make it
more feasible to cope without (then relatively large and
expensive) ROM tables and/or PLA structures for sequencing
and/or decoding. The first (retroactively) RISC-labelled
processor (IBM 801, IBM's Watson Research Centre,
mid-1970s) was a tightly pipelined simple machine originally
intended to be used as an internal microcode kernel, or engine,
in CISC designs, but also became the processor that
introduced the RISC idea to a somewhat larger public.
Simplicity and regularity also in the visible instruction set
would make it easier to implement overlapping processor
stages (pipelining) at the machine code level (i.e. the level
seen by compilers.) However, pipelining at that level was
already used in some high performance CISC
"supercomputers" in order to reduce the instruction cycle time
(despite the complications of implementing within the limited
component count and wiring complexity feasible at the time).
Internal microcode execution in CISC processors, on the other
hand, could be more or less pipelined depending on the
particular design, and therefore more or less akin to the basic
structure of RISC processors.
Why RISC?
Since the earliest days of computing, various attempts have been
made to increase instruction execution rates by overlapping the
execution of more than one instruction. The most
common ways of overlapping are pre-fetching, pipelining and
superscalar operation.
1) Pre-fetching: The process of fetching the next instruction or
instructions into an instruction queue before the current instruction
is complete is called pre-fetching. The Intel 8086, an early 16-bit
microprocessor, pre-fetches into an on-board
queue up to six bytes following the byte currently being
executed (four bytes on the 8088), thereby making them immediately
available for decoding and execution, without latency.
Detail of Logical Blocks
Fig. 1 shows the block diagram of the 16-bit RISC CPU. The proposed
RISC CPU consists of five blocks, namely, Arithmetic and
Logical Unit (ALU), Program Counter (PC), Register file
(REG), Instruction Decoder Unit (IDU) and Clock Control
Unit (CCU).
Architecture
The proposed RISC CPU is a single cycle, non-pipelined processor
with a uniform 16-bit instruction format. It
has a load/store architecture, in which operations are
performed only on registers and not on memory locations. It
follows the classical von Neumann architecture with just one
common memory bus for both instructions and data. A total of
27 instructions have been designed as a first step in the
development of the processor. The instruction set consists of
Logical, Immediate, Jump, Load, Store and HALT type
instructions. The HALT instruction acts as the border line between
the instruction and data memory. This gives the programmer
who uses this processor core the flexibility to define their
own instruction and data memory within the allotted 64
memory registers. Each of these registers is 16 bits wide.
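The role of HALT as the border between instruction and data memory can be illustrated with a short behavioural sketch in Python. This is not the Verilog of the proposed design: the 5-bit opcode position in bits 15 to 11 and the value chosen for `HALT_OPCODE` are assumptions made only for this example.

```python
MEM_WORDS = 64           # from the text: 64 memory registers, 16 bits each
HALT_OPCODE = 0b11111    # hypothetical opcode value for HALT (assumption)


def opcode(word):
    """Return the top 5 bits of a 16-bit word (assumed opcode field)."""
    return (word >> 11) & 0x1F


def split_memory(memory):
    """Split a unified memory image at the first HALT instruction.

    Everything up to and including HALT is treated as instruction memory,
    everything after it as data memory.
    """
    if len(memory) != MEM_WORDS:
        raise ValueError("memory image must hold exactly 64 words")
    for addr, word in enumerate(memory):
        if opcode(word) == HALT_OPCODE:
            return memory[:addr + 1], memory[addr + 1:]
    raise ValueError("no HALT instruction found")


# Three instructions, then HALT, then 60 data words.
program = [0x0801, 0x1002, 0x1803]
image = program + [HALT_OPCODE << 11] + [5] * 60
code_region, data_region = split_memory(image)
```

With this image the instruction region holds four words and the data region the remaining sixty, so moving the HALT instruction is enough to change how the 64 words are divided.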
1) Program Counter: The Program Counter (PC) is a 16-bit
latch that holds the memory address of the location from which
the next machine language instruction will be fetched by the
processor. The proposed PC is the largest sub-block and is
second only to the control unit in complexity. It controls the flow
of instruction execution and ensures the logical
operation flow of the processor. It performs two operations,
namely incrementing and loading. For most instructions, the
PC is simply incremented in preparation for the following
instruction or the following instruction nibbles. In general,
a conventional adder circuit is used for the
incrementing action. However, this leads to increased hardware
use along with more power dissipation. Hence, this work
strives for a novel, low power incrementer circuit design.
In this design, a 6-bit pointer addresses the
instruction memory. A second 6-bit pointer
points to the data memory and is used only when a
Load/Store instruction is encountered for execution.
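The difference between a full adder and a dedicated incrementer can be shown with a small behavioural model in Python. The sketch below uses a plain half-adder chain, which is a generic textbook incrementer and not the novel circuit proposed in this work; the function names and the `load`/`target` interface are assumptions for illustration.

```python
PC_BITS = 6  # from the text: a 6-bit pointer addresses the instruction memory


def increment(value, bits=PC_BITS):
    """Add 1 with a half-adder chain instead of a full adder.

    Each stage needs only sum = a XOR carry and carry_out = a AND carry,
    because the second operand is the constant 1.
    """
    carry = 1
    result = 0
    for i in range(bits):
        a = (value >> i) & 1
        result |= (a ^ carry) << i
        carry = a & carry
    return result  # the final carry is dropped, so the pointer wraps


def next_pc(pc, load=False, target=0):
    """Model the two PC operations named in the text: increment and load."""
    if load:
        return target & ((1 << PC_BITS) - 1)
    return increment(pc)
```

For example, `next_pc(5)` gives 6, while `next_pc(5, load=True, target=40)` models a jump to address 40. Incrementing 63 wraps back to 0 because the pointer is only 6 bits wide.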
2) Arithmetic and Logic Unit: The arithmetic and logic unit
(ALU) performs arithmetic and logic operations. It also
performs bit operations such as rotate and shift by a
defined number of bit positions. The proposed ALU contains
three sub-modules, viz. arithmetic, logic and shift modules.
The arithmetic unit executes addition
operations and generates the Sign flag and Zero flag according to
the result. In order to reduce the complexity
of the adder circuits used in the arithmetic unit of the RISC
CPU, a very fast and low power carry select adder circuit has
been introduced. The ALU also contains a modified
multiplier, which uses compressor circuits to achieve low
power and improved speed of operation. The multiplier is
designed to execute in a single cycle. Hence, it satisfies the
requirement of the RISC design to execute single cycle
instructions.
2) Pipelining: Pipelining instructions means starting or
issuing an instruction prior to the completion of the currently
executing one. The current generation of machines carries this
to a considerable extent. The PowerPC 601 has 20 separate
pipeline stages in which various portions of various
instructions are executing simultaneously.
3) Superscalar operation: Superscalar operation refers to a
processor that can issue more than one instruction
simultaneously. The PPC 601 has independent integer,
floating-point and branch units, each of which can be
executing an instruction simultaneously. CISC machine
designers incorporated pre-fetching, pipelining and
superscalar operation in their designs. However, with instructions
that were long and complex and operand access that depended on
complex address arithmetic, it was difficult to make efficient
use of these speed-up techniques. Furthermore, complex
instructions and addressing modes hold down the clock speed
compared to simple instructions. RISC machines were
designed to efficiently exploit the caching, pre-fetching,
pipelining and superscalar methods that were invented in the
days of CISC machines.
The shift module is used for executing instructions
such as rotation and shift operations. The shift module is
essential for signal processing applications, which need
division by 2. This is achieved by a single right shift operation.
The logic unit is used to perform logical operations such as
XOR, OR and AND. The data output of each ALU operation is
written back into the corresponding destination register, and
the flags are updated. In order to keep the design simple,
the carry out of the ALU is not taken into
consideration.
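The behaviour of the arithmetic and shift modules described above can be sketched in Python as follows. This is a generic carry select adder model under stated assumptions, not the low power circuit of this work: the 4-bit block size is an assumption, each block's internal adder is modelled with ordinary integer addition, and `halve` treats the operand as unsigned.

```python
WIDTH = 16
BLOCK = 4                      # assumed block size for the carry select adder
MASK = (1 << WIDTH) - 1
BLOCK_MASK = (1 << BLOCK) - 1


def block_add(a, b, carry_in):
    """Add two BLOCK-bit values; return (sum, carry_out)."""
    total = a + b + carry_in
    return total & BLOCK_MASK, total >> BLOCK


def carry_select_add(a, b):
    """16-bit addition in which every block precomputes both carry cases."""
    result, carry = 0, 0
    for shift in range(0, WIDTH, BLOCK):
        a_blk = (a >> shift) & BLOCK_MASK
        b_blk = (b >> shift) & BLOCK_MASK
        sum0, c0 = block_add(a_blk, b_blk, 0)   # result if carry-in is 0
        sum1, c1 = block_add(a_blk, b_blk, 1)   # result if carry-in is 1
        blk_sum, carry = (sum1, c1) if carry else (sum0, c0)
        result |= blk_sum << shift
    return result  # the final carry out is ignored, as stated in the text


def flags(result):
    """Zero and Sign flags derived from a 16-bit result."""
    return {"zero": result == 0, "sign": bool(result >> (WIDTH - 1))}


def halve(value):
    """Division by 2 as a single (logical) right shift of a 16-bit value."""
    return (value & MASK) >> 1
```

Because the carry out is discarded, `carry_select_add(0xFFFF, 1)` returns 0 and sets the Zero flag. A signed operand would need an arithmetic rather than a logical right shift, which this sketch does not model.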
3) Register File: The register file consists of 8 general purpose
registers of 16 bits each. These registers are
used during the execution of arithmetic and data-centric
instructions. The register file is fully visible to the programmer.
Each register can be addressed as both source and destination using
a 3-bit identifier in the range 000 to 111. The load instruction is
used to load values into the registers, and the store instruction is
used to write the values back to memory so that the processed
outputs can be obtained from the processor. The Link register is used to
hold the addresses of the corresponding memory locations.
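A minimal behavioural model of the register file and of the load and store operations is given below in Python. The class and function names are hypothetical and chosen only for this sketch; the Link register is not modelled.

```python
class RegisterFile:
    """Eight 16-bit general purpose registers addressed by a 3-bit identifier."""

    NUM_REGS = 8
    WIDTH = 16

    def __init__(self):
        self._regs = [0] * self.NUM_REGS

    @staticmethod
    def _check(addr):
        # Valid identifiers are 000 to 111.
        if not 0 <= addr <= 0b111:
            raise IndexError("register address must fit in 3 bits")
        return addr

    def read(self, addr):
        return self._regs[self._check(addr)]

    def write(self, addr, value):
        # Values are truncated to the 16-bit register width.
        self._regs[self._check(addr)] = value & ((1 << self.WIDTH) - 1)


def load(regs, memory, reg_addr, mem_addr):
    """Load: copy a memory word into a register."""
    regs.write(reg_addr, memory[mem_addr])


def store(regs, memory, reg_addr, mem_addr):
    """Store: copy a register value back to memory."""
    memory[mem_addr] = regs.read(reg_addr)
```

In a load/store architecture of this kind, all arithmetic and logic instructions read and write only the register file, and memory is touched solely through these two helpers.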
4) Instruction Decoder Unit: Our instruction set is limited yet
comprehensive. Since the opcode field is only 5 bits wide, it was
decided to keep the number of instructions supported within
32 for easier implementation. At present, only 27 instructions
have been implemented. The rest have been reserved for
porting digital processing applications to our processor. The
decoder unit decodes the instruction and gives out the 3-bit
source and destination addresses, depending on
the op-code's operation. It also decides whether the write
back circuit has to be enabled or not.
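The decoding step can be sketched in Python as shown below. The paper does not give the bit layout of the instruction word, so the field positions used here (opcode in bits 15 to 11, destination in bits 10 to 8, sources in bits 7 to 5 and 4 to 2) and the opcode values in `NO_WRITE_BACK` are assumptions made purely for illustration.

```python
from collections import namedtuple

Decoded = namedtuple("Decoded", "opcode dest src1 src2 write_back")

# Hypothetical opcodes that do not write a result register
# (standing in for Store, Jump and HALT type instructions).
NO_WRITE_BACK = {0b11101, 0b11110, 0b11111}


def decode(word):
    """Split a 16-bit instruction word into its assumed fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must be a 16-bit word")
    op = (word >> 11) & 0x1F     # 5-bit opcode: at most 32 instructions
    dest = (word >> 8) & 0x7     # 3-bit destination register address
    src1 = (word >> 5) & 0x7     # 3-bit first source register address
    src2 = (word >> 2) & 0x7     # 3-bit second source register address
    return Decoded(op, dest, src1, src2, op not in NO_WRITE_BACK)


# Opcode 1 with destination R2 and sources R3 and R4.
example = decode((0b00001 << 11) | (0b010 << 8) | (0b011 << 5) | (0b100 << 2))
```

A 5-bit opcode gives 32 possible encodings, which matches the stated limit of 32 instructions, of which 27 are currently used.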
5) Clock Control Unit: The clock control unit (CCU) fetches the
code of all the instructions in the program. It directs the operation
of the other units by providing timing and control signals. All
computer resources are managed by the CCU. It directs the flow
of data between the Central Processing Unit (CPU) and the
other devices. The control unit is the circuitry that controls the
flow of data through the processor and coordinates the
activities of the other units within it. In a way, it is the "brain
within the brain", as it controls what happens inside the
processor, which in turn controls the rest of the computer.
Fig. 1 Block Diagram of the Processor.
A. Figures and Flow chart.
The figures show the block-level representation of the
processor and a diagrammatic view of its operation. As mentioned
earlier, the system is as shown in Fig. 1. All the
registers are assigned a task at once. The registers and the other
required peripheral devices take up the task at the same time
and process it simultaneously.
The flow chart shows the flow of operation within the
processor. Initially, all the registers and the memory are made ready;
then the data is fetched and processed with the help of the
registers and the memory. This also
occurs in the same simultaneous manner. The complete
processor flow can be well understood from the flow chart.
Fig. 2 Flow Chart
Result
The RISC processor described above is designed using
Verilog HDL (mixed language) and is simulated using Xilinx
ISE 14.2. The proper functioning of the processor is validated.
The simulation result shows that the processor is capable of
executing a given instruction in a single clock cycle,
thereby satisfying the basic requirement of a RISC
processor. In order to evaluate the system performance,
synthesis software was used to map the proposed
processor onto a target library.
Conclusion
The design of a single cycle 16-bit non-pipelined RISC
processor for convolution applications
has been presented. Novel adder and multiplier structures
have been employed in the RISC architecture. The processor
has been designed to execute an instruction set comprising
27 instructions in total. It is expandable up to 32
instructions, based on the user requirements. The processor
design shows promise for use in signal processing
applications in general.
Acknowledgment
We place our gratitude on record to the Department of
Electronics and Communication Engineering, Acropolis
Institute of Research & Technology, Indore for the support
rendered to us in carrying out this work.