A REPORT ON DESIGN OF 16 – BIT RISC PROCESSOR Under the guidance of Dr. Chandra Shekhar, Director, CEERI, Pilani. By Raj Kumar Singh Parihar Shivananda Reddy 2002A3PS013 2002A3PS107 BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI – 333031 May 2006 1 A REPORT ON DESIGN OF 16 – BIT RISC PROCESSOR Under the guidance of Dr. Chandra Shekhar, Director, CEERI, Pilani. By Raj Kumar Singh Parihar Shivananda Reddy 2002A3PS013 2002A3PS107 B.E. (Hons) Electrical and Electronics Engineering Towards the partial fulfillment of the course BITS C335, Computer Oriented Project BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI – 333031 May 2006 2 ACKNOWLEDGEMENTS We are very thankful Dr. Chandra Shakhar, Director, CEERI – Pilani for giving us an opportunity to work under her guidance. The love and faith that she showered on us helped us to go on and complete the project successfully. We would like to extend our sincere thank to Dr. Anu Gupta, EEE, BITS - Pilani for their valuable suggestions and guidance, Mr. Pawan Sharma, In-charge OLAB for their support, Mr. Ninad Kothari, TA, BITS - Pilani for Encouragement. Last but not the least we would like to thank our Friends, Parents, Family members and invisible force which provided moral support and spiritual strength, which helped us completing the work successfully. 3 BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI (Rajasthan) – 333031 CERTIFICATE This is to certify that the project entitled “Design of 16 bit RISC Processor” is the bonafide work of Shivananda Reddy (2002A3PS107) done in the second semester of the academic year 2005-2006. He has duly completed his project and has fulfilled all the requirements of the course BITS C335, Computer Laboratory Oriented Project, to my satisfaction. Dr. Chandra Shekhar Director, CEERI, Pilani Date: 4 BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI (Rajasthan) – 333031 CERTIFICATE This is to certify that the project entitled “Design of 16 bit RISC Processor” is the bonafide work of Raj Kumar Singh Parihar (2002A3PS013) done in the second semester of the academic year 2005-2006. He has duly completed his project and has fulfilled all the requirements of the course BITS C335, Computer Laboratory Oriented Project, to my satisfaction. Dr. Chandra Shekhar Director, CEERI, Pilani Date: 5 ABSTRACT This project includes the designing of 16-Bit RISC processor and modeling of its components using Verilog HDL. The implementation strategies have been borrowed from most popular DLX and MIPS architecture up to certain extent. The instruction set adopted here is extremely simple that gives an insight into the kind of hardware which should be able to execute the set of instructions properly. Along with sequential and combinational building blocks of NON- pipelined processor such as adders and registers more complex blocks i.e. ALU and Memories had been designed and simulated. The modeling of ALU which has been done in this project is fully structural starting from half adders. At the end the semi custom layout had been developed for ALU using AMI05 Technology and IC station tool. Complex blocks have been modeled using behavioral approach i.e. Memories, whereas simple blocks i.e. Adders had been done through structural approach. The tools which had been used throughout the project work are Modelsim (Digital Simulation), Leonardo Spectrum (Digital synthesis) and IC station (Digital Semi custom Layout). For synthesis purpose the targeted FPGA device technology was ALTERA, Cyclone, and EP1C6Q240C. A simple sequential block’s performance and figure of merits were observed under the constraints clock frequency: 50 MHz and DRT: 5ns. 6 TABLE OF CONTENTS ACKNOWLEDGEMENT .......................................................................... 3 CERTIFICATE .................................................................Error! Bookmark not defined. ABSTRACT.................................................................................................. 6 1. CEERI: An Introduction ........................................................................ 8 2. RISC : An introduction ........................................................................... 9 3. RISC vs CISC ......................................................................................... 11 3.1 CISC Designs ......................................................................................................... 11 3.2 The bridge toward RISC (Historical factors) ......................................................... 11 3.3 Why RISC? ............................................................................................................ 12 4. RISC: Top level Description and guidelines ........ Error! Bookmark not defined. 4.1 Instruction Set Architecture .....................................Error! Bookmark not defined. 4.2 Microarchitecture ....................................................Error! Bookmark not defined. 5. Design and Simulation: Building Blocks ............................................. 23 5.1 Sequential Element: D- Flip Flop .......................................................................... 23 5.2 Combinational Element: N-Bit add-subtract Module ............................................ 27 6. ASIC Design Flow: Design of a 16-Bit ALU ....................................... 33 7. Implementation: 16-Bit Non-pipelined RISC Processor ................... 44 Appendix-I II III .......................................................................................... 63 Bibliography ................................................................................................ 82 7 1. CEERI: An Introduction Central Electronics engineering Research Institute, popularly known as CEERI, is a constituent establishment of the Council of Scientific and Industrial Research (CSIR), New Delhi. The foundation stone of the institute was laid on September 21, 1953 by the then Prime Minister of India, Pt. Jawaharlal Nehru. The actual R&D work started toward the end of 1958. The institute has blossomed into a center for excellence for the development of technology and for advanced research in electronics. Over the years the institute has developed a number of products and processes and has established facilities to meet the emerging needs of electronics industry. CEERI, Pilani, since its inception, has been working for the growth of electronics in the country and has established the required infrastructure and intellectual man power for undertaking R&D in the following three major areas: 1) Semiconductor Devices 2) Microwave Tubes 3) Electronics Systems The activities of microwave tubes and semiconductor devices areas are done at Pilani whereas the activities of electronic systems area are undertaken at Pilani as well as the two other centers at Delhi and Chennai. The institute has excellent computing facilities with many Pentium computers and SUN/DEC workstations interlined with internet and email facilities via VSAT. The institute has well maintained library with an outstanding collection of books and current journals (and periodicals) published all over the world. CEERI, with its over 700 highly skilled and dedicated staff members, well-equipped laboratories and supporting infrastructure is ever enthusiastic to take up the challenging tasks of research and development in various areas. 8 2. RISC: An Introduction The reduced instruction set computer, or RISC, is a microprocessor CPU design philosophy that favors a smaller and simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM's PowerPC. The idea was inspired by the discovery that many of the features that were included in traditional CPU designs to facilitate coding were being ignored by the programs that were running on them. Also these more complex features took several processor cycles to be performed. Additionally, the performance gap between the processor and main memory was increasing. This led to a number of techniques to streamline processing within the CPU, while at the same time attempting to reduce the total number of memory accesses. When the controller design become more complex in CISC and the performance was also not up to expectations, people started looking on some other alternatives. It had been found that when a processor talks to the memory the speed gets killed. So the one improvement on CPI was to keep the instruction set very simple. Simple in not the way it works but the way it looks. That’s why we have very few instructions in any typical RISC architecture where processor asks data from memory probably not other than Load and Store. We avoid keeping such addressing modes. The complexity of controller design has been overcome with the help of operands and Opcode bits fixed in instruction register. At the end the pipelining added a new dimension in the speed just with the help of some additional registers. Now what pipeline does is it increases throughput by reducing CPI. The instruction can be executed effectively in one clock cycle. The pipelining in any kind of architecture took birth from the inherent parallelism and the idle states of components. The pipelined architecture could be further enhanced with the concepts known as super-scaling. There we provide more than one execution unit. The time when one unit is 9 busy with the current execution task, the fetch unit can probably fetch he next instruction which would be executed with the help of some other execution unit present in system. Features which are generally found in RISC designs are: uniform instruction encoding (for example the op-code is always in the same bit position in each instruction, which is always one word long), which allows faster decoding; A homogeneous register set, allowing any register to be used in any context and simplifying compiler design. simple addressing modes (complex addressing modes are replaced by sequences of simple arithmetic instructions); Few data types supported in hardware (for example, some CISC machines had instructions for dealing with byte strings. Others had support for polynomials and complex numbers. Such instructions are unlikely to be found on a RISC machine). Over many years, RISC instruction sets have tended to grow in size. Thus, some have started using the term "load-store" to describe RISC processors, since this is the key element of all such designs. Instead of the CPU itself handling many addressing modes, load-store architecture uses a separate unit dedicated to handling very simple forms of load and store operations. CISC processors are then termed "register-memory" or "memory-memory". Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in use. The RISC design technique offers power in even small sizes, and thus has come to completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are by far the largest market for processors. RISC had also completely taken over the market for larger workstations for much of the 90s. After the release of the Sun SPARCstation the other vendors rushed to compete with RISC based solutions of their own. Even the mainframe world is now completely RISC based. 10 3. RISC vs CISC 3.1 CISC Designs An overriding characteristic of CISC machines is an approach to instruction set architecture that emphasizes doing more with each instruction. As a result, CISC machines have a wide variety of addressing modes. CISC machines take a “have it your way” approach to the location and number of operands in various instructions. As a result instructions are of widely varying length and execution times. 3.2 The bridge toward RISC (Historical factors) The capabilities of CISC allowed more operations to be performed into the same program size. During that period, program and data storage were given more importance since cost of memory was high. An attempt was made to narrow the semantic gap, that is, the gap that existed between machine instruction sets and high level language constructs with complicated instructions and addressing modes to obtain performance increase. Most of these “improvements” were rejected by compiler writers on the context that they did not fit well with the language requirements and were of only limited usefulness. At the same time, research conducted by David Patterson and Donald Knuth showed that 85% of a program’s statements were assignments, conditional or procedure calls. Nearly 80% of the assignment statements were MOVE instructions with no arithmetic operations. As more and more capabilities were added to the processors, it was found increasingly difficult to support higher clock speeds that would otherwise have been possible. Complex instructions and addressing modes worked against higher clock speeds, because of the greater number of microscopic actions that had to be performed per instruction. Moreover, RAM prices dropped sufficiently so that the pressure on system designers was less to design instructions that did more that it was to design systems that were faster. It was also becoming cost-effective to employ small amounts of higher-speed cache memory to reduce memory latency i.e. the writing time between when a memory is made and when it has been satisfied. 11 3.3 Why RISC? Various attempts have been made to increase the instruction execution rates by overlapping the execution of more than one instruction since the earliest day of computing. The most common ways of overlapping are pre-fetching, pipelining and superscalar operation. 1) Pre-fetching: The process of fetching next instruction or instructions into an event queue before the current instruction is complete is called pre-fetching. The earliest 16-bit microprocessor, the Intel 8086/8, pre-fetches into a non-board queue up to six bytes following the byte currently being executed thereby making them immediately available for decoding and execution, without latency. 2) Pipelining: Pipelining instructions means starting or issuing an instruction prior to the completion of the currently executing one. The current generation of machines carries this to a considerable extent. The PowerPC 601 has 20 separate pipeline stages in which various portions of various instructions are executing simultaneously. 3) Superscalar operation: Superscalar operation refers to a processor that can issue more than one instruction simultaneously. The PPC 601 has independent integer, floating-point and branch units, each of which can be executing an instruction simultaneously. CISC machine designers incorporated pre-fetching, pipelining and superscalar operation in their designs but with instructions that were long and complex and operand access depending on complex address arithmetic, it was difficult to make efficient use of these new speed-up techniques. Furthermore, complex instructions and addressing modes hold down clock speed compared to simple instructions. RISC machines were designed to efficiently exploit the caching, pre-fetching, pipelining and superscalar methods that were invented in the days of CISC machines. 12 4. RISC: Top level Description and guidelines We implemented a 16-bit RISC microprocessor based on a simplified version of the MIPS architecture. The processor has 16-bit instruction words and 16 general purpose registers. Every instruction is completed in four cycles. An external clock is used as the timing mechanism for the control and datapath units. This section includes a summary of the main features of the processor, a description of the pins, a high level diagram of the external interface of the chip, and the instruction word formats. 16 instructions in the instruction set architecture. 16 general purpose registers. Instruction completion in 4 clock cycles External Clock is used. 14 external address lines. Mem[15:0] Input/Output 16-bit memory bus. Mem_addr_s1[13:0] Output 14-bit address lines to address the main memory. Mem_wrt_s1 Output Asserted high when the chip writes data to Mem[15:0]. Otherwise, the chip treats Mem[15:0] as inputs. Clock Input Clock Reset_s1 Input Resets the chip when asserted high. The chip will switch to the idle state, and fetch from the main memory starting at address 0. Vdd Input Power. GND Input Ground. 13 Fig.4 High Level Block Diagram that describes the external interface of the chip 4.1 Instruction Set Architecture (ISA) The ISA of this processor consists of 16 instructions with a 4-bit fixed size operation code. The instruction words are 16-bits long. The following chart describes the instruction formats. Operation Opcode 12 Destination Reg Source Reg 11 7 10 15 14 13 ADD 0 0 0 0 Rd Rs Rt SUB 0 0 0 1 Rd Rs Rt AND 0 0 1 0 Rd Rs Rt OR 0 0 1 1 Rd Rs Rt 14 9 8 6 5 Target Reg 4 3 2 1 0 XOR 0 1 0 0 Rd Rs NOT 0 1 0 1 Rd Rs SLA 0 1 1 0 Rd Rs SRA 0 1 1 1 Rd Rs LI 1 0 0 0 Rd Immediate LW 1 0 0 1 Rd Rs SW 1 0 1 0 BIZ 1 0 1 1 Rs Offset BNZ 1 1 0 0 Rs Offset JAL 1 1 0 1 Rd Offset JMP 1 1 1 0 Offset JR 1 1 1 1 Rs Rs 15 Rt Rt The Processor features five instruction classes: 1. Arithmetic (Two’s Complement) ALU operation (2) ADD: Rd = Rs + Rt Operands A and B stored in register locations Rs and Rt are added and written to the destination register specified by Rd. SUB: Rd = Rs - Rt Operand B (Rt) is subtracted from Operand A (Rs) and written to Rd. 2. Logical ALU operation (6) AND: Rd = Rs & Rt Operand A (Rs) is bitwise anded with Operand B (Rt) and written into Rd. OR: Rd = Rs | Rt Operand A (Rs) is bitwise ored with Operand B (Rt) and written into Rd. XOR: Rd = Rs ^ Rt Operand A (Rs) is bitwise Xored with Operand B (Rt) and written into Rd. NOT: Rd = ~Rs Operand A (Rs) is bitwise inverted and written into Rd. SLA: Rd = Rs << 1 Operand A (Rs) is arithmetically shifted to the left by one bit and written into Rd. SRA: Rd = Rs >> 1 Operand A (Rs) is arithmetically shifted to the right by one bit and written into Rd. The MSB (sign bit) will be preserved for this operation. 3. Memory operations (3) LI: Rd = 8-bit Sign extended Immediate The 8-bit immediate in the Instruction word is sign-extended to 16-bits and written into the register specified by Rd. LW: Rd = Mem[Rs] 16 The memory word specified by the address in register Rs is loaded into register Rd. SW: Mem[Rs] = Rt The data in register Rt is stored into the memory location specified by Rs. 4. Conditional Branch operations (2) BIZ: PC = PC + 1 + Offset if Rs = 0 If all the bits in register Rs are zero than the current Program Count (PC + 1) is offset to PC + 1 + Offset. The count is offset from PC + 1 because it is incremented and stored during the Fetch cycle. BNZ: PC = PC + 1 + Offset if Rs! = 0 If all the bits in register Rs are not zero than the current Program Count (PC + 1) is offset to PC + 1 + Offset. 5. Program Count Jump operations (3) JAL: Rd = PC + 1 and PC = PC + 1 + Offset Jump and Link instruction would write current Program Count in register Rd and offset the program count to PC + 1 + Offset JMP: PC = PC + 1 + Offset Unconditional jump instruction will offset the program count to PC + 1 + Offset. JR: PC = Rs Jump Return instruction will set the Program Count to the one previously stored in JAL. FETCH INSTRUCTION Part 1 Retrieve instruction word from main memory Increment Program Counter and store in ALU Out Part 2 Write Incremented Program Count Load Operands into latches from Register File 17 EXECUTE INSTRUCTION Part 1 Perform ALU Operation based instruction word and store in ALU Out Move Memory Word into MDR for Load Word operation Write Data into Memory from Register File for Store Word operation Part 2 Write ALU, IR (Immediate), or MDR data into Register File Write new Program Count for Jump Operation or it Branch taken 4.2 MICRO-ARCHITECTURE The micro-architecture refers to a view of the machine that exposes the registers, buses and all other important functional units such as ALUs and counters. The principle subsystems of a processor are the CPU, main memory and the input/output. The data path and the control unit interact to do the actual processing task. The control unit receives signals from the data path and sends control signals to the data oath. These signal s control the data flow within the CPU and between the CPU and the main memory and Input/Output. Program Counter Fig.4.2.1 Program Counter 18 Instruction Register and Register File Fig.4.2.2 Instruction Register and RegFile 19 ALU and Operand Registers Fig.4.2.3 ALU and Operand Registers Control Unit Design The Control FSM has only three distinct states that determine the operation of the processor: IDLE, FETCH and EXECUTE. Here fetch and Execute is further divided into two states, Fetch instruction state and Fetch operands state. Similarly Execute state also divided into two parts. When the reset signal (reset_s1) goes high from any state, the FSM will be placed in the IDLE state. While in the IDLE state the control unit will send the PC write enable signal (pc_wrt_s2 = 1) and select zero (pc_sel_s2 = 0) as the current Program count. Fig.4.2.4 Control unit and Control signals 20 When the reset signal goes low, the FSM’s next state will be the FETCH state and the instruction from Memory address 0 will be loaded into the Instruction Register (IR) to begin program execution. The control looks at the next state = FETCH and generates the IR write (ir_wrt_s1), Operand A Select (opA_sel_s1), Operand B Select (opB_sel_s1 = 0010) and the ALU add operation (alu_op_s1 = 00000001) to load the IR with the next instruction and increment the PC by 1. These events all occur on the first clock of the FETCH state. One-hot signals are used for alu_op_s1, opB_sel_s1, and data_sel_s2 to make for easier decoding in the datapath units. The operation at the next phase of FETCH will be determined by the opcode (opcode_s2) from the IR, except for the incremented PC that is written in from the ALU ouput latch in all cases. The ALU Operations will load in Operands A and B from the Register File. The Load word will only need Operand A, while the Store word will need both operands (one for the address and one for the data word). The Branch instructions will use the offset in its instruction word and PC + 1 count as operands into the ALU. The JAL stores the incremented PC in the Register File, while the JR loads the return address into Operand A. After phase two of the FETCH state, the FSM enters the EXECUTE state. During the first phase for an ALU operation, the appropriate alu_op_s1 control signals are sent to the ALU as decoded from the opcode. The operand mux (opA_sel_s1 & opB_sel_s1) control signals are also generated to select the latch outputs. For the other operations (except LI), an add operation is required from the ALU. The operands chosen for the add are determined by the operation specified. The Load and Store words will access Memory on this first phase as well. The second phase of EXECUTE writes data into the register file or writes a new address into the PC. For the branch instruction, the control will look at the check zero signal from operand A to determine if the branch should be taken and the new PC should be written. The control returns the next state to FETCH to repeat the process for the next instruction. (All the source codes are included in appendix section) 21 5. Design and Simulation: Building Blocks 5.1 Sequential Element: D- Flip Flop Behavioral model of a positive edge-triggered D-flip flop with asynchronous set and reset Code: //d flip flops : Behavioral model module dff_bhv(dout,din,clk,reset,set); input reset,set,clk,din; output dout; reg dout; always @(posedge clk or negedge reset or negedge set) begin if (~reset) dout <= 1'b0; else if(~set) dout <= 1'b1; else dout <= din; end endmodule Simulation: (Waveform Analyzer window) Synthesis: (Typical Value) Clock Frequency: 50 MHz (only for Sequential Circuits) Data Required Time: 5ns (Optimization Report) 22 Slack is a High positive value for DRT = 5ns new DRT = 3ns (Delay Report) (RTL Schematic) 23 (Report Area) 24 (Critical Path Schematic) (Technology Schematic) Conclusion: Data required time is = 3ns For this value of DRT the slack is = 0.53 Logic cells used = 2 Maximum path delay = 3ns Input ports = 4 output ports = 1 25 5.2 Combinational Element: N-Bit add-subtract Module Structural model of a N-Bit adder sub-tractor module with overflow detection techniques using 1-Bit full adder Code: PART- A (1-Bit full adder) //************************************** // full adder: using half adders module full_adder(cout,sum,ain,bin,cin); input ain,bin,cin; output cout,sum; wire w1,w2,w3; half_adder ha1(w1,w2,ain,bin), ha2(w3,sum,w2,cin); or or1(cout,w1,w3); endmodule // half adder: structural module half_adder(carry,sum,ain,bin); input ain,bin; output sum,carry; xor xor1(sum,ain,bin); and and1(carry,ain,bin); endmodule Simulation: (Workspace: Design Hierarchy) (Simulation window) 26 Synthesis: Data Required Time: 5ns (Optimization Report) (Area Report) 27 (Delay Report) (RTL Schematic) (Critical Path Schematic) Conclusion: Logic cells used = 2 Maximum path delay = 4ns Input ports = 3 output ports = 2 28 PART - B // 8-bit adder and sub: using full adders and xor module add_sub_8b(o_flag,cout,sum,ain,bin,ctrl); input [7:0]ain; input [7:0]bin; input ctrl; output cout,o_flag; output [7:0]sum; wire [7:0] w; wire [6:0] c; xor x0(w[0],bin[0],ctrl), x1(w[1],bin[1],ctrl), x2(w[2],bin[2],ctrl), x3(w[3],bin[3],ctrl), x4(w[4],bin[4],ctrl), x5(w[5],bin[5],ctrl), x6(w[6],bin[6],ctrl), x7(w[7],bin[7],ctrl); full_adder fa1(c[0],sum[0],ain[0],w[0],ctrl), fa2(c[1],sum[1],ain[1],w[1],c[0]), fa3(c[2],sum[2],ain[2],w[2],c[1]), fa4(c[3],sum[3],ain[3],w[3],c[2]), fa5(c[4],sum[4],ain[4],w[4],c[3]), fa6(c[5],sum[5],ain[5],w[5],c[4]), fa7(c[6],sum[6],ain[6],w[6],c[5]), fa8(cout,sum[7],ain[7],w[7],c[6]); xor x(o_flag,c[6],cout); endmodule Simulation: (Waveform Analyzer window) Synthesis: Data Required Time: 10ns for the above value of DRT the slack is negative. New DRT = 12ns 29 (Optimization Report) (Delay Report) Conclusion: New data required time is = 12ns for this value of DRT the slack is = 0.39 Logic cells used = 17 Maximum path delay = 12ns Input ports = 17 output ports = 10 30 (RTL Schematic) 31 6. ASIC Design Flow: Design of a 16-Bit ALU Objective: Design a 16-Bit Arithmetic Logic unit (ALU) which can perform arithmetic and bit-wise logical operations. Parts are as following 1.Arithmetic unit A-Logic B-Logic 16-Bit Adder 1-Bit Full Adder 2.Logic Operation unit 3.output MUX 4.Status Register Operation Summary: Mode (M) 0 0 0 0 1 1 1 1 1 1 1 1 Sel [1] 0 0 1 1 0 0 0 0 1 1 1 1 Sel [0] 0 1 0 1 0 0 1 1 0 0 1 1 Co x x x x 0 1 0 1 0 1 0 1 Function A AndB A Or B A Xor B A Xnor B A A+ 1 A+ B A+ B+ 1 A+ B' A+ B'+1 A'+ B A'+ B+ 1 Operation AND OR XOR XNOR Pass A Inc A ADD ADD SUB (1’s C) SUB (2’s C) SUB (1’s C) SUB (2’s C) Code: //Code Starts from here //alu version -2 fully str..and fully optimized <str> : (\/) module alu_16bit(c,v,z,n,result,parity,m,sel,cout,ain,bin,c0); input [15:0] ain,bin; input m,c0; input [1:0] sel; output [15:0] result; wire [15:0] a_in,b_in,ari_out,res_out; output parity,cout; output z,n,c,v; //reg [3:0] status; mux_2by1 m1(m,result,res_out,ari_out); 32 b_logic b1(sel,b_in,bin); a_logic a1 (sel,a_in,ain); adder_16 add1(parity,cout,ari_out,a_in,b_in,c0); logic_unit l1(res_out,sel,ain,bin); //s_logic slogic1(status[0],status[1],status[2],status[3],result,cout,o_flow,m); assign z = ~(result[0] & result[1] & result[2] & result[3] & result[4] & result[5] & result[6] & result[7] & result[8] & result[9] & result[10] & result[11] & result[12] & result[13] & result[14] & result[15]); assign n = result [15]; assign v = cout ^ result[15]; assign c = cout; endmodule //module logical unit : <str> ()\/ module logic_unit(res_out,sel,ain,bin); input [15:0] ain,bin; input [1:0]sel; output [15:0] res_out; wire [15:0] and_out,or_out,xor_out,xnor_out; logic_low_unit llu0(and_out[0],or_out[0],xor_out[0],xnor_out[0],ain[0],bin[0]), llu1(and_out[1],or_out[1],xor_out[1],xnor_out[1],ain[1],bin[1]), llu2(and_out[2],or_out[2],xor_out[2],xnor_out[2],ain[2],bin[2]), llu3(and_out[3],or_out[3],xor_out[3],xnor_out[3],ain[3],bin[3]), llu4(and_out[4],or_out[4],xor_out[4],xnor_out[4],ain[4],bin[4]), llu5(and_out[5],or_out[5],xor_out[5],xnor_out[5],ain[5],bin[5]), llu6(and_out[6],or_out[6],xor_out[6],xnor_out[6],ain[6],bin[6]), llu7(and_out[7],or_out[7],xor_out[7],xnor_out[7],ain[7],bin[7]), llu8(and_out[8],or_out[8],xor_out[8],xnor_out[8],ain[8],bin[8]), llu9(and_out[9],or_out[9],xor_out[9],xnor_out[9],ain[9],bin[9]), llu10(and_out[10],or_out[10],xor_out[10],xnor_out[10],ain[10],bin[10]), llu11(and_out[11],or_out[11],xor_out[11],xnor_out[11],ain[11],bin[11]), llu12(and_out[12],or_out[12],xor_out[12],xnor_out[12],ain[12],bin[12]), llu13(and_out[13],or_out[13],xor_out[13],xnor_out[13],ain[13],bin[13]), llu14(and_out[14],or_out[14],xor_out[14],xnor_out[14],ain[14],bin[14]), llu15(and_out[15],or_out[15],xor_out[15],xnor_out[15],ain[15],bin[15]); mux_4by1 mux41(res_out,sel,and_out,or_out,xor_out,xnor_out); endmodule 33 // 4 by 1 mux : <str> (\/) module mux_4by1(out,sel,i0,i1,i2,i3); input [15:0] i0,i1,i2,i3; output [15:0] out; input [1:0] sel; reg [15:0] out; always @ (i0 or i1 or i2 or i3 or sel) case (sel) 2'b00: out=i0; 2'b01: out=i1; 2'b10: out=i2; 2'b11: out=i3; endcase endmodule //logical unit low : <str> (\/) module logic_low_unit (and_out,or_out,xor_out,xnor_out,ain,bin); input ain,bin; output and_out,or_out,xor_out,xnor_out; wire w1,w2; //wire and_out,xnor_out; nand aand1(w1,ain,bin); not nnot1(and_out,w1); nor oor1(w2,ain,bin); not nnot11(or_out,w2); or oor2(xnor_out,and_out,w2); not nnot2(xor_out,xnor_out); endmodule //mux 2 : <behv> (\/) module mux_2by1(sel,out,ain,bin); input [15:0] ain,bin; input sel; output [15:0] out; reg [15:0] out; always @ (ain or bin or sel) case(sel) 1'b0: out=ain; 1'b1: out=bin; endcase endmodule //logic b : <str> (\/) module b_logic (sel,bout,bin); input [15:0] bin; input [1:0] sel; output [15:0] bout; b_logic_low bl0(sel,bout[0],bin[0]), bl1(sel,bout[1],bin[1]), bl2(sel,bout[2],bin[2]), bl3(sel,bout[3],bin[3]), bl4(sel,bout[4],bin[4]), bl5(sel,bout[5],bin[5]), 34 bl6(sel,bout[6],bin[6]), bl7(sel,bout[7],bin[7]), bl8(sel,bout[8],bin[8]), bl9(sel,bout[9],bin[9]), bl10(sel,bout[10],bin[10]), bl11(sel,bout[11],bin[11]), bl12(sel,bout[12],bin[12]), bl13(sel,bout[13],bin[13]), bl14(sel,bout[14],bin[14]), bl15(sel,bout[15],bin[15]); endmodule //module b logic low : <str> (\/) module b_logic_low (sel,bout,bin); input bin; input [1:0] sel; output bout; wire w1; and an1 (w1,bin,sel[0]); not notn1 (bbar, bin), notn2 (s0_bar, sel[0]); and anda1 (w2, bbar,s0_bar,sel[1]); or orn1 (bout, w1,w2); endmodule //a logic : <str> (\/) module a_logic(sel,aout,ain); input [15:0] ain; input [1:0] sel; output [15:0] aout; wire w; and nan1 (w,sel[0],sel[1]); xor xorx0 (aout[0],ain[0],w), xorx1 (aout[1],ain[1],w), xorx2 (aout[2],ain[2],w), xorx3 (aout[3],ain[3],w), xorx4 (aout[4],ain[4],w), xorx5 (aout[5],ain[5],w), xorx6 (aout[6],ain[6],w), xorx7 (aout[7],ain[7],w), xorx8 (aout[8],ain[8],w), xorx9 (aout[9],ain[9],w), xorx10 (aout[10],ain[10],w), xorx11 (aout[11],ain[11],w), xorx12 (aout[12],ain[12],w), xorx13 (aout[13],ain[13],w), xorx14 (aout[14],ain[14],w), xorx15 (aout[15],ain[15],w); endmodule //16 bit adder : <str> (\/) module adder_16 (cp,cout,sum,ain,bin,cin); input [15:0] ain,bin; input cin; output [15:0] sum; output cp,cout; wire [14:0] c; 35 full_adder fulla0(c[0],sum[0],ain[0],bin[0],cin), fulla1(c[1],sum[1],ain[1],bin[1],c[0]), fulla2(c[2],sum[2],ain[2],bin[2],c[1]), fulla3(c[3],sum[3],ain[3],bin[3],c[2]), fulla4(c[4],sum[4],ain[4],bin[4],c[3]), fulla5(c[5],sum[5],ain[5],bin[5],c[4]), fulla6(c[6],sum[6],ain[6],bin[6],c[5]), fulla7(c[7],sum[7],ain[7],bin[7],c[6]), fulla8(c[8],sum[8],ain[8],bin[8],c[7]), fulla9(c[9],sum[9],ain[9],bin[9],c[8]), fulla10(c[10],sum[10],ain[10],bin[10],c[9]), fulla11(c[11],sum[11],ain[11],bin[11],c[10]), fulla12(c[12],sum[12],ain[12],bin[12],c[11]), fulla13(c[13],sum[13],ain[13],bin[13],c[12]), fulla14(c[14],sum[14],ain[14],bin[14],c[13]), fulla15(cout,sum[15],ain[15],bin[15],c[14]); assign cp = (sum[0]^sum[1]^sum[2]^sum[3]^sum[4]^sum[5]^sum[6]^sum[7]^sum[8]^sum[9]^ sum[10]^sum[11]^sum[12]^sum[13]^sum[14]^sum[15]); endmodule // full adder: <str> (\/) module full_adder(cout,sum,ain,bin,cin); input ain,bin,cin; output cout,sum; wire w1,w2,w3; half_adder ha1(w1,w2,ain,bin), ha2(w3,sum,w2,cin); or or1(cout,w1,w3); endmodule //half adder <str> (\/) module half_adder(carry,sum,ain,bin); input ain,bin; output sum,carry; xor xor11(sum,ain,bin); and and11(carry,ain,bin); endmodule Simulation: (Workspace Window: Hierarchy in Design) (Simulation Window) 36 Logical Operations: Arithmetic Operation: 37 Synthesis: (Typical Value) Clock Frequency: 50 MHz (only for Sequential Circuits) Data Required Time: 20ns (Optimization Report) (Critical Path Report) 38 (RTL Schematic) 39 Semi -Custom Layout: (ALU: Floor Planner) 40 (ALU: Layout after Routing) (DRC - Run : Result) 41 (PEX: SPICE NETLIST) * File: alu.pex.netlist * Created: Thu Apr 20 21:38:03 2006 * Program "Calibre xRC" * Version "v9.3_5.11" * .subckt ALU AIN[10] AIN[12] AIN[11] SEL[0] BIN[10] BIN[11] BIN[12] AIN[13] + BIN[13] AIN[14] RESULT[13] RESULT[11] RESULT[10] RESULT[12] RESULT[8] RESULT[9] + BIN[14] BIN[9] RESULT[14] N PARITY C AIN[9] V AIN[15] AIN[8] BIN[15] BIN[8] M + SEL[1] Z BIN[7] AIN[1] AIN[7] BIN[1] AIN[6] RESULT[7] BIN[0] RESULT[6] AIN[0] + RESULT[5] RESULT[4] BIN[6] C0 RESULT[1] AIN[5] BIN[5] AIN[2] RESULT[0] BIN[2] + RESULT[2] RESULT[3] AIN[3] AIN[4] BIN[4] BIN[3] VDD GND * mM0 VDD 269 Z VDD p L=6e-07 W=2.7e-06 AD=9.72e-12 AS=3.555e-12 mM1 270 104 VDD VDD p L=6e-07 W=5.4e-06 AD=4.86e-12 AS=9.72e-12 mM2 271 94 270 VDD p L=6e-07 W=5.4e-06 AD=4.86e-12 AS=4.86e-12 mM3 272 153 271 VDD p L=6e-07 W=5.4e-06 AD=4.86e-12 AS=4.86e-12 mM4 269 155 272 VDD p L=6e-07 W=5.4e-06 AD=8.235e-12 AS=4.86e-12 mM5 GND 269 Z GND n L=6e-07 W=1.5e-06 AD=2.475e-12 AS=2.475e-12 mM6 GND 104 269 GND n L=6e-07 W=1.5e-06 AD=2.7e-12 AS=2.475e-12 mM7 269 94 GND GND n L=6e-07 W=1.5e-06 AD=2.7e-12 AS=2.7e-12 .................................................................................................... ................................................................................................... ................................................................................................... .................................................................................................... ................................................................................................... ................................................................................................... c_7195 BIN[4] 0 18.4363f c_7203 256 0 12.162f c_7218 257 0 22.2254f c_7232 258 0 12.4096f c_7247 259 0 14.4328f c_7264 260 0 19.6958f c_7276 261 0 14.4787f c_7286 262 0 17.8047f c_7299 263 0 17.7997f c_7321 264 0 30.395f c_7336 BIN[3] 0 19.9919f c_7667 VDD 0 2.1844p c_7998 GND 0 1.68888p * .include alu.pex.netlist.ALU.pxi .ends 42 7. Implementation: 16-Bit Non-pipelined RISC Processor Register Set: General Purpose Register: (16 - 16 Bit each) R0 to R15: (R0 always Contains ZERO) 1. Program Counter: (16-bit) PC 2. Instruction Fetch Register: (20 - Bit )IR 3. Status Register: (8-Bit) SR STATUS Register: (8-Bit) X X X C O E P Z Z < -- Zero Flag ('1' if Result is ZERO else '0') P < -- Parity Flag ('1' if Result is ODD Parity else '0') E< -- Even Flag ('1' if Result is EVEN else '0') O < -- Overflow Flag ('1' if Overflow occurs else '0') C < -- Carry Flag ('1' if Carry out Occurs else '0') X < -- NOT used (Reserved for Future Extension) Instruction Set Format: 1. R - Format REG <-- [REG], [REG]: Arithmetic and Logical operations OPCODE IR[19:16] DST IR[15:12] SRC1 IR[11:8] SRC2 IR[7:4] ALU_TASK IR[3:0] Size: 4B 4B 4B 4B 4B Instructions: Opcode 0h Function(Men) ADD 43 ALU_task 0h 0h 0h 0h 0h 0h 0h 0h 1h 2h 3h 4h 5h 6h 7h SUB AND OR XOR SHL SHR ROT 1. I -Format REG <-- [REG], [IMME_VAL] : Arithmetic and Logical operations OPCODE IR[19:16] DST IR[15:12] SRC IR[11:8] IMM_VAL IR[7:0] Size: 4B 4B 4B 8B Instructions: Opcode 1h 2h 3h 4h 5h 6h 7h 8h Function(Men) ADDi SUBi ANDi ORi XORi SHLi SHRi ROTi ALU_task x x x x x x x x 9h 10h LW(i) SW(i) x x BRz x Fh J- Format PC <- Address: Opcode IR[19:16] Address IR[15:0] Size: 4B 16B Instructions: JMP [Address] Source Files’ List: 44 Code: //RISC (Top level: Datapath and Ctrl path) module risc_top (clk,reset); input clk,reset; wire [15:0]AOUT; wire [3:0]opcode,alu_task,alu_fun; wire cout,aluout_ctrl,npc_ctrl,ir_ctrl,A_ctrl,B_ctrl,lmd_ctrl,imm_ctrl, sel_mxa, sel_mxb, sel_mxout,write,memwr,memrd,cond_ctrl,wr_data_ins,rd_data_ins; wire [19:0] wr_data; control_unit controlpath( clk,reset,AOUT,cout,opcode,alu_fun,aluout_ctrl,npc_ctrl,ir_ctrl,A_ctrl,B_ctrl,lmd_ctrl,imm_ctrl, sel_mxa, sel_mxb, sel_mxout,write,memwr,memrd,alu_task,cond_ctrl,wr_data_ins,rd_data_ins); risc_nonpipe datapath( AOUT,cout,opcode,alu_fun,aluout_ctrl,clk,reset,npc_ctrl,ir_ctrl,A_ctrl,B_ctrl,lmd_ctrl,imm_ctrl, sel_mxa, sel_mxb, sel_mxout,write,memwr,memrd,alu_task,cond_ctrl,wr_data,wr_data_ins,rd_data_ins); endmodule //Datapath 45 module risc_nonpipe ( AOUT,cout,opcode,alu_fun,aluout_ctrl,clk,reset,npc_ctrl,ir_ctrl,A_ctrl,B_ctrl,lmd_ctrl,imm_ctrl, sel_mxa, sel_mxb, sel_mxout,write,memwr,memrd,alu_task,cond_ctrl,wr_data,wr_data_ins,rd_data_ins); input aluout_ctrl,clk,reset,npc_ctrl,ir_ctrl,A_ctrl,B_ctrl,lmd_ctrl,imm_ctrl, sel_mxa, sel_mxb, sel_mxout,write,memwr,memrd,cond_ctrl,wr_data_ins,rd_data_ins; input [3:0]alu_task; output cout; output [3:0] opcode,alu_fun; output [15:0] AOUT; output [19:0] wr_data; wire [15:0] pc_out,pc_in,npc_in,npc_out,a_out,a_in,b_out,b_in,lmd_out,lmd_in, imm_out,imm_in,aluout,aluout_out,alu_src1,alu_src2,reg_wr_in,xtn_out,aluout_in; wire [19:0] ins_out; wire [3:0] dst,src1,src2; wire sel_cond,cond_in; pc pcreg(clk,reset,pc_out,pc_in); adder16 pc_adder(npc_in,pc_out,16'h1); //NPC latch16 NPC(npc_ctrl,npc_out,npc_in); //IR IR ir_reg(ir_ctrl,opcode,dst,src1,src2,alu_fun,ins_out); //A latch16 A(A_ctrl,a_out,a_in); //B latch16 B(B_ctrl,b_out,b_in); //LMD latch16 LMD(lmd_ctrl,lmd_out,lmd_in); //IMM latch16 IMM(imm_ctrl,imm_out,imm_in); //COND //cond COND1(cond_ctrl,sel_cond,cond_in); alu16 alu(cout,aluout_in,alu_src1,alu_src2,alu_task); //ALU_OUT latch16 ALU_OUT(aluout_ctrl,aluout_out,aluout_in); data_mem datamem(lmd_in,aluout_out,b_out,memwr,memrd); //ins_mem insmem(ins_out,pc_out); ins_mem insmem(ins_out,pc_out,wr_data,wr_data_ins,rd_data_ins); mux2x1_16b muxa(sel_mxa,alu_src1,npc_out,a_out), muxb(sel_mxb,alu_src2,b_out,imm_out), muxnpc_alu(cond_ctrl,pc_in,npc_out,aluout_out), muxout(sel_mxout,reg_wr_in,lmd_out,aluout_out); 46 sign_xtnd xtender(src2,alu_fun, imm_in); //reg_file reg_Ro_R15(write,a_in,b_in,src1,src2,dst,reg_wr_in); reg_file_2p reg_Ro_R15(clk,write,a_in,b_in,src1,src2,dst,reg_wr_in); //brz_check iszero(a_out,cond_in); assign AOUT = a_out; endmodule //Datapath: Combinational Computational Components //1. Sign Extension unit module sign_xtnd (xtn_in1,xtn_in2, xtn_out); input [3:0] xtn_in1,xtn_in2; output [15:0] xtn_out; reg [15:0] xtn_out; always @ (xtn_in1 or xtn_in2) begin if(xtn_in1[3]) begin xtn_out[3:0] = xtn_in2; xtn_out[7:4] = xtn_in1; xtn_out[15:8] = 8'b11111111; end else begin xtn_out[3:0] = xtn_in2; xtn_out[7:4] = xtn_in1; xtn_out[15:8] = 8'b00000000; end end endmodule //Adder - 16 Bit module adder16 (addout,ain,bin); input [15:0] ain,bin; output [15:0] addout; assign addout = ain+bin; endmodule //Mux 2 by 1 : 16 bit wide module mux2x1_16b (sel,mxout,mxin1,mxin2); input [15:0] mxin1,mxin2; input sel; output [15:0] mxout; reg [15:0] mxout; always @ (mxin1 or mxin2 or sel) case(sel) 1'b0: mxout=mxin1; 1'b1: mxout=mxin2; endcase endmodule 47 //Alu - 16 Bit module alu16(cout,aluout,ain,bin,alu_func); input [15:0] ain,bin; input [3:0] alu_func; output [15:0] aluout; output cout; reg [15:0] aluout; reg cout; always @ (ain or bin or alu_func) begin case(alu_func) 4'b0000: {cout,aluout} = ain+bin; 4'b0001: {cout,aluout} = ain-bin; 4'b0010: aluout = ain | bin; 4'b0011: aluout = ain & bin; 4'b0100: aluout = ain ^ bin; 4'b0101: aluout = ain << bin; 4'b0110: aluout = ain >> bin; //4'b0000: aluout = ain bin; endcase end //assign cout = aluout[16]; endmodule // Datapath : Storage Elements //GP Register bank //two read port and 1 write port module reg_file(write,rdata1,rdata2,rdreg1,rdreg2,wrreg,wr_data); input [3:0] rdreg1,rdreg2,wrreg; input [15:0] wr_data; input write; output [15:0] rdata1,rdata2; reg [15:0] rdata1,rdata2; reg [15:0] reg_bank [0:15]; //Regs are R_0 to R_15 always @ (write or rdreg1 or rdreg2 or wrreg or wr_data) begin rdata1 = reg_bank[rdreg1]; rdata2 = reg_bank[rdreg2]; if(write) reg_bank[wrreg] = wr_data; end endmodule //Regfile two pahse module reg_file_2p(clk,write,rdata1,rdata2,rdreg1,rdreg2,wrreg,wr_data); input [3:0] rdreg1,rdreg2,wrreg; input [15:0] wr_data; 48 input write,clk; output [15:0] rdata1,rdata2; reg [15:0] rdata1,rdata2; reg [15:0] reg_bank [0:15]; //Regs are R_0 to R_15 always @ (clk or write or rdreg1 or rdreg2 or wrreg or wr_data) begin case(clk) 1'b1: begin rdata1 = reg_bank[rdreg1]; rdata2 = reg_bank[rdreg2]; end 1'b0:begin if(write) reg_bank[wrreg] = wr_data; end endcase end endmodule //Registers and latches used in datapath module pc (clk,reset,pcout,pcin); input [15:0] pcin; input clk,reset; output [15:0] pcout; reg [15:0] pcout; always @ (posedge clk or negedge reset) begin if (~reset) pcout = 16'b0; else pcout = pcin; end endmodule //will be used for //NPC, A, B, IMM, Alu_out, LMD module latch16 (ctrl,lth_out,lth_in); input [15:0] lth_in; input ctrl; output [15:0] lth_out; reg [15:0] lth_out; always @ (ctrl or lth_in) begin if(ctrl) lth_out=lth_in; end endmodule //IR Latch 49 module IR (ctrl,opcode,dst,src1,src2,alu_fun,ir_in); input [19:0] ir_in; input ctrl; output [3:0]opcode,dst,src1,src2,alu_fun; reg [3:0]opcode,dst,src1,src2,alu_fun; //output [19:0] ir_out; //reg [19:0] ir_out; always @ (ctrl or ir_in) begin if(ctrl) begin opcode = ir_in[19:16]; dst = ir_in[15:12]; src1 = ir_in[11:8]; src2 = ir_in[7:4]; alu_fun = ir_in[3:0]; end end endmodule //CondLatch module cond (ctrl,cond_out,cond_in); input cond_in; input ctrl; output cond_out; reg cond_out; always @ (ctrl or cond_in) begin if(ctrl) cond_out=cond_in; end endmodule //Instruction memery module ins_mem(rd_data,address,wr_data,memwr,memrd); input [15:0] address; input [19:0] wr_data; input memwr,memrd; output [19:0]rd_data; reg [19:0]rd_data; reg [19:0] mem_data[0:31]; always @ (address or wr_data or memwr or memrd) begin if(memwr) mem_data[address] = wr_data; if(memrd) rd_data = mem_data[address]; else rd_data = 20'bzzzzzzzzzzzzzzzz; end endmodule //Data memory 50 module data_mem(rd_data,address,wr_data,memwr,memrd); input [15:0] address, wr_data; input memwr,memrd; output [15:0]rd_data; reg [15:0]rd_data; reg [15:0] mem_data [0:31]; always @ (address or wr_data or memwr or memrd) begin if(memwr) mem_data[address] = wr_data; if(memrd) rd_data = mem_data[address]; else rd_data = 16'bzzzzzzzzzzzzzzzz; end endmodule Simulation: (Workspace) 51 (Signal Window) 52 (Register Bank): R0 to R15 [16-Bit each] (Instruction memory): 20- Bit wide locations Instructions written in memory are: ADD R3, R2,R1 ADD R3, R2,R1 xxxxx SUB R7, R2,R1 xxxxx xxxxx ADD R7, R2,R1 53 (Simulation Window) Synthesis: Without optimization: Implementation Report RISC - TOP LEVEL (Most Top level) 54 DATA - PATH (Page: 1) (Page: 2) 55 Basic Building Blocks of DATAPATH Work Lib: Component List 1. Control Unit 56 2. Program Counter 3. 16-Bit Adder 4. 16-Bit Latch 5. Instruction Register 6. 16- Bit ALU 57 7. Data Memory 8. Instruction Memory 9. Mux 2 x 1 10.Sign Extension Unit 11. Register File 58 12. RISC-Data Path 13. RISC MOST TOP Level 59 ALU with MUX-ed Inputs Control Path: Synthesis Clock Frequency: 50MHz DRT: 7ns (Optimization Report) 60 (Delay Report) (Area Report) 61 Critical path Schematic 62 Appendix - I DATA Path Schematic 63 Appendix – II Controller State Diagram 64 Appendix – III Verilog Codes - RTL code for RISC design with a micro-coded control unit //************************************************************ // Top level ModuleRISC.v module risc(clk,reset); input clk,reset; wire [3:0]opcode; wire [2:0] alu_sel; wire [1:0] opb_sel,data_sel; wire [15:0] outA; wire carry,pc_sel,pc_wrt,addr_sel,ir_wrt,rega_sel,reg_wrt,opa_sel,re,we; control_unit control_path(opcode,outA,carry,reset,pc_sel,pc_wrt,addr_sel, ir_wrt,data_sel,rega_sel, reg_wrt,opb_sel,opa_sel, alu_sel,re,we,clk); datapath data_path( irout,outA,carry,pc_sel,pc_wrt,addr_sel, ir_wrt,data_sel,rega_sel, reg_wrt,opb_sel,opa_sel, alu_sel,re,we,clk,reset); endmodule //************************************************************ //Control Unit Design module control_unit(opcode,outA,carry,reset,pc_sel,pc_wrt,addr_sel, ir_wrt,data_sel,rega_sel, reg_wrt,opb_sel,opa_sel, alu_sel,re,we,clk); output pc_sel,pc_wrt,addr_sel,ir_wrt,rega_sel,reg_wrt,opa_sel,re,we; output [1:0] data_sel,opb_sel; output [2:0] alu_sel; input [3:0] opcode; input [15:0] outA; input carry, reset, clk; reg pc_sel,pc_wrt,addr_sel,ir_wrt,rega_sel,reg_wrt,opa_sel,re,we; reg [1:0] data_sel,opb_sel; reg [2:0] alu_sel; wire [3:0] opcode; wire [15:0] outA; wire [2:0] alu_task; reg [2:0] pstate,nstate; wire zero; 65 assign alu_task = opcode; parameter start=3'b100,s0=3'b000,s1=3'b001,s2=3'b010,s3=3'b011; assign zero = ((outA == 16'b0)& ((opcode == 4'b1011) | (opcode == 4'b1100)))?1:0; always @(posedge clk) begin if(~reset)pstate=start; else pstate = nstate; end always@(pstate or posedge clk) case (pstate) start: nstate=s0; s0: nstate=s1; s1: nstate=s2; s2: nstate=s3; s3: nstate=s0; endcase always @(posedge clk) case(pstate) //***************************************************** start: begin pc_sel=0; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b0; opb_sel=2'b0; alu_sel=3'b0; //**************************************************** s0: begin // Retrieve Instruction Word from Main Memory //Increment Program Counter and Store in ALU Output pc_sel=0; pc_wrt=0; addr_sel=0; ir_wrt=1; rega_sel=0; reg_wrt=0; opa_sel=1; 66 end re=1; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=3'b000; end //***************************************************** s1: begin //write incremented Program Count //Load Operands into Latches from Register File case(opcode) 4'b0xxx: begin pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=alu_task; end 4'b1000: begin// Load Immediate word pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=3'b000; end 4'b1001: begin// Load Word Operation pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=3'b000; end 4'b1010: begin//Store Word Operation Keep 11:8 0000 pc_sel=1; 67 pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=3'b000; end 4'b1011: begin//Branch If zero pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=000; 4'b1100: begin//Branch if Not zero pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=000; 4'b1101: begin//Jump and link pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=1; opa_sel=0; re=0; we=0; data_sel=2'b10; opb_sel=2'b00; alu_sel=000; 4'b1110: begin//Simple Jump 68 end end end pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b10; opb_sel=2'b00; alu_sel=000; end 4'b1111: begin//Jump Return-- PC = Rs pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b10; opb_sel=2'b00; alu_sel=000; end endcase end //***************************************************** s2: begin //Perform ALU Operation based instruction word and store in ALU Out //Move Memory Word into MDR for Load Word operation //Write Data into Memory from Register File for Store Word operation case(opcode) 4'b0xxx: begin pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=alu_task; end 4'b1000: begin // Load Immediate word operation pc_sel=1; pc_wrt=0; addr_sel=1; ir_wrt=0; 69 rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=1; data_sel=2'b01; opb_sel=2'b00; alu_sel=3'b000; end 4'b1001: begin// Load Word Operation pc_sel=1; pc_wrt=0; addr_sel=1; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=1; re=0; we=0; data_sel=2'b01; opb_sel=2'b00; alu_sel=3'b000; end 4'b1010: begin//Store Word Operation Keep 11:8 0000 pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b01; opb_sel=2'b00; alu_sel=3'b000; end 4'b1011: begin//Branch If Zero pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=zero; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; end 4'b1100: begin//Branch If not Zero pc_sel=1; pc_wrt=0; addr_sel=0; 70 ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=~zero; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1101: begin//Jump and Link pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=1; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1110: begin// Simple Jump pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=1; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1111: begin// Jump Return pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=000; endcase end //***************************************************** 71 end end end end s3: begin //Write ALU, IR (Immediate), or MDR data into Register File //Write new Program Count for Jump Operation or it Branch taken case(opcode) 4'b0xxx: begin pc_sel=1; pc_wrt=0; addr_sel=1; ir_wrt=0; rega_sel=0; reg_wrt=1; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=alu_task; end 4'b1000: begin // Load Immediate word pc_sel=1; pc_wrt=0; addr_sel=1; ir_wrt=0; rega_sel=1; reg_wrt=1; opa_sel=0; re=1; we=0; data_sel=2'b01; opb_sel=2'b11; alu_sel=3'b000; end 4'b1001: begin// Load Word Operation pc_sel=1; pc_wrt=0; addr_sel=1; ir_wrt=0; rega_sel=1; reg_wrt=1; opa_sel=1; re=1; we=0; data_sel=2'b01; opb_sel=2'b11; alu_sel=3'b000; end 4'b1010: begin//Store Word Operation Keep 11:8 0000 pc_sel=1; pc_wrt=0; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; 72 re=0; we=0; data_sel=2'b01; opb_sel=2'b00; alu_sel=3'b000; 4'b1011: begin//Branch If Zero pc_sel=1; pc_wrt=zero; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=zero; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1100: begin//Branch if not zero pc_sel=1; pc_wrt=~zero; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=zero; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1101: begin//Jump and Link pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; opa_sel=1; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1110: begin//Simple Jump pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=1; reg_wrt=0; 73 end end end end opa_sel=1; re=0; we=0; data_sel=2'b00; opb_sel=2'b11; alu_sel=000; 4'b1111: begin//Jump Return pc_sel=1; pc_wrt=1; addr_sel=0; ir_wrt=0; rega_sel=0; reg_wrt=0; opa_sel=0; re=0; we=0; data_sel=2'b00; opb_sel=2'b00; alu_sel=000; end end endcase end endcase endmodule //************************************************************ //Data path Design module datapath( irout,outA,carry,pc_sel,pc_wrt,addr_sel, ir_wrt,data_sel,rega_sel, reg_wrt,opb_sel,opa_sel, alu_sel,re,we,clk,rst); input pc_sel,pc_wrt,addr_sel,ir_wrt, rega_sel,reg_wrt,opa_sel,re,we,rst; input [1:0] data_sel,opb_sel; input [2:0] alu_sel; input clk; output [3:0] irout; output [15:0] outA; output carry; wire pc_sel,pc_wrt,addr_sel,ir_wrt, rega_sel,reg_wrt,opa_sel; wire [1:0] data_sel,opb_sel; wire [2:0] alu_sel; wire [3:0] instr15_12,instr11_8,instr7_4,instr3_0, rega; wire [15:0] outA,alu_out,alu_rslt,pcin,pcout,address,data_in,data_out,sign_ext8,datain, offsetdata,memout,opa,Ra,adata,Rb,bdata,opb; wire [15:0]zero,one; 74 wire clk; assign zero = 16'b0; assign one = 16'b1; assign irout = instr15_12; assign outA = adata; //Modules are instantiated mux2_to_1 pcmux(zero,alu_rslt,pc_sel,pcin); pc programcounter(pcin,pcout,pc_wrt,rst,clk); mux2_to_1 mux_memory(pcout,alu_out,addr_sel,address); ram memory(clk,address,data_in,data_out,re,we);// modified in_data,out_data replaces Memdata instr_reg IR(data_out,ir_wrt, clk,instr15_12, instr11_8, instr7_4, instr3_0); mux4b_2_to_1 instrmux(instr11_8,instr7_4,rega_sel,rega); regfile16 regfile(rega,Ra,instr3_0,Rb,instr11_8,datain,reg_wrt,address,data_in,clk); signextender signext(instr7_4,instr3_0,sign_ext8); mdr memory_data_reg(data_out,memout,clk); mux4_to_1 opbmux(sign_ext8,memout,alu_rslt,zero,data_sel,datain); opa registera(Ra,adata,clk); opb registerb(Rb,bdata,clk); offset offsetpart(sign_ext8,offsetdata,clk); mux2_to_1 regamux(adata,pcout,opa_sel,opa); mux4_to_1 regbmux(bdata,zero,one,offsetdata,opb_sel,opb); alu16b alu(opa,opb,alu_sel,alu_out,carry); aluout finalout(alu_out,alu_rslt,clk); endmodule //************************************************************ //ALU Behavioural module alu16b(PORT1, PORT2, ALUCON, ALUOUT, carry); input [15:0]PORT1,PORT2; input [2:0] ALUCON; output carry; output [15:0] ALUOUT; reg [15:0] ALUOUT; reg carry,temp; always @(ALUCON or PORT1 or PORT2) begin case (ALUCON[2:0]) 75 3'b000 : begin {temp,ALUOUT} = (PORT1 + PORT2); //do add //generate appropriate carry for the purpose of slt especially if(PORT1[15]==PORT2[15]) carry=temp; else if(PORT1[15]==1) carry=1; else carry=0; end 3'b001 : begin {temp,ALUOUT} = (PORT1 - PORT2); // do subtract //generate appropriate carry for the purpose of slt especially if(PORT1[15]==PORT2[15]) carry=temp; else if(PORT1[15]==1) carry=1; else carry=0; end 3'b010 : begin ALUOUT = PORT1 & PORT2;temp=0;carry=0; end // do and operation 3'b011 : begin ALUOUT = PORT1 | PORT2;temp=0;carry=0; end // do or operation 3'b100 : begin ALUOUT = PORT1 ^ PORT2;temp=0;carry=0; end // do xor operation 3'b101 : begin ALUOUT = ~PORT1 ;temp=0;carry=0; end // do not operation 3'b110 : begin ALUOUT = PORT1 << 1;temp=0;carry=0; end // do SLA operation 3'b111 : begin ALUOUT = PORT1 >> 1;temp=0;carry=0; end // do SRA operation endcase end endmodule //************************************************************ //Register File which contains 16 Register of 16 bits module regfile16(add_Rs, out_Rs, add_Rt, out_Rt, add_Rd, data_wr, regwr,addr_sw,data_sw, clk) ; input [3:0] add_Rs, add_Rt, add_Rd; input regwr, clk; input [15:0] data_wr; output [15:0] out_Rs, out_Rt,addr_sw,data_sw; reg[15:0] Register[15:0]; reg [15:0] out_Rs, out_Rt; wire [15:0] addr_sw,data_sw; assign addr_sw = out_Rs; 76 assign data_sw = out_Rt; always @ (posedge clk) if(regwr) begin if (add_Rd == 0) Register[add_Rd] = 0;//if it is zero ,the output must be zero only else Register[add_Rd] = data_wr; $display("At this posedge Register %d of Regfile is written %h",add_Rd,data_wr); end always @ (add_Rs or add_Rt) begin if (add_Rs == 0) out_Rs = 0;// if it is zero register,then give the content as zero only else out_Rs = Register[add_Rs]; if (add_Rt == 0) out_Rt = 0;// if it is zero register,then give the content as zero only else out_Rt = Register[add_Rt]; end endmodule //************************************************************ module ram (clk, address, data_in, data_out, re, we); //parameter DATA_WIDTH = 16 ; //parameter ADDR_WIDTH = 16 ; //parameter RAM_DEPTH = 1 << ADDR_WIDTH; //--------------Input Ports----------------------input clk ; input [15:0] address, data_in ; input re ; input we ; //--------------Inout Ports----------------------output [15:0] data_out ; //--------------Internal variables---------------reg [15:0] data ; reg [15:0] mem [15:0];//65535 //--------------Code Starts Here-----------------assign data_out = (re)?data:16'bz; 77 // Memory Write Block // Write Operation : When we = 1 always @ (posedge clk) begin : MEM_WRITE if ( we ) begin mem[address] = data_in; end end // Memory Read Block // Read Operation : When re = 1 always @ (address or re ) begin : MEM_READ if (re) begin data = mem[address]; end end endmodule //************************************************************ module instr_reg(instr_data,irwr, clk,instr15_12, instr11_8, instr7_4, instr3_0); input [15:0] instr_data; input irwr,clk; output [3:0] instr15_12, instr11_8, instr7_4, instr3_0; reg [3:0] instr15_12, instr11_8, instr7_4, instr3_0; always @(posedge clk) begin if(irwr) begin $display(" Instruction register=memory[pc]=%h",instr_data); instr15_12=instr_data[15:12]; instr11_8=instr_data[11:8]; instr7_4=instr_data[7:4]; instr3_0=instr_data[3:0]; end end endmodule At this posedge //************************************************************ module pc(indata,outdata,pc_wrt_s2,rst,clk); input [15:0]indata; input clk,pc_wrt_s2,rst; output [15:0] outdata; reg [15:0] outdata; always @ (posedge clk or negedge rst) 78 if(~rst)outdata=0; else begin if(pc_wrt_s2) begin outdata=indata; $display("At this posedge contents of pc are %h",outdata); end end endmodule //************************************************************ module signextender(instr7_4,instr3_0,sign_ext8); input [3:0] instr7_4,instr3_0; output [15:0] sign_ext8; reg [15:0] sign_ext8; wire [7:0] A,B; wire [7:0] in; assign in={instr7_4,instr3_0}; assign A=8'b11111111; assign B=8'b00000000; always @ (in) begin if (in[7]==0)//use the msb to determine extension bits sign_ext8={B,in}; else sign_ext8={A,in}; end endmodule //************************************************************ module offset(sign_ext8,dataout,clk); input [15:0] sign_ext8; input clk; output [15:0] dataout; reg [15:0] dataout; always @ (posedge clk) begin dataout=sign_ext8; $display("At this posedge register B=%h",dataout); end endmodule //************************************************************ module aluout(datain,dataout,clk); input [15:0] datain; input clk; 79 output [15:0] dataout; reg [15:0] dataout; always @ (posedge clk) begin dataout=datain; $display("At this posedge of clock register aluout is %h",dataout); end endmodule module opa(datain,dataout,clk); input [15:0] datain; input clk; output [15:0] dataout; reg [15:0] dataout; always @ (posedge clk) begin dataout=datain; $display("At this posedge register B=%h",dataout); end endmodule //************************************************************ module opb(datain,dataout,clk); input [15:0] datain; input clk; output [15:0] dataout; reg [15:0] dataout; always @ (posedge clk) begin dataout=datain; $display(" B=%h",dataout); end endmodule At this posedge register //************************************************************ module mux2_to_1(in0,in1,sel,out); input [15:0] in0,in1; input sel; output [15:0] out; reg [15:0] out; always @(sel or in0 or in1) begin case(sel) 1'b0: out=in0; 1'b1: out=in1; endcase 80 end endmodule //************************************************************ module mux4_to_1(in0,in1,in2,in3,sel,out); input [15:0] in0,in1,in2,in3; input [1:0] sel; output [15:0] out; reg [15:0] out; always @(sel or in0 or in1 or in2 or in3) begin case(sel) 2'b00: out=in0; 2'b01: out=in1; 2'b10: out=in2; 2'b11: out=in3; endcase end endmodule //************************************************************ module mux4b_2_to_1(in0,in1,sel,out); input [3:0] in0,in1; input sel; output [3:0] out; reg [3:0] out; always @(sel or in0 or in1) begin case(sel) 1'b0: out=in0; 1'b1: out=in1; endcase end endmodule //************************************************************ module mdr(memdatain,memdataout,clk); input [15:0] memdatain; input clk; output [15:0] memdataout; reg [15:0]memdataout; always @ (posedge clk) memdataout=memdatain; endmodule //************************************************************ 81 Bibliography [1] J.L. Hennessy, D.A.Patterson, Computer Architecture: A Quantitative Approach, Second Edition, Morgan Kaufmann and Harcourt India, 2000. [2] J.L. Hennessy, D.A.Patterson, Computer Organization and Design: The Hardware/Software Interface, Second Edition, Morgan Kaufmann, 2003. [3] D. J. Smith, HDL Chip Design, International Edition, Doone Publications, 2000 [4] J.F. Wakerly, Digital Design: Principles and Practices, Third Edition, Prentice-Hall, 2000. [5] J. Bhasker, A VHDL Primer, Third Edition, Pearson, 1999. [6] A. S. Tanenbaum, Structured Computer Organization, Fourth Edition, Prentice-Hall, 2000. [7] Yatin Trivedi and others, Verilog HDL, IC, 2000. [8] Mauriss M Mano, Digital Design, Third Edition, Perason Edition, 2000 [9] Modelsim Help manual, from mentor graphics. [10] www.google.com 82