inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 33 – Single Cycle CPU Datapath, with Verilog II 2004-04-14 Lecturer PSOE Dan Garcia www.cs.berkeley.edu/~ddgarcia Google Gmail Service!! Not so fast! State Sen Liz Figueroa (Fremont) is drafting legislation to block it because it’d place advertising in personal messages after searching them for keywords. “We think it's an absolute invasion of privacy. It's like having a massive billboard in the middle of your home.” CS 61C L33 Single Cycle CPU Datapath, with Verilog II (1) Garcia, Spring 2004 © UCB Storage Element: Idealized Memory Write Enable Address • Memory (idealized) Data In • One input bus: Data In 32 • One output bus: Data Out Clk DataOut 32 • Memory word is selected by: • Address selects the word to put on Data Out • Write Enable = 1: address selects the memory word to be written via the Data In bus • Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - Address valid => Data Out valid after “access time.” CS 61C L33 Single Cycle CPU Datapath, with Verilog II (2) Garcia, Spring 2004 © UCB Verilog Memory for MIPS Interpreter (1/3) //Behavioral modelof Random Access Memory: // 32-bit wide, 256 words deep, // asynchronous read-port if RD=1, // synchronous write-port if WR=1, // initialize from hex file ("data.dat") // on positive edge of reset signal, // dump to binary file ("dump.dat") // on positive edge of dump signal. module mem (CLK,RST,DMP,WR,RD,address,writeD,readD); input CLK, RST, DMP, WR, RD; input [31:0] address, writeD; output [31:0] readD; reg [31:0] readD; parameter memSize=256; // ~ Constant dec. reg [31:0] memArray [0:memSize-1]; integer chann,i; // Temp variables: for loops ... CS 61C L33 Single Cycle CPU Datapath, with Verilog II (3) Garcia, Spring 2004 © UCB Verilog Memory for MIPS Interpreter (2/3) integer chann,i; always @ (posedge RST) $readmemh("data.dat", memArray); always @ (posedge CLK) if (WR) memArray[address[9:2]] = writeD; // write if WR & positive clock edge (synchronous) always @ (address or RD) if (RD) begin readD = memArray[address[9:2]]; $display("Getting address %h containing %h", address[9:2], readD); end // read if RD, independent of clock (asynchronous) CS 61C L33 Single Cycle CPU Datapath, with Verilog II (4) Garcia, Spring 2004 © UCB Why is it “memArray[address[9:2]]”? • Our memory is always byte-addressed • We can lb from 0x0, 0x1, 0x2, 0x3, … • lw only reads word-aligned requests • We only call lw with 0x0, 0x4, 0x8, 0xC, … • I.e., the last two bits are always 0 • memArray is a word wide and 28 deep •reg [31:0] memArray [0:256-1]; • Size = 4 Bytes/row * 256 rows = 1024 B • If we’re simulating lw/sw, we R/W words • What bits select the first 256 words? [9:2]! • 1st word = 0x0 = 0b000 = memArray[0]; 2nd word = 0x4 = 0b100 = memArray[1], etc. CS 61C L33 Single Cycle CPU Datapath, with Verilog II (5) Garcia, Spring 2004 © UCB Verilog Memory for MIPS Interpreter (3/3) end; always @ (posedge DMP) begin chann = $fopen("dump.dat"); if (chann==0) begin $display("$fopen of dump.dat failed."); $finish; end // Temp variables chan, i for (i=0; i<memSize; i=i+1) begin $fdisplay(chann, "%b", memArray[i]); end end // always @ (posedge DMP) endmodule // mem CS 61C L33 Single Cycle CPU Datapath, with Verilog II (6) Garcia, Spring 2004 © UCB Peer Instruction A. We should use the main ALU to compute PC=PC+4 B. We’re going to be able to read 2 registers and write a 3rd in 1 cycle C. Datapath is hard, Control is easy CS 61C L33 Single Cycle CPU Datapath, with Verilog II (7) 1: 2: 3: 4: 5: 6: 7: 8: ABC FFF FFT FTF FTT TFF TFT TTF TTT Garcia, Spring 2004 © UCB How to Design a Processor: step-by-step • 1. Analyze instruction set architecture (ISA) => datapath requirements • meaning of each instruction is given by the register transfers • datapath must include storage element for ISA registers • datapath must support each register transfer • 2. Select set of datapath components and establish clocking methodology • 3. Assemble datapath meeting requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. • 5. Assemble the control logic (hard part!) CS 61C L33 Single Cycle CPU Datapath, with Verilog II (8) Garcia, Spring 2004 © UCB Storage Element: Register (Building Block) •Register • Similar to the D Flip Flop except Write Enable Data In N Data Out N - N-bit input and output - Write Enable input Clk • Write Enable: - negated (or deasserted) (0): Data Out will not change - asserted (1): Data Out will become Data In CS 61C L33 Single Cycle CPU Datapath, with Verilog II (9) Garcia, Spring 2004 © UCB Verilog 32-bit Register for MIPS Interpreter // Behavioral model of 32-bit Register: // positive edge-triggered, // synchronous active-high reset. module reg32 (CLK,Q,D,RST); input [31:0] D; input CLK, RST; output [31:0] Q; reg [31:0] Q; always @ (posedge CLK) if (RST) Q = 0; else Q = D; endmodule // reg32 CS 61C L33 Single Cycle CPU Datapath, with Verilog II (10) Garcia, Spring 2004 © UCB Storage Element: Register File • Register File consists of 32 registers: • Two 32-bit output busses: busA and busB • One 32-bit input bus: busW • Register is selected by: RW RA RB Write Enable 5 5 5 busW 32 Clk busA 32 32 32-bit Registers busB 32 • RA (number) selects the register to put on busA (data) • RB (number) selects the register to put on busB (data) • RW (number) selects the register to be written via busW (data) when Write Enable is 1 • Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - RA or RB valid => busA or busB valid after “access time.” CS 61C L33 Single Cycle CPU Datapath, with Verilog II (11) Garcia, Spring 2004 © UCB Verilog Register File for MIPS Interpreter (1/3) // // // // // // Behavioral model of register file: 32-bit wide, 32 words deep, two asynchronous read-ports, one synchronous write-port. Dump register file contents to console on pos edge of dump signal. CS 61C L33 Single Cycle CPU Datapath, with Verilog II (12) Garcia, Spring 2004 © UCB Verilog Register File for MIPS Interpreter (2/3) module regFile (CLK, wEnb, DMP, writeReg, writeD, readReg1, readD1, readReg2, readD2); input CLK, wEnb, DMP; input [4:0] writeReg, readReg1, readReg2; input [31:0] writeD; output [31:0] readD1, readD2; reg [31:0] readD1, readD2; reg [31:0] array [0:31]; reg dirty1, dirty2; integer i; • 3 5-bit fields to select registers: 1 write register, 2 read register CS 61C L33 Single Cycle CPU Datapath, with Verilog II (13) Garcia, Spring 2004 © UCB Verilog Register File for MIPS Interpreter (3/3) always @ (posedge CLK) if (wEnb) if (writeReg!=4'h0) // why? begin array[writeReg] = writeD; dirty1=1'b1; dirty2=1'b1; end always @ (readReg1 or dirty1) begin readD1 = array[readReg1]; dirty1=0; end CS 61C L33 Single Cycle CPU Datapath, with Verilog II (14) Garcia, Spring 2004 © UCB Step 3: Assemble DataPath meeting requirements • Register Transfer Requirements Datapath Assembly • Instruction Fetch • Read Operands and Execute Operation CS 61C L33 Single Cycle CPU Datapath, with Verilog II (15) Garcia, Spring 2004 © UCB 3a: Overview of the Instruction Fetch Unit • The common RTL operations • Fetch the Instruction: mem[PC] • Update the program counter: - Sequential Code: PC = PC + 4 - Branch and Jump: PC = “something else” Clk PC Next Address Logic Address Instruction Memory CS 61C L33 Single Cycle CPU Datapath, with Verilog II (16) Instruction Word 32 Garcia, Spring 2004 © UCB 3b: Add & Subtract • R[rd] = R[rs] op R[rt] Ex.: addU rd, rs, rt • Ra, Rb, and Rw come from instruction’s Rs, Rt, 26 21 16 11 6 and Rd fields 31 op 6 bits rs 5 bits rt 5 bits rd 5 bits shamt 5 bits funct 6 bits 0 • ALUctr and RegWr: control logic after decoding the instruction Rd Rs Rt RegWr 5 5 5 32 32-bit Registers busA 32 busB 32 ALU busW 32 Clk Rw Ra Rb ALUctr Result 32 • Already defined register file, ALU CS 61C L33 Single Cycle CPU Datapath, with Verilog II (17) Garcia, Spring 2004 © UCB Clocking Methodology Clk . . . . . . . . . . . . • Storage elements clocked by same edge • Being physical devices, flip-flops (FF) and combinational logic have some delays • Gates: delay from input change to output change • Signals at FF D input must be stable before active clock edge to allow signal to travel within the FF, and we have the usual clock-to-Q delay • “Critical path” (longest path through logic) determines length of clock period CS 61C L33 Single Cycle CPU Datapath, with Verilog II (18) Garcia, Spring 2004 © UCB Register-Register Timing: One complete cycle Clk PC Old Value New Value Rs, Rt, Rd, Op, Func ALUctr Old Value RegWr Old Value Old Value busA, B Old Value busW Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value New Value Register File Access Time New Value ALU Delay New Value Rd Rs Rt RegWr5 5 5 CS 61C L33 Single Cycle CPU Datapath, with Verilog II (19) busA 32 busB 32 ALU busW 32 Clk Rw Ra Rb 32 32-bit Registers ALUctr Register Write Occurs Here Result 32 Garcia, Spring 2004 © UCB 3c: Logical Operations with Immediate • R[rt] = R[rs] op ZeroExt[imm16] ] 31 26 21 op rs 31 6 bits 5 bits 16 rt 5 bits 16 15 rd? 11 0 immediate 16 bits 0 ALU immediate 0000000000000000 Rd Rt RegDst 16 bits 16 bits Mux Rt register read?? Rs Rt? What about ALUct RegWr 5 5 5 r busA Rw Ra Rb busW 32 Result 32 32-bit 32 Registers 32 busB Clk 32 Mux 16 ZeroExt imm16 32 ALUSrc • Already defined 32-bit MUX; Zero Ext? CS 61C L33 Single Cycle CPU Datapath, with Verilog II (20) Garcia, Spring 2004 © UCB 3d: Load Operations • R[rt] = Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 31 26 op rs 6 bits Rd RegDst Mux RegWr 5 32 Clk 0 rt 5 bits immediate 5 bits 16 bits Rt Rs Rt? 5 5 Rw Ra Rb 32 32-bit Registers busA W_Src 32 ?? 32 ExtOp 32 MemWr ALUSrc CS 61C L33 Single Cycle CPU Datapath, with Verilog II (21) Data In 32 Clk Mu x busB 32 Mux 16 ALUctr Extender imm16 16 ALU busW 21 WrEn Adr Data Memory 32 Garcia, Spring 2004 © UCB 3e: Store Operations • Mem[ R[rs] + SignExt[imm16] ] = R[rt] Ex.: sw rt, rs, imm16 31 26 21 op rs 6 bits 5 bits Rd Rt RegDst Mux RegWr5 5 rt 5 bits ALU Data In32 32 ExtOp 32 Mu x Extender 16 W_Src Rs Rt 5 busA Rw Ra Rb 32 32 32-bit Registers busB 32 imm16 0 immediate 16 bits ALUctr MemWr Mux busW 32 Clk 16 Clk WrEn Adr 32 Data Memory ALUSrc CS 61C L33 Single Cycle CPU Datapath, with Verilog II (22) Garcia, Spring 2004 © UCB 3f: The Branch Instruction 31 26 op 6 bits • beq 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits rs, rt, imm16 • mem[PC]Fetch the instruction from memory • Equal = R[rs] == R[rt] condition Calculate the branch • if (Equal) Calculate the next instruction’s address - PC = PC + 4 + ( SignExt(imm16) x 4 ) else - PC = PC + 4 CS 61C L33 Single Cycle CPU Datapath, with Verilog II (23) Garcia, Spring 2004 © UCB Datapath for Branch Operations • beq rs, rt, imm16 Datapath generates condition (equal) 26 op 6 bits 21 00 Adder 32 PC Mux Rs Rt 5 busA Rw Ra Rb 32 32 32-bit Registers busB 32 Cond RegWr 5 5 busW Clk Adder PC Ext imm16 0 rs rt immediate 5 bits 5 bits 16 bits Inst Address nPC_sel 4 16 Equal? 31 Clk • Already MUX, adder, sign extend, zero CS 61C L33 Single Cycle CPU Datapath, with Verilog II (24) Garcia, Spring 2004 © UCB Putting it All Together:A Single Cycle Datapath Instruction<31:0> <0:15> <11:15> Rs <16:20> <21:25> Inst Memory Adr Rt Rd Imm16 RegDst ALUctr MemWr MemtoReg Equal Rt Rd 1 0 Rs Rt RegWr 5 5 5 busA Rw Ra Rb = busW 32 32 32-bit 0 32 32 Registers busB 0 32 Clk 32 WrEn Adr 1 1 Data In Data imm16 32 Clk 16 Clk Memory nPC_sel imm16 Mux ALU Extender PC Ext Adder Mux PC Mux Adder 00 4 ExtOp ALUSrc CS 61C L33 Single Cycle CPU Datapath, with Verilog II (25) Garcia, Spring 2004 © UCB Peer Instruction 1: Suppose we’re writing a MIPS interpreter in 2: Verilog. Which sequence below is best 3: organization for the interpreter? 4: A. repeat loop that fetches instructions 5: B. while loop that fetches instructions 6: C. Decodes instructions using case statement 7: D. Decodes instr. using chained if statements 8: 9: E. Executes each instruction 0: F. Increments PC by 4 CS 61C L33 Single Cycle CPU Datapath, with Verilog II (26) ACEF ADEF AECF AEDF BCEF BDEF BECF BEDF EF FAE Garcia, Spring 2004 © UCB An Abstract View of the Implementation PC Clk Next Address ALU Control Ideal Instruction Instruction Control Signals Conditions Memory Rd Rs Rt 5 5 5 Instruction Address A Data Data 32 Address Rw Ra Rb 32 Ideal Out 32 32-bit 32 Data Data Registers B Memory In Clk 32 Clk Datapath CS 61C L33 Single Cycle CPU Datapath, with Verilog II (27) Garcia, Spring 2004 © UCB Summary: Single cycle datapath °5 steps to design a processor • 1. Analyze instruction set => datapath requirements • 2. Select set of datapath components & establish clock methodology • 3. Assemble datapath meeting the requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. Processor • 5. Assemble the control logic °Control is the hard part °Next time! CS 61C L33 Single Cycle CPU Datapath, with Verilog II (28) Input Control Memory Datapath Output Garcia, Spring 2004 © UCB