inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 27 – Single Cycle CPU Datapath, with Verilog II
2004-11-01
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia

Another shutout for Cal! Unbelievable! The #4 Bears were dominant in beating ASU 27-0. JJ Arrington shatters Cal records w/his 7th-straight 100yd game, becoming the fastest Cal player ever to reach 1,000 yds. It’s ASU’s 1st shutout loss in 9 yrs & our first time in the top 5 in 52 years!! OU next Sat…
calbears.com

Garcia, Spring 2004 © UCB CS 61C L27 Single Cycle CPU Datapath, with Verilog II (1)

Why is it “memArray[address[9:2]]”?
• Our memory is always byte-addressed
  • We can lb from 0x0, 0x1, 0x2, 0x3, …
• lw only reads word-aligned requests
  • We only call lw with 0x0, 0x4, 0x8, 0xC, …
  • I.e., the last two bits are always 0
• memArray is a word wide and 2^8 deep
  • reg [31:0] memArray [0:256-1];
  • Size = 4 Bytes/row * 256 rows = 1024 B
• If we’re simulating lw/sw, we R/W words
• What bits select the first 256 words? [9:2]!
  • 1st word = 0x0 = 0b000 = memArray[0]; 2nd word = 0x4 = 0b100 = memArray[1], etc.

How to Design a Processor: step-by-step
• 1. Analyze instruction set architecture (ISA) => datapath requirements
  • meaning of each instruction is given by the register transfers
  • datapath must include storage element for ISA registers
  • datapath must support each register transfer
• 2. Select set of datapath components and establish clocking methodology
• 3. Assemble datapath meeting requirements
• 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
• 5. Assemble the control logic (hard part!)
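
The word-indexing argument for memArray[address[9:2]] made earlier in this lecture can be sketched as a tiny word-wide memory. This is only an illustrative sketch: the module name wordMem and the ports addr, wdata, and rdata are assumptions made for the example, not the course's memory module. Only the memArray declaration and the [9:2] index come from the slide.

```verilog
// Sketch only: wordMem and its port names are illustrative assumptions.
module wordMem (CLK, wEnb, addr, wdata, rdata);
  input         CLK, wEnb;
  input  [31:0] addr;     // byte address; lw/sw keep addr[1:0] == 2'b00
  input  [31:0] wdata;
  output [31:0] rdata;

  // One word wide, 2^8 = 256 deep: 4 Bytes/row * 256 rows = 1024 B
  reg [31:0] memArray [0:256-1];

  // Bits [9:2] pick one of the 256 words. Bits [1:0] would pick a byte
  // inside the word, so word accesses ignore them.
  assign rdata = memArray[addr[9:2]];

  // Synchronous word write
  always @ (posedge CLK)
    if (wEnb) memArray[addr[9:2]] <= wdata;
endmodule
```

With this indexing, byte address 0x0 maps to memArray[0], 0x4 to memArray[1], and 0x3FC to memArray[255].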

Storage Element: Register (Building Block)
• Similar to D Flip Flop except
  - N-bit input and output
  - Write Enable input
• Write Enable:
  - negated (or deasserted) (0): Data Out will not change
  - asserted (1): Data Out will become Data In
[Figure: register block with N-bit Data In, N-bit Data Out, Write Enable, and Clk]

Verilog 32-bit Register for MIPS Interpreter
    // Behavioral model of 32-bit Register:
    // positive edge-triggered,
    // synchronous active-high write enable.
    module reg32 (CLK,Q,D,wEnb);
      input CLK, wEnb;
      input [31:0] D;
      output [31:0] Q;
      reg [31:0] Q;
      // Q only changes on a rising clock edge while wEnb is 1;
      // nonblocking assignment is the usual form for clocked storage.
      always @ (posedge CLK)
        if (wEnb) Q <= D;
    endmodule // reg32

Storage Element: Register File
• Register File consists of 32 registers:
  • Two 32-bit output busses: busA and busB
  • One 32-bit input bus: busW
• Register is selected by:
  • RA (number) selects the register to put on busA (data)
  • RB (number) selects the register to put on busB (data)
  • RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
  • The CLK input is a factor ONLY during write operation
  • During read operation, behaves as a combinational logic block:
    - RA or RB valid => busA or busB valid after “access time.”
[Figure: register file block of 32 32-bit Registers; 5-bit inputs RW, RA, RB; Write Enable; Clk; 32-bit input busW; 32-bit outputs busA and busB]

Verilog Register File for MIPS Interpreter (1/3)
    // Behavioral model of register file:
    // 32-bit wide, 32 words deep,
    // two asynchronous read-ports,
    // one synchronous write-port.
    // Dump register file contents to
    // console on pos edge of dump signal.

Verilog Register File for MIPS Interpreter (2/3)
    module regFile (CLK, wEnb, DMP, writeReg, writeD, readReg1, readD1, readReg2, readD2);
      input CLK, wEnb, DMP;
      input [4:0] writeReg, readReg1, readReg2;
      input [31:0] writeD;
      output [31:0] readD1, readD2;
      reg [31:0] readD1, readD2;
      reg [31:0] array [0:31];
      reg dirty1, dirty2;
      integer i;
• 3 5-bit fields to select registers: 1 write register, 2 read registers

Verilog Register File for MIPS Interpreter (3/3)
    // Synchronous write port
    always @ (posedge CLK)
      if (wEnb)
        if (writeReg!=5'h0) // why?
        begin
          array[writeReg] = writeD;
          // The read blocks are sensitive to the dirty flags, so setting
          // them forces the read outputs to be re-evaluated after a write.
          dirty1=1'b1;
          dirty2=1'b1;
        end

    // Asynchronous read port 1
    always @ (readReg1 or dirty1)
    begin
      readD1 = array[readReg1];
      dirty1=0;
    end

Step 3: Assemble DataPath meeting requirements
• Register Transfer Requirements => Datapath Assembly
• Instruction Fetch
• Read Operands and Execute Operation

3a: Overview of the Instruction Fetch Unit
• The common RTL operations
  • Fetch the Instruction: mem[PC]
  • Update the program counter:
    - Sequential Code: PC = PC + 4
    - Branch and Jump: PC = “something else”
[Figure: PC (clocked by Clk) and Next Address Logic supply Address to Instruction Memory, which outputs the 32-bit Instruction Word]

3b: Add & Subtract
• R[rd] = R[rs] op R[rt]   Ex.: addU rd,rs,rt
  • Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields

    bits:   31-26    25-21    20-16    15-11    10-6     5-0
    field:  op       rs       rt       rd       shamt    funct
    width:  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

  • ALUctr and RegWr: control logic after decoding the instruction
[Figure: Rd, Rs, Rt drive Rw, Ra, Rb of the 32 32-bit Registers (RegWr, Clk); busA and busB feed the ALU (ALUctr); the 32-bit Result returns on busW]
• Already defined register file, ALU
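
The field-to-port mapping for R-type instructions described above can be sketched as plain wiring. This is a minimal sketch, not the course's code: the module name rFields and its port list are assumptions made for illustration. The bit ranges are the standard MIPS R-format boundaries shown in the field layout (31, 26, 21, 16, 11, 6, 0).

```verilog
// Sketch only: rFields and its port names are illustrative assumptions.
module rFields (instr, op, rs, rt, rd, shamt, funct);
  input  [31:0] instr;
  output [5:0]  op, funct;
  output [4:0]  rs, rt, rd, shamt;

  assign op    = instr[31:26];  // opcode
  assign rs    = instr[25:21];  // drives Ra (first source register)
  assign rt    = instr[20:16];  // drives Rb (second source register)
  assign rd    = instr[15:11];  // drives Rw (destination register)
  assign shamt = instr[10:6];   // shift amount
  assign funct = instr[5:0];    // selects the ALU operation for op == 0
endmodule
```

For example, addu $3,$1,$2 encodes as 0x00221821, which this wiring splits into op = 0, rs = 1, rt = 2, rd = 3, shamt = 0, funct = 0x21.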

Clocking Methodology
[Figure: clock waveform (Clk)]
• Storage elements clocked by same edge
• Being physical devices, flip-flops (FF) and combinational logic have some delays
  • Gates: delay from input change to output change
  • Signals at FF D input must be stable before active clock edge to allow signal to travel within the FF (setup time), and we have the usual clock-to-Q delay
• “Critical path” (longest path through logic) determines length of clock period

Register-Register Timing: One complete cycle
[Timing diagram: after the clock edge, PC changes from old to new value; Rs, Rt, Rd, Op, Func become valid after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW after the ALU Delay; the register write occurs at the next clock edge. Shown with the register file and ALU datapath: Rd, Rs, Rt, RegWr, Rw, Ra, Rb, busA, busB, busW, ALUctr, Result]

3c: Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]

    bits:   31-26    25-21    20-16    15-0
    field:  op       rs       rt       immediate
    width:  6 bits   5 bits   5 bits   16 bits

    ZeroExt[imm16]:  bits 31-16 = 0000000000000000 (16 bits), bits 15-0 = immediate (16 bits)

[Figure: a Mux controlled by RegDst selects Rd or Rt for Rw; Rs drives Ra; a second Mux controlled by ALUSrc selects busB or ZeroExt(imm16) as the ALU's second input; RegWr, ALUctr, Clk; Result returns on busW. Slide annotations ask “rd?” and “What about Rt register read??”]
• Already defined 32-bit MUX; Zero Ext?

3d: Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]   Example: lw rt,rs,imm16

    bits:   31-26    25-21    20-16    15-0
    field:  op       rs       rt       immediate
    width:  6 bits   5 bits   5 bits   16 bits

[Figure: load datapath: RegDst Mux selects Rd or Rt for Rw; 32 32-bit Registers (RegWr, Clk) with busA and busB; Extender (ExtOp) on imm16; ALUSrc Mux; ALU (ALUctr); Data Memory (WrEn, Adr, Data In, MemWr, Clk); W_Src Mux selects the value placed on busW]

3e: Store Operations
• Mem[ R[rs] + SignExt[imm16] ] = R[rt]   Ex.: sw rt, rs, imm16

    bits:   31-26    25-21    20-16    15-0
    field:  op       rs       rt       immediate
    width:  6 bits   5 bits   5 bits   16 bits

[Figure: store datapath: same components as the load datapath, with busB wired to Data In of the Data Memory; control signals RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, W_Src]

3f: The Branch Instruction

    bits:   31-26    25-21    20-16    15-0
    field:  op       rs       rt       immediate
    width:  6 bits   5 bits   5 bits   16 bits

• beq rs, rt, imm16
  • mem[PC]   Fetch the instruction from memory
  • Equal = R[rs] == R[rt]   Calculate branch condition
  • if (Equal)   Calculate the next instruction’s address
    - PC = PC + 4 + ( SignExt(imm16) x 4 )
    else
    - PC = PC + 4

Datapath for Branch Operations
• beq rs, rt, imm16   Datapath generates condition (equal)

    bits:   31-26    25-21    20-16    15-0
    field:  op       rs       rt       immediate
    width:  6 bits   5 bits   5 bits   16 bits

[Figure: branch datapath: PC (Clk) supplies Inst Address; one Adder computes PC + 4; PC Ext extends imm16 (with 00 appended) into a second Adder; a Mux controlled by nPC_sel selects the next PC; the register file (Rs, Rt, RegWr, busA, busB, busW, Clk) feeds the “Equal?” comparison that produces Cond]
• Already MUX, adder, sign extend, zero

Putting it All Together: A Single Cycle Datapath
[Figure: complete single cycle datapath. Inst Memory (Adr) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Components: PC, two Adders, PC Ext, Muxes, 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), Extender, ALU, Data Memory (WrEn, Adr, Data In). Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Condition output: Equal]

An Abstract View of the Implementation
[Figure: Control takes the Instruction and Conditions and produces Control Signals. Datapath: PC (Clk) and Next Address logic drive the Instruction Address of the Ideal Instruction Memory; Rd, Rs, Rt (5 bits each) drive Rw, Ra, Rb of the 32 32-bit Registers; buses A and B feed the ALU; the Ideal Data Memory has Data Address, Data In, and Data Out]

Peer Instruction
Suppose we’re writing a MIPS interpreter in Verilog. Which sequence below is the best organization for the interpreter?
A. repeat loop that fetches instructions
B. while loop that fetches instructions
C. Decodes instructions using case statement
D. Decodes instr. using chained if statements
E. Executes each instruction
F. Increments PC by 4

    1: ACEF
    2: ADEF
    3: AECF
    4: AEDF
    5: BCEF
    6: BDEF
    7: BECF
    8: BEDF
    9: EF
    0: FAE

Summary: Single cycle datapath
°5 steps to design a processor
  • 1. Analyze instruction set => datapath requirements
  • 2. Select set of datapath components & establish clock methodology
  • 3. Assemble datapath meeting the requirements
  • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  • 5. Assemble the control logic
°Control is the hard part
°Next time!
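
The beq register transfer from step 3f (PC = PC + 4 + SignExt(imm16) x 4 when Equal, else PC = PC + 4) can be sketched as next-address logic. This is a minimal sketch under stated assumptions: the module name nextPC, its port list, and the rule that the branch is taken only when nPC_sel and Equal are both 1 are illustrative choices, not the course's implementation. The slides name the signals nPC_sel and Equal but do not spell out how they combine.

```verilog
// Sketch only: nextPC, its ports, and the (nPC_sel & Equal) rule are
// illustrative assumptions.
module nextPC (PC, imm16, nPC_sel, Equal, nPC);
  input  [31:0] PC;
  input  [15:0] imm16;
  input         nPC_sel;  // assumed: 1 when the instruction is a branch
  input         Equal;    // condition from the datapath: R[rs] == R[rt]
  output [31:0] nPC;

  // Sequential code: PC + 4
  wire [31:0] pcPlus4 = PC + 32'd4;

  // SignExt(imm16) x 4: replicate the sign bit 14 times and append 2'b00
  wire [31:0] offset = {{14{imm16[15]}}, imm16, 2'b00};

  // Taken branch adds the offset to PC + 4; otherwise fall through
  assign nPC = (nPC_sel & Equal) ? (pcPlus4 + offset) : pcPlus4;
endmodule
```

With PC = 0x100 and imm16 = 3, a taken branch gives 0x104 + 0xC = 0x110; with imm16 = 0xFFFF (that is, -1) it gives 0x104 - 4 = 0x100.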
[Figure: the parts of a computer: Processor (Control, Datapath), Memory, Input, Output]

Dwarfing the importance of this lecture…
…is the importance that tomorrow you get out and VOTE!