Single Cycle CPU Design CS 365 Lecture 7 Big Picture: Where are We Now? Five classic components of a computer Processor Input Control Memory Datapath Output Today’s topic: design a single cycle processor machine design inst. set design (Ch 2) Single-cycle CPU arithmetic (Ch 3) technology CS465 2 D. Barbara The Performance Perspective Performance of a machine is determined by: Instruction count Clock cycle time Clock cycles per instruction CPI Inst. Count Cycle Time Processor design (datapath and control) will determine: Clock cycle time Clock cycles per instruction Today: single cycle processor Advantage: one clock cycle per instruction Disadvantage: long cycle time Single-cycle CPU CS465 3 D. Barbara How to Design a Processor 1. Analyze instruction set datapath requirements Meaning of each instruction given by register transfers Datapath must include storage element for ISA registers (possibly more) Datapath must support each register transfer 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 4 D. Barbara MIPS Instruction Formats All MIPS instructions are 32 bits long; 3 instruct. formats: R-type 31 26 op rs 6 bits I-type 31 26 op 31 J-type 16 rt 5 bits 5 bits 21 rs 6 bits 21 5 bits 6 0 rd shamt funct 5 bits 5 bits 6 bits 16 0 immediate rt 5 bits 16 bits 26 op 0 target address 6 bits 11 26 bits The different fields are: op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the “op” field address / immediate: address offset or immediate value target address: target address of the jump instruction Single-cycle CPU CS465 5 D. Barbara Step 1a: The MIPS-lite Subset ADD, SUB, AND, OR add rd, rs, rt sub rd, rs, rt and rd, rs,rt or rd,rs,rt 31 26 op rs 6 bits 31 26 op 16 rt 5 bits 5 bits 21 rs 6 bits 21 6 0 rd shamt funct 5 bits 5 bits 6 bits 16 rt 5 bits 11 0 immediate 5 bits 16 bits LOAD and STORE Word lw rt, rs, imm16 sw rt, rs, imm16 31 BRANCH: 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits beq rs, rt, imm16 Single-cycle CPU CS465 6 D. Barbara Register Transfer Language RTL gives the meaning of the instructions First step is to fetch the instruction from memory op | rs | rt | rd | shamt | funct = MEM[ PC ] op | rs | rt | Imm16 = MEM[ PC ] inst Register Transfers ADD R[rd] <– R[rs] + R[rt]; PC <– PC + 4 SUB R[rd] <– R[rs] – R[rt]; PC <– PC + 4 OR R[rd] <– R[rs] | R[rt]; PC <– PC + 4 LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16)]; PC <– PC + 4 STORE MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4 BEQ if ( R[rs] == R[rt] ) then PC <– PC + sign_ext(Imm16)] || 00 else PC <– PC + 4 Single-cycle CPU CS465 7 D. Barbara Step 1b: Requirements of Instr. Set Memory Instruction & data Registers (32 x 32) Read RS Read RT Write RT or RD Extender Add and Sub register or extended immediate PC Add 4 or extended immediate to PC Single-cycle CPU CS465 8 D. Barbara Step 2: Components of Datapath Combinational elements Storage elements Clocking Single-cycle CPU methodology CS465 9 D. Barbara Abstract/Simplified View of Datapath Data PC Address Instruction Instruction memory Register # Registers Register # ALU Address Data memory Register # Data Two types of functional units: Elements that operate on data values (combinational) Elements that contain state (sequential) Single-cycle CPU CS465 10 D. Barbara Combinational Logic Elements Adder CarryIn A MUX 32 Selec t A 32 B Sum Carry 32 MUX Adder B Basic building blocks 32 32 Y 32 O P A ALU B Single-cycle CPU ALU 32 32 Result 32 CS465 11 D. Barbara State Elements: Review Unclocked vs. Clocked Clocks used in synchronous logic When should an element that contains state falling edge be updated? cycle time rising edge Single-cycle CPU CS465 12 D. Barbara Relationship between state elements and combinatorial circuits State elements change only on the clock edge They provide stable outputs for the combinatorial logic The clock period must be long enough for the combinatorial circuit to stabilize Single-cycle CPU CS465 13 D. Barbara Absence of race conditions Read and write can be done in the same cycle (clock period long enough) Single-cycle CPU CS465 14 D. Barbara An Unclocked State Element The set-reset latch Output depends on present inputs and also on past inputs R Q _ Q S Single-cycle CPU CS465 15 D. Barbara Latches and Flip-flops Output is equal to the stored value inside the element Change of state (value) is based on the clock Latches: whenever the inputs change, and the clock is asserted Flip-flop: state changes only on a clock edge Don't need to ask for permission to look at the value Edge-triggered methodology Details: Appendix B.8 A clocking methodology defines when signals can be read and written — wouldn't want to read a signal at the same time it was being written Single-cycle CPU CS465 16 D. Barbara D-latch Two inputs: The data value to be stored (D) The clock signal (C) indicating when to read & store D Two outputs: The value of the internal state (Q) and its complement C Q D C _ Q Q D Single-cycle CPU CS465 17 D. Barbara D Flip-flop Output changes only on the clock edge D D C D latch Q D Q D latch _ C Q Q _ Q C D C Q Single-cycle CPU CS465 18 D. Barbara Our Implementation An edge triggered methodology Typical execution: Read contents of some state elements Send values through some combinational logic Write results to one or more state elements State element 1 Combinational logic State element 2 Clock cycle State element Single-cycle CPU Combinational logic CS465 19 D. Barbara Storage Element: Register Register Similar Basic building block to the D flip-flop except N-bit input and output Write Enable input Write Write Enable Data In N Data Out N Clk Enable: Negated (0): Data Out will not change Asserted (1): Data Out will become Data In Single-cycle CPU CS465 20 D. Barbara Implementing Register File Built using D flip-flops Read register number 1 Read register number 1 Register 0 Register 1 Register n – 1 M u x Read register number 2 Read data 1 Register file Write register Read data 2 Register n Write data Read register number 2 M u x Single-cycle CPU Read data 1 Write Read data 2 CS465 21 D. Barbara Writing to Register File Note: we still use the clock to determine when to write Write C 0 R egister n u m ber R e gi ster 0 1 D n-to-1 de co der C R e gi ster 1 D n – 1 n C R egister n – 1 D C R e gi ster n D R e gi st er d at a Single-cycle CPU CS465 22 D. Barbara Storage Element – Register File Register file consists of 32 registers: Two 32-bit output busses: busA and busB One 32-bit input bus: busW busW 32 Clk busA 32 32 32-bit Registers busB 32 Register is selected by: RW RARB Write Enable 5 5 5 RA (number) selects the register to put on busA (data) RB (number) selects the register to put on busB (data) RW (number) selects the register to be written via busW (data) when Write Enable is 1 Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: RA or RB valid => busA or busB valid after “access time” Single-cycle CPU CS465 23 D. Barbara Storage Element: Idealized Memory Memory (idealized) Write Enable One input bus: Data In One output bus: Data Out Address Data In 32 Clk DataOut 32 Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: Address valid => Data Out valid after “access time” Single-cycle CPU CS465 24 D. Barbara Clocking Methodology Clk Setup Hold Setup Hold . . . . . . Don’t Care . . . . . . All storage elements are clocked by the same clock edge Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew Single-cycle CPU CS465 25 D. Barbara How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 26 D. Barbara Step 3 Register transfer requirements Datapath assembly Instruction fetch Read operands and execute operation Single-cycle CPU CS465 27 D. Barbara 3a: Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[PC] Update the program counter: Sequential Code: PC <- PC + 4 Branch and Jump: PC <- “something else” We don’t know if instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted the instruction from memory So all instructions initially increment the PC Single-cycle CPU CS465 28 D. Barbara Components to Assemble Instruction address PC Instruction Add Sum Instruction memory a. Instruction memory Single-cycle CPU b. Program counter CS465 c. Adder 29 D. Barbara Datapath for Instruction Fetch Add 4 PC Read address Instruction Instruction memory Single-cycle CPU CS465 30 D. Barbara Step 3b: R-format Instructions Example instructions: add, sub, and, or, slt R[rd] <- R[rs] op R[rt] Read register 1, Read register 2, and Write register come from instruction’s rs, rt, and rd fields ALU control and RegWrite: control logic after decoding the instruction 31 26 op Register numbers 5 5 Data 16 rs 6 bits 5 21 rt 5 bits Read register 1 11 5 bits 6 rd shamt 5 bits 5 bits 3 0 funct 6 bits ALU control Read data 1 Read register 2 Registers Write register Read data 2 Write data Data Zero ALU ALU result RegWrite a. Registers Single-cycle CPU b. ALU CS465 31 D. Barbara Datapath for R-format Instructions 3 Read register 1 Instruction Read register 2 Registers Write register Write data ALU operation Read data 1 Zero ALU ALU result Read data 2 RegWrite Single-cycle CPU CS465 32 D. Barbara Register-Register Timing Clk PC Old Value Clk-to-Q New Value Rs, Rt, Rd, Op, Func Old Value ALUctr Old Value RegWr Old Value busA, B busW Instruction Memory Access Time New Value Delay through Control Logic New Value New Value Register File Access Time New Value Old Value ALU Delay New Value Old Value Rd Rs Rt RegWr 5 5 5 Single-cycle CPU Register Write Occurs Here Result busA 32 busB 32 CS465 ALU busW 32 Clk Rw Ra Rb 32 32-bit Registers ALUctr 32 33 D. Barbara Step 3d: Load & Store Operations R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16 31 26 op 6 bits 21 rs 16 rt 5 bits 0 immediate 5 bits 16 bits MemWrite Address Write data Read data 16 Data memory Sign extend 32 MemRead a. Data memory unit Single-cycle CPU b. Sign-extension unit CS465 34 D. Barbara Datapath for LW & SW R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw rt, rs, imm16 Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16 3 Read register 1 Instruction ALU operation MemWrite Read data 1 Read register 2 Registers Write register Read data 2 Write data Zero ALU ALU result Write data RegWrite 16 Single-cycle CPU Address Sign extend 32 Read data Data memory MemRead CS465 35 D. Barbara 3f: The Branch Instruction 31 26 op 6 bits beq 21 rs 5 bits 16 rt immediate 5 bits Fetch the instruction from memory Equal <- R[rs] == R[rt] condition 16 bits rs, rt, imm16 mem[PC] 0 Calculate the branch if (COND eq 0), calculate the next instruction’s address: PC <- PC + 4 + ( SignExt(imm16) x 4 ) Else: PC <- PC + 4 Single-cycle CPU CS465 36 D. Barbara Datapath for Branch Instruction PC + 4 from instruction datapath Add Sum Branch target Shift left 2 Instruction 3 Read register 1 Read register 2 Registers Write register Write data ALU operation Read data 1 ALU Zero To branch control logic Read data 2 RegWrite 16 Single-cycle CPU Sign extend 32 CS465 37 D. Barbara Multiplexors: Stitch Datapath R-Instructions & Memory Accesses Add 4 PC Read address Ins truction Instruction memory Read register 1 Registers 3 Read register 2 Read data 1 Write register Read data 2 ALUSrc M u x Zero ALU ALU result Address Write data RegWrite Single-cycle CPU MemWrite MemtoReg Write data 16 ALU operation Sign 32 extend CS465 Read data Data memory M u x MemRead 38 D. Barbara Putting All Together With addition of Branch PCSrc M u x Add Add ALU result 4 Shift left 2 PC Read address Instruction Instruction memory Single-cycle CPU Registers Read register 1 Read Read data 1 register 2 Write register Write data RegWrite 16 ALUSrc Read data 2 Sign extend M u x 3 ALU operation Zero ALU ALU result MemWrite MemtoReg Address Read data Data memory Write data M u x 32 MemRead CS465 39 D. Barbara Putting All Together (cont’d) PCSrc Add ALU Add result 4 RegWrite Instruction [25– 21] PC Read address Instruction [31– 0] Instruction memory Instruction [20– 16] 1 M u Instruction [15– 11] x 0 RegDst Instruction [15– 0] Read register 1 Read register 2 Read data 1 Read Write data 2 register Write Registers data 16 Sign 32 extend 1 M u x 0 Shift left 2 MemWrite ALUSrc 1 M u x 0 ALU control Zero ALU ALU result MemtoReg Address Read data Data Write memory data 1 M u x 0 MemRead Instruction [5– 0] ALUOp Single-cycle CPU CS465 40 D. Barbara Adding Control Unit 0 M u x ALU Add result Add 4 Instruction [31 26] Control Instruction [25 21] PC Read address Instruction memory Instruction [15 11] Shift left 2 RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite PCSrc Read register 1 Instruction [20 16] Instruction [31– 0] 1 0 M u x 1 Read data 1 Read register 2 Registers Read Write data 2 register 0 M u x 1 Write data Zero ALU ALU result Address Write data Instruction [15 0] 16 Sign extend Read data Data memory 1 M u x 0 32 ALU control Instruction [5 0] Single-cycle CPU CS465 41 D. Barbara add $t1,$t2,$t3 Single-cycle CPU CS465 42 D. Barbara lw $t1,offset($t2) Single-cycle CPU CS465 43 D. Barbara beq $t1,$t2,offset Single-cycle CPU CS465 44 D. Barbara An Abstract View of Critical Path Register file and ideal memory: The CLK input is a factor ONLY during write operation During read operation, behave as combinational logic: Address valid => Output valid after “access time.” Ideal Instruction Memory Critical Path (Load Operation) = PC’s Clk-to-Q + Instruction Memory’s Access Time + Register File’s Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew Data 32 Address Ideal Data Data Memory In Instruction Rd Rs 5 5 Instruction Address Rt 5 Imm 16 A 32 Rw Ra Rb PC 32 32-bit Registers 32 ALU Next Address B Clk Clk Single-cycle CPU 32 CS465 Clk 45 D. Barbara Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) PCSrc Add ALU Add result 4 RegWrite Instruction [25– 21] PC Read address Instruction [31– 0] Instruction memory Instruction [20– 16] 1 M u Instruction [15– 11] x 0 RegDst Instruction [15– 0] Read register 1 Read register 2 Read data 1 Read Write data 2 register Write data Registers 16 Sign 32 extend 1 M u x 0 Shift left 2 MemWrite ALUSrc 1 M u x 0 ALU control Zero ALU ALU result MemtoReg Address Read data Data Write data memory 1 M u x 0 MemRead Instruction [5– 0] ALUOp Single-cycle CPU CS465 46 D. Barbara Single Cycle Implementation Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) Instructi Instructi Register ALU on class on Fetch Access Register/ Memory Access R-Type X X X R Load Store Branch X X X X X X M M X X X Jump X Single-cycle CPU CS465 Register Access X 47 D. Barbara How to Design a Processor 1. Analyze instruction set datapath requirements 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Single-cycle CPU CS465 48 D. Barbara Step 4: Datapath: RTL -> Control Instruction<31:0> Rs Rd <0:15> Rt <11:15> Op Fun <16:20> Adr <21:25> <21:25> Inst Memory Imm16 Control Branch RegWr RegDst ALUSrc ALUop MemRd MemWr MemtoReg Zero DATA PATH Single-cycle CPU CS465 49 D. Barbara Control Selecting the operations to perform (ALU, read/write, etc.) Controlling the flow of data (multiplexor inputs) Design the ALU Control Unit Design the Main Control Unit Information comes from the 32 bits of the instruction Example: add $8, $17, $18 Instruction Format: 000000 10001 10010 01000 00000 100000 op rs rt rd shamt funct ALU's operation based on instruction type and function code Single-cycle CPU CS465 50 D. Barbara ALU Control What should the ALU do with the given instruction Example: lw $1, 100($2) 35 2 1 op rs rt 100 16 bit offset ALU control input 0000 AND Recall design of ALU from Chapter 3 0001 OR 0010 add 0110 subtract 0111 set-on-less-than Single-cycle CPU CS465 51 D. Barbara ALU Control Design Instructi on ALUOp opcode Instruction operation Funct field LW 00 Load word xxxxxx Add 0010 SW 00 Store word xxxxxx Add 0010 BEQ 01 Branch eq xxxxxx Subtract 0110 R-type 10 Add 100000 Add 0010 R-type 10 Subtract 100010 Subtract 0110 R-type 10 AND 100100 And 0000 R-type 10 OR 100101 Or 0001 R-type 10 Set on less than 101010 Set on less than 0111 Single-cycle CPU CS465 Desired ALU action ALU control input 52 D. Barbara Control Implementation Must describe hardware to compute 4-bit ALU control input Given instruction type 00 = lw, sw 01 = beq 10 = arithmetic Function code for arithmetic ALUOp computed from instruction type Describe it using a truth table (can turn into gates): ALUOp Funct field ALUOp1 ALUOp0 F5 F4 F3 F2 F1 0 0 X X X X X X 1 X X X X X 1 X X X 0 0 0 1 X X X 0 0 1 1 X X X 0 1 0 1 X X X 0 1 0 1 X X X 1 0 1 Single-cycle CPU CS465 Operation F0 X X 0 0 0 1 0 0010 0110 0010 0110 0000 0001 0111 53 D. Barbara Design the Main Control Unit Seven control signals RegDst RegWrite ALUSrc PCSrc MemRead MemWrite MemtoReg Single-cycle CPU CS465 54 D. Barbara Control Signals 1. 2. 3. 4. 5. 6. 7. RegDst = 0 => Register destination number for the Write register comes from the rt field (bits 20-16) RegDst = 1 => Register destination number for the Write register comes from the rd field (bits 15-11) RegWrite = 1 => The register on the Write register input is written with the data on the Write data input (at the next clock edge) ALUSrc = 0 => The second ALU operand comes from Read data 2 ALUSrc = 1 => The second ALU operand comes from the signextension unit PCSrc = 0 => The PC is replaced with PC+4 PCSrc = 1 => The PC is replaced with the branch target address MemtoReg = 0 => The value fed to the register write data input comes from the ALU MemtoReg = 1 => The value fed to the register write data input comes from the data memory MemRead = 1 => Read data memory MemWrite = 1 => Write data memory Single-cycle CPU CS465 55 D. Barbara R-format Instructions RegDst = 1 RegWrite = 1 ALUSrc = 0 Branch = 0 MemtoReg = 0 MemRead = 0 MemWrite = 0 ALUOp = 10 Single-cycle CPU CS465 56 D. Barbara Memory Access Instructions Load word Store Word RegDst = 0 RegDst = X RegWrite = 1 RegWrite = 0 ALUSrc = 1 ALUSrc = 1 Branch = 0 Branch = 0 MemtoReg = 1 MemtoReg = X MemRead = 1 MemRead = 0 MemWrite = 0 MemWrite = 1 ALUOp = 00 ALUOp = 00 Single-cycle CPU CS465 57 D. Barbara Branch on Equal RegDst = X RegWrite = 0 ALUSrc = 0 Branch = 1 MemtoReg = X MemRead = 0 MemWrite = 0 ALUOp = 01 Single-cycle CPU CS465 58 D. Barbara Control 0 M u x Add Add 4 Instruction [31– 26] Control Instruction [25– 21] PC Read address Instruction memory Instruction [15– 11] 1 Shift left 2 RegDst Branch MemRead MemtoReg ALUOp MemWrite ALUSrc RegWrite Read register 1 Instruction [20– 16] Instruction [31– 0] ALU result 0 M u x 1 Read data 1 Read register 2 Registers Read Write data 2 register 0 M u x 1 Write data Zero ALU ALU result Address Write data Instruction [15– 0] 16 Sign extend Read data Data memory 1 M u x 0 32 ALU control Instruction [5– 0] Memto- Reg Mem Mem Instruction RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUp0 R-format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1 Single-cycle CPU CS465 59 D. Barbara Step 5: Implementing Control Simple combinational logic (truth tables) Inputs Op5 Op4 Op3 Op2 Op1 ALUOp Op0 ALU control block ALUOp0 Outputs ALUOp1 R-format Operation2 F3 sw beq RegDst ALUSrc MemtoReg Operation F2 Iw Operation1 RegWrite F (5– 0) F1 MemRead Operation0 MemWrite F0 Branch ALUOp1 ALUOpO ALU Control Unit Main Control Unit Single-cycle CPU CS465 60 D. Barbara Our Simple Control Structure All of the logic is combinational We wait for everything to settle down, and the right thing to be done ALU might not produce “right answer” right away We use write signals along with clock to determine when to write Cycle time determined by length of the longest path State ele ment 1 Co mbinational logic State element 2 Clock cycle Single-cycle CPU CS465 61 D. Barbara Summary 5 steps to design a processor MIPS makes it easier 1. Analyze instruction set => datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer 5. Assemble the control logic Instructions same size Source registers always in same place Immediates same size, location Operations always on registers/immediates Single cycle datapath => CPI=1, Clock Cycle Time => long Single-cycle CPU CS465 62 D. Barbara Next Lecture Topic: Multiple cycle implementation of CPU Reading Ch5 Patterson and Hennessy Single-cycle CPU CS465 63 D. Barbara