inst.eecs.berkeley.edu/~cs61c CS61CL : Machine Structures Lecture #9 – Single Cycle CPU Design 2009-07-22 Jeremy Huddleston CS61CL L09 Single Cycle CPU Design (1) Huddleston, Summer 2009 © UCB Finite State Machine Example: 3 ones… FSM to detect the occurrence of 3 consecutive 1’s in the input. Draw the FSM… Assume state transitions are controlled by the clock: on each clock cycle the machine checks the inputs and moves to a new state and produces a new output… CS61CL L09 Single Cycle CPU Design (2) Huddleston, Summer 2009 © UCB Hardware Implementation of FSM … Therefore a register is needed to hold the a representation of which state the machine is in. Use a unique bit pattern for each state. + = Combinational logic circuit is used to implement a function maps from present state and input to next state and output. CS61CL L09 Single Cycle CPU Design (3) ? Huddleston, Summer 2009 © UCB Register Details…What’s inside? • n instances of a “Flip-Flop” • Flip-flop name because the output flips and flops between and 0,1 • D is “data”, Q is “output” • Also called “d-type Flip-Flop” CS61CL L09 Single Cycle CPU Design (4) Huddleston, Summer 2009 © UCB Adder/Subtracter – One-bit adder LSB… CS61CL L09 Single Cycle CPU Design (5) Huddleston, Summer 2009 © UCB Adder/Subtracter – One-bit adder (1/2)… CS61CL L09 Single Cycle CPU Design (6) Huddleston, Summer 2009 © UCB Adder/Subtracter – One-bit adder (2/2)… CS61CL L09 Single Cycle CPU Design (7) Huddleston, Summer 2009 © UCB N 1-bit adders 1 N-bit adder b0 + CS61CL L09 Single Cycle CPU Design (8) + + Huddleston, Summer 2009 © UCB Administrivia frequency • Midterm handed back today CS61CL L09 Single Cycle CPU Design (9) Huddleston, Summer 2009 © UCB Five Components of a Computer Computer Processor Control Datapath Memory (passive) (where programs, data live when running) CS61CL L09 Single Cycle CPU Design (10) Devices Input Output Keyboard, Mouse Disk (where programs, data live when not running) Display, Printer Huddleston, Summer 2009 © UCB The CPU • Processor (CPU): the active part of the computer, which does all the work (data manipulation and decisionmaking) • Datapath: portion of the processor which contains hardware necessary to perform operations required by the processor (the brawn) • Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain) CS61CL L09 Single Cycle CPU Design (11) Huddleston, Summer 2009 © UCB Stages of the Datapath : Overview • Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient • Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath • smaller stages are easier to design • easy to optimize (change) one stage without touching the others CS61CL L09 Single Cycle CPU Design (12) Huddleston, Summer 2009 © UCB Stages of the Datapath (1/5) • There is a wide variety of MIPS instructions: so what general steps do they have in common? • Stage 1: Instruction Fetch • no matter what the instruction, the 32-bit instruction must first be fetched from memory (the cache-memory hierarchy) • also, this is where we Increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4) CS61CL L09 Single Cycle CPU Design (13) Huddleston, Summer 2009 © UCB Stages of the Datapath (2/5) • Stage 2: Instruction Decode • upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data) • first, read the opcode to determine instruction type and field lengths • second, read in data from all necessary registers - for add, read two registers - for addi, read one register - for jal, no reads necessary CS61CL L09 Single Cycle CPU Design (14) Huddleston, Summer 2009 © UCB Stages of the Datapath (3/5) • Stage 3: ALU (Arithmetic-Logic Unit) • the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt) • what about loads and stores? - lw $t0, 40($t1) - the address we are accessing in memory = the value in $t1 PLUS the value 40 - so we do this addition in this stage CS61CL L09 Single Cycle CPU Design (15) Huddleston, Summer 2009 © UCB Stages of the Datapath (4/5) • Stage 4: Memory Access • actually only the load and store instructions do anything during this stage; the others remain idle during this stage or skip it all together • since these instructions have a unique step, we need this extra stage to account for them • as a result of the cache system, this stage is expected to be fast CS61CL L09 Single Cycle CPU Design (16) Huddleston, Summer 2009 © UCB Stages of the Datapath (5/5) • Stage 5: Register Write • most instructions write the result of some computation into a register • examples: arithmetic, logical, shifts, loads, slt • what about stores, branches, jumps? - don’t write anything into a register at the end - these remain idle during this fifth stage or skip it all together CS61CL L09 Single Cycle CPU Design (17) Huddleston, Summer 2009 © UCB +4 1. Instruction Fetch ALU Data memory rd rs rt registers PC instruction memory Generic Steps of Datapath imm 2. Decode/ Register Read CS61CL L09 Single Cycle CPU Design (18) 3. Execute 4. Memory 5. Reg. Write Huddleston, Summer 2009 © UCB Datapath Walkthroughs (1/3) • add $r3,$r1,$r2 # r3 = r1+r2 • Stage 1: fetch this instruction, inc. PC • Stage 2: decode to find it’s an add, then read registers $r1 and $r2 • Stage 3: add the two values retrieved in Stage 2 • Stage 4: idle (nothing to write to memory) • Stage 5: write result of Stage 3 into register $r3 CS61CL L09 Single Cycle CPU Design (19) Huddleston, Summer 2009 © UCB reg[1]+reg[2] reg[2] ALU Data memory 2 reg[1] imm add r3, r1, r2 +4 3 1 registers PC instruction memory Example: add Instruction CS61CL L09 Single Cycle CPU Design (20) Huddleston, Summer 2009 © UCB Datapath Walkthroughs (3/3) • sw $r3, 17($r1) • Stage 1: fetch this instruction, inc. PC • Stage 2: decode to find it’s a sw, then read registers $r1 and $r3 • Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) • Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3 • Stage 5: idle (nothing to write into a register) CS61CL L09 Single Cycle CPU Design (21) Huddleston, Summer 2009 © UCB 17 CS61CL L09 Single Cycle CPU Design (22) reg[1] reg[1]+17 reg[3] ALU Data MEM[r1+17]<=r3 memory imm x 1 SW r3, 17(r1) +4 3 registers PC instruction memory Example: sw Instruction Huddleston, Summer 2009 © UCB Why Five Stages? (1/2) • Could we have a different number of stages? • Yes, and other architectures do • So why does MIPS have five if instructions tend to idle for at least one stage? • The five stages are the union of all the operations needed by all the instructions. • There is one type of instruction that uses all five stages: the load CS61CL L09 Single Cycle CPU Design (23) Huddleston, Summer 2009 © UCB Why Five Stages? (2/2) • lw $r3, 17($r1) • Stage 1: fetch this instruction, inc. PC • Stage 2: decode to find it’s a lw, then read register $r1 • Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) • Stage 4: read value from memory address compute in Stage 3 • Stage 5: write value found in Stage 4 into register $r3 CS61CL L09 Single Cycle CPU Design (24) Huddleston, Summer 2009 © UCB ALU MEM[r1+17] 17 reg[1]+17 Data memory imm x 1 reg[1] LW r3, 17(r1) +4 3 registers PC instruction memory Example: lw Instruction CS61CL L09 Single Cycle CPU Design (25) Huddleston, Summer 2009 © UCB Datapath Summary • The datapath based on data transfers required to perform instructions ALU Data memory rd rs rt registers PC instruction memory • A controller causes the right transfers to happen imm +4 opcode, funct Controller CS61CL L09 Single Cycle CPU Design (26) Huddleston, Summer 2009 © UCB CPU clocking (1/2) For each instruction, how do we control the flow of information though the datapath? • Single Cycle CPU: All stages of an instruction are completed within one long clock cycle. • The clock cycle is made sufficient long to allow each instruction to complete all stages without interruption and within one cycle. 1. Instruction Fetch 2. Decode/ Register Read CS61CL L09 Single Cycle CPU Design (27) 3. Execute 4. Memory 5. Reg. Write Huddleston, Summer 2009 © UCB CPU clocking (2/2) For each instruction, how do we control the flow of information though the datapath? • Multiple-cycle CPU: Only one stage of instruction per clock cycle. • The clock is made as long as the slowest stage. 1. Instruction Fetch 2. Decode/ 3. Execute 4. Memory Register Read 5. Reg. Write • Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped). CS61CL L09 Single Cycle CPU Design (28) Huddleston, Summer 2009 © UCB How to Design a Processor: step-by-step • 1. Analyze instruction set architecture (ISA) datapath requirements • meaning of each instruction is given by the register transfers • datapath must include storage element for ISA registers • datapath must support each register transfer • 2. Select set of datapath components and establish clocking methodology • 3. Assemble datapath meeting requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. • 5. Assemble the control logic (hard part!) CS61CL L09 Single Cycle CPU Design (29) Huddleston, Summer 2009 © UCB Review: The MIPS Instruction Formats • All MIPS instructions are 32 bits long. 3 formats: • R-type • I-type 31 26 op rs 6 bits 31 26 31 5 bits 21 rs 6 bits 16 rt 5 bits op • J-type 21 5 bits 11 6 rd shamt funct 5 bits 5 bits 6 bits 16 0 address/immediate rt 5 bits 16 bits 26 op 6 bits 0 0 target address 26 bits • The different fields are: • • • • • • op: operation (“opcode”) of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the “op” field address / immediate: address offset or immediate value target address: target address of jump instruction CS61CL L09 Single Cycle CPU Design (30) Huddleston, Summer 2009 © UCB The MIPS-lite Subset for today • ADDU and SUBU •addu rd,rs,rt •subu rd,rs,rt • OR Immediate: 31 op 31 31 6 bits 0 rd shamt funct 5 bits 5 bits 6 bits 0 16 bits 16 rt 5 bits 6 immediate 5 bits 21 rs 11 16 rt 5 bits 26 op 5 bits 21 rs 6 bits 16 rt 5 bits 26 op •lw rt,rs,imm16 •sw rt,rs,imm16 21 rs 6 bits •ori rt,rs,imm16 • LOAD and STORE Word 26 0 immediate 5 bits 16 bits • BRANCH: •beq rs,rt,imm16 31 26 op 6 bits CS61CL L09 Single Cycle CPU Design (31) 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits Huddleston, Summer 2009 © UCB ALU Needs for MIPS-lite + Rest of MIPS • Addition, subtraction, logical OR, ==: ADDU R[rd] = R[rs] + R[rt]; ... SUBU R[rd] = R[rs] – R[rt]; ... ORI R[rt] = R[rs] | zero_ext(Imm16)... BEQ if ( R[rs] == R[rt] )... • Test to see if output == 0 for any ALU operation gives == test. How? • P&H also adds AND, Set Less Than (1 if A < B, 0 otherwise) • ALU follows chap 5 CS61CL L09 Single Cycle CPU Design (32) Huddleston, Summer 2009 © UCB How to Design a Processor: step-by-step • 1. Analyze instruction set architecture (ISA) datapath requirements • meaning of each instruction is given by the register transfers • datapath must include storage element for ISA registers • datapath must support each register transfer • 2. Select set of datapath components and establish clocking methodology • 3. Assemble datapath meeting requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. • 5. Assemble the control logic (hard part!) CS61CL L09 Single Cycle CPU Design (33) Huddleston, Summer 2009 © UCB What Hardware Is Needed? (1/2) • PC: a register which keeps track of memory addr of the next instruction • General Purpose Registers • used in Stages 2 (Read) and 5 (Write) • MIPS has 32 of these • Memory • used in Stages 1 (Fetch) and 4 (R/W) • cache system makes these two stages as fast as the others, on average CS61CL L09 Single Cycle CPU Design (34) Huddleston, Summer 2009 © UCB What Hardware Is Needed? (2/2) • ALU • used in Stage 3 • something that performs all necessary functions: arithmetic, logicals, etc. • we’ll design details later • Miscellaneous Registers • In implementations with only one stage per clock cycle, registers are inserted between stages to hold intermediate data and control signals as they travels from stage to stage. • Note: Register is a general purpose term meaning something that stores bits. Not all registers are in the “register file”. CS61CL L09 Single Cycle CPU Design (35) Huddleston, Summer 2009 © UCB Combinational Logic Elements (Building Blocks) • Adder CarryIn A B • MUX Adder 32 Sum 32 CarryOut 32 Select B 32 MUX A 32 • ALU A OP ALU B 32 Y 32 32 Result 32 CS61CL L09 Single Cycle CPU Design (36) Huddleston, Summer 2009 © UCB Storage Element: Idealized Memory • Memory (idealized) Write Enable Address • One input bus: Data In • One output bus: Data Out • Memory word is found by: Data In 32 Clk DataOut 32 • Address selects the word to put on Data Out • Write Enable = 1: address selects the memory word to be written via the Data In bus • Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - Address valid Data Out valid after “access time.” CS61CL L09 Single Cycle CPU Design (37) Huddleston, Summer 2009 © UCB Storage Element: Register (Building Block) • Similar to D Flip Flop except • N-bit input and output • Write Enable input • Write Enable: • negated (or deasserted) (0): Data Out will not change Write Enable Data In Data Out N N clk • asserted (1): Data Out will become Data In on positive edge of clock CS61CL L09 Single Cycle CPU Design (38) Huddleston, Summer 2009 © UCB Storage Element: Register File • Register File consists of 32 registers: • Two 32-bit output busses: busA and busB • One 32-bit input bus: busW RWRA RB Write Enable 5 5 5 busA busW 32 32 32-bit 32 Registers busB • Register is selected by: Clk 32 • RA (number) selects the register to put on busA (data) • RB (number) selects the register to put on busB (data) • RW (number) selects the register to be written via busW (data) when Write Enable is 1 • Clock input (clk) • The clk input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - RA or RB valid busA or busB valid after “access time.” CS61CL L09 Single Cycle CPU Design (39) Huddleston, Summer 2009 © UCB How to Design a Processor: step-by-step • 1. Analyze instruction set architecture (ISA) datapath requirements • meaning of each instruction is given by the register transfers • datapath must include storage element for ISA registers • datapath must support each register transfer • 2. Select set of datapath components and establish clocking methodology • 3. Assemble datapath meeting requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. • 5. Assemble the control logic (hard part!) CS61CL L09 Single Cycle CPU Design (40) Huddleston, Summer 2009 © UCB Overview of the Instruction Fetch Unit • The common RTL operations • Fetch the Instruction: mem[PC] • Update the program counter: - Sequential Code: PC PC + 4 - Branch and Jump: PC “something else” clk PC Next Address Logic Address Instruction Memory CS61CL L09 Single Cycle CPU Design (41) Instruction Word 32 Huddleston, Summer 2009 © UCB Add & Subtract • R[rd] = R[rs] op R[rt] Ex.: addU rd,rs,rt • Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields • ALUctr and RegWr: control logic after decoding the instruction 31 26 21 16 11 op rs rt rd 6 bits 5 bits 5 bits 5 bits • … Already defined the register file & ALU Rd Rs Rt RegWr 5 5 5 Rw Ra Rb 32 32-bit Registers clk CS61CL L09 Single Cycle CPU Design (42) funct 6 bits 0 ALUctr busA 32 busB ALU busW 32 6 shamt 5 bits Result 32 32 Huddleston, Summer 2009 © UCB Logical Operations with Immediate • R[rt] = R[rs] op ZeroExt[imm16] 31 26 op 31 6 bits 21 rs 16 rt 5 bits 0 immediate 5 bits 16 15 16 bits 0 immediate 0000000000000000 16 bits 16 bits But we’re writing to Rt register?? ALUctr RegWr Rd Rs Rt 5 5 Rw Ra Rb busA RegFile busB clk CS61CL L09 Single Cycle CPU Design (43) 32 ALU busW 5 32 32 Huddleston, Summer 2009 © UCB Logical Operations with Immediate • R[rt] = R[rs] op ZeroExt[imm16] 31 26 21 op rs 31 6 bits Rd Rt RegWr 5 bits 16 15 16 bits 0 immediate 16 bits What about Rt register read?? Rs Rt 5 ALUctr 5 Rw Ra Rb busA RegFile busB 32 16 ZeroExt clk imm16 immediate 32 0 ALU 32 rt 5 bits 0 5 0 0000000000000000 16 bits RegDst 1 16 32 1 32 ALUSrc • Already defined 32-bit MUX; Zero Ext? CS61CL L09 Single Cycle CPU Design (44) Huddleston, Summer 2009 © UCB Load Operations • R[rt] = Mem[R[rs] + SignExt[imm16]] Example: lw rt,rs,imm16 31 26 op 6 bits 21 16 rs 5 bits rt 5 bits 0 immediate 16 bits RegDst Rd Rt 1 RegWr 5 Rs Rt 5 ALUctr 5 Rw Ra Rb busA RegFile busB 32 imm16 CS61CL L09 Single Cycle CPU Design (45) 16 ZeroExt clk 32 0 ALU 32 0 32 1 32 ALUSrc Huddleston, Summer 2009 © UCB Load Operations • R[rt] = Mem[R[rs] + SignExt[imm16]] Example: lw rt,rs,imm16 31 26 op 6 bits 21 rs 5 bits 16 0 rt 5 bits immediate 16 bits MemtoReg ALUctr MemWr RegDst Rd Rt 1 RegWr 0 Rs Rt 5 5 5 Rw Ra Rb RegFile 32 busB 32 imm16 16 ExtOp CS61CL L09 Single Cycle CPU Design (46) Extender clk 32 0 1 32 ALU busW busA ? 32 Data In ALUSrc clk 32 0 WrEn Adr Data Memory 1 Huddleston, Summer 2009 © UCB Store Operations • Mem[ R[rs] + SignExt[imm16] ] = R[rt] Ex.: sw rt, rs, imm16 31 26 op 6 bits 21 rs 5 bits 16 0 rt 5 bits immediate 16 bits MemtoReg ALUctr MemWr RegDst Rd Rt 1 RegWr 0 Rs Rt 5 5 5 Rw Ra Rb RegFile 32 busB 32 imm16 16 ExtOp CS61CL L09 Single Cycle CPU Design (47) Extender clk 32 0 ALU busW busA 32 0 32 WrEn Adr 1 32 Data In ALUSrc clk Data Memory 1 Huddleston, Summer 2009 © UCB Store Operations • Mem[ R[rs] + SignExt[imm16] ] = R[rt] Ex.: sw rt, rs, imm16 31 26 op 6 bits 21 rs 5 bits 16 0 rt 5 bits immediate 16 bits MemtoReg ALUctr MemWr RegDst Rd Rt 1 RegWr 0 Rs Rt 5 5 5 Rw Ra Rb RegFile 32 busB 32 imm16 16 ExtOp CS61CL L09 Single Cycle CPU Design (48) Extender clk 32 0 ALU busW busA 32 0 32 WrEn Adr 1 32 Data In ALUSrc clk Data Memory 1 Huddleston, Summer 2009 © UCB The Branch Instruction 31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits immediate 16 bits 0 beq rs, rt, imm16 • mem[PC] Fetch the instruction from memory • Equal = R[rs] == R[rt] Calculate branch condition • if (Equal) Calculate the next instruction’s address - PC = PC + 4 + ( SignExt(imm16) x 4 ) else - PC = PC + 4 CS61CL L09 Single Cycle CPU Design (49) Huddleston, Summer 2009 © UCB Datapath for Branch Operations • beq rs, rt, imm16 Datapath generates condition (equal) 31 26 21 op rs 6 bits 5 bits Inst Address 16 rt 5 bits immediate 16 bits Equal nPC_sel Adder 4 00 RegWr busW clk clk 5 ALUctr Rs Rt 5 5 Rw Ra Rb busA RegFile busB 32 = ALU PC Mux Adder PC Ext imm16 0 32 32 Already have mux, adder, need special sign extender for PC, need equal compare (sub?) CS61CL L09 Single Cycle CPU Design (50) Huddleston, Summer 2009 © UCB Putting it All Together:A Single Cycle Datapath RegDst 00 32 5 Rs Rt 5 5 Rw Ra Rb RegFile busB 16 Extender imm16 busA 32 clk clk MemtoReg ALUctr MemWr 32 = ALU PC PC Ext Adder Mux busW Equal 0 RegWr Adder 4 Rt Rd Imm16 Rd Rt 1 Instruction<31:0> <0:15> nPC_sel Rs <11:15> Adr <16:20> <21:25> Inst Memory 0 32 1 32 Data In clk 32 0 WrEn Adr Data Memory 1 imm16 ExtOp CS61CL L09 Single Cycle CPU Design (51) ALUSrc Huddleston, Summer 2009 © UCB An Abstract View of the Implementation Ideal Instruction Memory PC clk 32 Instruction Rd Rs Rt 5 5 5 Rw Ra Rb Register File clk Control Signals Conditions A 32 ALU Next Address Instruction Address Control B 32 32 Data Addr Ideal Data Memory Data Out Data In clk Datapath CS61CL L09 Single Cycle CPU Design (52) Huddleston, Summer 2009 © UCB Summary: A Single Cycle Datapath Instruction<31:0> Rs Rt Rd Imm16 ALUctr MemtoReg Rd Rt 1 0 RegWr 5 Rs Rt 5 busB 32 16 Extender clk imm16 busA ExtOp CS61CL L09 Single Cycle CPU Design (53) 32 MemWr = ALU RegFile 32 zero 5 Rw Ra Rb busW <0:15> RegDst <11:15> clk <16:20> instr fetch unit nPC_sel <21:25> • We have everything except control signals 0 32 1 32 Data In clk 32 0 WrEn Adr Data Memory 1 ALUSrc Huddleston, Summer 2009 © UCB “And In conclusion…” • N-bit adder-subtractor done using N 1bit adders with XOR gates on input • XOR serves as conditional inverter • CPU design involves Datapath,Control • Datapath in MIPS involves 5 CPU stages 1. Instruction Fetch 2. Instruction Decode & Register Read 3. ALU (Execute) 4. Memory 5. Register Write CS61CL L09 Single Cycle CPU Design (54) Huddleston, Summer 2009 © UCB Bonus slides • These are extra slides that used to be included in lecture notes, but have been moved to this, the “bonus” area to serve as a supplement. • The slides will appear in the order they would have in the normal presentation CS61CL L09 Single Cycle CPU Design (55) Huddleston, Summer 2009 © UCB Review of Timing Terms • Clock (CLK) - steady square wave that synchronizes system • Setup Time - when the input must be stable before the rising edge of the CLK • Hold Time - when the input must be stable after the rising edge of the CLK • “CLK-to-Q” Delay - how long it takes the output to change, measured from the rising edge • Flip-flop - one bit of state that samples every rising edge of the CLK • Register - several bits of state that samples on rising edge of CLK or on LOAD CS61CL L09 Single Cycle CPU Design (56) Huddleston, Summer 2009 © UCB What about overflow? • Consider a 2-bit signed # & overflow: •10 •11 •00 •01 = -2 + -2 or -1 = -1 + -2 only = 0 NOTHING! = 1 + 1 only • Highest adder ± # • C1 = Carry-in = Cin, C2 = Carry-out = Cout • No Cout or Cin NO overflow! • Cin and Cout NO overflow! What • C but no C A,B both > 0, overflow! in out op? • Cout but no Cin A,B both < 0, overflow! CS61CL L09 Single Cycle CPU Design (57) Huddleston, Summer 2009 © UCB What about overflow? • Consider a 2-bit signed # & overflow: 10 11 00 01 = -2 + -2 or -1 = -1 + -2 only = 0 NOTHING! = 1 + 1 only ± # • Overflows when… • Cin, but no Cout A,B both > 0, overflow! • Cout, but no Cin A,B both < 0, overflow! CS61CL L09 Single Cycle CPU Design (58) Huddleston, Summer 2009 © UCB Extremely Clever Subtractor CS61CL L09 Single Cycle CPU Design (59) Huddleston, Summer 2009 © UCB Datapath Walkthroughs (2/3) • slti $r3,$r1,17 • Stage 1: fetch this instruction, inc. PC • Stage 2: decode to find it’s an slti, then read register $r1 • Stage 3: compare value retrieved in Stage 2 with the integer 17 • Stage 4: idle • Stage 5: write the result of Stage 3 in register $r3 CS61CL L09 Single Cycle CPU Design (60) Huddleston, Summer 2009 © UCB 17 reg[1]<17? ALU Data memory imm x 1 reg[1] slti r3, r1, 17 +4 3 registers PC instruction memory Example: slti Instruction CS61CL L09 Single Cycle CPU Design (61) Huddleston, Summer 2009 © UCB Clocking Methodology Clk . . . . . . . . . . . . • Storage elements clocked by same edge • Being physical devices, flip-flops (FF) and combinational logic have some delays • Gates: delay from input change to output change • Signals at FF D input must be stable before active clock edge to allow signal to travel within the FF (set-up time), and we have the usual clock-to-Q delay • “Critical path” (longest path through logic) determines length of clock period CS61CL L09 Single Cycle CPU Design (62) Huddleston, Summer 2009 © UCB An Abstract View of the Critical Path Ideal Instruction Memory PC clk 32 Instruction Rd Rs Rt 5 5 5 Rw Ra Rb Register File clk CS61CL L09 Single Cycle CPU Design (63) (Assumes a fast controller) A 32 ALU Next Address Instruction Address Critical Path (Load Instruction) = Delay clock through PC (FFs) + Instruction Memory’s Access Time + Register File’s Access Time, + ALU to Perform a 32-bit Add + Data Memory Access Time + Stable Time for Register File Write B 32 32 Data Addr Ideal Data Memory Data In clk Huddleston, Summer 2009 © UCB