CS232 Midterm Exam 2 November 12, 2003 Name: Darth Vader Section: Imperial Star Ship This exam has 7 pages including the pipelined datapath diagram on the last page, which you are free to tear off. You have 50 minutes, so budget your time carefully! No written references or calculators are allowed. To make sure you receive credit, please write clearly and show your work. We will not answer questions regarding course material. Question Maximum 1 50 2 20 3 30 Total 100 1 Your Score Question 1: Multicycle CPU implementation (50 points) Consider extending the MIPS architecture with the instruction below, which adds three registers together and stores the result in a register. add3 rd, rs, rt, ru # rd = rs + rt + ru This will use the same format as R-type instructions---shown here for reference---where the shamt field is used to hold ru. Field Bits op 31-26 rs 25-21 rt 20-16 rd 15-11 shamt/ru 10-6 func 5-0 An example of the usage of the add3 instruction is shown in part (c) of question 1. Part (a) The multicycle datapath from lecture appears below. Show what changes are needed to support add3. You should only add wires and muxes to the datapath; do not modify the main functional units themselves (the memory, register file and ALU). Try to keep your diagram neat! (15 points) Note: While we’re primarily concerned about correctness, full points will only be rewarded to solutions that use a minimal number of cycles and do not lengthen the clock cycle. Assume that the ALU, Memory and Register file all take 2ns, and everything else is instantaneous. PCWrite ALUSrcA PC RegSrcB IorD 2 0 RegDst 0 M u x 1 RegWrite M u x MemRead Address 1 IRWrite 0 Memory Write Mem data Data MemWrite 0 [31-26] [25-21] [20-16] [15-11] [15-0] [10-6] Instr register Memory data register M u x 1 Read reg 1 Read reg 2 Write register Read data 1 A 0 4 1 2 Write Registers data 3 ALUSrcB Sign extend Shift left 2 1 MemToReg 2 ALU Zero Result B Read data 2 0 M u x 1 0 ALU Out M u x 1 PCSrc ALUOp Question 1 continued Part (b) Complete this finite state machine diagram for the add3 instruction. Control values not shown in each stage are assumed to be 0. Remember to account for any control signals that you added or modified in the previous part of the question! (20 points) Instruction fetch and PC increment IorD = 0 MemRead = 1 IRWrite = 1 ALUSrcA = 0 ALUSrcB = 01 ALUOp = ADD PCSource = 0 PCWrite = 1 Register fetch and branch computation Op = BEQ ALUSrcA = 1 ALUSrcB = 00 ALUOp = SUB PCSource = 1 PCWrite = Zero ALUSrcA = 0 ALUSrcB = 11 ALUOp = ADD Op = R-type Op = ADD3 ALUSrcA = 01 ALUSrcB = 00 ALUOp = ADD RegSrcB = 1 ALUSrcA = 1 ALUSrcB = 00 ALUOp = func Branch completion R-type execution RegDst = 1 MemToReg = 0 RegWrite = 1 Writeback Effective address computation Op = LW/SW ALUSrcA = 1 Op = SW ALUSrcB = 10 ALUOp = ADD Memory write IorD = 1 MemWrite = 1 Op = LW IorD = 1 MemRead = 1 ALUSrcA = 10 ALUSrcB = 00 ALUOp = ADD Memory read RegDst = 0 MemToReg = 1 RegWrite = 1 RegDst = 1 MemToReg = 0 RegWrite = 1 3 lw register write Question 1, continued: Part (c) The add3 instruction can be used in place of two dependent add instruction, reducing the number of instructions that need to be executed. Below are two functionally equivalent programs; the second of which uses the add3 instruction: Program 1 5 5 5 4 4 4 lw lw lw add add sw Program 2 $t0, 0($a0) $t1, 4($a0) $t2, 8($a0) $t3, $t0, $t1 $t3, $t2, $t3 $t3, 0($a1) 5 5 5 5 4 lw lw lw add3 sw $t0, 0($a0) $t1, 4($a0) $t2, 8($a0) $t3, $t0, $t1, $t2 $t3, 0($a1) Assuming the datapath and control that you implemented in parts (a) and (b), how much faster (or slower) is program 2 than program 1? You may leave your answer as a fraction. (10 points) Program 2 is faster by 3/24 = 12.5% Part (d) Implementing the add3 instruction in the pipelined datapath we discussed in class (shown on page 7) would be much more difficult. Give two specific reasons why. (5 points) There are two main structural hazards •The ALU is used twice (would need a second adder) •There are 3 register reads (would need a 3rd read port) 4 Question 2: Pipelined Datapath (20 points) The pipeline datapath diagram on page 7 (feel free to tear it off) has a number of wires/buses annotated. For each annotation (5 points per annotation): • describe the purpose of the wire/bus and • give an instruction or (instruction sequence) that requires it. 1 Purpose:This bus sends PC+4 back to MUX that updates PC Instr: Everyone in IF stage e.g., add $a1,$t1, $s1 2 Purpose:This bus sends extended immediate value to ALU Instr: lw and sw and addi e.g., lw $t1, 8($a0) 3 Purpose:Data Bus to send Data to Memory Instr: Stores e.g., sw $a2, 0($t0) 4 Purpose:Forward Data from WB stage to EX stage Instr: Any R-Type, Stores and Loads e.g., lw $t1, 0($a0) lw $t3, 4($a0) add $s1, $t1, $t2 5 Question 3: Pipelining (30 points) Part (a) instruction latency. Pipelining primarily improves: instruction throughput. (circle one, 5 points) For parts b, c, and d, refer to the pipelined datapath on page 7. Part (b) What is the penalty for taken branches (in number of cycles flushed)? How do you know? (10 points) One cycle because branches are determined in ID stage (see the little = box) Also, only the IF register has a flush signal. Part (c) Describe (in English) the situations that require a load stall. (5 points) Load immediately followed by an instruction uses the result of the load Part (d) Complete the control equation for load stalls. Hint: use the forwarding equation as a guide. (10 points) if ( ID/EX.MemRead == 1 AND ((ID/EX.rt == IF/ID.rs) OR (ID/EX.rt == IF/ID.rt))) then PC_Write = 0, IF/ID.Write = 0, and ControlMux = 0 Note: by default, PC_Write, IF/ID.Write, and ControlMux are 1. 6 Pipelined Datapath Here is the final datapath that we discussed in class. 0 1 Hazard ID/EX.MemRead ControlMux Rs Rt 0 1 0 1 Control EX/Mem.RegWrite 4 + P C + ×4 = R1 Adrs 1 0 1 2 R2 Registers Wr Instr. RAM Data 2 ID/EX 0 1 2 Zero 0 Result Adrs Data RAM 1 2 0 Data Ext 3 Rt 1 0 Rd 1 Rs Forward EX/Mem.RegRd EX/MEM IF.Flush MEM/WB IF/ID.Write ID/EX.RegisterRt IF/ID • • Forwarding is performed from the EX/MEM and MEM/WB latches to the ALU inputs. The control equation for the Rs register input to the ALU is shown at the bottom of the page. The Rt input is similar. Branches are assumed to be not taken. A hazard unit inserts stalls for lw instructions PC Write • MEM/WB.RegWrite 4 MEM/WB.RegisterRd if ((EX/MEM.RegWrite == 1) and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) then ForwardA = 1 else if ((MEM/WB.RegWrite == 1) and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) then ForwardA = 2 7