CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Midterm Review 2
Dr. Zhao Zhang
Iowa State University

Announcement
  No quiz today
  No homework this Friday
  Exam on Monday, 9:00-9:50
  HW9 deadline extended to next Friday
  HW8 solutions will be posted today

Chapter 1 — Computer Abstractions and Technology — 2

Exam 2 Coverage
  Coverage: Ch. 4, The Processor
    Datapath and control
    Simple MIPS pipeline
    Data hazards and forwarding
    Load-use hazard and pipeline stall
    Control hazards
  Arithmetic will NOT be covered
    It will be covered in the final exam
    The final exam is comprehensive

Question Styles and Coverage
  Short answer
  True/False or multiple choice
  Design and analysis
    Performance analysis and optimization
    Signal values in the datapath and control
    Identify the critical path
    Support a new MIPS instruction
    Identify pipeline bubbles in program execution
    Reorder instructions to improve performance
    And others

Nine-Instruction MIPS
  These instructions are enough to illustrate most aspects of CPU design, particularly datapath and control design
  Some questions will use this subset as the baseline design
    Memory reference: LW and SW
    Arithmetic/logic: ADD, SUB, AND, OR, SLT
    Branch and jump: BEQ, J

Datapath With Jumps Added
  [Figure]

Chapter 4 — The Processor — 6

The Control
  Control signals for the nine-instruction implementation

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
  R-    1       0       0         1         0        0         0       1       0       0
  lw    0       1       1         1         1        0         0       0       0       0
  sw    X       1       X         0         0        1         0       0       0       0
  beq   X       0       X         0         0        0         1       0       1       0
  j     X       X       X         0         0        0         0       X       X       1

  Note: “R-” means R-format

ALU Control
  Truth table for ALU Control
  Extend it as a secondary control unit in projects B & C, with more control signal outputs

  opcode  ALUOp  Operation         funct   ALU function      ALU control
  lw      00     load word         XXXXXX  add               0010
  sw      00     store word        XXXXXX  add               0010
  beq     01     branch equal      XXXXXX  subtract          0110
  R-type  10     add               100000  add               0010
                 subtract          100010  subtract          0110
                 AND               100100  AND               0000
                 OR                100101  OR                0001
                 set-on-less-than  101010  set-on-less-than  0111

Extend the Single-Cycle Processor
  For each instruction, do we need
    1. Any new or revised datapath element(s)?
    2. Any new control signal(s)?
  Then revise, if necessary,
    1. Datapath: Add new elements or revise existing ones, add new connections
    2. Control Unit: Add/extend control signals, extend the truth table
    3. ALU Control: Extend the truth table

Support JAL
  jal target

  Bits   31:26   25:0
  Field  000011  address

  PC_plus_4 = PC + 4
  JumpAddr  = PC_plus_4[31:28] & Inst[25:0] & “00”   (& denotes bit concatenation)
  PC        = JumpAddr
  R[31]     = PC_plus_4

Support JAL
  What changes must be made to the datapath?
  [Figure]

Support JAL
  Analyze the instruction execution
    Writes register $ra ($31)
    Updates PC with the jump target
  Analyze the datapath
    PC update: already done for supporting J
    Needs another input, fixed at 31, to the “Write register” port of the register file
    Needs another input, PC+4, to the “Write data” port of the register file
  Revise control
    Add a “link” signal
    The (main) control unit can generate it by reading the opcode

SCPv1 + JAL
  [Figure]
  Revise the two muxes
    Add another input
    Extend the select signals
  Alternatively, use an extra mux

Control Signals
  Starting from the control signals for the nine-instruction implementation:
    Add a new row for jal
    Extend RegDst
    Add a control line, Link

Control Signals
  Control signals for the nine-instruction implementation, extended with jal

  Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump  Link
  R-    1       0       0         1         0        0         0       1       0       0     0
  lw    0       1       1         1         1        0         0       0       0       0     0
  sw    X       1       X         0         0        1         0       0       0       0     0
  beq   X       0       X         0         0        0         1       0       1       0     0
  j     X       X       X         0         0        0         0       X       X       1     0
  jal   0       X       0         1         0        0         X       X       X       1     1

  Extend the control input to the RegDst mux: RegDst & Link
  Extend the control input to the MemtoReg mux: MemtoReg & Link

Simple Pipeline
  [Figure]
  Add pipeline registers to hold the information produced in each cycle

Pipelined Control
  [Figure]

Hazards
  Situations that prevent starting the next instruction safely in the next cycle
  Structural hazard
    A required resource is busy
  Data hazard
    The simple pipeline won’t work correctly
    Need to wait for a previous instruction to complete its data read/write
  Control hazard
    Deciding on the control action depends on a previous instruction

Data Hazards
  Program with data dependence
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
  Program with control dependence
    beq  $1, $3, +4
    addi $2, $2, 1
    addi $4, $4, 1

Data Forwarding
    sub $2, $1, $3
    and $12, $2, $5     # $2 via MEM=>EX forwarding
    or  $13, $6, $2     # $2 via WB=>EX forwarding
    add $14, $2, $2
    sw  $15, 100($2)

  IF    ID    EX    MEM   WB
  or    and   sub   …     …
  add   or    and   sub   …      AND gets the forwarded new $2 value (sub is in MEM)
  sw    add   or    and   sub    OR gets the forwarded new $2 value (sub is in WB)

Data Forwarding Paths
  [Figure]

Detecting the Need to Forward
  Input
    rs and rt from EX
    rd and RegWrite from MEM
    rd and RegWrite from WB
  Output
    FwdA, FwdB
  Caveats
    Check RegWrite: forward only from an instruction that writes a register
    Check whether rd = 0: never forward a value destined for $zero
    Forwarding from MEM wins over WB
  Review slides and textbook for details

Load-Use Data Hazard
    lw  $s0, 20($t1)
    sub $t2, $s0, $t3
  Can’t always avoid stalls by forwarding: the loaded value is not available until the end of MEM
  Must stall the pipeline by one cycle

Datapath with Hazard Detection
  [Figure]

Hazard Detection Unit
  Input
    rs and rt from ID
    rt and MemRead from EX
  Output
    PCWrite, IF/IDWrite (0 for holding instructions)
    Select signal to a mux that inserts a bubble in EX
  Read slides/textbook for details

Pipeline Stall
  The nop has all control signals set to zero
    It does nothing at EX, MEM and WB
  Prevent update of PC and the IF/ID register
    The using instruction is decoded again (OK)
    The following instruction is fetched again (OK)
    The 1-cycle stall allows MEM to read data for lw
      Can subsequently forward from WB to EX

Code Scheduling to Avoid Stalls
  Reorder code to avoid using a load result in the next instruction
  C code: A = B + E; C = B + F;

  Original (13 cycles)              Reordered (11 cycles)
          lw  $t1, 0($t0)                   lw  $t1, 0($t0)
          lw  $t2, 4($t0)                   lw  $t2, 4($t0)
  stall   add $t3, $t1, $t2                 lw  $t4, 8($t0)
          sw  $t3, 12($t0)                  add $t3, $t1, $t2
          lw  $t4, 8($t0)                   sw  $t3, 12($t0)
  stall   add $t5, $t1, $t4                 add $t5, $t1, $t4
          sw  $t5, 16($t0)                  sw  $t5, 16($t0)

Control Hazards
  A branch determines the flow of control
    Two branch outcomes: Taken or Not-Taken
    The CPU doesn’t recognize a branch until the instruction reaches the end of the ID stage
    Every cycle, the CPU has to fetch one instruction

Control Hazards
  The MIPS pipeline in the textbook always predicts “not-taken”
    Pipeline flush on every taken branch
    OK to flush because mis-fetched instructions have not yet written to registers or memory
    But this incurs pipeline bubbles (performance penalty)
  The revised MIPS pipeline moves the branch comparison to the ID stage
    Doable for BEQ and BNE
    Reduces pipeline bubbles from 3 to 1 per taken branch
    Complicates data forwarding and hazard detection

Revised MIPS Pipeline
  [Figure]

Revised MIPS Pipeline
  [Figure]
  Note: Branch does nothing in EX, MEM and WB

Performance Penalty
  Any pipeline bubbles?

  loop: addi $1, $1, -1             lw  $1, addr
        add  $4, $5, $6             add $4, $5, $6
        beq  $1, $zero, loop        beq $1, $4, target

Delayed Branch
  A delayed branch may remove the one-cycle stall
    The instruction right after the beq is executed whether or not the branch is taken (the sub instruction in the example)
    In other words, the effect of the beq is delayed by one cycle

    sub $10, $4, $8          beq $1, $3, 7
    beq $1, $3, 7      =>    sub $10, $4, $8
    and $12, $2, $5          and $12, $2, $5

  Must find an independent instruction; otherwise
    May have to fill in a nop instruction, or
    Need two variants of beq, delayed and not delayed

Other Topics
  Exception handling
  Multi-issue pipeline
  Those topics will be covered in the final exam
    Exam 2 will NOT cover them
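
Worked sketch for the ALU Control slide. The truth table there can be read as a small lookup: ALUOp alone decides the operation for lw, sw and beq, and funct is consulted only for R-type. The function name alu_control and the use of Python integers for bit fields are illustrative assumptions, not part of the slides; the bit patterns are copied from the table.

```python
def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control code for a 2-bit ALUOp and a 6-bit funct.

    Hypothetical helper that mirrors the ALU Control truth table.
    """
    if alu_op == 0b00:          # lw / sw: address = base + offset
        return 0b0010           # add
    if alu_op == 0b01:          # beq: compare operands by subtracting
        return 0b0110           # subtract
    if alu_op == 0b10:          # R-type: the funct field selects the operation
        r_type = {
            0b100000: 0b0010,   # add
            0b100010: 0b0110,   # subtract
            0b100100: 0b0000,   # AND
            0b100101: 0b0001,   # OR
            0b101010: 0b0111,   # set-on-less-than
        }
        return r_type[funct]
    raise ValueError("ALUOp value not used by the nine-instruction design")


# funct is a don't-care for loads, stores and branches.
lw_code = alu_control(0b00, 0b111111)
slt_code = alu_control(0b10, 0b101010)
```

Supporting a new R-type instruction in this sketch means adding one entry to r_type, which matches the "extend the truth table" step on the Extend the Single-Cycle Processor slide.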
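
Worked sketch for the Support JAL slides. The register-transfer description (PC_plus_4, JumpAddr, R[31]) can be checked numerically with a few lines of Python. The function name jal_effect and the sample PC and instruction word are made up for illustration; only the formula comes from the slide.

```python
def jal_effect(pc: int, inst: int) -> tuple[int, int]:
    """Return (new PC, value written to R[31]) for a jal instruction word.

    Hypothetical helper; assumes 32-bit values held in Python ints.
    """
    assert (inst >> 26) & 0x3F == 0b000011, "opcode field must be jal"
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address = inst & 0x03FFFFFF                 # Inst[25:0]
    # JumpAddr = PC_plus_4[31:28] & Inst[25:0] & "00" (bit concatenation)
    jump_addr = (pc_plus_4 & 0xF0000000) | (address << 2)
    return jump_addr, pc_plus_4


# Made-up example: jal at 0x00400018 with address field 0x0100009.
new_pc, ra = jal_effect(0x00400018, 0x0C100009)
```

Here new_pc is 0x00400024 (the address field shifted left by two, under the upper four bits of PC+4) and ra is 0x0040001C, the return address that the extended "Write data" mux must deliver to register 31.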
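
Worked sketch for the Detecting the Need to Forward slide. The inputs, outputs and three caveats listed there translate almost line by line into code. The select encoding (00 = register file, 10 = from MEM, 01 = from WB) follows the textbook's convention and is an assumption here, since the slide does not state it; the function names are also illustrative.

```python
def forwarding_unit(ex_rs, ex_rt, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Return (FwdA, FwdB) mux selects for the two ALU operands.

    Assumed encoding: 0b00 = no forwarding, 0b10 = from MEM, 0b01 = from WB.
    """
    def select(src):
        # MEM is checked first, so the newer result wins over WB.
        if mem_reg_write and mem_rd != 0 and mem_rd == src:
            return 0b10
        if wb_reg_write and wb_rd != 0 and wb_rd == src:
            return 0b01
        return 0b00

    return select(ex_rs), select(ex_rt)


# "and $12,$2,$5" in EX while "sub $2,$1,$3" is in MEM.
and_fwd = forwarding_unit(2, 5, mem_rd=2, mem_reg_write=True,
                          wb_rd=0, wb_reg_write=False)
# "or $13,$6,$2" in EX while "and" is in MEM and "sub" is in WB.
or_fwd = forwarding_unit(6, 2, mem_rd=12, mem_reg_write=True,
                         wb_rd=2, wb_reg_write=True)
```

The two calls reproduce the cycles drawn on the Data Forwarding slide: and takes $2 from MEM on its first operand, and or takes $2 from WB on its second operand.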
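
Worked sketch for the Hazard Detection Unit slide. The unit compares the rs and rt fields of the instruction in ID with the rt field of a load in EX and, on a match, holds PC and IF/ID and injects a bubble. PCWrite and IFIDWrite follow the slide's signal names; InsertBubble is a hypothetical name for the mux select the slide describes without naming.

```python
def hazard_detection(id_rs, id_rt, ex_rt, ex_mem_read):
    """Return the stall controls for a possible load-use hazard.

    PCWrite and IFIDWrite are 0 while instructions are held; InsertBubble
    (hypothetical name) selects all-zero control signals for EX.
    """
    stall = bool(ex_mem_read) and ex_rt in (id_rs, id_rt)
    return {
        "PCWrite": int(not stall),
        "IFIDWrite": int(not stall),
        "InsertBubble": int(stall),
    }


# Register numbers: $t1 = 9, $t2 = 10, $t3 = 11, $s0 = 16.
# "lw $s0, 20($t1)" is in EX (rt = 16); "sub $t2, $s0, $t3" is in ID.
load_use = hazard_detection(id_rs=16, id_rt=11, ex_rt=16, ex_mem_read=True)
# Same instruction in ID, but the instruction in EX is not a load.
no_hazard = hazard_detection(id_rs=16, id_rt=11, ex_rt=16, ex_mem_read=False)
```

This sketch compares both rs and rt unconditionally, as the slide's input list suggests; a real design may refine that for instructions that do not read rt.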
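
Worked sketch for the Code Scheduling to Avoid Stalls slide. The 13-cycle and 11-cycle figures can be reproduced by counting one cycle per instruction, four cycles to drain the five-stage pipeline, and one stall for every instruction that uses a register loaded by the instruction immediately before it. The tuple representation and the name count_cycles are assumptions made for this illustration, and the model covers only load-use stalls with full forwarding.

```python
def count_cycles(program, stages=5):
    """Count cycles for straight-line code with forwarding and load-use stalls.

    Each instruction is (op, dest, sources); dest is None for sw.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1                      # load result used right away
    return len(program) + (stages - 1) + stalls


original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),          # stalls: uses $t2 right after lw
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),          # stalls: uses $t4 right after lw
    ("sw", None, ("$t5", "$t0")),
]

reordered = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("lw", "$t4", ("$t0",)),                 # moved up to separate lw from add
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]

original_cycles = count_cycles(original)     # 7 + 4 + 2 stalls
reordered_cycles = count_cycles(reordered)   # 7 + 4 + 0 stalls
```

The same counting approach applies to "identify pipeline bubbles" questions: list the instructions, mark each load followed directly by a consumer, and add the pipeline fill time.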