CS 152, Spring 2010
Section 3
Andrew Waterman
University of California, Berkeley

Agenda
• Precise Exceptions
• Problem Set 1 review

Precise Exceptions
• Goal: provide illusion of sequential, non-overlapping instruction execution
  – All instructions before the exceptional one appear to have executed completely
  – The exceptional instruction and all instructions following it appear not to have executed at all

Precise Exceptions
• Two requirements for precise exceptions:
  – Keep architected state consistent
  – Take the correct (oldest) exception
• How to keep architected state consistent:
  – Don’t update it until the instruction is guaranteed to commit
  – Need to be able to draw a single “commit line”:
    • Before commit line, no architected state is modified
    • At commit line, it is known whether or not an exception will occur

Problem Set 1 Review – P1
• Skipping 1.A-1.C (solutions online)
• 1.D: What did you notice about relative code size for CISC, RISC, and stack machines?
• 1.E: What optimization strategies proved effective? Did anyone beat my solution (7 instructions)?

Problem Set 1 Review – P2
• For microcode problems, the key is to get the pseudocode right
  – Control signals follow readily from pseudocode
• Sanity checks:
  – Only one device may drive the bus
  – The bus probably should be driven every cycle
  – Don’t read from a register whose write-enable was a don’t-care

Problem Set 1 Review – P2
• Most people got P2 A/B correct, but didn’t use don’t-cares aggressively
  – If you won’t read A/B/MA registers again, their write-enables should be don’t-cares
  – If enMem is off, Mem Wr is a don’t-care

Problem Set 1 Review – P2
• P2A: M[rd] <- M[rs] + M[rt]
  – MA <- R[rs]
  – A <- Mem
  – MA <- R[rt]
  – B <- Mem
  – MA <- R[rd]
  – Mem <- ALU (A+B); uBR=J
• Note efficiency: 9 cycles vs. 18 for ld,ld,add,st

Problem Set 1 Review – P2
• P2B: if(--rs != 0) then branch
  – A <- R[rs]
  – R[rs] <- ALU (A-1); uBR=Z
  – A <- signext(imm)
  – PC <- A+B; uBR=J
• Recall that B <- PC+4 happened for free

Problem Set 1 Review – P3
• 3.A: load-use stalls are gone (lw -> add)
• 3.B: address-calculation and store-data stalls appear (add -> lw/sw)
• 3.C: compiler can schedule around load-use stalls, but now address calculation is costly. Old pipeline was better
• 3.D: anyone have a favorite solution?
• 3.E: what is the problem with precise state?

Problem Set 1 Review – P4
• Pipeline depth and microcoded vs. hardwired control are clearly NOT ISA visible
• CISC vs. RISC is ISA visible (it IS the ISA)
• Delay slot is ISA visible
• Stack machine’s # of physical registers isn’t ISA visible, provided the spill mechanism is automatic

Problem Set 1 Review – P5
• Deeper pipelining
  – Doesn’t affect I/P, increases CPI, reduces T
• Adding a complex instruction
  – Reduces I/P if the compiler can use it; increases CPI, T?
• Reducing bypasses
  – Doesn’t affect I/P, increases CPI, reduces T
• Improving memory access speed
  – Doesn’t affect I/P, reduces either CPI or T

Questions?
• (short of “what’s on the quiz”)
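The P5 answers in the review can be sanity-checked with the iron law of processor performance, time/program = (I/P) × CPI × T. Below is a minimal Python sketch of that arithmetic for the deeper-pipelining case. The function name `exec_time` and every number in it are hypothetical, chosen only to illustrate the trade-off; none of them come from the problem set.

```python
def exec_time(insts_per_program, cpi, cycle_time_ns):
    """Iron law: time/program = (I/P) * CPI * T, returned in nanoseconds."""
    return insts_per_program * cpi * cycle_time_ns

# Hypothetical baseline machine (illustrative numbers only).
baseline = exec_time(insts_per_program=1_000_000, cpi=1.5, cycle_time_ns=2.0)

# Deeper pipelining: I/P unchanged, CPI rises (more hazard stalls), T shrinks.
deeper = exec_time(insts_per_program=1_000_000, cpi=1.75, cycle_time_ns=1.5)

# Whether deeper pipelining wins depends on which factor moves more.
speedup = baseline / deeper
```

With these made-up inputs the shorter cycle time outweighs the CPI increase, so `speedup` is greater than 1. Different numbers could just as easily make it less than 1, which is why each P5 change has to be reasoned about term by term.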
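The P2A pseudocode from the review can be traced one micro-op at a time. The Python sketch below models the register file and memory as dictionaries and assumes memory responds in a single cycle. The function name `addm_microcode` and the sample addresses are hypothetical. It counts only the six execute-phase micro-ops; the rest of the 9-cycle total would come from instruction fetch, which is not modeled here.

```python
def addm_microcode(R, Mem, rs, rt, rd):
    """Trace the execute-phase micro-ops for M[R[rd]] <- M[R[rs]] + M[R[rt]]."""
    MA = R[rs]        # MA <- R[rs]
    A = Mem[MA]       # A  <- Mem
    MA = R[rt]        # MA <- R[rt]
    B = Mem[MA]       # B  <- Mem
    MA = R[rd]        # MA <- R[rd]
    Mem[MA] = A + B   # Mem <- ALU (A+B); uBR=J (jump back to fetch)
    return 6          # one cycle per micro-op, assuming memory never stalls

# Hypothetical machine state: registers hold addresses, memory holds operands.
R = {1: 100, 2: 104, 3: 108}
Mem = {100: 7, 104: 5, 108: 0}
cycles = addm_microcode(R, Mem, rs=1, rt=2, rd=3)
```

After the call, `Mem[108]` holds the sum of the two memory operands, and A, B, and MA are never read again, which is the situation where their write-enables can become don't-cares.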
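The commit-line idea from the precise-exceptions slides can be illustrated with a toy model. In this Python sketch, in-flight instructions are held in program order and architected state is updated only at commit, so the oldest exception is the one taken and nothing at or after it becomes visible. The function `commit_in_order` and the tuple layout are assumptions made for illustration, not a description of any real pipeline.

```python
def commit_in_order(arch_regs, in_flight):
    """Commit instructions in program order until the first exception.

    in_flight: list of (dest_reg, value, exception_or_None), oldest first.
    Returns (index, exception) for the oldest excepting instruction,
    or (None, None) if every instruction committed.
    """
    for index, (dest, value, exc) in enumerate(in_flight):
        if exc is not None:
            # Commit line: this instruction and all younger ones are squashed.
            return index, exc
        arch_regs[dest] = value  # architected state changes only at commit
    return None, None

arch_regs = {}
in_flight = [
    ("r1", 1, None),
    ("r2", 2, "overflow"),     # oldest exception, so this one is taken
    ("r3", 3, "page_fault"),   # younger exception, never observed
]
index, exc = commit_in_order(arch_regs, in_flight)
```

Only `r1` is written to `arch_regs`; the younger `page_fault` is ignored because the instruction that raised it sits after the commit line.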