Slide Set #21: Exploiting ILP Basic Pipelining Wrap-up (from Slide Set 20) Chapter 6 and beyond 1 Pipelining • Big Picture • Improve performance by increasing instruction throughput Program execution Time order (in instructions) 200 400 600 800 1000 1200 1400 1600 1800 • lw $1, 100($0) Instruction fetch Reg lw $2, 200($0) Data access ALU Instruction Reg fetch 800 ps ALU Data access Multicycle Reg 800 ps lw $1, 100($0) Instruction fetch 400 Reg lw $2, 200($0) 200 ps Instruction fetch lw $3, 300($0) 600 ALU Reg Instruction 200 ps fetch 800 Data access ALU Reg 1000 1200 Amount of hardware used (vs. single cycle) 1400 Split instruction into multiple stages (1 per cycle)? Reg Data access ALU Pipelined Clock cycle time (vs. single cycle) Instruction fetch 800 ps 200 Remember the single-cycle implementation – Inefficient because low utilization of hardware resources – Each instruction takes one long cycle Two possible ways to improve on this: Reg lw $3, 300($0) Program execution Time order (in instructions) 2 Reg Data access Each stage has its own set of hardware? Reg 200 ps 200 ps 200 ps 200 ps 200 ps How many instructions executing at once? Ideal speedup is number of stages in the pipeline. Do we achieve this? 3 4 Exploiting More ILP • • ILP = __________________ _________________ ________________ (parallelism within a single program) How can we exploit more ILP? 1. ________________________ (Split execution into many stages) Pipelining and Beyond 2. ___________________________ (Start executing more than one instruction each cycle) 5 Example – Multiple Issue Multiple Issue Processors How many cycles does it take for this code to execute on a 2-issue CPU? add lw add sw $t0, $s1, $t0, $s1, 6 $t1, $t2 0($s2) $t0, $t4 0($s3) • Key metric: CPI • Key questions: 1. What set of instructions can be issued together? IPC 2. Who decides which instructions to issue together? – Static multiple issue Answer? – 7 Dynamic multiple issue 8 Multiple Issue Processors Example – MIPS Static Multiple Issue • What extra hardware do we need to do Static Multiple Issue? • What else for Dynamic Multiple Issue? 9 Example – Dynamic Multiple Issue Scheduling Instruction fetch and decode unit Functional units Reservation station ... Reservation station Reservation station Integer Integer ... Floating point Load/ Store Commit unit Exercise #1 Assume you must execute the following instructions in order. In any one cycle you can issue at most one integer op and one load or store. Show the resultant pipeline diagram. What’s the total number of cycles? If you can’t issue an instruction on a certain cycle, wait for the next cycle. In-order issue Reservation station 10 lw sub lw add add Out-of-order execute $t0, $s1, $t2, $a0, $a0, 0($s2) $t0, $s3 0($s2) $a1, $a2 $a0, $a3 In-order commit 11 12 Exercise #2 Exercise #3: Static vs. Dynamic Multiple Issue Use same assumptions as with Exercise #1, but first schedule the code to try and eliminate stalls. Show the new pipeline diagram and total number of cycles. lw sub lw add add $t0, $s1, $t2, $a0, $a0, • Which do you think has been commercially successful – static or dynamic issue? Why? 0($s2) $t0, $s3 0($s2) $a1, $a2 $a0, $a3 13 Exercise #4 • 14 Ideas for improving Multiple Issue Look ahead at the slide for Idea #4 – loop unrolling. What is the possible bug? 1. Non-blocking caches 2. Speculation 3. Register renaming 4. Loop unrolling 15 16 Idea #3: Register renaming lw sw lw sw $t0, $t0, $t0, $t0, Idea #4: Loop unrolling 0($s0) 4($s0) 0($s2) 4($s2) Loop: Problem? lw sw addi addi bne $t0, $t0, $s1, $s2, $s1, 0($s1) 0($s2) $s1, -4 $s2, -4 $zero,Loop Solution? Why is this a good idea? 17 Chapter 6 Summary Deeply pipelined Multicycle (Section 5.5) Pipelined Multiple issue with deep pipeline (Section 6.10) Multiple-issue pipelined (Section 6.9) Single-cycle (Section 5.4) Slower Faster Instructions per clock (IPC = 1/CPI) 19 Loop: lw lw lw lw sw sw sw sw addi addi bne $t0, 0($s1) $t1, 4($s1) $t2, 8($s1) $t3,12($s1) $t0, 0($s2) $t1, 4($s2) $t2, 8($s2) $t3,12($s2) $s1, $s1, -16 $s2, $s2, -16 $s1, $zero,Loop 18