Basic Pipelining Wrap-up (from Slide Set 20) Pipelining Big Picture

advertisement
Slide Set #21:
Exploiting ILP
Basic Pipelining Wrap-up
(from Slide Set 20)
Chapter 6 and beyond
1
Pipelining
•
Big Picture
•
Improve performance by increasing instruction throughput
Program
execution
Time
order
(in instructions)
200
400
600
800
1000
1200
1400
1600
1800
•
lw $1, 100($0) Instruction
fetch Reg
lw $2, 200($0)
Data
access
ALU
Instruction Reg
fetch
800 ps
ALU
Data
access
Multicycle
Reg
800 ps
lw $1, 100($0)
Instruction
fetch
400
Reg
lw $2, 200($0) 200 ps Instruction
fetch
lw $3, 300($0)
600
ALU
Reg
Instruction
200 ps fetch
800
Data
access
ALU
Reg
1000
1200
Amount of hardware used
(vs. single cycle)
1400
Split instruction into multiple stages
(1 per cycle)?
Reg
Data
access
ALU
Pipelined
Clock cycle time
(vs. single cycle)
Instruction
fetch
800 ps
200
Remember the single-cycle implementation
– Inefficient because low utilization of hardware resources
– Each instruction takes one long cycle
Two possible ways to improve on this:
Reg
lw $3, 300($0)
Program
execution
Time
order
(in instructions)
2
Reg
Data
access
Each stage has its own set of
hardware?
Reg
200 ps 200 ps 200 ps 200 ps 200 ps
How many instructions executing at
once?
Ideal speedup is number of stages in the pipeline. Do we achieve this?
3
4
Exploiting More ILP
•
•
ILP = __________________ _________________ ________________
(parallelism within a single program)
How can we exploit more ILP?
1. ________________________
(Split execution into many stages)
Pipelining and Beyond
2. ___________________________
(Start executing more than one instruction each cycle)
5
Example – Multiple Issue
Multiple Issue Processors
How many cycles does it take for this code to execute on a 2-issue CPU?
add
lw
add
sw
$t0,
$s1,
$t0,
$s1,
6
$t1, $t2
0($s2)
$t0, $t4
0($s3)
•
Key metric: CPI
•
Key questions:
1. What set of instructions can be issued together?
IPC
2. Who decides which instructions to issue together?
– Static multiple issue
Answer?
–
7
Dynamic multiple issue
8
Multiple Issue Processors
Example – MIPS Static Multiple Issue
•
What extra hardware do we need to do Static Multiple Issue?
•
What else for Dynamic Multiple Issue?
9
Example – Dynamic Multiple Issue Scheduling
Instruction fetch
and decode unit
Functional
units
Reservation
station
...
Reservation
station
Reservation
station
Integer
Integer
...
Floating
point
Load/
Store
Commit
unit
Exercise #1
Assume you must execute the following instructions in order. In any one
cycle you can issue at most one integer op and one load or store. Show
the resultant pipeline diagram. What’s the total number of cycles?
If you can’t issue an instruction on a certain cycle, wait for the next cycle.
In-order issue
Reservation
station
10
lw
sub
lw
add
add
Out-of-order execute
$t0,
$s1,
$t2,
$a0,
$a0,
0($s2)
$t0, $s3
0($s2)
$a1, $a2
$a0, $a3
In-order commit
11
12
Exercise #2
Exercise #3: Static vs. Dynamic Multiple Issue
Use same assumptions as with Exercise #1, but first schedule the code to
try and eliminate stalls. Show the new pipeline diagram and total
number of cycles.
lw
sub
lw
add
add
$t0,
$s1,
$t2,
$a0,
$a0,
•
Which do you think has been commercially successful – static or
dynamic issue? Why?
0($s2)
$t0, $s3
0($s2)
$a1, $a2
$a0, $a3
13
Exercise #4
•
14
Ideas for improving Multiple Issue
Look ahead at the slide for Idea #4 – loop unrolling. What is the
possible bug?
1. Non-blocking caches
2. Speculation
3. Register renaming
4. Loop unrolling
15
16
Idea #3: Register renaming
lw
sw
lw
sw
$t0,
$t0,
$t0,
$t0,
Idea #4: Loop unrolling
0($s0)
4($s0)
0($s2)
4($s2)
Loop:
Problem?
lw
sw
addi
addi
bne
$t0,
$t0,
$s1,
$s2,
$s1,
0($s1)
0($s2)
$s1, -4
$s2, -4
$zero,Loop
Solution?
Why is this a good idea?
17
Chapter 6 Summary
Deeply
pipelined
Multicycle
(Section 5.5)
Pipelined
Multiple issue
with deep pipeline
(Section 6.10)
Multiple-issue
pipelined
(Section 6.9)
Single-cycle
(Section 5.4)
Slower
Faster
Instructions per clock (IPC = 1/CPI)
19
Loop:
lw
lw
lw
lw
sw
sw
sw
sw
addi
addi
bne
$t0, 0($s1)
$t1, 4($s1)
$t2, 8($s1)
$t3,12($s1)
$t0, 0($s2)
$t1, 4($s2)
$t2, 8($s2)
$t3,12($s2)
$s1, $s1, -16
$s2, $s2, -16
$s1, $zero,Loop
18
Download