ECE473 Computer Organization and Architecture

Advanced Pipelining
Out of Order Processors
COMP25212
From Monday…
Out-of-Order Execution with Scoreboard
• Centralized data structure which tracks the status
of registers, FUs and instructions and creates,
dynamically in hardware, the dependency graph
– The centralized nature limits scalability:
– Small number of FUs and small window of instructions
• Dependencies
– RAW – stall conflicted instruction
– WAW – stall the pipeline
– WAR – stall WB
Out of Order Execution with Tomasulo
Tomasulo’s Algorithm
• Control logic for out-of-order execution is
decentralized
– Reservation Stations (RS) in the functional units keep
instruction information
– In addition RS seamlessly rename registers
• A Common Data Bus (CDB) broadcasts data
and results to the different devices
– Only one instruction can finish (broadcast its result) each cycle
• Distributed control allows for a larger window
of instructions – Dynamic scheduling
Tomasulo’s Algorithm
• Structural hazards stall the pipeline
• RS track when operands become available and
buffer them as soon as they are
– No need to access the register bank (to store values or
read sources)
• The impact of RAW dependencies is limited
– Execute an instruction when its operands are available
• WAW and WAR dependencies are avoided
– Register renaming
Register Renaming (Example)
• Eliminates WAR and WAW hazards by renaming all
destination registers.
• Can be done by compiler
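Original code             After renaming (F6 -> S, F8 -> T)
DIV.D  F0, F2, F4         DIV.D  F0, F2, F4
ADD.D  F6, F0, F8         ADD.D  S, F0, F8
ST.D   F6, 0(R1)          ST.D   S, 0(R1)
SUB.D  F8, F10, F14       SUB.D  T, F10, F14
MUL.D  F6, F10, F8        MUL.D  F6, F10, T
True dependences: DIV.D -> ADD.D (F0), ADD.D -> ST.D (F6), SUB.D -> MUL.D (F8)
Antidependence: ADD.D reads F8, SUB.D writes F8
Output dependence: ADD.D and MUL.D both write F6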
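The renaming idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name rename and the fresh names T0, T1, ... are hypothetical, and, following the bullet "renaming all destination registers", every write gets a new name (the slide example only introduces S and T).

```python
from itertools import count

def rename(program):
    """Give every destination a fresh name and redirect later reads to it.

    program: list of (op, dest, sources); dest is None for a store.
    The fresh names T0, T1, ... are hypothetical, standing in for spare
    physical registers or reservation-station tags.
    """
    fresh = (f"T{i}" for i in count())
    latest = {}                      # architectural register -> current name
    renamed = []
    for op, dest, sources in program:
        # Reads see the most recent name of each source (true dependences kept).
        new_sources = tuple(latest.get(s, s) for s in sources)
        if dest is not None:
            latest[dest] = next(fresh)   # every write gets its own register
            dest = latest[dest]
        renamed.append((op, dest, new_sources))
    return renamed

program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("ST.D",  None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

for op, dest, sources in rename(program):
    print(op, dest, sources)
```

After renaming, no two instructions write the same name and no instruction writes a name that an earlier one reads, so only the true (RAW) dependences remain.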
Tomasulo Organization
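[Figure: Tomasulo organization. Instructions come from the FP Op Queue and operands from the FP Registers. Six load buffers (Load1 to Load6) receive data from memory; store buffers send data to memory. Reservation stations Add1 to Add3 feed the FP adders and Mult1 to Mult2 feed the FP multipliers. Results are broadcast on the Common Data Bus (CDB).]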
Normal data bus: data + destination
Common data bus: data + source
Stages of a Tomasulo Pipeline
[Figure: Issue, followed by one of several parallel Execute units (Integer, FP Multiplication, FP Multiplication, FP Add, FP Division), each followed by Write Back.]
Three Stages of Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station free (no structural hazard),
control issues instr & sends operands (renames registers).
2. Execute—operate on operands (EX)
When both source operands are ready then execute;
if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting units;
mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source address
– Write if matches expected Functional Unit (produces result)
– Does the broadcast
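The issue and write-result stages can be sketched as follows. This is a minimal illustration, not the hardware design: the class and function names are hypothetical, None stands in for "Q = 0" (operand ready), and the execute stage is reduced to a ready() check.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of first source operand
    vk: Optional[float] = None   # value of second source operand
    qj: Optional[str] = None     # station that will produce vj (None = ready)
    qk: Optional[str] = None     # station that will produce vk (None = ready)

    def ready(self):
        # Stage 2 may start once both operands have been captured.
        return self.busy and self.qj is None and self.qk is None

def issue(rs, op, dest, src_j, src_k, regs, reg_status):
    """Stage 1: occupy a free station and rename dest to the station name."""
    if rs.busy:
        return False                      # structural hazard: stall issue
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src_j)         # pending producer, if any
    rs.vj = None if rs.qj else regs[src_j]
    rs.qk = reg_status.get(src_k)
    rs.vk = None if rs.qk else regs[src_k]
    reg_status[dest] = rs.name            # later readers wait on this tag
    return True

def broadcast(source, value, stations, regs, reg_status):
    """Stage 3: put (data, source tag) on the CDB; everyone waiting grabs it."""
    for rs in stations:
        if rs.qj == source:
            rs.vj, rs.qj = value, None
        if rs.qk == source:
            rs.vk, rs.qk = value, None
        if rs.name == source:
            rs.busy, rs.op = False, None  # mark station available
    for reg, tag in list(reg_status.items()):
        if tag == source:
            regs[reg] = value
            del reg_status[reg]

regs = {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 0.0}
reg_status = {}
mult1, add1 = ReservationStation("Mult1"), ReservationStation("Add1")
issue(mult1, "MUL", "F0", "F2", "F4", regs, reg_status)
issue(add1, "ADD", "F8", "F0", "F6", regs, reg_status)   # waits on Mult1
broadcast("Mult1", 8.0, [mult1, add1], regs, reg_status)
print(add1.ready(), add1.vj, regs["F0"])
```

The bus carries the source tag rather than a destination, which is why a single broadcast can update every waiting station and the register file at once.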
Reservation Station Components
No information about instructions needed
[Figure: the Tomasulo Example tables (instruction status, load buffers, reservation stations, register result status) at clock 0. Callout on the instruction status table: Tomasulo does not need this info; we will show the times for each stage, for convenience.]
Reservation Station Components
No information about instructions needed
Op: Operation to perform in the unit (e.g., + or –)
Vj, Vk: Values of the source operands
– Store buffers have a V field: the result to be stored
Qj, Qk: Reservation stations producing source
registers (value to be written)
– Note: Qj,Qk=0 => ready
– Store buffers only have Qi for RS producing result
Busy: Indicates reservation station or FU is busy
[Figure: the Tomasulo Example tables at clock 0, with callouts:
• Load buffers: 3 (Load1 to Load3)
• Reservation stations: 3 adder (Add1 to Add3), 2 multiplication (Mult1, Mult2)
• Time: FU countdown
• S1 Vj, S2 Vk: source registers
• RS Qj, RS Qk: which FU will produce the operands]
Register result status: Indicates which functional
unit will write each register, if one exists. Blank when
no pending instruction will write that register.
[Figure: the Tomasulo Example tables at clock 0, with callouts:
• Clock: clock cycle counter
• Register result status, FU row: which RS will write in each register?]
A Tomasulo Example
The following code is run on a Tomasulo pipeline with:
Functional Unit (FU)       # of FUs   EX cycles
FP Multiply/Division       2          10/40
FP Addition/Subtraction    3          2
Mem Load                   3          2
Functional units are not pipelined.
L.D    F6, 34(R2)
L.D    F2, 45(R3)
MUL.D  F0, F2, F4
SUB.D  F8, F6, F2
DIV.D  F10, F0, F6
ADD.D  F6, F8, F2
Dependency Graph For Example Code
Example Code
1  L.D    F6, 34(R2)
2  L.D    F2, 45(R3)
3  MUL.D  F0, F2, F4
4  SUB.D  F8, F6, F2
5  DIV.D  F10, F0, F6
6  ADD.D  F6, F8, F2
Real Data Dependence (RAW):
(1, 4) (1, 5) (2, 3) (2, 4)
(2, 6) (3, 5) (4, 6)
Output Dependence (WAW):
(1, 6)
Anti-dependence (WAR):
(5, 6)
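The dependence pairs can be found mechanically by comparing each instruction's destination and sources with those of every later instruction. The sketch below is an illustration with a hypothetical function name; the load address registers R2 and R3 are listed as sources but never written. Note that a plain pairwise check also flags (4, 6) as an anti-dependence, because SUB.D reads F6 before ADD.D writes it; that pair is already ordered by the true dependence (4, 6).

```python
def dependences(program):
    """program: list of (dest, sources); instructions are numbered from 1."""
    raw, waw, war = [], [], []
    for i, (dest_i, srcs_i) in enumerate(program, start=1):
        for j in range(i + 1, len(program) + 1):
            dest_j, srcs_j = program[j - 1]
            # Registers written by instructions strictly between i and j.
            between = {program[k - 1][0] for k in range(i + 1, j)}
            if dest_i in srcs_j and dest_i not in between:
                raw.append((i, j))    # j reads what i wrote
            if dest_i == dest_j:
                waw.append((i, j))    # both write the same register
            if dest_j in srcs_i:
                war.append((i, j))    # j overwrites what i reads
    return raw, waw, war

program = [
    ("F6",  ("R2",)),          # 1  L.D   F6, 34(R2)
    ("F2",  ("R3",)),          # 2  L.D   F2, 45(R3)
    ("F0",  ("F2", "F4")),     # 3  MUL.D F0, F2, F4
    ("F8",  ("F6", "F2")),     # 4  SUB.D F8, F6, F2
    ("F10", ("F0", "F6")),     # 5  DIV.D F10, F0, F6
    ("F6",  ("F8", "F2")),     # 6  ADD.D F6, F8, F2
]

raw, waw, war = dependences(program)
print("RAW:", raw)
print("WAW:", waw)
print("WAR:", war)
```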
Tomasulo Example
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
0      FU
Tomasulo Example Cycle 1
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
1      FU                        Load1
• LD#1 issued
Tomasulo Example Cycle 2
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1
LD     F2     45+  R3   2
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
2      FU          Load2         Load1
• LD#2 issued
Tomasulo Example Cycle 3
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3
LD     F2     45+  R3   2
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
3      FU   Mult1  Load2         Load1
• MULTD is issued
• LD#1 completes and broadcasts its result
Tomasulo Example Cycle 4
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   Yes   SUBD   M(A1)                Load2
      Add2   No
      Add3   No
      Mult1  Yes   MULTD         R(F4)  Load2
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
4      FU   Mult1  Load2         M(A1)    Add1
• SUBD is issued
• LD#1 result updates the register bank
• LD#2 completes, broadcasting its result
Tomasulo Example Cycle 5
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
2     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   No
      Add3   No
10    Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
5      FU   Mult1  M(A2)         M(A1)    Add1   Mult2
• DIVD is issued
• LD#2 result updates the register bank
• Add1, Mult1 start execution
Tomasulo Example Cycle 6
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
1     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
9     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
6      FU   Mult1  M(A2)         Add2     Add1   Mult2
• ADDD issued
Tomasulo Example Cycle 7
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   Yes   SUBD   M(A1)  M(A2)
      Add2   Yes   ADDD          M(A2)  Add1
      Add3   No
8     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
7      FU   Mult1  M(A2)         Add2     Add1   Mult2
• Add1 (SUBD) completes and broadcasts result
Tomasulo Example Cycle 8
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
2     Add2   Yes   ADDD   (M-M)  M(A2)
      Add3   No
7     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
8      FU   Mult1  M(A2)         Add2     (M-M)  Mult2
• Add1 (SUBD) result updates the register bank
• Add2 (ADDD) starts execution
Tomasulo Example Cycle 9
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
1     Add2   Yes   ADDD   (M-M)  M(A2)
      Add3   No
6     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
9      FU   Mult1  M(A2)         Add2     (M-M)  Mult2
• ADDD and MULTD continue execution
Tomasulo Example Cycle 10
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
0     Add2   Yes   ADDD   (M-M)  M(A2)
      Add3   No
5     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
10     FU   Mult1  M(A2)         Add2     (M-M)  Mult2
• Add2 (ADDD) completes
Tomasulo Example Cycle 11
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
4     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
11     FU   Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• ADDD result updates the register bank
Tomasulo Example Cycle 12
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
3     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
12     FU   Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 13
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
2     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
13     FU   Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 14
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
1     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
14     FU   Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD continues execution
Tomasulo Example Cycle 15
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3      15
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
0     Mult1  Yes   MULTD  M(A2)  R(F4)
      Mult2  Yes   DIVD          M(A1)  Mult1
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
15     FU   Mult1  M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD completes and broadcasts result
Tomasulo Example Cycle 16
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3      15         16
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
40    Mult2  Yes   DIVD   M*F4   M(A1)
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
16     FU   M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• MULTD result updates the register bank
• DIVD starts execution
39 cycles later…
Tomasulo Example Cycle 55
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3      15         16
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
1     Mult2  Yes   DIVD   M*F4   M(A1)
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
55     FU   M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• DIVD is about to complete
Tomasulo Example Cycle 56
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3      15         16
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5      56
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
0     Mult2  Yes   DIVD   M*F4   M(A1)
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
56     FU   M*F4   M(A2)         (M-M+M)  (M-M)  Mult2
• DIVD completes
Tomasulo Example Cycle 57
Instruction status:
Instruction   j    k    Issue  Exec Comp  Write Result
LD     F6     34+  R2   1      3          4
LD     F2     45+  R3   2      4          5
MULTD  F0     F2   F4   3      15         16
SUBD   F8     F6   F2   4      7          8
DIVD   F10    F0   F6   5      56         57
ADDD   F6     F8   F2   6      10         11
Load buffers:
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Reservation Stations:
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
      Add1   No
      Add2   No
      Add3   No
      Mult1  No
      Mult2  No
Register result status:
Clock       F0     F2     F4     F6       F8     F10    F12  ...  F30
57     FU   M*F4   M(A2)         (M-M+M)  (M-M)  Result
• DIVD result updates the register bank
Tomasulo Example Cycle 57
[Same tables as the previous slide, with callouts on the instruction status table:
• Issue column (1, 2, 3, 4, 5, 6): in-order issue
• Exec Comp column (3, 4, 15, 7, 56, 10): out-of-order execution
• Write Result column (4, 5, 16, 8, 57, 11): out-of-order completion]
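The issue, execution-complete and write-result times in the walkthrough follow a simple pattern, which the sketch below reproduces. It is a simplified timing model, not a full simulator, and rests on assumptions read off the example: one instruction issues per cycle, an instruction starts executing in the cycle its last operand is written (or its own issue cycle), the result is written one cycle after execution completes, and there are no structural hazards or CDB conflicts. The names schedule, LATENCY and PROGRAM are hypothetical; load address registers are omitted because nothing in the code writes them.

```python
LATENCY = {"LD": 2, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}

PROGRAM = [
    ("LD",    "F6",  ()),
    ("LD",    "F2",  ()),
    ("MULTD", "F0",  ("F2", "F4")),
    ("SUBD",  "F8",  ("F6", "F2")),
    ("DIVD",  "F10", ("F0", "F6")),
    ("ADDD",  "F6",  ("F8", "F2")),
]

def schedule(program, latency):
    """Return one (issue, exec_comp, write) tuple per instruction."""
    producer = {}   # register -> index of the latest issued writer (renaming)
    rows = []
    for n, (op, dest, sources) in enumerate(program):
        issue = n + 1                       # in-order, one per cycle
        # Wait for the write-result cycle of every pending producer.
        waits = [rows[producer[s]][2] for s in sources if s in producer]
        start = max([issue] + waits)
        exec_comp = start + latency[op]
        write = exec_comp + 1               # assumes the CDB is free
        rows.append((issue, exec_comp, write))
        producer[dest] = n                  # later readers depend on this one
    return rows

for (op, dest, _), row in zip(PROGRAM, schedule(PROGRAM, LATENCY)):
    print(f"{op:6}{dest:4}{row}")
```

Because DIVD records LD#1 as the producer of F6 when it issues, the later ADDD that overwrites F6 does not affect it, which is the WAR case handled by renaming in the reservation stations.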
Tomasulo’s advantages
(1) Distributed hazard detection logic
– Distributed reservation stations and the CDB
– If multiple instructions are waiting on a single result, and each
instruction already has its other operand, then they can all be
released simultaneously by the broadcast on the CDB
– If a centralized register file were used, the units would
have to read their results from the registers when
register buses are available.
(2) Avoids stalling due to WAW or WAR hazards
Tomasulo Drawbacks
• Complexity of hardware
• Performance limited by Common Data Bus
– Each CDB must go to all functional units
=> high capacitance, high wiring density
– Number of functional units that can complete per cycle
limited to one!
» Multiple CDBs => more FU logic for parallel stores
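The one-result-per-cycle limit can be illustrated with a small arbitration sketch. The function name and the oldest-first tie-break are assumptions made for illustration; the slides do not state which instruction wins when two finish together.

```python
def arbitrate_cdb(finished):
    """Assign each result a broadcast cycle on a single CDB.

    finished: list of (name, cycle execution completes), in program order.
    Returns {name: broadcast cycle}; at most one broadcast per cycle.
    Ties are resolved oldest-first (an assumed policy).
    """
    used = set()
    slots = {}
    # sorted() is stable, so equal completion times keep program order.
    for name, done in sorted(finished, key=lambda f: f[1]):
        cycle = done + 1
        while cycle in used:      # bus taken: wait another cycle
            cycle += 1
        used.add(cycle)
        slots[name] = cycle
    return slots

print(arbitrate_cdb([("ADD", 9), ("MUL", 10), ("DIV", 10)]))
```

Here MUL and DIV both finish executing in cycle 10, but only one of them can broadcast in cycle 11; the other waits until cycle 12.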
Summary
• Reservation stations: implicit register renaming to a
larger set of registers + buffering of source operands
– Prevents registers from being bottleneck
– Avoids the WAR and WAW hazards of Scoreboard
• Lasting Contributions
– Dynamic scheduling
– Register renaming
– Load/store disambiguation
Summary of Out-of-Order Processors
Out of Order Processors
BENEFITS:
• Accelerates the execution of programs
• More efficient design
– Increases the utilisation of processor resources
LIMITATIONS:
• More complex design
• Very expensive in terms of area and power
• Non-precise interrupts
– Interrupting exactly after an instruction might not be possible
Scoreboard vs Tomasulo
                      Scoreboard              Tomasulo
Window size:          ≤ 5 instructions        ≤ 14 instructions
Structural hazard:    No issue                No issue
WAR dependency:       stall completion        renaming avoids
WAW dependency:       stall issue             renaming avoids
Results forwarding:   Write/read registers    Broadcast from FU
Control structure:    central scoreboard      distributed reservation stations
Example
Code (the same for all three pipelines):
LD R1 X
LD R2 Y
ADD R3 R1 R2
SUB R3 R5 R6
MUL R4 R1 R1
DIV R7 R5 R6
LD – 4 cycles
Add/Sub – 2 cycles
Mul/Div – 4 cycles
Assuming no structural hazards
In-order
[Pipeline diagram, cycles 1 to 16. Stages: IF, ID, execute (LD1 to LD4, Add1 to Add2, Sub1 to Sub2, Mul1 to Mul4, Div1 to Div4), WB.]
RAW – Stall the pipeline
Out-of-order with Scoreboard
[Pipeline diagram, cycles 1 to 15. Stages: I (issue), RO (read operands), execute, WB.]
RAW – ADD stalled, SUB could be issued
Out-of-order with Tomasulo
[Pipeline diagram, cycles 1 to 12. Stages: I (issue), RS (waiting in a reservation station), execute, CDB.]
RAW – ADD stalled, SUB can be issued
Example
[Same code and pipeline diagrams, with callouts on the WAW dependency between ADD and SUB, which both write R3:]
• Out-of-order with Scoreboard: WAW – SUB cannot be issued; stall the pipeline
• Out-of-order with Tomasulo: WAW – Allowed by register renaming in RS
Example
[Same code and pipeline diagrams, with callouts on instruction completion:]
• Out-of-order with Scoreboard: 2 instrs. can finish at the same time
• Out-of-order with Tomasulo: CDB limits finishing instrs. to one/cycle