Dependences

advertisement
Fall 2011
Prof. Hyesoon Kim
• App1 IPC: 1.5 App2 IPC 0.5
– Average IPC ?
– 1/(1/1.5+1/0.5) = 0.75
Add: 1 cycles
FE_stage
FE L
ID
L
add
r1, r2, r3
add
add
sub
r4, r1, r3
sub
sub add add
mul
r5, r2, r3
mul sub sub
mul sub
sub
mul mul
sub sub
EX
add
mul mul
L
MEM L
WB L
add
add add
add
sub sub
• Data Dependencies
– RAW: Read-After-Write (True Dependence)
– WAR: Anti-Depedence
– WAW: Output Dependence
• Control Dependence
Ifetch
Reg
ALU
– When following instructions depend on the
outcome of a previous branch/jump
DMem
Reg
• Register dependencies
– RAW, WAR, WAW, based on register number
• Memory dependencies
– Based on memory address
– This is harder
• Register names known at decode
• Memory addresses not known until execute
• Also called memory disambiguation
• Basically similar to regular (register) data
dependencies: RAW, WAR, WAW
• However, the exact location is not known
until run-time :
– A: STORE R1, 0[R2]
– B: LOAD R5, 24[R8]
– C: STORE R3, -8[R9]
– RAW exists if (R2+0) == (R8+24)
– WAR exists if (R8+24) == (R9 – 8)
– WAW exists if (R2+0) == (R9 – 8)
Time (clock cycles)
Control hazard
Ifetch
and r6,r1,r7
O
TARG
r
or
r1,r2,r9
d
e
br (r1) TARG1
r
Control hazard and data hazard
Reg
DMem
Ifetch
Reg
DMem
Reg
DMem
Reg
ALU
n
s
uncond. TARG
t
r.
ALU
I
add r1,r2,r3
ALU
O
r
d
e
r
Cycle 1Cycle 2 Cycle 3 Cycle 4Cycle 5 Cycle 6 Cycle 7
ALU
I
n
s
t
r.
Ifetch
Ifetch
Reg
Reg
Reg
DMem
Reg
Add: 1 cycles
FE_stage
FE L
ID
L
add
r1, r2, r3
add
add
sub
r4, r1, r3
sub
sub add add
mul
r5, r2, r3
mul sub sub
mul sub
sub
mul mul
sub sub
EX
add
mul mul
Can we do better than this?
L
MEM L
WB L
add
add add
add
sub sub
or
r8,r1,r9
xor r10,r1,r4
Reg
DMem
Ifetch
Reg
DMem
Ifetch
Reg
DMem
Ifetch
Reg
ALU
and r6,r1,r7
Ifetch
ALU
sub r4,r1,r3
Reg
ALU
O
r
d
e
r
add r1,r2,r3 Ifetch
ALU
I
n
s
t
r.
ALU
Time (clock cycles)
DMem
Reg
Reg
Reg
Reg
DMem
Reg
• Destination register is a name for instr’s result
• Source registers are names for sources
• Forwarding logic builds data dependence (flow) graph for
instructions in the execution window
and r6,r1,r7
Reg
Ifetch
DMem
Reg
Reg
Reg
Ifetch
Ifetch
Ifetch
Reg
Reg
DMem
ALU
sub r4,r1,r3
Ifetch
ALU
add r1,r2,r3
ALU
Time (clock cycles)
Reg
DMem
Reg
and r6,r1,r7
Reg
DMem
Ifetch
Reg
DMem
Reg
Reg
Ifetch
Reg
Reg
ALU
sub r4,r1,r3
Ifetch
ALU
add r1,r2,r3
ALU
Time (clock cycles)
DMem
Reg
or
r8,r1,r9
DMem
Ifetch
Reg
DMem
Reg
Ifetch
Ifetch
Reg
Reg
Reg
DMem
ALU
and r6,r1,r7
Reg
ALU
sub r4,r1,r3
Ifetch
ALU
add r1,r2,r3
ALU
Time (clock cycles)
Reg
DMem
Reg
Reg
ALU
Ifetch
DMem
Reg
Reg
ALU
Ifetch
DMem
Reg
• Is that all?
• How Do we know whether Add is writing a
data to R1?
• Broadcasting Register id
Destination register ID
Ifetch
Reg
ALU
Sources
Ready?
DMem
Reg
•
•
•
•
•
Sim.cpp
Op structure,
Op latency
Pipeline latch
Get Op
run_a_cycle
OP pool
cycle_count++
Op
Traces
Update pipeline stages
Op
Op
Op
get_op
Data structure for simulation
WB
MEM
EX
ID
FE
and r6,r2,r7
Reg
Ifetch
DMem
Reg
Reg
Reg
Ifetch
Reg
Reg
ALU
sub r4,r1,r3
Ifetch
ALU
O
r
d
e
r
add r1,r2,r3
ALU
I
n
s
t
r.
DMem
All sources are ready?
Why not execute them?
Q: What is static scheduling?
A: Compiler re-order instructions
DMem
Reg
Reg
• Give processor more registers than
specified by the ISA
– temporarily map ISA registers (“logical” or
“architected” registers) to the physical registers
to avoid overwrites
• Components:
– mapping mechanism
– physical registers
• allocated vs. free registers
• allocation/deallocation mechanism
• Example
– I3 can not exec before I2 because
I3 will overwrite R5
– I5 can not go before I2 because
I2, when it goes, will overwrite
R2 with a stale value
Program code
I1: ADD R1, R2, R3
I2: SUB R2, R1, R5
I3: AND R5, R11, R7
I4: OR
R8, R5, R2
I5: XOR R2, R4, R11
RAW
WAR
WAW
• Solution:
I1: ADD R1, R2, R3
Let’s give I3 temporary name/
I2: SUB R2, R1, R5
location (e.g., S) for the value
I3: AND S,
R5 R11, R7
it produces.
R5, R2
I4: OR R8, S,
• But I4 uses that value,
I5: XOR R2,
T, R4, R11
so we must also change that to S…
• In fact, all uses of R5 from I3 to the next
instruction that writes to R5 again must now be
changed to S!
• We remove WAW deps in the same way: change
R2 in I5 (and subsequent instrs) to T.
Program code
• Implementation
– Space for S, T, etc.
– How do we know when
to rename a register?
I1: ADD R1, R2, R3
I2: SUB R2, R1, R5
I3: AND S, R11, R7
I4: OR
R8, S,
R2
I5: XOR T, R4, R11
• Simple Solution
– Do renaming for every instruction
– Change the name of a register
each time we decode an
instruction that will write to it.
– Remember what name we gave it 
Download