L12-Bypassing

advertisement
Computer Architecture: A Constructive Approach
Bypassing
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-1
Six Stage Pipeline
Fetch
Decode
Reg
Read
Execute
Memory
Writeback
p
c
F
f
r
D
d
r
R
r
r
X
x
r
M
m
r
W
Writeback feeds back directly to
ReadReg rather than through FIFO
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-2
Register Read Stage
Reg
Read
Decode
D
Decode
Scoreboard
update signaled
immediately
March 19, 2012
d
r
R
r
r
W
Read
regs
update scoreboard
RF
writeback register
http://csg.csail.mit.edu/6.S078
L12-3
Per stage architectural state
typedef struct {
MemResp instResp;
} FBundle;
typedef struct {
Rindx
r_dest;
Rindx
op1;
Rindx
op2;
...
} DecBundle;
typedef struct {
Data
src1;
Data
src2;
} RegBundle;
March 19, 2012
typedef struct {
Bool
cond;
Addr
addr;
Data
data;
} Ebundle;
No MemBundle –
data added to Ebundle
No WbBundle –
nothing added in WB
http://csg.csail.mit.edu/6.S078
L12-4
Per stage micro-architectural state
typedef struct {
FBundle
fInst;
DecBundle decInst;
RegBundle regInst;
Inum
inum;
Epoch
epoch;
} RegData deriving(Bits,Eq);
Architectural state
from prior stages
New architectural
state for this stage
Passthrough and (optional)
new micro-architectural state
Utility
function RegData newRegData(DecData decData);
function
RegData regData = ?;
regData.fInst
= decData.fInst;
regData.decInst = decData.decInst;
Copy architectural
state from prior stages
regData.inum
= decData.inum;
regData.epoch
= decData.epoch;
return regData;
Copy uarch state
endfunction
from prior stages
In “uarch_types”
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-5
Example of stage state use
rule doRegRead;
let regData = newRegData(dr.first());
let decInst = regData.decInst;
Initialize
stage state
Set utility
variables*
case (reg_read.readRegs(decInst)) matches
tagged Valid .rfdata:
Set
begin
architectural
state
if( writes_register(decInst))
reg_read.markUnavail(decInst.r_dest);
regData.regInst.src1 = rfdata.src1;
regData.regInst.src2 = rfdata.src2;
Set uarch
rr.enq(regData);
state if any
dr.deq();
end
Enqueue
Dequeue
outgoing
endcase
incoming
*Don’t try to
state
state
endrule
update!!!
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-6
Resource-based diagram key
0
1
Fetch
(Fuschia)
ζ
η
Decode
(Denim)
ε
ζ
Reg Read
(Red)
δ
Execute
(Emerald)
γ
δ
Memory
(Mustard)
β
γ
Writeback
(Wheat)
α
α
α = stalled
March 19, 2012
epoch change
(color change)
pc redirect
δ = killed
ε
R1 busy
Set R1
not busy
writeback feedback
http://csg.csail.mit.edu/6.S078
L12-7
Scoreboard clear bug!
9
η7waits8for write
of R1 by ζ
0
1
2
3
4
5
6
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
α = 00: add r1,r0,r0
β = 04: j 40
γ = 08: add ...
δ = 12: add ...
ε = 16: add ...
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
α
β
γ
α
β
α
F
D
η
η
R
ζ
X
M
ζ
W
β
Looks okay, but what if α is delayed?
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-8
Scoreboard clear bug!
8
9by
η7is released
earlier write of R1
0
1
2
3
4
5
6
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
α = 00: ld r1,100(r0)
β = 04: j 40
γ = 08: add ...
δ = 12: add ...
ε = 16: add ...
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
α
β
γ
α
α
α
F
α
Mistreatment of what did of data dependence?
March 19, 2012
http://csg.csail.mit.edu/6.S078
D
η
R
ζ
η
X
β
ζ
M
α
β
W
WAW
L12-9
So don’t clear scoreboard!
8
9 is
ζ7sees register
free by feedback,
then sets it busy F
0
1
2
3
4
5
6
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
α = 00: ld r1,100(r0)
β = 04: j 40
γ = 08: add ...
δ = 12: add ...
ε = 16: add ...
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
α
β
γ
δ
ε
α
α
α
α
D
ζ
η
R
ζ
X
M
β
α
W
β
Looks okay, but what if R1 written in branch shadow?
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-10
Dead instruction deadlock!
ζ7never8sees
register R1
0
1
2
3
4
5
6
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
α = 00: add r3,r0,r0
β = 04: j 40
γ = 08: add ...
δ = 12: add r1,...
ε = 16: add,...
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
α
β
γ
δ
ε
α
β
α
9
F
D
ζ
ζ
R
X
M
β
W
So, what can we do?
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-11
Poison bit uarch state
In exec:

instead of dropping killed instructions
mark them ‘poisoned’
In subsequent stages:


March 19, 2012
DO NOT make architectural changes
DO necessary bookkeeping
http://csg.csail.mit.edu/6.S078
L12-12
Poison-bit solution
Poisoned δ
frees register,
but7 value8 same,
9
ζ marks it busy.
0
1
2
3
4
5
6
α
β
γ
δ
ε
ζ
η
α
β
γ
δ
ε
ζ
η
η
α
β
γ
δ
ε
ζ
ζ
α = 00: add r3,r0,r0
β = 04: j 40
γ = 08: add ...
δ = 12: add r1,...
ε = 16: add,...
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
α
β
γ
δ
ε
α
β
γ
δ
ε
α
β
γ
δ
F
D
η
R
ζ
X
M
ε
W
What stage generates poison bit?
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-13
Poison-bit generation
Initialize
stage state
rule doExecute;
let execData = newExecData(rr.first);
let ...;
Set utility
if (! epochChange)
variables
begin
execData.execInst = exec.exec(decInst,src1,src2);
if (execInst.cond) ...
end
Set
architectural
else
state
begin
execData.poisoned = True;
Set uarch
end
state
Enqueue
xr.enq(execData);
outgoing
rr.deq();
state
Dequeue
What stages
endrule
incoming
need to handle
state
March 19, 2012
http://csg.csail.mit.edu/6.S078
poison bit?
L12-14
Poison-bit usage
Architectural
activity if not
poisoned
rule doMemory;
let memData = newMemData(xr.first);
let ...;
if(! poisoned && memop(decInst))
memData.execInst.data <- mem.port(...);
mr.enq(memData);
xr.deq();
endrule
Unconditionally do
rule doWriteBack;
bookkeeping
let wbData = newWBData(mr.first);
let ...;
reg_read.markAvailable(rdst);
if (!poisoned && regwrite(decInst)
Architectural
rf.wr(rdst, data);
activity if not
mr.deq;
poisoned
endrule
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-15
Data dependence stalls
0
1
ζ
η
ζ
2
3
4
5
6
8
9
F
η
ζ
D
η
η
η
ζ
R
η
ζ
ζ = 40: add r1,r0, r0
η = 44: add r2, r1,r0
X
M
η
ζ
This is a data dependence. What kind?
March 19, 2012
7
http://csg.csail.mit.edu/6.S078
η
W
RAW
L12-16
Bypass
RegRead
rr
Execute
What does RegRead need to do?
If scoreboard says register is not available then
RegRead needs to stall, unless it sees what it
needs on the bypass line….
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-17
Register Read Stage
D
Decode
d
r
R
r
r
X
W
Generate
bypass
Read
regs
Update
scoreboard
Bypass
Net
RF
March 19, 2012
Writeback register
http://csg.csail.mit.edu/6.S078
L12-18
Bypass Network
typedef
struct { Rindx regnum; Data
value;} BypassValue;
module mkBypass(BypassNetwork)
Rwire#(BypassValue) bypass;
method produceBypass(Rindx regnum, Data value);
bypass.wset(BypassValue{regname: regnum, value:value});
endmethod
method Maybe#(Data) consumeBypass1(Rindx regnum);
if (bypass matches tagged Valid .b && b.regnum == regnum)
return tagged Valid b.value;
else
return tagged Invalid;
endmethod
endmodule
Does bypass solve WAW hazard
March 19, 2012
http://csg.csail.mit.edu/6.S078
No!
L12-19
Register Read (with bypass)
module mkRegRead#(RFile rf, BypassNetwork bypass)(RegRead);
method Maybe#(Sources) readRegs(DecBundle decInst);
let op1 = decInst.op1;
let op2 = decInst.op2;
let rfdata1 = rf.rd1(op1); let rfdata2 = rf.rd2(op2);
let bypass1 = bypass.consumeBypass1(op1);
let bypass2 = bypass.consumeBypass2(op2);
let stall = isWAWhazard(scoreboard) ||
(isBusy(op1,scoreboard) && !isJust(bypass1)) ||
(isBusy(op2,scoreboard) && !isJust(bypass2)));
let s1 = fromMaybe(rfdata1, bypass1);
let s2 = fromMaybe(rfdata2, bypass2);
if (stall) return tagged Invalid;
else return tagged Valid Sources{src1:s1, src2:s2};
endmethod
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-20
Bypass generation in execute
rule doExecute;
let execData = newExecData(rr.first);
let ...;
if (! epochChange) begin
execData.execInst = exec.exec(decInst,src1,src2);
if (execInst.cond) ...
if ( decInst.i_type == ALU )
bypass.generateBypass(decInst.r_dst, execInst.data);
end else begin
execData.poisoned = True;
end
If have data destined
xr.enq(execData);
for a register bypass it
rr.deq();
back to register read
endrule
Does bypass solve all RAW delays?
March 19, 2012
http://csg.csail.mit.edu/6.S078
No – not for ld/st!
L12-21
Full bypassing
p
c
F
f
r
D
d
r
R
r
r
X
x
r
M
m
r
W
Every stage that generates register
writes can generate bypass data
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-22
Multi-cycle execute
R
r
r
X
x
r
M
M1
m
1
M2
m
2
M3
Incorporating a multi-cycle operation into a stage
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-23
Multi-cycle function unit
module mkMultistage(Multistage);
State1 m1 <- mkReg(); State2 m2 <- mkReg();
method Action request(Operand operand);
m1.enq(doM1(operand));
endmethod
rule s1;
m2.enq(doM2(m1.first)); m1.deq();
endrule
method ActionValue Result response();
return doM3(m2.first); m2.deq();
endmethod
endmodule
March 19, 2012
http://csg.csail.mit.edu/6.S078
Perform first
stage of pipeline
Perform middle
stage(s) of
pipeline
Perform last
stage of pipeline
and return result
L12-24
Single/Multi-cycle pipe stage
rule doExec;
If not busy
... initialization
start multicycle
operation, or…
let done = False;
if (! waiting) begin
if(isMulticycle(...)) begin
multicycle.request(...); waiting <= True;
end else begin
result = singlecycle.exec(...); done = True;
end
end else begin
…perform single
cycle operation
result <- multicycle.response();
done = True; waiting <= False;
end
if (done)
If busy, wait
Finish up if
...finish up
for response
have
a
result
endrule
March 19, 2012
http://csg.csail.mit.edu/6.S078
L12-25
Download