ppt file

advertisement
EECS 322 Computer Architecture
Pipeline Control,
Data Hazards
and Branch Hazards
EECS 322 April 10, 2000
Models
Single-cycle model (non-overlapping)
• Each instruction executes in a single cycle
• Every instruction and clock-cycle must be
stretched to the slowest instruction (p.438)
Multi-cycle model (non-overlapping)
• Each instruction executes in variable number of cycles
• The clock-cycle must be stretched to the slowest step
• Ability to share functional units within the execution
of a single instruction
Pipeline model (overlapping)
• Each instruction executes in several cycles
• The clock-cycle must be stretched to the slowest step
• Gains efficiency by overlapping the execution of multiple
instructions, increasing hardware utilization. (p. 377)
EECS 322 April 10, 2000
Overhead
Single-cycle model
Chip Area
• 8 ns Clock (125 MHz), (non-overlapping)
• 1 ALU + 2 adders
• 0 Muxes
• 0 Datapath Register bits (Flip-Flops)
Speed
Multi-cycle model
• 2 ns Clock (500 MHz), (non-overlapping)
• 1 ALU + Controller
• 5 Muxes
• 160 Datapath Register bits (Flip-Flops)
Pipeline model
• 2 ns Clock (500 MHz), (overlapping)
• 2 ALU + Controller
• 4 Muxes
• 373 Datapath + 16 Controlpath Register bits (Flip-Flops)
EECS 322 April 10, 2000
PCSrc
0
M
u
x
1
PC
IF/ID
32
2W
3M
ID/EX
EX/MEM
PC
PC3
2
Add
bits
4
2W
3M
4 EX
Add
result
Add
MEM/WB
2W
32
M
Branch
PC
Address
IR
Instruction
memory
Instruction
RegWrite
32
Read
register 1
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
Write
data
Datapath
Registers
160 FFs
bits
Z
MemWrite
Figure 6.25
R
Zero
Zero
ALU ALU
result
0
M
u
x
1
Instruction
16
[15– 0]
Instruction
[20– 16]
Sign
extend
32
Si
32
RT
5
6
0
M
u
x
1
RegDst
+ 16 FFs
1
ALUSrc
B
32
D
ALU
Out
32
Address
Data
memory
ALU
control
B
32
ALUOp
D
5
Read
data
32
ALU
Out
Write
data
Instruction
[15– 11]
+ 213 FFs
A
32
Shift
left 2
MemRead
MemtoReg
1
M
u
x
0
32
D
5
RD
5
EECS 322 April 10, 2000
Pipeline Control: Controlpath Register bits
WB
Instruction
Control
IF/ID
M
WB
EX
M
WB
ID/EX
EX/MEM
MEM/WB
9 control bits
5 control bits
Figure 6.29
2 control bits
EECS 322 April 10, 2000
Pipeline Control: Controlpath Register bits
Figure 5.20, Single Cycle
Instruction
Reg
Dst
ALU
Src
Mem
Reg
Reg
Wrt
Mem
Red
Mem
Wrt
Branch
ALU
op1
ALU
op0
R-format
1
0
0
1
0
0
0
1
0
lw
1
1
1
1
1
0
0
0
0
sw
X
1
X
0
0
1
0
0
0
beq
X
0
X
0
0
0
1
0
1
ID / EX
control lines
Figure 6.28
EX / MEM
control lines
MEM / WB
cntrl lines
Instruction
Reg
Dst
ALU
Op1
ALU
Op0
ALU
Src
Branch
Mem
Red
Mem
Wrt
Reg
Wrt
Mem
Reg
R-format
1
1
0
0
0
0
0
1
0
lw
1
0
0
1
0
1
0
1
1
sw
X
0
0
1
0
0
1
0
X
beq
X
0
1
0
1
0
0
0
X
EECS 322 April 10, 2000
Pipeline Hazards
Pipeline hazards
• Solution #1 always works: stall, delay & procrastinate!
Structural Hazards (i.e. fetching same memory bank)
• Solution #2: partition architecture
Control Hazards (i.e. branching)
• Solution #1: stall! but decreases throughput
• Solution #2: guess and back-track
• Solution #3: delayed decision: delay branch & fill slot
Data Hazards (i.e. register dependencies)
• Worst case situation
• Solution #2: re-order instructions
• Solution #3: forwarding or bypassing: delayed load
EECS 322 April 10, 2000
Pipeline Datapath and Controlpath
PCSrc
ID/EX
0
M
u
x
1
WB
Control
IF/ID
EX/MEM
M
WB
EX
M
MEM/WB
WB
Add
Add
Add result
Instruction
memory
ALUSrc
Read
register 1
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
Write
data
Zero
ALU ALU
result
0
M
u
x
1
MemtoReg
Address
Branch
Shift
left 2
MemWrite
PC
Instruction
RegWrite
4
Address
Data
memory
Read
data
Write
data
Instruction 16
[15– 0]
Instruction
[20– 16]
Instruction
[15– 11]
Sign
extend
32
6
ALU
control
0
M
u
x
1
1
M
u
x
0
MemRead
ALUOp
RegDst
Figure 6.30
EECS 322 April 10, 2000
Clock 2
WB
A=C$1
Instruction
Zero
ALU ALU
result
0
M
u
x
1
6
ALU
control
0
M
u
x
1
ALUOp
RegWrite
Branch
Address
Data
memory
Read
data
Write
data
MemRead
D=$10
D=0
Figure 6.30
D=$10
RegDst
WB
1
M
u
x
0
Aluout
32
T=$10
Instruction
[15– 11]
Sign
extend
ALUSrc
ALU=20+C$1
S=20
lw $10,20($1)
Write
data
Instruction
[20– 16]
Add
Add result
B=X
IR=
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
Instruction 16
[15– 0]
M
Shift
left 2
Read
register 1
MEM/WB
MDR=Mem[20+C$1]
PC=4
PC=0
Instruction
memory
WB
EX
4
PC=4+20<<2
EX/MEM
M
IF/ID
Address
WB=11
PC=4+20<<2
Control
PC
WB=11
M=010
MemtoReg
PC=4
ID/EX
0
M
u
x
1
Add
Clock 3
WB=11, M=010
EX=0001
PCSrc
Clock 0
Clock 3
MemWrite
load inst.
Clock 1
EECS 322 April 10, 2000
Clock 2
WB
A=C$1
Instruction
Zero
ALU ALU
result
0
M
u
x
1
6
ALU
control
0
M
u
x
1
ALUOp
RegWrite
Branch
Address
Data
memory
Read
data
Write
data
MemRead
D=$10
D=0
Figure 6.30
D=$10
RegDst
WB
1
M
u
x
0
Aluout
32
T=$10
Instruction
[15– 11]
Sign
extend
ALUSrc
ALU=20+C$1
S=20
lw $10,20($1)
Write
data
Instruction
[20– 16]
Add
Add result
B=X
IR=
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
Instruction 16
[15– 0]
M
Shift
left 2
Read
register 1
MEM/WB
MDR=Mem[20+C$1]
PC=4
PC=0
Instruction
memory
WB
EX
4
PC=4+20<<2
EX/MEM
M
IF/ID
Address
WB=11
PC=4+20<<2
Control
PC
WB=11
M=010
MemtoReg
PC=4
ID/EX
0
M
u
x
1
Add
Clock 3
WB=11, M=010
EX=0001
PCSrc
Clock 0
Clock 3
MemWrite
load inst.
Clock 1
EECS 322 April 10, 2000
Pipeline Datapath and Controlpath
Contents of Register 1 = C$1 = 3; C$2=4; C$3=4; C$4=6; C$5=7; C$10=8; … Memory[23]=9;
Formats: add $rd,$rs=A,$rt=B;
lw $rt=B,@($rs=A)
Clock
<IF/ID>
<ID/EX>
<EX/MEM>
<MEM/WB>
<PC, IR>
<PC, A, B, S, Rt, Rd> <PC, Z, ALU, B, R> <MDR, ALU, R>
0
<0,?>
<?,?,?,?,?,?>
<?,?,?,?,?>
<?,?,?>
1
<4,lw $10,20($1)>
<0,?,?,?,?,?>
<?,?,?,?,?>
<?,?,?>
2
<8,sub $11,$2,$3>
<4,C$13,C$108,20,$10,0> <0,?,?,?,?>
3
<12,and $12,$4,$5> <8,C$24,C$34,X,$3,$11> <4+20<<284,0,20+323,8,$10><?,?,?>
4
<16,or $13,$6,$7>
5
<20,add $14,$8,$9> <16,C$6 ,C$7,X,$7,$13> <X,0,1,7,$12>
<?,?,?>
<12,C$46,C$57,X,$5,$12><X,1,4-4=0,4,$11> <Mem[23]9,23,$10>
<X,0,$11>
EECS 322 April 10, 2000
Clock 1: Figure 6.31a
EX: before<2>
IF/ID
ID/EX
00
Control
PC=4
Add
000
0000
WB
M
EX/MEM
Instruction
Instruction
memory
000
WB
0
00
EX
0
M
00
0
0
0
0
WB 0
Branch
Shift
left 2
ALUSrc
Read
register 1
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
Write
data
Zero
ALU ALU
result
0
M
u
x
1
Address
Data
memory
Read
data
Write
data
Instruction
[15– 0]
Sign
extend
Instruction
[20– 16]
Instruction
[15– 11]
Clock 1
IF: sub $11, $2, $3
MEM/WB
Add
Add result
RegWrite
IR=lw $10,20($1)
PC=0
Address
WB: before<4>
00
4
PC
MEM: before<3>
MemtoReg
0
M
u
x
1
ID: before<1>
MemWrite
IF: lw $10, 20($1)
ALU
control
0
M
u
x
1
1
M
u
x
0
MemRead
ALUOp
RegDst
ID: lw $10, 20($1)
EX: before<1>
MEM: before<2>
EECS
322before<3>
April 10, 2000
WB:
1
Clock 1
RegDst
C
ID/EX
lw
Control
WB
010
0001
RegWrite
X
WB
M
00
0
0
0
0
WB 0
Add
Add result
Branch
Shift
left 2
ALUSrc
20
Instruction
[15– 0]
10
Instruction
[20– 16]
20
10
X
D=0
Instruction
[15– 11]
Sign
extend
Zero
ALU ALU
result
0
M
u
x
1
Address
Data
memory
Read
data
Write
data
T=$10
Clock 2
Read $1
data 1
Read
register 2
Registers Read $X
Write
data 2
register
Write
data
X
000
Read
register 1
S=20
Instruction
memory
Instruction
Address
IR=sub$10,20($1)
IR=lw
$11, $2, $3
PC=4
PC
MEM/WB
00
0
00
EX
0
B=X
4
1
M
EX/MEM
WB: before<3>
MemtoReg
11
MEM: before<2>
MemWrite
IF/ID
PC=4
PC=8
Add
EX: before<1>
A=C$1
0
M
u
x
1
ID: lw $10, 20($1)
PC=4
IF: sub $11, $2, $3
ALU
control
0
M
u
x
1
1
M
u
x
0
MemRead
ALUOp
RegDst
Figure 6.31b
EECS 322 April 10, 2000
ID/EX
PC=8
PC=12
2
RegWrite
010
0
00
EX
1
Read $2
data 1
Instruction
[20– 16]
X
X
11
10
0
WB 0
Branch
ALUSrc
Zero
ALU ALU
result
0
M
u
x
1
20
D=0
D=$11
Instruction
[15– 11]
Sign
extend
00
ALU
control
0
M
u
x
1
ALUOp
RegDst
Address
Data
memory
Read
data
Write
data
1
M
u
x
0
MemRead
D=$10
X
Add
Add result
$1
Write
data
MEM/WB
0
0
M
0
Shift
left 2
Read
register 2
Registers Read $3
Write
data 2
register
Instruction
[15– 0]
WB
T=$10
T=$3
Figure 6.32a
Read
register 1
X
11
Clock 3
1100
M
ALU=20+C$1
3
000
S=20
S=X
IR=sub $12,$4,$5
IR=and
$11, $2, $3
PC=8
Instruction
memory
Instruction
4
Address
Control
11
PC=4+20<<2
sub
WB
EX/MEM
WB: before<2>
MemtoReg
10
MEM: before<1>
MemWrite
IF/ID
Add
PC
EX: lw $10, . . .
C
0
M
u
x
1
ID: sub $11, $2, $3
C C PC=8
B=X
PC=4 A=C$2
A=C$1 B=C$3
IF: and $12, $4, $5
EECS 322 April 10, 2000
11
ID/EX
and
Control
WB
000
1100
Instruction
[20– 16]
$3
Sign
extend
X
X
Instruction
11
0
1
0
0
WB 0
ALUSrc
Zero
ALU ALU
result
0
M
u
x
1
ALU
control
0
M
u
x
1
ALUOp
RegDst
Read
data
Address
Data
memory
Write
data
MemRead
1
M
u
x
0
10
D=$10
11
MEM/WB
Branch
D=$10
D=$11
12
D=$11
D=$12
Instruction
[15– 11]
Add
Add result
WB: before<1>
ALU
X
M
T=$3
Clock 4
1
10
EX
0
Write
data
Instruction
[15– 0]
WB
Shift
left 2
Read $4
data 1
Read
register 2
Registers Read $5
Write
data 2
register
X
12
000
ALU=20+C$1
ALU=C$2-C$3
5
10
S=X
IR=and
$12,$4,$5
IR=or
$13,$6,$7
PC=20
Instruction
memory
4
M
EX/MEM
PC=4+20<<2
PC=X
10
4
Address
C
IF/ID
MEM: lw $10, . . .
MDR=Mem[20+C$1]
$2
EX: sub $11, . . .
Add
PC
RegDst
PC=4+20<<2
Read
register 1
ID: and $12, $2, $3
PC=12
PC=16
0
M
u
x
1
PC=8 A=C$2
B=C$5
A=C$4 B=C$3
PC=12
IF: or $13, $6, $7
RegWrite
Clock 4: Figure 6.32b
1
C
C
Clock 3
MemtoReg
[15– 11]
MemWrite
11
EECS 322 April 10, 2000
Data Dependencies: that can be resolved by forwarding
Time (in clock cycles)
CC 1
Value of
register $2: 10
CC 2
CC 3
10
10
CC 4
CC 5
CC 6
CC 7
CC 8
CC 9
10
10/–
20 forwarding
– 20
– 20
Resolved
by
– 20
– 20
Program
execution
order
(in instructions)
sub $2, $1, $3
Data Dependencies
and $12, $2, $5
IM
Reg
IM
sw $15, 100($2)
Figure 6.36
Data Hazards
Reg
DM
Reg
IM
or $13, $6, $2
add $14, $2, $2
DM
Reg
DM
Reg
IM
At same time: Not a hazard
DM
Reg
IM
Reg
Reg
Reg
DM
Reg
Forward in time: Not a hazard
EECS 322 April 10, 2000
Data Hazards: arithmetic
Time (in clock cycles)
CC 1
Value of register $2 : 10
Value of EX/MEM : X
Value of MEM/WB : X
CC 2
CC 3
CC 4
CC 5
CC 6
CC 7
CC 8
CC 9
10
X
X
10
X
X
10
– 20
X
10/– 20
X
– 20
– 20
X
X
– 20
X
X
– 20
X
X
– 20
X
X
Forwards in time: Can be resolved
Program
execution order
(in instructions)
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Figure 6.37
At same time: Not a hazard
IM
Reg
IM
DM
Reg
IM
Reg
DM
Reg
IM
Reg
DM
Reg
IM
Reg
DM
Reg
Reg
DM
Reg
EECS 322 April 10, 2000
Data Dependencies: no forwarding
Clock
1
IF
sub $2,$1,$3
and $12,$2,$5
2
3
ID
IF
EX
ID
Stall
5
6
7
8
WB
M
ID
Stall
4
ID
Write
1st
Half
EX
M
WB
Read
2nd
Half
Suppose every instruction is dependant = 1 + 2 stalls = 3 clocks
MIPS = Clock = 500 Mhz = 167 MIPS
CPI
3
EECS 322 April 10, 2000
Data Dependencies: no forwarding
A dependant instruction will take
= 1 + 2 stalls = 3 clocks
An independent instruction will take = 1 + 0 stalls = 1 clocks
Suppose 10% of the time the instructions are dependant?
Averge instruction time = 10%*3 + 90%*1 = 0.10*3 + 0.90*1 = 1.2 clocks
MIPS = Clock = 500 Mhz = 417 MIPS (10% dependency)
CPI
1.2
MIPS = Clock = 500 Mhz = 167 MIPS (100% dependency)
CPI
3
MIPS = Clock = 500 Mhz = 500 MIPS (0% dependency)
CPI
1
EECS 322 April 10, 2000
Data Dependencies: with forwarding
Clock
1
IF
sub $2,$1,$3
and $12,$2,$5
2
ID
IF
3
EX
ID
4
M
EX
5
6
WB
M
WB
Detected
Data Hazard 1a
ID/EX.$rs = EX/M.$rd
Suppose every instruction is dependant = 1 + 0 stalls = 1 clock
MIPS = Clock = 500 Mhz = 500 MIPS
CPI
1
EECS 322 April 10, 2000
Data Dependencies: Hazard Conditions
Data Hazard Condition
occurs whenever a data source needs a previous
unavailable result due to a data destination.
Example
sub $2, $1, $3
and $12, $2, $5
sub
and
$rd, $rs, $rt
$rd, $rs, $rt
Data Hazard Detection
is always comparing a destination with a source.
Destination
EX/MEM.$rdest =
MEM/WB.$rdest =
Source
{
{
ID/EX.$rs
ID/EX.$rt
ID/EX.$rs
ID/EX.$rt
Hazard Type
1a.
1b.
2a.
2b.
EECS 322 April 10, 2000
Data Dependencies: Hazard Conditions
1a Data Hazard:
sub $2, $1, $3
and $12, $2, $5
EX/MEM.$rd = ID/EX.$rs
sub $rd, $rs, $rt
and $rd, $rs, $rt
1b Data Hazard:
sub $2, $1, $3
and $12, $1, $2
EX/MEM.$rd = ID/EX.$rt
sub $rd, $rs, $rt
and $rd, $rs, $rt
2a Data Hazard:
sub $2, $1, $3
and $12, $1, $5
or $13, $2, $1
MEM/WB.$rd = ID/EX.$rs
sub $rd, $rs, $rt
sub $rd, $rs, $rt
and $rd, $rs, $rt
2b Data Hazard:
sub $2, $1, $3
and $12, $1, $5
or $13, $6, $2
MEM/WB.$rd = ID/EX.$rt
sub $rd, $rs, $rt
sub $rd, $rs, $rt
and $rd, $rs, $rt
EECS 322 April 10, 2000
Data Dependencies: Worst case
Data Hazard:
sub $2, $1, $3
sub
$rd, $rs, $rt
and $12, $2, $2
and
$rd, $rs, $rt
or
and
$rd, $rs, $rt
$13, $2, $2
Data Hazard 1a:
Data Hazard 1b:
Data Hazard 2a:
Data Hazard 2b:
EX/MEM.$rd = ID/EX.$rs
EX/MEM.$rd = ID/EX.$rt
MEM/WB.$rd = ID/EX.$rs
MEM/WB.$rd = ID/EX.$rt
EECS 322 April 10, 2000
Data Dependencies: Hazard Conditions
Hazard
Type
1a.
1b.
Source
ID/EX.$rs
ID/EX.$rt
2a.
2b.
ID/EX.$rs
ID/EX.$rt
ID/EX
Destination
} = EX/MEM.$rdest
}
= MEM/WB.$rdest
Pipeline Registers
EX/MEM
MEM/WB
$rs
$rt
$rd
$rd
$rd
EECS 322 April 10, 2000
ID/EX
EX/MEM
Registers
MEM/WB
ALU
Data
memory
M
u
x
a. No forwarding
ID/EX
EX/MEM
MEM/WB
M
u
x
Registers
ForwardA
ALU
M
u
x
Rs
Rt
Rt
Rd
Data
memory
ForwardB
M
u
x
EX/MEM.RegisterRd
Forwarding
unit
Figure 6.38
M
u
x
MEM/WB.RegisterRd
b. With forwarding
EECS 322 April 10, 2000
Data Hazards: Loads
Time (in clock cycles)
Program
CC 1
execution
order
(in instructions)
lw $2, 20($1)
IM
and $4, $2, $5
or $8, $2, $6
CC 2
Backwards in time: Cannot be resolved
CC 3
CC 4
CC 5
CC 6
CC 7
CC 8
CC 9
Forwards in time: Can be resolved
Reg
IM
DM
Reg
IM
add $9, $4, $2
Reg
DM
Reg
IM
Reg
DM
Reg
Reg
DM
Reg
At same time: Not a hazard
slt $1, $6, $7
IM
Reg
DM
Reg
Figure 6.44
EECS 322 April 10, 2000
Data Hazards: load stalling
Program
Time (in clock cycles)
execution
CC 1
CC 2
order
(in instructions)
lw $2, 20($1)
IM
and $4, $2, $5
CC 3
Reg
IM
or $8, $2, $6
Reg
IM
CC 4
CC 5
DM
Reg
Reg
IM
CC 6
CC 7
DM
Reg
Reg
DM
CC 8
CC 9
CC 10
Reg
bubble
add $9, $4, $2
IM
DM
Reg
Reg
Stall
slt $1, $6, $7
Figure 6.45
IM
Reg
DM
Reg
EECS 322 April 10, 2000
Data Hazards: Hazard detection unit (page 490)
Stall Condition
Source
IF/ID.$rs
IF/ID.$rt
Destination
} = ID/EX.$rt  ID/EX.MemRead=1
Stall Example
lw
$2, 20($1)
and $4, $2, $5
lw
$rt, addr($rs)
and
$rd, $rs, $rt
No Stall Example: (only need to look at next instruction)
lw
$2, 20($1)
lw
$rt, addr($rs)
and $4, $1, $5
and $rd, $rs, $rt
or
$8, $2, $6
or
$rd, $rs, $rt
EECS 322 April 10, 2000
Data Hazards: Hazard detection unit (page 490)
No Stall Example: (only need to look at next instruction)
lw
$2, 20($1)
lw
$rt, addr($rs)
and $4, $1, $5
and $rd, $rs, $rt
or
$8, $2, $6
or
$rd, $rs, $rt
Example
load: assume half of the instructions are immediately
followed by an instruction that uses it.
What is the average number of clocks for the load?
load instruction time: 50%*(1 clock) + 50%*(2 clocks)=1.5
EECS 322 April 10, 2000
Hazard Detection Unit: when to stall
ID/EX.MemRead
Hazard
detection
unit
ID/EX
IF/IDWrite
WB
Control
0
M
u
x
PC
Instruction
memory
Instruction
PCWrite
IF/ID
EX/MEM
M
WB
EX
M
MEM/WB
WB
M
u
x
Registers
ALU
Data
memory
M
u
x
M
u
x
IF/ID.RegisterRs
Figure 6.46
IF/ID.RegisterRt
IF/ID.RegisterRt
Rt
IF/ID.RegisterRd
Rd
ID/EX.RegisterRt
Rs
Rt
M
u
x
EX/MEM.RegisterRd
Forwarding
unit
MEM/WB.RegisterRd
EECS 322 April 10, 2000
Data Dependency Units
Forwarding Condition
Source
ID/EX.$rs
ID/EX.$rt
Destination
}
} = MEM/WB.$rd
ID/EX.$rs
ID/EX.$rt
Stall Condition
Source
IF/ID.$rs
IF/ID.$rt
= EX/MEM.$rd
Destination
}
= ID/EX.$rt  ID/EX.MemRead=1
EECS 322 April 10, 2000
Data Dependency Units
Pipeline Registers
Stalling Comparisons
IF/ID
ID/EX
$rs
$rs
$rt
$rt
$rd
$rd
Stall Condition
Source
IF/ID.$rs
IF/ID.$rt
Forwarding Comparisons
EX/MEM
$rd
MEM/WB
$rd
Destination
}
= ID/EX.$rt  ID/EX.MemRead=1
EECS 322 April 10, 2000
Branch Hazards: Soln #1, Stall until Decision made (fig. 6.4)
@3C:
@40:
add
beq
$4, $5, $6
$1, $3, 7
Soln #1: Stall until Decision is made
@44:
@48:
@4C:
@50:
Program
execution
Time
order
(in instructions)
add $4, $5, $6
beq $1, $2, 40
and
or
add
lw
2
Instruction
fetch
2ns
4
Reg
Instruction
fetch
lw $3, 300($0)
4 ns
$12, $2, $5
$13, $6, $2
$14, $2, $2
$4, 50($7)
6
ALU
Reg
8
Data
access
ALU
Instruction
fetch
10
14
12
16
Reg
Data
access
Reg
Reg
ALU
Data
access
Reg
2ns
Stall
Decision made in ID stage: do load
EECS 322 April 10, 2000
Branch Hazards: Soln #2, Predict until Decision made
Clock
beq $1,$3,7
1
IF
2
ID
3
EX
4
M
5
6
7
8
WB
Predict false branch
and $12, $2, $5
IF
ID
EX
M
WB
discard “and $12,$2,$5” instruction
lw $4, 50($7)
IF
ID
EX
M
WB
Decision made in ID stage: discard & branch
EECS 322 April 10, 2000
Branch Hazards: Soln #3, Delayed Decision
Clock
beq $1,$3,7
1
IF
2
ID
3
EX
4
M
5
6
7
8
WB
Move instruction before branch
add $4,$6,$6
IF
ID
EX
M
WB
Do not need to discard instruction
lw $4, 50($7)
IF
ID
EX
M
WB
Decision made in ID stage: branch
EECS 322 April 10, 2000
Branch Hazards: Soln #3, Delayed Decision
Clock
beq $1,$3,7
and $12, $2, $5
1
IF
2
ID
IF
3
EX
ID
4
M
EX
5
6
7
8
WB
M
WB
Decision made in ID stage: do branch
lw $4, 50($7)
IF
ID
EX
M
WB
EECS 322 April 10, 2000
Branch Hazards: Decision made in the ID stage (figure 6.4)
Clock
1
2
IF
ID
nop
No decision yet:
insert a nop
IF
beq $1,$3,7
lw $4, 50($7)
3
EX
ID
4
M
EX
5
6
7
8
WB
M
WB
Decision: do load
IF
ID
EX
M
WB
EECS 322 April 10, 2000
Branch Hazards: Soln #2, Predict until Decision made
Program
execution
order
(in instructions)
Branch Decision made in MEM stage:
Discard values when wrong prediction
Time (in clock cycles)
CC 1
CC 2
CC 3
CC 4
CC 5
DM
Reg
CC 6
CC 7
CC 8
CC 9
Predict false branch
40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
IM
Reg
IM
Reg
IM
52 add $14, $2, $2
72 lw $4, 50($7)
Figure 6.50
DM
Reg
IM
Reg
DM
Reg
Same effect as 3 stalls
IM
Reg
DM
Reg
Reg
DM
Reg
EECS 322 April 10, 2000
Figure 6.51
IF.Flush
Early branch comparison
Hazard
detection
unit
ID/EX
M
u
x
WB
Control
0
M
u
x
IF/ID
4
M
WB
EX
M
MEM/WB
WB
Shift
left 2
Registers
PC
EX/MEM
=
M
u
x
Instruction
memory
ALU
M
u
x
Data
memory
M
u
x
Sign
Flush: if wrong prediciton,
add nops
extend
M
u
x
Forwarding
unit
EECS 322 April 10, 2000
Performance
load: assume half of the instructions are immediately
followed by an instruction that uses it (i.e. data dependency)
load instruction time = 50%*(1 clock) + 50%*(2 clocks)=1.5
Jump: assume that jumps always pay 1 full clock cycle
delay (stall). Jump instruction time = 2
Branch: the branch delay of misprediction is 1 clock cycle
that 25% of the branches are mispredicted.
branch time = 75%*(1 clocks) + 25%*(2 clocks) = 1.25
EECS 322 April 10, 2000
Performance, page 504
Instruction
SingleCycle
Multi-Cycle
Clocks
Pipeline
Cycles
Instruction
Mix
loads
1
5
1.5
23%
stores
1
4
1
13%
arithmetic
1
4
1
43%
branches
1
3
1.25
19%
jumps
1
3
2
2%
Clock
speed
125 Mhz
8 ns
500 Mhz
2 ns
500 Mhz
2 ns
CPI
1
4.02
1.18
=  Cycles*Mix
MIPS
125 MIPS
125 MIPS
424 MIPS
= Clock/CPI
EECS 322 April 10, 2000
Homework for Wednesday April 5, 2000
1. Convert the RISCEE 1 Architecture into a
pipeline Architecture (like Figure 6.30)
(showing the number data and control bits).
2. Build the control line table (like Figure 6.28)
for the RISCEE3 pipeline architecture for
RISCEE1 instructions:
(addi, subi, load, store, beq, jmp, jal).
3. Homework 6.23, p534
4. Homework 6.24, p534
EECS 322 April 10, 2000
Download