CDA 5155 Week 4 Branch Prediction

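[Figure: five-stage pipelined datapath with IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers; PC, instruction memory, register file, ALU, data memory, and control; beq uses an eq? comparison, a sign-extended offset, and a target adder (bpc, target) feeding the PC mux]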
Branch Target Buffer
• Fetch PC: send PC to BTB
• Found?
– Yes: use target (predicted target PC)
– No: use PC+1
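The lookup flow above can be sketched in a few lines. This is a minimal sketch, not a description of any real design: it assumes a direct-mapped table, word-addressed PCs (so the fall-through is PC+1, as on the slide), and that an entry is dropped when its branch is later not taken. The class name, method names, and `num_entries` are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: maps a fetch PC to a predicted target PC."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each entry is a (tag, target) pair, or None when empty.
        self.entries = [None] * num_entries

    def predict(self, pc):
        """Return the stored target on a hit, otherwise fall through to PC+1."""
        entry = self.entries[pc % self.num_entries]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return pc + 1

    def update(self, pc, taken, target):
        """Install the target of a taken branch; drop the entry if not taken."""
        index = pc % self.num_entries
        if taken:
            self.entries[index] = (pc, target)
        elif self.entries[index] is not None and self.entries[index][0] == pc:
            self.entries[index] = None  # assumption: forget not-taken branches


btb = BranchTargetBuffer()
first_guess = btb.predict(40)            # BTB miss: fall through
btb.update(40, taken=True, target=12)    # branch resolved as taken
second_guess = btb.predict(40)           # BTB hit: use stored target
```

The full PC is stored as the tag here for simplicity; a real BTB would typically store only some of the PC bits.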
Branch prediction
• Predict not taken: ~50% accurate
– No BTB needed; always use PC+1
• Predict backward taken: ~65% accurate
– BTB holds targets for backward branches (loops)
• Predict same as last time: ~80% accurate
– Update BTB for any taken branch
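To make the first two policies concrete, here is a small sketch that scores them on a made-up trace. The trace is hypothetical (one forward if-branch and one backward loop branch) and is not meant to reproduce the accuracy figures above; all function names are illustrative.

```python
def predict_not_taken(pc, target):
    """Static policy: never predict taken."""
    return False


def predict_backward_taken(pc, target):
    """Static policy: predict taken only when the target is behind the branch."""
    return target < pc


def accuracy(policy, trace):
    """Fraction of (pc, target, taken) records the policy predicts correctly."""
    correct = sum(policy(pc, target) == taken for pc, target, taken in trace)
    return correct / len(trace)


# Hypothetical trace: a forward branch at PC 12 and a loop branch at PC 20.
trace = []
for i in range(10):
    trace.append((12, 15, i % 3 == 0))   # forward if-branch, taken 4 of 10 times
    trace.append((20, 10, i < 9))        # backward loop branch, exits on the last pass

not_taken_accuracy = accuracy(predict_not_taken, trace)
backward_taken_accuracy = accuracy(predict_backward_taken, trace)
```

On this toy trace the backward-taken policy wins because it gets the loop branch right on every iteration except the exit.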
What about indirect branches?
• Could use same approach
– PC+1 is an unlikely indirect target
– Indirect jumps often have multiple targets (for the same instruction)
• Switch statements
• Virtual function calls
• Shared library (DLL) calls
Indirect jump: Special Case
• Return address stack
– Function returns have deterministic behavior (usually)
• Return to different locations (BTB doesn’t work well)
• Return location known ahead of time
– In some register at the time of the call
– Build a specialized structure for return addresses
• Call instructions write return address to R31 AND RAS
• Return instructions pop predicted target off stack
– Issues: finite size (save or forget on overflow?)
– Issues: long jumps (clear when wrong?)
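A return address stack can be sketched as a small bounded stack. This sketch assumes word-addressed PCs (return address is call PC + 1) and picks the "forget the oldest entry on overflow" option from the question above; the class name and `depth` parameter are hypothetical.

```python
class ReturnAddressStack:
    """Bounded stack of predicted return addresses."""

    def __init__(self, depth=4):
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc):
        """A call pushes its return address (assumed to be call_pc + 1)."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: forget the oldest return address
        self.stack.append(call_pc + 1)

    def on_return(self):
        """A return pops its predicted target, or None if the stack is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(depth=2)
for call_pc in (10, 20, 30):      # three nested calls into a 2-entry stack
    ras.on_call(call_pc)
predicted = [ras.on_return(), ras.on_return(), ras.on_return()]
```

With three nested calls and only two entries, the two innermost returns are predicted and the outermost one has no prediction, which is the cost of the finite size.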
Costs of branch prediction/speculation
• Performance costs?
– Minimal: no difference between waiting and squashing; and it is a huge gain when prediction is correct!
• Power?
– Large: in very long/wide pipelines many instructions can be squashed
• Squashed = # mispredictions × pipeline length/width before target resolved
Costs of branch prediction/speculation
• Area?
– Can be large: predictors can get very big as we will see next time
• Complexity?
– Designs are more complex
– Testing becomes more difficult, but …
What else can be speculated?
• Dependencies
– I think this data is coming from that store instruction
• Values
– I think I will load a 0 value
• Accuracy?
– Branch prediction (direction) is Boolean (T, NT)
– Branch targets are stable or predictable (RAS)
– Dependencies are limited
– Values cover a huge space (0 – 4B)
Parts of the branch predictor
• Direction Predictor
– For conditional branches
• Predicts whether the branch will be taken
– Examples:
• Always taken; backwards taken
• Address Predictor
– Predicts the target address (use if predicted taken)
– Examples:
• BTB; Return Address Stack; Precomputed Branch
• Recovery logic
Ref: The Precomputed Branch Architecture
Characteristics of branches
• Individual branches differ
– Loops tend not to exit
• Unoptimized code: not-taken
• Optimized code: taken
– If-statements:
• Tend to be less predictable
– Unconditional branches
• Still need address prediction
Example gzip:
• gzip: loop branch A @ 0x1200098d8
– Executed: 1359575 times
– Taken: 1359565 times
– Not-taken: 10 times
– % time taken: 99% - 100%
Easy to predict (direction and address)
Example gzip:
• gzip: if branch B @ 0x12000fa04
– Executed: 151409 times
– Taken: 71480 times
– Not-taken: 79929 times
– % time taken: ~49%
Easy to predict? (maybe not/ maybe dynamically)
Example: gzip
[Figure: gzip, total branch executions (0 to 16000000) vs. % taken (per branch, 0 to 100); branches A and B are marked, and two regions are labeled "Easy to predict"]
Direction prediction: always taken
Accuracy: ~73 %
Branch Backwards
[Figure: % of total branches vs. distance of branch target, with separate series for taken and not taken]
Most backward branches are heavily TAKEN
Forward branches slightly more likely to be NOT-TAKEN
Ref: The Effects of Predicated Execution on Branch Prediction
Using history
• 1-bit history (direction predictor)
– Remember the last direction for a branch
[Figure: Branch History Table indexed by branchPC; each entry holds one bit, NT or T]
How big is the BHT?
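A 1-bit history predictor is small enough to simulate directly. The sketch below assumes a table indexed by the low bits of the branch PC with every entry starting at NT; the table size, class name, and the loop trace are hypothetical.

```python
class OneBitPredictor:
    """Branch History Table with one bit per entry: the last direction seen."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.table = [False] * num_entries  # False = NT, True = T

    def predict(self, pc):
        return self.table[pc % self.num_entries]

    def update(self, pc, taken):
        self.table[pc % self.num_entries] = taken


def count_mispredicts(predictor, pc, outcomes):
    """Predict, compare, then train on each outcome of a single branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken 9 times, then exits; the loop runs 3 times.
loop_trace = ([True] * 9 + [False]) * 3
one_bit_misses = count_mispredicts(OneBitPredictor(), 0x40, loop_trace)
```

On this trace the 1-bit scheme mispredicts twice per loop run (6 in total): once on the exit, and once again on the first iteration of the next run because the stored bit was flipped by the exit.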
Example: gzip
[Figure: gzip, total branch executions vs. % taken (per branch), with branches A and B marked]
Direction prediction: always taken
Accuracy: ~73 %
How many times will branch A mispredict?
How many times will branch B mispredict?
Using history
• 2-bit history (direction predictor)
[Figure: Branch History Table indexed by branchPC; each entry is a 2-bit state: SN, NT, T, or ST]
How big is the BHT?
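The 2-bit scheme replaces each bit with a saturating counter. This sketch encodes the four states as 0 (SN), 1 (NT), 2 (T), 3 (ST) and assumes every entry starts at NT; the encoding, initial state, table size, and loop trace are assumptions for illustration.

```python
SN, NT, T, ST = 0, 1, 2, 3  # strongly not-taken .. strongly taken


class TwoBitPredictor:
    """Branch History Table of 2-bit saturating counters."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.table = [NT] * num_entries

    def predict(self, pc):
        return self.table[pc % self.num_entries] >= T

    def update(self, pc, taken):
        index = pc % self.num_entries
        if taken:
            self.table[index] = min(self.table[index] + 1, ST)
        else:
            self.table[index] = max(self.table[index] - 1, SN)


def count_mispredicts(predictor, pc, outcomes):
    """Predict, compare, then train on each outcome of a single branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken 9 times, then exits; the loop runs 3 times.
loop_trace = ([True] * 9 + [False]) * 3
two_bit_misses = count_mispredicts(TwoBitPredictor(), 0x40, loop_trace)
```

On this trace the counter mispredicts 4 times: one warm-up miss plus one miss per loop exit. A single not-taken outcome only moves ST to T, so the next loop run starts with a correct prediction.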
Example: gzip
[Figure: gzip, total branch executions vs. % taken (per branch), with branches A and B marked]
Direction prediction: always taken
Accuracy: ~73 %
How many times will branch A mispredict?
How many times will branch B mispredict?
Using History Patterns
~80 percent of branches are either heavily TAKEN or heavily NOT-TAKEN
For the other 20%, we need to look at patterns of reference to see if they are predictable using a more complex predictor
Example: gcc has a branch that flips each time
T(1) NT(0)
10101010101010101010101010101010101010
Local history
[Figure: branchPC indexes the Branch History Table; the selected history (10101010) indexes the Pattern History Table, whose entry predicts NT or T]
What is the prediction for this BHT 10101010?
When do I update the tables?
Local history
[Figure: branchPC indexes the Branch History Table; the selected history is now 01010101 and indexes a different Pattern History Table entry (NT or T)]
On the next execution of this branch instruction, the branch history table is 01010101, pointing to a different pattern
What is the accuracy of a flip/flop branch 0101010101010…?
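The two-level local scheme can be simulated to answer the flip/flop question. This sketch assumes 8 bits of per-branch history, a Pattern History Table of 2-bit counters that start at weakly not-taken, and that both tables are updated right after each prediction is checked; all names and sizes are hypothetical.

```python
class LocalHistoryPredictor:
    """Two-level predictor: per-branch history selects a pattern counter."""

    def __init__(self, history_bits=8, bht_entries=256):
        self.history_mask = (1 << history_bits) - 1
        self.bht_entries = bht_entries
        self.bht = [0] * bht_entries            # per-branch history shift registers
        self.pht = [1] * (1 << history_bits)    # 2-bit counters, 1 = weakly NT

    def predict(self, pc):
        history = self.bht[pc % self.bht_entries]
        return self.pht[history] >= 2

    def update(self, pc, taken):
        index = pc % self.bht_entries
        history = self.bht[index]
        # Train the counter selected by the history seen at prediction time.
        if taken:
            self.pht[history] = min(self.pht[history] + 1, 3)
        else:
            self.pht[history] = max(self.pht[history] - 1, 0)
        # Shift the new outcome into this branch's history.
        self.bht[index] = ((history << 1) | int(taken)) & self.history_mask


def run(predictor, pc, outcomes):
    """Return a list of booleans, True where the prediction was correct."""
    hits = []
    for taken in outcomes:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


flip_flop = [True, False] * 100     # T, NT, T, NT, ...
hits = run(LocalHistoryPredictor(), 0x80, flip_flop)
warmup_misses = hits.count(False)
```

Under these assumptions the only mispredictions happen while the history register fills; after that the histories 10101010 and 01010101 each select their own counter, and the flip/flop branch is predicted correctly every time.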
Global history
[Figure: the Branch History Register (01110101) indexes the Pattern History Table]
for (i=0; i<100; i++)
  for (j=0; j<3; j++)
j<3    j=1   1101 -> taken
j<3    j=2   1011 -> taken
j<3    j=3   0111 -> not taken
i<100        1110 -> usually taken
if (aa == 2)
  aa = 0;
if (bb == 2)
  bb = 0;
if (aa != bb) {…
How can branches interfere with each other?
Gshare predictor
[Figure: branchPC is xor-ed with the Branch History Register (01110101) to index the Pattern History Table]
Must read!
Ref: Combining Branch Predictors
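A gshare index is just the branch PC xor-ed with the global history. The sketch below is one possible reading of the diagram, assuming an 8-bit history register, a table of 2-bit counters starting at weakly not-taken, and immediate updates; the PCs and the nested-loop trace (a 3-iteration inner loop inside a 100-iteration outer loop) are hypothetical.

```python
class GsharePredictor:
    """Global history xor-ed with the branch PC indexes 2-bit counters."""

    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.history = 0                       # Branch History Register
        self.pht = [1] * (1 << index_bits)     # Pattern History Table, 1 = weakly NT

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run_trace(predictor, trace):
    """Return a list of booleans, True where the prediction was correct."""
    hits = []
    for pc, taken in trace:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


INNER_PC, OUTER_PC = 0x10, 0x14    # hypothetical branch addresses
trace = []
for i in range(100):
    # Inner loop branch: taken, taken, not taken.
    trace += [(INNER_PC, True), (INNER_PC, True), (INNER_PC, False)]
    # Outer loop branch: taken until the final exit.
    trace.append((OUTER_PC, i < 99))

hits = run_trace(GsharePredictor(), trace)
late_misses = hits[200:].count(False)
```

Once warmed up, each of the four history patterns lands on its own counter, so the inner-loop exit is predicted correctly and the only late misprediction is the final exit of the outer loop.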
Bimod predictor
[Figure: branchPC xor global history reg indexes two PHTs, one skewed taken and one skewed not-taken; a choice predictor controls the mux that selects between them]
Hybrid predictors
[Figure: a local predictor (e.g. 2-bit) produces Prediction 1 and a global/gshare predictor (much more state) produces Prediction 2; a selection table (2-bit state machine) picks the final Prediction]
How do you select which predictor to use?
How do you update the various predictor/selector?
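One common answer to both questions is sketched below. It assumes a selection table of 2-bit counters indexed by the branch PC, that both component predictors are always trained, and that the selector moves only when the two components disagree; these update rules, the table sizes, and all names are assumptions rather than the design in the figure.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter toward 3 (up) or 0 (down)."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class HybridPredictor:
    """Selector chooses between a per-branch 2-bit predictor and gshare."""

    def __init__(self, index_bits=8):
        size = 1 << index_bits
        self.mask = size - 1
        self.local = [1] * size      # predictor 1: per-branch 2-bit counters
        self.gshare = [1] * size     # predictor 2: indexed by PC xor global history
        self.selector = [1] * size   # 0-1: use predictor 1, 2-3: use predictor 2
        self.history = 0

    def _predictions(self, pc):
        p1 = self.local[pc & self.mask] >= 2
        p2 = self.gshare[(pc ^ self.history) & self.mask] >= 2
        return p1, p2

    def predict(self, pc):
        p1, p2 = self._predictions(pc)
        return p2 if self.selector[pc & self.mask] >= 2 else p1

    def update(self, pc, taken):
        p1, p2 = self._predictions(pc)
        i = pc & self.mask
        if p1 != p2:
            # Exactly one component was right: move the selector toward it.
            self.selector[i] = bump(self.selector[i], p2 == taken)
        # Both components are trained on every branch.
        self.local[i] = bump(self.local[i], taken)
        g = (pc ^ self.history) & self.mask
        self.gshare[g] = bump(self.gshare[g], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


predictor = HybridPredictor()
hits = []
for taken in [True, False] * 100:          # a branch that flips every time
    hits.append(predictor.predict(0x20) == taken)
    predictor.update(0x20, taken)
```

On the flip/flop branch the 2-bit component never settles, the gshare component learns the pattern, and the selector drifts toward gshare, so the late predictions are all correct.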
Overriding Predictors
• Big predictors are slow, but more accurate
• Use a single cycle predictor in fetch
• Start the multi-cycle predictor
– When it completes, compare it to the fast prediction.
• If same, do nothing
• If different, assume the slow predictor is right and flush the pipeline.
• Advantage: reduced branch penalty for those branches mispredicted by the fast predictor and correctly predicted by the slow predictor
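The trade-off can be written as a small cost function. This is only a sketch: the cycle counts are made-up parameters, and the model assumes an override costs a short partial flush while a wrong final (slow) prediction costs the full misprediction penalty.

```python
def branch_penalty(fast_pred, slow_pred, actual, override_cycles=2, full_cycles=10):
    """Cycles lost on one branch under an overriding predictor (hypothetical model)."""
    penalty = 0
    if fast_pred != slow_pred:
        # The slow predictor overrides: flush the few instructions fetched
        # down the fast predictor's path.
        penalty += override_cycles
    if slow_pred != actual:
        # The final prediction was wrong: pay the full misprediction penalty.
        penalty += full_cycles
    return penalty


both_right = branch_penalty(True, True, True)
fast_wrong_slow_right = branch_penalty(False, True, True)
both_wrong = branch_penalty(True, True, False)
fast_right_slow_wrong = branch_penalty(True, False, True)
```

The benefit shows up in the second case, where a short override flush replaces a full misprediction; the last case is the price, where a correct fast prediction is overridden and then mispredicted.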
Pipelined Gshare Predictor
• How can we get a pipelined global
prediction by stage 1?
– Start in stage –2
– Don’t have the most recent branch history…
• Access multiple entries
– E.g. if we are missing the last three branch outcomes, read the entries for all 8 possible histories and pick between them during the fetch stage.
Ref: Reconsidering Complex Branch Predictors
Exceptions
• Exceptions are events that are difficult or
impossible to manage in hardware alone.
• Exceptions are usually handled by jumping
into a service (software) routine.
• Examples: I/O device request, page fault,
divide by zero, memory protection
violation (seg fault), hardware failure, etc.
Taking an Exception
• Once an exception occurs, how does the processor proceed?
– Non-pipelined: don’t fetch from PC; save state; fetch from interrupt vector table
– Pipelined: depends on the exception
• Precise Interrupt: Must stop all instructions “after the exception” (squash)
– Divide by zero: flush fetch/decode
– Page fault: (fetch or mem stage?)
• Save state after last instruction before exception completes (PC, regs)
• Fetch from interrupt vector table
How Much ILP is There?
ALU Operation GOOD, Branch BAD
Expected Number of Branches Between Mispredicts
E(X) ~ 1/(1-p)
E.g., p = 95%, E(X) ~ 20 brs, 100-ish insts
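The estimate above is easy to tabulate for other accuracies. This sketch treats each branch as an independent trial with accuracy p, and assumes roughly 5 instructions per branch, which is what "20 brs, 100-ish insts" implies; the function names and that default are assumptions.

```python
def expected_branches_between_mispredicts(accuracy):
    """E(X) ~ 1 / (1 - p) for a predictor with per-branch accuracy p."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return 1.0 / (1.0 - accuracy)


def expected_instructions_between_mispredicts(accuracy, insts_per_branch=5):
    """Scale by an assumed average number of instructions per branch."""
    return expected_branches_between_mispredicts(accuracy) * insts_per_branch


for p in (0.90, 0.95, 0.99):
    branches = expected_branches_between_mispredicts(p)
    insts = expected_instructions_between_mispredicts(p)
    print(f"p = {p:.2f}: ~{branches:.0f} branches, ~{insts:.0f} instructions")
```

Because the distance grows as 1/(1-p), halving the misprediction rate doubles the number of instructions between mispredictions.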
How Accurate are Branch Predictors?