Branches Daniel Ángel Jiménez Departments of Computer Science UT San Antonio & Rutgers

About Me

Born in Fort Hood, Texas in 1969 (~80 miles north on IH-35)
 Dad from Mexico, Mom from Texas
 Lived in Temple, Texas
Moved to San Antonio, Texas in 1973 (~80 miles south on IH-35)
 B.S. at UTSA, 1992
 M.S. at UTSA, 1994
Moved to San Marcos, Texas in 1995 (~30 miles south on IH-35)
Moved back to San Antonio in 1996
 Non-tenure-track faculty, UTHSCSA
Moved to Austin in 1999
 Started Ph.D. program at UT Austin
 Ph.D. UT Austin, 2002
Moved to New Jersey in 2002, New York in 2003
 Asst. Professor, Rutgers
 Sabbatical in Barcelona, Spain in 2005
Back to San Antonio in 2007
 Associate Professor, UTSA
 Mostly for the breakfast tacos
More about me

Always liked computer programming
 First computer was Tandy Color Computer in 1984
Fortunate sequence of mentors guided me into my career
 Mom – Education is important (didn’t believe her at the time)
 Neal Wagner – theory is exciting
 Hugh Maynard – math is my friend
 Betty Travis – Research Careers for Minority Scholars
 Calvin Lin – perfect fit Ph.D. advisor
 Uli Kremer – welcomed me into being a professor
Like taekwondo, piano, traveling, Spanish music
 Current favorite band – Ojos de Brujo
This Talk

How an instruction is processed – pipelining
Kinds of branches
Branch prediction
 Accuracy
 Technique
Empirical properties of branches
How to handle branches
Conclusion
How an Instruction is Processed
Processing can be divided into five stages:
Instruction fetch
Instruction decode
Execute
Memory access
Write back
Instruction-Level Parallelism
To speed up the process, pipelining overlaps execution of multiple instructions, exploiting parallelism between instructions.
[Figure: pipeline diagram; rows are Instruction fetch, Instruction decode, Execute, Memory access, Write back]
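The overlap can be sketched with a small cycle count. This is a minimal model, assuming every stage takes one cycle and no instruction ever stalls; the function names and the one-letter stage codes (F, D, E, M, W) are made up for illustration.

```python
# Idealized five-stage pipeline: one cycle per stage, no hazards (assumption).
STAGE_CODES = "FDEMW"  # fetch, decode, execute, memory access, write back


def unpipelined_cycles(n_instructions, n_stages=len(STAGE_CODES)):
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGE_CODES)):
    """A new instruction enters every cycle; the last one still needs
    n_stages cycles to drain through the pipeline."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def pipeline_diagram(n_instructions):
    """One text row per instruction; column c is cycle c, '.' means idle."""
    rows = []
    for i in range(n_instructions):
        rows.append("." * i + STAGE_CODES + "." * (n_instructions - 1 - i))
    return rows


if __name__ == "__main__":
    print(unpipelined_cycles(100), pipelined_cycles(100))
    for row in pipeline_diagram(4):
        print(row)
```

With 100 instructions the model gives 500 cycles without overlap and 104 with it, so throughput approaches one instruction per cycle once the pipeline is full.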
Control Hazards: Branches
Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
[Figure: pipeline diagram with the branch instruction highlighted]
Pipelining with Branches
Branches cause bubbles in the pipeline, where some stages are left idle.
[Figure: pipeline diagram with idle stages behind an unresolved branch instruction]
Branch Prediction
A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
[Figure: pipeline diagram showing speculative execution]
Branch predictors must be highly accurate to avoid mispredictions!
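A rough cost comparison of stalling on every branch versus predicting it. This is a sketch under stated assumptions: a five-stage pipeline, a fixed `resolve_delay` in cycles between fetching a branch and knowing its outcome, and a mispredicted branch costing exactly that delay. All names and numbers here are illustrative, not measurements.

```python
def ideal_cycles(n_instructions, n_stages=5):
    """Cycles for a hazard-free pipeline (n_instructions >= 1 assumed)."""
    return n_stages + (n_instructions - 1)


def cycles_with_stalls(n_instructions, n_branches, resolve_delay, n_stages=5):
    """No prediction: fetch sits idle for resolve_delay cycles after
    every branch, so every branch inserts that many bubbles."""
    return ideal_cycles(n_instructions, n_stages) + n_branches * resolve_delay


def cycles_with_prediction(n_instructions, n_branches, resolve_delay,
                           accuracy, n_stages=5):
    """With prediction: only mispredicted branches pay the delay, because
    the wrong-path instructions fetched in the meantime are discarded."""
    mispredicted = n_branches * (1.0 - accuracy)
    return ideal_cycles(n_instructions, n_stages) + mispredicted * resolve_delay


if __name__ == "__main__":
    # Hypothetical workload: 1000 instructions, 200 of them branches.
    print(cycles_with_stalls(1000, 200, resolve_delay=2))
    print(cycles_with_prediction(1000, 200, resolve_delay=2, accuracy=0.95))
```

In this model a perfect predictor recovers the ideal cycle count, and a predictor with 0% accuracy is no better than stalling.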
Kinds of Branches

Conditional
 Very common, 1/4 to 1/10 of instructions
 Must be predicted, can be hard to predict
 Loop back edges with short fixed trip counts can be predicted perfectly
Unconditional
 Targets still have to be predicted with the branch target buffer (BTB)
Indirect
 E.g. jumping through a table of addresses
 Can be predicted, often just use the BTB as predictor
Returns
 Predicted with a return address stack (RAS)
 >99% accuracy possible if you avoid deep recursion
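Why deep recursion hurts return prediction can be sketched with a toy return address stack. The class and function names are hypothetical, and the overflow policy (drop the oldest entry) is an assumption; real hardware may instead wrap around and return stale entries.

```python
class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses (toy model)."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def push(self, return_address):
        # On overflow the oldest entry is lost (assumed policy).
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_address)

    def pop(self):
        # An empty stack has no prediction to offer.
        return self.entries.pop() if self.entries else None


def return_accuracy(recursion_depth, ras_depth):
    """Make recursion_depth nested calls, then unwind them all, and
    report the fraction of returns the RAS predicted correctly."""
    ras = ReturnAddressStack(ras_depth)
    actual = []
    for level in range(recursion_depth):
        address = 0x1000 + 4 * level  # made-up return addresses
        actual.append(address)
        ras.push(address)
    correct = 0
    for _ in range(recursion_depth):
        if ras.pop() == actual.pop():
            correct += 1
    return correct / recursion_depth


if __name__ == "__main__":
    print(return_accuracy(recursion_depth=8, ras_depth=16))
    print(return_accuracy(recursion_depth=64, ras_depth=16))
```

As long as the call depth fits in the stack every return is predicted; once recursion goes deeper than the stack, only the innermost returns are.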
Branch Predictor Accuracy is Critical



The cost of a misprediction is proportional to pipeline depth
Predictor accuracy is more important for deeper pipelines
Need a good branch predictor to feed the core with right-path instructions
Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage
Decreasing misprediction rate from 9% to 4% results in 31% speedup for a 32-stage pipeline
Today’s pipelines have been scaled back, but only temporarily…
[Figure: simulations with SimpleScalar/Alpha]
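How accuracy interacts with pipeline depth can be sketched with a back-of-the-envelope CPI model. This is not the SimpleScalar simulation and does not reproduce the 31% figure; the base CPI of 1, the branch fraction of 0.2, and a misprediction penalty equal to the pipeline depth are all assumptions chosen for illustration.

```python
def cpi(mispredict_rate, pipeline_depth, branch_fraction=0.2, base_cpi=1.0):
    """Cycles per instruction when each mispredicted branch costs
    pipeline_depth cycles (assumed proportionality constant of 1)."""
    return base_cpi + branch_fraction * mispredict_rate * pipeline_depth


def speedup(old_rate, new_rate, pipeline_depth, **model):
    """Speedup from lowering the misprediction rate at a fixed depth."""
    return cpi(old_rate, pipeline_depth, **model) / cpi(new_rate, pipeline_depth, **model)


if __name__ == "__main__":
    for depth in (8, 16, 32):
        print(depth, round(speedup(0.09, 0.04, depth), 3))
```

Even in this crude model, the same drop in misprediction rate buys more speedup as the pipeline gets deeper.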
Conditional Branch Prediction

Most predictors are based on 2-level adaptive branch prediction
GAs – a common type of predictor [Yeh & Patt ’91]
Branch outcomes are shifted into a history register, 1 for taken, 0 for not taken
History bits and address bits combine to index a pattern history table (PHT) of 2-bit saturating counters
Prediction is high bit of counter
Counter is incremented if branch is taken, decremented if branch is not taken
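The mechanism can be sketched in a few lines. This is a minimal model, not a description of any real processor: the table sizes, the initial counter value, dropping the low two address bits, and concatenating address bits with history bits to form the index are all assumptions.

```python
class GAsPredictor:
    """Toy GAs-style predictor: one global history register plus a
    pattern history table (PHT) of 2-bit saturating counters."""

    def __init__(self, history_bits=8, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0
        # Counters range 0..3 and start weakly not taken (assumption).
        self.pht = [1] * (1 << (history_bits + address_bits))

    def _index(self, pc):
        # Low-order address bits select a set, history selects within it.
        address = (pc >> 2) & ((1 << self.address_bits) - 1)
        return (address << self.history_bits) | self.history

    def predict(self, pc):
        # The high bit of a 2-bit counter is set for values 2 and 3.
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the history: 1 for taken, 0 for not taken.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run(predictor, pc, outcomes):
    """Feed one branch's outcomes through the predictor; return a list of
    booleans saying whether each prediction was correct."""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


if __name__ == "__main__":
    alternating = [i % 2 == 0 for i in range(200)]  # T, N, T, N, ...
    results = run(GAsPredictor(history_bits=4), 0x400, alternating)
    print("mispredictions:", results.count(False))
```

An alternating branch defeats a lone 2-bit counter but is learned quickly here, because the history tells the two cases apart.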
Characteristics of Branch Behavior

Branches tend to be highly biased
 53% are strongly biased, taken at least 98% or at most 2% of the time
 Remaining branches also exhibit weak biases
 A few branches show no bias
Branch outcomes are highly correlated with past branch history
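Bias like this can be measured from a branch trace. The sketch below assumes a trace of (address, taken) pairs and counts each static branch once; whether the 53% figure counts static or dynamic branches is not stated here, so that choice is an assumption, as are the function names.

```python
from collections import defaultdict


def taken_rates(trace):
    """Map each branch address to the fraction of times it was taken."""
    counts = defaultdict(lambda: [0, 0])  # address -> [taken, total]
    for pc, taken in trace:
        counts[pc][0] += int(taken)
        counts[pc][1] += 1
    return {pc: taken / total for pc, (taken, total) in counts.items()}


def strongly_biased_fraction(trace, threshold=0.98):
    """Fraction of static branches taken at least `threshold` of the
    time or at most 1 - threshold of the time."""
    rates = taken_rates(trace)
    if not rates:
        return 0.0
    strong = sum(1 for rate in rates.values()
                 if rate >= threshold or rate <= 1.0 - threshold)
    return strong / len(rates)


if __name__ == "__main__":
    # Synthetic trace: four branches with different taken rates.
    trace = []
    for i in range(100):
        trace.append((0x100, True))        # always taken
        trace.append((0x104, False))       # never taken
        trace.append((0x108, i % 2 == 0))  # taken half the time
        trace.append((0x10C, i % 10 != 0)) # taken 90% of the time
    print(strongly_biased_fraction(trace))
```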
Important Facts about Branches

A taken branch is (often) more costly than an untaken branch
 Trace caches can mitigate this
Mispredicted branches are very costly
 Some mispredictions are more costly than others – how to exploit that?
Be aware of your machine’s indirect branch predictor
 What’s the best way to compile dense switch/case statements?
 What to do about virtual dispatch?
Some ISAs have hint bits
 These can help a lot if set correctly
 But only if the microarchitecture uses them
What to do about mispredictions?



Capacity/Conflict
 Too many program paths, collisions in tables
 Solutions: use the hint bits or align branches
 Unfortunately branch predictors are secret so options are limited
Branches not correlated with recent history
 Split loops so trip counts are within history length
Data-dependent branches with unfriendly distributions
 Predicate if possible
Profile
 Performance counters + tools such as VTune or OProfile
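Why trip counts should fit within the history length can be sketched with a toy predictor. The assumptions are loud ones: a single loop back-edge branch, a table of 2-bit counters indexed by global history only, counters starting weakly not taken, and an 8-bit history. Real predictors are undisclosed and will differ.

```python
def mispredictions_per_run(trip_count, history_bits, repetitions=50):
    """Run a loop of trip_count iterations `repetitions` times and return
    the number of mispredicted back-edge branches in each run."""
    mask = (1 << history_bits) - 1
    pht = [1] * (1 << history_bits)  # 2-bit counters, weakly not taken
    history = 0
    per_run = []
    for _ in range(repetitions):
        wrong = 0
        for i in range(trip_count):
            taken = i < trip_count - 1  # back edge: taken until loop exit
            if (pht[history] >= 2) != taken:
                wrong += 1
            if taken:
                pht[history] = min(3, pht[history] + 1)
            else:
                pht[history] = max(0, pht[history] - 1)
            history = ((history << 1) | int(taken)) & mask
        per_run.append(wrong)
    return per_run


if __name__ == "__main__":
    print("trip count 32:", mispredictions_per_run(32, history_bits=8)[-5:])
    print("trip count 8: ", mispredictions_per_run(8, history_bits=8)[-5:])
```

With a trip count of 32 the history before the exit is all ones, the same as before many taken iterations, so the exit keeps being mispredicted. With a trip count of 8 every position in the loop has its own history pattern and the predictor settles to no mispredictions. Splitting a long loop into pieces produces outcome streams of the second kind, assuming the pieces see the same history.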
Conclusion

Branches can have variable costs due primarily to prediction

Be aware of the implementation of branches

Profiling and ISA support for branches

Different causes and effects of mispredictions

Impact of mispredictions has crept up in recent years
The End
http://www.cs.utsa.edu/~dj
Related Compiler Work

Profile-guided code placement to improve instruction locality
 Program restructuring for virtual memory [Hatfield & Gerald ’71]
 Reducing conflict misses in direct-mapped I$ [McFarling ’88, ’89]
 Procedure placement [Pettis & Hansen ’90], [Gloy & Smith ’99]
Transformations for reducing branch costs
 Branch alignment [Calder & Grunwald ’94], [Young et al. ’97]
 Software trace cache [Ramirez et al. ’99]
Transformations for improving predictor accuracy
 Static correlated branch prediction [Young & Smith ’99]
 Address adjustment [Chen & King ’99]
 Reverse-engineering branch predictors [Milenkovic et al. ’04]
 PHT partitioning [Jiménez ’05]