Branches

Daniel Ángel Jiménez
Departments of Computer Science
UT San Antonio & Rutgers

About Me
  Born in Fort Hood, Texas in 1969 (~80 miles north on IH-35)
    Dad from Mexico, Mom from Texas
    Lived in Temple, Texas
  Moved to San Antonio, Texas in 1973 (~80 miles south on IH-35)
    B.S. at UTSA, 1992
    M.S. at UTSA, 1994
  Moved to San Marcos, Texas in 1995 (~30 miles south on IH-35)
  Moved back to San Antonio in 1996
    Non-tenure-track faculty, UTHSCSA
  Moved to Austin in 1999
    Started Ph.D. program at UT Austin
    Ph.D. UT Austin, 2002
  Moved to New Jersey in 2002, New York 2003
    Asst. Professor, Rutgers
    Sabbatical in Barcelona, Spain in 2005
  Back to San Antonio in 2007
    Associate Professor, UTSA
    Mostly for the breakfast tacos

More about me
  Always liked computer programming
    First computer was Tandy Color Computer in 1984
  Fortunate sequence of mentors guided me into my career
    Mom – Education is important (didn’t believe her at the time)
    Neal Wagner – theory is exciting
    Hugh Maynard – math is my friend
    Betty Travis – Research Careers for Minority Scholars
    Calvin Lin – perfect fit Ph.D. advisor
    Uli Kremer – welcomed me into being a professor
  Like taekwondo, piano, traveling, Spanish music
    Current favorite band – Ojos de Brujo

This Talk
  How an instruction is processed – pipelining
  Kinds of branches
  Branch prediction
    Accuracy
    Technique
  Empirical properties of branches
  How to handle branches
  Conclusion

How an Instruction is Processed
  Processing can be divided into five stages:
    Instruction fetch
    Instruction decode
    Execute
    Memory access
    Write back

Instruction-Level Parallelism
  To speed up the process, pipelining overlaps execution of multiple instructions, exploiting parallelism between instructions.
  [Figure: the five pipeline stages (instruction fetch, instruction decode, execute, memory access, write back)]

Control Hazards: Branches
  Conditional branches create a problem for pipelining: the next instruction can't be fetched until the branch has executed, several stages later.
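
As a rough illustration of why overlapping helps and why an unresolved branch hurts, the sketch below counts cycles for an idealized five-stage pipeline. The function names, the one-cycle-per-stage timing, and the choice of 2 bubble cycles per branch (fetch waits until the branch leaves the execute stage) are assumptions made for this example, not figures from the talk.

```python
# Idealized cycle counts for a five-stage pipeline (illustrative assumptions only).
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Without pipelining, each instruction takes n_stages cycles by itself."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_branches=0, bubbles_per_branch=0,
                     n_stages=len(STAGES)):
    """Ideal pipeline: fill once, then complete one instruction per cycle.

    Each unresolved branch is assumed to add `bubbles_per_branch` idle cycles
    while fetch waits for the branch outcome.
    """
    if n_instructions == 0:
        return 0
    fill = n_stages - 1  # cycles before the first instruction completes
    return fill + n_instructions + n_branches * bubbles_per_branch


if __name__ == "__main__":
    n = 1000
    print("unpipelined:", unpipelined_cycles(n))
    print("pipelined, no branches:", pipelined_cycles(n))
    # Assumption: 1 in 5 instructions is a branch, each costing 2 bubbles.
    print("pipelined, stalling on branches:",
          pipelined_cycles(n, n_branches=200, bubbles_per_branch=2))
```

Under these assumptions the stall term grows with both the branch count and the number of stages between fetch and branch resolution, which is the cost a predictor tries to hide.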
  [Figure label: Branch instruction]

Pipelining with Branches
  Branches cause bubbles in the pipeline, where some stages are left idle.
  [Figure: the five pipeline stages (instruction fetch, instruction decode, execute, memory access, write back) with an unresolved branch instruction]

Branch Prediction
  A branch predictor allows the processor to speculatively fetch and execute instructions down the predicted path.
  Branch predictors must be highly accurate to avoid mispredictions!
  [Figure: the five pipeline stages with speculative execution]

Kinds of Branches
  Conditional
    Very common, 1/4 to 1/10 of instructions
    Must be predicted, can be hard to predict
    Loop back edges with short fixed trip counts can be predicted perfectly
  Unconditional
    Targets still have to be predicted with BTB
  Indirect
    E.g. jumping through a table of addresses
    Can be predicted, often just use BTB as predictor
  Returns
    Predicted with RAS
    >99% possible if you avoid deep recursion

Branch Predictor Accuracy is Critical
  The cost of a misprediction is proportional to pipeline depth
  Predictor accuracy is more important for deeper pipelines
  Need good branch predictor to feed core with right-path instructions
  Deeper pipelines allow higher clock rates by decreasing the delay of each pipeline stage
  Decreasing misprediction rate from 9% to 4% results in 31% speedup for a 32-stage pipeline
  Today’s pipelines have been scaled back, but only temporarily…
  Simulations with SimpleScalar/Alpha

Conditional Branch Prediction
  Most predictors are based on 2-level adaptive branch prediction
  GAs – a common type of predictor [Yeh & Patt ’91]
  Branch outcomes are shifted into a history register, 1 for taken, 0 for not taken
  History bits and address bits combine to index a pattern history table (PHT) of 2-bit saturating counters
  Prediction is high bit of counter
  Counter is incremented if branch is taken, decremented if branch is not taken

Characteristics of Branch Behavior
  Branches tend to be highly biased
    53% are strongly biased, taken at least 98% or at most 2% of the time
    Remaining branches also exhibit weak biases
    A few branches show no bias
  Branch outcomes are highly correlated with past branch history

Important Facts about Branches
  A taken branch is (often) more costly than an untaken branch
  Mispredicted branches are very costly
  Trace caches can mitigate this
  Some mispredictions are more costly than others – how to exploit that?
  Be aware of your machine’s indirect branch predictor
    What’s the best way to compile dense switch/case statements?
    What to do about virtual dispatch?
  Some ISAs have hint bits
    These can help a lot if set correctly
    But only if the microarchitecture uses them

What to do about mispredictions?
  Capacity/Conflict
    Too many program paths, collisions in tables
    Solutions: use the hint bits or align branches
    Unfortunately branch predictors are secret so options are limited
  Branches not correlated with recent history
    Split loops so trip counts are within history length
  Data-dependent branches with unfriendly distributions
    Predicate if possible
  Profile
    Performance counters + tools such as VTune or OProfile

Conclusion
  Branches can have variable costs due primarily to prediction
  Be aware of the implementation of branches
  Profiling and ISA support for branches
  Different causes and effects of mispredictions
  Impact of mispredictions has crept up in recent years

The End
  http://www.cs.utsa.edu/~dj

Related Compiler Work
  Profile-guided code placement to improve instruction locality
    Program restructuring for virtual memory [Hatfield & Gerald ’71]
    Reducing conflict misses in direct-mapped I$ [McFarling ’88, ’89]
    Procedure placement [Pettis & Hansen ’90], [Gloy & Smith ’99]
  Transformations for reducing branch costs
    Branch alignment [Calder & Grunwald ’94], [Young et al. ’97]
    Software trace cache [Ramirez et al. ’99]
  Transformations for improving predictor accuracy
    Static correlated branch prediction [Young & Smith ’99]
    Address adjustment [Chen & King ’99]
    Reverse-engineering branch predictors [Milenkovic et al. ’04]
    PHT partitioning [Jiménez ’05]
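
The GAs scheme described on the Conditional Branch Prediction slide (a global history register, a PHT of 2-bit saturating counters, prediction from the counter's high bit) can be sketched roughly as follows. The class name, table sizes, the `pc >> 2` alignment shift, the concatenated index, and the initial counter value are all assumptions chosen for illustration; real predictors are undocumented and differ in detail.

```python
class GAsPredictor:
    """Toy GAs-style two-level predictor (illustrative sketch, not a real design)."""

    def __init__(self, history_bits=8, address_bits=4):
        self.history_bits = history_bits
        self.address_bits = address_bits
        self.history = 0  # global history register: 1 = taken, 0 = not taken
        # PHT of 2-bit counters (0..3); starting at 2, weakly taken, is an assumption.
        self.pht = [2] * (1 << (history_bits + address_bits))

    def _index(self, pc):
        # Assumption: drop 2 alignment bits, then concatenate address and history bits.
        addr = (pc >> 2) & ((1 << self.address_bits) - 1)
        return (addr << self.history_bits) | self.history

    def predict(self, pc):
        # The prediction is the high bit of the 2-bit counter: 2 or 3 means taken.
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)  # saturate at 3
        else:
            self.pht[i] = max(0, self.pht[i] - 1)  # saturate at 0
        # Shift the outcome into the history register, keeping history_bits bits.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, pc, outcomes, skip=0):
    """Run one branch's outcome stream; count misses after the first `skip` outcomes."""
    misses = 0
    for n, taken in enumerate(outcomes):
        if n >= skip and predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop back edge with a fixed trip count of 4: taken three times, then not taken.
    loop = [True, True, True, False] * 100
    print(count_mispredictions(GAsPredictor(), 0x400, loop, skip=200))
```

With a history longer than the trip count, the sketch stops mispredicting this pattern once the counters have warmed up, consistent with the claim that back edges with short fixed trip counts can be predicted perfectly.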