CS 201 Compiler Construction
Lecture 13
Instruction Scheduling: Trace Scheduler

Instruction Scheduling
Modern processors can exploit Instruction Level Parallelism (ILP) by simultaneously executing multiple instructions. Instruction scheduling influences the effectiveness with which ILP is exploited.
• Pipelined processors (e.g., ARM): reordering of instructions avoids delays due to hazards.
• EPIC/VLIW processors (e.g., Itanium): a single long instruction is packed with multiple operations (conventional instructions) that can be executed simultaneously.

Compiler Support
Analyze dependences and rearrange the order of instructions, i.e. perform instruction scheduling.
• Pipelined: a limited amount of ILP is required; it can be uncovered by reordering instructions within each basic block.
• EPIC/VLIW: much more ILP is required; it can only be uncovered by examining code from multiple basic blocks.

Compiler Support
Two techniques go beyond basic block boundaries to uncover ILP:
• Trace Scheduling (acyclic scheduler): examines a trace, a sequence of basic blocks along an acyclic program path; instruction scheduling can result in movement of instructions across basic block boundaries.
• Software Pipelining (cyclic scheduler): examines basic blocks corresponding to consecutive loop iterations; instruction scheduling can result in movement of instructions across loop iterations.

Trace Scheduling
A trace is a sequence of basic blocks that does not extend across loop boundaries.
• Select a trace.
• Determine the instruction schedule for the trace.
• Introduce compensation code to preserve program semantics.
• Repeat the above steps while some part of the program is yet to be scheduled.

Trace Selection
Selection of traces is extremely important for overall performance: traces should represent paths that are executed frequently. A fast instruction schedule for one path is obtained at the expense of a slower schedule for the other path, due to speculative code motion.

Picking Traces
o: an operation/instruction.
Count(o): the number of times o is expected to be executed during an entire program run.
Prob(e): the probability that edge e will be taken; important for conditional branches.
Count(e) = Count(branch) x Prob(e)
Counts are estimated using profiling: measure counts by running the program on a representative input.

Algorithm for Trace Construction
1. Pick an operation with the largest execution count as the seed of the trace.
2. Grow the trace backward from the seed.
3. Grow the trace forward from the seed.
Given that p is in the trace and e is the edge from p to successor s, include s in the trace iff:
1. Of all edges leaving p, e has the largest execution count.
2. Of all edges entering s, e has the highest execution count.
The same approach is taken to grow the trace backward. (A code sketch of this procedure follows the loop-boundary rules below.)

Algorithm Contd.
The trace stops growing forward when Count(e1) < Count(e2), i.e. when some other edge e2 entering the chosen successor has a higher count than the chosen edge e1.
Premature termination of the trace can occur in the above algorithm. To prevent this, a slight modification is required.

Algorithm Contd.
Suppose A-B-C-D has been included in the current trace.
Count(D-E) > Count(D-F) => add E.
Count(C-E) > Count(D-E) => do not add E.
Premature termination occurs because a trace that includes C-E can no longer be formed, since C is already in the current trace.
Modification: consider only edges P-E such that P is not already part of the current trace.

Algorithm Contd.
A trace cannot cross loop boundaries: if the edge encountered is a loop back edge, or if the edge enters a loop, then stop growing the trace. Blocks 1 and 2 cannot be placed in the same trace because the edge directly connecting them is a loop back edge and the edges indirectly connecting them cross loop boundaries.
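To make the trace-growing rules above concrete, here is a minimal Python sketch (not from the lecture). It assumes a hypothetical CFG representation: each block carries a profiled execution count and succ_edges/pred_edges lists, each edge carries count, src, dst, is_back_edge, and enters_loop attributes, and placed is the set of blocks already assigned to earlier traces. Backward growth is only indicated by a comment.

def heaviest(edges):
    """Edge with the largest execution count, or None if there are none."""
    return max(edges, default=None, key=lambda e: e.count)

def grow_forward(trace, placed):
    """Extend the trace forward from its last block using the
    mutual-most-likely heuristic, stopping at loop boundaries."""
    b = trace[-1]
    while True:
        e = heaviest(b.succ_edges)          # heaviest edge leaving b
        if (e is None or e.is_back_edge or e.enters_loop
                or e.dst in trace or e.dst in placed):
            return                          # would cross a loop or reuse a block
        s = e.dst
        # e must also be the heaviest edge entering s. Per the modification on
        # the slides, competing edges whose source already sits in this trace
        # (or an earlier one) are ignored to avoid premature termination.
        competitors = [f for f in s.pred_edges
                       if f is not e and f.src not in trace and f.src not in placed]
        if any(f.count > e.count for f in competitors):
            return
        trace.append(s)
        b = s

def select_trace(blocks, placed):
    """Seed with the hottest block not yet placed in any trace, then grow.
    Backward growth is symmetric (heaviest edge entering trace[0])."""
    seed = max((b for b in blocks if b not in placed), key=lambda b: b.count)
    trace = [seed]
    grow_forward(trace, placed)
    placed.update(trace)
    return trace

Calling select_trace repeatedly until every block has been placed corresponds to the "repeat the above steps" item on the Trace Scheduling slide above.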
1 & 2 cannot be placed in the same trace because the edge directly connecting them is a loop back edge and edges indirectly connecting them cross loop boundaries. 11 Instruction Scheduling Construct a DAG for the selected trace. Generate an instruction schedule using a scheduling heuristic: list scheduling with critical path first. Following generation of the instruction schedule introduction of compensation code may be required to preserve program semantics. 12 Compensation Code Consider movement of instructions across basic block boundaries, i.e. past splits and merges in the control flow graph. 1. Movement of a statement past/below a Split: 13 Compensation Code Contd.. 2. Movement of a statement above a Join: 14 Compensation Code Contd.. 3. Movement of a statement above a Split: No compensation code introduced – speculation. Note that i<-i+2 can be moved above spilt if i is dead along the off-trace path. 15 Compensation Code Contd.. 4. Movement of a statement below a Join: This case will not arise assuming dead code has been removed. 16 Compensation Code Contd.. 5. Movement of a branch across a split. 17 Compensation Code Contd.. 6. Movement of a branch above a join. 18 Compensation Code Contd.. 6. Movement of a branch above a join. 19 Compensation Code Contd.. 7. Packing multiple branches in a long instruction. 20 Code Explosion 21 Code Explosion Contd.. 22 Building a DAG for Scheduling DAG contains the following edges: 1.Write-After-Read data dependence 2.Write-After-Write data dependence 3.Read-After-Write data dependence 4.Conditional jumps: introduce write-afterconditional-read edge between IF e & x=c+d to prevent movement of x=c+d above IF e. 23 Building a DAG Contd.. 5. Condition jumps: – – Introduce off-live edge between x=a+b nd IF e. This edge does not constrain movement past IF e; it indicates that if x=a+b is moved past IF e then it can be eliminated from the trace but a copy must be placed along the off-trace path. 24