CS 201
Compiler Construction
Software Pipelining:
Circular Scheduling
Motivation
Trace scheduling uncovers ILP in acyclic segments of
code; another technique is needed to exploit ILP
across loop iterations.
1. Loop Unrolling: unrolling a loop converts ILP
across loop iterations into ILP within a single iteration,
which can then be exploited using trace scheduling.
* The drawback is growth in code size.
2. Software Pipelining:
* converts ILP across loop iterations into ILP
within a single iteration without significant growth in
code size.
Software Pipelining
2. Software Pipelining Contd:
* A single iteration of the transformed loop
contains a single occurrence of each instruction,
which is why code growth is smaller than with unrolling.
* Each iteration of the transformed loop brings together
instances of statements from different iterations of the
original loop.
Software Pipelining Contd..
Overlapped schedule of n iterations (one row per step):
Ld1
Ld2     Add1
Ld3     Add2     St1
.....
Ldn     Addn-1   Stn-2
        Addn     Stn-1
                 Stn

Prologue:          Ld1
                   Ld2     Add1
Loop i = 2,n-1:    Ldi+1   Addi    Sti-1
Epilogue:                  Addn    Stn-1
                                   Stn

Prologue + Epilogue = 2 iterations
Loop = n-2 iterations
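
The prologue / loop / epilogue split above can be checked with a small sketch. The function name `pipelined_schedule` and the row-per-step representation are assumptions made for illustration; only the Ld/Add/St pattern comes from the slide.

```python
def pipelined_schedule(n):
    """Rows of the software-pipelined Ld/Add/St schedule for n >= 2 iterations.

    Each row lists the operations issued together in one step; the number
    after each mnemonic is the original iteration it belongs to.
    """
    prologue = [["Ld1"], ["Ld2", "Add1"]]
    # Steady state: iteration i of the new loop mixes three original iterations.
    loop = [[f"Ld{i + 1}", f"Add{i}", f"St{i - 1}"] for i in range(2, n)]
    epilogue = [[f"Add{n}", f"St{n - 1}"], [f"St{n}"]]
    return prologue, loop, epilogue


if __name__ == "__main__":
    pro, loop, epi = pipelined_schedule(5)
    for row in pro + loop + epi:
        print("  ".join(row))
```

For n = 5 the loop part has n-2 = 3 rows, and every Ld, Add and St of every original iteration appears exactly once, with each Add one step after its Ld and each St one step after its Add.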
Circular Scheduling
An algorithm for software pipelining that is suitable
for scalar architectures:
– Limited amount of ILP can be exploited
– Limited number of registers are available
Assumption: register allocation has already been done.
Approach:
• Identify idle slots in the instruction schedule and
try to fill them by propagating instructions across
loop iterations
• Continue to do the above as long as the schedule
continues to improve
• If register allocation needs to be modified to allow
instruction motion, then do so.
Circular Scheduling Contd..
Construct a DAG for the loop body.
Moving an instruction from a later iteration to an
earlier iteration corresponds to moving the
instruction from the top of the DAG to the bottom
of the DAG.
An instruction moved from the top of the loop to
the bottom is called a circled instruction.
If each instruction can circle only once, then for an
original loop of N iterations: the circled instructions
form the prologue; the remaining instructions form
the epilogue; and the loop is executed N-1 times.
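
A minimal sketch of the single-circle case, assuming the circled instructions are the ones at the top of the body. The helper names `circle_once` and `original` are hypothetical; the sketch only lists which (instruction, iteration) instances run in which order.

```python
def original(body, n):
    """Instances executed by the original loop: every instruction, N times."""
    return [(op, k) for k in range(1, n + 1) for op in body]


def circle_once(body, circled, n):
    """Instances executed after circling: prologue, N-1 loop passes, epilogue."""
    head = [op for op in body if op in circled]       # circled instructions
    tail = [op for op in body if op not in circled]   # non-circled instructions
    trace = [(op, 1) for op in head]                  # prologue
    for k in range(1, n):                             # loop runs N-1 times
        trace += [(op, k) for op in tail]             # rest of iteration k
        trace += [(op, k + 1) for op in head]         # start of iteration k+1
    trace += [(op, n) for op in tail]                 # epilogue
    return trace


if __name__ == "__main__":
    body = ["I1", "I2", "I3", "I4"]
    print(circle_once(body, {"I1", "I2"}, 3) == original(body, 3))
```

Circling by itself leaves the executed sequence unchanged. The benefit comes afterwards, when the new loop body (non-circled instructions followed by circled ones) is rescheduled as a single block.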
Circular Scheduling Contd..
[Figure: I1, I2, I3, ..., IN rearranged into a prologue (circled instructions), a loop (N-1 iterations), and an epilogue (non-circled instructions)]
Circular Scheduling Contd..
[Figure: ramp-up and ramp-down effect, before and after]
Circular Scheduling Contd..
for (i=0; i<N; i=i+1)
    X[i] := X[i] + C
-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- loop body
Loop:
F4 ← 0(R3)
R3 ← R3+1
F6 ← F4+F8
BNE R3,R2,Loop
<delay>
-1(R3) ← F6
Circular Scheduling Contd..
Before:
-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- loop body
Loop:
F4 ← 0(R3)
R3 ← R3+1
F6 ← F4+F8
BNE R3,R2,Loop
<delay>
-1(R3) ← F6
After:
-- initialization
F8 ← C
R3 ← 0
R2 ← N
-- prologue
R3 ← R3+1
BEQ R3,R2,Lend
F4 ← -1(R3)
-- loop body
Loop:
F6 ← F4+F8
F4 ← 0(R3)       -- circled instruction
R3 ← R3+1        -- circled instruction
BNE R3,R2,Loop
-2(R3) ← F6
-- epilogue
Lend:
F6 ← F4+F8
<delay>
-1(R3) ← F6
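
The transformation can be sanity-checked by simulating both versions. This is a sketch under stated assumptions: registers become Python variables, X is a list addressed with unit stride (as in the slide's R3+1), N >= 1, and delay-slot instructions are simply executed before the branch test. The epilogue store is written at offset -1 here, since R3 is not incremented again after the final load, so -1(R3) addresses the last element.

```python
def original_loop(x, c):
    """Straight translation of the unscheduled loop (N = len(x) >= 1)."""
    x, n, r3 = list(x), len(x), 0
    while True:
        f4 = x[r3]           # F4 <- 0(R3)
        r3 += 1              # R3 <- R3+1
        f6 = f4 + c          # F6 <- F4+F8
        x[r3 - 1] = f6       # -1(R3) <- F6 (branch delay slot)
        if r3 == n:          # BNE R3,R2,Loop not taken
            return x


def pipelined_loop(x, c):
    """The loop after circling the load and the increment once."""
    x, n = list(x), len(x)
    r3 = 1                   # prologue: R3 <- R3+1
    f4 = x[r3 - 1]           # prologue: F4 <- -1(R3) (BEQ delay slot)
    while r3 != n:           # BEQ guards entry, BNE closes the loop
        f6 = f4 + c          # add for iteration i
        f4 = x[r3]           # circled load for iteration i+1
        r3 += 1              # circled increment
        x[r3 - 2] = f6       # store for iteration i (branch delay slot)
    f6 = f4 + c              # epilogue: last add
    x[r3 - 1] = f6           # epilogue: last store
    return x


if __name__ == "__main__":
    print(pipelined_loop([1, 2, 3], 10) == original_loop([1, 2, 3], 10))
```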
Algorithm
1. Apply basic block scheduling to the loop body; if no stalls
are present, use the schedule; otherwise continue.
2. If the loop has no procedure calls and no if-statements, then
perform circular scheduling; otherwise give up.
3. Select one of the root nodes of the DAG for circling:
choose one on the longest path (simple heuristic).
4. Rebuild the DAG assuming the circling has been performed, and reschedule.
5. If no stalls are present, use the current schedule; else if
there are more stalls than before, use the previous schedule;
else repeat steps 3 & 4 to remove additional stalls.
6. Create the prologue & epilogue; alter the number of times
the loop body is executed.
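
Steps 1 and 3 to 5 can be sketched as follows. Everything here is a simplifying assumption for illustration, not the algorithm's actual implementation: a single-issue machine, only true dependences given as (producer, consumer, latency) triples, no branch instruction, no register renaming, and hypothetical names (`circular_schedule`, `build_dag`, `evaluate`). Step 6 is omitted.

```python
def heights(nodes, edges):
    """Longest latency-weighted path from each node to a leaf of the DAG."""
    succs = {n: [] for n in nodes}
    for u, v, lat in edges:
        succs[u].append((v, lat))
    memo = {}
    def height(n):
        if n not in memo:
            memo[n] = max((lat + height(v) for v, lat in succs[n]), default=0)
        return memo[n]
    return {n: height(n) for n in nodes}

def list_schedule(nodes, edges):
    """Single-issue list scheduling; returns {node: issue cycle}."""
    h = heights(nodes, edges)
    issue, cycle = {}, 0
    while len(issue) < len(nodes):
        ready = [n for n in nodes if n not in issue and
                 all(u in issue and issue[u] + lat <= cycle
                     for u, v, lat in edges if v == n)]
        if ready:
            issue[max(ready, key=lambda n: h[n])] = cycle
        cycle += 1                    # an empty cycle is a stall
    return issue

def build_dag(deps, ahead):
    """Split dependences into intra-body edges and loop-carried ones."""
    intra, carried = [], []
    for u, v, lat in deps:
        if ahead[u] == ahead[v]:
            intra.append((u, v, lat))
        else:                         # u was circled: one iteration ahead of v
            carried.append((u, v, lat))
            intra.append((v, u, 1))   # v must read before u overwrites the register
    return intra, carried

def evaluate(nodes, deps, ahead):
    """Schedule the body; return (stalls per steady-state pass, issue cycles)."""
    intra, carried = build_dag(deps, ahead)
    issue = list_schedule(nodes, intra)
    length = max(issue.values()) + 1
    for u, v, lat in carried:         # result must be ready in the next pass
        length = max(length, issue[u] + lat - issue[v])
    return length - len(nodes), issue

def circular_schedule(nodes, deps):
    ahead = {n: 0 for n in nodes}     # number of times each node was circled
    stalls, issue = evaluate(nodes, deps, ahead)          # step 1
    for _ in range(len(nodes)):       # bound on the number of rounds
        if stalls == 0:
            break
        intra, _carried = build_dag(deps, ahead)
        h = heights(nodes, intra)
        roots = [n for n in nodes if all(v != n for _u, v, _l in intra)]
        trial = dict(ahead)
        trial[max(roots, key=lambda n: h[n])] += 1        # step 3
        new_stalls, new_issue = evaluate(nodes, deps, trial)  # step 4
        if new_stalls > stalls:       # step 5: worse, keep previous schedule
            break
        ahead, stalls, issue = trial, new_stalls, new_issue
    return ahead, stalls, issue

if __name__ == "__main__":
    nodes = ["ld", "add", "st"]
    deps = [("ld", "add", 2), ("add", "st", 2)]   # assumed latencies
    print(circular_schedule(nodes, deps))
```

With the assumed latencies, the unscheduled body ld, add, st has two stall cycles. Circling `ld` once gives the body add, ld, st with no stalls, which mirrors the add / load / store order of the transformed loop body.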
Register Renaming
Since register allocation is done prior to circular
scheduling, dependences due to register usage may
inhibit code motion.
Solution: Perform register renaming during
circular scheduling.
Def R1          Def R1
Use R1          Use R1
Use R1    VS    Use R1
Def R1          Def R2
                Use R2
                Use R1
Register Renaming Contd..
1. Identify registers that are not live at the beginning
and the end of the basic block: these registers form
the pool of temporary registers available during renaming.
2. Ignore dependences due to reuse of registers while
building the DAG.
3. Pick an instruction to schedule.
– If the instruction uses a temporary register, replace
that register with the new register (from the pool) that was
assigned when the Def corresponding to this Use was
processed. If this is the last use, put the
register back in the available pool.
Register Renaming Contd..
– If the instruction defines a temporary register, choose
a new register from the available pool of
registers.
Repeat the above steps until the basic block has been
scheduled.
To avoid running out of registers, given two
candidate instructions, select first the
instruction that does not need a new register
or that frees up a temporary register.
If renaming fails, give up and use the previous
schedule.
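
A minimal sketch of the renaming steps, assuming each instruction is a `(dest, [sources])` pair and that the scheduler's chosen order respects true dependences. The function `rename` and its parameters are hypothetical names. Def-use pairs are found in original program order; registers are then reassigned in scheduled order.

```python
def rename(block, temps, pool, order):
    """Rename temporary registers while emitting `block` in `order`.

    block: list of (dest, [sources]) in original program order
    temps: registers not live at block entry or exit
    pool:  registers available for renaming
    order: instruction indices in scheduled order
    """
    last_def, use_def, uses_left = {}, {}, {}
    for i, (dest, srcs) in enumerate(block):
        for k, reg in enumerate(srcs):
            if reg in temps and reg in last_def:
                use_def[(i, k)] = last_def[reg]   # Def reaching this Use
                uses_left[last_def[reg]] += 1
        if dest in temps:
            last_def[dest] = i
            uses_left[i] = 0

    free, assigned, out = list(pool), {}, []
    for i in order:
        dest, srcs = block[i]
        new_srcs = list(srcs)
        for k in range(len(srcs)):
            d = use_def.get((i, k))
            if d is not None:
                new_srcs[k] = assigned[d]         # register chosen at the Def
                uses_left[d] -= 1
                if uses_left[d] == 0:             # last use: back to the pool
                    free.append(assigned[d])
        if dest in temps:
            if not free:
                raise RuntimeError("renaming failed: use previous schedule")
            dest = free.pop(0)                    # new register for this Def
            assigned[i] = dest
            if uses_left[i] == 0:                 # value never used
                free.append(dest)
        out.append((dest, new_srcs))
    return out


if __name__ == "__main__":
    block = [("R1", ["R8"]), ("R5", ["R1"]), ("R1", ["R9"]), ("R6", ["R1"])]
    # Hoist the second Def of R1 above the Use of the first one.
    print(rename(block, {"R1"}, ["R1", "R2"], [0, 2, 1, 3]))
```

In the usage example the second definition of R1 is renamed to R2, so it can move above the remaining use of the first R1. With only one register in the pool, the same order raises the error, which corresponds to giving up and using the previous schedule.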
Sample Problem