Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers

Stephen Hines, David Whalley and Gary Tyson
Computer Science Dept.
Florida State University
October 23, 2006
Instruction Packing


Store frequently occurring instructions, as specified by the compiler, in a small, low-power Instruction Register File (IRF)
Allow multiple instruction fetches from the IRF by packing instruction references together
  Tightly packed – multiple IRF references
  Loosely packed – piggybacks an IRF reference onto an existing instruction
Facilitate parameterization of some instructions using an Immediate Table (IMM)
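To make the two packing styles concrete, here is a minimal Python sketch under stated assumptions: the IRF is modeled as a list of instruction strings whose entry 0 is a nop used for padding, and the names `IRF`, `expand_tight` and `expand_loose` are hypothetical. It only shows which instructions a single fetch can deliver, not the hardware encoding.

```python
# Hypothetical model of an Instruction Register File (IRF); contents are illustrative.
# Assumption: entry 0 holds a nop, which pads unused slots in a pack.
IRF = ["nop", "addiu $sp,$sp,-32", "lw $2,0($4)", "addu $3,$3,$2", "sw $3,4($4)"]
MAX_TIGHT_SLOTS = 5  # up to 5 IRF instructions per tightly packed instruction


def expand_tight(slots):
    """Tightly packed: one fetched instruction names several IRF entries, run in order."""
    if len(slots) > MAX_TIGHT_SLOTS:
        raise ValueError("a tight pack holds at most 5 IRF references")
    return [IRF[i] for i in slots if IRF[i] != "nop"]


def expand_loose(insn, irf_index):
    """Loosely packed: an ordinary instruction piggybacks one IRF reference."""
    extra = IRF[irf_index]
    return [insn] if extra == "nop" else [insn, extra]


if __name__ == "__main__":
    print(expand_tight([2, 3, 4, 0, 0]))  # three useful instructions from one fetch
    print(expand_loose("addiu $4,$4,8", 1))
```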
Execution of IRF Instructions
[Figure: Executing a Tightly Packed Param4c Instruction. Instruction Fetch Stage: PC, instruction cache, and the packed instruction in the IF/ID latch. First Half of Instruction Decode Stage: entries insn1 to insn4 read from the IRF (with the IRWP) and imm3 read from the IMM, passed to the instruction decoder.]
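Parameterization can be sketched the same way. This assumes what the format details of this deck state (a 32-entry IMM, and a default immediate retained in each IRF entry); the class `IrfEntry`, the function `resolve`, and the value stored in IMM entry 3 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

IMM = [0] * 32  # 32-entry immediate table (IMM)
IMM[3] = 16     # hypothetical contents of entry imm3


@dataclass
class IrfEntry:
    op: str           # instruction text without its immediate, e.g. "addiu $3, $3"
    default_imm: int  # default immediate kept with the IRF entry


def resolve(entry: IrfEntry, param: Optional[int] = None) -> str:
    """Use the IMM entry named by a parameter if one is supplied, else the default."""
    imm = entry.default_imm if param is None else IMM[param]
    return f"{entry.op}, {imm}"


if __name__ == "__main__":
    inc = IrfEntry("addiu $3, $3", 1)
    print(resolve(inc))           # default immediate
    print(resolve(inc, param=3))  # immediate taken from imm3
```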
Outline



Introduction
Improved Promotion to the IRF
Compiler Optimizations
  Instruction Selection
  Register Re-assignment
  Instruction Scheduling
Experimental Evaluation
Conclusions & Future Work
Improved Promotion to the IRF


Different classes of instructions can consume 1 – 5 slots
  More accurately model the benefits of promoting from one class of instruction to another
Original IRF papers did not promote multiple I-type instructions with different default immediate values
  addi $3, $3, 4 and addi $3, $3, 1 would not both reside in the IRF, no matter how frequently they occurred
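A rough sketch of the difference, assuming promotion simply ranks instructions by profile frequency (the real promotion also weighs the per-class slot costs noted above, which this ignores). `without_imm` is a hypothetical, simplified way of identifying instructions that differ only in their immediate, and all function names are illustrative.

```python
from collections import Counter


def without_imm(insn: str) -> str:
    # Hypothetical helper: treat the text before the last comma as the
    # instruction's form (only meaningful for I-type instructions like addi).
    return insn.rsplit(",", 1)[0]


def ranked(profile):
    counts = Counter(profile)
    # Most frequent first; ties broken alphabetically so results are deterministic.
    return [insn for insn, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def promote_one_imm_per_form(profile, irf_size):
    """Restriction of the original IRF papers: one default immediate per form."""
    chosen, seen = [], set()
    for insn in ranked(profile):
        if len(chosen) >= irf_size:
            break
        form = without_imm(insn)
        if form not in seen:
            seen.add(form)
            chosen.append(insn)
    return chosen


def promote_any_imm(profile, irf_size):
    """Improved promotion: instructions differing only in the immediate are distinct."""
    return ranked(profile)[:irf_size]


if __name__ == "__main__":
    profile = ["addi $3, $3, 4"] * 50 + ["addi $3, $3, 1"] * 40 + ["addu $2, $3, $4"] * 10
    print(promote_one_imm_per_form(profile, 2))
    print(promote_any_imm(profile, 2))
```

With a 2-entry budget, the restricted version promotes `addi $3, $3, 4` and the rarer `addu`, while the improved version promotes both `addi` variants.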
Mixed Profiling




Static profiling is best for decreasing code size
Dynamic profiling is best for reducing energy consumption
Can simultaneously weight static and dynamic profile data to obtain a mixed result that has both good code compression and reduced energy consumption
  Can obtain most of the benefits of individual static/dynamic profiling
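One plausible way to weight the two profiles, sketched as an assumption rather than the authors' formula: normalize each profile to fractions of its own total, then take a weighted sum per instruction. `mixed_scores` and `dynamic_weight` are hypothetical names, and the counts in the usage example are invented; a weight of 1.0 corresponds to the 100/0 (dynamic) mixture and 0.0 to 0/100 (static).

```python
def mixed_scores(static_counts, dynamic_counts, dynamic_weight):
    """Blend static and dynamic profile frequencies into one promotion score.

    Assumption: each profile is normalized to fractions of its own total
    before weighting. dynamic_weight=1.0 is the 100/0 (dynamic) point and
    0.0 is the 0/100 (static) point."""
    s_total = sum(static_counts.values()) or 1
    d_total = sum(dynamic_counts.values()) or 1
    return {
        insn: dynamic_weight * dynamic_counts.get(insn, 0) / d_total
        + (1 - dynamic_weight) * static_counts.get(insn, 0) / s_total
        for insn in set(static_counts) | set(dynamic_counts)
    }


if __name__ == "__main__":
    static = {"sw $31, 28($sp)": 30, "addiu $2, $2, 1": 10}    # hypothetical counts
    dynamic = {"sw $31, 28($sp)": 30, "addiu $2, $2, 1": 970}
    for w in (1.0, 0.5, 0.0):
        scores = mixed_scores(static, dynamic, w)
        print(w, max(scores, key=scores.get))
```

Instructions that are frequent both statically and dynamically rank well at every mixture, which matches the slide's observation that a mixed profile keeps most of both benefits.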
Compiler Optimizations

Instruction Selection
  Choose beneficial encodings to increase redundancy
Register Re-assignment
  Attempt to rename registers so that instructions can be accessed via the IRF
Instruction Scheduling
  Intra-block – reorder instructions so that dense packs (both tight and loose) are formed
  Inter-block – move instructions between blocks to fill up packs ending with branches/jumps
    Code duplication
    Predication
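As an illustration of instruction selection for redundancy (the example is ours, not from the slides): a MIPS register copy can be encoded as `addu rd,rs,$0`, `or rd,rs,$0` or `addiu rd,rs,0`, and always choosing the form that is already common lets more copies share one IRF entry. The helper names `copy_encodings` and `select_copy` are hypothetical.

```python
from collections import Counter


def copy_encodings(rd: str, rs: str) -> list:
    """Hypothetical candidate set: MIPS encodings that all copy rs into rd."""
    return [f"addu {rd},{rs},$0", f"or {rd},{rs},$0", f"addiu {rd},{rs},0"]


def select_copy(rd: str, rs: str, counts: Counter) -> str:
    """Pick the equivalent encoding that already occurs most often, so more
    copies share a single IRF entry (ties keep the first candidate)."""
    return max(copy_encodings(rd, rs), key=lambda insn: counts[insn])


if __name__ == "__main__":
    counts = Counter({"or $2,$3,$0": 7, "addu $2,$3,$0": 2})
    print(select_copy("$2", "$3", counts))
```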
Intra-block Instruction Scheduling
[Figure: instructions 1, 2, 3, 4, 4' and 5 shown without and with instruction scheduling, alongside their instruction dependence DAG.]
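A minimal sketch of the intra-block idea, assuming a simple greedy list scheduler rather than the heuristic actually used in VPO: among instructions whose predecessors in the dependence DAG are already scheduled, prefer one whose IRF residency matches the previous instruction, so that IRF references end up adjacent and can share a pack. The function `schedule` and its parameters are hypothetical.

```python
def schedule(insns, deps, in_irf):
    """Greedy list scheduling that tries to cluster IRF-resident instructions.

    insns:  instruction ids in original program order
    deps:   id -> set of ids that must be scheduled first (the dependence DAG)
    in_irf: ids whose instruction resides in the IRF
    Among ready instructions, prefer one whose IRF residency matches the last
    scheduled instruction, so IRF references form longer runs to pack."""
    done, order = set(), []
    while len(order) < len(insns):
        ready = [i for i in insns if i not in done and deps.get(i, set()) <= done]
        if not ready:
            raise ValueError("dependence graph has a cycle")
        last_in_irf = bool(order) and order[-1] in in_irf
        # sort() is stable, so ties keep original program order.
        ready.sort(key=lambda i: (i in in_irf) != last_in_irf)
        order.append(ready[0])
        done.add(ready[0])
    return order
```

With `deps = {2: {1}, 4: {3}}` and `in_irf = {1, 3, 5}`, `schedule([1, 2, 3, 4, 5], deps, in_irf)` returns `[1, 3, 5, 2, 4]`: the three IRF-resident instructions become one contiguous run instead of three isolated ones.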
Code Duplication to Reduce Code Size
[Figure: blocks W, X, Y and Z, with instructions 1 to 5, duplicates 3', 4' and 5', and labels a, b and c.]
Six slots are too many to fit in a single packed instruction, but duplicating a single instruction makes it possible to pack the remaining 5 slots together.
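The slot arithmetic behind this, as a small sketch assuming a run of IRF references is simply split into tightly packed instructions of at most 5 slots (`pack_run` and the instruction ids are hypothetical):

```python
MAX_TIGHT_SLOTS = 5  # a tightly packed instruction holds up to 5 slots


def pack_run(refs, slots_per_pack=MAX_TIGHT_SLOTS):
    """Split a straight-line run of IRF references into tightly packed instructions."""
    return [refs[i:i + slots_per_pack] for i in range(0, len(refs), slots_per_pack)]


if __name__ == "__main__":
    run = ["i1", "i2", "i3", "i4", "i5", "i6"]  # 6 slots (hypothetical ids)
    print(len(pack_run(run)))      # 2 packed instructions
    # Duplicating one instruction (here i1) elsewhere leaves 5 slots.
    print(len(pack_run(run[1:])))  # 1 packed instruction
```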
Predication – Forward Branches
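[Figure: blocks X, Y and Z with instructions 1 to 4 and duplicates 2' and 4'; block X ends in conditional branch a, with its fall-through path and branch-taken path marked (label b also shown).]
Instructions packed after a forward branch will only be executed when the branch is not taken.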
Predication – Backward Branches
[Figure: a pack ending a loop with a backward branch, showing instructions 1, 2 and duplicate 2', labels a to f, and the branch offset.]
Instructions packed after a backward branch will only be executed when the branch is taken.
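The predication semantics of the last two slides can be modeled as follows. This is a hypothetical sketch assuming at most one branch per pack; `execute_pack` and the slot kinds are invented names, and annulled slots are simply skipped.

```python
def execute_pack(slots, branch_taken):
    """Return the names of the slots that execute from one packed instruction.

    slots: list of (name, kind), kind in {"op", "fwd_branch", "back_branch"}.
    Assumption: at most one branch per pack.
    Slots after a forward branch run only when it is not taken (fall-through);
    slots after a backward branch run only when it is taken."""
    executed = []
    for name, kind in slots:
        executed.append(name)
        if kind == "fwd_branch" and branch_taken:
            break  # remaining slots are annulled
        if kind == "back_branch" and not branch_taken:
            break  # loop exits: remaining slots are annulled
    return executed


if __name__ == "__main__":
    fwd = [("1", "op"), ("br", "fwd_branch"), ("2'", "op")]
    back = [("1", "op"), ("br", "back_branch"), ("2'", "op")]
    print(execute_pack(fwd, branch_taken=True), execute_pack(fwd, branch_taken=False))
    print(execute_pack(back, branch_taken=True), execute_pack(back, branch_taken=False))
```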
Predication Advantages with IRF



The IRF facilitates a form of predication for MIPS, a baseline architecture that traditionally does not support predication
No need to waste instruction encoding space specifying predicate bits for most or all instructions (even ARM traded away general predication to reduce code size with Thumb and Thumb-2)
No need to fetch, decode and possibly execute instructions that are annulled after the branch within a pack (reducing energy consumption and execution time)
Experimental Evaluation


MiBench embedded benchmark suite – 6 categories representing common tasks for various domains
SimpleScalar MIPS/PISA architectural simulator
  Out-of-order, single-issue embedded machine with 8KB 4-way set-associative L1 instruction and data caches and a 128-entry bimodal branch predictor
  Wattch/Cacti extensions for modeling energy consumption (inactive portions of the pipeline dissipate only 10% of normal energy when using cc3 clock gating)
VPO – Very Portable Optimizer targeted for SimpleScalar MIPS/PISA
Energy Consumption
[Figure: Total Energy (y-axis, 70% to 100%) by benchmark category (Automotive, Consumer, Network, Office, Security, Telecomm, Average) for No optimizations, Promotion, Inst Selection, Reg Re-assign, Intra-sched and Inter-sched.]
Static Code Size
[Figure: Static Code Size (y-axis, 57.5% to 97.5%) by benchmark category (Automotive, Consumer, Network, Office, Security, Telecomm, Average) for No optimizations, Promotion, Inst Selection, Reg Re-assign, Intra-sched and Inter-sched.]
IRF Promotion with Mixed Profiling
[Figure: Code Size, Optimized Code Size, Total Energy and Optimized Total Energy as a relative measure (y-axis, 67.5% to 97.5%) versus the dynamic/static profile mixture: 100/0 (Dynamic), 75/25, 50/50, 25/75, 0/100 (Static).]
Conclusions & Future Work





Compiler optimizations targeted specifically for the IRF can further reduce energy (12.2% to 15.8%), code size (16.8% to 28.8%) and execution time
The IRF creates unique transformation opportunities, such as code duplication for code size reduction and predication
As processor designs become more idiosyncratic, it is increasingly important to explore the possibility of evolving existing compiler optimizations
Register targeting and loop unrolling should also be explored with instruction packing
Enhanced parameterization techniques
Tightly Packed Instruction Format


New opcodes for this T-format of MISA instructions
Supports sequential execution of up to 5 RISA instructions from the IRF
  Unused fields are padded with nop
Supports up to 2 parameters replacing instruction slots
  Parameters can come from the 32-entry IMM
  Each IRF entry also retains a default immediate value
  Branches use these 5 bits for displacements
  R-type RISA instructions can use a parameter to replace the rd field
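A sketch of the T-format bit budget, assuming a 6-bit MIPS-style opcode followed by five 5-bit slot fields (6 + 5 × 5 = 31 bits); where the remaining bit sits and what it does is not stated on this slide, so it is left as zero here. A 5-bit field can name any of the 32 IMM entries, which is consistent with parameters occupying slots. `encode_tight`, `decode_tight` and the nop-at-index-0 convention are assumptions.

```python
OPCODE_BITS, SLOT_BITS, NUM_SLOTS = 6, 5, 5
SLOT_MASK = (1 << SLOT_BITS) - 1
NOP = 0  # assumption: IRF entry 0 is the nop used to pad unused slots


def encode_tight(opcode, refs):
    """Build a T-format word: a 6-bit opcode followed by five 5-bit slot fields.
    Assumption: the one leftover bit (6 + 5*5 = 31) is the low bit, left as zero."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 6 bits")
    if len(refs) > NUM_SLOTS:
        raise ValueError("at most 5 slots per tightly packed instruction")
    padded = list(refs) + [NOP] * (NUM_SLOTS - len(refs))  # pad unused fields with nop
    word = opcode
    for ref in padded:
        if not 0 <= ref <= SLOT_MASK:
            raise ValueError("slot value must fit in 5 bits")
        word = (word << SLOT_BITS) | ref
    return word << 1  # leftover low bit


def decode_tight(word):
    """Inverse of encode_tight: return (opcode, list of five slot values)."""
    word >>= 1
    refs = []
    for _ in range(NUM_SLOTS):
        refs.append(word & SLOT_MASK)
        word >>= SLOT_BITS
    return word, refs[::-1]


if __name__ == "__main__":
    w = encode_tight(63, [1, 2, 3])
    print(hex(w), decode_tight(w))
```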
MIPS Instruction Format Modifications

Creating Loosely Packed instructions
  R-type: Removed the shamt field and merged it with rs
  I-type: Shortened immediate values (16-bit to 11-bit)
    Lui now uses 21-bit immediate values, hence no loose packing
  J-type: Unchanged
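The I-type bit budget can be checked with a sketch that assumes the field order opcode (6) | rs (5) | rt (5) | immediate (11) | IRF reference (5), which sums to 32 bits; the slide does not give the order, and treating the immediate as signed is also an assumption. By the same count, a `lui` with a 6-bit opcode, a 5-bit rt and a 21-bit immediate already fills 32 bits, leaving no room for an IRF reference; since 21 + 11 = 32, widening lui presumably lets lui plus an 11-bit immediate still build a full 32-bit constant. The function names are hypothetical.

```python
# Assumed field order (not given on the slide): opcode | rs | rt | imm | irf
LOOSE_ITYPE_FIELDS = [("opcode", 6), ("rs", 5), ("rt", 5), ("imm", 11), ("irf", 5)]
assert sum(width for _, width in LOOSE_ITYPE_FIELDS) == 32


def encode_loose_itype(opcode, rs, rt, imm, irf):
    """Pack a loosely packed I-type word; imm is a signed 11-bit value (assumption)."""
    if not -(1 << 10) <= imm < (1 << 10):
        raise ValueError("immediate does not fit in 11 signed bits")
    values = {"opcode": opcode, "rs": rs, "rt": rt, "imm": imm & 0x7FF, "irf": irf}
    word = 0
    for name, width in LOOSE_ITYPE_FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | values[name]
    return word


def sign_extend_11(field):
    """Recover the signed immediate from its 11-bit field."""
    return field - (1 << 11) if field & (1 << 10) else field


if __name__ == "__main__":
    w = encode_loose_itype(8, 3, 3, -1, 7)  # an addi-like word carrying IRF entry 7
    print(hex(w), sign_extend_11((w >> 5) & 0x7FF), w & 0x1F)
```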