Compiler-Based Register Name Adjustment for Low-Power Embedded Processors
Peter Petrov, Alex Orailoglu
ICCAD’03
Agenda
Introduction
Mathematical Formulation
Heuristic Solutions For RNA
  Register PermuTation (RPT)
  Register PerturBation (RPB)
Experimental Results
Conclusions
2/19
Introduction



Objective: Low-Power
Key Point: Reduce bit transition activity on the register index streams.
Concept: Register Name Adjustment (RNA)
3/19
Example
Original code:
add r3, r2, r4    011 010 100
sub r6, r3, r5    110 011 101
sub r3, r2, r6    011 010 110
mul r4, r4, r5    100 100 101
Total Bit Transitions: 7 + 4 + 5 = 16

After register name adjustment:
add r6, r2, r4    110 010 100
sub r7, r6, r5    111 110 101
sub r6, r2, r7    110 010 111
mul r4, r4, r5    100 100 101
Total Bit Transitions: 3 + 4 + 3 = 10
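The totals in this example can be checked with a few lines of Python. This is a minimal sketch, assuming each term of the sum is the Hamming distance accumulated down one register field (destination, first source, second source); the function names are illustrative and not taken from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two register indices differ."""
    return bin(a ^ b).count("1")


def column_transitions(program):
    """Bit transitions between consecutive instructions, one total per field."""
    totals = []
    for column in zip(*program):  # walk the program field by field
        totals.append(sum(hamming(x, y) for x, y in zip(column, column[1:])))
    return totals


# (dest, src1, src2) register indices of the four instructions in the example.
original = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
renamed = [(6, 2, 4), (7, 6, 5), (6, 2, 7), (4, 4, 5)]

print(column_transitions(original), sum(column_transitions(original)))  # [7, 4, 5] 16
print(column_transitions(renamed), sum(column_transitions(renamed)))    # [3, 4, 3] 10
```

Renaming r3 to r6 and r6 to r7 leaves the program semantics unchanged but places frequently adjacent registers at indices that differ in a single bit.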
4/19
Cost Function
n 1
cost   f c M Pi ,l , M Pi 1,l   C
3
l 1 i 1



f_c(reg_a, reg_b): the Hamming distance between reg_a and reg_b.
l: the l-th column of an instruction.
M(P_{i,j}): a bijective mapping function from the original register P_{i,j} to a new register index.
6/19
Literals



Literals: unchangeable fields in an instruction, such as an opcode or an immediate operand.
L(i, j): records the literal positions.

$$M'(P_{i,j}) = \begin{cases} M(P_{i,j}) & \text{if } L_{i,j} = 0 \\ P_{i,j} & \text{if } L_{i,j} = 1 \end{cases}$$
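The cost function and the literal-aware mapping M' can be sketched together in Python. This is an illustration under assumptions, not the authors' implementation: P and L are lists of rows, M is a dictionary, literals are stored as plain integers in their field, and the constant term C of the cost function is left out.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance f_c between two field values."""
    return bin(a ^ b).count("1")


def apply_mapping(P, L, M):
    """M'(P[i][j]): rename through M unless L[i][j] marks the entry as a literal."""
    return [
        [p if lit else M[p] for p, lit in zip(p_row, l_row)]
        for p_row, l_row in zip(P, L)
    ]


def cost(P, L, M):
    """Sum of Hamming distances between consecutive rows, over all columns."""
    Q = apply_mapping(P, L, M)
    return sum(
        hamming(Q[i][l], Q[i + 1][l])
        for l in range(len(Q[0]))
        for i in range(len(Q) - 1)
    )


# The four-instruction example from the introduction; no literal fields.
P = [[3, 2, 4], [6, 3, 5], [3, 2, 6], [4, 4, 5]]
L = [[0, 0, 0]] * 4
identity = {r: r for r in range(8)}
renamed = {**identity, 3: 6, 6: 7, 7: 3}  # bijective: r3->r6, r6->r7, r7->r3

print(cost(P, L, identity))  # 16
print(cost(P, L, renamed))   # 10
```

Because M must be bijective, renaming r3 to r6 forces some other register (here r7) to take the index that r3 vacated.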
7/19
Example
Code:
ld  r5, (r1) 0
add r3, r2, r5
add r4, r3, r2
mul r3, r4, r3
st  r3, r7 (10)

P =
v5  v1  0
v3  v2  v5
v4  v3  v2
v3  v4  v3
v3  v7  10

L =
0  0  1
0  0  0
0  0  0
0  0  0
0  0  1

Register pair occurrences:
(v3, v4) – 3
(v2, v3) – 2
(v5, v3) – 1
(v3, v3) – 1
(v4, v7) – 1
(v5, v2) – 1
( 0, v5) – 1
(10, v3) – 1
(v1, v2) – 1
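The pair counts in this example follow from walking each column of P and counting adjacent entries. A minimal Python sketch, assuming that pairs are unordered (so v3 followed by v4 and v4 followed by v3 land in the same bucket, as the count of 3 suggests) and that literals are kept as ordinary vertices; the helper name is hypothetical.

```python
from collections import Counter

# Matrix P from the example; the literals 0 and 10 are kept as strings.
P = [
    ["v5", "v1", "0"],
    ["v3", "v2", "v5"],
    ["v4", "v3", "v2"],
    ["v3", "v4", "v3"],
    ["v3", "v7", "10"],
]


def pair_occurrences(P):
    """Count how often two names are vertically adjacent in the same column."""
    counts = Counter()
    for column in zip(*P):
        for a, b in zip(column, column[1:]):
            counts[tuple(sorted((a, b)))] += 1  # unordered pair as a sorted tuple
    return counts


for pair, n in pair_occurrences(P).most_common():
    print(pair, n)
```

Five instructions with three fields give 3 x 4 = 12 adjacencies in total, which matches the sum of the listed counts.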
8/19
Flow
RPB: maximize the distribution skew of register pair occurrences.
1. Select Vi and Vj that maximize f(eij) + f(eji).
2. Pick names for Vi and Vj and compute the cost.
3. All unassigned indices tried? If no, go back to step 2. (Brute force; time-consuming.)
4. Name Vi and Vj with the minimum cost.
5. All registers named? If no, go back to step 1; if yes, finish.
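The naming loop in this flow can be sketched as a greedy procedure. The sketch below is a simplification under stated assumptions, not the authors' RPT: the cost of a trial naming is taken to be the occurrence-weighted Hamming distance to neighbours that are already named, literals are ignored, ties are broken arbitrarily, and `greedy_rpt` and its parameters are hypothetical names.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def greedy_rpt(weights, vertices, num_regs):
    """weights: {(u, v): occurrences}. Returns {vertex: register index}."""
    name = {}                     # vertices named so far
    free = set(range(num_regs))   # indices not yet handed out

    def w(u, v):
        # f(e_uv) + f(e_vu): occurrences of the pair in either direction
        return weights.get((u, v), 0) + weights.get((v, u), 0)

    def local_cost(trial):
        # Weighted Hamming distance on edges touching the trial vertices.
        merged = {**name, **trial}
        seen, total = set(), 0
        for u in trial:
            for v in merged:
                edge = frozenset((u, v))
                if v != u and edge not in seen:
                    seen.add(edge)
                    total += w(u, v) * hamming(merged[u], merged[v])
        return total

    while len(name) < len(vertices):
        unnamed = [v for v in vertices if v not in name]
        # Step 1: heaviest edge with at least one unnamed endpoint.
        candidates = [(w(u, v), u, v) for u in unnamed for v in vertices if u != v]
        if candidates:
            _, u, v = max(candidates)
            todo = [x for x in (u, v) if x not in name]
        else:
            todo = unnamed[:1]
        # Steps 2-4: brute force over all unassigned indices, keep the cheapest.
        best = min(
            (dict(zip(todo, idx)) for idx in permutations(sorted(free), len(todo))),
            key=local_cost,
        )
        name.update(best)
        free -= set(best.values())
    return name


# Pair occurrences of the four-instruction example from the introduction.
weights = {(3, 6): 2, (3, 4): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1, (5, 6): 2}
names = greedy_rpt(weights, vertices=[2, 3, 4, 5, 6], num_regs=8)
print(names)
```

The exhaustive inner search is what the flow labels as brute force: naming two vertices at once tries every ordered pair of free indices, which is cheap for 8 or 16 registers but grows quadratically with the register file size.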
10/19
Cost Function of RPT
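$$C_{ij} = \sum_{k \in L_i} H(c_k, c_i) + \sum_{k \in R_i,\, k \in I} H(c_i, c_k) + \sum_{k \in L_j} H(c_k, c_j) + \sum_{k \in R_j,\, k \in I} H(c_j, c_k) + \bigl(f(e_{ij}) + f(e_{ji})\bigr)\, H(c_i, c_j)$$
[Figure: vertices Vi and Vj connected by edges eij and eji; Vi has neighbouring vertices Literal_i, Reg_i, … and Vj has neighbouring vertices Literal_j, Reg_j, …]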
11/19
Register PerturBation
Number of higher utilization frequency ↓ → Performance ↑
Number of self-transitions ↑ → Performance ↑
12/19
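Cost Function of RPB
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}, \qquad \hat{D} = \frac{D}{N_P}$$
D: the number of self-transitions (maximize).
σ: maximize σ to maximize the distribution skew of register pair occurrences.
Question: does a larger σ imply larger skewness?
$$C_0 = \lambda \hat{D} + (1 - \lambda)\, \hat{\sigma}$$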
13/19
Register PerturBation

Commutativity Transformation
Before:          After:
r1 ← r2, r3      r1 ← r2, r3
r4 ← r1, r2      r4 ← r2, r1
Note: these instructions must be commutable.
Question: would the data dependency increase?

Dead Register Reassignment
Before:          After:
r1 ← r2, r3      r1 ← r2, r3
r4 ← r1, r2      r2 ← r1, r2
r2 ← r3, r4      r2 ← r3, r2
Note: r4 must be dead after the third instruction.
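The effect of both transformations can be measured by counting bit transitions and self-transitions before and after. A small Python sketch, assuming a self-transition means the same register index appearing in the same field of two consecutive instructions; the function name is illustrative.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def stats(program):
    """Return (bit transitions, self-transitions) summed over all fields."""
    bits = selfs = 0
    for column in zip(*program):
        for a, b in zip(column, column[1:]):
            bits += hamming(a, b)
            selfs += (a == b)  # same register repeated in the same field
    return bits, selfs


# Commutativity transformation: swap the sources of the second instruction.
comm_before = [(1, 2, 3), (4, 1, 2)]
comm_after = [(1, 2, 3), (4, 2, 1)]

# Dead register reassignment: r4 is replaced by r2.
dead_before = [(1, 2, 3), (4, 1, 2), (2, 3, 4)]
dead_after = [(1, 2, 3), (2, 1, 2), (2, 3, 2)]

print(stats(comm_before), stats(comm_after))  # (5, 0) (3, 1)
print(stats(dead_before), stats(dead_after))  # (10, 0) (6, 2)
```

In both cases the transformation creates self-transitions, which cost zero bit flips regardless of which index the register is later given.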
14/19
Dead Register Reassignment
[Figure: two diagrams over registers r1, r2 and r3 with numbered nodes 1 to 8; one transition is labeled "Self-Transition".]
15/19
Experimental Results
$$C_0 = \lambda \hat{D} + (1 - \lambda)\, \hat{\sigma}$$

                      RPT                RPB
Circuit    Total     Total    Impr%    λ(0.0)   λ(0.25)   λ(0.5)   λ(0.75)   λ(1.0)   Impr%
fdct          70        58    18.09        47        46       46        46       46   34.55
ej        73,837    63,169    14.45    49,203    48,933   48,934    48,934   45,224   38.75
mmul       7,613     6,463    15.11     4,710     4,460    4,460     4,460    4,593   41.41
tri        5,929     5,400     8.92     3,490     3,489    3,489     3,489    3,335   43.76
sor        1,440     1,142    20.69     1,004     1,003    1,043     1,043    1,004   30.30
adpcm_e   20,513    15,338    25.23    15,897    15,144   15,144    15,144   14,750   28.10
adpcm_d   17,212    13,689    20.46    13,393    12,655   12,655    12,655   11,404   33.74
17/19
Conclusions



Minimizing the bit transitions on the register index streams reduces the power consumption.
RPT reduces bit transitions by up to 25%.
RPB reduces bit transitions by up to 44%.
19/19