More Loop Optimizations Spring 2010

Outline
• Strength Reduction
• Loop Test Replacement
• Loop Invariant Code Motion
• SIMDization with SSE
Saman Amarasinghe
2
6.035
©MIT Fall 1998
Strength Reduction
• Replace expensive operations in an expression using cheaper ones
– Not a data-flow problem
– Algebraic simplification
– Example: a*4 → a << 2
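
A minimal sketch of the algebraic rewrite above, in C. The function names are illustrative and not from the slides, and unsigned operands are assumed so that the shift is well defined for every input:

```c
#include <stdint.h>

/* Multiply by 4 with a multiply instruction. */
uint32_t times4_mul(uint32_t a) {
    return a * 4u;
}

/* Strength-reduced form: shifting left by 2 multiplies by 2^2 = 4.
   Both versions wrap modulo 2^32 in the same way. */
uint32_t times4_shift(uint32_t a) {
    return a << 2;
}
```

Because the rewrite looks at one expression only, it needs no information about the rest of the program, which is why the slide calls it "not a data-flow problem".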

Strength Reduction
• In loops, reduce expensive operations in expressions to cheaper ones by using the previously calculated value

    t = 202
    for j = 1 to 100
        t = t - 2
        A(j) = t

• The same loop with the array access written as address arithmetic:

    t = 202
    for j = 1 to 100
        t = t - 2
        *(abase + 4*j) = t

Strength Reduction
    t = 202
    for j = 1 to 100
        t = t - 2
        *(abase + 4*j) = t

• Basic induction variable j:
    j = 1, 2, 3, 4, …                                        (step +1)
• Induction variable 202 - 2*j:
    t = 202 (initial value), 200, 198, 196, …                (step -2)
• Induction variable abase + 4*j:
    abase+4*j = abase+4, abase+8, abase+12, abase+16, …      (step +4)

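
The sequences on this slide can be checked with a small C sketch. The function `trace_loop` and its output arrays are hypothetical helpers, introduced only to record what the loop computes in its first n iterations:

```c
/* Run the first n iterations of the slide's loop and record, per iteration,
   the value of t that is stored and the byte offset 4*j added to abase. */
void trace_loop(int n, int t_out[], int off_out[]) {
    int t = 202;
    for (int j = 1; j <= n; j++) {
        t = t - 2;
        t_out[j - 1] = t;         /* equals 202 - 2*j: changes by -2 per iteration */
        off_out[j - 1] = 4 * j;   /* changes by +4 per iteration */
    }
}
```

Each recorded value differs from the previous one by a constant, which is the property strength reduction relies on: the next value can be obtained with one addition instead of a multiplication.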
Strength Reduction Algorithm
• For a dependent induction variable k = a*j + b
• In the pre-header, add k' = a*jinit + b
• Next to j = j + c, add k' = k' + a*c
• Use k' instead of k

Before:
    for j = 1 to 100
        *(abase + 4*j) = j

After (k = abase + 4*j is replaced by t):
    t = abase + 4*1
    for j = 1 to 100
        *(t) = j
        t = t + 4

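
The before and after loops can be written in C as a sketch. The names `store_before` and `store_after` are illustrative, and 4-byte `int32_t` elements are assumed so that the factor 4 matches the slide:

```c
#include <stdint.h>

/* Before: the address abase + 4*j is recomputed with a multiply every
   iteration. abase must point to at least 101 elements. */
void store_before(int32_t *abase) {
    for (int j = 1; j <= 100; j++) {
        *(int32_t *)((char *)abase + 4 * j) = j;
    }
}

/* After: t plays the role of k'. It is initialized once (a*jinit + b with
   a = 4, jinit = 1, b = abase) and bumped by a*c = 4*1 next to the
   increment of j. */
void store_after(int32_t *abase) {
    char *t = (char *)abase + 4 * 1;
    for (int j = 1; j <= 100; j++) {
        *(int32_t *)t = j;
        t = t + 4;
    }
}
```

Both functions write the same values to the same addresses; only the way the address is obtained differs.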
Example
    double A[256], B[256][256]
    j = 1
    while (j < 100)
        A[j] = B[j][j]
        j = j + 2

• With the array accesses written as address arithmetic (element size taken as 4 bytes, as in the earlier slides):

    j = 1
    while (j < 100)
        *(&A + 4*j) = *(&B + 4*(256*j + j))
        j = j + 2

• Base induction variable: j
• Dependent induction variable: a = &A + 4*j
– Initialize a = &A + 4 before the loop
– Add a = a + 8 next to j = j + 2
– Use *a in place of *(&A + 4*j)
• Dependent induction variable: b = &B + 4*257*j
– Initialize b = &B + 1028 before the loop
– Add b = b + 2056 next to j = j + 2
– Use *b in place of *(&B + 4*(256*j + j))

• Result:

    j = 1
    a = &A + 4
    b = &B + 1028
    while (j < 100)
        *a = *b
        j = j + 2
        a = a + 8
        b = b + 2056

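
A C sketch of this example follows. It makes two assumptions that are not in the slides: B is passed as a flat array of 256*256 elements (so the index 256*j + j appears literally), and the pointer steps are counted in elements rather than bytes, because C pointer arithmetic scales by the element size automatically. The function names are illustrative.

```c
/* Original loop: both addresses are recomputed from j in every iteration. */
void copy_diag_before(double *A, const double *B) {
    int j = 1;
    while (j < 100) {
        A[j] = B[256 * j + j];
        j = j + 2;
    }
}

/* Strength-reduced loop: a tracks &A[j], b tracks &B[257*j]. */
void copy_diag_after(double *A, const double *B) {
    int j = 1;
    double *a = A + 1;            /* a*jinit + b with a = 1 element   */
    const double *b = B + 257;    /* a*jinit + b with a = 257 elements */
    while (j < 100) {
        *a = *b;
        j = j + 2;
        a = a + 2;                /* a*c = 1 * 2   */
        b = b + 514;              /* a*c = 257 * 2 */
    }
}
```

The element steps 2 and 514 correspond to the byte steps 8 and 2056 on the slide divided by the 4-byte element size the slide uses.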
Loop Test Replacement
• Eliminate a basic induction variable that is used only for calculating other induction variables

    double A[256], B[256][256]
    j = 1
    while (j < 100)
        A[j] = B[j][j]
        j = j + 2

• After strength reduction:

    j = 1
    a = &A + 4
    b = &B + 1028
    while (j < 100)
        *a = *b
        j = j + 2
        a = a + 8
        b = b + 2056

• j is only used for the loop bound
• Use a dependent IV (a or b)
• Let's choose a
– Since a = &A + 4*j:  j < 100  ⇔  a < &A + 400
• Replace the loop condition
• Get rid of j

    a = &A + 4
    b = &B + 1028
    while (a < &A + 400)
        *a = *b
        a = a + 8
        b = b + 2056

Loop Test Replacement Algorithm
• If basic induction variable J is only used for calculating other induction variables
• Select an induction variable K in the family of J (K = a*J + b)
• Replace a comparison such as
    if (J > X) goto L1
  by
    if (K' > a*X + b) goto L1      if a is positive
    if (K' < a*X + b) goto L1      if a is negative
• If J is live at any exit from the loop, recompute J = (K' - b)/a

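
The algorithm can be illustrated with a C sketch of the running example. As assumptions for illustration, B is a flat array of 256*256 elements, pointer steps are counted in elements (so K = a in the sense of the algorithm is A + 1*j), and both functions return the final value of j to show the recomputation at loop exit. The function names are hypothetical.

```c
/* Loop still controlled by the basic induction variable j. */
int copy_with_j(double *A, const double *B) {
    int j = 1;
    double *a = A + 1;
    const double *b = B + 257;
    while (j < 100) {
        *a = *b;
        j = j + 2;
        a = a + 2;
        b = b + 514;
    }
    return j;
}

/* j eliminated: the test on j is replaced by a test on a. */
int copy_without_j(double *A, const double *B) {
    double *a = A + 1;
    const double *b = B + 257;
    double *limit = A + 100;      /* j < 100  <=>  a < A + 100 (a = A + 1*j) */
    while (a < limit) {
        *a = *b;
        a = a + 2;
        b = b + 514;
    }
    return (int)(a - A);          /* J = (K' - b)/a with a = 1 and b = A */
}
```

The multiplier here is positive, so the direction of the comparison is kept. With a negative multiplier the comparison would be flipped, as the algorithm states.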
Loop Invariant Code Motion
• If a computation produces the same value in every loop iteration, move it out of the loop
• Same idea as with induction variables
– Variables not updated in the loop are loop invariant
– Expressions of loop invariant variables are loop invariant
– Variables assigned only loop invariant expressions are loop invariant

Loop Invariant Code Motion
    for i = 1 to N
        x = x + 1
        for j = 1 to N
            a(i,j) = 100*N + 10*i + j + x

• 100*N is invariant in both loops, so it moves out of the outer loop:

    t1 = 100*N
    for i = 1 to N
        x = x + 1
        for j = 1 to N
            a(i,j) = t1 + 10*i + j + x

• t1 + 10*i + x is invariant in the inner loop, so it moves out of the inner loop:

    t1 = 100*N
    for i = 1 to N
        x = x + 1
        t2 = t1 + 10*i + x
        for j = 1 to N
            a(i,j) = t2 + j

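
The first and last versions of this loop nest can be compared with a C sketch. The function names, the flat row-major array `a`, and passing `x` by value are assumptions made for illustration:

```c
/* Original nest: every term is recomputed in the innermost loop.
   a must hold n*n elements; a(i,j) is stored at a[(i-1)*n + (j-1)]. */
void licm_before(int n, int x, int *a) {
    for (int i = 1; i <= n; i++) {
        x = x + 1;
        for (int j = 1; j <= n; j++) {
            a[(i - 1) * n + (j - 1)] = 100 * n + 10 * i + j + x;
        }
    }
}

/* After code motion: t1 is hoisted out of both loops and t2 out of the
   inner loop. t2 must be computed after x = x + 1 because it uses x. */
void licm_after(int n, int x, int *a) {
    int t1 = 100 * n;
    for (int i = 1; i <= n; i++) {
        x = x + 1;
        int t2 = t1 + 10 * i + x;
        for (int j = 1; j <= n; j++) {
            a[(i - 1) * n + (j - 1)] = t2 + j;
        }
    }
}
```

Note that x changes in the outer loop, so expressions containing x are invariant only with respect to the inner loop.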
SIMD Through SSE Extensions
• Single Instruction Multiple Data
– Compute multiple identical operations in a single instruction
– Exploit fine-grained parallelism

SSE Registers
• 16 128-bit registers: %xmm0 to %xmm15
– Multiple interpretations for each register
– Each arithmetic operation comes in multiple versions

    Interpretations of one 128-bit register:
    1 × 128-bit Double Quadword
    2 × 64-bit Quadword
    4 × 32-bit Doubleword
    8 × 16-bit Word

Data Transfer
• Moving data from memory or xmm registers
– MOVDQA OP1, OP2    Move aligned Double Quadword
    • Can read or write to memory in 128-bit chunks
    • If OP1 or OP2 are registers, they must be xmm registers
    • Memory locations in OP1 or OP2 must be multiples of 16
– MOVDQU OP1, OP2    Move unaligned Double Quadword
    • Same as MOVDQA, but memory addresses don't have to be multiples of 16

Data Transfer
• Moving data from 64-bit registers
– MOVQ OP1, OP2    Move Quadword
    • Can move from a 64-bit register to an xmm register or vice versa
    • Writes to/reads from the lower 64 bits of the xmm register
    • Can also be used to move a 64-bit chunk to/from memory

    MOVQ %r11, %xmm1      (copies %r11 into bits 0-63 of %xmm1)

Data Reordering
• Unpack and Interleave
– PUNPCKLDQ     Unpack and interleave Low Doublewords
– PUNPCKLQDQ    Unpack and interleave Low Quadwords
[Figures: the low 64 bits of two 128-bit registers interleaved into one 128-bit result]

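
Since the diagrams for these two instructions are not reproduced here, the following C sketch models their effect on plain arrays. The `xmm_dwords` type and the function names are hypothetical; the functions are scalar models that follow the AT&T operand order (source first, destination second), not wrappers around the real instructions.

```c
#include <stdint.h>

/* One 128-bit register viewed as four doublewords; d[0] is bits 0-31. */
typedef struct {
    uint32_t d[4];
} xmm_dwords;

/* Model of PUNPCKLDQ src, dest: interleave the two low doublewords of
   dest and src. The high halves of both inputs are discarded. */
xmm_dwords punpckldq_model(xmm_dwords src, xmm_dwords dest) {
    xmm_dwords r = { { dest.d[0], src.d[0], dest.d[1], src.d[1] } };
    return r;
}

/* Model of PUNPCKLQDQ src, dest: the low quadword of dest stays in
   bits 0-63 and the low quadword of src goes to bits 64-127. */
xmm_dwords punpcklqdq_model(xmm_dwords src, xmm_dwords dest) {
    xmm_dwords r = { { dest.d[0], dest.d[1], src.d[0], src.d[1] } };
    return r;
}
```

Unpacking a register with itself duplicates its low elements, which is how a scalar loaded with MOVQ can be copied into both quadwords of a register.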
Arithmetic
• Arithmetic operations come in many flavors
– based on the datatype of the register
– specified in the instruction suffix
• Example: Addition
– PADDQ    Add 64-bit Quadwords
– PADDD    Add 32-bit Doublewords
– PADDW    Add 16-bit Words
• Example: Subtraction
– PSUBQ    Subtract 64-bit Quadwords
– PSUBD    Subtract 32-bit Doublewords
– PSUBW    Subtract 16-bit Words

Putting It All Together
• Source Code
    for i = 1 to N
        A[i] = A[i] * b
• After Unrolling
    loop:
        mov   (%rdi,%rax), %r10     # reading from consecutive addresses
        mov   (%rdi,%rbx), %rcx
        imul  %r11, %r10            # multiply by a loop invariant
        imul  %r11, %rcx
        mov   %r10, (%rdi,%rax)     # writing to consecutive addresses
        sub   $16, %rax             # two 8-byte elements per iteration
        mov   %rcx, (%rdi,%rbx)
        sub   $16, %rbx
        jnz   loop                  # repeat until the index reaches zero

Putting It All Together
Original Version
    loop:
        mov   (%rdi,%rax), %r10
        mov   (%rdi,%rbx), %rcx
        imul  %r11, %r10
        imul  %r11, %rcx
        mov   %r10, (%rdi,%rax)
        sub   $16, %rax
        mov   %rcx, (%rdi,%rbx)
        sub   $16, %rbx
        jnz   loop

SSE Version
        movq       %r11, %xmm2            # populate xmm2 with copies of %r11
        punpcklqdq %xmm2, %xmm2
    loop:
        movdqa     (%rdi,%rax), %xmm0     # load two quadwords
        pmuludq    %xmm2, %xmm0           # multiplies the low 32 bits of each quadword
        movdqa     %xmm0, (%rdi,%rax)     # store two quadwords
        sub        $16, %rax              # only one index is needed
        jnz        loop

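
The same transformation can be sketched in C with SSE2 intrinsics. The function names are illustrative. The sketch assumes n is even and that b and every element fit in 32 bits, because `_mm_mul_epu32` (PMULUDQ) multiplies only the low 32 bits of each 64-bit lane. It uses unaligned loads and stores (MOVDQU) so that the caller's array does not need 16-byte alignment, and falls back to the scalar loop when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: A[i] = A[i] * b on 64-bit elements. */
void scale_scalar(uint64_t *A, size_t n, uint64_t b) {
    for (size_t i = 0; i < n; i++) {
        A[i] = A[i] * b;
    }
}

/* Two elements per iteration; see the assumptions stated above. */
void scale_simd(uint64_t *A, size_t n, uint64_t b) {
#if defined(__SSE2__)
    __m128i vb = _mm_set1_epi64x((long long)b);    /* copies of b in both lanes */
    for (size_t i = 0; i < n; i += 2) {
        __m128i v = _mm_loadu_si128((const __m128i *)(A + i));  /* load 2 elements */
        v = _mm_mul_epu32(v, vb);                               /* PMULUDQ */
        _mm_storeu_si128((__m128i *)(A + i), v);                /* store 2 elements */
    }
#else
    scale_scalar(A, n, b);    /* portable fallback without SSE2 */
#endif
}
```

As in the assembly version, a single index covers both elements of each pair, and the multiplier is broadcast once outside the loop because it is loop invariant.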
Conditions for SIMDization
• Consecutive iterations read from and write to consecutive locations
• Consecutive iterations are independent of each other
• The easiest approach is to pattern match at the basic block level after unrolling loops

MIT OpenCourseWare
http://ocw.mit.edu
6.035 Computer Language Engineering
Spring 2010
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.