Induction Variables and Chains of Recurrences

Induction Variable Analysis with Chains of Recurrences

Outline:
- Brief tutorial on induction variable recognition
  - Past and present methods for IV detection
- Chains of recurrences: why and how?
- Analyzing pointer arithmetic in loops
- Array dependence testing for loop restructuring and vectorization
- Results and conclusions
Intel 9/12/05
Induction Variable Recognition: a Classic Compiler Problem

HLL:
  I = 0
  do
    …
    I = I+1
    …
  while (…)

CFG:
  LDW R8,#0
  …
  ADD R8,#1
  …
  BNE L1

- Detect induction variables (IVs) in loops
- The example loop has a basic IV: a scalar integer variable with one unconditional update
- Values of derived IVs depend on values of basic IVs
- Loop analysis algorithms detect IVs by analyzing back edges to detect loops in IR forms, e.g. AST, CFG, or SSA
- Beware of aliases!
Most Loop Optimizations Rely on Accurate IV Recognition

- Loop strength reduction [Allen69, Aho86]
- IV elimination [Lowry69, Kennedy81, Aho86]
- IV substitution [Gerlek95, Haghighat95, Wolfe92]
- Loop iteration bounds analysis
- Pointer-to-array conversion and array recovery [vanEngelen01b, Franke01]
- Array dependence testing for loop restructuring [Banerjee88, Blume94, Goff91, Maydan91, Muchnick97, Psarris03, Pugh91, vanEngelen04, Wolfe92, Zima90] and others
A Classic Induction Variable Recognition Algorithm [Aho86]

Input: Loop L with reaching definition information and loop-invariant information
Output: Triple (i,stride,init) for each IV
1. Find basic IVs: represent basic IV i by the triple (i,stride,init)
2. Find derived IVs: search for variables k with a single assignment k = j ⊗ b where b is constant (loop invariant);
   if j has triple (i,c,d) and ⊗ = * then k has triple (i,b*c,b*d),
   else if j has triple (i,c,d) and ⊗ = + then k has triple (i,c,b+d)
   (assuming no use of k before its def)
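The derived-IV rule above is easy to mechanize. A minimal Python sketch (my own illustration, not the textbook's pseudocode), where a triple (i, c, d) means the variable equals c*i + d in terms of basic IV i:

```python
def derive_triple(triple, op, b):
    """Derive k's triple for the single assignment k = j op b,
    given j's triple (i, c, d) and loop-invariant constant b."""
    i, c, d = triple
    if op == '*':                  # k = j*b  =>  (i, b*c, b*d)
        return (i, b * c, b * d)
    if op == '+':                  # k = j+b  =>  (i, c, b+d)
        return (i, c, b + d)
    raise ValueError("unsupported update")

# j = 2i+1, so k = 3*j = 6i+3:
print(derive_triple(('i', 2, 1), '*', 3))   # -> ('i', 6, 3)
```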
IV Recognition for Loop Optimization

Example loop with IVs:
  I = 0
  J = 1
  while (I<N)
    I = I+1
    … = A[J]
    J = J+2
    K = 2*I
    A[K] = …
  endwhile

IV triples:
  (I,1,0) basic
  (J,2,1) basic
  (I,2,2) derived (for K)

After strength reduction:
  K = 0
  I = 0
  J = 1
  while (I<N)
    I = I+1
    … = A[J]
    J = J+2
    K = K+2
    A[K] = …
  endwhile

After IV elimination:
  K = 0
  while (K<2*N)
    … = A[K+1]
    K = K+2
    A[K] = …
  endwhile
IV Recognition for Loop Vectorization & Parallelization

Example loop with IVs:
  I = 0
  J = 1
  while (I<N)
    I = I+1
    … = A[J]
    J = J+2
    K = 2*I
    A[K] = …
  endwhile

After IV substitution (IVS) (note the affine indexes):
  for i=0 to N-1
  S1: … = A[2*i+1]
  S2: A[2*i+2] = …
  endfor

After dependence testing and parallelization:
  forall (i=0,N-1)
    … = A[2*i+1]
    A[2*i+2] = …
  endforall

[Figure: read (R) and write (W) access patterns of A[2*i+1] and A[2*i+2]]

GCD test to solve dependence equation 2·id - 2·iu = -1:
since 2 does not divide 1, there is no data dependence.
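The GCD test quoted above fits in a few lines of Python (a generic illustration, not the slides' implementation): the equation a*id + b*iu = c has an integer solution only if gcd(a, b) divides c.

```python
from math import gcd

def gcd_test_may_depend(a, b, c):
    """True if a*id + b*iu = c may have an integer solution."""
    return c % gcd(abs(a), abs(b)) == 0

# 2*id - 2*iu = -1: gcd(2,2) = 2 does not divide 1 -> no dependence.
print(gcd_test_may_depend(2, -2, -1))   # -> False
```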
IV Recognition isn't Always Trivial …

IVs that are linear (affine) but not handled by [Aho86]:

  do
    K = J+1
    J = K+1
  while (…)

  do
    K = 3
    K = K+J
    if (…)
      J = K
    else
      J = J+3
    endif
  while (…)

It quickly gets more complicated with deps and flow.
Note that transformations such as constant propagation, forward substitution, and code motion can help.
More Powerful IV Recognition Methods

- Linear IV recognition on SSA forms and FUD chains [Cytron91, Wolfe92]
- Linear and nonlinear IV recognition with symbolic differencing [Haghighat95]
- Linear and nonlinear IV recognition with recurrence system solvers [Gerlek95]
- Linear and nonlinear IV recognition with chains of recurrences [vanEngelen01a]
A Method for Linear IV Recognition on FUD Chains

- Similar algorithms by [Cytron91, Wolfe92]
- Factored use-def (FUD) chains are similar to static single assignment (SSA) forms

Input: FUD chains
Output: Triple (i,stride,init) for each IV
1. Build depth-first spanning tree of operations
2. Start at the φ-node in a loop cycle to find basic IVs
3. Find derived IVs from other φ-nodes not in a cycle
Example

  I1 = 3
  M1 = 0
  do
    I2 = φ(I1,I3)
    J1 = φ(?,J3)
    K1 = φ(?,K2)
    L1 = φ(?,L2)
    M2 = φ(M1,M3)
    J2 = 3
    I3 = I2+1
    L2 = M2+1
    M3 = L2+2
    J3 = I3+J2
    K2 = 2*J3
  while (…)

Triples found via the spanning tree:
  I2 = (i,1,3)
  M2 = (i,3,0)
  L1 = (i,3,1)
  J1 = (i,1,7)
  K1 = (i,2,14)
Symbolic Differencing

Use abstract interpretation to evaluate loop iterations and construct a symbolic difference table of the IV values.

  do
    x = x+z
    y = z+1
    z = y+1
  while (…)

  Iteration   x         diff   diff     y      diff     z      diff
  0           x                                         z
  1           x+z       z               z+1             z+2    2
  2           x+2z+2    z+2    2        z+3    2        z+4    2
  3           x+3z+6    z+4    2        z+5    2        z+6    2

From the difference tables compute the characteristic functions describing the IV progressions:

  x(i) = x0 + z0·i + i²-i
  y(i) = z0 + 2i + 1
  z(i) = z0 + 2i
Symbolic Differencing: Oops

[vanEngelen01a] identified a serious problem with the differencing method for IVs involving cyclic recurrence relations: the inferred closed-form function may be incorrect when the polynomial order of the IVs is assumed to be bounded.

Quiz: guess the maximum polynomial order of the characteristic functions of the IVs in the loop below.

  for i=0 to n
    t = a
    a = b
    b = c+2*b-t+2
    c = c+d
    d = d+i
  endfor
Outline

- Brief tutorial on induction variable recognition
- Chains of recurrences: why and how?
  - Chains of recurrences preliminaries [Zima92, vanEngelen01a]
  - Chains of recurrences for IV recognition, loop analysis, and optimization [vanEngelen01a, vanEngelen01b, vanEngelen04]
- Analyzing pointer arithmetic in loops
- Array dependence testing for loop restructuring and vectorization
- Results and conclusions
Preliminaries

- A chain of recurrences (CR) represents a polynomial or exponential function, or a mix, evaluated over a unit-distance grid [Zima92]
- Basic form: {init, ⊙, stride} with ⊙ ∈ {+, *}

  Iteration   {init, ⊙, stride}                     f(i) = 2i+1 = {1,+,2}   f(i) = 2^i = {1,*,2}
  i = 0       init                                  1                       1
  i = 1       init ⊙ stride                         3                       2
  i = 2       init ⊙ stride ⊙ stride                5                       4
  i = 3       init ⊙ stride ⊙ stride ⊙ stride       7                       8
Chains of Recurrences: General Formulation

- The key idea is to represent a non-constant CR stride in CR form itself, thereby forming a chain of recurrences
- Example: f(i) = i² = {0, +, s(i-1)} with s(i) = {1, +, 2}

  Iteration   {init, +, s(i-1)}              f(i) = {0, +, s(i-1)}   s(i) = {1, +, 2}
  i = 0       init                           0                       1
  i = 1       init + s(0)                    1                       3
  i = 2       init + s(0) + s(1)             4                       5
  i = 3       init + s(0) + s(1) + s(2)      9                       7
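Both tables follow mechanically from the update scheme. A minimal Python sketch of CR evaluation (the nested-tuple representation (init, op, stride) is my own, not the slides'):

```python
# A CR is a nested tuple (init, op, stride), where stride may itself
# be a CR; constants are plain numbers.

def cr_eval(cr, n):
    """Values of the CR at i = 0..n-1 via the scheme
    init, init(op)stride, init(op)stride(op)stride, ..."""
    if not isinstance(cr, tuple):
        return [cr] * n
    init, op, stride = cr
    strides = cr_eval(stride, n)       # evaluate the nested CR stride
    vals, cur = [], init
    for i in range(n):
        vals.append(cur)
        cur = cur + strides[i] if op == '+' else cur * strides[i]
    return vals

print(cr_eval((1, '+', 2), 4))              # 2i+1 -> [1, 3, 5, 7]
print(cr_eval((1, '*', 2), 4))              # 2^i  -> [1, 2, 4, 8]
print(cr_eval((0, '+', (1, '+', 2)), 4))    # i^2  -> [0, 1, 4, 9]
```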
Primary Application of CRs: Loop Strength Reduction

- Loop strength reduction is straightforward to implement with CR forms of real-valued and complex functions [Zima92]
- Method: add each CR and its nested CR stride as IVs to the loop nest
Example

Suppose f(i) = a + b·i + c·i² = {a, +, {b+c, +, 2c}}
We have two IVs x and y:
  f(i) = x = {x0, +, y} with x0 = a
  s(i) = y = {y0, +, 2c} with y0 = b+c
Add x and y as IVs to the loop for efficient function evaluation over the unit-distance grid i = 0, …, n:

  x = a
  y = b+c
  for i=0 to n
    f[i] = x
    x = x+y
    y = y+2*c
  endfor

[Chart: f(i) evaluated at iterations 1-10]
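A quick Python check (my own sketch) that the two coupled IV updates reproduce the polynomial with no multiplication by i inside the loop body:

```python
def strength_reduced_poly(a, b, c, n):
    """Evaluate f(i) = a + b*i + c*i**2 for i = 0..n using only the
    additive IV updates x = {a,+,y} and y = {b+c,+,2c}."""
    vals, x, y = [], a, b + c
    for i in range(n + 1):
        vals.append(x)
        x += y          # x accumulates f
        y += 2 * c      # y accumulates the first difference of f
    return vals

# Matches direct evaluation of the polynomial:
assert strength_reduced_poly(3, -2, 5, 10) == [3 - 2*i + 5*i*i for i in range(11)]
```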
Loop Strength Reduction: Multi-dimensional Loops

Algorithm for the 2-d case (n-d similar):
1. Determine iteration order:
     for i = 0 to n
       for j = 0 to m
2. Compute CR forms for the j-loop first, treating i as invariant within the j-loop
3. Compute multivariate CR (MCR) forms for the i-loop
Example

Suppose f(i,j) = i² + i·j + 1
1. Create IV k for f(i,j) in the j-loop:
   f(i,j) = k_j = {p_i, +, r_i}_j with p_i = i² + 1 and r_i = i
2. Create IVs for p_i and r_i in the i-loop:
   p_i = {p0, +, q_i}_i with p0 = 1
   q_i = {q0, +, 2}_i with q0 = 1
   r_i = {r0, +, 1}_i with r0 = 0
3. Add IVs k, p, q, and r to the loops:

  p = 1
  q = 1
  r = 0
  for i = 0 to n
    k = p
    for j = 0 to m
      f[i,j] = k
      k = k+r
    endfor
    p = p+q
    q = q+2
    r = r+1
  endfor
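A small Python check (my own sketch) that the strength-reduced nest above computes f(i,j) = i² + i·j + 1:

```python
def mcr_nest(n, m):
    """Run the strength-reduced nest; return {(i, j): value}."""
    out = {}
    p, q, r = 1, 1, 0              # p={1,+,q}_i, q={1,+,2}_i, r={0,+,1}_i
    for i in range(n + 1):
        k = p                      # k = {p_i, +, r_i}_j
        for j in range(m + 1):
            out[(i, j)] = k
            k += r
        p += q
        q += 2
        r += 1
    return out

vals = mcr_nest(4, 3)
assert all(vals[(i, j)] == i*i + i*j + 1 for i in range(5) for j in range(4))
```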
How to Obtain CR Forms for Strength Reduction: CR Algebra

Algorithm to compute the CR form of a symbolic function f(i):
1. Replace i with {0,+,1} in the symbolic form of f
2. Compute the CR form using the CR algebra rewrite rules (selected rules shown here):

  {x, +, y} + c          ⇒  {x+c, +, y}
  c·{x, +, y}            ⇒  {c·x, +, c·y}
  {x, +, y} + {u, +, v}  ⇒  {x+u, +, y+v}
  {x, +, y} * {u, +, v}  ⇒  {x·u, +, y·{u, +, v} + v·{x, +, y} + y·v}

Example: f(i) = c·(i+a) = c·({0, +, 1}+a) = c·{a, +, 1} = {c·a, +, c}
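The three additive rules above can be sketched directly in Python (the tuple representation is mine; init and stride are numbers here, so terms like x+c fold immediately):

```python
def cr_add_const(cr, c):    # {x,+,y} + c  =>  {x+c,+,y}
    x, _, y = cr
    return (x + c, '+', y)

def cr_mul_const(cr, c):    # c*{x,+,y}  =>  {c*x,+,c*y}
    x, _, y = cr
    return (c * x, '+', c * y)

def cr_add(f, g):           # {x,+,y} + {u,+,v}  =>  {x+u,+,y+v}
    x, _, y = f
    u, _, v = g
    return (x + u, '+', y + v)

# f(i) = c*(i+a) with c = 3, a = 5: replace i by {0,+,1}, then rewrite.
i_cr = (0, '+', 1)
f = cr_mul_const(cr_add_const(i_cr, 5), 3)
print(f)    # -> (15, '+', 3), i.e. {c*a, +, c}
```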
But What About IV Recognition?

The key idea of our approach:
- Scan the loop to detect IV updates
- Determine the CR form of the IV from the update operation

In simple terms, the method looks for operations on IVs that can be represented as CR forms:

  do
    J = J+I
    I = I+3
    P = 2*P
  while (…)

  J = {J0, +, I}  →  J = {J0, +, {I0, +, 3}}
  I = {I0, +, 3}
  P = {P0, *, 2}
Compiler Algorithms for IV Recognition with CRs

IV recognition algorithms [vanEngelen01a, vanEngelen04]:
1. Scan the loop in backward order to determine recurrence relations of the scalar variables in the loop
2. Compute CR forms of the recurrence relations
3. "Solve" the CR forms by computing closed-form characteristic functions with the CR inverse rules

Note: GCC 4.x uses this algorithm, first published in [vanEngelen01a], applied to loops in SSA form. GCC developers refer to CRs as "scalar evolutions" (without proper justification).
Algorithm 1: Find Recurrences

Input: Loop L with live variable information
Output: Set S of recurrence relations of IVs
1. Start with set S = { ⟨v, v⟩ | v is live at the loop header }
2. Search L from bottom to top: for each assignment v = x of expression x to scalar variable v, update the tuples ⟨u, y⟩ in S by replacing v in y with x

Loop L (statements numbered 1-5 from the bottom):

  do
    M = 2         (5)
    L = J-H       (4)
    J = L+M       (3)
    K = K+M*I     (2)
    I = I+1       (1)
  while (…)

  S1 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K⟩}
  S2 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J⟩, ⟨K, K+M*I⟩}
  S3 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, L+M⟩, ⟨K, K+M*I⟩}
  S4 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+M⟩, ⟨K, K+M*I⟩}
  S5 = {⟨H, H⟩, ⟨I, I+1⟩, ⟨J, J-H+2⟩, ⟨K, K+2*I⟩}
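A minimal Python sketch of the bottom-up scan (the tiny expression representation, a variable name, a constant, or a tuple (op, lhs, rhs), is my own):

```python
def subst(expr, var, repl):
    """Substitute repl for var everywhere in expr."""
    if isinstance(expr, str):
        return repl if expr == var else expr
    if isinstance(expr, tuple):
        op, a, b = expr
        return (op, subst(a, var, repl), subst(b, var, repl))
    return expr                         # constant

def find_recurrences(body, live):
    S = {v: v for v in live}            # S = { <v, v> }
    for v, x in reversed(body):         # bottom-to-top scan
        S = {u: subst(y, v, x) for u, y in S.items()}
    return S

# Loop body from the slide: M=2; L=J-H; J=L+M; K=K+M*I; I=I+1
body = [('M', 2), ('L', ('-', 'J', 'H')), ('J', ('+', 'L', 'M')),
        ('K', ('+', 'K', ('*', 'M', 'I'))), ('I', ('+', 'I', 1))]
S = find_recurrences(body, ['H', 'I', 'J', 'K'])
print(S['J'])   # -> ('+', ('-', 'J', 'H'), 2), i.e. J -> J-H+2
print(S['K'])   # -> ('+', 'K', ('*', 2, 'I')), i.e. K -> K+2*I
```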
Algorithm 2: Compute CR Forms

Input: Set S with recurrence relations
Output: CR forms for IVs in S
1. For each relation ⟨v, x⟩ in S do:
   if x is of the form v then v = v0 (v is loop invariant)
   if x is of the form v + y then v = {v0, +, y}
   if x is of the form v * y then v = {v0, *, y}
   if x does not contain v then v = {v0, #, y} (v is wrap-around)
2. Simplify the CR forms with the CR algebra rewrite rules

  Recurrence relation in S   CR form             Simplified CR form
  ⟨H, H⟩                     H = H0              H = H0
  ⟨I, I+1⟩                   I = {I0, +, 1}      I = {I0, +, 1}
  ⟨J, J-H+2⟩                 J = {J0, +, 2-H}    J = {J0, +, 2-H0}
  ⟨K, K+2*I⟩                 K = {K0, +, 2*I}    K = {K0, +, 2I0, +, 2}
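Step 1 is a pattern match on each recurrence relation. A Python sketch over the same expression trees (my own representation; only the cases listed on the slide are covered, with the stride on the right):

```python
def contains(expr, var):
    if isinstance(expr, str):
        return expr == var
    if isinstance(expr, tuple):
        return contains(expr[1], var) or contains(expr[2], var)
    return False                         # constant

def to_cr(v, x):
    """Match recurrence v -> x to a CR (v0, op, stride) when possible."""
    if x == v:
        return v + '0'                   # loop invariant
    if isinstance(x, tuple) and x[0] in '+*':
        op, a, b = x
        if a == v and not contains(b, v):
            return (v + '0', op, b)      # v = {v0, op, b}
    if not contains(x, v):
        return (v + '0', '#', x)         # wrap-around
    return None                          # not recognized by this sketch

print(to_cr('I', ('+', 'I', 1)))              # -> ('I0', '+', 1)
print(to_cr('K', ('+', 'K', ('*', 2, 'I'))))  # -> ('K0', '+', ('*', 2, 'I'))
```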
Algorithm 3: Solve

Input: CR forms for IVs
Output: Closed-form solutions for IVs (when possible)
1. For each CR form of v apply the CR inverse algebra, assuming the loop is normalized for i = 0, …, n
2. Certain "exotic" mixed non-polynomial and non-exponential CR forms may not have closed forms

  do
    M = 2
    L = J-H
    J = L+M
    K = K+M*I
    I = I+1
  while (…)

  Simplified CR form        Closed form (for i = 0, …)
  J = {J0, +, 2-H0}         J = J0 + (2-H0)·i
  K = {K0, +, 2I0, +, 2}    K = K0 + i² + (2I0-1)·i
  I = {I0, +, 1}            I = I0 + i
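For pure-sum CRs the inverse rule has a well-known closed form: {c0, +, c1, +, …, +, ck} equals the sum of c_j * C(i, j) over binomial coefficients C(i, j). A Python sketch (my own, consistent with the K row above):

```python
from math import comb

def cr_closed_form(coeffs, i):
    """Closed form of the polynomial CR {c0, +, c1, +, ..., +, ck}:
    sum of cj * binomial(i, j)."""
    return sum(c * comb(i, j) for j, c in enumerate(coeffs))

# K = {K0, +, 2*I0, +, 2} with K0 = I0 = 0 gives K(i) = i^2 - i:
assert all(cr_closed_form([0, 0, 2], i) == i*i - i for i in range(10))
# I = {I0, +, 1} with I0 = 5 gives I(i) = 5 + i:
assert all(cr_closed_form([5, 1], i) == 5 + i for i in range(10))
```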
Example 1

  do
    x = x+z
    y = z+1
    z = y+1
  while (…)

Backward scan (statements 1-3 from the bottom), starting from S = {⟨x, x⟩, ⟨z, z⟩}:
  S1 = {⟨x, x⟩, ⟨z, y+1⟩}
  S2 = {⟨x, x⟩, ⟨z, z+2⟩}
  S3 = {⟨x, x+z⟩, ⟨z, z+2⟩}

CR form:
  x = {x0, +, z}
  z = {z0, +, 2}

Closed form:
  x(i) = x0 + z0·i + i²-i
  z(i) = z0 + 2i
Example 2

TRFD code segment from the Perfect Benchmark with IV updates:

  DO I=1,M
    DO J=1,I
      ij = ij+1
      ijkl = ijkl+I-J+1
      DO K=I+1,M
        DO L=1,K
          ijkl = ijkl+1
          xijkl[ijkl]=xkl[L]
        ENDDO
      ENDDO
      ijkl = ijkl+ij+left
    ENDDO
  ENDDO

TRFD after aggressive induction variable substitution (IVS):

  DO I=0,M-1
    DO J=0,I
      DO K=0,M-I-2
        DO L=0,I+K+1
          tmp = ijkl+L+I*(K+(M+M*M+2*left+6)/4)+J*(left+(M+M*M)/2)+((I*I*M*M)+2*(K*K+3*K+I*I*(left+1))+M*I*I)/4+2
          xijkl[tmp] = xkl[L+1]
        ENDDO
      ENDDO
    ENDDO
  ENDDO
Recognizing Mixed Functional Forms and Reductions

  I = 1
  do
    F = F*I
    I = I+1
  while (…)

Simplified CR form:
  F = {F0, *, 1, +, 1}
  I = {1, +, 1}
Factorial: F = F0 · i!

  I = 0; S = 0
  do
    S = S+A[I]
    I = I+2
  while (…)

Simplified CR form:
  S = {0, +, A[{0, +, 2}]}
  I = {0, +, 2}
Reduction: S = ∑ A[2i]
Outline

- Brief tutorial on induction variable recognition
- Chains of recurrences: why and how?
- Analyzing pointer arithmetic in loops
  - Converting pointer references into array references to facilitate array dependence testing and loop restructuring
- Array dependence testing for loop restructuring and vectorization
- Results and conclusions
Converting Pointer References to Array References

- Key observation: pointers with pointer arithmetic in loops often behave similarly to IVs
- Use IV recognition algorithms to detect pointer-based IVs
- Convert pointer-based IVs to closed-form characteristic functions to obtain array references
Pointer Access Descriptions of Pointer and Array References

- A pointer access description (PAD) [vanEngelen01b] is a CR form of a pointer or array reference in a loop nest
- PADs are computed with the CR-based IV algorithms

  short a[…], *p;
  int i;
  p = a;
  for(i=0;…;i++)
  { … }

  Loop Code      PAD                   Sequence
  a[i]           {a, +, 1}             a[0],a[1],a[2],a[3]
  a[2*i+1]       {a+1, +, 2}           a[1],a[3],a[5],a[7]
  a[(i*i-i)/2]   {a, +, 0, +, 1}       a[0],a[0],a[1],a[3]
  a[1<<i]        {a+1, +, 1, *, 2}     a[1],a[2],a[4],a[8]
  p++            {a, +, 1}             a[0],a[1],a[2],a[3]
  p+=i           {a, +, 0, +, 1}       a[0],a[0],a[1],a[3]
Example

Lsp_az speech codec segment from ETSI with pointer updates:

  f += 2;
  lsp += 2;
  for (i = 2; i <= 5; i++)
  { *f = f[-2];
    for (j = 1; j < i; j++, f--)
      *f += f[-2]-2*(*lsp)*f[-1];
    *f -= 2*(*lsp);
    f += i;
    lsp += 2;
  }

Lsp_az segment after pointer-to-array conversion (note that all array index expressions are affine):

  for (i = 0; i <= 3; i++)
  { f[i+2] = f[i];
    for (j = 0; j <= i; j++)
      f[i-j+2] += f[i-j]-2*lsp[2*i+2]*f[i-j+1];
    f[1] -= 2*lsp[2*i+2];
  }
Outline

- Brief tutorial on induction variable recognition
- Chains of recurrences: why and how?
- Analyzing pointer arithmetic in loops
- Array dependence testing for loop restructuring and vectorization
  - CR-based dependence testing
  - Solving linear (affine), nonlinear, and symbolic equations
- Results and conclusions
Benefits of Array & Pointer Dependence Testing with CRs

Reduced complexity of testing:
- Eliminates the IV substitution phase
- Eliminates the need for pointer-to-array conversion for dependence testing on pointer-based C code

Extended coverage:
- Able to solve linear, nonlinear, and symbolic dependence equations
- Possibility to augment existing tests, such as the extreme value test and range test
CR Dependence Equations

- Compute dependence equations in CR form for pointer and array accesses in loop nests directly, without IV substitution or pointer-to-array conversion
- Solve the equations by computing value ranges of the CR forms to determine solution intervals
- If the solution space is empty, there is no dependence
Determining the Value Range of a CR Form on a Domain

Suppose x(i) = {x0, +, s(i-1)} for i = 0, …, n
- If s(i-1) > 0 then x(i) is monotonically increasing
- If s(i-1) < 0 then x(i) is monotonically decreasing
- If a function is monotonic on its domain, then it is trivial to find its exact value range
Example

Determine the value range of x(i) = (i²-i)/2 for i = 0,…,10
- Convert to CR form: x(i) = {0, +, 0, +, 1} = {0, +, {0, +, 1}}
- Monotonically nondecreasing, since the stride {0, +, 1} = i ≥ 0 for i = 0,…,10
- Therefore, the lower bound of x(i) is x(0) = 0 and the upper bound is x(10) = (10²-10)/2 = 45

Classic interval analysis often gives conservative results:
- For this example, interval analysis gives the range ([0,10]²-[0,10])/2 = ([0,100]+[-10,0])/2 = [-10,100]/2 = [-5,50]
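A Python sketch of the range computation (my own representation; it assumes a nonnegative stride, so the CR is nondecreasing and evaluating the two endpoints gives the exact range):

```python
def cr_range(cr, n):
    """Exact (min, max) of a nondecreasing polynomial CR on i = 0..n,
    obtained by evaluating the two domain endpoints."""
    def val(c, i):
        # Value at iteration i: init plus the sum of the first i strides.
        if not isinstance(c, tuple):
            return c
        init, _, stride = c
        return init + sum(val(stride, t) for t in range(i))
    return val(cr, 0), val(cr, n)

# x(i) = (i*i - i)/2 = {0, +, {0, +, 1}} on i = 0..10:
print(cr_range((0, '+', (0, '+', 1)), 10))   # -> (0, 45),
                                             # vs. [-5, 50] from intervals
```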
Solving a Dependence Equation

  float a[…], *p, *q;
  p = a;
  q = a+2*n;
  for (i=0; i<n; i++)
  { t = *p;
  S: *p++ = *q;
    *q-- = t;
  }

PADs at S: p = {a, +, 1}, q = {a+2n, +, -1}

Dependence equation:
  {a, +, 1}_id = {a+2n, +, -1}_iu
Constraints:
  0 ≤ id ≤ n-1
  0 ≤ iu ≤ n-1

Rewrite the dependence equation:
  {a, +, 1}_id = {a+2n, +, -1}_iu
  ⇔ {a, +, 1}_id - {a+2n, +, -1}_iu = 0
  ⇔ {{-2n, +, 1}_iu, +, 1}_id = 0

Compute the solution interval:
  Low[{{-2n, +, 1}_iu, +, 1}_id] = Low[{-2n, +, 1}_iu] = -2n
  Up[{{-2n, +, 1}_iu, +, 1}_id] = Up[{-2n, +, 1}_iu + n-1] = Up[-2n + 2n - 2] = -2

Since 0 is not in [-2n, -2], there is no dependence.
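For a concrete n the solution interval can be cross-checked by brute force: the rewritten CR {{-2n, +, 1}_iu, +, 1}_id evaluates to id + iu - 2n (a sketch of my own, with n fixed numerically):

```python
def dep_interval(n):
    """Range of d(id, iu) = id + iu - 2*n over 0 <= id, iu <= n-1."""
    vals = [i_d + i_u - 2*n for i_d in range(n) for i_u in range(n)]
    return min(vals), max(vals)

# The interval is [-2n, -2] for any n >= 1, so 0 is never a solution
# and the p-writes and q-reads never overlap:
print(dep_interval(5))   # -> (-10, -2)
```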
Dependence Testing on Nonlinear & Symbolic Accesses

  float a[…], *p, *q;
  p = q = a;
  for (i=0; i<n; i++)
  { for (j=0; j<=i; j++)
      *q += *++p;
    q++;
  }

  p = {{a+1, +, 1, +, 1}_i, +, 1}_j = a[(i²+i)/2+j+1]
  q = {a, +, 1}_i = a[i]

The CR dependence test disproves flow dependence (<, <).

  DO I = 1, M+1
  S1:  A[I*N+10] = ...
  S2:  ... = A[2*I+K]
       K = 2*K+N
  ENDDO

  S1: A[{N+10, +, N}_I]
  S2: A[{K0+2N, +, K0+N+2, *, 2}_I]

The CR range test disproves dependence when K+N > 10 and K > 2.
Some Observations wrt. Vectorization

- Auto-vectorization (mostly) requires affine array accesses, e.g. to enable unimodular loop transformations
- Works best when vector loads/stores are memory aligned
- Data remapping is possible, but runtime remapping may outweigh the vectorization speedup
- CR forms of array and pointer accesses naturally represent memory access sequences, which may help in detecting aligned memory accesses and in supporting data remapping
Some Preliminary Results

Perfect Club Suite:

  Benchmark   Additional dependence pairs broken by CR test over Omega and range test
  DYFESM*     0
  OCEAN       63
  QCD*        51
  BDNA        6
  TRFD        10
  MDG*        62
  MG3D*       8

Results shown were produced from the CR test implementation in Polaris.
*Test results incomplete due to a Polaris memory issue.
Conclusions

The CR-based compiler analysis framework supports:
- IV recognition and strength reduction optimizations
- Pointer-to-array conversion for analysis of C loops
- Array dependence testing with affine, nonlinear, and symbolic dependence equations
- Dependence testing on pointer arithmetic
- Induction variable substitution

Implementations:
- GCC 4.x (with limitations!)
- From our lab: Polaris (TBA)
Further Reading

- Robert van Engelen, Johnnie Birch, Yixin Shou, Burt Walsh, and Kyle Gallivan, "A Unified Framework for Nonlinear Dependence Testing and Symbolic Analysis", in proceedings of the ACM International Conference on Supercomputing (ICS), 2004, pages 106-115.
- Robert van Engelen, Johnnie Birch, and Kyle Gallivan, "Array Dependence Testing with the Chains of Recurrences Algebra", in proceedings of the IEEE International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2004, pages 70-81.
- Robert van Engelen and Kyle Gallivan, "An Efficient Algorithm for Pointer-to-Array Access Conversion for Compiling and Optimizing DSP Applications", in proceedings of the 2001 International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA), January 2001, pages 80-89.
- Robert van Engelen, "Efficient Symbolic Analysis for Optimizing Compilers", in proceedings of the International Conference on Compiler Construction, ETAPS 2001, LNCS 2027, pages 118-132.
References

[Aho86] AHO, A., SETHI, R., AND ULLMAN, J. Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading MA, 1986.
[Allen69] ALLEN, F.E. Program optimization. Annual Review in Automatic Programming 5 (1969), pp. 239-307.
[Banerjee88] BANERJEE, U. Dependence Analysis for Supercomputing. Kluwer, Boston, 1988.
[Blume94] BLUME, W., AND EIGENMANN, R. The range test: a dependence test for symbolic non-linear expressions. In proceedings of Supercomputing (1994), pp. 528-537.
[Cytron91] CYTRON, R., FERRANTE, J., ROSEN, B.K., WEGMAN, M.N., AND ZADECK, F.K. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems (1991).
[Franke01] FRANKE, B., AND O'BOYLE, M. Compiler transformation of pointers to explicit array accesses in DSP applications. In proceedings of the ETAPS Conference on Compiler Construction 2001, LNCS 2027 (2001), pp. 69-85.
[Gerlek95] GERLEK, M., STOLTZ, E., AND WOLFE, M. Beyond induction variables: Detecting and classifying sequences using a demand-driven SSA form. ACM Transactions on Programming Languages and Systems (TOPLAS) 17, 1 (Jan. 1995), pp. 85-122.
[Goff91] GOFF, G., KENNEDY, K., AND TSENG, C.-W. Practical dependence testing. In proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation (PLDI) (1991), vol. 26, pp. 15-29.
[Haghighat95] HAGHIGHAT, M.R. Symbolic Analysis for Parallelizing Compilers. Kluwer Academic Publishers, 1995.
[Kennedy81] KENNEDY, K. A survey of data flow analysis techniques. In Muchnick and Jones (1981), pp. 5-54.
[Lowry69] LOWRY, E.S., AND MEDLOCK, C.W. Object code optimization. Communications of the ACM 12 (1969), pp. 159-166.
[Maydan91] MAYDAN, D.E., HENNESSY, J.L., AND LAM, M.S. Efficient and exact data dependence analysis. In proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (1991), ACM Press, pp. 1-14.
[Muchnick97] MUCHNICK, S. Advanced Compiler Design and Implementation. Morgan Kaufmann, San Francisco, CA, 1997.
[Psarris03] PSARRIS, K. Program analysis techniques for transforming programs for parallel systems. Parallel Computing 28, 3 (2003), pp. 455-469.
[Pugh91] PUGH, W., AND WONNACOTT, D. Eliminating false data dependences using the Omega test. In proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (1992), pp. 140-151.
[vanEngelen01a] VAN ENGELEN, R. Efficient symbolic analysis for optimizing compilers. In proceedings of the ETAPS Conference on Compiler Construction, LNCS 2027 (2001), pp. 118-132.
[vanEngelen01b] VAN ENGELEN, R., AND GALLIVAN, K. An efficient algorithm for pointer-to-array access conversion for compiling and optimizing DSP applications. In proceedings of the International Workshop on Innovative Architectures for Future Generation High-Performance Processors and Systems (IWIA) (2001), pp. 80-89.
[vanEngelen04] VAN ENGELEN, R.A., BIRCH, J., SHOU, Y., WALSH, B., AND GALLIVAN, K.A. A unified framework for nonlinear dependence testing and symbolic analysis. In proceedings of the ACM International Conference on Supercomputing (ICS) (2004), pp. 106-115.
[Wolfe92] WOLFE, M. Beyond induction variables. In ACM SIGPLAN '92 Conference on Programming Language Design and Implementation (1992), pp. 162-174.
[Wolfe96] WOLFE, M. High Performance Compilers for Parallel Computers. Addison-Wesley, Redwood City, CA, 1996.
[Zima90] ZIMA, H., AND CHAPMAN, B. Supercompilers for Parallel and Vector Computers. ACM Press, New York, 1990.
[Zima92] ZIMA, E. Recurrent relations and speed-up of computations using computer algebra systems. In proceedings of DISCO '92 (1992), LNCS 721, pp. 152-161.