Program Improvement by Source-to-Source Transformation
DAVID B. LOVEMAN

Massachusetts Computer Associates, Inc., Wakefield, Massachusetts
ABSTRACT. The use of source-to-source program transformations has proved valuable in improving program performance. The concept of program manipulation is elucidated by describing its role in both conventional optimization and high level modification of conditional, looping, and procedure structures. An example program fragment written in an Algol-like language is greatly improved by transformations enabled by a user-provided assertion about a data array. A compilation model based on the use of source-to-source program transformations is used to provide a framework for discussing issues of code generation, compilation of high level languages such as APL, and eliminating overhead commonly associated with modular structured programming. Application of the compilation model to several different languages is discussed.

KEY WORDS AND PHRASES: program optimization, source-to-source transformation, program improvement, compilation, program manipulation, program transformation

CR CATEGORIES: 4.12, 4.22, 5.24, 5.25
Introduction
A computer program may be subject to severe time and/or space constraints caused by the necessity of execution on a small minicomputer, real time performance requirements, or economic considerations resulting from repetitive execution of the program. Optimization is the term given to the application of a set of rules for manipulating various representations of a program by exploiting local or global invariances within the program in order to improve the program relative to some measure. Conventionally, optimization has been regarded as a behind-the-scenes activity to be done by a compiler on a representation of the program invisible to the programmer. We have found, however, that program improvement can be viewed profitably as successive transformations on source language representations of a program visible to the programmer and subject to his direction and guidance. This perspective, advocated recently by such investigators as Knuth [16], Wegbreit [28], Standish et al. [25], Cheatham and Townley [8], and Hoare [12], has proved invaluable at Massachusetts Computer Associates in such diverse applications as optimizing compilers for Fortran and a high level cryptographic language, design of a hardware resource allocation technique for a language for computer controlled automatic test equipment [19], and indicating parallelism in a Fortran compiler for the ILLIAC IV [21]. This view of program improvement will be shown in this paper to provide a model of the bulk of the compilation process (including most of code generation), will indicate how to efficiently compile high level languages such as APL, and will show how much of the overhead associated with conventional structured or modular programming can be eliminated.
Our purpose is not to provide a formal treatment of source-to-source transformations, nor to prove that certain transformations, under specific enabling conditions, preserve
program equivalence. It is, rather, to establish informally and by means of examples a
uniform view of the compilation process from the processing of high level language
features to the problems of code generation. As such we have glossed over a number of
interesting problems, not the least of which are those of representation, both of program
and of transformations. Our only assumptions are that a tree-structured representation
of a program is obtainable and that the source-to-source transformations are expressible
as pattern-directed rearrangements of tree-structured forms of program text.
In our study of source-to-source transformations, both in the abstract and in the implementation of several language processing systems, we have found several notions of use. First, transformations are not applied in random order; the successful application of a transformation suggests successor transformations. Indeed, in a practical application, the most difficult software engineering problems are concerned with transformation ordering and information gathering. Second, although a transformation may be applicable, it may not win or cause an improvement in the program. Third, the distinction between machine-dependent and machine-independent portions of a compiler is more subtle than usually thought; a transformation on a program may be machine independent, in the usual sense, but the reason for applying it may well depend on the target machine architecture. Fourth, a number of interesting transformations were identified. In particular the concept that a variable use may on occasion be replaced by an expression representing an assertion about the value of the variable is quite powerful. Fifth, several sequences of transformations have been identified and studied. The code generation sequence in particular is quite powerful.
The Programming Process
Before discussing optimization, we should discuss some ideas on the nature of the programming process. The programming process can be divided somewhat arbitrarily into a number of conceptual phases: construction, verification, optimization, compilation, execution, maintenance, documentation, etc. In an idealized world one constructs a program with the help of computer aids such as a text editor, file system, and syntax checker. The intent during program construction is to describe data objects of interest, primary operations on these objects, and an algorithm using these objects and operations to solve some problem. At the construction phase in the programming process, programming style should be aimed primarily at clarity of expression and ease of verification, and only secondary attention should be paid to efficiency.
Unfortunately programming styles which lead to logically clear programs tend to result in programs which, when compiled, are not efficient enough to meet their operational requirements. Structured programming techniques, such as the isolation of implementation details of operations from the application of the operations, tend to result in highly modular programs. Naive compilation of highly modular programs results in great inefficiency as a consequence of the interconnection of the modules. It is necessary to decide, for example, which modules should be specialized to their particular calling environments and which should be left as general modules. The code resulting from modules which are expanded in line will usually allow for considerable optimization when viewed in the context of the program in which the expansion occurs. Conventional optimization techniques, unfortunately, will only perform a little of this optimization. More powerful techniques which address themselves to program structure, loops and conditionals in particular, are needed.
These optimization techniques may be triggered in two ways: by facts known about the
particular data objects being manipulated at the point of module expansion, or by
interactions with the program text that is the context of the point where the module is
expanded. Examples of these two different ways of triggering optimization will be given
in a later section.
In general we view the process of optimization as one which either deduces, or accepts as given, statements of invariance in a program. Statements of invariance include the
fixing of the specification of the internal representation of the basic operations on the data objects, as well as such "standard" constancy information as "N has the value of 4," "SIN represents the library sine routine," and "This algorithm will be executed only on a CDC 7600." Given statements of invariance, certain program transformations may be shown to preserve program correctness while at the same time "improving" the program relative to some measure.
Obviously fault can be found with the simplistic view of the programming process presented above. Algorithms to manipulate data objects are not designed without some notion of how to implement these objects. Even at program construction some constancy information such as "this program will be compiled by IBM Fortran Level H" may be known, and the statement of the algorithm may be affected. Also, while doing optimization, one may discover that a chosen representation does not allow the operational requirements to be met, thus requiring the construction, verification, and optimization of a new representation. Our intent in dividing the programming process into construction, verification, and optimization for algorithm and representation is to stress two separations: data objects and operations on them which are natural to a problem area from the underlying representation of the objects and operations; and the problem of constructing a correct algorithm from the problem of making an algorithm meet certain time and/or space requirements.
Our model of optimization is that it is a phase in the program preparation process following the construction and verification of a correct program. The first step in the optimization process is the measurement of the program to determine whether or not optimization is in fact needed at all. If it is needed, various transformations must be made to the program in order to improve it. In the absence of software tools, the programmer must resort to his instinct and knowledge of the language system in order to improve the program. Alternatively the programmer can utilize a set of guidelines on how to improve a program [27]. In addition to giving unpredictable results, these methods require the programmer to physically modify the program, degrading the readability and, in all probability, introducing subtle errors. With more sophisticated tools, the program might have been debugged by using a "checkout" compiler with good diagnostic messages. If the resulting program needs improvement, it can be compiled by using the "optimizing" compiler. If both compilers accept the same language, and if the optimizing compiler improves the program enough to meet the operational requirements, this is an acceptable approach.
The approach we favor, however, is to allow the programmer to be the strategist and to provide a mechanical assistant to perform the optimization itself. A canned set of transformations may improve the program sufficiently. Otherwise the programmer may have to experiment by applying transformations and measuring the resulting program repetitively until a sufficiently improved program results. The programmer indicates the transformations to perform; the system performs them mechanically, verifying that program equivalence is preserved or requesting that the programmer so verify.

The advantages of such an interactive program manipulation system seem obvious. First, effort in optimization is spent only when program measurement determines that the effort is necessary and potentially beneficial, and only on those portions of the program where the payoff appears to be high. Work on program measurement [13] and goal-directed program transformations [28] is relevant here. Second, by concatenating sequences of low level transformations for the purpose of performing higher level manipulations, the system will provide the programmer with a powerful set of tools. Such sequences might include, for example, constant folding, consisting of constant detection, propagation, computation, and dead variable elimination, or code generation, which we discuss at length later. A reasonable manipulation system will allow the programmer to experiment with and produce his own sequences of transformations for his own purposes. Indeed, we might view a conventional compiler with an optional optimizing phase as a very simple example of a program manipulation system, although there is only one
fixed sequence of transformations [9, 20]. Third, if a transformation has been shown to preserve program correctness, the programmer may invoke it and investigate its effect on the program's efficiency without any concern for its effect on program correctness. Finally, a programmer is free to use all the facilities of his programming language to produce a high level, well-structured, modular program with the knowledge that, should it prove to be inefficient, he has access to a set of tools to change representations and eliminate modularization overhead while preserving the correctness of his original program.
High Level Optimization
In optimization we treat the algorithm as an object of manipulation, determine items of algorithm constancy either from the algorithm itself or by statement from the outside, and simplify the algorithm based on the constancy. We include within optimization such standard techniques as constant propagation, constant computation, common subexpression elimination, removal of invariant code from loops, strength reduction, elimination of redundant code, etc. Our primary interest, however, is in high level optimizations such as case splitting based on known possible values of a variable, loop unrolling, loop fusion, interaction between loops and conditional statements, etc.
We are interested in those optimizations which can be described as transformations on a tree-structured representation of a program, that is, optimization which can be considered as source-to-source. We thus may describe an optimization as a mapping of one particular tree structure to another. The vital issue is that this mapping must preserve program validity. In some cases, for example loop unrolling, the transformation always preserves validity; in others certain enabling conditions must be true before the transformation can be performed. For example two syntactically identical computations are conventional common subexpressions only if corresponding variables referenced by them can be shown to have the same values.

A second consideration, the raison d'être of optimization, is that the transformation must lead to an improvement (of some form) in the program. A complication is that the transformation itself may not improve the program, but may change it in such a way that later transformations will result in a gain. This is analogous to the intermediate expression swell issue in algebraic symbol manipulation. For example, in certain cases computation done prior to an if ... then ... else can be duplicated on the then and else branches. This results in no savings of time and an increase in space, but it may be that taking advantage of the knowledge that the if predicate is true on the then branch and false on the else branch will lead to considerable savings.
One can imagine an optimizer searching blindly through a program tree, trying to
decide where to apply transformations. Such a process would be extremely wasteful,
however. It appears that the application of certain transformations immediately suggests
other transformations to apply. For example, having discovered that a variable is
assigned a constant value immediately suggests that we try to propagate that constant
and then see whether the assignment is to a dead variable. Having propagated the
constant value, we want to see if constant computations can be done and whether we can
assert that other variables are constant.
We thus view an optimization as containing seven parts:
- the name of the optimization, for identification purposes;
- a pattern to be found in the program tree structure;
- a pattern predicate which evaluates to true if the optimization preserves program validity;
- a win predicate which evaluates to true if the optimization, or a succeeding optimization, will improve the program;
- a sequence of pattern-replacement rules which make the transformation;
- a set of predicates asserted to be true as a result of applying this optimization;
- a sequence of names of optimizations which should be tried after applying this optimization.
A serious problem involves the propagation of information from its point of origin to where it is needed to evaluate predicates in an optimization rule. Classical optimizers such as IBM's Fortran H have used the techniques of back dominators or interval analysis as described in Schaefer [22]. An improved form of global analysis called "p-graph analysis" has been developed by Massachusetts Computer Associates and used in the ILLIAC-IV Fortran compiler and in the Fortran Laundry, a program developed for Fortran source language optimization [20, 24]. Wegbreit [29] has extended the p-graph concepts so that they apply to any property set which satisfies an appropriate set of axioms. Recent work by Karr [14] has studied the problems of gathering and propagating information about programs.
We are interested in techniques for manipulating programs which have looping structures explicitly written by the programmer. Although these are of interest, of greater interest are the loops which exist implicitly within the program as a result of high level programming language features. A programmer could conceivably perform transformations such as loop fusion on loops which he has written. There is no way, however, that he can touch the implicit loops in the program. By analogy, a programmer can, if he is careful, perform common subexpression elimination on code he has written. There is no way, however, that he can eliminate the common code resulting from similar array references; the compiler must accomplish this.

High level programming language features which provide implicit looping may exist in two forms: They may be built into the language as standard features, such as the primary operators and condition expressions in cryptological languages or the vector and array operations in APL, or they may be provided by extension in an extensible language such as ECL [30]. The first case can clearly be viewed as a special case of the second, where the "extensions" are made at language definition time.
Examples of Higher Level Optimization¹
Assume the data type matrix is defined in a natural way and mult is a procedure which will multiply an l by m and an m by n matrix, giving an l by n result matrix:

  procedure mult(x : matrix[l, m], y : matrix[m, n], z : matrix[l, n]):
    loop for i := 1 to l:
      loop for j := 1 to n:
        z[i, j] := 0;
        loop for k := 1 to m:
          z[i, j] := x[i, k]*y[k, j] + z[i, j];
        repeat;
      repeat;
    repeat;
  end;
Let us look at a particular call on mult, namely, a multiplication of a by b giving c, where a happens to be a diagonal matrix:

  diagonal(x : matrix) ≡ i ≠ j implies x[i, j] = 0;
  declare a : matrix[10, 10], b : matrix[10, 20], c : matrix[10, 20];
  assume a is diagonal;
  mult(a, b, c);
Directly expanding mult in line gives, with constant propagation,

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := 0;
      loop for k := 1 to 10:
        c[i, j] := a[i, k]*b[k, j] + c[i, j];
      repeat;
    repeat;
  repeat;
¹ The syntax we use in the examples is close to that of [16] except that we use endif where Knuth uses fi.
Case splitting based on the knowledge a is diagonal allows a replacement of a[i, k] by if i ≠ k then 0 else a[i, k]. The assignment statement in the innermost loop becomes

  c[i, j] := (if i ≠ k then 0 else a[i, k])*b[k, j] + c[i, j];
By expanding the scope of the if, we get

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := 0;
      loop for k := 1 to 10:
        if i ≠ k then
          c[i, j] := 0*b[k, j] + c[i, j];
        else
          c[i, j] := a[i, k]*b[k, j] + c[i, j];
        endif;
      repeat;
    repeat;
  repeat;
The then clause simplifies to c[i, j] := c[i, j], which is redundant and may be eliminated. In the else clause i = k; thus references to k may be replaced by references to i. The program is now

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := 0;
      loop for k := 1 to 10:
        if i = k then
          c[i, j] := a[i, i]*b[i, j] + c[i, j];
        endif;
      repeat;
    repeat;
  repeat;
The if statement within the innermost loop selects the one number in the range of k which is equal to the current value of i. In other words the loop on k coupled with the if statement says, "If the set of values of i and k intersect, and the current value of i is within the intersection, then perform the given code." In general the intersection of the sets of values might be null, or difficult to determine. In the case of nested loops, however, the tests are particularly easy. If we have, for example,

  loop for i := i_initial to i_final:
    loop for k := k_initial to k_final:
      if i = k then
        ...
      endif;
    repeat;
  repeat;

then the values of i and k intersect if i_initial ≤ k_final and k_initial ≤ i_final. The value of i in the inner loop is within the intersection if max(k_initial, i_initial) ≤ i and i ≤ min(k_final, i_final). The program may thus be written by using this transformation as
  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := 0;
      if 1 ≤ 10 and 1 ≤ 10 and max(1, 1) ≤ i and i ≤ min(10, 10) then
        c[i, j] := a[i, i]*b[i, j] + c[i, j];
      endif;
    repeat;
  repeat;
1 ≤ 10, max(1, 1), and min(10, 10) clearly may be evaluated at compile time; and the predicate of the if statement is clearly true for all i since the if statement is in the range of
a loop in which 1 ≤ i ≤ 10. Thus the if statement may be replaced by its then branch, giving

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := 0;
      c[i, j] := a[i, i]*b[i, j] + c[i, j];
    repeat;
  repeat;
At this point constant propagation, expression simplification, and dead variable elimination (after the zero has been propagated, the assignment c[i, j] := 0 is redundant since that generation of c[i, j] is dead) gives the simplified code

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := a[i, i]*b[i, j];
    repeat;
  repeat;
With the program in this form, conventional compilation techniques would result in good object code. This example of optimization depended on assertions known true at the beginning and on assertions derived from the program. Clearly the more that is known about data and program regularities, the better the job an optimizer can do. Such initial assertions can come from two sources: from the programmer by direct statement; or derived from the program as a result of global compilation, compiling the routine in the presence of the call of the routine in a main program.
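To see the payoff concretely, the following sketch (ours, in Python rather than the paper's Algol-like notation; the function names are illustrative) compares the naive expansion of mult with the specialized loop derived above and checks that they agree for a diagonal a:

# Sketch, not from the paper: naive expansion of mult versus the
# specialized code derived by the transformations above.

def naive_mult(a, b, l, m, n):
    c = [[0] * n for _ in range(l)]
    for i in range(l):
        for j in range(n):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

def diagonal_mult(a, b, l, n):
    # After case splitting, loop elimination, and dead code removal:
    # one multiplication per element, no inner k loop at all.
    return [[a[i][i] * b[i][j] for j in range(n)] for i in range(l)]

a = [[7 * (i + 1) if i == k else 0 for k in range(10)] for i in range(10)]
b = [[i + j for j in range(20)] for i in range(10)]
assert naive_mult(a, b, 10, 10, 20) == diagonal_mult(a, b, 10, 20)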
As a second example, suppose we have, in addition to the above, extended the definition of "*" as a binary infix operator for matrix multiply, "+" as a binary infix operator for matrix addition, and ":=" as a binary infix operator for matrix assignment. The meaning of "*" is the procedure mult; the meaning of "+" is

  procedure plus(x : matrix[l, m], y : matrix[l, m], z : matrix[l, m]):
    loop for i := 1 to l:
      loop for j := 1 to m:
        z[i, j] := x[i, j] + y[i, j];
      repeat;
    repeat;
  end;

and the meaning of ":=" is

  procedure assign(x : matrix[l, m], z : matrix[l, m]):
    loop for i := 1 to l:
      loop for j := 1 to m:
        z[i, j] := x[i, j];
      repeat;
    repeat;
  end;
Suppose we have the following:

  declare a : matrix[10, 10], b : matrix[10, 10], c : matrix[10, 10], d : matrix[10, 10];
  d := (a*b) + c;

Naive compilation of the assignment statement will give

  declare t1 : matrix[10, 10], t2 : matrix[10, 10];
  mult(a, b, t1);
  plus(t1, c, t2);
  assign(t2, d);

Note the need for 200 words of temporary storage. Expanding mult, plus, and assign in line gives
  loop for i := 1 to 10:
    loop for j := 1 to 10:
      t1[i, j] := 0;
      loop for k := 1 to 10:
        t1[i, j] := a[i, k]*b[k, j] + t1[i, j];
      repeat;
    repeat;
  repeat;
  loop for i := 1 to 10:
    loop for j := 1 to 10:
      t2[i, j] := t1[i, j] + c[i, j];
    repeat;
  repeat;
  loop for i := 1 to 10:
    loop for j := 1 to 10:
      d[i, j] := t2[i, j];
    repeat;
  repeat;
The outer loops can clearly be fused since the index sets are the same and there are no data dependencies which forbid it, giving

  loop for i := 1 to 10:
    loop for j := 1 to 10:
      t1[i, j] := 0;
      loop for k := 1 to 10:
        t1[i, j] := a[i, k]*b[k, j] + t1[i, j];
      repeat;
      t2[i, j] := t1[i, j] + c[i, j];
      d[i, j] := t2[i, j];
    repeat;
  repeat;
We now note that each instance of t1[i, j] is dead after the (i, j) iteration of the two outer loops. That is, t1 and t2 have been reduced to scalars. Subsumption (replacing a variable which is used only once by its definition) and initializing t1 to c[i, j] gives

  loop for i := 1 to 10:
    loop for j := 1 to 10:
      t1 := c[i, j];
      loop for k := 1 to 10:
        t1 := a[i, k]*b[k, j] + t1;
      repeat;
      d[i, j] := t1;
    repeat;
  repeat;
We could have eliminated t1 completely by accumulating in d[i, j], but we did not for two reasons. First, on many machines an array reference is more expensive than a scalar reference. Second, a compiler's code generator is more likely to assign a scalar which is dead at the end of each loop iteration to a high speed register than assign an array element reference to a high speed register.
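The gain can again be checked concretely. A minimal sketch (ours, in Python; the function names are illustrative) of the naive version with its two matrix temporaries against the fused, scalarized version:

# Sketch, not from the paper: d := (a*b) + c computed naively with two
# 10x10 temporaries, and with the fused loop and a scalar accumulator.

def naive(a, b, c):
    t1 = [[sum(a[i][k] * b[k][j] for k in range(10)) for j in range(10)]
          for i in range(10)]                          # mult(a, b, t1)
    return [[t1[i][j] + c[i][j] for j in range(10)]    # plus(t1, c, t2),
            for i in range(10)]                        # assign(t2, d)

def fused(a, b, c):
    d = [[0] * 10 for _ in range(10)]
    for i in range(10):
        for j in range(10):
            t1 = c[i][j]                 # scalar t1, initialized to c[i, j]
            for k in range(10):
                t1 = a[i][k] * b[k][j] + t1
            d[i][j] = t1
    return d

m = [[(i * j) % 7 for j in range(10)] for i in range(10)]
assert naive(m, m, m) == fused(m, m, m)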
Description of Program Transformation Rules
As indicated earlier, a program transformation rule says: "Find some pattern in the program for which some conditions are true. If an evaluation function indicates that this transformation results in a gain, relative to the appropriate measure, then perform a set of replacements and make a set of assertions. When a rule has been executed, a set of rules may become applicable; see if they apply." A possible syntax for such a rule might be
  name: in pattern where pattern\predicate
    when win\predicate
    then patt1 ⇒ replacement1,
         ...
         pattn ⇒ replacementn
    assert predicate1, ..., predicatek
    try name1, ..., namem;
As an example of a conventional optimization, the rules for constant folding might be written

  constant\assert: in "x := y" where constant(y)
    assert constant(x), value(x) = value(y)
    try dead\var, constant\propagate;

  constant\propagate: in "x" where constant(x)
    then "x" ⇒ value(x)
    try constant\computation, constant\assert;

  constant\computation: in "x op y" where constant(x) and constant(y)
    then "x op y" ⇒ value(x) op value(y)
    try constant\computation, constant\assert;

  dead\var: in "x := y" where dead(x)
    then "x := y" ⇒ null;
The rules in the above examples of constant folding do not have win\predicates because they always result in an improvement. Other rules are not always beneficial. For example, in general, splitting a loop into two loops introduces an overhead of extra control structure. However, on a CDC 7600, if the new loops will fit into the instruction stack of 12 words, there will be a gain. This rule might be written

  loop\split: in "loop A; B repeat;"
    where can\split(loop) and code\length(loop) > 12
    when code\length("loop A repeat;") ≤ 12 or code\length("loop B repeat;") ≤ 12
    then "loop A; B repeat;" ⇒ "loop A repeat; loop B repeat;";
If we were to actually state formally all the details of the rules presented later, the presentation would be quite tedious, excessively long, and difficult to follow. Some predicates and functions, such as can\split and code\length above, are decidedly nontrivial, but their existence can be assumed without much difficulty. We are most interested in the pattern and replacements. Rather than give the detailed replacements, in most cases we have just given a generalized before and after example in the hope that the reader can deduce the form of the transformations.
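To make the anatomy of a rule concrete, here is a minimal sketch (ours, in Python; the representation and driver are illustrative, not the paper's implementation). A rule carries its name, pattern, win predicate, replacement, and successor list as data; the assert component is omitted for brevity. The driver folds constants bottom-up over a nested-tuple expression tree:

# Sketch, not from the paper: a transformation rule as data, applied to
# expression trees written as nested tuples such as ('+', ('*', 2, 3), 'x').

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    name: str
    pattern: Callable      # does the rule match this node?
    win: Callable          # is the rewrite an improvement here?
    rewrite: Callable      # the pattern -> replacement mapping
    try_next: list = field(default_factory=list)   # successor rule names

def is_const_op(e):
    return isinstance(e, tuple) and all(
        isinstance(arg, (int, float)) for arg in e[1:])

constant_computation = Rule(
    name='constant\\computation',
    pattern=is_const_op,
    win=lambda e: True,    # constant folding always wins
    rewrite=lambda e: {'+': e[1] + e[2], '*': e[1] * e[2]}[e[0]],
    try_next=['constant\\computation', 'constant\\assert'])

def apply_rule(e, rule):
    if isinstance(e, tuple):               # rewrite subtrees first
        e = (e[0],) + tuple(apply_rule(arg, rule) for arg in e[1:])
    if rule.pattern(e) and rule.win(e):
        e = rule.rewrite(e)
    return e

assert apply_rule(('+', ('*', 2, 3), 'x'), constant_computation) == ('+', 6, 'x')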
Typical Program Transformation Rules
We have somewhat arbitrarily divided the program transformations into five classes for ease of exposition: simplifications, optimizations, evolutions, devolutions, and manipulations. Since program transformation rules are still being developed, we cannot give a complete set of such rules, if indeed such a set exists. We shall describe informally and by example a collection of approximately 40 transformation rules. These include fairly trivial program simplifications, conventional optimizations, sophisticated loop parallelism detection evolutions, straightforward devolution or replacement of a construct by its lower level meaning, and manipulation rules. These rules include all those referred to in the examples. An extensive collection of over 100 transformation rules is presented by Standish et al. [26].

Simplification refers to a generalization of the idea of constant computation and involves cleaning up local patches of program in fairly obvious ways. We shall list some of the more obvious simplifications:
• Constant computation, including evaluation of procedures with constant arguments (and no side effects):
  3 + 2 ⇒ 5
  sqrt(9) ⇒ 3
• Expression simplification:
  x*1 + 0 ⇒ x
• Loop collapse: elimination of a vacuous loop:
  loop repeat; ⇒ empty
• Prune conditional:
  if true then A else B endif; ⇒ A
  if false then A else B endif; ⇒ B
• Reorder conditional:
  if p then else A endif; ⇒ if not p then A endif;
• Assignment elimination by equality: if a = b, the assignment a := b; can be eliminated. A special case of this eliminates the assignment a := a;
Optimization is the term we use to describe those program transformations, usually of a low level, which are already well understood in some sense. These transformations include all the classical ones and perhaps a few new ones:
• Common subexpression elimination.
• Code motion from program areas of high frequency to those of lower frequency.
• Strength reduction.
• Dead variable elimination.
• Constant propagation.
• Subsumption: following an assignment m := expr; and before modification of any part of expr, a use of m may be replaced by expr (only if expr has no side effects). This is especially valuable if there is only one use of m, or if the expr is a simple variable name, in which case the transformation is called scalar propagation. The idea of subsumption may be used with embedded assignment to assist register assignment. For example r := a + b; ... ; z := r; can be replaced by z := (r := a + b);
• go to chasing:
  go to a;
  ...
  a: go to b;
  ⇒
  go to b;
  ...
  a: go to b;
• Array temporary elimination: a[i, j] := b*c; ... ; a[j, i] := a[i, j]*2; can be replaced by t := b*c; a[i, j] := t; ... ; a[j, i] := t*2;
• Count up to zero: A counting up loop can be shifted so that the last loop value is zero, thus simplifying the loop termination test. This requires modifying all uses of the index within the loop and insertion of an index adjustment assignment following the loop. If the loop final value is a constant, the index offset is a constant and its addition can be done via address arithmetic. If the index is dead following the loop, the index adjustment assignment is not needed. (This shift is checked concretely in the sketch following this list.)
  loop for i := 1 by 1 to n:
    ... i ...
  repeat;
  ⇒
  loop for i := 1 - n by 1 to 0:
    ... i + n ...
  repeat;
  i := n + 1;
• Move while to end: Any loop must contain a conditional jump. Thus if a loop terminates with an unconditional jump, the loop can be rotated to move the conditional jump to the end, thus eliminating the unconditional jump.
  loop:
    A
    while p
    B
  repeat;
  ⇒
  go to l;
  loop:
    B
  l: A
  while p repeat;
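The count up to zero shift above can be checked concretely. A minimal sketch (ours, in Python; the summing loop body is illustrative):

# Sketch, not from the paper: both loops sum b[1..n] (1-origin); the
# shifted loop tests its index against zero instead of against n.

def original(b, n):
    s = 0
    for i in range(1, n + 1):       # loop for i := 1 by 1 to n
        s += b[i - 1]
    return s

def shifted(b, n):
    s = 0
    i = 1 - n
    while i <= 0:                   # loop for i := 1 - n by 1 to 0
        s += b[(i + n) - 1]         # every use of i becomes i + n
        i += 1
    i = n + 1                       # needed only if i is live afterward
    return s

b = list(range(17))
assert original(b, 17) == shifted(b, 17)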
The term evolution is used to describe those program transformations which discover higher level language constructs lurking among lower level constructs. Primarily we are concerned with discovering higher level parallel looping constructs. Lamport's work on the parallel execution of loops [17, 18, 21] is of great importance here. Lamport defines three types of parallel loops: loop conc ..., concurrent loops in which individual processors are assumed to operate asynchronously on different iterations of the loop; loop sim ..., simultaneous loops in which individual processors operate in lock-step fashion, similar to the ILLIAC IV; and loop sync ..., synchronized loops in which the individual processors are assumed to not be in lock-step but to obey a synchronization condition which guarantees preservation of generation-use orderings. Two transformations have been developed to detect parallelism in conventional iterated loops:
• The coordinate method transforms certain well behaved sets of nested loop for's into a loop sim.
• The hyperplane method transforms certain well behaved sets of nested loop for's into a loop conc. Obviously any loop conc is also a loop sim.
It has been shown that any loop sim produced by the coordinate method is also a loop sync.
• Strip mining refers to the transformation of a given parallel loop into the iterative execution of a parallel loop with a smaller index set. This transformation is necessary for any parallel machine with a fixed number of processors. For example, on the ILLIAC IV, which has 64 processors:
  loop for i := 1 by 1 to 256:
    a(i) := b(i) + c(i);
  repeat;
  ⇒
  loop conc for i := 1 by 1 to 256:
    a(i) := b(i) + c(i);
  repeat;
  ⇒
  loop for l := 0 by 1 to 3:
    loop sim for i := 64*l + 1 by 1 to 64*l + 64:
      a(i) := b(i) + c(i);
    repeat;
  repeat;
Observe that strip mining can work equally well for a strip of width one, i.e. for a machine with only one processor. It has been shown that any loop conc or loop sync can be strip mined. Thus in place of the second transformation in the example above, we could have strip mined in reverse order with a width of one to get the counting backward loop:
  loop for i := 256 by -1 to 1:
    a(i) := b(i) + c(i);
  repeat;
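Strip mining in executable form (a sketch of ours, in Python; an ordinary sequential loop stands in for the parallel loop sim, and the names are illustrative):

# Sketch, not from the paper: a 256-element loop strip mined into four
# strips of 64, the width of the ILLIAC IV processor array.

def plain(b, c):
    return [b[i] + c[i] for i in range(256)]

def strip_mined(b, c):
    a = [0] * 256
    for l in range(4):                    # loop for l := 0 by 1 to 3
        base = 64 * l
        for i in range(base, base + 64):  # loop sim over one strip
            a[i] = b[i] + c[i]
    return a

b = list(range(256))
c = list(range(256, 512))
assert plain(b, c) == strip_mined(b, c)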
The transformations to parallel loops require considerable data dependency analysis, exactly the analysis necessary for loop splitting, loop fusion, etc. We shall see the role of parallel loop detection in easing the explanation of these transformations later.
Another evolution transformation we need is
• Coindex detection, which, for a given loop, determines variables other than those given explicitly which also serve as loop indexes. The coindexes are written in the for clause as explicit parallel indexes. For example:
  j := 4;
  loop for i := 1 by 1 to 7:
    A
    j := j + 3;
  repeat;
  ⇒
  loop for i := 1 by 1 to 7 # j := 4 by 3 to 22:
    A
  repeat;
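Coindex detection in executable form (a sketch of ours, in Python; the loop body is illustrative). The variable j is an affine function of the loop index, so it can be promoted from a body update to an explicit parallel index:

# Sketch, not from the paper: j = 4, 7, ..., 22 marches in step with
# i = 1, 2, ..., 7 and is detected as a coindex.

def before():
    out, j = [], 4
    for i in range(1, 8):          # loop for i := 1 by 1 to 7
        out.append((i, j))         # the body A uses both i and j
        j = j + 3
    return out

def after():
    # loop for i := 1 by 1 to 7 # j := 4 by 3 to 22
    return list(zip(range(1, 8), range(4, 23, 3)))

assert before() == after()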
Devolution describes the replacement of a higher level language feature by its meaning in terms of lower level language features. Typically we perform transformations on given language constructs, then replace the constructs by their meaning and perform further transformations at the lower level.
• Coindex elimination:
  loop for i := bi by si to fi # j := bj by sj to fj:
    A
  repeat;
  ⇒
  j := bj;
  until done:
    loop for i := bi by si to fi:
      A
      j := j + sj;
      if (j - fj)*sign(sj) > 0 then done; endif;
    repeat;
  end;
• Multistep iteration elimination:
  loop for t := t1 by s1 to f1, t2 by s2 to f2:
    A
  repeat;
  ⇒
  declare b : procedure: A end;
  loop for t := t1 by s1 to f1:
    b;
  repeat;
  loop for t := t2 by s2 to f2:
    b;
  repeat;
• Value-list iteration elimination:
  loop for t := v1, v2, v3, v4:
    A
  repeat;
  ⇒
  declare b : procedure: A end;
  t := v1; b;
  t := v2; b;
  t := v3; b;
  t := v4; b;
or alternatively
  declare b : procedure: A end;
  declare a : array[1:4] initially (v1, v2, v3, v4);
  loop for j := 1 by 1 to 4:
    t := a[j]; b;
  repeat;
• Single step iteration reduction:
  loop for t := b by s to f:
    A
  repeat;
  ⇒
  st := s;
  ft := f;
  loop for t := b by st to ft:
    A
  repeat;
• Single step iteration elimination:
  loop for t := b by s to f:
    A
  repeat;
  ⇒
  t := b;
  st := s;
  ft := f;
  go to l;
  loop:
    A
    t := t + st;
  l: while (t - ft)*sign(st) ≤ 0 repeat;
• Dahl loop elimination:
  loop:
    A
    while p
    B
  repeat;
  ⇒
  go to l;
  m: B
  l: A
  if p then go to m; endif;
• Procedure integration: Suppose f has been declared a procedure of n arguments p1, ..., pn of modes m1, ..., mn, respectively, and with return value pr of mode mr. For example:
  declare f : procedure (p1 : m1, ..., pn : mn, pr : mr): A end;
A reference to f can be replaced by a block which computes the appropriate value:
  f(a1, ..., an)
  ⇒
  begin
    declare p1 : m1 initially a1,
            ...
            pn : mn initially an,
            pr : mr;
    A
    pr
  end
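A minimal sketch of procedure integration over a nested-tuple program tree (ours, not the paper's system). For brevity it substitutes actual arguments directly for parameters, modeling call by name rather than the declare ... initially block shown above; note how the argument a + 1 gets duplicated, which is exactly why integration introduces the initially declarations (or temporaries) when an argument is used more than once:

# Sketch, not from the paper: replacing a call f(a1, ..., an) by a copy
# of the procedure body with parameters bound to the actual arguments.

def integrate(body, params, args):
    env = dict(zip(params, args))
    def subst(e):
        if isinstance(e, str):
            return env.get(e, e)        # parameter -> actual argument
        if isinstance(e, tuple):
            return tuple(subst(x) for x in e)
        return e
    return subst(body)

# f(p1, p2) with body p1*p1 + p2, integrated at the call f(a + 1, b):
body = ('+', ('*', 'p1', 'p1'), 'p2')
result = integrate(body, ['p1', 'p2'], [('+', 'a', 1), 'b'])
assert result == ('+', ('*', ('+', 'a', 1), ('+', 'a', 1)), 'b')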
Manipulation is the mapping of one version of a computation into another version which is better in the sense that it is smaller or faster or will enable other beneficial transformations to apply to the computation. Some of the manipulations that have been developed are given here:
• Back expansion of conditional allows a computation to be tailored by the information that p is true on the then branch and false on the else branch. If the absorbed computation is an assignment, the variable assigned to may well be dead on one of the branches.
  A;
  if p then
    TC
  else FC
  endif;
  ⇒
  if p then
    A; TC
  else A; FC
  endif;
• Forward expansion of conditional:
  if p then
    TC
  else FC
  endif;
  A;
  ⇒
  if p then
    TC; A
  else FC; A
  endif;
• Loop unrolling in general reduces the number of tests and jumps executed and increases the number of instructions exposed for parallel execution. It often aids register allocation by ameliorating the seam matching problem.
  loop:
    A
  repeat;
  ⇒
  loop:
    A
    A
  repeat;

  loop while p:
    A
  repeat;
  ⇒
  until done:
    loop while p:
      A
      if not p then done; endif;
      A
    repeat;
  end;

  loop for i := b by s to f:
    A
  repeat;
  ⇒
  until done:
    loop for i := b by s to f:
      A
      i := i + s;
      if i > f then done; endif;
      A
    repeat;
  end;
Note that if the number of iterations of the loop is divisible by the unrolling factor, then
the midloop test for termination is not needed. The assignment i := i + s; can often be eliminated by subsumption.
• Loop first case:
  loop for i := b by s to f:
    A
  repeat;
  ⇒
  i := b;
  if i ≤ f then
    A
    loop for i := i + s by s to f:
      A
    repeat;
  endif;
If it can be shown that the loop will have at least one iteration, i.e. that i ≤ f initially, then the enclosing if statement can be removed. The assignment i := b; can often be eliminated by subsumption.
• Loop last case:
  loop for i := b by s to f:
    A
  repeat;
  ⇒
  if b ≤ f then
    loop for i := b by s to f - s:
      A
    repeat;
    A
  endif;
• Loop middle case:
  loop for i := b by s to f:
    A
  repeat;
  ⇒
  loop for i := b by s to m:
    A
  repeat;
  loop for i := i by s to f:
    A
  repeat;
This manipulation is particularly useful when the loop body is an appropriate conditional statement, for example if i ≥ m + 1 then ... If the predicate is i = m + 1, performing a loop first case manipulation on the second resulting loop is often valuable.
• Case split: Assume there is an assertion of the form p implies q, where p is a predicate about array subscripts and q an assertion about array values of the form ar = value, where ar is an array reference (whose subscripts are restricted by p). For example, i ≠ j implies a[i, j] = 0. If there is an expression which contains an array reference ref which "matches" ar, then replace ref with if p then value else ref endif. If there is an assignment ref := x where ref "matches" ar and x matches value, the assignment can be replaced with if not p then ref := x; endif. For example, suppose i ≠ j implies a[i, j] = 0; then a[k, l] := 0; will become if k = l then a[k, l] := 0; endif;
• Loop elimination: A predicate within an inner loop may be used to select an item out of the intersection of the index sets of the inner loop and outer loop. When detected, this condition can be greatly simplified.
  loop for i := j by 1 to k:
    A
    loop for l := m by 1 to n:
      if i = l then
        S
      endif;
    repeat;
    B
  repeat;
  ⇒
  loop for i := j by 1 to k:
    A
    if max(j, m) ≤ i ≤ min(k, n) then
      l := i;
      S
    endif;
    B
  repeat;
Typically the max and min expressions evaluate to constants and the assignment l := i is eliminated by subsumption.
• Loop compression: Often a predicate within a loop serves to restrict the index set of the loop. In this case the restriction can be incorporated into the index set itself:
  suppose b ≤ l ≤ f;
  loop for i := b by 1 to f:
    if i ≥ l then
      A
    endif;
  repeat;
  ⇒
  loop for i := l by 1 to f:
    A
  repeat;
or
  suppose b ≤ l ≤ f;
  loop for i := b by 1 to f:
    if i ≤ l then
      A
    endif;
  repeat;
  ⇒
  loop for i := b by 1 to l:
    A
  repeat;
• Order reduction: If an array a is used in a loop on i and all references to a have i in, for example, the first subscript position, then if a is dead following the loop, the dimensionality of a may be reduced. (As stated here, this is a special case of a more general rule.) For example:
  loop for i := ...:
    ... a[i, 1] ...
    ... a[i, 3] ...
    ... a[i, n + 7] ...
  repeat;
  ⇒
  loop for i := ...:
    ... a[1] ...
    ... a[3] ...
    ... a[n + 7] ...
  repeat;
• Index set shift: It may be that the set of values assumed by an index variable, if modified, will be more compatible with additional transformations.
  loop for i := b by s to f:
    ... i ...
  repeat;
  ⇒
  loop for i := b + c by s to f + c:
    ... i - c ...
  repeat;
• Loop fusion: Adjacent loops over the same index set, if data dependency considerations allow, can be fused into a single loop. The data dependency analysis can be represented by first evolving conventional loops into sim loops, fusing, and strip mining to get back to a conventional loop. One of the index sets may require index set shifting to make it identical to the other. (A numeric check of fusion appears in the sketch following this list of manipulations.)
  loop for i := b by s to f:
    A
  repeat;
  loop for j := b by s to f:
    B
  repeat;
  ⇒
  loop sim for i := b by s to f:
    A
  repeat;
  loop sim for j := b by s to f:
    B
  repeat;
  ⇒
  loop sim for i := b by s to f:
    A
    j := i;
    B
  repeat;
  ⇒
  loop for i := b by s to f:
    A
    j := i;
    B
  repeat;
• Loop splitting: The loop splitting manipulation reverses a loop fusion and depends on similar analysis: Evolve to sim loops, split, strip mine.
  loop for i := b by s to f:
    A
    B
  repeat;
  ⇒
  loop sim for i := b by s to f:
    A
    B
  repeat;
  ⇒
  loop sim for i := b by s to f:
    A
  repeat;
  loop sim for i := b by s to f:
    B
  repeat;
  ⇒
  loop for i := b by s to f:
    A
  repeat;
  loop for i := b by s to f:
    B
  repeat;
• Loop reordering: The loop reordering manipulation depends on finding a tightly nested loop set, evolving to a sim loop, and strip mining in an appropriate way:
  loop for i := bi by si to fi:
    loop for j := bj by sj to fj:
      A
    repeat;
  repeat;
  ⇒
  loop sim for i, j := (bi by si to fi) × (bj by sj to fj):
    A
  repeat;
  ⇒
  loop for j := bj by sj to fj:
    loop for i := bi by si to fi:
      A
    repeat;
  repeat;
• Loop variable elimination:
  suppose t is invariant in loop
  and dt is the change in t as a result of the loop;
  loop for i := t + b by s to f:
    A
  repeat;
  ⇒
  loop for t := t + b by s to f:
    i := t;
    A
  repeat;
  t := t - dt;
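As noted under loop fusion above, a fusion can be checked numerically. In this sketch (ours, in Python; the loop bodies are illustrative) fusion is legal because iteration i of the second loop reads only x[i], which has already been produced when the fused body reaches it; as a bonus, x then reduces to a scalar:

# Sketch, not from the paper: two adjacent loops over the same index set,
# before and after fusion.

def unfused(a, n):
    x = [0] * n
    for i in range(n):
        x[i] = a[i] * 2            # body A
    y = [0] * n
    for j in range(n):
        y[j] = x[j] + 1            # body B
    return y

def fused(a, n):
    y = [0] * n
    for i in range(n):
        x_i = a[i] * 2             # A, with x reduced to a scalar
        y[i] = x_i + 1             # B, with j := i
    return y

a = list(range(10))
assert unfused(a, 10) == fused(a, 10)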
The Compilation Model
As Hoare [12] has commented, a solution to the problem of producing efficient object code for programs written in a high level language is to have designed the language so that "a simple straightforward 'non-pessimizing' compiler will produce straightforward object programs of acceptable compactness and efficiency ... the language [should be] sufficiently expressive that most ... optimizations can be made in the language itself ... that a general machine-independent optimizer can simply translate an inefficient program into a more efficient one with guaranteed identical effects, and expressed in the same source language." This is our approach.
Our model is that a straightforward parser, perhaps automatically generated, will parse the character string representation of a program into a more highly structured, easier to manipulate, internal form. It is this internal form that is in fact transformed by our source-to-source transformations. At any point in the transformation process we can inspect the results by use of an unparser or transcriber which will re-create the character string representation of a program. We believe that by applying an appropriate set of transformations to a given source program written in a suitably rich multilevel source language, we can transform the given program in such a way that quite naive code generation techniques will result in very good code. It is our hope that this reduction in scope of the code generation problem may in fact make possible some real progress toward the automatic, or at least semiautomatic, production of code generators. The point here is that a considerable amount of the work done by conventional code generation can be viewed as source-to-source transformation in which the win predicate depends on the target machine. That is, the transformation may always be a valid one, but may only be useful for a particular target machine.
We have found, however, that most existing languages do not allow "nonpessimizing" compilation. It is an assumption inherent in source-to-source transformation that there is some subset of the language for which the compiler knows how to generate efficient code. For example, part of the benefit of constant propagation will be lost if the compiler does not generate immediate instructions when one argument of a computation is a constant. In our work with existing languages, we have found it necessary to make suitable language extensions so that the results of our transformations can be expressed. It is intended that the algorithm writer, however, will confine himself to the original language. Examples of language extensions we have used in existing languages include embedded assignment; running relationals such as a < x < b or x = 3 | 5 | 8; value returning blocks and conditionals; storage attributes in declarations; restricted procedure invocations; and many forms of looping constructs with restricted semantics including value list iteration, multiple indexes, coindex, Dahl loop [16], concurrent, simultaneous, and synchronous loops, and Zahn situation statements [16].
Since our compilation model attempts to reflect as much as possible what is going on in the source code itself, it is important that the language facilities exist at all different levels of abstractness to express the desired meaning. Although we favor a single, unified language, the ideas are equally applicable to multiple languages. One could imagine, for example, performing transformations in a very high level language, then devolving the language constructs to a language like Pascal. In this lower form, more transformations could take place; loops which were previously implicit would now become explicit, for example. Following these transformations, we should devolve the program into a low level language, perhaps BLISS [31], enabling still more transformations. Finally we have the program in a low level form, with language constructs reflecting the machine architecture, for which simple machine code generation will result in high quality code.
It is important to note our distinction between machine independent and machine dependent. In a conventional compiler there is a (more or less) machine independent portion consisting of parse and optimization followed by a (typically) ad hoc code generation phase. In our model of compilation a program is parsed into a manipulable form, subjected to a large number of source-to-source transformations, and then has code generated by a naive walk over the program representation. The transformations themselves, even those done early in the processing, may be machine independent in that they do not affect the program meaning or reflect directly machine dependent features, but the motivation for performing them, the win predicate, does depend on machine features. For example, doing an index set shift so that the final value of a loop variable is zero may be done if the target machine has a very efficient register compare with zero and if addition of a constant in an address calculation is free:
  loop for i := 1 to 10:
    a[i] := a[i] + b[i];
  repeat;
  ⇒
  loop for i := -9 to 0:
    a[i + 10] := a[i + 10] + b[i + 10];
  repeat;
Alternatively, which is probably better in this specific case, we could transform the loop into a loop sim and strip mine in reverse order, giving a counting down loop rather than a counting up loop. As a second example, x = a or x = b could be rewritten as x = a | b, effectively performing the common subexpression elimination of the reference to x. Alternatively, at a lower level conventional common subexpression elimination would perform an essentially similar transformation.
This model of compilation has been grossly simplified by conveniently forgetting the problems of global information gathering. Typical global processing requires complete global flow analysis of the entire program and is used, for example, to produce use-set information or register assignment. This analysis falls outside the scope of this paper, although we expect that the results are directly applicable. Use-set information can be reflected in the program by attaching generation markers to the variable names implemented as threads in the internal representation. Register assignment can be reflected by introducing a (low level) register data type.
Code Generation
One of the more interesting aspects of source-to-source transformations is the possibilities presented by the ability to concatenate sequences of transformations to perform complex program manipulations. One such sequence for constant folding was hinted at earlier. The sequence of transformations for performing "conventional" optimization is described in [20]. In this section we wish to consider briefly a sequence of transformations that we have termed code generation.
At some point in the compilation process the compiler has a program representation which has been optimized as far as is possible at the level of abstraction of the language operators present in the program. The compiler then has one of two choices: The program representation can be turned over for execution to an interpreter which understands the program in its current form. Alternatively, the language features represented in the program can be devolved to lower level forms which can then be subject to further optimization. For example, we may have the APL expression (A × B) + A × B. Source-to-source transformations might replace this with 2 × A × B. No further improvements can be made without looking at the meaning of × on matrices and scalars. Once this is done, we can proceed in a manner similar to the previous examples. This problem, for APL, has been examined by Abrams [1].
A similar problem has been studied by Carter [7]. Carter is concerned with special case analysis for the PL/I concatenation operation. His solution is to define a procedure concatenate which contains within its body executable code for performing the case analysis. He then uses the transformation of procedure integration to expand the body of the procedure in line at each point of call. Conventional optimization then simplifies the resulting code.
Standish et al. [25] also utilized a version of this approach. Standish et al.'s model is
one of partial evaluation of the procedure body based on actual arguments of call,
followed by replacement of the call by the procedure body.
At Massachusetts Computer Associates we have had considerable success with this approach. An application described later used this technique to produce optimized code for high level string operators in a language for cryptographic problems. The approach has also been used at a low level to design a (more or less) machine independent code generator. Procedures were devised to perform case analysis for language operators. For example, a procedure (written very informally) for integer exponentiation might be

  procedure exponentiate(i, j):
    if i = 2^k then
      shift "1" k*j bits left;
    elsif j is constant then
      case(j):
        ...
        6: ((i*i)*i)^2;
        ...
      end case;
    elsif time more valuable than space then
      in line code;
    else
      intrinsic function reference;
    endif;
  end;
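In modern terms the procedure is a small code generator that dispatches on what is known about its operands at compile time. A rough Python sketch of the same case analysis (ours; the cases, cutoffs, and the name intrinsic_power are illustrative, and the constant-exponent case uses plain repeated multiplication where the paper suggests forms like ((i*i)*i)^2):

# Sketch, not from the paper: choose a code shape for i**j by cases.

def gen_exponentiate(i, j):
    """Return a source string computing i**j."""
    if isinstance(i, int) and i > 0 and i & (i - 1) == 0:   # i = 2**k
        k = i.bit_length() - 1
        return f'1 << ({k} * {j})'        # a shift instead of a power
    if isinstance(j, int) and 0 <= j <= 6:                  # j constant
        return '1' if j == 0 else '*'.join([str(i)] * j)    # open code
    return f'intrinsic_power({i}, {j})'   # fall back to the library

assert gen_exponentiate(8, 'n') == '1 << (3 * n)'
assert gen_exponentiate('z', 6) == 'z*z*z*z*z*z'
assert gen_exponentiate('z', 'n') == 'intrinsic_power(z, n)'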
Transformations were developed to handle sequencing and addressing issues. For example, A*B - C*D on a one-accumulator machine which can subtract memory from register but not register from memory should be rewritten as T ← C*D; A*B - T. Note that this is another example of a machine independent transformation with a machine dependent win predicate. Such sequencing transformations can have significant payoff if data objects are packed. For example, suppose in a language like JOVIAL, I, J, and K are 12-bit packed integers starting at bit positions 0, 4, and 8, respectively. Consider the computation of K ← I + K - J on a one-accumulator machine. If this is computed naively as K ← (I + K) - J, it will require two more temporaries, four more instructions, and shifts of 12 positions as opposed to 8, compared with K ← (I - J) + K. Obviously transformations are necessary for explication of implied scaling, field access, and mode conversion. Also necessary is the explication of location and contents operations. This allows conventional optimization transformations to, for example, remove invariant load register instructions out of loops.
The following is the sequence of transformations we have termed code generation:

  systematically rename procedure locals
  model parameter passing (substitution for call by name, assignment for call by value)
  simplify procedure body
  constant computation
  expression simplification, including relationals and Booleans
  control structure simplification
  collapse loops
  prune conditionals
  reorder conditionals
  substitute body of procedure for call
  further optimizations

Note now that further optimization will cause interactions of the procedure body with the remaining program. For example, if the call occurred within a loop and the body contains loop invariant code, that code will be removed from the loop.
There are at least two exceptions to code generation: If space is more important than time it may be appropriate to leave the procedures intact, accessed via calls; or if particular cunning algorithms (such as high level machine instructions) are available, they may be appropriate to use. For example, the assignment A ← B where A and B are matrices is better implemented by a block transfer or move character instruction, if available, than by expansion into the general loop form.
The following section continues a previous example down to a very low level. Further
transformations are possible, but not without introducing machine dependent operators.
Code Generation Example
Let us reconsider the previous example of matrix multiplication which we had transformed to

  loop for i := 1 to 10:
    loop for j := 1 to 20:
      c[i, j] := a[i, i]*b[i, j];
    repeat;
  repeat;
We shall continue this example to show how source-to-source transformations (given a rich enough source language) can assist in the remaining compilation and code generation. The underlying representation of an m by n matrix is a one-dimensional array of m*n items, where selection of the [i, j] item of the matrix selects the [(i - 1)*n + j] item of the underlying array, assuming row major storage. Letting ua, ub, and uc be the underlying arrays for a, b, and c, respectively, we get
    loop for i := 1 to 10:
      loop for j := 1 to 20:
        uc[(i - 1)*20 + j] := ua[(i - 1)*10 + i]*ub[(i - 1)*20 + j];
      repeat;
    repeat;
With expression simplification and common subexpression elimination within the inner loop body, it becomes
    t1 := 20*i + j - 20;
    t2 := 11*i - 10;
    uc[t1] := ua[t2]*ub[t1];
Part of the t1 computation and all of the t2 computation are invariant in the inner loop. Performing code motion, we get
    loop for i := 1 to 10:
      t3 := 20*i - 20;
      t2 := 11*i - 10;
      loop for j := 1 to 20:
        t1 := t3 + j;
        uc[t1] := ua[t2]*ub[t1];
      repeat;
    repeat;
After both loops have been analyzed for coindexes, the computation becomes
    loop for i := 1 to 10 # t3 := 0 by 20 to 180 # t2 := 1 by 11 to 100:
      loop for j := 1 to 20 # t1 := t3 + 1 by 1 to t3 + 20:
        uc[t1] := ua[t2]*ub[t1];
      repeat;
    repeat;
The i and j indexes are now dead and can be eliminated:
    loop for t3 := 0 by 20 to 180 # t2 := 1 by 11 to 100:
      loop for t1 := t3 + 1 by 1 to t3 + 20:
        uc[t1] := ua[t2]*ub[t1];
      repeat;
    repeat;
Devolution removes the one remaining coindex. Note that the index sets of t3 and t2 contain the same number of elements; thus range testing for t3 is not needed:
    t3 := 0;
    loop for t2 := 1 by 11 to 100:
      loop for t1 := t3 + 1 to t3 + 20:
        uc[t1] := ua[t2]*ub[t1];
      repeat;
      t3 := t3 + 20;
    repeat;
Using loop variable elimination, we can replace t1 by t3. As a result, at the end of the inner loop t3 is too large; we compensate by subtracting a compensating amount. Using index set shifting, we can have the t2 loop final value be zero, thus easing code select. This requires that references to t2 be replaced by references to t2 + 100:
    t3 := 0;
    loop for t2 := -99 by 11 to 0:
      loop for t3 := t3 + 1 by 1 to t3 + 20:
        t1 := t3;
        uc[t1] := ua[t2 + 100]*ub[t1];
      repeat;
      t3 := t3 - 21;
      t3 := t3 + 20;
    repeat;
Scalar propagation eliminates the remaining t1 references; subsumption and constant computation simplify the t3 computation; and devolution eliminates the inner iteration loop:
t3 "= 0,
loop fort2 '= -99 by 11 to 0
t3 .=t3 + 1,
ft3 = t 3 + 20,
go to 1,
loop:
uc[t3] = ua[t2 + lO0]*ub[t3],
t3 = t 3 + 1,
l. while (t3 - ft3) _< 0 repeat,
t3 = t3 - 1,
repeat,
Subsumption between loop iterations again simplifies the t3 computations:
    t3 := 1;
    loop for t2 := -99 by 11 to 0:
      ft3 := t3 + 20;
      go to l;
    loop:
      uc[t3] := ua[t2 + 100]*ub[t3];
      t3 := t3 + 1;
    l: while (t3 - ft3) < 0 repeat;
    repeat;
Devolution eliminates the outer iteration loop:
    t3 := 1;
    t2 := -99;
    go to ll;
    loop:
      ft3 := t3 + 20;
      go to l;
    loop:
      uc[t3] := ua[t2 + 100]*ub[t3];
      t3 := t3 + 1;
    l: while (t3 - ft3) < 0 repeat;
      t2 := t2 + 11;
    ll: while t2 ≤ 0 repeat;
Finally devolution eliminates both Dahl loops:
    t3 := 1;
    t2 := -99;
    go to ll;
    label outer loop;
    ft3 := t3 + 20;
    go to l;
    label inner loop;
    uc[t3] := ua[t2 + 100]*ub[t3];
    t3 := t3 + 1;
    label l;
    if (t3 - ft3) < 0 then go to inner loop; endif;
    t2 := t2 + 11;
    label ll;
    if t2 ≤ 0 then go to outer loop; endif;
The program is now in a form in which, with only reasonable assumptions about the target machine, a naive code generation would produce good code. By "naive" we mean that code generation need look only at one statement at a time, not that it need not be clever. Indeed, it is very important that code generation know how to handle the obvious special cases: load a register with a constant; increment a register by a constant; access an indexed memory location; conditional transfer of control; etc.
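As a check, here is a direct C transliteration of the final goto form (a sketch: the arrays are 1-based as in the paper, so element 0 is unused, and the loop tests follow the strict < reading that makes each row exactly 20 elements):

    #include <stdio.h>

    static double ua[101];                     /* underlying 10x10 matrix a */
    static double ub[201];                     /* underlying 10x20 matrix b */
    static double uc[201];                     /* underlying 10x20 matrix c */

    static void diag_matmul(void)
    {
        int t2, t3, ft3;
        t3 = 1;
        t2 = -99;
        goto ll;
    outer_loop:
        ft3 = t3 + 20;                         /* one past the end of this row */
        goto l;
    inner_loop:
        uc[t3] = ua[t2 + 100] * ub[t3];        /* c[i,j] := a[i,i]*b[i,j] */
        t3 = t3 + 1;
    l:
        if (t3 - ft3 < 0) goto inner_loop;
        t2 = t2 + 11;                          /* ua index of the next diagonal element */
    ll:
        if (t2 <= 0) goto outer_loop;
    }

    int main(void)
    {
        for (int k = 1; k <= 100; k++) ua[k] = k;
        for (int k = 1; k <= 200; k++) ub[k] = k;
        diag_matmul();
        /* spot check: c[2,3] = a[2,2]*b[2,3], i.e. uc[23] = ua[12]*ub[23] */
        printf("%g (expect %g)\n", uc[23], ua[12] * ub[23]);
        return 0;
    }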
Application of Higher Level Optimization
At Massachusetts Computer Associates we have had considerable success using the concepts of source-to-source transformations in the manufacture of production software as well as in the specification and design of language processing systems. The IVTRAN compiler for Fortran for the ILLIAC IV uses source-to-source transformations in two distinct ways. Powerful parallelism detection algorithms can determine that certain Fortran loops can be executed in parallel on the ILLIAC IV. The results of program modification are reflected in the source by means of language extensions [21]. The "paralyzed" source is available for a programmer to examine. In addition, the conventional optimization phase is implemented in terms of source-to-source transformations [20]. Since this phase operates on extended ANSI Fortran and since a transcriber (unparser) was available, the parser, optimizer, and transcriber were packaged as a Fortran Laundry. A similar implementation of Fortran optimization has been reported by Schneck and Angel [23].
We have designed a language, OeaL, which is the U.S. Army's interim standard language for automatic test equipment. The language has difficult compilation problems associated with hardware device handling at a high level. The language was designed in such a way that source-to-source transformations conveniently solve many of the language processing problems [19].
In previous sections we have seen how source-to-source transformations can remove overhead in modular programming, perform much of code generation, and provide the framework for an APL compiler. Standish et al. [25] report the use of source-to-source
transformation in establishing that the result of multiplying two triangular matrices is a triangular matrix. They term this proof by program transformation.
In order to improve the object code of an existing high level production language, the techniques of source-to-source program transformation were used. This language is a Fortran-like programming language designed specifically for the processing of cryptologic problems. The language contains the following features:
(1) ability to define new character string alphabets;
(2) computational operations extended to character strings, such as character-by-character addition (modulo alphabet size) (see the sketch after this list);
(3) high level cryptologic operators, such as monographic frequency count;
(4) ability to extend operations to digraphs and, in general, k-graphs;
(5) explicit within-statement indexing over strings in a replacement statement or conditional statement;
(6) complex relations over strings for use in conditional statements;
(7) compound indexing in loop statements and within-statement indexing including multiple and parallel indexes.
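A hedged C sketch of feature (2), assuming a 26-letter alphabet of capital letters (the language itself lets the programmer define arbitrary alphabets; str_add is an invented name):

    #include <stdio.h>
    #include <string.h>

    /* Character-by-character addition of two equal-length strings,
       modulo the alphabet size. */
    static void str_add(const char *x, const char *y, char *out)
    {
        size_t n = strlen(x);
        for (size_t k = 0; k < n; k++)
            out[k] = 'A' + ((x[k] - 'A') + (y[k] - 'A')) % 26;
        out[n] = '\0';
    }

    int main(void)
    {
        char out[16];
        str_add("HELLO", "ABBAB", out);   /* a Vigenere-style combination */
        printf("%s\n", out);              /* prints HFMLP */
        return 0;
    }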
To study this language we used a language laboratory consisting of a parser generator and program manipulator to construct a model front end compiler for the language [5]. The Language Laboratory is a tool designed to assist in the development of optimization techniques for high level programming languages. It is written in BCPL and designed so that experimental optimization techniques can be plugged in for testing and analyzing. The laboratory is capable of syntactically analyzing a program, creating a structural representation of the program, and transcribing that representation into a new source program. It also provides utilities to manipulate and print the structural representation so that experimental optimization techniques can be quickly implemented, debugged, and analyzed. The laboratory is composed of five parts: Syntax Analyzer, Working Tree and Table Generator, Transcriber, Utilities, and Experiments. There is also an interactive Command Interpreter to assist with on-line experimentation. The Laboratory implementation was somewhat language dependent. It now appears that this language dependence could have been removed from all parts of the laboratory except, of course, the experiments themselves.
In this laboratory environment we were able at low cost to study the effect of various transformations on typical source programs. Our results allowed us to make suggestions for changes in the production compiler. The introduction of high level loop optimizations in the production compiler allowed certain critical language constructs to be executed up to a factor of three faster.
The language proved to be an inhospitable target language for source-to-source transformations. Some study revealed that several appropriate low level constructs were not present and that the production compiler's code generator often missed many special cases. A number of simple changes were made in code generation as a result of this study. In addition to defining and tuning the nonpessimizing subset of the language, the special-case analysis in code select also made a noticeable improvement in the object code.
As discussed earlier, we found that specific high level language features can be handled in one of two ways: what we termed previously as code generation or cunning algorithm design. The use of a language feature can be replaced by its implementation in terms of lower level language features, thus exposing lower level code to further optimization. This technique was used very successfully in the handling of a large class of complex string assignment operations and predicates over strings. It was also used to handle logical operations in conditional statements and the processing of array references.
Alternatively, cunning algorithms were sometimes devised to implement certain features. This was often machine dependent and was very useful when a given target
machine had specialized, powerful instructions. For example, we designed a special algorithm for a PL/I-like INDEX function (to search a string for an instance of a second string) which used the IBM 370 translate instruction and executed approximately six times faster than a conventionally implemented INDEX function. We studied the problem of computation on elements in strings and devised techniques for performing certain computations one word at a time (packed) rather than one character at a time. We studied the problem of providing efficient formatted output for languages using the Fortran run time input-output package. The techniques developed allowed certain programs to run up to three times faster than they had prior to the application of these techniques.
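As an illustration of the one-word-at-a-time idea (a sketch under simplifying assumptions: byte-wide characters, an alphabet of size 256 so that per-character addition is addition mod 256, and eight characters packed per 64-bit word), the following C fragment adds two words of packed characters while suppressing carries between character lanes:

    #include <stdint.h>
    #include <stdio.h>

    /* Per-byte addition of eight packed characters in one word-wide
       operation; the mask trick keeps carries from crossing lanes. */
    static uint64_t add_bytes(uint64_t a, uint64_t b)
    {
        const uint64_t H = 0x8080808080808080ULL;       /* top bit of each lane */
        return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);   /* lane-local addition  */
    }

    int main(void)
    {
        uint64_t text = 0x41424344454647FFULL;          /* "ABCDEFG" plus 0xFF  */
        uint64_t key  = 0x0101010101010101ULL;          /* add 1 to every lane  */
        printf("%016llx\n", (unsigned long long)add_bytes(text, key));
        return 0;                                       /* 4243444546474800     */
    }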
Topics for Future Research
The use of source-to-source transformations appears to be very promising both for high level program optimization and as a model of the bulk of the compilation process. Obviously much work remains to be done. Initial steps have been taken at gathering information needed to evaluate the predicates in transformations [29, 14, 15]. Considerable effort is still needed in this area and in the specification of the predicates themselves. Only a little work has been done on the ordering of transformations [20, 28]. Although a large number of compilers have been constructed, and therefore the software engineering aspects have had to be addressed, viewing compilation as an integrated information gathering and transformation system poses a number of interesting software engineering problems. If the role of code generation in a compiler can in fact be reduced as much as we hope, so that semiautomatic production of code generators becomes possible, considerable work will be needed in this field.
The transformations themselves pose a number of problems: How does one precisely define a transformation? How can one prove that a given transformation preserves program correctness? Is there a complete (in some sense) set of transformations for a given language? Are there language independent transformations? Is a given set of transformations closed? How can one concatenate numbers of related transformations to form subsystems? Does the approach work well in all programming domains? How difficult is it to write and debug the transformations themselves?
As has been pointed out earlier, the use of source-to-source transformations makes demands on programming language features: The language must contain the features at both a high and a quite low level to serve as target for the transformations. Clearly research is needed in language design: Will a single language with both high level and low level features solve the problem? And if so, how does an installation restrict a user to the preferred subset of the language? Alternatively, perhaps a hierarchy of languages from high level to Fortran level to machine architecture level is the solution.
The use of a Language Laboratory in the development of transformation techniques proved to be very valuable. A similar but more general and more powerful system should be produced. Such an effort would have many interesting software engineering problems of its own, and would provide a very powerful tool for future research. The technology of information gathering and the tools provided by the Language Laboratory make possible the idea of constructing an interactive program tailoring system. Such a system would allow a user to interact with and transform his program, taking advantage of known constancies of data ("matrix A is lower left triangular") or environment ("the compiled version of this program will run on a CDC 7600"). Since the transformations would preserve the correctness of his program, the user could concentrate on the job of tailoring his program so that it would meet his required space and time constraints.
Conclusions
The use of source-to-source transformations has already proved to be a valuable approach in the design and construction of several language processing systems. We have
found that, in a production compiler [20], when careful attention is paid to software engineering issues, the cost of this approach in implementation efficiency is comparable to a conventional approach. In addition, two significant benefits resulted: First, as a result of the inherent modularization resulting from identifying particular transformations, modification was considerably easier. Second, since the various optimizations were reflected directly in a representation of the source, use of a transcriber or unparser allowed the developers to directly view the results of optimizations, greatly easing the system debugging problem.
The examples in this paper show that the techniques can easily be used to provide quite powerful high level program improvement. Indeed, source-to-source program transformations appear to provide a coherent model of the compilation process for high level languages and for much of code generation, and to be an implementation technique for eliminating overhead in modular programming.
This view of program improvement provides not only a large class of interesting problems remaining to be solved, but also a useful framework for constructing language processors today.
ACKNOWLEDGMENTS. I would like to express my appreciation to Kirk Sattley for numerous stimulating conversations, Michael Karr for discussions on program information gathering, Mat Myszewski for his insights into code generation, and Ross Faneuf for his pragmatic software engineering approach to conventional optimization.
REFERENCES
(Note. References [2-4, 6, 10, 11] are not cited in the text.)
1. ABRAMS, P. An APL machine. SU-SEL-70-017, Stanford Electronics Labs., Stanford, Calif., Feb. 1970.
2. ACM SIGPLAN. Symposium on very high level languages. SIGPLAN Notices (ACM) 9, 4 (April 1974).
3. ACM SIGPLAN. Proceedings of a symposium on compiler optimization. SIGPLAN Notices (ACM) 5, 7 (July 1970).
4. ALLEN, F.E., AND COCKE, J. A catalogue of optimizing transformations. In Design and Optimization of Compilers, R. Rustin, Ed., Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 1-30.
5. BEARISTO, D., AND SATTLEY, K. BETA laboratory. Final Rep. CADD-7312-3111, Mass. Computer Associates, Inc., Wakefield, Mass., Dec. 1973.
6. BURSTALL, R.M., AND DARLINGTON, J. A transformation system for developing recursive programs. J. ACM 24, 1 (Jan. 1977), 44-67 (this issue).
7. CARTER, J.L. A case study of a new compiling code generation technique. RC 5666, IBM Thomas J. Watson Res. Ctr., Yorktown Heights, N.Y., Oct. 1975.
8. CHEATHAM, T.E., AND TOWNLEY, J.A. Symbolic evaluation of programs--a look at loop analysis. Proc. 1976 ACM Symp. on Symbolic and Algebraic Comput., Aug. 1976, pp. 90-96.
9. CHEATHAM, T.E., AND WEGBREIT, B. A laboratory for the study of automatic programming. Proc. AFIPS 1972 SJCC, Vol. 40, AFIPS Press, Montvale, N.J., pp. 11-21.
10. GERHART, S.L. Correctness-preserving program transformations. Conf. Rec. Second ACM Symp. on Principles of Programming Languages, Jan. 1975, pp. 54-66.
11. GESCHKE, C.M. Global program optimizations. Ph.D. Th., Comptr. Sci. Dep., Carnegie-Mellon U., Pittsburgh, Pa., 1972.
12. HOARE, C.A.R. Hints on programming language design. STAN-CS-73-403, Comptr. Sci. Dep., Stanford U., Stanford, Calif., Dec. 1973.
13. INGALLS, D. The execution time profile as a programming tool. In Design and Optimization of Compilers, R. Rustin, Ed., Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 107-128.
14. KARR, M. Gathering information about programs. CA-7507-1411, Mass. Computer Associates, Inc., Wakefield, Mass., July 1975.
15. KARR, M. On affine relationships among variables of a program. Acta Informatica 6, 2 (1976), 133-152.
16. KNUTH, D. Structured programming with goto statements. Computing Surveys 6, 4 (Dec. 1974), 261-301.
17. LAMPORT, L. Parallel execution on array and vector computers. Proc. 1975 Sagamore Conf. on Parallel Processing, Syracuse U., Aug. 1975, pp. 187-191.
18. LAMPORT, L. The parallel execution of DO loops. Comm. ACM 17, 2 (Feb. 1974), 83-93.
19. LOVEMAN, D. An ATE language processing system. Autotestcon 76: Formally Automatic Support Systems for Advanced Maintainability, IEEE, New York, 1976, pp. 1-9.
20. LOVEMAN, D., AND FANEUF, R. Program optimization--theory and practice. Proc. Conf. on Programming Languages and Compilers for Parallel and Vector Machines, SIGPLAN Notices (ACM) 10, 3 (March 1975), 97-102.
21. PRESBERG, D., AND JOHNSON, N. The paralyzer: IVTRAN's parallelism analyzer and synthesizer. Proc. Conf. on Programming Languages and Compilers for Parallel and Vector Machines, SIGPLAN Notices (ACM) 10, 3 (March 1975), 9-16.
22. SCHAEFER, M. A Mathematical Theory of Global Program Optimization. Prentice-Hall, Englewood Cliffs, N.J., 1973.
23. SCHNECK, P.B., AND ANGEL, E. A FORTRAN to FORTRAN optimizing compiler. Computer J. 16, 4 (Nov. 1973), 322-330.
24. SHAPIRO, R.M., AND SAINT, H. The representation of algorithms. Final Tech. Rep. RADC-TR-69-313, Vol. II, Applied Data Research, Inc., Rome Air Develop. Ctr., Sept. 1969.
25. STANDISH, T.A., HARRIMAN, D.C., KIBLER, D.F., AND NEIGHBORS, J.M. Improving and refining programs by program manipulation. Proc. 1976 ACM Annual Conf., Oct. 20-22, 1976, pp. 509-516.
26. STANDISH, T., HARRIMAN, D., KIBLER, D., AND NEIGHBORS, J. The Irvine program transformation catalogue. Dep. Inform. and Comptr. Sci., U. of California at Irvine, Irvine, Calif., Jan. 1976.
27. VAN TASSEL, D. Program Style, Design, Efficiency, Debugging and Testing. Prentice-Hall, Englewood Cliffs, N.J., 1974.
28. WEGBREIT, B. Goal-directed program transformation. IEEE Trans. on Software Eng. SE-2, 2 (June 1976), 69-80.
29. WEGBREIT, B. Property extraction in well-founded property sets. IEEE Trans. on Software Eng. SE-1, 3 (Sept. 1975), 270-285.
30. WEGBREIT, B. The ECL programming system. Proc. AFIPS 1971 FJCC, Vol. 39, AFIPS Press, Montvale, N.J., pp. 253-262.
31. WULF, W.A., RUSSELL, D.B., AND HABERMANN, A.N. BLISS: A language for systems programming. Comm. ACM 14, 12 (Dec. 1971), 780-790.
RECEIVED MARCH 1976; REVISED JULY 1976