Advanced Compiler Design and Implementation

Mooly Sagiv
Course notes
Case Studies of Compilers and Future Trends (Chapter 21)
By Ronny Morad & Shachar Rubinstein
24/6/2001
Case Studies
This chapter deals with real-life examples of compilers. For each compiler these notes
discuss three subjects:
 A brief history of the compiler.
 The structure of the compiler, with emphasis on the back end.
 The optimizations performed on two example programs. It must be noted that this
test can’t be used to measure or compare the performance of the compilers.
The compilers examined are the following:
 SUN compilers for SPARC 8, 9
 IBM XL compilers for Power and PowerPC architectures. The Power
and PowerPC are classes of architectures. The processors are sold in
different configurations.
 Digital compiler for Alpha. (The Alpha technology was later bought by Intel.)
 Intel reference compiler for 386
Historically, compilers were built for specific processors by the vendors themselves.
Today this is no longer a given: companies use other developers’ compilers. For
example, Intel uses IBM’s compiler for the Pentium processor.
The compilers will compile two programs: a C program and a Fortran 77 program.
The C program:
int length, width, radius;
enum figure {RECTANGLE, CIRCLE};

main()
{
    int area = 0, volume = 0, height;
    enum figure kind = RECTANGLE;

    for (height = 0; height < 10; height++)
    {
        if (kind == RECTANGLE) {
            area += length * width;
            volume += length * width * height;
        }
        else if (kind == CIRCLE) {
            area += 3.14 * radius * radius;
            volume += 3.14 * radius * height;
        }
    }
    process(area, volume);
}
Possible optimizations:
1. The value of ‘kind’ is constant and equal to ‘RECTANGLE’. Therefore the
‘else’ branch is dead code, and the first ‘if’ test is also redundant.
2. ‘length * width’ is loop invariant and can be calculated once before the loop.
3. Because ‘length * width’ is loop invariant, ‘area’ can be calculated with a
single multiplication: 10 * length * width.
4. The calculation of ‘volume’ in the loop can be done using addition
instead of multiplication (strength reduction).
5. The call to ‘process()’ is a tail call. This fact can be used to avoid the
need to create a stack frame.
6. Compilers will probably use loop unrolling to increase pipeline
utilization.
Note: Without the call to ‘process()’, all the code is dead, because ‘area’ and ‘volume’
aren’t used.
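To make optimizations 1-5 concrete, here is a hand-optimized C sketch of the program
(loop unrolling is omitted for readability; the temporaries ‘t’ and ‘inc’ are names
introduced here for illustration, not compiler output):

int length, width, radius;

main()
{
    int area, volume, height, t, inc;

    t = length * width;      /* (2) loop invariant computed once */
    area = 10 * t;           /* (3) the whole accumulation is one multiplication */

    volume = 0;
    inc = 0;                 /* inc tracks t * height */
    for (height = 0; height < 10; height++) {
        /* (1) the test on kind and the CIRCLE branch are gone */
        volume += inc;       /* (4) addition replaces the multiplication */
        inc += t;
    }

    /* (5) emitted as a jump that reuses the current stack frame */
    process(area, volume);
}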
The Fortran 77 program:
      integer a(500, 500), k, l
      do 20 k = 1, 500
      do 20 l = 1, 500
      a(k, l) = k + l
20    continue
      call s1(a, 500)
      end

      subroutine s1(a, n)
      integer a(500, 500), n
      do 100 i = 1, n
      do 100 j = i + 1, n
      do 100 k = 1, n
      l = a(k, i)
      m = a(k, j)
      a(k, j) = l + m
100   continue
      end
Possible optimizations:
1. The address of a(k,j) is calculated twice. This can be prevented by using
common-subexpression elimination.
2. The call to ‘s1’ is a tail call. Because the compiler has the source of
‘s1’, it can be inlined into the main procedure. This can be used to further
optimize the resulting code. Most compilers will leave the original
copy of ‘s1’ intact.
3. If the procedure is not inlined, interprocedural constant propagation
can be used to find out that ‘n’ is a constant equal to 500.
4. The accesses to ‘a’ are calculated using multiplication. This can be
averted using addition: the compiler “knows” how the array is laid
out in memory (in Fortran, arrays are ordered by columns), so it can
add the correct number of bytes to the address every time instead of
recalculating it (see the sketch below).
5. After 4, the counters aren’t needed and the conditions in the loop can
be replaced by testing the address. That’s done using linear-function
test replacement.
6. Again, loop unrolling will be used according to the architecture.
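A minimal C sketch of optimizations 4 and 5 (the function and variable names are
hypothetical; note that C arrays are row-major, so the index roles are flipped relative
to Fortran):

/* Before: every access recomputes the element address with a
   multiplication hidden in the indexing. */
void add_naive(int a[500][500], int j, int n)
{
    int k;
    for (k = 0; k < n; k++)
        a[j][k] += 1;        /* address = base + (j*500 + k)*4 */
}

/* After: a pointer advances by one element per iteration, and the
   loop exit tests the address itself (linear-function test
   replacement), so both the multiply and the counter are gone. */
void add_reduced(int a[500][500], int j, int n)
{
    int *p = &a[j][0];
    int *end = p + n;
    for (; p != end; p++)
        *p += 1;
}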
Sun SPARC
The SPARC architecture
SPARC has two major versions of the architecture, Version 8 and Version 9.
The SPARC 8 has the following features:
 32-bit pipelined, superscalar RISC system.
 Integer and floating-point units.
 The integer unit has a set of 32-bit general registers and executes load,
store, arithmetic, logical, shift, branch, call and system-control
instructions. It also computes addresses (register + register or register
+ displacement).
 The floating-point unit has 32 32-bit floating-point data registers and
implements the ANSI/IEEE floating-point standard.
 There are 8 global general-purpose integer registers (in the integer unit).
The first has a constant value of zero (r0 = 0).
 Three-address instructions of the following form: instruction
src1, src2, result.
 Several overlapping register windows of 24 registers each (spilling done
by the OS). These are used to save work on procedure calls. When there
aren’t enough free windows, the processor traps and the OS handles saving
the registers to memory and refilling them with the necessary values.
SPARC 9 is a 64-bit version, fully upward-compatible with Version 8.
The assembly language guide is on pages 748-749 of the course book, tables A.1, A.2,
A.3.
The SPARC compilers
General
Sun SPARC compilers originated from the Berkeley 4.2 BSD UNIX software
distribution and have been developed at Sun since 1982. The original back end was
for the Motorola 68010 and was migrated successively to later members of the
M68000 family and then to SPARC. Work on global optimization began in 1984 and
on interprocedural optimization and parallelization in 1989. The optimizer is
organized as a mixed model. Today Sun provides front-ends, and thus compilers, for
C, C++, Fortran 77 and Pascal.
The structure
Front End
Sun IR
yabe
Relocatable
Automatic inliner
Aliaser
Iropt
(global optimization)
Sun IR
Code
generator
Relocatable
The four compilers (C, C++, Fortran 77 and Pascal) share the same back end. The
front ends produce Sun IR, an intermediate representation discussed later. The back end
consists of two parts:
 Yabe – “Yet Another Back End”. Creates relocatable code without
optimization.
 An optimizer.
The optimizer is divided into the following components:
 The automatic inliner. This component works only at optimization level O4
(discussed later). It replaces some calls to routines within the same
compilation unit with inline copies of the routines’ bodies. Next, tail-recursion
elimination is performed, and other tail calls are marked for
the code generator to optimize.
 The aliaser. The aliaser uses information provided by the
language-specific front end to determine which sets of variables may,
at some point in the procedure, map to the same memory location. The
aliaser’s aggressiveness is determined by the optimization level. Aliasing
information is attached to each triple that requires it, for use by the
global optimizer.
 Iropt, the global optimizer.
 The code generator.
The Sun IR
The Sun IR represents a program as a linked list of triples representing executable
operations and several tables representing declarative information. For example:
ENTRY “s1_” {IS_EXT_ENTRY, ENTRY_IS_GLOBAL}
GOTO LAB_32
LAB_32:
LTEMP.1 = (.n { ACCESS V41} );
i=1
CBRANCH (i <= LTEMP.1, 1: LAB_36, 0: LAB_35);
LAB_36:
LTEMP.2 = (.n { ACCESS V41} );
j=i+1
CBRANCH (j <= LTEMP.2, 1: LAB_41, 0: LAB_40);
LAB_41:
LTEMP.3 = (.n { ACCESS V41} );
k=1
CBRANCH (k <= LTEMP.3, 1: LAB_46, 0: LAB_45);
LAB_46:
l = (.a[k, i] { ACCESS V20} );
m = (.a[k, j] { ACCESS V20} );
*(a[k,j] = l+m {ACCESS V20, INT});
LAB_34:
k = k+1;
CBRANCH(k>LTEMP.3, 1: LAB_45, 0: LAB_46);
LAB_45:
j = j+1;
CBRANCH(j>LTEMP.2, 1: LAB_40, 0: LAB_41);
LAB_40:
i = i+1;
CBRANCH(i>LTEMP.1, 1: LAB_35, 0: LAB_36);
LAB_35:
The CBRANCH is a generic conditional branch, not tied to any particular architecture.
It names two targets: the first is taken when the condition is true and the second
when it is false.
This IR is somewhere between LIR and MIR. It isn’t LIR because there are no
registers. It isn’t MIR because memory is accessed through the compiler’s own
storage organization (the LTEMP temporaries).
Optimization levels
There are four optimization levels:
O1
Limited optimizations. This level invokes only certain optimization
components of the code generator.
O2
This and higher levels invoke both the global optimizer and the optimization
components of the code generator. At this level, expressions that involve global or
equivalenced variables, aliased local variables, or volatile variables are not candidates
for optimization. Automatic inlining, software pipelining, loop unrolling, and the
early phase of instruction scheduling are not done.
O3
This level optimizes expressions that involve global variables but makes worst-case
assumptions about potential aliases caused by pointers, and omits early
instruction scheduling and automatic inlining. In practice this level gives the best results.
O4
This level aggressively tracks what pointers may point to, making worst-case
assumptions only where necessary. It depends on the language-specific front ends to
identify potentially aliased variables, pointer variables, and a worst-case set of
potential aliases. It also does automatic inlining and early instruction scheduling. This
level turned out to be very problematic because of bugs in the front ends.
The global optimizer
The optimizer’s input is Sun IR and its output is Sun IR.
The global optimizer performs the following on that input:
 Control-flow analysis is done by identifying dominators and back
edges, except that the parallelizer does structural analysis for its own
purposes.
 The parallelizer searches for operations the processor can execute in
parallel. In practice it barely improves execution time (the Alpha
processor is where it has an effect, if any); most of the time its job is
simply not to disturb the processor’s parallelism.
 The global optimizer processes each procedure separately, using basic
blocks. It first computes additional control-flow information. In
particular, loops are identified at this point, including both explicit
loops (for example, ‘do’ loops in Fortran 77) and implicit ones
constructed from ‘if’s and ‘goto’s.
 Then a series of data-flow analyses and transformations is applied to
the procedure. All data-flow analysis is done iteratively. Each
transformation phase first computes (or recomputes) data-flow
information if needed. The transformations are performed in this order:
1. Scalar replacement of aggregates and expansion of Fortran
arithmetic on complex numbers to sequences of real-arithmetic
operations.
2. Dependence-based analysis and transformations (levels O3 and
O4 only, as described below).
3. Linearization of array addresses.
4. Algebraic simplification and reassociation of address
expressions.
5. Loop-invariant code motion.
6. Strength reduction and induction-variable removal.
7. Global common-subexpression elimination.
8. Global copy and constant propagation.
9. Dead-code elimination.
 The dependence-based analysis and transformation phase is designed
to support parallelization and data-cache optimization and may be done
(under control of a separate option) when the optimization level
selected is O3 or O4. The steps comprising it (in order) are as follows:
1. Constant propagation
2. Dead-code elimination
3. Structural control flow analysis
4. Loop discovery (including determining the index variables,
lower and upper bounds and increment).
5. Segregation of loops that have calls and early exits in their
bodies.
6. Dependence analysis using GCD and Banerjee-Wolfe tests,
producing direction vectors and loop-carried scalar du- and
ud-chains.
7. Loop distribution.
8. Loop interchange (sketched after this list).
9. Loop fusion.
10. Scalar replacement of array elements.
11. Recognition of reductions.
12. Data-cache tiling.
13. Profitability analysis for parallel code generation.
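As a concrete instance of step 8, here is a loop-interchange sketch in C (hypothetical
code; C is row-major, so after the interchange the inner loop walks consecutive
addresses and uses the data cache far better):

/* Before: the inner loop strides across rows, touching a new cache
   line on almost every access. */
void sum_bad(int a[500][500], int *sum)
{
    int i, j;
    for (j = 0; j < 500; j++)
        for (i = 0; i < 500; i++)
            *sum += a[i][j];
}

/* After interchange: the inner loop visits consecutive elements of
   a row, so each cache line is used fully before being evicted. */
void sum_good(int a[500][500], int *sum)
{
    int i, j;
    for (i = 0; i < 500; i++)
        for (j = 0; j < 500; j++)
            *sum += a[i][j];
}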
The code generator
After global optimization has been completed, the code generator first translates the
Sun IR input into a representation called ‘asm+’ that consists of assembly-language
instructions and structures that represent control-flow and data-dependence
information. An example is available on page 712. The code generator then performs
a series of phases, in the following order:
1. Instruction selection.
2. Inlining of assembly-language templates whose computational impact is
understood (O2 and above).
3. Local optimizations, including dead-code elimination, straightening,
branch chaining, moving ‘sethi’s out of loops, replacement of
branching code sequences by branchless machine idioms, and
commoning of condition-code setting (O2 and above).
4. Macro expansion, phase 1 (expansion of switches and a few other
constructs).
5. Data-flow analysis of live variables (O2 and above).
6. Software pipelining and loop unrolling (O3 and above).
7. Early instruction scheduling (O4 only).
8. Register allocation by graph coloring (O2 and above).
9. Stack frame layout.
10. Macro expansion, phase 2 (expansion of memory-to-memory moves,
max, min, comparison of values, entry, exit, etc.). Entry expansion
includes accommodating leaf routines and generation of position-independent
code.
11. Delay-slot filling.
12. Late instruction scheduling.
13. Inlining of assembly-language templates whose computational impact
is not understood (O2 and above).
14. Macro expansion, phase 3 (to simplify code emission).
15. Emission of relocatable object code.
The Sun compiling system provides for both static and dynamic linking. The selection
is done by a link-time option.
Compilation results
The assembly code for the C program appears in the book on page 714. The code was
compiled using O4 optimization.
The assembly code for the Fortran 77 program appears in the book on pages 715-716.
The code was compiled using O4 optimization.
The numbers in parentheses refer to the numbering of the possible optimizations
for each program.
Optimizations performed on the C program
 (1) The unreachable code in the ‘else’ branch was removed, except for π, which is
still loaded from .L_const_seg_900000101 and stored at %fp-8.
 (2) The loop invariant ‘length * width’ has been removed from the
loop (‘smul %o0,%o1,%o0’).
 (4) Strength reduction of ‘height’: instead of multiplying by ‘height’,
the previous value is added to.
 (6) Loop unrolling by a factor of four (‘cmp %lo,3’).
 Local variables kept in registers.
 All computations done in registers.
 (5) The tail call is identified and optimized by eliminating the stack
frame.
Missed optimizations on the C program
 Removal of the π computation.
 (3) Computing ‘area’ in one instruction.
 Completely unrolling the loop: only the first 8 iterations were unrolled.
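For reference, a C sketch of what “unrolled by four, plus a rolled copy” means
(hypothetical loop; this shape is needed when the trip count is not known to be a
multiple of the unroll factor):

/* Unrolled by a factor of four; a 'rolled' copy of the loop handles
   the leftover iterations when n is not a multiple of 4. */
void unrolled(int *b, int t, int n)
{
    int h = 0;
    for (; h + 3 < n; h += 4) {   /* four iterations per pass */
        b[h]     = t * h;
        b[h + 1] = t * (h + 1);
        b[h + 2] = t * (h + 2);
        b[h + 3] = t * (h + 3);
    }
    for (; h < n; h++)            /* rolled remainder loop */
        b[h] = t * h;
}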
Optimizations performed on the Fortran 77 program
 (2) Procedure integration of s1. The compiler can make use of the fact
that n = 500 to unroll the loop, which it did.
 (1) Common-subexpression elimination of ‘a[k,j]’.
 (6) Loop unrolling, from label .L900000112 to .L900000113.
 Local variables kept in registers.
 Software pipelining. Note, for example, the load just above the starting
label of the loop.
An example of software pipelining:
Suppose the following instructions run in sequence, and each depends on the previous one:
    Load
    Add
    Store
The add can’t start until the load is finished, and the store can’t start until the add is
finished. The compiler can improve this code by emitting the following:
    Load
    *Load
    Add
    *Store
    Store
The instructions marked with * are ones needed later, inserted by the compiler. This
way, when the add starts executing, the result of the first load is already available,
and likewise for the store and the add.
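A C-level sketch of the same idea (hypothetical arrays and names), in which the load
for the next iteration is issued while the current iteration finishes:

/* Original loop: each iteration must load a[i], add, then store,
   with each step waiting for the previous one. */
void plain(int *a, int *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        b[i] = a[i] + 1;
}

/* Software-pipelined form: the load for iteration i+1 overlaps the
   add and store of iteration i. */
void pipelined(int *a, int *b, int n)
{
    int i, cur, next;
    if (n <= 0) return;
    cur = a[0];                  /* prologue: first load */
    for (i = 0; i < n - 1; i++) {
        next = a[i + 1];         /* load for the next iteration */
        b[i] = cur + 1;          /* add + store for the current one */
        cur = next;
    }
    b[n - 1] = cur + 1;          /* epilogue: last add + store */
}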
Missed optimizations on the Fortran 77 program
 Eliminating s1. The compiler produced code for ‘s1()’ although the
main routine is the only caller of ‘s1()’.
 Eliminating an addition in the loop via linear-function test replacement.
This would have eliminated one of the additions in the resulting code.
POWER/PowerPC
The POWER/PowerPC architecture
The POWER architecture is an enhanced 32-bit RISC machine with the following
features:
 It consists of branch, fixed-point, floating-point and storage-control
processors.
 Individual implementations may have multiple processors of each sort,
except that the registers are shared among them and there may be only
one branch processor in a system. That is, an implementation is configurable
and may be purchased with different numbers of processors.
 The branch processor includes the condition, link and count registers
and executes conditional and unconditional branches and calls, system
calls, and condition-register move and logical operations.
 The fixed-point processor contains 32 32-bit integer general-purpose
registers, with register ‘gr0’ delivering the value zero when used as an
operand in an address computation (gr0 = 0). It implements load and
store, arithmetic, logical, compare, shift, rotate and trap instructions.
It also implements system-control instructions. There are two modes of
addressing: register + register or register + displacement, plus the
capability to update the base register with the computed address.
 The floating-point processor contains 32 64-bit data registers and
implements the ANSI/IEEE floating-point standard for double-precision
values only.
 The storage-control processor provides for segmented main storage,
interfaces with the caches and translation lookaside buffer, and does
virtual address translation.
 The instructions typically have three operands, two sources and one
result. The order is the opposite of SPARC’s, first the result and then the
operands: instruction result, src1, src2.
The PowerPC architecture is a nearly upward-compatible extension of POWER that
allows for 32- and 64-bit implementations. It isn’t 100% compatible because, for
example, some instructions that were troublesome corner cases have been made
invalid.
The assembly language guide is on page 750 of the course book, table A.4.
The IBM XL compilers
General
The compilers for these architectures are known as the XL family. The XL family
originated in 1983 as a project to provide compilers for an IBM RISC architecture that
was an intermediate stage between the IBM 801 and POWER, but that was never
released as a product; it was a research project. The first compilers created were an
optimizing Fortran compiler for the PC RT that was released to a select few
customers, and a C compiler for the PC RT used only for internal IBM development.
The compilers were created with interchangeable back ends, so today they generate
code for POWER, Intel 386, SPARC and PowerPC. The compilers were written in
PL.8.
The compilers don’t perform interprocedural optimizations. Almost all optimizations
are performed on a proprietary low-level IR called XIL. Some optimizations that
require a higher-level IR, for example optimizations on arrays, are performed on YIL, a
higher-level representation that is created from XIL.
The structure
    translator
        |  XIL
    optimizer
        |  XIL
    instruction scheduler
        |
    register allocator
        |
    instruction scheduler
        |  XIL
    instruction selection
        |  XIL
    final assembly
        |
    relocatable

    (The root services module interacts with all of the phases.)
Each compiler consists of a front end called a translator, a global optimizer, an
instruction scheduler, a register allocator, an instruction selector, and a phase called
final assembly that produces the relocatable image and assembly-language listings.
The root services module interacts with all phases and serves to make the compilers
compatible with multiple operating systems by, for example, holding information
about how to produce listings and error messages.
The translator and XIL
A translator converts the source language to XIL using calls to XIL library routines.
The XIL generation routines do not merely generate instructions. They may perform a
few optimizations, for example, generate a constant in place of an instruction that
would compute the constant. A translator may consist of a front end that translates a
source language to a different IR language, followed by a translator from the other
intermediate form to XIL.
The XIL compilation unit uses data structures illustrated on page 720. The illustration
shows the relationships among the structures. It may save memory space while
compiling but it makes debugging the compiler more difficult. The data structures are:
 A procedure descriptor table that holds information about each
procedure, such as the size of its stack frame and information about
global variables it affects, and a pointer to the representation of its
code.
 A procedure list. The code representation of each procedure consists of
a procedure list that comprises pointers to the XIL structures that
represent instructions. The instructions are quite low-level and source-language
independent.
 A computation table. Each instruction is represented as an entry in this
table. The computation table is an array of variable-length records that
represent preorder traversals of the intermediate code for the
instructions.
 A symbolic register table. Variables and intermediate results are
represented by symbolic registers, each of which has an entry in this
table. Each entry points to the computation table entry that defines it.
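A rough C rendering of these structures (all type and field names here are hypothetical;
the real XIL layout is more involved):

/* One instruction: a variable-length record representing a preorder
   traversal of the instruction's expression tree. */
typedef struct {
    int opcode;
    int noperands;
    int operands[4];          /* symbolic registers or constants */
} Computation;

/* A symbolic register: a variable or intermediate result, pointing
   back at the computation that defines it. */
typedef struct {
    int defining_computation; /* index into the computation table */
} SymbolicRegister;

/* A procedure descriptor: frame size, affected globals, and the
   procedure list, i.e. the instructions making up its code. */
typedef struct {
    int frame_size;
    int *affected_globals;
    int *procedure_list;      /* indices into the computation table */
    int procedure_list_len;
} ProcedureDescriptor;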
An example of XIL is on page 721.
TOBEY
The compiler back end (all the phases except the source-to-XIL translator) is named
TOBEY, an acronym for TOronto Back End with Yorktown, indicating the heritage
of the two groups that created the back end.
The TOBEY optimizer
The optimizer does the following:
 YIL is used for storage-related optimization.
o YIL is created by TOBEY from XIL and includes, in
addition to the structures in XIL, representations for
looping constructs, assignment statements, subscripting
operations, and conditional control flow at the level of
‘if’ statements.
o It also represents the code in SSA form.
o The goal is to produce code that is appropriate for
dependence analysis and loop transformations.
o After the analysis and transformations, the YIL is
translated back to XIL.
 Alias information is provided by the translator to the optimizer by calls
from the optimizer to front-end routines.
 Control-flow analysis uses basic blocks. The optimizer builds the flow
graph within a procedure, uses depth-first search to construct a search
tree, and divides it into intervals.
 Data-flow analysis is done by interval analysis. It’s an older method
than the dominator-based method for finding loops. The iterative form is used
for irreducible intervals.
 Optimization is performed on each procedure separately.
The register allocator
TOBEY includes two register allocators:
 A “quick and dirty” local allocator, used when optimization is not requested.
 A global graph-coloring allocator based on Chaitin’s, but with spilling done in
the style of Briggs’s work.
The instruction scheduler
 Performs basic-block and branch scheduling.
 Performs global scheduling.
 Runs again after register allocation if any spill code has been generated.
The final assembly
The final assembly phase makes two passes over the XIL:
 peephole optimizations, such as removing compares.
 generation of the relocatable image and listings.
Compilation results
The assembly code for the C program appears in the book on page 724.
The assembly code for the Fortran 77 program appears in the book on pages 724-725.
The numbers in parentheses are according to the numbering of possible optimizations
for each program.
Optimizations performed on the C program
 (1) The constant value of ‘kind’ has been propagated into the conditional
and the dead code eliminated.
 (2) The loop invariant ‘length*width’ has been removed from the loop.
 (6) The loop has been unrolled by a factor of two.
 The local variables have been allocated to registers.
 Instruction scheduling has been performed.
Missed optimizations on the C program
 (5) The tail call to process().
 (3) The accumulation of ‘area’ has not been turned into a single
multiplication.
Optimizations performed on the Fortran 77 program
 (3) Finding out that n = 500.
 (1) Common-subexpression elimination of ‘a[k,j]’.
 (6) The inner loop has been unrolled by a factor of two.
 The local variables have been allocated to registers.
 Instruction scheduling has been performed.
Missed optimizations on the Fortran 77 program
 (2) The routine s1() has not been inlined.
 (5) Eliminating an addition in the loop via linear-function test replacement.
Intel 386
The Intel 386 architecture
The Intel 386 architecture includes the Intel 386 and its successors: the 486, Pentium,
Pentium Pro and so on. The architecture is a thoroughly CISC design, although some
implementations utilize RISC principles such as pipelining and superscalar execution.
It has the following characteristics:
 There are eight 32-bit integer registers.
 It supports 16- and 8-bit registers.
 There are six 32-bit segment registers for computing addresses.
 Some registers have dedicated purposes (e.g. pointing to the top of the
current stack frame).
 There are many addressing modes.
 There are eight 80-bit floating-point registers.
The assembly language guide is on pages 752-753 of the course book, tables A.7 and
A.8.
The Intel compilers
Intel provides compilers for C, C++, Fortran 77 and Fortran 90 for the 386
architecture family.
The structure of the compilers, which use the mixed model of optimizer organization,
is as follows:
    front end
        |  IL-1
    interprocedural optimizer
        |  IL-1 + IL-2
    memory optimizer
        |  IL-1 + IL-2
    global optimizer
        |  IL-1 + IL-2
    code generator:
        code selector --> register allocator --> instruction scheduler
        |
    relocatable
The front end is derived from work done at Multiflow and the Edison Design Group.
The front ends produce a medium-level intermediate code called IL-1.
The interprocedural optimizer operates across modules. It performs a series of
optimizations that include inlining, procedure cloning, parameter substitution, and
interprocedural constant propagation.
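A small C sketch of interprocedural constant propagation and procedure cloning, in
the spirit of the Fortran example (all names here are hypothetical):

/* Before: n is a formal parameter, so analysis within s1 alone
   cannot bound the loop. */
static void s1(int *a, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] += 1;
}

void entry_before(int *a)
{
    s1(a, 500);               /* the only call site passes n == 500 */
}

/* After interprocedural constant propagation (or cloning), the
   compiler works with a copy specialized to n == 500, so the bound
   is a compile-time constant and the loop can be unrolled or tiled. */
static void s1_n500(int *a)
{
    int i;
    for (i = 0; i < 500; i++)
        a[i] += 1;
}

void entry_after(int *a)
{
    s1_n500(a);
}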
The output of the interprocedural optimizer is a lowered version of IL-1, called IL-2,
along with IL-1’s program-structure information; this intermediate form is used by
the remainder of the major components of the compiler, down through the input to the
code generator.
The memory optimizer improves the use of memory and caches, mainly by performing
loop transformations. It first does SSA-based sparse conditional constant propagation
and then data-dependence analysis.
The global optimizer does the following optimizations:
 constant propagation
 dead-code elimination
 local common subexpression elimination
 copy propagation
 partial-redundancy elimination
 a second pass of copy propagation
 a second pass of dead-code elimination
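Partial-redundancy elimination generalizes common-subexpression elimination and
loop-invariant code motion: an expression computed on some but not all paths to a
point is made fully redundant by inserting a computation on the paths where it was
missing. A minimal C sketch (hypothetical code):

/* Before: a + b is computed on the 'then' path and again after the
   join, so on that path the second computation is redundant. */
int pre_before(int a, int b, int c)
{
    int x = 0, y;
    if (c)
        x = a + b;       /* a + b available only on this path */
    y = a + b;           /* partially redundant */
    return x + y;
}

/* After PRE: a + b is computed exactly once on every path and the
   value is reused at the join. */
int pre_after(int a, int b, int c)
{
    int x = 0, y, t;
    if (c) {
        t = a + b;       /* original computation */
        x = t;
    } else {
        t = a + b;       /* inserted computation */
    }
    y = t;               /* reuse instead of recomputation */
    return x + y;
}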
Compilation results
The assembly code for the C program appears in the book on page 741.
The assembly code for the Fortran 77 program appears in the book on pages 742-743.
The numbers in parentheses are according to the numbering of possible optimizations
for each program.
Optimizations performed on the C program
 (1) The constant value of ‘kind’ has been propagated into the conditional
and the dead code eliminated.
 (2) The loop invariant ‘length*width’ has been removed from the loop.
 (4) Strength reduction of ‘height’.
 The local variables have been allocated to registers.
 Instruction scheduling has been performed.
Missed optimizations on the C program
 (6) Loop unrolling.
 (5) Tail-call optimization.
 (3) Accumulation of ‘area’ into a single multiplication.
Optimizations performed on the Fortran 77 program
 (2) s1() has been inlined, and it is therefore found that n = 500.
 (1) Common-subexpression elimination of ‘a[k,j]’.
 (5) Linear-function test replacement.
 Local variables allocated to registers.
Missed optimizations on the Fortran 77 program
 (6) Loop unrolling.
Compilers comparison
The performance of each of the compilers on the C example is summarized in the
following table:

optimization                     Sun SPARC    IBM XL    Intel 386 family
constant propagation of kind     yes          yes       yes
dead-code elimination            almost all   yes       yes
loop-invariant code motion       yes          yes       yes
strength reduction of height     yes          yes       yes
reduction of area computation    no           no        no
loop unrolling factor            4            2         none
rolled loop                      yes          yes       yes
register allocation              yes          yes       yes
instruction scheduling           yes          yes       yes
stack frame eliminated           yes          no        no
tail call optimized              yes          no        no

The performance of each of the compilers on the Fortran example is summarized in
the following table:

optimization                               Sun SPARC   IBM XL   Intel 386 family
address of a(k,j) a common subexpression   yes         yes      yes
procedure integration of s1()              yes         no       yes
loop unrolling factor                      4           2        none
rolled loop                                yes         yes      yes
instructions in innermost loop             21          9        4
linear-function test replacement           no          no       yes
software pipelining                        yes         no       no
register allocation                        yes         yes      yes
instruction scheduling                     yes         yes      yes
elimination of s1() subroutine             no          no       no
Future trends
There are several clear main trends developing for the near future of advanced
compiler design and implementation:
 SSA form is being used more and more (a tiny example appears after this list):
o It allows methods designed for basic blocks and extended
basic blocks to be applied to whole procedures.
o It improves performance.
 More use of partial-redundancy elimination.
 Partial-redundancy elimination and SSA are being combined.
 Scalar-oriented optimizations are being integrated with parallelization and
vectorization.
 Advances in data-dependence testing, data-cache optimization and
software pipelining.
The most active research in scalar compilation will continue to be
optimization.
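For reference, a tiny illustration of SSA (static single assignment) form, written as
C-like pseudocode in a comment; phi is the standard join operator:

/* Original code:              SSA form:
 *     x = 1;                      x1 = 1;
 *     if (c)                      if (c)
 *         x = 2;                      x2 = 2;
 *     y = x + 3;                  x3 = phi(x1, x2);   <- at the join
 *                                 y1 = x3 + 3;
 *
 * Every name is assigned exactly once, so each use has a single,
 * obvious definition; this is what lets methods designed for basic
 * blocks extend to whole procedures. */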
Other trends
 More and more work will be shifted from hardware to compilers.
 More advanced hardware will be available.
 Higher-level programming languages will be used:
o Memory management will be simpler.
o Modularity facilities will be available.
o Assembly programming will hardly be used.
 Dynamic (runtime) compilation will become more significant.
Theoretical Techniques in Compilers

technique                 compiler field
data structures           all
automata algorithms       front end, instruction selection
graph algorithms          control-flow analysis, data-flow analysis, register allocation
linear programming        instruction selection (complex machines)
Diophantine equations     parallelization
randomized algorithms     not used yet