Intermediate representations - Department of Computer Science

advertisement
Intermediate Representations
Saumya Debray
Dept. of Computer Science
The University of Arizona
Tucson, AZ 85721
The Role of Intermediate Code
source
code
lexical analysis
syntax analysis
static checking
intermediate
code
generation
tokens
final code
generation
final
code
intermediate
code
Intermediate Code
2
Why Intermediate Code?
• Closer to target language.
– simplifies code generation.
• Machine-independent.
– simplifies retargeting of the compiler.
– Allows a variety of optimizations to be implemented in
a machine-independent way.
• Many compilers use several different
intermediate representations.
Intermediate Code
3
Different Kinds of IRs
• Graphical IRs: the program structure is
represented as a graph (or tree) structure.
Example: parse trees, syntax trees, DAGs.
• Linear IRs: the program is represented as a list
of instructions for some virtual machine.
Example: three-address code.
• Hybrid IRs: combines elements of graphical and
linear IRs.
Example: control flow graphs with 3-address code.
Intermediate Code
4
Graphical IRs 1: Parse Trees
• A parse tree is a tree representation of a
derivation during parsing.
• Constructing a parse tree:
– The root is the start symbol S of the grammar.
– Given a parse tree for  X , if the next derivation step
is  X    1…n  then the parse tree is obtained
as:
Intermediate Code
5
Graphical IRs 2: Abstract Syntax Trees (AST)
A syntax tree shows the structure of a program by
abstracting away irrelevant details from a parse tree.
– Each node represents a computation to be performed;
– The children of the node represents what that
computation is performed on.
Intermediate Code
6
Abstract Syntax Trees: Example
Grammar :
Parse tree:
EE+T | T
TT*F | F
F  ( E ) | id
Input: id + id * id
Syntax tree:
Intermediate Code
7
Syntax Trees: Structure
• Expressions:
– leaves: identifiers or constants;
– internal nodes are labeled with
operators;
– the children of a node are its operands.
• Statements:
– a node’s label indicates what kind of
statement it is;
– the children correspond to the
components of the statement.
Intermediate Code
8
Graphical IRs 3: Directed Acyclic Graphs
(DAGs)
A DAG is a contraction of an AST that avoids
duplication of nodes.
•
•
reduces compiler memory requirements;
exposes redundancies.
E.g.: for the expression (x+y)*(x+y), we have:
AST:
DAG:
Intermediate Code
9
Linear IRs
• A linear IR consists of a sequence of instructions
that execute in order.
– “machine-independent assembly code”
• Instructions may contain multiple operations,
which (if present) execute in parallel.
• They often form a starting point for hybrid
representations (e.g., control flow graphs).
Intermediate Code
10
Linear IR 1: Three Address Code
• Instructions are of the form ‘x = y op z,’ where x, y,
z are variables, constants, or “temporaries”.
• At most one operator allowed on RHS, so no
‘built-up” expressions.
Instead, expressions are computed using temporaries
(compiler-generated variables).
• The specific set of operators represented, and
their level of abstraction, can vary widely.
Intermediate Code
11
Three Address Code: Example
• Source:
if ( x + y*z > x*y + z)
a = 0;
• Three Address Code:
t1 = y*z
t2 = x+t1
// x + y*z
t3 = x*y
t4 = t3+z
// x*y + z
if (t2  t4) goto L
a = 0
L:
Intermediate Code
12
An Example Intermediate Instruction Set
• Assignment:
• Procedure call/return:
– x = y op z (op binary)
– x = op y (op unary);
– x=y
• Jumps:
– if ( x op y ) goto L
label);
– goto L
• Pointer and indexed
assignments:
–
–
–
–
–
x = y[ z ]
y[ z ] = x
x = &y
x = *y
*y = x.
(L a
– param x, k
param)
– retval x
– call p
– enter p
– leave p
– return
– retrieve x
(x is the kth
• Type Conversion:
– x = cvt_A_to_B y (A, B base
types) e.g.: cvt_int_to_float
• Miscellaneous
– label L
Intermediate Code
13
Three Address Code: Representation
• Each instruction represented as a structure called a
quadruple (or “quad”):
– contains info about the operation, up to 3 operands.
– for operands: use a bit to indicate whether constant or Symbol
Table pointer.
E.g.:
if ( x  y ) goto L
x=y+z
Intermediate Code
14
Linear IRs 2: Stack Machine Code
• Sometimes called “One-address code.”
• Assumes the presence of an operand stack.
– Most operations take (pop) their operands from the stack and
push the result on the stack.
• Example: code for “x*y + z”
Stack machine code
Three Address Code
push x
tmp1 = x
push y
tmp2 = y
mult
tmp3 = tmp1 * tmp2
push z
tmp4 = z
add
tmp5 = tmp3 + tmp4
Intermediate Code
15
Stack Machine Code: Features
• Compact
– the stack creates an implicit name space, so many
operands don’t have to be named explicitly in
instructions.
– this shrinks the size of the IR.
• Necessitates new operations for manipulating
the stack, e.g., “swap top two values”, “duplicate
value on top.”
• Simple to generate and execute.
• Interpreted stack machine codes easy to port.
Intermediate Code
16
Linear IRs 3: Register Transfer Language (GNU RTL)
• Inspired by (and has syntax resembling) Lisp
lists.
• Expressions are not “flattened” as in threeaddress code, but may be nested.
– gives them a tree structure.
• Incorporates a variety of machine-level
information.
Intermediate Code
17
RTLs (cont’d)
Low-level information associated with an RTL
expression include:
• “machine modes” – gives the size of a data
object;
• information about access to registers and
memory;
• information relating to instruction scheduling and
delay slots;
• whether a memory reference is “volatile.”
Intermediate Code
18
RTLs: Examples
Example operations:
– (plus:m x y), (minus:m x y), (compare:m x y), etc.,
where m is a machine mode.
– (cond [test1 value1 test2 value2 …] default)
– (set lval x)
(assigns x to the place denoted by lval).
– (call func argsz), (return)
– (parallel [x0 x1 …]) (simultaneous side effects).
– (sequence [ins1 ins2 … ])
Intermediate Code
19
RTL Examples (cont’d)
• A call to a function at address a passing n bytes
of arguments, where the return value is in a
(“hard”) register r:
(set (reg:m r) (call (mem:fm a) n))
– here m and fm are machine modes.
• A division operation where the result is truncated
to a smaller size:
(truncate:m1 (div:m2 x (sign_extend:m2 y)))
Intermediate Code
20
Hybrid IRs
• Combine features of graphical and linear IRs:
– linear IR aspects capture a lower-level program
representation;
– graphical IR aspects make control flow behavior
explicit.
• Examples:
– control flow graphs
– static single assignment form (SSA).
Intermediate Code
21
Hybrid IRs 1: Control Flow Graphs
Definition: A control flow graph for a function is a directed
graph G = (V, E) such that:
– each v  V is a straight-line code sequence (“basic block”); and
– there is an edge a  b  E iff control can go directly from a to b.
Example:
L1: if x > y goto L0
t1 = x+1
x = t1
L0: y = 0
goto L1
Intermediate Code
22
Basic Blocks
•
Definition: A basic block B is a sequence of
consecutive instructions such that:
1. control enters B only at its beginning; and
2. control leaves B only at its end (under normal
execution); and
•
This implies that if any instruction in a basic
block B is executed, then all instructions in B
are executed.
 for program analysis purposes, we can treat a
basic block as a single entity.
Intermediate Code
23
Identifying Basic Blocks
1. Determine the set of leaders, i.e., the first
instruction of each basic block:
–
–
–
the entry point of the function is a leader;
any instruction that is the target of a branch is a
leader;
any instruction following a (conditional or
unconditional) branch is a leader.
2. For each leader, its basic block consists of:
–
–
the leader itself;
all subsequent instructions upto, but not including,
the next leader.
Intermediate Code
24
Example
int dotprod(int a[], int b[], int N)
{
int i, prod = 0;
for (i = 1; i  N; i++) {
prod += a[i]b[i];
}
return prod;
}
No.
Instruction
leader?
Block No.
Y
1
1
enter dotprod
2
prod = 0
1
3
i = 1
1
4
t1 = 4*i
5
t2 = a[t1]
2
6
t3 = 4*i
2
7
t4 = b[t3]
2
8
t5 = t2*t4
2
9
t6 = prod+t5
2
10
prod = t6
2
11
t7 = i+i
2
12
i = t7
2
13
if i  N goto 4
2
14
retval prod
15
leave dotprod
3
16
return
3
Intermediate Code
Y
Y
2
3
25
Hybrid IRs 2: Static Single Assignment Form
• The Static Single Assignment (SSA) form of a
program makes information about variable
definitions and uses explicit.
– This can simplify program analysis.
• A program is in SSA form if it satisfies:
– each definition has a distinct name; and
– each use refers to a single definition.
• To make this work, the compiler inserts special
operations, called -functions, at points where
control flow paths join.
Intermediate Code
26
SSA Form:  - Functions
• A -function behaves as follows:
x1 = …
x2 = …
x3 =  (x1, x2)
This assigns to x3 the value of x1, if control comes from
the left, and that of x2 if control comes from the right.
• On entry to a basic block, all the -functions in the block
execute (conceptually) in parallel.
Intermediate Code
27
SSA Form: Example
Example:
Original code
Code in SSA form
Intermediate Code
28
Download