Click Here to

advertisement
6. Semantic Analysis
• Semantic Analysis Phase
– Purpose: compute additional information needed for
compilation that is beyond the capabilities of ContextFree Grammars and Standard Parsing Algorithms
– Static semantic analysis: Take place prior to execution
(Such as building a symbol table performing type
inference and type checking)
• Classification
– Analysis of a program required by the rules of the
programming language to establish its correctness and
to guarantee proper execution
– Analysis performed by a compiler to enhance the
efficiency of execution of the translated program
• Description of the static semantic analysis
– Attribute grammar
• identify attributes of language entities that must be
computed and to write attribute equations or semantic
rules that express how the computation of such
attributes is related to the grammar rules of the
language.
– Which is most useful for languages that obey the principle
of Syntax-Directed Semantics
– Abstract syntax as represented by an abstract syntax tree
Contents
6.1 Attributes and Attribute Grammars
6.3 The Symbol Table
6.4 Data Types and Type Checking
6.1 Attributes and Attribute
Grammars
• Attributes
– Any property of a programming language construct such as
•
•
•
•
•
The data type of a variable
The value of an expression
The location of a variable in memory
The object code of a procedure
The number of significant digits in a number
• Binding of the attribute
– The process of computing an attribute and associating its
computed value with the language construct
• Binding time
– The time during the compilation/execution process when the
binding of an attribute occurs
– Based on the difference of the binding time, attributes is divided
into Static attributes (be bound prior to execution) and Dynamic
attributes (be bound during execution)
• Example: The binding time and significance during
compilation of the attributes.
– Attribute computations are extremely varied
– Type checker
• A Type Checker is a semantic analyzer that computes the data type
attribute of all language entities for which data types are defined and
verifies that these types conform to the type rules of the language.
• In a language like C or Pascal, is an important part of semantic
analysis; .
• While in a language like LISP , data types are dynamic, LISP compiler
must generate code to compute types and perform type checking
during program execution.
– The values of expressions
• Usually dynamic and the be computed during execution;
• But sometime can also be evaluated during compilation (constant
folding).
6.1.1 Attribute Grammars
• In syntax-directed semantics, attributes are associated
directly with the grammar symbols of the language.
• X.a means the value of ‘a’ associated to ‘X’
– X is a grammar symbol and a is an attribute associated to X
• Syntax-directed semantics:
– Attributes are associated directly with the grammar symbols
of the language.
– Given a collection of attributes a1, …, ak, it implies that for
each grammar rule X0X1X2…Xn (X0 is a nonterminal),
– The values of the attributes Xi.aj of each grammar symbol Xi
are related to the values of the attributes of the other
symbols in the rule.
• An attribute grammar for attributes a1, a2, … ,ak is the
collection of all attribute equations or semantic rules
of the following form,
– for all the grammar rules of the language.
Xi.aj = fij(X0.a1,…,X0.ak, …,X1.al,….. X1.ak …, Xn-1.a1, …Xn.ak)
• Where fij is a mathematical function of its arguments
• Typically, attribute grammars are written in tabular
form as follows:
Grammar Rule
Rule 1
...
Rule n
Semantic Rules
Associated attribute equations
Associated attribute equation
Example 6.1 consider the following simple grammar for unsigned numbers:
Number  number digit | digit
Digit  0|1|2|3|4|5|6|7|8|9
The most significant attribute: numeric value (write as val), and the responding
attribute grammar is as follows:
Grammar Rule
Number1number2 digit
Numberdigit
digit0
digit1
digit2
digit3
digit4
digit5
digit6
digit7
digit8
digit9
Semantic Rules
number1.val = number2.val*10+digit.val
number.val= digit.val
digit.val = 0
digit.val = 1
digit.val = 2
digit.val = 3
digit.val = 4
digit.val = 5
digit.val = 6
digit.val = 7
digit.val = 8
digit.val = 9
table 6.1
The parse tree showing attribute computations for the
number 345 is given as follows
number
(val = 34*10+5 =345)
number
(val = 3*10+4=34)
number
(val=3)
digit
(val=4)
digit
4
3
Fig 6.1
digit
(val = 5)
5
Example 6.2 consider the following grammar for simple integer arithmetic
expressions:
Exp  exp + term | exp-term | term
Term  term*factor | factor
Factor  (exp)| number
The principal attribute of an exp (or term or factor) is its numeric value
(write as val) and the attribute equations for the val attribute are given as
follows
Grammar Rule
exp1exp2+term
exp1exp2-term
exp1 term
term1term2*factor
termfactor
factor(exp)
factornumber
Semantic Rules
exp1.val=exp2.val+term.val
exp1.val=exp2.val-term.val
exp1.val= term.val
term1.val=term2.val*factor.val
term.val=factor.val
factor.val=exp.val
factor.val=number.val
table 6.2
Given the expression (34-3)*42 , the computations implied by this attribute grammar
by attaching equations to nodes in a parse tree is as follows
Exp
(val = 1302)
term
(val = 31*42=1302)
term
factor
*
(val = 31)
(val = 42)
factor
number
(val = 31)
(val = 42)
Exp
(
(val = 34-3=31)
Exp
(val = 34)
Fig6.2
-
)
term
(val = 3)
term
factor
(val = 34)
(val =3)
factor
number
(val = 34)
(val = 3)
number
(val = 34)
Example 6.3 consider the following simple grammar of variable declarations
in a C-like syntax:
Decl  type var-list
Typeint | float
Var-listid, var-list |id
Define a data type attribute for the variables given by the identifiers in a
declaration and write equations expressing how the data type attribute is
related to the type of the declaration as follows:
(We use the name dtype to distinguish the attribute from the nonterminal
type)
Grammar Rule
decltype var-list
typeint
type float
var-list1id,var-list2
var-listid
Semantic Rules
var-list.dtype = type.dtype
type.dtype = integer
type.dtype = real
id.dtype = var-list1.dtype
var-list2.dtype= var-list1.dtype
id.dtype = var-list.dtype
table 6.3
Note that there is no equation involving the dtype of the nonterminal decl.
It is not necessary for the value of an attribute to be specified for all grammar
symbols
Parse tree for the string float x,y showing the dtype attribute as specified by the
attribute grammar above is as follows:
decl
Type
(dtype = real)
float
Var-list
(dtype = real)
Id
(x)
(dtype = real)
,
Var-list
(dtype = real)
Id
(y)
(dtype = real)
Fig6.3
Attribute grammars may involve several interdependent attributes.
Example 6.4 consider the following grammar, where
numbers may be octal or decimal, suppose this is
indicated by a one-character suffix o(for octal) or d(for
decimal):
Based-num  num basechar
Basechar  o|d
Num  num digit | digit
Digit  0|1|2|3|4|5|6|7|8|9
In this case num and digit require a new attribute base,
which is used to compute the val attribute. The
attribute grammar for base and val is given as follows.
Grammar Rule
Based-numnum basechar
Basechar o
Basechar d
Num1num2 digit
Semantic Rules
Based-num.val = num.val
Based-num.base = basechar.base
Basechar.base = 8
Basechar.base = 10
num1.val =
If digit.val = error or num2.val = error
Then error
Else num2.val*num1.base+digit.val
Num  digit
Digit 0
Digit 1
…
Digit 7
Digit 8
Digit 9
Num2.base = num1.base
Digit.base = num1.base
num.val = digit.val
Digit.base = num.base
digit.val = 0
digit.val = 1
…
digit.val = 7
digit.val = if digit.base = 8 then error else 8
digit.val = if digit.base = 8 then error else 9
Tab 6.4
345o
Based-num
(val = 229)
num
basechar
(val = 28*8+5=229)
(base = 8)
(base = 8)
o
num
(val = 3*8+4=28)
digit
(base=8)
(val = 5)
(base = 8)
num
digit
(val = 3)
(val = 4)
(base = 8)
(base = 8)
digit
(val = 3)
5
4
(base = 8)
3
Fig6.4
6.3 Symbol Table
• The symbol table is the major inherited attribute in a compiler
and, after syntax tree, forms the major data structure.
• The operations of the Symbol table are
insert: is used to store the information provided by name
declarations when processing these declarations.
lookup: is needed to retrieve the information associated
to a name when that name is used to associate code
delete: is needed to remove the information provided by
a declaration when that declaration no longer applies
The structure of the symbol table
• The symbol table in a compiler is a typical
dictionary data structure include
• linear lists :constant time insert operations, can be
chosen if speed is not a concern
• various search tree structures( binary search
trees, AVL trees, B trees):do not provide best case
efficiency and complexity of delete operations
• hash tables: best choice because all three
operations can be performed at constant time.
• The best scheme for compiler construction is to
open addressing, called separate chaining.
• In this method each bucket is actually a linear
list, and collisions are resolved by inserting the
new item into the bucket list.
• separate chaining.
– The size of the linear list ranges from few hundred to
over a thousand
– Size of the bucket array should be chosen to be a prime
number.
• Hash function for a symbol table
– Convert the character string in to an integer between
0…..size-1
• Convert each character in to a non negative integer
– C language uses ASCII values
– Pascal uses ord function
• Combine all the integers in to a single integer
– Programmers choice
• Resulting integer should be scaled to the range of 0……size-1
• separate chaining.
– Mod function is used to scale a number to fall in to the
range of 0…size-1.
– Combine all the integers in to a single integer
• One simple method is to consider first few characters or first
middle and last characters and ignore the rest
– Inadequate: as programmers can use variables like temp1,temp2…
and will cause frequent collisions
• Chosen method should reduce the collisions
• One good method is to multiplying factor(some constant
number )
– If ci is the numeric value of the (i+1)th character and hi is the partial
is the partial hash value computed at the (i)th step then
» h(i+1) = hi+c
» This can be generalized to
» H=( n-1c1+ n-2c2+……. cn-1+cn)mod size
• separate chaining
– The choice of the constant in this formula has
significant effect on the out come
• A reasonable choice for the constant is a number which is
power of 2 (like 2,4,8,16,32,64,128…)
• Declarations.
– There is no specific standard to create a symbol table for
all compilers.
• Symbol table will vary from compiler to compiler and it depends
on
– How each compiler Calls insert and delete operations
– How the variables are declared in each compiler.
– How long the compiler wants the symbol table to exist
– The behavior of the symbol table heavily depends on the
properties of declarations of the language being
translated.
– There are 4 basic kinds of declarations
• Constant ,
– Const int SIZE=199
• type
– Structures and unions
• variable
– Variable declarations :int a,b[100]
• Declarations.
– procedure/function
• Are constant declaration of procedure/function type (similar
to other variables)
• These variables are explicit in nature (needs to be declared
before use)
• In few compliers they are implicit in nature (need not be
declared before use)
– Fortran,basic
– We can create one single symbol table for all above 4
declarations
• Constant declarations(value binding) associate values to
names.
– Saves them in the symbol table and their value will never change
in symbol table
– In the execution phase the compiler will change these constant
variables names to values by picking them from the symbol table
• Declarations.
– We can create one single symbol table for all
above 4 declarations
– Type declarations, Variable declaration and
function/procedure declarations also will be saved in the
symbol table with minute difference between each other
• Scope rules and Block structure:
• In the above program there are five blocks
• First block : Variables int i,j and the function f has
scope for entire program(global).
• Second block: the variable size has one scope
• Third block: variables i and temp has different scope
• Scope rules and Block structure:
• In the above program there are five blocks
• Third :variables i and temp has different scope
» Variable i is of type char but in global scope it is of type int
» Until we come out of this 3rd block (function f) the type of i
will be char. When we come out of 3rd block again variable i
will be of type int.
» Compiler is giving more priority to local variables.
• When we come out of the block the variable i of type
char will be deleted and again variable i of type int will
come in to picture.
• Scope rules and Block structure:
• In the above program there are five blocks
• Third :variables i and temp has different scope
– When processing this 3rd block the symbol table will look like
• Scope rules and Block structure:
• Variable i is saved twice with both types
• Variable i of type char is saved first, so when
try to look up for variable i, it will return the
char variable only (but not int variable)
• When we come out of the block, i (char) and
temp (char) will be deleted from the symbol
table
• Scope rules and Block structure:
• In the above program there are five blocks
• Fourth block :variables j of type double will be added to
the table and the symbol table looks like
• Scope rules and Block structure:
• In the above program there are five blocks
• At the end of the function f block all the variables
declared in it are deleted and the symbol table will be
• Scope rules and Block structure:
• One more type of scope declaration is using different
symbol tables for each block (as seen in the previous
slides) and the practical implementation will look like
Symbol table example
block1
class Foo {
int value;
int test() {
int b = 3;
scope of b
return value + b;
}
void setValue(int c) {
value = c;
{ int d = c;
c = c + d;
scope of d
value = c;
}
}
}
class Bar {
int value;
void setValue(int c) {
value = c;
}
}
scope of value
scope of c
scope of c
scope of value
37
Symbol table example
class Foo {
int value;
int test() {
int b = 3;
return value + b;
}
void setValue(int c) {
value = c;
{ int d = c;
c = c + d;
value = c;
(test)
}
Symbol
}
b
}
class Bar {
int value;
void setValue(int c) {
value = c;
}
}
(Foo)
Symbol
Kind
Type
Properties
value
field
int
…
test
method
-> int
setValue
method
int -> void
(setValue)
Kind
Type
Properties
Symbol
Kind
Type
Properties
var
int
…
c
var
int
…
Symbol
Kind
Type
Properties
d
var
int
…
(block1)
38
Symbol table example cont.
…
(Foo)
Symbol
Kind
Type
Properties
value
field
int
…
test
method
-> int
setValue
method
int -> void
(test)
(setValue)
Symbol
Kind
Type
Properties
Symbol
Kind
Type
Properties
b
var
int
…
c
var
int
…
(block1)
Symbol
Kind
Type
Properties
d
var
int
…
39
Checking scope rules
(Foo)
Symbol
Kind
Type
Properties
value
field
int
…
test
method
-> int
setValue
method
int -> void
(test)
(setValue)
Symbol
Kind
Type
Properties
Symbol
Kind
Type
Properties
b
var
int
…
c
var
int
…
(block1)
void setValue(int c) {
value = c;
{ int d = c;
c = c + d;
value = c;
}
}
lookup(value)
Symbol
Kind
Type
Properties
d
var
int
…
40
Catching semantic errors
Error !
(Foo)
Symbol
Kind
Type
Properties
value
field
int
…
test
method
-> int
setValue
method
int -> void
(test)
undefined
symbol
(setValue)
Symbol
Kind
Type
Properties
Symbol
Kind
Type
Properties
b
var
int
…
c
var
int
…
(block1)
void setValue(int c) {
value = c;
{ int d = c;
c = c + d;
myValue = c;
}
}
lookup(myValue)
Symbol
Kind
Type
Properties
d
var
int
…
41
6.4 Data Types and Type Checking
Principal Task of Compiler
Type inference
The computation and maintenance of information on data
types
Type checking
• Type checking uses logical rules to reason about the behavior of a
program at run time. Specifically, it ensures that the types of the
operands match the type expected by an operator. For example,
the && operator in Java expects its two operands to be booleans;
the result is also of type boolean.
The use of the information to ensure that each part of the
program makes sense under the type rules of the language.
These two tasks are related, performed together, and referred as
type checking.
Data Type
• Data type information may be either static or dynamic,
or a combination of two.
Ex: In LISP type inference and type checking will be done
during the execution(dynamic)
Ex: In Pascal, C and Ada type information is primarily
static, and is used as the principal mechanism for checking
the correctness of the program before execution(static)
Download