Chapter 6 - Syntax & Semantic Analysis

ACSC373 – Compiler Writing
Chapter 6 – Syntax and Semantic Analysis
The syntax analyser – the backbone of a compiler
It receives sequences of tokens from the lexical analyser and
attempts to group these tokens to form syntactic structures as
defined by the grammar of the language.
The syntactic structures can then be used to generate low-level
object code.
Apart from the Chomsky hierarchy, there is another classification for grammars and parsers.
Type 2 (context-free) grammars are the ones of interest here.
LL(K) grammars
A grammar is LL(K) if a top-down parser can be written for it which can decide which production to apply at any stage by simply examining at most the next K symbols of the input.
LL – the first L stands for the left-to-right reading of the input; the second L stands for the leftmost derivation that the parser constructs.
LL(1) – the simplest but most important group. Only the next symbol of the input is examined, i.e. one-symbol lookahead.
e.g. productions of the form A → aB | bC can be handled by an LL(1) parser:
if the next symbol is 'a', apply production A → aB
if the next symbol is 'b', apply production A → bC
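The introduction of extra non-terminals and productions in the grammar, as in
A → X | Y
X → aB
Y → bC
has no effect on the grammar's classification, i.e. it is still LL(1).
Efficient parsers exist for this type of grammar: no backtracking is needed [single pass].
Another example: A → aB | aC cannot be handled by an LL(1) parser, because both alternatives begin with 'a' and one symbol of lookahead cannot tell them apart.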
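The one-symbol decision described above can be sketched in a few lines. This is only an illustration in Python (not the Pascal used later in these notes); the function name `choose_production` is hypothetical, and it assumes every alternative begins with a terminal symbol, as in the examples above.

```python
def choose_production(alternatives, lookahead):
    """Pick the alternative whose first symbol equals the lookahead symbol.

    Assumption: every alternative starts with a terminal (e.g. "aB", "bC").
    Returns None if nothing matches; raises ValueError if several match,
    which means one symbol of lookahead is not enough for this rule.
    """
    matches = [alt for alt in alternatives if alt[0] == lookahead]
    if len(matches) > 1:
        raise ValueError(f"not LL(1): {matches} all start with {lookahead!r}")
    return matches[0] if matches else None


# A -> aB | bC: the lookahead alone selects the production.
print(choose_production(["aB", "bC"], "a"))
print(choose_production(["aB", "bC"], "b"))

# A -> aB | aC: both alternatives start with 'a', so the choice is ambiguous.
try:
    choose_production(["aB", "aC"], "a")
except ValueError as exc:
    print(exc)
```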
LR(K) grammars
A grammar is LR(K) if a bottom-up parser can be written for it which makes a single left-to-right pass over the input while examining at most the next K symbols of the input.
- a bottom-up technique.
Top-down Parsing
A top-down parser starts by trying to recognise the starting symbol of the grammar and works down through the productions until it reaches the level where it matches the terminal symbols.
ACSC373 – Compiler Writing – Chapter 6 – Dr. Stephania Loizidou Himona
General approach
e.g.
S → AB
A → C | D
B → aD | cD
S is the starting symbol: the parser recognises A in the input sentence, then recognises B in the remainder of the input sentence. To recognise A it uses the second production, i.e. it recognises C or D. To recognise B it uses the third production, i.e. it recognises aD or cD.
e.g. A → BC
procedure A;
begin
  B;
  C
end;
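A minimal Python sketch of the general approach, with one method per non-terminal of the grammar S → AB, A → C | D, B → aD | cD. The notes do not define C and D, so the productions C → c and D → d are assumptions made purely for illustration, as are the names `Parser`, `peek`, `match` and `accepts`.

```python
class Parser:
    """Recursive-descent recogniser: one method per non-terminal."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # One-symbol lookahead; "" marks the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"{expected!r} expected at position {self.pos}")
        self.pos += 1

    def S(self):  # S -> A B
        self.A()
        self.B()

    def A(self):  # A -> C | D
        if self.peek() == "c":
            self.C()
        else:
            self.D()

    def B(self):  # B -> a D | c D
        if self.peek() == "a":
            self.match("a")
        else:
            self.match("c")
        self.D()

    def C(self):  # assumed for illustration: C -> c
        self.match("c")

    def D(self):  # assumed for illustration: D -> d
        self.match("d")


def accepts(text):
    """Return True if the whole text is a sentence of S."""
    parser = Parser(text)
    try:
        parser.S()
    except SyntaxError:
        return False
    return parser.pos == len(text)


print(accepts("cad"), accepts("dcd"), accepts("cd"))
```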
Left Recursion
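Left recursion should be avoided: a recursive-descent procedure for a left-recursive production calls itself again before consuming any input, so the parser never terminates.
e.g. A → Ba | a
B → AB | b
i.e. procedure A starts by calling procedure B, and procedure B then starts by calling procedure A.
Grammars can be rewritten to eliminate left recursion.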
e.g. Elimination of left recursion:
E → E + T | T
can be transformed to
E → TZ
Z → +TZ | ε
or, E → T {+T} in EBNF
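The transformation above can be sketched mechanically. The Python function below is a hypothetical helper, not part of the course material; it handles only immediate left recursion (A → Aα | β), not the indirect case shown earlier with A and B. Each alternative is a list of symbols and the empty list stands for ε.

```python
def eliminate_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite A -> A a | b as A -> b Z, Z -> a Z | ε (immediate case only)."""
    # Tails of the left-recursive alternatives (the part after the leading A).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [alt + [new_name] for alt in others],
        new_name: [tail + [new_name] for tail in recursive] + [[]],
    }


# E -> E + T | T  becomes  E -> T Z,  Z -> + T Z | ε
grammar = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]], "Z")
for head, bodies in grammar.items():
    print(head, "->", " | ".join(" ".join(b) if b else "ε" for b in bodies))
```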
Left factoring
A → αβ | αγ (α not null)
can be rewritten as
A → αA1
A1 → β | γ
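As a rough illustration of left factoring, the Python sketch below pulls out the longest prefix shared by all alternatives of a production. The helper name `left_factor` is hypothetical, and the sketch is deliberately simplified: it does not handle prefixes shared by only some of the alternatives.

```python
def left_factor(nonterminal, alternatives, new_name):
    """Rewrite A -> p b | p c as A -> p A1, A1 -> b | c for a common prefix p."""
    prefix = []
    # Walk the alternatives in parallel while they agree on the next symbol.
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    if not prefix or len(alternatives) < 2:
        return {nonterminal: alternatives}
    return {
        nonterminal: [prefix + [new_name]],
        new_name: [alt[len(prefix):] for alt in alternatives],
    }


# A -> aB | aC  becomes  A -> a A1,  A1 -> B | C
print(left_factor("A", [["a", "B"], ["a", "C"]], "A1"))
```

After factoring, the choice between B and C is postponed until the common 'a' has been consumed, which is what makes a one-symbol lookahead decision possible again.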
Example
Consider the following grammar that defines the syntax of an <assignment>:
<assignment> → <identifier> = <expression>;
<expression> → <expression> + <term> | <term>
<term> → <identifier> | (<expression>)
<identifier> → x | y | z
The 2nd production (above) is left recursive, so it is transformed to
<expression> → <term> {+<term>}
A recursive parser:
Assume the existence of the NextToken and error procedures.
NextToken reads the next character token and places it in the variable token (one-character lookahead).
error produces error messages.
var token : char;

procedure assignment;

  procedure expression; forward;

  (* <identifier> -> x | y | z *)
  procedure identifier;
  begin
    if token in ['x', 'y', 'z'] then NextToken
    else error('Identifier expected')
  end;

  (* <term> -> <identifier> | (<expression>) *)
  procedure term;
  begin
    if token = '(' then
    begin
      NextToken;
      expression;
      if token <> ')' then
        error(') expected')
      else NextToken
    end
    else identifier
  end;

  (* <expression> -> <term> {+<term>} *)
  procedure expression;
  begin
    term;
    while token = '+' do
    begin
      NextToken;
      term
    end
  end;

begin (*body of assignment*)
  identifier;
  if token <> '=' then error('= expected')
  else
  begin
    NextToken;
    expression;
    if token <> ';' then error('; expected')
    else NextToken
  end
end;
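For readers who want to run the parser, here is a rough Python transliteration of the same recursive-descent structure. It is a sketch under stated assumptions: NextToken and error are not given in the notes, so `next_token` here simply steps through a string with spaces removed, `error` raises SyntaxError, and the names `AssignmentParser` and `parses` are hypothetical.

```python
class AssignmentParser:
    """One method per non-terminal, mirroring the Pascal procedures."""

    def __init__(self, text):
        self.text = text.replace(" ", "")  # assumed: tokens are single characters
        self.pos = 0
        self.token = ""
        self.next_token()

    def next_token(self):
        # Assumed behaviour of NextToken: load the next character, "" at end.
        self.token = self.text[self.pos] if self.pos < len(self.text) else ""
        self.pos += 1

    def error(self, message):
        raise SyntaxError(message)

    def identifier(self):  # <identifier> -> x | y | z
        if self.token in ("x", "y", "z"):
            self.next_token()
        else:
            self.error("Identifier expected")

    def term(self):  # <term> -> <identifier> | (<expression>)
        if self.token == "(":
            self.next_token()
            self.expression()
            if self.token != ")":
                self.error(") expected")
            self.next_token()
        else:
            self.identifier()

    def expression(self):  # <expression> -> <term> {+<term>}
        self.term()
        while self.token == "+":
            self.next_token()
            self.term()

    def assignment(self):  # <assignment> -> <identifier> = <expression>;
        self.identifier()
        if self.token != "=":
            self.error("= expected")
        self.next_token()
        self.expression()
        if self.token != ";":
            self.error("; expected")
        self.next_token()


def parses(text):
    """Return True if text starts with a well-formed <assignment>."""
    try:
        AssignmentParser(text).assignment()
        return True
    except SyntaxError:
        return False


print(parses("x = y + (z + x);"), parses("x = y + ;"))
```

Like the Pascal version, this sketch does not check what follows the closing semicolon.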
Similarly (somewhere else!):
LL(K) parsing → top-down parsers
LR(K) parsing → bottom-up parsing
Semantic Analysis
Lexical and syntax analysers are not concerned with the meaning or semantics of the
programs they process. Once the analysis of the source program is complete, the
synthesis of the object program can start and this is where considerations of semantics
become important.
Semantic analyser, e.g. for Pascal:
- Determines the evaluation procedure for expressions from the type attributes of their components
- Selects appropriate forms of the operators
- Issues error messages for incompatible operands etc.
It ensures that all context-sensitive rules of the language are upheld, using the symbol table, which 'belongs' to the semantic analyser.
The semantic analyser has to perform two distinguishable processes:
1. Flatten the tree (i.e. the parse tree)
2. Cope with type information
e.g. i + j – k * r
i, j, k → integer
r → real
[Figure: Translation of an arithmetic expression]
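One possible reading of "flattening the tree" is a post-order walk that turns the parse tree into a linear sequence, with operands emitted before the operator that uses them. The Python sketch below assumes a tree encoded as nested tuples (operator, left, right) with variable names as strings; this encoding and the name `flatten` are illustrative assumptions, not the course's representation.

```python
def flatten(node, code=None):
    """Post-order walk: emit both operands, then the operator."""
    if code is None:
        code = []
    if isinstance(node, str):
        code.append(node)          # a leaf: a variable name
    else:
        operator, left, right = node
        flatten(left, code)
        flatten(right, code)
        code.append(operator)
    return code


# i + j - k * r, with * binding tighter than + and -
tree = ("-", ("+", "i", "j"), ("*", "k", "r"))
print(" ".join(flatten(tree)))     # prints: i j + k r * -
```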
Type checking by consulting the symbol table:
- Types of all variables are inserted into the tree
- Types of intermediate results are checked for compatibility
i.e. integer * real → real
e.g. i div (j – k * r) → type conflict:
div cannot handle one integer and one real argument
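The type rules above can be sketched as a bottom-up walk over the expression tree. This Python illustration assumes the same nested-tuple tree encoding (operator, left, right), a plain dictionary standing in for the symbol table, and only the operators +, -, * and div; the names `SYMBOL_TABLE` and `type_of` are hypothetical.

```python
# Assumed symbol table contents, taken from the example declarations.
SYMBOL_TABLE = {"i": "integer", "j": "integer", "k": "integer", "r": "real"}


def type_of(node, symbols=SYMBOL_TABLE):
    """Return the type of an expression tree, or raise TypeError on a conflict."""
    if isinstance(node, str):
        return symbols[node]       # leaf: look the variable up in the table
    operator, left, right = node
    left_type = type_of(left, symbols)
    right_type = type_of(right, symbols)
    if operator == "div":
        # div is defined for integer operands only.
        if left_type != "integer" or right_type != "integer":
            raise TypeError(
                f"div needs integer operands, got {left_type} and {right_type}"
            )
        return "integer"
    # +, - and *: the result is real as soon as one operand is real.
    return "real" if "real" in (left_type, right_type) else "integer"


print(type_of(("*", "k", "r")))
try:
    type_of(("div", "i", ("-", "j", ("*", "k", "r"))))
except TypeError as exc:
    print("type conflict:", exc)
```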
The symbol table
Table in which symbols such as identifiers are stored and associated with other
information such as their type, location, scope and so on.
Fast access to this table is important (many references)
Symbol table contents
1. TYPE: a name may refer to a constant, a variable, a type, a procedure or a function
e.g. if variable → variable's name
if constant → value and type
2. If the name refers to an object that can exist at run time, such as a variable, then some means of identifying the name's run-time location must be included.
3. If the name refers to a subprogram, then some means of identifying the starting address of the subprogram must be included.
OR,
store the source line numbers of each name's declaration and of its uses.
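A minimal Python sketch of a symbol table holding the kinds of information listed above. The field and method names (`Symbol`, `SymbolTable`, `declare`, `lookup`) are hypothetical, scope handling is deliberately left out, and a dictionary is used only because hashing is one common way to get the fast access mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Symbol:
    name: str
    kind: str                           # "constant", "variable", "type", "procedure", "function"
    type_name: Optional[str] = None     # e.g. "integer" or "real"
    value: Optional[object] = None      # constants only
    address: Optional[int] = None       # run-time location or subprogram start address
    declared_at: Optional[int] = None   # source line of the declaration
    used_at: List[int] = field(default_factory=list)  # source lines of uses


class SymbolTable:
    def __init__(self):
        self._symbols = {}              # name -> Symbol, hashed for fast access

    def declare(self, symbol):
        if symbol.name in self._symbols:
            raise KeyError(f"{symbol.name} already declared")
        self._symbols[symbol.name] = symbol

    def lookup(self, name, line=None):
        symbol = self._symbols[name]    # raises KeyError for undeclared names
        if line is not None:
            symbol.used_at.append(line) # record where the name is used
        return symbol


table = SymbolTable()
table.declare(Symbol("r", "variable", type_name="real", address=0, declared_at=3))
table.declare(Symbol("max", "constant", type_name="integer", value=100, declared_at=2))
print(table.lookup("r", line=10).type_name, table.lookup("r").used_at)
```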