Lesson 3
CDT301 – Compiler Theory, Spring 2011
Teacher: Linus Källberg
Outline
• Introduction to parsing
– Specifying language syntax using CFGs
• Ambiguous grammars
INTRODUCTION TO PARSING
Why use regexps and grammars?
• They give a clear understanding of the language
• Most grammars and regexps can be used more or less directly as input to parser generators
• Grammars can also be used to specify the semantics (e.g., generation of code)
• A grammar serves as a clear and compact specification for a recursive top-down parser
Overview of parsing
• The lexical analyzer (or scanner or tokenizer) splits the input into tokens
– Token = type + attribute
– Examples: <id, 3>, <+>, <num, 1234>
– This is done by determining membership of strings in regular languages
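The "token = type + attribute" idea can be sketched in C as below. This is only an illustration: the names Token, TokenType, and NextToken are hypothetical and do not come from the course material, and the scanner recognizes just numbers and '+'.

```c
#include <ctype.h>

/* Hypothetical token representation: a type plus an optional attribute. */
typedef enum { TOK_NUM, TOK_PLUS, TOK_END, TOK_ERROR } TokenType;

typedef struct {
    TokenType type;
    int value;   /* attribute; only meaningful for TOK_NUM */
} Token;

/* Reads one token starting at *input and advances *input past it. */
Token NextToken(const char **input)
{
    Token t = { TOK_END, 0 };
    const char *p = *input;

    while (*p == ' ')   /* spaces are dropped here, so the parser never sees them */
        p++;

    if (*p == '\0') {
        t.type = TOK_END;
    } else if (isdigit((unsigned char)*p)) {
        t.type = TOK_NUM;
        while (isdigit((unsigned char)*p))
            t.value = t.value * 10 + (*p++ - '0');
    } else if (*p == '+') {
        t.type = TOK_PLUS;
        p++;
    } else {
        t.type = TOK_ERROR;
        p++;
    }

    *input = p;
    return t;
}
```

Calling NextToken repeatedly on "12 + 1234" would yield <num, 12>, <+>, <num, 1234>, and then an end-of-input token.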
Overview of parsing
• The parser uses the tokens as terminals to build a parse tree
– Implicitly or explicitly
– Most often, the parser repeatedly “asks” the scanner for the next token
Overview of parsing
• The parser tries to determine which grammar rules to apply to build the parse tree
– No suitable rules found = syntax error
• Two main strategies: top-down or bottom-up
– Top-down parsing starts with the start symbol, i.e., the root of the parse tree
– Bottom-up parsing starts with the terminals, i.e., the leaves of the parse tree
Examples of grammars
• Lists of space-separated digits like 1 9 7 4 5
• Possible solution, assuming non-empty lists:
digit_list → digit digit_list
| digit
• Note:
– digit is a terminal: the name of a token, of which the actual integer value is an attribute
– The spaces are assumed to have been removed in the lexical analysis; therefore they are not present in the grammar
Examples of grammars
• Simple expressions, e.g.,
id + id + id
id + id
E → E + id
E → id
• Note: here '+' is a token (terminal) as well as id
Examples of grammars
• Grammar for a “begin-end” block in the Pascal language:
block → begin stmt_list end
stmt_list → stmt_list ; stmt
| stmt
stmt → assign
| if …
… (more statement types)
Exercise (1)
Write a grammar for the language that allows
declarations of a single integer array with
initialization in C. The initialization list is not allowed to be empty.
Example:
int arr[2] = {1, 2, 42};
Note: do not worry about matching the number of
elements in the initialization with the array size.
What are suitable tokens?
What change is needed in order to allow the
initialization list to be empty?
Top-down parsing
• Also called predictive parsing
• Works like this:
– Creates the root of the parse tree
– Repeatedly expands non-terminal nodes in the parse tree, i.e., adds children to them, until the tree is finished or the parser gets stuck (syntax error)
– Which grammar rules to apply is predicted by looking at the input
• In lab 1 you will implement a variant known as recursive descent
Recursive descent – example
• Grammar:
S → num C
C → , S
C → ;
• Example strings:
3;
5, 7, 9;
1, 2, 3, 4, 5;
Recursive descent – example
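/* Token types and scanner interface. These are assumed to be
   provided by the scanner; the slides do not show them. */
enum { NUM, COMMA, SEMICOLON };
int Lookahead(void);   /* type of the next token, without consuming it */
void Consume(void);    /* advance to the next token */

/* One function per non-terminal, declared before use. */
int ExpectS(void);
int ExpectC(void);

int main(void)
{
    // 1 = OK
    // 0 = syntax error
    return ExpectS();
}

/* S → num C */
int ExpectS(void)
{
    if (Lookahead() == NUM)
    {
        Consume();
        return ExpectC();
    }
    else return 0;
}

/* C → , S
   C → ;   */
int ExpectC(void)
{
    switch (Lookahead())
    {
    case COMMA:
        Consume();
        return ExpectS();
    case SEMICOLON:
        Consume();
        return 1;
    default:
        return 0;
    }
}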
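The parser on this slide relies on Lookahead() and Consume(), which are not shown. The following is a minimal sketch of one way to supply them so that the example can be tried out on the example strings. The string-based scanner, the Parse() wrapper, the END and ERROR token types, and the check that all input has been consumed are assumptions added for illustration, not part of the original example.

```c
#include <ctype.h>

enum { NUM, COMMA, SEMICOLON, END, ERROR };

static const char *input;   /* remaining input (hypothetical scanner state) */

static void SkipSpaces(void)
{
    while (*input == ' ')
        input++;
}

/* Returns the type of the next token without consuming it. */
int Lookahead(void)
{
    SkipSpaces();
    if (*input == '\0') return END;
    if (isdigit((unsigned char)*input)) return NUM;
    if (*input == ',') return COMMA;
    if (*input == ';') return SEMICOLON;
    return ERROR;
}

/* Advances past the next token. */
void Consume(void)
{
    SkipSpaces();
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input))
            input++;
    } else if (*input != '\0') {
        input++;
    }
}

int ExpectS(void);
int ExpectC(void);

/* S → num C */
int ExpectS(void)
{
    if (Lookahead() == NUM) {
        Consume();
        return ExpectC();
    }
    return 0;
}

/* C → , S | ; */
int ExpectC(void)
{
    switch (Lookahead()) {
    case COMMA:
        Consume();
        return ExpectS();
    case SEMICOLON:
        Consume();
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if text can be derived from S, 0 on a syntax error. */
int Parse(const char *text)
{
    input = text;
    return ExpectS() && Lookahead() == END;
}
```

With this sketch, Parse("5, 7, 9;") returns 1, while Parse("5, 7") returns 0 because ExpectC() finds neither a comma nor a semicolon.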
Using the recursive descent technique
• The previous parser merely determines whether or not the input program is correct
• However, by inserting semantic actions (code segments) into the parser, a syntax-directed translation can be performed during the parse
• We will look at this later
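As a small preview, here is a sketch of what a semantic action could look like for the grammar S → num C. It is only an illustration under assumptions: the parser reads characters directly instead of using a separate scanner, the action simply adds each number to a running sum, and the names SumList, ParseS, ParseC, cursor, and sum are hypothetical.

```c
#include <ctype.h>

static const char *cursor;   /* current input position (hypothetical state) */
static int sum;              /* result built up by the semantic action */

static void SkipSpaces(void)
{
    while (*cursor == ' ')
        cursor++;
}

static int ParseC(void);

/* S → num { sum += num.value } C */
static int ParseS(void)
{
    SkipSpaces();
    if (!isdigit((unsigned char)*cursor))
        return 0;

    int value = 0;
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');

    sum += value;            /* the semantic action */
    return ParseC();
}

/* C → , S | ; */
static int ParseC(void)
{
    SkipSpaces();
    switch (*cursor) {
    case ',':
        cursor++;
        return ParseS();
    case ';':
        cursor++;
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 and stores the sum in *result on success, 0 on a syntax error. */
int SumList(const char *text, int *result)
{
    cursor = text;
    sum = 0;
    if (!ParseS())
        return 0;
    *result = sum;
    return 1;
}
```

The structure of the parser is unchanged; only the single line marked as the semantic action turns the recognizer into a translator that computes a value while parsing.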
AMBIGUOUS GRAMMARS
Writing parsers from context-free grammars
• Different grammars may describe the same language. Example:
S → e S | e
and
S → S e | e
describe the same language, a non-empty sequence of e's
• The preferred form of the grammar depends on the parsing strategy used
Ambiguous grammars
• A grammar is ambiguous if it is possible to build more than one parse tree for some string that it produces
– It is still a valid grammar for the language
– This might make it hard to use the grammar to write a parser
– The grammar doesn't guide the parsing algorithm in making decisions
Exercise (2)
Show that the following grammar is
ambiguous, by building two different parse
trees for some string produced by the
grammar
expr → expr + expr
| expr - expr
| num
Handling ambiguity
• Ignore it
– Bad for the semantic analysis
• Rewrite the grammar
• Handle it carefully in the parser
• Explicit directives to the parser generator
• Which parse tree is preferred?
Rewriting the expression grammar
• The grammar can be rewritten to an unambiguous form and still describe the same language
• However, preferably the (unique) parse trees should reflect the order in which the operators (+ and -) are applied
– Application order is specified by operator associativity and operator precedence (described later)
Operator associativity
• Binary operators are often left-associative, e.g., +, -, *, and /
• This means that if an operand is surrounded by two operators of the same type, the left operator should be applied before the right one
• Examples:
3 - 7 - 9 = (3 - 7) - 9
a - (b + c) - d = (a - (b + c)) - d
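To see why the grouping matters, the two possible readings of a chain of subtractions can be compared directly. The sketch below uses hypothetical helper names (SubtractLeft, SubtractRight); it is only meant to show that the two groupings give different results.

```c
/* Left-associative reading: ((v[0] - v[1]) - v[2]) - ...   Requires n >= 1. */
int SubtractLeft(const int *v, int n)
{
    int result = v[0];
    for (int i = 1; i < n; i++)
        result = result - v[i];
    return result;
}

/* Right-associative reading, for contrast: v[0] - (v[1] - (v[2] - ...)). */
int SubtractRight(const int *v, int n)
{
    int result = v[n - 1];
    for (int i = n - 2; i >= 0; i--)
        result = v[i] - result;
    return result;
}
```

For the operands 3, 7, 9 the left-associative reading gives (3 - 7) - 9 = -13, which is also what the C expression 3 - 7 - 9 evaluates to, whereas the right-associative reading gives 3 - (7 - 9) = 5.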
Rewriting the expression grammar
• We rewrite the ambiguous grammar
expr → expr + expr
| expr - expr
| num
as
expr → expr + num
| expr - num
| num
• Both grammars describe exactly the same language, but the latter does so unambiguously and also reflects the left associativity
Rewriting the expression grammar
• In this particular case the ambiguity could be resolved by using operator associativity
• In general we do not aim to express semantics with the grammar
• There is no general method for rewriting ambiguous grammars to unambiguous ones
Operator precedence
• In addition to associativity, operators have a precedence level
• Example: * and / have higher precedence than + and -. This means that
a + b * c = a + (b * c)
although both + and * are left-associative
• Operators with higher precedence are always applied before those with lower precedence
• The application order for operators within the same precedence group is given by their associativity
Operator precedence in C
Operator      Associativity
* /           Left
+ -           Left
< <= > >=     Left
== !=         Left
Rows are listed from highest to lowest precedence.
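The two rules (higher precedence first, then associativity within the same level) can be written down as a small lookup. This is a sketch only: the function names and the numeric levels are hypothetical, chosen so that a larger number means higher precedence for the four operator groups in the table above.

```c
#include <string.h>

/* Precedence level for the operators in the table; larger binds tighter.
   Returns -1 for operators that are not in the table. */
int Precedence(const char *op)
{
    if (!strcmp(op, "*") || !strcmp(op, "/")) return 4;
    if (!strcmp(op, "+") || !strcmp(op, "-")) return 3;
    if (!strcmp(op, "<") || !strcmp(op, "<=") ||
        !strcmp(op, ">") || !strcmp(op, ">=")) return 2;
    if (!strcmp(op, "==") || !strcmp(op, "!=")) return 1;
    return -1;
}

/* For an operand surrounded by two operators, "x left y right z":
   returns 1 if the left operator is applied first, 0 otherwise. */
int LeftAppliedFirst(const char *left, const char *right)
{
    /* Equal precedence falls back to associativity, and every
       operator in the table is left-associative. */
    return Precedence(left) >= Precedence(right);
}
```

For a + b * c, LeftAppliedFirst("+", "*") is 0, so * is applied first; for 3 - 7 - 9, LeftAppliedFirst("-", "-") is 1, so the left subtraction is applied first.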
Exercise (3)
The previous grammar contained only + and -, which have the
same precedence. Let's add * and / to the grammar as well:
expr → expr + num
| expr - num
| expr * num
| expr / num
| num
Rewrite this grammar to reflect the operator precedence (it is
already unambiguous, and the associativity is already reflected)
Tip: operators on the same precedence level can be handled
identically
“Dangling-else”
• Grammar for if-else statements:
stmt → if ( expr ) stmt else stmt
| if ( expr ) stmt
| other
• Problematic program:
if (expr) if (expr) other else other
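The same problem can be observed in C, whose if statement has the same shape. C resolves the ambiguity by associating each else with the nearest preceding if that can take it. The sketch below (the function name Dangling is hypothetical) reports which branch runs; some compilers warn about the missing braces, which is exactly the ambiguity in question.

```c
/* Returns 1 or 2 depending on which assignment runs, 0 if neither does. */
int Dangling(int a, int b)
{
    int taken = 0;
    if (a) if (b) taken = 1; else taken = 2;
    return taken;
}
```

Dangling(1, 0) returns 2 and Dangling(0, 1) returns 0, which shows that the else belongs to the inner if: with a false outer condition, neither branch is taken.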
Conclusion
• The parser builds a parse tree (or syntax tree), either explicitly or implicitly, by grouping tokens provided by the scanner using productions of the grammar
• There can be several grammars for the same language
• Ambiguous grammars can sometimes be rewritten as unambiguous grammars
Next time
• Recursive descent parsers
• Left recursion
• Left factoring