Chapter 4: Syntax Analysis Part 1: Grammar Concepts

advertisement
Chapter 4: Syntax Analysis
Part 1: Grammar Concepts
CSE4100
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to:
Laurent Michel
Aggelos Kiayias
Robert LeBarre
CH4p1.1
Syntax Analysis - Parsing

CSE4100 






An overview of parsing : Functions &
Responsibilities
Context Free Grammars: Concepts & Terminology
Writing and Designing Grammars
Resolving Grammar Problems / Difficulties
Top-Down Parsing
 Recursive Descent & Predictive LL
Bottom-Up Parsing
 LR & LALR
Key Issues in Error Handling
Concluding Remarks/Looking Ahead
CH4p1.2
An Overview of Parsing
Why are Grammars to formally describe Languages
Important ?
CSE4100
1. Precise, easy-to-understand representations
2. Compiler-writing tools can take grammar and
generate a compiler
3.
allow language to be evolved (new statements,
changes to statements, etc.) Languages are not
static, but are constantly upgraded to add new
features or fix “old” ones
ADA  ADA9x, C++ Adds: Templates, exceptions,
How do grammars relate to parsing process ?
CH4p1.3
Parsing During Compilation
regular
expressions
errors
CSE4100
source
program
lexical
analyzer
token
get next
token
• uses a grammar to check
structure of tokens
• produces a parse tree
• syntactic errors and
recovery
• recognize correct syntax
• report errors
parser
symbol
table
parse
tree
rest of
front end
interm
repres
• also technically part
or parsing
• includes augmenting
info on tokens in
source, type checking,
semantic analysis
CH4p1.4
Parsing Responsibilities
Syntax Error Identification / Handling
CSE4100 Recall typical error types:
Lexical : Misspellings
Syntactic : Omission, wrong order of tokens
Semantic : Incompatible types
Logical : Infinite loop / recursive call
Majority of error processing occurs during syntax analysis
NOTE: Not all errors are identifiable !! Which ones?
CH4p1.5
Key issues in Error Handling
Detection
 Actual Detection
 Finding position at which they occur
 Reporting

CSE4100
Clear / accurate presentation
 Recovery



How to skip over to continue and find later errors
Cannot impact compilation of correct programs
CH4p1.6
What are some Typical Errors ?
#include<stdio.h>
int f1(int v)
{
int i,j=0;
As reported by MS VC++
for (i=1;i<5;i++)
CSE4100
{ j=v+f2(i) }
return j; }
int f2(int u)
{
'f2' undefined; assuming extern returning int
syntax error : missing ';' before ‘}‘
syntax error : missing ';' before ‘return‘
fatal error : unexpected end of file found
int j;
j=u+f1(u*u)
return j; }
int main()
{
int i,j=0;
for (i=1;i<10;i++)
{
Which are “easy”
to recover from?
Which are “hard” ?
j=j+i*I; printf(“%d\n”,i);
printf("%d\n",f1(j));
return 0;
}
CH4p1.7
Error Recovery Strategies
Panic Mode – Discard tokens until a “synchro” token is
found ( end, “;”, “}”, etc. )
CSE4100
-- Decision of designer
-- Problems:
skip input miss declaration – causing more errors
miss errors in skipped material
-- Advantages:
simple suited to 1 error per statement
Phase Level – Local correction on input
-- “,” ”;” – Delete “,” – insert “;”
-- Also decision of designer
-- Not suited to all situations
-- Used in conjunction with panic mode to
allow less input to be skipped
CH4p1.8
Error Recovery Strategies – (2)
Error Productions:
-- Augment grammar with rules
-- Augment grammar used for parser
CSE4100
construction / generation
-- Add a rule for
:= in C assignment statements
Report error but continue compile
-- Self correction + diagnostic messages
-- What are other rules ? (supported by yacc)
Global Correction:
-- Adding / deleting / replacing symbols is
chancy – may do many changes !
-- Algorithms available to minimize changes
costly - key issues
What do you think of each approach? What approach do compilers
you’ve used take ?
CH4p1.9
Motivating Grammars
• Regular Expressions
CSE4100
 Basis of lexical analysis
 Represent regular languages
• Context Free Grammars
 Basis of parsing
 Represent language constructs
 Characterize context free languages
Reg. Lang.
CFLs
EXAMPLE: anbn , n  1 : Is it regular ?
CH4p1.10
Context Free Languages
Regular Languages
Context Free
Languages
CSE4100
Token Structure
Sentence Structure
Scanner Generation
Reg.
Languag
es
Parser Generation
Context Free
Languages
CH4p1.11
Context Free Grammars :
Concepts & Terminology
Definition: A Context Free Grammar, CFG, is described
by T, NT, S, PR, where:
CSE4100
T: Terminals / tokens of the language
NT: Non-terminals to denote sets of strings generatable
by the grammar & in the language
S: Start symbol, SNT, which defines all strings of the
language
PR: Production rules to indicate how T and NT are
combines to generate valid strings of the language.
PR: NT  (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free
Grammar is a mathematical model !
CH4p1.12
Context Free Grammars : A First Look
assign_stmt  id := expr ;
expr  term operator term
CSE4100
term  id
term  real
term  integer
What do “BLUE”
symbols represent?
What do “BLACK”
symbols represent?
operator  +
operator  -
Derivation: A sequence of grammar rule applications
and substitutions that transform a starting non-term
into a collection of terminals / tokens.
Simply stated: Grammars / production rules allow us to
“rewrite” and “identify” correct syntax.
CH4p1.13
How is Grammar Used ?
Given the rules on the previous slide, suppose
CSE4100
id := real + int;
is input. Is it syntactically correct? How do we know?
expr is represented as: expr  term operator term
Is this accurate / complete?
expr  expr operator term
expr  term
How does this affect
the derivations which
are possible?
CH4p1.14
Derivation and Parse Tree
Let’s derive: id := id + real – integer ;
CSE4100
What’s its parse tree ?
CH4p1.15
Derivation Example
Let’s
CSE4100
derive: id := id + real – integer ;
assign_stmt
assign_stmt → id := expr ;
→ id := expr ;
expr
→ expr operator term
→ id := expr operator term;
expr
→ expr operator term
→ id := expr operator term operator term;
expr
→ term
→ id := term operator term operator term;
term
→ id
→ id := id operator term operator term;
operator
→+
→ id := id + term operator term;
term
→ real
→ id := id + real operator term;
operator
→-
→ id := id + real - term;
term
→ integer
→ id := id + real - integer;
assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id
term → real
term → integer
operator → +
operator → CH4p1.16
Example Grammar
CSE4100
expr  expr op expr
expr  ( expr )
expr  - expr
expr  id
op  +
op  op  *
op  /
op  
Black : NT
Blue : T
expr : S
9 Production rules
To simplify / standardize notation, we offer a
synopsis of terminology.
CH4p1.17
Example Grammar - Terminology
Terminals: a,b,c,+,-,punc,0,1,…,9, blue strings
CSE4100
Non Terminals: A,B,C,S, black strings
T or NT: X,Y,Z
Strings of Terminals: u,v,…,z in T*
Strings of T / NT:  ,  in ( T  NT)*
Alternatives of production rules:
A 1; A 2; …; A k;  A  1 | 2 | … | 1
First NT on LHS of 1st production rule is designated as
start symbol !
E  E A E | ( E ) | -E | id
A+|-|*| / |
CH4p1.18
Grammar Concepts
A step in a derivation is zero or one action that
replaces a NT with the RHS of a production rule.
CSE4100
EXAMPLE: E  -E (the  means “derives” in one
step) using the production rule: E  -E
EXAMPLE: E  E A E  E * E  E * ( E )
DEFINITION:  derives in one step
+
 derives in  one step
*

derives in  zero steps
EXAMPLES:  A      if A  is a production rule
*
*
1  2 …  n  1 
n ;  
 for all 
*  and    then  
* 
If  
CH4p1.19
How does this relate to Languages?
+
Let G be a CFG with start symbol S. Then S W
(where W has no non-terminals) represents the language
+
CSE4100 generated by G, denoted L(G). So WL(G)  S W.
W : is a sentence of G
When S  (and  may have NTs) it is called a
sentential form of G.
EXAMPLE: id * id is a sentence
Here’s the derivation:
E  E A E E * E  id * E  id * id
Sentential forms
*
E
id * id
CH4p1.20
Leftmost and Rightmost Derivations
Leftmost: Replace the leftmost non-terminal symbol
CSE4100
E  E A E  id A E  id * E  id * id
lm
lm
lm
lm
Rightmost: Replace the leftmost non-terminal symbol
E rm
 EAE 
E A id 
E * id 
id * id
rm
rm
rm
Important Notes:
A
If A     , what’s true about  ?
lm
If A     , what’s true about  ?
rm
Derivations: Actions to parse input can be represented
pictorially in a parse tree.
CH4p1.21
Examples of LM / RM Derivations
E  E A E | ( E ) | -E | id
CSE4100
A+|-|*| / |
A leftmost derivation of : id + id * id
A rightmost derivation of : id + id * id
CH4p1.22
Derivations & Parse Tree
E
E EAE
E
CSE4100
A
E
E
E*E
E
A
E
*
E
 id * E
 id * id
E
A
E
id
*
E
E
A
E
id
*
id
CH4p1.23
Parse Trees and Derivations
Consider the expression grammar:
E  E+E | E*E | (E) | -E | id
CSE4100
Leftmost derivations of
id + id * id
E
E
EE+E
E
+
E + E  id + E
E
E
+
E
id
E
id + E  id + E * E
E
+
E
id
E
*
E
CH4p1.24
Parse Tree & Derivations - continued
E
CSE4100
id + E * E  id + id * E
E
+
E
id
E
*
E
id
E
id + id * E  id + id * id
E
+
E
id
E
*
id
E
id
CH4p1.25
Alternative Parse Tree & Derivation
CSE4100
EE*E
E
E+E*E
 id + E * E
 id + id * E
E
id
E
*
E
+
E
id
id
 id + id * id
WHAT’S THE ISSUE HERE ?
CH4p1.26
Example Pascal Grammar in Yacc
%term
CSE4100
%binary
%left
%left
%left
%left
T_AND
T_CONST
T_TO
T_FOR
T_IF
T_MOD
T_OR
T_PROGRAM
T_STRING
T_UNTIL
T_ARRAY
T_BEGIN
T_DIV
T_DO
T_ELSE
T_END
T_FUNCTION
/* T_GOTO */
T_IN
T_INTEGER
T_NOT
T_REAL
/* T_PACKED */ T_NIL
T_RECORD
T_REPEAT
T_THEN
T_DOWNTO
T_VAR
T_WHILE
T_CASE
T_RANGE
T_FILE
T_IDENTIFIER
T_LABEL
T_OF
T_PROCEDURE
T_SET
T_TYPE
/* T_WITH */
T_COMMA
T_LPAREN
T_UPARROW
T_PERIOD
T_RPAREN
T_SEMI
T_COLON
T_RBRACK
T_LT
T_EQ
T_GT
T_PLUS T_MINUS
UNARYSIGN
T_MULT T_RDIV T_DIV
T_NOT
T_ASSIGN
T_LBRACK
T_GE
T_OR
T_LE
T_MOD
T_AND
T_NE
T_IN
%%
CH4p1.27
Example Pascal Grammar in Yacc
Start : Program_Header Declarations Block T_PERIOD
Program_Header : T_PROGRAM T_IDENTIFIER T_LPAREN Id_List T_RPAREN T_SEMI
CSE4100
Block : T_BEGIN Stt_List T_END
Declarations : Const_Def_Part
Type_Def_Part
Var_Decl_Part
Routine_Decl_Part
Const_Def_Part : T_CONST Const_Def_List
| /* epsilon */ ;
Const_Def_List : Const_Def
| Const_Def_List Const_Def
;
Const_Def : T_IDENTIFIER T_EQ Constant T_SEMI
Type_Def_Part : T_TYPE Type_Def_List
| /* epsilon */
Type_Def_List : Type_Def
| Type_Def_List Type_Def
CH4p1.28
Example Pascal Grammar in Yacc
Type_Def : T_IDENTIFIER T_EQ Type T_SEMI
CSE4100
Var_Decl_Part : T_VAR Var_Decl_List
| /* epsilon */
Var_Decl_List : Var_Decl_List Var_Decl
| Var_Decl
Var_Decl : Id_List T_COLON Type T_SEMI
Routine_Decl_Part : Routine_Decl_Part Routine_Decl
| /* epsilon */
Routine_Decl : Routine_Head T_IDENTIFIER T_SEMI
| Routine_Head Declarations Block T_SEMI
Routine_Head : T_PROCEDURE T_IDENTIFIER Params T_SEMI
| T_FUNCTION T_IDENTIFIER Params Function_Type T_SEMI
Params : T_LPAREN Params_List T_RPAREN
| /* epsilon */
Param_Group : Id_List T_COLON Type
| T_VAR Id_List T_COLON Type
CH4p1.29
Example Pascal Grammar in Yacc
Function_Type : T_COLON Type
| /* epsilon */
CSE4100
Params_List : Param_Group
| Params_List T_SEMI Param_Group
Constant : T_STRING | Number | T_PLUS Number | T_MINUS Number
Number : T_IDENTIFIER | T_INTEGER | T_REAL
Const_List : Constant | Const_List T_COMMA Constant
Type : Simple_Type | T_UPARROW T_IDENTIFIER | Structured_Type
Simple_Type : T_IDENTIFIER
| T_LPAREN Id_List T_RPAREN
| Constant T_RANGE Constant
Structured_Type : T_ARRAY T_LBRACK Simple_Type_List T_RBRACK T_OF Type
| T_SET T_OF Simple_Type
| T_RECORD Field_List T_END
Simple_Type_List : Simple_Type
| Simple_Type_List T_COMMA Simple_Type
CH4p1.30
Example Pascal Grammar in Yacc
Field_List : Fixed_Part /* Variant_part */
Fixed_Part : Field | Fixed_Part T_SEMI Field
CSE4100
Field : /* epsilon */ | Id_List T_COLON Type
Variant_part : / * epsilon * /
| T_CASE T_IDENTIFIER T_OF Variant_List
| T_CASE T_IDENTIFIER T_COLON T_IDENTIFIER T_OF Variant_List
Variant_List : Variant | Variant_List T_SEMI Variant
Variant : / * epsilon * /
| Const_List T_COLON T_LPAREN Field_List T_RPAREN
Stt_List : Statement | Stt_List T_SEMI Statement
Case_Stt_List : Case_Statement | Case_Stt_List T_SEMI Case_Statement
Case_Statement : Const_List T_COLON Statement | /* epsilon */
CH4p1.31
Example Pascal Grammar in Yacc
CSE4100
Statement : /* epsilon */
| T_IDENTIFIER | T_IDENTIFIER T_LPAREN Expr_List T_RPAREN
| Variable T_ASSIGN Expression | T_BEGIN Stt_List T_END
| T_CASE Expression T_OF Case_Stt_List T_END
| T_WHILE Expression T_DO Statement
| T_REPEAT Stt_List T_UNTIL Expression
| T_FOR Variable T_ASSIGN Expression T_TO Expression T_DO
Statement
| T_FOR Variable T_ASSIGN Expression T_DOWNTO Expression T_DO
Statement
| T_IF Expression T_THEN Statement
| T_IF Expression T_THEN Statement T_ELSE Statement
Expression : Expression relop Expression
| T_PLUS Expression
| T_MINUS Expression
| Expression addop Expression
| Expression divop Expression
| T_NIL | T_STRING | T_INTEGER | T_REAL
| Variable
| T_IDENTIFIER T_LPAREN Expr_List T_RPAREN
| T_LPAREN Expression T_RPAREN
| negop Expression
| T_LBRACK Element_List T_RBRACK
| T_LBRACK T_RBRACK
%prec
%prec
%prec
%prec
%prec
T_LT
UNARYSIGN
UNARYSIGN
T_PLUS
T_MULT
%prec T_NOT
CH4p1.32
Example Pascal Grammar in Yacc
Element_List : Element | Element_List T_COMMA Element
CSE4100
Element : Expression | Expression T_RANGE Expression
Variable : T_IDENTIFIER
| Variable T_LBRACK Expr_List T_RBRACK
| Variable T_PERIOD T_IDENTIFIER
| Variable T_UPARROW
Expr_List : Expression | Expr_List T_COMMA Expression
relop : T_EQ | T_LT | T_GT | T_NE | T_LE | T_GE | T_IN
addop : T_PLUS | T_MINUS | T_OR
divop : T_MULT | T_RDIV | T_DIV | T_MOD | T_AND ;
negop : T_NOT
CH4p1.33
Resolving Grammar Problems/Difficulties
Regular Expressions : Basis of Lexical Analysis
CSE4100
Reg. Expr.  generate/represent regular languages
Reg. Languages  smallest, most well defined class
of languages
Context Free Grammars: Basis of Parsing
CFGs  represent context free languages
CFLs  contain more powerful languages
Reg. Lang.
CFLs
anbn – CFL that’s
not regular!
Try to write DFA/NFA for it !
CH4p1.34
Resolving Problems/Difficulties – (2)
Since Reg. Lang.  Context Free Lang., it is possible
to go from reg. expr. to CFGs via NFA.
CSE4100
Recall: (a | b)*abb
a
start
a
0
1
b
2
b
3
b
CH4p1.35
Resolving Problems/Difficulties – (3)
Construct CFG as follows:
1.
Each State I has non-terminal Ai : A0, A1, A2, A3
2.
If
CSE4100
3.
If
i
a
i
b
j
j
then Ai a Aj
then Ai Aj
: A0 aA0, A0 aA1
: A0 bA0, A1 bA2
4.
: A2 bA3
If I is an accepting state, Ai  : A3  
5.
If I is a starting state, Ai is the start symbol : A0
T={a,b}, NT={A0, A1, A2, A3}, S = A0
PR ={ A0 aA0 | aA1 | bA0 ;
A1  bA2 ;
A2  3A3 ;
A3   }
CH4p1.36
How Does This CFG Derive Strings ?
a
start
CSE4100
a
0
1
b
2
b
3
b
vs.
A0 aA0, A0 aA1
A0 bA0, A1 bA2
A2 bA3, A3 
How is abaabb derived in each ?
CH4p1.37
Regular Expressions vs. CFGs
CSE4100
Regular expressions for lexical syntax
1. CFGs are overkill, lexical rules are quite
simple and straightforward
2. REs – concise / easy to understand
3. More efficient lexical analyzer can be
constructed
4. RE for lexical analysis and CFGs for parsing
promotes modularity, low coupling & high
cohesion. Why?
CFGs : Match tokens “(“ “)”, begin / end, if-then-else,
whiles, proc/func calls, …
Intended for structural associations between tokens !
Are tokens in correct order ?
CH4p1.38
Resolving Grammar Difficulties :
Motivation
1. Humans write / develop grammars
2. Different parsing approaches have different needs
CSE4100
LL(k)
Recursive
LR(k)
LALR(k)
Top-Down vs. Bottom-Up
• For: 1  remove “errors”
• For: 2  put / redesign grammar
- ambiguity
- -moves
Grammar
- cycles
Problems
- left recursion
- left factoring
CH4p1.39
Resolving Problems: Ambiguous
Grammars
Consider the following grammar segment:
stmt  if expr then stmt
CSE4100
| if expr then stmt else stmt
| other (any other statement)
What’s problem here ?
Consider the Program:
if e1
then if e2
then s1
else s2
Else must match to previous then.
Structure indicates parse subtree for expression.
CH4p1.40
Resulting Parse Tree

CSE4100
Easy case
 Else must match to previous then.
if e1
then s1
else if e2
then s2
else s3
CH4p1.41
Example : What Happens with this string?
If E1 then if E2 then S1 else S2
CSE4100
How is this parsed ?
if E1 then
if E2 then
S1
else
S2
vs.
if E1 then
if E2 then
S1
else
S2
What’s the issue here ?
CH4p1.42
Parse Trees for Example
Form 1:
CSE4100
if e1
then if e2
then s1
else s2
Form 2:
What’s the issue here ?
CH4p1.43
Removing Ambiguity
Take Original Grammar:
CSE4100
stmt  if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
Revise to remove ambiguity:
stmt  matched_stmt | unmatched_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
unmatched_stmt  if expr then stmt
| if expr then matched_stmt else unmatched_stmt
How does this grammar work ?
CH4p1.44
Resolving Difficulties : Left Recursion
A left recursive grammar has rules that support the
+
derivation : A 
A, for some .
CSE4100
Top-Down parsing can’t reconcile this type of grammar,
since it could consistently make choice which wouldn’t
allow termination.
A  A  A  A … etc. A A | 
Take left recursive grammar:
A  A | 
To the following:
A’  A’
A’  A’ | 
CH4p1.45
Why is Left Recursion a Problem ?
CSE4100
Consider:
EE+T | T
TT*F | F
F  ( E ) | id
Derive : id + id + id
EE+T
How can left recursion be removed ?
EE+T | T
What does this generate?
EE+TT+T
EE+TE+T+TT+T+T

How does this build strings ?
What does each string have to start with ?
CH4p1.46
Resolving Difficulties : Left Recursion (2)
Informal Discussion:
CSE4100
Take all productions for A and order as:
A  A1 | A2 | … | Am | 1 | 2 | … | n
Where no i begins with A.
Now apply concepts of previous slide:
A  1A’ | 2A’ | … | nA’
A’  1A’ | 2A’| … | m A’ | 
For our example:
EE+T | T
TT*F | F
F  ( E ) | id
E  TE’
E’  + TE’ | 
F  ( E ) | id
T  FT’
T’  * FT’ | 
CH4p1.47
Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep,
this isn’t enough
S  Aa | b
CSE4100
A  Ac | Sd | 
S  Aa  Sda
Algorithm:
Input: Grammar G with no cycles or -productions
Output: An equivalent grammar with no left recursion
1.
Arrange the non-terminals in some order A1,A2,…An
2.
for i := 1 to n do begin
for j := 1 to i – 1 do begin
replace each production of the form Ai  Aj
by the productions Ai  1 | 2 | … | k
where Aj  1|2|…|k are all current Aj productions;
end
eliminate the immediate left recursion among Ai productions
end
CH4p1.48
Using the Algorithm
Apply the algorithm to:
A1  A2a | b
A2  A2c | A1d | 
CSE4100
i=1
For A1 there is no left recursion
i=2
for j=1 to 1 do
Take productions: A2  A1 and replace with
A2  1  | 2  | … | k |
where
A1 1 | 2 | … | k are A1 productions
in our case A2  A1d becomes A2  A2ad | bd
What’s left: A1 A2a | b
A2  A2 c | A2 ad | bd | 
Are we done ?
CH4p1.49
Using the Algorithm (2)
No ! We must still remove A2 left recursion !
CSE4100
A1 A2a | b
A2  A2 c | A2 ad | bd | 
Recall:
A  A1 | A2 | … | Am | 1 | 2 | … | n
A  1A’ | 2A’ | … | nA’
A’  1A’ | 2A’| … | m A’ | 
Apply to above case. What do you get ?
CH4p1.50
Removing Difficulties : -Moves
Very Simple: A  and B uAv implies add B uv
to the grammar G.
CSE4100
Why does this work ?
Examples:
E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
A1  A2 a | b
A2  bd A2’ | A2’
A2’  c A2’ | bd A2’ | 
CH4p1.51
Removing Difficulties : Cycles
How would cycles be removed ?
CSE4100
S  SS | ( S ) | 
Has: S  SS  S
S
CH4p1.52
Removing Difficulties : Left Factoring
Problem : Uncertain which of 2 rules to choose:
CSE4100
stmt  if expr then stmt else stmt
| if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A  1 | 2
 : if expr then stmt
1: else stmt 2 : 
Transform to:
A   A’
A’  1 | 2
EXAMPLE:
stmt  if expr then stmt rest
rest  else stmt | 
CH4p1.53
Resolving Grammar Problems :
Another Look
• Forcing Matching Within Grammar
CSE4100
- if-then-else  else must go with most recent if
stmt  if expr then stmt | if expr then stmt else stmt | other
Leads to:
stmt  matched_stmt | unmatched_stmt
matched_stmt  if expr then matched_stmt else matched_stmt | other
unmatched_stmt  if expr then stmt
| if expr then matched_stmt else unmatched_stmt
CH4p1.54
Resolving Grammar Problems :
Another Look (2)
• Left Factoring - Know correct choice in advance !
CSE4100
where_queries  where are_does declarations of symbol
| where are_does “search_option” appear
Rules given in Assignment or
where_queries  where rest_where
rest_where  are declarations of symbol
| does “search_option” appear
CH4p1.55
Resolving Grammar Problems :
Another Look (3)
• Left Recursion - Avoid infinite loop when choosing rules
CSE4100
A  A | 
EXAMPLE:
EE+T | T
TT*F | F
F  ( E ) | id
A  A’
A’  A’ | 
E  TE’
E’  + TE’ | 
F  ( E ) | id
T  FT’
T’  * FT’ | 
If we use the original production rules on input: id + id (with leftmost):
* T+T+T+…+T
EE+TE+T+TE+T+T+T
not assured
If we use the transformed production rules on input: id + id (with leftmost):
* id + TE’ 
* id + id E’  id + id
E  TE’  T + TE’ 
CH4p1.56
Resolving Grammar Problems :
Another Look (4)
• Removing -Moves
CSE4100
E  TE’
E’  + TE’ | 
ET
E  TE’
E’  + TE’
E’  + T
Is there still a problem ?
CH4p1.57
Resolving Grammar Problems
Note: Not all aspects of a programming language can be
represented by context free grammars / languages.
CSE4100
Examples:
1. Declaring ID before its use
2. Valid typing within expressions (Type Rules)
3. Parameters in definition vs. in call
4. Method Overloading/hiding
These features are call context-sensitive and define yet
another language class, CSL.
Reg. Lang.
CFLs
CSLs
CH4p1.58
Context-Sensitive Languages - Examples
Examples:
CSE4100
L1 = { wcw | w is in (a | b)* } : Declare before use
L2 = { an bm cn dm | n  1, m  1 }
an bm : formal parameter
cn dm : actual parameter
What does it mean when L is context-sensitive ?
CH4p1.59
How do you show a Language is a CFL?
L3 = { w c wR | w is in (a | b)* }
CSE4100
L4 = { an bm cm dn | n  1, m  1 }
L5 = { an bn cm dm | n  1, m  1 }
L6 = { an bn | n  1 }
CH4p1.60
Solutions
L3 = { w c wR | w is in (a | b)* }
CSE4100
SaSa | bSb | c
L4 = { an bm cm dn | n  1, m  1 }
SaSd | aAd
A  b A c | bc
L5 = { an bn cm dm | n  1, m  1 }
S  XY
X  a X b | ab
Y  c Y d | cd
L6 = { an bn | n  1 }
S  a S b | ab
CH4p1.61
Download