Pushdown Automaton - Monash University

COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to
Part VB of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act. Any further reproduction or
communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
Lecture 12
Parsers
CSE2303 Formal Methods I
Overview
• Recursive Descent Parsers
• LR Parsers
• Bison
Parsers
• A parser for a grammar is a program.
– Input is a string.
– Decides whether the input can be generated by
the grammar.
• Two main types
– top-down parsers
– bottom-up parsers
Difficulties with
Top-down Parsers
• Left Recursive Grammars
i.e. A ⇒ … ⇒ Aw
• Error Handling
• Backtracking
– Allocating/Deallocating resources
– Undoing actions
LR Parser
• Bottom-up Parser
• Scan input Left to Right
• Construct the Rightmost derivation in
reverse
• Implemented using
– A Finite Automaton and a Stack.
Pros and Cons
• Benefits
– Can construct an LR parser to recognise most
CFGs used in practice
– The parsers are efficient
– Detect syntactical errors as soon as possible
• Disadvantages
– Cannot build an LR parser for every CFG
– Need a Parser Generator
Bison (yacc)
• Parser generator
– It writes an LR parser
• Assumption
– Grammar is an LALR(1) grammar
• Needs
– set of production rules
– An action for each rule
• Produces
– A C program
main()
{
    yyparse();              /* parse input */
}
int yyparse()
{
    yychar = yylex();       /* get next token */
}
int yyerror()
{
    return 0;               /* called if input cannot be parsed */
}
int yylex()
{
    return …;               /* return an int representing next token, read from the input */
}
Process
• example.y → bison → example.tab.c (contains yyparse()) and example.tab.h (contains definitions)
• example.l → flex → lex.yy.c (contains yylex())
• Write a bison program and a flex program
E.g. example.y and example.l
• Run them through bison and flex
bison -d example.y
flex example.l
• Compile the program with the flag -lfl
A Bison Program
… definitions …
%%
… rules …
%%
… subroutines …
bison does not handle carriage returns
Sections
• … definition section
– Code between %{ … %} copied.
– Definitions used to define tokens, types, etc.
• … rule section
– Pairs of production rules and actions.
– The production rules are from a CFG.
• … subroutine section
– Consists of the user's subroutines.
– Copied after the end of the bison-generated code.
– Must contain yyerror().
simple.c
int yyparse(void);      /* defined in simple.tab.c */

int
main()
{
    yyparse();
    return 0;
}
simple.l
%%
. {return yytext[0];}
\n {return 0;}
%%
simple.y
%{
#include <stdio.h>
int yylex(void);
int yyerror(char* s);
%}
%%
S: B B      {printf("S -> BB\n");}
 ;
B: 'a' B    {printf("B -> aB\n");}
 | 'b'      {printf("B -> b\n");}
 ;
%%
int yyerror(char* s)
{
    printf("%s\n", s);
    return 0;
}
gcc -o simple simple.c lex.yy.c simple.tab.c -lfl
Evaluation of 4+2*3
S → E
E → T | T+E
T → F | F*T
F → INT
S = 10
  E = 10
    T = 4
      F = 4
        INT (yylval = 4)
    +
    E = 6
      T = 6
        F = 2
          INT (yylval = 2)
        *
        T = 3
          F = 3
            INT (yylval = 3)
plusTimes.l
%{
#include <stdlib.h>
#include "plusTimes.tab.h"
%}
%%
-?[0-9]+    {
                yylval = atoi(yytext);
                return INT;
            }
[ \t]       ;   /* skip blanks and tabs */
.           {return yytext[0];}
\n          {return 0;}
%%
plusTimes.y
%{
#include <stdio.h>
int yylex(void);
int yyerror(char* s);
%}
%token INT
%%
S: E            {printf("%d\n", $1);}
 ;
E: T            {$$ = $1;}
 | T '+' E      {$$ = $1 + $3;}
 ;
T: F            {$$ = $1;}
 | F '*' T      {$$ = $1 * $3;}
 ;
F: INT          {$$ = $1;}
 ;
%%
…
RealPlusTimes.l
%{
#include <stdlib.h>
#include "RealPlusTimes.tab.h"
%}
Real -?([0-9]+|([0-9]*\.[0-9]+))
%%
{Real}      {
                yylval.dval = atof(yytext);
                return REAL;
            }
[ \t]       ;   /* skip blanks and tabs */
.           {return yytext[0];}
\n          {return 0;}
%%
RealPlusTimes.y
%{
#include <stdio.h>
int yylex(void);
int yyerror(char* s);
%}
%union{double dval;}
%token <dval> REAL
%type <dval> E T F
%%
S: E            {printf("%g\n", $1);}
 ;
E: T            {$$ = $1;}
 | T '+' E      {$$ = $1 + $3;}
 ;
T: F            {$$ = $1;}
 | F '*' T      {$$ = $1 * $3;}
 ;
F: REAL         {$$ = $1;}
 ;
%%
…
More Information
• Check the courseware web site.
• Man pages
– login and type:
xman bison
• Library
– “lex & yacc”, by John Levine et al.
– “Principles of Compiler Design”, by A.V. Aho and J.D.
Ullman
– “The Unix Programming Environment”, by
Kernighan & Pike. (see chapter 8)
Preparation
• Read
– Chapters 14-16 in the textbook.