COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
WARNING
This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice.

Lecture 12
Parsers
CSE2303 Formal Methods I

Overview
• Recursive Descent Parsers
• LR Parsers
• Bison

Parsers
• A parser for a grammar is a program.
  – Input is a string.
  – Decides whether the input can be generated by the grammar.
• Two main types
  – top-down parsers
  – bottom-up parsers

Difficulties with Top-down Parsers
• Left Recursive Grammars
  – i.e. A ⇒ … ⇒ Aw
• Error Handling
• Backtracking
  – Allocating/Deallocating resources
  – Undoing actions

LR Parser
• Bottom-up Parser
• Scans the input Left to Right
• Constructs the Rightmost derivation in reverse
• Implemented using
  – A Finite Automaton and a Stack.

Pros and Cons
• Benefits
  – Can construct an LR parser to recognise most CFGs
  – The parsers are efficient
  – Detect syntax errors as soon as possible
• Disadvantages
  – Can't build an LR parser for every CFG
  – Need a Parser Generator

Bison (yacc)
• Parser generator
  – It writes an LR parser
• Assumption
  – Grammar is an LALR(1) grammar
• Needs
  – A set of production rules
  – An action for each rule
• Produces
  – A C program

    main() {
        yyparse();              /* Parse input */
    }

    int yyparse() {
        yychar = yylex();       /* Get next token */
    }

    int yyerror() {             /* Called if input cannot be parsed */
        return 0;
    }

    int yylex() {               /* Reads the input */
        return …;               /* Return an int representing next token */
    }

Process
    example.y  →  bison  →  example.tab.c   (contains yyparse())
                         →  example.tab.h   (contains definitions)
    example.l  →  flex   →  lex.yy.c        (contains yylex())

• Write a bison program and a flex program, e.g.
  example.y and example.l
• Run them through bison and flex
      bison -d example.y
      flex example.l
• Compile the program with the flag -lfl

A Bison Program
    … definitions …
    %%
    … rules …
    %%
    … subroutines …
• Note: bison does not handle carriage returns (e.g. in files saved with DOS/Windows line endings).

Sections
• … definition section
  – Code between %{ … %} is copied into the generated C file.
  – Definitions are used to define tokens, types, etc.
• … rule section
  – Pairs of production rules and actions.
  – The production rules are from a CFG.
• … subroutine section
  – Consists of the user's subroutines.
  – Copied after the end of the bison generated code.
  – Must contain yyerror().

simple.c
    int yyparse(void);

    int main() {
        return yyparse();
    }

simple.l
    %%
    .       {return yytext[0];}
    \n      {return 0;}
    %%

simple.y
    %{
    #include <stdio.h>          /* printf */
    int yylex(void);
    int yyerror(char* s);
    %}
    %%
    S: B B          {printf("S -> BB\n");}
     ;
    B: 'a' B        {printf("B -> aB\n");}
     | 'b'          {printf("B -> b\n");}
     ;
    %%
    int yyerror(char* s) {
        printf("%s\n", s);
        return 0;
    }

    gcc -o simple simple.c lex.yy.c simple.tab.c -lfl

Evaluation of 4+2*3
    S → E
    E → T | T+E
    T → F | F*T
    F → INT

    Parse tree with the value computed at each node:
    S = 10
      E = 10
        T = 4
          F = 4
            INT   (yylval = 4)
        +
        E = 6
          T = 6
            F = 2
              INT   (yylval = 2)
            *
            T = 3
              F = 3
                INT   (yylval = 3)

plusTimes.l
    %{
    #include <stdlib.h>         /* atoi */
    #include "plusTimes.tab.h"
    %}
    %%
    -?[0-9]+    { yylval = atoi(yytext);
                  return INT; }
    [ \t]       ;               /* skip blanks and tabs */
    .           {return yytext[0];}
    \n          {return 0;}
    %%

plusTimes.y
    %token INT
    %%
    S: E            {printf("%d\n", $1);}
     ;
    E: T            {$$ = $1;}
     | T '+' E      {$$ = $1 + $3;}
     ;
    T: F            {$$ = $1;}
     | F '*' T      {$$ = $1 * $3;}
     ;
    F: INT          {$$ = $1;}
     ;
    %%
    …

RealPlusTimes.l
    %{
    #include <stdlib.h>         /* atof */
    #include "RealPlusTimes.tab.h"
    %}
    Real -?([0-9]+|([0-9]*\.[0-9]+))
    %%
    {Real}      { yylval.dval = atof(yytext);
                  return REAL; }
    [ \t]       ;               /* skip blanks and tabs */
    .           {return yytext[0];}
    \n          {return 0;}
    %%

RealPlusTimes.y
    %union{double dval;}
    %token <dval> REAL
    %type <dval> E T F
    %%
    S: E            {printf("%g\n", $1);}
     ;
    E: T            {$$ = $1;}
     | T '+' E      {$$ = $1 + $3;}
     ;
    T: F            {$$ = $1;}
     | F '*' T      {$$ = $1 * $3;}
     ;
    F: REAL         {$$ = $1;}
     ;
    %%
    …

More Information
• Check the courseware web site.
• Man pages
  – login and type: xman bison
• Library
  – "lex & yacc", by John Levine et al.
  – "Principles of Compiler Design", by A.V. Aho and J.D. Ullman
  – "The Unix Programming Environment", by Kernighan & Pike.
    (see chapter 8)

Preparation
• Read
  – Chapters 14-16 in the Text Book.
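
Worked example: a recursive descent parser for the 4+2*3 grammar

The Overview lists recursive descent parsers, but the slides only show bison code. As a rough illustration, the sketch below hand-writes a top-down parser for the grammar on the "Evaluation of 4+2*3" slide (S → E, E → T | T+E, T → F | F*T, F → INT), with one C function per non-terminal. It works here because the grammar is right recursive; a left recursive rule would make the corresponding function call itself forever, which is the first difficulty listed for top-down parsers. The names rd_pos, rd_skip_blanks, rd_parse_F, rd_parse_T, rd_parse_E and rd_evaluate are hypothetical and do not come from the lecture. Unlike the flex pattern -?[0-9]+, this sketch assumes unsigned integers only.

```c
#include <ctype.h>
#include <stdlib.h>

/* Cursor into the input string; advanced as characters are consumed. */
static const char *rd_pos;

/* Skip blanks and tabs, like the [ \t] rule in plusTimes.l. */
static void rd_skip_blanks(void)
{
    while (*rd_pos == ' ' || *rd_pos == '\t')
        rd_pos++;
}

/* F -> INT   (assumption: unsigned decimal integers only) */
static int rd_parse_F(int *value)
{
    char *end;

    rd_skip_blanks();
    if (!isdigit((unsigned char)*rd_pos))
        return 0;                       /* syntax error: expected INT */
    *value = (int)strtol(rd_pos, &end, 10);
    rd_pos = end;
    return 1;
}

/* T -> F | F '*' T */
static int rd_parse_T(int *value)
{
    int left, right;

    if (!rd_parse_F(&left))
        return 0;
    rd_skip_blanks();
    if (*rd_pos == '*') {
        rd_pos++;
        if (!rd_parse_T(&right))
            return 0;
        left *= right;                  /* same as {$$ = $1 * $3;} */
    }
    *value = left;
    return 1;
}

/* E -> T | T '+' E */
static int rd_parse_E(int *value)
{
    int left, right;

    if (!rd_parse_T(&left))
        return 0;
    rd_skip_blanks();
    if (*rd_pos == '+') {
        rd_pos++;
        if (!rd_parse_E(&right))
            return 0;
        left += right;                  /* same as {$$ = $1 + $3;} */
    }
    *value = left;
    return 1;
}

/* S -> E. Returns 1 and stores the result if the whole input parses,
   0 otherwise. */
int rd_evaluate(const char *input, int *value)
{
    rd_pos = input;
    if (!rd_parse_E(value))
        return 0;
    rd_skip_blanks();
    return *rd_pos == '\0';
}
```

Calling rd_evaluate("4+2*3", &v) returns 1 and sets v to 10, matching the value at the root of the parse tree on the slide. No backtracking is needed because one character of lookahead ('+' or '*') is enough to choose between the two alternatives of E and T.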
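
Worked example: an LR parser as a finite automaton and a stack

The "LR Parser" slide says an LR parser is implemented with a finite automaton and a stack, and builds the rightmost derivation in reverse. The sketch below tries to make that concrete for the simple.y grammar (S → B B, B → a B | b). The ACTION and GOTO tables were built by hand for this illustration (an SLR construction with 7 states); they are an assumption of this example and not necessarily the tables bison would generate. All names (lr_parse, lr_action, lr_goto, and so on) are hypothetical.

```c
enum { LR_ACCEPT = 100, LR_STACK_MAX = 256 };

/* Rules, numbered from 1:   1: S -> B B    2: B -> a B    3: B -> b */
static const int lr_rule_len[] = { 0, 2, 2, 1 };   /* right-hand side length */
static const int lr_rule_lhs[] = { 0, 0, 1, 1 };   /* 0 = S, 1 = B */

/* ACTION[state][terminal]; columns are 'a', 'b', end of input.
   n > 0: shift and go to state n.   n < 0: reduce by rule -n.
   LR_ACCEPT: accept.                0: syntax error. */
static const int lr_action[7][3] = {
    {  3,  4,  0 },             /* state 0: start               */
    {  0,  0, LR_ACCEPT },      /* state 1: S seen              */
    {  3,  4,  0 },             /* state 2: S -> B . B          */
    {  3,  4,  0 },             /* state 3: B -> a . B          */
    { -3, -3, -3 },             /* state 4: B -> b .            */
    {  0,  0, -1 },             /* state 5: S -> B B .          */
    { -2, -2, -2 },             /* state 6: B -> a B .          */
};

/* GOTO[state][non-terminal]; columns are S, B. 0 = no transition. */
static const int lr_goto[7][2] = {
    { 1, 2 }, { 0, 0 }, { 0, 5 }, { 0, 6 }, { 0, 0 }, { 0, 0 }, { 0, 0 },
};

/* Map an input character to its ACTION column, or -1 if it is not a token. */
static int lr_column(char c)
{
    if (c == 'a') return 0;
    if (c == 'b') return 1;
    if (c == '\0') return 2;
    return -1;
}

/* Parse input. On success, return the number of reductions performed and
   store the first max_reductions rule numbers, in order, in reductions[].
   Return -1 on a syntax error. */
int lr_parse(const char *input, int *reductions, int max_reductions)
{
    int stack[LR_STACK_MAX];    /* stack of automaton states */
    int top = 0;
    int count = 0;

    stack[0] = 0;
    for (;;) {
        int col = lr_column(*input);
        int act;

        if (col < 0)
            return -1;
        act = lr_action[stack[top]][col];
        if (act == LR_ACCEPT)
            return count;
        if (act > 0) {                          /* shift */
            if (top + 1 >= LR_STACK_MAX)
                return -1;
            stack[++top] = act;
            input++;
        } else if (act < 0) {                   /* reduce */
            int rule = -act;
            top -= lr_rule_len[rule];           /* pop the right-hand side */
            stack[top + 1] = lr_goto[stack[top]][lr_rule_lhs[rule]];
            top++;
            if (count < max_reductions)
                reductions[count] = rule;
            count++;
        } else {
            return -1;                          /* error entry */
        }
    }
}
```

For the input abb, lr_parse records the rules 3, 2, 3, 1, that is B → b, B → aB, B → b, S → BB. Read backwards, this is the rightmost derivation S ⇒ BB ⇒ Bb ⇒ aBb ⇒ abb, and it is the same order in which the printf actions in simple.y would fire. An error entry is reached as soon as the next token cannot continue any valid input, which is the early error detection listed under Benefits.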