YACC

Compiler Principle and
Technology
Prof. Dongming LU
Mar. 26th, 2014
5. Bottom-Up Parsing
PART THREE
Contents
PART ONE
5.1 Overview of Bottom-Up Parsing
5.2 Finite Automata of LR(0) Items and LR(0) Parsing
PART TWO
5.3 SLR(1) Parsing
5.4 General LR(1) and LALR(1) Parsing
5.7 Error Recovery in Bottom-Up Parsers
PART THREE
5.5 Yacc: An LALR(1) Parser Generator
5.5 Yacc: An LALR(1) Parser Generator

A parser generator is a program that takes a specification of the syntax of a language and produces a parser procedure for that language.

A parser generator is also called a compiler-compiler.

One widely used parser generator that incorporates the LALR(1) parsing algorithm is Yacc (Yet Another Compiler-Compiler).
5.5.1 Yacc Basics
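Parser generator: Yacc
[Figure: building a parser with Yacc. Yacc translates the Yacc source file translate.y into y.tab.c; the C compiler compiles y.tab.c into a.out; a.out reads the input and produces the output.]
[Figure: building a scanner with Lex. Lex translates the Lex source file x.l into lex.yy.c; the C compiler compiles lex.yy.c into a.out; a.out reads the input stream and produces tokens.]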

Yacc takes a specification file (usually with a .y suffix).

Yacc produces an output file consisting of C source code for the parser (usually in a file called y.tab.c or ytab.c or, more recently, <filename>.tab.c, where <filename>.y is the input file).
A Yacc specification file has the following basic format:






{definitions}
%%
{rules}
%%
{auxiliary routines}
There are three sections separated by lines
containing double percent signs

The example:

A calculator for simple integer arithmetic expressions with the grammar
exp → exp addop term | term
addop → + | -
term → term mulop factor | factor
mulop → *
factor → ( exp ) | number
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(char *s);
%}
%token NUMBER
%%
command : exp { printf("%d\n", $1); }
        ; /* allows printing of the result */
exp : exp '+' term { $$ = $1 + $3; }
    | exp '-' term { $$ = $1 - $3; }
    | term { $$ = $1; }
    ;
term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;
factor : NUMBER { $$ = $1; }
       | '(' exp ')' { $$ = $2; }
       ;
%%
int main(void)
{
    return yyparse();
}
int yylex(void)
{
    int c;
    while ((c = getchar()) == ' ')
        ; /* eliminates blanks */
    if (isdigit(c)) {
        ungetc(c, stdin); /* put the digit back so scanf can read the whole number */
        scanf("%d", &yylval);
        return NUMBER;
    }
    if (c == '\n') return 0; /* makes the parse stop */
    return c;
}
int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
} /* allows for printing of an error message */

The definitions section (can be empty):
  Information about the tokens, data types, and grammar rules;
  Any C code that must go directly into the output file at its beginning.

The rules section:
  Grammar rules in a modified BNF form;
  Actions in C code, executed whenever the associated grammar rule is recognized (i.e., used in a reduction, according to the LALR(1) parsing algorithm).
The metasymbol conventions:
  The vertical bar is used for alternatives;
  The arrow symbol → is replaced in Yacc by a colon;
  A semicolon must end each grammar rule.
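
To make "executed whenever the rule is used in a reduction" concrete, here is a minimal sketch in C of what the action { $$ = $1 + $3; } does to a value stack. The names value_stack, push_value and reduce_exp_plus_term are hypothetical and chosen for illustration only; the parser that Yacc generates manages its own internal stacks.

```c
#include <stdio.h>

/* Hypothetical value stack, for illustration only. */
#define STACK_MAX 64
static int value_stack[STACK_MAX];
static int top = 0;                /* number of values currently on the stack */

static void push_value(int v) { value_stack[top++] = v; }

/* Reduce by  exp : exp '+' term  with the action { $$ = $1 + $3; }.
   The values of the three right-hand-side symbols are the top three entries. */
static int reduce_exp_plus_term(void)
{
    int *rhs = &value_stack[top - 3];  /* rhs[0] is $1, rhs[1] is $2, rhs[2] is $3 */
    int result = rhs[0] + rhs[2];      /* $$ = $1 + $3 */
    top -= 3;                          /* pop the right-hand side */
    push_value(result);                /* push $$ as the value of the left-hand side */
    return result;
}
```

After pushing the values for 2, '+' and 5, one call to reduce_exp_plus_term() leaves the single value 7 on the stack, which then plays the role of $1 or $3 in a later reduction.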


The auxiliary routines section (can be empty):
  Procedure and function declarations.

A minimal Yacc specification file consists only of %% followed by grammar rules and actions.
Notes:
The two typical #include directives are set off from the other Yacc declarations in the definitions section by the surrounding delimiters %{ and %}.
Yacc has two ways of recognizing tokens:
  A single character inside single quotes is recognized as itself;
  A symbolic token is assigned a numeric value that does not conflict with any character value.
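
The two kinds of token codes can be sketched with a small scanner over a string. This is an illustration under stated assumptions, not Yacc's own code: next_token and token_value are hypothetical names (token_value stands in for yylval), and the value 258 for NUMBER is assumed here because it lies above every character code.

```c
#include <ctype.h>
#include <stdio.h>

#define NUMBER 258        /* assumed code, chosen above the range of character values */

static int token_value;   /* stands in for yylval in this sketch */

/* Hypothetical scanner: returns the code of the next token in *p and advances *p. */
static int next_token(const char **p)
{
    while (**p == ' ')
        (*p)++;                          /* skip blanks */
    if (**p == '\0')
        return 0;                        /* 0 signals the end of the input */
    if (isdigit((unsigned char)**p)) {
        token_value = 0;
        while (isdigit((unsigned char)**p)) {
            token_value = token_value * 10 + (**p - '0');
            (*p)++;
        }
        return NUMBER;                   /* symbolic token: a code outside the char range */
    }
    return (unsigned char)*(*p)++;       /* single-character token: its own character code */
}
```

Scanning "12 + 3" yields NUMBER (with token_value 12), then '+', then NUMBER (with token_value 3), then 0. Because '+' is returned as its character code, the grammar can write it directly as '+' without a %token declaration.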
Notes:
A line %start command in the definitions section declares command to be the start symbol; otherwise the rule listed first in the rules section determines the start symbol.
The variable yylval is defined internally by Yacc;
yyparse is the name of the parsing procedure;
yylex is the name of the scanning procedure that yyparse calls to obtain the next token (it is also the name of the scanner produced by the Lex scanner generator).
5.5.2 Yacc Options
(1) -d option
Produces a header file, usually named y.tab.h or ytab.h.
A parser usually needs many auxiliary procedures in addition to yylex and yyerror; these are often placed in external files rather than directly in the Yacc specification file.
The header file makes the Yacc-specific definitions available to those other files.





For example, for the file calc.y the command
yacc -d calc.y
produces (in addition to the y.tab.c file) the file y.tab.h (or a similar name), whose contents vary but typically include such things as the following:
#ifndef YYSTYPE
#define YYSTYPE int
#endif
#define NUMBER 258
extern YYSTYPE yylval;
(2) -v option

Produces yet another file, with the name y.output (or a similar name).

This file contains a textual description of the LALR(1) parsing table that is used by the parser.
As an example, consider the skeletal version of the Yacc specification above, processed with the command

yacc -v calc.y

%token NUMBER
%%
command : exp
        ;
exp : exp '+' term
    | exp '-' term
    | term
    ;
term : term '*' factor
     | factor
     ;
factor : NUMBER
       | '(' exp ')'
       ;
A typical y.output file generated for the Yacc specification of figure 5.10 using the verbose option
(an underscore marks the parser's position within an item; a period stands for any other input, i.e. the default action)

state 0
    $accept : _command $end

    NUMBER  shift 5
    (       shift 6
    .       error

    command goto 1
    exp     goto 2
    term    goto 3
    factor  goto 4

state 1
    $accept : command_$end

    $end    accept
    .       error

state 2
    command : exp_          (1)
    exp : exp_+ term
    exp : exp_- term

    +       shift 7
    -       shift 8
    .       reduce 1

state 3
    exp : term_             (4)
    term : term_* factor

    *       shift 9
    .       reduce 4

state 4
    term : factor_          (6)

    .       reduce 6

state 5
    factor : NUMBER_        (7)

    .       reduce 7

state 6
    factor : (_exp )

    NUMBER  shift 5
    (       shift 6
    .       error

    exp     goto 10
    term    goto 3
    factor  goto 4

state 7
    exp : exp +_term

    NUMBER  shift 5
    (       shift 6
    .       error

    term    goto 11
    factor  goto 4

state 8
    exp : exp -_term

    NUMBER  shift 5
    (       shift 6
    .       error

    term    goto 12
    factor  goto 4

state 9
    term : term *_factor

    NUMBER  shift 5
    (       shift 6
    .       error

    factor  goto 13

state 10
    exp : exp_+ term
    exp : exp_- term
    factor : ( exp_)

    +       shift 7
    -       shift 8
    )       shift 14
    .       error

state 11
    exp : exp + term_       (2)
    term : term_* factor

    *       shift 9
    .       reduce 2

state 12
    exp : exp - term_       (3)
    term : term_* factor

    *       shift 9
    .       reduce 3

state 13
    term : term * factor_   (5)

    .       reduce 5

state 14
    factor : ( exp )_       (8)

    .       reduce 8
The parsing table appears as follows (sN = shift and go to state N, rN = reduce by rule N, blank = error):

        Input                                          Goto
State   NUMBER  (     +     -     *     )     $        command  exp   term  factor
0       s5      s6                                     1        2     3     4
1                                             accept
2       r1      r1    s7    s8    r1    r1    r1
3       r4      r4    r4    r4    s9    r4    r4
4       r6      r6    r6    r6    r6    r6    r6
5       r7      r7    r7    r7    r7    r7    r7
6       s5      s6                                              10    3     4
7       s5      s6                                                    11    4
8       s5      s6                                                    12    4
9       s5      s6                                                          13
10                    s7    s8          s14
11      r2      r2    r2    r2    s9    r2    r2
12      r3      r3    r3    r3    s9    r3    r3
13      r5      r5    r5    r5    r5    r5    r5
14      r8      r8    r8    r8    r8    r8    r8
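
To show how a parser uses such a table, here is a minimal table-driven sketch in C. It hard-codes the 15 states and 8 rules of the calculator grammar and evaluates an expression held in a string. The function names (lookahead, parse_expr), the encoding of the entries, and the fixed stack size are assumptions made for this illustration; the parser that Yacc actually generates is organized differently.

```c
#include <ctype.h>
#include <stdio.h>

enum { T_NUM, T_LP, T_PLUS, T_MINUS, T_TIMES, T_RP, T_END, N_TERMS };
enum { NT_COMMAND, NT_EXP, NT_TERM, NT_FACTOR, N_NONTERMS };
#define ACC 99   /* assumed encoding of "accept" */

/* action_table[state][terminal]: n > 0 shift to state n, n < 0 reduce by rule -n, 0 error */
static const int action_table[15][N_TERMS] = {
    /*        NUM  (   +   -   *   )   $  */
    /*  0 */ {  5,  6,  0,  0,  0,  0,  0 },
    /*  1 */ {  0,  0,  0,  0,  0,  0, ACC },
    /*  2 */ { -1, -1,  7,  8, -1, -1, -1 },
    /*  3 */ { -4, -4, -4, -4,  9, -4, -4 },
    /*  4 */ { -6, -6, -6, -6, -6, -6, -6 },
    /*  5 */ { -7, -7, -7, -7, -7, -7, -7 },
    /*  6 */ {  5,  6,  0,  0,  0,  0,  0 },
    /*  7 */ {  5,  6,  0,  0,  0,  0,  0 },
    /*  8 */ {  5,  6,  0,  0,  0,  0,  0 },
    /*  9 */ {  5,  6,  0,  0,  0,  0,  0 },
    /* 10 */ {  0,  0,  7,  8,  0, 14,  0 },
    /* 11 */ { -2, -2, -2, -2,  9, -2, -2 },
    /* 12 */ { -3, -3, -3, -3,  9, -3, -3 },
    /* 13 */ { -5, -5, -5, -5, -5, -5, -5 },
    /* 14 */ { -8, -8, -8, -8, -8, -8, -8 },
};

/* goto_table[state][nonterminal]; only states 0 and 6 to 9 have entries */
static const int goto_table[15][N_NONTERMS] = {
    [0] = { 1, 2, 3, 4 },  [6] = { 0, 10, 3, 4 },
    [7] = { 0, 0, 11, 4 }, [8] = { 0, 0, 12, 4 }, [9] = { 0, 0, 0, 13 },
};

/* Hypothetical scanner: returns the next terminal, storing a number's value in *val. */
static int lookahead(const char **p, int *val)
{
    while (**p == ' ') (*p)++;
    char c = **p;
    if (c == '\0') return T_END;
    if (isdigit((unsigned char)c)) {
        *val = 0;
        while (isdigit((unsigned char)**p)) *val = *val * 10 + (*(*p)++ - '0');
        return T_NUM;
    }
    (*p)++;
    switch (c) {
    case '(': return T_LP;    case ')': return T_RP;
    case '+': return T_PLUS;  case '-': return T_MINUS;
    case '*': return T_TIMES;
    }
    return -1;                               /* unknown character */
}

/* Returns 1 and stores the value in *result if s is accepted, 0 on a syntax error. */
static int parse_expr(const char *s, int *result)
{
    static const int rule_len[] = { 0, 1, 3, 3, 1, 3, 1, 1, 3 };
    static const int rule_lhs[] = { 0, NT_COMMAND, NT_EXP, NT_EXP, NT_EXP,
                                    NT_TERM, NT_TERM, NT_FACTOR, NT_FACTOR };
    int states[64], values[64], top = 0, tokval = 0;
    int tok = lookahead(&s, &tokval);
    states[0] = 0; values[0] = 0;
    for (;;) {
        if (tok < 0) return 0;
        int a = action_table[states[top]][tok];
        if (a == ACC) { *result = values[top]; return 1; }
        if (a == 0) return 0;                /* blank table entry: syntax error */
        if (a > 0) {                         /* shift */
            if (top + 1 >= 64) return 0;     /* stack full: give up in this sketch */
            states[++top] = a;
            values[top] = tokval;
            tok = lookahead(&s, &tokval);
        } else {                             /* reduce by rule r */
            int r = -a;
            int v = values[top - rule_len[r] + 1];            /* default: $$ = $1 */
            if (r == 2) v = values[top - 2] + values[top];    /* exp : exp + term */
            if (r == 3) v = values[top - 2] - values[top];    /* exp : exp - term */
            if (r == 5) v = values[top - 2] * values[top];    /* term : term * factor */
            if (r == 8) v = values[top - 1];                  /* factor : ( exp ) */
            top -= rule_len[r];                               /* pop the right-hand side */
            states[top + 1] = goto_table[states[top]][rule_lhs[r]];
            values[++top] = v;
        }
    }
}
```

For the input 2+3*4 the sketch shifts to state 5, reduces by rules 7, 6 and 4 while the lookahead is +, and later reduces by rule 5 before rule 2, so multiplication binds tighter than addition. Because several states reduce on every input (the default action), an invalid token may trigger a few reductions before a blank entry reports the error.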
5.5.3 Parsing Conflicts and Disambiguating Rules
Yacc and ambiguous rules
Yacc accepts ambiguous grammars and resolves the resulting parsing conflicts with default disambiguating rules:
  reduce-reduce conflict: prefer the reduction by the grammar rule listed first in the specification file;
  shift-reduce conflict: prefer the shift over the reduce.
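
The two default rules can be written as a small decision function. This is a sketch only: the type lr_action and the function resolve_conflict are hypothetical names, and rule numbers are assumed to follow the order in which the rules appear in the specification file.

```c
/* Hypothetical representation of one candidate parser action. */
enum action_kind { ACT_SHIFT, ACT_REDUCE };

struct lr_action {
    enum action_kind kind;
    int arg;   /* target state for a shift, rule number for a reduce */
};

/* Choose between two conflicting actions using the default rules. */
static struct lr_action resolve_conflict(struct lr_action a, struct lr_action b)
{
    if (a.kind == ACT_SHIFT) return a;    /* shift-reduce: prefer the shift */
    if (b.kind == ACT_SHIFT) return b;
    return (a.arg <= b.arg) ? a : b;      /* reduce-reduce: prefer the rule listed first */
}
```

Given a shift to state 9 and a reduction by rule 2, the function returns the shift; given reductions by rules 5 and 2, it returns the reduction by rule 2.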
Error recovery in Yacc
The designer of a compiler decides:
  which nonterminals may be involved in an error;
  which error productions A → error α to add for these nonterminals;
  which semantic actions to attach to these error productions.
Yacc treats the error productions as ordinary grammar productions.
Error recovery procedure:
  Pop states from the stack until a state containing an item A → ·error α is found.
  Push 'error' onto the stack.
  If α is ε, do the reduction and discard input symbols until a correct symbol is found.
  If α is not ε, scan the input for α, push α onto the stack, and reduce by A → error α.
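[Figure: the parser stack at the point where an error is found on input a. States are popped down to the state s that contains the items C → α1·Aα2, A → ·b and A → ·error α; from s, the transition on A leads to C → α1A·α2 and the transition on b leads to A → b·.]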
lines : lines expr '\n' { printf("%g\n", $2); }
      | lines '\n'
      | /* empty */
      | error '\n' { printf("Re-enter the previous line"); yyerrok; }
      ;
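
The recovery steps for a rule such as lines : error '\n' can be sketched as two helper functions. The names pop_to_error_state, skip_to_sync and can_shift_error are hypothetical and used only for illustration; the recovery code in a Yacc-generated parser is built into yyparse.

```c
/* can_shift_error[s] is nonzero when state s contains an item A -> .error alpha
   (a hypothetical per-state flag, for illustration).
   Returns the index of the topmost such state on the stack, or -1 if there is none. */
static int pop_to_error_state(const int *states, int top, const int *can_shift_error)
{
    while (top >= 0 && !can_shift_error[states[top]])
        top--;                       /* discard states that cannot handle 'error' */
    return top;
}

/* Discard input up to the synchronizing character, e.g. '\n' for  error '\n'. */
static const char *skip_to_sync(const char *input, char sync)
{
    while (*input != '\0' && *input != sync)
        input++;
    return input;                    /* points at sync, or at the end of the input */
}
```

With the state stack 0, 3, 7, 9 and only state 3 flagged, pop_to_error_state returns index 1, so states 7 and 9 are discarded. skip_to_sync then drops the rest of the faulty line, so that parsing can resume at the newline.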
End of Part Three
THANKS