– Writing a parser with YACC (Yet Another Compiler Compiler). (LALR parser)

• Automatically generates an LALR parser for a context-free grammar
– Allows syntax-directed translation by writing grammar productions and
semantic actions
– LALR(1) handles many grammars that LL(1) cannot, such as left-recursive ones.
• Works with lex: YACC calls yylex() to get the next token.
– YACC and lex must agree on the value of each token.
• Like lex, YACC predates C++, so some constructs need a workaround
when used with C++ (an example is given later).
• YACC file format:
declarations /* specify tokens, and non-terminals */
%%
translation rules /* specify grammar here */
%%
supporting C-routines
• Command “yacc yaccfile” produces y.tab.c, which contains a
routine yyparse().
– yyparse() calls yylex() to get tokens.
• yyparse() returns 0 if the program is grammatically correct,
non-zero otherwise
• The declarations part specifies tokens, non-terminal
symbols, and other C/C++ constructs.
– To declare the tokens AAA and BBB:
• %token AAA BBB
– To assign a token number to a token (needed when using lex), put a
nonnegative integer immediately after the first appearance
of the token:
• %token EOFnumber 0
• %token SEMInumber 101
– Non-terminals do not need to be declared unless you want to
associate them with a type to store attributes (discussed later).
• Translation rules specify the grammar productions
exp : exp PLUSnumber exp
| exp MINUSnumber exp
| exp TIMESnumber exp
| exp DIVIDEnumber exp
| LPARENnumber exp RPARENnumber
| ICONSTnumber
;
Each alternative can also be written as a separate rule:
exp : exp PLUSnumber exp
;
exp : exp MINUSnumber exp
;
• Yacc environment
– Yacc processes a yacc specification file and produces a y.tab.c file.
– Yacc produces an integer function yyparse().
• Calls yylex() to get tokens.
• Returns non-zero when an error is found.
• Returns 0 if the program is accepted.
– You need to supply main() and yyerror() functions.
– Example:
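#include <stdio.h>
extern int yyline;   /* current line number, maintained by the lexer */
/* Called by yyparse() when it detects a syntax error. */
int yyerror(const char *str)
{
    printf("yyerror: %s at line %d\n", str, yyline);
    return 0;
}
/* yyparse() is defined earlier in y.tab.c, so no prototype is needed here. */
int main(void)
{
    if (!yyparse()) printf("accept\n");
    else printf("reject\n");
    return 0;
}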
• Hooking yacc and lex together, see example0.y and lexer.l
• Matching the tokens
– In lex:
#define INTEGERCONST 2
#define PLUSNUM 4
– In yacc:
%token INTEGERCONST 2
%token PLUSNUM 4
All tokens used in the yacc grammar need to be declared. Some tokens
recognized by lex may not appear in the yacc grammar (see lexer.l).
Non-terminals do not need to be declared.
• lex.yy.c and y.tab.c may be compiled separately, or the yacc
file may simply include lex.yy.c, as in example0.y
• Global variables such as yyline, yycolumn, and yylval can
be used in yacc routines.
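The token-matching requirement above can be sketched with a hand-written stand-in for the lex-generated yylex(). This is only an illustration, not the contents of lexer.l: the names EOFNUM, UNKNOWNNUM, set_input and input_cursor are hypothetical, and yylval is assumed to be a plain int (the case where no %union is declared).

```c
#include <ctype.h>

/* Token codes: these must equal the numbers given after %token in the yacc file. */
#define EOFNUM        0   /* hypothetical name; 0 signals end of input to yyparse() */
#define INTEGERCONST  2
#define PLUSNUM       4
#define UNKNOWNNUM   99   /* hypothetical code for characters the grammar never uses */

int yylval;                              /* value of the last INTEGERCONST (assumed int) */
static const char *input_cursor = "";    /* hypothetical in-memory input */

/* Point the scanner at a new input string. */
void set_input(const char *text) { input_cursor = text; }

/* Return the code of the next token, as yyparse() expects from yylex(). */
int yylex(void)
{
    while (*input_cursor == ' ')
        input_cursor++;                  /* skip blanks */
    if (*input_cursor == '\0')
        return EOFNUM;
    if (isdigit((unsigned char)*input_cursor)) {
        yylval = 0;
        while (isdigit((unsigned char)*input_cursor))
            yylval = yylval * 10 + (*input_cursor++ - '0');
        return INTEGERCONST;
    }
    if (*input_cursor == '+') {
        input_cursor++;
        return PLUSNUM;
    }
    input_cursor++;                      /* consume the unrecognized character */
    return UNKNOWNNUM;
}
```

If the lexer returned 3 for an integer constant while the grammar declared %token INTEGERCONST 2, the parser would treat every integer as a different (or undeclared) token, which is why both sides must use the same numbers.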
– YACC automatically builds a parser for the grammar (LALR
parser).
• Shift/reduce and reduce/reduce conflicts may occur when the
grammar is not LALR(1).
– In this case, you need to modify the grammar to make it LALR(1) in order
for yacc to work properly.
• YACC tries to resolve conflicts automatically
– Default conflict resolution:
» shift/reduce --> shift
» reduce/reduce --> reduce by the production listed first in the grammar
– The conflict message is not very informative, and it is not clear
whether the default action is what you wanted.
• ‘yacc -v *.y’ will generate a report in file ‘y.output’.
• See example1.y
– Resolving conflicts
• Modify the grammar. See example1.y and example0.y.
• Use precedence and associativity of operators.
– Use the keywords %left, %right, and %nonassoc in the
declarations section.
» All tokens on the same line have the same precedence
level and associativity.
» The lines are listed in order of increasing precedence.
%left PLUSnumber MINUSnumber
%left TIMESnumber DIVIDEnumber
– See example3.y
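When precedence is declared, yacc settles a shift/reduce conflict by comparing the precedence of the rule (taken from its last token) with that of the lookahead token. The sketch below models that decision in plain C. It is not code from yacc itself: the function name, the enums, and the integer precedence levels (a later declaration line gets a higher number) are assumptions made for illustration.

```c
/* Associativity declared with %left, %right, or %nonassoc. */
enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONE };

/* What the parser does when a rule could be reduced but a token could also be shifted. */
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/*
 * rule_prec   : precedence level of the rule that could be reduced
 * token_prec  : precedence level of the lookahead token
 * token_assoc : associativity of the lookahead token (used only on a tie)
 */
enum action resolve_conflict(int rule_prec, int token_prec, enum assoc token_assoc)
{
    if (token_prec > rule_prec)
        return ACT_SHIFT;    /* lookahead binds tighter: keep reading */
    if (token_prec < rule_prec)
        return ACT_REDUCE;   /* rule binds tighter: reduce first */

    /* Same level: associativity decides. */
    switch (token_assoc) {
    case ASSOC_LEFT:  return ACT_REDUCE;  /* a - b - c  is  (a - b) - c */
    case ASSOC_RIGHT: return ACT_SHIFT;   /* a = b = c  is  a = (b = c) */
    default:          return ACT_ERROR;   /* %nonassoc: a < b < c is rejected */
    }
}
```

With the two %left lines above, PLUSnumber and MINUSnumber would sit at level 1 and TIMESnumber and DIVIDEnumber at level 2, so after exp PLUSnumber exp with TIMESnumber as lookahead the parser shifts, which gives multiplication the tighter binding.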
• Attribute grammar with yacc
– Each symbol can be associated with some
attributes.
• The data structure of the attributes is specified in the %union in the
declarations (see example4.y).
%union {
int semantic_value;
}
%token <semantic_value> INTEGERCONST 2
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item
• Semantic actions associated with productions can be specified.
• The union is used to define yylval. You do not need to
declare it again and can use yylval.semantic_value
directly in the lex code.
• Semantic actions
– Semantic actions associated with productions can be
specified.
item : LPARENnumber exp RPARENnumber
{$$ = $2;}
| ICONSTnumber
{$$ = $1;}
;
• $$ is the attribute associated with the left-hand side of the
production
• $1 is the attribute associated with the first symbol on the
right-hand side, $2 with the second symbol, …
– An action can appear anywhere in the production; it is also
counted as a symbol.
item : LPARENnumber {cout << "debug";} exp RPARENnumber
{$$ = $3;}
| ICONSTnumber
{$$ = $1;}
;
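One way to picture $$ and $n is as slots on a value stack that the parser keeps in step with the symbols it has recognized. The sketch below is a simplified model, not the code yacc generates: the stack, its size, and the helper names are all hypothetical, and bounds checking is omitted. It shows why the parenthesized expression is $2 in the plain rule but $3 once an embedded action takes up a slot of its own.

```c
#define STACK_MAX 64

static int value_stack[STACK_MAX];   /* one attribute per recognized symbol */
static int stack_depth = 0;          /* number of attributes currently stored */

/* Shifting a token or finishing a reduction pushes one attribute. */
void push_value(int v) { value_stack[stack_depth++] = v; }

int top_value(void) { return value_stack[stack_depth - 1]; }
int depth(void)     { return stack_depth; }

/*
 * Model of reducing a production with rhs_len right-hand-side symbols
 * whose action is  { $$ = $pick; }  (pick is 1-based, like $1, $2, ...).
 */
void reduce_copy(int rhs_len, int pick)
{
    int *rhs = &value_stack[stack_depth - rhs_len];  /* rhs[0] is $1 */
    int result = rhs[pick - 1];                      /* read $pick */
    stack_depth -= rhs_len;                          /* pop the right-hand side */
    push_value(result);                              /* push $$ */
}
```

For item : LPARENnumber exp RPARENnumber the call would be reduce_copy(3, 2). With the embedded action the right-hand side has four slots, so the equivalent call is reduce_copy(4, 3).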
Multiple attributes and C/C++ issues
• Multiple attributes can be associated with a
symbol by declaring a structure in the
union. See cal_trans_c.y (in
yacc1_cop4020).
– Unfortunately, C++ restricts class-type union members: before C++11, a
struct or class with a non-trivial constructor or destructor cannot be
a union member.
– A workaround example is given in
cal_trans_cpp.y.
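In plain C, a structure inside the union is unproblematic. The sketch below shows the kind of type that a %union with a struct member would produce. The contents of cal_trans_c.y are not reproduced here, so the struct exp_attr, its fields, and the helper make_attr are hypothetical.

```c
#include <string.h>

/* Hypothetical bundle of two attributes carried by one grammar symbol. */
struct exp_attr {
    int  value;       /* computed value of the expression */
    char text[32];    /* translated text of the expression */
};

/* The kind of type a %union { ... } declaration with a struct member turns into. */
union semantic {
    int             semantic_value;  /* single-attribute symbols */
    struct exp_attr attr;            /* symbols that carry several attributes */
};

/* Build a semantic value holding both attributes; text is truncated if too long. */
union semantic make_attr(int value, const char *text)
{
    union semantic s;
    s.attr.value = value;
    strncpy(s.attr.text, text, sizeof s.attr.text - 1);
    s.attr.text[sizeof s.attr.text - 1] = '\0';
    return s;
}
```

In a grammar action, a symbol declared with a tag such as %type <attr> exp could then fill in both fields, for example $$.value and $$.text.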