Writing a parser with YACC (Yet Another Compiler Compiler)
• Automatically generates a parser for a context-free grammar (LALR parser).
  – Allows syntax-directed translation by writing grammar productions and semantic actions.
  – LALR(1) handles more grammars of practical interest than LL(1), including left-recursive ones.
• Works with lex: YACC calls yylex() to get the next token.
  – YACC and lex must agree on the value of each token.
• Like lex, YACC predates C++, so some constructs need a workaround when using C++ (an example will be given).

• YACC file format:
        declarations            /* specify tokens and non-terminals */
        %%
        translation rules       /* specify the grammar here */
        %%
        supporting C routines
• The command “yacc yaccfile” produces y.tab.c, which contains a routine yyparse().
  – yyparse() calls yylex() to get tokens.
• yyparse() returns 0 if the program is grammatically correct, non-zero otherwise.

• The declarations part specifies tokens, non-terminal symbols, and other C/C++ constructs.
  – To declare the tokens AAA and BBB:
        %token AAA BBB
  – To assign a token number to a token (needed when using lex), put a nonnegative integer immediately after the first appearance of the token:
        %token EOFnumber 0
        %token SEMInumber 101
  – Non-terminals do not need to be declared unless you want to associate them with a type to store attributes (discussed later).
• The translation rules specify the grammar productions:
        exp : exp PLUSnumber exp
            | exp MINUSnumber exp
            | exp TIMESnumber exp
            | exp DIVIDEnumber exp
            | LPARENnumber exp RPARENnumber
            | ICONSTnumber
            ;
  – Alternatives separated by | can equivalently be written as separate rules:
        exp : exp PLUSnumber exp ;
        exp : exp MINUSnumber exp ;

• Yacc environment
  – Yacc processes a yacc specification file and produces a y.tab.c file.
  – Yacc produces an integer function yyparse(), which:
      • calls yylex() to get tokens;
      • returns non-zero when an error is found;
      • returns 0 if the program is accepted.
  – Need main() and yyerror() functions.
  – Example:
        #include <stdio.h>

        /* Called by yyparse() when it detects a syntax error.
           yyline is a global line counter maintained by the lexer. */
        void yyerror(const char *str)
        {
            printf("yyerror: %s at line %d\n", str, yyline);
        }

        int main(void)
        {
            if (!yyparse())          /* 0 means the input was accepted */
                printf("accept\n");
            else
                printf("reject\n");
            return 0;
        }

• Hooking yacc and lex together: see example0.y and lexer.l.
• Matching the tokens
  – In lex:
        #define INTEGERCONST 2
        #define PLUSNUM 4
  – In yacc:
        %token INTEGERCONST 2
        %token PLUSNUM 4
  – All tokens used in the yacc grammar need to be declared. Some tokens recognized by lex may not appear in the yacc grammar (see lexer.l). Non-terminals do not need to be declared.
• lex.yy.c and y.tab.c may be compiled separately, or the yacc file may simply include lex.yy.c, as in example0.y.
• Global variables such as yyline, yycolumn, and yylval can be used in yacc routines.

• YACC automatically builds a parser for the grammar (LALR parser).
  – There may be shift/reduce and reduce/reduce conflicts when the grammar is not LALR.
      • In this case, you need to modify the grammar to make it LALR so that yacc works as intended.
  – YACC tries to resolve conflicts automatically.
      • Default conflict resolution:
          » shift/reduce --> shift
          » reduce/reduce --> reduce by the production that appears first in the grammar
      • The conflict messages are not very informative, and it is not clear whether the default action is what you wanted.
  – ‘yacc -v *.y’ generates a report of the parser states and conflicts in the file ‘y.output’.
  – See example1.y.
• Resolving conflicts
  – Modify the grammar. See example1.y and example0.y.
  – Use the precedence and associativity of operators.
      • Use the keywords %left, %right, and %nonassoc in the declarations section.
          » All tokens on the same line have the same precedence level and associativity.
          » The lines are listed in order of increasing precedence.
                %left PLUSnumber MINUSnumber
                %left TIMESnumber DIVIDEnumber
      • See example3.y.

• Attribute grammar with yacc
  – Each symbol can be associated with some attributes.
      • The data structure of the attributes can be specified in the union in the declarations (see example4.y):
        %union {
            int semantic_value;
        }
        %token <semantic_value> INTEGERCONST 2
        %type <semantic_value> exp
        %type <semantic_value> term
        %type <semantic_value> item
      • The union is used to define the type of yylval. You do not need to declare yylval again, and the lex code can use yylval.semantic_value directly.

• Semantic actions
  – Semantic actions associated with productions can be specified:
        item : LPARENnumber exp RPARENnumber {$$ = $2;}
             | ICONSTnumber {$$ = $1;}
             ;
      • $$ is the attribute associated with the left-hand side of the production.
      • $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, and so on.
  – An action can appear anywhere in the production, and it is also counted as a symbol. Below, the embedded action is $2, so exp becomes $3:
        item : LPARENnumber {cout << "debug";} exp RPARENnumber {$$ = $3;}
             | ICONSTnumber {$$ = $1;}
             ;

Multiple attributes and C/C++ issues
• Multiple attributes can be associated with a symbol by declaring a structure in the union. See cal_trans_c.y (in yacc1_cop4020).
  – Unfortunately, C++ does not accept every structure or class as a union member (before C++11, a member with a non-trivial constructor or destructor is rejected).
  – A workaround example is given in cal_trans_cpp.y.
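
To make the structure-in-union idea concrete, here is a minimal C sketch of the lexer side. It is not taken from cal_trans_c.y. The record `struct attr`, its fields `value` and `column`, the union member `info`, the helper `set_input()`, and the token number `BADCHAR` are hypothetical names introduced for illustration. The `YYSTYPE` typedef is written by hand to show roughly what yacc generates from a `%union` declaration, and the token numbers for INTEGERCONST and PLUSNUM follow the `%token` lines shown earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical attribute record: two attributes carried by one symbol. */
struct attr {
    int value;   /* numeric value of an integer constant */
    int column;  /* column where the token started       */
};

/* Hand-written stand-in for the type yacc would generate from
 *     %union { int semantic_value; struct attr info; }
 * In a real build this typedef comes from the yacc output. */
typedef union {
    int semantic_value;
    struct attr info;
} YYSTYPE;

YYSTYPE yylval;              /* normally defined in y.tab.c */

/* Token numbers must agree with the yacc declarations:
 *     %token INTEGERCONST 2
 *     %token PLUSNUM 4                                      */
#define INTEGERCONST 2
#define PLUSNUM      4
#define BADCHAR      255     /* hypothetical: not used by any grammar rule */

static const char *input = NULL;  /* toy input buffer (assumption) */
static int column = 0;

/* Hypothetical helper: point the toy lexer at a string. */
void set_input(const char *s)
{
    input = s;
    column = 0;
}

/* Toy replacement for a lex-generated yylex(): returns a token number
 * and, for integer constants, fills both attributes in yylval. */
int yylex(void)
{
    assert(input != NULL);   /* set_input() must be called first */

    while (*input == ' ') {  /* skip blanks */
        input++;
        column++;
    }
    if (*input == '\0')
        return 0;            /* token number 0 signals end of input */

    if (isdigit((unsigned char)*input)) {
        int start = column;
        int v = 0;
        while (isdigit((unsigned char)*input)) {
            v = v * 10 + (*input - '0');
            input++;
            column++;
        }
        yylval.info.value = v;
        yylval.info.column = start;
        return INTEGERCONST;
    }
    if (*input == '+') {
        input++;
        column++;
        return PLUSNUM;
    }
    input++;                 /* consume the unrecognized character */
    column++;
    return BADCHAR;
}
```

With a declaration such as `%token <info> INTEGERCONST 2`, an action could then read `$1.value` and `$1.column`. This sketch is plain C; it does not show the C++ workaround, for which cal_trans_cpp.y remains the reference.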