• Syntax error handling
  – Errors can occur at many levels:
    • lexical: unknown operator
    • syntactic: unbalanced parentheses
    • semantic: variable never declared
    • runtime: dereferencing a NULL pointer
  – Goals of error handling in a parser:
    • To detect and report the presence of errors
    • To recover from an error and detect subsequent errors
    • To not slow down the processing of correct programs
• Error recovery strategies
  – Panic mode recovery
    • On discovering an error, discard input symbols one at a time until one of a designated set of synchronizing tokens is found.
  – Phrase-level recovery
    • On discovering an error, perform a local fix that allows the parser to continue.
• Error recovery in predictive parsing
  – Recovery in a non-recursive predictive parser is easier than in a recursive descent parser.
  – Panic mode recovery
    • If a terminal is on top of the stack, pop the terminal.
    • If a nonterminal is on top of the stack, skip input symbols until the nonterminal can be expanded.
  – Phrase-level recovery
    • Carefully fill in the blank entries of the parsing table with what to do.
• Error recovery in LR parsing
  – Canonical LR parsers never make extra reductions before recognizing an error.
  – SLR and LALR parsers may make extra reductions, but will never shift an erroneous input symbol onto the stack.
  – Panic mode recovery
    • Scan down the stack until a state representing a major program construct is found. Discard input symbols until one is found that is in the FOLLOW set of the corresponding nonterminal. This tries to isolate the phrase containing the error.
  – Phrase-level recovery
    • Implement an error recovery routine for each error entry in the table.
• Writing a parser with YACC (Yet Another Compiler Compiler)
  – Generates LALR parsers.
  – Works with lex: YACC calls yylex() to get the next token.
    • YACC and lex must agree on the values for each token.
  – Produces the file y.tab.c from “yacc yaccfile”; the file contains a routine yyparse().
  – yyparse() returns 0 if the program is OK, and non-zero otherwise.
• YACC file format:
        declarations
        %%
        translation rules
        %%
        supporting C routines
  – The declarations part specifies tokens, nonterminal symbols, and other C constructs.
    • To declare the tokens AAA and BBB:
        %token AAA BBB
    • To assign a token number to a token (needed when using lex), write a nonnegative integer immediately after the first appearance of the token:
        %token EOFnumber 0
        %token SEMInumber 101
    • Nonterminals do not need to be declared unless you want to associate them with a type (discussed later).
  – The translation rules specify the grammar productions:
        exp : exp PLUSnumber exp
            | exp MINUSnumber exp
            | exp TIMESnumber exp
            | exp DIVIDEnumber exp
            | LPARENnumber exp RPARENnumber
            | ICONSTnumber
            ;
    • Each production can also be written as its own rule:
        exp : exp PLUSnumber exp ;
        exp : exp MINUSnumber exp ;
• Yacc environment
  – Yacc processes the specification file and produces a y.tab.c file.
  – An integer function yyparse() is produced by Yacc.
    • Calls yylex() to get tokens.
    • Returns non-zero when an error is found.
    • Returns 0 if the program is accepted.
  – The programmer needs to supply the main() and yyerror() functions.
  – Example:
        #include <stdio.h>

        extern int yyline;   /* current line number; assumed to be maintained by the lexer */
        int yyparse(void);   /* generated by Yacc in y.tab.c */

        /* Called by yyparse() when it detects a syntax error. */
        int yyerror(const char *str)
        {
            printf("yyerror: %s at line %d\n", str, yyline);
            return 0;
        }

        int main(void)
        {
            if (!yyparse())
                printf("accept\n");
            else
                printf("reject\n");
            return 0;
        }
  – YACC builds an LALR parser for the grammar.
    • May have shift/reduce and reduce/reduce conflicts if there are problems with the grammar.
    • Default conflict resolution:
      – shift/reduce --> shift
      – reduce/reduce --> reduce by the production that appears first in the grammar
      – reduce/reduce conflicts should always be avoided
    • ‘yacc -v *.y’ generates a report in the file ‘y.output’.
    • See example1.y
  – The programmer MUST resolve all conflicts (unless you really know what you are doing).
    • Modify the grammar. See example2.y
    • Use precedence and associativity of operators.
• Use precedence and associativity of operators.
  – Use the keywords %left, %right, and %nonassoc in the declarations section.
    • All tokens on the same line have the same precedence level and associativity.
    • The lines are listed in order of increasing precedence.
        %left PLUSnumber, MINUSnumber
        %left TIMESnumber, DIVIDEnumber
  – See example3.y
• Symbol attributes
  – Each symbol can be associated with some attributes.
    • The data structure of the attributes can be specified in the %union in the declarations (see example4.y).
        %union {
            int semantic_value;
        }
        %token <semantic_value> ICONSTnumber
        %type <semantic_value> exp
        %type <semantic_value> term
        %type <semantic_value> item
• Semantic actions
  – Semantic actions associated with productions can be specified.
        item : LPARENnumber exp RPARENnumber {$$ = $2;}
             | ICONSTnumber {$$ = $1;}
             ;
    • $$ is the attribute associated with the left-hand side of the production.
    • $1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, …
  – An action can appear anywhere in the production; it is also counted as a symbol.
  – Check out example5.y for examples with multiple types associated with different symbols.
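The lexer side of the interface described above (YACC calls yylex() for each token, both sides must agree on token numbers, and attributes travel through the %union) can be sketched in plain C. This is only a hand-written stand-in for a lex-generated scanner, under several assumptions: EOFnumber = 0 follows the %token declaration shown earlier, but the other token numbers are illustrative (they would normally come from y.tab.h), and ERRORnumber, set_input() and the input buffer are hypothetical names introduced for this sketch.

```c
#include <ctype.h>
#include <stdio.h>

/* Token numbers. EOFnumber = 0 follows the %token declaration in the notes;
   all other values are assumptions (normally taken from y.tab.h). */
enum {
    EOFnumber = 0,
    PLUSnumber = 258, MINUSnumber, TIMESnumber, DIVIDEnumber,
    LPARENnumber, RPARENnumber, ICONSTnumber,
    ERRORnumber                  /* hypothetical: returned for unknown characters */
};

/* Mirrors the %union from the declarations section. */
typedef union { int semantic_value; } YYSTYPE;
YYSTYPE yylval;                  /* attribute of the token just returned */

static const char *input = "";   /* hypothetical input buffer */
int yyline = 1;                  /* line counter used by yyerror() */

/* Hypothetical helper: point the scanner at a new string. */
void set_input(const char *s) { input = s; yyline = 1; }

/* Returns the next token number; sets yylval for integer constants. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t' || *input == '\n') {
        if (*input == '\n')
            yyline++;
        input++;
    }
    if (*input == '\0')
        return EOFnumber;
    if (isdigit((unsigned char)*input)) {
        int v = 0;
        while (isdigit((unsigned char)*input))
            v = v * 10 + (*input++ - '0');
        yylval.semantic_value = v;   /* becomes $1 in "item : ICONSTnumber" */
        return ICONSTnumber;
    }
    switch (*input++) {
    case '+': return PLUSnumber;
    case '-': return MINUSnumber;
    case '*': return TIMESnumber;
    case '/': return DIVIDEnumber;
    case '(': return LPARENnumber;
    case ')': return RPARENnumber;
    default:  return ERRORnumber;
    }
}
```

In a real build the enum and YYSTYPE would not be written by hand; including the header that yacc generates with the -d option keeps the two sides in agreement automatically.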
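Returning to the panic mode rules for a non-recursive predictive parser given earlier (pop a terminal that is on top of the stack; skip input until a nonterminal can be expanded), the following is a minimal sketch for a toy grammar. The grammar, the single-character token encoding, the function names table() and parse(), and the rule "pop the nonterminal when the input is exhausted" are all assumptions made for this illustration; they are not taken from the course examples.

```c
#include <stdio.h>
#include <string.h>

/* Toy LL(1) grammar (upper-case = nonterminal, R stands for E'):
     E -> T R      R -> + T R | epsilon      T -> i | ( E )
   Returns the right-hand side to expand, or NULL for a blank table entry. */
static const char *table(char nt, char tok)
{
    switch (nt) {
    case 'E': return (tok == 'i' || tok == '(') ? "TR" : NULL;
    case 'R': if (tok == '+') return "+TR";
              return (tok == ')' || tok == '$') ? "" : NULL;
    case 'T': if (tok == 'i') return "i";
              return (tok == '(') ? "(E)" : NULL;
    }
    return NULL;
}

/* Parses a string of tokens (must not contain '$'; end of string acts as $).
   Returns the number of recovery actions taken, or -1 on stack overflow. */
int parse(const char *p)
{
    char stack[256];
    int top = 0, errors = 0;

    stack[top++] = '$';
    stack[top++] = 'E';
    while (top > 0) {
        char x = stack[top - 1];
        char la = *p ? *p : '$';             /* lookahead */

        if (x == 'E' || x == 'R' || x == 'T') {
            const char *rhs = table(x, la);
            if (rhs) {                       /* normal expansion */
                size_t i;
                top--;
                if (top + 3 > (int)sizeof stack)
                    return -1;
                for (i = strlen(rhs); i > 0; i--)
                    stack[top++] = rhs[i - 1];
            } else if (la == '$') {          /* assumption: nothing left to skip */
                printf("error: input ended while expecting %c\n", x);
                errors++;
                top--;
            } else {                         /* skip input until x can expand */
                printf("error: skipping '%c'\n", la);
                errors++;
                p++;
            }
        } else if (x == la) {                /* terminal matches */
            top--;
            if (la != '$')
                p++;
        } else if (x == '$') {               /* trailing input after a full parse */
            printf("error: skipping extra '%c'\n", la);
            errors++;
            p++;
        } else {                             /* terminal on stack: pop it */
            printf("error: missing '%c'\n", x);
            errors++;
            top--;
        }
    }
    return errors;
}
```

For example, parse("i+i") finishes without any recovery action, while parse("+i") skips the leading '+' once and then parses the rest normally. Textbook versions usually also pop a nonterminal when the lookahead is in its FOLLOW set; that refinement is left out here to stay close to the two rules stated in the notes.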