SKELETON CODE FOR YACC & LEX

FOR LEX CODE

    %{
    #define YYSTYPE double
    extern YYSTYPE yylval;
    #include file corresponding to "y.tab.h". Note: for Cygwin, this #include
    statement must come after the #define YYSTYPE one.
    .
    .
    %}
    macro definitions of the form:
        name to assign to the re    the re
    %%
    rules of the form:
        re    c code to be performed when the re matches the remaining input
    %%
    procedures employed by the c code above, and if yacc is not being used, a main program
    int yywrap() { return 1; }

FOR YACC CODE

    %{
    #include <stdio.h>
    Same definition of YYSTYPE as in the code for LEX.
    extern FILE * yyin;
    int yylex(void);
    int yyparse(void);
    void yyerror(char * mes);
    .
    .
    %}
    %token list of the terminals in the grammar
    %%
    productions and the c code to be performed when that production is employed
    %%
    int main(int argc, char * argv[])
    {
        FILE * outfile;
        Code to open outfile as type "wb" with a name constructed by adding ".asm" onto argv[1]
        yyin = fopen(argv[1], "r");
        yyparse();
    }
    void yyerror(char * mes)
    {
        printf("%s\n", mes);
    }

EXPLANATION

NOTE. In the description below, whenever we refer to what Yacc or Lex does, what we really mean is what the C program that Yacc or Lex produces does.

YYSTYPE & YYLVAL

Note that in using a parsing machine, such as the example in Slide Set 24, one can determine the successive contents of State No. stack and the remaining input without making use of Symbol Stack. We are free, then, to employ items of any type we choose on Symbol Stack. These items could be grammar symbols as in Slide Set 24, or they could be integers, or values of type double, or pointers to entries in symbol tables. Yacc allows you to choose.

The next input symbol (i.e. the head of the remaining input) is defined in Yacc by the following statement:

    YYSTYPE yylval;

yylval is what Yacc calls the next input symbol (the name is lower case in the C code, as in the skeleton above), and YYSTYPE is a type name which Yacc leaves for you to define, e.g.
via a statement such as

    #define YYSTYPE double

The same type name YYSTYPE is used by Yacc in the definition of Symbol Stack, so in the above example Symbol Stack would be defined as a stack of numbers of type double.

Now it is the function of Lex to set yylval to whatever corresponds to the next input symbol. Lex thus makes references to yylval, but it does not contain a definition of yylval. In order for Lex to transmit this information to Yacc (when Yacc calls Lex for the next input symbol), it is necessary that Yacc and Lex employ the same definition of yylval. So in the Definitions section of the input to Yacc we need to provide a definition for YYSTYPE, such as via the #define statement above, and in the Definitions section of the input to Lex we should employ the same definition. Furthermore, since yylval is defined in Yacc (via YYSTYPE yylval), to ensure that Lex employs the same variable we need an extern declaration for yylval in Lex, namely

    extern YYSTYPE yylval;

To sum up, if the entries of Symbol Stack and the next input symbol are to be of type e.g. double, then in the Definitions section of the input to Yacc we should include

    #define YYSTYPE double

and in the Definitions section of the input to Lex we should include

    #define YYSTYPE double
    extern YYSTYPE yylval;

Note that the statement YYSTYPE yylval is already provided by Yacc, so you should not also include it in the input to Yacc.

As an example, consider the grammar employed with the parsing machine in Slide Set 24:

    E → E + T | T
    T → T * a | a

We could be using this grammar and the parsing machine in Slide Set 24 to evaluate arithmetic expressions, such as

    3.2 + 5.0 * 7.1

In this case numbers would be classified by Lex as "a". In reading the next input symbol, such as 3.2, the symbol "a" would be used in conjunction with the parsing machine to determine what transition or reduction to make, and the subsequent contents of State No.
stack, but the 3.2 (not an "a") would be pushed onto Symbol Stack. Lex would provide the information for Yacc to do the above by setting yylval to 3.2 and returning, as its function value, a code for "a".

CODES FOR GRAMMAR SYMBOLS

But what is this code for "a", and how do we ensure that both Yacc and Lex employ the same code? Rather than deal with awkward grammar symbols, Yacc assigns an integer code to each grammar symbol, and employs a parsing machine (represented by a two-dimensional array) that also uses these codes. In order to allow Lex to employ the same codes, Yacc (if invoked with the -d option) provides a file called y.tab.h, which contains definitions of the codes employed for the terminals declared with %token, e.g. in our grammar, it may contain #define a 33. Accordingly you should include in the Definitions section of Lex:

    #include "y.tab.h"

OTHER SYSTEM VARIABLES

When Lex matches a regular expression with the next few symbols of the source code, the portion of the source code that is matched is put into the system string variable yytext, and yyleng is set to the number of symbols matched.

Lex reads its source using the file variable yyin, which is set by default to stdin, i.e. in general the keyboard. It is convenient for your main program, located in the last section of the input to Yacc, to set this variable to an appropriate disk file. Accordingly, the Definitions section of the input to Yacc should contain:

    extern FILE * yyin;

Some versions of Lex (or Flex) require the user to provide a procedure called yywrap(), which is called for housekeeping at the end of the source code; returning 1 from it tells Lex that there is no further input. So, to be sure, add as the last line of your input to Lex:

    int yywrap() { return 1; }

For Yacc, in all cases, you should provide a procedure called yyerror, which you can add at the end of the input to Yacc:

    void yyerror(char * mes)

When Yacc detects a syntax error in the source code, it calls this procedure, setting mes to the words "syntax error".
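
The skeleton's main program leaves the file handling as a one-line description: open the source named by argv[1] for reading, and open an output file of type "wb" whose name is argv[1] with ".asm" added. The following is a minimal sketch in plain C of one way to do that. The helper names make_asm_name and open_files are hypothetical (they are not part of Yacc or Lex), and the error handling shown is only one reasonable choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of source_name
   with ".asm" added at the end, or NULL if memory runs out.
   The caller is responsible for calling free() on the result. */
char *make_asm_name(const char *source_name)
{
    static const char suffix[] = ".asm";
    size_t len = strlen(source_name);
    char *name = malloc(len + sizeof suffix);   /* sizeof counts the '\0' */

    if (name == NULL)
        return NULL;
    memcpy(name, source_name, len);
    memcpy(name + len, suffix, sizeof suffix);  /* copies the '\0' as well */
    return name;
}

/* Hypothetical helper: opens the source for reading and the ".asm" file
   for binary writing. Returns 0 on success and -1 on failure. */
int open_files(const char *source_name, FILE **in, FILE **out)
{
    char *out_name = make_asm_name(source_name);

    if (out_name == NULL)
        return -1;
    *in = fopen(source_name, "r");
    if (*in == NULL) {
        perror(source_name);
        free(out_name);
        return -1;
    }
    *out = fopen(out_name, "wb");
    if (*out == NULL) {
        perror(out_name);
        fclose(*in);
        *in = NULL;
        free(out_name);
        return -1;
    }
    free(out_name);
    return 0;
}
```

In the last section of the input to Yacc, main could then check that argc is at least 2, call open_files(argv[1], &yyin, &outfile), and only call yyparse() if the result is 0. Checking the result of fopen matters here because the generated scanner will otherwise try to read from a null yyin.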

Note that the program produced by Yacc only begins parsing the input source code when the function yyparse() is called by your code. Likewise, Lex only begins reading the input source when the function yylex() is invoked. But if you are using Yacc in conjunction with Lex, then the program that Yacc constructs itself calls yylex() for each successive symbol of the source, so you do not have to call this function in the code that you provide.
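
To make the division of labour concrete, here is a small hand-written sketch in plain C of the same arrangement, for the grammar E → E + T | T, T → T * a | a. It is not what Yacc or Lex generate: the parser below uses simple loops in place of the table-driven parsing machine, and every name in it (demo_lex, demo_lval, demo_parse, demo_error, TOK_A and its value 257) is hypothetical. What it does illustrate is the protocol described above: the scanner returns a code for "a" and leaves the number itself in a shared variable of type YYSTYPE, and the parser, once started, calls the scanner for each successive symbol.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#define YYSTYPE double   /* same choice as in the text */
#define TOK_A 257        /* hypothetical code for the terminal "a" */

static YYSTYPE demo_lval;     /* plays the role of yylval */
static const char *demo_src;  /* remaining input; stands in for yyin */
static int lookahead;         /* code of the next input symbol */
static int demo_errors;       /* number of syntax errors reported */

/* Plays the role of yyerror(): report the message and count it. */
static void demo_error(const char *mes)
{
    fprintf(stderr, "%s\n", mes);
    demo_errors++;
}

/* Plays the role of yylex(): returns a token code, or 0 at end of input. */
static int demo_lex(void)
{
    while (isspace((unsigned char)*demo_src))
        demo_src++;
    if (*demo_src == '\0')
        return 0;
    if (isdigit((unsigned char)*demo_src)) {
        char *end;
        demo_lval = strtod(demo_src, &end);  /* the value, e.g. 3.2 */
        demo_src = end;
        return TOK_A;                        /* the code for "a" */
    }
    return (unsigned char)*demo_src++;       /* '+' or '*' is its own code */
}

/* a: take the value the scanner left in demo_lval. */
static double parse_a(void)
{
    double value = 0.0;
    if (lookahead == TOK_A) {
        value = demo_lval;
        lookahead = demo_lex();
    } else {
        demo_error("syntax error");
    }
    return value;
}

/* T -> T * a | a, written as a loop. */
static double parse_T(void)
{
    double value = parse_a();
    while (lookahead == '*') {
        lookahead = demo_lex();
        value *= parse_a();
    }
    return value;
}

/* E -> E + T | T, written as a loop. */
static double parse_E(void)
{
    double value = parse_T();
    while (lookahead == '+') {
        lookahead = demo_lex();
        value += parse_T();
    }
    return value;
}

/* Plays the role of yyparse(): nothing is read until this is called.
   Returns 0 on success and 1 if a syntax error was reported. */
int demo_parse(const char *text, double *result)
{
    demo_src = text;
    demo_errors = 0;
    lookahead = demo_lex();
    *result = parse_E();
    if (lookahead != 0)
        demo_error("syntax error");
    return demo_errors != 0;
}
```

A caller only ever invokes demo_parse, for instance with the text "3.2 + 5.0 * 7.1" and the address of a double to receive the value; it never calls demo_lex directly, just as your own code calls yyparse() and leaves the calls to yylex() to the program that Yacc constructs.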