SKELETON CODE FOR YACC & LEX

FOR LEX CODE
%{
#define YYSTYPE double
extern YYSTYPE yylval;
#include "y.tab.h" (or the corresponding header file). Note: for Cygwin, this
#include statement must come after the #define YYSTYPE one.
.
.
%}
macro definitions of the form:
name    re        (name is the name to assign to the re)
%%
rules of the form:
re    c code to be performed when the re matches the remaining input
%%
procedures employed by the c code above, and if yacc is not being used, a
main program
int yywrap(void) { return 1; }
FOR YACC CODE
%{
Same definition of YYSTYPE as in the code for LEX.
extern FILE * yyin;
int yylex(void);
int yyparse(void);
void yyerror(char * mes);
.
.
%}
%token list of the terminals in the grammar
%%
productions and the c code to be performed when that production is employed
%%
int main(int argc, char * argv[ ]) {
FILE * outfile;
Code to open outfile as type "wb" with a name constructed by adding ".asm"
onto argv[1]
yyin = fopen(argv[1], "r");
yyparse();
}
void yyerror(char * mes) { printf("%s\n", mes); }
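As an illustration of the step described in main above (constructing the
output name by adding ".asm" onto argv[1] and opening it as type "wb"), here
is a minimal sketch in C. The helper names make_asm_name and open_asm_output
are hypothetical and are not part of Yacc or Lex, and error handling is kept
to the bare minimum.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "<source>.asm" in buf.
   Returns 0 on success, -1 if buf is too small. */
int make_asm_name(const char *source, char *buf, size_t size) {
    assert(source != NULL && buf != NULL);
    size_t len = strlen(source);
    if (len + 5 > size)               /* 4 characters of ".asm" plus the '\0' */
        return -1;
    memcpy(buf, source, len);
    memcpy(buf + len, ".asm", 5);     /* copies the terminating '\0' as well */
    return 0;
}

/* Hypothetical helper: open the output file as type "wb". */
FILE *open_asm_output(const char *source) {
    char name[FILENAME_MAX];
    if (make_asm_name(source, name, sizeof name) != 0)
        return NULL;
    return fopen(name, "wb");
}
```

In main one might then write outfile = open_asm_output(argv[1]); and check
that both outfile and yyin are not NULL before calling yyparse().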
EXPLANATION
NOTE. In the description below, whenever we refer to what Yacc or
Lex does, what we really mean is what the program in C that Yacc or
Lex produces does.
YYSTYPE & yylval
Note that in using a parsing machine, such as the example in Slide Set 24, one
can evaluate the successive contents of State No. stack and the remaining input
without making use of Symbol Stack. We are free then to employ items on
Symbol Stack of any type that we choose. These items could be grammar
symbols as in Slide Set 24, or they could be integers, or of type double, or
pointers to entries in symbol tables. Yacc allows you to choose. The next input
symbol (i.e. head of the remaining input) is defined in Yacc by the following
statement:
YYSTYPE yylval;
yylval is what Yacc calls the next input symbol, and YYSTYPE is a type
name which Yacc leaves for you to define, e.g. via a statement such as
#define YYSTYPE double
The same type name YYSTYPE is used by Yacc in the definition of Symbol
Stack, so in the above example Symbol Stack would be defined as a stack of
numbers of type double. Now it’s the function of Lex to set yylval to
whatever corresponds to the next input symbol. Lex thus makes references to
yylval, but it does not contain a definition of yylval. In order for Lex to
transmit this information to Yacc (when Yacc calls Lex for the next input symbol),
it is necessary that Yacc and Lex employ the same definition for yylval.
So in the Definitions section of the input to Yacc, we need to provide a definition
for YYSTYPE, such as via the #define statement above, and in the Definitions
section of the input to Lex, we should employ the same definition. Furthermore,
since yylval is defined in Yacc (via YYSTYPE yylval;), to ensure that Lex
employs the same variable we need to employ an extern declaration for yylval
in Lex, namely
extern YYSTYPE yylval;
To sum up, if the entries of Symbol Stack and the next input symbol are to be of
type e.g. double, then in the Definitions section of the input to Yacc, we should
include
#define YYSTYPE double
and in the Definitions section of the input to Lex, we should include
#define YYSTYPE double
extern YYSTYPE yylval;
Note that the statement YYSTYPE yylval; is already provided by Yacc, and
so you should not include it in the input to Yacc as well.
As an example consider the grammar employed with the parsing machine in
Slide Set 24:
E → E + T | T
T → T * a | a
We could be using this grammar and the parsing machine in Slide Set
24 to evaluate arithmetic expressions, such as 3.2 + 5.0 * 7.1
In this case numbers would be classified by Lex as “a”, and in reading
the next input symbol, such as 3.2, the symbol “a” would be used in
conjunction with the parsing machine to determine what transition or
reduction to make, and the subsequent contents of State No. stack,
but the 3.2 (not an “a”) would be pushed onto Symbol Stack. Lex
would provide the information for Yacc to do the above by setting
yylval to 3.2 and returning, as a function, a value that is the code
for “a”.
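The division of labour described above can be sketched with a hand-written
stand-in for the function that Lex would generate. This is only an
illustration under stated assumptions: the token code A_TOKEN (257), the
in-memory source text and the helper set_input are all hypothetical. In a
real project the variable for the next input symbol is defined by the Yacc
output, and the code for “a” is supplied by Yacc as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define YYSTYPE double        /* same definition as in the Yacc input */
#define A_TOKEN 257           /* hypothetical code for the terminal "a" */

YYSTYPE yylval;               /* defined here only to keep the sketch self-contained */
static const char *source_text = "";   /* hypothetical in-memory source */

/* Hypothetical helper: choose the text to be scanned. */
void set_input(const char *text) {
    assert(text != NULL);
    source_text = text;
}

/* Stand-in for the scanner: returns a code, leaves the value in yylval. */
int yylex(void) {
    while (*source_text == ' ')
        source_text++;                         /* skip blanks */
    if (*source_text == '\0')
        return 0;                              /* 0 signals end of input */
    if (isdigit((unsigned char)*source_text)) {
        char *end;
        yylval = strtod(source_text, &end);    /* e.g. 3.2 goes into yylval */
        source_text = end;
        return A_TOKEN;                        /* but the parser only sees "a" */
    }
    return (unsigned char)*source_text++;      /* '+' or '*' as its own code */
}
```

After set_input("3.2 + 5.0"), the first call to yylex() returns the code for
“a” with the value 3.2 left in yylval, and the second call returns '+'.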
CODES FOR GRAMMAR SYMBOLS
But what is this code for “a”, and how do we ensure that both Yacc and
Lex employ the same code? Rather than deal with awkward
grammar symbols, Yacc assigns an integer code to each grammar
symbol, and employs a parsing machine (represented by a two-dimensional
array) that also uses these codes. In order to allow Lex to
employ the same codes, Yacc (if invoked with the -d option) provides
a file called y.tab.h, which contains definitions of all the codes
employed for the grammar symbols, e.g. in our grammar, it may
contain #define a 257 (the actual number is chosen by Yacc). Accordingly
you should include in the Definitions section of Lex: #include "y.tab.h"
OTHER SYSTEM VARIABLES
When Lex matches a regular expression with the next few characters of
the source code, the portion of the source code that is matched is put
into the system string variable yytext, and yyleng is set to the
number of characters matched. Lex reads its source using the file
variable yyin, which is initially set to stdin, i.e. in general the keyboard. It’s
convenient for your main program, located in the last section of the
input to Yacc, to set this variable to an appropriate disk file.
Accordingly, the Definitions section of the input to Yacc should contain:
extern FILE * yyin;
Some versions of Lex (or Flex) require the user to provide a
procedure called yywrap(), which is called for housekeeping at
the end of the source code. So, to be sure, add as the last line of
your input to Lex: int yywrap(){ return 1; }
A return value of 1 tells Lex that there is no further input to read.
For Yacc, in all cases, you should provide a procedure called yyerror, that you
can add at the end of the input to Yacc: void yyerror(char * mes)
When Yacc detects a syntax error in the source code, it calls this procedure,
setting mes to the words “syntax error”.
Note that the program produced by Yacc only begins parsing the input source
code when the function yyparse() is called by your code. Also Lex only
begins reading in the input source when the function yylex() is invoked. But, if
you are using Yacc in conjunction with Lex, then the program that Yacc constructs
itself calls yylex() for each successive next symbol of the source. You do not
have to call this function in the code that you provide.
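The calling pattern can be sketched in C as follows. This is a hand-written
stand-in, not the table-driven parsing machine that Yacc constructs: the
function evaluate plays the role of yyparse(), and it is the only place from
which yylex() is called. The names evaluate, advance, term and expr, the
token code A_TOKEN (257) and the in-memory source text are all assumptions
made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define YYSTYPE double
#define A_TOKEN 257               /* hypothetical code for the terminal "a" */

YYSTYPE yylval;                   /* value of the next input symbol */
static const char *src = "";      /* hypothetical in-memory source */
static int lookahead;             /* code of the next input symbol */

/* Stand-in scanner: returns a code and leaves any number in yylval. */
int yylex(void) {
    while (*src == ' ')
        src++;
    if (*src == '\0')
        return 0;                 /* end of input */
    if (isdigit((unsigned char)*src)) {
        char *end;
        yylval = strtod(src, &end);
        src = end;
        return A_TOKEN;
    }
    return (unsigned char)*src++;
}

static void advance(void) { lookahead = yylex(); }

/* T -> T * a | a   (a real parser would call yyerror instead of assert) */
static double term(void) {
    assert(lookahead == A_TOKEN);
    double value = yylval;
    advance();
    while (lookahead == '*') {
        advance();
        assert(lookahead == A_TOKEN);
        value *= yylval;
        advance();
    }
    return value;
}

/* E -> E + T | T */
static double expr(void) {
    double value = term();
    while (lookahead == '+') {
        advance();
        value += term();
    }
    return value;
}

/* Plays the role of yyparse(): the caller never calls yylex() itself. */
double evaluate(const char *text) {
    src = text;
    advance();
    return expr();
}
```

The caller only writes evaluate("3.2 + 5.0 * 7.1"); every request for the
next symbol is made from inside the stand-in parser, just as the program
that Yacc constructs makes its own calls to yylex().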