Sorin Manolache Compilers and Interpreters Tutorial 1 Sorin Manolache, sorma@ida.liu.se • flex and bison documentation available at http://www.ida.liu.se/~TDDB44/ • online version of this tutorial: http://www.ida.liu.se/~sorma/teaching/ compinterp/tutorial1.pdf November 30, 2005 1 Sorin Manolache Flex • is a scanner generator • the input is a file (specification file) containing mainly a set of rules for the token construction • the output is a C source file containing the scanner code • the entry point is the function int yylex() • yylex scans tokens from the FILE *yyin (stdin) until – a rule matches and the associated action executes a return statement – EOF is reached. yylex then returns 0 November 30, 2005 2 Sorin Manolache Flex – Input File Specification • 3 parts separated by %% on a new line optional definitions %% optional rules %% optional additional C code • rules section – format <unindented pattern><whitespace><action> – everything different from the above pattern is copied verbatim to the output file – action is C code November 30, 2005 3 Sorin Manolache Flex – Patterns (1) x . [xyz] [C-X5-9] [^A-Z] r* r+ r? r|s rs November 30, 2005 matches the character ‘x’ matches any character except newline matches any (one) of the characters between the brackets matches any (one) character different from the characters between the brackets matches zero or more (unbounded) occurrences of r. r is itself a regexp matches one or more (unbounded) occurrences of r. r is itself a regexp matches zero or one r. r is itself a regexp matches either r or s. r and s are themselves regexps matches r followed by s. r and s are themselves regexps 4 Sorin Manolache Flex – Patterns (2) (r) ^r r$ <<EOF>> <s>r matches r. Used for priority overriding matches r occurring at the beginning of line matches r at the end of line matches the end of file matches r in the context s. r is a regexp. s denotes a context (start condition) • use ““ or \ for avoiding the special meaning of some symbols • special characters can be used: \n \r \t \a \b \f \v \0 • operator priorities: – (), [] highest – *, +, ? – concatenation – | November 30, 2005 lowest 5 Sorin Manolache Flex – Matching (1) • “greedy”, matches as much as possible • if more than one rule can be applied, then the first appearing in the flex specification file is preferred • if no rule matches, then the default rule applies: it copies the character to the output • after matching – yytext (extern char yytext[];) contains the matched text – yyleng (extern int yyleng;) contains the matched text length November 30, 2005 6 Sorin Manolache Flex – Matching (2) • Special actions, macros and handy functions: – ECHO – REJECT, choose next best rule, either a “later” one matching the same text, or another rule matching a the largest prefix of the matched text – yymore(), do another match and append its result to the current match – yyless(int n), push all but the first n characters back to the input stream (to be matched next time). yytext will contain only the first n of the matched characters. – YY_USER_ACTION denotes an action to be taken before the matched rule action November 30, 2005 7 Sorin Manolache Flex – Definitions Section • format: <unindented name><whitespace><definition> • similar to macros, referred later in the rules section • reference to a definition: {name} • example: DIGIT [0-9] %% {DIGIT}* return INT; • everything between %{ and %} is copied verbatim to the generated C source file, both in the definitions and in the rules section November 30, 2005 8 Sorin Manolache Flex – Contexts (Start Conditions) • handy for matching some rules only if the scanner is in some state (context), for instance, when parsing inside comments or inside quotes • the context (start condition) is defined in the definitions section • the scanner enters a particular context after executing the action BEGIN(start_condition_name) • if the start condition is exclusive then only those rules match that are specific to this context, i.e. those rules prefixed with <start_condition_name> • if the start condition is inclusive then non-prefixed rules may match also • prefixed rules cannot match if the scanner is not in the corresponding context (the start condition is false) November 30, 2005 9 Sorin Manolache Flex – Contexts (2) • the scanner exits a particular context either by entering another one or by entering the initial “start condition” (BEGIN(INITIAL)) • syntax of start condition declaration: %x exclusive_start_condition_name %s inclusive_start_condition_name • example: %x html_tag %% [^<]* “<” <html_tag>[^>]* <html_tag>”>” November 30, 2005 BEGIN(html_tag); printf(“%s\n”, yytext); BEGIN(INITIAL); 10 Sorin Manolache Flex – Running and Compiling • -i case insensitive, or %caseless • %yylineno • %yywrap or %noyywrap • %main or %nomain • -o output file, or %output=”name” • if main or yywrap have to be provided (no %nomain or no %noyywrap) and are not provided by the user, then one has to link with the -lfl November 30, 2005 11