CIS 324: Language Design and Implementation
Automatic Lexical Analyzer Generators

1. The Lex Language and Software Tool

The Lex language provides a notation for specifying regular expressions. The software tool that automatically generates lexical analyzers from regular expressions written in Lex is called the Lex compiler.

The lexical analyzer generator for Java programs is called JLex (it is written in Java by Elliot Berk at Princeton University). JLex takes a regular expression specification and generates Java source code that implements the corresponding lexical analyzer. It can be used together with CUP, a parser generator for Java: the parser that CUP generates can be made to read the tokens produced by the JLex-generated lexical analyzer.

A lexical analyzer can be generated with JLex in the following way:

  Myprog.lex              --> [JLex compiler] --> Myprog.lex.java
  (JLex source program)                           (class Yylex with method yylex())

  Myprog.lex.java         --> [Java compiler] --> Myprog.out
                                                  (executable lexical analyzer)

The source code of the class Yylex contains a tabular representation of the transition diagrams for the given regular expressions. When run through the Java compiler, it yields an executable lexical analyzer that converts a sequence of input characters into a sequence of tokens.

A JLex specification consists of patterns and actions. The patterns are regular expressions written as sequences of literals and meta-operators:

  a       the character "a"
  [ab]    the character "a" or the character "b"
  [a-b]   any character between "a" and "b"
  [^s]    any character not in the set s
  p?      an optional p (zero or one occurrence)
  p*      zero or more occurrences of p
  p+      one or more occurrences of p
  p|q     p or q

An action can be any valid Java statement block. An action that produces a token normally ends with a return statement that returns the token; an action without a return (for example, one that skips whitespace) lets the lexical analyzer continue scanning the input.

A JLex specification has the following format:

  user code
  %%
  JLex directives
  %%
  regular expression rules

The "%%" separators divide the input file into sections and must be placed at the beginning of a line.
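To make the idea of a "tabular representation of the transition diagrams" more concrete, here is a minimal hand-written sketch in Java. It is not the code JLex generates (JLex builds and lays out its tables automatically). The class name TableLexer, the method tokenize, and the three assumed patterns (identifiers [a-z][a-z0-9]*, integers [0-9]+, and skipped whitespace) are hypothetical choices for illustration. The scanner follows the table as far as it can and keeps the longest prefix that ended in an accepting state, which is the longest-match rule Lex-style analyzers apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, hand-written table-driven scanner. NOT JLex output; it only
// illustrates driving a scanner from a transition table.
// Assumed patterns: ID = [a-z][a-z0-9]*, NUM = [0-9]+, whitespace is skipped.
public class TableLexer {
    // Character classes used as the table's columns.
    private static final int LETTER = 0, DIGIT = 1, OTHER = 2;

    // NEXT[state][class] = next state, or -1 if there is no transition.
    // State 0: start, state 1: inside an identifier, state 2: inside a number.
    private static final int[][] NEXT = {
        { 1,  2, -1 },
        { 1,  1, -1 },
        {-1,  2, -1 },
    };

    // Token kind for each accepting state (null = not accepting).
    private static final String[] ACCEPT = { null, "ID", "NUM" };

    private static int classOf(char c) {
        if (c >= 'a' && c <= 'z') return LETTER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    // Returns the tokens as "KIND(text)" strings, e.g. "ID(x1)".
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            // "Action" for whitespace: skip it without producing a token.
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            int state = 0;
            int lastAcceptEnd = -1;
            String lastAcceptKind = null;
            int i = pos;
            // Follow transitions as long as possible, remembering the last
            // accepting state reached (longest-match rule).
            while (i < input.length()) {
                int next = NEXT[state][classOf(input.charAt(i))];
                if (next < 0) break;
                state = next;
                i++;
                if (ACCEPT[state] != null) {
                    lastAcceptEnd = i;
                    lastAcceptKind = ACCEPT[state];
                }
            }
            if (lastAcceptKind == null) {
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at position " + pos);
            }
            // "Action" for ID/NUM: emit the token for the longest match.
            tokens.add(lastAcceptKind + "(" + input.substring(pos, lastAcceptEnd) + ")");
            pos = lastAcceptEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 42 abc"));
    }
}
```

Running main prints [ID(x1), NUM(42), ID(abc)]. An input such as "12ab" is split into NUM(12) and ID(ab), because state 2 has no transition on a letter, so the number ends there and scanning restarts in state 0.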
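The user code section is copied verbatim to the top of the generated Java file, outside the Yylex class; this is where package and import declarations belong. Code that must be placed inside the Yylex class (for example, helper fields and methods) is instead written in the directives section, enclosed between %{ and %}, each placed at the beginning of a line.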
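For the simple cases in the operator list above, the same notation behaves the same way in Java's java.util.regex package, so the meaning of each operator can be checked in plain Java. The sketch below does not run JLex; the class PatternOperators and its helper matches are hypothetical names, and Pattern.matches tests whether the whole string matches the expression.

```java
import java.util.regex.Pattern;

// Hypothetical demo: exercises the pattern operators with java.util.regex
// (not JLex). Pattern.matches requires the WHOLE string to match.
public class PatternOperators {
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // a     : the character "a"
        System.out.println(matches("a", "a"));        // true
        // [ab]  : "a" or "b"
        System.out.println(matches("[ab]", "b"));     // true
        // [a-d] : any character between "a" and "d"
        System.out.println(matches("[a-d]", "c"));    // true
        System.out.println(matches("[a-d]", "e"));    // false
        // [^xy] : any character not in the set {x, y}
        System.out.println(matches("[^xy]", "z"));    // true
        System.out.println(matches("[^xy]", "x"));    // false
        // b?    : optional "b"
        System.out.println(matches("ab?", "a"));      // true
        // b*    : zero or more "b"
        System.out.println(matches("ab*", "abbb"));   // true
        // b+    : one or more "b"
        System.out.println(matches("ab+", "a"));      // false
        // ab|cd : "ab" or "cd"
        System.out.println(matches("ab|cd", "cd"));   // true
        System.out.println(matches("ab|cd", "ac"));   // false
    }
}
```

Note that concatenation binds more tightly than |, so ab|cd means "ab" or "cd", not "a", then "b" or "c", then "d".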