The Lex Language

CIS 324: Language Design and Implementation
Automatic Lexical Analyzer Generators
1. The Lex Language and Software Tool
The Lex language provides a notation for specifying regular expressions.
The software tool that automatically generates lexical analyzers from
regular expressions written in Lex is called the Lex compiler.
The lexical analyzer generator for Java programs is called JLex
(it is written in Java by Elliot Berk at Princeton University).
JLex takes a regular expression specification and generates Java
source code that implements the corresponding lexical analyzer.
It can be used together with the CUP utility which generates
a parser that can be made to use input translated by JLex.
A lexical analyzer can be generated with JLex in the following way:
    Myprog.lex        /JLex source program/
        |
        v   JLex compiler
    Myprog.lex.java   /class Yylex with method yylex()/
        |
        v   Java compiler
    Myprog.out        /executable lexical analyzer/
The source code of the class Yylex contains a tabular representation of
the transition diagrams for the given regular expressions. When run
through the Java compiler, it yields an executable lexical analyzer that
converts sequences of input symbols into sequences of tokens.
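To make the idea of a "tabular representation of the transition diagrams" concrete, here is a minimal hand-written sketch of a table-driven scanner. It is not the code JLex generates. The class name TableLexerSketch, the two token kinds (ID and NUM), and the table layout are assumptions chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class TableLexerSketch {
    // Character classes: the columns of the transition table.
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;
    // Marker for "no transition".
    static final int ERR = -1;

    // Rows are states: 0 = start, 1 = inside identifier, 2 = inside number.
    static final int[][] NEXT = {
        /* start  */ { 1,   2, ERR },
        /* ident  */ { 1,   1, ERR },
        /* number */ { ERR, 2, ERR },
    };

    // Token kind accepted in each state (null = not an accepting state).
    static final String[] ACCEPT = { null, "ID", "NUM" };

    static int classOf(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Converts the input into a list of tokens, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) { pos++; continue; }
            int state = 0, start = pos;
            // Follow table transitions for as long as one exists (longest match).
            while (pos < input.length()) {
                int next = NEXT[state][classOf(input.charAt(pos))];
                if (next == ERR) break;
                state = next;
                pos++;
            }
            if (ACCEPT[state] == null) {
                throw new IllegalArgumentException("unexpected character at " + start);
            }
            tokens.add(ACCEPT[state] + "(" + input.substring(start, pos) + ")");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 42 foo")); // prints [ID(x1), NUM(42), ID(foo)]
    }
}
```

The scanning loop never mentions letters or digits directly. All knowledge about the token patterns lives in the NEXT and ACCEPT tables, which is what lets a generator produce the tables from regular expressions and reuse a fixed driver.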
A JLex specification involves patterns and actions.
The patterns are essentially regular expressions written as sequences
of literals and meta-operations:
    a        character "a"
    [ab]     character "a" or character "b"
    [a-b]    all characters between "a" and "b"
    [^s]     any character not in the set s
    p?       optional p (zero or one occurrence)
    p*       zero or more occurrences of p
    p+       one or more occurrences of p
    p|q      p or q
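The meaning of these operators can be tried out without JLex. The sketch below uses Java's standard java.util.regex package, which is a different tool from JLex but, as far as these basic operators go, assigns them the same meaning. The class name PatternOperatorsDemo and the helper matches are illustrative assumptions.

```java
import java.util.regex.Pattern;

public class PatternOperatorsDemo {
    // True if the whole text matches the regular expression.
    static boolean matches(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(matches("[ab]", "b"));   // true:  "a" or "b"
        System.out.println(matches("[a-c]", "b"));  // true:  range "a" to "c"
        System.out.println(matches("[^s]", "t"));   // true:  anything except "s"
        System.out.println(matches("[^s]", "s"));   // false
        System.out.println(matches("ab?", "a"));    // true:  "b" is optional
        System.out.println(matches("ab*", "abbb")); // true:  zero or more "b"
        System.out.println(matches("ab+", "a"));    // false: needs at least one "b"
        System.out.println(matches("a|b", "b"));    // true:  "a" or "b"
    }
}
```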
The actions can be any correct Java statement block, where each action
should terminate with a statement that returns the value of the token.
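As a rough illustration of what such an action computes, the sketch below models the body of an action as an ordinary Java method. The Token class, the kind name "NUM", and the method numberAction are hypothetical. JLex does not prescribe them, and in a generated lexer the matched text would be obtained from the lexer itself (via yytext()) rather than passed in as a parameter.

```java
public class ActionSketch {
    // Hypothetical token type returned by actions; not defined by JLex.
    static final class Token {
        final String kind;
        final String text;

        Token(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Stand-in for an action attached to a pattern such as [0-9]+ :
    // a statement block that ends by returning the token.
    static Token numberAction(String matchedText) {
        return new Token("NUM", matchedText);
    }

    public static void main(String[] args) {
        System.out.println(numberAction("42")); // prints NUM(42)
    }
}
```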
A JLex specification has the following format:
user code
%%
JLex directives
%%
regular expressions
The ``%%'' directives separate the sections of the input file and must
be placed at the beginning of their line. The user code in the first
section is copied verbatim to the top of the generated Java file. Code
that should be included inside the Yylex class itself has to be enclosed
in %{ ... %} brackets within the JLex directives section.