• Two issues in lexical analysis

advertisement
• Two issues in lexical analysis
– Specifying tokens (regular expression) – last
lecture
– Identifying tokens specified by regular
expression – today’s topic
• How to recognize tokens specified by
regular expressions?
– A recognizer for a language is a program that takes a
string x as input and answers “yes” if x is a sentence of
the language and “no” otherwise.
• In the context of lexical analysis, given a string and a regular
expression, a recognizer of the language specified by the
regular expression answer “yes” if the string is in the language.
• How to recognize regular exp int? What about int | for?
0
i
1 n
2
t
3
All other characters
error
0
i
1
n
2
t
3
• How to recognize tokens specified by
regular expressions?
– How to recognize regular expression int | for?
0
i
f
1
4
n
o
2
5
3
t
r
6
– A regular expression can be compiled into a
recognizer (automatically) by constructing a
finite automata which can be deterministic or
non-deterministic.
• Non-deterministic finite automata
(NFA)
– A non-deterministic finite automata (NFA) is a mathematical
model that consists of: (a 5-tuple (Q,  ,  , q0, F )
• a set of states Q
• a set of input symbols 
• a transition function that maps state-symbol pairs to sets of
states.
• A state q0 that is distinguished as the start (initial) state
• A set of states F distinguished as accepting (final) states.
– An NFA accepts an input string x if and only if there is some path
in the transition graph from the start state to some accepting state
(after consuming x).
Finite State Machines = Regular
Expression Recognizers
relop  < | <= | <> | > | >= | =
start
0
<
1
=
>
other
=
5
>
id  letter ( letter | digit )*
start
9
6
2
return(relop, LE)
3
return(relop, NE)
4 * return(relop, LT)
return(relop, EQ)
=
7 return(relop, GE)
other
8 * return(relop, GT)
letter or digit
letter
5
10
other
11 * return(gettoken(),
5/29/2016
install_id())
• An NFA is non-deterministic in that (1) same
character can label two or more transitions out of
one state (2) empty string can label transitions.
• For example, here is an NFA that recognizes the
language ?
a
0
a
1
b
2
b
3
b
• An NFA can easily implemented using a transition
table, which can be used in the language
recognizer.
State
0
1
2
a
{0, 1}
-
b
{0}
{2}
{3}
From a Regular Expression to an
NFA

start
a
start
r1  r2
start
r1 r2
r*
start
start
7
i
i

a

i

i N(r1)
i

f
f
N(r1)
N(r2)


f
N(r2) f

N(r)


f
5/29/2016
• Examples:
–
–
–
–
–
a
a|b
ab
a*b
(a|b)*abb.
• NFA can be converted into deterministic
finite state automata (DFA) to improve
efficiency.
Download