04_FSM

Course Overview
PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Supplementary material: Theoretical foundations (Finite-state machines)
Finite State Machines (aka Finite Automata)
• A FSM is similar to a compiler in that:
– A compiler recognizes legal programs
in some (source) language.
– A finite-state machine recognizes legal strings
in some language.
• Example: Pascal Identifiers
– sequences of one or more letters or digits, starting with a letter:
[Diagram: start state S goes to accepting state A on letter; A loops to itself on letter | digit.]
Finite State Machines viewed as Graphs
• A state
• The start state
• An accepting state
• A transition, labeled with an input symbol such as a
Finite State Machines
• Transition: s1 --a--> s2
• Is read: In state s1 on input “a” go to state s2
• If end of input
– If in accepting state => accept
– Otherwise => reject
• If no transition possible (got stuck) => reject
Language defined by FSM
• The language defined by a FSM is the set of
strings accepted by the FSM.
– In the language of the identifier FSM shown earlier:
• x, mp2, XyZzy, position27.
– Not in the language of that FSM:
• 123, a?, 13apples.
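The membership claims above can be checked with a small hand-coded sketch of the identifier machine. The state names S and A follow the identifier example; the function name `accepts_identifier` and the restriction to ASCII letters and digits are assumptions made for illustration.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

def accepts_identifier(text: str) -> bool:
    """Run the two-state identifier FSM (start state S, accepting state A)."""
    state = "S"
    for ch in text:
        if state == "S" and ch in LETTERS:
            state = "A"            # S --letter--> A
        elif state == "A" and (ch in LETTERS or ch in DIGITS):
            state = "A"            # A --letter|digit--> A
        else:
            return False           # no transition possible (got stuck): reject
    return state == "A"            # end of input: accept only in an accepting state

for s in ["x", "mp2", "XyZzy", "position27", "123", "a?", "13apples"]:
    print(s, accepts_identifier(s))
```

The first four strings are accepted and the last three are rejected, matching the two lists.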
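Example: Integer Literals
• FSM that accepts integer literals with an optional + or - sign:
[Diagram: start state S goes to A on + or -, and to accepting state B on digit; A goes to B on digit; B loops to itself on digit.]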
Formal Definition
• Each finite state machine is a 5-tuple (Σ, Q, δ, q, F) that consists of:
– An input alphabet Σ
– A set of states Q
– A start state q
– A set of accepting states (or final states) F ⊆ Q
– δ is a state transition function: Q × Σ → Q that encodes transitions state_i --input--> state_j
State-Transition Function
for the integer-literal example:
δ(S, +) = A
δ(S, -) = A
δ(S, digit) = B
δ(A, digit) = B
δ(B, digit) = B
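The transition function for the integer-literal example can be written down directly as a lookup table. The sketch below assumes a Python dictionary keyed by (state, symbol) pairs; the helper `classify`, which maps each digit character to the symbol class `digit`, is hypothetical and not part of the slides. A missing key plays the role of "no transition possible".

```python
DELTA = {
    ("S", "+"): "A",
    ("S", "-"): "A",
    ("S", "digit"): "B",
    ("A", "digit"): "B",
    ("B", "digit"): "B",
}
START = "S"
FINAL = {"B"}

def classify(ch: str) -> str:
    """Map a character to the symbol used in DELTA (hypothetical helper)."""
    return "digit" if ch in "0123456789" else ch

def accepts_integer(text: str) -> bool:
    state = START
    for ch in text:
        state = DELTA.get((state, classify(ch)))
        if state is None:          # undefined entry: the machine got stuck
            return False
    return state in FINAL          # accept only if we stop in a final state

for s in ["42", "+752", "-7", "+", "", "4-2"]:
    print(repr(s), accepts_integer(s))
```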
FSM Examples
[Diagram: two-state FSM with states A and B and transitions on 0 and 1.]
Accepts strings over alphabet {0,1} that end in 1
FSM Examples
[Diagram: five-state FSM with states 1 to 5 and transitions on a and b.]
Accepts strings over alphabet {a,b} that begin and end with same symbol
FSM Examples
[Diagram: FSM with transitions on 0, 1, and 2, entered at the state marked Start.]
Accepts strings over {0,1,2} such that sum of digits is a multiple of 3
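One way to build a machine for this language, assumed here rather than read off the diagram, is to let each state record the digit sum so far modulo 3. The Python sketch below generates the nine transitions from that rule and cross-checks the machine against direct arithmetic; all names are illustrative.

```python
from itertools import product

# Assumed construction: state r means "digit sum so far is r modulo 3".
STATES = (0, 1, 2)
START = 0
FINAL = {0}
DELTA = {(r, d): (r + int(d)) % 3 for r in STATES for d in "012"}

def accepts_sum_multiple_of_3(text: str) -> bool:
    state = START
    for ch in text:
        if (state, ch) not in DELTA:
            return False           # symbol outside {0,1,2}: got stuck
        state = DELTA[(state, ch)]
    return state in FINAL

# Cross-check against direct arithmetic on every string up to length 6.
for n in range(7):
    for digits in product("012", repeat=n):
        s = "".join(digits)
        assert accepts_sum_multiple_of_3(s) == (sum(map(int, s)) % 3 == 0)
```

Under this construction the start state is also the accepting state, so the empty string (digit sum 0) is accepted.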
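FSM Examples
[Diagram: start state Even and accepting state Odd; each state loops to itself on 0 and moves to the other state on 1.]
Accepts strings over {0,1} that have an odd number of ones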
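FSM Examples
[Diagram: DFSM with an unnamed start state and states '0', '00', and '001', named for the part of 001 matched so far; the accepting state '001' loops to itself on 0,1.]
Accepts strings over {0,1} that contain the substring 001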
Examples
• Design a FSM to recognize strings with an
equal number of ones and zeros.
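– Not possible: a FSM has only finitely many states, so it cannot keep count of an unbounded difference between the number of ones and zeros.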
• Design a FSM to recognize strings with an
equal number of substrings "01" and "10".
– Perhaps surprisingly, this is possible
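The second task is possible because, in a string over {0,1}, every "01" marks a switch from 0 to 1 and every "10" a switch back, so the two counts are equal exactly when the string is empty or its first and last symbols agree. A FSM only has to remember the first symbol and the most recent one. The brute-force check below is a sketch of that argument; the function names are illustrative.

```python
from itertools import product

def equal_01_10(text: str) -> bool:
    """Direct definition: same number of "01" and "10" substrings."""
    return text.count("01") == text.count("10")

def same_ends(text: str) -> bool:
    """Finite-memory characterization: empty, or first symbol equals last."""
    return text == "" or text[0] == text[-1]

# Compare the two definitions on every binary string up to length 10.
for n in range(11):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert equal_01_10(s) == same_ends(s)
```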
FSM Examples
[Diagram: FSM with transitions on 0 and 1.]
Accepts strings with an equal number of substrings "01" and "10"
TEST YOURSELF
• Question 1: Draw a finite-state machine that
accepts Java identifiers
– one or more letters, digits, or underscores, starting
with a letter or an underscore.
• Question 2: Draw a finite-state machine that
accepts only Java identifiers that do not end
with an underscore
TEST YOURSELF
Question 3: What strings does this FSM accept?
Describe the set of accepted strings in English.
[Diagram: four states q0, q1, q2, and q3 with transitions on 0 and 1.]
Two kinds of Finite State Machines
Deterministic (DFSM):
– No state has more than one outgoing edge with the same label. [All previous FSMs were DFSMs.]
Non-deterministic (NFSM):
– States may have more than one outgoing edge with the same label.
– Edges may be labeled with ε (epsilon), the empty string. [Note that some books use the symbol λ.]
– The automaton can make an ε (epsilon) transition without consuming the current input character.
Example of NFSM
• integer-literal example:
[Diagram: start state S goes to A on +, -, or ε; A goes to accepting state B on digit; B loops to itself on digit.]
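Example of NFSM
[Diagram: NFSM with an unnamed start state and states '0', '00', and '001'; the start state and the accepting state '001' both loop on 0,1, with edges from start to '0' on 0, from '0' to '00' on 0, and from '00' to '001' on 1.]
Accepts strings over {0,1} that contain the substring 001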
Non-deterministic finite state machines (NFSM)
• sometimes simpler than DFSM
• can be in multiple states at the same time
• NFSM accepts a string if there exists a sequence of moves
– starting in the start state,
– ending in a final state,
– that consumes the entire string.
• Examples:
– Consider the integer-literal NFSM on input "+752"
– Consider the second NFSM on input "10110001"
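Both example inputs can be traced by tracking the set of states the NFSM could currently be in, which is one common way to simulate "being in multiple states at the same time". The Python sketch below is illustrative only: the transition tables are transcribed from the two NFSM examples as understood here, the empty string `EPS` is an assumed encoding of the ε label, and `start` is a made-up name for the unnamed start state.

```python
EPS = ""  # assumed encoding of the epsilon label in this sketch

def eps_closure(states, delta):
    """All states reachable from `states` using only epsilon transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfsm_accepts(delta, start, finals, symbols):
    current = eps_closure({start}, delta)
    for a in symbols:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, a), ()))
        current = eps_closure(moved, delta)
    return bool(current & finals)   # some sequence of moves ends in a final state

# Integer-literal NFSM: the sign is optional thanks to the epsilon edge.
INT_DELTA = {
    ("S", "+"): {"A"}, ("S", "-"): {"A"}, ("S", EPS): {"A"},
    ("A", "digit"): {"B"}, ("B", "digit"): {"B"},
}

def int_symbols(text):
    return ["digit" if c in "0123456789" else c for c in text]

# NFSM for "contains the substring 001".
SUB_DELTA = {
    ("start", "0"): {"start", "0"}, ("start", "1"): {"start"},
    ("0", "0"): {"00"},
    ("00", "1"): {"001"},
    ("001", "0"): {"001"}, ("001", "1"): {"001"},
}

print(nfsm_accepts(INT_DELTA, "S", {"B"}, int_symbols("+752")))   # True
print(nfsm_accepts(SUB_DELTA, "start", {"001"}, "10110001"))      # True
```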
Equivalence of DFSM and NFSM
• Theorem:
– For each non-deterministic finite state machine N,
we can construct a deterministic finite state
machine D such that N and D accept the same
language.
– [proof omitted]
• Theorem:
– Every deterministic finite state machine can be regarded as a non-deterministic finite state machine that just doesn't use the extra non-deterministic capabilities.
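The slides omit the proof of the first theorem. The standard argument is the subset construction, in which each state of D is a set of states of N. The Python sketch below illustrates that textbook technique and is not taken from the slides; the encoding of ε as the empty string, the name `start`, and the example table (the "contains 001" NFSM as understood here) are assumptions.

```python
from collections import deque
from itertools import product

EPS = ""  # assumed encoding of the epsilon label

def eps_closure(states, delta):
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(delta, start, finals, alphabet):
    """Return (d_delta, d_start, d_finals) for a DFSM equivalent to the NFSM."""
    d_start = eps_closure({start}, delta)
    d_delta, d_finals = {}, set()
    seen, queue = {d_start}, deque([d_start])
    while queue:
        subset = queue.popleft()
        if subset & finals:
            d_finals.add(subset)       # contains a final state of N
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= set(delta.get((q, a), ()))
            target = eps_closure(moved, delta)
            if not target:
                continue               # empty set: the DFSM gets stuck here
            d_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return d_delta, d_start, d_finals

def dfsm_accepts(d_delta, d_start, d_finals, text):
    state = d_start
    for a in text:
        state = d_delta.get((state, a))
        if state is None:
            return False
    return state in d_finals

SUB_DELTA = {
    ("start", "0"): {"start", "0"}, ("start", "1"): {"start"},
    ("0", "0"): {"00"}, ("00", "1"): {"001"},
    ("001", "0"): {"001"}, ("001", "1"): {"001"},
}
D = subset_construction(SUB_DELTA, "start", {"001"}, "01")

# The constructed DFSM should agree with a direct substring test.
for n in range(9):
    for bits in product("01", repeat=n):
        s = "".join(bits)
        assert dfsm_accepts(*D, s) == ("001" in s)
```

The resulting DFSM is deterministic by construction, since each (subset, symbol) pair has at most one target, but it is not necessarily the smallest machine for the language.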
How to Implement a FSM
A table-driven approach:
• Table:
– one row for each state in the machine, and
– one column for each possible character.
• Table[j][k]
– which state to go to from state j on input character k,
– an empty entry corresponds to the machine getting stuck.
The table-driven program for a DFSM
state = S                                  // S is the start state
repeat {
    k = next character from the input
    if (k == EOF) then                     // end of input
        if (state is a final state) then accept
        else reject
    state = T[state][k]                    // T is the transition table
    if (state == empty) then reject        // got stuck
}
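The loop above can be made runnable. The following Python sketch applies it to the integer-literal DFSM; it assumes one table column per character class rather than per character (to keep the table small), uses `None` both for an empty entry and for end of input, and all names are illustrative.

```python
# Column index for each input character (assumption: digits share one column).
COL = {"+": 0, "-": 1}
COL.update({d: 2 for d in "0123456789"})

S, A, B = 0, 1, 2              # row indices, one per state
EMPTY = None                   # empty entry: the machine gets stuck
T = [
    # +      -      digit
    [A,     A,     B],         # row for state S
    [EMPTY, EMPTY, B],         # row for state A
    [EMPTY, EMPTY, B],         # row for state B
]
FINAL = {B}

def run(text: str) -> bool:
    state = S                          # S is the start state
    chars = iter(text)
    while True:
        k = next(chars, None)          # None plays the role of EOF
        if k is None:                  # end of input
            return state in FINAL      # accept only in a final state
        col = COL.get(k)
        if col is None:
            return False               # character outside the alphabet
        state = T[state][col]
        if state is EMPTY:
            return False               # got stuck

for s in ["+752", "42", "-", "7+"]:
    print(repr(s), "accept" if run(s) else "reject")
```

Keeping the machine in a table means the driver loop never changes; only `T`, `COL`, and `FINAL` differ from one DFSM to the next.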