Converting an NFA to a DFA

advertisement
Converting an NFA to a DFA
This assignment builds on the work you’ve done in Major Assignment 2 in which your
program read a regular expression (in a standard form) and created a state transition table
for an NFA that recognizes the same strings as the input regular expression. In this
assignment you will build a deterministic finite automata (DFA) using the NFA state
transition table. The DFA your program will build will again be represented by a state
transition table.
Let’s start by defining more precisely what we mean (or at least what I mean) by a DFA.
A DFA will have an alphabet, that’s a set of input characters, which for our purposes will
be capital letters A .. Z. It will also have some number of DFA states, each represented
by a row of the DFA state transition table. The DFA will have NO epsilon transitions at
all. Finally it will have a number of transitions, exactly ONE transition for each possible
combination of input character (capital letter) and state (row of the transition table) of the
DFA. I want to emphasize this point as some people do not require that each state have a
transition for EACH possible input character. In this class (and most textbooks I’ve
seen) we WILL require that, however.
One reason I’m emphasizing that is that there is a youtube video that goes through an
example of building an NFA without epsilon-transitions from an NFA that does include
epsilon transitions and that example could be useful to you in solidifying your concept of
the algorithm used to convert an NFA to a DFA. But, alas, the presenter does NOT
require that each row have a transition for each possible input character. The “fix” to
turn his “DFA” into one that fits our definition is simple, but you need to be aware of it.
More on that later.
In the rest of this description of how to convert an NFA to a DFA, I’ll quickly describe
the two algorithms that you need to use, and then go through an extended example, at
least of the second, more complicated algorithm. So, the two algorithms you need to
implement are:
1. Compute the epsilon-closure of each state (row) in the NFA state transition table.
To do this you wish to identify, for each state, S, of the NFA table each state that
can be reached from S by follow 0 or more epsilon (marked ‘e’ in our table)
transitions. Although you could do this in many different ways, I suggest a
recursive algorithm to find all such states (rows).
2. Build the states (rows) of the DFA state transition table. This requires some
initialization and then starting with the “initial” state, repeatedly building new
state(s) by following transitions (of the NFA states) until all DFA states (rows)
have been completed. As to initialization, we’ll start with two states, the first
(DFA state 0) will be what is often called an error state. This “error” state is
exactly what is missing in the youtube tutorial I’ll point you to soon. The error
state (DFA state 0) will have a transition to itself for each possible input character
(capital letter). The next DFA state (1, of course) will represent the epsilon-
closure of the initial state of the NFA. In general, each DFA state will represent a
subset of the NFA states. Thus the “error” state will be the empty set {}, that
includes none of the NFA states. DFA state 1, which will be the initial state for
the DFA will represent those (NFA) states included in the epsilon closure of the
initial state of the NFA. Once we have those two states identified and the error
state, DFA state 0, already completed since we have a transition for each possible
input character, we keep applying the following rule to fill out the DFA state
transition table.
a. Pick the next “uncompleted” row of the DFA table and, for each possible
input variable compute a set of states, SofS, that includes the set union of
the epsilon closures of all NFA states NS included in any state that is part
of the set of NFA states represented by this DFA state. Ok, that’s a bit
confusing. Let’s try a modest example. Assume that DFA state x,
includes/represents NFA states {3,6,9,11} and that the character we’re
building a transition on is ‘B’. Assume further than NFA state 6 on a B
goes to NFA state 12, and that NFA state 9 on a B goes to NFA state 8.
Then we would build a “new” state by performing a set union of the
(NFA) epsilon-closure of NFA state 12 and the epsilon-closure of the
NFA state 8. Again, let’s say that the union of those two epsilon sets is
{3,4,6,8,12,15,16}. What we would do next is to see if one of the
existing DFA states includes exactly that set of NFA states. If not, we add
a new DFA state corresponding to {3,4,6,8,12,15,16}. But if we already
HAVE such a DFA state, say state y, we don’t need to add it to the DFA
table. In either case, we mark the column for row x and column B to be
the (possibly just added) DFA state representing the NFA states
{3,4,6,8,12,15,16}.
b. Eventually (it may take a while, thank goodness we’re having the
computer do this for us) the algorithm will converge and we’ll have a
complete table with all transitions included and no uncompleted rows of
the DFA state transition table. Then we’re done. Well, almost. We still
need to mark the initial and final states. The initial state is easy. Given
the way I defined the algorithm above, DFA state 1 will be the epsilon
closure of the NFA’s initial state and that will be the initial state for the
DFA. Now, the final state(s) is a bit more difficult. In general, the final
states for either an NFA or a DFA will be a set. But since we built our
NFA from a regular expression using a specific algorithm the NFA will
have a single final state. However, that set of final states for the DFA will
be any DFA state that “includes” the NFA final state among the set of
NFA states it represents.
Ok, now its time to have an extended example of how to build the DFA transition table.
Please go to the (hopefully separate) window that includes a copy of a handout
specifying
1.
2.
3.
4.
An input regular expression string,
The NFA transition table
The initial and final states of the NFA, and
The epsilon closure for each state (row) of the NFA.
Ok, let’s take our example NFA transition table along with the initial and final states of
the NFA and the list of epsilon-transitions and build our DFA table a row (state) at a
time.
First we build the “error” state, which I’m designating row (state) 0 of the DFA. That’s a
completely arbitrary choice, but let’s all go with it to help the grading process. Then our
initial DFA table will be:
DFA State A B C D Set of NFA States
0
0 0 0 0
{}
Next we want to add our “initial” DFA state, state1 based upon the epsilon closure of the
NFA’s initial state, state 10. So we have
DFA State A B C D Set of NFA States
0
0 0 0 0
{}
1
{0, 8, 10}
Note that all we have so far is the DFA state number and the set of NFA states that this
DFA state represents, in this case the {0, 8, 10} NFA states in the epsilon-closure of NFA
state 10. Now we look for any transitions from any of those three NFA states (0,8,10)
and we first find that NFA state 0 goes to NFA state 1 on an ‘A’. So we build a new
DFA state, 2, based upon the epsilon closure of 1 (the “destination” from DFA state 1 on
an ‘A’), and we have.
DFA State A B C D Set of NFA States
0
0 0 0 0
{}
1
2
{0, 8, 10}
2
{1, 2}
Of course if we already had a DFA state corresponding to the set {1, 2} we wouldn’t
have added a row, but since we didn’t have such a row we add one now. And now it’s
back to checking the rest of the columns for DFA state 1. Note that NFA state 8, “part”
of DFA state 1, has a transition on a ‘D’. So we add that in. And when we do that we
get yet another new DFA state based upon the epsilon-closure of 9 (the NFA state
destination from NFA state 8 on a ‘D’.) Again, we add a new DFA row, yielding
DFA State A B C D Set of NFA States
0
0 0 0 0
{}
1
2
3
{0, 8, 10}
2
{1, 2}
3
{9, 11}
But we still need to complete DFA state 1. Since none of NFA state 0, 8, and 10 have
transitions for ‘B’ or ‘C’ we fill in transitions to the error state, 0 and we now have
completed DFA state 1 and have: (Note: Adding transitions to this error state is exactly
what the youtube video leaves out. It just leaves those “boxes” in the table empty.
Shame!)
DFA State A B C D Set of NFA States
0
0 0 0 0
{}
1
2 0 0 3
{0, 8, 10}
2
{1, 2}
3
{9, 11}
So, we move on to “complete” DFA state 2, which, since it has a single non-error
transition gives us an updated table
DFA State A B
0
0 0
1
2 0
2
0 4
3
4
C D Set of NFA States
0 0
{}
0 3
{0, 8, 10}
0 0
{1, 2}
{9, 11}
{3, 4, 6, 7, 11}
Four is a new DFA state based upon the epsilon-closure of NFA state three which is
where the NFA transition from q2 to q3 on a ‘B’ goes. Having now completed DFA 2,
after adding the “error” transitions, we find that DFA state 3 has NO DFA transitions
(NFA states 9 and 11 have only epsilon transitions) and so its row is completed and we
move to DFA state four with:
DFA State A B
0
0 0
1
2 0
2
0 4
3
0 0
4
C D Set of NFA States
0 0
{}
0 3
{0, 8, 10}
0 0
{1, 2}
0 0
{9, 11}
{3, 4, 6, 7, 11}
In similar fashion we see that DFA state four has a single transition, on a ‘C’ that leads to
the DFA state {4, 5, 7, 11}which is the epsilon-closure of NFA state 5. We now have:
DFA State A B
0
0 0
1
2 0
2
0 4
3
0 0
4
0 0
5
C D Set of NFA States
0 0
{}
0 3
{0, 8, 10}
0 0
{1, 2}
0 0
{9, 11}
5 0
{3, 4, 6, 7, 11}
{4, 5, 7, 11}
Note that we include a new DFA state 5, even though the NFA subset of DFA state 4
includes all of the NFA states that DFA state 5 does, because the sets are NOT the same.
More specifically, DFA state 4 includes NFA state 3 and DFA state 5 does not. So we
need DFA states for each. And finally, after completing DFA state 5, we see that we do
NOT have any new states and so we’re done and the final DFA state transition table is:
DFA State A B
0
0 0
1
2 0
2
0 4
3
0 0
4
0 0
5
0 0
C D Set of NFA States
0 0
{}
0 3
{0, 8, 10}
0 0
{1, 2}
0 0
{9, 11}
5 0
{3, 4, 6, 7, 11}
5 0
{4, 5, 7, 11}
We have only the initial and final states to specify and we’re done. As stated before the
algorithm I’ve outlined always includes DFA state 1 as the initial state and any DFA state
that includes any (of the one) final state(s) of the NFA is a final state for the DFA so we
have:
Initial State = 1
Final States = {3, 4, 5}
A few final notes:
1. You don’t need to include a column in your table for the “Set of NFA States”. I
only included it here to (hopefully) add clarity,
2. You will be provided with both print routines to print out the NFA and DFA
tables, again in hopes of making the grading easier, and
3. You will also be provided with a Set implementation set.h and set.c
I anticipate that you’ll use (and submit) both the printTable.c and set routines in your
code.
Download