Converting an NFA to a DFA This assignment builds on the work you’ve done in Major Assignment 2 in which your program read a regular expression (in a standard form) and created a state transition table for an NFA that recognizes the same strings as the input regular expression. In this assignment you will build a deterministic finite automata (DFA) using the NFA state transition table. The DFA your program will build will again be represented by a state transition table. Let’s start by defining more precisely what we mean (or at least what I mean) by a DFA. A DFA will have an alphabet, that’s a set of input characters, which for our purposes will be capital letters A .. Z. It will also have some number of DFA states, each represented by a row of the DFA state transition table. The DFA will have NO epsilon transitions at all. Finally it will have a number of transitions, exactly ONE transition for each possible combination of input character (capital letter) and state (row of the transition table) of the DFA. I want to emphasize this point as some people do not require that each state have a transition for EACH possible input character. In this class (and most textbooks I’ve seen) we WILL require that, however. One reason I’m emphasizing that is that there is a youtube video that goes through an example of building an NFA without epsilon-transitions from an NFA that does include epsilon transitions and that example could be useful to you in solidifying your concept of the algorithm used to convert an NFA to a DFA. But, alas, the presenter does NOT require that each row have a transition for each possible input character. The “fix” to turn his “DFA” into one that fits our definition is simple, but you need to be aware of it. More on that later. In the rest of this description of how to convert an NFA to a DFA, I’ll quickly describe the two algorithms that you need to use, and then go through an extended example, at least of the second, more complicated algorithm. So, the two algorithms you need to implement are: 1. Compute the epsilon-closure of each state (row) in the NFA state transition table. To do this you wish to identify, for each state, S, of the NFA table each state that can be reached from S by follow 0 or more epsilon (marked ‘e’ in our table) transitions. Although you could do this in many different ways, I suggest a recursive algorithm to find all such states (rows). 2. Build the states (rows) of the DFA state transition table. This requires some initialization and then starting with the “initial” state, repeatedly building new state(s) by following transitions (of the NFA states) until all DFA states (rows) have been completed. As to initialization, we’ll start with two states, the first (DFA state 0) will be what is often called an error state. This “error” state is exactly what is missing in the youtube tutorial I’ll point you to soon. The error state (DFA state 0) will have a transition to itself for each possible input character (capital letter). The next DFA state (1, of course) will represent the epsilon- closure of the initial state of the NFA. In general, each DFA state will represent a subset of the NFA states. Thus the “error” state will be the empty set {}, that includes none of the NFA states. DFA state 1, which will be the initial state for the DFA will represent those (NFA) states included in the epsilon closure of the initial state of the NFA. Once we have those two states identified and the error state, DFA state 0, already completed since we have a transition for each possible input character, we keep applying the following rule to fill out the DFA state transition table. a. Pick the next “uncompleted” row of the DFA table and, for each possible input variable compute a set of states, SofS, that includes the set union of the epsilon closures of all NFA states NS included in any state that is part of the set of NFA states represented by this DFA state. Ok, that’s a bit confusing. Let’s try a modest example. Assume that DFA state x, includes/represents NFA states {3,6,9,11} and that the character we’re building a transition on is ‘B’. Assume further than NFA state 6 on a B goes to NFA state 12, and that NFA state 9 on a B goes to NFA state 8. Then we would build a “new” state by performing a set union of the (NFA) epsilon-closure of NFA state 12 and the epsilon-closure of the NFA state 8. Again, let’s say that the union of those two epsilon sets is {3,4,6,8,12,15,16}. What we would do next is to see if one of the existing DFA states includes exactly that set of NFA states. If not, we add a new DFA state corresponding to {3,4,6,8,12,15,16}. But if we already HAVE such a DFA state, say state y, we don’t need to add it to the DFA table. In either case, we mark the column for row x and column B to be the (possibly just added) DFA state representing the NFA states {3,4,6,8,12,15,16}. b. Eventually (it may take a while, thank goodness we’re having the computer do this for us) the algorithm will converge and we’ll have a complete table with all transitions included and no uncompleted rows of the DFA state transition table. Then we’re done. Well, almost. We still need to mark the initial and final states. The initial state is easy. Given the way I defined the algorithm above, DFA state 1 will be the epsilon closure of the NFA’s initial state and that will be the initial state for the DFA. Now, the final state(s) is a bit more difficult. In general, the final states for either an NFA or a DFA will be a set. But since we built our NFA from a regular expression using a specific algorithm the NFA will have a single final state. However, that set of final states for the DFA will be any DFA state that “includes” the NFA final state among the set of NFA states it represents. Ok, now its time to have an extended example of how to build the DFA transition table. Please go to the (hopefully separate) window that includes a copy of a handout specifying 1. 2. 3. 4. An input regular expression string, The NFA transition table The initial and final states of the NFA, and The epsilon closure for each state (row) of the NFA. Ok, let’s take our example NFA transition table along with the initial and final states of the NFA and the list of epsilon-transitions and build our DFA table a row (state) at a time. First we build the “error” state, which I’m designating row (state) 0 of the DFA. That’s a completely arbitrary choice, but let’s all go with it to help the grading process. Then our initial DFA table will be: DFA State A B C D Set of NFA States 0 0 0 0 0 {} Next we want to add our “initial” DFA state, state1 based upon the epsilon closure of the NFA’s initial state, state 10. So we have DFA State A B C D Set of NFA States 0 0 0 0 0 {} 1 {0, 8, 10} Note that all we have so far is the DFA state number and the set of NFA states that this DFA state represents, in this case the {0, 8, 10} NFA states in the epsilon-closure of NFA state 10. Now we look for any transitions from any of those three NFA states (0,8,10) and we first find that NFA state 0 goes to NFA state 1 on an ‘A’. So we build a new DFA state, 2, based upon the epsilon closure of 1 (the “destination” from DFA state 1 on an ‘A’), and we have. DFA State A B C D Set of NFA States 0 0 0 0 0 {} 1 2 {0, 8, 10} 2 {1, 2} Of course if we already had a DFA state corresponding to the set {1, 2} we wouldn’t have added a row, but since we didn’t have such a row we add one now. And now it’s back to checking the rest of the columns for DFA state 1. Note that NFA state 8, “part” of DFA state 1, has a transition on a ‘D’. So we add that in. And when we do that we get yet another new DFA state based upon the epsilon-closure of 9 (the NFA state destination from NFA state 8 on a ‘D’.) Again, we add a new DFA row, yielding DFA State A B C D Set of NFA States 0 0 0 0 0 {} 1 2 3 {0, 8, 10} 2 {1, 2} 3 {9, 11} But we still need to complete DFA state 1. Since none of NFA state 0, 8, and 10 have transitions for ‘B’ or ‘C’ we fill in transitions to the error state, 0 and we now have completed DFA state 1 and have: (Note: Adding transitions to this error state is exactly what the youtube video leaves out. It just leaves those “boxes” in the table empty. Shame!) DFA State A B C D Set of NFA States 0 0 0 0 0 {} 1 2 0 0 3 {0, 8, 10} 2 {1, 2} 3 {9, 11} So, we move on to “complete” DFA state 2, which, since it has a single non-error transition gives us an updated table DFA State A B 0 0 0 1 2 0 2 0 4 3 4 C D Set of NFA States 0 0 {} 0 3 {0, 8, 10} 0 0 {1, 2} {9, 11} {3, 4, 6, 7, 11} Four is a new DFA state based upon the epsilon-closure of NFA state three which is where the NFA transition from q2 to q3 on a ‘B’ goes. Having now completed DFA 2, after adding the “error” transitions, we find that DFA state 3 has NO DFA transitions (NFA states 9 and 11 have only epsilon transitions) and so its row is completed and we move to DFA state four with: DFA State A B 0 0 0 1 2 0 2 0 4 3 0 0 4 C D Set of NFA States 0 0 {} 0 3 {0, 8, 10} 0 0 {1, 2} 0 0 {9, 11} {3, 4, 6, 7, 11} In similar fashion we see that DFA state four has a single transition, on a ‘C’ that leads to the DFA state {4, 5, 7, 11}which is the epsilon-closure of NFA state 5. We now have: DFA State A B 0 0 0 1 2 0 2 0 4 3 0 0 4 0 0 5 C D Set of NFA States 0 0 {} 0 3 {0, 8, 10} 0 0 {1, 2} 0 0 {9, 11} 5 0 {3, 4, 6, 7, 11} {4, 5, 7, 11} Note that we include a new DFA state 5, even though the NFA subset of DFA state 4 includes all of the NFA states that DFA state 5 does, because the sets are NOT the same. More specifically, DFA state 4 includes NFA state 3 and DFA state 5 does not. So we need DFA states for each. And finally, after completing DFA state 5, we see that we do NOT have any new states and so we’re done and the final DFA state transition table is: DFA State A B 0 0 0 1 2 0 2 0 4 3 0 0 4 0 0 5 0 0 C D Set of NFA States 0 0 {} 0 3 {0, 8, 10} 0 0 {1, 2} 0 0 {9, 11} 5 0 {3, 4, 6, 7, 11} 5 0 {4, 5, 7, 11} We have only the initial and final states to specify and we’re done. As stated before the algorithm I’ve outlined always includes DFA state 1 as the initial state and any DFA state that includes any (of the one) final state(s) of the NFA is a final state for the DFA so we have: Initial State = 1 Final States = {3, 4, 5} A few final notes: 1. You don’t need to include a column in your table for the “Set of NFA States”. I only included it here to (hopefully) add clarity, 2. You will be provided with both print routines to print out the NFA and DFA tables, again in hopes of making the grading easier, and 3. You will also be provided with a Set implementation set.h and set.c I anticipate that you’ll use (and submit) both the printTable.c and set routines in your code.