Uses of DFAs

advertisement
Uses of DFAs


DFAs are a way of handling certain
specialized issues involving character strings
For example, one might want to determine
whether a given input string
–
–
–
–
is a legal identifier in some language
is a legal literal of a particular type in some
particular programming language
is a possible word in a given natural language
contains a particular substring
Recognizing legal
identifiers/literals

Typical sets of legal identifers (or literals)
include those
–
–
–
–

strings beginning with a letter followed by some
digits
strings consisting only of letters
strings that containing at most one digit
strings that begin with the underscore character
Note that all of these sets are languages
What’s a DFA (roughly)?




A DFA is an abstraction of a machine
It corresponds more to a program than to a
(programmable) computer
It answers questions about membership in
certain languages
That is, it answers yes/no questions of the
form: is a string in a particular set of strings?
–
Note that the examples given above all have this
form
An abstract machine for a simple
switch





An abstraction of light switch might have two
states
One would correspond to the “on” position
There would be transitions possible between
the states
It would need to be clear which state was the
initial state
This information could be summarized as in
the diagram below
An abstract machine diagram for a
simple switch
Labeled transitions




This machine is not yet a DFA, since it has
nothing to do with strings or languages.
By labeling the transitions with symbols from
an alphabet, we’d get a DFA (shown below)
If both transitions were labeled with a 0, the
resulting DFA would correspond to the set of
strings of odd length over the alphabet {0}.
That is, sequences of transitions that took us
from the initial state to the “on” state would
correspond to sequences of 0’s of odd length
A DFA for a simple switch

This DFA was constructed with JFLAP
Specifying a DFA

Just as in our example, specifying a DFA
requires specifying
–
–
–
–
–
An input alphabet ({0} in our example)
A set of states
An initial state (the “off” state in our example)
A set of final states (consisting only of the “on”
state in our example)
A set of transitions from state to state, each
labeled with a symbol from the input alphabet
DFA Pragmatics

Realistic examples can have large alphabets
and thus ugly diagrams
–

so we’ll use simple alphabets in most examples
States from which no final states are
reachable can be omitted from diagrams
–
We call these dead states (or trap states)
Generalizing the alphabet




Suppose we generalized our example to a
question about a larger alphabet – say {0,1}
Let’s (arbitrarily) say that we want to
recognize those strings with an odd number
of both 0’s and 1’s.
As you might guess, we need more than two
states to do this.
A DFA for this language is given below
–
Note the conventions for initial and final states,
and for state names
Recognizing strings with an odd
number of both 0's and 1's
What’s a DFA – precisely?

A DFA (deterministic finite acceptor – or
automaton) consists of
–
–
–
–
–

A finite (input) alphabet S
A finite set Q of states
An initial (or start) state q0 (from Q)
A set F of final states (a subset of Q)
A transition function d : Q x S -> Q
Note that both the alphabet and set of states
are finite -- a DFA has a finite description
Tabular representation of DFAs

Tabular representation of the DFA above:
–
–
–
–
–

d
q0
q1
q2
F q3
|
|
|
|
|
0 1
q1 q2
q0 q3
q3 q0
q2 q1
Note that the table implicitly gives the states,
the start state, and the input alphabet
–
The final states have to be explicitly labeled
States and strings

Note that states in the DFA above
correspond to strings as follows:
–
–
–
–

q0:
q1:
q2:
q3:
even number of 0's; even number of 1's
odd number of 0's; even number of 1's
even number of 0's; odd number of 1's
odd number of 0's; odd number of 1's
DFA states often have a (fairly) simple
representation in terms of the strings that are
taken there
The extended transition function



d may be extended to handle strings
This gives a function d*: Q x S* -> Q
d* is defined for all states q, strings w, and
symbols a by
–
–

d*(q,l) = q
d*(q,wa) = d(d*(q,w), a)
Note that d* extends d in the sense that
d*(q,a) = d(q,a)
–
to see this, let w = l above
Notational conventions


Lower-case letters near q in the English
alphabet are used for states, perhaps with
subscripts
We use S, Q, q0, F, d, and d* without
comment if only one DFA is under discussion
Acceptance of a string

A string x is accepted by a DFA M iff d*(q0,x) e
F,
–

where d* is the extended transition function
Fact: For this function d*,
d*(q,xy) = d* (d*(q,x), y) for any q, x, and y
Acceptance of a language


A language L is accepted (or recognized) by
M iff every string in L is accepted by M, and
no string outside of L is accepted by M
A DFA M accepts exactly one language.
–

A language is regular iff it is accepted by
some DFA.
–

This language is denoted L(M)
Fact: All finite languages are regular
Two DFAs are equivalent iff they accept the
same language.
More examples of DFAs

strings alternating 0's and 1's
State q3 is a dead state
Still another example of a DFA

For binary representations of integers
(without leading 0’s)
–
–
all strings over {0,1} beginning with 1
dead state omitted
Yet another example of a DFA


Strings over {0,1,.} that contain at most one
period
This DFA is not minimal
A DFA for binary multiples of 3

Here we identify l with 0 – a good exercise is
to redo this example without this assumption
Regular expressions




Regular expressions are used to represent
languages.
It turns out that every regular expression
represents a regular language,
Also, every regular language may be
represented by a regular expression.
The language represented by the regular
expression r is denoted L(r).
Regular expressions defined



An expression is a regular expression iff it
has one of the following forms:
–
f, l, or a, where a is a member of some S
–
r+s, rs, r*, or (r), where r and s are regular
expressions.
The expressions f, l and a respectively
represent f, {l}, and {a}
The expressions r+s, rs, r*, and (r)
respectively represent L(r) U L(s), L(r)L(s),
L(r)*, and L(r)
About regular expressions



Don’t confuse the empty language f with the
nonempty language {l}
The precedence of the * operator is higher
than for juxtaposition; for +, it’s lower.
These pairs of regular expressions are
equivalent (represent the same languages)
–
–
–
–
rl and r; and r and lr
(rs)t and r(st)
(r+s)t and rt + st; and r(s+t) and rs + rt
r* and (r*)*; and (r+s)* and (r*s*)*
Examples of regular expressions

(0+1+2)*
–

a(a+b+c)*
–

all strings over {0,1} having 010 as a substring
000+001+010+011+100+101+110+111
–

all strings over {a,b,c} starting with a
(0+1)*010(0+1)*
–

all strings over {0,1,2}
all bit strings of length 3
(0+1)(0+1)(0+1)(0+1)
–
all bit strings of length 4
Regular expressions and regular
languages




For every regular expression r, L(r) is a
regular language.
If L is a regular language, then L = L(r) for
some regular expression r.
We will soon see constructive proofs of these
claims.
The proof of the first claim requires a new
concept -- nondeterminism
Problems with constructing DFAs

For some languages, the FAs that are most
natural are incorrect. For example,
–
–
–
–
for (a+b)*ab(a+b)*, the natural FA below isn't
deterministic (d is not a function)
for 0*1*, the natural FA recognizes (0+1)*
for 10+(12)*, the natural FA isn't deterministic, and
accepts too many strings
for (10)*12, the natural FA isn't deterministic
An attempt at a DFA for
L( (a+b)*ab(a+b)* )
Nondeterminism




One way of dealing with this issue is to allow
finite automata a choice of moves for a given
state and symbol.
This approach is called nondeterminism.
In some cases, it is also useful to allow lmoves that don't consume an input symbol.
It turns out that adding nondeterminism
(even with l-moves) does not allow us to
accept any new languages.
Modeling nondeterminism


Ways of thinking of nondeterminism:
omniscience
–

branching processes
–

allocate a new processor for each choice
simultaneous processes
–

suppose that you could always guess correctly
allow being in several states at once
backtracking
–
try each choice one after another
Sets of states



Our treatment of nondeterminism is based on
the third approach above.
We will allow the computation to be in
several states at once (or no state at all).
This requires a new transition function:
–
–
d: Q x (S U {l})-> 2Q
Note that l is a legal 2nd argument, and that the
value is a (possibly empty) set of states.
NFAs

If our definition of a DFA is modified to use
the new transition function, we get an NFA
–

U
Note that the “N” stands
for “nondeterministic”
rather than “not”.
The FA shown above for L((a+b)*ab(a+b)*) is
a legal NFA for that language, with table:
d |
a
b
l
q0 | {q0, q1} {q0} {}
q1 |
{} {q2} {}
F q2 | {q2} {q2} {}
Acceptance by NFAs

An NFA accepts a string x iff d*(q0, x)
contains a state of F, where d* is defined
informally in Linz (p. 51). More precisely,
d*(q, l) is the set of states accessible from q by
zero or more l moves
d*(q, a) is the set of states accessible from q by
zero or more l moves, then one a move and then
zero or more l moves
d*(q, wa) is the union over all p in d*(q, w) of d*(p, a)
d*(S, x) is the union over all q in S of d*(q, x)
An NFA for L((10)*12)

A table for an NFA for (10)*12:
–
–
–
–
–
δ | 0
1
2
q0 | {} {q1,q2} {}
q1 | {q0} {}
{}
q2 | {}
{} {q3}
F q3 | {}
{}
{}
l
{}
{}
{}
{}
The use of l moves in NFAs

A bad NFA for 0*1*:

A good NFA for 0*1*:
Constructing DFAs



We’ll soon have algorithms to find equivalent
DFAs for NFAs or regular expressions.
In other cases, try starting with a start state.
For every new state, determine destination
states from this state on every symbol
–
–
Tricky part: must these destination states be new
states, or can old states be reused?
Useful observation: if x and y go to the same
state, so do xz and yz for all z
Equivalence of NFAs and DFAs



Claim: Any language accepted by an NFA is
regular
To show the claim, we need to find, for any
NFA MN, a DFA MD with L(MD) = L(MN)
Idea: let states of the DFA correspond to
sets of states of the NFA
–

we’ll use brackets to denote states of MD
Although there will be 2m states (if MN has m
states), many will be dead or inaccessible
Constructing equivalent DFAs



[q0] will be the start state of MD
[S] is a final state iff S contains a final state of
MN, or if [S] = [q0] and MN accepts l
The transition function dD is defined by letting
dD([S],a) correspond to the union over all q in
S of dN*(q,a), where dN is MN’s d function
–

We use dN* rather than dN to allow l moves
By induction, dD*([S],x)= [dN*(S,x)] for all x
with |x| > 0, so that dD*([q0],x) = [dN*(q0,x)]
Example 1 of an equivalent DFA

The NFA below accepts L(0*1*)
–
–
–

δ | 0
q0 | {q0}
F q1 | {}
1
l
{} {q1}
{q1} {}
The equivalent DFA is given below
–
–
–
–
–
F [q0]
| [q0,q1] [q1]
F [q0,q1] | [q0,q1] [q1]
F [q1]
| []
[q1]
[]
| []
[]
Note: the 4th state is a trap (dead) state
Example 2 of an equivalent DFA

The DFA equivalent to the NFA given above
for L((10)*12) is:
–
–
–
–

δ
[q0]
[q1, q2]
F [q3]
| 0
1
2
| [] [q1, q2] []
| [q0] []
[q3]
| []
[]
[]
Here the dead state and inaccessible states
have been omitted
Another example with l moves

The NFA below accepts L(10+(12)*)
The equivalent DFA

The equivalent DFA is:
–
–
–
–
–
–
δ
F [q0]
[q2,q5]
F [q3]
F [q4]
[q5]
|
|
|
|
|
|
0
1
2
[] [q2,q5] []
[q3] []
[q4]
[]
[]
[]
[]
[q5]
[]
[]
[]
[q4]
Breadth-first search (BFS)


Several steps in the algorithms above require
finding all states accessible from some state
q.
These steps can be implemented by a
breadth-first search (BFS) algorithm
–
–
this algorithm is a specialization of the algorithm
of Linz, p. 9
it can be used in step 2 of Linz’s nfa-to-dfa
algorithm of p. 59
The BFS algorithm




The algorithm maintains a queue of
generated states whose successors have not
been generated.
Until the queue becomes empty, the next
state is dequeued and marked, and its
successors found.
Those successors that are not marked are
enqueued.
The queue will eventually empty, since there
are only finitely many states
Using the BFS algorithm

BFS can find all l-paths leaving a state q
–
–

Or all accessible states of a DFA
–
–

a “successor” is the destination of a l-transition
the queue is initialized with q only
a “successor” is the destination of any transition
the queue is initialized with q0 only
Or all nondead states of a DFA
–
–
q’s “successor” is the source of any transition to q
the queue is initialized with all states of F
Two useful notions

A useful notion in both DFA minimization and
in finding regular expressions for DFAs is
identifying states with the strings taken there
–


That is, we identify q with {x | d*(q0,x) = q}
Another useful notion for minimization is
distinguishability
For a language L, two strings x and y in S*
are distinguishable iff for some z in S*,
exactly one of {xz, yz} is in L
Minimal DFAs and distinguishability





A DFA separates x and y iff
d*(q0,x) /= d*(q0,y)
DFAs must separate distinguishable strings
DFAs may separate indistinguishable strings
One might expect minimal DFAs to separate
only distinguishable strings
Key to seeing that this is both correct and
feasible is the notion of equivalence relations
Equivalence relations and
minimization

Note first that indistinguishability is an
equivalence relation on S*.
–

Also, being taken to the same state by a DFA
M is an equivalence relation on S*.
–
–

We’ll use the infix operator ~ for this relation
The equivalence classes are the states of M
We’ll use the infix operator ~M for this relation
We’ll use the convention that [x] is the
(equivalence) class containing x
The minimal DFA for L

Note that for a given L, ~M refines ~
–
–
–


That is, if x ~M y, then x ~ y
So every class of ~M is in a single class of ~
And ~ has no more classes than any ~M
If the classes of ~ can represent the states of
some DFA for L, this DFA would be minimal.
And they can – we define a DFA M^ to have
this set of states, and alphabet S, and …
The minimal DFA for L


… the class [l] containing l as its start state
{[x] | x e L} as its set of final states
–

transition function d^, where d^([x],a) = [xa]
–

Note: this is well-defined; if y e [x], then [ya] = [xa]
An easy induction shows that d^*([l],x) = [x]
–

Note: strings in any class are all in L or all not in L
and thus that M^ accepts L
This construction depends only on L
–
and not on any DFA accepting L
Combining states


Our remarks about refinement show that M^
can be obtained by combining states of any
M accepting L
But which states of M do we combine?
–
–

the indistinguishable states, where for all x in S*,
both of {d*(p,x), d*(q,x)} are final or both not?
but these are hard to recognize
Instead we split distinguishable states
–
–
that is, successively refine a partition of Q
starting with a split of states into F and Q-F
Distinguishing states

States can be distinguished recursively, by
an “unzipping” process
–

using d rather than d*
Suppose that p and q are distinguishable
because d*(p,abc) is in F and d*(q,abc) isn’t
–
–
–
–
d*(p,abc) and d*(q,abc) are distinguished initially
d*(p,ab) and d*(q,ab) will be distinguished based
on c
then d*(p,a) and d*(q,a) will be distinguished
based on b
then p and q will be distinguished based on a
The minimization algorithm




To test whether a set S of states may be split,
find d(p,a) for all p in S and some a in S
If the results are in different sets of the
partition, split S based on these sets.
The minimization algorithm (cf. Linz, pp. 645), repeatedly applies this test (as a single
pass) to all nonsingleton sets in the current
partition, and a in S
It halts if no splits are made in a pass
–
only finitely many passes are needed, as for Linz
From regular expressions to DFAs

To find an NFA accepting L(r) for a regular
expression r, it’s enough to
–
–


find NFAs for the three basic regular expressions
Show how to deal with the three ways of creating
new regular expressions from old
The relevant constructions are given in Linz,
Figures 3.1-3.5
NFAs constructed in this way can be
converted to DFAs and minimized
From DFAs to regular expressions




Linz’s algorithm for constructing a regular
expression for L(M) is perhaps more
naturally expressed in terms of equations.
The solution technique is very much as for
simultaneous linear equations in algebra
We identify each state A with {x | d(q0,x) = A}
For each state we construct an equation
based on the incoming transitions.
Equations for DFAs

For example, the DFA of Figure 2.13 of Linz
corresponds to the pair of equations
–
–


A = l + Bb
B = Aa + Ba
Note that start states have a l term, and that
we may ignore dead states
We proceed by eliminating equations
–
It’s usually best not to eliminate the start state’s
equation
Eliminating equations

If r does not contain the variable X and s is a
regular expression, then X = r + Xs has the
solution X = rs*
–

e.g., B = Aa + Ba has solution B = Aaa*
After solving an equation, we may substitute
into other equations.
–
–
–
For us, A = l + Bb becomes A = l + Aaa*b
And solving for A gives L(M) = A = (aa*b)*
If B were final, we’d use that B = Aaa* =
(aa*b)*aa*
The example of binary multiples of 3

For binary multiples of 3, we get equations
–
–
–

We see that C = B01*, and then get
–
–

A = l + A0 + B1
B = A1 + C0
C = B0 + C1
A = l + A0 + B1
B = A1 + B01*0
From B = A1(01*0)* we get
–
–
A = l + A[0 + 1(01*0)*1] and then
L(M) = A = l[0 + 1(01*0)*1]* = [0 + 1(01*0)*1]*
Download