Chapter 4: Properties of Regular Languages

We begin by looking at closure properties. A set is closed under an operation if
whenever two elements of the set are combined using that operation then the result is
also in the set. For example, if two integers are added, subtracted or multiplied then the
sum, difference or product is also an integer. However, when one integer is divided by
another we have no such guarantee—the result might be a fraction.
Keeping in mind that a language is just a set of strings over an alphabet, as you might
suspect from the proof of creating an NFA from a regular expression, the regular
languages are closed under union. That is, if we take the union of two regular
languages, the resulting language is also regular. This can be shown by connecting the
machines for the two languages using the same construction as in case 1 of the proof of
Theorem 3.1. Some closure properties will be proven constructively and others will be
derived from results that are already established. The constructive proofs basically
involve combining DFAs for the languages in some way (as with the union of two
languages above). Although we may choose any of the equivalent representations of a
regular language, the DFA is usually the most straightforward one to use in a
construction because it has exactly one move from every state on every input symbol
and has no λ-moves. The other thing we’ll look at in this chapter is what questions
about regular languages can be answered.
Theorem 4.1 If L1 and L2 are regular languages, then so are L1 ∪ L2, L1 ∩ L2, L1•L2, the
complement of L1, and L1*. Thus, the family of regular languages is closed under union,
intersection, concatenation, complementation and star-closure.
Proof: If L1 and L2 are regular, then there are DFAs M1 and M2 such that L1 = L(M1) and
L2 = L(M2). As in the proof of Theorem 3.1, we can build NFAs that recognize L1 ∪ L2,
L1•L2, and L1*.
We have already discussed how to find a machine for the complement
of a language accepted by a DFA and we use that to show closure under complement.
Specifically, let M = (Q, Σ, δ, q0, F) be a DFA that accepts L1. Then the DFA
M′ = (Q, Σ, δ, q0, Q − F) accepts the complement of L1. To see this we make the following
observation: it’s important that we defined the DFA in the way that we did, namely, that
δ is a total function so there is a transition from every state on every input symbol.
Otherwise, if a transition were missing we could have a string that was in neither L(M)
nor L(M′). Thus, for every string w, either δ*(q0, w) ∈ F, in which case w ∈ L(M), or
δ*(q0, w) ∈ Q − F, in which case w is in the complement of L1.
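The complement construction can be tried out in a few lines of Python. This is only a sketch: the tuple layout (states, alphabet, delta, start, finals) and the example machine for "strings ending in 1" are assumptions made for illustration, not something taken from the text.

```python
from itertools import product

def accepts(dfa, word):
    """Run a DFA given as (states, alphabet, delta, start, finals)."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]  # delta is total: every pair has an entry
    return state in finals

def complement(dfa):
    """Swap accepting and nonaccepting states; everything else is unchanged."""
    states, alphabet, delta, start, finals = dfa
    return (states, alphabet, delta, start, states - finals)

# Hypothetical example DFA: strings over {0, 1} that end in 1.
ends_in_1 = (
    frozenset({"q0", "q1"}),
    "01",
    {("q0", "0"): "q0", ("q0", "1"): "q1",
     ("q1", "0"): "q0", ("q1", "1"): "q1"},
    "q0",
    frozenset({"q1"}),
)
not_ends_in_1 = complement(ends_in_1)

# Every string is accepted by exactly one of the two machines.
for n in range(5):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(ends_in_1, w) != accepts(not_ends_in_1, w)
```

Because delta has an entry for every (state, symbol) pair, each string ends in exactly one state, which is why exactly one of the two machines accepts it.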
Note: We can’t do the proof in this manner with an NFA. Suppose w ∈ L(M). Because M is
an NFA, one walk labeled w starting at q0 in the transition graph GM may end in a
nonaccept state while another walk labeled w ends in an accept state. After swapping
accept and nonaccept states, the first walk ends in an accept state, so the new machine
also accepts w even though w is not in the complement of L(M).
For closure under intersection, let’s use DeMorgan’s law that the complement of an
intersection of two languages is the union of the complements of the languages.
\[ L_1 \cap L_2 = \overline{\overline{L_1} \cup \overline{L_2}} \]
That is, since L1 and L2 are regular, so are their complements, and the union of those
complements is also regular by closure under union. Then, the complement of that
union must also be regular.
Note, the book does this constructively. Let’s look briefly at that just to see a more
sophisticated construction proof. Let L1 = L(M1) and L2 = L(M2) where M1 = (Q, Σ, δ1, q0,
F1) and M2 = (P, Σ, δ2, p0, F2). Then we construct an automaton M3 = (Q × P, Σ, δ3, (q0,
p0), F3). Basically, we form a new automaton whose set of states is the Cartesian
product of the states of the other two automata, i.e. each state in the new machine is an
ordered pair of states from the original machines. As you might expect, the new δ
function acts like δ1 on the first component in an ordered pair and like δ2 on the second
component. That is,
δ3((qi, pj), a) = (δ1(qi, a), δ2(pj, a)). The set of final states is F3 = {(qf, pf) | qf ∈ F1
and pf ∈ F2}. It is straightforward to show that w ∈ L1 ∩ L2 if and only if w ∈ L(M3).
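Here is a minimal Python sketch of the product construction. The two example machines (even number of 0s, ends in 1) and the tuple layout (states, alphabet, delta, start, finals) are hypothetical choices for illustration only.

```python
from itertools import product

def accepts(dfa, word):
    """Run a DFA given as (states, alphabet, delta, start, finals)."""
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

def intersect(m1, m2):
    """Product construction: states are ordered pairs (q, p)."""
    q_states, sigma, d1, q0, f1 = m1
    p_states, _, d2, p0, f2 = m2
    states = set(product(q_states, p_states))
    # The new delta acts like d1 on the first component and d2 on the second.
    delta = {((q, p), a): (d1[(q, a)], d2[(p, a)])
             for (q, p) in states for a in sigma}
    finals = {(q, p) for q in f1 for p in f2}
    return (states, sigma, delta, (q0, p0), finals)

# Hypothetical M1: even number of 0s. Hypothetical M2: ends in 1.
m1 = ({"e", "o"}, "01",
      {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"},
      "e", {"e"})
m2 = ({"n", "y"}, "01",
      {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"},
      "n", {"y"})
m3 = intersect(m1, m2)

# M3 accepts w exactly when both M1 and M2 accept w (checked up to length 5).
for n in range(6):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(m3, w) == (accepts(m1, w) and accepts(m2, w))
```

The loop is a bounded sanity check, not a proof; the proof is the induction on the length of w mentioned above.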
There are two more closure properties that are easy to establish. Regular languages
are closed under difference: L1 − L2 = L1 ∩ (complement of L2), which is regular because
of closure under intersection and complement.
Similarly, Theorem 4.2 states that the regular languages are closed under reversal,
which is easy to show constructively. Given a language L, L^R is obtained by reversing
each string in L. Here’s how to prove closure under reversal using automaton
construction. Basically we start with an automaton M for L and introduce a new start
state q0′. We then put in λ-moves from the new start state to each of the original final
states. Then, we reverse all of the arrows of the original graph (the new λ-moves are
left as they are). The new final state is the original start state q0. Mathematically we
have:
Proof. Let L = L(M) for a DFA M = (Q, Σ, δ, q0, F). Construct an NFA
M′ = (Q ∪ {q0′}, Σ, δ′, q0′, {q0}) for L^R, where p ∈ δ′(q, a) iff δ(p, a) = q. For the new
start state q0′, δ′(q0′, λ) = F, i.e. there is a λ-move from q0′ to every p ∈ F.
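The reversal construction can be sketched in Python as follows. The example DFA ("starts with 0"), the state names, and the tuple layouts are assumptions made for illustration; the NFA simulation handles only the λ-moves this construction creates, which all leave the new start state.

```python
from itertools import product

def accepts_dfa(dfa, word):
    _, _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals

def reverse(dfa, new_start="q0'"):
    """Build an NFA for the reversed language; new_start must be a fresh name."""
    states, sigma, delta, start, finals = dfa
    assert new_start not in states
    rdelta = {}
    for (p, a), q in delta.items():          # arrow p --a--> q becomes q --a--> p
        rdelta.setdefault((q, a), set()).add(p)
    lam = {new_start: set(finals)}           # lambda-moves to the old final states
    return (states | {new_start}, sigma, rdelta, lam, new_start, {start})

def accepts_nfa(nfa, word):
    _, _, rdelta, lam, start, finals = nfa
    current = {start} | lam.get(start, set())  # lambda-closure of the start state
    for symbol in word:
        current = set().union(*(rdelta.get((q, symbol), set()) for q in current))
    return bool(current & finals)

# Hypothetical DFA: strings over {0, 1} that start with 0.
starts_with_0 = (
    {"s", "y", "n"}, "01",
    {("s", "0"): "y", ("s", "1"): "n",
     ("y", "0"): "y", ("y", "1"): "y",
     ("n", "0"): "n", ("n", "1"): "n"},
    "s", {"y"},
)
rev = reverse(starts_with_0)

# The NFA accepts w exactly when the DFA accepts w reversed (up to length 5).
for n in range(6):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts_nfa(rev, w) == accepts_dfa(starts_with_0, w[::-1])
```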
Summarizing, we have shown that the regular languages are closed under union,
concatenation, star closure, complement, difference, intersection and reversal. Here
are some examples of ways in which closure properties can be used to prove or
disprove whether a language is regular.
Example 1: Prove or disprove: If L is not regular then the complement of L is not regular.
Proof by contradiction. Suppose L is a language that is not regular but the complement
of L is regular. Since regular languages are closed under complement, the complement of
the complement of L, which is L itself, must be regular. This is a contradiction.
Therefore if L is not regular then its complement is not regular either.
(Note: the complement of the complement of L is denoted by L with two bars on top of it.)
Example 2: If L is regular then L′ = {xy | x ∈ L and y ∉ L} is also regular.
Since L is regular, so is the complement of L. Then L′ = L•(complement of L), which is
regular since the regular languages are closed under concatenation.
Example 3: #8 p. 109 Define the complementary or (cor) of two languages by
cor(L1, L2) = {w | w ∉ L1 or w ∉ L2}, i.e. w is in the complement of L1 or in the
complement of L2. Show that the family of regular languages is closed
under the cor operation.
class exercise—fill in the proof below
Proof: Let L1 and L2 be regular languages.
Closure under other operations
Before we get into additional operations under which the regular languages are closed,
let’s take a look at the language L = {0^n 1^n}. This is a context-free language that is not
regular. We’ll prove this later, but intuitively we can justify that this is not regular by
making the observation that finite automata cannot count, i.e. if a string in this language
were input to an automaton, the machine would have to “count” the number of 0’s so that
when the 1’s are encountered it could determine if there are exactly as many 1’s as 0’s.
A machine cannot add states in order to count the 0’s. Once the machine is built, the
number of states is fixed.
Definition 4.1: Let Σ and Γ be alphabets. Then a function h: Σ → Γ* is called a
homomorphism. That is, a homomorphism is a substitution in which a single letter is
replaced by a string.
Although this is defined for alphabet symbols, a homomorphism can be extended to
languages in the obvious way. If h is a homomorphism and w = a1a2…ak is a string,
then h(w) = h(a1)h(a2)…h(ak). That is, the homomorphism is applied to every symbol in
the string. Then if L is a language, h(L) = {h(w) | w ∈ L}. h(L) is called the
homomorphic image of L. If we start with a regular expression r for a language L then a
regular expression for h(L) can be obtained by applying the homomorphism to each
alphabet symbol in r.
For example, consider the language L = {01, 10}* over Σ = {0, 1}. Let h be the
homomorphism defined by h(0) = ab and h(1) = c. Then, h(01) = abc and h(10) = cab.
In this case h(L) = {abc, cab}*. If r = (0 + 1)*11, then h(r) = (ab + c)*cc.
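Extending h from symbols to strings is easy to mirror in Python. This sketch stores the homomorphism as a dictionary (an illustrative choice) and checks the example above on the strings of {01, 10}* built from at most three blocks.

```python
from itertools import product

def apply_hom(h, w):
    """Extend h from symbols to strings: h(a1...ak) = h(a1)...h(ak)."""
    return "".join(h[symbol] for symbol in w)

h = {"0": "ab", "1": "c"}
assert apply_hom(h, "01") == "abc"
assert apply_hom(h, "10") == "cab"

# Strings of {01, 10}* built from at most 3 blocks, and their images.
blocks, image_blocks = ["01", "10"], ["abc", "cab"]
for k in range(4):
    sample = {"".join(p) for p in product(blocks, repeat=k)}
    images = {"".join(p) for p in product(image_blocks, repeat=k)}
    assert {apply_hom(h, w) for w in sample} == images
```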
Theorem 4.3: Let h be a homomorphism. If L is a regular language, then its
homomorphic image h(L) is also regular. The family of regular languages is therefore
closed under arbitrary homomorphisms.
Proof method: Let L be a regular language and let r be a regular expression such that L
= L(r). We need to show that h(r) defines the language h(L). Specifically, it must be
shown that for every w ∈ L(r), h(w) ∈ L(h(r)) and conversely that for every y ∈ L(h(r)),
there is a w in L such that y = h(w). Induction can be used to prove this. See the text
for a complete proof.
How not to use a homomorphism: An important thing to keep in mind is that the
homomorphism is defined on single alphabet symbols, not strings. That is, a single
alphabet symbol is replaced by a string, not the other way around. If L = {(ab)^n c^n | n ≥ 0},
we cannot show that L is not regular by replacing ab by 0 and c by 1 to get {0^n 1^n}. We’ll
have to find other methods to do that.
Another example of using closure properties
Example 4: Let L = {0^i 1^n 2^n | i ≥ 1, n ≥ 0}. Suppose L is regular and consider the
following homomorphism:
h(0) = λ
h(1) = 0
h(2) = 1
Then, h(L) = {0^n 1^n}. Since the regular languages are closed under homomorphism, if L
is regular this implies {0^n 1^n} is regular. But, as we’ll show later, {0^n 1^n} is not regular.
Therefore, we have a contradiction so L is not regular.
Note that the choice of homomorphism matters. For example, consider this
homomorphism h′ instead: h′(0) = λ, h′(1) = 0, h′(2) = 0. Then h′(L) = {0^{2n}}, which is a
regular language, so h′ tells us nothing about whether L is regular.
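Both homomorphisms from Example 4 can be applied to a finite sample of L to see the difference. The bounds on i and n below are arbitrary choices for illustration, and λ is represented by the empty Python string.

```python
def apply_hom(h, w):
    """Apply a homomorphism, stored as a dict, symbol by symbol."""
    return "".join(h[symbol] for symbol in w)

h = {"0": "", "1": "0", "2": "1"}        # h(0) = lambda
h_prime = {"0": "", "1": "0", "2": "0"}  # h'(0) = lambda

# Finite sample of L = {0^i 1^n 2^n | i >= 1, n >= 0}
sample = {"0" * i + "1" * n + "2" * n for i in range(1, 4) for n in range(5)}

image = {apply_hom(h, w) for w in sample}
image_prime = {apply_hom(h_prime, w) for w in sample}

assert image == {"0" * n + "1" * n for n in range(5)}         # strings 0^n 1^n
assert image_prime == {"0" * (2 * n) for n in range(5)}       # strings 0^(2n)
```

The first image keeps the matching between the two blocks; the second collapses them into a single block of even length, which a finite automaton can recognize.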
Caution: This brings up an important point about the use of closure properties: if two
regular languages are combined under an operation for which the regular languages are
closed then we are guaranteed that the resulting language is regular. However, if
languages L1 and L2 are combined using an operation under which regular languages
are closed to create a new language L3, and both L1 and L3 are regular it does NOT
mean that L2 is regular. For example, let L1 = ∅ and L2 = {0^n 1^n}. Then L1 ∩ L2 = ∅,
but L2 is not a regular language even though ∅ is a regular language. Thus, just
because the result of combining two languages is regular it does not mean that both of
the original languages that were combined are regular.
Since a homomorphism is a function, we can talk about inverse homomorphisms.
(Actually, it's more like a preimage than an inverse since the homomorphism is not
necessarily one-to-one or onto.) So, if h is a homomorphism and L is a language,
h^{-1}(L) = {w | h(w) is in L}. As one might expect, the regular languages are closed under
inverse homomorphism. Specifically, if h is a homomorphism from alphabet Σ to alphabet Γ,
and L is a regular language over Γ, then h^{-1}(L) is also a regular language.
You have to be a bit careful applying inverse homomorphisms because the answer isn’t
always what you might expect it to be. For example, let L = {aba, aabb} and define the
homomorphism h(0) = ab, h(1) = a. Then h^{-1}(L) = {01} since aabb has no preimage
under the given homomorphism. Note that this implies that in general h(h^{-1}(L)) ≠ L;
here h(h^{-1}(L)) = {aba}.
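For a finite language the preimage can be computed directly. The helper below is a sketch that assumes no symbol is mapped to the empty string (otherwise a string could have infinitely many preimages); the function name `preimages` is made up for this example.

```python
def apply_hom(h, w):
    return "".join(h[symbol] for symbol in w)

def preimages(h, y):
    """All strings w with h(w) == y, assuming no symbol maps to the empty string."""
    assert all(h.values()), "this sketch does not handle h(a) = lambda"
    if y == "":
        return {""}
    result = set()
    for symbol, image in h.items():
        if y.startswith(image):
            # Peel off one image and solve the rest recursively.
            result |= {symbol + rest for rest in preimages(h, y[len(image):])}
    return result

h = {"0": "ab", "1": "a"}
L = {"aba", "aabb"}

inverse = set().union(*(preimages(h, y) for y in L))
assert inverse == {"01"}                      # aabb has no preimage

round_trip = {apply_hom(h, w) for w in inverse}
assert round_trip == {"aba"}                  # not equal to L
```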
Now, let’s look again at one of the examples above: L = {(ab)^n c^n | n ≥ 0}. We can use
an inverse homomorphism to show this is not regular. Consider the homomorphism
h(0) = ab and h(1) = c. Then, h^{-1}(L) = {0^n 1^n | n ≥ 0}, which we have already noted is
not a regular language. If L were regular, closure under inverse homomorphism would
make h^{-1}(L) regular as well, so L cannot be regular.
A concept closely related to homomorphism is that of substitution. In this case, rather
than replacing each alphabet symbol by a string, each symbol is replaced by a language.
Let’s go back to the example L = {01, 10}*. Define a substitution s on L by s(0) = a* and
s(1) = (a + b). Then s(L) is the language denoted by (a*(a + b) + (a + b)a*)*.
Definition 4.2 Let L1 and L2 be languages over the same alphabet. Then, the right
quotient of L1 with L2 is defined as L1/L2 = {x | xy ∈ L1 for some y ∈ L2}. (Play with this a
bit to make sure you understand how this works.)
To form the right quotient we take all strings in L1 that have a suffix belonging to L2.
Every such string, after removal of this suffix, belongs to L1/L2.
Theorem 4.4: If L1 and L2 are regular languages, then L1/L2 is also regular. Thus, we
say that the family of regular languages is closed under right quotient with a regular
language.
See the book for the proof; we’re not going to do it here.
Example 5: (Example 4.5 in Linz) Let L1 = L(a*baa*) and L2 = L(ab*). Find L1/L2. The
answer is L1/L2 = L(a*ba*). Here’s the reasoning behind the answer:
Look at L2 = {a, ab, abb, abbb, …}. Since none of the strings in L1 end in b, the only
string in L2 that we need to worry about is a. Thus, the problem becomes L1/{a}, which can
easily be seen to be L(a*ba*) since it just removes the mandatory a from the end of each
string in L1.
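The answer to Example 5 can be checked by brute force on short strings. This sketch uses Python's `re` module for membership tests; the length bounds (x up to 6, suffix y up to 4) are arbitrary, so it is a sanity check rather than a proof.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every string over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

L1 = re.compile(r"a*baa*")
L2 = re.compile(r"ab*")
claimed = re.compile(r"a*ba*")   # the claimed quotient L1/L2

def in_quotient(x, max_suffix=4):
    """x is in L1/L2 if some y in L2 (searched up to max_suffix) has xy in L1."""
    return any(L2.fullmatch(y) and L1.fullmatch(x + y)
               for y in words("ab", max_suffix))

for x in words("ab", 6):
    assert in_quotient(x) == bool(claimed.fullmatch(x))
```

Bounding the suffix length loses nothing here, because the reasoning above shows that y = a is the only suffix that ever works.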
Example 6: Another constructive example.
Prove that if L is regular then so is drop(L) = {x | x is formed by dropping one letter from
a word in L}.
Note: this will not be discussed in class unless you have questions about it.
Proof sketch: Let L be a regular language and let M = (Q, Σ, δ, q0, F) be a DFA such
that L(M) = L. Construct an NFA M' that consists of two copies of M. Stay in the first
copy until a letter is dropped. Then use a λ-move to move to the second copy and continue
in the second copy until the string has been read. We’ll denote states in the second copy
of M by q' to distinguish them from the states in the original machine. Construct
M' = (Q ∪ Q', Σ, δ', q0, F')
States of M': Q ∪ Q' where qi' ∈ Q' iff qi ∈ Q
Final states: F' = {p' | p ∈ F}
The transition function: δ'(qi, a) = δ(qi, a) for all qi ∈ Q
δ'(qi', a) = qj' iff δ(qi, a) = qj.
δ'(qi, λ) = {qj' | ∃ a ∈ Σ such that δ(qi, a) = qj}. It is moves of this last type that
allow us to drop the letter.
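The two-copy construction can be simulated directly. In this sketch the example DFA for (01)* and its state names are hypothetical, and a state of M' is written as a pair (q, 0) for the first copy or (q, 1) for the second copy instead of q and q'.

```python
from itertools import product

# Hypothetical example DFA for L = (01)*; "d" is a dead state.
SIGMA = "01"
DELTA = {("s", "0"): "t", ("s", "1"): "d",
         ("t", "0"): "d", ("t", "1"): "s",
         ("d", "0"): "d", ("d", "1"): "d"}
START, FINALS = "s", {"s"}

def accepts(word):
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINALS

def drop_accepts(word):
    """Simulate M': a pair (q, 0) is in the first copy, (q, 1) in the second."""
    def closure(current):
        # lambda-moves: guess a dropped letter a and jump to the second copy
        jumps = {(DELTA[(q, a)], 1)
                 for (q, copy) in current if copy == 0 for a in SIGMA}
        return current | jumps
    current = closure({(START, 0)})
    for symbol in word:
        current = closure({(DELTA[(q, symbol)], copy) for (q, copy) in current})
    # Only final states of the second copy accept: exactly one letter was dropped.
    return any(copy == 1 and q in FINALS for (q, copy) in current)

def drop_brute_force(word):
    """x is in drop(L) iff inserting one letter somewhere gives a word of L."""
    return any(accepts(word[:i] + a + word[i:])
               for i in range(len(word) + 1) for a in SIGMA)

for n in range(7):
    for letters in product(SIGMA, repeat=n):
        w = "".join(letters)
        assert drop_accepts(w) == drop_brute_force(w)
```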
For example, the construction looks something like this: (figure not reproduced here)
The next couple of examples rely on the fact that {a^n b^n | n ≥ 0} is not regular. Again,
we’ll prove that fact shortly.
Example 7: Let L = {a^i b^j | i, j ≥ 0, i ≠ j}.
What is the complement of L? Your first guess is probably that the complement is
{a^i b^i}, but that isn’t right since Σ* − L contains strings like abab and baaab.
Is L regular?
Assume that L is regular. Then the complement of L is also regular. The language L(a*b*)
is regular. Since regular languages are closed under intersection, this implies that
(complement of L) ∩ L(a*b*) = {a^n b^n} is regular,
giving us a contradiction so L is not regular.
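The set equation used in Example 7 can be checked on all strings up to a small length. This is a bounded sanity check under the assumption Σ = {a, b}; the helper `in_L` is a name made up for this sketch.

```python
import re
from itertools import product

def in_L(w):
    """Membership in L = {a^i b^j | i != j}."""
    m = re.fullmatch(r"(a*)(b*)", w)
    return m is not None and len(m.group(1)) != len(m.group(2))

# abab is in the complement of L, so the complement is more than {a^n b^n}.
assert not in_L("abab")

for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        in_intersection = (not in_L(w)) and re.fullmatch(r"a*b*", w) is not None
        half = len(w) // 2
        is_anbn = (w == "a" * half + "b" * half)
        assert in_intersection == is_anbn
```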
So what have we learned here? Even though {a^n b^n} is not the complement of L, strings of
that type are in L’s complement. So, if we can use another operation under which the
regular languages are closed to “extract” strings of that form from the complement then
we will get our contradiction. Since a*b* is regular (it’s a regular expression) we can
intersect that with the complement to isolate the strings with the right pattern.
Example 8: Let L = {a^i b^j c^k | i = 0 or j = k}. That is, if there is at least one a then the
number of b’s equals the number of c’s. Consider the following homomorphism h:
h(a) = λ, h(b) = 0, h(c) = 1. Then, h(L) = {0^j 1^k}, where it is not necessarily true that
j = k. In this case, h(L) is regular, but this doesn’t give us any information about whether
or not L is regular.
Assume L is regular. Since L(ab*c*) is regular, let L′ = L ∩ L(ab*c*) = {a b^n c^n}. By closure
under intersection, L′ is regular. Then, consider the homomorphism h we tried to use
before (h(a) = λ, h(b) = 0, h(c) = 1). This time, h(L′) = {0^n 1^n} which is not regular.
Therefore, L could not have been regular either.
Example 9: (problem 4.1.16) Consider the statement: If L1 is regular and L1 ∪ L2 is
also regular then L2 is regular. Prove or disprove this.
Remember the caution paragraph above. This is an example of how you must be
careful of how you use the closure properties. Suppose L1 = Σ*. Then L1 ∪ L2 is also
Σ*, which is a regular set, but L2 could be any language such as {0^n 1^n}, so using this L1
and L2 gives us a counterexample to the claim. Since L2 can be arbitrary, if this were
true it would imply that every language was regular.
Example 10: Consider the related problem: If L1 is finite and L1 ∪ L2 is regular then L2 is
regular.
This statement is true. To see this let L3 = (L1 ∪ L2) ∩ (complement of L1). The complement
of L1 is regular since regular languages are closed under complement and every finite
language is regular. Therefore, L3 is regular since it is the intersection of two regular
languages. Since L1 ∩ L2 is finite, it too is regular. Now, we can write
L2 = L3 ∪ (L1 ∩ L2), which is a regular language since the regular languages are closed
under union. (Suggestion: convince yourself this works by using a Venn diagram.)
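As an alternative to the Venn diagram, the set identity behind Example 10 can be checked exhaustively on a small universe. This sketch only confirms the identity L2 = L3 ∪ (L1 ∩ L2) for finite sets; it says nothing about regularity by itself, and the four-element universe is an arbitrary choice.

```python
from itertools import combinations

universe = frozenset(range(4))
subsets = [frozenset(c) for r in range(5) for c in combinations(universe, r)]

def recover(l1, l2):
    """Rebuild L2 from L1 ∪ L2, the complement of L1, and L1 ∩ L2."""
    l3 = (l1 | l2) & (universe - l1)   # L3 = (L1 ∪ L2) ∩ complement of L1
    return l3 | (l1 & l2)              # L2 = L3 ∪ (L1 ∩ L2)

# Check the identity for every pair of subsets of the universe.
for l1 in subsets:
    for l2 in subsets:
        assert recover(l1, l2) == l2
```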