Properties of Regular Languages Chapter 4

advertisement
Properties of
Regular Languages
Chapter 4
Regular Languages
A language is regular iff it is accepted by some
DFA.
Example: L = {w  {a, b}* :
every a is immediately followed by a b}.
Languages: Regular or Not?
a*b* is regular.
{anbn: n  0} is not.
{w  {a, b}* : every a is immediately followed by b} is regular.
{w  {a, b}* : every a has a matching b somewhere} is not
Note: We may define a grammar for a non-regular language,
but not a regular grammar (left- or right-linear)
Closure Properties of Regular Languages
● Union
● Intersection
● Concatenation
● Difference
● Kleene
● Reverse
star
● Complement
● Letter
substitution
Showing that a Language is Regular
Theorem: Every finite language is regular.
Proof: If L is the empty set, then it is defined by the
regular expression  and so is regular. If it is any finite
language composed of the strings s1, s2, … sn for some
positive integer n, then it is defined by the regular
expression:
s1 + s2 + … + sn
So it too is regular.
Showing that a Language is Regular
Example:
Let
L = L1 ∩ L2, where:
L1 = {anbn, n  0}, and
L2 = {bnan, n  0}
L1 and L2 are not regular,
but L = λ, which is regular
More Examples
= {w  {0 - 9}*: w is the social security number of the
current US president}.
● L1
● L2
= {1 if Santa Claus exists and 0 otherwise}
● L3
= {1 if God exists and 0 otherwise}
Examples of finite languages. Even though they may be
unknown (or debatable), the languages are finite;
therefore regular.
Showing That L is Regular
1. Show that L is finite.
2. Exhibit an FSM for L.
3. Exhibit a regular expression for L.
4. Show that the number of equivalence classes of L is
finite.
5. Exhibit a regular grammar for L.
6. Exploit the closure theorems.
Identifying Nonregular Languages
Finite languages are regular, but what
about infinite languages???
Some are regular: a*b*
Some are not: L = {anbn: n  0}
Pigeonhole Principle
If n objects are put into m boxes,
and if n > m, then at least one box
must have more than 1 object in it.
Indicates
repetition
is needed
Showing that a Language is Not Regular
Every regular language can be accepted by some FSM.
It can only use a finite amount of memory to record
essential properties.
The only way to generate/accept an infinite language with
a finite description is to use:
• Kleene star (in regular expressions), or
• cycles (in automata).
This forces some kind of simple repetitive cycle within the
strings.
Example:
ab*a generates aba, abba, abbba, abbbba, etc.
How Long a String Can Be Accepted?
What is the longest string that a 3-state FA can accept?
Theorem – Long Strings
Theorem: Let M = (Q, , , q0, F) be any DFSM. If M
accepts any string of length |Q| or greater, then that string
will force M to visit some state more than once (thus
traversing at least one loop).
Proof: M must start in one of its states. Each time it reads
an input character, it visits some state. So, in processing a
string of length n, M creates a total of n + 1 state visits. If
n+1 > |Q|, then, by the pigeonhole principle, some state
must get more than one visit. So, if n  |Q|, then M must
visit at least one state more than once.
The Pumping Lemma for Regular Languages
Let L be an infinite regular language.
So, m  1 :  strings w  L, where |w|  m
 x, y, z
w = xyz,
|xy|  m,
|y|  1, and
Note:
m is the number of states in an FA
that accepts L
y represents a repetition in string w
(a loop in the FA)
i represents the number of times
through the loop
wi = xyi z is also in L, for all i = 0, 1, 2, …
The Pumping Lemma for Regular Languages
Paraphrased:
Every sufficiently long string in L can be broken into
three parts in such a way that an arbitrary number of
repetitions of the middle part yields another string in L.
We say that the middle string is “pumped”.
Example: {anbn: n  0} is not Regular
Choose w to be ambm
(We get to choose any w; m is from pumping lemma).
1
2
aaaaa…aaaaabbbb …bbbbbb
x
y
z
We show that there is no x, y, z with the required properties:
|xy|  m,
|y|  1, and
wi = xyi z is also in L, for all i = 0, 1, 2, …
Since |xy|  m, y must be in region 1. So y = ap for some p  1.
Assume that i = 2 (we pump one extra copy of y), this means:
am+pbm
which  L, since it has more a’s than b’s.
Using the Pumping Theorem Effectively
To choose w (a string in the language L):
• Choose a w with as homogeneous as possible in
initial region of length at least m.
• Choose a w that is only barely in L.
To choose i:
• Try letting i be either 0 or 2.
• If that doesn’t work, analyze L to see if there is
some other specific value that will work.
PalEven = {ssR : s  {a, b}*}
1. Let w = ambmbmam
Therefore, |w| = 4m, |w|  m.
2. w = xyz, where |xy|  m and and |y|  1; therefore y
must occur in the first m characters, so y = ap, p > 0.
3. Let i = 2 (make an extra copy of y). This means that
the first half of the string has more a’s than the
second half of the string and is not in PalEven.
Therefore, by contradiction of the pumping lemma,
PalEven is not a regular language.
{axby: x < y}
L = {axby: x < y}
Assume L is regular and m = # of states in the FA that
accepts L:
1. Let w = ambm+1
Therefore, |w| = 2m + 1, |w|  m.
2. w = xyz, where |xy|  m and and |y|  1; therefore y
must occur in the first m characters, so y = ap, p > 0.
3. Let i = 2 (make an extra copy of y). This means that
the resulting string is am+p bm+1
Therefore, by contradiction of the pumping lemma,
PalEven is not a regular language.
{axby: x  y}
{axby: x  y}
Assume L is regular and m = # of states in the FA that
accepts L:
1. Let w = ambm
Therefore, |w| = 2m, |w|  m.
2. w = xyz, where |xy|  m and and |y|  1; therefore y
must occur in the first m characters, so y = ap, p > 0.
3. Let i = 0 (pump out y). This means that the resulting
string is am-p bm. Since p > 0, there are now fewer
a’s than b’s.
Therefore, by contradiction of the pumping lemma,
PalEven is not a regular language.
Poetry
The Pumping Lemma
Any regular language L has a magic number p
And any long-enough word in L has the following property:
Amongst its first p symbols is a segment you can find
Whose repetition or omission leaves x amongst its kind.
So if you find a language L which fails this acid test,
And some long word you pump becomes distinct from all the
rest,
By contradiction you have shown that language L is not
A regular guy, resiliant to the damage you have wrought.
But if, upon the other hand, x stays within its L,
Then either L is regular, or else you chose not well.
For w is xyz, and y cannot be null,
And y must come before p symbols have been read in full.
As mathematical postscript, an addendum to the wise:
The basic proof we outlined here does certainly generalize.
So there is a pumping lemma for all languages context-free,
Although we do not have the same for those that are r.e.
-- Martin Cohn
Download