Regular Expression

Lesson 6


Regular Expressions
Regular expressions are useful for specifying the language that a finite automaton
recognizes.
They are also employed to specify the syntax of tokens in a programming language when
building a compiler, or to validate the syntax of responses on web application forms.
Definition:
A regular expression over an alphabet Σ:
1. Ø, ε are regular
2. a is regular ∀ a ∈ Σ.
3. Given that α, β are regular expressions, then so are:
a) (α ◦ β)
b) (α + β)
c) (α*), (β*)
4. An expression is regular iff it can be formed by a finite number of applications
of rules 1 to 3.
Regular Expressions and Regular Sets
Regular expression: an expression built up from the previous rules.
Regular set: the set of strings that the expression denotes.
Let Σ = {0,1}

Regular expression      Regular set
00                      {00}
(0 + 1)*                {ε, 0, 1, 00, 01, 10, 11, 000, …, 111, …}
(0 + 1)*00(0 + 1)*      The set of binary strings containing ‘00’:
                        {00, 000, 100, 001, 1001, 1100, 10100, …}
(0 + 1)*011             The set of binary strings ending with ‘011’:
                        {011, 0011, 1011, 01011, 10011, …}
Now, let Σ = {0, 1, 2}

0*1*2*                  The set of strings consisting of an arbitrary number of 0’s,
                        followed by an arbitrary number of 1’s, and finally by an
                        arbitrary number of 2’s:
                        {ε, 0, 1, 2, 011, 1112, 011222, …}
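The correspondence between an expression and its regular set can be explored mechanically. The following is a minimal sketch, not part of the original notes: it assumes a hypothetical tuple encoding of regular expressions and lists only the strings up to a chosen length, since most regular sets are infinite.

```python
# Regular expressions as nested tuples (an encoding assumed for this sketch):
#   ("empty",)      Ø          ("eps",)        ε
#   ("sym", "a")    a          ("cat", r, s)   (r ◦ s)
#   ("alt", r, s)   (r + s)    ("star", r)     (r*)

def regular_set(expr, max_len):
    """Return the strings of length <= max_len in the set denoted by expr."""
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {expr[1]} if max_len >= 1 else set()
    if kind == "alt":
        return regular_set(expr[1], max_len) | regular_set(expr[2], max_len)
    if kind == "cat":
        left = regular_set(expr[1], max_len)
        right = regular_set(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        # Repeatedly append non-empty strings of the base set until nothing new fits.
        base = regular_set(expr[1], max_len) - {""}
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")

zero, one = ("sym", "0"), ("sym", "1")
any_bits = ("star", ("alt", zero, one))                         # (0 + 1)*
contains_00 = ("cat", any_bits, ("cat", ("cat", zero, zero), any_bits))

print(sorted(regular_set(any_bits, 2), key=lambda s: (len(s), s)))
print(sorted(regular_set(contains_00, 3), key=lambda s: (len(s), s)))
```

The empty Python string plays the role of ε. Bounding the length is a design choice of the sketch; it keeps every set finite so that star can be computed by simple iteration.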
L(E) denotes the regular set denoted by the expression E.
Consider the language L(E) = {$0^n1^n2^n$ | n ≥ 0}.
Note that L(E) ⊆ L(0*1*2*),
i.e. L(E) = {ε, 012, 001122, …}; observe that here the numbers of 0’s, 1’s, and 2’s must be equal.
In fact L(E) ⊂ L(0*1*2*), since a string such as 011 belongs to L(0*1*2*) but not to L(E).
A finite automaton M can be built such that L(M) = L(0*1*2*). We provide an ε-nfa.
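[Figure: ε-nfa with states q0, q1, q2; q0 loops on 0, q1 loops on 1, q2 loops on 2, with ε-transitions from q0 to q1 and from q1 to q2.]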
We note that no such fa can be constructed for L(E).
We may thereby conclude that {$0^n1^n2^n$ | n ≥ 0} is not a regular language (a fact we will prove in
due time).
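An ε-nfa for 0*1*2* can be simulated in a few lines. The sketch below assumes three states q0, q1, q2, each looping on one digit and chained by ε-moves, with q0 as the start state and q2 as the only accepting state; the table and function names are illustrative, not taken from the notes.

```python
# Transition table of an ε-nfa for 0*1*2*; "" stands for an ε-move.
# Assumption of this sketch: q0 is the start state and q2 the accepting state.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", ""): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", ""): {"q2"},
    ("q2", "2"): {"q2"},
}
START, ACCEPT = "q0", {"q2"}

def eps_closure(states):
    """All states reachable from `states` using ε-moves only."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, ""), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def accepts(word):
    """Track the set of possible current states while reading `word`."""
    current = eps_closure({START})
    for ch in word:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)

# Every string 0^n 1^n 2^n is accepted, but so is 011, which has unequal counts.
print(all(accepts("0" * n + "1" * n + "2" * n) for n in range(5)))
print(accepts("011"), accepts("10"))
```

The machine has no way to compare the three counts, which is the informal reason the language of equal counts needs more than a finite automaton.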
Problem:
Find a regular expression for the set of all strings over {b,c} containing an even
number of b’s.
Why is the answer not (bb)+?
Well, of course, zero is an even number, yet (bb)+ = {bb, bbbb, $b^6$, …}
We try (bb)* …
But no c’s are permitted by this expression …
O.K … how about c*(bb)* c* ?
Why must the b’s be contiguous (next to each other) ?
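And finally … E = c*(b c*b c*)*
(The c* after the second b must sit inside the starred group: otherwise c’s could not separate successive pairs of b’s, and a string such as bbcbb would be missed.)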
The following ε-nfa M has L(M) = L(E).
[Figure: ε-nfa M with states q0 through q4 and transitions labeled b, c, and ε.]
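Candidate answers to this problem can be checked by brute force against the definition "even number of b's". The sketch below uses Python's `re` module, where concatenation and `*` are written as in the notes. The candidate list is an assumption of this sketch: the first two are the attempts discussed above, and the last two differ only in whether a c* follows the second b inside the starred group.

```python
import re
from itertools import product

def first_counterexample(pattern, max_len=8):
    """Return the first string over {b, c} (shortest first) on which `pattern`
    disagrees with 'even number of b's', or None if none exists up to max_len."""
    compiled = re.compile(pattern)
    for n in range(max_len + 1):
        for letters in product("bc", repeat=n):
            word = "".join(letters)
            wanted = word.count("b") % 2 == 0
            if (compiled.fullmatch(word) is not None) != wanted:
                return word
    return None

for candidate in ["(bb)*", "c*(bb)*c*", "c*(bc*b)*c*", "c*(bc*bc*)*"]:
    print(candidate, "->", first_counterexample(candidate))
```

A result of `None` only means no disagreement was found up to `max_len`; it is evidence rather than a proof that the expression is correct.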

Regular Expressions and Finite Automata
We wish to prove:
Regular Expressions ≡ Finite Automata
i.e. the class of languages that regular expressions can denote is the same as the class of
languages that finite automata can recognize.
We have seen several proofs of this ilk; recall their form.
I. Given an arbitrary regular expression E, we must be able to construct an fa M, such that
L(M) = L(E).
II. Given an arbitrary fa M, we must be able to construct a regular expression E, such that
L(E) = L(M).
I. We begin with our basis machines:
Regular expression E        Corresponding fa M
1. Ø, ε                     [Figure: basis machines for Ø and for ε.]
2. a                        [Figure: machine with an a-transition from q0 to qf.]
Next, the inductive step is considered:
Suppose that α and β are arbitrary regular expressions. Then we may assume that $M_\alpha$ and
$M_\beta$, machines to recognize α and β respectively, exist.
3.
(a)
Then a machine for (α ◦ β) may be constructed as follows:
[Figure: $M_\alpha$ followed by an ε-transition into $M_\beta$, together forming $M_{\alpha \circ \beta}$.]
The start state of $M_\alpha$ is the new start state. The accept state of $M_\beta$ is this
composite machine’s accept state. And we have an ε-transition from $M_\alpha$’s accept
state to the start state of $M_\beta$.
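(b)
Constructing a machine for (α + β):
[Figure: $M_\alpha$ and $M_\beta$ placed in parallel and joined by four ε-transitions to form $M_{\alpha + \beta}$.]
(c)
And finally, a machine for (α)*
[Figure: $M_\alpha$ with four ε-transitions added to form $M_{\alpha^*}$.]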
We have shown that given an arbitrary regular expression, an equivalent finite automaton can be
constructed. Hence, we now have:
Regular expression ⇒ finite automaton
Before completing our proof, let’s take an example to illustrate the aforementioned
constructions.
Example
Build a finite automaton for the regular expression (01 + 10)*
We begin with a machine for each of the regular expressions: 0 and 1.
$M_0$: q0 with a 0-transition to q2
and $M_1$: q0 with a 1-transition to q2
Next, we employ construction 3(a) to build machines for regular expressions 01 and 10.
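[Figure: $M_{01}$, formed from $M_0$ followed by an ε-transition into $M_1$.]
[Figure: $M_{10}$, formed from $M_1$ followed by an ε-transition into $M_0$.]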
Using construction 3(b) we obtain a finite automaton for (01 + 10)
[Figure: $M_{(01+10)}$, with $M_{01}$ and $M_{10}$ in parallel, joined by ε-transitions.]
And finally rule 3(c) yields a machine for (01 + 10)*
[Figure: $M_{(01+10)^*}$, consisting of $M_{(01+10)}$ with the ε-transitions of construction 3(c) added.]
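The constructions 3(a), 3(b), 3(c) and this example can be sketched in code. The following is an illustrative sketch, not the notes' own implementation: all function names are hypothetical, machines are (start, accept) pairs over one shared edge table, and the ε-wiring for union and star follows the common textbook arrangement, which may differ in detail from the figures.

```python
from itertools import count

fresh = count()   # supplies unused state numbers
edges = {}        # (state, symbol) -> set of next states; "" is an ε-move

def link(src, sym, dst):
    edges.setdefault((src, sym), set()).add(dst)

def empty():                      # basis: Ø, accept state unreachable
    return next(fresh), next(fresh)

def epsilon():                    # basis: ε
    s, f = next(fresh), next(fresh)
    link(s, "", f)
    return s, f

def symbol(a):                    # basis: a single symbol a
    s, f = next(fresh), next(fresh)
    link(s, a, f)
    return s, f

def concat(m1, m2):               # 3(a): ε from m1's accept to m2's start
    link(m1[1], "", m2[0])
    return m1[0], m2[1]

def union(m1, m2):                # 3(b): new start and accept, four ε-moves
    s, f = next(fresh), next(fresh)
    for m in (m1, m2):
        link(s, "", m[0])
        link(m[1], "", f)
    return s, f

def star(m):                      # 3(c): allow zero or repeated passes
    s, f = next(fresh), next(fresh)
    link(s, "", m[0])
    link(m[1], "", f)
    link(s, "", f)                # zero repetitions
    link(m[1], "", m[0])          # go around again
    return s, f

def closure(states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in edges.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(machine, word):
    start, accept = machine
    current = closure({start})
    for ch in word:
        step = set()
        for q in current:
            step |= edges.get((q, ch), set())
        current = closure(step)
    return accept in current

# (01 + 10)*: each call to symbol() builds a fresh copy of the basis machine.
m = star(union(concat(symbol("0"), symbol("1")),
               concat(symbol("1"), symbol("0"))))
print([w for w in ["", "01", "1001", "0110", "0", "00", "011"] if accepts(m, w)])
```

Because all machines share one edge table, a sub-machine should be used only once; building a new copy for each occurrence, as done for 0 and 1 here, keeps the sketch short.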
Part II of our proof that regular expressions ≡ fa.
Wlog we may assume that our finite automaton is deterministic. … //Why?
The algorithmic paradigm employed is dynamic programming. To solve a problem P we will first
solve all smaller problems. (Contrast this with the divide and conquer paradigm as employed
in binary search, quick sort, merge sort wherein only some smaller problems are first solved).
An example
[Figure: dfa M with states q1 and q2; q1 loops on 0 and moves to q2 on 1; q2 loops on 1 and moves to q1 on 0.]
We desire a regular expression E such that L(E) = L(M).
First, some notation:
We let $R^{k}_{ij}$ stand for the set of all strings that take our machine from state i to state j never
passing through a state numbered higher than k.
Note, that to pass through a state means to enter and then leave that state (and not to leave
and then enter!).
$R^{k}_{ij}$ is defined recursively:
$R^{k}_{ij} = R^{k-1}_{ik} (R^{k-1}_{kk})^{*} R^{k-1}_{kj} + R^{k-1}_{ij}$
The basis steps are:
$R^{0}_{ij} = a$ if $\delta(q_i, a) = q_j$ with $i \neq j$, and
$R^{0}_{ij} = a + \varepsilon$ if $\delta(q_i, a) = q_j$ with $i = j$
In our example L(M) = $R^{2}_{12}$ //why?
[Figure: the dfa M, repeated for reference.]
We start at the beginning:
$R^{0}_{11} = 0 + \varepsilon$
$R^{0}_{22} = 1 + \varepsilon$
$R^{0}_{12} = 1$
$R^{0}_{21} = 0$
We use this to fill in the following table:
         $R^{k}_{11}$    $R^{k}_{22}$    $R^{k}_{12}$    $R^{k}_{21}$
k = 0    0 + ε           1 + ε           1               0
k = 1    ?               ?               ?               ?
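$R^{1}_{11} = R^{0}_{11} (R^{0}_{11})^{*} R^{0}_{11} + R^{0}_{11}$
$= (0 + \varepsilon)(0 + \varepsilon)^{*}(0 + \varepsilon) + (0 + \varepsilon)$
$= 0^{*}$

$R^{1}_{22} = R^{0}_{21} (R^{0}_{11})^{*} R^{0}_{12} + R^{0}_{22}$
$= 0 (0 + \varepsilon)^{*} 1 + (1 + \varepsilon)$
$= 00^{*}1 + (1 + \varepsilon)$
$= 0^{*}1 + \varepsilon$

$R^{1}_{12} = R^{0}_{11} (R^{0}_{11})^{*} R^{0}_{12} + R^{0}_{12}$
$= (0 + \varepsilon)(0 + \varepsilon)^{*} 1 + 1$
$= 00^{*}1 + 1$
$= 0^{*}1$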
$R^{1}_{21} = R^{0}_{21} (R^{0}_{11})^{*} R^{0}_{11} + R^{0}_{21}$
$= 0 (0 + \varepsilon)^{*}(0 + \varepsilon) + 0$
$= 00^{*}(0) + 0$
$= 00^{*}$
So we have:
         $R^{k}_{11}$    $R^{k}_{22}$    $R^{k}_{12}$    $R^{k}_{21}$
k = 0    0 + ε           1 + ε           1               0
k = 1    0*              0*1 + ε         0*1             00*
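And L(M) =
$R^{2}_{12} = R^{1}_{12} (R^{1}_{22})^{*} R^{1}_{22} + R^{1}_{12}$
$= 0^{*}1 (0^{*}1 + \varepsilon)^{*} (0^{*}1 + \varepsilon) + 0^{*}1$
$= 0^{*}1 (0^{*}1 + \varepsilon)^{*}$
$= 0^{*}1 (0^{*}1)^{*}$ or $(0^{*}1)^{+}$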
Hence, any language that can be recognized by a finite automaton can be denoted by a regular
expression and vice versa.
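The table-filling computation above can be mechanized. The sketch below applies the same recursion to the two-state example (q1 looping on 0 and moving to q2 on 1, q2 looping on 1 and moving to q1 on 0) and emits the result in Python `re` syntax, where `|` plays the role of + and an empty alternative stands for ε. The function names, the use of `None` for Ø, and the brute-force comparison are assumptions of this sketch, and no algebraic simplification is attempted.

```python
import re
from itertools import product

# The example dfa: q1 is the start state and q2 the accepting state.
DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 1, (2, "1"): 2}
STATES, START, FINAL = [1, 2], 1, 2

def group(options):
    """Join alternatives into one non-capturing group; None encodes Ø."""
    return "(?:" + "|".join(options) + ")" if options else None

def kleene(delta, states):
    """Return {(i, j): regex} after all states are allowed as intermediates.
    Assumes states are listed in numbering order and symbols are plain characters."""
    # Basis (k = 0): symbols leading directly from i to j, plus ε when i = j.
    R = {}
    for i in states:
        for j in states:
            symbols = [a for (p, a), q in delta.items() if p == i and q == j]
            R[(i, j)] = group(symbols + ([""] if i == j else []))
    # Inductive step: additionally allow state k as an intermediate state.
    for k in states:
        new = {}
        for i in states:
            for j in states:
                through_k = None
                if R[(i, k)] is not None and R[(k, j)] is not None:
                    through_k = f"{R[(i, k)]}{R[(k, k)]}*{R[(k, j)]}"
                options = [r for r in (through_k, R[(i, j)]) if r is not None]
                new[(i, j)] = group(options)
        R = new
    return R

def dfa_accepts(word):
    state = START
    for ch in word:
        state = DELTA[(state, ch)]
    return state == FINAL

def agrees(pattern, max_len=6):
    """Compare `pattern` with the dfa on every binary string up to max_len."""
    compiled = re.compile(pattern)
    return all(
        (compiled.fullmatch("".join(w)) is not None) == dfa_accepts("".join(w))
        for n in range(max_len + 1)
        for w in product("01", repeat=n)
    )

derived = kleene(DELTA, STATES)[(START, FINAL)]
print(agrees(derived), agrees("(?:0*1)+"))
```

The derived string is long and unsimplified, so the sketch checks it against the dfa instead of printing it; the hand-simplified form (0*1)+ is checked the same way.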
Practice Problem:
Give a regular expression E that expresses the set of strings recognized by
the following dfa:
[Figure: dfa with start state q1 and further states q2 and q3; transitions labeled 1, 0, 0, 1, and 0,1.]