Uploaded by Shahin Shuvo

Lecture4

CSE 3204: Formal Language,
Automata and Computability
Regular Expressions
Reading: Chapter 3
RE’s: Introduction



Regular expressions are an algebraic way to describe
languages.
They describe exactly the regular languages.
If E is a regular expression, then L(E) is the language it
defines.
Regular Expressions: Language
• The set of strings accepted by a finite automaton is referred to
as the language accepted by the finite automaton.
• For each finite automaton there is a regular expression that
defines the same language.
Regular Expressions: Language

Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.

Note: {a} is the language containing one string, and that
string is of length 1.

Basis 2: ε is a RE, and L(ε) = {ε}.

Basis 3: ∅ is a RE, and L(∅) = ∅.
Regular Expressions: Language


Induction 1: If E1 and E2 are regular expressions, then E1+E2
is a regular expression, and L(E1+E2) = L(E1) ∪ L(E2).
Induction 2: If E1 and E2 are regular expressions, then E1E2
is a regular expression, and L(E1E2) = L(E1)L(E2).
Concatenation : the set of strings wx such that w
is in L(E1) and x is in L(E2).
Regular Expressions: Language

Induction 3: If E is a RE, then E* is a RE, and L(E*) =
(L(E))*.
Closure, or “Kleene closure” = set of strings
w1w2…wn, for some n ≥ 0, where each wi is
in L(E).
Note: when n=0, the string is ε.
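The basis and induction rules above can be sketched as a small recursive evaluator. This is only an illustration: the tuple encoding of expressions and the function name lang are assumptions made for this sketch, and since L(E*) is usually infinite the function returns only the strings of length at most max_len.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("sym", "a"), ("eps",), ("empty",),
#   ("union", E1, E2), ("concat", E1, E2), ("star", E)

def lang(expr, max_len):
    """Return the strings of L(expr) whose length is at most max_len."""
    kind = expr[0]
    if kind == "sym":                      # Basis 1: L(a) = {a}
        return {expr[1]} if max_len >= 1 else set()
    if kind == "eps":                      # Basis 2: L(ε) = {ε}
        return {""}
    if kind == "empty":                    # Basis 3: L(∅) = ∅
        return set()
    if kind == "union":                    # Induction 1: union
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "concat":                   # Induction 2: concatenation
        left = lang(expr[1], max_len)
        right = lang(expr[2], max_len)
        return {w + x for w in left for x in right if len(w + x) <= max_len}
    if kind == "star":                     # Induction 3: Kleene closure
        base = lang(expr[1], max_len)
        result = {""}                      # the n = 0 case contributes ε
        frontier = {""}
        while frontier:
            longer = {w + x for w in frontier for x in base
                      if len(w + x) <= max_len}
            frontier = longer - result     # keep only strings not seen yet
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")

zero_star = ("star", ("sym", "0"))
print(sorted(lang(zero_star, 3)))          # ['', '0', '00', '000']
```

The length bound is a design choice that keeps the closure finite; it does not change which strings belong to the language.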
Identities and Annihilators



∅ is the identity for +.

R + ∅ = R.
ε is the identity for concatenation.

εR = Rε = R.
∅ is the annihilator for concatenation.
∅R = R∅ = ∅.
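These laws can be checked at the level of languages, treating a finite language as a Python set of strings. The sample language R and the helper names below are assumptions chosen for illustration only.

```python
def concat(A, B):
    """Concatenation of two finite languages."""
    return {w + x for w in A for x in B}

EMPTY = set()      # the language ∅
EPS = {""}         # the language {ε}

def check_identities(R):
    """Return one boolean per law from the slide, in order."""
    return (
        (R | EMPTY) == R,                          # R + ∅ = R
        concat(EPS, R) == R == concat(R, EPS),     # εR = Rε = R
        concat(EMPTY, R) == EMPTY == concat(R, EMPTY),  # ∅R = R∅ = ∅
    )

print(check_identities({"0", "01"}))   # (True, True, True)
```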

Examples: RE’s

L(01) = {01}.

L(01+0) = {01, 0}.


L(0(1+0)) = {01, 00}.

Note order of precedence of operators.
L(0*) = {ε, 0, 00, 000,… }.
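The same examples can be tried with Python's re module, as a rough sketch. Two things differ from the slide notation: union is written | instead of +, and re.fullmatch is used so that the whole string must match. The helper name matches is an assumption of this sketch.

```python
import re
from itertools import product

def matches(pattern, alphabet, max_len):
    """List the strings over alphabet, up to max_len, matched in full by pattern."""
    found = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found

print(matches("01|0", "01", 3))     # ['0', '01']   union binds last
print(matches("0(1|0)", "01", 3))   # ['00', '01']  parentheses change grouping
print(matches("0*", "01", 3))       # ['', '0', '00', '000']
```

The first two lines illustrate the precedence note: star binds tightest, then concatenation, then union.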
Examples
• The language defined by the expression
ab*a
• is the set of all strings of a’s and b’s that have at least two
letters, that begin and end with a’s, and that have nothing but
b’s inside (if anything at all).
• Language(ab*a) = {aa, aba, abba, abbba, …}
Examples
• a*b*
• Language(a*b*) = {ε, a, b, aa, ab, bb, aaa, aab, …}
Note: a*b* ≠ (ab)*
• L2 = {x^odd}, the strings of x’s of odd length,
can be defined as x(xx)* or (xx)*x
• 01* + 10*
• Denotes the language consisting of all strings that are
either a single 0 followed by any number of 1’s or a
single 1 followed by any number of 0’s.
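A quick way to convince oneself of the claims on this slide is to test a few strings with Python's re module. This is a spot check, not a proof, and the helper name in_lang is an assumption of this sketch.

```python
import re

def in_lang(pattern, s):
    """True if the whole string s is matched by pattern."""
    return re.fullmatch(pattern, s) is not None

# a*b* and (ab)* are different languages: each has a string the other lacks.
print(in_lang("a*b*", "aab"), in_lang("(ab)*", "aab"))     # True False
print(in_lang("a*b*", "abab"), in_lang("(ab)*", "abab"))   # False True

# x(xx)* and (xx)*x both accept exactly the odd-length strings of x's
# (checked here for lengths 0 through 9 only).
for n in range(10):
    s = "x" * n
    assert in_lang("x(xx)*", s) == (n % 2 == 1)
    assert in_lang("(xx)*x", s) == (n % 2 == 1)
```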
Languages and Regular Expression
S.No.   Language                 Regular Expression
1       {ε}                      ε
2       {0}                      0
3       {001} i.e. {0}{0}{1}     001
Examples
• Write a regular expression for the following language:
The set of strings over alphabet {a, b, c} containing at least one
a and at least one b.
Ans: The simplest approach is to consider those strings in
which the first a precedes the first b separately from those
where the opposite occurs. The expression:
c*a(a+c)*b(a+b+c)* + c*b(b+c)*a(a+b+c)*
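One way to gain confidence in an answer like this is to compare it, on every short string, against a direct statement of the property. The sketch below does that with Python's re module (union written as |); the function names and the length bound 6 are arbitrary choices for illustration.

```python
import re
from itertools import product

# The answer from the slide, with "+" rewritten as "|".
PATTERN = re.compile(r"c*a(a|c)*b(a|b|c)*|c*b(b|c)*a(a|b|c)*")

def has_a_and_b(s):
    """Direct statement of the property: at least one a and at least one b."""
    return "a" in s and "b" in s

def disagreements(max_len):
    """Strings over {a, b, c}, up to max_len, where regex and property differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if (PATTERN.fullmatch(s) is not None) != has_a_and_b(s):
                bad.append(s)
    return bad

print(disagreements(6))   # []
```

An empty list only shows agreement up to the chosen length; the case split in the answer is what justifies the expression in general.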
Examples
• The language of all words that have at least two a’s can be
described by the expression
(a + b)*a (a + b)*a (a + b)*
(some beginning) (the 1st important a) (some middle) (the 2nd
important a) (some end)
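The five-part reading of the expression can be made concrete with named groups in Python's re module. In this sketch the lazy quantifier *? is an added assumption: it forces the two "important" a's to be the first and second a of the word, which the slide's expression does not require. The group and function names are hypothetical.

```python
import re

# (some beginning) a (some middle) a (some end), over the alphabet {a, b}.
TWO_AS = re.compile(r"(?P<begin>[ab]*?)a(?P<middle>[ab]*?)a(?P<end>[ab]*)")

def split_on_two_as(word):
    """Return (beginning, middle, end) if word has at least two a's, else None."""
    m = TWO_AS.fullmatch(word)
    if m is None:
        return None
    return m.group("begin"), m.group("middle"), m.group("end")

print(split_on_two_as("babab"))   # ('b', 'b', 'b')
print(split_on_two_as("abba"))    # ('', 'bb', '')
print(split_on_two_as("bab"))     # None
```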
Equivalence of RE’s and Automata


We need to show that for every RE, there is an automaton that
accepts the same language.

Pick the most powerful automaton type: the ε-NFA.
And we need to show that for every automaton, there is a RE
defining its language.

Pick the most restrictive type: the DFA.
Converting a RE to an ε-NFA


Proof is an induction on the number of operators (+,
concatenation, *) in the RE.
We always construct an automaton of a special form (next
slide).
Equivalence of FA’s and regex’s
• We have already shown that DFA’s, NFA’s, and ε-NFA’s
are all equivalent.
• To show FA’s equivalent to regex’s we need to
establish that
1. For every DFA A we can find (construct, in this case) a regex R,
such that L(R) = L(A).
2. For every regex R there is an ε-NFA A, such that L(A) = L(R).
Simplification Rules
• We will be needing the following simplification rules:
• (ε + R)* = R*
• R + RS* = RS*
• ∅R = R∅ = ∅ (Annihilation)
• ∅ + R = R + ∅ = R (Identity)
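The rules can be applied mechanically. Below is a minimal sketch of a bottom-up simplifier over a hypothetical tuple encoding of expressions; besides the four rules listed here it also uses εR = Rε = R and ∅* = ε, which appear elsewhere in this lecture. It is not a complete simplifier, only an illustration of rule application.

```python
# Hypothetical encoding: ("sym", "a"), ("eps",), ("empty",),
# ("union", E1, E2), ("concat", E1, E2), ("star", E)
EMPTY, EPS = ("empty",), ("eps",)

def simplify(e):
    """Simplify sub-expressions first, then apply one rule at the top."""
    kind = e[0]
    if kind == "union":
        a, b = simplify(e[1]), simplify(e[2])
        if a == EMPTY:
            return b                         # ∅ + R = R
        if b == EMPTY:
            return a                         # R + ∅ = R
        if b[0] == "concat" and b[1] == a and b[2][0] == "star":
            return b                         # R + RS* = RS*
        return ("union", a, b)
    if kind == "concat":
        a, b = simplify(e[1]), simplify(e[2])
        if EMPTY in (a, b):
            return EMPTY                     # ∅R = R∅ = ∅
        if a == EPS:
            return b                         # εR = R
        if b == EPS:
            return a                         # Rε = R
        return ("concat", a, b)
    if kind == "star":
        a = simplify(e[1])
        if a in (EMPTY, EPS):
            return EPS                       # ∅* = ε and ε* = ε
        if a[0] == "union" and a[1] == EPS:
            return ("star", a[2])            # (ε + R)* = R*
        return ("star", a)
    return e                                 # symbols, ε and ∅ are already simple

zero, one = ("sym", "0"), ("sym", "1")
# ∅ + 1∅*(0 + 1) simplifies to 1(0 + 1)
expr = ("union", EMPTY,
        ("concat", one, ("concat", ("star", EMPTY), ("union", zero, one))))
print(simplify(expr))
```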
Convert DFA to regex
L(A) = {x0y | x ∈ {1}* and y ∈ {0,1}*}
Convert DFA to regex (con’t)
Observations
• There are n^3 expressions for an n-state
automaton
• We need a more efficient approach:
• The State Elimination Technique
The State Elimination Technique
• When state s is eliminated, all the
paths that went through s no longer
exist in the automaton.
• So as not to change the language of the
automaton, add an arc directly from each
predecessor q of s to each successor p of s.
• How to label that arc? Use a Regular
Expression.
• The language of the automaton is
the union over all paths from the
start state to an accepting state of
the language formed by
concatenating the REs along the
path.
The State Elimination Technique
• What happens when we eliminate
state s?
• For each accepting state q, eliminate
from the original automaton all
states except q0 and q.
• To compensate, we introduce, for
each predecessor qi of s and each
successor pj of s, a RE that
represents all the paths that start at
qi, go through s, and finally go to pj.
• The expression for these paths is
QiS*Pj.
• Add this expression (by union with
the existing label Rij, if any) to the arc from
qi to pj.
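The elimination step can be sketched in a few lines. In this sketch arcs are stored in a dictionary mapping a pair (source, target) to a regex string in the slide notation, with a missing entry standing for ∅; all helper names are assumptions. The sample arcs are reconstructed from the labels quoted in Example 3.6 of this lecture (loop 0+1 on A, 1 from A to B, 0+1 from B to C and from C to D), since the figure itself is not reproduced here.

```python
def has_top_level_plus(r):
    """True if r contains a '+' outside all parentheses."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False

def union(r, s):
    if r is None:                      # ∅ + S = S
        return s
    if s is None:                      # R + ∅ = R
        return r
    return f"{r}+{s}"

def concat(r, s):
    if r is None or s is None:         # ∅ annihilates concatenation
        return None
    if r == "ε":
        return s
    if s == "ε":
        return r
    wrap = lambda x: f"({x})" if has_top_level_plus(x) else x
    return wrap(r) + wrap(s)

def star(r):
    if r is None or r == "ε":          # ∅* = ε* = ε
        return "ε"
    return r + "*" if len(r) == 1 else f"({r})*"

def eliminate(arcs, s):
    """Remove state s, adding Qi S* Pj to the arc from each qi to each pj."""
    loop = star(arcs.get((s, s)))
    preds = [(q, r) for (q, p), r in arcs.items() if p == s and q != s]
    succs = [(p, r) for (q, p), r in arcs.items() if q == s and p != s]
    new_arcs = {k: r for k, r in arcs.items() if s not in k}
    for q, q_label in preds:
        for p, p_label in succs:
            path = concat(concat(q_label, loop), p_label)
            new_arcs[(q, p)] = union(new_arcs.get((q, p)), path)
    return new_arcs

arcs = {("A", "A"): "0+1", ("A", "B"): "1", ("B", "C"): "0+1", ("C", "D"): "0+1"}
without_b = eliminate(arcs, "B")
print(without_b[("A", "C")])                    # 1(0+1)
print(eliminate(without_b, "C")[("A", "D")])    # 1(0+1)(0+1)
```

The string helpers fold in the ∅ and ε simplifications so that the labels stay readable; a sketch that kept every ∅ explicit would produce longer but equivalent expressions.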
Constructing a RE from a FA
1. For each accepting state q, apply the previous reduction process
to produce an equivalent automaton with RE labels on the arcs.
Eliminate all states except q and the start state q0.
2. If q ≠ q0, then we shall be left with a two-state automaton that
looks like:
[figure: loop R on q0, arc S from q0 to q, loop U on q, arc T from q back to q0]
The RE for the accepted strings can be described as (R+SU*T)*SU*
The Strategy for Constructing a RE
from a FA
3. If the start state is also an accepting state, then we are left with
a one-state automaton that looks like:
[figure: a single state with loop R]
The RE denoting the strings that it accepts is R*.
4. The desired RE is the sum (union) of all the expressions derived
from the reduced automata for each accepting state, by rules
(2) and (3).
Example 3.6
• First step is to convert it to an automaton with regular
expression labels.
Example (con’t)
• Let’s eliminate state B.
• State B has one predecessor, A, and one successor, C. Thus:
• Q1 = 1, P1 = 0 + 1, R11 = ∅ (since the arc from A to C does not exist) and S = ∅ (because there
is no loop at state B).
• The resultant expression is ∅ + 1∅*(0 + 1).
• To simplify:
• The initial ∅ may be ignored in a union.
• L(∅*) = {ε} ∪ L(∅) ∪ L(∅)L(∅) ∪ … = {ε}, so ∅* = ε.
• Thus ∅ + 1∅*(0 + 1) is equivalent to 1(0 + 1).
Example (con’t)
• Let’s eliminate state C and obtain the automaton AD, with states A and D.
• The mechanics are similar to those performed to eliminate state B, and the
resulting automaton is shown as follows:
• The REs are R = 0 + 1, S = 1(0 + 1)(0 + 1), T = ∅, U = ∅.
• The generic expression (R + SU*T)*SU* thus simplifies in this case to R*S,
or (0 + 1)*1(0 + 1)(0 + 1).
Example (con’t)
• We can instead eliminate D to obtain the automaton AC,
• with regex (0 + 1)*1(0 + 1)
• The final expression is the sum of the previous two regex’s:
• (0 + 1)*1(0 + 1)(0 + 1) + (0 + 1)*1(0 + 1)
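As a sanity check, the final expression can be compared against a direct simulation of the automaton on all short binary strings. The transition table below is an assumption reconstructed from the arc labels quoted in this example (loop 0+1 on A, 1 from A to B, 0+1 from B to C and from C to D, accepting states C and D); the figure itself is not reproduced in these notes. The regex uses | for union, as Python's re module requires.

```python
import re
from itertools import product

# Reconstructed transitions of the four-state automaton (an assumption).
DELTA = {
    ("A", "0"): {"A"}, ("A", "1"): {"A", "B"},
    ("B", "0"): {"C"}, ("B", "1"): {"C"},
    ("C", "0"): {"D"}, ("C", "1"): {"D"},
}
ACCEPTING = {"C", "D"}
FINAL_RE = re.compile(r"(0|1)*1(0|1)(0|1)|(0|1)*1(0|1)")

def nfa_accepts(w):
    """Simulate the automaton on w by tracking the set of reachable states."""
    current = {"A"}
    for ch in w:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return bool(current & ACCEPTING)

def mismatches(max_len):
    """Binary strings up to max_len on which the regex and automaton disagree."""
    bad = []
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if nfa_accepts(w) != (FINAL_RE.fullmatch(w) is not None):
                bad.append(w)
    return bad

print(mismatches(8))   # []
```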
From regex’s to ε-NFA’s
• Theorem 3.7: For every regex R we
can construct an ε-NFA A, s.t. L(A) =
L(R).
• Proof: By structural induction:
• Basis: Automata for ε, ∅, and a.
From regex’s to ε-NFA’s
• Induction: Automata for R + S, RS, and R*.
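The basis and induction cases can be sketched as functions that each return an ε-NFA with one start state and one accepting state. The representation (a triple of start, accept and an edge list, with label None standing for ε) and all function names are assumptions of this sketch; the wiring follows the usual textbook construction, since the slide figures are not reproduced here.

```python
from itertools import count

_ids = count()                     # fresh state numbers

def _new():
    return next(_ids)

def sym(a):                        # basis: a single symbol
    s, f = _new(), _new()
    return s, f, [(s, a, f)]

def eps():                         # basis: ε
    s, f = _new(), _new()
    return s, f, [(s, None, f)]

def empty():                       # basis: ∅ (accept state unreachable)
    return _new(), _new(), []

def union(m, n):                   # induction: R + S
    s, f = _new(), _new()
    return s, f, m[2] + n[2] + [(s, None, m[0]), (s, None, n[0]),
                                (m[1], None, f), (n[1], None, f)]

def concat(m, n):                  # induction: RS
    return m[0], n[1], m[2] + n[2] + [(m[1], None, n[0])]

def star(m):                       # induction: R*
    s, f = _new(), _new()
    return s, f, m[2] + [(s, None, m[0]), (m[1], None, f),
                         (m[1], None, m[0]), (s, None, f)]

def closure(states, edges):
    """ε-closure: every state reachable using only ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, w):
    start, accept, edges = nfa
    current = closure({start}, edges)
    for ch in w:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

def zero_or_one():
    return union(sym("0"), sym("1"))

# (0 + 1)*1(0 + 1)
nfa = concat(concat(star(zero_or_one()), sym("1")), zero_or_one())
print(accepts(nfa, "0111"), accepts(nfa, "100"))   # True False
```

Each call builds fresh states, so the same sub-expression can be used twice without sharing states between the two copies.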
Example 3.8
• Let us convert the regular expression (0 + 1)*1(0 + 1) to an ε-NFA.
Example 3.8 (cont.)
Application of Regular Expressions
• Regular Expression in UNIX
• Most real applications deal with the ASCII character set.
• UNIX regular expressions allow us to write character classes to
represent large sets of characters. The rules are:
• The symbol . (dot) stands for “any character”.
• The sequence [a1a2…ak] stands for the regular expression
a1 + a2 + … + ak.
• A range of the form x-y means all the characters from x to y in
the ASCII sequence, e.g. the digits can be expressed as [0-9].
Application of Regular Expressions
• There are special notations for several of the most common
classes of characters, e.g.
• [:digit:] is the set of ten digits, the same as [0-9].
• [:alpha:] stands for any alphabetic character, as does [A-Za-z].
• [:alnum:] stands for the digits and letters, as does [A-Za-z0-9].
• grep stands for “Global (search for) Regular Expression and
Print”
• Lexical Analysis
• Finding Patterns in Text
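The dot, bracket and range notations carry over to Python's re module, so a bare-bones grep can be sketched in a few lines. Python's re does not support the [:digit:] style class names, so the explicit ranges are used instead; the function name grep and the sample lines are assumptions for illustration.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern anywhere in the line."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

lines = ["room 42", "no digits here", "CSE 3204", "x_y"]

print(grep("[0-9]", lines))             # ['room 42', 'CSE 3204']
print(grep("r..m", lines))              # ['room 42']
print(grep("^[A-Za-z0-9 ]+$", lines))   # ['room 42', 'no digits here', 'CSE 3204']
```

The last pattern anchors the match to the whole line, so it keeps only lines made entirely of letters, digits and spaces.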