정규 언어

advertisement
http://plac.dongguk.ac.kr
컴파일러 입문
제3장
정규 언어
http://plac.dongguk.ac.kr
목 차
3.1 정규 문법과 정규 언어
3.2 정규 표현
3.3 유한 오토마타
3.4 정규 언어의 속성
Regular Language
Page 2
http://plac.dongguk.ac.kr
정규 문법과 정규 언어

A study of the theory of regular languages is often justified by the fact
that they model the lexical analysis stage of a compiler.

Type 3 Grammar(N. Chomsky)
RLG :
A → tB, A → t
LLG :
A → Bt, A → t


where, A,B ∈ VN and t ∈ VT*.
It is important to note that grammars in which left-linear productions are
intermixed with right-linear productions are not regular.
For example,
G : S → aR S → c R → Sb
L(G) = {ancbn | n  0} is a cfl.
Regular Language
Page 3
http://plac.dongguk.ac.kr

Definition
(1) A grammar is regular if each rule is
i) A  aB, A  a, where a  VT, A, B  VN.
ii) if S  ε  P, then S doesn't appear in RHS.

우선형 문법 A  tB, A  t 의 형태에서 t 가 하나의 terminal 로
이루어진 경우로 정규 문법에 관한 속성을 체계적으로 전개하기 위하
여 바람직한 형태이다.
(2) A language is said to be a regular language(rl) if it can be
generated by a regular grammar.
ex) L = { anbm| n, m ≥1 } is rl.
S  aS | aA
A  bA | b
Regular Language
Page 4
http://plac.dongguk.ac.kr
[Theorem] The production forms of regular grammar can be derived
from those of RLG.(RLG => RG) (Text p.69)
(proof)
A  tB, where t  VT*.
Let t = a1a2... an, ai  VT.
A  a1A1
A1  a2A2
..
.
Right-linear grammar :
A → tB or A → t,
where A, B ∈ VN and t ∈ VT*.
An-1  anB.
If t = e, then A  B (single production) or A  e (epsilon production).
⇒ These forms of productions can be easily removed.
(Text pp.175-181)
ex) S  abcA ⇒ S  aS1,
A  bcA ⇒ A  bA1,
A  cd
⇒ A  cA1',
S1  bS2
A1  cA
A1'  d
S2  cA
Regular Language
Page 5
http://plac.dongguk.ac.kr
Equivalence
1. 언어 L은 우선형 문법에 의해 생성된다.
2. 언어 L은 좌선형 문법에 의해 생성된다.
3. 언어 L은 정규 문법에 의해 생성된다.
정규 언어
[예] L = {anbm | n,m ≥ 1} : rl
S  aS | aA
A  bA | b
Text p. 70
Regular Language
Page 6
http://plac.dongguk.ac.kr

토큰의 구조를 정의하는데 정규 언어를 사용하는 이유
(1) 토큰의 구조는 간단하기 때문에 정규 문법으로 표현할 수 있다.
(2) context-free 문법보다는 정규 문법으로부터 효율적인 인식기를 구
현할 수 있다.
(3) 컴파일러의 전반부를 모듈러하게 나누어 구성할 수 있다.
(Scanner + Parser)

문법의 형태가 정규 문법이면 그 문법이 나타내는 언어의 형태를 체계
적으로 구하여 정규 표현으로 나타낼 수 있다.
G
derivation
L
if G = rg, L: re.
Regular Language
Page 7
http://plac.dongguk.ac.kr
정규 표현


A notation that allows us to describe the structures of
sentences in regular language.
The methods for specifying the regular languages
(1) regular grammar(rg)
(2) regular expression(re)
(3) finite automata(fa)
rg
fa
re
Regular Language
Page 8
http://plac.dongguk.ac.kr

Text p. 71
Definition :
 A regular expression over the alphabet T and the language
denoted by that expression are defined recursively as
follows :
I. Basis : f , e , a  T.
(1) f is a regular expression denoting the empty set.
(2) e is a regular expression denoting {e}.
(3) a where a  T is a regular expression denoting {a}.
II. Recurse : + , • , *
If P and Q are regular expressions denoting Lp and Lq
respectively, then
(1) (P + Q) is a regular expression denoting Lp U Lq. (union)
(2) (P • Q) is a regular expression denoting Lp  Lq. (concatenation)
(3) (P*) is a regular expression denoting (closure)
{e} U Lp U Lp2 U ... U Lpn ...
Note : precedence : + < • < *
II. Nothing else is a regular expression.
Regular Language
Page 9
http://plac.dongguk.ac.kr
ex) (0+1)* denotes {0,1}*.
(0+1)*011 denotes the set of all strings of 0s and 1s
ending in 011.

Definition : if α is α regular expression, L(α) denotes the
language associated with α. (Text p.72)

Let a and b be regular expressions. Then,
(1) L(α+ β) = L(α)  L(β)
(2) L(α β) = L(α) L(β)
(3) L(α*) = L(α)*

examples :
(1) L(a*) = {e, a, aa, aaa, … } = {an | n  0}
(2) L((aa)*(bb)*b) = {a2nb2m+1| n,m  0}
(3) L((a+b)*b(a+ab)*) --- 연습문제 3.2 (3) - text p.115
= { b, ba, bab, ab, bb, aab, bbb, … }
Regular Language
Page 10
http://plac.dongguk.ac.kr

Definition : Two regular expressions are equal if and only if
they denote the same language.
 α= β if L(α) = L(β).

Axioms : Some algebraic properties of regular expressions.
Let a, b and g be regular expressions. Then, (Text p.73)
A1. α+β = β+α
A3. (αβ) γ = α (βγ)
A5. (β + γ) α = βα + γα
A7. α + f = α
A9. e α = α = α e
A11. α* = (e + α)*
A13. α* + α = α *
A15. (α + β)* = (α* β *) *
A2. (α+β) +γ = α+ (β+γ)
A4. α(β+γ) = αβ +αγ
A6. α+α=α
A8. αf = f = fα
A10. α* = e +α•α*
A12. (α* )* = α*
A14. α* + α+ = α*
Regular Language
Page 11
http://plac.dongguk.ac.kr

All of these identities(=Axioms) are easily proved by the
definition of regular expression.
A8. αf = f = f α
(proof) αf = { xy | x  Lα and y Lf }
Since y  Lf is false, (x  Lα and y  Lf) is false.
Thus αf = f .

Definitions : regular expression equations.
::= the set of equations whose coefficient are
regular expressions.
ex) α,β가 정규 표현이면, X = αX+β가 정규 표현식이다. 이때, X의
의미는 nonterminal 심볼이며 우측의 식이 그 nonterminal이 생
성하는 언어의 형태이다.
Regular Language
Page 12
http://plac.dongguk.ac.kr
▶ The solution of the regular expression equation
X = αX + β

When we substitute X = α*β in both side of the equation, each side of
the equation represents the same language.
X = αX + β
= α(α*β) + β
= αα*β + β = (αα* + ε)β = α*β.

fixed point iteration
X = αX + β
= α(αX + β) + β
= α2X + αβ + β = α2X + (ε + α)β
..
.
k+1
= α X + (ε + α + α2 + ... αk )β
= (ε + α + α2 + ... + αk + ...)β = α*β.
Regular Language
Page 13
http://plac.dongguk.ac.kr

Not all regular expression equations have unique solution.
X = αX + β
(a) If ε is not in α, then X = α*β is the unique solution.
(b) If ε is in α, then X = α*(β + L) for some language L.
So it has an infinity of solutions.
⇒ Smallest solution : X = α*β.
ex)
X = X + a : not unique solution
⇒ X = a + b or X = b*a or X = (a + b)* etc.
X=X+a
X=X+a
=a+b+a
= b *a + a
=a+a+b
= (b* + ε) a
= a + b.
= b*a
Regular Language
Page 14
http://plac.dongguk.ac.kr

Finding a regular expression denoting L(G) for a given rg G.
G
derivation
L
if G = rg, L: re.


L(A) where A  VN denotes the language generated by A.
By definition, if S is a start symbol, then L(G)= L(S).
Two steps :
1. Construct a set of simultaneous equations from G.
A  aB, A  a
L(A) = {a}·L(B) U {a}  A = aB + a
In general, X  α |β| γ ⇒ X = α + β + γ.
2. Solve these equations.
X = αX + β  X = α*β.
Regular Language
Page 15
http://plac.dongguk.ac.kr
ex1) S  aS
S  bR
S ε
R  aS
L(S) = {a}L(S) U {b}L(R) U{ε}
L(R) = {a}L(S)

ree: S = aS + bR + ε
R = aS
S = aS + baS + ε
= (a + ba)S + ε
= (a + ba)* ε = (a + ba)*
ex2) S  aA | bB | b

A bA | ε
B bS
ree: S = aA + bB + b
A = bA + ε ⇒ A = b*ε = b*
B = bS
 S = ab* + bbS + b
= bbS + ab* + b
= (bb)*(ab*+b)
Regular Language
Page 16
http://plac.dongguk.ac.kr
ex3) A  0B | 1A
B  1A | 0C
C  0C | 1C | ε
ex4) S  aA | bS
A  aS | bB
B  aB | bB | ε
ex5) S  0A | 1B | 0
A  0A | 0S | 1B
B  1B | 1 | 0
ex6) X1 = 0X2 + 1X1 + ε
X2 = 0X3 + 1X2
X3 = 0X1 + 1X3
ex7) A1 = (01* + 1) A1 + A2
A2 = 11 + 1A1 + 00A3
A3 = A1 + A2 + ε
ex8) A  aB | bA
B  aB | bC
C  bD | aB
D  bA | aB |ε
Text p.116
3.5(5)
풀이
ex9) X  α1X + α2Y + α3 ex10) PR  b DL SL e
Y  β1X + β2Y + β3
DL  d ; DL | ε
SL  SL ; s | s
Regular Language
Page 17
http://plac.dongguk.ac.kr
인식기(Recognizer)
☞ A recognizer for a language L is a program that takes as input
string x and answers “yes ” if x is a sentence of L and “no ”
otherwise.
a0a1a2 … aiai+1ai+2 … an
input
input head
Finite State Control
• Turing Machine
• Linear Bounded Automata
• Pushdown Automata
• Finite Automata
Auxiliary Storage
Regular Language
Page 18
http://plac.dongguk.ac.kr
유한 오토마타

Text p. 78
Definition : fa
A finite automaton M over an alphabet  is a system (Q, , , q0, F)
where, Q : finite, non-empty set of states.
 : finite input alphabet.
 : mapping function.
q0  Q : start(or initial) state.
F ⊆ Q : set of final states.


mapping  : Q x   2Q.
i,e. (q,a) = {p1, p2, ... , pn}
G = (VN, VT, P, S)
re : f, e, a, + , • , *
M = (Q, , , q0, F)
DFA , NFA.
Regular Language
Page 19
http://plac.dongguk.ac.kr
목차 - FA
1. DFA
2. NFA
3. Converting NFA into DFA
4. Minimization of FA
5. Closure Properties of FA
Regular Language
Page 20
http://plac.dongguk.ac.kr
1. Deterministic Finite Automata(DFA)

deterministic if (q,a) consists of one state.
We shall write "(q,a) = p " instead of (q,a) = {p} if deterministic.
If δ(q,a) always has exactly one number,
We say that M is completely specified.

extension of  : Q x  ⇒ Q x *
(q, e ) = q
(q,xa) = ((q,x),a), where x  * and a  .

A sentence x is said to be accepted by M
if (q0, x) = p , for some p  F.

The language accepted by M :
L(M) = { x | (q0,x)  F }
Regular Language
Page 21
http://plac.dongguk.ac.kr
ex) M = ( {p, q, r}, {0, 1}, , p, {r} )
 : (p,0) = q
(p,1) = p
(q,0) = r
(q,1) = p
δ(r,1) = r
(r,0) = r
1001  L(M) ?
(p,1001) = (p,001) = (q,01) = (r,1) = r  F.
∴ 1001  L(M).

1010  L(M) ?
(p,1010) = (p,010) = (q,10) = (p,0) = q  F.
∴ 1010  L(M).


 : matrix 형태로  transition table.
ex)

p
q
r
Input symbols
0
1
q
p
r
p
r
r
Regular Language
Page 22
http://plac.dongguk.ac.kr

Definition : State (or Transition) diagram for automaton.


The state diagram consists of a node for every state
and a directed arc from state q to state p with label
a   if (q,a) = p.
Final states are indicated by a double circle and the initial state is
marked by an arrow labeled start.
1
0, 1
0
start
p
0
q
r
1
(1+01)*00(0+1)*
Identifier :
start
letter, digit
S
letter
A
Regular Language
Page 23
http://plac.dongguk.ac.kr
Text p. 82
Algorithm : w ? L(M).
assume M = (Q, , , q0, F);
begin
currentstate := q0; (* start state *)
get(nextsymbol);
while not eof do
begin currentstate := (currentstate, nextsymbol);
get(nextsymbol)
end;
if currentstate in F then write(‘Valid String’)
else write(‘Invalid String’);
end.
Regular Language
Page 24
http://plac.dongguk.ac.kr
2. Nondeterministic Finite
Automata(NFA)

nondeterministic if (q,a) = {p1, p2, ..., pn}

In state q, scanning input data a, moves input head one symbol
right and chooses any one of p1, p2, ..., pn as the next state.
ex) NFA (Nondeterministic Finite Automata)
M = ( {q0,q1,q2,q3,qf}, {0,1}, , q0, {qf} )

δ
q0
q1
q2
q3
qf

0
{q1, q2}
{q1, q2}
{qf}
f
{qf}
1
{q1, q3}
{q1, q3}
f
{qf}
{qf}
if (q,a) = f, then (q,a) is undefined.
Regular Language
Page 25
http://plac.dongguk.ac.kr

To define the language recognized by NFA, we must extend .
(i)  : Q x * → 2Q
( q, ε ) = { q }
U
( q, xa ) =
(p,a), where a  VT and x  VT*.
p  ( q, x )
k
 (pi,x)
É
(ii)  : 2Q x * → 2Q
({p1, p2, ..., pk}, x) =
i=1

Definition : A sentence x is accepted by M
if there is a state p in both F and (q0, x).
ex) 1011  L(M) ?
(q0, 1011) = ({q1,q3}, 011) = ({q1,q2},11)
= ({q1,q3},1) = {q1,q3,qf}
 1011  L(M) ( ∵ {q1,q3,qf} ∩ {qf}  Φ)
ex) 0100  L(M) ?
Regular Language
Page 26
http://plac.dongguk.ac.kr

Nondeterministic behavior
q0
q1
q1
q1
q1


q2
q3
q3
q3
f
f
qf
If the number of states |Q| = m and input length |x| = n, then there
are mn nodes.
In general, NFA can not be easily simulated by a simple program,
but DFA can be simulated easily.
And so we shall see DFA is constructible from the NFA.
Regular Language
Page 27
http://plac.dongguk.ac.kr
3. Converting NFA into DFA

Text p. 86
NFA : easily describe the real world.
DFA : easily simulated by a simple program.
===> Fortunately, for each NFA we can find a DFA accepting
the same language.

Accepting Sequence(NFA)
(q0, a1a2 ... an) = ({q1,q2, … ,qi}, a2a3 ... an)
... ...
= ({p1,p2, … ,pj}, ai ... an)
... ...
= {r1,r2, ... ,rk}

Since the states of the DFA represent subsets of the set
of all states of the NFA, this algorithm is often called the
subset construction.
Regular Language
Page 28
http://plac.dongguk.ac.kr
[Theorem] Let L be a language accepted by NFA. Then
there exists DFA which accepts L. Text p.86
(proof) Let M = (Q, , , q0, F) be a NFA accepting L.
Define DFA M' = (Q', , ', q0', F') such that
(1) Q' = 2Q, {q1, q2, ..., qi} ∈ Q', where qi ∈ Q.
denote a set of Q' as [q1, q2, ..., qi].
(2) q0' = {q0} = [q0]
(3) F' = {[r1, r2, ..., rk] | ri ∈ F}
(4) ' : ' ([q1, q2, ...,qi], a) = [p1, p2, ..., pj]
if ({q1, q2, ..., qj}, a) = {p1, p2, ..., pj}.
Now we must prove that L(M) = L(M’) i.e,
' (q0',x)  F'  (q0, x) ∩ F  f.
 we can easily show that by inductive hypothesis on the length of
the input string x.
Regular Language
Page 29
http://plac.dongguk.ac.kr
ex1) M = ({q0,q1}, {0,1}, , q0, {q1}),

0
1
q0
q1
{q0 , q1 }
f
{q0 }
{q0 , q1 }
 dfa M' = (Q', , ', q0', F'),
where Q' = 2Q = {[q0], [q1], [q0,q1]}
q0' = [q0]
F' = {[q1], [q0,q1]}
δ' :δ'([q0],0) = δ({q0},0) = {q0,q1} = [q0,q1]
δ'([q0],1) = {q0} = [q0]
δ' ([q1],0) = δ(q1,0) = f
δ' ([q1],1) = δ(q1,1) = {q0,q1} = [q0,q1]
δ' ([q0,q1],0) = δ({q0,q1},0) = {q0,q1} = [q0,q1]
δ' ([q0,q1],1) = δ({q0,q1},1) = {q0,q1} = [q0,q1]
Regular Language
Page 30
http://plac.dongguk.ac.kr

State renaming : [q0] = A, [q1] = B, [q0,q1] = C.
’
A
B
C
0
C
f
C
1
A
C
C
B
1
1
start

A
0, 1
0
C
Since B is an inaccessible state, it can be removed.
1
start
A
0, 1
0
Regular Language
C
Page 31
http://plac.dongguk.ac.kr

Definition : we call a state p accessible if there is w such that
(q0, w)  (p, *ε) , where q0 is the initial state.
ex2) NFA  DFA
NFA :

q0
q1
q2
q3
qf
0
{q1,q2}
{q1,q2}
{qf}
f
{qf}
1
{q1,q3}
{q1,q3}
f
{qf}
{qf}
DFA :
’
q0
q1q2
q1q3
q1q2qf
q1q3qf
0
q1q2
q1q2qf
q1q2
q1q2qf
q1q2qf
1
q1q3
q1q3
q1q3qf
q1q3qf
q1q3qf
Regular Language
Page 32
http://plac.dongguk.ac.kr
Definition : e - NFA M = (Q, , , q0, F)
 : Q  (   {e} )  2Q
e - CLOSURE : e을 보고 갈 수 있는 상태들의 집합
 s가 하나의 상태
e-CLOSURE(s) = {s}{q|(p, e)=q, p  e-CLOSURE(s)}
 T가 하나 이상의 상태 집합인 경우
∪ e-CLOSURE(q)
e-CLOSURE(T) =
q∈T
ex) e - NFA에서 CLOSURE를 구하기
a
b
a
start
A
ε
B
a
C
ε
D
ε
CLOSURE (A) = {A, B, D}
CLOSURE({A,C}) = CLOSURE(A)  CLOSURE(C) = {A, B, C, D}
Regular Language
Page 33
http://plac.dongguk.ac.kr
Ex) e - NFA  DFA
a
start
1
start
b
2
a
A
4
c
c
c
3
ε

b
B
C
ε
a
b
CLOSURE(1) = {1,3,4} CLOSURE(2) = {2}
 [1,3,4]
 [2]
c
f CLOSURE(3) = {3,4}
 [3,4]
[2]
f
CLOSURE(4) = {4}
 [4]
[3,4]
f
f
[4]
f
f
f
CLOSURE(3) = {3,4}
 [3,4]
f
A = [1,3,4], B = [2], C = [3,4], D = [4]
Regular Language
Page 34
D
http://plac.dongguk.ac.kr
4. Minimization of FA

State minimization => state merge

Definition :
 ω  * distinguishes q1 from q2 if (q1,ω) = q3,
(q2,ω) = q4 and exactly one of q3, q4 is in F.

Algorithm : equivalence relation() ⇒ partition.
Text p. 95
(1)  : final state인가 아닌 가로 partition.
(2)  : input symbol에 따라 다른 equivalence class 로 가는가?
그 symbol로 distinguish 된다고 함.
:
(3)  : 더 이상 partition이 일어나지 않을 때까지.
 The states that can not be distinguished are merged into a
single state.
Regular Language
Page 35
http://plac.dongguk.ac.kr
Text p. 119 3.11
Ex)
a
a
A
b
F
b
D
b
a
a
b
C
a
b
B
E
b
a
 : {A,F}, {B, C, D, E} : 처음에 final, nonfinal로 분할한다.
 : {A,F}, {B,E}, {C,D} : {B, C, D, E} 가 input symbol b에 의해
partition 됨
 : {A,F}, {B,E}, {C,D}.
δ’
a
b
[AF]
[AF]
[BE]
[BE]
[BE]
[CD]
[CD]
[CD]
[AF]
Regular Language
Page 36
http://plac.dongguk.ac.kr

How to minimize the number of states in a fa.
<step 1> Delete all inaccessible states;
<step 2> Construct the equivalence relations;
<step 3> Construct fa M’ = (Q’, , ’, q0’, F’),
(a) Q’ : set of equivalence classes under 
Let [p] be the equivalence class of state p under .
(b) ’([p],a) = [q] if (p,a) = q.
(c) q0’ is [q0].
(d) F' = {[q] | q  F}.

Definition : M is said to be reduced.
if (1) no state in Q is inaccessible and
(2) no two distinct states of Q are indistinguishable
Regular Language
Page 37
http://plac.dongguk.ac.kr
ex) Find the minimum state finite automaton for the language specified by
the finite automaton M = ({A,B,C,D,E,F}, {0,1}, , A, {E,F}),

where  is given by
δ
0
1
A
B
C
D
E
F
B
E
A
F
D
D
C
F
A
E
F
E
Text p. 119 3.11(2)
 : {A, B, C, D}, {E, F}
δ
[A]=p
[C]=q
[B,D]=r
[E, F]=s
0
r
p
s
r
 : {A}, {C}, {B, D}, {E, F}
1
q
p
s
s
Regular Language
Page 38
http://plac.dongguk.ac.kr
Programming
<연습문제 3.20> --- 교과서 121쪽
NFA to DFA
NFA


DFA
Minimization
of DFA
Reduced
DFA
Input Design
Data Structure
Regular Language
Page 39
http://plac.dongguk.ac.kr
5. Closure properties of FA
[Theorem] If L1 and L2 are finite automaton languages (FAL),
then so are (i) L1 U L2 (ii) L1 • L2 (iii) L1*.
(proof) M1 = (Q1, , 1, q1, F1)
M2 = (Q2, , 2, q2, F2), Q1  Q2 = f (∵ renaming)
(i) M = (Q1 U Q2 U {q0}, , , q0, F)
where, (1) q0 is a new state.
(2) F = F1 U F2 if e  L1 U L2.
F1 U F2 U {q0} if e  L1 U L2.
(3) (a) (q0,a) = (q1,a) U (q2,a) for all a  .
(b) (q,a) = 1(q,a) for all q  Q1, a  .
(c) (q,a) = 2(q,a) for all q  Q2, a  .

새로운 시작 상태를 만들어 각각의 fa에 마치 각 fa의 시작 상태에서 온 것처럼 연결
한다. 그리고 e 를 인식하면 새로 만든 시작 상태도 종결 상태로 만든다.
ex) p.98 [예 28]
Regular Language
Page 40
http://plac.dongguk.ac.kr
(ii) M = (Q1 U Q2, , , q0, F)
(1) F =
F2
F1 U F 2
if q2  F2
if q2  F2
(2) (a) (q,a) = 1(q,a) for all q  Q1 - F1.
(b) (q,a) = 1(q,a) U 2(q2,a) for all q  F1.
(c) (q,a) = 2(q,a) for all q  Q2.

M1의 종결 상태에서 M2의 시작 상태에서 온 것처럼 연결한다. 그리고 M1의
시작 상태가 접속한 오토마타의 시작 상태가 된다.
1
M1
:
start
0
A
=> 01*
B
1
M2
:
start
0
X
=> 01*
Y
1
M 1 •M 2 :
start
0
0
A
B
Y
=> 01*01*
1
Regular Language
Page 41
http://plac.dongguk.ac.kr
정규 언어의 속성
Regular grammar (rg)
Finite automata (fa)
Regular expression (re)
※ re ===> fa : scanner generator
Regular Language
Page 42
http://plac.dongguk.ac.kr
목차
1. RG & FA
2. FA & RE
3. Closure Properties of Regular Language
4. The Pumping Lemma for Regular
Language
Regular Language
Page 43
http://plac.dongguk.ac.kr
1. RG & FA

Given rg, there exists a fa that accepts the same language
generated by rg and vice versa.

rg  fa
Given rg, G = (VN, VT, P, S) , construct M = (Q, , , q0, F).
(1) Q = VN U {f}, where f is a new final state.
(2)  = VT.
(3) q0 = S.
(4) F = {f} if e  L(G)
= {S, f} otherwise.
(5)  : if A  aB  P then (A,a) ' B.
if A  a  P then (A,a) ' f.
Regular Language
Page 44
http://plac.dongguk.ac.kr
(proof)
If  is accepted by fa then it is accepted in some sequence of
moves through states, ending in f.
But if (A,a) = B and B  f , then A  aB is a productions.
Also if (A,a) = f then A  a is a production.
So we can use the same series of productions to generate  in G
Thus
*
S => .
ex) p.101 [예 29]
Regular Language
Page 45
http://plac.dongguk.ac.kr

fa  rg
Given M = (Q, , , q0, F), construct G = (VN, VT, P, S).
(1) VN = Q
(2) VT = 
(3) S = q0
(4) P : if (q,a) = r then q  ar.
if p  F then p  e.
1
ex)
0, 1
0
start
p
q
0
r
1
p  1p | 0q
q  1p | 0r
r  0r | 1r | ε
L(P)=(1+01)*00(0+1)*
Regular Language
Page 46
http://plac.dongguk.ac.kr
2. FA & RE

fa  rg  re
ex) p.118 3.10 (1)
b
start
A
a
a
b
b
C
B
D
a
a
b
A = bA + aB
B = aB + bC
C = aB + bD
D = aB + bA + e
=A+e

A = (a+b)*abb
Regular Language
Page 47
http://plac.dongguk.ac.kr

re  fa (※ scanner generator)
For each component, we construct a fa inductively :
1. basis
ε
:
ε
i
a  :
f
a
i
f
2. induction - combine the components.
(1) N1 + N2
ε
N1
ε
i
f
ε
N2
Regular Language
ε
Page 48
http://plac.dongguk.ac.kr
(2) N1 •N2
i
ε
N1
N2
f
ε
(3) N*
i
ε
N
ε
f
ε
Regular Language
Page 49
http://plac.dongguk.ac.kr

Definition : The size of a regular expression is the number
of operations and operands in the expression.
ex) size(ab + c*) = 6

decomposition:
R6
R3
R1
a

.
R5
+
R2
R4
b
c
*
The number of state is at most twice the size of the expression.
(∵ each operand introduces two states and each operator introduces at
most two states.)

The number of arcs is at most four times the size of the expression.
Regular Language
Page 50
http://plac.dongguk.ac.kr

Simplifications : p.106
※ e -arc로 연결된 두 상태는 소스 상태에서 나가는 다른 arc가 없으
면 같은 상태로 취급될 수 있다.
a
ε
B
A
A
a
ex) p.105 [예 31]

re  e-NFA (간단화)  DFA
ex) p.109 [예 33]

The following statements are equivalent :
1. L is generated by some regular grammar.
2. L is recognized by some finite automata.
3. L is described by some regular expression.
Regular Language
Page 51
http://plac.dongguk.ac.kr
p.120 3.14
(1) (b + a(aa* b)*b)*
b
a
a
a
X
Y
Z
b
b
(2) (b + aa + ac + aaa + aac)*
b
a
a
X
Y
Z
a, c
a, c
(3) a(a+b)*b(a+b)*a(a+b)*b(a+b)*
S
a
W
a, b
a, b
b
a
X
a, b
a, b
b
Y
Regular Language
Z
Page 52
http://plac.dongguk.ac.kr
3. Closure Properties of Regular
Language
[Theorem] If L1 and L2 are regular languages,
then so are (i) L1 U L2 , (ii) L1L2, and (iii) L1*.
(proof) (ii) Since L1 and L2 are rl, rg G1 = (VN1, VT1, P1, S1) and
rg G2 = (VN2,VT2, P2, S2), such that L(G1) = L1 and L(G2) = L2.
Construct G=(VN1 U VN2,VT1 U VT2, P, S1) in which P is defined as follows :
(1) If A  aB  P1, A  aB  P.
(2) If A  a  P1, A  aS2  P.
(3) All productions in P2 are in P.
We must prove that L(G) = L(G1) . L(G2).
Since G is rg, L(G) is rl. Therefore L(G1) . L(G2) is rl.
ex) P1 : S  aS | bA
A  aA | a
P2 : X  0X | 1Y
Y  0Y | 1
 P : S  aS | bA A  aA | aX
X  0X | 1Y
Regular Language
Y  0Y | 1
Page 53
http://plac.dongguk.ac.kr
(iii) L : rl, rg G = (VN, VT, P, S) such that L(G) = L.
Let G' = (VN U {S'}, VT, P', S')
P' : (1) If A  aB  P, then A  aB  P'.
(2) If A  a  P, then A  a, A  aS'  P'.
(3) S'  S ┃ε  P'.
We must prove that L(G') = (L(G))*.
*
*
*
  L(G), S => . S' => S => wS' => w*S' => w*.
∴ (L(G))* = L(G').
ex) P : S  aS, S  b
P' : S  aS, S  b, S  bS', S'  S, S'  e .
note P : S = aS + b = a*b
P' : S = aS + b + bS' = a*(b+bS') = a*b + a*bS'
∴ S' = S + e
= a*bS' + a*b + e
= (a*b)*(a*b + e )
= (a*b)*(a*b) + (a*b)* = (a*b)*
Regular Language
Page 54
http://plac.dongguk.ac.kr
4. The Pumping Lemma for Regular
Language

It is useful in proving certain languages not to be regular.
[Theorem] Let L be a regular language. There exists a constant p such that
if a string w is in L and |ω|  p, then w can be written as xyz,
where 0 < |y| ≤p and xyiz  L for all i  0.
(proof) Let M = (Q, , , q0, F) be a fa with n states such that L(M) = L.
Let p = n. If   L and |ω|  n, then consider the sequence of
configurations entered by M in accepting w. Since there are at
least n+1 configurations in the sequence, there must be two with
the same state among the first n+1 configurations.
Thus we have a sequence of moves such that
(q0,xyz) = (q1,yz) =  δ(q1,z) = qf  F for some q1.
y
q0
x
q1
z
qf
But then, (q0,xyiz) = (q1,yiz) = (q1,yi-1z) = ... = (q1,z) = qf F.
Since w = xyz L, xyiz≤ L for all i  0.
Regular Language
Page 55
http://plac.dongguk.ac.kr
Consequently, we say that “finite automata can not count”,
meaning they can not accept a language which requires that
they count the number exactly.
ex) L = {0n1n | n ≥1} is not type 3.
(Proof)
Suppose that L is regular.
Then for a sufficiently large n, 0n1n can be written as xyz
such that | y|  0 and xyiz  L for all i  0.
If y  0+ or y  1+ , then xz = xy0z  L.
If y  0+1+, then xyz  L.
We have a contradiction, so L can not be regular.

ancbn  not rl
ancbm  rl
Regular Language
Page 56
http://plac.dongguk.ac.kr
연습문제 3.5 풀이교과서 116쪽
A = aB + bA
B = aB + bC
C = bD + aB
D = bA + aB + e
……………………… (1)
……………………… (2)
……………………… (3)
……………………… (4)
식 (4)에서 bA + aB = aB + bA = A 이므로
D=A+ e
……………………… (5)
식 (3)에 식 (5)를 대입
C = b(A + e) + aB = bA + aB + b
=A+b
……………………… (6)
식 (2)에 식 (6)을 대입
B = aB + b(A + b) = aB + bA + bb
= A + bb
……………………… (7)
식 (1)에 식 (7)을 대입
A = aB + bA = a(A + bb) + bA = aA + abb + bA = (a + b)A + abb
= (a+b)*abb
 L(G) = (a+b)*abb
Regular Language
Page 57
Download