Week 5 lecture slides

CSE 2001 - Introduction to
The Theory of Computation
Week 5 - Jun 9, 2014
• Today: NFA to regular expression
• Using the pumping lemma
• Mealy machines for controls
• New - Chapter 2: Context-free languages and grammars
• Review of test 1
Announcements
• Tutorials are still 4:30-6 Wednesday, CB 129
• Test 2 is planned for July 7. Big one
• (In the future: A final review session is planned for Wednesday 4:30-6 pm on July 23)
• Office hours Thursdays 4 to 5 pm in Lassonde 2013 for individual help (Note room change)
• Test 1 marks are posted. Averages: 13.5 - 14
• Next week: Assignment 2 (part 1 - first 3 questions)
• Details on using submit: To submit a file named foo.txt, you can go to the submit web server, https://webapp.eecs.yorku.ca/submit/
Characterizing the languages of
Regular Expressions
• Let RE be the set of all languages that can be
represented by regular expressions
• We are proving that RE and the Regular
Languages are the same class of languages, i.e.,
RE = RL
Proof:
Step 1: For every regular expression there’s an
equivalent NFA - Lemma 1.55
Step 2: For every DFA there’s an equivalent
regular expression - Lemma 1.60 (intermediate
step: GNFA)
04/29/2014
CSE 2001
3
Generalized Nondeterministic Finite
Automata
• It is not clear how a regular expression can
express the language of a DFA
• To accomplish this it is easiest to use another RL
model, Generalized NFAs
• The main difference is that a transition is labelled
by an arbitrary regular expression R instead of just
a symbol from Σ
• In one step the GNFA can read an arbitrarily long
string of the current and subsequent input
symbols, and can make a transition if the string is
in L(R)
• It turns out that any language recognized by a
GNFA is regular
GNFA
• A Generalized NFA looks like an ordinary
NFA except:
• Like a DFA, it has a complete set of transitions, here one for every ordered pair of states, except:
– No transitions into the start state
– No transitions out of the single accept state
– ɛ transitions are allowed
– (the main thing) transitions are labelled by regular expressions, not just by symbols of Σ
– only one transition from any state to another
• Multiple transitions are combined using ∪
Example GNFA (some labels not shown)
[Figure: GNFA with states qS, q2 and qA; the transition labels shown are 0110, 0, ∅, ε, 01* and 0* ∪ 11]
In this example an input 0111... could cause a transition from qS to q2 using 011 from the input. So could 0 alone, or 01 (a nondeterministic choice)
RL = RE
• Lemma 1.60:
If a language is regular, then it can be described by a
regular expression.
• Proof strategy using Generalized NFAs (GNFAs):
§ Regular implies equivalent DFA, by definition
§ convert that DFA to equivalent (simple) GNFA
§ convert the GNFA to an equivalent 2-state GNFA
§ the regular expression is equal to the one on the label on the single remaining transition in the 2-state GNFA
GNFA - extended (generalized) NFAs that are defined to have regular expressions as labels on their transitions instead of only symbols from Σ ∪ {ɛ}
Generalized NFA - definition 1.64
A Generalized non-deterministic finite automaton
is M=(Q, Σ, δ, qstart, qaccept) with
• Q - a finite set of states
• Σ - the input alphabet
(Let R be the set of all regular expressions over Σ)
• qstart the start state
• qaccept the (unique) accept state
• transition function δ:(Q - {qaccept})×(Q - {qstart}) → R
δ(qi,qj) = Rk means that when the machine is in qi and the remaining input begins with a string w ∈ L(Rk), a transition from qi to qj can be made (and w is removed from the input on this computation path)
GNFA - Computation and Acceptance
• Sipser p.73
• A GNFA accepts a string w∈Σ* if w=w1w2...wk (each wi in Σ*) and there is a sequence of states r0, r1, ..., rk where each ri ∈ Q such that:
1. r0 = qstart
2. rk = qaccept
3. for each i, 1≦i≦k, wi is in the language L(Ri) where Ri = δ(ri-1,ri), i.e. Ri is the label on the transition from ri-1 to ri
• The language recognized by a GNFA is the set of strings it accepts
• It’s not clear how to build them, but we can use them here
• It turns out that a language is recognized by a GNFA iff the language is regular (Proof: exercise)
Characteristics of GNFA’s δ
• δ:(Q\{qaccept})×(Q\{qstart}) → R
(Other than the accept state, there are transitions
from every state to every state except the start
state, self included)
The interior Q\{qaccept,qstart} is fully connected by δ
From qstart there are only ‘outgoing transitions’
To qaccept there are only ‘ingoing transitions’
Impossible qi→qj transitions are labeled “δ(qi,qj) = ∅”
[Figure: two-state GNFA with a single transition from qS to qA labelled R, where R∈R]
Observation: This GNFA recognizes the language L(R) whatever regular expression R is. Why is this true?
Proof Idea of Lemma 1.60
Proof idea (given a DFA M):
Construct an equivalent GNFA M’ with k≥2 states
Reduce one-by-one the internal states until k=2, while
keeping the regular expressions “right” (together they
denote all strings taking automaton from one state to
another)
This GNFA will be of the form qS ⎯R→ qA
This regular expression R will be such that L(R) = L(M)
Simplified example: fixing one path
q1-qa when ripping out q2
∑={a,b,c,d}
[Figure: GNFA with states qs, q1, q2 and qa. Labels recoverable from the slide include a∪ɛ, c, bc, d, dd, b, φ, (a∪ɛ)bc, (a∪ɛ)c*bc and d∪(a∪ɛ)c*bc]
To remove q2 we want to fix up the q1 to qa arc so that any string that could previously take the machine from q1 to qa by way of q2 can still take the machine from q1 to qa without using q2. (After that we must fix all paths between other pairs of states so the loss of q2 doesn’t affect them either)
Summary
[Figure: qi → qrip labelled R1, loop on qrip labelled R2, qrip → qj labelled R3, and qi → qj labelled R4; after ripping qrip the arc qi → qj is labelled R4 ∪ R1R2*R3]
Summary: To fix the transition from qi to qj, repairing any strings lost by deleting qrip, replace R4 by R4 ∪ R1R2*R3, i.e. δ(qi,qj) becomes δ(qi,qj) ∪ δ(qi,qrip) δ(qrip,qrip)* δ(qrip,qj). Do this for every transition from any qi to any qj (including i=j) (except qrip to itself)
Proof of Lemma 1.60
Let M be DFA with k states
Create “equivalent” GNFA M’ with k+2 states
Reduce in k steps M’ to M’’ with 2 states, always
maintaining equivalence of language recognized
The resulting two-state GNFA carries a single regular expression R that expresses all and only the strings that can take the original M from its start state to an accept state
The regular language L(M) equals the language L(R) of the regular expression R
DFA M → Equivalent GNFA M’
Let M have k states Q={q1,…,qk}
- Add two states qaccept (qA) and qstart (qS)
- Connect qstart to the earlier start state q1: qS ⎯ε→ q1
- Connect old accepting states to qaccept: qj ⎯ε→ qA
- Complete missing transitions by ∅: qi ⎯∅→ qj
- Join multiple transitions: qi ⎯0→ qj together with qi ⎯1→ qj becomes qi ⎯0∪1→ qj
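The construction steps on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the DFA is given as a transition dictionary, the names dfa_to_gnfa, qS, qA, EMPTY and EPS are hypothetical choices rather than anything from the slides, and labels are kept as plain strings.

```python
from itertools import product

EMPTY = None  # assumed encoding of the label ∅
EPS = ""      # assumed encoding of the label ε

def dfa_to_gnfa(states, alphabet, delta, start, accepts):
    """Return {(p, q): label} for the GNFA built from a DFA."""
    q_start, q_accept = "qS", "qA"  # hypothetical names for the two new states
    everything = [q_start, *states, q_accept]
    # Complete missing transitions by ∅ (none out of qA, none into qS).
    label = {(p, q): EMPTY for p, q in product(everything, repeat=2)
             if p != q_accept and q != q_start}
    label[(q_start, start)] = EPS            # ε from qS to the old start state
    for q in accepts:
        label[(q, q_accept)] = EPS           # ε from old accepting states to qA
    for p in states:
        targets = {}
        for a in alphabet:
            targets.setdefault(delta[(p, a)], []).append(a)
        for q, symbols in targets.items():
            label[(p, q)] = "∪".join(symbols)  # join multiple transitions
    return label

# Example DFA (an assumption for illustration): q1 goes to q2 on both 0 and 1.
example = dfa_to_gnfa(
    ["q1", "q2"], ["0", "1"],
    {("q1", "0"): "q2", ("q1", "1"): "q2",
     ("q2", "0"): "q1", ("q2", "1"): "q2"},
    "q1", {"q2"})
print(example[("q1", "q2")])  # 0∪1
```

The table has one entry for every pair in (Q - {qaccept})×(Q - {qstart}), matching the type of δ in definition 1.64.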
Convert(M): Remove Internal
state of GNFA M to get M’
If the GNFA M has more than 2 states, ‘rip’
internal qrip to get equivalent GNFA M’ by:
- Removing state qrip: Q’=Q\{qrip}
- Changing the transition function δ by
δ’(qi,qj) = δ(qi,qj) ∪ (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj))
for every qi∈Q’\{qaccept} and qj∈Q’\{qstart}
[Figure: qi → qrip labelled R1, loop on qrip labelled R2, qrip → qj labelled R3 and qi → qj labelled R4 are replaced by a single arc qi → qj labelled R4∪(R1R2*R3)]
Proof Lemma 1.60 - continued
• Use induction (on number of states of
GNFA) to prove correctness of the
conversion procedure.
• Base case: k=2.
• Inductive step: 2 cases – qrip is/is not on
accepting path.
[Figure: ripping qrip replaces the label R4 on the arc from qi to qj by R4∪(R1R2*R3), where R1 labels qi → qrip, R2 the loop on qrip and R3 labels qrip → qj]
Convert
• Define a recursive procedure CONVERT(M) that takes a GNFA M and returns a regular expression describing L(M)
– CONVERT(M)
– Say M = (Q, Σ, δ, qs, qa)
– If M has only 2 states, return δ(qs,qa)
– If M has > 2 states
» Select any internal state of M, call it qrip (e.g. pick the highest numbered internal state)
» Define M’ to be (Q-{qrip}, Σ, δ’, qs, qa) where δ’(qi,qj) = δ(qi,qj) ∪ (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj))
» Return CONVERT(M’)
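A minimal Python sketch of CONVERT, assuming labels are written in Python re syntax, None stands for ∅, the empty string stands for ε, and missing table entries mean ∅. The helper names union, concat and star and the example GNFA (a DFA for an even number of 1s with qS and qA already added) are illustrative assumptions, not part of the slides.

```python
import re

def union(r, s):
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(r, s):
    # ∅ concatenated with anything is ∅.
    return None if r is None or s is None else r + s

def star(r):
    # ∅* and ε* both denote {ε}.
    return "" if r is None or r == "" else f"(?:{r})*"

def convert(states, label, q_start, q_accept):
    """Recursive CONVERT: rip internal states until two states remain."""
    if len(states) == 2:
        return label.get((q_start, q_accept))
    q_rip = next(q for q in states if q not in (q_start, q_accept))
    rest = [q for q in states if q != q_rip]
    loop = star(label.get((q_rip, q_rip)))
    new_label = {}
    for qi in rest:
        if qi == q_accept:
            continue
        for qj in rest:
            if qj == q_start:
                continue
            # δ'(qi,qj) = δ(qi,qj) ∪ δ(qi,qrip) δ(qrip,qrip)* δ(qrip,qj)
            via = concat(concat(label.get((qi, q_rip)), loop),
                         label.get((q_rip, qj)))
            new_label[(qi, qj)] = union(label.get((qi, qj)), via)
    return convert(rest, new_label, q_start, q_accept)

# Assumed example: state "e" = even number of 1s so far, "o" = odd.
label = {("qS", "e"): "", ("e", "e"): "0", ("e", "o"): "1",
         ("o", "o"): "0", ("o", "e"): "1", ("e", "qA"): ""}
regex = convert(["qS", "e", "o", "qA"], label, "qS", "qA")
print(regex)
print(re.fullmatch(regex, "1001") is not None)
```

The resulting expression is not simplified, so it is usually longer than one written by hand, but it can be checked against the DFA on short strings with re.fullmatch.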
Claim 1.65: If M is a k-state GNFA, k≧2, the language described by the regular expression CONVERT(M) is the same as L(M)
• Base case: k=2
– Matching the regular expression R labelling the qs to qa transition is the only way an input string can be accepted by the GNFA. So L(M) = L(R), as required
• Induction step: k>2
– By IH we can assume that the assertion is true for k-1 states, and we must prove it for k states
– So let M = (Q, Σ, δ, qs, qa) be a k-state GNFA and let M’ be as in the procedure CONVERT(M)
– By the Lemma on the next slide, L(M’)=L(M)
– M’ has k-1 states, so by IH CONVERT(M’) returns a regular expression describing L(M’), and L(M’) = L(M)
Ripping Lemma
• Lemma: L(M) = L(M’) where M’ is the result of ripping any internal state qrip out of M as described before
• Proof (⊆): Say w ∈ L(M), i.e. w=w1w2...wn, each wh is a string over Σ, and there’s a sequence of states r0...rn such that wh∈L(δ(rh-1, rh)) and r0=qs and rn=qa
• If none of the r’s is qrip, then there is also an accepting computation of w by M’, because each label of M’ is the union of the corresponding label of M with another expression
• If there is one appearance of qrip in the accepting computation of M on w, let qi be the last state visited before the visit to qrip and qj be the first state visited after the visit to qrip. w can be divided into segments a, b and c such that a is the part of w that takes M from the start state qs to qi, b is the part of w that takes M from qi to qrip and from qrip to qj, and c is the part that takes M from qj to qa.
• As argued above, the labels on the transitions of M used before reaching qi are contained in the corresponding labels of M’, and similarly from qj to qa. Hence M’ can read a from qs to qi and c from qj to qa. In M’ the regular expression (δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj)), which is part of δ’(qi,qj), can take M’ from state qi to state qj on b just as the two transitions of M do (using δ(qrip,qrip) 0 times).
• If there are multiple visits to qrip, group them into maximal runs of consecutive visits. Within a run each step from qrip to qrip uses the label δ(qrip,qrip), so the string read from the state qi before the run to the state qj after it is in L(δ(qi,qrip)(δ(qrip,qrip))*δ(qrip,qj)) and M’ can read it in one transition from qi to qj. The steps outside the runs do not involve qrip and can be duplicated by transitions in δ’ as above. (Other direction (⊇) left as exercise)
Recap RL = RE
Let R be a regular expression, then there exists an NFA M such that L(R) = L(M)
The language L(M) of a DFA M equals the language L(M’) of a GNFA M’, which can be converted state-by-state to an equivalent two-state GNFA M’’
The transition qstart ⎯R→ qaccept of M’’ satisfies L(R) = L(M’’)
Hence: RE ⊆ NFA = DFA ⊆ GNFA ⊆ RE
4-state GNFA to 3-state GNFA, ripping out q1
[Figure: the 4-state GNFA and the 3-state GNFA obtained by ripping out q1; labels not recoverable]
Regular languages and Non-regular
languages
• Every finite language is regular (Proof:
Exercise)
• There are some infinite languages that we
can prove are not regular
• The proofs are by contradiction, e.g. if the
language were regular that would contradict
the pumping lemma, the closure properties,
etc.
• Essential idea to prove the pumping lemma
is that if the language has a sufficiently large
string in it, the DFA to accept it must repeat
a state
Repeating DFA Paths
Consider a DFA M with size |Q|=p
On any string w of length p, p+1 states get visited. So for any accepted string w of length |w|≥p, there must be a j such that the computation of M on input w goes through states like: q1,…,qj,…,qj,…,qk, i.e. some state on the path must repeat
[Figure: path from q1 to qk with a loop at qj]
Repeating DFA Paths
The action of the DFA in qj with a given symbol is always the same. If we repeat (or ignore) the qj,…,qj part, the new path will again be an accepting path
[Figure: path from q1 to qk with a loop at qj]
Pumping Lemma (Thm 1.37)
For every regular language L, there is a pumping length p, such that any string w ∈ L with |w|≥p can be broken into three parts, w = xyz, such that:
1) x y^i z ∈ L for every i∈{0,1,2,…}
2) |y| ≥ 1
3) |xy| ≤ p
Note that 1) implies that xz ∈ L
2) says that y cannot be the empty string ε
Condition 3) is not always used, but it shows the repeated string is not too far from the beginning of the string. This can often help
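The repeated-state idea behind the lemma can be sketched in Python: run a DFA on w, stop at the first state that repeats, and cut w there. The function names pump_split and accepts and the example DFA (number of 1s divisible by 3, so p = 3) are assumptions made for this illustration.

```python
def pump_split(delta, start, w):
    """Split w into (x, y, z), where y is read around the first repeated state."""
    seen = {start: 0}   # state -> number of symbols read when first reached
    state = start
    for pos, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:
            i = seen[state]
            return w[:i], w[i:pos], w[pos:]
        seen[state] = pos
    return None         # no repeat: w is shorter than the number of states

def accepts(delta, start, finals, w):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

# Assumed example DFA: state i records (number of 1s read so far) mod 3.
delta = {}
for i in range(3):
    delta[(i, "0")] = i
    delta[(i, "1")] = (i + 1) % 3

x, y, z = pump_split(delta, 0, "111")
print(repr(x), repr(y), repr(z))
print(all(accepts(delta, 0, {0}, x + y * i + z) for i in range(5)))
```

Because the first repeat happens within the first p symbols, the split automatically satisfies |y| ≥ 1 and |xy| ≤ p.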
Use of Pumping Lemma
• To prove a language B is not regular:
– Assume B is regular (to get a proof by contradiction)
– If B is regular then the pumping lemma must apply to B
– So choose a sufficiently long string s in the language B.
– By PL s can be broken into parts xyz satisfying |y|≧1, |xy|≦p, and for any i, xy^iz is in B (i.e. s has to have the pumping property)
– Use the above fact to get a contradiction
• Choose a “nice” string s to make it easy to get contradiction
• Choose a value for i
• Prove that xy^iz is not in B
• But PL says xy^iz would have to be in B, if B were regular
– Contradiction
• Therefore B is not regular
Example : “ww”
Let F be { ww | w ∈{0,1}* } (Ex. 1.40)
Assume (for the sake of contradiction) that F is regular. So PL applies to F
Let p be the pumping length for F, and choose s = 0^p10^p1, so s is in F (s can be any string in F of length at least p)
Then for some x, y, z, s can be written s = xyz = 0^p10^p1 satisfying PL, and condition 3) tells us that |xy|≤p
Because of condition 3, there is only one possibility for y: y=0^k with k≥1
So if we pump y (taking i=2) we get xyyz = 0^(p+k)10^p1 ∉ F. The pumping lemma says it must be in F if F were regular, but it is not in F => PL doesn’t hold for F => F is not regular
(Without using property 3 this is a little more difficult)
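As a sanity check of this argument (not a proof, since it only tries small values of p), the following Python sketch enumerates every split s = xyz with |y| ≥ 1 and |xy| ≤ p for s = 0^p10^p1 and confirms that xyyz is never in F. The helper names in_F, splits and pumping_fails are illustrative assumptions.

```python
def in_F(s):
    """Membership in F = { ww | w in {0,1}* }."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]

def splits(s, p):
    """All ways to write s = xyz with |y| >= 1 and |xy| <= p."""
    for x_len in range(p):
        for xy_len in range(x_len + 1, p + 1):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]

def pumping_fails(p):
    """True if pumping with i = 2 leaves F for every allowed split."""
    s = "0" * p + "1" + "0" * p + "1"
    assert in_F(s)
    return all(not in_F(x + y * 2 + z) for x, y, z in splits(s, p))

print(all(pumping_fails(p) for p in range(1, 9)))
```

The real proof covers every p at once; the code only illustrates what "no split works" means for concrete values.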
Using closure properties of Regular
Languages with Pumping Lemma
Let C = { w | # of 0s in w equals # of 1s in w}
Problem: If xyz ∈ C with y ∈ C, then xy^iz ∈ C
Idea: If C is regular and F is regular, then we know the intersection set C∩F has to be regular as well
Solution: Assume as usual that C is regular
Take as the regular F = { 0^n1^m | n,m∈N}, then the intersection C∩F = { 0^n1^n | n∈N } would be regular too
But we already know that C∩F is not regular
Conclusion: C is not regular
Pumping Down: E = { 0^i1^j | i≥j }
Problem: ‘pumping up’ s=0^p1^p with y=0^k gives xyyz = 0^(p+k)1^p, xy^3z = 0^(p+2k)1^p, which are all in E (hence do not give contradictions)
Solution: pump down to xz = 0^(p-k)1^p.
Overall for s = xyz = 0^p1^p (with |xy|≤p): y=0^k with k≥1, hence xz = 0^(p-k)1^p ∉ E
Contradiction: E is not regular
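A small Python check of the pumping-down argument for specific values of p (again an illustration, not a proof). For every allowed split of s = 0^p1^p it tests that pumping up stays inside E while pumping down leaves it. The names in_E and check are assumptions made for this sketch.

```python
import re

def in_E(s):
    """Membership in E = { 0^i 1^j | i >= j }."""
    m = re.fullmatch(r"(0*)(1*)", s)
    return m is not None and len(m.group(1)) >= len(m.group(2))

def check(p):
    """Return (pumping up always stays in E, pumping down always leaves E)."""
    s = "0" * p + "1" * p
    up_ok, down_fails = True, True
    for x_len in range(p):
        for xy_len in range(x_len + 1, p + 1):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            up_ok = up_ok and in_E(x + y * 2 + z)        # i = 2
            down_fails = down_fails and not in_E(x + z)  # i = 0
    return up_ok, down_fails

print(check(4))  # (True, True)
```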
Pumping lemma review - steps to
prove some language L is not regular
• Assuming L is regular, you know there is a pumping length p for L (only an infinite language has strings that long)
• You choose any string s in L you like of that length or longer
• There have to exist x,y,z (satisfying the criteria) such that s=xyz
• You choose i in xy^iz, and prove that the resulting string cannot be in language L
• i can be 0 or any positive integer
Examples of PL use
Language | String | i | Problem with pumped string
{a^nb^n | n≧0} | a^pb^p | 2 | No matter where y falls, wrong form (or use property 3)
{ww} | 0^p10^p1 | 2 | Second part ends in 1, first part in 0
{0^m1^n | m>n} | 0^(p+1)1^p | 0 | Reducing number of 0’s means n≧m
{x | x has equal number of a’s and b’s} | (none) | (none) | No pumping needed, use closure under intersection and AnBn result
Mealy Machines (+Moore)
• On assignment you looked at Mealy machines,
aka Finite State Transducers, sequential circuits
• A DFA with output rather than just accept/reject
• Transitions have both input and output symbols
from finite alphabets separated by /
• Very widely used for controls, e.g. elevator,
vending machine, traffic light, alarm system,
simple codes, protocols, process control
• The input alphabet can be signals from sensors
and the output alphabet can be signals to
actuators
• Sold off the shelf to be programmed - PLC
Simple Mealy controllers
Vending Machine: Mealy machine to dispense candy after deposit of 3 nickels
[Figure: states S0 (Start), $0.05 and $0.10; transitions labelled nickel/, nickel/ and nickel/release lock]
Extended to allow dimes
[Figure: the same states and nickel transitions, plus transitions labelled dime/, dime/release lock and dime/release lock]
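A Mealy machine like the extended vending machine can be sketched in Python as a table from (state, input) to (next state, output). The arc endpoints below are an assumed reading of the diagram (for example, that a dime in state $0.10 releases the lock and returns to S0 without giving change), and the names VENDING and run_mealy are hypothetical.

```python
# (state, input) -> (next state, output); "" means no output on that step.
VENDING = {
    ("S0", "nickel"): ("$0.05", ""),
    ("S0", "dime"): ("$0.10", ""),
    ("$0.05", "nickel"): ("$0.10", ""),
    ("$0.05", "dime"): ("S0", "release lock"),
    ("$0.10", "nickel"): ("S0", "release lock"),
    ("$0.10", "dime"): ("S0", "release lock"),
}

def run_mealy(table, start, inputs):
    """Feed the inputs to the machine and collect one output per input."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = table[(state, symbol)]
        outputs.append(out)
    return state, outputs

print(run_mealy(VENDING, "S0", ["nickel", "nickel", "nickel"]))
print(run_mealy(VENDING, "S0", ["dime", "nickel"]))
```

Unlike a DFA there is no accept or reject here: the result of a run is the sequence of outputs, one per input symbol.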
Break
• When we come back:
• Start of Chapter 2, Context-Free Languages
• Review of test
Chapter 2: Context-free
languages
• Context-Free Languages (CFL)
• Context-Free Grammars (CFG)
derivations, parse trees, ambiguity
• Chomsky Normal Form of CFG
• RL ⊂ CFL
6/9/2014
CSE 2001, Fall 2013
Context-Free Languages
Context-free languages (CFLs) come from a more powerful (augmented) model than Finite Automata
CFLs allow us to describe non-regular languages like { 0^n1^n | n≥0}
General idea: CFLs are languages that can be recognized by finite automata that have one single stack added:
{ 0^n1^n | n≥0} is a CFL
{ 0^n1^n0^n | n≥0} is not a CFL
Context-Free Grammars
Grammars: new way to define/specify a language
Uses substitution rules aka productions that can be
repeatedly applied, starting from a start symbol
What simple process produces the non-regular language { 0^n1^n | n ∈ N }?
Start symbol S with rewriting rules:
1) S → 0S1
2) S → ɛ
S yields 0^n1^n for any n according to
S ⇒ 0S1 ⇒ 00S11 ⇒ … ⇒ 0^nS1^n ⇒ 0^n1^n
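The derivation on this slide can be replayed mechanically. The Python sketch below applies rule 1 n times and then rule 2, returning every intermediate string; the function name derive is an assumption for illustration.

```python
def derive(n):
    """Sentential forms of the derivation of 0^n 1^n from S."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "0S1"))  # rule 1: S -> 0S1
    forms.append(forms[-1].replace("S", ""))         # rule 2: S -> ɛ
    return forms

print(" => ".join(derive(3)))  # S => 0S1 => 00S11 => 000S111 => 000111
```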
Context-Free Grammars (Def.)
A context free grammar G=(V,Σ,R,S) is defined by
• V: a finite set of variables (or non-terminals)
• Σ: a finite set of terminal symbols (with V∩Σ=∅)
• R: a finite set of substitution rules V → (V∪Σ)*
• S: start symbol ∈ V (usually left side of topmost rule)
Example: ({S},{a,b}, R, S) where R consists of the two rules:
S → aSb
S → ɛ
(Can write this in one line using shorthand S → aSb | ɛ )
Derivation ⇒*
A single step derivation “⇒” consists of the
substitution of a variable by a string according
to one of the substitution rules
Example: using the rule “A→BB”, we can have
the derivation “01AB0 ⇒ 01BBB0”
A sequence of several derivations (or none)
is indicated by “ ⇒* ”
Same example: “0AA ⇒* 0BBBB”
• For the grammar with the rules S → 0S1 | ɛ there is a derivation of the string 000111 as follows: S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000111
Derivations, formally
• If u,v,w are strings of non-terminals and terminals and A → w is a rule in the grammar we say that uAv yields uwv, written uAv ⇒ uwv
• We say that u derives v, written u ⇒* v, if u=v or if there is a sequence u1,u2,...,uk such that u ⇒ u1 ⇒ u2 ⇒ ... ⇒ uk ⇒ v
The language of grammar G with start symbol S is denoted by L(G):
L(G) = { w ∈ Σ* | S ⇒* w }
Some Remarks
The language L(G) = { w∈Σ* | S ⇒* w }
contains only strings of terminals, not
variables.
Notation: we summarize several rules, like
A → B
A → 01
A → AA
by
A → B | 01 | AA
Unless stated otherwise: topmost rule concerns the start variable
Usually write variables in upper case, terminals in lower
Context-Free Grammars (Ex.)
Consider the CFG G=(V,Σ,R,S) with
V = {S, Z}
Σ = {0,1}
R: S → 0S1 | 0Z1
Z → 0Z | ε
Then L(G) = {0^i1^j | i≥j≥1 }
S yields a string of 0s and 1s 0^(j+k)1^j (j≥1, k≥0) according to:
S ⇒ 0S1 ⇒ … ⇒ 0^(j-1)S1^(j-1) ⇒ 0^jZ1^j ⇒ 0^j0Z1^j ⇒ … ⇒ 0^(j+k)Z1^j ⇒ 0^(j+k)ε1^j = 0^(j+k)1^j
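One way to get a feel for L(G) is to enumerate its short strings. The Python sketch below expands the leftmost variable breadth-first and prunes any form with more than max_len terminals. The names RULES and generate are illustrative, and termination is only argued for this grammar, where each rule either adds a terminal or removes the last variable. Note that every string it produces contains at least one 1, because both S rules add a 1.

```python
from collections import deque

RULES = {"S": ["0S1", "0Z1"], "Z": ["0Z", ""]}

def generate(rules, start, max_len):
    """All terminal strings with at most max_len symbols derivable from start."""
    found, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        idx = next((i for i, c in enumerate(form) if c in rules), None)
        if idx is None:            # no variables left: a string of L(G)
            found.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(c not in rules for c in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(generate(RULES, "S", 4), key=lambda w: (len(w), w)))
```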
Importance of CFL
Model for natural languages (Noam Chomsky)
Specification of programming languages:
“parsing of a computer program”
Describes mathematical structures
Intermediate between regular languages and
computable languages (Chapters 3,4,5 and 6)
Some closure properties of the
context-free languages
• It is easy to see CFLs are closed under
union, concatenation and star
Example: Boolean Expressions
Consider the CFG G=(V,Σ,R,S) with
V = {S}
Σ = {0,1,(,),¬,∨,∧}
R: S → 0 | 1 | (¬S) | (S∨S) | (S∧S)
Some elements of L(G):
0
(((¬0)∨1)∧(1∧1))
(1∨(0∧0))
Note: Parentheses prevent “1∨0∧0” confusion.
Consider S → 0 | 1 | S∨S | S∧S
Human Languages
Variables enclosed in angle brackets
terminals - strings of English words
Rules:
<SENTENCE> → <NOUN-PHRASE><VERB-PHRASE>
<NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE>
<VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE>
<CMPLX-NOUN> → <ARTICLE><NOUN>
<CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE> …
<ARTICLE> → a | the
<NOUN> → boy | girl | house
<VERB> → sees | ignores
Possible element: the boy sees the girl
Parse trees and leftmost
derivations
• There is another method to show the derivation of a string pictorially called a parse tree
[Figure: parse tree for 0011 with three nested S nodes and leaves 0 0 ɛ 1 1]
• If a derivation always rewrites the leftmost variable in the string, the derivation is called a leftmost derivation. There is a one-to-one correspondence between leftmost derivations and parse trees
Parse Trees
The parse tree of (0)∨((0)∧(1)) via rules
S → 0 | 1 | ¬(S) | (S)∨(S) | (S)∧(S):
[Figure: parse tree with root S expanding to (S)∨(S); the left S derives 0 and the right S expands to (S)∧(S), whose two S nodes derive 0 and 1]
Ambiguity
A grammar is ambiguous if some strings are
derived ambiguously.
A string is derived ambiguously if it has more
than one leftmost derivation.
Typical example: rule S → 0 | 1 | S+S | S×S
S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
versus
S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
Ambiguity and Parse Trees
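The ambiguity of 0×1+1 is shown by the two different parse trees:
[Figure: two parse trees for 0×1+1. In one the root is S+S, with the left S expanding to S×S (0×1) and the right S deriving 1; in the other the root is S×S, with the left S deriving 0 and the right S expanding to S+S (1+1)]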
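Ambiguity can be checked by counting parse trees. The Python sketch below counts the parse trees of a string under the rule S → 0 | 1 | S+S | S×S by trying every operator as the root of the tree; a count above 1 means the string is derived ambiguously. The function name count_parse_trees is an assumption for illustration.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Number of parse trees of s for S -> 0 | 1 | S+S | S×S."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Parse trees deriving the substring s[i:j] from S.
        total = 1 if j - i == 1 and s[i] in "01" else 0
        for k in range(i + 1, j - 1):   # operator position, both sides non-empty
            if s[k] in "+×":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_parse_trees("0×1+1"))  # 2
print(count_parse_trees("0+1"))    # 1
```

Since parse trees correspond one-to-one with leftmost derivations, the same number counts leftmost derivations.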
More on Ambiguity
The two different derivations:
S ⇒ S+S ⇒ 0+S ⇒ 0+1
and
S ⇒ S+S ⇒ S+1 ⇒ 0+1
do not make the string 0+1 ambiguous
(They are not both leftmost derivations and they have the same parse tree)
However the above grammar does produce ambiguous strings. In this case there are other grammars for the same language that are not ambiguous
Languages that can only be generated by ambiguous grammars are “inherently ambiguous”
Inherent Ambiguity
• Some context-free languages are inherently ambiguous, for example:
{a^ib^jc^k | i=j or j=k}
The grammar for simple arithmetic expressions given at the bottom of page 105 is ambiguous, but the language it describes is not inherently ambiguous - See example 2.4 and the note at the top of page 104
Context-Free Languages
Any language that can be generated by a context free grammar is a context-free language (CFL).
The CFL { 0^n1^n | n≥0 } shows us that certain CFLs are nonregular languages.
Q1: Are all regular languages context free?
Q2: Which languages are outside the class CFL?
“Chomsky Normal Form”
A context-free grammar G = (V,Σ,R,S) is in Chomsky normal form if every rule is of the form
A → BC
or
A → x
with variables A∈V and B,C∈V \{S}, and x∈Σ
For the start variable S we also allow the rule S → ε (but the start symbol may not appear on the right hand side of any rule)
Advantage: Grammars in this form are far easier to analyze.
Theorem 2.9
Every context-free language can be described
by a grammar in Chomsky normal form.
Outline of Proof:
We can rewrite any CFG into equivalent Chomsky
normal form.
We do this by replacing, one-by-one, every rule
that is not ‘Chomsky’.
We have to take care of: Starting Symbol,
ε symbol, all other violating rules.
Proof of Theorem 2.9
Given a context-free grammar G = (V,Σ,R,S),
rewrite it to Chomsky Normal Form by
1) Add a new start symbol S0 (and add rule S0→S)
2) Remove A→ε rules (from the tail):
before: B→xAy and A→ε, after: B→ xAy | xy
3) Remove unit rules A→B (by the head): “A→B” and “B→xCy” becomes “A→xCy” and “B→xCy”
4) Shorten all rules to two: before: “A→B_1B_2…B_k”, after: A→B_1A_1, A_1→B_2A_2, …, A_(k-2)→B_(k-1)B_k
5) Replace ill-placed terminals “a” by Ta with Ta→a
Careful Removing of Rules
Do not reintroduce rules that you removed earlier.
Example: A→A simply disappears
When removing A→ε rules, insert all new replacements:
B→AaA becomes B→ AaA | aA | Aa | a
Example of Chomsky NF
Initial grammar: S→ aSb | ε
In Chomsky normal form:
S0 → ε | TaTb | TaX
X → STb
S → TaTb | TaX
Ta → a
Tb → b
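One reason Chomsky normal form is easier to analyze is that membership can be decided by a simple table-filling procedure. The sketch below uses the standard CYK algorithm, which is not covered on these slides, on the normal-form grammar above; the names UNIT, PAIR and cyk are assumptions made for this illustration.

```python
UNIT = {"a": {"Ta"}, "b": {"Tb"}}        # rules A -> x
PAIR = {("Ta", "Tb"): {"S0", "S"},       # rules A -> BC
        ("Ta", "X"): {"S0", "S"},
        ("S", "Tb"): {"X"}}

def cyk(w, start="S0", start_derives_empty=True):
    """Decide whether the grammar above generates w."""
    n = len(w)
    if n == 0:
        return start_derives_empty       # the rule S0 -> ε
    # table[i][l] = variables that derive the substring w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(UNIT.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for b in table[i][split]:
                    for c in table[i + split][length - split]:
                        table[i][length] |= PAIR.get((b, c), set())
    return start in table[0][n]

print(cyk("aabb"), cyk("aab"))  # True False
```

On short strings this agrees with the original grammar S → aSb | ε, which generates exactly the strings a^nb^n.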
RL ⊆ CFL
Every regular language can be expressed by
a context-free grammar.
Proof Idea:
Given a DFA M = (Q,Σ,δ,q0,F), we construct a
corresponding CF grammar GM = (V,Σ,R,S)
with V = Q and S = q0
Rules of GM:
qi → x δ(qi,x) for all qi∈V and all x∈Σ
qi → ε for all qi∈F
Example RL ⊆ CFL
The DFA
[Figure: DFA with start state q1, accepting state q2 and state q3; transitions q1 ⎯0→ q1, q1 ⎯1→ q2, q2 ⎯1→ q2, q2 ⎯0→ q3, q3 ⎯0,1→ q2]
leads to the context-free grammar GM = (Q,Σ,R,q1) with the rules
q1 → 0q1 | 1q2
q2 → 0q3 | 1q2 | ε
q3 → 0q2 | 1q2
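The DFA-to-grammar construction is mechanical enough to write down directly. This Python sketch builds the rules qi → x δ(qi,x) and qi → ε for the example DFA; the function name dfa_to_cfg and the dictionary encoding of δ are assumptions for illustration.

```python
def dfa_to_cfg(states, alphabet, delta, accepts):
    """Return {variable: [right-hand sides]} for the grammar GM of a DFA."""
    rules = {q: [] for q in states}
    for q in states:
        for x in alphabet:
            rules[q].append(x + delta[(q, x)])  # qi -> x δ(qi, x)
        if q in accepts:
            rules[q].append("ε")                # qi -> ε for accepting qi
    return rules

delta = {("q1", "0"): "q1", ("q1", "1"): "q2",
         ("q2", "0"): "q3", ("q2", "1"): "q2",
         ("q3", "0"): "q2", ("q3", "1"): "q2"}
grammar = dfa_to_cfg(["q1", "q2", "q3"], ["0", "1"], delta, {"q2"})
for variable, bodies in grammar.items():
    print(variable, "→", " | ".join(bodies))
```

Each derivation step of the grammar mirrors one step of the DFA: the single variable in the sentential form is the current state.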
Picture Thus Far
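[Figure: nested regions. The regular languages lie inside the context-free languages; { 0^n1^n } is context-free but not regular; the region outside the context-free languages is marked ??]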
Summary
• Every Regular Language can be represented by a regular
expression
• There are languages that aren’t regular
– AnBn isn’t regular
– “ww” isn’t regular
– Pumping Lemma holds for all regular languages
– Using pumping lemma (and closure properties of Regular Languages)
you can prove some languages aren’t regular
• Context-free languages and grammars (all of 2.1 and some of
2.3)
– Productions, derivations, parse trees, closure properties
– Ambiguity
– Chomsky Normal form
• Before next time go over Chapter 2.1, skim 2.3, and especially Chapter 3.1 - there will be a lot to absorb in one session
Test 1
• Average was 13.4
• Your mark times 4 gives you your grade on
York’s grade scale, e.g. pass is 12.5
• Colleagues thought the test was too easy
• Next time - we will have longer time, lots more
material, some parts will be similar style but also
adding some “think” or “apply” questions.
• It will cover everything from start of term up to
the week before, so you have to really get going
learning Chapter 2.1 and 3, as well as make up
anything missing from the first two chapters
• Do the exercises, go to office hours and tutorials,
read the book, don’t get behind
Test Comments
• Don’t mark up your test now-keep separate
notes. You could get confused about what your
original answer was
• There were some minor variations between the two versions, so if a question seems out of order, has a’s and b’s instead of 0’s and 1’s, or asks for a different definition, you’ll have to adjust
• Check overall addition, etc.
• To request a re-grade of a question, write down
your reason (along with a statement that you
have not modified your test booklet) and hand in
to me next week
• I have already regraded question 8 extensively
Questions: Version A, page 2
1. The formal definition of acceptance: see
Sipser items 1,2,3 page 40 (2nd edition) (This
is important to know, analyzing computation as
a sequence of steps state by state is central)
1. r0=q0
2. δ(ri,wi+1)=ri+1 for 0≦ i ≦ n-1
3. rn∈F
2. Definition of concatenation or star of A and B:
A∘B={xy | x∈A and y∈B}. Definition of * of A:
A*={ɛ} ∪ A ∪ A∘A ∪ A∘A∘A ∪ ... (no end)
3. i.e. they have an even number of ones:
(0*10*10*)*
Test 1,page 3 and 4
4. ba: {q0,q1,q2,q3} ab: {q2,q1} abb: {q2,q3} abba: {q1,q0}
5. No, consider the string “a” - it would be accepted by both M and N
6. {w | w is an even length string of a’s and b’s ending in aa} (I assume b doesn’t need to appear) Two easy ways:
1. draw 2 NFA’s, one for even length strings, one for strings ending in “aa”, so their intersection is regular
2. regular expression ((aUb)(aUb))*aa
Test 1 page 5
7. (aUb)c: 2 points for each part. If you did not use something like the construction asked for, but it’s correct otherwise, you get .5/2. There were many incorrect answers for the union, you have to use ɛ transitions from all former accept states
8. Badly done. Ambiguity over “even multiple”, it would have been better for me to say “evenly divisible” not a multiple that is even, and the first meaning is the language that the DFA recognizes. I re-marked it twice. Induction is a basic tool. I was looking for something like:
Induction Hypothesis: For any i≧0, a string with i a’s and any number of b’s brings M to state q_(i mod 3)
Alternate (not as good): For any n≧0, any string containing 3n a’s (or 6n) and any number of b’s takes M to the (accepting) state q0, and any string containing 3n+1 a’s takes M to (nonaccepting) state q1, and any string containing 3n+2 a’s takes M to (nonaccepting) state q2...
Exercises for week 5
• Chapter 1:
• Try Sipser, Exercises, 1.21a or b, 1.29b, 1.30
• Problems: 1.31, 1.34, 1.40b?, 1.43, 1.46c or d
• Chapter 2: 2.4 e, 2.6b,2.8, 2.14, 2.16
• Others: 2.15, 2.17, 2.9, 2.14, 2.26
• A pretty hard grammar to write: 2.22