Prof. Carla Gomes gomes@cs.cornell.edu
Module
Modeling Computation:
Language Recognition
Rosen, Chapter 12.4
Definition: A regular set is a set that can be generated starting from the empty set ∅, the empty string λ, and single elements from the alphabet, using concatenations, unions, and Kleene closures in arbitrary order.
We will give a more precise definition after we define a regular expression.
Definition: The regular expressions over a set I are defined recursively by:
– ∅ (the empty set) is a regular expression,
– λ (the set containing the empty string) is a regular expression,
– x is a regular expression for all x ∈ I,
– (AB), (A ∪ B), and A* are regular expressions if A and B are regular expressions.
Definition: A regular set is a set represented by a regular expression.
Examples: 001*, 1(0 ∪ (0 ∪ 1)*11), and AB*C are regular expressions.
The regular set defined by the regular expression 01* is the set of strings starting with a 0 followed by 0 or more 1s.
The regular set defined by (10)* is the set of strings consisting of 0 or more copies of 10.
The regular set defined by 0(0 ∪ 1)*1 is the set of all binary strings beginning with 0 and ending with 1.
The regular set defined by (0 ∪ 1)1(0 ∪ 1) is the set of strings {010, 011, 110, 111}.
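The small examples above can be checked by brute force. The following is a minimal sketch, assuming the slide notation is translated into Python's re syntax (the union symbol ∪ becomes |). The helper name regular_set and the length bound are illustrative choices, not part of the lecture.

```python
import re
from itertools import product

def regular_set(pattern, alphabet="01", max_len=3):
    """Return every string over `alphabet` of length <= max_len
    that the whole pattern matches (hypothetical helper)."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }

print(sorted(regular_set("(0|1)1(0|1)")))       # ['010', '011', '110', '111']
print(sorted(regular_set("01*")))               # ['0', '01', '011']
print(sorted(regular_set("(10)*", max_len=4)))  # ['', '10', '1010']
```

Because a regular set is usually infinite, the bound max_len only shows a finite slice of it.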
What are the strings represented by
10*
A 1 followed by any number of 0s (including no 0s)
(10)*
Any number of copies of 10 (including the null string)
0 ∪ 01
The string 0 or the string 01
0(0 ∪ 1)*
Any string beginning with 0
(0*1)*
Any string not ending with a 0 (including the null string)
The set of bit strings with even length
(00 ∪ 01 ∪ 10 ∪ 11)*
The set of bit strings ending with a 0 and not containing 11
Concatenations of 0 or 10; not the null string
(0 ∪ 10)*(0 ∪ 10)
The set of bit strings containing an odd number of 0s
At least one 0
Zero or more 1s, followed by a 0, followed by zero or more 1s, followed by any number of blocks that each contain exactly two 0s
1*01*(01*01*)*
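One way to gain confidence in answers like these is to compare each expression with a direct test of the property on every short bit string. The sketch below assumes Python's re syntax (| in place of ∪). The names bit_strings, agrees, and CHECKS are hypothetical, and the length bound of 8 is arbitrary, so this is a sanity check rather than a proof.

```python
import re
from itertools import product

def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

def agrees(pattern, predicate, max_len=8):
    """True if the pattern and the predicate accept the same short strings."""
    compiled = re.compile(pattern)
    return all(
        bool(compiled.fullmatch(s)) == predicate(s)
        for s in bit_strings(max_len)
    )

# Each entry pairs an expression with the property it is meant to describe.
CHECKS = {
    "not ending with 0": ("(0*1)*", lambda s: not s.endswith("0")),
    "even length": ("(00|01|10|11)*", lambda s: len(s) % 2 == 0),
    "ends with 0, no 11": ("(0|10)*(0|10)",
                           lambda s: s.endswith("0") and "11" not in s),
    "odd number of 0s": ("1*01*(01*01*)*", lambda s: s.count("0") % 2 == 1),
}

for name, (pattern, predicate) in CHECKS.items():
    print(name, agrees(pattern, predicate))  # True for each entry
```

Dropping the final (0|10) from the third expression makes the check fail, because the null string would then be matched.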
Regular expressions are actually used quite often in computer science.
For instance, if you are editing a file with vi and want to see whether it contains the string blah followed by a number, followed by any character, followed by the letter Q, you can use the regular expression blah[0-9][0-9]*.Q
This works because vi uses regular expressions for searching.
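The same pattern can be tried outside vi. This is a small sketch using Python's re module, on the assumption that the constructs used here ([0-9], *, and .) behave the same way in both tools. The sample lines are made up for illustration.

```python
import re

# [0-9][0-9]* is "one or more digits"; "." is any single character.
pattern = re.compile(r"blah[0-9][0-9]*.Q")

samples = ["xx blah42?Q yy", "blah7 Q", "blah12Q", "blah1Q", "blahQ"]
for line in samples:
    print(repr(line), bool(pattern.search(line)))
```

Note that blah12Q matches, since the "any character" position can itself be a digit, while blah1Q does not, because nothing is left to fill that position before the Q.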
Regular Expression    Regular Grammar
a*                    S → λ | aS
(a+b)*                S → λ | aS | bS
a* + b*               S → λ | A | B,  A → a | aA,  B → b | bB
a*b                   S → b | aS
ba*                   S → bA,  A → λ | aA
(ab)*                 S → λ | abS
(In this table, + denotes union.)
Consider the language {a^m b^n | m, n ∈ N}, which is represented by the regular expression a*b*.
A regular grammar for this language can be written as follows:
S → λ | aS | B
B → b | bB.
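To see that the grammar S → λ | aS | B, B → b | bB and the expression a*b* describe the same strings, one can enumerate short derivations and compare. The sketch below is only illustrative: the dictionary encoding of the productions, the function name generate, and the length bound are assumptions, and "" stands in for λ.

```python
import re
from collections import deque
from itertools import product

# Productions of the grammar; "" stands for the empty string λ.
GRAMMAR = {
    "S": ["", "aS", "B"],
    "B": ["b", "bB"],
}

def generate(grammar, start="S", max_len=3):
    """Enumerate terminal strings of length <= max_len derivable from start."""
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any remains.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

from_grammar = generate(GRAMMAR)
from_regex = {
    "".join(chars)
    for n in range(4)
    for chars in product("ab", repeat=n)
    if re.fullmatch("a*b*", "".join(chars))
}
print(from_grammar == from_regex)  # True
```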
• Consider the set
A = {binary strings which start with 0 and end with 1}
We saw that A is recognized by a finite-state automaton.
A is generated by the grammar with V = {S, A, 0, 1},
T = {0, 1}, and P = {S → 0A, A → 0A, A → 1A, A → 1}
We also saw that A is defined by the regular expression
0(0 ∪ 1)*1
• This is no coincidence, as we will see next.
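As a concrete illustration, here is one possible deterministic automaton for the set A, simulated in Python and compared with the expression 0(0 ∪ 1)*1 on all short bit strings. The state names and the transition table are assumptions made for this sketch; they are not necessarily the automaton shown earlier in the course.

```python
import re
from itertools import product

# Hypothetical states: q0 = start, q1 = began with 0 and last symbol was 0,
# q2 = began with 0 and last symbol was 1 (accepting), dead = began with 1.
START, ACCEPT = "q0", {"q2"}
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "dead",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q2",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}

def accepts(string):
    """Run the automaton on the string and report whether it ends in ACCEPT."""
    state = START
    for symbol in string:
        state = DELTA[(state, symbol)]
    return state in ACCEPT

pattern = re.compile("0(0|1)*1")  # the union symbol written as |
mismatches = [
    s
    for n in range(9)
    for s in map("".join, product("01", repeat=n))
    if accepts(s) != bool(pattern.fullmatch(s))
]
print(mismatches)  # []
```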
[Diagram: regular expressions and finite automata both describe the regular languages; each can describe the other.]
Kleene’s Theorem:
For every regular expression, there is a deterministic finite-state automaton that defines the same language, and vice versa.
• Theorem: Let L be a language. The following three statements are equivalent:
– L is a regular set (that is, L is represented by a regular expression)
– L is a regular language (that is, L is generated by a regular grammar)
– L is recognized by a finite-state automaton
• Put another way, L is a regular set if and only if L is a regular language if and only if L is recognized by a finite-state automaton.
• In other words, regular sets, regular languages, and languages recognized by finite-state automata are all the same thing.
• Example: What language does the following finite-state automaton recognize?
• If we start by going to state S1, we can recognize 000, 0110, 011100, 0111110,
011111100, 00100, 0010100, 01110110, 01110100, …
• It is not easy to see the pattern right away, but notice that they
– Start with 0
– Can then have any number of instances of 111 or 01 interleaved
– Can then have either 00 or 110
– Can end with any number of 1s
• These are all of the form 0(111 ∪ 01)*(00 ∪ 110)1*
• But we can also start by going to S6
• If we start by going to S6 (that is, start with 1), we notice that the strings
– Then have any number of occurrences of 01
– Then have a 1
– End with as many 0s as we want
• These are of the form
1(01)*10*
• Thus, we can recognize
(0(111 ∪ 01)*(00 ∪ 110)1*) ∪ (1(01)*10*)
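The combined expression can be tried against the sample strings listed above. The sketch below assumes Python's re syntax (| for ∪). It checks strings against the expression only, not against the automaton itself, and the three extra strings for the S6 branch are made up for illustration.

```python
import re

# The two branches, with the union symbol written as |.
FROM_S1 = "0(111|01)*(00|110)1*"
FROM_S6 = "1(01)*10*"
COMBINED = re.compile(f"({FROM_S1})|({FROM_S6})")

# Strings listed in the example for the S1 branch.
listed = ["000", "0110", "011100", "0111110", "011111100",
          "00100", "0010100", "01110110", "01110100"]
# Extra strings for the S6 branch (illustrative).
extra = ["11", "110", "10110"]

for s in listed + extra:
    print(s, bool(COMBINED.fullmatch(s)))
```

Run as written, every string is matched except 01110110, so that sample string (or the expression) is worth rechecking against the automaton.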
• Problem: Find a finite-state automaton that recognizes the following language
L = {0^n 1^n | n = 0, 1, 2, …}
• Solution: It cannot be done.
• Proof: Take an advanced course.
• Can you describe L with a regular expression?
• Can you give a regular grammar that generates L?
• Can you give any grammar that generates L?
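For the last question, one candidate answer is the grammar S → 0S1 | λ, which is not a regular grammar because the nonterminal sits between two terminals. The sketch below is an illustration under that assumption; the function names derive and in_L are hypothetical. It applies the productions n times and checks the result against a direct membership test.

```python
def derive(n):
    """Apply S -> 0S1 n times, then S -> λ (written here as "")."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "0S1")
    return form.replace("S", "")

def in_L(string):
    """True if the string is n 0s followed by n 1s for some n >= 0."""
    n = len(string) // 2
    return string == "0" * n + "1" * n

print([derive(n) for n in range(4)])         # ['', '01', '0011', '000111']
print(all(in_L(derive(n)) for n in range(10)))  # True
print(in_L("0101"), in_L("001"))             # False False
```

The membership test has to compare the number of 0s with the number of 1s, which hints at why a machine with only finitely many states runs into trouble here.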
Machine                      Language class
DFA                          Regular languages
Pushdown automata            Context-free
Bounded Turing machines      Context-sensitive
Turing machines              Phrase-structure
• Hopefully it is clear that although finite-state machines and finite-state automata are useful models of computation, they have serious limitations.
• Are there more powerful ways to model computation?
• The answer is: Yes.
• Some more powerful models include
– Pushdown automata
– Linear bounded automata
– Turing machines
– Quantum computation models