Formal Language Theory…what does this mean

* Lesson 1
Formal Languages
Formal Language Theory … what does this mean?
We are all familiar with natural languages
e.g. English, French, Chinese
The building blocks in English, French and many other languages are letters.
In Chinese???
We build words from these letters…
We all speak English in this class.
Tu ne pouvais pas choisir un autre jour?
In Formal Language Theory – we begin with an alphabet
Alphabet:
A finite set of symbols (usually nonempty)
e.g.
A1 = {a,b, …,z}
English Alphabet
A2 = {0,1}
Binary Alphabet
String: ( or word )
A sequence of symbols from some alphabet
Concatenation of strings:
Let α, β be strings; then α ∘ β = αβ
e.g.
α = bat, β = man
α ∘ β = bat ∘ man = batman
Concatenation is a binary operation from S × S → S … much as addition of integers is a
binary operation from Z × Z → Z ; Z = {…,-2,-1,0,1,2,…}
e.g.
2+3=5
note: subtraction of natural numbers is not a binary operation
e.g.
N = {0,1,2,3,…}
2 – 6 = -4,
-4 ∉ N (i.e. we do not have closure)
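A small sketch of the closure idea, assuming we only test a finite sample of N (the helper name is_natural is ours, not part of the notes). A finite sample can expose a counterexample but can never prove closure:

```python
def is_natural(x: int) -> bool:
    """Membership test for N = {0, 1, 2, ...}."""
    return x >= 0

sample = range(0, 7)

# Addition stays inside N for every pair in the sample.
add_closed = all(is_natural(a + b) for a in sample for b in sample)

# Subtraction leaves N whenever a < b, e.g. 2 - 6 = -4.
sub_counterexamples = [(a, b) for a in sample for b in sample
                       if not is_natural(a - b)]

print(add_closed)                     # True
print((2, 6) in sub_counterexamples)  # True
```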
Questions:
Is multiplication of integers a binary operation?
What about division?
Is division of rational numbers a binary operation?
Q = {…, -1/2, -1/4, 0, 1/3, 3/4, …}
Q = {(p,q) | p,q ∈ Z, with q ≠ 0}
Length of a string
length(α) or l(α) or |α|
the number of symbols that α contains
length(01) = 2
l(bat) = 3
|man| = 3
Observe that |α ∘ β| = |α| + |β|
e.g.
bat ∘ man = batman
l(bat) + l(man) = l(batman)
3 + 3 = 6
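As a quick illustration, here is the same example with Python's built-in str type standing in for strings over an alphabet (our choice of representation, not something the notes prescribe). The + operator plays the role of concatenation and len() the role of length:

```python
alpha = "bat"
beta = "man"
gamma = alpha + beta   # concatenation of alpha and beta

print(gamma)                              # batman
print(len(alpha), len(beta), len(gamma))  # 3 3 6
print(len(gamma) == len(alpha) + len(beta))  # True
```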
Empty string: ε
the unique string of length zero
i.e. |ε| = 0
note:
ε ∘ α = α ∘ ε = α, for all α
e.g. bat ∘ ε = ε ∘ bat = bat
We say that ε is the identity for concatenation
And naturally ε ∘ ε = ε
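A minimal check of the identity property, assuming Python's "" models the empty string (is_identity is a hypothetical helper name). It only tests the sample words given, so it illustrates the law rather than proving it:

```python
epsilon = ""  # the empty string

def is_identity(e: str, samples: list[str]) -> bool:
    """True if e behaves as a two-sided identity on every sample word."""
    return all(e + s == s and s + e == s for s in samples)

samples = ["bat", "0101", ""]
print(len(epsilon))                   # 0
print(is_identity(epsilon, samples))  # True
print(is_identity("a", samples))      # False
```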
What is the identity element for <N,+>?
for <Z,×>?
for multiplication of 2 × 2 matrices with real coefficients?
the reals R = {…, -π, 147, 0, .4, 1, √2, …}
Substring – If the string α = uvw, then v is a substring of α
Prefix – A prefix of a string is a substring that occurs “at the beginning” of that string
α = uvw, u is a prefix of α
Suffix – A suffix of a string is a substring that occurs “at the end” of that string
α = uvw, w is a suffix of α
example:
Consider the string α = 0101 over Σ = {0,1}
The prefixes of α include: 0, 01, 010
We note that every string is a prefix of itself and ε is a prefix of every string
(including itself)
Hence the complete set of prefixes of 0101 is {ε, 0, 01, 010, 0101}
The first four prefixes (ε, 0, 01, 010) are said to be proper prefixes
The suffixes of α = 0101 are {ε, 1, 01, 101, 0101}
The first four suffixes (ε, 1, 01, 101) are said to be proper suffixes
We now list the subwords (or substrings) of α = 0101:
ε, 0, 01, 010, 0101 //every prefix is a subword
1, 101 //as is every suffix
Are there any others?
i.e. Does α contain any subwords that are neither prefixes nor suffixes?
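One way to explore this question is to enumerate everything by brute force. The sketch below uses Python slices; the function names prefixes, suffixes and subwords are ours. The last line prints whatever subwords remain once prefixes and suffixes are removed, so you can compare it with your own answer:

```python
def prefixes(s: str) -> list[str]:
    """All prefixes of s, shortest first (includes '' and s itself)."""
    return [s[:i] for i in range(len(s) + 1)]

def suffixes(s: str) -> list[str]:
    """All suffixes of s, shortest first (includes '' and s itself)."""
    return [s[i:] for i in range(len(s), -1, -1)]

def subwords(s: str) -> set[str]:
    """All distinct substrings s[i:j] with 0 <= i <= j <= len(s)."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}

alpha = "0101"
print(prefixes(alpha))  # ['', '0', '01', '010', '0101']
print(suffixes(alpha))  # ['', '1', '01', '101', '0101']

others = subwords(alpha) - set(prefixes(alpha)) - set(suffixes(alpha))
print(sorted(others))
```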
We observed in these examples that a string of length four has five prefixes
More generally if |α| = n, then α has n+1 prefixes
The proof of this employs ???
A similar proof would verify that when |α| = n, n+1 suffixes exist
Why is deriving a similar result for the number of subwords more difficult?
What is the most general result you can provide here?
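To get a feel for why subwords are harder to count, the sketch below (count_subwords is a hypothetical helper) counts the distinct subwords of three words of the same length. Unlike the prefix count, the result depends on which symbols repeat, not just on the length:

```python
def count_subwords(s: str) -> int:
    """Number of distinct substrings of s, counting the empty string once."""
    return len({s[i:j] for i in range(len(s) + 1)
                for j in range(i, len(s) + 1)})

for word in ["0000", "0101", "abcd"]:
    print(word, count_subwords(word))
# 0000 5
# 0101 8
# abcd 11
```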
more definitions…
Concatenation of a set of strings
Let A = {0,1}, B = {a,b},
then A ∘ B = {xy | x ∈ A and y ∈ B}
i.e. A ∘ B = {0a,0b,1a,1b}
Observe if |A| = m and |B| = n, then |A ∘ B| = m × n (for sets of single symbols such as these; for arbitrary sets of strings we only have |A ∘ B| ≤ m × n)
//Proof by ?
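A short sketch of set concatenation with Python sets of strings (concat_sets is our name for the operation). The last example is an extra observation, not from the notes: when the sets contain words of different lengths, two different pairs can produce the same word, so the product count can overshoot:

```python
def concat_sets(A: set[str], B: set[str]) -> set[str]:
    """Every word xy with x taken from A and y taken from B."""
    return {x + y for x in A for y in B}

A = {"0", "1"}
B = {"a", "b"}
AB = concat_sets(A, B)
print(sorted(AB))                  # ['0a', '0b', '1a', '1b']
print(len(AB) == len(A) * len(B))  # True

# With words of mixed length, a + aa and aa + a collide.
C = {"a", "aa"}
print(sorted(concat_sets(C, C)))   # ['aa', 'aaa', 'aaaa']
```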
Note
A^2 is merely A ∘ A = {0,1} ∘ {0,1} = {00,01,10,11}
all strings of length two formed from A
A^3 = A ∘ A ∘ A = {0,1} ∘ {0,1} ∘ {0,1} = {000,001,010,011,100,101,110,111}
all strings of length three formed over A
A^0 = ?
…well, the set of all strings of length zero over A.
Hence, A^0 = {ε}
careful...|A^0| = 1, not zero
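The powers of an alphabet can be sketched with itertools.product from the standard library (the function name power is ours). Note what happens for k = 0: there is exactly one way to pick zero symbols, and it yields the empty string:

```python
from itertools import product

def power(A: set[str], k: int) -> set[str]:
    """All strings of length k over the alphabet A."""
    return {"".join(t) for t in product(sorted(A), repeat=k)}

A = {"0", "1"}
print(sorted(power(A, 2)))  # ['00', '01', '10', '11']
print(len(power(A, 3)))     # 8
print(power(A, 0))          # {''}
print(len(power(A, 0)))     # 1
```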
Kleene closure of an alphabet (A*) – the set of all strings that one can form using the
symbols from the alphabet
more formally…
\[ A^* = A^0 \cup A^1 \cup A^2 \cup \cdots = \bigcup_{i \ge 0} A^i = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, \dots, 111, 0000, \dots, 1111, \dots\} \]
Positive closure of an alphabet (A+) – the set of all strings of length one or more formed
from the symbols of some alphabet.
\[ A^+ = \bigcup_{i > 0} A^i = A^* \setminus A^0 = \{0, 1, 00, 01, 10, 11, 000, \dots, 111, 0000, \dots, 1111, \dots\} \]
i.e. all strings of non-zero length
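Both closures are infinite, so a program can only enumerate them lazily. The following is a sketch using Python generators; kleene_star and kleene_plus are our names, and the shortest-first ordering is one convenient choice among many:

```python
from itertools import count, islice, product

def kleene_star(alphabet):
    """Yield every string over the alphabet, shortest first."""
    symbols = sorted(alphabet)
    for length in count(0):
        for t in product(symbols, repeat=length):
            yield "".join(t)

def kleene_plus(alphabet):
    """Yield every string of length one or more over the alphabet."""
    return (w for w in kleene_star(alphabet) if w != "")

A = {"0", "1"}
print(list(islice(kleene_star(A), 7)))  # ['', '0', '1', '00', '01', '10', '11']
print(list(islice(kleene_plus(A), 6)))  # ['0', '1', '00', '01', '10', '11']
```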
Language L over some alphabet Σ
L ⊆ Σ*
i.e. L is a subset of Σ*
examples:
Let Σ = {0,1}, then Σ* = {ε,0,1,00,01,10,11,000,…}
L1 = {0^n 1^n | n ≥ 0},
i.e.
L1 is the set of all strings consisting of an equal number of 0’s and 1’s,
where all the 0’s come first
So, L1 = {ε,01,0011,000111,…}
note L1 = {0^n 1^n | n ≥ 0} is called a language expression
L2 = {w | w ∈ {0,1}* s.t. n0(w) = n1(w)}
Here, also the number of 0’s equals the number of 1’s, but their order in a string
does not matter
L2 = {ε,01,10,0011,1100,0101,1010,000111,…}
Observe that every string in L1 is also in L2. However, L2 contains many strings
not in L1
e.g. 10
We have L1 ⊂ L2
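A language can be thought of as a membership test. Here is a sketch of such tests for L1 and L2 (the function names in_L1 and in_L2 are ours), showing a word that lies in both and one that lies only in L2:

```python
def in_L1(w: str) -> bool:
    """True if w consists of n 0's followed by n 1's, for some n >= 0."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_L2(w: str) -> bool:
    """True if w is a binary word with as many 0's as 1's, in any order."""
    return set(w) <= {"0", "1"} and w.count("0") == w.count("1")

print(in_L1("0011"), in_L2("0011"))  # True True
print(in_L1("10"), in_L2("10"))      # False True
print(in_L1(""), in_L2(""))          # True True
```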
L3 = {w | w ∈ {0,1}* with w = w^R}, where w^R denotes w written in reverse
L3 consists of all binary palindromes
A palindrome is a word that reads the same way forward and backwards
Palindromes in English: mom, pop, dad, radar, madam, otto
So L3 = {ε,0,1,00,11,010,101,000,111,…}
note ε^R = ε (the empty string, not the empty set ∅), so ε ∈ L3
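A sketch of a membership test for L3, assuming the Python slice w[::-1] as the reversal operation (in_L3 is our name):

```python
def in_L3(w: str) -> bool:
    """True if w is a binary word that equals its own reversal."""
    return set(w) <= {"0", "1"} and w == w[::-1]

print([w for w in ["", "0", "01", "010", "0110", "0111"] if in_L3(w)])
# ['', '0', '010', '0110']
```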
And now, some algebra…
Semigroup – A semigroup <S, ∘> is a set S together with a binary operation ∘ where the
following property holds:
(x ∘ y) ∘ z = x ∘ (y ∘ z), ∀ x,y,z ∈ S
i.e. ∘ is associative
examples:
<N,+> is a semi-group
2+3=5
//we have closure
And we know that addition of natural numbers is associative…
i.e. (x+y) + z = x + (y+z), ∀ x,y,z ∈ N
(2+3) + 4 = 2 + (3+4)
5 + 4 = 2 + 7
9 = 9
<Z,×> is a semi-group
× : Z × Z → Z, i.e. multiplication is a binary operation here
And multiplication of integers is associative
And multiplication of integers is associative
<N,-> is not a semi-group… Why not?
<Z,-> is not a semi-group… Why not?
<Z+,+> ???, where Z+ is the positive integers
Z+ = {1,2,3,…}
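Associativity can be spot-checked by brute force over a finite sample. The sketch below (is_associative_on is a hypothetical helper) can refute associativity by finding a counterexample, but a True result on a sample is only evidence, not a proof:

```python
from itertools import product
import operator

def is_associative_on(sample, op) -> bool:
    """Check (x op y) op z == x op (y op z) for all triples from sample."""
    return all(op(op(x, y), z) == op(x, op(y, z))
               for x, y, z in product(sample, repeat=3))

ints = range(-3, 4)
print(is_associative_on(ints, operator.add))  # True
print(is_associative_on(ints, operator.mul))  # True
print(is_associative_on(ints, operator.sub))  # False
```

For subtraction, one failing triple is (0, 0, 1): (0 - 0) - 1 = -1 while 0 - (0 - 1) = 1.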
Monoid – a monoid <M, ∘> is a set M together with a binary operation ∘ where the
following properties hold:
(x ∘ y) ∘ z = x ∘ (y ∘ z), ∀ x,y,z ∈ M
i.e. ∘ is associative
and ∃ an element e ∈ M (identity element) such that, ∀ x ∈ M:
x ∘ e = e ∘ x = x
examples:
Consider once again <Z,×>
We cited earlier that <Z,×> is a semi-group
To verify that it forms a monoid as well, we require an identity element
e = ?
We also verified that <Z+,+> is a semi-group
Recall that Z+ = {1,2,3,…}
Is <Z+,+> a monoid?
How about <Z+ ∪ {0},+> …is this a monoid?
And finally, given an alphabet Σ is <Σ*, ∘> a monoid?
where ∘ is the concatenation of strings
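The two monoid laws can be tested on a finite sample of words. This is a sketch only (satisfies_monoid_laws is our name, and the sample is arbitrary): passing on a sample is consistent with the laws but does not prove them for all strings.

```python
from itertools import product

def satisfies_monoid_laws(sample, op, e) -> bool:
    """Check associativity and the identity law on a finite sample."""
    assoc = all(op(op(x, y), z) == op(x, op(y, z))
                for x, y, z in product(sample, repeat=3))
    identity = all(op(x, e) == x == op(e, x) for x in sample)
    return assoc and identity

def concat(x: str, y: str) -> str:
    return x + y

words = ["", "0", "1", "01", "bat", "man"]
print(satisfies_monoid_laws(words, concat, ""))   # True
print(satisfies_monoid_laws(words, concat, "0"))  # False
```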
Group – A group <G, ∘> is a set G together with a binary operation ∘ where the
following properties hold:
(x ∘ y) ∘ z = x ∘ (y ∘ z), ∀ x,y,z ∈ G
i.e. ∘ is associative
∃ an identity element e such that, ∀ x ∈ G:
x ∘ e = e ∘ x = x
For each element x ∈ G, ∃ an element x^-1
(called the inverse of x) such that:
x ∘ x^-1 = x^-1 ∘ x = e
examples:
<Z,+> is a group
+ is a binary operation: Z × Z → Z
+ on integers is associative
(2+3) + 4 = 2 + (3+4)
identity e = 0
3 + 0 = 0 + 3 = 3
inverses exist
x^-1 = -x
e.g. 3^-1 = -3
3 + (-3) = (-3) + 3 = 0
Which of the following are groups? Why or why not?
i) <Z+ ∪ {0},+>
ii) <Z,×>
iii) <Q,×>
iv) <Σ*, ∘>, where Σ is a non-empty alphabet
The Relationship Between Languages and Problems
* Problem Specification
PRIME
INSTANCE: An integer n
QUESTION: Is n a prime number?
//Note this is a Decision Problem, i.e. Answer is YES or NO
Language of a Problem
Lprime = {1^p | p prime} = {11,111,11111,…}
Then an instance of this problem may be viewed as the question: is w ∈ Lprime?
//membership problem
note w ∈ Lprime iff w = 1^|w| and |w| is prime (the integer is written in unary)
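The membership problem for Lprime can be sketched as follows. Trial division is used purely for simplicity (it is our choice, not something the notes specify), and the names is_prime and in_L_prime are ours:

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def in_L_prime(w: str) -> bool:
    """True if w is a word of 1's whose length is prime."""
    return set(w) <= {"1"} and is_prime(len(w))

print(in_L_prime("11111"))  # True
print(in_L_prime("1111"))   # False
print([n for n in range(1, 12) if in_L_prime("1" * n)])  # [2, 3, 5, 7, 11]
```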
Give the problem specification and corresponding language for the question:
Is the integer n a perfect square?
Optimization Problems may also be viewed in terms of languages
Traveling Salesperson Problem (TSP)
INSTANCE
An integer n ≥ 1, n distinct cities C1, …, Cn, the positive integral
distances between every pair Ci,Cj, denoted by d(Ci,Cj)
QUESTION
What is the minimal tour through these cities?
i.e. find the minimal
\[ \mathrm{COST}(i_1,\dots,i_n) = \sum_{j=1}^{n-1} d(C_{i_j}, C_{i_{j+1}}) + d(C_{i_n}, C_{i_1}) \]
We must first convert this problem to a corresponding Decision Problem
Traveling Salesperson Check (TSC)
//Decision Problem
INSTANCE An integer n ≥ 1, n distinct cities C1, …, Cn, the positive integral
distances between every pair Ci,Cj, denoted by d(Ci,Cj) and a
positive integral bound B
QUESTION Does there exist a tour C_{i_1}, …, C_{i_n} of the cities such that
\[ \mathrm{COST}(i_1,\dots,i_n) = \sum_{j=1}^{n-1} d(C_{i_j}, C_{i_{j+1}}) + d(C_{i_n}, C_{i_1}) \le B \]
And finally, one can encode a Decision Problem as a Language Problem
for example – we may assume that the cities C1, …, Cn are represented by the integers 1,…,n
Each distance function d: {1,…,n} × {1,…,n} → N can be represented as a set of triples (i,j,d(i,j)), 1 ≤ i, j ≤ n
And a pair (d,B) may be represented as a pair: ({(i,j,d(i,j)) : 1 ≤ i, j ≤ n},B)
Then for a particular instance, the language of that instance of the problem would be the set
of all words that correspond to “yes” answers to that instance.
Let us consider the following instance of this problem:
d(i,j)   i=1   i=2   i=3   i=4
j=1       3     4     3     1
j=2       2     1     1     1
j=3       1     2     4     1
j=4       2     3     2     5
So here n = 4, d is provided by the above table and we let B = 8. Then we obtain:
({(1,1,3),(1,2,2),(1,3,1),(1,4,2),(2,1,4),(2,2,1),(2,3,2),(2,4,3),
(3,1,3),(3,2,1),(3,3,4),(3,4,2),(4,1,1),(4,2,1),(4,3,1),(4,4,5)},8)
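For an instance this small, the decision problem can be settled by trying every tour. The sketch below takes the distances d(i,j) from the table for this instance; the names rows, cost and tsc are ours. Brute force over all permutations is only workable for tiny n:

```python
from itertools import permutations

# rows[i] lists d(i,1), d(i,2), d(i,3), d(i,4) as read from the instance table.
rows = {1: [3, 2, 1, 2], 2: [4, 1, 2, 3], 3: [3, 1, 4, 2], 4: [1, 1, 1, 5]}
d = {(i, j): rows[i][j - 1] for i in rows for j in range(1, 5)}

def cost(tour) -> int:
    """Sum of d over consecutive cities, plus the edge back to the start."""
    n = len(tour)
    return sum(d[(tour[k], tour[(k + 1) % n])] for k in range(n))

def tsc(cities, bound: int) -> bool:
    """Decision version: does some tour have cost <= bound?"""
    return any(cost(t) <= bound for t in permutations(cities))

print(cost((1, 2, 3, 4)))    # 7
print(tsc([1, 2, 3, 4], 8))  # True
print(min(cost(t) for t in permutations([1, 2, 3, 4])))  # 6
```

So this instance with B = 8 is a "yes" instance; the tour 1, 2, 3, 4 already has cost 7.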
Then we must encode integers in some manner
Let each nonnegative integer i be encoded as the word ab^i a
Hence we obtain:
({(aba,aba,ab^3a),(aba,ab^2a,ab^2a),…},ab^8a) as a possible encoding for this instance of the TSC
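The integer encoding is easy to mechanize. This sketch keeps the parentheses and commas as literal symbols, and writes b^i out in full as i copies of b; encode_int and encode_triple are hypothetical helper names:

```python
def encode_int(i: int) -> str:
    """Encode a nonnegative integer i as an a, then i b's, then an a."""
    return "a" + "b" * i + "a"

def encode_triple(i: int, j: int, dist: int) -> str:
    """Encode a triple (i, j, d(i,j)) with literal parentheses and commas."""
    return "(" + ",".join(encode_int(x) for x in (i, j, dist)) + ")"

print(encode_int(0))           # aa
print(encode_int(8))           # abbbbbbbba
print(encode_triple(1, 1, 3))  # (aba,aba,abbba)
print(encode_triple(1, 2, 2))  # (aba,abba,abba)
```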
We may choose to include { , } , ( , ) and , as symbols in our representation or we may replace them by
words over {a,b}. For example,
{ → baab
} → baaab
( → baaaab
) → baaaaab
, → bab
An instance of this problem determined by (a1,…,am) is represented by r(a1),…,r(am), where r(ai) is a
representative corresponding to ai, i.e. it is a word
The representation of the set of yes instances is denoted by L(r,Π) and is defined by
L(r,Π) = {x1 : … : xm | (a1,…,am) determines a yes instance of Π and xi is a representation of ai, 1 ≤ i ≤ m}
where : is a separation symbol
This is called the language of yes instances of the problem Π under r
We have thus transformed a computational situation into a language-theoretical one
ALGORITHM EXISTENCE
Does there exist an algorithm for a problem Π?
can be replaced by:
DECISION ALGORITHM EXISTENCE
Does there exist a decision algorithm for the decision problem Π_D obtained from Π?
which can be transformed into:
MEMBERSHIP TESTING
Does there exist a decision algorithm for the language problem obtained from Π?
----------------------------------------------------------------------------------------------------------------
In this course we will be concerned with unsolvable problems
One way to be sure that unsolvable problems do indeed exist is to show that there are more
problems than algorithms
Hence we must discuss cardinality
Cardinality of a Set
-Countable sets – N, Z, Q, Σ*
-Uncountable sets – R, L = {Li | Li ⊆ Σ*}
If we prove that |L| > |Σ*| we have done so, since every algorithm can be written down as a string over some alphabet.
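One standard route to |L| > |Σ*| is a diagonal argument; the notes do not say which proof is intended, so treat this as an assumption. The idea: list the words w0, w1, w2, … and any proposed list of languages L0, L1, L2, …, then build a language D that disagrees with Li about the word wi. The sketch below shows the construction on a tiny, made-up list of four languages (all names here are ours):

```python
from itertools import count, islice, product

def words(alphabet):
    """Enumerate all strings over the alphabet, shortest first."""
    symbols = sorted(alphabet)
    for n in count(0):
        for t in product(symbols, repeat=n):
            yield "".join(t)

# An illustrative list of languages, each given as a membership test.
languages = [
    lambda w: True,             # every word
    lambda w: False,            # the empty language
    lambda w: len(w) % 2 == 0,  # words of even length
    lambda w: w == w[::-1],     # palindromes
]

first_words = list(islice(words({"0", "1"}), len(languages)))

# Diagonal language D: the i-th word is in D exactly when it is NOT in L_i.
D = {w for w, L in zip(first_words, languages) if not L(w)}

# D differs from each listed language on at least one word.
differs = [(w in D) != L(w) for w, L in zip(first_words, languages)]
print(first_words)  # ['', '0', '1', '00']
print(sorted(D))    # ['0', '1']
print(all(differs)) # True
```

The same construction works against any infinite list of languages, which is why no list can contain them all.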
* examples from Theory of Computation by Derrick Wood, John Wiley & Sons Inc.