Lesson 1: Formal Languages *

Formal Language Theory … what does this mean?

We are all familiar with natural languages, e.g. English, French, Chinese.
The building blocks in English, French and many other languages are letters. In Chinese???
We build words from these letters…
  We all speak English in this class.
  Tu ne pouvais pas choisir un autre jour? (French: "Couldn't you have picked another day?")

In Formal Language Theory we begin with an alphabet.

Alphabet: a finite set of symbols (usually nonempty)
  e.g. A1 = {a,b,…,z}   English alphabet
       A2 = {0,1}       Binary alphabet

String (or word): a finite sequence of symbols from some alphabet

Concatenation of strings: let α, β be strings; then α∘β = αβ
  e.g. α = bat, β = man
       α∘β = bat∘man = batman

Concatenation is a binary operation S × S → S (S being the set of strings over the alphabet) … much as addition of integers is a binary operation Z × Z → Z; Z = {…,-2,-1,0,1,2,…}
  e.g. 2+3=5

note: subtraction of natural numbers is not a binary operation on N
  e.g. N = {0,1,2,3,…}
       2 – 6 = -4, and -4 ∉ N (i.e. we do not have closure)

Questions:
  Is multiplication of integers a binary operation? What about division?
  Is division of rational numbers a binary operation?
    Q = {…, 1/2, 1/4, 0, 1/3, 3/4, …}
    Q = {(p,q) | p,q ∈ Z, with q ≠ 0}

Length of a string α: length(α) or l(α) or |α|, the number of symbols that α contains
  length(01) = 2   l(bat) = 3   |man| = 3

Observe that the length of αβ = |α| + |β|
  e.g. bat∘man = batman
       l(bat) + l(man) = l(batman)
       3 + 3 = 6

Empty string ε: the unique string of length zero, i.e. |ε| = 0
  note: ε∘α = α∘ε = α, for all α
  e.g. ε∘bat = bat∘ε = bat
We say that ε is the identity for concatenation.
And naturally ε∘ε = ε.

What is the identity element for <N,+>? for <Z,×>? for multiplication of square matrices with real coefficients?
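The definitions of concatenation, length and the empty string can be tried out directly. The following is only a minimal sketch, assuming strings over an alphabet are modelled as Python `str` values and the empty string as `""`; the helper name `concat` is hypothetical and not part of the lesson.

```python
# Minimal sketch: strings over an alphabet modelled as Python str values.
# Assumption: "" stands for the empty string; concat is a hypothetical helper.

def concat(x: str, y: str) -> str:
    """Concatenation of two strings: a binary operation on strings."""
    return x + y

alpha = "bat"
beta = "man"
gamma = concat(alpha, beta)          # "batman"

# Length is additive under concatenation: |xy| = |x| + |y|.
length_is_additive = len(gamma) == len(alpha) + len(beta)

# The empty string is the identity for concatenation.
empty = ""
identity_holds = concat(empty, alpha) == alpha == concat(alpha, empty)

# Subtraction does not stay inside N = {0, 1, 2, ...}: 2 - 6 leaves the set.
difference = 2 - 6
closed_under_subtraction = difference >= 0   # False for this pair

print(gamma, length_is_additive, identity_holds, closed_under_subtraction)
```

Closure is the property being probed in the last lines: concatenating two strings always gives a string, whereas subtracting two natural numbers may leave N.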
note: R denotes the reals, e.g. 147, 0, .4, 1, …

Substring – Let the string α = uvw; then v is a substring of α.

Prefix – A prefix of a string is a substring that occurs "at the beginning" of that string.
  α = uvw, u is a prefix of α

Suffix – A suffix of a string is a substring that occurs "at the end" of that string.
  α = uvw, w is a suffix of α

example: Consider the string α = 0101 over Σ = {0,1}.
The prefixes of α include: 0, 01, 010
We note that every string is a prefix of itself and … ε is a prefix of every string (including itself).
Hence the complete set of prefixes of 0101 is {ε, 0, 01, 010, 0101}.
The first four prefixes are said to be proper prefixes.

The suffixes of α = 0101 are {ε, 1, 01, 101, 0101}.
The first four suffixes are said to be proper suffixes.

We now list the subwords (or substrings) of α = 0101:
  ε, 0, 01, 010, 0101   //every prefix is a subword
  1, 101                //as is every suffix
Are there any others? i.e. does α contain any subwords that are neither prefixes nor suffixes?

We observed in these examples that a string of length four has five prefixes.
More generally, if |α| = n, then α has n+1 prefixes.
The proof of this employs ???
A similar proof would verify that when |α| = n, n+1 suffixes exist.

Why is deriving a similar result for the number of subwords more difficult?
What is the most general result you can provide here?

more definitions…

Concatenation of sets of strings
Let A = {0,1}, B = {a,b}; then A∘B = {uv | u ∈ A and v ∈ B}
  i.e. A∘B = {0a, 0b, 1a, 1b}
Observe that if |A| = m and |B| = n, then |A∘B| = m·n when A and B are alphabets as here (for arbitrary sets of strings we only get |A∘B| ≤ m·n)   //Proof by ?

Note A^2 is merely A∘A = {0,1}∘{0,1} = {00,01,10,11}
  all strings of length two formed from A
A^3 = A∘A∘A = {0,1}∘{0,1}∘{0,1} = {000,001,010,011,100,101,110,111}
  all strings of length three formed over A

A^0 = ? … well, the set of all strings of length zero over A.
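The prefix, suffix and set-concatenation definitions above can be sketched in a few lines. This is an illustrative sketch only, assuming strings are Python `str` values and sets of strings are Python `set`s; the function names `prefixes`, `suffixes`, `concat_sets` and `power` are hypothetical.

```python
# Illustrative sketch; all function names here are hypothetical helpers.

def prefixes(w: str) -> list[str]:
    """All prefixes of w, shortest first (the empty string and w included)."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w: str) -> list[str]:
    """All suffixes of w, shortest first (the empty string and w included)."""
    return [w[i:] for i in range(len(w), -1, -1)]

def concat_sets(A: set[str], B: set[str]) -> set[str]:
    """Set concatenation: every string uv with u taken from A and v from B."""
    return {u + v for u in A for v in B}

def power(A: set[str], k: int) -> set[str]:
    """A^k, built by concatenating A with itself k times, starting from {""}."""
    result = {""}
    for _ in range(k):
        result = concat_sets(result, A)
    return result

A = {"0", "1"}
B = {"a", "b"}

print(prefixes("0101"))               # a string of length 4 has 5 prefixes
print(suffixes("0101"))
print(sorted(concat_sets(A, B)))
print(sorted(power(A, 2)), len(power(A, 3)), power(A, 0))
```

Starting `power` from the set containing only the empty string is a design choice that makes the loop body uniform for every k, including k = 0.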
Hence, A^0 = {ε}   careful... |A^0| = 1, not zero

Kleene closure of an alphabet (A*) – the set of all strings that one can form using the symbols from the alphabet
more formally…
  A* = A^0 ∪ A^1 ∪ A^2 ∪ … = ⋃_{i ≥ 0} A^i
     = {ε, 0, 1, 00, 01, 10, 11, 000, …, 111, 0000, …, 1111, …}

Positive closure of an alphabet (A+) – the set of all strings of length one or more formed from the symbols of some alphabet
  A+ = ⋃_{i > 0} A^i = A* \ A^0
     = {0, 1, 00, 01, 10, 11, 000, …, 111, 0000, …, 1111, …}
  i.e. all strings of non-zero length

Language – a language L over some alphabet Σ satisfies L ⊆ Σ*, i.e. L is a subset of Σ*

examples: Let Σ = {0,1}.

L1 = {0^n 1^n | n ≥ 0}, i.e. L1 is the set of all strings consisting of an equal number of 0's and 1's, where all the 0's come first
So, L1 = {ε, 01, 0011, 000111, …}
note: L1 = {0^n 1^n | n ≥ 0} is called a language expression

L2 = {w | w ∈ {0,1}* s.t. n0(w) = n1(w)}, where n0(w) and n1(w) are the numbers of 0's and 1's in w
Here, also, the number of 0's equals the number of 1's, but their order in a string does not matter.
L2 = {ε, 01, 10, 0011, 1100, 0101, 1010, 000111, …}
Observe that every string in L1 is also in L2. However, L2 contains many strings not in L1, e.g. 10.
We have L1 ⊂ L2.

L3 = {w | w ∈ {0,1}* with w = w^R}
L3 consists of all binary palindromes.
A palindrome is a word that reads the same way forwards and backwards.
Palindromes in English: mom, pop, dad, radar, madam, otto
So L3 = {ε, 0, 1, 00, 11, 010, 101, 000, 111, …}
note: ε^R = ε, so ε belongs to L3

And now, some algebra…

Semigroup – A semigroup <S, ∘> is a set S together with a binary operation ∘ where the following property holds:
  (x∘y)∘z = x∘(y∘z), ∀ x,y,z ∈ S   i.e. ∘ is associative

examples:
<N,+> is a semi-group
  2+3=5   //we have closure
  And we know that addition of natural numbers is associative…
  i.e. (x+y) + z = x + (y+z), ∀ x,y,z ∈ N
       (2+3) + 4 = 2 + (3+4)
           5 + 4 = 2 + 7
               9 = 9
<Z,×> is a semi-group
  Z × Z → Z, i.e. multiplication is a binary operation here
  And multiplication of integers is associative.
<N,-> is not a semi-group… Why not?
<Z,-> is not a semi-group… Why not?
<Z+,+> ???, where Z+ is the positive integers, Z+ = {1,2,3,…}

Monoid – a monoid <M, ∘> is a set M together with a binary operation ∘ where the following properties hold:
  (x∘y)∘z = x∘(y∘z), ∀ x,y,z ∈ M   i.e. ∘ is associative
  and ∃ an element e (identity element) : x∘e = e∘x = x, ∀ x ∈ M

examples:
Consider once again <Z,×>.
We cited earlier that <Z,×> is a semi-group.
To verify that it forms a monoid as well, we require an identity element: e = ?

We also verified that <Z+,+> is a semi-group. Recall that Z+ = {1,2,3,…}.
Is <Z+,+> a monoid?
How about <Z+ ∪ {0},+> … is this a monoid?
And finally, given an alphabet Σ, is <Σ*, ∘> a monoid? where ∘ is the concatenation of strings

Group – A group <G, ∘> is a set G together with a binary operation ∘ where the following properties hold:
  (x∘y)∘z = x∘(y∘z), ∀ x,y,z ∈ G   i.e. ∘ is associative
  ∃ an identity element e : x∘e = e∘x = x, ∀ x ∈ G
  For each element x ∈ G, ∃ an element x^-1 (called the inverse of x) : x∘x^-1 = x^-1∘x = e

examples:
<Z,+> is a group
  + is a binary operation: Z × Z → Z
  + on integers is associative: (2+3) + 4 = 2 + (3+4)
  identity e = 0: 3+0 = 0+3 = 3
  inverses exist, x^-1 = -x: e.g. 3^-1 = -3, 3 + (-3) = (-3) + 3 = 0

Which of the following are groups? Why or why not?
  i)   <Z+ ∪ {0},+>
  ii)  <Z,×>
  iii) <Q,×>
  iv)  <Σ*, ∘>, where Σ is a non-empty alphabet

The Relationship Between Languages and Problems *

Problem Specification
PRIME
  INSTANCE: An integer n
  QUESTION: Is n a prime number?
  //Note this is a Decision Problem, i.e. the answer is YES or NO

Language of a Problem
  Lprime = {1^p | p prime} = {11, 111, 11111, …}
Then an instance of this problem may be viewed as the question: is w ∈ Lprime?   //membership problem
note: for w ∈ {1}*, w ∈ Lprime iff |w| is prime (w = 1^|w| is the unary encoding of |w|)

Give the problem specification and corresponding language for the question: Is the integer n a perfect square?
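To make the membership view of PRIME concrete, here is a minimal sketch of a decision procedure for Lprime over the unary alphabet {1}. It assumes words are Python `str` values; `is_prime` and `in_L_prime` are hypothetical names, and trial division is used only because it is short, not because the lesson prescribes it.

```python
# Minimal sketch of membership testing for Lprime = {1^p | p prime}.
# Assumption: words are Python str values; both function names are hypothetical.

def is_prime(n: int) -> bool:
    """Trial division: True exactly when n is a prime number."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def in_L_prime(w: str) -> bool:
    """Decide w in Lprime: w must consist only of 1s and |w| must be prime."""
    return set(w) <= {"1"} and is_prime(len(w))

for word in ["", "1", "11", "111", "1111", "11111", "101"]:
    print(repr(word), in_L_prime(word))
```

Answering the decision problem PRIME for an integer n then amounts to asking whether the word consisting of n ones belongs to the language.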
Optimization Problems may also be viewed in terms of languages.

Traveling Salesperson Problem (TSP)
  INSTANCE: An integer n ≥ 1, n distinct cities C1, …, Cn, and the positive integral distances between every pair Ci, Cj, denoted by d(Ci,Cj)
  QUESTION: What is the minimal tour through these cities? i.e. find the minimal
    $\mathrm{COST}(i_1,\dots,i_n) = \sum_{j=1}^{n-1} d(C_{i_j}, C_{i_{j+1}}) + d(C_{i_n}, C_{i_1})$

We must first convert this problem to a corresponding Decision Problem.

Traveling Salesperson Check (TSC)   //Decision Problem
  INSTANCE: An integer n ≥ 1, n distinct cities C1, …, Cn, the positive integral distances between every pair Ci, Cj, denoted by d(Ci,Cj), and a positive integral bound B
  QUESTION: Does there exist a tour C_{i_1}, …, C_{i_n} of the cities such that
    $\mathrm{COST}(i_1,\dots,i_n) = \sum_{j=1}^{n-1} d(C_{i_j}, C_{i_{j+1}}) + d(C_{i_n}, C_{i_1}) \le B$ ?

And finally, one can encode a Decision Problem as a Language Problem.
For example, we may assume that the cities C1, …, Cn are represented by the integers 1, …, n.
Each distance function d: {1,…,n} × {1,…,n} → N can be represented as a set of triples (i, j, d(i,j)), 1 ≤ i, j ≤ n.
And a pair (d,B) may be represented as a pair: ({(i, j, d(i,j)) : 1 ≤ i, j ≤ n}, B)
Then the language of the problem is the set of all words that encode instances whose answer is "yes".

Let us consider the following instance of this problem:

  d(i,j)   i=1  i=2  i=3  i=4
  j=1       3    4    3    1
  j=2       2    1    1    1
  j=3       1    2    4    1
  j=4       2    3    2    5

So here n = 4, d is provided by the above table, and we let B = 8. Then we obtain:
  ({(1,1,3),(1,2,2),(1,3,1),(1,4,2),(2,1,4),(2,2,1),(2,3,2),(2,4,3),
    (3,1,3),(3,2,1),(3,3,4),(3,4,2),(4,1,1),(4,2,1),(4,3,1),(4,4,5)}, 8)

Then we must encode integers in some manner.
Let each nonnegative integer i be encoded as the word ab^i a.
Hence we obtain:
  ({(aba, aba, ab^3a), (aba, ab^2a, ab^2a), …}, ab^8a)
as a possible encoding for this instance of the TSC.

We may choose to include { , } , ( , ) and the comma as symbols in our representation, or we may replace them by words over {a,b}.
For example:
  {  →  baab
  }  →  baaab
  (  →  baaaab
  )  →  baaaaab
  ,  →  bab

An instance of a problem Π determined by (a1,…,am) is represented by r(a1),…,r(am), where r(ai) is a representative corresponding to ai, i.e. it is a word.

The representation of the set of yes instances is denoted by L(r,Π) and is defined by
  L(r,Π) = {x1 : … : xm | (a1,…,am) determines a yes instance of Π and xi is a representation of ai, 1 ≤ i ≤ m}
where ":" is a separation symbol.
This is called the language of yes instances of the problem Π under r.

We have thus transformed a computational situation into a language-theoretical one.

ALGORITHM EXISTENCE
  Does there exist an algorithm for a problem Π?
can be replaced by:
DECISION ALGORITHM EXISTENCE
  Does there exist a decision algorithm for the decision problem Π_D obtained from Π?
which can be transformed into:
MEMBERSHIP TESTING
  Does there exist a decision algorithm for the language problem obtained from Π_D?

----------------------------------------------------------------------------------------------------------------

In this course we will be concerned with unsolvable problems.
One way to be sure that unsolvable problems do indeed exist is to show that there are more problems than algorithms.
Hence we must discuss cardinality.

Cardinality of a Set
  - Countable sets –
  - Uncountable sets –
  N, Z, Q, R, L = {Li | Li ⊆ Σ*}
If we prove that |L| > |Σ*|, we have done so.

* examples from Theory of Computation by Derrick Wood, John Wiley & Sons Inc.