Theory of Computation
Chapter 6: Formal Languages

Formal Languages: definitions

A vocabulary (or alphabet) V is a finite, nonempty set of symbols.
A word over V is a finite-length string of symbols from V.
The set V* is the set of all words over V.
A language over V is any subset of V*.

Phrase-Structure Grammar

A phrase-structure grammar is a 4-tuple (V, VT, S, P) where
  V is the vocabulary,
  VT is the set of terminals (a subset of V),
  S is the start symbol,
  P is a set of production rules.

E.g. V = {0, 1, S}, VT = {0, 1}, start symbol S, P = {S → 0S, S → 1}.

Let w1 and w2 be words over V. Then w1 directly generates (directly derives) w2, written w1 ⇒ w2, if α → β is a production from P, w1 contains an instance of α, and w2 is identical to w1 with one instance of α replaced by β.

If w1, w2, w3, …, wn are words over V and w1 ⇒ w2, w2 ⇒ w3, …, wn-1 ⇒ wn, then w1 generates (derives) wn, written w1 ⇒* wn.

The language L generated by G, sometimes denoted L(G), is the set L = {w ∈ VT* | S ⇒* w}.

The following Logo procedures act as a small grammar that generates random English sentences:

    to sent
      output (sentence nounphrase verbphrase)
    end

    to nounphrase
      output (sentence "the adjective noun)
    end

    to verbphrase
      output (sentence verb nounphrase)
    end

    to noun
      output pick [girl boy elephant zebra giraffe clown]
    end

    to adjective
      output pick [big happy little funny silly]
    end

    to verb
      output pick [hugs punches likes visits]
    end

    ? print sent
    THE FUNNY CLOWN HUGS THE HAPPY ZEBRA
    ? print sent
    THE LITTLE GIRAFFE PUNCHES THE SILLY BOY

G = ({S, B, C, a, b, c}, {a, b, c}, S, P) where the productions P are
  S → aSBC
  S → aBC
  CB → BC
  aB → ab
  bB → bb
  bC → bc
  cC → cc

Using these productions as "rewriting rules" it can be shown that, starting with S, we can derive any string of the form a^n b^n c^n (n ≥ 1). For example,

  S ⇒ aSBC
    ⇒ aaBCBC   (using S → aBC)
    ⇒ aabCBC   (using aB → ab)
    ⇒ aabBCC   (using CB → BC)
    ⇒ aabbCC   (using bB → bb)
    ⇒ aabbcC   (using bC → bc)
    ⇒ aabbcc   (using cC → cc)

is a valid derivation of a^2 b^2 c^2.

Equivalent Grammars

Two different grammars G1 and G2 may generate the same language, i.e. it may be that L(G1) = L(G2). Such grammars are said to be equivalent.
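The idea of productions as rewriting rules can be tried out mechanically. The Python sketch below is only an illustration under stated assumptions: every symbol is a single character, no production shortens the string it is applied to (so forms longer than the bound can be discarded), and the name `words_up_to` is hypothetical. `G1` is the example grammar S → 0S, S → 1 from these notes; `G2` is a made-up grammar, not taken from the notes, chosen to generate the same language.

```python
from collections import deque

def words_up_to(productions, start, terminals, max_len):
    """Search breadth-first over the forms derivable from `start` and
    return every word made only of terminals with length <= max_len.
    Assumes no production shortens a string (otherwise pruning is unsafe)."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)
        for lhs, rhs in productions:
            # Replace each single instance of lhs in turn (one direct derivation step).
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words

# The a^n b^n c^n grammar from the notes.
ANBNCN = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
          ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

# G1 is from the notes; G2 is a hypothetical grammar for the same language.
G1 = [("S", "0S"), ("S", "1")]
G2 = [("S", "1"), ("S", "0A"), ("A", "0A"), ("A", "1")]

print(sorted(words_up_to(ANBNCN, "S", set("abc"), 6)))
print(words_up_to(G1, "S", set("01"), 4) == words_up_to(G2, "S", set("01"), 4))
```

Agreement of the two word sets up to a length bound is only evidence that two grammars are equivalent, not a proof.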
There is no general procedure for determining whether two arbitrary grammars are equivalent (cf. the halting problem).

G = ({S, a, b}, {a, b}, S, P) where the productions P are
  S aSa bSb S aSa aaa bab S aSa bSb bSb aba bbb S S bSb b S AS A bAS SA a

G = ({S, a, b}, {a, b}, S, P) where the productions P are
  S S S aSa a

G = ({S, a, b}, {a, b}, S, P) where the productions P are
  S BS B S aBS SB b

Erasing productions

An erasing production takes the form α → β where length(α) > length(β).

Context Sensitive Grammars (type 1)

A grammar is said to be context sensitive if none of its productions are erasing productions, with the exception of S → λ, where λ denotes the empty word.

Context Free Grammars (type 2)

A grammar is said to be context free if all the productions are non-erasing and of the form α → β where length(α) = 1, with the exception of S → λ.

Regular Grammars (type 3)

A grammar is said to be regular if all the productions are non-erasing and of the form α → β where length(α) = 1 and β is either of the form tN or t, where t is a terminal and N is a non-terminal, with the exception of S → λ.

Chomsky's Hierarchy

  type 0   Phrase Structure Grammars
  type 1   Context Sensitive Grammars
  type 2   Context Free Grammars
  type 3   Regular Grammars

Recognition Machines

For every Regular Grammar it is possible to construct a Finite State Machine that recognises the language it generates.
For every Context Free Grammar it is possible to construct a Push Down Automaton that recognises the language it generates.
For every Context Sensitive Grammar it is possible to construct a Linear Bounded Automaton (a Turing Machine with a finite tape, bounded by the length of the input) that recognises the language it generates.
For every Phrase Structure Grammar it is possible to construct a Turing Machine that recognises the language it generates.

Backus-Naur Form (BNF)

  <identifier> ::= <letter> | <identifier><letter> | <identifier><digit>
  <letter> ::= a | b | c | … | z
  <digit> ::= 0 | 1 | … | 9

In BNF, non-terminals are enclosed in < >, the production arrow becomes ::=, and | stands for "or".
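The identifier language defined by this BNF (a letter followed by any number of letters and digits) is regular, so a finite state machine can recognise it. The Python sketch below is one possible illustration rather than a definitive construction; the function name `is_identifier` and the state names are hypothetical, and it assumes the letters are the lowercase a to z and the digits are 0 to 9, as in the BNF.

```python
import string

LETTERS = set(string.ascii_lowercase)  # <letter> ::= a | b | ... | z
DIGITS = set(string.digits)            # <digit>  ::= 0 | 1 | ... | 9

def is_identifier(word):
    """Simulate a small finite state machine with states
    'start' (nothing read yet), 'ident' (accepting) and 'reject' (dead)."""
    state = "start"
    for ch in word:
        if state == "start":
            # An identifier must begin with a letter.
            state = "ident" if ch in LETTERS else "reject"
        elif state == "ident":
            # After the first letter, letters and digits may follow freely.
            state = "ident" if (ch in LETTERS or ch in DIGITS) else "reject"
        else:
            break  # the dead state has no way out
    return state == "ident"

print(is_identifier("x25"), is_identifier("2x"))
```

The machine never needs to remember more than its current state, which is what distinguishes it from the push down automaton needed for context free languages in general.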
In 4-tuple form, the identifier grammar is
G = ({I, L, D, a, b, c, …, z, 0, 1, …, 9}, {a, b, c, …, z, 0, 1, …, 9}, I, P) where P is the following:
  I → L    I → IL   I → ID
  L → a    L → b    L → c    L → d    …    L → z
  D → 0    D → 1    D → 2    D → 3    …    D → 9

Example

Find a grammar that generates L = {ww | w ∈ {0,1}*}.
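When experimenting with candidate grammars for this exercise, it can help to have an independent membership test for L to check derived words against. The Python sketch below is only such a checker, not a grammar and not a solution to the exercise; the function name `in_ww` is hypothetical.

```python
def in_ww(s):
    """Return True if s = ww for some word w over {0, 1}.
    The empty word counts, since w may be empty."""
    if any(ch not in "01" for ch in s):
        return False
    half, remainder = divmod(len(s), 2)
    # s must have even length and its two halves must be identical.
    return remainder == 0 and s[:half] == s[half:]

print([w for w in ["", "00", "0101", "0110", "010"] if in_ww(w)])
```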