UNIVERSITY OF ABERDEEN
SESSION 2004-2005

Examination in CS3012 (Formal Languages and Compilers)
Tuesday, 18th January, 2005 (9 a.m. – 11 a.m.)

Answer THREE questions. Use a separate answer book for each question. Each question is worth 25 marks; the marks for each part of a question are shown in brackets.


1. (a) The following is Finite State Automaton A:

       [Figure: state diagram of automaton A, with states 1, 2, 3 and 4 and transitions labelled a and b]

       (i) What are the first eight strings of L(A) if going by lexical order? (2)

       (ii) What are the first four strings of L(A) if going by dictionary order? (2)

       (iii) Convert the finite state automaton A to a regular expression. The algorithm is provided, but you are not obliged to use it if you do not wish to.

             begin
               convert A to a RFSA                                  % trivial
               while Q\{i,f} is not empty do
               begin
                 for each state p ∈ Q with more than one edge (p,ri,p) (i ≤ n) do
                   replace all those edges by (p,r1+r2+...+rn,p)
                 for each pair p,q ∈ Q with more than one edge (p,ri,q) (i ≤ n) do
                   replace all those edges by (p,r1+r2+...+rn,q)
                 select s ∈ Q\{i,f}
                 for each pair p,q ∈ Q (p,q ≠ s) s.t. there are edges (p,r1,s) and (s,r2,q) do
                   if there is an edge (s,r3,s)
                     then add the edge (p,r1r3*r2,q)
                     else add the edge (p,r1r2,q)
                 remove all edges to or from s
                 remove all states and edges with no path from i
               end
               return r, where E = {(i,r,f)}
             end
             (6)

       (iv) Write a regular grammar which defines the same language as finite state automaton A. (3)

   (b) If A and B are regular languages then so is A B. Prove that the converse is not true. (Hint: all you need to do is give a counterexample, i.e. give an example of a regular language A and a non-regular language B such that A B is regular.) (3)

   (c) Create a finite state machine which accepts strings of binary digits representing numbers that are divisible by five. (9)

   (25)


2. (a) Consider the grammar

         S -> aS | aSbS | ε

       (i) Show a derivation for aab. (1)

       (ii) Show that the grammar is ambiguous. (2)

       (iii) Explain why ambiguity in a grammar may sometimes cause a problem. (2)

       (iv) Give an unambiguous grammar defining the same language as the grammar above. (4)

   (b) Using the grammar, parse table and LR(1) algorithm below, show that a*(a+b) is in the language of the grammar. "id" represents both a and b.

         1) E -> E + T
         2) E -> T
         3) T -> T * F
         4) T -> F
         5) F -> ( E )
         6) F -> id

         State   E    T    F    id   +    *    (    )    #
         0       1    2    3    S5             S4
         1                           S6                  A
         2                           R2   S7        R2   R2
         3                           R4   R4        R4   R4
         4       8    2    3    S5             S4
         5                           R6   R6        R6   R6
         6            9    3    S5             S4
         7                 10   S5             S4
         8                           S6             S11
         9                           R1   S7        R1   R1
         10                          R3   R3        R3   R3
         11                          R5   R5        R5   R5

         begin
           z := 0
           w := input string concatenated with #
           loop
             q := last symbol in z                    % top state in stack
             t := first symbol in w
             if M[q, t] = Sn then                     % row q, column t in table
             begin
               remove t from front of w
               put n on end of z
             end
             else if M[q, t] = Rp then                % p = B -> β
             begin
               remove |β| symbols from end of z
               q := last symbol in z                  % top state in stack
               put M[q,B] on end of z                 % new state
             end
             else if M[q, t] = A then return true     % input ∈ L(G)
             else return false                        % input ∉ L(G)
           end
         end
         (8)

   (c) Let F be the set of all strings over {a,b} with equal numbers of a's and b's. Now we define four languages over {a,b} as follows:

         L1 is the language defined by the grammar rule S -> aSb | bSa | ε
         L2 is the language defined by the grammar rule S -> aSa | bSb | ε
         L3 is the language defined by the union L1 ∪ L2
         L4 is the language defined by the grammar rule S -> aSb | bSa | aSa | bSb | ε

       Set F is a subset of one of these four languages.

       (i) For each of the three languages which F is not a subset of, give a counterexample; i.e. give a string which is in F but not in the language. (3)

       (ii) For the language which F is a subset of, give a proof; i.e. prove that everything in F is also in the language. (5)

   (25)


3. (a) i) What is the layout in memory of

            struct Tpkt { int p; char q; double r; };
            struct Tpkt B[2];
          (5)

       ii) What is the address of B[1].q with respect to B? (3)

   (b) Consider the lex script

         %include "y.tab.h"
         L[A-Za-z_]
         %%
         integer   {return INT_T;};
         .         ERROR;

       Explain the meaning of each line. (10)

   (c) Write lex patterns to match
       i) a Java style comment - // to end of line
       ii) a C identifier
       (4)

   (d) Explain how virtual methods of classes may be implemented. (3)

   (25)


4. (a) For the yacc script

         %union{char * string; int intval;}
         %token STRUCT_T;
         %type <string> DET_T;
         %%
         Sequence : Statement Sequence { $$.string = cats ($1.string, $2.string); }
                  |
                  ;

       Explain the meaning of each line. (12)

   (b) How long may a Sequence be? (3)

   (c) For the yacc pattern representing a for loop:

         Statement : FOR_T OB_T Substatement SEMI_T Bool SEMI_T Substatement CB_T Statement ;

       where FOR_T, OB_T, SEMI_T and CB_T are integer codes representing 'for', '(', ';' and ')', and Bool leaves a value that can be tested by BZ. Show, in yacc notation, how to arrange the generated code, including labels and branches (BZ for branch on zero, BA for branch always)
       i) in the order in which these statements appear
       ii) in an order that minimizes branches.
       (10)

   (25)


END OF PAPER
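The table-driven algorithm given in Question 2(b) can be checked mechanically. What follows is a minimal Python sketch of that loop, not a model answer. It assumes the usual reading of the parse table (S for shift, R for reduce, A for accept, a blank entry for error), writes the end marker as #, and expects the input already tokenised so that a and b both arrive as id. The names PRODUCTIONS, ACTION, GOTO and lr_parse are illustrative and do not appear in the paper.

```python
# Productions 1..6 of the grammar, stored as (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

# Terminal columns of the parse table; a missing key is a blank (error) entry.
ACTION = {
    0: {"id": "S5", "(": "S4"},
    1: {"+": "S6", "#": "A"},
    2: {"+": "R2", "*": "S7", ")": "R2", "#": "R2"},
    3: {"+": "R4", "*": "R4", ")": "R4", "#": "R4"},
    4: {"id": "S5", "(": "S4"},
    5: {"+": "R6", "*": "R6", ")": "R6", "#": "R6"},
    6: {"id": "S5", "(": "S4"},
    7: {"id": "S5", "(": "S4"},
    8: {"+": "S6", ")": "S11"},
    9: {"+": "R1", "*": "S7", ")": "R1", "#": "R1"},
    10: {"+": "R3", "*": "R3", ")": "R3", "#": "R3"},
    11: {"+": "R5", "*": "R5", ")": "R5", "#": "R5"},
}

# Non-terminal columns (E, T, F) of the parse table.
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}


def lr_parse(tokens):
    """Return True if the token list is accepted by the table, else False."""
    z = [0]                      # stack of states, starting in state 0
    w = list(tokens) + ["#"]     # input followed by the end marker
    while True:
        q = z[-1]                # top state in stack
        t = w[0]                 # next input symbol
        entry = ACTION[q].get(t)
        if entry is None:        # blank table entry: input not in L(G)
            return False
        if entry == "A":         # accept: input in L(G)
            return True
        kind, n = entry[0], int(entry[1:])
        if kind == "S":          # shift: consume t and push state n
            w.pop(0)
            z.append(n)
        else:                    # reduce by production n = B -> beta
            lhs, rhs_len = PRODUCTIONS[n]
            del z[-rhs_len:]     # remove |beta| states from the stack
            z.append(GOTO[z[-1]][lhs])  # push the new state M[q, B]
```

Under these assumptions, the input a*(a+b) corresponds to the call lr_parse(["id", "*", "(", "id", "+", "id", ")"]), which returns True, while an incomplete expression such as ["id", "+"] returns False.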