CS 321 HM LANGUAGES & COMPILER DESIGN FINAL PSU NAME_________________

Final Exam, 300 Points

General Rules: In-class Final. Total time 120 minutes. Open books and open notes. Work alone, no communication with other students during the exam. You may leave the room briefly, one student at a time. Put your name on every separate sheet.

1 (5). Define in words and via samples the exact meaning of “context free”, as in CFG languages.

Context free means there are no two rules a and b which together produce more partial derivations (strings) than the concatenation of a and b would. I.e. there are no productions:

    a  : a_string
    b  : b_string
    ab : new_string

such that “new_string” would be different from “a_string” followed by “b_string”. Equivalently, every production has a single nonterminal on its left-hand side, so a nonterminal is expanded the same way regardless of the symbols surrounding it.

2 (10). The language L1 = L(G1) defined by grammar G1 is similar to the language L2 = L(G2) defined by G2. Which language has more strings? How many strings? Define in English the languages L1 and L2. How many terminals, nonterminals and meta-symbols do they each have? List each set.

    G1: s : ( s ) s
          | lambda

    G2: s : epsilon
          | s ( s )

(2) L1 = L2. So neither has more strings than the other.
(3) The number of strings is infinite.
(2) L1 and L2 are both the language of well-formed parentheses.
(1) Nonterminal is s
(1) Meta-symbols are : and |
(1) Terminals are ( and ), plus lambda (written epsilon in G2), which denotes the empty string

3 (10). Analyze the language defined by grammar G3. List the set of terminals, nonterminals, and meta-symbols. List 4 sample strings in L(G3). Characterize the set of all strings in L(G3).

    G3: s : A b | B a
        a : A | A s | B a a
        b : B | B s | A b b

(1) Set of Terminals = { A, B }
(1) Set of Nonterminals = { s, a, b }
(1) Set of Meta-Symbols = { :, |, and concatenation }
(2) Sample strings = AB, BA, ABBA, AABBABABBBBAAAAB
(5) L(G3) is the set of all non-empty strings with an equal number of A and B symbols

4 (40). Write a Recursive Descent Parser in C for the language L4 = L(G4) with start symbol e below. Functions scan(), Must_Be( Symbol ) and error() are provided.
Global token, which is the next available token of enum type Token_tp, is also provided; you do not need to declare any of these and may simply use them. What is the set of terminals? Give 3 sample strings. Does your parser handle multiple + and * tokens left- or right-associatively? Explain why.

    G4: e : p + e
          | p * e
          | p
        p : N
          | I
          | ( e )

(15)
    // assume: token, error(), Must_Be( Token_tp ), scan()
    void e(); // forward declaration: p() and e() are mutually recursive

    void p()
    { // p
        if ( N_SYM == token ) {
            scan(); // skip N
        } else if ( I_SYM == token ) {
            scan(); // skip I
        } else if ( OPENP_SYM == token ) {
            scan(); // skip (
            e();
            Must_Be( CLOSEDP_SYM );
        } else {
            error( "expected (, N, or I" );
        } //end if
    } //end p

(15)
    void e()
    { // e
        p();
        if ( PLUS_SYM == token ) {
            scan(); // skip '+'
            e();
        } else if ( MULT_SYM == token ) {
            scan(); // skip '*'
            e();
        } //end if
    } //end e

(3) The set of terminal symbols = { (, ), +, *, N, I }
(3) 3 sample strings are: N, N+I * N, N*N*N*N*N*I*I+I+I*N
(4) Multiple * and + are handled right-associatively: after parsing the left operand with p(), e() parses everything to the right of the operator by a recursive call to e(), which mirrors the right-recursive grammar rules.

5 (5). Define, expand, explain: BNF, EBNF, derivation, partial derivation, parse tree.

(1) BNF: Backus Naur Form (AKA Backus normal form), special syntax for meta-languages
(1) EBNF: extended BNF, also uses the meta-symbols { } for repetition 0 or more times
(1) Derivation: rules for expanding a nonterminal toward a final string of terminals
(1) Partial derivation: a derivation that need not (but may) yield a final string of terminals
(1) Parse tree: a graphical representation of a (partial) derivation, equivalent to the derivation

6 (15). Explain via words or pictures how your symbol table handles multi-dimensional arrays. The index type may be any discrete type (such as integer, enumeration, char etc.), and the element type may be any type, including array and record. Draw a plausible symbol table organization for the example below. Invent other samples, if needed to document your symbol table layout.
    type table_tp is array( 1..50, 1..40 ) of character;
    t1 : table_tp;
    t2 : array( 1..10 ) of table_tp;
    t3 : array( 1..50, 1..40 ) of character;
    t4 : array( 1..10 ) of array( 1..50 ) of array( 1..40 ) of character;
    t5 : array( 1..10, 1..50, 1..40 ) of character;
    begin
    . . .

sym_tab[]

    inx  link  name      class     type      scope  address  value  ref  size   other
    1    -     table_tp  type_def  an_array  1      -        -      1    2000   -
    2    1     T1        object    array     1      0        -      1    2000   -
    3    2     T2        object    an_array  1      2000     -      3    20000  -
    4    3     T3        object    an_array  1      22000    -      4    2000   -
    5    4     T4        object    an_array  1      24000    -      6    20000  -
    6    5     T5        object    an_array  1      44000    -      9    20000  -

array_tab[]

    inx  low  high  inx_tp   el_type    el_size  size   ref
    1    1    50    integer  an_array   40       2000   2
    2    1    40    integer  character  1        40     -
    3    1    10    integer  array      2000     20000  1
    4    1    50    integer  an_array   40       2000   5
    5    1    40    integer  character  1        40     -
    6    1    10    integer  an_array   2000     20000  7
    7    1    50    integer  an_array   40       2000   8
    8    1    40    integer  character  1        40     -
    9    1    10    integer  an_array   2000     20000  10
    10   1    50    integer  an_array   40       2000   11
    11   1    40    integer  character  1        40     -

7 (30). Implement in C the function Scan() which recognizes tokens. You can use the predefined char object NextChar and the void functions GetNextChar(), SkipSpace(), and Error(). When Scan() is called, NextChar holds either white space, which is swallowed via SkipSpace(), or an erroneous character to be flagged by Error(), or the first char of one of the legal tokens + += ++ - -= -- with symbolic names { plus, plus_eq, plusplus, min, min_eq, minmin }. Upon completion of Scan(), NextChar holds the character to the right of the longest possible token. In case of EOF a fictitious EOF character is provided, which is white space. Token of type TokenType will hold one of the enumerated values.
(2)
    void Scan()
    { // Scan
        SkipSpace(); // swallow leading white space
(13)
        if ( '+' == NextChar ) {
            GetNextChar();
            if ( '=' == NextChar ) {
                GetNextChar();
                Token = plus_eq;
            } else if ( '+' == NextChar ) {
                GetNextChar();
                Token = plusplus;
            } else {
                Token = plus;
            } //end if
(13)
        } else if ( '-' == NextChar ) {
            GetNextChar();
            if ( '=' == NextChar ) {
                GetNextChar();
                Token = min_eq;
            } else if ( '-' == NextChar ) {
                GetNextChar();
                Token = minmin;
            } else {
                Token = min;
            } //end if
(2)
        } else {
            Error( "+ or - expected" );
        } //end if
    } //end Scan

8 (10). Explain the general rules for writing a recursive-descent parser for an LL(1) language.

(2) Start with a suitable grammar: no left-recursion, few lambda productions, no circular rules
(2) Write a procedure nt() for each defined non-terminal nt
(2) Ensure the first-set of each alternative of any non-terminal is sufficiently unique to determine, via the current token, whether that alternative should be taken
(2) For each terminal T used in a right-hand side, call the procedure must_be( T )
(2) For each non-terminal nt in a right-hand side, call the procedure nt()

9 (60). Write an interpreter for arithmetic expressions of type integer. Ignore overflow and zero-divide errors (20). Dyadic operators (like + and *) have the same meaning as in conventional arithmetic. They also have the same, distinct precedences (15). Multiple operators of the same precedence associate (5), but the exponentiation operator ** does NOT (5). Prove that the associativities are handled correctly in your interpreter, as alluded to in the grammar (5). Show that 3 representative expressions cause the interpreter to compute the correct value (10).
    e     : f [ addop e ]     -- right-to-left
    f     : t { mulop t }     -- left-to-right
    t     : p [ ** p ]        -- non-associative
    p     : NUM | ( e )
    addop : + | -
    mulop : * | /

    // assume: token, scan(), must_be(), error(), power()
    int e(), f(), t(), p(); // forward declarations

    int e()
    { // e
        int val = f();
        if ( token.cl == PLUS_SYM ) {
            scan();
            val += e();
        } else if ( token.cl == MINUS_SYM ) {
            scan(); // skip '-'
            val -= e();
        } //end if
        return val;
    } //end e

    int f()
    { // f
        int val = t();
        while ( token.cl == TIMES_SYM || token.cl == DIV_SYM ) {
            e_tok_tp op = token.cl;
            scan();
            if ( op == TIMES_SYM ) {
                val *= t(); // ignore overflow
            } else {
                val /= t(); // ignore zero-divide
            } //end if
        } //end while
        return val;
    } //end f

    int t()
    { // t
        int val = p();
        if ( token.cl == EXPO_SYM ) {
            scan();
            val = power( val, p() );
        } //end if
        return val;
    } //end t

    int p()
    { // p
        int val = 0; // defined result even after an error
        if ( token.cl == NUM_SYM ) {
            val = token.val;
            scan();
        } else if ( token.cl == OPEN_SYM ) {
            scan();
            val = e();
            must_be( CLOSED_SYM );
        } else {
            error( "Expected number or '('" );
        } //end if
        return val;
    } //end p

    int main()
    {
        printf( "%d\n", e() );
        return 0;
    } //end main

10 (15). Explain the term strongly typed. Explain structural type equivalence. What are advantages of strong typing? What are consequences of weakly typed languages?

(3) In a strongly-typed language 2 objects are type compatible if they are defined by the same type names.
(4) If 2 objects are compatible with one another by virtue of being structured similarly, the kind of equivalence is called structural. This is generally called weak typing.
(4) Strong typing: safer programming, easier check for type equivalence.
(4) Weak typing: easier to write programs, easier to make errors, harder to determine whether 2 types are equivalent.

11 (5). What problem do mutually recursive procedure calls pose for the symbol table of a compiler?

One of the 2 procedures will be referenced forward. Hence the symbol table needs to have a method of handling forward calls.

12 (5). Explain back-patching. When and how is back-patching necessary for program labels?
(2) Back-patching is the completion of an incompletely, tentatively generated machine instruction.
(3) The problem arises when program objects are used but not yet known, for example a branch to a label that is not yet defined at the point of reference.

13 (10). What is alignment? Explain the disadvantage (cost) of alignment. Explain its advantage. Explain why holes are generated by the compiler’s data allocation method. How can alignment requirements become visible to the programmer?

(2) Requirement of an object’s address to be positioned at certain locations.
(2) Disadvantage: wasted memory locations.
(2) Advantage: fast access, due to minimal memory accesses.
(2) The compiler may have to allocate an unused location to align the next object.
(2) Can be visible via compiler directives, to align or NOT to align.

14 (10). A C structure is defined with 6 fields for a 32-bit, byte-addressable target, as shown below. Show a correct, space-efficient method for a C compiler to order the fields. Explain.

    struct test_str {
        char c1;
        int  i1;
        char c2;
        int  i2;
        char c3;
        char c4;
    } test_str_tp;

(3) The only way in C is to preserve the order as is; no re-arrangement is allowed by the language rules.
(4) That is, 4 bytes for c1, 4 bytes for i1, 4 bytes for c2, 4 bytes for i2, 2 bytes for c3 and c4, plus
(3) possibly a hole of 2 bytes, unless another short object follows.

15 (40). Draw a plausible symbol table, block table, and array table situation for the following C-like declarations, taken from a program for a 32-bit, byte-addressable, 4-byte per word target architecture. Assume the first available address in the data area (.data) is 1000 (decimal), and 2000 (decimal) in the code area (.text). Formal parameters start at offset -8 from the base pointer, moving toward lower addresses, while locals start at offset 4 from the base pointer, moving toward higher addresses.
    int i;
    const float PI = 3.1424;
    typedef char[ 80 ] line_tp;
    typedef line_tp[ 50 ] page_tp;
    line_tp line;
    page_tp page;
    int foo( char a, int & b )
    { // foo
        int nest;
    } //end foo
    char glob = 'x';

symtab[] (3) points for each correct row

    inx  name     class      type       scope  address  value   size
    1    i        object     int        1      1000     -       4
    2    PI       const      float      1      -        3.1424  4
    3    line_tp  type_def   an_array   1      -        -       80
    4    page_tp  type_def   array      1      -        -       4000
    5    line     object     array      1      1004     -       80
    6    page     object     array      1      1084     -       4000
    7    foo      function   integer    1      2000     -       xxx
    8    a        val_param  character  2      -8       -       1
    9    b        ref_param  integer    2      -12      -       4
    10   nest     object     integer    2      4        -       4
    11   glob     object     character  1      5084     'x'     1

    link column, values in the order given: 1 2 3 4 5 6 _ 8 _ 6
    ref column, values in the order given:  1 2 9 -

array_tab[] (3) points for each correct row

    inx  low  high  inx_tp   el_tp      el_size  size  ref
    1    0    79    integer  character  1        80    -
    2    0    49    integer  array      80       4000  1

block_tab[] 1 point for correct entries in 2 rows

    inx  first  last
    1    1      11
    2    8      10

16 (10). Given the grammar G5 below with start symbol e, draw the parse tree for the string NUM * NUM ^ ID

    G5: e : f [ + e ]
        f : t [ * f ]
        t : p [ ^ p ]
        p : NUM | ( e ) | ID

17 (10). Generate pseudo assembly code for the following source program segment, in which all objects are integer types.

    i := 199;
    . . .
    if i > 6 then
        i++;
    else
        j := i + 12
    end if;

             move  r1, #199     -- each correct instruction 1 point
             st    r1, [i]
             . . .
             ld    r1, [i]
             cmp   r1, #6
             ble   else1        -- use flags
             inc   [i]          -- also OK: load r1, [i]; add r1, #1; st r1, [i]
             br    end_if1
    else1:   ld    r1, [i]
             add   r1, #12
             st    r1, [j]
    end_if1:

18 (10). The declaration part of a C program for a byte-addressable, 32-bit integer, four-byte per word target defines a structure type that is 5 bytes long. Explain a plausible layout of struct object s1 in the data area. Explain a plausible layout of array s2[] in the data area. Document your assumptions, and explain them. What is the total size of s2[]?
    typedef struct data_tp {
        char c;
        int  i;
    } s_data_tp;

    s_data_tp s1;
    s_data_tp s2[ 100 ];

(2) Assume that 4-byte aligned access is faster than unaligned access
(2) s1 consumes 5 bytes, no more
(2) If another long object follows, then a 3-byte hole may be created
(2) The array s2 uses 3-byte holes after every element
(2) s2[] is 800 - 3 bytes long. The hole MAY be caused by the next element.
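
The layout reasoning in question 18 can be compared with what an actual C compiler does. The following is a minimal sketch, not part of the exam key: the helper names padding_per_element() and report_layout() are made up for illustration, and the numbers printed depend on the compiler and target. Many compilers place the hole between c and i rather than after the struct, so sizeof typically reports 8 bytes per element on a target with 4-byte aligned int, but that is an ABI choice and not guaranteed by the language.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

/* Declaration from question 18. */
typedef struct data_tp {
    char c;
    int  i;
} s_data_tp;

/* Hypothetical helper: padding bytes the compiler in use adds per element. */
size_t padding_per_element(void)
{
    return sizeof(s_data_tp) - (sizeof(char) + sizeof(int));
}

/* Hypothetical helper: print the layout chosen by the compiler in use. */
void report_layout(void)
{
    s_data_tp s2[100];

    /* Guaranteed by C: the first member sits at offset 0, and array
       elements are contiguous, so any per-element hole is part of sizeof. */
    assert(offsetof(s_data_tp, c) == 0);
    assert(sizeof s2 == 100 * sizeof(s_data_tp));

    printf("offset of c      : %zu\n", offsetof(s_data_tp, c));
    printf("offset of i      : %zu\n", offsetof(s_data_tp, i));
    printf("size of one elem : %zu\n", sizeof(s_data_tp));
    printf("padding per elem : %zu\n", padding_per_element());
    printf("size of s2[100]  : %zu\n", sizeof s2);
}
```

Calling report_layout() from main() prints the offsets and sizes for the compiler at hand, which makes the holes discussed above visible without reading the generated assembly.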