Final Exam Solution

CS 321
HM
LANGUAGES & COMPILER DESIGN
FINAL
PSU
NAME_________________
Final Exam, 300 Points
General Rules: In-class Final. Total time 120 minutes. Open books and open notes. Work alone, no
communication with other students during the Final. You may leave the room briefly, one student at a
time. Put your name on every separate sheet.
1 (5). Define in words and via samples the exact meaning of “context free”, as in CFG languages.
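Context free means that the left-hand side of every production is a single nonterminal. That nonterminal can then be replaced by any of its right-hand sides, regardless of the symbols (the context) around it. Hence there are no two rules a and b which, together, produce more partial derivations (strings) than the concatenation of a and b would. I.e., there are no productions such as:
a : a_string
b : b_string
ab : new_string
such that “new_string” would be different from “a_string” followed by “b_string”. A rule like ab : new_string, with 2 symbols on its left-hand side, is not allowed in a CFG.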
2 (10). The language L1 = L(G1) defined by grammar G1 is similar to the language L2 = L(G2)
defined by G2. Which language has more strings? How many strings? Define in English the languages
L1 and L2. How many terminals, nonterminals and meta-symbols do they each have? List each set.
G1: s : ( s ) s | lambda
G2: s : epsilon | s ( s )
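(2) L1 = L2, so neither has more strings than the other.
(3) The number of strings is infinite.
(2) L1 and L2 are both the language of well-formed (balanced) parentheses, including the empty string.
(1) The only nonterminal is s.
(1) Meta-symbols are :, |, and lambda (in G1) or epsilon (in G2), which denotes the empty string.
(1) Terminals are ( and ). Lambda and epsilon are not terminals.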
3 (10). Analyze the language defined by grammar G3. List the set of terminals, nonterminals, and metasymbols. List 4 sample strings in L(G3). Characterize the set of all strings in L(G3).
G3: s : A b | B a
    a : A | A s | B a a
    b : B | B s | A b b
(1) Set of Terminals = { A, B }
(1) Set of Nonterminals = { s, a, b }
(1) Set of Meta-Symbols = { :, |, and concatenation }
(2) Sample strings = AB, BA, ABBA, AABBABABBBBAAAAB
(5) L(G3) is the set of all nonempty strings over { A, B } with an equal number of A and B symbols. Nonterminal a derives exactly the strings with one more A than B, and b those with one more B than A.
4 (40). Write a Recursive Descent Parser in C for the language L4 = L(G4) with start symbol e below.
Functions scan(), Must_Be( Symbol ) and error() are provided. Global token, which is the next
available token of enum type Token_tp, is also provided, i.e. you do not need to define them; you may just
use them. What is the set of terminals? Give 3 sample strings. Does your parser handle multiple + and
* tokens left- or right-associatively? Explain why.
G4:
e : p + e
  | p * e
  | p
p : N
  | I
  | ( e )
(15)
// assume: token, error(), Must_Be( Token_tp ), scan() are provided
void e();                           // forward declaration: p() and e() are mutually recursive

void p()
{ // p
    if ( N_SYM == token ) {
        scan();                     // skip N
    } else if ( I_SYM == token ) {
        scan();                     // skip I
    } else if ( OPENP_SYM == token ) {
        scan();                     // skip (
        e();
        Must_Be( CLOSEDP_SYM );
    } else {
        error( "expected (, N, or I" );
    } //end if
} //end p

(15)
void e()
{ // e
    p();
    if ( PLUS_SYM == token || MULT_SYM == token ) {
        scan();                     // skip '+' or '*'
        e();                        // right operand: the recursion groups to the right
    } // end if
} //end e
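(3) The set of terminal symbols = { (, ), +, *, N, I }
(3) 3 sample strings are: N, N + I * N, N*N*N*N*N*I*I+I+I*N
(4) Multiple * and + are handled right-associatively. After consuming an operator, e() calls itself recursively for the right operand. Everything to the right of the operator therefore becomes its right operand: N + I * N groups as N + ( I * N ), and N * N * N as N * ( N * N ). Since + and * have the same precedence in G4, N * I + N likewise groups as N * ( I + N ).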
5 (5). Define, expand, explain: BNF, EBNF, derivation, partial derivation, parse tree.
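(1) BNF:
Backus-Naur Form (AKA Backus Normal Form), a metalanguage: a notation for writing the productions of a context-free grammar
(1) EBNF:
Extended BNF; additionally uses the meta-symbols { } for inclusion 0 or more times and [ ] for inclusion 0 or 1 time (optional)
(1) Derivation: a sequence of rule applications that rewrites the start symbol, one nonterminal at a time, into a final string of terminals
(1) Partial derivation: a derivation that need not (but may) yield a final string of terminals; it may stop at a string that still contains nonterminals
(1) Parse tree: a graphical (tree) representation of a (partial) derivation. The root is the start symbol, each interior node is a nonterminal, and its children are the right-hand side of the rule applied to it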
6 (15). Explain via words or pictures, how your symbol table handles multi-dimensional arrays. The
index type may be any discrete type (such as integer, enumeration, char etc.), and the element type may
be any type, including array and record. Draw a plausible symbol table organization for the example
below. Invent other samples, if needed to document your symbol table layout.
type table_tp is array( 1..50, 1..40) of character;
t1 : table_tp;
t2 : array( 1..10 ) of table_tp;
t3 : array( 1..50, 1..40 ) of character;
t4 : array( 1..10 ) of array( 1..50 ) of array( 1..40 ) of character;
t5 : array( 1..10, 1..50, 1..40 ) of character;
begin . . .

sym_tab[]
inx  link  name      class     type      scope  address  value  ref  size
1    -     table_tp  type_def  an_array  1      -        -      1    2000
2    1     t1        object    array     1      0        -      1    2000
3    2     t2        object    an_array  1      2000     -      3    20000
4    3     t3        object    an_array  1      22000    -      4    2000
5    4     t4        object    an_array  1      24000    -      6    20000
6    5     t5        object    an_array  1      44000    -      9    20000

array_tab[]
inx  low  high  inx_tp   el_type    el_size  size   ref
1    1    50    integer  an_array   40       2000   2
2    1    40    integer  character  1        40     -
3    1    10    integer  array      2000     20000  1
4    1    50    integer  an_array   40       2000   5
5    1    40    integer  character  1        40     -
6    1    10    integer  an_array   2000     20000  7
7    1    50    integer  an_array   40       2000   8
8    1    40    integer  character  1        40     -
9    1    10    integer  an_array   2000     20000  10
10   1    50    integer  an_array   40       2000   11
11   1    40    integer  character  1        40     -
7 (30). Implement in C the function Scan() which recognizes tokens. You can use the predefined char
object NextChar and the void functions GetNextChar(), SkipSpace(), and Error(). When Scan() is called,
NextChar either holds white space, which is swallowed via SkipSpace(), an erroneous character to be
flagged by Error(), or the first char of one of the legal tokens + += ++ - -= -- with symbolic names
{ plus, plus_eq, plusplus, min, min_eq, minmin }. Upon completion of Scan(), NextChar
holds the character to the right of the longest possible token. In case of EOF, a fictitious EOF character
is provided, which counts as white space. Token of type TokenType will hold one of the enumerated values.
(2) void Scan()
{ // Scan
    SkipSpace();                    // swallow any white space before the token
    // (13)
    if ( '+' == NextChar ) {
        GetNextChar();
        if ( '=' == NextChar ) {
            GetNextChar();
            Token = plus_eq;
        } else if ( '+' == NextChar ) {
            GetNextChar();
            Token = plusplus;
        } else {
            Token = plus;           // NextChar already holds the char right of '+'
        } // end if
    // (13)
    } else if ( '-' == NextChar ) {
        GetNextChar();
        if ( '=' == NextChar ) {
            GetNextChar();
            Token = min_eq;
        } else if ( '-' == NextChar ) {
            GetNextChar();
            Token = minmin;
        } else {
            Token = min;
        } // end if
    // (2)
    } else {
        Error( "+ or - expected" );
    } //end if
} //end Scan
8 (10) Explain the general rules for writing a recursive-descent parser for an LL(1) language.
(2) Start with a suitable grammar: no left recursion, few lambda productions, no circular rules.
(2) Write a procedure nt() for each defined nonterminal nt.
(2) Ensure the first sets of the alternatives of each nonterminal are disjoint, so that the current token alone determines which alternative to take. For a nonterminal that can derive lambda, its first set must also be disjoint from its follow set.
(2) For each terminal T used in the right-hand side, call the procedure must_be( T ).
(2) For each nonterminal nt in the right-hand side, call the procedure nt().
9 (60) Write an interpreter for arithmetic expressions of type integer. Ignore overflow- and zero-divide
errors (20). Dyadic operators (like + and *) have the same meaning as in conventional arithmetic. They
also have the same, distinct precedences (15). Multiple operators of the same precedence associate (5),
but the exponentiation operator ** does NOT (5). Prove that the associativities are handled correctly in
your interpreter, as alluded to in the grammar (5). Show that 3 representative expressions cause the
interpreter to compute the correct value (10).
e     : f [ addop e ]      -- right-to-left
f     : t { mulop t }      -- left-to-right
t     : p [ ** p ]         -- non-associative
p     : NUM | ( e )
addop : + | -
mulop : * | /
#include <stdio.h>
// assume: token, scan(), must_be(), error(), and power( base, exponent ) are provided
int f(), t(), p();                  // forward declarations; p() calls e() recursively

int e()
{ // e
    int val = f();
    if ( token.cl == PLUS_SYM ) {
        scan();
        val += e();
    } else if ( token.cl == MINUS_SYM ) {
        scan();
        val -= e();
    } //end if
    return val;
} //end e

int f()
{ // f
    int val = t();
    while ( token.cl == TIMES_SYM || token.cl == DIV_SYM ) {
        e_tok_tp op = token.cl;
        scan();
        if ( op == TIMES_SYM ) {
            val *= t();             // ignore overflow
        } else {
            val /= t();             // ignore zero-divide
        } //end if
    } // end while
    return val;
} //end f

int t()
{ // t
    int val = p();
    if ( token.cl == EXPO_SYM ) {
        scan();
        val = power( val, p() );
    } // end if
    return val;
} //end t

int p()
{ // p
    int val = 0;
    if ( token.cl == NUM_SYM ) {
        val = token.val;
        scan();
    } else if ( token.cl == OPEN_SYM ) {
        scan();
        val = e();
        must_be( CLOSED_SYM );
    } else {
        error( "Expected number or '('" );
    } // end if
    return val;
} //end p

int main() { scan(); printf( "%d\n", e() ); return 0; } //end main
10 (15). Explain the term strongly typed. Explain structural type equivalence. What are advantages of
strong typing? What are consequences of weakly typed languages?
(3) In a strongly typed language, every type error is detected, at compile time where possible; no
operation is applied to an object of an incompatible type. Strongly typed languages often use name
equivalence: 2 objects are type compatible if they are defined by the same type name.
(4) If 2 objects are compatible with one another by virtue of being structured similarly, the kind
of equivalence is called structural. It is a weaker (more permissive) check than name equivalence.
(4) Advantages of strong typing: safer programming, and an easier check for type equivalence.
(4) Consequences of weak typing: programs are easier to write, but errors are easier to make, and it is
harder to define whether 2 types are equivalent.
11 (5). What problem do mutually recursive procedure calls pose for the symbol-table of a compiler?
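One of the 2 procedures must be called before it has been declared, i.e., one of the calls is a forward
reference. Hence the symbol table needs a method of handling forward references, such as a forward
declaration (a prototype in C) that enters the procedure into the symbol table before the call.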
12 (5). Explain back-patching. When and how is back-patching necessary for program labels?
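(2) Back-patching is the completion of incompletely, tentatively generated machine instructions.
(3) The problem arises when program objects are used but not yet known, for example a
branch to a label that is not yet defined at the point of reference. The compiler emits the branch with a
placeholder target and records the location of that instruction. Once the label is defined, the compiler
goes back and patches every recorded instruction with the label's address.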
13 (10). What is alignment? Explain the disadvantage (cost) of alignment. Explain its advantage.
Explain why holes are generated by the compiler’s data allocation method. How can alignment
requirements become visible to the programmer?
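(2) Alignment: the requirement that an object's address be positioned at certain locations, typically a multiple of the object's size (e.g., 4 for a 32-bit int).
(2) Disadvantage: wasted memory locations (holes).
(2) Advantage: fast access, because an aligned object can be fetched with the minimal number of memory accesses.
(2) Holes: to align the next object, the compiler may have to leave one or more unused locations after the previous one.
(2) Visibility: via compiler directives to align or NOT to align (pack) data, and via the sizes and offsets of structures.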
14 (10). A C structure is defined with 6 fields for a 32-bit, byte-addressable target, as shown below.
Show a correct, space-efficient method for a C compiler to order the fields. Explain.
struct test_str {
    char c1;
    int  i1;
    char c2;
    int  i2;
    char c3;
    char c4;
} test_str_tp;
(3) The only way in C is to preserve the order as is. The language rules require fields to be allocated in declaration order, so no rearrangement is allowed.
(4) This gives 4 bytes for c1 (1 byte plus a 3-byte hole), 4 bytes for i1, 4 bytes for c2 (again with a 3-byte hole), 4 bytes for i2, and 2 bytes for c3 and c4, plus
(3) possibly a hole of 2 bytes, unless another short object follows.
15 (40). Draw a plausible symbol table, block table, and array table situation for the following C-like
declarations, taken from a program for a 32-bit, byte-addressable, 4-byte per word target architecture.
Assume the first available address in the data area (.data) is 1000 (decimal), and 2000 (decimal) in the code area
(.text). Formal parameters start at offset -8 from the base pointer, moving toward lower addresses,
while locals start at offset 4 from the base pointer, moving toward higher addresses.
int i;
const float PI = 3.1424;
typedef char[ 80 ] line_tp;
typedef line_tp[ 50 ] page_tp;
line_tp line;
page_tp page;
int foo( char a, int & b )
{ // foo
    int nest;
} //end foo
char glob = 'x';
symtab[] (3 points for each correct row)
inx  name     class      type       scope  address  value   size
1    i        object     int        1      1000     -       4
2    PI       const      float      1      -        3.1424  4
3    line_tp  type_def   an_array   1      -        -       80
4    page_tp  type_def   array      1      -        -       4000
5    line     object     array      1      1004     -       80
6    page     object     array      1      1084     -       4000
7    foo      function   integer    1      2000     -       xxx
8    a        val_param  character  2      -8       -       1
9    b        ref_param  integer    2      -12      -       4
10   nest     object     integer    2      4        -       4
11   glob     object     character  1      5084     'x'     1
link column entries: 1, 2, 3, 4, 5, 6, _, 8, _, 6
ref column entries: 1, 2, 9, -

array_tab[] (3 points for each correct row)
inx  low  high  inx_tp   el_tp      el_size  size  ref
1    0    79    integer  character  1        80    -
2    0    49    integer  array      80       4000  1

block_tab[] (1 point for correct entries in 2 rows)
inx  first  last
1    1      11
2    8      10
16 (10). Given the grammar G5 below with start symbol e, draw the parse tree for the string NUM *
NUM ^ ID
G5:
e : f [ + e ]
f : t [ * f ]
t : p [ ^ p ]
p : NUM | ( e ) | ID
17 (10). Generate pseudo assembly code for the following source program segment, in which all
objects are integer types.
i := 199;
. . .
if i > 6 then
    i++;
else
    j := i + 12;
end if;

-- each correct instruction 1 point
        move  r1, #199
        st    r1, [i]
        . . .
        ld    r1, [i]
        cmp   r1, #6
        ble   else1           -- use flags
        inc   [i]             -- also OK: ld r1, [i]; add r1, #1; st r1, [i]
        br    endif1
else1:
        ld    r1, [i]
        add   r1, #12
        st    r1, [j]
endif1:
18 (10). The declaration part of a C program for a byte-addressable, 32-bit integer, four-byte per word
target defines a structure type that is 5 bytes long. Explain a plausible layout of struct object s1 in the
data area. Explain a plausible layout of array s2[] in the data area. Document your assumptions, and
explain them. What is the total size of s2[]?
typedef struct data_tp
{
    char c;
    int  i;
} s_data_tp;
s_data_tp s1;
s_data_tp s2[ 100 ];
(2) Assume that 4-byte aligned access is faster than unaligned access, and that the struct itself is packed into 5 bytes, as stated.
(2) s1 consumes 5 bytes, no more.
(2) If another long (4-byte) object follows, then a 3-byte hole may be created.
(2) The array s2 uses a 3-byte hole after every element except the last, so that every element starts on a 4-byte boundary.
(2) s2[] is 800 - 3 = 797 bytes long. The last 3-byte hole MAY be caused by the next object.