Adding structures to YAL

CS 5300
Compiler Design
Fall 2006
Structures in YAL – some thoughts
I. Possible syntax additions to YAL for Structures
ids     :  idref | idref LBRAC vallist RBRACK ;
idref   :  id | idref DOT ids ;
struct  :  STRUCTURE varlist ENDSTRUCT ;
type    :  INTEGER | REAL | STRING | struct ;
* note that this example is both left and right recursive and may not be LR(1) – no matter
II. Possible semantic interpretation (method #1)
The variable ‘object’ represents a specific variable instance.
Example
VARIABLES                      { declaration section }
   a : INTEGER ;
   object : STRUCTURE
      b : REAL ;
      c : INTEGER [5] ;
   ENDSTRUCT;
   d : REAL;
ENDVARS
...
read (object.b);               { code section }
object.c[a] := 99;
...
A. Symbol table representation
I’ll use ‘T’ for a structure type. The member variables of a structure are added to the
SBT in the order they are encountered. They have no level. As such, they can only be
found in a standard SBT search by finding the structure variable first, then searching
down to find the appropriate member variable. A variable of type ‘T’ requires no
frame storage – only its members! As such, it will always have the same offset as the
first member variable. We’ll add another column to represent the byte size of each
variable cell:
slot  level  name    type  offset  cellsize  dim
...
5     2      a       I     8       4         0
6     2      object  T     12      24        0
7     n/a    b       R     12      4         0
8     n/a    c       I     16      4         -->1:5
9     2      d       R     36      4         0
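The layout rule above can be sketched in a few lines of Python. This is only an illustration, not part of YAL: the function `layout`, the 4-byte `CELL` constant, and the use of a plain element count in place of the `-->1:5` dimension reference are all assumptions made for the sketch.

```python
# Hypothetical sketch of the method #1 layout; all names are illustrative.
CELL = 4  # assumed cell size in bytes for INTEGER and REAL, as in the table


def layout(decls, first_slot, level, start_offset):
    """Return SBT rows (slot, level, name, type, offset, cellsize, dim).

    decls holds (name, type, dim) for simple variables, or
    (name, "T", members) for a structure, where members is a list of
    (name, type, dim). dim is an element count here (0 means scalar).
    """
    rows, slot, offset = [], first_slot, start_offset
    for name, typ, extra in decls:
        if typ == "T":
            size = sum(CELL * max(dim, 1) for _, _, dim in extra)
            # The structure needs no storage of its own: it shares the
            # offset of its first member and records the total size.
            rows.append((slot, level, name, "T", offset, size, 0))
            slot += 1
            for m_name, m_type, m_dim in extra:
                # Members have no level (None stands for n/a).
                rows.append((slot, None, m_name, m_type, offset, CELL, m_dim))
                slot += 1
                offset += CELL * max(m_dim, 1)
        else:
            rows.append((slot, level, name, typ, offset, CELL, extra))
            slot += 1
            offset += CELL * max(extra, 1)
    return rows


decls = [
    ("a", "I", 0),
    ("object", "T", [("b", "R", 0), ("c", "I", 5)]),
    ("d", "R", 0),
]
table = layout(decls, first_slot=5, level=2, start_offset=8)
for row in table:
    print(row)
```

With the example declarations, the offsets come out as 8, 12, 12, 16, 36 and ‘object’ gets a cell size of 24, matching the table.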
B. Assignment and reference
1. object.b := x;
   {First, ‘object’ is found with a standard SBT search.
   Next, ‘b’ is found by searching down from ‘object’ for
   members of ‘object’. When ‘b’ is found, tuples are
   generated to copy 4 bytes of REAL from ‘x’ into ‘b’ at
   offset 12 in the current frame.}
2. object := y;
   {First, ‘object’ is found. Next, tuples are generated to
   copy 24 bytes from ‘y’s offset into ‘object’.}
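The two lookups described above might be sketched as follows. The row layout, the helper names `find_variable`, `find_member`, and `copy_plan`, and the use of `None` for the n/a level are assumptions made for illustration; a real SBT search would also respect scope levels.

```python
# Rows: (slot, level, name, type, offset, cellsize); level None marks a member.
SBT = [
    (5, 2, "a", "I", 8, 4),
    (6, 2, "object", "T", 12, 24),
    (7, None, "b", "R", 12, 4),
    (8, None, "c", "I", 16, 4),
    (9, 2, "d", "R", 36, 4),
]


def find_variable(name):
    """Standard search: only entries that have a level are visible."""
    for index, row in enumerate(SBT):
        if row[1] is not None and row[2] == name:
            return index
    raise KeyError(name)


def find_member(struct_name, member_name):
    """Find the structure first, then search down through its members."""
    index = find_variable(struct_name)
    if SBT[index][3] != "T":
        raise TypeError(f"{struct_name} is not a structure")
    for row in SBT[index + 1:]:
        if row[1] is not None:  # reached the next ordinary variable
            break
        if row[2] == member_name:
            return row
    raise KeyError(f"{struct_name}.{member_name}")


def copy_plan(target):
    """Return (frame offset, byte count) for an assignment target."""
    if "." in target:
        struct_name, member_name = target.split(".")
        row = find_member(struct_name, member_name)
    else:
        row = SBT[find_variable(target)]
    return row[4], row[5]


print(copy_plan("object.b"))  # member assignment
print(copy_plan("object"))    # whole-structure assignment
```

Because members have no level, a direct search for ‘b’ fails; it can only be reached through ‘object’.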
C. Parameter passing
Suppose I introduce another data type ‘Ta’ representing a reference to a structure. The
difficulty in using structure references is associated with verifying members and
determining their proper offsets in a frame. In this method (#1), structures are actual
variables rather than types. As such, each structure has its own members. You cannot
declare two structures to be the same. They may have the same member types, but
they are still distinct. Any structure argument could be used for any structure
parameter! At run-time, the program cannot verify that an argument structure and a
parameter structure are the same.
We can partially solve this problem by including the structure portions of the SBT
in the run-time program as a static table. A parameter of type ‘Ta’ would utilize two
cells in an activation record: the normal address of the reference and the SBT entry of
the corresponding argument structure.
Declare a generic structure parameter that is a reference to any structure variable:
struct  :  STRUCTURE varlist ENDSTRUCT | STRUCTPARAM ;
Example: PROCEDURE Myfunc ( z : STRUCTPARAM )...   { function declaration }
         ...
         a := Myfunc ( object );                   { calling function }
         ...
Now suppose we have an assignment to ‘z’:
z.b := 3.14;
To find where to store the 3.14, ‘z’ references the SBT slot of the actual structure. The
compiler would add a boiler-plate or library function to the program to verify that ‘b’
is a valid member of the corresponding argument and determine ‘b’s offset in the
appropriate frame. I might define a RESOLVE tuple to call this library function and
calculate the offset. The function call then becomes:
PUSH     object   1    0     { reference to SBT slot 6 }
CALL     Myfunc   0    0
POP      0        0    -1
The above assignment statement might look like this:
ASGN     3.14     0    -1
RESOLVE  z        b    -2    { store the address of z.b into -2 }
IR       -1       -2   0
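A minimal sketch of what the library function behind RESOLVE might do at run time is given below. Everything in it is an assumption made for illustration: the name `resolve`, the shape of `STRUCT_TABLE`, the invented second structure in slot 20, and the sample addresses. The only facts taken from the example are slot 6 and the frame offsets of ‘object’, ‘b’, and ‘c’.

```python
# Static copy of the structure portions of the SBT, keyed by slot number.
# Offsets are frame offsets, as stored in the SBT under method #1.
STRUCT_TABLE = {
    6: {"offset": 12, "members": {"b": 12, "c": 16}},   # 'object'
    20: {"offset": 40, "members": {"k": 40, "b": 44}},  # invented structure
}


def resolve(param, member):
    """Hypothetical run-time helper for the RESOLVE tuple.

    param is the two-cell STRUCTPARAM value: (address of the argument,
    SBT slot of the argument). Returns the address of the member, or
    raises if the argument structure has no such member.
    """
    address, slot = param
    entry = STRUCT_TABLE[slot]
    if member not in entry["members"]:
        raise AttributeError(f"'{member}' is not a member of slot {slot}")
    # The member's distance from the start of its structure is the same
    # in any frame, so add it to the address that was passed in.
    return address + (entry["members"][member] - entry["offset"])


print(resolve((1012, 6), "b"))   # 'b' is the first member of 'object'
print(resolve((2040, 20), "b"))  # 'b' sits 4 bytes into the other structure
```

The same member name resolves to a different displacement depending on which structure was passed, which is why the lookup cannot be done at compile time under this method.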
D. Discussion
The structure portion of the SBT must be included in the run-time program. Think
about it – each argument to a STRUCTPARAM may have a different offset for a
member variable ‘b’.
This approach is also slow since compiler work is put off until run-time. In other
words, every RESOLVE tuple costs execution time. On the other hand, it is very
versatile since a function can be written to accept an argument of any structure type.
Also, structures have scope.
III. Possible semantic interpretation (method #2 – more traditional)
The name ‘object’ now represents a new type. As such, it cannot be used as a variable.
Specific variable instances are then declared as type ‘object’:
thisobject : object;
A. Symbol table representation
To make this clear, I’ll put my structure types into a separate Structures table:
slot  name    type  offset  cellsize  dim
1     object  T     0       24        0
2     b       R     0       4         0
3     c       I     4       4         -->1:5
My SBT now looks like this:
slot  level  name        type      offset  cellsize  dim
...
5     2      a           I         8       4         0
6     2      thisobject  <object>  12      24        0
7     2      d           R         36      4         0
Note the following:
1. a variable of type ‘object’ requires 24 bytes
2. within the Structures table, offsets refer to the start of the object (not the frame)
B. Assignment and reference
1. thisobject.b := x;
   {First, ‘thisobject’ is found with a standard SBT search.
   Next, the offset for ‘b’ is found in the Structures table.
   The frame offset of ‘thisobject.b’ is then calculated as the
   frame offset of ‘thisobject’ plus the structure offset of
   ‘b’.}
2. thisobject := y;
   {First, the compiler would verify that both ‘y’ and
   ‘thisobject’ are the same type (<object>). Next, 24 bytes
   would be copied from ‘y’s offset into ‘thisobject’s
   offset.}
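The compile-time work for both statements can be sketched as follows. The dictionary shapes and the helper names `struct_of`, `member_frame_offset`, and `struct_copy` are assumptions made for illustration, and the entry for ‘y’ at offset 40 is invented, since the tables above do not list it.

```python
# Structures table: type name -> total size and (member type, offset in struct).
STRUCTURES = {
    "object": {"size": 24, "members": {"b": ("R", 0), "c": ("I", 4)}},
}

# SBT: variable name -> (type, frame offset, cellsize).
SBT = {
    "a": ("I", 8, 4),
    "thisobject": ("<object>", 12, 24),
    "d": ("R", 36, 4),
    "y": ("<object>", 40, 24),  # assumed: not listed in the tables above
}


def struct_of(var):
    """Return the structure type name of a variable, e.g. 'object'."""
    var_type = SBT[var][0]
    if not (var_type.startswith("<") and var_type.endswith(">")):
        raise TypeError(f"{var} is not a structure variable")
    return var_type[1:-1]


def member_frame_offset(var, member):
    """Frame offset of var.member = frame offset of var + offset in struct."""
    members = STRUCTURES[struct_of(var)]["members"]
    if member not in members:
        raise AttributeError(f"{member} is not a member of {struct_of(var)}")
    member_type, member_offset = members[member]
    return member_type, SBT[var][1] + member_offset


def struct_copy(dst, src):
    """Type-check a whole-structure assignment; return (from, to, bytes)."""
    if struct_of(dst) != struct_of(src):
        raise TypeError(f"{dst} and {src} are different structure types")
    return SBT[src][1], SBT[dst][1], STRUCTURES[struct_of(dst)]["size"]


print(member_frame_offset("thisobject", "b"))
print(struct_copy("thisobject", "y"))
```

Both results are plain constants known to the compiler, so the generated tuples need no table search when the program runs.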
C. Parameter passing
Now, the parameter ‘z’ above can be a true pointer to the corresponding argument
since they are the exact same type. Consider again the assignment to the above
parameter ‘z’:
z.b := 3.14;
The compiler can now verify that ‘b’ is a member of ‘object’. It can also find the
offset of member ‘b’ within an ‘object’ type. To determine where to store the 3.14
value, the tuples or assembly code would add the offset of member ‘b’ (0) to the
pointer value of ‘z’. No symbol table lookup is required at runtime.
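The pointer-plus-offset store can be imitated with a byte array standing in for memory. This is a toy model under stated assumptions: the `memory` array, the helpers `store_real` and `load_real`, the address 12 for the argument, and the choice of a 4-byte little-endian float for REAL are all illustrative, not part of YAL.

```python
import struct

OFFSET_B = 0             # offset of 'b' within 'object' (Structures table)
memory = bytearray(64)   # toy stand-in for the run-time stack


def store_real(pointer, member_offset, value):
    """Store a 4-byte REAL at pointer + member_offset."""
    struct.pack_into("<f", memory, pointer + member_offset, value)


def load_real(pointer, member_offset):
    """Read back the 4-byte REAL at pointer + member_offset."""
    return struct.unpack_from("<f", memory, pointer + member_offset)[0]


# 'z' holds the address of the argument; 12 is assumed for this sketch.
z = 12
store_real(z, OFFSET_B, 3.14)   # z.b := 3.14
print(load_real(z, OFFSET_B))   # close to 3.14 (single precision)
```

The offset is a constant baked into the generated code, so the store is a single address addition with no symbol table involved.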
D. Discussion
This method is faster, uses less memory, and gives good type checking, but it is less
versatile. Structures have no scope.
This is the model used in C++, Java, and Pascal.