CS 5300 Compiler Design                                        Fall 2006

Structures in YAL - some thoughts

I. Possible syntax additions to YAL for Structures

   ids    : idref | idref LBRAC vallist RBRACK ;
   idref  : id | idref DOT ids ;
   struct : STRUCTURE varlist ENDSTRUCT ;
   type   : INTEGER | REAL | STRING | struct ;

   * note that this example is both left and right recursive and may not be LR(1) - no matter

II. Possible semantic interpretation (method #1)

The variable ‘object’ represents a specific variable instance.

Example

   VARIABLES                          { declaration section }
      a : INTEGER ;
      object : STRUCTURE
         b : REAL ;
         c : INTEGER [5] ;
      ENDSTRUCT;
      d : REAL;
   ENDVARS
   ...
   read (object.b);                   { code section }
   object.c[a] := 99;
   ...

A. Symbol table representation

I’ll use ‘T’ for a structure type. The member variables of a structure are added to the SBT in the order they are encountered. They have no level. As such, they can only be found in a standard SBT search by finding the structure variable first, then searching down to find the appropriate member variable. A variable of type ‘T’ requires no frame storage, only its members! As such, it will always have the same offset as the first member variable. We’ll add another column to represent the byte size of each variable cell:

   slot   level   name     type   offset   cellsize   dim
   ...
   5      2       a        I      8        4          0
   6      2       object   T      12       24         0
   7      n/a     b        R      12       4          0
   8      n/a     c        I      16       4          -->1:5
   9      2       d        R      36       4          0

B. Assignment and reference

   1. object.b := x;
      {First, ‘object’ is found with a standard SBT search. Next, ‘b’ is found by searching down from ‘object’ for members of ‘object’. When ‘b’ is found, tuples are generated to copy 4 bytes of REAL from ‘x’ into ‘b’ at offset 12 in the current frame.}

   2. object := y;
      {First, ‘object’ is found. Next, tuples are generated to copy 24 bytes from ‘y’s offset into ‘object’.}

C. Parameter passing

Suppose I introduce another data type ‘Ta’ representing a reference to a structure.
The difficulty in using structure references is associated with verifying members and determining their proper offsets in a frame. In this method (#1), structures are actual variables rather than types. As such, each structure has its own members. You cannot declare two structures to be the same. They may have the same member types, but they are still distinct. Any structure argument could be used for any structure parameter! At run-time, the program cannot verify that an argument structure and a parameter structure are the same.

We can partially solve this problem by including the structure portions of the SBT into the run-time program as a static table. A parameter of type ‘Ta’ would utilize two cells in an activation record: the normal address of the reference and the SBT entry of the corresponding argument structure.

Declare a generic structure parameter that is a reference to any structure variable:

   struct : STRUCTURE varlist ENDSTRUCT | STRUCTPARAM ;

Example:

   PROCEDURE Myfunc ( z : STRUCTPARAM )...     { function declaration }
   ...
   a := Myfunc ( object );                     { calling function }
   ...

Now suppose we have an assignment to ‘z’:

   z.b := 3.14;

To find where to store the 3.14, ‘z’ references the SBT slot of the actual structure. The compiler would add a boiler-plate or library function to the program to verify that ‘b’ is a valid member of the corresponding argument and determine ‘b’s offset in the appropriate frame. I might define a RESOLVE tuple to call this library function and calculate the offset. The function call then becomes:

   PUSH      object   1    0       { reference to SBT slot 6 }
   CALL      Myfunc   0    0
   POP       0        0    -1

The above assignment statement might look like this:

   ASGN      3.14     0    -1
   RESOLVE   z        b    -2      { store the address of z.b into -2 }
   IR        -1       -2   0

D. Discussion

The structure portion of the SBT must be included into the run-time program. Think about it: each argument to a STRUCTPARAM may have a different offset for a member variable ‘b’.
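The point about differing offsets can be made concrete with a minimal Python sketch of what the library function behind RESOLVE might do, assuming the structure portion of the SBT is kept as a static table at run-time. The table layout, the names RUNTIME_SBT and resolve, and the second structure at slot 11 are all hypothetical and exist only for illustration; they are not part of YAL.

```python
# Hypothetical run-time copy of the structure portion of the SBT.
# Key: SBT slot of a structure variable. Value: member name -> frame offset.
RUNTIME_SBT = {
    6: {"b": 12, "c": 16},    # 'object' from the example (SBT slot 6)
    11: {"k": 40, "b": 44},   # an invented second structure, for contrast
}


def resolve(struct_slot, member):
    """Return the frame offset of `member` in the structure at `struct_slot`.

    Raises LookupError when the structure has no such member. Under
    method #1 this check can only happen at run-time.
    """
    members = RUNTIME_SBT[struct_slot]
    if member not in members:
        raise LookupError(
            f"no member {member!r} in structure at slot {struct_slot}"
        )
    return members[member]


# The same source text 'z.b' resolves differently for each argument.
offset_for_object = resolve(6, "b")
offset_for_other = resolve(11, "b")
```

In this sketch the member name ‘b’ yields offset 12 for one argument and 44 for the other, so the offset cannot be folded into a compile-time constant under method #1.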
This approach is also slow since compiler work is put off until run-time. In other words, every RESOLVE tuple costs execution time. On the other hand, it is very versatile since a function can be written to accept an argument of any structure type. Also, structures have scope.

III. Possible semantic interpretation (method #2 - more traditional)

The variable ‘object’ represents a new type. As such, it cannot be used as a variable. Specific variable instances are then declared as type ‘object’:

   thisobject : object;

A. Symbol table representation

To make this clear, I’ll put my structure types into a separate Structures table:

   slot   name     type   offset   cellsize   dim
   1      object   T      0        24         0
   2      b        R      0        4          0
   3      c        I      4        4          -->1:5

My SBT now looks like this:

   slot   level   name         type       offset   cellsize   dim
   ...
   5      2       a            I          8        4          0
   6      2       thisobject   <object>   12       24         0
   7      2       d            R          36       4          0

Note the following:

   1. a variable of type ‘object’ requires 24 bytes
   2. within the Structures table, offsets refer to the start of the object (not the frame)

B. Assignment and reference

   1. thisobject.b := x;
      {First, ‘thisobject’ is found with a standard SBT search. Next, the offset for ‘b’ is found in the Structures table. The frame offset of ‘thisobject.b’ is then calculated as the frame offset of ‘thisobject’ plus the structure offset of ‘b’.}

   2. thisobject := y;
      {First, the compiler would verify that both ‘y’ and ‘thisobject’ are the same type (<object>). Next, 24 bytes would be copied from ‘y’s offset into ‘thisobject’s offset.}

C. Parameter passing

Now, the parameter ‘z’ above can be a true pointer to the corresponding argument since they are the exact same type. Consider again the assignment to the above parameter ‘z’:

   z.b := 3.14;

The compiler can now verify that ‘b’ is a member of ‘object’. It can also find the offset of member ‘b’ within an ‘object’ type. To determine where to store the 3.14 value, the tuples or assembly code would add the offset of member ‘b’ (0) to the pointer value of ‘z’.
No symbol table lookup is required at runtime.

D. Discussion

Method #2 is faster, uses less memory, and gives good type checking, but it is less versatile. Structures have no scope. This is the model used in C++, Java, and Pascal.
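As a rough illustration of the compile-time arithmetic described for method #2, the Python sketch below models the Structures table and the relevant SBT rows and computes member offsets from them. The dictionary layout and the function names member_frame_offset and member_address are assumptions made for this example, not part of YAL.

```python
# Structures table from method #2: member offsets are relative to the
# start of the object, not the frame. Layout is a hypothetical encoding.
STRUCTURES = {
    "object": {"size": 24, "members": {"b": ("R", 0), "c": ("I", 4)}},
}

# Relevant SBT rows: name -> (type, frame offset).
SBT = {
    "a": ("I", 8),
    "thisobject": ("object", 12),
    "d": ("R", 36),
}


def member_frame_offset(var, member):
    """Compile-time frame offset of var.member, or an error if invalid."""
    type_name, frame_offset = SBT[var]
    if type_name not in STRUCTURES:
        raise TypeError(f"{var!r} is not a structure variable")
    members = STRUCTURES[type_name]["members"]
    if member not in members:
        raise LookupError(f"type {type_name!r} has no member {member!r}")
    _, member_offset = members[member]
    return frame_offset + member_offset


def member_address(pointer, type_name, member):
    """Address of a member reached through a pointer parameter such as 'z'."""
    _, member_offset = STRUCTURES[type_name]["members"][member]
    return pointer + member_offset
```

With these tables, thisobject.b resolves to frame offset 12 + 0 = 12 and thisobject.c to 12 + 4 = 16, the same frame offsets that method #1 stored directly in the SBT. For a pointer parameter, only the constant member offset has to be added to the pointer value when the program runs.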