Synthesis and the Parse Tree
© Allan C. Milne, Abertay University, v14.6.16

Agenda.
• Synthesis.
• Parse Trees.
• An Example.
• Representing The Tree.

Synthesis.
• The final phase of the compiler is to generate an artefact.
• This requires synthesis of contextual information on the syntactic structure of the program and the meaning of user-defined names.
• This information is exposed to the synthesis phase via
  – the parse tree for the syntactic structure; and
  – the symbol table for the identifier context.

What Do We Know So Far?
• How to perform parsing.
• How to create a symbol table.
• Sometimes synthesis can be performed as the parse proceeds.
• If so, then no parse tree is required and appropriate artefact generation functions can be called directly from the action code associated with productions in the Yacc script.

What Might We Need To Know?
• However, synthesis often requires knowledge of the entire program structure before artefact generation can proceed.
  – This structure is represented by the parse tree.
• This latter approach requires us to know
  – how to represent a parse tree;
  – how to build the tree; and
  – how to then process the tree.

The Parse Tree.
• This is a tree data structure representing the syntactic structure of the input program.
• It effectively represents the derivation sequence of the program.
• The root of the tree is the starter symbol.
• The branches are the elements of the production being applied.
• Non-terminal elements have, in turn, their own branches.
• The leaves are the terminal tokens of the source program.

Input: ( ant, dog )
[Parse tree figure. Node labels as extracted: <AnimalList> ( <MoreAnimals> <Animal> tANT , <Animal> tDOG )]

Input: ( ant, dog )
• In representing a parse tree we often omit the terminal symbols that are ‘noise’ (syntactic sugar).
[Parse tree figure. Node labels as extracted: <AnimalList> ( <MoreAnimals> <Animal> tANT , <Animal> tDOG )]

GenVal Examples.
• For GenVal, the parse tree does not need to reflect the <Declarations> part of a script.
  – The <Declarations> part processing constructs the symbol table as the parse proceeds.
  – The compiler does not need to refer back to this parse, only to the symbol table.
  – We can therefore start the tree from <StatementSequence>.
• Terminal keywords and punctuation will not be represented in the parse tree except where significant.

Input: Generate 2 integer values from 0 to limit*6
[Parse tree figure. Node labels as extracted: <StatementSequence> <Statement> <Generator> <Expression> <Type> <Range> tNUMBER (2) tINTEGER <Expression> tNUMBER (0) <Expression> tIDENTIFIER “Limit” * <Expression> <Expression> tNUMBER (6)]

Representing The Parse Tree.
• Use a child/sibling pointer model of the tree.
• A node represents a non-terminal or terminal symbol.
• The child pointer of a non-terminal node points to the first node of a list of nodes representing the elements making up the production of the non-terminal.
• The sibling pointer of any node points to the node representing the next element of the production being applied.

The Parse Tree Node.
• Represented by a struct containing
  – the type of the node;
  – the value associated with the node (only valid for terminal nodes);
  – the pointers associated with the child/sibling tree structure:
      - a pointer to the start node of the sub-branch defining the structure (for a non-terminal);
      - a pointer to the next sibling node of the production for the parent node.

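    struct treeNode {
        int type;                    /* the type of the node */
        union data {                 /* only valid for terminal nodes */
            double dblValue;
            char *strValue;
        } value;
        struct treeNode *structure;  /* first node of the sub-branch (non-terminal) */
        struct treeNode *next;       /* next sibling in the parent's production */
    };
    typedef struct treeNode parseNode;

So We Have …
[Node diagram; a pointer is shown as "→ target".]
Type: NTStatementSequence   Value:     Structure: → NTStatement    Next: null
Type: NTStatement           Value:     Structure: → NTGenerator    Next: null
Type: NTGenerator           Value:     Structure: → NTExpression   Next: null
Type: NTExpression          Value:     Structure: (continued)      Next: → NTType
Type: NTType                Value:     Structure: (continued)      Next: → NTRange
Type: NTRange               Value:     Structure: (continued)      Next: null

… continued
Type: NTExpression          Value:     Structure: → NTNumber       Next: → NTType
Type: NTNumber              Value: 2   Structure: null             Next: null
Type: NTType                Value:     Structure: → NTInteger      Next: → NTRange
Type: NTInteger             Value:     Structure: null             Next: null
Type: NTRange               Value:     Structure: … exercise for the student   Next: null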
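
Building And Walking The Tree (sketch).
The following is a minimal sketch, not part of the original slides, of how the child/sibling node might be built and processed in C. The node type codes and the helpers newNode, addChild, buildExample, countNodes, printTree and freeTree are hypothetical names chosen for illustration. The <Range> subtree is left empty, matching the exercise above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node type codes; the slides only show the names. */
enum nodeType { NTStatementSequence, NTStatement, NTGenerator, NTExpression,
                NTType, NTRange, NTNumber, NTInteger };

static const char *typeNames[] = { "NTStatementSequence", "NTStatement",
    "NTGenerator", "NTExpression", "NTType", "NTRange", "NTNumber", "NTInteger" };

/* Node layout as given in the slides. */
struct treeNode {
    int type;
    union data { double dblValue; char *strValue; } value;
    struct treeNode *structure;  /* first child */
    struct treeNode *next;       /* next sibling */
};
typedef struct treeNode parseNode;

/* Allocate a node with no children and no siblings. */
parseNode *newNode(int type) {
    parseNode *n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->type = type;
    n->value.dblValue = 0.0;
    n->structure = NULL;
    n->next = NULL;
    return n;
}

/* Append child as the last element of parent's production; returns child. */
parseNode *addChild(parseNode *parent, parseNode *child) {
    assert(parent != NULL && child != NULL);
    if (parent->structure == NULL) {
        parent->structure = child;
    } else {
        parseNode *last = parent->structure;
        while (last->next != NULL) last = last->next;
        last->next = child;
    }
    return child;
}

/* Build the part of the example tree drawn in the node diagram. */
parseNode *buildExample(void) {
    parseNode *seq   = newNode(NTStatementSequence);
    parseNode *stmt  = addChild(seq, newNode(NTStatement));
    parseNode *gen   = addChild(stmt, newNode(NTGenerator));
    parseNode *count = addChild(gen, newNode(NTExpression));
    parseNode *type  = addChild(gen, newNode(NTType));
    addChild(gen, newNode(NTRange));           /* subtree left as the exercise */
    addChild(count, newNode(NTNumber))->value.dblValue = 2;
    addChild(type, newNode(NTInteger));
    return seq;
}

/* Count every node reachable from n via child and sibling pointers. */
int countNodes(const parseNode *n) {
    int count = 0;
    for (; n != NULL; n = n->next)
        count += 1 + countNodes(n->structure);
    return count;
}

/* Print the tree as an indented outline, two spaces per level. */
void printTree(const parseNode *n, int depth) {
    for (; n != NULL; n = n->next) {
        printf("%*s%s", depth * 2, "", typeNames[n->type]);
        if (n->type == NTNumber) printf(" (%g)", n->value.dblValue);
        putchar('\n');
        printTree(n->structure, depth + 1);
    }
}

/* Release a node, its children and its following siblings. */
void freeTree(parseNode *n) {
    while (n != NULL) {
        parseNode *next = n->next;
        freeTree(n->structure);
        free(n);
        n = next;
    }
}
```

• Calling printTree(buildExample(), 0) from a main function prints the tree as an indented outline.
• Each traversal follows the same pattern: loop along the next pointers and recurse into structure, which is one way to "process the tree" once it is built.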