ELIMINATING UNIT PRODUCTIONS

Definition. A unit production is a production that we wish to eliminate whose right-hand side consists of a single symbol. We’ll abbreviate it as a “unit prod”.

Example

B_EXPR → B_EXPR OR B_FACT | B_FACT
B_FACT → B_FACT AND B_SEC | B_SEC
B_SEC → NOT B_PRIM | B_PRIM
B_PRIM → ARITH_EXPR = ARITH_EXPR

In the above grammar the three unit prods are B_EXPR → B_FACT, B_FACT → B_SEC and B_SEC → B_PRIM (highlighted in red on the original slide).

Unit productions such as those above play no role in code generation. Eliminating them both reduces the size of the parser obtained and increases its speed.

A tree of unit productions is a graphical representation of those that occur. For instance, if the unit prods in a grammar are

A → B    B → C    C → D    E → B    F → D    F → G

then the tree involved would be:

    A   E
     \ /
      B
      |
      C   F
       \ / \
        D   G

In the (upside down) tree above, the leaves are D and G. These are the symbols that occur as rhs’s of unit prods but do not occur as the lhs’s of any unit prod.

Algorithm for Eliminating Unit Productions from a Parsing Machine

1. For each state S of the machine in turn (including the new states added to the machine in step 2), do step 2 for each leaf x, if any, such that the x-successor of S has a unit reduction. When all these iterations of step 2 are complete, go on to steps 3, 4 and 5.

2. Let x1,...,xn be the symbols (which will include x) for which actions are defined at S such that we can derive x from xi entirely via unit productions, and for 1 <= i <= n, let the xi-successor of S be Ti. If any state R is, or at a previous stage of the algorithm has been, a combination of states T1,...,Tn, make R the new x-successor of S; otherwise set up a new state T as the x-successor of S, where T is a combination of states T1,...,Tn.

3. Delete all connections to states that represent transitions with respect to left-hand sides of unit productions.

4. Delete all states that at this stage are not reachable from State 0.

5.
Replace every reduce action y → ..., where y is the left-hand side of a unit production, by x → ..., where x is an arbitrarily selected leaf such that x is derivable from y entirely via unit productions.

Example

Consider the grammar

E → E + T | T
T → T * a | a

The unit productions here are E → T and T → a. The sole leaf is a, as it occurs as the rhs of T → a but does not occur as the lhs of a unit prod.

The parsing machine for this grammar was given in set 2, and is reproduced on the next slide.

There are unit productions at states 3 and 4. These states are successors of states 0 and 2. Step 1 of the algorithm, accordingly, asks us to perform step 2 as applied to states 0 and 2.

Applying step 2 to state 0, we note that this state has an a, T, and E successor. These are all symbols from which one can derive “a” entirely through unit productions. For instance, we can derive a from T via T → a, and we can derive a from E via E → T, T → a. So we combine the E, T and a successors of state 0 (states 4, 3 and 1) to form the new a-successor of state 0. This new state has all the actions (other than unit prods) defined at it that states 4, 3 and 1 have. For simplicity of exposition, we do not show state 3, the T-successor of state 0, which would still be present at this stage and is only deleted in step 4.

[Diagram: the machine after applying step 2 to state 0. Labels shown: transitions a and +; combined state 1,3,4 with ACCEPT if -|.]

Applying step 2 to state 2, we make the new a-successor of state 2 one which combines the actions (other than unit prods) of states 4 and 6. At this stage state 6 is still present, and only gets deleted in step 4, but for simplicity we have omitted it from the diagram.

[Diagram: the machine after applying step 2 to state 2. Labels shown: transitions a and +; combined states 1,3,4 (ACCEPT if -|) and 4,6.]

Applying step 3 then produces:

[Diagram: the machine after step 3. Labels shown: transitions a and +; combined states 1,3,4 (ACCEPT if -|) and 4,6.]

States 1 and 4 (as well as states 3 and 6, which were omitted from the previous diagrams) are at this stage not reachable from state 0. So, in applying step 4, they are deleted.
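Steps 3 and 4 amount to deleting some transitions and then doing a reachability search from state 0. The following Python sketch illustrates this under stated assumptions. The transition table is hypothetical: the machine from set 2 is not reproduced in this text, so the assignment of successors (state 1 as the E-successor of state 0, state 3 as its T-successor, state 4 as the shared a-successor), the extra states 5 and 7, and the * transitions are a guessed reconstruction chosen to agree with the states named in the example. The function names are illustrative as well.

```python
from collections import deque

# ASSUMED transition table after step 2 (not taken from the slides).
# Combined states are named by strings such as "1,3,4".
transitions = {
    0: {"E": 1, "T": 3, "a": "1,3,4"},
    1: {"+": 2},
    2: {"T": 6, "a": "4,6"},
    3: {"*": 5},
    4: {},
    5: {"a": 7},
    6: {"*": 5},
    7: {},
    "1,3,4": {"+": 2, "*": 5},
    "4,6": {"*": 5},
}
UNIT_LHS = {"E", "T"}  # left-hand sides of the unit prods E -> T, T -> a


def drop_unit_lhs_transitions(table, unit_lhs):
    """Step 3: remove every transition labelled by a unit-prod lhs."""
    return {
        state: {sym: tgt for sym, tgt in succ.items() if sym not in unit_lhs}
        for state, succ in table.items()
    }


def reachable_from(table, start):
    """Breadth-first search over the remaining transitions."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for target in table[state].values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def prune(table, unit_lhs, start=0):
    """Steps 3 and 4 together: drop transitions, then unreachable states."""
    stepped = drop_unit_lhs_transitions(table, unit_lhs)
    keep = reachable_from(stepped, start)
    return {state: succ for state, succ in stepped.items() if state in keep}


pruned = prune(transitions, UNIT_LHS)
deleted = sorted((s for s in transitions if s not in pruned), key=str)
print(deleted)  # [1, 3, 4, 6]
```

With this assumed table, the deleted states are 1, 3, 4 and 6, which agrees with the states the example says are removed in step 4.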
The result then is:

[Diagram: the machine after step 4. Labels shown: transitions a and +; combined states 1,3,4 (ACCEPT if -|) and 4,6.]

Finally, in step 5, we change the productions which at present have E or T as their lhs’s, by replacing this lhs by a. So the reduction T → T * a becomes a → T * a, and E → E + T becomes a → E + T. This produces:

[Diagram: the machine after step 5. Labels shown: transitions a and +; combined states 1,3,4 (ACCEPT if -|) and 4,6; reductions now written a → ...]

In using this parsing machine, whatever code was associated with the reduction T → T * a now becomes associated with a → T * a, and whatever code was associated with E → E + T now becomes associated with a → E + T.

Class Exercise

Employing the stacks Symbol List and State No. List, provide a parse of a + a * a using the parsing machine on the previous slide.
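The definition of a leaf, the test "derivable entirely via unit productions" used in step 2, and the relabelling in step 5 can be sketched in Python as follows. This is only an illustration: the representation of a production as an (lhs, rhs) pair and all function names are assumptions, not part of the algorithm as stated, and the sketch assumes the unit productions contain no cycles.

```python
def unit_leaves(unit_prods):
    """Symbols that occur as the rhs of a unit prod but never as an lhs."""
    lhs = {left for left, _ in unit_prods}
    rhs = {right for _, right in unit_prods}
    return rhs - lhs


def derivable_via_units(start, unit_prods):
    """Symbols derivable from `start` by zero or more unit productions."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for left, right in unit_prods:
            if left == current and right not in seen:
                seen.add(right)
                stack.append(right)
    return seen


def relabel_reductions(reductions, unit_prods):
    """Step 5: replace each lhs that is a unit-prod lhs by a leaf.

    The algorithm allows any derivable leaf; this sketch picks the
    alphabetically smallest one (an arbitrary, assumed tie-break).
    """
    leaves = unit_leaves(unit_prods)
    unit_lhs = {left for left, _ in unit_prods}
    result = []
    for lhs, rhs in reductions:
        if lhs in unit_lhs:
            lhs = min(leaves & derivable_via_units(lhs, unit_prods))
        result.append((lhs, rhs))
    return result


# Unit prods of the tree example: A->B, B->C, C->D, E->B, F->D, F->G.
TREE = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "B"), ("F", "D"), ("F", "G")]
print(sorted(unit_leaves(TREE)))  # ['D', 'G']

# Unit prods of the expression grammar: E -> T and T -> a.
UNITS = [("E", "T"), ("T", "a")]
REDUCTIONS = [("E", ("E", "+", "T")), ("T", ("T", "*", "a"))]
RELABELLED = relabel_reductions(REDUCTIONS, UNITS)
print(RELABELLED)  # [('a', ('E', '+', 'T')), ('a', ('T', '*', 'a'))]
```

For the expression grammar, the symbols at state 0 from which a is derivable are exactly those x with "a" in derivable_via_units(x, UNITS), namely E, T and a itself, which is why step 2 combines all three successors.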