SHORT-CIRCUIT EVALUATION OF BOOLEAN EXPRESSIONS Let's consider this in connection with generating code for the IF statement. A suitable grammar for this statement is: IF_STATEMENT → SIMPLE_IF | SIMPLE_IF_ELSE_BLOCK The production for the if statement without ELSE SIMPLE_IF → IF_PREFIX TRUE_BLOCK ENDIF Productions for the if statement including ELSE SIMPLE_IF_ELSE_BLOCK → SIMPLE_IF_ELSE FALSE_BLOCK ENDIF SIMPLE_IF_ELSE → IF_PREFIX TRUE_BLOCK ELSE Productions employed for both cases IF_PREFIX → IF BOOL_EXPRESSION BOOL_EXPRESSION → BOOL_EXPRESSION OR BOOL_FACTOR I BOOL_FACTOR BOOL_FACTOR → BOOL_FACTOR AND BOOL_SECONDARY | BOOL SECONDARY BOOL_SECONDARY → NOT BOOL_PRIMARY I BOOL_PRIMARY BOOL_PRIMARY → ARITH_EXPRESSION = ARITH_EXPRESSION I ARITH_EXPRESSION <> ARITH_EXPRESSION I ARITH_EXPRESSION > ARITH_EXPRESSION I ARITH_EXPRESSION >= ARITH_EXPRESSION I ARITH_EXPRESSION < ARITH_EXPRESSION I ARITH_EXPRESSION <= ARITH_EXPRESSION I ( BOOL_EXPRESSION ) For the purpose of readibility let's employ the macros: #define head_of_false_chain kind_of_Location #define head_of_true_chain location The code required for the various productions is described below BOOL_PRIMARY → ARITH_EXPRESSION = ARITH_EXPRESSION For productions such as these we generate code to compare the arithmetic expressions, followed by a conditional jump instruction which will be executed if the comparison involved is true. In the present case we would generate in code_s: JE 0000 coded as 0f 84 0000 and we would set $$.head_of_true_chain = code_i - 2 and $$.head_of_false_chain = 0 As we carry out the present algorithm, for each boolean expression that we generate code for, we will produce an entry on symbol stack whose head_of_true chain points to a chain of (the offset fields within) conditional branch instructions that branch if the boolean expression as a whole is true, and whose head_of_false_chain points to a similar chain for when the boolean expression is false. In the code above, there are no branches for if the comparison is false, so $$.head_of_false_chain is 0. There is a single branch for if the condition is true. Since it's the only one in the chain, it is itself the end of the chain. So the offset field of the JE instruction is 0. $$.head_of_true chain here points to this offset field. BOOL SECONDARY → NOT BOOL PRIMARY Clearly all the code needs to do here is to set $$.head_of_false_chain to $2.head_of_true_chain, and $$.head_of_true_chain to $2.head_of_false_chain. BOOL_FACTOR → BOOL_FACTOR AND BOOL_SECONDARY See steps 3 and 5 in the example below Consider the conditional branch right at the end of the code for $1 (ie. BOOL_FACTOR). If this were a branch for if $1 is true, then it would have to branch to the start of the code for evaluating $3. On the other hand if it were a branch for if $1 is false, then it would constitute a branch for if $$ was false, and the code would here avoid having to evaluate $3. So the first thing to do is to ensure that the conditional branch at the end of the code for $1 is a branch for if $1 is false. You can tell which it is by checking which of $1.head_of_false_chain or $1.head_of_true_chain is larger. If the conditional branch instruction is a branch for if $1 is true, then you need to change it to a branch for the opposite condition to make it a branch for if $1 is false. It so happens that this can be done in all cases by merely reversing the low_order bit of the opcode involved (i.e. of the byte following the 0f byte). This can be done by means of an exclusive-or with 1. E.g. if x is the value of $1.head_of_true_chain, one could use code such as: code_s[x - 1] = code_s[x-1] ^ 0x'01' You will now need to fixup the false chain, since it has been given a new head, and amend the values of $1.head_of_true_chain and $1.head_of_false_chain accordingly. After all this is done, you can make all the members of the true chain for $1 (conditional) branches to the start of the code for $3. If y is the larger value of $1.head_of_false_chain and $1.head_of_true_chain, then the code for $3 starts at code_s[y+2]. Now join together the false chains for $3 and $1, since these all are branches for if $$ is false. Finally set $$.head_of_true_chain to $3.head_of_true_chain, and $$.head_of_false_chain to the head of the combined false chains referred to above. Be careful to allow for the case where the false chain for $3 is empty. BOOL EXPRESSION → BOOL EXPRESSION OR BOOL FACTOR See steps 7 and 9 in the example below. The code for this production follows similar principles to that described above for AND. In this case, in order that the last conditional branch instruction in the code for $1 be a branch that, if taken, avoids having to evaluate $3, it needs to be a branch for if $1 is true. If it isn't then you should change it by the means described above, and adjust the true chain for $1, and the values of $1.head_of_false_chain and $1.head_of_true_chain accordingly. Make,the members of the false chain for $1 into branches to the start of the code for $3. Then join together the true chains for $3 and $1, since these all are branches for if $$ is true. Finally set $$.head_of_false_chain to $3.head_of_false_chain, and $$.head_of_true_chain to the head of the combined true chains referred to above. Again be careful to allow for the case where the true chain for $3 is empty. IF_PREFIX → IF BOOLEAN_EXPRESSION As in the code for AND, ensure that the last branch in the code for $2 is a branch if $2 is false, as this is the case where we do not evaluate the block of statements following. Adjust the false chain for $2 and $2.head_of_false_chain and $2.head _of_true_chain accordingly. Now make all the members of the true chain pointed to by $2.head_of_true_chain into branches to code_i. Set $$.head_of_false_chain to $2.head_of_false_chain. Code for the IF statement without ELSE SIMPLE_IF → IF_PREFIX TRUE_BLOCK ENDIF Make all the members of the false chain pointed to by $1.head_of_false_chain into branches to code i. Code for the IF statement including ELSE SIMPLE_IF _ELSE → IF_PREFIX TRUE_BLOCK ELSE Generate an unconditional branch with offset 0000, and set $$.head_of_true_chain to point to this offset field. Make all the member of the false chain pointed to by $1.head_of_false_chain into branches to code_i. SIMPLE_IF _ELSE_BLOCK → SIMPLE_IF _ELSE FALSE_BLOCK ENDIF Make the branch pointed to by $1.head_of_true_chain into a branch to code_i. NOTE Yacc reports a shift-reduce conflict for most grammars written for the IF statement. For example, with the statement: If A then if B then C else D the shift-reduce conflict occurs in the state reached when the next input symbol is “else”. If the shift action is chosen, the statement will be interpreted as: if A then (if B then C else D) whereas if the reduce action is taken, it will be interpreted as: if A then (if B then C) else D As you can see, if A is false in the first case, then D will not be executed (nor will C), whereas if A is false in the second case, then D will be executed. Nearly all programming languages adopt the first interpretation (obtained taking the shift action). Since the default action that Yacc takes in shift-reduce conflicts is in fact the shift action, the above conflict can be ignored for such languages, and while it is possible to produce grammars for the If statement that avoid the conflict, this is not necessary. EXAMPLE Let’s consider the problem of evaluating the short-circuit code for: if (a and b and c) or (d and e and f) or (g and h and i) go to place1 else go to place2 where the letters a – i are conditions such as X > Y+2. Let the letters A – I denote the code for evaluating a – i respectively. By e.g. Cjt300h we mean the code for evaluating condition c followed by a jump to offset 300h in the case where c is true. For example, if c is X > Y + 2, then Cjt300h could represent the following code: add y, 2 mov ax, x cmp ax, y je 300h1 By Cjf300h we mean the code for the comparison as above, followed by a jump to offset 300h in the case where c is false, ie. the last instruction should be changed to jne 300h Using our productions for the if statement, the following is a derivation of the example’s if statement. 1 By je 300h we mean the conditional jump to offset 300h in the code segment, but je is a relative jump whose operand indicates how many bytes to jump forward or backward, so the actual operand required to branch to offset 300h has to be evaluated. DERIVATION OF THE EXAMPLE’S IF STATEMENT The “bool_” prefixes in the nonterminals have been omitted for brevity. The reductions are numbered in the order they occur in a parse using an LR(1) parsing machine. The derivation of “go to true-place” from true-block, and of “go to false-place” from false-block are not shown. if_statement 46 simple_if_else_block 45 simple_if_else false-block endif 44 if_prefix true-block else 43 if expression 42 expression or factor 28 41 expression or factor secondary 14 27 40 factor secondary primary 13 26 39 secondary primary ( expression) 12 25 38 bool_primary ( expression ) factor 11 24 37 ( expression ) factor factor and secondary 10 23 34 36 factor factor and secondary factor and secondary primary 9 20 22 31 33 35 factor and secondary factor and secondary primary secondary primary i 6 8 17 19 21 30 32 factor and secondary primary secondary primary f primary h 3 5 7 16 18 29 secondary primary c primary e g 2 4 15 primary b d 1 a SNAPSHOTS OF THE PARSE OF THE EXAMPLE IF STATEMENT Our snapshots show relavant parts of the content of the code segment and symbol stack after selected productions, identified by the numbering given in the derivation above. After a production such as bool_factor → bool_factor and bool_secondary with our original method, in which the entries in symbol_stack where grammar symbols, we replaced the top three members of the stack by bool_factor. Now we are employing a stack in which each entry is a pair of numbers, that we have named head_of_true_chain and head_of_false_chain, and our algorithm is designed to make the head_of_true_chain point 2 to a back chain in the code segment of jumps that will occur if the code derived from bool_factor is true, and head_of_false_chain point to a back chain of such jumps that will occur if the code involved is false. For illustration purposes, our snapshots will also depict the source code corresponding to the code that has been generated 1. The situation after reduction 1: primary → a primary t=n1 f=0 symbol stack (abbrev. ss) code segment (abbrev. cs) AJt0 n1 ( a source code involved (abbrev. sc) Here the code for comparison a, followed by a jump if true with a zero operand has been generated in the code segment and n1 is the offset of the operand field involved. In symbol stack the “t=n1 f=0” means the head of the true chain for $$ has been set to n1, and the head of its false chain has been set to zero 2. After reduction 5: secondary → primary ss cs sc factor and t =n1 f=0 AJt0 n1 ( a and secondary t=n2 f=0 BJt0 n2 b 3. After reduction 6: factor → factor and secondary factor t=n2 f=n1 ss cs sc AJf0 n1 ( a and BJt0 n2 b 4. After reduction 8: secondary → primary factor t=n2 f=n1 ss cs sc 2 AJf0 n1 ( a and and BJt0 n2 b and secondary t=n3 f=0 CJt0 n3 c While one could use a C pointer for this purpose, by “points to” I mean here “supplies the offset in the code segment of” 5. After reduction 9: factor → factor and secondary Note that we have changed the BJt to BJf , since we want to avoid evaluated c in the case where b is false. If b is true, we would need to evaluate c to determine whether a and b and c is true or not. So the code BJt would have to be a condition branch to C , and C would thus be evaluated whether or not B were true. factor t=n3 f=n2 ss cs sc AJf0 n1 ( a BJfn1 n2 and b CJt0 n3 and c 6. After reduction 27: factor → secondary expression t=n3 f=n2 ss cs sc or AJf0 n1 BJfn1 n2 CJt0 n3 n1 n2 n3 ( a and b and c ) factor t=n6 f=n5 DJf0 n4 or EJfn4 n5 ( d and FJt0 n6 e and f ) 7. After reduction 28: expresssion → expression or factor expression t=n6 f=n5 ss cs sc AJfn3+2 n1 ( a BJfn3+2 n2 and b CJt0 n3 and c ) DJf0 n4 or EJfn4 n5 ( d and FJt0 n6 e and f ) 8. After reduction 41: factor → secondary expression t=n6 f=n5 ss cs AJfn3+2 n1 sc ( a and BJfn3+2 n2 CJt0 n3 b c ) or ( d and DJf0 n4 and factor t=n9 f=n8 EJfn4 n5 FJt0 n6 e f ) and GJf0 n7 or ( g and HJfn1 n8 IJt0 n9 h i ) and 9. After reduction 42: expression → expression or factor if ss cs AJfn3+2 n1 sc ( a and BJfn3+2 n2 CJt0 n3 b c ) or ( d and DJfn6+2 n4 and EJfn6+2 n5 FJt0 n6 e f ) and expression t=n9 f=n8 GJf0 n7 or ( g HJfn1 n8 and h IJt0 n9 and i ) 10. After reduction 43: if_prefix → if expression if_prefix t=0 f=n9 ss cs AJfn3+2 n1 sc ( a and BJfn3+2 n2 CJtn9+2 n3 b c ) or ( d and DJfn6+2 n4 and EJfn6+2 n5 FJt n9+2 n6 e f ) and or GJf0 n7 ( g HJfn1 n8 IJf0 n9 h i ) and and n10 11. After reduction 44: simple_if_else → if_prefix tblk else simple_if_else t=n11 f=n9 ss cs AJfn3+2 n1 sc ( a and BJfn3+2 n2 CJtn9+2 n3 b c ) or ( d and DJfn6+2 n4 and EJfn6+2 n5 FJt n9+2 n6 GJf11+2 n7 e f ) ( g and or HJfn11+2 n8 and h and IJfn11+2 tblk J0 n9 n10 n11 i ) tblk else 12. After reduction 45: simple_if_else_block → simple_if_else fblk endif simple_if_else fblk endif t=n11 f=n9 ss cs AJfn3+2 n1 sc ( a and BJfn3+2 n2 CJtn9+2 n3 b c ) or ( d and DJfn6+2 n4 and EJfn6+2 n5 e and FJt n9+2 n6 f ) or GJf11+2 n7 ( g and HJfn11+2 n8 h and IJfn11+2 n9 i ) tblk Jn13 fblk n10 n11 n12 n13 tblk else blk