THE UNARY MINUS PROBLEM

We refer only to the unary minus below, but the same considerations apply to the unary plus.

Problems arise from using the minus sign both for the binary minus, as in 3 - 4, and for the unary minus, as in -4. For instance, if we allow Lex to define an integer as

    ("+"|"-")?[0-9]+

then, with a grammar such as:

    expression → expression + term | expression - term | + term | term
    term       → term * secondary | term / secondary | secondary
    secondary  → secondary ^ primary | primary
    primary    → number
               | ( expression )

Lex will interpret the expression 3-4 as two successive numbers, 3 and -4, because Lex always takes the longest match and so reads "-4" as a single integer.

Methods of solving this problem by means of appropriate grammars lead to complications. For instance, it is not acceptable to define only unsigned integers in Lex and then provide for negative numbers via the grammar, as in:

    expression → expression + term | expression - term | term
    term       → term * secondary | term / secondary | secondary
    secondary  → secondary ^ primary | primary
    primary    → number | + number | - number
               | ( expression )

With this grammar, -3^2 reaches Yacc from Lex as the token sequence - number ^ number, which has the following derivation:

    expression
        |
      term
        |
    secondary
        |
    secondary ^ primary
        |
     primary
        |
    - number

As can be seen, this derivation interprets -3^2 as (-3)^2, whereas -3^2 is conventionally interpreted as -(3^2).

The best and simplest method of avoiding these problems is to use different symbols in your grammar for the unary minus and unary plus. Thus the last two lines of the above grammar should be replaced by:

    primary    → unary_plus number | unary_minus number | number
               | ( expression )

The problem of distinguishing between the different types of minus and plus signs is relegated to Lex. For this we need to employ two additional features of regular expressions:

1.
If the symbol ^ occurs at the beginning of a regular expression then it matches the beginning of a line in the source (and if the symbol $ occurs at the end of a regular expression then it matches the end of a line in the source). For example, ^John will match the string “John” in the source if and only if the “J” occurs in column 1 of the source line.

2. If R1 and R2 are regular expressions then Lex takes R1/R2 as matching R1 at the head of the source if and only if R1R2 in fact matches it. Only the R1 part is consumed. For example, June/Jones matches the “June” in the source “JuneJones is here” but does not match anything in the source “June has come”.

So now let’s consider where a unary minus can occur in arithmetic expressions (a similar description will apply to the unary plus). The places involved are:

1. At the beginning of an expression, as e.g. in: -4 + 3
2. After a “(”, as e.g. in: 3 + (-4 * 5)
3. After a “*” or “/” symbol, as e.g. in: 3 * -4

To recognize these occurrences, we need to include in Lex a declaration such as

    int unary_minus_flag = 0;

This flag will be set to 1 if the next minus sign is in fact a unary minus. Let us also define the macro:

    whitespace   (" "|\t)*

Then we can use rules such as:

    ^{whitespace}"-"       {return unary_minus;}
    "("/{whitespace}"-"    {unary_minus_flag = 1; return '(';}
    "-"                    {if (unary_minus_flag == 1) {
                                unary_minus_flag = 0;
                                return unary_minus;
                            } else {
                                return '-';
                            }
                           }
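The flag technique can also be tried outside Lex. The following is only a minimal sketch in plain C, under stated assumptions: the function name mark_unary_minus and the use of '~' to stand for the unary_minus token are hypothetical and come from neither Lex nor the grammar above. It treats a minus as unary in exactly the three places listed (start of a line, after "(", after "*" or "/") and as binary everywhere else.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Lex: copies src to out without blanks or
   tabs, writing '~' for each minus sign judged to be unary and '-' for each
   binary one. out must have room for strlen(src) + 1 characters. */
void mark_unary_minus(const char *src, char *out)
{
    int unary_minus_flag = 1;   /* a minus at the start of the input is unary */
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;

        if (c == ' ' || c == '\t')
            continue;           /* whitespace leaves the flag unchanged */

        if (c == '-' && unary_minus_flag) {
            out[n++] = '~';     /* stands in for the unary_minus token */
            unary_minus_flag = 0;
        } else {
            out[n++] = c;
            /* The next minus is unary only after "(", "*", "/" or a newline
               (a newline starts a new line, like ^ in a Lex rule). */
            unary_minus_flag = (strchr("(*/\n", c) != NULL);
        }
    }
    out[n] = '\0';
}
```

Given a buffer buf that is large enough, calling mark_unary_minus("3 + (-4 * 5)", buf) leaves the string 3+(~4*5) in buf, while the input 3 - 4 yields 3-4, so the binary minus is left alone.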