THE UNARY MINUS PROBLEM

We refer only to the unary minus below, but the same considerations apply to the
unary plus.
Problems arise in using the minus sign both for the binary minus, as in 3 - 4,
and for the unary minus, as in -4. For instance, if we allow Lex to define an
integer as ("+"|-)?[0-9]+ , then in using a grammar such as:
expression → expression + term | expression - term | + term | term
term → term * secondary | term / secondary | secondary
secondary → secondary ^ primary | primary
primary → number | ( expression )
Lex will interpret the expression 3-4 as two successive numbers, 3 and -4,
because it always takes the longest match and -4 fits the integer pattern.
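The effect can be reproduced outside Lex. The following is a minimal sketch in C, assuming a hand-written stand-in for the integer rule; the names match_signed_int and count_integers are hypothetical and are not part of Lex.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s matching ("+"|-)?[0-9]+, or 0 if
   there is none. Hypothetical stand-in for the Lex integer rule. */
size_t match_signed_int(const char *s)
{
    size_t i = 0;
    if (s[i] == '+' || s[i] == '-')
        i++;                               /* optional sign */
    size_t digits_start = i;
    while (isdigit((unsigned char)s[i]))
        i++;                               /* one or more digits */
    return (i > digits_start) ? i : 0;
}

/* Scans src the way a longest-match scanner would: try the integer rule
   first, otherwise consume one character. Returns the number of integer
   tokens found; *operators receives the number of non-blank characters
   that were left over as operator tokens. */
int count_integers(const char *src, int *operators)
{
    int integers = 0;
    *operators = 0;
    while (*src != '\0') {
        size_t len = match_signed_int(src);
        if (len > 0) {
            integers++;
            src += len;
        } else {
            if (!isspace((unsigned char)*src))
                (*operators)++;
            src++;
        }
    }
    return integers;
}
```

For the input 3-4 this finds two integers and no operator, so the parser never sees a binary minus. With blanks, as in 3 - 4, the minus is not followed directly by a digit and is left over as an operator.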
Methods of solving this problem by means of appropriate grammars lead to
complications. For instance it isn’t acceptable to only define unsigned integers in
Lex, and then provide for negative numbers via the grammar, as in:
expression → expression + term | expression - term | term
term → term * secondary | term / secondary | secondary
secondary → secondary ^ primary | primary
primary → number | + number | - number | ( expression )
With this grammar, -3^2 is passed by Lex to Yacc as the token sequence
- number ^ number
which has the following derivation:
expression
⇒ term
⇒ secondary
⇒ secondary ^ primary
⇒ primary ^ primary
⇒ - number ^ primary
⇒ - number ^ number
As can be seen, this derivation attaches the minus sign to the 3 before the
exponent is applied, so it interprets -3^2 as (-3)^2, but -3^2 is
conventionally interpreted as -(3^2).
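To make the consequence concrete, here is a small recursive-descent evaluator in C that follows the grammar above production by production. It is only a sketch: Yacc would generate a table-driven parser instead, the left-recursive rules are written as loops, there is no error handling, and all function names are hypothetical.

```c
#include <ctype.h>

static const char *cursor;                 /* next unread character */

static void skip_spaces(void)
{
    while (*cursor == ' ' || *cursor == '\t')
        cursor++;
}

static long parse_expression(void);

/* primary -> number | + number | - number | ( expression ) */
static long parse_primary(void)
{
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        long v = parse_expression();
        skip_spaces();
        if (*cursor == ')')
            cursor++;
        return v;
    }
    long sign = 1;
    if (*cursor == '+' || *cursor == '-') {
        if (*cursor == '-')
            sign = -1;
        cursor++;
        skip_spaces();
    }
    long n = 0;
    while (isdigit((unsigned char)*cursor))
        n = n * 10 + (*cursor++ - '0');
    return sign * n;                       /* sign binds to the number alone */
}

static long int_pow(long base, long exp)
{
    long r = 1;
    while (exp-- > 0)
        r *= base;
    return r;
}

/* secondary -> secondary ^ primary | primary */
static long parse_secondary(void)
{
    long v = parse_primary();
    for (;;) {
        skip_spaces();
        if (*cursor != '^')
            return v;
        cursor++;
        v = int_pow(v, parse_primary());
    }
}

/* term -> term * secondary | term / secondary | secondary */
static long parse_term(void)
{
    long v = parse_secondary();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '*' && op != '/')
            return v;
        cursor++;
        long rhs = parse_secondary();
        if (op == '*')
            v *= rhs;
        else
            v = (rhs != 0) ? v / rhs : 0;  /* sketch: no real error handling */
    }
}

/* expression -> expression + term | expression - term | term */
static long parse_expression(void)
{
    long v = parse_term();
    for (;;) {
        skip_spaces();
        char op = *cursor;
        if (op != '+' && op != '-')
            return v;
        cursor++;
        long rhs = parse_term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

long evaluate(const char *src)
{
    cursor = src;
    return parse_expression();
}
```

Calling evaluate("-3^2") yields 9, that is (-3)^2, rather than the conventional -9, because the sign is consumed inside the primary rule before the exponent is seen.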
The best and simplest method of avoiding these problems is to use different
symbols in your grammar for the unary minus and unary plus. Thus the final
production of the above grammar should be replaced by:
primary → unary_plus number | unary_minus number
| number | ( expression )
The problem of distinguishing between the different types of minus and plus
signs is relegated to Lex. For this we need to employ two additional features of
regular expressions:
1. If the symbol ^ occurs at the beginning of a regular expression then it
matches the beginning of a line in the source (and if the symbol $ occurs
at the end of a regular expression then it matches the end of a line in the
source).
For example, ^John will match the string “John” in the source if and only if
the “J” occurs in column 1 of the source line.
2. If R1 and R2 are regular expressions then Lex takes R1/R2 as matching R1 at
the head of the source if and only if R1R2 in fact matches it. Only the text
matched by R1 is consumed; the text matched by R2 is left in the input.
For example, June/Jones matches the “June” in the source
“JuneJones is here” but does not match anything in the source
“June has come”.
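Both features can be imitated for literal strings in a few lines of C. This is a sketch only: Lex applies them to arbitrary regular expressions, and the helper names match_at_line_start and match_with_trailing_context are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Imitates the pattern ^word for a literal word. buf is the whole source
   and pos an offset into it (pos must not exceed the length of buf).
   Returns the number of characters matched, or 0 if pos is not at the
   start of a line or the word is not there. */
size_t match_at_line_start(const char *buf, size_t pos, const char *word)
{
    int at_start = (pos == 0) || (buf[pos - 1] == '\n');
    size_t n = strlen(word);
    if (at_start && strncmp(buf + pos, word, n) == 0)
        return n;
    return 0;
}

/* Imitates the pattern r1/r2 for literal r1 and r2. Returns the length of
   r1 if src starts with r1 immediately followed by r2, otherwise 0.
   Only r1 counts as matched; r2 stays in the input for the next rule. */
size_t match_with_trailing_context(const char *src, const char *r1,
                                   const char *r2)
{
    size_t n1 = strlen(r1);
    size_t n2 = strlen(r2);
    if (strncmp(src, r1, n1) == 0 && strncmp(src + n1, r2, n2) == 0)
        return n1;
    return 0;
}
```

With these helpers, match_with_trailing_context("JuneJones is here", "June", "Jones") reports a match of length 4, while the same call on "June has come" reports no match.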
So now let’s consider where a unary minus can occur in arithmetic expressions
(a similar description will apply to the unary plus).
The places involved are:
1. At the beginning of an expression, as e.g. in:
-4 + 3
2. After a “(”, as e.g. in:
3 + (-4 * 5)
3. After a “*” or “/” symbol, as e.g. in:
3 * -4
To recognize these occurrences, we need to include in Lex a declaration such as
int unary_minus_flag = 0;
This flag will be set to 1 if the next minus sign is in fact a unary minus.
Let us also employ as a macro: whitespace (" "|\t)*
Then we can use rules such as:
^{whitespace}"-"        { /* minus at the start of a line is unary */
                          return unary_minus; }
"("/{whitespace}"-"     { /* the minus that follows this "(" is unary */
                          unary_minus_flag = 1; return '('; }
"-"                     { if (unary_minus_flag == 1) {
                              unary_minus_flag = 0;
                              return unary_minus;
                          }
                          else {
                              unary_minus_flag = 0;
                              return '-';
                          } }
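The flag logic can be tried out without Lex using a hand-written scanner. The sketch below is an assumption-laden stand-in, not the Lex rules themselves: instead of trailing context it remembers whether the previous token was the start of a line, a “(”, a “*” or a “/”, which marks the same three places listed above. The function name classify_tokens and the one-character token codes are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Writes one character per token into out (capacity out_size, including
   the terminating NUL): 'n' for an unsigned number, 'u' for a unary
   minus, and the character itself for anything else. */
void classify_tokens(const char *src, char *out, size_t out_size)
{
    int unary_minus_flag = 1;   /* a minus at the start of a line is unary */
    size_t n = 0;

    if (out_size == 0)
        return;
    while (*src != '\0' && n + 1 < out_size) {
        char c = *src;
        if (c == ' ' || c == '\t') {        /* whitespace leaves the flag alone */
            src++;
            continue;
        }
        if (isdigit((unsigned char)c)) {    /* unsigned integer */
            while (isdigit((unsigned char)*src))
                src++;
            out[n++] = 'n';
            unary_minus_flag = 0;
            continue;
        }
        if (c == '-') {
            out[n++] = unary_minus_flag ? 'u' : '-';
            unary_minus_flag = 0;
        } else {
            out[n++] = c;
            /* the next minus is unary after "(", "*", "/" or a newline */
            unary_minus_flag = (c == '(' || c == '*' || c == '/' || c == '\n');
        }
        src++;
    }
    out[n] = '\0';
}
```

For the three example expressions above the scanner produces un+n, n+(un*n) and n*un, while 3-4 and 3 - 4 both produce n-n, so the binary minus is no longer absorbed into the number.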