Grammars, Stacks, & Prefix & Postfix expressions

To illustrate a common application of recursion, Chapter 6 introduces grammars, which
describe the syntax of a language or a language construct, and then shows how to write
a recursive algorithm that recognizes strings in that “language”. We begin by discussing
what a grammar is and explaining the notation your text uses to define one.
The syntax of a programming language is a definition of what constitutes a
grammatically correct program in that language. The formal syntax of a language has 2
layers: a lexical layer and a grammar layer. Lexical syntax corresponds to the spelling
of words in the language and identifies which special symbols have meaning within the
language; for example, <=, <, <>, else, if, and int could all be meaningful tokens.
Tokens are the smallest meaningful components of a language, such as words or
punctuation symbols.
The grammatical layer is specified by a set of recursive rules called a grammar. Just
as natural languages such as English have grammars that specify sentence structure and
punctuation, grammars for programming languages define valid statement structure.
A string can be parsed to determine whether it is valid for the language described by a
grammar, using either nested recursive calls or, in some cases, stacks and queues. A
common notation for grammars is Backus-Naur form (BNF).
BNF Notation
<S> => $ | <W> | $ <S>
<W> => abb | a<W>bb
Strings in this language include: $, abb, $$$, aabbbb, $abb, $$$abb, $$aabbbb, ...
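To show how such a grammar turns into a recursive program, here is a minimal Java sketch of a recognizer for the two rules above, assuming the input is a plain Java String. Each symbol in angle brackets (<S>, <W>) becomes one method, and each alternative separated by | becomes one test inside that method. The class and method names (DollarGrammar, isS, isW) are hypothetical, chosen only for this example.

```java
// Recognizer for the grammar
//   <S> => $ | <W> | $<S>
//   <W> => abb | a<W>bb
// Class and method names are hypothetical, not taken from the text.
class DollarGrammar {
    // <S>: a single $, a <W>, or a $ followed by another <S>
    static boolean isS(String s) {
        if (s.equals("$")) return true;                      // <S> => $
        if (isW(s)) return true;                             // <S> => <W>
        return s.length() > 1 && s.charAt(0) == '$'
                && isS(s.substring(1));                      // <S> => $<S>
    }

    // <W>: abb, or an 'a' in front of and "bb" behind another <W>
    static boolean isW(String w) {
        if (w.equals("abb")) return true;                    // <W> => abb
        return w.length() > 3 && w.startsWith("a") && w.endsWith("bb")
                && isW(w.substring(1, w.length() - 2));      // <W> => a<W>bb
    }

    public static void main(String[] args) {
        String[] samples = {"$", "abb", "$$$", "aabbbb", "$abb", "$$$abb", "$$aabbbb", "aabb", "abb$"};
        for (String s : samples) {
            System.out.println(s + " : " + isS(s));
        }
    }
}
```

The first seven samples are the strings listed above and are accepted; aabb (one a needs two b's) and abb$ (a $ after the <W> part) are rejected.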
A BNF grammar consists of 4 constructs:
- Terminals: the basic symbols from which strings in a language are composed. These
may include single characters, identifiers, digits, punctuation symbols, and keywords.
Terminals appear only on the right-hand side of productions.
- Non-terminals: special symbols that can be rewritten in terms of other non-terminals
and terminals. These can occur on both the right-hand and left-hand sides of
productions. A non-terminal can be defined in terms of itself (recursion).
Non-terminals give the language being defined a hierarchical structure.
- Starting symbol: when defining a grammar for a language, we select a starting symbol
that denotes the construct (or language) being described. It is the starting point for
the derivation of any string.
- Productions: a set of “rewriting” rules that define the ways in which the syntactic
categories may be built up from one another and from the terminals. Each production
consists of a non-terminal, followed by an arrow (written ->, =>, or ::=), followed by
a string of non-terminals and terminals.
A sample set of productions for an integer number might be:
<integer> -> <sign><unsigned>
<unsigned> -> <digit> | <unsigned><digit>
<sign> -> + | - | ε          (ε is the empty string, so the sign may be omitted)
<digit> -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Once we have defined a grammar, we can use a derivation to show that a string belongs
to the language defined by the grammar. A derivation begins with the start symbol and
repeatedly replaces a non-terminal by the right-hand side of a production that has that
non-terminal as its left-hand side, until we eventually obtain the string.
EXERCISE: DETERMINE IF 576 IS A SENTENCE DESCRIBED BY THE
PREVIOUS GRAMMAR
Yes. Choosing the empty alternative for <sign>:
<integer> -> <sign><unsigned> -> <unsigned> -> <unsigned><digit> ->
<unsigned><digit><digit> -> <digit><digit><digit> -> 5<digit><digit> ->
57<digit> -> 576
A grammar not only defines the valid syntax for a language but is an outline for a
“recognizer” program for the language, where non-terminals represent calls to
potentially recursive functions.
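As a minimal Java sketch of that idea, assuming the <integer> grammar given earlier, each non-terminal below becomes its own method (the class and method names are hypothetical). Because <unsigned> -> <unsigned><digit> is left-recursive, the method for <unsigned> checks that the last character is a digit and recurses on everything before it, so each call works on a shorter string.

```java
// Recognizer for the grammar
//   <integer>  -> <sign><unsigned>
//   <unsigned> -> <digit> | <unsigned><digit>
//   <sign>     -> + | - | (empty)
//   <digit>    -> 0 | 1 | ... | 9
// Class and method names are hypothetical.
class IntegerRecognizer {
    // <integer> -> <sign><unsigned>
    static boolean isInteger(String s) {
        if (!s.isEmpty() && isSign(s.charAt(0))) {
            return isUnsigned(s.substring(1));   // <sign> is + or -
        }
        return isUnsigned(s);                    // <sign> is empty
    }

    // <sign> -> + | -   (the empty alternative is handled in isInteger)
    static boolean isSign(char c) {
        return c == '+' || c == '-';
    }

    // <unsigned> -> <digit> | <unsigned><digit>
    static boolean isUnsigned(String s) {
        if (s.length() == 1) {
            return isDigit(s.charAt(0));         // <unsigned> -> <digit>
        }
        // <unsigned> -> <unsigned><digit>: the last character is a <digit>
        // and everything before it must again be an <unsigned>
        return s.length() > 1
                && isDigit(s.charAt(s.length() - 1))
                && isUnsigned(s.substring(0, s.length() - 1));
    }

    // <digit> -> 0 | 1 | ... | 9
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(isInteger("576"));   // true, as in the exercise above
        System.out.println(isInteger("-42"));   // true
        System.out.println(isInteger("5a6"));   // false
    }
}
```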
A “recognizer” program that uses recursive methods can determine whether or not a
string is part of a language (LOOK AT THE EXAMPLE OF A GRAMMAR FOR THE LANGUAGE
a^n b^n ON PAGE 304 OF YOUR TEXT).
Extra example: write a grammar to represent the following language, and modify the
recognizer program to accept it:
L(G) = { a^n c b^n : n >= 0 }
     = { c, acb, aacbb, aaacbbb, aaaacbbbb, ... }   (the empty string is not valid)
<S> => c | a<S>b
static boolean isAnCBn(String w) {
    if (w.length() == 0) {
        return false;                        // the empty string is not in L(G)
    } else if (w.length() == 1) {
        return w.equals("c");                // <S> => c
    } else if (w.charAt(0) == 'a' && w.charAt(w.length() - 1) == 'b') {
        // <S> => a<S>b: strip the outer a and b, then recognize the middle
        return isAnCBn(w.substring(1, w.length() - 1));
    } else {
        return false;
    }
} // end isAnCBn
Specify a grammar for the language L(G) = { a^n b^m c^o : n, m, o > 0 }, with start symbol <A>:
<A> => a<B> | a<A>
<B> => b<C> | b<B>
<C> => c<C> | c
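The same one-method-per-non-terminal pattern recognizes this language. In this Java sketch (class and method names hypothetical), <A> is the start symbol, and a production with two alternatives, such as <A> => a<B> | a<A>, becomes an || of two recursive calls on the rest of the string.

```java
// Recognizer for L(G) = { a^n b^m c^o : n, m, o > 0 } using the grammar
//   <A> => a<B> | a<A>
//   <B> => b<C> | b<B>
//   <C> => c<C> | c
// Class and method names are hypothetical.
class ABCRecognizer {
    // <A> => a<B> | a<A>
    static boolean isA(String s) {
        if (s.isEmpty() || s.charAt(0) != 'a') return false;
        String rest = s.substring(1);
        return isB(rest) || isA(rest);
    }

    // <B> => b<C> | b<B>
    static boolean isB(String s) {
        if (s.isEmpty() || s.charAt(0) != 'b') return false;
        String rest = s.substring(1);
        return isC(rest) || isB(rest);
    }

    // <C> => c<C> | c
    static boolean isC(String s) {
        if (s.equals("c")) return true;
        return !s.isEmpty() && s.charAt(0) == 'c' && isC(s.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(isA("aabccc"));   // true
        System.out.println(isA("acb"));      // false
    }
}
```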
Describing the syntax of arithmetic expressions, and being able to evaluate them, is
frequently the starting point in the definition of a new programming language. By
creating a grammar that describes how a language construct is defined, we can then
write recognition programs to determine whether a sentence in that language is valid.
But writing grammars isn’t always easy.
Most programming languages incorporate expression syntax that draws on 3 different
notations: infix, prefix, and postfix. Your textbook shows both a grammar and parts of
a recursive recognizer program for the prefix and postfix notations. We are familiar
with the infix notation for writing algebraic expressions:
A+B
In the infix notation, a binary operator is written in between the two operands that it acts
upon.
In prefix notation, a binary operator is written in front of the two operands it acts upon:
+AB
Method calls are a form of prefix expression, where the operator is the method name
and the operands are enclosed in parentheses:
methodname (operand1, operand2)
And in postfix, the operator follows its operands:
AB+
Both of these notations make expressions easier for a program to execute, and both are
parenthesis-free. As we will see later, we can use a stack to evaluate a postfix
expression quickly and to convert an infix expression into a postfix expression.
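As a preview of the stack-based conversion mentioned above, here is one common way to convert infix to postfix with a stack of pending operators: a simplified form of what is often called the shunting-yard algorithm, which may differ in detail from the version your text presents. This Java sketch assumes single-character operands, only the operators + - * /, all four left-associative, and balanced parentheses; the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Infix -> postfix using a stack of pending operators (hypothetical class name).
// Assumes single-character operands and left-associative + - * /.
class InfixToPostfix {
    // * and / bind more tightly than + and -
    static int precedence(char op) {
        return (op == '*' || op == '/') ? 2 : 1;
    }

    static String convert(String infix) {
        StringBuilder postfix = new StringBuilder();
        Deque<Character> ops = new ArrayDeque<>();   // operators and '(' not yet output
        for (char c : infix.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                postfix.append(c);                   // operands go straight to the output
            } else if (c == '(') {
                ops.push(c);
            } else if (c == ')') {
                while (!ops.isEmpty() && ops.peek() != '(') {
                    postfix.append(ops.pop());       // output operators back to the '('
                }
                if (ops.isEmpty()) throw new IllegalArgumentException("unbalanced ')'");
                ops.pop();                           // discard the '('
            } else if ("+-*/".indexOf(c) >= 0) {
                // First output stacked operators of higher or equal precedence;
                // popping equal precedence keeps left-to-right associativity
                while (!ops.isEmpty() && ops.peek() != '('
                        && precedence(ops.peek()) >= precedence(c)) {
                    postfix.append(ops.pop());
                }
                ops.push(c);
            }
            // blanks and any other characters are skipped
        }
        while (!ops.isEmpty()) {
            char op = ops.pop();
            if (op == '(') throw new IllegalArgumentException("unbalanced '('");
            postfix.append(op);
        }
        return postfix.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("A+(B*C)*E-D"));   // ABC*E*+D-
    }
}
```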
In general, grammars that define infix expressions can be ambiguous unless they account
for the precedence rules for operators and for the correct associativity when two or
more operators of the same precedence appear in the same expression. Consider:
A+B*C
Should this expression be evaluated as ( A + B ) * C or as A + ( B * C )? We know the
second is correct, because * has higher precedence than +, but we must write our
grammar to reflect this.
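A grammar for simple expressions containing only the operators *, /, +, and -, with the
correct precedence and associativity, is:
<assign> --> <id> = <expr>
<id> --> A | B | C | D
<expr> --> <expr> + <term> | <expr> - <term> | <term>
<term> --> <term> * <factor> | <term> / <factor> | <factor>
<factor> --> <id> | ( <expr> )
Because * and / are introduced at the <term> level, below + and -, they bind more
tightly; because each rule is left-recursive, operators of equal precedence group from
left to right.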
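One way to see that this grammar encodes precedence and associativity is to evaluate expressions with it. The Java sketch below is a recursive-descent evaluator with one method per non-terminal, assuming the values of A through D are supplied in a map; the class name and the sample values are hypothetical, and the <assign> rule is left out. A method for <expr> --> <expr> + <term> that called itself first would recurse forever, so each left-recursive rule is written as a loop instead; the loop combines results left to right, which gives the same left associativity as the grammar.

```java
import java.util.Map;

// Recursive-descent evaluator (hypothetical class) for
//   <expr>   --> <expr> + <term> | <expr> - <term> | <term>
//   <term>   --> <term> * <factor> | <term> / <factor> | <factor>
//   <factor> --> <id> | ( <expr> )
//   <id>     --> A | B | C | D
// Left-recursive rules are implemented as loops.
class ExprEvaluator {
    private final String src;                      // expression with blanks removed
    private final Map<Character, Integer> values;  // assumed values for A..D
    private int pos = 0;                           // index of the next unread character

    ExprEvaluator(String expression, Map<Character, Integer> values) {
        this.src = expression.replace(" ", "");
        this.values = values;
    }

    int evaluate() {
        int result = expr();
        if (pos != src.length()) {
            throw new IllegalArgumentException("unexpected '" + src.charAt(pos) + "'");
        }
        return result;
    }

    // <expr>: a <term> followed by any number of "+ <term>" or "- <term>"
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // <term>: a <factor> followed by any number of "* <factor>" or "/ <factor>"
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;   // integer division
        }
        return value;
    }

    // <factor>: an <id> or a parenthesized <expr>
    private int factor() {
        char c = peek();
        pos++;
        if (c == '(') {
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("missing ')'");
            pos++;
            return value;
        }
        if (c >= 'A' && c <= 'D' && values.containsKey(c)) return values.get(c);
        throw new IllegalArgumentException("expected A-D or '(' at position " + (pos - 1));
    }

    // Next unread character, or '\0' at the end of the input
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        Map<Character, Integer> v = Map.of('A', 2, 'B', 3, 'C', 4, 'D', 8);
        System.out.println(new ExprEvaluator("A + B * C", v).evaluate());    // 14, i.e. A + (B * C)
        System.out.println(new ExprEvaluator("(A + B) * C", v).evaluate());  // 20
        System.out.println(new ExprEvaluator("D / A / A", v).evaluate());    // 2, i.e. (D / A) / A
    }
}
```

Map.of requires Java 9 or later; any Map implementation works in its place.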
Another solution is to require that every expression be fully parenthesized. For example,
A * B + C
would be written as ( ( A * B ) + C ).
Although this works, it is tedious. Another solution is to translate the infix
expression into one of the other two forms (prefix or postfix). These forms are
unambiguous, are called parenthesis-free expressions, and can be validated more readily
with a simpler grammar.
To convert an infix expression to either a prefix or a postfix expression, one technique
is to fully parenthesize the original expression and then simply move each operator.
The order of the operands in the original expression is preserved during the conversion.
A+(B*C)*E-D
( ( A + ( ( B * C ) * E ) ) - D )
Next, each operator is moved outside its own pair of parentheses, and the parentheses
are dropped. For prefix, the operator moves in front of the opening parenthesis; for
postfix, it moves after the closing parenthesis:
Prefix:
-+A**BCED
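The “move the operator” step can be written as a short recursive function. This Java sketch assumes the input is fully parenthesized, that operands are single letters or digits, and that each pair of parentheses holds exactly one binary operator; the class and method names are hypothetical. Each call converts one operand or one ( left op right ) group, then places the operator in front of (prefix) or behind (postfix) the two converted operands, so the operand order never changes.

```java
// Converts a fully parenthesized infix expression to prefix or postfix by
// moving each operator outside its parentheses (hypothetical class).
// Assumes single-character operands and one binary operator per pair of parentheses.
class OperatorMover {
    private final String src;      // expression with blanks removed
    private final boolean prefix;  // true: operator goes in front; false: behind
    private int pos = 0;           // index of the next unread character

    private OperatorMover(String fullyParenthesized, boolean prefix) {
        this.src = fullyParenthesized.replace(" ", "");
        this.prefix = prefix;
    }

    static String toPrefix(String expr)  { return new OperatorMover(expr, true).convertAll(); }
    static String toPostfix(String expr) { return new OperatorMover(expr, false).convertAll(); }

    private String convertAll() {
        String result = convert();
        if (pos != src.length()) throw new IllegalArgumentException("extra input at " + pos);
        return result;
    }

    // Converts one operand, or one "( left op right )" group
    private String convert() {
        char c = next();
        if (c != '(') {
            if (!Character.isLetterOrDigit(c)) throw new IllegalArgumentException("bad operand '" + c + "'");
            return String.valueOf(c);
        }
        String left = convert();
        char op = next();
        if ("+-*/".indexOf(op) < 0) throw new IllegalArgumentException("bad operator '" + op + "'");
        String right = convert();
        if (next() != ')') throw new IllegalArgumentException("expected ')'");
        // Move the operator outside its parentheses and drop the parentheses
        return prefix ? op + left + right : left + right + op;
    }

    private char next() {
        if (pos >= src.length()) throw new IllegalArgumentException("unexpected end of input");
        return src.charAt(pos++);
    }

    public static void main(String[] args) {
        String expr = "( ( A + ( ( B * C ) * E ) ) - D )";
        System.out.println(toPrefix(expr));    // -+A**BCED
        System.out.println(toPostfix(expr));   // ABC*E*+D-
    }
}
```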
Assuming A = 2, B = 2, C = 3, D = 4, E = 5
- + 2 * * 2 3 5 4
We evaluate the expression from left to right: we extract each “token” from the string,
and when we find an operator followed by two operands, we evaluate that operation and
put the result back in its place:
Step 1: evaluate * 2 3 = 6
- + 2 * 6 5 4
Step 2: evaluate * 6 5 = 30
- + 2 30 4
Step 3: evaluate + 2 30 = 32
- 32 4
Step 4: evaluate - 32 4 = 28
28
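The left-to-right procedure above can be transcribed almost directly into Java. This sketch assumes tokens are separated by blanks (so that multi-digit results such as 30 stay single tokens) and uses integer arithmetic; the class and method names are hypothetical. Each pass finds the leftmost operator that is immediately followed by two numbers, replaces those three tokens with the result, and repeats until one token is left. It favors matching the hand procedure over speed, since it rescans the token list after every reduction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Evaluates a prefix expression by repeated "operator operand operand" reduction,
// following the hand procedure step by step (hypothetical class).
// Assumes blank-separated integer tokens and the operators + - * /.
class PrefixReducer {
    static boolean isOperator(String token) {
        return token.equals("+") || token.equals("-") || token.equals("*") || token.equals("/");
    }

    static int apply(String op, int x, int y) {
        switch (op) {
            case "+": return x + y;
            case "-": return x - y;
            case "*": return x * y;
            default:  return x / y;                 // "/" (integer division)
        }
    }

    // Evaluates a blank-separated prefix expression such as "- + 2 * * 2 3 5 4"
    static int evaluate(String expression) {
        List<String> tokens = new ArrayList<>(Arrays.asList(expression.trim().split("\\s+")));
        while (tokens.size() > 1) {
            boolean reduced = false;
            for (int i = 0; i + 2 < tokens.size(); i++) {
                String op = tokens.get(i), x = tokens.get(i + 1), y = tokens.get(i + 2);
                if (isOperator(op) && !isOperator(x) && !isOperator(y)) {
                    int result = apply(op, Integer.parseInt(x), Integer.parseInt(y));
                    tokens.subList(i, i + 3).clear();        // remove "op x y"
                    tokens.add(i, String.valueOf(result));   // put the result in its place
                    reduced = true;
                    break;
                }
            }
            if (!reduced) throw new IllegalArgumentException("not a valid prefix expression");
        }
        return Integer.parseInt(tokens.get(0));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("- + 2 * * 2 3 5 4"));   // 28
    }
}
```

On the example, the reductions happen in the same order as Steps 1 to 4 above.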
Postfix Expressions
( ( A + ( ( B * C ) * E ) ) - D )
ABC*E*+D-
As we scan the expression from left to right,
when we find 2 operands followed by an
operator, we evaluate the sub-expression,
replace the sub-expression with its result,
and continue scanning.
Assuming A = 2, B = 2, C = 3, D = 4, E = 5
2 2 3 * 5 * + 4 -     {2 * 3}
2 6 5 * + 4 -         {6 * 5}
2 30 + 4 -            {2 + 30}
32 4 -                {32 - 4}
28
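The scan-and-replace procedure above is usually implemented with a stack, the approach mentioned earlier for postfix evaluation. In this Java sketch (class name hypothetical; tokens assumed blank-separated; integer arithmetic), operands are pushed as they are read, and an operator pops its two operands and pushes its result back in their place. The first value popped is the right operand, which matters for - and /.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stack-based postfix evaluation (hypothetical class).
// Assumes blank-separated integer tokens and the operators + - * /.
class PostfixEvaluator {
    static int apply(String op, int left, int right) {
        switch (op) {
            case "+": return left + right;
            case "-": return left - right;
            case "*": return left * right;
            default:  return left / right;          // "/" (integer division)
        }
    }

    // Evaluates a blank-separated postfix expression such as "2 2 3 * 5 * + 4 -"
    static int evaluate(String expression) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : expression.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    if (stack.size() < 2) throw new IllegalArgumentException("missing operand for " + token);
                    int right = stack.pop();        // top of the stack is the second operand
                    int left = stack.pop();
                    stack.push(apply(token, left, right));
                    break;
                }
                default:
                    stack.push(Integer.parseInt(token));   // an operand
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("too many operands");
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 2 3 * 5 * + 4 -"));   // 28
    }
}
```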
Your text has a grammar for prefix and postfix expressions. Another example:
( ( ( ( a - b ) * c ) - d ) + ( a * c ) )
Prefix:  +-*-abcd*ac
Postfix: ab-c*d-ac*+
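The grammars from your text are not reproduced in these notes. As an illustration only, this Java sketch assumes a common form of them, with single lowercase letters as identifiers and + - * / as operators: <prefix> => <identifier> | <operator><prefix><prefix>, and <postfix> => <identifier> | <postfix><postfix><operator>. The prefix recognizer follows its grammar recursively, using a helper (hypothetically named endPrefix) that returns where the <prefix> starting at a given position ends. The postfix recognizer instead simulates a stack by only counting the complete sub-expressions still waiting for an operator.

```java
// Recognizers for prefix and postfix expressions (hypothetical class), assuming
//   <prefix>  => <identifier> | <operator><prefix><prefix>
//   <postfix> => <identifier> | <postfix><postfix><operator>
// with identifiers a..z and operators + - * /.
class NotationRecognizer {
    static boolean isIdentifier(char c) { return c >= 'a' && c <= 'z'; }
    static boolean isOperator(char c)   { return "+-*/".indexOf(c) >= 0; }

    // Returns the index just past the <prefix> that starts at 'first', or -1 if none does
    static int endPrefix(String s, int first) {
        if (first >= s.length()) return -1;
        char c = s.charAt(first);
        if (isIdentifier(c)) return first + 1;               // <prefix> => <identifier>
        if (!isOperator(c)) return -1;
        int endOfFirst = endPrefix(s, first + 1);            // <operator><prefix><prefix>
        return (endOfFirst == -1) ? -1 : endPrefix(s, endOfFirst);
    }

    static boolean isPrefix(String s) {
        String t = s.replace(" ", "");
        return endPrefix(t, 0) == t.length();
    }

    static boolean isPostfix(String s) {
        int pending = 0;   // complete sub-expressions not yet used by an operator
        for (char c : s.replace(" ", "").toCharArray()) {
            if (isIdentifier(c)) {
                pending++;
            } else if (isOperator(c) && pending >= 2) {
                pending--;                 // two sub-expressions + operator -> one
            } else {
                return false;
            }
        }
        return pending == 1;
    }

    public static void main(String[] args) {
        System.out.println(isPrefix("+-*-abcd*ac"));    // true
        System.out.println(isPostfix("ab-c*d-ac*+"));   // true
        System.out.println(isPostfix("ab-c*d-ac*"));    // false: the final + is missing
    }
}
```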