Grammars, Stacks, and Prefix and Postfix Expressions

To illustrate a common application of recursion, Chapter 6 introduces grammars, which describe the syntax of a language or a language construct, and then shows how to write a recursive algorithm that recognizes strings in that “language”. We begin by discussing what a grammar is and explaining the notation your text uses to define one.

The syntax of a programming language defines what constitutes a grammatically correct program in that language. The formal syntax of a language has two layers: a lexical layer and a grammar layer.

Lexical syntax corresponds to the spelling of words in the language and identifies which special symbols have meaning within it. For example, <=, <, <>, else, if, and int could all be meaningful tokens. Tokens are the smallest meaningful components of a language, such as words or punctuation symbols.

The grammatical layer is specified by a set of recursive rules called a grammar. Just as natural languages (English, etc.) have grammars that specify sentence structure and punctuation, grammars for programming languages define valid statement structure.

A string can be parsed to determine whether it is valid for the language described by a grammar, either with nested recursive calls or, in some cases, with stacks and queues.

A common notation for grammars is Backus-Naur Form (BNF).

BNF Notation

    <S> => $ | <W> | $<S>
    <W> => abb | a<W>bb

Strings in this language include: $, abb, $$$, aabbbb, $abb, $$$abb, $$aabbbb, ...

A BNF grammar consists of four constructs:

Terminals: the basic symbols of which strings in the language are composed. These may include single characters, identifiers, digits, punctuation symbols, and keywords. Terminals appear only on the right-hand side of productions.

Non-terminals: special symbols that can be rewritten in terms of other non-terminals and terminals. They can occur on both the left-hand and right-hand sides of productions.
It is possible to define a non-terminal that is rewritten in terms of itself (recursion). Non-terminals help provide a hierarchical structure for the language being defined.

Starting symbol: when defining a grammar for a language, we select a starting symbol that denotes the construct (or language) being described. It is the starting point for the derivation of any string.

Productions: the set of “rewriting” rules that define how the syntactic categories may be built up from one another and from the terminals. Each production consists of a non-terminal, followed by an arrow -> (or ::=), followed by a string of non-terminals and terminals.

A sample set of productions for an integer number might be:

    <integer>  -> <sign><unsigned>
    <unsigned> -> <digit> | <unsigned><digit>
    <sign>     -> + | - | (empty)
    <digit>    -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The third alternative for <sign> is the empty string, so the sign is optional.

Once we have defined a grammar, we can use a derivation to recognize a string as belonging to the language the grammar defines. A derivation begins with the start symbol and repeatedly replaces a non-terminal with the right-hand side of a production that has that non-terminal on its left-hand side, until we eventually obtain the string.

EXERCISE: Determine whether 576 is a sentence described by the previous grammar.

    <integer> -> <sign><unsigned>
              -> <unsigned>
              -> <unsigned><digit>
              -> <unsigned><digit><digit>
              -> <digit><digit><digit>
              -> 5<digit><digit>
              -> 57<digit>
              -> 576

A grammar not only defines the valid syntax for a language; it is also an outline for a “recognizer” program for the language, in which non-terminals represent calls to potentially recursive functions. A recognizer program can be written to determine whether a string is part of a language using recursive methods (see the example of a grammar for the language a^n b^n on page 304 of your text).

Extra example: write a grammar to represent the following language, and modify the recognizer program accordingly:

    L(G) = { a^n c b^n : n >= 0 } = { c, acb, aacbb, aaacbbb, aaaacbbbb, . . .
}

The empty string is not valid (n = 0 gives the string c).

    <S> => c | a<S>b

    // Returns true if w is in the language { a^n c b^n : n >= 0 }.
    static boolean isAnCBn(String w) {
        if (w.length() == 0) {
            return false;                      // the empty string is not valid
        }
        if (w.length() == 1) {
            return w.equals("c");              // base case: <S> => c
        }
        // recursive case: <S> => a<S>b
        if (w.charAt(0) == 'a' && w.charAt(w.length() - 1) == 'b') {
            return isAnCBn(w.substring(1, w.length() - 1));   // w minus its first and last characters
        }
        return false;
    } // end method

Specify a grammar for the language L(G) = { a^n b^m c^o : n, m, o > 0 }:

    <A> => a<B> | a<A>
    <B> => b<C> | b<B>
    <C> => c<C> | c

Describing the syntax of arithmetic expressions, and establishing the ability to evaluate them, is frequently the starting point in the definition of a new programming language. By creating a grammar that describes how the language construct is defined, we can then write recognition programs to determine whether a sentence in that language is valid. But writing grammars isn't always easy.

Most programming languages incorporate expression syntax that uses three different expression notations: infix, prefix, and postfix. Your textbook shows both a grammar and parts of a recursive recognizer program for prefix and postfix notations.

We are familiar with the infix notation for algebraic expressions:

    A+B

In infix notation, a binary operator is written between the two operands it acts upon.

In prefix notation, a binary operator is written in front of the two operands it affects:

    +AB

Method calls are a form of prefix expression, where the operator is the method name and the operands are surrounded by parentheses:

    methodname (operand1, operand2)

In postfix notation, the operator follows its operands:

    AB+

Both prefix and postfix notations make expressions easier to evaluate, and both are parenthesis-free. As we will see in the future, we can use a stack to quickly evaluate a postfix expression and to convert an infix expression into a postfix expression.
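As an illustration of how a grammar outlines a recursive recognizer, here is a minimal sketch for prefix expressions. It assumes a grammar of the form <prefix> -> <identifier> | <operator><prefix><prefix>, with single-letter identifiers and the operators + - * /. That grammar, the class name PrefixRecognizer, and the method names endPre and isPrefix are assumptions made for this example; the version in your text may differ.

```java
/**
 * Sketch of a recursive recognizer for prefix expressions.
 * Assumed grammar (for illustration only):
 *   <prefix>     -> <identifier> | <operator> <prefix> <prefix>
 *   <operator>   -> + | - | * | /
 *   <identifier> -> a single letter
 */
class PrefixRecognizer {

    /**
     * Returns the index just past the prefix expression that starts
     * at position 'first', or -1 if no prefix expression starts there.
     */
    static int endPre(String s, int first) {
        if (first < 0 || first >= s.length()) {
            return -1;                              // ran out of characters
        }
        char ch = s.charAt(first);
        if (Character.isLetter(ch)) {
            return first + 1;                       // <prefix> -> <identifier>
        }
        if ("+-*/".indexOf(ch) >= 0) {
            // <prefix> -> <operator> <prefix> <prefix>
            int afterFirst = endPre(s, first + 1);  // first operand
            if (afterFirst < 0) {
                return -1;
            }
            return endPre(s, afterFirst);           // second operand
        }
        return -1;                                  // not a legal character
    }

    /** True if the whole string is exactly one prefix expression. */
    static boolean isPrefix(String s) {
        return !s.isEmpty() && endPre(s, 0) == s.length();
    }
}
```

With this sketch, isPrefix("+AB") is true, while isPrefix("AB+") is false because the recognizer stops after the identifier A and characters are left over. Each non-terminal on the right-hand side of the recursive production becomes one recursive call.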
In general, grammars that define infix expressions can be ambiguous unless they account for operator precedence and for the correct associativity when two or more operators of the same precedence appear in the same expression. Consider:

    A+B*C

Should this expression be evaluated as ( A + B ) * C or as A + ( B * C )? We know the second is correct, but we must write our grammar to reflect this.

A grammar for simple expressions containing only the operators * / + and -, with the correct precedence and associativity, is:

    <assign> --> <id> = <expr>
    <id>     --> A | B | C | D
    <expr>   --> <expr> + <term> | <expr> - <term> | <term>
    <term>   --> <term> * <factor> | <term> / <factor> | <factor>
    <factor> --> <id> | ( <expr> )

Another solution is to require that every expression be fully parenthesized. For example, A * B + C would be written as ( ( A * B ) + C ). Although this works, it is tedious.

Yet another solution is to translate the infix expression into one of the other two forms (prefix or postfix). These forms are unambiguous, are called parentheses-free expressions, and are more readily validated with a simpler grammar.

To convert an infix expression to prefix or postfix, one technique is to fully parenthesize the original expression and then simply move each operator. During conversion, the order of the operands in the original expression is preserved.
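The parenthesize-and-move technique can be sketched in code. The example below assumes the input is already fully parenthesized, uses single-character operands, and uses the ASCII operators + - * /. The class name ParenConverter and its methods are hypothetical names chosen for this illustration, not something taken from your text.

```java
/**
 * Sketch: converts a FULLY PARENTHESIZED infix expression to
 * postfix or prefix by moving each operator outside its parentheses.
 * Assumed input grammar: <expr> -> <operand> | ( <expr> <op> <expr> )
 */
class ParenConverter {
    private final String src;       // input with spaces removed
    private final boolean postfix;  // true: postfix output, false: prefix
    private int pos = 0;            // index of the next unread character

    private ParenConverter(String infix, boolean postfix) {
        this.src = infix.replace(" ", "");
        this.postfix = postfix;
    }

    private char next() {
        if (pos >= src.length()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return src.charAt(pos++);
    }

    private String expr() {
        char ch = next();
        if (Character.isLetterOrDigit(ch)) {
            return String.valueOf(ch);          // a single operand
        }
        if (ch != '(') {
            throw new IllegalArgumentException("unexpected '" + ch + "'");
        }
        String left = expr();
        char op = next();
        if ("+-*/".indexOf(op) < 0) {
            throw new IllegalArgumentException("expected operator, found '" + op + "'");
        }
        String right = expr();
        if (next() != ')') {
            throw new IllegalArgumentException("expected ')'");
        }
        // Operand order is preserved; only the operator moves.
        return postfix ? left + right + op : op + left + right;
    }

    private static String convert(String infix, boolean postfix) {
        ParenConverter c = new ParenConverter(infix, postfix);
        String out = c.expr();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("trailing characters");
        }
        return out;
    }

    static String toPostfix(String infix) { return convert(infix, true); }
    static String toPrefix(String infix)  { return convert(infix, false); }
}
```

For the input "((A+((B*C)*E))-D)", toPostfix returns "ABC*E*+D-" and toPrefix returns "-+A**BCED". The recursion mirrors the nesting of the parentheses, which is why no precedence rules are needed.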
Worked example. The infix expression

    A + (B * C) * E - D

is first fully parenthesized:

    ( ( A + ( ( B * C ) * E ) ) - D )

Next, each operator is moved outside its enclosing pair of parentheses: for prefix, it is moved in front of the opening parenthesis; for postfix, it is moved after the closing parenthesis. The parentheses are then dropped.

Prefix Expressions

    ( ( A + ( ( B * C ) * E ) ) - D )
    -+A**BCED

Assuming A = 2, B = 2, C = 3, D = 4, E = 5:

    - + 2 * * 2 3 5 4

We scan the expression from left to right, extracting each token from the string. When we find an operator followed by two operands, we evaluate it and put the result back in its place:

    Step 1: evaluate * 2 3 = 6       - + 2 * 6 5 4
    Step 2: evaluate * 6 5 = 30      - + 2 30 4
    Step 3: evaluate + 2 30 = 32     - 32 4
    Step 4: evaluate - 32 4 = 28     28

Postfix Expressions

    ( ( A + ( ( B * C ) * E ) ) - D )
    ABC*E*+D-

As we scan the expression from left to right, when we find two operands followed by an operator, we evaluate the sub-expression, replace the sub-expression with its result, and continue scanning.

Assuming A = 2, B = 2, C = 3, D = 4, E = 5:

    2 2 3 * 5 * + 4 -      { 2 * 3 }
    2 6 5 * + 4 -          { 6 * 5 }
    2 30 + 4 -             { 2 + 30 }
    32 4 -                 { 32 - 4 }
    28

Your text has a grammar for prefix and postfix expressions. As a further example:

    Infix:    ( ( ( ( a - b ) * c ) - d ) + ( a * c ) )
    Prefix:   + - * - a b c d * a c
    Postfix:  a b - c * d - a c * +
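The postfix evaluation traced by hand above can also be carried out with a stack, as mentioned earlier in these notes. The following is a minimal sketch, not the version from your text: it assumes tokens are separated by spaces, operands are non-negative integers, and the operators are the ASCII characters + - * /. The class name PostfixEvaluator and the method evaluate are hypothetical names used only for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of stack-based evaluation of a space-separated postfix expression. */
class PostfixEvaluator {

    static int evaluate(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            if (token.matches("\\d+")) {
                stack.push(Integer.parseInt(token));   // operand: push it
            } else if (token.length() == 1 && "+-*/".contains(token)) {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("missing operand before " + token);
                }
                int right = stack.pop();               // second operand is on top
                int left = stack.pop();
                switch (token.charAt(0)) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    default:  stack.push(left / right); break;  // integer division
                }
            } else {
                throw new IllegalArgumentException("bad token: " + token);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("malformed postfix expression");
        }
        return stack.pop();                            // the single remaining value
    }
}
```

Calling evaluate("2 2 3 * 5 * + 4 -") returns 28, matching the hand trace. Popping the right operand first matters for the non-commutative operators - and /.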