Describing Syntax
CS 3360, Spring 2012
Sec 3.1-3.4
Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)

Outline
  - Introduction
  - Formal description of syntax
  - Backus-Naur Form (BNF)
  - Attribute grammars (probably next time)

Introduction
  - Who must use language definitions?
      - Implementers
      - Programmers (the users of the language)
  - Syntax - the form or structure of the expressions, statements, and program units
  - Semantics - the meaning of the expressions, statements, and program units

Introduction (cont.)
  - Example: syntax of the Java while statement
        while (<boolean-expr>) <statement>
  - Semantics?

Describing Syntax - Vocabulary
  - A sentence is a string of characters over some alphabet
  - A language is a set of sentences
  - A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, while)
  - A token is a category of lexemes (e.g., identifier)

Example
        index = 2 * count + 17;

        Lexeme    Token
        index     identifier
        =         equal_sign
        2         int_literal
        *         mult_op
        count     identifier
        +         plus_op
        17        int_literal
        ;         semicolon

Describing Syntax
  - Formal approaches to describing syntax:
      - Recognizers (once you have code)
          - Can tell whether a given string is in a language or not
          - Used in compilers, where the recognizer is called a parser
      - Generators (in order to build code)
          - Generate the sentences of a language
          - Used to describe the syntax of a language

Formal Methods of Describing Syntax
  - Context-Free Grammars (CFG; see the automata course)
      - Developed by Noam Chomsky in the mid-1950s
      - Language generators, meant to describe the syntax of natural languages
      - Define a class of languages called context-free languages

Formal Methods of Describing Syntax
  - Backus-Naur Form
      - Invented by John Backus to describe Algol 58
      - Extended by Peter Naur to describe Algol 60
      - BNF is equivalent to context-free grammars
  - A metalanguage is a language used to describe another language.
  - In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)

Backus-Naur Form
        <while_stmt> -> while ( <logic_expr> ) <stmt>
  - This is a rule (also called a production rule); it describes the structure of a while statement

Backus-Naur Form
  - A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols
  - A grammar is a finite non-empty set of rules
  - An abstraction (or nonterminal symbol) can have more than one RHS
        <stmt> -> <single_stmt>
                | { <stmt_list> }

Backus-Naur Form
  - Syntactic lists are described using recursion
        <ident_list> -> ident
                      | ident , <ident_list>
  - Example sentences:
        ident
        ident , ident
        ident , ident , ident

Example
  - A grammar for a small language:
        <program> -> <stmts>
        <stmts>   -> <stmt> | <stmt> ; <stmts>
        <stmt>    -> <var> = <expr>
        <var>     -> a | b | c | d
        <expr>    -> <term> + <term> | <term> - <term>
        <term>    -> <var> | 5
  - Sample program:
        a = b + 5

Exercise
  - Define a grammar to generate all sentences of the form:
        subject verb object .
    where subject is “i” or “we”, verb is “love” or “like”, and object is “exercises” or “programming”.
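
The idea that a grammar is a generator of sentences can be made concrete with the small-language grammar from the Example slide (the one with <program>, <stmts>, <stmt>, <var>, <expr>, and <term>). What follows is only a minimal sketch in Python, not part of the lecture: the dictionary encoding `GRAMMAR`, the function name `sentences`, and the `depth` bound are all assumptions introduced for illustration. It enumerates every sentence derivable within a bounded number of nested rule applications.

```python
from itertools import product

# Hypothetical encoding of the lecture's small-language grammar:
# each nonterminal maps to its alternatives (RHSs), each a list of symbols.
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["5"]],
}


def sentences(symbol, depth):
    """Return all terminal strings derivable from `symbol` using at most
    `depth` nested rule applications (the bound keeps the set finite)."""
    if symbol not in GRAMMAR:      # terminal symbol: it stands for itself
        return {symbol}
    if depth == 0:                 # no rule applications left
        return set()
    result = set()
    for rhs in GRAMMAR[symbol]:
        # Expand every symbol of this RHS, then combine the pieces.
        parts = [sentences(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            result.add(" ".join(combo))
    return result


if __name__ == "__main__":
    generated = sentences("<program>", 6)
    print("a = b + 5" in generated)  # True
```

The depth bound is a design choice of this sketch: because <stmts> is recursive, the language is infinite, so an unbounded enumeration would never finish.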

Exercise
  - Define the syntax of Java Boolean expressions consisting of:
      - Constants: false and true
      - Operators: !, &&, and ||

Derivation
  - A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
  - Example:
        <ident_list> -> ident | ident , <ident_list>

        <ident_list> => ident , <ident_list>
                     => ident , ident , <ident_list>
                     => ident , ident , ident

Another Example
        <program> -> <stmts>
        <stmts>   -> <stmt> | <stmt> ; <stmts>
        <stmt>    -> <var> = <expr>
        <var>     -> a | b | c | d
        <expr>    -> <term> + <term> | <term> - <term>
        <term>    -> <var> | 5

        a = b + 5

        <program> => <stmts>
                  => <stmt>
                  => <var> = <expr>
                  => a = <expr>
                  => a = <term> + <term>
                  => a = <var> + <term>
                  => a = b + <term>
                  => a = b + 5

Derivation
  - Every string of symbols in the derivation is a sentential form
  - A sentence is a sentential form that has only terminal symbols
  - A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
  - A derivation may be neither leftmost nor rightmost

Exercise
        <program> -> <stmts>
        <stmts>   -> <stmt> | <stmt> ; <stmts>
        <stmt>    -> <var> = <expr>
        <var>     -> a | b | c | d
        <expr>    -> <term> + <term> | <term> - <term>
        <term>    -> <var> | 5
  - Derive a = b + 5 by using a rightmost derivation.

Parse Tree
  - A hierarchical representation of a derivation (children are indented under their parent):
        <program>
          <stmts>
            <stmt>
              <var>
                a
              =
              <expr>
                <term>
                  <var>
                    b
                +
                <term>
                  5

Ambiguity of Grammars
  - A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees.
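
To see the definition at work, take the grammar <expr> -> <expr> <op> <expr> | 5 with <op> -> / | -, and compare it with the layered grammar <expr> -> <expr> - <term> | <term>, <term> -> <term> / 5 | 5. The following is a minimal sketch in Python that counts the parse trees each grammar gives to a token sequence; the function names `count_ambiguous` and `count_unambiguous` are hypothetical, and the counting is hand-specialized to these two grammars rather than a general parser.

```python
from functools import lru_cache


def count_ambiguous(tokens):
    """Count parse trees under <expr> -> <expr> <op> <expr> | 5, <op> -> / | -."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        # Number of ways tokens[i:j] can be derived from <expr>.
        if j - i == 1:
            return 1 if tokens[i] == "5" else 0
        total = 0
        for k in range(i + 1, j - 1):      # k = position of the top-level <op>
            if tokens[k] in ("/", "-"):
                total += expr(i, k) * expr(k + 1, j)
        return total

    return expr(0, len(tokens))


def count_unambiguous(tokens):
    """Count parse trees under <expr> -> <expr> - <term> | <term>,
    <term> -> <term> / 5 | 5."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def term(i, j):
        if j - i == 1:
            return 1 if tokens[i] == "5" else 0
        # <term> -> <term> / 5: the last two tokens must be "/" and "5".
        if j - i >= 3 and tokens[j - 2] == "/" and tokens[j - 1] == "5":
            return term(i, j - 2)
        return 0

    @lru_cache(maxsize=None)
    def expr(i, j):
        total = term(i, j)                 # <expr> -> <term>
        for k in range(i + 1, j - 1):      # <expr> -> <expr> - <term>
            if tokens[k] == "-":
                total += expr(i, k) * term(k + 1, j)
        return total

    return expr(0, len(tokens))


if __name__ == "__main__":
    sentence = "5 - 5 / 5".split()
    print(count_ambiguous(sentence))    # 2
    print(count_unambiguous(sentence))  # 1
```

A count greater than one for any single sentence is enough to show a grammar is ambiguous; a count of one for a single sentence shows nothing about the grammar as a whole.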

An Ambiguous Expression Grammar
        <expr> -> <expr> <op> <expr> | 5
        <op>   -> / | -
  [Figure: two distinct parse trees for 5 - 5 / 5, one grouping it as (5 - 5) / 5 and the other as 5 - (5 / 5)]

An Unambiguous Expression Grammar
  - If the grammar makes the parse tree indicate the precedence levels of the operators, this ambiguity disappears
        <expr> -> <expr> - <term> | <term>
        <term> -> <term> / 5 | 5
  [Figure: the single parse tree for 5 - 5 / 5, in which 5 / 5 forms a <term> below the - node]

Exercise
  - Prove or disprove the ambiguity of the following grammar:
        <stmt>    -> <if-stmt>
        <if-stmt> -> if <expr> then <stmt>
                   | if <expr> then <stmt> else <stmt>

Operator Precedence
        <expr> -> <expr> - <term> | <term>
        <term> -> <term> / 5 | 5
  - Derivation:
        <expr> => <expr> - <term>
               => <term> - <term>
               => 5 - <term>
               => 5 - <term> / 5
               => 5 - 5 / 5

Operator Associativity
  - Can we describe operator associativity correctly?
        A = A + B + C
  - (A + B) + C or A + (B + C)?
  - Does it matter?

Operator Associativity
  - Operator associativity can also be indicated by a grammar
        <expr> -> <expr> + <expr> | 5    (ambiguous)
        <expr> -> <expr> + 5 | 5         (unambiguous)
  [Figure: parse tree for 5 + 5 + 5 under the unambiguous grammar, grouping it as (5 + 5) + 5]

Left vs. Right Recursion
  - A rule is left recursive if its LHS also appears at the beginning (left end) of its RHS.
  - A rule is right recursive if its LHS also appears at the right end of its RHS.
        <factor> -> <expr> ** <factor> | <expr>
        <expr>   -> c
  - Example: c ** c ** c is interpreted as c ** (c ** c)

Exercise (to do in groups at the end of lecture)
  - Define a BNF grammar for expressions consisting of +, *, and ** (exponentiation). The operator ** has precedence over *, and * has precedence over +. Both + and * are left associative, while ** is right associative.
  - Using the above grammar, draw a parse tree for the sentence:
        7 + 6 + 5 * 4 * 3 ** 2 ** 1

Extended BNF (EBNF)
  - Extended BNF (just abbreviations):
      - Optional parts are placed in brackets ([ ])
            <meth_call> -> ident ( [<expr_list>] )
      - Alternative parts of RHSs are placed in parentheses and separated by vertical bars
            <term> -> <term> (+ | -) const
      - Repetitions (0 or more) are placed in braces ({ })
            <ident> -> letter {letter | digit}

Example
  - BNF:
        <expr> -> <expr> + <term> | <expr> - <term> | <term>
        <term> -> <term> * <factor> | <term> / <factor> | <factor>
  - EBNF:
        <expr> -> <term> {(+ | -) <term>}
        <term> -> <factor> {(* | /) <factor>}

Exercise / Homework
  - Write BNF rules for the following EBNF rules:
      1. <meth_call> -> <ident> “(” [<expr_list>] “)”
      2. <term> -> <term> (+ | -) const
      3. <ident> -> letter {letter | digit}
  - Due on Tuesday at the start of the session!

Outline
  - Introduction
  - Describing syntax formally
  - Backus-Naur Form (BNF)
  - Attribute grammars

Attribute Grammars
  - CFGs cannot describe all of the syntax of programming languages
  - Attribute grammars are additions to CFGs that carry some semantic information along through parse trees
  - Primary value of attribute grammars:
      - Static semantics specification
      - Compiler design (static semantics checking)

Basic Idea
  - Add attributes, attribute computation functions, and predicates to CFGs
  - Attributes
      - Associated with grammar symbols
      - Can have values assigned to them
  - Attribute computation functions
      - Associated with grammar rules
      - Specify how to compute attribute values
      - Are often called semantic functions
  - Predicate functions
      - Associated with grammar rules
      - State some of the syntax and static semantic rules of the language

Example
  - BNF
        <meth_def>  -> meth <meth_name> <meth_body> end <meth_name>
        <meth_name> -> <identifier>
        <meth_body> -> …
  - AG
      1. Syntax rule:   <meth_def> -> meth <meth_name>[1] <meth_body> end <meth_name>[2]
         Predicate:     <meth_name>[1].string == <meth_name>[2].string
      2. Syntax rule:   <meth_name> -> <identifier>
         Semantic rule: <meth_name>.string <- <identifier>.string

Attribute Grammars Defined
  - An attribute grammar is a CFG with the following additions:
      - A set of attributes A(X) for each grammar symbol X
          - A(X) consists of two disjoint sets S(X) and I(X)
          - S(X): synthesized attributes
          - I(X): inherited attributes
      - Each rule has a set of functions that define certain attributes of the nonterminals in the rule
      - Each rule has a (possibly empty) set of predicates to check for attribute consistency

Attribute Functions
  - Let X0 -> X1 ... Xn be a rule
  - Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes
  - Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes
      - Often of the form: I(Xj) = f(A(X0), ... , A(Xj-1))
  - Initially, there are intrinsic attributes on the leaves. Intrinsic attributes are synthesized attributes whose values are determined outside the parse tree.

Example - Type Checking Rules
  - BNF
        <assign> -> <var> = <expr>
        <expr>   -> <var> | <var> + <var>
        <var>    -> A | B | C
  - Rule
      - A variable is either int or float.
      - If the two operands of + have the same type, the type of the expression is that of the operands; otherwise, it is float.
      - The type of the left side of the assignment must match the type of the right side.
  - Attributes
      - actual_type: synthesized for <var> and <expr>
      - expected_type: inherited for <expr>
      - string: intrinsic for <var>

Example - Attribute Grammar
  1. Syntax rule:   <assign> -> <var> = <expr>
     Semantic rule: <expr>.expected_type <- <var>.actual_type
  2. Syntax rule:   <expr> -> <var>[1] + <var>[2]
     Semantic rule: <expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
     Predicate:     <expr>.actual_type == <expr>.expected_type
  3. Syntax rule:   <expr> -> <var>
     Semantic rule: <expr>.actual_type <- <var>.actual_type
     Predicate:     <expr>.actual_type == <expr>.expected_type
  4. Syntax rule:   <var> -> A | B | C
     Semantic rule: <var>.actual_type <- lookup(<var>.string)

Example - Parse Tree
        A = A + B

        <assign>
          <var>
            A
          =
          <expr>
            <var>[1]
              A
            +
            <var>[2]
              B

Example - Flow of Attributes
        A = A + B
  - <expr>.expected_type <- <var>.actual_type
  - <expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
  [Figure: the parse tree for A = A + B annotated with attribute flow; actual_type is attached to <var>, <var>[1], <var>[2], and <expr>, and expected_type is attached to <expr>]

Example - Calculating Attributes
        A = A + B, where A is float and B is int
  - <expr>.expected_type <- <var>.actual_type
  - <expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
  - Decorated parse tree:
        <var>.actual_type     = float
        <expr>.expected_type  = float
        <var>[1].actual_type  = float
        <var>[2].actual_type  = int
        <expr>.actual_type    = float

Example - Calculating Attributes
        A = A + B, where A is int and B is float
  - <expr>.expected_type <- <var>.actual_type
  - <expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
  - Decorated parse tree:
        <var>.actual_type     = int
        <expr>.expected_type  = int
        <var>[1].actual_type  = int
        <var>[2].actual_type  = float
        <expr>.actual_type    = float
  - Here expected_type (int) differs from actual_type (float), so the predicate <expr>.actual_type == <expr>.expected_type fails.

Attribute Grammars
  - How are attribute values computed?
      - If all attributes were inherited, the tree could be decorated in top-down order.
      - If all attributes were synthesized, the tree could be decorated in bottom-up order.
      - In many cases, both kinds of attributes are used, and some combination of top-down and bottom-up must be used.

Group Exercise (homework due Tuesday, February 7, at the start of class)
  - BNF
        <cond_expr> -> <expr> ? <expr> : <expr>
        <expr>      -> <var> | <expr> + <expr>
        <var>       -> id
  - Rule
      - id's type can be bool, int, or float.
      - Operands of + must be numeric and of the same type.
      - The type of + is the type of its operands.
      - The first operand of ?: must be of type bool, and the second and third must be of the same type.
      - The type of ?: is the type of its second and third operands.
  - Given the above BNF and rule:
      1. Define an attribute grammar.
      2. Draw a decorated parse tree for “id ? id : id + id”, assuming that the first id is of type bool and the rest are of type int.
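
As a companion to the worked type-checking example for A = A + B (not the group exercise), the four rules of that attribute grammar can be evaluated mechanically. This is a minimal sketch in Python under stated assumptions: the function name `check_assign`, the representation of the right side as a list of one or two variable names, and the dictionary standing in for lookup are all hypothetical and not from the lecture.

```python
def check_assign(lhs, operands, symbol_table):
    """Decorate <assign> -> <var> = <expr>, where <expr> is <var> or
    <var>[1] + <var>[2]. Returns (expected_type, actual_type, predicate_ok).

    `symbol_table` maps variable names to "int" or "float"; it plays the
    role of lookup(<var>.string) and is an assumption of this sketch.
    """
    def lookup(name):
        # Rule 4: <var>.actual_type <- lookup(<var>.string)
        return symbol_table[name]

    # Rule 1: <expr>.expected_type <- <var>.actual_type (inherited attribute)
    expected_type = lookup(lhs)

    if len(operands) == 1:
        # Rule 3: <expr>.actual_type <- <var>.actual_type
        actual_type = lookup(operands[0])
    else:
        # Rule 2: int only if both operands are int, otherwise float
        first, second = (lookup(name) for name in operands)
        actual_type = "int" if first == "int" and second == "int" else "float"

    # Predicate: <expr>.actual_type == <expr>.expected_type
    return expected_type, actual_type, actual_type == expected_type


if __name__ == "__main__":
    # A is float, B is int: the predicate holds.
    print(check_assign("A", ["A", "B"], {"A": "float", "B": "int"}))
    # A is int, B is float: the predicate fails.
    print(check_assign("A", ["A", "B"], {"A": "int", "B": "float"}))
```

The sketch computes the inherited attribute first and the synthesized one second, which mirrors the mixed top-down and bottom-up order described on the final Attribute Grammars slide.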