lectures from week 1, 2, 3

Describing Syntax
CS 3360
Spring 2012
Sec 3.1-3.4
Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
Outline
Introduction
Formal description of syntax
Backus-Naur Form (BNF)
Attribute grammars (probably next time)

Introduction

Who must use language definitions?
  Implementers
  Programmers (the users of the language)
Syntax: the form or structure of the expressions, statements, and program units
Semantics: the meaning of the expressions, statements, and program units
Introduction (cont.)

Example
  Syntax of the Java while statement:
    while (<boolean-expr>) <statement>
  Semantics?
Describing Syntax – Vocabulary
A sentence is a string of characters over some alphabet
A language is a set of sentences
A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, while)
A token is a category of lexemes (e.g., identifier)

Example
index = 2 * count + 17;

Lexemes   Tokens
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
17        int_literal
;         semicolon
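The lexeme/token pairing above can be reproduced with a small scanner. This is only a sketch: the token names come from the slide, but the regular expressions and the function name tokenize are assumptions made for illustration.

```python
import re

# Token categories from the slide, paired with regular expressions.
# The patterns themselves are assumptions made for this sketch.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs; raise ValueError on unknown characters."""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates lexemes but is not a lexeme itself
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        # match.lastgroup is the name of the token category that matched
        pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Each lexeme is the matched text and each token is the category it falls into, so both occurrences of an identifier map to the same token.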
Describing Syntax

Formal approaches to describing syntax:
  Recognizers (once you have code)
    Can tell whether a given string is in a language or not
    Used in compilers, where the recognizer is called a parser
  Generators (in order to build code)
    Generate the sentences of a language
    Used to describe the syntax of a language

Formal Methods of Describing Syntax

Context-Free Grammars (CFG; see automata course)
  Developed by Noam Chomsky in the mid-1950s
  Language generators, meant to describe the syntax of natural languages
  Define a class of languages called context-free languages
Formal Methods of Describing Syntax

Backus-Naur Form
  Invented by John Backus to describe Algol 58
  Extended by Peter Naur to describe Algol 60
  BNF is equivalent to context-free grammars
  A metalanguage is a language used to describe another language
  In BNF, abstractions are used to represent classes of syntactic structures; they act like syntactic variables (also called nonterminal symbols)
Backus-Naur Form
<while_stmt> -> while ( <logic_expr> ) <stmt>

This is a rule (also called a production
rule); it describes the structure of a while
statement
Backus-Naur Form



A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols
A grammar is a finite non-empty set of rules
An abstraction (or nonterminal symbol) can have more than one RHS
<stmt> -> <single_stmt>
        | { <stmt_list> }
Backus-Naur Form

Syntactic lists are described using recursion
<ident_list> -> ident
              | ident , <ident_list>
Example sentences:
ident
ident , ident
ident , ident , ident
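A recursive rule maps naturally onto a recursive function. The sketch below generates the sentence with n identifiers; the function name ident_list and its parameter n are illustrative assumptions, not part of the slides.

```python
def ident_list(n):
    """Generate the sentence with n identifiers from
    <ident_list> -> ident | ident , <ident_list>."""
    if n < 1:
        raise ValueError("the grammar generates at least one ident")
    if n == 1:
        return "ident"                       # first alternative: a single ident
    return "ident , " + ident_list(n - 1)    # second alternative: recurse on the tail


for n in range(1, 4):
    print(ident_list(n))
```

The base case corresponds to the non-recursive alternative, which is what lets every derivation terminate.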
Example
A grammar for a small language:
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | 5
Sample program:
a = b + 5

Exercise

Define a grammar to generate all sentences of the form:
subject verb object .
where subject is “i” or “we”, and verb is “love” or “like”,
and object is “exercises” or “programming”.
Exercise

Define the syntax of Java Boolean expressions
consisting of:
 Constants: false and true
 Operators: !, &&, and ||
Derivation


A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
Example:
<ident_list> -> ident | ident , <ident_list>
<ident_list> => ident , <ident_list>
             => ident , ident , <ident_list>
             => ident , ident , ident
More Example
<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
        | <term> - <term>
<term> -> <var> | 5
a = b + 5
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + 5
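The derivation above can be replayed mechanically. In this sketch the grammar is stored as a dictionary, and a hypothetical helper leftmost_derivation always expands the leftmost nonterminal; the list of alternative indices is an assumption chosen to reproduce a = b + 5.

```python
GRAMMAR = {
    "<program>": [["<stmts>"]],
    "<stmts>": [["<stmt>"], ["<stmt>", ";", "<stmts>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"], ["c"], ["d"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["5"]],
}


def leftmost_derivation(start, choices):
    """Apply one rule per entry of `choices` to the leftmost nonterminal.

    Each choice is the index of the alternative (RHS) to use.
    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost nonterminal in the current sentential form.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        # Replace it with the chosen right-hand side.
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


# Alternatives chosen at each step to derive: a = b + 5
steps = leftmost_derivation("<program>", [0, 0, 0, 0, 0, 0, 1, 1])
print("\n=> ".join(steps))
```

Every entry of steps is a sentential form; only the last one, which contains no nonterminals, is a sentence.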

Derivation




Every string of symbols in the derivation is a sentential form
A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
A derivation may be neither leftmost nor rightmost
Exercise

<program> -> <stmts>
<stmts> -> <stmt> | <stmt> ; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term>
        | <term> - <term>
<term> -> <var> | 5
Derive
a = b + 5
by using a rightmost derivation.
Parse Tree

A hierarchical representation of a derivation
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          5
Ambiguity of Grammars

A grammar is ambiguous if and only if it
generates a sentential form that has two or
more distinct parse trees.
An Ambiguous Expression Grammar


<expr> -> <expr> <op> <expr> | 5
<op> -> / | -
Two distinct parse trees for 5 - 5 / 5:
Tree 1, grouping (5 - 5) / 5:
<expr>
  <expr>
    <expr>
      5
    <op>
      -
    <expr>
      5
  <op>
    /
  <expr>
    5
Tree 2, grouping 5 - (5 / 5):
<expr>
  <expr>
    5
  <op>
    -
  <expr>
    <expr>
      5
    <op>
      /
    <expr>
      5
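One way to see the ambiguity concretely is to enumerate every parse tree the grammar allows for a sentence. The sketch below does this by brute force for the grammar <expr> -> <expr> <op> <expr> | 5 with <op> -> / | -; the function names parse_trees and tree_value and the tuple representation of trees are assumptions made for illustration.

```python
def parse_trees(tokens):
    """Return every parse tree for `tokens` under
    <expr> -> <expr> <op> <expr> | 5 and <op> -> / | -.
    A tree is either the leaf "5" or a tuple (left, op, right)."""
    if tokens == ["5"]:
        return ["5"]
    trees = []
    # Try every operator position as the top-level <op>.
    for i in range(1, len(tokens) - 1, 2):
        if tokens[i] in ("/", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def tree_value(tree):
    """Evaluate a tree bottom-up, so the tree shape decides the grouping."""
    if tree == "5":
        return 5.0
    left, op, right = tree
    a, b = tree_value(left), tree_value(right)
    return a / b if op == "/" else a - b


trees = parse_trees("5 - 5 / 5".split())
for tree in trees:
    print(tree, "=", tree_value(tree))
```

For 5 - 5 / 5 the enumeration finds two trees, and they evaluate to different results, which is why an ambiguous grammar is a problem when the tree is used to assign meaning.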
An Unambiguous Expression Grammar

If we use the parse tree to indicate precedence levels of the operators, we cannot have ambiguity
<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5
Parse tree for 5 - 5 / 5:
<expr>
  <expr>
    <term>
      5
  -
  <term>
    <term>
      5
    /
    5
Exercise

Prove or disprove the ambiguity of the following grammar
<stmt> -> <if-stmt>
<if-stmt> -> if <expr> then <stmt>
| if <expr> then <stmt> else <stmt>
Operator Precedence
Derivation:
<expr> -> <expr> - <term> | <term>
<term> -> <term> / 5 | 5
<expr> => <expr> - <term>
=> <term> - <term>
=> 5 - <term>
=> 5 - <term> / 5
=> 5 - 5 / 5
Operator Associativity

Can we describe operator associativity
correctly?
A = A + B + C
(A + B) + C or A + (B + C)?
Does it matter?
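A quick experiment suggests that grouping can matter. The sketch below uses Python floats and integers as stand-ins (an assumption; the slides do not name a language here): floating-point addition is not associative, and subtraction is not associative even for integers.

```python
# Floating-point addition: the two groupings round differently.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)

# Integer subtraction: the two groupings give different values.
print((10 - 4) - 3, 10 - (4 - 3))
```

So even for +, a grammar that fixes the grouping removes a possible source of differing results.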
Operator Associativity

Operator associativity can also be indicated by a grammar
<expr> -> <expr> + <expr> | 5 (ambiguous)
<expr> -> <expr> + 5 | 5 (unambiguous)
Parse tree for 5 + 5 + 5 using the unambiguous rule (addition groups to the left):
<expr>
  <expr>
    <expr>
      5
    +
    5
  +
  5
Left vs. Right Recursion


A rule is left recursive if its LHS also appears at
the beginning (left end) of its RHS.
A rule is right recursive if its LHS also appears
at the right end of its RHS.
<factor> -> <expr> ** <factor> | <expr>
<expr> -> c
Example: c ** c ** c interpreted as c ** (c ** c)
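A right-recursive rule translates directly into a recursive parser whose tree nests to the right. This is a minimal sketch for the two rules above; the function name parse_factor and the tuple representation of the tree are assumptions.

```python
def parse_factor(tokens, pos=0):
    """Parse <factor> -> <expr> ** <factor> | <expr>, with <expr> -> c.
    Returns (tree, next_position)."""
    if pos >= len(tokens) or tokens[pos] != "c":
        raise SyntaxError(f"expected 'c' at position {pos}")
    left = "c"
    pos += 1
    if pos < len(tokens) and tokens[pos] == "**":
        # Right recursion: the rest of the input becomes the right operand.
        right, pos = parse_factor(tokens, pos + 1)
        return (left, "**", right), pos
    return left, pos


tree, end = parse_factor("c ** c ** c".split())
print(tree)
```

The nested tuple on the right-hand side of the result corresponds to the interpretation c ** (c ** c).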
Exercise

Define a BNF grammar for expressions consisting of +, *,
and ** (exponentiation). The operator ** has precedence
over *, and * has precedence over +. Both + and * are
left associative, while ** is right associative.

Using the above grammar, draw a parse tree for the
sentence:
7 + 6 + 5 * 4 * 3 ** 2 ** 1
Exercise to do in groups at the end of lecture
Extended BNF (EBNF)

Extended BNF (just abbreviations):
  Optional parts are placed in brackets ([ ])
    <meth_call> -> ident ( [<expr_list>] )
  Alternative parts of RHSs are placed in parentheses and separated by vertical bars
    <term> -> <term> (+ | -) const
  Repetitions (0 or more) are placed in braces ({ })
    <ident> -> letter {letter | digit}
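The braces in the <ident> rule read naturally as a loop. The following recognizer is a small sketch; the function name is_ident is hypothetical, and treating Python's str.isalpha and str.isdigit as the definitions of letter and digit is an assumption.

```python
def is_ident(text):
    """Recognize <ident> -> letter {letter | digit}."""
    # The rule requires exactly one leading letter.
    if not text or not text[0].isalpha():
        return False
    # {letter | digit}: zero or more further letters or digits.
    return all(ch.isalpha() or ch.isdigit() for ch in text[1:])


for candidate in ["x", "x1", "count2", "1x", ""]:
    print(repr(candidate), is_ident(candidate))
```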
Example

BNF:
<expr> -> <expr> + <term>
        | <expr> - <term>
        | <term>
<term> -> <term> * <factor>
        | <term> / <factor>
        | <factor>

EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
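The EBNF form is convenient for hand-written parsers because each pair of braces becomes a while loop. The sketch below evaluates a token list this way. It assumes <factor> is an integer literal, since the slide does not define <factor>, and the function names are illustrative.

```python
def evaluate(tokens):
    """Evaluate a token list using
    <expr> -> <term> {(+ | -) <term>} and <term> -> <factor> {(* | /) <factor>}.
    <factor> is assumed to be an integer literal (not defined on the slide)."""
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected an integer at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def term():
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):  # {(* | /) <factor>}
            op = tokens[pos]
            pos += 1
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):  # {(+ | -) <term>}
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("5 - 5 / 5".split()))
print(evaluate("8 - 3 - 2".split()))
```

Because each loop folds its operands from left to right, the evaluator keeps the left associativity expressed by the left-recursive BNF rules, and calling term from inside expr gives * and / higher precedence than + and -.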
Exercise / Homework

Write BNF rules for the following EBNF rules:
1. <meth_call> -> <ident> “(” [<expr_list>] “)”
2. <term> -> <term> (+ | -) const
3. <ident> -> letter {letter | digit}
Due on Tuesday at the start of the session!
Outline

Introduction
Describing syntax formally
Backus-Naur Form (BNF)
Attribute grammars
Attribute Grammars
CFGs cannot describe all of the syntax of programming languages
Attribute grammars are additions to CFGs that carry some semantic information along through parse trees
Primary value of attribute grammars:
  Static semantics specification
  Compiler design (static semantics checking)
Basic Idea


Add attributes, attribute computation functions, and predicates to CFGs
Attributes
  Associated with grammar symbols
  Can have values assigned to them
Attribute computation functions
  Associated with grammar rules
  Specify how to compute attribute values
  Are often called semantic functions
Predicate functions
  Associated with grammar rules
  State some of the syntax and static semantic rules of the language
Example

BNF
<meth_def> -> meth <meth_name> <meth_body> end <meth_name>
<meth_name> -> <identifier>
<meth_body> -> …

AG
1. Syntax rule: <meth_def> -> meth <meth_name>[1]
<meth_body>
end <meth_name>[2]
Predicate: <meth_name>[1].string == <meth_name>[2].string
2. Syntax rule: <meth_name> -> <identifier>
Semantic rule: <meth_name>.string <- <identifier>.string
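The predicate in rule 1 amounts to a simple equality check once the string attributes are known. The sketch below works on a flat token list and treats the method body as an opaque run of tokens, because the slide elides the <meth_body> rule; the function name check_meth_def and the sample method names are assumptions.

```python
def check_meth_def(tokens):
    """Check <meth_def> -> meth <meth_name>[1] <meth_body> end <meth_name>[2].
    The body is treated as an opaque token run, since the slide elides its rule."""
    if len(tokens) < 4 or tokens[0] != "meth" or tokens[-2] != "end":
        raise SyntaxError("expected: meth <name> <body> end <name>")
    name1_string = tokens[1]    # <meth_name>[1].string
    name2_string = tokens[-1]   # <meth_name>[2].string
    # Predicate: <meth_name>[1].string == <meth_name>[2].string
    return name1_string == name2_string


print(check_meth_def(["meth", "area", "body", "end", "area"]))
print(check_meth_def(["meth", "area", "body", "end", "perimeter"]))
```

The syntax check (keywords in the right places) and the predicate (matching names) are kept separate, mirroring the split between the syntax rule and its predicate.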
Attribute Grammars Defined

An attribute grammar is a CFG with the following additions:
  A set of attributes A(X) for each grammar symbol X
    A(X) consists of two disjoint sets S(X) and I(X)
    S(X): synthesized attributes
    I(X): inherited attributes
  Each rule has a set of functions that define certain attributes of the nonterminals in the rule
  Each rule has a (possibly empty) set of predicates to check for attribute consistency
Attribute Functions



Let X0 -> X1 ... Xn be a rule
Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes
Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define inherited attributes
  Often of the form: I(Xj) = f(A(X0), ... , A(Xj-1))
Initially, there are intrinsic attributes on the leaves
  Intrinsic attributes are synthesized attributes whose values are determined outside the parse tree
Example - Type Checking Rules

BNF
<assign> -> <var> = <expr>
<expr> -> <var> | <var> + <var>
<var> -> A | B | C

Rule


  A variable is either int or float.
  If the two operands of + have the same type, the type of the expression is that of the operands; otherwise, it is float.
  The type of the left side of the assignment must match the type of the right side.

Attributes
  actual_type: synthesized for <var> and <expr>
  expected_type: inherited for <expr>
  string: intrinsic for <var>
Example – Attribute Grammar
1. Syntax rule: <assign> -> <var> = <expr>
Semantic rule: <expr>.expected_type <- <var>.actual_type
2. Syntax rule: <expr> -> <var>[1] + <var>[2]
Semantic rule: <expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
Predicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> -> <var>
Semantic rule: <expr>.actual_type <- <var>.actual_type
Predicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> -> A | B | C
Semantic rule: <var>.actual_type <- lookup(<var>.string)
Example – Parse Tree

A = A + B
<assign>
  <var>
    A
  =
  <expr>
    <var>[1]
      A
    +
    <var>[2]
      B
Example – Flow of Attributes

A = A + B
<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
<assign>
  <var> (actual_type)
    A
  =
  <expr> (expected_type, actual_type)
    <var>[1] (actual_type)
      A
    +
    <var>[2] (actual_type)
      B
Example – Calculating Attributes

<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
A = A + B, with A of type float and B of type int:
<assign>
  <var> (actual_type = float)
    A (float)
  =
  <expr> (expected_type = float, actual_type = float)
    <var>[1] (actual_type = float)
      A (float)
    +
    <var>[2] (actual_type = int)
      B (int)
Example – Calculating Attributes

A = A + B, with A of type int and B of type float:
<expr>.expected_type <- <var>.actual_type
<expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : float
<assign>
  <var> (actual_type = int)
    A (int)
  =
  <expr> (expected_type = int, actual_type = float)
    <var>[1] (actual_type = int)
      A (int)
    +
    <var>[2] (actual_type = float)
      B (float)
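The two decorated trees can be reproduced by applying the semantic rules in code. This is a sketch under stated assumptions: the function name check_assignment is hypothetical, types are represented as the strings "int" and "float", and a plain dictionary stands in for lookup(<var>.string).

```python
def check_assignment(target, operands, symbol_table):
    """Decorate <assign> -> <var> = <expr> for <expr> -> <var> | <var> + <var>.

    `symbol_table` maps variable names to "int" or "float" and plays the role
    of lookup(<var>.string). Returns (expected_type, actual_type, predicate_ok).
    """
    lookup = symbol_table.__getitem__
    # Rule 1: <expr>.expected_type <- <var>.actual_type (inherited)
    expected_type = lookup(target)
    if len(operands) == 2:
        # Rule 2: int only if both operands are int, otherwise float
        first, second = (lookup(name) for name in operands)
        actual_type = "int" if first == "int" and second == "int" else "float"
    else:
        # Rule 3: <expr>.actual_type <- <var>.actual_type
        (only,) = operands
        actual_type = lookup(only)
    # Predicate: <expr>.actual_type == <expr>.expected_type
    return expected_type, actual_type, actual_type == expected_type


print(check_assignment("A", ["A", "B"], {"A": "float", "B": "int"}))
print(check_assignment("A", ["A", "B"], {"A": "int", "B": "float"}))
```

With A float and B int the predicate holds. With A int and B float the expression's actual type is float while the expected type is int, so the predicate fails and the assignment would be reported as a static semantic error.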
Attribute Grammars

How are attribute values computed?
  If all attributes were inherited, the tree could be decorated in top-down order.
  If all attributes were synthesized, the tree could be decorated in bottom-up order.
  In many cases, both kinds of attributes are used, and some combination of top-down and bottom-up must be used.
Group Exercise: homework due
Tuesday, February 7, at the start of class

BNF
<cond_expr> -> <expr> ? <expr> : <expr>
<expr> -> <var> | <expr> + <expr>
<var> -> id


Rule
  id's type can be bool, int, or float.
  Operands of + must be numeric and of the same type.
  The type of + is the type of its operands.
  The first operand of ?: must be of type bool, and the second and third must be of the same type.
  The type of ?: is the type of its second and third operands.
Given the above BNF and rule:
1. Define an attribute grammar
2. Draw a decorated parse tree for “id ? id : id + id” assuming that the
first id is of type bool and the rest are of type int.