Chapter 3 updated

advertisement
Chapter 3
Describing
Syntax and Semantics
Chapter 3: Describing Syntax and Semantics
Objectives:
- Introduction
- The General Problem of Describing Syntax
- Formal Methods of Describing Syntax
- Attribute Grammars
- Describing the Meanings of Programs: Dynamic Semantics
2
3.1 Introduction
 A language is a set of sentences or statements.
Sentences or statements are the valid strings of a language.
They consist of valid alphabets sequenced in a way that is
consistent with the grammar of the language.
Who must use language definitions?
-Language designers
- Implementers
- Programmers (the users of the language)
- A formal description of the language is essential in
learning, writing, and implementing the language. Thus
description must be precise and understandable.
3
- Languages are described by their syntaxes and semantics
What is the meaning of Syntax? It means the form or
structure of the expressions, statements and program units.
The Syntax rules of a language specify which strings of
characters from the language’s alphabet are in the language.
What is the meaning of Semantics? It means the
meaning of the expressions, statements, and program
units.
-Syntax describes what the language looks like.
-Semantics determines what a particular construct actually
does in a formal way.
- Syntax is much easier to describe than semantics.
4
3.2 The General Problem of Describing Syntax: Terminology
Formal descriptions of the syntax of programming languages, for
simplicity sake, often do not include descriptions of the lowest-level
syntactic units. These small units are called lexemes.
A lexeme is basic component during lexical analysis. A lexeme
consists of related alphabet from the language. e.g. numeric literals,
operators (+), and special words (begin).
Lexemes are partitioned into groups. For example, the names of
variables, methods, classes, and so forth in a programming language
form a group called identifiers. Each lexeme group is represented by a
name, or a token.
A token of a language is a category of its lexemes, and a lexeme is an
instance of a token. For example, an identifier is a token that can have
lexemes, or instances, such as counter and total (variable names).
5
Example: consider the following Java statement
index = 2 * count +17;
The lexemes and tokens of this statement are represented
in the following table
Lexemes
Tokens
index
identifier
=
equal_sign
2
int_literal
*
mult_op
17
int_literal
;
semicolon
6
In general, languages can be formally defined in two distinct ways:
A language can be (1) generated or (2) recognized.
3.2.1 Language Recognizer
A recognizer of a language identifies those strings that are within
a language from those that are not.
The lexical analyzer and the parser of a compiler are the
recognizer of the language the compiler translates. The lexical
analyzer recognizes tokens, and the parser recognizes the
syntactic structure.
3.2.2 Language Generators
A generator generates the valid sentences of a language.
In some cases it is more useful than the recognizer since we can
“watch and learn”.
7
3.3 Formal Methods of Describing Syntax
The formal language that is used to describe the syntax of
programming languages is called grammar.
3.3.1 BNF (Backus-Naur Form) and Context-Free Grammars
BNF is widely accepted way to describe the syntax of a
programming language.
3.3.1.1 Context-Free Grammars
Regular expression and context-free grammar are useful in
describing the syntax of a programming language. Regular
expression describes how a token is made of alphabets, and
context-free grammar determines how the tokens are put together
as a valid sentence in the grammar.
8
3.3 Formal Methods of Describing Syntax
3.3.1.2 Origins of Backus- Naur From (BNF)
John Backus and Peter Naur used BNF to describe ALGOL 58
and ALGOL 60, and BNF is nearly identical to context-free
grammar.
3.3.1.3 BNF Fundamentals
•Metalanguage is a language that is used to describe another
language. BNF is a metalanguage for programming languages.
• The BNF consists of rules (or productions). A rule has a lefthand side (LHS) as the abstraction, and a right-hand side (RHS)
as the definition. As in java assignment statement the definition
could be as follows:
<assign> → <var> = <expression>
9
•The abstractions in a BNF description, or grammar, are often called
nonterminal symbols, or simply nonterminals.
•Lexemes and tokens of the rules are called terminal symbols, or
simply terminals.
•A rule has a left-hand side (LHS), which is a nonterminal, and a
right-hand side (RHS), which is a string of terminals and/or
nonterminals i.e. mixture of tokens, lexemes, and references to other
abstractions.
•Nonterminals are often enclosed in angle brackets < >
•A BNF description, or grammar, is a collection of rules.
The rule indicates that whenever you see a nonterminal on the LHS,
you can replace it with the RHS, just like expanding a non-terminal
into its children in a tree.
10
An abstraction (or nonterminal symbol) can have more than one
RHS. e.g.
<if_stmt> → if ( <logic_expr> ) <stmt>
| if ( <logic_expr> ) <stmt> else <stmt>
3.3.1.4 Describing Lists
Recursion is used in BNF to describe lists. e.g.
<ident_list> → ident
| ident, <ident_list>
11
3.3.1.5 Grammars and Derivations
-BNF is a generator of the language. The sentences of a language can
be generated from the start symbol by applying a series of rules on it.
- A derivation is a repeated application of rules, starting with the start
symbol and ending with a sentence (all terminal symbols)
-Replacing a non-terminal with different RHS’s may derive different
sentences.
-Each string of symbols during a derivation is a sentential form. A
sentence is a sentential form that has only terminal symbols
-If we replace every leftmost non-terminal of the sentential form, the
derivation is leftmost. However, the set of sentences generated are not
affected by the derivation order.
12
Example (3.1) a grammar for small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt>; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
A derivation of a program in this language follows:
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression>; <stmt_list> end
=> begin A= <expression>; <stmt_list> end
=> begin A = <var> + < var> ; <stmt_list> end
=> begin A = B+ < var> ; <stmt_list> end
=> begin A = B + C; <stmt_list> end
=> begin A = B + C; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
13
Example (3.2) a grammar for a Simple Assignment Statements
<assign> → <id> = <expr>
<id> → A | B | C |
<expr> → <id> + <expr>
| <id>*<expr>
| (<expr>)
| <id>
A = B*(A+C) is generated by the leftmost derivation:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * (<expr>)
=> A = B * (<id> + <expr>)
=> A = B * (A + <expr>)
=> A = B * (A + <id>)
=> A = B * (A + C)
14
3.3.1.6 Parse Trees
A derivation can be represented by a tree hierarchy called parse
tree. The root of the tree is the start symbol and applying a
derivation rule corresponds to expand a non-terminal in a tree into
its children.
A parse tree for the simple statement: <assign>
A = B * (A + C)
<id>
=
<expr>
<id>
*
A
(
<expr>
<expr>
)
B
<id>
+
<expr>
<id>
A
C
15
3.3.1.7 Ambiguity
A grammar is ambiguous if for a given sentence, there is more than
one parse tree, i.e., there are two derivations that lead to the same
sentence.
Two distinct parse trees (generated by the grammar on the next
slide) for the same sentence, A = B +C * A
<assign>
<id>
A
=
<assign>
<expr>
<expr>
<id>
B
+
<expr>
<id>
C
<id>
<expr>
*
A
<expr>
<id>
A
<expr>
<id>
<expr>
=
<expr>
+
*
<expr>
<expr>
<id>
<id>
A
B
C
16
An Ambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
17
3.3.1.8 Operator Precedence
Operator Precedence can be maintained by modifying a grammar
so that operators with higher precedence are grouped earlier with
its operands, so that they appear lower in the parse tree.
If we use the parse tree to indicate precedence levels of the
operators, we cannot have ambiguity
<expr> → <expr> - <term> | <term>
<term> → <term> / const | const
18
An Unambiguous Grammar
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
19
3.3.1.9 Associativity of Operators
Operator associativity can also be indicated by a grammar.
<expr> => <expr> + <expr> | const
<expr> => <expr> + const | const
(ambiguous)
(unambiguous)
<expr>
<expr>
<expr>
+
+
const
const
const
20
3.3.2 Extended BNF
- EBNF has the same expression power as BNF
- The addition includes optional constructs (parts), repetition, and
multiple choices, very much like regular expression.
- Optional parts are placed in brackets ([ ])
<if_stmt> → if (<expression>) <statement> [else <statement>]
- Alternative parts of RHSs in parentheses and separate them with
vertical bars
<term> → <term> (* | / | %) <factor>
- Put repetitions (0 or more) are placed inside braces ({})
<ident_list> → <identifier> {, <identifier>}
21
3.3.2 Extended BNF
BNF
<expr>  <expr> + <term>
| <expr> - <term>
| <term>
<term>  <term> * <factor>
| <term> / <factor>
| <factor>
EBNF
<expr>  <term> {(+ | -) <term>}
<term>  <factor> {(* | /) <factor>}
22
3.4 Attribute Grammars
An attribute grammar is an extension to a context –free grammar . It
is used to describe more of the structure of a programming language
than can be described with context-free grammar.
The extension allows certain language rules to be described, such as
type compatibility.
3.4.1 Static Semantics
- There are many restrictions on programming languages that are
either difficult or impossible to describe in BNF, however, they can be
described when we add attributes to the terminal/non-terminals in
BNF.
- These added attribute and their computation could be computed at
compile-time, thus the name static semantics.
23
3.4 Attribute Grammars
3.4.1 Static Semantics (contd.)
- Consider a rule which states that a variable must be declared
before it is referenced.
- Cannot be specified in a context-free grammar.
-Can be tested at compile time.
- Some rules can be specified in the grammar of a language, but
will unnecessarily complicate the grammar.
e.g. a rule in JAVA that states that a string literal cannot be
assigned to a variable which was declared to be type int.
24
3.4 Attribute Grammars
3.4.2 Basic concepts
An attribute grammar consists of, in addition to its BNF rules, the
attributes of grammar symbols, a set of attribute computation
functions (or semantic functions), and predicate functions that
determine whether a rule could be applied. The latter two functions
are associated with the grammar rules. Thus basic concepts are:
- Attribute grammars are grammars to which have been added
attributes, attribute computation functions and predicate functions.
25
- Attributes  associate with grammar symbols, are similar
to variables in the sense that they can have values assigned
to them.
- Attribute computation functions  semantic functions,
are associated with grammar rules. They are used to specify
how attribute values are computed.
- Predicate function  which state some of the syntax and
static semantic rules of the language are associated with
grammar rule.
26
3.4. 3 Attribute grammars defined
Definition: An attribute grammar is a context-free grammar
(CFG) with the following additions:
- For each grammar symbol X there is a set A(X) of attribute
values.
- Each rule has a set of functions that define certain attributes
of the non-terminals in the rule.
- Each rule has a set of predicates to check for attribute
consistency.
-Associate with each grammar symbol X is a set of attributes
A(X).
- A(X) consists of two disjoint sets S(X) and I(X), called
synthesized and inherited attributes
27
Attribute Grammars (continued)
- Synthesized attributes are used to pass semantic information up a
parse tree.
- Inherited attributes pass semantic information down and across a
parse tree.
- Let X0  X1 … Xn be a rule
- Function of the form S(X0) = f (A(X1), … , A (Xn)) define synthesized
attributes of X0. Their values depend only on children nodes.
- Functions of the form I(Xj) = f(A(X0), ... , A (Xn)),
for 1 <= j <= n, define inherited attributes of Xj. Their values depend
on parent node or sibling nodes.
Note that, to avoid circularity, inherited attributes are often restricted to
functions of the form
I(Xj) = f(A(X0), … , A(X(j-1)))
This form prevents an inherited attribute from depending on itself or
on attributes to the right in the parse tree.
28
3.4.4 Intrinsic attributes
 They are synthesized attributes of leaf nodes
whose values are determined outside the parse tree.
 For example, the type of an instance of a variable
in a program could come from the symbol table,
which is used to store variable names and their
types.
29
3.4.5 Examples of Attribute Grammars
 Example: expressions of the form expr = id + id
 id's can be either int_type or real_type.
 When they are the same, the expression type is that of the operands (id's).
 When the operand types (id's) are not the same, the type of the expression
is always real.
 Type of the expression must match it's expected type.
 BNF
 <assign>  <var> = <expr>
 <expr>  <var> + <var>
| <var>
 <var>  A | B | C
• Attributes:
Actual_type: A synthesized attribute
associated with the non-terminals
for <var> and <expr>
Expected_type: An inherited
attribute associated with the
non-terminal <expr>
30
An attribute grammar for simple assignment statements
1.
Syntax rule: <assign>  <var> = <expr>
Semantic rule: <expr>.expected_type  <var>.actual_type
2.
Syntax rule: <expr>  <var>[2] + <var>[3]
Semantic rule: <expr>.actual_type 
if (<var>[2].actual_type =int) and
(<var>[3].actual_type =int)
then int
else real
end if
Predicate: <expr>.actual_type == <expr>.expected_type
31
3.
4.
Syntax rule: <expr>  <var>
Semantic rule: <expr>.actual_type  <var>.actual_type
Predicate: <expr>.actual_type == <expr>.expected_type
Syntax rule: <var>  A | B | C
Semantic rule: <var>.actual_type  look-up (<var>.string)
The look-up function looks up a given name in symbol table and
returns the variable’s type.
•A parse tree of the sentence A = A + B
generated by the grammar in previous
Example.
32
3.4.6 Computing Attribute Values
1.
2.
3.
4.
5.
<var>.actual_type ← look-up(A) (Rule 4)
<expr>.expected_type ← <var>.actual_type (Rule 1)
<var>[2].actual_type ← look-up(A) (Rule 4) <var>[3].actual_type
← look-up(B) (Rule 4)
<expr>.actual_type ← either int or real (Rule 2)
<expr>.expected_type == <expr>.actual_type is either TRUE or
FALSE (Rule 2)
Figure 3.7: The flow of attributes
in the tree
33
Computing Attribute Values Conts.
Figure 3.8: a fully attributed parse tree
34
3.5 Describing the meaning of Programs: Dynamic
Semantics
- The dynamic semantics of a program is the meaning of its
expressions, statements, and program units. Dynamic semantic
determines the meanings of programming constructs during their
execution.
- There is no single widely accepted notation or formalism for
describing semantics.
- Several needs for a methodology and notation for semantics:
- Programmers need to know what statements mean
- Compiler writers must know exactly what language
constructs do
- Three methods that are used to describe semantics formally:
- Operational semantics
- Axiomatic semantics
-Denotational semantics
35
3.5.1 Operational Semantics
 The idea behind operational semantics is to describe
the meaning of a statement or program by specifying
the effects of running it on a machine.
C Statement
Operational Semantics
36
3.5.2 Denotational Semantics
 It is the most rigorous, widely known method for describing the
meaning of programs
 Based on recursive function theory
 The most abstract semantics description method
 Originally developed by Scott and Strachey (1970)
 The process of building a denotational specification for a
language (not necessarily easy):
• Define a mathematical object for each language entity
• Define a function that maps instances of the language
entities onto instances of the corresponding mathematical
objects
 The method is named denotational because the mathematical
objects denote the meaning of their corresponding syntactic
entities.
37
The difference between denotational and operational semantics:
• In operational semantics, programming language constructs are
translated into simpler programming language constructs.
• In denotational semantics, programming language constructs are
mapped to mathematical objects
38
3.5.2.1 Simple Example
Decimal Numbers
<dec_num> 
'0' | '1' | '2' | '3' | '4' | '5' |
'6' | '7' | '8' | '9'
| <dec_num> ('0' | '1' | '2' | '3' |
'4' | '5' | '6' | '7' |
'8' | '9')
Mdec('0') = 0,
Mdec ('1') = 1, …,
Mdec ('9') = 9
Mdec (<dec_num> '0') = 10 * Mdec (<dec_num>)
Mdec (<dec_num> '1’) = 10 * Mdec (<dec_num>) + 1
…
Mdec (<dec_num> '9') = 10 * Mdec (<dec_num>) + 9
39
3.5.2.2 The state of a program
 The state of a program is the values of all its current variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}
 Let VARMAP be a function that, when given a variable name and
a state, returns the current value of the variable
VARMAP(ij, s) = vj
40
3.5.3 Axiomatic Semantics
Axiomatic semantics is based on mathematical logic.
Axiomatic
semantics
defined
in
conjunction
with
the
development of a method to prove the correctness of a program
rather than directly specifying the meaning of a program
 Each statement of a program is both preceded and followed by
a logical expression that specifies constraints on program
variables.
 Simple Boolean expressions are adequate to express constraints
41
3.5.3.1 Assertions
 The logical expressions are called predicates, or assertions.
 An assertion immediately preceding a program statement
describes the constraints on the program variables at that
point in the program (precondition ).
 An assertion immediately following a statement describes
the new constraints on those variables (and possibly others)
after execution of the statement (postcondition).
 Precondition and postcondition assertions are presented in
braces to distinguish them from parts of program
statements  {x > 10} sum = 2 * x + 1 {sum > 1}
 Developing an axiomatic description or proof of a given
program requires that every statement in the program have
both a precondition and a post condition.
42
Example : Show that the program segment
y=2
{x = 1} z = x + y {z = 3}
 is correct with respect to the precondition {P}: x = 1 and the
postcondition {Q}: z = 3.
Solution:
 Suppose that p is true, so that x=1 as the program begins.
 Then y is assigned 2
 and z is assigned the sum of x and y, which is 3.
 Hence program segment is correct..
x
1
y
2
z
1 + 2 = 3 , thus: {P} statement {Q} is true.
43
Summary
 BNF and context-free grammars are equivalent meta-
languages
 Well-suited for describing the syntax of programming
languages
 An attribute grammar is a descriptive formalism that can
describe both the syntax and the semantics of a language
 Three primary methods of semantics description
 Operational, axiomatic, denotational
44
Download