Programming Languages
2nd edition
Tucker and Noonan
Chapter 7
Semantics
Surely all this is not without meaning.
Ishmael, Moby Dick by Herman Melville
Copyright © 2006 The McGraw-Hill Companies, Inc.
7.1 Motivation
7.2 Expression Semantics
7.3 Program State
7.4 Assignment Semantics
7.5 Control Flow Semantics
Semantics is a precise definition of the meaning of a program that is syntactically correct and valid according to the type rules.
Semantics describes run-time behavior
A programming language is complete only when its syntax, type system, and semantics are well-defined.
To provide an authoritative definition of the meaning of all language constructs for:
1. Programmers
2. Compiler writers
3. Standards developers
A program written in a given language must always produce the same results when given the same data, regardless of the compiler or computer.
• Operational semantics: what happens when a program is run on a machine, i.e., how statements change the machine’s state.
– Early languages were designed for a specific computer, e.g., Fortran on the IBM 709; today, semantics are general.
• Axiomatic semantics (Chapter 18)
– Abstract, based on formal logic
• Denotational semantics: statements as program state-transforming functions (Chapter 8)
– Rigorous, mathematical; defines functions that specify exactly how state is affected by each language object.
Notation (infix, Polish, etc.)
Precedence/associativity rules (if not completely specified in the grammar)
Special issues: short-circuit evaluation, representational issues, subexpression evaluation and side effects
• Infix : a binary operator is between its two operands.
• What is the meaning of a + b - c * d ?
– Is it (((a + b) - c) * d) ?
– Or (a + (b - c) * d) ? Or …?
• Without knowledge of the language, the expression is ambiguous
• Languages that use infix may also use associativity and precedence rules to disambiguate.
– Parentheses can be used to override normal evaluation order to achieve the desired meaning.
• Infix : (a + b) - (c * d)
• Polish Prefix : - + a b * c d
• Polish Postfix : a b + c d * -
• Cambridge Polish : (- (+ a b) (* c d))
Identify the subexpressions.
• Completely specifying evaluation order leads to large grammars; one alternative: use an unambiguous notation
– Polish prefix: a binary operator precedes its operands
Polish Prefix : - + a b * c d
– Polish postfix: a binary operator follows its operands
Polish Postfix : a b + c d * -
• Preorder and postorder traversals of the parse tree generate the Polish notations.
Pre-order: visit the root, visit the left subtree, visit the right subtree
Post-order: visit the left subtree, visit the right subtree, visit the root
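The two traversals can be sketched in a few lines of Python. This is only an illustration: the `Node` class and the use of plain strings for leaves are assumptions made for the example, not part of the textbook's presentation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """A binary operator node (hypothetical representation); leaves are plain strings."""
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[Node, str]


def preorder(t: Tree) -> list:
    """Visit the root, then the left subtree, then the right subtree."""
    if isinstance(t, str):
        return [t]
    return [t.op] + preorder(t.left) + preorder(t.right)


def postorder(t: Tree) -> list:
    """Visit the left subtree, then the right subtree, then the root."""
    if isinstance(t, str):
        return [t]
    return postorder(t.left) + postorder(t.right) + [t.op]


# Parse tree for the infix expression (a + b) - (c * d)
tree = Node("-", Node("+", "a", "b"), Node("*", "c", "d"))

print(" ".join(preorder(tree)))   # - + a b * c d
print(" ".join(postorder(tree)))  # a b + c d * -
```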
• If all operators are binary, this notation eliminates the need for parentheses, precedence rules, or associativity rules
– a*b*c : * * a b c (left associativity) or * a * b c (right)
– a*b*c : a b c * * or a b * c * (which is left associativity?)
• Flaw: what about unary operators?
- a + b * c (+-a*bc won’t work)
One solution: Use a different symbol (e.g. ‘~’) for the unary operator
• Polish prefix and postfix fail if operators aren’t binary
- a + b * c (+-a*bc won’t work)
• Solution 1: use different symbol for unary operators
• Solution 2: Cambridge prefix - operators can be n-ary:
Rewrite a + b - c * d as (- (+ a b) (* c d))
Rewrite a + b + c + d as (+ a b c d)
Rewrite -a + b * c as (+ (- a) (* b c))
Lisp and Scheme use Cambridge prefix.
Operators precede operands, fully parenthesized.
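A small evaluator shows why Cambridge prefix needs no precedence or associativity rules, and how one symbol can serve as both unary and binary minus. This is a sketch only: the nested-tuple encoding and the `evaluate` function are assumptions for illustration, not how Lisp or Scheme is implemented.

```python
import math


def evaluate(expr, env):
    """Evaluate a Cambridge-prefix expression encoded as nested tuples (assumed encoding)."""
    if isinstance(expr, str):          # a variable: look it up
        return env[expr]
    op, *args = expr                   # operator first, then any number of operands
    values = [evaluate(a, env) for a in args]
    if op == "+":
        return sum(values)
    if op == "*":
        return math.prod(values)
    if op == "-":
        # One operand means negation; otherwise subtract the rest from the first.
        return -values[0] if len(values) == 1 else values[0] - sum(values[1:])
    raise ValueError(f"unknown operator {op!r}")


env = {"a": 1, "b": 2, "c": 3, "d": 4}
print(evaluate(("-", ("+", "a", "b"), ("*", "c", "d")), env))  # -9
print(evaluate(("+", "a", "b", "c", "d"), env))                # 10
print(evaluate(("+", ("-", "a"), ("*", "b", "c")), env))       # 5
```

The parentheses delimit each operator's operand list, so the number of operands is never in doubt.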
Language    + - * /    Unary -    **     == != < ...
C-like      L          R                 L
Ada         L          non        non    non
Fortran     L          R          R      L
In some languages the meaning of a < b < c is not the same as the meaning of (a < b) && (b < c).
This argues for non-associativity of relational operators.
Operators (larger number = higher precedence)
                 C-like    Ada    Fortran
Unary -          7         3      3
**                         5      5
* /              6         4      4
+ -              5         3      3
== !=            3         2      2
< <= ...         4         2      2
not (!)          7         2      2
log. and         2         1      1
log. or          1         1      1
Smalltalk: left-to-right evaluation, uniform precedence; left associative
APL: right-to-left evaluation, uniform precedence, right associative
APL example:
The following expression sorts a word list stored in matrix X according to word length:
X[ ⍋ X+.≠' ';]
Summary
Grammars can capture some semantic features, notably operator precedence and associativity
To simplify the grammar, languages with many operators often write an ambiguous grammar and use tables such as those in the previous examples during translation.
Short-circuit evaluation starts evaluating from the left and stops when the expression value is known
So a && b is evaluated as: if a then b else false
Or: if (!a) false else b
Likewise a || b is evaluated as: if a then true else b
One reason for short-circuit evaluation: it is slightly more efficient.
Another reason: allows for situations where the second operand might be undefined.
Node p = head;
while (p != null && p.info != key)
    p = p.next;
if (p == null) // not in list
    ...
else // found it
    ...
// without short-circuiting, what happens if p is null?
Without short-circuit evaluation:
boolean found = false;
while (p != null && !found) {
    if (p.info == key)
        found = true;
    else
        p = p.next;
}
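The list search can be tried out in Python, whose `and` operator also short-circuits. The sketch below is illustrative: `strict_and` is a hypothetical helper that forces both operands to be evaluated, standing in for a language without short-circuit evaluation.

```python
class Node:
    """A singly linked list cell, mirroring the info/next fields of the example."""
    def __init__(self, info, next=None):
        self.info = info
        self.next = next


def find(head, key):
    p = head
    # `and` short-circuits: p.info is read only when p is not None.
    while p is not None and p.info != key:
        p = p.next
    return p


def strict_and(a, b):
    """Hypothetical non-short-circuit 'and': both arguments are evaluated before the call."""
    return a and b


def find_strict(head, key):
    p = head
    # p.info is evaluated even when p is None, so a failed search raises AttributeError.
    while strict_and(p is not None, p.info != key):
        p = p.next
    return p


head = Node(1, Node(2, Node(3)))
print(find(head, 2).info)  # 2
print(find(head, 9))       # None
```

Calling `find_strict(head, 9)` runs off the end of the list and fails on `p.info`, which is the situation the `found` flag version avoids.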
If an expression has been statically type checked its meaning should depend only on sub-expression values and operator meaning.
But …
• Number representation (finite-ness)
• Side effects
In computers, unlike the real world, numbers are finite: there is a largest and a smallest.
Associativity does not always hold.
Is (a + b) + c = a + (b + c) ?
What if a = IntMax , b = 3 , c = -5 ?
See example for floats in a previous lecture.
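The integer case can be checked with a short Python sketch. Python integers are unbounded, so the 32-bit limits and the checking `add32` function below are assumptions used to model a fixed-width machine integer. The float comparison at the end uses ordinary double-precision arithmetic.

```python
INT_MAX = 2**31 - 1   # assumed 32-bit signed range
INT_MIN = -2**31


def add32(x, y):
    """Add two ints, raising OverflowError if the result does not fit in 32 bits."""
    result = x + y
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError(f"{x} + {y} does not fit in 32 bits")
    return result


a, b, c = INT_MAX, 3, -5

# a + (b + c): the inner sum is -2, so the result stays in range.
right = add32(a, add32(b, c))

# (a + b) + c: the inner sum already exceeds INT_MAX.
try:
    left = add32(add32(a, b), c)
except OverflowError:
    left = None

print(right == INT_MAX - 2)  # True
print(left)                  # None

# Floating point: rounding makes the two groupings differ.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```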
Side Effects – of Functions or Expressions
“A function or expression is said to have a side effect if, in addition to producing a value, it also modifies some state or has an observable interaction with calling functions or the outside world.
For example, a function might modify a global or a static variable , modify one of its arguments, raise an exception, write data to a display or file, read data, or call other side-effecting functions.” http://en.wikipedia.org/wiki/Side_effect_(computer_science)
Side Effects – of Functions or Expressions
• Side effects are not necessarily bad
– They are how void functions accomplish their job
• However, sometimes they have unexpected consequences, particularly if there are language issues that are semantically undefined.
What is the value of a after
    a = b * i++ + c * i;
when i = 2; b = 2; c = 5; ?
Order of evaluation for subexpressions is not defined in most languages: (b * i++) may be evaluated before or after (c * i), giving 19 or 14.
In C and C++, the value of a is semantically undefined.
Or what about the expression y + f(x, y) if f modifies the value of y?
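The two possible results can be worked out by simulating each evaluation order. Python has no `++` operator, so the `post_increment` helper and the dictionary used as program state are assumptions made for this sketch.

```python
def post_increment(state, name):
    """Model of i++: return the old value, then store old + 1 (hypothetical helper)."""
    old = state[name]
    state[name] = old + 1
    return old


def left_operand_first():
    s = {"i": 2, "b": 2, "c": 5}
    left = s["b"] * post_increment(s, "i")   # 2 * 2 = 4, and i becomes 3
    right = s["c"] * s["i"]                  # 5 * 3 = 15
    return left + right


def right_operand_first():
    s = {"i": 2, "b": 2, "c": 5}
    right = s["c"] * s["i"]                  # 5 * 2 = 10
    left = s["b"] * post_increment(s, "i")   # 2 * 2 = 4, and i becomes 3
    return left + right


print(left_operand_first())   # 19
print(right_operand_first())  # 14
```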
“The state of a program is the binding of all active [alive] objects to their current values.” (page 160)
State-related maps:
1. The pairing of active objects with specific memory locations ( environment )
2. The pairing of active memory locations with their current values ( memory map )
Given a program with two variables, i and j, currently associated with memory addresses 154 and 155 and having values 13 and -1:
environment = {<i, 154>, <j, 155>}
memory = {<0, undef>, …, <154, 13>, <155, -1>, …}
program state = memory × environment = {<i, 13>, <j, -1>}
The individual steps that occur during a program run can be viewed as a series of state transformations.
For example, in the previous case, executing statement i = i + 2 * j produces this state: {<i, 11>, <j, -1>}
A program trace (similar to what could be provided by a debugger) shows changes in program state.
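The two maps and one state transformation can be modeled with Python dictionaries. This is a sketch under stated assumptions: the `program_state` function is a hypothetical name, and only the two addresses from the example are stored in `memory`.

```python
# Environment: variable name -> memory address
environment = {"i": 154, "j": 155}
# Memory map: address -> current value (only the addresses in use are shown)
memory = {154: 13, 155: -1}


def program_state(environment, memory):
    """Compose the two maps: variable name -> current value."""
    return {name: memory[address] for name, address in environment.items()}


before = program_state(environment, memory)
print(before)  # {'i': 13, 'j': -1}

# Execute i = i + 2 * j: evaluate in the current state, then update memory.
memory[environment["i"]] = before["i"] + 2 * before["j"]

after = program_state(environment, memory)
print(after)   # {'i': 11, 'j': -1}
```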
// compute the factorial of n
1 void main ( ) {
2 int n, i, f;
3 n = 3;
4 i = 1;
5 f = 1;
6 while (i < n) {
7 i = i + 1;
8 f = f * i;
9 }
10 }

Trace of the program state after each statement:

After line    n        i        f
2             undef    undef    undef
3             3        undef    undef
4             3        1        undef
5             3        1        1
6             3        1        1      (no change)
7             3        2        1
8             3        2        2
(In C/C++, static and global variables are initialized to zero according to their types, but local variables are left uninitialized; other languages differ.)
Loop test: no side effects
Assignment statements: have side effects; the program state is altered.
Order of sub-expression evaluation will be an issue if the sub-expressions have side effects.
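A complete trace of the factorial program can be produced by recording the state after every step. The sketch below hand-translates the program into Python; the `trace_factorial` function, the `UNDEF` marker, and the (line, state) record format are assumptions for illustration, not a general tracing tool.

```python
UNDEF = "undef"


def trace_factorial(n_value):
    """Run the factorial program, recording (line number, state) after each step."""
    state = {"n": UNDEF, "i": UNDEF, "f": UNDEF}
    trace = [(2, dict(state))]            # after the declarations on line 2

    def step(line, var=None, value=None):
        if var is not None:               # an assignment changes the state
            state[var] = value
        trace.append((line, dict(state))) # a loop test records it unchanged

    step(3, "n", n_value)
    step(4, "i", 1)
    step(5, "f", 1)
    while True:
        step(6)                           # loop test: no side effects
        if not state["i"] < state["n"]:
            break
        step(7, "i", state["i"] + 1)
        step(8, "f", state["f"] * state["i"])
    return trace


for line, state in trace_factorial(3):
    print(line, state["n"], state["i"], state["f"])
```

For n = 3 the final recorded state is n = 3, i = 3, f = 6.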
Abstract syntax for Clite assignment statement
Assignment = Variable target : Expression source
Semantic interpretation :
• Evaluate the source expression, using current state
• Get value
• Replace value of target variable with value, getting a new state.
Meaning rules map states to states
Meaning Rule 8.3 (page 202) The meaning of an assignment statement is the result of replacing the value of the target Variable by the value of the source Expression in the current state.
Assignment = Variable target : Expression source
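M: Assignment × State → State
M(Assignment a, State state) = state Ū {<a.target, M(a.source, state)>}
where Ū is the overriding union: the pair for a.target replaces any existing pair for that variable in state.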
Loose interpretation: the meaning of an assignment a given the state state is the state in which the only change is that the target variable of a has a new value which is defined as the meaning of the source expression.
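The meaning function for assignment can be sketched directly in Python, with a state represented as a dictionary. The function names and the tiny expression encoding (integer literals, variable names, and `(op, left, right)` tuples for `+` and `*`) are assumptions made for this example; Clite's actual abstract syntax is richer.

```python
def overriding_union(state, pairs):
    """Return a new state in which `pairs` replace the same-named pairs of `state`."""
    return {**state, **pairs}


def meaning_expression(expr, state):
    """Meaning of an expression in the assumed encoding: its value in `state`."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return state[expr]
    op, left, right = expr
    lhs = meaning_expression(left, state)
    rhs = meaning_expression(right, state)
    if op == "+":
        return lhs + rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unknown operator {op!r}")


def meaning_assignment(target, source, state):
    """M(Assignment a, State state): state overridden by <target, M(source, state)>."""
    return overriding_union(state, {target: meaning_expression(source, state)})


state = {"i": 13, "j": -1}
new_state = meaning_assignment("i", ("+", "i", ("*", 2, "j")), state)
print(new_state)  # {'i': 11, 'j': -1}
print(state)      # {'i': 13, 'j': -1}  (the input state is not modified)
```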
Issues
• Multiple assignment
• Assignment statement vs. expression
• Copy vs. reference semantics
Example: a = b = c = 0;
Sets all 3 variables to zero.
a = b = c;
Associativity???
In most languages, assignment is a statement; it cannot appear in an expression.
In most C-like languages, assignment is an expression.
• while (*p++ = *q++) ; // strcpy
• while (ch = getc(fptr)) ... // ???
• while (p = p->next) ... // ???
Expression value: the value assigned to the target var.
if (a = 0) ...
// an error ???
The meaning (semantics) of an assignment statement is not the same as the meaning of an assignment expression.
• Copy: a = b;
– a, b have same value.
– Changes to either have no effect on other.
– Used in imperative languages.
• Reference
– a, b point to the same object.
– A change in object state affects both
– Used by many object-oriented languages.
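The difference can be observed in Python, where assignment binds a name to an existing object (reference semantics) and a copy must be requested explicitly. The variable names below are chosen only for this illustration.

```python
import copy

b = [1, 2, 3]

a_ref = b              # reference semantics: a_ref and b name the same object
a_copy = copy.copy(b)  # copy semantics: a_copy is a separate object with equal contents

b.append(4)            # change the state of the object that b refers to

print(a_ref)           # [1, 2, 3, 4]  (the change is visible through a_ref)
print(a_copy)          # [1, 2, 3]     (the copy is unaffected)
print(a_ref is b)      # True
print(a_copy is b)     # False
```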
Textbook definition: A programming language is Turing complete if its programs are capable of computing any computable function.
Or… if it can compute the same functions that a Turing machine can compute
Turing machine = a simple, theoretical model of a computer; can recognize all languages described by any grammar (including the unrestricted grammars)
Importance: in theoretical CS – decidability issues
To be Turing complete, an imperative language needs:
• Statement sequencing
• Conditional statement
• Looping statement
Bohm-Jacopini Theorem
s1 s2
Semantics: in the absence of a branching statement,
• First execute s1
• Then execute s2
• Output state of s1 becomes the input state of s2
Branching statements: return, break, continue, goto, …
Conditional
IfStatement
if ( Expression ) Statement
[ else Statement ]
Example: if (a > b) z = a; else z = b;
Meaning: If the test expression is true, then the output state of the conditional is the output state of the then branch, else the output state of the conditional is the output state of the else branch.
Conditional
Switch (Case) Statement
Originated in Fortran as a computed goto: goto (100, 200, 300, 400), i
Some issues for a switch statement:
• Data type of switch variable – some integral type
• Whether or not to require a break after each case
– Use of implicit break (Ada) is less error prone
• Whether or not to allow a range of values
WhileStatement
while ( Expression ) Statement
The expression is evaluated.
If it is true, first the statement is executed, and then the loop statement is executed again.
Otherwise the loop terminates.
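The informal rules for sequencing, conditionals, and while loops can be written as state-to-state functions. This is a minimal sketch: all function names are assumptions, states are dictionaries, and the recursive loop follows the wording above rather than any particular implementation.

```python
def assign(target, expr):
    """Assignment as a state-to-state function; the input state is not mutated."""
    return lambda state: {**state, target: expr(state)}


def meaning_sequence(s1, s2):
    """The output state of s1 becomes the input state of s2."""
    return lambda state: s2(s1(state))


def meaning_if(test, then_branch, else_branch):
    """The output state is that of whichever branch the test selects."""
    return lambda state: then_branch(state) if test(state) else else_branch(state)


def meaning_while(test, body):
    def loop(state):
        # If the test is true, execute the body, then execute the loop again.
        if test(state):
            return loop(body(state))
        return state                      # otherwise the loop terminates
    return loop


# while (i < n) { i = i + 1; f = f * i; }
body = meaning_sequence(assign("i", lambda s: s["i"] + 1),
                        assign("f", lambda s: s["f"] * s["i"]))
factorial_loop = meaning_while(lambda s: s["i"] < s["n"], body)
print(factorial_loop({"n": 3, "i": 1, "f": 1}))  # {'n': 3, 'i': 3, 'f': 6}

# if (a > b) z = a; else z = b;
largest = meaning_if(lambda s: s["a"] > s["b"],
                     assign("z", lambda s: s["a"]),
                     assign("z", lambda s: s["b"]))
print(largest({"a": 2, "b": 5}))  # {'a': 2, 'b': 5, 'z': 5}
```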
• Begin-test loop (e.g. while )
• End-test loop (e.g. do_while )
• Counting loop (e.g. for )
• Iterator loops (e.g., for-each )
– An iterator is any finite set of values over which a loop can be repeated
– foreach $number (@number) {
      $number = $number + 1;
  }
• foreach loops exist in several languages (e.g., Ada, C#)
• Simple form of a for loop in which there is no explicit loop control variable (LCV), no requirement to specify increment
• The assumption is that every item in the collection will be processed in order.
– Another Perl example
– foreach $key (sort keys %list) {
      print $key, "\t", $list{$key}, "\n";
  }
return: exit from a function
break: non-standard exit from a loop
continue: non-standard exit from current loop iteration
goto: transfer of control to any named program statement
Structured Programming: In part, advocates program design based on sequence, conditional, while loops only – no goto.
      j = 0
      ct = 0
      sum = 0.0
100   if (j .ge. n) goto 300
      j = j + 1
      if (a(j) .lt. 0) goto 100
      ct = ct + 1
      sum = sum + a(j)
      goto 100
300   continue
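For comparison, the same computation (counting and summing the non-negative elements of an array) can be written with only a loop and a conditional. The Python sketch below is one possible structured rewrite; the function name is an assumption.

```python
def count_and_sum_nonnegative(a):
    """Structured equivalent of the goto loop: one loop, one conditional, no jumps."""
    ct = 0
    total = 0.0
    for x in a:           # replaces the j counter, the exit test, and 'goto 100'
        if x >= 0:        # replaces 'if (a(j) .lt. 0) goto 100'
            ct += 1
            total += x
    return ct, total


print(count_and_sum_nonnegative([1, -2, 3.5]))  # (2, 4.5)
```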
Semantics = program meaning, i.e., how the program’s state changes as it executes
This chapter presents an informal discussion of expressions, assignments, conditionals, branching, and looping statements (skipped I/O and exceptions)
Chapter 8 shows how the semantics of a language can be defined using a set of meaning functions; a formal statement of the semantics of various program constructs.
Examples of Meaning Functions for Clite
The meaning of a Program is … the meaning of its body when given an initial state consisting of the variables of the decpart , each initialized to the undef value corresponding to its type.
The meaning of an assignment statement is the result of replacing the value of the target Variable by the value of the source Expression in the current state.
• Background Concepts
– open , close
• open binds a file to a file id, sets up tables, etc.
– Access: sequential vs. random
• Sequential access reflects order of data in file; random access allows elements to be accessed out of order, multiple times
– Stream vs. fixed length records
• Data is viewed as either a stream of characters or a series of fixed length records
• Used to represent keyboard, screen display, and a standard error file to store run-time errors
– Unix: stdin, stdout, stderr
– C: stdin, stdout, stderr
– C++: cin, cout, cerr
– Java: System.in, System.out, System.err
Unformatted I/O corresponds to what C++ and Java refer to as streams, and Fortran calls list-directed .
Characters are read or written as a stream; input routines group them into tokens and convert to an internal data type based on the next list item
C++ Example : cin >> x >> y >> p;
Fortran Example : READ(*,*) x, y, p
• Fortran code to read data into an array using list-directed I/O:
      integer :: i, a(8)
      write(*,*) "Enter 8 integers: "
      read(*,*) a
      write(*,*) a
• Java stream types for sequential I/O:
– file: transfers data to or from a file
– pipe: transfers data to another stream
– memory: transfers data between an array and the program
Requires the programmer to write descriptions of the input/output format
C: fscanf(input, "%d", &a[i]); hybrid; permits arbitrary number of blanks
Fortran syntax for formatted reads and writes:
READ (file-unit, format-expr) var-list
WRITE (file-unit, format-expr) var-list
Fortran doesn’t automatically skip blanks
WRITE (*, 120) a
120 FORMAT ('Contents of array a: ', 8I5)
Position 1 in an output format statement controls line feeds (carriage control)
Blank indicates single space, '0' double space, '1' a form feed (new page), '+' no advance (overprint)
• C
– Codes: d, e, f, c, s (decimal, float, float, char, string)
– Specifier: % opt-width code
– Ex: %s %5d %20s %8.2f
• Fortran
– Codes: i, f, a (integer, float, string)
– Specifier: opt-repeat code width
– Ex: 8i4, f8.2, a20
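The C-style specifiers can be tried out in Python, whose `%` string-formatting operator accepts the same printf-style codes. This is only an illustration of width and precision; the values formatted are made up for the example.

```python
# %5d: integer right-justified in a field of width 5
print("[%5d]" % 42)          # [   42]

# %8.2f: float in a field of width 8 with 2 digits after the decimal point
print("[%8.2f]" % 3.14159)   # [    3.14]

# %20s: string right-justified in a field of width 20
print("[%20s]" % "name")

# The full example from the slide: %s %5d %20s %8.2f
line = "%s %5d %20s %8.2f" % ("id", 42, "name", 3.14159)
print(line)
```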