Expressions and Statements

Expressions and Statements
Programming Language Concepts
Lecture 16
Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida
7 Categories of Control Constructs
1. Sequencing:
• After A, execute B.
• A block is a group of sequenced
statements.
2. Selection:
• Choice between two (or more)
statements.
3. Iteration:
• A fragment is executed repeatedly.
7 Categories of Control Constructs
(cont’d)
4. Procedural abstraction:
• Encapsulate a collection of control
constructs in a single unit.
5. Recursion:
• An expression defined in terms of a
simpler version of itself.
6. Concurrency:
• Two or more fragments executed at
same time.
7 Categories of Control Constructs
(cont’d)
7. Non-determinacy:
• Order is deliberately left
unspecified, implying any
alternative will work.
Expression Evaluation
• Expressions consist of:
• Simple object, or
• Operator function applied to a
collection of expressions.
• Structure of expressions:
• Prefix (Lisp),
• Infix (most languages),
• Postfix (PostScript, Forth, some
calculators)
Expression Evaluation (cont’d)
• By far the most popular notation is infix.
• Raises some issues.
• Precedence:
• Specify that some operators, in
absence of parentheses, group more
tightly than other operators.
Expression Evaluation (cont’d)
• Associativity: tie-breaker for operators
on the same level of precedence.
• Left Associativity: a+b+c
evaluated as (a+b)+c
• Right Associativity: a+b+c evaluated
as a+(b+c)
• Different results may or may not
accrue:
• Generally: (a+b)+c = a+(b+c),
but (a-b)-c <> a-(b-c)
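A minimal C sketch of the grouping issue above. The function names are illustrative and not part of the lecture. Integer subtraction shows that the grouping changes the result, and floating-point addition shows that even + can give different answers under the two groupings.

```c
#include <assert.h>

/* Left-associative grouping: this is how C parses a - b - c. */
int sub_left(int a, int b, int c)  { return (a - b) - c; }

/* Right-associative grouping, for comparison. */
int sub_right(int a, int b, int c) { return a - (b - c); }

/* The same two groupings for floating-point addition. */
double add_left(double a, double b, double c)  { return (a + b) + c; }
double add_right(double a, double b, double c) { return a + (b + c); }

/* Spot checks of both claims. */
void check_grouping(void) {
    assert(sub_left(10, 4, 3) != sub_right(10, 4, 3));
    /* 1.0 is lost when it is first added to -1e20, which is far larger. */
    assert(add_left(1e20, -1e20, 1.0) != add_right(1e20, -1e20, 1.0));
}
```

Calling check_grouping() passes silently: the two subtractions give 3 and 9, and the two additions give 1.0 and 0.0.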
Expression Evaluation (cont’d)
• Specify evaluation order of operators.
• Generally, left-to-right (Java), but in
some languages, the order is
left unspecified (C).
Operators and Precedence in
Various Languages
• C is operator-richer than most languages
• 17 levels of precedence.
• some not shown in figure:
• type casts,
• array subscripts,
• field selection (.)
• dereference and field selection
• a->b, equivalent to (*a).b
• Pascal: <, <=, …, in (row 6)
Pitfalls in Pascal
if a < b and c < d then ... is parsed as
if a < (b and c) < d then ...
Will only type-check if a, b, c and d are all Booleans.
Pitfalls in C
a < b < c parsed as (a < b) < c,
yielding a comparison between
(a < b) (0 or 1) and c.
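The pitfall can be sketched in C as follows. Both function names are made up for illustration; the first writes out the grouping the compiler applies to a < b < c, the second is what a programmer usually means.

```c
#include <assert.h>

/* The grouping C applies to  a < b < c : the 0-or-1 result of
   (a < b) is compared with c. */
int chained_as_parsed(int a, int b, int c) { return (a < b) < c; }

/* The mathematical reading: a < b and b < c. */
int chained_intended(int a, int b, int c) { return a < b && b < c; }

/* Two inputs on which the readings disagree. */
void check_chained(void) {
    assert(chained_as_parsed(3, 2, 1) != chained_intended(3, 2, 1));
    assert(chained_as_parsed(-3, -2, -1) != chained_intended(-3, -2, -1));
}
```

For 3, 2, 1 the parsed form is true because (3 < 2) is 0 and 0 < 1; for -3, -2, -1 it is false because (−3 < −2) is 1 and 1 < −1 fails.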
Assignments
• Functional programming:
• We return a value for surrounding
context.
• Value of expression depends solely on
referencing environment, not on the
time in which the evaluation occurs.
• Expressions are "referentially
transparent."
Assignments (cont’d)
• Imperative:
• Based on side-effects.
• Influence subsequent computation.
• Distinction between
• Expressions (return a value)
• Statements (no value returned,
done solely for the side-effects).
Variables
• Can denote a location in memory
(l-value)
• Can denote a value (r-value)
• Typically,
2+3 := c; is illegal, as well as
c := 2+3; if c is a declared constant.
Variables (cont’d)
• Expression on left-hand-side of
assignment can be complex, as long as
it has an l-value:
(f(a)+3)->b[c] = 2;
in C.
• Here we assume f returns a pointer to
an array of elements, each of which is
a structure containing a field b, an
array. Entry c of b has an l-value.
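To make the types concrete, here is one possible set of declarations under which the assignment compiles. The struct name, the array sizes, the backing array and the body of f are all assumptions made for this sketch; only the assignment itself comes from the slide.

```c
#include <assert.h>

#define B_LEN 4
#define TABLE_LEN 8

struct elem {
    int b[B_LEN];                    /* the array field b */
};

static struct elem table[TABLE_LEN]; /* hypothetical backing storage */

/* Hypothetical f: returns a pointer to element a of table. */
struct elem *f(int a) {
    assert(a >= 0 && a + 3 < TABLE_LEN);  /* keep f(a)+3 inside table */
    return &table[a];
}

/* Performs the slide's assignment and reads the location back. */
int store_two(int a, int c) {
    assert(c >= 0 && c < B_LEN);
    (f(a) + 3)->b[c] = 2;   /* left-hand side denotes a location */
    return table[a + 3].b[c];
}
```

The expression f(a)+3 is pointer arithmetic, so it designates the element three positions past the one f returned; ->b[c] then selects a single int inside it.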
Referencing/Dereferencing
• Consider
b := 2;
c := b;
a := b + c;
• Value Model
Reference Model
Referencing/Dereferencing (cont’d)
• Pascal, C, C++ use the "value model":
• Store 2 in b
• Copy the 2 into c
• Access b,c, add them, store in a.
• Clu uses the "reference" model:
• Let b refer to 2.
• Let c also refer to 2.
• Pass references b,c to "+", let a
refer to result.
Referencing/Dereferencing (cont’d)
• Java uses value model for intrinsic
types (int, float, etc.) (could change
soon!), and reference model for
user-defined types (classes).
Orthogonality
• Features can be used in any
combination
• Every combination is consistent.
• Algol 68 was among the first languages
to make orthogonality a major design goal.
Orthogonality In Algol 68
• Expression oriented; no separate notion of
statement.
begin
a := if b < c then d else e;
a := begin f(b); g(c) end;
g(d);
2+3
end
Orthogonality In Algol 68 (cont’d)
• Value of 'if' is either expression (d or e).
• Value of 'begin-end' block is value of last
expression in it, namely g(c).
• Value of g(d) is obtained, and discarded.
• Value of entire block is 5.
Orthogonality In Algol 68 (cont’d)
• C does this as well:
• Value of assignment is value of
right-hand-side:
c = b= a++;
Pitfall in C
if (a=b) { ... }
/* assign b to a and proceed */
/* if result is nonzero */
• Some C compilers warn against this.
• Different from
if (a==b) {
...
}
• Java has separate boolean type:
• prohibits using an int as a boolean.
Initialization
• Not always provided (there is assignment)
• Useful for 2 reasons:
1. Static allocation:
• compiler can place value directly into
memory.
• No execution time spent on
initialization.
2. Variable not initialized is common error.
Initialization (cont’d)
• Pascal has NO initialization.
• Some compilers provide it as an
extension.
• Not orthogonal, provided only for
intrinsics.
• C, C++, Ada allow aggregates:
• Initialization of a user-defined
composite type.
Example: (C)
int a[] = {2,3,4,5,6,7};
• Rules for mismatches between
declaration and initialization:
int a[4] = {1,2,3};          /* rest filled with zeroes */
int a[4] = {0};              /* filled with all zeroes */
int a[4] = {1,2,3,4,5,6,7};  /* oops! too many initializers */
• Additional rules apply for
multi-dimensional arrays and structs in C.
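The array rules can be checked with a small C sketch. The helper and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Adds up the n elements of a. */
static int sum(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Fewer initializers than elements: the remaining ones are zero. */
int partial_sum(void) {
    int a[4] = {1, 2, 3};
    assert(a[3] == 0);
    return sum(a, 4);
}

/* A single 0 initializer zero-fills the whole array. */
int zeroed_sum(void) {
    int a[4] = {0};
    return sum(a, 4);
}

/* With no declared size, the length comes from the initializer list. */
size_t inferred_length(void) {
    int a[] = {2, 3, 4, 5, 6, 7};
    return sizeof a / sizeof a[0];
}
```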
Uninitialized Variables
• Standard Pascal does not guarantee
default values; an uninitialized
variable is undefined (some
implementations happen to zero memory).
• C guarantees zero values only for
variables with static storage duration,
*garbage* for everyone else!
Uninitialized Variables (cont’d)
• C++ distinguishes between:
• initialization (invocation of a
constructor, no initial value is
required)
• Crucial for user-defined ADT's to
manage their own storage, along
with destructors.
• assignment (explicit)
Uninitialized Variables (cont’d)
• Difference between initialization and
assignment: variable length string:
• Initialization: allocate memory.
• Assignment: deallocate old memory
AND allocate new.
Uninitialized Variables (cont’d)
• Java uses reference model, no need
for distinction between initialization
and assignment.
• Java requires every variable to be
"definitely assigned", before using it in
an expression.
• Definitely assigned: every execution
path assigns a value to the variable.
Uninitialized Variables (cont’d)
• Catching uninitialized variables at
run-time is expensive.
• hardware can help, detecting special
values, e.g. "NaN" in the IEEE
floating-point standard.
• may need extra storage, if all
possible bit patterns represent
legitimate values.
Combination Assignment Operators
• Useful in imperative languages, to
avoid repetition in frequent updates:
a = a + 1;
b.c[3].d = b.c[3].d * 2;   /* ack! */
• Can simplify:
++a;
b.c[3].d *= 2;
Combination Assignment Operators
(cont’d)
• Syntactic sugar for often used
combinations.
• Useful in combination with
autoincrement operators:
A[--i]=b; equivalent to
A[i -= 1] = b;
Combination Assignment Operators
(cont’d)
*p++ = *q++;   /* ++ has higher precedence than * */
equivalent to
*(t=p, p += 1, t) = *(t=q, q += 1, t);
• Advantage of autoincrement operators:
• Increment is done in units of the
(user-defined) type.
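As a sketch of where the *p++ = *q++ idiom typically appears, here is a copy loop written both ways. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copies n ints from src to dst using the compact idiom. */
void copy_ints(int *dst, const int *src, size_t n) {
    int *p = dst;
    const int *q = src;
    while (n-- > 0)
        *p++ = *q++;   /* use the old p and q, then advance both */
}

/* The same loop with the increments spelled out. */
void copy_ints_explicit(int *dst, const int *src, size_t n) {
    int *p = dst;
    const int *q = src;
    while (n-- > 0) {
        *p = *q;
        p += 1;        /* advances by one int, not by one byte */
        q += 1;
    }
}

/* Both versions should produce identical copies. */
void check_copy(void) {
    int s[3] = {4, 5, 6}, d1[3] = {0}, d2[3] = {0};
    copy_ints(d1, s, 3);
    copy_ints_explicit(d2, s, 3);
    for (size_t i = 0; i < 3; i++)
        assert(d1[i] == s[i] && d2[i] == s[i]);
}
```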
Comma Operator
• In C, merely a sequence:
int a=2, b=3;
a,b = 6;     /* now a=2 and b=6 */
int a=2, b=3;
a,b = 7,6;   /* now a=2 and b=7 */
/* = has higher precedence than , */
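A short C sketch of the comma operator: first its value, then its most common practical use in a for header. Function names are illustrative only.

```c
#include <assert.h>

/* The comma operator evaluates its left operand, discards the value,
   and yields its right operand: *b becomes 7 and 6 is returned. */
int comma_value(int *b) {
    int r = (*b = 7, 6);
    return r;
}

/* Typical use: two loop variables stepped in one for header. */
void reverse(int *a, int n) {
    int i, j;
    assert(n >= 0);
    for (i = 0, j = n - 1; i < j; i++, j--) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

The parentheses in comma_value matter: without them the initializer would end at the comma, since in a declaration the comma separates declarators rather than acting as an operator.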
Comma Operator
• In Clu, "comma" creates a tuple:
a,b := 3,4    assigns 3 to a, 4 to b
a,b := b,a    swaps them!
• We already had that in RPAL:
let t=(1,2)
in (t 2, t 1)
Ordering within Expressions
• Important for two reasons:
1. Side effect:
• One sub expression can have a side
effect upon another subexpression:
(b = ++a + a--)
2. Code improvement:
• Order of evaluation affects register
allocation and instruction scheduling.
Ordering within Expressions (cont’d)
• Example: a * b + f(c)
• Want to call f first, avoid storing
(using up a register) for a*b
during call to f.
Ordering within Expressions (cont’d)
• Example:
a := B[i];
c := a * 2 + d * 3;
• Want to calculate d * 3 before a * 2:
Getting a requires going to memory
(slow); calculating d * 3 can proceed
in parallel.
Ordering within Expressions (cont’d)
• Most languages leave subexpression
order unspecified (Java is a notable
exception, uses left-to-right)
• Some will actually rearrange
subexpressions.
Example (Fortran)
a = b + c
c = c + e + b
rearranged as
a = b + c
c = b + c + e
and then as
a = b + c
c = a + e
Rearranging Can Be Dangerous
• If a,b,c are close to the precision limit
(say, about ¾ of largest possible
value), then
a + b - c will overflow, whereas
a - c + b will not.
• Safety net: most compilers guarantee
to follow ordering imposed by
parentheses.
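The overflow claim can be checked without actually overflowing by doing the arithmetic in a wider type. This sketch assumes long long is wider than int, which holds on common platforms but is not guaranteed by the C standard; the function names are invented.

```c
#include <assert.h>
#include <limits.h>

/* True if v is representable as an int. */
static int fits_int(long long v) {
    return v >= INT_MIN && v <= INT_MAX;
}

/* True if (a + b) - c stays within int range at every step. */
int add_then_sub_ok(int a, int b, int c) {
    long long t = (long long)a + b;
    return fits_int(t) && fits_int(t - c);
}

/* True if (a - c) + b stays within int range at every step. */
int sub_then_add_ok(int a, int b, int c) {
    long long t = (long long)a - c;
    return fits_int(t) && fits_int(t + b);
}

/* Three values at about 3/4 of INT_MAX, as on the slide. */
void check_reordering(void) {
    int big = INT_MAX / 4 * 3;
    assert(!add_then_sub_ok(big, big, big));  /* a + b exceeds INT_MAX */
    assert(sub_then_add_ok(big, big, big));   /* a - c is 0, then + b  */
}
```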
Short Circuit Evaluation
• As soon as we can conclude outcome of the
evaluation, skip the remainder of it.
• Example (in Java):
if (list != null && list.size() != 0)
System.out.println(list.size());
• Will never throw null pointer exception
Short Circuit Evaluation (cont’d)
• Can't do this in Pascal:
if (list <> nil) and (list^.size <> 0)
• will evaluate list^.size even when list is
nil.
• Cumbersome to do it in Pascal:
if list <> nil then
if list^.size <> 0 then
writeln(list^.size);
Short Circuit Evaluation (cont’d)
• So, is short-circuit evaluation always good?
• Not necessarily.
Short Circuit Evaluation (cont’d)
• Consider a loop whose idea is to tally
AND to spell-check every word, and print
the word if it's misspelled and has
appeared for the 10th time.
• If the 'and' is short-circuit, the
program breaks.
• Some languages (Clu, Ada, C) provide
BOTH short-circuit and non-short-circuit
Boolean operators.
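A toy C version of the tally-and-spell-check situation. Everything here is a stand-in: tally, misspelled, the two counters and the sample word "teh" are assumptions for the sketch, not the code from the original slide. In C, & on two 0-or-1 operands evaluates both, while && may skip the right one.

```c
#include <assert.h>
#include <string.h>

static int count;    /* toy tally: occurrences seen so far */
static int checked;  /* how many words the spell checker has seen */

/* Hypothetical tally: counts the word and returns the new count. */
int tally(const char *w) { (void)w; return ++count; }

/* Hypothetical spell checker; its side effect is bumping checked. */
int misspelled(const char *w) { checked++; return strcmp(w, "teh") == 0; }

/* Non-short-circuit: both calls happen for every word. */
int report_eager(const char *w) { return (tally(w) == 10) & misspelled(w); }

/* Short-circuit: misspelled() runs only when the count is exactly 10. */
int report_lazy(const char *w) { return (tally(w) == 10) && misspelled(w); }
```

Feeding the same word ten times to report_eager leaves checked at 10; doing the same with report_lazy (after resetting the counters) leaves it at 1, which is the breakage the slide describes.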
Structured Programming
• Federal Law: Abandon Goto's !
• Originally, Fortran had goto's:
if (a .lt. b) goto 10
...
10
Structured Programming (cont’d)
• Controversy surrounding Goto's:
• Paper (letter to the editor,
Communications of the ACM) in 1968
by E. W. Dijkstra:
• "Go To Statement Considered
Harmful"
• argument: Goto's create
"spaghetti code".
Structured Programming (cont’d)
• Legacy: structured programming:
use of
• sequencing (;)
• alternation (if)
• iteration (while)
• Sufficient to solve any problem.
Structured Programming (cont’d)
• Part of focus on *control* during first
40 years in programming.
• During 80's, 90's and beyond, focus
shifted to *data* (OO-programming)
Structured Programming (cont’d)
• Common (former) use of goto: break
out of loop(s), maybe deeply nested:
while true do begin
if (...) then goto 100;
end;
100: ...
Structured Programming (cont’d)
• In C, this can be accomplished using a
'break' statement, but consider this ...
while (...) {
  switch (...) {
    ...
    goto loop_done;   /* break won't do */
  }
}
loop_done: ...
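A runnable C sketch of the same shape. The command encoding (0 means stop, negative values are skipped) is invented purely to give the loop something to do.

```c
#include <assert.h>

/* Sums the positive entries of cmds up to the first 0 entry. */
int run(const int *cmds, int n) {
    int sum = 0;
    int i = 0;
    assert(n >= 0);
    while (i < n) {
        switch (cmds[i]) {
        case 0:
            goto loop_done;   /* a break here would only leave the switch */
        default:
            if (cmds[i] > 0)
                sum += cmds[i];
            break;
        }
        i++;
    }
loop_done:
    return sum;
}
```

Replacing the goto with break would keep the loop running past the 0 entry, because break binds to the innermost switch or loop.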
Structured Programming (cont’d)
• Today, we use *exceptions*.
• Exception: Upon a certain (error)
condition, allows a program to back out
of nested context to some point where
it can recover and proceed.
• Requires unwinding of the stack.
• More later.
Structured Programming (cont’d)
• Semantically, goto's are *very*
difficult to understand and implement
correctly.
• Some (circumstantial) evidence:
• RPAL → LPAL → JPAL
• (JPAL: PAL with jumps)
• JPAL by far the hardest of the three
to describe.
Structured Programming (cont’d)
• When executing a jump, we might be:
• exiting one or more procedure calls.
• exiting many nested loops.
• diving into the middle of a procedure.
• diving into the middle of a loop.
• What happens to the stack ???
Structured Programming (cont’d)
• Goto's in general described using
*continuations*:
• A continuation captures the context
(state) in which execution might
continue.
• Continuations essential to
denotational semantics (more
later).
Statement Sequencing
• Basic assumption:
• A sequence of statements will have
side effects.
• Not always desirable; easier to reason
about (prove correct) programs in which
functions have no side effects.
• Sometimes side-effects are *very*
desirable. Example: rand() function.
• want it to produce a different
number each time it's called.
Statement Selection
• Most languages use a variant of the original
if...then...else introduced in
Algol 60:
if condition then statement
else if condition then statement
else if condition then statement
...
else statement
Statement Selection (cont’d)
switch (condition) {
case a: block_a;
case b: block_b;
...
default: block_c;
}
is often syntactic sugar for
if (condition == a) block_a
else if (condition == b) block_b
...
else block_c
Statement Selection (cont’d)
• Some languages (e.g. C) require explicit
break statements between cases; otherwise
control falls through into the arms of
the following cases.
Short-Circuited Conditions
• Design goal: implement if’s efficiently.
• Jump code: efficient organization of code
to take advantage of short-circuited
boolean expressions.
• Value of expression never stored in a
register.
Short-Circuited Conditions (cont’d)
• If the value of the entire expression is
needed, we can still use jump code.
• Example (Ada):
found := p /= null and then p.key =val;
equivalent to
if p /= null and then p.key=val then
found := true;
else
found := false;
end if;
Short-Circuited Conditions (cont’d)
• Jump code:
r1 := p
if r1 = 0 goto L1
r2 := r1->key
if r2 <> val goto L1
r1 := 1
goto L2
L1: r1 := 0
L2: found := r1
• The first branch could be to L2 (r1 is
already 0 there)! Better to perform that
improvement in a code optimizer.
Case/Switch Statements
• Alternative syntax for nested
if...then...else statements. Example:
i := (* potentially complicated expression *)
if i=1 then
clause_A
else if i=2 or i=7 then
clause_B
else if i >=3 and i <= 5 then
clause_C
else if i=10 then
clause_D
else
clause_E
Corresponding CASE statement
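One way to render the same selection as a C switch, offered as a sketch rather than the slide's own figure. The function name and the convention of returning a letter for the chosen clause are assumptions made so the result can be checked.

```c
#include <assert.h>

/* Returns 'A'..'E' for the clause the if-chain would select. */
char select_clause(int i) {
    switch (i) {
    case 1:                 return 'A';
    case 2: case 7:         return 'B';
    case 3: case 4: case 5: return 'C';  /* standard C has no case ranges */
    case 10:                return 'D';
    default:                return 'E';
    }
}

/* One probe per arm, plus two values that reach the default. */
void check_select(void) {
    assert(select_clause(1) == 'A');
    assert(select_clause(7) == 'B');
    assert(select_clause(4) == 'C');
    assert(select_clause(10) == 'D');
    assert(select_clause(6) == 'E');
    assert(select_clause(0) == 'E');
}
```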
Case/Switch Statements (cont’d)
• Purpose of case statement is not only
syntactic elegance, but efficiency. Wish to
*compute* the address to which to branch.
• So, list ten cases (range of values tested):
• Store addresses starting at location T
(jump table).
• Calculate r1 (the test expression value).
• First test for r1 out of range 1..10.
• Subtract 1 from r1, obtaining an offset
(0..9).
• Get address T[r1], store in r2.
• Branch to (indirect) r2.
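The steps above can be imitated in C with a table of function pointers, where an indirect call stands in for the indirect branch. This is a model of what a compiler might emit, not actual compiler output; the arm functions and their letter results are invented, and the label set (1, 2 and 7, 3..5, 10, else) is borrowed from the if-chain example.

```c
#include <assert.h>

typedef char (*arm_fn)(void);

static char clause_A(void) { return 'A'; }
static char clause_B(void) { return 'B'; }
static char clause_C(void) { return 'C'; }
static char clause_D(void) { return 'D'; }
static char clause_E(void) { return 'E'; }   /* the else arm */

/* T: one entry per value in 1..10; unlabeled values use the else arm. */
static const arm_fn T[10] = {
    clause_A,                      /* 1     */
    clause_B,                      /* 2     */
    clause_C, clause_C, clause_C,  /* 3..5  */
    clause_E,                      /* 6     */
    clause_B,                      /* 7     */
    clause_E, clause_E,            /* 8, 9  */
    clause_D                       /* 10    */
};

char dispatch(int r1) {
    if (r1 < 1 || r1 > 10)     /* first test for r1 out of range */
        return clause_E();
    r1 -= 1;                   /* offset 0..9 */
    arm_fn r2 = T[r1];         /* get address T[r1] */
    assert(r2 != 0);
    return r2();               /* indirect branch, modelled as a call */
}
```

Whatever the value of r1, dispatch performs one range test, one subtraction, one load and one indirect transfer, which is the efficiency argument for the case statement.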
Case/Switch Statements (cont’d)
• Advantages: fast, occupies reasonable
space if labels are dense.
• Disadvantage: can occupy enormous
amounts of space if values are not dense
(e.g. 1, 3..5, 50000..50003)
Case/Switch Statements (cont’d)
• Variations:
• Use hash table for T.
• Good idea if total range is large,
many missing values, and no large
value ranges.
• Requires a separate entry for each
possible value.
• Use binary search for table T.
• Good idea if value ranges are large,
runs in O(log n) time.
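A sketch of the binary-search variation in C, using the sparse label set 1, 3..5, 50000..50003 as data. The struct layout, the arm letters and the 'E' default are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct range { int lo, hi; char arm; };

/* Sorted, non-overlapping label ranges; one entry per range,
   however many values the range covers. */
static const struct range labels[] = {
    { 1,     1,     'A' },
    { 3,     5,     'B' },
    { 50000, 50003, 'C' },
};

/* Returns the arm for v, or 'E' (hypothetical default) if none matches. */
char find_arm(int v) {
    size_t lo = 0;
    size_t hi = sizeof labels / sizeof labels[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        assert(labels[mid].lo <= labels[mid].hi);
        if (v < labels[mid].lo)
            hi = mid;            /* v lies below this range */
        else if (v > labels[mid].hi)
            lo = mid + 1;        /* v lies above this range */
        else
            return labels[mid].arm;
    }
    return 'E';
}
```

The table needs three entries here, where a plain jump table over 1..50003 would need tens of thousands.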
Case/Switch Statements (cont’d)
• Combining techniques:
• Compilers usually generate code for
each arm, building up knowledge of
the label set.
• Then use knowledge to choose strategy
(binary search or hash table).
• Less sophisticated compilers often
generate poor code, programmer must
restructure case statement to prevent
huge tables, or very inefficient code.
Case/Switch Statements (cont’d)
• Pascal, C don't allow ranges (avoid
binary search).
• Standard Pascal doesn't allow a default
clause:
• Run-time semantic error if no case
matches expression.
• Many Pascal compilers *do* allow it
as an extension.
Case/Switch Statements (cont’d)
• Modula provides an optional ELSE clause.
• Ada requires labels to cover *ALL* values
in the domain of the type of the
expression. Ranges and an *others*
clause are allowed.
• C, Fortran 90: OK for expression to
match no value: statement does nothing.
Case/Switch Statements (cont’d)
• C is different in other respects:
• A label can have an empty arm,
• Control "falls" through to next label.
• Effectively allows lists of values.
• Example:
switch (grade) {
case 10: case 9: case 8: case 7:
printf("Pass"); break;
default: printf("Fail"); break;
}
Case/Switch Statements (cont’d)
• 'break' needed to prevent
fallthrough.
• If a value matches test expression,
fall-through takes place i.e. no more
comparisons.
Case/Switch Statements (cont’d)
• Example:
switch (grade) {
case 10: case 9: case 8: case 7:
num_pass++;
case 6: borderline++;
case 0: case 1: case 2:
case 3: case 4: case 5:
fail++;
default: total++; break;
}
• In C, a forgotten break can be a difficult bug to find.
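To see the effect of missing breaks, the grade example can be run both ways. This sketch assumes the intent is to count each grade in exactly one category; the struct and function names are invented.

```c
#include <assert.h>

struct tallies { int num_pass, borderline, fail, total; };

/* With a break after each arm, a grade updates one category only. */
void record(struct tallies *t, int grade) {
    assert(grade >= 0 && grade <= 10);
    switch (grade) {
    case 10: case 9: case 8: case 7:
        t->num_pass++;
        break;
    case 6:
        t->borderline++;
        break;
    default:                    /* grades 0..5 */
        t->fail++;
        break;
    }
    t->total++;
}

/* Without breaks, a passing grade also bumps borderline and fail. */
void record_fallthrough(struct tallies *t, int grade) {
    switch (grade) {
    case 10: case 9: case 8: case 7:
        t->num_pass++;          /* falls through */
    case 6:
        t->borderline++;        /* falls through */
    case 0: case 1: case 2:
    case 3: case 4: case 5:
        t->fail++;              /* falls through */
    default:
        t->total++;
        break;
    }
}
```

Recording a grade of 9 with record_fallthrough increments num_pass, borderline, fail and total, while record increments only num_pass and total.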