Expressions and Statements
Programming Language Concepts
Lecture 16

Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

7 Categories of Control Constructs
1. Sequencing:
   • After A, execute B.
   • A block is a group of sequenced statements.
2. Selection:
   • Choice between two (or more) statements.
3. Iteration:
   • A fragment is executed repeatedly.

7 Categories of Control Constructs (cont’d)
4. Procedural abstraction:
   • Encapsulate a collection of control constructs in a single unit.
5. Recursion:
   • An expression defined in terms of a simpler version of itself.
6. Concurrency:
   • Two or more fragments executed at same time.

7 Categories of Control Constructs (cont’d)
7. Non-determinacy:
   • Order is deliberately left unspecified, implying any alternative will work.

Expression Evaluation
• Expressions consist of:
  • Simple object, or
  • Operator function applied to a collection of expressions.
• Structure of expressions:
  • Prefix (Lisp),
  • Infix (most languages),
  • Postfix (Postscript, Forth, some calculators)

Expression Evaluation (cont’d)
• By far the most popular notation is infix.
  • Raises some issues.
• Precedence:
  • Specify that some operators, in absence of parentheses, group more tightly than other operators.

Expression Evaluation (cont’d)
• Associativity: tie-breaker for operators on the same level of precedence.
  • Left Associativity: a+b+c evaluated as (a+b)+c
  • Right Associativity: a+b+c evaluated as a+(b+c)
• Different results may or may not accrue:
  • Generally: (a+b)+c = a+(b+c), but (a-b)-c <> a-(b-c)

Expression Evaluation (cont’d)
• Specify evaluation order of operators.
  • Generally, left-to-right (Java), but in some languages, the order is implementation-defined (C).

Operators and Precedence in Various Languages
• C is operator-richer than most languages
  • 17 levels of precedence.
  • some not shown in figure:
    • type casts,
    • array subscripts,
    • field selection (.)
    • dereference and field selection: a->b, equivalent to (*a).b
• Pascal: <, <=, …, in (row 6)

Pitfalls in Pascal
    if a < b and c < d then ...
is parsed as
    if a < (b and c) < d then ...
Will only work if a, b, c are booleans.

Pitfalls in C
    a < b < c
is parsed as
    (a < b) < c
yielding a comparison between (a < b) (0 or 1) and c.

Assignments
• Functional programming:
  • We return a value for surrounding context.
  • Value of expression depends solely on referencing environment, not on the time in which the evaluation occurs.
  • Expressions are "referentially transparent."

Assignments (cont’d)
• Imperative:
  • Based on side-effects.
  • Influence subsequent computation.
• Distinction between
  • Expressions (return a value)
  • Statements (no value returned, done solely for the side-effects).

Variables
• Can denote a location in memory (l-value)
• Can denote a value (r-value)
• Typically, 2+3 := c; is illegal, as well as c := 2+3; if c is a declared constant.

Variables (cont’d)
• Expression on left-hand-side of assignment can be complex, as long as it has an l-value:
    (f(a)+3)->b[c] = 2;      in C.
• Here we assume f returns a pointer to an array of elements, each of which is a structure containing a field b, an array. Entry c of b has an l-value.

Referencing/Dereferencing
• Consider
    b := 2;
    c := b;
    a := b + c;
• [Figure: Value Model vs. Reference Model]

Referencing/Dereferencing (cont’d)
• Pascal, C, C++ use the "value model":
  • Store 2 in b
  • Copy the 2 into c
  • Access b, c, add them, store in a.
• Clu uses the "reference" model:
  • Let b refer to 2.
  • Let c also refer to 2.
  • Pass references b, c to "+", let a refer to result.

Referencing/Dereferencing (cont’d)
• Java uses value model for intrinsic types (int, float, etc.) (could change soon !), and reference model for user-defined types (classes)

Orthogonality
• Features can be used in any combination
• Every combination is consistent.
• Algol 68 was the first language to make orthogonality a major design goal.
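Two of the C points made earlier, the grouping of a < b < c and the complex l-value (f(a)+3)->b[c], can be tried out with a minimal sketch along the following lines. The names chained_less, intended_less, struct elem, table and store_two, as well as the body of f, are hypothetical and chosen only for illustration; the slides only assume that f returns a pointer to an array of structures with an array field b.

```c
/* How C groups a < b < c: the first comparison yields 0 or 1,
   and that 0 or 1 is then compared with c. */
int chained_less(int a, int b, int c)
{
    return (a < b) < c;
}

/* What the programmer most likely meant. */
int intended_less(int a, int b, int c)
{
    return a < b && b < c;
}

/* Hypothetical element type: a structure containing a field b, an array. */
struct elem {
    int b[4];
};

/* Hypothetical backing storage for f (zero-initialized, since it is static). */
static struct elem table[8];

/* Hypothetical f: returns a pointer into an array of struct elem. */
struct elem *f(int a)
{
    return &table[a];
}

/* Performs the assignment from the slide, then reads the entry back.
   The caller must keep a + 3 within 0..7 and c within 0..3. */
int store_two(int a, int c)
{
    (f(a) + 3)->b[c] = 2;   /* complex left-hand side, but it has an l-value */
    return table[a + 3].b[c];
}
```

With these definitions, chained_less(3, 2, 1) yields 1 although 3 < 2 < 1 is false in the mathematical sense, while intended_less(3, 2, 1) yields 0.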
Orthogonality In Algol 68
• Expression oriented; no separate notion of statement.
    begin
      a := if b < c then d else e;
      a := begin f(b); g(c) end;
      g(d);
      2+3
    end

Orthogonality In Algol 68 (cont’d)
• Value of 'if' is either expression (d or e).
• Value of 'begin-end' block is value of last expression in it, namely g(c).
• Value of g(d) is obtained, and discarded.
• Value of entire block is 5.

Orthogonality In Algol 68 (cont’d)
• C does this as well:
  • Value of assignment is value of right-hand-side:
      c = b = a++;

Pitfall in C
    if (a=b) { ... }    /* assign b to a and proceed */
                        /* if result is nonzero */
• Some C compilers warn against this.
• Different from
    if (a==b) { ... }
• Java has separate boolean type:
  • prohibits using an int as a boolean.

Initialization
• Not always provided (there is assignment)
• Useful for 2 reasons:
  1. Static allocation:
     • compiler can place value directly into memory.
     • No execution time spent on initialization.
  2. Variable not initialized is common error.

Initialization (cont’d)
• Pascal has NO initialization.
  • Some compilers provide it as an extension.
  • Not orthogonal, provided only for intrinsics.
• C, C++, Ada allow aggregates:
  • Initialization of a user-defined composite type.

Example: (C)
    int a[] = {2,3,4,5,6,7};
• Rules for mismatches between declaration and initialization:
    int a[4] = {1,2,3};            /* rest filled with zeroes */
    int a[4] = {0};                /* filled with all zeroes */
    int a[4] = {1,2,3,4,5,6,7};    /* oops! */
• Additional rules apply for multi-dimensional arrays and structs in C.

Uninitialized Variables
• Pascal guarantees default values (e.g. zero for integers)
• C guarantees zero values only for static variables, *garbage* for everyone else !

Uninitialized Variables (cont’d)
• C++ distinguishes between:
  • initialization (invocation of a constructor, no initial value is required)
    • Crucial for user-defined ADT's to manage their own storage, along with destructors.
  • assignment (explicit)

Uninitialized Variables (cont’d)
• Difference between initialization and assignment: variable length string:
  • Initialization: allocate memory.
  • Assignment: deallocate old memory AND allocate new.

Uninitialized Variables (cont’d)
• Java uses reference model, no need for distinction between initialization and assignment.
• Java requires every variable to be "definitely assigned", before using it in an expression.
• Definitely assigned: every execution path assigns a value to the variable.

Uninitialized Variables (cont’d)
• Catching uninitialized variables at run-time is expensive.
  • hardware can help, detecting special values, e.g. "NaN" in the IEEE floating-point standard.
  • may need extra storage, if all possible bit patterns represent legitimate values.

Combination Assignment Operators
• Useful in imperative languages, to avoid repetition in frequent updates:
    a = a + 1;
    b.c[3].d = b.c[3].d * 2;
• Can simplify:
    ++a;
    b.c[3].d *= 2;    /* ack ! */

Combination Assignment Operators (cont’d)
• Syntactic sugar for often used combinations.
• Useful in combination with autoincrement operators:
    A[--i] = b;
  equivalent to
    A[i -= 1] = b;

Combination Assignment Operators (cont’d)
    *p++ = *q++;    /* ++ has higher precedence than * */
  equivalent to
    *(t=p, p += 1, t) = *(t=q, q += 1, t);
• Advantage of autoincrement operators:
  • Increment is done in units of the (user-defined) type.

Comma Operator
• In C, merely a sequence:
    int a=2, b=3;
    a,b = 6;        /* now a=2 and b=6 */
    int a=2, b=3;
    a,b = 7,6;      /* now a=2 and b=7 */
                    /* = has higher precedence than , */

Comma Operator (cont’d)
• In Clu, "comma" creates a tuple:
    a,b := 3,4      assigns 3 to a, 4 to b
    a,b := b,a      swaps them !
• We already had that in RPAL:
    let t=(1,2) in (t 2, t 1)

Ordering within Expressions
• Important for two reasons:
  1. Side effect:
     • One subexpression can have a side effect upon another subexpression:
         (b = ++a + a--)
  2. Code improvement:
     • Order of evaluation has effect on register/instruction scheduling.
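The equivalence stated earlier for *p++ = *q++ can be checked with a small C sketch. The function names copy_compact and copy_expanded, and the choice of int elements, are assumptions made for illustration; the slide's single temporary t is split into tp and tq here so that the two sides of the assignment do not share a variable.

```c
#include <stddef.h>

/* Copy n ints using the compact idiom: parsed as *(p++) = *(q++). */
void copy_compact(int *p, const int *q, size_t n)
{
    while (n-- > 0)
        *p++ = *q++;
}

/* The same copy with each post-increment spelled out through a temporary,
   following the expansion on the slide. Each "+= 1" advances the pointer
   by one element (sizeof(int) bytes), not by one byte. */
void copy_expanded(int *p, const int *q, size_t n)
{
    int *tp;
    const int *tq;

    while (n-- > 0)
        *(tp = p, p += 1, tp) = *(tq = q, q += 1, tq);
}
```

Both functions should leave the destination with the same contents; the expanded form mainly shows how much the postfix operators save in writing.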
Ordering within Expressions (cont’d)
• Example:
    a * b + f(c)
• Want to call f first, avoid storing (using up a register) for a*b during call to f.

Ordering within Expressions (cont’d)
• Example:
    a := B[i];
    c := a * 2 + d * 3;
• Want to calculate d * 3 before a * 2: Getting a requires going to memory (slow); calculating d * 3 can proceed in parallel.

Ordering within Expressions (cont’d)
• Most languages leave subexpression order unspecified (Java is a notable exception, uses left-to-right)
• Some will actually rearrange subexpressions.

Example (Fortran)
    a = b + c
    c = c + e + b
rearranged as
    a = b + c
    c = b + c + e
and then as
    a = b + c
    c = a + e

Rearranging Can Be Dangerous
• If a, b, c are close to the precision limit (say, about ¾ of largest possible value), then a + b - c will overflow, whereas a - c + b will not.
• Safety net: most compilers guarantee to follow ordering imposed by parentheses.

Short Circuit Evaluation
• As soon as we can conclude outcome of the evaluation, skip the remainder of it.
• Example (in Java):
    if (list != null && list.size() != 0)
        System.out.println(list.size());
• Will never throw null pointer exception

Short Circuit Evaluation (cont’d)
• Can't do this in Pascal:
    if (list <> nil) and (list^.size <> 0)
• will evaluate list^.size even when list is nil.
• Cumbersome to do it in Pascal:
    if list <> nil then
      if list^.size <> 0 then
        writeln(list^.size);

Short Circuit Evaluation (cont’d)
• So, is short-circuit evaluation always good?
• Not necessarily.

Short Circuit Evaluation (cont’d)
[Figure: code example, not preserved in source]

Short Circuit Evaluation (cont’d)
• Here, the idea is to tally AND to spellcheck every word, and print the word if it's misspelled and has appeared for the 10th time.
• If the 'and' is short-circuit, the program breaks.
• Some languages (Clu, Ada, C) provide BOTH short-circuit and non-short-circuit Boolean operators.

Structured Programming
• Federal Law: Abandon Goto's !
• Originally, Fortran had goto's:
    if a .lt. b goto 10
    ...
  10

Structured Programming (cont’d)
• Controversy surrounding Goto's:
  • Paper (letter to the editor, Comm. ACM) in 1968 by E. W. Dijkstra:
    • "Go To Statement Considered Harmful"
    • argument: Goto's create "spaghetti code".

Structured Programming (cont’d)
• Legacy: structured programming: use of
  • sequencing (;)
  • alternation (if)
  • iteration (while)
• Sufficient to solve any problem.

Structured Programming (cont’d)
• Part of focus on *control* during first 40 years in programming.
• During 80's, 90's and beyond, focus shifted to *data* (OO-programming)

Structured Programming (cont’d)
• Common (former) use of goto: break out of loop(s), maybe deeply nested:
    while true do begin
      if (...) then goto 100;
    end;
    100: ...

Structured Programming (cont’d)
• In C, this can be accomplished using a 'break' statement, but consider this ...
    while (...) {
      switch (...) {
        ...
        goto loop_done;    /* break won't do */
      }
    }
    loop_done: ...

Structured Programming (cont’d)
• Today, we use *exceptions*.
• Exception: Upon a certain (error) condition, allows a program to back out of nested context to some point where it can recover and proceed.
• Requires unwinding of the stack.
• More later.

Structured Programming (cont’d)
• Semantically, goto's are *very* difficult to understand and implement correctly.
• Some (circumstantial) evidence:
  • RPAL, LPAL, JPAL
  • (JPAL: PAL with jumps)
  • JPAL by far the hardest of the three to describe.

Structured Programming (cont’d)
• When executing a jump, we might be:
  • exiting one or more procedure calls.
  • exiting many nested loops.
  • diving into the middle of a procedure.
  • diving into the middle of a loop.
• What happens to the stack ???

Structured Programming (cont’d)
• Goto's in general described using *continuations*:
  • A continuation captures the context (state) in which execution might continue.
  • Continuations essential to denotational semantics (more later).

Statement Sequencing
• Basic assumption:
  • A sequence of statements will have side effects.
  • Not always desirable; easier to reason about (prove correct) programs in which functions have no side effects.
  • Sometimes side-effects are *very* desirable. Example: rand() function.
    • want it to produce a different number each time it's called.

Statement Selection
• Most languages use a variant of the original if...then...else introduced in Algol 60:
    if condition then statement
    else if condition then statement
    else if condition then statement
    ...
    else statement

Statement Selection (cont’d)
    switch (condition) {
      case a: block_a;
      case b: block_b;
      ...
      default: block_c
    }
is often syntactic sugar for
    if (condition == a) block_a
    else if (condition == b) block_b
    ...
    else block_c

Statement Selection (cont’d)
• Some languages (e.g. C) require explicit break statements between cases; otherwise control falls through into the code of the following cases.

Short-Circuited Conditions
• Design goal: implement if’s efficiently.
• Jump code: efficient organization of code to take advantage of short-circuited boolean expressions.
• Value of expression never stored in a register.

Short-Circuited Conditions (cont’d)
• If the value of the entire expression is needed, we can still use jump code.
• Example (Ada):
    found := p /= null and then p.key = val;
  equivalent to
    if p /= null and then p.key = val then
      found := true;
    else
      found := false;
    end if;

Short-Circuited Conditions (cont’d)
• Jump code:
        r1 := p
        if r1 = 0 goto L1
        r2 := r1->key
        if r2 <> val goto L1
        r1 := 1
        goto L2
    L1: r1 := 0
    L2: found := r1
• The first 'goto L1' could be 'goto L2' (r1 is already 0 there)! Better to perform that improvement in a code optimizer.

Case/Switch Statements
• Alternative syntax for nested if...then...else statements. Example:
    i := (* potentially complicated expression *)
    if i=1 then clause_A
    else if i=2 or i=7 then clause_B
    else if i >= 3 and i <= 5 then clause_C
    else if i=10 then clause_D
    else clause_E

Corresponding CASE statement
[Figure: CASE statement, not preserved in source]

Case/Switch Statements (cont’d)
• Purpose of case statement is not only syntactic elegance, but efficiency. Wish to *compute* the address to which to branch.
• So, list ten cases (range of values tested):
  • Store addresses starting at location T (jump table).
  • Calculate r1 (the test expression value).
  • First test for r1 out of range 1..10.
  • Subtract 1 from r1, obtaining an offset (0..9).
  • Get address T[r1], store in r2.
  • Branch to (indirect) r2.

Case/Switch Statements (cont’d)
• Advantages: fast, occupies reasonable space if labels are dense.
• Disadvantage: can occupy enormous amounts of space if values are not dense (e.g. 1, 3..5, 50000..50003)

Case/Switch Statements (cont’d)
• Variations:
  • Use hash table for T.
    • Good idea if total range is large, many missing values, and no large value ranges.
    • Requires a separate entry for each possible value.
  • Use binary search for table T.
    • Good idea if value ranges are large, runs in O(log n) time.

Case/Switch Statements (cont’d)
• Combining techniques:
  • Compilers usually generate code for each arm, building up knowledge of the label set.
  • Then use knowledge to choose strategy (binary search or hash table).
  • Less sophisticated compilers often generate poor code, programmer must restructure case statement to prevent huge tables, or very inefficient code.

Case/Switch Statements (cont’d)
• Pascal, C don't allow ranges (avoid binary search).
• Standard Pascal doesn't allow a default clause:
  • Run-time semantic error if no case matches expression.
  • Many Pascal compilers *do* allow it as an extension.

Case/Switch Statements (cont’d)
• Modula provides an optional ELSE clause.
• Ada requires labels to cover *ALL* values in the domain of the type of the expression. Ranges and an *others* clause are allowed.
• C, Fortran 90: OK for expression to match no value: statement does nothing.

Case/Switch Statements (cont’d)
• C is different in other respects:
  • A label can have an empty arm,
  • Control "falls" through to next label.
  • Effectively allows lists of values.
• Example:
    switch (grade) {
      case 10: case 9: case 8: case 7:
        printf("Pass"); break;
      default:
        printf("Fail"); break;
    }

Case/Switch Statements (cont’d)
• 'break' needed to prevent fallthrough.
• If a value matches test expression, fall-through takes place i.e. no more comparisons.

Case/Switch Statements (cont’d)
• Example:
    switch (grade) {
      case 10: case 9: case 8: case 7:
        num_pass++;
      case 6:
        borderline++;
      case 0: case 1: case 2: case 3: case 4: case 5:
        fail++;
      default:
        total++; break;
    }
• In C, a forgotten break can be a difficult bug to find.
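The effect of the missing breaks in the grade-tallying switch can be made visible with a short C sketch. The struct tally wrapper and the two function names are hypothetical, and the "fixed" version reflects only an assumed intent (one category per grade, every grade counted in total), which the slides do not state.

```c
/* Hypothetical container for the four counters used on the slide. */
struct tally {
    int num_pass;
    int borderline;
    int fail;
    int total;
};

/* As on the slide: no break between arms, so control falls through. */
void tally_fallthrough(struct tally *t, int grade)
{
    switch (grade) {
    case 10: case 9: case 8: case 7:
        t->num_pass++;
        /* falls through */
    case 6:
        t->borderline++;
        /* falls through */
    case 0: case 1: case 2: case 3: case 4: case 5:
        t->fail++;
        /* falls through */
    default:
        t->total++;
        break;
    }
}

/* Assumed intent: exactly one category per grade, total always counted. */
void tally_with_breaks(struct tally *t, int grade)
{
    switch (grade) {
    case 10: case 9: case 8: case 7:
        t->num_pass++;
        break;
    case 6:
        t->borderline++;
        break;
    case 0: case 1: case 2: case 3: case 4: case 5:
        t->fail++;
        break;
    default:
        break;
    }
    t->total++;
}
```

For a grade of 9, the fall-through version increments all four counters, while the version with breaks increments only num_pass and total.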