CS461 Programming Languages
Lectures week 3

FORTRAN (FORmula TRANslating system)

Mid 50's: IBM, John Backus
- algebraic language translator
- efficiency the big issue -> impact on design
- 54: made up as they went along
- 57: released
- Fortran 0 through Fortran IV (66), then Fortran 77 (ANSI) -> many ideas for later languages

Characteristic features
- set of fixed fields
- typing implicit: names beginning with I-N are integer
- IF provides branches on -, 0, +
- DO statement (Fortran IV and before: repeat, increment only, limits could not be expressions)
- FORMAT statements gave control over I/O and introduced H (Hollerith) for character strings
- commenting

For historical reasons, look at early versions
- FORTRAN 0: lacked subprograms (procedures/functions/subroutines)
- FORTRAN I: similar to pseudocodes, had subprogram facilities (but communicated by using parameters or shared data areas called COMMON blocks, aka GLOBAL variables)

In general, 2 parts to a FORTRAN program:
- declarative (describes data areas, lengths, initial values): nonexecutable (compile time)
- imperative (commands): executable (run time)

Bindings
Declarations include bindings and initializations:
1. allocate an area of memory of a given size
2. bind (attach) a symbolic name to that area of memory
3. initialize the contents of memory

Example: DIMENSION DTA(900)
1. allocates 900 words
2. binds the name DTA to the location
3. DATA DTA/900*0.0/ initializes it (initialization is not required in Fortran)

Imperatives are computational (arithmetic, move), control-flow (IF, GOTO, DO-loop), or I/O (READ, PRINT).
Fortran's primary computational imperative: assignment
    AVG = SUM/FLOAT(N)

Stages to run a program, and bindings
1. compilation
2. linking
3. loading
4. execution

1. Compilation
- Fortran subprograms -> relocatable object code
- statements -> instructions of the computer
- subprograms reside in memory with other subprograms not yet compiled => impossible to determine at compile time the location in memory where a subprogram will go
- therefore addresses of variables and statements are not known
- so binding occurs later, during loading!
2. Linking
- incorporate libraries, subprograms already compiled
3. Loading
- program placed in computer memory
- go from relocatable code (.OBJ) to absolute format (.EXE)
- bind all code and data references to addresses of locations
4. Execution

Compilation: 3 phases (determines efficiency of final program)
1. syntactic analysis (lexical analysis and parser): classify statements, extract parts
2. optimization: produce code as good as an experienced programmer could produce
3. code synthesis (relocatable format)

Design: Data Structures
- suggested by math: scalars and vectors (arrays)

Scalars
Primary primitives: numeric scalars (distinct values, ordering)
Fortran II (60s)
- INTEGERs: indexing and counting
- floating point: evaluation of math and physical formulas
- double precision
- complex (scientific calculations)
- logicals

Integers (32-bit word)
    s b30 b29 … b2 b1 b0        (s is the sign bit)
Operations: +, -, *, /, tests for 0, tests for sign

Floating point
    -1.5 x 10^3: coefficient and power of 10
Operations: +, -, *, /, comparison, abs, exp (library)
    sm sc c7 … c0 m21 m20 … m1 m0
    value = m x 2^c, where m is the mantissa and c is the characteristic (sm and sc are their signs)

NOTE: arithmetic operators are overloaded (VIOLATES ORTHOGONALITY PRINCIPLE)
- can mix types in expressions, but computer numbers are not related the same way math numbers are
- compiler resolves by looking at context to determine which machine instructions to generate
- early Fortran did NOT allow implicit conversion in mixed expressions; conversion had to be written explicitly:
    X + FLOAT(I)   or   I = IFIX(X)
- later versions allowed implicit coercion

Characters
Integer type OVERWORKED: an integer could represent integers and character strings
Hollerith constant: type integer (early form of character string)
Example: 6HCARMEL -> "CARMEL"
Character strings are not first class in FORTRAN
- can't use them in all the ways we want -> VIOLATES REGULARITY PRINCIPLE
- no Hollerith variables
- no string comparisons
Weak typing creates a loophole (VIOLATES SECURITY PRINCIPLE)
- permits reading character data into integer/real variables
- permits Hollerith constants to be used as parameters where integers are expected
Fortran 77 HAS a CHARACTER data type.

ARRAY (data constructor)
Example: DIMENSION DTA(100), COORD(10,10)
- Fortran does not require initialization
- dimensions: integer, limited to 3 (7 in FORTRAN 77) (VIOLATES 0-1-INFINITY PRINCIPLE and REGULARITY PRINCIPLE)
Array implementation will be skipped; read MacLennan's book if you are interested.

Array Subscripts
Had to fit a fixed form for optimization purposes.
Examples: I+1 allowed, 1+I not allowed
Subscript forms (c a constant, v a variable): c, v, v+c or v-c, c*v, c*v+c or c*v-c
VIOLATES REGULARITY PRINCIPLE

Name Structures
- organize names in a program
- declarations are binding constructs
Example: INTEGER I, J, K
- 1 word allocated to each
- names bound to addresses
- initialization in DATA statement
- information put into a 'symbol table'
EXAMPLE symbol table entry
    name    type       location
    I       integer    0245

Declarations are non-executable: they provide information to the compiler, linker and loader.
Static allocation is done before execution and doesn't change during execution.
- in FORTRAN, all subprograms have their locations allocated before invocation
- Pascal and C++ allocate memory dynamically

The optional declarations in FORTRAN are dangerous!
- false economy, VIOLATES SECURITY PRINCIPLE!
- leads to obscure names being chosen, such as KOUNT, ISUM, XLENGTH
- and what about typos?
    COUNT = COUMT + 1
  COUMT was implicitly declared and its value is ?

Environments determine meanings (Concept of SCOPE important for midterm!)
The context of a statement is based on its environment: the set of definitions visible to a statement or construct.
The environment determines visibility of bindings.

In FORTRAN
- subprograms are separately compilable
- variable names are local in scope
- a subprogram sees its parameters
- a subprogram sees COMMON blocks (global), but each subprogram must include an identical declaration of the COMMON block. (What if all specs don't agree? No error! VIOLATES SECURITY PRINCIPLE)
- subprogram names are GLOBAL: no nested or hidden subprograms; all at the same level, all visible to all. VIOLATES INFORMATION HIDING PRINCIPLE

Example
    SUBROUTINE A
    COMMON/SYMTAB/NAMES(100), LOC(100), TYPE(100), DIMS(100)
    …
    END

    SUBROUTINE B
    COMMON/SYMTAB/NAMES(100), LOC(100), TYPE(100), DIMS(100)
    …
    END
What if LOC and TYPE were switched in one of them? LOC are integers and TYPE are reals. PROBLEM! But it won't be caught! (VIOLATES SECURITY PRINCIPLE)

FORTRAN VIOLATES SYNTACTIC CONSISTENCY PRINCIPLE
which states that things which look similar should be similar and things which look different should be different.
Example: '**' for exponentiation, but leave one * out and you have a legal multiplication.

FORTRAN has weak typing (so does C/C++):
- integer variables can contain addresses and chars
- should have a LABEL type to hold addresses
VIOLATES DEFENSE IN DEPTH PRINCIPLE
which states that if an error gets through one line of defense (such as syntactic checking by the compiler), then it should be caught by the next line of defense (type checking).

HARDEST PROBLEM IN LANGUAGE DESIGN: identifying the interaction of features
Example: how does the syntax of GOTOs work with the overloading of the integer type (where an integer can contain addresses)?

Structures: control, data, name, syntactic

Control structures govern flow of control

IF
Early Fortran example (arithmetic IF):
    IF (e) n1, n2, n3
Evaluates expression e and branches to n1, n2, or n3 depending on whether the result is negative, zero, or positive.
- 3-way branching is unusual, inspired by IBM 704 assembly language
- difficult to keep the meaning of the 3 labels straight
Later Fortran (logical IF):
    IF (X .EQ. A(I)) K = I - 1
NOTE: VIOLATES SYNTACTIC CONSISTENCY PRIN.

GOTO
Early Fortran: GOTO was the workhorse of control flow.

Example (this is just an If/Then/Else! Can you tell?!?):
        IF (e) GOTO 100
        …case for False
        GOTO 200
    100 …case for True
    200 rest of code

Example (this is a Repeat/Until loop):
    100 …code…
        IF (e) GOTO 100

Example (this is a while loop!):
    100 IF (e) GOTO 200
        …code…
        GOTO 100
    200 …rest of code…

Other examples: computed GOTOs and assigned GOTOs (are like switches in C++)

When we see an IF statement, it is hard to see whether it is an IF, an IF-ELSE, or a leading- or trailing-decision loop. Difficult to identify control structures.
With GOTO it is even possible to write mid-decision loops! VIOLATES STRUCTURE PRINCIPLE.
GOTO is a 2-edged sword: a primitive but powerful control structure. Understandability is sacrificed.
STRUCTURE PRINCIPLE states that the static structure of a program should correspond in a simple way with the dynamic structure of the corresponding computations. (It should be possible to visualize behavior by looking at the written form.)

DO-LOOP
- higher-level control structure
- definite loop
Example:
        DO 100 I = 1, N
            A(I) = A(I) * 2
    100 CONTINUE
- we state what we want, rather than how (init, incr, test, branch): SUPPORTS AUTOMATION PRINCIPLE and ABSTRACTION PRINCIPLE
- can be nested
- can be highly optimized (loop control variable, initial and final values are all stated explicitly, along with the extent of the loop)
PRESERVATION OF INFORMATION PRINCIPLE: the language should allow the representation of information that the user might know and that the compiler might need.
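The "what versus how" contrast in the DO-loop notes above can be made concrete with a small model. The following is a rough Python sketch, not Fortran: the function names are made up for illustration, and placing the test before the body is a choice of this sketch rather than a statement about how any particular Fortran compiler lowers a DO loop.

```python
def double_lowered(a, n):
    """Spell out the 'how': init, test, branch, body, incr, as GOTO-style code would."""
    a = list(a)                      # work on a copy
    i = 1                            # init
    while True:
        if i > n:                    # test, then branch out of the loop
            break
        a[i - 1] = a[i - 1] * 2      # body: A(I) = A(I) * 2 (Fortran arrays are 1-based)
        i = i + 1                    # incr
    return a


def double_do_style(a, n):
    """State the 'what': loop variable and bounds appear once, as in DO 100 I = 1, N."""
    a = list(a)
    for i in range(1, n + 1):        # I = 1, N
        a[i - 1] = a[i - 1] * 2
    return a


if __name__ == "__main__":
    print(double_lowered([1, 2, 3], 3))
    print(double_do_style([1, 2, 3], 3))
```

In the second form the loop variable, its bounds, and the extent of the loop are all visible in one place, which is the information a compiler needs in order to optimize it.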
DO-WHILE (VAX FORTRAN)
    DO WHILE (condition)
        …body…
    END DO

SUBPROGRAMS
- late addition
- libraries: they had
- user-defined subprograms: they didn't have
- remedied in FORTRAN II (with subroutines and functions)
Your FORTRAN assignment uses 2 subroutines (BISECT & PRINT) and 2 functions (F & G).

Subprograms define Procedural Abstraction (SUPPORT ABSTRACTION PRINCIPLE)
- fragments of code that occur more than once with different variables
Example:
    SUBROUTINE name (formal parameters)
    …body…
    RETURN
    END
- RETURNs are allowed anywhere in the subprogram
- invoke by CALL statement:
    CALL name (actual parameters)
- when executed, actuals are bound to formals; this binding occurs at run time

Parameters
- usually passed by reference (like Pascal VAR parameters, C++ pass by reference with &)
- FORTRAN allows parameters to be used for input, output, or both

Pass by reference
- an output parameter needs the address of the variable: the formal parameter is bound to the address of the actual
- efficient
- actually it is an input-output parameter
- can be dangerous (side effects) because an output parameter is actually input-output
- an input variable can be changed this way

Example:
    SUBROUTINE SWITCH (N)
    N = 3
    RETURN
    END

    CALL SWITCH(I)    puts 3 in I
    CALL SWITCH(2)    puts 3 in the 'literal table' (constants portion of memory) where 2 is stored
Effect: afterwards I = 2+2 will give you I = 6. VIOLATES SECURITY PRINCIPLE!

Pass by value-result
The value of the actual is copied to the formal at invocation, and the result is copied back to the actual at exit.
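The SWITCH hazard and the value-result alternative can be modeled in a few lines. This is a hedged Python sketch, not Fortran: `Cell`, `literal_table`, `literal`, `call_by_reference`, `call_by_value_result`, and the `copy_back` flag are all hypothetical names invented here to stand in for memory words and calling conventions.

```python
class Cell:
    """One word of memory holding a value (stand-in for a Fortran storage location)."""
    def __init__(self, value):
        self.value = value


literal_table = {}   # constants portion of memory: one shared cell per literal


def literal(v):
    """Return the shared cell for constant v, creating it on first use."""
    if v not in literal_table:
        literal_table[v] = Cell(v)
    return literal_table[v]


def switch(n):
    """Model of SUBROUTINE SWITCH(N): N = 3."""
    n.value = 3


def call_by_reference(sub, actual):
    """The formal is bound directly to the actual's cell."""
    sub(actual)


def call_by_value_result(sub, actual, copy_back=True):
    """The caller copies in, calls, then optionally copies the result back out."""
    temp = Cell(actual.value)        # copy actual to formal at invocation
    sub(temp)
    if copy_back:                    # skipped when the actual is a constant or expression
        actual.value = temp.value


if __name__ == "__main__":
    i = Cell(0)
    call_by_reference(switch, i)                 # CALL SWITCH(I)
    print(i.value)                               # 3
    call_by_reference(switch, literal(2))        # CALL SWITCH(2) overwrites the constant
    print(literal(2).value + literal(2).value)   # "2+2" now yields 6
```

With `call_by_value_result(switch, literal(2), copy_back=False)` on a fresh literal table, the cell for 2 is left untouched, which is why value-result closes this particular loophole.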
In pass by value-result, both copy operations are done by the caller; the compiler can omit the second operation if the actual parameter is a constant or an expression.

Activation Records
- investigate the way subprograms are implemented
- save state of caller (contents of variables, registers, IP)
- a way of knowing where the subprogram returns to
In nonrecursive FORTRAN: 1 activation record per subprogram
- when a subprogram is invoked: actual parameters -> location where the callee knows to find them -> callee's activation record
- to support the return: transmit to the callee a pointer to the caller's activation record
- store that pointer in the callee's activation record
- that pointer is a dynamic link

Formatting
As in the pseudocode: fixed-format lexical convention, with dedicated columns.
FORTRAN ignored blanks: all of them!
Example: these are all the same statement
    DIMENSION IN DATA (10000), RESULT(8000)
    DIMENSIONINDATA(10000),RESULT(8000)
    DIMENSI…
Causes problems for compilers and humans.
Example:
    DO 20 I = 1. 100     is the same as     DO20I=1.100
It looks like DO 20 I = 1, 100 but is an assignment, with autodeclaration of DO20I as a float.
Folklore attributes the loss of an American Venus probe to this error (the probe would be Mariner 1; Viking went to Mars), though the attribution is disputed.
VIOLATES PRINCIPLE OF DEFENSE IN DEPTH (implicit declaration missed it too)

Lack of reserved words = Mistake!
Example: DIMENSION IF(100)
Now the arithmetic IF
    IF (I-1) 1, 2, 3
is confused with the assignment
    IF (I-1) = 1 2 3
Compiler was a nightmare to write!

Syntax of algebraic notation
    (-B + SQRT(B**2 - 4*A*C))/(2*A)
Arithmetic operators have precedence:
1. exponentiation (**)
2. *, /
3. +, -
Languages differ on unary operators: -B or +B could be at the highest level or at level 3.

No nesting except in the DO loop; Fortran 77 allows more.
Linear syntactic organization.
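The blank-insensitive lexing problem from the formatting notes above (DO 20 I = 1. 100) can be reproduced with a toy classifier. This is a minimal Python sketch under stated assumptions: `classify`, `implicit_type`, and the two regular expressions are invented for illustration and handle only these two statement shapes, nothing like a real FORTRAN front end.

```python
import re

# Hypothetical patterns for just two statement shapes, applied after blanks are removed.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGN = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def implicit_type(name):
    """Implicit typing rule: names starting with I through N are INTEGER, others REAL."""
    return "INTEGER" if "I" <= name[0] <= "N" else "REAL"


def classify(statement):
    """Classify a statement the way a blank-ignoring compiler would see it."""
    text = statement.replace(" ", "").upper()    # FORTRAN ignored blanks: all of them
    m = DO_LOOP.match(text)
    if m:
        label, var, first, last = m.groups()
        return ("DO", label, var, first, last)
    m = ASSIGN.match(text)
    if m:
        name, expr = m.groups()
        return ("ASSIGN", name, implicit_type(name), expr)
    return ("UNKNOWN", text)


if __name__ == "__main__":
    print(classify("DO 20 I = 1, 100"))   # a DO loop with label 20 and variable I
    print(classify("DO 20 I = 1. 100"))   # an assignment to the implicitly REAL name DO20I
```

One mistyped character changes the statement class, and because DO20I is silently declared by the implicit typing rule, neither the syntax check nor the declaration check reports anything: both lines of defense fail together.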