Introduction to Semantics
The meaning of a language

Overview
This section will cover:
- What is the semantics of a computer language?
- Three forms of formal semantics and their purpose.
- Classification of errors.

What is semantics?
Semantics describes the meaning of the elements that make up a programming language. Syntax describes the form of program elements.
Syntax:
    ident ::= [A-Za-z_]\w*
    declaration ::= datatype ident ['=' value]
Semantics:
- You can only declare an identifier once per scope!
- You must declare every identifier before you use it in a statement.
- An identifier cannot be a reserved word.

What is semantics? (cont'd)
A semantic definition of syntax raises more semantic questions:
- What is a scope?
- What is a data type?
- What operations are permitted on data types?
Syntax:
    term ::= factor { (*|/) factor }
Semantics (for integer arithmetic):
- Division by zero raises an exception.
- If the result exceeds the range of 32-bit 2's complement integers, the higher bits are ignored!

The parts of semantics
Semantics defines many characteristics of a language; semantics is intertwined with:
- names and scope of names
- binding time
- the type system
- protocol for subprograms
These issues will be covered later.

Semantics: Type System
Semantics defines the type system:
- data types and allowed operations
- whether the programmer can define new data types or aliases for existing data types
- range of data types and result of operations
Example:
Syntax:
    intvalue ::= 0 | [-](1|...|9){ digit }
Semantics: int values are stored as 32-bit binary values in 2's complement form, with a range of -2^31 to 2^31 - 1.
Syntax:
    expr ::= term { (+|-) term }
Semantics: integer + and - are performed modulo the range of int values. If the result of a calculation exceeds the largest value, the higher-order bits of the result are discarded.

Semantics: Identifiers
Identifiers are names.
Names can identify:
- variables and constants
- points in the program (labels)
- functions and procedures (subprograms)
- programmer-defined data types, including classes
Syntax:
    identifier ::= [A-Za-z_]\w*
Semantics:
- all variable and constant names must be unique (within a scope)!
- cannot use reserved words as identifiers
- names are case sensitive
- can a function (method) have the same name as a variable?

Semantics: Scope
A scope is a range of statements. The scope of an identifier is the range where the identifier is visible or known. Semantics defines the scope of identifiers.

Scope Examples
In Pascal, Fortran, and C, variables are defined at the top of a subprogram. Their scope is the entire subprogram.
      REAL FUNCTION ADD(A, B)
      REAL X
      X = A + B
      ADD = X
      RETURN
      END

      PROGRAM MAIN
      REAL X, Y, A
      X = 20.0
      Y = 5.0
      A = ADD(X, Y)
      PRINT *, "Sum is ", A
      END

Scope Examples
In C++, C#, and Java, variables can be defined anywhere. Their scope runs from the point of declaration to the end of the smallest enclosing block, { ... }. The idea is based on Algol.
    /* C: all locals declared at the top of the function */
    #include <stdio.h>
    float sum(int max) {
        float x, sum = 0;
        int k;
        for (k = 1; k <= max; k++) {
            scanf("%f", &x);
            sum = sum + x;
        }
        return sum;
    }

    // C++: k and x are visible only inside the loop
    #include <cstdio>
    float sum(int max) {
        float sum = 0;
        for (int k = 1; k <= max; k++) {
            float x;
            scanf("%f", &x);
            sum = sum + x;
        }
        return sum;
    }

Semantics: Binding Time
Semantics also defines when names are "bound" to properties.
C example:
    int n = 1000;
- "int" is bound by the C language definition
- "int is 32 bits" is bound by the compiler implementation
- "n is an int" is bound when the program is compiled
- the address of n is bound when the program is loaded (static variable) or when the function is executed (local variable on the stack)
- "n = 1000" is bound when the program is loaded (static) or each time the function is executed

Semantics: Subprograms
Semantics defines the meaning of subprograms, in particular how parameters are passed and values returned.
    /* C and C++ default is to pass parameters by value */
    void swap(int a, int b) {
        int tmp;
        tmp = a; a = b; b = tmp;   /* swaps only the local copies; the caller's variables are unchanged */
    }

    /* C++ and C# let you pass parameters by reference (C++ syntax shown) */
    void swap(int& a, int& b) {
        int tmp;
        tmp = a; a = b; b = tmp;   /* swaps the caller's variables */
    }

Formal Semantics
Three mathematical notations for semantics exist.
- Operational semantics describes the effect of each semantic element on the state of a hypothetical computer.
- Axiomatic semantics describes assertions (or axioms) of what must be true before and after an expression is executed.
- Denotational semantics describes semantic elements as state-changing functions, again using some hypothetical computer. It may use recursive functions.

Formal Semantics Examples
Consider assignment: target = source
Operational semantics: $\sigma$ is the state of the computer, $v$ is any value of the source, and $\overline{\cup}$ is overriding union:
$$\frac{\sigma(\mathit{source}) \Rightarrow v}{\sigma(\mathit{target} = \mathit{source}) \Rightarrow \sigma \;\overline{\cup}\; \{\langle \mathit{target}, v \rangle\}}$$
Axiomatic semantics: if $s$ is the statement target = source, then for a postcondition $Q$:
$$\mathit{true} \vdash \{Q[s.\mathit{target} \backslash s.\mathit{source}]\}\; s \;\{Q\}$$
Denotational semantics: $M$ maps a statement $s$ and a program state $\sigma$ to a new program state:
$$M : \mathit{Statement} \times \mathit{State} \to \mathit{State}$$
$$M(s, \sigma) = \sigma \;\overline{\cup}\; \{\langle s.\mathit{target}, s.\mathit{source} \rangle\}$$

Why Formal Semantics?
1. Avoid ambiguities in the implementation. Ambiguities can lead to different compilers producing different executable programs from the same source. Ada had an ambiguity in the implementation of "in out" parameters: in some programs, different compilers produced different results!
2. Enable formal proof of program correctness, at least in some situations.
3. Enable verification that a compiler adheres to the language specification.

Static and Dynamic Characteristics
Aspects of a computer language can be defined as static or dynamic. You often hear "dynamic memory allocation" or "static binding".
- Static: something that is done or known before the program executes, including things done while the program is being loaded for execution.
- Dynamic: something that is done or known while the program executes.
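As a rough illustration of the static/dynamic distinction, the following C sketch contrasts a table whose size is known before the program runs with one whose size is chosen during execution. The names TABLE_SIZE, table, static_count, and make_dynamic_table are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names, used only for illustration. */
#define TABLE_SIZE 8

/* Static: the size and storage of this array are fixed before execution. */
static int table[TABLE_SIZE];

/* Static: sizeof is evaluated by the compiler, not at run time. */
size_t static_count(void) {
    return sizeof table / sizeof table[0];
}

/* Dynamic: the size n is known only while the program executes.
   Returns zero-initialized storage, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
int *make_dynamic_table(size_t n) {
    return calloc(n, sizeof(int));
}
```

A caller might request `make_dynamic_table(k)` with `k` read from input, which no compiler could know in advance, whereas `static_count()` always yields the same value for a given build.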
Static/Dynamic Examples
- Syntax checking for compiled languages is static.
- A division by zero error is dynamic (unless you insult the compiler by writing "x/0").
- The definition of data types like "int" and "float" is static.
- Allocating memory for function calls is dynamic.
- The scope of a variable can be static or dynamic, depending on the language, but it is usually static.

Classifying Errors
It is helpful to classify errors by type and by when they are detected.
- Lexical errors are detected by the compiler: static.
- Syntax errors are detected by the compiler: static.
- Semantic errors may be:
    - detected by the compiler:   int n = 2.5;
    - detected by the linker:     r = SQRT(x*x+y*y);
    - detected at run time:       /* java */ for(k=-1; ;k++) sum += a[k];
- Logic errors may be:
    - detected at run time
    - not detected at all

Find 9 errors in this program
Classify each as: lexical, syntax, static semantic, dynamic semantic, or logical. Indicate when the error is detected.
    include <stdio.h>
    /* return maximum of x and y */
    int max( integer x, integer y ) {
        if (x > y) return y;
        else return x;
    }
    int main( ) {
        int a, b;
        printf("Input two integers: ");
        scanf("%f %f", a, b);
        printf("The max of %d and %d is %d\n", a, b, MAX(x,y);
        return;
    }

Find 9 errors in this program: solution
    include <stdio.h>
1. Syntax error: missing "#". Detected by the compiler at the "<" symbol.
    int max( integer x, integer y ) {
2. Static semantic error: "integer" isn't a data type. Detected by the compiler.
    if (x > y) return y; else return x;
3. Logic error, not detected: this returns the minimum of x and y.
    scanf("%f %f", a, b);
4. Dynamic semantic error: "%f" should be "%d". This may cause a run-time error or not be detected at all.
5. Semantic error: scanf must be given the addresses of a and b (&a, &b). The compiler should detect this, but it may not (gcc did not), since an int can be an address! It may show up as a run-time error.

Find 9 errors in this program: solution (cont'd)
    printf("The max of %d and %d is %d\n", a, b, MAX(x,y);
6. Static semantic error: "MAX" should be "max".
The linker will report an "unresolved external symbol" error because it couldn't find a function named "MAX".
7. Static semantic error: (x,y) should be (a,b). The compiler will report use of undefined variables x and y.
8. Syntax error: missing ")" to close printf( ... ). The compiler reports this as a syntax error.
    return;
9. Static semantic error: the function is declared "int main" but here there is no return value. Semantics says that the function's actual return type has to be the same as in the header. Detected by the compiler.

Find 7 errors in this program
Classify each as: lexical, syntax, static semantic, dynamic semantic, or logical. Indicate when the error is detected.
    #include <stdlib.h>
    /* return x modulo y, return 0 if y is 0. */
    int mod( int x, int y ) {
        if ( y = 0 ) return 0;
        else return x # y;
    }
    void main( ) {
        int a, b;
        printf("Input two integers: ");
        scanf("%d %d", a, b);
        printf("%d mod %d is %d\n", a, b, mod(b,a);
        return;
    }

Find 7 errors: partial solution
    #include <stdlib.h>
Static semantic error: we didn't #include <stdio.h>, so the compiler should give an error when scanf and printf are used. However, gcc ignores this.
    if ( y = 0 ) return 0;   // should be ( y == 0 )
Logic error: the C language allows any expression to be used as a test condition in "if". This sets y to 0, and the assignment expression itself has the value 0, so the "if" test is always false. The next statement will then produce a division by zero error. Java doesn't allow conversion of other data types to boolean, so this would be a compile-time type error in Java.
    void main( ) {
Static semantic error: the C language says that main should return an int. The compiler reports this error.

Attributes
Attributes are properties of language entities, especially identifiers. Examples:
- Value of an expression
- Data type of an identifier
- Number of digits in a numeric data type
- Memory location of a variable
- Code body of a function or method
Declarations ("definitions") bind attributes to identifiers. Different declarations may bind the same identifier to different sets of attributes.
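To make the last point concrete, here is a small C sketch in which two declarations bind the same identifier to different attributes. The identifier value and the functions read_outer and read_inner are hypothetical names chosen for this illustration.

```c
/* Hypothetical names, used only for illustration. */

/* File-scope declaration: binds "value" to type int, static storage,
   and an initial value of 42. */
int value = 42;

/* Here "value" refers to the file-scope declaration. */
int read_outer(void) {
    return value;
}

/* A second declaration binds the same identifier to a different set of
   attributes: type double, storage allocated on each call, value 2.5.
   Inside this function it hides the file-scope declaration. */
double read_inner(void) {
    double value = 2.5;
    return value;
}
```

Which set of attributes the name refers to depends on where it is used, which is why attributes and scope have to be defined together.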
Binding
Binding means "an association":
- associate names with values
- associate symbols with operations
Binding time describes when this occurs.
Example:
    int count;
- the name "int" was bound by the C language definition (along with the meanings of operators +, -, ... for int)
- the size (and set of possible values) of "int" was bound at compiler design time
- the identifier "count" is bound to "int" at compile time
- the location is bound at load or execution time

Binding Times
Louden gives 6 possible binding times:
- language definition time: Java defines the precision of int; C leaves it to the implementation. In C, an "int" can be 16 bits or 32 bits. The stdint.h header on UNIX provides typedefs, such as:
      typedef short int int16_t;
      typedef int       int32_t;
- language implementation time: when the compiler or interpreter is written
- translation time (compile time)
- link time, for compiled programs
- load time
- execution time

Load Time versus Execution Time
How are count and sum different? C example:
    int count;        /* an external variable is static */
    int sub( ) {
        int sum;      /* a local variable, dynamically allocated */
        /* do something */
    }
- count is allocated storage at load time (and exists for the life of the program)
- sum is allocated storage at execution time, i.e. each time sub is executed
The scopes of count and sum are also different.

Static and Dynamic Binding
- Static binding occurs before the program is run.
- Dynamic binding occurs while the program is running.
A symbol can have both static and dynamic attributes.
    /* Binding time example */
    int count;              /* external var */
    int sub( ) {
        int sum = 0;
        static int last = 0;
        int *x;
        void *p;
        p = (double *)malloc(...);
    }
Type Binding:     static   static    static    static   static    dynamic
Storage Binding:  static   dynamic   dynamic   static   dynamic   dynamic

Exercise
For each of these attributes, indicate the binding time in C and Java as precisely as possible.
1. the number of significant digits in a "float"
2. the meaning of "char"
3. the size of an array variable
4.
the memory location of a local variable
5. the value of a constant (C "const int", Java "final")
6. the memory location of a function or method
Hint: C and Java differ at least in items 1 and 5.

So now you know...
What to answer when someone asks, "Are method names statically or dynamically bound to actual code?"
    /* Java */
    class Pet {
        public void talk( ) { System.out.println("hello"); }
    }
    class Dog extends Pet {
        public void talk() { System.out.println("woof"); }
    }
    ...
    Pet p = new Dog( );
    p.talk( );      /* prints "woof": talk is bound dynamically */

    /* C++ */
    class Pet {
    public:
        void talk( ) { cout << "hello" << endl; }
    };
    class Dog: public Pet {
    public:
        void talk() { cout << "woof" << endl; }
    };
    ...
    Pet *p;
    Dog dog;
    p = &dog;
    p->talk( );     /* prints "hello": talk is not virtual, so it is bound statically */

So now you know...
In C++, method names are statically bound to code, unless "virtual" is specified.
In Java, all methods are dynamically bound to actual code, except in these cases:
- "private" methods are statically bound
- "static" methods are statically bound
- "final" methods are statically bound

Variables and Constants
A variable is a name for a memory location; its value can change during execution.
A constant is an object whose value does not change throughout its lifetime.
Literals are data values (no names) used in a program. In
    int buffer[80];
80 is a numeric literal.
Constants may be:
- substituted for values by the compiler (never allocated)
- compile-time static (the compiler can set the value)
- load-time static (value determined at load time)
- dynamic (value determined at run time)

Binding of Constants
C "const" values can be compile-time, load-time, or run-time constants:
    const int MaxSize = 80;             /* compile time */
    void mysub( const int n ) {
        const time_t now = time(0);     /* load time */
        const int LastN = n;            /* dynamic */
In Java, "final" means a variable cannot be changed after the first assignment. Otherwise it behaves like any other variable.
    static final int MAX = 1000;    /* class load time */
    void mysub ( int n ) {
        final int LastN = n;        /* run time */
    }

Constants (2)
Compile-time constant in Java:
    static final int zero = 0;
Load-time constant in Java:
    static final Date now = new Date();
Dynamic constant in Java: any non-static final variable.
Java "final" identifiers are variables with a restriction (no reassignment).
C "const" is stricter: the compiler has the option to eliminate such constants during compilation.
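The difference between a constant whose value the compiler can see and one whose value is fixed only at run time can be sketched in C as follows. This is a minimal illustration, not taken from the slides; max_size, get_max_size, and clamp_to_limit are hypothetical names.

```c
/* Hypothetical names, used only for illustration. */

/* The initializer is a literal, so the compiler knows the value and is
   free to substitute 80 wherever max_size is read. */
const int max_size = 80;

int get_max_size(void) {
    return max_size;
}

/* A dynamic constant: limit cannot be reassigned once initialized, but
   its value is known only at run time and can differ on every call. */
int clamp_to_limit(int n, int request) {
    const int limit = n;
    /* limit = 0;  would be rejected by the compiler */
    return request > limit ? limit : request;
}
```

For example, `clamp_to_limit(10, 25)` yields 10 and `clamp_to_limit(10, 3)` yields 3, while `get_max_size()` yields 80 on every call.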