Chap 6: Type Checking/Semantic Analysis
CSE 4100
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
steve@engr.uconn.edu
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Material for course thanks to: Laurent Michel, Aggelos Kiayias, Robert LeBarre

Overview
  Type Checking and Semantic Analysis are Critical Features of Compilers and Compilation
    Passing a Syntax Check (Parsing) is not Sufficient
    Type Checking Provides Vital Input
    Software Engineers are Assisted in the Debugging Process
  We’ll Focus on Classical Type Checking Issues
    Background and Motivation
    Type Analysis
    The Notion of a Type System
    Examining a Simple Type Checker
    Other Key Typing Concepts
    Concluding Remarks/Looking Ahead

Background and Motivation
  Recall....

Background and Motivation
  What we have achieved
    All the “words” (Tokens) are known
    The tree is syntactically correct
  What we still do not know...
    Does the program make sense ?
  What we will not try to find out
    Is the program correct ?
    This is Impossible!
  Our Concern:
    Does it Compile?
    Are all Semantic Errors Removed?
    Do all Types and their Usage Make Sense?

Background and Motivation
  The program makes “sense”
    Programmer’s intent is clear
    Program is semantically unambiguous
      Data-wise
        We know what each name denotes
        We know how to represent everything
      Flow-wise
        We know how to execute all the statements
      Structure-wise
        Nothing is missing
        Nothing is multiply defined
  The program is correct
    It will produce the expected output

Tasks To Perform
  Scope Analysis
    Figure out what each name refers to
    Understand where Scope Exists (See Chapter 7)
  Type Analysis
    Figure out the type of each name
    Names are functions, variables, types, etc.
  Completeness Analysis
    Check that everything is defined
    Check that nothing is multiply defined

Output ?
  What the analysis produces
    Data structures “on the side”
      To describe the types (resolve the types)
      To describe what each name denotes (resolve the scopes)
    A decorated tree
      Add annotations in the tree
    Possibly.... Semantic Errors!

Pictorially
  [figure]

Type Analysis
  Purpose
    Find the type of every construction
      Local variables
      Actuals for calls
      Formals of method calls
      Objects
      Methods
      Expressions
  Rationale
    Types are useful to catch bugs!

Type Analysis
  Why Bother ?
    A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kind of values they compute.

Uses
  Many!
    Error detection
      Detect early
      Detect automatically
      Detect obvious and subtle flaws
    Abstraction
      The skeleton to provide modularity
      Signature/structure/interface/class/ADT/....
    Documentation
      Programs are easier to read
    Language Safety guarantee
      Memory layout
      Bound checking
    Efficiency

How It works ?
  Classify programs according to the kind of values computed
    Set of All Programs
    Set of All Reasonable Programs
    Set of All Type-Safe Programs
    Set of All Correct Programs

How do we do this ?
  Compute the type of every sentence
    On the tree
    With a tree traversal
      Some information will flow up (synthesized)
      Some information will flow down (inherited)
  Questions to answer
    What is a type ?
    How do I create types ?
    How do I compute types ?

Types...
  Types form a language!
    With Terminals...
    Non-terminals....
    And a grammar!
  Alternatively
    Types can be defined inductively
      Base types (a.k.a. the terminals)
      Inductive types (a.k.a. grammatical productions)

Base Types
  What are the base types ?
    int
    float
    double
    char
    void
    bool
    error

Inductive Type Definition
  Purpose
    Define a type in terms of other simple/smaller types
  Example
    array
    pointer
    reference
    pair (products in the book)
    structure
    function
    methods
    classes
    ...
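
The inductive definition above can be sketched as a small set of Python classes: one class for base types and one per type constructor, each built from smaller types. This is only an illustration under stated assumptions; the class names (Base, Pointer, Pair, Fun), the printed notation, and the helper size are hypothetical and are not taken from the course material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    """A base type such as int or char (a 'terminal' of the type language)."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Pointer:
    """pointer(T): built from one smaller type."""
    target: object

    def __str__(self):
        return f"pointer({self.target})"


@dataclass(frozen=True)
class Pair:
    """pair(T1, T2): built from two smaller types."""
    first: object
    second: object

    def __str__(self):
        return f"pair({self.first},{self.second})"


@dataclass(frozen=True)
class Fun:
    """fun(D) : R, a function type with domain D and result R."""
    domain: object
    result: object

    def __str__(self):
        return f"fun({self.domain}) : {self.result}"


def size(t):
    """Count the nodes of a type by recursing over its inductive structure."""
    if isinstance(t, Base):
        return 1
    if isinstance(t, Pointer):
        return 1 + size(t.target)
    if isinstance(t, Pair):
        return 1 + size(t.first) + size(t.second)
    if isinstance(t, Fun):
        return 1 + size(t.domain) + size(t.result)
    raise TypeError(f"not a type: {t!r}")


INT = Base("int")
CHAR = Base("char")
f = Fun(Pair(INT, CHAR), Pointer(INT))
print(f)        # fun(pair(int,char)) : pointer(int)
print(size(f))  # 6
```

Because every constructor only stores smaller types, any function over types (printing, counting, comparing) can be written as one case per constructor, which is the same shape as a recursive-descent walk over a grammar.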

Relation to Grammar ?
  Type → array ( Type , Type )
       → pair ( Type , Type )
       → tuple ( Type+ )
       → struct ( FieldType+ )
       → fun ( Type ) : Type
       → method ( ClassType , Type ) : Type
       → pointer ( Type )
       → reference ( Type )
       → ClassType
       → BasicType
  ClassType → class ( name [ , Type ] )
  FieldType → name : Type
  BasicType → int | bool | char | float | double | void | error

Type Terms
  What is that ?
    It is a sentence in the type language
  Example
    int
    pair(int,int)
    tuple(int,bool,float)
    array(int,int)
    fun(int) : int
    fun(tuple(int,char)) : int
    class(“Foo”)
    method(class(“Foo”), tuple(int,char)) : int

So...
  If we have Type Terms, we have a Type Language
  We can parse it and obtain.... Type Trees!
    fun(tuple(int,char)) : int

The Notion of a Type System
  Logical Placement of Type Checker:
    Token Stream → Parser → Syntax Tree → Type Checker → Syntax Tree → Int. Code Generator → Interm. Repres.
  Role of Type Checker is to Verify Semantic Contexts
    Incompatible Operator/Operands
    Existence of Flow-of-Control Info (Goto/Labels)
    Uniqueness w.r.t. Variable Definition, Case Statement Labels/Ranges, etc.
    Naming Checks Across Blocks (Begin/End)
    Function Definition vs. Function Call
  Type Checking can Occur as “Side-Effect” of Parsing via a Judicious Use of Attribute Grammars!
    Employ Type Synthesis

Example of type synthesis
  Assume the program
    if (1+1 == 2) then 1 + 3 else 2 * 3

Yes... But....
  What about identifiers ?
  Key Idea
    Types for identifiers are inherited attributes!
    Inherits
      From the definition
      To the use site.
    int n;
    ....
    if (n == 0) then 1 else n

Example
    int n;
    ....
    if (n == 0) then 1 else n

The Notion of a Type System
  Type System/Checker Based on:
    Syntactic Language Constructs
    The Notion of Types
    Rules for Assigning Types to Language Constructs
    Strength of Type Checking (Strong vs. Weak)
      Strong vs. Weak
      Dynamic vs. Static
      OODBS/OOPLS Offer Many Variants
  All Expressions in the Language MUST have an Associated Type
    Basic (int, real, char, etc.)
    Constructed (from basic and constructed types)
  How are Type Expressions Defined and Constructed?

Type Expressions
  A Basic Type is a Type Expression
    Examples: Boolean, Integer, Char, Real
    Note: TypeError is a Basic Type to Represent Errors
  A Type Expression may have a Type Name which is also a Type Expression
  A Type Constructor Applied to a Type Expression Results in a Type Expression
    Array(I,T): I is Integer Range, T is Type Expr.
    Product: T1 × T2 is a Type Expr. if T1, T2 are Type Exprs.
    Record: Tuple of Field Names & Respective Types
    Pointer(T): T is a Type Expr., Pointer(T) also

Type Expressions
  A Type Constructor Applied to a Type Expression Results in a Type Expression (Continued)
    Functions: May be Mathematically Characterized with Domain Type D and Range Type R
      F: D → R
      int × int → int
      char × char → pointer(int)
  A Type Expression May Contain Variables whose Values are Type Expressions
    Called Type Variables
    We’ll Omit from our Discussion …

Key Issues for Type System
  Classical Type System Approaches
    Static Type Checking (Compile Time)
    Dynamic Type Checking (Run Time)
    How is each Handled in C? C++? Java?
  Language Level Issues:
    Sound Type System (ML)
      No Dynamic Type Checking is Required
      All Type Errors are Determined Statically
    Strongly Typed Language (Java, Ada)
      Compiler Guarantees no Type Errors During Execution
    Weakly Typed Language (C, LISP)
      Allows you to Break Rules at Runtime
  What about Today’s Web-based Languages?

The Notion of a Type System
  Type System: Rules Used by the Type Checker to Assign Types to Expressions and Verify Consistency
  Type Systems are Language/Compiler Dependent
    Different Versions of Pascal have Different Type Systems
    Same Language Can have Multiple Levels of Type Systems (C Compiler vs. Lint in Unix)
    Different Compilers for Same Language May Implement Type Checking Differently
      GNU C++ vs. MS Studio C++
      Sun Java vs. MS Java (until Sun forced off market)
  What are the Key Issues?

First Example: Simple Type Checker
  Consider Simplistic Language:
    P → D ; E
    D → D ; D | id : T
    T → char | int | array [ num ] of T | ↑T
    E → literal | number | id | E mod E | E [ E ] | E↑
  What does this Represent?
    Key: integer;
    Key MOD 999;
    X: character;
    B: integer;
    B MOD X;
  Are all of these Legal?
    A: array [100] of char;
    A[20]
    A[200]

First: Add Typing into Symbol Table
    P → D ; E
    D → D ; D
    D → id : T                 {addtype(id.entry, T.type)}
    T → char                   {T.type := char}
    T → int                    {T.type := integer}
    T → array [ num ] of T1    {T.type := array(1..num.val, T1.type)}
    T → ↑T1                    {T.type := pointer(T1.type)}
    E → literal                {E.type := char}
    E → number                 {E.type := integer}
    E → id                     {E.type := lookup(id.entry)}
  Notes:
    Assume Lexical Recognition of id (in Lexical Analyzer) Adds id to Symbol Table
    Thus, we Augment this with T.type

Remaining Typing More Complex
    E → E1 mod E2   {E.type := if E1.type = integer and E2.type = integer
                               then integer else type_error}
    E → E1 [ E2 ]   {E.type := if E2.type = integer and E1.type = array(s, t)
                               then t else type_error}
    E → E1↑         {E.type := if E1.type = pointer(t)
                               then t else type_error}
  E1 and E2 May be Mod, Array, or Ptr Expressions
  Useful Extensions would Include Boolean Type and Extending Expressions with Rel Ops, AND, OR, etc.

Extending Example to Statements
  These Extensions are More Complex from a Type Checking Perspective
  Right Now, only Individual Statements are Checked
    P → D ; S
    S → id := E          {S.type := if id.type = E.type then void else type_error}
    S → if E then S1     {S.type := if E.type = boolean then S1.type else type_error}
    S → while E do S1    {S.type := if E.type = boolean then S1.type else type_error}
    S → S1 ; S2          {S.type := if S1.type = void and S2.type = void
                                    then void else type_error}

What are Main Issues in Type Checking?
  Type Equivalence: Conditions under which Types are the Same
    Tracking of Scoping: Nested Declarations
  Type Compatibility
    Conversion/Casting, Nonconverting Casts, Coercion
  Type Inference
    Determining the Type of a Complex Expression
  Reviewing Remaining Concepts of Note
    Overloading, Polymorphism, Generics
  In OO Case:
    Classes are the OO Version of a Type
    Issues Need to Consider the Way We Program in OO
    In Older Languages like C, these are Critical

Structural vs. Name Equivalence of Types
  Two Types are “Structurally Equivalent” iff they are Equivalent Under the Following 3 Rules:
    SE1: A Type Name is Structurally Equivalent to Itself
    SE2: T1 and T2 are Structurally Equivalent if they are Formed by Applying the Same Type Constructors to Structurally Equivalent Types
    SE3: After a Type Declaration: Type n = T, the Type Name n is Structurally Equivalent to T
  SE3 is “Name Equivalence”
  What Do Programming Languages Use?
    C: All Three Rules
    Pascal: Omits SE2 and Restricts SE3 so that a Type Name can only be Structurally Equivalent to Other Type Names

Type Equivalence
  Structural equivalence: equivalent if built in the same way (same parts, same order)
  Name equivalence: distinctly named types are always different
  Structural equivalence questions
    What parts constitute a structural difference?
      Storage: record fields, array size
      Naming of storage: field names, array indices
      Field order
    How to distinguish between intentional vs. incidental structural similarities?
  An argument for name equivalence: “They’re different because the programmer said so; if they’re …

Type Equivalence Records and Arrays
  Would record types with identical fields, but different name order, be structurally equivalent?
      type PascalRec = record a : integer; b : integer end;
      val MLRec = { a = 1, b = 2 };
      val OtherRec = { b = 2, a = 1 };
  When are arrays with the same number of elements structurally equivalent?
      type str = array [1..10] of integer;
      type str = array [1..2 * 5] of integer;
      type str = array [0..9] of integer;

Consider Name Equivalence in Pascal
  How are the Following Compared:
      type link = ↑cell;
      var  next : link;
           last : link;
           p    : ↑cell;
           q, r : ↑cell;
  By Rules SE1, SE2, SE3, all are Equivalent!
  However:
    Some Implementations of Pascal
      next, last: Equivalent
      p, q, r: Equivalent
    Other Implementations of Pascal
      next, last: Equivalent
      q, r: Equivalent
  How is the Following Interpreted?
      type link = ↑cell;
           np   = ↑cell;
           npr  = ↑cell;
      var  next : link;
           last : link;
           p    : np;
           q, r : npr;

What about Classes and Equivalence?
  Are these SE1? SE2? Or SE3?
      public class person {
        private String lastname, firstname;
        private String loginID;
        private String password;
      };

      public class user {
        private String lastname, firstname;
        private String loginID;
        private String password;
      };
  What Does Java Require?

Checking Structural Equivalence
  Employ a Recursive Algorithm to Check SE2
  Algorithm Adaptable for Other Versions of SE
  Constructive Equivalence Means the Following are the Same:
      X: array[1..10] of int;
      Y: array[1..10] of int;

Alias Types and Name Equivalence
  Alias types are types that purely consist of a different name for another type
      TYPE Stack_Element = INTEGER;
      TYPE Level = INTEGER;
      TYPE Celsius = REAL;
      TYPE Fahrenheit = REAL;
  Is an Integer assignable to a Stack_Element? To a Level?
  Can a Celsius and a Fahrenheit be assigned to each other?
    Strict name equivalence: aliased types are distinct
    Loose name equivalence: aliased types are equivalent
  Ada allows additional explicit equivalence control:
      subtype Stack_Element is integer;
      type Celsius is new real;
      type Fahrenheit is new real;

Why is Degree of Type Equivalence Critical?
  Governs how Software Engineers Develop Code… Why?
  SE2 Alone Doesn’t Promote Well Designed, Thought Out, Software … Why?
  Impacts on Team-Oriented Software Development… How?
  With SE2 Alone, Errors are Harder to Locate and Correct… Why?
  Increases Compilation Time with SE2 Alone … Why?

Scoping
  What is the problem ?
  Consider this example program
      class Foo {
        int n;
        Foo() { n = 0; }
        int run(int n) {
          int i;
          int j;
          i = 0;
          j = 0;
          while (i < n) {
            int n;
            n = i * 2;
            j = j + n;
          }
          return j;
        }
      };

Resolving the Issue
  Observation
    Scopes are always properly nested
    Each new definition could have a different type
  Idea
    Make the typing environment sensitive to scopes
  New operations on the typing environment
    Entering a scope
      Effect: New declarations shadow previous ones
    Leaving a scope
      Effect: Old declarations become current again
  What are the Issues?
    Activating and Tracking Scopes!

Scoping
  The Scopes
      class Foo {
        int n;
        Foo() { n = 0; }
        int run(int n) {
          int i;
          int j;
          i = 0;
          j = 0;
          while (i < n) {
            int n;
            n = i * 2;
            j = j + n;
          }
          return j;
        }
      };
    Class Scope
    Method Scope
    Body Scope
    Block Scope
  Key point: Non-shadowed names remain visible

Handling Scopes
  From a declarative standpoint
    Introduce a new typing environment
      Initially equal to a copy of the original
      Then augmented with the new declarations
    Discard the environment when leaving the scope
  From an implementation point of view
    Environment directly accounts for scoping
    How ?
      Scope chaining!

Scope Chaining
  Key Ideas
    One scope = One hashtable
    Scope chaining = A linked list of scopes
  Abstract Data Type
    Semantic Environment
      pushScope: Add a new scope in front of the linked list
      popScope: Remove the scope at the front of the list
      lookup(name): Search for an entry for name. If nothing is found in the first scope, scan the subsequent scopes in the linked list.

Scope Chaining
  Advantages
    Updates are non-destructive
      When we pop a scope, the previous list is unchanged, since additions are only done in the top scope
    The current list of scopes can be saved (when needed)

Entering & Leaving Scopes
  Easy to find out...
    Use the tree structure!
  Entering scope
    When entering a class
    When entering a method
    When entering a block
  Leaving scope
    End of class
    End of method
    End of block
  We’ll Revisit in Chapter 7 on Runtime Environment!

Type Conversion
  Certain contexts in certain languages may require exact matches with respect to types:
      aVar := anExpression
      value1 + value2
      foo(arg1, arg2, arg3, … , argN)
  Type conversion seeks to follow these exact match rules while allowing programmers some flexibility in the values used
    Using structurally-equivalent types in a name-equivalent language
    Types whose value ranges may be distinct but intersect (e.g. subranges)
    Distinct types with sensible/meaningful corresponding values (e.g. integers and floats)

Type Conversion
  Refers to the Conversion Between Different Types to Carry out Some Action in a Program
  Often Abused within a Programming Language (C)
  Typically Used in Arithmetic/Boolean Expressions
      r := i + r;   (Pascal)
      f := i + c;   (C)
  Two Kinds of Conversion:
    Implicit: Automatically Done by the Compiler
    Explicit: Type-Casts, Programmer Initiated (Ord, Chr, Trunc)
  If X is a real array, which works faster? Why?
      for I:=1 to N do X[I] := 1;
      for I:=1 to N do X[I] := 1.0;
    A Good Optimizing Compiler will Convert the 1st option!

Type Casting Syntax
  Ada
      n : integer;
      r : real;
      ...
      r := real(n);
  C/C++/Java
      // Sample is specific to Java, but shares common syntax.
      Object n;
      String s;
      ...
      s = (String)n;
  Some SQLs
      -- Timestamp is a built-in data type; charField is
      -- a varchar (string) field of some table.
      select charField::timestamp from…

Type Conversion
  Accomplished via an Attribute Grammar
  Double Underline is a Coercion which must be Tracked and Recognized Later During Code Generation. Why?
      Real := Real * Integer;
    What Kinds of “*” are there to Utilize?
    This is Overloading!
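
The coercion in Real := Real * Integer can be sketched as a small type-synthesis pass that returns both the type of an expression and a rewritten tree in which the implicit conversion has become an explicit node. This is a minimal sketch under stated assumptions, not the attribute grammar from the slide: the node classes Num, Mul and IntToReal and the function check are hypothetical names, and only multiplication over int and real is handled.

```python
from dataclasses import dataclass


@dataclass
class Num:
    """A numeric literal; a Python int stands for int, a float for real."""
    value: object


@dataclass
class Mul:
    """The expression left * right."""
    left: object
    right: object


@dataclass
class IntToReal:
    """An explicit coercion node inserted by the checker (hypothetical name)."""
    operand: object


def check(node):
    """Synthesize the type bottom-up; return (type, tree with coercions)."""
    if isinstance(node, Num):
        return ("int" if isinstance(node.value, int) else "real"), node
    if isinstance(node, Mul):
        left_type, left = check(node.left)
        right_type, right = check(node.right)
        if left_type == right_type:
            # int * int or real * real: no coercion needed.
            return left_type, Mul(left, right)
        # Mixed operands: widen the int side so that real * real is used.
        if left_type == "int":
            left = IntToReal(left)
        else:
            right = IntToReal(right)
        return "real", Mul(left, right)
    raise TypeError(f"unknown node: {node!r}")


result_type, tree = check(Mul(Num(2.5), Num(3)))
print(result_type)  # real
print(tree)
```

Recording the coercion in the tree is what lets a later code generator pick the real version of "*" and emit the conversion first, instead of rediscovering the operand types.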

Non-Converting Type Casts
  Type casts that explicitly preserve the internal bit-level representation of values
  Common in manipulating allocated blocks of memory
    Same block of memory may be viewed as arrays of characters, integers, or even records/structures
    Block of memory may be read from a file or other external source that is initially viewed as a “raw” set of bytes

Non-Converting Type Casts - Examples
  Ada: Explicit Unchecked Conversion Subroutine
      function cast_float_to_int is
        new unchecked_conversion(float, integer);
  C/C++ (Not Java): Pointer Games
      void *block;   // Gets loaded up with some data from a file.
      Record *header = (Record *)block;   // Record is a struct.
  C++: explicit cast types static_cast, reinterpret_cast, dynamic_cast
      int i = static_cast<int>(d);   // Assume d is double.
      Record *header = reinterpret_cast<Record *>(block);
      Derived *dObj = dynamic_cast<Derived *>(baseObj);   // Derived is a subclass of Base.

Type Coercion
  Sometimes absolute type equivalence is too strict; type compatibility is sufficient
  Type equivalence vs. type compatibility in Ada (strict):
    Types must be equivalent
    One type must be a subtype of another, or both are subtypes of the same base type
    Types are arrays with the same sizes and element types in each dimension
  Pascal extends slightly, also allowing:
    Base and subrange types are cross-compatible
    Integers may be used where a real is expected
  Type coercion is an implicit type conversion between compatible but not necessarily equivalent types

Type Coercion Issues
  Sometimes viewed as a weakening of type security
    Mixing of types without explicit indication of intent
  Opposite end of the spectrum: C and Fortran
    Allow interchangeable use of numeric types
    Fortran: arithmetic can be performed on entire arrays
    C: arrays and pointers are roughly interchangeable
  C++ Adds Programmer Extensible Coercion Rules
      #include <cstring>   // for strcpy

      class ctr {
      public:
        ctr(int i = 0, const char* x = "ctr") { n = i; strcpy(s, x); }
        ctr& operator++(int) { n++; return *this; }
        operator int() { return n; }      // Coercion to int
        operator char*() { return s; }    // Coercion to char *
      private:
        int n;
        char s[64];
      };

Type Inference
  Type inference refers to the process of determining the type of an arbitrarily complex expression
  Generally not a huge issue: most of the time, the type for the result of a given operation or function is clearly known, and you just “build up” to the final type as you evaluate the expression
  In languages where an assignment is also an expression, the convention is to have the “result” type be the type of the left-hand side
  But, there are occasional issues, specifically with subrange and composite types

Examples of Type Inference
  Subranges: in languages that can define types as subranges of base types (Ada, Pascal), type inference can be an issue:
      type Atype = 0..20;
           Btype = 10..20;
      var  a : Atype;
           b : Btype;
           c : ????;
      c := a + b;
  What should c’s type be?
    Easy answer: always go back to the base type (integer in this case)

Examples of Type Inference
  What if the result of an expression is assigned to a subrange?
      a := 5 + b;   (* a and b are defined on last slide *)
  The primary question is bounds checking: operations on subranges can certainly produce results that break away from their defined bounds
    Static checks: include code that infers the lowest and highest possible results from an expression
    Dynamic check: static checks are not always possible, so the last resort is to check the result at runtime

Examples of Type Inference
  Composite types
    What is the type of operators on arrays? We know it’s an array, but what specifically? (particularly for languages where the index range is part of the array definition)
    Examples: Strings in languages where strings are exactly character arrays (Pascal, Ada)

Examples of Type Inference
  Sets
    In languages that encode a base type with a set (e.g. set of integer), what is the “type” of unions, intersections, and differences of sets?
    Particularly tricky when a set is combined with a subrange
    Same as subrange handling: static checks are possible in some cases, but dynamic checks are not completely avoidable
  Examples:
      var A : set of 1..10;
          B : set of 10..20;
          C : set of 1..15;
          i : 1..30;
      ...
      C := A + B * [1..5, i];

Overloading
  The Same Symbol has Different Meanings in Different Contexts
  Many Examples:
      + : int × int → int
      + : real × real → real
      + : set × set → set (union)
      + : string × string → string (concatenate)
      == (compares multiple types)
      …
      cout << (outputs multiple types)
  Impacts on Conversion since During Code Generation we must Choose the “Correct” Option based on Type

Overloading
  Coercion Requires the Need to Convert an Expression Before Generating Code
      real := real * int                 (need to use real *)
      real := real * int_to_real(int)    (do the conversion)
    After Conversion, Code Generation can Occur
  Overloading has Received Increased Attention with the Emergence of Object-Oriented Languages
    C++ Allows User Defined Overloaded Definitions for +, -, *, etc.; Java Allows Overloaded Methods but not User Defined Operators
    Programmer Definable Routines (e.g., SORT) can be Overloaded based on Type

Overloading
  A very handy mechanism
    Available in C++/Java/...
  What is it?
    Provide multiple definitions of the same function over different types.
  Example [C++] (conceptual declarations; a real C++ user-defined operator needs at least one class or enumeration operand)
      int operator+(int a, int b);
      float operator+(float a, float b);
      float operator+(float a, int b);
      float operator+(int a, float b);
      Complex operator+(Complex a, Complex b);
      ....

Polymorphism
  Essential Concept:
    A Function is Polymorphic if it can be Utilized with Arguments/Parameters of More than 1 Type
    The EXACT, SAME, Piece of Code is being Utilized
  Why is Polymorphism Important?
    Consider a List of Items in C:
      struct item {
        int info;
        struct item *next;
      };
    Write a Length Function
      function LEN(list: *item) : integer;
    In theory, only need to access *next …
    However, in C, you can’t reuse this Length Function for Different Structures
      Write a Similar Version for Each Different Structure

Polymorphism
  Allows us to write Type Independent Code
  Polymorphism is Supported in ML, a Strongly Typed Functional Programming Language:
      fun length (lptr) =
        if null (lptr) then 0
        else length(tl(lptr)) + 1;
  Overloading is an Example of ad-hoc Polymorphism
  Parametric Polymorphism Essentially has Type as a Parameter to a Function
    Stack Operations: Create (T: stack_type)
    Arithmetic Operations: Plus (X, Y: T), where T is a type
  This leads to Generics!

Generics
  Imagine a code fragment that computes the length of a list, Java-style
      class ListNode<T> {
        T data;
        ListNode<T> next;
        ListNode(T d, ListNode<T> n) { data = d; next = n; }
        int length() {
          if (next != null) return 1 + next.length();
          else return 1;
        }
      }
  length() does not refer to T at all, so it can be implemented only once!
  So... What is the type of this method ?

Concluding Remarks/Looking Ahead
  Type Checking/Semantic Analysis/Type Systems are an Important Part of the Compilation Process
  Check/Verify Context Sensitive Aspects of the Language
    Compatible Operations
    Definition of Variable Before Use
    Array Access
    Etc.
  Other Issues Impacting Type Checking include Conversion, Scoping, Polymorphism, etc.
  How Does Type Checking Apply to the Project?
  Looking Ahead: Exploration of Runtime Organization and Environment
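
Two of the context-sensitive checks named in the concluding remarks, definition before use and scoping, can be sketched with the semantic environment described on the Scope Chaining slides (pushScope, popScope, lookup). This is a hedged illustration only: the class SemanticEnv, the method declare, and the use of a Python list of dicts in place of a linked list of hashtables are assumptions made for the sketch.

```python
class SemanticEnv:
    """Scope chain: one dict per scope, innermost scope last in the list."""

    def __init__(self):
        self.scopes = [{}]  # start with a single outermost scope

    def push_scope(self):
        """Enter a class, method, or block: add an empty scope in front."""
        self.scopes.append({})

    def pop_scope(self):
        """Leave a scope: older declarations become current again."""
        self.scopes.pop()

    def declare(self, name, type_name):
        """Add a declaration to the current scope only."""
        if name in self.scopes[-1]:
            raise NameError(f"{name} is multiply defined in this scope")
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        """Search the innermost scope first, then the enclosing scopes."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"{name} is used before it is defined")


env = SemanticEnv()
env.declare("n", "int")        # field n in the class scope
env.push_scope()               # a nested block
env.declare("n", "float")      # shadows the outer n (type chosen for the demo)
inner = env.lookup("n")
env.pop_scope()                # leaving the block restores the outer n
outer = env.lookup("n")
print(inner, outer)            # float int
```

Because declarations are only ever added to the innermost dict, popping a scope leaves every enclosing scope untouched, which matches the non-destructive update property listed under Scope Chaining.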