Programming Languages
2nd edition
Tucker and Noonan

Chapter 5
Types

Types are the leaven of computer programming; they make it digestible.
    Robin Milner

Copyright © 2006 The McGraw-Hill Companies, Inc.

5.1 Type Errors
5.2 Static and Dynamic Typing
5.3 Basic Types
5.4 Non-Basic Types (through 5.4.2)
5.5 Recursive Data Types
5.6 Functions as Types
5.7 Type Equivalence
5.8 Subtypes
5.9 Polymorphism and Generics
5.10 Programmer-Defined Types

What is a Type?
A type is a collection of values and operations defined on those values.
Example: the Integer type has values ..., -2, -1, 0, 1, 2, ... and operations +, -, *, /, <, ...
The Boolean type has values true and false and operations ∧ (and), ∨ (or), ¬ (not, written ! in C-like languages), …
Most types have a finite number of values due to fixed-size allocation (e.g., a fixed-size integer type can’t express all integers).
Exception: some languages support integers of unlimited size (e.g., Haskell’s Integer type, Java’s BigInteger class).

How Many Types Are Needed?
Early languages (Fortran, Algol, Cobol) built in all types, with no way to define new ones.
If you needed a type color, you could use integers, but there was no way to prevent illegal operations on them.
In programming languages, types are used to model a problem solution effectively, so the user should be able to define new types.

5.1 Type Errors
Machine data carries no type information.
Example: the bit pattern 0100 0000 0101 1000 0000 0000 0000 0000 could represent
  – 3.375 (floating point)
  – 1,079,508,992 (a 32-bit integer)
  – 16472 and 0 (two 16-bit integers)
  – the ASCII characters @ X NUL NUL
depending on the interpretation.

• A type error is any error that arises because an operation is attempted on a data type for which it is undefined.
  – Type errors are common in assembly language programming; e.g., adding two floats using an integer machine operation.
  – High-level languages reduce the number of type errors.
• A type system defines the bindings between a variable’s type, its values, and the operations that can be performed on those values.
  – It provides a basis for detecting type errors.

5.2 Static and Dynamic Typing
A type system imposes constraints that cannot be expressed syntactically in BNF, EBNF, or any other notation system for context-free grammars.
Type checking is part of semantic processing.
  – Some languages perform type checking at compile time (e.g., C/C++); this is compile-time semantics.
  – Others (e.g., Perl, Python) perform type checking at run time.
  – Still others (e.g., Java) do both.

Definitions
A language is statically typed if the types of all variables are fixed when they are declared at compile time.
A language is dynamically typed if the type of a variable can vary at run time depending on the value assigned.

Definition: Strong Typing
A language is strongly typed if its type system allows all type errors in a program to be detected, either at compile time or at run time.
  – If a language detects all type errors, we sometimes say it exhibits type safety.
  – Alternate definitions of strong typing exist, however.
A strongly typed language can be either statically typed (Ada, Java) or dynamically typed (Scheme, Perl).
  – Some definitions of strong typing require types to be statically declared.

Strong Typing (continued)
C and C++ are not considered strongly typed.
• For example, union types aren’t type checked.
• Array types can be misused.
Type coercion, legal in most languages, can cause type errors to go undetected.
• Consider the expression “a + b”, where a and b are integers.
  Typing “a + c” by mistake, where c is a float, should be an error, but because of coercion it won’t be detected.

5.3 Basic Types
Consist of atomic values (e.g., floats).
Sometimes called simple or primitive types.
Traditionally, they correspond for the most part to data types provided by the hardware and occupy fixed amounts of memory, based on the computer’s memory structure.

Common Memory Sizes for the Basic Types (On 32-bit Computers)
• Nibble: 4 bits
• Byte: 8 bits
• Half-word: 16 bits
• Word: 32 bits
• Double word: 64 bits
• Quad word: 128 bits
In most languages, simple data types are limited to one of these sizes.

Basic Types in C, Ada, Java
Type        C                   Ada              Java
Byte        ---                 ---              byte
Integer     short, int, long    integer          short, int, long
Real        float, double       float, decimal   float, double
Character   char                character        char
Boolean     ---                 boolean          boolean

Memory Requirements
In most languages some bindings of type to memory size are implementation dependent.
For example, in C/C++ the main language restriction on relative sizes is
    sizeof(short) <= sizeof(int) <= sizeof(long)
Java defines the size of each (half-word, word, double word).

Floating Point Representation
Floating point numbers (reals) are expressed as the product of a binary number and a power of 2.
The IEEE 754 standard is used on computers designed and built since the 1980s.
If s is the sign bit, e is the exponent (in excess-127 notation), and m is the mantissa, a single-precision representation denotes the value
    $(-1)^s \times 1.m \times 2^{e-127}$

IEEE 754 Floating Point Number Representation
Figure 5.1: $(-1)^s \times 1.m \times 2^{e-127}$

Floating Point Problems
• Integers are exact, floats aren’t necessarily so.
  – e.g., since 0.2 cannot be represented exactly as a binary fraction, its floating point representation is not exact; as a result, a computation such as 0.2 * 5 is not guaranteed to yield exactly 1.0, and in IEEE double precision 0.1 + 0.2 is not exactly 0.3.
• Overflow and underflow are possible.
• Inaccuracies due to floating point arithmetic may affect results; for example, it’s possible that a + (b + c) ≠ (a + b) + c.

Example: $a = -1324 \times 10^3$, $b = 1325 \times 10^3$, $c = 5424 \times 10^0$
Assume 4 significant figures for the fraction.
    $a + b = -1324 \times 10^3 + 1325 \times 10^3 = 1 \times 10^3$
    $(a + b) + c = 1000 \times 10^0 + 5424 \times 10^0 = 6424 \times 10^0$
whereas
    $b + c = 1325 \times 10^3 + 0005 \times 10^3 = 1330 \times 10^3$
    $a + (b + c) = -1324 \times 10^3 + 1330 \times 10^3 = 6 \times 10^3 = 6000 \times 10^0$
This is sometimes called representational error.

Overflow
Arithmetic operations may overflow the finite memory allocation and give an incorrect result, regardless of whether the values are integers or floats.
Some machines generate an exception when this happens; others (e.g., the Java Virtual Machine) don’t.

Java’s BigInteger and BigDecimal Types
BigInteger (http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigInteger.html): arbitrary-precision integers; the semantics of arithmetic operations exactly mimic those of Java's integer arithmetic operators.
BigDecimal (http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigDecimal.html): arbitrary-precision signed decimal numbers, each consisting of an arbitrary-precision integer unscaled value and a 32-bit integer scale.
Some applications (e.g., computer security) need very large numbers.

Operator Overloading
An operator or function is overloaded when its meaning varies depending on the types of its operands, arguments, or result.
More about function overloading later.
Java: a + b can mean
  – floating point add
  – integer add
  – string concatenation
What if a is an integer and b is a float?
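A minimal Java sketch of the overloaded + described above (not from the slides; the class name PlusDemo and its method names are hypothetical, chosen only for illustration). Every method body uses the same + token; the compiler picks the operation from the operand types. In the mixed case, Java widens the int operand to float before adding.

```java
// Hypothetical demo class; names are illustrative only.
final class PlusDemo {
    // int + int selects integer addition.
    static int addInts(int a, int b) {
        return a + b;
    }

    // double + double selects floating point addition.
    static double addDoubles(double a, double b) {
        return a + b;
    }

    // String + String selects concatenation.
    static String addStrings(String a, String b) {
        return a + b;
    }

    // Mixed mode: the int operand is widened to float, then a float add is done.
    static float addMixed(int a, float b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(addInts(1, 2));
        System.out.println(addDoubles(1.5, 2.25));
        System.out.println(addStrings("1", "2"));
        System.out.println(addMixed(1, 2.5f));
    }
}
```

The same two characters "1" and "2" give 3 under integer addition but "12" under string concatenation, which is why the operand types must be known before + can be translated.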
Mixed Mode Expressions
Mixed mode: the operands have different types; e.g., one operand is an int, the other floating point.
Why is it a problem? (The hardware uses different operations for each type.)
Most languages have type rules that define how mixed mode expressions will be treated.

Mixed Mode & Type Conversions
Java, C, C++ and others use type conversions to make mixed type operands compatible.
Coercion: the conversions are done automatically by the language according to a set of type rules (implicit conversions).
Casting: the conversion must be specifically requested by the programmer (explicit conversions).

Type Conversions
A type conversion is a narrowing conversion if the result type cannot represent every value of the source type (typically because it permits fewer bits), thus potentially losing information.
Narrowing: float to int, or long to short.
Otherwise it is termed a widening conversion.

Implicit v Explicit Type Conversions
Language design decision: should type conversions be done automatically by the compiler (implicit conversions), or should the language require a cast (explicit)?
Consensus: implicit widening conversions are OK; narrowing conversions should be explicit.
Implicit conversions (coercion) may make a language less type safe.

Implicit v Explicit Type Conversions
C++ example, widening:
    int sum;
    int n;
    float average;
    average = sum / n;      // the int quotient (integer division) is widened to float
C++ example, narrowing:
    float num1;
    float num2;
    int sum;
    sum = num1 + num2;      // the float sum is truncated to int

Other Basic Types – Design Decisions
Should the language include a Boolean type?
What character set should be used?
  – Traditionally, ASCII: a 7-bit code stored in an 8-bit byte, which allows for 128 characters
  – The Latin-1 extension of ASCII (ISO 8859-1) uses 8 bits to encode 191 characters, for languages with larger alphabets
  – EBCDIC: an 8-bit code developed by IBM
  – Unicode: UTF-8, UTF-16, UTF-32.
    Java uses UTF-16 (16-bit code), which currently has codes for about 50,000 characters.

5.4 Nonbasic Types
Nonbasic types:
• Are composed from other data types (e.g., arrays)
• Are also called structured types
• Have values whose size is not fixed by the language but is determined by the programmer.

5.4.1 Enumeration Types
Provided by many languages as a way to improve program readability.
Essentially, they allow the programmer to assign names to integer values and then use the names in programs (e.g., color, shape, day, …).

Enumeration Types – C/C++
Example:
    enum Day {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday};
    enum Day myDay = Wednesday;
Values can be compared:
    if (myDay == Sunday) ...
Implicit enumeration-type-to-integer conversion is okay; use a cast to go the other direction:
    for (myDay = Monday; myDay < Friday; myDay = Day(myDay + 1)) ...

Enumeration Types – Java
More powerful than the C/C++ version because each Java enum is a class (a subclass of Enum) and so inherits methods from the Java class Object, such as toString.
    for (day d : day.values())
        System.out.println(d);

5.4.2 Pointers
Design decision: to have pointer types, or not?
• C, C++, Ada, Pascal
• Java???
History
• Early languages (Fortran, COBOL): static memory
• Later languages added pointers for dynamic memory allocation (Pascal, C, C++, Ada)
• Modern languages like Java do not have explicit pointers but still support dynamic memory allocation.
  – Scheme and Prolog also have implicit pointers.

Pointer Operations
Pointers and references support the construction of linked data structures/data types.
The value of a pointer type variable is a memory address.
C/C++ operators:
• address-of (unary &) returns the address of a variable
• dereferencing (unary *) returns the value stored at the address a pointer holds
  – Corresponds to the indirect addressing mode in machine code
For an arbitrary variable x: *(&x) is the same as x

C/C++ Examples – Linked List Definitions
    struct Node {
        int key;
        struct Node* next;
    };
    struct Node* head;
Node is a recursive data structure.

Fig 5.4: A Simple Linked List in C/C++

Pointer Problems
• Error-prone (mistakes in usage)
• Memory leaks (from non-freed dynamic memory)
• Particularly troublesome in C/C++, in part because of the duality between pointers and array references

C Array/Pointer Example with Dereferencing
Array version:
    float sum(float a[], int n) {
        int i;
        float s = 0.0;
        for (i = 0; i < n; i++)
            s += a[i];
        return s;
    }
Pointer version:
    float sum(float *a, int n) {
        int i;
        float s = 0.0;
        for (i = 0; i < n; i++)
            s += *a++;
        return s;
    }

strcpy: A “bad” C example
    void strcpy(char *p, char *q) {
        while (*p++ = *q++) ;
    }
Strings p and q are arrays of characters ending in a NUL character, which is interpreted as false.
(Note that if the array p is shorter than the string q, the array boundaries will be overrun.)

Reference Types in Java
• In Java, all nonprimitive types are reference types.
• A variable of a reference type stores the address of a memory object; i.e., it is the equivalent of a pointer variable (details may depend on the JVM implementation).
• References differ from pointers in that they don’t require specific dereferencing operations.
• Java arrays, classes, and interfaces are reference types.
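A minimal Java sketch of the reference behavior listed above (not from the slides; the class name ReferenceDemo and its method names are hypothetical). Assigning one array variable to another copies the reference, so both names refer to the same object, and no dereferencing operator appears anywhere. A primitive assignment, by contrast, copies the value.

```java
// Hypothetical demo class; names are illustrative only.
final class ReferenceDemo {
    // Two array variables can alias one object: a write through b is seen through a.
    static boolean aliasSeesWrite() {
        int[] a = {1, 2, 3};
        int[] b = a;          // copies the reference, not the array
        b[0] = 99;            // no explicit dereference operator is needed
        return a[0] == 99;
    }

    // Primitive variables hold values, so a copy is independent of the original.
    static boolean primitiveCopyIsIndependent() {
        int x = 1;
        int y = x;            // copies the value
        y = 2;
        return x == 1 && y == 2;
    }

    public static void main(String[] args) {
        System.out.println(aliasSeesWrite());
        System.out.println(primitiveCopyIsIndependent());
    }
}
```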
5.4.3 Arrays and Lists
Arrays:
• Sequence of values, accessible by index
• All array values are of the same type
• Design issues:
  – Number of dimensions/declaration syntax
  – Index type
  – Are array dimensions static or dynamic?
  – Are the boundaries of the index set determined by the language or the programmer?
  – Should the language enforce bounds checking?

Design Issues for Arrays
• Declaration syntax/number of dimensions
  – Technically, C/C++ only allow one dimension; a two-dimensional array is an array of arrays:
        int C[4][3];   // array of 4 3-element rows
  – Compare to Pascal:
        TYPE Num = ARRAY[1..4, 1..3] OF Integer;
        VAR C : Num;
    and Fortran:
        integer C(4,3)
    early Fortran version:
        DIMENSION C(4,3)
  – Early languages set an upper limit on the number of dimensions

Array Design Issues
• Index type
  – Usually restricted to integers
  – Pascal allows any ordinal type: integers, char, enum

Array Design Issues
Are the bounds of the index set fixed?
• C-like languages: (0, …, n-1)
• Fortran: defaults to (1, 2, …, n) but modifiable:
  – INTEGER Counts (-20:20)
• Pascal & Ada: programmer determined
  – Ada: a : array(-4..5) of INTEGER;
  – Pascal: TYPE Numbers = ARRAY[10..19] OF Integer; VAR nums : Numbers;

Array Design Issues
• Are array dimensions static or dynamic?
  – Pascal: always static; array dimensions are part of the array type.
  – Java arrays are sized at run time and don’t change thereafter.
  – C/C++ permit the bounds of dynamically allocated arrays to be set at run time:
        int numbers[100];        // statically sized array
        int n;
        int* intPtr;
        cin >> n;
        intPtr = new int[n];     // dynamic array

Design Issues
• Should the language enforce bounds checking?
  – Java and Ada do; Pascal provides it as an option
• Tradeoff: safety versus efficiency
• Overrunning array boundaries is one of the most common (and hardest to find) errors in a program
  – In C and C++, an out-of-bounds reference causes a memory protection error only if it falls outside the program’s address space; otherwise no error message is given

Implementation Issues
• Dope vector: array data needed at compile time or run time or both; e.g.,
  – Element size & type
  – Index type/range for each dimension
  – Start address of array in memory
• Used to calculate memory addresses, and to do index range checking if the language includes it
• Built during the semantic analysis phase of translation

Example/Java
Consider the Java declaration
    int[] A = new int[n];
At run time, every reference A[i] must be checked: is 0 ≤ i < n?
Portions of the dope vector must be present at run time for checking.
C/C++ don’t do bounds checking, so the array size is not needed at run time.

Dope Vectors
For a one-dimensional array of integers, where integers take 4 bytes of storage:
    e = 4 (element size), element type = int, n = 10 (size)
For a two-dimensional array of characters, where characters take 2 bytes:
    e = 2 (element size), element type = char, m = 4 (number of rows), n = 3 (number of columns)

Address Computation
To compute the address of a given element:
    addr(a[i]) = addr(a[0]) + e∙i                  (e = element size)
    addr(a[i][j]) = addr(a[0][0]) + e∙(n∙i + j)    (n = number of columns)
The two-dimensional formula assumes row-major storage order; Fortran uses column-major.
These formulas assume indexes start at 0; for languages that permit other bounds, modify the formulas.

Range Checking and Strong Typing
Before each reference a[i], check to see if index i is within bounds.
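The bounds check and the row-major address formula above can be sketched in Java as follows. This is an illustration only, not how any particular compiler implements it: the class name RowMajor, the method name address, and the base address used in the usage note are hypothetical. The parameters e, m, and n play the same roles as in the dope vector (element size, number of rows, number of columns).

```java
// Hypothetical helper; models addr(a[i][j]) = addr(a[0][0]) + e*(n*i + j).
final class RowMajor {
    // base: address of a[0][0]; e: element size in bytes;
    // m: number of rows; n: number of columns; indexes start at 0.
    static long address(long base, int e, int m, int n, int i, int j) {
        // Range check first, using the bounds stored in the dope vector.
        if (i < 0 || i >= m || j < 0 || j >= n) {
            throw new ArrayIndexOutOfBoundsException(
                "index (" + i + ", " + j + ") is out of range");
        }
        // Row-major: skip i full rows of n elements, then j elements.
        return base + (long) e * ((long) n * i + j);
    }

    public static void main(String[] args) {
        // 4 x 3 array of 2-byte characters, assumed to start at address 1000.
        System.out.println(address(1000L, 2, 4, 3, 2, 1));
    }
}
```

With the two-dimensional dope vector from the slide (e = 2, m = 4, n = 3) and an assumed base address of 1000, element a[2][1] lies at 1000 + 2∙(3∙2 + 1) = 1014.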
Also at run time, compute the element’s address:
    addr(a[i]) = addr(a[0]) + e∙i
The computation usually cannot be done until run time because the index values aren’t known at compile time; e.g.,
    x = a[2*y - 1];
or
    ar[i][j+1] = 3*n;

Lists and Slices
Lists are used in a number of languages, including Python, Lisp, and Lisp-related languages.
Python lists are generalized versions of arrays.
Heterogeneous:
    myList = ['cat', 14, 13.7, 'final']
Lists are indexed, like arrays. Indexes start at 0.
    myList[0] returns 'cat'

Lists and Slices (Python)
Slice: a sublist; a contiguous series of entries, specified by a start index and an end index (exclusive):
    if myList = ['cat', 14, 13.7, 'final']
    then myList[1:3] returns [14, 13.7]
Lists can grow and shrink dynamically.

Strings
• Fortran, Algol 60: none
• COBOL: static strings, few operations
• C: a string is a fixed-length, one-dimensional character array terminated by a NUL character; no error checking is provided
• Ada: strings are also fixed-length character arrays but are type safe; run-time checks prevent storage overflow
• In Java, Perl, and Python, strings are built-in types; length is not restricted.
• C++ provides a string library, not a built-in type.

Structures (Records)
A first step toward classes
Collection of elements of different types
  – Elements commonly referred to as fields
  – Stored sequentially in memory, with some qualifications
Used first in Cobol, PL/I; absent from Fortran, Algol 60
Common to Pascal-like, C-like languages
How does a C++ struct differ from a class?
Omitted from Java as redundant

    struct employeeType {
        int id;
        char name[25];
        int age;
        float salary;
        char dept;
    };
    struct employeeType employee;
    ...
    employee.age = 45;
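Since structures are described above as omitted from Java as redundant, here is one way the same record might be written as a Java class. This is a sketch, not from the slides: the class names EmployeeType and StructDemo and the method setAge are hypothetical, and using String for the name field is an assumption in place of the fixed char[25] array.

```java
// Hypothetical Java counterpart of the C struct employeeType.
final class EmployeeType {
    int id;
    String name;    // assumption: String replaces the fixed-size char[25]
    int age;
    float salary;
    char dept;
}

final class StructDemo {
    // Mirrors "employee.age = 45;" from the C example and returns the stored value.
    static int setAge() {
        EmployeeType employee = new EmployeeType();  // fields start at default values
        employee.age = 45;                           // field access uses the same dot notation
        return employee.age;
    }

    public static void main(String[] args) {
        System.out.println(setAge());
    }
}
```

A class with only fields and no methods behaves like a record; the difference from C is that employee is a reference to a heap object rather than the storage itself.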
Unions
C/C++: union
Pascal/Ada: variant record
Logically: multiple views of the same storage
Infrequently used today, but might be useful in some systems applications
Complicates issues of type safety, since the actual data type isn’t known at compile time.

    union myUnion {     // C/C++ example
        int i;
        float r;
    };
    myUnion u;          // can only use one member at a time;
                        // enough memory is reserved to hold the largest member
    u.i = 10;           // u is an integer
    u.r = 3.14;         // u is now a float; value & type change
    u.i = 1;            // now u is an integer again
    …
    x = x + u.r;        // if x is a float, this is a type error: u holds an integer

5.9 Polymorphism and Generics
Polymorphism comes from the Greek for “having many forms”.
A function or operation is polymorphic if it can be applied to any one of several related types and achieve the same result.
An advantage of polymorphism in a programming language is that it enables code reuse.

Parametric Polymorphism
A type of polymorphism in which code is written without any type-specific references. Later, the code can be used with different data types.
Functions that are written in this way are called polymorphic functions.

Generics/Templates
Ada, C++ and Java all provide parametric polymorphism in the form of generics or templates.
Ada generic functions
  – templates instantiated at compile time with concrete types and operators
  – parametric polymorphism
  – type binding delayed from programming time to compile time
OO languages also provide subtype polymorphism through the inheritance mechanism. More about this when we discuss OO programming.
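Since Java is listed above among the languages with generics, here is a minimal Java sketch of a generic stack. It is not from the slides: the class name GenericStack and its method names are hypothetical. The code is written once against the type parameter E and later used with different element types; the node layout (a value plus a link to the next node) is a common linked representation, chosen here as an assumption.

```java
// Hypothetical generic stack; written without any type-specific references.
final class GenericStack<E> {
    // One linked node: a value and a link to the node beneath it.
    private static final class Node<T> {
        final T val;
        final Node<T> next;

        Node(T val, Node<T> next) {
            this.val = val;
            this.next = next;
        }
    }

    private Node<E> top;   // null when the stack is empty

    void push(E item) {
        top = new Node<>(item, top);
    }

    E pop() {
        if (top == null) {
            throw new IllegalStateException("stack is empty");
        }
        E val = top.val;
        top = top.next;
        return val;
    }

    boolean isEmpty() {
        return top == null;
    }

    public static void main(String[] args) {
        GenericStack<String> words = new GenericStack<>();   // element type bound here
        words.push("a");
        words.push("b");
        System.out.println(words.pop());

        GenericStack<Integer> numbers = new GenericStack<>(); // same code, different type
        numbers.push(7);
        System.out.println(numbers.pop());
    }
}
```

The element type is chosen at each declaration, so the compiler rejects, for example, pushing an Integer onto a GenericStack<String>.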
A Generic Stack in Ada
Figure 5.11
    generic
        type element is private;
    package stack_pck is
        type stack is private;
        procedure push(i : element; s : in out stack);
    private
        type node;
        type stack is access node;
        type node is record
            val : element;
            next : stack;
        end record;
    end stack_pck;

    package body stack_pck is
        procedure push …
            …
        end push;
    end stack_pck;

Using the Generic Stack in Ada
To use the stack in a program, the source code should include the generic stack package and a declaration of the actual item type.
The compiler will use the appropriate instructions/data types to implement the operations.

Pure or Subtype Polymorphism
A function that specifies a parameter of type T will also accept an argument of type S if S is a subtype of T (based on inheritance).
Example: class GeometricShape has method Draw. Subclasses Circle, Square, etc. each have their own Draw method.
The type is bound at run time, since the object passed to the method determines which version of Draw is called.
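The GeometricShape example above can be sketched in Java as follows. This is illustrative only: the method draw returns a string rather than drawing anything, and the names ShapeDemo and render are hypothetical. The method render declares a parameter of type GeometricShape, accepts any subtype, and the version of draw that runs is chosen at run time from the object passed in.

```java
// Supertype T: every shape can be drawn.
abstract class GeometricShape {
    abstract String draw();
}

// Subtypes S of T, each with its own draw method.
final class Circle extends GeometricShape {
    @Override
    String draw() {
        return "circle";
    }
}

final class Square extends GeometricShape {
    @Override
    String draw() {
        return "square";
    }
}

// Hypothetical demo class; names are illustrative only.
final class ShapeDemo {
    // Accepts any subtype of GeometricShape; the call is dispatched at run time.
    static String render(GeometricShape shape) {
        return shape.draw();
    }

    public static void main(String[] args) {
        GeometricShape[] shapes = { new Circle(), new Square() };
        for (GeometricShape shape : shapes) {
            System.out.println(render(shape));
        }
    }
}
```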