Programming Languages
2nd edition
Tucker and Noonan
Chapter 5
Types
Types are the leaven of computer programming;
they make it digestible.
Robin Milner
Copyright © 2006 The McGraw-Hill Companies, Inc.
5.1 Type Errors
5.2 Static and Dynamic Typing
5.3 Basic Types
5.4 Nonbasic Types (through 5.4.2)
5.5 Recursive Data Types
5.6 Functions as Types
5.7 Type Equivalence
5.8 Subtypes
5.9 Polymorphism and Generics
5.10 Programmer-Defined Types
What is a Type?
A type is a collection of values and operations defined on those
values.
Example: Integer type has values ..., -2, -1, 0, 1, 2, ... and
operations +, -, *, /, <, ...
The Boolean type has values true and false and operations ∧,
∨, ¬ (!), …
Most types have a finite number of values due to fixed size
allocation (e.g., can’t express all integers)
Exception: some languages (Java, Haskell) support unlimited
integer sizes
How Many Types Are Needed?
Early languages (Fortran, Algol, Cobol) built in all
types; there was no way to define new ones.
If you needed a type color, you could use integers;
But – there’s no way to prevent illegal operations.
In programming languages, types are used to provide
ways of effectively modeling a problem solution,
so the user should be able to define new types.
5.1 Type Errors
Machine data carries no type information.
Example: 0100 0000 0101 1000 0000 0000 0000 0000
could represent
– 3.375 (floating point)
– 1,079,508,992 (a 32-bit integer)
– 16472 & 0 (two 16-bit integers)
– The ASCII characters @ X NUL NUL
depending on the interpretation
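As a minimal sketch in Python (standard `struct` module only; the variable names are illustrative, not from the text), the same four bytes can be unpacked under each interpretation listed above:

```python
import struct

# The 32-bit pattern from the example, written as four bytes (big-endian order).
bits = bytes([0b01000000, 0b01011000, 0b00000000, 0b00000000])

as_float = struct.unpack(">f", bits)[0]    # IEEE 754 single precision
as_int32 = struct.unpack(">i", bits)[0]    # one 32-bit signed integer
as_int16s = struct.unpack(">hh", bits)     # two 16-bit signed integers
as_chars = bits.decode("ascii")            # four ASCII characters

print(as_float)        # 3.375
print(as_int32)        # 1079508992
print(as_int16s)       # (16472, 0)
print(repr(as_chars))  # '@X\x00\x00'
```

Nothing in `bits` records which reading is intended; the format string passed to `struct.unpack` supplies the type.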
• A type error is any error that arises because an
operation is attempted on a data type for which it is
undefined.
– Type errors are common in assembly language
programming; e.g., add two floats using an integer
machine operation
– High level languages reduce the number of type errors.
• A type system defines the bindings between a
variable’s type, values, and the operations that can
be performed on the values.
– provides a basis for detecting type errors.
5.2 Static and Dynamic Typing
A type system imposes constraints that cannot be
expressed syntactically in BNF, EBNF, or any other
notation system for context-free grammars.
Type checking is part of semantic processing
– Some languages perform type checking at compile time
(e.g., C/C++). (compile-time semantics)
– Others (e.g., Perl, Python) perform type checking at run
time.
– Still others (e.g., Java) do both.
Definitions
A language is statically typed if the types of all
variables are fixed when they are declared at
compile time.
A language is dynamically typed if the type of a
variable can vary at run time depending on the
value assigned.
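A small Python sketch of the dynamic case (Python is used here only as a convenient dynamically typed language; the variable names are illustrative):

```python
# In a dynamically typed language the type belongs to the value, not the variable.
x = 42
first_type = type(x).__name__     # 'int'
x = "forty-two"                   # the same name now refers to a string
second_type = type(x).__name__    # 'str'
print(first_type, second_type)    # int str
```

In a statically typed language the second assignment would be rejected at compile time, because x was declared with a fixed type.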
Definition: Strong typing
A language is strongly typed if its type system allows
all type errors in a program to be detected (either at
compile time or at run time).
– If a language detects all type errors we sometimes say it
exhibits type safety
– Alternate definitions of strong typing exist, however
A strongly typed language can be either statically
(Ada, Java) or dynamically (Scheme, Perl) typed
– (some definitions of strong typing require types to be
statically declared.)
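To illustrate run-time detection in a dynamically typed language, here is a hedged Python sketch (Python is not one of the languages named above; `add_checked` is a hypothetical helper written for this example):

```python
def add_checked(a, b):
    """Return a + b, or a message describing the type error that was detected."""
    try:
        return a + b
    except TypeError as err:
        return f"type error detected: {err}"

print(add_checked(1, 2))      # 3
print(add_checked("1", 2))    # type error detected: ...
```

The undefined operation (string plus integer) is not silently executed; the run-time system raises an error that the program can observe.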
Strong Typing (continued)
C and C++ are not considered strongly typed.
• For example, union types aren’t type checked
• Array types can be misused
Type coercion, legal in most languages, can cause
type errors to go undetected.
• Consider the expression “a + b”, where a and b
are integers. Typing “a + c” by mistake, where c is a
float, should be an error, but it won’t be detected.
5.3 Basic Types
Consist of atomic values (e.g., floats)
Sometimes called simple or primitive types.
Traditionally, correspond for the most part to data
types provided by computers and occupy fixed
amounts of memory, based on computer memory
structure.
Common Memory Sizes for the Basic
Types (On 32-bit Computers)
• Nibble: 4 bits
• Byte: 8 bits
• Half-word: 16 bits
• Word: 32 bits
• Double word: 64 bits
• Quad word: 128 bits
In most languages, simple data types are limited to one of these sizes.
Basic Types in C, Ada, Java
Type        C                  Ada              Java
Byte        ---                ---              byte
Integer     short, int, long   integer          short, int, long
Real        float, double      float, decimal   float, double
Character   char               character        char
Boolean     ---                boolean          boolean
Memory Requirements
In most languages some bindings of type to memory
size are implementation dependent.
For example, in C/C++ the only language restriction is
sizeof(short) <= sizeof(int) <= sizeof(long)
Java defines the size of each (half-word, word, double
word)
Floating Point Representation
Floating point numbers (reals) are expressed as the
product of a binary number and a power of 2.
IEEE standard is used on computers designed/built
after 1980
If s = sign bit, e = exponent (in excess 127 notation)
and m is the mantissa, a single precision
representation means $(-1)^s \times 1.m \times 2^{e-127}$
IEEE 754 Floating Point Number Representation
Figure 5.1
$(-1)^s \times 1.m \times 2^{e-127}$
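The formula can be checked by hand-decoding a bit pattern. The sketch below is Python and assumes a normalized single-precision value (zero, denormals, infinities, and NaN are deliberately not handled); `decode_single` is a hypothetical helper name:

```python
import struct

def decode_single(pattern: int) -> float:
    """Evaluate (-1)^s * 1.m * 2^(e-127) for a normalized 32-bit pattern."""
    s = (pattern >> 31) & 0x1          # sign bit
    e = (pattern >> 23) & 0xFF         # 8-bit exponent, excess-127
    m = pattern & 0x7FFFFF             # 23-bit mantissa (fraction bits)
    significand = 1 + m / 2**23        # the implicit leading 1 plus the fraction
    return (-1) ** s * significand * 2.0 ** (e - 127)

pattern = 0x40580000                   # 0100 0000 0101 1000 0000 0000 0000 0000
print(decode_single(pattern))          # 3.375
# Cross-check against the library's own interpretation of the same bits.
print(struct.unpack(">f", pattern.to_bytes(4, "big"))[0])  # 3.375
```

Here s = 0, e = 128, and 1.m = 1.6875, so the value is 1.6875 × 2^1 = 3.375.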
Floating Point Problems
• Integers are exact, floats aren’t necessarily so.
– e.g., since 0.2 cannot be represented exactly as a binary
fraction, the floating point representation is not exact;
result: arithmetic on such values can be slightly off
(0.1 + 0.2 is not exactly 0.3 in IEEE double precision)
• Overflow and underflow are possible
• Inaccuracies due to floating point arithmetic may
affect results; for example, it’s possible that
a + (b + c) ≠ (a + b) + c
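These effects are easy to observe. A short Python sketch, assuming the interpreter's float is an IEEE 754 double (true on common platforms):

```python
from decimal import Decimal

# The double nearest to 0.2 is not exactly 0.2; Decimal shows the stored value.
stored = Decimal(0.2)
print(stored == Decimal("0.2"))          # False

# Rounding error becomes visible after arithmetic.
print(0.1 + 0.2 == 0.3)                  # False

# Addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)                     # False
```

For this reason floating point results are usually compared against a tolerance rather than tested for exact equality.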
Example: $a = -1324 \times 10^3$, $b = 1325 \times 10^3$,
$c = 5424 \times 10^0$
Assume 4 significant figures for the fraction.
$(a + b) = -1324 \times 10^3 + 1325 \times 10^3 = 1 \times 10^3$
$(a + b) + c = 1000 \times 10^0 + 5424 \times 10^0 = 6424 \times 10^0$
Whereas
$(b + c) = 1325 \times 10^3 + 0005 \times 10^3 = 1330 \times 10^3$
$a + (b + c) = -1324 \times 10^3 + 1330 \times 10^3 = 6 \times 10^3 = 6000 \times 10^0$
This is sometimes called representational error
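The same computation can be replayed with Python's `decimal` module, using a context limited to 4 significant digits as a stand-in for the 4-digit fraction assumed above (the names `four_digits`, `left`, and `right` are illustrative):

```python
from decimal import Context, Decimal

four_digits = Context(prec=4)   # every result is rounded to 4 significant digits

a = Decimal(-1324000)   # -1324 x 10^3
b = Decimal(1325000)    #  1325 x 10^3
c = Decimal(5424)       #  5424 x 10^0

left = four_digits.add(four_digits.add(a, b), c)    # (a + b) + c
right = four_digits.add(a, four_digits.add(b, c))   # a + (b + c)

print(left == 6424, right == 6000)   # True True
```

The intermediate sum b + c = 1330424 is rounded to 1330 × 10^3, which is where the low-order digits of c are lost.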
Overflow
Arithmetic operations may overflow the finite memory
allocation and give an incorrect result, regardless of
whether the values are integers or floats.
Some machines generate an exception when this
happens, some (e.g., Java Virtual Machine) don’t.
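Python integers do not overflow, so the sketch below simulates what a silent 32-bit two's-complement overflow produces; `wrap_int32` is a hypothetical helper, not a library function:

```python
def wrap_int32(value: int) -> int:
    """Reduce an unbounded Python int to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    return value - 0x100000000 if value >= 0x80000000 else value

INT32_MAX = 2**31 - 1
print(wrap_int32(INT32_MAX + 1))   # -2147483648
print(wrap_int32(INT32_MAX))       # 2147483647
```

Adding 1 to the largest 32-bit integer silently yields the most negative one when no exception is raised.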
Java’s BigInteger and BigDecimal types
BigInteger: http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigInteger.html
arbitrary-precision integers; semantics of arithmetic operations
exactly mimic those of Java's integer arithmetic operators
BigDecimal: http://java.sun.com/j2se/1.5.0/docs/api/java/math/BigDecimal.html
arbitrary-precision signed decimal numbers; consist of an arbitrary
precision integer unscaled value and a 32-bit integer scale.
Some applications (e.g., computer security) need very large
numbers.
Operator Overloading
An operator or function is overloaded when its
meaning varies depending on the types of its
operands or arguments or result.
More about function overloading later
Java: a + b
– floating point add
– integer add
– string concatenation
What if a is an integer and b is a float?
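Python's + is overloaded in much the same way as Java's, so a brief Python sketch can show the three meanings side by side (variable names are illustrative):

```python
# The single operator + selects a different operation based on operand types.
int_sum = 2 + 3                # integer addition
float_sum = 2.5 + 0.25         # floating point addition
joined = "type" + "check"      # string concatenation

print(int_sum, float_sum, joined)   # 5 2.75 typecheck
```

Which operation runs is decided by the operand types, not by the operator symbol.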
Mixed Mode Expressions
Mixed mode: operands have different types;
e.g., one operand an int, the other floating
point
Why is it a problem? (Difference in hardware
operations)
Most languages have type rules that define how
mixed mode expressions will be treated.
Mixed Mode & Type Conversions
Java, C, C++ and others use type conversions to make
mixed type operands compatible.
Coercion: The conversions are done automatically by
the language according to a set of type rules.
(implicit conversions)
Casting: The conversion must be specifically
requested by the user. (explicit conversions)
Type Conversions
A type conversion is a narrowing conversion if
the result type permits fewer bits, thus
potentially losing information.
Narrowing: float to int, or long to short
Otherwise it is termed a widening conversion.
Implicit v Explicit Type Conversions
Language design decision: Should type
conversions be done automatically by the
compiler (implicit conversions) or should the
language require a cast (explicit)?
Consensus: OK to have implicit widening
conversions; narrowing conversions should
be explicit.
Implicit conversions (coercion) may make a
language less type safe.
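Python happens to follow the consensus described above, which makes it convenient for a sketch: widening is implicit, narrowing must be requested. The 16-bit truncation uses the `struct` module to mimic a long-to-short conversion; the variable names are illustrative:

```python
import struct

widened = 7 + 0.5          # implicit widening: the int 7 is coerced to float
narrowed = int(7.9)        # explicit narrowing: the fraction is discarded

# Narrowing an integer to 16 bits keeps only the low-order bits.
big = 70000
low16 = struct.unpack("<h", struct.pack("<i", big)[:2])[0]

print(widened, narrowed, low16)   # 7.5 7 4464
```

Both narrowing results lose information (0.9 and the high-order bits of 70000), which is the argument for making such conversions explicit.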
Implicit v Explicit Type Conversions
C++ example, widening:
int sum;
int n;
float average;
average = sum / n;   // int result is implicitly widened to float
C++ example, narrowing:
float num1;
float num2;
int sum;
sum = num1 + num2;   // float result is implicitly narrowed to int
Other Basic Types – Design Decisions
Should the language include a Boolean type?
What character set should be used?
– Traditionally, ASCII: 7-bit code stored in an 8-bit byte, which allows for 128 characters
– ASCII Latin-1 variant uses 8 bits to encode 191
characters for languages with larger alphabets
– EBCDIC: 8-bit code developed by IBM
– Unicode: UTF-8, UTF-16, UTF-32. Java uses
UTF-16 (16-bit code); currently has codes for
about 50,000 characters.
5.4 Nonbasic Types
Nonbasic types:
• Composed from other data types (e.g., arrays)
• Also called structured types
• Size of the values of these data types is not
fixed by the language but is programmer
determined.
5.4.1 Enumeration Types
Provided by many languages as a way to
improve program readability
Essentially, allow the programmer to assign
names to integer values and then use the
names in programming (e.g., color, shape,
day, …)
Enumeration Types – C/C++
Example:
enum Day {Monday, Tuesday, Wednesday,
Thursday, Friday, Saturday, Sunday};
enum Day myDay = Wednesday;
Values can be compared:
if (myDay == Sunday) ...
Implicit enumeration-type-to-integer conversion
is okay; use a cast to go the other direction:
for (myDay = Monday; myDay < Friday; myDay = Day(myDay + 1))
Enumeration Types - Java
More powerful than C/C++ version because the
Enum class inherits methods from the Java
class Object, such as toString.
for (Day d : Day.values())
    System.out.println(d);
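For comparison, a hedged sketch of the same Day enumeration using Python's standard `enum` module (the member values 1 to 7 are an arbitrary choice for this example):

```python
from enum import Enum

class Day(Enum):
    MONDAY = 1
    TUESDAY = 2
    WEDNESDAY = 3
    THURSDAY = 4
    FRIDAY = 5
    SATURDAY = 6
    SUNDAY = 7

my_day = Day.WEDNESDAY
print(my_day == Day.SUNDAY)        # False
print(Day(my_day.value + 1).name)  # THURSDAY  (explicit integer-to-enum conversion)

for d in Day:                      # iterate over all values, like Day.values() in Java
    print(d.name)
```

As in C/C++, going from an integer back to the enumeration type takes an explicit conversion, here the call Day(...).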
5.4.2 Pointers
Design decision: to have pointer types, or not?
• C, C++, Ada, Pascal
• Java???
History
• Early languages (Fortran, COBOL): static memory
• Later languages added pointers for dynamic memory
allocation (Pascal, C, C++, Ada)
• Modern languages like Java do not have explicit
pointers but still support dynamic memory allocation.
– Scheme and Prolog also have implicit pointers.
Pointer Operations
Pointers and references support the construction of
linked data structures/data types.
The value of a pointer type variable is a memory
address.
C/C++ operators:
• address-of (unary &) returns the address of a variable
• dereferencing (unary *) returns the value of a
reference
– Corresponds to indirect addressing mode in machine code
For an arbitrary variable x: *(&x) = x
C/C++ Examples – Linked List Definitions
struct Node {
    int key;
    struct Node* next;
};
struct Node* head;
Node is a recursive data structure
Fig 5.4: A Simple Linked List in C/C++
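The same recursive shape can be sketched in Python, where object references play the role of the explicit pointers (the helper `keys` is hypothetical and exists only to walk the list):

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next        # reference to the next Node, or None at the end

def keys(head):
    """Walk the list from head and collect the keys in order."""
    out = []
    while head is not None:
        out.append(head.key)
        head = head.next
    return out

head = Node(1, Node(2, Node(3)))
print(keys(head))   # [1, 2, 3]
```

None marks the end of the list, much as a null pointer does in C/C++.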
Pointer Problems
Error-prone (mistakes in usage)
Memory leaks (from non-freed dynamic memory)
Particularly troublesome in C/C++, in part because of
the duality between pointers and array references
C Array/Pointer Example with Dereferencing
/* array-indexing version */
float sum(float a[], int n) {
    int i;
    float s = 0.0;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* equivalent pointer version with dereferencing */
float sum(float *a, int n) {
    int i;
    float s = 0.0;
    for (i = 0; i < n; i++)
        s += *a++;
    return s;
}
strcpy: A “bad” C example
void strcpy(char *p, char *q) {
while (*p++ = *q++) ;
}
Strings p and q are arrays of characters ending in a
NUL character, which is interpreted as false.
(Note that if p is shorter than q, the array boundaries
will be overrun.)
Reference Types in Java
• In Java, all nonprimitive types are reference types
• A variable from a reference type stores the address
of a memory object; i.e., it is the equivalent of a
pointer variable (may depend on JVM
implementation).
• References differ from pointers in that they don’t
require specific dereferencing operations.
• Java arrays, classes, and interfaces are reference
types.
5.4.3 Arrays and Lists
Arrays:
• Sequence of values, accessible by index
• All array values are from the same type
• Design issues:
– Number of dimensions/declaration syntax
– Index type
– Are array dimensions static or dynamic?
– Are the boundaries of the index set determined by the
language or the programmer?
– Should the language enforce bounds checking?
Design Issues for Arrays
• Declaration syntax/number of dimensions
– Technically, C/C++ only allow one dimension
int C[4][3]; //array of 4 3-element rows
– Compare to Pascal:
TYPE Num = ARRAY[1..4, 1..3] OF Integer;
VAR C: Num;
and Fortran:
integer C(4,3)
early Fortran version: DIMENSION C(4,3)
– Early languages set an upper limit on the number of
dimensions
Array Design Issues
• Index type
– Usually restricted to integers
– Pascal allows any integral type: integers, char, enum
Array Design Issues
Are the bounds of the index set fixed?
• C-like languages: (0, …, n-1)
• Fortran: defaults to (1, 2, …, n) but modifiable:
– INTEGER Counts (-20:20)
• Pascal & Ada: programmer determined
– Ada: a : array(-4..5) of INTEGER;
– Pascal: TYPE Numbers = ARRAY[10..19] OF Integer;
VAR nums: Numbers;
Array Design Issues
• Are array dimensions static or dynamic?
– Pascal: always static; array dimensions are part of the
array type.
– Java arrays are sized at runtime, don’t change thereafter
– C/C++ permits runtime declaration of array bounds for
dynamically allocated arrays (shown here in C++):
int numbers[100];      // statically sized array
int n;
int* intPtr;
cin >> n;
intPtr = new int[n];   // dynamic array
Design Issues
• Should the language enforce bounds checking?
– Java and Ada do; Pascal provides it as an option
• Tradeoff – safety versus efficiency
• Overwriting array boundaries is one of the most
common (and hardest to find) errors in a program
– C & C++ generate a memory protection error if the
reference is outside the boundaries of the program
address space, otherwise no error message is given
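Python is another language that enforces bounds checking at run time, so it can illustrate the safe side of the tradeoff (a sketch; `safe_get` is a hypothetical helper):

```python
numbers = [10, 20, 30]

def safe_get(seq, i):
    """Return seq[i], reporting an out-of-range index instead of crashing."""
    try:
        return seq[i]
    except IndexError:
        return f"index {i} is out of bounds for length {len(seq)}"

print(safe_get(numbers, 2))   # 30
print(safe_get(numbers, 3))   # index 3 is out of bounds for length 3
```

Every access pays for the check, but an out-of-range index is reported at the point of the error instead of silently corrupting memory. (In Python, negative indexes count from the end, so only indexes below -len(seq) are also rejected.)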
Implementation Issues
• Dope Vector: array data needed at compile time or
run time or both; e.g.,
– Element size & type
– Index type/range for each dimension
– Start address of array in memory
• Used to calculate memory addresses, & do index
range checking if included in language
• Built during the semantic analysis phase of
translation
Example/Java
Consider the Java declaration
int[] A = new int[n];
At runtime, every reference A[i] must be checked:
is 0 ≤ i < n ?
Portions of the dope vector must be present at runtime for checking.
C/C++ don’t do bounds checking; so array size not
needed at runtime.
Dope Vectors
For a one-dimensional array of integers, where integers take 4 bytes of storage:
4 = e, element size | int = element type | 10 = size (n)
For a two-dimensional array of characters, where characters take 2 bytes:
2 = e, element size | char = element type | 4 = # of rows (m) | 3 = # of col’s (n)
Address Computation
To compute the address of a given element :
addr(a[i]) = addr(a[0]) + e∙i (e = element size)
addr(a[i][j]) = addr(a[0][0]) + e∙(ni + j)
(n = number of columns)
Two-dimensional array formula assumes row-major
storage order; Fortran uses column-major.
This formula assumes indexes start at 0; for languages
that permit other bounds, modify the formula.
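The formulas translate directly into code. In this Python sketch the base address 1000 and the function names are made up for illustration, indexes are 0-based in the two-dimensional cases, and the `low` parameter shows one way to adjust the one-dimensional formula for other lower bounds:

```python
def addr_1d(base: int, e: int, i: int, low: int = 0) -> int:
    """Address of a[i] when a starts at base and indexes begin at low."""
    return base + e * (i - low)

def addr_2d_row_major(base: int, e: int, n_cols: int, i: int, j: int) -> int:
    """Address of a[i][j] with rows stored one after another (C-style)."""
    return base + e * (n_cols * i + j)

def addr_2d_col_major(base: int, e: int, n_rows: int, i: int, j: int) -> int:
    """Address of a[i][j] with columns stored one after another (Fortran-style order)."""
    return base + e * (n_rows * j + i)

# A 4 x 3 array of 2-byte elements at a hypothetical base address of 1000.
print(addr_1d(1000, 4, 5))                    # 1020
print(addr_2d_row_major(1000, 2, 3, 2, 1))    # 1014
print(addr_2d_col_major(1000, 2, 4, 2, 1))    # 1012
```

The same element lands at a different address under the two storage orders, which matters when arrays are passed between languages.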
Range Checking and Strong Typing
Before each reference a[i], check to see if index i is
within bounds.
Also at runtime compute element’s address:
addr(a[i]) = addr(a[0]) + e∙i
The computation usually cannot be done until run
time because the information isn’t present at
compile time (addresses are interpreted)
x = a[2*y - 1]; or ar[i][j+1] = 3*n;
Lists and Slices
Lists are used in a number of languages, including
Python, Lisp, and Lisp-related languages
Python lists are generalized versions of arrays
Heterogeneous:
myList = ['cat', 14, 13.7, 'final'];
Lists are indexed, like arrays. Indexes start at 0.
myList[0] returns 'cat'
Lists and Slices (Python)
Slice: a sublist; contiguous series of entries, specified
by a start index and an (exclusive) end index:
if myList = ['cat', 14, 13.7, 'final'];
then
myList[1:3] returns [14, 13.7]
Lists can grow and shrink dynamically.
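A runnable version of the list operations, as a short sketch (the name `my_list` follows Python naming convention rather than the slide's myList):

```python
my_list = ["cat", 14, 13.7, "final"]

print(my_list[0])      # cat
print(my_list[1:3])    # [14, 13.7]   (indexes 1 and 2; the end index 3 is excluded)
print(my_list[1:2])    # [14]

# Lists grow and shrink dynamically.
my_list.append("extra")
removed = my_list.pop(0)
print(removed, my_list)   # cat [14, 13.7, 'final', 'extra']
```

A slice start:end is given by two indexes, not by a start index and a length.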
Strings
Fortran, Algol60: none
COBOL: static strings, few operations
C: string is a fixed length, 1D character array terminated by a
NUL character; no error checking provided
Ada strings: also fixed length character array but is type safe;
run-time checks prevent storage overflow
In Java, Perl, Python, strings are built-in types; length is not
restricted.
C++ provides a string library – not a built-in type.
Structures (Records)
A first step toward classes
Collection of elements of different types
– Elements commonly referred to as fields
– Stored sequentially in memory, with some qualifications
Used first in Cobol, PL/I; absent from Fortran, Algol 60
Common to Pascal-like, C-like languages
How does a C++ struct differ from a class?
Omitted from Java as redundant
struct employeeType {
    int id;
    char name[25];
    int age;
    float salary;
    char dept;
};
struct employeeType employee;
...
employee.age = 45;
Unions
C/C++: union
Pascal/Ada: variant record
Logically: multiple views of same storage
Infrequently used today, but might be useful in some
systems applications
Complicates issues of type safety since actual data
type isn’t known at compile time.
union myUnion {   // C/C++ example
    int i;
    float r;
};                // enough memory is reserved to hold the largest member
myUnion u;        // can only use one member at a time
u.i = 10;         // u is an integer
u.r = 3.14;       // u is now a float; value & type change
u.i = 1;          // now u is an integer again
…
x = x + u.r;      // if x is a float, this is a type error: u currently holds an int
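The "multiple views of the same storage" behavior can be observed from Python through the standard `ctypes` module, which exposes C-style unions. This is a sketch assuming `c_float` is a 32-bit IEEE 754 value; the class name `MyUnion` mirrors the C/C++ example:

```python
import ctypes

class MyUnion(ctypes.Union):
    # Both fields share the same 4 bytes of storage.
    _fields_ = [("i", ctypes.c_int32), ("r", ctypes.c_float)]

u = MyUnion()
u.r = 3.375                 # store a float
print(ctypes.sizeof(u))     # 4
print(u.i)                  # 1079508992: the float's bit pattern read as an int

u.i = 1                     # now store an int
print(u.r)                  # a tiny denormalized value, not 1.0
```

Reading the member that was not most recently written is never flagged; the bits are simply reinterpreted, which is why unions weaken type safety.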
5.9: Polymorphism and Generics
Polymorphism is from the Greek word meaning
“having two or more forms”
A function or operation is polymorphic if it can be
applied to any one of several related types and
achieve the same result.
An advantage of polymorphism in a programming
language is that it enables code reuse.
Parametric Polymorphism
A type of polymorphism in which code is written
without any type-specific references.
Later, the code can be used with different data types.
Functions that are written in this way are called
polymorphic functions.
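A minimal Python sketch of a polymorphic function, using a type variable from the standard `typing` module (`reversed_copy` is a hypothetical example function):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")   # stands for any element type

def reversed_copy(items: Sequence[T]) -> List[T]:
    """Return the elements in reverse order; nothing here depends on what T is."""
    return [items[k] for k in range(len(items) - 1, -1, -1)]

print(reversed_copy([1, 2, 3]))          # [3, 2, 1]
print(reversed_copy(["a", "b"]))         # ['b', 'a']
```

The body contains no type-specific operations, so the same code serves integers, strings, or any other element type.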
Generics/Templates
Ada, C++ and Java all provide parametric
polymorphism in the form of generics or templates
Ada generic functions – templates instantiated at
compile time with concrete types and operators
– parametric polymorphism
– type binding delayed from programming time to compile
time
OO languages also provide subtype polymorphism
through the inheritance mechanism.
More about this when we discuss OO programming.
A Generic Stack in Ada Figure 5.11
generic
    type element is private;
package stack_pck is
    type stack is private;
    procedure push(i: element; s: in out stack);
private
    type node;
    type stack is access node;
    type node is record
        val: element;
        next: stack;
    end record;
end stack_pck;
package body stack_pck is
    procedure push …
    …
    end push;
end stack_pck;
Using the Generic Stack in Ada
To use the stack in a program, source code
should include the generic stack package and
a declaration of the actual item type.
The compiler will use the appropriate
instructions/data types to implement the
operations.
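For readers without an Ada compiler, here is a rough Python analogue of a generic stack. It is only a sketch: Python's type parameters are checked by external tools rather than instantiated at compile time as in Ada, and the names `Element`, `Stack`, and `int_stack` are chosen for this example:

```python
from typing import Generic, List, TypeVar

Element = TypeVar("Element")   # plays the role of the generic formal type

class Stack(Generic[Element]):
    def __init__(self) -> None:
        self._items: List[Element] = []

    def push(self, item: Element) -> None:
        self._items.append(item)

    def pop(self) -> Element:
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items

int_stack: Stack[int] = Stack()      # "instantiation" with a concrete type
int_stack.push(1)
int_stack.push(2)
print(int_stack.pop())               # 2
```

The stack code is written once; the element type is supplied where the stack is declared.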
Pure or SubTyping Polymorphism
A function that specifies a parameter from type T will
also accept an argument from type S if S is a
subtype of T. (based on inheritance)
Example: Class – GeometricShape has method Draw.
Subclasses Circle, Square, etc. each have their own
Draw method.
Type is bound at runtime, since the object passed to
the method determines which version of Draw is
called.
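The GeometricShape example can be sketched in Python as follows (the method is spelled `draw` per Python convention, and `render` is a hypothetical function standing in for any code that takes a parameter of the supertype):

```python
class GeometricShape:
    def draw(self) -> str:
        raise NotImplementedError

class Circle(GeometricShape):
    def draw(self) -> str:
        return "drawing a circle"

class Square(GeometricShape):
    def draw(self) -> str:
        return "drawing a square"

def render(shape: GeometricShape) -> str:
    """Accepts any subtype of GeometricShape; the object's class picks the draw method."""
    return shape.draw()

for shape in (Circle(), Square()):
    print(render(shape))
```

render is written once against GeometricShape, yet each call runs the version of draw belonging to the object actually passed in.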