Chapter 5
Names, Bindings,
Type Checking, and
Scopes
ISBN 0-321-33025-0
Chapter 5 Topics
• Introduction
• Names
• Variables
• The Concept of Binding
• Type Checking
• Strong Typing
• Type Compatibility
• Scope and Lifetime
• Referencing Environments
• Named Constants
Imperative Languages
• Imperative languages are abstractions of von
Neumann architecture
– Memory
• stores both instructions and data
– Processor
• provides operations for modifying the contents of
the memory
Memory Cells and Variables
• The abstractions in a language for the
memory cells of the machine are variables.
– In some cases, the characteristics of the
abstractions are very close to the characteristics of
the cells;
• an example of this is an integer variable, which is
usually represented directly in one or more bytes of
memory.
– In other cases, the abstractions are far removed
from the organization of the hardware memory,
• as with a three-dimensional array, which requires a
software mapping function to support the
abstraction.
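The software mapping function mentioned above can be sketched in C. This is only an illustration, assuming row-major storage; the names DEPTH, ROWS, COLS, and offset3d are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dimensions of a three-dimensional array a[DEPTH][ROWS][COLS]. */
enum { DEPTH = 2, ROWS = 3, COLS = 4 };

/* Map the abstract index (i, j, k) to an offset into flat memory,
   assuming row-major order (the last subscript varies fastest). */
size_t offset3d(size_t i, size_t j, size_t k)
{
    assert(i < DEPTH && j < ROWS && k < COLS);
    return (i * ROWS + j) * COLS + k;
}
```

With these dimensions, offset3d(1, 2, 3) is the last of the 24 cells, at offset 23.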
Attributes of Variables
• A variable can be characterized by a
collection of properties, or attributes.
• The most important variable attribute is
type, a fundamental concept in
programming languages.
Design Considerations of Data Type
• The design of the data types of a language
requires that a variety of issues be considered.
• Among the most important of these issues are
– the scope of variables
– the lifetime of variables
and
– type equivalence.
• Related to the first two are the issues of
– type checking
and
– initialization.
C-based Languages
• In this book, the author uses the phrase
C-based languages to refer to C, C++, Java,
and C#.
Name
• Names are one of the fundamental attributes of
variables, but they have broader use
than simply for variables.
• A name is a string of characters used to
identify some entity in a program.
Other Usage of the Name Attribute of
Variables
• Names are also associated with
– labels
– subprograms
– formal parameters
and
– other program constructs.
• The term identifier is often used
interchangeably with name.
Design Issues for Names
– Are names case sensitive?
– Are special words reserved words or keywords?
Length of Names
• Length
– If too short, they cannot be connotative
– Length examples:
• FORTRAN I: maximum of 6 characters
• C89:
– no length limitation on internal names
» only the first 31 characters are significant
– external names (defined outside functions and handled by linkers)
» are restricted to 6 characters.
• C# and Java: no limit, and all are significant
• C++: no limit, but implementers often impose one
– They do this so the symbol table in which identifiers are stored
during compilation need not be too large, and also to simplify
the maintenance of that table.
Name Forms
• Names in most programming languages
have the same form: a letter followed by a
string consisting of letters, digits, and
underscore characters (_).
– In the 1970s and 1980s, underscore characters
were widely used to form names.
• E.g. my_stack
– Nowadays, in the C-based languages,
underscore form names are largely replaced by
camel notation.
• E.g. myStack
Embedded Spaces in Names
• In versions of Fortran prior to Fortran 90,
names could have embedded spaces, which
were ignored.
– For example, the following two names were
equivalent:
Sum Of Salaries
SumOfSalaries
Case Sensitivity
• In many languages, notably the C-based
languages, uppercase and lowercase letters
in names are distinct
– For example, the following three names are
distinct in C++: rose, ROSE, and Rose.
Drawbacks of Case Sensitivity
• detrimental to readability
– Names that look very similar in fact denote
different entities.
– Case sensitivity violates the design principle
that language constructs that look the same should
have the same meaning.
• detrimental to writability
– The need to remember specific case usage
makes it more difficult to write correct
programs.
Special Words
• Special words in programming languages
are used
– to make programs more readable by naming
actions to be performed.
– to separate the syntactic entities of programs.
• In most languages, special words are
classified as reserved words, but in some
they are only keywords.
– P.S.: In program code examples in this book,
special words are presented in boldface.
Keywords
• A keyword is a word of a programming
language that is special only in certain
contexts.
Example of Keywords
• Fortran is one of the languages whose special
words are keywords.
– In Fortran, the word Real, when found at the beginning
of a statement and followed by a name, is considered a
keyword that indicates the statement is a declarative
statement.
– However, if the word Real is followed by the assignment
operator, it is considered a variable name.
– These two uses are illustrated in the following:
Real Apple
Real = 3.4
• Fortran compilers and Fortran program readers
must recognize the difference between names and
special words by context.
Reserved Words
• A reserved word is a special word of a
programming language that can NOT be
used as a name.
Advantages of Reserved Words
• As a language design choice, reserved
words are better than keywords because
the ability to redefine keywords can lead to
readability problems.
Drawback Example of Keywords
• In Fortran, one could have the statements
    Integer Real
    Real Integer
which declare the program variable Real to
be of Integer type and the variable
Integer to be of Real type.
• In addition to the strange appearance of
these declaration statements, the
appearance of Real and Integer as
variable names elsewhere in the program
could be misleading to program readers.
Variables
• A program variable is an abstraction of a
computer memory cell or collection of cells.
• Variables can be characterized as a sextuple
of attributes:
– Name
– Address
– Value
– Type
– Lifetime
– Scope
Benefits of Using Variables
• One of the major adjustments from
machine languages to assembly languages
was to replace absolute numeric memory
addresses with names, making programs far
more readable and thus easier to write and
maintain.
• That step also provided an escape
from the problem of manual absolute
addressing, because the translator that
converted the names to actual addresses
also chose those addresses.
Address
• The address of a variable is the memory address
with which it is associated.
• In many languages, it is possible for the
same variable name to be associated with
different addresses
– at different places
and
– at different times in the program.
How Parameters and Local Variables Are
Represented in an Object File
[Figure: a C function, its equivalent assembly code, and the resulting stack frame]
A C function:
    abc(int aa)
    { int bb;
      bb = aa;
      :
      :
    }
Equivalent assembly code:
    abc:
      function prologue
      *(%ebp-4) = *(%ebp+8)
      function epilogue
Stack frame, from high address to low address: aa, return address,
previous frame pointer (ebp points here), bb.
Note: the function prologue and function epilogue are added by the compiler.
The Same Names in Different Functions
Are Associated with Different Addresses
• A program can have two subprograms,
sub1 and sub2, each of which defines a
variable that uses the same name, say sum.
• Because these two variables are
independent of each other, a reference to
sum in sub1 is unrelated to a reference to
sum in sub2.
The Same Names in Different Executions May
Be Associated with Different Addresses
• If a subprogram has a local variable that is
allocated from the run-time stack when the
subprogram is called, different calls may
result in that variable having different
addresses.
– These are in a sense different instantiations of the
same variable.
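The idea of separate instantiations can be observed in C with a recursive function, where several calls are active at once. This is a minimal sketch; the function name distinct_instances is hypothetical, and it only shows the case of simultaneously active calls.

```c
#include <stddef.h>

/* Each active call of this function gets its own instance of `sum`.
   `caller_sum` is the address of the calling activation's `sum`
   (NULL for the first call). Returns 1 if every nested activation saw
   its own `sum` at an address different from its caller's, else 0. */
int distinct_instances(int depth, const int *caller_sum)
{
    int sum = depth;                 /* allocated on the run-time stack */

    if (caller_sum != NULL && caller_sum == &sum)
        return 0;                    /* two live instances shared a cell */
    if (depth == 0)
        return 1;
    return distinct_instances(depth - 1, &sum);
}
```

Calling distinct_instances(3, NULL) compares the address of sum in each activation with the one in its caller while both are still live.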
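Memory Allocation of Local Variables
[Figure: two C functions and the run-time stack while H is executing]
    G(int a)
    { int i;
      H(3);
    add_g:
      i++;
    }
    H(int b)
    { char c[100];
      int i;
      while((c[i++]=getch())!=EOF)
      {
      }
    }
Stack contents, from high address to low address: G's stack frame
(containing i); then, for the call H(3), the argument b, the return
address add_g, the address of G's frame pointer, c[99] down to c[0],
and H's local i.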
L-value
• The address of a variable is sometimes
called its L-value, because that is what is
required when a variable appears on the left
side of an assignment statement.
Aliases
• It is possible to have multiple variables that
have the same address.
• When more than one variable name can be
used to access a single memory location,
the names are called aliases.
Disadvantages of Aliases
• Aliasing is a hindrance to readability because it
allows a variable to have its value changed by an
assignment to a different variable.
– For example, if variables total and sum are aliases, any
change to total also changes sum and vice versa.
– A reader of the program must always remember that
total and sum are different names for the same memory
cell.
– Because there can be any number of aliases in a program,
this is very difficult in practice.
• Aliasing also makes program verification more
difficult.
Ways to Create Aliases
• Aliases can be created in programs in
several different ways.
• C and C++: union types.
• Two pointer variables are aliases when they
point to the same memory location.
• Reference variables
• When a C++ pointer is set to point at a
variable, the pointer, when dereferenced,
and the variable’s name are aliases.
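Two of the mechanisms listed above, unions and pointers, can be sketched in C. The names below (union number, alias_through_pointer, alias_through_union) are made up for illustration and do not appear in the slides.

```c
#include <assert.h>

/* Members of a union share the same storage, so they are aliases. */
union number {
    int as_int;
    unsigned as_unsigned;
};

/* Assigning through the pointer changes `total`, because *sum and
   total name the same memory cell. Returns the new value of total. */
int alias_through_pointer(void)
{
    int total = 10;
    int *sum = &total;      /* *sum is now an alias of total */
    *sum = 42;
    return total;
}

/* Writing one union member changes what the other member reads. */
unsigned alias_through_union(void)
{
    union number n;
    /* Both members start at the same address. */
    assert((void *)&n.as_int == (void *)&n.as_unsigned);
    n.as_int = 7;
    return n.as_unsigned;
}
```

In both functions the value is changed through one name and read back through another, which is exactly what makes aliased code harder to read.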
Type
• The type of a variable determines
– the range of values the variable can store
and
– the set of operations that are defined for values of
the type.
• For example, the type int in Java specifies
– a value range of -2147483648 to 2147483647
and
– arithmetic operations for addition, subtraction,
multiplication, division, and modulus.
Value
• The value of a variable is the contents of the
memory cell or cells associated with the
variable.
Abstract Cells
• It is convenient to think of computer
memory in terms of abstract cells, rather than
physical cells.
• The physical cells, or individually addressable
units, of most contemporary computer
memories are byte-sized, with a byte
usually being eight bits in length.
– This size is too small for most program
variables.
• We define an abstract memory cell to have the
size required by the variable with which it is
associated.
Example
• Although floating-point values may occupy
four physical bytes in a particular
implementation of a particular language,
we think of a floating-point value as
occupying a single abstract memory cell.
• We consider the value of each simple
nonstructured type to occupy a single
abstract cell.
• Henceforth, when we use the term memory
cell, we mean abstract memory cell.
r-value
• A variable's value is sometimes called its
r-value because it is what is required when
the variable is used on the right side of an
assignment statement.
• To access the r-value, the L-value must be
determined first.
– Such determinations are not always simple.
• For example, scoping rules can greatly complicate
matters, as is discussed in Section 5.8.
Binding and Binding Time
• In a general sense, a binding is an
association, such as
– between an attribute and an entity
or
– between an operation and a symbol.
• The time at which a binding takes place is
called binding time.
Possible Binding Time
• Bindings can take place at:
– language design time,
– language implementation time,
– compile time,
– link time,
– load time,
– run time.
Static Binding
• A binding is static if
– it first occurs before run time
and
– remains unchanged throughout program
execution.
Dynamic Binding
• If a binding
– first occurs during run time
or
– can change in the course of program execution,
it is called dynamic.
Type Binding
• Before a variable can be referenced in a
program, it must be bound to a data type.
• Two important aspects of type bindings are
– how the type is specified
– when the binding takes place
• Types can be specified statically through
some form of
– explicit declaration
– implicit declaration
• Both explicit and implicit declarations
create static bindings to types.
Explicit Declarations
• An explicit declaration is a statement in a
program that lists variable names and
specifies that they are a particular type.
– Most programming languages designed since
the mid-1960s require explicit declarations of
ALL variables.
• Perl, JavaScript, Ruby, and ML are some
exceptions.
Implicit Declarations
• An implicit declaration is a means of
associating variables with types through
default conventions instead of declaration
statements.
– In this case, the FIRST appearance of a variable
name in a program constitutes its implicit
declaration.
– Several widely used languages whose initial
designs were done before the late 1960s –
notably Fortran, PL/I, and BASIC – have
implicit declarations.
Implicit Declaration Example
• In Fortran, an identifier that appears in a
program that is not explicitly declared is
implicitly declared according to the
following convention:
– If the identifier begins with one of the letters I,
J, K, L, M, or N, or their lowercase versions, it
is implicitly declared to be Integer type.
– In all other cases, it is implicitly declared to be
Real type.
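The convention above is simple enough to express as a small C function. This is an illustrative sketch, not how any Fortran compiler is implemented; the enum and the function name implicit_type_of are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* The two types that Fortran's default implicit rule can assign. */
enum implicit_type { TYPE_INTEGER, TYPE_REAL };

/* Apply the convention to an identifier: names starting with
   I, J, K, L, M, or N (either case) are Integer, all others are Real.
   `name` is assumed to be a valid, non-empty identifier. */
enum implicit_type implicit_type_of(const char *name)
{
    int first = toupper((unsigned char)name[0]);

    if (first != '\0' && strchr("IJKLMN", first) != NULL)
        return TYPE_INTEGER;
    return TYPE_REAL;
}
```

Under this rule an undeclared variable named Mass would silently become an Integer, which is the kind of surprise that makes implicit declarations error-prone.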
Drawbacks of Implicit Declarations
• Although they are a minor convenience to
programmers, implicit declarations can be
detrimental to reliability because they
prevent the compilation process from detecting
some typographical and programmer errors.
– For example, in Fortran, variables that are
accidentally left undeclared by the programmer
are given default types and unexpected attributes,
which could cause subtle errors that are difficult
to diagnose.
Disable Implicit Declarations in Fortran
• Many Fortran programmers now include
the declaration – Implicit none – in their
programs.
• This declaration instructs the compiler
not to implicitly declare any variables.
Method to Avoid Implicit Declarations
• Some of the problems with implicit declarations
can be avoided by requiring names for specific
types to begin with particular special characters.
• For example, in Perl,
– any name that begins with $ is a scalar, which can store
either a string or a numeric value.
– any name beginning with @ is an array, and any name beginning
with % is a hash structure.
– The above rules create different name spaces for different
type variables. In this scenario, the names @apple and
%apple are unrelated, because each is from a different
name space.
– Furthermore, a program reader always knows the type of
a variable when reading its name.
Declarations and Definitions
• In C and C++, one must sometimes
distinguish between declarations and
definitions.
• Declarations specify types and other attributes
but do not cause allocation of storage.
• Definitions specify attributes and cause
storage allocation.
Number of Declarations and Definitions
• For a specific name, a C program can have
ANY number of compatible declarations,
but only a SINGLE definition.
Purpose of Variable Declarations
• One purpose of variable declarations in C is
to provide the type of a variable defined
external to a function but used in the
function.
• It tells the compiler the type of a variable
and that it is defined elsewhere.
Function Definition and Function
Prototype
• The idea in previous slides carries over to
the functions in C and C++, where prototypes
declare names and interfaces, but not the code
of functions.
• Function definitions, on the other hand, are
complete.
Example
file2.c
    int a = 100;            /* variable definition */
    int bar(int y)          /* function definition */
    {
        int x;
        x = y;
        return x;
    }
file1.c
    #include <stdio.h>
    extern int a;           /* variable declaration */
    extern int bar(int);    /* function prototype */
    int main(void)
    {
        printf("a=%d\n", a);
        printf("bar(3)=%d\n", bar(3));
        return 0;
    }
Compilation Steps
1. gcc -c file1.c -> file1.o
2. gcc -c file2.c -> file2.o
3. gcc file1.o file2.o -> a.out
Dynamic Type Binding
• With dynamic type binding:
– the type is not specified by a declaration
statement, nor can it be determined by the
spelling of its name.
– the variable is bound to a type when it is
assigned a value in an assignment statement.
• When the assignment statement is executed, the
variable being assigned is bound to the type of the
value of the expression on the right side of the
assignment.
The Primary Advantage of Dynamic
Variable Type Binding
• A great deal of programming flexibility.
Creation of Generic Programs
• A program to process a list of data in a language
that uses dynamic type binding can be written as a
generic program, meaning that it is capable of
dealing with data of any numeric type.
• Whatever type data is input will be acceptable,
because the variables in which the data is to be
stored can be bound to the correct type when the
data is assigned to the variables after input.
• By contrast, because of static binding of types, one
cannot write a C++ or Java program to process a
list of data without knowing the type of that data.
Example of Dynamic Binding
• In JavaScript and PHP, the binding of a
variable to a type is dynamic.
– For example, a JavaScript script may contain
the following statement:
list = [10.2, 3.5]
Regardless of the previous type of the variable
named list, this assignment causes it to
become a single-dimensioned array of numeric
elements of length 2.
– If the statement list = 47 followed the
assignment above, list would become a numeric
scalar variable.
Dynamic Binding Is Less Reliable in
Error Detection
• Dynamic type binding causes programs to
be less reliable, because the error detection
capability of the compiler is diminished
relative to a compiler for a language with
static type bindings.
Dynamic Binding Results in Weak Type-related Error Detection
• Dynamic type binding allows any variable to
be assigned a value of any type.
• Incorrect types of right sides of
assignments are not detected as errors;
rather the type of the left side is simply
changed to the incorrect type.
Example of Drawbacks of Dynamic Type
Binding
• Suppose
– that in a particular JavaScript program, i and x are currently
storing scalar numeric values, and y is currently storing an array.
– that the program needs the assignment statement
i = x;
but because of a keying error, it has the assignment statement
i = y;
• In JavaScript (or any other language that uses dynamic type
binding), no error is detected in this statement by the
interpreter - i is simply changed to an array. But later uses
of i will expect it to be a scalar, and correct results will be
impossible.
• In a language with static type binding, the compiler would
detect the error in the assignment i = y, and the program
would not get to execution.
Disadvantages of Dynamic Binding in
terms of Cost
• Perhaps the greatest disadvantage of dynamic type
binding is cost.
• The cost of implementing dynamic attribute binding is
considerable, particularly in execution time.
• Type checking must be done at run time.
• Furthermore, every variable must have a run-time
descriptor associated with it to maintain the current type.
• The storage used for the value of a variable must be of
varying size, because different type values require
different amounts of storage.
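The run-time descriptor and run-time type check described above can be sketched in C with a tagged union. This is one possible illustration, not how any particular interpreter works; struct dyn_var and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Run-time type descriptor. */
enum tag { TAG_NUMBER, TAG_ARRAY };

/* A dynamically typed variable: the tag records the current type and
   has to be stored with the value and consulted at run time. */
struct dyn_var {
    enum tag tag;
    union {
        double number;
        struct { const double *items; size_t length; } array;
    } value;
};

/* Assignment rebinds the type as well as the value. */
void assign_number(struct dyn_var *v, double x)
{
    assert(v != NULL);
    v->tag = TAG_NUMBER;
    v->value.number = x;
}

void assign_array(struct dyn_var *v, const double *items, size_t length)
{
    assert(v != NULL);
    v->tag = TAG_ARRAY;
    v->value.array.items = items;
    v->value.array.length = length;
}

/* Dynamic type check: returns 0 and stores the result on success,
   -1 if the operand is not currently bound to the number type. */
int add_one(const struct dyn_var *v, double *result)
{
    if (v->tag != TAG_NUMBER)
        return -1;          /* type error found only at run time */
    *result = v->value.number + 1.0;
    return 0;
}
```

Every assignment writes the tag and every operation reads it, which is the execution-time cost the slide refers to; the union is as large as its largest member, which is one simple answer to the varying-size problem.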
Implementation Concerns
• Languages that have dynamic type binding for
variables are usually implemented using pure
interpreters rather than compilers.
• Contemporary computers do not have instructions
whose operand types are not known at compile
time.
– Therefore, a compiler cannot build machine instructions
for the expression A + B if the types of A and B are not
known at compile time.
• Pure interpretation typically takes at least ten
times as long as executing equivalent machine
code.
Type Inference
• ML is a programming language that
supports both functional and imperative
programming (Milner et al., 1990).
• ML employs an interesting type inference
mechanism, in which the types of most expressions
can be determined without requiring the
programmer to specify the types of the
variables.
General Syntax of an ML Function
fun function_name(formal parameters) = expression;
• The value of the expression is returned by
the function.
Example (1)
• The function declaration
fun circumf(r) = 3.14159 * r * r;
specifies a function that takes a floating-point argument (real in ML) and produces
a floating-point result.
• The types are inferred from the type of the
constant in the expression.
Example (2)
• Likewise, in the function
fun times10(x) = 10 * x;
the argument and functional value are
inferred to be of type int.
Example (3)
• Consider the following ML function:
fun square(x) = x * x;
– ML determines the type of both the parameter and the return
value from the * operator in the function definition.
Because this is an arithmetic operator, the type of the
parameter and the function are assumed to be numeric.
– In ML, the default numeric type is int. So, it is inferred
that the type of the parameter and the return value of
square is int.
Example (4)
• If square were called with a floating-point
value, as in
square(2.75);
it would cause an error, because ML does
not coerce real values to int type.
Example (5)
• If we wanted square to accept real
parameters, it could be rewritten as
fun square(x) : real = x * x;
• Because ML does not allow overloaded
functions, this version could not coexist with
the earlier int version.
Allocation and Deallocation of Memory
Cells
• The memory cell to which a variable is
bound somehow must be taken from a pool
of available memory. This process is called
allocation.
• Deallocation is the process of placing a
memory cell that has been unbound from a
variable back into the pool of available
memory.
The Lifetime of a Variable
• The lifetime of a variable is the time during
which the variable is bound to a specific
memory location.
• So the lifetime of a variable begins when it
is bound to a specific cell and ends when it
is unbound from that cell.
Categories of Scalar Variables
• It is convenient to separate scalar (unstructured)
variables into four categories, according to
their lifetimes:
– static
– stack-dynamic
– explicit heap-dynamic
– implicit heap-dynamic
Static Variables
• Static variables are those that
– are bound to memory cells before program
execution begins
and
– remain bound to those same memory cells until
program execution terminates.
Applications of Static Variables
• Globally accessible variables are often used throughout
the execution of a program, thus making it
necessary to have them bound to the same storage
during that execution.
• Sometimes it is convenient to have variables that
are declared in subprograms be history-sensitive,
that is, have them retain values between separate
executions of the subprogram.
– This is a characteristic of a variable that is statically
bound to storage.
Advantages of Static Variables
• One advantage of static variables is
efficiency.
– All addressing of static variables can be direct.
• Other kinds of variables often require indirect
addressing, which is slower.
– No run-time overhead is incurred for allocation
and deallocation of static variables, although
this time is often negligible.
Disadvantages of Static Variables
• reduced flexibility
– in a language that has only variables that are
statically bound to storage, recursive subprograms
cannot be supported.
• storage cannot be shared among variables
– For example,
• Suppose a program has two subprograms, both of
which require large unrelated arrays.
• Further suppose that the two subprograms are
never active at the same time.
• If the arrays are static, they cannot share the same
storage.
Example
• C and C++ allow programmers to include
the static specifier on a variable definition
in a function, making the variables it
defines static.
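The effect of the static specifier can be seen by comparing two otherwise identical C functions. This is a minimal sketch; the names count_calls and count_calls_auto are made up for the comparison.

```c
/* `calls` is static: it is bound to one memory cell for the whole run
   and keeps its value between calls (history-sensitive). */
int count_calls(void)
{
    static int calls = 0;   /* initialized once, not on every call */
    calls++;
    return calls;
}

/* Same body with a stack-dynamic local: a fresh `calls` is created on
   every call, so the result is always 1. */
int count_calls_auto(void)
{
    int calls = 0;
    calls++;
    return calls;
}
```

Three successive calls to count_calls return 1, 2, and 3, while count_calls_auto returns 1 every time.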
Stack-Dynamic Variables
• Stack-dynamic variables are those
– whose storage bindings are created when their
definition statements are elaborated
but
– whose types are statically bound.
Elaboration of the Definition Statements
of Stack-Dynamic Variables
• Elaboration of such a definition refers to the
storage allocation and binding process indicated
by the definition.
• Elaboration takes place when execution
reaches the code to which the definition is
attached (a subprogram or a block).
• Elaboration occurs during run time.
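Memory Allocation of Stack-Dynamic
Variables Occurs during Run Time
[Figure: two C functions and the run-time stack while H is executing]
    G(int a)
    { int i;
      H(3);
    add_g:
      i++;
    }
    H(int b)
    { char c[100];
      int i;
      while((c[i++]=getch())!=EOF)
      {
      }
    }
Stack contents, from high address to low address: G's stack frame
(containing i); then, for the call H(3), the argument b, the return
address add_g, the address of G's frame pointer, c[99] down to c[0],
and H's local i.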
Example
• The variable definitions that appear at the
beginning of a Java method are elaborated
when the method is called.
• The variables defined by those definitions
are deallocated when the method completes
its execution.
The Location That Stores Stack-Dynamic Variables
• As their name indicates, stack-dynamic
variables are allocated from the run-time
stack.
Storage Binding of a Variable May Occur
before Its Declaration
• Some languages – for example, C and Java –
allow variable definitions to occur anywhere a
statement can appear.
• In some implementations of these languages,
all of the stack-dynamic variables defined in
a function or method (not including those
declared in nested blocks) may be bound to
storage at the beginning of execution of the
function or method, even though the definitions
of some of these variables do not appear at
the beginning.
Stack-Dynamic Variables and Recursive
Programs
• To be useful, at least in most cases,
recursive subprograms require some form of
dynamic local storage so that each active
copy of the recursive subprogram has its
own version of the local variables.
• These needs are conveniently met by stack-dynamic variables.
Memory Sharing
• The introduction of stack-dynamic
variables allows
– all subprograms to share the same memory
space for their locals.
Disadvantages of Stack-Dynamic
Variables
• the run-time overhead of allocation and
deallocation.
– However, the overhead is not significant, because
all of the stack-dynamic variables that are
defined at the beginning of a subprogram are
allocated and deallocated together.
• slower accesses
– Indirect addressing is required
• subprograms cannot be history sensitive.
Examples of Stack-Dynamic Variables
• In Java, C++ and C#, local variables defined
in methods are by default stack-dynamic.
• In Pascal and Ada, all non-heap variables
defined in subprograms are stack-dynamic.
Explicit Heap-Dynamic Variables
• Explicit heap-dynamic variables are nameless
(abstract) memory cells that are allocated and
deallocated by explicit run-time instructions
specified by the programmer.
Referencing Explicit Heap-Dynamic Variables
• Explicit heap-dynamic variables, which are
allocated from and deallocated to the heap,
can only be referenced through pointers or
reference variables.
– The pointer or reference variable that is used to
access an explicit heap-dynamic variable is
created as any other scalar variable.
Properties of a Heap
• The heap is a collection of storage cells
whose organization is highly disorganized
because of the unpredictability of its use.
Creating an Explicit Heap-Dynamic
Variable
• An explicit heap-dynamic variable is
created:
– by an operator (for example, in Ada and C++ )
or
– by a call to a system subprogram provided for that
purpose (for example, malloc() in C).
Allocation Operator in C++
• In C++, the allocation operator, named new,
uses a type name as its operand.
• When executed, an explicit heap-dynamic
variable of the operand type is created and a
pointer to it is returned.
– Because an explicit heap-dynamic variable is
bound to a type at compile time, that binding is
static.
– However, such variables are bound to storage at
the time they are created, which is during run
time.
Deleting Heap-Dynamic Variables
• In addition to a subprogram or operator for
creating explicit heap-dynamic variables,
some languages include a means of
destroying them.
Example of Explicit Heap-dynamic
Variables
What follows is a C++ code segment:
    int *intnode;        // create a pointer
    ...
    intnode = new int;   // create the heap-dynamic variable
    delete intnode;      // deallocate the heap-dynamic variable
                         // to which intnode points
• In this example, an explicit heap-dynamic variable
of int type is created by the new operator.
• This variable can then be referenced through the
pointer, intnode.
• Later, the variable is deallocated by the delete
operator.
Java Objects
• In Java, all data except the primitive scalars are
objects.
– e.g.
class Circle { … }
Circle cir=new Circle() ;
• Java objects are explicit heap-dynamic and
are accessed through reference variables.
• Java has no way of explicitly destroying a
heap-dynamic variable; rather, implicit garbage
collection is used.
Applications of Explicit Heap-Dynamic
Variables
• Explicit heap-dynamic variables are often
used for dynamic structures, such as linked lists
and trees, that need to grow and/or shrink
during execution.
• Such structures can be built conveniently
using pointers or references and explicit
heap-dynamic variables.
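A singly linked list is a small example of such a structure. The sketch below uses malloc() and free() from the C standard library; struct node and the list_* function names are hypothetical.

```c
#include <stdlib.h>

/* One list cell; each node is an explicit heap-dynamic variable that
   can only be reached through a pointer. */
struct node {
    int value;
    struct node *next;
};

/* Allocate a node from the heap and link it in front of `head`.
   Returns the new head, or NULL if allocation fails (the existing
   list is left untouched in that case). */
struct node *list_push(struct node *head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->next = head;
    return n;
}

/* Sum of the values stored in the list. */
int list_sum(const struct node *head)
{
    int total = 0;
    for (; head != NULL; head = head->next)
        total += head->value;
    return total;
}

/* Return every node to the heap. */
void list_destroy(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}
```

The list grows by one heap allocation per list_push call and shrinks back to nothing in list_destroy, so its size is decided entirely at run time.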
Disadvantages of Explicit Heap-Dynamic Variables
• the difficulty of using pointer and reference
variables correctly.
• the cost of
– references to the variables
– allocations
and
– deallocations.
• the complexity of storage management
implementation.
Implicit Heap-dynamic Variables
• Implicit heap-dynamic variables are bound
to heap storage only when they are assigned
values.
• In fact, all their attributes are bound every
time they are assigned.
Example
• For example, a JavaScript script may
contain the following statement to assign a
value to the implicit heap-dynamic variable
list:
– list = [10.2, 3.5]
Regardless of the previous type of the variable
named list, this assignment causes it to
become a single-dimensioned array of numeric
elements of length 2.
– If the statement list = 47 followed the
assignment above, list would become a
numeric scalar variable.
Advantages
• The advantage of such variables is that they
have the highest degree of flexibility,
allowing highly generic code to be written.
Disadvantages
• the run-time overhead of maintaining all
the dynamic attributes, which could include
array subscript types and ranges, among others.
• the loss of some error detection by the
compiler, as discussed in Section 5.4.2.2.
Type Checking
Generalize the Concepts of Functions
and Assignment Statements
• Subprograms are thought of as operators
– their parameters are their operands.
• The assignment symbol is thought of as a
binary operator
– with its target variable and its expression being
the operands.
Type Checking
• Type checking is the activity of ensuring
that the operands of an operator are of
COMPATIBLE types.
Compatible Types
• A compatible type is one that is
– either legal for the operator
or
– is allowed under language rules to be implicitly
converted by compiler-generated code (or the
interpreter) to a legal type.
• This automatic conversion is called a coercion.
Example
• If an int variable and a float variable are
added in Java, the value of the int variable
is coerced to float and a floating-point add is
done.
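The slide's example is in Java, but C applies the same kind of coercion to mixed int and float operands, so the behaviour can be sketched in C. The function names are hypothetical.

```c
/* The int operand is implicitly converted (coerced) to float before
   the addition, so a floating-point add is performed. */
float add_mixed(int i, float f)
{
    return i + f;
}

/* With two int operands no coercion takes place: this is integer
   division, and the fractional part is discarded.
   The caller must ensure that b is not zero. */
int divide_ints(int a, int b)
{
    return a / b;
}
```

add_mixed(1, 2.5f) yields 3.5, whereas divide_ints(7, 2) yields 3 because both operands are already of a type that is legal for the operator.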
Type Errors
• A type error is the application of an
operator to an operand of an inappropriate
type.
Example of Type Errors
• In the original version of C, if an int value
was passed to a function that expected a
float value, a type error would occur,
because compilers for that language did
not check the types of parameters.
– Integer Signed Attacks
Example of Integer Signed Attacks
void *memcpy(void *dest, const void *src, size_t n);
Note: size_t is an unsigned integer type.
static char data[256];
void *store_data(char *buf, int len)
{
    if (len > 256)      /* a negative len passes this check */
        return NULL;    /* reject oversized input */
    return memcpy(data, buf, len);
}
Note: memcpy requires an unsigned integer (size_t) for the
length parameter; therefore, the signed variable len is
converted to an unsigned integer. A negative len loses its
sign and wraps around to a very large positive number,
causing memcpy() to read past the bounds of buf (and to
write past the end of data).
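• One possible hardening of store_data, sketched under the assumption that a negative length should simply be rejected. The names length_seen_by_memcpy and store_data_checked are hypothetical and do not appear in the slides.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static char data[256];

/* Shows the value memcpy would receive for a given int length:
   the int is converted to size_t, so a negative value wraps around. */
size_t length_seen_by_memcpy(int len)
{
    return (size_t)len;
}

/* Hypothetical hardened variant of store_data: negative lengths are
   rejected before any signed-to-unsigned conversion takes place. */
void *store_data_checked(const char *buf, int len)
{
    if (len < 0 || (size_t)len > sizeof data)
        return NULL;
    return memcpy(data, buf, (size_t)len);
}
```

With this check, a call with len equal to -1 returns NULL, whereas the conversion alone would hand memcpy the largest size_t value.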
1-111
Static Type Checking
• If all bindings of variables to types are static
in a language, then type checking can nearly
always be done statically.
1-112
Dynamic Type Checking
• Dynamic type binding requires type
checking at run time, which is called
dynamic type checking.
– Some languages, such as JavaScript and PHP,
because of their dynamic type binding, allow
only dynamic type checking.
1-113
Pros and Cons of Static Type Checking
• It is better to detect errors at compile time
than at run time, because the earlier
correction is usually less costly.
• The penalty for static checking is reduced
programmer flexibility.
1-114
Type Checking for Memory Cells That
Can Store Values of Different Types
• Type checking is complicated when a
language allows a memory cell to store values
of different types at different times during
execution.
– Such memory cells can be created with
• Ada variant records
• Fortran EQUIVALENCE
and
• C and C++ unions.
1-115
Type Checking for Variables That Can Store
Values of Different Types Must Be Dynamic
• For variables that can store values of different types, type
checking, if done, MUST be dynamic and requires the run-time
system to maintain the type of the current value of such memory cells.
• So even though all variables are statically bound to types in
languages such as C++, not all type errors can be detected by
static type checking.
– For example, the type of a statically bound C++ variable
may be a union.
char c;
union sign
{ int first;
  char second;
} number;
number.first=12;
c=number.second;
[Slide annotation: static type checking, int ≡ int]
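• The run-time bookkeeping described above can be sketched in C with a tagged union: a tag field records the type of the value currently stored, and every read checks it. C does not do this automatically; the names kind, tagged, store_int, store_char, and load_char are hypothetical.

```c
#include <stdbool.h>

/* The tag records, at run time, which member was stored last. */
enum kind { KIND_INT, KIND_CHAR };

struct tagged {
    enum kind kind;
    union {
        int  first;
        char second;
    } value;
};

void store_int(struct tagged *t, int v)
{
    t->kind = KIND_INT;
    t->value.first = v;
}

void store_char(struct tagged *t, char v)
{
    t->kind = KIND_CHAR;
    t->value.second = v;
}

/* Dynamic type check: succeeds only if a char is currently stored. */
bool load_char(const struct tagged *t, char *out)
{
    if (t->kind != KIND_CHAR)
        return false;   /* type error detected at run time */
    *out = t->value.second;
    return true;
}
```

After store_int, a call to load_char reports failure instead of silently reinterpreting the stored int, which is the kind of check a plain union does not provide.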
1-116
Strongly Typed Programming
Languages
• A programming language is strongly typed
if type errors are always detected.
• The above requires that the types of all
operands can be determined either at
compile time or at run time.
1-117
The Importance of Strongly Typed
Languages
• The importance of strong typing lies in its
ability to detect ALL misuses of variables
that result in type errors.
• A strongly typed language also allows the
detection, at run time, of uses of the
incorrect type values in variables that can
store values of more than one type.
1-118
Fortran 95 Is Not Strongly Typed
• In Fortran 95 the use of Equivalence
between variables of different types allows
a variable of one type to refer to a value of a
different type, without the system being able
to check the type of the value when one of
the Equivalenced variables is referenced
or assigned.
1-119
Explanation: Fortran 95 Is Not
Strongly Typed
Integer A
Real R
Equivalence (A,R)
A=123
[Figure: A and R name the same memory cell, which holds 123.]
123 is not a real number; hence, a type error occurs.
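• A rough C analogue of the Fortran overlay, using a union in place of Equivalence. This is a sketch, not a translation: it assumes int and float have the same size and that float uses the IEEE 754 format, and the names overlay and read_int_as_real are hypothetical.

```c
/* a and r share storage, loosely mirroring Equivalence (A,R). */
union overlay {
    int   a;
    float r;
};

/* Stores an integer through a, then reads the same bytes through r.
   Nothing checks that the bytes were written as an integer. */
float read_int_as_real(int value)
{
    union overlay cell;
    cell.r = 0.0f;      /* give every byte a defined value first */
    cell.a = value;
    return cell.r;      /* bytes of an int reinterpreted as a float */
}
```

Under the stated assumptions, read_int_as_real(123) does not return 123.0; the integer's bit pattern is reinterpreted as a different floating-point value, and no error is reported.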
1-120
C and C++ Are Not Strongly Typed
Languages
• C and C++ are not strongly typed languages
because
– both allow functions for which parameters are
not type checked.
– Furthermore, the union types of these
languages are not type checked.
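• The slide does not say which functions it has in mind; variadic functions are one case in C, assumed here for illustration. The compiler checks the fixed parameter but not the variable arguments. The name sum_ints is hypothetical.

```c
#include <stdarg.h>

/* Sums `count` int arguments. Only `count` is type checked by the
   compiler; the variable arguments are not. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);   /* trusts the caller to pass ints */
    va_end(args);
    return total;
}
```

A correct call is sum_ints(3, 1, 2, 3). A call such as sum_ints(2, 1.5, 2) is also accepted by the compiler, yet va_arg then reads a double as an int, which is undefined behavior.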
1-121
Coercion Rules vs. Type Checking
• The coercion rules of a language have an
important effect on the value of type checking.
– For example,
• Expressions are strongly typed in Java.
• However, an arithmetic operator with one floating-point
operand and one integer operand is legal.
• The value of the integer operand is coerced to
floating-point, and a floating-point operation takes
place.
• Even though the above is what is usually intended
by the programmer, the coercion also results in a
loss of part of the reason for strong typing – error
detection (see next slide).
1-122
The Value of Strong Typing Is Weakened
by Coercion
• Suppose a program written in a strongly
typed language had the int variables a and
b and the float variable d.
• Now, if a programmer meant to type a + b,
but mistakenly typed a + d, the error would
not be detected by the compiler. The value
of a would simply be coerced to float.
• The above coercion weakens the value of
strong typing.
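• The scenario can be reproduced in C, used here only because its int-to-float coercion behaves the same way (C itself is not strongly typed). The names sums and compare_sums are hypothetical.

```c
/* Holds the intended result (a + b) and the mistyped one (a + d). */
struct sums {
    float intended;
    float mistyped;
};

/* Both expressions are legal: in the second, a is coerced to float,
   so the slip from b to d is not reported as a type error. */
struct sums compare_sums(int a, int b, float d)
{
    struct sums s;
    s.intended = a + b;   /* int + int */
    s.mistyped = a + d;   /* a is silently coerced to float */
    return s;
}
```

With a = 1, b = 2, and d = 0.5, the two fields hold different values, but nothing in the type system flags the second expression as a mistake.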
1-123
Coercion Reduces Reliability
• Languages with a great deal of coercion,
like Fortran, C, and C++, are significantly
less reliable than those with little coercion,
such as Ada.
• Java and C# have half as many assignment type
coercions as C++, so their error detection is better
than that of C++, but still not nearly as
effective as that of Ada.
1-124