Submitted in Partial of the for at the

advertisement
THE PERFIRMANCE ADVANTAGES OF A HIGH LEVEL LANGUAGE MACHINE
by
JAMES WALTER RYMARCZYK
Submitted in
Partial Fulfillment
of the R.equirements for the
Degree of Bichelor of Science
at the
INSTITUTE OF TECHNOLOGY
MASSACHUSETTS
June, 1972
Signtture 3f Authoa
.
; .
.ngineeri..
Department of Electrical
C3rtified by
S
S
0
0
S
S
S
S
.
May
Engineering;
197
12,
1972
S
Thesis Supervisor
Accepted by
S
S
S
S
S
S
S
S
S
S
S
S
S
S
S
.
.
Chairmin, Departmental Committee on Theses
ABSTRACT
This paper identifies and discusses a number of mechanisms
by which a machine with a suitable high level interface language
might achieve a level of performance which ex.eeds that of a
A high
machine with a conventional von Neumann architecture.
level language machine is characterized which is somewhat less
fl3xible than a conventional design, but which significantly
otit-performs the conventional machine when used in a way that
explits the high level language.
K3ywords and Phrases:
Computer Architecture,
Machine Organization,
High Level Language Machine,
Language Oriented Computer Design,
Computer Command Structures,
Hardware Implementation
of Programming Languages,
High Performance Computer Design.
ACKNOWLEDGEMENT
my thesis supervisor, and to
I am grateful to Stu Mainick,
Dive Kelleher and Steve Zilles of IBM for their generous efforts
in reviewing and commenting upon this paper.
iii
TABLE OF CONTENTS
Page
Section
I.
II.
.
.
.
.
.
.
.
.
.
1
.
.
.
.
.
.
.
.
.
.
2
.
.
.
.
.
.
.
.
.
.
3
CHARACrERISTICS
.
.
.
.
.
.
.
7
8
Perspective
A.
Historical
B.
objectives
.
.
.
.
LANGUAGE
FAVORA3LE
A.
.
.
.
.
.
INTRODUCrION
Program Structure
.
B. Primitive Data Types
.
.
.
,
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
11
,
.
.
13
III. MECBiA1ISMS FOR ACHIEVIN3 IPROVED PERFORMANZE
A.
Optimization
1. Coatext-Sensitive
.
.
.
13
.
.
.
.
.
.
14
Operations
.
.
.
.
.
14
.
.
.
.
.
.
.
16
.
.
.
.
.
.
.
17
.
.
.
.
.
17
.
.
.
.
18
.
.
.
.
19
.
.
.
.
20
Optimizations
Unnecessary
Avoiding
b.
Reordering operations
c.
Exploiting Special Cases
.
Processing ot Aggregates
3.
Parallel Processing of
4.
Dynamic
a.
C.
.
a.
2. Parallel
B.
.
of Expression Evaluation
Special Cases
Manageent of Temporaries
.
Increasing the Use of Temporaries
Stream
Instruction
Efficiencies
Procedural
Control
.
.
.
.
.
.
.
21
.
.
.
.
.
.
.
21
.
.
.
.
.
.
23
1.
Explicit
2.
Deferral of Tactical Decisions
3.
Algorithmic Encoding Density
.
.
.
.
.
.
.
24
4.
Hijgh
.
.
.
.
.
.
.
.
27
.
.
.
.
.
.
.
.
28
Level Interlocking
Concurrent Error Monitoring
.
iv
tV.
HYPOTHETICAL HIGH LEVEL LAN3UAGE MACHINE
.
.
.
.
33
A.
GenerAl Machine Organization
.
.
.
.
.
.
.
.
33
B.
Objects
.
.
.
.
.
.
.
.
36
.
.
.
.
.
.
36
.
.
.
.
.
.
.
C. Aspects of Program Iaterpretation
I.
.
.
.
.
.
.
.
.
.
38
.
.
.
.
.
.
.
.
.
.
39
.
.
.
.
.
.
.
.
.
.
.
42
.
.
.
.
.
.
.
.
.
.
.
45
.
.
.
.
.
.
.
.
.
.
47
Prograa
2.
Program Activatiots
3.
Program Execution
CONCLUSIONS
REFERENCES
TIN TS
.
Representation
1.
.
.
AND
.
.
.
.
.
BIBLIOGRAPHY
.
.
.
.#
56
LIST OF FIGURES
Page
Figure
1. rypical execution-sequencing operators .
.
.
.
10
2. Conventional loop structure for vector operations
.
.
25
.
.
26
.
.
26
.
.
34
.
.
37
.
.
39
3.
*
.
System/360 implementation of vector addition
4. System/360 implementation of inner-product
5.
Jigh lvel language mactine structure
6. Internal representatin of an object
.
.
.
.
.
.
7. Internal representation of a PROGRAM object .
3. Internal cepresentation if
9.
Example
of
a TEXT object
a TEXT token
.
.
.
.
.
.
.
.
.
40
.
.
.
.
41
vi
I.
INTRODUCTION
reportei to have once said that most papers
Peter Landin is
in
descrioe
2omputer Science
This paper is no exceFtion
someone else already knew [cMseJ70J.
rule.
to that
of notions
combines a collection
It
what
learned
their author
how
that have
been extracted from the literature with a number of my own ideas
although independently
that,
before.
surely been
derived, have
noted
Wiile credit for many of the concepts presented belongs
to others,
I accept full
responsibility for any misrepresenta-
tions or ambiguities which may exist in this presentation.
This paper
is intended
to serve
primarily as
an aid
to
others in the field of computer design by collecting, organizing
these ileas.
anl commentinj upon
not permit much in
that the
this paper does
The scope of
the way of demonstrable results.
tor the
physical realizations
absence of
It is hoped
mechanisms
that are discussed will be compensated for by a somewhat brcader
perspective than is customary in the literature.
The reader
semantic
is assumed
features
langaages [M9;
of
the
M10'; M13).
to be
familiar with
APL,
PL/I
and
the principal
LISP
he should
In addition,
programming
have scme
knowledge of the goals of coatemporary programming systems,
problems
that
are
encountered
and
the
the
hardware/software
engiaeering technijues usei in seeking to attain these goals.
T.A.
Historical Perspective
many diverse efforts to
There have been
that
directly
interprets
a
high
level
design a computer
language
[AbraP70;
Bashr67; BashT68;
BerkK69;
.hes771; DennJ71;
McFaC70; meggJ64;
1ul1l63; RiceR71;
RossC64; ShawJ58; ThurK70;
WebeH67; ZaksR71;
M3].
The motivatian for doing so
has generally been based upon
either of two assumptions.
First, it
has been pastulated that
zaa provide a significant
level interface language
useful projramming
cOst.
One would
environment ia
[flifJ68,
p.
culminate
like to zreate a
p.
3erkK69,
machine
in
programmiig
function without
which run-time
14;
is
(DannJ69; )ennJ71], etc.
a high
increase in
a prohibitive
increase in
well disciplined programming
programming errors
in
60],
lumps
generality
while desirable
a machine with
are detected
which such errors need not
-BashT67;
guaranteed
McFaC70],
or
at
in
least
which
likely
H:owevec, it is widely rezognized that,
features such as
these can be
programmed upon
contemporary low level machiiAes (indeed, upon Tucing machines!),
the
performance
interpretive
BerkK69].
cost
software
of
is
implementing
unacceptable
such
features
[Ilif358,
pp.
with
4,13;
Various
more directly.
level language
compilers
has included
designs have
interpreters
and
ThurK70;
Ross264;
BarkK69;
a high
sought to
is normally incurred by software
avoid the costly overhead ttiat
impleifmented
overall system
be attained by implementing
improvement can
performance
assumed that an
has often been
it
Second,
[B3ashT67;
BashT68;
At least one recent effort
WebeH67].
such as
system functions
several basic operating
main storage management and process dispatching [RiceR71].
the goal was enhanced
But whether
or the
an
alternative
achiaving these goals has Deen
is
specifically
interpretation
of a
more
and
to date have been based
organizations.
machine
language-independent
existing
Peceatly,
which
functions, most
accolerated performance of conventional
designs of high level language machines
upon
functional capabilities
strategy
promising
for
to employ a machiae organization
for
designed
the
level langiage
particular high
semantic
efficient
[AbraP70;
T.urK7); ZatsR71].
I.B.
Objectives
objective of
The primary
this thesis
previous
machine
whose
work
in
computational rates
this
language
area
by reducing
identify the
attained, in principle, by a
performance advantages that may be
processor
is to
has
is
in
high
sought
the number
to
Most
level.
achieve
higher
of interfaces,
or
tai
high level
interpretation, between the
layers of
the
tailoring
organization to
machine
by
semantic
(e.g.,
to
This paper will concern itself with
features of APL).
the array
primary
the
programming language
an existing
characteristics of
further
gone
have
Some efforts
circuitry.
machiie
language and
the oerforman-e improvemeats that may be attained over and above
those that
its treatment to the features
It will not restrict
interfaces.
Instead,
of any particular existing language.
compatible language
Then,
copatatioaal mechanisms,
the general ways
exceeds
that
consideration of
by the
be made
an attempt will
in which a macaine with a
language can
intecface
of a
a set of loosely
featuces will be proposed with computational
mind.
performance in
number of
reduction in the
accompany a brute-force
level
achieve a
machine
to identify
suitabie high level
which
of performance
a conventional
with
relevant
Neumann
von
architectur e.
A higa level
is
language machine will be
somewhat less flexible than
characterized
a conventional design,
which
but which
significantly out-performs the conventional machiae when used in
a way that exploits the high level language.
It
should be noted that no
problems of translating
the lifficult
the high level machine languige.
machine
attempt will be made to address
language would
have to
In
from other languages into
most practical systems,
serve as
an effective
the
target
langiage for such transilatioas.
Here,
the sole zoncern is with
thi hiiqh speel interpretatioa of a single, although excepticnally powerful,
language.
THIS
PAGE INI'ENTIONALLY
LEFT BLANK
II.
FAV3RABLE LANSUAGE CHARACTERISTICS
section
This
that are
many features
pissesses
system,
characterizes
impractical
that are
to
a
machine
programming
ia a
iesirable
which
language
implement upoa
contemporary
machine arhitectures, but taat favorably affect the performance
high level language machine.
capabilities of the
The language
of their relation to the
chiricteristics are discussel in terms
structure of programs and tue nature of primitive objects.
not comprise
section does
This
Such
lanqage.
dissertation
What
itself.
in
ciaracterization
formidable task
a
that is
of a
the definition
would
require
sought
is
sufficient to
is
a
a
support the
new
lengthy
language
following
sections of this paper.
For
and
precision,
extraneous)
issues of
to
syntax,
the
avoid
(and
thorny
the program examplas
here
that follow
ia this document will use a simple LISP-like notation:
(operator
operandl
opecand2
...
operandU)
N3tation variables and constants will be indicated by lower-case
and upper-case symbols,
respectively.
Proggla Structure
TI.A.
In
terms of
its program
structure,
the envisioned
high
level machine language most closely resembles the language LISP.
Each of its programs is a structured expression consisting of an
operands; and each operand is,
operator and an optional list of
in turn,
a structured expression.
As in LISP, the operators and operands within the text of a
program
are
environment
merely
(or
symbols whose
context) ia
course, various static and
are possible;
which
upon
invoked.
the
Of
lynamic symbol resolation mechanisms
(which would
general,
that, in
be determined
program cannot
resolution time
depends
the program is
is important is
but what
of a
sanantics
meaning
typically be
prior to
the
symbol
as program
as late
activation time).
The mntivation
that
desired
expression
an
be
equivalency
twotold.
First, it is
betwaan
values
and
should be passible to replace aay value with an
that evaluates
in other
vice-versa.
closed
there
It
expressions.
for these features is
to (or
woris,
under composition.
"returns")
that value,
all language constricts
Secoad,
there is
to
and
should be
be no
sacred
distinction between builtin operators and user-defined programs.
It
should be
opieritor,
possible
within his
for a
own
user
local
to
redefine any
environment,
by
"system"
providing
a
operator should
of an
the redefinition
Moreover,
the appropriate name.
program with
to those
require changes
not in itself
programs that use the operator.
also possesses
The language
operators
builtin
(a
superset of
set of
and powerful
a large
Of
of APL),
the operators
siagular importance are the execution-sequencing operators which
the COBOL
there wouli
be some sort
its
operands in
of SEQUENCE operator
operands
its
employing
concurrent hardware
which would
repetitively evaluate
some number
of times,
or
one of
until some
further stipulated
(perhaps
operator
operands
either
met,
and so
condition is
that the
no
genecation
way
modify itself.
of
shareable
programmiig practices,
erecition-time
programs for
the high
That is, an executing program
leveL language machine be "pure".
in
which
(see Figure 1 on page 10)
It is
may
its
example,
operator
a REPEAT
processing),
IF
which evaluates
order
to
regard
without
evaluates
forth.
a PAaALLEL
segueae,
strict
For
etc.
statement,
PERF3RM
statements,
DO and
the PL/I
of
to those
functions analogous
perform
constraint promotes
This
outlaws
software,
certain conmon
ani aliminates
programming errors.
many
It
also has
the
"tricky"
types of
implications
rejarding the organization of the language interpreter.
(SEQUENCE
expression1
(PARALLEL
expressionl expression2
expression2 ...
expressionN)
expressionN)
...
(REPEAT expression1 expression2 TIMES)
(REPEAT
expression1 WHILE expression2)
ELSE expression3)
(IF expression1 THEN expression2
operators
Figure 1 : Typical execution-sequencing
Another
important
feature
language is its facility
of
[M13,
Sr'NAL
114-106]
pp.
statement.
in
exceptions
4ith
singlae
handles
which
provided
A
a
proposed
high
level
for processing exceptianal conditions.
In this regari, it is unlike either
PL/I
the
uniform
APL or LISP, but similar to
its conditions,
exception
ON-units,
handling mechanism
both
builtin
way.
Upon
is
user-generated
and
the
and
of
occurrence
an
exception, the current activation chain is sequeatially searched
(from
the
most
for
activation)
consists
primarily
recent
an
of
the context
found,
in
the
a
program
then it
is
to
which
be
(which
corresponds
executed as if
any variable in
If
executed).
It
that context
to
an
the
were invoked in
it
which the exception occurred.
inspect or molify
"root"
system
exception-action-specification
exception-action-specification
exception is
to
activation
may choose to
(subject to the
authorization mechanism), to signal another exception, to return
t3 the Poiat
of interruptio,
to execute a
return (with value)
from
the interrupted
suspend
program, to
results in a SUSPENSION exception in
If
forth.
so
ani
process),
actian-specification is found,
The system
is sigaalled.
then
(which
the process which owns this
corresponding
exception-
an EXCEPTIONEEROR exception
root activation
eaxcption-action-specification
always provides
an
for this exception.
Prizitive Data Types
IT.B.
of primitive
The set
machine inludes such
tu-)les.
type (e.g.,
(e.g.,
shape
are recognized
aggcegate objects as vectors,
eaca object
in the
scalar, vector, tuple),
by the
arrays and
system has
whica contains information
program, character, integer,
its ownership,
real or
an
regarding its
complex) and
as well as an indication of
persistence and access-authorization.
Consequeatly,
the
objects that
implies that
Tiis
as3ociated descriptor
of
no
the process
the machine is able to examine the attributes
objects that
it manipulates,
functions as operator distribution,
and thereby
perform such
domain-rule enforcement and
data protection.
The internal encodings of the object descriptors and values
are inaccessible to the user.
pcovided for the
to another
(e.g.,
Appropriate builtin operators are
purpose of converting an object
from REAL to ZOMPLEX).
from one type
Predicate operators,
such
as
ISINTEGER or
ISPRJGRAM,
are
pairpose of accessing the information
also provided
for
the
in the object descriptors.
Information is written into the object descriptors only by means
of the
BUILDOBJECT operator,
objects
(the conversion operators employ BUILD3BJECT).
As
a
result
of
the sole
haviag
means foc
self-describing
constructing
data
which
is
manipulated by an attribute examining machine, it is possible to
have objects whose type and/or
exists
an object
of
is
orject whose type
is
undefiaed, etc.
unlifined
objects
value is undefinal.
uniefined type
and
Thus, there
undefined value,
constraiaed to be INTEGER
an
but whose value
Furthermore, it is reasonable to permit such
be compoaents
to
of
an
aggregate
object
without requiring that the eatire aggregate object be undefined.
By definition,
Cesults il the
UNDEINEDVALUE.
operator waich
signalling of an appropriate
which guote their
can be
they
lo
not
place to
another, and
operands (by
using the
applied to undefined objects or
contain undefined objects
because
information.
from one
copies objects
builtin QU)TE operator),
eaxception
exception, such as
However, certain operators, such as the builtin
programmed operators
objects that
an undefined part of an object
an attempt to use
and will not
signal an
use
undefined
actually
the
MECHANISMS F3R ACHIEVING IMPROVED PERFORMANCE
III.
section
This
perf ormance
the
discusses
For the purpose of this
suitable high level interface language.
are grouped
these mechanisms
that apply
those
ins3tructia- unit
or
ALU) ,
(I-unit
those
a conventional
it will
arbitrary;
machine.
sometimes
This
the
to
the
class
of
apply
a
do not relate to
any part
classification is
somewhat
execution-monitoring mechanisms that
of
been called
that
and
CU),
or
classes:
into three
traditionally
what has
to
(E-unit
execution-unit
it has a
available to a machine because
mnhinisms that become
exposition,
improvement
case
be the
that a
particular
machinism could be viewed as belonging to more than one of these
classes.
ITI.A.
22timizatign of Ex2gio2n Evaluation
Part A
high level
whicn
language machine
it evaluates
programs,
the
which
are
may use
expressions.
The
presence of primitive
implicit management
viewel
evaluation rate.
the techniques
section deals with
of this
the rate
at
contextual structure
of
to increase
aggregate objects
of temporaries are three
as
contributing
that a
to
a
and the
language features
higher
expression
Zontext-Sensitive 3ptimizations
III.A.1.
its programs
Because
may employ a top-down
level language macaine
which each encountered
execution in
well
its computation
knowledge concerning
its
ia
dynamically
performance
operator is
is
a wealth
a
of
to optimize
able
ways
several
in
executed
not determinable
that is
result, it
As a
execution time.
prior to
method of program
machine possesses
Such a
iefiaed context.
high
expressions, a
are structured
that
are
not
pssible on conventional context-free computing zachines.
Avoiling Unnecessary Operations
ITIA.1.a.
context-sensitive
is
no
[ElsoM70, p.
in
the
be
the
The
operation,
same,
results that they
of
course,
but
the evaluation of the expression
(LENGTH
(CONCATENATE
STRING1 STRING2))
need
to actually
concatenate
167].
All that is
required is
result of concatenating
machine,
For
"short cuts" may be used internally to improve
Thus,
performance.
there
must
of tair
cost
which they are executed.
produce
ultilately
contextual
behave differently depending upon
certain builtin operators may
the context in
available
the
unnecessary operations.
minimize the
to
ia order
example,
performing
to avoid
infocmation
use
may
machine
the
First,
the strings.
On a
CONZATENATE operator could
the
two
the length
strings
of the
high level language
recognize that
it was
the LENGTH operator and that it
invoKed as a lirect argument to
the strings.
need not concatenate
Instead, it
could return as
taat contained the correct result
its value a string descriptor
This could be
value component was undefined.
length but whose
descriptors alone, with no
accomplished by accessing the string
need to even fetch the
(possibly lengthy) strings.
of this sort are
Many other optimizations
possible.
Note
that this optimization could not be performed prior to execution
time, as
compiler, since
by a
LENGTH and CONCATENATE is
is
it
However,
large class oE
may only
which
be performed
by the
operations.
into tne
two processes
There
is
a
transformations that
instruction stream
has identified
66-63]
pp.
he separates
symbols
not possible for context-dependent operators
more global work reduction
Abrams [AbcaP70,
of the
not then known.
avoid all unnessary
these to
such as
the resolution
interpreter.
a number
of these
of drag-along
and
beating.
Drag-along is
the process whereby
the machine
defers the
evaluation of each operator and operand for as long as possible.
By deferriag the evaluation of an expression it becomes possible
the expression
to simplify
only
small parts
consists
of
of
the
manipulating
in ways
which are
expression are
the
deferred
impossible when
available,
Beating
expressions,
and
in order to reduce the
particularly the the object descriptors,
work
of
amount
For
done.
be
needs to
that
example,
the
TIMES.
This
expression
(TAKE 3 (TIMES (NEGATIVE v) vectorl))
might be reduced to
(TINES
deferral of
by the
v)
(NEGATIVE
3 vector1))
type operator
the non-select
transformation
particular
(TAKE
vectorl)
(SHAPE
(MINUS
avoids
3)
unne:essary multiplication operations.
Reoriering Operations
IiI.A.1.b.
Ramamoorthy [RamaC71]
time can
only it
be minimized
or1ering of
subexpressions.
subexpressions
decreasing
be
should
reoriering
resolution.
In particular,
evaluated
process
must
be
rhe overhead involved in
unacceptable when the high level
requirements.
level
machine.
language
ex3cution of several
choose
after
tacKle
if
symbol
language is implemented upon a
well be viable upon a
Thus,
when
faced
unordered expressions, and when
to
But
such a dynamic process is
execute all of the expressions simultaneously,
rationally
their
of
execution time,
performed
conventional machiie, but the process may
high
shown that
order
the
in
the
given to
he has
be reordered to minimize
sibexpressions are to
the
consideration is
processor time
and
memory
expression execution
has noted that
the
most
with
the
unable to
the machine could
resaurce-demanding
exorassians first.
III.A.l.c.
Exploiting Special Cases
The tachaological
control stores
development of writeable
suggests thtat the microprogram for a particular machine might be
many times larger than the capacity
facility cilli
be provided
store and
backing
implementiag
would
permit
specializei
some
store,
to
be performed
be useful
in
Essentially,
it
it would
machine.
employ
a
library
Depending upon
micro-procedures.
operation to
microcole between
level langaage
machine
the
If a
for paging
the control
a high
of the control store.
and the
context,
the
the
of
highly
particular
machine could
invoke a micro-procedure that is specifically designed to handle
that
situation.
extensive
In
"special
effect, the
casing"
ElsoM69] withait
dascribel La
machine would
:similar
to
cequiring a
the
be capable
OMD
of
mechanism
larger than
ncrmal
control store.
III.A.2.
Parallel Processing of Aggregates
since aggregate
language,
i high
objects are
level language machine may
hardware techniques to efficiently
ajgregates
through a
primitive witiin
are particularly
pipeline, or direct
employ specialized
deal with then.
amenable
its machine
to high-speed
parallel processing
Homogeneous
streaming
by cellular
lojic arrays.
components
logic-in-memory
c)Iaventional
be
will
than
economical
more
justifications see
logic [for
"random"
LSI technology,
advent of
that, with the
Indications are
HenlR69].
It will be possible to incorporate logical functions directly in
the memory because the size
may
number
the
oa
constraint
of
the
relative to
large
is becoming
a chip
on
be placel
(and complexity) of the circuit that
interconnections.
chip-to-chip
Thus, the most cost-effective coaputer organizations will employ
regular arrays of memory with builtin logic capabilities.
III.A.3.
Parallel Processing of Special Cases
There
matrix,
inverting a
the
operation
exist several
waich there
for
as
such
operations,
many
are
algorithms
which exhibit various degrees of speed and applicability.
case that
commonly the
will "work"
whenever it
is
the >peratian
Arl
there
are
from
the domain
inversion,
is
defined,
several
significantly faster,
a particular
there is
although
it
executes
algorithms
any noasingular
is
algorithm which
Thus,
rather slowly.
execute
which
work for special cases
although they only
of arguments.
It
to an argument for the which
applied
other
of
in the
n-by-n matrix
may
case of
matrix
be inverted
by
3
triangular decomposition, requiring slightly more than n scalar
multiplications
and
divisions.
However,
if
the
matrix
is
obtained in only n3 /2 scalar
symmatric, then its inverse may be
multiplications and divisions [RalsA65, p. 446, p. 462].
A
whose performance
machine
language
high level
is
of
paramount importance could exploit this situation as follows: To
iavert a mitrix,
domain-test algorithm which would
algorithms in parallel with a
silect
more iatrix inversion
it could execute two or
from
the result
fastest algorithm
the
that
properly
apolies.
Some Dther operations that have
the determinaat of
are the calculation of
this sort
special case algorithms of
a matrix,
the aigenvalues and eigenvectors of a matrix and the zeroes of a
polynomial.
IIt.A.4.
Dynamic Management of remporaries
In the evaluation
machine, perhaps
customary on
Tiese
possess its
cells
lue to
software implemented dynamic
the
number
of stocage
own
are usually
load-time or activate-time,
advantage
dynamically manage temporaries.
conventional machines
somewhat statically
cells.
potential performance
the greatest
results from the ability to
is
higa level language
of expressions by a
calls
tor
set of
allocated
each procedure
to
reserved temporary
at
compile-time,
the computational expense of
storage allocation.
that
It
are
Conseguently,
dedicated to
use
as
tenporaries throughout the system far exceeds the minimum number
that are actually needed.
On
a
allocate
high
level
temporaries on
after their
it
language machine,
demand
and
is
release them
use.
Since
a temporary
immediate context
of an
enclosing subexpressioa,
is only
storage may be used to
small amount of
possible
to
immediately
used within
the
a relatively
efficieatly satisfy the
temporary storage requirements of a large system.
the
Because
temporaries is
instantaneous
generally small, a
need not
This implies
contribute to
for
local store
very high speed
used to contain
with the processor could be
that is integrated
the temporaries.
reguirement
storage
that references
to temporaries
data transfer
the processor-to-storage
bottleneck.
III.A.4.a.
Increasing the Use of Temporaries
Since references to
than
references
wartawhile
to
to
temporary-references
doing so
is to
in
operanis
attempt
to
employ a
more efficient
temporaries may be far
to
main
storage,
increase
staraye-references.
machiae language
the
ane
that is
it
may
ratio
method
be
of
for
expression
oriented and has an abundance of builtin monadic operators.
20
contains a
possesses
language
APL
The
dyadic operators
operands (e.g.,
the reciprocal and exponential
of nontrivial
percentage
programs
suited to exploit the
tnat
The
by the large
consist of
level machine
of their
operators).
demonstratel
of APL is
contrast, low
expression. In
value for one
that assume a iefault
expression oriented nature
are merely
monadic operators which
large number of
it
characteristics;
these
a
single
languages are
ill
efficiencies of temporaries, particularly
when the values involved are nonscalar.
111.3.
Instruction Stream Efficiencies
Part
B of
mi-hLne with
having
a
this sectioa
a high level
high
level
discusses
four ways
interface language can
instruction
stream.
The
in which
a
benefit from
presence
of
operators for explicit execution-sequencing control, the absence
of detailed and unnecessary
density of program
inteclockiag
II.3.1.
encoding and the freedom
are presented
instruction-issuing
tactical specifications, the higher
as factors
from much needless
contributing to
higher
rates.
Explicit Procedural Control
on high performance machines such
as the Control Data 7600
aal the IBM System/360 ModeL 195, pipelining and parallelism are
used within
the E-unit
instruction
execution rate.
power
is
wasted
to achieve a
because
major improvement
However,
the
much
I-unit
is
of this
in the
increased
unable
to
decode
instcuctions and issue them to the E-unit at a commensurate rate
[11,
pp.
13-13,
31-34; ThorJ7l,
to decode several separate
pp.
nature of
severely
effectiveness
ThorJ71; M1; M4; 18].
conditional branches
The
pipeline
the instructions
of
but the
being decoded
this process
[BuchW62;
[-unit is continually "surprised" by
and otaer discontinuities which
reloading of instraction
E-unit
Attempts are made
instructions simultaneously,
nominally sequential
limits the
124-125].
buffers and cause a
streams.
As
Flynn
require a
disruption in the
[FlynM72,
p.
21]
has
observed:
Model 91 had execution
rhus the IBM System/360
...
resources in excess of 70 MIPS (million instructions per
sec) while this was immediately restricted at a maximum
instruction decode rate of 16 MIPS; further with an average
incidence of branch and data dependencies this was reduced
to 6
4 IPS.
Thus the discrepancy
betieen available
resources of 70 MIPS and average request rate of 6 MIPS.
If
a
machine has
program structure
these
I-unit
as described in
difficulties
surmounted,
caa be
a hijh level
with
the
By requiring
that
relieved
of
interface language
section II.A.,
instruction
all
then
stream
procedures be
the responsibility
write-operations into prefetched instructions.
of
with a
most of
can
pure,
be
the
supporting
instructions
branch
terms alone -DijkE68; DijKE70;
convey
information
valuable
processor
to the
content of an
iteration, the number of times
be performed,
the extent
etc.
conditional,
to
informatia
The
true and
of the
micaine
organize the
10 can
regarding
the
aa iteration will
false clauses
on a
use
this
in principle,
can,
and thereby
its resources
use of
Language
Figure 1 on page
as those described in
coastructs such
MilIH70], they
implications.
performance
machine
promising
havae
the
language can be justified
structured programming aspects ot the
in user-oriented
While
encounterel.
be
need
that
number of
in the
the reduction
importance is
Of greater
optimize its own performance.
III.3.2.
Deferral of Tactical Decisions
A pro3lea
allocating
machine resources at
unique
language
must
instructions
be
whica
mapped
reference specific
is a complicated task
This
generally requires
an optimizing compiler
a Model
35,
high level
level
low
registers
and
to perform, and
it well.
to perform
a System 360 Model 50 may be
Even then, what is "good code" for
inefficient on
of
machine
storige locations.
a
fra
sequence
a
inta
detail that
a level of
The statements
reluires overspecification.
that of
based systems is
that plagues compiler
and vice-versa.
In fact,
on high
performance machines such as the 360/195 these a priori tactical
decisions are a severe handicap.
As
Chen [Chen711a,
p. 74] has
note,:
a piece of proceiural language code retains a wealth of
..
statement
FORIRAN
A
information.
independence
job
connected
causally
of
string
a
describes
essentially
locally
are often
statements
but adjacent
events;
executed
be
can
and
other,
each
of
indepeniant
conventional compiling process
Yet the
concurrently.
machine instructions are
resultant
obscuces causality. The
unrealistic causality
imposing
tactical prescriptions,
and arbitrary facility
time)
a
demanis (one instruction at
assignmeats ("register 2", "address 32768"); they becloud
process, and
human understanding and impede the debuggia
that
inefficiency
are such potential sources of computer
statements
original
machines are known to recoastruct the
internally for better traffic flow.
problem entirely.
interprets
programs that it
The
avoid this
interface language may
with a high level
A machina
can be free
from purely machine-oriented constraints.
Algorithmic Encoiing Density
TI.3.3.
On
the
machiae,
conventional
a
that manipulate non-scalar
operations
imolemental
by means
level
(either iterative
of some sequence of low level instructions.
recursive)
language
objects generally must be
repetition
of the
high
or
Consider
the addition of two vectors with the vector sum replacing one of
the argument vectors.
of the
Filure 2
form A=A+B
(page 25)
the program
To ba specific,
where
A and B are
indicates,
there
loop structura whica
The first of these consists
consider a PL/I statement
vectors of length
are four logical
underlies suca
n.
As
parts in
an operation.
of several setup instructions which
ace executed only once at the beginning of the vator operation.
ENTE R
I
V
I setup
Initialize registers for loop
I
r- ----------
I
r-->I Scalar
lOperationi
V
r----------1
lIncrementi
I
I Index
A (i)
<-
e.g.,
i <-
i+LenIth (A(i))
A(i) +B (i)
V
I
-
e.g.,
Loop until vectors have been
processed
-Condition
rest
I
EXIT
Figure 2: Conventional loop structure for vector operations
or ctr
to
fashion.
the operation
accomplish
a
Thus,
praportional
however,
three parts,
The remaining
certiin
number
required for
to n,
is
3 and
4 (page
iterated n
are
an
in
of
the
times in
element-by-elemenit
references,
memory
purpose of
fetching
instructions.
Figures
impl1ementatioas
operitions.
of
the
rhese programs
26)
vector
contain "optimal"
addition
and
are optimal in the
System/360
inner-product
sense that they
occupy the fewest bytes possible and have the shortest execution
25
LI
L
A
ST
BXLE
LDOP
R3,R5,LOOPCTRL
R2,A(R3)
R2,A(R3)
R2,A(R3)
R3,R4,LJJP
Setup for iteration
Fetch A(i)
Add 3(i)
Replace A(i) with sum
Loop until A(n) is processed
Initial
LOOPCTRL
F 'i4'
nv
nF
A
3
value
for index R3
Limit value m=n*4-1
Increment
Vector of fullword
integers
Vector of fullword integers
Figura 3: System/360 implementation ot vector addition
LOOP
LI
R3,R5,LJOPCTRL
Setup for iteration
SER
ME
F2, F2
F4, A (R3)
F4, B (R3)
AER
BXLE
R3,R4,L3JP
STE
F2,INNRROD
Clear accumulator
Fetch A(i)
Multiply by 3(i)
Add to sum
Loop until A(n) is processed
Store sum
LE
F2, F4
1'
LIOPZTRL
A
B
INNRPR)D
F14e
nE
nE
E
Initial value for index R3
Limit value a=n*4-1
Increment
Vector of short float
Vector of shart float
Result of INNERPRODUCT(A,Bd)
FIGMURE 4: System/360 implementation of inner-product
time.
assume
They are somewhat unrealistically
convenient
addressability
efficient in that they
to all
the
Nevertheless, instruction fetching accounts for
menmory references in
63% in
the case of the vector
the case of the inner-product.
required
data.
over 57% of all
adlition, and over
A
with
machiae
repetitively
operators,
fetching
only a
by this overhead
Since
instructions.
atomic
as
its
are builtin and automatically distribute
such as ADD,
over vector3,
language,
interface
II, will not be burdened
describel Ln section
of
level
high
a
neei be
single "instruction"
fetched in
orier to perform the entire operation.
III.B.4.
As
High Level Interlocking
in
noted
dependencies
in
the
the
111.B.1.,
Section
instruction stream
legradation in the instruction execution
hign-performance
Elaborate
machine
,31,
such as
schemes,
the
results
in
of
data
a
major
rate of a conventional
31-34;
pp.
presence
Scoreboard
FlynM72,
on the
P.
21].
CDC
6600
[rhorJ71;
DennJ70], are required to interlock storage references
in order
to prevent
with
conflicts (i.e.,
storage cell, to insure that no aperation is
a given
respect to
interchanged with a
write operation).
This
problem
will
language machine but
one:
to exist
continue
will be of a much
a
high
level
smaller magnitude.
For
thing, most storage references made by a high level language
by the machine rather than
machine will be generated internill y
by
on
the programmer.
For
builtin operator applied
machine generation of
example,
the
programmer's
to vector operands will
use of
a
result in the
the numerous storage references
that are
required to
scehenata may
without
be used),
the burden
to prevent conflicts
necessary on a larger scale
will still be
a pipeline
interlocking
Of course,
of interlocking.
be
these references
machine may issue
the
can
algorithm that
the extreme case,
be conflict free (in
designed to
Since these
the vector.-
fixed
a
generated oy
are
references
elements of
process the
But the interlocking mechanism
amo)nj the liiga level operators.
will need to be used much less frequently.
III.X
oncurent Error Monitoring
tea
largely
component
developments in
due to
of
introduction
continue to
to
is expectel
years ani
the past
has iacreased manytold over
Hardware reliability
tecanology and
hardware-error
sophisticate
This
improve.
detection
is
the
and
correctibn schemes.
Unfortinately,
systems which
to expand
systeD
in scope,
software to
sinultaneous
users.
crash an
Yet,
crisis, no widely-used and
or
place
It is now commonplace
reliability.
tie
have emerged since the
CP-67/CMS)
has
a
experienced
not
similar
the complex
operating
days of IBSYS,
and which
Moreover,
in reliability.
iaprovement
cmatinue
has
software
increasing emphasis
upon
for a simple malfunction in
entire system
despite the
apparent
with its
and
many
growing
general-purpose system (e.g., OS/360
overcome this
problem.
It
is
generally
as we know them
a-knowledgad that powerful pcogramming systems,
today,
are never completely debugged.
many
execution-time
errors
conventional machine
fact,
in
are,
detectable
detected
be
to
machine language,
low level
with a
all
If
systems.
contempacacy
on
prictically
detected
be
cannot
errors
software
of
types
common
is that
uncliability of software
reason for the
A major
cn
a
then
same substantial fraction of the machine's instruction execution
rate must be expended continally upon error checking [a partial
exceotion is the Burroughs 36700 family of machines which have a
lw level
for block-structured
requirements, particularly
that
is
more
(e.g.
checking
distinguishable);
see
incurred rejardless
the technology
software
conducive to
of th e
it
perform error monitoring.
lastead,
instructions
cimpating
which
power.
of
tae
consu me
Virtua Illy
be
as the
the machine cannot
as parallelism) to efficiently
(such
meaas
must
cost
and hence does not convey any
low in laval,
emil-y hardware techniques
by
are
data
As long
is implemented.
high level language semant ics to the machine,
performed
other
organizaticn or
machine's internal
with which
machine laaguage is
This
M5 and 3rgaE71].
than
small degree of
an1
instructions
,
laiguages, and
reliability
that provides only a
cantemporary systems, but
error
certain software
does reflect
machine language that
error monitoring can only be
addition
some
ao
of
fraction
explicit
of
the
general-purpose
machine
machine's
programming
29
the
checking because
run-time error
employ extensive
syst2-ms
costs involved are unacceptable.
consider
example,
For
CBS,
BIt
accompanying
the
activities.
to
certain
In particular,
it
is
system as the
second
perform subscript testing within a
This
compiler
debugs it
wites a
program, fully
facility,
and then installs the
tests
removed.
programmiag
Howevec,
development
use such
a
in
is
oriented,
to
particular program only if a
based upon
scheme is
the
limits
executed operating system
compila time [MI1,
program-checkout option was specified at
172-173].
to
handle.
application programs.
waich is
approach,
program
infeasible
basis for a frequently
or for computation-intensive
A
degradation
performance
system
the
of
usefalness
detect and
easy to
are therefore
sibscripting errors
and
software
by
interpreted
are
statements
PL/I
all
In
[M12].
as exemplified by CPS
totally interpretive approaca
is the
First, there
are used.
approaches that
basically tio
There are
PL/I.
a language such as
subscripting operations in
illegal
of detecting
problem
the
the assumption
using the
pp.
that one
program checkout
debugged program with the error
practice,
bugs manifest themselves days,
a program las been in productive operation.
many
non-trivial
or even years,
after
In order to get a rough measure of the overhead involved in
p3Ecfrmirg
type
this
of
anI STRINGRANGE'
tests.
It was found
was accompanied
program execution
contemporary
programs were
the compiler generate1 SUBS2RIPTRANGE
run both with and without
error checking
(F) [M11]
"off-the-shelf" PL/I
machine, saveral
a
on
checking
error
by a
a 6B%
time and
that this simple
to
15%
179% increase
increase in
to 97%
ty .e of
in
program
size.
Although the compiler generated tests were not as efficient
as hand-coded
tests, they
were reasonably
overhead could be reduced by at
good.
Perhaps
most a factor of two.
the
However,
it siould also be noted that the test case programs did not make
heavy
use of
that make
subscripting or
extensive use of
string manipulations.
these facilities
Programs
would undoubtably
incur a higher penalty.
On a machine with a high
of
error letection
could 0e
level machine language, this type
performed
concurrently with
actual computation that is being monitored.
the
THIS PAGE INl'ENTIINALLY
LEFT BLANK
32
IV.
HYP3IHETICAL HIGH LEVEL LAN3UAGE MACHINE
This section describes a machine
which is
sole purpose of directly executing a
type described
in Section I.
omitted.
important topics
Some
Dersistence
are
not
provided are intended
the
specific values
designed for the
high level language of the
Of necessity, many
such as
even addressed.
object ownership
The
details
to illustrate the nature
that are
used for
meant to be reasonable but generally
details are
that
and
are
of the machine;
design parameters
are
have not been subjected to
system-wide tradeoffs.
IV,A.
General Machine Orjanization
The proposed
with
a structure
functions
of
machine is
as
a shared
indicated in
each I-unit
are
encoding of a program written in
resource multiprocessor
Figure
to
5
(page 34).
step through
a
linearized
a high level machine language,
to maintaia the current state of execution for that program,
to issue
reguests for computation to
results.
Ihe
I-units.
It
units (FU's)
E-unit services
consists of a
to
the E-unit and
the computational
and
await the
needs of
the
collection of specialized functional
which are centrally coordinated.
There are
I-units
The
a
a number
common
of reasons
E-unit.
for coupling
First,
because
the multiple
the
builtin
|LogiC-in-Memory)
I
Cache
I
I
I E-unit
-----
r
r-----i
I-unit
I-unit
jI-unitj
1 -unit
r------i
r----,-
r
r -----
jCache I
ICacae I
JCache I
Icache I
,
Figure 5: High level language machine structure
op3rators are
numerous and
quita large.
Furthermore,
which
perform
the
operations) are used
complex, an
iost of the
matrix inversion,
E-unit is
the FU6S
FU's (such as
coot
square
Thus, it is
irregularly.
necessarily
index
and
unreasonable to
dedicate a complete E-unit to each instruction stream.
Second, the
need to
E-unit operations
orller to prevent conflicts.
If
be interlocked
multiple E-units were used,
would not really be independent, but
in
they
would need to be centrally
coordlinated anyway.
Third,
machine can
and
lastly,
the
E-unit for
accept requests at a
a high
level language
much higher rite than
it
can
34
possibly complete them --
this familiar pipelining phenomencn is
accentuated by the more substantial
operations that are builtin
on such a machine.
The interface
between the
I-units and
either synchronous or asynchronous.
Flynn :FlynM72] and by
It appears
synchrony or
protocol is
not sensitive tj
may be
Attractive approaches have
been iavestigated by
that the
the E-unit
Plummer [PiumW72].
asynchrony of
tae use
of a high
the interface
level machine
language.
Underlying
whic!
provides
(16
If
every
bits
per
-ell)
spaces may
names.
At
which
are
system
of uniguely
an orderai set
of fixed
consecutively
of holding
implemented
in
the
without needing
generation rate
the machine
running out of
the purpose
be addressed
a space
five microseconds,
years before
one-level storage
the space names are 48 bits in length, then up to
2.3.1014 distinct
reuse space
a
Each space consists of
cells
addressed.
is
an effectively inexhaustable number
named spaces.
length
the processor
could run
unique names.
machine generated
one-level
storage
Spaces
of one
space
for about
39
created for
temporaries are
systea,
to
and
do
not
not
contribute to the consumption of space names.
Each I-unit has its
storage
hierarchy.
own cache
Since
all
store as an interface tc the
programs
are read-only,
these
35
eich other or with the activity of the E-unit.
capaoilities
logic-in-memory
possesses
either with
aad ace not interlocked,
caches are unidirectional
and
The E-unit cache
is
organized
in
sectors [as described in StonH70 and ThurK70].
Objegts
IV.B.
descriptor and an associated
which contains a
in
Figure 5 (page 37),
and access
structure
system consists of
stored within the
Each object
a space
As shown
value.
the descriptor specifies the object type,
For
constraints.
aggregate objects,
it
also specifLes the object caak and dimensions.
are
There
representation
an
of
an
executel.
execution
is
a program
in
are used to
phase
This section
ia
which the
in
phise
used
which a
an
process
character
a
PROGRAI object
string
PROGRAM
and an
ACTIVATION object,
ACTIVATION
iiscusses several key aspects
different phases of program interpretation.
of
hypothetical
to generate
generate an
which
tae
on this
language program
activation phase
ENVIRONMENT object
in
phases
temporal
a translation
machine:
3oject,
three
a machine
interpreting
ani
Int ergetation
Asgacts of Prog2a
IV.C.
object
is
of these
Bit: 0123456789ABCDEF
r----------------i
Cell 0 jAAAABBBCDDDDEEEEJ
Cell
1 IFFFFFFFFFFFFFFFFI
Cll
2 1IGGGG
Cell
3 1GG GGGGGGGGGGGGGJ
GGGGGG)
Cell n IVVVVVVVVVVVVVVVV4
1-----------------
AAAA
-
Object ?ype: indefined, Logical, Integer,
Real, Complex, Character, Label, System
or Mixed
BBB
-
Object Structure:
Undefined,
Scalar,
Vector or Array
C
-
Value Pcesent Flag
DDD)
-
Object
0000
0001
0010
0100
1000
1100
EEE
-
Object Descriptor Extension
FF...F
-
Optional rank field
GG..G
-
Optional dimeasion fields
Access Constraints:
- uncoastrained
- walue constrained
- type constrained
- cank constrained
- limension constrained
- struzture constrained
vectors and arrays,
arrays)
VV...V
-
(present for arrays)
(present for
field repeats for
Value eacaded in as many cells as
required
Figure 6: InternaL repcasentation of an object
37
Program Representation
IV.C.1.
A PRO3RAM
3bject,, which
the TRANSLATE operator to
cceatei by applying
evaluates to
the program that invoka
rhis allows
exc2ptions.
appropriate
operator signals
TRANSLATE
then the
translatimn,
denotes a
program during
in the source
errors are detected
If
program.
in operand which
whose value
string object
a character
SYSTEM, is
of type
is an object
TRANSLATE to
decile whether to continue or to abort the operation.
Figure 7
As
of
c-iaracter
string
rhe second,
dhcived.
whici contains
a TEXT object,
an encoded
is
a
the
object
was
an object of type SYSTEM
is
linearization of
as a linkage vector for binding nonlocal symbols.
object, is
a SYSnBM
object
tree.
the program
The third, a LINKAGE object, is also a SYSTEM object.
SYIB)L
is
of
copy
PROGRAM
the
from which
object
PR3GRAM object
a
The first
elements.
five
comprised
illustrates,
(page 39)
which servas
It serves
The fourth, a
as a
symbol
table, containing such information as the symbolic names for all
tokens
boundary
the TEXT
ii
object.
address (3DY)
the
And
which is
to distinguish
used
is
fifth component
a
between
local and nonlocal symbol references.
An object of singular importance
soecifies the actual algorithm to be
eKecuting a
given program.
is
the TEXT object,
which
performed in the course of
It consists of
an ordered
set of
38
Bit: 3123456789ABCDEF
r
Cell 0
Calls
-
--
-
--
-
-
-
1O11131011111000I
ptr.
1-3
to SOURCE
Cells 4-6
ptr. to TEXT
Calls 7-9
ptr. to LINKAGEJ
--------------------------- 4
ptr. to SY&BOL
Cells 10-12
r-------------------
elemants tait
form shown in
or TEXT tokens of the
pointers
trand
are either op
Figure 8 (page 43).
Figure 9
(page 41)
contains
an example of a TEXT object (note that this object has undergone
symbil resolution; i.e., it
Program Activations
IV.C.2.
The
builtin
ATIVATIDN
copy of the
a
opecator
ACTIVATE
object
a
given
ENVIRONMENr objects.
in
is part of an ACTIVArION object).
way.
Its
35 bits of BDY,
local symbols
in this
and
to
create
one
or
an
more
It accamplishes this operation by making a
"activation area" of storage,
hih orler
used
PR3GRAM object
PROGRAM object and
privileged
is
then manipulating
functions
storing
include
the new object
allocating
an
the area aidress into the
creating the re4uired
"activation area",
instances of
binding operands
to
pcgram symbols, and resolviig nonlocal symbols by searching the
Bit: 0123456789ABCDEF
IABBCCCC2CCCCCCCI
A
-
Builtin/Defined Flag
This flag is set to 1 by the symbol resolution
mechanism (shen creating an ACTIVATION) it this
symbol resolves onto a builtin operator. Hence,
during execution, builtin operators are
immediately self-identifying.
BB
-
Operand Designator
00 - no operands (symbol is niladic)
01 - one operand (symbol is monadic)
10 - two 3perands (symbol is dyadic)
and first operand has no operands
11 - abitcary number of operands
rhis field indicates the number of operands
that are actually being passed to the symbol.
(It does not indicate the number of operands
that the symbol will accept; symbols may
caoose to accept varying numbers of operands)
Its purpose is to reduce the number of
operand poiaters that are required.
CC...2
-
Token Offset
If this offset is greater than tie low order
13 bits of BDY (in the ACTIVATI3N object),
then this offset points to an entry in the
nonlocal symbol LINKAGE object; else this
offset, waea appended to the higi order 35
bits of BDY, coastitutes the spice name of
an object that is local to this ACTIVATION.
A program may reference up to 8192 distinct
objects.
Figuce 8: Internal Representation of a TEXT token
40
Source program:
(ASSI3N
X (SUM Y (F-N1
(MAX
X Y Z)
(FCN2
Y))))
Corresponding TEXT object:
Bit: 312 3456769ABCDEF
Contaias
)11J13131111103311
object descriptor
11101
builtin dyadic opcode
'ASSIGN'
I
i----------------- I
13001
I-----------11101
niladic reference
X
'SUA'
3
I-----------------1
13001
builtin dyadic opcode
niladic reference
Yi
a-adic reference
-----------------
1
2
operand pointer
i
operand pointer
1111I
'MAX'
I
operand pointer
3
3
builtin n-adic opcode
1
operand pointer
operand pointer
3
niladic reference
13001
niladic reference
I -------
1000i
1
niladic reference
I ---------
10011
FCN2
monadic reference
1001
Y
niladic reference
Figure 9: Example of
a TEXT object
ENVIRONMENr objects
in the sequence provided
(each ENVIRONMENT
object may also specify a successor ENVIRONMENT object)
IV.C.3.
Program Execution
An ACTIVATION
builtin EXECUTE operator.
number of
the application
executed by
may be
of the
The execution of a program involves a
activities in the I-unit
and E-unit portions
of the
ma chine.
The I-unit is envisionel
which
linkage
operate under
fetch unit
fetch unit has
central
and an
control: a
instruction
its own cache from
sequential manner)
object component
as consisting of three major parts
the tokans
that are
of the program.
program in a top
unit reals the
fetch unit,
assembler.
The
which it reads (in
contained in
It is equiped
stacks so that it may conveniently
through the
token
a
token
a highly
the TEXT
with hardware
recurse when walking its way
down fashion.
contents of the LINKAGE object
The
linkage fetch
component of the
program in order to obtain tie space
name of an object to which
a nonlocal
The instruction
buills
symbol has
been bound.
logical instructions
for the
opcole and a list of the space
issues the
logical instructions
reply, which is
or an exception.
E-unit
assembler
by collecting
names for its operands.
to the
either the space name of
E-unit and
an
It then
awaits the
the resultant object,
The E-unit
does all
interlocks upon the
the actual
operaad space names.
the value component of an Agjregate
characteristics
of
logic-in-memory cache.
a machine that
fetching of
the
vArious
The
object is
functional
operands and
actual layout of
determined by the
units
and
the
It is crucial to the performance of such
its objects Da
internally orgaaized
in order to
maximize spacial locality since the one-level store will used so
extensively.
THIS
PAGE INI'EN'IONALLY
LEFT BLANK
V.
2ONCLUSIONS
This
paper has
wtich a machine
langiuage
investigated a
variety
that directly interprets a
might expect
to
achieve
of mechanisms
by
suitable high level
improved pecformance.
One
result of this effort is a catalog of such mechanisms, which may
b:
of some use in the design of high performance computers.
Another
sijnifican:e
result
is
an
iacreased
(in terms of performance)
macrhine languige.
It
is
understanding
of
the
of adopting a high level
now the author's view that the use of a
low level machine language, as an intermediate interface between
the
high
effects
level language
upon the
and
the
machine, has
potential execution
rate of
two
inherent
the high
level
language.
First,
relevant
the low
level interface
semantic information
program to the machine.
iatent of
the high level
restricts
which flows
the amount
froi the
apecations.
acceptable to
While,
with yesterday's
it was
sequ3nce of
context-free atomic orders, advances
a high
executing
The computer is deprivei of most of the
technology,
now permit
of
decompose a
performance Maciine
program into
a
in technology
to profitably
employ a
knowledge Df the macroscopic operators and operands.
45
Second,
artificial conustraints
camputatioa because
impacts
detaila:
conastraints have
a low level
tactical
long been
are imposei
language, by its
prescriptions.
an abstacle to
upon the
the
very nature,
These
the design
unwanted
of high
performance machines and will become even less acceptable as the
functional capabilities of hardware increases.
REFERENCES AND BIBLIOGRAPHY
Abbreviatians
Jou rnaal s:
ACMI
AF'IPS
BCS
E JCC
FJCC
IBM J.
3f Res.
and Dev.
IBM Sys.
J.
IEFEE
I FIP
NAE 0 N
SIGPLAN
WJCC
Association for Computing Machinery
American Federation of Information
Processing Societies
British Computer Society
Eastern Joint Computer Conference
Fall Joint Computer Conference
I3 Journal of Research and DeveloFment
11 Systems Journal
institute of Electrical and Electronics
Eagineers
International Federation for
Information Processing
National Aerospace Electronics
Conference
ACM Speial Interest Group on
Prograiming Languages
Spring Joint Computer Canference
Western Joint Computer Conference
General:
Bull.
Comm.
Conf.
Cong.
J.
Pro-.
Pt.
Pub.
Res.
Rept.
Supp.
Symp.
Tech. Rept.
Trans.
bulletin
Comm unications
Conference
Congress
Journal
Proceedings
Part
Publication
Research Report
Supplement
Symposium
rcanical Report
rcAnsactions
47
Books, Periodicals and Reports
AiraP73
Abrams, P. S.,
An APL Machine, Tech. Rept. No. 114,
Stanford Electronics Laboratories, Stanford University, February 197)
Alle?71
Allen, F. E.,
AlleL169
Allan, M. W.,
and T. Pearcy, Developments in Machine
Arciitecture, Proc. of the Fourth Australian Computer
ConE., Adelaide, South Australia, pp. 227-230, 1969
Anila364
Amdahl, 3. M.,
et al.,
The Structure of SYSTEM/360,
Part III - Processing Jnit Design Considerations, IBM
Sys. J., Vol. 3, No. 2, pp. 144-164, 1964
AndeD67
Anderson, D. W.,
F. J. Sparacio and R. M. Tomasulo,
The IBI System/36) Model 91: Machine Philosophy and
Instruction-Handling, IBM J. of Res. and Dev., Vol.
11, No. 1, pp. 8-24, January 1967
Brt R69
Barton,
R.
3rganization:
International
and John Cocke,
A Catalog of Optimizing
Transformations, Res. Rept. RC 3548,
IBM Thomas J.
Watson Research Centec, Yorktown Heights, New York,
September 1971
Science --
S.,
Ideas
for
Computer
Systems
A Personal
Survey, COINS-69, Third
Symp.
on Computer
and Information
Software Engineering,
December 1969
BA;hT67
3ashkow, T. R.,
at al.,
System Design of a FOETRAN
Machine,
IEEE Trans. on Electronic Computers, Vol.
EC-16, No. 4, pp. 485-499, August 1967
BashT68
Bashkow, T. R., et al., Study of a Computer for Direct
Processiag of List Processing Language, Tech. Rept.
No. 103, Columbia Jniversity, New York, January 1968
Bell71
3ell,
.,
1.
an1
A.
Newell, Computer
Structures:
Readings and Examples, McGraw-Hill Book Co., New York,
1971
BerkK69
Berkling, K.,
A Computing Machine Based on Tree
Structures and the Lambda Calculus, Res. Rept. RC
2589, IBM Thomas J. Watson Research Center, Yorktown
Heights, New York, August 1969
BjorD70
3jorner, D.,
On Higher Level Language Machines, Res.
Rept. RJ 792, IBM Research Laboratory, San Jose,
California, December 1970
48
BlacE59
Bloch, E.,
The
Engineering Design of
the
Computer, Proc. of the EJCC, pp. 48-59, 1959
BrawJ71
Brown, J. A.,
A generalization of APL,
Systems and
Information Science, Syracuse University,
September
1971
Bu:h462
3ucaholz, W.,
Plaaning a Computer System,
Book Co., New York, 1952
ChamD71
Chamberlin, D. D., The "Single-Assignment" Approach to
Parallel Processing, Res. Rept. RC 3308, IBM Thomas J.
Watson Research
Center,
Yorktown Heights,
New York,
March 1971
Chenr71a
Chen, T. C.,
Parallelism, Pipelining,
Efficiency, Computer
Design, Vol. 1),
53-74, January 1971
Chenr7lb
Chen, r. C.,
Unconventional Superspeei Computer Systems,
Proc. of thae SJCC, Vol. 38,
AFIPS Press,
New
York, pp. 365-371,
1971
Ches;71
Chesley,
G.
D.,
and
W.
R.
Smith,
The
HardwareImplemented High-Level Machine Language for SYMBOL,
Proc. of the SJCC, Vol. 38, AFIPS Press, New York, pp.
563-573, 1971
Chro 71
Chroust, G.,
Comparative Study
of Implementation of
Expressions, Tech. Rept.
TR
25.112, IBM Laboratory
Vienna, Austria, March 1971
CoatC69
Conti, C. J., Concepts for Buffer Storage, Tech. Rept.
rR 30.1352, IB& Poaghkeepsie Laboratory, February 1969
CurtR71
Curtis, R. L., Management of
High Speed Memory in the
STAR-110 Computer,
Proc. of
the IEEE
International
Computer Society Cont., pp. 131-132, September 1971
Davi?71
Davis, R. L., and S. Zucker,
Structure of
a Multiprocessor
Using Aicroprogrammable
Building
Blocks,
NAECON Record, pp. 186-200, May 1971
D? nnj69
Dennis, J. B., Programming Generality, Parailelism and
Computer Architecture, Proc. of the IFIP Cong. 1968,
North-Holland, Amsterdam, pp. 484-492, 1969
Dnn.J70
Dennis, J. B.,
Modular,
Asynchronous Control Structures for a High Paerformance
Processor, Record of the
Project MAC Conf. on Concurrent Systems
and Parallel
Computation, ACI, New York, pp. 55-80, June 1970
Stretch
McGraw-Hill
and Computer
No.
1, pp.
49
DanJ71
Dennis, J. B.,
3n the Design and Specification of a
Common Base Language, Symp. on Computers and Automata,
Polytechnic Institute of Brooklyn, April 1971
DijkE63
)ijkstra, E. W.,
3o To Statement Considered Harmful,
letter to the Editor, Comm. of the ACM, Vol. 11, No.
3, pp. 147-148, March 1968
DijkE70
)ijkstca,
E.
W.,
Structured
Engineering Techaigues,
NATD, Brussels 39,
Programming,
Scientific Aftairs
Belgium,
pp.
84-88,
Software
Division,
April 1970
Elso 69
Elson, M.,
R. A. Larer, et al.,
A Prototype PL/I
3ptimizing Compilar, IBM Tech. Rept. rR 44.0071, IBM
Systems Development
Division, Boulier,
Colorado,
November 1969
Elso47
Elson, M.,
and S. T. Rake, Code-Generation Technique
for Large-Language Compilers, IBM Sys. J., Vol. 9, No.
3, pp. 155-188, 1370
Flei17l
Fleisher, H.,
A. deinberger and V. D. Winkler, The
iriteable Personalized Chip, Computer Design, Vol. 9,
No. 6, pp. 59-65, June 1970
Flin3i73
Plinders, M., et al.,
Functional Memory as a General
Purpose Systems rechnology, Proc. of the IEEE International Computer 3roup Conf., pp. 314-324, June 1970
Flyn165
Flynn, K. J., aal 3. M. Amdahl, Engineering Aspects of
Large
High
Speed
Computer
Design,
Symp.
on
Microelectronics and Large Systems, Spartan Books, pp.
77-95,
1965
FlynI66
Flynn, M. J., Vecy High Speed Computing Systems, Proc.
of the IEEE, Vol. 54,
No. 12, pp. 1901-1939, December
1966
Flvn172
Flyan,
M. J.,
and A. Podvin,
Shared
Resource
processing, Computer (A Pub, of the IEEE
Society), pp. 2)-23, March/April 1972
Multi-
Computer
Fost271
Faster, C. C.,
Jncoapling
Central Processor and
Storage Device Speeds, The Computer Journal, A Pub. of
the BCS, Vol. 14, No. 1, February 1971
Gird?71
jariner, P. L.,
Functional Memory and Its Microprogramming Implications, IEEE Trans. on Computers,
Vol.
C-23,
No.
7,
pp.
764-775,
July 1971
50
GictJ71
ertz, J.
L.,
Hierarchical Associative
Memories for
Parallel Computation, Tech. Rept. MAC TR-69, Project
4AC, Massachusetts Institute of Technology, Cambridge,
June 1970
HAssA71
Hassitt,
A.,
Microprogramming
ani
High
Level
Languages, Proc. of tie IEEE International ComFuter
3ociety Conf., pp. 91-92, September 1971
Han169
Henle, R. A.,
et al., Structured Logic,
Proc. of the
FJC2, Vol. 35, AFIPS Press, iew York, pp. 61-67, 1969
IHbbL71
Hobbs, L. C. (Editor), et al., Parallel Processor Systems, Technologies, and Applications, Spartan Books,
New York, 1970
Huss373
Husson, S. S., Microprogramming:
tices, Prentice-Hall, 1970
IlifJ63
Iliffe, J. K.,
Basic Machine Principles,
Elsevier Publishing Co.(in Europe: MacDonald,
New York, 1968
JohnJ71
Johnston, J. B., Tae Contour Model of Block Structured
Processes, SIGPLAN Notices, Vol. 6, No. 2, pp. 55-82,
February 1971
JoseE69
Joseph, E.
Proc.
of
Amsterdam,
C.,
the
pp.
Principles and PracAmerican
London),
computers: Trends Toward the Future,
IFIP
Cong.
1968,
North-Holland,
665-677,
1969
LawsJ63
Lawson, H. W.,
Jr.,
Programming-Language-oriented
Instruction Streams, IEEE Trans.
on Zomputers, Vol.
%-17, No. 5, May 1968
MainS71
Madnick, S. E., An Analysis
Project MAC, Massachusetts
December 1971
Mc:rD71
IcCracken, D.,
and
3.
of the Page Size Anomaly,
Institute of Technology,
Robertson, C.ai(P.L*)
--
An L*
Processor for C.ai, Rept. No. CMU-CS-71-106, Dept. of
omputer Science, arnegie-Mellon University, October
1971
McFa7)
IcFarland, C.,
A Language-Oriented -omputer Design,
Proc. of the FJ2C, Vol. 37, AFIPS Press, New York, pp.
629-640, 1970
McKei67
jqJ64
IcK;eeman, W.
M.,
Proc. if the FJCC,
413-417, 1967
Language Directed
Computer Design,
Vol. 31, AFIPS Press, New York, pp.
Meggitt, J. E.,
A Character Computer
for High-Level
Language Interpretation, IBM
Sys. J., Vol. 3,
No. 1,
pp. 68-78, 1964
Ml69
M1elliar-Smith, P. M., A Design for a Fast Computer for
Scientific Calculations, Proc. of the FJCC,
Vol. 35,
AFIPS Press, New York, pp. 201-208, 1959
MiLlI73
Mills,
H. D.,
Structured Programming,
Unpublished
Paper,
IBM Corporation,
Federal Systems Division,
Gaithersburg, Maryland, October 1970
Mos e7)
1
oses, J. , The Functioi of FUNCTION in LISP or Why the
FuNARG
Problem Shioall
be
Called
the
Environment
Problem, Artificial Intelligence Memo No. 199, Project
4AC, Massachusetts Institute of Technology, June 1970
MullA63
Mullery, A.
P.,
R. F. Schauer
and R. Rice,
ADAMi: A
Problem Driented Symbol Processor, Proc. of the SJCC,
Vol. 23, AFIPS Press, New York, pp. 367-380, 1963
OcjaB71
)rganick, E. I.,
and J. G. Cleary,
A Data Structure
Model of
the 357)) Computer System, SIGPLAN Notices,
Vol. 6, No. 2, pp. 83-145, February 1971
Plim72
Plummer, W. W.,
:omputers, Vol.
RalsA65
Ralston, A.,
A First Course in Numerical Analysis,
McGraw-Hill Book Company, New York, 1955
Rala:69
Ramamoorthy, C. V., and M. J. Gonzalez, A Survey of
rechniques
for
Recognizing
Parallel
Processable
Streams in Computer Programs, Proc.
of the FJCC, Vol.
35, AFIPS Press, New York, pp. 1-15, 1969
RanaC71
lamamoorthy, C. V.,
and
M. J. Gonzalez, Subexpression
3rdering in the Execution of Arithmetic Expressions,
Comm. of the ACM, pp. 479-485, July 1971
RiceR71
Rice, R., and W. d. Smith,
SYMBOL - A lajor Departure
from Classic Software Dominated
von Neumann Computing
Systems, Proc. of the 3JCC,
Vol. 38, AFIPS Press, New
fork, pp. 575-587, 1971
Asynchronous Arbiters, IEEE Trans. on
C-21, No. 1, pp. 37-42, January 1972
52
RPse363
Rosen,
S.,
Hardware
Design Retlecting
Software
Requirements, Proc. of the FJCC, Vol. 33, Pt. 2, AFIPS
Press, New YorK,
pp.
1443-1449,
1968
R>3s264
Ross, 2., et
al., A New Approach to Computer Ccmmand
Structures, Tech. Rept. No.
RADC-TDR-64-135, Rome Air
Development Center, Griffis Air Force Base, May 1964
RaggJ69
Ruggiero, J. F.,
and D. A. Coryell, An Auxiliary
Processing System for Array Calculations, IBM Sys. J.,
Vol. 8, No. 2, pp. 118-135, 1969
SammJ69
Sammet, J. E.,
Fundamentals,
717-719, 1969
Schr472
and J.
H. Saltzar, A Hardware
Schroeder,
M. D.,
Architecture for Implementing Protection Rings, Comm.
of the A:M, Vol. 15, No. 3, pp. 157-170, March 1972
SenzD65
Senzig, D. N., and R.
V. Smith, Computer Organization
for Array Processing, Proc. of
the FJ2O, Vol. 27, Pt.
1, AFIPS Press, New York, pp. 117-128, 1965
SenzD67
on High Performance
Senzig, D.
N.,
3bservations
lachines, Proc. of tae FJCC, Vol. 31, AFIPS Press, New
York, pp. 791-799, 1967
SethR70
Sethi,R., and J. D. Ullman, The Generation of Optimal
Expressions, J. of the ACM, Vol.
-ode for Arithmeti
17, No. 4, pp. 715-728, October 1970
ShAwJ53
Structure for Complex
et al., A Command
Shaw, J. C.,
Information Processing,
Proc. of
the
WJCC,
pp.
119-128, 1958
SinqS71
Adder for Multiple Operands
Singh, S., and R. Waxman,
IBM
Tech.
and its Application for Multiplication,
Components Division, East
IBM
rR
22.1356,
R3apt.
Fishkill, New York, 3ctober 1971
Stinl7l
Stone, H. S.,
on Computers,
SamnF71
Sumner, F. H., Operand Accessing in tae MU5 Computer,
Computer Society
the IEEE International
of
Proc.
1971
September
pp. 119-120,
.onf.,
Languages: History
Programming
Section
X.6.,
Prentice-Hall,
and
pp.
A Logic-in-Memory Computer, IEEE Trans.
pp. 73-78, January 1970
53
T
3rj7)
Thornton, J.
E.,
Design of
Data 6609, Scott, Foresman
Illinois, 1970
a Computer:
and
The Control
Company,
Glenview,
TharK71
Thurber, K. J., and J. W. Myrna, System Design of a
2ellular APL Computer, IEEE Trans. on Computers, Vol.
2-19, No. 4, pp. 241-3)3, April 1970
TiurK71
Ihurber, K. J.,
and R. 0. Berg,
Applications of
Associative Processors, Computer Design, Vol. 10, No.
11, up. 103-110, November 1971
Tjad71
Tjaden, 3. S., and M. J. Flynn, Detection and Parallel
Execution of Independent Instructions, IEEE Trans. on
-omputers, Vol. 2-19, No. 10,
pp. 884-895,
October
1970
TimaR67
romasulo, R. M.,
An Efficient Algarithm for the
Automatic Exploitation of :ultiple Execution Units,
IBM J. of Res. ani Dev.,
Vol. 11, No. 1, pp. 25-33,
January 1967
raikA71
rucker, A. B.,
and M. J. Flynn, Dynamic Microprogramming: Processor Organization and Programaing,
Comm. of the ACM, Vol. 14, No. 4, ACM, New York, pp.
243-253, April 1971
Ware472
dare, W. H., The Jltimate Computer, IEEE Spectrum, pp.
34-91, March 1972
7
Weber, it.,
A Microprogrammed Implementation
of EULER
an IBM System/363 Model 30, Comm. of the ACM, Vol. 10,
No. 9, pp. 549-553, September 1967
W.-oebe6
ZitsR71
Zaks,
R.,
Microprogrammed
APL, Proc.
International Computec
Society Conf.,
September 1971
of
pp.
the
IEEE
193-194,
Minu9is ani Miscellaneous
F1]
IBM System/350 Model 195, Functional Characteristics, Form
A22-6943-0, International
Business Machines Corporation,
Pougheepsie,
F2]
IBM
New Yock,
System/360 Model
August
195,
1969
Theory
of Operation:
System
Introduction and Instruction Processor, Form SY22-6855-0,
International Business Machines Corporation, Poughkeepsie,
New fork, August 1970
[13]
NCR
304 Electronic
Data
Processing Systam,
Programming
Manual, National Cash-Register Company, June 1960
(obtained through the courtesy of Jean Sammet)
[
]
,5]
Control Data
7600 Coumputer System,
Preliminary Reference
Manual, Pub. No.
60258230,
Control Data Corporation,
Minneapolis, Minnesota, 1969
Burroughs 36500/B7500 Information
Processing
Characteristics Manual, Burroughs Corporation,
Michigan,
Systems,
Detroit,
1967
[3'5]
IBM System/370 Model 165, Functional Characteristics, Form
GA22-6935-3, International Business Machines Corporation,
White Plains, New York, June 1970
[17]
A Guide to the IBM System/370 Model 165, Form GC20-1730-0,
International Business Macines Corporation, White Plains,
New York, June 1970
[13]
IBM System/370 Model 155, Theory
of Operation (Volume 2):
I-unit, Form SY22-6831-0,
International Business Machines
Corporation, White Plains, New York, January 1971
F19)
APL/360 User's Manual, Form Gd20-0683-1,
International
Business Machines Corporation, White Plains, New York,
March 1970
['10] PL/I
Language
Specifications,
Form
Y33-6003-1,
International Business Machines Corporation,
White Plains, New
York, April 1969
[%111] IBM
System/360
Operating
System,
PL/I
(F)
Language
Refarence Manual, Form C2d-8201-2, International Business
Machiaes Corporation, 4hite Plains, New York, October 1969
[412] Conversational Programming
Manual,
Fora GH20-0758-0,
System
(CPS),
Ierminal User's
International Business Machines
Corporation, White Plains, New York, January 1970
55
[113] Weissman, Z.,
LISP 1.5 Primer, Dickenson
Belmont, California, 1967
Publishing Co.,
[114] McCacthy, J.,
et al., LISP
1.5 Programmer's Manual, MIT
Press, Cambridge, Massachusetts, February 1965
[115] Griswold, R. E.,
et al.,
Language, Prentice-Hall, 1968
The
SNOBOL
4
Programming
[116] Sussman, G. J., and I. Winograd, Micro-Planner Reference
Manuil, Artificial Intelligence Memo No. 2M3, Project MAC,
Massachusetts Institute of Technology, July 1970
[117] Evans, A., Jr.,
PAL -Pedagogic Algorithmic Language,
Reference
Manual and
Primer,
Dept. of
Electrical
Engineecing,
Massachusetts Institute
of
Technology,
October 1970
[118] Reynolds, J. C., GEDANKEN - A Simple Typeless Language
Based on the Principle of Completeness and the Reference
Concept, Comm. of the ACM, Vol. 13, No. 5, ACM, New York,
pp. 338-319, May 1973
-
FINIS
-
Download