APPLICATION OF MICROCODE TO IMPROVE PROGRAM PERFORMANCE
An Honors Thesis (ID 499)
by
Dawn D. Greenwood
Thesis Director: Clinton P. Fuelling
Ball State University
Muncie, Indiana
May 18, 1981
Spring 1981
CONTENTS

INTRODUCTION
MICROPROGRAMMING OPERATING SYSTEMS
ARCHITECTURE REDEFINITION VIA MICROPROGRAMMING
     Manual Tuning
     Automated Process
          Frequency of Sequential Occurrence
          Frequency of Execution
          Combination Methods
OPTIMIZATION
     Vertical Optimization
          Removal of Nonessential Operations
               Redundant Actions
               Negated Actions
          Code Motion
          Code Consolidation
     Horizontal Optimization
          Local Compaction
          Compaction Algorithms
               Linear Algorithm
               Critical Path Algorithm
               Branch and Bound Algorithm
               List Scheduling
          Global Optimization
     Register Allocation
INTRODUCTION
The use of microcoding to improve program performance has introduced
new areas of study in recent years.
Microcoding operating systems and
architecture redefinition via microprogramming are two such areas.
High-level microprogramming languages are also commanding great interest
and research.
All these techniques require optimization algorithms to
assure optimal execution times.
These areas will be considered in this
paper.
Due to the difficulty of locating resources, many articles were not
available for review.  In such cases, care has been taken to guide the
reader to the article by referencing it where appropriate.
A number of
books and articles in the bibliography offer a basic look at
microprogramming in general as well as the subjects covered in this
paper [AGRA74, AGRA76, BROA74, CHUR70, DAVI78, FULL76, HIGB78, HINS72,
JONE76a, JONE76b, KRAF81, LEVE77, RAUS80, ROSI74, SALI76].
MICROPROGRAMMING OPERATING SYSTEMS
Since "users seldom (if ever) perform a computation without
assistance from the operating system," [DENN71] any improvements in the
execution rates of the most frequently used segments should reflect
favorably on application programs.  One approach to improving operating
system time is to microcode primitives used throughout the operating
system.
Larger segments can also be selected.
Choosing the most
appropriate segments is important to performance improvements [RAUS80,
BROW77].
A list of fourteen functions which should be
considered for microcoding appears in [BROW76].
These functions tend to be too exact for
hardware, relatively consistent, and noticeably affected by software
overhead.
Several applications in the area of software have been
discussed [HUBE70, BELG76, DHAU77, CHAT76, LUND76, DENN71, STOC77,
SOCK75].
ARCHITECTURE REDEFINITION VIA MICROPROGRAMMING [RAUS76]
Called "tuning" by [ABDA74, ABDA73], the central concept is to
change a system's architecture to more efficiently solve a particular
problem.
Architecture in this sense refers to "the attributes of a com-
puter as seen by the programmer" [RAUS76].
This includes the memory and
particularly the machine instruction set which the programmer interacts
with.
The microprograms in control store which interpret the machine
instructions define a particular architecture.
By modifying these mi-
croprograms, the architecture changes accordingly.
Such modification is necessary for several reasons.
Typical in-
struction set design has followed hardware constraints instead of considering the problems to be solved and their structure.
The results are
instruction sets which are awkward and inefficient [RAUS76, WADE75].
In
addition, the general-purpose architecture was conceived through a
series of compromises that permit it to perform the widest range of
services.
Like the jack-of-all-trades, it is frequently master of none
and handles all its functions in a suboptimal fashion [HUSS70].
To counteract these compromises and tune the instruction set, carefully chosen segments of machine code are microcoded, placed in control
storage and referenced by one new machine language instruction.
The
major advantage of this approach is the decrease in machine instruction
fetch-and-decode time.
Considering the average (FORTRAN) program spends
forty percent of its execution time on this function, it is estimated
that savings can easily be twenty percent [RAUS76].
Additionally, the microcode can be optimized.
It is important to stress that optimization
strategies are vital to architecture redefinition success [ABDA74].
The
final section covers optimization strategies.
Isolation of the appropriate segments can be a manual process based
on environmentally determined areas of processing inefficiency.
An au-
tomated process can be based on the frequency of sequential occurrence
in an environment, the frequency of execution, or a combination thereof.
[RAUS76]
MANUAL TUNING
Manual tuning is the creation of specialized microcoded instructions
when the code is written by a programmer directly or in some microprogramming language.
[ABDA74] applied the term to tuning efforts prior to
when writable control store made automated tuning feasible.
It is no
less applicable to vendor packages and installation-written code
designed to "speed up" a particular feature or activity.
Those activities most likely to benefit by microprogramming are CPU
bound, use a number of intermediate results, are highly repetitive, or
are not easily handled by existing machine language instructions
[CLAP72].
Such features include multiply routines, square root al-
gorithms, matrix operations, table searches, array address calculation,
number generation, stacking routines, and tree searches [COOK70, HUSS70,
TUCK71, CLAP72, REIG72, REIG73, PARK73, SNYD75, HABI74, TAFR75].
In addition, applying manual tuning concepts when an instruction set
is being designed will assure an application oriented architecture
[WADE72, WADE73, HARR73, WADE75].
[WADE75] capitalized on the sim-
ilarity of languages' constructs to design a "general-purpose" architecture capable of supporting a number of widely-accepted programming
languages.
Wade's basic instruction set included simple arithmetic,
compare, branch, logic, and load/store instructions.
In addition to
this "kernel" [RAUS76] language, Wade designed a set of "super
instructions" to efficiently execute five common high-level constructs.
These operations include arrays and array indexing, repetition loops,
block structure housekeeping, input/output editing, and character string
manipulation.
Each feature will be examined briefly, with somewhat more
detail on array indexing to allow for better understanding of the process of developing application oriented instructions.
ARRAYS AND ARRAY INDEXING
Wade designed a new machine language instruction ADGEN.
This in-
struction enables a microprogram which was specially written and placed
in control storage by Wade to calculate the address of an array element
and use that location in one of several ways.
A high-level language
version of the algorithm used to calculate the address is given below.
    sum = 0
    for i = 1 step 1 until n do
        begin
        sum = sum * (upper-bound[i] - lower-bound[i] + 1)
        sum = sum + (subscript[i] - lower-bound[i])
        end
    sum = sum * element-size
    computed-address = array-address + sum
The microcoded algorithm uses information stored in the instruction
and the array descriptor.
These formats are given below.
ARRAY DESCRIPTOR FORMAT

n | array | lower-bound-1 | upper-bound-1 | ... | lower-bound-n |
upper-bound-n | element size (in bytes)

where:
    n     = number of dimensions of array
    array = address of array in memory
INSTRUCTION FORMAT

ADGEN | address of array descriptor | subscript-1 | ... | subscript-n |
flags | index register

where:
    ADGEN          = instruction operation code
    flags          = indicate action to be performed with the location
    index register = receives the calculated address
The algorithm is designed for arrays stored in row order internally.
For those stored by columns (FORTRAN), the subscript fields are
interchanged.
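
To make the calculation concrete, the following sketch in a present-day
language (Python) carries out the same address computation.  It is an
illustration only; the descriptor tuple and the function name are
assumptions for the example, not Wade's notation.  The column-order
(FORTRAN) case is handled by interchanging the subscript fields, as
described above.

    # Sketch of the ADGEN address calculation (illustrative only; the
    # descriptor layout and names are assumptions, not Wade's design).

    def adgen_address(descriptor, subscripts, row_order=True):
        """Compute the address of an array element.

        descriptor: (n, array_address, [(lower, upper), ...], element_size)
        subscripts: one subscript per dimension
        """
        n, array_address, bounds, element_size = descriptor
        if not row_order:                  # column-order storage (FORTRAN):
            bounds = bounds[::-1]          # interchange the subscript fields
            subscripts = subscripts[::-1]
        total = 0
        for (lower, upper), subscript in zip(bounds, subscripts):
            total = total * (upper - lower + 1)   # sum = sum * (ub - lb + 1)
            total = total + (subscript - lower)   # sum = sum + (sub - lb)
        return array_address + total * element_size

    # Example: a 3 x 4 array of 4-byte elements at address 1000.
    desc = (2, 1000, [(1, 3), (1, 4)], 4)
    print(adgen_address(desc, [2, 3]))     # 1000 + ((1 * 4) + 2) * 4 = 1024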
Repetition Loops (DO- or FOR-loops)
Observing that such loops can vary in intricacy, Wade designed four
different instructions.
Two instructions handle the simpler cases
requiring little more than an increment-compare-branch sequence, while
the remaining instructions were used for the more complicated loops.
Block Structure
Block structures generate overhead due to the housekeeping involved
in the dynamic manipulation of storage necessary for nested BEGIN blocks
and procedure CALL/RETURN statements.
When handled by inappropriate
instruction sets, that overhead becomes excessive.
To implement these
functions in microcode, BEGIN Block and END Block instructions, two
procedure calls, and a single RETURN statement were designed.
Input/Output Editing
I/O editing refers to the conversion of decimal character numbers
into binary form for internal storage and manipulation and the conversion back to decimal numbers for external use.
The instruction CONVERT
TO DECIMAL converts one binary number to its decimal representation.
Input instructions were provided for conversion to binary form.
Character String Manipulation
Wade's goal was to provide for the majority of the manipulation
features of PL/1.
A string descriptor similar to that of arrays
provided direct and indirect addressing capabilities.  This simplifies
usage of varying-length strings.  MOVE and COMPARE statements were
designed to best handle these features.
PERFORMANCE RESULTS
Wade's improved machine language instruction set was implemented and
its performance compared to that of the IBM 370 architecture.
A number of PL/1 and FORTRAN programs were executed on both machines.
The following chart lists the execution time improvements realized by
the high-level language oriented architecture over the IBM general purpose
architecture.
                                                     Percent Improvement
Program Function                                        PL/1    FORTRAN

Finding prime numbers by sieve of Eratosthenes          32.2%
Generates random sentences from English grammar         64.2%
Convert arithmetic expression from infix to
  postfix notation                                      87.4%    54.2%
Multiplication of two matrices                          72.3%    31.4%
Evaluation of a determinant                             85.9%
Adds record to linked list                              27.9%
Solving differential equation by fourth order
  Runge-Kutta method                                    51.7%    50.5%
In general, those programs performing non-numeric processing showed
greater performance improvements.
This is in keeping with the findings of Hans Jeans [JEAN65].
Jeans found that arithmetic operations were
hardware bound and that the decrease in instruction fetch and execute
time was less significant than in logical operations.
AUTOMATED PROCESS
With the increased use of the writable control store, thought turned
to automating the manual process of tuning.
In this way each group of
similar programs, or each program, can have its own machine architecture
with instructions chosen to assure its optimal execution [ELAY77].
This
architecture is dynamically loaded into control store prior to program
execution.
When a unique architecture is created for each program, the
process is called dynamic problem oriented redefinition of computer architecture via microprogramming [RAUS76, RAUS75, RAUS78, RAUS80].
When
similar programs are grouped together and an improved instruction set
created for each group, this is called heuristic tuning [ABDA74].
Several methodologies exist for determining which groups of instructions should be replaced by new machine instructions to yield the
greatest execution gains with minimal control store requirements.
(Because of its expense, control store must be considered a limited
resource.
If this were not the case, entire programs could be
microcoded.)
These include the frequency of sequential occurrence and
the frequency of execution.
Heuristic tuning [ABDA74] and [RAUS76]'s
dynamic redefinition of architecture enlist both methods in an attempt
to achieve optimal performance improvement.
Frequency of Sequential Occurrence
This method is based on the observation that instructions characteristically occur in identical sequences.
This is found not only on the
machine instruction level, but also on the intermediate level generated
by high-level language compilers.
The intermediate language code is
analyzed to locate such sequences and to determine how frequently they
occur.
From this information, sequences which would yield the greatest
execution time savings if microcoded are selected for new machine
instructions.
The difficulty here is the length of the sequences to be chosen.
Longer sequences occur less frequently, while shorter sequences
occur more often.
While execution time improvements are achieved through this process,
they are not optimal.
Typically, no attempt is made to determine rela-
tionships between sequences.
Additionally, run time information is not available.  There is no way
to determine if some sequences are executed repeatedly, as in a loop.
[RAUS76]
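
As a rough illustration of the counting this method requires (a sketch,
not an algorithm from the literature), the following Python fragment
tallies every run of k consecutive operations in a straight-line
intermediate-code stream and ranks candidates by a crude static benefit;
the window length and the benefit formula are assumptions.

    # Count each k-long operation sequence; rank by a crude static
    # benefit: occurrences * (k - 1) instructions eliminated.
    from collections import Counter

    def candidate_sequences(ops, k):
        runs = Counter(tuple(ops[i:i + k]) for i in range(len(ops) - k + 1))
        return sorted(((count * (k - 1), seq, count)
                       for seq, count in runs.items()), reverse=True)

    ops = ["LOAD", "ADD", "STORE", "LOAD", "ADD", "STORE", "LOAD", "MUL"]
    print(candidate_sequences(ops, 3)[0])  # (4, ('LOAD', 'ADD', 'STORE'), 2)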
Frequency of Execution
This method uses the execution behavior of a program to choose the
instructions to be microcoded.
The program is analyzed into program blocks.
These are straight-line sections of code such that if the first
operation in the block is performed, all succeeding code is executed.
By knowing the number of times the block is executed and the execution
time improvement of a single microprogrammed block over that of the corresponding machine instructions, total execution time savings can be
calculated.
Those blocks with the greatest savings are chosen as new
machine language instructions.
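
The selection arithmetic itself is straightforward.  The Python sketch
below is illustrative only; the block names and timing figures are
assumed.

    # total savings = executions * (machine-code time - microcoded time)

    def block_savings(blocks):
        """blocks: (name, executions, t_machine, t_micro) tuples,
        returned ranked by total execution-time savings."""
        return sorted(((execs * (t_mach - t_micro), name)
                       for name, execs, t_mach, t_micro in blocks),
                      reverse=True)

    blocks = [("inner loop body", 100000, 12.0, 4.0),
              ("setup code", 1, 50.0, 20.0)]
    print(block_savings(blocks))   # the loop body dominates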
Combination Methods
Rauscher and Agrawala [RAUS76, RAUS75, RAUS78] devised a dynamic
redefinition scheme combining the previous two methods.
Program
sequences are considered for microcoding on the basis of both their frequency of occurrence and their frequency of execution.
The intermediate representation from the high-level language compiler is analyzed to determine the different instruction sequences and
where they occur in the program.
The expected savings from microcoding
each segment are calculated.  Estimates are made on how often each
segment will be executed.
With this additional information, total
projected time saved can be calculated.
Those instruction sequences
with the highest potential time savings are microcoded as new
instructions.
Experimentally, execution time improvements were found to
be between twenty-five and fifty percent.
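
A minimal sketch of the combined ranking follows; every figure is
assumed for illustration.

    # Rank sequences by projected total time saved: static occurrences
    # in the text, times estimated executions per occurrence, times the
    # time saved on each execution.

    def projected_savings(candidates):
        return sorted(((occ * execs * saved, seq)
                       for seq, occ, execs, saved in candidates),
                      reverse=True)

    candidates = [(("LOAD", "ADD", "STORE"), 4, 2500, 3.0),
                  (("CMP", "BR"), 9, 100, 1.0)]
    print(projected_savings(candidates))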
In heuristic tuning [ABDA74], programs are classified according to
their application, language and/or special requirements.
A trace file
for each application class is built by monitoring the execution of numerous programs from that class.
This data may be collected through
hardware, software or microprogrammed means.
The trace file data is
then processed to obtain statistics on instruction execution frequencies, instruction dependencies and similar information.
These sta-
tistics make it possible for a synthesis algorithm to modify existing
microcode or create new microprograms which will allow optimal execution
of class programs.
The first step performed by the algorithm is the location of program
loops using trace file information.
Within the most frequently executed
loop, an optimization step takes place.
In it, arrangements are made so
that the most frequently accessed data is preloaded into internal
registers.
Next, the instructions within the loop are decomposed into
micro-operations, optimized, and loaded into control store.
The trace
file is updated and an estimate of the performance improvement is made.
This algorithm repeats, each time choosing the loop to be tuned from
those remaining.
It terminates when the algorithm determines that any
further execution improvements could not be greater than the overhead
they generate.
If the algorithm determines there are either no loops or very long
loops, synthesis of new instructions can still take place.
The trace
file supplies the most frequently occurring code sequences and an algorithm similar to the one used above combines the dependent code into
new instructions.
The new instruction set is tested to determine that it is functionally equivalent to the general instruction set.
Once its results can be
trusted, the new architecture is ready to be incorporated into the existing system.
Several software modifications must be made.
Two programs were heuristically tuned and their original execution
compared with that using the synthesized instruction.
A data movement
example resulted in an 87.4% improvement in execution time.
A Fibonacci
number generation program achieved a 77% improvement [ABDA74].
It
should be emphasized that these examples reflect specific algorithms.
More complicated application programs would probably achieve somewhat
less of an improvement.
OPTIMIZATION
Optimization has several meanings depending on what concepts the
particular author wants to emphasize.
While optimization can refer to
decreasing the number of bits in a microinstruction, this material will
concentrate on methods of reducing the execution time by decreasing the
number of microinstructions in a microprogram [MALL78].
There are two categories of algorithms which optimize microcode.
Vertical optimiza-
tion deals with decreasing the number of microoperations in a sequence
of microoperations.
Horizontal or microcode optimization, also known as
compaction, is the process of creating microinstructions in a machine
whose control word allows a great deal of parallelism.
This means a
large number of hardware devices can perform concurrently, and therefore
a number of microoperations can execute simultaneously.
Compaction can
be performed locally within straight-line sequences or globally
throughout the program.
Additionally, register allocation is discussed
because of its relationship to program performance.
Vertical Optimization
Until recently, microcode was painstakingly optimized by hand.
When
automation of the process was considered, the natural route was to
modify processes that were used in standard compilers on predominantly
sequential code [ALLE69, ALLE72].
Kleir and Ramamoorthy [KLEI71] were
the first to consider vertical optimization algorithms for microcode as
an alternative to hand optimization.
One method of optimization borrowed from traditional compilers is
removal of nonessential operations such as redundant actions and negated
actions.
Redundant actions are those available from a previous operation
which used identical input and identical destination of output.
Additionally, the output destination must have remained unchanged, and
the execution of one action must guarantee execution of the other to assure consistent performance.
Negated actions are those from which the
output is never used.
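
The following Python sketch illustrates both removals over a
straight-line list of register-to-register micro-operations; the
three-field representation and the live-on-exit set are assumptions
made for the example, not a form used by [KLEI71].

    # Each op is (dest, operator, inputs).  First sweep backward dropping
    # negated actions (outputs never used); then sweep forward dropping
    # redundant actions (identical inputs and destination, both unchanged).

    def remove_nonessential(ops, live_out):
        live, kept = set(live_out), []
        for dest, operator, inputs in reversed(ops):
            if dest in live:                  # output is eventually used
                kept.append((dest, operator, inputs))
                live.discard(dest)
                live.update(inputs)
        kept.reverse()

        available, result = set(), []
        for dest, operator, inputs in kept:
            key = (dest, operator, tuple(inputs))
            if key in available:              # redundant: value already there
                continue
            result.append((dest, operator, inputs))
            # writing dest invalidates earlier facts that mention dest
            available = {k for k in available
                         if dest not in k[2] and k[0] != dest}
            available.add(key)
        return result

    ops = [("t1", "add", ["a", "b"]),
           ("c", "mov", ["t1"]),
           ("t1", "add", ["a", "b"]),         # redundant action
           ("t2", "mul", ["a", "a"])]         # negated action: t2 never used
    print(remove_nonessential(ops, live_out={"c", "t1"}))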
Code motion is the act of decreasing the number of operations performed
in those segments of the program most frequently executed.  Using
program activity as a guide, dynamic analysis rates areas as to their
execution frequency to determine proper code movement.  Unfortunately,
dynamic analysis is prone to suffer from false assumptions and extensive
time usage.
Code motion without the benefit of execution statistics is
called static analysis.
It is assumed that the level of nesting indi-
cates frequency of execution.
Actions are migrated from inner to outer
segments [KLEI71].
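
A minimal sketch of static code motion under these assumptions: each
operation is a dest/operator/inputs triple, and an operation is treated
as loop-invariant only if none of its inputs is assigned in the loop and
its destination is assigned exactly once.

    from collections import Counter

    def hoist_invariants(preheader, loop_body):
        """Migrate loop-invariant operations from the inner segment
        (the loop body) to the outer segment (its preheader)."""
        writes = Counter(dest for dest, _, _ in loop_body)
        body = []
        for dest, operator, inputs in loop_body:
            if all(writes[i] == 0 for i in inputs) and writes[dest] == 1:
                preheader.append((dest, operator, inputs))  # executed once
            else:
                body.append((dest, operator, inputs))
        return preheader, body

    pre, body = hoist_invariants(
        [], [("k", "add", ["x", "y"]),   # x, y never written here: hoist
             ("i", "add", ["i", "k"])])  # depends on i: stays in the loop
    print(pre, body)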
Code consolidation is another possible vertical optimization.
It is
feasible on certain machines that a sequence of simpler instructions can
be replaced by one complex instruction.
This concept has been used in
the PL/MP compiler [TAN78].
The PL/MP compiler takes a high-level language and, by applying both
machine dependent and machine independent optimizations, produces
microinstructions.
Similar compilers and the corresponding high-level
languages are discussed in a number of articles [ECKH71, TIRR73,
BLAI74, BOND74, CLAR72, DASG78b, DASG80, LEWI79, DEWI76a, SCHR74,
FRIE76, DEWI76b, LLOY74b, RAMA74, TSUC72].
The intermediate representation of the compiler is a stream of primitive register-to-register operations.
After other vertical optimiza-
tions are applied, the code is searched for sequences which satisfy
predefined substitution rules.
As an example:
an add-immediate instruction
ADDIM b,@b,100
a load indirect
LOADI y,b
and an add instruction
ADD z,x,y
can be performed by one instruction
add-indirect-with-offset
ADDIO z,x,@b,100
The substitution would be made only if the first three instructions
could be deleted.
If a subsequent operation uses y, this is not the
case.
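
A Python sketch of this substitution rule follows; the tuple encoding
of the instructions and the used-later test are assumptions for the
example, not the PL/MP compiler's internals.

    # Collapse ADDIM/LOADI/ADD into one ADDIO when the intermediate
    # results (b and y) are not used by any subsequent operation.

    def consolidate(ops, used_later):
        out, i = [], 0
        while i < len(ops):
            w = ops[i:i + 3]
            if (len(w) == 3
                    and w[0][0] == "ADDIM"         # b <- @b + 100
                    and w[1][0] == "LOADI"         # y <- (b)
                    and w[2][0] == "ADD"           # z <- x + y
                    and w[1][2] == w[0][1]         # LOADI reads ADDIM's b
                    and w[2][3] == w[1][1]         # ADD reads LOADI's y
                    and w[0][1] not in used_later
                    and w[1][1] not in used_later):
                _, b, ib, off = w[0]
                _, z, x, _ = w[2]
                out.append(("ADDIO", z, x, ib, off))
                i += 3
            else:
                out.append(ops[i])
                i += 1
        return out

    ops = [("ADDIM", "b", "@b", 100), ("LOADI", "y", "b"),
           ("ADD", "z", "x", "y")]
    print(consolidate(ops, used_later=set()))
    # [('ADDIO', 'z', 'x', '@b', 100)]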
Horizontal Optimization
Local Compaction
Local compaction has been the subject of much recent study [YAU74,
AGER76, LLOY74a, RAMA73, RAMA74, DASG76a, DASG76b, JACK74, TSUC74,
TSUC76, DEWI76a, TABA74, TOK077, MALL78, WOOD78, WOOD79, FISH79].
Because of the stringent performance expectations placed on microcode,
earlier study was directed toward producing optimal code.
The amount of
work this requires is prohibitive since all possible mappings of microoperations to microinstructions are found [ASTO71].
The time
required for such exhaustive work increases exponentially with the number of microoperations [LAND80].
Methods have been developed more
recently [TOK077, MALL78, WOOD79, FISH79] which result in optimal or
near-optimal code in polynomial or linear time.
This breakthrough makes
horizontal microcode compilers feasible.
Local compaction algorithms typically require two inputs: a
straight-line microcode section which contains no internal branches or
entry points; and some representation of the relationship between the
microoperations.
To maintain data integrity, it is vital that the orig-
inal semantics of the sequential microcode be preserved.
This obviously
requires that certain microoperations be executed in a particular order.
Such microoperations are said to have a data interaction.
Given two
sequential microoperations a and b, the following three conditions
define data interaction:
1. b requires an output resource of a as an input
2. a requires an input source which b modifies
3. a and b modify the same output resource [LAND80].
These interactions must be analyzed and represented so that the information can be used to create microinstructions.
Landskov [LAND80]
presented a general algorithm to record these data interactions in
graphical form.
Each microoperation is added to the graph in a sequen-
tial order and is represented by a node.
The new node must be connected
to previously placed nodes to indicate the data interactions.
It is
linked only to those nodes on which it is directly data dependent.
This
implies that nodes further up the path from the lowest node having a
data interaction with the new node are not also linked to it.
The
search first checks the last node on each path (the leaves) for data
interaction.  If the interaction is present, the nodes are linked.
Otherwise, testing takes place on each preceding node, working up from
the leaves until the test is positive or a node is reached which is
already indirectly linked with the new node.
Once all microoperations
are added to the graph, the resulting tree indicates the sequencing
necessary to assure semantical equivalency.
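
The construction can be sketched as follows (Python; the
input/output-set representation is an assumption, and for brevity each
earlier node is tested directly rather than walking up from the leaves
as Landskov describes — transitively ordered ancestors are still
skipped).

    def interacts(a, b):
        """True if the later op b must follow a (the three conditions)."""
        return (bool(a["out"] & b["in"])       # b reads an output of a
                or bool(a["in"] & b["out"])    # b overwrites an input of a
                or bool(a["out"] & b["out"]))  # both write the same resource

    def build_graph(ops):
        preds = {i: set() for i in range(len(ops))}   # direct predecessors

        def ancestors(n):
            seen, stack = set(), [n]
            while stack:
                for p in preds[stack.pop()]:
                    if p not in seen:
                        seen.add(p)
                        stack.append(p)
            return seen

        for j in range(len(ops)):
            for i in range(j - 1, -1, -1):            # latest ops first
                if i in ancestors(j):
                    continue          # already indirectly linked: no edge
                if interacts(ops[i], ops[j]):
                    preds[j].add(i)
        return preds

    ops = [{"in": {"a"}, "out": {"t1"}},
           {"in": {"t1"}, "out": {"t2"}},
           {"in": {"a"}, "out": {"t3"}}]
    print(build_graph(ops))           # {0: set(), 1: {0}, 2: set()}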
Compaction Algorithms
Landskov [LAND80] suggests four categories for compaction
algorithms.
These include linear analysis, critical path, branch and
bound, and list scheduling.
Each algorithm consists primarily of two parts.
First, a data dependency analysis orders the microoperations so
as to assure the semantics are not compromised when microinstructions
are created.
Secondly, a conflict analysis monitors the assignment of
microoperations to particular microinstructions to assure there are no
hardware conflicts.
Linear Algorithm
The data dependency analysis begins with a list of microinstructions
(originally empty).
Proceeding sequentially, microoperations are
selected and the list of microinstructions is searched from bottom to
top to determine the rise limit.
This is the first microinstruction in
which the microoperation can be included given the data dependencies.
The microinstruction list is searched top-to-bottom in conflict
analysis.
Beginning at the rise limit, search proceeds downward until a
microinstruction is found which can include the microoperation without
hardware conflict.
If a microoperation having a rise limit cannot be
included in an existing microinstruction, a new one is created at the
bottom of the list.
If the microoperation has no rise limit, it is in-
cluded in a new microinstruction at the beginning of the list.
The linear algorithm does not guarantee optimal code.  Dasgupta's work
has produced an algorithm that is the model for Landskov's [DASG76,
DASG77, BARN78, DASG78a, DASG79].
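
A Python sketch of the linear algorithm, with the dependency and
conflict tests supplied as functions; the micro-operation encoding used
in the example is assumed.

    def linear_compact(ops, depends, conflicts):
        instrs = []
        for op in ops:
            # rise limit: scan bottom to top; op cannot rise above the
            # last instruction containing an op it depends on
            rise = 0
            for i in range(len(instrs) - 1, -1, -1):
                if any(depends(op, earlier) for earlier in instrs[i]):
                    rise = i + 1
                    break
            # conflict analysis: from the rise limit, scan downward for
            # the first instruction free of hardware conflict
            for i in range(rise, len(instrs)):
                if not any(conflicts(op, other) for other in instrs[i]):
                    instrs[i].append(op)
                    break
            else:
                if rise == 0:
                    instrs.insert(0, [op])   # no rise limit: new
                                             # instruction at the beginning
                else:
                    instrs.append([op])      # otherwise at the bottom
        return instrs

    # toy model: op = (name, reads, writes, functional unit)
    def depends(later, earlier):
        return (earlier[2] & later[1] or later[2] & earlier[1]
                or later[2] & earlier[2])

    def conflicts(a, b):
        return a[3] == b[3]                  # same functional unit

    ops = [("m1", {"a"}, {"t"}, "alu"),
           ("m2", {"b"}, {"u"}, "shift"),
           ("m3", {"t"}, {"v"}, "alu")]
    print(linear_compact(ops, depends, conflicts))  # [[m1, m2], [m3]]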
Critical Path Algorithm
An early partition is created as the first step.
By working down
through the microoperations using the data dependency graph, the
earliest that each microoperation can be executed, assuming there are
no conflicts for hardware, is determined.
The total number of steps or
frames needed to fit in all microoperations under these conditions is
the minimum number of microinstructions which are required for that
sequence.
Next, working upward, microoperations are placed in the last
possible frame in which they can be executed so that the minimum number
of frames determined earlier can be maintained.
Critical microopera-
tions are those with the same early and late timing and constitute the
critical partition.
The critical partition is then modified by dividing
any frame in which there are hardware conflicts into two (or more)
frames.
To form the final microinstructions, the non-critical microoperations are added to the modified critical partition.
Each such microop-
eration can be included from its frame in the early partition to its
frame in the late partition, inclusive.
If a hardware conflict does not
allow a microoperation to be included within these frames, a new microinstruction is inserted just following the last partition position.
This algorithm could result in many more microinstructions than is
optimal.
Its weakness is that adjacent microinstructions which could be
combined are not if they are in different frames.
This algorithm
was apparently devised from critical path processor scheduling [RAMA79]
by Ramamoorthy and Tsuchiya [RAMA74].
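
The early/late timing at the heart of the method can be sketched as
follows (conflict handling and the frame-splitting step are omitted;
the graph encoding is an assumption).

    def early_late(preds):
        """preds: op -> set of direct predecessors.  Returns 0-based
        early and late frame numbers and the critical micro-ops."""
        succs = {n: set() for n in preds}
        for n, ps in preds.items():
            for p in ps:
                succs[p].add(n)

        early = {}
        def e(n):
            if n not in early:
                early[n] = 1 + max((e(p) for p in preds[n]), default=-1)
            return early[n]
        for n in preds:
            e(n)

        frames = 1 + max(early.values())     # minimum number of frames
        late = {}
        def l(n):
            if n not in late:
                late[n] = min((l(s) for s in succs[n]), default=frames) - 1
            return late[n]
        for n in preds:
            l(n)

        critical = [n for n in preds if early[n] == late[n]]
        return early, late, critical

    preds = {0: set(), 1: {0}, 2: set(), 3: {1, 2}}
    print(early_late(preds))
    # early {0:0, 1:1, 2:0, 3:2}; late {0:0, 1:1, 2:1, 3:2};
    # critical [0, 1, 3]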
Branch and Bound Algorithm
Using the data dependency graph, a data available set (Dset) is built which contains those microoperations which have all of their
directly data dependent microoperations allocated to a microinstruction.
Obviously, the original Dset consists of those microoperations with no
parent nodes in the graph.
Using the Dset, microinstructions are formed
such that every possible member of the Dset is included.
This is a
complete instruction and the collection of these is used to build a tree
whose nodes correspond to the microinstructions.
In BAB exhaustive, a
complete tree is built whose paths represent every microinstruction ordering possible.
From this tree, the path of optimal length is found
and the corresponding complete instructions are the shortest possible
microcode.  [LAND80] and [MALL78] explain this algorithm in great
detail.
This microcode compaction algorithm was originally presented by
Yau, Schowe, and Tsuchiya [YAU74].
Additionally, different heuristics
can be applied to the branch and bound algorithm to decrease the number
of possible paths and to cut execution time dramatically while still
achieving near-optimal results [MALL78].
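
An illustrative sketch of exhaustive branch and bound appears below: it
enumerates the complete instructions of each Dset and recursively
searches every ordering for the shortest result.  The encoding is
assumed, and no heuristic pruning is applied, so its cost grows
exponentially.

    from itertools import combinations

    def complete_instructions(dset, conflict):
        """Maximal conflict-free subsets of the data-available set."""
        results = []
        for size in range(len(dset), 0, -1):           # larger sets first
            for subset in combinations(sorted(dset), size):
                if any(conflict(a, b) for a, b in combinations(subset, 2)):
                    continue
                if any(set(subset) < set(r) for r in results):
                    continue                           # not maximal
                results.append(subset)
        return results

    def compact(graph, placed, conflict):
        """Fewest microinstructions covering all micro-ops.
        graph: op -> set of predecessor ops."""
        remaining = set(graph) - placed
        if not remaining:
            return []
        dset = {op for op in remaining if graph[op] <= placed}
        best = None
        for instr in complete_instructions(dset, conflict):
            tail = compact(graph, placed | set(instr), conflict)
            if best is None or 1 + len(tail) < len(best):
                best = [list(instr)] + tail
        return best

    graph = {"a": set(), "b": {"a"}, "c": set()}
    conflict = lambda x, y: {x, y} == {"a", "c"}       # a, c share a unit
    print(compact(graph, set(), conflict))             # [['a'], ['b', 'c']]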
List Scheduling
In list scheduling algorithms, only a single branch of a tree is
formed by choosing the "best" complete instruction from each Dset.
"Best" is determined by a weighting function.
The choice of function is
important to achieving optimal microcode [FISH79].
One possible weight
for a microoperation is the number of descendants that microoperation
has in the data dependency graph [WOOD78].
Thus when choosing a mi-
crooperation from the Dset, the order is from highest to lowest weight.
Each microoperation is added to the microinstruction unless a hardware
conflict occurs.
When all Dset members have been considered, a new Dset
is created and a new microinstruction is started.
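
A sketch of list scheduling with the descendant-count weight of
[WOOD78]; apart from that weighting function, the representation and
the conflict test are assumptions.

    def descendants(succs, n):
        seen, stack = set(), [n]
        while stack:
            for s in succs[stack.pop()]:
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
        return len(seen)

    def list_schedule(preds, conflicts):
        succs = {n: set() for n in preds}
        for n, ps in preds.items():
            for p in ps:
                succs[p].add(n)
        weight = {n: descendants(succs, n) for n in preds}

        placed, instrs = set(), []
        while len(placed) < len(preds):
            dset = [n for n in preds
                    if n not in placed and preds[n] <= placed]
            instr = []
            for n in sorted(dset, key=lambda n: -weight[n]):  # heaviest first
                if not any(conflicts(n, m) for m in instr):
                    instr.append(n)                           # no conflict
            placed.update(instr)
            instrs.append(instr)
        return instrs

    preds = {"a": set(), "b": set(), "c": {"a"}, "d": {"a", "b"}}
    conflicts = lambda x, y: {x, y} == {"a", "b"}   # a, b share hardware
    print(list_schedule(preds, conflicts))          # [['a'], ['b', 'c'], ['d']]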
Global Optimization
Unlike local optimization, global optimization can look beyond segments
to include loops and recursive subroutines.  Its primary goal is
to eliminate the NOP's positioned within microinstructions in the interest of proper timing of operations [LAND80].
Global optimization is
an area where little research has been done until very recently.
[WOOD79, DASG79, MALL78 , FISH79, TOK078] deal with this subject.
[TOK078] has developed a global routine based on a microoperation
model called a microtemplate.
This microtemplate is two-dimensional,
representing both timing and machine resources.
[TOK078] uses an opti-
mization algorithm which seeks primarily to eliminate NOP's positioned
at the beginning and/or end of sequential segments by consolidation, or
entire redundant microtemplates, in either an upward or downward
direction.
When compared to hand optimized code, the globally optimized
code displayed an average improvement of 5.4%.
Similar approaches used
by [WOOD79] and [FISH79] treat a locally compacted sequence as a primitive in a larger sequence.
Register Allocation
Computers designed for efficient microprogramming typically have a
large number of very fast registers.
Ideally, there are enough regis-
ters so that each variable can be permanently assigned to a register
during program execution.
When there are too few registers, variables
must sometimes be stored in memory.
This means a store-and-load
sequence is often required when a variable is brought into a register.
This additional overhead has been shown to affect performance.
Liu [LIU75] found that having sufficient registers could improve
microprogram execution time by thirty-three percent over the time with
no registers.
Obviously, it is important to determine which variables are
most frequently referenced and to assign these variables to registers
first.
It is also important to have an efficient reallocation algorithm
to keep reallocations to a minimum [DEWI76, TAN77].
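
A simplified sketch of frequency-ordered assignment (a toy model for
illustration, not the algorithms of [DEWI76, TAN77]):

    from collections import Counter

    def allocate(references, num_registers):
        """references: variable names in order of use.  The most
        frequently referenced variables get the registers."""
        freq = Counter(references)
        by_freq = [v for v, _ in freq.most_common()]
        in_regs = set(by_freq[:num_registers])
        spilled = by_freq[num_registers:]
        # each use of a spilled variable implies a store-and-load sequence
        extra_memory_ops = 2 * sum(freq[v] for v in spilled)
        return in_regs, spilled, extra_memory_ops

    refs = ["i", "x", "i", "i", "y", "x", "i"]
    print(allocate(refs, num_registers=2))   # ({'i', 'x'}, ['y'], 2)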
With memory costs decreasing, manufacturers are tending to include
even more high-speed microprogramming registers in their machines.
This
trend could make the problem of register allocation of minimal concern
in the future.
BIBLIOGRAPHY
[ABDA73]
ABD-ALLA, A. M., AND KARLGAARD, D. C. "The heuristic
synthesis of application-oriented microcode," in Proc. 6th
Annual Workshop on Microprogramming (ACM), 1973, pp. 83-90.
[ABDA74]
ABD-ALLA, A. M., AND KARLGAARD, D. C. "Heuristic synthesis of
microprogrammed computer architecture," IEEE Transactions on
Computers C-23, (August 1974), pp. 802-807.
[AGER76]
AGERWALA, T. "Microprogram optimization: A survey," IEEE
Transactions on Computers C-25, 10(October 1976), pp. 962-973.
[AGRA74]
AGRAWALA, A. K. AND RAUSCHER, T. G. "Microprogramming:
Perspective and status," IEEE Transactions on Computers C-23,
(August 1974), pp. 817-837.
[AGRA76]
AGRAWALA, A. K. AND RAUSCHER, T. G. Foundations of
microprogramming: Architecture, software, and applications,
ACM Monograph Series, Academic Press, New York, 1976.
[ALLE69]
ALLEN F. E. "Program optimization," Annual Review of Automatic
Programming 5, (1969).
[ALLE72]
ALLEN, F. E., AND COCKE, J. "A catalogue of optimizing
transformations," in The Design and Optimization. of Compilers,
R. Rustin, Ed., Prentice-Hall, Englewood Cliffs, N.J., 1972.
[BARN78]
BARNES, G. E. "Comments on the identification of maximal
parallelism in straight-line microprograms," IEEE Transactions
on Computers C-27, 3(March 1978), pp. 286-287.
[BELG76]
BELGARD, R. A. "A generalized virtual memory package for BIT
interpreter writers," SIGMICRO Newsletter 7, 4(December 1976),
pp. 31- .
[BLAI74]
BLAIN, G. PERRONE, M., AND HONG, N. X. "A compiler for the
generation of optimized microprograms," SIGMICRO Newsletter 5,
3(Oct. 1974), pp. 49-67.
[BOND74]
BONDI, J. 0., AND STIGALL, P. D. "Designing HMO, an
intergrated hardware microcode optimizer," in Proc. 7th Annual
Workshop on Microprogramming (ACM) , Oct. 1974, pp. 268-276.
[BROA74]
BROADBENT, J. K. "Microprogramming and system architecture,"
Computer Journal 17, 2(Feb. 1974), pp. 2-8.
[BROW76]
BROWN, G. E., et al. "Operating system enhancement through
microprogramming," SIGMICRO Newsletter 7, 1(March 1976), pp.
28-33.
[BROW77]
BROWN, G. E., et al. "Operating system enhancement through
firmware," SIGMICRO Newsletter 8, 3(September 1977), pp. 119-133.
[CHAT76]
CHATTERGY, R. "Microprogrammed implementation of a scheduler,"
SIGMICRO Newsletter 7, 3(September 1976), pp. 15-19.
[CHEN70]
CHEN, T. C. "Unconventional superspeed computer systems," IBM
Thomas J. Watson Research Report RC 782, Yorktown Heights
(Nov. 1970).
[CHUR70]
CHURCH, C. C. "Computer instruction repertoire - time for a
change," in Proc. 1970 AFIPS Spring Joint Computer Conference,
AFIPS Press, Montvale, N.J., pp. 343-349.
[CLAP72]
CLAPP, J. A. "The application of microprogramming technology,"
SIGMICRO Newsletter 3, 1(April 1972), pp. 8-47.
[CLAR72]
CLARK, R. K. "Mirager, the 'best-yet' approach for horizontal
microprogramming," in Proc. ACM Nat. Congress, 1972, pp. 554571.
[COOK70]
COOK, R. W., AND FLYNN, M. J. "System design of a dynamic
microprocessor," IEEE Transactions on Electronic Computers C-19, 3(March 1970), pp. 213-222.
[DASG76]
DASGUPTA, S., AND TARTAR, J. "The identification of maximal
parallelism in straight line microprograms," IEEE Transactions
on Computers C-25, 10(Oct. 1976), pp. 986-992.
[DASG77]
DASGUPTA, S. "Parallelism in loop free microprograms," in
Information Processing 77 (Proc. IFIP Congress 1977), B.
Gilchrist, Ed., North-Holland, Amsterdam, 1977.
[DASG78a] DASGUPTA, S. "Comment on the identification of maximal
parallelism in straight-line microprograms," IEEE
Transactions on Computers C-27, 3(March 1978), pp. 285-286.
[DASG78b] DASGUPTA, S. "Towards a microprogramming language schema," in
Proc. 11th Annual Workshop on Microprogramming (ACM), Nov.
1978, pp. 144-153.
[DASG79]
DASGUPTA, S. "The organization of microprogram stores,"
Computing Surveys 11, 1(March 1979), pp. 40-65.
[DASG80]
DASGUPTA, S. "Some aspects of high-level microprogramming,"
Computing Surveys 12, 3(Sept. 1980), pp. 295-323.
[DAVI78]
DAVIDSON, S., AND SHRIVER, B. D. "An overview of firmware
engineering," Computer 11, 5(May 1978), pp. 21-33.
[DENN71]
DENNING, P. J. "Third generation computer systems," Computing
Surveys 3, 4(Dec. 1971), pp. 175-216.
[DEWI76a] DEWITT, D. J. "A machine-independent approach to the
production of horizontal microcode," Ph.D. dissertation, Univ.
Michigan, Ann Arbor, June 1976; Tech. Rep. 76 DT4, Aug. 1976.
[DEWI76b] DEWITT, D. J. "Extensibility - A new approach for designing
machine independent microprogramming languages," in Proc. 9th
Annual Workshop on Microprogramming, SIGMICRO Newsletter 7,
3(Sept. 1976), pp. 33-41.
[DHAU77]
D'HAUTCOURT-CARETTE, F. "A microprogrammed virtual memory for
eclipse," SIGMICRO Newsletter 8, 2(June 1977), pp. 10-20.
[ECKH71]
ECKHOUSE, R. H., JR. "A high-level microprogramming language
(MPL)," in Proc. 1971 AFIPS Spring Joint Computer Conference,
AFIPS Press, Montvale, N.J., pp. 169-177.
[ELAY77]
EL-AYAT, K. A., AND HOWARD, J. A. "Algorithms for a selftuning microprogrammed computer," SIGMICRO Newsletter 8,
3(Sept. 1977), pp. 85-91.
[FISH79]
FISHER, J. A. The optimization of horizontal microcode within
and beyond basic blocks: An application of processor
scheduling with resources, U.S. Dept. of Energy Report,
Mathematics and Computing COO-3077-161, New York Univ., Oct.
1979.
[FRIE76]
FRIEDER, G. "The FORTRAN project - A multifaceted approach to
software-firmware high level language support," SIGMICRO
Newsletter 7, 3(Sept. 1976), pp. 47-50.
[FULL76]
FULLER, S. H., LESSER, V. R., BELL, C. G., AND KAMAN, C. H.
"The effects of emerging technology and emulation requirements
on microprogramming," IEEE Transactions on Computers 25,
10(Oct. 1976).
[HABI74]
HABIB, S. "Microprogrammed enhancements to higher level
languages - an overview," in Proc. 7th Annual Workshop on
Microprogramming (ACM), Oct. 1974, pp. 80-84.
[HARR73]
HARRISON, M. C. "Language oriented instruction sets," in Proc.
ACM SIGPLAN/SIGMICRO Interface Meeting, (1973).
[HIGB78]
HIGBIE, L. C. "Overlapped operation with microprogramming,"
IEEE Transactions on Computers C-27, 3(March 1978), pp. 270-275.
[HINS72]
HINSHAW, D. L. "Optimal selection of functional hardware units
for microprogram controlled central processing units," Ph.D.
dissertation, Univ. of Michigan, 1972.
[HUBE70]
HUBERMAN, B. J. "Principles of operation of the Venus
microprogram," MITRE Corp., Bedford, Mass., MTR 1843, July
1970.
[HUSS70]
HUSSON, S. S. Microprogramming: Principles and Practices,
Prentice-Hall, Englewood Cliffs, N.J., 1970.
[JEAN65]
JEANS, H. "Possible performance improvements by
microprogramming for System/360 Models 40, 50, and 65," IBM
Data Processing Division Report #320-3203, (1965).
[JONE76a] JONES, L. H. "Microprogramming bibliography: 1951-Early 1975,"
SIGMICRO Special Issue, (Mar. 1976).
[JONE76b] JONES, L. H. "An annotated bibliography of microprogramming:
Late 1975-Mid 1976," SIGMICRO Newsletter 7, 4(Dec. 1976) pp.
95-122.
[KLEI71]
KLEIR, R. L., AND RAMAMOORTHY, C. V. "Optimization strategies
for microprograms," IEEE Transactions on Computers C-20,
7(July 1971), pp. 783-795.
[KRAF81]
KRAFT, G. D., AND TOY, W. N. Microprogrammed Control and
Design of Small Computers, Prentice-Hall, Englewood Cliffs,
N.J., 1981.
[LAND80]
LANDSKOV, D., DAVIDSON, S., AND SHRIVER, B. "Local microcode
compaction techniques," Computing Surveys 12, 3(Sept. 1980),
pp. 261-294.
[LEVE77]
LEVENTHAL, L. A. "Microprocessors and microprogramming: a
review and some suggestions," Simulation 29, (Aug. 1977), pp.
56-59.
[LEWI79]
LEWIS, T. G., MALIK, K., AND MA, P. Y. "Firmware engineering
using a high level microprogramming system to implement
virtual instruction set processors," Dept. of Computer
Science, Oregon State Univ., Corvallis, 1979.
[LIU75]
LIU, P. S. "Dynamic microprogramming in Fortran environment,"
Ph.D. dissertation, Purdue Univ., West Lafayette, In., Aug.
1975.
[LLOY74a] LLOYD, G. R. "Optimization of microcode for horizontally
encoded microprogrammable computers," Master's Thesis, Brown
Univ., Providence, Rhode Island, June 1974.
[LLOY74b] LLOYD, G. R., AND VAN DAM, A. "Design considerations for
microprogramming languages," in Proc. Nat. Comput. Conf. 43,
AFIPS Press, Arlington, Va., 1974, pp. 537-543.
[LUND76]
LUNDELL, E. D., Jr. "Reloadable control store marks two models
added to IBM 370 system line," Computerworld X, (July 5, 1976)
pp. 1, 5.
[MALL78]
MALLETT, P. W. "Methods of compacting microprograms," Ph.D.
dissertation, Univ. Southwestern Louisiana, Lafayette, Dec.
1978.
[McKE67]
McKEEMAN, W. M. "Language directed computer design," Proc.
1967 AFIPS Fall Joint Computer Conference, AFIPS Press,
Montvale, N.J., pp. 413-417.
[MOFF76]
MOFFETT, L. H. "On-line virtual architecture tuning using
microcapture," Ph.D. dissertation, The George Washington
Univ., 1976.
[NESS79]
NESSETT, D. M., AND RECHARD, O. W. "Allocation algorithms for
dynamically microprogrammed multiprocessor systems," Computer
Journal 22, 5(May 1979), pp. 155-163.
[NOWL75]
NOWLIN, R. W. "Effective computer architecture for
microprogrammed machines," Ph.D. dissertation, Texas Tech.
University, 1975.
[PARK73]
PARK, H. "Fortran enhancement," in Proc. 6th Annual Workshop
on Microprogramming (ACM) , Sept. 1973, pp. 156-159.
[PATT76]
PATTERSON, D. A. "STRUM: Structured microprogramming system
for correct firmware," IEEE Transactions on Computers C-25, 10(Oct.
1976), pp. 974-985.
[PATT78]
PATTERSON, D. A. "An approach to firmware engineering," in
Proc. Nat. Comput. Conf. 47, AFIPS Press, Arlington, Va.,
1978, pp. 643-647.
[PATT79]
PATTERSON, D. A. "An experiment in high level language
microprogramming and verification," unpublished manuscript,
Dept. of Electrical Engineering and Computer Science, Univ.
California, Berkeley, 1979.
[RAMA69]
RAMAMOORTHY, C. V., AND GONZALEZ, M. J., JR. "A survey of
techniques for recognizing parallel processable streams in
computer programs," in Proc. 1969 AFIPS Fall Joint Computer
Conference, AFIPS Press, Montvale, N.J., pp. 1-15.
[RAMA74]
RAMAMOORTHY, C. V., AND TSUCHIYA, M. "A high-level language
for horizontal microprogramming," IEEE Transactions on
Computers C-23, 8(Aug. 1974), pp. 791-801.
[RAUS75]
RAUSCHER, T. G. "Dynamic problem oriented redefinition of
computer architecture via microprogramming," Ph.D.
dissertation, Univ. of Maryland, 1975.
[RAUS76]
RAUSCHER, T. G., AND AGRAWALA, A. K. "Developing application
oriented computer architectures on general purpose
microprogrammable machines," in Proc. Nat. Comput. Conf. 45,
AFIPS Press, Arlington, Va., 1976, pp. 715-722.
[RAUS78]
RAUSCHER, T. G., AND AGRAWALA, A. K. "Dynamic problem oriented
redefinition of computer architecture via microprogramming,"
IEEE Transactions on Computers C-27, (Nov. 1978), pp. 1006-1014.
[RAUS80]
RAUSCHER, T. G. "Microprogramming: A tutorial and survey of
recent developments," IEEE Transactions on Computers C-29,
1(Jan. 1980), pp. 2-20.
[REIG72]
REIGEL, E. W., FABER, V., AND FISHER, D. A. "The interpreter - A microprogrammable building block system," in Proc. 1972
AFIPS Spring Joint Computer Conference, AFIPS Press, Montvale,
N.J., pp. 705-723.
[REIG73]
REIGEL, E. W., AND LAWSON, H. W. "At the programming language-microprogramming interface," in Proc. ACM SIGPLAN/SIGMICRO
Interface Meeting, May 1973, pp. 2-18.
[ROSI74]
ROSIN, R. F. "The significance of microprogramming," SIGMICRO
Newsletter 4, 1(Jan. 1974), pp. 24-38.
[SAFA80]
SAFEE, B. "A microprogrammed approach to efficient computer
language translation and execution," Ph.D. dissertation, State
Univ. of New York at Buffalo, 1980.
[SALI73]
SALISBURY, A. B. "The Evaluation of Microprogram Implemented
Emulators," Technical Report No. 60, Digital Systems
Laboratory, Stanford University, Stanford, Calif., July 1973.
[SALI76]
SALISBURY, A. B. Microprogrammable Computer Architectures,
American Elsevier Publishing Co., New York, 1976.
[SCHR74]
SCHREINER, E. "A comparative study of high level microprogramming languages," SIGMICRO Newsletter 5, 1(April 1974),
p. 4.
[SHAY75]
SHAY, B. P. "A microprogrammed implementation of parallel
program schemata," Ph.D. dissertation, Univ. of Maryland,
1975.
[SNYD75]
SNYDER, D. C. "Computer performance by measurement and
microprogramming," Hewlett-Packard Journal, (Feb. 1975), pp.
17-24.
[SOCK75]
SOCKUT, G. H. "Firmware/hardware support for operating
systems: principles and selected history," SIGMICRO Newsletter
6, 4(Dec. 1975), pp. 17-26.
[STOC77]
STOCKENBERG, J. E. "Optimization through migration of
functions in a layered firmware-software system," Ph.D.
dissertation, Brown Univ., 1977.
[TABA74]
TABANDEH, M., AND RAMAMOORTHY, C. V. "Execution time and
memory optimization in microprograms," in Proc. 7th Annual
Workshop on Microprogramming (ACM), 1974, pp. 519-527.
[TAFR75]
TAFRELIN, S. "Dynamic microprogramming and external subroutine
calls in a multics-type environment," BIT 15, 2(1975), pp.
192-202.
[TAN77]
TAN, C. J. "Register assignment algorithms for optimizing
compilers," IBM Research Report RC6481, April 1977.
[TAN78]
TAN, C. J. "Code optimization techniques for micro-code
compilers," in Proc. Nat. Comput. Conf. 47, AFIPS Press,
Arlington, Va., 1978, pp. 649-655.
[TIRR73]
TIRRELL, A. K. "A study of the application of compiler
techniques to the generation of microcode," in Proc. ACM
SIGPLAN/SIGMICRO Interface Meeting, May 1973, pp. 67-85.
[TOK077]
TOKORO, M., TAMURA, E., TAKASE, K., AND TAMARU, K. "An
approach to microprogram optimization considering resource
occupancy and instruction formats," in Proc. Tenth Annual
Workshop on Microprogramming (ACM), Oct. 1977, pp. 92-108.
[TOK078]
TOKORO, M., TAKIZUKA, T., TAMURA, E., AND YAMAURA, I. "A
technique of global optimization of microprograms," in Proc.
11th Annual Workshop on Microprogramming (ACM), 1978, pp. 41-50.
[TSUC72]
TSUCHIYA, M. "Design of a multilevel microprogrammable
computer and a high-level microprogramming language," Ph.D.
dissertation, University of Texas at Austin, 1972.
[TSUC74]
TSUCHIYA, M., AND GONZALEZ, M. J., JR. "An approach to
optimization of horizontal microprograms," in Proc. 7th Annual
Workshop on Microprogramming (ACM), 1974, pp. 85-90.
[TSUC76]
TSUCHIYA, M., AND GONZALEZ, M. J., JR. "Towards optimization
of horizontal microprograms," IEEE Transactions on Computers
C-25, 10(Oct. 1976), pp. 992-999.
[TUCK71]
TUCKER, A. E., AND FLYNN, M. J. "Dynamic microprogramming:
Processor organization and programming," Commun. ACM 14,
4(April 1971), pp. 240-250.
[WADE72]
WADE, B. W., AND SCHNEIDER, V. B. "The L-machine: A computer
instruction set for the efficient execution of high-level
language programs," in Proc. 5th Annual Workshop on
Microprogramming (ACM), Sept. 1972, preprints.
[WADE73]
WADE, B. W., AND SCHNEIDER, V. B. "A general purpose high-level language machine for mini-computers," in Proc.
SIGPLAN/SIGMICRO Interface Meeting, May 1973, preprints.
[WADE75]
WADE, B. W. "A microprogrammed minicomputer for the efficient
execution of high-level language programs," Ph.D.
dissertation, Purdue Univ., West Lafayette, Ind., 1975.
[WOOD78]
WOOD, G. "On the packing of micro-operations into
microinstruction words," in Proc. 11th Annual Workshop on
Microprogramming: SIGMICRO Newsletter 9, 4(Dec. 1978), pp. 51-55.
[WOOD79]
WOOD, W. G., "The computer aided design of microprograms,"
Ph.D. dissertation, Univ. Edinburgh, Scotland, 1979.
[WORT73]
WORTMAN, D. B. "A study of language directed computer design,"
Ph.D. dissertation, Stanford Univ., Ca., 1973.
[YAU74]
YAU, S. S., SCHOWE, A. C., AND TSUCHIYA, M. "On storage
optimization of horizontal microprograms," in Proc. 7th Annual
Workshop on Microprogramming (ACM), Oct. 1974, pp. 98-106.