APPLICATION OF MICROCODE TO IMPROVE PROGRAM PERFORMANCE

An Honors Thesis (ID 499)
by Dawn D. Greenwood
Thesis Director: Clinton P. Fuelling
Ball State University, Muncie, Indiana
May 18, 1981
Note: Spring 1981

CONTENTS

INTRODUCTION
MICROPROGRAMMING OPERATING SYSTEMS
ARCHITECTURE REDEFINITION VIA MICROPROGRAMMING
    Manual Tuning
    Automated Process
        Frequency of Sequential Occurrence
        Frequency of Execution
        Combination Methods
OPTIMIZATION
    Vertical Optimization
        Removal of Nonessential Operations
            Redundant Actions
            Negated Actions
        Code Motion
        Code Consolidation
    Horizontal Optimization
        Local Compaction
        Compaction Algorithms
            Linear Algorithm
            Critical Path Algorithm
            Branch and Bound Algorithm
            List Scheduling
        Global Optimization
    Register Allocation

INTRODUCTION

The use of microcoding to improve program performance has introduced new areas of study in recent years. Microcoding operating systems and architecture redefinition via microprogramming are two such areas. High-level microprogramming languages are also commanding great interest and research. All these techniques require optimization algorithms to assure optimal execution times. These areas will be considered in this paper.

Due to the difficulty in locating resources, many articles were not available for review. In such cases, care has been taken to guide the reader to the article by referencing it where appropriate. A number of books and articles in the bibliography offer a basic look at microprogramming in general as well as the subjects covered in this paper [AGRA74, AGRA76, BROA74, CHUR70, DAVI78, FULL76, HIGB78, HINS72, JONE76a, JONE76b, KRAF81, LEVE77, RAUS80, ROSI74, SALI76].
MICROPROGRAMMING OPERATING SYSTEMS

Since "users seldom (if ever) perform a computation without assistance from the operating system" [DENN71], any improvements in the execution rates of its most frequently used segments should reflect favorably on application programs. Microcoding primitives used throughout the operating system is one approach to improving operating system time. Larger segments can also be selected. Choosing the most appropriate segments is important to performance improvements [RAUS80, BROW77]. In [BROW76] is a list of fourteen functions which should be considered for microcoding. These functions tend to be too exact for hardware, relatively consistent, and noticeably affected by software overhead. Several applications in the area of software have been discussed [HUBE70, BELG76, DHAU77, CHAT76, LUND76, DENN71, STOC77, SOCK75].

ARCHITECTURE REDEFINITION VIA MICROPROGRAMMING [RAUS76]

Called "tuning" by [ABDA74, ABDA73], the central concept is to change a system's architecture to solve a particular problem more efficiently. Architecture in this sense refers to "the attributes of a computer as seen by the programmer" [RAUS76]. This includes the memory and particularly the machine instruction set with which the programmer interacts. The microprograms in control store which interpret the machine instructions define a particular architecture. By modifying these microprograms, the architecture changes accordingly.

Such modification is necessary for several reasons. Typical instruction set design has followed hardware constraints instead of considering the problems to be solved and their structure. The results are instruction sets which are awkward and inefficient [RAUS76, WADE75]. In addition, the general-purpose architecture was conceived through a series of compromises that permit it to perform the widest range of services. Like the jack-of-all-trades, it is frequently master of none and handles all its functions in a suboptimal fashion [HUSS70].
To counteract these compromises and tune the instruction set, carefully chosen segments of machine code are microcoded, placed in control storage, and referenced by one new machine language instruction. The major advantage of this approach is the decrease in machine instruction fetch-and-decode time. Considering that the average (FORTRAN) program spends forty percent of its execution time on this function, it is estimated that savings can easily be twenty percent [RAUS76]. Additionally, the microcode can be optimized. It is important to stress that optimization strategies are vital to architecture redefinition success [ABDA74]. The final section covers optimization strategies.

Isolation of the appropriate segments can be a manual process based on environmentally determined areas of processing inefficiency. An automated process can be based on the frequency of sequential occurrence in an environment, the frequency of execution, or a combination thereof [RAUS76].

MANUAL TUNING

Manual tuning is the creation of specialized microcoded instructions when the code is written by a programmer directly or in some microprogramming language. [ABDA74] applied the term to tuning efforts prior to when writable control store made automated tuning feasible. It is no less applicable to vendor packages and installation-written code designed to "speed up" a particular feature or activity. Those activities most likely to benefit from microprogramming are CPU bound, use a number of intermediate results, are highly repetitive, or are not easily handled by existing machine language instructions [CLAP72]. Such features include multiply routines, square root algorithms, matrix operations, table searches, array address calculation, number generation, stacking routines, and tree searches [COOK70, HUSS70, TUCK71, CLAP72, REIG72, REIG73, PARK73, SNYD75, HABI74, TMfR75].
In addition, applying manual tuning concepts when an instruction set is being designed will assure an application-oriented architecture [WADE72, WADE73, HARR73, WADE75]. [WADE75] capitalized on the similarity of languages' constructs to design a "general-purpose" architecture capable of supporting a number of widely-accepted programming languages. Wade's basic instruction set included simple arithmetic, compare, branch, logic, and load/store instructions. In addition to this "kernel" [RAUS76] language, Wade designed a set of "super instructions" to efficiently execute five common high-level constructs. These operations include arrays and array indexing, repetition loops, block structure housekeeping, input/output editing, and character string manipulation. Each feature will be examined briefly, with somewhat more detail on array indexing to allow for better understanding of the process of developing application-oriented instructions.

ARRAYS AND ARRAY INDEXING

Wade designed a new machine language instruction, ADGEN. This instruction invokes a microprogram, specially written and placed in control storage by Wade, which calculates the address of an array element and uses that location in one of several ways. A high-level language version of the algorithm used to calculate the address is given below.

    sum = 0
    for i = 1 step 1 until n do
    begin
        sum = sum * (upper-bound[i] - lower-bound[i] + 1)
        sum = sum + (subscript[i] - lower-bound[i])
    end
    sum = sum * element-size
    computed-address = array-address + sum

The microcoded algorithm uses information stored in the instruction and the array descriptor. These formats are given below.

ARRAY DESCRIPTOR FORMAT

    n | array-address | lower-bound-1 | upper-bound-1 | ... | lower-bound-n | upper-bound-n | element-size (in bytes)

where:
    n             = number of dimensions of the array
    array-address = address of the array in memory

INSTRUCTION FORMAT

    ADGEN | address of array descriptor | subscript-1 | ... | subscript-n | flags | index register

where:
    ADGEN          = instruction operation code
    flags          = indicate the action to be performed with the calculated address
    index register = location to receive the calculated address

The algorithm is designed for arrays stored internally in row order. For those stored by columns (FORTRAN), the subscript fields are interchanged.

Repetition Loops (DO- or FOR-loops)

Observing that such loops can vary in intricacy, Wade designed four different instructions. Two instructions handle the simpler cases requiring little more than an increment-compare-branch sequence, while the remaining instructions were used for the more complicated loops.

Block Structure

Block structures generate overhead due to the housekeeping involved in the dynamic manipulation of storage necessary for nested BEGIN blocks and procedure CALL/RETURN statements. When handled by inappropriate instruction sets, that overhead becomes excessive. To implement these functions in microcode, BEGIN Block and END Block instructions, two procedure calls, and a single RETURN statement were designed.

Input/Output Editing

I/O editing refers to the conversion of decimal character numbers into binary form for internal storage and manipulation, and the conversion back to decimal numbers for external use. The instruction CONVERT TO DECIMAL converts one binary number to its decimal representation. Input instructions were provided for conversion to binary form.

Character String Manipulation

Wade's goal was to provide for the majority of the manipulation features of PL/I. A string descriptor similar to that of arrays provided direct and indirect addressing capabilities. This simplifies usage of varying-length strings. MOVE and COMPARE statements were designed to best handle these features.
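The address computation performed by the ADGEN microprogram can be sketched directly from the algorithm given above for arrays and array indexing; a minimal Python rendering (the function and parameter names are illustrative, not Wade's):

```python
def adgen_address(array_address, element_size, bounds, subscripts):
    """Compute the address of an array element stored in row order.

    bounds     -- list of (lower, upper) bound pairs, one per dimension
    subscripts -- the subscript values supplied by the instruction
    """
    total = 0
    for (lower, upper), sub in zip(bounds, subscripts):
        total = total * (upper - lower + 1)  # scale by this dimension's extent
        total = total + (sub - lower)        # add the offset within it
    return array_address + total * element_size

# A 3 x 4 array of 4-byte elements based at address 1000, bounds 1..3 and 1..4:
addr = adgen_address(1000, 4, [(1, 3), (1, 4)], [2, 3])  # element [2][3]
```

For column-order (FORTRAN) storage, reversing the bound and subscript lists has the same effect as the subscript interchange described above.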
PERFORMANCE RESULTS

Wade's improved machine language instruction set was implemented and its performance compared to that of the IBM 370 architecture. A number of PL/I and FORTRAN programs were executed on both machines. The following chart lists the execution time improvements realized by the high-level-language-oriented architecture over the IBM general-purpose architecture.

                                                         Percent Improvement
    Program Function                                     PL/I      FORTRAN
    Finding prime numbers by sieve of Eratosthenes       32.2%
    Generates random sentences from English grammar      64.2%
    Convert arithmetic expression from infix to
      postfix notation                                   87.4%
    Adds record to linked list                           54.2%     27.9%
    Solving differential equation by fourth-order
      Runge-Kutta method                                 51.7%     50.5%
    Multiplication of two matrices                       72.3%     31.4%
    Evaluation of a determinant                          85.9%

In general, those programs performing non-numeric processing showed greater performance improvements. This is in keeping with the findings of Hans Jeans [JEAN65], who found that arithmetic operations were hardware bound and that the decrease in instruction fetch-and-execute time was less significant than in logical operations.

AUTOMATED PROCESS

With the increased use of writable control store, thought turned to automating the manual process of tuning. In this way each group of similar programs, or each individual program, can have its own machine architecture with instructions chosen to assure its optimal execution [ELAY77]. This architecture is dynamically loaded into control store prior to program execution. When a unique architecture is created for each program, the process is called dynamic problem-oriented redefinition of computer architecture via microprogramming [RAUS76, RAUS75, RAUS78, RAUS80]. When similar programs are grouped together and an improved instruction set created for each group, the process is called heuristic tuning [ABDA74].
Several methodologies exist for determining which groups of instructions should be replaced by new machine instructions to yield the greatest execution gains with minimal control store requirements. (Because of its expense, control store must be considered a limited resource. If this were not the case, entire programs could be microcoded.) These include the frequency of sequential occurrence and the frequency of execution. Heuristic tuning [ABDA74] and [RAUS76]'s dynamic redefinition of architecture enlist both methods in an attempt to achieve optimal performance improvement.

Frequency of Sequential Occurrence

This method is based on the observation that instructions characteristically occur in identical sequences. This is found not only on the machine instruction level, but also on the intermediate level generated by high-level language compilers. The intermediate language code is analyzed to locate such sequences and to determine how frequently they occur. From this information, sequences which would yield the greatest execution time savings if microcoded are selected for new machine instructions. The difficulty here is the length of the sequences to be chosen: longer sequences occur less frequently, while shorter sequences occur more often.

While execution time improvements are achieved through this process, they are not optimal. Typically, no attempt is made to determine relationships between sequences. Additionally, run-time information is not available, so there is no way to determine if some sequences are executed repeatedly, as in a loop. [RAUS76]

Frequency of Execution

This method uses the execution behavior of a program to choose the instructions to be microcoded. The program is analyzed into program blocks. These are straight-line sections of code such that if the first operation in the block is performed, all succeeding code is executed.
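The selection arithmetic behind the frequency-of-execution method reduces to ranking blocks by projected total savings; a hedged Python sketch (the block names and timing figures are invented for illustration):

```python
def rank_blocks(blocks):
    """Rank program blocks by total projected savings from microcoding.

    blocks -- list of (name, executions, machine_time, micro_time) tuples,
              where the last two are times per execution in the same units
    """
    savings = [(name, execs * (mach - micro))
               for name, execs, mach, micro in blocks]
    # Largest total savings first: these become new machine instructions.
    return sorted(savings, key=lambda s: s[1], reverse=True)

profile = [("inner-loop", 50000, 12.0, 3.0),  # hot block: large total win
           ("setup",          1, 40.0, 8.0)]  # executed once: negligible win
best = rank_blocks(profile)  # "inner-loop" ranks first
```

In practice the ranking would also be weighted against each block's control store cost, since control store is the limited resource noted above.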
By knowing the number of times a block is executed and the execution time improvement of a single microprogrammed block over that of the corresponding machine instructions, total execution time savings can be calculated. Those blocks with the greatest savings are chosen as new machine language instructions.

Combination Methods

Rauscher and Agrawala [RAUS76, RAUS75, RAUS78] devised a dynamic redefinition scheme combining the previous two methods. Program sequences are considered for microcoding on the basis of both their frequency of occurrence and their frequency of execution. The intermediate representation from the high-level language compiler is analyzed to determine the different instruction sequences and where they occur in the program. The expected savings from microcoding each segment are calculated, and estimates are made of how often each segment will be executed. With this additional information, total projected time saved can be calculated. Those instruction sequences with the highest potential time savings are microcoded as new instructions. Experimentally, execution time improvements were found to be between twenty-five and fifty percent.

In heuristic tuning [ABDA74], programs are classified according to their application, language, and/or special requirements. A trace file for each application class is built by monitoring the execution of numerous programs from that class. This data may be collected through hardware, software, or microprogrammed means. The trace file data is then processed to obtain statistics on instruction execution frequencies, instruction dependencies, and similar information. These statistics make it possible for a synthesis algorithm to modify existing microcode or create new microprograms which will allow optimal execution of class programs. The first step performed by the algorithm is the location of program loops using trace file information.
Within the most frequently executed loop, an optimization step takes place: arrangements are made so that the most frequently accessed data is preloaded into internal registers. Next, the instructions within the loop are decomposed into micro-operations, optimized, and loaded into control store. The trace file is updated and an estimate of the performance improvement is made. This algorithm repeats, each time choosing the loop to be tuned from those remaining. It terminates when the algorithm determines that any further execution improvements could not be greater than the overhead they generate.

If the algorithm determines there are either no loops or only very long loops, synthesis of new instructions can still take place. The trace file supplies the most frequently occurring code sequences, and an algorithm similar to the one used above combines the dependent code into new instructions. The new instruction set is tested to determine that it is functionally equivalent to the general instruction set. Once its results can be trusted, the new architecture is ready to be incorporated into the existing system. Several software modifications must be made.

Two programs were heuristically tuned and their original execution compared with that using the synthesized instructions. A data movement example resulted in an 87.4% improvement in execution time. A Fibonacci number generation program achieved a 77% improvement [ABDA74]. It should be emphasized that these examples reflect specific algorithms; more complicated application programs would probably achieve somewhat less of an improvement.

OPTIMIZATION

Optimization has several meanings depending on what concepts a particular author wants to emphasize. While optimization can refer to decreasing the number of bits in a microinstruction, this material will concentrate on methods of reducing execution time by decreasing the number of microinstructions in a microprogram [MALL78].
There are two categories of algorithms which optimize microcode. Vertical optimization deals with decreasing the number of microoperations in a sequence of microoperations. Horizontal optimization of microcode, also known as compaction, is the process of creating microinstructions in a machine whose control word allows a great deal of parallelism. This means a large number of hardware devices can perform concurrently, and therefore a number of microoperations can execute simultaneously. Compaction can be performed locally within straight-line sequences or globally throughout the program. Additionally, register allocation is discussed because of its relationship to program performance.

Vertical Optimization

Until recently, microcode was painstakingly optimized by hand. When automation of the process was considered, the natural route was to modify processes that were used in standard compilers on predominantly sequential code [ALLE69, ALLE72]. Kleir and Ramamoorthy [KLEI71] were the first to consider vertical optimization algorithms for microcode as an alternative to hand optimization.

One method of optimization borrowed from traditional compilers is removal of nonessential operations such as redundant actions and negated actions. Redundant actions are those whose result is available from a previous operation which used identical input and an identical output destination. Additionally, the output destination must have remained unchanged, and the execution of one action must guarantee execution of the other to assure consistent performance. Negated actions are those whose output is never used.

Code motion is the act of decreasing the number of operations performed in those segments of the program most frequently executed. Using program activity as a guide, dynamic analysis rates areas as to their execution frequency to determine proper code movement. Unfortunately, dynamic analysis is prone to suffer from false assumptions and extensive time usage.
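The removal of negated actions described above is essentially dead-code elimination; a simplified Python sketch over a list of register-transfer operations (the representation is hypothetical, and real microcode would also carry condition-code and memory side effects that this model ignores):

```python
def remove_negated(ops):
    """Drop operations whose result is never consumed later.

    ops -- list of (dest, sources) pairs in execution order; a dest of
           None marks an operation with external effects that must be kept.
    """
    live = set()  # resources whose values are still needed
    kept = []
    for dest, sources in reversed(ops):  # scan backward from the end
        if dest in live or dest is None:
            kept.append((dest, sources))
            live.discard(dest)       # this op defines dest anew
            live.update(sources)     # but its inputs become live
    kept.reverse()
    return kept

ops = [("t1", ["a", "b"]),  # t1 = a + b  -- never used: negated action
       ("t2", ["a", "c"]),  # t2 = a * c
       (None, ["t2"])]      # output t2   -- keeps t2 live
```

Removal of redundant actions would add a forward pass that reuses an earlier identical computation, subject to the unchanged-destination and guaranteed-execution conditions stated above.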
Code motion without the benefit of execution statistics is called static analysis. It is assumed that the level of nesting indicates frequency of execution, and actions are migrated from inner to outer segments [KLEI71].

Code consolidation is another possible vertical optimization. On certain machines a sequence of simpler instructions can be replaced by one complex instruction. This concept has been used in the PL/MP compiler [TAN78]. The PL/MP compiler takes a high-level language and, by applying both machine-dependent and machine-independent optimizations, produces microinstructions. Similar compilers and the corresponding high-level languages are discussed in a number of articles [ECKH71, TIRR73, BLAI74, BOND74, CLAR72, DASG78b, DASG80, LEWI79, DEWI76a, SCHR74, FRIE76, DEWI76b, LLOY74b, RAMA74, TSUC72].

The intermediate representation of the compiler is a stream of primitive register-to-register operations. After other vertical optimizations are applied, the code is searched for sequences which satisfy predefined substitution rules. As an example, an add-immediate instruction

    ADDIM b,@b,100

a load-indirect

    LOADI y,b

and an add instruction

    ADD z,x,y

can be performed by one add-indirect-with-offset instruction

    ADDIO z,x,@b,100

The substitution would be made only if the first three instructions could be deleted. If a subsequent operation uses y, this is not the case.

Horizontal Optimization

Local Compaction

Local compaction has been the subject of much recent study [YAU74, AGER76, LLOY74a, RAMA73, RAMA74, DASG76a, DASG76b, JACK74, TSUC74, TSUC76, DEWI76a, TABA74, TOK077, MALL78, WOOD78, WOOD79, FISH79]. Because of the stringent performance expectations placed on microcode, earlier study was directed toward producing optimal code. The amount of work this requires is prohibitive, since all possible mappings of microoperations to microinstructions are found [AST071].
The time required for such exhaustive work increases exponentially with the number of microoperations [LAND80]. Methods have been developed more recently [TOK077, MALL78, WOOD79, FISH79] which result in optimal or near-optimal code in polynomial or linear time. This breakthrough makes horizontal microcode compilers feasible.

Local compaction algorithms typically require two inputs: a straight-line microcode section which contains no internal branches or entry points, and some representation of the relationship between the microoperations. To maintain data integrity, it is vital that the original semantics of the sequential microcode be preserved. This obviously requires that certain microoperations be executed in a particular order. Such microoperations are said to have a data interaction. Given two sequential microoperations a and b, the following three conditions define data interaction:

    1. b requires an output resource of a as an input
    2. a requires an input resource which b modifies
    3. a and b modify the same output resource [LAND80].

These interactions must be analyzed and represented so that the information can be used to create microinstructions. Landskov [LAND80] presented a general algorithm to record these data interactions in graphical form. Each microoperation is added to the graph in sequential order and is represented by a node. The new node must be connected to previously placed nodes to indicate the data interactions. It is linked only to those nodes on which it is directly data dependent. This implies that nodes further up the path from the lowest node having a data interaction with the new node are not also linked to it. The search first checks the last node on each path (the leaves) for data interaction. If the interaction is present, the nodes are linked. Otherwise, testing takes place on each preceding node, working up from the leaves until the test is positive or a node is reached which is already indirectly linked with the new node.
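The three conditions above translate directly into a predicate over the resource sets of two microoperations; a minimal Python sketch (the field names and register names are assumptions for illustration):

```python
def data_interaction(a, b):
    """True if microoperations a and b must keep their relative order.

    Each microoperation is modeled as a dict with 'reads' and 'writes'
    sets of machine resources (registers, flags, buses).
    """
    return bool(a["writes"] & b["reads"]       # 1. b reads an output of a
                or a["reads"] & b["writes"]    # 2. b modifies an input of a
                or a["writes"] & b["writes"])  # 3. both modify a resource

load  = {"reads": {"MEM"},        "writes": {"R1"}}
add   = {"reads": {"R1", "R2"},   "writes": {"R3"}}
shift = {"reads": {"R7"},         "writes": {"R7"}}
# load writes R1, which add reads: they interact and cannot be reordered;
# shift touches no resource of add, so the two are free to be packed together.
```

During graph construction, this predicate is what is evaluated against the leaves and their ancestors to decide where the new node is linked.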
Once all microoperations are added to the graph, the resulting tree indicates the sequencing necessary to assure semantic equivalence.

Compaction Algorithms

Landskov [LAND80] suggests four categories of compaction algorithms: linear analysis, critical path, branch and bound, and list scheduling. Each algorithm consists primarily of two parts. First, a data dependency analysis orders the microoperations so as to assure the semantics are not compromised when microinstructions are created. Second, a conflict analysis monitors the assignment of microoperations to particular microinstructions to assure there are no hardware conflicts.

Linear Algorithm

The data dependency analysis begins with a list of microinstructions (originally empty). Proceeding sequentially, microoperations are selected, and the list of microinstructions is searched from bottom to top to determine the rise limit. This is the first microinstruction in which the microoperation can be included given the data dependencies. The microinstruction list is searched top-to-bottom in conflict analysis. Beginning at the rise limit, the search proceeds downward until a microinstruction is found which can include the microoperation without hardware conflict. If a microoperation having a rise limit cannot be included in an existing microinstruction, a new one is created at the bottom of the list. If the microoperation has no rise limit, it is included in a new microinstruction at the beginning of the list. The linear algorithm does not guarantee optimal code. Dasgupta's work produced the algorithm that is the model for Landskov's [DASG76, DASG77, BARN78, DASG78a, DASG79].

Critical Path Algorithm

An early partition is created as the first step. By working down through the microoperations using the data dependency graph, the earliest that each microoperation can be executed, assuming there are no conflicts for hardware, is determined.
The total number of steps, or frames, needed to fit in all microoperations under these conditions is the minimum number of microinstructions required for that sequence. Next, working upward, microoperations are placed in the last possible frame in which they can be executed so that the minimum number of frames determined earlier can be maintained. Critical microoperations are those with the same early and late timing, and these constitute the critical partition. The critical partition is then modified by dividing any frame in which there are hardware conflicts into two (or more) frames.

To form the final microinstructions, the non-critical microoperations are added to the modified critical partition. Each such microoperation can be included anywhere from its frame in the early partition to its frame in the late partition, inclusive. If a hardware conflict does not allow a microoperation to be included within these frames, a new microinstruction is inserted just following the last partition position. This algorithm can result in many more microinstructions than is optimal. Its weakness is that adjacent microinstructions which could be combined are not if they are in different frames. This algorithm was apparently derived from critical path processor scheduling [RAMA79] by Ramamoorthy and Tsuchiya [RAMA74].

Branch and Bound Algorithm

Using the data dependency graph, a data-available set (Dset) is built which contains those microoperations whose directly data-dependent predecessors have all been allocated to a microinstruction. Obviously, the original Dset consists of those microoperations with no parent nodes in the graph. Using the Dset, microinstructions are formed such that every possible member of the Dset is included. This is a complete instruction, and the collection of these is used to build a tree whose nodes correspond to the microinstructions.
In exhaustive branch and bound, a complete tree is built whose paths represent every possible microinstruction ordering. From this tree, the path of optimal length is found, and the corresponding complete instructions are the shortest possible microcode. [LAND80] and [MALL78] explain this algorithm in great detail. This microcode compaction algorithm was originally presented by Yau, Schowe, and Tsuchiya [YAU74]. Additionally, different heuristics can be applied to the branch and bound algorithm to decrease the number of possible paths and cut execution time dramatically while still achieving near-optimal results [MALL78].

List Scheduling

In list scheduling algorithms, only a single branch of a tree is formed by choosing the "best" complete instruction from each Dset. "Best" is determined by a weighting function, and the choice of function is important to achieving optimal microcode [FISH79]. One possible weight for a microoperation is the number of descendants that microoperation has in the data dependency graph [WOOD78]. Thus when choosing microoperations from the Dset, the order is from highest to lowest weight. Each microoperation is added to the microinstruction unless a hardware conflict occurs. When all Dset members have been considered, a new Dset is created and a new microinstruction is started.

Global Optimization

Unlike local optimization, global optimization can look beyond segments to include loops and recursive subroutines. Its primary goal is to eliminate the NOP's positioned within microinstructions in the interest of proper timing of operations [LAND80]. Global optimization is an area where little research had been done until very recently; [WOOD79, DASG79, MALL78, FISH79, TOK078] deal with this subject. [TOK078] has developed a global routine based on a microoperation model called a microtemplate. The microtemplate is two-dimensional, representing both timing and machine resources.
[TOK078] uses an opti- mization algorithm which seeks primarily to eliminate NOP's positioned at the beginning and/or end of sequential segments by consoidation, or entire redundant microtemplates, in either an upward or downward direction. When compared to hand optimized code, the globally optimized code displayed an average improvement of 5.4%. Similar approaches used by [WOOD79] and [FISH79] treat a locally compacted sequence as a primitive in a larger sequence. Register Allocation Computers designed for efficient microprogramming typically have a large number of very fast registers. Ideally, there are enough regis- ters so that each variable can be permanently assigned to a register during program execution. When there are too few registers, variables must sometimes be stored in memory. This means a store-and-load sequence is often required when a variable is brought into a register. 21 This additional overhead has been shown to affect performance. Liu [LIU75] found that having sufficient registers could improve microprogram execution time by thirty-three percent over the time registers. \~ith no Obviously, it is important to determine which variables are most frequently referenced and to assign these variables to registers first. It is also important to have an efficient realloca"tion algorithm to keep reallocations to a minimum [DEWI76, TAN77]. With memory costs decreasing, manufacturers are tending to include even more high-speed microprogramming registers in their machines. This trend could make the problem of register allocation of minimal concern in the future. 22 BIBLIOGRAPHY - [ABDA73] ABD-ALLA, A. M., AND KARLGAARD, D. C. "The heuristic synthesis of application-oriented microcode,!! in Proc. 6th Annual Workshop on Microprogramming (ACM) , 1973, pp. 83-90. [ABDA74] ABD-ALLA, A. M., AND KARLGAARD, D. C. "Heuristic synthesis of microprogrammed computer architecture," IEEE Transactions on Computers C-23, (August 1974), pp. 802-807. [AGER76] AGERWALA, T. 
"Microprogram optimization: A survey," IEEE Transactions on Computers C-25, 10(October 1976), pp. 962-973. [AGRA74] AGRAWALA, A. K. AND RAUSCHER, T. G. "Microprogramming: Perspective and status," IEEE Transactions on Co~ers C-23, (August 1974), pp. 817-837. [AGRA76] AGRAWALA, A. K. AND RAUSCHER, T. G. Foundations of microprogramming: Architecture, software, and applications, ACM Monograph Series, Academic Press, New York, 1976. [ALLE69] ALLEN F. E. "Program optimization," Annual Review of Automatic Programming 5, (1969). [ALLE72] ALLEN, F. E., AND COCKE, J. "A catalogue of optimizing transformations," in The Design and Optimization. of Compilers, R. Rustin, Ed., Prentice-Hall, Englewood Cliffs, N.J., 1972. [BARN78] BARNES, G. E. "Comments on the identification of maximal parallelism in straight-line microprograms," IEEE Transactions on Computers C-27, 3(March 1978), pp. 286-287. [BELG76] BELGARD, R. A. "A generalized virtual memory package for BIT interpreter writers," SIGMICRO Newsletter 7, 4(December 1976), pp. 31- . [BLAI74] BLAIN, G. PERRONE, M., AND HONG, N. X. "A compiler for the generation of optimized microprograms," SIGMICRO Newsletter 5, 3(Oct. 1974), pp. 49-67. [BOND74] BONDI, J. 0., AND STIGALL, P. D. "Designing HMO, an intergrated hardware microcode optimizer," in Proc. 7th Annual Workshop on Microprogramming (ACM) , Oct. 1974, pp. 268-276. [BROA74] BROADBENT, J. K. "Microprogramming and system architecture," Computer Journal 17, 2(Feb. 1974), pp. 2-8. [BROW76] BROWN, G. E. et. al. "Operating system enhancement through microprogramming," SIGMICRO Newsletter 7, l(March 1976), pp. 28-33. [BROW77] BROWN, G. E. et. al. "Operating system enhancement through firmware," SIGMICRO Newsletter 8, 3(September 19'?7), pp. 119133. [CHAT76] CHATTERGY, R. "Microprogrammed implementation of a scheduler," SIGMICRO Newsletter 7, 3(September 1976), pp. 15-19. [CHEN7!!'] CHEN, T. C. "Unconventional superspeed computer systems," IBM Thomas J. 
Watson Research Report RC 782, Yorktown Heights (Nov. 1970).

[CHUR70] CHURCH, C. C. "Computer instruction repertoire - time for a change," in Proc. 1970 AFIPS Spring Joint Computer Conference, AFIPS Press, Montvale, N.J., pp. 343-349.

[CLAP72] CLAPP, J. A. "The application of microprogramming technology," SIGMICRO Newsletter 3, 1(April 1972), pp. 8-47.

[CLAR72] CLARK, R. K. "Mirager, the 'best-yet' approach for horizontal microprogramming," in Proc. ACM Nat. Congress, 1972, pp. 554-571.

[COOK70] COOK, R. W., AND FLYNN, M. J. "System design of a dynamic microprocessor," IEEE Transactions on Electronic Computers C-19, 3(March 1970), pp. 213-222.

[DASG76] DASGUPTA, S., AND TARTAR, J. "The identification of maximal parallelism in straight line microprograms," IEEE Transactions on Computers C-25, 10(Oct. 1976), pp. 986-992.

[DASG77] DASGUPTA, S. "Parallelism in loop free microprograms," in Information Processing 77 (Proc. IFIP Congress 1977), B. Gilchrist, Ed., North-Holland, Amsterdam, 1977.

[DASG78a] DASGUPTA, S. "Comment on the identification of maximal parallelism in straight-line microprograms," IEEE Transactions on Computers C-27, 3(March 1978), pp. 285-286.

[DASG78b] DASGUPTA, S. "Towards a microprogramming language schema," in Proc. 11th Annual Workshop on Microprogramming (ACM), Nov. 1978, pp. 144-153.

[DASG79] DASGUPTA, S. "The organization of microprogram stores," Computing Surveys 11, 1(March 1979), pp. 40-65.

[DASG80] DASGUPTA, S. "Some aspects of high-level microprogramming," Computing Surveys 12, 3(Sept. 1980), pp. 295-323.

[DAVI78] DAVIDSON, S., AND SHRIVER, B. D. "An overview of firmware engineering," Computer 11, 5(May 1978), pp. 21-33.

[DENN71] DENNING, P. J. "Third generation computer systems," Computing Surveys 3, 4(Dec. 1971), pp. 175-216.

[DEWI76a] DEWITT, D. J. "A machine-independent approach to the production of horizontal microcode," Ph.D. dissertation, Univ. Michigan, Ann Arbor, June 1976; Tech. Rep. 76 DT, Aug. 1976.

[DEWI76b] DEWITT, D. J.
"Extensibility - A new approach for designing machine independent microprogramming languages," in Proc. 9th Annual Workshop on Programming SIGMICRO Newsletter 7, 3(Sept. 1976), pp. 33-41. [DHAU77] D'HAUTCOURT-CARETTE, F. "A microprogrammed virtual memory for eclipse," SIGMICRO Newsletter 8, 2(June 1977), pp. 10-20. [ECKH71] ECKHOUSE, R. H., JR. "A high-level microprogramming language (MPL)," in Proc. 1971 AFIPS Spring Joint Computer Conference, AFIPS Press, Montvale, N.J., pp. 169-177. [ELAY77] EL-AYAT, K. A., AND HOWARD, J. A. "Algorithms for a selftuning microprogrammed computer," SIGMICRO Newsletter 8, 3(Sept. 1977), pp. 85-91. [FISH79] FISHER, J. A. The optimization of horizontal microcode within and beyond basic blocks: An application of processor scheduling with resources, U.S. Dept. of Energy Report, Mathematics and Computing C~~-3~77-161, New York Univ., Oct. 1979. [FRIE76] FRIEDER, G. "The FORTRAN project - A multifaceted approach to software-firmware high level language support," SIGMICRO Newsletter 7, 3(Sept. 1976), pp. 47-50. [FULL76] FULLER, S. H., LESSER, V. R., BELL, C. G., AND KAMAN, C. H. "The effects of emerging technology and emulation requirements on microprogramming," IEEE Transactions on Computers 25, 10(Oct. 1976). [HABI74] HABIB, S. "Microprogrammed enhancements to higher level languages - an overview," in Proc. 7th Annual Workshop on Microprogramming (ACM) , Oct. 1974, pp. 80-84. [HARR73] HARRISON, M. C. "Language oriented instruction sets," in Proc. ACM SIGPLANjSIGMICRO Interface Meeting, (1973). [HIGB78] HIGBIE, L. C. "Overlapped operation with microprogramming," IEEE Transactions on Computers C-27, 3(March 1978), pp. 270275. [HINS72] HINSHAW, D. 1. "Optimal selection of functional hardware units for microprogram controlled central processing units," Ph.D. dissertation, Univ. of Michigan, 1972. [HUBE7~] HUBERMAN, B. J. "Principles of operation of the Venus microprogram," MITRE Corp., Bedford, Mass., MTR 1843, July 1970. 
[HUSS70] HUSSON, S. S. Microprogramming: Principles and Practices, Prentice-Hall, Englewood Cliffs, N.J., 1970.

[JEAN65] JEANS, H. "Possible performance improvements by microprogramming for System/360 Models 40, 50, and 65," IBM Data Processing Division Report #320-3203, (1965).

[JONE76a] JONES, L. H. "Microprogramming bibliography: 1951-Early 1975," SIGMICRO Special Issue, (Mar. 1976).

[JONE76b] JONES, L. H. "An annotated bibliography of microprogramming: Late 1975-Mid 1976," SIGMICRO Newsletter 7, 4(Dec. 1976), pp. 95-122.

[KLEI71] KLEIR, R. L., AND RAMAMOORTHY, C. V. "Optimization strategies for microprograms," IEEE Transactions on Computers C-20, 7(July 1971), pp. 783-795.

[KRAF81] KRAFT, G. D., AND TOY, W. N. Microprogrammed Control and Design of Small Computers, Prentice-Hall, Englewood Cliffs, N.J., 1981.

[LAND80] LANDSKOV, D., DAVIDSON, S., AND SHRIVER, B. "Local microcode compaction techniques," Computing Surveys 12, 3(Sept. 1980), pp. 261-294.

[LEVE77] LEVENTHAL, L. A. "Microprocessors and microprogramming: a review and some suggestions," Simulation 29, (Aug. 1977), pp. 56-59.

[LEWI79] LEWIS, T. G., MALIK, K., AND MA, P. Y. "Firmware engineering using a high level microprogramming system to implement virtual instruction set processors," Dept. of Computer Science, Oregon State Univ., Corvallis, 1979.

[LIU75] LIU, P. S. "Dynamic microprogramming in Fortran environment," Ph.D. dissertation, Purdue Univ., West Lafayette, Ind., Aug. 1975.

[LLOY74a] LLOYD, G. R. "Optimization of microcode for horizontally encoded microprogrammable computers," Master's Thesis, Brown Univ., Providence, Rhode Island, June 1974.

[LLOY74b] LLOYD, G. R., AND VAN DAM, A. "Design considerations for microprogramming languages," in Proc. Nat. Comput. Conf. 43, AFIPS Press, Arlington, Va., 1974, pp. 537-543.

[LUND76] LUNDELL, E. D., JR. "Reloadable control store marks two models added to IBM 370 system line," Computerworld X, (July 5, 1976), pp. 1, 5.

[MALL78] MALLETT, P. W.
"Methods of compacting microprograms," Ph.D. dissertation, Univ. Southwestern Louisiana, Lafayette, Dec. 1978. - [McKE67] McKEEMAN, W. M. "Language directed computer design," Proc. 1967 AFIPS Fall Joint Computer Conference, AFIPS Press, Montvale, N.J., pp. 413-174. [MOFF76] MOFFETT, L. H. "On-line virtual architecture tuning using microcapture," Ph.D. dissertation, The George Washington Univ., 1976. [NESS79] NESSETT, D. M., AND RECHARD, o. W. "Allocation algorithms for dynamically microprogrammed multiprocessor systems," Computer Journal 22, 5(May 1979), pp. 155-163. [NOWL75] NOWLIN, R. W. "Effective computer architecture for microprogrammed machines," Ph.D. dissertation, Texas Tech. University, 1975. [PARK73] PARK, H. "Fortran enhancement," in Proc. 6th Annual Workshop on Microprogramming (ACM) , Sept. 1973, pp. 156-159. [PATT76] PATTERSON, D. A. "STRUM: Structured microprogramming system for correct firmware," IEEE Transactions on Computers, 10(Oct. 1976), pp. 974-985. [PATT78] PATTERSON, D. A. "An approach to firmware engineering," in Proc. Nat. Comput. Conf. 47, AFIPS Press, Arlington, Va., 1978, pp. 643-647. [PATT79] PATTERSON, D. A. "An experiment in high level language microprogramming and verification," unpublished manuscript, Dept. of Electrical Engineering and Computer Science, Univ. California, Berkeley, 1979. [RAMA69] RAMAMOORTHY, C. F., AND GONZALEZ, M. J. JR. "A survey of techniques for recognizing parallel processable streams in computer programs," in Proc. 1969 AFIPS Fall Joint Computer Conference, AFIPS Press, Montvale, N.J., pp. 1-15. [RAMA74] RAMAMOORTHY, C. F., AND TSUCHIYA, M. "A high-level language for horizontal microprogramming," IEEE Transactions on Computers C-23, 8(Aug. 1974), pp. 791-801. [RAUS75] RAUSCHER, T. G. "Dynamic problem oriented redefinition of computer architecture via microprogramming," Ph.D. dissertation, Univ. of Maryland, 1975. [RAUS76] RAUSCHER, T. G., AND AGRAWALA, A. K. 
"Developing application oriented computer architectures on general purpose microprogrammable machines," in Proc. Nat. Comput. Conf. 45, AFIPS Press, Arlington, Va., 1976, pp. 715-722. [RAUS78] RAUSCHER, T. G., AND AGRAWALA, A. K. "Dynamic problem oriented redefinition of computer architecture via microprogramming," IEEE Transactions on Computers C-27, (Nov. 1978), pp. 10061014. - [RAUS8~] RAUSCHER, T. G. "Microprogramming: A tutorial and survey of recent developments," IEEE Transactions on Compu1:ers C-29, l(Jan. 1980), pp. 2-20. [REIG72] REIGEL, E. W., FABER, V., AND FISHER, D. A. "The interpreterA microprogrammable building block system," in Proc. 1972 AFIPS Spring Joint Computer Conference, AFIPS Press, Montvale, N.J., pp. 705-723. [REIG73] REIGEL, E. W., AND LAWSON, H. W. "At the progranuning languagemicroprogramming interface," in Proc. ACM SIGPLAN/SIGMICRO Interface Meeting, May 1973, pp. 2-18. [ROSI74] ROSIN, R. F. "The significance of microprogramming," SIGMICRO Newsletter 4, l(Jan. 1974), pp. 24-38. [SAFA8~] SAFEE, B. "A microprogrammed approach to efficient computer language translation and execution," Ph.D. dissertation, State Univ. of New York at Buffalo, 1980. [SALI73] SALISBURY, A. B. "The Evaluation of Microprogram Implemented Emulators," Technical Report No. 60, Digital Systems Laboratory, Stanford University, Stanford, Calif., July 1973. [SALI76] SALISBURY, A. B. Microprogrammable Computer Architectures, American Elsevier Publishing Co., New York, 1976. [SCHR74] SCHREINER, E. "A comparative study of high level microprogramming languages," SIGMICRO Newsletter 5, l(April 1974), p. 4. [SHAY75] SHAY, B. P. "A microprogrammed implementation of parallel program schemata," Ph.D. dissertation, Univ. of Maryland, 1975. [SNYD75] SNYDER, D. C. "ComputeT performance by measurement and microprogramming," Hewlett-Packard Journal, (Feb. 1975), pp. 17-24. [SOCK75] SOCKUT, G. H. 
"Firmware/hardware support for operating systems: principles and selected history," SIGMICRO Newsletter 6, 4(Dec. 1975), pp. 17-26. [STOC77] STOCKENBERG, J. E. "Optimization through migration of functions in a layered firmware-software system," Ph.D. dissertation, Brown Univ., 1977. [TABA74] TABANDEH, M., AND RAMAMOORTHY, C. V. "Execution time and memory optimization in microprograms," in Proc. 7th Annual Workshop on Microprogramming (ACM) , 1974, pp. 519-527. [TAFR75] TAFRELIN, S. "Dynamic microprogramming and external subroutine calls in a multics-type environment," BIT 15, 2(1975), pp. 192-202. [TAN77] TAN, C. J. "Register assignment algorithms for optimizing compilers," IBM Research Report RC6481, April 1977. [TAN78] TAN, C. J. "Code optimization techniques for micro-code compilers," in Proc. Nat. Comput. Con£. 47, AFIPS Press, Arlington, Va., 1978, pp. 649-655. [TIRR73] TIRRELL, A. K. "A study of the application of compiler techniques to the generation of microcode," in ~roc. ACM SIGPLANjSIGMICRO Interface Meeting, May 1973, pp. 67-85. [TOK077] TOKORO, M., TAMURA, E., TAKASE, K., AND TAMARU, K. "An approach to microprogram optimization considering resource occupancy and instruction formats," in Proc. Tenth Annual Workshop on Microprogramming CACM), Oct. 1977, pp. 92-108. [TOK078] TOKORO, M., TAKIZUKA, T., TAMURA, E., AND YAMAURA, I. "A technique of global optimization of microprograms," in Proc. 11th Annual Workshop on Microprogramming CACM), 1978, pp. 4150. [TSUC72] TSUCHIYA, M. "Design of a multilevel microprogrammable computer and a high-level microprogramming language," Ph.D. dissertation, University of Texas at Austin, 1972. [TSUC74] TSUCHIYA, M., AND GONZALEZ, M. J., JR. "An approach to optimization of horizontal microprograms," in Proc. 7th Annual Workshop on Microprogramming CACM), 1974, pp. 8:.-90. [TSUC76] TSUCHIYA, M., AND GONZALEZ, M. J., JR. "Towards optimization of horizontal microprograms," IEEE Transactions on Computers C-25, 10COct. 1976), pp. 
992-999.

[TUCK71] TUCKER, A. E., AND FLYNN, M. J. "Dynamic microprogramming: Processor organization and programming," Commun. ACM 14, 4(April 1971), pp. 240-250.

[WADE72] WADE, B. W., AND SCHNEIDER, V. B. "The L-machine: A computer instruction set for the efficient execution of high-level language programs," in Proc. 5th Annual Workshop on Microprogramming (ACM), Sept. 1972, preprints.

[WADE73] WADE, B. W., AND SCHNEIDER, V. B. "A general purpose high-level language machine for mini-computers," in Proc. SIGPLAN/SIGMICRO Interface Meeting, May 1973, preprints.

[WADE75] WADE, B. W. "A microprogrammed minicomputer for the efficient execution of high-level language programs," Ph.D. dissertation, Purdue Univ., West Lafayette, Ind., 1975.

[WOOD78] WOOD, G. "On the packing of micro-operations into microinstruction words," in Proc. 11th Annual Workshop on Microprogramming, SIGMICRO Newsletter 9, 4(Dec. 1978), pp. 51-55.

[WOOD79] WOOD, W. G. "The computer aided design of microprograms," Ph.D. dissertation, Univ. Edinburgh, Scotland, 1979.

[WORT73] WORTMAN, D. B. "A study of language directed computer design," Ph.D. dissertation, Stanford Univ., Ca., 1973.

[YAU74] YAU, S. S., SCHOWE, A. C., AND TSUCHIYA, M. "On storage optimization of horizontal microprograms," in Proc. 7th Annual Workshop on Microprogramming (ACM), Oct. 1974, pp. 98-106.
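APPENDIX: ILLUSTRATIVE SKETCHES

The boundary consolidation idea attributed to [TOK078] in the discussion of global optimization can be illustrated with a toy sketch in modern notation. The dict-of-fields encoding of a microinstruction word, the function name, and the disjoint-field merge test are illustrative assumptions only; the actual algorithm also moves individual micro-operations upward and downward across segment boundaries, which this sketch omits.

```python
def consolidate_boundaries(blocks):
    """Pack microinstruction words across block boundaries.

    Each block is a list of microinstruction words; each word is a
    dict mapping a functional-unit field to a micro-operation (an
    absent field is a NOP slot).  When the last word of one block and
    the first word of the next use disjoint fields - and assuming no
    branch targets the boundary - the two words are packed into one,
    eliminating the NOP slots at the segment boundary.
    """
    if not blocks:
        return []
    merged = [list(blocks[0])]
    for block in blocks[1:]:
        block = list(block)
        prev = merged[-1]
        if prev and block and not (set(prev[-1]) & set(block[0])):
            # Disjoint fields: merge the two boundary words into one.
            prev[-1] = {**prev[-1], **block[0]}
            block = block[1:]
        merged.append(block)
    return merged
```

For two locally compacted blocks whose boundary words use different functional units, the sketch reduces the total word count by one per boundary, which is the source of the improvement global optimization seeks.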
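The frequency-first heuristic described under Register Allocation can likewise be sketched. The function names and the one-memory-reference-per-spilled-access cost model are illustrative assumptions, not details taken from [LIU75], [DEWI76], or [TAN77].

```python
from collections import Counter

def allocate_registers(references, num_registers):
    """Assign the most frequently referenced variables to registers.

    references - the variable names a microprogram refers to, in order.
    Returns (register_resident, spilled) as two sets of variable names.
    """
    ranked = [var for var, _ in Counter(references).most_common()]
    return set(ranked[:num_registers]), set(ranked[num_registers:])

def memory_traffic(references, register_resident):
    # Each reference to a spilled variable is charged one memory
    # access; register-resident references are free in this model.
    return sum(1 for v in references if v not in register_resident)
```

With the reference stream a, b, a, c, a, b, d and two registers, a and b become register resident and the remaining references to c and d account for all memory traffic, showing why ranking by reference frequency minimizes the store-and-load overhead under this simple cost model.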