THE PERFIRMANCE ADVANTAGES OF A HIGH LEVEL LANGUAGE MACHINE by JAMES WALTER RYMARCZYK Submitted in Partial Fulfillment of the R.equirements for the Degree of Bichelor of Science at the INSTITUTE OF TECHNOLOGY MASSACHUSETTS June, 1972 Signtture 3f Authoa . ; . .ngineeri.. Department of Electrical C3rtified by S S 0 0 S S S S . May Engineering; 197 12, 1972 S Thesis Supervisor Accepted by S S S S S S S S S S S S S S S . . Chairmin, Departmental Committee on Theses ABSTRACT This paper identifies and discusses a number of mechanisms by which a machine with a suitable high level interface language might achieve a level of performance which ex.eeds that of a A high machine with a conventional von Neumann architecture. level language machine is characterized which is somewhat less fl3xible than a conventional design, but which significantly otit-performs the conventional machine when used in a way that explits the high level language. K3ywords and Phrases: Computer Architecture, Machine Organization, High Level Language Machine, Language Oriented Computer Design, Computer Command Structures, Hardware Implementation of Programming Languages, High Performance Computer Design. ACKNOWLEDGEMENT my thesis supervisor, and to I am grateful to Stu Mainick, Dive Kelleher and Steve Zilles of IBM for their generous efforts in reviewing and commenting upon this paper. iii TABLE OF CONTENTS Page Section I. II. . . . . . . . . . 1 . . . . . . . . . . 2 . . . . . . . . . . 3 CHARACrERISTICS . . . . . . . 7 8 Perspective A. Historical B. objectives . . . . LANGUAGE FAVORA3LE A. . . . . . INTRODUCrION Program Structure . B. Primitive Data Types . . . , . . . . . . . . . . . . . . . . . . 11 , . . 13 III. MECBiA1ISMS FOR ACHIEVIN3 IPROVED PERFORMANZE A. Optimization 1. Coatext-Sensitive . . . 13 . . . . . . 14 Operations . . . . . 14 . . . . . . . 16 . . . . . . . 17 . . . . . 17 . . . . 18 . . . . 19 . . . . 20 Optimizations Unnecessary Avoiding b. Reordering operations c. Exploiting Special Cases . Processing ot Aggregates 3. Parallel Processing of 4. Dynamic a. C. . a. 2. Parallel B. . of Expression Evaluation Special Cases Manageent of Temporaries . Increasing the Use of Temporaries Stream Instruction Efficiencies Procedural Control . . . . . . . 21 . . . . . . . 21 . . . . . . 23 1. Explicit 2. Deferral of Tactical Decisions 3. Algorithmic Encoding Density . . . . . . . 24 4. Hijgh . . . . . . . . 27 . . . . . . . . 28 Level Interlocking Concurrent Error Monitoring . iv tV. HYPOTHETICAL HIGH LEVEL LAN3UAGE MACHINE . . . . 33 A. GenerAl Machine Organization . . . . . . . . 33 B. Objects . . . . . . . . 36 . . . . . . 36 . . . . . . . C. Aspects of Program Iaterpretation I. . . . . . . . . . 38 . . . . . . . . . . 39 . . . . . . . . . . . 42 . . . . . . . . . . . 45 . . . . . . . . . . 47 Prograa 2. Program Activatiots 3. Program Execution CONCLUSIONS REFERENCES TIN TS . Representation 1. . . AND . . . . . BIBLIOGRAPHY . . . .# 56 LIST OF FIGURES Page Figure 1. rypical execution-sequencing operators . . . . 10 2. Conventional loop structure for vector operations . . 25 . . 26 . . 26 . . 34 . . 37 . . 39 3. * . System/360 implementation of vector addition 4. System/360 implementation of inner-product 5. Jigh lvel language mactine structure 6. Internal representatin of an object . . . . . . 7. Internal representation of a PROGRAM object . 3. Internal cepresentation if 9. Example of a TEXT object a TEXT token . . . . . . . . . 40 . . . . 41 vi I. INTRODUCTION reportei to have once said that most papers Peter Landin is in descrioe 2omputer Science This paper is no exceFtion someone else already knew [cMseJ70J. rule. to that of notions combines a collection It what learned their author how that have been extracted from the literature with a number of my own ideas although independently that, before. surely been derived, have noted Wiile credit for many of the concepts presented belongs to others, I accept full responsibility for any misrepresenta- tions or ambiguities which may exist in this presentation. This paper is intended to serve primarily as an aid to others in the field of computer design by collecting, organizing these ileas. anl commentinj upon not permit much in that the this paper does The scope of the way of demonstrable results. tor the physical realizations absence of It is hoped mechanisms that are discussed will be compensated for by a somewhat brcader perspective than is customary in the literature. The reader semantic is assumed features langaages [M9; of the M10'; M13). to be familiar with APL, PL/I and the principal LISP he should In addition, programming have scme knowledge of the goals of coatemporary programming systems, problems that are encountered and the the hardware/software engiaeering technijues usei in seeking to attain these goals. T.A. Historical Perspective many diverse efforts to There have been that directly interprets a high level design a computer language [AbraP70; Bashr67; BashT68; BerkK69; .hes771; DennJ71; McFaC70; meggJ64; 1ul1l63; RiceR71; RossC64; ShawJ58; ThurK70; WebeH67; ZaksR71; M3]. The motivatian for doing so has generally been based upon either of two assumptions. First, it has been pastulated that zaa provide a significant level interface language useful projramming cOst. One would environment ia [flifJ68, p. culminate like to zreate a p. 3erkK69, machine in programmiig function without which run-time 14; is (DannJ69; )ennJ71], etc. a high increase in a prohibitive increase in well disciplined programming programming errors in 60], lumps generality while desirable a machine with are detected which such errors need not -BashT67; guaranteed McFaC70], or at in least which likely H:owevec, it is widely rezognized that, features such as these can be programmed upon contemporary low level machiiAes (indeed, upon Tucing machines!), the performance interpretive BerkK69]. cost software of is implementing unacceptable such features [Ilif358, pp. with 4,13; Various more directly. level language compilers has included designs have interpreters and ThurK70; Ross264; BarkK69; a high sought to is normally incurred by software avoid the costly overhead ttiat impleifmented overall system be attained by implementing improvement can performance assumed that an has often been it Second, [B3ashT67; BashT68; At least one recent effort WebeH67]. such as system functions several basic operating main storage management and process dispatching [RiceR71]. the goal was enhanced But whether or the an alternative achiaving these goals has Deen is specifically interpretation of a more and to date have been based organizations. machine language-independent existing Peceatly, which functions, most accolerated performance of conventional designs of high level language machines upon functional capabilities strategy promising for to employ a machiae organization for designed the level langiage particular high semantic efficient [AbraP70; T.urK7); ZatsR71]. I.B. Objectives objective of The primary this thesis previous machine whose work in computational rates this language area by reducing identify the attained, in principle, by a performance advantages that may be processor is to has is in high sought the number to Most level. achieve higher of interfaces, or tai high level interpretation, between the layers of the tailoring organization to machine by semantic (e.g., to This paper will concern itself with features of APL). the array primary the programming language an existing characteristics of further gone have Some efforts circuitry. machiie language and the oerforman-e improvemeats that may be attained over and above those that its treatment to the features It will not restrict interfaces. Instead, of any particular existing language. compatible language Then, copatatioaal mechanisms, the general ways exceeds that consideration of by the be made an attempt will in which a macaine with a language can intecface of a a set of loosely featuces will be proposed with computational mind. performance in number of reduction in the accompany a brute-force level achieve a machine to identify suitabie high level which of performance a conventional with relevant Neumann von architectur e. A higa level is language machine will be somewhat less flexible than characterized a conventional design, which but which significantly out-performs the conventional machiae when used in a way that exploits the high level language. It should be noted that no problems of translating the lifficult the high level machine languige. machine attempt will be made to address language would have to In from other languages into most practical systems, serve as an effective the target langiage for such transilatioas. Here, the sole zoncern is with thi hiiqh speel interpretatioa of a single, although excepticnally powerful, language. THIS PAGE INI'ENTIONALLY LEFT BLANK II. FAV3RABLE LANSUAGE CHARACTERISTICS section This that are many features pissesses system, characterizes impractical that are to a machine programming ia a iesirable which language implement upoa contemporary machine arhitectures, but taat favorably affect the performance high level language machine. capabilities of the The language of their relation to the chiricteristics are discussel in terms structure of programs and tue nature of primitive objects. not comprise section does This Such lanqage. dissertation What itself. in ciaracterization formidable task a that is of a the definition would require sought is sufficient to is a a support the new lengthy language following sections of this paper. For and precision, extraneous) issues of to syntax, the avoid (and thorny the program examplas here that follow ia this document will use a simple LISP-like notation: (operator operandl opecand2 ... operandU) N3tation variables and constants will be indicated by lower-case and upper-case symbols, respectively. Proggla Structure TI.A. In terms of its program structure, the envisioned high level machine language most closely resembles the language LISP. Each of its programs is a structured expression consisting of an operands; and each operand is, operator and an optional list of in turn, a structured expression. As in LISP, the operators and operands within the text of a program are environment merely (or symbols whose context) ia course, various static and are possible; which upon invoked. the Of lynamic symbol resolation mechanisms (which would general, that, in be determined program cannot resolution time depends the program is is important is but what of a sanantics meaning typically be prior to the symbol as program as late activation time). The mntivation that desired expression an be equivalency twotold. First, it is betwaan values and should be passible to replace aay value with an that evaluates in other vice-versa. closed there It expressions. for these features is to (or woris, under composition. "returns") that value, all language constricts Secoad, there is to and should be be no sacred distinction between builtin operators and user-defined programs. It should be opieritor, possible within his for a own user local to redefine any environment, by "system" providing a operator should of an the redefinition Moreover, the appropriate name. program with to those require changes not in itself programs that use the operator. also possesses The language operators builtin (a superset of set of and powerful a large Of of APL), the operators siagular importance are the execution-sequencing operators which the COBOL there wouli be some sort its operands in of SEQUENCE operator operands its employing concurrent hardware which would repetitively evaluate some number of times, or one of until some further stipulated (perhaps operator operands either met, and so condition is that the no genecation way modify itself. of shareable programmiig practices, erecition-time programs for the high That is, an executing program leveL language machine be "pure". in which (see Figure 1 on page 10) It is may its example, operator a REPEAT processing), IF which evaluates order to regard without evaluates forth. a PAaALLEL segueae, strict For etc. statement, PERF3RM statements, DO and the PL/I of to those functions analogous perform constraint promotes This outlaws software, certain conmon ani aliminates programming errors. many It also has the "tricky" types of implications rejarding the organization of the language interpreter. (SEQUENCE expression1 (PARALLEL expressionl expression2 expression2 ... expressionN) expressionN) ... (REPEAT expression1 expression2 TIMES) (REPEAT expression1 WHILE expression2) ELSE expression3) (IF expression1 THEN expression2 operators Figure 1 : Typical execution-sequencing Another important feature language is its facility of [M13, Sr'NAL 114-106] pp. statement. in exceptions 4ith singlae handles which provided A a proposed high level for processing exceptianal conditions. In this regari, it is unlike either PL/I the uniform APL or LISP, but similar to its conditions, exception ON-units, handling mechanism both builtin way. Upon is user-generated and the and of occurrence an exception, the current activation chain is sequeatially searched (from the most for activation) consists primarily recent an of the context found, in the a program then it is to which be (which corresponds executed as if any variable in If executed). It that context to an the were invoked in it which the exception occurred. inspect or molify "root" system exception-action-specification exception-action-specification exception is to activation may choose to (subject to the authorization mechanism), to signal another exception, to return t3 the Poiat of interruptio, to execute a return (with value) from the interrupted suspend program, to results in a SUSPENSION exception in If forth. so ani process), actian-specification is found, The system is sigaalled. then (which the process which owns this corresponding exception- an EXCEPTIONEEROR exception root activation eaxcption-action-specification always provides an for this exception. Prizitive Data Types IT.B. of primitive The set machine inludes such tu-)les. type (e.g., (e.g., shape are recognized aggcegate objects as vectors, eaca object in the scalar, vector, tuple), by the arrays and system has whica contains information program, character, integer, its ownership, real or an regarding its complex) and as well as an indication of persistence and access-authorization. Consequeatly, the objects that implies that Tiis as3ociated descriptor of no the process the machine is able to examine the attributes objects that it manipulates, functions as operator distribution, and thereby perform such domain-rule enforcement and data protection. The internal encodings of the object descriptors and values are inaccessible to the user. pcovided for the to another (e.g., Appropriate builtin operators are purpose of converting an object from REAL to ZOMPLEX). from one type Predicate operators, such as ISINTEGER or ISPRJGRAM, are pairpose of accessing the information also provided for the in the object descriptors. Information is written into the object descriptors only by means of the BUILDOBJECT operator, objects (the conversion operators employ BUILD3BJECT). As a result of the sole haviag means foc self-describing constructing data which is manipulated by an attribute examining machine, it is possible to have objects whose type and/or exists an object of is orject whose type is undefiaed, etc. unlifined objects value is undefinal. uniefined type and Thus, there undefined value, constraiaed to be INTEGER an but whose value Furthermore, it is reasonable to permit such be compoaents to of an aggregate object without requiring that the eatire aggregate object be undefined. By definition, Cesults il the UNDEINEDVALUE. operator waich signalling of an appropriate which guote their can be they lo not place to another, and operands (by using the applied to undefined objects or contain undefined objects because information. from one copies objects builtin QU)TE operator), eaxception exception, such as However, certain operators, such as the builtin programmed operators objects that an undefined part of an object an attempt to use and will not signal an use undefined actually the MECHANISMS F3R ACHIEVING IMPROVED PERFORMANCE III. section This perf ormance the discusses For the purpose of this suitable high level interface language. are grouped these mechanisms that apply those ins3tructia- unit or ALU) , (I-unit those a conventional it will arbitrary; machine. sometimes This the to the class of apply a do not relate to any part classification is somewhat execution-monitoring mechanisms that of been called that and CU), or classes: into three traditionally what has to (E-unit execution-unit it has a available to a machine because mnhinisms that become exposition, improvement case be the that a particular machinism could be viewed as belonging to more than one of these classes. ITI.A. 22timizatign of Ex2gio2n Evaluation Part A high level whicn language machine it evaluates programs, the which are may use expressions. The presence of primitive implicit management viewel evaluation rate. the techniques section deals with of this the rate at contextual structure of to increase aggregate objects of temporaries are three as contributing that a to a and the language features higher expression Zontext-Sensitive 3ptimizations III.A.1. its programs Because may employ a top-down level language macaine which each encountered execution in well its computation knowledge concerning its ia dynamically performance operator is is a wealth a of to optimize able ways several in executed not determinable that is result, it As a execution time. prior to method of program machine possesses Such a iefiaed context. high expressions, a are structured that are not pssible on conventional context-free computing zachines. Avoiling Unnecessary Operations ITIA.1.a. context-sensitive is no [ElsoM70, p. in the be the The operation, same, results that they of course, but the evaluation of the expression (LENGTH (CONCATENATE STRING1 STRING2)) need to actually concatenate 167]. All that is required is result of concatenating machine, For "short cuts" may be used internally to improve Thus, performance. there must of tair cost which they are executed. produce ultilately contextual behave differently depending upon certain builtin operators may the context in available the unnecessary operations. minimize the to ia order example, performing to avoid infocmation use may machine the First, the strings. On a CONZATENATE operator could the two the length strings of the high level language recognize that it was the LENGTH operator and that it invoKed as a lirect argument to the strings. need not concatenate Instead, it could return as taat contained the correct result its value a string descriptor This could be value component was undefined. length but whose descriptors alone, with no accomplished by accessing the string need to even fetch the (possibly lengthy) strings. of this sort are Many other optimizations possible. Note that this optimization could not be performed prior to execution time, as compiler, since by a LENGTH and CONCATENATE is is it However, large class oE may only which be performed by the operations. into tne two processes There is a transformations that instruction stream has identified 66-63] pp. he separates symbols not possible for context-dependent operators more global work reduction Abrams [AbcaP70, of the not then known. avoid all unnessary these to such as the resolution interpreter. a number of these of drag-along and beating. Drag-along is the process whereby the machine defers the evaluation of each operator and operand for as long as possible. By deferriag the evaluation of an expression it becomes possible the expression to simplify only small parts consists of of the manipulating in ways which are expression are the deferred impossible when available, Beating expressions, and in order to reduce the particularly the the object descriptors, work of amount For done. be needs to that example, the TIMES. This expression (TAKE 3 (TIMES (NEGATIVE v) vectorl)) might be reduced to (TINES deferral of by the v) (NEGATIVE 3 vector1)) type operator the non-select transformation particular (TAKE vectorl) (SHAPE (MINUS avoids 3) unne:essary multiplication operations. Reoriering Operations IiI.A.1.b. Ramamoorthy [RamaC71] time can only it be minimized or1ering of subexpressions. subexpressions decreasing be should reoriering resolution. In particular, evaluated process must be rhe overhead involved in unacceptable when the high level requirements. level machine. language ex3cution of several choose after tacKle if symbol language is implemented upon a well be viable upon a Thus, when faced unordered expressions, and when to But such a dynamic process is execute all of the expressions simultaneously, rationally their of execution time, performed conventional machiie, but the process may high shown that order the in the given to he has be reordered to minimize sibexpressions are to the consideration is processor time and memory expression execution has noted that the most with the unable to the machine could resaurce-demanding exorassians first. III.A.l.c. Exploiting Special Cases The tachaological control stores development of writeable suggests thtat the microprogram for a particular machine might be many times larger than the capacity facility cilli be provided store and backing implementiag would permit specializei some store, to be performed be useful in Essentially, it it would machine. employ a library Depending upon micro-procedures. operation to microcole between level langaage machine the If a for paging the control a high of the control store. and the context, the the of highly particular machine could invoke a micro-procedure that is specifically designed to handle that situation. extensive In "special effect, the casing" ElsoM69] withait dascribel La machine would :similar to cequiring a the be capable OMD of mechanism larger than ncrmal control store. III.A.2. Parallel Processing of Aggregates since aggregate language, i high objects are level language machine may hardware techniques to efficiently ajgregates through a primitive witiin are particularly pipeline, or direct employ specialized deal with then. amenable its machine to high-speed parallel processing Homogeneous streaming by cellular lojic arrays. components logic-in-memory c)Iaventional be will than economical more justifications see logic [for "random" LSI technology, advent of that, with the Indications are HenlR69]. It will be possible to incorporate logical functions directly in the memory because the size may number the oa constraint of the relative to large is becoming a chip on be placel (and complexity) of the circuit that interconnections. chip-to-chip Thus, the most cost-effective coaputer organizations will employ regular arrays of memory with builtin logic capabilities. III.A.3. Parallel Processing of Special Cases There matrix, inverting a the operation exist several waich there for as such operations, many are algorithms which exhibit various degrees of speed and applicability. case that commonly the will "work" whenever it is the >peratian Arl there are from the domain inversion, is defined, several significantly faster, a particular there is although it executes algorithms any noasingular is algorithm which Thus, rather slowly. execute which work for special cases although they only of arguments. It to an argument for the which applied other of in the n-by-n matrix may case of matrix be inverted by 3 triangular decomposition, requiring slightly more than n scalar multiplications and divisions. However, if the matrix is obtained in only n3 /2 scalar symmatric, then its inverse may be multiplications and divisions [RalsA65, p. 446, p. 462]. A whose performance machine language high level is of paramount importance could exploit this situation as follows: To iavert a mitrix, domain-test algorithm which would algorithms in parallel with a silect more iatrix inversion it could execute two or from the result fastest algorithm the that properly apolies. Some Dther operations that have the determinaat of are the calculation of this sort special case algorithms of a matrix, the aigenvalues and eigenvectors of a matrix and the zeroes of a polynomial. IIt.A.4. Dynamic Management of remporaries In the evaluation machine, perhaps customary on Tiese possess its cells lue to software implemented dynamic the number of stocage own are usually load-time or activate-time, advantage dynamically manage temporaries. conventional machines somewhat statically cells. potential performance the greatest results from the ability to is higa level language of expressions by a calls tor set of allocated each procedure to reserved temporary at compile-time, the computational expense of storage allocation. that It are Conseguently, dedicated to use as tenporaries throughout the system far exceeds the minimum number that are actually needed. On a allocate high level temporaries on after their it language machine, demand and is release them use. Since a temporary immediate context of an enclosing subexpressioa, is only storage may be used to small amount of possible to immediately used within the a relatively efficieatly satisfy the temporary storage requirements of a large system. the Because temporaries is instantaneous generally small, a need not This implies contribute to for local store very high speed used to contain with the processor could be that is integrated the temporaries. reguirement storage that references to temporaries data transfer the processor-to-storage bottleneck. III.A.4.a. Increasing the Use of Temporaries Since references to than references wartawhile to to temporary-references doing so is to in operanis attempt to employ a more efficient temporaries may be far to main storage, increase staraye-references. machiae language the ane that is it may ratio method be of for expression oriented and has an abundance of builtin monadic operators. 20 contains a possesses language APL The dyadic operators operands (e.g., the reciprocal and exponential of nontrivial percentage programs suited to exploit the tnat The by the large consist of level machine of their operators). demonstratel of APL is contrast, low expression. In value for one that assume a iefault expression oriented nature are merely monadic operators which large number of it characteristics; these a single languages are ill efficiencies of temporaries, particularly when the values involved are nonscalar. 111.3. Instruction Stream Efficiencies Part B of mi-hLne with having a this sectioa a high level high level discusses four ways interface language can instruction stream. The in which a benefit from presence of operators for explicit execution-sequencing control, the absence of detailed and unnecessary density of program inteclockiag II.3.1. encoding and the freedom are presented instruction-issuing tactical specifications, the higher as factors from much needless contributing to higher rates. Explicit Procedural Control on high performance machines such as the Control Data 7600 aal the IBM System/360 ModeL 195, pipelining and parallelism are used within the E-unit instruction execution rate. power is wasted to achieve a because major improvement However, the much I-unit is of this in the increased unable to decode instcuctions and issue them to the E-unit at a commensurate rate [11, pp. 13-13, 31-34; ThorJ7l, to decode several separate pp. nature of severely effectiveness ThorJ71; M1; M4; 18]. conditional branches The pipeline the instructions of but the being decoded this process [BuchW62; [-unit is continually "surprised" by and otaer discontinuities which reloading of instraction E-unit Attempts are made instructions simultaneously, nominally sequential limits the 124-125]. buffers and cause a streams. As Flynn require a disruption in the [FlynM72, p. 21] has observed: Model 91 had execution rhus the IBM System/360 ... resources in excess of 70 MIPS (million instructions per sec) while this was immediately restricted at a maximum instruction decode rate of 16 MIPS; further with an average incidence of branch and data dependencies this was reduced to 6 4 IPS. Thus the discrepancy betieen available resources of 70 MIPS and average request rate of 6 MIPS. If a machine has program structure these I-unit as described in difficulties surmounted, caa be a hijh level with the By requiring that relieved of interface language section II.A., instruction all then stream procedures be the responsibility write-operations into prefetched instructions. of with a most of can pure, be the supporting instructions branch terms alone -DijkE68; DijKE70; convey information valuable processor to the content of an iteration, the number of times be performed, the extent etc. conditional, to informatia The true and of the micaine organize the 10 can regarding the aa iteration will false clauses on a use this in principle, can, and thereby its resources use of Language Figure 1 on page as those described in coastructs such MilIH70], they implications. performance machine promising havae the language can be justified structured programming aspects ot the in user-oriented While encounterel. be need that number of in the the reduction importance is Of greater optimize its own performance. III.3.2. Deferral of Tactical Decisions A pro3lea allocating machine resources at unique language must instructions be whica mapped reference specific is a complicated task This generally requires an optimizing compiler a Model 35, high level level low registers and to perform, and it well. to perform a System 360 Model 50 may be Even then, what is "good code" for inefficient on of machine storige locations. a fra sequence a inta detail that a level of The statements reluires overspecification. that of based systems is that plagues compiler and vice-versa. In fact, on high performance machines such as the 360/195 these a priori tactical decisions are a severe handicap. As Chen [Chen711a, p. 74] has note,: a piece of proceiural language code retains a wealth of .. statement FORIRAN A information. independence job connected causally of string a describes essentially locally are often statements but adjacent events; executed be can and other, each of indepeniant conventional compiling process Yet the concurrently. machine instructions are resultant obscuces causality. The unrealistic causality imposing tactical prescriptions, and arbitrary facility time) a demanis (one instruction at assignmeats ("register 2", "address 32768"); they becloud process, and human understanding and impede the debuggia that inefficiency are such potential sources of computer statements original machines are known to recoastruct the internally for better traffic flow. problem entirely. interprets programs that it The avoid this interface language may with a high level A machina can be free from purely machine-oriented constraints. Algorithmic Encoiing Density TI.3.3. On the machiae, conventional a that manipulate non-scalar operations imolemental by means level (either iterative of some sequence of low level instructions. recursive) language objects generally must be repetition of the high or Consider the addition of two vectors with the vector sum replacing one of the argument vectors. of the Filure 2 form A=A+B (page 25) the program To ba specific, where A and B are indicates, there loop structura whica The first of these consists consider a PL/I statement vectors of length are four logical underlies suca n. As parts in an operation. of several setup instructions which ace executed only once at the beginning of the vator operation. ENTE R I V I setup Initialize registers for loop I r- ---------- I r-->I Scalar lOperationi V r----------1 lIncrementi I I Index A (i) <- e.g., i <- i+LenIth (A(i)) A(i) +B (i) V I - e.g., Loop until vectors have been processed -Condition rest I EXIT Figure 2: Conventional loop structure for vector operations or ctr to fashion. the operation accomplish a Thus, praportional however, three parts, The remaining certiin number required for to n, is 3 and 4 (page iterated n are an in of the times in element-by-elemenit references, memory purpose of fetching instructions. Figures impl1ementatioas operitions. of the rhese programs 26) vector contain "optimal" addition and are optimal in the System/360 inner-product sense that they occupy the fewest bytes possible and have the shortest execution 25 LI L A ST BXLE LDOP R3,R5,LOOPCTRL R2,A(R3) R2,A(R3) R2,A(R3) R3,R4,LJJP Setup for iteration Fetch A(i) Add 3(i) Replace A(i) with sum Loop until A(n) is processed Initial LOOPCTRL F 'i4' nv nF A 3 value for index R3 Limit value m=n*4-1 Increment Vector of fullword integers Vector of fullword integers Figura 3: System/360 implementation ot vector addition LOOP LI R3,R5,LJOPCTRL Setup for iteration SER ME F2, F2 F4, A (R3) F4, B (R3) AER BXLE R3,R4,L3JP STE F2,INNRROD Clear accumulator Fetch A(i) Multiply by 3(i) Add to sum Loop until A(n) is processed Store sum LE F2, F4 1' LIOPZTRL A B INNRPR)D F14e nE nE E Initial value for index R3 Limit value a=n*4-1 Increment Vector of short float Vector of shart float Result of INNERPRODUCT(A,Bd) FIGMURE 4: System/360 implementation of inner-product time. assume They are somewhat unrealistically convenient addressability efficient in that they to all the Nevertheless, instruction fetching accounts for menmory references in 63% in the case of the vector the case of the inner-product. required data. over 57% of all adlition, and over A with machiae repetitively operators, fetching only a by this overhead Since instructions. atomic as its are builtin and automatically distribute such as ADD, over vector3, language, interface II, will not be burdened describel Ln section of level high a neei be single "instruction" fetched in orier to perform the entire operation. III.B.4. As High Level Interlocking in noted dependencies in the the 111.B.1., Section instruction stream legradation in the instruction execution hign-performance Elaborate machine ,31, such as schemes, the results in of data a major rate of a conventional 31-34; pp. presence Scoreboard FlynM72, on the P. 21]. CDC 6600 [rhorJ71; DennJ70], are required to interlock storage references in order to prevent with conflicts (i.e., storage cell, to insure that no aperation is a given respect to interchanged with a write operation). This problem will language machine but one: to exist continue will be of a much a high level smaller magnitude. For thing, most storage references made by a high level language by the machine rather than machine will be generated internill y by on the programmer. For builtin operator applied machine generation of example, the programmer's to vector operands will use of a result in the the numerous storage references that are required to scehenata may without be used), the burden to prevent conflicts necessary on a larger scale will still be a pipeline interlocking Of course, of interlocking. be these references machine may issue the can algorithm that the extreme case, be conflict free (in designed to Since these the vector.- fixed a generated oy are references elements of process the But the interlocking mechanism amo)nj the liiga level operators. will need to be used much less frequently. III.X oncurent Error Monitoring tea largely component developments in due to of introduction continue to to is expectel years ani the past has iacreased manytold over Hardware reliability tecanology and hardware-error sophisticate This improve. detection is the and correctibn schemes. Unfortinately, systems which to expand systeD in scope, software to sinultaneous users. crash an Yet, crisis, no widely-used and or place It is now commonplace reliability. tie have emerged since the CP-67/CMS) has a experienced not similar the complex operating days of IBSYS, and which Moreover, in reliability. iaprovement cmatinue has software increasing emphasis upon for a simple malfunction in entire system despite the apparent with its and many growing general-purpose system (e.g., OS/360 overcome this problem. It is generally as we know them a-knowledgad that powerful pcogramming systems, today, are never completely debugged. many execution-time errors conventional machine fact, in are, detectable detected be to machine language, low level with a all If systems. contempacacy on prictically detected be cannot errors software of types common is that uncliability of software reason for the A major cn a then same substantial fraction of the machine's instruction execution rate must be expended continally upon error checking [a partial exceotion is the Burroughs 36700 family of machines which have a lw level for block-structured requirements, particularly that is more (e.g. checking distinguishable); see incurred rejardless the technology software conducive to of th e it perform error monitoring. lastead, instructions cimpating which power. of tae consu me Virtua Illy be as the the machine cannot as parallelism) to efficiently (such meaas must cost and hence does not convey any low in laval, emil-y hardware techniques by are data As long is implemented. high level language semant ics to the machine, performed other organizaticn or machine's internal with which machine laaguage is This M5 and 3rgaE71]. than small degree of an1 instructions , laiguages, and reliability that provides only a cantemporary systems, but error certain software does reflect machine language that error monitoring can only be addition some ao of fraction explicit of the general-purpose machine machine's programming 29 the checking because run-time error employ extensive syst2-ms costs involved are unacceptable. consider example, For CBS, BIt accompanying the activities. to certain In particular, it is system as the second perform subscript testing within a This compiler debugs it wites a program, fully facility, and then installs the tests removed. programmiag Howevec, development use such a in is oriented, to particular program only if a based upon scheme is the limits executed operating system compila time [MI1, program-checkout option was specified at 172-173]. to handle. application programs. waich is approach, program infeasible basis for a frequently or for computation-intensive A degradation performance system the of usefalness detect and easy to are therefore sibscripting errors and software by interpreted are statements PL/I all In [M12]. as exemplified by CPS totally interpretive approaca is the First, there are used. approaches that basically tio There are PL/I. a language such as subscripting operations in illegal of detecting problem the the assumption using the pp. that one program checkout debugged program with the error practice, bugs manifest themselves days, a program las been in productive operation. many non-trivial or even years, after In order to get a rough measure of the overhead involved in p3Ecfrmirg type this of anI STRINGRANGE' tests. It was found was accompanied program execution contemporary programs were the compiler generate1 SUBS2RIPTRANGE run both with and without error checking (F) [M11] "off-the-shelf" PL/I machine, saveral a on checking error by a a 6B% time and that this simple to 15% 179% increase increase in to 97% ty .e of in program size. Although the compiler generated tests were not as efficient as hand-coded tests, they were reasonably overhead could be reduced by at good. Perhaps most a factor of two. the However, it siould also be noted that the test case programs did not make heavy use of that make subscripting or extensive use of string manipulations. these facilities Programs would undoubtably incur a higher penalty. On a machine with a high of error letection could 0e level machine language, this type performed concurrently with actual computation that is being monitored. the THIS PAGE INl'ENTIINALLY LEFT BLANK 32 IV. HYP3IHETICAL HIGH LEVEL LAN3UAGE MACHINE This section describes a machine which is sole purpose of directly executing a type described in Section I. omitted. important topics Some Dersistence are not provided are intended the specific values designed for the high level language of the Of necessity, many such as even addressed. object ownership The details to illustrate the nature that are used for meant to be reasonable but generally details are that and are of the machine; design parameters are have not been subjected to system-wide tradeoffs. IV,A. General Machine Orjanization The proposed with a structure functions of machine is as a shared indicated in each I-unit are encoding of a program written in resource multiprocessor Figure to 5 (page 34). step through a linearized a high level machine language, to maintaia the current state of execution for that program, to issue reguests for computation to results. Ihe I-units. It units (FU's) E-unit services consists of a to the E-unit and the computational and await the needs of the collection of specialized functional which are centrally coordinated. There are I-units The a a number common of reasons E-unit. for coupling First, because the multiple the builtin |LogiC-in-Memory) I Cache I I I E-unit ----- r r-----i I-unit I-unit jI-unitj 1 -unit r------i r----,- r r ----- jCache I ICacae I JCache I Icache I , Figure 5: High level language machine structure op3rators are numerous and quita large. Furthermore, which perform the operations) are used complex, an iost of the matrix inversion, E-unit is the FU6S FU's (such as coot square Thus, it is irregularly. necessarily index and unreasonable to dedicate a complete E-unit to each instruction stream. Second, the need to E-unit operations orller to prevent conflicts. If be interlocked multiple E-units were used, would not really be independent, but in they would need to be centrally coordlinated anyway. Third, machine can and lastly, the E-unit for accept requests at a a high level language much higher rite than it can 34 possibly complete them -- this familiar pipelining phenomencn is accentuated by the more substantial operations that are builtin on such a machine. The interface between the I-units and either synchronous or asynchronous. Flynn :FlynM72] and by It appears synchrony or protocol is not sensitive tj may be Attractive approaches have been iavestigated by that the the E-unit Plummer [PiumW72]. asynchrony of tae use of a high the interface level machine language. Underlying whic! provides (16 If every bits per -ell) spaces may names. At which are system of uniguely an orderai set of fixed consecutively of holding implemented in the without needing generation rate the machine running out of the purpose be addressed a space five microseconds, years before one-level storage the space names are 48 bits in length, then up to 2.3.1014 distinct reuse space a Each space consists of cells addressed. is an effectively inexhaustable number named spaces. length the processor could run unique names. machine generated one-level storage Spaces of one space for about 39 created for temporaries are systea, to and do not not contribute to the consumption of space names. Each I-unit has its storage hierarchy. own cache Since all store as an interface tc the programs are read-only, these 35 eich other or with the activity of the E-unit. capaoilities logic-in-memory possesses either with aad ace not interlocked, caches are unidirectional and The E-unit cache is organized in sectors [as described in StonH70 and ThurK70]. Objegts IV.B. descriptor and an associated which contains a in Figure 5 (page 37), and access structure system consists of stored within the Each object a space As shown value. the descriptor specifies the object type, For constraints. aggregate objects, it also specifLes the object caak and dimensions. are There representation an of an executel. execution is a program in are used to phase This section ia which the in phise used which a an process character a PROGRAI object string PROGRAM and an ACTIVATION object, ACTIVATION iiscusses several key aspects different phases of program interpretation. of hypothetical to generate generate an which tae on this language program activation phase ENVIRONMENT object in phases temporal a translation machine: 3oject, three a machine interpreting ani Int ergetation Asgacts of Prog2a IV.C. object is of these Bit: 0123456789ABCDEF r----------------i Cell 0 jAAAABBBCDDDDEEEEJ Cell 1 IFFFFFFFFFFFFFFFFI Cll 2 1IGGGG Cell 3 1GG GGGGGGGGGGGGGJ GGGGGG) Cell n IVVVVVVVVVVVVVVVV4 1----------------- AAAA - Object ?ype: indefined, Logical, Integer, Real, Complex, Character, Label, System or Mixed BBB - Object Structure: Undefined, Scalar, Vector or Array C - Value Pcesent Flag DDD) - Object 0000 0001 0010 0100 1000 1100 EEE - Object Descriptor Extension FF...F - Optional rank field GG..G - Optional dimeasion fields Access Constraints: - uncoastrained - walue constrained - type constrained - cank constrained - limension constrained - struzture constrained vectors and arrays, arrays) VV...V - (present for arrays) (present for field repeats for Value eacaded in as many cells as required Figure 6: InternaL repcasentation of an object 37 Program Representation IV.C.1. A PRO3RAM 3bject,, which the TRANSLATE operator to cceatei by applying evaluates to the program that invoka rhis allows exc2ptions. appropriate operator signals TRANSLATE then the translatimn, denotes a program during in the source errors are detected If program. in operand which whose value string object a character SYSTEM, is of type is an object TRANSLATE to decile whether to continue or to abort the operation. Figure 7 As of c-iaracter string rhe second, dhcived. whici contains a TEXT object, an encoded is a the object was an object of type SYSTEM is linearization of as a linkage vector for binding nonlocal symbols. object, is a SYSnBM object tree. the program The third, a LINKAGE object, is also a SYSTEM object. SYIB)L is of copy PROGRAM the from which object PR3GRAM object a The first elements. five comprised illustrates, (page 39) which servas It serves The fourth, a as a symbol table, containing such information as the symbolic names for all tokens boundary the TEXT ii object. address (3DY) the And which is to distinguish used is fifth component a between local and nonlocal symbol references. An object of singular importance soecifies the actual algorithm to be eKecuting a given program. is the TEXT object, which performed in the course of It consists of an ordered set of 38 Bit: 3123456789ABCDEF r Cell 0 Calls - -- - -- - - - 1O11131011111000I ptr. 1-3 to SOURCE Cells 4-6 ptr. to TEXT Calls 7-9 ptr. to LINKAGEJ --------------------------- 4 ptr. to SY&BOL Cells 10-12 r------------------- elemants tait form shown in or TEXT tokens of the pointers trand are either op Figure 8 (page 43). Figure 9 (page 41) contains an example of a TEXT object (note that this object has undergone symbil resolution; i.e., it Program Activations IV.C.2. The builtin ATIVATIDN copy of the a opecator ACTIVATE object a given ENVIRONMENr objects. in is part of an ACTIVArION object). way. Its 35 bits of BDY, local symbols in this and to create one or an more It accamplishes this operation by making a "activation area" of storage, hih orler used PR3GRAM object PROGRAM object and privileged is then manipulating functions storing include the new object allocating an the area aidress into the creating the re4uired "activation area", instances of binding operands to pcgram symbols, and resolviig nonlocal symbols by searching the Bit: 0123456789ABCDEF IABBCCCC2CCCCCCCI A - Builtin/Defined Flag This flag is set to 1 by the symbol resolution mechanism (shen creating an ACTIVATION) it this symbol resolves onto a builtin operator. Hence, during execution, builtin operators are immediately self-identifying. BB - Operand Designator 00 - no operands (symbol is niladic) 01 - one operand (symbol is monadic) 10 - two 3perands (symbol is dyadic) and first operand has no operands 11 - abitcary number of operands rhis field indicates the number of operands that are actually being passed to the symbol. (It does not indicate the number of operands that the symbol will accept; symbols may caoose to accept varying numbers of operands) Its purpose is to reduce the number of operand poiaters that are required. CC...2 - Token Offset If this offset is greater than tie low order 13 bits of BDY (in the ACTIVATI3N object), then this offset points to an entry in the nonlocal symbol LINKAGE object; else this offset, waea appended to the higi order 35 bits of BDY, coastitutes the spice name of an object that is local to this ACTIVATION. A program may reference up to 8192 distinct objects. Figuce 8: Internal Representation of a TEXT token 40 Source program: (ASSI3N X (SUM Y (F-N1 (MAX X Y Z) (FCN2 Y)))) Corresponding TEXT object: Bit: 312 3456769ABCDEF Contaias )11J13131111103311 object descriptor 11101 builtin dyadic opcode 'ASSIGN' I i----------------- I 13001 I-----------11101 niladic reference X 'SUA' 3 I-----------------1 13001 builtin dyadic opcode niladic reference Yi a-adic reference ----------------- 1 2 operand pointer i operand pointer 1111I 'MAX' I operand pointer 3 3 builtin n-adic opcode 1 operand pointer operand pointer 3 niladic reference 13001 niladic reference I ------- 1000i 1 niladic reference I --------- 10011 FCN2 monadic reference 1001 Y niladic reference Figure 9: Example of a TEXT object ENVIRONMENr objects in the sequence provided (each ENVIRONMENT object may also specify a successor ENVIRONMENT object) IV.C.3. Program Execution An ACTIVATION builtin EXECUTE operator. number of the application executed by may be of the The execution of a program involves a activities in the I-unit and E-unit portions of the ma chine. The I-unit is envisionel which linkage operate under fetch unit fetch unit has central and an control: a instruction its own cache from sequential manner) object component as consisting of three major parts the tokans that are of the program. program in a top unit reals the fetch unit, assembler. The which it reads (in contained in It is equiped stacks so that it may conveniently through the token a token a highly the TEXT with hardware recurse when walking its way down fashion. contents of the LINKAGE object The linkage fetch component of the program in order to obtain tie space name of an object to which a nonlocal The instruction buills symbol has been bound. logical instructions for the opcole and a list of the space issues the logical instructions reply, which is or an exception. E-unit assembler by collecting names for its operands. to the either the space name of E-unit and an It then awaits the the resultant object, The E-unit does all interlocks upon the the actual operaad space names. the value component of an Agjregate characteristics of logic-in-memory cache. a machine that fetching of the vArious The object is functional operands and actual layout of determined by the units and the It is crucial to the performance of such its objects Da internally orgaaized in order to maximize spacial locality since the one-level store will used so extensively. THIS PAGE INI'EN'IONALLY LEFT BLANK V. 2ONCLUSIONS This paper has wtich a machine langiuage investigated a variety that directly interprets a might expect to achieve of mechanisms by suitable high level improved pecformance. One result of this effort is a catalog of such mechanisms, which may b: of some use in the design of high performance computers. Another sijnifican:e result is an iacreased (in terms of performance) macrhine languige. It is understanding of the of adopting a high level now the author's view that the use of a low level machine language, as an intermediate interface between the high effects level language upon the and the machine, has potential execution rate of two inherent the high level language. First, relevant the low level interface semantic information program to the machine. iatent of the high level restricts which flows the amount froi the apecations. acceptable to While, with yesterday's it was sequ3nce of context-free atomic orders, advances a high executing The computer is deprivei of most of the technology, now permit of decompose a performance Maciine program into a in technology to profitably employ a knowledge Df the macroscopic operators and operands. 45 Second, artificial conustraints camputatioa because impacts detaila: conastraints have a low level tactical long been are imposei language, by its prescriptions. an abstacle to upon the the very nature, These the design unwanted of high performance machines and will become even less acceptable as the functional capabilities of hardware increases. REFERENCES AND BIBLIOGRAPHY Abbreviatians Jou rnaal s: ACMI AF'IPS BCS E JCC FJCC IBM J. 3f Res. and Dev. IBM Sys. J. IEFEE I FIP NAE 0 N SIGPLAN WJCC Association for Computing Machinery American Federation of Information Processing Societies British Computer Society Eastern Joint Computer Conference Fall Joint Computer Conference I3 Journal of Research and DeveloFment 11 Systems Journal institute of Electrical and Electronics Eagineers International Federation for Information Processing National Aerospace Electronics Conference ACM Speial Interest Group on Prograiming Languages Spring Joint Computer Canference Western Joint Computer Conference General: Bull. Comm. Conf. Cong. J. Pro-. Pt. Pub. Res. Rept. Supp. Symp. Tech. Rept. Trans. bulletin Comm unications Conference Congress Journal Proceedings Part Publication Research Report Supplement Symposium rcanical Report rcAnsactions 47 Books, Periodicals and Reports AiraP73 Abrams, P. S., An APL Machine, Tech. Rept. No. 114, Stanford Electronics Laboratories, Stanford University, February 197) Alle?71 Allen, F. E., AlleL169 Allan, M. W., and T. Pearcy, Developments in Machine Arciitecture, Proc. of the Fourth Australian Computer ConE., Adelaide, South Australia, pp. 227-230, 1969 Anila364 Amdahl, 3. M., et al., The Structure of SYSTEM/360, Part III - Processing Jnit Design Considerations, IBM Sys. J., Vol. 3, No. 2, pp. 144-164, 1964 AndeD67 Anderson, D. W., F. J. Sparacio and R. M. Tomasulo, The IBI System/36) Model 91: Machine Philosophy and Instruction-Handling, IBM J. of Res. and Dev., Vol. 11, No. 1, pp. 8-24, January 1967 Brt R69 Barton, R. 3rganization: International and John Cocke, A Catalog of Optimizing Transformations, Res. Rept. RC 3548, IBM Thomas J. Watson Research Centec, Yorktown Heights, New York, September 1971 Science -- S., Ideas for Computer Systems A Personal Survey, COINS-69, Third Symp. on Computer and Information Software Engineering, December 1969 BA;hT67 3ashkow, T. R., at al., System Design of a FOETRAN Machine, IEEE Trans. on Electronic Computers, Vol. EC-16, No. 4, pp. 485-499, August 1967 BashT68 Bashkow, T. R., et al., Study of a Computer for Direct Processiag of List Processing Language, Tech. Rept. No. 103, Columbia Jniversity, New York, January 1968 Bell71 3ell, ., 1. an1 A. Newell, Computer Structures: Readings and Examples, McGraw-Hill Book Co., New York, 1971 BerkK69 Berkling, K., A Computing Machine Based on Tree Structures and the Lambda Calculus, Res. Rept. RC 2589, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, August 1969 BjorD70 3jorner, D., On Higher Level Language Machines, Res. Rept. RJ 792, IBM Research Laboratory, San Jose, California, December 1970 48 BlacE59 Bloch, E., The Engineering Design of the Computer, Proc. of the EJCC, pp. 48-59, 1959 BrawJ71 Brown, J. A., A generalization of APL, Systems and Information Science, Syracuse University, September 1971 Bu:h462 3ucaholz, W., Plaaning a Computer System, Book Co., New York, 1952 ChamD71 Chamberlin, D. D., The "Single-Assignment" Approach to Parallel Processing, Res. Rept. RC 3308, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, March 1971 Chenr71a Chen, T. C., Parallelism, Pipelining, Efficiency, Computer Design, Vol. 1), 53-74, January 1971 Chenr7lb Chen, r. C., Unconventional Superspeei Computer Systems, Proc. of thae SJCC, Vol. 38, AFIPS Press, New York, pp. 365-371, 1971 Ches;71 Chesley, G. D., and W. R. Smith, The HardwareImplemented High-Level Machine Language for SYMBOL, Proc. of the SJCC, Vol. 38, AFIPS Press, New York, pp. 563-573, 1971 Chro 71 Chroust, G., Comparative Study of Implementation of Expressions, Tech. Rept. TR 25.112, IBM Laboratory Vienna, Austria, March 1971 CoatC69 Conti, C. J., Concepts for Buffer Storage, Tech. Rept. rR 30.1352, IB& Poaghkeepsie Laboratory, February 1969 CurtR71 Curtis, R. L., Management of High Speed Memory in the STAR-110 Computer, Proc. of the IEEE International Computer Society Cont., pp. 131-132, September 1971 Davi?71 Davis, R. L., and S. Zucker, Structure of a Multiprocessor Using Aicroprogrammable Building Blocks, NAECON Record, pp. 186-200, May 1971 D? nnj69 Dennis, J. B., Programming Generality, Parailelism and Computer Architecture, Proc. of the IFIP Cong. 1968, North-Holland, Amsterdam, pp. 484-492, 1969 Dnn.J70 Dennis, J. B., Modular, Asynchronous Control Structures for a High Paerformance Processor, Record of the Project MAC Conf. on Concurrent Systems and Parallel Computation, ACI, New York, pp. 55-80, June 1970 Stretch McGraw-Hill and Computer No. 1, pp. 49 DanJ71 Dennis, J. B., 3n the Design and Specification of a Common Base Language, Symp. on Computers and Automata, Polytechnic Institute of Brooklyn, April 1971 DijkE63 )ijkstra, E. W., 3o To Statement Considered Harmful, letter to the Editor, Comm. of the ACM, Vol. 11, No. 3, pp. 147-148, March 1968 DijkE70 )ijkstca, E. W., Structured Engineering Techaigues, NATD, Brussels 39, Programming, Scientific Aftairs Belgium, pp. 84-88, Software Division, April 1970 Elso 69 Elson, M., R. A. Larer, et al., A Prototype PL/I 3ptimizing Compilar, IBM Tech. Rept. rR 44.0071, IBM Systems Development Division, Boulier, Colorado, November 1969 Elso47 Elson, M., and S. T. Rake, Code-Generation Technique for Large-Language Compilers, IBM Sys. J., Vol. 9, No. 3, pp. 155-188, 1370 Flei17l Fleisher, H., A. deinberger and V. D. Winkler, The iriteable Personalized Chip, Computer Design, Vol. 9, No. 6, pp. 59-65, June 1970 Flin3i73 Plinders, M., et al., Functional Memory as a General Purpose Systems rechnology, Proc. of the IEEE International Computer 3roup Conf., pp. 314-324, June 1970 Flyn165 Flynn, K. J., aal 3. M. Amdahl, Engineering Aspects of Large High Speed Computer Design, Symp. on Microelectronics and Large Systems, Spartan Books, pp. 77-95, 1965 FlynI66 Flynn, M. J., Vecy High Speed Computing Systems, Proc. of the IEEE, Vol. 54, No. 12, pp. 1901-1939, December 1966 Flvn172 Flyan, M. J., and A. Podvin, Shared Resource processing, Computer (A Pub, of the IEEE Society), pp. 2)-23, March/April 1972 Multi- Computer Fost271 Faster, C. C., Jncoapling Central Processor and Storage Device Speeds, The Computer Journal, A Pub. of the BCS, Vol. 14, No. 1, February 1971 Gird?71 jariner, P. L., Functional Memory and Its Microprogramming Implications, IEEE Trans. on Computers, Vol. C-23, No. 7, pp. 764-775, July 1971 50 GictJ71 ertz, J. L., Hierarchical Associative Memories for Parallel Computation, Tech. Rept. MAC TR-69, Project 4AC, Massachusetts Institute of Technology, Cambridge, June 1970 HAssA71 Hassitt, A., Microprogramming ani High Level Languages, Proc. of tie IEEE International ComFuter 3ociety Conf., pp. 91-92, September 1971 Han169 Henle, R. A., et al., Structured Logic, Proc. of the FJC2, Vol. 35, AFIPS Press, iew York, pp. 61-67, 1969 IHbbL71 Hobbs, L. C. (Editor), et al., Parallel Processor Systems, Technologies, and Applications, Spartan Books, New York, 1970 Huss373 Husson, S. S., Microprogramming: tices, Prentice-Hall, 1970 IlifJ63 Iliffe, J. K., Basic Machine Principles, Elsevier Publishing Co.(in Europe: MacDonald, New York, 1968 JohnJ71 Johnston, J. B., Tae Contour Model of Block Structured Processes, SIGPLAN Notices, Vol. 6, No. 2, pp. 55-82, February 1971 JoseE69 Joseph, E. Proc. of Amsterdam, C., the pp. Principles and PracAmerican London), computers: Trends Toward the Future, IFIP Cong. 1968, North-Holland, 665-677, 1969 LawsJ63 Lawson, H. W., Jr., Programming-Language-oriented Instruction Streams, IEEE Trans. on Zomputers, Vol. %-17, No. 5, May 1968 MainS71 Madnick, S. E., An Analysis Project MAC, Massachusetts December 1971 Mc:rD71 IcCracken, D., and 3. of the Page Size Anomaly, Institute of Technology, Robertson, C.ai(P.L*) -- An L* Processor for C.ai, Rept. No. CMU-CS-71-106, Dept. of omputer Science, arnegie-Mellon University, October 1971 McFa7) IcFarland, C., A Language-Oriented -omputer Design, Proc. of the FJ2C, Vol. 37, AFIPS Press, New York, pp. 629-640, 1970 McKei67 jqJ64 IcK;eeman, W. M., Proc. if the FJCC, 413-417, 1967 Language Directed Computer Design, Vol. 31, AFIPS Press, New York, pp. Meggitt, J. E., A Character Computer for High-Level Language Interpretation, IBM Sys. J., Vol. 3, No. 1, pp. 68-78, 1964 Ml69 M1elliar-Smith, P. M., A Design for a Fast Computer for Scientific Calculations, Proc. of the FJCC, Vol. 35, AFIPS Press, New York, pp. 201-208, 1959 MiLlI73 Mills, H. D., Structured Programming, Unpublished Paper, IBM Corporation, Federal Systems Division, Gaithersburg, Maryland, October 1970 Mos e7) 1 oses, J. , The Functioi of FUNCTION in LISP or Why the FuNARG Problem Shioall be Called the Environment Problem, Artificial Intelligence Memo No. 199, Project 4AC, Massachusetts Institute of Technology, June 1970 MullA63 Mullery, A. P., R. F. Schauer and R. Rice, ADAMi: A Problem Driented Symbol Processor, Proc. of the SJCC, Vol. 23, AFIPS Press, New York, pp. 367-380, 1963 OcjaB71 )rganick, E. I., and J. G. Cleary, A Data Structure Model of the 357)) Computer System, SIGPLAN Notices, Vol. 6, No. 2, pp. 83-145, February 1971 Plim72 Plummer, W. W., :omputers, Vol. RalsA65 Ralston, A., A First Course in Numerical Analysis, McGraw-Hill Book Company, New York, 1955 Rala:69 Ramamoorthy, C. V., and M. J. Gonzalez, A Survey of rechniques for Recognizing Parallel Processable Streams in Computer Programs, Proc. of the FJCC, Vol. 35, AFIPS Press, New York, pp. 1-15, 1969 RanaC71 lamamoorthy, C. V., and M. J. Gonzalez, Subexpression 3rdering in the Execution of Arithmetic Expressions, Comm. of the ACM, pp. 479-485, July 1971 RiceR71 Rice, R., and W. d. Smith, SYMBOL - A lajor Departure from Classic Software Dominated von Neumann Computing Systems, Proc. of the 3JCC, Vol. 38, AFIPS Press, New fork, pp. 575-587, 1971 Asynchronous Arbiters, IEEE Trans. on C-21, No. 1, pp. 37-42, January 1972 52 RPse363 Rosen, S., Hardware Design Retlecting Software Requirements, Proc. of the FJCC, Vol. 33, Pt. 2, AFIPS Press, New YorK, pp. 1443-1449, 1968 R>3s264 Ross, 2., et al., A New Approach to Computer Ccmmand Structures, Tech. Rept. No. RADC-TDR-64-135, Rome Air Development Center, Griffis Air Force Base, May 1964 RaggJ69 Ruggiero, J. F., and D. A. Coryell, An Auxiliary Processing System for Array Calculations, IBM Sys. J., Vol. 8, No. 2, pp. 118-135, 1969 SammJ69 Sammet, J. E., Fundamentals, 717-719, 1969 Schr472 and J. H. Saltzar, A Hardware Schroeder, M. D., Architecture for Implementing Protection Rings, Comm. of the A:M, Vol. 15, No. 3, pp. 157-170, March 1972 SenzD65 Senzig, D. N., and R. V. Smith, Computer Organization for Array Processing, Proc. of the FJ2O, Vol. 27, Pt. 1, AFIPS Press, New York, pp. 117-128, 1965 SenzD67 on High Performance Senzig, D. N., 3bservations lachines, Proc. of tae FJCC, Vol. 31, AFIPS Press, New York, pp. 791-799, 1967 SethR70 Sethi,R., and J. D. Ullman, The Generation of Optimal Expressions, J. of the ACM, Vol. -ode for Arithmeti 17, No. 4, pp. 715-728, October 1970 ShAwJ53 Structure for Complex et al., A Command Shaw, J. C., Information Processing, Proc. of the WJCC, pp. 119-128, 1958 SinqS71 Adder for Multiple Operands Singh, S., and R. Waxman, IBM Tech. and its Application for Multiplication, Components Division, East IBM rR 22.1356, R3apt. Fishkill, New York, 3ctober 1971 Stinl7l Stone, H. S., on Computers, SamnF71 Sumner, F. H., Operand Accessing in tae MU5 Computer, Computer Society the IEEE International of Proc. 1971 September pp. 119-120, .onf., Languages: History Programming Section X.6., Prentice-Hall, and pp. A Logic-in-Memory Computer, IEEE Trans. pp. 73-78, January 1970 53 T 3rj7) Thornton, J. E., Design of Data 6609, Scott, Foresman Illinois, 1970 a Computer: and The Control Company, Glenview, TharK71 Thurber, K. J., and J. W. Myrna, System Design of a 2ellular APL Computer, IEEE Trans. on Computers, Vol. 2-19, No. 4, pp. 241-3)3, April 1970 TiurK71 Ihurber, K. J., and R. 0. Berg, Applications of Associative Processors, Computer Design, Vol. 10, No. 11, up. 103-110, November 1971 Tjad71 Tjaden, 3. S., and M. J. Flynn, Detection and Parallel Execution of Independent Instructions, IEEE Trans. on -omputers, Vol. 2-19, No. 10, pp. 884-895, October 1970 TimaR67 romasulo, R. M., An Efficient Algarithm for the Automatic Exploitation of :ultiple Execution Units, IBM J. of Res. ani Dev., Vol. 11, No. 1, pp. 25-33, January 1967 raikA71 rucker, A. B., and M. J. Flynn, Dynamic Microprogramming: Processor Organization and Programaing, Comm. of the ACM, Vol. 14, No. 4, ACM, New York, pp. 243-253, April 1971 Ware472 dare, W. H., The Jltimate Computer, IEEE Spectrum, pp. 34-91, March 1972 7 Weber, it., A Microprogrammed Implementation of EULER an IBM System/363 Model 30, Comm. of the ACM, Vol. 10, No. 9, pp. 549-553, September 1967 W.-oebe6 ZitsR71 Zaks, R., Microprogrammed APL, Proc. International Computec Society Conf., September 1971 of pp. the IEEE 193-194, Minu9is ani Miscellaneous F1] IBM System/350 Model 195, Functional Characteristics, Form A22-6943-0, International Business Machines Corporation, Pougheepsie, F2] IBM New Yock, System/360 Model August 195, 1969 Theory of Operation: System Introduction and Instruction Processor, Form SY22-6855-0, International Business Machines Corporation, Poughkeepsie, New fork, August 1970 [13] NCR 304 Electronic Data Processing Systam, Programming Manual, National Cash-Register Company, June 1960 (obtained through the courtesy of Jean Sammet) [ ] ,5] Control Data 7600 Coumputer System, Preliminary Reference Manual, Pub. No. 60258230, Control Data Corporation, Minneapolis, Minnesota, 1969 Burroughs 36500/B7500 Information Processing Characteristics Manual, Burroughs Corporation, Michigan, Systems, Detroit, 1967 [3'5] IBM System/370 Model 165, Functional Characteristics, Form GA22-6935-3, International Business Machines Corporation, White Plains, New York, June 1970 [17] A Guide to the IBM System/370 Model 165, Form GC20-1730-0, International Business Macines Corporation, White Plains, New York, June 1970 [13] IBM System/370 Model 155, Theory of Operation (Volume 2): I-unit, Form SY22-6831-0, International Business Machines Corporation, White Plains, New York, January 1971 F19) APL/360 User's Manual, Form Gd20-0683-1, International Business Machines Corporation, White Plains, New York, March 1970 ['10] PL/I Language Specifications, Form Y33-6003-1, International Business Machines Corporation, White Plains, New York, April 1969 [%111] IBM System/360 Operating System, PL/I (F) Language Refarence Manual, Form C2d-8201-2, International Business Machiaes Corporation, 4hite Plains, New York, October 1969 [412] Conversational Programming Manual, Fora GH20-0758-0, System (CPS), Ierminal User's International Business Machines Corporation, White Plains, New York, January 1970 55 [113] Weissman, Z., LISP 1.5 Primer, Dickenson Belmont, California, 1967 Publishing Co., [114] McCacthy, J., et al., LISP 1.5 Programmer's Manual, MIT Press, Cambridge, Massachusetts, February 1965 [115] Griswold, R. E., et al., Language, Prentice-Hall, 1968 The SNOBOL 4 Programming [116] Sussman, G. J., and I. Winograd, Micro-Planner Reference Manuil, Artificial Intelligence Memo No. 2M3, Project MAC, Massachusetts Institute of Technology, July 1970 [117] Evans, A., Jr., PAL -Pedagogic Algorithmic Language, Reference Manual and Primer, Dept. of Electrical Engineecing, Massachusetts Institute of Technology, October 1970 [118] Reynolds, J. C., GEDANKEN - A Simple Typeless Language Based on the Principle of Completeness and the Reference Concept, Comm. of the ACM, Vol. 13, No. 5, ACM, New York, pp. 338-319, May 1973 - FINIS -