FORTRAN EXTENSIONS FOR PIPELINED AND SIMD SYSTEMS

CONTROL VECTORS

Illiac IV was a SIMD system designed at the University of Illinois, USA, in 1971. The first Fortran dialect for it, Burroughs Illiac IV Fortran, performed address masking with control vectors used as array indices. The elements of a control vector take the values .true. or .false.; .true. indicates that the operation must be executed for the corresponding element. An asterisk (*) denotes a control vector of any length whose elements are all .true.

Consider an example:

    real a(100), b(100), c(100)
    logical m(100)
    do 10 i=1,100,2              - i varies from 1 to 100 with step 2
    m(i)=.true.                  - odd elements
    m(i+1)=.false.               - even elements
    10 continue
    a(*)=b(*)+a(*)
    c(m(*))=b(m(*))+a(m(*))      - * as index

DO CYCLES

The next compiler, the IVTRAN Compiler for Illiac (1973), took a different approach to expressing parallel calculations: a do for all cycle indicated that its body must be compiled as a vector instruction. Unlike an ordinary Fortran do cycle, it has no single cycle variable; instead it ranges over a set of index tuples. A range is given for each component of the tuple, and the full set of tuples is the Cartesian product of these ranges. Such cycles cannot be nested.

Example:

    real a(10,20), b(10,20)
    do 10 for all (i,j) [1..10].c.[1..20]     - for each pair i,j; .c. is the Cartesian product
    a(i,j)=b(i,j)+a(i,j)
    10 continue

This fragment defines 200 parallel calculations, one for each pair i,j (i runs from 1 to 10 and j from 1 to 20). Each calculation adds the corresponding elements of arrays a and b and stores the result in array a.

TRIPLETS

Early pipelined (conveyor) machines: the Advanced Scientific Computer (ASC) from Texas Instruments, 1972, and the STAR-100 from Control Data Corporation, 1973. The TI-ASC NX Fortran Compiler (for the ASC machine) was one of the first compilers for such processors. A triplet is a triple of numeric expressions separated by colons (beginning, end, step), i.e. B:E:S.
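The index set selected by a triplet can be sketched in a few lines. The snippet below is an illustration in Python, not NX Fortran; the helper name triplet is hypothetical, and it assumes a positive step and an inclusive end value, as the (beginning, end, step) reading suggests.

```python
def triplet(b, e, s=1):
    """Return the 1-based indices selected by the triplet b:e:s.

    Assumption: the end value e is inclusive and the step s is positive.
    """
    return list(range(b, e + 1, s))


print(triplet(1, 3, 2))  # [1, 3]
print(triplet(3, 4))     # [3, 4]  (step omitted, defaults to 1)
print(triplet(3, 8, 4))  # [3, 7]  (the next index, 11, is past the end value)
```

Under these assumptions the end value bounds the selection but need not be selected itself.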
Examples:

1) Whole arrays and one-dimensional sections:

    integer a(4), b(4), c(8)
    data b/2,4,6,8/
    data c/1,2,3,4,5,6,7,8/
    a=b                           - operation on the whole array
    a(1:2)=b(3:4)                 - only the first two elements of the triplet are given: a duplet
    a(1:3:2)=b(2:4:2)*c(3:8:4)

The last statement is equivalent to a(1)=b(2)*c(3), a(3)=b(4)*c(7): the triplet 3:8:4 selects elements 3 and 7, because the next index, 11, exceeds the end value 8.

2) Multidimensional sections:

    integer a(4,4,4), b(4,8), c(4,6,4,4)
    a(1:3:2,2:4:2,1)=b(1:2,2:4:2)*c(3:4,5,2:4:2,1)

This is equivalent to:

    a(1,2,1)=b(1,2)*c(3,5,2,1)
    a(3,2,1)=b(2,2)*c(4,5,2,1)
    a(1,4,1)=b(1,4)*c(3,5,4,1)
    a(3,4,1)=b(2,4)*c(4,5,4,1)

i.e. sections of size 2*2 are multiplied here.

WHERE STATEMENTS

Triplets define address masks but cannot express every vector operation; in particular, they cannot specify a conditional mask. The where statement serves this purpose. Its general form is:

    where <array_logical_expression>
    <array_assignment>
    [otherwise
    <array_assignment>]
    end where

Here <array_assignment> stands for assignment statements on arrays or array sections, <array_logical_expression> is a logical expression whose operands are array elements, and square brackets mark the optional part of the statement.

Example of using where (the negative elements of array a are set to zero):

    integer a(100)
    where (a(1:100).LT.0)         - .LT. stands for "less than"
    a(1:100)=0
    end where

IDENTIFY STATEMENTS

Given an array a(10,10), triplets and where statements cannot express an assignment to the diagonal elements, although the distance in memory between consecutive diagonal elements is constant (11 elements). The identify statement binds a subset of the array elements to a new name so that vector operations can be applied to them.
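The constant distance of 11 elements follows from Fortran's column-major storage. The Python sketch below checks the arithmetic for a 10 by 10 array; the helper offset and the flat list standing in for memory are illustrative assumptions, not part of any Fortran dialect discussed here.

```python
N = 10  # array is a(10,10)


def offset(i, j, n=N):
    """Flat offset of a(i, j) in column-major storage (1-based indices)."""
    return (i - 1) + (j - 1) * n


# Offsets of the diagonal elements a(1,1), a(2,2), ..., a(10,10).
diag_offsets = [offset(i, i) for i in range(1, N + 1)]
strides = {later - earlier for earlier, later in zip(diag_offsets, diag_offsets[1:])}

print(diag_offsets)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
print(strides)       # {11}

# With a constant stride, the whole diagonal is one strided store on flat memory.
flat = [0.0] * (N * N)
flat[0::N + 1] = [1.0] * N
```

Moving one step along the diagonal advances one row (1 element) and one column (10 elements), which gives the stride of 11.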
[Figure: storage order of a(10,10) in memory: 1,1 ... 10,1, 1,2, 2,2 ... 10,2 ...]

To access the diagonal elements, the following code may be used:

    real a(10,10)
    identify (diag(i)=a(i,i),i=1,10)
    diag(:)=1

The same effect can be obtained with the forall statement, which resembles identify but, instead of binding elements to a name, executes the operation directly:

    real a(10,10)
    forall (i=1,10) a(i,i)=1

The forall statement can also take a conditional expression that defines a conditional mask. For instance, the following fragment sets the negative diagonal elements to zero:

    integer a(10,10)
    forall (i=1,10,a(i,i).LT.0) a(i,i)=0

EXTENSIONS OF FORTRAN FOR MIMD-SYSTEMS

MIMD systems differ greatly from one another, so Fortran for such systems was not standardized; each system's dialect has specific features determined by its architecture. Two basic approaches are nevertheless used:

- parallelizing on the micro level (micro tasking);
- parallelizing on the macro level (macro tasking).

Macro tasking partitions the work into large parts (programs of separate users, separate procedures of one program), i.e. it is large-block parallelizing. As a rule, macro tasking is implemented with the fork/join operators. The fork operator creates a new task for the operating system to execute. Creating a new task is usually an expensive operation, so the useful work carried out by the task must be large enough to offset the cost of its creation.

Macro tasking is used not only in multiprocessor systems but also in uniprocessor ones (for instance HEP, designed at the end of the 1970s). A single HEP processor module executes multiple instruction streams in parallel by time sharing. Several pipelines and a large set of registers let HEP switch quickly between processes without saving process state to memory; at least 8 processes are required to load a processor fully.

The main drawback of macro tasking is the time spent on creating processes and switching between them.
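The fork/join pattern described above can be sketched with Python threads. This only illustrates the structure (fork a task, work in the parent, join); the function partial_sum and the split of the data into two halves are assumptions made for the example, and Python threads do not model the cost or the speedup of real macro tasking.

```python
import threading


def partial_sum(data, lo, hi, out, slot):
    """Task body: sum data[lo:hi] and store the result in out[slot]."""
    out[slot] = sum(data[lo:hi])


data = list(range(1, 1001))
results = [0, 0]

# fork: create a new task that processes the first half of the data
child = threading.Thread(target=partial_sum, args=(data, 0, 500, results, 0))
child.start()

# the parent processes the second half itself, in parallel with the child
partial_sum(data, 500, 1000, results, 1)

# join: wait for the forked task before combining the results
child.join()
total = results[0] + results[1]
print(total)  # 500500
```

Each task receives a block of 500 elements, so the fixed cost of creating the task is spread over a sizeable amount of work.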
FORTRAN HEP

HEP Fortran provides the statements create and resume for starting a new process that runs in parallel with its parent process. The create statement starts a process, parallel to the main program, from the specified subroutine. The resume statement lets a called subroutine reactivate its parent process, after which the two run in parallel. Each of the two HEP Fortran fragments below results in subroutine subr being executed in parallel with the main program:

1)
    create subr(x1,y1,z1)         // main program
    …
    end
    subroutine subr(x,y,z)
    …
    return
    end
    …

2)
    call subr(x1,y1,z1)           // main program
    …
    end
    subroutine subr(x,y,z)
    resume
    …
    return
    end

FORTRAN CRAY X-MP

Many modern supercomputers contain a comparatively small number of large, powerful vector processors. The CRAY X-MP is one such system; it offers Fortran users a library of subroutines that support macro tasking. The programmer must divide the program into tasks to be executed in parallel; no automatic parallelizing is done. The library subroutines create new tasks at the level of subroutines. For instance, the following fragment creates two tasks that run the same subroutine with different parameter values:

    external subr
    integer task1(3), data(1000)
    call tskstart (task1,subr,data,1,500)
    call subr(data,501,1000)

The first call creates a new task that executes subr on elements 1 to 500 of data. The second call runs subr as an ordinary subroutine that processes elements 501 to 1000 of data.

Synchronization methods fall into two important classes:

- synchronization based on mutual exclusion (critical sections), where processes are required not to meet;
- synchronization based on waiting for events, where processes are required to meet.

CRITICAL SECTIONS AND EVENTS

A critical section is a region of code that at most one process may execute at any given moment.
To clarify this, consider the following situation: several processes execute the same procedure, and that procedure contains a region of code in which a process works with a so-called critical resource, to which simultaneous access by several processes is forbidden.

The event mechanism delays some processes until other processes have completed certain actions: the waiting process declares that it is waiting for a particular event, and the process being waited for posts that event once the necessary actions are complete.

HEP Fortran synchronizes by means of so-called asynchronous variables, whose names begin with the $ sign. Each such variable has an associated synchronization bit with two states: set (1) and cleared (0). An asynchronous variable can be read only when its synchronization bit is set to "1"; the read immediately clears the bit to "0". It can be written only when the bit is cleared; the write immediately sets the bit to "1". A process that wants to read or write an asynchronous variable waits until the synchronization bit is in the required state. The section of code between a read and a write of an asynchronous variable is a critical section (i.e. framing the code in this way solves the mutual exclusion task). These variables can also be used for synchronization of the "waiting for an event" type.

SOLUTIONS OF “PRODUCERS-CONSUMERS” AND “MUTUAL EXCLUSION” TASKS

The example assumes a single-slot (unary) buffer.
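The read and write rules of asynchronous variables can be emulated in Python to see how they order a producer and a consumer around a single-slot buffer. This is a sketch under stated assumptions: the class AsyncVar and its method names are hypothetical, a threading.Condition stands in for the hardware synchronization bit, and the variables emp and fll are simply named after the two states of the buffer.

```python
import threading


class AsyncVar:
    """Emulates an asynchronous variable with a synchronization (full/empty) bit."""

    def __init__(self, value=None, full=False):
        self._value = value
        self._full = full              # synchronization bit: True is "1", False is "0"
        self._cond = threading.Condition()

    def read(self):
        """Wait until the bit is set, return the value and clear the bit."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value

    def write(self, value):
        """Wait until the bit is cleared, store the value and set the bit."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()


emp = AsyncVar(1, full=True)    # bit set: the buffer starts empty
fll = AsyncVar(0, full=False)   # bit cleared: nothing to consume yet
buffer = [None]                 # the single-slot buffer
received = []


def producer(items):
    for item in items:
        emp.read()              # wait for "buffer is empty"
        buffer[0] = item
        fll.write(1)            # signal "buffer is full"


def consumer(count):
    for _ in range(count):
        fll.read()              # wait for "buffer is full"
        received.append(buffer[0])
        emp.write(1)            # signal "buffer is empty"


items = [10, 20, 30]
threads = [
    threading.Thread(target=producer, args=(items,)),
    threading.Thread(target=consumer, args=(len(items),)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)  # [10, 20, 30]
```

The producer cannot overwrite the slot before the consumer has emptied it, because its next read of emp blocks until the consumer writes emp again.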
1) Solution by means of HEP Fortran:

    integer $emp,$fll
    data $emp,$fll/1,0/
    …
    producer
    e=$emp
    write in buffer
    $fll=1
    end
    consumer
    f=$fll
    read from buffer
    $emp=1
    end

2) Solution by means of the CRAY X-MP library:

    integer emp,fll
    call evasgn (emp)             - assign event emp
    call evasgn (fll)             - assign event fll
    call evpost (emp)             - post event emp: the buffer is empty
    …
    producer:
    call evwait (emp)
    write in buffer
    call evpost (fll)
    end
    consumer:
    call evwait (fll)
    read from buffer
    call evpost (emp)
    end

Example of solving the critical section problem:

1) by means of HEP:

    integer $lkvar                - asynchronous variable
    data $lkvar/1/                - bit set to 1
    …
    l=$lkvar                      - read
    critical section
    $lkvar=1                      - write synchronization bit

2) by means of CRAY X-MP:

    integer lkvar
    call lockasgn (lkvar)         - assign lkvar as critical section variable
    …
    call lockon (lkvar)
    critical section
    call lockoff (lkvar)

CEDAR FORTRAN

The Cedar system was developed at the University of Illinois (USA). It comprises 4 Alliant FX/8 clusters (32 processors in total) but can be extended to hundreds of processors. Cedar is a two-level system: the top level consists of Alliant FX/8 clusters, each of which consists of separate processors; in the general case, a cluster is a collection of processors that are tightly coupled with each other.

Cedar Fortran uses vector extensions, in particular the forall and where statements and triplets. Parallelizing is supported on both the micro tasking and the macro tasking levels. The micro tasking level, i.e. parallelizing the iterations of cycles, is represented by the doacross statement, whose syntax is:

    doacross [mark[,]] i=e1,e2[,e3]
    [types]
    [operators]
    [loop]
    [operators]
    [mark] operator | end doacross

Here mark is a statement label, i is the name of an integer cycle variable, e1, e2, e3 are integer expressions, types stands for type declarations and operators stands for executable statements. Statements in the body of the cycle before loop are performed once on each processor involved in executing the cycle.
Statements in the body of the cycle after loop are performed in every iteration. If loop is absent, all statements are performed in every iteration. Type declarations may appear only immediately after the doacross statement; they declare variables and arrays internal to the doacross, which can be referenced only inside it. Each iteration of the cycle allocates space for its own private copy of these internal variables and arrays. For the duration of the cycle, these declarations supersede all preceding declarations of the same names.
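The split between statements executed once per processor and statements executed in every iteration can be emulated with a Python thread pool. This is only an analogy under stated assumptions: worker threads stand in for processors, the pool's initializer plays the role of the statements before loop, the function iteration plays the role of the statements after loop, and the names preamble, iteration, scale and tmp are invented for the example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

worker_state = threading.local()   # per-worker storage, one copy per "processor"
preamble_runs = []                 # records which workers executed the preamble
preamble_lock = threading.Lock()


def preamble():
    """Statements before loop: executed once on each worker involved."""
    worker_state.scale = 2
    with preamble_lock:
        preamble_runs.append(threading.get_ident())


def iteration(i):
    """Statements after loop: executed in every iteration."""
    tmp = i * worker_state.scale   # tmp is private to this iteration
    return tmp + 1


# Iterations 1..10 are distributed over at most 4 workers.
with ThreadPoolExecutor(max_workers=4, initializer=preamble) as pool:
    result = list(pool.map(iteration, range(1, 11)))

print(result)  # [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]
```

The local variable tmp corresponds to a variable declared inside doacross: every iteration gets its own copy, so iterations running at the same time cannot overwrite each other's value.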