FORTRAN EXTENSIONS FOR CONVEYOR AND SIMD SYSTEMS

CONTROL VECTORS
Illiac IV was a SIMD system designed at the University of Illinois, USA,
in 1971. In the first of these Fortran extensions, Burroughs Illiac IV
Fortran, address masking was expressed with control vectors used as array
indices. Elements of a control vector take the values .true. or .false.;
.true. indicates that the operation must be executed for the respective
element. The symbol * denotes a control vector of any length whose elements
are all .true.
Consider an example:
real a(100), b(100), c(100)
logical m(100)
do 10 i=1,100,2            - i varies from 1 to 100 with step 2
m(i)=.true.                - odd elements
m(i+1)=.false.             - even elements
10 continue
a(*)=b(*)+a(*)
c(m(*))=b(m(*))+a(m(*))    - * as index
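The masking in the example above can be emulated in Python. This is only a sketch of the semantics, not Illiac IV Fortran: the helper masked_add, the list all_true and the sample values in a and b are assumptions made for illustration.

```python
# Emulation of control vectors with plain Python lists.
# Index 0 is unused so that subscripts match the 1-based Fortran example.
N = 100
a = [0.0] + [1.0] * N          # sample data, not from the source
b = [0.0] + [2.0] * N
c = [0.0] * (N + 1)

m = [False] * (N + 1)
for i in range(1, N + 1, 2):   # do 10 i=1,100,2
    m[i] = True                # odd elements
    m[i + 1] = False           # even elements

def masked_add(dst, x, y, mask):
    """dst(mask) = x(mask) + y(mask): only elements whose mask is True change."""
    for k in range(1, len(dst)):
        if mask[k]:
            dst[k] = x[k] + y[k]

all_true = [True] * (N + 1)    # plays the role of '*'
masked_add(a, b, a, all_true)  # a(*)=b(*)+a(*)
masked_add(c, b, a, m)         # c(m(*))=b(m(*))+a(m(*))
```

After the two calls every element of a has been updated, while c has changed only at the odd subscripts selected by m.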
DO FOR ALL LOOPS
The next compiler, IVTRAN for Illiac IV (1973), used another approach to
represent parallel computations: the loop do for all indicated that its
body must be compiled as a vector instruction. Unlike ordinary Fortran do
loops, it used a set of index tuples instead of a loop variable. A range was
defined for each component of the tuple, and the whole set of tuples was
obtained as the Cartesian product of these ranges. Such loops cannot be
nested. Example:
real a(10,20), b(10,20)
do 10 for all (i,j) [1..10].c.[1..20]    - for each pair of i,j; .c. is the Cartesian product
a(i,j)=b(i,j)+a(i,j)
10 continue
This fragment defines 200 parallel computations, one for each pair (i,j)
with i ranging from 1 to 10 and j from 1 to 20. Each computation adds the
corresponding elements of arrays a and b and stores the result in a.
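The index set of such a loop can be sketched in Python with itertools.product. The array contents are sample values, and the sequential for loop only stands in for the vector instruction: since the 200 computations are independent, their order does not matter.

```python
from itertools import product

rows, cols = 10, 20
# 1-based arrays: row 0 and column 0 are padding.
a = [[1.0] * (cols + 1) for _ in range(rows + 1)]   # sample values
b = [[2.0] * (cols + 1) for _ in range(rows + 1)]

# [1..10].c.[1..20]: Cartesian product of the two index ranges
index_tuples = list(product(range(1, rows + 1), range(1, cols + 1)))

# Every tuple is an independent computation; a sequential loop is used here
# only because the order does not affect the result.
for i, j in index_tuples:
    a[i][j] = b[i][j] + a[i][j]
```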
TRIPLETS
Among the early pipeline (vector) computers were the Advanced Scientific
Computer (ASC) by Texas Instruments in 1972 and the STAR-100 by Control
Data Corporation in 1973. The TI-ASC NX Fortran compiler (i.e. the compiler
for the ASC machine) was one of the first compilers for such processors.
A triplet is three numeric expressions separated by colons (beginning,
end, step), i.e. B:E:S. Used as a subscript, it selects a section of an array.
1)
integer a(4), b(4), c(8)
data b/2,4,6,8/
data c/1,2,3,4,5,6,7,8/
a=b                           - operation on the whole array
a(1:2)=b(3:4)                 - triplet with only the first two elements (a duplet)
a(1:3:2)=b(2:4:2)*c(3:8:4)
The last statement is equivalent to: a(1)=b(2)*c(3), a(3)=b(4)*c(7).
2)
integer a(4,4,4), b(4,8), c(4,6,4,4)
a(1:3:2,2:4:2,1)=b(1:2,2:4:2)*c(3:4,5,2:4:2,1)
This statement is equivalent to:
a(1,2,1)=b(1,2)*c(3,5,2,1)
a(3,2,1)=b(2,2)*c(4,5,2,1)
a(1,4,1)=b(1,4)*c(3,5,4,1)
a(3,4,1)=b(2,4)*c(4,5,4,1)
i.e. sections of size 2*2 are multiplied element by element.
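How a triplet expands into subscripts can be checked with a short Python sketch. The helper triplet is a hypothetical name, it assumes a positive step, and dictionaries are used only to keep the 1-based Fortran subscripts. Note that c(3:8:4) expands to the subscripts 3 and 7.

```python
def triplet(begin, end, step=1):
    """Expand a triplet B:E:S (positive step) into a list of 1-based subscripts."""
    return list(range(begin, end + 1, step))

b = {i: v for i, v in enumerate([2, 4, 6, 8], start=1)}
c = {i: v for i, v in enumerate([1, 2, 3, 4, 5, 6, 7, 8], start=1)}
a = dict(b)                                   # a=b

# a(1:3:2)=b(2:4:2)*c(3:8:4)
lhs, rb, rc = triplet(1, 3, 2), triplet(2, 4, 2), triplet(3, 8, 4)
assert len(lhs) == len(rb) == len(rc)         # the sections must have equal length
for ia, ib, ic in zip(lhs, rb, rc):
    a[ia] = b[ib] * c[ic]
```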
WHERE STATEMENTS
Triplets define address masks but cannot express every vector operation;
in particular, they cannot specify a conditional mask, i.e. one that depends
on the data. For this purpose the where statement is used. Its general form
is:
where (<array_logical_expression>)
<array_assignment>
[otherwise
<array_assignment>]
end where
Here <array_assignment> stands for assignment statements on arrays or array
sections, <array_logical_expression> is a logical expression whose operands
are array elements, and square brackets mark the optional part of the
statement.
Example of using where (the negative elements of array a are set to zero):
integer a(100)
where (a(1:100).LT.0)    - .LT. stands for "less than"
a(1:100)=0
end where
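The effect of where, with and without the otherwise branch, can be sketched in Python. The data values and the assignment used in the otherwise branch (multiplying by 10) are illustrative assumptions; the point is that the mask is computed once for the whole array and then selects which assignment applies to each element.

```python
a = [5, -3, 0, -7, 2]                 # sample data
mask = [x < 0 for x in a]             # a.LT.0, evaluated once for the whole array

# where (mask) a=0: elements with a false mask are left untouched
a = [0 if m else x for x, m in zip(a, mask)]

# With an otherwise branch, the elements where the mask is false
# receive the second assignment instead.
b = [5, -3, 0, -7, 2]
b_mask = [x < 0 for x in b]
b = [0 if m else x * 10 for x, m in zip(b, b_mask)]   # otherwise b=b*10 (illustrative)
```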
IDENTIFY STATEMENTS
Given an array a(10,10), triplets and where statements cannot express an
assignment to its diagonal elements, even though the distance in memory
between these elements is constant (11 elements). The identify statement
binds a subset of the array's elements under a new name so that vector
operations can be applied to them.
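[Figure: array a stored by columns: a(1,1), ..., a(10,1), a(1,2), a(2,2), ...; neighbouring diagonal elements such as a(1,1) and a(2,2) are 11 positions apart.]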
To get access to diagonal elements the following code may be used:
real a(10,10)
identify (diag(i)=a(i,i),i=1,10)
diag(:)=1
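One way to picture identify is as a strided view over the array's storage. The Python sketch below assumes column-major storage, as in Fortran; the class StridedView and the helper flat are hypothetical names introduced for illustration and are not part of any Fortran dialect.

```python
class StridedView:
    """A 1-based view of `count` elements of `storage`, `stride` apart."""
    def __init__(self, storage, offset, stride, count):
        self.storage, self.offset = storage, offset
        self.stride, self.count = stride, count

    def _pos(self, i):
        if not 1 <= i <= self.count:
            raise IndexError(i)
        return self.offset + (i - 1) * self.stride

    def __getitem__(self, i):
        return self.storage[self._pos(i)]

    def __setitem__(self, i, value):
        self.storage[self._pos(i)] = value

    def fill(self, value):            # diag(:)=value
        for i in range(1, self.count + 1):
            self[i] = value

n = 10
storage = [0.0] * (n * n)             # a(10,10) stored by columns

def flat(i, j):
    """Column-major position of a(i,j) in storage (0-based)."""
    return (i - 1) + (j - 1) * n

# identify (diag(i)=a(i,i),i=1,10): a(i,i) starts at a(1,1) with stride n+1 = 11
diag = StridedView(storage, offset=flat(1, 1), stride=n + 1, count=n)
diag.fill(1.0)
```

The view copies nothing: writing through diag changes the elements of the underlying storage directly.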
Note that the same result can be obtained with the forall statement, which
looks similar to identify but, instead of collecting elements under a new
name, simply executes the operations:
real a(10,10)
forall (i=1,10) a(i,i)=1
The forall statement can also include a conditional expression that defines
a conditional mask. For instance, the following fragment sets the negative
diagonal elements to zero:
integer a(10,10)
forall (i=1,10,a(i,i).LT.0) a(i,i)=0
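A masked forall can be sketched in Python as two steps: first select the indices for which the mask holds, then perform the assignment for those indices only. The diagonal values i-5 are sample data chosen so that some diagonal elements are negative.

```python
n = 10
# 1-based array: row 0 and column 0 are padding; off-diagonal elements are 1.
a = [[1] * (n + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
    a[i][i] = i - 5                   # sample diagonal: -4, -3, ..., 5

# forall (i=1,10,a(i,i).LT.0) a(i,i)=0
selected = [i for i in range(1, n + 1) if a[i][i] < 0]   # evaluate the mask first
for i in selected:                                       # then assign
    a[i][i] = 0

diagonal = [a[i][i] for i in range(1, n + 1)]
```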
EXTENSIONS OF FORTRAN FOR MIMD-SYSTEMS
MIMD systems differ greatly from one another, so Fortran for such systems
was not standardized; each dialect has specific features determined by the
architecture of the corresponding system. Nevertheless, two basic
approaches are used:
parallelizing on the micro level (micro tasking);
parallelizing on the macro level (macro tasking).
Macro tasking is coarse-grained parallelization: the work is partitioned
into large parts (programs of separate users, separate procedures of one
program). As a rule, macro tasking is implemented with fork/join
operations. The fork operation creates a new task for execution by the
operating system. Creating a new task is usually an expensive operation, so
the useful work carried out by the task must be large enough to compensate
for the cost of its creation.
Macro tasking is used not only in multiprocessor systems but also in
uniprocessor ones (for instance, the HEP, designed at the end of the 1970s).
A uniprocessor HEP module executes multiple instruction streams in parallel
using time sharing. Several pipelines and a large set of registers allow HEP
to switch quickly between processes without saving process state to memory;
at least 8 processes are required to load a processor fully. The main
drawback of macro tasking is the time spent on creating processes and
switching between them.
FORTRAN HEP
HEP Fortran provides the statements create and resume for creating a new
process that runs in parallel with its parent process. The create statement
starts a process, parallel to the main program, from the specified
subroutine. The resume statement lets a subroutine that was invoked by call
reactivate its parent process, so that from then on both are executed in
parallel. Each of the two HEP Fortran fragments below results in subroutine
subr being executed in parallel with the main program:
1) create subr(x1,y1,z1) // main program
…
end
subroutine subr(x,y,z)
…
return
end
…
2) call subr(x1,y1,z1) // main program
…
end
subroutine subr(x,y,z)
resume
…
return
end
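The difference between the two variants can be illustrated with Python threads. This is an emulation under stated assumptions, not HEP Fortran: threading.Thread stands in for a HEP process, a threading.Event stands in for resume, and the names record, subr_created and subr_called are hypothetical.

```python
import threading

log = []
log_lock = threading.Lock()

def record(msg):
    with log_lock:
        log.append(msg)

# Variant 1: create subr(...) starts subr as a parallel process right away.
def subr_created(x):
    record(f"subr {x}")

t1 = threading.Thread(target=subr_created, args=(1,))
t1.start()                    # create subr(x1)
record("main continues 1")
t1.join()                     # only so that the example finishes deterministically

# Variant 2: call subr(...) suspends the caller until subr executes resume.
def subr_called(x, resumed):
    record(f"prologue {x}")   # runs while the caller is still suspended
    resumed.set()             # resume: the parent becomes active again
    record(f"subr {x}")       # from here on, both run in parallel

resumed = threading.Event()
t2 = threading.Thread(target=subr_called, args=(2, resumed))
t2.start()
resumed.wait()                # the caller stays suspended until resume
record("main continues 2")
t2.join()
```

In the second variant, everything the subroutine does before resume is guaranteed to happen before the caller continues; in the first variant there is no such ordering.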
FORTRAN CRAY X-MP
Many modern supercomputers contain a comparatively small number of large,
powerful vector processors. The CRAY X-MP, for example, is such a system; it
offers Fortran users a library of subroutines that support macro tasking.
The programmer must divide the program into tasks to be executed in
parallel; no automatic parallelization is performed. The library
subroutines create new tasks at the level of subroutines. For instance, the
following fragment creates two tasks that run the same subroutine with
different parameter values:
external subr
integer task1(3), data(1000)
call tskstart (task1,subr,data,1,500)
call subr(data,501,1000)
The first call creates a new task that executes subr on elements 1 to 500
of data. The second call runs subr as an ordinary subroutine, processing
elements 501 to 1000 of data.
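The same fork/join pattern can be sketched with a Python thread standing in for the task created by tskstart. The body of subr (squaring the elements) is an illustrative assumption, and the final join is added here so that the result can be inspected; the source fragment does not show how the caller waits for the task.

```python
import threading

def subr(data, first, last):
    """Square elements first..last (1-based, inclusive) in place; illustrative work."""
    for k in range(first - 1, last):
        data[k] = data[k] * data[k]

data = list(range(1, 1001))
task1 = threading.Thread(target=subr, args=(data, 1, 500))
task1.start()              # call tskstart (task1,subr,data,1,500)
subr(data, 501, 1000)      # call subr(data,501,1000)
task1.join()               # wait for the task; not shown in the source fragment
```

The two halves of data do not overlap, so the two executions of subr need no further synchronization.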
Synchronization methods fall into two important classes:
- synchronization based on mutual exclusion (critical sections), which
requires that processes do not meet;
- synchronization based on waiting for events, which requires that
processes do meet.
CRITICAL SECTIONS AND EVENTS
A critical section is a region of code that can be executed by at most
one process at any given moment. To clarify, consider the following
situation: several processes execute the same procedure, and this procedure
contains a region of code in which a process works with a so-called
critical resource, to which simultaneous access by several processes is
forbidden.
The event mechanism is intended to delay some processes until other
processes have completed certain actions: the waiting process must declare
that it is waiting for a certain event, and the process being waited for
must post the expected event once the necessary actions are complete.
For synchronization, HEP Fortran uses so-called asynchronous variables.
Names of asynchronous variables begin with the sign $. Each such variable
has an associated synchronization bit, which can be in one of two states:
set (1) or cleared (0). An asynchronous variable can be read only when its
synchronization bit is 1, and the read immediately clears the bit to 0. It
can be written only when the bit is 0, and the write immediately sets the
bit to 1. A process that wants to read or write an asynchronous variable
waits until the synchronization bit is in the required state. The section
of code between a read and a write of an asynchronous variable is therefore
a critical section (this bracketing solves the mutual exclusion problem).
These variables can also be used for synchronization of the "waiting for an
event" type.
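The read/write rules of an asynchronous variable can be modelled in Python with a condition variable. The class AsyncVar and the counter workload are assumptions made for illustration; only the rules themselves (a read needs bit 1 and clears it, a write needs bit 0 and sets it) come from the description above.

```python
import threading

class AsyncVar:
    """A value with a synchronization bit: read needs bit 1, write needs bit 0."""
    def __init__(self, value=None, bit=0):
        self._value, self._bit = value, bit
        self._cond = threading.Condition()

    def read(self):
        with self._cond:
            self._cond.wait_for(lambda: self._bit == 1)
            self._bit = 0                       # reading clears the bit
            self._cond.notify_all()
            return self._value

    def write(self, value):
        with self._cond:
            self._cond.wait_for(lambda: self._bit == 0)
            self._value, self._bit = value, 1   # writing sets the bit
            self._cond.notify_all()

# Mutual exclusion: a read ... write pair brackets the critical section.
lkvar = AsyncVar(1, bit=1)
counter = [0]                     # shared resource (a list so workers can update it)

def worker(times):
    for _ in range(times):
        token = lkvar.read()      # enter: the bit goes to 0, other readers wait
        counter[0] += 1           # critical section
        lkvar.write(token)        # leave: the bit goes back to 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because only one process at a time can hold the variable in the "read but not yet written" state, the increments never interleave and no update is lost.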
SOLUTIONS OF “PRODUCERS-CONSUMERS” AND “MUTUAL
EXCLUSION” TASKS
The example assumes a buffer that holds a single item.
1) Solution by means of Fortran HEP:
integer $emp,$fll
data $emp,$fll/1,0/
…
producer
e=$emp
write in buffer
$fll=1
end
consumer
f=$fll
read from buffer
$emp=1
end
2) Solution by means of CRAY X-MP:
integer emp,fll
call evasgn (emp)    - assign event emp
call evasgn (fll)    - assign event fll
call evpost (emp)    - post event emp: the buffer is empty
….
producer:
call evwait (emp)
write in buffer
call evpost (fll)
end
consumer:
call evwait (fll)
read from buffer
call evpost (emp)
end
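The event-based solution can be tried out in Python, with threading.Event standing in for the event variables. This is a sketch for one producer and one consumer, not the Cray library: the explicit clear() calls are an assumption of this emulation (the fragment above does not show how a posted event is cleared), and the item values are sample data.

```python
import threading

emp, fll = threading.Event(), threading.Event()   # call evasgn (emp), call evasgn (fll)
emp.set()                                         # call evpost (emp): buffer is empty
buffer = [None]                                   # single-item buffer
received = []

def producer(items):
    for item in items:
        emp.wait()                    # call evwait (emp)
        emp.clear()                   # assumption: explicit clear, not in the source
        buffer[0] = item              # write in buffer
        fll.set()                     # call evpost (fll)

def consumer(count):
    for _ in range(count):
        fll.wait()                    # call evwait (fll)
        fll.clear()                   # assumption: explicit clear, not in the source
        received.append(buffer[0])    # read from buffer
        emp.set()                     # call evpost (emp)

items = list(range(5))
tp = threading.Thread(target=producer, args=(items,))
tc = threading.Thread(target=consumer, args=(len(items),))
tp.start()
tc.start()
tp.join()
tc.join()
```

The two events alternate strictly, so every item is written once, read once, and delivered in order.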
Example of solving the critical section problem:
1) by means of HEP:
integer $lkvar           - asynchronous variable
data $lkvar/1/           - bit set to 1
…
l=$lkvar                 - read
critical section
$lkvar=1                 - write (sets the synchronization bit)
2) by means of CRAY X-MP:
integer lkvar
call lockasgn (lkvar)    - assign lkvar as critical section variable
…
call lockon (lkvar)
critical section
call lockoff (lkvar)
CEDAR FORTRAN
Cedar is a system developed at the University of Illinois (USA). It
comprises four Alliant FX/8 parallel machines (32 processors in total) and
can be extended to hundreds of processors.
Cedar is a two-level system: the top level consists of Alliant FX/8
clusters, each of which consists of separate processors. In the general
case, a cluster is a collection of processors that are tightly coupled with
each other.
Cedar Fortran uses vector extensions, in particular the forall and where
statements and triplets. Parallelization facilities are provided at both the
micro tasking and macro tasking levels.
The micro tasking level, i.e. parallelizing the iterations of loops, is
represented by the doacross statement, whose syntax is:
doacross [mark[,]] i=e1,e2[,e3]
[types]
[operators]
[loop]
[operators]
[mark] operator | end doacross
where i is the name of an integer loop variable and e1, e2, e3 are integer
expressions. The statements (operators) placed in the body before the
keyword loop are executed once on each processor that takes part in the
computation. The statements placed after loop are executed in every
iteration. If loop is absent, all statements are executed in every
iteration. Type declarations (types) may appear only immediately after the
doacross statement; they declare variables and arrays internal to doacross,
which can be referenced only inside doacross. Each iteration of the loop
allocates space for a local, private copy of these internal variables and
arrays. For the duration of the loop, these declarations supersede all
earlier ones.
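The split between "once per processor" and "once per iteration" can be sketched with a Python thread pool. This is an emulation under assumptions, not Cedar Fortran: worker threads stand in for processors, the function doacross and its parameters preamble, body and workers are hypothetical names, the step e3 is assumed positive, and private variables are modelled as ordinary local variables of body.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def doacross(e1, e2, e3, preamble, body, workers=4):
    """Run body(i, state) for i=e1,e2,e3 on a pool of worker threads.

    preamble() runs once per worker (the statements before `loop`);
    body(i, state) runs once per iteration (the statements after `loop`).
    """
    per_worker = threading.local()

    def run(i):
        if not hasattr(per_worker, "state"):
            per_worker.state = preamble()     # first iteration on this worker
        return body(i, per_worker.state)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, range(e1, e2 + 1, e3)))

preamble_runs = []

def preamble():
    preamble_runs.append(threading.get_ident())   # records which worker ran it
    return {"scale": 2}

def body(i, state):
    tmp = i * state["scale"]      # tmp is private to this iteration
    return tmp + 1

results = doacross(1, 10, 1, preamble, body)
```

The preamble runs at most once per worker thread, however many iterations that worker ends up executing, while body runs exactly once for each value of i.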