
FORK
A High-Level Language for PRAMs
T. Hagerup1
A. Schmitt2
H. Seidl2
April 22, 1994
22/1990
ABSTRACT
We present a new programming language designed to allow the convenient expression of
algorithms for a parallel random access machine (PRAM). The language attempts to satisfy
two potentially conflicting goals: On the one hand, it should be simple and clear enough to
serve as a vehicle for human-to-human communication of algorithmic ideas. On the other
hand, it should be automatically translatable to efficient machine (i.e., PRAM) code, and it
should allow precise statements to be made about the amount of resources (primarily time)
consumed by a given program. In the sequential setting, both objectives are reasonably well
met by the Algol-like languages, e.g., with the RAM as the underlying machine model, but
we are not aware of any language that allows a satisfactory expression of typical PRAM
algorithms. Our contribution should be seen as a modest attempt to fill this gap.
Fachbereich 14
Universität des Saarlandes
Im Stadtwald
6600 Saarbrücken
1 Supported by the Deutsche Forschungsgemeinschaft, SFB 124, TP B2
2 Supported by the Deutsche Forschungsgemeinschaft, SFB 124, TP C1
1 Introduction
A PRAM is a parallel machine whose main components are a set of processors and a
global memory. Although every real machine is finite, we consider an ideal PRAM to
have a countably infinite number of both processors and global memory cells, of which
only a finite number is used in any finite computation. Both the processors and the global
memory cells are numbered consecutively starting at 0; the number of a processor is called
its processor number or its index, and the number of a memory cell is, as usual, also
known as its address. Each processor has an infinite local memory and a local program
counter. All processors are controlled by the same global clock and execute precisely one
instruction in each clock cycle. A PRAM may hence also be characterized as a synchronous
shared-memory MIMD (multiple-instruction multiple-data) machine.
The set of instructions available to each processor is a superset of those found in a
standard RAM (see, e.g., [2]). The additional instructions not present in a RAM are an
instruction LOADINDEX to load the index of the executing processor into a cell in its
local memory, and instructions READ and WRITE to copy the contents of a given global
memory cell to a given cell in the local memory of the executing processor, and vice versa.
All processors can access a global memory cell in the same step, with some restrictions
concerning concurrent access by several processors to the same cell (see Section 2.5).
Among researchers working on the development of concrete algorithms, the PRAM is
one of the most popular models of parallel computation, and the number of published
PRAM algorithms is large and steadily growing. This is due mainly to the convenient and
very powerful mechanism for inter-processor communication provided by the global memory.
Curiously, there is no standard PRAM programming language, and each researcher,
insofar as he wants to provide a formal description of his algorithms, develops his own
notation from scratch. The disadvantages of this are evident:
1. At least potentially, difficulties of communication are aggravated by the lack of a
common language;
2. The same or very similar definitions are repeated again and again, resulting in a
waste of human time and journal space;
3. Since the designer of an algorithm is more interested in the algorithm than in the
notation used to describe it, any language fragments that he may introduce are not
likely to be up to current standards in programming language design.
In the wider area of parallel computing in general, much effort has gone into the
development of adequate programming languages. Most of these languages, however, are
intended to be used with loosely coupled multiprocessor systems consisting of autonomous
computers, each with its own clock, that run mainly independently, but occasionally exchange
messages. The facilities provided for inter-processor communication and synchronization
are therefore based on concepts such as message exchange (Ada [15], OCCAM [14],
Concurrent C [10]) or protected shared variables (Concurrent Pascal [12]). In particular,
a global memory simultaneously accessible to all processors is not supported by
such languages, and it can be simulated only with an unacceptably high overhead. While
such languages may be excellent tools in the area of distributed computing, they are not
suited to the description of PRAM algorithms.
Before we go on to discuss other languages more similar in spirit to ours, we describe
what we consider to be important features of such languages. Most obviously, they must
offer a way to state that certain operations can be executed in parallel. Secondly, we want
to write programs for a shared-memory machine. Therefore, the language should distinguish
between shared variables, which exist only once and can be accessed by a certain
group of processors, and private variables, of which there might be several instances, each
residing in a different processor's private memory and possibly having a different value.
Also, the machine facility of synchronous access to shared data should be reflected in
the language. Finally, program constructs like recursion, which are well suited for writing
clear and well structured sequential programs, should be allowed to be freely combined
with parallelism. Recursion is characterized by a subdivision of a given problem into a set
of subproblems that can be solved independently and possibly in parallel. Each subproblem
may again be worked on by several processors. Therefore, the programming language
should provide the programmer with a means of generating independently working subgroups
of synchronously running processors. Since the efficiency of many algorithms relies
on a subtle distribution of processors over tasks, an explicit method should be available
to assign processors to newly created subgroups.
A frequently used tool for indicating program sections that can be executed in parallel is a for
loop where all loop iterations are supposed to be executed in parallel. Such a construct is,
e.g., used in extensions of sequential imperative languages like force [13] and ParC [4].
Also, textbooks about PRAM algorithms, e.g., [3, 11], usually employ some Algol-style
notation together with a statement like for i:=1 to n pardo ... endpardo.
A different approach is taken in the language gatt [7]. In gatt all processors are
started simultaneously at the beginning. During procedure calls subgroups can be formed
to solve designated subproblems. However, since gatt is designed for describing efficient
algorithms on processor networks, gatt lacks the concept of shared variables. Instead,
every variable has to reside in the private memory of one of the processors.
For PRAMs, there are various examples of descriptions of recursive algorithms using
an informal group concept, e.g., see [3, sect. 4.6.3, p. 101], [8, 6]. An attempt to formulate
a recursive PRAM algorithm more precisely is made in [5]. Corresponding to the machine-level
fork instruction of [9], a fork statement is introduced, which allows a given group of
synchronously working processors to be divided into subgroups. This fork statement gave
the name to our language.
The present paper embeds the fork statement suggested in [5] into a complete programming
language. In detail, the contributions of FORK are the following:
It adds a start construct, which allows a set of new processors with indices in a
specified range to be started.
It makes precise the extent to which the semantics guarantees synchronous program
execution (and hence synchronous data access).
Besides the implicit synchronization at the beginning of every statement, as proposed
in [5], it introduces implicit splitting into subgroups at every branching point of the
program where the branch taken by a processor depends on its private variables.
It is argued that the available program constructs can be freely nested. In particular,
iteration and recursion are compatible both with the starting of new processors and the
forking of subprocesses.
The paper is organized as follows. In Section 2 we explain the mechanism of synchronism
of FORK together with the new constructs in FORK for maintaining parallelism.
Moreover, we introduce the three basic concepts of FORK, namely the concepts of a
"logical processor", of a "group" of logical processors, and of "synchronous program execution".
These basic concepts are used in Section 3, which presents a description of the
syntax of FORK together with an informal description of its semantics. Section 4 concludes
with some hints on how programs of the proposed language can be compiled to
efficiently running PRAM machine code.
It should be emphasized that although our language design aims to satisfy the needs
of theoreticians, we want to provide a practical language. The language FORK was
developed in close connection with a research project at the Saarbrücken Computer Science
Department that explores in detail the possibilities of constructing a PRAM [1] and is
going to build a prototype. We plan to write a compiler for our language that produces
code for this physical machine.
Both a formal semantics of FORK and a more precise description of a compiler for
FORK are in preparation.
2 An overview on the programming language FORK
Parallelism in FORK is controlled by two orthogonal instructions, namely
start [<expr>..<expr>] and fork [<expr>..<expr>]. The start instruction can be used
to readjust the group of processors available to the PRAM for program execution, whereas
the fork instruction leaves the number of processors unchanged, but creates independently
operating subgroups for distinct subtasks and allows for a precise distribution of the available
processors among them. The effect of these instructions together with FORK's concept
of synchronous program execution will be explained by the examples below.
2.1 Creating new processors: The start statement
A basic concept of FORK is a logical processor. Logical processors are meant to be mapped
onto the physical processors provided by the hardware architecture. However, the number
of actually working logical processors may vary during program execution; also, the number
of logical processors may exceed the number of physically available processors. Therefore,
these two kinds of processors should not be confused. In the sequel, if we loosely speak of
"processors" we always mean "logical processors". If we mean physical processors we will
state so explicitly.
Every (logical) processor p owns a distinguished integer constant # whose value is
referred to as the processor number of p. Also, it may have other private constants,
private types, and private variables which are only accessible by itself. Objects declared
as shared by a group of processors can be accessed by all processors of the given group.
As a first example consider the following problem. Assume that we are given a forest
F with nodes 1, ..., N. F is described by an array A of N integers, where A[i] = i if i is
a root of F, and A[i] contains the father of i otherwise. For an integer constant N, the
following program computes an array R of N integers such that R[i] contains the root of
the tree to which i belongs in F.
As in pascal, the integer constant N, the loop variable t, and the arrays A and R must
be declared in the surrounding context; in FORK, this declaration indicates whether variables
are shared (as in the example) or private and hence only accessible to the individual
processor itself.
...                                              (1)
shared const N = ...                             (2)
shared var t : integer                           (3)
shared var A : array [1 .. N] of integer         (4)
shared var R : array [1 .. N] of integer         (5)
...                                              (6)
start [1..N]                                     (7)
  R[#] := A[#]                                   (8)
  for t := 1 to log(N) do                        (9)
    = log(N) denotes ⌈log2(N)⌉ =                 (10)
    R[#] := R[R[#]]                              (11)
  enddo                                          (12)
endstart                                         (13)
...                                              (14)
Initially there is just one processor with processor number 0. The instruction start [1..N] in
line (7) starts processors with processor numbers 1, ..., N. The corresponding instruction
endstart stops these processors again and reestablishes the former processors. Hence the
sequence of an instruction start [1..N] immediately followed by an instruction start [1..M]
does not start N·M processors but only M processors. An occurrence of endstart finishes
the phase where M processors were running, and again there are N processors with numbers
1, ..., N.
At the machine level every instruction consumes exactly one time unit. However, the
semantics of a high-level program should be independent of the special features of the
translation schemes. Therefore, it should be left unspecified how many time units are
precisely consumed by, e.g., an assignment statement of FORK.
For this reason a notion of synchronous program execution is needed which only depends
on the program text itself. Again, the semantic notion of "synchronous program
execution" should not be confused with the notion of a global clock of a physical PRAM.
For example, the underlying hardware may allow different processors to execute different
instructions within the same clock cycle, whereas our notion of synchronism does not allow
for a synchronous execution of different statements. Being synchronous is a property of a
set of processors. It implies that all processors within this set are at the same program
point. This means that they not only execute the same statement within the same loop
within the same procedure. It also means that the "history" of the recursive call to that
procedure and the number of iterations within the same loop agree. There is no explicit
synchronization mechanism in FORK. Implicit synchronization in FORK is done statement
by statement. At the end of each statement there is an (implicit) synchronization
point. This means that if a set of processors synchronously executes a statement sequence
<statement>1 <statement>2
the processors of this set first synchronously execute <statement>1. When all processors
of this set have finished the execution of <statement>1, they synchronously execute
<statement>2. Note that within the execution of <statement>1 different processors may
reach different program points. Thus they may become asynchronous in between.
FORK is well structured; there are no gotos. Hence implicit synchronization points
cannot be circumvented. Nontermination caused by infinite waiting for deviating processors
is therefore not possible.
In the given example all the processors execute the same code. According to our convention
they execute statement by statement synchronously. Hence, first every processor
copies the value of A[#] to R[#]. Recall that the constant # is distinct for every processor.
Then all processors assign 1 to the variable t, followed by the execution of line (11). Then
they assign 2 to t, and so forth. Since the upper bound for t depends on shared data only
(namely on N), all processors finish the for loop at the same time.
Observe here that the synchronous execution of an assignment statement is subdivided
into three synchronously executed steps: first, the right-hand side is evaluated; secondly,
the variable corresponding to the left-hand side is determined; finally, the value of the
right-hand side is assigned to the variable described by the left-hand side.
In our example, in line (11), first the value of R[R[#]] is computed in parallel by every
processor; secondly, the variable R[#] is determined, which receives its new value in step
three.
2.2 Forming groups of processors: The fork statement
FORK allows free combination of parallelism and recursion. This gives rise to the second basic concept of FORK: a group. Groups are formed by a (possibly empty) set of
processors. Shared variables are always shared relative to a group of processors, meaning
that they can be accessed by processors within this group but not by processors from the
outside.
Groups can be divided into subgroups. This is done by the fork construct of FORK.
The most recently established groups are called leaf groups. Leaf groups play a special
role in FORK. As a minimum, the processors within one leaf group work synchronously.
Also, if new shared objects are declared, they are established as shared relative to the leaf
group executing this declaration.
Every group has a group number. The group number of the most recently created
group can be accessed by its members through the distinguished private integer constant
@. Clearly, the values of @ are equal throughout that group. Initially, there is just one
group with group number 0 which consists of the processor with processor number 0.
As an example, consider the following generic divide-and-conquer algorithm DC. DC
has a recursion parameter N describing the maximal number of processors available to
solve the given problem and additional parameters containing the necessary data, which
for simplicity are indicated by "...". Assuming that the problem size is reduced to its square
root at every recursion step, DC may be programmed as follows:
procedure DC(shared const N : integer; ...)                 (1)
  ...                                                       (2)
  if trivial(N)                                             (3)
  then conquer( ... )                                       (4)
  else                                                      (5)
    fork [0 .. sqrt(N)-1]                                   (6)
      @ = # div sqrt(N)                                     (7)
      # = # mod sqrt(N)                                     (8)
      DC(sqrt(N), ... )    = sqrt(N) denotes ⌈√N⌉ =         (9)
    endfork                                                 (10)
    combine( ... )                                          (11)
  endif                                                     (12)
  ...                                                       (13)
When a leaf group reaches the fork instruction in line (6), a set of subgroups with
group numbers 0, ..., sqrt(N)-1 is created. These newly created groups are leaf groups
during the execution of the rest of the fork statement, which, in the example, consists of
line (9). Observe that procedure calls inside a fork may allocate distinct instances of the
same shared variable for each of the new leaf groups.
Executing the right-hand side of line (7), every processor determines the leaf group to
which it will belong.
In order to make the call to a recursive procedure simpler, it may be reasonable for a
processor to receive a new processor number w.r.t. the chosen leaf group. In the example
this new number is computed in line (8).
When the new leaf groups have been formed, the existing processors have been distributed
among these groups, and the processor numbers have been redefined, the leaf groups
independently execute the statement list inside the fork construct. In the example this
consists just of a recursive call to DC. Clearly, the parameters of this recursive call, which
contain the specification of the subproblem, in general depend on the value of the constant
@ of its associated leaf group.
When the statements inside a fork statement are finished, the leaf groups disappear;
in the example this happens at line (10). The original group is reestablished as a leaf group, and all
the processors continue to synchronously execute the next statement (11).
2.3 Why no pardo statement?
There is no pardo statement in FORK. This choice was motivated by the observation that
in general pardo is used simply in the sense of our start. A difference occurs for nested
pardos. Consider the program segment
for i := 1 to n pardo           (1)
  for j := 1 to m pardo         (2)
    op(i,j)                     (3)
  endpardo                      (4)
endpardo                        (5)
Using a similar semantics as for the start instruction in FORK, the second pardo simply
would overwrite the first one, which means that on the whole only m processors execute
line (3); moreover, the value of i in line (3) would no longer be defined. This is not the
intended meaning.
Instead, two nested pardos as in lines (1) and (2) are meant to start n·m processors
indexed by pairs (i, j). Precisely, a pardo statement of the form
for i := <expr>1 to <expr>2 pardo <statement> endpardo
where the expressions <expr>1 and <expr>2 and the statement <statement> do not use
any private objects, can be simulated as follows:
begin                                                         (1)
  = declare two new auxiliary constants in order               (2)
    to avoid double evaluation of the                          (3)
    expressions <expr>1 and <expr>2                            (4)
  =                                                            (5)
  shared const a1 = <expr>1                                    (6)
  shared const a2 = <expr>2                                    (7)
  = start a2-a1+1 new processors ... =                         (8)
  start [a1 .. a2]                                             (9)
    = ... and distribute them among a2-a1+1 new groups =       (10)
    fork [a1 .. a2]                                            (11)
      @ = #                                                    (12)
      # = 0                                                    (13)
      = each leaf group creates a new variable i and           (14)
        initializes it with the group number                   (15)
      =                                                        (16)
      begin                                                    (17)
        shared var i : integer                                 (18)
        i := @                                                 (19)
        <statement>                                            (20)
      end                                                      (21)
    endfork                                                    (22)
  endstart                                                     (23)
end   = of the pardo simulation =                              (24)
In order to avoid redundancies we decided not to include the pardo construct in FORK.
On the other hand, one may argue that the fork construct as provided by FORK is
overly complicated, and that using the very simple pardo would suffice in every relevant situation.
Using pardo, a generic divide-and-conquer algorithm may look as follows:
procedure DC(shared const N : integer; ...)       (1)
  ...                                              (2)
  if trivial(N)                                    (3)
  then conquer( ... )                              (4)
  else                                             (5)
    for i := 1 to sqrt(N) pardo                    (6)
      DC(sqrt(N), ... )                            (7)
    endpardo                                       (8)
    combine( ... )                                 (9)
  endif                                            (10)
  ...                                              (11)
In the pardo version of DC, beginning with one processor, successively more and more
processors are started. In particular, every subtask is always supplied with one processor
to solve it. In contrast, in the fork version the leaf group of processors is successively
subdivided and distributed among the subtasks. The leaf group calling DC does
not necessarily form a contiguous interval. Hence there might be subtasks which receive
an empty set of processors and thus are not executed at all. In fact, this capability is
essentially exploited in the order-chaining algorithm of [5]. This algorithm is not easily
expressible using pardos. This was one of the reasons for introducing the fork construct.
2.4 Forming subgroups implicitly: The if statement
So far we have not explained what happens if the processors of a given leaf group synchronously
arrive at a conditional branching point within the program. As an example, assume
that for some algorithm the processors 1, ..., N are conceptually organized in the form of
a tree of height log(N). At time t, a processor should execute a procedure op1(#) if its
height in the tree is at most t, and another procedure op2(#) otherwise. The corresponding
piece of program may look like:
shared var t : integer            (1)
...                               (2)
for t := 1 to log(N) do           (3)
  if height(#) <= t               (4)
  then op1(#)                     (5)
  else op2(#)                     (6)
  endif                           (7)
enddo                             (8)
...                               (9)
For every t the condition of line (4) may evaluate to true for some processors, and to
false for others. Moreover, the evaluation of both op1(#) and op2(#) may introduce local
shared variables, which are distinct even if they have the same names. Therefore, every
if-then-else statement whose condition depends on private variables implicitly introduces
two new leaf groups of processors, namely those that evaluate the condition to true and
those that evaluate it to false. Both groups receive the group number of their father group,
i.e., the private constants @ are not redefined.
Clearly, within each new leaf group every processor is at the same program point.
Hence, they can in fact work synchronously, as demanded by FORK's group concept. As
soon as the two leaf groups have finished the then and the else parts, respectively (i.e., at
the instruction endif), the original leaf group is reestablished and the synchronous execution
proceeds with the next statement. Case statements and loops are treated analogously.
In the above example the condition of the for loop in line (3) depends only on the
shared variable t. Therefore, the present leaf group is not subdivided into subgroups after
line (3). However, this subdivision occurs after line (4). The two groups for the then and
the else parts execute lines (5) and (6) in parallel, each group internally synchronously but
asynchronously w.r.t. the processors of the other group. Line (7) reestablishes the original
leaf group, which in turn synchronously executes the next round of the loop, and so on.
The fact that we implicitly form subgroups at branching points whose conditions depend
on private data allows for an unrestricted nesting of ifs, loops, procedure calls, and
forks.
Observe that at every program point the system of groups and subgroups containing a
given processor forms a hierarchy. Corresponding to that hierarchy, the shared variables
can be organized in a tree-like fashion. Each node corresponds to a group in the hierarchy
and contains the shared variables relative to that group. For a processor of a leaf group,
all those variables are relevant that are situated on the path from this leaf group to the
root. Along this path, the ordinary scoping rules hold.
For returning results at the end of a fork, or for exchanging data between different leaf
groups of processors, it is necessary, at least in some cases, to also have synchronous access
to data shared between different subgroups of processors.
Consider the following example:
...                                          (1)
shared var A : array [0..N-1] of integer     (2)
shared var i : integer                       (3)
...                                          (4)
fork [0..N-1]                                (5)
  @ = ...                                    (6)
  # = ...                                    (7)
  for i := 0 to N-1 do                       (8)
    A[(@+1) mod N] := result(i,@,A)          (9)
  enddo                                      (10)
endfork                                      (11)
...                                          (12)
In this example the array A is used as a mailbox for communication between the groups
0, ..., N-1. The loop index i and the limits of the for loop of lines (8) and (9) are shared
not only by the processors within every leaf group, but also between all groups generated
in line (5). If (as in the example) the loop condition depends only on variables shared
by all the existing groups, the semantics of FORK guarantees that the loop is executed
synchronously throughout those groups.
Hence, the results computed in round i are available to all groups in round i + 1. The
general rule by which every processor (and hence also the programmer) can determine
the largest surrounding group within which it runs synchronously is described in detail in
Section 3.4.3.
2.5 How to solve read and write conflicts
So far we have explained the activation of processors and the generation of subgroups. We
left open what happens when several processors access the same shared variable synchronously.
In this case, failure or success and the effect of the succeeding access are determined
according to an initially fixed regime for solving access conflicts. Most PRAM models
allow common read operations. However, we do not restrict ourselves to such a model.
The semantics of a FORK program may also be determined, e.g., w.r.t. an exclusive read
regime where a synchronous read access of more than one processor to the same shared
variable leads to program abortion. Also, several regimes for solving write conflicts are
possible. For example, we may fix a regime where common writes are allowed provided
all processors assign the same value to the shared variable. In this case, the for loop
in the example above is executed successfully, whereas if we fix a regime where common
writes are forbidden, a for loop with a shared loop parameter causes a failure of program
execution.
As another example, consider a regime where common writes are allowed, and the
result is determined according to the processor numbers of the involved processors, e.g.,
the processor with the smallest number wins. This regime works fine if the processors
only access shared data within the present leaf group. If processors synchronously write
to variables declared in a larger group g, they may solve the write conflict according to
their processor numbers relative to that group g. Consider the following example:
...                                          (1)
shared var A : array [0 .. N-1] of integer   (2)
...                                          (3)
start [0 .. 2N-1]                            (4)
  fork [0 .. 1]                              (5)
    @ = # div N                              (6)
    # = # mod N                              (7)
    A[#] := result(A,@,#)                    (8)
  endfork                                    (9)
endstart                                     (10)
...                                          (11)
The shared variable A is declared for some group g having, after line (4), processors
numbered 0, ..., 2N-1. For n ∈ {0, ..., N-1} there are processors p_n^0 and p_n^1 with processor
number n in the first and the second leaf group, respectively, that want to assign a value
to A[n] in line (8). This conflict is solved according to the processor numbers of p_n^0 and
p_n^1 relative to group g, i.e., n and N + n, respectively. Hence, in line (8), the result of
processor p_n^0 is stored in A[n].
Observe that this scheme fails if another start occurs inside the fork statement, because
the original processor numbers can no longer be determined. In this case program
execution fails.
In any case, the semantics of a FORK program is determined by the regime for solving
read and write conflicts. Hence, this paper gives the syntax and a scheme for
the semantics of FORK programs.
2.6 Input/Output in FORK
There are at least two natural choices for designing input and output facilities for FORK.
First, one may provide shared input/output facilities. These can be realized by (one-way
infinite) shared arrays. Within this framework, synchronous I/O is realized by synchronous
access to the corresponding array, where conflicts are solved according to the same regime
as with other accesses to shared variables (see Section 2.5).
As a second possibility, one may provide private input/output facilities. These can be
realized by equipping every logical processor with private input streams from which it can
read, and private output streams to which it can write. The latter is the straightforward
extension of pascal's I/O mechanism to the case of several logical processors.
Which of these choices is more suitable depends on the computing environment, e.g.,
the available hardware, the capabilities of an operating system, and the applications.
Therefore, input functions and output procedures are not described in this first draft
of FORK.
3 Syntax and semantics of FORK
In this section we give a description of the PRAM language FORK by a context-free
grammar. We introduce the context conditions (i.e., correct types of variables, use only of
variables that are declared, ...) as well as the semantics of the language in an informal
way. We assume the reader to be familiar with pascal or any similar language.
As usual, nonterminals are enclosed in angle brackets, and we use them also for
denoting arbitrary elements of the corresponding syntactical category, e.g., an arbitrary
statement may be addressed by <statement>. The empty word is denoted by ε.
Programs are executed by processors which are organized in a group hierarchy. For
each construct of FORK we have to explain how the execution of this construct affects
the group hierarchy, the synchronism among the processors, and the scopes of objects.
We use the following terminology. A group hierarchy H is a finite rooted tree whose
nodes are labeled by sets of processors. A node of H is called a group (in H). Assume G
is a group. A subgroup of G is a node in the subtree with root G. A leaf group of G is
a leaf of this subtree. A processor p is contained in a group G if it is contained in (the label
of) a leaf group of G.
In the sequel we define inductively the actual group hierarchy and the notion of a
maximally synchronous group w.r.t. this hierarchy. According to the inductive definition,
all processors in a maximally synchronous group are at the same program point. Also,
each leaf group is a subgroup of a maximally synchronous group. A group G is called
synchronous if it is a subgroup of a maximally synchronous group. If we loosely speak of a
synchronous group G executing, e.g., a statement, we always mean that all the processors
within G synchronously execute this statement.
3.1 Programs and blocks
A program consists of a program name and a block. This is expressed by the following
rule of the grammar:
<program> → program <name> <block> .
Initially, the group hierarchy H consists just of one group numbered 0 which contains
a single processor also numbered 0. This group is maximally synchronous. At the end of
program execution we again have this hierarchy H.
A block contains the declarations of constants, data types, variables, and procedures
and a sequence of statements that should be executed. These statements may only use
constants, variables, and procedures according to pascal's scoping rules. A block is given
by the following rule:
<block> → begin <decls> <stats> end
A block is syntactically correct if the declarations <decls> and the statements <stats>
are both syntactically correct. Declaration sequences and statement sequences are executed by maximally synchronous groups (w.r.t. the actual group hierarchy). In the sequel
G always denotes such a maximally synchronous group w.r.t. the actual group hierarchy
and H the subtree with root G. The execution of declarations and statements may change
the group hierarchy and the synchronism, but only within the subtree H. Therefore,
we only describe these changes.
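As a small illustration of these two rules (the program name and its body are our own example, modeled on the start example of Section 2.1), a complete FORK program might look as follows:
program roots
begin
  shared const N = 100
  shared var A : array [1 .. N] of integer
  start [1..N]
    A[#] := #
  endstart
end .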
3.2 Declarations
A sequence of declarations is either a single declaration or a single declaration followed by
a sequence of declarations. A sequence of declarations is (syntactically) correct if all of its
declarations are correct and inside the sequence no name is declared twice. A sequence of
declarations is constructed according to the following grammar rule:
<decls> → ε
        → <declaration>
        → <declaration> <decls>
Assume G executes a declaration sequence
<declaration> <decls> .
Then G first executes the declaration <declaration>. During the execution of
<declaration> the synchronism among the executing processors and the group hierarchy
with root G may change. However, at the end of <declaration>, H is reestablished,
and G is maximally synchronous again w.r.t. the actual group hierarchy. Now G starts
executing <decls>. In contrast to pascal, declaration sequences have to be executed,
since the declared objects cannot be determined before program execution (see, e.g., Section 3.2.1).
A declaration is either empty or it is a constant declaration, a type declaration,
a variable declaration, a procedure declaration, or a function declaration.
Constant and variable declarations also determine whether the newly created objects
are private or shared. For this we have the grammar rule:
<access> → private
         → shared
If an object is declared private, a distinct instance of this object is created for every
processor executing the declaration. If an object is declared shared, each leaf group executing this declaration receives a distinct instance of the object, which is accessible by all
the processors of this leaf group.
3.2.1 Constant declarations
A constant declaration is of the form
<declaration> → <access> const <name> = <expr>
Thus a constant declaration specifies the name of the constant to be declared and its
value. Moreover, it specifies whether this constant should be treated as a private or as a
shared constant. The type of the constant <name> is the type of the expression <expr>.
Contrary to constant declarations in pascal, it is not (always) possible to determine
the value of a constant at compile time. The difference from variables, however, is that the
value of a constant, once determined, cannot be changed any more. Therefore, constants are
typically used for declaring types, e.g., for index ranges of arrays.
A constant declaration is syntactically correct if the expression <expr> on the right-hand side is a syntactically correct expression.
The way in which the value of a constant is determined is analogous to the method of
determining the value which a variable receives during an assignment (see Section 3.4.2).
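For illustration (a fragment of our own, assuming N is an already declared shared constant), a shared and a private constant might be declared as follows; since # is private, the second declaration necessarily yields a private constant:
shared const M = 2 * N              = one value per leaf group, fixed only at run time =
private const parity = # mod 2      = may differ from processor to processor =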
3.2.2 Type declarations
A type declaration is of the form
<declaration> → type <name> = <type expr>
Here <type expr> is either a basic type (such as integer, real, boolean, or char), a name
denoting a type (i.e., this name has been previously declared as a type), or an "expression"
for defining array and record types. The precise form of a type expression <type expr>
and its semantics will be described in Section 3.3.
A type declaration is syntactically correct if the type expression <type expr> is. If
<type expr> depends on private data (such as the processor number # or names that are
declared as private constants) or private types, then we treat the new type <name> as
private. This has the effect that we are not allowed to declare any shared variable of type
<name>. If <type expr> does not involve any private data or types, then the new
type <name> is treated as a shared type. In this case we are allowed to declare both
shared and private variables of type <name>.
The semantics of a type declaration is similar to that of a constant declaration. If
<type expr> denotes a private type, then all processors of G evaluate <type expr> to a
type value describing the access to the components of an object of this type. These type
values may be different for different processors. Now each processor associates the type
name <name> with its type value and a flag marking this type name as a private type.
If <type expr> does not depend on any private data or types, then all processors
of G evaluate <type expr> to a type value that may depend only on the leaf group to
which the processor belongs. This means that all processors inside the same leaf group
evaluate <type expr> to the same type value. Now each leaf group associates the type
name <name> with its type value and marks it as a shared type of this leaf group.
Note that it is possible to check statically (i.e., before the run of the program) whether
a type used by the program is private or shared. But usually it is impossible to evaluate
a type expression before running the program because type expressions can depend on
input data. This enables the use of dynamic arrays.
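As a small example (our own, again assuming a shared constant N), the first declaration below introduces a shared type, whereas the second one is private because its index range depends on #; consequently, no shared variable of type local may be declared:
type row = array [1 .. N] of integer        = shared type: depends only on shared data =
type local = array [0 .. #] of integer      = private type: depends on # =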
3.2.3 Variable declaration
A variable declaration is of the form
<declaration> → <access> var <name> : <type expr>
A private variable declaration is syntactically correct if <type expr> is a syntactically
correct type expression. Every processor p of G creates a new distinct instance of this
variable and marks it as private w.r.t. p.
A shared variable declaration is syntactically correct if <type expr> is a syntactically
correct shared type expression. This means that <type expr> may not involve any private
data or types. In this case, each leaf group g of G creates a new distinct instance of this
variable and marks it as shared relative to group g. Note that in the case of a shared
variable declaration, all processors inside the same leaf group evaluate <type expr> to
the same type value.
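Using the types row and local from the sketch in Section 3.2.2 above (our own names), variables of both kinds could then be declared as follows:
shared var data : row           = one instance per leaf group =
private var scratch : local     = one instance per processor =
private var count : integer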
3.2.4 Procedure and function declarations
Procedure and function declarations are very similar to pascal's procedure and function
declarations. They are constructed according to the following grammar rule:
<declaration> → forward procedure <name>(<formal par list>)
              → forward <access> function <name>(<formal par list>) : <simple type>
              → procedure <name>(<formal par list>) <block>
              → <access> function <name>(<formal par list>) : <simple type> <block>
              → procedure <name> <block>
              → function <name> <block>
Forward declarations are used to construct mutually recursive functions or procedures.
If there is a forward declaration of the form
forward procedure <name>(<formal par list>)
then the declaration sequence must contain the explicit declaration of procedure <name>.
This means there has to be a declaration of the form
procedure <name> <block>
Note that we do not repeat the list of formal parameters in the explicit declaration of a
procedure that has been declared forward previously.
If a declaration sequence contains a declaration of the form
procedure <name> <block>
then this procedure declaration must be preceded by a forward declaration of name
<name> as a procedure. Forward function declarations are treated analogously.
The remaining context conditions of procedure and function declarations are the same
as for procedure and function declarations in pascal. Only the formal parameters and
the return values of functions are treated differently:
Our language uses const parameters instead of pascal's value parameters.
Every formal parameter has to be declared as private or as shared.
In a function declaration we have to specify whether the return value should be
treated as a private or as a shared value. If it is declared shared then it is shared
relative to the leaf group which called the function.
A formal parameter is of the form:
<formal par> → <access> const <name> : <simple type>
             → <access> var <name> : <simple type>
where <simple type> is either a basic type or the name of a previously declared type.
A formal parameter list is of the form:
<formal par list> → ε
                  → <formal par>
                  → <formal par list> <formal par>
Inside a formal parameter list no name may be declared twice.
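The following sketch (procedure names and bodies are our own) illustrates the use of forward for two mutually recursive procedures; note that the formal parameter list of even is not repeated in its explicit declaration:
forward procedure even(private const n : integer)
procedure odd(private const n : integer)
begin
  if n > 0 then even(n - 1) endif
end
procedure even
begin
  if n > 0 then odd(n - 1) endif
end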
3.3 Type expressions
Type expressions are similar to pascal's type expressions. A type expression is a basic
type (such as integer, real, ...), a type name, or an array or record type. This is expressed
by the following grammar rules:
<basic type> → integer | real | boolean | char
<simple type> → <basic type> | <name>
Here <name> is the name of a previously defined type.
<type expr> → <simple type>
            → array [<range list>] of <type expr>
            → record <record list> end
<range list> → <const range>
             → <range list> , <const range>
<const range> → <expr> .. <expr>
Here both expressions <expr> are constant expressions of type integer. This means that
they may not involve any variables or user-defined functions.
<record list> → <record item>
              → <record list> <record item>
<record item> → <name> : <type expr>
The semantics of type expressions is the straightforward extension of pascal's semantics
of type expressions. Observe that in contrast to pascal we are able to declare dynamic
arrays. Also, # and @ are allowed in type expressions.
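For example (our own fragment, with N a shared constant), a record type and a dynamic array over it could be written as follows:
type entry = record
               key : integer
               val : real
             end
type table = array [0 .. 2*N-1] of entry    = bounds may be fixed only at run time =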
3.4 Statements
Our language supports 7 kinds of statements:
1. assignments
2. branching statements (if and case statement)
3. loop statements (while, repeat and for statement)
4. procedure calls
5. blocks
6. activation of new processors (start statement)
7. splitting of groups into subgroups (fork statement)
The statements of types 1 to 4 are similar to their counterparts in pascal. But due to the
fact that there are usually several processors executing such a statement synchronously,
there are some differences in the semantics.
The start statement allows the activation of new processors as they are needed. If an algorithm
needs a certain number of processors, these processors are activated via start. When
the algorithm has terminated, the new processors are deactivated and the computation
continues with the processors that were active before the start.
The fork statement does not change the number of active processors, but refines their
division into subgroups.
A sequence of statements is constructed according to the grammar rule
<stats> → <statement>
        → <statement> <stats>
Assume G is a maximally synchronous group executing a statement sequence
<statement> <stats> .
Then G first executes the statement <statement>. During the execution of <statement>
the synchronism among the executing processors and the group hierarchy H with root G
may change. However, at the end of <statement>, H is reestablished, and G is maximally
synchronous again w.r.t. the actual group hierarchy. Then G starts executing <stats>.
We now give the syntax and semantics of statements.
3.4.1 The empty statement
The empty statement is of the form
<statement> → ε
It has no effect on the semantics of a program.
3.4.2 The assignment statement
The assignment statement is of the form
<statement> → <expr> := <expr>
An assignment is syntactically correct if the following conditions hold:
The expression <expr> on the left-hand side denotes a variable of type t or the
name of a function returning a value of type t. In the latter case the assignment has
to occur in the statement sequence of that function.
The expression on the right-hand side denotes a value of type t.
First all processors of G synchronously evaluate the two expressions (see Section 3.5).
Each processor of G evaluates the left-hand side expression to a private or shared variable.
Then the processors of G synchronously bind the value of the right-hand side expression
to this variable. The effect of this is defined as follows:
1. If the variable is private, then after the assignment it contains the value written by
the processor.
2. If the variable is shared, then the regime for solving write conflicts (see Section 2.5)
determines the success or failure of the assignment and, in case of success, the value
to which the variable is bound.
Example Assume that four processors numbered 0 to 3 synchronously execute the
assignment
x := @ + #
where x is a variable of type integer. The value of x after the assignment depends on
whether x is declared as private or as shared. We list some cases below.
1. x is a private variable. In this case there are four distinct incarnations of x. Then
after the assignment x contains, for each processor, the value of the expression @ + #.
2. The four processors form a leaf group and the variable x is shared relative to that
leaf group. In this case the four processors write to the same variable, and this write
conflict is solved according to the regime for solving write conflicts (see Section 2.5).
3. The four processors form two different leaf groups (i.e., processors 0 and 1 are in the
first one, 2 and 3 in the second one). Each leaf group has a distinct instance of the
variable x. Then processors 0 and 1 write to the same variable, and so do processors
2 and 3. The regime for solving write conflicts determines the value of the (two
distinct) variables x after the assignment.
Above we have described the assignment of basic values. Our language supports assignment of structured values (e.g., a := b, where a and b are arrays of the same type),
which is carried out component by component synchronously.
3.4.3 The if statement
The if statement is constructed according to the following grammar rules:
<statement> → if <expr> then <stats> <else part>
<else part> → endif
            → else <stats> endif
The if statement is syntactically correct if <expr> denotes an expression of type boolean
and <stats> is a syntactically correct sequence of statements.
All processors of G first evaluate the expression <expr> synchronously. Depending on
the result of this evaluation, the processors synchronously execute the statements of the
then or of the else part, respectively. Since different processors may evaluate <expr> to
different values, we cannot make sure that all of them continue to work synchronously. The
group hierarchies and the new maximally synchronous groups executing the then and the
else part (if present), respectively, are determined according to the constants, variables, and
functions on which the expression <expr> "depends". Precisely, we say <expr> depends
on a variable x if x occurs in <expr> outside any actual parameter of a function call. The
case of constants and functions is analogous. We have to treat two cases:
1. The expression does not depend on any private variables, constants, or functions.
In this case we do not change the group hierarchy H. Only the synchronism among
the processors may change. Consider a processor p of G. We choose the maximal
group g_p of H which satisfies the following conditions:
g_p is a subgroup (not necessarily a proper one) of G.
p is contained in g_p.
The condition <expr> does not depend on any shared variables or other shared
data relative to a proper subgroup of g_p. Note that the return value of a shared
function is a shared datum only relative to a leaf group (see Section 3.5).
Under these conditions all processors in g_p evaluate <expr> to the same value.
Therefore all these processors choose the same branch of the if statement and hence
are at the same program point.
Note that a processor outside of g_p runs asynchronously with the processor p even
if it evaluates the expression <expr> to the same value.
Example Assume that during program execution we have obtained the following
group hierarchy:
                G0
              /    \
           G1        G2
          /  \      /  \
        G3    G4  G5    G6
and that all processors synchronously execute the if statement
if x = 5 then S1 else S2 endif
where an instance of x is a shared variable relative to G1, and another instance
of x is a shared variable relative to G2. Then the processors of group G3 work
synchronously with the processors of group G4, and the same holds for groups G5
and G6, respectively. The processors of group G4 work asynchronously with the
processors of group G6, even if the two instances of variable x contain the same
value.
When a processor of G has finished the execution of <stats>, it waits until the other
processors of G have finished their statement sequences. When all processors of
G have arrived at the end of the if statement, G becomes maximally synchronous
again. This means that even if two processors of G work asynchronously inside the
if statement, they become synchronous again after the if statement.
2. The expression depends on private variables, constants, or functions.
In this case we cannot even be certain that all processors inside the same leaf group
evaluate <expr> to the same value. In this case both the group hierarchy H and
the synchronism are changed as follows.
Each leaf group of G generates two new leaf groups: the first one contains all processors
of the leaf group that evaluate <expr> to true, and the second one contains
the rest. All new leaf groups obtain the group number of their father group. The
processors inside the same new subgroup work synchronously, while the processors
of different subgroups work asynchronously.
Again, when a processor reaches the end of the if statement, it waits until the
other processors reach this point. When all the newly generated leaf groups have
terminated the execution of the if statement, i.e., when all processors have reached
endif, the leaf groups are removed. The original group hierarchy H is reestablished
and G is again maximally synchronous.
3.4.4 The case statement
A case statement is constructed according to the grammar rules
<statement> → case <expr> of <case list> <end case>
<case list> → <case item>
            → <case list> <case item>
<end case> → endcase
           → else <stats> endcase
<case item> → <expr> : <stats>
            → <range> : <stats>
<range> → <expr> .. <expr>
A case statement is syntactically correct if the expression <expr> is of type integer.
Moreover, in the list <case list> of case items all expressions have to be expressions of
type integer.
The semantics of the case statement is similar to the semantics of pascal's case statement.
The group hierarchy and the synchronism among the processors executing the
different branches of the case statement are determined in the same fashion as for the if
statement (see Section 3.4.3).
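For instance (op0, op1, and op2 stand for arbitrary previously declared procedures), the following case statement branches on the private value # mod 4 and therefore splits every leaf group into new leaf groups, exactly as an if statement on private data would:
case # mod 4 of
  0      : op0(#)
  1 .. 2 : op1(#)
else
  op2(#)
endcase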
3.4.5 The while statement
The while statement is of the form
<statement> → while <expr> do <stats> enddo
It is syntactically correct if <expr> is an expression of type boolean and <stats> is a
syntactically correct sequence of statements.
All processors of G first evaluate <expr> to a boolean value. Those processors that
evaluate <expr> to true start to execute the statements <stats>. The others wait until
all processors have finished executing the while statement. The group hierarchy and the
synchronism among the processors executing <stats> are determined in the same way as
for the if statement (see Section 3.4.3). When the processors have finished the execution
of <stats>, they again try to execute the while statement. Those that evaluate <expr>
to true will again execute <stats>, while the others have terminated the while statement.
Thus the execution of the while statement
while <expr> do <stats> enddo
is semantically equivalent to the execution of
if <expr> then
  <stats>
  while <expr> do <stats> enddo
endif
3.4.6 The repeat statement
The repeat statement is of the form
<statement> → repeat <stats> until <expr>
It is syntactically correct if <stats> is a syntactically correct sequence of statements
and <expr> is an expression of type boolean.
The repeat statement
repeat <stats> until <expr>
is just an abbreviation for the sequence of statements
<stats>
while not (<expr>) do <stats> enddo
3.4.7 The for statement
The for statement is of the form
<statement> → for <name> := <expr> to <expr> <step> do <stats> enddo
<step> → ε
       → step <expr>
It is syntactically correct if
1. <name> is a variable of type integer,
2. the expressions <expr> are of type integer,
3. inside <stats> there are no assignments to <name>, nor is <name> used as a
var parameter in procedure and function calls. This means that inside <stats>,
<name> is treated like a constant of type integer.
We give the semantics of the for statement by reducing a for statement to a while loop,
or more precisely: we rewrite a program that uses n ≥ 1 for statements into a program
that contains only n − 1 ≥ 0 for statements. By iterating this process we are able to
eliminate all for loops. Observe that, in contrast with the simulation of for by while in a
sequential context, we have to be careful about where auxiliary variables are declared.
1. The statement
   for <name> := <expr>1 to <expr>2 do
     <stats>
   enddo
   is an abbreviation for
   <name> := <expr>1
   <name>2 := <expr>2
   while <name> <= <name>2 do
     <stats>
     <name> := <name> + 1
   enddo
2. The statement
   for <name> := <expr>1 to <expr>2 step <expr>3 do
     <stats>
   enddo
   is an abbreviation for
   <name> := <expr>1
   <name>2 := <expr>2
   <name>3 := <expr>3
   if <name>3 > 0 then
     while <name> <= <name>2 do
       <stats>
       <name> := <name> + <name>3
     enddo
   else
     while <name> >= <name>2 do
       <stats>
       <name> := <name> + <name>3
     enddo
   endif
Here <name>2 and <name>3 denote variables of type integer such that the following
holds:
The names are new, i.e. these names are not used anywhere in the program containing
the for statement.
In the simulating program these names are declared in the same declaration sequence
as the index <name>.
If <name> is declared to be private (shared), then <name>2 and <name>3 are
declared private (shared).
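As a small example of the step variant (our own fragment; i is a private integer variable, A a private array, and N a shared constant), the following loop runs downwards through every second index; by rule 2 above it is expanded into the second while loop, since the step is negative:
for i := N to 1 step -2 do
  A[i] := 0
enddo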
3.4.8 Procedure calls
Syntax and semantics of procedure calls are similar to those of pascal. There are some
slight differences due to the fact that processors can access two different kinds of objects:
shared and private objects. This imposes some restrictions on the actual parameters of a
procedure call.
The syntax of a procedure call is given by the following grammar rules:
<statement> → <name> ( <expr list> )
<expr list> → ε
            → <expr>
            → <expr list> , <expr>
A procedure call is syntactically correct if the following conditions hold:
<name> has to be the name of a procedure declared previously.
The list of actual parameters (<expr list>) and the list of formal parameters have to
match, i.e.,
1. Both lists have the same length.
2. Corresponding actual and formal parameters are of the same type.
3. If a formal parameter is a shared-var-parameter, then the corresponding actual
parameter has to be a variable which does not depend on any private object;
e.g., ar[#] is not allowed as an actual shared var parameter even if ar is a shared
array.
4. If a formal parameter is a private-var-parameter, then the corresponding actual
parameter can be a private or a shared variable.
5. If a formal parameter is a shared-const-parameter, then the corresponding actual
parameter may denote a private or a shared value.
6. If a formal parameter is a private-const-parameter, then the corresponding
actual parameter may denote a private or a shared value.
The case where a formal shared-var-parameter is bound to a private variable is explicitly
excluded, since it allows some processor to modify the contents of this private variable,
which may belong to a different processor. In contrast, in the case of a formal shared-const-parameter,
the value of that formal parameter during the execution of the procedure
is determined by the regime for solving write conflicts. This is done in the same way as
when determining the value of a shared variable after the assignment of a private value
(see Section 3.4.2).
Assume that the maximally synchronous group G executes a procedure call
<name>(<expr>1, ..., <expr>n).
First the actual parameters <expr>1, ..., <expr>n are synchronously evaluated and bound to the corresponding formal parameters from left to right. Then G synchronously executes the procedure block, i.e., the declarations and the statements of that procedure.
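As a small illustration (the procedure name update and the variable counter are our own inventions and not part of the language), suppose update has been declared with a shared var parameter followed by a private const parameter. Then the call

update(counter, # + 1)

is legal provided counter is a shared variable: counter is bound to the shared var parameter, and the private value # + 1 is acceptable for the private const parameter. Passing ar[#] in the first position, by contrast, would violate condition 3 above.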
3.4.9 Blocks as statements
<statement> → <block>
A statement which is a block enables the declaration of new objects that are valid only
inside <block>.
3.4.10 The start statement
The start statement is used for activating new processors. The syntax is given by the rule:
<statement> → start [<range>] <stats> endstart
The start statement is syntactically correct if the expressions of <range> are of type integer and do not depend on any private data. Moreover, <stats> may not use any private types, variables or constants declared outside the start statement, with the exception of @ and #.
When the processors of G execute the statement
start [<expr>1 .. <expr>2] <stats> endstart
they first evaluate the range <expr>1 .. <expr>2. Since this range does not depend on any private data, this gives for all processors in the same leaf group g the same range vg1 .. vg2. At every leaf group g of G a new leaf group is added which contains vg2 - vg1 + 1 new processors, numbered with the elements of {vg1, ..., vg2}. The group numbers of the new leaf groups are the same as for their father groups. G remains maximally synchronous. Now G executes <stats>, which means that the new processors of the new leaf groups execute <stats> synchronously. When G reaches endstart, the new leaf groups are removed, i.e., the original hierarchy H is reestablished.
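For illustration (a schematic fragment of our own; a is assumed to be a suitably sized shared array declared outside the start statement), a group may activate 100 new processors and let each of them initialize one array element:

start [0 .. 99]
  a[#] := #
endstart

Within the start statement the new processors are numbered 0, ..., 99; each writes its own number into a. When endstart is reached, these processors disappear again and the original group hierarchy is restored.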
3.4.11 The fork statement
The fork statement is used to generate several new leaf groups explicitly. The new leaf
groups obtain new group numbers and the processors inside the new leaf groups are renumbered.
The syntax of the fork statement is given by the following grammar rules:
<statement> → fork [<range>] <new values> <stats> endfork
<new values> → <new group> <new proc>
             → <new proc> <new group>
<new group> → @ = <expr>
<new proc> → # = <expr>
The fork statement is syntactically correct if the expressions <expr> are of type integer and the expressions of <range> do not depend on any private data.
Assume the processors of G execute the statement
fork [<expr>1 .. <expr>2]
  @ = <expr>3  # = <expr>4
  <stats>
endfork
First each processor p of G evaluates the expressions <expr>1, ..., <expr>4. Since the expressions <expr>1 and <expr>2 do not depend on any private data, all processors within the same leaf group g of G evaluate <expr>1 and <expr>2 to the same values vg1 and vg2, respectively. At every leaf group g of G, vg2 - vg1 + 1 new leaf groups are added, which are numbered with the elements of {vg1, ..., vg2}. The new leaf group with number i is labeled by the subset of those processors of g which evaluate <expr>3 to i. Each of these processors obtains the value of <expr>4 as its new processor number. G remains maximally synchronous and executes <stats>. When G reaches endfork, the new leaf groups are removed, i.e., the original group hierarchy is reestablished.
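As a hedged example of our own (assuming that the Pascal-like operators mod and div are available in FORK expressions), a leaf group whose processors are numbered 0, ..., p - 1 can split into two new leaf groups, one containing the even-numbered and one containing the odd-numbered processors, with the processor numbers compacted inside each new group:

fork [0 .. 1]
  @ = # mod 2  # = # div 2
  <stats>
endfork

Processors with an even old number form the new leaf group with group number 0, those with an odd old number form the group with number 1, and within each new group the processors are renumbered 0, 1, 2, ...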
3.5 Expressions
Syntax and semantics of expressions are similar to those of Pascal. There are just two new predefined constants: the processor number # and the group number @, both of which are private constants of type integer. The context conditions for function calls are analogous to those for procedure calls (see Section 3.4.8). Every processor of the maximally synchronous group G evaluates the expression and returns a value. The return value of a shared function is determined according to the chosen regime for solving write conflicts, separately for each
leaf group of G and is treated as a shared object of this leaf group. Note that the evaluation of expressions may cause read conflicts, which are solved according to the chosen regime for solving read conflicts.
Example
...
shared var a: array[1 .. 10] of integer
...
... := a[3] + 3
...
Determining the variables corresponding to the subexpression a[3] does not cause a read conflict, whereas determining the values of these variables may cause read conflicts.
4 Implementation
In this section we sketch some ideas showing that programs of FORK can not only be
translated to semantically equivalent PRAM code, but also to code that runs efficiently. These considerations are intended to be useful both to theoreticians and to compiler writers, who may have different realizations of PRAMs available, possibly without powerful
operating systems for memory management and processor allocation.
Our basic idea for compiling FORK is to extend the usual stack-based implementation
of recursive procedure calls of, e.g., the P-machine [16] by a corresponding regime for the
shared data structures, a synchronization mechanism, and a management of group and
processor numbers. Hence, here we address only the following issues:
creating new subgroups
synchronization
starting new processors with start.
4.1 Creating new subgroups
The variables which are shared relative to some group have to be placed into some portion of the shared memory of the PRAM, which is reserved for this group. Therefore,
the crucial point in creating new subgroups is the question of how the subgroups obtain
distinct portions of shared memory. There are (at least) two ways to do this with little
computational overhead:
1. by address arithmetic, as suggested in [5];
2. taking into account that in practice the available shared memory is always finite, by equally subdividing the remaining free space among the newly created subgroups.
The first method corresponds to an addressing scheme where the remaining storage is viewed as a two-dimensional matrix. Its rows are indexed by the group numbers, whereas the column index gives the address of a storage cell relative to a given group. For the second method the roles of rows and columns are simply exchanged. In both cases splitting
into subgroups can be executed in constant time. Also, the addresses in the physical
shared memory can be computed from the (virtual) addresses corresponding to the shared
memory of a subgroup in constant time. This memory allocation scheme is well suited to
group hierarchies with balanced space requirements. It may lead to an exponential waste
of space in other cases. Consider the following while loop, whose condition depends on
private variables:
while cond(#) do    (1)
  work(#)           (2)
enddo               (3)
Whenever the group of synchronously working processors executes line (1), it is subdivided into two groups, one consisting of the processors that no longer satisfy cond(#), and one consisting of the remaining processors. Hence, the first group needs no shared memory space (besides perhaps some constant amount for organizational reasons); a fair subdivision into two equal portions would unnecessarily halve the space available to the second group for executing work(#) in line (2).
However, there is an immediate optimization of the above storage distribution scheme: we attach only a fixed constant amount of space to groups of which it is known at compile time that they do not need new shared memory space, and subdivide the remaining space equally among the other subgroups.
This optimization can clearly be performed automatically for loops as in the given example, but also for one-sided ifs, or ifs where one alternative does not involve blocks with a non-empty declaration part.
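To make the two schemes concrete, here is a sketch in our own notation (the formulas below are our illustration, not taken from the paper): suppose the free shared memory reserved for a group starts at physical address B, the group creates k subgroups numbered 0, ..., k - 1, and each subgroup is granted C cells. Under the row-per-group view, the cell with relative address a in the shared memory of subgroup i can be placed at physical address B + i * C + a; with rows and columns exchanged, it can be placed at B + a * k + i. Either mapping needs only a constant number of arithmetic operations, which is why both splitting into subgroups and translating virtual into physical addresses take constant time.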
4.2 Synchronization
In order to reestablish a group g, the runtime system has to determine when all the subgroups of group g have finished. This is the termination detection problem for subgroups.
If all processors run synchronously, no explicit synchronization is necessary. In the
general case where the subgroups of group g run asynchronously, there are the following
possibilities for implementing termination detection:
1. Use of special hardware support such as a unit-time fetch&add operation, which allows the processors within a group to simultaneously add an integer to a shared variable.
2. Static analysis (possibly assisted by user annotation); most of the PRAM algorithms published in the literature are of such a simple and regular structure that the relative times of execution sequences can be determined in this way.
3. Use of a termination detection algorithm at runtime. The latter is always possible; however, complicated programs that defeat a static analyzer will be punished by an extra loss of efficiency.
4.3 Starting new processors
The following method, which is analogous to the storage distribution scheme, works only for concurrent-read machines with a finite number of processors. Before running the program, all processors are started, all of them with processor number # = 0. If in a subsequent start statement of the program fewer processors are started than are physically available in the leaf group executing this statement, then several physical processors may remain "identical", i.e., receive the same new processor number. These identical processors elect
a "leader". All of them execute the program, but only the leader is allowed to perform write operations to shared variables. Consider the following example. Assume that we are given 512 physical processors.
start [0 .. 127]          (1)
  if # < 64 then          (2)
    start [0 .. 212]      (3)
      compute(212)        (4)
    endstart              (5)
  endif                   (6)
endstart                  (7)
Before line (1), all 512 physical processors are started. After line (1), there are always four physical processors having the same processor number #. Having evaluated the condition in line (2), all the processors whose processor number is less than 64 enter the then part: these are 256 physical processors. All of them are available for the start instruction of line (3), where they receive the new numbers 0, ..., 212. Two physical processors are assigned to each of the first 43 logical processors, whereas one physical processor is assigned to each of the remaining 170 logical processors. The original processor identities are put onto the private system stack. When the endstart in line (5) is reached, the processors reestablish their former processor numbers.
If more processors are started than are physically available in the present group, then every
processor within that group has to simulate an appropriate subset of the newly started
processors.
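For instance (our own numbers, in the spirit of the example above), if only 64 physical processors execute start [0 .. 255], each physical processor has to simulate four of the 256 logical processors activated by the statement.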
In both cases start can be executed in constant time by every leaf group consisting of a
contiguous interval of processors; this is the case, e.g., for starts occurring in the statement sequence of the top-level block. Thus, a programming style is encouraged in which the logical
processors necessary for program execution are either started at the beginning, i.e., before
splitting the initial group into subgroups, or are started in a balanced way by contiguous
groups.
To exploit the resources of the given PRAM architecture as fully as possible, a programmer may wish to write programs that use different algorithms for different numbers of physically available processors. Therefore, a (shared) system constant of type integer should be provided whose value is the number of physical processors available on the given PRAM. This allows programs to adapt themselves to the underlying hardware.
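A hedged sketch of how such a constant might be used (the constant name NUMPROC, the bound n * n and the two procedures are our own inventions; the paper merely postulates that some constant of this kind should exist):

if NUMPROC >= n * n then
  fastsort(a, n)
else
  modestsort(a, n)
endif

Here fastsort stands for an algorithm that pays off only when at least n * n processors are physically available, while modestsort makes do with fewer.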
References
[1] F. Abolhassan, J. Keller, and W.J. Paul. Überblick über PRAM-Simulationen und ihre Realisierbarkeit. In Proceedings der Tagung der SFBs 124 und 182 in Dagstuhl, Sept. 1990, Informatik Fachberichte. Springer Verlag, to appear.
[2] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts, 1974.
[3] S.G. Akl. The Design and Analysis of Parallel Algorithms. Prentice Hall, 1989.
[4] Y. Ben-Asher, D.G. Feitelson, and L. Rudolph. ParC - an extension of C for shared memory parallel processing. Technical report, The Hebrew University of Jerusalem, 1990.
[5] P.C.P. Bhatt, K. Diks, T. Hagerup, V.C. Prasad, S. Saxena, and T. Radzik. Improved deterministic parallel integer sorting. Information and Computation, to appear.
[6] A. Borodin and J.E. Hopcroft. Routing, merging and sorting on parallel models of computation. J. Comp. Sys. Sci. 30, pages 130-145, 1985.
[7] M. Dietzfelbinger and F. Meyer auf der Heide (ed.). Das GATT-Manual. In: Analyse paralleler Algorithmen unter dem Aspekt der Implementierbarkeit auf verschiedenen parallelen Rechenmodellen. Technical report, Universität Dortmund, 1989.
[8] F.E. Fich, P. Ragde, and A. Wigderson. Simulations among concurrent-write PRAMs. Algorithmica 3, pages 43-51, 1988.
[9] S. Fortune and J. Wyllie. Parallelism in random access machines. In 10th ACM Symposium on Theory of Computing, pages 114-118, 1978.
[10] N.H. Gehani and W.D. Roome. Concurrent C. In N.H. Gehani and A.D. McGettrick, editors, Concurrent Programming, pages 112-141. Addison-Wesley, 1988.
[11] A. Gibbons and W. Rytter. Efficient Parallel Algorithms. Cambridge University Press, 1988.
[12] P.B. Hansen. The programming language Concurrent Pascal. IEEE Transactions on Software Engineering 1(2), pages 199-207, June 1975.
[13] H.F. Jordan. Structuring parallel algorithms in a MIMD, shared memory environment. Parallel Comp. 3, pages 93-110, 1986.
[14] Inmos Ltd. OCCAM Programming Manual. Prentice Hall, New Jersey, 1984.
[15] United States Department of Defense. Reference Manual for the Ada Programming Language. ANSI/MIL-STD-1815A-1983.
[16] St. Pemberton and M. Daniels. Pascal Implementation: The P4 Compiler. Ellis Horwood, 1982.