A Benes Network Control Algorithm for Frequently Used Permutations

IEEE TRANSACTIONS ON COMPUTERS, VOL. C-27, NO. 7, JULY 1978
Parallel Permutations of Data: A Benes Network Control Algorithm for Frequently Used Permutations

JACQUES LENFANT
Abstract-The Benes binary network can realize any one-to-one mapping of its $2^n$ inlets onto its $2^n$ outlets. Several authors have proposed algorithms which compute control patterns for this network from any bijection assignment. However, these algorithms are both time-consuming and space-consuming. In order to meet the time constraints arising from the use of a Benes network as the alignment network of a parallel computer, another approach must be chosen. In this paper, we consider typical functions and show that the set of needed permutations of data is very small, as compared to the whole symmetric group. We gather frequently used bijections into five families. For each family we present an algorithm that can control the two-state switches on the fly, as the vector of data passes through the network. Finally, we describe one possible scheme to implement an instruction "Trigger a Frequently Used Bijection."

Index Terms-Alignment network, Benes network, Clos network, divide-and-conquer technique, memory-processor connection, parallel computer, permutation network, switching network.

Manuscript received February 2, 1977; revised October 6, 1977. This work was supported by the Service d'Orientation et de Synthese de la Recherche en Informatique (France) under Contract SESORI-IRIA76121.
The author is with the Institut de Recherche en Informatique et Systemes Aleatoires, University of Rennes, Rennes, France.
0018-9340/78/0700-0637$00.75 © 1978 IEEE

I. INTRODUCTION

For several years, there has been considerable interest in array computer architecture [28] which, according to Flynn's classification [8], is designated by the acronym SIMD, for Single-Instruction stream, Multiple-Data stream. These computer systems are characterized by a large number of identical processing elements (PE's) to which a single instruction is broadcast from a central control unit during each time unit. Only those PE's which are not inhibited as a result of previous operations execute the common instruction. In order to avoid access conflicts, the main storage is divided into memory blocks which are at least as numerous as the processing elements. Two types of connection between processing elements and memory blocks are depicted in Fig. 1. The first structure is appropriate only when the number N of processing elements is equal to the number M of memory blocks. Then each processing element is permanently linked to one (and only one) memory block. Communication between processing elements is established by means of the interconnection network, which can realize some permutations on the contents of registers. When several permutations are successively performed on the same vector of data, this structure is very efficient since the main storage, whose technology is usually slower than processing element technology, is not involved in the transfers. This design was chosen for most array computers which are currently in use or under development [1], [3], [11]. A slightly different approach is also illustrated by Fig. 1. If the routing network is used to connect memory blocks to processing elements, then it can cope with a number M of memory blocks greater than the number N of processing elements. This is an interesting possibility since, as shown by Lawrie [16], conflict-free access to rows, columns, diagonals, reverse diagonals, and blocks of a matrix cannot all be implemented if M = N, but they can if M = 2N.
The performance of the interconnection network is critical with respect to the overall performance of an array computer. Its design must be aimed at achieving low cost of hardware, high operating speed, large combinatorial power (i.e., a high percentage of realizable bijections of inputs onto outputs), and simplicity of control. In this paper, we describe a network that can realize any one-to-one connection between $N = 2^n$ inputs and $N$ outputs. For this network we shall propose algorithms that yield control patterns for the bijections frequently used in parallel processing. These selected bijections, referred to in the sequel as Frequently Used Bijections (FUB's), are grouped into five families. Within its family, a bijection is referenced by a small number of parameters. More precisely, the name of a FUB can be easily coded with a number of bits that is close to 2n. Consequently, the length of such a name is, in practice, smaller than the memory word length.
The next section is devoted to a description of FUB's and
to a survey of those interconnection networks which have
been previously designed for SIMD computers. In Section
III, we describe the Benes binary rearrangeable network [5]
for which we present control algorithms and a possible
implementation at hardware level.
II. PARALLEL ALGORITHMS AND INTERCONNECTION NETWORKS
A. Notation
Let $n$ be a strictly positive integer and denote by $E^{(n)}$ the set $\{0, 1, 2, \ldots, 2^n - 1\}$. The elements of $E^{(n)}$ will be considered as integers, as residue classes modulo $2^n$, or as $n$-dimensional vectors whose elements are either 0 or 1: the vector $(x_n, x_{n-1}, \ldots, x_2, x_1)$ is identified with the integer $x = \sum_{i=1}^{n} x_i 2^{i-1}$, $x_n$ being the most significant digit. The symbol $\oplus$ denotes the bit-per-bit EXCLUSIVE OR operation. A segment of size $2^i$ in $E^{(n)}$ is a subset of $E^{(n)}$ consisting of $2^i$ consecutive integers, of which the smallest is a multiple of $2^i$ ($0 \le i \le n$).

Fig. 1. Memory and processing elements in SIMD computers: two cases (processing elements PE and memory blocks MB linked through an interconnection network).
We denote by ,9(n) the set of permutations on E("). The
permutations that keep all the subsets {2j, 2j + 1} of E(")
globally invariant (O < j < 2n- 1) are of special interest in
this paper. They form a subgroup of the symmetric group
that is denoted by _on). Let h,j, 1, m, and k be five nonnegative
integers, k being less than 2". The permutations on E(") that
will be considered in the sequel are defined in Table I and
commented on in Sections II-B and C. In Table I, 'a and a2
are the subvectors (xn j, x"- - 1, , x 1) and (x",xn _ 1, ,
xn-j+ 1), respectively. In the same way, b4, b3, b2, b1 and c5,
C4, C3, C, I are bit substrings of x = (x", x"-, , x2, xl)
whose lengths are j, 1, m, n - (j + 1 + m) and j, 1, m, h,
n - (j + I + m + h), respectively. For instance, since
97 = (1, 1, 0, 0, 0, 0, 1), permutation I473)7,1,97 maps the
element
(X7 , X,
X3, X2 , X1)
0 1 7-4-0-1
X5, X4,
4 bits
of E(7) onto the element
(X7, X6, X5, X4
X2 X1,
X3)
where x, = xi ( 1 is the complement of xi.
B. Bijections Used in Parallel Processing
Five families of bijections are defined in Table I. Each of them was obtained by selecting a set of permutations with respect to their usefulness and by extending this set, if necessary, in order to meet a stability criterion. This stability criterion, which is explained in Section III, allows recursive algorithms to be developed for the control of a Benes binary rearrangeable network. Because of this extension, a family may include some permutations of little interest to parallel programming. Nevertheless, any permutation belonging to one of these five families will be referred to as a FUB. Most permutations which are necessary to implement previously published algorithms are FUB's. Typical applications are presented below.

TABLE I
THE FIVE FAMILIES OF PERMUTATIONS ON THE SET $E^{(n)} = \{0, 1, 2, \ldots, 2^n - 1\}$
($x = (x_n, x_{n-1}, \ldots, x_2, x_1) = (a_2, a_1) = (b_4, b_3, b_2, b_1) = (c_5, c_4, c_3, c_2, c_1)$)

PERMUTATIONS
  Perfect shuffle: $\sigma^{(n)}(x) = (x_{n-1}, \ldots, x_2, x_1, x_n)$
  Bit reversal: $\rho^{(n)}(x) = (x_1, x_2, \ldots, x_{n-1}, x_n)$
  Cyclic shift of amplitude $k$: $\tau^{(n)}_k(x) = x + k \bmod 2^n$

THE FIVE FAMILIES
  $\lambda^{(n)}_{j,k}$ ($j$ odd): $x \mapsto jx + k \bmod 2^n$
  $\delta^{(n)}_{j,k}$ ($j \le n$), cyclic shift within segments of size $2^{n-j}$: $(a_2, a_1) \mapsto (a_2, a_1 + k \bmod 2^{n-j})$
  $\alpha^{(n)}_{j,k}$ ($j \le n$), bit reversal within segments of size $2^{n-j}$: $(a_2, a_1) \mapsto (a_2, \rho^{(n-j)}(a_1)) \oplus k$
  $\beta^{(n)}_{j,l,m,k}$ ($j + l + m \le n$), segment shuffle: $(b_4, b_3, b_2, b_1) \mapsto (b_4, b_1, b_2, b_3) \oplus k$
  $\gamma^{(n)}_{j,l,m,h}$ ($j + l + m + h \le n$), segment shuffle: $(c_5, c_4, c_3, c_2, c_1) \mapsto (c_5, c_2, c_3, c_4, c_1)$

PERMUTATIONS IN SUBGROUP $\mathcal{X}^{(n)}$
  Identity: $I^{(n)}(x) = x$
  Exchange: $\varepsilon^{(n)}(x) = x + 1$ if $x$ is even, $x - 1$ if $x$ is odd; i.e., $\varepsilon^{(n)}(x) = (x_n, x_{n-1}, \ldots, x_2, \bar{x}_1)$
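As a cross-check on the definitions in Table I, the following minimal Python sketch implements several of the permutations on integers, taking bit $x_i$ of $x$ as `(x >> (i - 1)) & 1`. All function names are hypothetical, and the `alpha` function follows the reading of Table I given above; it illustrates the definitions and does not describe the paper's hardware.

```python
# Minimal sketch of several Table I permutations on E(n) = {0, ..., 2**n - 1}.
# Bit x_i of x is (x >> (i - 1)) & 1, so x_n is the most significant digit.

def sigma(n, x):
    """Perfect shuffle: (x_n, ..., x_2, x_1) -> (x_{n-1}, ..., x_1, x_n)."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)

def rho(n, x):
    """Bit reversal: (x_n, ..., x_1) -> (x_1, ..., x_n)."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y

def tau(n, k, x):
    """Cyclic shift of amplitude k."""
    return (x + k) % (1 << n)

def exchange(x):
    """Exchange: x + 1 if x is even, x - 1 if x is odd."""
    return x ^ 1

def lam(n, j, k, x):
    """lambda_{j,k}: x -> j*x + k mod 2**n, defined for odd j."""
    if j % 2 == 0:
        raise ValueError("j must be odd")
    return (j * x + k) % (1 << n)

def delta(n, j, k, x):
    """delta_{j,k}: cyclic shift by k inside each segment of size 2**(n - j)."""
    size = 1 << (n - j)
    return x - x % size + (x % size + k) % size

def alpha(n, j, k, x):
    """alpha_{j,k} (reading assumed from Table I): reverse the low n - j bits, then XOR k."""
    w = n - j
    low = x & ((1 << w) - 1)
    return (((x >> w) << w) | rho(w, low)) ^ k

def gamma(n, j, l, m, h, x):
    """gamma_{j,l,m,h}: (c5, c4, c3, c2, c1) -> (c5, c2, c3, c4, c1).

    Field widths are j, l, m, h and n - (j + l + m + h), most significant first.
    """
    r = n - (j + l + m + h)
    if r < 0:
        raise ValueError("need j + l + m + h <= n")
    c1 = x & ((1 << r) - 1)
    c2 = (x >> r) & ((1 << h) - 1)
    c3 = (x >> (r + h)) & ((1 << m) - 1)
    c4 = (x >> (r + h + m)) & ((1 << l) - 1)
    c5 = x >> (r + h + m + l)
    y = c5
    for field, width in ((c2, h), (c3, m), (c4, l), (c1, r)):
        y = (y << width) | field  # append fields in the order c2, c3, c4, c1
    return y

def is_bijection(n, f):
    """True if f permutes E(n)."""
    return sorted(f(x) for x in range(1 << n)) == list(range(1 << n))

# The perfect shuffle is the member gamma_{0,1,0,n-1} of the gamma family.
assert all(gamma(4, 0, 1, 0, 3, x) == sigma(4, x) for x in range(16))
```

For example, `rho(3, 1)` returns 4, since the vector (0, 0, 1) reversed is (1, 0, 0).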
1) The Family $\{\lambda^{(n)}_{j,k}\}$: If $j$ is an odd integer, $\lambda^{(n)}_{j,k}$ is the bijection that maps any $x$ in $E^{(n)}$ onto $jx + k \bmod 2^n$. Note that $\lambda^{(n)}_{1,k}$ is the cyclic shift of amplitude $k$ and is also denoted by $\tau^{(n)}_k$.
These permutations are used to implement parallel LOAD and STORE operations on matrices. In order to take advantage of the SIMD architecture, data and computations must be organized so that as many instructions as possible can fetch N words of data and process them simultaneously. If two of the data words to be fetched are in the same memory block, all the processors sit idle until the second memory cycle is completed. This can seriously degrade the performance of these computer systems. As far as numerical analysis applications are concerned [12], it is usual to handle matrices by rows, columns, diagonals, and blocks. Therefore, much attention has been paid to storage schemes which allow conflict-free access to these subarrays [6], [16], [22]. As in Illiac IV software [13], linear skewing schemes may be used: in an N-block memory, the element $a_{s,t}$ of an array A is assigned to memory block $us + vt \pmod N$. Then row access is conflict-free iff $v$ and $N$ are relatively prime (similar conditions hold for parallel access to other types of subarray). Fetching row $s_0$ consists of loading $a_{s_0,i}$ into the accumulator of processing element $i$ for all $i$, $0 \le i \le N - 1$. As element $a_{s_0,i}$ is stored in memory block $us_0 + vi$, the N-word vector delivered by the memory must be unscrambled [26]: that is, the word delivered by memory block $x$ ($0 \le x \le N - 1$) must be broadcast to accumulator $v^{-1}x - v^{-1}us_0$, where $v^{-1}$ denotes the inverse of $v$ in the ring of integers modulo N. This permutation, which is performed by the interconnection network, belongs to the family $\{\lambda^{(n)}_{j,k}\}$: here $n = \log_2 N$, $j = v^{-1}$ (odd, since $v$ is prime to $N = 2^n$), and $k = -v^{-1}us_0 \bmod N$. The same family is involved in accesses to other subarrays of interest: columns, diagonals, and blocks.
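The unscrambling step above can be checked numerically. The Python sketch below assumes illustrative values $N = 16$, $u = 3$, $v = 5$, $s_0 = 7$ (not taken from the paper) and hypothetical helper names, and confirms that $\lambda^{(n)}_{j,k}$ with $j = v^{-1}$ and $k = -v^{-1}us_0 \bmod N$ delivers $a_{s_0,i}$ to processing element $i$. It relies on the three-argument `pow(v, -1, N)`, available from Python 3.8.

```python
def block_of(s, t, u, v, N):
    """Memory block holding a[s][t] under the linear skewing scheme u*s + v*t mod N."""
    return (u * s + v * t) % N

def unscrambling_params(s0, u, v, N):
    """Return (j, k) so that x -> (j*x + k) mod N sends block x's word to its PE."""
    v_inv = pow(v, -1, N)  # modular inverse; raises ValueError unless gcd(v, N) == 1
    return v_inv, (-v_inv * u * s0) % N

def route(j, k, N, x):
    """The permutation lambda_{j,k} performed by the interconnection network."""
    return (j * x + k) % N

N, u, v, s0 = 16, 3, 5, 7  # illustrative parameters, not from the paper
j, k = unscrambling_params(s0, u, v, N)
row_blocks = [block_of(s0, i, u, v, N) for i in range(N)]  # where a[s0][i] is stored
assert len(set(row_blocks)) == N  # conflict-free: one word per memory block
assert all(route(j, k, N, row_blocks[i]) == i for i in range(N))  # a[s0][i] reaches PE i
```

Here $v^{-1} = 13$, so the network must realize $\lambda^{(4)}_{13,15}$.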
2) The Family $\{\delta^{(n)}_{j,k}\}$: The permutation $\delta^{(n)}_{0,k}$ is the cyclic shift of amplitude $k$. By $\delta^{(n)}_{1,k}$, a shift of $k$ places occurs within both the first and the second half of $E^{(n)}$ (see Fig. 2). In the same way, $\delta^{(n)}_{2,k}$ shifts each quarter, and so on. Permutations within segments, of which the $\delta^{(n)}_{j,k}$ are examples, are very useful in parallel processing. They arise when the "divide and conquer" technique is used in algorithm design; that is, when a computation on N items is replaced by two computations on N/2 items, which are themselves replaced by four computations on N/4 items, and so forth.

It is worth noting that the permutations available to the programmers of the Staran system [3] are the cyclic shifts within segments, $\delta^{(n)}_{j,k}$, and the Mirror permutations within segments, $\pi^{(n)}_k$ ($k = 2^j - 1$, $2 \le j \le n$).
3) The Family {fX")}: The most interesting permutations
in the family {a(x,} are the bit reversal on the whole set E(n)
O,=
p(nln) and the bit reversals within segments oc(Jo. In
conjunction with the perfect shuffle, they can be used to
transfer data during the computation of a fast Fourier
transform [25].
Another important subset of this family consists of
(n) =-T(n) (O < k < 2n), which maps x onto x E k. As an
example of its usefulness, let us consider the skewing scheme
built into the Staran hardware [3]. The memory of this
computer may be viewed as a 2" x 2n array of 1-bit words,
whose element ax,y (O < x, y < 2 n) is assigned to block x @ y.
Since the EXCLUSIVE OR operation is a group operation, any
row or column may be accessed in one memory cycle.
Permutation zi unscrambles line i.
The other permutations, i.e., acjzn with k * O or j $ n, are
necessary to meet the stability requirement (see Theorem 4).
With a few exceptions, they have not been used in previously
published algorithms.
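A short Python check of the Staran-style scheme: with $n = 4$ chosen for illustration, every row and column of the $2^n \times 2^n$ bit array occupies $2^n$ distinct blocks, and $\pi^{(n)}_i$ sends the word from block $b$ to processing element $b \oplus i$, which unscrambles row $i$. The helper names `staran_block` and `pi` are hypothetical.

```python
n = 4  # illustrative size: a 16 x 16 array of 1-bit words
N = 1 << n

def staran_block(x, y):
    """Block holding bit a[x][y] in the XOR skewing scheme."""
    return x ^ y

def pi(k, x):
    """pi_k: x -> x XOR k."""
    return x ^ k

for i in range(N):
    row = [staran_block(i, y) for y in range(N)]  # blocks holding row i
    col = [staran_block(x, i) for x in range(N)]  # blocks holding column i
    # Each line touches every block exactly once, so one memory cycle suffices.
    assert sorted(row) == list(range(N)) and sorted(col) == list(range(N))
    # The word read from block row[y] must reach PE y; pi_i does exactly that.
    assert all(pi(i, row[y]) == y for y in range(N))
```

Because $\oplus$ is its own inverse, the same $\pi^{(n)}_i$ also unscrambles column $i$.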
Fig. 2. The permutation $\delta^{(4)}_{1,k}$.

4) The Families $\{\beta^{(n)}_{j,l,m,k}\}$ and $\{\gamma^{(n)}_{j,l,m,h}\}$: Much emphasis has been laid on the prominent part played by the perfect shuffle in parallel programming (see [25], for instance). This permutation maps $x = (x_n, x_{n-1}, \ldots, x_2, x_1)$ onto $\sigma^{(n)}(x) = (x_{n-1}, \ldots, x_2, x_1, x_n)$. The families $\{\beta^{(n)}_{j,l,m,k}\}$ and $\{\gamma^{(n)}_{j,l,m,h}\}$ mainly include variations around this bijection.

The permutation $\gamma^{(n)}_{j,l,0,h}$ ($j + l + h \le n$) has the following interpretation: within each segment of size $2^{n-j}$, subsegments of size $2^{n-j-l-h}$ are reordered by the $l$th power of the perfect shuffle. This relationship between the subfamily $\{\gamma^{(n)}_{j,l,0,h}\}$ and the perfect shuffle is shown in Fig. 3. The perfect shuffle itself is $\sigma^{(n)} = \gamma^{(n)}_{0,1,0,n-1}$. The permutation $\beta^{(n)}_{j,l,m,0}$ is just $\gamma^{(n)}_{j,l,m,n-j-l-m}$. The introduction of permutations $\beta^{(n)}_{j,l,m,k}$ with $k \ne 0$ into the family $\{\beta^{(n)}_{j,l,m,k}\}$ is essentially aimed at stability in the sense of Section III.

The use of the perfect shuffle in programming the fast Fourier transform has been noted above. As concerns parallel sorting, the permutations involved in Batcher's network [2] are perfect shuffles within segments of size $2^i$ ($2 \le i \le n$) and their inverses, i.e., $\gamma^{(n)}_{n-i,1,0,i-1}$ and $\gamma^{(n)}_{n-i,i-1,0,1}$.

Let us consider a final example of the use of these bijections. If a $2^a \times 2^{n-a}$ array $A$ is stored as a vector of data in the Algol fashion (i.e., row by row), its transposition is the permutation $\gamma^{(n)}_{0,a,0,n-a}$, as may easily be verified. If an array $B$, declared as $B[1:2^a, 1:2^b, 1:2^{n-a-b}]$, is stored in this way, its "transpose" $C$, such that $B[i,j,k]$ is equal to $C[i,k,j]$ (to $C[j,k,i]$, $C[j,i,k]$, $C[k,i,j]$, $C[k,j,i]$, respectively), is obtained by permutation $\gamma^{(n)}_{a,b,0,n-a-b}$ ($\gamma^{(n)}_{0,a,0,n-a}$, $\gamma^{(n)}_{0,a,0,b}$, $\gamma^{(n)}_{0,a+b,0,n-a-b}$, $\gamma^{(n)}_{0,a,b,n-a-b}$, respectively).
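The transposition claims can be verified mechanically. This Python sketch assumes small sizes ($a = 1$, $b = 2$, $n = 5$, chosen only for illustration) and hypothetical helper names. It stores $B$ row by row as a list of index triples, applies each $\gamma$ listed above (the element at position $x$ moves to position $\gamma(x)$), and checks that the result is $C$ stored row by row.

```python
from itertools import product

def gamma(n, j, l, m, h, x):
    """gamma_{j,l,m,h}: (c5, c4, c3, c2, c1) -> (c5, c2, c3, c4, c1); widths j, l, m, h, rest."""
    r = n - (j + l + m + h)
    c1 = x & ((1 << r) - 1)
    c2 = (x >> r) & ((1 << h) - 1)
    c3 = (x >> (r + h)) & ((1 << m) - 1)
    c4 = (x >> (r + h + m)) & ((1 << l) - 1)
    c5 = x >> (r + h + m + l)
    y = c5
    for field, width in ((c2, h), (c3, m), (c4, l), (c1, r)):
        y = (y << width) | field
    return y

def pack(*fields):
    """Row-major index built from (value, width) pairs, most significant first."""
    x = 0
    for value, width in fields:
        x = (x << width) | value
    return x

def permute_vector(n, perm, vec):
    """Move the element at position x to position perm(x)."""
    out = [None] * (1 << n)
    for x, value in enumerate(vec):
        out[perm(x)] = value
    return out

a, b, n = 1, 2, 5  # illustrative sizes
c = n - a - b
B = list(product(range(1 << a), range(1 << b), range(1 << c)))  # B stored row by row
width = {"i": a, "j": b, "k": c}
cases = {  # index order of C -> gamma parameters (j, l, m, h) given in the text
    "ikj": (a, b, 0, c),
    "jki": (0, a, 0, n - a),
    "jik": (0, a, 0, b),
    "kij": (0, a + b, 0, c),
    "kji": (0, a, b, c),
}
for order, params in cases.items():
    C = permute_vector(n, lambda x: gamma(n, *params, x), B)
    for i, j, k in B:
        idx = {"i": i, "j": j, "k": k}
        # C[order] stored row by row must hold the element B[i, j, k].
        assert C[pack(*[(idx[ch], width[ch]) for ch in order])] == (i, j, k)
```

The same `permute_vector` call with $\gamma^{(n)}_{0,a,0,n-a}$ transposes a two-dimensional $2^a \times 2^{n-a}$ array.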
5) Remarks: A given FUB may appear several times in the same family. For instance, $\gamma^{(n)}_{j,l,0,h}$ and $\gamma^{(n)}_{j,0,l,h}$ are the same bijection. The identity may be denoted as $\gamma^{(n)}_{j,l,0,0}$ ($0 \le j + l \le n$) or as $\tau^{(n)}_0$. Moreover, a FUB may belong to several families. For instance, $\pi^{(n)}_k$ equals $\alpha^{(n)}_{n,k}$ or $\beta^{(n)}_{0,n,0,k}$. If $k = 2^j$ ($0 \le j < n$), then $\pi^{(n)}_k$ is the cyclic shift of amplitude $2^j$ within segments of size $2^{j+1}$, i.e., $\delta^{(n)}_{n-j-1,2^j}$. Note also that $\pi^{(n)}_1$ is the Exchange permutation $\varepsilon^{(n)}$.
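The remark on $\pi^{(n)}_{2^j}$ can be checked exhaustively for a small $n$. This Python sketch, with $n = 5$ chosen arbitrarily and hypothetical helper names, compares $x \oplus 2^j$ with $\delta^{(n)}_{n-j-1,2^j}$ on every element of $E^{(n)}$; the case $j = 0$ is the Exchange permutation.

```python
def delta(n, j, k, x):
    """delta_{j,k}: cyclic shift by k inside each segment of size 2**(n - j)."""
    size = 1 << (n - j)
    return x - x % size + (x % size + k) % size

def pi(k, x):
    """pi_k: x -> x XOR k."""
    return x ^ k

n = 5  # arbitrary small size for an exhaustive check
for j in range(n):
    # Flipping bit x_{j+1} is a shift of 2**j inside segments of size 2**(j + 1).
    assert all(pi(1 << j, x) == delta(n, n - j - 1, 1 << j, x) for x in range(1 << n))
```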
Fig. 3. The structure of the segment shuffles (panels: perfect shuffle; segment shuffle).

In a parallel computer, the permutations of data are performed by the interconnection network. Several possible schemes have been considered for this essential device. They are presented below.
C. A Survey of Interconnection Networks
The 64 processing elements of Illiac IV are connected according to the Nearest Neighbor scheme. Interpreting the set of processing elements as a square, this scheme means that processing element $i$ ($0 \le i \le 63$) is connected to processing elements $i + 1$, $i - 1$, $i + 8$, $i - 8$ (mod 64). In other words, only permutations $\tau^{(6)}_1$, $\tau^{(6)}_{63}$, $\tau^{(6)}_8$, and $\tau^{(6)}_{56}$ can be performed in one step. Orcutt [20] considered the problem of generating any permutation by repeated transfers of data through such an interconnection network.

A more powerful network to realize bijections between $N = 2^n$ inlets and $2^n$ outlets is the Omega network proposed by Lawrie [16]. This is composed of $n$ stages of $2 \times 2$ switches linked by perfect shuffles (see Fig. 4). Control patterns may be easily derived from the permutation assignment [15], [16]. Unfortunately, this network is not capable of realizing all possible permutations. In one pass, it can perform such bijections as $\lambda^{(n)}_{j,k}$, $\tau^{(n)}_k$, or $\delta^{(n)}_{j,k}$, but neither the perfect shuffle nor the bit reversal. As an extension of Lawrie's Omega network, Lang and Stone [14] have proposed a network that can realize any permutation in a bounded number of shuffle-exchange steps [the bound is illegible in the source].

A Clos network is a three-stage network defined by three parameters $p$, $q$, $r$, and consequently will be denoted hereafter by $C(p, q, r)$. It is constructed from crossbar switches whose dimensions depend on the stage: $p \times q$, $r \times r$, and $q \times p$, respectively. Each output of a stage is linked to an input of the next stage (see Fig. 5). The Slepian-Duguid theorem [5] states that this network can realize all bijections between its $pr$ inputs and outputs if (and obviously, only if) $q \ge p$: in telephone exchange terminology, this condition means that the network is rearrangeable. As concerns the cost of hardware, a $C(16, 16, 16)$ network is competitive with the Omega network to interconnect 256 processing elements [11]. However, no control procedure is currently known for a Clos network, with the exception of some algorithms that may meet time constraints in a switchboard environment, but are completely inadequate for parallel computers [5].

Fig. 4. The Omega network with 8 inputs and 8 outputs.

Fig. 5. The three-stage Clos network $C(p, q, r)$, built from $p \times q$, $r \times r$, and $q \times p$ switches.

III. CONTROL ALGORITHMS FOR THE BENES BINARY NETWORK

In this section, we present the Benes binary rearrangeable network for interconnection of $2^n$ inputs and $2^n$ outputs. This network may be derived from the Clos network $C(2, 2, 2^{n-1})$ by repeated use of the Clos structure until all switches are $2 \times 2$ switches. Thanks to this recursivity in the design of the network, we can derive recursive algorithms which yield control patterns for the FUB's (Sections III-B and C). Finally, we discuss some implementation ideas.

First, let us introduce a new notation. If a permutation $v$ on $E^{(n)}$ is such that for all $x = (x_n, x_{n-1}, \ldots, x_1)$ in $E^{(n)}$ the most significant digits of $x$ and $v(x)$ are equal, then we shall denote it by
$$v = (\varphi, \psi) \tag{2}$$
where $\varphi$ and $\psi$ are the bijections on $E^{(n-1)}$ defined by (the unchanged most significant digit being dropped)
$$\varphi(x_{n-1}, \ldots, x_2, x_1) = v(0, x_{n-1}, \ldots, x_2, x_1), \qquad \psi(x_{n-1}, \ldots, x_2, x_1) = v(1, x_{n-1}, \ldots, x_2, x_1).$$

Recall that $\mathcal{S}^{(n)}$ is the symmetric group on $E^{(n)}$, whereas $\mathcal{X}^{(n)}$ is the subgroup of those permutations which keep all the subsets $\{2j, 2j + 1\}$ ($0 \le j < 2^{n-1}$) globally invariant.

A. From the Clos Network $C(2, 2, 2^{n-1})$ to the Benes Network $R^{(n)}$

As a result of the Slepian-Duguid theorem, the Clos network $C(2, 2, 2^{n-1})$ is capable of realizing all permutations between inputs and outputs (see Fig. 6). In other words, for any permutation $\theta$ in $\mathcal{S}^{(n)}$, there are permutations $\xi_1$ and $\xi_2$ in $\mathcal{X}^{(n)}$ and $\varphi$ and $\psi$ in $\mathcal{S}^{(n-1)}$ such that
$$\theta = \xi_2 \circ \sigma^{(n)} \circ (\varphi, \psi) \circ (\sigma^{(n)})^{-1} \circ \xi_1. \tag{3}$$
Hereafter, we contract equality (3) into the following notation:
$$\theta = [\xi_1; (\varphi, \psi); \xi_2]. \tag{4}$$
Note that all permutations $\theta$ may be generated in at least two different ways. By another casting between the upper switch and the lower switch in the middle stage, we obtain
$$\theta = [\xi_1; (\varphi, \psi); \xi_2] = [\varepsilon^{(n)} \circ \xi_1; (\psi, \varphi); \varepsilon^{(n)} \circ \xi_2]. \tag{5}$$
(Remember that $\varepsilon^{(n)}$ is the Exchange permutation.) Moreover, the inverse of permutation $\theta$ is
$$\theta^{-1} = [\xi_2; (\varphi^{-1}, \psi^{-1}); \xi_1]. \tag{6}$$

If we replace the median switches of $C(2, 2, 2^{n-1})$ by networks $C(2, 2, 2^{n-2})$ and repeat this operation until the median switches are $2 \times 2$ switches, we then obtain the network $R^{(n)}$, of which some properties have been studied by Benes [5]. As an example, $R^{(3)}$ is shown in Fig. 7. Network $R^{(n)}$ is made from $(2n - 1)$ stages of $2^{n-1}$ two-state switches that perform the direct connection $I^{(1)}$ or the crossed connection $\varepsilon^{(1)}$ between their two inputs and their two outputs. The links between the stages of $R^{(n)}$ realize the bijections (from left to right)
$$\gamma^{(n)}_{0,n-1,0,1},\ \gamma^{(n)}_{1,n-2,0,1},\ \ldots,\ \gamma^{(n)}_{j,n-j-1,0,1},\ \ldots,\ \gamma^{(n)}_{n-2,1,0,1}$$
for the first half of the network and, for the second half,
$$\gamma^{(n)}_{n-2,1,0,1},\ \ldots,\ \gamma^{(n)}_{j,1,0,n-j-1},\ \ldots,\ \gamma^{(n)}_{1,1,0,n-2},\ \gamma^{(n)}_{0,1,0,n-1}.$$

The Benes network $R^{(n)}$ can realize all the bijections between its inputs and its outputs. As a matter of fact, it results from equality (5) that each bijection can be obtained by at least $2^{2^{n-1}-1}$ different control patterns. Another consequence of equality (5) is that, without loss of combinatorial power, one switch in the first (or in the third) stage of $C(2, 2, 2^{n-1})$ can be permanently forced to one of its possible states, or be replaced by a fixed link. This modification of a Clos network can be repeated in all steps of the construction of $R^{(n)}$, saving $(2^{n-1} - 1)$ basic switches. Waksman [29] has considered such a modified Benes network.

B. Controlling the Benes Network $R^{(n)}$

The structure of control algorithms will be traced from the recurrent structure of the network $R^{(n)}$. We consider families of permutations that meet the following stability property: a
family $\mathcal{F}$ is stable iff, for all $\theta$ in $\mathcal{F}$, there are two bijections $\varphi$ and $\psi$ in $\mathcal{F}$ and two bijections $\xi_1$ and $\xi_2$ in $\mathcal{X}^{(n)}$ (for some $n$) such that $\theta = [\xi_1; (\varphi, \psi); \xi_2]$. The search for control patterns yielding $\theta$ is replaced by the search for control patterns yielding $\varphi$ and $\psi$. The domain of the latter two permutations is half the size of the domain of the former. Permutations $\xi_1$ and $\xi_2$ describe the control patterns for the first and the last stage of $R^{(n)}$, respectively. As $\varphi$ and $\psi$ belong to the stable family, this procedure can be used recursively. The final permutations to generate are $I^{(1)}$ or $\varepsilon^{(1)}$, which are realized by the two-state switches.

Fig. 6. The Clos network $C(2, 2, 4)$.

Fig. 7. The Benes binary network $R^{(3)}$.

This "divide-and-conquer" technique may be applied to the five families of Frequently Used Bijections by means of the theorems below. These theorems are proved in the Appendix, and an example of their use is given in the next section.

Let $k$ be a nonnegative integer and denote, respectively, by $k'$ and $k_1$ its quotient and its remainder in the division by 2, i.e., $k = 2k' + k_1$.

Theorem 1: If $n \ge 2$, then
$$\tau^{(n)}_k = [I^{(n)}; (\tau^{(n-1)}_{k'}, \tau^{(n-1)}_{k'+k_1}); C^{(n)}_{k_1}].$$
If $n = 1$, then $\tau^{(1)}_0 = I^{(1)}$ and $\tau^{(1)}_1 = \varepsilon^{(1)}$.
(Remember that $C^{(n)}_{k_1}$ is either the Identity, $I^{(n)}$, or the Exchange, $\varepsilon^{(n)}$, according to whether $k_1$ is 0 or 1.)

Theorem 2: Assume that $j$ is odd: $j = 2j' + 1$. If $n \ge 2$, then
$$\lambda^{(n)}_{j,k} = [I^{(n)}; (\lambda^{(n-1)}_{j,k'}, \lambda^{(n-1)}_{j,k'+j'+k_1}); C^{(n)}_{k_1}].$$
If $n = 1$, then $\lambda^{(1)}_{j,k} = \tau^{(1)}_k$.

Theorem 3: If $n \ge 2$ and $0 \le j < n$, then
$$\delta^{(n)}_{j,k} = [I^{(n)}; (\delta^{(n-1)}_{j,k'}, \delta^{(n-1)}_{j,k'+k_1}); C^{(n)}_{k_1}].$$
If $n = 1$ and $j = 0$, then $\delta^{(1)}_{0,k} = \tau^{(1)}_k$. If $j = n$, then $\delta^{(n)}_{n,k} = I^{(n)}$.

Theorem 4: If $n \ge 2$ and $0 \le j \le n - 2$, then $\alpha^{(n)}_{j,k}$ is decomposed with parameters $k^{**} = k'$ and $k^*$ for its two subnetworks, $k^*$ being obtained from $k'$ by an EXCLUSIVE OR with a power of 2. [The decomposition formula and the remaining cases of this theorem are illegible in the source; its first and second cases are applied in the example of Section III-C.]

Theorem 5: [Statement illegible in the source; it gives the decomposition of $\beta^{(n)}_{j,l,m,k}$.]

Theorem 6: If $n \ge 2$ and $j + l + m + h < n$, then
$$\gamma^{(n)}_{j,l,m,h} = [I^{(n)}; (\gamma^{(n-1)}_{j,l,m,h}, \gamma^{(n-1)}_{j,l,m,h}); I^{(n)}].$$
Else (i.e., if $j + l + m + h = n$ or if $n = 1$) [formula illegible in the source].

Corollary (of Theorem 2 or 3): [Statement illegible in the source.]

In the previous theorems, the number of distinct bijections describing control patterns for the extreme stages of $R^{(n)}$ is as small as $2n$. These bijections are $I^{(n)}$ and $\mu^{(n)}_j$ ($2 \le j \le n$), together with their composition products by $\varepsilon^{(n)}$ (see Fig. 8). This scarcity is noteworthy since the cardinality of the subgroup $\mathcal{X}^{(n)}$ is $2^{2^{n-1}}$.

C. An Example

Suppose that a Bit Reversal $\rho^{(3)} = \alpha^{(3)}_{0,0}$ is to be performed by $R^{(3)}$ on an 8-word vector of data. The three steps of the computation of a control pattern are as follows.

First Step: From the first case of Theorem 4, we obtain
$$\rho^{(3)} = \alpha^{(3)}_{0,0} = [\mu^{(3)}_3; (\alpha^{(2)}_{2,0}, \alpha^{(2)}_{2,2}); \mu^{(3)}_3].$$
In the second step a similar reduction will be applied to $\alpha^{(2)}_{2,0}$ and $\alpha^{(2)}_{2,2}$.

Second Step: From the second case of Theorem 4 and from the first case of Theorem 1, the following equalities hold:
$$\alpha^{(2)}_{2,0} = \tau^{(2)}_0 = [I^{(2)}; (\tau^{(1)}_0, \tau^{(1)}_0); I^{(2)}],$$
$$\alpha^{(2)}_{2,2} = \tau^{(2)}_2 = [I^{(2)}; (\tau^{(1)}_1, \tau^{(1)}_1); I^{(2)}].$$

Third Step: As a result of Theorem 1, the bijections $\tau^{(1)}_0$ and $\tau^{(1)}_1$ are recognized as the basic controls $I^{(1)}$ and $\varepsilon^{(1)}$, respectively.

Consequently, the five stages of $R^{(3)}$ must realize the following bijections (stage by stage, from left to right):

Stage 1: $\mu^{(3)}_3$
Stage 2: $I^{(2)}$, $I^{(2)}$
Stage 3: $I^{(1)}$, $I^{(1)}$, $\varepsilon^{(1)}$, $\varepsilon^{(1)}$
Stage 4: $I^{(2)}$, $I^{(2)}$
Stage 5: $\mu^{(3)}_3$

Fig. 8. From left to right: the permutations $\mu^{(3)}_2$, $\mu^{(3)}_3$, and $\varepsilon^{(3)}$.
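The decomposition worked out in this example can be replayed in code. The Python sketch below implements the bracket notation of equality (3), with $(\sigma^{(n)})^{-1}$ routing the least significant bit to the subnetwork selector, checks Theorems 1 and 2 exhaustively for small $n$, rebuilds $\rho^{(3)}$ from the three steps, and derives the switch settings stage by stage. Two points are assumptions: the helper names are hypothetical, and $\mu^{(n)}_j$ is read as the exchange of $x$ and $x \oplus 1$ whenever bit $x_j$ is 1, which is the reading that reproduces the $4 \times 5$ control matrix of this example.

```python
def sigma(n, x):
    """Perfect shuffle on n bits (left rotation)."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)

def sigma_inv(n, x):
    """Inverse perfect shuffle (right rotation): x_1 becomes the selector bit."""
    return (x >> 1) | ((x & 1) << (n - 1))

def bracket(n, xi1, phi, psi, xi2):
    """[xi1; (phi, psi); xi2] = xi2 o sigma o (phi, psi) o sigma^-1 o xi1, as in (3)-(4)."""
    mask = (1 << (n - 1)) - 1
    def theta(x):
        y = sigma_inv(n, xi1(x))
        top, low = y >> (n - 1), y & mask
        y = (top << (n - 1)) | (psi(low) if top else phi(low))  # upper/lower subnetwork
        return xi2(sigma(n, y))
    return theta

def identity(x):
    return x

def exchange(x):
    return x ^ 1

def mu(j):
    """ASSUMPTION: mu_j exchanges x and x XOR 1 iff bit x_j is 1 (j >= 2)."""
    return lambda x: x ^ ((x >> (j - 1)) & 1)

def tau(n, k):
    return lambda x: (x + k) % (1 << n)

def lam(n, j, k):
    return lambda x: (j * x + k) % (1 << n)

# Theorem 1: tau_k = [I; (tau_{k'}, tau_{k'+k1}); C_{k1}], with k = 2k' + k1.
for n in range(2, 6):
    for k in range(1 << n):
        kp, k1 = divmod(k, 2)
        rhs = bracket(n, identity, tau(n - 1, kp), tau(n - 1, kp + k1),
                      exchange if k1 else identity)
        assert all(rhs(x) == tau(n, k)(x) for x in range(1 << n))

# Theorem 2 (j = 2j' + 1): lambda_{j,k} = [I; (lambda_{j,k'}, lambda_{j,k'+j'+k1}); C_{k1}].
for n in range(2, 5):
    for j in range(1, 1 << n, 2):
        for k in range(1 << n):
            kp, k1 = divmod(k, 2)
            rhs = bracket(n, identity, lam(n - 1, j, kp), lam(n - 1, j, kp + j // 2 + k1),
                          exchange if k1 else identity)
            assert all(rhs(x) == lam(n, j, k)(x) for x in range(1 << n))

# The example: alpha_{2,0} = tau_0 and alpha_{2,2} = tau_2 on E(2), each split by Theorem 1.
alpha20 = bracket(2, identity, tau(1, 0), tau(1, 0), identity)
alpha22 = bracket(2, identity, tau(1, 1), tau(1, 1), identity)
rho3 = bracket(3, mu(3), alpha20, alpha22, mu(3))
assert [rho3(x) for x in range(8)] == [0, 4, 2, 6, 1, 5, 3, 7]  # bit reversal on E(3)

def switch_bits(xi, size):
    """Control bit of each 2x2 switch: 1 (crossed) iff xi maps 2s to 2s + 1."""
    return [int(xi(2 * s) == 2 * s + 1) for s in range(size // 2)]

stages = [
    switch_bits(mu(3), 8),
    switch_bits(identity, 4) * 2,
    switch_bits(tau(1, 0), 2) * 2 + switch_bits(tau(1, 1), 2) * 2,
    switch_bits(identity, 4) * 2,
    switch_bits(mu(3), 8),
]
matrix = [list(row) for row in zip(*stages)]  # rows: switches, columns: stages

def pattern_bits(n):
    """Size of a full control pattern: (2n - 1) stages of 2**(n - 1) switches."""
    return (2 * n - 1) * (1 << (n - 1))
```

Evaluating `pattern_bits(8)` gives 1920, the table-size figure quoted for 256 processing elements in Section III-D.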
If we assume that signal 0 (signal 1, respectively) on its control line forces a $2 \times 2$ switch to state $I^{(1)}$ ($\varepsilon^{(1)}$, respectively), we can represent the control pattern by the following $4 \times 5$ matrix, whose rows correspond to the four switches of a stage and whose columns correspond to the five stages:

0 0 0 0 0
0 0 0 0 0
1 0 1 0 1
1 0 1 0 1

This setting of the network is illustrated in Fig. 9.

Fig. 9. The network $R^{(3)}$ under a control yielding $\rho^{(3)}$.

D. Implementation of the Control Algorithms

Several authors [19], [29] have proposed algorithms that can compute control patterns for the network $R^{(n)}$, whatever the bijection under consideration. These powerful algorithms suffer from several shortcomings. First, they are space-consuming. As input, they accept the representation of a permutation $\theta$ by its value assignment, i.e., the sequence $\theta(0), \theta(1), \ldots, \theta(2^n - 1)$, which requires $n2^n$ bits to be memorized. A considerably larger amount of memory is needed for the computation. Second, these algorithms are time-consuming. As far as applications to parallel processors are concerned, they could be used to fill in tables of control patterns before run time. Such tables would be huge: $(2n - 1)2^{n-1}$ bits to store the control pattern of each considered mapping. This figure is 1920 bits per permutation in the case of a network interconnecting 256 processing elements.

Our approach is different. By selecting a set of Frequently Used Bijections, we allow an efficient representation of these bijections. As concerns the families of Table I, the parameters of the bijections, say $h, j, k, l, m$, are either in the range $\{0, 1, \ldots, n\}$ or in the range $\{0, 1, \ldots, 2^n - 1\}$. In the latter case, they may be recorded in an $n$-bit word with the usual binary expansion (denoted by R2 hereafter). In the former case, it would be more convenient to represent an integer $m$ by an $(n + 1)$-bit word $(m_{n+1}, m_n, \ldots, m_i, \ldots, m_1)$ such that $m_i = 1$ iff $m = i - 1$. In this way (denoted by R1), the increments required by the theorems of Section III-B are performed as shifts. Table II specifies this coding scheme, which results from a tradeoff between space saving and time saving.

The selection of a set of interconnections deserving special treatment because of their frequent use implies that two different instructions must be included in the machine repertoire: Trigger a Frequently Used Bijection (TFUB) and Trigger an Ordinary Bijection (TOB). Due to our choice of FUB's, an instruction word TFUB contains six fields corresponding to the following:
1) the opcode;
2) the name of the FUB family ($\alpha$, $\beta$, $\gamma$, $\delta$, or $\lambda$): 3 bits;
3) the four parameters: $4(n + 1)$ bits.
The length of the FUB name, i.e., $4n + 7$ bits, is consistent with the format of an instruction: in a computer with 256 processing elements, this name would be 39 bits long. When executed, the instruction TFUB forces the basic $2 \times 2$ switches to the correct state and broadcasts a $2^n$-word vector of data through the network. The other instruction, TOB, has one parameter that is the address in main storage of a vector of $(2n - 1)2^{n-1}$ bits which is used to set the two-state switches. The vector may be computed by one of the general procedures mentioned at the beginning of the present section.

The network control mechanism must be devised to cope with the implementation of the instruction TFUB. One possible structure, which readily lends itself to pipelining, consists of a binary tree of registers. A node of this tree is a set of four registers whose lengths are $(n - d + 2)$ bits at depth $d$ in the tree ($1 \le d \le n$) (see Fig. 10). From each node at depth $d$ ($1 \le d \le n - 1$), it is possible to compute $2^{n-d}$ bits of the control patterns for stages $d$ and $2n - d$. If several transfers of data vectors through the network are allowed to be in progress at the same time (as a pipeline system where the units are the stages), the control patterns for stage $2n - d$ of
$R^{(n)}$ must be stored in a queue $Q_{2n-d}$ ($1 \le d \le n - 1$); whereas, the control patterns for the $n$ first stages can be used as soon as they are computed. Moreover, if the network is built as a pipeline system, it is necessary to provide level $d$ of the tree ($1 \le d \le n$) with a 3-bit register $F_d$ which records the (common) family name of the FUB's whose parameters are memorized in the nodes of depth $d$. Finally, a table $T_d$ is associated with depth $d$ ($1 \le d \le n$). Each of its $(n + 1 - d)$ entries contains the $2^{n-d}$-bit pattern that controls either $I^{(n+1-d)}$ (for entry 1) or $\mu^{(n+1-d)}_j$ (for entry $j$, $2 \le j \le n + 1 - d$).

Fig. 10. A possible structure for the control mechanism.

TABLE II
CODING THE BIJECTION NAMES

Bijection                     1st parameter   2nd parameter   3rd parameter   4th parameter
$\lambda^{(n)}_{j,k}$         R2              R2
$\delta^{(n)}_{j,k}$          R1              R2
$\alpha^{(n)}_{j,k}$          R1              R2
$\beta^{(n)}_{j,l,m,k}$       R1              R1              R1              R2
$\gamma^{(n)}_{j,l,m,h}$      R1              R1              R1              R1

R1: parameter in the range $\{0, 1, \ldots, n\}$ ("unary" representation)
R2: parameter in the range $\{0, 1, \ldots, 2^n - 1\}$ (binary representation)

We shall make this structure clear by describing the register-to-register operations involved in the computation of control signals for family $\{\alpha^{(n)}_{j,k}\}$. This description is expressed in a language that deserves a few words of explanation.
1) Three types of data are used: bits, arrays of bits, and (for array indices only) integers.
2) If A is declared as an array of bits A[1:h], the assignment A := (0, B[b : b + h - 2]) is used for: A[1] := 0 and A[i] := B[b + i - 2] (2 ≤ i ≤ h).
3) int(A) is the integer whose "unary" representation (representation R1) is the bit array A.
4) The procedures mirror(A) and complement(A) deliver bit arrays that are the mirror image of A and the ones' complement of A, respectively.

We focus our attention on node $c$ at depth $d$ in the tree ($1 \le c \le 2^{d-1}$), assuming that the hardware of the interconnection network performs the same operations on all nodes at the same depth. As concerns a node at depth $d$ with $1 \le d \le n - 1$, two registers of this node, denoted by J and K, and their respective sons are involved in these operations. The control patterns for stage $2n - d$ are recorded in the free entry of $Q_{2n-d}$; whereas, the control signals sent to the switches of stage $d$ are represented as bits of an array. The proposed algorithm for node $c$ at depth $d$ ($1 \le d \le n - 1$, $1 \le c \le 2^{d-1}$) is as follows:

    bit array J, K[1:n+2-d];
    bit array upper-son-of-J, upper-son-of-K, lower-son-of-J, lower-son-of-K[1:n+1-d];
    bit array control-of-stage-d, free-Q2n-d[1:2^(n-1)];
    bit array table-d[1:n+1-d, 1:2^(n-d)];
    if J[n+1-d : n+2-d] = 00 then
      parbegin
        comment we use Theorem 4 end of comment
        control-of-stage-d[(c-1)2^(n-d)+1 : c2^(n-d)] := table-d[int(mirror(J)), 1:2^(n-d)],
        free-Q2n-d[(c-1)2^(n-d)+1 : c2^(n-d)] := if K[int(mirror(J))] = K[1]
            then table-d[int(mirror(J)), 1:2^(n-d)]
            else complement(table-d[int(mirror(J)), 1:2^(n-d)]),
        upper-son-of-J := (0, J[1:n-d]),
        upper-son-of-K := K[2:n+2-d],
        lower-son-of-J := (0, J[1:n-d]),
        lower-son-of-K := K[2:n+2-d] ⊕ (mirror(J[1:n-d]), 0)
      parend
    else
      parbegin
        comment we use Theorem 1 end of comment
        control-of-stage-d[(c-1)2^(n-d)+1 : c2^(n-d)] := table-d[1, 1:2^(n-d)],
        free-Q2n-d[(c-1)2^(n-d)+1 : c2^(n-d)] := if K[1] = 0
            then table-d[1, 1:2^(n-d)]
            else complement(table-d[1, 1:2^(n-d)]),
        upper-son-of-J := J[2:n+2-d],
        upper-son-of-K := K[2:n+2-d],
-
645
LENFANT: PARALLEL PERMUTATIONS OF DATA
lower-son-of-J = J[2: n + 2 - d],
lower-son-of-K:= K[2: n + 2 -d],
parend
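The node algorithm can be transcribed into Python as a minimal sketch, assuming our reading of the pseudocode (in particular, that in the first branch lower-son-of-K is K[2:n+2-d] ⊕ (mirror(J[1:n-d]), 0)). Python lists are 0-based, so the paper's A[i] is a[i-1]; the helper names (sl, xor, node_step) are ours. The assignments between parbegin and parend are independent, so they are simply executed in sequence here. Because the exact "unary" coding R1 behind int(A) is not spelled out in this section, the decoder is passed in as a parameter (unary_to_int, hypothetical); the decoder used in the example is illustrative only.

```python
from typing import Callable, List, Tuple

Bits = List[int]  # bit arrays; Python index 0 holds the paper's element [1]


def sl(a: Bits, lo: int, hi: int) -> Bits:
    """Paper-style slice a[lo:hi], 1-based and inclusive."""
    return a[lo - 1:hi]


def mirror(a: Bits) -> Bits:
    return a[::-1]


def complement(a: Bits) -> Bits:
    return [1 - b for b in a]


def xor(a: Bits, b: Bits) -> Bits:
    return [x ^ y for x, y in zip(a, b)]


def node_step(n: int, d: int, c: int, J: Bits, K: Bits, table: List[Bits],
              unary_to_int: Callable[[Bits], int],
              control: Bits, free_q: Bits) -> Tuple[Bits, Bits, Bits, Bits]:
    """One step for node c at depth d (1 <= d <= n-1).

    Writes this node's 2^(n-d) bits into `control` (stage d) and `free_q`
    (free entry of Q_{2n-d}); returns (upper_J, upper_K, lower_J, lower_K).
    """
    assert 1 <= d <= n - 1 and 1 <= c <= 2 ** (d - 1)
    assert len(J) == len(K) == n + 2 - d
    w = 2 ** (n - d)
    span = slice((c - 1) * w, c * w)   # paper indices (c-1)2^(n-d)+1 .. c 2^(n-d)
    if sl(J, n + 1 - d, n + 2 - d) == [0, 0]:
        # Branch labelled "we use Theorem 4".
        e = unary_to_int(mirror(J))    # 1-based row index into table-d
        row = table[e - 1]
        control[span] = row
        free_q[span] = row if K[e - 1] == K[0] else complement(row)
        son_J = [0] + sl(J, 1, n - d)
        upper_K = sl(K, 2, n + 2 - d)
        lower_K = xor(sl(K, 2, n + 2 - d), mirror(sl(J, 1, n - d)) + [0])
        return son_J, upper_K, list(son_J), lower_K
    # Branch labelled "we use Theorem 1".
    row = table[0]
    control[span] = row
    free_q[span] = row if K[0] == 0 else complement(row)
    son_J = sl(J, 2, n + 2 - d)
    son_K = sl(K, 2, n + 2 - d)
    return son_J, son_K, list(son_J), list(son_K)
```

For example, with n = 3, d = 1, c = 1, the register J = [1, 0, 0, 0] takes the Theorem 4 branch, while J = [0, 0, 1, 0] takes the Theorem 1 branch.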
For a node c at depth n, the only operation to perform is

    control-of-stage-n[c] := K[2].

Fig. 11 shows this algorithm operating on the root (d = 1, c = 1) of the control mechanism of the network $R^{(3)}$. The permutation to be performed is the bit reversal $p^{(3)}$, which has been considered in the example of Section III-C. Arrays of bits are represented by registers whose rightmost bit is the first element of the array.
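For reference, the bit reversal $p^{(3)}$ used in the Fig. 11 example maps each 3-bit address $(x_3, x_2, x_1)$ onto $(x_1, x_2, x_3)$. A short Python sketch (the function name bit_reverse is ours) lists the image of every address:

```python
def bit_reverse(x: int, n: int) -> int:
    """p^(n)(x): reverse the order of the n address bits of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)  # append the current lowest bit of x to y
        x >>= 1
    return y


print([bit_reverse(x, 3) for x in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Since reversing twice restores the original order, $p^{(n)}$ is its own inverse.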
IV. CONCLUSION

Several authors have proposed algorithms which compute control patterns for the Benes binary network from any bijection assignment. If such a network is used to interconnect processing elements in a SIMD computer, time constraints prevent these algorithms from being executed every time a transfer of data is needed. An alternative would be to use them to fill a table of control patterns. This is not very practical, because such a table would need a huge amount of main storage.

In this paper, we have chosen another approach. We have restricted our attention to a set of frequently used bijections for which we have proposed tailored algorithms. The number of selected bijections for a network with $2^n$ inputs is of the order of magnitude $2^{2n}$, as compared to $(2^n)!$ for the whole symmetric group. It is small enough that the name of a bijection can be coded within the parameter field of an instruction "Trigger a Frequently Used Bijection." Another point which makes the implementation of such an instruction feasible is that control patterns can be computed "on the fly" by our algorithms, as shown in Section III-D. If a required permutation is not one of the selected Frequently Used Bijections, one of the general algorithms previously published can be used instead.

Due to the recurrent structure of both the Benes network and our algorithms, a data vector of 2N words can easily be processed by a computer with $N = 2^n$ processing elements. For this purpose, the interconnection network is used successively as the upper median switch and as the lower median switch of a Clos network $C(2, 2, 2^n)$. This recurrent structure is also very interesting in the case of several configurations of the same computer which differ from each other by the number of processing elements.

Further research could usefully be conducted into the selection of families of FUB's. It is noteworthy that a stable family may contain stable subfamilies: e.g., for a given j, the subfamily of bijections with parameters (j, l, m, k) is stable, since its first parameter is not affected by the derivation of Theorem 5. New bijections could be selected in relation to the progress of algorithm design. Our approach is well suited to algorithms obtained by a concurrent "divide and conquer" technique [21], [23].

[Figure: registers J and K of the root, son registers upper-son-of-J, upper-son-of-K, lower-son-of-J, lower-son-of-K, queue free-Q5, and the control signals sent to stage 1 of R^(3).]
Fig. 11. The first step of the computation of a control pattern for the permutation $p^{(3)}$.

APPENDIX
PROOFS OF THE THEOREMS OF SECTION III-B

The bijections achieved by the first, second, and third stages of a Clos network $C(2, 2, 2^{n-1})$ are denoted by F, S, and T in the sequel. The fixed links between the stages are ε and σ, i.e., the inverse of the perfect shuffle and the perfect shuffle itself. The sign ∘ will be omitted in the expression of bijection products. Moreover, the notation $|y|_n$ will be used instead of $y \bmod 2^n$. Finally, the notations of Table I are valid for this appendix.

We shall restrict our attention to the first statement of each theorem; the others result immediately from the definitions of Table I.

Proof of Theorem 1: Let $x = (x_n, x_{n-1}, \ldots, x_2, x_1)$ be an element of $E^{(n)}$. Denote by $x'$ the $(n-1)$-tuple $(x_n, x_{n-1}, \ldots, x_2)$, so that $x = (x', x_1)$. If n is greater than 1, then x is mapped
- by εF onto $(x_1, x')$;
- by SεF onto $(x_1, x' \oplus k')$;
- by σSεF onto $(x' \oplus k', x_1) = (x', x_1) \oplus (k', 0)$;
- by TσSεF onto $((x', x_1) \oplus (k', 0)) \oplus k_1 = (x', x_1) \oplus (k', k_1) = x \oplus k$.
Therefore the theorem holds for $n \ge 2$. Q.E.D.

Proof of Theorem 2: A word of data is routed to the upper median switch or to the lower median switch according to the parity of the number of the line by which it enters the network. As in the previous proof, let us consider an element $x = (x', x_1) = 2x' + x_1$ of $E^{(n)}$, omitting the obvious case n = 1, and write the odd multiplier j as $j = 2j' + 1$. This integer is mapped
- by εF onto $x' + 2^{n-1}x_1$;
- by SεF onto
$$\begin{cases} |jx' + k'|_{n-1}, & \text{if } x_1 = 0,\\ |jx' + j' + k' + k_1|_{n-1}, & \text{if } x_1 = 1. \end{cases}$$
Thus, if x is even, it is mapped by σSεF onto
$$y = |2jx' + 2k'|_n = |jx + 2k'|_n.$$
As y is even, $y \oplus k_1$ ($k_1 = 0$ or 1) equals $y + k_1$, so that x is mapped by TσSεF onto $|jx + 2k' + k_1|_n = |jx + k|_n$. This proves the case for x even. For odd x, x is mapped by σSεF onto
$$z = |j(2x') + (2j' + 1) + 2k' + 2k_1|_n = |jx + 2k' + 2k_1|_n.$$
Since z is odd, $z \oplus k_1$ equals $z - k_1$. Consequently, the image of an odd x by TσSεF is $|jx + 2k' + k_1|_n = |jx + k|_n$. Q.E.D.

Proof of Theorem 3: Assume that $n \ge 2$ and $0 \le j < n$. Let $x = (x_n, x_{n-1}, \ldots, x_2, x_1) = (a_2, a_1)$ be an element of $E^{(n)}$ and denote by $a'_1$ the integer $(x_{n-j}, x_{n-j-1}, \ldots, x_2)$, so that $a_1 = 2a'_1 + x_1$. The integer $x = (a_2, a'_1, x_1)$ is mapped
- by εF onto $(x_1, a_2, a'_1)$;
- by SεF onto $(x_1, a_2, |a'_1 + k'|_{n-j-1})$ if $x_1 = 0$, and onto $(x_1, a_2, |a'_1 + k' + k_1|_{n-j-1})$ if $x_1 = 1$, i.e., onto $(x_1, a_2, |a'_1 + k' + k_1 x_1|_{n-j-1})$;
- by σSεF onto $(a_2, |a'_1 + k' + k_1 x_1|_{n-j-1}, x_1)$;
- by TσSεF onto $(a_2, y)$, where y is the integer
$$y = (|a'_1 + k' + k_1 x_1|_{n-j-1},\ x_1 \oplus k_1) = |2a'_1 + 2k' + 2k_1 x_1 + (x_1 + k_1 - 2x_1 k_1)|_{n-j} = |(2a'_1 + x_1) + (2k' + k_1)|_{n-j} = |a_1 + k|_{n-j}.$$
Q.E.D.

Proof of Theorem 4: Assume that $n \ge 2$ and $0 \le j \le n-2$. Let $x = (x_n, x_{n-1}, \ldots, x_2, x_1) = (a_2, a_1)$ be an element of $E^{(n)}$ and denote by $a_0$ the integer $(x_{n-j-1}, x_{n-j-2}, \ldots, x_2)$, so that $a_1 = (x_{n-j}, a_0, x_1)$. The integer $x = (a_2, x_{n-j}, a_0, x_1)$ is mapped
- by F onto $(a_2, x_{n-j}, a_0, x_1 \oplus x_{n-j})$;
- by εF onto $(x_1 \oplus x_{n-j}, a_2, x_{n-j}, a_0)$.
Note that the effect of εF is to route to the upper median switch (to the lower median switch, respectively) the data entering the network by an inlet x such that $x_1 = x_{n-j}$ ($x_1 \ne x_{n-j}$, respectively). The image of x by SεF is
$$(x_1 \oplus x_{n-j}, a_2, y, p^{(n-j-2)}(a_0)) \oplus k'$$
where bit y is equal to $x_{n-j}$ if $x_1 \oplus x_{n-j} = 0$, and equal to $x_{n-j} \oplus 1$ if $x_1 \oplus x_{n-j} = 1$. We can summarize both cases in one expression
$$y = x_{n-j} \oplus (x_1 \oplus x_{n-j}),$$
which shows that $y = x_1$. By σSεF, x is mapped onto
$$z = (a_2, x_1, p^{(n-j-2)}(a_0), x_1 \oplus x_{n-j}) \oplus (2k').$$
Applying the permutation $q^{(n)}_{n-j}$ to this value, we obtain
$$q^{(n)}_{n-j}(z) = (a_2, x_1, p^{(n-j-2)}(a_0), x_{n-j}) \oplus (2k'),$$
since $(x_1 \oplus x_{n-j}) \oplus x_1 = x_{n-j}$. Finally, the image of x by TσSεF, i.e., $C^{(n)}(q^{(n)}_{n-j}(z))$, is
$$(a_2, x_1, p^{(n-j-2)}(a_0), x_{n-j}) \oplus k = (a_2, p^{(n-j)}(a_1)) \oplus k.$$
Q.E.D.

Proof of Theorem 5: Assume that $n \ge 2$, $l > 0$, and $j + l + m < n$. Let $x = (b_4, b_3, b_2, b_1)$ be an element of $E^{(n)}$. We denote by $b'_1$ and $b'_3$ the integers (or vectors of bits) defined by the equalities
$$b_1 = (b'_1, x_1), \qquad b_3 = (b'_3, x_{n-j-l+1}).$$
Notice that the inequalities $l > 0$ and $j + l + m < n$ imply that $b'_1$ and $b'_3$ exist. The element $x = (b_4, b'_3, x_{n-j-l+1}, b_2, b'_1, x_1)$ is mapped
- by εF onto $(x_1 \oplus x_{n-j-l+1}, b_4, b'_3, x_{n-j-l+1}, b_2, b'_1)$;
- by SεF onto $(x_1 \oplus x_{n-j-l+1}, b_4, b'_1, y, b_2, b'_3) \oplus k'$,
where bit y is equal to $x_{n-j-l+1}$ if $x_1 \oplus x_{n-j-l+1} = 0$, and otherwise to $x_{n-j-l+1} \oplus 1$. Obviously, $y = x_{n-j-l+1} \oplus (x_1 \oplus x_{n-j-l+1}) = x_1$. By σSεF, x is mapped onto
$$z = (b_4, b'_1, x_1, b_2, b'_3, x_1 \oplus x_{n-j-l+1}) \oplus (2k').$$
Applying the bijection $q^{(n)}_{l+m+1}$ to this value, we obtain
$$q^{(n)}_{l+m+1}(z) = (b_4, b'_1, x_1, b_2, b'_3, x_{n-j-l+1}) \oplus (2k') = (b_4, b_1, b_2, b_3) \oplus (2k').$$
Finally, the image of x by TσSεF, i.e., $C^{(n)}(q^{(n)}_{l+m+1}(z))$, is
$$(b_4, b_1, b_2, b_3) \oplus (2k' \oplus k_1) = (b_4, b_1, b_2, b_3) \oplus k.$$
Q.E.D.
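The stage-by-stage images derived in the proofs of Theorems 1 to 3 can be checked exhaustively for small n. The following Python sketch assumes, as those proofs do, that the first stage F is the identity, that ε and σ rotate an n-bit address right and left by one position, and that the third stage T only XORs the last bit with $k_1$. It also assumes that Theorem 2 concerns $x \mapsto |jx + k|_n$ with j odd and Theorem 3 concerns $(a_2, a_1) \mapsto (a_2, |a_1 + k|_{n-j})$, as the last lines of the proofs state. All function names are ours. Theorems 4 and 5 are not covered, because their permutations $q^{(n)}$ and $C^{(n)}$ are defined in Table I, outside this section.

```python
def inv_shuffle(x: int, n: int) -> int:
    """epsilon: rotate an n-bit address right, (x', x_1) -> (x_1, x')."""
    return (x >> 1) | ((x & 1) << (n - 1))


def shuffle(x: int, n: int) -> int:
    """sigma: rotate an n-bit address left, (b, y') -> (y', b)."""
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))


def route_thm1(x: int, n: int, k: int) -> int:
    kp, k1 = k >> 1, k & 1                  # k = (k', k_1)
    u = inv_shuffle(x, n) ^ kp              # S: both median switches XOR k'
    return shuffle(u, n) ^ k1               # sigma, then T XORs k_1


def route_thm2(x: int, n: int, j: int, k: int) -> int:
    jp, kp, k1 = j >> 1, k >> 1, k & 1      # j = 2j' + 1, k = 2k' + k_1
    m = (1 << (n - 1)) - 1                  # reduction mod 2^(n-1)
    u = inv_shuffle(x, n)
    top, low = u >> (n - 1), u & m          # top bit x_1 selects the median switch
    if top == 0:
        low = (j * low + kp) & m            # upper median switch
    else:
        low = (j * low + jp + kp + k1) & m  # lower median switch
    return shuffle((top << (n - 1)) | low, n) ^ k1


def route_thm3(x: int, n: int, j: int, k: int) -> int:
    kp, k1 = k >> 1, k & 1
    w = n - j - 1                           # width of a'_1
    u = inv_shuffle(x, n)                   # (x_1, a_2, a'_1)
    x1, a2, a1p = u >> (n - 1), (u >> w) & ((1 << j) - 1), u & ((1 << w) - 1)
    a1p = (a1p + kp + k1 * x1) & ((1 << w) - 1)
    return shuffle((x1 << (n - 1)) | (a2 << w) | a1p, n) ^ k1


def check(max_n: int = 5) -> bool:
    for n in range(2, max_n + 1):
        size = 1 << n
        for x in range(size):
            for k in range(size):
                assert route_thm1(x, n, k) == x ^ k
                for j in range(1, size, 2):     # odd multipliers only
                    assert route_thm2(x, n, j, k) == (j * x + k) % size
            for j in range(n):
                low = (1 << (n - j)) - 1        # mask of a_1
                for k in range(low + 1):
                    expected = (x & ~low) | (((x & low) + k) & low)
                    assert route_thm3(x, n, j, k) == expected
    return True


print(check())  # True
```

The check only replays the derivations; it does not model the switch-level control signals.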
Proof of Theorem 6: Assume that $n \ge 2$ and $j + l + m + h < n$. Let $x = (c_5, c_4, c_3, c_2, c_1)$ be an element of $E^{(n)}$, and denote by $c'_1$ the integer such that $c_1 = (c'_1, x_1)$. The element x of $E^{(n)}$ is mapped
- by εF onto $(x_1, c_5, c_4, c_3, c_2, c'_1)$;
- by SεF onto $(x_1, c_5, c_2, c_3, c_4, c'_1)$;
- by TσSεF onto $(c_5, c_2, c_3, c_4, c'_1, x_1) = (c_5, c_2, c_3, c_4, c_1)$.
Q.E.D.
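In the same spirit, the proof of Theorem 6 can be replayed on bit fields: ε peels off $x_1$, the median switches exchange the fields $c_4$ and $c_2$ on the remaining n - 1 bits, and σ puts $x_1$ back, the first and third stages being the identity. The Python sketch below checks that this route equals the direct exchange $(c_5, c_4, c_3, c_2, c_1) \mapsto (c_5, c_2, c_3, c_4, c_1)$ for every choice of field widths with a nonempty $c_1$. The field widths are left free because their relation to j, l, m, h is fixed by Table I, which is not in this section; all function names are ours.

```python
from itertools import product
from typing import List, Sequence


def inv_shuffle(x: int, n: int) -> int:
    """epsilon: rotate an n-bit address right, (x', x_1) -> (x_1, x')."""
    return (x >> 1) | ((x & 1) << (n - 1))


def shuffle(x: int, n: int) -> int:
    """sigma: rotate an n-bit address left, (b, y') -> (y', b)."""
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))


def unpack(x: int, widths: Sequence[int]) -> List[int]:
    """Split x into bit fields, most significant field first."""
    fields = []
    for w in reversed(widths):
        fields.append(x & ((1 << w) - 1))
        x >>= w
    return fields[::-1]


def pack(fields: Sequence[int], widths: Sequence[int]) -> int:
    x = 0
    for f, w in zip(fields, widths):
        x = (x << w) | f
    return x


def exchange(x: int, widths: Sequence[int]) -> int:
    """(c5, c4, c3, c2, c1) -> (c5, c2, c3, c4, c1)."""
    c5, c4, c3, c2, c1 = unpack(x, widths)
    w5, w4, w3, w2, w1 = widths
    return pack([c5, c2, c3, c4, c1], [w5, w2, w3, w4, w1])


def routed(x: int, widths: Sequence[int]) -> int:
    """Route of the proof: F = identity, eps, S on n-1 bits, sigma, T = identity."""
    n = sum(widths)
    u = inv_shuffle(x, n)                          # (x_1, c5, c4, c3, c2, c1')
    x1, rest = u >> (n - 1), u & ((1 << (n - 1)) - 1)
    inner = list(widths[:-1]) + [widths[-1] - 1]   # c1 loses its last bit x_1
    rest = exchange(rest, inner)                   # median switches
    return shuffle((x1 << (n - 1)) | rest, n)


for n in range(2, 7):
    for widths in product(range(n + 1), repeat=5):
        if sum(widths) == n and widths[-1] >= 1:
            assert all(routed(x, widths) == exchange(x, widths) for x in range(1 << n))
print(exchange(0b10110, (1, 1, 1, 1, 1)))  # 28
```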
REFERENCES
[1] G. H. Barnes et al., "The ILLIAC IV computer," IEEE Trans.
Comput., vol. C-17, pp. 746-757, Aug. 1968.
[2] K. E. Batcher, "Sorting networks and their applications," in Proc.
Spring Joint Computer Conf., AFIPS Conf. (Montvale, N.J.: AFIPS
Press, 1968), vol. 32, pp. 307-314.
[3] K. E. Batcher, "STARAN parallel processor system hardware," in Proc. Fall Joint Computer Conf., AFIPS Conf. (Montvale, N.J.: AFIPS Press, 1974), vol. 43, pp. 405-410.
[4] K. E. Batcher, "The multi-dimensional access memory in STARAN," in Proc. 5th Sagamore Conf. Parallel Processing, Lecture Notes in Computer Science (New York: Springer, 1976), vol. 24.
[5] V. E. Benes, Mathematical Theory of Connecting Networks and Telephone Traffic. New York: Academic, 1968.
[6] P. Budnick and D. Kuck, "The organization and use of parallel
memories," IEEE Trans. Comput., vol. C-20, pp. 1566-1569, Dec.
1971.
[7] C. Clos, "A study of non-blocking switching networks," Bell Syst.
Tech. J., vol. 32, pp. 406-424, 1953.
[8] M. J. Flynn, "Very high speed computing systems," Proc. IEEE, vol.
54, pp. 1901-1909, 1966.
[9] D. Fraser, "Array permutation by index-digit permutation," J. Ass.
Comput. Mach., vol. 23, pp. 298-309, Apr. 1976.
[10] S. W. Golomb, "Permutations by cutting and shuffling," SIAM Rev., vol. 3, pp. 293-297, Oct. 1961.
[11] M. L. Graham and D. L. Slotnick, "An array computer for the class of problems typified by the general circulation model of the atmosphere," Dep. Comput. Sci., Univ. Illinois, Urbana, IL, Rep. UIUCDS-R-75-761, Dec. 1975 (IEEE Repository no. 76-83).
[12] D. Heller, "A survey of parallel algorithms in numerical linear algebra," Carnegie-Mellon Univ., Res. Rep., 1976.
[13] D. Kuck, "ILLIAC IV software and application programming," IEEE Trans. Comput., vol. C-17, pp. 758-770, Aug. 1968.
[14] T. Lang, "Interconnections between processors and memory modules
using the shuffle-exchange network," IEEE Trans. Comput., vol.
C-25, pp. 496-503, May 1976.
[15] T. Lang and H. S. Stone, "A shuffle-exchange network with simplified
control," IEEE Trans. Comput., vol. C-25, pp. 55-65, Jan. 1976.
[16] D. H. Lawrie, "Access and alignment of data in an array computer,"
IEEE Trans. Comput., vol. C-24, pp. 1145-1155, Dec. 1975.
[17] J. Lenfant, "Fast random and sequential access to dynamic memories
of any size," IEEE Trans. Comput., vol. C-26, pp. 847-855, Sept. 1977.
[18] S. B. Morris and R. E. Hartwig, "The generalized faro shuffle," Discrete Math., vol. 15, pp. 333-346, 1976.
[19] D. C. Opferman and N. T. Tsao-Wu, "On a class of rearrangeable
switching networks," Bell Syst. Tech. J., vol. 50, pp. 1579-1618,
May/June 1971.
[20] S. E. Orcutt, "Implementation of permutation functions in ILLIAC IV-type computers," IEEE Trans. Comput., vol. C-25, pp. 929-936, Sept. 1976.
[21] M. C. Pease, "An adaptation of the fast Fourier transform for parallel processing," J. Ass. Comput. Mach., vol. 15, pp. 252-264, Apr. 1968.
[22] H. D. Shapiro, "Theoretical limitations on the use of parallel memories," Ph.D. dissertation, Dep. Comput. Sci., Univ. Illinois, Urbana, IL, Rep. UIUCDCS-R-75-776, Dec. 1975 (IEEE Repository no. 76-82).
[23] W. J. Stewart, "A note on cyclic odd-even reduction," IRISA, Univ.
Rennes, Rennes, France, Res. Rep. 1977.
[24] H. S. Stone, "Dynamic memories with fast random and sequential
access," IEEE Trans. Comput., vol. C-24, pp. 1167-1174, Dec. 1975.
[25] H. S. Stone, "Parallel processing with the perfect shuffle," IEEE Trans. Comput., vol. C-20, pp. 153-161, Feb. 1971.
[26] R. C. Swanson, "Interconnections for parallel memories to unscramble p-ordered vectors," IEEE Trans. Comput., vol. C-23, pp. 1105-1116, Nov. 1974.
[27] K. J. Thurber, "Programmable indexing networks," in Proc. 1970
Spring Joint Computer Conf., AFIPS Conf. (Montvale, N.J.: AFIPS
Press, 1970), vol. 36, pp. 51-58.
[28] K. J. Thurber and L. D. Wald, "Associative and parallel processors,"
Comput. Surveys, vol. 8, pp. 215-255, 1976.
[29] A. Waksman, "A permutation network," J. Ass. Comput. Mach., vol.
15, pp. 159-163, Jan. 1968.
Jacques Lenfant was born in Boulogne-sur-mer,
France, on June 21, 1947. He received the B.S.
degree in mathematics and the M.S. degree in
algebraic topology from the University of Paris,
Paris, France, and the Ecole Normale Supérieure
de Saint-Cloud, and the Doctorat-ès-Sciences degree from the University of Rennes, Rennes,
France.
Since 1970, he has occupied various academic
positions in Rennes, with an interruption in 1975
when he was a Visiting Assistant-Professor with
the Department of Electrical and Computer Engineering, University of
Michigan, Ann Arbor. He is currently the Vice-Director of the Institut
de Recherche en Informatique et Systèmes Aléatoires (IRISA) and a
Professor of Computer Science. His research interests are in computer
system evaluation, program behavior modeling, scheduling, and parallel
processing.
Dr. Lenfant is a member of the IEEE Computer Society and the
Association for Computing Machinery. He serves as an Associate Editor
of the RAIRO, the Journal of the French Computer Society AFCET.