The Experimental Arrangement for the Control Method Validation

AU J.T. 8(1): 1-7 (Jul. 2004)
The Experimental Arrangement for the Control Method Validation
in the Fault-Tolerant Benes Network
Gennady Veselovsky, Mohammad Hasanul Karim
and Maliha Mahboob
Faculty of Engineering, Assumption University
Bangkok, Thailand
Abstract
This paper considers the experimental validation of the fault-tolerant modification
of the Benes network for the case of a fast routing algorithm. The perfect symmetry of
the two sub-networks between the outer stages of the Benes network allows the control set
computed for the ‘not faulty’ condition to be used in the presence of faults, if only one of the
sub-networks is affected. In this case, however, the given control set is divided into two subsets, which are
applied in turn to the ‘not faulty’ sub-network, provided some necessary changes are made in the
setup of switches in the outer stages. Here this general approach, proposed by
Veselovsky et al., is tested for the case of routing BPC permutations with the use of the fast
algorithm developed by Nassimi and Sahni.
Keywords: Fast routing algorithm, BPC permutations, multiple processor,
interconnection network
network. With a crossbar, full connectivity is
provided, i.e. all processors can communicate
with each other (or with parallel memory
modules) simultaneously without reduction of
bandwidth. To connect N inputs to N outputs
with a crossbar, $N^2$ switches are needed.
Consequently, crossbar networks cannot be
scaled to any arbitrary size. Multistage interconnection networks (MINs) provide more
cost-effective communication than crossbar
networks. An N x N multistage switching network is
a communication network with N input
terminals (sources) and N output terminals
(destinations) composed of a certain number of
stages of 2 x 2 switching elements,
with a total number of
cross-points equal to $O(N \log_2 N)$.
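The cost comparison above can be illustrated with a short calculation. This is only a sketch: it uses the stage count 2n-1 (with N = 2^n) that the paper gives for the Benes network, and it assumes four cross-points per 2 x 2 switching element, which is a common convention rather than a figure stated in the paper. The function names are illustrative.

```python
def crossbar_crosspoints(n_ports):
    """Cross-points of an N x N crossbar: N^2."""
    return n_ports ** 2


def benes_switch_count(n_ports):
    """Number of 2 x 2 switches in an N x N Benes network, N = 2^n.

    There are 2n-1 stages with N/2 switches each.
    """
    n = n_ports.bit_length() - 1
    if n < 1 or (1 << n) != n_ports:
        raise ValueError("n_ports must be a power of two, at least 2")
    return (2 * n - 1) * (n_ports // 2)


def benes_crosspoints(n_ports):
    """Cross-points, assuming 4 per 2 x 2 switch (an assumption of this sketch)."""
    return 4 * benes_switch_count(n_ports)


for size in (8, 64, 1024):
    print(size, crossbar_crosspoints(size), benes_crosspoints(size))
```

For N = 8 the count gives 20 switches, which agrees with the number of switching elements built for the experimental arrangement. The advantage over the crossbar appears only as N grows.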
A class of non-blocking multistage
switching networks that are of considerable
practical interest was proposed by Clos (1953)
and Benes (1964; 1965). The objective was to
develop a multistage network realizing crossbar
switching functions; i.e. no blocking, yet
having fewer connection points. Such networks
are fairly efficient in terms of hardware costs
and are known in the literature as
Introduction
There has been a long debate on whether
one fast processor would be faster and more
cost effective than a system with more than one
slower, but less expensive processor. At present,
however, the consensus appears to be that multiple
processors must be used to obtain
improvements in speed even though the
difficulties of using more than one processor
on a single problem have not been resolved.
Multiprocessor systems are also designed to
gain fault tolerance, i.e. to be able to continue
operating in the presence of hardware (and
possibly software) faults. Hardware fault
tolerance is achieved by the addition of circuits
that are not necessary for normal operation, but
enable the system to continue if the faults
occur. The number of faults that can be present
is limited and dependent upon the system design.
The interconnection network is a key
component of a parallel computer architecture.
Interconnection networks can be used for
internal connections among processors, memory
modules and I/O devices. A crossbar is an
example of a single-stage interconnection
two network cycles. If the first and last stages
are augmented with multiplexers/demultiplexers,
then the modified Benes
network will be tolerant to any single fault.
The structure of the fault tolerant modification
of the 8 x 8 Benes network is shown in Fig. 1.
Similar bypassing circuitry in the outer stages of a
network was proposed by Adams and Siegel
(1982) in connection with the extra stage cube
topology. If a fault is detected, the data in the
first pass is switched through the non-faulty
paths. Because of the perfect symmetry of the
two alternative paths between the same pair of
input/output terminals, the data is switched to
the alternative path in the second pass by
applying the control signals intended for the
switching elements of one “storey” of the
intermediate stages (or one sub-network) to the
elements of the other “storey” (or the other sub-network). The control signals are generated in
advance for the given permutation by one of
the known algorithms, assuming a non-faulty
network. The control for the switching
elements of the first and last stages
participating in the unrealized connection is
reversed, i.e. if one element was initially
connected “straight”, it is now switched to
“exchange” and conversely.
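The two-pass scheme described above can be sketched in a few lines for a fault in the upper sub-network. This is a simplified model, not the authors' implementation: each sub-network is represented only by the input-to-output mapping that its control signals realize, the outer-stage states are derived from bit 0 of the destination tag of the upper input (as in the routing algorithm used later in the paper), state 0 is taken to mean "straight" and state 1 "exchange", and all function names are hypothetical.

```python
def lower_only_pass(data, first, lower_map, last):
    """One network cycle in which only the lower sub-network carries data."""
    half = len(first)
    # Lower output of first-stage switch i: input 2i+1 if straight, 2i if exchange.
    sub_in = [data[2 * i + 1 - first[i]] for i in range(half)]
    sub_out = [None] * half
    for port, item in enumerate(sub_in):
        sub_out[lower_map[port]] = item
    out = [None] * len(data)
    # Lower input of last-stage switch j: output 2j+1 if straight, 2j if exchange.
    for j, item in enumerate(sub_out):
        out[2 * j + 1 - last[j]] = item
    return out


def two_pass_route(tags, data):
    """Realize the permutation given by tags while the upper sub-network is faulty."""
    half = len(tags) // 2
    # Control set computed in advance, as for a non-faulty network.
    first = [tags[2 * i] & 1 for i in range(half)]
    upper_in = [tags[2 * i + first[i]] for i in range(half)]
    lower_in = [tags[2 * i + 1 - first[i]] for i in range(half)]
    upper_map = [d >> 1 for d in upper_in]   # mapping realized by upper controls
    lower_map = [d >> 1 for d in lower_in]   # mapping realized by lower controls
    last = [0] * half
    for d in upper_in:
        last[d >> 1] = d & 1

    def invert(states):
        return [1 - s for s in states]

    # Pass 1: unchanged controls; only paths through the lower sub-network work.
    pass1 = lower_only_pass(data, first, lower_map, last)
    # Pass 2: upper controls applied to the lower sub-network, outer stages reversed.
    pass2 = lower_only_pass(data, invert(first), upper_map, invert(last))
    return [a if a is not None else b for a, b in zip(pass1, pass2)]


tags = [0, 4, 2, 6, 1, 5, 3, 7]          # bit reversal for N = 8
data = list("abcdefgh")
print(two_pass_route(tags, data))
```

Each pass delivers half of the items, and together the two passes deliver every item to the output named by its destination tag.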
If the fault is detected in the first or last
stage, then the network control is somewhat
different. Specifically, if the fault is detected in
a switching element of the first (last) stage
connected “straight”, then this element is
disconnected, the bypass multiplexers/demultiplexers
are activated, and the required
switching can be realized even in a single pass,
because all the other control signals are not
altered. If the faulty element was realizing the
exchange function, then the faulty element is
disconnected, all the input/output elements,
except those connected to the faulty element,
are switched in the first pass, and the bypass
multiplexers/demultiplexers of the faulty
element are activated in the second pass, when
the control signals from the first storey of
intermediate stages are transferred to the
second storey and conversely, thus activating two
alternative paths relative to the initial paths in
the faulty network. So, the modified Benes
network allows decomposition of the entire
rearrangeable, because in case of blocking
there is always a possibility of realizing the
desired connection by rearranging some of the
previously established connections. The Benes
network can perform any arbitrary permutation
of N inputs. The term ‘permutation’ in this
context defines a request for parallel
connecting N sources to N destinations with a
distinct destination for each of the sources. The
best-known setup algorithm for performing an
arbitrary permutation by the Benes network
requires $O(N \log_2 N)$ time. There is, however, a fast
routing algorithm, developed by Nassimi and
Sahni (1981), with which the Benes network
can perform certain permutations in $O(\log_2 N)$
time (including the setup time). It was
demonstrated that the set F (the
set of permutations realizable on the “fast routing”
Benes network) is rich enough to include most classes of
permutations studied in the parallel processing
literature. Here we limit our discussion to
BPC (bit-permute-complement) permutations,
which belong to the broader set F. A
permutation is called a BPC permutation if the
destination address can be obtained from the
source address by permuting bits in the source
address $(s_{n-1} s_{n-2} \dots s_0)$ and/or complementing
some or all of its bit positions.
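The definition above can be made concrete with a small sketch. The parameter names `bit_order` and `complement_mask` are illustrative choices of this example, not notation from the paper: `bit_order[k]` names the source bit copied into destination bit k, and `complement_mask` marks the destination bits that are complemented afterwards.

```python
def bpc_destination(source, n, bit_order, complement_mask=0):
    """Destination address of a BPC permutation on n-bit addresses."""
    dest = 0
    for k in range(n):
        # Copy source bit bit_order[k] into destination bit k.
        dest |= ((source >> bit_order[k]) & 1) << k
    # Complement the selected destination bits.
    return dest ^ complement_mask


def bpc_permutation(n, bit_order, complement_mask=0):
    """List of destination tags for sources 0 .. 2^n - 1."""
    return [bpc_destination(s, n, bit_order, complement_mask)
            for s in range(1 << n)]


# Reversing the bit order of 3-bit addresses gives a BPC permutation.
print(bpc_permutation(3, bit_order=[2, 1, 0]))
# Complementing every bit without permuting is also a BPC permutation.
print(bpc_permutation(2, bit_order=[0, 1], complement_mask=0b11))
```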
With a large number of processors
(hundreds and thousands), the networks are
fairly complex devices. The switching control
algorithm is also fairly complex, especially for the
Benes network. Since the reliability of a
switching network has a direct impact on the
reliability of the entire system, reliability
improvement can be achieved by designing
networks that are fault tolerant with respect to
single faults (usually). Fault tolerance implies
neutralization of the effect of faults in the
network by providing static and dynamic
redundancies.
The Fault Tolerant Modification of
the Benes Network
In the Benes network, there is more than
one independent path for each input-output
pair, so that in principle it is possible to bypass
single faults and to realize any given
permutation in two passes or in other words, in
Fig. 1. The fault tolerant modification of the 8 x 8 Benes network
network into two sub-networks of equal size,
and the non-faulty half can emulate the entire
network. This decomposition is useful during
the recovery of the faulty half of the network.
The aforesaid approach was originally
introduced by Veselovsky et al. (1989).
The state of a switch in stage b or stage 2n-2-b, 0 ≤ b ≤ n-1, is determined by bit b of the
destination tag of its upper input. If bit b is 0,
the switch is set to state 0, otherwise to state
1. One of the permutations most frequently used in parallel
programming is the so-called
bit reversal. This permutation belongs to the
BPC class. With bit reversal, the destination
tag for each input/output pair is produced
from the source tag by reversing the order of
the bits in its binary representation.
For example, in a 32 x 32 Benes network the
destination tag for source tag 10110 under bit
reversal is 01101. Bit reversal
is particularly useful when
programming the Fast Fourier Transform for
an array computer. The bit reversal permutation
for N = 8, with decimal representation of
source and destination tags, is given below
(numbers in the upper row are source tags
and numbers in the lower row are destination
tags, respectively):
Algorithm for Fast Routing in
the Benes Network
To test experimentally the validity of
the aforesaid technique for providing fault
tolerance in the Benes network, the routing
algorithm proposed by Nassimi and Sahni
(1981) was chosen. To describe the method,
let $D_i$ be the “destination tag” on input
terminal i, 0 ≤ i ≤ N-1. Here
$(D_{N-1} D_{N-2} \dots D_1 D_0)$ is a permutation of (N-1, …, 1, 0). The
data at input terminal i is to be routed to
output terminal $D_i$. The switch settings are
determined from the binary representation of
$D_i$. If an $N = 2^n$ input/output Benes network
is being used, then there are 2n-1 stages of
switches whose settings need to be obtained.
Let the stages be numbered 0 through 2n-2.
$$p = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 0 & 4 & 2 & 6 & 1 & 5 & 3 & 7 \end{pmatrix}$$
Fig. 2 shows the switch settings in the 8 x 8
Benes network obtained with the
Nassimi and Sahni routing algorithm to
realize the bit reversal permutation.
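The self-routing rule can be simulated with a short recursive sketch. It assumes the usual recursive construction of the Benes network (a first stage, two half-size sub-networks, a last stage), with the upper output of first-stage switch i wired to input i of the upper sub-network; the wiring as drawn in Fig. 2 may be arranged differently. State 0 is taken as "straight" and state 1 as "exchange". The function name is illustrative.

```python
def self_route(tags, bit=0):
    """Pass destination tags through a self-routed Benes network.

    tags[i] is the destination tag at input i. The result lists the tag
    that arrives at each output. Each switch of the stage pair controlled
    by `bit` is set from that bit of the tag on its upper input.
    """
    size = len(tags)
    if size == 2:
        # Middle stage: a single 2 x 2 switch.
        if (tags[0] >> bit) & 1:
            return [tags[1], tags[0]]
        return list(tags)

    upper, lower = [], []
    for i in range(0, size, 2):
        a, b = tags[i], tags[i + 1]
        if (a >> bit) & 1:          # exchange
            a, b = b, a
        upper.append(a)
        lower.append(b)

    # The two sub-networks are controlled by the next tag bit.
    upper = self_route(upper, bit + 1)
    lower = self_route(lower, bit + 1)

    out = []
    for a, b in zip(upper, lower):
        if (a >> bit) & 1:          # exchange, again set by the upper input
            a, b = b, a
        out.extend([a, b])
    return out


bit_reversal_tags = [0, 4, 2, 6, 1, 5, 3, 7]
print(self_route(bit_reversal_tags))
```

A permutation is routed correctly when output j receives the tag j. For the bit reversal tags above every tag reaches its own output, with stages 0 to 4 controlled by bits 0, 1, 2, 1, 0 of the tags.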
Fig. 2. An 8 x 8 Benes network with switches set up to realize bit reversal permutation
(destination tags are shown as d2d1d0; stages 0, 1, 2, 3 and 4 are controlled by bits d0, d1, d2, d1 and d0, respectively)
any hexadecimal number from 0 to F. On this
board, there were two 74LS688 ICs (8-bit
magnitude comparators) to compare the
incoming address bits from the ISA bus (A4-A11) with the address bits set on two DIP
switches. After comparing the address bits, the
outputs of both comparator ICs were fed
into one OR gate. Address bit A2 from the ISA bus
and the output of this OR gate were then
fed to another OR gate, which was used to select
Port I (one of the two 82C55A ICs). To select
Port II (the other 82C55A IC), the inverted A2
address bit was ORed with the output of
the first OR gate.
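The address decoding described above can be modelled as follows. Several details are assumptions of this sketch and are not stated in the paper: the DIP switches are taken to be set for the base address 300H, the chip-select signals are taken to be active low (so Port I is selected when A2 = 0), address bit A3 is ignored, and A1-A0 are assumed to select the 82C55A register in the standard order (port A, port B, port C, control).

```python
DIP_SWITCH_VALUE = 0x30   # assumed setting for A4-A11, i.e. addresses 30xH

REGISTERS = ("PA", "PB", "PC", "CONTROL")


def decode(address):
    """Return (port, register) selected by an ISA I/O address, or None."""
    # Comparators: A4-A11 must match the DIP-switch setting.
    if (address >> 4) & 0xFF != DIP_SWITCH_VALUE:
        return None
    # A2 chooses between the two 82C55A chips.
    port = "Port II" if (address >> 2) & 1 else "Port I"
    # A1-A0 choose the register inside the chip (assumed wiring).
    return port, REGISTERS[address & 0b11]


print(decode(0x300))
print(decode(0x307))
print(decode(0x310))
```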
Each of the 82C55A programmable
parallel ports had three 8-bit ports, namely PA,
PB and PC. PA of Port I was used for taking input
from the user and outputting it into the Benes
network, PB of Port I for taking input from the
outputs of the Benes network, PC of Port I for
handling faulty conditions, and PA, PB and PC (lower)
of Port II for setting the control bits of the
switches of the Benes network. PC (upper) of
Port II was fed to a 3-to-8 decoder for enabling
Arrangement Design
To test the aforesaid approach
experimentally, the 8 x 8 fault tolerant modification
of the Benes network was implemented with
the option to insert faults by removing any
selected switch(es). The Nassimi and Sahni
routing algorithm was programmed to compute
automatically the control set necessary for
realizing an assigned BPC permutation. The
network was interfaced to a personal computer
to receive control sets and input data. The
possibility of sending permuted data back to the
computer was also provided. The arrangement
demonstrated the capability of the implemented
network to realize an assigned BPC
permutation in the presence of a fault in two
passes through the network. The hardware of
the experimental arrangement consisted of two
boards. On board I, the circuitry for the address
decoder was implemented. Fig. 3 gives the
block diagram of board I. The addresses
used for both boards are 30x, where x can be
fault tolerance for the input and output sides,
the first and the last stages of the Benes network
were augmented with 8 demultiplexer
(74LS155) and 8 multiplexer (74LS157) ICs
along with 8 inverter (74LS04) ICs.
The arrangement described above was successfully
tested and demonstrated the ability to realize
BPC permutations in the presence of any single
fault in the modified Benes network in two
network cycles.
However, it is clear that nothing prevents
an arbitrary permutation from being
realized in a similar way under a faulty
condition in the network, but in this case the
control set should be computed using a more
complex algorithm than that used in our
research; for example, the looping algorithm
proposed by Opferman and Tsao-wu (1971)
the latches in Board II. Both of the 82C55A
ICs were set in Mode 0 (Basic Input/Output)
where the control word for Port I is 82H and
the control word for Port II is 80H.
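The two control words can be checked against the 82C55A mode-definition format, in which bit 7 is set, the mode bits are zero for Mode 0, and bits 4, 3, 1 and 0 select input (1) or output (0) for port A, port C upper, port B and port C lower. The helper below is a hypothetical illustration of that format, not code from the experimental arrangement.

```python
def mode0_control_word(pa_in=False, pc_upper_in=False,
                       pb_in=False, pc_lower_in=False):
    """Build an 82C55A Mode 0 control word; True means the port is an input."""
    word = 0x80                      # bit 7: mode-set flag
    word |= int(pa_in) << 4          # bit 4: port A direction
    word |= int(pc_upper_in) << 3    # bit 3: port C upper direction
    word |= int(pb_in) << 1          # bit 1: port B direction
    word |= int(pc_lower_in)         # bit 0: port C lower direction
    return word


# Port I: PB reads the network outputs, PA and PC are outputs.
print(hex(mode0_control_word(pb_in=True)))
# Port II: all ports are outputs (control bits and latch enables).
print(hex(mode0_control_word()))
```

The results, 82H and 80H, agree with the port usage described for Port I (PB as input) and Port II (all outputs).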
As to board II, whose block diagram is
presented in Fig. 4, there is one 3-to-8
decoder (74LS138) for enabling seven octal
D-type transparent latches (74373) to capture the
inputs from the ports of board I at the proper
time. The seven latches are used for holding the
appropriate values until they are refreshed. One
octal 3-state buffer (74LS244) is used with
each latch to provide improved noise rejection
and high fan-out outputs. LEDs were also used
to display the input data going to and output
data coming from the Benes network.
To implement the Benes network, 20
switching elements were constructed using
74LS08, 74LS32 and 74LS04 ICs. To provide
Fig. 3. Block diagram of board I
(address decoder fed by A4-A11 from the ISA bus; two OR gates combining the decoder output with A2 from the ISA bus; programmable parallel ports 1 and 2, each connected to D0-D7 of the ISA bus and providing PA0-PA7, PB0-PB7 and PC0-PC7)
Fig. 4. Block diagram of board II
(a 3-to-8 decoder fed by PC4, PC5 and PC6 of programmable parallel port 2 generates the load signals LOAD_INPUT, LOAD_OUTPUT, LOAD_C11_C24, LOAD_C31_C44, LOAD_C51_C54, LOAD_DM and LOAD_AND; octal D-type latches with octal 3-state buffers hold the input data IN0-IN7, the control bits C11-C24, C31-C44 and C51-C54 of switches SW11-SW54, the DM0-DM7 inputs and the AND-gate inputs; LED indicators display the input data and the output data OUT0-OUT7)
Note: To reduce complexity of the diagram demultiplexers and multiplexers in input and output
stages and the connections between switches of different stages are not shown.
may be considered. It is also clear that the idea
implemented in our arrangement may be used
for exchanging data packets of any size, rather
than only one bit, between the processors, or
between processors and modules of parallel
memory. It is pertinent to note that, for actual
parallel computing systems using multistage
interconnection networks, techniques for
automatic diagnosis of faults in a network have
already been developed. The work by Feng and
Wu (1981) in this field is well known.
Fig. 5. Logic diagram of a 2 x 2 switch
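A 2 x 2 switching element can be modelled as two 2-to-1 selectors built from AND, OR and NOT gates, which are the gate types provided by the 74LS08, 74LS32 and 74LS04 ICs named in the text. The gate-level structure below is an assumption of this sketch, since the logic diagram itself is not reproduced here; control value 0 is taken as "straight" and 1 as "exchange".

```python
def switch_2x2(in0, in1, control):
    """One-bit 2 x 2 switch: control 0 = straight, control 1 = exchange."""
    not_control = 1 - control                        # inverter (74LS04)
    out0 = (in0 & not_control) | (in1 & control)     # AND (74LS08), OR (74LS32)
    out1 = (in1 & not_control) | (in0 & control)
    return out0, out1


print(switch_2x2(1, 0, control=0))
print(switch_2x2(1, 0, control=1))
```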
Connecting Networks and Telephone Traffic,
Academic Press, New York, NY. USA.
Clos, C. 1953. A study of non-blocking
switching networks. Bell Syst. Tech. J. 32:
406-24.
Feng, T.Y.; and Wu, C.L. 1981. Fault diagnosis for a class of multistage
interconnection networks. IEEE Trans.
Comput. C-30(10): 743-58.
Nassimi, D.; and Sahni, S. 1981. A self-routing
Benes network and parallel permutation
algorithms. IEEE Trans. Comput. C-30(5):
332-40.
Opferman, D. C.; and Tsao-wu, N. T. 1971. On
a class of rearrangeable switching networks.
Bell Syst. Tech. J., 50: 1579-618.
Siegel, H.J.; and Smith, D.S. 1978. Study of
Multistage SIMD Interconnection
Networks. Proc. 5th Annual Symp. on
Computing Architectures, pp. 223-9, IEEE
Computer Society, Boston, Mass., USA.
Veselovskii, G.G.; Karavai, M.F.; and
Kuznechik, S.M. 1989. Switching
Networks for SIMD Multiprocessor
Computing Systems, Automation and
Remote Control, Plenum Publ. Corp., New
York, Vol. 5, No. 2, Part 2: 133-51.
Conclusion
Fault tolerance is one of the essential
features required of an interconnection
network in an actual parallel computer. An
approach for providing fault-tolerant
functioning of a Benes network, based on the
use of a control set computed for non-faulty
conditions but divided into two subsets which
are applied in turn to the non-faulty sub-network, was experimentally tested. The
experiments were done for BPC permutation
setups with the use of a specific fast routing
algorithm, and verified the validity of the
aforesaid approach.
References
Adams, III, G. B.; and Siegel, H. J. 1982. The
extra stage cube: A fault-tolerant interconnection network for supersystems. IEEE
Trans. Comput. C-31(5): 443-54.
Benes, V.E. 1964. Optimal rearrangeable
multistage connecting network. Bell Syst.
Tech. J. 43: 1641-56.
Benes, V.E. 1965. Mathematical Theory of