AU J.T. 8(1): 1-7 (Jul. 2004)

The Experimental Arrangement for the Control Method Validation in the Fault-Tolerant Benes Network

Gennady Veselovsky, Mohammad Hasanul Karim and Maliha Mahboob
Faculty of Engineering, Assumption University
Bangkok, Thailand

Abstract

This paper considers the experimental checking of the fault-tolerant modification of the Benes network and the ease of using a fast routing algorithm with it. The perfect symmetry of the two sub-networks between the outer stages of the Benes network allows the control set computed for the ‘not faulty’ condition to be used in the presence of faults, provided that only one of the sub-networks is affected. In this case, however, the given control set is divided into two subsets, which are applied to the ‘not faulty’ sub-network. This, in turn, requires some changes in the setup of the switches in the outer stages. Here this general approach, proposed by Veselovsky et al., is tested for the case of routing BPC permutations with the fast algorithm developed by Nassimi and Sahni.

Keywords: Fast routing algorithm, BPC permutations, multiple processor, interconnection network

network. With a crossbar, full connectivity is provided, i.e. all processors can communicate with each other (or with parallel memory modules) simultaneously without reduction of bandwidth. To connect N inputs to N outputs with a crossbar, N^2 switches are needed. Consequently, crossbar networks cannot be scaled to an arbitrary size. Multistage interconnection networks (MINs) provide more cost-effective communication than crossbar networks. An N x N multistage switching network is a communication network with N input terminals (sources) and N output terminals (destinations), composed of a certain number of stages of 2 x 2 switching elements, with a total number of cross-points equal to O(N log2 N). A class of non-blocking multistage switching networks of considerable practical interest was proposed by Clos (1953) and Benes (1964; 1965).
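The cost argument above can be made concrete with a short sketch. It assumes the standard Benes construction with 2 log2 N - 1 stages of N/2 two-by-two switching elements and, as a further assumption, counts four cross-points per 2 x 2 element. The function names are illustrative only.

```python
def crossbar_crosspoints(n_ports):
    """Cross-points of an N x N crossbar: one per input/output pair."""
    return n_ports ** 2


def benes_switches(n_ports):
    """2 x 2 switching elements in an N x N Benes network (N a power of two).

    Assumes 2*log2(N) - 1 stages with N/2 elements per stage.
    """
    n = n_ports.bit_length() - 1          # log2(N) for a power of two
    if n < 1 or 2 ** n != n_ports:
        raise ValueError("N must be a power of two and at least 2")
    return (2 * n - 1) * (n_ports // 2)


def benes_crosspoints(n_ports):
    """Cross-points, assuming four per 2 x 2 element (an assumption)."""
    return 4 * benes_switches(n_ports)


for n_ports in (8, 64, 1024):
    print(n_ports, crossbar_crosspoints(n_ports),
          benes_switches(n_ports), benes_crosspoints(n_ports))
```

For N = 8 the sketch gives 20 switching elements. Under the four-cross-point assumption the saving over a crossbar only appears as N grows.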
The objective was to develop a multistage network that realizes the switching functions of a crossbar, i.e. no blocking, yet has fewer connection points. Such networks are fairly efficient in terms of hardware costs and are known in the literature as

Introduction

There has been a long debate on whether one fast processor would be faster and more cost-effective than a system with several slower but less expensive processors. At present, however, the consensus appears to be that multiple processors must be used to obtain improvements in speed, even though the difficulties of using more than one processor on a single problem have not been resolved. Multiprocessor systems are also designed to gain fault tolerance, i.e. to be able to continue operating in the presence of hardware (and possibly software) faults. Hardware fault tolerance is achieved by adding circuits that are not necessary for normal operation but enable the system to continue if faults occur. The number of faults that can be tolerated is limited and depends on the system design. The interconnection network is the key component of a parallel computer architecture. Interconnection networks can be used for internal connections among processors, memory modules and I/O devices. The crossbar is an example of a single-stage interconnection

two network cycles. If the first and last stages are augmented with multiplexers/demultiplexers, then the modified Benes network will be tolerant to any single fault. The structure of the fault-tolerant modification of the 8 x 8 Benes network is shown in Fig. 1. Similar bypassing circuitry in the outer stages of a network was proposed by Adams and Siegel (1982) in connection with the extra stage cube topology. If a fault is detected, the data in the first pass is switched through the non-faulty paths.
Because of the perfect symmetry of the two alternative paths between the same pair of input/output terminals, the data is switched to the alternative path in the second pass by applying the control signals intended for the switching elements of one “storey” of intermediate stages (one sub-network) to the elements of the other “storey” (the other sub-network). The control signals are generated in advance for the given permutation by one of the known algorithms, assuming a non-faulty network. The control for the switching elements of the first and last stages participating in the unrealized connection is reversed, i.e. if an element was initially set to “straight”, it is now switched to “exchange” and conversely.

If the fault is detected in the first or last stage, the network control is somewhat different. Specifically, if the fault is detected in a switching element of the first (last) stage that is set to “straight”, this element is disconnected, the bypass multiplexers/demultiplexers are activated, and the required switching can be realized in a single pass, because none of the other control signals is altered. If the faulty element was realizing the exchange function, it is disconnected, all the input/output elements except those connected to the faulty element are switched in the first pass, and the bypass multiplexers/demultiplexers of the faulty element are activated in the second pass, when the control signals of the first storey of intermediate stages are transferred to the second storey and conversely, thus activating the two paths alternative to the initial paths in the faulty network. So, the modified Benes network allows decomposition of the entire

rearrangeable, because in case of blocking there is always a possibility of realizing the desired connection by rearranging some of the previously established connections. The Benes network can perform any arbitrary permutation of N inputs.
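The two-pass control for a fault inside one sub-network, as described above, can be illustrated with a small simulation. The sketch below is a simplification and not the authors' implementation. It assumes the usual recursive Benes wiring and state 0 = “straight”, 1 = “exchange”. It treats the whole upper sub-network as unusable rather than a single switch, and it reverses all outer-stage switches in the second pass. The nested control tuple and the function names are hypothetical. The control values are a set that realizes the bit reversal permutation for N = 8 on a fault-free network in this model.

```python
def run(ctl, data, upper_faulty=False):
    """Pass data through a Benes (sub-)network described by ctl.

    ctl is an int (state of a single middle switch) or a tuple
    (first_stage, upper_ctl, lower_ctl, last_stage).
    """
    if isinstance(ctl, int):
        return [data[ctl], data[1 - ctl]]
    first, upper_ctl, lower_ctl, last = ctl
    half = len(first)
    # State 0 sends the upper input of switch i to the upper sub-network.
    upper_in = [data[2 * i + first[i]] for i in range(half)]
    lower_in = [data[2 * i + 1 - first[i]] for i in range(half)]
    # In this model a faulty upper sub-network delivers nothing (None).
    upper_out = [None] * half if upper_faulty else run(upper_ctl, upper_in)
    lower_out = run(lower_ctl, lower_in)
    out = []
    for i in range(half):
        pair = (upper_out[i], lower_out[i])
        out += [pair[last[i]], pair[1 - last[i]]]
    return out


def flip(states):
    """Reverse straight/exchange for every switch of an outer stage."""
    return [1 - s for s in states]


def second_pass(ctl):
    """Swap the sub-network control subsets and reverse the outer stages."""
    first, upper_ctl, lower_ctl, last = ctl
    return (flip(first), lower_ctl, upper_ctl, flip(last))


# Control set computed in advance for the fault-free network (bit reversal).
CONTROL = ([0, 0, 1, 1],
           ([0, 0], 0, 0, [0, 0]),
           ([0, 0], 1, 1, [0, 0]),
           [0, 0, 1, 1])
# Each item is its own destination tag, so output j should receive j.
tags = [0, 4, 2, 6, 1, 5, 3, 7]

pass1 = run(CONTROL, tags, upper_faulty=True)
pass2 = run(second_pass(CONTROL), tags, upper_faulty=True)
merged = [a if a is not None else b for a, b in zip(pass1, pass2)]
print(pass1)
print(pass2)
print(merged == list(range(8)))
```

In this model each pass delivers the half of the connections that travel through the non-faulty sub-network, and the two passes together cover all eight outputs.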
The term ‘permutation’ in this context denotes a request to connect N sources to N destinations in parallel, with a distinct destination for each of the sources. The best-known setup algorithm for performing an arbitrary permutation on the Benes network requires O(N log2 N) time. There is also a fast routing algorithm, developed by Nassimi and Sahni (1981), with which the Benes network can perform certain permutations in O(log2 N) time (including the setup time). It was demonstrated that the set F (the set of permutations realizable on the “fast routing” Benes network) is rich enough to include most classes of permutations studied in the parallel processing literature. Here we limit our discussion to BPC (bit-permute-complement) permutations, which belong to the broader set F. A permutation is called a BPC permutation if the destination address can be obtained from the source address (s_{n-1} s_{n-2} … s_0) by permuting its bits and/or complementing some or all of its bit positions.

With a large number of processors (hundreds or thousands), the networks are fairly complex devices. The switching control algorithm is also fairly complex, especially for the Benes network. Since the reliability of a switching network has a direct impact on the reliability of the entire system, reliability can be improved by designing networks that are fault tolerant, usually with respect to single faults. Fault tolerance implies neutralizing the effect of faults in the network by providing static and dynamic redundancy.

The Fault Tolerant Modification of the Benes Network

In the Benes network, there is more than one independent path for each input-output pair, so that in principle it is possible to bypass single faults and to realize any given permutation in two passes or, in other words, in

Fig. 1.
The fault tolerant modification of the 8 x 8 Benes network

network into two sub-networks of equal size, and the non-faulty half can emulate the entire network. This decomposition is useful during the recovery of the faulty half of the network. The aforesaid approach was originally introduced by Veselovsky et al. (1989).

Algorithm for Fast Routing in the Benes Network

To test experimentally the validity of the aforesaid technique for providing fault tolerance in the Benes network, the routing algorithm proposed by Nassimi and Sahni (1981) was chosen. To describe the method, let D_i be the “destination tag” on input terminal i, 0 ≤ i ≤ N-1. Here (D_{N-1}, D_{N-2}, …, D_1, D_0) is a permutation of (N-1, …, 1, 0). The data at input terminal i is to be routed to output terminal D_i. The switch settings are determined from the binary representation of D_i. If a Benes network with N = 2^n inputs/outputs is being used, then there are 2n-1 stages of switches whose settings need to be obtained. Let the stages be numbered 0 through 2n-2.

The state of a switch in stage b or stage 2n-2-b, 0 ≤ b ≤ n-1, is determined by bit b of the destination tag of its upper input. If bit b is 0, the switch is set to state 0, otherwise to state 1. One of the permutations most frequently used in parallel programming is the so-called bit reversal, which belongs to the BPC class. With bit reversal, the destination tag for each input/output pair is produced from the source tag by reversing the order of the bits in its binary representation. E.g. in a 32 x 32 Benes network the destination tag for source tag 10110 under bit reversal is 01101. Bit reversal is very useful, in particular, when programming the Fast Fourier Transform for an array computer. The bit reversal permutation for N = 8, with source and destination tags in decimal representation, is given below (the numbers in the upper row are source tags and the numbers in the lower row are the corresponding destination tags):
p = ( 0 1 2 3 4 5 6 7
      0 4 2 6 1 5 3 7 )

Fig. 2 shows the switch settings in the 8 x 8 Benes network obtained with the Nassimi and Sahni routing algorithm to realize the bit reversal permutation.

Stage:   0  1  2  3  4
Control: d0 d1 d2 d1 d0

Fig. 2. An 8 x 8 Benes network with switches set up to realize bit reversal permutation

any hexadecimal number from 0 to F. In this board, there were two 74LS688 ICs (8-bit magnitude comparators) to compare the incoming address bits from the ISA bus (A4-A11) with the address bits set on two DIP switches. After the comparison, the outputs of both comparator ICs were fed into one OR gate. Address bit A2 from the ISA bus and the output of this OR gate were then fed to another OR gate, which is used to select Port I (one of the two 82C55A ICs). To select Port II, the other 82C55A IC, the inverted A2 address bit was ORed with the output of the first OR gate. Each of the 82C55A programmable parallel ports had three 8-bit ports, namely PA, PB and PC. PA of Port I was used for taking input from the user and sending it to the Benes network, PB of Port I for reading the outputs of the Benes network, and PC of Port I for handling faulty conditions; PA, PB and PC (lower) of Port II were used for setting the control bits of the switches of the Benes network. PC (upper) of Port II was fed to a 3-to-8 decoder for enabling

Arrangement Design

To test the aforesaid approach experimentally, the 8 x 8 fault-tolerant modification of the Benes network was implemented with the option of inserting faults by removing any selected switch(es). The Nassimi and Sahni routing algorithm was programmed to compute automatically the control set necessary for realizing an assigned BPC permutation.
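How such a control set can be computed is sketched below for the bit reversal permutation with N = 8. This is not the program used in the experiment. It assumes the usual recursive Benes wiring (switch i of an outer stage connects terminals 2i and 2i+1 to terminal i of the upper and lower sub-networks) and state 0 = “straight”, 1 = “exchange”. It applies the rule that a switch in stage b or 2n-2-b is set by bit b of the destination tag on its upper input. The function names and the ordering of switches within a stage (sub-network by sub-network, upper first) are assumptions of this sketch.

```python
def bit(value, b):
    """Bit b of an integer destination tag."""
    return (value >> b) & 1


def fast_route(tags, n, b=0):
    """Self-routing setup for the (sub-)network whose outer stages are b and 2n-2-b.

    Returns (tags as they leave the sub-network, {stage: list of switch states}).
    """
    half = len(tags) // 2
    if half == 1:                       # single middle-stage switch, b == n - 1
        s = bit(tags[0], b)
        return [tags[s], tags[1 - s]], {b: [s]}
    # Stage b: each switch is set by bit b of the tag on its upper input.
    first = [bit(tags[2 * i], b) for i in range(half)]
    up_out, up_ctl = fast_route([tags[2 * i + first[i]] for i in range(half)], n, b + 1)
    lo_out, lo_ctl = fast_route([tags[2 * i + 1 - first[i]] for i in range(half)], n, b + 1)
    # Stage 2n-2-b: the upper input now comes from the upper sub-network.
    last = [bit(up_out[i], b) for i in range(half)]
    out = []
    for i in range(half):
        pair = (up_out[i], lo_out[i])
        out += [pair[last[i]], pair[1 - last[i]]]
    controls = {b: first, 2 * n - 2 - b: last}
    for stage, states in up_ctl.items():
        controls[stage] = states + lo_ctl[stage]
    return out, controls


def bit_reversal(i, n):
    """Destination tag of source i under bit reversal on n bits."""
    return int(format(i, f"0{n}b")[::-1], 2)


n = 3
tags = [bit_reversal(i, n) for i in range(2 ** n)]
outputs, controls = fast_route(tags, n)
for stage in sorted(controls):
    print("stage", stage, controls[stage])
# Output terminal j must receive the item whose destination tag is j.
print("routed correctly:", outputs == list(range(2 ** n)))
```

The final comparison matters because the fast rule is only guaranteed for permutations in the set F. For an arbitrary permutation the same check may fail, and a general setup algorithm would be needed.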
The network was interfaced to a personal computer to receive control sets and input data. The possibility of sending the permuted data back to the computer was also provided. The arrangement demonstrated the capability of the implemented network to realize an assigned BPC permutation in the presence of a fault in two passes through the network.

The hardware of the experimental arrangement consisted of two boards. In board I, the circuitry for the address decoder was implemented. Fig. 3 gives the block diagram of board I. The addresses used for both boards are 30x, where x can be

fault tolerance for the input and output sides, the first and the last stages of the Benes network were augmented with 8 demultiplexer (74LS155) and 8 multiplexer (74LS157) ICs along with 8 inverter (74LS04) ICs.

The arrangement described above was successfully tested and demonstrated the ability to realize BPC permutations in the presence of any single fault in the modified Benes network in two network cycles. It is clear, however, that nothing prevents an arbitrary permutation from being realized in a similar way under a faulty condition in the network, but in this case the control set should be computed using a more complex algorithm than that used in our research; for example, the looping algorithm proposed by Opferman and Tsao-wu (1971)

the latches in Board II. Both of the 82C55A ICs were set in Mode 0 (Basic Input/Output), where the control word for Port I is 82H and the control word for Port II is 80H.

As to board II, whose block diagram is presented in Fig. 4, there is one 3-to-8 decoder (74LS138) for enabling seven octal D-type transparent latches (74373) to capture the inputs from the ports of board I at the proper time. The seven latches hold the appropriate values until they are refreshed. One octal 3-state buffer (74LS244) is used with each latch to provide improved noise rejection and high fan-out outputs.
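The two control words quoted above can be unpacked bit by bit. The sketch below assumes the standard 82C55A mode-definition format: bit 7 set, bits 6-5 group A mode, bit 4 port A direction, bit 3 port C upper, bit 2 group B mode, bit 1 port B, bit 0 port C lower, with 1 meaning input. The function name is illustrative.

```python
def decode_82c55a_mode_word(word):
    """Decode an 82C55A mode-definition control word into its fields."""
    if not word & 0x80:
        raise ValueError("bit 7 must be 1 for a mode-definition word")

    def direction(mask):
        return "input" if word & mask else "output"

    return {
        "group_a_mode": min((word >> 5) & 0b11, 2),   # 00 -> 0, 01 -> 1, 1x -> 2
        "port_a": direction(0x10),
        "port_c_upper": direction(0x08),
        "group_b_mode": (word >> 2) & 1,
        "port_b": direction(0x02),
        "port_c_lower": direction(0x01),
    }


for name, word in (("Port I", 0x82), ("Port II", 0x80)):
    print(name, hex(word), decode_82c55a_mode_word(word))
```

Under this format, 82H configures PB as an input and PA and PC as outputs, which agrees with PB of Port I reading the network outputs, while 80H makes every port of Port II an output for driving the switch control bits.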
LEDs were also used to display the input data going to and the output data coming from the Benes network. To implement the Benes network, 20 switching elements were constructed using 74LS08, 74LS32 and 74LS04 ICs. To provide

[Diagram labels: address decoder (A4-A11 from ISA bus); A2 from ISA bus; OR gates; D0-D7 from ISA bus; programmable parallel port 1; programmable parallel port 2; PA0-PA7; PB0-PB7; PC0-PC7]
Fig. 3. Block diagram of board I

[Diagram labels: PC4, PC5, PC6 from programmable parallel port 2; 3-to-8 decoder; LOAD_INPUT, LOAD_OUTPUT, LOAD_C11_C24, LOAD_C31_C44, LOAD_C51_C54, LOAD_DM, LOAD_AND; octal D-type latches; octal 3-state buffers; LED indicators; input data IN0-IN7; output data OUT0-OUT7; DM0-DM7 and DM inputs; AND gates and AND inputs; control lines C11-C54; switches SW11-SW54]
Fig. 4.
Block diagram of board II
Note: To reduce the complexity of the diagram, the demultiplexers and multiplexers in the input and output stages and the connections between switches of different stages are not shown.

may be considered. It is also clear that the idea implemented in our arrangement may be used for exchanging data packets of any size, rather than only one bit, between processors or between processors and modules of parallel memory. It is pertinent to note that, for actual parallel computing systems using multistage interconnection networks, techniques for automatic diagnosis of faults in a network have already been developed. The work by Feng and Wu (1981) in this field is well known.

Fig. 5. Logic diagram of a 2 x 2 switch

Connecting Networks and Telephone Traffic, Academic Press, New York, NY, USA.
Clos, C. 1953. A study of non-blocking switching networks. Bell Syst. Tech. J. 32: 406-4.
Feng, T.Y.; and Wu, C.L. 1981. Fault-diagnosis for a class of multistage interconnection networks. IEEE Trans. Comput. C-30(10): 743-58.
Nassimi, D.; and Sahni, S. 1981. A self-routing Benes network and parallel permutation algorithms. IEEE Trans. Comput. C-30(5): 332-40.
Opferman, D.C.; and Tsao-wu, N.T. 1971. On a class of rearrangeable switching networks. Bell Syst. Tech. J. 50: 1579-618.
Siegel, H.J.; and Smith, D.S. 1978. Study of Multistage SIMD Interconnection Networks. Proc. 5th Annual Symp. on Computing Architectures, pp. 223-9, IEEE Computer Society, Boston, Mass., USA.
Veselovskii, G.G.; Karavai, M.F.; and Kuznechik, S.M. 1989. Switching Networks for SIMD Multiprocessor Computing Systems, Automation and Remote Control, Plenum Publ. Corp., New York, Vol. 5, No. 2, Part 2: 133-51.

Conclusion

Fault tolerance is one of the essential features required of an actual interconnection network in an actual parallel computer.
An approach for providing fault-tolerant functioning of a Benes network was experimentally tested. It is based on the use of a control set computed for non-faulty conditions but divided into two subsets, which are applied in turn to the non-faulty sub-network. The experiments were done for BPC permutation setups with the use of a specific fast routing algorithm, and they verified the validity of the aforesaid approach.

References

Adams, III, G.B.; and Siegel, H.J. 1982. The extra stage cube: A fault-tolerant interconnection network for supersystems. IEEE Trans. Comput. C-31(5): 443-4.
Benes, V.E. 1964. Optimal rearrangeable multistage connecting network. Bell Syst. Tech. J. 43: 1641-56.
Benes, V.E. 1965. Mathematical Theory of