ON PARALLEL GENERATION OF COMBINATIONS IN ASSOCIATIVE PROCESSOR ARCHITECTURES

ZBIGNIEW KOKOSINSKI *
University of Aizu, Department of Computer Software
Aizu-Wakamatsu, 965-80 Fukushima, Japan
e-mail: kokosin@u-aizu.ac.jp

* on leave from Politechnika Krakowska, Kraków, Poland.

Abstract

In this paper two new parallel algorithms are presented for the generation of (n,k)-combinations. Computations run in associative processor models. Objects are generated in lexicographic order, with O(1) time per object, in two different representations. The first algorithm uses the conventional representation of combinations, while the second algorithm generates combinations in the form of binary vectors and is therefore particularly well suited for row/column mask generation in associative processors. The algorithms may also be used for the generation of related combinatorial objects like combinations with repetitions and integer compositions.

Keywords: choice function, combination generation, associative processing.

1. INTRODUCTION

The first known algorithm for generating (n,k)-combinations, published in 1960, is due to Lehmer [27]. In the following years a number of sequential algorithms were developed [7, 9, 26, 33, 38, 40, 41]. Sequential generation methods were reviewed and compared twice [1, 36].

An increasing interest in parallel computation systems also resulted in the development of many parallel algorithms for the generation of combinatorial objects. The first parallel algorithm for generating combinations was published in 1984 [44]. In this adaptive algorithm a new technique for unranking combinations was used, complementary to the first such technique given by Lehmer [28]. Then, numerous parallel solutions to the combination generation problem were published for various models of computations [2, 4, 6, 8, 10, 14, 19, 29, 30, 31, 42, 43].
Simultaneously, the structure of sets of all (n,k)-combinations and related combinatorial objects, like combinations with repetitions and integer compositions, was investigated, and new ranking/unranking techniques satisfying various requirements were proposed [3, 14, 17, 21, 23, 44].

Most known generation algorithms use the conventional representation of combinations, which is not suitable for applications where fast generation in the binary representation is required; objects generated in one representation then have to be converted into the other. One instance of such an application is massively parallel associative processing [11, 24, 46]. The associative machine model has been shown to have new applications in many different areas of parallel computing, including NP-complete problems, database processing, computational geometry, expert systems, etc. [12, 13, 15]. Many efficient algorithms developed in these areas exploit the power of massive associative processing. However, many associative processors need in their environment two additional hardware components which are able to perform mask/comparand vector generation efficiently: a fast network generating permutations and a generator of combinations in the binary representation (see Fig. 1).

A hardware n-permutation generator, suitable for pattern vector generation, was presented in [18]. Next, a versatile programmable generator of n-permutations, (n,k)-combinations and (n,m)-partitions (i.e. partitions of an n-element set into at most m nonempty blocks) was proposed [19, 20]. Although that last generator produces combinations and blocks of partitions in the binary representation, the time needed for the generation of a single object becomes too long for large values of n, because of the O(n) circuit (cellular array) propagation time. Therefore, the problem of finding a better generation technique remains open.

In the present paper new combination generators are proposed.
Generation algorithms for both the conventional and the binary representation of combinations are described; they are both simple and efficient. Generation of combinations runs in associative models of computations, which coincides with the dedicated processor architecture type. Consecutive objects are generated in lexicographic order, with constant time per object. Thus, all the requirements for application in associative processing are fully satisfied.

[Figure 1: Applications of masks/pattern generators in associative processors. Diagram labels: pattern input, column mask, permutation network, row mask, associative processor, output data.]

The rest of the paper is organized as follows. The next section introduces representations of combinatorial objects. Section 3 describes the models of computations used throughout this paper. Associative algorithms for the generation of combinations in the conventional and binary representations are presented in section 4. Section 5 contains concluding remarks.

2. REPRESENTATIONS OF COMBINATORIAL OBJECTS

Let us introduce basic notions used throughout this paper. Let $\langle A_i \rangle_{i \in I}$ denote an indexed family of sets $A_i = A$, where $A = \{0, \ldots, n\}$, $I = \{1, \ldots, k\}$, $1 \le n, k$. Any mapping f which "chooses" one element from each set $A_1, \ldots, A_k$ is called a choice function of the family $\langle A_i \rangle_{i \in I}$ [32]. With additional restrictions we can model various classes of combinatorial objects by choice functions [14, 16, 18, 19, 20].

If the supplementary conditions $I = \{1, \ldots, n\}$ and $a_i \in \{0, 1\}$ are satisfied, then any choice function $\langle a_i \rangle_{i \in I}$ that belongs to the indexed family $\langle A_i \rangle_{i \in I}$ is called a binary choice function of this family. Sets of all binary choice functions with the number of $a_i = 1$ equal to k are binary representations of the set of all k-subsets (combinations) of the set A (in these cases we deal in fact with indexed sets $B_i = \{0, 1\} \subset A_i$).
If the supplementary condition $a_i < a_j$, for $i < j$ and $i, j \in I$, is satisfied, then any choice function $\langle a_i \rangle_{i \in I}$ that belongs to the indexed family $\langle A_i \rangle_{i \in I}$ is called an increasing choice function of this family. Sets of all increasing choice functions are representations of the set of all k-subsets (combinations) of the set A. In the conventional representation of combinations we deal in fact with indexed sets $C_i = \{i, \ldots, n-k+i\} \subset A_i$, $i \in I$.

If the supplementary condition $a_i \le a_j$, for $i < j$ and $i, j \in I$, is satisfied, then any choice function $\langle a_i \rangle_{i \in I}$ that belongs to the indexed family $\langle A_i \rangle_{i \in I}$ is called a nondecreasing choice function of this family. Sets of all nondecreasing choice functions are representations of the set of all k-subsets with repetitions (combinations with repetitions) of the set A. In the conventional representation of combinations with repetitions we deal in fact with indexed sets $D_i = \{1, \ldots, n-k+1\} \subset A_i$.

If the supplementary condition $\sum_{i=1}^{k} a_i = n$ is satisfied, then any choice function $\langle a_i \rangle_{i \in I}$ that belongs to the indexed family $\langle A_i \rangle_{i \in I}$ is called a k-composition of the integer n. The set of all such choice functions represents the set of all k-compositions of the integer n. For given $0 \le n$ and $1 \le k$, the number of all such choice functions equals $\binom{n+k-1}{k-1}$.

Let us now introduce the lexicographic order on the set of all choice functions of the family $\langle A_i \rangle_{i \in I}$. For given choice functions $\delta = \langle d_1, \ldots, d_k \rangle$ and $\gamma = \langle g_1, \ldots, g_k \rangle$, we say that $\delta$ is less than $\gamma$ according to the increasing lexicographic order if and only if there exists $i \in \{1, \ldots, k\}$ satisfying $d_i < g_i$ and $d_j = g_j$ for every $j < i$.

For given choice functions $\delta = \langle d_1, \ldots, d_k \rangle$ and $\gamma = \langle g_1, \ldots, g_k \rangle$, we say that $\delta$ is less than $\gamma$ according to the decreasing lexicographic order if and only if there exists $i \in \{1, \ldots, k\}$ satisfying $d_i > g_i$ and $d_j = g_j$ for every $j < i$.
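The four classes of choice functions defined above can be checked by brute force on a small case. The following is a minimal Python sketch, not part of the original paper; the function names are illustrative assumptions, and the enumeration is sequential, so it says nothing about the parallel generators discussed later.

```python
from itertools import combinations, combinations_with_replacement, product
from math import comb


def increasing_cfs(n, k):
    """Increasing choice functions: k-subsets of {1..n}, lexicographic order."""
    return list(combinations(range(1, n + 1), k))


def nondecreasing_cfs(m, k):
    """Nondecreasing choice functions with values in {1..m} (k-multisets)."""
    return list(combinations_with_replacement(range(1, m + 1), k))


def binary_cfs(n, k):
    """Binary choice functions of length n with exactly k ones."""
    return [b for b in product((0, 1), repeat=n) if sum(b) == k]


def compositions(n, k):
    """k-compositions of n: k parts from {0..n} that sum to n."""
    return [c for c in product(range(n + 1), repeat=k) if sum(c) == n]


n, k = 6, 3
# All four families have the same size for matching parameters.
print(len(increasing_cfs(n, k)))             # (n,k)-combinations
print(len(nondecreasing_cfs(n - k + 1, k)))  # (n-k+1,k)-combinations with repetitions
print(len(binary_cfs(n, k)))                 # binary vectors with k ones
print(len(compositions(n - k, k + 1)))       # (n-k,k+1)-compositions
print(comb(n, k))
```

Python compares tuples lexicographically, so `sorted()` applied to any of these lists yields the increasing lexicographic order defined above.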
From the above definitions, the following properties of nondecreasing and binary choice functions, respectively, result immediately:

Property 1. Any given nondecreasing c.f. $\lambda_t$, $1 < t \le \binom{n}{k}$, may be obtained from the c.f. $\lambda_{t-1}$, preceding it in lexicographic order, by incrementing the rightmost element $\lambda_{t-1}[g] < (n-k+1)$, and setting all (at most k-1) elements $\lambda_{t-1}[h]$, $h > g$, to the same value $\lambda_{t-1}[g]+1$.

Property 2. A similar transformation of the corresponding binary c.f. at step t-1 into the binary c.f. at step t may be obtained by two independent substring reversal operations: reversal of the two-element substring at positions $g+\lambda_{t-1}[g]-1$ and $g+\lambda_{t-1}[g]$, and reversal of the substring occupying positions $g+\lambda_{t-1}[g]+1, \ldots, n$ (as a result, substrings $01$ and $1^*0^*$ are obtained, respectively).

Algorithms which implement both transformations are presented in section 4. They can be used for parallel generation of sequences of the choice functions defined above in lexicographic order.

3. MODELS OF COMPUTATIONS

Parallel algorithms described later in this paper run in associative processor models. The basic model (see Fig. 2) consists of a single memory cell S and an associative memory block A of size k, with memory cells linearly ordered and containing their respective order numbers as address keys. Cell S and the cells in block A are considered to be elementary processors.

[Figure 2: Basic model of associative computations. Diagram labels: cell S, associative memory block A with cells A[1], ..., A[k] addressed 1, ..., k, λ, κ, data bus, address range, OUTPUT converter.]

Like most parallel algorithms, the two combination generation algorithms presented in this paper require an interprocessor communication pattern. In particular, we need a single source processor (cell S) to send identical data to a subset of destination processors (cells) in the block A. This sort of communication operation is called one-to-subset broadcast (single-node broadcast) and, together with the single-node accumulation operation, is used in several important parallel algorithms including matrix-vector multiplication, vector inner product, Gaussian elimination, and shortest path [25].

All processors of the block A and processor S are connected to a bus which is used for data transmission. In order to perform a one-to-subset broadcast operation (i.e. to determine the destination processors), the memory cells have to execute associative range matching in processor block A. Only one one-to-subset broadcast is performed at a time. The basic model is used for the generation of control sequences in both algorithms COMBGEN and BINCOMBGEN.

A slightly simplified version of this model is used exclusively in the algorithm BINCOMBGEN. This second model consists of one associative memory block B of size n, with memory cells linearly ordered and containing their respective order numbers as address keys. The only operation performed in this model is one-to-subset broadcast, but the source is restricted to values from the set $\{0, 1\}$. Thus, the second model can be considered a degenerate version of the basic model.

4. ALGORITHMS

The construction of both presented algorithms is based on the observation that, despite applying different approaches to the generation task, various generation algorithms for a given class of objects reveal a common control structure. For instance, the common control structure of permutation generation algorithms was discovered by Sedgewick [39], and this structure was used for the construction of a permutation generator [18].

In this paper we assume that the common control structure for (n,k)-combinations in both representations, (n-k+1,k)-combinations with repetitions and (n-k,k+1)-compositions of integers is the structure of (n-k+1,k)-combinations with repetitions itself. The properties of the sequence of combinations with repetitions as nondecreasing choice functions are a key factor of our parallelization method. Therefore the sequence of nondecreasing choice functions has been chosen as the basic control sequence for the generation. Other related objects can be obtained from nondecreasing c.f. by certain conversion operations.
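The conversion operations mentioned above can be sketched in Python as follows. The first function applies the rule K[I] = A[I]+I-1 used by the OUTPUT procedure of COMBGEN; the second sets the bits indexed by the combination, which agrees with columns K and B of Table I. The third function is an assumption: it uses the standard "gap" encoding of a combination as an (n-k,k+1)-composition, which may differ from the transformation the paper cites. All function names are illustrative.

```python
def to_increasing(lam):
    """Nondecreasing c.f. -> increasing c.f. via K[i] = A[i] + i - 1 (1-based i)."""
    # enumerate() is 0-based, so adding the index equals adding (i - 1).
    return [a + i for i, a in enumerate(lam)]


def to_binary(kappa, n):
    """Increasing c.f. -> binary vector of length n with ones at the chosen positions."""
    bits = [0] * n
    for position in kappa:
        bits[position - 1] = 1
    return bits


def to_gap_composition(kappa, n):
    """Increasing c.f. -> (n-k, k+1)-composition (assumed gap encoding).

    Part 1 counts the unchosen elements before the first chosen one, the
    middle parts count unchosen elements between neighbours, and the last
    part counts unchosen elements after the last chosen one.
    """
    bounds = [0] + list(kappa) + [n + 1]
    return [bounds[i + 1] - bounds[i] - 1 for i in range(len(bounds) - 1)]


lam = [1, 1, 2]                       # row 2 of Table I (table A)
kappa = to_increasing(lam)
print(kappa)                          # row 2 of Table I (table K)
print(to_binary(kappa, 6))            # row 2 of Table I (table B)
print(to_gap_composition(kappa, 6))   # parts sum to n - k
```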
Both presented algorithms consistently use the uniform one-to-subset broadcast operations described in section 3. In order to produce control sequences, the algorithms operate on the set of associative memory locations A and the single memory cell S. The range of the subset of destination cells in set A is determined in parallel by an associative compare-range operation, which requires O(1) time. In the algorithm COMBGEN, the procedure OUTPUT performs the conversion required to produce proper output sequences from the control sequence in table A. A pseudocode of the parallel algorithm COMBGEN for generating combinations in the conventional representation is as follows:

Algorithm COMBGEN
Input : n - size of the set, k - size of the subsets.
Output: Table K with the consecutive increasing choice functions.
Method: In table S, future values of A subsequences are computed and stored in advance. Computations begin with S=1. Then, the first function in the table A is obtained (steps 1-2), and the next value of S is determined (step 3). In step 4 the first output is produced. Next, consecutive values of A and S are produced and output sequences are computed (step 5). Computations run until the last c.f. is generated, i.e. IND=0.

/1-3 initialization phase/
1. MAX:=n-k+1; IND:=1; S:=1;
2. ONE2SUBSET(S,A,IND,k);
3. S:=A[IND]+1;
4. do in parallel
   4.1. OUTPUT;
   4.2. IND:=k;
5. while IND>0 do
   5.1. ONE2SUBSET(S,A,IND,k);
   5.2. do in parallel
      5.2.1. OUTPUT;
      5.2.2. if A[IND]<MAX then
                5.2.2.1. S:=A[IND]+1;
                5.2.2.2. IND:=k;
             else
                5.2.2.3. IND:=IND-1;
                5.2.2.4. if IND>0 then S:=A[IND]+1;

ONE2SUBSET(ONE,SET,LEFT,RIGHT)
/one-to-subset broadcast/
1. for I:=LEFT to RIGHT do in parallel SET[I]:=ONE;

OUTPUT
/conversion and output/
1. for I:=1 to k do in parallel K[I]:=A[I]+I-1;
2. output K;

Example sequences generated by the algorithm, for n=6, k=3, are depicted in Table I. Columns 3 and 4 (Tables S and A) show the transformations of the control sequence.
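The control flow of COMBGEN can be simulated sequentially. The following Python sketch is an illustration under stated assumptions, not the author's implementation: each parallel broadcast is replaced by an ordinary loop, the names `one2subset` and `combgen` are chosen here, and 1 <= k < n is assumed. To reproduce the S column of Table I, the sketch reloads S from A[IND] after IND is decremented.

```python
from itertools import combinations


def one2subset(value, table, left, right):
    """One-to-subset broadcast: write value into cells left..right (1-based).

    A sequential stand-in for what the associative model does in one step.
    """
    for i in range(left, right + 1):
        table[i] = value


def combgen(n, k):
    """Yield all (n,k)-combinations in lexicographic order (COMBGEN sketch)."""
    assert 1 <= k < n, "sketch assumes 1 <= k < n"
    max_val = n - k + 1
    a = [0] * (k + 1)              # a[0] is unused so that cells are 1-based
    ind, s = 1, 1
    one2subset(s, a, ind, k)       # first control sequence: 1, 1, ..., 1
    s = a[ind] + 1
    yield [a[i] + i - 1 for i in range(1, k + 1)]   # OUTPUT conversion
    ind = k
    while ind > 0:
        one2subset(s, a, ind, k)
        yield [a[i] + i - 1 for i in range(1, k + 1)]
        if a[ind] < max_val:
            s = a[ind] + 1         # the next broadcast increments the last cell
            ind = k
        else:
            ind -= 1               # cells ind..k are saturated, move left
            if ind > 0:
                s = a[ind] + 1     # reload S from the new rightmost free cell


if __name__ == "__main__":
    for combination in combgen(6, 3):
        print(combination)
```

Each loop iteration performs one broadcast, one conversion and one constant-time update of S and IND, which mirrors the constant delay per object claimed for the associative model.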
The bold font in columns S and A of Table I points out the source and the destination memory cells in all one-to-subset broadcasts between S and A.

TABLE I
Sequences generated by algorithms COMBGEN and BINCOMBGEN (n=6, k=3)

No.  IND  S  A=λ  K    B
 1    1   1  111  123  111000
 2    3   2  112  124  110100
 3    3   3  113  125  110010
 4    3   4  114  126  110001
 5    2   2  122  134  101100
 6    3   3  123  135  101010
 7    3   4  124  136  101001
 8    2   3  133  145  100110
 9    3   4  134  146  100101
10    2   4  144  156  100011
11    1   2  222  234  011100
12    3   3  223  235  011010
13    3   4  224  236  011001
14    2   3  233  245  010110
15    3   4  234  246  010101
16    2   4  244  256  010011
17    1   3  333  345  001110
18    3   4  334  346  001101
19    2   4  344  356  001011
20    1   4  444  456  000111
      0

Example output sequences obtained from the conversion procedure and representing combinations are shown in column 5 (Table K). Constant delay between objects is provided by execution of the conditional step 5.2.2 in constant time. If constant delay is not essential, further speedup may be achieved in a hardware implementation if, for IND=k, all one-to-one broadcasts are replaced by increment operations.

Theorem 1. Algorithm COMBGEN generates, in the conventional representation, all (n,k)-combinations in lexicographic order with constant time per combination in an associative model with k+1 processors, each of constant size. Thus, the algorithm COMBGEN is optimal.

Algorithm BINCOMBGEN
Input : n - size of the set, k - size of the subsets.
Output: Table B with the consecutive binary choice functions.
Method: Computations in tables S and A run identically as in algorithm COMBGEN. Table B is initialized at the same time as tables A and S (steps 1-3). In step 4 the first output is produced. Then, in step 5, the alternating modification of tables A and S takes place, while table B is modified in one or two phases (phase 1: steps 5.2 and 5.3; phase 2: steps 5.4.2.1 and 5.4.2.2). Computations run until the last binary sequence is generated, i.e. IND=0.

/1-3 initialization phase/
1. MAX:=n-k+1; IND:=1; S:=1;
2. do in parallel
   2.1. ONE2SUBSET(S,A,IND,k);
   2.2. ONE2SUBSET(0,B,1,n);
3. do in parallel
   3.1. S:=A[IND]+1;
   3.2. ONE2SUBSET(1,B,1,k);
4. do in parallel
   4.1. output B;
   4.2. IND:=k;
5. while IND>0 do
   5.1. do in parallel
      5.1.1. ONE2SUBSET(S,A,IND,k);
      5.1.2. v:=IND+S;
   5.2. ONE2SUBSET(0,B,v-2,v-2);
   5.3. ONE2SUBSET(1,B,v-1,v-1);
   5.4. if A[IND]<MAX then
           5.4.1. S:=A[IND]+1;
           5.4.2. if IND<k then
                     5.4.2.1. do in parallel
                        5.4.2.1.1. ONE2SUBSET(0,B,v,n);
                        5.4.2.1.2. IND:=k;
                     5.4.2.2. ONE2SUBSET(1,B,v,IND+S-2);
        else
           5.4.3. IND:=IND-1;
           5.4.4. if IND>0 then S:=A[IND]+1;
   5.5. output B;

ONE2SUBSET(ONE,SET,LEFT,RIGHT)
/one-to-subset broadcast/
1. for I:=LEFT to RIGHT do in parallel SET[I]:=ONE;

The algorithm BINCOMBGEN preserves the basic control structure of the algorithm COMBGEN. In the algorithm, binary sequences corresponding to the control sequences are derived by performing analogous one-to-subset broadcast operations on an additional set of associative memory locations, i.e. the table B of size n. Column 6 of Table I depicts the transformations of the output binary sequence (Table B); italic and bold fonts indicate the destinations of all one-to-subset broadcasts performed in the table B. It is worth noticing that the lexicographic orders of the c.f. in tables K and B do not match (B is generated in reverse lexicographic order). In order to generate binary sequences in lexicographic order, negation of the binary sequence in B is sufficient.

Algorithm BINCOMBGEN produces binary sequences in O(1) time per object. However, it turns out that the most frequent degenerate broadcasts, performed sequentially in steps 5.2 and 5.3 (shown in Table I in italics), can easily be replaced by two simultaneous logical operations (negations): B[v-2] := ¬B[v-2] and B[v-1] := ¬B[v-1], which improves the time parameters in a practical hardware implementation. The details of such an implementation are outside the scope of this paper. Constant delay between objects can be obtained in a similar way as in algorithm COMBGEN.
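The steps of BINCOMBGEN can likewise be traced with a sequential Python sketch. It is an illustration under assumptions rather than the author's implementation: broadcasts are ordinary loops, the names `one2subset` and `bincombgen` are chosen here, 1 <= k < n is assumed, and S is reloaded from A[IND] after IND is decremented so that the trace agrees with column B of Table I.

```python
from itertools import combinations


def one2subset(value, table, left, right):
    """One-to-subset broadcast: write value into cells left..right (1-based)."""
    for i in range(left, right + 1):
        table[i] = value


def bincombgen(n, k):
    """Yield (n,k)-combinations as bit strings, in the order of Table I column B."""
    assert 1 <= k < n, "sketch assumes 1 <= k < n"
    max_val = n - k + 1
    a = [0] * (k + 1)                   # control table A, a[0] unused
    b = [0] * (n + 1)                   # binary table B, b[0] unused
    ind, s = 1, 1
    one2subset(s, a, ind, k)            # steps 2.1 and 2.2
    one2subset(0, b, 1, n)
    s = a[ind] + 1                      # steps 3.1 and 3.2
    one2subset(1, b, 1, k)
    yield "".join(map(str, b[1:]))      # step 4.1
    ind = k
    while ind > 0:
        one2subset(s, a, ind, k)        # step 5.1
        v = ind + s
        one2subset(0, b, v - 2, v - 2)  # phase 1: move a single 1 one cell right
        one2subset(1, b, v - 1, v - 1)
        if a[ind] < max_val:
            s = a[ind] + 1
            if ind < k:                 # phase 2: pack the remaining 1s after it
                one2subset(0, b, v, n)
                ind = k
                one2subset(1, b, v, ind + s - 2)
        else:
            ind -= 1
            if ind > 0:
                s = a[ind] + 1          # reload S from the new rightmost free cell
        yield "".join(map(str, b[1:]))  # step 5.5


if __name__ == "__main__":
    for bits in bincombgen(6, 3):
        print(bits)
```

Phase 1 touches exactly two adjacent cells of B, which is why the text above can replace the two degenerate broadcasts by two negations.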
Theorem 2. Algorithm BINCOMBGEN generates, in the binary representation, all (n,k)-combinations in reverse lexicographic order with constant time per combination in an associative model with k+1 processors, each of constant size. Thus, the algorithm BINCOMBGEN is optimal.

5. CONCLUDING REMARKS

Two algorithms for (n,k)-combination generation in two different representations have been described in this paper. Although originally designed to support fast associative processing, they may also be useful in solving classes of combinatorial problems in other models of computations. In both algorithms the time needed to produce a single object is independent of the parameters n and k. Unlike other generation methods, the algorithm BINCOMBGEN produces combinations in a binary representation, and no time-consuming conversion from the conventional representation is required.

The algorithms provide parallelization of computations on the level of a single combinatorial object, satisfying all postulates listed in [5]. However, they can be used in adaptive combination generation too, enabling further parallelization on the level of the set of objects. In this case standard unranking techniques for combinations may be applied, with little effort, to program a number of generators working in parallel [21, 23]. Splitting the generation task in an adaptive generation algorithm is much easier to accomplish in the associative model than in the linear array model, since in the latter model it is necessary to know the states of six registers involved in computations and message passing in each of k systolic processors in order to program one combination generator [4, 5].

Integer compositions can be generated in O(1) time per object using algorithm COMBGEN, provided that a known transformation from (n,k)-combinations [34, 35, 37] is included in the OUTPUT procedure.
The solution presented for combinations in the binary representation can be seen as a special case of parallel generation of n-compositions of the integer k, with a restriction on component size (the component size is less than or equal to 1). Solving the more general problem of integer composition generation with restrictions on component sizes remains an open problem.

Acknowledgement

The author thanks professor Adam Kapralski, who encouraged him to take up this research. The research was conducted as a part of the "Generation of combinatorial configurations in parallel" project and supported by University of Aizu grant R-1-3/96.

References

[1] Akl S.G.: A comparison of combination generation methods, ACM Trans. of Math. Software, 7 (1981), pp. 42-45.
[2] Akl S.G.: Adaptive and optimal parallel algorithms for enumerating permutations and combinations, The Computer Journal, 30 (1987), pp. 433-436.
[3] Akl S.G.: Design and analysis of parallel algorithms, Prentice Hall, Englewood Cliffs, N.J., 1989, pp. 148-150.
[4] Akl S.G., Gries D., Stojmenovic I.: An optimal parallel algorithm for generating combinations, Information Processing Letters, 33 (1989/90), pp. 135-139.
[5] Akl S.G., Stojmenovic I.: Generating combinatorial objects on a linear array of processors, [in:] Zomaya A.Y. (editor): Parallel Computing; Paradigms and Applications, Int. Thomson Comp. Press, 1996, pp. 639-670.
[6] Chan B., Akl S.G.: Generating combinations in parallel, BIT, 26 (1986), pp. 2-6.
[7] Chase P.J.: Algorithm 382: Combinations of m out of n objects, Comm. ACM, 13 (1970), p. 368.
[8] Chen G.H., Chern M.S.: Parallel generation of permutations and combinations, BIT, 26 (1986), pp. 277-283.
[9] Even S.: Algorithmic Combinatorics, Macmillan, New York 1973.
[10] Elhage H., Stojmenovic I.: Systolic generation of combinations from arbitrary elements, Parallel Processing Letters, 2 (1992), pp. 241-248.
[11] Foster C.C.: Content addressable parallel processors, Van Nostrand Reinhold, New York 1976.
[12] Kapralski A.: Binary matrices, their applications and processing in dedicated processors, Monografia 95, Politechnika Krakowska, Kraków, Poland, 1989. (in Polish)
[13] Kapralski A.: Supercomputing for solving a class of NP-complete and isomorphic complete problems, Computer Systems Science & Eng., 7 (1992), No. 4, pp. 218-228.
[14] Kapralski A.: New methods for generation permutations, combinations and other combinatorial objects in parallel, J. Parallel and Distrib. Computing, 17 (1993), pp. 315-326.
[15] Kapralski A.: Sequential and parallel processing in depth search machines, World Scientific, 1994.
[16] Kapralski A.: Representation and parallel generation of number and set partitions, decompositions, compositions and related combinatorial objects, Tech. Rep. 94-1-038, University of Aizu, Aizu-Wakamatsu, Japan, 1994, 29 pp.
[17] Knott G.D.: A numbering system for combinations, Comm. ACM, 17 (1974), pp. 45-46.
[18] Kokosinski Z.: On generation of permutations through decomposition of symmetric groups into cosets, BIT, 30 (1990), pp. 583-591.
[19] Kokosinski Z.: Circuits generating combinatorial configurations for sequential and parallel computer systems, Monografia 160, Politechnika Krakowska, Kraków, Poland, 1993, 106 pp. (in Polish)
[20] Kokosinski Z.: Mask and pattern generation for associative supercomputing, Proc. of the 12th Int. Conference "Applied Informatics", Annecy, France, May 1994, pp. 324-326.
[21] Kokosinski Z.: Algorithms for unranking combinations and their applications, Proc. of the 7th Int. Conference "Parallel and Distributed Computing and Systems", Washington D.C., USA, October 1995, pp. 216-224.
[22] Kokosinski Z.: On parallel generation of combinations in associative processor architectures, Tech. Rep. 96-1-007, University of Aizu, Aizu-Wakamatsu, Japan, 1996, 14 pp.
[23] Kokosinski Z.: Unranking combinations in parallel, Proc. of the Int. Conference "Parallel and Distributed Processing Techniques and Applications", Sunnyvale, CA, USA, August 1996, Vol. I/III, pp. 79-82.
[24] Krikelis A., Weems C.C. (eds): Associative processing and processors, IEEE Computer Society Press, Los Alamitos, 1997.
[25] Kumar V., Grama A., Gupta A. and Karypis G.: Introduction to parallel computing. Design and analysis of algorithms, The Benjamin/Cummings Publishing Company, Redwood City 1994.
[26] Kurtzberg J.: Algorithm 94: Combination, Comm. ACM, 5 (1962), p. 344.
[27] Lehmer D.H.: Teaching combinatorial tricks to a computer, [in:] Proc. of Symposium Appl. Math., Combinatorial Analysis, 10, Amer. Math. Society, Providence, R.I. 1960, pp. 179-193.
[28] Lehmer D.H.: The machine tools of combinatorics, [in:] Beckenbach E.F. (editor): Applied combinatorial mathematics, John Wiley, N.Y. 1964, pp. 5-31.
[29] Lin C.J.: A parallel algorithm for generating combinations, Computer and Mathematics with Applications, 17 (1989), pp. 1523-1533.
[30] Lin C.J.: Generating subsets on a systolic array, Computer and Mathematics with Applications, 21 (1991), pp. 103-109.
[31] Lin C.J., Tsay J.C.: A systolic generation of combinations, BIT, 29 (1989), pp. 23-36.
[32] Mirsky L.: Transversal theory, Academic Press, N.Y. 1971.
[33] Mifsud C.J.: Algorithm 154: Combination in lexicographic order, Comm. ACM, 6 (1963), p. 103.
[34] Nijenhuis A., Wilf H.S.: Combinatorial Algorithms, Academic Press, New York 1978.
[35] Page E.S., Wilson L.B.: An Introduction to Computational Combinatorics, Cambridge University Press, 1979.
[36] Payne W.H., Ives F.M.: Combination generators, ACM Trans. of Math. Software, 5 (1979), pp. 163-172.
[37] Reingold E.M., Nievergelt J., Deo N.: Combinatorial Algorithms, Prentice Hall, Englewood Cliffs, New Jersey 1977.
[38] Ruskey F.: Adjacent interchange generation of combinations, J. of Algorithms, 9 (1988), pp. 162-180.
[39] Sedgewick R.: Permutation generation methods, Computing Surveys, 9 (1977), pp. 137-164.
[40] Semba I.: A note of enumerating combinations in lexicographic order, J. Information Processing, 4 (1981), pp. 35-37.
[41] Semba I.: An efficient algorithm for generating all k-subsets (1 ≤ k ≤ m ≤ n) of the set {1,2,...,n} in lexicographical order, J. of Algorithms, 5 (1984), pp. 281-283.
[42] Stojmenovic I.: On random and adaptive parallel generation of combinatorial objects, Int. J. Computer Mathematics, 42 (1992), pp. 125-135.
[43] Stojmenovic I.: A simple systolic algorithm for generating combinations in lexicographic order, Computer and Mathematics with Applications, 24 (1992), pp. 61-64.
[44] Tang C.Y., Du M.W., Lee R.C.T.: Parallel generation of combinations, Proc. Int. Comput. Symp., Taipei, Taiwan 1984, pp. 1006-1010.
[45] von zur Gathen J.: Parallel linear algebra, [in:] Reif J.H. (editor): Synthesis of parallel algorithms, Morgan Kaufmann 1993, pp. 573-617.
[46] Yau S.S., Fung H.S.: Associative processor architecture - a survey, Computing Surveys, 9 (1977), No. 1, pp. 3-27.