Fast Parallel Molecular Solution to the Hitting-set Problem

Nung-Yue Shi and Chih-Ping Chu
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701, Taiwan, Republic of China
E-mail: p7890112@ccmail.ncku.edu.tw, chucp@csie.ncku.edu.tw

Abstract

The hitting-set problem is an NP-complete problem in set theory. Assume that we have a collection C of subsets of a finite set S and a positive integer K ≤ |S|, and we would like to know whether there is a subset S′ ⊆ S with |S′| ≤ K such that S′ hits (contains) at least one element from each subset in C. In this paper, a DNA-based algorithm is proposed to solve the hitting-set problem. Furthermore, a simulated experiment is applied to verify the correctness of the proposed DNA-based algorithm for solving the hitting-set problem.

Key words: DNA-based Supercomputing, the Hitting-set Problem, DNA-based Algorithm, the NP-Complete Problems, Set Theory.

1. Introduction

DNA (deoxyribonucleic acid) is the main material of the nucleus and determines the inheritance model of human beings. DNA is made up of a linear chain of smaller units called nucleotides. A nucleotide contains three major components: deoxyribose, a phosphate group, and a base. Different nucleotides are distinguished by their base, which can be adenine (abbreviated as A), guanine (G), cytosine (C) or thymine (T). Two strands of DNA can form a double helix if their respective bases are Watson-Crick complements, that is, C matches G and A matches T. In addition, the 3′ end (the 3rd carbon of the deoxyribose) of one strand matches the 5′ end (the 5th carbon, attaching a phosphate group) of the other strand.

DNA-based computing [1, 2, 3, 4, 5, 6, 8] treats DNA strands as the bits of a traditional digital computer and uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate the DNA strands [11].

Nowadays, we can produce roughly 10^18 DNA strands in a test tube [9]. This means that we can represent 10^18 bits of information. With the biological operations described in the following section, we seem to have 10^18 processors running in parallel. This massive parallelism could be used to attack the most intractable problems in computer science [1, 4, 6, 7, 10, 12-20].

2. Background

A single DNA strand is formed by chaining nucleotides one by one: the 3′ end (attaching a hydroxyl group) of one nucleotide is linked to the 5′ end (attaching a phosphate group) of the next nucleotide via a phosphate group. If the strand contains 20 nucleotides, we say it is a 20-mer. For double-stranded DNA, the length is counted in base pairs. If a double-stranded DNA has 20 base pairs, then it is made of two single DNA strands, each of length 20 mer [1, 5, 8, 9, 10].

In bio-molecular computing, data are also represented as binary patterns (sequences of 0s and 1s). Those binary patterns are encoded by sequences of bio-molecules and are stored in a tube. That is to say, a tube is the only storage area in bio-molecular computing and also serves as the memory and the input/output subsystem of the von Neumann architecture. Bio-molecular programs are made of a set of bio-molecular operations and are used to perform calculations and logical operations. So, bio-molecular programs can be regarded as the arithmetic logic unit (ALU) of the von Neumann architecture. A robot is used to automatically control the operations of a tube (the memory and the input/output subsystem) and bio-molecular programs (the ALU). This implies that the robot can be regarded as the control unit of the von Neumann architecture.

3. A DNA-based Algorithm for Solving the Hitting-set Problem

3.1 Definition of the Hitting-set Problem

Informally, we are given a collection C of subsets of a finite set S and a positive integer K, and would like to find a subset S′ ⊆ S with |S′| ≤ K such that S′ contains at least one element from each subset in C. In other words, S′ hits (intersects) every subset in C. The formal definition is described below.

Definition 3-1: Assume that a ground set S, a collection C = {C1, C2, C3, …, Cn} of subsets, where each Ci is a subset of S, and a positive integer K ≤ |S| are given. The problem is to determine whether there is some subset S′ of S such that |S′| ≤ K and (Ci ∩ S′) ≠ ∅ for i = 1, 2, 3, …, n.

For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. The hitting sets for this instance are {1, 4}, {2, 4}, {3, 4}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4} and {1, 2, 3, 4}. From Definition 3-1, the answer of the hitting-set problem for S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} contains {1, 4}, {2, 4} and {3, 4}, which are the hitting sets of smallest size.

3.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem

Suppose that an n-digit binary number corresponds to each possible hitting set, where n is the number of elements of the ground set S. The encoding scheme is: if the ith element appears in the subset, then the corresponding ith bit of the encoding number (counted from the right, as in Table 3-1) is 1; otherwise it is 0. In a real-world implementation, assume that an n-bit binary number Q is represented by the bits z1, …, zn, where the value of zk is either 1 or 0 for 1 ≤ k ≤ n. A bit zk is the kth bit of the n-bit binary number Q and represents the kth element in S. All possible subsets S′ of the ground set S = {1, 2, 3, 4} are shown in Table 3-1.

Table 3-1: All possible subsets S′ of the ground set S = {1, 2, 3, 4}.

subset       Encoding number    subset       Encoding number
∅            0000               {1}          0001
{2}          0010               {3}          0100
{4}          1000               {1,2}        0011
{1,3}        0101               {1,4}        1001
{2,3}        0110               {2,4}        1010
{3,4}        1100               {1,2,3}      0111
{1,2,4}      1011               {1,3,4}      1101
{2,3,4}      1110               {1,2,3,4}    1111

3.3 The DNA Algorithm for Solving the Hitting-set Problem

Given that S is a finite set and C is a set of subsets, we define the literal z_i^1 to be a logical variable stating that the ith element of the finite set S has value 1 because it appears in the subset S′, and the literal z_i^0 to state that the ith element of S has value 0 because it does not appear in the subset S′.

The initial set T0 contains many strings, each encoding a single n-bit sequence. All possible 2^n choices of subsets are encoded in the tube T0. Lipton does not define his biological operations clearly in [5], but his solution can be expressed in terms of the operations described by Adleman in [11], which are discussed in subsection 2.1 of this paper. The initial set T contains 2^n encoding numbers, each encoding a single n-bit sequence representing a subset S′. The pseudo-code algorithm for solving the hitting-set problem for an n-element set S and a collection of subsets C proceeds as follows:

(1) Create initial set T
(2) For each subset do begin
(3)   For each element in a subset do begin
(4)     if (the bth element in the ath subset in C is the ith element in S)
(5)     then begin
(6)       put the subset whose ith encoding bit is 1 on Tb;
(7)       put the subset whose ith encoding bit is 0 on T;
(8)     end
(9)   End for
(10)  Delete T
(11)  Create new set T by merging those extracted strings from Tb
(12) End for
(13) If T is nonempty then T is the hitting-set.

The above pseudo-code algorithm can be rewritten more formally in terms of biological operations:

Algorithm 1: Solving the hitting-set problem for an n-element set S and a collection of subsets C.

(0a) Append-head(T1, z_1^1).
(0b) Append-head(T2, z_1^0).
(0c) T = ∪(T1, T2).
(1) For k = 2 to n
(1a)   Amplify(T, T1, T2).
(1b)   Append-head(T1, z_k^1).
(1c)   Append-head(T2, z_k^0).
(1d)   T = ∪(T1, T2).
(1e) EndFor
(2) For a = 1 to |C| do begin
(3)   For b = 1 to |Ca| do begin
(4)     If (the bth element in the ath subset in C is the ith element in S)
(5)     then begin
(6)       Tb = +(T, z_i^1)
(7)       T = −(T, z_i^1)
(8)     end
(9)   Endfor
(10)  Discard(T)
(11)  T = ∪ (Tb), taken over b = 1, …, |Ca|
(12) Endfor

Lemma 1: Algorithm 1 can be used to solve the hitting-set problem for an n-element set S and a collection of subsets C.

Proof: The solution space of 2^n possible hitting sets for an n-element set S and a collection of subsets C is produced by the executions of Steps (0a) through (1e). Step (2) is the outer loop, whose number of iterations is the number of subsets in C, and Step (3) is the inner loop, whose number of iterations is the number of elements in each subset in C. Each time the outer loop (Step 2) is executed, the inner loop is executed once for every element of the ath subset in C. Step (6) and Step (7) say that we extract the strands whose zi is 1 and put them in test tube Tb, and extract the strands whose zi is 0 and put them in test tube T. When the inner loop ends, we discard T and merge all Tb into T. The outer loop is repeated in the same way. When all iterations of the outer loop have ended, the hitting sets are in test tube T.

From Algorithm 1, it is very clear that Steps (13) through (19) are used to figure out the number of ones for those hitting sets in T0. Next, Step (20) is the last loop and is used to find the answer. If the kth execution of Step (21) returns a "yes", then Step (23) is applied to read the answer. Otherwise, Step (20) through Step (25) are repeated until the answer is found.

3.4 The Power of the DNA Algorithm for Solving the Hitting-set Problem

The example S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 3.1 is used to show the power of Algorithm 1. In the first execution of Step (6) and Step (7), when a = 1 and b = 1, we put the subsets whose rightmost encoding bit is 1 in T1 and the subsets whose rightmost encoding bit is 0 in T, so we get T1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. Next, in the second execution of Step (6) and Step (7), when a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. Then, in the third execution of Step (6) and Step (7), when a = 1 and b = 3, we obtain T3 = {0100, 1100} and T = {0000, 1000}. Because the first iteration of the outer loop has ended, the first execution of Step (10) discards test tube T and the first execution of Step (11) merges test tubes T1, T2 and T3 into T, so we get T = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.

For the second iteration of the outer loop, when a = 2 and b = 1, the fourth execution of Step (6) and Step (7) gives T1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T = {0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one element, the second iteration of the outer loop also ends here. The second execution of Step (10) discards T and the second execution of Step (11) merges T1 into T. This implies that the hitting sets are T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.

4. Discussion and Conclusion

In this paper, we propose a DNA-based algorithm to solve the hitting-set problem. Nowadays, DNA-based algorithms are being tried on many NP-complete problems for which no efficient solution on a traditional digital computer is known. Even so, it is still very difficult to support biological operations using mathematical instructions. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing can become a reality someday.

References

[1] L. Adleman, "Molecular computation of solutions to combinatorial problems", Science, 266:1021-1024, Nov. 11, 1994.
[2] D. Beaver, "A Universal Molecular Computer", Penn State University Tech Report CSE-95-001.
[3] R. P. Feynman, in Miniaturization, D. H. Gilbert, Ed., Reinhold Publishing Corporation, New York, 1961, pp. 282-296.
[4] R. J. Lipton, "DNA Solution of Hard Computational Problems", Science, 268, pp. 542-545, 1995.
[5] S. Roweis, E. Winfree, R. Burgoyne, N. V. Chelyapov, M. F. Goodman, P. W. K. Rothemund and L. M. Adleman, "A Sticker Based Model for DNA Computation", 2nd annual workshop on DNA Computing, Princeton University, Eds. L. Landweber and E. Baum, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, pp. 1-29, 1999.
[6] D. Boneh, C. Dunworth, R. J. Lipton and J. Sgall, "On the Computational Power of DNA", Discrete Applied Mathematics, Special Issue on Computational Molecular Biology, Volume 71, pp. 79-94, 1996.
[7] D. Boneh, C. Dunworth and R. J. Lipton, "Breaking DES using a molecular computer", in Proceedings of the 1st DIMACS Workshop on DNA Based Computers, 1995, American Mathematical Society, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Volume 27, pp. 37-66, 1996.
[8] M. Amos, Theoretical and Experimental DNA Computation, Springer, 2005.
[9] R. R. Sinden, DNA Structure and Function, Academic Press, 1994.
[10] L. M. Adleman, On constructing a molecular computer.
[11] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, M. Leonard and L. M. Adleman, "Solution of a satisfiability problem on a gel-based DNA computer", in Proceedings of the Sixth International Conference on DNA Computation (DNA 2000), Lecture Notes in Computer Science 2054, pp. 27-42, 2001.
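
The tube operations of Algorithm 1 (Section 3.3) can be mimicked in software by treating a tube as a set of bit strings. The Python sketch below is only an illustrative simulation under that assumption; it is not the simulated experiment reported in the paper. The function names (build_solution_space, hitting_sets, decode, within_size) are hypothetical, and within_size is an assumed stand-in for the cardinality test |S′| ≤ K, whose molecular steps are not reproduced in Algorithm 1.

```python
def build_solution_space(n):
    """Steps (0a)-(1e): grow all 2^n strands, one bit per round."""
    tube = {"1", "0"}                    # (0a)-(0c): strands for z_1 = 1 and z_1 = 0
    for _ in range(2, n + 1):            # (1) For k = 2 to n
        t1, t2 = set(tube), set(tube)    # (1a) Amplify: two copies of T
        t1 = {"1" + s for s in t1}       # (1b) Append-head z_k^1
        t2 = {"0" + s for s in t2}       # (1c) Append-head z_k^0
        tube = t1 | t2                   # (1d) merge T1 and T2
    return tube


def bit(strand, i):
    """Value of z_i; element i is the i-th bit counted from the right."""
    return strand[-i]


def hitting_sets(n, collection):
    """Steps (2)-(12): keep only strands that hit every subset in C."""
    tube = build_solution_space(n)
    for subset in collection:                             # (2) outer loop over C
        extracted = []
        for i in subset:                                  # (3)-(4) inner loop
            t_b = {s for s in tube if bit(s, i) == "1"}   # (6) Tb = +(T, z_i^1)
            tube = tube - t_b                             # (7) T = -(T, z_i^1)
            extracted.append(t_b)
        # (10) discard what is left in T, (11) merge all Tb into a new T
        tube = set().union(*extracted)
    return tube


def decode(strand):
    """Translate a strand such as '1001' back into the subset {1, 4}."""
    return {i for i in range(1, len(strand) + 1) if bit(strand, i) == "1"}


def within_size(tube, k):
    """Assumed post-filter: decoded hitting sets with at most k elements."""
    return [decode(s) for s in tube if len(decode(s)) <= k]
```

For S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}, calling hitting_sets(4, [[1, 2, 3], [4]]) returns the seven strands obtained at the end of Section 3.4, and within_size applied to that result with k = 2 leaves {1, 4}, {2, 4} and {3, 4}. A set of strings cannot represent the multiple copies of each strand present in a real tube, so the sketch checks only the logic of the extraction steps, not their laboratory behavior.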