A DNA-based Algorithm for the Solution of One-In-Three 3-SAT Problem

Nung-Yue Shi and Chih-Ping Chu
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan City 701, Taiwan, Republic of China
E-mail: p7890112@ccmail.ncku.edu.tw and chucp@csie.ncku.edu.tw

Abstract

The satisfiability problem is: given a Boolean formula, decide whether a satisfying truth assignment exists. For example, (x12 ∨ x5) ∧ (x24 ∨ x3 ∨ x13 ∨ x9) ∧ … x12) ∧ (x17 ∨ x8 ∨ x18), in which every literal is a variable or its negation, is a Boolean formula. k-SAT means that each clause has exactly k literals. The One-In-Three (1IN3) 3-SAT problem was defined by Garey and Johnson in 1979 as follows: there is a set V of variables and a collection C of clauses over V such that each clause has 3 literals, and the question is whether there is a truth assignment for V such that each clause in C has exactly one true literal. The only difference from 3-SAT is that the literal values of each clause in C must take one of the three forms 100, 010 or 001, since the clause has three literals and exactly one of them is true (the other two are false). In this paper, we use a molecular solution to find all satisfying truth assignments (the 3-SAT problem) and then the One-In-Three (1IN3) solutions on DNA-based supercomputing. The complexity of the presented DNA-based solution is discussed in Section 4, and a simulated experiment is applied in Section 5 to verify the correctness of the proposed DNA-based algorithm for solving the One-In-Three 3-SAT problem.

Key Words: Satisfiability problem, 3-SAT problem, NP-complete problem, One-In-Three 3-SAT problem, Molecular Solution, DNA-based Supercomputing.

1. Introduction

The word "computer" reminds us of electronic components such as semiconductor memory, CPUs and hard disks. But must a computer be this way? Adleman proposed a new concept of computation at the molecular level in his 1998 paper [1]. DNA molecules, with their sequences of adenine, thymine, guanine and cytosine (represented by the letters A, T, G, C), can be used to store information and perform computation. But how exactly is biology related to computer science [5]? A molecular computer can be made with the following tools:

1. Watson-Crick complements. Two strands of DNA will anneal to form the famous double helix if each base meets its Watson-Crick complement: C matches G and A matches T. If a molecule of DNA meets another DNA molecule which is not its complement, the two will not anneal.

2. Ligases. Ligases bond split DNA molecules together. For example, DNA ligase will take two strands of DNA and covalently connect them into a single strand. In fact, ligase is used by the cell to repair broken DNA strands.

3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease looks for a predetermined sequence of bases in a strand of DNA and, if it is found, cuts the strand into two pieces.

4. Polymerases. Polymerases copy information from one DNA molecule into another. DNA polymerase makes a Watson-Crick complementary copy of a DNA template strand. If we tell it where to start, by providing a primer (a short piece of DNA strand), DNA polymerase will begin adding bases to the primer to create a complementary copy of the template.

5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel. This process separates DNA strands by length.

6. DNA synthesis. Nowadays, we can ask a commercial DNA synthesis facility to make a DNA sequence. In a few days, we receive a test tube containing about 10^18 molecules of DNA with the sequence we asked for.

The above six techniques are the basis of the Adleman-Lipton DNA computing model. From them, Adleman developed eight bio-molecular instructions to perform bio-molecular programs. These are described in the next section.

2. Background

A test tube contains molecules of DNA, each of which is a finite string over the alphabet {A, C, G, T}. We can perform the following operations on it [2-10]:

1. Append-tail. Given a tube T and a binary digit xj, the operation "Append-tail" appends xj onto the end of every data item stored in the tube T. The formal representation for the operation is written as "Append-tail(T, xj)".

2. Amplify. Given a tube T, the operation "Amplify(T, T1, T2)" produces two new tubes T1 and T2 so that T1 and T2 are each a copy of T (T1 and T2 are now identical) and T becomes an empty tube.

3. Merge. Given n tubes T1, …, Tn, the merge operation merges the data stored in the n tubes into one tube, without any change in the individual data. The formal representation for the merge operation is written as "∪(T1, …, Tn)", where ∪(T1, …, Tn) = T1 ∪ … ∪ Tn.

4. Extract. Given a tube T and a binary digit xk, the extract operation produces two tubes +(T, xk) and −(T, xk), where +(T, xk) is all of the data in T which contain xk and −(T, xk) is all of the data in T which do not contain xk.

5. Detect. Given a tube T, the detect operation is used to check whether any data is included in T or not. If at least one data item is included in T we have "yes", and if no data is included in T we have "no". The formal representation for the operation is written as "Detect(T)".

6. Discard. Given a tube T, the discard operation discards T. The formal representation for the operation is written as "Discard(T)".

7. Read. Given a tube T, the read operation is used to describe one data item contained in T. Even if T contains many different data items, the operation can give an explicit description of exactly one of them. The formal representation for the operation is written as "Read(T)".

8. Append-head. Given a tube T and a binary digit xj, the operation "Append-head" appends xj onto the head of every data item stored in the tube T. The formal representation for the operation is written as "Append-head(T, xj)".

3. Molecular solution of One-In-Three (1IN3) 3-SAT problem

3.1 Definition of One-In-Three (1IN3) 3-SAT problem

Satisfiability, the first problem shown to be NP-complete, asks whether the variables of a given Boolean formula can be assigned in such a way that the formula evaluates to true. If no such assignment exists, we say that the formula is unsatisfiable; otherwise it is satisfiable. The satisfiability problem is a decision problem, also called the Boolean satisfiability problem, whose instance is a Boolean expression written using only AND, OR, NOT, variables and parentheses. A more formal definition of the satisfiability problem is: given a set U of variables and a collection C of clauses over U, is there a satisfying truth assignment for C?

The problem remains NP-complete even if all expressions are written in conjunctive normal form with 3 literals per clause (3-CNF), yielding the 3-SAT problem. 3-satisfiability is the special case of k-satisfiability (k-SAT) in which each clause contains exactly k = 3 literals. For example, E = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Note that each clause has exactly 3 literals, which is why we call it 3-SAT. The One-In-Three (1IN3) 3-SAT problem is defined as follows.

Definition 3-1:
Instance: A set V of logical variables and a collection C of clauses over V such that each clause has 3 literals.
Question: Is there a truth assignment for V such that each clause in C has exactly one true literal?

For example, let V = (x1, x2, x3, x4) and C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Suppose x1 is the leftmost bit and x4 is the rightmost bit. The satisfying truth assignments are {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}. Only 3 of these assignments ({1110, 0100, 0011}) make each clause in C have exactly one true literal, so the final answer is T = {1110, 0100, 0011}.
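The example of Definition 3-1 is small enough to check by exhaustive enumeration. The following Python sketch is only an illustration on a conventional computer, not part of the DNA procedure. The clause encoding, the function names and, in particular, the placement of the negations are assumptions; the negations are chosen so that the output agrees with the assignment sets listed in the example.

```python
from itertools import product

# Each clause is a list of (variable index, is_positive) literals; x1 is the
# leftmost bit. The negations are an assumption, chosen so that the results
# agree with the assignment sets listed in the example of Section 3.1.
CLAUSES = [
    [(1, True), (2, False), (3, False)],   # (x1 or not x2 or not x3)
    [(1, True), (3, False), (4, True)],    # (x1 or not x3 or x4)
]


def true_literals(bits, clause):
    """Count the literals of `clause` that are true under the bit string `bits`."""
    return sum((bits[var - 1] == "1") == positive for var, positive in clause)


def assignments(n):
    """All 2^n truth assignments as bit strings, x1 leftmost."""
    return ["".join(p) for p in product("01", repeat=n)]


def satisfying(n, clauses):
    """Ordinary 3-SAT: every clause has at least one true literal."""
    return [a for a in assignments(n)
            if all(true_literals(a, c) >= 1 for c in clauses)]


def one_in_three(n, clauses):
    """1IN3 3-SAT: every clause has exactly one true literal."""
    return [a for a in assignments(n)
            if all(true_literals(a, c) == 1 for c in clauses)]


print(len(satisfying(4, CLAUSES)))        # number of satisfying assignments
print(sorted(one_in_three(4, CLAUSES)))   # the 1IN3 solutions
```

Under these assumptions the script reports 13 satisfying assignments and the three One-In-Three solutions 0011, 0100 and 1110, matching the sets given above.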
3.2 Generate DNA-based algorithm to solve One-In-Three (1IN3) 3-SAT problem

Suppose that x1, x2, x3, x4 are 4 logical variables and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). We define v_b^a to be the b-th literal of the a-th clause. In Algorithm 1, the test "v_b^a = xj" holds when that literal is a positive occurrence of a variable xj; otherwise the literal is a negated variable. Suppose that x1 is the leftmost bit and x4 is the rightmost bit.

Our algorithm contains 2 blocks of code. The first block produces the solution space and generates the truth assignments of the 3-SAT problem [2]. The second block is further separated into three parts. The first part collects the truth assignments which make the clause "100". The second part collects the truth assignments which make the clause "010". The third part collects the truth assignments which make the clause "001". The algorithm then goes to the second round (a = 2, that is, the second clause) and does the same thing again. After all clauses are done, the answer left in T contains the assignments for which each clause has exactly one true literal.

We propose the following DNA-based algorithm to solve the One-In-Three (1IN3) 3-SAT problem.

Algorithm 1: Solving One-In-Three (1IN3) 3-SAT problem for n logical variables and a collection C of clauses over n

(1)   * first block (generate truth assignments) *
(2)   Append-tail(T1, z1^1).
(3)   Append-tail(T2, z1^0).
(4)   T = ∪(T1, T2).
(5)   For k = 2 to n
(6)     Amplify(T, T1, T2).
(7)     Append-tail(T1, zk^1).
(8)     Append-tail(T2, zk^0).
(9)     T = ∪(T1, T2).
(10)  EndFor
(11)  For a = 1 to |C| do begin
(12)    For b = 1 to |Ca| do begin
(13)      If v_b^a = xj then begin
(14)        Tb = +(T, v_b^a = 1)
(15)        T = −(T, v_b^a = 1)
(16)      end
(17)      else begin
(18)        Tb = +(T, v_b^a = 0)
(19)        T = −(T, v_b^a = 0)
(20)      end
(21)    EndFor
(22)    Discard(T)
(23)    T = ∪(T1, …, T|Ca|), the union of Tb for b = 1 to |Ca|
(24)  EndFor
(25)  * second block *
(26)  * first part: truth assignments which make *
(27)  * clause "100" are in T1 *
(28)  For a = 1 to |C| do begin
(29)    If v_1^a = xj then begin
(30)      T1 = +(T, v_1^a = 1)
(31)      T2 = −(T, v_1^a = 1)
(32)    end
(33)    else begin
(34)      T1 = +(T, v_1^a = 0)
(35)      T2 = −(T, v_1^a = 0)
(36)    end
(37)    If v_2^a = xj then begin
(38)      T3 = −(T1, v_2^a = 0)
(39)      T1 = +(T1, v_2^a = 0)
(40)    end
(41)    else begin
(42)      T3 = −(T1, v_2^a = 1)
(43)      T1 = +(T1, v_2^a = 1)
(44)    end
(45)    If v_3^a = xj then begin
(46)      T4 = −(T1, v_3^a = 0)
(47)      T1 = +(T1, v_3^a = 0)
(48)    end
(49)    else begin
(50)      T4 = −(T1, v_3^a = 1)
(51)      T1 = +(T1, v_3^a = 1)
(52)    end
(53)    T = ∪(T2, T3, T4)
(54)    * second part: truth assignments which make *
(55)    * clause "010" are in T5 *
(56)    If v_1^a = xj then begin
(57)      T5 = +(T, v_1^a = 0)
(58)      T6 = −(T, v_1^a = 0)
(59)    end
(60)    else begin
(61)      T5 = +(T, v_1^a = 1)
(62)      T6 = −(T, v_1^a = 1)
(63)    end
(64)    If v_2^a = xj then begin
(65)      T7 = −(T5, v_2^a = 1)
(66)      T5 = +(T5, v_2^a = 1)
(67)    end
(68)    else begin
(69)      T7 = −(T5, v_2^a = 0)
(70)      T5 = +(T5, v_2^a = 0)
(71)    end
(72)    If v_3^a = xj then begin
(73)      T8 = −(T5, v_3^a = 0)
(74)      T5 = +(T5, v_3^a = 0)
(75)    end
(76)    else begin
(77)      T8 = −(T5, v_3^a = 1)
(78)      T5 = +(T5, v_3^a = 1)
(79)    end
(80)    T = ∪(T6, T7, T8)
(81)    * third part: truth assignments which make *
(82)    * clause "001" are in T9 *
(83)    If v_1^a = xj then begin
(84)      T9 = +(T, v_1^a = 0)
(85)      T10 = −(T, v_1^a = 0)
(86)    end
(87)    else begin
(88)      T9 = +(T, v_1^a = 1)
(89)      T10 = −(T, v_1^a = 1)
(90)    end
(91)    If v_2^a = xj then begin
(92)      T11 = −(T9, v_2^a = 0)
(93)      T9 = +(T9, v_2^a = 0)
(94)    end
(95)    else begin
(96)      T11 = −(T9, v_2^a = 1)
(97)      T9 = +(T9, v_2^a = 1)
(98)    end
(99)    If v_3^a = xj then begin
(100)     T12 = −(T9, v_3^a = 1)
(101)     T9 = +(T9, v_3^a = 1)
(102)   end
(103)   else begin
(104)     T12 = −(T9, v_3^a = 0)
(105)     T9 = +(T9, v_3^a = 0)
(106)   end
(107)   Discard(T)
(108)   T = ∪(T1, T5, T9)
(109) EndFor
(110) EndAlgorithm

The answer is in test tube T.
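The two blocks of Algorithm 1 can be mimicked in software by treating each test tube as a Python set of bit strings. This is a minimal sketch under stated assumptions, not a model of the laboratory procedure: the function names, the (variable, is_positive) clause encoding and the negations in CLAUSES are hypothetical choices made for illustration, and the three parts of the second block are folded into one helper that is called once per pattern.

```python
# Tubes are modelled as sets of bit strings (x1 leftmost). All names below are
# illustrative assumptions; only the operation semantics follow Section 2.

def extract(tube, var, bit):
    """Extract: return (+(T, x_var = bit), -(T, x_var = bit))."""
    plus = {s for s in tube if s[var - 1] == bit}
    return plus, tube - plus


def generate(n):
    """First block, steps (1)-(10): build all 2^n assignments."""
    t1, t2 = {"1"}, {"0"}                  # Append-tail of z1^1 and z1^0
    tube = t1 | t2                         # Merge
    for _ in range(2, n + 1):
        t1, t2 = set(tube), set(tube)      # Amplify(T, T1, T2)
        t1 = {s + "1" for s in t1}         # Append-tail(T1, zk^1)
        t2 = {s + "0" for s in t2}         # Append-tail(T2, zk^0)
        tube = t1 | t2                     # Merge
    return tube


def filter_3sat(tube, clauses):
    """First block, steps (11)-(24): keep assignments satisfying every clause."""
    for clause in clauses:
        kept = set()
        for var, positive in clause:
            plus, tube = extract(tube, var, "1" if positive else "0")
            kept |= plus                   # Tb holds strands with literal b true
        tube = kept                        # Discard(T), then merge the Tb
    return tube


def keep_pattern(tube, clause, pattern):
    """One part of the second block: (strands matching pattern, the rest)."""
    rest = set()
    for (var, positive), want in zip(clause, pattern):
        # Bit value of the variable that gives the literal the wanted value.
        bit = "1" if positive == (want == "1") else "0"
        tube, minus = extract(tube, var, bit)
        rest |= minus
    return tube, rest


def filter_one_in_three(tube, clauses):
    """Second block, steps (25)-(109): keep exactly-one-true assignments."""
    for clause in clauses:
        answer = set()
        for pattern in ("100", "010", "001"):
            hit, tube = keep_pattern(tube, clause, pattern)
            answer |= hit                  # T1, T5, T9 in turn
        tube = answer                      # Discard(T), then merge T1, T5, T9
    return tube


# Assumed encoding of C = (x1 or not x2 or not x3) and (x1 or not x3 or x4).
CLAUSES = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]

tube = filter_3sat(generate(4), CLAUSES)
print(sorted(filter_one_in_three(tube, CLAUSES)))
```

The helper keep_pattern passes the non-matching strands on to the next part, which mirrors how steps (53) and (80) merge the rejected tubes back into T before the next pattern is tried.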
3.3 The Power of the DNA Algorithm to Solve One-In-Three (1IN3) 3-SAT problem

The example for V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) in subsection 3.1 is applied to show the power of Algorithm 1.

In the first execution of Step (13) through Step (20), when a = 1 and b = 1, we put the subset whose leftmost encoding bit is 1 in T1 and the subset whose leftmost encoding bit is 0 in T, so we get T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111}. Next, in the second execution of Step (13) through Step (20), when a = 1 and b = 2, we get T2 = {0000, 0001, 0010, 0011} and T = {0100, 0101, 0110, 0111}. Then, in the third execution of Step (13) through Step (20), when a = 1 and b = 3, we obtain T3 = {0100, 0101} and T = {0110, 0111}. Because the first outer loop has ended, the first execution of Step (22) discards test tube T and the first execution of Step (23) merges test tubes T1, T2, T3 into T. We get T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0010, 0011, 0100, 0101}.

For the second outer loop, in the fourth execution of Step (13) through Step (20), when a = 2 and b = 1, we put the subset whose leftmost encoding bit is 1 in T1 and the subset whose leftmost encoding bit is 0 in T, so we get T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101}. Next, in the fifth execution of Step (13) through Step (20), when a = 2 and b = 2, we get T2 = {0000, 0001, 0100, 0101} and T = {0010, 0011}. Then, in the sixth execution of Step (13) through Step (20), when a = 2 and b = 3, we obtain T3 = {0011} and T = {0010}. Because the second outer loop has ended, the second execution of Step (22) discards test tube T and the second execution of Step (23) merges test tubes T1, T2, T3 into T. The resulting T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011} contains the satisfying truth assignments.

Starting from these truth assignments, we keep on tracing the second block of code, which finds the truth assignments such that each clause has exactly one true literal. In the first execution of Step (29) through Step (36), when a = 1, we put the assignments whose leftmost encoding bit is 1 in T1 and the assignments whose leftmost encoding bit is 0 in T2, so we get T1 = {1110, 1111, 1010, 1011, 1100, 1000, 1001, 1101} and T2 = {0000, 0001, 0100, 0101, 0011}. Next, after the execution of Step (37) through Step (44), we get T3 = {1010, 1011, 1000, 1001} and T1 = {1110, 1111, 1100, 1101}. Then, after the execution of Step (45) through Step (52), we obtain T4 = {1100, 1101} and T1 = {1110, 1111}. The first execution of Step (53) merges test tubes T2, T3, T4 into T, and we get T = {0000, 0001, 0100, 0101, 0011, 1010, 1011, 1100, 1000, 1001, 1101}.

Then, after the execution of Step (56) through Step (63), we get T5 = {0000, 0001, 0100, 0101, 0011} and T6 = {1010, 1011, 1100, 1000, 1001, 1101}. After the execution of Step (64) through Step (71), we get T7 = {0100, 0101} and T5 = {0000, 0001, 0011}. After the execution of Step (72) through Step (79), we get T8 = {0000, 0001} and T5 = {0011}. At Step (80), we merge T6, T7, T8 into T and get T = {1010, 1011, 1100, 0100, 0101, 0000, 0001, 1000, 1001, 1101}.

Then, after the execution of Step (83) through Step (90), we get T9 = {0100, 0101, 0000, 0001} and T10 = {1010, 1011, 1100, 1000, 1001, 1101}. After the execution of Step (91) through Step (98), we get T11 = {0000, 0001} and T9 = {0100, 0101}. After the execution of Step (99) through Step (106), we get T12 = ∅ and T9 = {0100, 0101}. At Step (107), we discard T. At Step (108), we merge T1, T5, T9 into T and get T = {1110, 1111, 0011, 0100, 0101}, which is the answer when a = 1.

For the second loop, when a = 2, after the execution of Step (29) through Step (36), we get T1 = {1110, 1111} and T2 = {0011, 0100, 0101}. After the execution of Step (37) through Step (44), we get T3 = ∅ and T1 = {1110, 1111}. After the execution of Step (45) through Step (52), we get T4 = {1111} and T1 = {1110}. The second execution of Step (53) merges test tubes T2, T3, T4 into T, and we get T = {0011, 0100, 0101, 1111}.

Then, after the execution of Step (56) through Step (63), we get T5 = {0011, 0100, 0101} and T6 = {1111}. After the execution of Step (64) through Step (71), we get T7 = {0011} and T5 = {0100, 0101}. After the execution of Step (72) through Step (79), we get T8 = {0101} and T5 = {0100}. The second execution of Step (80) merges test tubes T6, T7, T8 into T, and we get T = {1111, 0011, 0101}.

Then, after the execution of Step (83) through Step (90), we get T9 = {0011, 0101} and T10 = {1111}. After the execution of Step (91) through Step (98), we get T11 = {0101} and T9 = {0011}. After the execution of Step (99) through Step (106), we get T12 = ∅ and T9 = {0011}. We then discard T and merge T1, T5, T9 into T, which gives the final answer T = {1110, 0100, 0011}.

4. The Complexity of Algorithm 1

The following theorems describe the time complexity and volume complexity of Algorithm 1, the number of test tubes used, and the longest library strand in the solution space of Algorithm 1.

Theorem 4-1: Given a set V of n logical variables and a collection C = {C1, C2, …, Cp} of clauses over V, the One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(24p) "extract" operations, O(2p) "discard" operations, O(2n) "append" operations, O(n+4p) "merge" operations, and O(n-1) "amplify" operations in the Adleman-Lipton model.

Proof: Algorithm 1 can be applied to solve the One-In-Three (1IN3) 3-SAT problem for C and V. From the first block of code in Algorithm 1, it is obvious that we use 2*3*p = 6p "extract" operations, p "discard" operations, 2n "append" operations, n+p "merge" operations, and n-1 "amplify" operations. From the second block of code in Algorithm 1, we use 2*9*p = 18p "extract" operations, p "discard" operations, and 3p "merge" operations. Adding the two blocks gives 6p + 18p = 24p extractions, p + p = 2p discards, and (n+p) + 3p = n+4p merges. Therefore, the time complexity of Algorithm 1 is O(24p) "extract" operations, O(2p) "discard" operations, O(2n) "append" operations, O(n+4p) "merge" operations, and O(n-1) "amplify" operations in the Adleman-Lipton model.

Theorem 4-2: Given a set V of n logical variables and a collection C = {C1, C2, …, Cp} of clauses over V, the One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(2^n) library strands in the Adleman-Lipton model.

Theorem 4-3: Given a set V of n logical variables and a collection C = {C1, C2, …, Cp} of clauses over V, the One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(n) tubes in the Adleman-Lipton model.

Theorem 4-4: Given a set V of n logical variables and a collection C = {C1, C2, …, Cp} of clauses over V, the One-In-Three (1IN3) 3-SAT problem for C and V can be solved with a longest library strand of length O(15*n + 15*p) in the Adleman-Lipton model.

5. Simulated Experimental Results

Consider the example for V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) in subsection 3.1. The DNA sequences generated by Adleman's program are shown in Table 5-1. Adleman's program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand. These energies are shown in Table 5-2.
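To make the encoding concrete, a library strand for a truth assignment can be read as the concatenation of one 15-base sequence per variable value, and its partner strand follows from the Watson-Crick rule of Section 1 (A pairs with T, C pairs with G). The sketch below illustrates this reading in Python. The sequences are copied from Table 5-1; the dictionary layout and function names are assumptions, and the complement is written base by base (so it reads 3' to 5' when aligned under the 5' to 3' strand) rather than as a reversed string.

```python
# Sequences for each bit value, copied from Table 5-1 (written 5' to 3').
# The (variable index, bit) dictionary layout is an illustrative assumption.
SEQUENCES = {
    (1, "1"): "CATTCACAAACAATT", (1, "0"): "TCATTCTCAACAAAA",
    (2, "1"): "CTCTATTCCTCTCAA", (2, "0"): "ACACCCTCTAATCTA",
    (3, "1"): "CATTCACAAACAATT", (3, "0"): "TCCTATTTAACTCCC",
    (4, "1"): "CTCTATTCCTCTCAA", (4, "0"): "TATAACTTTCTCTCT",
}

# Watson-Crick pairing: A with T, C with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def library_strand(assignment):
    """List the 15-base blocks (5' to 3') that encode a bit string, x1 first."""
    return [SEQUENCES[(k, bit)] for k, bit in enumerate(assignment, start=1)]


def complement(block):
    """Base-by-base Watson-Crick complement, aligned under the input block."""
    return block.translate(PAIRING)


for answer in ("0011", "0100", "1110"):
    blocks = library_strand(answer)
    print(answer)
    print("5'-" + " ".join(blocks) + "-3'")
    print("3'-" + " ".join(complement(b) for b in blocks) + "-5'")
```

Printing the three answers from Section 3.3 this way gives one double-stranded representation per assignment, which can be compared with the strands listed in Table 5-4.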
Table 5-1: Sequences chosen to represent xk^1 and xk^0 in the example for V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) in subsection 3.1.

Bit     5' → 3' DNA Sequence
x1^1    CATTCACAAACAATT
x1^0    TCATTCTCAACAAAA
x2^1    CTCTATTCCTCTCAA
x2^0    ACACCCTCTAATCTA
x3^1    CATTCACAAACAATT
x3^0    TCCTATTTAACTCCC
x4^1    CTCTATTCCTCTCAA
x4^0    TATAACTTTCTCTCT

Table 5-2: The energy for binding each probe to its corresponding region on a library strand.

Bit     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
x1^1    109.6                 285.7                24.4
x1^0    104.5                 267.5                24.3
x2^1    114.2                 295.4                26
x2^0    102.6                 261.2                24.4
x3^1    103.9                 273.3                22.1
x3^0    103.2                 265.5                24
x4^1    102.4                 270.7                21.5
x4^0    105.1                 271.8                23.9

The program also computed the average and standard deviation of the enthalpy, entropy and free energy over all probe/library strand interactions. These values are shown in Table 5-3.

Table 5-3: The energy over all probe/library strand interactions.

                     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
Average              105.688               273.887              23.825
Standard deviation   3.86073               10.5438              1.3245

The library strands are shown in Table 5-4. They represent every truth assignment such that each clause has exactly one true literal.

Table 5-4: DNA sequences chosen to represent answers in test tube T.

{0011}
5'-TCATTCTCAACAAAA ACACCCTCTAATCTA CATTCACAAACAATT CTCTATTCCTCTCAA-3'
3'-AGTAAGAGTTGTTTT TGTGGGAGATTAGAT GTAAGTGTTTGTTAA GAGATAAGGAGAGTT-5'

{0100}
5'-TCATTCTCAACAAAA CTCTATTCCTCTCAA TCCTATTTAACTCCC TATAACTTTCTCTCT-3'
3'-AGTAAGAGTTGTTTT GAGATAAGGAGAGTT AGGATAAATTGAGGG ATATTGAAAGAGAGA-5'

{1110}
5'-CATTCACAAACAATT CTCTATTCCTCTCAA CATTCACAAACAATT TATAACTTTCTCTCT-3'
3'-GTAAGTGTTTGTTAA GAGATAAGGAGAGTT GTAAGTGTTTGTTAA ATATTGAAAGAGAGA-5'

6. Discussion and Conclusion

In this paper, we propose a DNA-based algorithm to solve the One-In-Three (1IN3) 3-SAT problem. Nowadays, researchers are trying to solve many NP-complete problems that are intractable on a traditional digital computer with DNA-based algorithms. Even so, it is still very difficult to support biological operations using mathematical instructions. There are still many difficulties to be overcome, and we hope that DNA-based supercomputing could become a reality someday.

References

[1] L. Adleman, "Computing with DNA", Scientific American, August, 1998.
[2] M. Amos, Theoretical and Experimental DNA Computation. Springer, 2005.
[3] L. Adleman, "On constructing a molecular computer", in DNA-based computers, volume 27 of DIMACS
[4] R. J. Lipton, "DNA Solution of Hard Computational Problems", Science, 268, pp. 542-545, 1995.
[5] Sun-Yuan Hsieh, Chao-Wen Huang, and Hsin-Hung Chou, "A DNA-based graph encoding scheme with its applications to graph isomorphism problems", Applied Mathematics and Computation, Volume 203, Issue 2, 15 September 2008, pp. 502-512.
[6] Weng-Long Chang, Michael Ho, and Minyi Guo, "Molecular Solutions for the Subset-sum Problem on DNA-based Supercomputing", BioSystems (Elsevier Science), Vol. 73, No. 2, 2004, pp. 117-130.
[7] Weng-Long Chang, Minyi Guo, and Michael Ho, "Towards solution of the set-splitting problem on gel-based DNA computing", Future Generation Computer Systems, Volume 20, Issue 5, June 15, 2004, pp. 875-885.
[8] Weng-Long Chang, Michael Ho, Minyi Guo, Chengfei Liu, "Fast Parallel Bio-molecular Solutions: the Set-basis Problem", International Journal of Computational Science and Engineering, Volume 2, Number 1-2, 2006, pp. 72-80.
[9] Weng-Long Chang, "Fast Parallel DNA-based Algorithms for Molecular Computation: the Set-Partition Problem", IEEE Transactions on Nanobioscience, Vol. 6, No. 1, 2007, pp. 346-353.
[10] Weng-Long Chang and Minyi Guo, "Solving the clique problem and the vertex cover problem in Adleman-Lipton's model", in Proceedings of IASTED International Conference, Networks, Parallel and Distributed Processing, and Applications, pp. 431-436, 2002.