A DNA-based Algorithm for the Solution of the Not-All-Equal 3-SAT Problem
Speaker: Nung-Yue Shi
2016/7/13

Outline
• Introduction
  – Research motivation
  – DNA computing
• Background
• Molecular solution of the Not-All-Equal (NAE) 3-SAT problem
  – Definition of the Not-All-Equal (NAE) 3-SAT problem
  – Generate a DNA-based algorithm to solve the Not-All-Equal (NAE) 3-SAT problem
  – The power of the DNA algorithm to solve the Not-All-Equal (NAE) 3-SAT problem
• The Complexity of Algorithm 3-1
• Simulated Experimental Results
• Discussion and Conclusion

1. Introduction
• Research motivation: DNA is the basic medium of inheritance for all living cells. The main idea of DNA computing is to encode data as DNA strands and to use bio-operations to manipulate those strands in a test tube, simulating arithmetical and logical operations. It is estimated that about 10^18 DNA strands could operate 10^4 times faster than one of today's advanced supercomputers.
• Another comparison: while modern supercomputers perform 10^12 operations per second, Adleman estimates that 10^20 operations per second is realistic for molecular instructions.
• The figures for energy consumption and memory capacity are similarly impressive. A supercomputer needs one joule for 10^9 operations, while the same energy is sufficient to perform 2 × 10^19 ligation operations. On a video tape every bit needs 10^12 cubic nanometers of storage, whereas DNA stores information at a density of one bit per cubic nanometer.
• This research is motivated by the benefits and applications of DNA computing. It gives a new method for solving the Not-All-Equal 3-SAT problem, which is NP-complete.

DNA computing
• Molecular computation was first proposed by Feynman in 1961, but the idea was not tested experimentally until 1994, when Adleman successfully solved an instance of the Hamiltonian path problem in a test tube using DNA strands.
• In 1995 Lipton demonstrated that DNA could also be used to solve the satisfiability problem, the first problem shown to be NP-complete. In 1999 Adleman and his coauthors (Roweis et al.) proposed the sticker model to enhance the Adleman-Lipton model. In 2000 Adleman and his coauthors (Braich et al.) solved a 6-variable, 11-clause instance of the 3-SAT problem.
• Moreover, in 2002 Adleman and his coauthors (Braich et al.) performed experiments that solved a 20-variable, 24-clause instance of the 3-SAT problem.

Background
• A molecular computer can be built with the following tools:
• 1. Watson-Crick complements. Two strands of DNA anneal to form the well-known double helix if each base meets its Watson-Crick complement: C pairs with G and A pairs with T. If a DNA molecule meets another DNA molecule that is not its complement, the two do not anneal.
• 2. Ligases. Ligases join split DNA molecules together. For example, DNA ligase takes two strands of DNA and covalently connects them into a single strand. The cell uses ligase to repair broken DNA strands.
• 3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease looks for a predetermined sequence of bases in a DNA strand and, if it finds that sequence, cuts the strand into two pieces.
• 4. Polymerases. Polymerases copy information from one DNA molecule into another. DNA polymerase makes a Watson-Crick complementary copy of a DNA template strand. It needs to be told where to start: given a primer, a short piece of DNA annealed to the template, DNA polymerase adds bases to the primer to create a complementary copy of the template.
• 5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel.
This process separates DNA strands by length.
• 6. DNA synthesis. Nowadays we can ask a commercial DNA synthesis facility to make a DNA sequence. Within a few days we receive a test tube containing about 10^18 molecules of DNA with the requested sequence.
• The six techniques above are the basis of the Adleman-Lipton DNA computing model.
• From these, Adleman developed eight bio-molecular instructions for writing bio-molecular programs.
• A test tube contains DNA molecules, which form a finite set of strings over the alphabet {A, C, G, T}. We can perform the following operations on it:
• 1. Append-tail. Given a tube T and a binary digit x_j, the operation "Append-tail" appends x_j onto the end of every data item stored in the tube T. The formal representation of the operation is "Append-tail(T, x_j)".
• 2. Amplify. Given a tube T, the operation "Amplify(T, T1, T2)" produces two new tubes T1 and T2, each a complete copy of T (so T1 and T2 are identical), and T becomes an empty tube.
• 3. Merge. Given n tubes T1, …, Tn, the merge operation pours the data stored in the n tubes into one tube without changing any individual data item. The formal representation of the operation is "∪(T1, …, Tn)", where ∪(T1, …, Tn) = T1 ∪ … ∪ Tn.
• 4. Extract. Given a tube T and a binary digit x_k, the extract operation produces two tubes +(T, x_k) and −(T, x_k), where +(T, x_k) holds all of the data in T that contain x_k and −(T, x_k) holds all of the data in T that do not contain x_k. After the Extract operation is completed, tube T becomes an empty tube.
• 5. Detect. Given a tube T, the detect operation checks whether T contains any data. If T contains at least one data item the answer is "yes"; if T contains no data the answer is "no". The formal representation of the operation is "Detect(T)".
• 6. Discard.
Given a tube T, the contents of T are discarded, and T is replaced by a new, empty tube. The formal representation of the operation is "Discard(T)".
• 7. Read. Given a tube T, the read operation describes one data item contained in T. Even if T contains many different data items, the operation gives an explicit description of exactly one of them. The formal representation of the operation is "Read(T)".
• 8. Append-head. Given a tube T and a binary digit x_j, the operation "Append-head" appends x_j onto the head of every data item stored in the tube T. The formal representation of the operation is "Append-head(T, x_j)".
• Satisfiability (SAT) was the first problem shown to be NP-complete. It asks whether the variables of a given Boolean formula can be assigned truth values so that the formula evaluates to true.
• The problem remains NP-complete even if every formula is written in conjunctive normal form with 3 literals per clause (3-CNF), which yields the 3-SAT problem. 3-satisfiability is the special case of k-satisfiability (k-SAT) in which each clause contains exactly k = 3 literals.
• For example, E = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Each clause has exactly 3 literals, which is why the problem is called 3-SAT. The Not-All-Equal (NAE) 3-SAT problem is defined as follows.
• Definition 3-1:
• Instance: A set V of logical variables and a collection C of clauses over V such that each clause has 3 literals.
• Question: Is there a truth assignment for V such that each clause has at least one true literal and at least one false literal?

Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT problem
• Define the binary digit z_k^1 to be the k-th bit (counting from the leftmost side) with value 1, and z_k^0 to be the k-th bit (counting from the leftmost side) with value 0. |C| is the number of clauses. |Ca| is the number of literals in the a-th clause Ca.
We also define v_b^a to be the logical variable that appears in the b-th position of the a-th clause.
• Suppose that x1 is the leftmost bit and x4 is the rightmost bit.
• The algorithm consists of 2 blocks of code. The first block generates the truth assignments of the 3-SAT problem.
• The second block deletes the truth assignments that make any one of the clauses all 1's. Note that no truth assignment left by the first block makes any clause all 0's,
• because an assignment under which some clause is all 0's does not satisfy the formula.

Algorithm 3-1: Solving the Not-All-Equal (NAE) 3-SAT problem for n logical variables and a collection C of clauses over them
(1) * first block (Not-All-Equal-0) (generate truth assignments) *
(2) Append-tail(T1, z_1^1).
(3) Append-tail(T2, z_1^0).
(4) T = ∪(T1, T2).
(5) For k = 2 to n
(6)     Amplify(T, T1, T2).
(7)     Append-tail(T1, z_k^1).
(8)     Append-tail(T2, z_k^0).
(9)     T = ∪(T1, T2).
(10) EndFor
(11) For a = 1 to |C| do begin
(12)     For b = 1 to |Ca| do begin
(13)         If v_b^a = x_j then begin
(14)             Tb = +(T, v_b^a = 1)
(15)             T = −(T, v_b^a = 1)
(16)         end
(17)         else begin
(18)             Tb = +(T, v_b^a = 0)
(19)             T = −(T, v_b^a = 0)
(20)         end
(21)     EndFor
(22)     Discard(T)
(23)     T = ∪(T1, …, T|Ca|)   (union of all Tb)
(24) EndFor
(25) * second block (Not-All-Equal-1) *
(26) For a = 1 to |C| do begin
(27)     For b = 1 to |Ca| do begin
(28)         If v_b^a = x_j then begin
(29)             Tb = +(T, v_b^a = 0)
(30)             T = −(T, v_b^a = 0)
(31)         end
(32)         else begin
(33)             Tb = +(T, v_b^a = 1)
(34)             T = −(T, v_b^a = 1)
(35)         end
(36)     EndFor
(37)     Discard(T)
(38)     T = ∪(T1, …, T|Ca|)   (union of all Tb)
(39) EndFor
(40) EndAlgorithm

• In step (13) and step (28), the test "v_b^a = x_j" holds when the b-th literal of clause a is the un-negated variable x_j; the else branch handles the negated literal ¬x_j.
• While developing this work, I also found that the first block of code is equivalent to deleting the assignments that make any one of the clauses all 0's (by definition).
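The two blocks of Algorithm 3-1 can be traced in software. The Python sketch below is only an illustration under stated assumptions, not the laboratory procedure: a tube is modelled as a set of bit strings, an empty starting tube as a single zero-length strand, and the names `Lab`, `generate`, `keep_clauses`, `solve_nae` and `CLAUSES` are hypothetical helpers introduced here. It also counts how often each tube operation is used.

```python
from collections import Counter

# Literal = (variable index starting at 1, True if the variable is un-negated).
# Example formula from the slides: (x1 v ~x2 v ~x3) ^ (x1 v ~x3 v x4).
CLAUSES = [[(1, True), (2, False), (3, False)],
           [(1, True), (3, False), (4, True)]]


class Lab:
    """Hypothetical stand-in for the tube operations; a tube is a set of bit strings."""

    def __init__(self):
        self.counts = Counter()  # how often each operation was used

    def append_tail(self, tube, bit):
        self.counts["append"] += 1
        return {strand + bit for strand in tube}

    def amplify(self, tube):
        self.counts["amplify"] += 1
        return set(tube), set(tube)

    def merge(self, *tubes):
        self.counts["merge"] += 1
        return set().union(*tubes)

    def extract(self, tube, var, bit, matching):
        # matching=True plays the role of +(T, x_var = bit), False of -(T, x_var = bit).
        self.counts["extract"] += 1
        return {s for s in tube if (s[var - 1] == bit) == matching}

    def discard(self, tube):
        self.counts["discard"] += 1


def generate(lab, n):
    """Steps (2)-(10): build all 2^n bit strings."""
    # Assumption: an "empty" starting tube is modelled as one zero-length strand.
    t = lab.merge(lab.append_tail({""}, "1"), lab.append_tail({""}, "0"))
    for _ in range(2, n + 1):
        t1, t2 = lab.amplify(t)
        t = lab.merge(lab.append_tail(t1, "1"), lab.append_tail(t2, "0"))
    return t


def keep_clauses(lab, t, clauses, literal_value):
    """Steps (11)-(24) with literal_value=True, steps (26)-(39) with False."""
    for clause in clauses:
        kept = []
        for var, positive in clause:
            # Bit value of the variable that gives this literal the requested truth value.
            bit = "1" if positive == literal_value else "0"
            kept.append(lab.extract(t, var, bit, True))
            t = lab.extract(t, var, bit, False)
        lab.discard(t)  # strands in which every literal had the opposite value
        t = lab.merge(*kept)
    return t


def solve_nae(n, clauses):
    lab = Lab()
    t = generate(lab, n)
    t = keep_clauses(lab, t, clauses, True)   # first block: remove all-0 clauses
    t = keep_clauses(lab, t, clauses, False)  # second block: remove all-1 clauses
    return t, dict(lab.counts)


if __name__ == "__main__":
    solutions, counts = solve_nae(4, CLAUSES)
    print(sorted(solutions))
    print(counts)
```

For the four-variable example this leaves 10 of the 16 assignments, and with n = 4 variables and p = 2 clauses the counters come out as 24 extract, 4 discard, 8 append, 8 merge and 3 amplify operations.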
So the DNA algorithm has two blocks of code: the first block generates the truth assignments (or, put differently, deletes the assignments that make any one of the clauses all 0's), and the second block deletes the truth assignments that make any one of the clauses all 1's.

The Complexity of Algorithm 3-1
• Theorem 3-1: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be solved with O(12p) "extract" operations, O(2p) "discard" operations, O(2n) "append" operations, O(n+2p) "merge" operations, and O(n−1) "amplify" operations in the Adleman-Lipton model.
• Proof:
• Algorithm 3-1 can be applied to solve the Not-All-Equal (NAE) 3-SAT problem for C and V. The first block of code in Algorithm 3-1 uses 2 × 3 × p = 6p "extract" operations (two per literal, three literals per clause), p "discard" operations, 2n "append" operations, n + p "merge" operations, and n − 1 "amplify" operations. The second block of code in Algorithm 3-1 uses 2 × 3 × p = 6p "extract" operations, p "discard" operations, and p "merge" operations. Adding the two blocks, Algorithm 3-1 takes O(12p) "extract" operations, O(2p) "discard" operations, O(2n) "append" operations, O(n+2p) "merge" operations, and O(n−1) "amplify" operations in the Adleman-Lipton model.

Simulated Experimental Results
• Adleman and his coworkers devised a scheme for designing DNA sequences for a combinatorial library that encodes strings of zeros and ones.
• They introduced seven constraints that ease probe-library hybridization by reducing secondary structure in the DNA molecules. The constraints are:
• (1). Library strands contain only As, Ts, and Cs.
• (2). Every library and probe sequence has no runs of more than 4 As, 4 Ts, 4 Cs, or 4 Gs.
• (3).
Every probe sequence has at least 4 mismatches with any 15 base alignment of any library strand (except at its matching bit-value).
• (4). No 15 base section of a library strand has fewer than 4 mismatches with any 15 base alignment of itself or any other library strand.
• (5). No 15 base probe has a run of more than 7 matches with any 8 base alignment of any library strand (except at its matching bit-value).
• (6). No library strand has a run of more than 7 matches with any 8 base alignment of itself or any other library strand.
• (7). Every probe has 4, 5, or 6 Gs in its sequence.

Sequences chosen to represent x_k^1 and x_k^0 in the example for V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) in subsection 3.1.

Bit      5'→3' DNA Sequence
x_1^1    CATTCACAAACAATT
x_1^0    TCATTCTCAACAAAA
x_2^1    CTCTATTCCTCTCAA
x_2^0    ACACCCTCTAATCTA
x_3^1    TCTCCCTATCTATTT
x_3^0    TCCTATTTAACTCCC
x_4^1    CTCTACTCAAAATAA
x_4^0    TATAACTTTCTCTCT

Discussion and conclusion
• In this paper we propose a DNA-based algorithm to solve the Not-All-Equal (NAE) 3-SAT problem. Many NP-complete problems that are intractable for a traditional digital computer are now being attacked with DNA-based algorithms. Even so, it is still very difficult to carry out mathematical instructions with biological operations. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing will become a reality someday.
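The library sequences tabulated above can be checked mechanically against the simpler design constraints. The Python sketch below is an illustration only: it tests constraints (1), (2) and (7), and it assumes that each probe is the reverse Watson-Crick complement of its library strand, which the slides do not state explicitly. The helper names `probe_for`, `longest_run` and `check` are hypothetical; constraints (3) to (6), which compare alignments between strands, are not covered.

```python
# Library strands copied from the sequence table (5'->3').
LIBRARY = {
    "x1^1": "CATTCACAAACAATT",
    "x1^0": "TCATTCTCAACAAAA",
    "x2^1": "CTCTATTCCTCTCAA",
    "x2^0": "ACACCCTCTAATCTA",
    "x3^1": "TCTCCCTATCTATTT",
    "x3^0": "TCCTATTTAACTCCC",
    "x4^1": "CTCTACTCAAAATAA",
    "x4^0": "TATAACTTTCTCTCT",
}

# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_for(strand):
    """Assumption: the probe is the reverse complement of the library strand."""
    return strand.translate(COMPLEMENT)[::-1]


def longest_run(seq):
    """Length of the longest run of one repeated base in seq."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def check(strand):
    """Evaluate constraints (1), (2) and (7) for one library strand and its probe."""
    probe = probe_for(strand)
    return {
        "c1_only_A_T_C": set(strand) <= set("ATC"),
        "c2_runs_at_most_4": max(longest_run(strand), longest_run(probe)) <= 4,
        "c7_probe_has_4_to_6_G": 4 <= probe.count("G") <= 6,
    }


if __name__ == "__main__":
    for bit, strand in LIBRARY.items():
        print(bit, strand, check(strand))
```

Under that assumption, all eight strands in the table pass the three checks.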