A DNA-based Algorithm for the Solution of Not-All-Equal 3-SAT Problem

Speaker: Nung-Yue Shi
2016/7/13
Outline
• Introduction
– Research motivation
– DNA computing
• Background
• Molecular solution of Not-All-Equal (NAE) 3-SAT problem
– Definition of Not-All-Equal (NAE) 3-SAT problem
– Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT problem
– The Power of the DNA Algorithm to Solve Not-All-Equal (NAE) 3-SAT problem
• The Complexity of Algorithm 1
• Simulated Experimental Results
• Discussion and Conclusion
1. Introduction
• Research motivation:
DNA is the basic medium of inheritance for all living
cells. The main idea of DNA computing is to encode data
in DNA strands and to use bio-operations to manipulate
those strands in a test tube, simulating arithmetical and
logical operations. It is estimated that about 10^18 DNA
strands could operate 10^4 times faster than today's
advanced supercomputers.
• As another data point: while modern
supercomputers perform 10^12 operations
per second, Adleman estimates that 10^20
operations per second are realistic for
molecular instructions.
• Similarly impressive figures concern energy
consumption and memory capacity: a
supercomputer needs one joule for 10^9
operations, while the same energy is
sufficient to perform 2*10^19 ligation
operations. On a video tape, every bit needs
10^12 cubic nanometers of storage, whereas
DNA stores information at a density of one
bit per cubic nanometer.
• This research is motivated by the
benefits and applications of DNA
computing and gives a new method for
solving the Not-All-Equal 3-SAT problem,
which is NP-complete.
DNA computing
• Molecular computation was first proposed
by Feynman in 1961, but his idea was not
tested experimentally until 1994, when
Adleman successfully solved an instance of
the Hamiltonian path problem in a test tube
using DNA strands.
• In 1995, Lipton demonstrated that DNA-based
solutions could also be used to solve the
satisfiability problem, the first problem
shown to be NP-complete. In 1999, Adleman and
his co-authors (Roweis et al.) proposed
stickers to enhance the Adleman-Lipton
model. In 2000, Adleman and his co-authors
(Braich et al.) solved a 6-variable,
11-clause formula of the 3-SAT problem.
• Moreover, in 2002, Adleman and his co-authors
(Braich et al.) performed experiments to solve a
20-variable, 24-clause formula of the 3-SAT problem.
Background
• A molecular computer can be built with the
following tools:
• 1. Watson-Crick complements. Two strands of
DNA anneal to form the famous double helix if
each base meets its Watson-Crick complement:
C pairs with G and A pairs with T. If a DNA
molecule meets another DNA molecule that is not
its complement, the two will not anneal.
• 2. Ligases. Ligases bond split DNA
molecules together. For example, DNA
ligase takes two strands of DNA and
covalently joins them into a single
strand. Cells use ligase to repair
broken DNA strands.
• 3. Nucleases. Nucleases cut the nucleic
acid of a DNA molecule. For example, a
nuclease looks for a predetermined
sequence of bases in a DNA strand and,
if it finds one, cuts the strand into
two pieces.
• 4. Polymerases. Polymerases copy
information from one DNA molecule into
another. DNA polymerase makes a
Watson-Crick complementary copy of a DNA
template strand. Given a starting point,
that is, a primer provided by a short piece
of DNA, DNA polymerase adds bases to the
primer to create a complementary copy of
the template.
• 5. Gel electrophoresis. A solution of DNA
molecules is placed at one end of a gel, and
an electric current is applied to the gel.
This process separates DNA strands by length.
• 6. DNA synthesis. Nowadays, a commercial DNA
synthesis facility can make a requested DNA
sequence. Within a few days, we receive a test
tube containing about 10^18 molecules of DNA
with the sequence we asked for.
• These six techniques are the basis of the
Adleman-Lipton DNA computing model.
• From them, Adleman developed eight bio-molecular
instructions for writing bio-molecular programs.
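• The Watson-Crick pairing rule from tool 1 can be sketched in a few lines of Python. This is only an idealized illustration: the function names are hypothetical, and real annealing also tolerates partial matches, which the sketch ignores.

```python
# Idealized sketch of Watson-Crick pairing (A-T, C-G) on plain strings.
# Names here are illustrative assumptions, not part of the source.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the strand that pairs with `strand`, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

def anneals(s1: str, s2: str) -> bool:
    """Idealized test: two strands anneal only if they are exact complements."""
    return s2 == reverse_complement(s1)

if __name__ == "__main__":
    probe = reverse_complement("CATTCACAAACAATT")
    print(probe, anneals("CATTCACAAACAATT", probe))
```

The strand used in the demo is one of the library sequences listed later in the slides; any string over {A, C, G, T} works.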
• A test tube contains DNA molecules, which form a
finite set of strings over the alphabet {A, C, G, T}.
We can perform the following operations on it:
• 1. Append-tail. Given a tube T and a binary digit
xj, the operation "Append-tail" appends xj to the
end of every data item stored in the tube T. Its
formal representation is "Append-tail(T, xj)".
• 2. Amplify. Given a tube T, the operation
"Amplify(T, T1, T2)" produces two new
tubes T1 and T2 that are both exact copies
of T (so T1 and T2 are identical), and T
becomes an empty tube.
• 3. Merge. Given n tubes T1, …, Tn, the
merge operation pours the data stored in
all n tubes into one tube, without changing
any individual data item. Its formal
representation is "∪(T1, …, Tn)", where
∪(T1, …, Tn) = T1 ∪ … ∪ Tn.
• 4. Extract. Given a tube T and a binary
digit xk, the extract operation produces
two tubes +(T, xk) and −(T, xk), where
+(T, xk) holds all the data in T that
contain xk and −(T, xk) holds all the data
in T that do not contain xk. After the
Extract operation is completed, T becomes
an empty tube.
• 5. Detect. Given a tube T, the detect
operation checks whether T contains any
data. If T contains at least one data item
the answer is "yes", and if T contains no
data the answer is "no". Its formal
representation is "Detect(T)".
• 6. Discard. Given a tube T, the contents of
T are discarded, and T is replaced by a
new, empty tube. Its formal representation
is "Discard(T)".
• 7. Read. Given a tube T, the read
operation describes one data item
contained in T. Even if T contains many
different data items, the operation gives
an explicit description of exactly one of
them. Its formal representation is
"Read(T)".
• 8. Append-head. Given a tube T and a
binary digit xj, the operation "Append-head"
appends xj to the head of every data item
stored in the tube T. Its formal
representation is "Append-head(T, xj)".
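• The eight operations above can be mimicked in software to get a feel for how they compose. The following is a minimal sketch, not the laboratory procedure: the Tube class, the function names, and the choice to model a strand as a tuple of bit labels are all assumptions made for illustration. In particular, appending to an empty tube is assumed to start a new strand, and merge is assumed to leave its input tubes unchanged.

```python
from typing import Iterable, Optional, Set, Tuple

Strand = Tuple[str, ...]  # a strand is modeled as a tuple of bit labels, e.g. ("x1^1", "x2^0")

class Tube:
    """A test tube modeled as a set of strands (hypothetical helper class)."""
    def __init__(self, strands: Optional[Iterable[Strand]] = None):
        self.strands: Set[Strand] = set(strands or [])

def append_tail(t: Tube, bit: str) -> None:
    # Assumption: an empty tube behaves as if it held one empty strand to extend.
    t.strands = {s + (bit,) for s in (t.strands or {()})}

def append_head(t: Tube, bit: str) -> None:
    t.strands = {(bit,) + s for s in (t.strands or {()})}

def amplify(t: Tube, t1: Tube, t2: Tube) -> None:
    # T1 and T2 become copies of T; T becomes empty.
    t1.strands, t2.strands, t.strands = set(t.strands), set(t.strands), set()

def merge(*tubes: Tube) -> Tube:
    # Union of the contents; the input tubes are left unchanged in this sketch.
    return Tube(s for t in tubes for s in t.strands)

def extract(t: Tube, bit: str) -> Tuple[Tube, Tube]:
    plus = Tube(s for s in t.strands if bit in s)       # +(T, bit)
    minus = Tube(s for s in t.strands if bit not in s)  # -(T, bit)
    t.strands = set()                                   # T becomes empty
    return plus, minus

def detect(t: Tube) -> bool:
    return bool(t.strands)  # True stands for "yes", False for "no"

def discard(t: Tube) -> None:
    t.strands = set()

def read(t: Tube) -> Optional[Strand]:
    # Describes exactly one strand, or None if the tube is empty.
    return next(iter(t.strands), None)
```

A set is used because the operations only care about which distinct strands are present, not how many copies of each there are.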
• Satisfiability, the first problem shown to be
NP-complete, asks whether the variables of a given
Boolean formula can be assigned values so that the
formula evaluates to true.
• The problem remains NP-complete even if all
expressions are written in conjunctive normal
form with 3 literals per clause (3-CNF),
yielding the 3-SAT problem. 3-satisfiability is the
special case of k-satisfiability (k-SAT) in which
each clause contains exactly k = 3 literals.
• For example, E = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4).
Note that each clause has exactly 3 literals, which
is why we call it 3-SAT. The Not-All-Equal (NAE)
3-SAT problem is defined as follows.
• Definition 3-1:
• Instance: A set V of logical variables and a
collection C of clauses over V such that each
clause has 3 literals.
• Question: Is there a truth assignment for V such
that each clause has at least one true and at
least one false literal?
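• Definition 3-1 can be checked by brute force on the example formula E. The sketch below is a conventional (non-DNA) reference check; the literal encoding as (variable index, is_positive) pairs and the function names are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[int, bool]  # (variable index j, True for xj, False for ¬xj)

# E = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4)
E: List[List[Literal]] = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]

def literal_value(lit: Literal, assignment: Dict[int, bool]) -> bool:
    var, positive = lit
    return assignment[var] if positive else not assignment[var]

def nae_satisfied(clauses: List[List[Literal]], assignment: Dict[int, bool]) -> bool:
    """True if every clause has at least one true and at least one false literal."""
    for clause in clauses:
        values = [literal_value(lit, assignment) for lit in clause]
        if all(values) or not any(values):
            return False
    return True

def nae_solutions(clauses: List[List[Literal]], n: int) -> List[Tuple[bool, ...]]:
    """Enumerate all 2^n assignments and keep the NAE-satisfying ones."""
    solutions = []
    for bits in product([False, True], repeat=n):
        assignment = {i + 1: bits[i] for i in range(n)}
        if nae_satisfied(clauses, assignment):
            solutions.append(bits)
    return solutions

if __name__ == "__main__":
    print(len(nae_solutions(E, 4)), "NAE-satisfying assignments")
```

The enumeration takes time exponential in n, which is exactly the cost that the DNA approach tries to spread over many strands processed in parallel.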
Generate DNA-based algorithm to solve Not-All-Equal
(NAE) 3-SAT problem
• Define the binary digit zk^1 to mean that the
kth bit (counting from the leftmost side) is 1,
and zk^0 to mean that the kth bit (counting
from the leftmost side) is 0. |C| is the number
of clauses, and |Ca| is the number of literals
in the ath clause Ca. We also define vb^a to be
the logical variable that appears as the bth
literal of the ath clause.
• Suppose that x1 is the leftmost bit and x4 is the
rightmost bit.
• Our algorithm contains two blocks of code. The
first block generates the truth assignments that
satisfy the 3-SAT problem.
• The second block deletes the truth assignments
that make any clause all 1's. Note that no
remaining truth assignment makes any clause all
0's, because an assignment that makes a clause all
0's leaves that clause unsatisfied.
Algorithm 3-1: Solving the Not-All-Equal (NAE) 3-SAT problem
for n logical variables and a collection C of clauses over them
(1) * first block (Not-All-Equal-0) (generate truth assignments) *
(2) Append-tail(T1, z1^1).
(3) Append-tail(T2, z1^0).
(4) T = ∪(T1, T2).
(5) For k = 2 to n
(6)     Amplify(T, T1, T2).
(7)     Append-tail(T1, zk^1).
(8)     Append-tail(T2, zk^0).
(9)     T = ∪(T1, T2).
(10) EndFor
(11) For a = 1 to |C| do begin
(12)     For b = 1 to |Ca| do begin
(13)         If vb^a = xj then begin
(14)             Tb = +(T, vb^a = 1)
(15)             T = −(T, vb^a = 1)
(16)         end
(17)         else begin
(18)             Tb = +(T, vb^a = 0)
(19)             T = −(T, vb^a = 0)
(20)         end
(21)     EndFor
(22)     Discard(T)
(23)     T = ∪ of all Tb
(24) EndFor
(25) * second block (Not-All-Equal-1) *
(26) For a = 1 to |C| do begin
(27)     For b = 1 to |Ca| do begin
(28)         If vb^a = xj then begin
(29)             Tb = +(T, vb^a = 0)
(30)             T = −(T, vb^a = 0)
(31)         end
(32)         else begin
(33)             Tb = +(T, vb^a = 1)
(34)             T = −(T, vb^a = 1)
(35)         end
(36)     EndFor
(37)     Discard(T)
(38)     T = ∪ of all Tb
(39) EndFor
(40) EndAlgorithm
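• The two blocks of Algorithm 3-1 can be traced in software on the example E = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). This is a sketch under stated assumptions, not the molecular procedure: strands are modeled as tuples of 0/1 bits, tubes as Python lists, and all function names are hypothetical.

```python
from typing import List, Tuple

Strand = Tuple[int, ...]    # strand[k-1] holds the value of bit xk
Literal = Tuple[int, bool]  # (variable index j, True for xj, False for ¬xj)

def generate_assignments(n: int) -> List[Strand]:
    """Steps (2)-(10): build all 2^n bit strings by amplify, append, merge."""
    tube = [(1,), (0,)]                        # steps (2)-(4)
    for _ in range(2, n + 1):                  # step (5)
        t1, t2 = list(tube), list(tube)        # Amplify(T, T1, T2)
        t1 = [s + (1,) for s in t1]            # Append-tail(T1, zk^1)
        t2 = [s + (0,) for s in t2]            # Append-tail(T2, zk^0)
        tube = t1 + t2                         # T = ∪(T1, T2)
    return tube

def extract(tube: List[Strand], var: int, bit: int):
    """Split the tube into +(T, x_var = bit) and −(T, x_var = bit)."""
    plus = [s for s in tube if s[var - 1] == bit]
    minus = [s for s in tube if s[var - 1] != bit]
    return plus, minus

def filter_block(tube: List[Strand], clauses: List[List[Literal]], target: bool):
    """Keep strands in which every clause has at least one literal equal to `target`."""
    for clause in clauses:
        kept: List[Strand] = []
        for var, positive in clause:
            # Bit value of x_var that gives this literal the truth value `target`.
            bit = 1 if positive == target else 0
            plus, tube = extract(tube, var, bit)
            kept.extend(plus)                  # the tube Tb
        tube = kept                            # Discard(T), then union of all Tb
    return tube

def solve_nae(clauses: List[List[Literal]], n: int) -> List[Strand]:
    tube = generate_assignments(n)
    tube = filter_block(tube, clauses, target=True)   # first block: drop all-0 clauses
    tube = filter_block(tube, clauses, target=False)  # second block: drop all-1 clauses
    return tube

E = [[(1, True), (2, False), (3, False)],
     [(1, True), (3, False), (4, True)]]

if __name__ == "__main__":
    solutions = solve_nae(E, 4)
    print(len(solutions), "assignments survive both blocks")
```

Passing target=True reproduces the tests in steps (13)-(20), and target=False reproduces steps (28)-(35), so one helper covers both blocks.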
• In developing this research, I also found that
the first block of code is equivalent to deleting
the assignments that make any clause all 0's (by
definition). So our DNA algorithm has two blocks of
code: the first generates the truth assignments
(or, equivalently, deletes the assignments that
make any clause all 0's), and the second deletes
the truth assignments that make any clause all 1's.
The Complexity of Algorithm 3-1
• Theorem 3-1: Let V be a set of n logical
variables and C = {C1, C2, …, Cp} a collection
of clauses over V. The Not-All-Equal (NAE)
3-SAT problem for C and V can be solved with
O(12p) "extract" operations, O(2p) "discard"
operations, O(2n) "append" operations, O(n+2p)
"merge" operations, and O(n-1) "amplify"
operations in the Adleman-Lipton model.
• Proof:
• Algorithm 3-1 can be applied to solve the Not-All-Equal
(NAE) 3-SAT problem for C and V. The first block of code
in Algorithm 3-1 uses 2*3*p = 6p "extract" operations,
p "discard" operations, 2*n "append" operations, n+p
"merge" operations, and n-1 "amplify" operations. The
second block of code uses 2*3*p = 6p "extract" operations,
p "discard" operations, and p "merge" operations.
Summing the two blocks, Algorithm 3-1 takes O(12p)
"extract" operations, O(2p) "discard" operations, O(2n)
"append" operations, O(n+2p) "merge" operations, and
O(n-1) "amplify" operations in the Adleman-Lipton model.
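• The counts in the proof can be tallied for a concrete instance. The helper below is a hypothetical illustration that simply adds up the per-block figures stated above, assuming 3 literals per clause.

```python
from typing import Dict

def operation_counts(n: int, p: int, literals_per_clause: int = 3) -> Dict[str, int]:
    """Total operations used by both blocks for n variables and p clauses."""
    extract_per_block = 2 * literals_per_clause * p  # one +/− pair per literal
    return {
        "extract": 2 * extract_per_block,  # 6p + 6p = 12p
        "discard": p + p,                  # 2p
        "append": 2 * n,                   # first block only
        "merge": (n + p) + p,              # n + 2p
        "amplify": n - 1,                  # first block only
    }

if __name__ == "__main__":
    # The running example has n = 4 variables and p = 2 clauses.
    print(operation_counts(4, 2))
```

Every count is linear in n or p; the exponential number of candidate assignments is carried by the strands in the tube, not by the number of operations.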
Simulated Experimental Results
• Adleman and his coworkers devised a
scheme for designing DNA sequences for a
combinatorial library that encodes strings
of zeros and ones.
• They introduced seven constraints to ease
probe-library hybridization by reducing
secondary structure in the DNA molecules.
The constraints are:
• (1) Library strands contain only As, Ts, and Cs.
• (2) Every library and probe sequence has no runs of more than 4 As,
4 Ts, 4 Cs, or 4 Gs.
• (3) Every probe sequence has at least 4 mismatches with any 15-base
alignment of any library strand (except at its matching bit-value).
• (4) No 15-base section of a library strand has fewer than 4 mismatches
with any 15-base alignment of itself or any other library strand.
• (5) No 15-base probe has a run of more than 7 matches with any 8-base
alignment of any library strand (except at its matching bit-value).
• (6) No library strand has a run of more than 7 matches with any 8-base
alignment of itself or any other library strand.
• (7) Every probe has 4, 5, or 6 Gs in its sequence.
Sequences chosen to represent xk^1 and xk^0 in the example for
V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C =
(x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) in subsection 3.1.

Bit     5' → 3' DNA Sequence
x1^1    CATTCACAAACAATT
x1^0    TCATTCTCAACAAAA
x2^1    CTCTATTCCTCTCAA
x2^0    ACACCCTCTAATCTA
x3^1    TCTCCCTATCTATTT
x3^0    TCCTATTTAACTCCC
x4^1    CTCTACTCAAAATAA
x4^0    TATAACTTTCTCTCT
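• Three of the seven design constraints are easy to check mechanically on the eight library sequences listed here. The sketch below checks constraints (1), (2), and (7) only. It assumes that each probe is the exact Watson-Crick reverse complement of its library sequence; the function names are hypothetical, and the alignment-based constraints (3)-(6) are not covered.

```python
from typing import Dict

LIBRARY: Dict[str, str] = {
    "x1^1": "CATTCACAAACAATT", "x1^0": "TCATTCTCAACAAAA",
    "x2^1": "CTCTATTCCTCTCAA", "x2^0": "ACACCCTCTAATCTA",
    "x3^1": "TCTCCCTATCTATTT", "x3^0": "TCCTATTTAACTCCC",
    "x4^1": "CTCTACTCAAAATAA", "x4^0": "TATAACTTTCTCTCT",
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def probe_for(seq: str) -> str:
    """Assumed probe: the reverse complement of the library sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base (seq must be non-empty)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def check(seq: str) -> Dict[str, bool]:
    probe = probe_for(seq)
    return {
        "only_A_T_C": set(seq) <= set("ATC"),                             # constraint (1)
        "runs_at_most_4": max(longest_run(seq), longest_run(probe)) <= 4,  # constraint (2)
        "probe_has_4_to_6_Gs": probe.count("G") in (4, 5, 6),             # constraint (7)
    }

if __name__ == "__main__":
    for bit, seq in LIBRARY.items():
        print(bit, seq, check(seq))
```

Because library strands contain no G, the number of Gs in a probe equals the number of Cs in its library sequence, which is why constraint (7) can be tested from the library side.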
Discussion and conclusion
• In this paper, we propose a DNA-based
algorithm to solve the Not-All-Equal (NAE)
3-SAT problem. Nowadays, many NP-complete
problems that cannot be solved efficiently by a
traditional digital computer are being attacked
with DNA-based algorithms. Even so, it is
still very difficult to support biological operations
using mathematical instructions. Many difficulties
remain to be overcome, and we hope that DNA-based
supercomputing could become a reality someday.