National Cheng Kung University
Department of Computer Science and Information Engineering
Doctoral Dissertation
A Study on the Molecular Algorithmic Solutions for the
Not-All-Equal and One-In-Three 3-SAT Problems and
the Hitting-set Problem in DNA-based Supercomputing
Student: 施能裕
Advisor: 朱治平
Institute of Computer Science and Information Engineering,
National Cheng Kung University, Tainan, Taiwan, R.O.C.
Dissertation for Doctor of Philosophy,
June, 2010
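Chinese Abstract
Modern digital computing can be divided into two broad fields, hardware and software. Hardware consists of electronic circuits and digital chips built on Boolean algebra as the mathematical foundation, from which computer motherboards and a range of PC boards were developed. Software originated in the theory of computation (Automata Theory), whose concepts of state, input, and output led to today's fields such as windowed application programming.
Our pursuit of computer speed, however, is endless. Many applications (such as weather forecasting, simulating the explosion of a nuclear bomb, or delivering ever better audio and video over the network) and hard mathematical problems (such as NP-complete problems) require faster computers. Yet the conventional digital computer built from electronic circuits, as described above, has nearly reached the limit of its speed, and a breakthrough of fifty or a hundred times is unlikely. We must therefore set aside traditional thinking and consider a different, alternative computing model.
Abroad, alternative computing models have been studied for many years, for example functional programming, logic programming, Petri nets, and the DNA computing model proposed by Dr. Adleman in 1994. Following the DNA computing model, this dissertation proposes algorithms under that model for three classically intractable NP-complete problems, in the hope that, as the DNA computing model matures, its massive parallelism will rapidly increase computing speed and thereby solve mathematical problems and practical applications that are intractable on conventional digital computers.
This dissertation proposes an algorithm under the DNA computing model for each of three problems that are NP-complete on digital computers (as named in the title of this dissertation).
The Japanese scholar Junzo Watada stated, in a paper presented at the ISDA international conference in 2008, that current research on the DNA computing model follows two main streams: one is devoted to developing precise mathematical models, and the other, building on correct mathematical models, concentrates on carrying out DNA bio-operations in nanoscale laboratories. My research belongs to the first stream and develops three algorithms under the DNA computing model. Researchers of the second stream are welcome to use this dissertation as a basis for laboratory work, in the hope that the DNA computing model will one day be realized.
Keywords: DNA computing model, satisfiability problem, NP-complete problems, Not-All-Equal 3-SAT problem, One-In-Three 3-SAT problem, Hitting-set problem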
Abstract
This dissertation illustrates the current state of the art of DNA computing, especially new approaches to solving theoretical 3-SAT problems and the hitting-set problem. The field began with Adleman’s breakthrough, a molecular algorithm for the solution of an NP-complete combinatorial problem, the directed Hamiltonian path problem (HPP). Today, many researchers all over the world concentrate on proposing new methods to solve engineering or application problems with a DNA computing approach.
The satisfiability problem is: given a Boolean formula, decide whether a satisfying truth assignment exists. For example, $(x_{12} \lor x_5) \land (x_{24} \lor x_3 \lor x_{13} \lor x_9) \land \cdots \land (x_{12}) \land (x_{17} \lor x_8 \lor x_{18})$ is a Boolean formula. In the k-SAT problem, each clause has exactly k literals. The Not-All-Equal (NAE) 3-SAT problem and the One-In-Three (1IN3) 3-SAT problem are both NP-complete. In this dissertation, we present molecular solutions that find all satisfying truth assignments (the 3-SAT problem) and, furthermore, find the Not-All-Equal (NAE) solutions and the One-In-Three (1IN3) solutions in DNA-based supercomputing.
In the hitting-set problem, we are given a collection C of subsets of a finite set S and a positive integer $K \le |S|$, and we need to know whether there is a subset $S' \subseteq S$ with $|S'| \le K$ such that $S'$ contains at least one element of each subset in C. In other words, $S'$ is a subset that intersects every subset in C, and it is called a hitting-set.
In this dissertation, a DNA-based algorithm is proposed to solve the small hitting-set problem. A small hitting-set is a hitting-set with the smallest K value, i.e., the hitting-set with the smallest number of elements. Furthermore, an algorithm is introduced to find the number of ones in each of the $2^n$ combinations; the minimum number of ones represents the small hitting-set, since K is expected to be as small as possible.
The complexity of all the presented DNA-based algorithms is also discussed. We describe the time complexity and volume complexity of the three algorithms, the number of test tubes used, and the longest library strand in the solution space of each algorithm.
Finally, a simulated experiment is applied to verify the correctness of the proposed DNA-based algorithm for solving the One-In-Three (1IN3) 3-SAT problem; the simulation of the Not-All-Equal (NAE) 3-SAT problem is similar. Another simulated experiment is applied to our proposed DNA-based Algorithm 6-2 for solving the well-known hitting-set problem.
This research has been motivated by the benefits and applications of DNA computing and gives new methods to solve two 3-SAT problems and the hitting-set problem, all of which are NP-complete.
Key Words: Satisfiability problem, 3-SAT problem, Not-All-Equal 3-SAT problem, One-In-Three 3-SAT problem, Hitting-set Problem, Molecular Solution, DNA-based Supercomputing, DNA-based Algorithm, NP-Complete Problems.
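Acknowledgments
Time flies, and my days of study at the Institute of Computer Science and Information Engineering, National Cheng Kung University, have come to a close. Looking back, I cannot count the days I spent rushing back and forth between work and study. Playing several roles in society at once left me constantly on the run as I tried to play each role well. During this time I gained a great deal, and these are the most beautiful and fruitful memories of my life.
I thank my advisor, Dr. 朱治平, for his earnest teaching and careful guidance, and for providing funding so that I could attend academic conferences abroad and broaden my horizons. Professor 謝孫源 is the most intelligent Chinese person I have known, and I learned a great deal in his classes. I express my deep respect for the integrity and selfless guidance of Professor 朱治平 and Professor 謝孫源. In addition, many professors of the department taught my courses, such as Professor 李強…; for their inspiration in professional knowledge and their concern for my coursework I also offer my deep gratitude.
I am very grateful to my parents for their long and quiet support. I dedicate this dissertation to you and to my two dear children.
施能裕
March 11, 2010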
TABLE OF CONTENTS
Chapter 1 Introduction
1.1 Research Motivation
1.2 Adleman’s Experiment
1.3 DNA computing
Chapter 2 Background and related works
2.1 The Adleman-Lipton model
2.2 Introduction to other related works
Chapter 3 Molecular solution of Not-All-Equal (NAE) 3-SAT problem
3.1 Definition of Not-All-Equal (NAE) 3-SAT problem
3.2 Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT problem
3.3 The Power of the DNA Algorithm to Solve Not-All-Equal (NAE) 3-SAT problem
3.4 The Complexity Analysis of Algorithm 3-1
Chapter 4 Molecular solution of One-In-Three (1IN3) 3-SAT problem
4.1 Definition of One-In-Three (1IN3) 3-SAT problem
4.2 Generate DNA-based algorithm to solve One-In-Three (1IN3) 3-SAT problem
4.3 The Power of the DNA Algorithm to Solve One-In-Three (1IN3) 3-SAT problem
4.4 The Complexity Analysis of Algorithm 5-1
Chapter 5 A DNA-based Algorithm for Solving the Hitting-set Problem
5.1 Definition of the Hitting-set Problem
5.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem
5.3 Introduction of Finding the Maximum and Minimum Numbers of Ones in Bio-molecular Computing
5.4 Generate DNA-based algorithm to solve the Hitting-set problem
5.5 Simple Example of the Hitting-set Problem
5.6 Complex Example of the Hitting-set Problem
5.7 The Complexity Analysis of Algorithm 6-2
Chapter 6 Simulated Experimental Results
6.1 Simulation of Experimental Results of One-In-Three 3-SAT problem
6.2 Simulation of Experimental Results of Hitting-set problem
Chapter 7 Discussions and Conclusions
References
List of Tables
Table 6-1. Each possible subset $S'$ of a ground set $S = \{1, 2, 3, 4\}$
Table 7-1. Sequences chosen to represent $x_k^1$ and $x_k^0$ in the example for $V = (x_1, x_2, x_3, x_4)$ and $f(x_1, x_2, x_3, x_4) = C = (x_1 \lor \neg x_2 \lor \neg x_3) \land (x_1 \lor \neg x_3 \lor x_4)$ in subsection 5.1
Table 7-2. The energy for binding each probe to its corresponding region on a library strand
Table 7-3. The energy over all probe/library strand interactions
Table 7-4. DNA sequences chosen to represent answers in test tube T
Table 7-5. Sequences chosen to represent $z_k^1$ and $z_k^0$ in the example for $S = \{1, 2, 3, 4\}$ and $C = \{\{1, 2, 3\}, \{4\}\}$ in subsection 6.1
Table 7-6. The energies for binding each probe to its corresponding region on a library strand
Table 7-7. The energies over all probe/library strand interactions
Table 7-8. DNA sequences chosen to represent the hitting-set with k = 2 in tube $T_0$
Chapter 1
Introduction
1.1 Research Motivation
This dissertation illustrates the current state of the art of DNA computing, especially new approaches to solving theoretical 3-SAT problems and the hitting-set problem. The field began with Adleman’s breakthrough, a molecular algorithm for the solution of an NP-complete combinatorial problem, the directed Hamiltonian path problem (HPP). Today, many researchers all over the world concentrate on proposing new methods to solve engineering or application problems with a DNA computing approach [33, 34, 35].
DNA is the basic inheritance medium of all living cells. The main idea of DNA computing is to encode data in DNA strands and to use bio-operations to manipulate the strands in a test tube so as to simulate arithmetical and logical operations. It is estimated that about $10^{18}$ DNA strands could operate $10^4$ times faster than today’s advanced supercomputers [30]. As another data point, while modern supercomputers perform $10^{12}$ operations per second, Adleman estimates that $10^{20}$ operations per second is realistic for molecular instructions.
Similarly impressive figures concern the consumption of energy and the capacity of memory: a supercomputer needs one joule for $10^9$ operations, while the same energy is sufficient to perform $2 \times 10^{19}$ ligation operations [31, 32]. On a video tape, every bit needs $10^{12}$ cubic nanometers of storage, whereas DNA stores information with a density of one bit per cubic nanometer [36].
This research has been motivated by the benefits and applications of DNA computing and gives new methods to solve two 3-SAT problems which are NP-complete.
1.2 Adleman’s Experiment
Adleman implemented a DNA-based algorithm to solve the directed Hamiltonian path problem, which is NP-complete. A Hamiltonian path is a directed path through a graph that starts and ends at specified vertices and visits every vertex in the graph exactly once. The Hamiltonian path problem is to decide whether a Hamiltonian path exists in a graph.
Suppose there is a graph G with n vertices in which vertices $V_{start}$ and $V_{end}$ are marked. We want to decide whether there is a Hamiltonian path that starts from $V_{start}$ and ends at $V_{end}$. Adleman uses a DNA-based algorithm to solve the directed HPP as follows [1, 31]:
Step 1. Generate random paths through the graph G.
Step 2. Extract only those paths which start with $V_{start}$ and end with $V_{end}$.
Step 3. Because the graph has n vertices, extract paths with length exactly n-1.
Step 4. Extract paths that contain every vertex at most once.
Step 5. If any path remains, say ‘yes’; otherwise, say ‘no’.
The above steps are realized by bio-molecular instructions. In Step 1, ligation builds DNA strands that represent random paths in G. In Step 2, PCR is performed to amplify the strands that represent paths starting with $V_{start}$ and ending with $V_{end}$. Step 3 is done with gel electrophoresis to extract molecules of the proper length. Step 4 is done by checking that each vertex is present in a path only once. In the final step, gel electrophoresis is used to test whether any molecules are left. If so, they represent Hamiltonian paths. If no molecules are detected on the final gel, then no Hamiltonian path exists.
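The five steps above can be mirrored in conventional code as a generate-and-filter procedure. The following Python sketch is only an illustration under stated assumptions: the function name hamiltonian_paths, the encoding of vertices as integers 0 to n-1, and the exhaustive enumeration that stands in for the random ligation of Step 1 are choices made here and are not part of Adleman's experiment.

```python
from itertools import product


def hamiltonian_paths(n, edges, v_start, v_end):
    """Return every Hamiltonian path from v_start to v_end.

    Vertices are 0..n-1 and `edges` is a set of directed (u, v) pairs.
    """
    # Step 1: generate candidate paths. The wet-lab step is random; this
    # sketch enumerates every sequence of n vertices that follows edges.
    candidates = [
        seq for seq in product(range(n), repeat=n)
        if all((seq[i], seq[i + 1]) in edges for i in range(n - 1))
    ]
    # Step 2: keep paths that start with v_start and end with v_end.
    candidates = [s for s in candidates if s[0] == v_start and s[-1] == v_end]
    # Step 3: keep paths of length n - 1 (always true by construction here).
    candidates = [s for s in candidates if len(s) - 1 == n - 1]
    # Step 4: keep paths in which no vertex appears more than once.
    candidates = [s for s in candidates if len(set(s)) == n]
    # Step 5: the caller answers 'yes' if the returned list is non-empty.
    return candidates


if __name__ == "__main__":
    example_edges = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)}
    paths = hamiltonian_paths(4, example_edges, 0, 3)
    print("yes" if paths else "no", paths)
```

The enumeration takes time exponential in n on a conventional computer, which is exactly the cost that the test tube is meant to absorb through parallelism.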
1.3 DNA computing
As the main material of the cell nucleus, DNA (deoxyribonucleic acid) determines the inheritance of living creatures such as human beings. It is made up of a linear chain of smaller units called nucleotides. A nucleotide contains three major components: deoxyribose, a phosphate group, and a base. Different nucleotides are distinguished by their bases, which can be adenine (abbreviated as A), guanine (G), cytosine (C), or thymine (T). Two strands of DNA can form a double helix if their respective bases are Watson-Crick complements, i.e., C matches G and A matches T. A single DNA strand is chained one nucleotide at a time: the 3′ end of one nucleotide (the 3rd carbon of the deoxyribose, which carries a hydroxyl group) is connected to the 5′ end of the next nucleotide (the 5th carbon, which carries a phosphate group) via the phosphate group. If a strand contains 20 nucleotides, we say it is a 20-mer. For double-stranded DNA, the length is counted in base pairs. If a double-stranded DNA has 20 base pairs, then it is made of two single DNA strands, each of which is a 20-mer [1, 25].
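The pairing rule can be written down in a few lines. The Python sketch below is an illustration only; the function names are hypothetical, and it assumes the standard convention (not spelled out in the text above) that the two strands of a double helix run in opposite directions, so the partner of a strand written 5′ to 3′ is its reversed complement.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def watson_crick_complement(strand):
    """Return the partner strand, written 5' to 3' like the input."""
    # Complement every base and reverse, because the partner is antiparallel.
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def anneals(strand_a, strand_b):
    """True if the two single strands are exact Watson-Crick complements."""
    return strand_b == watson_crick_complement(strand_a)


if __name__ == "__main__":
    probe = "AACG"  # a 4-mer
    print(watson_crick_complement(probe), anneals(probe, "CGTT"))
```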
DNA-based computing [1, 2, 3, 4] treats DNA strands as the bits of a traditional digital computer and uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate the DNA strands [5]. Nowadays, roughly $10^{18}$ DNA strands can be produced in a test tube [10-15], which means that $10^{18}$ bits of information can be represented. With the biological operations described in Section 2.1, we in effect have $10^{18}$ processors running in parallel. This massive parallelism could be used to attack the most intractable problems in computer science [19].
Molecular computation was first proposed by Feynman in 1961 [18], but the idea was not tested experimentally until 1994, when Adleman successfully solved an instance of the Hamiltonian path problem in a test tube using DNA strands [1]. Since then, DNA-based algorithms have been proposed for the solution of many computational problems. The motivation for using DNA to solve such problems lies in the potential for massive inherent parallelism when operations are performed on populations of trillions of DNA molecules. In 1995, Lipton [22] demonstrated that DNA could be used to solve the satisfiability problem, the first problem shown to be NP-complete. In 1999, Adleman and his co-authors (Roweis et al.) proposed the sticker model to enhance the Adleman-Lipton model [25]. In 2000, Adleman and his co-authors (Braich et al.) solved a 6-variable, 11-clause instance of the 3-SAT problem [5]. Moreover, in 2002, Adleman and his co-authors (Braich et al.) performed experiments to solve a 20-variable, 24-clause instance of the 3-SAT problem [6].
In this dissertation, we solve the Not-All-Equal (NAE) 3-SAT problem, the One-In-Three (1IN3) 3-SAT problem, and the hitting-set problem using molecular solutions. We also discuss the complexity of the proposed DNA-based algorithms: the time complexity, the volume complexity, the number of test tubes used, and the longest library strand in the solution space. Finally, simulated experiments are applied to the One-In-Three (1IN3) 3-SAT problem and the hitting-set problem; the simulation of the Not-All-Equal (NAE) 3-SAT problem is similar.
This dissertation is organized as follows. Chapter 2 provides the background of DNA-based supercomputing and the related work of other scholars. Chapter 3 defines the Not-All-Equal (NAE) 3-SAT problem, presents the DNA-based algorithm for it, and discusses the complexity of that algorithm. Chapter 4 does the same for the One-In-Three (1IN3) 3-SAT problem, and Chapter 5 does the same for the hitting-set problem. Chapter 6 provides the simulated experimental results for the One-In-Three (1IN3) 3-SAT problem and the hitting-set problem. Chapter 7 presents discussions and conclusions.
Chapter 2
Background and related works
2.1 The Adleman-Lipton model
Adleman presented a new concept of computation at the molecular level in his papers [1-3]. According to his idea, a molecular computer can be built with the following tools:
1. Watson-Crick complements. Two strands of DNA anneal to form the famous double helix if their respective bases are Watson-Crick complements, that is, C matches G and A matches T. If a DNA molecule meets another DNA molecule that is not its complement, the two do not anneal.
2. Ligases. Ligases bond split DNA molecules together. For example, DNA ligase takes two strands of DNA and covalently connects them into a single strand. The cell uses ligase to repair broken DNA strands.
3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease looks for a predetermined sequence of bases in a strand of DNA and, if it finds one, cuts the strand into two pieces.
4. Polymerases. Polymerases copy information from one DNA molecule into another. DNA polymerase makes a Watson-Crick complementary copy of a DNA template strand. If we tell it where to start, by providing a primer in the form of a short piece of DNA, DNA polymerase begins adding bases to the primer to create a complementary copy of the template.
5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel. This process separates DNA strands by length.
6. DNA synthesis. Nowadays, a commercial DNA synthesis facility can be asked to make a given DNA sequence. Within a few days, we receive a test tube containing about $10^{18}$ molecules of DNA with the requested sequence.
These six techniques are the basis of the Adleman-Lipton DNA computing model.
From them, Adleman developed eight bio-molecular instructions for writing bio-molecular programs. A test tube contains molecules of DNA, which form a finite set over the alphabet {A, C, G, T}. The following operations can be performed on it [9-15]:
1. Append-tail. Given a tube T and a binary digit $x_j$, the operation Append-tail appends $x_j$ onto the end of every data item stored in the tube T. The formal representation of the operation is "Append-tail(T, $x_j$)".
2. Amplify. Given a tube T, the operation "Amplify(T, $T_1$, $T_2$)" produces two new tubes $T_1$ and $T_2$ such that each is a copy of T ($T_1$ and $T_2$ are identical), and T becomes an empty tube.
3. Merge. Given n tubes $T_1, \ldots, T_n$, the merge operation pours the data stored in the n tubes into one tube, without any change to the individual data. The formal representation of the merge operation is "$\cup(T_1, \ldots, T_n)$", where $\cup(T_1, \ldots, T_n) = T_1 \cup \cdots \cup T_n$.
4. Extract. Given a tube T and a binary digit $x_k$, the extract operation produces two tubes $+(T, x_k)$ and $-(T, x_k)$, where $+(T, x_k)$ is all of the data in T that contain $x_k$ and $-(T, x_k)$ is all of the data in T that do not contain $x_k$. After the extract operation is completed, test tube T becomes an empty tube.
5. Detect. Given a tube T, the detect operation checks whether T contains any data. If T contains at least one data item, the answer is “yes”; if T contains no data, the answer is “no”. The formal representation of the operation is “Detect(T)”.
6. Discard. Given a tube T, the contents of T are discarded, and T is replaced by a new, empty tube. The formal representation of the operation is “Discard(T)”.
7. Read. Given a tube T, the read operation describes one data item contained in T. Even if T contains many different data items, the operation gives an explicit description of exactly one of them. The formal representation of the operation is “Read(T)”.
8. Append-head. Given a tube T and a binary digit $x_j$, the operation Append-head appends $x_j$ onto the head of every data item stored in the tube T. The formal representation of the operation is “Append-head(T, $x_j$)”.
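To make the eight instructions concrete, they can be modeled in software. The following Python sketch rests on several assumptions that are not in the text: a strand is a string of '0' and '1' characters, a tube is a Python set (so the number of copies of each strand is ignored), appending to an empty tube creates a one-bit strand, Extract selects on the value of the k-th bit, and Read returns the smallest strand. All function names are hypothetical.

```python
def append_tail(tube, bit):
    """Append `bit` to every strand; an empty tube receives the bit alone."""
    return {s + bit for s in tube} if tube else {bit}


def append_head(tube, bit):
    """Prepend `bit` to every strand; an empty tube receives the bit alone."""
    return {bit + s for s in tube} if tube else {bit}


def amplify(tube):
    """Return two identical copies of the tube; the original counts as emptied."""
    return set(tube), set(tube)


def merge(*tubes):
    """Pour any number of tubes together without changing the strands."""
    return set().union(*tubes)


def extract(tube, k, bit):
    """Split on bit k (1-indexed): returns (+(T, x_k = bit), -(T, x_k = bit))."""
    plus = {s for s in tube if s[k - 1] == bit}
    return plus, tube - plus


def detect(tube):
    """Answer 'yes' if the tube holds at least one strand, else 'no'."""
    return "yes" if tube else "no"


def discard(tube):
    """Throw the contents away and hand back a new, empty tube."""
    return set()


def read(tube):
    """Describe exactly one strand of the tube, or None if it is empty."""
    return min(tube) if tube else None
```

For example, two rounds of append-tail, amplify, and merge build all four 2-bit strands, after which extract(t, 1, "1") separates {"11", "10"} from {"01", "00"}.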
2.2 Introduction to other related works
Adleman and his co-authors [6] performed experiments to solve a 20-variable, 24-clause three-conjunctive normal form (3-CNF) formula. Zhang and Winfree [29] presented an allosteric DNA molecule that, in its active configuration, catalyzes a noncovalent DNA reaction. Yin and his co-authors [28] programmed diverse molecular self-assembly and disassembly pathways using a ‘reaction graph’ abstraction to specify complementarity relationships between modular domains in a versatile DNA hairpin motif. Cook and his co-authors [17] showed how several common digital circuits (including de-multiplexers, random access memory, and Walsh transforms) could be built in a bottom-up manner using biologically inspired self-assembly. Bishop and his co-authors [8] considered the task of programming active self-assembling and self-organizing systems at the level of interactions among particles in the system. Chen and his co-authors [16] proposed dimension augmented proof-reading, a technique that uses the third dimension to do error correction in two-dimensional self-assembling systems. Suzuki and Murata [26] proposed a model of a DNA spike oscillator. Goodman and his co-authors [20] reported a family of DNA tetrahedra, less than 10 nanometers on a side, that can self-assemble in seconds with near-quantitative yield of one diastereomer. O'Neill and his co-authors [24] studied nanotubes that have five nicks, one in the core of a tile and one at each corner, and reported the successful ligation of all four corner nicks by T4 DNA ligase. Yashin and his co-authors [27] demonstrated cascades of particles with up to three layers and a nonlinear network with an AND gate hub. Brijder and his co-authors [7] showed that membrane systems are computationally universal. Majumder and his co-authors [23] described how these self-assembly processes can be modeled as rapidly mixing Markov chains, characterized chemical equilibrium in the context of self-assembly processes, and presented a formulation for the equilibrium concentration of various assemblies.
Chapter 3
Molecular solution of Not-All-Equal (NAE) 3-SAT problem
3.1 Definition of Not-All-Equal (NAE) 3-SAT problem
Satisfiability, the first problem shown to be NP-complete, is to determine whether the variables of a given Boolean formula can be assigned values in such a way that the formula evaluates to true. If no such assignment exists, we say that the formula is unsatisfiable; otherwise it is satisfiable. The satisfiability problem, also called the Boolean satisfiability problem, is a decision problem whose instance is a Boolean expression written using only AND, OR, NOT, variables, and parentheses. More formally: given a set U of variables and a collection C of clauses over U, is there a satisfying truth assignment for C?
The problem remains NP-complete even if all expressions are written in conjunctive normal form with 3 literals per clause (3-CNF), which yields the 3-SAT problem. 3-satisfiability is the special case of k-satisfiability (k-SAT) in which each clause contains exactly k = 3 literals. For example, $E = (x_1 \lor \neg x_2 \lor \neg x_3) \land (x_1 \lor \neg x_3 \lor x_4)$. Each clause has exactly 3 literals, which is why the problem is called 3-SAT.
The Not-All-Equal (NAE) 3-SAT problem [19] is defined as follows.
Definition 3-1:
Instance: A set V of logical variables and a collection C of clauses over V such that each clause has 3 literals.
Question: Is there a truth assignment for V such that each clause has at least one true and at least one false literal?
For example, let $V = (x_1, x_2, x_3, x_4)$ and $C = (x_1 \lor \neg x_2 \lor \neg x_3) \land (x_1 \lor \neg x_3 \lor x_4)$. Suppose $x_1$ is the leftmost bit and $x_4$ is the rightmost bit. The satisfying truth assignments are {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}. Three of them, {1000, 1001, 1101}, make all literals of at least one clause equal to 1 and are deleted; no satisfying truth assignment makes all literals of a clause equal to 0. The final answer is T = {0000, 0001, 0100, 0101, 0011, 1110, 1111, 1010, 1011, 1100}.
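The example can be checked by brute force on a conventional computer. The Python sketch below is an illustration, not part of the DNA algorithm: the encoding of a literal as a (variable index, positive?) pair and all function names are assumptions made here, and the polarities of the six literals are taken from the tube contents traced in Section 3.3.

```python
from itertools import product

# A literal is (variable index, positive?); (2, False) stands for "not x2".
# Assumed clause polarities: (x1 or not x2 or not x3) and (x1 or not x3 or x4).
CLAUSES = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]


def literal_values(bits, clause):
    """Value (0 or 1) of each literal of `clause` under the assignment `bits`."""
    return [bits[var - 1] if positive else 1 - bits[var - 1]
            for var, positive in clause]


def assignments(n, clauses, accept):
    """Bit strings (x1 leftmost) for which every clause passes `accept`."""
    return {"".join(map(str, bits))
            for bits in product((0, 1), repeat=n)
            if all(accept(literal_values(bits, c)) for c in clauses)}


def satisfying(n, clauses):
    # Ordinary 3-SAT: every clause has at least one true literal.
    return assignments(n, clauses, lambda vals: any(vals))


def not_all_equal(n, clauses):
    # NAE 3-SAT: every clause has a true literal and a false literal.
    return assignments(n, clauses, lambda vals: 0 < sum(vals) < len(vals))


if __name__ == "__main__":
    sat = satisfying(4, CLAUSES)
    nae = not_all_equal(4, CLAUSES)
    print(len(sat), len(nae), sorted(sat - nae))
```

Under these assumptions the sketch finds 13 satisfying assignments and 10 Not-All-Equal assignments, and the difference is {1000, 1001, 1101}, in agreement with the sets listed above.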
3.2 Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT
problem
Let $x_1, x_2, x_3, x_4$ be 4 logical variables and let $f(x_1, x_2, x_3, x_4) = C = (x_1 \lor \neg x_2 \lor \neg x_3) \land (x_1 \lor \neg x_3 \lor x_4)$. Define the binary digit $z_k^1$ to denote that the k-th bit (counted from the leftmost side) is 1, and $z_k^0$ to denote that the k-th bit (counted from the leftmost side) is 0. $|C|$ is the number of clauses, and $|C_a|$ is the number of literals in the a-th clause $C_a$. We also define $v_b^a$ to be the logical variable of the b-th literal in the a-th clause. Suppose that $x_1$ is the leftmost bit and $x_4$ is the rightmost bit. Our algorithm contains two blocks of code. The first block generates the satisfying truth assignments of the 3-SAT problem [4]. The second block deletes the truth assignments that make all literals of some clause 1. No assignment that survives the first block makes all literals of a clause 0, because such an assignment would leave that clause unsatisfied. The answer left in tube T is the set of assignments under which each clause has at least one true and at least one false literal.
Algorithm 3-1: Solving the Not-All-Equal (NAE) 3-SAT problem for n logical variables and a collection C of clauses over the n variables.
(1) * first block (Not-All-Equal-0) (generate truth assignments) *
(2) Append-tail($T_1$, $z_1^1$).
(3) Append-tail($T_2$, $z_1^0$).
(4) $T = \cup(T_1, T_2)$.
(5) For k = 2 to n
(6)     Amplify(T, $T_1$, $T_2$).
(7)     Append-tail($T_1$, $z_k^1$).
(8)     Append-tail($T_2$, $z_k^0$).
(9)     $T = \cup(T_1, T_2)$.
(10) EndFor
(11) For a = 1 to $|C|$ do begin
(12)     For b = 1 to $|C_a|$ do begin
(13)         If $v_b^a = x_j$ then begin
(14)             $T_b = +(T, v_b^a = 1)$
(15)             $T = -(T, v_b^a = 1)$
(16)         end
(17)         else begin
(18)             $T_b = +(T, v_b^a = 0)$
(19)             $T = -(T, v_b^a = 0)$
(20)         end
(21)     EndFor
(22)     Discard(T)
(23)     $T = \bigcup_{b=1}^{|C_a|} T_b$
(24) EndFor
(25) * second block (Not-All-Equal-1) *
(26) For a = 1 to $|C|$ do begin
(27)     For b = 1 to $|C_a|$ do begin
(28)         If $v_b^a = x_j$ then begin
(29)             $T_b = +(T, v_b^a = 0)$
(30)             $T = -(T, v_b^a = 0)$
(31)         end
(32)         else begin
(33)             $T_b = +(T, v_b^a = 1)$
(34)             $T = -(T, v_b^a = 1)$
(35)         end
(36)     EndFor
(37)     Discard(T)
(38)     $T = \bigcup_{b=1}^{|C_a|} T_b$
(39) EndFor
(40) EndAlgorithm
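Algorithm 3-1 can be simulated in software by treating each tube as a set of bit strings. The following Python sketch is an illustration under assumptions made here: the function name, the encoding of a literal as a (variable index, positive?) pair, and the use of Python sets in place of test tubes are not part of the algorithm itself. The step numbers in the comments refer to the listing above.

```python
def nae_3sat_tubes(n, clauses):
    """Simulate Algorithm 3-1; strands are bit strings, tubes are sets."""
    # Steps (2)-(4): start with the two one-bit strands.
    tube = {"1", "0"}
    # Steps (5)-(10): amplify, append a bit to each copy, and merge.
    for _ in range(2, n + 1):
        t1, t2 = set(tube), set(tube)
        tube = {s + "1" for s in t1} | {s + "0" for s in t2}
    # Steps (11)-(24) keep strands with a true literal in every clause;
    # steps (26)-(39) keep strands with a false literal in every clause.
    for keep_true in (True, False):
        for clause in clauses:
            kept = []
            for var, positive in clause:
                # Bit value that makes this literal true (first block)
                # or false (second block).
                bit = "1" if positive == keep_true else "0"
                plus = {s for s in tube if s[var - 1] == bit}  # T_b = +(T, ...)
                tube = tube - plus                             # T = -(T, ...)
                kept.append(plus)
            # Discard(T), then merge all T_b back into T.
            tube = set().union(*kept)
    return tube


if __name__ == "__main__":
    # Assumed polarities: (x1 or not x2 or not x3) and (x1 or not x3 or x4).
    example = [
        [(1, True), (2, False), (3, False)],
        [(1, True), (3, False), (4, True)],
    ]
    print(sorted(nae_3sat_tubes(4, example)))
```

The sketch visits the strands one after another, whereas each extract operation in the test tube acts on all strands at once; only the sequence of tube contents is meant to match.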
Lemma 3-1: Algorithm 3-1 solves the Not-All-Equal 3-SAT problem for n logical variables and a collection C of clauses over the n variables.
Proof:
Algorithm 3-1 consists of two blocks of code and is implemented by means of the extract, amplify, append-tail, discard, and merge operations. In the first block, Steps (2) through (10) generate the solution space of $2^n$ states of n bits. After these operations are performed, tube T contains the $2^n$ combinations of n bits. The rest of the first block keeps the truth assignments under which no clause is all 0’s. Step (11) and Step (12) are, respectively, the outer loop and the inner loop. Steps (13) to (20) say that if the literal is an un-negated variable ($v_b^a = x_j$), then we extract the strands with $v_b^a = 1$ into $T_b$ and leave those with $v_b^a = 0$ in T; otherwise we extract the strands with $v_b^a = 0$ into $T_b$ and leave those with $v_b^a = 1$ in T, where a and b are the indexes of the outer and inner loops, respectively. After the inner loop ends, we discard test tube T, merge all test tubes $T_b$ into T, and repeat the outer loop for the next clause. After all iterations of the outer loop end, all satisfying truth assignments are in test tube T.
The second block deletes the truth assignments under which some clause is all 1’s. Step (26) and Step (27) are, respectively, the outer loop and the inner loop. Steps (28) to (35) say that if $v_b^a = x_j$, then we extract the strands with $v_b^a = 0$ into $T_b$ and leave those with $v_b^a = 1$ in T; otherwise we extract the strands with $v_b^a = 1$ into $T_b$ and leave those with $v_b^a = 0$ in T, where a and b are the indexes of the outer and inner loops, respectively. After the inner loop ends, we discard test tube T, merge all test tubes $T_b$ into T, and repeat the outer loop for the next clause. After all iterations of the outer loop end, test tube T contains all truth assignments under which each clause has at least one true and at least one false literal.
3.3 The Power of the DNA Algorithm to Solve Not-All-Equal (NAE) 3-SAT
problem
The example for V = ( x1, x2, x3, x4 ) and f ( x1, x2, x3, x4 ) = C = ( x1  x 2  x3 )

( x1  x3  x 4 ) in subsection 3.1 is applied to show the power of Algorithm 3-1.
The first execution of Step (13) through Step (20) when a = 1 and b = 1, we put the
subset whose leftmost encoding bit is 1 on T1, and put the subset whose leftmost
encoding bit is 0 on T, so we get T1= {1000, 1001, 1010, 1011, 1100, 1101, 1110,
1111 } and T = { 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111 }. Next, the second
execution of Step (13) through Step (20) when a = 1 and b = 2, we get T2 = { 0000,
0001, 0010, 0011 } and T = { 0100, 0101, 0110, 0111}. Then, the third execution of
Step (13) through Step (20) when a = 1 and b = 3, we obtain T3 = {0100, 0101} and T
= { 0100, 0111}. Because the first outer loop is ended, the first execution of Step (22)
is applied to discard test tube T and the first execution of Step (23) is applied to
18
merge test tube T1, T2, T3 into T, we get T = {1000, 1001, 1010, 1011, 1100, 1101,
1110, 1111, 0000, 0001, 0010, 0011, 0100, 0101}.
For the second outer loop of The forth execution of Step (13) through Step (20)
when a = 2 and b = 1, we put the subset whose leftmost encoding bit is 1 on T1, and
put the subset whose leftmost encoding bit is 0 on T, so we get T1= {1000, 1001,
1010, 1011, 1100, 1101, 1110, 1111 } and T = { 0000, 0001, 0010, 0011, 0100,
0101, }. Next, the fifth execution of Step (13) through Step (20) when a = 2 and b = 2,
we get T2 = { 0000, 0001, 0100, 0101 } and T = { 0010, 0011 }. Then, the sixth
execution of Step (13) through Step (20) when a = 2 and b = 3, we obtain T3 = {0011}
and T = { 0010 }. Because the second outer loop is ended, the second execution of
Step (22) is applied to discard test tube T and the second execution of Step (23) is
applied to merge test tube T1, T2, T3 into T, we get T = {1000, 1001, 1010, 1011, 1100,
1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011 } are the truth assignments.
The truth assignments T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000,
0001, 0100, 0101, 0011 }. We keep on tracing second block of codes which delete the
truth assignment make any one of clauses all 1’s. The first execution of Step (28)
through Step (35) when a = 1 and b = 1, we put the assignment whose leftmost
encoding bit is 0 on T1, and put the assignment whose leftmost encoding bit is 1 on T,
19
so we get T1= {0000, 0001, 0100, 0101, 0011} and T = {1000, 1001, 1010, 1011,
1100, 1101, 1110, 1111}. Next, the second execution of step (28) through Step (35)
when a = 1 and b = 2, we get T2 = {1100, 1101, 1110, 1111} and T = {1000, 1001,
1010, 1011}. Then, the third execution of step (28) through Step (35) when a = 1 and
b = 3, we obtain T3 = {1010, 1011} and T = {1000, 1001}. Because the first outer
loop is ended, the first execution of Step (37) is applied to discard test tube T and the
first execution of Step (38) is applied to merge test tube T1, T2, T3 into T, we get T =
{0000, 0001, 0100, 0101, 0011, 1100, 1101, 1110, 1111, 1010, 1011}.
For the second outer loop when a = 2 and b = 1, from the fourth execution of step
(28) through Step (35), we get T1 = {0000, 0001, 0100, 0101, 0011} and T = {1100,
1101, 1110, 1111, 1010, 1011}. The fifth execution of step (28) through Step (35)
when a = 2 and b = 2, we get T2 = {1110, 1111, 1010, 1011} and T = {1100,1101}.
The sixth execution of step (28) through Step (35) when a = 2 and b = 3, we get T3 =
{1100} and T = {1101}. The second outer loop is ended also. The second execution
of Step (37) is applied to discard T and the second execution of Step (38) is used to
merge T1, T2, T3 into T. This implies the answer T = {0000, 0001, 0100, 0101, 0011,
1110, 1111, 1010, 1011, 1100}. We discard 3 truth assignments ({1000, 1001, 1101})
which would make one of the clauses all 1’s. No truth assignment would make any
20
one of the clauses all 0’s. So, T is the final answer.
In the course of this research, I also found that the first block of code is, by
definition, equivalent to deleting the assignments that make any one of the clauses
all 0’s. So our DNA algorithm has two blocks of code: the first block generates the
truth assignments (equivalently, it deletes the assignments that make any one of the
clauses all 0’s), and the second block deletes the truth assignments that
make any one of the clauses all 1’s.
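As a cross-check on the trace above, the following Python sketch enumerates all 16 assignments by brute force and keeps those in which no clause is all 0’s or all 1’s. It is only an illustration, not part of the DNA algorithm. It assumes that the example formula is (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4), which is inferred from the tube contents listed in the trace; the names CLAUSES, literal_values and nae_assignments are hypothetical.

```python
from itertools import product

# Assumed example formula, inferred from the trace: each literal is
# (variable index counted from the left, True if the literal is un-negated).
CLAUSES = [
    [(0, True), (1, False), (2, False)],   # x1 or not-x2 or not-x3
    [(0, True), (2, False), (3, True)],    # x1 or not-x3 or x4
]


def literal_values(bits, clause):
    """Return the 0/1 value of each literal of `clause` under assignment `bits`."""
    return [bits[i] if positive else 1 - bits[i] for i, positive in clause]


def nae_assignments(clauses, n):
    """Brute-force the assignments in which no clause is all 0's or all 1's."""
    result = set()
    for bits in product((0, 1), repeat=n):
        if all(0 < sum(literal_values(bits, c)) < len(c) for c in clauses):
            result.add("".join(map(str, bits)))
    return result


NAE_RESULT = nae_assignments(CLAUSES, 4)
```

Under the stated assumption, NAE_RESULT contains the same ten bit strings as the final tube T of the trace.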
3.4 The Complexity of Algorithm 3-1
The following theorems describe the time complexity and volume complexity of
Algorithm 3-1, the number of test tubes used, and the length of the longest library
strand in the solution space of Algorithm 3-1.
Theorem 3-1: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be
solved with O(12p) “extract” operations, O(2p) “discard” operations, O(2n)
“append” operations, O(n + 2p) “merge” operations, and O(n − 1)
“amplify” operations in the Adleman-Lipton model.
Proof:
Algorithm 3-1 can be applied to solve the Not-All-Equal (NAE) 3-SAT problem
for C and V. The first block of code in Algorithm 3-1 uses
2*3*p = 6p “extract” operations, p “discard” operations, 2n “append”
operations, (n + p) “merge” operations, and (n − 1) “amplify” operations. The
second block of code in Algorithm 3-1 uses 2*3*p = 6p “extract” operations,
p “discard” operations, and p “merge” operations. Therefore, from the analysis
above, the time complexity of Algorithm 3-1 is O(12p) “extract”
operations, O(2p) “discard” operations, O(2n) “append” operations, O(n + 2p)
“merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton
model.
Theorem 3-2: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be
solved with O(2^n) library strands in the Adleman-Lipton model.
Proof:
Refer to Lemma 3-1 and Theorem 3-1.
Theorem 3-3: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be
solved with O(n) tubes in the Adleman-Lipton model.
Proof:
Refer to Lemma 3-1 and Theorem 3-1.
Theorem 3-4: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be
solved with a longest library strand of length O(15*n + 15*p) in the Adleman-Lipton
model.
Proof:
Refer to Lemma 3-1 and Theorem 3-1.
Chapter 4
Molecular solution of One-In-Three (1IN3) 3-SAT problem
4.1 Definition of One-In-Three (1IN3) 3-SAT problem
One-In-Three (1in3) 3-SAT problem [19] is defined as follows.
Definition 4-1:
Instance: A set V of logical variables and a collection C of clauses over V such that
each clause has 3 literals.
Question: Is there a truth assignment for V such that each clause in C has exactly one
true literal?
Analysing the definition above, we find that if each clause in C has three
literals and exactly one true literal (the other two literals are then false), the
truth values of its three literals must form one of the three patterns 100, 010, 001.
For example, let
V = (x1, x2, x3, x4) and C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Suppose x1 is the
leftmost bit and x4 is the rightmost bit. The truth assignments satisfying C are {1000,
1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}. Only
3 of these assignments ({1110, 0100, 0011}) give each clause in C exactly
one true literal. We get the final answer T = {1110, 0100, 0011}.
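The example can be checked on a conventional computer by brute force. The following Python sketch is only an illustration under the clause encoding stated in its comments (the formula as reconstructed above from the tube contents of Section 4.3); the names CLAUSES, true_literal_counts, SATISFYING and ONE_IN_THREE are hypothetical and do not appear in the algorithm.

```python
from itertools import product

# Example formula: each literal is (variable index counted from the left,
# True if the literal is un-negated).
CLAUSES = [
    [(0, True), (1, False), (2, False)],   # x1 or not-x2 or not-x3
    [(0, True), (2, False), (3, True)],    # x1 or not-x3 or x4
]


def true_literal_counts(bits, clauses):
    """Number of true literals in each clause under the assignment `bits`."""
    return [sum(bits[i] if pos else 1 - bits[i] for i, pos in clause)
            for clause in clauses]


SATISFYING = set()      # at least one true literal in every clause
ONE_IN_THREE = set()    # exactly one true literal in every clause
for bits in product((0, 1), repeat=4):
    counts = true_literal_counts(bits, CLAUSES)
    name = "".join(map(str, bits))
    if all(count >= 1 for count in counts):
        SATISFYING.add(name)
    if all(count == 1 for count in counts):
        ONE_IN_THREE.add(name)
```

SATISFYING holds the 13 truth assignments listed above and ONE_IN_THREE holds the three assignments of the final answer.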
4.2 Generate DNA-based algorithm to solve One-In-Three (1IN3) 3-SAT
problem
Given four logical variables x1, x2, x3, x4, let
f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4).
Define z_k^1 to be the k-th bit (counted from the leftmost side) with value 1, and
z_k^0 to be the k-th bit (counted from the leftmost side) with value 0. |C| is the
number of clauses, and |C_a| is the number of literals in the a-th clause C_a. We
also define v_b^a as the logical variable of the b-th literal in the a-th clause;
the test v_b^a = x_j in the algorithm holds when that literal is the un-negated
variable x_j, and the else branch handles a negated literal. Suppose that x1 is the
leftmost bit and x4 is the rightmost bit. Our algorithm contains two blocks of code.
The first block produces the solution space and generates the truth assignments of
the 3-SAT problem [4]. The second block is further separated into three parts. The
first part collects the truth assignments that make the clause “100”, the second
part collects those that make the clause “010”, and the third part collects those
that make the clause “001”. The algorithm then goes to the second round (a = 2,
that is, the second clause) and does the same thing again. After all clauses are
done, the answer left in T is the set of assignments under which each clause has
exactly one true literal.
We propose the following DNA-based algorithm to solve the One-In-Three (1IN3)
3-SAT problem.
Algorithm 4-1: Solving the One-In-Three (1IN3) 3-SAT problem for n logical variables
and a collection C of clauses over them.
(1)   * first block (generate truth assignments) *
(2)   Append-tail(T1, z_1^1).
(3)   Append-tail(T2, z_1^0).
(4)   T = ∪(T1, T2).
(5)   For k = 2 to n
(6)       Amplify(T, T1, T2).
(7)       Append-tail(T1, z_k^1).
(8)       Append-tail(T2, z_k^0).
(9)       T = ∪(T1, T2).
(10)  EndFor
(11)  For a = 1 to |C| do begin
(12)      For b = 1 to |C_a| do begin
(13)          If v_b^a = x_j then begin
(14)              T_b = +(T, v_b^a = 1)
(15)              T = −(T, v_b^a = 1)
(16)          end
(17)          else begin
(18)              T_b = +(T, v_b^a = 0)
(19)              T = −(T, v_b^a = 0)
(20)          end
(21)      EndFor
(22)      Discard(T)
(23)      T = ∪_{b=1}^{|C_a|} (T_b)
(24)  EndFor
(25)  * second block *
(26)  * first part: truth assignments which make *
(27)  * the clause “100” are in T1 *
(28)  For a = 1 to |C| do begin
(29)      If v_1^a = x_j then begin
(30)          T1 = +(T, v_1^a = 1)
(31)          T2 = −(T, v_1^a = 1)
(32)      end
(33)      else begin
(34)          T1 = +(T, v_1^a = 0)
(35)          T2 = −(T, v_1^a = 0)
(36)      end
(37)      If v_2^a = x_j then begin
(38)          T3 = −(T1, v_2^a = 0)
(39)          T1 = +(T1, v_2^a = 0)
(40)      end
(41)      else begin
(42)          T3 = −(T1, v_2^a = 1)
(43)          T1 = +(T1, v_2^a = 1)
(44)      end
(45)      If v_3^a = x_j then begin
(46)          T4 = −(T1, v_3^a = 0)
(47)          T1 = +(T1, v_3^a = 0)
(48)      end
(49)      else begin
(50)          T4 = −(T1, v_3^a = 1)
(51)          T1 = +(T1, v_3^a = 1)
(52)      end
(53)      T = ∪(T2, T3, T4)
(54)      * second part: truth assignments which make *
(55)      * the clause “010” are in T5 *
(56)      If v_1^a = x_j then begin
(57)          T5 = +(T, v_1^a = 0)
(58)          T6 = −(T, v_1^a = 0)
(59)      end
(60)      else begin
(61)          T5 = +(T, v_1^a = 1)
(62)          T6 = −(T, v_1^a = 1)
(63)      end
(64)      If v_2^a = x_j then begin
(65)          T7 = −(T5, v_2^a = 1)
(66)          T5 = +(T5, v_2^a = 1)
(67)      end
(68)      else begin
(69)          T7 = −(T5, v_2^a = 0)
(70)          T5 = +(T5, v_2^a = 0)
(71)      end
(72)      If v_3^a = x_j then begin
(73)          T8 = −(T5, v_3^a = 0)
(74)          T5 = +(T5, v_3^a = 0)
(75)      end
(76)      else begin
(77)          T8 = −(T5, v_3^a = 1)
(78)          T5 = +(T5, v_3^a = 1)
(79)      end
(80)      T = ∪(T6, T7, T8)
(81)      * third part: truth assignments which make *
(82)      * the clause “001” are in T9 *
(83)      If v_1^a = x_j then begin
(84)          T9 = +(T, v_1^a = 0)
(85)          T10 = −(T, v_1^a = 0)
(86)      end
(87)      else begin
(88)          T9 = +(T, v_1^a = 1)
(89)          T10 = −(T, v_1^a = 1)
(90)      end
(91)      If v_2^a = x_j then begin
(92)          T11 = −(T9, v_2^a = 0)
(93)          T9 = +(T9, v_2^a = 0)
(94)      end
(95)      else begin
(96)          T11 = −(T9, v_2^a = 1)
(97)          T9 = +(T9, v_2^a = 1)
(98)      end
(99)      If v_3^a = x_j then begin
(100)         T12 = −(T9, v_3^a = 1)
(101)         T9 = +(T9, v_3^a = 1)
(102)     end
(103)     else begin
(104)         T12 = −(T9, v_3^a = 0)
(105)         T9 = +(T9, v_3^a = 0)
(106)     end
(107)     Discard(T)
(108)     T = ∪(T1, T5, T9)
(109) EndFor
(110) EndAlgorithm
The answer is in test tube T.
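The control flow of Algorithm 4-1 can be imitated in software by modelling each test tube as a Python set of bit strings. The sketch below is a simplified simulation under that assumption, not a description of the laboratory procedure: the helper names extract, keep_pattern and solve_one_in_three are hypothetical, the three extractions of each part are folded into one helper, and each literal is written as a pair (index from the left, True if un-negated).

```python
from itertools import product


def extract(tube, index, value):
    """Model of extract: split `tube` into the strands whose bit at `index`
    equals `value` (the '+' tube) and all other strands (the '-' tube)."""
    plus = {s for s in tube if s[index] == str(value)}
    return plus, tube - plus


def keep_pattern(tube, clause, pattern):
    """Keep the strands on which the three literals of `clause` take the
    values in `pattern`; return (kept strands, rejected strands)."""
    kept, rejected = set(tube), set()
    for (index, positive), want in zip(clause, pattern):
        bit = want if positive else 1 - want   # variable value giving that literal value
        kept, minus = extract(kept, index, bit)
        rejected |= minus
    return kept, rejected


def solve_one_in_three(clauses, n):
    # Steps (2)-(10): generate all 2^n strands.
    tube = {"".join(bits) for bits in product("01", repeat=n)}
    # Steps (11)-(24): keep the strands that satisfy every clause.
    for clause in clauses:
        satisfied = set()
        for index, positive in clause:
            plus, tube = extract(tube, index, 1 if positive else 0)
            satisfied |= plus
        tube = satisfied                 # Discard(T), then merge the T_b tubes
    # Steps (28)-(109): per clause, collect the patterns 100, 010 and 001.
    for clause in clauses:
        survivors = set()
        for pattern in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            kept, tube = keep_pattern(tube, clause, pattern)
            survivors |= kept
        tube = survivors                 # Discard(T), then merge T1, T5, T9
    return tube


EXAMPLE_CLAUSES = [
    [(0, True), (1, False), (2, False)],   # x1 or not-x2 or not-x3
    [(0, True), (2, False), (3, True)],    # x1 or not-x3 or x4
]
ANSWER = solve_one_in_three(EXAMPLE_CLAUSES, 4)
```

For the example formula of Section 4.1, ANSWER is {1110, 0100, 0011}.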
Lemma 4-1: Algorithm 4-1 can be used to solve the One-In-Three 3-SAT problem
for n logical variables and a collection C of clauses over them.
Proof:
Algorithm 4-1 consists of two blocks of code and is implemented by means of
the extract, amplify, append-tail, discard, and merge operations. In the first block
(steps (2) to (24)), steps (2) through (10) generate the solution space of 2^n states
of n bits, and steps (11) through (24) generate all truth assignments. The proof that
these steps generate the solution space and the truth assignments follows
Lemma 3-1 for Algorithm 3-1.
The second block can be further divided into three parts. The first part runs
from step (28) to step (53), the second part from step (56) to step (80), and the
third part from step (83) to step (106). Each part contains three if-like
instructions, which are extract operations, giving nine if-like instructions in
total. The three if-like instructions of the first part (steps (28) to (52)) extract
the assignments under which the current clause has the form 100, and step (53)
merges the others, which are not of the form 100, into test tube T. The three
if-like instructions of the second part (steps (56) to (79)) then extract the
assignments under which the clause has the form 010, and step (80) merges the
others, which are not of the form 010, into test tube T. The three if-like
instructions of the third part (steps (83) to (106)) extract the assignments under
which the clause has the form 001, and the assignments not of the form 001 are
left over. Step (107) discards T and step (108) puts the answers of the first round
into test tube T. The algorithm then goes back to step (28) and extracts the forms
100, 010, and 001 for the next clause (steps (28) to (108)). These steps are
repeated until all clauses have been checked, and the answers are left in test tube T.
4.3 The Power of the DNA Algorithm to Solve One-In-Three (1IN3) 3-SAT
problem
The example with V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧
(x1 ∨ ¬x3 ∨ x4) from subsection 4.1 is used to show the power of Algorithm 4-1.
In the first execution of Steps (13) through (20), with a = 1 and b = 1, we put the
subset whose leftmost encoding bit is 1 in T1 and the subset whose leftmost
encoding bit is 0 in T, so we get T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110,
1111} and T = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111}. Next, in the second
execution of Steps (13) through (20), with a = 1 and b = 2, we get T2 = {0000,
0001, 0010, 0011} and T = {0100, 0101, 0110, 0111}. Then, in the third execution of
Steps (13) through (20), with a = 1 and b = 3, we obtain T3 = {0100, 0101} and T
= {0110, 0111}. Because the first pass of the outer loop has ended, the first
execution of Step (22) discards test tube T and the first execution of Step (23)
merges test tubes T1, T2, and T3 into T, so we get T = {1000, 1001, 1010, 1011, 1100, 1101,
1110, 1111, 0000, 0001, 0010, 0011, 0100, 0101}.
For the second pass of the outer loop, in the fourth execution of Steps (13) through
(20), with a = 2 and b = 1, we put the subset whose leftmost encoding bit is 1 in T1
and the subset whose leftmost encoding bit is 0 in T, so we get T1 = {1000, 1001,
1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100,
0101}. Next, in the fifth execution of Steps (13) through (20), with a = 2 and b = 2,
we get T2 = {0000, 0001, 0100, 0101} and T = {0010, 0011}. Then, in the sixth
execution of Steps (13) through (20), with a = 2 and b = 3, we obtain T3 = {0011}
and T = {0010}. Because the second pass of the outer loop has ended, the second
execution of Step (22) discards test tube T and the second execution of Step (23)
merges test tubes T1, T2, and T3 into T, so we get T = {1000, 1001, 1010, 1011, 1100,
1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}, which are the truth assignments.
Starting from the truth assignments T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111,
0000, 0001, 0100, 0101, 0011}, we continue by tracing the second block of code, which
finds the truth assignments under which each clause has exactly one true literal. In
the first execution of Steps (29) through (36), with a = 1, we put the assignments
whose leftmost encoding bit is 1 in T1 and the assignments whose leftmost
encoding bit is 0 in T2, so we get T1 = {1110, 1111, 1010, 1011, 1100, 1000, 1001,
1101} and T2 = {0000, 0001, 0100, 0101, 0011}. Next, after the execution of Steps (37)
through (44), we get T3 = {1010, 1011, 1000, 1001} and T1 = {1110, 1111,
1100, 1101}. Then, after the execution of Steps (45) through (52), we obtain T4 =
{1100, 1101} and T1 = {1110, 1111}. The first execution of Step (53) merges
test tubes T2, T3, and T4 into T, so we get T = {0000, 0001, 0100, 0101, 0011, 1010,
1011, 1100, 1000, 1001, 1101}.
Then, after the execution of step (56) through step (63), we get T5 = {0000, 0001,
0100, 0101, 0011 } and T6 = { 1010, 1011, 1100, 1000, 1001, 1101 }. After the
execution of step (64) through step (71), we get T7 = { 0100, 0101 } and T5 = { 0000,
0001, 0011 }. After the execution of step (72) through step (79), we get T8 = {0000,
0001} and T5 = {0011}. At step (80), we merge T6, T7, T8 into T and get T = {1010,
1011, 1100, 0100, 0101, 0000, 0001, 1000, 1001, 1101 }. Then, after the execution of
step (83) through step (90), we get T9 = { 0100, 0101, 0000, 0001 } and T10 = {1010,
1011, 1100, 1000, 1001, 1101 }. After the execution of step (91) through step (98),
we get T11 = { 0000, 0001 } and T9 = { 0100, 0101 }. After the execution of step (99)
through step (106), we get T12 = ∅ and T9 = {0100, 0101}. At step (107), we discard
T. At step (108), we merge T1, T5, T9 into T and get T = { 1110, 1111, 0011, 0100,
0101 } which is the answer when a = 1.
For the second loop when a = 2, after the execution of step (29) through step
(36), we get T1 = {1110, 1111 } and T2 = { 0011, 0100, 0101 }. After the execution of
step (37) through step (44), we get T3 = ∅ and T1 = {1110, 1111}. After the
execution of step (45) through step (52), we get T4 = {1111 } and T1 = { 1110 }. The
second execution of Step (53) is applied to merge test tube T2, T3, T4 into T, we get T
= {0011, 0100, 0101, 1111 }.
Then, after the execution of step (56) through step (63), we get T5 = {0011, 0100,
0101 } and T6 = { 1111 }. After the execution of step (64) through step (71), we get
T7 = {0011 } and T5 = { 0100, 0101 }. After the execution of step (72) through step
(79), we get T8 = {0101 } and T5 = {0100 }. The second execution of Step (80) is
applied to merge test tube T6, T7, T8 into T, we get T = { 1111, 0011, 0101 }.
Then, after the execution of step (83) through step (90), we get T9 = { 0011, 0101 }
and T10 = {1111}. After the execution of step (91) through step (98), we get T11 =
{0101} and T9 = {0011}. After the execution of step (99) through step (106), we get
T12 = ∅ and T9 = {0011}. We then discard T and merge T1, T5, and T9 into T, which
gives the final answer T = {1110, 0100, 0011}.
4.4 The Complexity of Algorithm 4-1
The following theorems describe the time complexity and volume complexity of
Algorithm 4-1, the number of test tubes used, and the length of the longest library
strand in the solution space of Algorithm 4-1.
Theorem 4-1: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be
solved with O(24p) “extract” operations, O(2p) “discard” operations, O(2n)
“append” operations, O(n + 4p) “merge” operations, and O(n − 1)
“amplify” operations in the Adleman-Lipton model.
Proof:
Algorithm 4-1 can be applied to solve the One-In-Three (1IN3) 3-SAT problem for C
and V. The first block of code in Algorithm 4-1 uses 2*3*p
= 6p “extract” operations, p “discard” operations, 2n “append” operations,
(n + p) “merge” operations, and (n − 1) “amplify” operations. The second
block of code in Algorithm 4-1 uses 2*9*p = 18p “extract” operations, p
“discard” operations, and 3p “merge” operations. Therefore, from the analysis above,
the time complexity of Algorithm 4-1 is O(24p) “extract”
operations, O(2p) “discard” operations, O(2n) “append” operations, O(n + 4p)
“merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton
model.
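The operation counts in the proof above are simple sums over the two blocks of Algorithm 4-1. As a small arithmetic illustration, the following Python sketch tabulates them for given n and p; the function name algorithm_4_1_operation_counts is hypothetical, and the per-block figures are taken directly from the proof rather than measured.

```python
def algorithm_4_1_operation_counts(n, p):
    """Operation counts of Algorithm 4-1 for n variables and p clauses,
    using the per-block figures stated in the proof of Theorem 4-1."""
    first_block = {"extract": 6 * p, "discard": p, "append": 2 * n,
                   "merge": n + p, "amplify": n - 1}
    second_block = {"extract": 18 * p, "discard": p, "append": 0,
                    "merge": 3 * p, "amplify": 0}
    return {op: first_block[op] + second_block[op] for op in first_block}


# Example of Section 4.1: n = 4 variables, p = 2 clauses.
COUNTS = algorithm_4_1_operation_counts(4, 2)
```

For the example of Section 4.1 (n = 4, p = 2) this gives 24p = 48 extract operations and n + 4p = 12 merge operations.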
Theorem 4-2: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be
solved with O(2^n) library strands in the Adleman-Lipton model.
Proof:
Refer to Lemma 4-1 and Theorem 4-1.
Theorem 4-3: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be
solved with O(n) tubes in the Adleman-Lipton model.
Proof:
Refer to Lemma 4-1 and Theorem 4-1.
Theorem 4-4: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a
collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be
solved with a longest library strand of length O(15*n + 15*p) in the Adleman-Lipton
model.
Proof:
Refer to Lemma 4-1 and Theorem 4-1.
Chapter 5
A DNA-based Algorithm for Solving the Hitting-set Problem
5.1 Definition of the Hitting-set Problem
Informally, we are given a collection C of subsets of a finite set S and a
positive integer K, and we want to find a subset S′ ⊆ S with |S′| ≤ K such that S′
contains at least one element from each subset in C. In other words, S′ is a
small subset that hits (intersects) every subset in C, and the number of elements in
S′ cannot be larger than K. The formal definition is as follows.
Definition 5-1: Assume that a ground set S with n elements, a collection C of
subsets {C1, C2, …, Ci, …, Cp}, where each Ci is a subset of S, and a positive
integer K ≤ |S| are given. The hitting-set problem is to decide whether there is
some subset S′ of S such that |S′| ≤ K and (Ci ∩ S′) ≠ ∅ for i = 1, 2, 3, …, p.
For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. By Definition 5-1, with
K = 2 the hitting sets for S and C are {1, 4}, {2, 4}, and {3, 4}.
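Definition 5-1 can be checked directly on small instances by enumerating candidate subsets. The following Python sketch is a brute-force illustration only (it takes time exponential in |S| and is unrelated to the DNA operations); the function name hitting_sets is hypothetical.

```python
from itertools import combinations


def hitting_sets(ground, collection, k):
    """All subsets of `ground` with at most `k` elements that intersect
    every set in `collection` (brute force over all candidates)."""
    found = []
    for size in range(k + 1):
        for candidate in combinations(sorted(ground), size):
            if all(set(candidate) & subset for subset in collection):
                found.append(set(candidate))
    return found


# Example of this section: S = {1, 2, 3, 4}, C = {{1, 2, 3}, {4}}, K = 2.
EXAMPLE = hitting_sets({1, 2, 3, 4}, [{1, 2, 3}, {4}], 2)
```

For the example above, EXAMPLE contains exactly {1, 4}, {2, 4}, and {3, 4}.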
5.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem
Suppose that an n-bit binary number corresponds to each possible hitting set, where
n is the number of elements of the ground set S. The encoding scheme is that, if the
i-th element appears in the subset, the corresponding i-th bit of the encoded number
is 1; otherwise it is 0. Assume that an n-bit binary number Q is represented by the
bits z_1, …, z_n, where the value of z_k is either 1 or 0 for 1 ≤ k ≤ n. Bit z_k is
the k-th bit of Q, counted from the rightmost bit as in Table 5-1, and it
represents the k-th element in S. All possible subsets S′ of the ground set S = {1, 2, 3, 4}
are shown in Table 5-1.
Table 5-1: All possible subsets S′ of the ground set S = {1, 2, 3, 4}.
Subset       Encoded sequence    Subset       Encoded sequence
∅            0000                {1}          0001
{2}          0010                {3}          0100
{4}          1000                {1,2}        0011
{1,3}        0101                {1,4}        1001
{2,3}        0110                {2,4}        1010
{3,4}        1100                {1,2,3}      0111
{1,2,4}      1011                {1,3,4}      1101
{2,3,4}      1110                {1,2,3,4}    1111
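The encoding of Table 5-1 places element i at the i-th bit counted from the right. A minimal Python sketch of this mapping is given below for illustration; the function names encode and decode are hypothetical.

```python
def encode(subset, n):
    """Encode a subset of {1, ..., n} as an n-bit string in which element i
    sets the i-th bit counted from the right, as in Table 5-1."""
    return "".join("1" if i in subset else "0" for i in range(n, 0, -1))


def decode(bits):
    """Inverse of encode: recover the subset from its bit string."""
    n = len(bits)
    return {n - position for position, bit in enumerate(bits) if bit == "1"}


ROW = encode({1, 4}, 4)
```

With n = 4, the subset {1, 4} is encoded as 1001 and {3} as 0100, matching the table.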
5.3 Introduction of Finding the Maximum and Minimum Numbers of Ones in
Bio-molecular Computing
Consider the four combinations of two bits, 00, 01, 10, and 11 (0, 1, 2, and 3 in
decimal). One interesting question is how these four combinations are classified by
the number of ones they contain. Because the numbers of ones in 11, 10, 01, and 00
are two, one, one, and zero, respectively, 11 and 00 fall into two different
classes, while 10 and 01 fall into the same class, each having exactly one 1.
Similarly, we can extend this question to how the 2^n combinations of n
bits are classified by the number of ones they contain, that is, which
combinations of an n-bit sequence have k ones, for 0 ≤ k ≤ n.
Assume that a binary number of n bits, z_n, z_{n−1}, …, z_2, z_1, is used to form the
2^n combinations, where the value of each bit z_k is either one or zero for
1 ≤ k ≤ n. For convenience, z_k^1 denotes that the value of z_k is one and z_k^0
denotes that the value of z_k is zero. The following algorithm is proposed to find
the maximum and minimum numbers of ones among the 2^n combinations.
Algorithm 5-1: ParallelFind(T0)
(1)  Append-head(T1, z_1^1);
(2)  Append-head(T2, z_1^0);
(3)  T0 = ∪(T1, T2);
(4)  For k = 2 to n
(5)      Amplify(T0, T1, T2);
(6)      Append-head(T1, z_k^1);
(7)      Append-head(T2, z_k^0);
(8)      T0 = ∪(T1, T2);
(9)  EndFor
(10) For k = 0 to n − 1
(11)     For j = k downto 0
(12)         T_{j+1}^{ON} = +(T_j, z_{k+1}^1) and T_j = −(T_j, z_{k+1}^1);
(13)         T_{j+1} = ∪(T_{j+1}, T_{j+1}^{ON});
(14)     EndFor
(15) EndFor
(16) EndAlgorithm
Lemma 5-1: The algorithm ParallelFind(T0) can be used to find the maximum and
minimum numbers of ones among the 2^n combinations of any n-bit binary sequence.
Proof:
The algorithm ParallelFind(T0) is implemented by means of the extract, amplify,
append-head, and merge operations. The solution space of 2^n states of the n bits is
generated by steps (1) through (9). After those operations are
performed, the 2^n combinations of n bits are contained in tube T0. Tube T_j is
kept distinct from tube T_j^{ON}, and at the end test tube T_n holds the combination
with n ones.
Steps (10) and (11) of Algorithm 5-1 are, respectively, the outer and inner loops of
the nested loop. Because the loop index k varies from 0 to n − 1, steps
(10) and (11) account for the influence of z_{k+1} on the number
of ones in tubes T_0 through T_{j+1}, where j runs from k down to 0. Each
execution of step (12) applies the extract operation to tube T_j to form two
tubes, T_{j+1}^{ON} and T_j. Tube T_{j+1}^{ON} contains the combinations with
z_{k+1} = 1 and tube T_j keeps the combinations with z_{k+1} = 0, so the
combinations in tube T_j have j ones and the combinations in T_{j+1}^{ON} have
(j + 1) ones. Next, each execution of step (13) applies the merge
operation to pour tube T_{j+1}^{ON} into tube T_{j+1}. Steps (12) and (13) are
repeated until the influence of z_n on the number of ones in tubes T_0 through T_n
has been processed. This implies that the combinations in tube T_k, for 0 ≤ k ≤ n,
have k ones.
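The sorting loop of steps (10) through (15) can be imitated in software by treating each tube as a Python set of bit strings. The sketch below is a simulation under that assumption; the function name parallel_find is hypothetical, and bit z_{k+1} is taken to be the (k+1)-th character counted from the right, which is the position that Append-head gives it.

```python
from itertools import product


def parallel_find(strands, n):
    """Sort n-bit strands into tubes[0..n] by their number of ones,
    mirroring steps (10)-(15) of Algorithm 5-1."""
    tubes = [set() for _ in range(n + 1)]
    tubes[0] = set(strands)
    for k in range(n):                    # step (10)
        for j in range(k, -1, -1):        # step (11), j = k downto 0
            # Step (12): the '+' tube holds strands with z_{k+1} = 1,
            # the '-' tube (what stays in T_j) holds the others.
            on = {s for s in tubes[j] if s[n - 1 - k] == "1"}
            tubes[j] -= on
            tubes[j + 1] |= on            # step (13): merge into T_{j+1}
    return tubes


ALL_STRANDS = {"".join(bits) for bits in product("01", repeat=3)}
TUBES = parallel_find(ALL_STRANDS, 3)
```

Because j runs downward, a strand moved into T_{j+1} during one pass of the outer loop is not examined again in that pass, so each strand is promoted at most once per bit.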
5.4 Generate DNA-based algorithm to solve the Hitting-set problem
Following the definitions presented in the previous subsections, let the literal
z_i^1 denote that the i-th element of the finite set S appears in the subset S′
(the bit is set to 1), and let z_i^0 denote that it does not appear in the subset S′
(the bit is set to 0). The initial tube T contains many strands, each of which
encodes a single n-bit sequence. All 2^n possible choices of subsets are encoded in
tube T. The following DNA-based algorithm is proposed to solve the hitting-set problem.
Algorithm 5-2:
(1)  Append-head(T1, z_1^1);
(2)  Append-head(T2, z_1^0);
(3)  T = ∪(T1, T2);
(4)  For k = 2 to n
(5)      Amplify(T, T1, T2);
(6)      Append-head(T1, z_k^1);
(7)      Append-head(T2, z_k^0);
(8)      T = ∪(T1, T2);
(9)  EndFor
(10) For a = 1 to |C| do begin
(11)     For b = 1 to |C_a| do begin
(12)         If (the b-th element in the a-th subset in C is the i-th element in S)
(13)         then begin
(14)             T_b = +(T, z_i^1);
(15)             T = −(T, z_i^1);
(16)         end
(17)     EndFor
(18)     Discard(T);
(19)     T = ∪_{b=1}^{|C_a|} (T_b);
(20) EndFor
(21) T0 = ∪(T0, T);
(22) For k = 0 to n − 1
(23)     For j = k downto 0
(24)         T_{j+1}^{ON} = +(T_j, z_{k+1}^1) and T_j = −(T_j, z_{k+1}^1);
(25)         T_{j+1} = ∪(T_{j+1}, T_{j+1}^{ON});
(26)     EndFor
(27) EndFor
(28) For k = 1 to n
(29)     If (Detect(T_k) == “yes”)
(30)     then begin
(31)         Read(T_k) and terminate the algorithm;
(32)     end
(33) EndFor
(34) EndAlgorithm
Lemma 5-2: Algorithm 5-2 can be used to solve the hitting-set problem for an
n-element set S and a collection C of subsets.
Proof:
The solution space of 2^n states of the n-bit pattern is generated by
steps (1) through (9). After those operations are performed, the 2^n
combinations of the n-bit sequence are contained in tube T. Step (10) is the outer
loop, which runs once for each subset in C, and step (11) is the inner loop, which
runs once for each element of the current subset. Each time the outer loop (step
(10)) is executed, the number of iterations of the inner loop equals the
number of elements of the a-th subset in C. Steps (14) and (15) extract the
subsets whose z_i is 1 into test tube T_b and leave the subsets whose z_i is 0
in test tube T. When the inner loop ends, we discard T and merge all T_b
into T. The outer loop is repeated in the same way. When the outer loop has
finished, the hitting sets are in test tube T.
As in Algorithm 5-1, steps (21)-(27) determine the
number of ones of the hitting sets in T0. Next, step (28) is the last loop and is used
to find the final answer. If the k-th execution of step (29) returns a “yes”, then step (31)
reads the answer and Algorithm 5-2 terminates. Otherwise, steps (28) through (33)
are repeated until the answer is found.
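The three phases described in the proof (generate, filter by subset, sort by number of ones) can be imitated in software with tubes modelled as Python sets of bit strings. The following sketch is a simulation under that assumption, not the laboratory procedure; the function name solve_hitting_set is hypothetical, and element i is read at the i-th character counted from the right.

```python
from itertools import product


def solve_hitting_set(n, collection):
    """Simulate Algorithm 5-2 on sets of bit strings. Returns (k, tube) for
    the smallest k >= 1 whose tube T_k is non-empty, or (None, set())."""
    tube = {"".join(bits) for bits in product("01", repeat=n)}   # steps (1)-(9)
    for subset in collection:                                    # steps (10)-(20)
        hits = set()
        for element in sorted(subset):
            plus = {s for s in tube if s[n - element] == "1"}    # step (14)
            tube -= plus                                         # step (15)
            hits |= plus
        tube = hits                                              # steps (18)-(19)
    tubes = [set() for _ in range(n + 1)]
    tubes[0] = tube                                              # step (21)
    for k in range(n):                                           # steps (22)-(27)
        for j in range(k, -1, -1):
            on = {s for s in tubes[j] if s[n - 1 - k] == "1"}
            tubes[j] -= on
            tubes[j + 1] |= on
    for k in range(1, n + 1):                                    # steps (28)-(33)
        if tubes[k]:                                             # Detect(T_k)
            return k, tubes[k]                                   # Read(T_k)
    return None, set()


SIMPLE = solve_hitting_set(4, [{1, 2, 3}, {4}])
```

For S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}, SIMPLE is (2, {1001, 1010, 1100}), that is, the hitting sets {1, 4}, {2, 4}, and {3, 4}.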
5.5 Simple Example of the Hitting-set Problem
The example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 5.1 is
used here again to show the power of Algorithm 5-2. During the first execution of
steps (14) and (15), where a = 1 and b = 1, we put the subsets whose rightmost
encoded bit is 1 in T1 and leave those whose bit is 0 in T. Therefore, we get T1 = {0001, 0011, 0101,
1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100,
1110}. Next, the second execution of steps (14) and (15), where a = 1 and b = 2, we
have now T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. Then,
the third execution of step (14) and step (15), when a = 1 and b = 3, we obtain T3 =
{0100, 1100} and T = {0000, 1000}. Since the first outer loop is ended, the first
execution of step (18) is applied to discard test tube T and the first execution of step
(19) is applied to merge test tubes T1, T2, T3 into T, so we obtain T = {0001, 0011, 0101,
1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.
For the second outer loop under a = 2 and b = 1, from the fourth execution of
steps (14) and (15), we get T1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T =
{0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one
element, the second outer loop is ended as well. The second execution of step (18) is
applied to discard T and the second execution of step (19) is used to merge T1 into T.
This implies T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.
After we get the hitting sets, we go on to find the minimum number of ones in
T, i.e. the smallest k for which a hitting set exists, which gives the small hitting-set. From the
execution of step (21), we obtain T0 = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.
Next, from the first execution of steps (24)-(25) when k = 0 and j = 0, we have T1ON =
{1001, 1011, 1101, 1111}, T0 = {1010, 1110, 1100}, and T1 = {1001, 1011, 1101,
1111}. From the second execution of steps (24) and (25) under k = 1 and j = 1, we get
T2ON = {1011, 1111}, T1 = {1001, 1101}, and T2 = {1011, 1111}. Then, from the third
execution of steps (24)-(25), with k = 1 and j = 0, we obtain T1ON = {1010, 1110}, T0
= {1100}, and T1 = {1001, 1101, 1010, 1110}. From the fourth execution of steps (24)
and (25) when k = 2 and j = 2, we get T3ON = {1111}, T2 = {1011}, and T3 = {1111}.
Next, from the fifth execution of steps (24) and (25) where k = 2 and j = 1, we
obtain T2ON = {1101, 1110}, T1 = {1001, 1010}, and T2 = {1011, 1101, 1110}. From
the sixth execution of these two steps, with k = 2 and j = 0, we get T1ON = {1100}, T0
= ∅, and T1 = {1001, 1010, 1100}. Thereafter, from the seventh execution of step (24)
and step (25) when k = 3 and j = 3, we have T4ON = {1111}, T3 = ∅, and T4 = {1111},
while from the eighth execution, when k = 3 and j = 2, we obtain T3ON = {1011, 1101,
1110}, T2 = ∅, and T3 = {1011, 1101, 1110}. Then, from the ninth execution of step
(24) and step (25) when k = 3 and j = 1, we get T2ON = {1001, 1010, 1100}, T1 = −(T1,
z_4^1) = ∅, and T2 = {1001, 1010, 1100}. Finally, from the tenth execution of steps (24)
and (25) under k = 3 and j = 0, we obtain T1ON = ∅, T0 = ∅, and T1 = ∅. After those
operations are finished, we obtain T0 = ∅, T1 = ∅, T2 = {1001, 1010, 1100}, T3 =
{1011, 1101, 1110}, and T4 = {1111}.
Since T1 is an empty tube and T2 is not empty, a “yes” is returned from the second
execution of step (29). As a result, the answer from the first execution of step (31)
is {1001, 1010, 1100}, which is the small hitting-set with k = 2.
5.6 Complex Example of the Hitting-set Problem
Consider a more complex example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {2, 3,
4}}. The hitting sets are {{2}, {3}, {1, 2}, {1, 3}, {1, 4}, {2, 3},
{2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}, and the small
hitting-sets are {{2}, {3}}, with k = 1.
The first execution of steps (14) and (15) of Algorithm 5-2, when a = 1 and b = 1,
we put the subset whose rightmost encoded bit is 1 into T1, as well as 0 into T, so we
can get T1= {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010,
0100, 1000, 0110, 1010, 1100, 1110}. Next, the second execution of these two steps
under a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100,
1000, 1100}. Then, the third execution when a = 1 and b = 3, we obtain T3 = {0100,
1100} and T = {0000, 1000}. Because the first outer loop is ended, the first execution
of step (18) is applied to discard test tube T and the first execution of step (19) is
applied to merge test tubes T1, T2, T3 into T, we get T = {0001, 0011, 0101, 0111,
1001, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.
For the second outer loop when a = 2 and b = 1, from the fourth execution of
steps (14) and (15), we put the subset whose second rightmost encoding bit is 1 into
T1, as well as 0 into T, so we get T1 = {0011, 0111, 1011, 1111, 0010, 0110, 1010,
1110} and T = {0001, 0101, 1001, 1101, 0100, 1100}. Then, the fifth execution of
these two steps of Algorithm 5-2 when a = 2 and b = 2, we have T2 = {0101, 1101,
0100, 1100} and T = {0001, 1001}, while after the sixth execution of step (14) and
step (15) when a = 2 and b = 3, T3 = {1001} and T = {0001}. The second execution
of step (18) is applied to discard T and the second execution of step (19) is used to
merge T1, T2, and T3 into T. This implies that T = {0011, 0111, 1011, 1111, 0010,
0110, 1010, 1110, 0101, 1101, 0100, 1100, 1001} is the hitting-set.
After locating the hitting sets, we go on to find the minimum number of ones
in T, i.e. the smallest k for which a hitting set exists, which gives the small hitting-set. In
Algorithm 5-2, from the execution of step (21), we obtain T0 = {0011, 0111, 1011,
1111, 0010, 0110, 1010, 1110, 0101, 1101, 0100, 1100, 1001}. Then, from the first
execution of steps (24) and (25) when k = 0 and j = 0, we get T1ON = {0011, 0111,
1011, 1111, 0101, 1101, 1001}, T0 = {0010, 0110, 1010, 1110, 0100, 1100}, and T1 =
{0011, 0111, 1011, 1111, 0101, 1101, 1001}, while from the second execution of step
(24) and step (25) when k = 1 and j = 1, we get T2ON = {0011, 0111, 1011, 1111}, T1 =
{0101, 1101, 1001}, and T2 = {0011, 0111, 1011, 1111}. From the third execution of
steps (24)-(25) under k = 1 and j = 0, we obtain T1ON = {0010, 0110, 1010, 1110}, T0 =
{0100, 1100}, and T1 = {0101, 1101, 1001, 0010, 0110, 1010, 1110}. From the fourth
execution of steps (24) and (25) when k = 2 and j = 2, we get T3ON = {0111, 1111}, T2
= {0011, 1011}, and T3 = {0111, 1111}. Then, followed by the fifth execution of steps
(24) and (25) when k = 2 and j = 1, we get T2ON = {0101, 1101, 0110, 1110}, T1 =
{1001, 0010, 1010}, and T2 = {0011, 1011, 0101, 1101, 0110, 1110}, while from the sixth
execution, when k = 2 and j = 0, we have T1ON = {0100, 1100}, T0 = ∅, and T1 = {1001, 0010, 1010, 0100,
1100}. Next, by the seventh execution of steps (24)-(25), when k = 3 and j = 3, we get
T4ON = {1111}, T3 = {0111}, T4 = {1111}, and, from the eighth execution of step (24)
and step (25) when k = 3 and j = 2, we get T3ON = {1011, 1101, 1110}, T2 = {0011,
0101, 0110}, T3 = {0111, 1011, 1101, 1110}. From the ninth execution of step (24)
and step (25) when k = 3 and j = 1, we get T2ON = {1001, 1010, 1100}, T1 = {0010,
0100}, T2 = {0011, 0101, 0110, 1001, 1010, 1100}. Next, from the tenth execution of
step (24) and step (25) when k = 3 and j = 0, T1ON = ∅, T0 = ∅, T1 = {0010, 0100}.
After those operations are finished, T0 = ∅, T1 = {0010, 0100}, T2 = {0011, 0101,
0110, 1001, 1010, 1100}, T3 = {0111, 1011, 1101, 1110}, and T4 = {1111}.
Since T1 is not an empty tube, a “yes” is returned from the first execution of
step (29). Therefore, the answer from the first execution of step (31) is {0010, 0100}
which is the small hitting-set with k =1.
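The trace above can be replayed in software. The following is a minimal sketch, not the dissertation's implementation: tubes are modelled as Python sets of bit strings, the loop structure (k from 0 to n-1, j from k down to 0, testing bit k+1 counted from the right) is reconstructed from the ten executions listed above, and the names sort_by_ones, tubes, and executions are hypothetical.

```python
def sort_by_ones(t0, n):
    """Distribute bit strings into tubes[0..n] by their number of ones.

    Assumption: strings are written z_n ... z_1, so bit z_(k+1) is the
    (k+1)-th character counted from the right.
    """
    tubes = [set() for _ in range(n + 1)]
    tubes[0] = set(t0)
    executions = 0
    for k in range(n):                    # round k examines bit z_(k+1)
        for j in range(k, -1, -1):        # j = k down to 0
            # "extract": strands of tube j whose bit z_(k+1) is 1
            on = {s for s in tubes[j] if s[-(k + 1)] == "1"}
            tubes[j] -= on                # the remainder stays in tube j
            tubes[j + 1] |= on            # "merge" into tube j + 1
            executions += 1
    return tubes, executions


T0 = {"0011", "0111", "1011", "1111", "0010", "0110", "1010",
      "1110", "0101", "1101", "0100", "1100", "1001"}
tubes, executions = sort_by_ones(T0, 4)

# Assumption: detection starts at tube 1, as in the text above.
smallest_k = next(k for k in range(1, 5) if tubes[k])
print(executions, smallest_k, sorted(tubes[smallest_k]))
```

Under these assumptions the sketch performs ten passes and leaves {0010, 0100} in tube 1, in agreement with the trace.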
5.7 The Complexity Analysis of Algorithm 5-2
The following theorems describe the time complexity and volume complexity of Algorithm 5-2, as well as the number of test tubes used and the length of the longest library strand in the solution space.
Theorem 5-1: Let S be a ground set with n elements, let C = {C1, C2, C3, …, Cp} be a collection of subsets of S, and let K ≤ |S| be a positive integer. Define q = |C1| + |C2| + … + |Cp|. Then the hitting-set problem for S and C can be solved with O(2q + n^2 + n) “extract” operations, O(p) “discard” operations, O(2n) “append” operations, O(1 + p + (n^2 + 3n)/2) “merge” operations, and O(n - 1) “amplify” operations in the Adleman-Lipton model.
Proof:
Algorithm 5-2 can be applied to solve the hitting-set problem for S and C. Steps (1)-(20) use 2q “extract” operations, p “discard” operations, 2n “append” operations, n + p “merge” operations, and n - 1 “amplify” operations. Steps (21)-(34) of Algorithm 5-2 use 2 * (1 + 2 + … + (n - 1) + n) = 2 * ((1 + n) * n)/2 = n^2 + n “extract” operations, 1 + (1 + 2 + … + (n - 1) + n) = 1 + ((1 + n) * n)/2 = 1 + (n^2 + n)/2 “merge” operations, and no “discard”, “append”, or “amplify” operations. Summing the two parts, the time complexity of Algorithm 5-2 is O(2q + n^2 + n) “extract” operations, O(p) “discard” operations, O(2n) “append” operations, O(1 + p + (n^2 + 3n)/2) “merge” operations, and O(n - 1) “amplify” operations in the Adleman-Lipton model.
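The two sums in the proof can be written out explicitly. The block below only restates the counts given above; the subscripts "sort" and "total" are labels introduced here for steps (21)-(34) and for the whole algorithm, respectively.

```latex
\begin{align*}
\text{extract}_{\text{sort}}  &= 2\sum_{k=0}^{n-1}(k+1) = 2\cdot\frac{n(n+1)}{2} = n^{2}+n, \\
\text{extract}_{\text{total}} &= 2q + n^{2} + n, \\
\text{merge}_{\text{sort}}    &= 1 + \sum_{k=0}^{n-1}(k+1) = 1 + \frac{n^{2}+n}{2}, \\
\text{merge}_{\text{total}}   &= (n+p) + 1 + \frac{n^{2}+n}{2} = 1 + p + \frac{n^{2}+3n}{2}.
\end{align*}
```

For n = 4 the sorting phase gives n^2 + n = 20 extractions, which matches ten executions of steps (24)-(25) if each execution contributes two extractions.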
Theorem 5-2: Let S be a ground set with n elements, let C = {C1, C2, C3, …, Cp} be a collection of subsets of S, and let K ≤ |S| be a positive integer. The hitting-set problem for S and C can be solved with O(2^n) library strands in the Adleman-Lipton model.
Proof:
(Refer to Lemma 5-2 and Theorem 5-1.)
Theorem 5-3: Let S be a ground set with n elements, let C = {C1, C2, C3, …, Cp} be a collection of subsets of S, and let K ≤ |S| be a positive integer. The hitting-set problem for S and C can be solved with O(n) tubes in the Adleman-Lipton model.
Proof:
(Refer to Lemma 5-2 and Theorem 5-1.)
Theorem 5-4: Let S be a ground set with n elements, let C = {C1, C2, C3, …, Cp} be a collection of subsets of S, and let K ≤ |S| be a positive integer. The hitting-set problem for S and C can be solved with a longest library strand of length O(15*n + 15*p) in the Adleman-Lipton model.
Proof:
(Refer to Lemma 5-2 and Theorem 5-1.)
Chapter 6
Simulated Experimental Results
Adleman and his coworkers devised a scheme for designing DNA sequences for a combinatorial library that encodes strings of zeros and ones [2,5]. In this scheme, a particular N-bit number is represented by a DNA sequence that is N * k bases long and is divided logically into N sequential blocks of k bases each. Each block bears one of two unique DNA sequences, one representing ‘1’ and the other representing ‘0’. Importantly, the sequence that encodes ‘0’ in the first block is different from the sequence that encodes ‘0’ in the second block and in all of the other blocks. Thus 2N different short DNA sequences are used to create any of the 2^N possible library strands. DNA sequence design is a very important issue because DNA-based computing relies on biochemical operations, and these operations can introduce errors if the sequences are not properly designed. Adleman and his coworkers introduced seven constraints that ease probe-library hybridization by reducing secondary structure in the DNA molecules [5]. The constraints are:
(1). Library strands contain only As, Ts, and Cs.
(2). Every library and probe sequence has no runs of more than 4 As, 4 Ts, 4 Cs, or 4 Gs.
(3). Every probe sequence has at least 4 mismatches with any 15 base alignment of any library strand (except for at its matching bit-value).
(4). No 15 base section of a library strand has fewer than 4 mismatches with any
15 base alignment of itself or any other library strand.
(5). No 15 base probe has a run of more than 7 matches with any 8 base
alignment of any library strand (except for at its matching bit-value).
(6). No library strand has a run of more than 7 matches with any 8 base alignment
of itself or any other library strand.
(7). Every probe has 4, 5, or 6 Gs in its sequence.
Constraint (1) restricts library strands to As, Ts, and Cs; such strands have less secondary structure than strands containing equal numbers of As, Ts, Cs, and Gs, and therefore have more opportunity to bind probes. Constraint (2) is imposed because long homopolymer tracts may have unusual secondary structure that inhibits the binding of probes to library strands, and because the melting temperatures of probe-library hybridization are more similar when long homopolymer tracts are absent. Constraints (3) and (5) are intended to ensure that probes bind only weakly where they are not intended to bind. Constraints (4) and (6) are intended to ensure that library strands have a low affinity for themselves. Constraint (7) is intended to ensure that intended probe-library pairings have uniform melting temperatures.
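The per-sequence constraints (1), (2), and (7) are simple enough to check mechanically. The following sketch is not Adleman's program: it assumes that the probe for a library block is its Watson-Crick reverse complement, it ignores the pairwise constraints (3)-(6), and the function names are hypothetical. The eight 15-base blocks are the ones listed in Table 6-1.

```python
import re

LIBRARY_BLOCKS = {
    "x1^1": "CATTCACAAACAATT", "x1^0": "TCATTCTCAACAAAA",
    "x2^1": "CTCTATTCCTCTCAA", "x2^0": "ACACCCTCTAATCTA",
    "x3^1": "TCTCCCTATCTATTT", "x3^0": "TCCTATTTAACTCCC",
    "x4^1": "CTCTACTCAAAATAA", "x4^0": "TATAACTTTCTCTCT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_for(block):
    """Assumed probe: reverse complement of the block, written 5' -> 3'."""
    return block.translate(COMPLEMENT)[::-1]


def only_atc(block):
    """Constraint (1): library strands contain only As, Ts, and Cs."""
    return set(block) <= set("ATC")


def max_run(seq):
    """Length of the longest run of one repeated base, for constraint (2)."""
    return max(len(m.group(0)) for m in re.finditer(r"A+|C+|G+|T+", seq))


def g_count_ok(probe):
    """Constraint (7): every probe has 4, 5, or 6 Gs."""
    return 4 <= probe.count("G") <= 6


def check_block(block):
    probe = probe_for(block)
    return (only_atc(block)
            and max_run(block) <= 4 and max_run(probe) <= 4
            and g_count_ok(probe))


results = {name: check_block(seq) for name, seq in LIBRARY_BLOCKS.items()}
print(results)
```

Because a probe G pairs with a library C, constraint (7) is equivalent here to each library block containing 4, 5, or 6 Cs.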
We ran Adleman’s program [5] on an AMD Athlon XP CPU with 1 GB of main memory. The operating system was Windows XP and the compiler was Visual C++ 6.0. The program was used to generate DNA sequences for solving the One-In-Three (1in3) 3-SAT problem, constructing a 15-base DNA sequence for each value of every bit of the library. For each bit, the program generates two random 15-base sequences (for ‘1’ and ‘0’) and checks whether the library strands still satisfy the seven constraints with the new DNA sequences added [5]. If the constraints are satisfied, the new DNA sequences are ‘greedily’ accepted. If the constraints are not satisfied, mutations are introduced one by one into the new block until either (A) the constraints are satisfied and the new DNA sequences are accepted, or (B) a threshold on the number of mutations is exceeded, in which case the program has failed and exits, printing the sequences found so far. If sequences satisfying the constraints are found for all bits, the program has succeeded and outputs these sequences.
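The generate, check, and mutate loop described above can be illustrated with a much simplified sketch. This is not Adleman's program: the acceptance test below covers only single-block versions of constraints (1), (2), and (7) and omits the cross-sequence constraints (3)-(6), and the mutation threshold, the seed, and all function names are assumptions made for illustration.

```python
import random
from itertools import groupby

BASES = "ATC"   # constraint (1): library blocks use only A, T, and C


def max_run(seq):
    """Length of the longest run of one repeated base."""
    return max(len(list(group)) for _, group in groupby(seq))


def acceptable(block):
    """Simplified stand-in for the seven constraints (single block only)."""
    # 4 to 6 Cs in the block means 4 to 6 Gs in its complementary probe.
    return max_run(block) <= 4 and 4 <= block.count("C") <= 6


def design_block(rng, length=15, max_mutations=500):
    """Random block, then point mutations until accepted or threshold hit."""
    block = [rng.choice(BASES) for _ in range(length)]
    mutations = 0
    while not acceptable("".join(block)):
        if mutations >= max_mutations:
            return None                      # case (B): give up
        block[rng.randrange(length)] = rng.choice(BASES)
        mutations += 1
    return "".join(block)                    # case (A): accept


def design_library(n_bits, seed=0):
    """Return (pairs found so far, success flag); one ('1', '0') pair per bit."""
    rng = random.Random(seed)
    library = []
    for _ in range(n_bits):
        one, zero = design_block(rng), design_block(rng)
        if one is None or zero is None:
            return library, False
        library.append((one, zero))
    return library, True


library, ok = design_library(4)
print(ok, library)
```

A faithful version would re-check every new block against all previously accepted blocks and probes, which is what makes the real search expensive.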
6.1 Simulation of Experimental Results of One-In-Three 3-SAT problem
Consider the example V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x3 ∨ x4) from subsection 4.1 for solving the One-In-Three (1in3) 3-SAT problem; the simulation of the Not-All-Equal (NAE) 3-SAT problem is similar. The DNA sequences generated by Adleman’s program are shown in Table 6-1. Adleman’s program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand. These energies are shown in Table 6-2.
Table 6-1: Sequences chosen to represent xk^1 and xk^0 in the example for V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x3 ∨ x4) in subsection 4.1.
Bit     5' → 3' DNA sequence
x1^1    CATTCACAAACAATT
x1^0    TCATTCTCAACAAAA
x2^1    CTCTATTCCTCTCAA
x2^0    ACACCCTCTAATCTA
x3^1    TCTCCCTATCTATTT
x3^0    TCCTATTTAACTCCC
x4^1    CTCTACTCAAAATAA
x4^0    TATAACTTTCTCTCT
Table 6-2: The energy for binding each probe to its corresponding region on a library strand.
Bit     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
x1^1    109.6                 285.7                24.4
x1^0    104.5                 267.5                24.3
x2^1    114.2                 295.4                26
x2^0    102.6                 261.2                24.4
x3^1    103.9                 273.3                22.1
x3^0    103.2                 265.5                24
x4^1    102.4                 270.7                21.5
x4^0    105.1                 271.8                23.9
The program also computed the average and standard deviation of the enthalpy, entropy, and free energy over all probe/library strand interactions. These values are shown in Table 6-3.
Table 6-3: The energy over all probe/library strand interactions.
                     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
Average              105.688               273.887              23.825
Standard deviation   3.86073               10.5438              1.3245
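The summary statistics in Table 6-3 can be recomputed from the per-bit values in Table 6-2. The sketch below assumes that each row of Table 6-2 lists enthalpy, entropy, and free energy in that order, and it uses the population standard deviation (division by the number of values), which appears to be the variant closest to the reported figures; small differences in the last digits are expected because Table 6-2 is rounded to one decimal place.

```python
from statistics import fmean, pstdev

# (enthalpy H, entropy S, free energy G) per bit value, copied from Table 6-2
ENERGIES = {
    "x1^1": (109.6, 285.7, 24.4), "x1^0": (104.5, 267.5, 24.3),
    "x2^1": (114.2, 295.4, 26.0), "x2^0": (102.6, 261.2, 24.4),
    "x3^1": (103.9, 273.3, 22.1), "x3^0": (103.2, 265.5, 24.0),
    "x4^1": (102.4, 270.7, 21.5), "x4^0": (105.1, 271.8, 23.9),
}

columns = list(zip(*ENERGIES.values()))          # one tuple per quantity
summary = [(fmean(col), pstdev(col)) for col in columns]

for name, (average, deviation) in zip(("H", "S", "G"), summary):
    print(f"{name}: average = {average:.3f}, standard deviation = {deviation:.4f}")
```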
The library strands are shown in Table 6-4; they represent every truth assignment under which each clause has exactly one true literal.
Table 6-4: DNA sequences chosen to represent answers in test tube T.
{0011}
5'- TCATTCTCAACAAAA ACACCCTCTAATCTA TCTCCCTATCTATTT CTCTACTCAAAATAA -3'
3'- AGTAAGAGTTGTTTT TGTGGGAGATTAGAT AGAGGGATAGATAAA GAGATGAGTTTTATT -5'
{0100}
5'- TCATTCTCAACAAAA CTCTATTCCTCTCAA TCCTATTTAACTCCC TATAACTTTCTCTCT -3'
3'- AGTAAGAGTTGTTTT GAGATAAGGAGAGTT AGGATAAATTGAGGG ATATTGAAAGAGAGA -5'
{1110}
5'- CATTCACAAACAATT CTCTATTCCTCTCAA TCTCCCTATCTATTT TATAACTTTCTCTCT -3'
3'- GTAAGTGTTTGTTAA GAGATAAGGAGAGTT AGAGGGATAGATAAA ATATTGAAAGAGAGA -5'
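The strands in Table 6-4 follow mechanically from the block sequences in Table 6-1. As an illustration, the sketch below concatenates the blocks for an assignment written in the order x1 x2 x3 x4 and derives the base-wise complementary strand; the dictionary layout and function names are hypothetical.

```python
# 15-base blocks from Table 6-1, indexed by variable number and bit value
BLOCKS = {
    1: {"1": "CATTCACAAACAATT", "0": "TCATTCTCAACAAAA"},
    2: {"1": "CTCTATTCCTCTCAA", "0": "ACACCCTCTAATCTA"},
    3: {"1": "TCTCCCTATCTATTT", "0": "TCCTATTTAACTCCC"},
    4: {"1": "CTCTACTCAAAATAA", "0": "TATAACTTTCTCTCT"},
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def library_strand(assignment):
    """Blocks (5' -> 3') for an assignment string x1 x2 x3 x4, e.g. '0011'."""
    return [BLOCKS[k][bit] for k, bit in enumerate(assignment, start=1)]


def complementary_strand(blocks):
    """Base-wise complement of each block, read 3' -> 5' as in Table 6-4."""
    return [block.translate(COMPLEMENT) for block in blocks]


for assignment in ("0011", "0100", "1110"):
    top = library_strand(assignment)
    print("{" + assignment + "}")
    print("5'-", " ".join(top), "-3'")
    print("3'-", " ".join(complementary_strand(top)), "-5'")
```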
6.2 Simulation of Experimental Results of Hitting-set problem
Consider the example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} mentioned in subsection 5.1. The DNA sequences generated by Adleman’s program are shown in Table 6-5. Adleman’s program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand; these energies are shown in Table 6-6.
Table 6-5: Sequences chosen to represent zk^1 and zk^0 in the example for S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} in subsection 5.1.
Bit     5' → 3' DNA sequence
z1^1    CATTCACAAACAATT
z1^0    TCATTCTCAACAAAA
z2^1    CTCTATTCCTCTCAA
z2^0    ACACCCTCTAATCTA
z3^1    TCTCCCTATCTATTT
z3^0    TCCTATTTAACTCCC
z4^1    CTCTACTCAAAATAA
z4^0    TATAACTTTCTCTCT
Table 6-6: The energies for binding each probe to its corresponding region on a library strand.
Bit     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
z1^1    109.6                 285.7                24.4
z1^0    104.5                 267.5                24.3
z2^1    114.2                 295.4                26
z2^0    102.6                 261.2                24.4
z3^1    103.9                 273.3                22.1
z3^0    103.2                 265.5                24
z4^1    102.4                 270.7                21.5
z4^0    105.1                 271.8                23.9
Our program also computed the average and standard deviation of the enthalpy, entropy, and free energy over all probe/library strand interactions. These values are shown in Table 6-7. Table 6-8 presents the library strands that encode the hitting sets with k = 2: {1, 4}, {2, 4}, and {3, 4}.
Table 6-7: The energies over all probe/library strand interactions.
                     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
Average              105.688               273.887              23.825
Standard deviation   3.86073               10.5438              1.3245
Table 6-8: DNA sequences chosen to represent the hitting-set with k = 2 in tube T0.
{1,4}
5'- CTCTACTCAAAATAA TCCTATTTAACTCCC ACACCCTCTAATCTA CATTCACAAACAATT -3'
3'- GAGATGAGTTTTATT AGGATAAATTGAGGG TGTGGGAGATTAGAT GTAAGTGTTTGTTAA -5'
{2,4}
5'- CTCTACTCAAAATAA TCCTATTTAACTCCC CTCTATTCCTCTCAA TCATTCTCAACAAAA -3'
3'- GAGATGAGTTTTATT AGGATAAATTGAGGG GAGATAAGGAGAGTT AGTAAGAGTTGTTTT -5'
{3,4}
5'- CTCTACTCAAAATAA TCTCCCTATCTATTT ACACCCTCTAATCTA TCATTCTCAACAAAA -3'
3'- GAGATGAGTTTTATT AGAGGGATAGATAAA TGTGGGAGATTAGAT AGTAAGAGTTGTTTT -5'
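The hitting sets encoded in Table 6-8 can be cross-checked on a conventional computer by exhaustive search. The following sketch is a plain brute-force enumeration, not the DNA-based Algorithm 5-2; the function name is hypothetical. It tries subset sizes in increasing order and returns the first size at which some subset intersects every member of C.

```python
from itertools import combinations


def minimum_hitting_sets(ground, collection):
    """Return (k, hitting sets of size k) for the smallest feasible k."""
    for k in range(len(ground) + 1):
        hits = [set(candidate)
                for candidate in combinations(sorted(ground), k)
                if all(set(candidate) & subset for subset in collection)]
        if hits:
            return k, hits
    return None, []          # no hitting set exists (some subset is empty)


k, hitting_sets = minimum_hitting_sets({1, 2, 3, 4}, [{1, 2, 3}, {4}])
print(k, hitting_sets)
```

This search examines up to 2^n subsets one after another, whereas the DNA-based approach holds all 2^n candidate strands in the tube at once.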
Chapter 7
Discussions and Conclusions
Lipton was the first to use DNA to solve the SAT problem, although his paper is still very preliminary [22]. Adleman and his co-authors (Braich et al.) chose to solve a 6-variable, 11-clause instance of the 3-SAT problem [5]. Their paper emphasizes laboratory experiments designed to test biochemical feasibility. Amos then presented a DNA-based algorithm for solving the 3-SAT problem [4]. Building on Amos’ algorithm, I study two variants of 3-SAT in greater depth: the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems. The DNA-based algorithms presented here give a complete and clear mathematical model for these two problems.
The hitting-set problem is an NP-complete problem, and all known algorithms for it on a traditional digital computer take exponential time in the worst case, whereas in DNA-based supercomputing only a polynomial number of biological operations is needed (see Section 5.7 for the complexity analysis). The main contribution of this part of the dissertation is that it is the first work to solve the hitting-set problem with a DNA-based algorithm.
In this dissertation, we propose DNA-based algorithms (Algorithm 3-1, Algorithm 4-1, and Algorithm 5-2) to solve the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems and the hitting-set problem. The presented algorithms are based on biological operations in the Adleman-Lipton model and therefore inherit several advantages from it. First, the proposed algorithms have a lower rate of hybridization errors because we use Adleman’s program to generate good DNA sequences for constructing the solution spaces of the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems and the hitting-set problem. Only simple and fast biological operations in the Adleman-Lipton model are employed to solve these three problems. Second, the basic biological operations in the Adleman-Lipton model have been performed in a fully automated manner in Adleman’s laboratory. Full automation is essential not only for speeding up the computation but also for error-free computation.
Nowadays, researchers are attempting to use DNA-based algorithms to solve many NP-complete problems that cannot be solved efficiently by a traditional digital computer. Even so, it is still very difficult to support biological operations using mathematical instructions. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing will become a reality someday.
References
[1] L. M. Adleman. “Molecular Computation of Solutions to Combinatorial
Problems”. Science, 266, pp. 1021-1024, Nov. 11, 1994.
[2] L. M. Adleman, “On constructing a molecular computer”, in DNA-based computers, volume 27 of DIMACS
[3] L. M. Adleman, “Computing with DNA”, Scientific American, August, 1998
[4] M. Amos, Theoretical and Experimental DNA Computation. Springer, 2005
[5] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, and L.
M. Adleman, “Solution of a satisfiability problem on a gel-based DNA
computer” in Proceedings of the Sixth International Conference on DNA
Computation ( DNA 2000 ), Lecture Notes in Computer Science 2054, pp.
27-42,2001
[6] R. S. Braich, C. Johnson, P.W.K. Rothemund, N. Chelyapov, and L. M. Adleman,
2002. Solution of a 20-variable 3-SAT problem on a DNA computer. Science,
vol. 296, No. 5567, 499–502.
[7] R. Brijder, M. Cavaliere, A. Riscos-Núñez, G. Rozenberg, and D. Sburlan 2008.
Membrane systems with proteins embedded in membranes. Theoretical
Computer Science, 404, 26-39.
[8] J. Bishop and E. Klavins 2007. An improved autonomous DNA nanomotor.
Nanoletters, Sep., Vol. 7, No. 9, 2574-2577.
[9] W. L. Chang and M. Guo, “Solving the clique problem and the vertex cover
problem in Adleman-Lipton’s model”, in Proceedings of IASTED
International Conference, Networks, Parallel and Distributed Processing, and
Applications, pp. 431-436, 2002
[10] W. L. Chang, M. Ho, and M. Guo, "Molecular Solutions for the Subset-sum
Problem on DNA-based Supercomputing", BioSystems (Elsevier Science), Vol. 73,
No. 2, 2004, pp. 117-130.
[11] W. L. Chang, M. Guo, and M. Ho, "Towards solution of the set-splitting problem
on gel-based DNA computing", Future Generation Computer Systems, Volume:
20, Issue: 5, June 15, 2004, pp. 875-885.
[12] W. L. Chang, M. Guo and J. Cao, "Using Sticker to Solve the 3-Dimensional
Matching Problem in Molecular Supercomputers", International Journal of
High Performance Computing and Networking, 2004, Vol. 1, No. 1/2/3, pp. 128-139.
[13] W. L. Chang, M. Guo, and J. Wu, “Solving the Independent-set Problem in a
DNA-based Super Computer Model”, Parallel Processing Letters, Vol. 15, No. 4
(2005) 469-479.
[14] W. L. Chang, M. Ho, M. Guo, C. Liu, “Fast Parallel Bio-molecular Solutions:
the Set-basis Problem”, International Journal of Computational Science and
Engineering, Volume 2, Number 1-2, 2006, pp. 72 – 80.
[15] W. L. Chang, “Fast Parallel DNA-based Algorithms for Molecular Computation:
the Set-Partition Problem”, IEEE Transactions on Nanobioscience, Vol. 6, No. 1,
2007, pp 346 - 353.
[16] H. Chen, A. Goel, and C. Luhrs 2008. Dimension augmentation and
combinatorial criteria for efficient error-resistant DNA self-assembly.
ACM-SIAM Symposium on Discrete Algorithms (SODA) 409-418.
[17] M. Cook, P. W. K. Rothemund and E. Winfree 2004. Self-assembled circuit
patterns. DNA Computers 9, LNCS v. 2943, 91-107.
[18] R. P. Feynman, “In Miniaturization”. D.H. Gilbert, Ed., Reinhold Publishing
Corporation, New York, 1961, pp. 282-296.
[19] M.R. Garey and D.S. Johnson (1979), “Computers and Intractability: A
Guide to the Theory of NP-Completeness”, San Francisco, CA
[20] R. P. Goodman, I. A. T. Schaap, C. F. Tardin, C. M. Erben, R.M. Berry, C. F.
Schmidt and A. J. Turberfield 2005. Rapid chiral assembly of rigid DNA
building blocks for molecular nanofabrication. Science 310, 1661-1665.
[21] S.Y. Hsieh, C.W. Huang and H.H. Chou, “A DNA-based graph encoding scheme
with its applications to graph isomorphism problems “, Applied Mathematics
and Computation, Volume 203, Issue 2, 15 September 2008, Pages 502-512
[22] R. J. Lipton. “DNA Solution of Hard Computational Problems”. Science, 268,
pp. 542-545, 1995.
[23] U. Majumder, J. H. Reif, and S. Sahu 2008. Stochastic analysis of reversible
self-assembly. Journal of Computational and Theoretical Nanoscience, Volume
5, Number 7, 1289-1305.
[24] P. O'Neill, P.W.K. Rothemund, A. Kumar and D. K. Fygenson 2006. Sturdier
DNA Nanotubes via Ligation. Nano Letters, 6:1379-1381.
[25] S. Roweis, E. Winfree, R. Burgoyne, N.V. Chelyapov, M.F. Goodman, P.W.K.
Rothemund, L.M. Adleman, 1999. A sticker based model for dna computation.
In: Landweber, L., Baum, E. (Eds.), Second Annual Workshop on DNA
Computing, Princeton University. DIMACS: Series in Discrete Mathematics and
Theoretical Computer Science. American Mathematical Society, pp. 1–29.
[26] K. Suzuki and S. Murata 2007. Design of DNA spike oscillator. Unconventional
Computing, 163-175.
[27] R. Yashin, R. Rudchenko, and M. N. Stojanovic 2007. Networking particles over
distance using oligonucleotide-based devices. Journal of the American Chemical
Society, 129 (50), 15581 -15584.
[28] P. Yin, H. M. T. Choi, C. R. Calvert and N.A. Pierce 2008. Programming
biomolecular self-assembly pathways. Nature, 2008, 451: 318-322.
[29] D. Y. Zhang and E. Winfree Dynamic allosteric control of noncovalent DNA
catalysis reactions. J. Am. Chem. Soc., 130 (42), 13921–13926.
[30] L. Kari, “From micro-soft to bio-soft: Computing with DNA”, Biocomputing and
emergent computation: Proceedings of BCEC97, World Scientific 1997, Skovde,
Sweden, 1997, pp. 146-164.
[31] J. Watada and R. B. A. Bakar, “DNA Computing and Its Applications”, Eighth
International Conference on Intelligent Systems Design and Applications,
pp.288-294
[32] S. Zhou, Q. Zhang, J. Zhao and J. Li, Optimization of DNA Encodings Based on
Free Energy, ICIC Express Letters, vol.1, no.1, pp.33-37, 2007.
[33] J. Li, Q. Zhang, R. Li and S. Zhou, Optimization of DNA Encoding Based on
Combinatorial Constraints, ICIC Express Letters, vol.2, no.1, pp.81-88, 2008.
[34] Rohani Binti Abu Bakar, Junzo Watada and Witold Pedrycz, A Proximity
Approach to DNA Based Clustering Analysis, International Journal of
Innovative Computing, Information and Control, vol.4, no.5, pp.1203-1212,
2008.
[35] Tadahiro Kin, Ken-ichi Makino, Nobuo Noda, Kazuharu Koide and Masahiro
Nakano, The Molecular Dynamics Calculation of Clathrate Hydrate Structure
Stability for Innovative Organ Preservation Method, International Journal of
Innovative Computing, Information and Control, vol.4, no.2, pp.249-254, 2008.
[36] D. Rooss, “Recent Developments in DNA-Computing”, Proceedings of the
International Symposium on Multiple-Valued Logic, 1997, pp. 3-9
Vita
Nung-Yue Shi (施能裕) received the B.S. degree from the Department of Computer Science, Feng-Chia University, Taichung, Taiwan. He then studied in Brooklyn, New York City, United States, and received the Master of Science degree from Polytechnic University in New York City. Since 1987 he has been an instructor at Southern Taiwan University, Taiwan. Since 2001 he has been working towards the Ph.D. degree, and he is currently a doctoral candidate in the Department of Computer Science and Information Engineering at National Cheng Kung University, Taiwan. His research interests include DNA computing, quantum computing, and network topology and network analysis.
Publications
Journal Papers
1. Nung-Yue Shi and Chih-Ping Chu, “A molecular solution to the hitting-set
problem in DNA-based supercomputing” , Information Sciences, Volume 180,
Issue 6, 15 March 2010, Pages 1010-1019, Special Issue on Modelling
Uncertainty
2. Nung-Yue Shi and Chih-Ping Chu, “A Molecular Algorithmic Solution for the
Not-All-Equal and One-In-Three 3-SAT Problems in DNA-based
Supercomputing”, accepted by International Journal of Innovative Computing,
Information and Control
International Conference Papers
1. Nung-Yue Shi and Chih-Ping Chu, "Fast Parallel Molecular Solution to the
Hitting-set Problem", Eighth International Conference on Intelligent Systems
Design and Applications, Vol. 3, pp. 442-447, ISDA 2008
2. Nung-Yue Shi and Chih-Ping Chu, "A DNA-based Algorithm for the
Solution of One-In-Three 3-SAT Problem", 2009 WASE International
Conference on Information Engineering, Volume I, pp. 620-625
(Received Best Paper Award)
3. Nung-Yue Shi and Chih-Ping Chu, "A DNA-based Algorithm for the
Solution of Not-All-Equal 3-SAT Problem", 2009 WASE International
Conference on Information Engineering, Volume II, pp. 94-99
(Received Best Paper Award)
Honors
1. The paper “A DNA-based Algorithm for the Solution of One-In-Three 3-SAT
Problem” received the Best Paper Award at the 2009 WASE International
Conference on Information Engineering.
2. The paper “A DNA-based Algorithm for the Solution of Not-All-Equal 3-SAT
Problem” received the Best Paper Award at the 2009 WASE International
Conference on Information Engineering.