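National Cheng Kung University
Department of Computer Science and Information Engineering
Doctoral Dissertation

A Study on the Molecular Algorithmic Solutions for the Not-All-Equal and One-In-Three 3-SAT Problems and the Hitting-set Problem in DNA-based Supercomputing

Graduate student: 施能裕
Advisor: 朱治平

Institute of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, R.O.C.
Dissertation for Doctor of Philosophy, June, 2010

Chinese Abstract

Modern digital computers can be divided into two major application domains, hardware and software. Hardware comprises motherboards and families of PC boards built from electronic circuits and digital chips on the mathematical foundation of Boolean algebra. Software originated in the theory of computation (automata theory), which introduced concepts such as state, input, and output, and has developed into today's fields such as window-based programming.

Our pursuit of computing speed, however, never ends. Many applications (for example weather forecasting, simulating the explosion of a nuclear bomb, or delivering ever better audio and video over the network) and hard mathematical problems (for example NP-complete problems) require faster computers. The traditional digital computer built from electronic circuits has nearly reached its speed limit, and a further breakthrough of fifty or one hundred times is unlikely. We must therefore set aside traditional thinking and consider a different, alternative computing model.

Abroad, alternative computing models have been studied for many years, among them functional programming, logic programming, Petri nets, and the DNA computing model proposed by Dr. Adleman in 1994. This dissertation adopts the DNA computing model and proposes algorithms under that model for three NP-complete problems that are traditionally hard to solve. We expect that, as the DNA computing model matures, its high degree of parallelism will rapidly increase computing speed and thereby solve mathematical problems and practical applications that are intractable on traditional digital computers.

For each of the three NP-complete problems named in the title, this dissertation proposes an algorithm under the DNA computing model. In a paper presented at the ISDA international conference in 2008, the Japanese scholar Junzo Watada noted that current research on DNA computing has two mainstreams: one is devoted to developing precise mathematical models, and the other, building on correct mathematical models, concentrates on carrying out DNA bio-operations in nanoscale laboratories. My research belongs to the first school and develops three algorithms under the DNA computing model. Scholars of the second school are welcome to build on this dissertation in the laboratory, in the hope that DNA computing will one day be realized.

Keywords: DNA computing model, satisfiability problem, NP-complete problems, Not-All-Equal 3-SAT problem, One-In-Three 3-SAT problem, Hitting-set problem

Abstract

This dissertation illustrates the current state of the art in DNA computing, with emphasis on new approaches to solving theoretical 3-SAT problems and the hitting-set problem. The field began with Adleman's breakthrough, a molecular algorithm for an NP-complete combinatorial problem, the directed Hamiltonian path problem (HPP).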
Today, many researchers around the world propose new methods for solving engineering and application problems with a DNA computing approach.

The satisfiability problem asks, given a Boolean formula, whether a satisfying truth assignment exists. For example, $(x_{12} \vee x_5) \wedge (x_{24} \vee x_3 \vee x_{13} \vee x_9) \wedge \dots \wedge (x_{12}) \wedge (x_{17} \vee x_8 \vee x_{18})$ is a Boolean formula. In the k-SAT problem, each clause has exactly k literals. The Not-All-Equal (NAE) 3-SAT problem and the One-In-Three (1IN3) 3-SAT problem are both NP-complete. In this dissertation, we present molecular solutions that find all satisfying truth assignments (the 3-SAT problem) and, from these, the Not-All-Equal (NAE) solutions and the One-In-Three (1IN3) solutions in DNA-based supercomputing.

The hitting-set problem assumes a collection C of subsets of a finite set S and a positive integer $K \le |S|$, and asks whether there is a subset $S' \subseteq S$ with $|S'| \le K$ such that $S'$ contains at least one element of each subset in C. In other words, $S'$ intersects every subset in C and is called a hitting-set. In this dissertation, a DNA-based algorithm is proposed to solve the small hitting-set problem. A small hitting-set is a hitting-set with the smallest K value, i.e., the hitting-set with the smallest number of elements. Furthermore, an algorithm is introduced to count the number of ones in each of the $2^n$ combinations; the combinations with the minimum number of ones represent the small hitting-set, since K is expected to be as small as possible.

The complexity of all the presented DNA-based algorithms is also discussed. For each of the three algorithms we describe the time complexity, the volume complexity, the number of test tubes used, and the longest library strand in the solution space. Finally, a simulated experiment is applied to verify the correctness of the proposed DNA-based algorithm for solving the One-In-Three (1IN3) 3-SAT problem; the simulation of the Not-All-Equal (NAE) 3-SAT problem is similar.
Another simulated experiment is applied to our proposed DNA-based Algorithm 6-2 for solving the well-known hitting-set problem. This research is motivated by the benefits and applications of DNA computing and gives new methods for solving two 3-SAT problems and the hitting-set problem, all of which are NP-complete.

Key Words: Satisfiability problem, 3-SAT problem, Not-All-Equal 3-SAT problem, One-In-Three 3-SAT problem, Hitting-set Problem, Molecular Solution, DNA-based Supercomputing, DNA-based Algorithm, NP-Complete Problems.

Acknowledgments

Time flies, and my years of study at the Institute of Computer Science and Information Engineering of National Cheng Kung University have come to a close. Looking back, I cannot count the days I spent shuttling between work and study; playing several roles in society at once kept me constantly on the run as I tried to play each of them well. During this time I gained a great deal, and these years are the most beautiful and fruitful memories of my life.

I thank my advisor, Dr. 朱治平, for his patient teaching and careful guidance, and for providing the funding that allowed me to attend academic conferences abroad and broaden my horizons. Professor 謝孫源 is the most brilliant Chinese person I know, and I learned a great deal in his classes. I express my deep respect to Professor 朱治平 and Professor 謝孫源 for their integrity and selfless guidance. In addition, many professors of the institute taught my courses, among them Professor 李強…; for their enlightenment of my professional knowledge and their concern for my studies I also offer my deep thanks.

I am very grateful to my parents for their long and quiet support. I dedicate this dissertation to you and to my two precious children.

施能裕
March 11, 2010

TABLE OF CONTENTS

Chapter 1 Introduction
1.1 Research Motivation
1.2 Adleman’s Experiment
1.3 DNA computing
Chapter 2 Background and related works
2.1 The Adleman-Lipton model
2.2 Introduction to other related works
Chapter 3 Molecular solution of Not-All-Equal (NAE) 3-SAT problem
3.1 Definition of Not-All-Equal (NAE) 3-SAT problem
3.2 Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT problem
3.3 The Power of the DNA Algorithm to Solve Not-All-Equal (NAE) 3-SAT problem
3.4 The Complexity Analysis of Algorithm 3-1
Chapter 4 Molecular solution of One-In-Three (1IN3) 3-SAT problem
4.1 Definition of One-In-Three (1IN3) 3-SAT problem
4.2 Generate DNA-based algorithm to solve One-In-Three (1IN3) 3-SAT problem
4.3 The Power of the DNA Algorithm to Solve One-In-Three (1IN3) 3-SAT problem
4.4 The Complexity Analysis of Algorithm 5-1
Chapter 5 A DNA-based Algorithm for Solving the Hitting-set Problem
5.1 Definition of the Hitting-set Problem
5.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem
5.3 Introduction of Finding the Maximum and Minimum Numbers of Ones in Bio-molecular Computing
5.4 Generate DNA-based algorithm to solve the Hitting-set problem
5.5 Simple Example of the Hitting-set Problem
5.6 Complex Example of the Hitting-set Problem
5.7 The Complexity Analysis of Algorithm 6-2
Chapter 6 Simulated Experimental Results
6.1 Simulation of Experimental Results of One-In-Three 3-SAT problem
6.2 Simulation of Experimental Results of Hitting-set problem
Chapter 7 Discussions and Conclusions
References

List of Tables

Table 6-1. Each possible subset $S'$ of a ground set S = {1, 2, 3, 4}
Table 7-1. Sequences chosen to represent $x_k^1$ and $x_k^0$ in the example for $V = (x_1, x_2, x_3, x_4)$ and $f(x_1, x_2, x_3, x_4) = C = (x_1 \vee \bar{x}_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_3 \vee x_4)$ in subsection 5.1
Table 7-2. The energy for binding each probe to its corresponding region on a library strand
Table 7-3.
The energy over all probe/library strand interactions
Table 7-4. DNA sequences chosen to represent answers in test tube T
Table 7-5. Sequences chosen to represent $z_k^1$ and $z_k^0$ in the example for S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} in subsection 6.1
Table 7-6. The energies of binding each probe to its corresponding region on a library strand
Table 7-7. The energies over all probe/library strand interactions
Table 7-8. DNA sequences chosen to represent the hitting-set with k = 2 in tube $T_0$

Chapter 1 Introduction

1.1 Research Motivation

This dissertation illustrates the current state of the art in DNA computing, with emphasis on new approaches to solving theoretical 3-SAT problems and the hitting-set problem. The field began with Adleman’s breakthrough, a molecular algorithm for an NP-complete combinatorial problem, the directed Hamiltonian path problem (HPP). Today, many researchers around the world propose new methods for solving engineering and application problems with a DNA computing approach [33, 34, 35].

DNA is the basic inheritance medium of all living cells. The main idea of DNA computing is to encode data as DNA strands and to use bio-operations to manipulate the strands in a test tube, thereby simulating arithmetical and logical operations. It is estimated that about $10^{18}$ DNA strands could operate $10^4$ times faster than today’s advanced supercomputers [30]. As another comparison, modern supercomputers perform $10^{12}$ operations per second, whereas Adleman estimates that $10^{20}$ operations per second is realistic for molecular instructions. Energy consumption and memory capacity compare similarly: a supercomputer needs one joule for $10^9$ operations, while the same energy is sufficient to perform $2 \times 10^{19}$ ligation operations [31, 32].
On a video tape, every bit needs $10^{12}$ cubic nanometers of storage, whereas DNA stores information at a density of about one bit per cubic nanometer [36]. This research is motivated by the benefits and applications of DNA computing and gives new methods for solving two 3-SAT problems that are NP-complete.

1.2 Adleman’s Experiment

Adleman implemented a DNA-based algorithm to solve the directed Hamiltonian path problem, which is NP-complete. A Hamiltonian path is a directed path through a graph that starts and ends at specified vertices and visits every vertex in the graph exactly once. The Hamiltonian path problem is to decide whether such a path exists in a given graph. Suppose there is a graph G with n vertices in which vertices $V_{start}$ and $V_{end}$ are marked. We want to decide whether there is a Hamiltonian path that starts at $V_{start}$ and ends at $V_{end}$. Adleman's DNA-based algorithm solves the directed HPP as follows [1, 31]:

Step 1. Generate random paths through the graph G.
Step 2. Extract only those paths that start with $V_{start}$ and end with $V_{end}$.
Step 3. Because the graph has n vertices, extract the paths of length exactly n-1 (n-1 edges, hence n vertices).
Step 4. Extract the paths that contain every vertex at most once.
Step 5. If any path remains, say ‘yes’; otherwise, say ‘no’.

The above steps are realized by bio-molecular instructions. In Step 1, ligation builds DNA strands that represent random paths in G. In Step 2, PCR is performed with primers representing $V_{start}$ and $V_{end}$. Step 3 uses gel electrophoresis to extract molecules of the proper length. Step 4 checks, for each vertex, that it is present in a path only once. In the final step, gel electrophoresis is used to test whether any molecules are left. If so, they represent Hamiltonian paths. If no molecules are detected on the final gel, no Hamiltonian path exists.
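The five steps above can be mimicked in software. The following is a minimal sketch, not Adleman's laboratory protocol: it assumes the tube effectively contains every walk through the graph, so Step 1 enumerates walks exhaustively instead of forming them at random. The function names `all_walks` and `hamiltonian_paths` and the two example graphs are hypothetical and chosen only for illustration.

```python
def all_walks(vertices, edges, max_vertices):
    """Step 1 stand-in: every walk along directed edges with 1..max_vertices vertices."""
    walks = [(v,) for v in vertices]
    frontier = list(walks)
    for _ in range(max_vertices - 1):
        # Extend each walk in the frontier by one outgoing edge.
        frontier = [w + (t,) for w in frontier for (s, t) in edges if s == w[-1]]
        walks.extend(frontier)
    return walks


def hamiltonian_paths(vertices, edges, v_start, v_end):
    n = len(vertices)
    tube = all_walks(vertices, edges, n + 1)                         # Step 1
    tube = [w for w in tube if w[0] == v_start and w[-1] == v_end]   # Step 2
    tube = [w for w in tube if len(w) == n]                          # Step 3: n-1 edges
    tube = [w for w in tube if len(set(w)) == n]                     # Step 4: no repeats
    return tube                                                      # Step 5: non-empty means 'yes'


VERTICES = [0, 1, 2, 3]
EDGES_YES = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]
EDGES_NO = [(0, 1), (0, 2), (1, 3), (2, 3)]

print("yes" if hamiltonian_paths(VERTICES, EDGES_YES, 0, 3) else "no")
print("yes" if hamiltonian_paths(VERTICES, EDGES_NO, 0, 3) else "no")
```

Each filtering line corresponds to one laboratory step, which makes the main cost visible: the initial tube grows exponentially with the number of vertices, while the number of filtering steps stays small.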
1.3 DNA computing

DNA (deoxyribonucleic acid), the main material of the cell nucleus, determines the inheritance of living organisms such as human beings and is made up of a linear chain of smaller units called nucleotides. Each nucleotide contains three major components: deoxyribose, a phosphate group, and a base. Nucleotides are distinguished by their bases, which are adenine (abbreviated as A), guanine (G), cytosine (C), or thymine (T). Two strands of DNA can form a double helix if their respective bases are Watson-Crick complements, i.e., C pairs with G and A pairs with T. Within each strand, the 3′ end of one nucleotide (the third carbon of the deoxyribose, which carries a hydroxyl group) is linked to the 5′ end of the next nucleotide (the fifth carbon, which carries a phosphate group) through that phosphate group, so a single strand is built up one nucleotide at a time. A strand that contains 20 nucleotides is said to be a 20-mer. The length of double-stranded DNA is counted in base pairs: a double-stranded DNA of 20 base pairs consists of two single strands, each 20 nucleotides long [1, 25].

DNA-based computing [1, 2, 3, 4] treats DNA strands as the bits of a traditional digital computer and uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate the strands [5]. Nowadays, roughly $10^{18}$ DNA strands can be produced in a test tube [10-15], which means that about $10^{18}$ bits of information can be represented. With the biological operations described in the following section, it is as if $10^{18}$ processors were running in parallel. This massive parallelism could be used to attack some of the most intractable problems in computer science [19].
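To put the $10^{18}$ figure in perspective, the short sketch below computes the largest number of bits n for which all $2^n$ bit patterns fit into one tube. It assumes one distinct strand per pattern, which is how the brute-force solution spaces of $2^n$ combinations are counted later in this dissertation; the helper name `max_bits` is hypothetical.

```python
def max_bits(strands: int) -> int:
    """Largest n with 2**n <= strands, using exact integer arithmetic."""
    return strands.bit_length() - 1


STRANDS_PER_TUBE = 10**18  # rough figure quoted in the text
print(max_bits(STRANDS_PER_TUBE))  # 59
```

Under this assumption, a single tube covers exhaustive search over at most 59 Boolean variables, so the parallelism is large but the exponential volume requirement remains.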
Molecular computation was first proposed by Feynman in 1961 [18], but the idea was not tested experimentally until 1994, when Adleman successfully solved an instance of the Hamiltonian path problem in a test tube using DNA strands [1]. Since then, DNA-based algorithms have been proposed for many computational problems. The motivation for using DNA to solve such problems lies in the potential for massive inherent parallelism when operations are performed on populations of trillions of DNA molecules. In 1995, Lipton [22] demonstrated that a DNA-based solution could be used to solve the satisfiability problem, the first problem shown to be NP-complete. In 1999, Adleman and co-authors (Roweis et al.) proposed the sticker model to enhance the Adleman-Lipton model [25]. In 2000, Adleman and his co-authors (Braich et al.) solved a 6-variable, 11-clause formula of the 3-SAT problem [5]. In 2002, Adleman and his co-authors (Braich et al.) performed experiments to solve a 20-variable, 24-clause formula of the 3-SAT problem [6].

In this dissertation, we solve the Not-All-Equal (NAE) 3-SAT problem, the One-In-Three (1in3) 3-SAT problem, and the hitting-set problem using molecular solutions. We also discuss the complexity of the proposed DNA-based algorithms: time complexity, volume complexity, the number of test tubes used, and the longest library strand in the solution space. Finally, simulated experiments are applied to the One-In-Three (1in3) 3-SAT problem and the hitting-set problem; the Not-All-Equal (NAE) 3-SAT problem is similar.

This dissertation is organized as follows. In Chapter 2, we provide the background of DNA-based supercomputing and related work by other scholars. In Chapter 3, we define the Not-All-Equal (NAE) 3-SAT problem, present the DNA-based algorithm, and discuss its complexity.
In Chapter 4, we define the One-In-Three (1in3) 3-SAT problem, present the DNA-based algorithm, and discuss its complexity. In Chapter 5, we define the hitting-set problem, present the DNA-based algorithm, and discuss its complexity. In Chapter 6, we provide the simulated experimental results for the One-In-Three (1in3) 3-SAT problem and the hitting-set problem. Chapter 7 presents discussions and conclusions.

Chapter 2 Background and related works

2.1 The Adleman-Lipton model

Adleman presented a new concept of computation at the molecular level in his papers [1-3]. According to his idea, a molecular computer can be built with the following tools:

1. Watson-Crick complements. Two strands of DNA anneal to form the famous double helix if their respective bases are Watson-Crick complements, i.e., C pairs with G and A pairs with T. If a DNA molecule meets another DNA molecule that is not its complement, the two will not anneal.
2. Ligases. Ligases bond split DNA molecules together. For example, DNA ligase takes two strands of DNA and covalently connects them into a single strand. In the cell, ligase is used to repair broken DNA strands.
3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease looks for a predetermined sequence of bases on a DNA strand and, if it finds one, cuts the strand into two pieces.
4. Polymerases. Polymerases copy information from one DNA molecule into another. DNA polymerase makes a Watson-Crick complementary copy of a DNA template strand. Given a primer, a short piece of DNA that tells it where to start, DNA polymerase adds bases to the primer to create a complementary copy of the template.
5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel.
This process separates DNA strands by length.
6. DNA synthesis. Nowadays, a commercial DNA synthesis facility can make a requested DNA sequence. Within a few days, we receive a test tube containing about $10^{18}$ molecules of DNA with the requested sequence.

The above six techniques are the basis of the Adleman-Lipton DNA computing model. From them, Adleman developed eight bio-molecular instructions for writing bio-molecular programs. A test tube contains DNA molecules, i.e., a finite set of strings over the alphabet {A, C, G, T}. The following operations can be performed on a tube [9-15]:

1. Append-tail. Given a tube T and a binary digit $x_j$, the operation "Append-tail" appends $x_j$ onto the end of every data item stored in the tube T. The formal representation for the operation is written as "Append-tail(T, $x_j$)".
2. Amplify. Given a tube T, the operation "Amplify(T, $T_1$, $T_2$)" produces two new tubes $T_1$ and $T_2$, each an exact copy of T (so $T_1$ and $T_2$ are identical), and T becomes an empty tube.
3. Merge. Given n tubes $T_1, \dots, T_n$, the merge operation pours the data stored in the n tubes into one tube without changing any individual data item. The formal representation for the merge operation is written as "$\cup(T_1, \dots, T_n)$", where $\cup(T_1, \dots, T_n) = T_1 \cup \dots \cup T_n$.
4. Extract. Given a tube T and a binary digit $x_k$, the extract operation produces two tubes $+(T, x_k)$ and $-(T, x_k)$, where $+(T, x_k)$ is all of the data in T that contain $x_k$ and $-(T, x_k)$ is all of the data in T that do not contain $x_k$. After the extract operation is completed, test tube T is empty.
5. Detect. Given a tube T, the detect operation checks whether T contains any data. If T contains at least one data item the result is "yes"; if T contains no data the result is "no". The formal representation for the operation is written as "Detect(T)".
6. Discard. Given a tube T, the contents of T are discarded, and T is replaced by a new, empty tube.
The formal representation for the operation is written as "Discard(T)".
7. Read. Given a tube T, the read operation describes a single data item contained in T. Even if T contains many different data items, the operation gives an explicit description of exactly one of them. The formal representation for the operation is written as "Read(T)".
8. Append-head. Given a tube T and a binary digit $x_j$, the operation "Append-head" appends $x_j$ onto the head of every data item stored in the tube T. The formal representation for the operation is written as "Append-head(T, $x_j$)".

2.2 Introduction to other related works

Adleman and his co-authors [6] performed experiments to solve a 20-variable, 24-clause three-conjunctive normal form (3-CNF) formula. Zhang and Winfree [29] presented an allosteric DNA molecule that, in its active configuration, catalyzes a noncovalent DNA reaction. Yin and his co-authors [28] programmed diverse molecular self-assembly and disassembly pathways using a ‘reaction graph’ abstraction to specify complementarity relationships between modular domains in a versatile DNA hairpin motif. Cook and his co-authors [17] showed how several common digital circuits (including de-multiplexers, random access memory, and Walsh transforms) could be built in a bottom-up manner using biologically inspired self-assembly. Bishop and his co-authors [8] considered the task of programming active self-assembling and self-organizing systems at the level of interactions among particles in the system. Chen and his co-authors [16] proposed dimension augmented proof-reading, a technique that uses the third dimension to perform error correction in two-dimensional self-assembling systems. Suzuki and Murata [26] proposed a model of a DNA spike oscillator. Goodman and his co-authors [20] reported a family of DNA tetrahedra, less than 10 nanometers on a side, that can self-assemble in seconds with near-quantitative yield of one diastereomer.
O'Neill and his co-authors [24] studied nanotubes that have five nicks, one in the core of a tile and one at each corner, and reported the successful ligation of all four corner nicks by T4 DNA ligase. Yashin and his co-authors [27] demonstrated cascades of particles with up to three layers and a nonlinear network with an AND gate hub. Brijder and his co-authors [7] showed that membrane systems are computationally universal. Majumder and his co-authors [23] described how self-assembly processes can be modeled as rapidly mixing Markov chains, characterized chemical equilibrium in the context of self-assembly processes, and presented a formulation for the equilibrium concentration of various assemblies.

Chapter 3 Molecular solution of Not-All-Equal (NAE) 3-SAT problem

3.1 Definition of Not-All-Equal (NAE) 3-SAT problem

Satisfiability, the first problem shown to be NP-complete, asks whether the variables of a given Boolean formula can be assigned truth values so that the formula evaluates to true. If no such assignment exists, the formula is unsatisfiable; otherwise it is satisfiable. The satisfiability problem, also called the Boolean satisfiability problem, is a decision problem whose instance is a Boolean expression written using only AND, OR, NOT, variables, and parentheses. More formally: given a set U of variables and a collection C of clauses over U, is there a satisfying truth assignment for C?

The problem remains NP-complete even if all expressions are written in conjunctive normal form with 3 literals per clause (3-CNF), yielding the 3-SAT problem. 3-satisfiability is the special case of k-satisfiability (k-SAT) in which each clause contains exactly k = 3 literals. For example, $E = (x_1 \vee \bar{x}_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_3 \vee x_4)$. Each clause has exactly 3 literals, which is why the problem is called 3-SAT.

The Not-All-Equal (NAE) 3-SAT problem [19] is defined as follows.
Definition 3-1:
Instance: A set V of logical variables and a collection C of clauses over V such that each clause has 3 literals.
Question: Is there a truth assignment for V such that each clause has at least one true and at least one false literal?

For example, let $V = (x_1, x_2, x_3, x_4)$ and $C = (x_1 \vee \bar{x}_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_3 \vee x_4)$. Taking $x_1$ as the leftmost bit and $x_4$ as the rightmost bit, the satisfying truth assignments are {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}. We delete the 3 assignments {1000, 1001, 1101}, each of which makes all three literals of some clause true (all 1’s); no satisfying assignment makes all literals of a clause false (all 0’s). The final answer is T = {0000, 0001, 0100, 0101, 0011, 1110, 1111, 1010, 1011, 1100}.

3.2 Generate DNA-based algorithm to solve Not-All-Equal (NAE) 3-SAT problem

Let $x_1, x_2, x_3, x_4$ be 4 logical variables and $f(x_1, x_2, x_3, x_4) = C = (x_1 \vee \bar{x}_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_3 \vee x_4)$. Define the binary digit $z_k^1$ to denote that the kth bit (counting from the leftmost side) is 1 and $z_k^0$ to denote that the kth bit is 0. $|C|$ is the number of clauses, and $|C_a|$ is the number of literals in the ath clause $C_a$. We also define $v_b^a$ as the bth literal of the ath clause, which is either a variable $x_j$ or its negation. Suppose that $x_1$ is the leftmost bit and $x_4$ is the rightmost bit.

Our algorithm contains two blocks of code. The first block generates the satisfying truth assignments of the 3-SAT problem [4]. The second block deletes the truth assignments that make all literals of some clause 1. No satisfying assignment makes all literals of a clause 0, because such an assignment would leave that clause, and hence the formula, unsatisfied. The answer left in T is the set of assignments for which each clause has at least one true and at least one false literal.

Algorithm 3-1: Solving the Not-All-Equal (NAE) 3-SAT problem for n logical variables and a collection C of clauses over these variables.
(1) * first block (Not-All-Equal-0) (generate truth assignments) *
(2) Append-tail($T_1$, $z_1^1$).
(3) Append-tail($T_2$, $z_1^0$).
(4) $T = \cup(T_1, T_2)$.
(5) For k = 2 to n
(6)     Amplify($T$, $T_1$, $T_2$).
(7)     Append-tail($T_1$, $z_k^1$).
(8)     Append-tail($T_2$, $z_k^0$).
(9)     $T = \cup(T_1, T_2)$.
(10) EndFor
(11) For a = 1 to $|C|$ do begin
(12)     For b = 1 to $|C_a|$ do begin
(13)         If $v_b^a = x_j$ then begin
(14)             $T_b = +(T, v_b^a = 1)$
(15)             $T = -(T, v_b^a = 1)$
(16)         end
(17)         else begin
(18)             $T_b = +(T, v_b^a = 0)$
(19)             $T = -(T, v_b^a = 0)$
(20)         end
(21)     EndFor
(22)     Discard($T$)
(23)     $T = \cup_{b=1}^{|C_a|} T_b$
(24) EndFor
(25) * second block (Not-All-Equal-1) *
(26) For a = 1 to $|C|$ do begin
(27)     For b = 1 to $|C_a|$ do begin
(28)         If $v_b^a = x_j$ then begin
(29)             $T_b = +(T, v_b^a = 0)$
(30)             $T = -(T, v_b^a = 0)$
(31)         end
(32)         else begin
(33)             $T_b = +(T, v_b^a = 1)$
(34)             $T = -(T, v_b^a = 1)$
(35)         end
(36)     EndFor
(37)     Discard($T$)
(38)     $T = \cup_{b=1}^{|C_a|} T_b$
(39) EndFor
(40) EndAlgorithm

Lemma 3-1: Algorithm 3-1 can be used to solve a Not-All-Equal 3-SAT problem for n logical variables and a collection C of clauses over these variables.

Proof: Algorithm 3-1 consists of two blocks of code and is implemented by means of the extract, amplify, append-tail, discard, and merge operations. In the first block, Steps (2) through (10) generate the solution space of $2^n$ states of n bits; after these operations, the $2^n$ combinations of n bits are contained in tube T. The rest of the first block keeps the satisfying truth assignments, i.e., those for which no clause is all 0’s. Steps (11) and (12) are, respectively, the outer loop and the inner loop. Steps (13) to (20) state that if $v_b^a = x_j$ we extract the strands with $v_b^a = 1$ into $T_b$ and leave those with $v_b^a = 0$ in T; otherwise we extract the strands with $v_b^a = 0$ into $T_b$ and leave those with $v_b^a = 1$ in T, where a and b are the indices of the outer and inner loops, respectively. After the inner loop ends, we discard test tube T, merge all test tubes $T_b$ into T, and repeat the outer loop for the next clause. After all iterations of the outer loop, all satisfying truth assignments are in test tube T.
The second block of code deletes the truth assignments that make some clause all 1’s. Steps (26) and (27) are, respectively, the outer and inner loops. Steps (28) to (35) state that if $v_b^a = x_j$ we extract the strands with $v_b^a = 0$ into $T_b$ and leave those with $v_b^a = 1$ in T; otherwise we extract the strands with $v_b^a = 1$ into $T_b$ and leave those with $v_b^a = 0$ in T, where a and b are the indices of the outer and inner loops, respectively. After the inner loop ends, we discard test tube T, merge all test tubes $T_b$ into T, and repeat the outer loop for the next clause. After all iterations of the outer loop, test tube T contains exactly the truth assignments for which each clause has at least one true and at least one false literal.

3.3 The Power of the DNA Algorithm to Solve Not-All-Equal (NAE) 3-SAT problem

The example with $V = (x_1, x_2, x_3, x_4)$ and $f(x_1, x_2, x_3, x_4) = C = (x_1 \vee \bar{x}_2 \vee \bar{x}_3) \wedge (x_1 \vee \bar{x}_3 \vee x_4)$ from subsection 3.1 is used to show the power of Algorithm 3-1.

In the first execution of Steps (13) through (20), with a = 1 and b = 1, the assignments whose leftmost bit is 1 are placed in T1 and those whose leftmost bit is 0 remain in T, so T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111}. In the second execution, with a = 1 and b = 2, we get T2 = {0000, 0001, 0010, 0011} and T = {0100, 0101, 0110, 0111}. In the third execution, with a = 1 and b = 3, we obtain T3 = {0100, 0101} and T = {0110, 0111}. Because the first pass of the outer loop has ended, the first execution of Step (22) discards test tube T and the first execution of Step (23) merges test tubes T1, T2, T3 into T, giving T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0010, 0011, 0100, 0101}.
In the second pass of the outer loop, the fourth execution of Steps (13) through (20), with a = 2 and b = 1, places the assignments whose leftmost bit is 1 in T1 and leaves those whose leftmost bit is 0 in T, so T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101}. In the fifth execution, with a = 2 and b = 2, we get T2 = {0000, 0001, 0100, 0101} and T = {0010, 0011}. In the sixth execution, with a = 2 and b = 3, we obtain T3 = {0011} and T = {0010}. Because the second pass of the outer loop has ended, the second execution of Step (22) discards test tube T and the second execution of Step (23) merges test tubes T1, T2, T3 into T. The satisfying truth assignments are therefore T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}.

We continue by tracing the second block of code, which deletes the truth assignments that make some clause all 1’s. In the first execution of Steps (28) through (35), with a = 1 and b = 1, the assignments whose leftmost bit is 0 are placed in T1 and those whose leftmost bit is 1 remain in T, so T1 = {0000, 0001, 0100, 0101, 0011} and T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}. In the second execution, with a = 1 and b = 2, we get T2 = {1100, 1101, 1110, 1111} and T = {1000, 1001, 1010, 1011}. In the third execution, with a = 1 and b = 3, we obtain T3 = {1010, 1011} and T = {1000, 1001}. Because the first pass of the outer loop has ended, the first execution of Step (37) discards test tube T and the first execution of Step (38) merges test tubes T1, T2, T3 into T, giving T = {0000, 0001, 0100, 0101, 0011, 1100, 1101, 1110, 1111, 1010, 1011}.
In the second pass of the outer loop, the fourth execution of Steps (28) through (35), with a = 2 and b = 1, gives T1 = {0000, 0001, 0100, 0101, 0011} and T = {1100, 1101, 1110, 1111, 1010, 1011}. The fifth execution, with a = 2 and b = 2, gives T2 = {1110, 1111, 1010, 1011} and T = {1100, 1101}. The sixth execution, with a = 2 and b = 3, gives T3 = {1100} and T = {1101}. The second pass of the outer loop has now ended as well. The second execution of Step (37) discards T and the second execution of Step (38) merges T1, T2, T3 into T. This yields the answer T = {0000, 0001, 0100, 0101, 0011, 1110, 1111, 1010, 1011, 1100}. We have discarded the 3 truth assignments {1000, 1001, 1101} that make one of the clauses all 1’s. No satisfying truth assignment makes any clause all 0’s, so T is the final answer.

In the course of this research, I also observed that the first block is equivalent, by definition, to deleting the assignments that make some clause all 0’s. The algorithm thus has two blocks: the first generates the satisfying truth assignments (equivalently, deletes the assignments that make some clause all 0’s), and the second deletes the satisfying assignments that make some clause all 1’s.

3.4 The Complexity of Algorithm 3-1

The following theorems describe the time complexity and volume complexity of Algorithm 3-1, the number of test tubes it uses, and the longest library strand in its solution space.

Theorem 3-1: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be solved with O(12p) “extract” operations, O(2p) “discard” operations, O(2n) “append” operations, O(n+2p) “merge” operations, and O(n-1) “amplify” operations in the Adleman-Lipton model.
Proof: Algorithm 3-1 can be applied to solve the Not-All-Equal (NAE) 3-SAT problem for C and V. The first block of code in Algorithm 3-1 uses 2 × 3 × p = 6p “extract” operations, p “discard” operations, 2n “append” operations, (n + p) “merge” operations, and (n − 1) “amplify” operations. The second block of code in Algorithm 3-1 uses 2 × 3 × p = 6p “extract” operations, p “discard” operations, and p “merge” operations. Therefore, the time complexity of Algorithm 3-1 is O(12p) “extract” operations, O(2p) “discard” operations, O(2n) “append” operations, O(n + 2p) “merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton model.

Theorem 3-2: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be solved with O(2^n) library strands in the Adleman-Lipton model.

Proof: Refer to Lemma 3-1 and Theorem 3-1.

Theorem 3-3: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be solved with O(n) tubes in the Adleman-Lipton model.

Proof: Refer to Lemma 3-1 and Theorem 3-1.

Theorem 3-4: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The Not-All-Equal (NAE) 3-SAT problem for C and V can be solved with a longest library strand of O(15n + 15p) in the Adleman-Lipton model.

Proof: Refer to Lemma 3-1 and Theorem 3-1.

Chapter 4 Molecular solution of One-In-Three (1IN3) 3-SAT problem

4.1 Definition of One-In-Three (1IN3) 3-SAT problem

The One-In-Three (1IN3) 3-SAT problem [19] is defined as follows.

Definition 4-1:
Instance: A set V of logical variables and a collection C of clauses over V such that each clause has 3 literals.
Question: Is there a truth assignment for V such that each clause in C has exactly one true literal?

Analysing the definition above, we find that if each clause in C has three literals and exactly one of them is true (the other two literals are then false), the literal values of each clause must take one of the three forms 100, 010, or 001. For example, let V = (x1, x2, x3, x4) and C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Suppose x1 is the leftmost bit and x4 is the rightmost bit. The truth assignments are then {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}. Only 3 of these assignments, {1110, 0100, 0011}, give each clause in C exactly one true literal, so the final answer is T = {1110, 0100, 0011}.

4.2 Generate DNA-based algorithm to solve One-In-Three (1IN3) 3-SAT problem

Let x1, x2, x3, x4 be 4 logical variables and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4). Define z_k^1 to denote that the kth bit (counted from the leftmost side) is 1, and z_k^0 to denote that the kth bit (counted from the leftmost side) is 0. |C| is the number of clauses and |Ca| is the number of literals in the ath clause Ca. We also define v_b^a to be the bth literal of the ath clause. Suppose that x1 is the leftmost bit and x4 is the rightmost bit.

Our algorithm contains 2 blocks of code. The first block produces the solution space and generates the truth assignments of the 3-SAT problem [4]. The second block is further divided into three parts. The first part collects the truth assignments that make the clause “100”, the second part collects those that make the clause “010”, and the third part collects those that make the clause “001”. The algorithm then proceeds to the second round (a = 2, that is, the second clause) and does the same thing again. After all clauses have been processed, T holds the assignments for which each clause has exactly one true literal.
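The example above can be cross-checked by brute force on a conventional computer. The following is a minimal sketch, not the DNA algorithm itself. It assumes the example formula is C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4), with the negations inferred from the worked trace; the names EXAMPLE_CLAUSES, satisfying and one_in_three are hypothetical and introduced only for illustration.

```python
from itertools import product

# A literal is (variable index, polarity); polarity False means the variable is negated.
# Assumed example formula: C = (x1 v ~x2 v ~x3) ^ (x1 v ~x3 v x4).
EXAMPLE_CLAUSES = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]


def literal_is_true(bits, literal):
    """Evaluate one literal on a bit string whose leftmost character is x1."""
    index, positive = literal
    bit_is_one = bits[index - 1] == "1"
    return bit_is_one if positive else not bit_is_one


def true_literal_counts(bits, clauses):
    """Number of true literals in each clause under the assignment `bits`."""
    return [sum(literal_is_true(bits, lit) for lit in clause) for clause in clauses]


def all_assignments(n):
    return ["".join(b) for b in product("01", repeat=n)]


def satisfying(n, clauses):
    """Ordinary 3-SAT: every clause has at least one true literal."""
    return [s for s in all_assignments(n)
            if all(c >= 1 for c in true_literal_counts(s, clauses))]


def one_in_three(n, clauses):
    """1IN3 3-SAT: every clause has exactly one true literal."""
    return [s for s in all_assignments(n)
            if all(c == 1 for c in true_literal_counts(s, clauses))]
```

Under the stated assumption, satisfying(4, EXAMPLE_CLAUSES) returns the 13 truth assignments listed above and one_in_three(4, EXAMPLE_CLAUSES) returns 0011, 0100 and 1110.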
We propose the following DNA-based algorithm to solve the One-In-Three (1IN3) 3-SAT problem.

Algorithm 4-1: Solving the One-In-Three (1IN3) 3-SAT problem for n logical variables and a collection C of clauses over the n variables.

(1)   * first block (generate truth assignments) *
(2)   Append-tail(T1, z_1^1).
(3)   Append-tail(T2, z_1^0).
(4)   T = ∪(T1, T2).
(5)   For k = 2 to n
(6)       Amplify(T, T1, T2).
(7)       Append-tail(T1, z_k^1).
(8)       Append-tail(T2, z_k^0).
(9)       T = ∪(T1, T2).
(10)  EndFor
(11)  For a = 1 to |C| do begin
(12)      For b = 1 to |Ca| do begin
(13)          If v_b^a = xj then begin
(14)              Tb = +(T, v_b^a = 1)
(15)              T = −(T, v_b^a = 1)
(16)          end
(17)          else begin
(18)              Tb = +(T, v_b^a = 0)
(19)              T = −(T, v_b^a = 0)
(20)          end
(21)      Endfor
(22)      Discard(T)
(23)      T = ∪(T1, …, T|Ca|), the union of Tb for b = 1 to |Ca|
(24)  Endfor
(25)  * second block *
(26)  * first part: truth assignments which make *
(27)  * the clause “100” are in T1 *
(28)  For a = 1 to |C| do begin
(29)      If v_1^a = xj then begin
(30)          T1 = +(T, v_1^a = 1)
(31)          T2 = −(T, v_1^a = 1)
(32)      end
(33)      else begin
(34)          T1 = +(T, v_1^a = 0)
(35)          T2 = −(T, v_1^a = 0)
(36)      end
(37)      If v_2^a = xj then begin
(38)          T3 = −(T1, v_2^a = 0)
(39)          T1 = +(T1, v_2^a = 0)
(40)      end
(41)      else begin
(42)          T3 = −(T1, v_2^a = 1)
(43)          T1 = +(T1, v_2^a = 1)
(44)      end
(45)      If v_3^a = xj then begin
(46)          T4 = −(T1, v_3^a = 0)
(47)          T1 = +(T1, v_3^a = 0)
(48)      end
(49)      else begin
(50)          T4 = −(T1, v_3^a = 1)
(51)          T1 = +(T1, v_3^a = 1)
(52)      end
(53)      T = ∪(T2, T3, T4)
(54)  * second part: truth assignments which make *
(55)  * the clause “010” are in T5 *
(56)      If v_1^a = xj then begin
(57)          T5 = +(T, v_1^a = 0)
(58)          T6 = −(T, v_1^a = 0)
(59)      end
(60)      else begin
(61)          T5 = +(T, v_1^a = 1)
(62)          T6 = −(T, v_1^a = 1)
(63)      end
(64)      If v_2^a = xj then begin
(65)          T7 = −(T5, v_2^a = 1)
(66)          T5 = +(T5, v_2^a = 1)
(67)      end
(68)      else begin
(69)          T7 = −(T5, v_2^a = 0)
(70)          T5 = +(T5, v_2^a = 0)
(71)      end
(72)      If v_3^a = xj then begin
(73)          T8 = −(T5, v_3^a = 0)
(74)          T5 = +(T5, v_3^a = 0)
(75)      end
(76)      else begin
(77)          T8 = −(T5, v_3^a = 1)
(78)          T5 = +(T5, v_3^a = 1)
(79)      end
(80)      T = ∪(T6, T7, T8)
(81)  * third part: truth assignments which make *
(82)  * the clause “001” are in T9 *
(83)      If v_1^a = xj then begin
(84)          T9 = +(T, v_1^a = 0)
(85)          T10 = −(T, v_1^a = 0)
(86)      end
(87)      else begin
(88)          T9 = +(T, v_1^a = 1)
(89)          T10 = −(T, v_1^a = 1)
(90)      end
(91)      If v_2^a = xj then begin
(92)          T11 = −(T9, v_2^a = 0)
(93)          T9 = +(T9, v_2^a = 0)
(94)      end
(95)      else begin
(96)          T11 = −(T9, v_2^a = 1)
(97)          T9 = +(T9, v_2^a = 1)
(98)      end
(99)      If v_3^a = xj then begin
(100)         T12 = −(T9, v_3^a = 1)
(101)         T9 = +(T9, v_3^a = 1)
(102)     end
(103)     else begin
(104)         T12 = −(T9, v_3^a = 0)
(105)         T9 = +(T9, v_3^a = 0)
(106)     end
(107)     Discard(T)
(108)     T = ∪(T1, T5, T9)
(109) Endfor
(110) EndAlgorithm

The answer is in test tube T.

Lemma 4-1: Algorithm 4-1 can be used to solve a One-In-Three 3-SAT problem for n logical variables and a collection C of clauses over the n variables.

Proof: Algorithm 4-1 consists of 2 blocks of code and is implemented by means of the extract, amplify, append-tail, discard, and merge operations. In the first block of code (step (2) to step (24)), the solution space of 2^n states of n bits is generated by step (2) through step (10), and step (11) through step (24) generate all truth assignments. For the proof that the solution space and the truth assignments are generated correctly, refer to Lemma 3-1 for Algorithm 3-1.

The second block of code is further divided into three parts. The first part runs from step (28) to step (53), the second part from step (56) to step (80), and the third part from step (83) to step (106). Each part contains three if-like instructions, which are exactly extract operations, so there are nine if-like instructions in total. The three if-like instructions of the first part (step (29) to step (52)) extract the assignments that give the current clause the form 100, and step (53) merges the others, which do not give the clause the form 100, into test tube T.
Next, the three if-like instructions of the second part (step (56) to step (79)) extract the assignments that give the clause the form 010, and step (80) merges the others, which do not give the clause the form 010, into test tube T. The three if-like instructions of the third part (step (83) to step (106)) then extract the assignments that give the clause the form 001. Step (107) discards T, and step (108) puts the answers of the first round into test tube T. The algorithm goes back to step (28) and keeps extracting the assignments of the forms 100, 010, and 001 for the next clause (step (28) to step (108)). These steps are repeated until all clauses have been checked, and the answers are then left in test tube T.

4.3 The Power of the DNA Algorithm to Solve One-In-Three (1IN3) 3-SAT problem

The example with V = (x1, x2, x3, x4) and f(x1, x2, x3, x4) = C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) from subsection 4.1 is used to show the power of Algorithm 4-1.

In the first execution of Step (13) through Step (20), with a = 1 and b = 1, we put the assignments whose leftmost encoding bit is 1 into T1 and those whose leftmost encoding bit is 0 into T, so we get T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111}. Next, from the second execution of Step (13) through Step (20), with a = 1 and b = 2, we get T2 = {0000, 0001, 0010, 0011} and T = {0100, 0101, 0110, 0111}. Then, from the third execution of Step (13) through Step (20), with a = 1 and b = 3, we obtain T3 = {0100, 0101} and T = {0110, 0111}. Because the first pass of the outer loop has ended, the first execution of Step (22) discards test tube T and the first execution of Step (23) merges test tubes T1, T2, and T3 into T. We get T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0010, 0011, 0100, 0101}.
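The clause-by-clause filtering of steps (11) through (24) can be imitated with ordinary lists standing in for test tubes. The sketch below is only a software analogy of the extract, discard, and merge operations, not a laboratory protocol. It assumes the example formula C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) as reconstructed from the trace; extract, keep_satisfying and EXAMPLE_CLAUSES are hypothetical names.

```python
from itertools import product

# Assumed example formula; a literal is (variable index, polarity).
EXAMPLE_CLAUSES = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]


def extract(tube, var, bit):
    """Split a tube into +(tube, x_var = bit) and -(tube, x_var = bit)."""
    plus = [s for s in tube if s[var - 1] == bit]
    minus = [s for s in tube if s[var - 1] != bit]
    return plus, minus


def keep_satisfying(tube, clauses):
    """Software analogy of steps (11)-(24): drop strings that falsify a clause."""
    for clause in clauses:                      # step (11)
        collected = []
        for var, positive in clause:            # step (12)
            # a positive literal is satisfied by bit 1, a negated one by bit 0
            satisfied, tube = extract(tube, var, "1" if positive else "0")
            collected.append(satisfied)         # tube T_b
        # step (22): what is left in `tube` falsifies every literal; discard it
        tube = [s for part in collected for s in part]   # step (23): merge
    return tube


solution_space = ["".join(b) for b in product("01", repeat=4)]
after_first_clause = keep_satisfying(solution_space, EXAMPLE_CLAUSES[:1])
truth_assignments = keep_satisfying(solution_space, EXAMPLE_CLAUSES)
```

With these assumptions, after_first_clause holds the 14 strings obtained above, in the same order, and only 0110 and 0111 are missing from it.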
For the second pass of the outer loop, the fourth execution of Step (13) through Step (20), with a = 2 and b = 1, puts the assignments whose leftmost encoding bit is 1 into T1 and those whose leftmost encoding bit is 0 into T, so we get T1 = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111} and T = {0000, 0001, 0010, 0011, 0100, 0101}. Next, from the fifth execution of Step (13) through Step (20), with a = 2 and b = 2, we get T2 = {0000, 0001, 0100, 0101} and T = {0010, 0011}. Then, from the sixth execution of Step (13) through Step (20), with a = 2 and b = 3, we obtain T3 = {0011} and T = {0010}. Because the second pass of the outer loop has ended, the second execution of Step (22) discards test tube T and the second execution of Step (23) merges test tubes T1, T2, and T3 into T. We get T = {1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111, 0000, 0001, 0100, 0101, 0011}, which are the truth assignments.

We continue by tracing the second block of code, which finds the truth assignments for which each clause has exactly one true literal. In the first execution of Step (29) through Step (36), with a = 1, we put the assignments whose leftmost encoding bit is 1 into T1 and those whose leftmost encoding bit is 0 into T2, so we get T1 = {1110, 1111, 1010, 1011, 1100, 1000, 1001, 1101} and T2 = {0000, 0001, 0100, 0101, 0011}. Next, after the execution of step (37) through step (44), we get T3 = {1010, 1011, 1000, 1001} and T1 = {1110, 1111, 1100, 1101}. Then, after the execution of step (45) through step (52), we obtain T4 = {1100, 1101} and T1 = {1110, 1111}. The first execution of step (53) merges test tubes T2, T3, and T4 into T, and we get T = {0000, 0001, 0100, 0101, 0011, 1010, 1011, 1100, 1000, 1001, 1101}. Then, after the execution of step (56) through step (63), we get T5 = {0000, 0001, 0100, 0101, 0011} and T6 = {1010, 1011, 1100, 1000, 1001, 1101}.
After the execution of step (64) through step (71), we get T7 = {0100, 0101} and T5 = {0000, 0001, 0011}. After the execution of step (72) through step (79), we get T8 = {0000, 0001} and T5 = {0011}. At step (80), we merge T6, T7, and T8 into T and get T = {1010, 1011, 1100, 0100, 0101, 0000, 0001, 1000, 1001, 1101}. Then, after the execution of step (83) through step (90), we get T9 = {0100, 0101, 0000, 0001} and T10 = {1010, 1011, 1100, 1000, 1001, 1101}. After the execution of step (91) through step (98), we get T11 = {0000, 0001} and T9 = {0100, 0101}. After the execution of step (99) through step (106), we get T12 = ∅ and T9 = {0100, 0101}. At step (107), we discard T. At step (108), we merge T1, T5, and T9 into T and get T = {1110, 1111, 0011, 0100, 0101}, which is the answer when a = 1.

For the second pass of the loop, when a = 2, after the execution of step (29) through step (36), we get T1 = {1110, 1111} and T2 = {0011, 0100, 0101}. After the execution of step (37) through step (44), we get T3 = ∅ and T1 = {1110, 1111}. After the execution of step (45) through step (52), we get T4 = {1111} and T1 = {1110}. The second execution of step (53) merges test tubes T2, T3, and T4 into T, and we get T = {0011, 0100, 0101, 1111}. Then, after the execution of step (56) through step (63), we get T5 = {0011, 0100, 0101} and T6 = {1111}. After the execution of step (64) through step (71), we get T7 = {0011} and T5 = {0100, 0101}. After the execution of step (72) through step (79), we get T8 = {0101} and T5 = {0100}. The second execution of step (80) merges test tubes T6, T7, and T8 into T, and we get T = {1111, 0011, 0101}. Then, after the execution of step (83) through step (90), we get T9 = {0011, 0101} and T10 = {1111}. After the execution of step (91) through step (98), we get T11 = {0101} and T9 = {0011}. After the execution of step (99) through step (106), we get T12 = ∅ and T9 = {0011}.
We then discard T and merge T1, T5, and T9 into T, which gives the final answer T = {1110, 0100, 0011}.

4.4 The Complexity of Algorithm 4-1

The following theorems describe the time complexity and volume complexity of Algorithm 4-1, the number of test tubes used, and the longest library strand in the solution space of Algorithm 4-1.

Theorem 4-1: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(24p) “extract” operations, O(2p) “discard” operations, O(2n) “append” operations, O(n + 4p) “merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton model.

Proof: Algorithm 4-1 can be applied to solve the One-In-Three (1IN3) 3-SAT problem for C and V. The first block of code in Algorithm 4-1 uses 2 × 3 × p = 6p “extract” operations, p “discard” operations, 2n “append” operations, (n + p) “merge” operations, and (n − 1) “amplify” operations. The second block of code in Algorithm 4-1 uses 2 × 9 × p = 18p “extract” operations, p “discard” operations, and 3p “merge” operations. Therefore, the time complexity of Algorithm 4-1 is O(24p) “extract” operations, O(2p) “discard” operations, O(2n) “append” operations, O(n + 4p) “merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton model.

Theorem 4-2: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(2^n) library strands in the Adleman-Lipton model.

Proof: Refer to Lemma 4-1 and Theorem 4-1.

Theorem 4-3: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be solved with O(n) tubes in the Adleman-Lipton model.

Proof: Refer to Lemma 4-1 and Theorem 4-1.
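The operation counts used in the proof of Theorem 4-1 for the second block (18 extractions, 3 merges, and 1 discard per clause) can be checked with a small software analogy. The sketch below assumes the example formula C = (x1 ∨ ¬x2 ∨ ¬x3) ∧ (x1 ∨ ¬x3 ∨ x4) and starts from the 13 truth assignments of the worked example; the helper names and the Counter-based bookkeeping are hypothetical and only illustrate the counting argument.

```python
from collections import Counter

# Assumed example formula; a literal is (variable index, polarity).
EXAMPLE_CLAUSES = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (3, False), (4, True)],
]
PATTERNS = ("100", "010", "001")  # the three one-in-three forms of a clause


def extract(tube, var, bit, ops):
    """One if-like instruction: a '+' extraction and a '-' extraction."""
    ops["extract"] += 2
    plus = [s for s in tube if s[var - 1] == bit]
    minus = [s for s in tube if s[var - 1] != bit]
    return plus, minus


def merge(ops, *tubes):
    ops["merge"] += 1
    return [s for tube in tubes for s in tube]


def one_in_three_block(tube, clauses, ops):
    """Software analogy of steps (28)-(109) of Algorithm 4-1."""
    for clause in clauses:
        kept = []
        for pattern in PATTERNS:
            current, rejected = tube, []
            for (var, positive), want in zip(clause, pattern):
                # bit value of the variable that gives the literal the value `want`
                bit = want if positive else ("0" if want == "1" else "1")
                current, rest = extract(current, var, bit, ops)
                rejected.append(rest)
            kept.append(current)                 # T1, T5 or T9
            if pattern != PATTERNS[-1]:
                tube = merge(ops, *rejected)     # step (53) or step (80)
        ops["discard"] += 1                      # step (107)
        tube = merge(ops, *kept)                 # step (108)
    return tube


ops = Counter()
truth_assignments = ["1000", "1001", "1010", "1011", "1100", "1101", "1110",
                     "1111", "0000", "0001", "0100", "0101", "0011"]
answer = one_in_three_block(truth_assignments, EXAMPLE_CLAUSES, ops)
```

For p = 2 clauses the counters come out as 36 extractions, 6 merges, and 2 discards, which agrees with 18p, 3p, and p, and answer holds 1110, 0100, and 0011.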
Theorem 4-4: Let V be a set of n logical variables and C = {C1, C2, …, Cp} a collection of clauses over V. The One-In-Three (1IN3) 3-SAT problem for C and V can be solved with a longest library strand of O(15n + 15p) in the Adleman-Lipton model.

Proof: Refer to Lemma 4-1 and Theorem 4-1.

Chapter 5 A DNA-based Algorithm for Solving the Hitting-set Problem

5.1 Definition of the Hitting-set Problem

Informally, we are given a collection C of subsets of a finite set S and a positive integer K, and we want to find a subset S′ ⊆ S with |S′| ≤ K such that S′ contains at least one element from each subset in C. In other words, S′ is a subset that hits (intersects) every subset in C, and the size of S′ cannot be larger than K. The formal definition is as follows.

Definition 5-1: Assume that a ground set S with n elements, a collection C = {C1, C2, …, Ci, …, Cp} of subsets, where each Ci is a subset of S, and a positive integer K ≤ |S| are given. The hitting-set problem is to decide whether there is some subset S′ of S such that |S′| ≤ K and (Ci ∩ S′) ≠ ∅ for i = 1, 2, 3, …, p.

For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. By Definition 5-1, the hitting sets for S and C with K = 2 are {1, 4}, {2, 4}, and {3, 4}.

5.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem

Suppose that an n-bit binary number corresponds to each possible hitting set, where n is the number of elements of the ground set S. The encoding scheme is as follows: if the ith element appears in the subset, then the corresponding ith bit of the encoding number is 1; otherwise it is 0. In a real-world implementation, assume that an n-bit binary number Q is represented by the bits z1, …, zn, where the value of zk is either 1 or 0 for 1 ≤ k ≤ n. Bit zk is the kth bit of the n-bit binary number Q and represents the kth element of S. All possible subsets S′ of the ground set S = {1, 2, 3, 4} are shown in Table 5-1.
Table 5-1: All possible subsets S′ of the ground set S = {1, 2, 3, 4}.

Subset       Encoded sequence     Subset       Encoded sequence
∅            0000                 {1}          0001
{2}          0010                 {3}          0100
{4}          1000                 {1,2}        0011
{1,3}        0101                 {1,4}        1001
{2,3}        0110                 {2,4}        1010
{3,4}        1100                 {1,2,3}      0111
{1,2,4}      1011                 {1,3,4}      1101
{2,3,4}      1110                 {1,2,3,4}    1111

5.3 Introduction of Finding the Maximum and Minimum Numbers of Ones in Bio-molecular Computing

Consider the four combinations of two bits, namely 00 (0_10), 01 (1_10), 10 (2_10), and 11 (3_10). One interesting question is how these four combinations are classified by the number of ones they contain. Because the numbers of ones in 11 (3_10), 10 (2_10), 01 (1_10), and 00 (0_10) are two, one, one, and zero respectively, 11 (3_10) and 00 (0_10) fall into two different classes, while 10 (2_10) and 01 (1_10) fall into the same class, i.e. the class with exactly one 1. Similarly, we can extend this question to how the 2^n combinations of n bits are classified by the number of ones they contain. That is, for an n-bit sequence, each class consists of the combinations that have k ones, for 0 ≤ k ≤ n.

Assume that a binary number of n bits, zn, zn−1, …, z2, z1, is used to form the 2^n combinations, where the value of each bit zk is either one or zero for 1 ≤ k ≤ n. For the sake of convenience, z_k^1 denotes the fact that the value of zk is one and z_k^0 denotes the fact that the value of zk is zero. The following algorithm is proposed to find the maximum and minimum numbers of ones among the 2^n combinations.
Algorithm 5-1: ParallelFind(T0)

(1)   Append-head(T1, z_1^1);
(2)   Append-head(T2, z_1^0);
(3)   T0 = ∪(T1, T2);
(4)   For k = 2 to n
(5)       Amplify(T0, T1, T2);
(6)       Append-head(T1, z_k^1);
(7)       Append-head(T2, z_k^0);
(8)       T0 = ∪(T1, T2);
(9)   EndFor
(10)  For k = 0 to n − 1
(11)      For j = k downto 0
(12)          T_{j+1}^{ON} = +(T_j, z_{k+1}^1) and T_j = −(T_j, z_{k+1}^1);
(13)          T_{j+1} = ∪(T_{j+1}, T_{j+1}^{ON});
(14)      EndFor
(15)  EndFor
(16)  EndAlgorithm

Lemma 5-1: The algorithm ParallelFind(T0) can be used to find the maximum and minimum numbers of ones among the 2^n combinations of any n-bit binary sequence.

Proof: The algorithm ParallelFind(T0) is implemented by means of the extract, amplify, append-head, and merge operations. The solution space of 2^n states of the n bits is generated by the execution of steps (1) through (9). After those operations are performed, the 2^n combinations of n bits are contained in tube T0. Tube T_j is distinct from tube T_j^{ON}, and at the end test tube T_n contains the combinations with n ones.

Steps (10) and (11) of Algorithm 5-1 are respectively the outer and inner loops of the nested loop. Because the loop index k varies from 0 to n − 1, steps (10) and (11) are mainly employed to work out the influence of z_{k+1} on the number of ones in tubes T0 through T_{j+1}, where the value of j runs from k down to 0. Each execution of step (12) calls the extract operation on tube T_j to form two different tubes, T_{j+1}^{ON} and T_j. Tube T_{j+1}^{ON} contains the combinations that have z_{k+1} = 1 and tube T_j contains the combinations that have z_{k+1} = 0, so the combinations in tube T_j have j ones and the combinations in T_{j+1}^{ON} have (j + 1) ones. Next, each execution of step (13) applies the merge operation to pour tube T_{j+1}^{ON} into tube T_{j+1}. Steps (12) and (13) are repeated until the influence of zn on the number of ones in tubes T0 through Tn has been processed. This implies that the combinations in tube Tk, for 0 ≤ k ≤ n, have k ones.
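The nested loop of steps (10) through (15) can be imitated in software with lists standing in for test tubes. The following is a minimal sketch under the assumption that z1 is the rightmost character of each string (the bits are attached with Append-head); the function name parallel_find and the list-based tubes are hypothetical and are not a model of the wet-lab procedure.

```python
from itertools import product


def parallel_find(n, strings=None):
    """Return tubes[0..n], where tubes[k] holds the strings with exactly k ones."""
    if strings is None:
        # steps (1)-(9): the full solution space of 2^n bit strings
        strings = ["".join(b) for b in product("01", repeat=n)]
    tubes = [[] for _ in range(n + 1)]
    tubes[0] = list(strings)
    for k in range(n):                      # step (10): k = 0 .. n-1
        for j in range(k, -1, -1):          # step (11): j = k downto 0
            # step (12): split tube j on bit z_{k+1} (counted from the right)
            on = [s for s in tubes[j] if s[-(k + 1)] == "1"]
            tubes[j] = [s for s in tubes[j] if s[-(k + 1)] == "0"]
            # step (13): pour the extracted strings into tube j+1
            tubes[j + 1] = tubes[j + 1] + on
    return tubes


tubes = parallel_find(4)
# smallest and largest number of ones present in a given input
nonempty = [k for k, tube in enumerate(tubes) if tube]
minimum_ones, maximum_ones = nonempty[0], nonempty[-1]
```

Processing j from k down to 0 matters: a string moved into tube j + 1 during the current pass over z_{k+1} is not examined again for the same bit, so each string is promoted at most once per bit.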
5.4 Generate DNA-based algorithm to solve the Hitting-set problem

Following the definitions presented in the previous subsections, let z_i^1 state that the logical variable representing the ith element of the finite set S is set to 1, meaning that the element appears in the subset S′, and let z_i^0 state that it is set to 0, meaning that the element does not appear in the subset S′. The initial tube T contains many strings, each of which encodes a single n-bit sequence. All 2^n possible choices of subsets are encoded in tube T. The following DNA-based algorithm is proposed to solve the hitting-set problem.

Algorithm 5-2:

(1)   Append-head(T1, z_1^1);
(2)   Append-head(T2, z_1^0);
(3)   T = ∪(T1, T2);
(4)   For k = 2 to n
(5)       Amplify(T, T1, T2);
(6)       Append-head(T1, z_k^1);
(7)       Append-head(T2, z_k^0);
(8)       T = ∪(T1, T2);
(9)   EndFor
(10)  For a = 1 to |C| do begin
(11)      For b = 1 to |Ca| do begin
(12)          If (the bth element in the ath subset in C is the ith element in S)
(13)          then begin
(14)              Tb = +(T, z_i^1);
(15)              T = −(T, z_i^1);
(16)          end
(17)      Endfor
(18)      Discard(T);
(19)      T = ∪(T1, …, T|Ca|), the union of Tb for b = 1 to |Ca|;
(20)  Endfor
(21)  T0 = ∪(T0, T);
(22)  For k = 0 to n − 1
(23)      For j = k downto 0
(24)          T_{j+1}^{ON} = +(T_j, z_{k+1}^1) and T_j = −(T_j, z_{k+1}^1);
(25)          T_{j+1} = ∪(T_{j+1}, T_{j+1}^{ON});
(26)      EndFor
(27)  EndFor
(28)  For k = 1 to n
(29)      If (Detect(Tk) == “yes”)
(30)      then Begin
(31)          Read(Tk) and terminate the algorithm;
(32)      End
(33)  EndFor
(34)  EndAlgorithm

Lemma 5-2: Algorithm 5-2 can be used to solve the hitting-set problem for an n-element set S and a collection C of subsets.

Proof: The solution space of 2^n states of the n-bit pattern is generated by the execution of steps (1) through (9). After those operations are performed, the 2^n combinations of the n-bit sequence are contained in tube T. Step (10) is the outer loop, which runs once for each subset in C, and step (11) is the inner loop, which runs once for each element of the current subset in C.
Each time the outer loop (step (10)) is executed, the number of iterations of the inner loop equals the number of elements of the ath subset in C. Steps (14) and (15) extract the subsets whose zi is 1 and put them into test tube Tb, and leave the subsets whose zi is 0 in test tube T. When the inner loop has ended, we discard T and merge all tubes Tb into T. The outer loop is repeated in the same way. When all passes of the outer loop have ended, the hitting sets are in test tube T. As in Algorithm 5-1, steps (21) through (27) determine the number of ones of the hitting sets in T0. Next, step (28) is the last loop and is used to find the final answer. If the kth execution of step (29) returns a “yes”, then step (31) reads the answer and Algorithm 5-2 terminates. Otherwise, step (28) through step (33) are repeated until the answer is found.

5.5 Simple Example of the Hitting-set Problem

The example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 5.1 is used here again to show the power of Algorithm 5-2. During the first execution of steps (14) and (15), where a = 1 and b = 1, we put the subsets whose rightmost encoded bit is 1 into T1 and those whose rightmost encoded bit is 0 into T. Therefore, we get T1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. Next, from the second execution of steps (14) and (15), where a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. Then, from the third execution of steps (14) and (15), where a = 1 and b = 3, we obtain T3 = {0100, 1100} and T = {0000, 1000}. Since the first pass of the outer loop has ended, the first execution of step (18) discards test tube T and the first execution of step (19) merges test tubes T1, T2, and T3 into T. We obtain T = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.
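The filtering performed by steps (10) through (20) can be imitated in software as follows. This is a minimal sketch with lists standing in for test tubes; it assumes, as in Table 5-1, that element i is encoded by the ith bit counted from the right. The names hitting_filter and decode are hypothetical.

```python
from itertools import product


def decode(bits):
    """Turn an encoded sequence back into a subset; element i is bit i from the right."""
    return {i for i in range(1, len(bits) + 1) if bits[-i] == "1"}


def hitting_filter(n, collection):
    """Software analogy of steps (1)-(20): keep the strings that hit every subset."""
    tube = ["".join(b) for b in product("01", repeat=n)]   # steps (1)-(9)
    for subset in collection:                              # step (10)
        collected = []
        for element in subset:                             # step (11)
            hit = [s for s in tube if s[-element] == "1"]      # step (14)
            tube = [s for s in tube if s[-element] == "0"]     # step (15)
            collected.append(hit)
        # step (18): strings still in `tube` miss this subset and are discarded
        tube = [s for part in collected for s in part]     # step (19): merge
    return tube


first_pass = hitting_filter(4, [[1, 2, 3]])
```

Here first_pass reproduces the 14 strings obtained above after the first pass of the outer loop; only 0000 and 1000, which contain none of the elements 1, 2, and 3, are dropped.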
For the second pass of the outer loop, with a = 2 and b = 1, the fourth execution of steps (14) and (15) gives T1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T = {0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one element, the second pass of the outer loop ends here as well. The second execution of step (18) discards T and the second execution of step (19) merges T1 into T. This gives T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.

Having obtained the hitting sets, we go on to find the minimum number of ones in T, i.e. the smallest k among the hitting sets, which identifies the small hitting-set. From the execution of step (21), we obtain T0 = {1001, 1011, 1101, 1111, 1010, 1110, 1100}. Next, from the first execution of steps (24) and (25), when k = 0 and j = 0, we have T1ON = {1001, 1011, 1101, 1111}, T0 = {1010, 1110, 1100}, and T1 = {1001, 1011, 1101, 1111}. From the second execution of steps (24) and (25), when k = 1 and j = 1, we get T2ON = {1011, 1111}, T1 = {1001, 1101}, and T2 = {1011, 1111}. Then, from the third execution of steps (24) and (25), when k = 1 and j = 0, we obtain T1ON = {1010, 1110}, T0 = {1100}, and T1 = {1001, 1101, 1010, 1110}. From the fourth execution of steps (24) and (25), when k = 2 and j = 2, we get T3ON = {1111}, T2 = {1011}, and T3 = {1111}. Later, from the fifth execution of steps (24) and (25), when k = 2 and j = 1, we obtain T2ON = {1101, 1110}, T1 = {1001, 1010}, and T2 = {1011, 1101, 1110}. From the sixth execution of these two steps, when k = 2 and j = 0, we get T1ON = {1100}, T0 = ∅, and T1 = {1001, 1010, 1100}. Thereafter, from the seventh execution of steps (24) and (25), when k = 3 and j = 3, we have T4ON = {1111}, T3 = ∅, and T4 = {1111}, while from the eighth execution, when k = 3 and j = 2, we obtain T3ON = {1011, 1101, 1110}, T2 = ∅, and T3 = {1011, 1101, 1110}.
Then, from the ninth execution of steps (24) and (25), when k = 3 and j = 1, we get T2ON = {1001, 1010, 1100}, T1 = −(T1, z_4^1) = ∅, and T2 = {1001, 1010, 1100}. Finally, from the tenth execution of steps (24) and (25), when k = 3 and j = 0, we obtain T1ON = ∅, T0 = ∅, and T1 = ∅. After those operations are finished, we have T0 = ∅, T1 = ∅, T2 = {1001, 1010, 1100}, T3 = {1011, 1101, 1110}, and T4 = {1111}. Since T1 is an empty tube and T2 is not empty, a “yes” is returned from the second execution of step (29). As a result, the answer from the first execution of step (31) is {1001, 1010, 1100}, which is the small hitting-set with k = 2.

5.6 Complex Example of the Hitting-set Problem

Consider a more complex example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {2, 3, 4}}. The hitting sets are {{2}, {3}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}, and the small hitting-sets are {{2}, {3}} with k = 1. In the first execution of steps (14) and (15) of Algorithm 5-2, when a = 1 and b = 1, we put the subsets whose rightmost encoded bit is 1 into T1 and those whose rightmost encoded bit is 0 into T, so we get T1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. Next, from the second execution of these two steps, when a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. Then, from the third execution, when a = 1 and b = 3, we obtain T3 = {0100, 1100} and T = {0000, 1000}. Because the first pass of the outer loop has ended, the first execution of step (18) discards test tube T and the first execution of step (19) merges test tubes T1, T2, and T3 into T. We get T = {0001, 0011, 0101, 0111, 1001, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.
For the second pass of the outer loop, with a = 2 and b = 1, the fourth execution of steps (14) and (15) puts the subsets whose second rightmost encoding bit is 1 into T1 and those whose second rightmost encoding bit is 0 into T, so we get T1 = {0011, 0111, 1011, 1111, 0010, 0110, 1010, 1110} and T = {0001, 0101, 1001, 1101, 0100, 1100}. Then, from the fifth execution of these two steps of Algorithm 5-2, when a = 2 and b = 2, we have T2 = {0101, 1101, 0100, 1100} and T = {0001, 1001}, while after the sixth execution of steps (14) and (15), when a = 2 and b = 3, we have T3 = {1001} and T = {0001}. The second execution of step (18) discards T and the second execution of step (19) merges T1, T2, and T3 into T. This implies that T = {0011, 0111, 1011, 1111, 0010, 0110, 1010, 1110, 0101, 1101, 0100, 1100, 1001} contains the hitting sets.

Having located the hitting sets, we go on to find the minimum number of ones in T, i.e. the smallest k among the hitting sets, which identifies the small hitting-set. In Algorithm 5-2, from the execution of step (21), we obtain T0 = {0011, 0111, 1011, 1111, 0010, 0110, 1010, 1110, 0101, 1101, 0100, 1100, 1001}. Then, from the first execution of steps (24) and (25), when k = 0 and j = 0, we get T1ON = {0011, 0111, 1011, 1111, 0101, 1101, 1001}, T0 = {0010, 0110, 1010, 1110, 0100, 1100}, and T1 = {0011, 0111, 1011, 1111, 0101, 1101, 1001}, while from the second execution of steps (24) and (25), when k = 1 and j = 1, we get T2ON = {0011, 0111, 1011, 1111}, T1 = {0101, 1101, 1001}, and T2 = {0011, 0111, 1011, 1111}. From the third execution of steps (24) and (25), when k = 1 and j = 0, we obtain T1ON = {0010, 0110, 1010, 1110}, T0 = {0100, 1100}, and T1 = {0101, 1101, 1001, 0010, 0110, 1010, 1110}. From the fourth execution of steps (24) and (25), when k = 2 and j = 2, we get T3ON = {0111, 1111}, T2 = {0011, 1011}, and T3 = {0111, 1111}.
Then, from the fifth execution of steps (24) and (25), when k = 2 and j = 1, we get T2ON = {0101, 1101, 0110, 1110}, T1 = {1001, 0010, 1010}, and T2 = {0011, 1011, 0101, 1101, 0110, 1110}, while from the sixth execution, when k = 2 and j = 0, we have T1ON = {0100, 1100}, T0 = ∅, and T1 = {1001, 0010, 1010, 0100, 1100}. Next, from the seventh execution of steps (24) and (25), when k = 3 and j = 3, we get T4ON = {1111}, T3 = {0111}, and T4 = {1111}, and from the eighth execution of steps (24) and (25), when k = 3 and j = 2, we get T3ON = {1011, 1101, 1110}, T2 = {0011, 0101, 0110}, and T3 = {0111, 1011, 1101, 1110}. From the ninth execution of steps (24) and (25), when k = 3 and j = 1, we get T2ON = {1001, 1010, 1100}, T1 = {0010, 0100}, and T2 = {0011, 0101, 0110, 1001, 1010, 1100}. Next, from the tenth execution of steps (24) and (25), when k = 3 and j = 0, we get T1ON = ∅, T0 = ∅, and T1 = {0010, 0100}. After those operations are finished, T0 = ∅, T1 = {0010, 0100}, T2 = {0011, 0101, 0110, 1001, 1010, 1100}, T3 = {0111, 1011, 1101, 1110}, and T4 = {1111}. Since T1 is not an empty tube, a “yes” is returned from the first execution of step (29). Therefore, the answer from the first execution of step (31) is {0010, 0100}, which is the small hitting-set with k = 1.

5.7 The Complexity Analysis of Algorithm 5-2

The following theorems describe the time complexity and volume complexity of Algorithm 5-2, as well as the number of test tubes used and the longest library strand in the solution space.

Theorem 5-1: Assume that a ground set S with n elements, a collection C = {C1, C2, C3, …, Cp} of subsets, where each Ci is a subset of S, and a positive integer K ≤ |S| are given. Define q = |C1| + |C2| + … + |Cp|. The hitting-set problem for S and C can be solved with O(2q + n^2 + n) “extract” operations, O(p) “discard” operations, O(2n) “append” operations, O(1 + p + (n^2 + 3n)/2) “merge” operations, and O(n − 1) “amplify” operations in the Adleman-Lipton model.
Proof: Algorithm 5-2 can be applied to solve the hitting-set problem for S and C. From steps (1)-(20), it is clear that we use 2q "extract" operations, p "discard" operations, 2n "append" operations, (n + p) "merge" operations, and (n - 1) "amplify" operations. From step (21) to step (34) of Algorithm 5-2, we use 2 * (1 + 2 + ... + (n - 1) + n) = 2 * ((1 + n) * n)/2 = n^2 + n "extract" operations, no "discard" operations, no "append" operations, 1 + (1 + 2 + ... + (n - 1) + n) = 1 + ((1 + n) * n)/2 = 1 + (n^2 + n)/2 "merge" operations, and no "amplify" operations. Adding the two parts, the time complexity of Algorithm 5-2 is O(2q + n^2 + n) "extract" operations, O(p) "discard" operations, O(2n) "append" operations, O(1 + p + (n^2 + 3n)/2) "merge" operations, and O(n - 1) "amplify" operations in the Adleman-Lipton model.

Theorem 5-2: A ground set S with n elements and a collection C = {C1, C2, C3, ..., Cp} of subsets are given, where each Ci is a subset of S, together with a positive integer K ≤ |S|. The hitting-set problem for S and C can be solved with O(2^n) library strands in the Adleman-Lipton model.

Proof: (Refer to Lemma 5-2 and Theorem 5-1.)

Theorem 5-3: A ground set S with n elements and a collection C = {C1, C2, C3, ..., Cp} of subsets are given, where each Ci is a subset of S, together with a positive integer K ≤ |S|. The hitting-set problem for S and C can be solved with O(n) tubes in the Adleman-Lipton model.

Proof: (Refer to Lemma 5-2 and Theorem 5-1.)

Theorem 5-4: A ground set S with n elements and a collection C = {C1, C2, C3, ..., Cp} of subsets are given, where each Ci is a subset of S, together with a positive integer K ≤ |S|. The hitting-set problem for S and C can be solved with the longest library strand being O(15*n + 15*p) in the Adleman-Lipton model.

Proof: (Refer to Lemma 5-2 and Theorem 5-1.)

Chapter 6 Simulated Experimental Results

Adleman and his coworkers devised a scheme to design DNA sequences for a combinatorial library encoding strings of zeros and ones [2,5].
In this scheme a particular N-bit number is represented by a DNA sequence that is (N * K) bases long and is divided logically into N sequential K-base long blocks. Each block bears one of two unique DNA sequences, one that represents a '1' and the other a '0'. Importantly, the sequence that encodes '0' in the first block is different from the sequence that encodes '0' in the second block and in all of the other blocks. Thus there are 2N different short DNA sequences that are used to create any of the 2^N possible library strands.

DNA sequence design is a very important issue because DNA-based computing relies on biochemical operations, and these operations can cause errors if the sequences are not properly designed. Adleman and his coworkers introduce seven constraints to ease probe-library hybridization by reducing secondary structure in the DNA molecules [5]. The constraints are:

(1) Library strands contain only As, Ts, and Cs.
(2) Every library and probe sequence has no runs of more than 4 As, 4 Ts, 4 Cs, or 4 Gs.
(3) Every probe sequence has at least 4 mismatches with any 15-base alignment of any library strand (except at its matching bit-value).
(4) Every 15-base section of a library strand has at least 4 mismatches with any 15-base alignment of itself or any other library strand.
(5) No 15-base probe has a run of more than 7 matches with any 8-base alignment of any library strand (except at its matching bit-value).
(6) No library strand has a run of more than 7 matches with any 8-base alignment of itself or any other library strand.
(7) Every probe has 4, 5, or 6 Gs in its sequence.

By constraint (1), library strands contain only As, Ts, and Cs; such strands have less secondary structure than strands containing equal numbers of As, Ts, Cs, and Gs, and so have more opportunity for binding probes.
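The constraints that concern a single block, namely (1), (2), and (7), can be checked mechanically. The sketch below is an illustration only and is not Adleman's program: it assumes that the probe for a block is its reverse complement, the names probe_for and check_block are hypothetical, and the pairwise constraints (3) to (6) are not covered. The eight 15-base sequences are the ones reported in Table 6-1.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def probe_for(block):
    """Reverse complement of a library block (assumed here to be its probe)."""
    return block.translate(COMPLEMENT)[::-1]

def check_block(block):
    """Check constraints (1), (2) and (7) for one library block and its probe."""
    probe = probe_for(block)
    # Constraint (2): a run of 5 or more identical bases is not allowed.
    long_run = any(
        re.search(base + "{5,}", seq) for seq in (block, probe) for base in "ACGT"
    )
    return {
        "c1_library_has_no_G": "G" not in block,
        "c2_no_run_longer_than_4": not long_run,
        "c7_probe_has_4_to_6_G": 4 <= probe.count("G") <= 6,
    }

# Sequences for x1^1, x1^0, ..., x4^1, x4^0 as listed in Table 6-1.
TABLE_6_1 = [
    "CATTCACAAACAATT", "TCATTCTCAACAAAA",
    "CTCTATTCCTCTCAA", "ACACCCTCTAATCTA",
    "TCTCCCTATCTATTT", "TCCTATTTAACTCCC",
    "CTCTACTCAAAATAA", "TATAACTTTCTCTCT",
]

for block in TABLE_6_1:
    print(block, check_block(block))
```

Because a probe built this way contains one G for every C in the library block, constraint (7) is equivalent to requiring 4 to 6 Cs in each block, which holds for all eight sequences.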
By constraint (2), long homopolymer tracts are avoided: such tracts may have unusual secondary structure that inhibits the binding of probes to library strands, and the melting temperatures of probe-library hybridization are more similar when long homopolymer tracts are absent. Constraints (3) and (5) are intended to ensure that probes bind only weakly where they are not intended to bind. Constraints (4) and (6) are intended to ensure that library strands have a low affinity for themselves. Constraint (7) is intended to ensure that intended probe-library pairings have uniform melting temperatures.

We ran Adleman's program [5] on a machine with an AMD Athlon XP CPU and 1 GB of main memory. The operating system was Windows XP and the compiler was Visual C++ 6.0. The program was used to generate DNA sequences for solving the One-In-Three (1IN3) 3-SAT problem, constructing a 15-base DNA sequence for every bit value of the library. For each bit, the program generates two random 15-base sequences (for '1' and '0') and checks whether the library strands still satisfy the seven constraints with the new DNA sequences added [5]. If the constraints are satisfied, the new DNA sequences are 'greedily' accepted. If the constraints are not satisfied, mutations are introduced one by one into the new block until either (A) the constraints are satisfied and the new DNA sequences are accepted, or (B) a threshold for the number of mutations is exceeded, in which case the program has failed and exits, printing the sequences found so far. If sequences satisfying the constraints are found for all bits, the program has succeeded and it outputs these sequences.

6.1 Simulation of Experimental Results of One-In-Three 3-SAT problem

Consider the example V = ( x1, x2, x3, x4 ) and f ( x1, x2, x3, x4 ) = C = ( x1 x 2 x3 ) ( x1 x3 x 4 ) in subsection 4.1 for solving the One-In-Three (1IN3) 3-SAT problem; the simulation of the Not-All-Equal (NAE) 3-SAT problem is similar.
DNA sequences generated by Adleman's program are shown in Table 6-1. Adleman's program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand. These energies are shown in Table 6-2.

Table 6-1: Sequences chosen to represent xk^1 and xk^0 in the example for V = ( x1, x2, x3, x4 ) and f ( x1, x2, x3, x4 ) = C = ( x1 x 2 x3 ) ( x1 x3 x 4 ) in subsection 4.1.

Bit     5' -> 3' DNA Sequence
x1^1    CATTCACAAACAATT
x1^0    TCATTCTCAACAAAA
x2^1    CTCTATTCCTCTCAA
x2^0    ACACCCTCTAATCTA
x3^1    TCTCCCTATCTATTT
x3^0    TCCTATTTAACTCCC
x4^1    CTCTACTCAAAATAA
x4^0    TATAACTTTCTCTCT

Table 6-2: The energy for binding each probe to its corresponding region on a library strand.

Bit     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
x1^1    109.6                 285.7                24.4
x1^0    104.5                 267.5                24.3
x2^1    114.2                 295.4                26
x2^0    102.6                 261.2                24.4
x3^1    103.9                 273.3                22.1
x3^0    103.2                 265.5                24
x4^1    102.4                 270.7                21.5
x4^0    105.1                 271.8                23.9

The program also computed the average and standard deviation of the enthalpy, entropy, and free energy over all probe/library strand interactions. These values are shown in Table 6-3.

Table 6-3: The energy over all probe/library strand interactions.

                      Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
Average               105.688               273.887              23.825
Standard deviation    3.86073               10.5438              1.3245

The library strands are shown in Table 6-4 and represent every truth assignment under which each clause has exactly one true literal.

Table 6-4: DNA sequences chosen to represent answers in test tube T.
{0011}  5'- TCATTCTCAACAAAA ACACCCTCTAATCTA TCTCCCTATCTATTT CTCTACTCAAAATAA -3'
        3'- AGTAAGAGTTGTTTT TGTGGGAGATTAGAT AGAGGGATAGATAAA GAGATGAGTTTTATT -5'
{0100}  5'- TCATTCTCAACAAAA CTCTATTCCTCTCAA TCCTATTTAACTCCC TATAACTTTCTCTCT -3'
        3'- AGTAAGAGTTGTTTT GAGATAAGGAGAGTT AGGATAAATTGAGGG ATATTGAAAGAGAGA -5'
{1110}  5'- CATTCACAAACAATT CTCTATTCCTCTCAA TCTCCCTATCTATTT TATAACTTTCTCTCT -3'
        3'- GTAAGTGTTTGTTAA GAGATAAGGAGAGTT AGAGGGATAGATAAA ATATTGAAAGAGAGA -5'

6.2 Simulation of Experimental Results of Hitting-set problem

Consider the example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} mentioned in subsection 5.1. The DNA sequences generated by Adleman's program are shown in Table 6-5. Adleman's program is also used to calculate the enthalpy, entropy, and free energy for the binding of each probe to its corresponding region on a library strand; these energies are shown in Table 6-6.

Table 6-5: Sequences chosen to represent zk^1 and zk^0 in the example for S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} in subsection 5.1.

Bit     5' -> 3' DNA Sequence
z1^1    CATTCACAAACAATT
z1^0    TCATTCTCAACAAAA
z2^1    CTCTATTCCTCTCAA
z2^0    ACACCCTCTAATCTA
z3^1    TCTCCCTATCTATTT
z3^0    TCCTATTTAACTCCC
z4^1    CTCTACTCAAAATAA
z4^0    TATAACTTTCTCTCT

Table 6-6: The energies for binding each probe to its corresponding region on a library strand.

Bit     Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
z1^1    109.6                 285.7                24.4
z1^0    104.5                 267.5                24.3
z2^1    114.2                 295.4                26
z2^0    102.6                 261.2                24.4
z3^1    103.9                 273.3                22.1
z3^0    103.2                 265.5                24
z4^1    102.4                 270.7                21.5
z4^0    105.1                 271.8                23.9

The program also computed the average and standard deviation of the enthalpy, entropy, and free energy over all probe/library strand interactions. These values are shown in Table 6-7. Table 6-8 presents the library strands for the hitting-sets with k = 2: {{1, 4}, {2, 4}, {3, 4}}.

Table 6-7: The energies over all probe/library strand interactions.
                      Enthalpy energy (H)   Entropy energy (S)   Free energy (G)
Average               105.688               273.887              23.825
Standard deviation    3.86073               10.5438              1.3245

Table 6-8: DNA sequences chosen to represent the hitting-set with k = 2 in tube T0.

{1,4}  5'- CTCTACTCAAAATAA TCCTATTTAACTCCC ACACCCTCTAATCTA CATTCACAAACAATT -3'
       3'- GAGATGAGTTTTATT AGGATAAATTGAGGG TGTGGGAGATTAGAT GTAAGTGTTTGTTAA -5'
{2,4}  5'- CTCTACTCAAAATAA TCCTATTTAACTCCC CTCTATTCCTCTCAA TCATTCTCAACAAAA -3'
       3'- GAGATGAGTTTTATT AGGATAAATTGAGGG GAGATAAGGAGAGTT AGTAAGAGTTGTTTT -5'
{3,4}  5'- CTCTACTCAAAATAA TCTCCCTATCTATTT ACACCCTCTAATCTA TCATTCTCAACAAAA -3'
       3'- GAGATGAGTTTTATT AGAGGGATAGATAAA TGTGGGAGATTAGAT AGTAAGAGTTGTTTT -5'

Chapter 7 Discussions and Conclusions

Lipton was the first to use DNA to solve the SAT problem, although his paper was still preliminary [22]. Adleman and his co-authors (Braich et al.) chose to solve a 6-variable, 11-clause instance of the 3-SAT problem [5]. That paper emphasizes the laboratory experiment and its design in order to test biochemical feasibility. Amos then presented a DNA-based algorithm to solve the 3-SAT problem [4]. Building on Amos' algorithm, I have studied two variants of the 3-SAT problem in more depth, namely the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems. My DNA-based algorithms for the mathematical model of these two problems are complete and clear.

The hitting-set problem is an NP-complete problem, and solving it takes exponential time on a traditional digital computer, while in DNA-based supercomputing only a polynomial number of biological operations is needed (see Section 5.7 for the complexity analysis). The main contribution of this part of the dissertation is that it is the first work to solve the hitting-set problem by applying a DNA-based algorithm.

In this dissertation, we propose DNA-based algorithms (Algorithm 3-1, Algorithm 4-1, and Algorithm 5-2) to solve the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems and the hitting-set problem.
These algorithms are based on biological operations in the Adleman-Lipton model and therefore inherit several advantages from it. First, the proposed algorithms have a lower rate of hybridization errors because we use Adleman's program to generate good DNA sequences for constructing the solution space of the Not-All-Equal (NAE) and One-In-Three (1IN3) 3-SAT problems and the hitting-set problem. Only simple and fast biological operations in the Adleman-Lipton model are employed to solve these three problems. Second, the basic biological operations in the Adleman-Lipton model have been performed in a fully automated manner in their lab. Full automation is essential not only for speeding up computation but also for error-free computation.

Nowadays, researchers are trying to use DNA-based algorithms to solve many NP-complete problems that cannot be solved efficiently by a traditional digital computer. Even so, it is still very difficult to support biological operations using mathematical instructions. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing will become a reality someday.

References

[1] L. M. Adleman, "Molecular Computation of Solutions to Combinatorial Problems", Science, 266, pp. 1021-1024, Nov. 11, 1994.
[2] L. M. Adleman, "On constructing a molecular computer", in DNA-based computers, volume 27 of DIMACS
[3] L. M. Adleman, "Computing with DNA", Scientific American, August, 1998.
[4] M. Amos, Theoretical and Experimental DNA Computation. Springer, 2005.
[5] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, and L. M. Adleman, "Solution of a satisfiability problem on a gel-based DNA computer", in Proceedings of the Sixth International Conference on DNA Computation (DNA 2000), Lecture Notes in Computer Science 2054, pp. 27-42, 2001.
[6] R. S. Braich, C. Johnson, P. W. K. Rothemund, N. Chelyapov, and L. M. Adleman, 2002.
Solution of a 20-variable 3-SAT problem on a DNA computer. Science, vol. 296, no. 5567, 499-502.
[7] R. Brijder, M. Cavaliere, A. Riscos-Núñez, G. Rozenberg, and D. Sburlan 2008. Membrane systems with proteins embedded in membranes. Theoretical Computer Science, 404, 26-39.
[8] J. Bishop and E. Klavins 2007. An improved autonomous DNA nanomotor. Nano Letters, Sep., vol. 7, no. 9, 2574-2577.
[9] W. L. Chang and M. Guo, "Solving the clique problem and the vertex cover problem in Adleman-Lipton's model", in Proceedings of IASTED International Conference, Networks, Parallel and Distributed Processing, and Applications, pp. 431-436, 2002.
[10] W. L. Chang, M. Ho, and M. Guo, "Molecular Solutions for the Subset-sum Problem on DNA-based Supercomputing", BioSystems (Elsevier Science), vol. 73, no. 2, 2004, pp. 117-130.
[11] W. L. Chang, M. Guo, and M. Ho, "Towards solution of the set-splitting problem on gel-based DNA computing", Future Generation Computer Systems, vol. 20, no. 5, June 15, 2004, pp. 875-885.
[12] W. L. Chang, M. Guo, and J. Cao, "Using Sticker to Solve the 3-Dimensional Matching Problem in Molecular Supercomputers", International Journal of High Performance Computing and Networking, 2004, vol. 1, no. 1/2/3, pp. 128-139.
[13] W. L. Chang, M. Guo, and J. Wu, "Solving the Independent-set Problem in a DNA-based Super Computer Model", Parallel Processing Letters, vol. 15, no. 4 (2005), 469-479.
[14] W. L. Chang, M. Ho, M. Guo, and C. Liu, "Fast Parallel Bio-molecular Solutions: the Set-basis Problem", International Journal of Computational Science and Engineering, vol. 2, no. 1-2, 2006, pp. 72-80.
[15] W. L. Chang, "Fast Parallel DNA-based Algorithms for Molecular Computation: the Set-Partition Problem", IEEE Transactions on Nanobioscience, vol. 6, no. 1, 2007, pp. 346-353.
[16] H. Chen, A. Goel, and C. Luhrs 2008. Dimension augmentation and combinatorial criteria for efficient error-resistant DNA self-assembly.
ACM-SIAM Symposium on Discrete Algorithms (SODA), 409-418.
[17] M. Cook, P. W. K. Rothemund, and E. Winfree 2004. Self-assembled circuit patterns. DNA Computers 9, LNCS vol. 2943, 91-107.
[18] R. P. Feynman, "In Miniaturization", D. H. Gilbert, Ed., Reinhold Publishing Corporation, New York, 1961, pp. 282-296.
[19] M. R. Garey and D. S. Johnson (1979), "Computers and Intractability: A Guide to the Theory of NP-Completeness", San Francisco, CA.
[20] R. P. Goodman, I. A. T. Schaap, C. F. Tardin, C. M. Erben, R. M. Berry, C. F. Schmidt, and A. J. Turberfield 2005. Rapid chiral assembly of rigid DNA building blocks for molecular nanofabrication. Science 310, 1661-1665.
[21] S. Y. Hsieh, C. W. Huang, and H. H. Chou, "A DNA-based graph encoding scheme with its applications to graph isomorphism problems", Applied Mathematics and Computation, vol. 203, no. 2, 15 September 2008, pp. 502-512.
[22] R. J. Lipton, "DNA Solution of Hard Computational Problems", Science, 268, pp. 542-545, 1995.
[23] U. Majumder, J. H. Reif, and S. Sahu 2008. Stochastic analysis of reversible self-assembly. Journal of Computational and Theoretical Nanoscience, vol. 5, no. 7, 1289-1305.
[24] P. O'Neill, P. W. K. Rothemund, A. Kumar, and D. K. Fygenson 2006. Sturdier DNA Nanotubes via Ligation. Nano Letters, 6:1379-1381.
[25] S. Roweis, E. Winfree, R. Burgoyne, N. V. Chelyapov, M. F. Goodman, P. W. K. Rothemund, and L. M. Adleman, 1999. A sticker based model for DNA computation. In: Landweber, L., Baum, E. (Eds.), Second Annual Workshop on DNA Computing, Princeton University. DIMACS: Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, pp. 1-29.
[26] K. Suzuki and S. Murata 2007. Design of DNA spike oscillator. Unconventional Computing, 163-175.
[27] R. Yashin, R. Rudchenko, and M. N. Stojanovic 2007. Networking particles over distance using oligonucleotide-based devices. Journal of the American Chemical Society, 129 (50), 15581-15584.
[28] P. Yin, H. M. T. Choi, C.
R. Calvert, and N. A. Pierce 2008. Programming biomolecular self-assembly pathways. Nature, 2008, 451: 318-322.
[29] D. Y. Zhang and E. Winfree. Dynamic allosteric control of noncovalent DNA catalysis reactions. J. Am. Chem. Soc., 130 (42), 13921-13926.
[30] L. Kari, "From micro-soft to bio-soft: Computing with DNA", Biocomputing and emergent computation: Proceedings of BCEC97, World Scientific 1997, Skovde, Sweden, 1997, pp. 146-164.
[31] J. Watada and R. B. A. Bakar, "DNA Computing and Its Applications", Eighth International Conference on Intelligent Systems Design and Applications, pp. 288-294.
[32] S. Zhou, Q. Zhang, J. Zhao, and J. Li, Optimization of DNA Encodings Based on Free Energy, ICIC Express Letters, vol. 1, no. 1, pp. 33-37, 2007.
[33] J. Li, Q. Zhang, R. Li, and S. Zhou, Optimization of DNA Encoding Based on Combinatorial Constraints, ICIC Express Letters, vol. 2, no. 1, pp. 81-88, 2008.
[34] Rohani Binti Abu Bakar, Junzo Watada, and Witold Pedrycz, A Proximity Approach to DNA Based Clustering Analysis, International Journal of Innovative Computing, Information and Control, vol. 4, no. 5, pp. 1203-1212, 2008.
[35] Tadahiro Kin, Ken-ichi Makino, Nobuo Noda, Kazuharu Koide, and Masahiro Nakano, The Molecular Dynamics Calculation of Clathrate Hydrate Structure Stability for Innovative Organ Preservation Method, International Journal of Innovative Computing, Information and Control, vol. 4, no. 2, pp. 249-254, 2008.
[36] D. Rooss, "Recent Developments in DNA-Computing", Proceedings of the International Symposium on Multiple-Valued Logic, 1997, pp. 3-9.

Vita

Nung-Yue Shi (施能裕) received the B.S. degree from the Department of Computer Science, Feng-Chia University, Taichung, Taiwan. He then studied abroad in Brooklyn, New York City, United States, and received the Master of Science degree from Polytechnic University in New York City. Since 1987 he has been an instructor at Southern Taiwan University in Taiwan, and since 2001 he has been working towards the Ph.D.
degree and is currently a doctoral candidate in the Department of Computer Science and Information Engineering at National Cheng Kung University, Taiwan. His research interests include DNA Computing, Quantum Computing, and Network Topology and Network Analysis.

Publications

Journal Papers

1. Nung-Yue Shi and Chih-Ping Chu, "A molecular solution to the hitting-set problem in DNA-based supercomputing", Information Sciences, Volume 180, Issue 6, 15 March 2010, Pages 1010-1019, Special Issue on Modelling Uncertainty.
2. Nung-Yue Shi and Chih-Ping Chu, "A Molecular Algorithmic Solution for the Not-All-Equal and One-In-Three 3-SAT Problems in DNA-based Supercomputing", accepted by International Journal of Innovative Computing, Information and Control.

International Conference Papers

1. Nung-Yue Shi and Chih-Ping Chu, "Fast Parallel Molecular Solution To the Hitting-set Problem", Eighth International Conference on Intelligent Systems Design and Applications, Vol. 3, pp. 442-447, ISDA 2008.
2. Nung-Yue Shi and Chih-Ping Chu, "A DNA-based Algorithm for the Solution of One-In-Three 3-SAT Problem", 2009 WASE International Conference on Information Engineering, Volume I, pp. 620-625 (Received Best Paper Award).
3. Nung-Yue Shi and Chih-Ping Chu, "A DNA-based Algorithm for the Solution of Not-All-Equal 3-SAT Problem", 2009 WASE International Conference on Information Engineering, Volume II, pp. 94-99 (Received Best Paper Award).

Honors

1. The paper "A DNA-based Algorithm for the Solution of One-In-Three 3-SAT Problem" received the Best Paper Award at the 2009 WASE International Conference on Information Engineering.
2. The paper "A DNA-based Algorithm for the Solution of Not-All-Equal 3-SAT Problem" received the Best Paper Award at the 2009 WASE International Conference on Information Engineering.