Fast parallel molecular solution to the Hitting-set problem



Speaker: Nung-Yue Shi

Abstract: The hitting-set problem is defined by Garey and Johnson (1979) in the book "Computers and Intractability: A Guide to the Theory of NP-Completeness". Assume that we have a collection C of subsets of a finite set S and a positive integer K ≤ |S|. We would like to know whether there is a subset s' ⊆ S with |s'| ≤ K such that s' hits (contains) at least one element from each subset in C.

1. Introduction

DNA (deoxyribonucleic acid) is the main genetic material of the cell nucleus and determines the inheritance of human beings. DNA is made up of a linear chain of smaller units called nucleotides. Each nucleotide contains three major components: deoxyribose, a phosphate group, and a base.

Nucleotides are distinguished by their base, which can be adenine (abbreviated as A), guanine (G), cytosine (C), or thymine (T). Two strands of DNA can form a double helix if the respective bases are the famous Watson-Crick complements: C matches G and A matches T.

The two strands are antiparallel: the 3' end of one strand (the end at the 3rd carbon of the deoxyribose) lies opposite the 5' end of the other strand (the end at the 5th carbon, which carries a phosphate group).
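The pairing rule and the antiparallel orientation can be made concrete with a minimal Python sketch, which is not taken from the paper. It computes the strand that would anneal to a given strand. It assumes both strands are written in the 5'-to-3' direction, and the helper name `complement_strand` is hypothetical.

```python
# Watson-Crick pairing rules from the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the strand that anneals to `strand`, written 5'->3'.

    Assumption: `strand` is also written 5'->3'. Because the two strands
    are antiparallel, the partner is the base-by-base complement read in
    reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    print(complement_strand("ATGC"))  # prints GCAT
```

Applying the function twice returns the original strand, which mirrors the fact that the two strands of a double helix are each other's partners.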

DNA-based computing [1, 2, 3, 4, 5, 6, 8] treats DNA strands as the bits of traditional digital computers. It uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate DNA strands [11].

We can build a molecular computer with the following tools:

1. Watson-Crick complements. Two strands of DNA anneal to form the famous double helix if each base meets its Watson-Crick complement: C matches G and A matches T.

2. Ligases. Ligases bond split DNA molecules together. For example, DNA ligase takes two strands of DNA and covalently connects them into a single strand.

3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease can look for a predetermined sequence of bases in a DNA strand and, if it finds one, cut the strand into two pieces.

4. Polymerases. Polymerases copy information from one DNA molecule into another.

5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel. This process separates DNA strands by length.

6. DNA synthesis. Nowadays, we can ask a commercial DNA synthesis facility to make a DNA sequence. Within a few days, we receive a test tube containing about 10^18 molecules of DNA with the requested sequence.

The above six techniques are the basis of the Adleman-Lipton DNA computing model. Based on this model, Adleman developed eight bio-molecular instructions for performing bio-molecular programs. These instructions are described in the next section.

2. Background

We can perform the following operations [2-10]:

1. Append-tail. Given a tube T and a binary digit x_j, the operation "Append-tail" appends x_j onto the end of every data item stored in tube T. The formal representation for the operation is written as "Append-tail(T, x_j)".

2. Amplify. Given a tube T, the operation "Amplify(T, T1, T2)" produces two new tubes T1 and T2 that are exact copies of T (so T1 and T2 are identical), and T becomes an empty tube.

3. Merge. Given n tubes T1, …, Tn, the merge operation pours the data stored in the n tubes into one tube, without any change in the individual data. The formal representation for the merge operation is written as "∪(T1, …, Tn)", where ∪(T1, …, Tn) = T1 ∪ … ∪ Tn.

4. Extract. Given a tube T and a binary digit x_k, the extract operation produces two tubes, +(T, x_k) and −(T, x_k). Here +(T, x_k) is all of the data in T that contain x_k, and −(T, x_k) is all of the data in T that do not contain x_k.

5. Detect. Given a tube T, the detect operation checks whether any data is included in T. If at least one data item is included in T, the answer is "yes"; if no data is included in T, the answer is "no". The formal representation for the operation is written as "Detect(T)".

6. Discard. Given a tube T, the discard operation discards T. The formal representation for the operation is written as "Discard(T)".

7. Read. Given a tube T, the read operation describes one data item contained in T. Even if T contains many different data items, the operation gives an explicit description of exactly one of them. The formal representation for the operation is written as "Read(T)".

8. Append-head. Given a tube T and a binary digit x_j, the operation "Append-head" appends x_j onto the head of every data item stored in tube T. The formal representation for the operation is written as "Append-head(T, x_j)".
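The eight operations can be mimicked in software to check an algorithm before running it in a laboratory. The following is a minimal Python sketch under stated assumptions, not part of the paper. A tube is modeled as a list of data items. Each item is a tuple of symbols, for example `(3, 1)` standing for the digit x_3^1. All function names are hypothetical. One convention is also an assumption: applying Append-head or Append-tail to an empty tube creates the one-symbol item.

```python
from typing import Hashable, List, Optional, Tuple

# Hypothetical model: a tube is a list (multiset) of data items, and each
# item is a tuple of symbols such as (3, 1) for the binary digit x_3^1.
Item = Tuple[Hashable, ...]
Tube = List[Item]


def append_tail(tube: Tube, x: Hashable) -> None:
    """Append-tail(T, x): add x to the end of every item in T.
    Assumption: on an empty tube this creates the single item (x,)."""
    tube[:] = [item + (x,) for item in tube] if tube else [(x,)]


def append_head(tube: Tube, x: Hashable) -> None:
    """Append-head(T, x): add x to the head of every item in T.
    Assumption: on an empty tube this creates the single item (x,)."""
    tube[:] = [(x,) + item for item in tube] if tube else [(x,)]


def amplify(tube: Tube) -> Tuple[Tube, Tube]:
    """Amplify(T, T1, T2): return two identical copies of T and empty T."""
    t1, t2 = list(tube), list(tube)
    tube.clear()
    return t1, t2


def merge(*tubes: Tube) -> Tube:
    """Merge: pour T1, ..., Tn into one tube; items are left unchanged."""
    return [item for tube in tubes for item in tube]


def extract(tube: Tube, x: Hashable) -> Tuple[Tube, Tube]:
    """Extract: return (+(T, x), -(T, x)), the items that do / do not contain x."""
    return [i for i in tube if x in i], [i for i in tube if x not in i]


def detect(tube: Tube) -> bool:
    """Detect(T): True ("yes") if T holds at least one item."""
    return len(tube) > 0


def discard(tube: Tube) -> None:
    """Discard(T): throw away the contents of T."""
    tube.clear()


def read(tube: Tube) -> Optional[Item]:
    """Read(T): describe exactly one item of T (here the first), or None."""
    return tube[0] if tube else None
```

Note that `amplify` and `discard` mutate the tube they receive, mirroring the physical tube becoming empty.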

Definition 3-1: Assume that a ground set S, a collection C = {C1, C2, C3, …, Cn} of subsets of S, and a positive integer K ≤ |S| are given. The problem is to determine whether there is a subset s' ⊆ S such that |s'| ≤ K and (Ci ∩ s') ≠ ∅ for i = 1, 2, 3, …, n.

For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. Ignoring the bound K, the hitting sets for this instance are {1, 4}, {2, 4}, {3, 4}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, and {1, 2, 3, 4}.
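The example can be checked by brute force. The minimal Python sketch below, which is not the paper's method, tries every subset of S up to the size bound and keeps those that intersect every Ci. The function name and the optional `K` parameter are illustrative assumptions. Without `K`, it lists every hitting set, as the example does. The running time grows as 2^|S|, which is the exponential search the DNA approach aims to perform in parallel.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set


def hitting_sets(S: Iterable[int], C: Iterable[Iterable[int]],
                 K: Optional[int] = None) -> List[Set[int]]:
    """Return every subset s of S that intersects each C_i, smallest first.
    If K is given, only subsets with |s| <= K are returned."""
    elements = sorted(S)
    subsets = [set(ci) for ci in C]
    limit = len(elements) if K is None else K
    found: List[Set[int]] = []
    for size in range(limit + 1):                # every size up to the bound
        for combo in combinations(elements, size):
            s = set(combo)
            if all(s & ci for ci in subsets):    # s hits each C_i
                found.append(s)
    return found


if __name__ == "__main__":
    print(hitting_sets({1, 2, 3, 4}, [{1, 2, 3}, {4}]))
```

The decision version of Definition 3-1 is then `bool(hitting_sets(S, C, K))`.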

Table 3-1: Each possible subset of the ground set S = {1, 2, 3, 4} and its encoding number. The i-th bit, counted from the right, is 1 exactly when element i belongs to the subset.

Subset        Encoding number
∅             0000
{1}           0001
{2}           0010
{1,2}         0011
{3}           0100
{1,3}         0101
{2,3}         0110
{1,2,3}       0111
{4}           1000
{1,4}         1001
{2,4}         1010
{1,2,4}       1011
{3,4}         1100
{1,3,4}       1101
{2,3,4}       1110
{1,2,3,4}     1111
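The encoding rule of Table 3-1 can be written as a pair of small functions. This is a minimal Python sketch; the names `encode` and `decode` are hypothetical, and the sketch assumes S = {1, ..., n}. Element i maps to the i-th bit from the right, so the leftmost character stands for element n.

```python
from typing import Set


def encode(subset: Set[int], n: int) -> str:
    """Encode a subset of {1, ..., n}: bit i, counted from the right,
    is 1 exactly when element i is in the subset (Table 3-1 convention)."""
    return "".join("1" if i in subset else "0" for i in range(n, 0, -1))


def decode(bits: str) -> Set[int]:
    """Inverse of encode: the character at index pos stands for element n - pos."""
    n = len(bits)
    return {n - pos for pos, bit in enumerate(bits) if bit == "1"}


if __name__ == "__main__":
    print(encode({2, 3}, 4))  # prints 0110
```

With this convention, "the rightmost encoding bit" in the example below corresponds to element 1.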


The pseudo-code algorithm for solving the hitting-set problem for an n-element set S and a collection of subsets C proceeds as follows:

(1)  Create the initial set T (the encodings of all subsets of S)
(2)  For each subset in C do begin
(3)    For each element in the subset do begin
(4)      if (the b-th element in the a-th subset in C is the i-th element in S)
(5)      then begin
(6)        put the subsets whose i-th encoding bit is 1 on Tb;
(7)        put the subsets whose i-th encoding bit is 0 on T;
(8)      end
(9)    End for
(10)   Delete T
(11)   Create a new set T by merging the strings extracted into T1, …, Tb
(12) End for
(13) If T is nonempty, then T holds the encodings of all subsets of S that hit every subset in C.
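The pseudo-code translates directly into a short Python sketch on plain bit strings. This is a software illustration, not the DNA implementation. It assumes S = {1, ..., n}, so "the i-th element in S" is simply i, and it uses the Table 3-1 encoding. The function name `algorithm1` is hypothetical. Like the pseudo-code, the sketch does not enforce the bound K.

```python
from itertools import product
from typing import List, Sequence


def algorithm1(n: int, C: Sequence[Sequence[int]]) -> List[str]:
    """Filter all n-bit encodings down to those that hit every subset in C.

    Assumes S = {1, ..., n}; element i is the i-th bit from the right.
    """
    # (1) initial set T: every n-bit encoding, i.e. every subset of S
    T = ["".join(bits) for bits in product("01", repeat=n)]
    for Ca in C:                                  # (2) a-th subset of C
        extracted: List[List[str]] = []           # T_1, ..., T_b
        for i in Ca:                              # (3)-(5) element i of S
            pos = n - i                           # index of the i-th bit from the right
            extracted.append([s for s in T if s[pos] == "1"])  # (6) onto T_b
            T = [s for s in T if s[pos] == "0"]                # (7) rest stays in T
        # (10) delete T: its subsets miss every element of Ca
        # (11) the new T is the merge of T_1, ..., T_b
        T = [s for Tb in extracted for s in Tb]
    return T  # (13) nonempty T means some subset of S hits every Ca


if __name__ == "__main__":
    print(algorithm1(4, [[1, 2, 3], [4]]))
```

Each inner pass moves the strings that contain the current element out of T. A string therefore survives the a-th outer loop exactly when it contains at least one element of the a-th subset. To apply the bound K afterwards, one could keep only `[s for s in T if s.count("1") <= K]`. That filter is an addition, not a step of the algorithm.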

Algorithm 2: Solving the hitting-set problem for an n-element set S and a collection of subsets C. Here z_k^1 and z_k^0 denote the k-th encoding bit with value 1 and 0, respectively.

(0a) Append-head(T1, z_1^1).
(0b) Append-head(T2, z_1^0).
(0c) T = ∪(T1, T2).
(1) For k = 2 to n
  (1a) Amplify(T, T1, T2).
  (1b) Append-head(T1, z_k^1).
  (1c) Append-head(T2, z_k^0).
  (1d) T = ∪(T1, T2).
(1e) EndFor
(2) For a = 1 to |C| do begin
  (3) For b = 1 to |Ca| do begin
    (4) If (the b-th element in the a-th subset in C is the i-th element in S)
    (5) then begin
      (6) Tb = +(T, z_i^1).
      (7) T = −(T, z_i^1).
    (8) end
  (9) EndFor
  (10) Discard(T).
  (11) T = ∪(T1, T2, …, Tb).
(12) EndFor
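Steps (0a) to (1e) build the initial pool of all 2^n encodings using only Append-head, Amplify and Merge. The minimal Python sketch below is a software simulation, not the paper's laboratory protocol. A strand is modeled as a tuple of `(k, v)` pairs, where `(k, v)` stands for z_k^v. The function names are hypothetical. As before, it assumes that Append-head on an empty tube creates the one-symbol strand, which steps (0a) and (0b) appear to rely on.

```python
from typing import List, Tuple

Strand = Tuple[Tuple[int, int], ...]  # (k, v) stands for the digit z_k^v
Tube = List[Strand]


def append_head(tube: Tube, bit: Tuple[int, int]) -> None:
    # Assumption: on an empty tube, Append-head creates the strand (bit,).
    tube[:] = [(bit,) + s for s in tube] if tube else [(bit,)]


def amplify(tube: Tube) -> Tuple[Tube, Tube]:
    t1, t2 = list(tube), list(tube)   # two identical copies
    tube.clear()                      # T becomes empty
    return t1, t2


def merge(*tubes: Tube) -> Tube:
    return [s for t in tubes for s in t]


def initial_pool(n: int) -> Tube:
    T1: Tube = []
    T2: Tube = []
    append_head(T1, (1, 1))           # (0a)
    append_head(T2, (1, 0))           # (0b)
    T = merge(T1, T2)                 # (0c)
    for k in range(2, n + 1):         # (1)
        T1, T2 = amplify(T)           # (1a)
        append_head(T1, (k, 1))       # (1b)
        append_head(T2, (k, 0))       # (1c)
        T = merge(T1, T2)             # (1d)
    return T                          # (1e): 2^n strands


def to_bits(strand: Strand) -> str:
    """Write a strand as a bit string; the head (bit n) is leftmost."""
    return "".join(str(v) for _, v in strand)


if __name__ == "__main__":
    print(sorted(to_bits(s) for s in initial_pool(3)))
```

Because each new digit z_k is prepended, the finished strands read z_n ... z_1 from head to tail. This matches the Table 3-1 layout, where element 1 is the rightmost bit. Each pass through the loop doubles the pool, so n passes in total give 2^n strands.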

The example S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 3.1 illustrates how Algorithm 1 works.

In the first execution of Steps (6) and (7), with a = 1 and b = 1, we put the subsets whose rightmost encoding bit is 1 on T1 and the subsets whose rightmost encoding bit is 0 on T. This gives T1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. In the second execution of Steps (6) and (7), with a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. In the third execution of Steps (6) and (7), with a = 1 and b = 3, we obtain T3 = {0100, 1100} and T = {0000, 1000}.

The first outer loop has now ended. The first execution of Step (10) discards tube T, whose strings 0000 and 1000 contain none of the elements 1, 2, 3. The first execution of Step (11) then merges tubes T1, T2 and T3 into T, giving T = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.

In the second outer loop, with a = 2 and b = 1, the fourth execution of Steps (6) and (7) gives T1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T = {0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one element, the second outer loop also ends. The second execution of Step (10) discards T, and the second execution of Step (11) merges T1 into T. This yields T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}. These strings encode exactly the seven hitting sets listed in the example: {1, 4}, {1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}, {2, 4}, {2, 3, 4}, and {3, 4}.
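The trace above can be reproduced mechanically. The minimal Python sketch below simulates Steps (2) to (12) with the Extract, Discard and Merge operations, and it records the contents of Tb and T after every execution of Steps (6) and (7). It makes several assumptions. The initial pool of Steps (0a) to (1e) is built directly rather than via Append-head. Strands are tuples of `(k, v)` pairs standing for z_k^v. The names `solve`, `extract` and `bits` are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Strand = Tuple[Tuple[int, int], ...]  # ((n, b_n), ..., (1, b_1)); (k, v) = z_k^v
Tube = List[Strand]


def extract(tube: Tube, bit: Tuple[int, int]) -> Tuple[Tube, Tube]:
    """Return (+(T, bit), -(T, bit))."""
    return [s for s in tube if bit in s], [s for s in tube if bit not in s]


def bits(tube: Tube) -> List[str]:
    """Write each strand as a bit string with element n leftmost (Table 3-1)."""
    return ["".join(str(v) for _, v in s) for s in tube]


def solve(n: int, C: Sequence[Sequence[int]], trace: Optional[list] = None) -> Tube:
    # Steps (0a)-(1e), built directly: one strand per subset of S = {1, ..., n}.
    T: Tube = [tuple(zip(range(n, 0, -1), map(int, b)))
               for b in product("01", repeat=n)]
    for a, Ca in enumerate(C, start=1):           # (2)
        tubes: List[Tube] = []                    # T_1, ..., T_b
        for b, i in enumerate(Ca, start=1):       # (3)-(5): element i of S
            Tb, T = extract(T, (i, 1))            # (6) Tb = +(T, z_i^1); (7) T = -(T, z_i^1)
            tubes.append(Tb)
            if trace is not None:                 # record (a, b, Tb, T) for inspection
                trace.append((a, b, bits(Tb), bits(T)))
        T.clear()                                 # (10) Discard(T)
        T = [s for t in tubes for s in t]         # (11) T = merge of T_1, ..., T_b
    return T


if __name__ == "__main__":
    steps: list = []
    result = solve(4, [[1, 2, 3], [4]], steps)
    for a, b, tb, rest in steps:
        print(f"a={a} b={b} Tb={tb} T={rest}")
    print("hitting sets:", bits(result))
```

The four recorded steps match the four executions of Steps (6) and (7) described in the walkthrough, when compared as sets. Calling `bool(result)` plays the role of a final Detect on the output tube.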

4. Discussion and Conclusion

In this paper, we propose a DNA-based algorithm to solve the hitting-set problem.

Many NP-complete problems that cannot be solved efficiently by traditional digital computers are now being attacked with DNA-based algorithms. Even so, it is still very difficult to carry out these mathematical instructions with biological operations. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing will become a reality someday.
