Speaker Nung-Yue Shi
Assume that we have a collection C of subsets of a finite set S and a positive integer K ≤ |S|. We would like to know whether there is a subset s' ⊆ S with |s'| ≤ K such that s' hits (contains) at least one element from each subset in C.
DNA (deoxyribonucleic acid) is the main genetic material in the cell nucleus and determines heredity in human beings. DNA is made up of a linear chain of smaller units called nucleotides. Each nucleotide contains three major components: deoxyribose, a phosphate group, and a base.
Nucleotides are distinguished by their base, which can be adenine (abbreviated as A), guanine (G), cytosine (C), or thymine (T). Two strands of DNA can form a double helix if their respective bases are Watson-Crick complements: C pairs with G and A pairs with T. The two strands also run in opposite directions: the 3' end (at the 3rd carbon of the deoxyribose) of one strand aligns with the 5' end (at the 5th carbon, which carries a phosphate group) of the other.
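The pairing rule above can be illustrated with a minimal sketch. It assumes both strands are written as strings in the 5' to 3' direction; the function names `reverse_complement` and `can_anneal` are illustrative and do not come from the source.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3', for a strand written 5'->3'.

    The partner runs in the opposite direction, so the input is reversed
    before each base is replaced by its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_anneal(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are exact Watson-Crick partners (illustrative check)."""
    return strand_b == reverse_complement(strand_a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(can_anneal("AAGC", "GCTT"))
```

This only models perfect, full-length complementarity; real annealing also tolerates partial matches, which the sketch ignores.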
DNA-based computing [1, 2, 3, 4, 5, 6, 8] treats DNA strands like the bits of a traditional digital computer and uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate the DNA strands [11].
1. Watson-Crick complements. Two strands of DNA will anneal to form a double helix if their respective bases are Watson-Crick complements: C pairs with G and A pairs with T.
2. Ligases. Ligases bond split DNA molecules together. For example, DNA ligase takes two strands of DNA and covalently connects them into a single strand.
3. Nucleases. Nucleases cut the nucleic acid of a DNA molecule. For example, a nuclease looks for a predetermined sequence of bases in a DNA strand and, if it is found, cuts the strand into two pieces.
4. Polymerases. Polymerases copy information from one DNA molecule into another.
5. Gel electrophoresis. A solution of DNA molecules is placed at one end of a gel, and an electric current is applied to the gel. This process separates DNA strands by length.
6. DNA synthesis. Nowadays, we can ask a commercial DNA synthesis facility to make a DNA sequence. Within a few days, we receive a test tube containing about 10^18 molecules of DNA with the requested sequence.
The above six techniques are the basis of the Adleman-Lipton DNA computing model. From them, Adleman developed eight bio-molecular instructions for writing bio-molecular programs. These instructions are described in the next section.
We can perform the following operations [2, 10]:
1. Append-tail. Given a tube T and a binary digit xj, the operation "Append-tail" appends xj onto the end of every data item stored in T. The formal representation of the operation is "Append-tail(T, xj)".
2. Amplify. Given a tube T, the operation "Amplify(T, T1, T2)" produces two new tubes T1 and T2 that are both copies of T (so T1 and T2 are identical), and T becomes an empty tube.
3. Merge. Given n tubes T1, …, Tn, the merge operation pours the data stored in the n tubes into one tube without changing any individual data item. The formal representation of the operation is "∪(T1, …, Tn)", where ∪(T1, …, Tn) = T1 ∪ … ∪ Tn.
4. Extract. Given a tube T and a binary digit xk, the extract operation produces two tubes +(T, xk) and −(T, xk), where +(T, xk) is all of the data in T that contain xk and −(T, xk) is all of the data in T that do not contain xk.
5. Detect. Given a tube T, the detect operation checks whether T contains any data. If at least one data item is in T, the answer is "yes"; if T contains no data, the answer is "no". The formal representation of the operation is "Detect(T)".
6. Discard. Given a tube T, the discard operation discards T. The formal representation of the operation is "Discard(T)".
7. Read. Given a tube T, the read operation describes one data item contained in T. Even if T contains many different data items, the operation gives an explicit description of exactly one of them. The formal representation of the operation is "Read(T)".
8. Append-head. Given a tube T and a binary digit xj, the operation "Append-head" appends xj onto the head of every data item stored in T. The formal representation of the operation is "Append-head(T, xj)".
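The eight operations can be mimicked in software to check a bio-molecular program before any laboratory work. The sketch below is a simulation under stated assumptions, not the laboratory procedure: a tube is modelled as a Python set of distinct bit strings, appending to an empty tube is assumed to create a one-bit strand, and "contains xk" is read as "bit position i (1 = rightmost) has the given value". All function names are illustrative.

```python
from typing import Optional, Set, Tuple

Tube = Set[str]  # assumption: a tube is a set of distinct bit strings


def append_tail(tube: Tube, bit: str) -> Tube:
    """Append `bit` to the end of every string; an empty tube yields just `bit`."""
    return {s + bit for s in tube} or {bit}


def append_head(tube: Tube, bit: str) -> Tube:
    """Append `bit` to the head of every string; an empty tube yields just `bit`."""
    return {bit + s for s in tube} or {bit}


def amplify(tube: Tube) -> Tuple[Tube, Tube]:
    """Return two identical copies of `tube` and leave `tube` empty."""
    t1, t2 = set(tube), set(tube)
    tube.clear()
    return t1, t2


def merge(*tubes: Tube) -> Tube:
    """Pour all tubes into one; individual strings are unchanged."""
    return set().union(*tubes)


def extract(tube: Tube, i: int, bit: str) -> Tuple[Tube, Tube]:
    """Split `tube` into +(T, x) and -(T, x), where x means 'bit i equals `bit`'."""
    plus = {s for s in tube if s[-i] == bit}
    return plus, tube - plus


def detect(tube: Tube) -> bool:
    """'yes' (True) if the tube holds at least one string, otherwise 'no' (False)."""
    return bool(tube)


def discard(tube: Tube) -> None:
    """Throw away the contents of the tube."""
    tube.clear()


def read(tube: Tube) -> Optional[str]:
    """Describe exactly one string in the tube, or None if the tube is empty."""
    return next(iter(tube), None)
```

A set ignores how many copies of each strand are present, which is sufficient for checking the logic of a program but says nothing about concentrations or error rates.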
Definition 3-1: Assume that a ground set S, a collection C = {C1, C2, C3, …, Cn} of subsets of S, and a positive integer K ≤ |S| are given. The problem is to determine whether there is some subset s' of S such that |s'| ≤ K and (Ci ∩ s') ≠ ∅ for i = 1, 2, 3, …, n.
For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. The hitting sets for this instance are {1, 4}, {2, 4}, {3, 4}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, and {1, 2, 3, 4}.
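The example can be checked with a conventional brute-force search. This is a minimal sketch for verification only, not the DNA-based method; the function name `hitting_sets` and the optional `K` parameter are illustrative.

```python
from itertools import combinations


def hitting_sets(S, C, K=None):
    """Return every subset of S with at most K elements that meets each set in C.

    If K is omitted, all subset sizes up to |S| are considered.
    """
    elements = sorted(S)
    limit = len(elements) if K is None else K
    found = []
    for size in range(limit + 1):
        for candidate in combinations(elements, size):
            chosen = set(candidate)
            # A hitting set shares at least one element with every Ci.
            if all(chosen & set(Ci) for Ci in C):
                found.append(chosen)
    return found


if __name__ == "__main__":
    for hs in hitting_sets({1, 2, 3, 4}, [{1, 2, 3}, {4}]):
        print(sorted(hs))
```

The search inspects up to 2^|S| candidates one after another, which is the exponential cost that the DNA-based approach tries to spread over many molecules in parallel.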
Subset of S    Encoding number    Subset of S    Encoding number
∅              0000               {1}            0001
{2}            0010               {3}            0100
{4}            1000               {1,2}          0011
{1,3}          0101               {1,4}          1001
{2,3}          0110               {2,4}          1010
{3,4}          1100               {1,2,3}        0111
{1,2,4}        1011               {1,3,4}        1101
{2,3,4}        1110               {1,2,3,4}      1111
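The encoding numbers listed above appear to follow one rule: the i-th bit counted from the right is 1 exactly when element i belongs to the subset. The sketch below states that rule as code; the rule is inferred from the listed values, and the names `encode` and `decode` are illustrative.

```python
def encode(subset, n):
    """Encode a subset of {1, ..., n} as an n-bit string.

    Bit i, counted from the right starting at 1, is '1' if i is in the subset.
    """
    return "".join("1" if i in subset else "0" for i in range(n, 0, -1))


def decode(code):
    """Recover the subset of {1, ..., len(code)} from its encoding number."""
    n = len(code)
    return {i for i in range(1, n + 1) if code[n - i] == "1"}


if __name__ == "__main__":
    for subset in [set(), {2}, {1, 3}, {1, 2, 4}, {1, 2, 3, 4}]:
        print(sorted(subset), encode(subset, 4))
```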
The pseudo-code algorithm for solving the hitting-set problem for an n-element set S and a collection of subsets C proceeds as follows:
Create initial set T
For each subset in C do begin
    For each element in the subset do begin
        if (the b-th element in the a-th subset in C is the i-th element in S) then begin
            put the subsets whose i-th encoding bit is 1 in Tb;
            put the subsets whose i-th encoding bit is 0 in T;
        end
    End for
    Delete T
    Create new set T by merging the extracted strings from Tb
End for
If T is nonempty then T is the hitting-set.
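The pseudo-code can be run on a conventional computer to see what it computes. The following is a minimal sketch under a few assumptions: the initial set T holds all 2^n bit strings, bit i is counted from the right, and the elements of each subset are processed in increasing order. The function name `solve_hitting_set` is illustrative.

```python
from itertools import product


def solve_hitting_set(n, C):
    """Return the encodings of all subsets of {1, ..., n} that hit every set in C."""
    # Create initial set T: every n-bit encoding.
    T = {"".join(bits) for bits in product("01", repeat=n)}
    for subset in C:
        extracted = []
        for i in sorted(subset):
            # Strings whose i-th bit is 1 go to Tb; the rest stay in T.
            Tb = {s for s in T if s[-i] == "1"}
            T = T - Tb
            extracted.append(Tb)
        # Delete the leftover T and rebuild it from the extracted tubes.
        T = set().union(*extracted)
    return T


if __name__ == "__main__":
    print(sorted(solve_hitting_set(4, [{1, 2, 3}, {4}])))
```

The pseudo-code as written does not test the size bound K. In this simulation, one way to apply it would be to keep only strings with at most K ones, for example with `s.count("1") <= K`.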
Algorithm 2: Solving the hitting-set problem for an n-element set S and a collection of subsets C.
(0a) Append-head(T1, z_1^1).
(0b) Append-head(T2, z_1^0).
(0c) T = ∪(T1, T2).
(1) For k = 2 to n
    (1a) Amplify(T, T1, T2).
    (1b) Append-head(T1, z_k^1).
    (1c) Append-head(T2, z_k^0).
    (1d) T = ∪(T1, T2).
(1e) EndFor
(2) For a = 1 to |C| do begin
    (3) For b = 1 to |Ca| do begin
        (4) If (the b-th element in the a-th subset in C is the i-th element in S)
        (5) then begin
            (6) Tb = +(T, z_i^1)
            (7) T = −(T, z_i^1)
        (8) end
    (9) Endfor
    (10) Discard(T)
    (11) T = ∪(T1, T2, …, Tb).
(12) Endfor
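Steps (0a) through (1e) build the solution space by repeated doubling. The sketch below simulates only that part, assuming that the value appended in steps (0a) and (1b) stands for "bit k equals 1", that the value appended in steps (0b) and (1c) stands for "bit k equals 0", and that appending to an empty tube creates a one-bit strand. The function name `build_solution_space` is illustrative.

```python
def build_solution_space(n):
    """Simulate steps (0a)-(1e): return a tube holding all 2^n bit strings."""
    T1 = {"1"}                       # (0a) Append-head on an empty tube T1
    T2 = {"0"}                       # (0b) Append-head on an empty tube T2
    T = T1 | T2                      # (0c) merge
    for k in range(2, n + 1):        # (1)
        T1, T2 = set(T), set(T)      # (1a) Amplify: two copies of T ...
        T = set()                    #      ... and T becomes empty
        T1 = {"1" + s for s in T1}   # (1b) bit k = 1 is placed at the head
        T2 = {"0" + s for s in T2}   # (1c) bit k = 0 is placed at the head
        T = T1 | T2                  # (1d) merge
    return T


if __name__ == "__main__":
    space = build_solution_space(4)
    print(len(space), sorted(space)[:4])
```

Because each new bit is added at the head, bit 1 ends up rightmost, which agrees with the worked example where element 1 corresponds to the rightmost encoding bit. The loop runs n - 1 times while the tube doubles in size at each pass.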
The example with S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 3.1 is used to show the power of Algorithm 1. In the first execution of Step (6) and Step (7), when a = 1 and b = 1, we put the subsets whose rightmost encoding bit is 1 in T1 and the subsets whose rightmost encoding bit is 0 in T, so we get T1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. Next, in the second execution of Step (6) and Step (7), when a = 1 and b = 2, we get T2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}.
Then, in the third execution of Step (6) and Step (7), when a = 1 and b = 3, we obtain T3 = {0100, 1100} and T = {0000, 1000}. Because the first pass of the outer loop has ended, the first execution of Step (10) discards test tube T and the first execution of Step (11) merges test tubes T1, T2, and T3 into T, giving T = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.
In the second pass of the outer loop, when a = 2 and b = 1, the fourth execution of Step (6) and Step (7) gives T1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T = {0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one element, the second pass of the outer loop also ends here. The second execution of Step (10) discards T and the second execution of Step (11) merges T1 into T. This yields the hitting sets T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.
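The walk-through above can be reproduced step by step in software. The sketch below records the contents of Tb and T after each execution of Step (6) and Step (7); it models tubes as Python sets and counts bits from the right, both of which are assumptions of the simulation. The function name `trace_extract_loop` is illustrative.

```python
from itertools import product


def trace_extract_loop(n, C):
    """Return (snapshots, final_T) for the extract-and-merge loop.

    Each snapshot is (a, b, sorted Tb, sorted T) taken after Steps (6) and (7).
    C is a list of lists so that the element order inside each subset is fixed.
    """
    T = {"".join(bits) for bits in product("01", repeat=n)}
    snapshots = []
    for a, subset in enumerate(C, start=1):
        tubes = []
        for b, i in enumerate(subset, start=1):
            Tb = {s for s in T if s[-i] == "1"}   # Step (6)
            T = T - Tb                            # Step (7)
            tubes.append(Tb)
            snapshots.append((a, b, sorted(Tb), sorted(T)))
        T = set().union(*tubes)                   # Steps (10) and (11)
    return snapshots, T


if __name__ == "__main__":
    steps, final = trace_extract_loop(4, [[1, 2, 3], [4]])
    for a, b, tb, t in steps:
        print(f"a={a} b={b} Tb={tb} T={t}")
    print("final:", sorted(final))
```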
In this paper, we propose a DNA-based algorithm to solve the hitting-set problem. Nowadays, DNA-based algorithms are being tried on many NP-complete problems for which no efficient algorithm on a traditional digital computer is known. Even so, it is still very difficult to support biological operations using mathematical instructions. Many difficulties remain to be overcome, and we hope that DNA-based supercomputing will become a reality someday.