Fast Parallel Molecular Solution To the Hitting-set Problem

Nung-Yue Shi and Chih-Ping Chu
Department of Computer Science and Information Engineering,
National Cheng Kung University,
Tainan City 701, Taiwan, Republic of China
E-mail: p7890112@ccmail.ncku.edu.tw and chucp@csie.ncku.edu.tw

Abstract
The hitting-set problem is an NP-complete problem in set theory. Assume that we have a collection C of subsets of a finite set S and a positive integer K ≤ |S|, and we would like to know whether there is a subset S' ⊆ S with |S'| ≤ K such that S' hits (contains) at least one element from each subset in C. In this paper, a DNA-based algorithm is proposed to solve the hitting-set problem. Furthermore, a simulated experiment is applied to verify the correctness of the proposed DNA-based algorithm.

Key words: DNA-based Supercomputing, the Hitting-set Problem, DNA-based Algorithm, the NP-Complete Problems, Set Theory.

1. Introduction
DNA (deoxyribonucleic acid) is the main material of the nucleus and determines the inheritance of human beings. DNA is made up of a linear chain of smaller units called nucleotides. A nucleotide contains three major components: deoxyribose, a phosphate group, and a base. Nucleotides are distinguished by their base, which can be adenine (abbreviated as A), guanine (G), cytosine (C), or thymine (T). Two strands of DNA can form a double helix if their respective bases are Watson-Crick complements, that is, C pairs with G and A pairs with T. The two strands pair in opposite orientations: the 3' end (the 3rd carbon of the deoxyribose) of one strand matches the 5' end (the 5th carbon, which carries a phosphate group) of the other.

DNA-based computing [1, 2, 3, 4, 5, 6, 8] treats DNA strands as the bits of a traditional digital computer and uses techniques such as PCR (polymerase chain reaction), gel electrophoresis, and enzyme reactions to separate, concatenate, delete, and duplicate DNA strands [11].
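
As a small illustration of the Watson-Crick pairing rule described in this section (C with G, A with T, strands in opposite orientations), the following Python sketch computes the partner strand of a single strand. The function names `reverse_complement` and `can_form_duplex` are hypothetical helpers introduced here for illustration only, and the sketch assumes that both strands are written in the 5' to 3' direction and pair over their full length.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the partner of `strand`, both written 5' -> 3'.

    The strand is read backwards because the two strands of a
    double helix run in opposite directions.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_form_duplex(strand_a, strand_b):
    """True if the two strands are exact Watson-Crick partners (assumed model)."""
    return strand_b == reverse_complement(strand_a)


partner = reverse_complement("ATGC")
print(partner, can_form_duplex("ATGC", partner))
```

Applying `reverse_complement` twice returns the original strand, which is a quick consistency check on the pairing table.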
Nowadays, we could produce roughly 10^18 DNA strands in a test tube [9]. This also means that we could represent 10^18 bits of information. With the biological operations described in the following section, we seem to have 10^18 processors running in parallel. This massive parallelism could solve the most intractable problems in computer science so far [1, 4, 6, 7, 10, 12-20].

2. Background
A single DNA strand is formed by linking nucleotides one by one: the 3' end (which carries a hydroxyl group) of one nucleotide is joined to the 5' end (which carries a phosphate group) of the next nucleotide via the phosphate group. If the strand contains 20 nucleotides, we say it is a 20 mer. For a double-stranded DNA, the length is counted in base pairs. If a double-stranded DNA has 20 base pairs, then it is made of two single DNA strands, each of length 20 mer [1, 5, 8, 9, 10].

In bio-molecular computing, data are also represented as binary patterns (sequences of 0s and 1s). Those binary patterns are encoded by sequences of bio-molecules and are stored in a tube. That is to say, a tube is the only storage area in bio-molecular computing and also serves as the memory and the input/output subsystem of the von Neumann architecture. Bio-molecular programs are made of a set of bio-molecular operations and are used to perform calculations and logical operations, so bio-molecular programs can be regarded as the arithmetic logic unit (ALU) of the von Neumann architecture. A robot is used to automatically control the operations of a tube (the memory and the input/output subsystem) and of the bio-molecular programs (the ALU). This implies that the robot can be regarded as the control unit of the von Neumann architecture.

3. A DNA-based Algorithm for Solving the Hitting-set Problem

3.1 Definition of the Hitting-set Problem
Informally, we are given a collection C of subsets of a finite set S and a positive integer K, and would like to find a subset S' ⊆ S with |S'| ≤ K such that S' contains at least one element from each subset in C. In other words, S' hits (intersects) every subset in C. The formal definition is given below.

Definition 3-1: Assume that a ground set S and a collection C = {C_1, C_2, C_3, …, C_n} of subsets are given, where each C_i is a subset of S, together with a positive integer K ≤ |S|. The problem is to decide whether there is some subset S' of S such that |S'| ≤ K and (C_i ∩ S') ≠ ∅ for i = 1, 2, 3, …, n.
For example, let S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}}. The hitting sets for this instance are {1, 4}, {2, 4}, {3, 4}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4} and {1, 2, 3, 4}. From Definition 3-1, the answer to the hitting-set problem for this instance consists of the smallest hitting sets {1, 4}, {2, 4} and {3, 4} (that is, those with |S'| ≤ 2).
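
The example above can be checked by exhaustive search on a conventional computer. The following Python sketch is only an illustration of Definition 3-1, not the DNA-based method of this paper; the names `is_hitting_set` and `enumerate_hitting_sets` are hypothetical helpers introduced here, and the bound K = 2 is an assumption chosen to match the three smallest hitting sets listed above.

```python
from itertools import combinations


def is_hitting_set(candidate, collection):
    """True if `candidate` shares at least one element with every subset."""
    return all(candidate & subset for subset in collection)


def enumerate_hitting_sets(ground, collection, k=None):
    """List every hitting set of size at most k (all sizes if k is None)."""
    elements = sorted(ground)
    limit = len(elements) if k is None else k
    found = []
    for size in range(limit + 1):
        for combo in combinations(elements, size):
            if is_hitting_set(set(combo), collection):
                found.append(set(combo))
    return found


S = {1, 2, 3, 4}
C = [{1, 2, 3}, {4}]
all_hits = enumerate_hitting_sets(S, C)         # every hitting set
small_hits = enumerate_hitting_sets(S, C, k=2)  # assumed bound K = 2
print(len(all_hits), small_hits)
```

The search visits all 2^n subsets in the worst case, which is the exponential cost that the parallel DNA approach aims to spread over many strands.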

3.2 Constructing Solution Space of DNA Sequences for the Hitting-set Problem
Suppose that an n-digit binary number corresponds to each possible hitting-set, where n is the number of elements of the ground set S. The encoding scheme is: if the ith element appears in the subset, then the corresponding ith bit of the encoding number is 1; otherwise it is 0. In a real-world implementation, assume that an n-bit binary number Q is represented by bits z_1, …, z_n, where the value of z_k is either 1 or 0 for 1 ≤ k ≤ n. Bit z_k is the kth bit of the n-bit binary number Q and represents the kth element in S. All possible subsets S' of the ground set S = {1, 2, 3, 4} are shown in Table 3-1, where the first bit is the rightmost digit.

Table 3-1: Each possible subset S' of the ground set S = {1, 2, 3, 4}.

subset       encoding number     subset          encoding number
∅            0000                {1}             0001
{2}          0010                {3}             0100
{4}          1000                {1,2}           0011
{1,3}        0101                {1,4}           1001
{2,3}        0110                {2,4}           1010
{3,4}        1100                {1,2,3}         0111
{1,2,4}      1011                {1,3,4}         1101
{2,3,4}      1110                {1,2,3,4}       1111

3.3 The DNA Algorithm for Solving the Hitting-set Problem
Given that S is a finite set and C is a set of subsets, we define a literal z_i^1 to be a logical variable for the ith element in the finite set S that has value 1 because the element appears in the subset S', and z_i^0 to be the logical variable for the ith element in S that has value 0 because the element does not appear in the subset S'. The initial set T_0 contains many strings, each encoding a single n-bit sequence. All 2^n possible choices of subsets are encoded in the tube T_0.

Lipton does not define his biological operations clearly in [5], but his solution can be expressed in terms of the operations described by Adleman in [11], which is discussed in this paper in subsection 2.1. The initial set T contains 2^n encoding numbers, each encoding a single n-bit sequence representing a subset S'. The pseudo-code algorithm for solving the hitting-set problem for an n-element set S and a collection of subsets C proceeds as follows:
(1) Create initial set T
(2) For each subset do begin
(3)     For each element in a subset do begin
(4)         if (the bth element in the ath subset in C is the ith element in S)
(5)         then begin
(6)             put the subsets whose ith encoding bit is 1 on T_b;
(7)             put the subsets whose ith encoding bit is 0 on T;
(8)         end
(9)     End for
(10)    Delete T
(11)    Create new set T by merging those extracted strings from T_b
(12) End for
(13) If T is nonempty then T is the hitting-set.

The above pseudo-code algorithm can be rewritten more formally in terms of biological operations:

Algorithm 1: Solving the hitting-set problem for an n-element set S and a collection of subsets C.
(0a) Append-head(T_1, z_1^1).
(0b) Append-head(T_2, z_1^0).
(0c) T = ∪(T_1, T_2).
(1) For k = 2 to n
(1a)    Amplify(T, T_1, T_2).
(1b)    Append-head(T_1, z_k^1).
(1c)    Append-head(T_2, z_k^0).
(1d)    T = ∪(T_1, T_2).
(1e) EndFor
(2) For a = 1 to |C| do begin
(3)     For b = 1 to |C_a| do begin
(4)         If (the bth element in the ath subset in C is the ith element in S)
(5)         then begin
(6)             T_b = +(T, z_i^1)
(7)             T = −(T, z_i^1)
(8)         end
(9)     Endfor
(10)    Discard(T)
(11)    T = ∪_{b=1}^{|C_a|} (T_b)
(12) Endfor

Lemma 1: Algorithm 1 can be used to solve the hitting-set problem for an n-element set S and a collection of subsets C.

Proof:
The solution space of 2^n possible hitting-sets for an n-element set S and a collection of subsets C is produced by the execution of Steps (0a) through (1e). Step (2) is the outer loop, which runs once for each subset in C, and Step (3) is the inner loop, which runs once for each element of the current subset. Each time the outer loop (Step 2) is executed, the number of executions of the inner loop is the number of elements of the ath subset in C. Steps (6) and (7) say that we extract the subsets whose z_i is 1 and put them in test tube T_b, and extract the subsets whose z_i is 0 and put them in test tube T. When the inner loop ends, we discard T and merge all T_b into T. The outer loop is repeated in the same way. When all iterations of the outer loop have ended, the hitting sets are in test tube T.

From Algorithm 1, it is very clear that Steps (13) through (19), which do not appear in the listing above, are used to figure out the number of ones for those hitting-sets in T_0. Next, Step (20) is the last loop and is used to find the answer. If the kth execution of Step (21) returns a "yes", then Step (23) is applied to read the answer. Otherwise, Steps (20) through (25) are executed repeatedly until the answer is found.

3.4 The Power of the DNA Algorithm for Solving the Hitting-set Problem
The example of S = {1, 2, 3, 4} and C = {{1, 2, 3}, {4}} from subsection 3.1 is used to show the power of Algorithm 1. In the first execution of Steps (6) and (7), when a = 1 and b = 1, we put the subsets whose rightmost encoding bit is 1 in T_1 and the subsets whose rightmost encoding bit is 0 in T, so we get T_1 = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111} and T = {0000, 0010, 0100, 1000, 0110, 1010, 1100, 1110}. Next, in the second execution of Steps (6) and (7), when a = 1 and b = 2, we get T_2 = {0010, 0110, 1010, 1110} and T = {0000, 0100, 1000, 1100}. Then, in the third execution of Steps (6) and (7), when a = 1 and b = 3, we obtain T_3 = {0100, 1100} and T = {0000, 1000}. Because the first iteration of the outer loop has ended, the first execution of Step (10) discards test tube T and the first execution of Step (11) merges test tubes T_1, T_2 and T_3 into T, giving T = {0001, 0011, 0101, 1001, 0111, 1011, 1101, 1111, 0010, 0110, 1010, 1110, 0100, 1100}.

For the second iteration of the outer loop, when a = 2 and b = 1, the fourth execution of Steps (6) and (7) gives T_1 = {1001, 1011, 1101, 1111, 1010, 1110, 1100} and T = {0001, 0011, 0101, 0111, 0010, 0110, 0100}. Because the second subset has only one element, the second iteration of the outer loop also ends here. The second execution of Step (10) discards T and the second execution of Step (11) merges T_1 into T. This yields the hitting sets T = {1001, 1011, 1101, 1111, 1010, 1110, 1100}.

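
The walk-through above can be reproduced in software. The following Python sketch simulates the steps of Algorithm 1 with lists of bit strings standing in for test tubes; it is an illustration under stated assumptions, not the simulated experiment of the paper. The function names are hypothetical, Amplify is assumed to copy tube T into T_1 and T_2, Append-head is assumed to prepend one bit so that element 1 ends up as the rightmost character, and the extraction operations +(T, z_i^1) and −(T, z_i^1) are modelled as list filters.

```python
def build_solution_space(n):
    """Steps (0a)-(1e): grow all 2^n strands by prepending one bit per round."""
    t1, t2 = ["1"], ["0"]                    # (0a), (0b): z_1^1 and z_1^0
    tube = t1 + t2                           # (0c): merge T_1 and T_2
    for _ in range(2, n + 1):                # (1): k = 2 .. n
        t1, t2 = list(tube), list(tube)      # (1a): Amplify (assumed to copy T)
        t1 = ["1" + s for s in t1]           # (1b): Append-head(T_1, z_k^1)
        t2 = ["0" + s for s in t2]           # (1c): Append-head(T_2, z_k^0)
        tube = t1 + t2                       # (1d): merge
    return tube


def has_element(strand, i):
    """Bit i is counted from the right, so element 1 is the last character."""
    return strand[-i] == "1"


def run_algorithm_1(n, collection):
    """Steps (2)-(12): keep only strands that hit every subset in `collection`."""
    tube = build_solution_space(n)
    for subset in collection:                # (2): outer loop over subsets
        extracted = []
        for i in subset:                     # (3)-(5): inner loop over elements
            extracted.append([s for s in tube if has_element(s, i)])  # (6)
            tube = [s for s in tube if not has_element(s, i)]         # (7)
        # (10) discard what is left in T, (11) merge the extracted tubes T_b
        tube = [s for t_b in extracted for s in t_b]
    return tube


def decode(strand):
    """Translate a strand back into the subset of S it encodes."""
    return {i for i in range(1, len(strand) + 1) if has_element(strand, i)}


survivors = run_algorithm_1(4, [[1, 2, 3], [4]])
print(sorted(survivors))
print([sorted(decode(s)) for s in sorted(survivors)])
```

Because each strand is moved to exactly one tube T_b the first time it matches, no duplicates are created by the merge in Step (11). Selecting the strands with at most K ones would be an additional filtering step that is not part of the listing simulated here.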
4. Discussion and Conclusion
In this paper, we propose a DNA-based algorithm to solve the hitting-set problem. Nowadays, many NP-complete problems that cannot be solved efficiently by a traditional digital computer are being attacked with DNA-based algorithms. Even so, it is still very difficult to support biological operations using mathematical instructions. There are still many difficulties to be overcome in the future, and we hope that DNA-based supercomputing could become a reality someday.

References
[1] L. Adleman. "Molecular computation of solutions to combinatorial problems". Science, 266:1021-1024, Nov. 11, 1994.
[2] D. Beaver. "A Universal Molecular Computer". Penn State University Tech Report CSE-95-001.
[3] R. P. Feynman. "In Miniaturization". D. H. Gilbert, Ed., Reinhold Publishing Corporation, New York, 1961, pp. 282-296.
[4] R. J. Lipton. "DNA Solution of Hard Computational Problems". Science, 268, pp. 542-545, 1995.
[5] S. Roweis, E. Winfree, R. Burgoyne, N. V. Chelyapov, M. F. Goodman, Paul W. K. Rothemund and L. M. Adleman. "A Sticker Based Model for DNA Computation". 2nd annual workshop on DNA Computing, Princeton University. Eds. L. Landweber and E. Baum, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, pp. 1-29, 1999.
[6] D. Boneh, C. Dunworth, R. J. Lipton and J. Sgall. "On the Computational Power of DNA". Discrete Applied Mathematics, Special Issue on Computational Molecular Biology, Volume 71, pp. 79-94, 1996.
[7] D. Boneh, C. Dunworth, and R. J. Lipton. "Breaking DES using a molecular computer". In Proceedings of the 1st DIMACS Workshop on DNA Based Computers, 1995, American Mathematical Society. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Volume 27, pp. 37-66, 1996.
[8] M. Amos. Theoretical and Experimental DNA Computation. Springer, 2005.
[9] R. R. Sinden. DNA Structure and Function. Academic Press, 1994.
[10] Leonard M. Adleman. On constructing a molecular computer.
[11] R. S. Braich, C. Johnson, P. W. K. Rothemund, D. Hwang, N. Chelyapov, M. Leonard, and L. M. Adleman. "Solution of a satisfiability problem on a gel-based DNA computer". In Proceedings of the Sixth International Conference on DNA Computation (DNA 2000), Lecture Notes in Computer Science 2054, pp. 27-42, 2001.