• Introduction
• Swarm Intelligence
• Neural networks
• Evolutionary computing
• Special Topics
– 2 or 3 lectures on other areas of bioinspired computing
– Not an exhaustive survey!
– Today: DNA Computing
– Next: A Bio-inspired Approach to Grid Resource Allocation
• Guest lecturer
– And possibly one more – watch the news group
"the smallest biological computing device"
(Guinness World Records)
Biochemical "nanocomputers" already exist in nature; they are manifest in all living things. But they're largely uncontrollable by humans. We cannot, for example, program a tree to calculate the digits of pi!
The idea of using DNA to store and process information started in 1994 when a California computer scientist,
Leonard Adleman, used DNA in a test tube to solve a simple mathematical problem: (a variant of) the travelling salesman problem.
• Given a graph with directed edges, find a Hamiltonian path, i.e. a path which starts at one node, finishes at another, and goes through all other nodes exactly once.
• Variant of the travelling salesman problem where all roads are the same length:
• A salesman wants to travel over a fixed set of roads between N different towns without ever coming back to a town he has already visited.
Find a sequence of flights which goes from Fresno to Boston, landing at all other airports exactly once.
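For comparison with the DNA approach, here is a minimal brute-force sketch of the same search on a conventional computer. The four-airport network and the function name `hamiltonian_path` are made up for illustration; this is not Adleman's actual graph.

```python
from itertools import permutations

def hamiltonian_path(nodes, edges, start, end):
    """Return one path from start to end visiting every node once, or None."""
    edge_set = set(edges)
    middle = [n for n in nodes if n not in (start, end)]
    # Try every ordering of the intermediate nodes: (N-2)! candidates.
    for order in permutations(middle):
        path = [start, *order, end]
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            return path
    return None

# Hypothetical network, chosen only for illustration.
airports = ["Fresno", "Atlanta", "Chicago", "Boston"]
flights = [("Fresno", "Atlanta"), ("Atlanta", "Chicago"),
           ("Chicago", "Boston"), ("Fresno", "Chicago"),
           ("Atlanta", "Boston")]

route = hamiltonian_path(airports, flights, "Fresno", "Boston")
print(route)
```

The loop over orderings is what makes this approach impractical as the number of airports grows.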
A P problem is one which can be solved in polynomial time, that is, in a number of steps bounded by some fixed power of N, where N is the size of the problem. An NP-hard problem is one for which no one knows an algorithm that avoids exponential time (2^N, or some other number > 1 raised to the power N).
Methods that take an exponential number of operations can work out whether or not such a route exists and report it if it does, but even for small problems they take too much time to be practical.
Any NP problem can be transformed into an NP-complete one, such as the Hamiltonian path problem, in polynomial time. So solve one NP-complete problem efficiently and you can solve any NP problem efficiently!
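To make the gap between polynomial and exponential growth concrete, the following sketch tabulates both for a few problem sizes. The cubic exponent and the base 2 are arbitrary choices for illustration.

```python
def polynomial_steps(n, power=3):
    """Step count for a hypothetical N^power algorithm."""
    return n ** power

def exponential_steps(n, base=2):
    """Step count for a hypothetical base^N algorithm."""
    return base ** n

# Print N, N^3 and 2^N side by side.
for n in (10, 20, 40, 80):
    print(n, polynomial_steps(n), exponential_steps(n))
```

Doubling N multiplies the polynomial count by a fixed factor (8 here), whereas it squares the exponential count.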
• Adleman had 3 × 10^13 copies of each of 20 or so molecules, so about six hundred million million molecules at his service. This was like having six hundred million million (albeit very rudimentary) computers working on the problem at the same time.
• Parallel approach
– Like using lots of passengers to find a Hamiltonian path in a flight network
• An ingenious set of steps, using recombinant DNA technology, filters out all except the strands representing a Hamiltonian path.
• Adleman uses DNA molecules for their computational ability alone. As a part of DNA's replication mechanism, a strand can recognize complementary strands or parts of strands by bonding to them. This is the only computation these molecules can perform, but it turns out to be enough.
– This doesn't replicate or use much of the way DNA works in living organisms (e.g. involving amino acids/proteins)
DNA (deoxyribonucleic acid) is the molecular basis of genetics. For our purposes, the following features are important.
• A DNA molecule is made of two intertwined, parallel strands (the “double helix”).
• Each strand has the following structure:

      p     p     p     p          p = phosphate
    .. \   / \   / \   / \   / ...
         s     s     s     s       s = sugar
         |     |     |     |
         b     b     b     b       b = base

  where each b can be any one of four bases: A = adenine, T = thymine, C = cytosine, G = guanine.
• Two consecutive s-p bonds must occur at distinct places on the s molecule: one at the 3' (“three-prime”) position and one at the 5', so each strand has a 3'-end and a 5'-end and can be systematically oriented. A strand can be read either from the 3'-end to the 5'-end or the opposite way, from the 5'-end to the 3'-end.
• The two intertwined strands in a DNA molecule have opposite orientations and complementary base sequences. This means that opposite every A on one strand is a T on the other, and opposite every T is an A. Likewise, opposite every C on one strand is a G on the other, and opposite every G is a C.
[Figure: a typical stretch of the DNA molecule, and its usual abbreviated form]
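The pairing rule can be written down in a few lines. This is a minimal sketch; the function names are illustrative, and strands are assumed to be plain strings over the letters A, T, C, G.

```python
# Base pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base complement, position for position."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Complement read in the opposite orientation, since the two
    strands of a double helix run in opposite directions."""
    return complement(strand)[::-1]

print(complement("ATCG"))          # TAGC
print(reverse_complement("ATCG"))  # CGAT
```

Taking the complement twice returns the original strand, which is why bonding between a strand and its complement is symmetric.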
• Adleman assigned to each vertex, and to each link, a single DNA strand 20 bases long.
• E.g.
Vertex 2    TATCGGATCG GTATATCCGA
Vertex 3    GCTATTCGAG CTTAAAGCTA
Vertex 4    GGCTAGGTAC CAGCATGCTT
Link 2 → 3  GTATATCCGA GCTATTCGAG
Link 3 → 4  CTTAAAGCTA GGCTAGGTAC
– Note that Link 2 → 3 is made of the last half of 2 plus the first half of 3.
He used a slightly different representation for start and finish nodes, which we will ignore here.
We will use a simplified representation with just 8 bases in the rest of these slides.
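The link construction can be checked mechanically. The sketch below uses the three vertex strands from the example; the helper name `link` is an illustrative choice, and the special handling of start and finish nodes is ignored.

```python
# Vertex strands from the example (20 bases each, spaces removed).
vertices = {
    2: "TATCGGATCGGTATATCCGA",
    3: "GCTATTCGAGCTTAAAGCTA",
    4: "GGCTAGGTACCAGCATGCTT",
}

def link(a, b):
    """Strand for the link a -> b: last half of a plus first half of b."""
    half = len(vertices[a]) // 2
    return vertices[a][half:] + vertices[b][:half]

print(link(2, 3))  # GTATATCCGAGCTATTCGAG
print(link(3, 4))  # CTTAAAGCTAGGCTAGGTAC
```

Both results match the link strands listed above.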
In the experiment, strands representing the flights are mixed in a test tube with the complements to the strands representing the airports. (Atlanta* represents the strand complementary to Atlanta, etc.)
[Figure: the flight strands and complementary airport strands that the test-tube then holds]
Shake the test-tube: generate all possible routes
• Complementary sections of strand will bond, yielding reaction products.
[Figure: example reaction products]
• A DNA synthesizer supplies 10^13 copies of each city* and flight strand, to ensure all routes are generated. Still fits in a test-tube!
Using recombinant chemistry to extract the solution
• In Adleman's Hamiltonian path experiment (involving seven “airports” and fifteen “flights”) the yield of the first mixing included molecules encoding valid routes, but also molecules in which an airport is visited more than once.
[Figure: example molecules from the first mixing]
• The sequences shown here also visit only three of the seven airports, and none of them starts with Fresno or ends with Boston.
Using recombinant chemistry to extract the solution (contd)
A sequence of steps (the chemical details are given in Adleman's article) uses recombinant DNA technology to eliminate:
• All molecules which do not start with Fresno* or do not end with Boston*.
– Polymerase chain reaction
• All molecules which do not contain exactly 7 airports (i.e. all molecules which do not have a certain exact length).
– Gel electrophoresis
• All molecules which contain a repeated airport.
– Affinity purification
• If there is anything left, it must be molecules encoding a path that goes from Fresno to Boston visiting each of the other airports exactly once: the graph has a Hamiltonian path.
• In our example there is exactly one such path, which can be read off by analysis of the yield of the complete experiment.
– Graduated PCR
• It is interesting that even though this whole procedure is completely artificial, the “technology” which permits the various steps comes from harnessing the enzymes used by cells themselves to replicate, to transcribe and, when necessary, to destroy DNA.
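The generate-then-filter logic can be imitated in software. The sketch below is only an analogy: random walks stand in for strands bonding in the tube, list filters stand in for the three laboratory steps, and the four-airport network, the 0.2 stopping probability and all function names are assumptions made for illustration.

```python
import random

def random_walk(flights, max_len, rng):
    """Chain randomly chosen compatible flights, loosely imitating bonding."""
    path = list(rng.choice(flights))
    while len(path) < max_len:
        onward = [f for f in flights if f[0] == path[-1]]
        # Stop at a dead end, or at random (arbitrary 0.2 probability).
        if not onward or rng.random() < 0.2:
            break
        path.append(rng.choice(onward)[1])
    return tuple(path)

def filter_paths(paths, airports, start, end):
    """Apply the three elimination steps in order."""
    kept = [p for p in paths if p[0] == start and p[-1] == end]  # ends
    kept = [p for p in kept if len(p) == len(airports)]          # length
    kept = [p for p in kept if len(set(p)) == len(p)]            # repeats
    return set(kept)

# Hypothetical network, not Adleman's seven-airport graph.
airports = ["Fresno", "Atlanta", "Chicago", "Boston"]
flights = [("Fresno", "Atlanta"), ("Atlanta", "Chicago"),
           ("Chicago", "Boston"), ("Fresno", "Chicago"),
           ("Atlanta", "Boston")]

rng = random.Random(0)
tube = [random_walk(flights, 6, rng) for _ in range(2000)]
survivors = filter_paths(tube, airports, "Fresno", "Boston")
print(survivors)
```

As in the experiment, correctness depends on generating enough candidates that the wanted path is almost certainly among them; the filters never create a solution, they only discard non-solutions.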
• The initial calculation took approximately one second, but extracting the answer took Adleman about a week.
• The method scales in time but not in space: it needs an exponential amount of DNA. For 200 cities, the DNA would weigh more than the Earth!
• It is stochastic, so compounding errors are a possible problem.
• DNA computing has many advantages, such as performing millions of operations simultaneously, generating a complete set of potential solutions, conducting large parallel searches, and efficiently handling massive amounts of working memory.
• No researchers have been able to reproduce Adleman's experiment in the wet lab. The problem lies in the underlying assumption that the biological operations are error-free. Adleman spoke of a week of lab work, but tuning such an experiment could take a month.
• Many researchers have considered building practical computers on DNA computing ideas, but there are advantages and disadvantages, and because of the technological issues DNA computers are not yet in use.
• DNA computers offer massive parallelism, are incredibly lightweight, consume little power, and can solve complex problems quickly.
• Their great disadvantage is time: they are sometimes slower, and they are not reliable.
DNA computers can perform rudimentary functions.
DNA computers are programmable, but not universal.
Speed and size:
DNA computers surpass conventional computers:
The DNA molecule found in the nucleus of all cells can hold more information in a cubic centimeter than a trillion music CDs. A spoonful of DNA
"computer soup" contains 15,000 trillion computers.
Parallelism:
DNA strands produce billions of potential answers simultaneously.
Energy efficiency:
A biological system such as a cell can perform 2 × 10^19 operations using one joule of energy (the energy a 100-watt light bulb uses in a hundredth of a second), while a supercomputer manages only 10^10 operations per joule, making it about 2 × 10^9 times less energy efficient!
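A quick check of the arithmetic, taking the two operations-per-joule figures quoted above at face value (they are not independently verified here). The variable names are illustrative.

```python
# Figures as quoted on the slide, in operations per joule.
cell_ops_per_joule = 2 * 10**19
supercomputer_ops_per_joule = 10**10

# How many times more operations the cell gets from the same energy.
efficiency_ratio = cell_ops_per_joule // supercomputer_ops_per_joule

# One joule is one watt for one second, so a 100-watt bulb
# uses one joule in 1/100 of a second.
bulb_watts = 100
seconds_per_joule = 1 / bulb_watts

print(efficiency_ratio)   # 2000000000
print(seconds_per_joule)  # 0.01
```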
• The maximal clique problem has been solved by means of molecular biology techniques. A pool of
DNA molecules corresponding to the total ensemble of six-vertex cliques was built, followed by a series of selection processes. The algorithm is highly parallel and has satisfactory fidelity. This work represents further evidence for the ability of
DNA computing to solve NP-complete search problems. (Science 1997)
• A 20-variable instance of the NP-complete three-satisfiability (3-SAT) problem was solved on a simple DNA computer. The unique answer was found after an exhaustive search of more than 1 million (2^20) possibilities. This computational problem may be the largest yet solved by nonelectronic means. Problems of this size appear to be beyond the normal range of unaided human computation. (Science 2002)
– 3-SAT: given a set of clauses, each a disjunction of three literals (propositional variables or their negations), find whether there is an assignment of T/F to the variables that makes all the disjunctions true.
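The exhaustive search can be sketched conventionally as follows. The clause encoding (a positive integer k for variable k, a negative integer for its negation) and the small three-variable instance are assumptions for illustration, not the instance solved in the cited work.

```python
from itertools import product

def solve_3sat(num_vars, clauses):
    """Try all 2^num_vars assignments; return a satisfying one or None.

    Each clause is a tuple of three nonzero ints: k means variable k
    is true, -k means variable k is false.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

# Hypothetical instance over three variables.
clauses = [(1, 2, 3), (-1, -2, 3), (1, -2, -3), (-1, 2, -3)]
print(solve_3sat(3, clauses))  # (False, False, True)
```

With 20 variables the loop would visit up to 2^20 assignments one after another, where the DNA approach examines the candidates in parallel.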
• “Think of DNA as software, and enzymes as hardware. Put them together in a test tube. The way in which these molecules undergo chemical reactions with each other allows simple operations to be performed as a byproduct of the reactions. The scientists tell the devices what to do by controlling the composition of the DNA software molecules. It's a completely different approach to pushing electrons around a dry circuit in a conventional computer.” (National
Geographic)
• 330 trillion operations per second
• The single DNA molecule that provides the computer with the input data also provides all the necessary fuel.
• Programmable, but not universal
– Can only answer yes or no
– E.g. can check whether a list of zeros and ones has an even number of ones.
The computer cannot count how many ones are in a list.
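The parity check is the kind of yes/no question a two-state machine can answer. The sketch below assumes the device behaves like such a two-state automaton; the state names and the function name are illustrative.

```python
def has_even_ones(bits):
    """Two-state automaton: track only the parity of ones seen so far."""
    transitions = {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    }
    state = "even"  # no ones seen yet
    for symbol in bits:
        state = transitions[(state, symbol)]
    return state == "even"

print(has_even_ones("1001"))  # True
print(has_even_ones("1011"))  # False
```

The machine remembers one of two states, never a running total, which is why it can report parity but cannot count the ones.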