PPT1

Adleman and computing on a surface
Course outline
1 Introduction
2 Theoretical background: biochemistry/molecular biology
3 Theoretical background: computer science
4 History of the field
5 Splicing systems
6 P systems
7 Hairpins
8 Detection techniques
9 Micro technology introduction
10 Microchips and fluidics
11 Self assembly
12 Regulatory networks
13 Molecular motors
14 DNA nanowires
15 Protein computers
16 DNA computing - summary
17 Presentation of essay and discussion
Who’s who?
Tom Head
Department of Mathematical Sciences
Binghamton University
Areas of interest
Algebra
Computing with biomolecules
Formal representations of communication
http://www.math.binghamton.edu/tom/
Leonard Adleman
Department of Computer Science
Areas of interest

Method for Obtaining Digital Signatures and
Public-Key Cryptosystems
Turing Award 2002

Distinguishing Prime Numbers From Composite
Numbers

The First Case of Fermat's Last Theorem

Primality Testing And Two Dimensional Abelian Varieties Over Finite Fields

Molecular Computation of Solutions To Combinatorial Problems
http://www.usc.edu/dept/molecular-science/fm-adleman.htm
Richard Lipton
Theoretical Computer Science College of
Computing, Georgia Tech
Areas of interest

Algorithms and Complexity Theory

Cryptography

DNA Computing
http://www.cc.gatech.edu/computing/Theory/theory.html
Laura Landweber
Dept. of Ecology and Evolutionary Biology
Princeton University
Areas of interest
Origins of Genes, Genomes, the Genetic Code
Early Pathways of RNA Evolution
Scrambled Genes
RNA Editing
Gene Scrambling
DNA Computing
http://www.princeton.edu/~lfl/
John Reif
Computer Science
Duke University
Areas of interest
DNA nanostructures
Molecular Computation
Efficient Algorithms
Parallel Computation
Robotic Motion Planning
Optical Computing
http://www.cs.duke.edu/~reif/
Erik Winfree
Computer Science
Computation and Neural Systems
Caltech
Areas of interest
MacArthur Fellow 2000
DNA-based computers
Computing by self-assembly
Genetic Regulatory Networks
Signal Transduction Cascades
Ribosomal Translation
DNA and RNA folding
http://www.dna.caltech.edu/~winfree/
Nadrian Seeman
Department of Chemistry
New York University
Areas of interest
DNA Nanotechnology
Macromolecular Design and Topology
Biophysical Chemistry of Recombinational Intermediates
DNA-Based Computation
Crystallography
http://www.nyu.edu/pages/chemistry/faculty/seeman.html
Robert Corn
Chemistry Department
University of Wisconsin
Areas of interest
surface plasmon resonance (SPR) to monitor biopolymer adsorption, the chemical modification of surfaces
characterization of molecular monolayers
electron transfer processes at liquid/liquid electrochemical interfaces
DNA computing algorithms at surfaces
multilayer polyelectrolyte films for ion transport applications
http://corninfo.chem.wisc.edu/
Hagiya Masami
Department of Computer Science,
University of Tokyo
Areas of interest
Automated Deduction, Formal Verification and Programming Languages
Bio-Computing
Hybrid Systems...
http://hagi.is.s.u-tokyo.ac.jp
Akira Suyama
Graduate School of Arts and Sciences,
University of Tokyo
Areas of interest
SNPs
Probe design DNA chips
Quantitative gene expression
Hybrid Systems...
http://talent.c.u-tokyo.ac.jp/suyama/
John Rose
Department of Computer Science,
University of Tokyo
Areas of interest

the DNA chip, especially Tag-Antitag
Systems

Whiplash PCR, a simple autonomous DNA
computer

equilibrium chemistry/statistical
thermodynamic model
http://hagi.is.s.u-tokyo.ac.jp/~johnrose/
Gheorghe Păun
Institute of Mathematics of
the Romanian Academy
Areas of interest
Formal language theory (and applications)
Combinatorics on words
Semiotics
Operational research
DNA Computing
Membrane Computing
http://stoilow.imar.ro/~gpaun/
Grzegorz Rozenberg
Institute of Advanced Computer Science
University of Leiden
Areas of interest
Molecular Computing
Evolutionary Algorithms
Neural Networks
http://www.wi.leidenuniv.nl/~rozenber/
Giancarlo Mauri
Dipartimento di Informatica,
Sistemistica e Comunicazione (DISCo)
Milano
Areas of interest
H systems
P systems
Neural Networks
http://bioinformatics.bio.disco.unimib.it/
Ehud Shapiro
Computer Science and Applied Mathematics
the Weizmann Institute
Areas of interest
DNA as input fuel
Biological nanocomputer
Turing machine-like model
http://www.weizmann.ac.il/mathusers/lbn/index.html
Byoung-Tak Zhang
School of Computer Science and Engineering
Seoul National University
Areas of interest
Evolutionary Intelligence
Neural Intelligence
Molecular Intelligence
Computational Learning Theory
http://scai.snu.ac.kr/~btzhang/
Danny van Noort
School of Computer Science and Engineering
Seoul National University
Areas of interest
microstructure design and fabrication
DNA-hybridisation
instrumentation
fluorescent microscopy
affinity biosensors
protein chips
DNA computing
cell behaviour
http://bi.snu.ac.kr/~danny/
NP-complete problems
The theory of NP-completeness

Tractable and intractable problems

NP-complete problems
Classifying problems

Classify problems as tractable or intractable.

A problem is tractable if there exists at least one polynomially bounded algorithm that solves it.

An algorithm is polynomially bounded if its worst-case growth rate can be bounded by a polynomial p(n) in the size n of the problem:
$p(n) = a_k n^k + \dots + a_1 n + a_0$, where k is a constant.
Intractable problems
• A problem is intractable if it is not tractable.
• No algorithm that solves the problem is polynomially bounded.
• Every such algorithm has a worst-case growth rate f(n) which cannot be bounded by a polynomial p(n) in the size n of the problem.
• For intractable problems the bounds are: $f(n) = c^n$, or $n^{\log n}$, etc.
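To make the difference between these bounds concrete, the following sketch tabulates a polynomial, $n^{\log n}$ and $c^n$ for a few problem sizes. The cubic polynomial, the base-2 logarithm, the choice c = 2 and the function name `growth_table` are illustrative assumptions, not part of the slides.

```python
import math

def growth_table(sizes, c=2):
    """Return rows (n, n^3, n^(log2 n), c^n) for each problem size n."""
    rows = []
    for n in sizes:
        polynomial = n ** 3              # bounded by a polynomial p(n)
        quasi_poly = n ** math.log2(n)   # n^(log n): outgrows every fixed power of n
        exponential = c ** n             # c^n
        rows.append((n, polynomial, quasi_poly, exponential))
    return rows

for n, poly, quasi, expo in growth_table([10, 20, 50, 100]):
    print(f"n={n:>3}  n^3={poly:.3e}  n^log2(n)={quasi:.3e}  2^n={expo:.3e}")
```

Already at n = 100 the exponential term exceeds the cubic one by more than twenty orders of magnitude.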
Hard practical problems

There are many practical problems for which no one has yet found a polynomially bounded algorithm.

Examples: traveling salesperson, 0/1 knapsack, graph coloring, bin packing etc.

Most design automation problems such as testing and routing.

Many network, database and graph problems.
The theory of NP-completeness

The theory of NP-completeness enables showing that these problems are at least as hard as NP-complete problems.

The practical implication of knowing that a problem is NP-complete is that it is probably intractable (whether it is or not has not been proved yet).

So any algorithm that solves it will probably be very slow for large inputs.
Decision problems

A decision problem answers yes or no for a given input.

Examples:
 Given a graph G, is there a path from s to t of length at most k?
 Does graph G contain a Hamiltonian cycle?
 Given a graph G, is it bipartite?
Decision problem: Hamiltonian cycle

A Hamiltonian cycle of a graph G is a cycle that includes each vertex of the graph exactly once.

Problem: Given a graph G, does G have
a Hamiltonian cycle?
The class P

P is the class of decision problems that are polynomially bounded.

Is the following problem in P?
Given a weighted graph G, is there a spanning tree of weight at most B?

The decision versions of problems such as shortest distance and minimum spanning tree belong to P.
The class NP

NP is the class of decision problems for which there is a polynomially bounded verification algorithm.

It can be shown that:
 all decision problems in P, and
 decision problems such as traveling salesman, knapsack, bin packing, are also in NP.
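As an illustration of a polynomially bounded verification algorithm, here is a minimal sketch of a verifier for the Hamiltonian cycle problem. It assumes an undirected graph given as a set of `frozenset` edges and a certificate given as an ordering of the vertices; the function name and the small example graph `SQUARE` are hypothetical.

```python
def verify_hamiltonian_cycle(vertices, edges, certificate):
    """Check in polynomial time that `certificate` is a Hamiltonian cycle.

    vertices:    iterable of vertex labels
    edges:       set of frozenset({u, v}) pairs (undirected graph)
    certificate: proposed ordering of the vertices along the cycle
    """
    vertices = set(vertices)
    n = len(certificate)
    # A cycle in a simple graph needs at least three vertices.
    if n < 3:
        return False
    # Each vertex must appear exactly once in the proposed ordering.
    if n != len(vertices) or set(certificate) != vertices:
        return False
    # Consecutive vertices (wrapping around) must be joined by an edge.
    return all(
        frozenset((certificate[i], certificate[(i + 1) % n])) in edges
        for i in range(n)
    )

# Example graph (an assumption for illustration): a square 1-2-3-4-1.
SQUARE = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}
print(verify_hamiltonian_cycle([1, 2, 3, 4], SQUARE, [1, 2, 3, 4]))  # True
print(verify_hamiltonian_cycle([1, 2, 3, 4], SQUARE, [1, 3, 2, 4]))  # False
```

The verifier only checks a given certificate; it does not search for one, which is why its running time stays polynomial in the size of the graph.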
The relation between P and NP

$P \subseteq NP$

If a problem is solvable in polynomial time, a polynomial-time verification algorithm can easily be designed that ignores the certificate and answers “yes” for all inputs with the answer “yes”.
The relation between P and NP

It is not known whether P = NP.

Problems in P can be solved “quickly”.

Problems in NP can be verified “quickly”.

It is easier to verify a solution than to solve a problem.

Some researchers believe that P and NP are not the same class.
NP-complete problems

A problem A is NP-complete if
1. It is in NP and
2. For every other problem A’ in NP, $A' \le_p A$ (A’ is polynomial-time reducible to A)

A problem A is NP-hard if
For every other problem A’ in NP, $A' \le_p A$
Examples of NP-complete problems

Cook’s theorem
Satisfiability is NP-complete

This was the first problem shown to be NP-complete

Other problems
the decision version of knapsack,
the decision version of traveling salesman
Satisfiability problem
The satisfiability problem

First, Conjunctive Normal Form (CNF)
will be defined

Then, the Satisfiability problem will
be defined
Conjunctive normal form (CNF)

A logical (Boolean) variable is a variable that may be assigned the value true or false (x, y, w and z are Boolean variables).

A literal is a logical variable or the negation of a logical variable ($x$ and $\lnot y$ are literals).

A clause is a disjunction of literals ($(w \lor x \lor y)$ and $(\lnot x \lor y)$ are clauses).
Conjunctive normal form (CNF)

A logical (Boolean) expression is in Conjunctive Normal Form if it is a conjunction of clauses.

The following expression is in conjunctive normal form:
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$
The satisfiability problem

Is there a truth assignment to the n variables of a logical expression in Conjunctive Normal Form which makes the value of the expression true?

For the answer to be yes, all clauses must evaluate to true.

Otherwise the answer is no.
The satisfiability problem

x=F, y=F, w=T and z=T is a satisfying truth assignment for:
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$

Note that if y=F then $\lnot y$=T.

Each clause evaluates to true.
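The check can be written out as a short sketch. It assumes the formula reads (w or x or y) and (w or not y or z) and (not x or y) and (not w or not y); the negations are inferred from the MARK steps and the answer list later in the deck. The data layout and the names `FORMULA`, `clause_is_true` and `satisfies` are hypothetical.

```python
# A literal is (variable, is_positive); a clause is a list of literals;
# a CNF formula is a list of clauses.
FORMULA = [
    [("w", True), ("x", True), ("y", True)],    # (w or x or y)
    [("w", True), ("y", False), ("z", True)],   # (w or not y or z)
    [("x", False), ("y", True)],                # (not x or y)
    [("w", False), ("y", False)],               # (not w or not y)
]

def clause_is_true(clause, assignment):
    """A clause is true if at least one of its literals is true."""
    return any(assignment[var] == positive for var, positive in clause)

def satisfies(formula, assignment):
    """A CNF formula is true only if every clause is true."""
    return all(clause_is_true(clause, assignment) for clause in formula)

assignment = {"w": True, "x": False, "y": False, "z": True}
print(satisfies(FORMULA, assignment))  # True
```

Evaluating one assignment takes time linear in the size of the formula, which is the sense in which a solution can be verified “quickly”.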
Adleman’s experiment
The 1994 experiment
DNA computer
Basic idea
Perform a molecular biology experiment to find the solution to a math problem.
Hamiltonian path

(Proposed by William Hamilton)

Given a network of nodes and directed connections between them, is there a path through the network that begins with the start node and concludes with the end node, visiting each node only once (“Hamiltonian path”)?

Does a Hamiltonian path exist, or not?
Hamiltonian path does exist
[Figure: network of Atlanta, Boston, Chicago and Detroit; start city Atlanta, end city Detroit]
Hamiltonian path does not exist
[Figure: network of Atlanta, Boston, Chicago and Detroit; start city Detroit, end city Atlanta]
Solving the Hamiltonian problem
Generation-&-Test Algorithm
Step 1
Generate random paths on the network.
Step 2
Keep only those paths that begin with start city and conclude with end city.
Step 3
If there are N cities, keep only those paths of length N.
Step 4
Keep only those that enter all cities at least once.
Step 5
Any remaining paths are solutions (i.e., Hamiltonian paths).
The paths
[X] D -> B -> A
[X] B -> C -> D -> B -> A -> B
[X] A -> B -> C -> B
[X] C -> D -> B -> A
[X] A -> B -> A -> D
[O] A -> B -> C -> D
[X] A -> B -> A -> B -> C -> D
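The filtering steps 2 to 4 of the Generation-&-Test algorithm can be sketched as a single test applied to the seven example paths. The sketch assumes start city A and end city D, and that the candidate paths were already generated along valid connections (step 1 is not simulated here); the function and variable names are hypothetical.

```python
def is_hamiltonian_path(path, cities, start, end):
    """Apply steps 2-4 of the Generation-&-Test algorithm to one path."""
    if path[0] != start or path[-1] != end:   # step 2: right start and end city
        return False
    if len(path) != len(cities):              # step 3: path of length N
        return False
    return set(path) == set(cities)           # step 4: every city is entered

CITIES = ["A", "B", "C", "D"]
CANDIDATES = [
    "D B A",
    "B C D B A B",
    "A B C B",
    "C D B A",
    "A B A D",
    "A B C D",
    "A B A B C D",
]

# Step 5: whatever survives the filters is a Hamiltonian path.
survivors = [
    p.split() for p in CANDIDATES
    if is_hamiltonian_path(p.split(), CITIES, start="A", end="D")
]
print(survivors)  # [['A', 'B', 'C', 'D']]
```

Note that A -> B -> A -> D passes steps 2 and 3 and is only rejected in step 4, because it never enters C.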
Solving the Hamiltonian problem
Combinatorial explosion

The total number of paths grows exponentially as the network size increases:

(e.g.) $10^6$ paths for N=10 cities, $10^{12}$ paths (N=20), $10^{100}$ paths!! (N=100)

The Generation-&-Test algorithm takes “forever”. Some sort of smart algorithm must be devised; none has been found so far (NP-hard).
Finding a solution with DNA
The key to solving the problem is using DNA to perform the five steps of the Generation-&-Test algorithm as a parallel search, instead of a serial search.
Intermezzo: DNA polymerase

Protein that produces complementary DNA strand

A -> T, T -> A, C -> G, G -> C

Requires primer and starter

Enables DNA to reproduce
Intermezzo: DNA polymerase
The bio-nanomachine

hops onto DNA strand

slides along

reads each base

writes its complement onto new strand
Experimental set-up
Ingredients and tools needed

DNA strands that encode city names and
connections between them

Polymerases, ligase, water, salt, other
ingredients

Polymerase chain reaction (PCR) set

Gel electrophoresis tool (that filters out non-solution strands)
Gel electrophoresis
Solving a Hamiltonian path problem
[Figure: network of Atlanta, Boston, Chicago and Detroit; start city Atlanta, end city Detroit]
City coding
CITY      DNA NAME   COMPLEMENT
ATLANTA   ACTTGCAG   TGAACGTC
BOSTON    TCGGACTG   AGCCTGAC
CHICAGO   GGCTATGT   CCGATACA
DETROIT   CCGAGCAA   GGCTCGTT

CONNECTING PATH   DNA PATH
ATLANTA-BOSTON    GCAGTCGG
ATLANTA-DETROIT   GCAGCCGA
BOSTON-CHICAGO    ACTGGGCT
BOSTON-DETROIT    ACTGCCGA
BOSTON-ATLANTA    ACTGACTT
CHICAGO-DETROIT   ATGTCCGA
City coding with DNA
Atlanta-Boston (path strand): GCAGTCGG
Atlanta, Boston (complement strands): TGAACGTC AGCCTGAC
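The coding scheme can be reproduced in a few lines. The sketch below reads the rule off the table: a connecting path is the last four bases of the source city followed by the first four bases of the destination city, and the complement is taken base by base (A<->T, C<->G) without reversing the strand, as in the table. The names `CITY_DNA`, `complement` and `path_strand` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

CITY_DNA = {
    "ATLANTA": "ACTTGCAG",
    "BOSTON": "TCGGACTG",
    "CHICAGO": "GGCTATGT",
    "DETROIT": "CCGAGCAA",
}

def complement(strand):
    """Base-wise Watson-Crick complement, written in the same direction as the input."""
    return strand.translate(COMPLEMENT)

def path_strand(src, dst):
    """Second half of the source city followed by the first half of the destination."""
    return CITY_DNA[src][4:] + CITY_DNA[dst][:4]

print(complement(CITY_DNA["ATLANTA"]))   # TGAACGTC
print(path_strand("ATLANTA", "BOSTON"))  # GCAGTCGG

# The path strand pairs with the end of Atlanta's complement and the start of
# Boston's complement, which is what holds the two city strands together.
splint = complement(CITY_DNA["ATLANTA"])[4:] + complement(CITY_DNA["BOSTON"])[:4]
print(complement(path_strand("ATLANTA", "BOSTON")) == splint)  # True
```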
Possible paths
[Figure: network with start city Atlanta and end city Detroit]
Path strands: Atlanta-Boston, Boston-Chicago, Chicago-Detroit
City strands: Atlanta*, Boston*, Chicago*, Detroit*
Possible paths
[Figure: network with start city Atlanta and end city Detroit]
Path strands: Boston-Atlanta, Atlanta-Detroit
City strands: Boston*, Atlanta*, Detroit*
In pictures
The DNA experiment
1. In a test tube, mix the prepared DNA pieces together (which will randomly link with each other, forming all different paths).
2. Perform PCR with the ‘start’ and ‘end’ DNA pieces as primers (which creates millions of copies of DNA strands with the right start and end).
3. Perform gel electrophoresis to identify only those pieces of the right length (e.g., N=4).
4. Use DNA ‘probe’ molecules to check whether their paths pass through all intermediate cities.
5. All DNA pieces that are left in the tube should be precisely those representing Hamiltonian paths.

If the tube contains any DNA at all, then conclude that a Hamiltonian path exists, and otherwise not.

When it does, the DNA sequence represents the specific path of the solution.
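The five laboratory steps can be mimicked on strings as a rough sketch. Several simplifications are assumptions made here and not taken from the slides: a path's strand is modelled as the plain concatenation of its city sequences, random ligation is replaced by enumerating every walk of up to six cities, PCR becomes a prefix/suffix test, gel electrophoresis a length test, and a probe a substring test. All names are hypothetical.

```python
CITY_DNA = {
    "ATLANTA": "ACTTGCAG",
    "BOSTON": "TCGGACTG",
    "CHICAGO": "GGCTATGT",
    "DETROIT": "CCGAGCAA",
}
EDGES = [
    ("ATLANTA", "BOSTON"), ("ATLANTA", "DETROIT"), ("BOSTON", "CHICAGO"),
    ("BOSTON", "DETROIT"), ("BOSTON", "ATLANTA"), ("CHICAGO", "DETROIT"),
]

def all_walks(max_cities):
    """Stand-in for step 1: every walk along the edges with up to max_cities cities."""
    walks = [[city] for city in CITY_DNA]
    frontier = walks
    for _ in range(max_cities - 1):
        frontier = [w + [dst] for w in frontier for src, dst in EDGES if src == w[-1]]
        walks = walks + frontier
    return walks

def strand(walk):
    """Simplified encoding: concatenate the city sequences along the walk."""
    return "".join(CITY_DNA[city] for city in walk)

def run_experiment(start, end, max_cities=6):
    tube = [strand(w) for w in all_walks(max_cities)]                 # step 1: mix
    tube = [s for s in tube                                           # step 2: "PCR"
            if s.startswith(CITY_DNA[start]) and s.endswith(CITY_DNA[end])]
    right_length = sum(len(seq) for seq in CITY_DNA.values())
    tube = [s for s in tube if len(s) == right_length]                # step 3: "gel"
    for seq in CITY_DNA.values():                                     # step 4: "probes"
        tube = [s for s in tube if seq in s]
    return tube                                                       # step 5: what is left

print(run_experiment("ATLANTA", "DETROIT"))  # ['ACTTGCAGTCGGACTGGGCTATGTCCGAGCAA']
print(run_experiment("DETROIT", "ATLANTA"))  # []
```

The surviving strand spells Atlanta, Boston, Chicago, Detroit, so reading its sequence recovers the path itself; with the start and end cities swapped the tube ends up empty.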
Summary and conclusion
Why does it work?

Enormous parallelism, with $10^{23}$ DNA pieces working in parallel to find the solution simultaneously.

Takes less than a week (vs. thousands of years for a supercomputer)

Extraordinarily energy efficient ($10^{-10}$ of supercomputer energy use)

Note this is a Universal Turing machine
Experimental set-up
[Figure sequence: capture layer (-R or G) with - and + markers; one panel labelled HOT]
DNA computing on a surface
DNA computing on surfaces

Advantages over “solution phase” chemistry
Facile purification steps
Reduced interference between strands
Easily automated

Disadvantages:
Loss of information density (2D)
Lower surface hybridization efficiency
Slower surface enzyme kinetics
DNA surface model: input
DNA strands representing the set {0,1}^n are
synthesized and subsequently immobilized on
a surface in a non-addressed fashion
Encoding binary information
[Figure: strands drawn as rows of words numbered 1 2 3 4; labels: Word, Bit]
A strand is comprised of words.
Each word is a short DNA strand (16mer) representing one or more bits.
DNA word design problem

Requirements of a “DNA code”
 Success in specific hybridization between a DNA code word and its Watson-Crick complement
 Few false positive signals

Virtually all designs enforce combinatorial constraints on the code words

Applications:
 Information storage, retrieval for DNA computing
 Molecular bar codes for chemical libraries
DNA word design problem

Hamming: distance between two code words should be large

Reverse complement: distance between a word and the reverse complement of another word should be large

Also: frame shift, distinct sub-words, forbidden sub-words, …
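The first two constraints are easy to state in code. The sketch below checks a candidate set of equal-length words against a minimum Hamming distance and a minimum distance to each other's reverse complement; the threshold `min_distance`, the function names and the toy words are assumptions for illustration, and the other constraints (frame shift, sub-words) are not covered.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))

def reverse_complement(word):
    """Watson-Crick complement read in the opposite direction."""
    return word.translate(COMPLEMENT)[::-1]

def check_code(words, min_distance):
    """Return the list of constraint violations among all pairs of words."""
    violations = []
    for a, b in combinations(words, 2):
        if hamming(a, b) < min_distance:
            violations.append(("hamming", a, b))
        if hamming(a, reverse_complement(b)) < min_distance:
            violations.append(("reverse-complement", a, b))
    return violations

print(check_code(["AAAA", "CCCC"], min_distance=2))  # []
print(check_code(["AAAA", "TTTT"], min_distance=2))  # [('reverse-complement', 'AAAA', 'TTTT')]
```

The second example shows why the Hamming constraint alone is not enough: AAAA and TTTT differ in every position, yet each is the reverse complement of the other, so the two code words would hybridize with each other.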
Work on DNA code design

Seeman (1990): de novo design of sequences
for nucleic acid structural engineering

Brenner (1997): sorting polynucleotides
using DNA tags

Shoemaker et al. (1996): analysis of yeast
deletion mutants using a parallel molecular
bar-coding strategy

Many other examples in DNA computing
Word design example
DNA surface model: process
MARK strands in which bit j = 0 (or 1): hybridize with Watson-Crick complements of word containing bit j, followed by polymerization
DESTROY unmarked strands: exonuclease degradation
UNMARK strands: wash in distilled water
DNA surface model: output
Detect remaining strands (if any) by detaching strands from surface and amplifying using PCR (polymerase chain reaction).
Computational power
Theorem
Any CNF-SAT formula of size m can be computed using O(m) mark, unmark and destroy operations.
Theorem
Any circuit of size m can be computed using O(m) mark, unmark, destroy, and append operations.
The satisfiability problem
[Figure: circuit with inputs w, x, y, z and gates labelled and, or, not]
Input
16 strands
Process
MARK if bit z = 1
MARK if bit w = 1
MARK if bit y = 0
DESTROY
UNMARK
MARK if bit w = 0
MARK if bit y = 0
DESTROY
UNMARK
…
Output
exactly those strands that satisfy the circuit remain on the surface.
4-variable SAT demo
$(w \lor x \lor y) \land (w \lor \lnot y \lor z) \land (\lnot x \lor y) \land (\lnot w \lor \lnot y)$
{0000}
{0010}
{0100}
{0110}
{1000}
{1010}
{1100}
{1110}
{0001}
{0011}
{0101}
{0111}
{1001}
{1011}
{1101}
{1111}
4-variable SAT demo

The logic of the DNA computation in each cycle, leading at the end to four types of DNA molecules remaining on the surface.

The identity of those molecules that correspond to the solutions was determined by PCR.

Solution: S3, S7, S8, S9
4-variable SAT, the answers
S3: w=0, x=0, y=1, z=1
S7: w=0, x=1, y=1, z=1
S8: w=1, x=0, y=0, z=0
S9: w=1, x=0, y=0, z=1
y=1: $(w \lor x \lor y)$
z=1: $(w \lor \lnot y \lor z)$
x=0 or y=1: $(\lnot x \lor y)$
w=0: $(\lnot w \lor \lnot y)$
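The mark, destroy and unmark cycle can be simulated on bit strings to see how these four answers arise. This is a sketch under assumptions: the formula is taken to be (w or x or y) and (w or not y or z) and (not x or y) and (not w or not y), strand Sk is taken to be the strand whose bits w, x, y, z spell k in binary, and the chemistry is reduced to dictionary updates. All names are hypothetical.

```python
from itertools import product

VARIABLES = "wxyz"

# Each clause lists (variable, bit value that satisfies the literal),
# e.g. ("y", 0) stands for "not y" and corresponds to "MARK if bit y = 0".
FORMULA = [
    [("w", 1), ("x", 1), ("y", 1)],
    [("w", 1), ("y", 0), ("z", 1)],
    [("x", 0), ("y", 1)],
    [("w", 0), ("y", 0)],
]

def solve_on_surface(formula, variables=VARIABLES):
    """Simulate one MARK/DESTROY/UNMARK cycle per clause; return surviving strands."""
    # Input: one unmarked strand for every bit string in {0,1}^n.
    surface = {bits: False for bits in product((0, 1), repeat=len(variables))}
    for clause in formula:
        for var, wanted in clause:
            j = variables.index(var)
            for bits in surface:
                if bits[j] == wanted:
                    surface[bits] = True          # MARK strands with bit j = wanted
        # DESTROY unmarked strands, then UNMARK the survivors.
        surface = {bits: False for bits, marked in surface.items() if marked}
    return sorted(surface)

def readout(strands):
    """Name each surviving strand by the binary number its bits spell."""
    return ["S%d" % int("".join(map(str, bits)), 2) for bits in strands]

print(readout(solve_on_surface(FORMULA)))  # ['S3', 'S7', 'S8', 'S9']
```

The run uses ten MARK, four DESTROY and four UNMARK operations, one MARK per literal and one DESTROY/UNMARK pair per clause, in line with the O(m) bound stated for CNF formulas.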
4-variable SAT demo
Synthesize; Attach -> Cycle (Mark -> Destroy -> Unmark) -> Readout
Conclusions

Solid-phase chemistry is a promising approach to DNA computing

DNA computing will require greatly improved DNA surface attachment chemistries and control of chemical and enzymatic processes